|
CPUID Algorithm Wars
Robert R. Collins
Robert is a design verification manager at Texas Instruments' Microprocessor Design Center. Robert can
be reached via e-mail at rcollins@ti.com.
In September, I presented the basics for determining the identification of specific families of x86
processors--the 8086/88, 80186/88, 80286, 80386, 80486, Pentium, Pentium Pro, and the like. In doing
so, I analyzed the processor-detection algorithm distributed by Intel, noting that it failed to detect some
processor families, contained a bug that caused it to misdiagnose some processors, depended on
undocumented processor behavior, and wasn't guaranteed to work outside of real mode.
In spite of these shortcomings, the Intel algorithm remains in widespread use primarily because it's
supported by Intel, and it works most of the time. I concluded by mentioning the possibility of writing an
algorithm that was more reliable, detected more processor types, didn't rely on undocumented processor
behavior, and returned the actual processor-stepping information--even on processors that don't support the
CPUID instruction.
Much like Intel's, my technique uses an algorithm to detect different processor families that haven't
employed the CPUID instruction. My algorithm executes a series of instructions that do not exist on all
processor families. The algorithm detects if a processor supports a particular instruction, thereby
detecting the processor family. For example, Intel implemented at least one new instruction with each
successive processor family. My algorithm detects processor differences by attempting to execute these
newly implemented instructions. If the instruction doesn't exist, an invalid-opcode exception is generated.
My algorithm intercepts this exception and reports that an invalid opcode was detected. Once a newly
implemented opcode executes successfully, the processor family has been determined. This will not work on
the 8086/88 however, as this microprocessor did not employ the invalid-opcode exception type.
The 8086-family and 80186-family processors must be detected in a different manner. The iAPX 286
Programmer's Reference Manual, Appendix D (Intel, part #210498) provides critical information that
benefits this algorithm:
"The iAPX 286 will push a different value on the stack for PUSH SP than the iAPX 86/88."
The 8086/88 and 80186/188 had a bug in the PUSH SP microcode. The 8086 and 80186 stored the value of
SP on the stack after it was decremented, instead of decrementing SP first and storing the resultant value
on the stack. Any program that executed PUSH SP and a subsequent POP SP would not receive the same
value for the PUSH/POP combination. Therefore, this well-documented difference is exploited to detect the
difference in these families of processors. Examples 1(a) and 1(b) illustrate use of the PUSH SP microcoded algorithm.
While the 80186 also demonstrates this same PUSH SP behavior, it also implements the invalid-opcode
exception type. Therefore, it is possible to install an invalid-opcode exception handler and execute an
instruction that exists only on the 80186-family processors. This would determine whether an
80186-family processor was installed. However, I couldn't use this approach. I needed to differentiate the
8086 from 80186, and therefore couldn't use the invalid-opcode exception for this purpose, as it does not
exist on the 8086. Instead, I relied upon another documented difference between the 8086 and 80186. This
difference is mentioned in the 80186/188, 80C186/C188 Hardware Reference
Manual (#270788), page A-2:
When a word write is performed at offset FFFFh in a segment, the 8086 will write one byte at offset FFFFh,
and the other at offset 0, while an 80186 family processor will write one byte at offset FFFFh, and the
other at offset 10000h (one byte beyond the end of the segment).
All processors from the 80286 to 80486 are detected by installing an invalid-opcode exception handler and
executing instructions unique to each processor. When an instruction executes (doesn't generate an
exception), the processor family has been determined.
Any processor family that implements the CPUID instruction does not need to use this technique. This would
include all Pentium-class processors, late-model 486s, and some non-Intel x86-compatible processors.
Once a 386- or 486-class processor has been detected, my algorithm blindly executes a CPUID instruction.
I don't make any attempt to verify the existence of the CPUID instruction using the Intel-recommended
approach (attempt to toggle EFLAGS.ID). I do this because
The remainder of the algorithm depends on the success or failure of the CPUID instruction. If CPUID exists,
it is used to determine whether or not the processor is a genuine Intel processor. If it is, the
processor-detection algorithm is completed; if not, the results are modified with a flag to indicate a
non-Intel processor (see the comments in CPUID.ASM, available electronically). If the CPUID instruction
doesn't exist, an invalid-opcode exception occurs, and the processor model and stepping are determined by
other means.
Detecting the Processor Model
Detecting the processor model on processors without the CPUID instruction is considerably more difficult.
For example, the 80386, 80486, and Pentium have many model types. The 80386 family consists of
models DX, SX, CX, EX, and SL. The 80486 family consists of models DX, DX2, DX4, SX, SX2, and SL, along
with the 80487 DX and SX. The Pentium family is still growing, but currently has the P5, P54C,
OverDrive, and OverDrive for DX4 processors. Sometimes it is possible to detect these different models,
and other times it isn't. (The Intel algorithm makes no attempt at determining any processor models.)
The 80386 DX and SX may be differentiated from each other by detecting differences in the CR0 register.
Bit 4 of this register (CR0.ET) may be toggled on the 80386 DX, while it is hardwired to 1 on the 80386
SX, CX, EX, and SL. This difference is found by comparing Figure 2.2 of the 386 SX microprocessor data
sheet (#240187) to the other 386-class processor data sheets. A further distinction may be made to
detect the 80386 SL, which has the ability to return its processor identification by executing a series of
I/O operations (IN/OUT instructions). Accessing this internal register is documented in the 80386 SL data
sheet (#240815), section 8.2. In this case, the actual family, model, and stepping information is
returned.
In theory, the 80486 DX and SX may be detected in the same manner as the 80386 DX and SX. This
difference is documented in Figure 2.5 of the 486 Microprocessor, 487 Math Coprocessor data sheet
(#240950). However, in reality, the documentation is wrong, as CR0.ET is hardwired to 1, regardless of
which 80486 is in use. Therefore, no distinction is made between the 80486 DX and the 80487 SX.
However, the 80486 DX and 80486 SX may be distinguished from each other by detecting the presence of
the floating-point unit (FPU). This algorithm is documented in the 80286 and 80287 Programmer's
Reference Manual (80287 section of the manual), Figure 3-1 (#210498). The 80486 SL processor may
be detected in the exact same manner as the 80386 SL. I did not attempt to write an algorithm to detect the
differences between the 80486 DX2 and DX4. Therefore this algorithm makes no attempt to differentiate
these models.
Detecting the Processor Stepping Information
A deeper level of processor identification includes obtaining the actual stepping information of the
processor. The stepping information is useful in determining which bugs are fixed in a specific model. Even
with this information, you must cross reference the stepping ID with the errata documents to determine
which bugs are fixed. This information is publicly available for the Pentium and Pentium Pro. For all other
processors, it is still kept confidential and released only by nondisclosure agreement.
The stepping information is contained in an internal processor register called the "ID register." On all
80386s and later processors, this information is returned in EDX after the processor completes a RESET.
This information is not provided on any processor family before the 80386. Even though the format of this
register has changed slightly over time, it contains the same essential information. To obtain this ID
register on processors that do not support the CPUID instruction, you must RESET the processor. All PC
compatibles have a means to recover from a RESET and return control to a user program. To do this, you
must store a special signature in CMOS RAM, and a far-jump pointer in memory before the RESET occurs.
After RESET, the BIOS reads this signature and returns control to your program at the specified far-jump
pointer. This doesn't always work to retrieve the processor ID, as there is no guarantee that EDX hasn't
been destroyed by the time the user program regains control. There is another technique, but it is much
more complicated.
To capture the processor ID, you must take control of code execution immediately after a RESET occurs.
Whenever the processor is RESET, the CPU starts execution from physical address 0xFFFFFFF0. The chipset
maps the BIOS from 0xF0000 to the top of memory at 0xFFFF0000. This allows the CPU to start executing
BIOS code immediately after RESET. We want to prevent the BIOS from gaining control of execution after
RESET and, instead, direct control to our own program. In theory, this is impossible, but due to the design
of the PC, there is a way to gain control of execution before the BIOS. Imagine what would happen if the
first instructions the processor fetched from the RESET vector contained a bunch of invalid opcodes. The
processor would generate an invalid-opcode exception and dispatch the invalid-opcode exception handler.
The invalid-opcode exception handler is dispatched by reading a far pointer from address 0:0x18 and
branching to that location whenever such an exception occurs. If you set up an invalid-opcode exception
handler before a RESET occurs, and somehow prevent the BIOS from taking control of execution upon
RESET, the exception handler could save the contents of the CPUID (contained in EDX) before returning
control to your CPUID-detection program. The difficulty in this technique is forcing an invalid-opcode
exception to occur after RESET, upon the first instructions fetched. Normally, this would not occur, as the
first instructions fetched would always be valid BIOS instructions.
Forcing the CPU to fetch invalid opcodes upon RESET is easier than it might sound. This technique doesn't
employ anything as esoteric as reprogramming the chipset to unmask the BIOS Shadow RAM, then modifying
the executable RAM image of the BIOS to contain invalid opcodes, then resetting the processor. Surely this
would work to generate an invalid-opcode exception, but this technique would be specific to each chipset
implementation and is a ludicrous suggestion. There is a much simpler solution. Remember that the CPU
starts executing at physical address 0xFFFFFFF0 upon RESET. If you could force the CPU to execute from a
different physical-memory address or somehow make a different physical-memory address appear at
0xFFFFFFF0, this would likely suffice to cause the CPU to fetch invalid opcodes and generate the requisite
exception. Unfortunately, there is no way to force the CPU to fetch from any address other than
0xFFFFFFF0, so you must think of a technique to force some other memory location to appear at that
address. In this case, you can thank the CPU A20 gate, which is required by every PC-compatible computer.
(I will not explain the CPU A20 gate, or the significance of the CPU address line A20 for PC-compatibility.
For further information on this subject, see http://
www.x86.org/articles/A20.) When the A20 gate is programmed for 8086 compatibility, the CPU address
line A20 is always grounded. This has the side-effect of making physical address 0xFFEFFFF0 map to
0xFFFFFFF0. If a reset occurs when the A20 gate is programmed in this mode, the CPU fetches opcodes
from address 0xFFEFFFF0 instead of the normal location (though the CPU doesn't know the difference). Most
likely, there aren't any valid instructions at this address, and the processor fetches invalid opcodes.
Subsequently, the requisite invalid-opcode exception occurs. Thus, our invalid-opcode exception handler
seizes control of execution and stores the CPUID value from EDX for later use by your program. The
program SHUTDOWN.ASM (available electronically) demonstrates this technique.
There are times when this technique doesn't work. Some chipsets think of this anomaly as a bug that could
lead to a system hang. Such chipsets map the BIOS to both locations (0xFFEF0000 and 0xFFFF0000).
Therefore, regardless of the state of the A20 gate, a system RESET will gracefully invoke the BIOS reset
code instead of allowing the system to hang. When a chipset decodes the BIOS in both regions, this technique
will not work. Therefore, my source code detects this condition and doesn't attempt to use this technique if
it's not going to work.
The Caveats
This algorithm is written for real and v86 mode. However, the algorithm (not the actual implementation)
can easily be adapted to CPL-0 protected mode.
Executing this algorithm while using some memory managers may not work. Older memory managers, like
Microsoft's EMM386.EXE, will not allow an application program to execute its own invalid-opcode exception
handler. Such memory managers halt the system when such an exception occurs. Such behavior by the
memory manager should be considered a design flaw on their part, and not consistent with Intel's original
intent of providing the invalid-opcode exception type. The invalid-opcode exception was intended to trap
invalid opcodes and allow extensions to the x86 architecture. It was envisioned that a software developer
could define his own opcodes and rely on his own invalid-opcode exception handler to implement instruction
extensions. This intention was documented by Intel in the iAPX 286 Programmer's Reference Manual,
Appendix B:
Since the offending opcode will always be invalid, it cannot be restarted. However, the #UD handler might
be coded to implement an extension of the iAPX 286 instruction set. In that case, the handler could advance
the return pointer beyond the extended instruction and return control to the program after the extended
instruction is emulated. Any such extensions may be incompatible with the iAPX 386.
Further supporting my point is the fact that Intel has reserved two opcodes specifically for this purpose.
These opcodes will not be used by Intel and are guaranteed to be reserved on future generations of
processors. These opcodes are documented in the 1995 edition of the Pentium Processor Family
Developer's Manual, Volume 3 (#241430), Appendix A: "0F0B and 0FB9 should be used when
deliberately generating an invalid-opcode exception."
All currently available memory managers pass the invalid-opcode exception to the faulting v86 task. This
development relegates all of the objections lodged against my algorithm into obsolescence. I tested
EMM386-95, 386Max v8.0, and QEMM v8.0, and found that my algorithm works without any problems.
I only make a half-hearted attempt to identify non-Intel processors. Some non-Intel processors are
identified as a byproduct of my algorithm. To accommodate non-Intel processors, I have used two bits of
the result code to indicate their detection (bits 15 and 14). The Intel CPUID processor specification
(#241618) lists these two bits as reserved. Should Intel change their specification, my algorithm would
need modification.
I make no attempt at determining some processor model types. In some cases, it may be possible to take
advantage of processor differences (such as the number of CPU address lines) to detect model differences.
However, any such technique would be at the mercy of the chipset implementation (which can map unused
memory to the system bus), and render the technique ineffective.
The Results
I tested each processor in as many environments as I had available. In all cases, I booted from a Windows
95 boot floppy (except the 80286, which isn't supported by Windows 95). I tested the algorithm without
any device drivers loaded (Clean), and using the latest versions of EMM386, 386Max, and QEMM. I also
downloaded V86MON from Jim Brooks' Web site (http://webusers.anet-dfw.com/~jimb/ x86.htm).
V86MON is a virtual-mode manager that is ideal for testing how a real-mode application will operate in v86
mode. The author claims to have run hundreds of real-mode applications without a single error by providing
near-perfect real-mode emulation. V86MON has four flavors: v86 mode with and without interrupt
virtualization, and enhanced v86 mode with and without interrupt virtualization. Table 1 summarizes the results.
Occasionally, the Intel algorithm fails to properly detect any processor higher than an 80386, instead
identifying an 80486, Pentium, and Pentium Pro as an 80386. (I described this bug in my September
column.) This error manifests itself when I boot without any device drivers (Clean) and while using all of
the V86MON variants. As I found out, the near-perfect real-mode emulation of V86MON was a little too
perfect, as Intel's algorithm demonstrated the same errant behavior when run under V86MON. The bug
didn't manifest itself when run under EMM386, 386Max, or QEMM. In these cases, the bug was obscured
because these memory managers didn't perfectly emulate real-mode behavior. Regardless, my algorithm
isn't prone to this failure.
Intel's algorithm locked up the computer when run on V86MON (normal v86 mode) with interrupt
virtualization enabled. My algorithm ran perfectly in this environment.
Intel's algorithm detects the AMD 5k86 as an 80486-class processor, even though it has all of the features
and instructions of the Pentium, including the CPUID instruction. My algorithm correctly detects this
processor as a non-Intel, Pentium-class processor.
Intel's algorithm detected the Cyrix 6x86 as an 80486-class processor, even though it has all of the
performance and instructions of a Pentium-class processor. In real mode, my algorithm used the
processor-shutdown technique to correctly detect the 6x86 as a non-Intel, Pentium class processor.
Outside of real mode, my algorithm also detected the 6x86 as an 80486-class processor.
Intel's algorithm detected the Nexgen Nx586 as an 80386-class processor. My algorithm detected the
Nx586 using CPUID, and correctly reported it as a non-Intel, Pentium-class processor.
Overall, both algorithms produce similar results and operate properly in a wide range of environments.
Back when the CPUID algorithm wars were raging (circa 1988, 1989), the pundits lauded the Intel
algorithm over my approach, claiming that their algorithm worked in a wider range of operating
environments. I argued the technical correctness of my algorithm and claimed the memory managers, which
kept my algorithm from running, were designed incorrectly. Therefore, I was proud to discover that all of
the memory managers I tested for this column have fixed their designs. This allows my algorithm to run
without failure. As my results demonstrate, my algorithm works in a wider range of operating
environments, isn't prone to misdiagnosing the processor family, isn't prone to misdiagnosing non-Intel
processors, provides the processor model type where applicable, and provides the model and stepping
information on processors that don't support the CPUID instruction. In addition to these benefits, my
algorithm is adaptable to protected mode, whereas Intel's isn't. This should settle the CPUID algorithm wars
once and for all, or at least for the next delta of time.
(a)
PUSH(SP) {
SP = SP - 2;
[SS:SP] = SP;
}
(b)
PUSH(SP) {
TEMP = SP;
SP = SP - 2;
[SS:SP] = TEMP;
}
Clean EMM386 386Max QEMM V86Vi V86NoVi Ev86Vi Ev86Nvi
80286
Intel Algorithm 2 N/A N/A N/A N/A N/A N/A N/A
My Algorithm 20F N/A N/A N/A N/A N/A N/A N/A
80386 DX
Intel Algorithm 3 3 3 3 Hang 3 N/A N/A
My Algorithm 30F 30F 32F 30F 30F 30F N/A N/A
80386 SX
Intel Algorithm 3 3 3 3 Hang 3 N/A N/A
My Algorithm 320 32F 32F 32F 32F 32F N/A N/A
80486 DX
Intel Algorithm 4 4 4 4 Hang 4 N/A N/A
My Algorithm 402 40F 40F 40F 40F 40F N/A N/A
Intel 486 DX-50
Intel Algorithm 4 4 4 4 Hang 4 N/A N/A
My Algorithm 411 40F 40F 40F 40F 40F N/A N/A
Intel 486 DX4
Intel Algorithm 4 4 4 4 Hang 4 N/A N/A
My Algorithm 480 480 480 480 480 480 N/A N/A
Pentium
Intel Algorithm 5 5 5 5 Hang 5 5 5
My Algorithm 52B 52B 52B 52B 52B 52B 52B 52B
Pentium Pro
Intel Algorithm 6 6 6 6 Hang 6 6 6
My Algorithm 611 611 611 611 611 611 611 611
Texas Instruments 486
Intel Algorithm 4 4 4 4 Hang 4 N/A N/A
My Algorithm 40F 40F 40F 40F 40F 40F N/A N/A
AMD 5k86
Intel Algorithm 4 4 4 4 Hang 4 4 4
My Algorithm C500 C500 C500 C500 C500 C500 C500 C500
Cyrix 6x86
Intel Algorithm 4 4 4 4 Hang 4 4 4
My Algorithm 4520 40F 40F 40F 40F 40F N/A N/A
Nexgen
Intel Algorithm 3 3 3 3 Hang 3 N/A N/A
My Algorithm C504 C504 C504 C504 C504 C504 N/A N/A
|