
Just Enough Assembly Language to Get By

SSKK 2008. 9. 30. 22:37

At a recent lunch, a troika of MSJ columnists (Paul DiLascia, John Robbins, and me) were commenting on how few of today's programmers are skilled in what was essential knowledge just a few years ago. For instance, we all agreed that many programmers lack even a basic understanding of assembly language. In the idealized world presented by most language vendors, coding is so easy that there are no bugs to speak of. And if there ever were a bug, you'd certainly be able to find it easily. No need to resort to messy instruction-by-instruction code slogging, no sir.
Contrast that utopian vision with your own experience. How many times have you been in your debugger stepping through somebody else's code in assembly language because there's no source available? This is especially annoying when some third-party component blows up and you're assigned to track down the problem. Even when debugging your own code, knowing a little assembly language can help you figure out why your high-level language code isn't working the way you think it should. Just put the debugger into mixed source/assembly mode and observe how the compiler translated your code into machine instructions.
Paul DiLascia observed that there's a big difference between programming in assembler and knowing just enough to get by in a pinch while debugging. He jokingly suggested an "assembly language survival guide" that would cover just enough to debug the most common situations. Sounds like a darn good idea to me, so this column presents "Matt's Just Enough Assembly Language to Get By." Think of it as a cram course in Intel x86 assembly language, with all of the esoteric stuff omitted. Afterward, I'll show the assembler code for a typical procedure and show how its operations can be inferred from the instructions I've covered.
Before jumping into the various instructions and instruction sequences, let me add a few prefaces and warnings. First, I'm going to describe only 32-bit Intel code. If you're still stuck programming in 16-bit land, my sympathies. Second, different compilers from different vendors generate different code. However, what I describe here should apply to all compilers (including Visual Basic® 5.0 when generating native code).
Third, don't be surprised if you encounter instructions and instruction sequences that aren't mentioned below. Most compilers use only a small fraction of the instruction set available to them (at least on the Intel platform). But many compilers support inlining of raw assembly language. This allows assembly language gurus to use CPU instructions that the compiler isn't aware of. An inline assembler may be used to optimize a particular sequence, or it may be used to get at CPU-specific instructions such as the timers available on Pentium-class CPUs. In addition to inline assembly code, don't forget that programmers sometimes write entire source modules in assembly language—hard to believe, isn't it?
Just as most 32-bit compilers use only a small fraction of the available instructions, they also use only a subset of the registers of the CPU. Since so much of what I'll describe depends on the registers, a quick review of the commonly used Intel x86 register set is in order. In Figure 1, all registers are 32 bits except where noted. "Multipurpose" means the register can hold any arbitrary 32-bit value (for example, literal values, addresses, and bit flags).

 

Figure 1    Common Intel x86 Registers

 

EAX

Multipurpose. Return values from a function are usually stored in EAX. Low 16 bits are referenced as AX. AX can be further subdivided into AL (the low 8 bits), and AH (the upper 8 bits of AX).

EBX

Multipurpose. Low 16 bits are referenced as BX. BX can be further subdivided into BL (the low 8 bits), and BH (the upper 8 bits of BX).

ECX

Multipurpose. Often used as a counter, for example, to hold the number of loop iterations that should be performed. Low 16 bits are referenced as CX. CX can be further subdivided into CL (the low 8 bits), and CH (the upper 8 bits of CX).

EDX

Multipurpose. Low 16 bits are referenced as DX. DX can be further subdivided into DL (the low 8 bits), and DH (the upper 8 bits of DX).

ESI

Multipurpose. In certain operations that move or compare memory, ESI contains the source address. Low 16 bits are referenced as SI.

EDI

Multipurpose. In certain operations that move or compare memory, EDI contains the destination address. Low 16 bits are referenced as DI.

ESP

Stack pointer. Implicitly changed by PUSH, POP, CALL, and RET instructions.

EBP

Base pointer. Usually points to the current stack frame for a procedure. Procedure parameters are usually at positive offsets from EBP (for example, EBP+8). Local variables are usually at negative offsets (for example, EBP-16). Sometimes, optimizing compilers won't use a stack frame, and use EBP as a multipurpose register.

EFLAGS

Rarely directly referenced. Instead, instructions implicitly set or clear bitfields within the EFLAGS register to represent a certain state. For example, when the result of a mathematical operation is zero, the Zero flag in the EFLAGS register is set. The conditional jump instructions make use of the EFLAGS register.

FS

16-bit. Under Win32, the FS register points to a data structure with information pertaining to the current thread. FS is a segment register (segment registers are beyond the scope of this discussion). Intel CPUs have six segment registers, but the operating system sets them up and maintains them. Win32 compilers only need to explicitly refer to the FS segment register, which is used for things like structured exception handling and thread local storage.


In addition to being familiar with the registers, it's essential to understand how instruction arguments are used. With the exception of a few obscure cases, all instructions take zero, one, or two arguments. Instructions that take zero or one arguments don't require explanation. For instructions that take two arguments, the first argument is usually the destination, while the second is the source. For example, the "ADD EAX,ESI" instruction adds the contents of ESI (the source) to EAX. The result is stored in EAX (the destination). Put another way, the first argument is the one that's modified as a result of the instruction.
A basic knowledge of how instructions reference memory is also vital. Some instructions implicitly reference memory. For example, PUSH EAX pushes the current value of the EAX register onto the stack. Where's the stack? It's whatever the ESP register is currently pointing to. Likewise, string instructions such as MOVSB and SCASB require that the ESI and/or EDI registers contain the addresses of the memory locations you want to use (SCASB, for instance, uses only EDI).
Other instructions use arguments to explicitly state the address to be used. You can usually tell this by the presence of square brackets in the instruction. For example, "MOV EBX,[00401234]" reads from the address 0x00401234. Another form of addressing uses registers and possibly offsets. For example, in "MOV EBX,[ECX]", the ECX register contains an address (also known as a pointer by C++ users). The instruction "MOV EBX,[EBP+8]" reads from the address calculated by adding 8 to the contents of the EBP register.
Intel CPUs have a very formal definition for allowable forms of instruction addresses. It's complex enough to make most people's heads swim. If you know what a modR/M byte is, or know how S-I-B addressing works, then you already know more than this column can teach you. In the "Just Enough to Get By" guide, the preceding paragraph should be enough.
With the theory part over with, let's now look at the most common instructions and instruction sequences. I've grouped them into several categories rather than sorting them alphabetically. As you'll see, some instructions are used in multiple categories.

 

Procedure Entry and Exit


These instructions are automatically inserted by the compiler to create a standard method for accessing parameters and local variables. This method is called a stack frame, as in "frame of reference." In fact, the Intel CPU dedicates the EBP register to maintaining a stack frame. For this group of instructions, it's especially important to note that not every procedure will use exactly the same sequence, and that certain things may be omitted entirely.

Sequence PUSH EBP / MOV EBP,ESP / SUB ESP,XX
Purpose Sets up the EBP stack frame for a new procedure
Examples

  PUSH    EBP

  MOV     EBP, ESP

  SUB     ESP, 24

Description "PUSH EBP" saves the previous frame pointer on the stack. "MOV EBP,ESP" sets the EBP register to the same value as the stack pointer (ESP). "SUB ESP,XX" creates space for local variables below the EBP frame.
In optimized code, you may see this sequence interspersed with other instructions (for example, "PUSH ESI"). Since "PUSH EBP" and "MOV EBP,ESP" both use the EBP register, a processor with multiple pipelines would ordinarily need to stall one of the pipelines. By interspersing other instructions that don't use the EBP register, the processor can do more work in the same amount of time.
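The prologue's effect on the stack can be sketched in C++. This is a toy model under stated assumptions: a hypothetical Cpu struct maps stack addresses to DWORDs, the starting ESP and EBP values are arbitrary, and the caller pushes two arguments right to left before the CALL (the usual order for cdecl and __stdcall). After the three prologue instructions, the saved EBP sits at [EBP], the return address at [EBP+4], and the arguments at [EBP+8] and [EBP+12].

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// A toy 32-bit stack: addresses map to DWORDs, and ESP grows downward.
// The starting register values are arbitrary, chosen for illustration.
struct Cpu {
    std::uint32_t esp = 0x0012FF80u;
    std::uint32_t ebp = 0x0012FFC0u;   // the caller's frame pointer
    std::map<std::uint32_t, std::uint32_t> mem;

    void push(std::uint32_t v) { esp -= 4; mem[esp] = v; }
    std::uint32_t read(std::uint32_t addr) const { return mem.at(addr); }
};

// Caller: push two arguments (last one first), then CALL.
// Callee: PUSH EBP / MOV EBP,ESP / SUB ESP,localBytes.
void callWithFrame(Cpu& cpu, std::uint32_t arg1, std::uint32_t arg2,
                   std::uint32_t returnAddress, std::uint32_t localBytes) {
    cpu.push(arg2);           // PUSH arg2
    cpu.push(arg1);           // PUSH arg1
    cpu.push(returnAddress);  // CALL pushes the return address
    cpu.push(cpu.ebp);        // PUSH EBP
    cpu.ebp = cpu.esp;        // MOV EBP,ESP
    cpu.esp -= localBytes;    // SUB ESP,XX reserves space for locals
}
```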

Instruction ENTER
Purpose Sets up the EBP stack frame for a new procedure
Examples

  ENTER 8, 0 ; Sets up stack frame with

             ; 8 bytes of local variables

Description The ENTER instruction first became available on the 80186 processor. It was intended to replace the "PUSH EBP / MOV EBP,ESP / SUB ESP,XX" sequence with a single, smaller instruction. On current processors the ENTER instruction is slower than the three-instruction sequence, so ENTER is rarely used.

Sequence MOV ESP,EBP / POP EBP
Purpose Removes the EBP stack frame before leaving a procedure
Description The "MOV ESP,EBP" instruction bumps up the stack pointer past any space allocated for local variables on the stack. "POP EBP" restores the stack frame pointer to point at the previous EBP frame. This sequence is normally followed by a return instruction to return control to the calling procedure.

Instruction LEAVE
Purpose Removes the EBP stack frame before leaving a procedure
Description The LEAVE instruction is the inverse of the ENTER instruction. It can also be used to remove a frame set up by the "PUSH EBP / MOV EBP,ESP" sequence. The LEAVE instruction is only 1 byte long, smaller than the 3-byte "MOV ESP,EBP / POP EBP" sequence. Unlike the ENTER instruction, there's no performance penalty for using it, so some compilers use LEAVE.
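Here's a sketch of what "MOV ESP,EBP / POP EBP" (or LEAVE) does to the registers, using a hypothetical Frame struct whose addresses and values are made up for illustration. The thing to notice is that once it finishes, ESP points at the return address, which is exactly what the following RET expects.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical snapshot of a procedure partway through: EBP is the frame
// pointer, ESP is somewhere below it (locals), and mem holds stack DWORDs.
struct Frame {
    std::uint32_t esp;
    std::uint32_t ebp;
    std::map<std::uint32_t, std::uint32_t> mem;
};

// MOV ESP,EBP / POP EBP (LEAVE performs the same two steps in one instruction).
void epilogue(Frame& f) {
    f.esp = f.ebp;             // MOV ESP,EBP: discard all locals at once
    f.ebp = f.mem.at(f.esp);   // POP EBP, part 1: reload the caller's EBP
    f.esp += 4;                // POP EBP, part 2: ESP moves past the saved slot
}
```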

Instruction PUSH register
Purpose Saves the previous values of register variables
Examples

  PUSH EBX

  PUSH ESI

  PUSH EDI

Description Sometimes compilers use a general-purpose register to hold the value of parameters or local variables. This can be more efficient than storing the same value in memory. These are commonly known as register variables. The EBX, ESI, and EDI registers are most often used as register variables.
The convention most compilers use is that register variable values are preserved across procedure calls. If the compiler decides to use register variables in a procedure, it is responsible for preserving the values of the registers it alters (usually EBX, ESI, and EDI). Typically, compilers save these registers on the stack as part of setting up the procedure's stack frame. If the compiler uses only one or two of them, it needs to preserve only those registers.

Instruction POP register
Purpose Restores the previous values of register variables
Examples

  POP EDI

  POP ESI

  POP EBX

Description In preparing to return from a procedure, the register variable registers need to be restored to their previous values. These instructions remove a value from the stack and place it into the designated register.
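Because the stack is last-in, first-out, the POPs must run in the reverse order of the PUSHes. The C++ sketch below models that with a std::vector standing in for the stack; sumWithRegisterVariables and the Regs struct are hypothetical, and the loop body exists only to clobber the three registers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Regs { std::uint32_t ebx, esi, edi; };

// Hypothetical callee that uses EBX, ESI, and EDI as register variables.
// It saves them on entry and restores them in reverse order on exit.
std::uint32_t sumWithRegisterVariables(Regs& r, std::vector<std::uint32_t>& stack,
                                       const std::vector<std::uint32_t>& values) {
    stack.push_back(r.ebx);   // PUSH EBX
    stack.push_back(r.esi);   // PUSH ESI
    stack.push_back(r.edi);   // PUSH EDI

    // Body: use the registers freely (EBX = running sum, ESI = index).
    r.ebx = 0;
    for (r.esi = 0; r.esi < values.size(); ++r.esi) {
        r.ebx += values[r.esi];
    }
    r.edi = 0xDEADBEEFu;
    std::uint32_t result = r.ebx;   // a real function would return this in EAX

    r.edi = stack.back(); stack.pop_back();   // POP EDI
    r.esi = stack.back(); stack.pop_back();   // POP ESI
    r.ebx = stack.back(); stack.pop_back();   // POP EBX
    return result;
}
```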

 

Accessing Variables


The Intel CPU has many instructions that work with variables, which are just locations in memory. For example, you can add or subtract from a variable representing a counter. Likewise, a variable may contain a pointer to something. There are just too many instructions to describe here, and in most cases the instruction name gives a good clue about what the instruction is doing. However, I will show how variables of different storage classes appear in assembly language.

Instruction instruction [global]
Purpose Global/static variables
Examples

  MOV EAX,[00401234]

  MOV [00401238],ESI

  PUSH [77852432]

  ADD [00620428],00001000

Description When you see an instruction that includes an actual machine address inside the square brackets, it's accessing memory that was declared as either a global or static variable. These addresses are known at program load time, so the instruction contains the actual memory address to read or write.

Instruction instruction [parameter]
Purpose Procedure parameters and this pointers
Examples

  MOV ESI,[EBP+14]

  MOV [ESP+30],EAX

  ADD [EBP+0C],2

  OR  [ESP+20],00000010

Description Parameters to procedures are usually passed on the thread's stack. Since these values are pushed before the procedure call and before the called procedure sets up its stack frame, the parameters appear at positive offsets from the stack frame base pointer (EBP). Just about any instruction that makes reference to memory above EBP (for example, "[EBP+8]") is making use of a procedure parameter. The advantage of using EBP for accessing parameters is that EBP doesn't change throughout the lifetime of a procedure. This makes it easier to keep track of the procedure's parameters.
Prior to the 80386, the only effective way to access parameters was with the base pointer register. The 386 added the ability to access memory just as easily with displacements from the stack pointer (ESP) register. Thus, optimized code can dispense with setting up an EBP frame and still reference parameters by using positive offsets from ESP. For example, "ADD [ESP+20],4" adds four to whatever DWORD is at [ESP+20]. From a debugging standpoint, using ESP to access parameters is inconvenient. Since ESP can change during a procedure, a given parameter may be at different offsets from ESP at different points in a procedure's code.
One last word on parameters. In C++, the this pointer of a member function is really a hidden parameter. Usually the this pointer is the last parameter pushed on the stack before the call. In Visual Basic, the self-referential Me is the same thing as the C++ this pointer.
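Here's a rough sketch of why ESP-relative parameter offsets drift. It assumes a frameless procedure where [ESP] holds the return address on entry, so parameter n (counting from zero) starts at [ESP+4+4*n]; every PUSH or SUB ESP,XX after that adds to the offset. FramelessTracker is a hypothetical helper, not a debugger API.

```cpp
#include <cassert>
#include <cstdint>

// Tracks how far ESP has moved down since procedure entry.
struct FramelessTracker {
    std::uint32_t bytesPushed = 0;   // grows with PUSH and SUB ESP,XX

    void push() { bytesPushed += 4; }
    void sub(std::uint32_t n) { bytesPushed += n; }
    void pop() { assert(bytesPushed >= 4); bytesPushed -= 4; }

    // Offset from the *current* ESP at which parameter `index` lives:
    // skip everything pushed so far, then the return address.
    std::uint32_t paramOffset(std::uint32_t index) const {
        return bytesPushed + 4 + 4 * index;
    }
};
```

So the same first parameter might appear as [ESP+4] at entry and as [ESP+8] after a single PUSH ESI.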

Instruction instruction [local]
Purpose Local Variables
Examples

  MOV ESI,[EBP-14]

  MOV [EBP-30],EAX

  SUB [ESP],2

  AND [ESP+4],00000010

Description From the vantage point of an assembly instruction, local variables aren't much different from parameters when an EBP frame is used. The only distinction is that local variables are at negative offsets from the EBP stack frame. You can get an idea of how big the sandbox for local variables will be by examining the "SUB ESP,XX" instruction near the beginning of the procedure.
Things do get messy when the compiler decides to omit an EBP frame. When this happens, the compiler addresses both local variables and parameters as positive offsets from the ESP register. There's no good way to tell a local apart from a parameter in this situation except to find out how much space the procedure has allocated for locals (see above). If the offset is less than the space allocated, it's a local. Otherwise, it's probably a parameter.
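The rule of thumb above can be written as a small C++ function. It's a heuristic sketch under assumed conditions: the prologue pushed savedRegs registers and then executed SUB ESP,localBytes, and nothing else is on the stack at the instruction being examined. Under that layout, the saved registers and the return address sit between the locals and the parameters, so the sketch classifies them separately. classifyEspOffset is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed layout, from ESP upward:
//   [0, localBytes)              locals
//   next 4 * savedRegs bytes     registers saved by PUSH
//   next 4 bytes                 return address
//   everything above             parameters, 4 bytes each
std::string classifyEspOffset(std::uint32_t offset, std::uint32_t localBytes,
                              std::uint32_t savedRegs) {
    if (offset < localBytes) return "local";
    const std::uint32_t savedEnd = localBytes + 4 * savedRegs;
    if (offset < savedEnd) return "saved register";
    if (offset < savedEnd + 4) return "return address";
    return "parameter " + std::to_string((offset - savedEnd - 4) / 4);
}
```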

Instruction LEA variable
Purpose Load Effective Address
Examples

  LEA EAX,[ESP+14]

  LEA EDX,[EBP-24]

Description Despite the square brackets, LEA doesn't actually read memory or dereference a pointer. Instead, it loads the first operand with the address specified by the second operand. For example, "LEA EAX,[ESP+14]" takes the current value of the ESP register, adds 14 to it, and puts the result in EAX.
LEA's primary use is to obtain the address of local variables and parameters. For example, in C++, if you use the & operator on a local variable or parameter, the compiler will likely generate an LEA instruction. As another example, "LEA EAX,[EBP-8]" loads EAX with the address of the local variable at EBP-8.
A less obvious use of LEA is as a fast multiplication. For example, multiplying a value by 5 with the MUL instruction is relatively expensive, while "LEA EAX,[EAX*4+EAX]" turns out to be faster. LEA uses the CPU's address-generation hardware, which can scale an index register by 1, 2, 4, or 8 and add a base register, so multiplying by numbers such as 3, 5, and 9 (index*2+index, index*4+index, and index*8+index) is very fast. Twisted, but true.
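LEA's arithmetic is easy to mimic in C++. The lea function below is an illustrative model, not a real intrinsic: it computes base + index*scale + displacement with 32-bit wraparound and never touches memory. The timesFive and timesNine helpers show the multiply trick.

```cpp
#include <cassert>
#include <cstdint>

// Computes the effective address base + index*scale + displacement.
// Like the CPU, it only does arithmetic; nothing is read from memory.
std::uint32_t lea(std::uint32_t base, std::uint32_t index, std::uint32_t scale,
                  std::int32_t displacement) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + static_cast<std::uint32_t>(displacement);
}

// LEA EAX,[EAX*4+EAX] and LEA EAX,[EAX*8+EAX]
std::uint32_t timesFive(std::uint32_t eax) { return lea(eax, eax, 4, 0); }
std::uint32_t timesNine(std::uint32_t eax) { return lea(eax, eax, 8, 0); }
```

For instance, lea(ebp, 0, 1, -8) models "LEA EAX,[EBP-8]", producing the address of a local rather than its value.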

 

Calling Procedures

Instruction CALL location
Purpose Transfer control to another procedure
Examples

  CALL 00682568

  CALL [00401234]

  CALL ESI

  CALL [EAX+24]

Description The CALL instruction doesn't need much explanation in itself. It pushes the address of the instruction following it onto the stack, then transfers control to the address given by the argument. The various ways of specifying a target address are worth mentioning, however.
The simplest form of the CALL instruction is when the argument contains the destination address as an immediate value (for example, "CALL 00682568"). This type of call is almost always to another location within the same module (EXE or DLL). Slightly more complicated is when the CALL instruction indirects through an address (for example, "CALL [00401234]"). You'll see this form of CALL instruction when calling a function imported from another module. It's also seen when calling through a function pointer stored in a global variable.
Two other forms of CALL instruction use registers as part of their address. If just a register name is specified (for example, "CALL ESI"), the CPU transfers to whatever address is in the register. If a register is used within brackets, perhaps with an additional displacement ("CALL [EAX+24]"), the instruction is calling through a table of function addresses. Where would these come from? You may know these tables by the more familiar name of vtables. To find which entry is being called, divide the displacement by the size of a DWORD: treating 24 as decimal, 24 / 4 = 6, which is slot 6 counting from zero (the seventh entry in the table). Keep in mind that many disassemblers display displacements in hex, in which case 0x24 / 4 gives slot 9.
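The offset-to-slot arithmetic can be sketched with a hand-built table of function pointers standing in for a vtable. Method, kSlotSize32, slotFromOffset, and callThroughTable are hypothetical names. The sketch assumes 4-byte slots as on 32-bit x86 and indexes the table by slot number, so it also runs on 64-bit hosts, where real vtable slots are 8 bytes.

```cpp
#include <cassert>
#include <cstdint>

// One entry in a hand-built "vtable".
using Method = int (*)();

// Size of a function pointer in 32-bit x86 code.
constexpr std::uint32_t kSlotSize32 = 4;

// CALL [EAX+byteOffset] selects slot byteOffset / 4.
std::uint32_t slotFromOffset(std::uint32_t byteOffset) {
    assert(byteOffset % kSlotSize32 == 0);
    return byteOffset / kSlotSize32;
}

// Calls whatever function sits in the slot the displacement selects.
int callThroughTable(const Method* vtable, std::uint32_t byteOffset) {
    return vtable[slotFromOffset(byteOffset)]();
}
```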

Instruction PUSH value
Purpose Places a parameter onto the stack in preparation for calling a procedure
Examples

  PUSH [00405234]    ; Push a global variable

  PUSH [EBP+C]       ; Push a parameter

  PUSH [EBP-14]      ; Push a local variable

  PUSH EAX           ; Push whatever is in EAX

  PUSH 12345678      ; Push an immediate value.

Description When it comes to passing parameters, all variations of the PUSH instruction are used by the compiler. Global variables, local variables, parameters, the results of a calculation, and immediate values can all be passed with a single instruction. When you see a sequence of PUSH instructions prior to a CALL instruction, the odds are good that the PUSHes are putting the parameters onto the stack.
As mentioned earlier, if a member function or method is being called, the this or me pointer is usually passed last. In some cases, the this pointer is passed in the ECX register instead. You can identify when this occurs by looking for code that initializes the ECX register and then does nothing with it before the CALL instruction.

Instruction RET
Purpose Return from a procedure call
Examples

  RET

  RET 8

Description The RET instruction returns from a procedure call. It simply pops whatever value is currently at [ESP] into the EIP (instruction pointer) register. The "RET XX" form does the same thing, and then adds XX to the ESP value. This is how __stdcall procedures clear parameters off the stack before returning to their caller. (Most Win32® APIs are __stdcall based.) By dividing the number of cleared bytes by four (the size of a DWORD), you can usually figure out how many parameters a procedure takes. For instance, a procedure that returns with a "RET 8" instruction takes two parameters.
Functions that return an integer or pointer value usually return the value in the EAX register. By examining what's in EAX before executing the RET instruction, you can see the function's return value.

Instruction ADD ESP, value
Purpose Removes parameters off the stack
Examples

  ADD ESP,24

Description When calling procedures that don't remove parameters before returning, it's up to the calling function to remove its parameters. This is the case with cdecl functions, the default calling convention for C and C++ code. The "ADD ESP,XX" instruction bumps up the stack pointer so that any passed parameters are below the resulting ESP.
If the function doesn't take a variable number of parameters, the "ADD ESP,XX" instruction gives insight into how many parameters the called procedure accepts. (See the description above for "RET XX".) If the called procedure takes a variable number of parameters (like printf and wsprintf do), the "ADD ESP,XX" instruction tells you how many parameters were passed for that particular CALL.
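The two cleanup styles can be compared with a sketch that tracks nothing but ESP. It assumes every argument is a DWORD; the helper names are hypothetical. Either way, ESP ends up back where it was before the arguments were pushed; the only difference is whether the callee ("RET n") or the caller ("ADD ESP,n") does the adding.

```cpp
#include <cassert>
#include <cstdint>

// Only ESP is modeled; the starting value is arbitrary.
struct StackPointer { std::uint32_t esp = 0x0012FF80u; };

// Caller pushes argCount DWORD arguments, then CALL pushes the return address.
void pushArgsAndCall(StackPointer& s, std::uint32_t argCount) {
    s.esp -= 4 * argCount;   // PUSH each argument
    s.esp -= 4;              // CALL
}

// __stdcall callee: RET n pops the return address plus n bytes of arguments.
void retN(StackPointer& s, std::uint32_t n) { s.esp += 4 + n; }

// cdecl callee: plain RET pops only the return address...
void plainRet(StackPointer& s) { s.esp += 4; }
// ...and the caller then executes ADD ESP,n.
void addEsp(StackPointer& s, std::uint32_t n) { s.esp += n; }

// Reading a cleanup size back as an argument count (DWORD arguments assumed).
std::uint32_t argCountFromCleanup(std::uint32_t bytes) { return bytes / 4; }
```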

 

Flow Control


In the context of this column, flow control means code that affects which portions of a program's code are subsequently executed. At the simplest level, this means conditional execution (colloquially known as if statements). More complex flow control sequences such as while loops and for statements are usually built from the lower-level if statement constructs. In one case though (the LOOP instruction), the processor has built-in knowledge of these higher-level language constructs.
Before I get to these instruction sequences, let me highlight two things that can easily trip you up. For starters, the term "Jcc" is used as a stand-in for any of the 16 conditional jump instructions. The cc means condition code.
More insidiously, there are several sets of Jcc instructions that are aliases for one another. For example, JZ (Jump if Zero flag set) is the same instruction as JE (Jump if Equal). Likewise, JNZ (Jump if Zero flag NOT set) is the same instruction as JNE (Jump if Not Equal). Unfortunately, some disassemblers use the JZ/JNZ form, while others use the JE/JNE form. Is this confusing? Yes! The moral of the story: be prepared to mentally substitute an aliased form of the instruction if it makes the code easier to understand.

Sequence CMP value, value / Jcc location
Purpose Compare two values, and branch accordingly

Examples

  CMP    EAX,2
  JE     10036728
 
  CMP    [EBP+20],1000
  JNE    00427824

Description The CMP instruction is used when two values are to be compared. It subtracts the second operand from the first, discards the result, and sets or clears a variety of flags based on that result, including the Zero, Sign, Carry, and Overflow flags. From these, a variety of Jcc instructions can then be used to branch accordingly. Most often, the JE and JNE instructions follow a CMP instruction.
The following C++ code sequence would be implemented with a CMP / JNE sequence:

 

if ( MyVariable == 2 )
 {
     // Whatever code you want
 }

If the CMP instruction determines that MyVariable isn't 2, the Zero flag will be cleared, so the JNE instruction that follows jumps over the code in curly brackets.
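To see how CMP's flags drive the conditional jumps, here's a C++ sketch that computes the flags for a 32-bit CMP. It includes the Carry flag because the unsigned jumps (such as JB) test it, while the signed ones (such as JL) compare the Sign flag with the Overflow flag. Only four Jcc conditions are modeled, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct Flags { bool zf, sf, cf, of; };

// CMP a,b: compute a - b, discard it, and set flags from the result.
Flags cmp(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a - b;
    Flags f;
    f.zf = (r == 0);                             // Zero
    f.sf = (r >> 31) != 0;                       // Sign: top bit of result
    f.cf = a < b;                                // Carry: unsigned borrow
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;     // Overflow: signed overflow
    return f;
}

bool je(Flags f)  { return f.zf; }           // same instruction as JZ
bool jne(Flags f) { return !f.zf; }          // same instruction as JNZ
bool jl(Flags f)  { return f.sf != f.of; }   // signed "less than"
bool jb(Flags f)  { return f.cf; }           // unsigned "below"
```

cmp(0xFFFFFFFF, 1) is a good illustration: as signed values, -1 is less than 1, so JL would jump, but as unsigned values 0xFFFFFFFF is above 1, so JB would not.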

Sequence TEST value, value / Jcc location
Purpose Determine if a bit is set, and branch accordingly
Examples

  TEST    EAX,EAX

  JNZ     00400124

 

  TEST    EDX,00400024

  JZ      77f85624

Description The TEST instruction does a logical AND of the two arguments, which sets or clears the Zero flag in the EFLAGS register. The next instruction (JZ or JNZ) does a jump to the target address if the Zero flag is set or cleared, depending on the instruction used. If the JZ/JNZ doesn't jump, execution continues at the following instruction.
This sequence is typically used to test one or more bits as part of an if statement. For example, this C++ code could be implemented using a "TEST / JZ" sequence.

 

if ( MyVariable & 0x00400024 )
 {
     // Whatever code you want
 }

If MyVariable has any of the same bitfields set as in the value 0x00400024, the Zero flag won't be set. This prevents the JZ instruction from jumping, and execution falls into the code in the curly brackets.
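The TEST / JZ pattern reduces to a couple of lines of C++. This sketch uses the 0x00400024 mask from the example above; testSetsZero and bodyRuns are illustrative names. Note that "TEST EAX,EAX" sets the Zero flag exactly when EAX is zero, which makes it a common, shorter substitute for "CMP EAX,0".

```cpp
#include <cassert>
#include <cstdint>

// TEST a,b: AND the operands, set ZF from the result, discard the result.
bool testSetsZero(std::uint32_t a, std::uint32_t b) { return (a & b) == 0; }

// if ( MyVariable & 0x00400024 ) { ... } compiled as TEST / JZ:
// JZ jumps past the body when ZF is set, so the body runs only when ZF is clear.
bool bodyRuns(std::uint32_t myVariable) {
    const bool zf = testSetsZero(myVariable, 0x00400024u);
    return !zf;   // JZ not taken: fall into the body
}
```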

Instruction JMP location
Purpose Transfer control to some other location
Examples

  JMP 10047820

Description The unconditional JMP instruction occurs in at least three scenarios. The first is as part of an if/else clause. At the end of the code generated for the if clause, a JMP instruction transfers control past all the code in the else clause. Consider this code snippet:

 

if ( MyVariable == 2 )
 {
     // some code
     // JMP past "else" code
 }
 else
 {
     // some other code
 }

The second place where JMP instructions crop up is as part of a loop. At the end of the loop's code, some code sequence determines if it's time to break out of the loop. If the loop isn't finished, a JMP instruction transfers control back to the beginning of the loop's code.
The third scenario where you'll see JMP instructions is when a procedure has a common exit sequence. That is, no matter how many return statements there are in the procedure, there's only one spot in the code that cleans up the stack frame and returns. In this situation, a return statement in the middle of the procedure's code is implemented as a JMP to the common exit sequence code.
It's also possible that you'll encounter a JMP instruction from a goto statement. Fortunately, most programmers don't bother with gotos anymore. Finally, if you see a JMP instruction that simply jumps to the next instruction, you're probably in code that wasn't compiled with optimizations enabled.
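One way to see the if/else layout is to hand-translate it into C++ with goto, mirroring the branch structure a compiler typically produces. Both functions below are illustrative; the values 10 and 20 stand in for "some code" and "some other code."

```cpp
#include <cassert>

// Structured source, as written.
int structured(int myVariable) {
    int result;
    if (myVariable == 2) {
        result = 10;   // some code
    } else {
        result = 20;   // some other code
    }
    return result;
}

// The same logic laid out like typical compiler output:
// CMP / JNE to the else part, and a JMP at the end of the if part.
int asCompiled(int myVariable) {
    int result;
    if (myVariable != 2) goto elsePart;   // CMP myVariable,2 / JNE elsePart
    result = 10;                          // "if" body
    goto done;                            // JMP past the "else" code
elsePart:
    result = 20;                          // "else" body
done:
    return result;
}
```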

Instructions LOOP, LOOPZ, LOOPNZ
Purpose Jump back to the beginning of a loop's code, if conditions are right
Examples

  LOOP     00401234

  LOOPZ    65432108

Description The LOOP instruction uses the contents of the ECX register as a counter. Each invocation of the LOOP instruction decrements the ECX register. In the simplest case, the LOOP instruction then branches back to the beginning of the instruction sequence if ECX isn't zero. LOOPZ (also written LOOPE) branches only if ECX is nonzero and the Zero flag in EFLAGS is set; LOOPNZ (also written LOOPNE) branches only if ECX is nonzero and the Zero flag is clear.
The C++ for loop construct can be implemented with the LOOP instruction if the number of iterations is known ahead of time. Before executing the actual code inside the loop, ECX is loaded with the number of iterations. At the end of the code inside the loop is a LOOP instruction. After the specified number of iterations, ECX becomes zero and the LOOP instruction doesn't branch.
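The LOOP mechanics can be sketched in C++ with goto standing in for the backward branch. runLoop assumes the count is greater than zero: if ECX is zero on entry, LOOP decrements it to 0xFFFFFFFF and keeps going, so code that uses LOOP typically guards against that case first (JECXZ exists for this purpose). The loopzTaken and loopnzTaken helpers are hypothetical names for the branch conditions described above.

```cpp
#include <cassert>
#include <cstdint>

// Emulates: MOV ECX,count / top: <body> / LOOP top
std::uint32_t runLoop(std::uint32_t count) {
    assert(count > 0);           // ECX == 0 on entry would wrap around
    std::uint32_t ecx = count;
    std::uint32_t iterations = 0;
top:
    ++iterations;                // the loop body
    --ecx;                       // LOOP decrements ECX first...
    if (ecx != 0) goto top;      // ...then branches back while it's nonzero
    return iterations;
}

// Branch conditions for the other two forms. ECX is taken by reference
// because the instruction decrements it whether or not the branch is taken.
bool loopzTaken(std::uint32_t& ecx, bool zf)  { --ecx; return ecx != 0 && zf; }
bool loopnzTaken(std::uint32_t& ecx, bool zf) { --ecx; return ecx != 0 && !zf; }
```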

 

Bitwise Manipulation


The bitwise instructions are used to turn individual bits on and off in a value. The value can be a global variable, a local variable, a parameter, or a register. Here, I'll show the two most common instructions, AND and OR. There's also an XOR instruction, but it's less commonly used.

Instruction AND value,bitfield
Purpose Performs a logical AND of the bitfields of two operands
Examples

  AND    EAX,00001000

  AND    [ESI+4],00000004

Description Unlike the TEST instruction (see above), the AND instruction actually modifies the destination operand. For example, in C++, the statement

 

MyVar &= 0x00010001;

could be implemented as:

 

AND [MyVar],00010001h

The AND instruction is also used to turn off particular bitfields. To do this, the bits to be turned off are set to zero in the source operand, and all of the bits to be left alone are set to one.

Instruction OR value,bitfield
Purpose Performs a logical OR of the bitfields of the two operands
Examples

  OR EDX,10101010

  OR [EBP+24],00080000

Description The OR instruction is used to turn on one or more bits in the destination operand. For example, the value of WS_VISIBLE is 0x10000000. The following C++ statement

 

wndStyle |= WS_VISIBLE;

would translate to something like this:

 

OR [wndStyle],10000000h
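Both idioms from this section fit in a short C++ sketch: AND with the complement of a mask to turn bits off, and OR with a mask to turn them on. kWsVisible simply repeats the WS_VISIBLE value quoted above so the sketch doesn't need <windows.h>; clearBits and setBits are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// WS_VISIBLE's value as quoted in the text, defined locally.
constexpr std::uint32_t kWsVisible = 0x10000000u;

// AND keeps the bits that are 1 in the mask and clears the ones that are 0,
// so ANDing with the complement of some bits turns exactly those bits off.
std::uint32_t clearBits(std::uint32_t value, std::uint32_t bitsToClear) {
    return value & ~bitsToClear;   // e.g. AND EAX,FFFFEFFF clears bit 12
}

// OR turns on every bit that is 1 in the mask and leaves the rest alone.
std::uint32_t setBits(std::uint32_t value, std::uint32_t bitsToSet) {
    return value | bitsToSet;      // e.g. OR [wndStyle],10000000h
}
```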

String Manipulation


The string instructions allow sequences of consecutive memory locations to be processed without branching after every operation. The instructions are able to do this because the CPU dedicates two registers (ESI and EDI) to point at the source and/or destination locations. After every operation, these registers are incremented or decremented based upon the CPU's direction flag. In the real world, the registers are rarely decremented, so from here on out I'll just say "incremented."
When combined with the REP / REPE / REPNE class of instruction prefixes, very powerful code sequences can be implemented using only a single instruction. For example, with one string instruction, you can find the end of a null-terminated string (some setup and assembly required). In addition, the code executes much faster than if it were implemented as a series of instructions in a loop.

Instruction SCASB, SCASW, SCASD
Purpose Scan for a particular BYTE, WORD, or DWORD
Examples

  REPNE    SCASB

  REPZ     SCASD

Description These instructions scan consecutive locations in memory looking for a particular BYTE, WORD, or DWORD. Alternatively, they can be used to find the first occurrence of a value that's different from a target value. The BYTE, WORD, or DWORD target value is placed in the AL, AX, or EAX register. Each iteration of SCASx compares the contents of the AL/AX/EAX register to the memory pointed at by the EDI register, and sets EFLAGS accordingly. Afterwards, EDI is incremented.
To search for the zero byte in an ANSI string, the AL register should be set to zero and EDI should be set to the beginning of the string. The ECX register is set to the maximum number of bytes to search. Finally, the REPNE SCASB instruction executes. The REPNE prefix causes the SCASB instruction to repeat until either ECX reaches zero or a matching byte is found. Afterward, check the Zero flag: if it's set, a zero byte was found and EDI points to the byte just past it; if it's clear, ECX ran down to zero without finding a zero byte.
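The following C++ sketch emulates REPNE SCASB under two assumptions: the direction flag is clear (so EDI moves forward), and when ECX starts at zero, the sketch reports the Zero flag as clear rather than leaving it untouched as the CPU would. strlenViaScasb shows a common idiom built on it: start ECX at 0xFFFFFFFF so the count never runs out, then recover the length from how far ECX was decremented. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct ScanState {
    const unsigned char* edi;   // next byte to examine
    std::uint32_t ecx;          // remaining count
    bool zf;                    // Zero flag after the last comparison
};

// REPNE SCASB: compare AL with [EDI], advance EDI, decrement ECX,
// and stop when ECX reaches 0 or a match sets ZF.
ScanState repneScasb(const unsigned char* edi, std::uint32_t ecx, unsigned char al) {
    bool zf = false;
    while (ecx != 0) {
        zf = (*edi == al);
        ++edi;
        --ecx;
        if (zf) break;
    }
    return {edi, ecx, zf};
}

// ECX = 0xFFFFFFFF, AL = 0. Afterward ~ECX is the number of bytes scanned,
// including the terminator, so subtracting 1 gives the string length.
std::uint32_t strlenViaScasb(const char* s) {
    ScanState st = repneScasb(reinterpret_cast<const unsigned char*>(s), 0xFFFFFFFFu, 0);
    assert(st.zf);          // a terminator was found
    return ~st.ecx - 1;
}
```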

Instruction CMPSB, CMPSW, CMPSD
Purpose Compares two strings in memory
Examples

REPE CMPSB

Description These instructions compare the BYTEs, WORDs, or DWORDs pointed to by the ESI and EDI registers and set EFLAGS according to the result. Each iteration of the CMPSx instruction causes the ESI and EDI registers to increment by the appropriate amount (one, two, or four bytes).
It's not hard to see how the C++ memcmp function could be implemented by using the REPE prefix with the CMPSx instructions. The REPE prefix causes the CMPSx instruction to keep iterating while the two memory locations are equal and ECX is nonzero. The memcmp function could be implemented using "REPE CMPSB", although optimized code will use "REPE CMPSD" for the bulk of the string and "REPE CMPSB" for the last three or fewer bytes.
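A C++ emulation of REPE CMPSB, plus a memcmp-style wrapper, might look like this. It assumes the direction flag is clear, and the wrapper returns only -1, 0, or 1, whereas real memcmp implementations may return any negative or positive value. Because ESI and EDI have already moved past the mismatch when the loop stops, the wrapper looks one byte back to decide the sign. The names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct CmpState {
    const unsigned char* esi;
    const unsigned char* edi;
    std::uint32_t ecx;
    bool zf;
};

// REPE CMPSB: compare [ESI] with [EDI], advance both, decrement ECX,
// and repeat while the bytes are equal and ECX is nonzero.
CmpState repeCmpsb(const unsigned char* esi, const unsigned char* edi, std::uint32_t ecx) {
    assert(ecx > 0);
    bool zf = true;
    while (ecx != 0) {
        zf = (*esi == *edi);
        ++esi;
        ++edi;
        --ecx;
        if (!zf) break;
    }
    return {esi, edi, ecx, zf};
}

// memcmp-style result: the mismatching pair is just behind ESI and EDI.
int memcmpViaCmpsb(const void* a, const void* b, std::uint32_t n) {
    if (n == 0) return 0;
    CmpState st = repeCmpsb(static_cast<const unsigned char*>(a),
                            static_cast<const unsigned char*>(b), n);
    if (st.zf) return 0;
    return st.esi[-1] < st.edi[-1] ? -1 : 1;
}
```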

Instruction MOVSB, MOVSW, MOVSD
Purpose Moves BYTEs, WORDs, or DWORDs from the source string to the destination string
Examples

REP MOVSD

Description The MOVSx instructions copy memory pointed to by ESI into the memory pointed at by EDI. After each iteration, ESI and EDI are incremented. Typically, MOVSx is used with the REP prefix to copy a predetermined number of BYTEs, WORDs, or DWORDs. The number to copy is specified in the ECX register. The C++ memcpy function can be implemented using "REP MOVSB".
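This C++ sketch copies memory with REP MOVSD for the bulk and REP MOVSB for the 0 to 3 leftover bytes, the same DWORD-then-BYTE split described for memcmp above. The helpers are hypothetical emulations (direction flag assumed clear), and a 4-byte std::memcpy stands in for each DWORD move to avoid alignment and aliasing problems in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// REP MOVSD: copy ECX DWORDs from [ESI] to [EDI], advancing both by 4.
void repMovsd(unsigned char*& edi, const unsigned char*& esi, std::uint32_t ecx) {
    for (; ecx != 0; --ecx) {
        std::memcpy(edi, esi, 4);
        edi += 4;
        esi += 4;
    }
}

// REP MOVSB: copy ECX bytes, advancing both pointers by 1.
void repMovsb(unsigned char*& edi, const unsigned char*& esi, std::uint32_t ecx) {
    for (; ecx != 0; --ecx) {
        *edi++ = *esi++;
    }
}

// memcpy-style copy: whole DWORDs first, then the remaining bytes.
void* copyViaMovs(void* dst, const void* src, std::uint32_t n) {
    unsigned char* edi = static_cast<unsigned char*>(dst);
    const unsigned char* esi = static_cast<const unsigned char*>(src);
    repMovsd(edi, esi, n / 4);   // ECX = n >> 2
    repMovsb(edi, esi, n % 4);   // ECX = n & 3
    return dst;
}
```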

Instruction STOSB, STOSW, STOSD
Purpose Sets a series of BYTEs, WORDs, or DWORDs to a specified value
Examples

  REP STOSB

Description The STOSx instructions copy the value in AL, AX, or EAX into the memory pointed to by the EDI register. Typically, STOSB is used with the REP prefix to fill the number of bytes specified in the ECX register. The C++ memset function can be implemented with "REP STOSB", or by a combination of "REP STOSD" and "REP STOSB".
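Here's a C++ sketch of the "REP STOSD" plus "REP STOSB" combination. The fill byte is replicated into all four bytes of EAX (0xAB becomes 0xABABABAB) so each STOSD writes four copies at once. The emulation helpers are hypothetical and assume the direction flag is clear.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// REP STOSB: store AL into ECX consecutive bytes starting at [EDI].
void repStosb(unsigned char*& edi, unsigned char al, std::uint32_t ecx) {
    for (; ecx != 0; --ecx) *edi++ = al;
}

// REP STOSD: store EAX into ECX consecutive DWORDs.
void repStosd(unsigned char*& edi, std::uint32_t eax, std::uint32_t ecx) {
    for (; ecx != 0; --ecx) {
        std::memcpy(edi, &eax, 4);
        edi += 4;
    }
}

// memset-style fill: whole DWORDs with the replicated byte, then leftover bytes.
void* fillViaStos(void* dst, unsigned char value, std::uint32_t n) {
    unsigned char* edi = static_cast<unsigned char*>(dst);
    const std::uint32_t eax = value * 0x01010101u;   // 0xAB -> 0xABABABAB
    repStosd(edi, eax, n / 4);
    repStosb(edi, value, n % 4);
    return dst;
}
```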

 

Miscellaneous


In this final group are random instructions that you'll often encounter. Of the list, "XOR EAX,EAX" is most prevalent.

Instruction XOR register, register
Purpose Sets a register's value to zero
Examples

  XOR EAX,EAX

Description Using the XOR instruction to zero out a register takes less space than the equivalent MOV instruction. For example, "MOV EAX,0" takes five bytes, while "XOR EAX,EAX" uses only two bytes. Is using XOR twisted? Yes. But after years of stepping through assembly code, you too will automatically substitute "zero out the register" when you see this instruction.
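The size difference is easy to check by writing out the machine-code bytes. These are the standard 32-bit encodings: "MOV EAX,imm32" is opcode B8 followed by the 4-byte immediate, and "XOR EAX,EAX" is 33 C0 (some assemblers emit the equivalent 31 C0). xorSelf just illustrates the identity x ^ x == 0 that makes the trick work; the names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Machine-code bytes for the two ways of zeroing EAX in 32-bit code.
constexpr std::uint8_t kMovEaxZero[] = {0xB8, 0x00, 0x00, 0x00, 0x00};  // MOV EAX,0
constexpr std::uint8_t kXorEaxEax[]  = {0x33, 0xC0};                    // XOR EAX,EAX

// Any value XORed with itself is zero.
std::uint32_t xorSelf(std::uint32_t eax) { return eax ^ eax; }
```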

Instruction MOVZX DWORD value, byte or word value
Purpose Copies an unsigned value into a larger type
Examples

  MOVZX EAX,BYTE PTR [EBP+8]

  MOVZX EAX,WORD PTR [00451234]

Description In most languages, a value of a smaller type can be copied into or used in place of a larger type. For example, in C++ an unsigned char can be copied into an unsigned short (aka a WORD). Likewise, an unsigned short can be used where an unsigned long is expected. The compiler uses MOVZX (move with zero extend) to convert the smaller type into a larger type. In C++, BYTEs can be converted to WORDs or DWORDs, and WORDs can be converted to DWORDs.
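In C++ terms, MOVZX is simply an unsigned widening conversion. The hypothetical helpers below spell that out and contrast it with a plain byte move into AL, which would leave whatever was already in the upper 24 bits of EAX. That difference is why a zero-extending move is needed when the full 32-bit value matters.

```cpp
#include <cassert>
#include <cstdint>

// MOVZX EAX,BYTE PTR [...]: the byte lands in the low 8 bits and the upper
// 24 bits of EAX are cleared. Unsigned widening in C++ does the same.
std::uint32_t Movzx8(std::uint8_t value) {
    return static_cast<std::uint32_t>(value);
}

// MOVZX EAX,WORD PTR [...]: the upper 16 bits are cleared.
std::uint32_t Movzx16(std::uint16_t value) {
    return static_cast<std::uint32_t>(value);
}

// Contrast: "MOV AL,BYTE PTR [...]" replaces only the low 8 bits,
// leaving stale data in the upper 24 bits of EAX.
std::uint32_t MovAl(std::uint32_t eax, std::uint8_t value) {
    return (eax & 0xFFFFFF00u) | value;
}
```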

Instruction MOVSX DWORD value, byte or word value
Purpose Copies a signed value into a larger type
Examples

  MOVSX EAX,BYTE PTR [EBP+8]

  MOVSX EAX,WORD PTR [77f81234]

Description In most languages, a value of a smaller type can be copied into or used in place of a larger type. For example, in C++ a char can be copied into a short. Likewise, a short can be used where a long is expected. The compiler uses MOVSX (move with sign extension) to convert the smaller type into a larger type. In C++, chars can be converted to shorts or longs, and shorts can be converted to longs.
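The signed counterpart is a signed widening conversion. This sketch (helper names hypothetical) shows the C++ view and the equivalent bit manipulation: the top bit of the byte is copied into every higher bit of the 32-bit result. Microsoft's compiler treats plain char as signed by default, which is why a char-to-long assignment produces MOVSX.

```cpp
#include <cassert>
#include <cstdint>

// MOVSX EAX,BYTE PTR [...]: bit 7 of the byte is copied into the upper
// 24 bits of EAX, so negative values stay negative.
std::int32_t Movsx8(std::int8_t value) {
    return static_cast<std::int32_t>(value);  // signed widening = sign extension
}

// MOVSX EAX,WORD PTR [...]: bit 15 fills the upper 16 bits.
std::int32_t Movsx16(std::int16_t value) {
    return static_cast<std::int32_t>(value);
}

// The same operation done by hand on raw bits.
std::uint32_t SignExtendByteBits(std::uint8_t byte) {
    return (byte & 0x80u) ? (0xFFFFFF00u | byte) : byte;
}
```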

Instructions MOV EAX,FS:[0], MOV FS:[0],ESP
Purpose Establish a new structured exception handling frame
Examples

  MOV     EAX,FS:[00000000]

  PUSH    EAX

  MOV     FS:[00000000],ESP

Description In Win32, the FS register points to the Thread Environment Block (TEB). A data structure unique to each thread, the TEB contains values that the system uses to control the thread. At offset 0 in the TEB is a pointer to the first node in the structured exception handling chain. When you see code that uses FS:[0], it's usually setting up or tearing down a try block.
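The FS:[0] sequence is just a push onto a singly linked list whose head lives in the TEB. The C++ model below (the `FakeTeb` and `ExceptionRegistration` types are hypothetical stand-ins, not Windows definitions) shows the list manipulation the three instructions perform on entry, and the matching restore of the old head when the frame is torn down.

```cpp
#include <cassert>

// Hypothetical model of one node in the SEH chain. In the compiler's
// prologue the handler is pushed first and the old FS:[0] value second,
// so the "previous" link ends up at offset 0 of the node.
struct ExceptionRegistration {
    ExceptionRegistration* prev = nullptr;  // old value of FS:[0]
    const void* handler = nullptr;          // handler address (type simplified)
};

// Stand-in for the TEB; only the field at offset 0 (the chain head) is modeled.
struct FakeTeb {
    ExceptionRegistration* exceptionList = nullptr;  // what FS:[0] holds
};

// Mirrors: MOV EAX,FS:[0] / PUSH EAX / MOV FS:[0],ESP
void PushSehFrame(FakeTeb& teb, ExceptionRegistration& frame) {
    frame.prev = teb.exceptionList;  // save the old head inside the new node
    teb.exceptionList = &frame;      // the new node becomes the head
}

// Mirrors the teardown: load the saved previous pointer, store it into FS:[0].
void PopSehFrame(FakeTeb& teb) {
    assert(teb.exceptionList != nullptr);
    teb.exceptionList = teb.exceptionList->prev;
}
```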

Instruction MOV EAX,FS:[18]
Purpose Makes a linear pointer to the TEB
Examples

  MOV EAX,FS:[18]

  MOV EAX,[EAX+24]

Description The TEB is always pointed to by the FS register. However, code that wants an ordinary pointer to the TEB, one it can dereference without an FS: segment override, needs the TEB's flat, linear address. That linear address is stored at offset 0x18 in the TEB itself. Code that reads from FS:[18] is preparing to read some other value from the TEB. Step through GetCurrentThreadId under Windows NT® (it's only three instructions long) to see this for yourself.
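To see why the self-pointer at offset 0x18 is useful, the sketch below models a tiny flat address space in C++ and replays the two GetCurrentThreadId instructions shown above. The `FlatMemory` class and `CurrentThreadIdModel` function are hypothetical; `fsBase` stands in for the FS segment's base, and the 0x24 offset is taken from the "MOV EAX,[EAX+24]" example. Only the first read is FS-relative; the second is an ordinary pointer dereference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A toy flat address space addressed by 32-bit "linear addresses".
class FlatMemory {
public:
    explicit FlatMemory(std::size_t size) : bytes_(size) {}
    std::uint32_t Read32(std::uint32_t addr) const {
        assert(addr + 4 <= bytes_.size());
        std::uint32_t v;
        std::memcpy(&v, &bytes_[addr], 4);
        return v;
    }
    void Write32(std::uint32_t addr, std::uint32_t v) {
        assert(addr + 4 <= bytes_.size());
        std::memcpy(&bytes_[addr], &v, 4);
    }
private:
    std::vector<std::uint8_t> bytes_;
};

constexpr std::uint32_t kTebSelf = 0x18;      // TEB's own linear address
constexpr std::uint32_t kTebThreadId = 0x24;  // value read by [EAX+24]

// Mirrors: MOV EAX,FS:[18] / MOV EAX,[EAX+24]
std::uint32_t CurrentThreadIdModel(const FlatMemory& mem, std::uint32_t fsBase) {
    std::uint32_t eax = mem.Read32(fsBase + kTebSelf);  // FS-relative read
    return mem.Read32(eax + kTebThreadId);              // plain flat read
}
```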

Instruction MOV ECX,FS:[2C]
Purpose Makes a pointer to the Thread Local Storage (TLS) array
Examples

  MOV ECX,DWORD PTR FS:[0000002C]

  MOV EDX,DWORD PTR [ECX+EAX*4]

Description At offset 0x2C in the TEB is a pointer to the TLS array for the thread. This array contains 64 DWORDs, each corresponding to a particular index value that would be passed to TlsGetValue. Code that uses FS:[2C] is using TLS.

 

To The Code!


To show many of the instructions and sequences that I've described, I wrote the InstructionDemo program. A quick look at the source code in Figure 2 shows that the two functions don't do anything worthwhile. But the code is well commented, pointing out the particular instruction or instruction sequence it's designed to produce.

 

Figure 2   InstructionDemo.CPP

 

//==========================================
// Matt Pietrek
// Microsoft Systems Journal, February 1998
// Program: InstructionDemo.CPP
// FILE: InstructionDemo.CPP
//==========================================
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <stdlib.h>
#include <stdio.h>

// Force these functions inline (/O2 would normally do this)
#pragma intrinsic( memset, strlen, strcmp )

__declspec(thread) int tlsVariable = 0; // Make a thread local variable

int g_myGlobalVariable;                 // Make a global variable

void MySubProcedure( void );

int main( int argc, char *argv[] )
{
    char szBuffer[128];
    char *pszString = "Hello";
    unsigned long   localUnsignedLong = 2;
    unsigned char   localUnsignedChar = 2;
    long            localSignedLong = 2;
    char            localSignedChar = 2;
    int             i;

    g_myGlobalVariable = 0x12345678;        // Assignment to global

    localSignedLong = localSignedChar;      // signed type promotion

    // Conditional execution
    if ( localUnsignedLong == 2 )
        localSignedLong = 1;
    else
        localSignedLong = 2;

    // Using TEST
    if ( localUnsignedLong & 0x00040008 )
        i = 3;

    // AND'ing off bitfields
    localUnsignedLong &= 0x01020304;

    // OR'ing on bitfields
    localSignedLong |= 0x05060708;

    // LOOP code
    for ( i = 0; i < 4; i++ )
        localUnsignedLong += i;

    // Procedure invocation
    printf( "%u %u %08X %s", localUnsignedLong, argc, &argc, szBuffer );

    // Using STOSD / STOSB
    memset( szBuffer, 0, sizeof(szBuffer) );

    // Using SCASB
    i = strlen( szBuffer );

    MySubProcedure( );

    return 0;
}

void MySubProcedure( void )
{
    tlsVariable = 2;

    // Use of try/except code
    __try
    {
        g_myGlobalVariable = 2;
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        g_myGlobalVariable = 4;
    }
}

 


I compiled InstructionDemo.CPP with the following command line:

 

CL InstructionDemo.CPP

I then disassembled the relevant parts of the executable and annotated the listing. Above each instruction or sequence is the C++ code responsible for it (see Figure 3). This is similar to what the Developer Studio IDE does when you select "Go To Disassembly" in the source window. Many of the instructions don't need explanation, but it's worthwhile to point out a few things.

 

Figure 3   InstructionDemo Mixed Source and Assembly

 

int main( int argc, char *argv[] )
{
401000:    PUSH       EBP
401001:    MOV        EBP,ESP
401003:    SUB        ESP,00000098
401009:    PUSH       EDI
    char *pszString = "Hello";
40100A:    MOV        DWORD PTR [EBP-0000008C],00406030

    unsigned long     localUnsignedLong = 2;
401014:    MOV        DWORD PTR [EBP-00000088],00000002

    unsigned char     localUnsignedChar = 2;
40101E:    MOV        BYTE PTR [EBP-00000094],02

    long              localSignedLong = 2;
401025:    MOV        DWORD PTR [EBP-00000084],00000002

    char              localSignedChar = 2;
40102F:    MOV        BYTE PTR [EBP-00000098],02

    g_myGlobalVariable = 0x12345678;        // Assignment to global
401036:    MOV        DWORD PTR [004088E8],12345678

    localSignedLong = localSignedChar;      // signed type promotion
401040:    MOVSX      EAX,BYTE PTR [EBP-00000098]
401047:    MOV        DWORD PTR [EBP-00000084],EAX

    // Conditional execution
    if ( localUnsignedLong == 2 )
40104D:    CMP        DWORD PTR [EBP-00000088],02
401054:    JNE        00401062

        localSignedLong = 1;
401056:    MOV        DWORD PTR [EBP-00000084],00000001
    else
401060:    JMP        0040106C

        localSignedLong = 2;
401062:    MOV        DWORD PTR [EBP-00000084],00000002

    // Using TEST
    if ( localUnsignedLong & 0x00040008 )
40106C:    MOV        ECX,DWORD PTR [EBP-00000088]
401072:    AND        ECX,00040008
401078:    TEST       ECX,ECX
40107A:    JE         00401086

        i = 3;
40107C:    MOV        DWORD PTR [EBP-00000090],00000003

    // AND'ing off bitfields
    localUnsignedLong &= 0x01020304;
401086:    MOV        EDX,DWORD PTR [EBP-00000088]
40108C:    AND        EDX,01020304
401092:    MOV        DWORD PTR [EBP-00000088],EDX

    // OR'ing on bitfields
    localSignedLong |= 0x05060708;
401098:    MOV        EAX,DWORD PTR [EBP-00000084]
40109E:    OR         EAX,05060708
4010A3:    MOV        DWORD PTR [EBP-00000084],EAX

    // LOOP code
    for ( i = 0; i < 4; i++ )
4010A9:    MOV        DWORD PTR [EBP-00000090],00000000
4010B3:    JMP        004010C4

4010B5:    MOV        ECX,DWORD PTR [EBP-00000090]
4010BB:    ADD        ECX,01
4010BE:    MOV        DWORD PTR [EBP-00000090],ECX
4010C4:    CMP        DWORD PTR [EBP-00000090],04
4010CB:    JNL        004010E1

        localUnsignedLong += i;
4010CD:    MOV        EDX,DWORD PTR [EBP-00000088]
4010D3:    ADD        EDX,DWORD PTR [EBP-00000090]
4010D9:    MOV        DWORD PTR [EBP-00000088],EDX
4010DF:    JMP        004010B5

    // Procedure invocation
    printf( "%u %u %08X %s", localUnsignedLong, argc, &argc, szBuffer );
4010E1:    LEA        EAX,[EBP-80]
4010E4:    PUSH       EAX
4010E5:    LEA        ECX,[EBP+08]
4010E8:    PUSH       ECX
4010E9:    MOV        EDX,DWORD PTR [EBP+08]
4010EC:    PUSH       EDX
4010ED:    MOV        EAX,DWORD PTR [EBP-00000088]
4010F3:    PUSH       EAX
4010F4:    PUSH       00406038
4010F9:    CALL       004011C0
4010FE:    ADD        ESP,14

    // Using STOSD / STOSB
    memset( szBuffer, 0, sizeof(szBuffer) );
401101:    MOV        ECX,00000020
401106:    XOR        EAX,EAX
401108:    LEA        EDI,[EBP-80]
40110B:    REP        STOSD

    // Using SCASB
    i = strlen( szBuffer );
40110D:    LEA        EDI,[EBP-80]
401110:    OR         ECX,FF
401113:    XOR        EAX,EAX
401115:    REPNE      SCASB
401117:    NOT        ECX
401119:    ADD        ECX,FF
40111C:    MOV        DWORD PTR [EBP-00000090],ECX

    MySubProcedure( );
401122:    CALL       0040112E

    return 0;
401127:    XOR        EAX,EAX

}
401129:    POP        EDI
40112A:    MOV        ESP,EBP
40112C:    POP        EBP
40112D:    RET

void MySubProcedure( void )
{
40112E:    PUSH       EBP
40112F:    MOV        EBP,ESP
401131:    PUSH       FF
401133:    PUSH       00405058
401138:    PUSH       004012F8
40113D:    MOV        EAX,FS:[00000000]
401143:    PUSH       EAX
401144:    MOV        DWORD PTR FS:[00000000],ESP
40114B:    SUB        ESP,08
40114E:    PUSH       EBX
40114F:    PUSH       ESI
401150:    PUSH       EDI
401151:    MOV        DWORD PTR [EBP-18],ESP

    tlsVariable = 2;
401154:    MOV        EAX,[004088EC]
401159:    MOV        ECX,DWORD PTR FS:[0000002C]
401160:    MOV        EDX,DWORD PTR [ECX+EAX*4]
401163:    MOV        DWORD PTR [EDX+00000004],00000002

    __try
    {
40116D:    MOV        DWORD PTR [EBP-04],00000000

        g_myGlobalVariable = 2;
401174:    MOV        DWORD PTR [004088E8],00000002
40117E:    MOV        DWORD PTR [EBP-04],FFFFFFFF
401185:    JMP        004011A1

    __except( EXCEPTION_EXECUTE_HANDLER )
401187:    MOV        EAX,00000001
40118C:    RET

40118D:    MOV        ESP,DWORD PTR [EBP-18]

        g_myGlobalVariable = 4;
401190:    MOV        DWORD PTR [004088E8],00000004

    }
40119A:    MOV        DWORD PTR [EBP-04],FFFFFFFF

}
4011A1:    MOV        ECX,DWORD PTR [EBP-10]
4011A4:    MOV        DWORD PTR FS:[00000000],ECX
4011AB:    POP        EDI
4011AC:    POP        ESI
4011AD:    POP        EBX
4011AE:    MOV        ESP,EBP
4011B0:    POP        EBP
4011B1:    RET


First, examine the instructions at offset 0x401000. They establish the stack frame for the procedure, including the space below the frame for local variables. If you look through the rest of the procedure, you won't see the EBX and ESI registers used, so the prologue preserves only EDI, which the REP STOSD and REPNE SCASB sequences need.
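The frame arithmetic can be checked with a few lines of C++. This sketch tracks only ESP and EBP through main's prologue; the `Regs` type and helper names are hypothetical, and the starting ESP value in the test is arbitrary. It confirms that parameters sit at positive offsets from EBP and locals at negative ones.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of just the two registers the prologue touches.
struct Regs {
    std::uint32_t esp;
    std::uint32_t ebp;
};

// PUSH lowers ESP by 4 (the stack grows toward lower addresses). The value
// written doesn't matter for this model, so it isn't stored anywhere.
void PushDword(Regs& r) { r.esp -= 4; }

// Mirrors main()'s prologue: PUSH EBP / MOV EBP,ESP / SUB ESP,00000098 / PUSH EDI.
// espAtEntry is ESP right after the CALL: [ESP] holds the return address and
// [ESP+4] holds the first parameter (argc).
Regs MainPrologue(std::uint32_t espAtEntry) {
    Regs r{espAtEntry, 0};  // the caller's EBP value is irrelevant here
    PushDword(r);           // PUSH EBP: save the caller's frame pointer
    r.ebp = r.esp;          // MOV EBP,ESP: establish the new frame pointer
    r.esp -= 0x98;          // SUB ESP,98: room for the locals
    PushDword(r);           // PUSH EDI: preserve EDI
    return r;
}
```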
After a whole bunch of variable initialization instructions, notice that the signed type promotion (char to long) at offset 0x401040 requires two instructions. This is because (in the general case) the Intel architecture doesn't allow one instruction to reference two memory addresses. Therefore, the assignment must go through a register that acts as an intermediate location.
Also interesting is the if statement starting at offset 0x40104D. After the code that executes when the expression evaluates to TRUE, note the JMP instruction at offset 0x401060. This JMP instruction makes the CPU skip over all the code for the else clause. A bit later (at offset 0x40106C), another if statement ANDs the value with the mask 0x00040008 and then uses the TEST instruction to see whether any of those bits were set. In that sequence, the compiler treats the ECX register as a private, unnamed local variable.
Examining the for loop at offset 0x4010A9 is interesting because of the way the compiler orders the initialization, termination condition, and post-iteration code. The MOV instruction at 0x4010A9 performs the initialization, and then control JMPs past the post-iteration code to get to the termination condition code. The termination condition code looks very similar to an if statement. If you understand what the code is doing here, you can see how a for statement could be rewritten using if and goto statements.
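Here is the for loop from InstructionDemo rewritten with if and goto, in the same block order the compiler used, along with the earlier if/else statement in the same style. The function and label names are illustrative only; the comments give the matching addresses from Figure 3.

```cpp
#include <cassert>

// The for loop, laid out like the compiled code: initialize, jump to the
// test, and place the increment before the test so the body can jump back to it.
unsigned long ForLoopAsGotos(unsigned long localUnsignedLong) {
    int i;
    i = 0;                    // 4010A9: MOV [i],0
    goto test;                // 4010B3: JMP 4010C4
increment:
    i = i + 1;                // 4010B5-4010BE: load, ADD 1, store
test:
    if (!(i < 4)) goto done;  // 4010C4: CMP [i],4 / JNL 4010E1
    localUnsignedLong += i;   // 4010CD-4010D9
    goto increment;           // 4010DF: JMP 4010B5
done:
    return localUnsignedLong;
}

// The if/else, with the branches spelled out.
long IfElseAsGotos(unsigned long localUnsignedLong) {
    long localSignedLong;
    if (localUnsignedLong != 2) goto elseClause;  // CMP ...,2 / JNE 401062
    localSignedLong = 1;                          // 401056
    goto endIf;                                   // 401060: JMP 40106C
elseClause:
    localSignedLong = 2;                          // 401062
endIf:
    return localSignedLong;
}
```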
Starting at offset 0x4010E1, the code begins pushing parameters on the stack in preparation for calling printf. It's important to realize that the parameters are pushed right to left: szBuffer, the last argument, goes on the stack first, and the format string goes on last, so it is at the top of the stack when printf is called. Note that there are two distinct LEA instructions. The first calculates the address of the szBuffer array, while the second calculates the address of the argc parameter. After the call to printf at offset 0x4010F9, the code cleans all the pushed parameters off the stack with the "ADD ESP,14" instruction (five 4-byte parameters occupy 0x14, or 20, bytes).
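A toy stack model makes the right-to-left order easy to check. In this C++ sketch, `ToyStack` and `PushPrintfArgs` are hypothetical, and the argument values in the test are placeholders. After the five pushes the format string is on top, which lets printf find it at a fixed offset no matter how many arguments follow. Because the caller pushed the arguments, the caller also removes them (the __cdecl convention), hence the "ADD ESP,14" after the CALL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stack of DWORDs. slots_.back() is the DWORD at [ESP]; pushing makes
// the stack deeper, just as PUSH moves ESP toward lower addresses.
class ToyStack {
public:
    void Push(std::uint32_t value) { slots_.push_back(value); }

    // Reads [ESP+byteOffset]; offset 0 is the most recently pushed DWORD.
    std::uint32_t At(std::size_t byteOffset) const {
        assert(byteOffset % 4 == 0 && byteOffset / 4 < slots_.size());
        return slots_[slots_.size() - 1 - byteOffset / 4];
    }

    // ADD ESP,bytes: discard bytes / 4 DWORDs without reading them.
    void AddEsp(std::size_t bytes) {
        assert(bytes % 4 == 0 && bytes / 4 <= slots_.size());
        slots_.resize(slots_.size() - bytes / 4);
    }

    std::size_t Depth() const { return slots_.size(); }

private:
    std::vector<std::uint32_t> slots_;
};

// Pushes printf's five arguments in right-to-left order, as in the listing.
void PushPrintfArgs(ToyStack& s, std::uint32_t format, std::uint32_t localUnsignedLong,
                    std::uint32_t argc, std::uint32_t argcAddress, std::uint32_t szBuffer) {
    s.Push(szBuffer);           // LEA EAX,[EBP-80] / PUSH EAX
    s.Push(argcAddress);        // LEA ECX,[EBP+08] / PUSH ECX
    s.Push(argc);               // MOV EDX,[EBP+08] / PUSH EDX
    s.Push(localUnsignedLong);  // MOV EAX,[EBP-88] / PUSH EAX
    s.Push(format);             // PUSH 00406038
}
```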
In the MySubProcedure code starting at offset 0x40112E, the stack frame setup is considerably more complex than the prior procedure's. The instructions like "PUSH 00405058" and "MOV EAX,FS:[00000000]" are building a frame for the structured exception handler code that results from using __try. Also, this time the stack frame setup code preserves all three registers the compiler uses for register variables (EBX, ESI, and EDI).
At offset 0x401154, the code modifies the TLS variable called tlsVariable. The first instruction loads EAX with the TLS index stored at address 0x4088EC. The "MOV ECX,DWORD PTR FS:[0000002C]" instruction then loads the ECX register with a pointer to the array of 64 DWORDs that each thread uses for TLS. The next instruction uses a scaled-index addressing form to read the array slot corresponding to that TLS index: it multiplies EAX by four (the size of a DWORD) and adds the result to the TLS array pointer in ECX. The final instruction stores 2 at offset 4 within the block that slot points to, which is where the current thread's copy of tlsVariable lives.
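The four-instruction TLS sequence can be replayed against a toy flat memory in C++. In this sketch the `Memory` type and `StoreTlsVariable` function are hypothetical, the addresses in the test are arbitrary, and the layout (a slot that points to a block holding tlsVariable at offset 4) is read off the listing rather than taken from any Windows header. Two simulated threads share the same TLS index but have different arrays, so the same instructions update different copies of the variable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy flat memory addressed by 32-bit linear addresses.
struct Memory {
    std::vector<std::uint8_t> bytes = std::vector<std::uint8_t>(0x1000);
    std::uint32_t Read32(std::uint32_t a) const {
        assert(a + 4 <= bytes.size());
        std::uint32_t v;
        std::memcpy(&v, &bytes[a], 4);
        return v;
    }
    void Write32(std::uint32_t a, std::uint32_t v) {
        assert(a + 4 <= bytes.size());
        std::memcpy(&bytes[a], &v, 4);
    }
};

// Mirrors the instructions at 0x401154-0x401163 for one thread.
// tebBase stands in for the FS base of the current thread.
void StoreTlsVariable(Memory& mem, std::uint32_t tebBase,
                      std::uint32_t tlsIndexAddr, std::uint32_t value) {
    std::uint32_t eax = mem.Read32(tlsIndexAddr);    // MOV EAX,[004088EC]
    std::uint32_t ecx = mem.Read32(tebBase + 0x2C);  // MOV ECX,FS:[2C]
    std::uint32_t edx = mem.Read32(ecx + eax * 4);   // MOV EDX,[ECX+EAX*4]
    mem.Write32(edx + 4, value);                     // MOV [EDX+4],value
}
```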

Wrap-up


In the real world, you will no doubt encounter instructions beyond what I've described here. But now you should be familiar with most of the commonly used registers and how memory is addressed. You should be able to tell a local variable apart from a parameter. You should also be able to distinguish these storage classes from global and TLS variables.
Beyond the basic theory, I've also shown a reasonably large subset of the instructions that Win32 compilers generate. It's unlikely that my introduction will enable you to start writing your code in MASM. Still, with this working knowledge, you can be more confident when your debugger takes you to dark, scary places in other people's code, especially when even the dim light of source code isn't available.

Read more about assembly language in the June 1998 installment of Under the Hood.

Have a question about programming in Windows? Send it to Matt at mpietrek@tiac.com

From: http://www.microsoft.com/msj/0298/hood0298.aspx