GhantaEngineering: 04/21/13

Sunday 21 April 2013

How does a 64-bit OS work on a 32-bit processor?

Will this 32-bit software run on my 64-bit operating system? or

Will this 64-bit software run on my computer?

If you've asked these questions then this tutorial should help you to understand the concepts of 32-bit and 64-bit computing. We'll look at your computer system as three parts: the hardware, the operating system and the application programs. At the end we'll look at some of the common questions people have.

32-bit versus 64-bit

As the number of bits increases there are two important benefits.

More bits means that data can be processed in larger chunks, which also means with greater precision.
More bits means our system can point to or address a larger number of locations in physical memory.

32-bit systems were once desired because they could address (point to) 4 Gigabytes (GB) of memory in one go. Some modern applications require more than 4 GB of memory to complete their tasks so 64-bit systems are now becoming more attractive because they can potentially address up to 4 billion times that many locations.
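The arithmetic behind these figures can be checked with a short C sketch. The function names addressable_gib and address_ratio are made up for this illustration, and the sketch assumes that each address refers to one byte.

```c
#include <assert.h>
#include <stdint.h>

/* Gigabytes (2^30 bytes each) reachable with an address of `bits` bits.
   Hypothetical helper; valid for 30 <= bits <= 64. */
uint64_t addressable_gib(unsigned bits)
{
    assert(bits >= 30 && bits <= 64);
    return (uint64_t)1 << (bits - 30);
}

/* How many times more locations a `wide`-bit address can reach
   than a `narrow`-bit address. Hypothetical helper. */
uint64_t address_ratio(unsigned wide, unsigned narrow)
{
    assert(wide >= narrow && wide - narrow < 64);
    return (uint64_t)1 << (wide - narrow);
}
```

Here addressable_gib(32) gives 4, and address_ratio(64, 32) gives 4,294,967,296, which is the "4 billion times" mentioned above.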

Since 1995, when Windows 95 was introduced with support for 32-bit applications, most of the software and operating system code has been 32-bit compatible.

Here is the problem: while most of the software available today is 32-bit, the processors we buy are almost all 64-bit.

So how long will the transition from 32-bit to 64-bit systems take?

The main issue is that your computer works from the hardware such as the processor (or CPU, as it is called), through the operating system (OS), to the highest level which is your applications. So the computer hardware is designed first, the matching operating systems are developed, and finally the applications appear.

We can look back at the transition from 16-bit to 32-bit Windows on 32-bit processors. It took 10 years (from 1985 to 1995) to get a 32-bit operating system and even now, more than 15 years later, there are many people still using 16-bit Windows applications on older versions of Windows.

The hardware and software vendors learnt from the previous transition, so the new operating systems have been released at the same time as the new processors. The problem this time is that there haven't been enough 64-bit applications. Ten years after the PC's first 64-bit processors, installs of 64-bit Windows are only now exceeding those of 32-bit Windows. Further evidence of this inertia is that you are probably reading this tutorial because you are looking to install your first 64-bit software.

Your computer system in three parts

Now we'll look at those three components of your system. In simple terms they are three layers, with the processor or CPU as the central or lowest layer and the application as the outermost or highest layer.

To run a 64-bit operating system you need support from the lower level: the 64-bit CPU.

To run a 64-bit application you need support from all lower levels: the 64-bit OS and the 64-bit CPU.

This simplification will be enough for us to look at what happens when we mix the 32-bit and 64-bit parts. But if you want to understand the issue more deeply then you will also need to consider the hardware that supports the CPU and the device drivers that allow the OS and the applications to interface with the system hardware.
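If you build software yourself, one rough way to see which kind of application you have produced is to ask the program how wide its own pointers are. This is only a sketch: pointer_width_bits is a name invented here, and the result describes how the program was compiled, not what the CPU or OS could support.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width, in bits, of a data pointer in this build of the program.
   Hypothetical helper for illustration. */
unsigned pointer_width_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

On a typical PC, a program built as a 32-bit application reports 32 and one built as a 64-bit application reports 64, whichever 64-bit CPU it happens to be running on.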

What 32-bit and 64-bit combinations are compatible and will work together?

This is where we get to the practicalities and can start answering common questions.

The general rule is that 32-bit will run on a lower level 64-bit component but 64-bit does not run on a lower level 32-bit component:

A 32-bit OS will run on a 32-bit or 64-bit processor without any problem.
A 32-bit application will run on a 32-bit or 64-bit OS without any problem.
But a 64-bit application will only run on a 64-bit OS and a 64-bit OS will only run on a 64-bit processor.

These two tables illustrate the same rule:

Table 1 — What is compatible if I have a 32-bit CPU?
Processor (CPU)	32-bit	32-bit	32-bit	32-bit
Operating System (OS)	32-bit	32-bit	64-bit	64-bit
Application Program	32-bit	64-bit	32-bit	64-bit
Will it work?	Yes	No	No	No

Table 2 — What is compatible if I have a 64-bit CPU?
Processor (CPU)	64-bit	64-bit	64-bit	64-bit
Operating System (OS)	64-bit	64-bit	32-bit	32-bit
Application Program	64-bit	32-bit	32-bit	64-bit
Will it work?	Yes	Yes	Yes	No
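The general rule behind Tables 1 and 2 can be written as a one-line check. This is a minimal sketch that only considers the values 32 and 64; the function name runs_on is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Each layer needs the layer beneath it to be at least as wide.
   Hypothetical helper; only meant for the values 32 and 64. */
bool runs_on(int cpu_bits, int os_bits, int app_bits)
{
    return os_bits <= cpu_bits && app_bits <= os_bits;
}
```

For example, runs_on(64, 32, 32) is true (a 32-bit OS and application on a 64-bit CPU), while runs_on(64, 32, 64) is false because the 64-bit application has no 64-bit OS beneath it.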

The main reason that 32-bit will always run on 64-bit is that the 64-bit components have been designed to work that way. So the newer 64-bit systems are backward-compatible with the 32-bit systems (which is the main reason most of us haven't moved to 64-bit software).

An example of backward compatibility is 64-bit Windows. It includes a subsystem called WOW64 that provides compatibility by presenting a 32-bit environment to 32-bit applications. See the article How Windows 7 / Vista 64 Support 32-bit Applications if you want to know more. One important point that is made in that article is that it is not possible to install a 32-bit device driver on a 64-bit operating system. This is because device drivers run at the same level as the operating system rather than on top of it. The compatibility layer is provided at the operating system level, so it is available to the higher layer, the application, but it is not available to a device driver, which runs on the same level.

Hardware virtualization is the exception to the rule

Another question many people have is whether a 32-bit system can run 64-bit software. As more people look to use 64-bit Windows, they want to try it out on their existing systems. So we are getting more questions about whether they can run it on their 32-bit processor or under their 32-bit OS.

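Following the general rule, we would expect that you cannot run 64-bit software on a 32-bit system. There is, however, one exception: with virtualization, a 64-bit guest operating system can run under a 32-bit host operating system, provided the CPU itself is 64-bit.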

Virtualization creates a virtual system within the actual system. Virtualization can be achieved in hardware or software but it works best if the virtual machine is created in the system hardware. The guest operating system is not aware that there is a host operating system already running. This is the way that a 64-bit operating system can think that it is running on 64-bit hardware without being aware that there is a 32-bit operating system in the mix.

Tables 3 and 4 illustrate the result. Provided that the virtual machine can actually be created and isolated by the virtualizing software, the host OS is effectively removed from the equation, so its row can be ignored. We can now apply the general rules for a non-virtualized system to the three remaining layers.

Table 3 — What is compatible if I have a 32-bit CPU and software virtualization?
Processor (CPU)	32-bit	32-bit	32-bit	32-bit
Host Operating System	32-bit	32-bit	32-bit	32-bit
Guest Operating System	32-bit	32-bit	64-bit	64-bit
Application Program	32-bit	64-bit	32-bit	64-bit
Will it work?	Yes	No	No	No

Table 4 — What is compatible if I have a 64-bit CPU and software virtualization?
Processor (CPU)	64-bit	64-bit	64-bit	64-bit
Host Operating System	32/64-bit	32/64-bit	32/64-bit	32/64-bit
Guest Operating System	64-bit	64-bit	32-bit	32-bit
Application Program	64-bit	32-bit	32-bit	64-bit
Will it work?	Yes	Yes	Yes	No
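Tables 3 and 4 apply the same rule as before, only with the guest OS taking the place of the OS and the host OS left out. A minimal sketch, assuming the virtual machine can actually be created; the function name runs_in_vm is invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the host OS is deliberately ignored, because the
   guest OS behaves as if it were running directly on the CPU. */
bool runs_in_vm(int cpu_bits, int host_os_bits, int guest_os_bits, int app_bits)
{
    (void)host_os_bits; /* effectively removed from the equation */
    return guest_os_bits <= cpu_bits && app_bits <= guest_os_bits;
}
```

So runs_in_vm(64, 32, 64, 64) is true: a 64-bit guest and application can work under a 32-bit host, but only because the CPU is 64-bit.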

Before you hurry away to try running 64-bit in a virtual machine, you must check that your computer's BIOS supports hardware virtualization. If it does not then hardware virtualization will not work even if the CPU does support it.

Emulation of the 64-bit CPU is not an option

All the feasible configurations that we have looked at so far have the processors (CPUs) running software that uses the instruction set that is native to that processor. Running 64-bit software on a 32-bit processor doesn't work because the 64-bit instructions are not native to a 32-bit processor. But what if I could emulate a 64-bit processor using 32-bit software?

It is theoretically possible but rarely practical to emulate a 64-bit processor while running software on a 32-bit processor. Even if you can get non-native 64-bit emulation to work, the virtual machine that duplicates a 64-bit CPU would run very slowly because every 64-bit instruction has to be trapped and handled by the emulator. 64-bit memory pointers also have to be converted to work within the 32-bit address space.

Furthermore, my understanding is that the x86 (32-bit) processors used in PCs and Apple Macs are not able to completely emulate the x64 (64-bit) instruction set. Some 64-bit instructions cannot be trapped by the emulator. This causes the system to crash when the x86 processor tries to run those x64 instructions.
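To get a feel for the extra work that emulation involves, here is a sketch of a single 64-bit addition carried out using only 32-bit operations. It is an illustration of the idea, not how any particular emulator is written; the names u64_pair, add64_emulated and to_native are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit CPU would hold it. */
struct u64_pair {
    uint32_t lo;
    uint32_t hi;
};

/* One 64-bit add becomes two 32-bit adds plus a carry calculation. */
struct u64_pair add64_emulated(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;              /* wraps around modulo 2^32 */
    carry = (r.lo < a.lo) ? 1u : 0u; /* 1 if the low half overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Reassemble the halves into a native 64-bit value, for checking only. */
uint64_t to_native(struct u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}
```

A single instruction on a 64-bit processor turns into several steps here, and an emulator has to do something like this for every instruction it handles.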

How does a function call work

A typical stack frame

Figure 1 shows what a typical stack frame might look like. In these diagrams, the stack grows upward and smaller-numbered memory addresses are on top.

This would be the contents of the stack if we have a function foo with the prototype:

   int foo (int arg1, int arg2, int arg3) ;

and foo has two local int variables. (We are assuming here that sizeof(int) is 4 bytes.) The stack would look like this if, say, the main function called foo and control of the program is still inside the function foo. In this situation, main is the "caller" and foo is the "callee".

The ESP register is being used by foo to point to the top of the stack. The EBP register is acting as a "base pointer". The arguments passed by main to foo and the local variables in foo can all be referenced as an offset from the base pointer.

The convention used here is that the callee is allowed to overwrite the values of the EAX, ECX and EDX registers before returning. So, if the caller wants to preserve the values of EAX, ECX and EDX, the caller must explicitly save them on the stack before making the subroutine call. On the other hand, the callee must preserve the values of the EBX, ESI and EDI registers. If the callee makes changes to these registers, the callee must save the affected registers on the stack and restore the original values before returning.

Parameters passed to foo are pushed on the stack. The last argument is pushed first so in the end the first argument is on top. Local variables declared in foo as well as temporary variables are all stored on the stack.

Return values of 4 bytes or less are stored in the EAX register. If a return value with more than 4 bytes is needed, then the caller passes an "extra" first argument to the callee. This extra argument is the address of the location where the return value should be stored. I.e., in C parlance the function call:

   x = foo(a, b, c) ;

is transformed into the call:

   foo(&x, a, b, c) ;

Note that this only happens for function calls that return more than 4 bytes.

Let's go through a step-by-step process and see how a stack frame is set up and taken down during a function call.
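Before starting that walkthrough, the hidden-pointer transformation described above can be sketched in plain C. The struct pair type and both function names are invented for this illustration; the second function only mimics by hand what the compiler arranges behind the scenes.

```c
#include <assert.h>

/* Two ints: 8 bytes when sizeof(int) is 4, so too big for EAX alone. */
struct pair {
    int first;
    int second;
};

/* What the programmer writes: the struct is returned by value. */
struct pair make_pair(int a, int b)
{
    struct pair p;
    p.first = a;
    p.second = b;
    return p;
}

/* Roughly what the call is transformed into: the caller supplies the
   address where the result should be stored as an extra first argument. */
void make_pair_hidden(struct pair *result, int a, int b)
{
    result->first = a;
    result->second = b;
}
```

Calling x = make_pair(1, 2) and calling make_pair_hidden(&x, 1, 2) leave the same contents in x.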

ESP ==>	. . .
	Callee saved registers EBX, ESI & EDI (as needed)
	temporary storage
	local variable #2	[EBP - 8]
	local variable #1	[EBP - 4]
EBP ==>	Caller's EBP
	Return Address
	Argument #1	[EBP + 8]
	Argument #2	[EBP + 12]
	Argument #3	[EBP + 16]
	Caller saved registers EAX, ECX & EDX (as needed)
	. . .
	Fig. 1

The caller's actions before the function call



ESP ==>		Return Address
		Arg #1 = 12
		Arg #2 = 15
		Arg #3 = 18
		Caller saved registers EAX, ECX & EDX (as needed)
EBP ==>		. . .
		Fig. 2

In our example, the caller is the main function and is about to call a function foo. Before the function call, main is using the ESP and EBP registers for its own stack frame.

First, main pushes the contents of the registers EAX, ECX and EDX onto the stack. This is an optional step and is taken only if the contents of these 3 registers need to be preserved.

Next, main pushes the arguments for foo one at a time, last argument first, onto the stack. For example, if the function call is:

     a = foo(12, 15, 18) ;

The assembly language instructions might be:

        push    dword 18 
        push    dword 15
        push    dword 12

Finally, main can issue the subroutine call instruction:

        call    foo

When the call instruction is executed, the contents of the EIP register are pushed onto the stack. Since the EIP register is pointing to the next instruction in main, the effect is that the return address is now at the top of the stack. After the call instruction, the next execution cycle begins at the label named foo.

Figure 2 shows the contents of the stack after the call instruction. The red line in Figure 2 and in subsequent figures indicates the top of the stack prior to the instructions that initiated the function call process. We will see that after the entire function call has finished, the top of the stack will be restored to this position.
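The caller's steps can be imitated with a toy stack in C, which makes the layout of Figure 2 easy to check. This is a simulation only: the struct machine type, the helper names and the return address value passed in are all invented for this sketch, and no real registers are involved.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64

struct machine {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack, one slot per 4 bytes */
    uint32_t esp;                  /* byte offset of the top of the stack */
};

void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;                   /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = value;
}

uint32_t load(const struct machine *m, uint32_t addr)
{
    return m->mem[addr / 4];
}

/* Imitate main's side of a = foo(12, 15, 18) up to and including `call`. */
uint32_t simulate_call(struct machine *m, uint32_t return_address)
{
    m->esp = STACK_BYTES;          /* start with an empty simulated stack */
    push(m, 18);                   /* push dword 18 (last argument first) */
    push(m, 15);                   /* push dword 15 */
    push(m, 12);                   /* push dword 12 */
    push(m, return_address);       /* what `call foo` does before jumping */
    return m->esp;
}
```

After simulate_call, the return address sits at the top of the simulated stack with the arguments 12, 15 and 18 at offsets 4, 8 and 12 from it, matching Figure 2.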

The callee's actions after function call

When the function foo, the callee, gets control of the program, it must do 3 things: set up its own stack frame, allocate space for local storage and save the contents of the registers EBX, ESI and EDI as needed.

So, first foo must set up its own stack frame. The EBP register is currently pointing at a location in main's stack frame. This value must be preserved. So, EBP is pushed onto the stack. Then the contents of ESP are copied to EBP. This allows the arguments to be referenced as an offset from EBP and frees up the stack register ESP to do other things. Thus, just about all C functions begin with the two instructions:

        push    ebp
        mov     ebp, esp

The resulting stack is shown in Figure 3. Notice that in this scheme the address of the first argument is 8 plus EBP, since main's EBP and the return address each takes 4 bytes on the stack.



ESP=EBP =>	main's EBP
	Return Address
	Arg #1 = 12	[EBP + 8]
	Arg #2 = 15	[EBP + 12]
	Arg #3 = 18	[EBP + 16]
	Caller saved registers EAX, ECX & EDX (as needed)
	Fig. 3

In the next step, foo must allocate space for its local variables. It must also allocate space for any temporary storage it might need. For example, some C statements in foo might have complicated expressions. The intermediate values of the subexpressions must be stored somewhere. These locations are usually called temporary, because they can be reused for the next complicated expression. Let's say for illustration purposes that foo has 2 local variables of type int (4 bytes each) and needs an additional 12 bytes of temporary storage. The 20 bytes needed can be allocated simply by subtracting 20 from the stack pointer:

        sub     esp, 20

The local variables and temporary storage can now be referenced as an offset from the base pointer EBP.

Finally, foo must preserve the contents of the EBX, ESI and EDI registers if it uses these. The resulting stack is shown in Figure 4.

The body of the function foo can now be executed. This might involve pushing and popping things off the stack. So, the stack pointer ESP might go up and down, but the EBP register remains fixed. This is convenient because it means we can always refer to the first argument as [EBP + 8] regardless of how much pushing and popping is done in the function.

Execution of the function foo might also involve other function calls and even recursive calls to foo. However, as long as the EBP register is restored upon return from these calls, references to the arguments, local variables and temporary storage can continue to be made as offsets from EBP.
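The fixed offsets from EBP can be checked with a small simulation in C that builds the stack of Figure 4 by hand. Everything here is a toy model: the struct machine type, the helper names, the return address 0x1000 and the values stored in the locals and in the saved EBX slot are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128

struct machine {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack, one slot per 4 bytes */
    uint32_t esp;                  /* byte offset of the top of the stack */
    uint32_t ebp;                  /* byte offset used as the base pointer */
};

void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;                   /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = value;
}

uint32_t load(const struct machine *m, uint32_t addr)
{
    return m->mem[addr / 4];
}

void store(struct machine *m, uint32_t addr, uint32_t value)
{
    m->mem[addr / 4] = value;
}

/* Build the stack of Figure 4 for the call foo(12, 15, 18). */
void build_foo_frame(struct machine *m)
{
    m->esp = STACK_BYTES;          /* pretend main's frame ends here */
    m->ebp = STACK_BYTES;

    push(m, 18);                   /* caller: push dword 18 */
    push(m, 15);                   /*         push dword 15 */
    push(m, 12);                   /*         push dword 12 */
    push(m, 0x1000);               /* call foo: made-up return address */

    push(m, m->ebp);               /* callee: push ebp */
    m->ebp = m->esp;               /*         mov ebp, esp */
    m->esp -= 20;                  /*         sub esp, 20 */

    store(m, m->ebp - 4, 111);     /* local variable #1 (made-up value) */
    store(m, m->ebp - 8, 222);     /* local variable #2 (made-up value) */

    push(m, 0xB);                  /* saved EBX (made-up value) */
}
```

However many extra values are pushed after build_foo_frame, load(&m, m.ebp + 8) still yields the first argument, 12, because only ESP moves.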



ESP ==>	Callee saved registers EBX, ESI & EDI (as needed)
	temporary storage	[EBP - 20]
	local variable #2	[EBP - 8]
	local variable #1	[EBP - 4]
EBP==>	main's EBP
	Return Address
	Arg #1 = 12	[EBP + 8]
	Arg #2 = 15	[EBP + 12]
	Arg #3 = 18	[EBP + 16]
	Caller saved registers EAX, ECX & EDX (as needed)
	Fig. 4

The callee's actions before returning



ESP ==>		Arg #1 = 12
		Arg #2 = 15
		Arg #3 = 18
		Caller saved registers EAX, ECX & EDX (as needed)
EBP ==>		. . .
		Fig. 5

Before returning control to the caller, the callee foo must first make arrangements for the return value to be stored in the EAX register. We already discussed above how function calls with return values longer than 4 bytes are transformed into a function call with an extra pointer parameter and no return value.

Secondly, foo must restore the values of the EBX, ESI and EDI registers. If these registers were modified, we pushed their original values onto the stack at the beginning of foo. The original values can be popped off the stack, if the ESP register is pointing to the correct location shown in Figure 4. So, it is important that we do not lose track of the stack pointer ESP during the execution of the body of foo; i.e., the number of pushes and pops must be balanced.

After these two steps we no longer need the local variables and temporary storage for foo. We can take down the stack frame with these instructions:

        mov     esp, ebp
        pop     ebp

The result is a stack that is exactly the same as the one shown in Figure 2. The return instruction can now be executed. This pops the return address off the stack and stores it in the EIP register. The result is the stack shown in Figure 5.

The i386 instruction set has an instruction "leave" which does exactly the same thing as the mov and pop instructions above. Thus, it is very typical for C functions to end with the instructions:

        leave
        ret

The caller's actions after returning

After control of the program returns to the caller (which is main in our example), the stack is as shown in Figure 5. In this situation, the arguments passed to foo are usually not needed anymore. We can pop all 3 arguments off the stack simultaneously by adding 12 (= 3 times 4 bytes) to the stack pointer:

        add     esp, 12

The caller main should then save the return value which was placed in EAX in some appropriate location. For example if the return value is to be assigned to a variable, then the contents of EAX could be moved to the variable's memory location now.

Finally, the main function can pop the values of the EAX, ECX and EDX registers if their values were preserved on the stack prior to the function call. This puts the top of the stack at the exact same position as before we started this entire function call process. (Recall that this position is indicated by a red line in Figures 2-5.)
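The whole round trip can be put together in one small simulation, to confirm that the stack pointer ends up where it started. This is a toy model in C rather than real assembly: the struct machine type, the helper names and the placeholder return address are invented, and the body of foo (summing its three arguments) is an assumption, since the walkthrough above does not say what foo computes.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128

struct machine {
    uint32_t mem[STACK_BYTES / 4]; /* simulated stack, one slot per 4 bytes */
    uint32_t esp;                  /* stack pointer (byte offset) */
    uint32_t ebp;                  /* base pointer (byte offset) */
    uint32_t eax;                  /* return value register */
};

void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

uint32_t pop(struct machine *m)
{
    uint32_t value = m->mem[m->esp / 4];
    m->esp += 4;
    return value;
}

uint32_t load(const struct machine *m, uint32_t addr)
{
    return m->mem[addr / 4];
}

/* The callee. Its body, a sum of the arguments, is made up for the sketch. */
void foo(struct machine *m)
{
    push(m, m->ebp);               /* push ebp */
    m->ebp = m->esp;               /* mov ebp, esp */
    m->esp -= 20;                  /* sub esp, 20 */

    m->eax = load(m, m->ebp + 8) + load(m, m->ebp + 12) + load(m, m->ebp + 16);

    m->esp = m->ebp;               /* leave: mov esp, ebp */
    m->ebp = pop(m);               /*        pop ebp */
    (void)pop(m);                  /* ret: return address popped (EIP not modelled) */
}

/* The caller's side of a = foo(a, b, c). */
uint32_t call_foo(struct machine *m, uint32_t a, uint32_t b, uint32_t c)
{
    push(m, c);                    /* last argument first */
    push(m, b);
    push(m, a);
    push(m, 0xCAFE);               /* call foo: placeholder return address */
    foo(m);
    m->esp += 12;                  /* add esp, 12 */
    return m->eax;                 /* the result is picked up from EAX */
}
```

With esp and ebp both set to STACK_BYTES beforehand, call_foo(&m, 12, 15, 18) returns 45 and leaves both registers exactly as they were before the call.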

Examples

Understanding these conventions lets you do two things: 1) write assembly language programs that can be called from a C program, and 2) call standard C functions from your own assembly language programs.

As an example of the first case, we have a function arrayinc written in assembly language that adds one to each element of an integer array. The array is passed to arrayinc as the only argument. Here are the files:

The assembly language program that implements arrayinc: arrayinc.asm.
The C function that calls the arrayinc function: arraytest.c.
A transcript of the UNIX commands & output: arraytest.txt.

Notice that the C program treats arrayinc as any other function that is implemented elsewhere. It really doesn't care if the function is implemented in C or in assembly language.

The second situation, calling C functions from assembly language programs, is commonly used to invoke C input/output routines. Here you must decide if your main function will behave like a C function or like a "normal" assembly language program.

As an example of the first case, we call printf from an assembly language program. The entry point for the assembly language program is labeled main and must be declared global. This program must behave like a C function. It must set up and take down the stack frame and preserve the registers according to the C function call convention. The identifier printf must be declared external to get the linker to do the right thing. We also use gcc to do the final linking and loading (instead of using ld), since gcc knows which library contains the printf function. Here are the files:

An assembly language program that calls printf (version 1): printf1.asm.
A transcript of the UNIX commands & output: printf1.txt.

In the second example, the assembly language program is a "normal" one. The entry point is labeled "_start" and the program exits using a Linux kernel system call. In this case, we need to give gcc the "-nostartfiles" option for the linking to work correctly.

An assembly language program that calls printf (version 2): printf2.asm.
A transcript of the UNIX commands & output: printf2.txt.

A good way to understand how C handles function calls is to examine the assembly language code generated by the compiler. We can use "gcc -S" to tell the gcc compiler to create a file with extension ".s" that contains the assembly language code. The gcc compiler does normally generate assembly language code, but without the -S option it does not save it to a file. Unfortunately the assembly code in the .s file is in AT&T-style syntax. This can be converted to Intel-style syntax using an "intel2gas" command.
Here are the files from a simple example.

The original C program: cfunc.c.
The assembly output in AT&T style: cfunc.s.
The assembly output converted to Intel style: cfunc.asm.
A transcript of the UNIX commands & output: cfunc.txt.
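If those files are not to hand, a small stand-in of your own works just as well for this experiment. The function below is not the cfunc.c listed above; it is a made-up example with three int arguments and two int locals, like the foo used throughout this walkthrough, and its body is arbitrary.

```c
#include <assert.h>

/* Stand-in for experimenting with gcc -S. The arithmetic is an arbitrary
   choice; what matters is three int arguments and two int locals. */
int foo(int arg1, int arg2, int arg3)
{
    int local1 = arg1 + arg2;
    int local2 = local1 * arg3;
    return local2;
}
```

Saved as, say, foo.c, this can be compiled with "gcc -m32 -O0 -S foo.c" to get 32-bit code resembling the frames described above, assuming your gcc installation has 32-bit support. On x86 targets gcc also accepts "-masm=intel" to emit Intel-style syntax directly. The exact instructions vary between compiler versions, so expect the output to differ in detail from the listings here.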

For a more complicated example, look at the assembly code generated by gcc for a program with nested function calls:

The original C program: cfunc2.c.
The converted assembly language output in Intel style: cfunc2.asm.

Finally, we have an example of a C function whose return value is larger than 4 bytes.

The original C program: cfunc3.c.
The converted assembly language output in Intel style: cfunc3.asm. The comments in this file were added afterwards.