The main reason software built on one Windows system runs on another without much extra effort is a data structure known as the Portable Executable (PE) format. We are specifically referring to Windows systems here, though other operating systems have similar data structures, e.g. Mach-O on macOS and ELF on Linux. The PE format is essentially a collection of headers that contain the information the loader needs to manage the executable code.
The Portable Executable format is a file format for executables, DLLs and object code used on 32-bit and 64-bit Windows systems. It records the dynamic libraries the file references, the API import and export tables, and TLS (Thread Local Storage) data. The intent was to have a common file format for all flavors of Windows, on all supported CPUs. Loading an executable into memory then becomes a matter of mapping certain ranges of a PE file into the address space. The most important point to remember is that if we know how to find something in a PE file, we will almost certainly find the same information after the file is loaded into memory.
PE File Data Structures
On opening a Win32 binary executable in a hex editor, one notices that the first two bytes are always the ASCII letters MZ (after Mark Zbikowski, one of the developers of MS-DOS). The first few bytes that identify the type of a file are called the File ID Tag or File Signature, or informally “magic numbers”. A list of major file signatures is present here. The DOS header is kept in 32-bit executable files for two reasons: 1) backward compatibility, and 2) to reject file types the system does not understand. For example, if the 32-bit executable is run on a 16-bit DOS machine, it exits gracefully with an error message stored immediately after the DOS header. That message is part of a small program called the DOS stub. On Win32 systems the PE loader automatically skips the DOS stub.
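The signature check described above can be sketched in a few lines of Python. The function name and the sample byte strings are made up for illustration; with a real file one would read the first two bytes from disk instead.

```python
def has_mz_signature(data: bytes) -> bool:
    """Return True if the buffer starts with the DOS 'MZ' signature."""
    # 'M' is 0x4D and 'Z' is 0x5A, so the file starts with the bytes 4D 5A.
    return data[:2] == b"MZ"


# Hypothetical sample buffers, not taken from real files.
fake_exe = b"MZ" + bytes(62)
fake_elf = b"\x7fELF" + bytes(60)

print(has_mz_signature(fake_exe))
print(has_mz_signature(fake_elf))
```

A matching signature only says the file claims to be an MZ executable; the PE header still has to be located and checked.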
NOTE: The exact offsets at which each of these sections is located in the file will be looked at with a practical example, since they have changed a bit over time even though the basics remain the same. Hence, for now we concentrate only on the sections of the PE file format.
The PE header begins with a File ID signature with the value ‘PE\0\0’. The location of the PE header can be determined by adding the 4-byte value (stored in little-endian format) at offset 0x3C of the DOS header to the base address. The PE header also contains information that concerns the entire file, such as whether the file is a DLL or an executable, the machine type that the code runs on, and much more. The PE Header is followed by an Optional PE Header. It is not optional per se, because all executables must have one, but certain types of object files do not. The optional PE header begins with a 2-byte magic code representing the format (0x010B for PE32, 0x020B for PE32+, i.e. 64-bit, and 0x0107 for a ROM image). This is used in conjunction with the machine type in the PE header to detect whether the PE file is running on a compatible system. The optional header also specifies other useful memory-related values, including the size and virtual base of the code and data, the entry point, and the number of data directories.
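As a rough illustration of the header walk described above, the following Python sketch reads the offset at 0x3C, checks the ‘PE\0\0’ signature, and pulls a few fields out of the file header and optional header. The function name and the returned dictionary keys are my own; the field positions are the ones given in Microsoft's published PE/COFF layout, and the code does only minimal validation.

```python
import struct

# Optional header magic values and the DLL flag of the Characteristics field.
OPTIONAL_MAGIC = {0x010B: "PE32", 0x020B: "PE32+", 0x0107: "ROM"}
IMAGE_FILE_DLL = 0x2000


def read_pe_headers(data: bytes) -> dict:
    """Parse a handful of header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")

    # 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    # The 20-byte file header immediately follows the 4-byte signature.
    (machine, num_sections, _timestamp, _symtab, _num_symbols,
     opt_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    info = {
        "pe_offset": pe_offset,
        "machine": machine,
        "number_of_sections": num_sections,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "format": None,
        "entry_point_rva": None,
    }

    # Object files may have no optional header at all (size 0).
    if opt_size >= 20:
        opt_offset = pe_offset + 24
        (magic,) = struct.unpack_from("<H", data, opt_offset)
        (entry_point,) = struct.unpack_from("<I", data, opt_offset + 16)
        info["format"] = OPTIONAL_MAGIC.get(magic, hex(magic))
        info["entry_point_rva"] = entry_point
    return info
```

With a real binary this could be called as `read_pe_headers(open(path, "rb").read())`, where `path` is whatever executable is being inspected. A machine value of 0x014C indicates x86 and 0x8664 indicates x64.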
Section Table and Sections
A PE file is made up of sections, and each section has an entry in the Section Table. An entry consists of the section's name, its offset within the file, its size in the file and in memory, the virtual address to copy it to, and associated flags. The Section Table is followed by the sections themselves. Some well-known sections are the .text section, which holds the program code, and the .data section, which holds the global variables.
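To make the Section Table concrete, here is a minimal Python sketch that walks its entries. The `Section` type and both function names are hypothetical. The 40-byte entry layout is taken from Microsoft's published PE/COFF format, and the code assumes the buffer has already been checked for the MZ and PE signatures. The second helper shows the "mapping" idea from earlier: translating a virtual address (relative to the image base) back to an offset in the file.

```python
import struct
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_size: int     # size of the section in memory
    virtual_address: int  # address relative to the image base
    raw_size: int         # size of the section in the file
    raw_offset: int       # offset of the section within the file
    flags: int


def read_section_table(data: bytes) -> List[Section]:
    """Return the section table entries of an already-validated PE buffer."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    (num_sections,) = struct.unpack_from("<H", data, pe_offset + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_offset + 20)

    # The table starts right after the optional header.
    table_offset = pe_offset + 24 + opt_size
    sections = []
    for i in range(num_sections):
        (name, vsize, vaddr, rsize, roffset,
         _relocs, _lines, _num_relocs, _num_lines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table_offset + 40 * i)
        # Names are at most 8 bytes, padded with NUL bytes.
        text = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append(Section(text, vsize, vaddr, rsize, roffset, flags))
    return sections


def rva_to_file_offset(sections: List[Section], rva: int) -> Optional[int]:
    """Map a relative virtual address to a file offset, or None if unbacked."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.raw_size:
            return rva - s.virtual_address + s.raw_offset
    return None
```

Only addresses backed by data in the file are translated. Memory that exists only after loading, such as the zero-filled tail of a section whose size in memory exceeds its size on disk, yields `None`.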
This post only introduces the idea of the PE file format. There are many resources online for further reading. A follow-up post showing how the above knowledge is used in practice will be coming soon. Further reading on this topic may be done here: