Anatomy of the Portable Executable (PE) Format

In Windows systems we can find countless extensions, although mainly .exe, .dll, .sys, among others, stand out. All these extensions have something in common: they follow the PE (Portable Executable) format. The PE format is the standard used by Windows for executable files and dynamic link libraries (DLLs), allowing the system to load and execute programs efficiently.

This format is an extension of the COFF format, which was originally developed to store object files in Unix systems. Microsoft adapted and extended COFF to create the PE format to handle the specific needs of executables and libraries in Windows.

PE executable files in Windows: EXE, DLL, SYS

In today’s article we will explain how this format is structured. We will use a simple C executable that prints a Hello World, all the screenshots in this article will belong to this executable.

To begin, we can observe a basic high-level structure of a PE file in the following image:

Basic structure of a PE file at high level

We are going to talk in detail about each of these headers. If for example, we open our executable using the PE-Bear tool, we will be able to visualize all the fields from the image above:

Visualization of PE headers in PE-Bear

That said, let’s start by looking at some important concepts and then we will see each part of the PE format one by one.

Glossary of Terms
Loader and Linker
Relative Virtual Address (RVA)
Sections
winnt.h
DOS Header
DOS Stub
Rich Header
Difference in Execution Flow Between a Windows System and MS-DOS
NT Headers (IMAGE_NT_HEADERS)
Section Headers
Conclusion
References

Glossary of Terms

Throughout the article we are going to see many terms that may be worth knowing beforehand. Anyway, when these terms are used they will be reminded of what they are about, but at least this way we have them in a centralized manner:

PE32: 32-bit portable executable file.
PE32+: 64-bit portable executable file.
BYTE: One byte of data (also known as DB).
WORD: Two bytes of data (also known as DW).
DWORD: Four bytes of data (also known as DD).
Section: Sections are the containers of executable file data. Each PE file can have multiple sections, and each one has a specific name and attributes that determine how the operating system should handle it.
Object file: It is the result of the assembler or compiler and is usually in COFF (Common Object File Format). These files serve as input for the linker.
Image file: It is the result of the linker and is called an image file. It can be an executable file (.exe) or a dynamic link library (.dll). The PE (Portable Executable) format is an extension of the COFF format and is the standard for executable files and DLLs on Windows systems.
Binary file: A binary file can be an object file or an image file.
Windows Loader (or simply loader): It is the code responsible for loading a PE file into memory. When a PE file is loaded into memory, its in-memory version is called a module.
Offset: It is a reference to the position of data or an instruction within a file or in memory. The offset indicates how many bytes you need to advance from an initial position, such as the start of a file or a memory segment, to reach a specific location. In PE files, it is used to indicate the position of different sections or components within the file.
Mapping: In the context of operating systems and executable files, “mapping” means assigning parts of a file directly to the system’s memory so they can be used by a program. It’s like creating a link between the file content on disk and a specific area of memory, allowing the program to access that information quickly and efficiently. In the case of PE (Portable Executable) files, the Windows loader takes the sections of the executable file and places them in memory. This establishes a correspondence between the positions of the file on disk and the addresses in memory. Thanks to this process, the program can directly access its necessary instructions and data during execution, without having to load the entire file into memory at once.
NT Headers: Sometimes NT headers are also known as PE headers, so if you see PE Header instead of NT Header out there, know that it refers to the same thing.

Additionally, for the article we will make the respective translations, that is, OS loader we will mention it as operating system loader, linker as linker, etc. Finally, we will refer to executable files (.exe, .dll) as image files, likewise, when we refer to PE file, we will be including both object files and image files.

Loader and Linker

When we run a program, two of the most important components in the process are the linker and the loader. The linker combines the object files generated by the compiler or assembler, along with other pieces of code, to create an executable file, such as a file with a .exe extension. This process ensures that all references between different parts of the program are correctly resolved, allowing the code to function as a unit.

The loader, on the other hand, is responsible for taking this executable file generated by the linker and loading it into the system’s main memory, preparing it for the processor to execute. Without the loader, the program could not be executed, since the code must be in memory before the CPU can access it.

These two components, the linker and the loader, work together so that a program goes from being simple code to a functioning application, making them essential elements in the life cycle of a program.

Compilation and linking process: Linker and Loader

Let’s see the entire compilation process manually using the MSVC tools (cl and link), which allow us to obtain the files corresponding to each step. We will use cl to compile and generate assembler code and object file, and link to create the final executable.

Initially we have the following C++ code that prints a simple Hello World:

C++ source code that prints Hello World

To obtain the assembler code from this C++ file, we use cl with the /Fa and /Fo options, which allow us to generate the assembler (.asm) and the output object file (.obj):

cl /Fahello.asm /Fohello.obj /c hello.cpp

Compilation with cl generating ASM and OBJ files

This command generates a file called hello.asm that contains the assembler code of the program in ASCII format. It also generates an object file (hello.obj) that contains the corresponding machine code that can be executed by the processor. The object file is not yet an executable program by itself. Nor does it contain all the references to standard libraries or operating system functions, those references are resolved during the linking process.

Generated files: hello.asm and hello.obj

If we pass these two files to Linux we can observe the file type using the file command:

File command showing COFF format of ASM and OBJ files

Once we have the assembler code and the object file generated by cl, we can proceed to the next step to create the executable.

To convert the object file into a Windows executable program, we use link. This ensures that all function and library references are properly managed:

link hello.obj /OUT:hello.exe

Linking object file with link to create executable

Executable hello.exe generated by the linker

This step links the object file and generates an executable called hello.exe. The linker, which is called by the MSVC compiler, connects all the necessary function and library references so that the program can run correctly in a Windows environment.

This whole explanation has been more out of simple curiosity (and because there will be some mention throughout the article), but the important thing was to understand how the compilation and linking process converts our source code into an executable program, managing both the references to functions and libraries as well as the structure of the final file.

The image file generated in this section is not the same as the one in the rest of the article.

Relative Virtual Address (RVA)

Before continuing, let’s look at another important concept on which PE files depend to a great extent. This is the relative virtual address (RVA), this address is basically an offset from where the image (PE file) has been loaded into memory (Image Base).

For example, if an image is loaded into memory with a base address 0x400000 and the RVA to the entry point (main function) is 0x1000. We could obtain the virtual address of the entry point by adding these two values:

Image Base + RVA = VA
- 0x00400000 + 0x00001000 = 0x00401000

In this way we would obtain the virtual address of the main function.

Actually, the RVA value of a method or variable does not necessarily have to always be the offset from the beginning of the file (Image Base). It is normally the relative value from a virtual address, which is commonly the base address of the image but can also be the base address of a section.

Sections

The last important concept before starting with the PE format is that of sections. Sections are the containers of executable file data, they occupy the rest of the PE file after the headers, specifically after the section headers (which we will see at the end).

Sections in a PE file after the headers

Some sections have special names that indicate their purpose, you can see the complete list of them in the Microsoft official documentation on special sections. Still, the most common ones are the following:

.text: Contains the executable code of the program.
.data: Contains initialized data.
.bss: Contains uninitialized data.
.rdata: Contains read-only initialized data.
.edata: Contains export tables.
.idata: Contains import tables.
.reloc: Contains image relocation information.
.rsrc: Contains resources used by the program, which can include images, icons and even embedded binaries.
.tls: Provides specific storage for each thread running in the program.

View of PE sections in PE-Bear

I preferred to introduce this concept at the beginning because it is mentioned more than once, but do not confuse its position, sections are the last thing found in a PE file, as we can see in the image at the beginning of the article.

winnt.h

Throughout the article we are going to see many different definitions and data structures, the common point of all of them is that they are found in the winnt.h header file.

This file is available in the mingw-w64 GitHub repository. All the structures and definitions that we will mention can be consulted directly in the mingw-w64 repository on GitHub.

DOS Header

The DOS header (also called MS-DOS header) is a 64-byte structure that exists at the beginning of the PE file. This header is not new to the PE format, it is the same MS-DOS header that has existed since version 2 of the MS-DOS operating system. The main reason for keeping the same structure intact at the beginning of the PE format is that, when you try to load a file created under Windows version 3.1 or earlier, or MS-DOS version 2.0 or later, the operating system can read the file and understand that it is not compatible.

In other words, if you try to run a Windows NT executable on MS-DOS version 6.0, you will receive the message “This program cannot be run in DOS mode.”.

If the MS-DOS header was not included as the first part of the PE format, the operating system would simply fail to load the file and would offer an error message like “The name specified is not recognized as an internal or external command, operable program or batch file.”

Although it is not a header that is commonly used in modern Windows systems, it is still present for compatibility reasons with older systems. We can see the structure of this header by checking the definition of the “IMAGE_DOS_HEADER” structure located in the winnt.h library:

typedef struct _IMAGE_DOS_HEADER {      // DOS .EXE header
    WORD   e_magic;                     // Magic number
    WORD   e_cblp;                      // Bytes on last page of file
    WORD   e_cp;                        // Pages in file
    WORD   e_crlc;                      // Relocations
    WORD   e_cparhdr;                   // Size of header in paragraphs
    WORD   e_minalloc;                  // Minimum extra paragraphs needed
    WORD   e_maxalloc;                  // Maximum extra paragraphs needed
    WORD   e_ss;                        // Initial (relative) SS value
    WORD   e_sp;                        // Initial SP value
    WORD   e_csum;                      // Checksum
    WORD   e_ip;                        // Initial IP value
    WORD   e_cs;                        // Initial (relative) CS value
    WORD   e_lfarlc;                    // File address of relocation table
    WORD   e_ovno;                      // Overlay number
    WORD   e_res[4];                    // Reserved words
    WORD   e_oemid;                     // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;                   // OEM information; e_oemid specific
    WORD   e_res2[10];                  // Reserved words
    LONG   e_lfanew;                    // File address of new exe header
  } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

Although this header is not widely used, it is true that it contains 6 bytes with important information for the operating system loader:

e_magic [2-bytes/WORD]: This field, also called “magic number”, is positioned right in the first 2 bytes of the PE file. These first two bytes have a fixed hexadecimal value of 0x5A4D, which translates to ‘MZ’ in ASCII. This value is used to identify a type of MS-DOS compatible file, and for this reason, serves as a signature to indicate that it is a valid MS-DOS executable file. As a curiosity, ‘MZ’ refers to Mark Zbikowski, one of the developers of MS-DOS.

e_magic field of the DOS Header with MZ value — e_magic

e_lfanew [4-bytes/DWORD]: This field corresponds to the last 4 bytes of the DOS header structure, starting at byte 60 (an offset of 0x3C from the base address). This field is truly important as it serves as an offset towards the NT headers. The value of these 4 bytes indicates where the NT headers start in the file, which coincides with where the “PE Signature” header begins. The NT headers are the modern successor to the DOS header and are crucial for the operating system to be able to load and execute the file correctly.

e_lfanew field pointing to NT headers — e_lfanew

The DOS header seen from PE-Bear is as follows:

Complete DOS Header view in PE-Bear — DOS Header

We can see the “e_magic” with its fixed value and subsequently the value of “e_lfanew” which would be 0x100, that is, following an offset of 0x100 we would find the start of the NT headers.

DOS Stub

After the DOS header, comes the DOS Stub (offset of 0x40), a small MS-DOS 2.0 compatible executable. Its main function is to display an error message that says: “This program cannot be run in DOS mode” when the file is attempted to run in a DOS environment.

This code is necessary to ensure compatibility with older systems and provide a friendly response instead of an unexplained failure.

Rich Header

Once we have seen the DOS header and the DOS Stub we should move on to the NT headers. However, before them, there is a possible piece of data known as the Rich header that may or may not be present. It is an undocumented structure that is only present in those executables that have been created with Microsoft’s toolset.

Rich Header in PE-Bear hexadecimal view

Rich Header details in PE-Bear

This header is formed by:

A set of data passed through XOR.
The keyword Rich (in PE-Bear it corresponds to the Rich ID field).
An XOR key: which on one hand serves as a checksum and on the other as a key itself to decrypt the XOR data.

This structure is generated by the linker and contains some metadata about the tools used to build the executable, for example:

Rich Header analysis with rich.py showing compiler modules

In the image we can observe the Rich header analysis of our image file using rich.py. The script details the compilation environment, VS2008 SP1 build 30729 in this case. Additionally, different modules are listed through specific IDs and versions, along with a count indicating the number of times each module was used during the image construction. For example, the module with ID 259, version 33808, appears 3 times. On the other hand, the line indicating unmarked objects (Unmarked objects count=64) refers to sections of the executable that are not directly associated with specific compiler tools, but that are still part of the final file.

This field not only contains relevant information about the compilation environment profile, but can also be very useful to be used as a signature or fingerprint. The count also gives possible indications about the size of the project, and the checksum can also be used as a signature.

As it contains this type of information, malware developers often modify this header to not provide this information or take advantage of this header to perform other actions. If you want to see more information I leave some links here:

Difference in Execution Flow Between a Windows System and MS-DOS

Based on what we have seen so far, the behavior when executing an image file between both systems would be as follows:

Execution flow comparing Windows and MS-DOS — Based on the same diagram from: A dive into the PE file format - PE file structure - Part 2: DOS Header, DOS Stub and Rich Header

NT Headers (IMAGE_NT_HEADERS)

The NT headers are a structure defined in the winnt.h library as “IMAGE_NT_HEADERS”:

typedef struct _IMAGE_NT_HEADERS64 {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;

typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

If we look at its definition, we can find three members:

Signature [4-bytes/DWORD]
FileHeader [IMAGE_FILE_HEADER/Structure]
OptionalHeader [IMAGE_OPTIONAL_HEADER/Structure]

Additionally, the structure is defined in two different versions, one for 32-bit executables (also referred to as PE32 executables) and another for 64 bits (referred to as PE32+). Although if we look closely, this difference is only reflected in the “OptionalHeader” member.

PE Signature

The PE file signature is the first member of the NT headers structure, it is a DWORD data type, so it occupies 4 bytes. We can visualize it in PE-Bear:

PE signature with value 0x50450000 in PE-Bear

The signature always has a fixed value of 0x50450000, which translates to ASCII as PE\0\0 (P + E + Null Byte + Null Byte). The purpose of the signature is to serve as a mark that tells the operating system that it is indeed an executable in PE format.

File Header (IMAGE_FILE_HEADER)

This structure stores information about the PE file, it is also called “COFF file header”. It is defined in winnt.h as “IMAGE_FILE_HEADER”:

typedef struct _IMAGE_FILE_HEADER {
    WORD    Machine;
    WORD    NumberOfSections;
    DWORD   TimeDateStamp;
    DWORD   PointerToSymbolTable;
    DWORD   NumberOfSymbols;
    WORD    SizeOfOptionalHeader;
    WORD    Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Let’s look at each member of the structure:

Machine [2-bytes/WORD]: Specifies the target architecture for the executable (x86, x64, ARM, etc). Although we are only interested in two, 0x8864 for AMD64 and 0x14c for i386. The complete list of values can be seen in the Microsoft documentation on machine types.
NumberOfSections [2-bytes/WORD]: This field stores the number of sections, in other words, the number of section headers (size of the section table).
TimeDateStamp [4-bytes/DWORD]: This value indicates the creation or compilation time of the program in the form of epoch timestamp (1728682065), which measures the seconds elapsed since 01/01/1970 00:00:00. An interesting aspect to keep in mind is the risk of an integer overflow in 2038 due to the limited space available in this DWORD field.
PointerToSymbolTable and NumberOfSymbols [4-bytes/DWORD]: These two fields contain the file offset to the COFF symbol table and the number of entries in that symbol table. However they are set to 0 because COFF debugging information is obsolete in modern PE files (no COFF symbol table present).
SizeOfOptionalHeader [2-bytes/WORD]: Stores the size of the optional header. For PE32 it is usually 0x00E0 (224 bytes) and for PE32+ it is usually 0X00F0 (240 bytes).
Characteristics [2-bytes/WORD]: Contains flags that indicate the attributes of the file. These attributes can be things like whether the file is executable, whether it is a system file and not a user program, and many other things. The complete list of attributes can be found in the Microsoft documentation on characteristics, some of them would be:
- 0x0002: Executable file
- 0x0020: The application can handle addresses greater than 2 GB
- 0x0100: The file is a DLL
- 0x2000: The file is a system file.
- 0x4000: The file is a loadable driver.
- 0x8000: The file is media-aware.

The file header of our image file would be as follows:

File Header with architecture and executable characteristics

File Header details in hexadecimal view

Optional Header (IMAGE_OPTIONAL_HEADER)

This header is the most important of the NT headers. The system loader will look at the information given by this header to be able to load and execute the executable. It is called optional header because some file types like object files do not have it, however, it is essential for image files. This header does not have a fixed value, that’s why the “SizeOfOptionalHeader” member exists in the “IMAGE_FILE_HEADER” structure.

The first 8 members of the optional header are standard for each implementation of the COFF file format, the rest of the header is an extension of the standard COFF optional header defined by Microsoft, these additional members of the structure are necessary for the Windows loader and linker.

As mentioned previously, there are two versions for this header, one for 32-bit executables and another for 64. The two versions are different in two aspects:

The size of the structure itself (or the number of members defined within the structure): “IMAGE_OPTIONAL_HEADER32” has 31 members while “IMAGE_OPTIONAL_HEADER64” only has 30 members, that additional member in the 32-bit version is a DWORD called “BaseOfData” that contains the RVA of the beginning of the data section.
The data type of some of the members: The following 5 members of the “IMAGE_OPTIONAL_HEADER” structure are defined as DWORD data type in the 32-bit version and as ULONGLONG in the 64-bit version:
- “ImageBase”
- “SizeOfStackReserve”
- “SizeOfStackCommit”
- “SizeOfHeapReserve”
- “SizeOfHeapCommit”

Let’s see the definition of both structures:

32 bits

typedef struct _IMAGE_OPTIONAL_HEADER {
    //
    // Standard fields.
    //

    WORD    Magic;
    BYTE    MajorLinkerVersion;
    BYTE    MinorLinkerVersion;
    DWORD   SizeOfCode;
    DWORD   SizeOfInitializedData;
    DWORD   SizeOfUninitializedData;
    DWORD   AddressOfEntryPoint;
    DWORD   BaseOfCode;
    DWORD   BaseOfData;

    //
    // NT additional fields.
    //

    DWORD   ImageBase;
    DWORD   SectionAlignment;
    DWORD   FileAlignment;
    WORD    MajorOperatingSystemVersion;
    WORD    MinorOperatingSystemVersion;
    WORD    MajorImageVersion;
    WORD    MinorImageVersion;
    WORD    MajorSubsystemVersion;
    WORD    MinorSubsystemVersion;
    DWORD   Win32VersionValue;
    DWORD   SizeOfImage;
    DWORD   SizeOfHeaders;
    DWORD   CheckSum;
    WORD    Subsystem;
    WORD    DllCharacteristics;
    DWORD   SizeOfStackReserve;
    DWORD   SizeOfStackCommit;
    DWORD   SizeOfHeapReserve;
    DWORD   SizeOfHeapCommit;
    DWORD   LoaderFlags;
    DWORD   NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

64 bits

typedef struct _IMAGE_OPTIONAL_HEADER64 {
    WORD        Magic;
    BYTE        MajorLinkerVersion;
    BYTE        MinorLinkerVersion;
    DWORD       SizeOfCode;
    DWORD       SizeOfInitializedData;
    DWORD       SizeOfUninitializedData;
    DWORD       AddressOfEntryPoint;
    DWORD       BaseOfCode;
    ULONGLONG   ImageBase;
    DWORD       SectionAlignment;
    DWORD       FileAlignment;
    WORD        MajorOperatingSystemVersion;
    WORD        MinorOperatingSystemVersion;
    WORD        MajorImageVersion;
    WORD        MinorImageVersion;
    WORD        MajorSubsystemVersion;
    WORD        MinorSubsystemVersion;
    DWORD       Win32VersionValue;
    DWORD       SizeOfImage;
    DWORD       SizeOfHeaders;
    DWORD       CheckSum;
    WORD        Subsystem;
    WORD        DllCharacteristics;
    ULONGLONG   SizeOfStackReserve;
    ULONGLONG   SizeOfStackCommit;
    ULONGLONG   SizeOfHeapReserve;
    ULONGLONG   SizeOfHeapCommit;
    DWORD       LoaderFlags;
    DWORD       NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER64, *PIMAGE_OPTIONAL_HEADER64;

Magic [2-bytes/WORD]: is a field that identifies the state of the image. The Microsoft documentation on Optional Header standard fields mentions three common values:
- 0x10B: Identifies the image as a PE32 executable.
- 0x20B: Identifies the image as a PE32+ executable (aka. 64 bits)
- 0x107: Identifies the image as ROM.

The value of this field is what determines whether the executable is 32 or 64 bits, the “Machine” member of the “IMAGE_FILE_HEADER” structure is ignored by the Windows PE loader, instead, this one is used.

MajorLinkerVersion and MinorLinkerVersion [1-byte/BYTE]: Indicate the major and minor version number of the linker used to create the executable file.
SizeOfCode [4-bytes/DWORD]: Stores the total size, in bytes, of all sections containing executable code (normally the .text section).
SizeOfInitializedData [4-bytes/DWORD]: Indicates the total size, in bytes, of all sections containing initialized data (normally the .data section).
SizeOfUninitializedData [4-bytes/DWORD]: Stores the total size, in bytes, of all sections containing uninitialized data (normally the .bss section).
AddressOfEntryPoint [4-bytes/DWORD]: It is an RVA (Relative Virtual Address) that points to the entry point of the image when loaded into memory. In executable applications, this value points to the start of the main function (for example, main or WinMain). In device drivers, it points to the initialization function. For DLLs, the entry point is optional, if it does not exist, this field is set to 0.
BaseOfCode [4-bytes/DWORD]: It is an RVA that indicates the starting address of the code section (normally .text) in memory once the file is loaded.
BaseOfData [4-bytes/DWORD]: It is an RVA that points to the start of the data section (normally .data) in memory after loading the file. This field does not exist in the PE32+ format.
ImageBase [4-bytes/DWORD in PE32 and 8-bytes/ULONGLONG in PE32+]: Contains the preferred base address for loading the first byte of the image into memory. This value must be a multiple of 64 KB. However, due to protection mechanisms such as ASLR (Address Space Layout Randomization) and other reasons, the image is often not loaded at this address. In that case, the PE loader chooses an unused memory area to load the image and then performs a process called relocation. During relocation, the internal addresses of the image are adjusted so that they work with the new load base. There is a special section, called relocation section (.reloc), that contains information about the places that need to be adjusted if relocation is required.
SectionAlignment [4-bytes/DWORD]: Specifies the alignment, in bytes, of the sections when they are loaded into memory. Sections are aligned at boundaries that are multiples of this value. By default, it is usually the page size of the architecture (for example, 4 KB) and cannot be less than the FileAlignment value.
FileAlignment [4-bytes/DWORD]: Specifies the alignment, in bytes, of section data in the file (on disk). If the actual size of a section’s data is less than FileAlignment, the rest is padded with zeros to meet the alignment. This value must be a power of 2 between 512 and 64 KB. If SectionAlignment is less than the architecture’s page size, then FileAlignment and SectionAlignment must be equal.
MajorOperatingSystemVersion, MinorOperatingSystemVersion [2-bytes/WORD]: Specify the major and minor version of the operating system required to run the file.
MajorImageVersion and MinorImageVersion [2-bytes/WORD]: Indicate the major and minor version of the file image. These versions can be used by tools or systems to determine compatibility.
MajorSubsystemVersion and MinorSubsystemVersion [2-bytes/WORD]: Specify the major and minor version of the required subsystem. By subsystem it refers to the execution environment, which can be a Windows graphical application (GUI), a console application (CUI), an EFI environment (Extensible Firmware Interface), or even a native application that interacts directly with the operating system kernel, among others.
Win32VersionValue [4-bytes/DWORD]: Reserved field that, according to official documentation, must be set to 0.
SizeOfImage [4-bytes/DWORD]: Indicates the total size of the image in memory (in bytes), including all headers and sections. It is rounded to the nearest multiple of “SectionAlignment”, as this value is used when loading the image into memory.
SizeOfHeaders [4-bytes/DWORD]: Is the combined size, in bytes, of the “DOS stub”, PE headers (NT headers) and section headers, rounded to the nearest multiple of “FileAlignment”.
CheckSum [4-bytes/DWORD]: Field used to store the checksum of the image, allowing verification of the file’s integrity. The checksum is a value calculated from the file content that acts as a fingerprint. If any byte of the file changes (for example, due to corruption or manipulation), the resulting checksum will also change, which allows detecting inconsistencies or damage to the file. If this field is set to 0, the checksum is not calculated or verified.
Subsystem [2-bytes/WORD]: Specifies the subsystem required to run this file. Indicates the type of interface the program uses, such as whether it is a console application, a Windows GUI application, a driver, etc. The complete list of possible values is found in the Microsoft documentation on Windows subsystems.
DLLCharacteristics [2-bytes/WORD]: Defines various characteristics of the image file, such as support for NX (No eXecute) or whether the image can be relocated at runtime. Although it is called “DLLCharacteristics” for historical reasons, it also applies to normal executable files (EXE). The complete list of available flags can be consulted in the Microsoft documentation on DLL characteristics.
SizeOfStackReserve [8-bytes/ULONGLONG]: Specifies the total amount of memory that is reserved for the main thread’s stack.
SizeOfStackCommit [8-bytes/ULONGLONG]: Indicates the initial amount of stack memory that is committed.
SizeOfHeapReserve [8-bytes/ULONGLONG]: Specifies the total amount of memory that is reserved for the process’s local heap.
SizeOfHeapCommit [8-bytes/ULONGLONG]: Indicates the initial amount of heap memory that is committed.

Note: The “reserve” values indicate how much virtual memory is reserved, while the “commit” values indicate how much physical memory is initially allocated. Reserved memory can be committed later as needed.

LoaderFlags [4-bytes/DWORD]: Reserved field that must be set to 0 according to the Microsoft documentation on Windows-specific fields.
NumberOfRvaAndSizes [4-bytes/DWORD]: Indicates the number of entries in the “DataDirectory” array. Specifies how many “IMAGE_DATA_DIRECTORY” structures follow.
DataDirectory [IMAGE_DATA_DIRECTORY/Structure]: Is an array of “IMAGE_DATA_DIRECTORY” structures, where each entry provides the address and size of a specific table or information, such as the import table, exports, resources, etc. The number of entries is defined by “NumberOfRvaAndSizes”.
- Now we will talk in more detail about this member.

The optional header in our PE looks like this:

Optional Header showing Magic, Entry Point and alignments

In the image we can observe that the “Magic” field has a value of 0x20B, which means that the file is a PE32+ executable. This format allows the file to use the full capacity of the 64-bit virtual address space.

The RVA (Relative Virtual Address) of the entry point is 0x14E0, which indicates the relative address where code execution will begin once loaded into memory. The code section starts at address 0x1000, and is aligned according to the “Section Alignment” value.

Additionally, the “File Alignment” field, with a value of 0x200, defines how sections are aligned in the file on disk, while in memory they follow the “Section Alignment” of 0x1000.

.data section with zero padding for File Alignment

For example, the .data section has content from 0x2A00 to 0x2A70, and the rest of the section is padded with zeros to 0x2BFF, complying with the alignment defined by the “FileAlignment” field. This alignment ensures that sections on disk are organized in blocks of a predetermined size, while “SectionAlignment” defines how these sections are organized in memory once the file is loaded.

As for other important members, “SizeOfImage” has a value of 0x9000, which represents the total size of the image loaded in memory, and this value is a multiple of “SectionAlignment” (0x1000). “SizeOfHeaders”, on the other hand, has a value of 0x400 and is aligned according to “FileAlignment”, which ensures that the file headers occupy a complete block on disk, facilitating their correct reading by the operating system.

Finally, the “Subsystem” field has a value of 3, indicating that this file is a console application, which is consistent with the option I selected when creating the project in Visual Studio:

Console subsystem selection in Visual Studio

Data Directories (IMAGE_DATA_DIRECTORY)

As we said before, the last member of the “IMAGE_OPTIONAL_HEADER” structure is an array of “IMAGE_DATA_DIRECTORY” structures.

IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];

“IMAGE_NUMBEROF_DIRECTORY_ENTRIES” is a constant defined with the value 16, which means that the standard PE file can have up to 16 “IMAGE_DATA_DIRECTORY” entries.

#define IMAGE_NUMBEROF_DIRECTORY_ENTRIES    16

The “IMAGE_DATA_DIRECTORY” structure is as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD   VirtualAddress;
    DWORD   Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

Compared to other PE file structures, “IMAGE_DATA_DIRECTORY” is quite simple, as it only has two members: “VirtualAddress”, which contains the RVA that points to the start of the corresponding entry, and “Size”, which defines the size of that entry.

So, the “Data Directories” member of “IMAGE_OPTIONAL_HEADER” is nothing more than a table containing the addresses and sizes to other important parts of the executable that are useful for the operating system loader. For example, an important directory is the “Import Directory”, as it contains a list of external functions imported from other libraries.

Not all directories have the same structure, the “IMAGE_DATA_DIRECTORY.VirtualAddress” points to whatever directory, but the type of directory is what determines how the chunk of data will be interpreted.

In the winnt.h library we can find a series of defined “Data Directories”:

// Directory Entries

#define IMAGE_DIRECTORY_ENTRY_EXPORT          0   // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT          1   // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE        2   // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION       3   // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY        4   // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC       5   // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG           6   // Debug Directory
//      IMAGE_DIRECTORY_ENTRY_COPYRIGHT       7   // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE    7   // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR       8   // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS             9   // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG    10   // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT   11   // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT            12   // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT   13   // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14   // COM Runtime descriptor

As we could observe before in the optional header, we can find the directory table with their respective values:

Data Directories table with addresses and sizes

When an entry has both values (“Address” and “Size”) at zero it means that that specific data directory is not used (does not exist). Next, before continuing we are going to briefly mention the export directory and the import address table, which are two important entries of the “Data Directories”.

Export Directory

The export directory of an image file is a data structure that contains information about the functions and variables that are exported from the executable.

typedef struct _IMAGE_EXPORT_DIRECTORY {
    ULONG   Characteristics;
    ULONG   TimeDateStamp;
    USHORT  MajorVersion;
    USHORT  MinorVersion;
    ULONG   Name;
    ULONG   Base;
    ULONG   NumberOfFunctions;
    ULONG   NumberOfNames;
    PULONG  *AddressOfFunctions;
    PULONG  *AddressOfNames;
    PUSHORT *AddressOfNameOrdinals;
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

It contains the addresses of the exported functions and variables, which can be used by other executable files to access those functions and data. The export directory is generally found in DLLs that export functions (for example, kernel32.dll exporting CreateFileA).

Import Address Table (IAT)

The import address table is a data structure in an image file that contains information about the addresses of functions imported from other executable files. These addresses are used to access functions and data in other executables (for example, Programita.exe importing CreateFileA from kernel32.dll).

Section Headers

After the optional header we can find the section headers. These headers contain information about the sections of the PE file. A section header is a structure called “IMAGE_SECTION_HEADER” defined in winnt.h:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE    Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
            DWORD   PhysicalAddress;
            DWORD   VirtualSize;
    } Misc;
    DWORD   VirtualAddress;
    DWORD   SizeOfRawData;
    DWORD   PointerToRawData;
    DWORD   PointerToRelocations;
    DWORD   PointerToLinenumbers;
    WORD    NumberOfRelocations;
    WORD    NumberOfLinenumbers;
    DWORD   Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Name [1-byte/BYTE]: the first field of the section header is an array of the size of “IMAGE_SIZEOF_SHORT_NAME”:

#define IMAGE_SIZEOF_SHORT_NAME 		8

Having a default value of 8, it means that the section name cannot be longer than 8 characters. For executables this value is maintained, for other types of files there are some options to be able to set longer names.

PhysicalAddress or VirtualSize [4-bytes/DWORD]: This field is a union and can be called “PhysicalAddress” or “VirtualSize”. In object files, it is called “PhysicalAddress” and contains the total size of the section. In executable images, it is called “VirtualSize” and contains the total size of the section when loaded into memory.
VirtualAddress [4-bytes/DWORD]: Contains the relative virtual address (RVA) of the first byte of the section with respect to the image base when loaded into memory. For object files, contains the address of the first byte of the section before relocations are applied.
SizeOfRawData [4-bytes/DWORD]: This field contains the size of the section on disk, it must be a multiple of “IMAGE_OPTIONAL_HEADER.FileAlignment”.
PointerToRawData [4-bytes/DWORD]: A pointer to the beginning of the section within the file, for executable images, it must be a multiple of “IMAGE_OPTIONAL_HEADER.FileAlignment”.
PointerToRelocations [4-bytes/DWORD]: A file pointer to the beginning of the section’s relocation entries. Set to 0 for executable files.
PointerToLineNumbers [4-bytes/DWORD]: A file pointer to the beginning of COFF line number entries for the section. Set to 0 because COFF debugging information is obsolete.
NumberOfRelocations [2-bytes/WORD]: The number of relocation entries for the section, set to 0 for executable images.
NumberOfLinenumbers [2-bytes/WORD]: The number of COFF line number entries for the section, set to 0 because COFF debugging information is obsolete.
Characteristics [4-bytes/DWORD]: Contains flags that describe the characteristics of the section. These flags indicate whether the section contains executable code, initialized or uninitialized data, whether it can be shared in memory, among others. A complete list of these flags can be found in the Microsoft documentation on section flags.

Of all these mentioned members, an important detail is that the value of “SizeOfRawData” and “VirtualSize” can be different. What do I mean and why does this happen?

“SizeOfRawData” must be a multiple of “IMAGE_OPTIONAL_HEADER.FileAlignment” (0x200 in hexadecimal, 512 in decimal). Therefore, if the actual size of the section on disk is less than “FileAlignment” or is not a multiple of this value, “SizeOfRawData” will be rounded to the next nearest multiple. For example, if the size of a section on disk is 600 bytes, it will be rounded to the next multiple of 512 (0x200), which is 1024 bytes (0x400).

On the other hand, “VirtualSize” represents the actual size of the section in memory. Unlike “SizeOfRawData”, “VirtualSize” does not need to be a multiple of any alignment value. However, the virtual address where the section begins (“VirtualAddress”) must be aligned according to “IMAGE_OPTIONAL_HEADER.SectionAlignment” (0x1000 in hexadecimal, 4096 in decimal).

Due to these differences, it can happen that the size of the section on disk is larger than the size of the section in memory. This happens because “SizeOfRawData” is aligned to the next multiple of “FileAlignment”, which can introduce unused space in the file.

Conversely, it can also happen that “VirtualSize” is larger than “SizeOfRawData”. This occurs if the section contains uninitialized data, such as global or static variables with no assigned value, commonly located in the .bss section. This uninitialized data does not take up space on disk because there is no real information to store, “SizeOfRawData” does not include its size. However, when the executable is loaded into memory, the operating system reserves space for this data, generally initializing it to zero. In this way, the section expands in memory to include the necessary space for the uninitialized data, and “VirtualSize” reflects this larger size.

In summary, “SizeOfRawData” can be larger than “VirtualSize” due to on-disk alignment that adds unused space. On the other hand, “VirtualSize” can be larger than “SizeOfRawData” when the section includes uninitialized data that requires space in memory but does not occupy space on disk.

Well with all this, the section headers of our executable in PE-Bear look like this:

Section headers showing addresses and sizes

In the image we can observe the following columns (among others):

“Raw. Addr.” —> “IMAGE_SECTION_HEADER.PointerToRawData”
“Raw Size” —> “IMAGE_SECTION_HEADER.SizeOfRawData”
“Virtual Addr.” —> “IMAGE_SECTION_HEADER.VirtualAddress”
“Virtual Size” —> “IMAGE_SECTION_HEADER.VirtualSize”

Each pair of fields (Raw and Virtual) are used to calculate where a section ends, both on disk and in memory, which is exactly what we have been talking about before.

For example, the .text section on disk has an address 0x400 and a size of 0x1200. Adding both, we get 0x1600, which marks the final limit of the .text section and the start of the next. This value indicates the first byte immediately after the .text section in the file on disk.

In memory, we can do a similar calculation. The .text section has a virtual address (“VirtualAddress”) of 0x1000 and a virtual size (“VirtualSize”) of 0x10C9. Although the actual size of the section is 4297 bytes (0x10C9 in hexadecimal), sections must start at addresses that are multiples of “IMAGE_OPTIONAL_HEADER.SectionAlignment” (0x1000 in hexadecimal, 4096 in decimal). This means that the .text section starts at 0x1000.

Because the “VirtualSize” of the .text section (4297 bytes) exceeds the size of “SectionAlignment” (4096 bytes), the section occupies more than one alignment interval. By adding the initial virtual address and the virtual size (0x1000 + 0x10C9), we get 0x20C9, which is where the .text section ends in memory. However, sections must start at addresses that are multiples of “SectionAlignment”, so the next section starts at the next aligned address after 0x20C9. The next multiple of 0x1000 after 0x20C9 is 0x3000.

Therefore, although the “VirtualSize” of the .text section is not a multiple of “SectionAlignment”, the operating system reserves space in memory from 0x1000 to 0x3000 for the .text section, extending its final limit to 0x3000, which is the beginning of the next section in memory, the .rdata section. So there is unused space between the actual end of the .text section data and the start of the next section.

To finish, the “Characteristics” field indicates that some sections are read-only, others allow reading and writing, and some are executable. On the other hand, the “Ptr to Reloc.”, “Num. of Reloc.” and “Num. of Linenum.” fields are at zero, which is normal for an image file.

Conclusion

Well, up to this point we have seen practically the most fundamental aspects of the Portable Executable (PE) format, it’s not everything because there are still things to see but without a doubt now you know the fundamentals. Knowing how this type of file is structured is truly important if you want to learn both malware development and analysis.