Notes on the format of DOS .EXE files

Introduction

This page is intended to document the format of DOS executable files which have a filename extension of .EXE. Note that other formats (e.g. .COM) are not covered.

This information is needed by anyone who wishes to write programs that process such files, including bootstraps and program loaders.

Note that this covers only the 16-bit executable files used by DOS, and not those used by other systems such as Windows and OS/2, even though they also have a filename extension of .EXE.

Overview

.EXE program files contain information that permits the program to use the full capabilities of the 8086 family. The linker places all this extra information in a header area at the start of the .EXE file. Although the .EXE file structure could accommodate a header as small as 32 bytes, it is usual to make the header a multiple of 512 bytes in size, as this makes things easier for programs (such as loaders) that have to manipulate the file. The header contains information about memory requirements, and details of adjustments (relocations) that need to be made when the program is loaded.

When loading a program from a .EXE file, the loader will generally read the header into a separate temporary work area.

Hexadecimal numbers are indicated using the convention that they are followed by the letter H; the format commonly used in C (a leading 0x) is not used. All offsets are relative to the start of the header; however, since the header is the first item in the .EXE file, the offsets are also relative to the start of the file.

The term page is sometimes used; in this context it means the units in which parts of the file are stored. The size of a page, for this purpose, is 512 bytes.

The term paragraph refers to a size of 16 bytes. The name comes from the way 8086 memory segmentation works: a segment address counts 16-byte units, so every segment begins on a 16-byte boundary.
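The relationship between segments and paragraphs is easiest to see as arithmetic. This minimal Python sketch (the function name `linear_address` is illustrative only) converts a segment:offset pair to a physical address by multiplying the segment by 16 and adding the offset; the mask to 20 bits models the 1 MB address space of the 8086.

```python
PARAGRAPH = 16  # bytes per paragraph


def linear_address(segment: int, offset: int) -> int:
    """Convert a real-mode segment:offset pair to a 20-bit linear address."""
    # The segment value counts paragraphs, so it is scaled by 16 before
    # the offset within the segment is added. Masking to 20 bits mimics
    # the 8086, whose addresses wrap around at 1 MB.
    return (segment * PARAGRAPH + offset) & 0xFFFFF


print(hex(linear_address(0x1234, 0x0010)))  # prints 0x12350
```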

The program segment prefix (or PSP) is a data structure used by DOS to record information about the running program. It is normally stored in memory immediately before the load module, and is 256 bytes in size. If a non-DOS system is in use, the PSP may not exist.

Format of the .EXE file header

The first 1CH bytes of the header are commonly known as the formatted header, since their position and contents are fixed. Optional information (e.g. that used by overlay managers) can follow this area. In the absence of such optional information, the relocation pointer table (if present) immediately follows the formatted header area.

In the following, values that occupy more than one byte are stored with the least significant byte first. For example, a value of 1234H would be stored as a byte containing 34H followed by a byte containing 12H.
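In Python, for example, the standard `struct` module's `<H` format (or `int.from_bytes` with `"little"`) reads a word stored in this byte order. A short check using the example value:

```python
import struct

raw = bytes([0x34, 0x12])  # how the value 1234H appears in the file

# "<H" means a little-endian unsigned 16-bit word
(value,) = struct.unpack("<H", raw)
assert value == 0x1234

# int.from_bytes gives the same result without a format string
assert int.from_bytes(raw, "little") == 0x1234

# Packing the word again produces the bytes in the same order
assert struct.pack("<H", 0x1234) == b"\x34\x12"
```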

Offset Size (bytes) Description
00H 2 Signature Word. This contains a 'magic number' which provides a simple check that the file really is a DOS .EXE file; it follows that the filename extension does not in fact have to be .EXE, as long as programs check this word. The value of this word is 5A4DH (with the 4DH coming first). These two bytes represent the character string 'MZ', the initials of Mark Zbikowski, a Microsoft employee at the time the file format was designed.
02H 2 Last Page Size. The file occupies a number of 512 byte pages. The last page may contain between 1 and 512 bytes. This word indicates the number of bytes actually used in the last page, with the special case of a full page being represented by a value of zero (since the last page is never empty).
04H 2 File Pages. This word contains a count of the number of pages required to hold the file. For example, if the file contains 1024 bytes, this word would contain 0002H; if the file contains 1025 bytes, this word would contain 0003H. The Last Page Size field is used to determine the number of valid bytes in the final page. Thus, if the file contains 1024 bytes, the Last Page Size field contains 0000H, because no bytes overflow into a final partly used page. If the file contains 1025 bytes, then the Last Page Size field contains 0001H, because the final page contains only one valid byte (the 1025th byte).
06H 2 Relocation Items. This word gives the number of entries that exist in the relocation pointer table. It is quite in order for this value to be zero, in which case there are no relocation entries.
08H 2 Header Paragraphs. This word gives the size of the .EXE header in paragraphs. It indicates the offset of the program's compiled/assembled and linked image (the load module) within the .EXE file. The size of the load module can be deduced by subtracting this value (converted to bytes) from the overall file size derived from combining the File Pages and Last Page Size values. The header always occupies a whole number of paragraphs, so the load module always begins on a paragraph boundary.
0AH 2 MINALLOC. This word indicates the minimum number of paragraphs the program requires to begin execution. This is in addition to the memory required to hold the load module. This value normally represents the total size of any uninitialised data and/or stack segments that are linked at the end of a program. This space is not directly included in the load module, since there are no particular initialising values and it would simply waste disk space.
0CH 2 MAXALLOC. This word indicates the maximum number of paragraphs that the program would like allocated to it before it begins execution. This indicates additional memory over and above that required by the load module and the value specified by MINALLOC. If the request cannot be satisfied, the program is allocated as much memory as is available.
0EH 2 Initial SS value. This word contains the paragraph address of the stack segment relative to the start of the load module. At load time, this value is relocated by adding the address of the start segment of the program to it, and the resulting value is placed in the SS register before the program is started. In DOS, the start segment of the program is the first segment boundary in memory after the PSP.
10H 2 Initial SP value. This word contains the absolute value that must be loaded into the SP register before the program is given control. Since the actual stack segment is determined by the loader, and this is merely a value within that segment, it does not need to be relocated.
12H 2 Complemented Checksum. This word contains a checksum of the contents of the .EXE file. Its value is rarely checked, but its purpose is to ensure the integrity of the data within the file. Full details of how it is calculated appear in the section on checksum calculation.
14H 2 Initial IP value. This word contains the absolute value that should be loaded into the IP register in order to transfer control to the program. Since the actual code segment is determined by the loader, and this is merely a value within that segment, it does not need to be relocated.
16H 2 Pre-relocated initial CS value. This word contains the initial value, relative to the start of the load module, that should be placed in the CS register in order to transfer control to the program. At load time, this value is relocated by adding the address of the start segment of the program to it, and the resulting value is placed in the CS register when control is transferred.
18H 2 Relocation table offset. This word gives the offset from the start of the file to the relocation pointer table. This value must be used to locate the relocation pointer table (rather than assuming a fixed location) because variable-length information pertaining to program overlays can occur before this table, causing its position to vary. A value of 40H in this field generally indicates a different kind of executable file, not a DOS 'MZ' type.
1AH 2 Overlay number. This word is normally set to 0000H, because few programs actually have overlays. It contains a non-zero value only in files that hold program overlays; the main program itself has overlay number 0000H.
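As an illustration, the formatted header can be decoded in one step as fourteen little-endian words. The sketch below uses Python's `struct` module; the `ExeHeader` field names and the helper functions are hypothetical, chosen to mirror the descriptions above. It also shows how the File Pages, Last Page Size and Header Paragraphs fields combine to give the size of the load module.

```python
import struct
from typing import NamedTuple


class ExeHeader(NamedTuple):
    # Fields in file order, offsets 00H to 1AH
    signature: int
    last_page_size: int
    file_pages: int
    relocation_items: int
    header_paragraphs: int
    min_alloc: int
    max_alloc: int
    initial_ss: int
    initial_sp: int
    checksum: int
    initial_ip: int
    initial_cs: int
    relocation_offset: int
    overlay_number: int


FORMATTED_HEADER_SIZE = 0x1C  # fourteen words
PAGE_SIZE = 512
PARAGRAPH = 16


def parse_header(data: bytes) -> ExeHeader:
    """Decode the formatted header from the start of a file's contents."""
    if len(data) < FORMATTED_HEADER_SIZE:
        raise ValueError("file too short for an .EXE header")
    hdr = ExeHeader(*struct.unpack_from("<14H", data, 0))
    if hdr.signature != 0x5A4D:  # the bytes 'M', 'Z'
        raise ValueError("missing MZ signature")
    return hdr


def image_size(hdr: ExeHeader) -> int:
    """File size in bytes as described by File Pages and Last Page Size."""
    size = hdr.file_pages * PAGE_SIZE
    if hdr.last_page_size != 0:
        # Only part of the last page is in use; zero means a full page.
        size -= PAGE_SIZE - hdr.last_page_size
    return size


def load_module_size(hdr: ExeHeader) -> int:
    """Bytes remaining once the header (in paragraphs) is subtracted."""
    return image_size(hdr) - hdr.header_paragraphs * PARAGRAPH
```

For a file of 1025 bytes (File Pages 0003H, Last Page Size 0001H) with a two-paragraph header, `image_size` returns 1025 and `load_module_size` returns 993.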

Relocation pointer table

The relocation pointer table consists of a list of pointers to words within the load module that must be adjusted before the program is given control. These words hold references made by the program to its individual segments. These segment address references must be adjusted when the program is loaded, because the program can be loaded at any segment address.

Each pointer in the table consists of two words. The first word contains an offset from the segment address given in the second word, which in turn indicates a segment address relative to the start of the load module. Together, these two words point to a third word within the load module that must have the start segment address added to it (the start segment address corresponds to the segment address at which the beginning of the program's image has been loaded).
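A minimal sketch of applying the relocation pointer table, assuming the whole file is already in memory as a `bytes` object and that `start_segment` (a hypothetical parameter) is the segment at which the load module will be placed; under DOS this is the first segment boundary after the PSP. Each target word lies at segment * 16 + offset within the load module, and the addition wraps at 16 bits.

```python
import struct


def relocate(exe: bytes, start_segment: int) -> bytearray:
    """Return the load module from `exe` with segment references fixed up."""
    # Words at 02H..08H: Last Page Size, File Pages, Relocation Items,
    # Header Paragraphs; the table offset is the word at 18H.
    last_page, pages, items, header_paras = struct.unpack_from("<4H", exe, 0x02)
    (table_offset,) = struct.unpack_from("<H", exe, 0x18)

    file_size = pages * 512 - ((512 - last_page) if last_page else 0)
    module = bytearray(exe[header_paras * 16:file_size])

    for i in range(items):
        # Each entry is an offset word followed by a segment word.
        offset, segment = struct.unpack_from("<HH", exe, table_offset + 4 * i)
        target = segment * 16 + offset  # byte position within the load module
        (word,) = struct.unpack_from("<H", module, target)
        struct.pack_into("<H", module, target, (word + start_segment) & 0xFFFF)
    return module
```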

Checksum calculation

The checksum word contains the one's complement of the summation of all of the words in the .EXE file (excluding itself). The procedure for calculating the checksum is as follows:

  1. Temporarily set the Complemented Checksum word to 0000H.
  2. If the file contains an odd number of bytes, temporarily assume that there is an additional byte at the end with a value of 00H.
  3. Add together all of the words in the file, ignoring overflow, to obtain a value that is still one word in length.
  4. Perform a one's complement operation on the total, and store this in the Complemented Checksum word.

The validity of a .EXE file can then be checked by performing steps 2 and 3 above; the total should be 0FFFFH.
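The procedure above translates directly into code. In this sketch (the function names are illustrative), the checksum word at offset 12H is treated as zero when the checksum is computed, and left as stored when the file is verified, so a correctly stamped file sums to 0FFFFH.

```python
import struct


def checksum_total(exe: bytes, zero_checksum: bool = False) -> int:
    """Sum all words of the file modulo 10000H (steps 1 to 3)."""
    data = bytearray(exe)
    if zero_checksum:
        data[0x12:0x14] = b"\x00\x00"  # step 1: treat the checksum as 0000H
    if len(data) % 2:
        data.append(0)  # step 2: pad an odd-length file with a 00H byte
    total = 0
    for (word,) in struct.iter_unpack("<H", data):
        total = (total + word) & 0xFFFF  # step 3: ignore overflow
    return total


def compute_checksum(exe: bytes) -> int:
    """Step 4: one's complement of the word total."""
    return ~checksum_total(exe, zero_checksum=True) & 0xFFFF


def checksum_ok(exe: bytes) -> bool:
    """Steps 2 and 3 on the file as stored; a valid file totals 0FFFFH."""
    return checksum_total(exe) == 0xFFFF
```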

Notes

The load module

The load module starts where the .EXE header ends, and consists of the fully linked image of the program. It always begins on a paragraph boundary. The load module appears within the .EXE file exactly as it would appear in memory if it were loaded at segment address 0000H. The only change made to the load module during loading is the relocation of any direct segment references.

Although the .EXE file contains distinct segment images within the load module, it provides no information for separating those individual segments from one another. The load module is simply copied into memory, any direct segment references are relocated, and the program is given control.

Loading the .EXE program

This is a summary of the process needed to load a .EXE file.

  1. Read the formatted area of the header (the first 1CH bytes) from the .EXE file into a work area.
  2. Determine the size of the largest available block of memory.
  3. Determine the size of the load module, using the File Pages, Last Page Size and Header Paragraphs fields from the header.
  4. Add the contents of the MINALLOC field from the header to the calculated load module size. If the loading is being done by DOS, the size of the PSP (256 bytes) is also allowed for. If this total exceeds the size of the largest available block of memory, the load is terminated with an error (because there is not enough memory).
  5. Add the contents of the MAXALLOC field from the header to the calculated load module size. If the loading is being done by DOS, then the size of the PSP (256 bytes) is also allowed for. If the size of the memory block found earlier exceeds this calculated total, then this total is allocated from the memory block and the rest is kept for use elsewhere. If the calculated total exceeds the size of the memory block, then the entire block is allocated.
  6. If the MINALLOC and MAXALLOC fields both contain 0000H, then DOS uses the calculated load module size to determine a start segment such that the load module is loaded into the high end of memory. A loader other than DOS may ignore this special case. In all other cases, the start segment is chosen so that the load module is loaded at the start of the allocated block of memory (leaving space for the PSP, if there is one).
  7. The load module is physically loaded into memory at the chosen start segment.
  8. The relocation pointers are read into a work area, and are used to relocate the load module's direct segment references.
  9. If loading is being done by DOS, a PSP is built in the first 256 bytes of the allocated memory block. While building the two FCBs within the PSP, DOS also determines initial values for the AL and AH registers: AL is set to 0FFH if the drive letter given for the first FCB is invalid and to 00H otherwise, and AH gives the same indication for the second FCB.
  10. The SS and SP registers are set to the values in the .EXE file header, after the start segment is added to the SS value.
  11. If loading is being done by DOS, the DS and ES registers are set to point to the beginning of the PSP.
  12. Control is transferred to the program by setting CS and IP to the values in the header, after adding the start segment to the CS value.
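Steps 3 to 6 amount to a small calculation in paragraphs. The following sketch assumes the loader has already found the largest free block (its segment and size are passed in as hypothetical parameters) and models only the ordinary case of step 6; the special load-high case used when MINALLOC and MAXALLOC are both 0000H is not covered. As a defensive choice not stated in the steps, the allocation is never made smaller than the MINALLOC requirement.

```python
from typing import NamedTuple

PSP_PARAGRAPHS = 256 // 16  # the PSP, allowed for only when DOS is loading


class Allocation(NamedTuple):
    paragraphs: int     # size of the memory actually allocated
    start_segment: int  # segment at which the load module is placed


def plan_allocation(module_bytes: int, min_alloc: int, max_alloc: int,
                    block_segment: int, block_paragraphs: int,
                    under_dos: bool = True) -> Allocation:
    # Step 3 gives the load module size in bytes; round up to paragraphs.
    module_paras = (module_bytes + 15) // 16
    psp = PSP_PARAGRAPHS if under_dos else 0

    # Step 4: PSP + load module + MINALLOC must fit in the largest block.
    minimum = psp + module_paras + min_alloc
    if minimum > block_paragraphs:
        raise MemoryError("not enough memory to load the program")

    # Step 5: PSP + load module + MAXALLOC, capped at the block size.
    # (Never going below the minimum is an assumption of this sketch.)
    wanted = max(psp + module_paras + max_alloc, minimum)
    paragraphs = min(wanted, block_paragraphs)

    # Step 6, ordinary case: load at the start of the block, after any PSP.
    return Allocation(paragraphs, block_segment + psp)
```

With the common MAXALLOC value of 0FFFFH, the requested total always exceeds the block, so the whole block is allocated, as step 5 describes.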

Register contents at program entry

The contents of the registers when a .EXE program is entered are as follows.

Register Contents
AX If loading under DOS: AL is 0FFH if the drive specified for the first FCB in the PSP is invalid, otherwise 00H; AH gives the same indication for the second FCB.
BX Undefined.
CX Undefined.
DX Undefined.
BP Undefined.
SI Undefined.
DI Undefined.
IP Initial value copied from .EXE file header.
SP Initial value copied from .EXE file header.
CS Initial value (relocated) from .EXE file header.
DS If loading under DOS: segment for start of PSP.
ES If loading under DOS: segment for start of PSP.
SS Initial value (relocated) from .EXE file header.
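The header-derived entries in this table can be computed as below. This is a sketch only: `start_segment` and `psp_segment` are assumed inputs (the latter is `None` when no PSP exists), and AX and the undefined registers are not modelled.

```python
from typing import Dict, Optional


def entry_registers(ss: int, sp: int, cs: int, ip: int,
                    start_segment: int,
                    psp_segment: Optional[int] = None) -> Dict[str, int]:
    regs = {
        # SS and CS are relocated by the start segment (16-bit wraparound).
        "SS": (ss + start_segment) & 0xFFFF,
        "CS": (cs + start_segment) & 0xFFFF,
        # SP and IP are offsets within those segments and are used unchanged.
        "SP": sp,
        "IP": ip,
    }
    if psp_segment is not None:
        # Under DOS, DS and ES both point at the start of the PSP.
        regs["DS"] = regs["ES"] = psp_segment
    return regs
```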


This site is copyright © 2008 Bob Eager
Last updated: 27 Nov 2008