Implementation of extended attributes on the FAT file system


This was originally written because of all the queries about the EA DATA. SF file, which was a frequent subject of discussion. I have tried to explain what this file does, why it exists, and what one should and should not do with it. Various people gave me extra information; particular thanks to Dean Gibson, who figured out the format of the EA DATA. SF file and put me right on a few points. Some of the following information is due to Dean.

In the following, all numbers are decimal unless followed by an H (in which case they are hexadecimal).

What are extended attributes?

All versions of OS/2 (except versions 1.0 and 1.1) support the concept of 'extended attributes' (EAs) on files. These are used for all kinds of things, and can be very small or quite large (the limit is 64KB per file at present). EAs might represent a file type, a file classification, an icon type, some free text...practically anything. One important use is for the storage of instance data for some classes of Workplace Shell objects.

Extended attributes on HPFS file systems

EAs are supported directly by the High Performance File System (HPFS). They are stored in an efficient manner; most of the time a small EA (typically one of less than several hundred bytes) effectively takes no additional space.

Extended attributes on FAT file systems

For backwards compatibility the old DOS (File Allocation Table, or FAT) file system needs to support EAs too. In order to do this, and at the same time keep the file system consistent for DOS if it is booted instead of OS/2 on the same machine, some trickery is needed.

FAT directory entries have ten spare bytes in them, starting at offset 0CH (immediately after the filename and the attribute byte); these are normally zero. They are there because originally the directory entry layout was modelled on the CP/M file system, and these bytes (among others) were used to describe the location of the disk extents making up the file; they aren't used for that purpose under DOS. Two of these spare bytes (at offsets 14H and 15H within the directory entry) are used to head a chain of disk allocation units (or clusters) which hold the EAs for that file. This caused interesting problems (for example) with early versions of the Norton Utilities, which flagged the directory entry as one with an 'illegal' format!

So, effectively an OS/2 FAT directory entry can head two chains of clusters; one for the file itself (as usual) and one for the EAs attached to the file. The latter listhead is often null (indicated by zeros).
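As a rough illustration of where the EA listhead lives, here is a minimal Python sketch that pulls the word at offsets 14H and 15H out of a raw 32-byte directory entry. The function name ea_pointer and the sample entry are my own inventions for the example; the sketch assumes the word is stored little-endian, like the other multi-byte fields in a FAT directory entry.

```python
import struct

DIR_ENTRY_SIZE = 32       # a FAT directory entry is 32 bytes long
EA_POINTER_OFFSET = 0x14  # offsets 14H and 15H hold the EA listhead


def ea_pointer(entry: bytes) -> int:
    """Return the 16-bit EA listhead from a raw FAT directory entry.

    Assumes little-endian storage. A result of 0 means no EAs are attached.
    """
    if len(entry) != DIR_ENTRY_SIZE:
        raise ValueError("a FAT directory entry is 32 bytes long")
    (pointer,) = struct.unpack_from("<H", entry, EA_POINTER_OFFSET)
    return pointer


# Synthetic entry for demonstration only: an 8.3 name, everything else zero,
# then an arbitrary EA listhead value poked into offsets 14H and 15H.
entry = bytearray(DIR_ENTRY_SIZE)
entry[0:11] = b"README  TXT"
entry[0x14:0x16] = struct.pack("<H", 0x0123)

print(hex(ea_pointer(bytes(entry))))
```

An entry whose spare bytes are all zero (the normal case under DOS) simply yields 0, the null listhead.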

All this would be fine until you ran CHKDSK under DOS. It would find all these clusters holding the EAs, and because they would appear not to belong to any file, they would be collected up and marked as 'lost' clusters to be added to the free list. Disaster would ensue the next time OS/2 looked at the file (well, eventually anyway) because the chances are that the clusters making up the EAs would have been allocated to another file by that time. To prevent this, the file named EA DATA. SF (the EA datafile) is used. This file is never meant to be read directly, and indeed it should never normally be backed up as a file. Its directory entry heads a chain of clusters (as usual), but these clusters are the same ones that hold all the EAs on that file system. In other words, there are two references to every EA cluster; one via the file's directory entry and one via the EA datafile. This makes the disk appear consistent under DOS; all of the clusters used on the disk belong to a valid file, and of course DOS will not see the second reference because it ignores the EA listhead in the directory entry.

Microsoft have said that the EA datafile is position dependent, and it shouldn't be manipulated or deleted; to make this hard, it has a strange name with spaces in it (which defeats a lot of software), and it is marked readonly, system and hidden. Observation has shown this not to be strictly true; it seems that you can back up and restore the file without any damage (of course, the EA datafile must correspond to the files on the disk; if you attempted to restore such a file on its own without also restoring the various files that reference it, you would have problems). The snag is that restored files won't generally have the entire directory entry restored, so the head of the EA cluster chain (in offsets 14H and 15H) will be lost (set to zero).

Notice the implication for backup under OS/2. A proper, EA-aware backup program should not back up the EA datafile; it simply reads the EAs for each file as it is backed up, and of course it restores them the same way - with the relevant system calls. So, the fact that OS/2 locks the EA datafile open is actually a benefit of sorts - it saves the file being backed up when its contents will never be needed; and in any case it would be semi-useless unless the directory entries were also restored in their entirety.

The EA datafile is created when the first EA is attached to any file on the disk; try it out with a diskette. It also takes one cluster (the first one) for some kind of internal housekeeping information. I suspected that this cluster was some kind of map similar to the FAT, chaining together the clusters relating to one file within the EA datafile; if so, it would probably expand if you had a lot of EAs on your disk. Dean Gibson figured out a lot more about the format of the file; the details are given later.

EAs are removed from the EA datafile if the file to which they are attached is deleted; this only applies if deletion takes place under OS/2 (including DOS sessions). If deleted under vanilla DOS, the EA datafile retains the 'lost' EA clusters; they can be reclaimed by running CHKDSK under OS/2.

All this of course plays havoc with defragmenters. They have to work round all of the scattered, immobile clusters making up the EA datafile. Yes, it's a kludge; but quite a good one, given the constraint that it has to look OK under normal DOS as well as provide the functionality under OS/2.

Notes on the format of the EA datafile

Most of this information came from Dean Gibson - many thanks, Dean! I have made the occasional addition.

The actual EA DATA. SF file format is as follows. All references to 'words' mean 16 bit quantities.

Given a non-zero 16 bit EA pointer 'X' in a FAT system directory entry (in offsets 14H and 15H):

  1. Shift X right 7 bits, and use the result as a word index to obtain a word entry from table A. Note that since a FAT system can only have 64K entries, there can be a maximum of 32K files that have EA entries (since each file and each EA take one cluster each), so the maximum EA pointer value is < 32K, and thus the high-order bit of X is unused.
  2. Use X as a relative word index into table B, to obtain the word entry at that location. A value of FFFFH means that the entry is unused.
  3. Add the values from steps 1 & 2 to obtain the relative cluster of the EA for the target file within EA DATA. SF.
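The lookup can be sketched in a few lines of Python. This is only an illustration of the arithmetic: it assumes tables A and B have already been read from the EA datafile into lists of 16-bit words, and the function name ea_relative_cluster and the sample table contents are made up for the example.

```python
UNUSED = 0xFFFF  # table B value marking an unused entry


def ea_relative_cluster(x, table_a, table_b):
    """Map a non-zero EA pointer X to a relative cluster in the EA datafile.

    table_a and table_b are assumed to be lists of 16-bit words already
    extracted from the file. Returns None if the table B entry is unused.
    """
    if x == 0:
        raise ValueError("an EA pointer of zero means the file has no EAs")
    base = table_a[x >> 7]   # one table A word covers 128 table B entries
    offset = table_b[x]      # X itself indexes table B
    if offset == UNUSED:
        return None
    return base + offset


# Made-up tables, purely to exercise the arithmetic.
table_a = [0, 10]
table_b = [UNUSED] * 256
table_b[130] = 5

print(ea_relative_cluster(130, table_a, table_b))  # 130 >> 7 == 1, so 10 + 5
```

Because X is shifted right by 7 bits, each table A word acts as a base value shared by a run of 128 consecutive table B entries.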

In order to keep the EA DATA. SF file logically contiguous when table B is expanded into a new cluster or when an EA is deleted, the FAT cluster chain for EA DATA. SF is altered, and values in table A and/or segments of table B are changed to reflect this.

The first word of the EA sector is for identification and contains the ASCII characters 'EA'; the next word is the relative sector number of this sector (consistency check); then the next two words are zero; the next twelve bytes contain the target file name (no path); the next word has an as yet undeciphered meaning; then the next two words are zero; followed by the EA data for the target file. The first word of the EA data is the length of the EA data in bytes, including the count word.
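Taking that description literally, the fixed part of the EA sector comes to 26 bytes, with the EA data starting immediately afterwards. The following Python sketch decodes it under several assumptions that are mine rather than Dean's: the fields are packed with no padding, words are little-endian, and the file name is padded with NULs or spaces. The names EASector and parse_ea_sector are hypothetical.

```python
import struct
from typing import NamedTuple

# 'EA', relative sector word, two zero words (skipped), 12-byte name,
# undeciphered word, two zero words (skipped): 2 + 2 + 4 + 12 + 2 + 4 = 26
HEADER = struct.Struct("<2sH4x12sH4x")


class EASector(NamedTuple):
    rel_sector: int  # relative sector number (consistency check)
    name: str        # target file name, no path
    unknown: int     # word whose meaning has not been deciphered
    ea_data: bytes   # EA data, including its leading count word


def parse_ea_sector(buf: bytes) -> EASector:
    """Decode the start of an EA sector as laid out in the text above."""
    magic, rel_sector, raw_name, unknown = HEADER.unpack_from(buf, 0)
    if magic != b"EA":
        raise ValueError("missing 'EA' identification word")
    # The first word of the EA data is its length, count word included.
    (length,) = struct.unpack_from("<H", buf, HEADER.size)
    ea_data = buf[HEADER.size:HEADER.size + length]
    name = raw_name.rstrip(b"\0 ").decode("ascii", errors="replace")
    return EASector(rel_sector, name, unknown, ea_data)


# Synthetic sector for demonstration only.
sample = (b"EA" + struct.pack("<H", 3) + bytes(4)
          + b"README.TXT\0\0" + struct.pack("<H", 0) + bytes(4)
          + struct.pack("<H", 6) + b"abcd")
print(parse_ea_sector(sample))
```

The sketch skips over the zero words rather than checking them, since it is not known whether they are always zero in practice.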


Last Updated: 28th October 2000
© 2000 by Bob Eager, Tavi Systems