DM3 Image Format

Introduction

Gatan DM3 images store all image information and data in a tag hierarchy. This document describes the format of that hierarchy. It is a slightly modified version of information that was formerly posted on the EMAN Project website at Baylor. Chris Boothroyd has also posted some information on the DM2 and DM3/DM4 formats.

Image Format Overview

(Note: long = 4 bytes, short = 2 bytes, char = 1 byte)

DM3Image   =   long
           |   long
           |   long
           |   TagGroup

TagGroup   =   char
           |   char
           |   long (assume value = N)
           |   [TagEntry] x N

TagEntry   =   char
           |   short (assume value = N)
           |   N bytes
           |   (TagGroup | TagType)

TagType    =   long
           |   long (assume value = N)
           |   [long] x N
           |   TagData

TagData    =   [item] x N

DM3Image format

1.  4 bytes, image file version, should be 3 
2.  4 bytes, number of bytes in the file 
3.  4 bytes, byte-ordering of the tag data (0 = big-endian, 1 = little-endian)
4.  a single TagGroup instance including all the tags and data
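
As a minimal sketch of the header layout above, the three 4-byte fields could be read with Python's struct module. The function name parse_dm3_header and the keys of the returned dictionary are illustrative choices, not part of the format. The fields are read as big-endian, following the Byte-ordering section of this document.

```python
import struct

def parse_dm3_header(buf):
    """Read the three 4-byte header fields at the start of a DM3 file."""
    # Header fields are big-endian; only the tag data may be little-endian.
    version, file_size, byte_order = struct.unpack(">lll", buf[:12])
    if version != 3:
        raise ValueError(f"not a DM3 file (version field = {version})")
    return {
        "version": version,
        "file_size": file_size,                 # value of field 2, unchecked
        "little_endian_data": byte_order == 1,  # 0 = big-endian, 1 = little-endian
        "root_offset": 12,                      # the root TagGroup starts here
    }

# Hypothetical 12-byte header, built only for illustration.
example_header = struct.pack(">lll", 3, 1024, 1)
header = parse_dm3_header(example_header)
```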

TagGroup format

1.  1 byte, is this group sorted? 
2.  1 byte, is this group open? 
3.  4 bytes, number of tags in this group 
4.  all tag entries; each tag entry is a TagEntry instance.
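
The fixed part of a TagGroup is 6 bytes long, so a reader might handle it along these lines. This is a sketch only: parse_tag_group_header is a hypothetical helper name, and treating any non-zero flag byte as "true" is an assumption.

```python
import struct

def parse_tag_group_header(buf, offset):
    """Return (is_sorted, is_open, n_tags, next_offset) for a TagGroup."""
    # Two 1-byte flags followed by a big-endian 4-byte tag count.
    is_sorted, is_open, n_tags = struct.unpack_from(">BBl", buf, offset)
    # The TagEntry instances begin immediately after these 6 bytes.
    return bool(is_sorted), bool(is_open), n_tags, offset + 6

# Hypothetical group header: sorted, not open, 5 tags.
example_group = struct.pack(">BBl", 1, 0, 5)
group_info = parse_tag_group_header(example_group, 0)
```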

TagEntry format

1.  1 byte, specifying whether it is data (21) or tag group (20) 
2.  2 bytes, length of tag's label 
3.  n bytes, tag's label as a string 
4.  tag instance: if tag is a group, it's a TagGroup instance.
    if tag is a data tag, it's a TagType instance.
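
The first three TagEntry fields can be read in the same way; the caller then dispatches on the kind byte to read either a TagGroup or a TagType. In this sketch the helper name, the returned strings "group" and "data", the use of an unsigned 2-byte length, and the latin-1 decoding of the label are all assumptions made for illustration.

```python
import struct

TAG_GROUP = 20  # entry holds a nested TagGroup
TAG_DATA = 21   # entry holds a TagType followed by its data

def parse_tag_entry_header(buf, offset):
    """Return (kind, label, next_offset) for the TagEntry at offset."""
    # 1-byte kind, then a big-endian 2-byte label length.
    kind_byte, label_len = struct.unpack_from(">BH", buf, offset)
    if kind_byte == TAG_GROUP:
        kind = "group"
    elif kind_byte == TAG_DATA:
        kind = "data"
    else:
        raise ValueError(f"unexpected tag kind {kind_byte}")
    start = offset + 3
    # Labels are single-byte characters; latin-1 accepts every byte value.
    label = buf[start:start + label_len].decode("latin-1")
    return kind, label, start + label_len

# Hypothetical data tag labelled "Data".
example_entry = bytes([TAG_DATA]) + struct.pack(">H", 4) + b"Data"
entry_info = parse_tag_entry_header(example_entry, 0)
```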

TagType format

1.  4 bytes, equal to "%%%%".
2.  4 bytes, n = length of the definition of the encoded type:
    for a simple type this will = 1,
    for a string this will = 2,
    an array of a simple type will = 3,
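    structs have 3+2*f where f = number of fields in the struct
    (type ID, struct_namelength, number of fields, then a
    field_namelength and fieldtype for each field)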
	
NB arrays of structs and arrays of arrays etc. will require additional 
fields for the complete definition (see Further details: TagType 3. 
EncodedType)

3.  4 x n bytes, where n is the definition length given in field 2;
    each 4-byte value helps define the EncodedType, either as an ID
    to be looked up in the table below or (for complex types) as
    field counts, lengths, etc., as described below
4.  data; its size depends on the encoded type.
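
A sketch of reading the first three TagType fields follows. It checks the "%%%%" marker, reads n, and returns the n definition values together with the offset at which the data begins. The name parse_tag_type_header is hypothetical.

```python
import struct

def parse_tag_type_header(buf, offset):
    """Return (definition, data_offset) for the TagType at offset."""
    if buf[offset:offset + 4] != b"%%%%":
        raise ValueError("missing %%%% marker at start of TagType")
    # n = number of 4-byte values in the type definition (big-endian).
    (n,) = struct.unpack_from(">l", buf, offset + 4)
    definition = list(struct.unpack_from(f">{n}l", buf, offset + 8))
    # The tag data starts right after the definition.
    return definition, offset + 8 + 4 * n

# Hypothetical definition for an array (20) of 10 unsigned shorts (4).
example_type = b"%%%%" + struct.pack(">l", 3) + struct.pack(">3l", 20, 4, 10)
definition, data_offset = parse_tag_type_header(example_type, 0)
```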

Further details:

TagType 3. EncodedType

Each definition field of an encoded type (TagType.3) occupies 4 bytes. The first field is always one of the following numbers:

    SHORT   = 2,
    LONG    = 3,
    USHORT  = 4,
    ULONG   = 5,
    FLOAT   = 6,
    DOUBLE  = 7,
    BOOLEAN = 8,
    CHAR    = 9,
    OCTET   = 10,
    STRUCT  = 15,
    STRING  = 18,
    ARRAY   = 20

For structs, strings and arrays, additional fields are needed for the definition:

    string: field 2 = string length.

    struct: contains the following additional fields:
        struct_namelength
        number of fields (n)
        (field_namelength, fieldtype) x n
        where each of the n fieldtypes is one of the EncodedTypes.

    array: field 2 = array_type, which is one of the EncodedTypes.
        field 3 = array_length.

    complex array: to give one example, an array of structs will look
        something like this:
        field 1 (encoded type) = 20 (Array)
        field 2 (type of elements of array) = 15 (Struct)
            definition of struct follows immediately
        field 3 = array_length (stored after the struct definition)

        where the definition of the struct follows the normal pattern
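
Because array and struct definitions can nest, a recursive walk over the definition values is one natural way to interpret them. The following is a minimal sketch that follows the field layout listed above; the function name parse_type and the tuple shapes it returns are illustrative assumptions, not part of the format.

```python
SIMPLE = {2: "short", 3: "long", 4: "ushort", 5: "ulong", 6: "float",
          7: "double", 8: "boolean", 9: "char", 10: "octet"}
STRUCT, STRING, ARRAY = 15, 18, 20

def parse_type(fields, pos=0):
    """Interpret definition values from pos; return (description, next_pos)."""
    type_id = fields[pos]
    if type_id in SIMPLE:
        return SIMPLE[type_id], pos + 1
    if type_id == STRING:
        # field 2 = string length
        return ("string", fields[pos + 1]), pos + 2
    if type_id == STRUCT:
        name_len, n_fields = fields[pos + 1], fields[pos + 2]
        pos += 3
        members = []
        for _ in range(n_fields):
            field_name_len = fields[pos]
            field_type, pos = parse_type(fields, pos + 1)
            members.append((field_name_len, field_type))
        return ("struct", name_len, members), pos
    if type_id == ARRAY:
        # The element type may itself be complex; the length follows it.
        element_type, pos = parse_type(fields, pos + 1)
        return ("array", element_type, fields[pos]), pos + 1
    raise ValueError(f"unknown encoded type {type_id}")

# Hypothetical definition: an array of 100 structs, each holding two longs.
description, used = parse_type([20, 15, 0, 2, 0, 3, 0, 3, 100])
```

Returning the next position from every call lets the array branch find array_length wherever the element definition happens to end.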

Data

The tag data is stored in little-endian format in files written on PCs. The number of bytes for each kind of data is as follows:

1.  short: data size = 2 
2.  long: data size = 4 
3.  unsigned short: data size = 2 
4.  unsigned long: data size = 4 
5.  float: data size = 4 
6.  double: data size = 8 
7.  boolean: data size = 1 
8.  char: data size = 1 
9.  octet: data size = 1 
10. string: data size = 2 x string length
    these strings are stored as 2-byte Unicode
11. struct: contains the following data: 
    struct_name: data size = struct_namelength x 1
    
        [ for the n fields in the struct ]
        field_name: data size = field_namelength x 1
        field_value: data size = sizeof(fieldtype)
    
12. array: 
    data size = array_length x sizeof(array_type)
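
The size rules above can be written down directly. This sketch covers simple types, strings, arrays of simple types, and structs whose fields are all simple types; the helper names are hypothetical, and nested cases (such as arrays of structs) would need to combine them.

```python
# Bytes per value, keyed by EncodedType ID (2 = short ... 10 = octet).
SIMPLE_SIZES = {2: 2, 3: 4, 4: 2, 5: 4, 6: 4, 7: 8, 8: 1, 9: 1, 10: 1}

def string_size(string_length):
    """Strings use 2 bytes per character."""
    return 2 * string_length

def array_size(element_type_id, array_length):
    """Size of an array whose elements are of a simple type."""
    return array_length * SIMPLE_SIZES[element_type_id]

def struct_size(struct_namelength, fields):
    """Size of a struct; fields is a list of (field_namelength, type_id)."""
    total = struct_namelength
    for field_namelength, field_type_id in fields:
        total += field_namelength + SIMPLE_SIZES[field_type_id]
    return total

# Hypothetical cases: 10 unsigned shorts, and a struct of two unnamed longs.
n_array_bytes = array_size(4, 10)
n_struct_bytes = struct_size(0, [(0, 3), (0, 3)])
```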

Byte-ordering

All fields, except the tag data, are stored using big-endian byte ordering. The tag data is stored in the platform's byte ordering.
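
In practice this means a reader keeps two byte-order settings: a fixed big-endian one for structural fields, and one chosen from the header's byte-ordering field for tag data. A small sketch using struct format prefixes, with hypothetical helper names:

```python
import struct

def data_prefix(byte_order_flag):
    """Map the header's byte-ordering field to a struct format prefix."""
    return "<" if byte_order_flag == 1 else ">"

def read_long_value(buf, offset, byte_order_flag):
    """Read one 4-byte signed tag-data value in the file's data byte order."""
    return struct.unpack_from(data_prefix(byte_order_flag) + "l", buf, offset)[0]

# The same four bytes give different values under the two orderings.
raw = bytes([1, 0, 0, 0])
as_little = read_long_value(raw, 0, 1)
as_big = read_long_value(raw, 0, 0)
```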

Label and String format

Labels are stored in a single-byte character set, which contains characters from the ASCII subset. String data is stored as Unicode in an array of unsigned shorts.
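
A sketch of decoding both kinds of text follows. It assumes that the 2-byte Unicode values can be treated as UTF-16 and that they follow the tag-data byte order; neither point is stated explicitly in this document. The function names are hypothetical.

```python
def decode_label(raw):
    """Decode a tag label; bytes outside ASCII are replaced, not rejected."""
    return raw.decode("ascii", errors="replace")

def decode_string(raw, little_endian=True):
    """Decode string tag data, assuming UTF-16 in the tag-data byte order."""
    codec = "utf-16-le" if little_endian else "utf-16-be"
    return raw.decode(codec)

# Hypothetical label and 2-character string value.
label = decode_label(b"Units")
value = decode_string("nm".encode("utf-16-le"))
```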

------------------------------------------------------------------------

lpeng@bcm.tmc.edu Last modified: Thu Sep 26 15:50:49 CDT 2002

Modified by Greg Jefferis. Last modified 2013-11-23