Mark0's Forum
		Software => TrID File Identifier => Topic started by: jenderek on March 07, 2023, 01:26:57 AM
		
			
			- 
Hello TrID users,
 
 Some days ago I looked at the installation medium of my last Linux
 installation, in particular at the files in its boot directory. On many
 installation CD-ROM images I find a corresponding boot catalog. Often it has
 the name boot.cat, sometimes I found the name boot.catalog, and in one case
 the name was ISOLINUX.CAT. With the image creation program mkisofs you can
 specify the path and filename of the boot catalog used when making an
 "El Torito" bootable CD with the -c option. That is described in the man
 page mkisofs(8).
 
 When running the TrID command on such examples, they are described as
 "Unknown!" or wrongly as "bCAD Drawing" by "bdf-drawing.trid.xml" (see
 appended output/trid-v-old.txt).
 
 For comparison I also ran the file command (newest version 5.44) on such
 samples. Here most samples are also described generically as "data", and a
 few are misidentified as TP-Link firmware (see appended output/file-5.44.txt).
 
 For comparison I also ran the file format identification utility DROID
 (see https://sourceforge.net/projects/droid/). It does not recognize these
 samples either.
 
 Unfortunately I found no good file format explanation, so I used a PDF
 document titled "El Torito Bootable CD-ROM Format Specification". The copy
 I found was version 1.0, dated January 25, 1995. It is incomplete but was
 sufficient to understand what is going on. In the end I chose the
 "El Torito" section of the ISO 9660 page on Wikipedia as reference, which is
 now expressed inside the definition by a line like:
 <RefURL>https://en.wikipedia.org/wiki/ISO_9660#El_Torito</RefURL>
 
 Because the files are binary, the generic application/octet-stream is not
 wrong. For the corresponding CD-ROM images the MIME type
 application/x-iso9660-image is used, so I chose a similar name. That is
 expressed by a line like:
 <Mime>application/x-iso9660-bootcatalog</Mime>
 
 The suffix of such a catalog is often cat and sometimes catalog. That is
 expressed by a line like:
 <Ext>CAT/CATALOG</Ext>
 Unfortunately these 2 suffixes are also used by other file types. So a
 unique MIME type is especially needed when you want the best-describing
 definition to win.
 
 So I ran tridscan on such catalog samples to generate cat-boot.trid.xml.
 With the help of the file format specification I looked at my TrID
 definition and tried to understand why some constructs appear and where I
 could refine the definition.
 For comparison I created a patched file command that shows the stored
 fields (see appended output/file.txt).
 
 According to the documentation, the first entry (32 bytes in size) is the
 validation entry. It starts with the Header ID byte, which must be 1. This
 is expressed by the first XML construct like:
 <Bytes>01</Bytes>
 <Pos>0</Pos>
 
 At offset 1 the Platform ID byte is stored. In most examples the value was
 0, which means the 80x86 platform. In a few samples I got the hexadecimal
 value EF, which means EFI. This value is not mentioned in old
 specifications.
 
 At offset 2 a reserved word is stored. According to the documentation its
 value must be 0. This is expressed by the second XML construct like:
 <Bytes>0000</Bytes>
 <Pos>2</Pos>
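A minimal Python sketch of the two fixed checks described so far (Header ID 1 at offset 0, reserved word 0 at offset 2), plus a lookup for the Platform ID at offset 1. The function name is hypothetical; the values 1 (PowerPC) and 2 (Mac) come from the El Torito 1.0 specification, and EF (EFI) is the later addition mentioned above.

```python
# Platform ID values: 0-2 from El Torito 1.0, 0xEF (EFI) added later.
PLATFORMS = {0x00: "80x86", 0x01: "PowerPC", 0x02: "Mac", 0xEF: "EFI"}


def check_validation_header(catalog: bytes) -> str:
    """Check Header ID (offset 0) and reserved word (offsets 2-3); return the platform name."""
    if len(catalog) < 32:
        raise ValueError("catalog shorter than one 32-byte entry")
    if catalog[0] != 0x01:
        raise ValueError("Header ID must be 1")
    if catalog[2:4] != b"\x00\x00":
        raise ValueError("reserved word at offset 2 must be 0")
    return PLATFORMS.get(catalog[1], f"unknown (0x{catalog[1]:02X})")
```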
 
 From offset 4 to 27 an ID string is located. In many of my examples this
 field was empty, but in a few examples I found a short string like
 ipxe.org. So this was expressed by an XML construct like:
 <Bytes>000000</Bytes>
 <Pos>25</Pos>
 When the string reaches its maximal length, this construct will vanish, so
 I deleted it.
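The following sketch (hypothetical helper names) extracts the ID string from offsets 4 to 27 and shows why the generated pattern at position 25 cannot be kept: a string of the full 24 bytes also fills offsets 25 to 27.

```python
ID_FIELD = slice(4, 28)  # ID string: offsets 4..27, 24 bytes


def validation_id_string(entry: bytes) -> str:
    """Return the manufacturer/developer ID string without its zero padding."""
    return entry[ID_FIELD].rstrip(b"\x00").decode("ascii", errors="replace")


def zero_tail_pattern_matches(entry: bytes) -> bool:
    """Same test as the generated <Bytes>000000</Bytes> at <Pos>25</Pos>."""
    return entry[25:28] == bytes(3)
```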
 
 From offset 28 to 29 a checksum is stored. In many of my examples this
 field was the byte sequence AA 55. For samples with a non-empty ID string I
 got other values. I do not understand this, because the sum of all the
 words in this record (that is 32 bytes, if I understand correctly) should
 be 0.
 
 From offset 30 to 31 the boot signature is stored. That is the byte
 sequence 55 AA, i.e. the word AA55 in little-endian format.
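Assuming the checksum rule is read correctly (the 16 little-endian words of the entry sum to 0 modulo 65536), the observed AA 55 is exactly what it predicts. With platform 0 and an empty ID string the only non-zero words are 0x0001 (Header ID) and 0xAA55 (key bytes), which sum to 0xAA56; the checksum word must then be 0x10000 - 0xAA56 = 0x55AA, stored as the bytes AA 55. A non-empty ID string or platform EF changes the sum and therefore the checksum. A small sketch with hypothetical function names that builds and verifies such an entry:

```python
import struct


def validation_checksum_ok(entry: bytes) -> bool:
    """True if the 16 little-endian words of the 32-byte entry sum to 0 (mod 65536)."""
    words = struct.unpack_from("<16H", entry, 0)
    return sum(words) & 0xFFFF == 0


def make_validation_entry(platform: int = 0, ident: bytes = b"") -> bytes:
    """Build a validation entry and fill in the checksum word at offsets 28-29."""
    if len(ident) > 24:
        raise ValueError("ID string is limited to 24 bytes (offsets 4-27)")
    entry = bytearray(32)
    entry[0] = 0x01                  # Header ID
    entry[1] = platform              # Platform ID
    entry[4:4 + len(ident)] = ident  # ID string, zero padded
    entry[30:32] = b"\x55\xAA"       # key bytes = little-endian word 0xAA55
    partial = sum(struct.unpack("<16H", entry)) & 0xFFFF
    # choose the checksum so that the total becomes 0 modulo 65536
    entry[28:30] = struct.pack("<H", (-partial) & 0xFFFF)
    return bytes(entry)
```

For platform EF with an empty ID string the same arithmetic gives the checksum bytes AA 66.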
 
 At offset 32 the next section starts: the Initial/Default Entry. Its first
 byte is the Boot Indicator. The hexadecimal value 88 means bootable and the
 value 0 means not bootable. In all my examples I only found the first
 value. If I try to create a boot catalog with mkisofs using only the -c
 option, it complains with the error message:
 mkisofs: No boot image specified.
 So I do not know whether it is possible to create a boot catalog without a
 boot image. Probably the value 88 is always present.
 
 So these 2 observations are expressed by an XML construct like:
 <Bytes>55AA88</Bytes>
 <ASCII> U</ASCII>
 <Pos>30</Pos>
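The combined pattern at position 30 spans two entries: 55 AA are the last two bytes of the validation entry and 88 is the first byte of the Initial/Default Entry. A sketch with hypothetical helper names:

```python
BOOT_INDICATORS = {0x88: "bootable", 0x00: "not bootable"}


def boot_indicator(catalog: bytes) -> str:
    """Decode the Boot Indicator, the first byte of the Initial/Default Entry."""
    value = catalog[32]
    return BOOT_INDICATORS.get(value, f"invalid (0x{value:02X})")


def key_and_indicator_match(catalog: bytes) -> bool:
    """55 AA closes the validation entry (30-31); 88 opens the default entry (32)."""
    return catalog[30:33] == b"\x55\xAA\x88"
```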
 
 At offset 33 the boot media type byte is stored. In many examples I got
 the value 0 here, which means no emulation mode. In a few samples like
 boot-gag.catalog (https://gag.sourceforge.net/ -> gag4_10.zip -> cdrom.iso
 -> boot.catalog) I found the value 2, which means emulation of a boot floppy
 disk with a size of 1440 KB.
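The media type byte can be decoded with a simple table. Values 1, 3 and 4 are not seen in the samples above; they are taken from the El Torito specification (1.2 MB and 2.88 MB diskette emulation, hard disk emulation). The function name is hypothetical.

```python
MEDIA_TYPES = {
    0: "no emulation",
    1: "1.2 MB diskette emulation",
    2: "1.44 MB diskette emulation",
    3: "2.88 MB diskette emulation",
    4: "hard disk emulation",
}


def boot_media_type(catalog: bytes) -> str:
    """Describe the boot media type byte of the Initial/Default Entry (offset 33)."""
    value = catalog[33]
    return MEDIA_TYPES.get(value, f"unknown (0x{value:02X})")
```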
 
 At offset 34 the load segment for the initial boot image is stored as a
 2-byte value. In all my examples this field was 0, which means the system
 will use the traditional segment 7C0. Other values are very unlikely, but
 according to the mkisofs(8) man page you can choose another segment address
 with the -boot-load-seg option.
 At offset 36 the system type is stored as a byte. This must be a copy of
 byte 5 (System Type) from the partition table found in the boot image. This
 applies only to media type 4, i.e. emulation of a bootable hard disk.
 Unfortunately among my examined samples I found no emulated bootable hard
 disk, so in my examples this value was always 0.
 At offset 37 a reserved byte is stored. This must be 0.
 So these 3 facts are expressed by an XML construct like:
 <Bytes>00000000</Bytes>
 <Pos>34</Pos>
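A sketch of how the load segment and the two following bytes could be interpreted (hypothetical function names). In real mode the linear address is segment * 16, so the default segment 7C0 corresponds to the familiar boot address 0x7C00.

```python
import struct

TRADITIONAL_SEGMENT = 0x7C0


def boot_load_address(catalog: bytes) -> int:
    """Linear address where the initial boot image is loaded (real-mode segment * 16)."""
    (segment,) = struct.unpack_from("<H", catalog, 34)
    if segment == 0:
        segment = TRADITIONAL_SEGMENT  # a stored 0 means "use segment 7C0"
    return segment * 16


def system_type(catalog: bytes) -> int:
    """Return the system type (offset 36) after checking the reserved byte at offset 37."""
    if catalog[37] != 0:
        raise ValueError("reserved byte at offset 37 must be 0")
    return catalog[36]
```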
 
 Assuming that a non-standard boot segment and other system types are
 possible, only the reserved byte will survive. So this becomes:
 <Bytes>00</Bytes>
 <Pos>37</Pos>
 
 At offset 38 the sector count of the boot image is stored as a 2-byte
 value. This is the number of virtual/emulated sectors the system will store
 at the Load Segment during the initial boot procedure.
 At offset 40 the start address of the virtual disk (Load RBA) is stored as
 a 4-byte value. In my examples I got "low" values (like 0x1a 0x35 0x4b 0x52
 0x91 0xa2 0x21e 0x50d).
 From offset 44 to 63 a reserved area is stored. This must be 0.
 So these 2 observations are expressed by an XML construct like:
 <Bytes>00000000000000000000000000000000000000000000</Bytes>
 <Pos>42</Pos>
 Assuming that higher start addresses can occur, only the reserved area will
 survive. So this now becomes:
 <Bytes>0000000000000000000000000000000000000000</Bytes>
 <Pos>44</Pos>
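The whole 32-byte Initial/Default Entry can be unpacked in one step with Python's struct module. This sketch assumes, as mkisofs(8) describes for -boot-load-size, that the sector count is in 512-byte virtual sectors, and that the Load RBA counts 2048-byte CD-ROM sectors; the class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

CD_SECTOR = 2048      # Load RBA counts 2048-byte CD-ROM logical blocks
VIRTUAL_SECTOR = 512  # Sector Count counts 512-byte virtual sectors


@dataclass
class DefaultEntry:
    boot_indicator: int  # offset 32: 0x88 bootable, 0x00 not bootable
    media_type: int      # offset 33
    load_segment: int    # offsets 34-35, 0 means 0x7C0
    system_type: int     # offset 36
    sector_count: int    # offsets 38-39
    load_rba: int        # offsets 40-43

    def image_span(self) -> Tuple[int, int]:
        """(byte offset of the boot image inside the ISO, bytes loaded at boot)."""
        return self.load_rba * CD_SECTOR, self.sector_count * VIRTUAL_SECTOR


def parse_default_entry(catalog: bytes) -> DefaultEntry:
    # "<" = little-endian, no padding: 1+1+2+1+1+2+4+20 = 32 bytes
    (indicator, media, segment, system, unused,
     count, rba, reserved) = struct.unpack_from("<BBHBBHI20s", catalog, 32)
    if unused != 0 or reserved != bytes(20):
        raise ValueError("reserved fields of the default entry must be 0")
    return DefaultEntry(indicator, media, segment, system, count, rba)
```

For example, a Load RBA of 0x1a (one of the observed values) places the boot image at byte offset 26 * 2048 = 53248 of the ISO image.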
 
 At offset 64 the next section starts. It begins with a header indicator
 byte. A few of my examples have the hexadecimal value 91 here, which means
 final header. But in many of my examples I find the value 0 here, which is
 not explained in the documents I found. According to the documentation I
 would also expect the value 90 (more headers follow) as a second possible
 value. In the examples with value 91 I found the plausible value 1 for the
 number of section entries, but in the examples with a 0 header indicator I
 got 0 for the section entries. So I am not sure about the meaning of the
 fields at such higher offsets. In my generated TrID definition I only got
 some zero sequences at higher offsets like:
 <Bytes>0000000000000000000000000000000000000000000000000000000000</Bytes>
 <Pos>67</Pos>
 <Bytes>0000000000</Bytes>
 <Pos>97</Pos>
<Bytes>00000000000000000000000000000000000000000000000000000000</Bytes>
 <Pos>107</Pos>
 Maybe there exist samples with more entries at higher offsets; then these
 zero sequences will probably shrink. But I decided to delete these zero
 patterns at higher offsets. With the leading patterns the recognition will
 hopefully still be unique enough, and other catalogs with more entries will
 probably also be recognized.
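A sketch of decoding a section header at offset 64, directly after the two 32-byte entries (hypothetical function name). It treats any indicator other than 90 or 91, including the 0 seen in many samples, as "no section header present"; that reading of the zero-filled rest of the catalog is an assumption, not something stated in the specification text quoted here.

```python
import struct
from typing import Optional


def section_header(catalog: bytes, offset: int = 64) -> Optional[dict]:
    """Decode a section header entry, or return None if none is present."""
    indicator, platform, count = struct.unpack_from("<BBH", catalog, offset)
    if indicator not in (0x90, 0x91):  # 90 = more headers follow, 91 = final header
        return None                    # assumption: zero-filled rest of the catalog
    ident = catalog[offset + 4:offset + 32].rstrip(b"\x00")
    return {
        "final": indicator == 0x91,
        "platform": platform,
        "entries": count,              # number of section entries following
        "id": ident.decode("ascii", errors="replace"),
    }
```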
 
 With the new TrID definition my boot catalog examples are now recognized,
 but the misidentification as "bCAD Drawing" still exists and unfortunately
 comes first (see appended output/trid-v-new.txt). The TrID definition, some
 samples and the output are stored in the archive boot_catalog.zip.
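A rough way to test the hand-edited patterns outside TrID is to require every (position, bytes) pair to match. This is a simplification: TrID itself scores matches and ranks competing definitions, which is why bdf-drawing.trid.xml can still come first. The pattern list mirrors the constructs kept above; the function name is hypothetical.

```python
# (position, bytes) pairs kept in the hand-edited cat-boot definition
PATTERNS = [
    (0, bytes.fromhex("01")),        # Header ID
    (2, bytes.fromhex("0000")),      # reserved word
    (30, bytes.fromhex("55AA88")),   # key bytes + Boot Indicator
    (37, bytes.fromhex("00")),       # reserved byte
    (44, bytes(20)),                 # reserved area 44..63
]


def looks_like_boot_catalog(data: bytes) -> bool:
    """All-or-nothing pattern test; unlike TrID, no scoring or ranking."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in PATTERNS)
```

Only the first 64 bytes of a file are needed for this check.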
 
 I hope that my XML file can be used in a future version of triddefs.
 
 With best wishes
 Jörg Jenderek
 
- 
				Will try to check with some other file samples. Thanks!