Topic: cat-boot.trid.xml for CD-ROM boot catalog

jenderek

cat-boot.trid.xml for CD-ROM boot catalog
« on: March 07, 2023, 01:26:57 AM »
Hello TrID users,

a few days ago I looked at the medium of my last Linux installation and at
the files in its boot directory. On many installation CD-ROM images I find a
corresponding boot catalog. It is often named boot.cat, sometimes
boot.catalog, and in one case the name was ISOLINUX.CAT. When creating an
image with mkisofs, you can use the -c option to specify the path and
filename of the boot catalog for an "El Torito" bootable CD. This is
described in the mkisofs(8) man page.

When I run the TrID command on such examples, they are described as
"Unknown!" or wrongly as "bCAD Drawing" by "bdf-drawing.trid.xml" (see the
attached output/trid-v-old.txt).

For comparison I also ran the file command (newest version, 5.44) on these
samples. Most samples are described only generically as "data", and a few
are misidentified as TP-Link firmware (see the attached
output/file-5.44.txt).

For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). It does not recognize these
samples either.

Unfortunately I found no good explanation of the file format. So I used a
PDF document titled "El Torito Bootable CD-ROM Format Specification". The
copy I found was version 1.0, dated January 25, 1995. It is incomplete but
was sufficient to understand what is going on. As reference URL I finally
chose the "El Torito" section of the ISO 9660 page on Wikipedia. This is
expressed inside the definition by a line like:
   <RefURL>https://en.wikipedia.org/wiki/ISO_9660#El_Torito</RefURL>

Because the files are binary, the generic application/octet-stream is not
wrong. For the corresponding CD-ROM images the MIME type
application/x-iso9660-image is used, so I chose a similar name. This is
expressed by a line like:
   <Mime>application/x-iso9660-bootcatalog</Mime>

The suffix of such a catalog is often cat and sometimes catalog. This is
expressed by a line like:
   <Ext>CAT/CATALOG</Ext>
Unfortunately these 2 suffixes are also used by other file types. So a
unique MIME type is especially needed if you want the most specific
description to win.

So I ran tridscan on these catalog samples to generate cat-boot.trid.xml.
With the help of the file format specification I looked at my TrID
definition and tried to understand why some constructs appear and where I
can refine the definition.
For comparison I created a patched file command that shows the stored
fields (see the attached output/file.txt).

According to the documentation, the first entry (32 bytes in size) is the
Validation Entry. It starts with the Header ID byte, which must be 1. This
is expressed by the first XML construct:
   <Bytes>01</Bytes>
   <Pos>0</Pos>

At offset 1 the Platform ID byte is stored. In most examples the value was
0, which means the 80x86 platform. In a few samples I got the hexadecimal
value EF, which means EFI. This value is not mentioned in the old
specification.

At offset 2 a reserved word is stored. According to the documentation its
value must be 0. This is expressed by the second XML construct:
   <Bytes>0000</Bytes>
   <Pos>2</Pos>

From offset 4 to 27 an ID string is located. In many of my examples this
field was empty, but in a few examples I found a short string like
ipxe.org. So only the unused tail of the field was expressed by an XML
construct like:
   <Bytes>000000</Bytes>
   <Pos>25</Pos>
When a string reaches the maximal length, this pattern no longer matches.
So I deleted it.

From offset 28 to 29 a checksum is stored. In many of my examples this
field was the byte sequence aa55. For samples with a non-empty ID string I
get other values. I do not understand this, because the sum of all the
words in this record (that is 32 bytes, if I understand correctly) should
be 0.
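A possible explanation, assuming the specification means the sum of the
sixteen little-endian 16-bit words modulo 65536: the checksum word then has
to compensate for every other word in the entry, so it must change as soon
as the ID string (or the Platform ID) is not empty. A minimal Python sketch
under that assumption; the function names and the hand-built entries are
made up for illustration and are not taken from the samples:

```python
import struct

def words_sum(entry: bytes) -> int:
    """Sum the sixteen little-endian 16-bit words of a 32-byte entry, modulo 2**16."""
    if len(entry) != 32:
        raise ValueError("a Validation Entry is 32 bytes long")
    return sum(struct.unpack("<16H", entry)) & 0xFFFF

def expected_checksum(entry: bytes) -> int:
    """Checksum word that makes the word sum zero; bytes 28-29 are ignored."""
    blanked = entry[:28] + b"\x00\x00" + entry[30:]
    return -words_sum(blanked) & 0xFFFF

# Hand-built entries with a zeroed checksum field (illustrative only).
head = bytes([0x01, 0x00, 0x00, 0x00])           # Header ID, Platform ID, reserved
tail = b"\x00\x00" + b"\x55\xaa"                 # checksum placeholder, key bytes
empty_id = head + bytes(24) + tail
with_id = head + b"ipxe.org".ljust(24, b"\x00") + tail

# Stored little-endian, the checksum for the empty ID string is aa 55.
print(struct.pack("<H", expected_checksum(empty_id)).hex())  # aa55
# A non-empty ID string yields a different checksum.
print(struct.pack("<H", expected_checksum(with_id)).hex())
```

With an empty ID string on the 80x86 platform this reproduces the aa55
sequence seen in most samples, which would fit the observation above.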

From offset 30 to 31 the boot signature is stored. That is the byte
sequence 55aa, or the hexadecimal value aa55 in little-endian format.
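The fields of the Validation Entry can be unpacked in a few lines of
Python. This is only a sketch: the names ValidationEntry,
parse_validation_entry and looks_valid are made up for illustration, the ID
string is taken as the 24 bytes at offsets 4 to 27 as in the specification,
and the platform names for IDs 1 and 2 come from the 1995 specification,
not from the samples.

```python
import struct
from typing import NamedTuple

# Platform IDs; 0xEF (EFI) is not listed in the 1995 specification.
PLATFORMS = {0x00: "80x86", 0x01: "PowerPC", 0x02: "Mac", 0xEF: "EFI"}

class ValidationEntry(NamedTuple):
    header_id: int     # offset 0, must be 1
    platform_id: int   # offset 1
    reserved: int      # offset 2-3, must be 0
    id_string: bytes   # offset 4-27
    checksum: int      # offset 28-29
    key55: int         # offset 30, must be 0x55
    keyaa: int         # offset 31, must be 0xAA

def parse_validation_entry(catalog: bytes) -> ValidationEntry:
    """Unpack the first 32 bytes of a boot catalog (little-endian fields)."""
    return ValidationEntry(*struct.unpack_from("<BBH24sHBB", catalog, 0))

def looks_valid(entry: ValidationEntry) -> bool:
    """Check only the fixed fields; the checksum is not verified here."""
    return (entry.header_id == 1 and entry.reserved == 0
            and entry.key55 == 0x55 and entry.keyaa == 0xAA)

# Hand-built EFI example (illustrative, not one of the attached samples).
sample = bytes([0x01, 0xEF, 0x00, 0x00]) + bytes(24) + bytes.fromhex("aa66") + b"\x55\xaa"
entry = parse_validation_entry(sample)
print(PLATFORMS.get(entry.platform_id, "unknown"), looks_valid(entry))  # EFI True
```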

At offset 32 the next section starts: the Initial/Default Entry. Its first
byte is the Boot Indicator. The hexadecimal value 88 means bootable and the
value 0 means not bootable. In all my examples I found only the first
value. If I try to create a boot catalog with mkisofs using only the -c
option, it complains with the error message:
     mkisofs: No boot image specified.
So I do not know whether it is possible to create a boot catalog without a
boot image. The value 88 is therefore probably always present.

So these 2 observations are expressed by an XML construct like:
   <Bytes>55AA88</Bytes>
   <ASCII> U</ASCII>
   <Pos>30</Pos>

At offset 33 the boot media type byte is stored. In many examples I get the
value 0 here, which means no emulation mode. In a few samples like
boot-gag.catalog
(https://gag.sourceforge.net/ -> gag4_10.zip -> cdrom.iso -> boot.catalog) I
found the value 2, which means emulation of a 1440 KB boot floppy disk.

At offset 34 the load segment for the initial boot image is stored as a 2
byte value. In all my examples this field was 0. That means the system will
use the traditional segment 7C0. It is very unlikely that other values
occur, but according to the mkisofs(8) man page you can choose another
segment address with the -boot-load-seg option.

At offset 36 the system type is stored as a byte. This must be a copy of
byte 5 (System Type) from the partition table found in the boot image. This
applies only to media type 4, which means emulation of a bootable hard
disk. Unfortunately I found no emulated bootable hard disk among my
inspected samples, so in my examples this value was always 0.

At offset 37 a reserved byte is stored. This must be 0.

So these 3 facts are expressed by an XML construct like:
   <Bytes>00000000</Bytes>
   <Pos>34</Pos>

Assuming that non-standard load segments and other system types are
possible, only the reserved byte will survive. So this becomes:
   <Bytes>00</Bytes>
   <Pos>37</Pos>

At offset 38 the length of the boot part is stored as a 2 byte value. This
is the number of virtual/emulated sectors the system will store at the load
segment during the initial boot procedure.

At offset 40 the start address of the virtual disk is stored as a 4 byte
value. In my examples I got "low" values (like 0x1a, 0x35, 0x4b, 0x52,
0x91, 0xa2, 0x21e, 0x50d).

From offset 44 to 63 a reserved area is stored. This must be 0.

So these 2 observations are expressed by an XML construct like:
   <Bytes>00000000000000000000000000000000000000000000</Bytes>
   <Pos>42</Pos>
Assuming that higher start addresses can occur, only the reserved area will
survive. So this now becomes:
   <Bytes>0000000000000000000000000000000000000000</Bytes>
   <Pos>44</Pos>
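To check the fields of the Initial/Default Entry by hand, the 32 bytes at
offset 32 can be unpacked as follows. This is a rough Python sketch, not
the patched file command mentioned above: the names InitialEntry,
parse_initial_entry and effective_segment are hypothetical, and the media
type values 1 and 3 (1.2 MB and 2.88 MB floppy) are taken from the
specification, not from the samples discussed here.

```python
import struct
from typing import NamedTuple

MEDIA_TYPES = {0: "no emulation", 1: "1.2 MB floppy", 2: "1.44 MB floppy",
               3: "2.88 MB floppy", 4: "hard disk"}

class InitialEntry(NamedTuple):
    boot_indicator: int  # offset 32: 0x88 bootable, 0x00 not bootable
    media_type: int      # offset 33
    load_segment: int    # offset 34-35, 0 means the default 0x7C0
    system_type: int     # offset 36
    reserved: int        # offset 37, must be 0
    sector_count: int    # offset 38-39
    load_rba: int        # offset 40-43, start address of the virtual disk
    unused: bytes        # offset 44-63, must be 0

def parse_initial_entry(catalog: bytes) -> InitialEntry:
    """Unpack the Initial/Default Entry that follows the Validation Entry."""
    return InitialEntry(*struct.unpack_from("<BBHBBHI20s", catalog, 32))

def effective_segment(entry: InitialEntry) -> int:
    """A stored load segment of 0 stands for the traditional segment 0x7C0."""
    return entry.load_segment or 0x7C0

# Hand-built catalog fragment: empty first 32 bytes, then a no-emulation entry.
catalog = bytes(32) + struct.pack("<BBHBBHI20s", 0x88, 0, 0, 0, 0, 4, 0x1A, bytes(20))
entry = parse_initial_entry(catalog)
print(MEDIA_TYPES[entry.media_type], hex(effective_segment(entry)), hex(entry.load_rba))
```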

At offset 64 the next section starts. It begins with the header indicator
byte. A few of my examples have the hexadecimal value 91 here, which means
final header. But in many of my examples I find the value 0 here. That is
not explained in the documents I found. According to the documentation I
would also expect 90 as a second possible value, meaning that more headers
follow. In the examples with the value 91 I found the plausible value 1 for
the number of section entries, but in the examples with header indicator 0
I get 0 section entries. So I am not sure about the meaning of these fields
at such higher offsets. In my generated TrID definition I got only some
zero sequences at higher offsets like:
   <Bytes>0000000000000000000000000000000000000000000000000000000000</Bytes>
   <Pos>67</Pos>
   <Bytes>0000000000</Bytes>
   <Pos>97</Pos>
   <Bytes>00000000000000000000000000000000000000000000000000000000</Bytes>
   <Pos>107</Pos>
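One reading that would explain the 0 values, offered as an assumption and
not verified against the samples: a section header is only present when the
indicator byte is 90 or 91, and a 0 directly after the Initial/Default
Entry simply means that no section header follows and the rest is unused
padding. The Initial/Default Entry ends at offset 63, so the Python sketch
below starts at offset 64. The function name is made up, and extension
entries are not handled.

```python
import struct

def section_headers(catalog: bytes):
    """Yield (indicator, platform_id, entry_count, id_string) per section header."""
    offset = 64  # first 32-byte slot after the Initial/Default Entry
    while offset + 32 <= len(catalog):
        indicator, platform_id, count, id_string = struct.unpack_from(
            "<BBH28s", catalog, offset)
        if indicator not in (0x90, 0x91):
            break  # assumption: any other value (e.g. 0) means no further header
        yield indicator, platform_id, count, id_string
        offset += 32 * (1 + count)  # skip the header and its section entries
        if indicator == 0x91:
            break  # 0x91 marks the final header

# Hand-built examples (illustrative only).
no_sections = bytes(128)
one_section = bytes(64) + bytes([0x91, 0xEF, 0x01, 0x00]) + bytes(28) + bytes(32)

print(list(section_headers(no_sections)))   # []
print(len(list(section_headers(one_section))))  # 1
```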
Maybe samples with more entries at higher offsets exist. Then these zero
sequences would probably shrink. So I decided to delete the zero patterns
at higher offsets. With the leading patterns the recognition will hopefully
still be unique enough, and other catalogs with more entries will probably
also be recognized.
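The effect of the remaining patterns can be emulated roughly with a
fixed-position byte comparison. This Python sketch only illustrates the
idea and is not how TrID works internally (scoring and global strings are
not modelled). The pattern list is the set kept in the text above, at
positions 0, 2, 30, 37 and 44; the function name is hypothetical.

```python
# (position, bytes) pairs kept after pruning the definition
PATTERNS = [
    (0, bytes.fromhex("01")),         # Header ID
    (2, bytes.fromhex("0000")),       # reserved word
    (30, bytes.fromhex("55AA88")),    # key bytes plus Boot Indicator
    (37, bytes.fromhex("00")),        # reserved byte
    (44, bytes.fromhex("00" * 20)),   # reserved area of the Initial/Default Entry
]

def matches_boot_catalog(data: bytes) -> bool:
    """True if every kept pattern is found at its fixed position."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in PATTERNS)

# Hand-built catalog (illustrative, not one of the attached samples).
good = bytearray(2048)
good[0] = 0x01
good[28:32] = bytes.fromhex("aa5555aa")   # checksum, then key bytes 55 aa
good[32] = 0x88
good[38:40] = (4).to_bytes(2, "little")   # sector count
good[40:44] = (0x1A).to_bytes(4, "little")  # start address

print(matches_boot_catalog(bytes(good)))   # True
print(matches_boot_catalog(bytes(2048)))   # False
```

Because only a handful of fixed bytes remain, other formats whose patterns
happen to fit the same data can still score higher, which may be part of
why a misidentification can rank first.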

With the new TrID definition my boot catalog examples are now recognized
and described, but the misidentification as "bCAD Drawing" still exists and
unfortunately comes first (see the attached output/trid-v-new.txt). The
TrID definition, some samples and the output are stored in the archive
boot_catalog.zip.

I hope that my XML file can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: cat-boot.trid.xml for CD-ROM boot catalog
« Reply #1 on: March 11, 2023, 01:49:30 AM »
Will try to check with some other file samples. Thanks!