Topic: updated iso-9660-image.trid.xml, iso-hfs.trid.xml for ISO images and others

jenderek

Hello trid users,

A few days ago I examined CD-ROM images (401 samples including duplicates).
The standard format has the suffix iso. That is expressed in
iso-9660-image.trid.xml by a line like:
   <Ext>ISO</Ext>

The iso9660 extension is less common, but also in use. On my system I did
not find such samples. According to the Archive Team file formats wiki, the
suffixes isoimg and cdr are also used, but I did not find such samples on my
systems either. However, I found one example with the IBQ suffix (ImgBurn
Queue File) and one example, DropboxInstaller.dmg, with the DMG suffix.

The MIME type is shown by a line like:
   <Mime>application/x-iso9660-image</Mime>
That is what I am used to. But on a Debian 11 based Raspbian system this
MIME type is no longer listed. Instead, for ISO images I found several
entries (apparently depending on sub-classification), like:
   application/x-cd-image
   application/x-gamecube-rom
   application/x-saturn-rom
   application/x-sega-cd-rom
   application/x-wii-rom

In the next step I made a sub-selection. Now I only consider ISO samples
that the file command (version 5.44) describes as "ISO 9660 CD-ROM
filesystem data" (see appended output/file-5.44.txt). For these, two
suffixes, iso and iso9660, are listed (see appended
output/file-ext-5.44.txt), and application/x-iso9660-image is used as the
MIME type (see appended output/file-i-5.44.txt). That leaves 344 samples:
263 are described by iso-9660-image.trid.xml and 81 are not.

So I took 15 of the undetected samples and ran trid with a high value for
the -n option to see how they are described (see appended
output/trid-old.txt). To make sure that these are valid ISO 9660 CD images,
I ran the isoinfo(8) command line tool with a line like:
      isoinfo -d -i MYImage.iso  > MYImage.iso.txt
The -d option prints information from the primary volume descriptor (PVD)
of the iso9660 image. This includes information about Rock Ridge and Joliet
extensions and El Torito boot information, if present. The -i option
specifies the path of the iso9660 image that you wish to examine. The -l
option generates output as if an 'ls -lR' command had been run on the
iso9660 image. That might be done by a command line like:
    isoinfo.exe -l -i krd12Jan2019.iso > krd12Jan2019.iso-l.txt
You can also do this verification with the 7-Zip archiver via command lines
like:
   7z t -tIso krd12Jan2019.iso
   7z l -tIso krd12Jan2019.iso > krd12Jan2019.iso-7z.txt
With the t command only the integrity of the image is tested (output
"Everything is Ok"), and with the l command the contents of the archive are
listed. The -tIso switch explicitly selects the archive type. The advantage
of the 7-Zip tool is that you can also select other archive types (-tMBR for
a master boot record and -tGPT for a GUID Partition Table).
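To illustrate what isoinfo -d reads, here is a minimal Python sketch that decodes a few Primary Volume Descriptor fields. It assumes the PVD is the first volume descriptor at sector 16 (offset 32768), which is typical but not guaranteed; read_pvd and make_test_image are hypothetical names, and this is not how isoinfo itself is implemented.

```python
import struct

SECTOR = 2048
PVD_OFFSET = 16 * SECTOR  # descriptors follow the 16-sector system area

def read_pvd(data: bytes) -> dict:
    """Decode some PVD fields; assumes the PVD is at sector 16 (hypothetical helper)."""
    vd = data[PVD_OFFSET:PVD_OFFSET + SECTOR]
    if len(vd) < SECTOR:
        raise ValueError("image too short for a volume descriptor")
    vd_type, ident, version = vd[0], vd[1:6], vd[6]
    if ident != b"CD001" or vd_type != 1 or version != 1:
        raise ValueError("CD-ROM is NOT in ISO 9660 format")
    return {
        "system_id": vd[8:40].decode("ascii", "replace").rstrip(),
        "volume_id": vd[40:72].decode("ascii", "replace").rstrip(),
        # both-endian fields: the little-endian copy comes first
        "volume_space_size": struct.unpack_from("<I", vd, 80)[0],
        "logical_block_size": struct.unpack_from("<H", vd, 128)[0],
    }

def make_test_image() -> bytes:
    """Build a tiny synthetic image with only a PVD (hypothetical test data)."""
    img = bytearray(18 * SECTOR)
    vd = bytearray(SECTOR)
    vd[0] = 1                      # type 1 = primary volume descriptor
    vd[1:6] = b"CD001"
    vd[6] = 1                      # descriptor version
    vd[8:40] = b"LINUX".ljust(32)
    vd[40:72] = b"MYVOLUME".ljust(32)
    struct.pack_into("<I", vd, 80, 18)
    struct.pack_into(">I", vd, 84, 18)
    struct.pack_into("<H", vd, 128, 2048)
    struct.pack_into(">H", vd, 130, 2048)
    img[PVD_OFFSET:PVD_OFFSET + SECTOR] = vd
    return bytes(img)

print(read_pvd(make_test_image())["volume_id"])  # MYVOLUME
```

A truncated sample like the DROID signature file would raise the ValueError here, in the same spirit as the "NOT in ISO 9660 format" line from isoinfo.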

If the isoinfo command with the -d option really succeeds, the output starts
with a line like:
   CD-ROM is in ISO 9660 format
For one example, fmt-468-signature-id-730.iso.txt, it starts with a line like:
   CD-ROM is NOT in ISO 9660 format
That is not surprising, because this sample is used by the DROID tool to
recognize ISO images and therefore contains only some leading bytes, and many
fields are empty (see appended output/*.iso.txt).

So I ran tridscan on the undetected ISO samples to update the TrID
definition. Then I looked at what had changed and tried to understand why.
In the global strings section the line with FILES vanished and only one
string survived:
   <String>CD001</String>
Furthermore, in the front block section six nil sequences vanished. The
first one was:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>336</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>356</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>373</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>376</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>386</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000</Bytes>
      <Pos>396</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000000000000000000000000000</Bytes>
      <Pos>403</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000000000</Bytes>
      <Pos>424</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>435</Pos>
   </Pattern>
The sixth one was:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>965</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>969</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>972</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>977</Pos>
   </Pattern>
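Why such nil sequences break on hybrid images can be sketched with a simplified all-or-nothing pattern check in Python. It uses only a subset of the old front block patterns quoted above; matches_front_block is a hypothetical name, the boot code bytes are made up, and the scoring TrID uses to rank matching definitions is not modelled.

```python
# Each tuple mirrors one <Pattern>: (<Pos>, <Bytes> in hex).
# Subset of the old front block patterns quoted above.
FRONT_PATTERNS = [
    (336, "00"), (356, "00"), (373, "00"), (376, "00"), (386, "00"),
    (396, "00000000"), (424, "0000000000000000"), (435, "0000"),
]

def matches_front_block(data: bytes, patterns) -> bool:
    """True only if every pattern's bytes occur exactly at its position."""
    for pos, hex_bytes in patterns:
        expected = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(expected)] != expected:
            return False
    return True

plain_iso = bytes(2048)                 # nil system area, as in a plain ISO image
hybrid = bytearray(2048)
hybrid[0:446] = b"\x90" * 446           # hypothetical stand-in for MBR boot code
print(matches_front_block(plain_iso, FRONT_PATTERNS))      # True
print(matches_front_block(bytes(hybrid), FRONT_PATTERNS))  # False
```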

The first 16 sectors (the system area, 32768 bytes) are not used by the ISO
9660 file system itself. But there are hybrid images which contain more than
one file system or other stuff. One such additional structure is the DOS
master boot record (MBR), which occupies the first 512 bytes. In file command
output such samples are marked by the additional phrase "(DOS/MBR boot
sector)". The current definition does not recognize them, because some bytes
in the first 512-byte range are not nil there. This applies to the following
samples:
   AcronisBootableMedia.iso
   krd12Jan2019.iso
   openSUSE-Leap-15.4-NET-x86_64-Build243.2-Media.iso
   super_grub2_disk_hybrid_2.02s4.iso
   ventoy-1.0.75-livecd.iso
   ventoy-1.0.88-livecd.iso
These samples are also correctly described as "Master Boot Record dump" by
mbr-dump.trid.xml. The initial program loader code starts at the beginning
of the MBR, and the partition table entries are stored at the end of this
range. So it should be possible to create an ISO image with an MBR whose
partition entries are all filled with high (non-nil) values. Then all XML
constructs up to offset 512 will vanish.
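The MBR layout mentioned above (boot code first, then four 16-byte partition entries at offset 446 and the boot signature 55 AA at offset 510) can be checked with a short Python sketch. mbr_partitions is a hypothetical name and the sample entry is made up; real hybrid images may use other partition types.

```python
import struct

def mbr_partitions(data: bytes):
    """Return (status, type, first_lba, sectors) for each used MBR entry,
    or None if the 55 AA boot signature is missing (hypothetical helper)."""
    if len(data) < 512 or data[510:512] != b"\x55\xaa":
        return None
    entries = []
    for i in range(4):
        off = 446 + 16 * i               # four 16-byte partition entries
        status, ptype = data[off], data[off + 4]
        first_lba, sectors = struct.unpack_from("<II", data, off + 8)
        if ptype != 0:                   # type 0 marks an unused entry
            entries.append((status, ptype, first_lba, sectors))
    return entries

# Hypothetical MBR with one bootable entry of type 0x17 covering 1000 sectors.
sector = bytearray(512)
sector[510:512] = b"\x55\xaa"
sector[446] = 0x80
sector[446 + 4] = 0x17
struct.pack_into("<II", sector, 446 + 8, 0, 1000)
print(mbr_partitions(bytes(sector)))  # [(128, 23, 0, 1000)]
```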

Then I found some other hybrid ISO samples like:
   WordPerfect_ClipArt_Premium_MacWin_1994.iso
   3ds max R4 Bible.iso
   krd12Jan2019.iso

When running the file command with the keep-going option -k, such samples
are also described as "Apple Driver Map", together with the associated block
size, with values like 512 (0200 hexadecimal, big endian) or 2048 (0800
hexadecimal, big endian), and the used block count (see appended
output/file-k-5.44.txt).

These samples should be described as "Apple ISO9660/HFS hybrid CD image" by
iso-hfs.trid.xml. The current first XML construct looks like:
   <Pattern>
      <Bytes>4552020000</Bytes>
      <ASCII> E R</ASCII>
      <Pos>0</Pos>
   </Pattern>

According to the documentation, the record starts with the 2-byte magic
"ER". At offset 2 the block size is stored as a 2-byte big-endian integer.
At offset 4 the block count is stored as a 4-byte big-endian integer. So the
current definition only matches samples with block size 512 (=0x0200).
Therefore the sample krd12Jan2019.iso (Kaspersky anti-virus; the current
krd.iso can be found on kaspersky.com under Downloads > Kaspersky Rescue
Disk) with block size 2048 (=0x0800) is not recognized. So I updated
iso-hfs.trid.xml. Taking the other block size and higher block count values
into account, this XML construct becomes:
   <Pattern>
      <Bytes>4552</Bytes>
      <ASCII> E R</ASCII>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>3</Pos>
   </Pattern>
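The driver descriptor fields can be decoded with a small Python sketch, which also contrasts the old 5-byte pattern with the updated one. The function names and the sample block counts are hypothetical.

```python
import struct

def parse_driver_descriptor(data: bytes):
    """Return (block_size, block_count) if data starts with "ER", else None."""
    if len(data) < 8 or data[:2] != b"ER":
        return None
    # ">" = big endian, no padding: H (2 bytes) at offset 2, I (4 bytes) at 4
    return struct.unpack_from(">HI", data, 2)

def matches_old_definition(data: bytes) -> bool:
    # old pattern: "ER", block size 0x0200 and a nil byte at offset 4
    return data[:5] == bytes.fromhex("4552020000")

def matches_new_definition(data: bytes) -> bool:
    # new patterns: "ER" at 0 and nil at 3 (low byte of 0x0200 or 0x0800)
    return len(data) >= 4 and data[:2] == b"ER" and data[3] == 0

hdr_512 = b"ER" + struct.pack(">HI", 512, 100)
hdr_2048 = b"ER" + struct.pack(">HI", 2048, 300000)
for hdr in (hdr_512, hdr_2048):
    print(parse_driver_descriptor(hdr),
          matches_old_definition(hdr), matches_new_definition(hdr))
# (512, 100) True True
# (2048, 300000) False True
```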

The current definition mentions no reference URL, so I added one. This is
now expressed by a line like:
   <RefURL>http://fileformats.archiveteam.org/wiki/Apple_Partition_Map</RefURL>

The ASCII characters "PM" may also appear at larger multiples of 512. This
is the start magic of an Apple Partition Map entry. Often such an entry
appears at offset 512. That was expressed by a second XML construct like:
   <Pattern>
      <Bytes>504D</Bytes>
      <ASCII> P M</ASCII>
      <Pos>1024</Pos>
   </Pattern>

The krd12Jan2019.iso example contains at this place the 8-byte string "EFI
PART", which indicates a GPT partition table. So the second XML construct
vanished. I verified this with a 7-Zip command line like:
   7z  l -tGPT krd12Jan2019.iso > krd12Jan2019-GPT.txt
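A minimal Python sketch of this check, assuming 512-byte logical sectors so that the GPT header (signature "EFI PART") sits at offset 512, where an APM entry with "PM" would otherwise be expected. The field offsets (entry array LBA at 72, entry count at 80, entry size at 84) follow the UEFI specification; gpt_header and the sample values are hypothetical.

```python
import struct

def gpt_header(data: bytes, sector_size: int = 512):
    """Return a few GPT header fields from LBA 1, or None if "EFI PART" is absent."""
    hdr = data[sector_size:sector_size + 92]
    if len(hdr) < 92 or hdr[:8] != b"EFI PART":
        return None
    (entries_lba,) = struct.unpack_from("<Q", hdr, 72)
    num_entries, entry_size = struct.unpack_from("<II", hdr, 80)
    return {"entries_lba": entries_lba, "num_entries": num_entries,
            "entry_size": entry_size}

img = bytearray(2048)
img[512:520] = b"EFI PART"
struct.pack_into("<Q", img, 512 + 72, 2)          # entry array at LBA 2 (offset 1024)
struct.pack_into("<II", img, 512 + 80, 128, 128)
print(gpt_header(bytes(img)))  # {'entries_lba': 2, 'num_entries': 128, 'entry_size': 128}
print(img[512:514] == b"PM")   # False: no APM entry can start at offset 512 here
```

In this synthetic layout the partition entry array also starts at offset 1024, so that area is not free for a "PM" entry either.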

Then I also found ISO images which contain only an "Apple Driver Map" but no
ISO 9660 file system any more. Such samples are:
   Mac OS 9.2.2 Universal Install.iso
   macOS Catalina Final [Geekrar].iso

These are now described by the updated iso-hfs.trid.xml. Because the ISO
9660 file system is not used here any more, the phrase "ISO9660/" inside the
file type is of course wrong and must be removed. The documentation about the
Apple Partition Map mentions no MIME type, but the partitions themselves are
Apple disk images, which have a documented MIME type. So the map should get
the same type. That is expressed by a line like:
   <Mime>application/x-apple-diskimage</Mime>

That is also shown by the file command when running with the -m
Magdir/apple and -i options (see appended output/file-apple-i-5.44.txt).

On the Apple Partition Map web page four different extensions are listed:
   .iso, .dmg, .cdr, .toast

So I looked on my system for DMG samples (see
http://fileformats.archiveteam.org/wiki/DMG). I skipped samples that are
bzip2 or zlib compressed and only looked at samples with a pure "Apple
Driver Map". Such samples are:
   Apple Media Toolkit - Fall 1992.dmg
   hdiutil-1.dmg
   expander_installer_multi_4790.dmg
   expander_installer_en_6330-2.dmg
   mmdr3.dmg
   Firefox 48.0-2.dmg
   test-apm-10mb.dmg
   appleiiworksenvoy.dmg
   Gimp_2.2.14_PPC_quartz.dmg
   Apple_Legacy_CD.dmg
Because these samples are not CD images, the phrase "CD image" inside
iso-hfs.trid.xml must be changed to something like "disk image".

I am not sure that such images always contain a partition with an HFS file
system. If so, this is indicated by the phrase "type Apple_HFS" in file
command output. In the expander_installer_en_6330-2.dmg example this phrase
is missing, but I am not sure whether this image is corrupt. So I deleted
the word "HFS" from the FileType element. In the end I changed the
description, which is now expressed by a line like:
   <FileType>Apple Partition Map (APM) disk image</FileType>
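Whether an APM image really contains an HFS partition can be checked by walking the partition map entries and reading their type strings, which is where a phrase like "type Apple_HFS" comes from. A minimal Python sketch, assuming the map starts at block 1 and entries are spaced by the block size from the driver descriptor; the field offsets (map entry count at 4, start at 8, length at 12, name at 16, type at 48) follow the classic Apple Partition Map layout, and all names and sample values are hypothetical.

```python
import struct

def apm_entries(data: bytes, block_size: int = 512):
    """Return (name, type, start_block, block_count) for each "PM" entry."""
    first = data[block_size:block_size + 80]
    if len(first) < 80 or first[:2] != b"PM":
        return []
    (map_blocks,) = struct.unpack_from(">I", first, 4)   # entries in the map
    entries = []
    for i in range(1, map_blocks + 1):
        ent = data[i * block_size:i * block_size + 80]
        if len(ent) < 80 or ent[:2] != b"PM":
            break
        start, count = struct.unpack_from(">II", ent, 8)
        name = ent[16:48].split(b"\0", 1)[0].decode("ascii", "replace")
        ptype = ent[48:80].split(b"\0", 1)[0].decode("ascii", "replace")
        entries.append((name, ptype, start, count))
    return entries

def put_entry(buf, off, map_blocks, start, count, name, ptype):
    """Write a minimal hypothetical partition map entry into buf."""
    buf[off:off + 2] = b"PM"
    struct.pack_into(">III", buf, off + 4, map_blocks, start, count)
    buf[off + 16:off + 16 + len(name)] = name
    buf[off + 48:off + 48 + len(ptype)] = ptype

img = bytearray(4 * 512)
img[0:2] = b"ER"
put_entry(img, 512, 2, 1, 2, b"Apple", b"Apple_partition_map")
put_entry(img, 1024, 2, 3, 1, b"Disk", b"Apple_HFS")
entries = apm_entries(bytes(img))
print(any(ptype == "Apple_HFS" for _, ptype, _, _ in entries))  # True
```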

So I looked on my system for TOAST samples (see
http://fileformats.archiveteam.org/wiki/TOAST). I skipped samples that also
contain an ISO 9660 CD-ROM file system and only looked at samples with a pure
"Apple Driver Map". Such samples are:
     691-6551-A,2Z,iWork v9.0.3. Install Disc_2009 (DVD).toast
     Aladdin Spring Cleaning 2.0 CD-ROM.toast
     MacHome CD-ROM 02-1998.toast
     May - Mac Apps 3.toast
     Norton AntiVirus 10.0 CD for Macintosh.toast
     Power Mac Screen Saver.toast

The first one is not detected by the old definition, but it is by the
updated one. I also looked for CDR samples with APM, but did not find any.
So three extensions are described by the updated iso-hfs.trid.xml. This is
now expressed by a line like:
   <Ext>ISO/DMG/TOAST</Ext>

Of course the name iso-hfs.trid.xml then no longer really makes sense, but
for the moment I keep it.

For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). The samples described by the file
command as "Apple Driver Map" are described there as "Apple Partition Map
Disk Image" with PUID fmt/1740 (see appended output/droid-iso.csv). In
principle it does the same checks as the other tools. It also checks the
block size for the value 512 or 2048. It assumes that the PM magic is always
found at offset 512, which is not always true. Furthermore, according to
DROID the ER at the beginning can be missing, and an additional 16 bytes can
occur before it. It lists two more extensions (bin and img).
The samples described by the file command as "ISO 9660 CD-ROM filesystem" are
described there as "ISO 9660 Disk Image File" with PUID fmt/468. The sector
size is 2048 and the first sixteen sectors are not relevant here. In the next
sector (offset 32768) DROID looks for something like the string *CD001, where
* represents the volume descriptor type as defined in the specification. The
file command recognizes these images in a similar way: it looks at offset
32769 for the string CD001.
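The volume descriptor check used by DROID and file can be sketched in Python as follows. The descriptor type codes (0 boot record, 1 primary, 2 supplementary, 3 partition, 255 terminator) come from ECMA-119; the function names and the descriptor limit are assumptions for illustration. Note that the check only needs bytes from offset 32768 onwards, well past the start of the file.

```python
SECTOR = 2048
VD_TYPES = {0: "boot record", 1: "primary", 2: "supplementary",
            3: "partition", 255: "terminator"}

def volume_descriptors(data: bytes, max_descriptors: int = 64):
    """List descriptor types from sector 16 on; empty if no CD001 is found."""
    found = []
    for n in range(16, 16 + max_descriptors):
        vd = data[n * SECTOR:(n + 1) * SECTOR]
        if len(vd) < 7 or vd[1:6] != b"CD001":
            break
        found.append(VD_TYPES.get(vd[0], f"unknown ({vd[0]})"))
        if vd[0] == 255:                 # set terminator ends the list
            break
    return found

def looks_like_iso9660(data: bytes) -> bool:
    # same position as the file command check: 16 * 2048 + 1 = 32769
    return data[32769:32774] == b"CD001"

img = bytearray(19 * SECTOR)
for sector, vd_type in ((16, 1), (17, 0), (18, 255)):
    img[sector * SECTOR] = vd_type
    img[sector * SECTOR + 1:sector * SECTOR + 6] = b"CD001"
print(looks_like_iso9660(bytes(img)))   # True
print(volume_descriptors(bytes(img)))   # ['primary', 'boot record', 'terminator']
```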

If this method were transferred into a TrID definition like
iso-9660-image-foo.trid.xml, it would mainly be expressed by an XML construct
like:
   <Pattern>
      <Bytes>4344303031</Bytes>
      <ASCII> C D 0 0 1</ASCII>
      <Pos>32769</Pos>
   </Pattern>
Unfortunately this does not work with the current TrID version 2.24. This
probably depends on a size limit. So it would be nice if the TrID tool could
be changed to accept such definitions.

At the moment the updated iso-9660-image.trid.xml recognizes ISO 9660 CD
images by some nil sequences near the beginning, but as far as I can see
there may exist hybrid ISO samples with more filled GPT or APM parts. In
that case all patterns in the front block will vanish and the definition
will not work any more.

With the new TrID definitions most ISO, DMG and TOAST examples are now
described (see appended output/trid-v-new.txt). The TrID definitions, some
samples and the output are stored in the archive iso_9660.zip. I hope that
my updated definitions can be used in a future version of triddefs.

Some ISO samples are still not recognized by the current definitions. I
will try to handle these in a future session.

With best wishes
Jörg Jenderek

Mark0

Thanks!