Hello,
when I run trid on some VirtualBox Disk Images, such as qemu-nostatic-3MB.vdi,
they are only described as "Unknown!" or as something else (see the attached vdi-old.txt).
A good starting point for the VirtualBox Disk Image format is
http://fileformats.archiveteam.org/wiki/VDI
So I add this URL as a reference to the new trid definition files with the line
<RefURL>http://fileformats.archiveteam.org/wiki/VDI</RefURL>
The format of such VirtualBox Disk Images is described in the header file
VDICore.h of the VirtualBox source, found at
https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Storage/VDICore.h
According to that file, szFileInfo[64] is stored at offset 0. It is just
human-readable text about the image type.
Often a text like
<<< Oracle VM VirtualBox Disk Image >>>
is found. This variant is described by vdi-vbox-oracle.trid.xml.
But examples created by QEMU tools like "qemu-img create -f vdi "
start with a different word sequence:
<<< QEMU VM Virtual Disk Image >>>
So I derive a new variant, vdi-vbox-qemu.trid.xml, from vdi-vbox-oracle.trid.xml
with this start pattern.
Furthermore, an old (2010) image like NewHardDisk1.vdi is not matched by
vdi-vbox-sun.trid.xml because it starts with
<<< Sun VirtualBox Disk Image >>>
instead of
<<< Sun xVM VirtualBox Disk Image >>>
So I create a variant vdi-vbox-sun_old.trid.xml for that example.
The file innotec-static-4MB.vdi is not detected by vdi-vbox-img.trid.xml because
the start string <<< innotek VirtualBox Disk Image >>> is terminated by a NUL byte
instead of a linefeed. So I create the variant vdi-vbox-innotek.trid.xml.
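The banner variants listed so far can be told apart by reading the 64-byte
szFileInfo field and cutting it at the first NUL or linefeed. A minimal Python
sketch, not part of trid: the function names and the short labels are made up
for illustration, and only the banner texts quoted in this post are covered.

```python
import re

# Banner texts quoted in this post. The short labels are illustrative only.
KNOWN_BANNERS = {
    b"<<< Oracle VM VirtualBox Disk Image >>>": "oracle",
    b"<<< QEMU VM Virtual Disk Image >>>": "qemu",
    b"<<< Sun xVM VirtualBox Disk Image >>>": "sun",
    b"<<< Sun VirtualBox Disk Image >>>": "sun_old",
    b"<<< innotek VirtualBox Disk Image >>>": "innotek",
}


def read_banner(header: bytes) -> bytes:
    """Return the szFileInfo text: first 64 bytes, cut at the first NUL or linefeed."""
    return re.split(rb"[\x00\n]", header[:64], maxsplit=1)[0]


def classify(header: bytes) -> str:
    """Map the banner to one of the labels above, or "unknown"."""
    return KNOWN_BANNERS.get(read_banner(header), "unknown")
```

For a real file, the header can be read with open(path, "rb").read(64).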
The file x.vdi is detected as "VirtualBox Disk Image (Oracle)" with only 0.6%,
at about position 50 in the result list, whereas the highest rate, 8.9%, goes to
"Acrobat Distiller Job Options" from joboptions.trid.xml.
So I try to compare the two trid definition files. joboptions.trid.xml contains
a GlobalStrings section. So I create the variant vdi-vbox-oracle_new.trid.xml with
the additional section
<GlobalStrings>
<String>ORACLE VM VIRTUALBOX DISK IMAGE</String>
</GlobalStrings>
But this does not help.
So I run tridscan and look for patterns that are true in general.
According to the header file, the character array szFileInfo with 64 characters is
stored at offset 0. If the field is not completely filled, it is padded with NUL
bytes, which is expressed by the XML construct
<Bytes>000000000000000000000000000000000000000000000000</Bytes>
<Pos>40</Pos>
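To check such a pattern against a file by hand, the hex string can be compared
with the bytes at the given position. A small Python sketch, assuming <Pos> is a
decimal byte offset (as the numbers in this post suggest). pattern_matches is a
hypothetical helper, not a trid function.

```python
def pattern_matches(data: bytes, pos: int, hex_bytes: str) -> bool:
    """True if the bytes given as a hex string occur in data at offset pos."""
    expected = bytes.fromhex(hex_bytes)
    # A slice that runs past the end of data is shorter than expected,
    # so truncated input simply does not match.
    return data[pos:pos + len(expected)] == expected
```

The Oracle banner is 39 characters plus a linefeed, so the padding starts at
offset 40, and pattern_matches(header, 40, "00" * 24) should hold for such a file.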
At offset 44h a 4-byte version number (major and minor) is stored. The most common
and current version is 1.1, but according to the documentation 1.0 and old 0.y
versions should also exist. I only found version 1.1 files, but with the help of
the dd command and a hex editor I construct a version 0.2 variant of vdi-5c32h4s.vdi.
The correctness can be verified by running the VirtualBox tool vbox-img:
vbox-img info --filename vdi-5c32h4s-v0.2.vdi | grep Version
Header: Version=00000002 Type=1 Flags=0 Size=5242880
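The byte patching itself could also be done with a few lines of Python instead
of dd and a hex editor. This is only a sketch: it assumes the version is a
little-endian 32-bit value at offset 44h with the minor number in the low 16 bits
and the major number in the high 16 bits, and it changes nothing but that field.
Which other bytes, if any, were edited for vdi-5c32h4s-v0.2.vdi is not stated
here. set_version is a made-up name.

```python
import struct

VERSION_OFFSET = 0x44  # offset of the version field, as stated in the post


def set_version(image: bytes, major: int, minor: int) -> bytes:
    """Return a copy of image with the 4-byte version field overwritten."""
    patched = bytearray(image)
    # Assumed layout: little-endian, minor in the low half, major in the high half.
    struct.pack_into("<HH", patched, VERSION_OFFSET, minor, major)
    return bytes(patched)
```

The result of set_version(data, 0, 2) can then be written to a new file and
checked with vbox-img as shown above.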
So the version is described in general by the patterns
<Pattern>
<Bytes>00</Bytes>
<Pos>69</Pos>
</Pattern>
<Pattern>
<Bytes>00</Bytes>
<Pos>71</Pos>
</Pattern>
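Why these two positions are generic becomes clearer when the field is decoded.
A sketch in Python, again assuming a little-endian 32-bit value at offset 44h
(decimal 68) with the minor number in the low and the major number in the high
16 bits. read_version is a hypothetical helper.

```python
import struct


def read_version(header: bytes) -> tuple:
    """Return (major, minor) decoded from the 4-byte field at offset 68."""
    (raw,) = struct.unpack_from("<I", header, 68)
    return raw >> 16, raw & 0xFFFF
```

As long as both numbers are below 256, the high bytes of the minor and major
halves, at positions 69 and 71, are zero, which is what the two patterns test.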
Looking at the header source file, it can be seen that the header of VDI files
uses slightly different structures depending on the version:
VDIHEADER0 ~version 0.y
VDIHEADER1 ~version 1.x
VDIHEADER1PLUS ~version 1.1 and probably newer
That means that the block size, which is normally 512, occurs at a different
location after the cylinder/head/sector fields depending on the version.
So patterns after offset 72 are no longer generic.
But even with this new Oracle variant, the rate for the .JOBOPTIONS definition is
still 25.6%, and the rate for the VDI definition rises only to 2.4% from the old 1.7%.
So the weighting algorithm of trid seems to need some improvement.
With these 4 new definition files, all my VDI files
are finally recognized (see the output in vid-new.txt).
The trid definitions and the output are stored in the attached archive vdi.zip.
I hope that my XML files can be used in a future version of triddefs.
With best wishes
Jörg Jenderek