Author Topic: TrID definition variants for VirtualBox Disk Image (.VDI)  (Read 4370 times)

jenderek

TrID definition variants for VirtualBox Disk Image (.VDI)
« on: May 04, 2017, 03:06:45 PM »
Hello,
when I run TrID on some VirtualBox disk images, such as qemu-nostatic-3MB.vdi,
they are only described as "Unknown!" or as some other file type (see the
attached vdi-old.txt).

A good starting point for the VirtualBox Disk Image format is
http://fileformats.archiveteam.org/wiki/VDI
So I added this URL as a reference to the new TrID definition files with the line
   <RefURL>http://fileformats.archiveteam.org/wiki/VDI</RefURL>

The format of VirtualBox disk images is described in the header file
VDICore.h of the VirtualBox source, found at
https://www.virtualbox.org/browser/vbox/trunk/src/VBox/Storage/VDICore.h

According to that file, szFileInfo[64] is stored at offset 0. It is just
human-readable text describing the image type.
Often a text like
      <<< Oracle VM VirtualBox Disk Image >>>
is found. This variant is described by vdi-vbox-oracle.trid.xml.

But examples created by QEMU tools, e.g. with "qemu-img create -f vdi",
start with a different text such as
      <<< QEMU VM Virtual Disk Image >>>
So I derived a new variant, vdi-vbox-qemu.trid.xml, from vdi-vbox-oracle.trid.xml
with this start pattern.

Furthermore, an old (2010) image like NewHardDisk1.vdi is not matched by
vdi-vbox-sun.trid.xml because it starts with
   <<< Sun VirtualBox Disk Image >>>
instead of
   <<< Sun xVM VirtualBox Disk Image >>>
So I created the variant vdi-vbox-sun_old.trid.xml for that example.

The file innotec-static-4MB.vdi is not detected by vdi-vbox-img.trid.xml because
the start string <<< innotek VirtualBox Disk Image >>> is terminated by a NUL byte
instead of a linefeed. So I created the variant vdi-vbox-innotek.trid.xml.
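
The banner variants listed above can be summarized in a small sketch. This is only an illustration in Python, not part of TrID: the function name classify_vdi_banner and the labels are hypothetical, and the only banner strings used are the ones quoted in this post. It treats the first 64 bytes as szFileInfo and cuts the text at the first NUL or linefeed, since both terminators are mentioned above.

```python
# Banner texts quoted in this post, mapped to hypothetical labels.
KNOWN_BANNERS = {
    b"<<< Oracle VM VirtualBox Disk Image >>>": "oracle",
    b"<<< QEMU VM Virtual Disk Image >>>": "qemu",
    b"<<< Sun xVM VirtualBox Disk Image >>>": "sun-xvm",
    b"<<< Sun VirtualBox Disk Image >>>": "sun-old",
    b"<<< innotek VirtualBox Disk Image >>>": "innotek",
}


def classify_vdi_banner(header: bytes) -> str:
    """Return a label for the szFileInfo text in the first 64 bytes."""
    info = header[:64]
    # Cut at the first NUL or linefeed, whichever comes first.
    for terminator in (b"\x00", b"\n"):
        info = info.split(terminator, 1)[0]
    return KNOWN_BANNERS.get(info.strip(), "unknown")
```

A file whose banner is not in the table simply yields "unknown", which is roughly the situation that a generic definition would have to cover.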

The file x.vdi is only detected as "VirtualBox Disk Image (Oracle)" with 0.6%,
at about position 50 in the result list, whereas the highest rate, 8.9%, goes to
"Acrobat Distiller Job Options" from joboptions.trid.xml.

So I compared the two TrID definition files. joboptions.trid.xml contains
a GlobalStrings section, so I created the variant vdi-vbox-oracle_new.trid.xml with
this additional section:
   <GlobalStrings>
      <String>ORACLE VM VIRTUALBOX DISK IMAGE</String>
   </GlobalStrings>
But this did not help.

So I ran tridscan and looked for patterns that hold generically.

According to the header file, the character array szFileInfo with 64 characters is
stored at offset 0. If the field is not completely filled, it is padded with null
bytes, which is expressed by the XML construct
   <Bytes>000000000000000000000000000000000000000000000000</Bytes>
   <Pos>40</Pos>
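
The same check can be written as a minimal Python sketch. It assumes that <Pos> is a zero-based decimal byte offset and that the <Bytes> value stands for 24 zero bytes, so the pattern covers offsets 40 to 63, the tail of the 64-byte field. The function name has_padded_banner is hypothetical.

```python
def has_padded_banner(header: bytes) -> bool:
    """True if bytes 40..63 of szFileInfo are all NUL."""
    if len(header) < 64:
        # Too short to contain the whole szFileInfo field.
        return False
    return header[40:64] == bytes(24)
```

Position 40 fits the Oracle banner: the quoted text is 39 characters long, so with a terminating linefeed it fills offsets 0 to 39.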

At offset 44h a 4-byte version number (major and minor) is stored. The most common
and current version is 1.1, but according to the documentation 1.0 and the old 0.y
should also exist. I only found version 1.1, but with the help of the dd command and
a hex editor I constructed a version 0.2 variant of vdi-5c32h4s.vdi.
Its correctness can be verified by running the VirtualBox tool:
   vbox-img info --filename vdi-5c32h4s-v0.2.vdi | grep Version
   Header: Version=00000002 Type=1 Flags=0 Size=5242880

So the version is described generically by the patterns
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>69</Pos>
      </Pattern>
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>71</Pos>
      </Pattern>
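
To show why positions 69 and 71 are the generic ones, here is a minimal Python sketch. It assumes the version is a 32-bit little-endian value at offset 44h with the major number in the high 16 bits and the minor number in the low 16 bits, which agrees with the "Version=00000002" output for version 0.2 above. The function names are hypothetical.

```python
import struct

VERSION_OFFSET = 0x44  # 68 decimal, as stated in the post


def read_vdi_version(header: bytes) -> tuple:
    """Return (major, minor) from the 32-bit little-endian version field."""
    (raw,) = struct.unpack_from("<I", header, VERSION_OFFSET)
    # Assumption: major is the high 16 bits, minor the low 16 bits.
    return raw >> 16, raw & 0xFFFF


def matches_generic_version_pattern(header: bytes) -> bool:
    """True if bytes 69 and 71 are zero, i.e. minor and major are below 256."""
    return len(header) >= 72 and header[69] == 0 and header[71] == 0
```

With little-endian storage, bytes 68 and 70 hold the low bytes of minor and major, which change between versions, while bytes 69 and 71 hold the high bytes, which stay zero for any small version number such as 0.2, 1.0 or 1.1.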

Looking at the header source file, one can see that the header of VDI files
uses slightly different structures depending on the version:
   VDIHEADER0   ~version 0.y
   VDIHEADER1   ~version 1.x
   VDIHEADER1PLUS   ~version 1.1 and probably newer

That means that the block size, which is normally 512, occurs at a different location
after the cylinder/head/sector field depending on the version.
So patterns after offset 72 are no longer generic.

But even with this new Oracle variant, the .JOBOPTIONS rate is still 25.6%, and
the VDI rate only rises to 2.4% from the old 1.7%.

So the weighting algorithm of TrID seems to need some improvement.

With these 4 new definition files, all my VDI files are finally
recognized (see the output in vid-new.txt).

The TrID definitions and the output are stored in the attached archive vdi.zip.
I hope that my XML files can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: TrID definition variants for VirtualBox Disk Image (.VDI)
« Reply #1 on: May 05, 2017, 01:29:12 PM »
Hi!
Thanks as usual for the new defs and all the info!
I'm not sure how much detail is needed in the defs for this filetype. But at the very least, I think I'll add a generic def that matches all those different variants (and probably other/future ones too).