Hello TrID users,
some days ago I ran TrID on dozens of Virtual HD images (*.vhd). Nearly all
were identified correctly as "Virtual PC Virtual HD image". But one example,
"x-vhd.VHD", was recognised with a rate of only 59.2%. The same example was
also matched with 40.7% probability by atm_vue4.trid.xml as "Vue D'Esprit 4
Atmosphere Preset". Furthermore, all ASCII text files starting with the
string "conectix", like the example conectix.txt, are misidentified as VHD
images.
Looking inside the TrID definition file vhd.trid.xml, I see that only one
pattern is used for recognition. It is described by the XML construct:
<Bytes>636F6E6563746978</Bytes>
<ASCII> c o n e c t i x</ASCII>
<Pos>0</Pos>
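Why a text file passes such a single-pattern test can be sketched in a few
lines of Python. This is only a hypothetical re-implementation of "8 bytes at
position 0", not TrID's actual matching code; the function name
matches_cookie_only is made up for illustration.

```python
# Hypothetical single-pattern check; this is NOT how TrID is implemented.
COOKIE = bytes.fromhex("636F6E6563746978")  # the <Bytes> value, i.e. b"conectix"


def matches_cookie_only(data: bytes) -> bool:
    """Return True if the data starts with the 8-byte cookie at offset 0."""
    return data[:len(COOKIE)] == COOKIE


if __name__ == "__main__":
    # A plain text file that merely starts with the word "conectix" matches too.
    text_file = b"conectix is just the first word of this text file\n"
    print(matches_cookie_only(text_file))
    # Unrelated data does not match.
    print(matches_cookie_only(b"MZ\x90\x00"))
```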
After working with the newest file command (see
https://en.wikipedia.org/wiki/File_(command) and the appended
output/file-new.txt) I know that such images contain more specific
patterns. Information about this disk image format can be found on the
Microsoft download server as a Word document named "Virtual Hard Disk Format
Spec_10_18_06.doc".
First I ran tridscan to generate a starting definition file,
vhd-v1.trid.xml. Next I refined this definition file according to that Word
document. All values in the file format are stored in network byte order
(big endian). The VHD variant described here starts with a copy of the Hard
Disk Footer, which begins with the 8-byte cookie string "conectix". Next
comes the 4-byte Features field with possible values in the range from 0 to
3, so its first three bytes are always zero. These 2 fields are now
described by the XML construct:
<Bytes>636F6E6563746978000000</Bytes>
<ASCII> c o n e c t i x</ASCII>
<Pos>0</Pos>
The third field is File Format Version. For the current specification, this
field must be initialized to 0x00010000. Because the VHD format has been
superseded by the VHDX format, I think no higher version will be developed
any more, so this field value can be considered fixed in my opinion.
The fourth field, Data Offset, holds the absolute byte offset, as an 8-byte
value, from the beginning of the file to the next structure. I only found
the value 0x200. What does this mean? The next data structure starts
directly after the first block of 512 bytes. Assuming that no artificial
gaps occur, this should always be true for the non-fixed VHD disk variants
considered here, in my opinion. These 2 fields are now described by the XML
construct:
<Bytes>000100000000000000000200</Bytes>
<Pos>12</Pos>
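The first four fields can also be checked by hand. The following is a
minimal Python sketch, assuming the layout described above (cookie 8 bytes,
Features 4, File Format Version 4, Data Offset 8, all big endian). The names
parse_head and looks_like_dynamic_head are hypothetical, and the sample bytes
are constructed for illustration, not taken from a real image.

```python
import struct

# First 24 bytes: cookie (8), Features (4), File Format Version (4),
# Data Offset (8). ">" selects big endian without padding.
HEAD = struct.Struct(">8sIIQ")


def parse_head(data: bytes) -> dict:
    """Unpack the first four fields of the footer copy at offset 0."""
    cookie, features, version, data_offset = HEAD.unpack_from(data, 0)
    return {
        "cookie": cookie,
        "features": features,
        "version": version,
        "data_offset": data_offset,
    }


def looks_like_dynamic_head(data: bytes) -> bool:
    """Apply the same constraints as the two refined patterns."""
    if len(data) < HEAD.size:
        return False
    h = parse_head(data)
    return (
        h["cookie"] == b"conectix"
        and h["features"] <= 3            # first three bytes are zero
        and h["version"] == 0x00010000    # fixed by the specification
        and h["data_offset"] == 0x200     # next structure after first block
    )


if __name__ == "__main__":
    # Constructed sample: cookie, Features=2, version 1.0, Data Offset 0x200.
    sample = bytes.fromhex(
        "636F6E6563746978" "00000002" "00010000" "0000000000000200"
    )
    print(parse_head(sample))
    print(looks_like_dynamic_head(sample))
```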
According to the documentation, the Disk Type is stored in 4 bytes at offset
60. The values used are in the range from 0 (None) to 6 (Reserved,
deprecated), so the first three bytes are always zero. This is expressed by
the XML construct:
<Bytes>000000</Bytes>
<Pos>60</Pos>
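To see which variant a given image claims to be, the Disk Type can be
decoded directly. A small Python sketch, assuming the field sits at offset 60
as a 4-byte big endian value; the function name disk_type is hypothetical and
the value names follow the specification's table.

```python
import struct

# Disk Type values as listed in the VHD specification.
DISK_TYPES = {
    0: "None",
    1: "Reserved (deprecated)",
    2: "Fixed hard disk",
    3: "Dynamic hard disk",
    4: "Differencing hard disk",
    5: "Reserved (deprecated)",
    6: "Reserved (deprecated)",
}


def disk_type(footer: bytes) -> str:
    """Decode the 4-byte big endian Disk Type field at offset 60."""
    (value,) = struct.unpack_from(">I", footer, 60)
    return DISK_TYPES.get(value, f"unknown ({value})")


if __name__ == "__main__":
    # Constructed 512-byte block with Disk Type set to 3.
    footer = bytearray(512)
    footer[60:64] = (3).to_bytes(4, "big")
    print(disk_type(bytes(footer)))
```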
After the last variable, the 1-byte Saved State at offset 84, the remaining
(512-85=427) bytes in the block are reserved and therefore contain zeroes.
This is expressed by a construct like:
<Bytes>0000000000000000...</Bytes>
<Pos>85</Pos>
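The same condition can be tested on a sample file without TrID. A minimal
Python sketch, assuming a 512-byte block whose reserved area starts at offset
85; the function name reserved_is_zero is made up for illustration.

```python
RESERVED_START = 85   # first byte after Saved State (offset 84)
BLOCK_SIZE = 512      # size of the footer block


def reserved_is_zero(block: bytes) -> bool:
    """Return True if all 427 reserved bytes are present and zero."""
    reserved = block[RESERVED_START:BLOCK_SIZE]
    return len(reserved) == BLOCK_SIZE - RESERVED_START and not any(reserved)


if __name__ == "__main__":
    print(BLOCK_SIZE - RESERVED_START)    # length of the reserved area
    print(reserved_is_zero(bytes(512)))   # an all-zero block passes
```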
For dynamic and differencing disk images, the "Data Offset" field points to
a secondary structure that provides additional information about the disk
image. The dynamic disk header should appear on a sector (512-byte)
boundary. The Dynamic Disk Header Format starts with the identifying cookie
string "cxsparse". This can be expressed by the construct:
<Bytes>6378737061727365</Bytes>
<ASCII> c x s p a r s e</ASCII>
<Pos>512</Pos>
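Instead of hard-coding position 512, a checker could follow the Data Offset
field itself. A Python sketch under the assumptions stated above (Data Offset
is an 8-byte big endian value at offset 16, and the dynamic disk header sits
on a 512-byte boundary); the function name has_dynamic_header is
hypothetical.

```python
import struct


def has_dynamic_header(data: bytes) -> bool:
    """Follow Data Offset and look for the "cxsparse" cookie there."""
    if len(data) < 24:
        return False
    # Data Offset: 8 bytes, big endian, at offset 16 of the footer copy.
    (data_offset,) = struct.unpack_from(">Q", data, 16)
    if data_offset % 512 != 0:
        return False  # header is expected on a sector boundary
    return data[data_offset:data_offset + 8] == b"cxsparse"


if __name__ == "__main__":
    # Constructed two-sector sample: footer copy, then dynamic disk header.
    img = bytearray(1024)
    img[0:8] = b"conectix"
    img[16:24] = (0x200).to_bytes(8, "big")
    img[512:520] = b"cxsparse"
    print(has_dynamic_header(bytes(img)))
```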
I do not know the internals of TrID, but the recognition rate increases
significantly if a GlobalStrings section with 2 lines is added, like:
<String>CONECTIX</String>
<String>CXSPARSE</String>
With this new definition file vhd-v1.trid.xml the example "x-vhd.VHD" is
recognised with a higher rate of 77.9%, and the example conectix.txt is no
longer misidentified (see appended output/trid-new.txt).
Some VHD images, like my example Drvspace98.vhd, are not recognized by the
vhd.trid.xml definition. The vhd*.trid.xml definitions mentioned above
describe one specific VHD variant. This can easily be verified by using the
command line tool of the QEMU virtualization software to create disk images.
A recognised image can be created by a command like:
qemu-img create -f vpc -o subformat=dynamic qemu16MB-dynamic.vhd 16M
If I replace "dynamic" by "fixed" in the command, I get another VHD variant,
like:
qemu-img create -f vpc -o subformat=fixed qemu16MB-fixed.vhd 16M
This example, filled with null bytes, is at first recognised correctly by
definition files like null_bytes.trid.xml. When such images are used in
virtual machines, the raw disk must be initialised by programs like fdisk.
Then these fixed VHD images get a master boot record and are described by
mbr-dump.trid.xml as "Master Boot Record dump".
So I updated the definition file mbr-dump.trid.xml. First I added the
filename extension "VHD". Then I looked for further similar disk images. On
Linux I found "IMG" listed as a name extension in the file associations, as
in the example 2018-11-13-raspbian-stretch-lite.img. Furthermore, a
user-defined MIME type is listed there. This is now expressed by the added
XML line:
<Mime>application/x-raw-disk-image</Mime>
If the Master Boot Record (first sector) of such a disk is stored
alone, then the filename extension "MBR" is often used. So these 3
filename extensions are now described by the line:
<Ext>MBR/IMG/VHD</Ext>
Furthermore I added information about the different name extensions in the
remark line.
According to Microsoft's VHD specification, the characteristic described
above, found at the beginning of dynamic disk images, should also be found
at the end of the image, because dynamic images begin with a copy of the
footer. So in fixed VHD images the magic string "conectix" should be found
in the end zone. So I created a variant of mbr-dump.trid.xml as
vhd-mbr.trid.xml with 1 additional line in the global strings section, like:
<String>CONECTIX</String>
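Where the cookie sits in a given file can be checked with a short Python
sketch. It assumes the footer occupies roughly the last 512 bytes of the
file; the tail is searched rather than compared at one fixed position, so a
slightly different footer length would not break the check. The function
name vhd_cookie_positions is hypothetical.

```python
import os


def vhd_cookie_positions(path: str) -> dict:
    """Report whether b"conectix" is at the start and in the last 512 bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8)
        f.seek(max(size - 512, 0))
        tail = f.read(512)
    return {
        "at_start": head == b"conectix",   # footer copy, non-fixed variants
        "in_tail": b"conectix" in tail,    # footer at the end of the image
    }


if __name__ == "__main__":
    import sys

    # Usage: python script.py image1.vhd image2.vhd ...
    for name in sys.argv[1:]:
        print(name, vhd_cookie_positions(name))
```

A file with the cookie only in the tail would correspond to the fixed
variant, while the cookie at both ends matches the dynamic variant described
earlier.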
There also exists a third variant of VHD images, the differencing disk
image. Unfortunately I have no example of that type, but it should look
similar to the "dynamic" variant.
With the new and updated TrID definitions all of the dozens of inspected
disk images are now recognized (see appended output/trid-new.txt). The TrID
definitions and output are stored in the archive vhd_mbr.zip. I hope that
the XML files can be used in a future version of triddefs.
With best wishes
Jörg Jenderek