Author Topic: vhdx.trid.xml for Virtual HD image eXtended (*.vhdx)  (Read 5263 times)

jenderek

  • Sr. Member
  • ****
  • Posts: 375
vhdx.trid.xml for Virtual HD image eXtended (*.vhdx)
« on: November 07, 2018, 01:09:43 AM »
Hello trid users,

A few days ago I ran TrID on my backup files with the filename extension
VHDX. These examples, such as Esp.vhdx, were created by Windows' own backup
tool, but TrID describes them as "Unknown!" or "ISO 9660 CD image" (see the
attached output/trid-old.txt).

So first I ran tridscan to generate a TrID definition file. Unfortunately,
the resulting XML file contains many patterns that are based on only a few
examples. For instance, the beginning of the file is described by a pattern
like:
   <ASCII> v h d x f i l e M . i . c . r . o . s . o . f . t .</ASCII>
   <Pos>0</Pos>

So I looked for information about this file type. Luckily, I found what I
needed on Wikipedia, so I added that page as a reference. This is expressed
by the XML construct:
     <RefURL>https://en.wikipedia.org/wiki/VHD_(file_format)</RefURL>

According to that page, the inspected files are in the successor format of
VHD, which is already described by vhd.trid.xml. So I could have described
such files as "Virtual Hard Disk v2". In the end I labeled the file type
"Virtual HD image eXtended", because the newer format extends the older VHD
format with new capabilities such as a 64 TB maximum size. I mention this
relationship in the remark line. Furthermore, this label matches the
abbreviation used as the file name extension, expressed by the line:

   <Ext>VHDX</Ext>

That Wikipedia page references a specification of the file format. That
document, [MS-VHDX].pdf, provides more information. According to it, the
creator name, like "Microsoft Windows 6.3.9600.18512", is optional. So I
removed that string from the pattern and mention this fact in the remark
line.
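
To illustrate why the creator name is a poor pattern candidate, here is a
minimal Python sketch that reads it separately from the signature. It assumes
the layout from [MS-VHDX] of an 8-byte "vhdxfile" signature followed by a
512-byte UTF-16 creator field; the function name read_creator is made up for
this example and is not part of TrID.

```python
import io

FILE_ID_SIGNATURE = b"vhdxfile"  # 8-byte signature at offset 0
CREATOR_SIZE = 512               # assumed size of the UTF-16 creator field


def read_creator(stream):
    """Return the creator string, "" if it is empty, None if not VHDX."""
    stream.seek(0)
    block = stream.read(len(FILE_ID_SIGNATURE) + CREATOR_SIZE)
    if not block.startswith(FILE_ID_SIGNATURE):
        return None
    raw = block[len(FILE_ID_SIGNATURE):]
    raw = raw[: len(raw) - len(raw) % 2]  # keep whole UTF-16 code units only
    text = raw.decode("utf-16-le", errors="replace")
    return text.split("\x00", 1)[0]       # stop at the first NUL terminator


# Synthetic example data, not a real VHDX image.
creator = "Microsoft Windows 6.3.9600.18512".encode("utf-16-le")
sample = io.BytesIO(FILE_ID_SIGNATURE + creator.ljust(CREATOR_SIZE, b"\x00"))
print(read_creator(sample))
```

A file with an empty creator field still passes the signature check, which is
the reason for matching only the first 8 bytes at offset 0.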

In my examples the bytes after the creator name are null, but this is not
reliable: these bytes are only described as reserved space. So I removed the
null bytes from the patterns. To distinguish VHDX files from text files that
start with the phrase vhdxfile, I looked for more specific patterns.

Most file sections start with a specific signature. For the header section
these are the 4 bytes "head". Furthermore, for consistency after a power
failure there are 2 header sections: one is stored at offset 64 KB and the
other at 128 KB. This is expressed by the patterns:

   <Bytes>68656164</Bytes>
   <ASCII> h e a d</ASCII>
   <Pos>65536</Pos>
   <Bytes>68656164</Bytes>
   <ASCII> h e a d</ASCII>
   <Pos>131072</Pos>
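
The same check can be sketched outside of TrID in a few lines of Python. This
is only an illustration of the two positions given above (64 KB = 65536 and
128 KB = 131072); the helper name header_signatures_present is hypothetical.

```python
import io

HEADER_OFFSETS = (64 * 1024, 128 * 1024)  # 65536 and 131072
HEADER_SIGNATURE = b"head"                # hex 68 65 61 64


def header_signatures_present(stream):
    """Return one boolean per header copy: is "head" found at its offset?"""
    results = []
    for offset in HEADER_OFFSETS:
        stream.seek(offset)
        results.append(stream.read(len(HEADER_SIGNATURE)) == HEADER_SIGNATURE)
    return results


# Synthetic buffer with both signatures in place, not a real VHDX image.
data = bytearray(192 * 1024)
for off in HEADER_OFFSETS:
    data[off:off + 4] = HEADER_SIGNATURE
print(header_signatures_present(io.BytesIO(bytes(data))))
```

Reporting the two copies separately makes it visible when only one of them
carries the signature; the TrID patterns above require both.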

At the moment there is only one version of the VHDX format. That number is 1,
and it is stored as 2 bytes in the header section. This is expressed by the
construct:

   <Bytes>0100</Bytes>
   <Pos>65602</Pos>
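
The position is the first header offset plus the offset of the version field
inside the header: 65536 + 66 = 65602. The bytes 01 00 are the number 1 stored
in little-endian order. A small Python sketch of that reading follows; the
field offset 66 is simply taken from the position above, and the function
name read_version is invented for this example.

```python
import io
import struct

HEADER1_OFFSET = 64 * 1024  # 65536, first header copy
VERSION_FIELD_OFFSET = 66   # 65602 - 65536, derived from the <Pos> above


def read_version(stream, header_offset=HEADER1_OFFSET):
    """Return the 16-bit little-endian version, or None if the file is short."""
    stream.seek(header_offset + VERSION_FIELD_OFFSET)
    raw = stream.read(2)
    if len(raw) != 2:
        return None
    return struct.unpack("<H", raw)[0]


# Synthetic buffer holding only the version bytes, not a real VHDX image.
data = bytearray(128 * 1024)
data[65602:65604] = bytes.fromhex("0100")
print(read_version(io.BytesIO(bytes(data))))
```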

The region tables start with the 4 bytes "regi". They are stored at file
offsets 192 KB and 256 KB. This is expressed by the patterns:

   <Bytes>72656769</Bytes>
   <ASCII> r e g i</ASCII>
   <Pos>196608</Pos>
   <Bytes>72656769</Bytes>
   <ASCII> r e g i</ASCII>
   <Pos>262144</Pos>
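
Taken together, the positional patterns described so far form a small table of
(position, bytes) pairs that must all match. The following Python sketch
mimics that idea on an in-memory buffer. It is not how TrID itself is
implemented, and the names PATTERNS and matches_all are made up for this
example.

```python
# (position, expected bytes) pairs taken from the patterns in this post
PATTERNS = [
    (0, b"vhdxfile"),
    (65536, b"head"),
    (65602, bytes.fromhex("0100")),
    (131072, b"head"),
    (196608, b"regi"),
    (262144, b"regi"),
]


def matches_all(data, patterns=PATTERNS):
    """Return True only if every pattern is found at its fixed position."""
    return all(data[pos:pos + len(expected)] == expected
               for pos, expected in patterns)


# Synthetic buffer carrying every pattern, not a real VHDX image.
image = bytearray(262144 + 4)
for pos, expected in PATTERNS:
    image[pos:pos + len(expected)] = expected
print(matches_all(bytes(image)))
```

A text file that merely starts with the phrase vhdxfile fails this test,
because it lacks the signatures at the higher offsets.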

The Data Sector, Metadata Region, Data Descriptor and Entry Header also start
with specific signatures. This is expressed in the global strings section by
the lines:
   <String>DATA</String>
   <String>METADATA</String>
   <String>DESC</String>
   <String>LOGE</String>
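
Unlike the patterns above, these strings have no fixed position. A rough
Python sketch of such a check follows. It assumes that the strings may appear
anywhere in the data and that upper and lower case are treated alike; that is
an assumption made for this example, not a statement about TrID internals. The
name missing_strings is hypothetical.

```python
GLOBAL_STRINGS = (b"DATA", b"METADATA", b"DESC", b"LOGE")


def missing_strings(data, wanted=GLOBAL_STRINGS):
    """Return the wanted strings that occur nowhere in data (case-insensitive)."""
    upper = data.upper()  # bytes.upper() only changes ASCII letters
    return [s for s in wanted if s not in upper]


# Synthetic data with lower-case signatures, not a real VHDX image.
sample = b"\x00" * 16 + b"metadata" + b"\x00" * 16 + b"loge" + b"desc"
print(missing_strings(sample))
```

Because "DATA" is also a substring of "METADATA", it adds little on its own.
The positional patterns remain the stronger evidence.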

Unfortunately, I found no MIME type, so I handle it as binary with the line:

   <Mime>application/octet-stream</Mime>

On the other hand, for VHD images on my PC I found
"application/x-virtualbox-vhd". So maybe vhd.trid.xml should be updated to
contain this MIME type.

With the new TrID definition, all inspected VHDX files are now recognized
(see the attached output/trid-new.txt). The TrID definition and the output
are stored in the archive vhdx_trid.zip. I hope that the XML file can be used
in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

  • Administrator
  • Hero Member
  • *****
  • Posts: 2840
    • Mark0's Home Page
Re: vhdx.trid.xml for Virtual HD image eXtended (*.vhdx)
« Reply #1 on: November 07, 2018, 03:47:32 AM »
Hi Joerg!
Thanks for the new definition!