Hello TrID users,
some months ago I sent definitions for sub-classifying Windows NT Registry
Hive files. This time I handle the Boot Configuration Data (BCD) variant.
Such files have the name suffix BCD (like NeoSmart.bcd or "EasyBCD Backup
(2014-06-30).bcd"), or the file name itself is BCD. Such samples can typically
be found in the directory EFI\Microsoft\BOOT on the EFI partition.
Unfortunately, as usual, Microsoft does not publish information about the file
format. You either get documents with examples of accessing such files via API
(BCD.docx) or low-level instructions like "click on foo to get bar". Luckily
there exists an unofficial page about the Windows registry file format on
GitHub, which describes some technical aspects. So I use this as reference in
one definition. That is expressed by a line like:
<RefURL>
https://github.com/msuhanov/regf/blob/master/Windows%20registry%20file%20format%20specification.md
</RefURL>
On Wikipedia there is a page about Boot Configuration Data (BCD).
Unfortunately this page exists only in German, French and Spanish, but not in
English. It describes what BCD is about, and also the tools for handling such
files. That is expressed in the other definition by a line like:
<RefURL>
https://de.wikipedia.org/wiki/Boot_Configuration_Data
</RefURL>
So I ran the trid utility on such BCD examples. All samples are recognized and
described, in principle correctly, as "Windows NT Registry Hive (generic)" by
hiv.trid.xml. But the file name suffix is wrong. It is not HIV/DAT (see
appended trid-v-old.txt in output).
For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). Here the samples are not recognized.
For comparison I also ran the file command (version 5.45) on such samples.
Here the samples are recognized and described generically as "MS Windows
registry file, NT/2000 or above" (see appended file-5.45.txt in output). The
mime type reported is the generic application/octet-stream (see appended
file-i-5.45.txt in output). The file name suffix is also not recognized (see
appended file-ext-5.45.txt in output).
Instead of the generic application/octet-stream mime type I chose the type
used for generic Windows NT Registry Hive. That is expressed by a line like:
<Mime>application/x-ms-registry</Mime>
Because complete information is missing, I first created the TrID definition
hiv-bcd.trid.xml.~ by running tridscan on many (87) samples. Apparently this
gives a definition that only reflects what all variants have in common. In the
global strings section I get 3 lines:
<String>H'''X</String>
<String>HBIN</String>
<String>REGF</String>
According to the documentation, the second is triggered by the 4-byte
signature hbin of the hive bin header. The third is triggered by the 4-byte
signature regf of the base block, also known as the file header. All these
lines are apparently characteristic of Windows registry hives, but none is
specific to BCD. In my examples the first is triggered by the second hbin
cell, but I do not know what exactly it means. So I delete it, because it does
not seem to be specific to BCD.
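The two signatures that survive can be checked directly. A minimal Python sketch, assuming the first hive bin starts right after the 4096-byte base block (the function name and the synthetic sample are made up for illustration):

```python
BASE_BLOCK_SIZE = 4096  # size of the base block (file header) in bytes


def looks_like_hive(data: bytes) -> bool:
    """True if 'regf' is at offset 0 and 'hbin' directly follows the base block."""
    return (
        data[:4] == b"regf"
        and data[BASE_BLOCK_SIZE:BASE_BLOCK_SIZE + 4] == b"hbin"
    )


# Synthetic sample: an otherwise empty base block followed by a 32-byte hbin header.
sample = b"regf" + bytes(BASE_BLOCK_SIZE - 4) + b"hbin" + bytes(28)
print(looks_like_hive(sample))
```

Both checks succeed for any registry hive, which is why they say nothing about BCD.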
In the front block section the same problem occurs. There are some nil
sequences. Obviously these are caused by lucky circumstances (too few samples,
fields not filled to their maximum, or padding bytes). The first pattern is
characteristic of all hives. That is expressed by an XML construct like:
<Pattern>
<Bytes>72656766</Bytes>
<ASCII> r e g f</ASCII>
<Pos>0</Pos>
</Pattern>
Then there are 2 non-nil sequences. These look like:
<Pattern>
<Bytes>01010000000300000000000000010000002000000000</Bytes>
<Pos>19</Pos>
</Pattern>
<Pattern>
<Bytes>0001000000</Bytes>
<Pos>43</Pos>
</Pattern>
Apparently some fields are constant here, but none is specific to BCD.
So I looked at the output of a file command patched according to the
documentation (see file.tmp in output). 6 of my samples look "strange" (like
"EasyBCD Backup (2019-04-15).bcd" and "EasyBCD Backup (2019-04-21).bcd").
These are "empty", without content. This becomes visible when you load such
samples with the Microsoft registry editor regedit.exe. Or you can use the
Forensic Registry EDitor (fred). The advantage of this program is that there
are ports for Windows and Linux. The disadvantage is that it does not work on
all registry examples, because the file format is not officially documented.
So maybe the function of some fields is not known, which leads to program
crashes. This tool can be found at
https://www.pinguin.lu/fred .
I do not remember under which circumstances such samples appear. I guess that
they occur when you create a new BCD store, so at the beginning of its
existence it contains no entries and is "empty". Now I found how I can create
such short samples, for example with a command like:
bcdedit /createstore MyStore.bcd
So I created hiv-bcd-empty.trid.xml by running tridscan on such samples. Here
I have only a few samples, but the advantage is that the file size is "low".
So I did not get so many patterns in the definition (no needle-in-a-haystack
problem).
When I look inside the global strings section I see only 2 additional lines:
<String>NEWSTOREROOT</String>
<String>RMTM</String>
According to the documentation, the second is triggered by the 4-byte
signature rmtm of the GUID signature field. According to the documentation
this field exists in Windows 10. So maybe if other users have samples from
older Windows versions like Vista, Windows 7 or 8, this line will vanish. My
samples are from Windows 10 and Windows 8, but such samples were maybe
"reorganized" by a newer Windows version. So I keep this line for the moment.
The first additional line is interesting. It is triggered by the string
NewStoreRoot. This seems to be a tag specific to BCD. In my samples the GUID
signature appears as expected at a fixed offset. This is expressed inside the
front block section by an XML construct like:
<Pattern>
<Bytes>00224DB08107726D746D0000000000000000000000000
<ASCII> . " M . . . r m t m</ASCII>
<Pos>158</Pos>
</Pattern>
According to the documentation, a 16-byte GUID is stored before that. So
apparently in my samples its last 6 bytes are the same by lucky circumstance.
So I remove the bytes before the signature. After the signature the 8-byte
last reorganized timestamp is stored. So in my samples the value zero
obviously means not reorganized. This makes sense to me and is probably always
true for such samples. So this construct becomes like:
<Pattern>
<Bytes>726D746D0000000000000000000000000
<ASCII> r m t m</ASCII>
<Pos>164</Pos>
</Pattern>
The pattern before looks like:
<Pattern>
<Bytes>E91182</Bytes>
<Pos>154</Pos>
</Pattern>
According to the documentation, the 16-byte GUID TmId is stored at offset 148.
So apparently by lucky circumstances 3 bytes are the same. Assuming that other
GUID values can occur, this construct vanishes.
The pattern before looks like:
<Pattern>
<Bytes>00224DB0810700000000</Bytes>
<ASCII> . " M</ASCII>
<Pos>138</Pos>
</Pattern>
The same reasoning applies here. According to the documentation, the 16-byte
GUID LogId is stored at offset 128 and the 4-byte flags at offset 144. In some
other registry samples I also found values like 8760c578, D5501C4E and
FFFFFF90 there. But in my samples the flags are zero, which means neither
locked nor defragmented. So apparently by lucky circumstances the last 6 bytes
of LogId are the same in my samples. Assuming that other GUID values can
occur, this construct shrinks and becomes like:
<Pattern>
<Bytes>00000000</Bytes>
<Pos>144</Pos>
</Pattern>
The pattern before looks like:
<Pattern>
<Bytes>E91182</Bytes>
<Pos>134</Pos>
</Pattern>
So some middle bytes of the GUID LogId are the same by lucky circumstances.
Assuming that other GUID values can occur, this construct vanishes.
The pattern before looks like:
<Pattern>
<Bytes>00224DB08107</Bytes>
<ASCII> . " M</ASCII>
<Pos>122</Pos>
</Pattern>
The same reasoning applies here. According to the documentation, the 16-byte
GUID RmId is stored at offset 112. In my samples the last 6 bytes of that GUID
are the same. Assuming that other GUID values can occur, this construct
vanishes.
The pattern before looks like:
<Pattern>
<Bytes>E91182</Bytes>
<Pos>118</Pos>
</Pattern>
Going by the documented layout, the 3 middle bytes of the GUID RmId are the
same in my samples. Assuming that other GUID values can occur, this construct
vanishes.
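The GUID area discussed above can be decoded with a short Python sketch. The offsets 112, 128, 144 and 148 are the ones quoted from the documentation. The signature offset 164 is derived from the first pattern (158 plus the 6 bytes in front of rmtm). The function name and the synthetic block are hypothetical, and bytes not shown in the patterns are left as zero placeholders. The sketch also suggests a possible reason for the "lucky circumstances": the shared bytes are consistent with version 1 (time and node based) GUIDs, whose last 6 bytes identify the creating machine. That is only a reading of the byte values, not something the documentation states.

```python
import struct
import uuid


def read_guid_area(base_block) -> dict:
    """Decode RmId, LogId, Flags, TmId and the signature from a base block."""
    (flags,) = struct.unpack_from("<I", base_block, 144)
    return {
        "RmId": uuid.UUID(bytes_le=bytes(base_block[112:128])),
        "LogId": uuid.UUID(bytes_le=bytes(base_block[128:144])),
        "Flags": flags,
        "TmId": uuid.UUID(bytes_le=bytes(base_block[148:164])),
        "Signature": bytes(base_block[164:168]),
    }


# Synthetic base block: only the bytes visible in the patterns are filled in.
block = bytearray(4096)
for start in (112, 128, 148):
    block[start + 6:start + 9] = bytes.fromhex("E91182")          # middle bytes
    block[start + 10:start + 16] = bytes.fromhex("00224DB08107")  # last 6 bytes
block[164:168] = b"rmtm"

fields = read_guid_area(block)
print(fields["RmId"].version, hex(fields["RmId"].node), fields["Signature"])
```

If this reading holds, samples created on another machine or at another time will carry different bytes, which supports dropping these patterns.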
At that point only 2 constructs are not yet inspected. The first XML construct
looks like:
<Pattern>
<Bytes>726567660100000001000000</Bytes>
<ASCII> r e g f</ASCII>
<Pos>0</Pos>
</Pattern>
According to the documentation, the 4-byte primary sequence number is stored
after the regf signature, followed by the secondary sequence number. The first
number is incremented by 1 at the beginning of a write operation on the
primary file, and the second is incremented by 1 at the end of the write
operation, so both numbers should be equal after a successful write.
Apparently for such empty samples both sequence numbers are always 1. So the
above construct is probably always true, and I keep it.
So only the second XML construct remains to be inspected. It looks like:
<Bytes>D401010000000300000000000000010000002000000000100000010000000
<Pos>18</Pos>
According to the documentation, the 8-byte last written timestamp is stored
at offset 12. So by lucky circumstances (same year range) its last 2 bytes
are the same in my examples. At offset 20 the major version is stored. The
value in all NT Windows versions is 1. At offset 24 the minor version is
stored. In my examples I get the value 3, which is also the value mostly found
in my other registry samples. Value 0 means a "pre" version, 1 means NT 3.1,
2 means NT 3.5 and the higher values 3, 4, 5, 6 mean NT 4 up to Windows 11.
At offset 28 the file type is stored. 0 means primary file; other values are
used for transaction variants (*.LOG*). So this value always applies here. At
offset 32 the file format is stored. 1 means direct memory load; this is what
I also found in my other registry samples. At offset 36 the root cell offset
is stored. In all my registry samples I get the value 20h. At offset 40 the
hive bins data size is stored. For my examples I get the value 1000h here,
which is apparently the minimal possible value. Together with the same size of
4096 for the base block, the sum is 8192, which is the file size. So
apparently this is true for all such empty samples. At offset 44 the
clustering factor is stored. In all my registry samples I get the value 1
here, which means a 512-byte block size. At offset 48 the partial file path is
stored. These 64 bytes contain a UTF-16 LE encoded name. In the empty samples
this string is not used, so it is filled with nil bytes. Assuming that other
modification timestamps are possible, the first 2 bytes of that pattern (the
last 2 bytes of the modification timestamp) vanish and it becomes like:
<Bytes>010000000300000000000000010000002000000000100000010000000
<Pos>20</Pos>
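The field walk above can be reproduced with Python's struct module. This is only a sketch under the offsets quoted from the documentation. The field names and the helper functions are made up for illustration, and the header is synthetic (timestamp set to zero), not taken from a real sample.

```python
import struct

# First 48 bytes of the base block, little-endian:
# signature, two sequence numbers, 8-byte timestamp, then seven 32-bit fields.
HEADER = struct.Struct("<4sIIQIIIIIII")
FIELDS = ("signature", "primary_seq", "secondary_seq", "last_written",
          "major", "minor", "file_type", "file_format",
          "root_cell_offset", "hive_bins_size", "clustering_factor")
BASE_BLOCK_SIZE = 4096


def parse_header(data) -> dict:
    """Map the first 48 bytes of a hive to named fields."""
    return dict(zip(FIELDS, HEADER.unpack_from(data, 0)))


def expected_file_size(header: dict) -> int:
    """Base block plus hive bins data size."""
    return BASE_BLOCK_SIZE + header["hive_bins_size"]


# Synthetic header with the values observed in the empty samples.
empty_store = HEADER.pack(b"regf", 1, 1, 0, 1, 3, 0, 1, 0x20, 0x1000, 1)
header = parse_header(empty_store)
print(header["major"], header["minor"], expected_file_size(header))
print(empty_store[20:48].hex())
```

The hex string printed last covers offsets 20 to 47 and can be compared with the beginning of the shortened pattern at position 20.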
Next I created hiv-bcd.trid.xml by running tridscan on the non-empty samples.
The last XML construct is like in the other definition. At offset 200h come
600h nil padding bytes. This is expressed by an XML construct like:
<Bytes>0000000000000000000000000000000000000000000000000000000
<Pos>512</Pos>
In the other definition a long nil sequence follows the rmtm signature up to
about the 512 limit. Here I get only short nil byte sequences in this area,
like:
<Pattern>
<Bytes>00</Bytes>
<Pos>177</Pos>
</Pattern>
...
<Pattern>
<Bytes>00000000</Bytes>
<Pos>504</Pos>
</Pattern>
Assuming that the fields in this area are not relevant for BCD, I delete such
patterns.
At offset 144 the 4-byte flags are stored. In 2 samples (like BCD) I get the
value 0x310034 here, whereas for the others the value is zero. But the
documentation mentions only the values 1 and 2, for locked and defragmented.
So I do not know whether this was an accident. This is expressed by XML
constructs like:
<Pattern>
<Bytes>00</Bytes>
<Pos>145</Pos>
</Pattern>
<Pattern>
<Bytes>00</Bytes>
<Pos>147</Pos>
</Pattern>
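Why exactly the bytes at 145 and 147 survive can be seen from the little-endian encoding of the two observed flag values. A small Python sketch (the helper name is hypothetical; it only imitates the "keep what all samples share" idea and is not how tridscan is implemented):

```python
import struct


def always_zero_offsets(values, base_offset):
    """File offsets where every 32-bit little-endian value has a zero byte."""
    encoded = [struct.pack("<I", value) for value in values]
    return [base_offset + i for i in range(4)
            if all(raw[i] == 0 for raw in encoded)]


print(struct.pack("<I", 0x310034).hex())
print(always_zero_offsets([0x00000000, 0x00310034], base_offset=144))
```

With only the values 0 and 0x310034 in the sample set, bytes 144 and 146 differ between samples, so only the two single-byte nil patterns remain.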
After running tridscan on short but non-empty samples many items vanish. In
the global strings section only 7 lines survive. Three of them we already
know. These look like:
<String>H'''X</String>
<String>HBIN</String>
<String>REGF</String>
These are apparently characteristic of registry samples. I do not know the
meaning of the first one. It is part of a hive bin cell. I am quite sure that
it is not relevant for BCD, so I delete that line. The other 4 lines look
like:
<String>B'C'D'0'0'0'0'0'0'0</String>
<String>DESCRIPTION</String>
<String>KEYNAME</String>
<String>OBJECTS</String>
Apparently the first seems to be characteristic of all non-empty BCDs. When
you look at the exported output of regedit you see the 2 key branches
Description and Objects. The first branch contains a string combination like
KeyName=BCD00000001 (see appended MyStore2.reg in output). In most examples I
get that combination, but in some (5) samples another digit like 0 is used
instead of 1. I guess that this is something like a counter. So I mention my
observations in the remark line.
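To see which counter digit a given sample uses, the value can be searched in the raw file. A minimal Python sketch, assuming the string is stored as UTF-16 LE (which the nil-separated form B'C'D'0'... suggests); the function name and the test data are made up:

```python
def find_bcd_key_names(data: bytes) -> list:
    """Return every 'BCD0000000' + one more character found as UTF-16 LE text."""
    prefix = "BCD0000000".encode("utf-16-le")
    wanted = len(prefix) + 2  # one extra UTF-16 code unit for the trailing digit
    names = []
    start = data.find(prefix)
    while start != -1:
        chunk = data[start:start + wanted]
        if len(chunk) == wanted:
            names.append(chunk.decode("utf-16-le", errors="replace"))
        start = data.find(prefix, start + 2)
    return names


# Synthetic data: some padding, then the value as it might appear in a hive cell.
blob = bytes(16) + "BCD00000001".encode("utf-16-le") + bytes(8)
print(find_bcd_key_names(blob))
```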
At offset 48 the partial file path is stored. These 64 bytes contain a UTF-16
LE encoded name, whereas in the other definition they are nil. These name
parts look like:
EasyBCD Backup (2014-06-30).bcd
fb50349}\EFI\Microsoft\Boot\BCD
Device\HarddiskVolume1\Boot\BCD
Because the name parts are ASCII-like characters stored as UTF-16 LE, I get a
nil byte at every odd offset. That is expressed by XML constructs like:
<Pattern>
<Bytes>00</Bytes>
<Pos>49</Pos>
</Pattern>
<Pattern>
<Bytes>00</Bytes>
<Pos>51</Pos>
</Pattern>
...
<Pattern>
<Bytes>00</Bytes>
<Pos>107</Pos>
</Pattern>
<Pattern>
<Bytes>000000</Bytes>
<Pos>109</Pos>
</Pattern>
I do not know if it is possible to create a BCD on a file system with
directory names in non-Latin scripts like Chinese. If so, the nil bytes will
vanish and only the terminating nil character for UTF-16 will survive. So the
above mentioned patterns vanish and the last one becomes like:
<Pattern>
<Bytes>0000</Bytes>
<Pos>110</Pos>
</Pattern>
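The effect of the encoding on the nil bytes can be shown in a few lines of Python. The helper name is hypothetical. The first name is one of the path fragments listed above, and the two Chinese characters are an arbitrary example, not from a real sample.

```python
def zero_offsets(name: str, base: int = 48) -> list:
    """File offsets of nil bytes when name is stored as UTF-16 LE at base."""
    raw = name.encode("utf-16-le")
    return [base + i for i, byte in enumerate(raw) if byte == 0]


ascii_name = r"Device\HarddiskVolume1\Boot\BCD"  # 31 characters, 62 bytes
cjk_name = "\u542f\u52a8"                        # two CJK characters

print(zero_offsets(ascii_name)[:3], zero_offsets(ascii_name)[-1])
print(zero_offsets(cjk_name))
```

For the ASCII name every odd offset from 49 to 109 is nil; for the CJK name no byte is nil, so only the 2-byte terminator at offset 110 can be relied on.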
The pattern before looks like:
<Pattern>
<Bytes>0001000000</Bytes>
<Pos>43</Pos>
</Pattern>
At offset 40 the hive bins data size is stored. Summed up with 4096 for the
base block size I get file sizes of about 28672, 36864 or 69632. At offset 44
the clustering factor is stored. In all my registry samples I get the value 1
here, which means a 512-byte block size. So assuming that the hive bins size
can reach the 32-bit limit while the clustering factor is always 1, this
becomes like:
<Pattern>
<Bytes>01000000</Bytes>
<Pos>44</Pos>
</Pattern>
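The leading nil byte at position 43 is simply the highest byte of the hive bins data size, which stays zero as long as the hive is smaller than 16 MiB. A short Python check with the file sizes mentioned above (the helper name is made up):

```python
import struct

BASE_BLOCK_SIZE = 4096


def hive_bins_size_field(file_size: int) -> bytes:
    """The 4 bytes stored at offset 40 for a file of the given size."""
    return struct.pack("<I", file_size - BASE_BLOCK_SIZE)


for size in (28672, 36864, 69632):
    raw = hive_bins_size_field(size)
    print(size, raw.hex(), "byte 43 =", raw[3])
```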
After the first XML construct characteristic for registry samples the next 2
constructs look like:
<Pattern>
<Bytes>0000</Bytes>
<Pos>6</Pos>
</Pattern>
<Pattern>
<Bytes>0000</Bytes>
<Pos>10</Pos>
</Pattern>
According to the documentation, the 4-byte primary sequence number is stored
after the regf signature, followed by the secondary sequence number. In my
samples I got sequence numbers like 1, 2, 341 or 776, so the 2 upper bytes of
these numbers were nil. Assuming that sequence numbers can reach the 32-bit
limit, these 2 patterns would not hold. So I delete these 2 constructs.
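A small Python sketch of the two sequence numbers, assuming the layout quoted above (4 bytes each at offsets 4 and 8, little-endian). The helper names and the 12-byte header are made up for illustration.

```python
import struct


def sequence_numbers(data) -> tuple:
    """Primary and secondary sequence number from offsets 4 and 8."""
    return struct.unpack_from("<II", data, 4)


def upper_half_is_zero(number: int) -> bool:
    """True if the 2 upper bytes (file offsets 6-7 or 10-11) are nil."""
    return struct.pack("<I", number)[2:] == b"\x00\x00"


header = b"regf" + struct.pack("<II", 341, 341)
primary, secondary = sequence_numbers(header)
print(primary == secondary)  # equal numbers indicate a completed write
for number in (1, 2, 341, 776, 65536):
    print(number, upper_half_is_zero(number))
```

The nil patterns at positions 6 and 10 would fail as soon as a store has been written 65536 times or more, which is the reason for dropping them.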
Then only 1 XML construct is left for inspection. That looks like:
<Bytes>01010000000300000000000000010000002000000000</Bytes>
<Pos>19</Pos>
According to the documentation, the 8-byte last written timestamp is stored at
offset 12. So by lucky circumstances (same year range) its last byte is the
same in my examples. At offset 20 the major version is stored. The value in
all NT Windows versions is 1. At offset 24 the minor version is stored. In my
examples I get the value 3, so that means version 1.3 for my examples. At
offset 28 the file type is stored. 0 means primary file; other values are used
for transaction variants (*.LOG*). At offset 32 the file format is stored. 1
means direct memory load. At offset 36 the root cell offset is stored. In all
my registry samples I get the value 20h. This is probably always true, because
it means the cell starts directly after the 32-byte hbin header. At offset 40
the hive bins data size is stored. For my examples the value is about the file
size in bytes minus 4096. A hive bins size is a multiple of 4096 bytes
(=1000h), so the lowest byte of that size is always nil. Assuming that other
modification timestamps are possible, the first byte of that pattern (the last
byte of the modification timestamp) vanishes and it becomes like:
<Bytes>010000000300000000000000010000002000000000</Bytes>
<Pos>20</Pos>
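The front block patterns that remain for non-empty stores can be tried against a header in Python. This is only a sketch of fixed-offset comparison: it does not model the global strings or the way TrID scores a definition. The names and the synthetic header are made up. The sequence number 341 and the hive bins size 0x6000 (file size 28672) are values mentioned above.

```python
import struct

# (position, bytes) pairs copied from the patterns kept for non-empty BCD stores.
NON_EMPTY_BCD = [
    (0, bytes.fromhex("72656766")),
    (20, bytes.fromhex("010000000300000000000000010000002000000000")),
    (44, bytes.fromhex("01000000")),
    (110, bytes.fromhex("0000")),
]


def matches(data, patterns) -> bool:
    """True if every pattern is found at its fixed position."""
    return all(bytes(data[pos:pos + len(raw)]) == raw for pos, raw in patterns)


# Synthetic base block of a non-empty store.
block = bytearray(4096)
struct.pack_into("<4sIIQIIIIIII", block, 0,
                 b"regf", 341, 341, 0, 1, 3, 0, 1, 0x20, 0x6000, 1)
block[48:110] = r"Device\HarddiskVolume1\Boot\BCD".encode("utf-16-le")

print(matches(block, NON_EMPTY_BCD))
```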
With these 2 new TrID definitions all my BCD samples are now described. The
TrID definitions and output are stored in the archive BCD.zip. I hope that my
definitions can be used in a future version of triddefs.
With best wishes
Jörg Jenderek