Author Topic: wbcat*.trid.xml for Windows Backup Catalog File  (Read 972 times)

jenderek

wbcat*.trid.xml for Windows Backup Catalog File
« on: October 30, 2023, 03:30:57 AM »
Hello trid users,

A few days ago I used the built-in backup tool of Windows 10 and then looked
at where the files are stored on the backup drive. Some files with the name
suffix wbcat are part of the backup. There are numbered samples like "Backup
files 1.wbcat", normally found in the subdirectory Catalogs inside the
directory %COMPUTERNAME% on the backup drive. These look like:

n:\MYPC1\Backup Set 2023-09-14 005504\Backup Files 2023-10-17 205940\Catalogs\Backup files 1.wbcat
n:\MYPC1\Backup Set 2023-09-14 005504\Backup Files 2023-10-17 205940\Catalogs\Backup files 2.wbcat
n:\MYPC1\Backup Set 2023-09-14 005504\Backup Files 2023-10-17 205940\Catalogs\Backup files 3.wbcat
n:\MYPC2\Backup Set 2016-07-27 233605\Backup Files 2016-07-27 234708\Catalogs\Backup files 1.wbcat
n:\MYPC2\Backup Set 2016-07-27 233605\Backup Files 2016-07-27 234708\Catalogs\Backup files 2.wbcat
n:\MYPC2\Backup Set 2016-07-27 233605\Backup Files 2016-07-27 234708\Catalogs\Backup files 3.wbcat

These obviously describe the ZIP archives with corresponding names (like
"Backup files 1.zip" in the parent directory). Then there are samples with
names like GlobalCatalog.wbcat or GlobalCatalogCopy.wbcat. These are found in
directories like:
c:\System Volume Information\Windows Backup\Catalogs\GlobalCatalog.wbcat
c:\System Volume Information\Windows Backup\Catalogs\GlobalCatalogCopy.wbcat
n:\MYPC1\Backup Set 2023-02-02 233414\Catalogs\GlobalCatalog.wbcat
n:\MYPC2\Backup Set 2016-07-27 233605\Catalogs\GlobalCatalog.wbcat
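
A minimal sketch of how the samples listed above could be collected before
running tridscan, assuming Python 3 and a drive laid out as in the listings.
The function name collect_wbcat and the split into "numbered" and "global"
are only illustrative and are based on nothing more than the file names shown
in this post.

```python
from pathlib import Path


def collect_wbcat(root):
    """Group *.wbcat files below root by the naming seen in this post."""
    groups = {"numbered": [], "global": []}
    for path in sorted(Path(root).rglob("*.wbcat")):
        # GlobalCatalog.wbcat and GlobalCatalogCopy.wbcat count as "global";
        # everything else ("Backup files 1.wbcat", ...) is treated as numbered.
        is_global = path.name.lower().startswith("globalcatalog")
        groups["global" if is_global else "numbered"].append(path)
    return groups
```

For example, collect_wbcat("n:/MYPC1") would return both lists for one
machine directory, which can then be fed to tridscan separately.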

So I ran the trid utility on my WBCAT examples. None of the samples is
recognized; all are described as "Unknown!" (see appended trid-v-old.txt in
output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are not
recognized either.

For comparison I also ran the file command (version 5.45) on these samples.
Here the samples are not recognized either and are described as "data" (see
appended output/file-5.45.txt). The MIME type is the generic
application/octet-stream (see appended file-i-5.45.txt in output). The file
name suffix is also not recognized (see appended file-ext-5.45.txt in
little_zip/output).

Unfortunately I found no file format specification, in particular none from
Microsoft. Shame on them! The backup tool does not work reliably and I often
get error messages with some error codes. Then I must spend hours searching
for what they mean and how to fix them. So while spending so much time and
resources on AI development, IT companies should try to improve their own
products so that these can easily be used by normal users.

When searching the net you always find the same sparse pieces of information,
so it looks like every page is a copy of another page. In the end I found a
forensic blog entry about Windows Backup and Restore with some pieces of
technical information. That is expressed inside the new definitions by lines
like:
 <RefURL>
 https://az4n6.blogspot.com/2012/08/windows-backup-and-restore.html
 </RefURL>

Because of the missing information I first created the TrID definition
wbcat.trid.xml by running tridscan on the numbered samples. The directory
concerned and the label with date information are stored inside as UTF-16
strings. These are expressed in the Global Strings section by lines like:
   <String>B'A'C'K'U'P' 'S'E'T' '2'0</String>
   <String>B'A'C'K'U'P' 'F'I'L'E'S' '2'0</String>
   <String>.'Z'I'P</String>
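
The notation of these lines can be reproduced with a small sketch. It assumes
only what the examples in this post suggest: a 00 byte is written as an
apostrophe and ASCII letters appear in upper case. This is inferred from the
lines above, not taken from any TrID documentation, and the function name
to_trid_string is made up for illustration.

```python
def to_trid_string(raw):
    """Render bytes the way the <String> lines in this post look."""
    out = []
    for b in raw:
        if b == 0:
            out.append("'")             # a 00 byte is shown as an apostrophe
        else:
            out.append(chr(b).upper())  # letters appear upper-cased
    return "".join(out)


# UTF-16LE text, with the final 00 byte dropped as in the lines above
label = "Backup Set 20".encode("utf-16-le")[:-1]
suffix = ".zip".encode("utf-16-le")[:-1]
print(to_trid_string(label))
print(to_trid_string(suffix))
```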

The first string seems to appear at the same constant offset. That is
expressed in the Front Block section by an XML construct like:
   <Bytes>4200610063006B00750070002000530065007400200032003000</Bytes>
   <ASCII> B . a . c . k . u . p .   . S . e . t .   . 2 . 0</ASCII>
   <Pos>92</Pos>
Since it is theoretically possible to have backups dated before the year 2000
or after the year 2099, I dropped the "20" prefix. In the Global Strings
section the lines become:
   <String>B'A'C'K'U'P' 'S'E'T' '</String>
   <String>B'A'C'K'U'P' 'F'I'L'E'S' '</String>
And the XML construct becomes:
   <Bytes>4200610063006B00750070002000530065007400</Bytes>
   <ASCII> B . a . c . k . u . p .   . S . e . t .</ASCII>
   <Pos>92</Pos>
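
To double-check what the two <Bytes> values stand for, they can be decoded as
UTF-16LE. This is only a small verification sketch; the helper name
decode_pattern is not part of TrID.

```python
def decode_pattern(hex_bytes):
    """Decode the hex digits of a <Bytes> element as UTF-16LE text."""
    return bytes.fromhex(hex_bytes).decode("utf-16-le")


with_year = "4200610063006B00750070002000530065007400200032003000"
without_year = "4200610063006B00750070002000530065007400"
print(repr(decode_pattern(with_year)))
print(repr(decode_pattern(without_year)))
```

The first value decodes to 'Backup Set 20' and the shortened one to
'Backup Set'.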

As in "Backup Set 2023-09-14 005504", the date information after the word Set
always seems to be stored in a format with minus and space characters. When
"Backup Set" appears at a constant offset, the same is true for the
date stamp. So this is expressed by XML constructs like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>119</Pos>
   </Pattern>
   <Pattern>
      <Bytes>002D00</Bytes>
      <ASCII> . -</ASCII>
      <Pos>121</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>125</Pos>
   </Pattern>
   <Pattern>
      <Bytes>002D00</Bytes>
      <ASCII> . -</ASCII>
      <Pos>127</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>131</Pos>
   </Pattern>
   <Pattern>
      <Bytes>002000</Bytes>
      <Pos>133</Pos>
   </Pattern>
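
The positions in these constructs follow from simple arithmetic: each
character of the label takes two bytes, starting at offset 92. A short sketch
under that assumption (the helper name utf16le_offsets is hypothetical):

```python
BASE = 92  # <Pos> of "Backup Set" in the front block, as given above


def utf16le_offsets(text, ch, base=BASE):
    """File offsets of the low byte of each ch in text stored as UTF-16LE."""
    return [base + 2 * i for i, c in enumerate(text) if c == ch]


label = "Backup Set 2023-09-14 005504"
minus = utf16le_offsets(label, "-")   # offsets of the 2D bytes
spaces = utf16le_offsets(label, " ")  # offsets of the 20 bytes
print(minus, spaces)
```

This gives 122 and 128 for the minus signs and 134 for the space after the
date. The patterns 002D00 and 002000 each start one byte earlier (121, 127
and 133) because they include the 00 high byte of the preceding digit.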

According to the reference page, information about the drives concerned is
also stored inside the catalog files. This is expressed as UTF-16 strings in
the Global Strings section by lines like:
   <String>V'O'L'U'M'E'{</String>
   <String>M'E'D'I'A'T'Y'P'E</String>

Then I found some 4-byte strings which may be characteristic patterns, but I
am not sure. These are described in the Global Strings section by lines like:
   <String>CCOL''''(</String>
   <String>MDIS</String>
   <String>MOTA</String>
   <String>TDEM</String>
   <String>TLIF</String>
   <String>TSKB</String>

Some of these patterns are found at constant offsets. These are expressed in
the Front Block section by XML constructs like:
   <Bytes>43636F4C0000000028000000</Bytes>
   <ASCII> C c o L . . . . (</ASCII>
   <Pos>0</Pos>
   <Bytes>016D6F7441</Bytes>
   <ASCII> . m o t A</ASCII>
   <Pos>19</Pos>
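
A rough sketch of what these two fixed-offset constructs mean for the first
bytes of a file. It is not how TrID itself is implemented, and the names
FRONT_BLOCK and matches_front_block are invented for this example.

```python
FRONT_BLOCK = [
    (0, bytes.fromhex("43636F4C0000000028000000")),  # "CcoL", 4 x 00, "(", 3 x 00
    (19, bytes.fromhex("016D6F7441")),               # 01 followed by "motA"
]


def matches_front_block(head, patterns=FRONT_BLOCK):
    """True if every (offset, bytes) pair is found at its fixed offset."""
    return all(head[pos:pos + len(raw)] == raw for pos, raw in patterns)
```

Reading the first 24 bytes of a sample and passing them to
matches_front_block is enough to test both constructs.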

I also got some short null-byte patterns like:
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>27</Pos>
      </Pattern>
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>205</Pos>
      </Pattern>
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>209</Pos>
      </Pattern>
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>281</Pos>
      </Pattern>
Most of them seem to be triggered by coincidence (ASCII characters encoded as
UTF-16 strings, where every second byte is 00). So I deleted these short
null-byte patterns.

Because I do not know whether the other constructs are optional or required,
I keep all of them for the moment. I hope that other users can refine the
definition by running tridscan with more samples or from knowledge of the
specification.

Then I created wbcat-global.trid.xml by running tridscan on the samples named
GlobalCatalogCopy.wbcat or GlobalCatalog.wbcat. Unfortunately I have only a
few such samples. That means the definition contains many patterns, so it is
difficult to see what the real differences are. The main difference from the
first definition is the starting magic. That is expressed by the first XML
construct, which looks like:
   <Bytes>436C62470000000028000000</Bytes>
   <ASCII> C l b G . . . . (</ASCII>
   <Pos>0</Pos>
This information is also shown in the Global Strings section by a line like:
   <String>CLBG''''(</String>
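
Taken together with the "CcoL" magic of the numbered samples, the first 12
bytes are enough to tell the two kinds apart. A small sketch, assuming that
the magics observed in my samples hold in general; classify_wbcat and the
label texts are hypothetical.

```python
MAGICS = {
    b"CcoL": "numbered catalog (Backup files N.wbcat)",
    b"ClbG": "global catalog (GlobalCatalog*.wbcat)",
}
TAIL = bytes.fromhex("0000000028000000")  # bytes 4..11, same in both magics


def classify_wbcat(head):
    """Return a label for the first 12 bytes of a file, or None if unknown."""
    if head[4:12] != TAIL:
        return None
    return MAGICS.get(head[:4])
```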

Here the directory concerned and the label with date information also exist,
but not at a constant offset. So here this is expressed only in the Global
Strings section by lines like:
   <String>B'A'C'K'U'P' 'F'I'L'E'S' '2'0</String>
   <String>B'A'C'K'U'P' 'S'E'T' '2'0</String>

So instead of improving wbcat-global.trid.xml I created
wbcat-generic.trid.xml based on wbcat.trid.xml by running tridscan on the
"global" examples. Then I looked at what had changed. The first XML construct
now becomes:
   <Pattern>
      <Bytes>43</Bytes>
      <ASCII> C</ASCII>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000028000000</Bytes>
      <ASCII> . . . . (</ASCII>
      <Pos>4</Pos>
   </Pattern>
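
The split into two patterns is what remains when the two 12-byte magics are
compared byte by byte. The following sketch illustrates that idea only; it is
not tridscan's actual code, and common_runs is a made-up helper.

```python
def common_runs(a, b):
    """Yield (offset, bytes) for each run where a and b hold the same bytes."""
    n = min(len(a), len(b))
    start = None
    for i in range(n + 1):
        same = i < n and a[i] == b[i]
        if same and start is None:
            start = i                 # a new common run begins here
        elif not same and start is not None:
            yield start, a[start:i]   # the run ended just before i
            start = None


numbered = bytes.fromhex("43636F4C0000000028000000")  # "CcoL" magic
global_ = bytes.fromhex("436C62470000000028000000")   # "ClbG" magic
runs = list(common_runs(numbered, global_))
for pos, run in runs:
    print(pos, run.hex().upper())
```

Only the leading "C" at offset 0 and the eight bytes at offset 4 are common
to both magics, which matches the two patterns above.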

Again I removed the year 20xx parts; the constructs with minus and space
characters for the date stamp vanished automatically. I also deleted the short
null-byte patterns.

Some of the 4-byte strings vanished from the Global Strings section. The
lines that disappeared were:
      <String>MDIS</String>
      <String>TDEM</String>
      <String>TSKB</String>

But when grepping for these strings (like "MdiS", "TdeM", "TSkB"), they still
exist in the global examples. I guess that these strings are not located near
the beginning in the global samples (which are typically bigger, up to 54 MB
in my samples) and are therefore no longer found by tridscan (with
MAX_FILE_SIZE = 10 MB).
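
This guess could be checked by looking up where each marker first occurs in a
large sample. The sketch below reads the file in chunks so that a 54 MB
catalog does not have to be loaded at once. The 10 MB limit is the
MAX_FILE_SIZE value mentioned above; that tridscan ignores everything beyond
it is my assumption, and the names first_offset and beyond_limit are
invented.

```python
LIMIT = 10 * 1024 * 1024  # the 10 MB MAX_FILE_SIZE mentioned above


def first_offset(path, needle, chunk_size=1 << 20):
    """Return the offset of the first occurrence of needle in the file, or -1."""
    overlap = len(needle) - 1
    position = 0  # file offset of the first byte held in buffer
    buffer = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return -1
            buffer += chunk
            hit = buffer.find(needle)
            if hit != -1:
                return position + hit
            # keep only the tail that could still be the start of a match
            keep = buffer[-overlap:] if overlap > 0 else b""
            position += len(buffer) - len(keep)
            buffer = keep


def beyond_limit(path, markers=(b"MdiS", b"TdeM", b"TSkB")):
    """Map each marker to True if it first occurs at or after LIMIT."""
    return {m: first_offset(path, m) >= LIMIT for m in markers}
```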
So only two 4-byte patterns survived. These are expressed by lines like:
      <String>MOTA</String>
      <String>TLIF</String>
And the first seems to be located at a fixed offset. That is still expressed
by an XML construct like:
   <Bytes>016D6F7441</Bytes>
   <ASCII> . m o t A</ASCII>
   <Pos>19</Pos>
Then the two lines
   <String>M'E'D'I'A'T'Y'P'E</String>
   <String>V'O'L'U'M'E'{</String>
become
   <String>M'E'D'I'A</String>
   <String>T'Y'P'E</String>
   <String>I'A'T</String>
   <String>O'L'U'M</String>
   <String>V'O'L</String>
   <String>U'M'E</String>
which I do not understand, because grepping for such strings (like
"M.e.d.i.a.T.y.p.e" and "V.o.l.u.m.e.{") succeeds. I also tried to back-port
these strings by bar variants, but then some samples are not recognized.
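
The grep check can also be done on the exact UTF-16LE byte sequence instead
of a regular expression with dots. A minimal sketch; the spellings
"MediaType" and "Volume{" are taken from the grep patterns above, and
utf16le_hits is a hypothetical helper.

```python
def utf16le_hits(data, text):
    """Offsets at which text occurs in data as UTF-16LE (case-sensitive)."""
    needle = text.encode("utf-16-le")
    hits = []
    start = data.find(needle)
    while start != -1:
        hits.append(start)
        start = data.find(needle, start + 1)
    return hits
```

Calling utf16le_hits(data, "MediaType") on the content of a sample lists
every offset of the string, which shows whether it occurs in all samples and
how far from the beginning.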

So I just keep two TrID definitions, wbcat-generic.trid.xml and
wbcat.trid.xml.

With these new TrID definitions all my WBCAT samples are now described. The
TrID definitions and output are stored in the archive wbcat_.zip. I hope that
my definitions can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: wbcat*.trid.xml for Windows Backup Catalog File
« Reply #1 on: October 30, 2023, 11:47:14 PM »
Thanks Joerg!
I found some other .wbcat files, and refined the generic definitions. I'll add that for the moment, and keep searching for some other *global* ones for a more specific one.