Topic: TrID replacement ark-cab-snp.trid.xml Microsoft Access report snapshot ;*.SNP

jenderek

Hello,
while working with Microsoft Cabinet files, out of interest I searched the TrID
definitions for the string "<Bytes>4D534346", which is the start magic "MSCF"
of CAB files. I found some twenty file types that are probably Cabinet
archives:

gadget-microsoft-cab.trid.xml
kodu.trid.xml
mco.trid.xml
xtp-infopath.trid.xml
onepkg.trid.xml
themepack.trid.xml
tsk.trid.xml
snp.trid.xml
xsn.trid.xml
lva.trid.xml
lvf.trid.xml
mbf.trid.xml
ppz.trid.xml
ima.trid.xml
ime.trid.xml
imf.trid.xml
imi.trid.xml
imn.trid.xml
ims.trid.xml
imw-mid.trid.xml
imw-wav.trid.xml
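A rough Python sketch of this search, assuming the definitions have been unpacked as individual *.trid.xml files into one local directory (the directory and the helper name find_cab_defs are hypothetical). Like the manual search, it is a plain text search for the hex string:

```python
from pathlib import Path

# TrID stores pattern bytes as hex text; 4D534346 is the CAB magic "MSCF"
CAB_MAGIC = "<Bytes>4D534346"

def find_cab_defs(defs_dir):
    """Return the sorted names of *.trid.xml files that contain the MSCF pattern."""
    hits = []
    for path in Path(defs_dir).glob("*.trid.xml"):
        text = path.read_text(encoding="utf-8", errors="replace")
        if CAB_MAGIC in text:
            hits.append(path.name)
    return sorted(hits)
```

For example, find_cab_defs("triddefs") would return the names of all matching definitions in that (hypothetical) directory.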

When I run TrID on a few SNP files created by Microsoft Access 2003, they are
identified correctly as "Microsoft Access report snapshot" by snp.trid.xml and
as "Microsoft Cabinet Archive" by ark-cab.trid.xml (see the appended
output/trid-old.txt).

However, the definition file contains only 2 patterns, the first of which
looks like:
   <Pattern>
      <Bytes>4D534346</Bytes>
      <ASCII> M S C F</ASCII>
      <Pos>0</Pos>
   </Pattern>

At offset 24 the cabinet file format version is stored. Currently only
versionMajor = 1 and versionMinor = 3 are used, so the byte sequence 03 01
(versionMinor first) should occur as a pattern. The absence of that pattern
means some information is lost. For identification this may not be a
problem, but for further investigation of such files it is not sufficient.
Declaring JAR files as ZIP files, for example, is fine because a complete
specification for JAR files exists. Unfortunately, no such complete
specification exists for SNP files. The best information about the SNP file
format is found on Wikipedia, so instead of the general Microsoft page about
Office I used that page as the reference URL:
     <RefURL>https://en.wikipedia.org/wiki/SNP_file_format</RefURL>

So I decided to build a replacement definition by running tridscan, which
finally produced ark-cab-snp.trid.xml.

According to the Microsoft Cabinet Format specification found at
https://msdn.microsoft.com/en-us/library/bb267310.aspx, cabinet archives start
with the file signature followed by the 4-byte reserved1 field, which is 0.
This is now expressed by the first XML construct:
   <Pattern>
      <Bytes>4D53434600000000</Bytes>
      <ASCII> M S C F</ASCII>
      <Pos>0</Pos>
   </Pattern>

Although SNP files are cabinet files, the MIME type registered by Access 2003
is not "application/vnd.ms-cab-compressed". Instead, the Access-specific type
is used, in the line:
   <Mime>application/msaccess</Mime>

At this point it may be useful to look at the output of the 7-Zip console
tool's list command (see output/7z-l.txt) and of file(1)
(see output/file-5.31.txt).

There, and also on the reference page, it can be seen that the archive
contains just one file, namely the uncompressed report snapshot. This implies
that the short value "cFiles" at position 28 is 1 and the short value
"cFolders" at position 26 is 1.

At offset 30 the cabinet "flags" are stored as a little-endian short; for SNP
files the value is 0. Values 1 and 2 add header fields for building cabinet
chains (for example PRECOPY1.CAB -> PRECOPY2.CAB -> PRECOPY3.CAB). Obviously
this is not used for SNP files. Value 4 reserves additional bytes in the
header (and in the folder and data entries); this flag was not set in any of
the observed SNP files.
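The flag values can be decoded with a few lines of Python. The bit names are the ones used in the Microsoft CAB specification; the helper names decode_flags and header_is_minimal are hypothetical:

```python
# CFHEADER.flags bits (offset 30), named as in the Microsoft CAB specification
CFHDR_FLAGS = {
    0x0001: "cfhdrPREV_CABINET",     # szCabinetPrev/szDiskPrev follow the header
    0x0002: "cfhdrNEXT_CABINET",     # szCabinetNext/szDiskNext follow the header
    0x0004: "cfhdrRESERVE_PRESENT",  # cbCFHeader/cbCFFolder/cbCFData reserve sizes present
}

def decode_flags(flags):
    """Return the names of all known flag bits set in a CFHEADER flags value."""
    return [name for bit, name in CFHDR_FLAGS.items() if flags & bit]

def header_is_minimal(flags):
    """True if none of the three flags add optional fields to the 36-byte header."""
    return (flags & 0x0007) == 0
```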

This also means there are no optional bytes; in other words, the header is
minimal (36 bytes), the CFFOLDER structure is minimal (8 bytes) and the
CFFILE structure is minimal (16 bytes plus the name bytes).

So the long value "coffFiles" at offset 16, which points to the first CFFILE
entry, should equal the size of the header (36) plus one folder entry (8).
Indeed, this is true (44 = 2Ch).

At position 32 the set ID "setID" is stored as a short. For all inspected
examples it was 5309 = 14BDh.

"iCabinet" at offset 34 is the number of this cabinet file within a set,
where 0 is the first cabinet. Apparently this is 0 for SNP files.

At position 36 the CFFOLDER structure starts with the offset of the first
CFDATA block, stored as the long "coffCabStart".

This offset can be calculated by the formula:
header + 1 folder entry + 1 file entry + length of "_AccRpt_.snp" + 1 null byte
= 36 + 8 + 16 + 12 + 1 = 73 = 49h
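The offset arithmetic above, written out as a small Python helper. The constant and function names are hypothetical, and the sizes assume flags = 0 (no reserve area):

```python
# Structure sizes from the CAB specification when flags == 0 (no reserve area)
CFHEADER_SIZE = 36
CFFOLDER_SIZE = 8
CFFILE_SIZE = 16  # fixed part only, without the null-terminated szName

def expected_offsets(member_name):
    """Return (coffFiles, coffCabStart) for a cabinet with one folder and one file."""
    coff_files = CFHEADER_SIZE + CFFOLDER_SIZE             # first CFFILE follows the only CFFOLDER
    name_size = len(member_name.encode("ascii")) + 1       # szName plus terminating null byte
    coff_cab_start = coff_files + CFFILE_SIZE + name_size  # first CFDATA follows the only CFFILE
    return coff_files, coff_cab_start
```

For "_AccRpt_.snp" this gives (44, 73), i.e. 2Ch and 49h.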

All these values are expressed by the second XML pattern:
<Pattern>
   <Bytes>000000002C000000000000000301010001000000BD14000049000000</Bytes>
   <ASCII> . . . . , . . . . . . . . . . . . . . . . . . . I</ASCII>
   <Pos>12</Pos>
</Pattern>
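The header values discussed so far can be checked in one go. This is a minimal Python sketch, assuming data holds at least the first 40 bytes of a candidate file. The field names and their order follow the Microsoft CAB specification, while FIELDS, parse_cfheader and looks_like_snp_header are hypothetical names, and the expected values are simply the ones observed for the inspected samples:

```python
import struct

# Fixed CFHEADER layout (little endian): 36 bytes when flags == 0
CFHEADER = struct.Struct("<4sIIIIIBBHHHHH")
FIELDS = ("signature", "reserved1", "cbCabinet", "reserved2", "coffFiles",
          "reserved3", "versionMinor", "versionMajor", "cFolders", "cFiles",
          "flags", "setID", "iCabinet")

def parse_cfheader(data):
    """Unpack the fixed CFHEADER fields from the start of a cabinet file."""
    return dict(zip(FIELDS, CFHEADER.unpack_from(data, 0)))

def looks_like_snp_header(data):
    """Check the header values observed for the Access 2003 report snapshots."""
    h = parse_cfheader(data)
    coff_cab_start = struct.unpack_from("<I", data, 36)[0]  # first CFFOLDER field
    return (h["signature"] == b"MSCF"
            and h["reserved1"] == h["reserved2"] == h["reserved3"] == 0
            and (h["versionMajor"], h["versionMinor"]) == (1, 3)
            and h["coffFiles"] == 44
            and h["cFolders"] == 1 and h["cFiles"] == 1
            and h["flags"] == 0 and h["setID"] == 0x14BD and h["iCabinet"] == 0
            and coff_cab_start == 73)
```

Only cbCabinet (offset 8) is left free, since it depends on the file size.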

At position 42 the compression type is stored as the short "typeCompress".
For the inspected examples this was always MSZIP (= 0001h). This is now
expressed by the third XML construct:
   <Pattern>
      <Bytes>0100</Bytes>
      <Pos>42</Pos>
   </Pattern>
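A minimal Python sketch for the 8-byte CFFOLDER entry at offset 36 (coffCabStart, cCFData, typeCompress). The method table uses the compression type values from the CAB specification, where only the low 4 bits select the method; parse_cffolder is a hypothetical name:

```python
import struct

# typeCompress values; only the low 4 bits (tcompMASK_TYPE) select the method
COMPRESSION = {0: "none", 1: "MSZIP", 2: "Quantum", 3: "LZX"}

def parse_cffolder(data, offset=36):
    """Unpack a minimal CFFOLDER entry: coffCabStart, cCFData, typeCompress."""
    coff_cab_start, c_cfdata, type_compress = struct.unpack_from("<IHH", data, offset)
    method = COMPRESSION.get(type_compress & 0x000F, "unknown")
    return {"coffCabStart": coff_cab_start, "cCFData": c_cfdata,
            "typeCompress": type_compress, "method": method}
```

cCFData at offset 40 depends on the member size, which is why the patterns skip positions 40 and 41.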

At position 44 the CFFILE structure proper starts. At position 48 the
uncompressed offset of the file within its folder is stored as the long
"uoffFolderStart"; for the first file this value will usually be zero. At
position 52 the index into the CFFOLDER area is stored as the short
"iFolder"; a value of zero indicates the first folder in this cabinet file.
These 2 facts are expressed by the XML construct:
   <Pattern>
      <Bytes>000000000000</Bytes>
      <Pos>48</Pos>
   </Pattern>
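A minimal Python sketch that unpacks the CFFILE entry at offset 44 (16 fixed bytes followed by the null-terminated name). parse_cffile is a hypothetical helper, and decoding non-UTF names as latin-1 is an assumption standing in for the system code page:

```python
import struct

def parse_cffile(data, offset=44):
    """Unpack one CFFILE entry; returns (fields, offset just past szName)."""
    cb_file, uoff, i_folder, date, time, attribs = struct.unpack_from("<IIHHHH", data, offset)
    name_start = offset + 16
    name_end = data.index(b"\x00", name_start)  # szName ends at the first null byte
    # _A_NAME_IS_UTF (0x80) marks UTF-8 names; latin-1 is only an assumed fallback
    encoding = "utf-8" if attribs & 0x80 else "latin-1"
    fields = {"cbFile": cb_file, "uoffFolderStart": uoff, "iFolder": i_folder,
              "date": date, "time": time, "attribs": attribs,
              "szName": data[name_start:name_end].decode(encoding)}
    return fields, name_end + 1
```

For a minimal SNP cabinet the returned offset is 73, which is where the CFDATA block starts.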

At positions 54 and 56 the short values "date" and "time" are stored. These
of course differ from file to file (I also created names-1Jan1980.snp with a
hex editor), so they are left out of the patterns.
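The date and time shorts use the MS-DOS packed format. Here is a small Python decoder (the helper name is hypothetical). In this format 1 January 1980 corresponds to the date value 0021h:

```python
import datetime

def decode_dos_datetime(date, time):
    """Convert the MS-DOS date/time shorts of a CFFILE entry into a datetime."""
    year = 1980 + (date >> 9)      # bits 15-9: years since 1980
    month = (date >> 5) & 0x0F     # bits 8-5: month 1..12
    day = date & 0x1F              # bits 4-0: day 1..31
    hour = time >> 11              # bits 15-11: hours
    minute = (time >> 5) & 0x3F    # bits 10-5: minutes
    second = (time & 0x1F) * 2     # bits 4-0: seconds divided by 2
    return datetime.datetime(year, month, day, hour, minute, second)
```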

At position 58 the member attributes are stored as the short "attribs". If we
trust Microsoft's CAB specification, where the highest defined bit is
_A_NAME_IS_UTF with value 0x80, the high byte of attribs is never used. In
the inspected examples I always found 0020h, which means the _A_ARCH flag is
set because the file was modified since the last backup. With a hex editor I
created the example names-NoAttrib.snp with attribs = 0; this example was
still accepted and displayed by the SNP viewer. At position 60 the name of
the first archive member (apparently always "_AccRpt_.snp") is stored. These
2 facts (the unused high byte at 59 and the member name) are expressed by the
XML construct:
   <Pattern>
      <Bytes>005F4163635270745F2E736E7000</Bytes>
      <ASCII> . _ A c c R p t _ . s n p</ASCII>
      <Pos>59</Pos>
   </Pattern>
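A small Python sketch that decodes attribs. The bit names follow the Microsoft CAB specification, and decode_attribs is a hypothetical helper:

```python
# attribs bits (offset 58) as named in the Microsoft CAB specification
ATTRIB_FLAGS = {
    0x01: "_A_RDONLY",
    0x02: "_A_HIDDEN",
    0x04: "_A_SYSTEM",
    0x20: "_A_ARCH",
    0x40: "_A_EXEC",
    0x80: "_A_NAME_IS_UTF",
}

def decode_attribs(attribs):
    """List the attribute names set in a CFFILE attribs value (high byte unused)."""
    return [name for bit, name in ATTRIB_FLAGS.items() if attribs & bit]
```

The 0020h value seen in the samples decodes to ["_A_ARCH"], and names-NoAttrib.snp (attribs = 0) decodes to an empty list.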

Afterwards the CFDATA structure starts at position 73. At position 79 the
number of uncompressed bytes in the block is stored as the short
"cbUncomp". This value differs when the size of the member differs, but if
the block size is a multiple of 256, the low byte is always 0. This makes
sense to me, so I keep the construct:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>79</Pos>
   </Pattern>
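A minimal Python sketch for the CFDATA header, assuming there is no reserve area (flags = 0), so the header is 8 bytes (csum, cbData, cbUncomp). parse_cfdata is a hypothetical name. For example, a block holding 32768 (8000h) uncompressed bytes stores 00 80 at positions 79 and 80, so the low byte is 0:

```python
import struct

def parse_cfdata(data, offset=73):
    """Unpack a CFDATA header (no reserve area) and slice out its payload 'ab'."""
    csum, cb_data, cb_uncomp = struct.unpack_from("<IHH", data, offset)
    payload_start = offset + 8  # 81 for a minimal SNP cabinet
    ab = data[payload_start:payload_start + cb_data]
    return {"csum": csum, "cbData": cb_data, "cbUncomp": cb_uncomp, "ab": ab}
```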

At position 81 the compressed data starts as the byte stream "ab". For all
inspected samples it starts with the same 3 bytes, expressed by:
   <Pattern>
      <Bytes>434BED</Bytes>
      <ASCII> C K</ASCII>
      <Pos>81</Pos>
   </Pattern>
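This simplified Python matcher shows how the fixed-position patterns of the proposed definition fit together. Real TrID weighs and scores patterns rather than requiring every one to match, so this all-or-nothing check is only a sketch. SNP_PATTERNS just copies the positions and bytes of the XML constructs:

```python
# (position, hex bytes) pairs copied from the proposed ark-cab-snp.trid.xml
SNP_PATTERNS = [
    (0, "4D53434600000000"),
    (12, "000000002C000000000000000301010001000000BD14000049000000"),
    (42, "0100"),
    (48, "000000000000"),
    (59, "005F4163635270745F2E736E7000"),
    (79, "00"),
    (81, "434BED"),
]

def matches_all(data, patterns=SNP_PATTERNS):
    """Return True if every fixed-position pattern occurs at its offset in data."""
    for pos, hex_bytes in patterns:
        expected = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(expected)] != expected:
            return False
    return True
```

The uncovered bytes (cbCabinet, cCFData, cbFile, date, time, the low byte of attribs, csum, cbData and the high byte of cbUncomp) are exactly the values that vary between samples.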

The extracted _AccRpt_.snp file is identified by TrID as "Generic OLE2 /
Multistream Compound File", and I find the "EMF" magic inside. This is also
described on the reference page:
SNP files are based on the Microsoft Compound File Binary Format (CFBF). For
SNP files, Microsoft Access uses CFBF to store each page as a separate
Enhanced Metafile (EMF)-like format.
Presumably, compressing the same kind of data always gives the same start
bytes in the data stream, so I keep the last pattern.
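One way to test that assumption directly is to inflate the first data block and look for the Compound File Binary signature D0 CF 11 E0 A1 B1 1A E1. According to Microsoft's MSZIP format description, each MSZIP block begins with the 2-byte signature "CK" (43h 4Bh) followed by raw deflate data. That would explain the first two of the three constant bytes. ED would then be the first deflate byte, and its low three bits mark a final block with dynamic Huffman codes. The helper name is hypothetical, and only the first CFDATA block is handled, since later MSZIP blocks reuse the history of the previous one:

```python
import zlib

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary signature

def first_block_is_ole2(ab):
    """Inflate the payload of the first MSZIP CFDATA block and test for OLE2.

    Sketch only: later blocks, which depend on earlier history, are not handled.
    """
    if ab[:2] != b"CK":  # every MSZIP block starts with the signature 'CK'
        raise ValueError("not an MSZIP block")
    raw = zlib.decompressobj(-15).decompress(ab[2:])  # raw deflate, no zlib header
    return raw[:8] == OLE2_MAGIC
```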

Furthermore, I summarize all observed facts in the remark line:
<Rem>
MSZIP compressed Cabinet Archive with minimal header, ID 14BD and 1 OLE2 "_AccRpt_.snp"
containing Windows Enhanced MetaFile generated by Microsoft Access version before 2010
</Rem>

With the new definition file, all inspected Microsoft Access report snapshots
are now described more precisely (see the appended output/trid-new.txt).

The TrID definition, some examples and the output are stored in the archive
access.zip. I hope that my XML file can be used in a future version of
triddefs.

With best wishes
Joerg Jenderek

Mark0

Hi!
Thanks for the detailed analysis. I'll think about it, but I think I'll keep the present/simpler definition. I'll update the URL reference since the Wikipedia page is definitely better than the old one.