Topic: TrID variant ark-cab-single.trid.xml for single Cabinet Archive (*_)

jenderek

Hello,

When I run TrID on thousands of *_ files, all of them are identified too
generally ("Microsoft Cabinet Archive"; see the appended output/trid-old.txt).

Instead of the extension "CAB" mentioned in ark-cab.trid.xml, these files use
an underscore "_" as the last character of the file name extension.

First I tried adding more extensions to ark-cab.trid.xml, but in the end I
found another way.

Such files are found on Windows XP and 2000 CDs inside the i386 folder.
When I ran tridscan on such files, after 4256 examples and a dozen
file name extensions I got a line like
   <Ext>CH_/DL_/EX_/GI_/HL_/HT_/IN_/JP_/PN_/SY_/TT_/WA_</Ext>

Next I looked at the output of the file(1) command. This tool also identifies
such examples as "Microsoft Cabinet archive data" (see output\file-5.31.txt).
But it also displays "1 file" for such archives, whereas other CAB files often
contain more than 1 file. Now I knew I was on the right track. After searching
the net I found "Microsoft Cabinet Format", so I added this site as a reference
with the line:
   <RefURL>https://msdn.microsoft.com/en-us/library/bb267310.aspx</RefURL>

When extracting archives such as TERMCAP._, ATTRIB.EX_, WINMINE.CH_ up to
NPDRMV2.ZI_, the rule used to generate the cabinet name can be seen. It is also
mentioned in the reference under the MAKECAB variable
CompressedFileExtensionChar:
Take the source file name and append an underscore "_" (the default
CompressedFileExtensionChar) to the extension to generate the cabinet name. If
this rule generates a 4-character extension, remove the second-to-last
character to get DOS 8.3 file names. This is now expressed by the XML construct:
   <Ext>_/??_</Ext>
I also added a line with the MIME type for such cabinet archives:
   <Mime>application/vnd.ms-cab-compressed</Mime>
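
As a rough illustration, the naming rule above can be sketched in Python. The
function name compressed_name and the sample source names are hypothetical, and
the sketch only covers the cases described here (no extension, or an extension
of up to 3 characters); it is not MAKECAB's actual implementation.

```python
import os

def compressed_name(source_name: str, ext_char: str = "_") -> str:
    """Hypothetical helper: derive the single-cabinet name from a source name."""
    stem, ext = os.path.splitext(source_name)
    new_ext = ext[1:] + ext_char          # drop the leading dot, append "_"
    if len(new_ext) == 4:
        # 4-character extension: remove the second-to-last character
        new_ext = new_ext[:2] + new_ext[3:]
    return stem + "." + new_ext

# Sample source names are assumptions chosen to match the archives named above.
for name in ("TERMCAP", "ATTRIB.EXE", "WINMINE.CHM"):
    print(name, "->", compressed_name(name))
```

Every name produced this way ends in "_", which is what the <Ext>_/??_</Ext>
line relies on.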

At offset 30 the cabinet flags are stored as a little-endian short value.
Values 1 and 2 indicate additional header bytes used for building cabinet
chains (for example PRECOPY1.CAB -> PRECOPY2.CAB -> PRECOPY3.CAB). Obviously
this is not used for single cabinets. Value 4 reserves additional bytes in the
header for something like signatures. Today we know about the importance of
code checksums, but in the days of XP security awareness was less developed. So
this feature was implemented but apparently never used in such single cabinets,
and for them the flag value is apparently always 0. That means no optional
bytes, or in other words the header is minimal (36 bytes), the CFFOLDER
structure is minimal (8 bytes) and the CFFILE structure is minimal (16 bytes +
name bytes).

Single cabinets have only 1 CFFOLDER (cFolders, short at offset 26) and 1 CFFILE
(cFiles, short at offset 28). So the offset of the first CFFILE entry, stored at
offset 16, should equal the sum of the header size (36) and the size of 1 folder
entry (8). Yes, this is true (44 = 2Ch).
Reserved areas have 0 values. At offset 24 the cabinet file format version is
stored. Currently only versionMajor = 1 and versionMinor = 3 occur.
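
The header checks described so far can be sketched in Python with the struct
module. This is only a minimal sketch: the field names that do not appear in
this post (signature, reserved1, cbCabinet, coffFiles, setID) follow the
cabinet format documentation as I understand it, and the helper names and the
sample header values are made up for illustration.

```python
import struct
from typing import NamedTuple

class CabHeader(NamedTuple):
    signature: bytes     # offset 0, expected b"MSCF"
    reserved1: int       # offset 4
    cbCabinet: int       # offset 8, total cabinet size
    reserved2: int       # offset 12
    coffFiles: int       # offset 16, offset of first CFFILE entry
    reserved3: int       # offset 20
    versionMinor: int    # offset 24
    versionMajor: int    # offset 25
    cFolders: int        # offset 26
    cFiles: int          # offset 28
    flags: int           # offset 30
    setID: int           # offset 32
    iCabinet: int        # offset 34

HEADER = struct.Struct("<4sIIIIIBBHHHHH")    # little endian, 36 bytes

def parse_header(data: bytes) -> CabHeader:
    return CabHeader(*HEADER.unpack_from(data, 0))

def looks_like_single_cabinet(h: CabHeader) -> bool:
    return (h.signature == b"MSCF"
            and h.flags == 0                      # no optional header bytes
            and h.cFolders == 1 and h.cFiles == 1
            and h.coffFiles == 36 + 8             # header + one CFFOLDER
            and (h.versionMajor, h.versionMinor) == (1, 3))

# Synthetic header for demonstration only (cbCabinet = 1000 is arbitrary).
sample = HEADER.pack(b"MSCF", 0, 1000, 0, 44, 0, 3, 1, 1, 1, 0, 0, 0)
print(looks_like_single_cabinet(parse_header(sample)))
```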

So the second pattern, covering reserved2, the 2C offset, reserved3, the
version, the folder and file entry counts, and the flags, is expressed by
      <Pattern>
         <Bytes>000000002C000000000000000301010001000000</Bytes>
         <ASCII> . . . . ,</ASCII>
         <Pos>12</Pos>
      </Pattern>
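
To double-check the byte string, the same 20 bytes can be rebuilt field by
field in Python. This is just a verification sketch; the variable name pattern
is arbitrary.

```python
import struct

pattern = struct.pack(
    "<IIIBBHHH",
    0,       # reserved2         (offset 12)
    0x2C,    # offset of CFFILE  (offset 16)
    0,       # reserved3         (offset 20)
    3,       # versionMinor      (offset 24)
    1,       # versionMajor      (offset 25)
    1,       # cFolders          (offset 26)
    1,       # cFiles            (offset 28)
    0,       # flags             (offset 30)
)
print(pattern.hex().upper())
```

The pattern starts at position 12 and is 20 bytes long, so it ends just before
offset 32.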
At offset 32 a short value with the set ID is stored. Only examples with the
default 0 were found, but this need not always be true. So I dropped this item.

iCabinet at offset 34 is the number of the cabinet file within a set, with 0
for the first cabinet. So for single cabinets this is always 0, expressed by
the XML construct:
      <Pattern>
         <Bytes>0000</Bytes>
         <Pos>34</Pos>
      </Pattern>

At position 36 the long offset of the 1st CFDATA block (following the file
entry) is stored as "coffCabStart". It is calculated as
header size + 1 folder entry size + 1 file entry size + DOS name length
(including the dot and the terminating null),
which is at most 36+8+16+8+3+1+1 = 73 = 49h.

So the values are always below 256. This is expressed by
      <Pattern>
         <Bytes>000000</Bytes>
         <Pos>37</Pos>
      </Pattern>
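
The arithmetic for coffCabStart can be written down as a small Python sketch.
The constant and function names are made up for illustration, and
"12345678.123" is just a placeholder for the longest possible 8.3 name.

```python
HEADER_SIZE = 36         # minimal cabinet header, no optional bytes
CFFOLDER_SIZE = 8        # minimal folder entry
CFFILE_FIXED_SIZE = 16   # file entry without the name

def first_data_offset(stored_name: str) -> int:
    """Offset of the first CFDATA block in a minimal single cabinet."""
    name_bytes = len(stored_name.encode("ascii")) + 1   # + terminating null
    return HEADER_SIZE + CFFOLDER_SIZE + CFFILE_FIXED_SIZE + name_bytes

longest = first_data_offset("12345678.123")
print(longest, hex(longest))
# Stored as a little-endian long, only the first byte is non-zero:
print(longest.to_bytes(4, "little").hex())
```

Because the value always fits into one byte, the three bytes at offsets 37 to
39 stay zero.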
This is followed by a short value for the number of CFDATA blocks, "cCFData".
It is low for small archives and high for big archives.

After that the compression type indicator is stored as a short value. I verified
these facts with the 7-Zip console tool using the test/list commands. For the
inspected samples LZX:21 is always used (bytes 03 15, i.e. the little-endian
value 1503h), whereas other CAB archives that also contain 1 file have other
values, like 1 for MSZIP compression. This is expressed by

      <Pattern>
         <Bytes>0315</Bytes>
         <Pos>42</Pos>
      </Pattern>
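
How the two bytes 03 15 map to "LZX:21" can be shown with a short Python
sketch. The bit masks (low 4 bits for the method, bits 8 to 12 for the LZX
window size) are taken from the cabinet format documentation as I understand
it; the variable names and the METHODS table are only for illustration.

```python
import struct

METHODS = {0: "NONE", 1: "MSZIP", 2: "QUANTUM", 3: "LZX"}

# The pattern bytes as they appear in the file (little endian).
(type_compress,) = struct.unpack("<H", bytes.fromhex("0315"))

method = type_compress & 0x000F              # compression method
window_bits = (type_compress >> 8) & 0x1F    # LZX window size exponent

print(hex(type_compress), METHODS[method], window_bits)
```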

With the new definition file all *.*_ files are now described more precisely
(see the appended output/trid-new.txt).
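
For testing outside of TrID, the patterns above can be combined into a small
Python check. This is only a sketch under two assumptions: that a file matches
when every byte pattern is found at its position, and that the first pattern
(not shown in this post) is the "MSCF" signature at offset 0. The function name
and the synthetic sample are made up.

```python
import struct

PATTERNS = [
    (0,  bytes.fromhex("4D534346")),   # "MSCF" signature (assumed 1st pattern)
    (12, bytes.fromhex("000000002C000000000000000301010001000000")),
    (34, bytes.fromhex("0000")),       # iCabinet
    (37, bytes.fromhex("000000")),     # upper bytes of coffCabStart
    (42, bytes.fromhex("0315")),       # LZX:21
]

def matches_single_cab(data: bytes) -> bool:
    return all(data[pos:pos + len(pat)] == pat for pos, pat in PATTERNS)

# Synthetic header + folder entry (44 bytes); sizes and counts are arbitrary.
sample = struct.pack("<4sIIIIIBBHHHHHIHH",
                     b"MSCF", 0, 1000, 0, 44, 0, 3, 1, 1, 1, 0, 0, 0,
                     71, 1, 0x1503)
print(matches_single_cab(sample))
```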

The TrID definition, some examples and the output are stored in the archive
cab_single.zip. I hope that my XML file can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: TrID variant ark-cab-single.trid.xml for single Cabinet Archive (*_)
« Reply #1 on: August 09, 2017, 03:48:06 PM »
Uhm... I'll consider it, but I'm thinking that this may be perhaps too much detail.

Thanks for your work, as usual!