Author Topic: TrID variant ark-cab-msu.trid.xml for Windows Update Package Cabinet (*.MSU)

jenderek

Hello,
when I run TrID on a few *.MSU files, only 2 are identified as
"Windows Update Package", whereas the others are identified too
generally as "Microsoft Cabinet Archive" (see appended
output/trid-old.txt).

First I thought of updating msu.trid.xml, but when looking into this
file there is only one pattern:
      <Pattern>
         <Bytes>4D53434600000000</Bytes>
         <ASCII> M S C F</ASCII>
         <Pos>0</Pos>
      </Pattern>
      
These are the start bytes of a cabinet. At offset 24 the cabinet file
format version is stored, minor byte first. Currently the only version
is versionMajor = 1 and versionMinor = 3. So the byte sequence 03 01
should occur as a pattern.
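
The signature and version check described above can be sketched in
Python. This is only an illustration: the function name cab_version
and the synthetic sample bytes are my own assumptions, not part of
TrID or of any real MSU file.

```python
import struct

def cab_version(data: bytes):
    """Return (major, minor) if data starts with a cabinet header, else None."""
    if len(data) < 26 or data[:4] != b"MSCF":
        return None
    # versionMinor is stored at offset 24, versionMajor at offset 25
    minor, major = struct.unpack_from("<BB", data, 24)
    return major, minor

# Synthetic 26-byte header fragment, for illustration only
sample = b"MSCF" + bytes(20) + bytes([3, 1])
print(cab_version(sample))  # (1, 3)
```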

So I decided to start a replacement definition by running tridscan and
finally to generate ark-cab-msu.trid.xml.

I looked at the output of the file(1) command. This tool also
identifies such examples as "Microsoft Cabinet archive data" (see
output\file-5.31.txt). But it always displays "4 files" for such
archives, whereas other CAB files often contain more files or only 1
file. Now I knew I was on the right track. After searching the net I
found "Description of the Windows Update Standalone Installer in
Windows". So I added this site as a reference with the line:
   <RefURL>http://support.microsoft.com/kb/934307/en-US</RefURL>

I also added a line with the MIME type for such cabinet archives:
   <Mime>application/vnd.ms-cab-compressed</Mime>

At this point it might be useful to look at the output of the 7-Zip
console tool with the list command (see output/7z-l.txt). All
inspected archives contain a file named WSUSSCAN.cab. Obviously this
must be the file that the Microsoft reference calls "Windows Update
metadata". In most cases this is the first member. But in the
MediaFeaturePack update this file occurs as the second member with the
lower-case name "wsusscan.cab". So this finally gives a line like this
in the global string section:
      <String>WSUSSCAN.CAB</String>

The other files often have the same base name as the MSU archive. Near
the end, the archives always contain a *.xml file which is used by
Wusa.exe to perform the installation of the update. This finally gives
a line like this in the global string section:
      <String>.XML</String>

According to the reference, an MSU archive contains a properties file
with information concerning the Microsoft Knowledge Base. This must be
the file with a name like *-pkgProperties.txt. This is expressed by
the XML construct:
      <String>-PKGPROPERTIES.TXT</String>
These three strings seem to be reliable indicators for MSU files.
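
As a rough cross-check outside of TrID, the three indicator strings
can be searched for in the file content. This is a sketch under my own
assumptions: the helper name has_msu_strings and the member names in
the sample are hypothetical, and comparing in upper case is only my
way to cope with the lower-case "wsusscan.cab" case, not a statement
about how TrID matches strings internally.

```python
MSU_STRINGS = (b"WSUSSCAN.CAB", b".XML", b"-PKGPROPERTIES.TXT")

def has_msu_strings(data: bytes) -> bool:
    """True if all three indicator strings occur, ignoring ASCII case."""
    upper = data.upper()
    return all(s in upper for s in MSU_STRINGS)

# Hypothetical member names, for illustration only
sample = (b"MSCF" + bytes(64)
          + b"wsusscan.cab\x00"
          + b"Windows6.1-KB0000000-x86.cab\x00"
          + b"Windows6.1-KB0000000-x86-pkgProperties.txt\x00"
          + b"Windows6.1-KB0000000-x86.xml\x00")
print(has_msu_strings(sample))
```

In practice only the start of the file needs to be read, because the
CFFILE entries with the member names follow directly after the header.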

Tridscan produces more strings like
      <String>MICROSOFT1-0+</String>
      <String>8HTTP</String>
These seem to be accidental. For the moment I keep them in the current
TrID definition file, so that other users can refine the definition,
but maybe these strings can be removed in future versions, because
enough reliable MSU indicators are available.

The remaining archive members are the CAB files with the real update,
according to the reference. There, one or more .cab files are
mentioned. So it is certain that MSU files contain at least 4 archive
members. Because a Knowledge Base article describes only one problem
and offers one fix, and even bigger updates like
IE9-Windows6.1-KB2744842-x86.msu for Internet Explorer 9 contain only
1 CAB, it is quite likely that MSU archives always contain exactly 4
members.

According to the Microsoft Cabinet Format specification found at
https://msdn.microsoft.com/en-us/library/bb267310.aspx
the 16-bit value "cFiles" at position 28 then has the value 4.

At offset 30 the cabinet flags are stored as a little-endian short.
The values 1 and 2 indicate additional header fields used for building
cabinet chains (for example PRECOPY1.CAB -> PRECOPY2.CAB ->
PRECOPY3.CAB). Obviously this is not used for MSU files. The value 4
indicates that additional bytes are reserved in the header. This is
what the observed MSU files use. The inspected samples have 20 extra
bytes in the header (at position 36 cbCFHeader=0014h) and no extra
bytes in the folder and data structures (the following cbCFFolder=0,
cbCFData=0). iCabinet at offset 34 is the number of the cabinet file
in a set, with 0 for the first cabinet. Apparently this is 0 for MSU
files. These facts are expressed by the third XML construct:
   <Bytes>0000140000</Bytes>
   <Pos>34</Pos>
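
The header fields discussed above can be decoded with a few lines of
Python. This is a minimal sketch: the function names and the synthetic
40-byte header are my own assumptions, and the check only encodes what
was observed in the inspected samples, not a rule from the
specification.

```python
import struct

CFHDR_RESERVE_PRESENT = 0x0004  # flag value 4: reserved area present

def read_header_fields(data: bytes) -> dict:
    """Decode cFolders..iCabinet (offset 26) and the reserve sizes (offset 36)."""
    c_folders, c_files, flags, _set_id, i_cabinet = struct.unpack_from("<HHHHH", data, 26)
    fields = {"cFolders": c_folders, "cFiles": c_files,
              "flags": flags, "iCabinet": i_cabinet}
    if flags & CFHDR_RESERVE_PRESENT:
        cb_header, cb_folder, cb_data = struct.unpack_from("<HBB", data, 36)
        fields.update(cbCFHeader=cb_header, cbCFFolder=cb_folder, cbCFData=cb_data)
    return fields

def matches_observed_msu(fields: dict) -> bool:
    """Compare against the values seen in the inspected MSU samples."""
    return (fields["cFiles"] == 4
            and fields["flags"] == CFHDR_RESERVE_PRESENT
            and fields["iCabinet"] == 0
            and fields.get("cbCFHeader") == 20
            and fields.get("cbCFFolder") == 0
            and fields.get("cbCFData") == 0)

# Synthetic header: signature, 20 filler bytes, version, then the fields
sample = (b"MSCF" + bytes(20) + bytes([3, 1])
          + struct.pack("<HHHHH", 1, 4, 4, 0, 0)
          + struct.pack("<HBB", 20, 0, 0))
print(read_header_fields(sample))
print(matches_observed_msu(read_header_fields(sample)))
```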

The 20 extra bytes "abReserve" start at position 40 with something
like 000010009e40d300. The last bytes of this area seem to be
null. This is expressed by the XML construct:
   <Bytes>00000000000000000000</Bytes>
   <Pos>50</Pos>
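
The zero tail of the reserved area can be checked like this. It is
only a sketch: the helper name reserve_tail_is_zero is my own
invention, it assumes the reserve flag (value 4) was already checked,
and the sample simply reuses the example bytes quoted above followed
by zeros.

```python
import struct

def reserve_tail_is_zero(data: bytes, tail_len: int = 10) -> bool:
    """True if the last tail_len bytes of abReserve are all zero."""
    (cb_header,) = struct.unpack_from("<H", data, 36)  # cbCFHeader
    reserve = data[40:40 + cb_header]                   # abReserve
    if len(reserve) < tail_len:
        return False
    return reserve[-tail_len:] == bytes(tail_len)

# Synthetic header: 36 filler bytes, reserve sizes, 20 reserved bytes
sample = (bytes(36) + struct.pack("<HBB", 20, 0, 0)
          + bytes.fromhex("000010009e40d300") + bytes(12))
print(reserve_tail_is_zero(sample))
```

With cbCFHeader = 20 the checked tail covers positions 50 to 59, the
same range as the pattern above.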

A real code signature needs about 1800h bytes. Because I do not know
what these specific bytes mean, I removed these pattern parts from the
TrID definition file.
The development version of Wine contains its own implementation of
wusa.exe, found in patches for "programs/wusa/*" like:
https://github.com/wine-compholio/wine-staging/tree/master/patches/wusa-MSU_Package_Installer
Maybe this can help in the future to understand this area.

I found only 1 CFFOLDER for MSU files (cFolders=1, a short at offset
26). So the offset of the first CFFILE entry, "coffFiles", stored at
offset 16, is calculated by the formula:
minimal header + extra header bytes + header meta + 1 folder entry
36+20+4+8 = 68
Here "header meta" means the cbCFHeader, cbCFFolder and cbCFData
fields. Yes, this is true (68 = 44h).
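
The same arithmetic as a small Python sketch. The constant and
function names are my own; the calculation assumes that the reserve
flag (value 4) is set and that no previous/next cabinet names are
stored in the header.

```python
FIXED_HEADER = 36    # CFHEADER fields up to and including iCabinet
RESERVE_FIELDS = 4   # cbCFHeader (2) + cbCFFolder (1) + cbCFData (1)
CFFOLDER_SIZE = 8    # coffCabStart (4) + cCFData (2) + typeCompress (2)

def first_cffile_offset(cb_cfheader: int, cb_cffolder: int, c_folders: int) -> int:
    """Offset of the first CFFILE entry, i.e. the expected coffFiles value."""
    return (FIXED_HEADER + RESERVE_FIELDS + cb_cfheader
            + c_folders * (CFFOLDER_SIZE + cb_cffolder))

offset = first_cffile_offset(cb_cfheader=20, cb_cffolder=0, c_folders=1)
print(offset, hex(offset))  # 68 0x44
```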

Reserved areas have 0 values. So the second pattern, covering
reserved2, the 44h offset, reserved3, version, number of folders,
number of files and flags, is expressed by the second XML construct:
      <Bytes>0000000044000000000000000301010004000400</Bytes>
      <ASCII> . . . . D</ASCII>
      <Pos>12</Pos>
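
To double-check that this 20-byte pattern really encodes the fields
named above, it can be unpacked in Python. The variable names are my
own; the layout follows the CFHEADER field order starting at offset 12.

```python
import struct

pattern = bytes.fromhex("0000000044000000000000000301010004000400")

# reserved2, coffFiles, reserved3 (3 x uint32), versionMinor,
# versionMajor (2 x uint8), cFolders, cFiles, flags (3 x uint16)
(reserved2, coff_files, reserved3,
 minor, major, c_folders, c_files, flags) = struct.unpack("<IIIBBHHH", pattern)

print("coffFiles =", hex(coff_files))
print("version   =", major, minor)
print("cFolders  =", c_folders)
print("cFiles    =", c_files)
print("flags     =", flags)
```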

At position 72 the uncompressed byte offset of the start of the file's
data is stored as the long "uoffFolderStart". For the first file in
each folder, this value will usually be zero. This is expressed by the
XML construct:
   <Pattern>
      <Bytes>00000000</Bytes>
      <Pos>72</Pos>
   </Pattern>

All first archive members have the _A_ARCH flag (short 20h, meaning
modified since last backup, or the "A" attribute in 7-Zip output) set
in the first CFFILE structure at position 82. This was expressed by
the last XML construct:
   <Pattern>
      <Bytes>2000</Bytes>
      <Pos>82</Pos>
   </Pattern>
This need not always be true, so I removed that pattern.
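
The first CFFILE entry with the two fields discussed above can be read
like this. It is a sketch only: the function name parse_cffile and the
synthetic sample are my own assumptions, and the member name is
decoded as plain ASCII for simplicity.

```python
import struct

A_ARCH = 0x20  # _A_ARCH: modified since last backup

def parse_cffile(data: bytes, offset: int) -> dict:
    """Decode one CFFILE entry: 16 fixed bytes plus a NUL-terminated name."""
    cb_file, uoff_folder_start, i_folder, _date, _time, attribs = \
        struct.unpack_from("<IIHHHH", data, offset)
    name_start = offset + 16
    name_end = data.index(b"\x00", name_start)
    name = data[name_start:name_end].decode("ascii", "replace")
    return {"cbFile": cb_file, "uoffFolderStart": uoff_folder_start,
            "iFolder": i_folder, "attribs": attribs, "name": name}

# Synthetic data: 68 filler bytes, then one CFFILE entry at offset 68
sample = (bytes(68)
          + struct.pack("<IIHHHH", 12345, 0, 0, 0, 0, A_ARCH)
          + b"WSUSSCAN.cab\x00")
entry = parse_cffile(sample, 68)
print(entry["name"], entry["uoffFolderStart"], bool(entry["attribs"] & A_ARCH))
```

With the entry at offset 68, uoffFolderStart lands at position 72 and
attribs at position 82, which agrees with the positions used in the
two patterns.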

With the new definition file all inspected *.MSU files are now
described more precisely (see appended output/trid-new.txt).

The TrID definition and the outputs are stored in the archive
msu_cab.zip. I hope that my XML file can be used in a future version
of triddefs.

With best wishes
Joerg Jenderek

Mark0

Uhm... I think I'll keep the old def/name, keep the old header part, but refine the strings section with the ones you collected.
Thanks as usual!