Topic: ark-cab-msm.trid.xml for variant of Windows Installer Merge Module *.msm

jenderek

Hello trid users,

A few days ago I ran TrID on Windows Installer Merge Modules (*.msm). Many
are recognized as "Windows Installer Merge Module" by msm.trid.xml, but
some are described as "Microsoft Cabinet Archive" with the file name
extension "CAB" by ark-cab.trid.xml (see appended output/trid-v-old.txt).

For comparison I ran other file identification tools. The file(1) command
describes these examples as "Microsoft Cabinet archive data", also with the
wrong extension (see appended output/file.txt).

The Wikipedia page about Windows Installer mentions that MSM files are used
as merge modules. This is now expressed by a reference URL line like:

   <RefURL>http://en.wikipedia.org/wiki/Windows_Installer</RefURL>

Furthermore I added a user-defined MIME type. That is expressed by a line like:

   <Mime>application/x-ms-msm</Mime>

Unfortunately I found no precise information about the MSM file format. The
samples recognized as "Generic OLE2 / Multistream Compound File" by
docfile.trid.xml seem to be older (dated from 2000 to 2004).
Samples like _14248_Microsoft_VC80_MFC_x86.msm, described by
ark-cab.trid.xml, seem to be newer (dated about 2019).

I guess that Microsoft changed the internal format of Windows Installer Merge
Modules between versions, so I mention this variant observation in the
remark line.
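
The two variants can be told apart by their leading bytes alone. Here is a
minimal Python sketch, assuming the well-known magic numbers "MSCF" for
cabinet files and D0 CF 11 E0 A1 B1 1A E1 for OLE2 compound files. The
function names are made up for illustration and are not part of TrID:

```python
# Assumed magic numbers: "MSCF" for Microsoft Cabinet archives and
# D0 CF 11 E0 A1 B1 1A E1 for OLE2 compound files.
CAB_MAGIC = b"MSCF"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def container_kind(data: bytes) -> str:
    """Return 'cab', 'ole2' or 'unknown', judging by the leading bytes only."""
    if data.startswith(CAB_MAGIC):
        return "cab"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def container_kind_of_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as handle:
        return container_kind(handle.read(8))
```

This only identifies the container. It cannot tell whether a cabinet or OLE2
file really is a merge module.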

Looking at the output of the file command, we see that the first archive
member is a file whose name starts with "manifest." followed by the
version-like string "8.0.50727.6195." After that comes a string like
"97F81AF1_0E47_DC99_FF1F_C8B3B9A1E18E", which is probably GUID-based.

The second archive member is a file whose name starts with "catalog." The
rest of the name follows the same pattern as for the first member.

This can also be verified with a CAB extraction tool like 7zip. In the
listing output (see appended output/7z-l.txt) we see that the third
member is a file whose name starts with "ul_manifest." followed by a
GUID-based string. The fourth member is a file whose name starts with
"ul_catalog." followed by a GUID-based string.
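
The member names can also be read directly from the cabinet headers. The
following Python sketch assumes the publicly documented cabinet layout (a
36-byte fixed CFHEADER whose coffFiles field points to the CFFILE entries,
each with a 16-byte fixed part followed by a NUL-terminated name). The
function name cab_member_names is hypothetical, and names are decoded as
Latin-1 for simplicity:

```python
import struct

# Fixed part of CFHEADER: signature, reserved1, cbCabinet, reserved2,
# coffFiles, reserved3, versionMinor, versionMajor, cFolders, cFiles,
# flags, setID, iCabinet (36 bytes in total).
CFHEADER = struct.Struct("<4sIIIIIBBHHHHH")
# Fixed part of CFFILE: cbFile, uoffFolderStart, iFolder, date, time, attribs.
CFFILE = struct.Struct("<IIHHHH")


def cab_member_names(data: bytes) -> list:
    """Return the member names stored in the CFFILE entries of a cabinet."""
    fields = CFHEADER.unpack_from(data, 0)
    if fields[0] != b"MSCF":
        raise ValueError("not a cabinet file")
    offset, count = fields[4], fields[9]  # coffFiles, cFiles
    names = []
    for _ in range(count):
        offset += CFFILE.size              # skip the fixed 16-byte part
        end = data.index(b"\x00", offset)  # the name is NUL-terminated
        # Assumption: Latin-1 decoding; names flagged as UTF-8 are not handled.
        names.append(data[offset:end].decode("latin-1"))
        offset = end + 1
    return names
```

For a real sample one would call cab_member_names(open(path, "rb").read())
and compare the result with the 7zip listing.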

These member name prefixes also appeared in the global strings section of the
tridscan-generated definition ark-cab-msm.trid.xml as 4 lines like:
   <String>MANIFEST.8.0.50727.6195.</String>
   <String>CATALOG.8.0.50727.6195.</String>
   <String>UL_MANIFEST.</String>
   <String>UL_CATALOG.</String>
I guess that there also exist MSM samples with versions other than
"8.0.50727.6195." So I removed those parts. Furthermore I also deleted
GUID-looking phrases like:
   <String>_FF1F_C8B3B9A1E18E</String>
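
The same generalization can be sketched in Python. This is only an
illustration of the manual editing described above, not how tridscan works.
The patterns assume a four-part dotted version and a GUID written as hex
groups of 8, 4, 4, 4 and 12 digits joined by underscores, as in the sample
names quoted in this post. The function name stable_prefix is made up:

```python
import re

# Version-like part such as "8.0.50727.6195." (four dotted numbers).
VERSION = re.compile(r"(?:\d+\.){4}")
# GUID-like tail such as "97F81AF1_0E47_DC99_FF1F_C8B3B9A1E18E".
GUID = re.compile(r"[0-9A-Fa-f]{8}(?:_[0-9A-Fa-f]{4}){3}_[0-9A-Fa-f]{12}")


def stable_prefix(member_name: str) -> str:
    """Drop version and GUID parts and upper-case what remains."""
    name = GUID.sub("", member_name)
    name = VERSION.sub("", name)
    return name.upper()
```

Applied to the first member name quoted above, this leaves "MANIFEST.",
which is the kind of string kept in the definition.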

With the new TrID definition all my inspected newer MSM samples are now
recognized (see appended output/trid-new-v.txt). Unfortunately my definition
file was generated from just a dozen examples, so I do not know whether my
observations always hold.

The TrID definition, some examples and the output are stored in the archive
cab_msm.zip. I hope that my XML file can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

Thanks!