Hello,
when I run TrID on some IncrediMail-specific *.im? files, some are
identified wrongly or too generally as "Microsoft Cabinet Archive" (see
appended output/trid-old.txt).
The examples are found on disk after installing the current German
version (2.5 de, build 6605338) with the installer
IncrediMailSetup_de.exe from the web site
http://www.incredimail.com/.
First I tried to update imn.trid.xml for IncrediMail notifiers
(*.imn). The current definition file contains only 1 pattern, with the
XML construct:
<Bytes>4D53434600000000</Bytes>
<ASCII> M S C F</ASCII>
<Pos>0</Pos>
That is the magic for CAB archives. That means all such IncrediMail
files are also identified by ark-cab.trid.xml as "Microsoft Cabinet
Archive".
In the global string section, lines occur like:
<String>CONTENT.INI</String>
<String>GOTMAIL.SWF</String>
<String>HAVEMAIL.SWF</String>
<String>READMAIL.SWF</String>
That must be interpreted as follows. An IncrediMail file contains an
initialisation file named content.ini and some Flash Player files
(GOTMAIL.SWF, HAVEMAIL.SWF, READMAIL.SWF). The inspected notifier
examples, however, do not contain any Flash Player files
(*.swf). This can be verified by looking at the output of the 7-Zip
console tool with the list command (see imn/output/7z-l.txt). So the
examples are not recognised by imn.trid.xml (see
imn/output/trid-old.txt), and an updated definition would contain only
1 line in the string section:
<String>CONTENT.INI</String>
The same consideration applies to *.imw files. In imw-wav.trid.xml,
IncrediMail sounds with wave audio (*.wav) are described in the global
string section by
<String>CONTENT.INI</String>
<String>.WAV</String>
But the inspected examples do not contain any audio files (see
imw/output/7z-l.txt). So an updated definition file would contain only
1 line in the string section:
<String>CONTENT.INI</String>
At this point a problem arises. As far as I can see, it is then not
possible to distinguish IncrediMail notifiers (*.imn) from sounds
(*.imw). In my opinion this is only possible after extracting
content.ini and looking there for the Type variable, which is Notifier
for imn files and Sound for imw files.
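A minimal sketch in Python of that lookup, assuming content.ini has
already been extracted (for example with 7-Zip) and that it is plain
INI-style text with a line of the form Type=... . The function name and
the sample text are hypothetical; the real layout of content.ini is not
documented here.

```python
def incredimail_type(ini_text: str):
    """Return the value of the first 'Type=' line, or None if there is none."""
    for line in ini_text.splitlines():
        # Split "key=value"; sep is empty when the line has no '='.
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "type":
            return value.strip()
    return None
```

For a notifier this would be expected to return "Notifier" and for a
sound "Sound", according to the values named above.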
So I did not try further to update the existing TrID definitions and
started to create a generic definition file im_.trid.xml for such
IncrediMail files.
In the current definitions only the start magic "MSCF" is used in the
pattern section. At offset 24 the cabinet file format version is
stored. Currently the only version is versionMajor = 1 and
versionMinor = 3, stored minor byte first. So the byte sequence 03 01
should occur as a pattern.
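As a rough sketch in Python, the magic and version check could look
like this. It assumes the header layout from the Microsoft Cabinet
Format document (signature at offset 0, versionMinor at offset 24,
versionMajor at offset 25); the function name is only for illustration.

```python
def looks_like_cab_v1_3(data: bytes) -> bool:
    """Return True if data starts with 'MSCF' and has version bytes 03 01."""
    if len(data) < 26:
        return False
    # Offset 0: signature; offset 24: versionMinor; offset 25: versionMajor.
    return data[0:4] == b"MSCF" and data[24] == 3 and data[25] == 1
```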
For identification one pattern is sufficient, but if you want to do
some deeper inspection, additional information is lost. That is no
problem if an official or complete file specification exists.
Unfortunately this is not the case. There is the Microsoft Cabinet
Format document at
https://msdn.microsoft.com/en-us/library/bb267310.aspx .
But I found no information about such IncrediMail files. Nothing is
written about the compression method used, the id, etc.
If you do not build such CAB files with "Photo Notifier and Animation
Creator" (pnac.exe), for example because you are running Linux with
WINE, you need additional information or an exact identification to
create a compatible package.
So I decided to start a replacement definition by running tridscan and
then refining the XML file according to the CAB specification.
At offset 30 the cabinet flags are stored as a little-endian short
value. Values 1 and 2 indicate additional header bytes used for
building cabinet chains (for example PRECOPY1.CAB -> PRECOPY2.CAB ->
PRECOPY3.CAB). Obviously this is not used for IncrediMail files. Value
4 is used to reserve additional bytes in the header for something like
signatures. This feature was implemented but apparently never used
in such cabinets. So for IncrediMail files the flag value is apparently
always 0. This is expressed by:
<Bytes>0000</Bytes>
<Pos>30</Pos>
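For inspecting other cabinets, the flag word can be decoded with a few
lines of Python. This is only a sketch; the bit names follow the
Microsoft Cabinet Format document, and the function name is my own.

```python
# Bits of the CFHEADER "flags" field at offset 30 (little-endian short).
CAB_FLAGS = {
    0x0001: "cfhdrPREV_CABINET",
    0x0002: "cfhdrNEXT_CABINET",
    0x0004: "cfhdrRESERVE_PRESENT",
}

def decode_cab_flags(data: bytes) -> list:
    """Return the names of all flag bits set in the header of data."""
    value = int.from_bytes(data[30:32], "little")
    return [name for bit, name in CAB_FLAGS.items() if value & bit]
```

An empty list corresponds to the flag value 0 that the pattern above
expects.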
Reserved areas have 0 values. At offset 16 the offset of the first
CFFILE entry is stored. If the header contains no optional parts and
there is a single CFFOLDER entry directly after it, the value is 2Ch
(36 header bytes plus 8 folder bytes). At offset 26 the number of
CFFOLDER entries is stored, which is always 1. So the reserved2,
offset, reserved3, version and cFolders values are now expressed by the
second XML construct:
<Bytes>000000002C0000000000000003010100</Bytes>
<Pos>12</Pos>
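To see how these 16 bytes split into fields, here is a small Python
sketch using the struct module. The field names are taken from the
Microsoft Cabinet Format document; the function name is hypothetical.

```python
import struct

def read_header_fields(data: bytes) -> dict:
    """Unpack the CFHEADER fields stored at offsets 12..27."""
    # "<" = little-endian without padding: 3 x uint32, 2 x uint8, 1 x uint16.
    reserved2, coff_files, reserved3, ver_minor, ver_major, c_folders = \
        struct.unpack_from("<IIIBBH", data, 12)
    return {
        "reserved2": reserved2,
        "coffFiles": coff_files,
        "reserved3": reserved3,
        "versionMinor": ver_minor,
        "versionMajor": ver_major,
        "cFolders": c_folders,
    }
```

Applied to the byte sequence of the pattern, coffFiles comes out as 44
(2Ch), the version as 1.3 and cFolders as 1.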
iCabinet at offset 34 is the number of the cabinet file in a set, where
0 is used for the first cabinet. For the examples this seems to be
always 0. This fact is now expressed by the 3rd XML construct:
<Bytes>0000</Bytes>
<Pos>34</Pos>
At position 36 the long offset of the 1st CFDATA block (following the
file entries) is stored as "coffCabStart". If the archive contains only
a few members, then the number of CFFILE structures is low and the
offset is not so high. This was expressed by:
<Bytes>0000</Bytes>
<Pos>38</Pos>
This becomes false if the archive contains many members. So I removed
that pattern.
Next comes the number of CFDATA blocks in the folder, the short
"cCFData" at position 40. Often this is low when the archive contains
only a few small members, but not always. So I removed that pattern
part as well. At position 42 the compression type indicator is stored
as the short "typeCompress". The bytes 03 15 (little-endian value
1503h) mean compression LZX:21. This is now expressed by the XML
construct:
<Bytes>0315</Bytes>
<Pos>42</Pos>
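The way these two bytes decode can be sketched in Python as follows.
The masks (low 4 bits for the method, bits 8 to 12 for the LZX window
size) are taken from the Microsoft Cabinet Format document; the
function name is hypothetical.

```python
# Compression methods stored in the low 4 bits of typeCompress.
COMPRESSION_TYPES = {0: "NONE", 1: "MSZIP", 2: "QUANTUM", 3: "LZX"}

def describe_type_compress(raw: bytes) -> str:
    """Describe the 2-byte little-endian typeCompress field."""
    value = int.from_bytes(raw[:2], "little")
    method = COMPRESSION_TYPES.get(value & 0x000F, "unknown")
    if method == "LZX":
        # Bits 8..12 hold the LZX window size as a power of two.
        window_bits = (value >> 8) & 0x1F
        return f"LZX:{window_bits}"
    return method
```

For the bytes 03 15 this gives "LZX:21", the same notation 7-Zip uses
in its listing.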
At position 48 the uncompressed byte offset of the start of the file's
data is stored as the long "uoffFolderStart" (the position holds if
there is only 1 folder). For the first file in each folder this value
will usually be zero. After it, the index of the folder containing the
data is stored as the short "iFolder". A value of zero indicates the
first folder in this cabinet file. This is now expressed by:
<Bytes>000000000000</Bytes>
<Pos>48</Pos>
Then, after the date and time fields, the first file's attributes are
stored at position 58:
<Bytes>2000</Bytes>
<Pos>58</Pos>
The above expression means that the first member has been modified
since the last backup (_A_ARCH = 0x20). Of course this need not always
be true. So the last XML construct now becomes:
<Bytes>00</Bytes>
<Pos>59</Pos>
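The patterns that were kept can be tried against a file outside of TrID
with a short Python sketch like the one below. It only compares
fixed-position bytes and does not reproduce TrID's own scoring or the
string section; the function name is hypothetical, and for position 0
only the 4 magic bytes are used here.

```python
# (position, hex bytes) pairs taken from the XML constructs discussed above.
PATTERNS = [
    (0, "4D534346"),                           # "MSCF" magic
    (12, "000000002C0000000000000003010100"),  # reserved2 .. cFolders
    (30, "0000"),                              # flags
    (34, "0000"),                              # iCabinet
    (42, "0315"),                              # typeCompress, LZX:21
    (48, "000000000000"),                      # uoffFolderStart, iFolder
    (59, "00"),                                # high byte of attributes
]

def matches_all(data: bytes, patterns=PATTERNS) -> bool:
    """Return True if every pattern is found at its fixed position."""
    for pos, hex_bytes in patterns:
        expected = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(expected)] != expected:
            return False
    return True
```

A usage example would be matches_all(open(name, "rb").read(64)) for one
of the sample files.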
I mention the CAB values found in the remark line. With this generic
definition file, IncrediMail animations (*.ima), ecards (*.imf), images
(*.imi), notifiers (*.imn), skins (*.ims) and sounds (*.imw) are now
recognized (see IncrediMail.txt). I mention this fact in the remark
line too. The file name extension is now expressed by:
<Ext>IMA/IMF/IMI/IMN/IMS/IMW</Ext>
Instead of "application/vnd.ms-cab-compressed", IncrediMail adds a
user-defined MIME type for such IncrediContent. This is now expressed
by the line:
<Mime>application/x-incredimail</Mime>
In the old output we got something like:
File: racing_snail_light.imn
65.9% (.IMI) IncrediMail image (15500/1/2)
34.0% (.CAB) Microsoft Cabinet Archive (8000/1)
which at first glance looks to the user like
49.9% (.PI2) DEGAS med-res bitmap (2000/1)
25.0% (.ABR) Adobe PhotoShop Brush (1002/3)
where TrID offers several file types and the user must do further
inspection to decide which is true.
But in reality *.im? files are just CAB archives with a content.ini
member. This is easily visible to users with the new definition
and its new description, like:
39.7% (.IMA/IMF/IMI/IMN/IMS/IMW) IncrediMail (Cabinet Archive) (15529/7/2)
20.4% (.CAB) Microsoft Cabinet Archive (8000/1)
(see appended imn\output\trid-new.txt).
The TrID definition, some examples and the output are stored in the
archive im_.zip. I hope that my XML file can be used in a future
version of triddefs.
With best wishes
Jörg Jenderek