Hello TrID users,
A few days ago I handled some Microsoft Works portfolio files with the file
name extension WSB.
For my examples I got only the description "Generic OLE2 / Multistream
Compound" from docfile.trid.xml, which is correct in principle but not very
specific (see appended output/trid-v-old.txt).
For comparison I also ran the file utility (version 5.40). It behaves
similarly: the examples are described generically as "OLE 2 Compound
Document", but no subtype classification is done. Instead only "UNKNOWN" is
shown, but luckily the CLSID in use is shown in hexadecimal form as
c0c7266eb98cd311a1c800c04f612452.
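For readers who want to pull the same 16 bytes out of a file without the file utility, here is a minimal sketch. It assumes the usual compound file layout (sector shift at header offset 30, first directory sector at offset 48, CLSID at offset 80 of the root directory entry). The function name root_clsid_hex is my own invention, and I have not run it against every WSB sample.

```python
import struct

OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")

def root_clsid_hex(data: bytes) -> str:
    """Return the root-entry CLSID of an OLE2 compound file as 32 hex digits."""
    if data[:8] != OLE2_MAGIC:
        raise ValueError("not an OLE2 compound file")
    # Header fields: sector shift at offset 30, first directory sector at 48.
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    sector_size = 1 << sector_shift
    # Sector n starts at (n + 1) * sector_size; the header fills the first slot.
    root_entry = (first_dir_sector + 1) * sector_size
    # The CLSID occupies bytes 80..95 of the 128-byte root directory entry.
    clsid = data[root_entry + 80 : root_entry + 96]
    if len(clsid) != 16:
        raise ValueError("file is truncated before the root entry")
    return clsid.hex()

# Usage with a real sample (path taken from this post):
# with open("wsbsamp.wsb", "rb") as handle:
#     print(root_clsid_hex(handle.read()))
```

The result is the raw byte order as stored in the file, which is the same form the file utility prints.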
With the help of an online Windows GUID converter, like the one on
toolslick.com, I was able to convert this into GUID form with curly braces
and hyphen separators. It now looks like:
{6E26C7C0-8CB9-11D3-A1C8-00C04F612452}
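The same conversion can be done offline. As a small sketch using only the Python standard library: the first three GUID fields are stored little-endian, which is what the bytes_le argument of uuid.UUID handles. The variable names are only for illustration.

```python
import uuid

# Raw CLSID bytes as printed by the file utility.
raw_hex = "c0c7266eb98cd311a1c800c04f612452"

# bytes_le swaps the byte order of the first three fields,
# which is how Windows stores a GUID in a file.
guid = uuid.UUID(bytes_le=bytes.fromhex(raw_hex))
formatted = "{" + str(guid).upper() + "}"
print(formatted)  # {6E26C7C0-8CB9-11D3-A1C8-00C04F612452}
```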
With the CLSID in GUID form I searched the net for information, and I looked
up the extension on web sites like file-extension.net. So I came to the
conclusion that examples like Sammlung.wsb and wsbsamp.wsb are Microsoft
Works portfolio files. The first one I later found in the directory
MSWorks/Common on the Works 2003 CD. That information is recorded inside the
TrID definition by a line like:
<RefURL>http://fileformats.archiveteam.org/wiki/Microsoft_Compound_File</RefURL>
Because the definition generated by tridscan is based on only 2 examples,
the global strings section obviously contains short garbage lines like:
<String>IPFIPF</String>
<String>$'''%</String>
<String>FBEFC</String>
<String>FMCHQ</String>
<String>(%%%</String>
<String>(CA-</String>
<String>*))1</String>
<String>*HDO</String>
So I deleted such lines.
Portfolio is apparently the part of the Microsoft Works suite that organizes
pictures such as JPEG images. This shows in the fact that the image content
is stored in streams with the UTF-16 LE name __cf1, and the corresponding
file name, like 001.JPG, 002.JPG, etc., is stored in streams with the name
__fname. This can be verified by extracting such a stream, for example with
Michal Mutl's MiTeC Structured Storage Viewer. The example __cf1.stream
seems to be a badly constructed JPEG (see appended
wsbsamp.tmp\__cf1.stream). These facts are expressed inside the TrID
definition by string lines like:
<String>0'0'1'.'J'P'G</String>
<String>0'0'2'.'J'P'G</String>
<String>0'0'3'.'J'P'G</String>
<String>_'_'F'N'A'M'E</String>
<String>_'_'C'F'1</String>
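To make the notation easier to read, here is a small sketch of how a stream name maps to such a string line. It assumes that tridscan prints each NUL byte of the UTF-16 LE name as an apostrophe and upper-cases the text, which is what the lines above suggest. The helper name trid_string is hypothetical and it only handles plain ASCII names.

```python
def trid_string(stream_name: str) -> str:
    """Render an ASCII stream name the way it appears in the <String> lines."""
    # UTF-16 LE puts a NUL byte after every ASCII character.
    raw = stream_name.upper().encode("utf-16-le")
    # Assumption: each NUL byte is shown as an apostrophe in the definition.
    text = raw.replace(b"\x00", b"'").decode("ascii")
    # The strings above carry no trailing apostrophe, so drop the last one.
    return text.rstrip("'")

print(trid_string("__fname"))  # _'_'F'N'A'M'E
print(trid_string("001.JPG"))  # 0'0'1'.'J'P'G
```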
Then there are streams whose names start with 2 underscores, like:
__BTHUMB __COUNTDELAYTHUMB __IDSDELAYTHUMB __LASTID __SIZEUSED __THUMB
These are expressed by lines like:
<String>_'_'B'T'H'U'M'B</String>
<String>_'_'C'O'U'N'T'D'E'L'A'Y'T'H'U'M'B'''''''''''''''''''''''''''''''$</String>
<String>_'_'I'D'S'D'E'L'A'Y'T'H'U'M'B</String>
<String>_'_'L'A'S'T'I'D</String>
<String>_'_'S'I'Z'E'U'S'E'D</String>
<String>_'_'T'H'U'M'B</String>
Then there are 2 streams without underscores, PFORDER and PFSELECTION.
These are expressed by lines like:
<String>P'F'O'R'D'E'R</String>
<String>P'F'S'E'L'E'C'T'I'O'N</String>
But of course I do not know whether these observed names are always required
or whether some are only optional.
In the front block I got many short nil patterns like:
<Pattern>
<Bytes>00</Bytes>
<Pos>1153</Pos>
</Pattern>
...
<Pattern>
<Bytes>00</Bytes>
<Pos>1299</Pos>
</Pattern>
I am quite sure that these are caused by the too small number of examples.
So I deleted such patterns.
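Such a cleanup can also be scripted instead of done by hand. The following is only a sketch: the embedded sample is a cut-down, illustrative definition (the element layout around Pattern is assumed, and the first pattern is just the OLE2 signature as a stand-in), and the function name drop_nil_patterns is my own.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only, not the real WSB definition.
SAMPLE = """<TrID>
  <FrontBlock>
    <Pattern><Bytes>D0CF11E0A1B11AE1</Bytes><Pos>0</Pos></Pattern>
    <Pattern><Bytes>00</Bytes><Pos>1153</Pos></Pattern>
    <Pattern><Bytes>00</Bytes><Pos>1299</Pos></Pattern>
  </FrontBlock>
</TrID>"""

def drop_nil_patterns(root: ET.Element) -> int:
    """Remove every Pattern whose Bytes value is the single byte 00."""
    removed = 0
    # Copy the element list first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        for pattern in parent.findall("Pattern"):
            if (pattern.findtext("Bytes") or "").strip() == "00":
                parent.remove(pattern)
                removed += 1
    return removed

root = ET.fromstring(SAMPLE)
removed = drop_nil_patterns(root)
remaining = [p.findtext("Pos") for p in root.iter("Pattern")]
print(removed, remaining)  # 2 ['0']
```

Longer runs of zero bytes are left alone on purpose, since they may be real structure and not an artifact of having too few samples.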
I do not know which versions are described by the mentioned CLSID; probably
version 6 and/or 7. So I chose descriptive lines like:
<FileType>Microsoft Works portfolio</FileType>
<Mime>application/vnd.ms-works</Mime>
The above MIME type is registered at IANA, but nearly no information is
found there. When searching the net, often only the 4 extensions wps, wcm,
wdb and wks are mentioned, but I believe that it is also suitable for WSB
examples. If this is not true, then the generic MIME type
application/x-ole-storage should be used.
With the new definition all WSB examples are now described (see appended
output/trid-v.txt). The TrID definition, some examples and the output are
stored in wsb_.zip. I hope that my XML file can be used in a future version
of triddefs.
With best wishes
Jörg Jenderek