Hello trid users,
A few days ago I handled some files with the ART file name extension. Some
of the samples are "newer" Greenstreet Art drawings.
So I ran the trid utility on these ART examples. Older variants are
described as "Greenstreet Art drawing (old)" by art-gst.trid.xml. Examples
of the newer variant (like BCARD2.ART, POSTER2.ART, SLIDE2.ART) are only
described as "Generic OLE2 / Multistream Compound" by docfile.trid.xml (see
appended output/trid-v-old.txt).
For comparison I checked these examples with the file command utility. When
running the file command (version 5.43), all "new" variants are only
described as "Composite Document File V2 Document" (see appended
output/file-5.43.txt) or as "OLE 2 Compound Document". It tries to do a
sub-classification, but fails. It misidentifies such ART samples as
"Microsoft", because the CLSID is similar to the ID of Microsoft products
like "Microsoft Visio Drawing" (see appended file-soft-5.43.txt for
Visio2002Test.vsd).
A patched file command describes all these examples with CLSID
{00022C60-0000-0000-C000-000000000046} and the directory names CONTENTS and
Preview.dib (see appended output/file.tmp).
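For readers without a patched file command, the CLSID of the root storage can
also be read directly. The following is only a minimal Python sketch, not what
the file command does internally. It assumes that the root entry is the first
128-byte entry of the first directory sector and that its CLSID sits at offset
80 of that entry, as described in the compound file specification. The
function name root_clsid and the synthetic test buffer are my own.

```python
import struct
import uuid

def root_clsid(data: bytes) -> uuid.UUID:
    """Read the CLSID of the root storage from an OLE2 compound file image."""
    # Sector shift (offset 30) and SecID of the first directory sector (offset 48).
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    (first_dir_secid,) = struct.unpack_from("<I", data, 48)
    # The header occupies the space before sector 0, hence the "+ 1".
    root_entry = (first_dir_secid + 1) << sector_shift
    # Assumption: the CLSID is the 16 bytes at offset 80 of the directory entry,
    # stored in the mixed-endian layout that uuid.UUID(bytes_le=...) expects.
    return uuid.UUID(bytes_le=bytes(data[root_entry + 80:root_entry + 96]))

# Synthetic buffer (not a real ART file) with the CLSID reported above.
expected = uuid.UUID("00022C60-0000-0000-C000-000000000046")
sample = bytearray(2048)
struct.pack_into("<H", sample, 30, 9)   # sector shift 9, i.e. 512-byte sectors
struct.pack_into("<I", sample, 48, 2)   # directory starts in sector 2
sample[1536 + 80:1536 + 96] = expected.bytes_le
print("{%s}" % str(root_clsid(bytes(sample))).upper())
```

On a real sample the buffer would be the content of the ART file read in
binary mode.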
For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It describes the newer
variants as "OLE2 Compound Document Format" with PUID fmt/111.
I ran tridscan to generate the definition art-gst-docfile.trid.xml for the
newer ART variant. The information mentioned for the "older" variant also
applies to the "newer" variant. This is expressed by a line like:
<RefURL>http://fileformats.archiveteam.org/wiki/GST_ART</RefURL>
There, under the item "Software & Samples", the Greenstreet Publishing Suite
99 is mentioned. On that CD-ROM image I found such "new" variant samples.
The web page says that the "older" part is now stored as the contents
stream. That name is also shown by the patched file command.
Because the ART samples are OLE2 Compound containers, we can inspect them
with suitable tools, for example Michal Mutl's Structured Storage Viewer.
There we see that stream and can save it, for example as
CONTENTS-stream.art. TrID describes the saved stream as the "old" variant.
The characteristic directory names shown above appear in the global strings
section as lines like:
<String>R'O'O'T' 'E'N'T'R'Y</String>
<String>C'O'N'T'E'N'T'S</String>
<String>P'R'E'V'I'E'W'.'D'I'B</String>
This section also contains lines which look like garbage:
<String>D'''''''''''''''D</String>
<String>QQQ'CCC'TTT</String>
<String>(''')'''*</String>
<String>Q''QQ'QQQ</String>
<String>F'''H</String>
<String>Q'Q'Q</String>
<String>$'''%</String>
I assume that these lines are only coincidental matches among my samples,
so I deleted them.
In the front block section I found many short patterns of nil bytes like:
<Pattern>
<Bytes>000000</Bytes>
<Pos>1653</Pos>
</Pattern>
<Pattern>
<Bytes>00</Bytes>
<Pos>1665</Pos>
</Pattern>
<Pattern>
<Bytes>000000000000</Bytes>
<Pos>2042</Pos>
</Pattern>
I assume that these patterns are likewise coincidental, so I deleted them
too.
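I removed these patterns by hand. For many definitions the same clean-up
could perhaps be scripted. The following Python sketch is only an
illustration under assumptions: it drops every Pattern element whose Bytes
value consists of zeros only and whose Pos is at or beyond a chosen limit.
The function name drop_zero_patterns, the parameter min_pos and the
FrontBlock wrapper in the sample are my own choices, and the rule is cruder
than a manual review.

```python
import xml.etree.ElementTree as ET

def drop_zero_patterns(xml_text: str, min_pos: int = 0) -> str:
    """Remove <Pattern> elements that contain only nil bytes at Pos >= min_pos."""
    root = ET.fromstring(xml_text)
    # Iterate over copies of the lists so removal does not disturb iteration.
    for parent in list(root.iter()):
        for pattern in list(parent.findall("Pattern")):
            hex_bytes = (pattern.findtext("Bytes") or "").strip()
            pos = int((pattern.findtext("Pos") or "0").strip())
            if hex_bytes and set(hex_bytes) == {"0"} and pos >= min_pos:
                parent.remove(pattern)
    return ET.tostring(root, encoding="unicode")

# Small hand-made fragment; the first pattern is the OLE2 signature.
SAMPLE = """<FrontBlock>
<Pattern><Bytes>D0CF11E0A1B11AE1</Bytes><Pos>0</Pos></Pattern>
<Pattern><Bytes>000000</Bytes><Pos>1653</Pos></Pattern>
<Pattern><Bytes>00</Bytes><Pos>1665</Pos></Pattern>
</FrontBlock>"""

cleaned = drop_zero_patterns(SAMPLE, min_pos=512)
print(cleaned)
```

The min_pos limit is there so that nil bytes inside the 512-byte header,
which may be meaningful, are not removed blindly.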
At offset 30 (1Eh) the Sector Shift Exponent is stored as a 2-byte
little-endian integer. In my examples this value was 9, which means the
sector (block) size is 2**9=512. At offset 48 (30h) the SecID of the first
sector of the directory stream is stored as a 4-byte little-endian integer.
In my examples this value was 2. So the directory starts in my examples at
offset (2+1)*512=1536=600h, and at that offset I found the UTF-16 encoded
string "Root Entry". I do not know whether these observed values are always
the same, but I keep them in the definition for the moment.
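The offset arithmetic above can be checked with a few lines of Python. This
is only a sketch under the assumptions stated in the text (header fields at
offsets 30 and 48, directory sector located at (SecID+1) times the sector
size). The function names and the synthetic buffer are my own and not part
of TrID or the file command.

```python
import struct

def directory_offset(data: bytes) -> int:
    """Return the file offset of the first directory sector of an OLE2 file."""
    # Sector Shift Exponent: 2-byte little-endian integer at offset 30 (1Eh).
    (sector_shift,) = struct.unpack_from("<H", data, 30)
    # SecID of the first directory sector: 4-byte little-endian at offset 48 (30h).
    (first_dir_secid,) = struct.unpack_from("<I", data, 48)
    # The header occupies the space before sector 0, hence the "+ 1".
    return (first_dir_secid + 1) << sector_shift

def has_root_entry(data: bytes) -> bool:
    """Check for the UTF-16LE string "Root Entry" at the directory offset."""
    offset = directory_offset(data)
    name = "Root Entry".encode("utf-16-le")
    return data[offset:offset + len(name)] == name

# Synthetic buffer (not a real ART file) carrying the values I observed.
sample = bytearray(2048)
struct.pack_into("<H", sample, 30, 9)   # sector shift 9, i.e. 512-byte sectors
struct.pack_into("<I", sample, 48, 2)   # directory starts in sector 2
sample[1536:1536 + 20] = "Root Entry".encode("utf-16-le")

print(hex(directory_offset(bytes(sample))))  # 0x600
print(has_root_entry(bytes(sample)))         # True
```

Running this over many real samples would show whether the values 9 and 2
are constant for this variant or only typical for my examples.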
With the new TrID definition all my new ART images are now described
(see appended output/trid-v-new.txt). The TrID definition and the outputs
are stored in the archive art_cdf.zip. I hope that my XML file can be used
in a future version of triddefs.
With best wishes
Jörg Jenderek