Author Topic: 2 variants for StarOffice Drawings; *.sxd *.std  (Read 1920 times)

jenderek

« on: March 23, 2020, 12:00:33 AM »
Hello TrID users,
some days ago I handled some non-Microsoft Office documents. When running
TrID on samples with the file name extensions SXD and STD, some examples are
only described as "Zip document container (generic)" by
zip-doc-cont.trid.xml. Other samples are identified correctly as "StarOffice
Drawing" by sxd-staroffice.trid.xml (see appended output/trid-v-old.txt).

For comparison I also ran other file identification tools. The
file(1) command identifies the SXD samples as "OpenOffice.org 1.x Draw
document" and the STD examples as "OpenOffice.org 1.x Draw template" (see
appended output/file-5.38.txt). So for the Drawing templates there exists no
TrID definition.

So I ran tridscan to generate std-staroffice.trid.xml for the template
variant. The format of such StarOffice/OpenOffice.org files is described,
for example, on the Archive Team file formats wiki. That is expressed by the
reference URL lines:
   <RefURL>
   http://fileformats.archiveteam.org/wiki/OpenOffice.org_XML
   </RefURL>

According to the reference, such Drawing templates get their own MIME
type. That 37-byte string is expressed by the line:
   <Mime>application/vnd.sun.xml.draw.template</Mime>

Then I started to refine the TrID definition file to get the same structure
as the other StarOffice TrID definitions. The MIME type string is also found
at the beginning of the ZIP container. That is expressed by an XML construct
like:
   <Bytes>6D696D65747970656170706C69636174696F6E2F766E64
   <ASCII> m i m e t y p e a p p l i c a t i o n / v n d
   <Pos>30</Pos>

That string is stored as clear text without any compression, so the value for
the packing method is zero. That is expressed by the XML construct:
   <Bytes>0000</Bytes>
   <Pos>8</Pos>
Because the string is stored uncompressed, the compressed and uncompressed
sizes of the first archive member have the same value, 37. That is 0x25 in
hexadecimal, stored as two little-endian 32-bit values. That is expressed by
the XML construct:
   <Bytes>2500000025000000</Bytes>
   <Pos>18</Pos>

The MIME type is always stored in a file with the ASCII name mimetype, so the
length of this file name is eight. That is expressed by the XML construct:
   <Bytes>0800</Bytes>
   <Pos>26</Pos>
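
The offsets used in the patterns above (8, 18, 26, 30) are fields of the first ZIP local file header. As a rough sketch, assuming Python and its standard zipfile module (neither is part of TrID), the fields can be decoded like this. The function names first_member_fields and make_sample are made up for this illustration:

```python
import io
import struct
import zipfile

# Fixed 30-byte ZIP local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (all little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

def first_member_fields(data: bytes) -> dict:
    """Decode the local file header that starts at offset 0 of a ZIP blob."""
    (sig, _ver, _flags, method, _time, _date, _crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    start = 30 + name_len + extra_len  # member data follows name and extra field
    return {
        "method": method,              # offset 8, 0 means stored
        "compressed": csize,           # offset 18
        "uncompressed": usize,         # offset 22
        "name_len": name_len,          # offset 26
        "extra_len": extra_len,        # offset 28
        "name": data[30:30 + name_len],
        "content": data[start:start + csize],
    }

def make_sample(mime: str) -> bytes:
    """Hypothetical helper: build a tiny in-memory archive for experiments."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # mimetype first and stored, as in a well-formed document
        zf.writestr("mimetype", mime, compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", "<x/>", compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()

if __name__ == "__main__":
    sample = make_sample("application/vnd.sun.xml.draw.template")
    for key, value in first_member_fields(sample).items():
        print(key, value)
```

This only looks at the first member, the same way the byte patterns do. It says nothing about members stored later in the archive.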
Patterns at higher offsets matched only by lucky circumstances in my samples,
so I removed them. I also got lines like
   <String>XMIMETYPEPK</String>

That is triggered by the file named mimetype inside the ZIP archive. If the
header entry contains no extra field, then after the stored file name the next
ZIP fragment starts with the magic string PK. That was true for my inspected
samples, but nowhere is it explicitly written that this is a strict
requirement. And I remember that for some Java JAR files I found examples
with extra fields. So I removed such appended PK string parts in all file
patterns. In the global string section the lines now become like:
   <String>MIMETYPE</String>

But without the appended PK string it is no longer so easy to distinguish the
Drawing from the template variant. This is now done by the additionally stored
length of the MIME type. That is 37 (=25h) for the template and 28 (=1Ch) for
the Drawing variant.
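
A minimal sketch of that length check in Python, applied to the first bytes of a file. The names classify and fake_header are invented for this illustration and are not part of TrID. Requiring an extra field length of zero is my assumption. It follows from the pattern at offset 30, where the MIME string comes directly after the file name:

```python
import struct

# Stored MIME type length -> (description, expected MIME string).
VARIANT_BY_LENGTH = {
    28: ("StarOffice Drawing", b"application/vnd.sun.xml.draw"),
    37: ("StarOffice Drawing template",
         b"application/vnd.sun.xml.draw.template"),
}

def classify(head: bytes):
    """Return the variant description, or None if the patterns do not match."""
    if len(head) < 38 or head[:4] != b"PK\x03\x04":
        return None
    (method,) = struct.unpack_from("<H", head, 8)
    csize, usize, name_len, extra_len = struct.unpack_from("<IIHH", head, 18)
    # Stored (method 0), equal sizes, 8-byte name, no extra field (assumption).
    if method != 0 or csize != usize or name_len != 8 or extra_len != 0:
        return None
    if head[30:38] != b"mimetype":
        return None
    entry = VARIANT_BY_LENGTH.get(csize)
    if entry is None:
        return None
    description, mime = entry
    return description if head[38:38 + csize] == mime else None

def fake_header(mime: bytes) -> bytes:
    """Hypothetical helper: minimal first local header plus stored content."""
    fixed = struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", 20, 0, 0, 0, 0, 0,
                        len(mime), len(mime), 8, 0)
    return fixed + b"mimetype" + mime

if __name__ == "__main__":
    print(classify(fake_header(b"application/vnd.sun.xml.draw")))
    print(classify(fake_header(b"application/vnd.sun.xml.draw.template")))
```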
With the additional std-staroffice.trid.xml the previously unrecognized STD
samples are now described as "StarOffice Drawing template" (see appended
output/trid-v-new.txt).

With sxd-staroffice.trid.xml from Ryan Jones many Drawings and templates are
detected, but some, like oo-draw-template-extra-non1st.std or sdraw.sxd, are
not detected. Looking inside this TrID definition, I see that it looks
for the specific MIME type, but not as the first member that starts at offset 30.

So I decided to generate such a variant, sxd-staroffice-mime.trid.xml, by
running tridscan and refining the result in the same way as described above.

Here the MIME type value is the 28-byte ASCII string
application/vnd.sun.xml.draw. The stored length of this string, 0x1C in
hexadecimal, is expressed by the XML construct:
   <Bytes>1C0000001C000000</Bytes>
   <Pos>18</Pos>
And the MIME type value is shown by the line:
   <Mime>application/vnd.sun.xml.draw</Mime>
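
The size pattern at position 18 can be derived from the MIME string itself. A small Python sketch, with size_pattern as a made-up name for this illustration:

```python
import struct

def size_pattern(mime: str) -> str:
    """Hex of compressed size + uncompressed size for a stored mimetype file."""
    n = len(mime.encode("ascii"))
    # Two little-endian 32-bit values with the same length
    return struct.pack("<II", n, n).hex().upper()

if __name__ == "__main__":
    for mime in ("application/vnd.sun.xml.draw",
                 "application/vnd.sun.xml.draw.template"):
        print(len(mime), size_pattern(mime))
    # 28 1C0000001C000000
    # 37 2500000025000000
```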

The file format is described on the OpenOffice.org XML page on Wikipedia. That
is now expressed by the line:
   <RefURL>https://en.wikipedia.org/wiki/OpenOffice.org_XML</RefURL>

With the 2 additional TrID definitions the previously unrecognized StarOffice
Drawing variants are now detected, and all definitions have the correct
reference URL and MIME type (see appended output/trid-v-new.txt).

The 2 TrID definitions, the output and some examples are stored in the archive
s_draw.zip. I hope that my 2 XML files can be used in a future version of
triddefs.

There may exist some StarOffice Drawing samples which are not recognized by
these 2 TrID definitions. After handling many StarOffice/OpenOffice
documents, some typical errors seem to occur:

1 No mimetype file.
2 The mimetype file is not the first archive member.
3 The mimetype entry has extra fields.
4 The content of the mimetype file is packed with a non-zero method like deflate.
5 The mimetype file contains a wrong MIME type string.
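
The five cases above can be checked on a sample with a short Python sketch based on the standard zipfile module. The function name diagnose and the message texts are made up for this illustration. Treating header offset 0 as "first member" is an assumption that ignores archives with prepended data:

```python
import io
import struct
import zipfile

EXPECTED_MIMES = {
    "application/vnd.sun.xml.draw",
    "application/vnd.sun.xml.draw.template",
}

def diagnose(data: bytes) -> list:
    """Return the numbered problems (1 to 5 above) found in a ZIP blob."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        try:
            info = zf.getinfo("mimetype")
        except KeyError:
            return ["1: no mimetype file"]
        problems = []
        if info.header_offset != 0:
            problems.append("2: mimetype is not the first archive member")
        # Extra field length of the local header sits 28 bytes after its start.
        (extra_len,) = struct.unpack_from("<H", data, info.header_offset + 28)
        if extra_len:
            problems.append("3: mimetype entry has extra fields")
        if info.compress_type != zipfile.ZIP_STORED:
            problems.append("4: mimetype content is not stored uncompressed")
        content = zf.read("mimetype").decode("ascii", errors="replace")
        if content not in EXPECTED_MIMES:
            problems.append("5: wrong MIME type string: " + content)
    return problems
```

An empty list means the sample has the layout that the byte patterns expect.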

So file type identification tools that look for characteristic byte sequences,
like trid, file and droid, fail. The reason is that the mimetype file is just
for user information. That means that office software or spreadsheet
programs work with such badly formed documents. And even StarOffice and
up-to-date OpenOffice programs behave badly and generate drawings without a
mimetype file or with a wrong one.

With best wishes
Jörg Jenderek

Mark0

Re: 2 variants for StarOffice Drawings; *.sxd *.std
« Reply #1 on: March 23, 2020, 02:53:41 AM »
Thanks Jörg!

I think I'll add the template definition, and refine the existing SXD one.