updated oxt.trid.xml for OpenOffice Extension *.OXT

jenderek

« on: March 06, 2022, 10:39:37 PM »
Hello TrID users,

Some days ago I had trouble with my spelling software, so I looked at its
components. When I ran TrID on such examples with the file name extension OXT,
most were correctly described as "OpenOffice Extension" by oxt.trid.xml. But
a few, like Ereignis-Prozess-Ketten.oxt and Foral-Studio.oxt, were only
described generically as "ZIP compressed archive" by ark-zip.trid.xml (see
the appended output/trid-v-old.txt).

For comparison, I also ran other file identification tools.

First, I checked these examples with the file command utility. When running
the file command (version 5.41) on such extensions, they are only described
generically, as a "Zip" archive or as data, and the MIME type is the generic
application/zip (see the appended file-5.41.txt and file-i-5.41.txt in the
output directory). But for one example, Gallery-Puzzle.2.1.0.1.oxt, it also
shows a phrase like (MIME type "application/vnd.openofficeorg.extension"?).
Obviously this is the MIME type for such extensions. It is also used by
shared-mime-info and listed on the NirSoft website extension.nirsoft.net. So
this is now expressed by a new line like:
   <Mime>application/vnd.openofficeorg.extension</Mime>
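Software that cannot identify the content usually falls back to a lookup by file name extension. As a minimal sketch (an illustration, not part of TrID), Python's standard mimetypes module can be told about the OXT MIME type. The constant OXT_MIME and the helper mime_for are hypothetical names. Note that this only inspects the name, never the bytes.

```python
import mimetypes
from typing import Optional

# MIME type reported by file 5.41, shared-mime-info and extension.nirsoft.net
OXT_MIME = "application/vnd.openofficeorg.extension"

# Register the mapping explicitly instead of relying on the system mime.types.
mimetypes.add_type(OXT_MIME, ".oxt")


def mime_for(filename: str) -> Optional[str]:
    """Guess a MIME type from the file name extension only (content is never read)."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


for name in ["Foral-Studio.oxt", "Ereignis-Prozess-Ketten.oxt"]:
    print(name, "->", mime_for(name))
```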

The tool DROID (see http://digital-preservation.github.io/droid/) also
describes the extensions only generically as "ZIP Format" by signature x-fmt/263
and complains about the OXT file name extension (see output/oxt-droid.csv).

The extension documents are just ZIP containers. This is expressed by the
XML construct:
   <Bytes>504B0304</Bytes>
   <ASCII> P K</ASCII>
   <Pos>0</Pos>
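The construct above simply says: the bytes 50 4B 03 04 ("PK" followed by 0x03 0x04, the ZIP local file header signature) appear at position 0. A minimal Python sketch of that single check, with the hypothetical helper name looks_like_zip, shows why it alone cannot separate an OXT from any other ZIP file:

```python
import io
import zipfile

ZIP_LOCAL_HEADER_SIG = b"PK\x03\x04"  # 50 4B 03 04, expected at position 0


def looks_like_zip(data: bytes) -> bool:
    """True if the data starts with a ZIP local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER_SIG


# Any ZIP written by Python passes this test, whatever its members are.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("description.xml", "<description/>")
print(looks_like_zip(buf.getvalue()))  # True
```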

So I looked at the output of the decompression tools 7-Zip with its list
command (see the appended output/7z-l.txt) and unzip with the zipinfo option
(see output/unzip-Z.txt).
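For readers without 7-Zip or unzip at hand, a similar member listing (name, compression method, uncompressed size, in archive order) can be sketched with Python's zipfile module. This is only an approximation of `7z l` / `unzip -Z` output; the helper list_members and the sample member names are illustrative assumptions.

```python
import io
import zipfile


def list_members(data: bytes) -> list:
    """Return (name, method, uncompressed size) per member, in central-directory order."""
    methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename, methods.get(info.compress_type, str(info.compress_type)), info.file_size)
            for info in zf.infolist()
        ]


# Demo on a small in-memory archive (member names are illustrative only)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("META-INF/manifest.xml", "<manifest/>")
    zf.writestr("description.xml", "<description/>")
sample = buf.getvalue()

for name, method, size in list_members(sample):
    print(f"{method:>8} {size:>6} {name}")
```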

Most extensions contain a file named description.xml. This was expressed
in the TrID definition, inside the global strings section, by a line like:
   <String>DESCRIPTION.XML</String>
But in the undescribed examples this file is missing. So after running tridscan
to update the definition, this line vanishes.
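The post does not describe how TrID evaluates global strings internally, so the following is only a hedged sketch of the idea, assuming a string check means "these bytes occur somewhere in the file". Because ZIP stores member names uncompressed in the local headers and the central directory, a plain byte search finds description.xml even when its content is deflated. The case-insensitive comparison, the helper names has_global_string and make_zip, and the sample member names are assumptions made for illustration.

```python
import io
import zipfile


def has_global_string(data: bytes, needle: str = "DESCRIPTION.XML") -> bool:
    """Case-insensitive search for needle anywhere in the raw file bytes."""
    # Member names are stored uncompressed in the local headers and central
    # directory, so the name is found even if the member data is deflated.
    return needle.upper().encode("ascii") in data.upper()


def make_zip(*names: str) -> bytes:
    """Build an in-memory ZIP whose members all contain the single byte 'x'."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.writestr(name, "x")
    return buf.getvalue()


with_desc = make_zip("META-INF/manifest.xml", "description.xml")
without_desc = make_zip("META-INF/manifest.xml", "Addons.xcu")
print(has_global_string(with_desc), has_global_string(without_desc))
```

An archive like without_desc passes the ZIP signature check but fails the string check, which is exactly why dropping the string leaves only the generic ZIP pattern.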

Why don't extension packagers and the programmers of packaging tools put a
file named mimetype (an 8-byte name), containing the 39-byte MIME type string
application/vnd.openofficeorg.extension, as the first archive member? If this
file is stored uncompressed and without extra fields (for additional timestamps
and ownership numbers), then life for software like TrID, DROID, the file
command and dependent software (like the Apache web server or the Worker file
manager) would be easier. Then only a check for that MIME type at a specific
position at the beginning would be needed to get a unique result. As it is,
several checks must be done, and these are probably neither unique nor
reliable. So software often must fall back on the really ugly Windows concept
that the file type is given by the file name extension.
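To make the proposal concrete: ODF documents already use this layout (an uncompressed mimetype member first, without extra field). With a 30-byte local file header and an 8-byte name, the MIME string would always start at offset 38. The sketch below parses that first header with struct, following the field layout of the ZIP specification (APPNOTE). The function has_leading_mimetype and the demo builder are hypothetical illustrations, not an existing tool, and current OXT packages are not guaranteed to follow this layout.

```python
import io
import struct
import zipfile

OXT_MIME = b"application/vnd.openofficeorg.extension"  # 39 bytes


def has_leading_mimetype(data: bytes, expected: bytes = OXT_MIME) -> bool:
    """Check for a stored 'mimetype' first member, without extra field, holding expected."""
    if len(data) < 30 or data[:4] != b"PK\x03\x04":
        return False
    # Local file header after the signature (little-endian): version, flags,
    # method, mod time, mod date, crc32, compressed size, uncompressed size,
    # name length, extra field length -> 26 bytes.
    (_ver, _flags, method, _time, _date, _crc, csize, usize,
     name_len, extra_len) = struct.unpack("<HHHHHIIIHH", data[4:30])
    if data[30:30 + name_len] != b"mimetype" or method != 0 or extra_len != 0:
        return False
    start = 30 + name_len  # 38 for the 8-byte name "mimetype"
    return usize == csize == len(expected) and data[start:start + len(expected)] == expected


def demo_archive(with_mimetype: bool) -> bytes:
    """Hypothetical OXT-like archive, optionally with the proposed first member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        if with_mimetype:
            zf.writestr("mimetype", OXT_MIME, compress_type=zipfile.ZIP_STORED)
        zf.writestr("description.xml", "<description/>")
    return buf.getvalue()


print(has_leading_mimetype(demo_archive(True)))   # True
print(has_leading_mimetype(demo_archive(False)))  # False
```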

I also found such extensions inside LibreOffice directories, so the extensions
can also be used in this fork. This is now expressed by an updated line like:
   <FileType>LibreOffice/OpenOffice Extension</FileType>

With the updated TrID definition, all examples are now detected (see the
appended output/trid-new-v.txt). The TrID definitions, some examples and the
output are stored in the archive oxt.zip. I hope that my XML file can be used
in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: updated oxt.trid.xml for OpenOffice Extension *.OXT
« Reply #1 on: July 14, 2022, 01:38:44 AM »
But removing that string makes the definition really too generic. Uhm... I will leave it as it is for the moment.