Author Topic: zip-ja.trid.xml for Mozilla archive omni.ja

jenderek

zip-ja.trid.xml for Mozilla archive omni.ja
« on: November 14, 2017, 06:38:14 PM »
Hello,

when I run TrID in the Mozilla Firefox and Thunderbird program directories on
different operating systems, I find dozens of files named omni.ja which are
not recognized (see appended output/trid-old.txt).

Some information about such archives can be found on the Wikipedia page about
the newer Firefox browser, so I use it as the reference URL with the line:
   <RefURL>https://en.wikipedia.org/wiki/Firefox_4</RefURL>

After searching the net I found a discussion thread explaining some facts:
https://sourceforge.net/p/sevenzip/discussion/45798/thread/411a70b3/?limit=25

This kind of archive has the central directory (starting with magic
PK\001\002) at the beginning (offset 4), whereas real ZIP files have the
central directory at the end and start with a local file header (magic
PK\003\004). I mention this fact in the remark line. Because this is not
strictly the ZIP file format, I use another MIME type instead of
application/zip, with the line:
   <Mime>application/x-zip</Mime>
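
The difference between the two layouts can be checked from the first 8 bytes of a file. The following is only a minimal Python sketch of that check, not part of the TrID definition; the function name and the returned labels are made up for illustration.

```python
CENTRAL_DIR_MAGIC = b"PK\x01\x02"   # central directory file header
LOCAL_HEADER_MAGIC = b"PK\x03\x04"  # local file header, start of a regular ZIP


def classify_zip_layout(head: bytes) -> str:
    """Classify a file by its first 8 bytes (hypothetical helper)."""
    if head[:4] == LOCAL_HEADER_MAGIC:
        return "zip"        # regular ZIP: local file header at offset 0
    if head[4:8] == CENTRAL_DIR_MAGIC:
        return "omni.ja"    # central directory header at offset 4
    return "unknown"
```

It could be called as `classify_zip_layout(open("omni.ja", "rb").read(8))`.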

Because it is a kind of ZIP archive, I named the TrID definition file
zip-ja.trid.xml. Most ZIP unpackers are able to extract such archives, though
maybe only partly or after some warnings. An example is a command line like
zip -FF omni.ja --out omni.zip
So I mention this in the remark line as well.

To understand the patterns in the definition file, look at the ZIP
specification in APPNOTE.TXT, found at
https://pkware.cachefly.net/webdocs/casestudies/ .

The first and main pattern is the magic of the central directory file header,
expressed by the XML construct:
   <Bytes>504B0102</Bytes>
   <ASCII> P K</ASCII>
   <Pos>4</Pos>

The remaining patterns are determined by the meta information of the first
file etc. (see also output\7z-l.txt). The "version made by" field was always
found to be 2.0, i.e. 2*10+0 = 20 = 14h. This is expressed by:
   <Bytes>1400</Bytes>
   <Pos>8</Pos>
The unused upper byte of "version needed to extract" is expressed by:
   <Bytes>00</Bytes>
   <Pos>11</Pos>

The unused upper byte of the general purpose bit flag is expressed by:
   <Bytes>00</Bytes>
   <Pos>13</Pos>
The upper byte of the compression method is unused, and the time and date
found for the 1st file are 1 Jan 2010 00:00. Together this is expressed by:
   <Bytes>000000213C</Bytes>
   <Pos>15</Pos>
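
The last four bytes of that pattern (0000 213C) are the little-endian MS-DOS time and date fields. As a small Python sketch of how they decode (the function name is made up for illustration):

```python
import struct


def decode_dos_datetime(raw: bytes):
    """Decode 4 bytes: little-endian DOS time followed by DOS date."""
    dos_time, dos_date = struct.unpack("<HH", raw)
    year = 1980 + (dos_date >> 9)       # bits 9-15: years since 1980
    month = (dos_date >> 5) & 0x0F      # bits 5-8
    day = dos_date & 0x1F               # bits 0-4
    hour = dos_time >> 11               # bits 11-15
    minute = (dos_time >> 5) & 0x3F     # bits 5-10
    second = (dos_time & 0x1F) * 2      # bits 0-4, in 2-second steps
    return (year, month, day, hour, minute, second)


print(decode_dos_datetime(bytes.fromhex("0000213C")))  # (2010, 1, 1, 0, 0, 0)
```
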
The upper two bytes of the compressed size are null if the first file is
small (below 64 KiB). This is expressed by:
   <Bytes>0000</Bytes>
   <Pos>26</Pos>
The upper two bytes of the uncompressed size are null if the first file is
small (below 64 KiB). This is expressed by:
   <Bytes>0000</Bytes>
   <Pos>30</Pos>
The upper byte of the file name length is null if the first file has a short
name like "chrome.manifest". The fields that follow (extra field length, file
comment length, disk number start, file attributes) are null as well. This is
expressed by:
   <Bytes>00000000000000000000000000</Bytes>
   <Pos>33</Pos>
The upper byte of the relative offset of the local header is null. This is
expressed by:
   <Bytes>00</Bytes>
   <Pos>49</Pos>
Further null patterns belong to the 1st file's content or to the next file's
entry.
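
The positions used in the patterns above follow from the central directory file header layout in APPNOTE.TXT, shifted by the 4 bytes that precede the header in omni.ja. A rough Python sketch that unpacks the first entry could look like this; the function and field names are made up for illustration, and decoding the name as cp437 is an assumption.

```python
import struct

# Fixed part of a central directory file header (46 bytes, little-endian).
CDH = struct.Struct("<4sHHHHHHIIIHHHHHII")
FIELDS = (
    "signature", "version_made_by", "version_needed", "flags",
    "method", "dos_time", "dos_date", "crc32",
    "compressed_size", "uncompressed_size",
    "name_length", "extra_length", "comment_length",
    "disk_start", "internal_attrs", "external_attrs", "local_offset",
)


def parse_first_entry(data: bytes, base: int = 4) -> dict:
    """Unpack the central directory header that starts at offset `base`."""
    entry = dict(zip(FIELDS, CDH.unpack_from(data, base)))
    start = base + CDH.size  # the file name follows the fixed part
    raw_name = data[start:start + entry["name_length"]]
    entry["name"] = raw_name.decode("cp437")  # assumed encoding
    return entry
```

With `base = 4`, "version made by" lands at file offset 8, the DOS date at 18 and the local header offset at 46 to 49, which matches the `<Pos>` values above.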

If the central directory contains a file name like appstrings.properties and
neither an extra field nor a file comment is used, then the next central
directory header (magic PK\001\002) follows immediately. This is expressed by
a global string like
   <String>APPSTRINGS.PROPERTIESPK</String>
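
As a rough Python sketch of what such a global string checks: the helper below looks for a file name that is directly followed by the "PK" of the next header. The function name is made up, and the case-insensitive comparison is an assumption suggested by the upper-case form in the definition.

```python
def name_followed_by_header(data: bytes, name: bytes) -> bool:
    """True if `name` is directly followed by "PK" somewhere in `data`."""
    # Upper-casing both sides makes the comparison case-insensitive (assumed).
    needle = (name + b"PK").upper()
    return needle in data.upper()
```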

I keep all found patterns, even if some occur only by lucky circumstance,
because the file format of omni.ja is not officially documented. It is very
annoying that a widely used web browser like Firefox, which claims to be open
source, uses an undocumented file format to store data.

With the new TrID definition file all inspected omni.ja archives are now
recognized (see appended output/trid-new.txt). The TrID definition and the
outputs are stored in the archive ja.zip. I hope that the XML file can be
used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: zip-ja.trid.xml for Mozilla archive omni.ja
« Reply #1 on: November 14, 2017, 08:57:48 PM »
Good! I will try to collect some more .ja files to further refine the definition, if possible.
Thanks Joerg!