Topic: ja-mozilla.trid.xml for Mozilla archive (optimized ZIP) like OMNI.JA

jenderek

Hello TrID users,

some days ago I ran TrID inside the Firefox and Thunderbird directories.
These contain files named "omni.ja".
When I run TrID on such examples, most are described as "Unknown!" (see
appended output/trid-v-old.txt).

For comparison I also ran the file utility (version 5.40). With the
keep-going option it describes these examples as "Zip archive data" and
"Mozilla archive omni.ja" (see appended output/file-k-5.40.txt). Following
the documentation, I also ran a patched file command that displays more
information (see appended output/file.tmp).

According to the documentation, the ZIP central directory is not placed near
the end of the file but near the beginning (at offset 4) for performance
reasons, and at position 0 an offset to the last entry read on startup is
stored. As a consequence such "optimized" archives are no longer valid ZIP
archives, but many newer versions of zip tools (7z and the original zip
tool) can handle them, with warning messages about an "invalid" file
structure. So you can regenerate a "valid" ZIP archive with a command like:
   zip -FF omni.ja --out omni.zip
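
To make the layout difference concrete, here is a minimal Python sketch that
guesses the layout from the first 8 bytes. It assumes only what is described
above: an optimized archive has the central directory signature at offset 4,
while a standard ZIP normally starts with a local file header. The function
names are my own and are not taken from optimizejars.py or any other tool.

```python
import struct

CDH_SIGNATURE = b"PK\x01\x02"  # central directory file header magic
LFH_SIGNATURE = b"PK\x03\x04"  # local file header magic of a standard ZIP


def classify_jar(data: bytes) -> str:
    """Guess the archive layout from its leading bytes (heuristic only)."""
    if len(data) >= 8 and data[4:8] == CDH_SIGNATURE:
        return "optimized"
    if data[:4] == LFH_SIGNATURE:
        return "standard"
    return "unknown"


def startup_offset(data: bytes) -> int:
    """Return the 4-byte little-endian value stored at position 0."""
    return struct.unpack_from("<I", data, 0)[0]
```

Reading the first 8 bytes of a file with open(path, "rb") and passing them to
classify_jar() is enough for this check.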

A Python script, optimizejars.py, was mentioned that should do the
conversion job. For me the deoptimize step works, but that was already
achieved by the zip tool with its fix-broken option (-FF). The other
direction, "optimize", did not work for me.

After gathering enough examples I ran tridscan to generate
ja-mozilla.trid.xml. The first XML construct looked like:
   <Bytes>00504B01021400</Bytes>
   <ASCII> . P K</ASCII>
   <Pos>3</Pos>

At position 0 the offset to the last entry read on startup is stored as a
4-byte little-endian integer. In theory the maximal value is 0xFFFFFFFF,
but in practice only lower values occur, so the upper byte of that value
was nil in my examples.
At offset 4 the central directory starts with its signature (hexadecimal
0x02014b50, or the string PK\001\002).
At offset 8 the "version made by" field is stored as a 2-byte little-endian
integer.
The lower byte indicates the ZIP version of this file: value/10 is the
major version number and value mod 10 is the minor version number. So the
value 0x14 = 20 means version 2.0.
The upper byte indicates the compatibility of the file attribute
information (host mode). If the file is compatible with MS-DOS (v 2.04g),
this value is zero.
Since higher startup offsets are possible, the byte at position 3 should
not be matched, and the first construct becomes:
   <Bytes>504B01021400</Bytes>
   <ASCII> P K</ASCII>
   <Pos>4</Pos>
After adding an example from Linux Firefox 20.0, whose "version made by"
is v3.0 UNIX, this becomes:
   <Bytes>504B0102</Bytes>
   <ASCII> P K</ASCII>
   <Pos>4</Pos>
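
As an illustration of the "version made by" field described above, the
following Python sketch splits the 2 bytes into version and host. Only the
two host codes seen in my examples are named (0 for MS-DOS, 3 for UNIX); the
function name and the fallback label are my own.

```python
import struct

# Host codes from the upper byte; only the two seen in the examples are named.
HOSTS = {0: "MS-DOS", 3: "UNIX"}


def decode_version_made_by(raw: bytes):
    """Split the 2-byte little-endian field into (major, minor, host)."""
    (value,) = struct.unpack("<H", raw)
    spec = value & 0xFF   # lower byte: ZIP version times 10
    host = value >> 8     # upper byte: host system
    return spec // 10, spec % 10, HOSTS.get(host, "host %d" % host)


print(decode_version_made_by(bytes.fromhex("1400")))  # (2, 0, 'MS-DOS')
print(decode_version_made_by(bytes.fromhex("1e03")))  # (3, 0, 'UNIX')
```

The second call shows why the bytes 14 00 had to be dropped from the
pattern: a v3.0 UNIX archive stores 1E 03 at offsets 8 and 9.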

At offset 10 the "version needed to extract" (minimum) is stored as a
2-byte little-endian integer. Here I found versions 1.0 and 2.0, and the
upper byte was always nil ("DOS"). That is expressed by the second XML
construct:
   <Bytes>00</Bytes>
   <Pos>11</Pos>

At offset 12 the general purpose bit flag is stored as a 2-byte
little-endian integer. Here I found that the upper byte was always nil.
That means bits 8 to 15 are zero, so no language encoding, enhanced
compression or alternate streams. That is expressed by the third XML
construct:
   <Bytes>00</Bytes>
   <Pos>13</Pos>

At offset 14 the compression method is stored as a 2-byte little-endian
integer. I found values like 0 for stored or 8 for deflated. The highest
possible value is 99 for the AE-x encryption marker, so the upper byte is
always nil.
At offset 16 the file's last modification time is stored as a 2-byte
little-endian integer in DOS format. In my examples this value was nil,
which means time 00:00.
At offset 18 the file's last modification date is stored as a 2-byte
little-endian integer in DOS format. In all my examples this value was
0x3C21 (stored as the bytes 21 3C). That is the date 2010-01-01, as
reported correctly by 7z and unzip but wrongly by the file command. These
three observations are expressed by an XML construct like:
   <Bytes>000000213C</Bytes>
   <ASCII> . . . !</ASCII>
   <Pos>15</Pos>
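
The date value can be checked by hand with a few lines of Python. This is
a small sketch of the usual DOS date/time bit layout (year since 1980 in the
top 7 bits of the date, seconds stored in 2-second steps); the function name
is my own.

```python
import datetime
import struct


def decode_dos_datetime(raw: bytes) -> datetime.datetime:
    """Decode 4 bytes: DOS time (2 bytes) followed by DOS date (2 bytes)."""
    dos_time, dos_date = struct.unpack("<HH", raw)
    return datetime.datetime(
        year=1980 + (dos_date >> 9),
        month=(dos_date >> 5) & 0x0F,
        day=dos_date & 0x1F,
        hour=dos_time >> 11,
        minute=(dos_time >> 5) & 0x3F,
        second=(dos_time & 0x1F) * 2,
    )


# Bytes at offsets 16..19 in my examples: time 00 00, date 21 3C.
print(decode_dos_datetime(bytes.fromhex("0000213c")))  # 2010-01-01 00:00:00
```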

At offset 24 the compressed size is stored as a 4-byte little-endian
integer. I expect that all values are possible, so the following XML
construct only arose by coincidence in my samples and should be removed:
   <Bytes>00</Bytes>
   <Pos>27</Pos>

At offset 28 the uncompressed size is stored as a 4-byte little-endian
integer. I expect that all values are possible, so the following XML
construct only arose by coincidence in my samples and should be removed:
   <Bytes>00</Bytes>
   <Pos>31</Pos>

At offset 32 the file name length is stored as a 2-byte little-endian
integer n. The file name itself, stored some bytes later at offset 50,
looks like:
    chrome.manifest
    greprefs.js
    defaults/preferences/webide-prefs.js

So the length is lower than 256, which means the upper byte is nil.
At offset 34 the extra field length is stored as a 2-byte little-endian
integer m. In my examples this was nil.
At offset 36 the file comment length is stored as a 2-byte little-endian
integer k. In my examples this was nil.
At offset 38 the disk number is stored as a 2-byte little-endian integer.
In my examples this was nil.
These four observations are expressed by an XML construct like:
   <Bytes>00000000000000</Bytes>
   <Pos>33</Pos>

At offset 40 the internal file attributes are stored as a 2-byte
little-endian integer. The lowest bit of this field indicates whether the
file is text. The remaining bits are unused (that is, nil), so the upper
byte is always zero.
At offset 42 the external file attributes are stored as a 4-byte
little-endian integer. In my examples the 2 lower bytes of that value are
nil. These two observations are expressed by an XML construct like:
   <Bytes>000000</Bytes>
   <Pos>41</Pos>

At offset 46 the relative offset of the local file header is stored as a
4-byte little-endian integer. In theory the maximal value is 0xFFFFFFFF,
but in practice only lower values occurred, so the upper byte of that value
was nil. That is expressed by an XML construct like:
   <Bytes>00</Bytes>
   <Pos>49</Pos>
Since higher offsets are also possible, I removed the above construct.
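
The field walk-through above can be summarized in a short Python sketch that
unpacks the first central directory entry at offset 4. The struct format
follows the offsets listed in this post (46 fixed bytes, file name at offset
50); the field and function names are my own choice and the code is only a
sketch, not a complete ZIP parser.

```python
import struct
from collections import namedtuple

# Fixed part of a central directory file header: 46 bytes, little endian.
CDH_FORMAT = "<4sHHHHHHIIIHHHHHII"
CentralHeader = namedtuple("CentralHeader", [
    "signature", "version_made_by", "version_needed", "flags", "method",
    "mod_time", "mod_date", "crc32", "compressed_size", "uncompressed_size",
    "name_length", "extra_length", "comment_length", "disk_number",
    "internal_attrs", "external_attrs", "local_header_offset",
])


def parse_first_entry(data: bytes, start: int = 4):
    """Return (header, file name) for the entry beginning at `start`."""
    header = CentralHeader._make(struct.unpack_from(CDH_FORMAT, data, start))
    if header.signature != b"PK\x01\x02":
        raise ValueError("no central directory header at offset %d" % start)
    name_start = start + struct.calcsize(CDH_FORMAT)  # 4 + 46 = 50
    name = data[name_start:name_start + header.name_length]
    return header, name.decode("utf-8", errors="replace")
```

Calling parse_first_entry() on the first few hundred bytes of an optimized
omni.ja should give a name such as chrome.manifest, if the layout is as
described here.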

Because I inspected only a few dozen examples, short nil sequences probably
show up by chance at higher offsets. That was expressed by an XML construct
like:
   <Bytes>00</Bytes>
   <Pos>91</Pos>
Assuming this was only a coincidence, I removed the above construct.
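
For a quick check outside of TrID, the constructs that remain after the
clean-up described above can be tested with a few lines of Python. This is
only a rough sketch of fixed-position byte matching; it is not TrID's actual
matching or scoring logic, and the names PATTERNS and matches_all are my own.

```python
# (position, bytes) pairs kept after the manual clean-up described above.
PATTERNS = [
    (4, bytes.fromhex("504B0102")),
    (11, bytes.fromhex("00")),
    (13, bytes.fromhex("00")),
    (15, bytes.fromhex("000000213C")),
    (33, bytes.fromhex("00000000000000")),
    (41, bytes.fromhex("000000")),
]


def matches_all(data: bytes, patterns=PATTERNS) -> bool:
    """True if every pattern is found at its fixed position in data."""
    return all(
        data[pos:pos + len(expected)] == expected
        for pos, expected in patterns
    )
```

Requiring every pattern to match is stricter than necessary; the date
pattern at position 15 in particular may not hold for omni.ja files built
differently from my examples.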

Instead of the MIME type application/zip used for valid ZIP archives, I
chose a user-defined one. That is expressed by a line like:
   <Mime>application/x-zip</Mime>

Because only a few examples were used, the global strings section contains
some obviously garbage strings like:
   <String>$'''C</String>
   <String>%'''C</String>
So I deleted such lines.

Because the inspected entries have no extra field and no file comment, the
next segment starts with the PK magic directly after the file name. That
observation shows up in the global strings section as lines like:
   <String>APPSTRINGS.PROPERTIESPK</String>
   <String>DIALOG.JSPK</String>
   <String>.DTDPK</String>
But I do not know which parts are really characteristic of Mozilla omni
archives.

With my TrID definition all of my inspected optimized OMNI.JA examples are
now recognized (see appended output/trid-v.txt). The TrID definition and
output are stored in the archive omni.zip. I hope that the XML file can be
used in a future version of triddefs.

But things get more complicated. The optimized ZIP layout is often used in
newer Mozilla products, but some "new" Thunderbird versions still use the
standard ZIP variant. And some Mozilla add-ons (with XPI extension) also
seem to use the optimized variant.

With best wishes
Jörg Jenderek

Mark0

Re: ja-mozilla.trid.xml for Mozilla archive (optimized ZIP) like OMNI.JA
« Reply #1 on: September 26, 2021, 02:26:13 PM »
Thanks!
I found some omni.ja files with a slightly different structure, maybe older, so some more research will probably be necessary.