Author Topic: little.trid.xml little-zip.trid.xml for Mozilla startupCache.4.little or startup  (Read 704 times)

jenderek

Hello trid users,

Some weeks ago I updated to Windows 10, so I looked at the Mozilla Firefox web
browser and Thunderbird email client directories.

Some files have names like startupCache.4.little or startupCache.8.little.

So I ran the trid utility on my LITTLE examples. Many samples (15 of 18) are
described as "ZIP compressed archive" with MIME type application/zip by
ark-zip.trid.xml, although the file name suffix is LITTLE instead of ZIP (see
appended little_/little_zip/output/trid-v-old.txt). Some samples are described
as "Unknown!" (see appended output/trid-v-old.txt).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It also recognizes the
ZIP-based samples. They are described as "ZIP Format" with MIME type
application/zip by PUID x-fmt/263, but the file name suffix is considered
"bad".

For comparison I also ran the file command (version 5.45) on these
samples. Here they are likewise described generically as "Zip archive data"
(see appended little_zip/output/file-5.45.txt). The MIME type is again
application/zip (see appended file-i-5.45.txt in little_zip/output). The file
name suffix is not recognized either (see appended file-ext-5.45.txt in
little_zip/output). The non-ZIP samples are not recognized at all and are
described as data (see appended output/file-5.45.txt).
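All three tools decide on the leading bytes and not on the file name suffix. A minimal Python sketch of such a check follows. It uses the generic ZIP signature and the startupcache0002 signature from the definitions quoted further down in this post. The function names and the returned labels are my own invention for illustration, and I do not know whether Mozilla writes other variants as well.

```python
# Leading byte sequences taken from the TrID patterns quoted in this post.
ZIP_MAGIC = bytes.fromhex("504B0304")
STARTUPCACHE_MAGIC = bytes.fromhex("7374617274757063616368653030303200")


def classify_little(head):
    """Classify the first bytes of a *.little sample (labels are hypothetical)."""
    if head.startswith(STARTUPCACHE_MAGIC):
        return "startupcache0002"
    if head.startswith(ZIP_MAGIC):
        return "zip-based"
    return "unknown"


def classify_file(path):
    """Read just enough bytes from `path` to compare against both signatures."""
    with open(path, "rb") as handle:
        return classify_little(handle.read(len(STARTUPCACHE_MAGIC)))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(f"{name}: {classify_file(name)}")
```

Called with sample names on the command line, it prints one label per file, independent of the suffix.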

I and other people often complain about Microsoft's behaviour, but open
software is not the holy grail in every field either, and Mozilla Firefox and
Thunderbird are considered flagships in that field. Samples with the name
suffix little are found in some Mozilla Firefox and Thunderbird user
directories. I found such samples on Windows 8/10, Raspbian and Mint operating
systems. Now comes the evil part, which I call orcifying software, and it is
not the first time Mozilla has taken such steps. It is like elves turning into
orcs, as told in Tolkien's tales. They probably took standard software like
the ZIP compression algorithm and modified it. This step is OK, but they do
not mention what they did and there is no file format specification. The file
type is also not officially registered. Some people say "may the source be
with you", but when unpacking the Firefox or Thunderbird packages I get
hundreds of MB of source lines. Unfortunately hardly anybody has enough
expertise and time to find the needed explanations there. And the worst part
is that they use the little suffix, so nobody assumes that such a file can be
unpacked with standard tools like unzip.
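To show that the suffix is the only obstacle for the ZIP-based samples, here is a minimal Python sketch that ignores the name and just asks the standard zipfile module whether it can read the container. The function name list_little_members is made up for this example. I have only checked my samples with 7z, so it is an assumption that Python's zipfile reads every real sample too.

```python
import zipfile
from pathlib import Path


def list_little_members(path):
    """Return the member names if `path` is readable as ZIP, else None."""
    path = Path(path)
    # is_zipfile() looks at the file content (the ZIP end-of-central-directory
    # record), so the file name suffix plays no role here.
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        members = list_little_members(name)
        if members is None:
            print(f"{name}: not a ZIP container")
        else:
            print(f"{name}: {len(members)} members")
            for member in members:
                print("   ", member)
```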

If all goes well (the file name suffix is OK and the samples are found in
well-known subdirectories) then it does not hurt, but in the real world you
must also consider all other points of view. Some people behave like Putin
claiming "the world belongs to me" or, in the case of software developers,
"the whole disc belongs to me". But what if the hard disc crashes, or a virus
scanner blocks the start of programs because a suspicious file is found
inside little samples? Then you often get hundreds of "unknown" samples lying
somewhere on your disc, and undoing the chaos is nearly impossible. In the end
you must reinstall the software package or, in the worst case, the whole
system. That is annoying!

Because I found no real documentation for that file format, I use the generic
page about Mozilla inside the definition. This is expressed by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Mozilla</RefURL>
For the ZIP-based samples I used the page about the ZIP file format on
Wikipedia. That is expressed by a line like:
      <RefURL>http://en.wikipedia.org/wiki/Zip_(file_format)</RefURL>

For these samples I use the MIME type for ZIP archives. That is expressed by a
line like:
      <Mime>application/zip</Mime>
For the others the MIME type is expressed inside little.trid.xml by a line like:
      <Mime>application/octet-stream</Mime>

So I created a TrID definition little-zip.trid.xml by running tridscan on the
ZIP-based samples. The first sequence is typical for a ZIP archive. This is
expressed by an XML construct like:
   <Bytes>504B0304140000080800</Bytes>
   <ASCII> P K</ASCII>
   <Pos>0</Pos>
Here I keep the whole byte sequence because I do not know what is required and
what is optional, since I found no file specification. So this looks a little
different from a plain ZIP archive, where the XML construct inside
ark-zip.trid.xml looks like:
   <Bytes>504B0304</Bytes>
   <ASCII> P K</ASCII>
   <Pos>0</Pos>

So I get some more patterns inside the Front Block section. These look like:

   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>24</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000900</Bytes>
      <Pos>27</Pos>
   </Pattern>

Because I do not know whether these are optional or required, I keep all these
constructs for the moment. I hope that other users can refine the definition
by running tridscan with more samples or by knowing the specifications.
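One way to judge these patterns is to read them against the standard ZIP local file header layout from the public ZIP application note. The sketch below does that in Python. It assumes that the little files keep the standard layout, which I cannot confirm, and the helper names are made up. Under that assumption the 10-byte prefix decodes to version 2.0, general purpose flag bit 11 (UTF-8 names) and compression method 8 (deflate). The patterns at positions 20 and 24 fall into the upper halves of the two size fields, and the one at position 27 covers the upper byte of the file name length plus the extra field length. So they may only say that the first member is small in my samples.

```python
import struct

# Fixed 30-byte part of a standard ZIP local file header (little-endian):
# (offset, size in bytes, field name).
LOCAL_HEADER_FIELDS = [
    (0, 4, "signature"),
    (4, 2, "version needed"),
    (6, 2, "general purpose flags"),
    (8, 2, "compression method"),
    (10, 2, "modification time"),
    (12, 2, "modification date"),
    (14, 4, "CRC-32"),
    (18, 4, "compressed size"),
    (22, 4, "uncompressed size"),
    (26, 2, "file name length"),
    (28, 2, "extra field length"),
]


def fields_covered(pos, length):
    """Name the header fields touched by a pattern of `length` bytes at `pos`."""
    end = pos + length
    return [name for off, size, name in LOCAL_HEADER_FIELDS
            if off < end and pos < off + size]


def decode_prefix(prefix):
    """Decode signature, version, flags and method from the first 10 bytes."""
    signature, version, flags, method = struct.unpack("<4sHHH", prefix[:10])
    return {"signature": signature, "version": version,
            "flags": flags, "method": method}


if __name__ == "__main__":
    # Prefix and patterns as quoted from little-zip.trid.xml in this post.
    print(decode_prefix(bytes.fromhex("504B0304140000080800")))
    for pos, hexbytes in [(20, "0000"), (24, "0000"), (27, "000900")]:
        print(pos, hexbytes, fields_covered(pos, len(hexbytes) // 2))
```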

When using more samples (15), I get only a few lines inside the global strings
section, like:
   <String>RESOURCE</String>
   <String>XBLCACHE</String>
   <String>CONTENT</String>
   <String>TOOLKIT</String>
   <String>.XMLUT</String>
   <String>CHROME</String>
   <String>GLOBAL</String>
The sub-classification should be done by these lines. But when I try
to add more samples (named by me startupCache.4-a.little,
startupCache.4-g.little and startupCache.4-j.little, and dated about 2012
or 2014) then the whole global strings section vanishes (see
little-zip.trid.xml.tmp). Then of course the sub-classification
characteristic vanishes too. This behaviour is typical when a
TrID definition describes the average of different variants. So I hope
that other users know such facts and can improve my definitions. As a
check I looked inside the archives with the unzipping tool 7z (see
appended 7z-l-slt.txt and 7z-l.txt in little_zip/output).
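The vanishing can be pictured as a set intersection. The sketch below is not how tridscan is implemented, which I do not know. It only assumes that a string survives in the global strings section when it occurs in every sample. The sample contents in the test values are hypothetical.

```python
def common_strings(samples):
    """Return the strings that occur in every sample (upper-cased, sorted).

    `samples` is a list with one collection of strings per sample.
    """
    per_sample = [{text.upper() for text in strings} for strings in samples]
    if not per_sample:
        return []
    return sorted(set.intersection(*per_sample))


if __name__ == "__main__":
    # Hypothetical string lists: two newer samples and one older variant.
    newer = [
        ["resource", "chrome", "global", "content"],
        ["RESOURCE", "CHROME", "GLOBAL", "toolkit"],
    ]
    older = [["jar", "omni"]]
    print(common_strings(newer))          # strings shared by the newer samples
    print(common_strings(newer + older))  # empty once a different variant joins
```

Under this assumption, one sample of a different variant is enough to empty the whole section, so separate definitions per variant may work better than one averaged definition.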

So I created a TrID definition little.trid.xml by running tridscan on the
non-ZIP samples. Here the first bytes seem to be the most characteristic. This
is expressed by an XML construct like:
   <Bytes>7374617274757063616368653030303200</Bytes>
   <ASCII> s t a r t u p c a c h e 0 0 0 2</ASCII>
   <Pos>0</Pos>

Here I keep the other byte sequences because I do not know what is required and
what is optional, since I found no file specification. So inside the Front
Block section I also get constructs that may be specific, like:
   <Bytes>006A73</Bytes>
   <ASCII> . j s</ASCII>
   <Pos>34</Pos>
But I also get XML constructs like:
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>27</Pos>
   </Pattern>
   <Pattern>
      <Bytes>63</Bytes>
      <ASCII> c</ASCII>
      <Pos>59</Pos>
   </Pattern>

Probably these are triggered by lucky circumstances (too few examples), but I
do not know. So I hope that other users can improve the definition.
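How such lucky patterns come and go can be sketched in Python by keeping only the offsets where all samples carry the same byte. This is only an illustration under the assumption that the Front Block patterns are bytes common to all samples at a fixed position. The function name and the block size of 64 are made up and are not taken from tridscan.

```python
def common_front_bytes(samples, block_size=64):
    """Map offset -> byte value for offsets where all samples agree.

    `samples` is a list of bytes objects; only the first `block_size` bytes
    (or fewer, if a sample is shorter) are compared.
    """
    if not samples:
        return {}
    limit = min(block_size, *(len(sample) for sample in samples))
    first = samples[0]
    return {pos: first[pos] for pos in range(limit)
            if all(sample[pos] == first[pos] for sample in samples)}


if __name__ == "__main__":
    magic = b"startupcache0002\x00"
    # Hypothetical sample heads: a and b agree by chance at offset 19.
    a = magic + b"\x01\x02\x03c"
    b = magic + b"\x09\x08\x07c"
    c = magic + b"\x05\x06\x04d"
    print(sorted(common_front_bytes([a, b])))     # magic offsets plus 19
    print(sorted(common_front_bytes([a, b, c])))  # only the magic offsets
```

With few samples an accidental match survives. Each additional sample can only remove offsets, so the stable remainder is what deserves to stay in the definition.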

In the global strings section I also get many lines. Some may be specific, like:
      <String>NSXULPROTOTYPECACHE.STARTUPCACHE</String>
      <String>BROWSER_CHROME_URL</String>
      <String>STARTUPCACHE0002</String>
      <String>I'XULCACHE</String>
      <String>BCACHE</String>
      <String>CACHED</String>
      <String>MCACHE</String>
      <String>MOZILLA.ORG</String>
      <String>MOZXUL</String>
      <String>-MOZ-</String>
But I also get lines which are triggered by lucky circumstances, like:
      <String>{D'D</String>
      <String>} 'D</String>
      <String>}$'D</String>
      <String>~D'D</String>
So here I hope that other users can improve my definition.

With the new TrID definitions nearly all my LITTLE samples are now
described. The TrID definitions and output are stored in the archive
little_.zip. I hope that my definitions can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

I think I'll keep the non-zipped definition; just the initial pattern seems unique enough.
The zip one instead seems to match many other files, so I'll wait on that.

Thanks as usual.