Hello trid users,
A few days ago I had to update my Python package. A program/library directory
from the previous version remained, with files still inside. So I looked at
the file types remaining there. One type has the file name suffix EGG, so I
looked for such files on my systems.
I ran the trid utility on these examples with the EGG suffix. Many of the
samples are "recognized" and described with highest priority as "Python Egg",
without a MIME type, by egg-python.trid.xml. But many are only described as
"ZIP compressed archive" with MIME type application/zip by ark-zip.trid.xml
(see appended trid-v-old.txt in output).
For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). There the samples are recognized
only generically and described as "ZIP Format" by PUID x-fmt/263. The EGG
suffix is considered "bad" there.
For comparison I also ran the file command (version 5.45) on these samples.
They are also "recognized" there, but described only generically as
"Zip archive data" (see appended file-5.45.txt in output). The generic
application/zip MIME type is shown (see appended file-i-5.45.txt in output)
and no suffix is shown (see appended file-ext-5.45.txt in output).
On Linux, according to the shared MIME-info database, such samples are called
"Zip archive", with "application/zip" as MIME type. The samples are recognized
just by looking for the 4-byte sequence PK\003\004 at the beginning. Two
suffixes are listed (*.zip *.zipx). That information can be seen in the source
freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.
So the samples are described correctly as ZIP. That is expressed by the first
XML construct, which looks like:
<Bytes>504B0304</Bytes>
<ASCII> P K</ASCII>
<Pos>0</Pos>
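As a rough illustration of what this front block tests, here is a minimal
Python sketch that checks for the same 4 bytes at offset 0. The function names
are made up for this example and are not part of TrID or any other tool.

```python
# The ZIP local file header signature: hex 504B0304, i.e. PK\003\004.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def starts_with_zip_signature(data: bytes) -> bool:
    """Return True if the data begins with the ZIP local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def file_starts_with_zip_signature(path: str) -> bool:
    """Read only the first 4 bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as handle:
        return starts_with_zip_signature(handle.read(4))
```

Such a test only says "this is some ZIP container"; it cannot tell an egg from
any other ZIP file, which is why the strings section is needed.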
The ZIP content can be seen by running unpacking tools like 7z or unzip (see
appended 7z-l-slt.txt, 7z-l.txt, unzip-lv.txt and unzip-Z.txt in output).
The sub-classification is done by lines inside the global strings section.
These look like:
<String>TOP_LEVEL.TXTPK</String>
<String>__INIT__.PYCPK</String>
<String>PKG-INFOPK</String>
<String>ZIP-SAFEPK</String>
<String>EGG-INFO</String>
<String>S.PYCPK</String>
<String>S.TXTPK</String>
<String>SOURCE</String>
After running tridscan on the undetected samples, I looked at what had changed.
One line vanished. That was:
<String>__INIT__.PYCPK</String>
Most samples contain a Python module named __init__.pyc. But in some samples
(like example-21.12-py3.6.egg and vboxapi-1.0-py3.11.egg) the module name is
like __init__.cpython-36.pyc inside a __pycache__ directory, and some packages
(like esptool-2.5.1-py2.7.egg) have no similar module.
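The three cases can be told apart with Python's zipfile module. This is only a
sketch: make_egg and init_module_layout are hypothetical helper names, and the
in-memory archives stand in for real egg files.

```python
import io
import zipfile


def make_egg(names):
    """Build a tiny in-memory ZIP with empty members (stand-in for a real egg)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


def init_module_layout(egg) -> str:
    """Classify an egg as 'classic', 'pycache' or 'none' by its __init__ module."""
    with zipfile.ZipFile(egg) as archive:
        names = archive.namelist()
    for name in names:
        if name.rsplit("/", 1)[-1] == "__init__.pyc":
            return "classic"  # plain __init__.pyc next to the sources
    for name in names:
        base = name.rsplit("/", 1)[-1]
        if ("__pycache__/" in name and base.startswith("__init__.")
                and base.endswith(".pyc")):
            return "pycache"  # e.g. __pycache__/__init__.cpython-36.pyc
    return "none"  # no compiled __init__ module at all
```

Passing a path to a real egg instead of the in-memory buffer works the same
way, because zipfile.ZipFile accepts both.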
In one line the letter S before the point vanished. The old line looks like:
<String>S.PYCPK</String>
So this now becomes:
<String>.PYCPK</String>
In the recognized samples I found modules with names like:
adatags.pyc
batchtags.pyc
conftags.pyc
CppSemantics.pyc
cpptags.pyc
difftags.pyc
dtags.py
That triggered the old line. The undetected samples also contain Python
modules with the PYC file name suffix, but there the main names do not end
with the letter s.
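Why a name ending in s.pyc yields S.PYCPK can be sketched in Python. In a ZIP
file a member name in the central directory is directly followed by the next
record, which again starts with PK, as long as no extra field or comment is
stored. The definition strings are written in upper case, so the sketch
upper-cases the data before searching; that TrID compares them this way is an
assumption here. make_egg and contains_pattern are hypothetical helper names.

```python
import io
import zipfile


def make_egg(names):
    """Return the raw bytes of a small in-memory ZIP with empty members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()


def contains_pattern(data: bytes, pattern: bytes) -> bool:
    """Case-insensitive search: upper-case the ASCII letters, then look for the pattern."""
    return pattern in data.upper()


# Module name ends with "s": "adatags.pyc" + "PK" gives "...S.PYCPK".
old_style = make_egg(["pkg/adatags.pyc", "EGG-INFO/PKG-INFO"])
# Module name ends with "6": "...cpython-36.pyc" + "PK" gives "...6.PYCPK".
new_style = make_egg(["pkg/__pycache__/__init__.cpython-36.pyc",
                      "EGG-INFO/PKG-INFO"])

for label, data in (("old style", old_style), ("new style", new_style)):
    print(label,
          "S.PYCPK:", contains_pattern(data, b"S.PYCPK"),
          ".PYCPK:", contains_pattern(data, b".PYCPK"))
```

The shorter pattern .PYCPK matches both archives, while S.PYCPK matches only
the one whose module name happens to end with s.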
For such files I found no MIME type. But eggs are a subclass of the ZIP
container. So in my opinion Python eggs should at least get the MIME type of
the container. This is now expressed by a line like:
<Mime>application/zip</Mime>
With the updated definition all my inspected Python EGG files are now
recognized (see appended trid-v-new.txt in output).
The TrID definition and output are stored in the archive egg_.zip. I hope that
my definition can be used in a future version of triddefs.
With best wishes
Jörg Jenderek