Author Topic: Updates of {ark-pak-quake,ark-packdir,pack-git}.trid.xml for "PACK" start magic

jenderek

Hello,

After handling some Dzip-compressed archives with DzipGui.exe, downloaded from
http://speeddemosarchive.com/dzip/, I saw that this packing tool also creates
another archive type with the "PAK" file name extension.

The examples are described correctly as "Quake archive" by
ark-pak-quake.trid.xml. Unfortunately, PAK files are also described with a 50%
rate as "Acorn PackDir compressed Archive" (see appended
output/trid-old.txt) by ark-packdir.trid.xml, because both definitions use
the same XML construct:
   <Pattern>
      <Bytes>5041434B</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>
   </Pattern>

DzipGui also registers itself for PAK files in the same way as for dz files.
This is now expressed by the line:
   <Mime>application/x-dzip</Mime>

The "Quake" variant is correct. I also verified this with file command
version 5.32 (see appended output/file-5.32.txt), which gives some
additional information. After some searching on the net I found a page about
Quake's container file format, so I add it as reference URL by the line:
   <RefURL>https://quakewiki.org/wiki/.pak</RefURL>

According to that page:
At offset 4 there is a 4-byte integer (little endian) with the position of the file table.
At offset 8 there is a 4-byte integer (little endian) with the size of the file table.

In the inspected examples the table is stored at the end of the archive, so
adding these 2 values gives the archive size. Because 1 table entry has 64
bytes, dividing the file table size by 64 gives the number of files inside
the archive. I mention these facts in the remark line. I verified them by
using a patched file command which shows this information (see
output/file-new.txt).
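
As an illustration only, the header arithmetic above could be sketched in
Python like this. It assumes the layout from the quakewiki page (table offset
at 4, table size at 8, both little endian, 64-byte entries). The helper name
read_pak_header and the synthetic archive are made up for this sketch and are
not part of TrID or the file command.

```python
import struct

PAK_ENTRY_SIZE = 64  # one file table entry, per the quakewiki page


def read_pak_header(data: bytes) -> dict:
    """Decode the 12-byte header of a Quake PAK archive (hypothetical helper)."""
    if len(data) < 12 or data[:4] != b"PACK":
        raise ValueError("not a Quake PAK header")
    # Two little-endian unsigned 32-bit integers at offsets 4 and 8.
    table_offset, table_size = struct.unpack_from("<II", data, 4)
    return {
        "table_offset": table_offset,
        "table_size": table_size,
        "file_count": table_size // PAK_ENTRY_SIZE,
        # Equals the archive size only when the table is stored last.
        "end_of_table": table_offset + table_size,
    }


# Synthetic archive: 12-byte header, 100 data bytes, 2 empty table entries.
archive = b"PACK" + struct.pack("<II", 112, 128) + bytes(100) + bytes(128)
info = read_pak_header(archive)
print(info, len(archive))
```

For this synthetic archive the computed end of the table matches the total
length, which is the same check done on the real examples.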

In real-world examples these 2 values are nowhere near their upper limit; in
other words, the upper bytes are null. So for examples like pak0.pak, with an
archive size of just a few megabytes and some hundreds of files inside, the
upper bytes of table offset and table size are expressed by 2 XML constructs
like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>7</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>10</Pos>
   </Pattern>

If somebody finds a bigger archive, try a less restrictive construct like the
one in ark-pak-quake-16M.trid.xml:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>11</Pos>
   </Pattern>
This holds for all such Quake archives whose table size is less than
16*1024*1024 bytes, that is, whose number of files is less than 262144.
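
To see which archives each zero-byte construct accepts, here is a small
check in Python. The function matches is a simplified stand-in for a
fixed-position pattern test (an assumption, not TrID's actual code), the
headers are synthetic, and PAK_16M contains only the magic plus the construct
quoted above, since the full content of ark-pak-quake-16M.trid.xml is not
shown here.

```python
import struct


def matches(data: bytes, patterns) -> bool:
    """True if every (pos, hex_bytes) pattern is found at its position."""
    for pos, hexstr in patterns:
        wanted = bytes.fromhex(hexstr)
        if data[pos:pos + len(wanted)] != wanted:
            return False
    return True


# Magic, top byte of table offset, top two bytes of table size.
PAK_SMALL = [(0, "5041434B"), (7, "00"), (10, "0000")]
# Looser variant: magic and only the top byte of the table size.
PAK_16M = [(0, "5041434B"), (11, "00")]


def pak_header(table_offset: int, file_count: int) -> bytes:
    """Build a synthetic 12-byte PAK header (64 bytes per table entry)."""
    return b"PACK" + struct.pack("<II", table_offset, file_count * 64)


few = pak_header(5_000_000, 300)      # table size 19200
many = pak_header(5_000_000, 2000)    # table size 128000, above 65535
huge = pak_header(5_000_000, 262144)  # table size exactly 16*1024*1024

for name, hdr in (("few", few), ("many", many), ("huge", huge)):
    print(name, matches(hdr, PAK_SMALL), matches(hdr, PAK_16M))
```

The first construct pair already fails from 1024 files upward (table size
65536), while the looser one only fails from 262144 files upward.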


"Acorn PackDir compressed Archive" have no file name extension according to
ark-packdir.trid.xml. This is true for some examples like xfmpdem, but I
also found examples with extensions, like asylumsrc.pkd or fontfix.bin. So
the extension is now expressed by the line:
   <Ext>/BIN/PKD</Ext>

Because no URL exists in the TrID definition file ark-packdir.trid.xml, I add
one by a line like:
   <RefURL>https://www.kyzer.me.uk/pack/xad/#PackDir</RefURL>

Little information about such compressed files is written in textual form, but
you can download the source as the xad_PackDir.lha archive. There the file
format is described in PackDir.c. According to the C source, an archive starts
with the 5-byte "PACK\0" magic. This is now described by the first new
construct:
   <Pattern>
      <Bytes>5041434B00</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>
   </Pattern>

Afterwards the compression mode is stored as a 4-byte little-endian value in
the range from 0 to 4: 0 stands for GIF LZW with a maximum of 12 bits, up to
4, which stands for GIF LZW with a maximum of 16 bits. So the upper 3 bytes
are always null. This is expressed by an additional second XML construct:

   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>6</Pos>
   </Pattern>
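
A minimal Python sketch of this part of the header follows. The function
name packdir_lzw_bits is hypothetical. The mapping "12 + mode" is an
assumption for the modes 1 to 3: only the end points (0 gives 12 bits, 4
gives 16 bits) are stated above.

```python
import struct

PACKDIR_MAGIC = b"PACK\x00"  # 5-byte magic, including the trailing null


def packdir_lzw_bits(data: bytes) -> int:
    """Return the maximum LZW code width implied by the compression mode."""
    if len(data) < 9 or data[:5] != PACKDIR_MAGIC:
        raise ValueError("not a PackDir header")
    # 4-byte little-endian compression mode stored at offsets 5 to 8.
    (mode,) = struct.unpack_from("<I", data, 5)
    if mode > 4:
        raise ValueError(f"unexpected compression mode {mode}")
    # Assumed linear mapping: mode 0 -> 12 bits ... mode 4 -> 16 bits.
    return 12 + mode


for mode in range(5):
    header = PACKDIR_MAGIC + struct.pack("<I", mode)
    # Bytes 6 to 8 are the three null bytes tested by the second construct.
    print(mode, header[6:9].hex(), packdir_lzw_bits(header))
```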

I mention these facts in the remark line as well. Unfortunately I still get a
detection collision with Git pack files. So I created 5 more specific
variants, one per compression method, starting with ark-packdir-12bit.trid.xml.

Unfortunately a detection collision still exists: a PackDir archive with
compression mode 0, like the constructed testLZW12.bin, starts the same way
as a "Git pack format" version 0 file, like the constructed testV0git.pack.
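
The collision can be reproduced with two synthetic headers in Python. This
is only a sketch: the byte strings are built by hand rather than taken from
testLZW12.bin or testV0git.pack, the root name is invented, and the 4-byte
object count after the Git version field comes from pack-format.txt in the
Git documentation.

```python
import struct

# PackDir, compression mode 0: "PACK\0" + 4-byte little-endian mode,
# followed by a made-up root object name.
packdir_mode0 = b"PACK\x00" + struct.pack("<I", 0) + b"ADFS::RPC.$.demo\x00"

# Git pack, version 0 with 3 objects: "PACK" + version + object count,
# both stored as 4-byte big-endian (network byte order) integers.
git_v0 = b"PACK" + struct.pack(">II", 0, 3)


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count how many leading bytes two byte strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


print(packdir_mode0[:12].hex())
print(git_v0[:12].hex())
print(common_prefix_len(packdir_mode0, git_v0))
```

The shared bytes are the 5-byte magic plus the 4 null bytes, which in the Git
case are the rest of the version field and the top byte of the object count.
So fixed-position byte patterns in this region cannot separate the two
formats as long as the pack holds fewer than 16777216 objects.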

According to the sources, the root directory object begins at offset 9 and
starts with a null-terminated object name. That information is shown by a
patched file command (see output/file-new.txt). The object names look similar.

"IDEFS::IDE-4.$.Apps.GRAPHICS.!XFMPdemo" is shown for the xfmpdem archive and
"ADFS::RPC.$.websitezip.FONTFIX" is shown for fontfix.bin. According to the
Wikipedia page about RISC OS, the second example refers to the ADFS file
system, where colons separate the file system from the rest of the path.
Unfortunately not all file system names end with "FS"; one exception is
"SystemDevice", mentioned on the web page about RISC OS filing systems at
https://www.riscosopen.org/wiki/documentation/show/Introduction%20To%20Filing%20Systems
So instead of "FS::", only the 2-byte string "::" should always be found in
the root name. This is expressed in a new global string section by the line:
   <String>::</String>

Furthermore the root is represented by a dollar sign ($) and directories are
separated by a full stop (.). This is now expressed by line:
   <String>.$.</String>
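
A short Python sketch of reading the root name and checking it for the two
strings. The helper names are hypothetical, the sample header is built by
hand around the fontfix.bin root name quoted above, and decoding the name as
Latin-1 is an assumption made so that any byte value can be shown.

```python
def packdir_root_name(data: bytes) -> str:
    """Read the null-terminated root object name that starts at offset 9."""
    end = data.index(b"\x00", 9)  # raises ValueError if no terminator
    return data[9:end].decode("latin-1")


def looks_like_riscos_path(name: str) -> bool:
    """Check for the two strings used in the new global string section."""
    return "::" in name and ".$." in name


# "PACK\0", mode 0 (4 null bytes), root name, then some padding.
sample = (b"PACK\x00" + bytes(4)
          + b"ADFS::RPC.$.websitezip.FONTFIX\x00" + bytes(16))

name = packdir_root_name(sample)
print(name, looks_like_riscos_path(name))
```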

RISC OS also uses a system to mark file types. A list of such file types is
found at http://www.filebase.org.uk/filetypes. There the three-digit
hexadecimal code 68E is listed for PackDir compressed archives. So I translate
this into a user-defined MIME type by the line:
   <Mime>application/x-acorn-68E</Mime>

With these modifications all Acorn PackDir compressed archives now get this
as first description (see output/trid-new.txt).



The third file type that starts with the "PACK" string is the Git pack file
(*.pack), described by pack-git.trid.xml with the XML construct:
      <Bytes>5041434B00000002</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>

The Git pack format is described in pack-format.txt, found for example at
https://github.com/git/git/blob/master/Documentation/technical/ . According
to that text, these Git files start with the 4-byte signature "PACK", followed
by a 4-byte version number in network byte order, which is currently 2.
This can be verified by looking at the output of the file command. So
pack-git.trid.xml recognises only version 2 and misses older pack files like
the constructed example testV0git.pack. The current TrID definition file
therefore becomes something like pack-git-v2.trid.xml. It is not clearly
written how the version is stored, but the example testV0git.pack created by
git version 2.1.2 contains only 2 as version. So apparently only the major
part of the version is stored inside pack files. The new generic
pack-git.trid.xml therefore contains a reduced XML construct:

      <Bytes>5041434B000000</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>
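
For illustration, the version field and the reduced construct could be
checked in Python as below. The helper name git_pack_version and the
synthetic headers are assumptions made for this sketch, not taken from the
TrID definitions.

```python
import struct

# The reduced 7-byte construct: "PACK" plus the 3 upper version bytes.
GENERIC_PREFIX = bytes.fromhex("5041434B000000")


def git_pack_version(data: bytes) -> int:
    """Return the version number stored in a Git pack header."""
    if len(data) < 8 or data[:4] != b"PACK":
        raise ValueError("not a Git pack header")
    # 4-byte unsigned integer in network byte order (big endian) at offset 4.
    (version,) = struct.unpack_from(">I", data, 4)
    return version


for v in (0, 1, 2):
    header = b"PACK" + struct.pack(">I", v)
    print(v, header.hex().upper(), header.startswith(GENERIC_PREFIX),
          git_pack_version(header))
```

Because only the lowest version byte differs, the reduced construct accepts
every version below 256, while the old 8-byte construct accepts version 2
only.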

Looking at the Wikipedia page on Git, https://en.wikipedia.org/wiki/Git, we
see that Git versions start with 0.99 and reach 2.14 in August 2017. I also
mention these version number facts in the remark line. So we can also use 3
version-dependent TrID definitions, starting with pack-git-v1.trid.xml for
version 1 with the XML construct:

      <Bytes>5041434B00000001</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>

Unfortunately, for version 0, pack-git-v0bad.trid.xml with the construct
      <Bytes>5041434B00000000</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>

still gives a collision with the Acorn PackDir (12-bit LZW) variant, as the
pack-git-v0bad.trid.xml example shows. So I created a better variant,
pack-git-v0.trid.xml, with the pattern split in 2:
      <Bytes>5041434B</Bytes>
      <ASCII> P A C K</ASCII>
      <Pos>0</Pos>
      
      <Bytes>00000000</Bytes>
      <Pos>4</Pos>

No MIME type was found in TrID or at IANA, so I add a user-defined MIME type by the line:
   <Mime>application/x-git</Mime>

With the variant definition files all inspected examples are now detected
correctly (see appended output/trid-new.txt).

The TrID definitions, some examples and output are stored in the archive
pak.zip. I hope that my XML files can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Hi Joerg!
Thanks for the analyses.
I'm not sure I will add so many defs to avoid some conflicts.
But I'll surely add at least the Git v1 one.

Thanks as usual!