Hello trid users,
a few days ago I had to handle some patch files. Unfortunately, about a dozen
different variants exist. In this post I cover "bsdiff" samples, which are
binary and not text. The samples were created by the bsdiff utility.
I ran the TrID utility on these examples. The samples are recognized and
described as "bsdiff patch" by bsdiff.trid.xml. BSDIFF is listed as the
suffix. No MIME type is shown (see appended trid-v-old.txt in output).
For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). The samples are recognized
there as well. They are described as "BSDIFF" with version 4.0 by PUID fmt/439.
No MIME type is listed (see appended bsdiff.csv in output).
For comparison I also ran the file command (version 5.45) on these samples.
The samples are recognized and described as "bsdiff(1) patch file" (see the
appended output). The MIME type reported is application/octet-stream, triggered
by the binary nature of the samples (see appended file-i-5.45.txt in output).
No file name suffix is listed (see appended file-ext-5.45.txt in output).
On Linux, according to the shared MIME-info database, such samples are called
"Binary differences between files", and application/x-bsdiff is used as the
MIME type. This makes sense because the samples are binary files and not text
files, unlike many other diff outputs. The samples are recognized just by
looking for the 8-byte sequence "BSDIFF40" at the beginning. "BSDIFN40" is
also considered a valid start magic there, but I do not have such samples and
cannot create them. That information can be seen in the source
freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.
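As a minimal sketch (not part of any TrID or MIME-info definition), the
start-magic check described above could be written in Python as follows. The
function name looks_like_bsdiff is made up for illustration; the two magic
values are the ones mentioned above.

```python
# The two 8-byte start magics mentioned for application/x-bsdiff.
BSDIFF_MAGICS = (b"BSDIFF40", b"BSDIFN40")


def looks_like_bsdiff(data: bytes) -> bool:
    """Return True if data starts with one of the 8-byte magics."""
    return data[:8] in BSDIFF_MAGICS


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_magic.py sample.bsdiff
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(name, looks_like_bsdiff(handle.read(8)))
```

This only looks at the first 8 bytes, so it is as permissive as the MIME-info
rule and says nothing about the rest of the file.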
So in the worst case the patterns inside bsdiff.trid.xml must be reduced, or
variants must be created. The 2-byte string "40" at offset 6 can apparently be
interpreted as version "4.0". I do not know whether other versions exist, but
I found hints at
https://github.com/cperciva/bsdiff/blob/master/bsdiff-ra/FORMAT
So I decided to create a variant bsdiff-v4.trid.xml based on the current
definition. First I chose the mentioned MIME type from the Linux shared
database. That is expressed by a line like:
<Mime>application/x-bsdiff</Mime>
The characteristic with the version is still described by the first XML
construct. It looks like:
<Bytes>4253444946463430</Bytes>
<ASCII> B S D I F F 4 0</ASCII>
<Pos>0</Pos>
The second XML construct looks like:
<Pattern>
<Bytes>0000000000425A6839314159</Bytes>
<ASCII> . . . . . B Z h 9 1 A Y</ASCII>
<Pos>27</Pos>
</Pattern>
According to the documentation, the patch data block length is stored as an
8-byte integer at offset 24. In the samples this length is small (less than
100000h), so the 5 upper bytes are zero. The compressed data starts at offset
32. As mentioned, patterns for a bzip2 compressed archive occur here. Such
patterns are described by ark-bz2-old.trid.xml, ark-bzip.trid.xml or
ark-bz2.trid.xml. The inspected samples are not the "old" kind; they are
described by the third definition, and a block size of 900k is used (see
appended file.tmp from the patched file command in output). My bsdiff tool has
no option to change the block size. I tried to force a block size of 200k by
setting the environment variable BZIP2="-s" or BZIP2="-1", which works for the
bzip2 command but has no effect on bsdiff. In the DROID definition the data
length and block size bytes are ignored. So I do the same, and the above
construct now becomes:
<Pattern>
<Bytes>425A68</Bytes>
<ASCII> B Z h</ASCII>
<Pos>32</Pos>
</Pattern>
<Pattern>
<Bytes>314159</Bytes>
<ASCII> 1 A Y</ASCII>
<Pos>36</Pos>
</Pattern>
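To illustrate why the split patterns are more tolerant, here is a small Python
sketch. It is only an approximation of how such byte patterns could be checked,
not TrID's actual matching code. The names OLD_PATTERNS, NEW_PATTERNS, matches
and make_synthetic_sample are hypothetical, and the synthetic data is not a
valid patch: it is just the magic, 24 zero bytes and one bzip2 stream.

```python
import bz2

# (offset, bytes) pairs mirroring the XML constructs quoted above.
OLD_PATTERNS = [
    (0, b"BSDIFF40"),
    (27, bytes.fromhex("0000000000425A6839314159")),  # zeros + "BZh91AY"
]
NEW_PATTERNS = [
    (0, b"BSDIFF40"),
    (32, b"BZh"),   # bzip2 stream signature
    (36, b"1AY"),   # start of the bzip2 block magic, block size byte skipped
]


def matches(data: bytes, patterns) -> bool:
    """Return True if every pattern is found at its fixed offset."""
    return all(data[pos:pos + len(raw)] == raw for pos, raw in patterns)


def make_synthetic_sample(level: int) -> bytes:
    """Fake 32-byte header followed by one bzip2 stream (NOT a real patch)."""
    header = b"BSDIFF40" + bytes(24)  # length fields left as zero
    return header + bz2.compress(b"dummy control block", compresslevel=level)


if __name__ == "__main__":
    for level in (9, 1):
        sample = make_synthetic_sample(level)
        print(level, matches(sample, OLD_PATTERNS), matches(sample, NEW_PATTERNS))
```

With block size level 9 both pattern sets match the synthetic data; with level
1 only the new set matches, because the byte at offset 35 is no longer "9".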
The TrID definition, some samples and the output are stored in the archive
bsdiff_.zip. I hope that my definition can be used in a future version of
triddefs.
Maybe slightly different variants of bsdiff exist.
With best wishes
Jörg Jenderek