Topic: bsdiff-chrome.trid.xml for Courgette binary diff output of "chromium" browsers

jenderek

Hello TrID users,

Some days ago I had to handle some patch files. These often use the file name
suffix PATCH. To check this, I looked for samples with that suffix on my
systems. In the context of "chromium" web browsers like Opera, I found samples
that TrID currently does not recognize. This variant, called "Courgette", is
used to store updates for such web browsers. Unfortunately, the suffix BSDIFF
is also used for these update patches. The same suffix is used by the BSD
binary patching tool bsdiff from Colin Percival. The Google developers used
that format in the past but switched to the Courgette tool.

It took me two days to understand that the Courgette file format is totally
different from the bsdiff format described by bsdiff.trid.xml. I looked after
the header for signatures of known compression methods (like BZ0 for BZIP and
so on), but found nothing, because Courgette uses its own algorithm. The
header format is also different, which is not visible at first glance because
both formats contain length information in the header. So I mention these
facts in the remark line.
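For anyone who wants to repeat the signature search described above, here is a minimal Python sketch. It scans a file for a few well-known compressor magics after the first 8 bytes. The helper name find_compression_magics and the magic table are my own illustrative choices (not taken from TrID or bsdiff), and short magics such as the 2-byte gzip one can also match by chance inside compressed or random data, so a hit is only a hint.

```python
# Small, non-exhaustive table of compressor magics (illustrative selection).
COMPRESSION_MAGICS = {
    b"BZh": "bzip2",
    b"BZ0": "bzip (0.x)",
    b"\x1f\x8b": "gzip",
    b"\xfd7zXZ\x00": "xz",
    b"\x28\xb5\x2f\xfd": "zstd",
}


def find_compression_magics(data: bytes, skip: int = 8) -> list[tuple[int, str]]:
    """Return sorted (offset, name) pairs for every known magic found at or after `skip`."""
    hits = []
    for magic, name in COMPRESSION_MAGICS.items():
        start = skip
        # Repeatedly search forward so that every occurrence is reported.
        while (pos := data.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for offset, name in find_compression_magics(fh.read()):
            print(f"{offset:#x}: {name}")
```

A classic BSDIFF40 file normally contains bzip2 blocks right after its 32-byte header, so "BZh" hits are expected there; an empty result is consistent with a format that uses its own encoding.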

So I ran the trid utility on such examples with PATCH or BSDIFF suffix. The
samples are "recognized" but wrongly described as "PrintFox/Pagefox bitmap" by
bitmap-printfox-g.trid.xml. This is triggered because all such patch samples
start with the upper-case letter G (see the appended trid-v-old.txt in output).

For comparison, I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It does not recognize the
samples.

For comparison, I also ran the file command (version 5.45) on such samples.
Here the samples are "recognized", but wrongly described as "Nintendo Gameboy
Music/Audio Data" because they start with the 3-byte string GBS (see the
appended file-5.45.txt in output).
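To illustrate why both tools misfire, here is a simplified Python sketch that reduces each rule to the leading bytes blamed above. This is an assumption for illustration only: the real PrintFox definition and the real magic entry may test more than these prefixes, and RULES and naive_matches are hypothetical names. Only the full 8-byte tag singles out the Courgette patches.

```python
# Simplified prefix rules; an illustration, not the real TrID/magic definitions.
RULES = [
    (b"G", "PrintFox/Pagefox bitmap (TrID rule reduced to first byte)"),
    (b"GBS", "Nintendo Gameboy Music/Audio Data (file rule reduced to 3 bytes)"),
    (b"GBSDIF42", "Courgette bsdiff patch (full 8-byte tag)"),
]


def naive_matches(data: bytes) -> list[str]:
    """Return the label of every rule whose prefix matches the start of data."""
    return [label for prefix, label in RULES if data.startswith(prefix)]


if __name__ == "__main__":
    # A Courgette patch satisfies all three prefix tests at once.
    for label in naive_matches(b"GBSDIF42" + bytes(12)):
        print(label)
```

Because the shorter prefixes are contained in the longer one, a tool that stops at the first short match can never reach the correct answer; requiring all 8 bytes avoids that.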

Luckily, I found a page about Courgette on the Chromium web server.
 <RefURL>
 https://www.chromium.org/developers/design-documents/software-updates-courgette/
 </RefURL>

Following these sources, I found more samples in the test data, and the
starting bytes become understandable when looking at the third-party header
file bsdiff.h. For my samples I get "high" values (in the GB range) for the
length fields, which I do not understand. Unfortunately, I found only a few
samples, because this patch format is only used internally by web browsers and
I found no standalone utility.
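One possible explanation for the "high" length values, which has not been verified against the samples: if the header fields after the 8-byte tag are written as variable-length integers rather than fixed 32-bit words, reading them as little-endian uint32 yields values in the GB range. The field order (slen, scrc32, dlen, modeled on MBSPatchHeader in bsdiff.h) and the varint encoding (7 data bits per byte, high bit as continuation flag) are both assumptions here, and read_varint32 and parse_header are hypothetical names. The sketch decodes the same bytes both ways for comparison.

```python
import struct

TAG = b"GBSDIF42"  # MBS_PATCH_HEADER_TAG from bsdiff.h


def read_varint32(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned varint (7 data bits per byte, high bit = "more bytes").

    The encoding is an ASSUMPTION about how the header fields are stored.
    Returns (value, position after the varint).
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
        if shift > 28:
            raise ValueError("varint longer than 5 bytes")


def parse_header(data: bytes) -> dict:
    """Decode the three fields after the tag in two ways, for comparison.

    Field order (slen, scrc32, dlen) is ASSUMED from MBSPatchHeader in bsdiff.h.
    """
    if not data.startswith(TAG):
        raise ValueError("missing GBSDIF42 tag")
    # Reading A: three fixed-width little-endian uint32 values.
    fixed_le = struct.unpack_from("<3I", data, len(TAG))
    # Reading B: three consecutive varints.
    pos = len(TAG)
    varints = []
    for _ in range(3):
        value, pos = read_varint32(data, pos)
        varints.append(value)
    return {"fixed_le": fixed_le, "varint": tuple(varints)}
```

For example, the bytes E8 07 01 AC decode as the varint 1000 followed by the varint 1, but as one little-endian uint32 they give 0xAC0107E8 (about 2.9 GB). If a sample shows small, plausible varint lengths next to huge fixed-width ones, that would support this reading.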

The samples are not text files, so a MIME type like text/plain is wrong for
them. The application/x-bsdiff type used for binary BSD bsdiff files is also
not correct, because the file format is totally different. So for now these
samples get the generic type used for binary files. This is expressed by a
line like:
   <Mime>application/octet-stream</Mime>

The characteristics of such samples are described uniquely by the first and
only XML construct. It looks like:
   <Bytes>4742534449463432</Bytes>
   <ASCII> G B S D I F 4 2</ASCII>
   <Pos>0</Pos>
This is the 8-byte string defined as MBS_PATCH_HEADER_TAG inside bsdiff.h. It
looks similar to the starting magic BSDIFF40 of BSD bsdiff files. That is about
the only thing these two bsdiff formats have in common.
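A minimal Python sketch that tells the two formats apart by their full 8-byte magic. It assumes the classic BSDIFF40 layout from Colin Percival's bsdiff 4.x: after the magic come three 8-byte sign-magnitude lengths (bzip2'd control block, bzip2'd diff block, size of the new file). The helper offtin is named after the decoder in bsdiff's own sources; classify_patch is a hypothetical name. For the Courgette variant only the magic is checked, since the rest of its header layout is not settled above.

```python
BSDIFF40 = b"BSDIFF40"  # Colin Percival's bsdiff 4.x
GBSDIF42 = b"GBSDIF42"  # MBS_PATCH_HEADER_TAG in Courgette's bsdiff.h


def offtin(buf: bytes) -> int:
    """Decode bsdiff's 8-byte little-endian sign-magnitude integer."""
    value = int.from_bytes(buf[:7], "little") | ((buf[7] & 0x7F) << 56)
    return -value if buf[7] & 0x80 else value


def classify_patch(data: bytes) -> str:
    """Classify a patch by its full magic; BSDIFF40 lengths are decoded too."""
    if data.startswith(BSDIFF40) and len(data) >= 32:
        ctrl_len, diff_len, new_size = (offtin(data[i:i + 8]) for i in (8, 16, 24))
        return f"BSD bsdiff (ctrl={ctrl_len}, diff={diff_len}, newsize={new_size})"
    if data.startswith(GBSDIF42):
        return "Courgette bsdiff patch (GBSDIF42)"
    # Anything else, including a truncated BSDIFF40 header.
    return "unknown"
```

Checking all 8 bytes keeps "GBS..." files such as real Gameboy music from being taken for patches, and vice versa.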

With the new definition, my inspected web browser patches are now recognized
and described (see the appended trid-v-new.txt in output).

The TrID definitions, some samples and the output are stored in the archive
bsdiff_chrome.zip. I hope that my definition can be used in a future version
of triddefs.

With best wishes
Jörg Jenderek

Mark0

Thanks!