Topic: diff-context.trid.xml for "context" variant of diff output text

jenderek

Posted on December 29, 2023, 02:29:48 PM
Hello trid users,

Some days ago I had to handle some patch files. Unfortunately there are about
a dozen different variants. In this session I will handle "context" samples,
which are not "unified".

So I ran the TrID utility on such examples. These samples are not recognized
(see the appended trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It does not recognize the
samples either.

I also ran the file command (version 5.45) on such samples. Here the samples
are recognized and described as "context diff output" (see the appended
file-5.45.txt in output). When the file command is run with the keep-going
option -k, such samples are additionally described, with lower priority, by
the second phrase "text" (see the appended file-k-5.45.txt in uni/output). The
MIME type reported here is not text/plain but text/x-diff (see the appended
file-i-5.45.txt in output). No file name suffix is listed (see the appended
file-ext-5.45.txt in output).

On Linux, according to the shared MIME-info database, such samples are called
"Differences between files". There text/x-patch is used as the MIME type,
while text/x-diff is listed as an alias; the type is a subclass of text/plain.
The samples are recognized just by looking for the 4-byte sequence "*** " at
the beginning. Two suffixes (diff, patch) are listed. This information can be
seen in the source file freedesktop.org.xml.in, found for example on
gitlab.freedesktop.org.
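
For illustration, the two shared MIME-info checks described above (the diff/patch suffixes and the 4-byte "*** " prefix at offset 0) can be approximated in a few lines of Python. This is a rough sketch, not the actual shared-mime-info code; the function name looks_like_text_x_patch is hypothetical, and the case-insensitive suffix comparison is an assumption.

```python
from pathlib import Path

# Suffixes listed for text/x-patch in the shared MIME-info database (per the post).
GLOBS = (".diff", ".patch")
# 4-byte magic expected at offset 0 for the context variant.
MAGIC = b"*** "


def looks_like_text_x_patch(filename: str, head: bytes) -> dict:
    """Report separately whether the name glob and the leading magic match."""
    return {
        # Case-folded suffix comparison (an assumption for this sketch).
        "glob": Path(filename).suffix.lower() in GLOBS,
        # Only the first 4 bytes are compared against the magic.
        "magic": head[:4] == MAGIC,
    }


if __name__ == "__main__":
    sample = b"*** old.txt\n--- new.txt\n***************\n"
    print(looks_like_text_x_patch("fix.patch", sample))   # {'glob': True, 'magic': True}
    print(looks_like_text_x_patch("notes.txt", b"hello\n"))  # {'glob': False, 'magic': False}
```

Keeping the two results separate mirrors the fact that a desktop may match a file by name, by content, or both.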

Unfortunately there is no precise documentation about this file format and
how it differs from other diff documents. Already existing definitions like
diff.trid.xml use the Wikipedia page about the Diff utility as reference.
Luckily that page also has a section about the context format, so I use it as
reference. That is expressed by a line like:
 <RefURL>https://en.wikipedia.org/wiki/Diff_utility#Context_format</RefURL>

I chose the mentioned MIME type from the Linux shared database. That is
expressed by a line like:
   <Mime>text/x-patch</Mime>

Such output is used or created by the diff and patch utilities. Therefore
these 2 names are often used as file name suffixes. Maybe the abbreviated
suffixes (DIF, PCH) are also used, but I myself did not find such samples. I
also did not find rejected patches with the REJ suffix for this variant. So in
the end I found only 2 instead of 5 suffixes. That is expressed by a line like:
   <Ext>DIFF/PATCH</Ext>

Such diff variants are created, for example, by the --context option (or its
short form -c) of GNU diff. So I mention this fact in the remark line.
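
GNU diff -c is the usual producer. To get a quick sample without GNU diff, Python's standard difflib.context_diff emits the same general context layout: header lines starting with "*** " and "--- ", "***************" hunk separators, and "!" for changed lines. It is only a stand-in, not GNU diff, so the output is not guaranteed to be byte-identical (for example, no timestamps are written unless fromfiledate/tofiledate are passed).

```python
import difflib

old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# Context-format diff, comparable in layout to `diff -c old.txt new.txt`.
lines = list(difflib.context_diff(old, new, fromfile="old.txt", tofile="new.txt"))
print("".join(lines), end="")
```

The first three lines printed are `*** old.txt`, `--- new.txt` and `***************`, which are the kinds of strings a context-diff definition keys on.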

After running tridscan I looked at the generated diff-context.trid.xml. As
expected, the first pattern is characteristic of a context diff. It looks like:
   <Bytes>2A2A2A20</Bytes>
   <ASCII> * * *</ASCII>
   <Pos>0</Pos>
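
The <Bytes> value is simply the hex encoding of "*** " (0x2A is "*", 0x20 is a space), and the <ASCII> line is a human-readable rendering of it. The small Python sketch below reproduces that rendering under a rule inferred from this one example only: each byte is shown as one space followed by its character, spaces and non-printable bytes become blanks, and trailing blanks are dropped. The function name trid_ascii is hypothetical and not part of TrID or tridscan.

```python
def trid_ascii(hex_bytes: str) -> str:
    """Approximate an <ASCII> field from a <Bytes> hex string (assumed rule)."""
    data = bytes.fromhex(hex_bytes)
    # Printable, non-space ASCII is shown as-is; everything else as a blank.
    chars = (chr(b) if 0x21 <= b <= 0x7E else " " for b in data)
    # One space in front of every character, trailing blanks trimmed.
    return "".join(" " + c for c in chars).rstrip()


pattern = "2A2A2A20"
print(bytes.fromhex(pattern))      # b'*** '
print(repr(trid_ascii(pattern)))   # ' * * *'
```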

But according to the shared MIME database there exist samples where a tab
character is used instead of the space character at offset 3. I myself did
not find such samples. I also got 2 apparently characteristic strings,
expressed inside the global strings section by lines like:
   <String>***************</String>
   <String>----</String>
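
Taken together, the front pattern and the two global strings amount to a simple rule. A yes/no Python approximation is sketched below; it also accepts the tab variant at offset 3 mentioned for the shared MIME database. This is only an illustration: TrID itself weighs matched patterns and strings into a score reported as a percentage rather than answering yes or no, and the names FRONT, GLOBAL_STRINGS and matches_context_diff are hypothetical.

```python
# Front pattern at offset 0: "*** ", or "***<TAB>" per the shared MIME database.
FRONT = (b"*** ", b"***\t")
# Strings that must appear somewhere in the file (the global strings).
GLOBAL_STRINGS = (b"***************", b"----")


def matches_context_diff(data: bytes) -> bool:
    """Yes/no approximation of the diff-context definition."""
    if not data.startswith(FRONT):
        return False
    return all(s in data for s in GLOBAL_STRINGS)


context = (b"*** old.txt\n--- new.txt\n***************\n"
           b"*** 1,3 ****\n  apple\n! banana\n  cherry\n"
           b"--- 1,3 ----\n  apple\n! blueberry\n  cherry\n")
unified = b"--- old.txt\n+++ new.txt\n@@ -1,3 +1,3 @@\n apple\n-banana\n+blueberry\n cherry\n"

print(matches_context_diff(context))   # True
print(matches_context_diff(unified))   # False
```

A unified diff fails at the first check because it starts with "--- " rather than "*** ".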


The TrID definition, some samples and output are stored in the archive
context_diff.zip. I hope that my definition can be used in a future version
of triddefs.

As described at the beginning, there exist some other variants of difference
output. I will try to handle them in a future session.

With best wishes
Jörg Jenderek

Mark0

Reply #1 on December 31, 2023, 12:12:21 PM
Thanks Jörg!