Author Topic: hex-intel.trid.xml for Intel hexadecimal object  (Read 554 times)

jenderek

« on: January 23, 2024, 10:05:40 PM »
Hello TrID users,

Some days ago I started to build my own GRUB switch. The project page URL is:
https://github.com/rw-hsma-fpga/grub-switch

In the last step the firmware is written to the microcontroller with the help
of the avrdude tool. The firmware files have the file name suffix HEX.

So I ran the trid utility on such HEX examples. None of the samples is
recognized; all are described as "Unknown!" (see appended trid-v-old.txt
in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are not
recognized either (see droid-hex.csv in output).

For comparison I also ran the file command (version 5.45) on these samples.
They are not recognized either and are described generically as
"ASCII text" (see appended output/file-5.45.txt). The MIME type is therefore
text/plain (see appended file-i-5.45.txt in output). No file name suffix is
shown here (see appended file-ext-5.45.txt in output).

On Linux (Raspbian 11) such samples are called "Intel® hexadecimal object
file". There text/x-hex is used as the MIME type and the suffix HEX is
displayed. That information cannot be seen in the freedesktop shared
MIME-info database. The Notepad++ editor calls this format
"Intel HEX binary data".

Luckily I found information about the HEX format on Wikipedia, so I use it as
reference. That is expressed inside the new definition by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Intel_HEX</RefURL>

Apparently these files are just pure text files. So the generic text/plain
MIME type is not wrong, but the type shown on Linux is better suited. So I
chose that user-defined type. That is expressed by a line like:
   <Mime>text/x-hex</Mime>

Often such samples are just called "Intel HEX". If you are not a programmer or
a speaker of a Latin-based language, this means nothing to you. So I chose a
text with the more precise word "hexadecimal". This is expressed by a line like:
      <FileType>Intel hexadecimal object</FileType>

On Wikipedia about a dozen suffixes are mentioned, but in the samples I
inspected only HEX was used as file name suffix. This is expressed by a line like:
   <Ext>HEX</Ext>

So I created the TrID definition hex-intel.trid.xml by running tridscan on my
samples. Then I created a patched file command according to the documentation.
When running this version (see appended file.tmp in output) I tried to
understand the TrID patterns and to refine the definition.

The first XML construct looks like:
   <Bytes>3A</Bytes>
   <ASCII> :</ASCII>
   <Pos>0</Pos>
According to the documentation the colon character is the start code (record
mark). According to Wikipedia there exist variants with leading bytes, symbol
tables or comments before it. Such samples are of course not matched by my
current definition. All my examples start with the colon character.

According to the documentation the record type (two hex digits) is stored at
offset 7. Only six values are possible (00 - 05), so the first of the two
digits is always 0. That is expressed by the second XML construct, which
looks like:
   <Bytes>30</Bytes>
   <ASCII> 0</ASCII>
   <Pos>7</Pos>
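
The two front patterns above can be tried out with a small Python sketch. It
is only an illustration of the checks described here, not how TrID itself
evaluates a definition. The function names are made up for this example, and
the data record used in the usage note is the example record from the
Wikipedia page.

```python
def matches_front_patterns(data: bytes) -> bool:
    """Check the two fixed patterns: ':' at offset 0 and '0' at offset 7."""
    return len(data) > 7 and data[0:1] == b":" and data[7:8] == b"0"


def record_type(line: str) -> int:
    """Return the record type, stored as two hex digits at offsets 7 and 8."""
    return int(line[7:9], 16)


if __name__ == "__main__":
    first_line = ":10010000214601360121470136007EFE09D2190140"
    print(matches_front_patterns(first_line.encode("ascii")))
    print(record_type(first_line))
```

A file that starts with a comment or a symbol table fails the first check,
which is the limitation mentioned above.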

The sequence 01 is used for the "End Of File" record type. It must occur
exactly once per file, in the last record of the file. The byte count is 00,
the address field is typically 0000 and the data field is omitted. So the
complete last record looks like :00000001FF. That is expressed inside the
global strings section by a line like:
   <String>00000001FF</String>
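
The trailing FF of that record is the checksum: in the Intel HEX format it is
the two's complement of the low byte of the sum of all preceding record bytes.
A minimal Python sketch of that arithmetic follows. The function names are
only chosen for this example, and the validity check is deliberately
simplified (it looks at length and checksum only, not at the record type).

```python
def record_checksum(hex_digits: str) -> int:
    """Checksum over the record bytes given as hex digits (without ':')."""
    total = sum(bytes.fromhex(hex_digits))
    return (-total) & 0xFF  # two's complement of the low byte of the sum


def is_valid_record(line: str) -> bool:
    """Simplified check: start code, byte count and checksum must fit."""
    line = line.strip()
    if not line.startswith(":") or len(line) < 11 or len(line) % 2 == 0:
        return False
    try:
        raw = bytes.fromhex(line[1:])
    except ValueError:
        return False
    # raw = count (1) + address (2) + type (1) + data (count) + checksum (1)
    return raw[0] == len(raw) - 5 and (sum(raw) & 0xFF) == 0


if __name__ == "__main__":
    print(f"{record_checksum('00000001'):02X}")  # bytes 00 00 00 01
    print(is_valid_record(":00000001FF"))
```

For the End Of File record the sum of 00 00 00 01 is 0x01, and its two's
complement in one byte is 0xFF.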

If the definition is not unique enough, a few subclass variants can be
created. At offset 1 RECLEN (two hex digits) is stored. It indicates the
number of bytes (hex digit pairs) in the data field. The maximum byte count is
255 (0xFF); 8 (0x08), 16 (0x10) and 32 (0x20) are commonly used byte counts.
In my samples I found values like 0x02, 0x04 and 0x10. For samples with a
record length of 16 the memory address offset often starts at 0x0000 and is
incremented by sixteen, so the next record has offset 0x0010, followed by the
next record at offset 0x0020 and so on. Often the fields before the address
have the same values in every record as well. Most of the remaining lines are
triggered by this behaviour. They are expressed inside the global strings
section by lines like:
   <String>0003</String>
   <String>0005</String>
   <String>0006</String>
   <String>0007</String>
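
A possible explanation of such strings, sketched in Python: when every data
record has the same length and the addresses simply count up, the beginning of
each record is fully predictable. The helper record_prefixes is hypothetical
and only builds the first nine characters of consecutive data records (start
code, RECLEN, address, record type 00); it assumes consecutive addresses
without gaps.

```python
def record_prefixes(reclen: int, count: int, start: int = 0) -> list[str]:
    """Build ':' + RECLEN + 16-bit address + type 00 for consecutive records."""
    return [
        f":{reclen:02X}{(start + i * reclen) & 0xFFFF:04X}00"
        for i in range(count)
    ]


if __name__ == "__main__":
    for prefix in record_prefixes(16, 8):
        print(prefix)
```

With 16-byte records the fourth record begins with :10003000, which contains
0003, and the sixth begins with :10005000, which contains 0005. Such a string
would only stay in the definition as long as every scanned sample is long
enough to contain the corresponding record.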

Maybe when scanning more samples (especially short ones and ones with other,
more exotic record lengths) most or all of these lines will vanish. But at the
moment I keep all these lines.
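
Why more samples make such strings vanish can be illustrated with a toy Python
sketch. It is not the algorithm used by tridscan; it only intersects the
4-character substrings of some made-up record fragments, which are chosen for
illustration and are not complete, valid records.

```python
def substrings(text: str, length: int = 4) -> set[str]:
    """All substrings of the given length that occur in text."""
    return {text[i:i + length] for i in range(len(text) - length + 1)}


def common_substrings(samples: list[str], length: int = 4) -> set[str]:
    """Substrings of the given length that occur in every sample."""
    if not samples:
        return set()
    common = substrings(samples[0], length)
    for sample in samples[1:]:
        common &= substrings(sample, length)
    return common


if __name__ == "__main__":
    long_a = ":10003000AA"  # made-up fragment of a sample with 16-byte records
    long_b = ":10003000BB"  # a second, similar sample
    short_c = ":02000000CC"  # a short sample with a 2-byte record
    print("0003" in common_substrings([long_a, long_b]))
    print("0003" in common_substrings([long_a, long_b, short_c]))
```

As soon as one sample without the string is added, the string drops out of the
common set.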

With this new TrID definition all my Intel HEX samples are now recognized and
described. The TrID definition, some samples and the output are stored in the
archive hex_.zip. I hope that my definition can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

« Reply #1 on: January 24, 2024, 11:54:33 PM »
Thanks for the new def!

I collected and scanned about 20 other .hex files and indeed most of the strings go away: in addition to the EOF one, only 0002 survived, so I removed that too.