Hello TrID users,
some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. These samples have the TFM file name suffix.
Unfortunately some TFM samples are misidentified as other file formats. One
sample (tri10u.tfm) is misidentified as "gfxboot compiled html help" by
hlp-gfxboot.trid.xml, without mime type. The file name suffix shown is HLP (see
appended trid-v-old.txt in output). Real HLP samples can be found for example
inside the package gfxboot-themes. The recognition relies on a single XML
construct. That looks like:
<Bytes>0412</Bytes>
<Pos>0</Pos>
So only 16 bits are used for recognition. Apparently this is sometimes too
weak. According to the file command recommendations at least 32 bits should be
used for recognition.
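To make the weakness concrete, here is a minimal Python sketch of what a 2-byte check at position 0 accepts. The function names and both byte strings are made up for illustration; they are not taken from TrID or from real samples.

```python
# The 2-byte pattern at position 0 used by the old definition.
OLD_PATTERN = bytes.fromhex("0412")


def matches_old_definition(data: bytes) -> bool:
    """Return True if the data starts with the 16-bit pattern."""
    return data.startswith(OLD_PATTERN)


def random_match_chance(bits: int) -> float:
    """Chance that uniformly random data matches a pattern of this many bits."""
    return 1.0 / (2 ** bits)


# Made-up byte strings for illustration only (not copied from real files).
hlp_like = bytes.fromhex("04126D61696E14") + b"Title\x10"
unrelated = bytes.fromhex("0412001200000080")

print(matches_old_definition(hlp_like))   # help-like data is accepted
print(matches_old_definition(unrelated))  # unrelated data is accepted too
print(random_match_chance(16), random_match_chance(32))
```

Any binary format whose first two bytes happen to be 04 12 passes such a check, which is presumably what happens with tri10u.tfm.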
For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here only the HTML samples are
recognized. These are described as "Hypertext Markup Language" with
mime type text/html by PUID fmt/96. The other samples are not
recognized.
For comparison I also ran the file command (version 5.45) on such
samples. Here the HTML and HLP samples are "recognized". The HLP
samples are here also described as "gfxboot compiled html help file"
(see appended file-5.45.txt in output). For these samples the
generic application/octet-stream mime type is shown (see appended
file-i-5.45.txt in output). No file name suffix is shown (see
appended file-ext-5.45.txt in output). The TFM samples are not
described as "TeX font metric data" (see appended file-k-5.45.txt in
output).
In the current definition a page on SourceForge is used as reference. That
is not wrong, but not really useful, because after following redirects and
invalid links the bottom line is that development now happens on
GitHub. So in my variants I use the gfxboot page on GitHub as reference. That
is expressed by a line like:
<RefURL>https://github.com/openSUSE/gfxboot</RefURL>
In the current definition the remark describes how the step from HTML to HLP
is done, as described in the GFXBOOT(1) man page. This is done by a command
line like:
gfxboot --help-create
In this step the tool "compiles" the "readable" HTML text and generates
binary HLP help pages. These can be considered "tokenized" HTML
pages. How this happens can be seen when looking inside the Perl script
gfxboot. The relevant lines for identification are like:
page => "\x04", # start new page
label => "\x12", # label start, no text output; label end = "\x13"
title => "\x14", # start page description; ends with "\x10"
normal => "\x10", # back to normal (color, text output)
li => "\x16", # start list item; ends with "\x15" or "\x16"
ind => "\x17", # set indentation
link => "\x13", # label end; set link text color (gfx_color2/3)
So I think it is better to reference the reverse way: how to recreate the
HTML page from the binary HLP file. This is done by commands like:
gfxboot --help-show en.hlp > en.html
gfxboot --help-show it.hlp > it.html
gfxboot --help-show de.hlp > de.html
So we see that the byte sequence 0412 at the beginning means start new page
followed by label start with no text output. Afterwards comes the ASCII
label name.
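The layout described above can be sketched in Python as follows. This is only an illustration, not code from gfxboot: the function name is hypothetical, and the rule that the label name ends at the first control byte (value below 0x20) is an assumption based on the samples, where the name is followed by 0x14.

```python
# Control bytes taken from the Perl table in the gfxboot script.
PAGE = 0x04         # start new page
LABEL_START = 0x12  # label start, no text output


def first_label(data: bytes):
    """Return the first label name, or None if the data does not start
    with the 'new page' + 'label start' byte pair."""
    if len(data) < 2 or data[0] != PAGE or data[1] != LABEL_START:
        return None
    end = 2
    # Assumption: the label name runs until the first control byte.
    while end < len(data) and data[end] >= 0x20:
        end += 1
    return data[2:end].decode("ascii", errors="replace")


print(first_label(bytes.fromhex("04126D61696E14")))
print(first_label(bytes.fromhex("04126F707414")))
```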
Now comes the interesting part. In theory you can compile an HTML page
about any topic, but in reality the HLP samples are used as help text for
boot loaders like GRUB, syslinux and so on. So in real world examples
I got only 2 label names.
In about half of the samples the first label is the 4 byte string main. Similar to
a C program, where execution starts with the function main, here main seems to be used as
the first label. When generating hlp-gfxboot-main.trid.xml by running tridscan
this is expressed by the first and characteristic XML construct. That looks like:
<Bytes>04126D61696E14</Bytes>
<ASCII> . . m a i n</ASCII>
<Pos>0</Pos>
In the other half of the samples the first label is the 3 byte string opt. Apparently
these start with a section about options for booting. When generating
hlp-gfxboot-opt.trid.xml by running tridscan this is expressed by the first and
characteristic XML construct. That looks like:
<Bytes>04126F707414</Bytes>
<ASCII> . . o p t</ASCII>
<Pos>0</Pos>
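For illustration, the two front patterns can be checked in a few lines of Python. This is a sketch of the idea only, not how TrID itself evaluates definitions; the names SIGNATURES and variant are hypothetical, and the last test string is made up.

```python
# Front patterns copied from the two <Bytes> lines, both at position 0.
SIGNATURES = {
    "main": bytes.fromhex("04126D61696E14"),  # 04 12 "main" 14
    "opt": bytes.fromhex("04126F707414"),     # 04 12 "opt" 14
}


def variant(data: bytes):
    """Return 'main' or 'opt' if the data starts with one of the
    patterns, otherwise None."""
    for name, signature in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


print(variant(bytes.fromhex("04126F707414") + b"Boot Options\x10"))
# Made-up data that starts with 04 12 but is not a help file:
print(variant(bytes.fromhex("0412001200000080")))
```

With 6 or 7 bytes instead of 2, data that merely starts with 04 12 no longer matches.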
Furthermore we see that after the label name comes a byte with hexadecimal value
14. That means that the title comes next. When looking at the output of the patched
file command (file.tmp in output) we see that in the "opt" variant the title is
like 'Boot Options' in the English help file, 'Bootoptionen' in the German help file,
and 'Opzioni di avvio' in the Italian help file. In the "main" variant the title is
like 'Help voor bootloader' in the Dutch help file.
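Reading the title can be sketched the same way. According to the Perl table the title starts with 0x14 and ends with 0x10. The function name, the UTF-8 default and the sample bytes below are assumptions for illustration; the sample is assembled by hand from the bytes described above, not read from a real file.

```python
TITLE_START = b"\x14"  # start page description
TITLE_END = b"\x10"    # back to normal


def first_title(data: bytes, encoding: str = "utf-8"):
    """Return the text between the first 0x14 and the next 0x10,
    or None if either byte is missing."""
    start = data.find(TITLE_START)
    if start < 0:
        return None
    end = data.find(TITLE_END, start + 1)
    if end < 0:
        return None
    return data[start + 1:end].decode(encoding, errors="replace")


# Hand-made example: page, label "opt", title, back to normal.
sample = b"\x04\x12opt\x14Boot Options\x10"
print(first_title(sample))
```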
In the global strings section I get lines that are obviously triggered by phrases
used in the context of help with booting. These look like:
<String>BOOT</String>
<String>HELP</String>
Then there are phrases with links to specific boot items. These look like:
<String>O_SPLASH</String>
<String>O_ACPI</String>
<String>O_APM</String>
<String>O_IDE</String>
<String>SCSI</String>
On older systems (dated about 2000 or earlier) APM was used instead of ACPI and
IDE disks instead of SCSI disks. At the moment such problem items are explained in
the help files, but maybe in the future the items for such old boot options will
vanish. Then such keywords and the corresponding lines in the definition will vanish too.
Most boot loaders offer the ability to configure the keyboard layout and to load
a different configuration (saved as profile). So we find the corresponding keywords in
the help file and the TrID definition. These are expressed by lines like:
<String>2000</String>
<String>PROFILE</String>
<String>KEYTABLE</String>
The first two sound too unspecific to me. So I delete these 2 lines.
In the main variant I got more phrases concerning boot parameters. These look like:
<String>HTTP</String>
<String>192.168.0.1</String>
<String>O_VNCPASSWORD</String>
<String>O_HOSTIP</String>
<String>O_SPLASH</String>
<String>O_GATEWAY</String>
<String>O_INSTALL</String>
<String>O_NETMASK</String>
<String>VIDEOMODE</String>
<String>NOLAPIC</String>
<String>NOACPI</String>
<String>INSTALL_SRC</String>
<String>DRIVERUPDATE</String>
<String>NETWORK</String>
Here the help files also describe how to configure your network
(with a predefined IP address like 192.168.0.1), how to allow remote desktop access via
the VNC protocol, and where to get driver and source updates. Maybe not all boot
loaders configure the network, or maybe they use other IP addresses for the booting
computer. So I delete the first two lines, which are too unspecific for me. But
maybe more lines must be deleted if the help is about boot loaders without network
support.
This variant also describes that instead of booting Linux, the memory
diagnostic tool memtest, the BIOS firmware or the hard disk can be started. Linux can
be started in a 32/64-bit variant or with options to use rescue or failsafe
mode. This is expressed by lines like:
<String>BITS</String>
<String>FAILSAFE</String>
<String>FIRMWARE</String>
<String>HARDDISK</String>
<String>RESCUE</String>
<String>LINUX</String>
<String>MEMTEST</String>
If the boot loader does not offer such abilities then such lines vanish. To me
the item for the 32/64-bit variant sounds too unspecific, and many distributions do
not offer a 32 bit variant any more. So I delete the corresponding line.
The gfxboot tool was developed by SUSE. So the help pages describe
how to configure the network in the operating system after it is booted. On SUSE
systems this is done by their own tool called yast2. The reference to this
configuration tool is expressed by a line like:
<String>YAST2</String>
Most other distributions do not use yast2. So probably the phrase yast2 does
not exist in help pages of other distributions. So I delete that line.
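The effect of keeping or dropping such a string can be illustrated with a small Python sketch. It assumes that every listed global string must occur somewhere in the file and that the comparison ignores case; this is an assumption about TrID's behaviour, not something verified here. The function name and the sample bytes are made up.

```python
def missing_strings(data: bytes, strings: list) -> list:
    """Return the listed strings that do not occur anywhere in the data,
    comparing case-insensitively (ASCII letters only)."""
    upper = data.upper()
    return [s for s in strings if s.encode("ascii") not in upper]


# Hand-made content standing in for a help file of a non-SUSE distribution.
sample = b"\x04\x12opt\x14Boot Options\x10 see o_splash, o_acpi and keytable"

print(missing_strings(sample, ["BOOT", "O_SPLASH", "KEYTABLE", "YAST2"]))
print(missing_strings(sample, ["BOOT", "O_SPLASH", "KEYTABLE"]))
```

Under that assumption a definition that still lists YAST2 would reject this sample, while the trimmed list accepts it.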
With the new definitions all of my inspected HLP samples are still described, but the
misidentifications (like tri10u.tfm) vanish, because more items are inspected (see
appended trid-v-new.txt and trid-new.txt in output).
TrID definitions, some samples and output are stored in the archive hlp-tfm.zip. I
hope that my definitions can be used in a future version of triddefs.
With best wishes
Jörg Jenderek