Recent Posts

Pages: [1] 2 3 ... 10
1
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on Today at 01:59:48 PM »
Updated:
  • Olympus RAW format (FluoView) (OIR)
Added:
  • Mac ICOL colors LUT ()
  • Mac PICT bitmap ()
  • Veeco D3100 data (old) (001/002/003)
  • attocube Systems 2D data (ASCII) (ASC)
  • Hitachi AFM data (AFM)
  • BCR-STM SPM data (16bit int) (BCR)
  • BCR-STM SPM data (16bit int, UTF-16) (BCR)
  • BCR-STM SPM data (32bit FP) (BCRF)
  • BCR-STM SPM data (32bit FP, UTF-16) (BCRF)
  • Createc SPM data (16bit int) (DAT)
  • Createc SPM data (32bit FP compressed) (DAT)
  • Createc SPM data (32bit FP) (DAT)
  • Zygo MetroPro SPM data (var.1) (DAT)
  • Zygo MetroPro SPM data (var.2) (DAT)
  • Zygo MetroPro SPM data (var.3) (DAT)
  • EPOC/Symbian Library (DLL)
  • Nano Measuring Machine profile data (DSC)
  • Nanosurf EZD SPM data (EZD/NID)
  • MicroProf FRT profilometry data (FRT)
  • Danish Micro Engineering General Data Exchange Format (GDF)
  • Gwyddion XYZ field SPM data (v1.0) (GXYZF)
  • Danish Micro Engineering Rasterscope SPM data (IMG)
  • MapVue profilometry data (intensity) (MAP)
  • MapVue profilometry data (phase) (MAP)
  • NT-MDT SPM data (MDT)
  • Molecular Imaging MI image SPM data (MI)
  • Molecular Imaging MI spectroscopy SPM data (MI)
  • Danish Micro Engineering MIF SPM data (MIF)
  • Nanoeducator SPM data (MSPM)
  • Nanonics NAN SPM data (NAN)
  • Nanomagnetics SPM data (v3) (NMI)
  • Nanomagnetics SPM data (v5) (NMI)
  • Olympus RAW format (OIR)
  • Dektak OPDx profilometry data (OPDX)
  • Anfatec SPM Parameters (PAR)
  • Hitachi SEM data (SEM)
  • ISO 28600:2011 SPM data transfer format (SPM)
  • Veeco Nanoscope II SPM data (SPM)
  • Veeco Nanoscope III SPM data (binary) (SPM)
  • Veeco Nanoscope III SPM data (text) (SPM)
  • Veeco Nanoscope III SPM force data (binary) (SPM)
  • Nanonis SXM data (SXM)
  • Keyence VK3 profilometry data (VK3)
  • Keyence VK4 profilometry data (VK4)
  • Keyence VK6 profilometry data (VK6)
  • NanoScan SPM data (XML)
2
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

There exist several variants. In this session I handle only the variant with
lh=0x02. I will explain later what this means.

I found such samples after installing MiKTeX version 23.12 on Windows. On
Linux Mint 21.3 I found such samples as part of packages (like texlive-base,
texlive-fonts-recommended, texlive-lang-greek, texlive-lang-other,
texlive-latex-extra, texlive-music, texlive-pictures, texlive-science).

So I ran the trid utility on my TFM samples with lh=0x02. The samples are not
recognized. Many are described as "Unknown!". For the remaining samples I get
many different descriptions, but all are wrong (see appended trid-v-old.txt in
output). I have many hundreds of such TFM samples. It took some time to
collect a dozen non-TFM samples which match the misidentified TFM samples.
Many TFM samples are described as "Adobe PhotoShop Brush" by abr.trid.xml. A
few are described as "Commodore 128 BASIC V7.0 program" by prg-c128.trid.xml
or as "Commodore 128 BASIC V7.0 program (graph mode on)" by
prg-c128-gfx.trid.xml. A few (like cmman.tfm, gen9.tfm) are described as
"MacBinary 1" by macbinary-1.trid.xml. A few samples (like yarborn.tfm) are
described as "PC9801 rip" by m-mod.trid.xml.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here most of the samples are
also not recognized. Samples with bin suffix (like ttcomp-bin-4k.bin) are
therefore described as "Binary File" by PUID fmt/208. Samples with m suffix
(like DIES_13.M, EVE_18.M, EVE_19A.M, EVE_26.M) are therefore described as
"MATLAB Script File" by PUID fmt/1678.

For comparison I also ran the file command (version 5.45) on such samples.
Here all such TFM samples are not recognized and not described as "TeX font
metric data". A few samples with M suffix (like EVE_18.M) are misidentified as
"TeX font metric data". A few samples (like cmfibs8.tfm, fcitt12.tfm,
rgrbf10.tfm) are described as "executable" for the "MIPS" or "amd 29k"
architecture. A few samples (like cmrgrsl10.tfm, yrcmex10.tfm) are described
as "object" for the "Tower/XP" architecture. This behaviour does not get
better without the keep going option of the file command (see appended
file-5.45.txt in output). For the TFM samples the mime type
application/x-tex-tfm is not shown (see appended file-i-5.45.txt in output).
No file name suffix is shown either (see appended file-ext-5.45.txt in
output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first, because the Wikipedia link is also
mentioned there and furthermore links to download samples are listed there. So
the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x02.trid.xml. Afterwards
I tried to understand the generated constructs and checked whether they are
always true. According to the specification the six-word (24-byte) file header
contains twelve unsigned 16-bit integers which describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). According to the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
That means that at least three fields (bc, ec, ne) never exceed 256 and in
practice stay below 256. Because the fields are stored in big endian format,
the upper byte of these fields is then nil. Apparently nearly all others of
these twelve fields are also below 256. So at even offsets we have nil bytes.
That is expressed by XML constructs like:
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>22</Pos>
   </Pattern>
   ...

The only exceptions are the number of words in the lig table (nl) and the file
length (lf). The first value is stored at offset 16 as field nl and is
sometimes bigger than 255. Also the file length is sometimes bigger than 255
(like casyll10.tfm, cmfibs8.tfm, fcbx10.tfm, fcitt12.tfm, mrgrsl10.tfm,
rgrbf10.tfm, wasysl10.tfm, yrcmex10.tfm). That value is stored at offset 0 as
field lf in word units. Multiplying this value by 4 gives the file size in
bytes.
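
The header checks described above can be sketched in a few lines of Python.
This is only a minimal illustration, assuming the classic layout with lf at
offset 0. The function names parse_tfm_preamble and looks_consistent are made
up for this example and are not part of trid, tridscan or the file command.

```python
import struct

# Order of the twelve 16-bit fields in the 24-byte preamble.
FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")

def parse_tfm_preamble(data: bytes) -> dict:
    """Decode the twelve big endian unsigned 16-bit fields at offset 0."""
    if len(data) < 24:
        raise ValueError("too short for a 24-byte TFM preamble")
    return dict(zip(FIELDS, struct.unpack(">12H", data[:24])))

def looks_consistent(data: bytes) -> bool:
    """Apply the formulas from the specification plus the size check lf*4."""
    h = parse_tfm_preamble(data)
    expected_lf = (6 + h["lh"] + (h["ec"] - h["bc"] + 1) + h["nw"] + h["nh"]
                   + h["nd"] + h["ni"] + h["nl"] + h["nk"] + h["ne"] + h["np"])
    return (h["bc"] - 1 <= h["ec"] <= 255
            and h["ne"] <= 256
            and h["lf"] == expected_lf
            and h["lf"] * 4 == len(data))
```

For a real sample the check could be called as
looks_consistent(open("yarborn.tfm", "rb").read()).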

Whether samples are real TFM files can be verified by running a command line
tool like:
   tftopl yarborn.tfm yarborn.pl

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For some of my inspected TFM samples this value is 2
(=0x02). The samples in this session all have this value. Together with the
upper nil byte of bc (first character code in the font) this significant part
is expressed by an XML construct like:
   <Bytes>000200</Bytes>
   <Pos>2</Pos>
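
To make the meaning of such Bytes/Pos constructs concrete, here is a small
Python sketch that tests a buffer against a list of (position, bytes) pairs.
It only illustrates the idea and is not how trid itself is implemented. The
names PATTERNS_0X02 and matches_all are invented for this example, and the
list contains only the three constructs quoted in this post.

```python
# (position, expected bytes) pairs taken from the XML constructs quoted above.
PATTERNS_0X02 = [
    (2, bytes.fromhex("000200")),  # lh = 2 plus upper nil byte of bc
    (20, b"\x00"),                 # upper byte of ne
    (22, b"\x00"),                 # upper byte of np
]

def matches_all(data: bytes, patterns=PATTERNS_0X02) -> bool:
    """Return True if every pattern is found at its fixed position."""
    return all(data[pos:pos + len(expected)] == expected
               for pos, expected in patterns)
```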

As far as I can see there exist fewer than a dozen variants with other lh
values. In the other variants the lh value changes only a little. I will
handle the other variants in a future session.

When the header size is 2 then only two elements exist: header[0] is a 32-bit
check sum and header[1] is the design size of the font (a fix_word in units of
TeX points). So in this variant there exist no header[2..11] (coding scheme
name) and no header[12..16] (font family name). So these samples apparently
contain no ASCII like strings.
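
As an illustration, the two header words of this variant can be decoded like
in the following Python sketch. It assumes that the header data starts
directly after the 24-byte preamble and that a fix_word is a signed 32-bit
value with 20 fraction bits. The function name read_short_header is made up
for this example.

```python
import struct

def read_short_header(data: bytes):
    """Return (check sum, design size in TeX points) from header[0..1]."""
    if len(data) < 32:
        raise ValueError("too short for preamble plus two header words")
    (lh,) = struct.unpack_from(">H", data, 2)
    if lh < 2:
        raise ValueError("header shorter than two words")
    # header[0]: unsigned 32-bit check sum, header[1]: signed fix_word
    checksum, raw_size = struct.unpack_from(">Ii", data, 24)
    return checksum, raw_size / 2**20
```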

With the new definition all TFM samples with header size 0x02 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. Unfortunately for some TFM samples the description as TeX
Font Metric is not the first. The main reason is that the only significant
characteristic is the 16-bit lh value.

Luckily I found a page about audio samples with m suffix on the File Formats
Archive Team web site, so I use it. The reference URL in the definition is
expressed by a line like:

 <RefURL>
 http://fileformats.archiveteam.org/wiki/Professional_Music_Driver_PMD
 </RefURL>

TrID definitions, some samples and output are stored in the archive
tfm_0x02.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned there exist other variants of TFM. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek
3
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 13, 2024, 07:53:34 PM »
Updated:
  • CompDisk compressed disk image ()
  • Astound / Animation Works Movie (AWM)
  • Csound unified file format (CSD)
  • Lynx archive (LNX)
  • Simulation Description Format (SDF)
  • XMILE Model (STMX)
  • Transport Neutral Encapsulation Format (TNEF/DAT)
Added:
  • Alicona 3D image (AL3D)
  • Ambios AMB (AMB)
  • Ambisonic B-Format audio (AMB)
  • Fox Engine DeForm (DFRM)
  • Fox Engine Model (FMDL)
  • Fox Engine Texture (FTEX)
  • Fox Engine Form Variation (FV2)
  • Gwyddion Simple Field (v1.0) (GSF)
  • Gwyddion Container (GWY)
  • gfxboot compiled HTML Help (main) (HLP)
  • gfxboot compiled HTML Help (opt) (HLP)
  • Sibelius Scorch (SCO)
  • TsiLang binary translation data (SIB)
  • TsiLang translation data (SIL)
  • ANDOR SIF (SIF)
  • 3shape STereoLithography (binary) (STL)
  • TeX Font Metric (0x78) (TFM)
Deleted:
  • gfxboot compiled html help (HLP)
  • Csound Score (SCO)
4
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

There exist several variants. In this session I handle only the variant with
lh=0x40. I will explain later what this means. I found a few dozen such
samples (like tgoth10.tfm in the standard directory, with parent directory
ptex-fonts inside the fonts sub directory tfm) after installing MiKTeX version
23.12 on Windows. On Linux Mint 21.2 I found such samples as part of the
texlive-lang-japanese package with version 2021.20220204-1.

So I ran the trid utility on my TFM samples with lh=0x40. The samples are not
recognized. Many are wrongly described as "Adobe PhotoShop Brush" by
abr.trid.xml with file name suffix (.ABR). Some real ABR samples are described
as "TTComp archive compressed (bin-4K)" by ark-ttcomp-bin-4k.trid.xml (see
appended trid-v-old.txt in output).

It took some time to collect a few non-TFM samples which match the
misidentified TFM samples.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here no sample is recognized.

For comparison I also ran the file command (version 5.45) on such samples.
Here these TFM samples are not recognized and not described as "TeX font
metric data". They are described as "data". On the other hand the ABR samples
are also not recognized. Many are described first as "GDSII Stream file",
sometimes with obviously wrong and high version numbers. Many are also
described as "TTComp archive data, binary, 4K dictionary" (see appended
file-k-5.45.txt in output). For these TFM samples the mime type
application/x-tex-tfm is not shown (see appended file-i-5.45.txt in output).
No file name suffix is shown either (see appended file-ext-5.45.txt in
output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first, because the Wikipedia link is also
mentioned there and furthermore links to download samples are listed there. So
the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x40.trid.xml. Afterwards
I tried to understand the generated constructs and checked whether they are
always true. I first thought this is like the other variants with just some
more words in the data header, but unfortunately this is less than half of the
truth.

According to the mentioned specification the six-word (24-byte) file header
contains twelve unsigned 16-bit integers which describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). According to the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np

The mentioned specification is an archived version on archive.org dated about
2012. Obviously the described items do not match "newer exotic" fonts like
Japanese ones. So I assume the described items apply in full only to fonts
with character codes of 8 bits or fewer. The bc values at offset 4 in my
samples were like 108 or 214. The ec value at offset 6 in my samples was 18
(12 hexadecimal). The latter is expressed by an XML construct that looks like:
   <Bytes>0012000000</Bytes>
   <Pos>6</Pos>
So here ec is lower than bc.

If I try to convert such a sample like in other variants by running a command
line tool like:
   tftopl tgoth10.tfm tgoth10.pl
I get output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
There's some extra junk at the end of the TFM file,
but I'll proceed as if it weren't there.
The character code range 214..18 is illegal!
Sorry, but I can't go on; are you sure this is a TFM?

In other variants the file length in words is stored as a 2 byte big endian
integer at offset 0. Multiplying this value by 4 gives the file size in bytes.
In this variant this information is stored 4 bytes higher, at offset 4.
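
This observation can be turned into a simple test: look at which of the two
offsets holds a word count that matches the real file size. The following
Python sketch does only that. The function name find_length_field is made up
for this example, and the sketch relies on nothing more than the rule "value
times 4 equals file size" described above.

```python
import struct

def find_length_field(data: bytes):
    """Return 0 or 4, the offset whose 16-bit value * 4 equals the file size.

    Returns None if neither offset fits.
    """
    for offset in (0, 4):
        if len(data) >= offset + 2:
            (words,) = struct.unpack_from(">H", data, offset)
            if words * 4 == len(data):
                return offset
    return None
```

A result of 0 points to the layout of the other variants, a result of 4 to the
layout discussed in this session.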

Apparently also the coding scheme name is stored at a higher offset (37=33+4
compared with other variants). So after the coding scheme name (maximal 39
characters) like (JIS X0208), (TEX KANJI TEXT) or (UNSPECIFIED) the remaining
25 padding bytes in my examples are expressed by an XML construct like:
   <Bytes>00000000000000000000000000000000000000000000000000</Bytes>
   <Pos>51</Pos>

Afterwards at offset 77 (4 bytes higher than in other variants) the font
family name (like MINCHO, GOTHIC, UNSPECIFIED or 'OTF KANJI', maximal 19
characters) is stored. At offset 96 (92 plus 4 compared with other variants)
again the seven bit safe byte with value 80h seems to be stored. That is
expressed by an XML construct like:
   <Bytes>0000000000000000800000</Bytes>
   <Pos>88</Pos>
But I do not know if this is always true. So I keep it at the moment and
mention my observations in the remark line.

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For some dozens of my inspected TFM samples this value
is 64 (=0x40). The samples in this session all have this value. Together with
other parts this is expressed by an XML construct like:
   <Bytes>004000</Bytes>
   <ASCII> . @</ASCII>
   <Pos>2</Pos>

As far as I can see there exist fewer than a dozen variants with other lh
values. In the other variants the lh value changes only a little. I will
handle the other variants in a future session.

Compared with other variants I get more patterns. Because the mentioned
specification does not fully match, I do not know exactly how to interpret
these patterns and whether they are always true. So I keep most patterns. At
higher offsets after the data header (280=64*4+24) I get short nil sequences
like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>278</Pos>
   </Pattern>
   ...
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>430</Pos>
   </Pattern>
I assume that these are triggered by lucky circumstances (too few
examples). So I delete these patterns.

Unfortunately I found a few dozen samples with lh=0x40 which do not fit my
definition. In those samples I found no ASCII strings like the coding scheme
name and the font family name. Maybe the Japanese names are stored in UTF-16
or similar. Maybe I will try to handle such samples in a future session.

With the new definition most TFM samples with header size 0x40 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. And because of some more conditions compared with other
variants the description as "TeX Font Metric" comes first.

TrID definitions, some samples and output are stored in the archive
tfm_0x40.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned there exist other variants of TFM. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek
5
TrID File Identifier / Re: 2 variants for gfxboot compiled html help (main opt)
« Last post by Mark0 on April 12, 2024, 10:16:11 PM »
Thanks!
6
Thanks!
7
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

There exist several variants. In this session I handle only the variant with
lh=0x78. I will explain later what this means. I found a few dozen such
samples (like nmin10.tfm in the nmin-ngoth directory, with parent directory
ptex-fonts inside the fonts sub directory tfm) after installing MiKTeX version
23.12 on Windows. On Linux Mint 21.2 I found such samples as part of the
texlive-lang-japanese package with version 2021.20220204-1.

So I ran the trid utility on my TFM samples with lh=0x78. The samples are not
recognized. Many are wrongly described as "Adobe PhotoShop Brush" by
abr.trid.xml with file name suffix (.ABR). Some real ABR samples are described
as "TTComp archive compressed (bin-4K)" by ark-ttcomp-bin-4k.trid.xml (see
appended trid-v-old.txt in output).

It took some time to collect a few non-TFM samples (like *.gds) which match
the misidentified TFM samples.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here no sample is recognized.
The sample with bin file name suffix is therefore described as "Binary File"
by PUID fmt/208.

For comparison I also ran the file command (version 5.45) on such samples.
Here these TFM samples are not recognized and not described as "TeX font
metric data". They are described as "data". On the other hand the ABR samples
are also not recognized. Many are described first as "GDSII Stream file",
sometimes with obviously wrong and high version numbers. Many are also
described as "TTComp archive data, binary, 4K dictionary" (see appended
file-k-5.45.txt in output). For these TFM samples the mime type
application/x-tex-tfm is not shown (see appended file-i-5.45.txt in output).
No file name suffix is shown either (see appended file-ext-5.45.txt in
output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first, because the Wikipedia link is also
mentioned there and furthermore links to download samples are listed there. So
the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x78.trid.xml. Afterwards
I tried to understand the generated constructs and checked whether they are
always true. I first thought this is like the other variants with just some
more words in the data header, but unfortunately this is less than half of the
truth.

According to the mentioned specification the six-word (24-byte) file header
contains twelve unsigned 16-bit integers which describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). According to the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np

The mentioned specification is an archived version on archive.org dated about
2012. Obviously the described items do not match "newer exotic" fonts like
Japanese ones. So I assume the described items apply in full only to fonts
with character codes of 8 bits or fewer. Measured against the mentioned
specification the first character code (bc) is "too high" (like 276 or 299),
that is above 255. The ec value in my samples was 18 (12 hexadecimal). So here
ec is lower than bc.

If I try to convert such a sample like in other variants by running a command
line tool like:
   tftopl goth10.tfm goth10.pl

I get output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
There's some extra junk at the end of the TFM file,
but I'll proceed as if it weren't there.
The character code range 299..18 is illegal!
Sorry, but I can't go on; are you sure this is a TFM?

In other variants the file length in words is stored as a 2 byte big endian
integer at offset 0. Multiplying this value by 4 gives the file size in bytes.
In this variant this information is stored 4 bytes higher, at offset 4.

Apparently also the coding scheme name is stored at a higher offset (37=33+4
compared with other variants). So the coding scheme name (TEX KANJI TEXT, 14
characters preceded by the length byte 0Eh, maximal 39) in my examples is
expressed by an XML construct like:

 <Bytes>00000E544558204B414E4A492054455854
 0000000000000000000000000000000000000000000000000006</Bytes>
 <ASCII> . . . T E X   K A N J I   T E X T</ASCII>
 <Pos>34</Pos>
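
The construct above shows a length byte (0Eh) directly in front of the text. A
small Python sketch for reading such a length-prefixed name could look like
this. The function name read_counted_string is made up for this example, and
the offset 36 for the length byte is taken from my samples, not from a
specification.

```python
def read_counted_string(data: bytes, offset: int, max_len: int):
    """Read a name stored as one length byte followed by that many characters.

    Returns None if the length byte is implausible or the data is too short.
    """
    if offset >= len(data):
        return None
    length = data[offset]
    if length > max_len or offset + 1 + length > len(data):
        return None
    return data[offset + 1:offset + 1 + length].decode("ascii", errors="replace")
```

For my samples read_counted_string(data, 36, 39) would be expected to give
the coding scheme name.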

Afterwards at offset 77 (4 bytes higher than in other variants) the 6 byte
font family name (like MINCHO or GOTHIC, maximal 19 characters) is stored. At
offset 96 (92 plus 4 compared with other variants) again the seven bit safe
byte with value 80h seems to be stored. That is expressed by an XML construct
like:
   <Bytes>00000000000000000000000000800000</Bytes>
   <Pos>83</Pos>
But I do not know if this is always true. So I keep it at the moment and
mention my observations in the remark line.

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For some dozens of my inspected TFM samples this value
is 120 (=0x78). The samples in this session all have this value. Together with
other parts this is expressed by an XML construct like:
   <Bytes>000B007801</Bytes>
   <ASCII> . . . x</ASCII>
   <Pos>0</Pos>

As far as I can see there exist fewer than a dozen variants with other lh
values. In the other variants the lh value changes only a little. I will
handle the other variants in a future session.

Compared with other variants I get more patterns. Because the mentioned
specification does not fully match, I do not know exactly how to interpret
these patterns and whether they are always true. So I keep most patterns. At
higher offsets I get short nil sequences like:
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>632</Pos>
      </Pattern>
      ...
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>984</Pos>
      </Pattern>
      ...
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>1100</Pos>
      </Pattern>
I assume that these are triggered by lucky circumstances (too few
examples). So I delete these patterns.

With the new definition all TFM samples with header size 0x78 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. And because of some more conditions compared with other
variants the description as "TeX Font Metric" comes first.

TrID definitions, some samples and output are stored in the archive
tfm_0x78.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned there exist other variants of TFM. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek

8
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 11, 2024, 06:33:20 PM »
Updated:
  • Cryo Interactive APC audio (APC)
  • BRSTM audio (BRSTM)
  • Capella music notation (v3, binary) (CAP)
  • CRYO HNM6 video (HNM/HNS)
  • Kidspiration document (KID)
  • CRYO UBB video (UBB/HNM)
Added:
  • Cryo Interactive game data (3DC/3DM)
  • Ben Daglish game music (BD)
  • Capella music notation (v2.0, binary) (CAP)
  • Capella music notation (v2.1, binary) (CAP)
  • Capella music notation (v2.2, binary) (CAP)
  • Capella CapXML music notation (zipped) (CAPX)
  • Cryo Interactive game data (UBIK) (BF/OLI)
  • Cryo Interactive game data (DAN)
  • Cryo Interactive game data (DSN)
  • Kidproof settings (v1.0) (KID)
  • DxWnd Log (LOG)
  • PureBasic source (with PB IDE info, UTF-8) (PB)
  • PureBasic Project (PBP)
  • PubCoder project (PUBCODER)
  • TeX Font Metric (0x15) (TFM)
  • Capella CapXML music notation (XML)
  • Capella CapXML music notation (UTF-8) (XML)
9
TrID File Identifier / 2 variants for gfxboot compiled html help (main opt)
« Last post by jenderek on April 09, 2024, 08:50:30 PM »
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

Unfortunately some TFM samples are misidentified as other file formats. One
sample (tri10u.tfm) is misidentified as "gfxboot compiled html help" by
hlp-gfxboot.trid.xml without mime type. The file name suffix shown is HLP (see
appended trid-v-old.txt in output). Real samples of that format can be found
for example inside the package gfxboot-themes. The recognition happens by a
single XML construct that looks like:

   <Bytes>0412</Bytes>
   <Pos>0</Pos>

So only 16 bits are used for recognition. Apparently this is sometimes too
weak. According to the recommendations for the file command at least 32 bits
should be used for recognition.
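
Why such a 2 byte magic collides with TFM can be shown with a little
arithmetic. In a TFM file with the classic layout the first two bytes are the
file length in words, so every such file whose length field happens to equal
the magic value would start with the same bytes. The Python sketch below
computes the matching file size. The function name tfm_size_for_magic is made
up for this example, and I assume here that tri10u.tfm uses the classic
layout.

```python
def tfm_size_for_magic(magic: bytes) -> int:
    """File size in bytes of a classic TFM whose lf field equals the magic."""
    words = int.from_bytes(magic[:2], "big")  # lf is big endian, in 4-byte words
    return words * 4

# 0x0412 = 1042 words, so a TFM file of 4168 bytes starts with 04 12.
print(tfm_size_for_magic(bytes.fromhex("0412")))  # 4168
```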

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here only the HTML samples are
recognized. These are described as "Hypertext Markup Language" with
mime type text/html by PUID fmt/96. The other samples are not
recognized.

For comparison I also ran the file command (version 5.45) on such
samples. Here the HTML and HLP samples are "recognized". The HLP
samples are here also described as "gfxboot compiled html help file"
(see appended file-5.45.txt in output). For these samples only the
generic application/octet-stream mime type is shown (see appended
file-i-5.45.txt in output). No file name suffix is shown (see
appended file-ext-5.45.txt in output). The TFM sample is not
described as "TeX font metric data" (see appended file-k-5.45.txt in
output).

In the current definition a page on SourceForge is used as reference. That
is not wrong, but not really useful, because after following redirects
and invalid links the quintessence is that development now happens on
GitHub. So in the variants I use the gfxboot page on GitHub as reference.
That is expressed by a line like:

   <RefURL>https://github.com/openSUSE/gfxboot</RefURL>

In the current definition the remark describes how the step from HTML to
HLP is done, as described in the GFXBOOT(1) man page. This is done by a
command line like:

   gfxboot --help-create

In this step the tool "compiles" the "readable" HTML text into binary
HLP help pages. These can be considered as "tokenized" HTML pages. How
this happens can be seen when looking inside the Perl script gfxboot.
The relevant lines for identification are like:

    page         => "\x04",      # start new page
    label        => "\x12",      # label start, no text output; label end = "\x13"
    title        => "\x14",      # start page description; ends with "\x10"
    normal       => "\x10",      # back to normal (color, text output)
    li           => "\x16",      # start list item; ends with "\x15" or "\x16"
    ind          => "\x17",      # set indentation
    link         => "\x13",      # label end; set link text color (gfx_color2/3)

So I think it is better to reference the reverse way: how to recreate the
HTML page from the binary HLP file. This is done by commands like:

   gfxboot --help-show en.hlp > en.html
   gfxboot --help-show it.hlp > it.html
   gfxboot --help-show de.hlp > de.html

So we see that the byte sequence 0412 at the beginning means start of a new
page followed by a label start without text output. Afterwards comes the
ASCII like label name.

Now comes the interesting part. In theory you can compile an HTML page
about anything, but in reality the HLP samples are used as help text for
boot loaders like GRUB, syslinux and so on. So in real world examples
I got only 2 label names.

In about half of the samples the first label is the 4 byte string main.
Similar to a C program, where execution starts with the function named main,
here main seems to be used as first label. When generating
hlp-gfxboot-main.trid.xml by running tridscan this is expressed by the first
and characteristic XML construct. That looks like:

   <Bytes>04126D61696E14</Bytes>
   <ASCII> . . m a i n</ASCII>
   <Pos>0</Pos>

In the other half of the samples the first label is the 3 byte string opt.
Apparently these start with the section about options for booting. When
generating hlp-gfxboot-opt.trid.xml by running tridscan this is expressed by
the first and characteristic XML construct. That looks like:

   <Bytes>04126F707414</Bytes>
   <ASCII> . . o p t</ASCII>
   <Pos>0</Pos>

Furthermore we see that after the label name comes a byte with hexadecimal
value 14. That means that afterwards comes the title. When looking at the
output of the patched file command (file.tmp in output) we see that in the
"opt" variant the title is like 'Boot Options' in the English help file,
'Bootoptionen' in the German help file, and 'Opzioni di avvio' in the Italian
help file. In the "main" variant the title is like 'Help voor bootloader' in
the Dutch help file.
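
The start of such a help file can be decoded with a few lines of Python. This
is only a sketch based on the token values quoted from the gfxboot Perl script
and on my samples, where the first label is directly followed by the title
token. The function name first_label_and_title is made up for this example,
and decoding the title as UTF-8 is an assumption.

```python
PAGE, LABEL, TITLE, NORMAL = 0x04, 0x12, 0x14, 0x10  # tokens from gfxboot

def first_label_and_title(data: bytes):
    """Return (label, title) of the first page, or None if the start differs."""
    if data[:2] != bytes([PAGE, LABEL]):
        return None
    title_mark = data.find(bytes([TITLE]), 2)
    if title_mark < 0:
        return None
    label = data[2:title_mark]
    # The title ends with the "back to normal" token 0x10.
    end = data.find(bytes([NORMAL]), title_mark + 1)
    if end < 0:
        end = len(data)
    title = data[title_mark + 1:end]
    return label.decode("latin-1"), title.decode("utf-8", errors="replace")
```

For the English "opt" sample this would be expected to give the label opt
together with the title Boot Options.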

In the global strings section I get lines that are obviously triggered by
phrases used in the context of help with booting. These look like:

   <String>BOOT</String>
   <String>HELP</String>

Then there are phrases with links to specific boot items. These look like:

   <String>O_SPLASH</String>
   <String>O_ACPI</String>
   <String>O_APM</String>
   <String>O_IDE</String>
   <String>SCSI</String>

For older systems (dated about 2000 or earlier) APM was used instead of ACPI
and IDE instead of SCSI disks. At the moment such problem items are explained
in the help files, but maybe in the future the items for such old boot options
will vanish. Then such keywords and the corresponding lines in the definition
will vanish too. Most bootloaders offer the ability to configure the keyboard
layout and to load a different configuration (saved as profile). So we find
the corresponding keywords in the help file and the TrID definition. These are
expressed by lines like:

   <String>2000</String>
   <String>PROFILE</String>
   <String>KEYTABLE</String>

The first two sound too unspecific to me. So I delete these 2 lines.

In the main variant I got more phrases concerning boot parameters. These look like:

   <String>HTTP</String>
   <String>192.168.0.1</String>
   <String>O_VNCPASSWORD</String>
   <String>O_HOSTIP</String>
   <String>O_SPLASH</String>
   <String>O_GATEWAY</String>
   <String>O_INSTALL</String>
   <String>O_NETMASK</String>
   <String>VIDEOMODE</String>
   <String>NOLAPIC</String>
   <String>NOACPI</String>
   <String>INSTALL_SRC</String>
   <String>DRIVERUPDATE</String>
   <String>NETWORK</String>

Here the help files also describe how to configure your network (with a
predefined IP address like 192.168.0.1), how to allow remote desktop access
via the VNC protocol, and where to get driver and source updates. Maybe not
all boot loaders configure the network already, or maybe they use other IP
addresses for the booting computer. So I delete the first two lines, which are
too unspecific for me. But maybe more lines must be deleted if the help is
about bootloaders without network support.

This variant also describes that, instead of booting Linux, the memory
diagnostic tool memtest, the BIOS firmware or the hard disk can be started.
Linux can be started in a 32/64-bit variant or with options to use rescue or
fail safe mode. This is expressed by lines like:

   <String>BITS</String>
   <String>FAILSAFE</String>
   <String>FIRMWARE</String>
   <String>HARDDISK</String>
   <String>RESCUE</String>
   <String>LINUX</String>
   <String>MEMTEST</String>

If the boot loader does not offer such abilities then such lines vanish. For
me the item for the 32/64-bit variant sounds too unspecific, and many
distributions do not offer a 32 bit variant any more. So I delete the
corresponding line.

The gfxboot tool was developed by SUSE. So the help pages describe how to
configure the network in the operating system after it is booted. In SUSE
systems this is done by their own tool called yast2. The reference to this
configuration tool is expressed by a line like:


   <String>YAST2</String>

Most other distributions do not use yast2. So probably in help pages of other
distributions the phrase yast2 does not exist. So I delete that line.


With the new definitions all of my inspected HLP samples are still described,
but the misidentification (like tri10u.tfm) vanishes, because more items are
inspected (see appended trid-v-new.txt and trid-new.txt in output).

TrID definitions, some samples and output are stored in the archive
hlp-tfm.zip. I hope that my definitions can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek
10
Thanks!