Recent Posts

Pages: 1 ... 7 8 [9] 10
81
Hello TrID users,

Some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. These samples have the TFM file name suffix.

There exist other variants, so in this session I will handle only the variant
with lh=0x40. I will explain later what this means. I found a few dozen such
samples (like tgoth10.tfm in the directory standard, whose parent directory
ptex-fonts lies inside the fonts subdirectory tfm) after installing MiKTeX
version 23.12 on Windows. On Linux Mint 21.2 I found such samples as part of
the texlive-lang-japanese package, version 2021.20220204-1.

So I ran the trid utility on my TFM samples with lh=0x40. The samples are not
recognized. Many are wrongly described as "Adobe PhotoShop Brush" by
abr.trid.xml with file name suffix (.ABR). Some real ABR samples are described
as "TTComp archive compressed (bin-4K)" by ark-ttcomp-bin-4k.trid.xml (see
appended trid-v-old.txt in output).

It took some time to collect a few non-TFM samples that match the
misidentified TFM samples.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It recognizes none of the samples.

For comparison I also ran the file command (version 5.45) on such
samples. It does not recognize these TFM samples and does not describe them as
"TeX font metric data"; instead they are described as "data". On the other
hand, the ABR samples are not recognized either. Many are described first as
"GDSII Stream file", sometimes with obviously wrong and high version numbers.
Many are also described as "TTComp archive data, binary, 4K dictionary" (see
appended file-k-5.45.txt in output). For these TFM samples the mime type
application/x-tex-tfm is not shown (see appended file-i-5.45.txt in
output), and no file name suffix is shown (see appended file-ext-5.45.txt in
output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first one, because it also mentions the
Wikipedia link and furthermore lists links to download samples. So the
reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x40.trid.xml. Afterwards
I tried to understand the generated constructs and to check whether they are
always true. I thought it would be like the other variants, just with some more
words in the data header, but unfortunately this is less than half of the truth.

According to the mentioned specification, the six-word (24-byte) file header
contains twelve unsigned 16-bit integers that describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). Following the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
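As a rough illustration, the twelve header fields and the formulas above can be checked with a short Python sketch. It assumes the classic layout (fields lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np at offsets 0 to 22); the function names and the synthetic header values are made up for illustration and are not taken from file, tftopl or TrID. Applied to bc=214 and ec=18, it reports the same kind of range violation that tftopl reports for these samples.

```python
import struct

# Names of the twelve unsigned 16-bit big-endian words of the classic TFM header.
FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def parse_header(data: bytes, start: int = 0) -> dict:
    """Unpack the 24-byte header starting at `start` into a field dictionary."""
    return dict(zip(FIELDS, struct.unpack_from(">12H", data, start)))


def check_classic(h: dict) -> list:
    """Return a list of violated specification formulas (empty if consistent)."""
    problems = []
    if not (h["bc"] - 1 <= h["ec"] <= 255):
        problems.append(f"character code range {h['bc']}..{h['ec']}")
    if h["ne"] > 256:
        problems.append(f"ne={h['ne']} exceeds 256")
    expected_lf = (6 + h["lh"] + (h["ec"] - h["bc"] + 1) + h["nw"] + h["nh"]
                   + h["nd"] + h["ni"] + h["nl"] + h["nk"] + h["ne"] + h["np"])
    if h["lf"] != expected_lf:
        problems.append(f"lf={h['lf']} but formula gives {expected_lf}")
    return problems


# Synthetic, self-consistent header: 128 characters (bc=0, ec=127), lh=18.
good = struct.pack(">12H", 180, 18, 0, 127, 10, 5, 4, 2, 0, 0, 0, 7)
# Same header, but with the bc/ec values tftopl reported for tgoth10.tfm.
bad = struct.pack(">12H", 180, 18, 214, 18, 10, 5, 4, 2, 0, 0, 0, 7)

print(check_classic(parse_header(good)))     # []
print(check_classic(parse_header(bad))[0])   # character code range 214..18
```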

The mentioned specification is an archived version on archive.org dated about
2012. Obviously the described items do not match "newer exotic" fonts like
Japanese ones. So I assume that the described items only fully apply to fonts
with 8-bit (or smaller) character codes. The bc values at offset 4 in my
samples were like 108 or 214. The ec values at offset 6 in my samples were 18
(12 hexadecimal). The latter is expressed by an XML construct that looks like:
   <Bytes>0012000000</Bytes>
   <Pos>6</Pos>
So here ec is lower than bc.

If I try to convert them as in the other variants by running a command line
tool like:
   tftopl tgoth10.tfm tgoth10.pl
I get output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
There's some extra junk at the end of the TFM file,
but I'll proceed as if it weren't there.
The character code range 214..18 is illegal!
Sorry, but I can't go on; are you sure this is a TFM?

In other variants the file length in words is stored as a 2-byte big-endian
integer at offset 0. Multiplying this value by 4 gives the file size in bytes.
In this variant this information is stored 4 bytes higher, at offset 4.
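A small Python sketch of that size check follows. The idea (word count times 4 equals file size) comes from the text above; the function names, the candidate offsets (0 for the classic layout, 4 for this variant) and the 12-byte example buffer are illustrative assumptions, not part of any TrID or file definition.

```python
def lf_matches_size(data: bytes, lf_offset: int) -> bool:
    """True if the big-endian 16-bit word count at lf_offset, times 4, equals len(data)."""
    if len(data) < lf_offset + 2:
        return False
    lf_words = int.from_bytes(data[lf_offset:lf_offset + 2], "big")
    return lf_words * 4 == len(data)


def find_lf_offset(data: bytes, candidates=(0, 4)):
    """Return the first candidate offset whose word count matches the size, else None."""
    for offset in candidates:
        if lf_matches_size(data, offset):
            return offset
    return None


# Hypothetical 12-byte buffer: 0x000B at offset 0, 0x0040 at offset 2, 0x0003 at offset 4.
sample = bytes.fromhex("000B0040" + "0003") + bytes(6)
print(find_lf_offset(sample))  # 4
```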

Apparently the coding scheme name is also stored at a higher offset (37=33+4
compared with other variants). So after the coding scheme name (maximal 39
characters), like (JIS X0208), (TEX KANJI TEXT) or (UNSPECIFIED), the remaining
25 padding bytes in my examples are expressed by an XML construct like:
   <Bytes>00000000000000000000000000000000000000000000000000</Bytes>
   <Pos>51</Pos>

Afterwards, at offset 77 (4 bytes higher than in other variants), the font
family name (like MINCHO, GOTHIC, UNSPECIFIED or 'OTF KANJI', maximal 19
characters) is stored. At offset 96 (92 plus 4 compared with other variants)
again the seven-bit-safe byte with value 80h seems to be stored. That is
expressed by an XML construct like:
   <Bytes>0000000000000000800000</Bytes>
   <Pos>88</Pos>
But I do not know if this is always true. So I keep it for the moment and
mention my observations in the remark line.

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For some dozens of my inspected TFM samples this value
is 64 (=0x40). All samples in this session have this value. Together with
other parts this is expressed by an XML construct like:
   <Bytes>004000</Bytes>
   <ASCII> . @</ASCII>
   <Pos>2</Pos>
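To make the <Pos>/<Bytes> constructs concrete, here is a simplified Python matcher that tests only the three constructs quoted in this post (offset 2, the 25 padding bytes at offset 51, and the construct at offset 88 containing the 80h byte at offset 96). It is all-or-nothing, whereas TrID weights its patterns and reports percentages, and the real tfm-tex-0x40.trid.xml contains more patterns; the names below are hypothetical.

```python
# (position, hex bytes) pairs copied from the <Pos>/<Bytes> constructs above.
PATTERNS_0X40 = [
    (2, "004000"),
    (51, "00" * 25),
    (88, "0000000000000000800000"),
]


def matches_all(data: bytes, patterns) -> bool:
    """True if every pattern's bytes occur at its position in data."""
    for pos, hex_bytes in patterns:
        expected = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(expected)] != expected:
            return False
    return True


# Synthetic 120-byte buffer that satisfies the three constructs.
buf = bytearray(120)
buf[3] = 0x40   # lh = 0x0040 at offset 2
buf[96] = 0x80  # the byte observed at offset 96
print(matches_all(buf, PATTERNS_0X40))  # True
```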

As far as I can see, there exist fewer than a dozen variants with other lh
values. In the other variants the lh values differ only a little. I will
handle the other variants in a future session.

Compared with other variants I get more patterns. Because the mentioned
specification does not fully match, I do not know exactly how to interpret
these patterns and whether they are always true. So I keep most patterns. At
higher offsets, after the data header (280=64*4+24), I get short nil sequences
like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>278</Pos>
   </Pattern>
   ...
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>430</Pos>
   </Pattern>
I assume that these are triggered by lucky circumstances (too few
examples). So I delete these patterns.

Unfortunately I found a few dozen samples with lh=0x40 that do not fit my
definition. In these samples I found no ASCII strings like the coding scheme
name and font family name. Maybe the Japanese names are stored in UTF-16 or
similar. Maybe I will try to handle such samples in a future session.

With the new definition most TFM samples with header size 0x40 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. And because it checks more conditions than the other
variants, the description "TeX Font Metric" comes first.

The TrID definitions, some samples and the output are stored in the archive
tfm_0x40.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, there exist other variants of TFM. I will try to
handle these in a future session.

With best wishes
Jörg Jenderek
82
TrID File Identifier / Re: 2 variants for gfxboot compiled html help (main opt)
« Last post by Mark0 on April 12, 2024, 10:16:11 PM »
Thanks!
83
Thanks!
84
Hello TrID users,

Some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. These samples have the TFM file name suffix.

There exist other variants, so in this session I will handle only the variant
with lh=0x78. I will explain later what this means. I found a few dozen such
samples (like nmin10.tfm in the directory nmin-ngoth, whose parent directory
ptex-fonts lies inside the fonts subdirectory tfm) after installing MiKTeX
version 23.12 on Windows. On Linux Mint 21.2 I found such samples as part of
the texlive-lang-japanese package, version 2021.20220204-1.

So I ran the trid utility on my TFM samples with lh=0x78. The samples are not
recognized. Many are wrongly described as "Adobe PhotoShop Brush" by
abr.trid.xml with file name suffix (.ABR). Some real ABR samples are described
as "TTComp archive compressed (bin-4K)" by ark-ttcomp-bin-4k.trid.xml (see
appended trid-v-old.txt in output).

It took some time to collect a few non-TFM samples (like *.gds) that match the
misidentified TFM samples.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It recognizes none of the samples.
The sample with the bin file name suffix is therefore described as "Binary File"
by PUID fmt/208.

For comparison I also ran the file command (version 5.45) on such
samples. It does not recognize these TFM samples and does not describe them as
"TeX font metric data"; instead they are described as "data". On the other
hand, the ABR samples are not recognized either. Many are described first as
"GDSII Stream file", sometimes with obviously wrong and high version numbers.
Many are also described as "TTComp archive data, binary, 4K dictionary" (see
appended file-k-5.45.txt in output). For these TFM samples the mime type
application/x-tex-tfm is not shown (see appended file-i-5.45.txt in
output), and no file name suffix is shown (see appended file-ext-5.45.txt in
output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first one, because it also mentions the
Wikipedia link and furthermore lists links to download samples. So the
reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x78.trid.xml. Afterwards
I tried to understand the generated constructs and to check whether they are
always true. I thought it would be like the other variants, just with some more
words in the data header, but unfortunately this is less than half of the truth.

According to the mentioned specification, the six-word (24-byte) file header
contains twelve unsigned 16-bit integers that describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). Following the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np

The mentioned specification is an archived version on archive.org dated about
2012. Obviously the described items do not match "newer exotic" fonts like
Japanese ones. So I assume that the described items only fully apply to fonts
with 8-bit (or smaller) character codes. According to the mentioned
specification the first character code (bc) is "too high" (like 276 or 299),
that is, above 255. The ec values in my samples were 18 (12 hexadecimal). So
here ec is lower than bc.

If I try to convert them as in the other variants by running a command line
tool like:
   tftopl goth10.tfm goth10.pl
I get output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
There's some extra junk at the end of the TFM file,
but I'll proceed as if it weren't there.
The character code range 299..18 is illegal!
Sorry, but I can't go on; are you sure this is a TFM?

In other variants the file length in words is stored as a 2-byte big-endian
integer at offset 0. Multiplying this value by 4 gives the file size in bytes.
In this variant this information is stored 4 bytes higher, at offset 4.

Apparently the coding scheme name is also stored at a higher offset (37=33+4
compared with other variants). So the coding scheme name (TEX KANJI TEXT),
15 bytes including its length byte 0Eh (maximal 39 characters), is expressed
in my examples by an XML construct like:

 <Bytes>00000E544558204B414E4A4920544558540000000000000000000000000000000000000000000000000006</Bytes>
 <ASCII> . . . T E X   K A N J I   T E X T</ASCII>
 <Pos>34</Pos>
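The coding scheme name is a length-prefixed string (one length byte, then the characters). As a sketch, the construct above can be decoded in Python by placing its bytes at offset 34 of an otherwise empty buffer. The helper name read_prefixed is made up, and the offsets 36 and 76 follow the observations for this 0x78 variant, not the classic TFM layout.

```python
CONSTRUCT_POS = 34
CONSTRUCT_HEX = ("00000E544558204B414E4A492054455854"
                 "0000000000000000000000000000000000000000000000000006")


def read_prefixed(data: bytes, offset: int, max_len: int) -> str:
    """Read one length byte at offset, then that many ASCII characters."""
    length = data[offset]
    if length > max_len:
        raise ValueError(f"length {length} exceeds maximum {max_len}")
    return data[offset + 1:offset + 1 + length].decode("ascii")


buf = bytearray(CONSTRUCT_POS) + bytes.fromhex(CONSTRUCT_HEX)
print(read_prefixed(buf, 36, 39))  # TEX KANJI TEXT
print(buf[76])                     # 6, the length byte of the family name at 77
```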

Afterwards, at offset 77 (4 bytes higher than in other variants), the 6-byte
font family name (like MINCHO or GOTHIC, maximal 19 characters) is stored. At
offset 96 (92 plus 4 compared with other variants) again the seven-bit-safe
byte with value 80h seems to be stored. That is expressed by an XML construct
like:
   <Bytes>00000000000000000000000000800000</Bytes>
   <Pos>83</Pos>
But I do not know if this is always true. So I keep it for the moment and
mention my observations in the remark line.

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For some dozens of my inspected TFM samples this value
is 120 (=0x78). All samples in this session have this value. Together with
other parts this is expressed by an XML construct like:
   <Bytes>000B007801</Bytes>
   <ASCII> . . . x</ASCII>
   <Pos>0</Pos>

As far as I can see, there exist fewer than a dozen variants with other lh
values. In the other variants the lh values differ only a little. I will
handle the other variants in a future session.

Compared with other variants I get more patterns. Because the mentioned
specification does not fully match, I do not know exactly how to interpret
these patterns and whether they are always true. So I keep most patterns. At
higher offsets I get short nil sequences like:
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>632</Pos>
      </Pattern>
      ...
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>984</Pos>
      </Pattern>
      ..
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>1100</Pos>
      </Pattern>
I assume that these are triggered by lucky circumstances (too few
examples). So I delete these patterns.

With the new definition all TFM samples with header size 0x78 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. And because it checks more conditions than the other
variants, the description "TeX Font Metric" comes first.

The TrID definitions, some samples and the output are stored in the archive
tfm_0x78.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, there exist other variants of TFM. I will try to
handle these in a future session.

With best wishes
Jörg Jenderek

85
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 11, 2024, 06:33:20 PM »
Updated:
  • Cryo Interactive APC audio (APC)
  • BRSTM audio (BRSTM)
  • Capella music notation (v3, binary) (CAP)
  • CRYO HNM6 video (HNM/HNS)
  • Kidspiration document (KID)
  • CRYO UBB video (UBB/HNM)
Added:
  • Cryo Interactive game data (3DC/3DM)
  • Ben Daglish game music (BD)
  • Capella music notation (v2.0, binary) (CAP)
  • Capella music notation (v2.1, binary) (CAP)
  • Capella music notation (v2.2, binary) (CAP)
  • Capella CapXML music notation (zipped) (CAPX)
  • Cryo Interactive game data (UBIK) (BF/OLI)
  • Cryo Interactive game data (DAN)
  • Cryo Interactive game data (DSN)
  • Kidproof settings (v1.0) (KID)
  • DxWnd Log (LOG)
  • PureBasic source (with PB IDE info, UTF-8) (PB)
  • PureBasic Project (PBP)
  • PubCoder project (PUBCODER)
  • TeX Font Metric (0x15) (TFM)
  • Capella CapXML music notation (XML)
  • Capella CapXML music notation (UTF-8) (XML)
86
TrID File Identifier / 2 variants for gfxboot compiled html help (main opt)
« Last post by jenderek on April 09, 2024, 08:50:30 PM »
Hello TrID users,

Some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. These samples have the TFM file name suffix.

Unfortunately other TFM samples are misidentified as other file formats. One
sample (tri10u.tfm) is misidentified as "gfxboot compiled html help" by
hlp-gfxboot.trid.xml without a mime type. The file name suffix shown is HLP
(see appended trid-v-old.txt in output). Such HLP samples can be found, for
example, inside the package gfxboot-themes. The recognition happens by one
XML construct that looks like:

   <Bytes>0412</Bytes>
   <Pos>0</Pos>

So only 16 bits are used for recognition. Apparently this is sometimes too
weak. According to the file command recommendations, at least 32 bits should
be used for recognition.
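A back-of-the-envelope Python calculation illustrates why 16 bits are weak. It assumes, unrealistically, that the first bytes of unrelated files are uniformly random. It also uses the classic TFM layout described in the TFM posts on this page (file length lf in 4-byte words at offset 0), which means any TFM file whose lf happens to be 0x0412 starts with exactly the 0412 signature; whether that is the cause for tri10u.tfm is not checked here.

```python
SIGNATURE = bytes.fromhex("0412")

# Probability that a uniformly random value matches a 16-bit or 32-bit signature.
p16 = 1 / 2 ** 16
p32 = 1 / 2 ** 32
print(f"16 bit: 1 in {round(1 / p16)}, 32 bit: 1 in {round(1 / p32)}")

# A TFM file stores its length in words at offset 0 (big-endian), so a TFM
# whose lf field is 0x0412 begins with the same two bytes as the HLP signature.
lf_words = int.from_bytes(SIGNATURE, "big")
print(lf_words, lf_words * 4)  # 1042 4168
```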

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It recognizes only the HTML
samples. These are described as "Hypertext Markup Language" with mime type
text/html by PUID fmt/96. The other samples are not recognized.

For comparison I also ran the file command (version 5.45) on such
samples. Here the HTML and HLP samples are "recognized". The HLP samples
are described as "gfxboot compiled html help file" (see appended
file-5.45.txt in output). For these samples the generic
application/octet-stream mime type is shown (see appended file-i-5.45.txt in
output). No file name suffix is shown (see appended file-ext-5.45.txt in
output). The TFM sample is not described as "TeX font metric data" (see
appended file-k-5.45.txt in output).

In the current definition a page on SourceForge is used as reference. That
is not wrong, but not really useful, because after redirects and invalid
links the quintessence is that development now happens on GitHub. So in the
variants I use the gfxboot page on GitHub as reference. That is expressed by
a line like:

   <RefURL>https://github.com/openSUSE/gfxboot</RefURL>

The remark in the current definition describes how the step from HTML to HLP
is done, as explained in the GFXBOOT(1) man page. This is done by a command
line like:

   gfxboot --help-create

In this step the tool "compiles" the "readable" HTML text and generates
binary HLP help pages. These can be considered "tokenized" HTML pages. How
this happens can be seen by looking inside the perl script gfxboot. The
relevant lines for identification are like:

    page         => "\x04",      # start new page
    label        => "\x12",      # label start, no text output; label end = "\x13"
    title        => "\x14",      # start page description; ends with "\x10"
    normal       => "\x10",      # back to normal (color, text output)
    li           => "\x16",      # start list item; ends with "\x15" or "\x16"
    ind          => "\x17",      # set indentation
    link         => "\x13",      # label end; set link text color (gfx_color2/3)

So I think it is better to reference the reverse way: how to recreate the
HTML page from a binary HLP file. This is done by commands like:

   gfxboot --help-show en.hlp > en.html
   gfxboot --help-show it.hlp > it.html
   gfxboot --help-show de.hlp > de.html

So we see that the byte sequence 0412 at the beginning means "start new page"
followed by "label" with no text output. Afterwards comes the ASCII-like
label name.

Now comes the interesting part. In theory you can compile an HTML page
about anything, but in reality the HLP samples are used as help text for
boot loaders like GRUB, syslinux and so on. So in real-world examples
I got only 2 label names.

In about half of the samples the first label is the 4-byte string main.
Similar to a C program, whose entry point is the function main, main seems to
be used here as the first label. When generating hlp-gfxboot-main.trid.xml by
running tridscan, this is expressed by the first and characteristic XML
construct that looks like:

   <Bytes>04126D61696E14</Bytes>
   <ASCII> . . m a i n</ASCII>
   <Pos>0</Pos>

In the other half of the samples the first label is the 3-byte string opt.
Apparently these help files start with a section about boot options. When
generating hlp-gfxboot-opt.trid.xml by running tridscan, this is expressed by
the first and characteristic XML construct that looks like:

   <Bytes>04126F707414</Bytes>
   <ASCII> . . o p t</ASCII>
   <Pos>0</Pos>

Furthermore we see that after the label name comes a byte with hexadecimal
value 14. That means that a title follows. When looking at the output of the
patched file command (file.tmp in output), we see that in the "opt" variant
the title is like 'Boot Options' in the English help file, 'Bootoptionen' in
the German help file, and 'Opzioni di avvio' in the Italian help file. In the
"main" variant the title is like 'Help voor bootloader' in the Dutch help file.
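The token logic above can be sketched in Python: check the page token 04h, the label token 12h, a printable label name and the title token 14h. The token values come from the perl lines quoted above; treating every byte >= 20h as part of the label name is an assumption made for this sketch, and first_label is a hypothetical helper, not part of gfxboot or TrID.

```python
# Token values from the gfxboot perl script lines quoted above.
PAGE, LABEL, TITLE = 0x04, 0x12, 0x14


def first_label(data: bytes):
    """Return the first label name of a gfxboot HLP file, or None if the layout differs."""
    if len(data) < 3 or data[0] != PAGE or data[1] != LABEL:
        return None
    end = 2
    while end < len(data) and data[end] >= 0x20:  # assume label text is printable
        end += 1
    if end == 2 or end == len(data) or data[end] != TITLE:
        return None  # empty label or no title token right after the label
    return data[2:end].decode("ascii", errors="replace")


print(first_label(bytes.fromhex("04126D61696E14")))  # main
print(first_label(bytes.fromhex("04126F707414")))    # opt
print(first_label(bytes.fromhex("0412001500")))      # None (TFM-like bytes)
```

Checking the label and the following title token, instead of only the first two bytes, is what lets the longer signatures reject files that merely start with 0412.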

In the global strings section I get lines that are obviously triggered by
phrases used in the context of boot help. These look like:

   <String>BOOT</String>
   <String>HELP</String>

Then there are phrases with links to specific boot items. These look like:

   <String>O_SPLASH</String>
   <String>O_ACPI</String>
   <String>O_APM</String>
   <String>O_IDE</String>
   <String>SCSI</String>

On older systems (dated about 2000 or earlier) APM was used instead of ACPI,
and IDE disks instead of SCSI disks. At the moment such problem items are
explained in help files, but in the future items for such old boot options
may vanish. Then such keywords and the corresponding lines in the definition
will vanish as well. Most bootloaders offer the ability to configure the
keyboard layout and to load a different configuration (saved as a profile).
So we find corresponding keywords in the help file and in the TrID
definition. These are expressed by lines like:

   <String>2000</String>
   <String>PROFILE</String>
   <String>KEYTABLE</String>

The first two sound too unspecific to me, so I delete these 2 lines.

In the main variant I got more phrases concerning boot parameters. These look like:

   <String>HTTP</String>
   <String>192.168.0.1</String>
   <String>O_VNCPASSWORD</String>
   <String>O_HOSTIP</String>
   <String>O_SPLASH</String>
   <String>O_GATEWAY</String>
   <String>O_INSTALL</String>
   <String>O_NETMASK</String>
   <String>VIDEOMODE</String>
   <String>NOLAPIC</String>
   <String>NOACPI</String>
   <String>INSTALL_SRC</String>
   <String>DRIVERUPDATE</String>
   <String>NETWORK</String>

Here the help files also describe how to configure your network (with a
predefined IP address like 192.168.0.1), how to allow remote desktop access
via the VNC protocol, and where to get driver and source updates. Maybe not
all boot loaders configure the network, or they use other IP addresses for
the booting computer. So I delete the first two lines, which are too
unspecific for me. But maybe more lines must be deleted if the help is about
bootloaders without network stuff.

This variant also describes that, instead of booting Linux, the memory
diagnostic tool memtest, the BIOS firmware or the hard disk can be started.
Linux can be started in a 32/64-bit variant or with options to use rescue or
fail-safe mode. This is expressed by lines like:

   <String>BITS</String>
   <String>FAILSAFE</String>
   <String>FIRMWARE</String>
   <String>HARDDISK</String>
   <String>RESCUE</String>
   <String>LINUX</String>
   <String>MEMTEST</String>

If the boot loader does not offer such abilities, such lines vanish. For me
the item with the 32/64-bit variant sounds too unspecific, and many
distributions do not offer a 32-bit variant any more. So I delete the
corresponding line.

The gfxboot tool was developed by SUSE. So the help pages describe how to
configure the network in the operating system after it is booted. In SUSE
systems this is done by their own tool called yast2. So the reference to this
configuration tool is expressed by a line like:

   <String>YAST2</String>

Most other distributions do not use yast2, so the phrase yast2 probably does
not exist in the help pages of other distributions. So I delete that line.

With the new definitions all of my inspected HLP samples are still described,
but the misidentification (like tri10u.tfm) vanishes, because more items are
inspected (see appended trid-v-new.txt and trid-new.txt in output).

The TrID definitions, some samples and the output are stored in the archive
hlp-tfm.zip. I hope that my definitions can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek
87
Thanks!
88
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 08, 2024, 05:46:54 PM »
Updated:
  • Blender 3D data (BLEND)
  • EDIF Netlist (EDN)
  • interLaced eXtensible Trace (v1) (LXT)
  • Personal Ancestral File (v5) (PAF)
Added:
  • Conquest: Frontier Wars game data ()
  • Microsoft ReadImg/WriteImg disk image ()
  • PureBasic resident data (Amiga) ()
  • Origin Systems's setup Archive (A01)
  • Breach 3 campaign data (B3S)
  • CADSTAR PCB Archive (ASCII, DOS/Win) (CPA)
  • CADSTAR PCB Archive (ASCII, Linux) (CPA)
  • CADSTAR Schematic Archive (ASCII, DOS/Win) (CSA)
  • CADSTAR Schematic Archive (ASCII, Linux) (CSA)
  • PureBasic Help (Linux) (HELP)
  • FDIMAGE disk image (IMG)
  • JEMM memory manager (OVL)
  • CADIF format (generic) (PAF)
  • CADIF format (v4) (PAF)
  • CADIF format (v6) (PAF)
  • CADIF format (v7) (PAF)
  • Personal Ancestral File (v3) (PAF)
  • Personal Ancestral File (v4) (PAF)
  • CADSTAR PCB design (binary) (PCB)
  • PureBasic Resident data (Linux/OS X) (RES)
  • TeX Font Metric (0x11) (TFM)
  • My Family Tree data (TRE)
89
Hello TrID users,

Some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. These samples have the TFM file name suffix.

There exist other variants, so in this session I will handle only the variant
with lh=0x15. I will explain later what this means.

So I ran the trid utility on my TFM samples with lh=0x15. The samples are not
recognized. Many are described as "Unknown!". For the samples that are not
unknown I get many different descriptions, but all are wrong (see appended
trid-v-old.txt in output). I found many dozens of such TFM samples. It took
some time to collect a dozen non-TFM samples that match the misidentified TFM
samples.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are also not
recognized, but I got no false description.

For comparison I also ran the file command (version 5.45) on such
samples. None of the samples is recognized and described as "TeX font metric
data". Many are described as "data". Unfortunately for some samples I also get
other descriptions (see appended file-5.45.txt in output). For these TFM
samples the mime type application/x-tex-tfm is not shown (see appended
file-i-5.45.txt in output), and no file name suffix is shown (see appended
file-tex-ext-5.45.txt in output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first one, because it also mentions the
Wikipedia link and furthermore lists links to download samples. So the
reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x15.trid.xml. Afterwards
I tried to understand the generated constructs and to check whether they are
always true. According to the specification, the six-word (24-byte) file
header contains twelve unsigned 16-bit integers that describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). Following the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np

That means that at least three fields (bc, ec, ne) are at most 256 (ec even at
most 255), so in practice they are below 256. Because the files are stored in
big-endian format, the upper bytes of these fields are nil. Apparently nearly
all others of these twelve fields are also below 256. So at even offsets we
have nil bytes. That is expressed by XML constructs like:
   ...
   <Pattern>
      <Bytes>0001</Bytes>
      <Pos>14</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>18</Pos>
   </Pattern>
   ..
The only exceptions are the file length in words and the number of words in
the lig/kern table. The file length is sometimes bigger than 255. That value
is stored at offset 0 as field lf in word units. Multiplying this value by 4
gives the file size in bytes.
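The nil-byte observation can be checked with a small Python sketch that lists which of the twelve header fields have a non-zero upper byte. The example header is synthetic: it satisfies the lf formula above and uses the observations for this variant (lh=21, nh=10h, ni=1, ne=0, np=7), while nw, nd, bc and ec are invented for illustration.

```python
import struct

FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def nonzero_upper_bytes(header: bytes) -> list:
    """Names of header fields whose upper (even-offset) byte is not nil."""
    return [name for i, name in enumerate(FIELDS) if header[2 * i] != 0]


# lf = 6+21+(255-0+1)+10+16+4+1+0+0+0+7 = 321 = 0x0141, so only lf exceeds 255.
example = struct.pack(">12H", 321, 21, 0, 255, 10, 16, 4, 1, 0, 0, 0, 7)
print(nonzero_upper_bytes(example))  # ['lf']
```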

At offset 10 the number of words in the height table is stored as a 2-byte
big-endian integer. In this variant this value is always 10h. That is
expressed by an XML construct that looks like:
   <Pattern>
      <Bytes>001000</Bytes>
      <Pos>10</Pos>
   </Pattern>
But I do not know if this is always true. So I keep it for the moment and
mention my observations in the remark line.

At offset 14 the number of words in the italic correction table is stored as
a 2-byte big-endian integer. In this variant this value is always 1. That is
expressed by an XML construct that looks like:
   <Pattern>
      <Bytes>0001</Bytes>
      <Pos>14</Pos>
   </Pattern>
But I do not know if this is always true. So I keep it for the moment and
mention my observations in the remark line.

At offset 20 the number of words in the extensible character table is stored
as a 2-byte big-endian integer. In this variant this value is always 0. At
offset 22 the number of font parameter words is stored as a 2-byte big-endian
integer. In this variant this value is always 7. That is expressed by an XML
construct that looks like:
   <Pattern>
      <Bytes>00000007</Bytes>
      <Pos>20</Pos>
   </Pattern>
But I do not know if this is always true. So I keep it for the moment and
mention my observations in the remark line.

Then the only remaining construct looks like:
   <Bytes>00A00000</Bytes>
   <Pos>28</Pos>

According to the documentation this is element header[1], the design size of
the font, stored as a fix_word (4 bytes) in units of TeX points. So in the
samples the "value" is 00A00000. In the other variant I got different values.
But I do not know if this is always true. So I keep it for the moment and
mention my observations in the remark line.
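Assuming the fix_word definition from the TFM documentation (a signed 32-bit big-endian number with 20 fractional bits), the observed value 00A00000 corresponds to a design size of 10.0 TeX points. A minimal Python sketch of the conversion; the function name is made up:

```python
def fix_word_to_float(raw: bytes) -> float:
    """Decode a 4-byte TFM fix_word: signed big-endian integer divided by 2**20."""
    if len(raw) != 4:
        raise ValueError("a fix_word has exactly 4 bytes")
    return int.from_bytes(raw, "big", signed=True) / 2 ** 20


design_size = fix_word_to_float(bytes.fromhex("00A00000"))  # header[1] at offset 28
print(design_size)  # 10.0
```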

Whether the samples are real TFM files can be verified by running a command
line tool like:
   tftopl tri10u.tfm tri10u.pl

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For some dozens of my inspected TFM samples this value
is 21 (=0x15). All samples in this session have this value. Together with the
upper nil byte of bc (first character code in the font) this significant part
is expressed by an XML construct like:
   <Bytes>001500</Bytes>
   <Pos>2</Pos>

As far as I can see, there exist fewer than a dozen variants with other lh
values. In the other variants the lh values differ only a little. I will
handle the other variants in a future session.

The next characteristic construct in this variant looks like:
 <Bytes>0000000000000000000000000000000000000000000000000000000000000000000000000948504155544F54464D00000000000000000000800000</Bytes>
 <ASCII> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H P A U T O T F M</ASCII>
 <Pos>36</Pos>
At offset 33 an ASCII-like coding scheme name is stored. The maximal string
length is 39, and the length value is stored in the byte before. In this
variant, with only some dozen samples, the coding scheme names are very short
(like 6J, 8U, 9T or 10U; see file.tmp in output). So the longest length is 3.
That means the remaining 36 bytes of the name field are unused, so we get 36
nil bytes at offset 36. At offset 73 an ASCII-like font family name is stored.
The length of this string is always 9 in this variant; this length value is
stored in the byte before, and the font family in all samples of this variant
is HPAUTOTFM. Like in the variant with lh=0x12, according to the documentation
the header[17] word contains a first byte called the seven_bit_safe_flag, then
two bytes that are ignored, and a fourth byte called the face. When looking in
file.tmp for the seven_bit_safe byte, I always get the value 0x80. Apparently
for the two ignored/unused bytes I get nil values. For the face byte I got
different values (like 0, 1, 3). Assuming that samples with longer coding
scheme names (up to 39 bytes) may exist, the above construct becomes:
 <Bytes>0948504155544F54464D00000000000000000000800000</Bytes>
 <ASCII> . H P A U T O T F M</ASCII>
 <Pos>72</Pos>
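With the construct above placed at offset 72, the family name and the header[17] bytes can be decoded as sketched below in Python. The offsets follow from 24 + 4*12 = 72 and 24 + 4*17 = 92; the face value 3 appended at offset 95 is just one of the observed values (0, 1, 3), chosen for illustration, and the constant names are made up.

```python
HEADER_START = 24                      # header[0] starts after the 24-byte file header
FAMILY_OFFSET = HEADER_START + 12 * 4  # 72: length byte of the font family name
HEADER17 = HEADER_START + 17 * 4       # 92: seven_bit_safe_flag, 2 ignored bytes, face

construct = bytes.fromhex("0948504155544F54464D00000000000000000000800000")
buf = bytearray(FAMILY_OFFSET) + construct + bytes([3])  # face byte 3 at offset 95

family = buf[FAMILY_OFFSET + 1:FAMILY_OFFSET + 1 + buf[FAMILY_OFFSET]].decode("ascii")
flag, ignored1, ignored2, face = buf[HEADER17:HEADER17 + 4]
print(family, hex(flag), ignored1, ignored2, face)  # HPAUTOTFM 0x80 0 0 3
```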
I do not know whether samples with other font families exist or whether the
HPAUTOTFM family is a characteristic of the variant with lh=15h. So I keep
this and mention my observations in the remark line. I found samples in the
subdirectories helvetica and symbol of the parent directory monotype, which
itself is found in /usr/share/texlive/texmf-dist/fonts/tfm (Linux Mint 21.2).

Compared with the first variant, the data header contains 3 more elements:
header[18] at offset 96, header[19] at offset 100 and header[20] at offset
104. For these additional elements I got constant values at some locations.
That is expressed by XML constructs like:
   <Pattern>
      <Bytes>4B4E</Bytes>
      <ASCII> K N</ASCII>
      <Pos>96</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>99</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>102</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>104</Pos>
   </Pattern>
I do not know if this really is characteristic for this variant. So I keep it
and mention my observation in the remark line.

Then, according to the documentation, the next structure starts here at
offset 108: the array char_info. The unit of this array is char_info_word (4
bytes). Like in the 0x11 variant, apparently parts of char_info_word are often
nil. These observations are expressed by constructs like:
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>110</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>114</Pos>
   </Pattern>
   ...
I do not understand what this means and why, but this is not relevant at the
moment. If I understand the documentation right, in the worst case bc (first
character code) is equal to ec (last character code). That means this array
would contain only 1 element. It occupies offsets 108 to 111, so the next
structure would start at offset 112. So I can delete all constructs at offset
112 and higher. Then only one construct survives. It describes the first
element char_info[0] and looks like:
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>110</Pos>
   </Pattern>
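Assuming the char_info_word layout given in the TFM documentation (width_index 8 bits, height_index and depth_index 4 bits each, italic_index 6 bits, tag 2 bits, remainder 8 bits), the surviving pattern 0000 at offset 110 means that bytes 2 and 3 of char_info[0] are nil, i.e. italic_index, tag and remainder are 0. A Python sketch of the decoding; the first two bytes in the example and the function name are invented:

```python
CHAR_INFO_START = 24 + 0x15 * 4  # 108 for lh = 0x15


def decode_char_info(word: bytes) -> dict:
    """Split one 4-byte char_info_word into its bit fields."""
    b0, b1, b2, b3 = word
    return {
        "width_index": b0,
        "height_index": b1 >> 4,
        "depth_index": b1 & 0x0F,
        "italic_index": b2 >> 2,
        "tag": b2 & 0x03,
        "remainder": b3,
    }


# Hypothetical char_info[0] at offsets 108..111; bytes at 110..111 are the observed 0000.
info = decode_char_info(bytes.fromhex("05230000"))
print(info)
print("next structure would start at offset", CHAR_INFO_START + 4)  # 112
```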

With the new definition all TFM samples with header size 0x15 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. And because it checks more conditions than the other
variants, the description "TeX Font Metric" comes first.

The TrID definitions, some samples and the output are stored in the archive
tfm_0x15.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, there exist other variants of TFM. I will try to
handle these in a future session.

With best wishes
Jörg Jenderek
90
Thanks! I will rename it as the older def, so it will take its place.