Topic: tfm-tex-0x12.trid.xml for TeX Font Metric; variant with lh=0x12

jenderek

Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. These samples have the TFM file name suffix.

Other variants exist, so in this session I will handle only the variant with
lh=0x12. I will explain later what this means.

So I ran the trid utility on my TFM samples with lh=0x12. The samples are not
recognized. Many are described as "Unknown!". For the samples that are not
unknown I get many different descriptions, but all are wrong (see appended
trid-v-old.txt in output). I got many hundreds of TFM samples. It took some
time to get a dozen non-TFM samples which match the misidentified TFM samples.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are also not
recognized, but here I got no false descriptions.

For comparison I also ran the file command (version 5.45) on such samples.
Here all samples are "recognized" and described as "TeX font metric data".
Some more details are shown as well. The coding scheme name is shown in
parentheses {like () (FONTSPECIFIC) (EC Encoding /Cork/) (L7X Encoding
/Lithuanian/) (TEX MATH EXTENSION) (QX Encoding) (UNSPECIFIED)} (see appended
file-k-5.45.txt in output). This can be seen more clearly when using only the
tex magic pattern (see appended file-tex-5.45.txt in output). Unfortunately
for most samples I also get another description when using the keep-going
option -k of the file command. Even worse, in some samples (like aebx7.tfm
cmvtt10.tfm rtxi.tfm rtxmi.tfm texnansi-qplb.tfm txbsyc.tfm zpsycmrv.tfm) the
wrong description comes first. This can be seen more clearly when running the
file command without the keep-going option (see appended file-5.45.txt in
output). For the TFM samples the mime type application/x-tex-tfm is shown (see
appended file-tex-i-5.45.txt in output). Here no file name suffix is shown
(see appended file-tex-ext-5.45.txt in output).

Luckily I found pages about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first one because it also mentions the
Wikipedia link and furthermore lists links to download samples. So the
reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x12.trid.xml.
Afterwards I tried to understand the generated constructs and checked whether
these are always true. According to the specification the six-word (24-byte)
file header contains twelve unsigned 16-bit integers which describe general
TFM characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). Following the
specification I patched the file command (see appended file.tmp in output).
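
The header layout described above can be sketched in a few lines of Python.
This is only an illustration, not part of my patch or of the TrID
definition; the function name parse_tfm_header and the tuple TFM_FIELDS are
made up for this example, the field order is the one from the specification.

```python
import struct

# Field names in the order given by the TFM specification.
TFM_FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh",
              "nd", "ni", "nl", "nk", "ne", "np")


def parse_tfm_header(data: bytes) -> dict:
    """Return the twelve big-endian unsigned 16-bit header fields."""
    if len(data) < 24:
        raise ValueError("a TFM file needs at least a 24-byte header")
    # ">12H" = big endian, twelve unsigned 16-bit integers (24 bytes)
    values = struct.unpack(">12H", data[:24])
    return dict(zip(TFM_FIELDS, values))
```

For a real sample one would pass the first 24 bytes read from the file in
binary mode, for example open("tgoth10.tfm", "rb").read(24).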

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
That means that at least three fields (bc, ec, ne) never exceed 256, and ec
is always lower than 256. Because the fields are stored in big-endian format,
the upper byte of any field below 256 is nil. Apparently nearly all the
others of these twelve fields are below 256 as well. So at even offsets we
have nil bytes. That is expressed by XML constructs like:
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>22</Pos>
   </Pattern>
   ...

The only exceptions are the number of words in the kern table and the file
length. The first value is stored as field nk and is sometimes bigger than
255. By the field order of the specification (lf, lh, bc, ec, nw, nh, nd, ni,
nl, nk, ne, np) nk sits at offset 18; offset 16 holds nl, the number of words
in the lig/kern table. Also the file length is sometimes bigger than 255.
That value is stored at offset 0 as field lf in word units. Multiplying this
value by 4 gives the file size in bytes.
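
The formulas and the file size rule can be checked together. The following
Python sketch is only an illustration under the assumption that the whole
file content is available as bytes; the function name
tfm_header_is_consistent is hypothetical and not taken from any tool.

```python
import struct


def tfm_header_is_consistent(data: bytes) -> bool:
    """Check the constraints quoted from the TFM specification."""
    if len(data) < 24:
        return False
    (lf, lh, bc, ec, nw, nh, nd, ni,
     nl, nk, ne, np_) = struct.unpack(">12H", data[:24])
    # bc-1 <= ec <= 255
    if not (bc - 1 <= ec <= 255):
        return False
    # ne <= 256
    if ne > 256:
        return False
    # lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
    expected_lf = 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np_
    # lf counts 4-byte words, so lf*4 must equal the file size in bytes
    return lf == expected_lf and lf * 4 == len(data)
```

Such a check is stricter than a byte pattern, so it only serves as a sanity
test for samples; it cannot be expressed in a TrID definition.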

Whether samples are real TFM files can be verified by running a command line
tool like:
   tftopl tgoth10.tfm  tgoth10.pl

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For most of my inspected TFM samples this value is 18
(=0x12). The samples in this session all have this value. Together with the
upper nil byte of bc (first character code in the font, at offset 4) this
significant part is expressed by an XML construct like:
   <Bytes>001200</Bytes>
   <Pos>2</Pos>
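
To make clear what such Bytes/Pos pairs test, here is a small Python sketch
that applies the patterns quoted so far to a byte string. It is only an
illustration of fixed-offset matching and not how TrID is implemented; the
names PATTERNS and matches_patterns are made up for this example.

```python
# (offset, expected bytes) pairs copied from the XML snippets in this post.
PATTERNS = [
    (2, bytes.fromhex("001200")),  # lh = 0x0012 plus the nil byte at offset 4
    (20, bytes.fromhex("00")),     # nil byte at offset 20
    (22, bytes.fromhex("00")),     # nil byte at offset 22
]


def matches_patterns(data: bytes, patterns=PATTERNS) -> bool:
    """True if every expected byte sequence is found at its offset."""
    return all(
        data[pos:pos + len(expected)] == expected
        for pos, expected in patterns
    )
```

A file that is shorter than a pattern offset simply does not match, because
the slice is then shorter than the expected bytes.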

As far as I can see there exist fewer than a dozen variants with other lh
values. In the other variants the lh value changes only a little bit. I will
handle the other variants in a future session.

According to the documentation the header[17] word contains a first byte
called the seven_bit_safe_flag, then two bytes that are ignored, and a fourth
byte called the face. When looking in file.tmp for the seven_bit_safe_flag
byte I get value 0 or 0x80. Apparently the two ignored/unused bytes are
always nil. For the face byte I got different values (like 0 0xea 0xee 0xf4).
So the 2 unused bytes are expressed by the last construct. That looks like:
   <Bytes>0000</Bytes>
   <Pos>93</Pos>
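
The position 93 follows from simple arithmetic: the header data starts after
the 24-byte block of twelve fields, so header[17] starts at 24 + 17*4 = 92
and its two unused bytes are at 93 and 94. A Python sketch under that
assumption (read_header17 is a hypothetical helper name, and the word only
exists when lh is at least 18):

```python
HEADER_START = 24  # header data follows the twelve 16-bit fields


def read_header17(data: bytes) -> dict:
    """Split the header[17] word into its documented parts."""
    offset = HEADER_START + 17 * 4  # = 92
    word = data[offset:offset + 4]
    if len(word) < 4:
        raise ValueError("file too short to contain header[17]")
    return {
        "seven_bit_safe_flag": word[0],  # offset 92
        "unused": word[1:3],             # offsets 93 and 94
        "face": word[3],                 # offset 95
    }
```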

At offset 33 an ASCII-like coding scheme name is stored. For misidentified
samples I often got non-ASCII garbage with octal values at this place. The
maximal string length is 39 and this length value is stored in the byte
before. At offset 73 an ASCII-like font family name is stored (like CMR ECBX
DUMMYSPAC-FONTFORGE TEX-PAGD8R-CSC UNSPECIFIED HelveNarBol). The maximal
string length is 19 and this length value is stored in the byte before.
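
Both names are length-prefixed strings, so they can be read the same way. The
Python sketch below only illustrates the offsets given above (length bytes at
32 and 72); the helper names read_length_prefixed and read_names are made up
for this example.

```python
def read_length_prefixed(data: bytes, offset: int, max_len: int) -> str:
    """Read a string whose length is stored in the byte at `offset`."""
    if offset >= len(data):
        raise ValueError("file too short for this string")
    length = data[offset]
    if length > max_len:
        raise ValueError("length byte exceeds the maximal string length")
    raw = data[offset + 1:offset + 1 + length]
    if len(raw) < length:
        raise ValueError("file too short for this string")
    # Non-ASCII bytes are replaced instead of raising, to show garbage
    return raw.decode("ascii", errors="replace")


def read_names(data: bytes) -> tuple:
    """Return (coding scheme, font family) of a TFM with lh >= 18."""
    scheme = read_length_prefixed(data, 32, 39)  # string starts at 33
    family = read_length_prefixed(data, 72, 19)  # string starts at 73
    return scheme, family
```

A length byte above the maximum is a strong hint that a sample is not a real
TFM file.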

With the new definition all TFM samples with header size 0x12 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in that it does not misidentify non-TFM
samples. Unfortunately for some TFM samples the description as TeX Font
Metric is not the first one. The main reason is that the significant
characteristic consists only of the 16-bit lh value.

The TrID definition, some samples and the output are stored in the archive
tfm_0x12.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, there exist other variants of TFM. I will try to
handle these in a future session.

With best wishes
Jörg Jenderek

Mark0

Re: tfm-tex-0x12.trid.xml for TeX Font Metric; variant with lh=0x12
« Reply #1 on: April 01, 2024, 09:12:33 PM »
Thanks!