Recent Posts

61
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on May 09, 2024, 02:53:22 AM »
Added:
  • CADSTAR data (generic) ()
  • Manga Studio data (generic) ()
  • Manga Studio Page (CPG)
  • Craft Factory design (CRA)
  • Manga Studio Story (CST)
  • Acorn CP/M disk image (DSD)
  • 16bit DOS COM2EXE (Tom Torfs) executable (v1.0) (EXE)
  • 32bit DOS Executable WDOSX extender (v0.97) (EXE)
  • CADSTAR Report Generator (format 2) (RGF)
62
Thanks!
63
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on May 05, 2024, 02:28:20 PM »
Updated:
  • Boost serialization archive (binary, 32bit) ()
  • Fold compressed archive (ARK)
  • OvationPro document (DPD)
  • 16bit DOS EXE Fold sfx compressed (v1.x) (EXE)
  • Fold compressed (FOL)
  • XPK compressed data (XPKF/XPK)
Added:
  • Boost serialization archive (binary, 64bit) ()
  • YAS serialized data (binary) ()
  • BootX BootBlocks Library (BBLIB)
  • BootX learned bootblocks (BRAIN)
  • Dir Logo Maker bitmap (DLM)
  • 16bit DOS EXE Fold SFX compressed archive (v1.16e) (EXE)
  • STALCRAFT Model (MCSA/MCVD)
  • STALCRAFT texture (OL)
  • Pack List Archive (PKA)
  • BootX bootblock Recognition data (RECOG)
  • TommySoftware CAD/Draw Library (v2) (T2L)
  • MediaWiki XML export (v1.x) (XML)
64
Hello trid users,

A few days ago I had trouble booting a UEFI system with GRUB and Secure
Boot. So I first looked at the files on the EFI partition mounted on
/boot/efi. In a subdirectory with a name like ubuntu I found files like
BOOTX64.CSV with the file name suffix CSV.

So I ran the trid utility on my CSV samples. Most of the samples are not
recognized. Some "artificial" samples with a BOM (byte order mark) are
described as "Text - UTF-16 (LE) encoded" by txt-utf-16-le.trid.xml with mime
type text/plain and file name suffix (.TXT) (see appended trid-old.txt and
trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here all samples are
recognized. They are described as "Comma Separated Values" by PUID x-fmt/18
with mime type text/csv. But the recognition here is based only on the file
name suffix (see appended droid-sbat.csv in output).

For comparison I also ran the file command (version 5.45) on these
samples. Here most CSV samples, especially all real-world examples, are not
recognized and are described as "data". A few, like sbat.csv, are described as
"CSV Unicode text" with UTF-16, little-endian encoding. A few are described as
"Unicode text" with UTF-16, little-endian encoding (see appended file-5.45.txt
in output). For the "recognized" samples the mime type text/csv or
text/plain is shown (see appended file-i-5.45.txt in output). No file
name suffix is shown (see appended file-ext-5.45.txt in output).

Luckily I found a page about the UEFI shim boot loader on the GitHub web
site, so I use it. The reference URL in the new definition is expressed by a
line like:
   <RefURL>https://github.com/rhboot/shim/blob/main/SBAT.md</RefURL>
Unfortunately no precise file format for this CSV is listed there.
Some more details can be found on Rod Smith's page about managing EFI boot
loaders for Linux and fallback. See:
   https://www.rodsbooks.com/efi-bootloaders/fallback.html

The CSV samples contain comma-separated values (CSV). One line contains 4 data
elements separated by commas (filename, label, options, description).
Unfortunately the exact encoding is not mentioned, but apparently for
real-world examples this is UTF-16 (little endian) without a BOM (byte order
mark). But I do not know if this is always true, so I mention my observations
in the remark line. Because ASCII-like strings in the samples are stored as
UTF-16 LE, I get nil bytes at odd offsets. That is expressed by XML constructs
like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>1</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>3</Pos>
   </Pattern>
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>107</Pos>
   </Pattern>
Maybe there exist samples with the label or description field in
Chinese. Then the full 16 bits are used for characters and the nil bytes at
higher offsets will vanish. At the beginning the file name of the UEFI
bootable is stored. For system files, all operating systems I know use
English-based names. So the nil bytes at lower offsets will probably always
be present.
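The odd-offset observation can be tried out with a few lines of Python. This is only a minimal sketch: the function name and the sample line are made up for illustration, and the limit of 108 bytes just mirrors the highest pattern position (107) quoted above.

```python
def odd_offsets_are_nil(data: bytes, limit: int = 108) -> bool:
    """True if every odd offset below `limit` holds a nil byte."""
    window = data[:limit]
    odd_bytes = window[1::2]
    # An empty or 1-byte input has no odd offsets to test, so reject it.
    return len(odd_bytes) > 0 and all(b == 0 for b in odd_bytes)


# Hypothetical sample line, encoded the way the real-world files appear to be.
line = "shimx64.efi,ubuntu,,This is the boot entry for ubuntu\n"
print(odd_offsets_are_nil(line.encode("utf-16-le")))  # ASCII-like text as UTF-16 LE
print(odd_offsets_are_nil(line.encode("ascii")))      # the same text as plain ASCII
```

A label or description in Chinese would put non-nil bytes at odd offsets, so such a check is only safe for the leading file name part.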

At the beginning the file name of the bootable executable is stored. In
real-world examples I found similar strings here (shimx64.efi, shimia32.efi,
refind_x64.efi). The UEFI stuff is mainly pushed by Intel and Microsoft. The
standard bootable on such systems is like BOOTX64.EFI or bootia32.efi. So I
assume that other partners use the Windows convention to characterize such
bootable executables by the 4-byte .efi string at the end of the file name.
That is expressed inside the global strings section by a line like:
      <String>.'E'F'I</String>
So this is probably always true.

The description field in the real-world samples I inspected starts with the
phrase "This is the boot entry for " followed by the Linux distribution name
(like redhat or ubuntu). That is expressed inside the global strings section
by a line like:
   <String>T'H'I'S' 'I'S' 'T'H'E' 'B'O'O'T' 'E'N'T'R'Y' 'F'O'R</String>
Maybe there exist some exotic samples with Chinese text. Then the above
construct would no longer be true.
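To look at the four fields directly, a sample can be decoded as UTF-16 LE and split with a CSV reader. The following is a minimal sketch under the assumptions stated above (no BOM required, 4 fields per line); the function names and the sample line are hypothetical, not taken from any tool.

```python
import csv
import io

FIELDS = ("filename", "label", "options", "description")


def parse_boot_csv(data: bytes) -> list:
    """Decode UTF-16 LE bytes and return one dict per 4-field line."""
    text = data.decode("utf-16-le").lstrip("\ufeff")  # tolerate an optional BOM
    rows = []
    for row in csv.reader(io.StringIO(text, newline="")):
        if len(row) == len(FIELDS):
            rows.append(dict(zip(FIELDS, row)))
    return rows


def follows_observed_conventions(row: dict) -> bool:
    """Check the two string observations: .efi suffix and description phrase."""
    return (row["filename"].lower().endswith(".efi")
            and row["description"].startswith("This is the boot entry for "))


# Hypothetical sample line modelled on the strings mentioned in the post.
sample = "shimx64.efi,ubuntu,,This is the boot entry for ubuntu\n".encode("utf-16-le")
for entry in parse_boot_csv(sample):
    print(entry["filename"], follows_observed_conventions(entry))
```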

Most Linux (and Windows and macOS) text editors create ASCII files by default
for CSV samples, or, because of the nil bytes, do not handle the SBAT samples
as text when opening them. But since "Text - UTF-16 (LE) encoded" gets mime
type text/plain instead of application/octet-stream, a similar text mime type
fits here. So I chose what is shown by the file command and DROID. This is
expressed by a line like:
   <Mime>text/csv</Mime>

With the new definition such CSV samples are now recognized and described (see
appended trid-v-new.txt and trid-new.txt in output).

TrID definitions, some samples and output are stored in the archive CSV_.zip.
I hope that my definition can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

65
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on May 01, 2024, 08:06:13 PM »
Updated:
  • Krita resource Bundle (BUNDLE)
  • GDSII stream format layout (binary) (GDS)
  • Open Virtualization Format descriptor (OVF)
  • Open Virtualization Format descriptor (UTF-8) (OVF)
  • GDSII stream format layout (text format) (TXT)
Added:
  • Krita layer/tile ()
  • Krita Action collection (ACTION)
  • 7th Level game data (BIN)
  • Krita Color scheme (COLORS)
  • Fontconfig Configuration (CONF)
  • GFA BASIC Win 3.x compiled Executable (EXE)
  • Krita Gamut Mask (KGM)
  • Krita SeExpr script (KSE)
  • Krita Window Layouts (KWL)
  • Krita Workspace (KWS)
  • Library Exchange Format (v5.x) (LEF)
  • Library Exchange Format (v5.x, with rem) (LEF)
  • PC-Label Label (LBL)
  • PC-File mail-merge Letter (LTR)
  • OASIS stream (OAS)
  • Krita input Profile (PROFILE)
  • PC-File Report (REP)
  • Krita Shortcuts (SHORTCUTS)
66
Thanks!
67
Thanks!
68
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 29, 2024, 01:59:32 PM »
Updated:
  • Battery 3 Drum Kit (KT3)
  • Movie Magic Screenwriter document (generic) (MMSW/SCW)
  • Battery 3 quick load sample data (NOV)
Added:
  • Easy Reading Electronic Book format content ()
  • Mac Installation Tome ()
  • Nullsoft Install data (BIN)
  • DreamLight PixelPalette colors data (ACF)
  • Storm C++ Debug strings (DEBUG)
  • Croissant dataset description (0.8 ) (JSON)
  • Croissant dataset description (1.0 ) (JSON)
  • KIT Scenarist script (KITSP)
  • Movie Magic Screenwriter document (v6.x) (MMSW)
  • Battery 2 quick load sample data (NOV)
  • Movie Magic ScriptThing document (v2.x) (SCR)
  • Movie Magic ScriptThing document (v2.x, OLD) (SCR)
  • Movie Magic Screenwriter document (v2.x) (SCW)
  • Movie Magic Screenwriter document (v3.x) (SCW)
  • Movie Magic Screenwriter document (v4.x) (SCW)
  • Score Perfect Professional Song (SON)
  • Score Perfect Professional Font (SPF)
Deleted:
  • Movie Magic Screenwriter document (SCW)
69
Hello trid users,

A few days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

There exist other variants. So in this session I will handle only the variant
with lh=0x23. I will explain later what this means. I found no such samples
after installing MiKTeX version 23.12 on Windows. On Linux Mint 21.3 I found
hundreds of such samples with font family name OTF KANJI and encoding name
(TEX KANJI TEXT) as part of the texlive-lang-japanese package with version
2021.20220204-1.

So I ran the trid utility on my TFM samples with lh=0x23. The samples are not
recognized. Many are wrongly described as "Adobe PhotoShop Brush" by
abr.trid.xml with file name suffix (.ABR) (see appended trid-old.txt and
trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here no sample is recognized.

For comparison I also ran the file command (version 5.45) on these
samples. Here these TFM samples are not recognized and not described as "TeX
font metric data". They are described as "data" (see appended file-k-5.45.txt
in output). For these TFM samples the mime type application/x-tex-tfm is not
shown (see appended file-i-5.45.txt in output). No file name suffix is
shown (see appended file-ext-5.45.txt in output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive
Team web site and on Wikipedia. I use the first because the Wikipedia link is
also mentioned there and furthermore links to download samples are listed. So
the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x23.trid.xml. Afterwards
I tried to understand the generated constructs and checked whether they are
always true. I first thought it is like the other variants with just fewer
words in the data header, but unfortunately this is less than half of the
truth.

According to the mentioned specification the six-word (24-byte) file header
contains twelve unsigned 16-bit integers which describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). According to the
specification I patched the file command (see appended file.tmp in output and
nonames/output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
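These formulas can be checked mechanically. The following is a minimal sketch that reads the twelve big-endian 16-bit header fields and tests the three conditions; the function names are made up, and the header values in the example are hypothetical stand-ins (only lh=23h, bc=7Eh and ec=12h follow values mentioned in this post).

```python
import struct

NAMES = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def parse_tfm_header(data: bytes) -> dict:
    """Read the twelve unsigned 16-bit big-endian integers of the header."""
    if len(data) < 24:
        raise ValueError("a TFM header needs at least 24 bytes")
    return dict(zip(NAMES, struct.unpack(">12H", data[:24])))


def check_tfm_header(h: dict) -> list:
    """Return a list of violated conditions (empty if all hold)."""
    problems = []
    if not (h["bc"] - 1 <= h["ec"] <= 255):
        problems.append("bc-1 <= ec <= 255 violated")
    if h["ne"] > 256:
        problems.append("ne <= 256 violated")
    tables = sum(h[k] for k in ("nw", "nh", "nd", "ni", "nl", "nk", "ne", "np"))
    if h["lf"] != 6 + h["lh"] + (h["ec"] - h["bc"] + 1) + tables:
        problems.append("lf does not equal the sum of the parts")
    return problems


# Hypothetical header; nh, nd and ne are invented placeholder values.
header = struct.pack(">12H", 0, 0x23, 0x7E, 0x12, 0, 1, 1, 2, 2, 1, 1, 0)
for problem in check_tfm_header(parse_tfm_header(header)):
    print(problem)
```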

The mentioned specification is an archived version on archive.org dated about
2012. Obviously the described items do not match "newer exotic" fonts like
Japanese ones. So I assume the described items apply fully only to fonts
with 8 bits or fewer. The bc values at offset 4 in my samples were like 126
(=7Eh), 144 (=90h) or 146 (=92h). The ec value at offset 6 in my samples was
18 (12 hexadecimal).

Inside tfm-tex-0x23.trid.xml this is expressed by an XML construct that looks
like:
   <Bytes>0012000000</Bytes>
   <Pos>6</Pos>

So here all other values in the header except the first one (lf), bc, nh, nd
and ne are constant in hundreds of samples. Maybe this is because the samples
are part of the texlive-lang-japanese package. So I mention the observed items
inside the remark line:
The variant with lh=23h at offset 2, bc=7Eh, 90h or 92h (*4=504 576 584 file
size) at offset 4, ec=12h at offset 6, nw=0 at offset 8, nh<256 at offset 10,
nd<256 at offset 12, ni=2 at offset 14, nl=2 at offset 16, nk=1 at offset 18,
ne<256 at offset 20, np=0 at offset 22.

This is expressed by XML constructs like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0012000000</Bytes>
      <Pos>6</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>12</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00020002000100</Bytes>
      <Pos>14</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000100</Bytes>
      <Pos>22</Pos>
   </Pattern>
So here ec is also lower than bc, which contradicts the formula bc-1 <= ec.
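How such position/bytes patterns behave can be tried out with a few lines of Python. This is only a minimal sketch of the idea and not TrID's actual matching or scoring code: it assumes that every listed pattern has to match, and the function name and the test header are made up.

```python
# (position, bytes) pairs copied from the constructs quoted above.
PATTERNS_0X23 = [
    (0, bytes.fromhex("00")),
    (6, bytes.fromhex("0012000000")),
    (12, bytes.fromhex("00")),
    (14, bytes.fromhex("00020002000100")),
    (22, bytes.fromhex("000100")),
]


def all_patterns_match(data: bytes, patterns) -> bool:
    """True if each byte sequence is found at its stated offset."""
    return all(data[pos:pos + len(raw)] == raw for pos, raw in patterns)


# Hypothetical 32-byte header filled in to satisfy the patterns.
header = bytearray(32)
header[2:4] = b"\x00\x23"                    # lh=23h
header[4:6] = b"\x00\x7e"                    # bc=7Eh
header[6:8] = b"\x00\x12"                    # ec=12h
header[14:21] = bytes.fromhex("00020002000100")
header[22:25] = bytes.fromhex("000100")
print(all_patterns_match(bytes(header), PATTERNS_0X23))
```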

Apparently the file size here is not stored in field lf at offset 0. Instead
the value appears 4 bytes higher, at offset 4. So the field that is called bc
according to the documentation contains here the file size in words. By
multiplying this value (7Eh 90h 92h) by four you get the real file size (504
576 584) in bytes.
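This size observation is easy to test on many samples at once. A minimal sketch, assuming the length in words really sits as a big-endian 16-bit integer at offset 4; the function name is made up and the example data is synthetic.

```python
import struct


def shifted_length_matches(data: bytes) -> bool:
    """True if 4 * (16-bit big-endian value at offset 4) equals the data size."""
    if len(data) < 6:
        return False
    (words,) = struct.unpack_from(">H", data, 4)
    return words * 4 == len(data)


# Synthetic stand-in: 504 nil bytes with 007Eh (126 words) at offset 4.
fake = bytearray(504)
fake[4:6] = b"\x00\x7e"
print(shifted_length_matches(bytes(fake)))
```

For real files the argument would be the complete file content, for example `open(path, "rb").read()`.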

When using the standard interpretation, the value 9 at offset 27 in the
seventh construct would be interpreted as part of the 32-bit checksum. I
believe that this is not true. When using the interpretation with a 4-byte
shift, the following 4 nil bytes would be the checksum 0. The value zero means
no check is made. Then the next 32-bit value 00A00000 would be the design size
of the font as a fix_word. I remember having seen that value in other
variants, so I believe this is true and keep the above constructs.

If I try to convert here like in other variants by running a command line tool like:
   tftopl rubyminr-v.tfm rubyminr-v.pl
      I get output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
There's some extra junk at the end of the TFM file,
but I'll proceed as if it weren't there.
The character code range 126..18 is illegal!
Sorry, but I can't go on; are you sure this is a TFM?

In other variants the file length in words is stored as a 2-byte big-endian
integer at offset 0. By multiplying this value by 4 the file size in
bytes can be obtained. In this variant this information is stored 4 bytes
higher.
      So I tried commands like:
   dd bs=1 skip=4 if=rubyminr-v.tfm of=rubyminr-v.bin
   tftopl rubyminr-v.bin rubyminr-v.pl
      Now I got output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
The file has fewer bytes than it claims!
Sorry, but I can't go on; are you sure this is a TFM?
      Then I tried commands like:
   cp rubyminr-v.bin rubyminr-v-mod.bin
   echo -n "1234" >> rubyminr-v-mod.bin
   tftopl rubyminr-v-mod.bin rubyminr-v-mod.pl
      Now I got output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
Subfile sizes don't add up to the stated total!
Sorry, but I can't go on; are you sure this is a TFM?
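One plausible reading of the last two messages can be sketched in Python. This is only a rough guess at the kind of checks involved, not TFtoPL's real code: after cutting 4 bytes the stated length is 4 bytes larger than the data, and after padding the length fits but the table sizes no longer add up. The function name and the 504-byte stand-in are made up.

```python
import struct


def size_checks(data: bytes) -> str:
    """Apply a stated-length check and then the lf sum formula."""
    lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np_ = struct.unpack(">12H", data[:24])
    if 4 * lf > len(data):
        return "fewer bytes than claimed"
    if lf != 6 + lh + (ec - bc + 1) + nw + nh + nd + ni + nl + nk + ne + np_:
        return "subfile sizes do not add up"
    return "passes both checks"


# Synthetic 504-byte stand-in for a sample; all other bytes are left nil.
original = bytearray(504)
original[2:4] = b"\x00\x23"        # lh=23h at offset 2
original[4:6] = b"\x00\x7e"        # 126 words at offset 4
original[6:8] = b"\x00\x12"        # 18 at offset 6

stripped = bytes(original[4:])     # same effect as: dd bs=1 skip=4
padded = stripped + b"1234"        # same effect as: echo -n "1234" >>
print(size_checks(stripped))
print(size_checks(padded))
```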

Apparently the coding scheme name is also stored at a higher offset (37=33+4
compared with other variants). After a coding scheme name (maximal 39) like
(TEX KANJI TEXT) with string length 14 (=Eh) the remaining 25 padding bytes in
my examples are nil. Apparently the font family name is also stored at a
higher offset (77=73+4 compared with other variants). After the family name
(OTF KANJI, maximal 19) with string length 9 the remaining 10 padding bytes in
my examples are nil. Here at offset 96 (92 plus 4 compared with other
variants) the seven-bit-safe byte with value 80h again seems to be stored.
These observations are expressed by XML constructs like:

<Bytes>00090000000000A00000
0E544558204B414E4A49205445585400000000000000000000000000000000000000000000000000
094F5446204B414E4A49000000000000000000008000000000000000212200</Bytes>
<ASCII> . . . . . . . . . .
. T E X   K A N J I   T E X T
. . . . . . . . . . . . . . . . . . . . . . . . .
. O T F   K A N J I . . . . . . . . . . . . . . . . . . ! "</ASCII>
<Pos>26</Pos>

When assuming the 4-byte shifted interpretation, the design size 00A00000 is
probably stored at offset 32 and before it the 32-bit checksum 0. Before that
come some bytes with value 9, so I delete the bytes before the "checksum"
part. When assuming that ec=18 is the real data header size, header[17] is the
last part of the data header. Here at offset 96 (92 plus 4 compared with other
variants) the seven-bit-safe byte with value 80h again seems to be stored.
After this byte come 2 unused bytes (apparently nil) followed by the face
byte. So the next structure starts at offset 100, and I delete the bytes after
the "face" byte. The above construct then becomes:

<Bytes>0000000000A00000
0E544558204B414E4A49205445585400000000000000000000000000000000000000000000000000
094F5446204B414E4A490000000000000000000080000000</Bytes>
<ASCII> . . . . . . . .
. T E X   K A N J I   T E X T . . . . . . . . . . . . . . . . . . . . . . . . .
. O T F   K A N J I . . . . . . . . . . . . . .</ASCII>
<Pos>28</Pos>
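The two names can be read back from the shifted offsets with a few lines of Python. A minimal sketch, assuming each name is stored as one length byte followed by ASCII characters (length byte at offset 36 and 76, so the text starts at 37 and 77); the helper name is made up and the sample is rebuilt from the bytes shown above.

```python
def read_length_prefixed(data: bytes, offset: int, max_len: int) -> str:
    """Read a string stored as one length byte followed by ASCII characters."""
    length = data[offset]
    if length > max_len:
        raise ValueError("string length exceeds the field size")
    return data[offset + 1:offset + 1 + length].decode("ascii")


# Sample rebuilt from the construct above: 28 leading placeholder bytes, then
# checksum 0, design size 00A00000, both names with padding, and header[17].
sample = (bytes(28) + bytes(4) + bytes.fromhex("00A00000")
          + bytes([14]) + b"TEX KANJI TEXT" + bytes(25)
          + bytes([9]) + b"OTF KANJI" + bytes(10)
          + bytes([0x80, 0, 0, 0]))

print(read_length_prefixed(sample, 36, 39))   # coding scheme name
print(read_length_prefixed(sample, 76, 19))   # font family name
```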

That also means that patterns at higher offsets belong to the next structures
and are similar only because of lucky circumstances. That was expressed by XML
constructs like:
   <Pattern>
      <Bytes>21230004212400</Bytes>
      <ASCII> ! # . . ! $</ASCII>
      <Pos>108</Pos>
   </Pattern>
   <Pattern>
      <Bytes>2125000421260003212700032128000321290006212A0006213D000521440005
      <ASCII> ! % . . ! . . . ! ' . . ! ( . . ! ) . . ! * . . ! = . . ! D . .
      <Pos>116</Pos>
   </Pattern>
   <Pattern>
      <Bytes>110100</Bytes>
      <Pos>241</Pos>
   </Pattern>
   <Pattern>
      <Bytes>110102</Bytes>
      <Pos>245</Pos>
   </Pattern>
   <Pattern>
      <Bytes>110103</Bytes>
      <Pos>249</Pos>
   </Pattern>
   <Pattern>
      <Bytes>1101</Bytes>
      <Pos>253</Pos>
   </Pattern>
   <Pattern>
      <Bytes>011101</Bytes>
      <Pos>256</Pos>
   </Pattern>
   <Pattern>
      <Bytes>1101</Bytes>
      <Pos>261</Pos>
   </Pattern>
   <Pattern>
      <Bytes>1101</Bytes>
      <Pos>265</Pos>
   </Pattern>
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>500</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>502</Pos>
   </Pattern>
So I delete such "high" patterns.

As far as I can see there exist fewer than a dozen variants with other lh
values. In other variants the lh values change only a little bit. I will
handle the other variants in a future session.

With the new definition TFM samples with lh=23h are now recognized and
described (see appended trid-v-new.txt and trid-new.txt in output). The
definition is "good" in that it does not misidentify non-TFM samples. And
because of some more conditions compared with other variants the description
as "TeX Font Metric" comes first.

TrID definitions, some samples and output are stored in the archive
tfm_0x23.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned there exist other variants of TFM. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek
70
Hello trid users,

A few days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

There exist other variants. So in this session I will handle only the variant
with lh=0x01. I will explain later what this means. I found a few such samples
(like gbm.tfm, gbmv.tfm, rml.tfm, rmlv.tfm) in the dvips directory with parent
directory ptex-fonts inside the fonts\tfm subdirectory after installing
MiKTeX version 23.12 on Windows. On Linux Mint 21.3 I found thousands of
samples with a font family name as part of the texlive-lang-japanese package
with version 2021.20220204-1. Hundreds of samples without coding scheme and
font family names I found in packages (like dvi2ps-fontdata-ja,
dvi2ps-fontdata-rsp, dvi2ps-fontdata-tbank, dvi2ps-fontdata-three,
dvi2ps-fontdata-ptexfake, texlive-lang-chinese).

So I ran the trid utility on my TFM samples with lh=0x01. The samples are not
recognized. Many are wrongly described as "Adobe PhotoShop Brush" by
abr.trid.xml with file name suffix (.ABR).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here no sample is recognized.

For comparison I also ran the file command (version 5.45) on these
samples. Here these TFM samples are not recognized and not described as "TeX
font metric data". They are described as "data" (see appended file-k-5.45.txt
in nonames/output and output). For these TFM samples the mime type
application/x-tex-tfm is not shown (see appended file-i-5.45.txt in
nonames/output and output). No file name suffix is shown (see appended
file-ext-5.45.txt in nonames/output and output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive
Team web site and on Wikipedia. I use the first because the Wikipedia link is
also mentioned there and furthermore links to download samples are listed. So
the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x01-names.trid.xml and
tfm-tex-0x01.trid.xml. Afterwards I tried to understand the generated
constructs and checked whether they are always true. I first thought it is
like the other variants with just fewer words in the data header, but
unfortunately this is less than half of the truth.

According to the mentioned specification the six-word (24-byte) file header
contains twelve unsigned 16-bit integers which describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). According to the
specification I patched the file command (see appended file.tmp in output and
nonames/output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np

The mentioned specification is an archived version on archive.org dated about
2012. Obviously the described items do not match "newer exotic" fonts like
Japanese ones. So I assume the described items apply fully only to fonts
with 8 bits or fewer. The bc values at offset 4 in my samples were like 43
(=2Bh), 27 (=1Bh) or 33 (21 hexadecimal). The ec values at offset 6 in my
samples were 18 (12 hexadecimal) or 2.

Inside tfm-tex-0x01-names.trid.xml this is expressed by an XML construct that
looks like:
 <Bytes>0001002B001200000000000200020002000100000000000000090000000000A00000</Bytes>
 <ASCII> . . . +</ASCII>
 <Pos>2</Pos>

So here all values except the first one are constant in thousands of
samples. Maybe this is because the samples are part of the
texlive-lang-japanese package. So I mention the observed items inside the
remark line:

The variant with lh=1h at offset 2, bc=2Bh (*4=172 file size) at offset 4,
ec=12h at offset 6, nw=0 at offset 8, nh=0 at offset 10, nd=2 at offset 12,
ni=2 at offset 14, nl=2 at offset 16, nk=1 at offset 18, ne=0 at offset 20,
np=0 at offset 22.

Apparently the file size here is not stored in field lf at offset 0. Instead
the value appears 4 bytes higher, at offset 4. So the field that is called bc
according to the documentation contains here the file size in words. By
multiplying this value (2Bh=43) by four you get the real file size of 172
bytes.

When using the standard interpretation, the value 9 in the above construct
would be interpreted as part of the 32-bit checksum. I believe that this is
not true. When using the interpretation with a 4-byte shift, the following 4
nil bytes would be the checksum 0. The value zero means no check is made. Then
the next 32-bit value 00A00000 would be the design size of the font as a
fix_word. I remember having seen that value in other variants, so I believe
this is true and keep the above construct.

Inside tfm-tex-0x01.trid.xml this is expressed by XML constructs that look
like:
   <Bytes>000100</Bytes>
   <Pos>2</Pos>
   ...
   <Bytes>000200000000000200020002000100000000000000</Bytes>
   <Pos>6</Pos>
So here ec is also lower than bc.

If I try to convert here like in other variants by running a command line tool like:
   tftopl zu-cidjmr5-v.tfm zu-cidjmr5-v.pl
      I get output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
There's some extra junk at the end of the TFM file,
but I'll proceed as if it weren't there.
The header length is only 1!
Sorry, but I can't go on; are you sure this is a TFM?

In other variants the file length in words is stored as a 2-byte big-endian
integer at offset 0. By multiplying this value by 4 the file size in
bytes can be obtained. In this variant this information is stored 4 bytes
higher.
      So I tried commands like:
   dd bs=1 skip=4 if=zu-cidjmr5-v.tfm of=zu-cidjmr5-v.bin
   tftopl zu-cidjmr5-v.bin zu-cidjmr5-v.pl
      Now i got output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
The file has fewer bytes than it claims!
Sorry, but I can't go on; are you sure this is a TFM?
      Then i tried commands like:
   cp zu-cidjmr5-v.bin zu-cidjmr5-v-mod.bin
   echo.exe -n "1234" >> zu-cidjmr5-v-mod.bin
   tftopl zu-cidjmr5-v-mod.bin  zu-cidjmr5-v.pl
      Now i got output like:
This is TFtoPL, Version 3.3 (MiKTeX 24.3)
Subfile sizes don't add up to the stated total!
Sorry, but I can't go on; are you sure this is a TFM?

Apparently the coding scheme name is also stored at a higher offset (37=33+4
compared with other variants). For coding scheme names (maximal 39) like
(TEX KANJI TEXT) or (UNSPECIFIED), the common uppercase letter I and the
remaining 25 padding bytes in my examples are expressed inside
tfm-tex-0x01-names.trid.xml by XML constructs like:
   <Pattern>
      <Bytes>49</Bytes>
      <ASCII> I</ASCII>
      <Pos>45</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000000000000000000000000000000000000000000000</Bytes>
      <Pos>51</Pos>
   </Pattern>
When also allowing for other long encoding names with length 38 instead of
the maximal 39, only one padding byte survives and the construct becomes:
   <Bytes>00</Bytes>
   <Pos>75</Pos>

Afterwards at offset 77 (4 bytes higher than in other variants) the font
family name (like OTF KANJI, JODEL or UNSPECIFIED, maximal 19) is stored.
After the font family name the remaining 8 padding bytes are stored. Here at
offset 96 (92 plus 4 compared with other variants) the seven-bit-safe byte
with value 80h again seems to be stored. These are expressed by an XML
construct like:
 <Bytes>000000000000000080000000000000000111000000000000001000000000000000</Bytes>
 <Pos>88</Pos>
So when the fields are found here at offsets 4 bytes higher, then what is
shown here as the ec value is probably the real data header size with value
18. That means header[17] is the last element. If it behaves as described,
this element contains a first byte called the seven_bit_safe_flag, then two
bytes that are ignored, and a fourth byte called the face. This also means
that the next structure begins at offset 100 (=24+4+18*4). When also allowing
for other long font family names with length 18 instead of the maximal 19,
only one padding byte survives and the construct becomes:
 <Bytes>0080000000</Bytes>
 <Pos>95</Pos>
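The offset arithmetic can be written down as a small Python sketch. It assumes the 4-byte shift described above, so the real header size in words is read at offset 6 and the header data starts at offset 28; the function name is made up and the test data is synthetic.

```python
def shifted_header_layout(data: bytes, shift: int = 4) -> dict:
    """Locate the end of the header data and read its last 4-byte word."""
    header_words = int.from_bytes(data[2 + shift:4 + shift], "big")
    start = 24 + shift
    end = start + 4 * header_words
    if header_words < 1 or end > len(data):
        raise ValueError("header does not fit into the data")
    last_word = data[end - 4:end]   # header[17] when header_words is 18
    return {
        "header_words": header_words,
        "next_structure": end,
        "seven_bit_safe_flag": last_word[0],
        "face": last_word[3],
    }


# Synthetic 100-byte stand-in: value 18 at offset 6, 80h at offset 96.
fake = bytearray(100)
fake[6:8] = b"\x00\x12"
fake[96] = 0x80
print(shifted_header_layout(bytes(fake)))
```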

That also means that patterns at higher offsets belong to the next structures
and are similar only because of lucky circumstances. That was expressed by XML
constructs like:
   <Pattern>
      <Bytes>0000000000</Bytes>
      <Pos>124</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000000000000000000000</Bytes>
      <Pos>132</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000001000000010000000</Bytes>
      <Pos>148</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>162</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>168</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>170</Pos>
   </Pattern>
So I delete such "high" patterns.
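The manual clean-up can be described as cutting every pattern at the end of the header data. A minimal sketch, assuming the boundary at offset 100 derived above; the function name is made up and this is not a feature of tridscan.

```python
def trim_to_header(patterns, header_end: int = 100):
    """Drop patterns at or above header_end and cut those crossing it."""
    kept = []
    for pos, raw in patterns:
        if pos >= header_end:
            continue                          # belongs to a later structure
        kept.append((pos, raw[:header_end - pos]))
    return kept


# Two of the generated patterns quoted above, the first one shortened.
generated = [
    (88, bytes.fromhex("0000000000000000800000000000000001")),
    (124, bytes.fromhex("0000000000")),
]
for pos, raw in trim_to_header(generated):
    print(pos, raw.hex())
```

The hand-edited definition goes a bit further than this and also shortens the start of a pattern, so that longer names are not excluded.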

Unfortunately there exists a variant with lh=1 and without ASCII-like
strings for the encoding scheme name and the font family name. Hundreds of
such samples are described by tfm-tex-0x01.trid.xml.

In the other variant encoding names with maximal length 39 are stored at
offset 37. This is followed at offset 77 by the font family name with maximal
length 19. So in my naive thinking I would expect nil bytes in that area when
no names exist. But apparently this is not true. This is expressed by XML
constructs like:
   <Pattern>
      <Bytes>0000000000A00000000000000111000000000000001000000000000000</Bytes>
      <Pos>28</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000</Bytes>
      <Pos>60</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000000000000000000000</Bytes>
      <Pos>68</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000001000000010000000</Bytes>
      <Pos>84</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000000000000000</Bytes>
      <Pos>98</Pos>
   </Pattern>
I do not know or understand what is going on here, so I keep these
constructs.

When comparing with other variants, the 32-bit value 00A00000 at
offset 32 would be the design size of the font as a fix_word. I remember
having seen that value in other variants, so I believe this is true and
mention this observation in the remark line.

The significant part with lh=1 is expressed in this variant by an XML
construct like:
   <Bytes>000100</Bytes>
   <Pos>2</Pos>
Apparently the file size here is also not stored in field lf at offset
0. Instead the value appears 4 bytes higher, at offset 4. So the field that is
called bc according to the documentation contains the file size in words. By
multiplying this value (1Bh=27 or 21h=33) by four you get the real file size
(108 or 132) in bytes.

The next significant part is expressed by an XML construct like:
   <Bytes>000200000000000200020002000100000000000000</Bytes>
   <Pos>6</Pos>
So the remaining fields in the header are also constant, and in most fields I
get the same value as in the other variant. The only difference is that the ec
field at offset 6 has value 2. So I mention the observed items inside the
remark line.

As far as I can see there exist fewer than a dozen variants with other lh
values. In other variants the lh values change only a little bit. I will
handle the other variants in a future session.

With the new definitions TFM samples with lh=01 are now recognized and
described (see appended trid-v-new.txt and trid-new.txt in output and
nonames/output/). The definitions are "good" in that they do not misidentify
non-TFM samples. And because of some more conditions compared with other
variants the description as "TeX Font Metric" comes first.

TrID definitions, some samples and output are stored in the archive
tfm_0x01.zip. I hope that my definitions can be used in a future version of
triddefs. As mentioned there exist other variants of TFM. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek