Recent Posts

Pages: 1 2 [3] 4 5 ... 10
21
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 06, 2024, 03:38:51 PM »
Updated:
  • Core Audio Format (CAF)
  • Leica Image File Format (LIF)
  • Shorten lossless compressed audio (SHN)
Added:
  • Chess Assistant DB (v3.0) (generic) ()
  • Chess Assistant DB (v4.0) (generic) ()
  • ENVI Annotations (ANZ)
  • ENVI ASCII Plot (ASC/TXT)
  • Chess Assistant DB (v3.0) - Chess Data Pro (v1.1) (CDP)
  • Super Solvers: Gizmos and Gadgets! players (DAT)
  • Postal Demo (DMO)
  • ENVI Density Slice Range (DSR)
  • Chess Assistant DB (v3.0) - Data Pro Volume (v1.1) (EMB/ST0)
  • ENVI Pyramid (ENP)
  • ENVI File list (EVF)
  • Postal Font (FNT)
  • ENVI Grid parameters (GRD)
  • Postal GUI element (GUI)
  • ENVI Header (HDR)
  • ENVI classic Header (HDR)
  • Postal MultiAlpha (MLP)
  • ENVI Modeler Model (MODEL)
  • ENVI Mosaic template (MOS)
  • ENVI n-D visualizer state (NDV)
  • PyDev Project (PYDEVPROJECT)
  • Postal Realm data (RLM)
  • ENVI Series (SERIES)
  • SARscape XML Header (SML)
  • ENVI Statistics (STA)
  • Chess Assistant DB (v4.0) - Data Pro Tree (v1.1) (TD0)
  • ENVI Vector template (VEC)
  • ENVI Regions Of Interest (XML)
  • MPEG-H Audio Scene Configuration (XML)
22
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. The samples have the TFM file name suffix.

Unfortunately, some TFM samples are misidentified as other file formats. Some
samples are misidentified as "interLaced eXtensible Trace" by lxt.trid.xml,
which has no reference URL and only the generic mime type
application/octet-stream. The file name suffix is LXT (see appended
trid-v-old.txt in output). The recognition relies on a single XML construct
that looks like:
   <Bytes>0138</Bytes>
   <ASCII> . 8</ASCII>
   <Pos>0</Pos>
So only 16 bits are used for recognition. Apparently this is often too
weak. According to the recommendations for the file command, at least 32 bits
should be used for recognition.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are also not
recognized.

For comparison I also ran the file command (version 5.45) on such
samples. Here all samples are "recognized". All samples are first described
as "interLaced eXtensible Trace (LXT) file". Some more details are also
shown: the version number appears in parentheses. For the LXT sample I got 1,
whereas for the TFM samples I got the "high" values 17 and 18 (see appended
file-k-5.45.txt in output). For the samples the generic mime type
application/octet-stream is shown (see appended file-i-5.45.txt in output). No
file name suffix is shown (see appended file-ext-5.45.txt in output). When
using the keep-going option -k I get a second and correct description for the
TFM samples. They are described as "TeX font metric data" (see appended
file-k-5.45.txt in output). With that option I also get application/x-tex-tfm
instead of a generic mime type for the TFM samples (see appended
file-k-i-5.45.txt in output).

Luckily there exists a "Wave Analyzer User's Guide" for GTKWave with some
useful information. That PDF document can be found on the GTKWave page on
SourceForge. So it is used as the reference in the new definition. That is
expressed by a line like:
   <RefURL>https://gtkwave.sourceforge.net/gtkwave.pdf</RefURL>

Now comes the interesting part. Appendix D "LXT File Format" of the user guide
contains some useful information. An LXT file starts with a two-byte
LT_HDRID. That is defined as the constant value 0x0138. This characteristic is
used as the pattern by the current TrID definition and by the file
command. Next comes the two-byte version number LT_VERSION. That is what the
file command shows as version. In the current guide (dated Nov 14, 2020 for
GTKWave 3.3.108 and higher versions) this is defined as the constant value
0x0001. The last byte in the file is the LT_TRLID. This is defined as the
constant value 0xB4. So these five bytes are the only "absolutes" in an LXT
file. So the file content looks like:
     01 38 00 01 ...file body... B4

So I created a variant lxt-v1.trid.xml for version one. The described
characteristics are expressed by an XML construct that looks like:
   <Bytes>01380001</Bytes>
   <ASCII> . 8</ASCII>
   <Pos>0</Pos>
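
The difference between the old 16-bit check and the new 32-bit check can be
illustrated with a small Python sketch. This is not code from TrID or the file
command. The function names and the two sample buffers are made up for
illustration; only the constants 0x0138, 0x0001 and 0xB4 come from the guide.

```python
# Sketch only: the function names are illustrative and not part of TrID or file(1).
LT_HDRID = bytes.fromhex("0138")    # two-byte header ID
LT_VERSION = bytes.fromhex("0001")  # two-byte version number
LT_TRLID = 0xB4                     # trailer ID, last byte of the file


def matches_old_pattern(data: bytes) -> bool:
    """Old check: only the 16-bit header ID at offset 0 is compared."""
    return data[:2] == LT_HDRID


def matches_lxt_v1(data: bytes) -> bool:
    """New check: header ID plus version, 32 bits at offset 0."""
    return data[:4] == LT_HDRID + LT_VERSION


def has_lxt_trailer(data: bytes) -> bool:
    """Optional extra check: the file must end with the trailer ID."""
    return len(data) >= 5 and data[-1] == LT_TRLID


# Made-up buffers: a minimal LXT-like file and a TFM-like header whose
# first two 16-bit fields happen to be 0x0138 and 0x0011.
lxt_like = bytes.fromhex("01380001") + bytes(8) + bytes([LT_TRLID])
tfm_like = bytes.fromhex("01380011") + bytes(20)

print(matches_old_pattern(tfm_like))  # True: the 16-bit check is fooled
print(matches_lxt_v1(tfm_like))       # False: the 32-bit check is not
print(matches_lxt_v1(lxt_like), has_lxt_trailer(lxt_like))  # True True
```

The trailer check is shown only for completeness; the definition quoted above
tests the first four bytes.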

The guide also says that LXT2 files use a completely different file format as
well as different constant values. I interpret this to mean that the version
is 1 at the moment and apparently will never change (for example increase to
2), because LXT2 exists as a separate file format. So my conclusion is that my
new definition lxt-v1.trid.xml can be used as a replacement for lxt.trid.xml.

With the new definition instead of the old one, the wrong descriptions
vanish. The LXT samples are described by lxt-v1.trid.xml. The TFM samples are
no longer described as LXT samples and are often described as "TeX Font
Metric" (see appended trid-v-new.txt and trid-new.txt in output).

The TrID definitions, some samples and the output are stored in the archive
tfm_lxt.zip. I hope that my definition can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek
23
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. The samples have the TFM file name suffix.

There exist several variants. So in this session I will handle only the
variant with lh=0x11. I will explain later what this means.

So I ran the trid utility on my TFM samples with lh=0x11. The samples are not
recognized. Many are described as "Unknown!". For the others I get many
different descriptions, but all are wrong (see appended trid-v-old.txt in
output). I have many hundreds of TFM samples. It took some time to collect a
dozen non-TFM samples that match the formats the TFM samples are mistaken
for.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are also not
recognized.

For comparison I also ran the file command (version 5.45) on such
samples. Here all samples are "recognized" and described as "TeX font metric
data". Some more details are also shown. In parentheses the coding scheme
name {like (TeXBase1Encoding) (AdobeStandardEncoding) (cochalphEncoding)
(FontSpecific) (kerkisec)} is shown (see appended file-k-5.45.txt in
output). This can be seen more clearly when using only the tex magic pattern
(see appended file-tex-5.45.txt in output). Unfortunately, for most samples I
also get another description when using the keep-going option -k of the file
command. Even worse, for some samples (like NewTXMI.tfm fxlzi-5letters.tfm
pplri8a.tfm rpplru.tfm rtxbmi-rev.tfm) the wrong description comes first.
This can be seen more clearly when running the file command without the
keep-going option (see appended file-5.45.txt in output). For the TFM samples
the mime type application/x-tex-tfm is shown (see appended
file-tex-i-5.45.txt in output). No file name suffix is shown (see appended
file-tex-ext-5.45.txt in output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first because it also mentions the
Wikipedia link and furthermore lists links for downloading samples. So the
reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x11.trid.xml. Afterwards
I tried to understand the generated constructs and checked whether they are
always true. According to the specification, the six-word (24-byte) file
header contains twelve unsigned 16-bit integers which describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). Following the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
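
These formulas can be checked mechanically. The following Python sketch
unpacks the twelve big endian 16-bit header fields and tests the three
relations. The function names are hypothetical and the sample values are made
up; they are not taken from one of my samples.

```python
import struct

# Order of the twelve 16-bit fields in the 24-byte TFM file header.
FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def parse_tfm_header(data: bytes) -> dict:
    """Unpack the twelve big endian unsigned 16-bit header fields."""
    if len(data) < 24:
        raise ValueError("a TFM header needs at least 24 bytes")
    return dict(zip(FIELDS, struct.unpack(">12H", data[:24])))


def header_is_consistent(h: dict) -> bool:
    """Check the three relations quoted from the specification."""
    total = (6 + h["lh"] + (h["ec"] - h["bc"] + 1) + h["nw"] + h["nh"]
             + h["nd"] + h["ni"] + h["nl"] + h["nk"] + h["ne"] + h["np"])
    return (h["bc"] - 1 <= h["ec"] <= 255
            and h["ne"] <= 256
            and h["lf"] == total)


# Made-up header: lh=17, one character (bc=ec=0), small tables, np=6.
sample = struct.pack(">12H", 35, 17, 0, 0, 2, 1, 1, 1, 0, 0, 0, 6)
print(header_is_consistent(parse_tfm_header(sample)))  # True
```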

That means that bc and ec are always below 256 and ne is at most 256. Because
the files are stored in big endian format, the upper byte of these fields is
nil (for ne only as long as it stays below 256). Apparently nearly all others
of these twelve fields are also below 256. So at even offsets we have nil
bytes. That is expressed by XML constructs like:
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>12</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>14</Pos>
   </Pattern>

At offset 16 the number of words in the lig table (nl) is stored as a 2-byte
big endian integer. This is followed by the number of words in the kern table
(nk) and then by the number of words in the extensible character table
(ne). In my samples these 3 integers are 0, but I do not know if this is
always true. In the variant with lh=0x12 it is not. So I mention my
observations in the remark line. At offset 22 the number of font parameter
words (np) is stored. In my examples the value was 6, but I do not know if
this is always true. In the variant with lh=0x12 it is not. So I mention this
observation in the remark line as well. These observations are expressed by
lines like:
   <Bytes>0000000000000006</Bytes>
   <Pos>16</Pos>

The only exception is the file length. That value is stored at offset 0 as
field lf in word units. Multiplying this value by 4 gives the file size in
bytes.

Whether samples are real TFM files can be verified by running a command line
tool like:
   tftopl Cochineal-alph.tfm Cochineal-alph.pl

Now comes the interesting part. At offset 2 the length of the header data (lh)
is stored in word units. For most of my inspected TFM samples this value is 17
(=0x11). The samples in this session all have this value. Together with the
upper nil byte of bc (first character code in the font) at offset 4, this
significant part is expressed by an XML construct like:
   <Bytes>001100</Bytes>
   <Pos>2</Pos>
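
How such Bytes/Pos constructs are applied to a file can be illustrated with a
small Python sketch. This is not TrID's actual matching code. The names
matches_all and SAMPLE_PATTERNS are hypothetical, the list holds only
constructs quoted in this post, and the test header is made up.

```python
import struct


def matches_all(data: bytes, patterns) -> bool:
    """Return True if every (pos, hex_bytes) pair is found at its offset."""
    for pos, hex_bytes in patterns:
        expected = bytes.fromhex(hex_bytes)
        # A slice past the end of the data is shorter and cannot match.
        if data[pos:pos + len(expected)] != expected:
            return False
    return True


# Constructs quoted in this post, written as (Pos, Bytes) pairs.
SAMPLE_PATTERNS = [
    (2, "001100"),             # lh = 0x0011 followed by a nil byte
    (12, "00"),                # nil byte at offset 12
    (14, "00"),                # nil byte at offset 14
    (16, "0000000000000006"),  # nl = nk = ne = 0 and np = 6
]

# Made-up 24-byte headers with lh=0x11 and lh=0x12.
header_0x11 = struct.pack(">12H", 35, 0x11, 0, 0, 2, 1, 1, 1, 0, 0, 0, 6)
header_0x12 = struct.pack(">12H", 36, 0x12, 0, 0, 2, 1, 1, 1, 0, 0, 0, 6)

print(matches_all(header_0x11, SAMPLE_PATTERNS))  # True
print(matches_all(header_0x12, SAMPLE_PATTERNS))  # False
```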

As far as I can see there exist less than a dozen variants with other lh
values. In the other variants the lh value changes only a little. I will
handle the other variants in a future session.

According to the documentation, the header[17] word at offset 92 contains a
first byte called the seven_bit_safe_flag, then two bytes that are ignored,
and a fourth byte called the face. It took me some days to understand why this
does not apply to the current variant; the documentation also says that this
applies only if the word is present. The first element is header[0]. That
means header[17] is element number 18 (hexadecimal 12), but in this variant
the header size is 17 (lh=0x11). So in this variant element header[17] does
not exist, and the next structure starts at that offset. According to the
documentation this is the array char_info. The unit of this array is the
char_info_word (4 bytes). Apparently one byte of a char_info_word is often
0. So these observations are expressed by constructs like:

   <Pattern>
      <Bytes>00</Bytes>
      <Pos>95</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>99</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>115</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>123</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>131</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>135</Pos>
   </Pattern>

I do not understand what these bytes are and why they are nil, but this is not
relevant at the moment. If I understand the documentation right, in the worst
case bc (first character code) is equal to ec (last character code). That
means this array would contain only 1 element. It starts at offset 92, and at
offset 96 the next structure would start. So I can delete all constructs with
offset 96 and higher. So only one construct survives. It describes the first
element char_info[0] and looks like:
   <Bytes>01</Bytes>
   <Pos>92</Pos>
I do not understand what exactly this means. I also do not know if this is
always true. So I mention my observations in the remark line.
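
The offset arithmetic behind this reasoning can be written down as a short
Python sketch. The helper names are hypothetical. It assumes only what is
stated above: a 24-byte file header, 4-byte words, and lh header words
followed directly by char_info.

```python
HEADER_START = 24  # the twelve 16-bit length fields occupy bytes 0..23
WORD = 4           # header words and char_info words are 4 bytes each


def header_word_offset(index: int) -> int:
    """Byte offset of header[index], counted from the start of the file."""
    return HEADER_START + WORD * index


def has_header_word(lh: int, index: int) -> bool:
    """header[index] exists only if the header has more than index words."""
    return index < lh


def char_info_offset(lh: int) -> int:
    """char_info starts right after the lh header words."""
    return HEADER_START + WORD * lh


print(header_word_offset(17))                   # 92
print(has_header_word(0x11, 17))                # False: no header[17] here
print(char_info_offset(0x11))                   # 92: char_info[0] sits there
print(char_info_offset(0x11) + WORD)            # 96: end of a 1-element array
```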


The header[2..11], if present, contains 40 bytes that identify the character
coding scheme. The first byte, which must be between 0 and 39, gives the
number of following bytes that are actually used. Apparently in my examples
the maximal length 39 for coding scheme names was not used. So the remaining
bytes are filled with nils. That was expressed by a construct like:
   <Bytes>0000000000000000000000000000000000</Bytes>
   <Pos>55</Pos>
Assuming that examples with the longest possible coding scheme name may
exist, I dropped the above construct.

Then the only remaining construct looks like:
   <Bytes>00A00000</Bytes>
   <Pos>28</Pos>
According to the documentation this is element header[1]. That is the design
size of the font, stored as a fix_word (4 bytes) in units of TeX points. In
the samples the value is 00A00000. In the other variant I got different
values. So I assume that other font sizes may also exist for the 0x11
variant. So I deleted the above pattern.
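
What the value 00A00000 stands for can be shown with a short Python
sketch. It assumes the usual TFM definition of a fix_word: a signed 32-bit big
endian number with 20 bits to the right of the binary point. The function name
is hypothetical.

```python
import struct


def fix_word_to_float(raw: bytes) -> float:
    """Convert a 4-byte big endian fix_word to a floating point number."""
    (value,) = struct.unpack(">i", raw)  # signed 32-bit integer
    return value / (1 << 20)             # 20 fractional bits


print(fix_word_to_float(bytes.fromhex("00A00000")))  # 10.0
```

So the pattern would match only fonts with a design size of 10 points, which
fits the decision to drop it.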

At offset 33 an ASCII-like coding scheme name is stored. The maximal string
length is 39, and the length value is stored in the byte before. At offset 73
an ASCII-like font family name (like CMR ECBX DUMMYSPAC-FONTFORGE
TEX-PAGD8R-CSC UNSPECIFIED HelveNarBol) is stored. The maximal string length
is 19, and the length value is stored in the byte before.
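
Reading these two length-prefixed strings can be sketched in Python as
follows. The function names are hypothetical, the sample buffer is made up,
and the sketch assumes that the header is long enough to contain both fields
(as it is for lh=0x11 and lh=0x12).

```python
def read_prefixed_string(data: bytes, offset: int, max_len: int) -> str:
    """Read a string whose length is stored in the byte at `offset`."""
    length = data[offset]
    if length > max_len:
        raise ValueError(f"length {length} exceeds maximum {max_len}")
    raw = data[offset + 1:offset + 1 + length]
    return raw.decode("ascii", errors="replace")


def coding_scheme(data: bytes) -> str:
    """Length byte at offset 32, name from offset 33, at most 39 bytes."""
    return read_prefixed_string(data, 32, 39)


def font_family(data: bytes) -> str:
    """Length byte at offset 72, name from offset 73, at most 19 bytes."""
    return read_prefixed_string(data, 72, 19)


# Made-up buffer with only the two string fields filled in.
buf = bytearray(92)
buf[32] = 16
buf[33:49] = b"TeXBase1Encoding"
buf[72] = 3
buf[73:76] = b"CMR"

print(coding_scheme(bytes(buf)))  # TeXBase1Encoding
print(font_family(bytes(buf)))    # CMR
```

A length byte above the maximum is a strong hint that the file is not a TFM
file, which is why the sketch raises an error in that case.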

With the new definition all TFM samples with header data size 0x11 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. Unfortunately, for some TFM samples (like Cochineal-alph.tfm
rpplru.tfm rtxbmi-rev.tfm) the description as TeX Font Metric is not the
first. The main reason is that the only significant characteristic is the
16-bit lh value.

The TrID definitions, some samples and the output are stored in the archive
tfm_0x11.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, there exist other variants of TFM. I will try to
handle these in a future session.

With best wishes
Jörg Jenderek

24
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 04, 2024, 02:54:51 PM »
Updated:
  • AKT compressed archive (generic) (AKT)
  • AppleWin Saved State (v1) (AWS)
  • Wavefront Object (created by 3D Max) (OBJ)
  • Wavefront Object (created by Hexagon) (OBJ)
  • Wavefront Object (generic) (OBJ)
Added:
  • ARIA samples Bank (BNK)
  • OGW Log (LOG)
  • Bethesda game Map (MIF)
  • Nintendo DS ROM (NDS)
  • Wavefront Object (created by VXelements) (OBJ)
  • Pro Pixel Image bitmap (PPG)
  • Pro Pixel Demo Image (PP2D)
  • Pro Pixel 2D Palette Bank (PP2P)
  • Pro Pixel 2D Palette (PPP)
  • Ptex Texture (v1) (PTX)
  • STNG 'A Final Unity' Sprites (SPR/SPT)
  • TeX Font Metric (0x12) (TFM)
  • Xbox One Virtual Disk Info (XVI)
  • AppleWin Saved State (v2) (YAML)
  • AppleWin controller configuration (YAML)
25
Thanks!
26
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 01, 2024, 02:15:52 PM »
Updated:
  • Unity Assembly Definition (ASMDEF)
  • Origin Project (OPJ)
Added:
  • lEEt/OS Application (APP)
  • Open Brush 3D metadata (JSON)
  • Helix encoded MP3 audio (with LAME tag) (MP3)
  • Xing encoded MP3 audio (MP3)
  • Origin Project (Unicode compliant) (OPJU)
  • Sid Meier's Civilization Palette (PAL)
  • Open Brush 3D Sketch (SKETCH)
  • Open Brush 3D (TILT)
  • Lix gadget trigger area (TXT)
  • Lix level (TXT)
  • DANS Easy Metadata (XML)
  • X PixMap bitmap (XPM2) (XPM)
27
Hello trid users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples that are misidentified. The samples have the TFM file name suffix.

There exist several variants. So in this session I will handle only the
variant with lh=0x12. I will explain later what this means.

So I ran the trid utility on my TFM samples with lh=0x12. The samples are not
recognized. Many are described as "Unknown!". For the others I get many
different descriptions, but all are wrong (see appended trid-v-old.txt in
output). I have many hundreds of TFM samples. It took some time to collect a
dozen non-TFM samples that match the formats the TFM samples are mistaken
for.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are also not
recognized, and I got no false descriptions.

For comparison I also ran the file command (version 5.45) on such
samples. Here all samples are "recognized" and described as "TeX font metric
data". Some more details are also shown. In parentheses the coding scheme
name {like () (FONTSPECIFIC) (EC Encoding /Cork/) (L7X Encoding /Lithuanian/)
(TEX MATH EXTENSION) (QX Encoding) (UNSPECIFIED)} is shown (see appended
file-k-5.45.txt in output). This can be seen more clearly when using only the
tex magic pattern (see appended file-tex-5.45.txt in output). Unfortunately,
for most samples I also get another description when using the keep-going
option -k of the file command. Even worse, for some samples (like aebx7.tfm
cmvtt10.tfm rtxi.tfm rtxmi.tfm texnansi-qplb.tfm txbsyc.tfm zpsycmrv.tfm) the
wrong description comes first. This can be seen more clearly when running the
file command without the keep-going option (see appended file-5.45.txt in
output). For the TFM samples the mime type application/x-tex-tfm is shown (see
appended file-tex-i-5.45.txt in output). No file name suffix is shown (see
appended file-tex-ext-5.45.txt in output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive Team
web site and on Wikipedia. I use the first because it also mentions the
Wikipedia link and furthermore lists links for downloading samples. So the
reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x12.trid.xml.
Afterwards I tried to understand the generated constructs and checked whether
they are always true. According to the specification, the six-word (24-byte)
file header contains twelve unsigned 16-bit integers which describe general
TFM characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). Following the
specification I patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf = 6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
That means that bc and ec are always below 256 and ne is at most 256. Because
the files are stored in big endian format, the upper byte of these fields is
nil (for ne only as long as it stays below 256). Apparently nearly all others
of these twelve fields are also below 256. So at even offsets we have nil
bytes. That is expressed by XML constructs like:
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>22</Pos>
   </Pattern>
   ...

The only exceptions are the number of words in the kern table and the file
length. The first value is stored at offset 18 as field nk and is sometimes
bigger than 255. The file length is also sometimes bigger than 255. That value
is stored at offset 0 as field lf in word units. Multiplying this value by 4
gives the file size in bytes.
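
The relation between lf and the file size gives a cheap plausibility test,
sketched here in Python. The function name is hypothetical and the buffer is
made up. The value 0x0138 is chosen on purpose: it is the LXT header ID, so a
TFM file with lf=0x0138 (1248 bytes) starts with the same two bytes as an LXT
file.

```python
import struct


def tfm_size_matches(data: bytes) -> bool:
    """Check that lf (offset 0, in 4-byte words) equals the real file size."""
    if len(data) < 24:
        return False  # too short to hold the 24-byte file header
    (lf,) = struct.unpack(">H", data[:2])
    return lf * 4 == len(data)


# Made-up buffer: lf = 0x0138 = 312 words, so 1248 bytes in total.
buf = struct.pack(">H", 0x0138) + bytes(1246)
print(len(buf), tfm_size_matches(buf))  # 1248 True
```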

Whether samples are real TFM files can be verified by running a command line
tool like:
   tftopl tgoth10.tfm  tgoth10.pl

Now comes the interesting part. At offset 2 the length of the header data (lh)
is stored in word units. For most of my inspected TFM samples this value is 18
(=0x12). The samples in this session all have this value. Together with the
upper nil byte of bc (first character code in the font) at offset 4, this
significant part is expressed by an XML construct like:
   <Bytes>001200</Bytes>
   <Pos>2</Pos>

As far as I can see there exist less than a dozen variants with other lh
values. In the other variants the lh value changes only a little. I will
handle the other variants in a future session.

According to the documentation, the header[17] word contains a first byte
called the seven_bit_safe_flag, then two bytes that are ignored, and a fourth
byte called the face. When looking in file.tmp, I get the value 0 or 0x80 for
the seven_bit_safe_flag. For the two ignored/unused bytes I apparently always
get nil. For the face byte I got different values (like 0 0xea 0xee 0xf4). So
the 2 unused bytes are expressed by the last construct. That looks like:
   <Bytes>0000</Bytes>
   <Pos>93</Pos>
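
Splitting the header[17] word into its parts can be sketched in Python. The
function name and the sample buffer are hypothetical. The sketch assumes that
header[17] lies at offset 92 (24 + 4*17) and is present only when lh is at
least 18.

```python
def decode_header17(data: bytes) -> dict:
    """Split header[17] into flag, unused bytes and face."""
    lh = int.from_bytes(data[2:4], "big")
    if lh < 18 or len(data) < 96:
        raise ValueError("header[17] is not present in this file")
    word = data[92:96]
    return {
        "seven_bit_safe_flag": word[0],  # 0 or 0x80 in my samples
        "unused": word[1:3],             # the two bytes matched at Pos 93
        "face": word[3],
    }


# Made-up buffer: lh=0x12 and header[17] = 80 00 00 EA.
buf = bytearray(96)
buf[2:4] = (0x12).to_bytes(2, "big")
buf[92:96] = bytes.fromhex("800000EA")

info = decode_header17(bytes(buf))
print(hex(info["seven_bit_safe_flag"]), info["unused"], hex(info["face"]))
# 0x80 b'\x00\x00' 0xea
```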

At offset 33 an ASCII-like coding scheme name is stored. For misidentified
samples I often got non-ASCII garbage with octal values at this place. The
maximal string length is 39, and the length value is stored in the byte
before. At offset 73 an ASCII-like font family name (like CMR ECBX
DUMMYSPAC-FONTFORGE TEX-PAGD8R-CSC UNSPECIFIED HelveNarBol) is stored. The
maximal string length is 19, and the length value is stored in the byte
before.

With the new definition all TFM samples with header size 0x12 are now
recognized and described (see appended trid-v-new.txt and trid-new.txt in
output). The definition is "good" in the sense that it does not misidentify
non-TFM samples. Unfortunately, for some TFM samples the description as TeX
Font Metric is not the first. The main reason is that the only significant
characteristic is the 16-bit lh value.

The TrID definitions, some samples and the output are stored in the archive
tfm_0x12.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, there exist other variants of TFM. I will try to
handle these in a future session.

With best wishes
Jörg Jenderek
28
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on March 27, 2024, 03:14:03 AM »
Updated:
  • DuneGraph Uncompressed bitmap (DGU/DG1)
  • GameBoy Sound System dump (GBS)
  • KiCad EESchema Netlist (NET)
  • Slight Atari Player module (SAP)
  • Cyber Paint Sequence animation (SEQ)
Added:
  • Autohotkey script (v2.x) (AHK)
  • Cyber Paint Cell animation (CEL)
  • Linux Preseed Configuration (CFG/SEED/TXT)
  • DISCO Discovery Document (DISCO)
  • DISCO Discovery Output (DISCOMAP/MAP)
  • Linux LiveCD info (DISKDEFINES)
  • Lucasfilm Games MIDI music (GMD)
  • PiyoPiyo Music (PMD)
  • Atari ST Megamax Modula-2 program/executable (v2) (PRG/ACC)
  • Atari ST Pack-Ice compressed program/executable (PRG/APP/TOS/TTP)
  • Atari Midi Sequencer song (SEQ)
  • Cyber Paint Sequence animation (variant) (SEQ)
  • Lucasfilm Games VOC Sound (SOU)
  • Sound Chip Synth patch (SYN)
Deleted:
  • Slight Atari Player music format (SAP)
29
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on March 25, 2024, 12:47:51 PM »
Updated:
  • 16bit COM executable Graham's TXT2COM (generic) (COM)
  • 16bit COM executable Graham's TXT2COM (v1.0) (COM)
  • 16bit COM executable Graham's TXT2COM (v1.1) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.2) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.03) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.06) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.10) (COM)
Added:
  • Impulse Tracker font (CFG)
  • 16bit COM executable Graham's TXT2COM (v2.0) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.1) (COM)
  • 16bit COM executable Graham's TXT2PAS (v2.03) (COM)
  • 16bit COM executable Graham's TXT2PAS (v2.06) (COM)
  • 16bit COM executable Graham's TXT2PAS (v2.10) (COM)
  • 16bit COM executable Graham's TXT2RES (v1.0) (COM)
  • 16bit COM executable Graham's TXT2RES (v2.03) (COM)
  • 16bit COM executable Graham's TXT2RES (v2.06) (COM)
  • 16bit COM executable Graham's TXT2RES (v2.10) (COM)
  • Impulse Tracker sound Driver (DRV)
  • HALAC lossless compressed audio (v0.x) (HALAC)
  • HALIC bitmap (v0.x) (HALIC)
  • MMCMP compressed module (IT/MOD/S3M/XM)
  • SofTest encrypted answers data (JSON)
  • LEGO Batman 2 Saved Game (LEGOBATMAN2SAVEGAMEDATA)
  • Music Box music (SAV)
  • Silicon Graphics bitmap (SGI/BW/RGB/RGBA)
  • SofTest answers data archive (XMDX)
  • BrickLink XML inventory (XML)
  • MecaBricks XML unknow parts (XML)
Deleted:
  • Silicon Graphics bitmap (generic) (SGI)
  • Artisan Dates (XML)
30
Thanks! Will remove the old generic def and keep this one.