Recent Posts

Pages: 1 2 [3] 4 5 ... 10
21
Thanks!
22
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 01, 2024, 02:15:52 PM »
Updated:
  • Unity Assembly Definition (ASMDEF)
  • Origin Project (OPJ)
Added:
  • lEEt/OS Application (APP)
  • Open Brush 3D metadata (JSON)
  • Helix encoded MP3 audio (with LAME tag) (MP3)
  • Xing encoded MP3 audio (MP3)
  • Origin Project (Unicode compliant) (OPJU)
  • Sid Meier's Civilization Palette (PAL)
  • Open Brush 3D Sketch (SKETCH)
  • Open Brush 3D (TILT)
  • Lix gadget trigger area (TXT)
  • Lix level (TXT)
  • DANS Easy Metadata (XML)
  • X PixMap bitmap (XPM2) (XPM)
23
Hello trid users,

some days ago i looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified. The samples have the TFM file name suffix.

Other variants exist, so in this session i will handle only the variant with
lh=0x12. I will explain later what this means.

So i ran the trid utility on my TFM samples with lh=0x12. The samples are not
recognized. Many are described as "Unknown!". For the other samples i get
many different descriptions, but all are wrong (see appended trid-v-old.txt in
output). I have many hundreds of TFM samples. It took some time to collect a
dozen non-TFM samples which match the descriptions given to the misidentified
TFM samples.

For comparison i also ran the file format identification utility DROID
(See https://sourceforge.net/projects/droid/). Here the samples are also not
recognized, but i got no false description.

For comparison i also ran the file command (version 5.45) on such
samples. Here all samples are "recognized" and described as "TeX font metric
data". Some more details are also shown. The coding scheme name
{like () (FONTSPECIFIC) (EC Encoding /Cork/) (L7X Encoding /Lithuanian/) (TEX
MATH EXTENSION) (QX Encoding) (UNSPECIFIED)} is shown in parentheses (see
appended file-k-5.45.txt in output). This can be seen more clearly when using
only the tex magic pattern (see appended file-tex-5.45.txt in output).
Unfortunately for most samples i also get another description when using the
keep going option -k of the file command. Even worse, in some samples (like
aebx7.tfm cmvtt10.tfm rtxi.tfm rtxmi.tfm texnansi-qplb.tfm txbsyc.tfm
zpsycmrv.tfm) the wrong description comes first. This can be seen more
clearly when running the file command without the keep going option (see
appended file-5.45.txt in output). For the TFM samples the mime type
application/x-tex-tfm is shown (see appended file-tex-i-5.45.txt in output).
No file name suffix is shown (see appended file-tex-ext-5.45.txt in output).

Luckily i found pages about TeX Font Metrics on the file formats archive team
web site and on Wikipedia. I use the first, because the Wikipedia link is
also mentioned there and furthermore links to download samples are listed.
So the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So i ran tridscan on my samples to generate tfm-tex-0x12.trid.xml.
Afterwards i tried to understand the generated constructs and checked whether
they are always true. According to the specification the six-word (24-byte)
file header contains twelve unsigned 16-bit integers which describe general
TFM characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). According to the
specification i patched the file command (see appended file.tmp in output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
That means that at least three fields (bc,ec,ne) are always lower than 256.
Because the files are stored in big endian format, the upper bytes of these
fields are nil. Apparently nearly all of the other fields are also below
256. So at even offsets we have nil bytes. That is expressed by XML
constructs like:
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>22</Pos>
   </Pattern>
   ...

The only exceptions are the number of words in the kern table and the file
length. The first value is stored at offset 16 as field nk and is sometimes
bigger than 255. Also the file length is sometimes bigger than 255. That value
is stored at offset 0 as field lf in word units. Multiplying this value by 4
gives the file size in bytes.
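The formulas above can be sketched as a small consistency check. This is only
an illustration in Python, not the logic of trid or of the file command. The
function names are made up for this example, and the field order (lf lh bc ec
nw nh nd ni nl nk ne np) is taken from the TFM specification.

```python
import struct

# Field order of the 24-byte TFM header as given in the TFM specification.
TFM_FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def parse_tfm_header(data: bytes) -> dict:
    """Unpack the twelve big endian unsigned 16-bit header fields."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    return dict(zip(TFM_FIELDS, struct.unpack(">12H", data[:24])))


def tfm_header_is_consistent(data: bytes) -> bool:
    """Apply the formulas quoted above to the complete file content."""
    h = parse_tfm_header(data)
    expected_lf = (6 + h["lh"] + (h["ec"] - h["bc"] + 1) + h["nw"] + h["nh"]
                   + h["nd"] + h["ni"] + h["nl"] + h["nk"] + h["ne"] + h["np"])
    return (h["bc"] - 1 <= h["ec"] <= 255
            and h["ne"] <= 256
            and h["lf"] == expected_lf
            and h["lf"] * 4 == len(data))  # lf counts 4-byte words


# Synthetic example (not a real font): lh=18, two characters, tiny tables.
header = struct.pack(">12H", 41, 18, 0, 1, 2, 2, 2, 2, 0, 0, 0, 7)
sample = header + bytes(41 * 4 - len(header))
print(tfm_header_is_consistent(sample))
```

For a real sample the whole file would be read with open(name, "rb").read()
and passed to tfm_header_is_consistent.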

Whether samples are real TFM can be verified by running a command line tool
like:
   tftopl tgoth10.tfm  tgoth10.pl

Now comes the interesting part. At offset 2 the length of the header data is
stored in word units. For most of my inspected TFM samples this value is 18
(=0x12). The samples in this session all have this value. Together with the
upper nil byte of bc (first character code in the font, stored at offset 4)
this significant part is expressed by an XML construct like:
   <Bytes>001200</Bytes>
   <Pos>2</Pos>
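To make clear what such Bytes/Pos pairs mean, here is a minimal sketch in
Python. It is a simplification and not the real matching code of trid. The
name matches_patterns and the list TFM_LH12_PATTERNS are invented for this
example. The three pairs are the ones quoted in this post.

```python
def matches_patterns(data: bytes, patterns) -> bool:
    """Return True if every (pos, hex_bytes) pair is found at its offset."""
    for pos, hex_bytes in patterns:
        wanted = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(wanted)] != wanted:
            return False
    return True


# (Pos, Bytes) pairs quoted above for the lh=0x12 variant.
TFM_LH12_PATTERNS = [(2, "001200"), (20, "00"), (22, "00")]

# Two synthetic 24-byte headers: the first has lh=0x12, the second lh=0x11.
header_lh12 = bytes.fromhex("002900120000000100020002000200020000000000000007")
header_lh11 = bytes.fromhex("002800110000000100020002000200020000000000000007")

print(matches_patterns(header_lh12, TFM_LH12_PATTERNS))
print(matches_patterns(header_lh11, TFM_LH12_PATTERNS))
```

Only the first header matches, because the second one fails at Pos 2.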

As far as i can see there exist less than a dozen variants with other lh
values. In the other variants the lh value differs only a little bit. I will
handle the other variants in a future session.

According to the documentation the header[17] word contains a first byte
called the seven_bit_safe_flag, then two bytes that are ignored, and a fourth
byte called the face. When looking in file.tmp for the seven_bit_safe_flag i
get value 0 or 0x80. For the two ignored/unused bytes i apparently always get
nil. For the face byte i got different values (like 0 0xea 0xee 0xf4). So the
2 unused bytes are expressed by the last construct. That looks like:
   <Bytes>0000</Bytes>
   <Pos>93</Pos>

At offset 33 an ASCII like coding scheme name is stored. For misidentified
samples i often got non-ASCII garbage with octal values at this place. The
maximal string length is 39 and the length value is stored in the byte before.
At offset 73 an ASCII like font family name (like CMR ECBX
DUMMYSPAC-FONTFORGE TEX-PAGD8R-CSC UNSPECIFIED HelveNarBol) is stored. The
maximal string length is 19 and the length value is stored in the byte before.
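Reading these two length-prefixed strings can be sketched like this in
Python. The offsets 32/33 and 72/73 and the limits 39 and 19 are the ones
described above. The function names are only assumptions for this example,
and the sample buffer is synthetic, not a real font.

```python
def read_length_prefixed(data: bytes, length_offset: int, max_len: int) -> str:
    """Read a string whose length is stored in the byte before it."""
    n = data[length_offset]
    if n > max_len:
        raise ValueError(f"length {n} exceeds maximum {max_len}")
    raw = bytes(data[length_offset + 1:length_offset + 1 + n])
    if len(raw) != n:
        raise ValueError("data is truncated")
    # Non-ASCII garbage raises UnicodeDecodeError (a ValueError subclass).
    return raw.decode("ascii")


def tfm_names(data: bytes):
    """Return (coding scheme, font family) of a TFM with lh >= 18."""
    return (read_length_prefixed(data, 32, 39),
            read_length_prefixed(data, 72, 19))


# Synthetic buffer covering the 24-byte file header plus 18 header words.
buf = bytearray(24 + 18 * 4)
buf[32] = len(b"TEX MATH EXTENSION")
buf[33:33 + buf[32]] = b"TEX MATH EXTENSION"
buf[72] = len(b"CMR")
buf[73:73 + buf[72]] = b"CMR"
print(tfm_names(bytes(buf)))
```

A non TFM sample typically fails here with a ValueError, either because the
length byte is too big or because the string is not ASCII.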

With the new definition all TFM samples with header size 0x12 are now
recognized and described (see appended trid-v-new.txt trid-new.txt in
output). The definition is "good" in that it does not misidentify non-TFM
samples. Unfortunately for some TFM samples the description as TeX Font Metric
is not the first. The main reason is that the significant characteristic is
just the 16-bit lh value.

TrID definitions, some samples and output are stored in the archive
tfm_0x12.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned there exist other variants of TFM. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek
24
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on March 27, 2024, 03:14:03 AM »
Updated:
  • DuneGraph Uncompressed bitmap (DGU/DG1)
  • GameBoy Sound System dump (GBS)
  • KiCad EESchema Netlist (NET)
  • Slight Atari Player module (SAP)
  • Cyber Paint Sequence animation (SEQ)
Added:
  • Autohotkey script (v2.x) (AHK)
  • Cyber Paint Cell animation (CEL)
  • Linux Preseed Configuration (CFG/SEED/TXT)
  • DISCO Discovery Document (DISCO)
  • DISCO Discovery Output (DISCOMAP/MAP)
  • Linux LiveCD info (DISKDEFINES)
  • Lucasfilm Games MIDI music (GMD)
  • PiyoPiyo Music (PMD)
  • Atari ST Megamax Modula-2 program/executable (v2) (PRG/ACC)
  • Atari ST Pack-Ice compressed program/executable (PRG/APP/TOS/TTP)
  • Atari Midi Sequencer song (SEQ)
  • Cyber Paint Sequence animation (variant) (SEQ)
  • Lucasfilm Games VOC Sound (SOU)
  • Sound Chip Synth patch (SYN)
Deleted:
  • Slight Atari Player music format (SAP)
25
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on March 25, 2024, 12:47:51 PM »
Updated:
  • 16bit COM executable Graham's TXT2COM (generic) (COM)
  • 16bit COM executable Graham's TXT2COM (v1.0) (COM)
  • 16bit COM executable Graham's TXT2COM (v1.1) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.2) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.03) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.06) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.10) (COM)
Added:
  • Impulse Tracker font (CFG)
  • 16bit COM executable Graham's TXT2COM (v2.0) (COM)
  • 16bit COM executable Graham's TXT2COM (v2.1) (COM)
  • 16bit COM executable Graham's TXT2PAS (v2.03) (COM)
  • 16bit COM executable Graham's TXT2PAS (v2.06) (COM)
  • 16bit COM executable Graham's TXT2PAS (v2.10) (COM)
  • 16bit COM executable Graham's TXT2RES (v1.0) (COM)
  • 16bit COM executable Graham's TXT2RES (v2.03) (COM)
  • 16bit COM executable Graham's TXT2RES (v2.06) (COM)
  • 16bit COM executable Graham's TXT2RES (v2.10) (COM)
  • Impulse Tracker sound Driver (DRV)
  • HALAC lossless compressed audio (v0.x) (HALAC)
  • HALIC bitmap (v0.x) (HALIC)
  • MMCMP compressed module (IT/MOD/S3M/XM)
  • SofTest encrypted answers data (JSON)
  • LEGO Batman 2 Saved Game (LEGOBATMAN2SAVEGAMEDATA)
  • Music Box music (SAV)
  • Silicon Graphics bitmap (SGI/BW/RGB/RGBA)
  • SofTest answers data archive (XMDX)
  • BrickLink XML inventory (XML)
  • MecaBricks XML unknow parts (XML)
Deleted:
  • Silicon Graphics bitmap (generic) (SGI)
  • Artisan Dates (XML)
26
Thanks! Will remove the old generic def and keep this one.
27
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on March 22, 2024, 08:56:06 PM »
Updated:
  • MAGIX data (generic) ()
  • RPM Package source (src.rpm) (RPM)
Added:
  • Coordinate 3D format (C3D)
  • Dynamix 3D data container (C3D)
  • Caligari TrueSpace Object (v1.x) (COB)
  • 16bit COM ComprEXE compressed executable (v1.0) (COM)
  • 16bit COM executable DOCMAKER (v1.2) (COM)
  • NeoPaint Help (HLP)
  • 16bit DOS ComprEXE compressed Executable (v1.0) (EXE)
  • Artisan Mat Style (PAAF)
  • Artisan Personal Art Kit (CCPKPSTM) (PAKIT)
  • Artisan Personal Art Kit (PAKT) (PAKIT)
  • RPM Package (v3.0 source) (RPM/SPM)
  • Caligari TrueSpace Scene (SCN)
  • NeoPaint Settings (SET)
  • CruZer's SFV Checker checksum (SFV)
  • DF CrcSfv checksum (SFV)
  • Easy SFV Creator checksum (v2.x) (SFV)
  • FireSFV checksum (SFV)
  • FlashSFV checksum (SFV)
  • GwildorSFV checksum (v1.x) (SFV)
  • QuickSFV checksum (v2.x) (SFV)
  • SFV32nix checksum (v1.x) (SFV)
  • SFVGold checksum (v1.x) (SFV)
  • SFVManager checksum (v1.x) (SFV)
  • SFVit checksum (v1.x) (SFV)
  • SOURmp3 CheckSFV checksum (SFV)
  • Sour SFV checksum (v1.x) (SFV)
  • WIN-SFV32 checksum (v1.x) (SFV)
  • cksfv checksum (v1.x) (SFV)
  • Artisan Skin (SKINX)
  • Dynamix SPT game data container (SPT)
  • Artisan Dates (XML)
Deleted:
  • Coordinate 3D (subset of ADTech File Format) file (common) (C3D)
  • Coordinate 3D (subset of ADTech File Format) file (more generic) (C3D)
  • RoboPlay Player plugin (PLY) (duplicated)
28
TrID File Identifier / replacement bitmap-sgi.trid.xml for Silicon Graphics bitmap
« Last post by jenderek on March 21, 2024, 05:55:13 PM »
Hello trid users,

some days ago i looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified as Silicon Graphics bitmap.

So i ran the trid utility on such graphics and related files. All samples
are at least described as "Silicon Graphics bitmap (generic)" by
bitmap-sgi-generic.trid.xml. No mime type and no reference are shown. SGI is
shown as suffix. Most RGB samples are described with higher
priority as "Silicon Graphics RGB bitmap" by
bitmap-sgi-rgb.trid.xml. Here RGB is listed as suffix. Many BW samples
are described with higher priority as "Silicon Graphics B/W bitmap" by
bitmap-sgi-bw.trid.xml. Here BW is shown as suffix. The TFM samples are
misidentified as Silicon Graphics bitmap. These few samples are in
reality TeX font metrics (see appended trid-v-old.txt in output).

To check whether samples are really SGI graphics you can use the command line
tools of some graphical software (like ImageMagick, XnView) by lines like:
   identify -verbose *.sgi *.tfm
   nconvert -in sgi -info *.sgi *.tfm

Then real graphics are described as "SGI" or "SGI (Irix RGB image)" with
dimensions by ImageMagick (see appended identify-verbose.txt identify.txt in
output) and as sgi or "SGI RGB" with correct dimensions by XnView (see
appended nconvert-info.txt).

For comparison i also ran the file format identification utility DROID
(See https://sourceforge.net/projects/droid/). Here most samples are
described as "Silicon Graphics Image" by PUID x-fmt/140. Here mime
type image/x-sgi-bw is listed. The artificial samples with 2 and 5
channels are skipped. Also the TFM samples are not misidentified.
Furthermore only the file name suffixes RGB and BW are considered as valid
here. The 2 suffixes RGBA and SGI are considered as invalid (see appended
droid-sgi.csv in output).

For comparison i also ran the file command (version 5.45) on such samples.
Here the samples are also recognized. The graphics are described as "SGI
image data". The TFM samples are correctly described as "TeX font metric data"
(see appended file-5.45.txt in output). But with the keep going option the TFM
samples are also wrongly described as SGI image data with invalid "0-D"
dimension and "high" 16 channels (see appended file-k-5.45.txt in
output). The variant patched according to the specification now shows correct
information (see appended file-ext.tmp file-i.tmp file.tmp in output). The
mime type application/x-tex-tfm is shown for TFM samples and the generic
application/octet-stream for graphics (see appended file-i-5.45.txt in
output). No file name suffix is shown (see appended file-ext-5.45.txt in
output).

On Linux, according to the shared MIME-info database, the samples are called
"SGI image". Here image/x-sgi is shown as mime type and only sgi is listed as
suffix. That information can be seen in the freedesktop.org.xml.in source
found for example on gitlab.freedesktop.org.

Luckily i found information about this graphic file format on the archive team
web site and on Wikipedia. That is expressed inside the new definition
bitmap-sgi.trid.xml by a line like:

   <RefURL>http://fileformats.archiveteam.org/wiki/SGI_(image_file_format)</RefURL>

A link to the Wikipedia page is also mentioned there. The advantage is that
download links to samples and software are listed there as well.

So i ran tridscan on my inspected source samples to get the new definition
bitmap-sgi.trid.xml as replacement for bitmap-sgi-generic.trid.xml. The four
file name suffixes are expressed by a line like:

   <Ext>BW/RGB/RGBA/SGI</Ext>

On Wikipedia the 2 suffixes INT and INTA are also mentioned. Unfortunately i
found no such samples on my system. So at the moment i do not add these 2
suffixes.

On Wikipedia image/sgi is listed as mime type, but this is not officially
registered at IANA. So i chose what is used on Linux systems. This mime
type is expressed by a line like:

   <Mime>image/x-sgi</Mime>

So i looked at the generated patterns and tried to understand and refine them
by looking at the specifications. The first construct looks like:

   <Bytes>01DA</Bytes>
   <Pos>0</Pos>

According to the documentation that is the magic pattern for such graphics.
This is used inside bitmap-sgi-generic.trid.xml. Unfortunately a 2 byte
pattern is not unique enough. So under bad circumstances it also matches other
file formats like some TeX font metrics. So more patterns are needed.

The second construct looks like:

   <Bytes>00</Bytes>
   <Pos>4</Pos>

According to the documentation the number of dimensions is stored at offset 4
as a 2 byte big endian integer. Three values are allowed. 1 means a single
scanline, 2 means XSIZExYSIZE dimensions and 3 means XSIZExYSIZExZSIZE
dimensions. That means the upper byte is not used and therefore always nil.
The DROID tool explicitly checks for these allowed values and thereby skips
the TFM samples with the invalid value 0.
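The stricter check described for DROID can be sketched as follows. This is
only an illustration in Python and not the DROID signature itself. The
function name looks_like_sgi is invented here. The TFM example assumes a font
with lf = 474 words (= 0x01DA, that is 1896 bytes), lh = 18 and bc = 0, which
is one way such a collision can come about.

```python
import struct


def looks_like_sgi(data: bytes) -> bool:
    """Check the magic at offset 0 and the dimension value at offset 4."""
    if len(data) < 6:
        return False
    # ">H2xH": magic (2 bytes), skip 2 bytes at offsets 2-3, dimension (2 bytes)
    magic, dimension = struct.unpack_from(">H2xH", data, 0)
    return magic == 0x01DA and dimension in (1, 2, 3)


# Start of an SGI header: magic, two 1-byte fields at offsets 2-3, dimension 3.
sgi_start = bytes.fromhex("01da01010003")
# Start of an assumed TFM header: lf=0x01DA, lh=0x0012, bc=0x0000.
tfm_start = bytes.fromhex("01da00120000")

print(looks_like_sgi(sgi_start))
print(looks_like_sgi(tfm_start))
```

Both byte sequences start with 01DA, but only the first passes because the
second has the value 0 at offset 4.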

Third XML construct looks like:

   <Bytes>00</Bytes>
   <Pos>10</Pos>

According to the documentation the number of channels is stored at offset 10
as a 2 byte big endian integer. Value 1 means black and white. The highest
observed value in my samples was 4. That means RGB plus ALPHA channel. If i
understand the documentation right it may be possible to have samples with
more channels. For example i can imagine an animated RGBA, where an
additional time component is added and the channel number would be 5. So the
channel number is probably always lower than 256. That means the upper byte is
probably always nil and the third XML construct is true.


The next XML construct looks like:

   <Bytes>000000</Bytes>
   <Pos>12</Pos>

According to the documentation the minimum pixel value in the image is stored
at offset 12 as a 4 byte big endian integer PIXMIN. Often this value is 0 or
low, but i can imagine that there exist samples where this value reaches the
maximum. So i delete that pattern.


Fourth XML construct looks like:

   <Bytes>0000</Bytes>
   <Pos>16</Pos>

According to the documentation the maximum pixel value in the image is stored
at offset 16 as a 4 byte big endian integer PIXMAX. Often this value is 255 or
similar, but i can imagine that there exist samples where this value reaches
the maximum. So i delete that pattern.

Fifth XML construct looks like:

   <Bytes>00000000</Bytes>
   <Pos>20</Pos>

According to the documentation 4 unused bytes are stored at offset 20. In my
examples the value is zero. I assume that this is always true. So i keep that
pattern.

The XML construct number six looks like:

   <Bytes>000000000000000000000000000000000000000000000000</Bytes>
   <Pos>87</Pos>

According to the documentation an image name can be stored at offset 24 in 80
bytes. This ASCII string is null terminated. At offset 104 COLORMAP
is stored as a 4 byte big endian integer. Allowed values are in the range 0-3.
So the 3 upper bytes of COLORMAP are always nil. Assuming that the string
reaches maximal length, only the terminating nil byte and the 3 upper bytes of
COLORMAP will survive. So the construct will shrink and become like:

   <Bytes>00000000</Bytes>
   <Pos>103</Pos>
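Why exactly these 4 bytes at offset 103 survive can be checked with a short
Python sketch. The header layout below follows the offsets described in this
post (image name at 24, COLORMAP at 104). The name SGI_HEADER and the example
values (640x480, 3 channels, a name made of 79 letters) are assumptions made
only for this illustration.

```python
import struct

# magic, 2 single bytes at offsets 2-3, dimension, xsize, ysize, zsize,
# pixmin, pixmax, 4 unused bytes, 80-byte image name, colormap
SGI_HEADER = struct.Struct(">HBBHHHHII4x80sI")  # covers offsets 0..107

# Worst case for the pattern: the name uses all 79 characters, so only the
# terminating nil byte at offset 103 is left inside the name field.
longest_name = b"A" * 79

header = SGI_HEADER.pack(0x01DA, 1, 1, 3, 640, 480, 3, 0, 255,
                         longest_name, 3)

print(SGI_HEADER.size)          # size of the described header part
print(header[103:107].hex())    # name terminator + 3 upper COLORMAP bytes
```

Even with the longest possible name and the highest COLORMAP value the bytes
at offsets 103 to 106 stay nil, so the shortened pattern still matches.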

According to the documentation the bytes from offset 108 to 511 are dummy
bytes to pad the header to 512 bytes. Some documents say that these should be
set to nil. That is often true, but in some samples some bytes are not nil. So
i do not rely on the existence of nil bytes in that area and delete the
corresponding patterns. These look like:


   <Pattern>
      <Bytes>00</Bytes>
      <Pos>112</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000</Bytes>
      <Pos>114</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>120</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000000000000000</Bytes>
      <Pos>122</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000</Bytes>
      <Pos>136</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000000000000000000000000000000
      <Pos>152</Pos>
   </Pattern>


With the new definition all my graphic bitmaps are still recognized and
described with the correct mime type. Now the TFM samples are not
misidentified (see appended trid-v-new.txt in output).

TrID definitions, some samples and output are stored in the archive
sgi_tfm.zip. I hope that my definition can be used in a future version of
triddefs.

Then of course the other TrID definitions must be updated. Unfortunately i can
not do this because i have too few samples, especially samples with INT and
INTA suffix. I am also not sure about samples with more channels.

There are no definitions for TeX Font Metric. Unfortunately for TFM samples
there exists no unique and long pattern. So i will need some time to do this
work in the future.

With best wishes
Jörg Jenderek

29
TrID File Identifier / Re: ark-rpm-src-v30.trid.xml for source variant of RPM Package
« Last post by Mark0 on March 20, 2024, 10:48:44 PM »
Thanks Joerg!
30
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on March 19, 2024, 02:02:44 PM »
Added:
  • WiredTiger journal (0000000001)
  • Amazon game data (Atari ST) (CST)
  • Intelligent Games Ltd resource data (HDR/RES)
  • TETRIS! High score (HI)
  • WiredTiger Lock (LOCK)
  • Intelligent Games Video/cutscene (MOV)
  • RoboPlay Player plugin (PLY)
  • Atari ST Atomik packed program/executable (v3.5) (PRG/TOS)
  • Atari ST HiSoft BASIC compiled program/executable (PRG/TOS)
  • Atari ST LHArc SFX archive (v3.10) (PRG/TOS)
  • Atari ST STOS BASIC compiled program/executable (PRG)
  • Atari ST TDI Modula-2/ST compiled program/executable (PRG/ACC)
  • Steel Panthers Shapes data (SHP)
  • WiredTiger data  (WT)