Topic: replacement bitmap-sgi.trid.xml for Silicon Graphics bitmap

jenderek

« on: March 21, 2024, 05:55:13 PM »
Hello TrID users,

some days ago I looked at the content of an exotic CD-ROM. It also contains
samples which are misidentified as Silicon Graphics bitmap.

So I ran the trid utility on such graphics and related files. All samples
are at least described as "Silicon Graphics bitmap (generic)" by
bitmap-sgi-generic.trid.xml. No mime type and no reference are shown, and
SGI is shown as suffix. Most RGB samples are described with higher
priority as "Silicon Graphics RGB bitmap" by bitmap-sgi-rgb.trid.xml,
where RGB is listed as suffix. Many BW samples are described with higher
priority as "Silicon Graphics B/W bitmap" by bitmap-sgi-bw.trid.xml,
where BW is shown as suffix. The TFM samples are misidentified as Silicon
Graphics bitmap. These few samples are in reality TeX font metrics (see
appended trid-v-old.txt in output).

To check whether samples are really SGI graphics you can use the command line
tools of some graphics software (like ImageMagick, XnView) with lines like:
   identify -verbose *.sgi *.tfm
   nconvert -in sgi -info *.sgi *.tfm

Real graphics are then described as "SGI" or "SGI (Irix RGB image)" with
dimensions by ImageMagick (see appended identify-verbose.txt and identify.txt
in output) and as sgi or "SGI RGB" with correct dimensions by XnView (see
appended nconvert-info.txt).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here most samples are
described as "Silicon Graphics Image" by PUID x-fmt/140, and mime
type image/x-sgi-bw is listed. The artificial samples with 2 and 5
channels are skipped. The TFM samples are also not misidentified.
Furthermore only the file name suffixes RGB and BW are considered valid
here. The 2 suffixes RGBA and SGI are considered invalid (see appended
droid-sgi.csv in output).

For comparison I also ran the file command (version 5.45) on such samples.
Here the samples are also recognized. The graphics are described as "SGI
image data". The TFM samples are correctly described as "TeX font metric data"
(see appended file-5.45.txt in output). But with the keep going option (-k)
the TFM samples are also wrongly described as SGI image data with an invalid
"0-D" dimension and a "high" count of 16 channels (see appended
file-k-5.45.txt in output). A variant patched according to the specification
now shows correct information (see appended file-ext.tmp, file-i.tmp and
file.tmp in output). The mime type application/x-tex-tfm is shown for TFM
samples and the generic application/octet-stream for graphics (see appended
file-i-5.45.txt in output). No file name suffix is shown (see appended
file-ext-5.45.txt in output).

On Linux, according to the shared MIME-info database, the samples are called
"SGI image". Here image/x-sgi is shown as mime type and only sgi is listed as
suffix. That information can be seen in the freedesktop.org.xml.in source
found for example on gitlab.freedesktop.org.

Luckily I found information about this graphic file format on the Archive Team
web site and on Wikipedia. That is expressed inside the new definition
bitmap-sgi.trid.xml by a line like:

   <RefURL>http://fileformats.archiveteam.org/wiki/SGI_(image_file_format)</RefURL>

A link to the Wikipedia page is also mentioned there. The advantage is that
download links to samples and software are listed there as well.

So I ran tridscan on my inspected source samples to get the new definition
bitmap-sgi.trid.xml as replacement for bitmap-sgi-generic.trid.xml. The four
file name suffixes are expressed by a line like:

   <Ext>BW/RGB/RGBA/SGI</Ext>

On Wikipedia the 2 suffixes INT and INTA are also mentioned. Unfortunately I
found no such samples on my system. So at the moment I do not add these 2
suffixes.

On Wikipedia image/sgi is listed as mime type, but this is not officially
registered at IANA. So I chose what is used on Linux systems, image/x-sgi.
This mime type is expressed by a line like:

   <Mime>image/x-sgi</Mime>

So I looked at the generated patterns and tried to understand and refine them
by looking at the specifications. The first construct looks like:

   <Bytes>01DA</Bytes>
   <Pos>0</Pos>

According to the documentation that is the magic pattern for such graphics.
This is what bitmap-sgi-generic.trid.xml uses. Unfortunately a 2 byte pattern
is not unique enough. Under bad circumstances it also matches other file
formats, like some TeX font metrics. So more patterns are needed.
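
The collision can be sketched in a few lines of Python. This is only an
illustration: it assumes that a TFM file starts with its own length in 4 byte
words as a 16 bit big endian integer, so a TFM of 474 words (1896 bytes)
begins with the same bytes 01 DA. Both byte strings are synthetic and not
taken from the samples in the archive.

```python
import struct

SGI_MAGIC = bytes.fromhex("01DA")

def magic_only_match(data: bytes) -> bool:
    """Return True if data starts with the 2 byte SGI magic."""
    return data[:2] == SGI_MAGIC

# Synthetic SGI header start: MAGIC, STORAGE=1, BPC=1, DIMENSION=3.
sgi_start = struct.pack(">HBBH", 0x01DA, 1, 1, 3)

# Synthetic TFM start (assumed layout): file length 474 words,
# header length 18 words, smallest character code 0.
tfm_start = struct.pack(">HHH", 474, 18, 0)

# Both starts pass a test that looks at the magic bytes only.
print(magic_only_match(sgi_start))
print(magic_only_match(tfm_start))
```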

The second construct looks like:

   <Bytes>00</Bytes>
   <Pos>4</Pos>

According to the documentation the number of dimensions is stored at offset 4
as a 2 byte big endian integer. Three values are allowed: 1 means a single
scanline, 2 means XSIZExYSIZE dimensions and 3 means XSIZExYSIZExZSIZE
dimensions. That means the upper byte is not used and therefore always nil.
The DROID tool explicitly checks for these allowed values and thereby skips
the TFM samples with the invalid value 0.
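
The difference between the fixed byte pattern and a value check can be
sketched as follows. This is a minimal Python illustration, not the code of
TrID or DROID; the function names and the two synthetic header starts are
made up for this example.

```python
import struct

def read_dimension(header: bytes) -> int:
    """Return the 2 byte big endian DIMENSION field stored at offset 4."""
    (dimension,) = struct.unpack_from(">H", header, 4)
    return dimension

def upper_byte_is_zero(header: bytes) -> bool:
    """The check that a fixed pattern 00 at position 4 can express."""
    return header[4] == 0

def dimension_is_allowed(header: bytes) -> bool:
    """The stricter value check: only 1, 2 and 3 are allowed."""
    return read_dimension(header) in (1, 2, 3)

# Synthetic header starts: MAGIC, STORAGE, BPC, DIMENSION.
rgb_like = struct.pack(">HBBH", 0x01DA, 1, 1, 3)
zero_dim = struct.pack(">HBBH", 0x01DA, 0, 18, 0)

for name, data in (("rgb_like", rgb_like), ("zero_dim", zero_dim)):
    print(name, upper_byte_is_zero(data), dimension_is_allowed(data))
```

A header with value 0 passes the byte pattern but fails the value check,
which is why the byte pattern at offset 4 alone does not exclude such files.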

The third XML construct looks like:

   <Bytes>00</Bytes>
   <Pos>10</Pos>

According to the documentation the number of channels is stored at offset 10
as a 2 byte big endian integer. Value 1 means black and white. The highest
value observed in my samples was 4, which means RGB plus an alpha channel. If
I understand the documentation right, samples with more channels may be
possible. For example I can imagine an animated RGBA, where an additional time
component is added and the channel number would be 5. So the channel
number is probably always lower than 256. That means the upper byte is
probably always nil and the third XML construct is true.
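
As a small illustration of that reasoning, the following Python sketch reads
the channel count and shows for which values the upper byte stays nil. The
helper make_start and the image size 64x48 are hypothetical, and the name
"RGB" for 3 channels is an assumption based on the format description.

```python
import struct

# Names for the channel counts discussed here; other counts are
# reported as a plain number.
CHANNEL_NAMES = {1: "B/W", 3: "RGB", 4: "RGBA"}

def read_channels(header: bytes) -> int:
    """Return the 2 byte big endian channel count stored at offset 10."""
    (zsize,) = struct.unpack_from(">H", header, 10)
    return zsize

def describe_channels(header: bytes) -> str:
    zsize = read_channels(header)
    return CHANNEL_NAMES.get(zsize, f"{zsize} channels")

def make_start(zsize: int) -> bytes:
    """Hypothetical helper: first 12 header bytes of a 64x48 image."""
    return struct.pack(">HBBHHHH", 0x01DA, 1, 1, 3, 64, 48, zsize)

# The byte at offset 10 is nil for every channel count below 256.
for zsize in (1, 4, 5, 255, 256):
    start = make_start(zsize)
    print(describe_channels(start), start[10] == 0)
```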


The next XML construct looks like:

   <Bytes>000000</Bytes>
   <Pos>12</Pos>

According to the documentation the minimum pixel value in the image is stored
at offset 12 as the 4 byte big endian integer PIXMIN. Often this value is 0 or
low, but I can imagine that samples exist where this value reaches the
maximum. So I deleted that pattern.


The next XML construct looks like:

   <Bytes>0000</Bytes>
   <Pos>16</Pos>

According to the documentation the maximum pixel value in the image is stored
at offset 16 as the 4 byte big endian integer PIXMAX. Often this value is 255
or similar, but I can imagine that samples exist where this value reaches the
maximum. So I deleted that pattern.

The next XML construct looks like:

   <Bytes>00000000</Bytes>
   <Pos>20</Pos>

According to the documentation 4 unused bytes are stored at offset 20. In my
examples the value is zero. I assume that this is always true. So I keep that
pattern.

The next XML construct looks like:

   <Bytes>000000000000000000000000000000000000000000000000</Bytes>
   <Pos>87</Pos>

According to the documentation an image name can be stored at offset 24 in 80
bytes. This ASCII string is null terminated. At offset 104 COLORMAP
is stored as a 4 byte big endian integer. Allowed values are in the range 0-3.
So the 3 upper bytes of COLORMAP are always nil. Assuming that the string
reaches its maximal length, only the terminating nil byte and the 3 upper
bytes of COLORMAP will survive. So the construct will shrink and become like:

   <Bytes>00000000</Bytes>
   <Pos>103</Pos>
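
The arithmetic behind the new position 103 can be checked with a short Python
sketch. The constants simply restate the field layout given above; the
variable names are made up for this example.

```python
# Field layout as described above.
NAME_OFFSET, NAME_SIZE = 24, 80          # image name, null terminated
COLORMAP_OFFSET, COLORMAP_SIZE = 104, 4  # big endian, small values only

# Bytes covered by the generated pattern: 24 bytes starting at offset 87.
generated = set(range(87, 87 + 24))

# Bytes that stay nil even when the name uses its full length:
# the last byte of the name field and the 3 upper bytes of COLORMAP.
terminator = NAME_OFFSET + NAME_SIZE - 1
colormap_upper = range(COLORMAP_OFFSET, COLORMAP_OFFSET + COLORMAP_SIZE - 1)
always_nil = {terminator} | set(colormap_upper)

surviving = sorted(generated & always_nil)
print(surviving[0], len(surviving))  # 103 4
```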

According to the documentation the bytes from offset 108 to 511 are dummy
bytes that pad the header to 512 bytes. Some documents say that these should
be set to nil. That is often true, but in some samples some bytes are not nil.
So I do not rely on the existence of nil bytes in that area and deleted the
corresponding patterns. These look like:


   <Pattern>
      <Bytes>00</Bytes>
      <Pos>112</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000</Bytes>
      <Pos>114</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>120</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00000000000000000000</Bytes>
      <Pos>122</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000</Bytes>
      <Pos>136</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000000000000000000000000000000...</Bytes>
      <Pos>152</Pos>
   </Pattern>


With the new definition all my graphic bitmaps are still recognized and
described with the correct mime type. The TFM samples are no longer
misidentified (see appended trid-v-new.txt in output).
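
The combined effect of the patterns kept above can be sketched in Python.
This is only an illustration under assumptions: it treats a definition as
matching when every byte pattern is found at its position, which simplifies
what TrID really does, and it lists only the patterns discussed in this post.
The helper make_sgi_header and all field values of the TFM-like data are
invented.

```python
import struct

# (position, bytes) pairs kept after the refinement described above.
PATTERNS = [
    (0, bytes.fromhex("01DA")),  # magic
    (4, b"\x00"),                # upper byte of the dimension field
    (10, b"\x00"),               # upper byte of the channel count
    (20, bytes(4)),              # 4 unused bytes
    (103, bytes(4)),             # name terminator + 3 upper COLORMAP bytes
]

def matches_patterns(data: bytes) -> bool:
    """Return True if every pattern is found at its position."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in PATTERNS)

def make_sgi_header(zsize: int = 3, name: bytes = b"sample") -> bytes:
    """Hypothetical helper: build a synthetic 512 byte SGI header."""
    header = bytearray(512)
    struct.pack_into(">HBBHHHH", header, 0, 0x01DA, 1, 1, 3, 64, 48, zsize)
    struct.pack_into(">II", header, 12, 0, 255)  # PIXMIN, PIXMAX
    header[24:24 + len(name)] = name             # at most 79 bytes + nil
    return bytes(header)

# Invented TFM-like data: twelve 16 bit fields, padded with non-nil bytes.
tfm_like = struct.pack(
    ">12H", 474, 18, 0, 127, 100, 16, 16, 64, 200, 50, 0, 7
).ljust(1896, b"\x01")

print(matches_patterns(make_sgi_header()))
print(matches_patterns(make_sgi_header(4, b"n" * 79)))
print(matches_patterns(tfm_like))
```

In this invented example the TFM-like data passes the first three patterns
and fails at position 20. Real TFM samples may fail at a different pattern.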

The TrID definitions, some samples and the output are stored in the archive
sgi_tfm.zip. I hope that my definition can be used in a future version of
triddefs.

Then of course the other TrID definitions must be updated. Unfortunately I can
not do this because I have too few samples, especially samples with the INT
and INTA suffixes. I am also not sure about samples with more channels.

There are no definitions for TeX font metrics yet. Unfortunately no unique
and long pattern exists for TFM samples. So I will need some time to do this
work in the future.

With best wishes
Jörg Jenderek


Mark0

« Reply #1 on: March 22, 2024, 11:50:37 PM »
Thanks! Will remove the old generic def and keep this one.