Author Topic: replacing/deleting bitmap-bmp.trid.xml bitmap-rle-bmp.trid.xml for Windows Bitma  (Read 714 times)

jenderek

Hello trid users,

Some days ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One of its menu items checks for
bad extensions. After running the tool I looked through the saved
result list results_bad_extensions.txt for examples.

One listed extension is BMP, the extension for Windows bitmaps.
czkawka considers some samples to be Windows bitmaps although they do
not have the expected suffix BMP or DIB.

So I ran the trid utility on these "bitmap" examples. All my examples
(8531 including duplicates) are described with low priority as
"Windows Bitmap" by bitmap-bmp.trid.xml. All real bitmap examples are
also described by one of 8 definitions that carry version information:
   bitmap-bmp-v1.trid.xml
   bitmap-bmp-v2a.trid.xml
   bitmap-bmp-v2o.trid.xml
   bitmap-bmp-v2.trid.xml
   bitmap-bmp-v3a.trid.xml
   bitmap-bmp-v3.trid.xml
   bitmap-bmp-v4.trid.xml
   bitmap-bmp-v5.trid.xml
See appended output/trid-v-old.txt. At least 2 examples, CLIPPER.BM_
and DEFAULT.COL, are definitely not bitmap graphics. This can be
verified with command line graphic tools (XnView's nconvert and
ImageMagick's identify) via command lines like:
    nconvert -info *   >   nconvert.txt
    identify   *   >   identify.txt
So we know that these are really not bitmaps. Looking inside
bitmap-bmp.trid.xml, we see that it contains only one pattern:
   <Bytes>424D</Bytes>
   <ASCII> B M</ASCII>
   <Pos>0</Pos>

This is too generic, because only 16 bits are used for recognition.
According to the file command documentation, at least 32 bits should
be used for reliable recognition. As it stands, every file that starts
with the 2 byte string BM is described as Windows Bitmap. This is the
case for sample DEFAULT.COL, which is just a text file whose first
line looks like: BMP 255 0 0
This example is found in the SequoiaView program directory.
SequoiaView is a disk space visualization tool found for example at
www.win.tue.nl/sequoiaview.
The sample CLIPPER.BM_ is the cazip expanded sample found on the
installation medium for Clipper software (version 5.3).
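
A stricter check of this kind can be sketched in Python. This is only
an illustration, not how TrID or the file command work internally. The
function name looks_like_bmp and the set of accepted header sizes (the
sizes named in this post) are assumptions, and only the first line of
the text sample comes from DEFAULT.COL; the rest is filler.

```python
import struct

# DIB header sizes named in this post (assumption: no other sizes are valid).
KNOWN_DIB_SIZES = {12, 16, 24, 40, 48, 52, 56, 64, 108, 124}

def looks_like_bmp(data: bytes) -> bool:
    """Check the 'BM' magic plus the 32-bit DIB header size at offset 14."""
    if len(data) < 18 or data[:2] != b"BM":
        return False
    (dib_size,) = struct.unpack_from("<I", data, 14)
    return dib_size in KNOWN_DIB_SIZES

# Text file in the style of DEFAULT.COL; the repeated line is made-up filler.
text_sample = b"BMP 255 0 0\n" * 2
# Hypothetical minimal header: magic, file size, 2 reserved fields,
# pixel offset, DIB header size 40.
bmp_sample = b"BM" + struct.pack("<IHHII", 54, 0, 0, 54, 40)

print(looks_like_bmp(text_sample), looks_like_bmp(bmp_sample))
```

Both samples start with BM, so a 2 byte test accepts both. Only the
header size test tells them apart.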

For comparison I checked these examples with the file command
(version 5.44). Here CLIPPER.BM_ and DEFAULT.COL are not misidentified,
and the others are described correctly as "PC bitmap" with a sub
classification phrase like ", Windows " or ", OS/2" (see appended
output/file-5.44.txt). The correct mime type image/bmp is reported
(see appended output/file-i-5.44.txt). This utility usually expects
the suffix bmp or rle (see appended output/file-ext-5.44.txt).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). It does not
misidentify CLIPPER.BM_ and DEFAULT.COL. Most of the real bitmaps are
described as "Windows Bitmap" or "OS/2 Bitmap", with the correct mime
type image/bmp. DROID also does sub classification: the version field
lists something like 2.0, 3.0, 4.0 or 5.0. The variants for Adobe
Photoshop (rgb32h52.bmp rgba32h56.bmp) and newer Windows
(wix_ui_dialog.bmp) are not recognized. Only BMP is considered a
valid suffix. The others (OS2 PQG WBF hcp spb SYS) are considered
"bad" (see EXTENSION_MISMATCH true in output/droid-bmp.csv).

So I ran tridscan on real bitmap samples to generate the replacement
bitmap-bmp-all.trid.xml. I had to take care not to add "bad" bitmap
examples. Samples like os2-invalid-bpp.bmp or 1241729-1.bmp serve as
examples of corrupted bitmaps or are used for crash tests. The
samples x-fmt-270-signature-id-505.bmp fmt-118-signature-id-120.bmp
fmt-119-signature-id-121.bmp are used by the DROID tool to recognize
bitmaps. Therefore these samples contain only some leading bytes and
sometimes contain "nonsense" values in some fields. The sample
TEAMcol-64.bmp is at the edge of validity: some programs like
IrfanView can open it, but others like XnView or ImageMagick Display
cannot.

Then I looked inside the definition to see which patterns occur and
tried to understand why. The first pattern is the same as in
bitmap-bmp.trid.xml. The third pattern looks like:
   <Bytes>000000</Bytes>
   <Pos>15</Pos>
That is what all sub classification variants have in common. In
bitmap-bmp-v2.trid.xml for Windows Bitmap (v2) we get here:
   <Bytes>40000000</Bytes>
   <Pos>14</Pos>
And in bitmap-bmp-v3.trid.xml for Windows Bitmap (v3) we get here:
   <Bytes>28000000</Bytes>
   <Pos>14</Pos>

According to the documentation, the DIB header size is stored at
offset 14 as a 4 byte little endian integer. So the above patterns
mean a DIB header size of 64 or 40. The documentation mentions only
eight DIB header sizes:
12   0Ch   bitmap-bmp-v1.trid.xml
16   10h   bitmap-bmp-v2o.trid.xml
40   28h   bitmap-bmp-v3.trid.xml
52   34h   bitmap-bmp-v2a.trid.xml
56   38h   bitmap-bmp-v3a.trid.xml
64   40h   bitmap-bmp-v2.trid.xml
108   6Ch   bitmap-bmp-v4.trid.xml
124   7Ch   bitmap-bmp-v5.trid.xml
The highest value is 124, which is 7C hexadecimal. So the upper 3
bytes of this field are never used and are always zero. That means
the third pattern matches every one of these variants.
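
The relation between the header sizes and the byte patterns can be
checked with a few lines of Python. This is a minimal sketch; the
dictionary repeats the table above, and the helper name
dib_size_pattern is hypothetical.

```python
import struct

# DIB header sizes and definition names as listed in the table above.
DIB_SIZES = {
    12: "bitmap-bmp-v1", 16: "bitmap-bmp-v2o", 40: "bitmap-bmp-v3",
    52: "bitmap-bmp-v2a", 56: "bitmap-bmp-v3a", 64: "bitmap-bmp-v2",
    108: "bitmap-bmp-v4", 124: "bitmap-bmp-v5",
}

def dib_size_pattern(size: int) -> str:
    """Hex string of the 4-byte little-endian field stored at offset 14."""
    return struct.pack("<I", size).hex().upper()

for size, name in sorted(DIB_SIZES.items()):
    print(f"{size:3d}  {dib_size_pattern(size)}  {name}.trid.xml")

# Every listed size fits in one byte, so positions 15..17 are zero in all.
upper_bytes_always_zero = all(
    dib_size_pattern(size)[2:] == "000000" for size in DIB_SIZES
)
print(upper_bytes_always_zero)
```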

According to file command there may exist 2 additional variants:
24   18h   OS/2 2.x format (DIB header size=24)   bitmap-bmp-v2-24.trid.xml
48   30h   OS/2 2.x format (DIB header size=48)   bitmap-bmp-v2-48.trid.xml
I am not sure whether this is really true, because I have not found
such images myself. Just in case such exotic variants exist, I
created two additional definitions, bitmap-bmp-v2-24.trid.xml and
bitmap-bmp-v2-48.trid.xml.

That information can be found on the BMP page of the file formats
archive team web site. This is indicated by a line like:
   <RefURL>http://fileformats.archiveteam.org/wiki/BMP</RefURL>

If all BMP variants are now described by specific TrID definitions
with sub classification, then of course generic definitions like
bitmap-bmp.trid.xml or bitmap-bmp-all.trid.xml are no longer needed,
and the misidentifications described above are probably avoided.

The second XML construct inside bitmap-bmp-all.trid.xml looks like:
   <Bytes>00</Bytes>
   <Pos>12</Pos>

According to the documentation, the starting address of the bitmap
image data (pixel array) is stored at offset 10 as a 4 byte little
endian integer. That information is also shown by the file command
via the phrase ", bits offset". Here I often get low values like:
   1078 142 122 118 94 74 70 66 54   

The highest value here, 1078, is 0436 hexadecimal, so the 2 upper
bytes are zero. This is not surprising, because after the 2 header
parts comes at most a color table (1024 bytes for 256 entries of 4
bytes each), and at most a small gap to align the following parts on
machine suited boundaries.

The sample TEAMcol-64.bmp is a "gray zone". Here the stored offset
value is 201326686, which is 0C00005E hexadecimal. That points beyond
the real file size. So most tools do not accept the file, but a few
like IrfanView do. As a consequence, the highest byte of this field
is no longer always zero in the definition bitmap-bmp-all.trid.xml.
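
The effect of such an offset on the stored bytes can be shown in a
small Python sketch. The helper names and the sanity rule "offset must
lie inside the file" are assumptions for illustration; the 10000 byte
file size in the last line is made up, because the real size of
TEAMcol-64.bmp is not given here.

```python
import struct

def pixel_offset_bytes(offset: int) -> bytes:
    """The 4 bytes stored at file offset 10 for a given pixel-array offset."""
    return struct.pack("<I", offset)

def offset_is_plausible(offset: int, file_size: int) -> bool:
    """Assumed sanity rule: the pixel array must start inside the file."""
    return offset < file_size

for value in (54, 1078, 201326686):
    raw = pixel_offset_bytes(value)
    # raw[2] is file position 12, raw[3] is file position 13.
    print(value, raw.hex().upper(),
          "pos 12 zero:", raw[2] == 0, "pos 13 zero:", raw[3] == 0)

print(offset_is_plausible(201326686, 10_000))
```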

Then a problem arises with the extensions. The standard suffix is
BMP, but the BMP page of the file formats archive team web site lists
more extensions.

Here I get 2 groups. The first group contains samples with
explainable extensions, where another suffix is used to emphasize a
special aspect of the bitmap. For Run Length Encoded (RLE) compressed
PC bitmap images the suffix RLE is mentioned. My few RLE samples like
EGALOGO.RLE were described with highest priority as "Run Length
Encoded bitmap" by bitmap-rle-bmp.trid.xml. Now I use a kind of
ping-pong tactic to improve things. When I look at such samples with
the file command, the additional aspect is expressed by a phrase like
"2 compression," or "1 compression," and 2 suffixes are listed by the
phrase "bmp/rle" (file-5.44.txt file-ext-5.44.txt in rle_/output).
When I look for such samples on my system I get around 79. After
skipping crash test and corrupted examples, around 14 are left. A few
are already described by bitmap-rle-bmp.trid.xml, like:
   field.bmp
   pastoral.bmp
   room.bmp
   NASABALL.BMP
But a few are not like:
   pal4rle.bmp
   rle4-delta-320x240.bmp
   idt_check.bmp
   RSCW16.BMP
   HAL.RLE
   ASTRO.RLE
   LOGO.RLE

So I ran tridscan on such samples to update bitmap-rle-bmp.trid.xml. I
also added a line for the correct mime type:
   <Mime>image/bmp</Mime>
As reference URL I use the BMP page on the file formats web site,
expressed by a line like:
   <RefURL>http://fileformats.archiveteam.org/wiki/BMP</RefURL>
Furthermore we see that the RLE suffix is optional and not mandatory:
there are run length encoded samples with the BMP suffix. This fact
is now expressed by a line like:
   <Ext>RLE/BMP</Ext>

Then I looked inside the definition for patterns and tried to
understand what has changed compared with the other variants, and
why. The first characteristic pattern in the front block is:
   <Bytes>000028000000</Bytes>
   <ASCII> . . (</ASCII>
   <Pos>12</Pos>
That means that the DIB header size is 28 hexadecimal or 40 decimal.
The second significant pattern is:
   <Bytes>00000100</Bytes>
   <Pos>24</Pos>
According to the documentation, the last 2 bytes mean only that the
number of color planes is 1, which is always true. The first 2 bytes
are the upper half of the height field.

The samples should use Run Length Encoding (RLE) compression. Looking
in the documentation under compression, only 2 methods are possible:
1   BI_RLE8    for RLE 8-bit/pixel
2   BI_RLE4    for RLE 4-bit/pixel

Method 4 (BI_JPEG) is used in connection with another DIB header size
(BITMAPV4INFOHEADER), and 12 (BI_CMYKRLE8) and 13 (BI_CMYKRLE4) are
used only for Windows Metafile CMYK. I show this information by a
remark line like:
   <Rem>This variant with 40 byte DIB header and RLE-compression (1 or 2)</Rem>

Thinking about it, we see that the RLE aspect is not really expressed
inside bitmap-rle-bmp.trid.xml. This definition is just the sum of
bitmap-bmp-v3.trid.xml and some zero patterns which are probably
caused by coincidence. So I replace this variant by 2 others:
bitmap-rle4-bmp.trid.xml   Run Length Encoded bitmap (compression 2)
bitmap-rle8-bmp.trid.xml   Run Length Encoded bitmap (compression 1)
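
To really express the RLE aspect, a check has to look at the
compression field (offset 30) and the color depth (offset 28). A
minimal Python sketch of that decision, assuming a 40 byte DIB header;
rle_variant and make_header are hypothetical names and the header
values are made up:

```python
import struct

BI_RLE8, BI_RLE4 = 1, 2

def rle_variant(data: bytes):
    """Return 'rle8', 'rle4' or None for a BMP with a 40-byte DIB header."""
    if len(data) < 34 or data[:2] != b"BM":
        return None
    (dib_size,) = struct.unpack_from("<I", data, 14)
    bpp, compression = struct.unpack_from("<HI", data, 28)
    if dib_size != 40:
        return None
    if compression == BI_RLE8 and bpp == 8:
        return "rle8"
    if compression == BI_RLE4 and bpp == 4:
        return "rle4"
    return None

def make_header(bpp: int, compression: int) -> bytes:
    """First 34 bytes of a made-up 16x16 bitmap, just enough for the check."""
    file_header = b"BM" + struct.pack("<IHHI", 0, 0, 0, 118)
    dib_start = struct.pack("<IiiHHI", 40, 16, 16, 1, bpp, compression)
    return file_header + dib_start

print(rle_variant(make_header(4, BI_RLE4)))
print(rle_variant(make_header(8, BI_RLE8)))
print(rle_variant(make_header(8, 0)))
```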

So I ran tridscan on samples with these reported compression methods,
looked at the definitions and tried to refine them.

At offset 2 the size of the BMP file in bytes is stored as a 4 byte
little endian integer. At offsets 6 and 8 the hot spot x,y coordinates
are stored as 2 byte little endian integers. That was expressed by a
pattern like:
   <Bytes>000000000000</Bytes>
   <Pos>4</Pos>
Non zero values for the hot spot coordinates are normally used only
for OS/2, so I assume that they are always zero here. The first 2
bytes of the pattern are the upper half of the file size, which is
only zero for files below 64 KiB. Assuming that the BMP size can
reach the 4 GB limit, the pattern becomes:
   <Bytes>00000000</Bytes>
   <Pos>6</Pos>
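
Why the wider pattern is too strict can be tried out in Python. This
is a sketch with made-up file sizes; matches and file_header are
hypothetical helper names, not part of any TrID tool.

```python
import struct

def matches(data: bytes, pos: int, hex_bytes: str) -> bool:
    """True if the bytes at `pos` equal the hex string of a <Bytes> element."""
    want = bytes.fromhex(hex_bytes)
    return data[pos:pos + len(want)] == want

def file_header(file_size: int, pixel_offset: int = 118) -> bytes:
    """14-byte BMP file header with both hot spot fields set to zero."""
    return b"BM" + struct.pack("<IHHI", file_size, 0, 0, pixel_offset)

small = file_header(38_000)   # size fits in the lower 2 bytes
large = file_header(70_000)   # size needs a third byte

print(matches(small, 4, "000000000000"), matches(large, 4, "000000000000"))
print(matches(small, 6, "00000000"), matches(large, 6, "00000000"))
```

The 6 byte pattern at position 4 rejects the larger file, while the 4
byte pattern at position 6 accepts both.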

At offset 10 the offset, i.e. the starting address, of the bitmap
image data (pixel array) is stored. At offset 14 the size of the DIB
header in bytes is stored as a 4 byte little endian integer. In this
variant this size is always 40 (28 hexadecimal). So in
bitmap-rle4-bmp.trid.xml this was expressed by a pattern like:
   <Bytes>00000028000000</Bytes>
   <ASCII> . . . (</ASCII>
   <Pos>11</Pos>
With the well known small header sizes, a color table with few
entries and no artificial gap, the bitmap pixels start at "low"
offsets (below 100h) with values like 118 or 94. So the above pattern
is probably always true.

In bitmap-rle8-bmp.trid.xml the pixel offset in all inspected samples
was 1078 (436h), which is 54 header bytes plus a full color table of
256 entries with 4 bytes each. So here these facts are now expressed
by a pattern like:
   <Bytes>000000003604000028000000</Bytes>
   <ASCII> . . . . 6 . . . (</ASCII>
   <Pos>6</Pos>
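
The offsets 118 and 1078 can be reproduced with a little arithmetic.
The Python sketch below assumes a 40 byte DIB header, a full color
table with 4 bytes per entry and no gap before the pixel data; both
function names are hypothetical.

```python
import struct

FILE_HEADER_SIZE = 14
DIB_HEADER_SIZE = 40

def expected_pixel_offset(bits_per_pixel: int) -> int:
    """Offset of the pixel array when a full color table follows the headers."""
    palette_entries = 2 ** bits_per_pixel
    return FILE_HEADER_SIZE + DIB_HEADER_SIZE + 4 * palette_entries

def pattern_at_pos_6(pixel_offset: int) -> str:
    """Hot spot fields (zero), pixel offset and DIB header size as hex."""
    packed = struct.pack("<HHII", 0, 0, pixel_offset, DIB_HEADER_SIZE)
    return packed.hex().upper()

print(expected_pixel_offset(4), expected_pixel_offset(8))
print(pattern_at_pos_6(1078))
```

So 118 belongs to 4 bits per pixel and 1078 to 8 bits per pixel.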

At offset 18 the bitmap width in pixels is stored as a 4 byte little
endian integer. In my examples I got "low" values (like 640 or 320),
so the 2 upper bytes were zero. That was expressed by a pattern like:
   <Bytes>0000</Bytes>
   <Pos>20</Pos>
Assuming that the width can use all 4 bytes, this pattern vanishes.

At offset 22 the bitmap height in pixels is stored as a 4 byte little
endian integer. In my examples I got "low" values (like 400 or 16),
so the 2 upper bytes were zero. At offset 26 the number of color
planes is stored as a 2 byte little endian integer. That value must
be one. At offset 28 the number of bits per pixel, which is the color
depth of the image, is stored as a 2 byte little endian integer.
According to the documentation this is 4 for compression 2 and 8 for
compression 1. At offset 30 the compression method is stored as a 4
byte little endian integer. These facts were expressed inside
bitmap-rle4-bmp.trid.xml by a pattern like:
   <Bytes>00000100040002000000</Bytes>
   <Pos>24</Pos>
Assuming that the height can use all 4 bytes, this becomes:
   <Bytes>0100040002000000</Bytes>
   <Pos>26</Pos>
In bitmap-rle8-bmp.trid.xml this pattern becomes:
   <Bytes>0100080001000000</Bytes>
   <Pos>26</Pos>
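
Both patterns at position 26 follow directly from the three fields
planes, bits per pixel and compression. A short Python sketch to
rebuild them (the function name is hypothetical):

```python
import struct

def pattern_at_pos_26(bits_per_pixel: int, compression: int) -> str:
    """Planes (always 1), color depth and compression method as hex."""
    planes = 1
    packed = struct.pack("<HHI", planes, bits_per_pixel, compression)
    return packed.hex().upper()

print("rle4:", pattern_at_pos_26(4, 2))
print("rle8:", pattern_at_pos_26(8, 1))
```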

At offset 34 the image size is stored as a 4 byte little endian
integer. In my inspected examples I get "low" values here, so the
upper bytes are zero. That was expressed by patterns like:
   <Bytes>00</Bytes>
   <Pos>37</Pos>
   <Bytes>0000</Bytes>
   <Pos>36</Pos>
Assuming that the size can reach the 4 GB limit, these patterns
vanish.

At offset 38 the horizontal resolution of the image is stored as a 4
byte little endian integer. In my examples I got "low" values, so the
2 upper bytes are zero. That was expressed by a pattern like:
   <Bytes>0000</Bytes>
   <Pos>40</Pos>
Assuming that the horizontal resolution can use all 4 bytes, this
pattern vanishes.

At offset 46 the number of colors in the palette is stored as a 4
byte little endian integer. In my examples I get "low" values. When
the color depth is 4, the number of colors is limited to 2 to the
power of 4, that is 16, and the number of colors in the palette can
of course reach this limit. 16 still fits in one byte, so in
bitmap-rle4-bmp.trid.xml this limit is expressed by a pattern like:
   <Bytes>000000</Bytes>
   <Pos>47</Pos>

At offset 50 the number of important colors is stored as a 4 byte
little endian integer. In my examples I get "low" values. When the
color depth is 4, the number of important colors is probably limited
to 2 to the power of 4, that is 16. When the color depth is 8, the
number of colors is limited to 2 to the power of 8, that is 256 or
100 hexadecimal. In the worst case the number of important colors
reaches this limit, which needs 2 bytes. So in
bitmap-rle8-bmp.trid.xml the following pattern is probably always
true:
   <Bytes>0000</Bytes>
   <Pos>52</Pos>
And in bitmap-rle4-bmp.trid.xml the following pattern is probably
always true:
   <Bytes>000000</Bytes>
   <Pos>51</Pos>
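
The whole 40 byte header discussed above can be read in one step. The
following Python sketch uses the field order from the offsets given in
this post; the tuple field names are my own, and the sample values
(640x400, RLE4, 16 colors) are made up, not taken from a real file.

```python
import struct
from collections import namedtuple

InfoHeader = namedtuple(
    "InfoHeader",
    "size width height planes bpp compression image_size "
    "x_res y_res colors_used colors_important",
)
INFO_FORMAT = "<IiiHHIIiiII"   # 40 bytes, starts at file offset 14

def read_info_header(data: bytes) -> InfoHeader:
    """Unpack the 40-byte DIB header that follows the 14-byte file header."""
    return InfoHeader._make(struct.unpack_from(INFO_FORMAT, data, 14))

# Hypothetical header, for illustration only.
sample = b"BM" + struct.pack("<IHHI", 0, 0, 0, 118) + struct.pack(
    INFO_FORMAT, 40, 640, 400, 1, 4, 2, 50_000, 2835, 2835, 16, 16)

hdr = read_info_header(sample)
print(hdr)
# Positions 47..49 and 51..53 hold the upper bytes of the two color counts.
print(sample[47:50].hex(), sample[51:54].hex())
```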

At offset 54 the header parts end. From here up to the reported bits
offset there should be color table entries. So all zero patterns in
this range are probably caused by coincidence, and I delete patterns
like:
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>57</Pos>
      </Pattern>
      <Pattern>
         <Bytes>00</Bytes>
         <Pos>1077</Pos>
      </Pattern>
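
Removing such patterns by hand is tedious, so here is a minimal Python
sketch of the idea. The <FrontBlock> wrapper and the nesting of the
fragment are assumptions about the definition layout, and the function
name is hypothetical. Only patterns that start between offset 54 and
the pixel offset are dropped.

```python
import xml.etree.ElementTree as ET

HEADER_END = 54  # 14-byte file header + 40-byte DIB header

# Fragment modelled on the patterns shown in this post (layout assumed).
FRAGMENT = """
<FrontBlock>
   <Pattern><Bytes>424D</Bytes><Pos>0</Pos></Pattern>
   <Pattern><Bytes>0100080001000000</Bytes><Pos>26</Pos></Pattern>
   <Pattern><Bytes>00</Bytes><Pos>57</Pos></Pattern>
   <Pattern><Bytes>00</Bytes><Pos>1077</Pos></Pattern>
</FrontBlock>
"""

def drop_color_table_patterns(block: ET.Element, pixel_offset: int) -> int:
    """Remove patterns starting in [HEADER_END, pixel_offset); return count."""
    doomed = [p for p in block.findall("Pattern")
              if HEADER_END <= int(p.findtext("Pos")) < pixel_offset]
    for pattern in doomed:
        block.remove(pattern)
    return len(doomed)

block = ET.fromstring(FRAGMENT.strip())
removed = drop_color_table_patterns(block, pixel_offset=1078)
remaining = [p.findtext("Pos") for p in block.findall("Pattern")]
print(removed, remaining)
```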

Then I found bitmap samples like BEARS2.OS2 and BEARS.OS2, where the
suffix OS2 is used instead of BMP. Apparently this suffix is used
here to emphasize that these bitmaps are the OS/2 variant described
by bitmap-bmp-v1.trid.xml. So I updated this definition, and the fact
that another suffix can appear is now expressed by a line like:
   <Ext>BMP/OS2</Ext>

Now comes the ugly group. Here, too, another suffix is used instead
of the standard BMP. After hard thinking I came to the conclusion
that the companies use a security by obscurity strategy. Internally
they use the standard Windows bitmap graphic format, but they do not
want users to open or change the graphic images easily by just
clicking. Such people act like Putin's Russia and Xi's China, behaving
in a non-transparent manner or just lying, but in the end somebody
finds the truth. With modern communication tools the facts are
available to everybody in the world, and everybody who thinks it
through should be able to judge what is true and why. The unusual
suffixes are neither registered nor explained. In the end the
disadvantages weigh more. The graphic tool IrfanView always wants me
to correct the name of such samples. Malware writers cheer and thank
such companies, because now they can place their ransom messages as
hidden graphics, and anti virus software probably does not complain:
the content is just a usual bitmap, and a non-standard suffix does not
raise suspicion because it seems to be common practice.

Samples like EPISMG05.WBF are apparently used by Epson printer
software to insert watermarks. Samples like hardcopy-bitmap.hcp
are created and opened by the graphic tool hardcopy. Samples like
Logo164x382_256.spb are used by some Infineon security software.
PQMAGIC.PQG is a graphic used by PowerQuest PartitionMagic 5.0
software. According to XnView, VGA, RL4 and RL8 should also occur,
but I have not found such examples myself. Samples like logo.sys,
logos.sys and logow.sys are used by the Windows 9x family to display
its boot messages.

The second group with unusual suffixes, in table form:
EPISMG00.WBF      Epson printer water mark https://support.epson-europe.com/onlineguides/en/l800/html/vari_6.htm
EPISMG01.WBF      Epson printer water mark http://justsolve.archiveteam.org/wiki/Epson_Printer_Bitmaps
EPISMG02.WBF      Epson printer water mark https://files.support.epson.com/htmldocs/rx620_/rx620_rf/vari_7.htm
EPISMG03.WBF      Epson printer water mark c:\ProgramData\EPSON\EPSON PX730 Series
EPISMG04.WBF      Epson printer water mark
EPISMG05.WBF      Epson printer water mark
EPUTBM01.WBF      Epson printer water mark c:\Windows\System32\DriverStore\FileRepository\prnep001.inf_f0a9a372\I386
EPUTBM02.WBF      Epson printer water mark
EPUTBM03.WBF      Epson printer water mark
EPUTBM05.WBF      Epson printer water mark
EPUTBM07.WBF      Epson printer water mark
EPUTBM08.WBF      Epson printer water mark
EPUTBM09.WBF      Epson printer water mark
EPUTBM10.WBF      Epson printer water mark
hardcopy-bitmap.hcp   https://info.hardcopy.de/formate.php
Logo164x382_16.spb   Infineon Logo
Logo164x382_24bit.spb   Infineon Logo
Logo164x382_256.spb   Infineon Logo
PQMAGIC.PQG      PowerQuest PartitionMagic 5.0 graphic
*.vga         foo
*.rl4         probably run length encoded with color depth 4
*.rl8         probably run length encoded with color depth 8
logo.sys      https://en.wikipedia.org/wiki/LOGO.SYS
logos.sys      https://en.wikipedia.org/wiki/LOGO.SYS
LOGOW.SYS      https://en.wikipedia.org/wiki/LOGO.SYS
sulogo.sys      https://en.wikipedia.org/wiki/LOGO.SYS
NLOGOS.SYS      https://en.wikipedia.org/wiki/LOGO.SYS
NLOGOW.SYS      https://en.wikipedia.org/wiki/LOGO.SYS

With the updated TrID definitions, all BMP examples with unusual
suffixes are now described (see appended output/trid-v-new.txt). The
TrID definitions, some samples and the output are stored in the
archive bmp-else.zip. I hope that my XML files can be used in a
future version of triddefs.

With best wishes
Jörg Jenderek

jsummers

Many Windows programs give their internally-used BMP files unconventional filename extensions, sometimes derived from the software name, or a screen mode, or whatever. I'd say the only extensions that are even remotely common are .bmp, .rle, and .dib. I agree that .rl4 and .rl8 are very rare.

An example of a file with a 24-byte info header (the header size field at offset 14): SPHERES1.BMP, and the other files in that collection. I'm not sure if I've seen any more.

I don't think I've ever seen a 48-byte info header.

The DOS utility Image Alchemy creates OS/2 BMP files with a 36-byte info header (!). But I've never seen one in the wild. Using Image Alchemy v1.11, run "ALCHEMY.EXE [inputfile] [outputfile].BMP -O".

Then there are the completely nonstandard BMP-like formats, which you might want to take into account when identifying BMP. There's Pegasus PIC (or KQP), with an info header size of 68, and a compression method of "JPEG". For whatever this is worth, I recently found a .bmc format with "CRAM" in the compression field: GUI.Z (*.bmc). I think I've seen other one-off formats like that, though maybe only in embedded image resources.

Mark0

Many thanks to both of you for the definitions and the info!