Author Topic: variant bitmap-jxr.trid.xml for JPEG XR bitmap  (Read 697 times)

jenderek

  • Sr. Member
  • ****
  • Posts: 375
variant bitmap-jxr.trid.xml for JPEG XR bitmap
« on: August 22, 2023, 10:55:36 PM »
Hello trid users,

A few days ago I had to handle some exotic "JPEG" images. These have the file
name suffix jxr or wdp.

So I ran the trid utility on such sample images. These samples are described
correctly as "JPEG XR bitmap" with mime type image/vnd.ms-photo by
bitmap-wmp.trid.xml. There 4 file name suffixes (HDP/JXR/WDP/WMP) are listed
(see appended output/trid-v-old.txt).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It recognizes the examples as
well. They are described there as "JPEG Extended Range" with mime type
image/jxr by PUID fmt/590.

For comparison I also ran the file command (version 5.45) on these samples.
With the keep-going option -k I often get 2 nearly identical messages. In the
first the images are called "JPEG-XR Image" and in the second the samples are
just called "JPEG-XR". Some more information like image dimensions and color
information is also shown (see appended output/file-k-5.45.txt). The mime
type image/jxr is listed there too (see appended output/file-k-i-5.45.txt),
and 3 file name suffixes are listed (see appended output/file-k-ext-5.45.txt).
For a few samples like example.wdp and MARKET-3361-ipm-bg-DE-treat[1].wdp I
got only the first message part.

So I started to generate bitmap-jxr.trid.xml by running tridscan on the few
examples that get both file command descriptions.

Inside bitmap-wmp.trid.xml the Wikipedia page about JPEG XR (redirected from
HD Photo) is used as reference URL. Instead I use the page about JPEG XR on
the file formats Archive Team web site. That is expressed by a line like:
   <RefURL>http://fileformats.archiveteam.org/wiki/JPEG_XR</RefURL>

The Wikipedia page is also listed there as a link. But I chose this page as
the new reference because it has 2 advantages. First, it lists some sample
download links. Furthermore, it contains a section Identification which
describes some characteristics of the file format.

The file format was started by Microsoft under the name "HD Photo".
Apparently that is why the mime type was image/vnd.ms-photo. Later the format
became an official standard. So the mime type registered at IANA is expressed
by a line like:
   <Mime>image/jxr</Mime>
That mime type is also shown by the file command and the DROID utility.

I do not have enough time and brain to read and understand the full file
format specification, but luckily I read the mime type information at
iana.org:
   https://www.iana.org/assignments/media-types/image/jxr

Under the item magic number the following is written:
Data begins with a FILE_HEADER( ) data structure, which begins with a
FIXED_FILE_HEADER_II_2BYTES field equal to 0x4949, followed by a
FIXED_FILE_HEADER_0XBC_BYTE field equal to 0xBC, followed by a FILE_VERSION_ID
which is equal to 1 for the current version of the Recommendation and
International Standard (with other values reserved for future use, as modified
in additional parts or amendments, by ITU-T or ISO/IEC). That is expressed by
the XML construct:
   <Bytes>4949BC01</Bytes>
   <ASCII> I I</ASCII>
   <Pos>0</Pos>
The JXR format started before 2009 and now we have the year 2023. There are
now other, newer image formats like WebP or HEIF. So in my opinion an
evolution of JXR from version 1 to something like version 2 will probably
never become reality. So an XML construct without the version byte is very
unlikely to ever be needed. In bitmap-wmp.trid.xml such a construct looks
like:
   <Bytes>4949BC</Bytes>
   <ASCII> I I</ASCII>
   <Pos>0</Pos>
The 4 byte start pattern is used as characteristic by the DROID tool and by
the second file command message.
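
The same test can be sketched in a few lines of Python. This is only an
illustration of the magic number check quoted from IANA, not code taken from
TrID, DROID or file; the function name has_jxr_magic and its check_version
parameter are my own invention.

```python
# Magic number from the IANA registration: "II", 0xBC, FILE_VERSION_ID 1.
JXR_MAGIC = bytes.fromhex("4949BC01")
# The shorter 3 byte variant without the version byte.
JXR_MAGIC_NO_VERSION = JXR_MAGIC[:3]


def has_jxr_magic(data: bytes, check_version: bool = True) -> bool:
    """Return True if data starts with the JPEG XR file header bytes.

    With check_version=False only the first 3 bytes are compared.
    """
    magic = JXR_MAGIC if check_version else JXR_MAGIC_NO_VERSION
    return data.startswith(magic)


if __name__ == "__main__":
    import sys

    # Usage: python jxr_magic.py example.wdp other.jxr
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(name, has_jxr_magic(handle.read(4)))
```

A little-endian TIFF file also starts with "II" but continues with 0x2A, so
it is rejected by both variants of the check.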

Within the payload data, JPEG XR IMAGE_HEADER data structures begin with a
GDI_SIGNATURE, which is a 64-bit syntax element that has the value
0x574D50484F544F00 that corresponds to "WMPHOTO" using the UTF-8 character set
encoding specified in Annex D of ISO/IEC 10646, followed by a byte equal to 0.
That is expressed inside the Global Strings section by a line like:
   <String>WMPHOTO</String>

In many examples this characteristic is stored at offset 90. So this was
expressed inside the front block section by an XML construct like:
   <Bytes>574D50484F544F00</Bytes>
   <ASCII> W M P H O T O</ASCII>
   <Pos>90</Pos>

The first file command message tests it in that way. But for a few samples
like MARKET-3361-ipm-bg-DE-treat[1].wdp and example.wdp this characteristic
string occurs at other offsets. So after running tridscan with these samples
the above XML construct vanishes.
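
To see at which offset the string sits in a given sample, a small Python
sketch like the following can help. It simply searches the whole file for the
8 byte GDI_SIGNATURE; the function name find_gdi_signature is hypothetical
and this is not how trid or file do the test.

```python
# "WMPHOTO" followed by a zero byte, i.e. 0x574D50484F544F00.
GDI_SIGNATURE = b"WMPHOTO\x00"


def find_gdi_signature(data: bytes) -> int:
    """Return the offset of the first GDI_SIGNATURE in data, or -1."""
    return data.find(GDI_SIGNATURE)


if __name__ == "__main__":
    import sys

    # Usage: python find_wmphoto.py example.wdp other.jxr
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(name, find_gdi_signature(handle.read()))
```

A result other than 90 marks a sample that a fixed position pattern at
offset 90 would miss, while a position independent string search still
finds it.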

With six samples I got some additional very short nil patterns like:
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>5</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>25</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>29</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>33</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>37</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>41</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>44</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>49</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>53</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>55</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>61</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>65</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>73</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>97</Pos>
   </Pattern>
I assume that these are triggered by coincidence (too few examples). So I
deleted these nil patterns. I do not know if there exist samples without that
characteristic string from the pre-standard era. So all standard JXR samples
are described by the variant bitmap-jxr.trid.xml. In the current definition
bitmap-wmp.trid.xml only the first 3 bytes are used as characteristic. That
is maybe not unique enough. According to the file command at least 4 bytes
should be used for recognition to be unique enough.
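
Why too few examples produce such patterns can be shown with a rough Python
sketch. I do not know the real algorithm of tridscan; the helper common_runs
below is only my assumption of the basic idea, namely keeping every byte
position at which all samples have the same value. The sample bytes are made
up for the demonstration.

```python
def common_runs(samples, limit=128):
    """Return (pos, bytes) runs where all samples share the same bytes.

    Only the first `limit` bytes are compared, and never more than the
    length of the shortest sample. At least one sample must be given.
    """
    length = min(limit, min(len(s) for s in samples))
    first = samples[0]
    runs = []
    start = None  # start of the run currently being collected
    for i in range(length):
        same = all(s[i] == first[i] for s in samples)
        if same and start is None:
            start = i
        elif not same and start is not None:
            runs.append((start, first[start:i]))
            start = None
    if start is not None:
        runs.append((start, first[start:length]))
    return runs


# Invented headers: same magic, different offset byte and padding.
a = bytes.fromhex("4949BC01" "20" "000000" "AA")
b = bytes.fromhex("4949BC01" "08" "000000" "BB")
c = bytes.fromhex("4949BC01" "20" "000100" "AA")

for pos, run in common_runs([a, b]):
    print(pos, run.hex().upper())
for pos, run in common_runs([a, b, c]):
    print(pos, run.hex().upper())
```

With only a and b the zero bytes at position 5 to 7 look like a pattern.
Adding c breaks that run into two single zero bytes, so such nil patterns
tend to shrink or vanish as more varied samples are scanned.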

The sample fmt-590-signature-id-931.wdp is used by DROID as a test pattern
for JXR images and contains the first 4 bytes of JXR images. So with
bitmap-wmp.trid.xml this sample is described by TrID as a JXR image, but it
is not a graphic image.

All samples are also described with low priority as "Crayola Art Studio
graphic Art" by art-crayola.trid.xml. This is caused by 2 XML constructs like:
   <Pattern>
      <Bytes>4949</Bytes>
      <ASCII> I I</ASCII>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>01</Bytes>
      <Pos>3</Pos>
   </Pattern>
Maybe the Crayola definition is just the common denominator of JXR images and
something else.
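
That every JXR version 1 header also fulfils these 2 short patterns can be
checked with a small Python sketch. The function matches and the two pattern
lists are only my illustration of position based patterns; the real TrID
matching and its scoring are more involved.

```python
def matches(data: bytes, patterns) -> bool:
    """Return True if every (pos, expected_bytes) pattern is found in data."""
    return all(data[pos:pos + len(expected)] == expected
               for pos, expected in patterns)


# The 2 constructs from art-crayola.trid.xml as (position, bytes) pairs.
CRAYOLA_PATTERNS = [(0, bytes.fromhex("4949")), (3, bytes.fromhex("01"))]
# The 4 byte start pattern with the version byte.
JXR_V1_PATTERNS = [(0, bytes.fromhex("4949BC01"))]

jxr_header = bytes.fromhex("4949BC01") + b"\x20\x00\x00\x00"
other_header = bytes.fromhex("49492A01")  # invented non-JXR start bytes

print(matches(jxr_header, CRAYOLA_PATTERNS))    # True
print(matches(jxr_header, JXR_V1_PATTERNS))     # True
print(matches(other_header, CRAYOLA_PATTERNS))  # True
print(matches(other_header, JXR_V1_PATTERNS))   # False
```

So the Crayola patterns are a loose subset of the JXR start pattern. They hit
all JXR samples, but also data that has any other byte at position 2.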

During my work I had to do some steps. First I needed to verify that my two
"strange" samples are really JXR images. I tried to do this with an
ImageMagick command like:
       identify -verbose *.wdp *.jxr

This works partly on a Linux system (see appended identify-jxr.txt and
identify-verbose-jxr.txt in output), but fails on Windows (see appended
abydos-identify.txt). As written on the Wikipedia page, ImageMagick does not
support JXR natively but needs the jxrlib packages. On Linux identify works
because the library package libjxr0 and the command line tools JxrDecApp and
JxrEncApp (package libjxr-tools) are installed. After clearing this hurdle I
just converted the "strange" JXR samples as a check with these command line
tools like:
     JxrDecApp -v -i example.wdp -o example.tif

I also verified the validity of the JXR samples with the help of the XnView
graphic viewer. It was able to open and display the images. As a check you
can get relevant image information like dimensions from the command line tool
via a line like:
   nconvert -fullinfo *.jxr *.wdp
So now I am very sure that my "strange" JXR samples are real and valid JXR
images.

On IANA only JXR is mentioned as file name suffix. On many sites the suffix
wdp is also mentioned. These 2 suffixes are found for my samples. So these
observations are expressed by a line like:
   <Ext>JXR/WDP</Ext>
On many sites a third suffix hdp is also mentioned. And the current TrID
definition also lists WMP as a fourth suffix. But I do not know if such
samples are matched by the new variant.

With the new TrID definition my JXR examples are still described as before,
and some misidentified samples like fmt-590-signature-id-931.wdp are now
skipped. The TrID definition, a few samples and the output are stored in the
archive jpeg_jxr.zip. I hope that my definition can be used in a future
version of triddefs.

With best wishes
Jörg Jenderek

Mark0

  • Administrator
  • Hero Member
  • *****
  • Posts: 2743
    • Mark0's Home Page
Re: variant bitmap-jxr.trid.xml for JPEG XR bitmap
« Reply #1 on: August 24, 2023, 08:53:58 PM »
Thanks!