Author Topic: gid.trid.xml replacing gid_idx.trid.xml for GID Help inde  (Read 1087 times)

jenderek

gid.trid.xml replacing gid_idx.trid.xml for GID Help inde
« on: December 16, 2023, 04:31:23 AM »
Hello trid users,

Some months ago I migrated to Windows 10. A few days ago I wanted to use the
help of an older Windows program. Now I get an error message saying that the
help system used is no longer supported. The same error occurred on my previous
Windows 8.1 system. The solution offered by Microsoft is to download the
installation package from knowledge base article KB917607. For Windows 8.1 I
could download an MSU package for my language and CPU architecture, which could
be started by double click. But for Windows 10 no download is offered. I tried
the version for Windows 8.1, but when starting the installation Windows
complains that the package is not suited for my version.

For Windows help files the file name suffix HLP is used. Unfortunately this
suffix is also used by other help systems. So as a first step you want to
identify all HLP files on your systems. Unfortunately on my systems some HLP
files are not identified. So in this session I handle files with suffix
GID, which are related to the "Windows HELP File" type described by
hlp.trid.xml.

The GID files are typically found in the same directory as the corresponding
HLP file. The samples were created by the Microsoft help tool winhlp32.exe.

So I ran the trid utility on such GID examples. Many samples are recognized and
correctly described as "GID Help index", without mime type, by
gid_idx.trid.xml. With slightly lower priority these samples are also described
as "Windows HELP File" with file name suffix HLP by hlp.trid.xml. With again
slightly lower priority they are also described as "Multimedia Viewer
Book" with suffix MVB and mime type application/octet-stream by mvb.trid.xml
(see appended trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are
recognized, but they are described only as "Windows Help File" by PUID
fmt/474. Here the GID suffix is considered as "bad".

For comparison I also ran the file command (version 5.45) on such
samples. Here too most samples are recognized and correctly described as "MS
Windows help Global Index". The file size in bytes is also shown
(see appended output/file-5.45.txt). The mime type is application/x-winhelp
here (see appended file-i-5.45.txt in output). The correct file name suffix
GID is shown for most samples (see appended file-ext-5.45.txt in
output). A few samples (like ICCviewer.GID and win98rk.GID) are misidentified
as "MS Windows help Bookmark" with wrong suffix bmk. These samples are
recognized by TrID. On the other hand the file command recognizes some samples
(like RESCUE32.GID, grep.GID, IBMAVW.GID and putty.GID) that are not identified
correctly by TrID. This happens because the file command uses other methods to
identify GID samples.

On Linux, according to the shared MIME-info database, such samples are called
"WinHelp help file". Here application/winhlp is used as mime type. The samples
are recognized just by looking for the 4-byte sequence 3F5F0300 at the
beginning. Here the suffix HLP is displayed. That information can be seen in
the source freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.

Luckily I found some pieces of information about Windows HELP on the net. Of
course none of it is official Microsoft documentation. This also applies to
the related files with suffix GID. So I chose the Wikipedia page as
reference. That is expressed inside the new definition by a line like:
   <RefURL>https://en.wikipedia.org/wiki/WinHelp</RefURL>

On many sites, and also on English Wikipedia, application/winhlp is mentioned
as mime type for HLP files. But on my Windows systems and on
extension.nirsoft.net no such type is listed. Also no such type is
officially registered at IANA.org. So I chose the user defined type reported by
the file command. That is expressed by a line like:
     <Mime>application/x-winhelp</Mime>

So I first ran tridscan on the undetected samples to improve
gid_idx.trid.xml. At first glance (see trid.tmp in output) all is OK now and
all my GID samples are recognized. But a look at what has happened shows what
experienced TrID users would expect. Some short null-byte sequences vanished,
like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>40</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>52</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>58</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>62</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>66</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>73</Pos>
   </Pattern>

These were probably only present by chance (too few samples, and values not
reaching the 32-bit limits). So I guess that when more samples are inspected
the remaining short null-byte sequences will vanish as well. In the end only 2
XML constructs will survive. The first is expressed by
   <Bytes>3F5F0300</Bytes>
   <ASCII> ? _</ASCII>
   <Pos>0</Pos>
That is the pattern that is used by all tools for recognition.

The second looks like:
   <Pattern>
      <Bytes>00FFFFFFFF</Bytes>
      <Pos>7</Pos>
   </Pattern>
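
As a rough illustration of what these two surviving patterns test, here is a
minimal Python sketch. It is not TrID's actual code: the function name
matches_patterns and the sample header bytes are made up for illustration, and
only the two (position, bytes) pairs are taken from the definition above.

```python
# Sketch of positional pattern matching in the style of a TrID definition.
# Only the two (position, bytes) pairs come from the post; the rest is assumed.
PATTERNS = [
    (0, bytes.fromhex("3F5F0300")),    # magic at the start of the file
    (7, bytes.fromhex("00FFFFFFFF")),  # the second surviving pattern
]


def matches_patterns(data: bytes, patterns=PATTERNS) -> bool:
    """Return True if every pattern is found at its position in data."""
    return all(data[pos:pos + len(seq)] == seq for pos, seq in patterns)


# Hypothetical header: magic, then 34 12 00 00, then FF FF FF FF.
sample = bytes.fromhex("3F5F0300" "34120000" "FFFFFFFF")
print(matches_patterns(sample))  # True
```

Because both patterns sit in the first 12 bytes, any file that shares this
header layout passes the check, whatever its suffix is.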

At offset 4 DirectoryStart is stored as a 4-byte little-endian integer. That
is the offset of the FILEHEADER of the internal directory. At offset 8
FirstFreeBlock is stored as a 4-byte little-endian integer. That is the offset
of the free list header. Value -1 (FFFFFFFFh) means no free list. The pattern
at position 7 therefore says that the highest byte of DirectoryStart is zero,
so DirectoryStart is below 1000000h (= 16777216 = 16 MiB), and that the GID
samples have no free list. But when we look at hlp.trid.xml we see that these
are the same patterns used for HLP samples. So the conclusion is that with the
current TrID definition there is in principle no difference between GID and
HLP samples. Obviously this is wrong! I know it because I implemented this
feature for the file command, where the recognition is done in another way.
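
The header fields described above can be decoded with a few lines of Python.
This is only a sketch under the layout given in this post (magic at offset 0,
DirectoryStart at offset 4, FirstFreeBlock at offset 8, all little-endian);
the function name read_header and the keys of the returned dictionary are my
own choice and not part of any tool.

```python
import struct

# The bytes 3F 5F 03 00 read as one little-endian 32-bit value.
WINHELP_MAGIC = 0x00035F3F


def read_header(data: bytes) -> dict:
    """Decode the first 12 bytes of a file that starts with 3F5F0300."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    # "<IIi": unsigned magic, unsigned DirectoryStart, signed FirstFreeBlock
    magic, directory_start, first_free_block = struct.unpack_from("<IIi", data, 0)
    if magic != WINHELP_MAGIC:
        raise ValueError("magic 3F5F0300 not found")
    return {
        "directory_start": directory_start,
        "has_free_list": first_free_block != -1,
        # Equivalent to the byte pattern 00FFFFFFFF at position 7.
        "pattern_at_7": directory_start < 0x01000000 and first_free_block == -1,
    }


# Hypothetical header bytes, not taken from a real sample.
print(read_header(bytes.fromhex("3F5F0300" "34120000" "FFFFFFFF")))
```

Written this way it is easy to see that the pattern only states "directory
below 16 MiB and no free list", which says nothing about GID versus HLP.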

So I recommend not using or improving gid_idx.trid.xml any more.

So I ran tridscan on my GID samples and created the replacement definition
gid.trid.xml. Now I get similar patterns. Again I get the same two XML
constructs. I also get some short null-byte sequences like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>15</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>19</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>23</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>28</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>35</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>793</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>967</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000000000</Bytes>
      <Pos>1081</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>1090</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>1094</Pos>
   </Pattern>
I assume that these are only there by chance. So I deleted them.

But now in the Global Strings section I get lines like:
   <String>KWBTREE</String>
   <String>FILES</String>
   <String>FLAGS</String>
   <String>KWMAP</String>
   <String>.CNT</String>
   <String>.HLP</String>
   <String>PETE</String>
Two are triggered by references to other help related file types with suffix
CNT and HLP. The two lines with KWBTREE and KWMAP are mentioned in
documentation about the HLP format. The line with the name PETE seems to be
characteristic for GID. That characteristic is used by the file command as an
additional test.
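
A string test of this kind can be sketched in Python as follows. This is not
the test used by the file command and not TrID's implementation. The helper
names found_strings and looks_like_gid are hypothetical, and I assume here
that the strings may be compared case-insensitively anywhere in the file,
because the definition lists them in upper case.

```python
MAGIC = bytes.fromhex("3F5F0300")
# Strings taken from the Global Strings section quoted above.
GLOBAL_STRINGS = [b"KWBTREE", b"FILES", b"FLAGS", b"KWMAP", b".CNT", b".HLP", b"PETE"]


def found_strings(data: bytes, wanted=GLOBAL_STRINGS) -> list:
    """Return the wanted strings that occur anywhere in data (case-insensitive)."""
    upper = data.upper()  # bytes.upper() only changes ASCII letters
    return [s for s in wanted if s in upper]


def looks_like_gid(data: bytes) -> bool:
    """Assumed heuristic: shared magic plus the GID-specific string PETE."""
    return data.startswith(MAGIC) and b"PETE" in found_strings(data)


# Hypothetical test data, not real GID or HLP files.
gid_like = MAGIC + b"\x00" * 8 + b"|Pete\x00|KWBTREE\x00"
hlp_like = MAGIC + b"\x00" * 8 + b"|KWBTREE\x00|KWMAP\x00"
print(looks_like_gid(gid_like), looks_like_gid(hlp_like))  # True False
```

The magic alone is shared with HLP files, so in this sketch only the extra
string separates the two.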

With this new TrID definition all my GID help samples are now described. The
recognition rate is higher, and I get clear differences compared to HLP and
MVB samples.

The TrID definition, some samples and the output are stored in the archive
gid_.zip. I hope that my definition can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: gid.trid.xml replacing gid_idx.trid.xml for GID Help inde
« Reply #1 on: December 19, 2023, 02:46:40 AM »
Thanks!