Author Topic: bmk.trid.xml for Windows HELP bookmark; misidentified  (Read 1111 times)

jenderek

bmk.trid.xml for Windows HELP bookmark; misidentified
« on: December 18, 2023, 01:52:46 AM »
Hello trid users,

some months ago I migrated to Windows 10. A few days ago I wanted to use the
help of an older Windows program. Now I get an error message that the help
system used is not supported any more. The same error occurred on my previous
Windows 8.1 system. The solution offered by Microsoft is to download the
installation package from knowledge base article KB917607. For Windows 8.1 I
could download an MSU package for my language and CPU architecture, which
could be started by double click. But for Windows 10 no download is offered. I
tried the version for Windows 8.1, but when starting the installation Windows
complains that the package is not suited for my version.

For Windows help files the name suffix HLP is used. Unfortunately this suffix
is also used for other help systems. So in a first step you want to identify
all HLP files on your systems. Unfortunately on my systems some HLP files are
not identified. So in this session I will handle files with suffix BMK, which
are related to the Windows HELP File format described by hlp.trid.xml.

The BMK files are typically found inside the directory %LOCALAPPDATA%\Help. On
newer Windows systems the old HLP format, and therefore the BMK format, is not
supported any more. The samples are created by the Microsoft Help tool
winhlp32.exe when you choose menu entries like "bookmark" and "define".

The file name is WinHlp32.BMK (on Windows XP 32-bit) or WinHlp32 without
suffix (on Windows 7 and 8.1 64-bit).

So I ran the trid utility on such bookmark examples. All samples are
recognized but wrongly described as "Multimedia Viewer Book" with suffix MVB
by mvb.trid.xml. Some samples are described with higher priority as "Windows
HELP File" with the wrong suffix HLP by hlp.trid.xml (see appended
trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). There the samples described by TrID
as Windows HELP File are described as "Windows Help File" without mime type by
PUID fmt/474. But the missing suffix is considered bad there (see
EXTENSION_MISMATCH true in droid-bmk.csv in output).

For comparison I also ran the file command (version 5.45) on such samples.
There the samples are recognized and described correctly as "MS Windows help
Bookmark". The file size in bytes is also shown (see appended
output/file-5.45.txt). The mime type is application/x-winhelp (see appended
file-i-5.45.txt in output). The correct file name suffix BMK is shown (see
appended file-ext-5.45.txt in output).

On Linux, according to the shared MIME-info database, such samples are called
"WinHelp help file". There application/winhlp is used as mime type. The
samples are recognized just by looking for the 4-byte sequence 3F5F0300 at the
beginning. There the suffix HLP is displayed. That information can be seen in
the source freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.

Luckily I found some pieces of information about Windows HELP on the net. Of
course nothing official from Microsoft. This also applies to the related
bookmark files, which sometimes have the suffix BMK. So I chose the page on
Wikipedia and use it as reference. That is expressed inside the new definition
by a line like:
   <RefURL>https://en.wikipedia.org/wiki/WinHelp</RefURL>

On many sites, and also on English Wikipedia, application/winhlp is mentioned
as mime type for HLP files. But when looking on my Windows systems and on
extension.nirsoft.net no such thing is listed there. Also no such type is
officially registered at IANA.org. So I chose the user defined type listed by
the file command. That is expressed by a line like:
   <Mime>application/x-winhelp</Mime>

So I first created the TrID definition bmk.trid.xml by running tridscan on my
samples.

The first XML construct looks like:
   <Bytes>3F5F0300</Bytes>
   <ASCII> ? _</ASCII>
   <Pos>0</Pos>
According to the documents the first 4 bytes are the magic for all HLP related
files. So this is also expressed inside hlp.trid.xml and mvb.trid.xml by the
same XML construct.
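
A minimal sketch of that magic check in Python. The function name
has_winhelp_magic is only an illustration and is not part of TrID or any
other tool mentioned here:

```python
# Magic shared by HLP, MVB and BMK files according to the description above:
# the bytes 3F 5F 03 00 at offset 0.
WINHELP_MAGIC = bytes.fromhex("3F5F0300")


def has_winhelp_magic(data: bytes) -> bool:
    """Return True if data starts with the 4-byte WinHelp magic."""
    return data[:4] == WINHELP_MAGIC


# A header that starts with the magic, followed by four more bytes.
print(has_winhelp_magic(bytes.fromhex("3F5F030010000000")))  # True
```

Since this check passes for HLP, MVB and BMK alike, it cannot tell the three
apart on its own.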

At offset 8 FirstFreeBlock is stored as a 4-byte little-endian integer. That
is the offset of the free header. The value -1 (FFFFFFFFh) means there is no
free list. Some of my bookmark examples have this value, but some do not. That
is different from a pure HLP file, where no FirstFreeBlock exists. That is
expressed in hlp.trid.xml by an XML construct like:
   <Bytes>00FFFFFFFF</Bytes>
   <Pos>7</Pos>

The second XML construct looks like:
   <Bytes>000000</Bytes>
   <Pos>5</Pos>

At offset 4 DirectoryStart is stored as a 4-byte little-endian integer. That
is the offset of the FILEHEADER of the internal directory. The 3 upper bytes
are nil, which means DirectoryStart is lower than 100h. After some hard
thinking I believe that this "low" value is probably always the case. Why?
Normally every bookmark entry is something like a heading text and is limited
to a few dozen characters. So even in the worst realistic case with thousands
of bookmarks the content has a size of only some 10000 bytes. With a page size
of 400h the b-tree is then not complicated and is organized in a similar way
(directory near the beginning). So there is not much overhead and the total
file size is in a similar range.

At offset 12 the file size is stored as a 4-byte little-endian integer. In my
examples the 2 upper bytes are nil, so the file size is lower than 10000h.
This is probably always true. It is expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>14</Pos>
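
The header fields described above can be read with a few lines of Python. This
is only a sketch under the layout given in this post (magic at 0,
DirectoryStart at 4, FirstFreeBlock at 8, file size at 12). The names
WinHelpHeader, parse_header and looks_like_small_bookmark are made up for
illustration:

```python
import struct
from typing import NamedTuple


class WinHelpHeader(NamedTuple):
    magic: int             # 0x00035F3F when read as little-endian
    directory_start: int   # offset of the internal directory FILEHEADER
    first_free_block: int  # -1 means no free list
    file_size: int


def parse_header(data: bytes) -> WinHelpHeader:
    """Unpack the first 16 bytes as four little-endian 32-bit fields."""
    if len(data) < 16:
        raise ValueError("need at least 16 bytes")
    # I = unsigned 32-bit, i = signed 32-bit (so FFFFFFFFh reads as -1)
    return WinHelpHeader(*struct.unpack_from("<IIiI", data, 0))


def looks_like_small_bookmark(header: WinHelpHeader) -> bool:
    """Same conditions as the nil-byte patterns at Pos 5 and Pos 14."""
    return header.directory_start < 0x100 and header.file_size < 0x10000


# Synthetic header, not taken from a real sample.
example = bytes.fromhex("3F5F0300" "10000000" "FFFFFFFF" "00040000")
header = parse_header(example)
print(header, looks_like_small_bookmark(header))
```

Three nil bytes at Pos 5 say the same thing as DirectoryStart lower than 100h,
and two nil bytes at Pos 14 the same as file size lower than 10000h.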

The next XML constructs are short nil byte sequences like:
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>17</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>34</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000</Bytes>
      <Pos>38</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>184</Pos>
   </Pattern>
But I do not know what these mean. Unfortunately I have not yet found a "real"
characteristic that makes the difference to other "HLP" files. So I keep these
constructs.
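
To try such position/bytes constructs against a file by hand, a small Python
sketch like the following can be used. It only checks that every pattern is
present at its position. The real matching and scoring done by TrID is not
reproduced here, and the names BMK_PATTERNS and matches_all are hypothetical:

```python
from typing import List, Tuple

# (position, expected bytes), taken from the constructs quoted in this post.
BMK_PATTERNS: List[Tuple[int, bytes]] = [
    (0, bytes.fromhex("3F5F0300")),
    (5, bytes.fromhex("000000")),
    (14, bytes.fromhex("0000")),
    (17, bytes.fromhex("000000")),
    (34, bytes.fromhex("00")),
    (38, bytes.fromhex("0000000000")),
    (184, bytes.fromhex("00")),
]


def matches_all(data: bytes, patterns: List[Tuple[int, bytes]]) -> bool:
    """Return True if every pattern is found at its position in data."""
    return all(
        data[pos:pos + len(expected)] == expected
        for pos, expected in patterns
    )


# Synthetic data: magic followed by nil bytes, not a real BMK sample.
fake = bytearray(200)
fake[0:4] = bytes.fromhex("3F5F0300")
print(matches_all(bytes(fake), BMK_PATTERNS))
```

A file shorter than 185 bytes fails the check, because the slice at Pos 184 is
then empty.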

The last construct is a long nil byte sequence reaching to about the 1 KB
limit. It looks like:
   <Bytes>00000000000000000000000000000000000000000000000000000
   <Pos>186</Pos>
So at first glance I did not really find characteristics specific to help
bookmarks. Maybe other users know more facts or can improve my definition.

With this new TrID definition all my help bookmark samples are now described
more precisely. The TrID definition, some samples and the output are stored in
archive bmk_.zip. I hope that my definition can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: bmk.trid.xml for Windows HELP bookmark; misidentified
« Reply #1 on: December 19, 2023, 02:39:36 AM »
Thanks!
Unfortunately I tried to refine the definition with a couple of other BMK files, including one from Windows XP, and most of the patterns disappear, leaving something that differs too little from a normal HLP file.
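
A rough illustration of why patterns disappear when more samples are added.
It assumes that a definition keeps only the bytes that are identical at the
same offset in every sample. That is a simplification, not necessarily what
tridscan does internally. The function name common_bytes and the three
byte strings are made up, they are not real BMK files:

```python
from typing import Dict, List


def common_bytes(samples: List[bytes]) -> Dict[int, int]:
    """Map offset -> byte value for offsets where all samples agree."""
    if not samples:
        return {}
    shortest = min(len(sample) for sample in samples)
    first = samples[0]
    return {
        offset: first[offset]
        for offset in range(shortest)
        if all(sample[offset] == first[offset] for sample in samples)
    }


# Synthetic 12-byte headers: magic, DirectoryStart, FirstFreeBlock.
a = bytes.fromhex("3F5F0300 10000000 FFFFFFFF")
b = bytes.fromhex("3F5F0300 20000000 FFFFFFFF")
c = bytes.fromhex("3F5F0300 20010000 10000000")

print(len(common_bytes([a, b])))        # 11 offsets agree
print(sorted(common_bytes([a, b, c])))  # [0, 1, 2, 3, 6, 7]
```

With the third sample only the magic and two nil bytes survive, and those are
bytes that an ordinary HLP file can have as well.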