Author Topic: ann.trid.xml for Windows HELP File annotation ; misidentified  (Read 1077 times)

jenderek

ann.trid.xml for Windows HELP File annotation ; misidentified
« on: December 15, 2023, 05:58:59 PM »
Hello trid users,

some months ago I migrated to Windows 10. A few days ago I wanted to use the
help of an older Windows program. Now I get an error message that the help
system it uses is not supported any more. The same error occurred on my
previous Windows 8.1 system. The solution offered by Microsoft is to download
the installation package from knowledge base article KB917607. For Windows 8.1
I could download an MSU package for my language and CPU architecture, which
could be started by double click. But for Windows 10 no download is offered.
I tried the version for Windows 8.1, but when starting the installation
Windows complains that the package is not suited for my version.

Windows help files use the name suffix HLP. Unfortunately this suffix is also
used by other help systems. So as a first step you want to identify all HLP
files on your systems. Unfortunately on my systems some HLP files are not
identified. So in this session I handle files with suffix ANN, which are
related to the Windows HELP File format described by hlp.trid.xml.

The ANN files are typically found inside directory %LOCALAPPDATA%\Help. This
is true for Windows XP and 8.1 on my systems. On newer Windows systems the old
HLP format, and therefore the ANN format, is not supported any more.

The samples are created by the Microsoft Help tool winhlp32.exe when you
choose a menu entry like "annotate" under "edit". This does not work with
ReactOS 0.4.14 or with the original winhlp32.exe under Wine.

So I ran the trid utility on such ANN examples. All samples are recognized,
but wrongly described as "Multimedia Viewer Book" with suffix MVB by
mvb.trid.xml (see appended trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are not
recognized.

For comparison I also ran the file command (version 5.45) on such samples.
Here the samples are recognized and described as "MS Windows help
annotation". The file size in bytes is also shown (see appended
output/file-5.45.txt). The MIME type is application/x-winhelp (see appended
file-i-5.45.txt in output). The correct file name suffix ANN is shown (see
appended file-ext-5.45.txt in output).

On Linux, according to the shared MIME-info database, such samples are called
"WinHelp help file". Here application/winhlp is used as MIME type. The samples
are recognized just by looking for the 4-byte sequence 3F5F0300 at the
beginning. Here suffix HLP is displayed. That information can be seen in the
source freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.
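A magic-only check like the one described above can be sketched in a few lines of Python. This is only an illustration of why the 4-byte magic cannot separate HLP, MVB and ANN files; the function name and the second test buffer are made up for the example.

```python
# Magic shared by HLP, MVB and ANN files, as quoted in the post.
WINHELP_MAGIC = bytes.fromhex("3F5F0300")


def has_winhelp_magic(data: bytes) -> bool:
    """Return True if data starts with the 4-byte WinHelp magic."""
    return data[:4] == WINHELP_MAGIC


# First 12 bytes of the ANN samples (first tridscan pattern in the post).
ann_prefix = bytes.fromhex("3F5F03001B00000010000000")
# Hypothetical other file that merely shares the magic.
other_prefix = WINHELP_MAGIC + bytes(8)

print(has_winhelp_magic(ann_prefix))    # an ANN sample passes
print(has_winhelp_magic(other_prefix))  # so does any file with the same magic
print(has_winhelp_magic(b"MZ"))         # unrelated data does not
```

Both buffers pass the check, so a detector that stops at the magic has to report the same type and suffix for all of them.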

Luckily I found some information about Windows HELP on the net. Of course
nothing official from Microsoft. This also applies to the related annotation
files with suffix ANN. I chose the page on the File Formats Archive Team wiki
as reference. That is expressed inside the new definition by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/WinHelp_annotation</RefURL>

On many sites, and also on English Wikipedia, application/winhlp is mentioned
as MIME type for HLP files. But on my Windows systems and on
extension.nirsoft.net no such type is listed. Also no such type is officially
registered at IANA.org. So I chose the user-defined type listed by the file
command. That is expressed by a line like:
   <Mime>application/x-winhelp</Mime>

So I first created the TrID definition ann.trid.xml by running tridscan on my
samples.

The first XML construct looks like:
   <Bytes>3F5F03001B00000010000000</Bytes>
   <ASCII> ? _</ASCII>
   <Pos>0</Pos>
According to the documents the first 4 bytes are the magic for all HLP-related
files. So this is also expressed inside hlp.trid.xml and mvb.trid.xml by an
XML construct like:
   <Bytes>3F5F0300</Bytes>
   <ASCII> ? _</ASCII>
   <Pos>0</Pos>

At offset 4 DirectoryStart is stored as a 4-byte little-endian integer. That
is the offset of the FILEHEADER of the internal directory. At offset 8
FirstFreeBlock is stored as a 4-byte little-endian integer. That is the offset
of the free header. Value -1 (FFFFFFFFh) means no free list. For my ANN
examples I get DirectoryStart 1Bh and FirstFreeBlock 10h. So the free block
comes directly after the 16-byte "header" part. This is probably always true,
but I do not know. So I mention these facts in the remark line. This is also
different from a pure HLP file, where no FirstFreeBlock exists. That is
expressed in the HLP definition by an XML construct like:
   <Bytes>00FFFFFFFF</Bytes>
   <Pos>7</Pos>
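To make the two header fields concrete, here is a small Python sketch that decodes the first 12 bytes with the standard struct module. The ANN bytes are the first tridscan pattern from above; the HLP-style bytes are hypothetical (the DirectoryStart value is invented) and only chosen to match the 00FFFFFFFF pattern at position 7.

```python
import struct


def parse_winhelp_header(data: bytes) -> dict:
    """Decode the first 12 bytes: magic, DirectoryStart, FirstFreeBlock."""
    magic, directory_start, first_free_block = struct.unpack_from("<4sii", data, 0)
    return {
        "magic": magic.hex().upper(),
        "directory_start": directory_start,
        "first_free_block": first_free_block,
        # -1 (FFFFFFFFh) means there is no free list.
        "has_free_list": first_free_block != -1,
    }


# Bytes of the first tridscan pattern for the ANN samples.
ann = parse_winhelp_header(bytes.fromhex("3F5F03001B00000010000000"))
print(ann)

# Hypothetical HLP-style header: DirectoryStart is made up, FirstFreeBlock is -1.
hlp = parse_winhelp_header(bytes.fromhex("3F5F030021430000FFFFFFFF"))
print(hlp)
```

For the ANN bytes this gives DirectoryStart 27 (1Bh) and FirstFreeBlock 16 (10h); for the HLP-style bytes the free list is reported as absent.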

The second XML construct looks like:
   <Bytes>00000B000000</Bytes>
   <Pos>14</Pos>
At offset 12 the size of the entire help file in bytes is stored as a 4-byte
little-endian integer, variable EntireFileSize. In my samples I got "low"
values, so the 2 upper bytes are nil. Assuming that the file size can reach
the 32-bit maximum, these nil bytes are not reliable, so I drop them and the
construct becomes:
   <Bytes>0B000000</Bytes>
   <Pos>16</Pos>
That means the first free block starts at offset 16 (10h) with the 4-byte
sequence 0B000000. This is probably always true.
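The reasoning about the upper bytes of EntireFileSize can be checked with a short Python sketch. The file sizes below are hypothetical and not taken from the real samples; the helper names are made up for the illustration.

```python
import struct


def size_field(entire_file_size: int) -> bytes:
    """EntireFileSize as stored at offset 12: 4 bytes, little-endian."""
    return struct.pack("<I", entire_file_size)


def upper_bytes_are_nil(entire_file_size: int) -> bool:
    """True if the bytes at file offsets 14 and 15 would both be zero."""
    return size_field(entire_file_size)[2:] == bytes(2)


# Hypothetical file sizes, not taken from the real samples.
for size in (0x02F3, 0xFFFF, 0x10000, 0x7FFFFFFF):
    print(f"{size:#010x} -> {size_field(size).hex().upper()} "
          f"nil upper bytes: {upper_bytes_are_nil(size)}")
```

Only files smaller than 64 KiB have nil bytes at offsets 14 and 15, so a pattern starting at position 14 would miss larger annotation files. As a side observation, 1Bh - 10h = 0Bh, so in these samples the value at offset 16 equals the distance between the free block and the directory.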

The third XML construct looks like:
 <Bytes>000000AF000000A6000000043B29020480007A340000000000000000000000000000000000000000FFFF0100010003000000</Bytes>
 <ASCII> . . . . . . . . . . . . ; ) . . . . z 4</ASCII>
 <Pos>24</Pos>

Looking at the output of a patched file command (file.tmp), we see that this
is mainly the directory part, starting with "low" and constant variables. It
starts at offset 27 with ReservedSpace AFh, followed by UsedSpace A6h at
offset 31 and so on. The last entry before offset 75 is TotalBtreeEntries with
value 3. This block depends on the b-tree structure. So maybe for bigger
annotations or many changes some fields in this area become different, but I
do not know. So I keep this for the moment and mention the observations in the
remark line. I assume that the 3 nil bytes before it are there by coincidence.
So I delete these 3 bytes from the pattern and the third construct becomes:
 <Bytes>AF000000A6000000043B29020480007A340000000000000000000000000000000000000000FFFF0100010003000000</Bytes>
 <ASCII> . . . . . . . . . ; ) . . . . z 4</ASCII>
 <Pos>27</Pos>
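The start of this pattern can be decoded field by field. The following Python sketch reads only the leading 17 bytes. ReservedSpace and UsedSpace are named in the post; the remaining field names (FileFlags, b-tree magic, flags, page size, structure string) are assumptions taken from the unofficial WinHelp format descriptions and are not confirmed by Microsoft.

```python
import struct

# Leading 17 bytes of the third pattern (file offsets 27..43).
prefix = bytes.fromhex("AF000000A6000000043B29020480007A34")

# FILEHEADER of the internal directory: two 4-byte integers and one flag byte.
reserved_space, used_space, file_flags = struct.unpack_from("<iiB", prefix, 0)
# Start of the b-tree header (field names assumed from unofficial documentation).
btree_magic, btree_flags, page_size = struct.unpack_from("<HHH", prefix, 9)
# First two characters of the structure string.
structure = prefix[15:17].decode("ascii")

print(f"ReservedSpace = {reserved_space:#x} (offset 27)")
print(f"UsedSpace     = {used_space:#x} (offset 31)")
print(f"FileFlags     = {file_flags} (offset 35)")
print(f"b-tree magic  = {btree_magic:#06x} (offset 36)")
print(f"b-tree flags  = {btree_flags:#06x} (offset 38)")
print(f"page size     = {page_size} (offset 40)")
print(f"structure     = {structure!r} (offset 42)")
```

If these assumptions hold, the visible "; )" and "z 4" in the ASCII line are the b-tree magic and the start of the structure string.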

The fourth XML construct looks like:
   <Bytes>000300FFFFFFFF</Bytes>
   <Pos>75</Pos>

This maybe belongs to the end of the directory or to the next structure. After
it the structures @LINK and @VERSION follow. These are characteristic for ANN
files and are expressed inside the global strings section by lines like
   <String>VERSION</String>
   <String>LINK</String>
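One possible reading of the bytes at offset 75 is a b-tree leaf-page header. This is an assumption based on the unofficial WinHelp format descriptions (Unused, NEntries, PreviousPage, NextPage, 2 bytes each, starting at offset 74), not something the samples prove. A minimal Python sketch under that assumption:

```python
import struct

# Fourth pattern from the post, located at file offset 75.
pattern = bytes.fromhex("000300FFFFFFFF")

# Assumed leaf-page header starting at offset 74:
# Unused (2), NEntries (2), PreviousPage (2), NextPage (2).
# The pattern starts one byte later, so its first byte is skipped here.
n_entries, previous_page, next_page = struct.unpack_from("<hhh", pattern, 1)

print(f"NEntries     = {n_entries}")
print(f"PreviousPage = {previous_page}")
print(f"NextPage     = {next_page}")
```

Under this reading NEntries is 3, which agrees with the TotalBtreeEntries value 3 mentioned before, and both page links are -1 because there is only one page.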

The @VERSION part is obviously filled with many nil bytes. That is expressed
by the fifth XML construct, which looks like:
   <Bytes>000000000000000000000000000000000000000000000000000000000
   <Pos>115</Pos>

The sixth XML construct looks like:
   <Bytes>626D6601000000</Bytes>
   <ASCII> b m f</ASCII>
   <Pos>212</Pos>
According to the documents this is the first internal file, described by 6
bytes of version info which seem to be constant and characteristic for ANN
samples. So I mention this fact in the remark line and shrink the construct to
6 bytes. It now looks like:
   <Bytes>626D66010000</Bytes>
   <ASCII> b m f</ASCII>
   <Pos>212</Pos>

The last 2 XML constructs look like:
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>221</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>227</Pos>
   </Pattern>
I assume that these are there by coincidence (too few samples, none reaching
the 32-bit maximum). So I delete these two constructs.
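Why such coincidental patterns appear with few samples can be shown with a toy scanner. The Python sketch below is not tridscan's real algorithm; it just collects byte runs that are identical at the same position in all samples. The sample data are hypothetical 4-byte size fields.

```python
import struct


def common_runs(samples: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (position, bytes) for every run that is identical in all samples."""
    length = min(len(s) for s in samples)
    first = samples[0]
    runs = []
    start = None
    for i in range(length):
        same = all(s[i] == first[i] for s in samples)
        if same and start is None:
            start = i                       # a new common run begins
        elif not same and start is not None:
            runs.append((start, first[start:i]))
            start = None                    # the run ended before position i
    if start is not None:
        runs.append((start, first[start:length]))
    return runs


# Hypothetical 32-bit little-endian size fields from small files only.
small = [struct.pack("<I", 0x02F3), struct.pack("<I", 0x0511)]
print(common_runs(small))

# Adding one hypothetical bigger file removes part of the "constant" bytes.
bigger = small + [struct.pack("<I", 0x12345)]
print(common_runs(bigger))
```

With only small files the two upper bytes look constant; one bigger sample already shrinks that run, which is the reason for deleting such nil-byte patterns by hand.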

With this new TrID definition all my help ANN samples are now described. The
TrID definition and output are stored in archive ann_.zip. I hope that my
definition can be used in a future version of triddefs.

With best wishes
Jörg Jenderek


Mark0

Re: ann.trid.xml for Windows HELP File annotation ; misidentified
« Reply #1 on: December 19, 2023, 02:12:37 AM »
Thanks!