Recent Posts

Pages: 1 ... 8 9 [10]
91
Thanks Jörg!
92
Hello trid users,

some days ago I handled files in the context of the old Windows help system.
So in this session I will handle files with suffix FTS and FTG.

The files are typically found in same directory as corresponding HLP file.
The samples are created by Microsoft Help tool winhlp32.exe.

So I ran the trid utility on my examples. The FTS samples are recognized and
described correctly as "Windows Help Full-Text Search index file" by
fts.trid.xml, but without mime type and reference. The few FTG samples are not
recognized and are described as "Unknown!" (see appended trid-v-old.txt in
output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are not
recognized.

For comparison I also ran the file command (version 5.45) on these
samples. Here the FTS samples are also recognized and described correctly as
"MS Windows help Full Text Search index". The full name of the corresponding
HLP file is also shown (see appended output/file-5.45.txt). The mime type here
is application/x-winhelp-fts (see appended file-i-5.45.txt in output). The
correct file name suffix is also shown for FTS samples (see appended
file-ext-5.45.txt in output). The FTG samples (like winhlp32.FTG.GID) are not
recognized and are therefore described as "data" with the generic
application/octet-stream mime type.

On Linux according to shared MIME-info database samples with FTS suffix are
called "FITS document" with acronym "Flexible Image Transport System". But
that is another file format.

Luckily I found some information about Windows HELP on the net. Of course
nothing official from Microsoft. This also applies to the related search files
with suffix FTS and FTG. So I chose the Wikipedia page as reference. That is
expressed inside the updated definition by a line like:
   <RefURL>https://en.wikipedia.org/wiki/WinHelp</RefURL>

The current definition lists no mime type. So I chose the user defined type
listed by the file command. That is expressed by a line like:
      <Mime>application/x-winhelp-fts</Mime>

The file command also lists the full name of the corresponding HLP file (like
"C:\TMP.TMP\hlp\htmhlp98.hlp"). Apparently this is stored at offset 16. I
mention this observation in a remark line because it becomes relevant when
considering FTG samples.

FTS files are mainly described by a characteristic 4 byte pattern at the
beginning. That is expressed by an XML construct that looks like:

   <Bytes>74664D52</Bytes>
   <ASCII> t f M R</ASCII>

On Wikipedia, besides the FTS suffix, FTG is also listed for Full Text Search
of WinHelp. So I looked on my systems for such files. Unfortunately I found
only a few samples. Many (like CTRLREF.FTG SETUPWIZ.FTG) are either empty, so
file size is 0, or contain just an empty line (carriage return, line feed), so
file size is 2. So in the end I got only one real sample (winhlp32.FTG).

So I generated ftg.trid.xml manually. Here a full file name is also stored at
offset 16, but it references an FTS file instead of an HLP file. I mention
this fact in a remark line. It is expressed inside the global strings section
by a line like:
   <String>.FTS</String>

When searching the net for the difference, the phrase "group" is
mentioned. So compared with fts.trid.xml this is expressed by a line like:

   <FileType>Windows Help Full-Text search Group file</FileType>

And compared with fts.trid.xml I chose another user defined mime type. That
is expressed by a line like:
   <Mime>application/x-winhelp-ftg</Mime>

In the starting 4 byte pattern the letter g is used instead of t, compared
with fts.trid.xml. This is expressed by an XML construct like:
   <Bytes>67664D52</Bytes>
   <ASCII> g f M R</ASCII>
   <Pos>0</Pos>
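
As a minimal sketch of the two magic checks above, the following Python
helper looks only at the first 4 bytes. The function name is made up for
illustration, and real TrID matching uses more than this single test.

```python
FTS_MAGIC = bytes.fromhex("74664D52")  # "tfMR", from fts.trid.xml
FTG_MAGIC = bytes.fromhex("67664D52")  # "gfMR", from ftg.trid.xml


def classify_winhelp_search(data: bytes) -> str:
    """Return 'fts', 'ftg' or 'unknown' based on the leading 4 bytes."""
    head = data[:4]
    if head == FTS_MAGIC:
        return "fts"
    if head == FTG_MAGIC:
        return "ftg"
    return "unknown"
```

Empty FTG files and those holding just CR LF fall through to "unknown" here.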

With this new trid definition all my real Windows Help Full-Text search
samples are now described, including the Group samples (*.FTG). And now more
details are shown.

TrID definition, some samples and output are stored in archive fts_ftg.zip. I
hope that my definitions can be used in future version of triddefs.

With best wishes
Jörg Jenderek
93
Definitions DB change log / Re: Current - Year 2023
« Last post by Mark0 on December 19, 2023, 03:11:57 AM »
Updated:
  • Workflow Petri Net Designer project (PNML)
Added:
  • Windows Help Annotation (ANN)
  • BAR game data archive (BAR/DFW)
  • MechWarrior 2 mission data (BWD)
  • DESQview/X colors Configuration (CFG)
  • Windows Help index (GID)
  • DESQview/X Group (GRP)
  • DESQview/X Help (HLP)
  • DESQview/X Layout (LYT)
  • MechWarrior 2 demo data (MW2)
  • Microsoft Test compiled P-Code (v3.0) (PCD)
  • Petri Net XML (PNML)
  • MechWarrior 2 game data (PRJ)
  • DESQview/X Print Manager driver (PTM)
  • Microsoft Test Screen (SCN)
  • InfoSpotter Template (SPT)
  • Windows NT Registry Hive (Windows Firewall) (WFW)
Deleted:
  • GID Help index (GID)
94
TrID File Identifier / Re: gid.trid.xml replacing gid_idx.trid.xml for GID Help inde
« Last post by Mark0 on December 19, 2023, 02:46:40 AM »
Thanks!
95
TrID File Identifier / Re: bmk.trid.xml for Windows HELP bookmark; misidentified
« Last post by Mark0 on December 19, 2023, 02:39:36 AM »
Thanks!
Unfortunately I tried to refine the definition with a couple other BMK files, including one from Windows XP, and most of the patterns disappear leaving something too little different from a normal HLP file.
97
TrID File Identifier / Re: ann.trid.xml for Windows HELP File annotation ; misidentified
« Last post by Mark0 on December 19, 2023, 02:12:37 AM »
Thanks!
98
TrID File Identifier / bmk.trid.xml for Windows HELP bookmark; misidentified
« Last post by jenderek on December 18, 2023, 01:52:46 AM »
Hello trid users,

some months ago I migrated to Windows 10. Some days ago I wanted to use the
help of an older Windows program. Now I get an error message that the help
system used is not supported any more. The same error occurred on my previous
Windows 8.1 system. The solution offered by Microsoft is to download the
installation package from knowledge base article KB917607. For Windows 8.1 I
could download an MSU package for my language and CPU architecture, which
could be started by double click. But for Windows 10 no download is offered.
I tried the version for Windows 8.1, but when starting the installation
Windows complains that the package is not suited for my version.

For Windows help files the name suffix HLP is used. Unfortunately this
suffix is also used by other help systems. So as a first step you want to
identify all HLP files on your systems. Unfortunately on my systems some HLP
files are not identified. So in this session I will handle files with suffix
BMK, which are related to the Windows HELP File described by hlp.trid.xml.

The BMK files are typically found inside directory %LOCALAPPDATA%\Help. On
newer Windows systems the old HLP format, and therefore the BMK format, is not
supported any more. The samples are created by the Microsoft Help tool
winhlp32.exe when you choose menu entries like "bookmark" and "define".

The file name is WinHlp32.BMK (on Windows XP 32-bit) or WinHlp32 (on Windows 7
and 8.1 64-bit).

So I ran the trid utility on these bookmark examples. All samples are
recognized but are wrongly described as "Multimedia Viewer Book" with suffix
MVB by mvb.trid.xml. Some samples are described with higher priority as
"Windows HELP File" with wrong suffix HLP by hlp.trid.xml (see appended
trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). The samples that TrID describes
as Windows HELP File are described here as "Windows Help File" without mime
type by PUID fmt/474. But the missing suffix is considered bad here (see
EXTENSION_MISMATCH true in droid-bmk.csv in output).

For comparison I also ran the file command (version 5.45) on these
samples. Here the samples are recognized and described correctly as "MS
Windows help Bookmark". The file size in bytes is also shown (see appended
output/file-5.45.txt). The mime type here is application/x-winhelp (see
appended file-i-5.45.txt in output). The correct file name suffix BMK is
shown here (see appended file-ext-5.45.txt in output).

On Linux according to shared MIME-info database such samples are called
"WinHelp help file". Here application/winhlp is used as mime type. The samples
are just recognized by looking for 4 byte sequence 3F5F0300 at the
beginning. Here suffix HLP is displayed. That information can be seen in
source freedesktop.org.xml.in found for example on gitlab.freedesktop.org.

Luckily I found some information about Windows HELP on the net. Of course
nothing official from Microsoft. This also applies to the related bookmark
files, which sometimes have suffix BMK. So I chose the Wikipedia page as
reference. That is expressed inside the new definitions by a line like:
   <RefURL>https://en.wikipedia.org/wiki/WinHelp</RefURL>

On many sites, and also on English Wikipedia, application/winhlp is mentioned
as mime type for HLP files. But on my Windows systems and on
extension.nirsoft.net no such type is listed. Also no such type is officially
registered at IANA.org. So I chose the user defined type listed by the file
command. That is expressed by a line like:
   <Mime>application/x-winhelp</Mime>

So I first created the TrID definition bmk.trid.xml by running tridscan on my
samples.

The first XML construct looks like:
   <Bytes>3F5F0300</Bytes>
   <ASCII> ? _</ASCII>
   <Pos>0</Pos>
According to the documents the first 4 bytes are the magic for all HLP related
files. So this is also expressed inside hlp.trid.xml and mvb.trid.xml by the
same XML construct.

At offset 8 FirstFreeBlock is stored as a 4 byte little-endian integer. That
is the offset of the free header. Value -1 (FFFFFFFFh) means no free list.
Some of my bookmark examples have this value, but some do not. That differs
from pure HLP files, which have no free list. There this is expressed by an
XML construct like:
   <Bytes>00FFFFFFFF</Bytes>
   <Pos>7</Pos>

The second XML construct looks like:
   <Bytes>000000</Bytes>
   <Pos>5</Pos>

At offset 4 DirectoryStart is stored as a 4 byte little-endian integer. That
is the offset of the FILEHEADER of the internal directory. So the 3 upper
bytes are nil, which means DirectoryStart is lower than 100h. After some hard
thinking I believe that this "low" value is probably always true. Why?
Normally every bookmark entry is something like a header text and is limited
to some dozen characters. So in the worst realistic case with thousands of
bookmarks the content has a size of just some 10000 bytes. With a page size of
400h the b-tree is then not organized in a complicated way and the layout
stays similar (directory near the beginning). So there is not much overhead
and the total file size is in a similar range.

At offset 12 the file size is stored as a 4 byte little-endian integer. In my
examples the 2 upper bytes are nil, so the file size is lower than 10000h.
This is probably always true. It is expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>14</Pos>
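
The header fields discussed so far can be read with a few lines of Python.
This is only a sketch: the field names follow the text above, the helper
names are invented, and the thresholds simply mirror the nil byte patterns at
Pos 5 and Pos 14.

```python
import struct

WINHELP_MAGIC = bytes.fromhex("3F5F0300")


def parse_winhelp_header(data: bytes) -> dict:
    """Read the 16 byte header shared by HLP related files."""
    if len(data) < 16 or data[:4] != WINHELP_MAGIC:
        raise ValueError("not a WinHelp style header")
    # DirectoryStart (unsigned), FirstFreeBlock (signed, -1 = no free list)
    # and file size (unsigned), all 32-bit little-endian.
    directory_start, first_free_block, file_size = struct.unpack_from(
        "<IiI", data, 4
    )
    return {
        "directory_start": directory_start,
        "first_free_block": first_free_block,
        "file_size": file_size,
    }


def fits_bookmark_patterns(header: dict) -> bool:
    """True if the 'low value' assumptions of the nil byte patterns hold."""
    return header["directory_start"] < 0x100 and header["file_size"] < 0x10000
```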

The next XML constructs are short nil byte sequences like:
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>17</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>34</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000</Bytes>
      <Pos>38</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>184</Pos>
   </Pattern>
But I do not know what these mean. Unfortunately I still found no "real"
characteristic that makes the difference to other "HLP" files. So I keep these
constructs.

The last construct is a long nil byte sequence reaching up to about the 1 KB
limit. That looks like:
   <Bytes>00000000000000000000000000000000000000000000000000000
   <Pos>186</Pos>
So at first glance I did not really find characteristics for help bookmarks.
Maybe other users know more facts or can improve my definition.

With this new trid definition now all my help bookmark samples are described
more precisely. TrID definition, some samples and output are stored in archive
bmk_.zip. I hope that my definition can be used in future version of triddefs.

With best wishes
Jörg Jenderek
99
Hello trid users,

some months ago I sent definitions for sub classification of Windows NT
Registry Hive files. In this session I will handle Windows Firewall
configuration files.

Such files typically have the name suffix WFW (like Win10firewall.wfw
netsh-advfirewall.wfw).

Unfortunately, as usual, you do not find information about the file format
from Microsoft. Either you get samples that access the files via API, or low
level information like "click on foo to get bar". Luckily there exists an
unofficial page about the Windows registry file format on GitHub, which
describes some technical aspects. So I use this as reference in the
definition. That is expressed by a line like:
 <RefURL>
 https://github.com/msuhanov/regf/blob/master/
 Windows%20registry%20file%20format%20specification.md
 </RefURL>

Such samples can be exported and imported for example by command like:
      netsh advfirewall export "c:\firewall-rules.wfw"

So I ran the trid utility on these WFW examples. All samples are recognized
and are described in principle OK as "Windows NT Registry Hive (generic)" by
hiv.trid.xml. But the file name suffix is wrong: it is not HIV/DAT (see
appended trid-v-old.txt in output).

For comparison I also ran the file format identification utility
DROID (see https://sourceforge.net/projects/droid/). Here the samples are
not recognized.

For comparison I also ran the file command (version 5.45) on these
samples. Here the samples are recognized and described generically as "MS
Windows registry file, NT/2000 or above" (see appended file-5.45.txt in
output). The mime type here is the generic application/octet-stream (see
appended file-i-5.45.txt in output). The file name suffix is also not
recognized (see appended file-ext-5.45.txt in output).

Instead of generic application/octet-stream mime type i choose the type used
for generic Windows NT Registry Hive. That is expressed by line like:
   <Mime>application/x-ms-registry</Mime>

Because complete information is missing, I first created the TrID definition
hiv-wfw.trid.xml by running tridscan on many (29) samples. By running a
Windows system in VirtualBox and deleting the firewall rules I got samples
that are small enough and do not contain much content.

In global string sections i get 2 lines:
   <String>HBIN</String>
   <String>REGF</String>
According to the documentation the first is triggered by the 4 byte signature
hbin of the hive bins header. The second is triggered by the 4 byte signature
regf of the base block, also known as file header. Both lines are apparently
characteristic for a Windows Registry Hive, but none is specific to WFW.

In the Front Block section the first construct is characteristic for all
hives. That is expressed by an XML construct like:
   <Pattern>
      <Bytes>72656766</Bytes>
      <ASCII> r e g f</ASCII>
      <Pos>0</Pos>
   </Pattern>

Then there are 3 non nil sequences. These look like:
   <Pattern>
      <Bytes>010000000500000000000000010000002000000000</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0001000000</Bytes>
      <Pos>43</Pos>
   </Pattern>
   <Pattern>
      <Bytes>726D746D</Bytes>
      <ASCII> r m t m</ASCII>
      <Pos>164</Pos>
   </Pattern>
Apparently some fields are constant here, but none are specific to WFW.

So I looked at the output of a file command patched according to the
documentation (see file.tmp in output). The content becomes visible when you
load such samples with the Microsoft registry editor regedit.exe. Or you can
use the Forensic Registry EDitor (fred). The advantage of this program is
that there are ports for Windows and Linux. The disadvantage is that it does
not work on all registry examples, because the file format is not officially
documented. So maybe the function of some fields is not known, which leads to
program crashes. This tool can be found at
    https://www.pinguin.lu/fred .

After deleting all firewall rules I got not so many patterns in the
definition (so no needle in a haystack problem).
When I look inside the Global Strings section I see only 22 lines like:
   <String>P'F'I'R'E'W'A'L'L'.'L'O'G</String>
   <String>%'S'Y'S'T'E'M'R'O'O'T'%</String>
   <String>DISABLENOTIFICATIONS</String>
   <String>DISABLESTATEFULPPTP</String>
   <String>DISABLESTATEFULFTP</String>
   <String>L'O'G'F'I'L'E'S</String>
   <String>S'Y'S'T'E'M'3'2</String>
   <String>STANDARDPROFILE</String>
   <String>ENABLEFIREWALL</String>
   <String>DOMAINPROFILE</String>
   <String>POLICYVERSION</String>
   <String>PUBLICPROFILE</String>
   <String>IPSECEXEMPT</String>
   <String>LOGFILEPATH</String>
   <String>LOGFILESIZE</String>
   <String>LOGGING</String>
   <String>CP-NO</String>
   <String>HBIN</String>
   <String>HCP-</String>
   <String>REGF</String>
   <String>RMTM</String>
   <String>TALL</String>

According to the documentation the second last is triggered by the 4 byte
GUID signature rmtm. According to the documentation this field exists in
Windows 10. So if other users have samples from older Windows versions like
Vista, this line may vanish. My samples are from Windows 11, 10, 8 and 7, but
such samples are maybe "reorganized" by a newer Windows version. So I keep
this line for the moment.

In my samples the GUID signature appears as expected at fixed offset. This
is expressed inside front block section by XML construct like:
   <Pattern>
      <Bytes>726D746D</Bytes>
      <ASCII> r m t m</ASCII>
      <Pos>164</Pos>
   </Pattern>

According to the documentation a 16 byte GUID is stored before that. After
the signature the 8 byte last reorganized timestamp is stored. So in my
samples the value zero obviously means not reorganized.

The pattern before looks like:
   <Pattern>
      <Bytes>11</Bytes>
      <Pos>155</Pos>
   </Pattern>
According to the documentation the 16 byte GUID TmId is stored at offset 148.
So apparently by lucky circumstances 1 byte is the same in my samples.
Assuming that other GUID values can occur, this construct vanishes.

The pattern before looks like:
   <Pattern>
      <Bytes>00000000</Bytes>
      <Pos>144</Pos>
   </Pattern>
Similar thoughts apply here. According to the documentation the 4 byte flags
are stored at offset 144. So I keep it and mention the fact in a remark line.

The pattern before looks like:
   <Pattern>
      <Bytes>11</Bytes>
      <Pos>135</Pos>
   </Pattern>
According to the documentation the 16 byte GUID LogId is stored at offset
128. So apparently by lucky circumstances 1 byte of LogId is the same in my
samples. Assuming that other GUID values can occur, this construct vanishes
and can be deleted.

The pattern before looks like:
   <Pattern>
      <Bytes>11</Bytes>
      <Pos>119</Pos>
   </Pattern>
The same thoughts apply here. According to the documentation the 16 byte GUID
RmId is stored at offset 112. Here in my samples one middle byte of that GUID
is the same. Assuming that other GUID values can occur, this construct
vanishes and can be deleted.
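
To look at the GUID area in one go, a small Python sketch can decode the
fields at the offsets named above (RmId at 112, LogId at 128, flags at 144,
TmId at 148, signature at 164, timestamp at 168). The helper name is
invented. Note that Pos 119, 135 and 155 are each byte 7 of a GUID. In this
little-endian layout that byte carries the version nibble of RFC 4122 style
GUIDs, which may be why it is constant, but that is only a guess.

```python
import struct
import uuid

GUID_OFFSETS = {"RmId": 112, "LogId": 128, "TmId": 148}


def read_guid_area(base_block: bytes) -> dict:
    """Decode the GUID related fields of a regf base block (offsets 112-175)."""
    if len(base_block) < 176:
        raise ValueError("need at least 176 bytes of the base block")
    info = {
        name: uuid.UUID(bytes_le=base_block[off:off + 16])
        for name, off in GUID_OFFSETS.items()
    }
    (info["Flags"],) = struct.unpack_from("<I", base_block, 144)
    info["Signature"] = base_block[164:168]
    # 8 byte last reorganized timestamp; 0 in the samples described above
    (info["LastReorganized"],) = struct.unpack_from("<Q", base_block, 168)
    return info
```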

The first 3 XML constructs look like:
   <Pattern>
      <Bytes>72656766</Bytes>
      <ASCII> r e g f</ASCII>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>5</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>9</Pos>
   </Pattern>
According to the documentation the regf signature is followed by the 4 byte
primary sequence number and then by the secondary sequence number. The first
number is incremented by 1 at the beginning of a write operation on the
primary file. The second is incremented by 1 at the end of a write operation
on the primary file, so both numbers should be equal after a successful write
operation. So apparently for such samples the sequence numbers are always 1
at the beginning. In my examples I get, besides 1, the low sequence number 5.
So the 3 upper bytes of the sequence numbers are nil. When the numbers grow
towards the 32-bit limit the last 2 constructs will vanish. So I delete these.
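
The sequence number rule can be checked directly. The following Python
sketch reads both numbers at offsets 4 and 8 as described above; the function
names are made up for illustration.

```python
import struct


def sequence_numbers(base_block: bytes) -> tuple:
    """Return (primary, secondary) sequence numbers of a regf base block."""
    if len(base_block) < 12 or base_block[:4] != b"regf":
        raise ValueError("not a regf base block")
    return struct.unpack_from("<II", base_block, 4)


def last_write_completed(base_block: bytes) -> bool:
    """Both numbers are equal after a successful write operation."""
    primary, secondary = sequence_numbers(base_block)
    return primary == secondary
```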

So next XML construct must be inspected. This looks like:
 <Bytes>010000000500000000000000010000002000000000</Bytes>
 <Pos>20</Pos>

According to the documentation the major version is stored at offset 20. The
value in all NT Windows versions is 1. At offset 24 the minor version is
stored. In my examples I get value 5. This is the value mostly found in my
other registry samples. Value 0 means "pre" version, 1 means NT 3.1, 2 means
NT 3.5 and the higher values 3, 4, 5, 6 mean NT 4 up to Windows 11. At offset
28 the file type is stored. 0 means primary file; other values are used for
the transaction variants (*.LOG*). So this is always 0 here. At offset 32 the
file format is stored. 1 means direct memory load. This is what I also found
in my other registry samples. At offset 36 the root cell offset is stored. In
all my registry samples I get value 20h. At offset 40 the hive bins data size
is stored. Here I get the lowest value 1000h. That is apparently the minimal
possible value. I do not know if other values occur here. So I keep this
construct and mention the facts in a remark line.

The next construct looks like:
   <Pattern>
      <Bytes>0001000000</Bytes>
      <Pos>43</Pos>
   </Pattern>
At offset 44 the clustering factor is stored. In all my registry samples I
get value 1 here. That means a block size of 512 bytes. The hive bins data
size is stored before it. In my samples I get "low" values there, so its
upper byte is nil. Assuming that this size can reach the 32-bit limit, the
construct now becomes:
   <Pattern>
      <Bytes>01000000</Bytes>
      <Pos>44</Pos>
   </Pattern>
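
The base block fields between offset 20 and 47 can be unpacked in one call.
This Python sketch uses the offsets given above; the dictionary keys and the
function name are my own choice, not names from the documentation.

```python
import struct

FIELD_NAMES = (
    "major",            # offset 20
    "minor",            # offset 24
    "file_type",        # offset 28, 0 = primary file
    "file_format",      # offset 32, 1 = direct memory load
    "root_cell",        # offset 36
    "hive_bins_size",   # offset 40
    "clustering",       # offset 44
)


def read_version_fields(base_block: bytes) -> dict:
    """Unpack seven 32-bit little-endian fields starting at offset 20."""
    if len(base_block) < 48 or base_block[:4] != b"regf":
        raise ValueError("not a regf base block")
    return dict(zip(FIELD_NAMES, struct.unpack_from("<7I", base_block, 20)))
```

For the smallest samples described above this gives major 1, minor 5, root
cell 20h, hive bins size 1000h and clustering factor 1.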

At offset 48 the partial file path is stored. These 64 bytes contain a UTF-16
LE encoded name. In the samples with sequence number 1 this string is not
used, so it is filled with nil bytes there. But these name parts can look
like:
   ry\netsh-advfirewall-export.wfw
Because the name parts are ASCII characters stored as UTF-16 LE, I get a nil
byte at every odd offset. That is expressed by XML constructs like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>49</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>51</Pos>
   </Pattern>
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>107</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000</Bytes>
      <Pos>109</Pos>
   </Pattern>
I do not know if it is possible to create a WFW file on a file system with
directory names in non-Latin scripts like Chinese. If so, the nil bytes will
vanish and only the terminating nil character for UTF-16 will survive. So the
above mentioned patterns vanish and the last one becomes:
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>110</Pos>
   </Pattern>
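
A short Python sketch shows how the 64 byte name field could be decoded and
why ASCII names produce the nil bytes at odd offsets. The function name is
invented; the field position (offset 48, 64 bytes, UTF-16 LE) is taken from
the text above.

```python
def partial_file_name(base_block: bytes) -> str:
    """Decode the UTF-16 LE name stored in bytes 48..111 of the base block."""
    raw = base_block[48:112]
    text = raw.decode("utf-16-le", errors="replace")
    # keep everything before the first nil character
    return text.split("\x00", 1)[0]


def odd_offsets_are_nil(base_block: bytes) -> bool:
    """True if every odd offset in 49..111 holds a nil byte (ASCII-only name)."""
    return all(base_block[i] == 0 for i in range(49, 112, 2))
```

A name in a non-Latin script would make odd_offsets_are_nil return False
while the name itself still decodes.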

The last XML construct is like in the other definition. At offset 200h (=512)
come 600h (=1536) padding nil bytes. This is expressed by an XML construct
like:
   <Bytes>0000000000000000000000000000000000000000000000000000000
   <Pos>512</Pos>

As in the other definition the last reorganized timestamp comes after the
rmtm signature. After that, at offset 176, I get a nil byte sequence up to
about the 512 limit. So this looks like:
   <Pattern>
      <Bytes>0000000000000000000000000000000000000000000000
      <Pos>176</Pos>
   </Pattern>

In global strings section i get 4 lines triggered by UTF-16 strings.  The
lines look like:
   <String>P'F'I'R'E'W'A'L'L'.'L'O'G</String>
   <String>%'S'Y'S'T'E'M'R'O'O'T'%</String>
   <String>L'O'G'F'I'L'E'S</String>
   <String>S'Y'S'T'E'M'3'2</String>
Apparently these are characteristic for all WFW samples. When you look at
the exported output of regedit you see that this is the file name of the
log. That looks like:
   %systemroot%\\system32\\LogFiles\\Firewall\\pfirewall.log
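
The link between the log path and the four strings can be reproduced in
Python. This sketch assumes that TrID upper-cases its global strings and
prints a nil byte as an apostrophe; both are my reading of the lines above,
and the function name is invented.

```python
def trid_style(text: str) -> str:
    """Render an ASCII string the way it appears when stored as UTF-16 LE."""
    raw = text.upper().encode("utf-16-le")
    # drop the trailing nil byte so the result ends on a character
    return raw[:-1].decode("latin-1").replace("\x00", "'")


for part in ("pfirewall.log", "%systemroot%", "LogFiles", "system32"):
    print(trid_style(part))
```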

I do not know if it is possible to change this path, but I see no GUI option
to do this. So I assume that this is always true. This name is stored inside
the variable LogFilePath, together with LogFileSize, in the Logging section.
These facts are expressed by lines like:
   <String>LOGFILEPATH</String>
   <String>LOGFILESIZE</String>
   <String>LOGGING</String>

One section up i get 3 profiles sections. These are expressed by lines like:
   <String>STANDARDPROFILE</String>
   <String>DOMAINPROFILE</String>
   <String>PUBLICPROFILE</String>

Then some are apparently triggered by behaviour of firewall. These are
expressed by lines like:
   <String>DISABLENOTIFICATIONS</String>
   <String>DISABLESTATEFULPPTP</String>
   <String>DISABLESTATEFULFTP</String>
   <String>ENABLEFIREWALL</String>
   <String>POLICYVERSION</String>
   <String>IPSECEXEMPT</String>
But I do not know if these conditions always apply. So I keep these lines.

Then there are lines that look like garbage:
   <String>CP-NO</String>
   <String>HCP-</String>
   <String>TALL</String>
The last is obviously triggered by keys like DoNotAllowExceptions or
CertificateInstall-TCP-Out. The first is triggered by keys that contain
phrases like -TCP-NoScope. The second is triggered by keys that contain
phrases like DHCP-In or DHCP-Out. So I delete these 3 lines.
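
That these three strings are only fragments of longer key names can be shown
with a simple substring test in Python. The key list below holds just the
examples named above, and the helper name is invented.

```python
KEYS = [
    "DoNotAllowExceptions",
    "CertificateInstall-TCP-Out",
    "-TCP-NoScope",
    "DHCP-In",
    "DHCP-Out",
]


def keys_containing(fragment: str, keys=KEYS) -> list:
    """Return the keys whose upper-cased name contains the fragment."""
    return [key for key in keys if fragment in key.upper()]


for fragment in ("TALL", "CP-NO", "HCP-"):
    print(fragment, keys_containing(fragment))
```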

With this new trid definition all my WFW samples are now described in detail
(with correct suffix) in addition to the generic description. TrID
definition, some samples and output are stored in archive wfw_.zip. I hope
that my definition can be used in a future version of triddefs.

With best wishes
Jörg Jenderek
100
TrID File Identifier / gid.trid.xml replacing gid_idx.trid.xml for GID Help inde
« Last post by jenderek on December 16, 2023, 04:31:23 AM »
Hello trid users,

some months ago I migrated to Windows 10. Some days ago I wanted to use the
help of an older Windows program. Now I get an error message that the help
system used is not supported any more. The same error occurred on my previous
Windows 8.1 system. The solution offered by Microsoft is to download the
installation package from knowledge base article KB917607. For Windows 8.1 I
could download an MSU package for my language and CPU architecture, which
could be started by double click. But for Windows 10 no download is offered.
I tried the version for Windows 8.1, but when starting the installation
Windows complains that the package is not suited for my version.

For Windows help files the name suffix HLP is used. Unfortunately this
suffix is also used by other help systems. So as a first step you want to
identify all HLP files on your systems. Unfortunately on my systems some HLP
files are not identified. So in this session I will handle files with suffix
GID, which are related to the Windows HELP File described by hlp.trid.xml.

The GID files are typically found in same directory as corresponding HLP file.
The samples are created by Microsoft Help tool winhlp32.exe.

So I ran the trid utility on these GID examples. Many samples are recognized
and described correctly as "GID Help index" without mime type by
gid_idx.trid.xml. With slightly lower priority these samples are also
described as "Windows HELP File" with file name suffix HLP by hlp.trid.xml.
Again with slightly lower priority these samples are also described as
"Multimedia Viewer Book" with suffix MVB and mime type
application/octet-stream by mvb.trid.xml (see appended trid-v-old.txt in
output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are
recognized, but they are described only as "Windows Help File" by PUID
fmt/474. Here the GID suffix is considered "bad".

For comparison I also ran the file command (version 5.45) on these
samples. Here most samples are also recognized and described correctly as "MS
Windows help Global Index". The file size in bytes is also shown (see
appended output/file-5.45.txt). The mime type here is application/x-winhelp
(see appended file-i-5.45.txt in output). The correct file name suffix GID is
shown for most samples (see appended file-ext-5.45.txt in output). A few
samples (like ICCviewer.GID win98rk.GID) are misidentified as "MS Windows
help Bookmark" with wrong suffix bmk. These samples are recognized by TrID.
On the other hand the file command recognizes some samples (like RESCUE32.GID
grep.GID IBMAVW.GID putty.GID) that are not identified correctly by TrID.
This happens because the file command uses other methods to identify GID
samples.

On Linux according to shared MIME-info database such samples are called
"WinHelp help file". Here application/winhlp is used as mime type. The samples
are just recognized by looking for 4 byte sequence 3F5F0300 at the
beginning. Here suffix HLP is displayed. That information can be seen in
source freedesktop.org.xml.in found for example on gitlab.freedesktop.org.

Luckily I found some information about Windows HELP on the net. Of course
nothing official from Microsoft. This also applies to the related index files
with suffix GID. So I chose the Wikipedia page as reference. That is
expressed inside the new definitions by a line like:
   <RefURL>https://en.wikipedia.org/wiki/WinHelp</RefURL>

On many sites, and also on English Wikipedia, application/winhlp is mentioned
as mime type for HLP files. But on my Windows systems and on
extension.nirsoft.net no such type is listed. Also no such type is officially
registered at IANA.org. So I chose the user defined type listed by the file
command. That is expressed by a line like:
     <Mime>application/x-winhelp</Mime>

So I first ran tridscan on the undetected samples to improve
gid_idx.trid.xml. At first glance (see trid.tmp in output) all is OK now and
all my GID samples are recognized. But looking at what has happened we see
what experienced TrID users would expect. Some short nil byte sequences
vanished, like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>40</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>52</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>58</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>62</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>66</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>73</Pos>
   </Pattern>

These were probably triggered by lucky circumstances (too few samples, and
values not reaching 32-bit limits). So I guess that when inspecting more
samples the remaining short nil sequences will also vanish. Then in the end
only 2 XML constructs will survive. The first is expressed by:
   <Bytes>3F5F0300</Bytes>
   <ASCII> ? _</ASCII>
   <Pos>0</Pos>
That is the pattern that is used by all tools for recognition.

The second looks like:
   <Pattern>
      <Bytes>00FFFFFFFF</Bytes>
      <Pos>7</Pos>
   </Pattern>

At offset 4 DirectoryStart is stored as a 4 byte little-endian integer. That
is the offset of the FILEHEADER of the internal directory. At offset 8
FirstFreeBlock is stored as a 4 byte little-endian integer. That is the
offset of the free header. Value -1 (FFFFFFFFh) means no free list. So the
pattern says that the upper byte of DirectoryStart is nil, which means
DirectoryStart is lower than 1000000h (=16777216 = 16 MiB), and that GID
samples have no free list. But when we look at hlp.trid.xml we see that these
are the patterns used for HLP samples. So the conclusion is that with the
current TrID definition there is in principle no difference between GID and
HLP samples. Obviously this is wrong! I know it because I implemented this
feature for the file command, where this recognition is done in another way.
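
The problem can be illustrated with a simplified pattern matcher in Python.
It is not how TrID scores definitions; it only tests the two surviving
constructs against headers with invented field values. Any HLP related header
with a small DirectoryStart and no free list matches, whether it is GID or
HLP.

```python
import struct

# (position, hex bytes) pairs of the two surviving constructs
PATTERNS = [(0, "3F5F0300"), (7, "00FFFFFFFF")]


def matches_all(data: bytes, patterns=PATTERNS) -> bool:
    """True if every pattern is found at its fixed position."""
    for pos, hex_bytes in patterns:
        needle = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(needle)] != needle:
            return False
    return True


def make_header(directory_start: int, first_free_block: int, size: int) -> bytes:
    """Build a 16 byte header with hypothetical values for testing."""
    return bytes.fromhex("3F5F0300") + struct.pack(
        "<IiI", directory_start, first_free_block, size
    )


print(matches_all(make_header(0x1234, -1, 0x2000)))     # no free list
print(matches_all(make_header(0x1234, 0x400, 0x2000)))  # with free list
```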

So i recommend not to use or improve gid_idx.trid.xml any more.

So I ran tridscan on my GID samples and created the replacement definition
gid.trid.xml. Now I get similar patterns. Again I get the same two XML
constructs. Then I also get some short nil byte sequences like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>15</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>19</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>23</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>28</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>35</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>793</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>967</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000000000</Bytes>
      <Pos>1081</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>1090</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>1094</Pos>
   </Pattern>
I assume that these are triggered by lucky circumstances. So i delete these.

But now in Global Strings section i get lines like:
   <String>KWBTREE</String>
   <String>FILES</String>
   <String>FLAGS</String>
   <String>KWMAP</String>
   <String>.CNT</String>
   <String>.HLP</String>
   <String>PETE</String>
Two are triggered by references to other help related file types with suffix
CNT and HLP. The two lines with KWBTREE and KWMAP are mentioned in
documentation about the HLP format. The line with the name PETE seems to be
characteristic for GID. That characteristic is used by the file command as an
additional test.
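
As a rough sketch, the extra test could look like this in Python. It assumes
that the string may appear in any letter case inside the file, because TrID
lists its global strings in upper case. How the file command really performs
this test is not shown here, and the function name is invented.

```python
WINHELP_MAGIC = bytes.fromhex("3F5F0300")


def looks_like_gid(data: bytes) -> bool:
    """HLP magic at the beginning plus the name PETE somewhere in the data."""
    if data[:4] != WINHELP_MAGIC:
        return False
    # bytes.upper() only changes ASCII letters, so binary data stays intact
    return b"PETE" in data.upper()
```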

With this new trid definition all my help GID samples are now described. The
recognition rate is higher and I get clear differences compared to HLP and
MVB samples.

TrID definition, some samples and output are stored in archive gid_.zip. I
hope that my definition can be used in future version of triddefs.

With best wishes
Jörg Jenderek