Author Topic: hlp-symantec.trid.xml for Symantec DOS software help  (Read 938 times)

jenderek

« on: November 03, 2023, 05:27:14 PM »
Hello TrID users,

some months ago I migrated to Windows 10. A few days ago I wanted to use the
help of an older Windows program, and I got an error message that the help
system it uses is not supported any more. The same error occurred on my
previous Windows 8.1 system. The solution offered by Microsoft is to download
the installation package from knowledge base article KB917607. For Windows 8.1
I could download an MSU package for my language and CPU architecture, which
can be started by double click. But for Windows 10 no download is offered. I
tried the version for Windows 8.1, but when starting the installation Windows
complains that the package is not suited for my version.

There exist some obscure tutorials for that issue. But before following them,
the first step for me is to find the Windows help files on my computer
systems. Windows help files use the file name suffix HLP. Unfortunately this
suffix is also used by other help systems, so the first step is to identify
all HLP formats on my systems. Unfortunately some HLP files on my systems are
not identified. So in this session I handle HLP samples which are produced by
Symantec. These are used by DOS programs like DISKEDIT.EXE, RESCUE.EXE and
UNERASE.EXE. The corresponding help files have the same base name as the
program and the file name suffix HLP.

So I ran the trid utility on such HLP examples. None of the samples is
recognized; all are described as "Unknown!" (see appended trid-v-old.txt in
output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the samples are also not
recognized.

For comparison I also ran the file command (version 5.45) on such samples.
Here the samples are also not recognized and are described as "data" (see
appended output/file-5.45.txt). The mime type is the generic
application/octet-stream (see appended file-i-5.45.txt in output). The file
name suffix is also not recognized (see appended file-ext-5.45.txt in output).

Unfortunately I found no file format specification. Searching the net turned
up no information about the format. The best I found is the Wikipedia page
about Symantec, so I use this as reference. That is expressed inside the new
definition by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Symantec</RefURL>

Instead of the generic application/octet-stream mime type I chose a user
defined one. That is expressed by a line like:
   <Mime>application/x-symantec-hlp</Mime>

Because of the missing information I first created the TrID definition
hlp-symantec.trid.xml by running tridscan on my few samples. Apparently all
samples start with a similar copyright message. That becomes visible when
running a command like:
   type *.HLP
Then we see 2 copyright variants at the beginning like:
Copyright 1991-1997 Symantec Corporation
Copyright 1991-1997 by Symantec Corporation

When you try to get similar output by running a command like
       head -1 *.HLP
you get slightly different output, because the type command stops at the
Control-Z character, which is treated as end of file (EOF) marker.

So the starting copyright message is terminated by Ctrl-Z. Afterwards come
some nil bytes. Apparently these are used for padding. After the nil bytes, at
offset 0x81, the help title or main program name is stored. So in my examples
I get strings here like:
   RESCUED
   UNERASE
   DISKEDIT
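
The header layout described above can be sketched in a few lines of Python.
This is only an illustration of my observations, not a format specification.
The function name read_header is made up, the sample bytes are synthetic, and
that the title ends at a nil byte is an assumption.

```python
CTRL_Z = 0x1A          # DOS end-of-file marker that stops the type command
TITLE_OFFSET = 0x81    # offset of the help title as observed in the samples


def read_header(data: bytes) -> tuple[str, str]:
    """Return (copyright text, help title) from the start of an HLP sample."""
    end = data.find(bytes([CTRL_Z]))
    if end < 0:
        raise ValueError("no Ctrl-Z terminator found")
    copyright_text = data[:end].decode("ascii", errors="replace").strip()
    # Assumption: the title is terminated by a nil byte.
    title_end = data.find(b"\x00", TITLE_OFFSET)
    if title_end < 0:
        title_end = len(data)
    title = data[TITLE_OFFSET:title_end].decode("ascii", errors="replace")
    return copyright_text, title


# Synthetic header built from the description above, not a real sample.
sample = b"Copyright 1991-1997 Symantec Corporation\x1a"
sample = sample.ljust(TITLE_OFFSET, b"\x00") + b"UNERASE\x00"
print(read_header(sample))
```

On a real file you would pass the first few hundred bytes read in binary mode.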

So these information items are expressed inside the Front Block section by
XML constructs like:
   <Pattern>
      <Bytes>436F7079726967687420313939312D3139393720</Bytes>
      <ASCII> C o p y r i g h t   1 9 9 1 - 1 9 9 7</ASCII>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>79</Bytes>
      <ASCII> y</ASCII>
      <Pos>21</Pos>
   </Pattern>
   <Pattern>
      <Bytes>6F72</Bytes>
      <ASCII> o r</ASCII>
      <Pos>33</Pos>
   </Pattern>
   <Pattern>
      <Bytes>000000000000000000000000000000000000000000000000000000000
      <Pos>44</Pos>
   </Pattern>
And in the Global Strings section we find lines like:
      <String>SYMANTEC CORPORATION</String>
      <String>COPYRIGHT 1991-1997</String>
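
The Front Block patterns above are byte sequences at fixed offsets. The
following Python sketch checks the first three of them against both copyright
variants. It only illustrates the idea and is not how TrID itself is
implemented; the names FRONT_BLOCK and matches_front_block are made up.

```python
# (offset, hex bytes) pairs copied from the Front Block constructs above
FRONT_BLOCK = [
    (0, "436F7079726967687420313939312D3139393720"),
    (21, "79"),
    (33, "6F72"),
]


def matches_front_block(data: bytes, patterns=FRONT_BLOCK) -> bool:
    """True if every pattern occurs at its fixed offset in data."""
    for pos, hex_bytes in patterns:
        needle = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(needle)] != needle:
            return False
    return True


variants = (
    b"Copyright 1991-1997 Symantec Corporation",
    b"Copyright 1991-1997 by Symantec Corporation",
)
for text in variants:
    # Offsets 21 and 33 hold "y" and "or" in both variants.
    print(text[21:22], text[33:35], matches_front_block(text))
```

This shows why the short patterns at offsets 21 and 33 appear: the inserted
"by " shifts the text by 3 bytes, and these letters happen to line up anyway.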

You might argue that the year in the copyright message differs in other
samples. In my opinion this is unlikely, because development of DOS has
stopped. Newer Symantec programs are Windows executables, and these of course
use the Windows help format, which is different and is described by
hlp.trid.xml. To reduce the patterns and keep only the characteristic part, I
deleted the short second and third XML constructs and mention my observation
in a remark line.

I also got some short nil patterns like:
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>151</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>169</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>195</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>199</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>202</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>214</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>220</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000</Bytes>
      <Pos>223</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>226</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>246</Pos>
   </Pattern>
I assume that these are coincidences caused by too few samples. So I deleted
these patterns.
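
Why do few samples produce such coincidental patterns? The Python sketch below
collects every offset where all samples hold the same byte. This is only a
simplified illustration and not the actual tridscan algorithm; the function
name common_bytes and the 8 byte records are made up.

```python
def common_bytes(samples: list[bytes]) -> dict[int, int]:
    """Map offset -> byte value for every offset where all samples agree."""
    length = min(len(s) for s in samples)
    first = samples[0]
    return {
        pos: first[pos]
        for pos in range(length)
        if all(s[pos] == first[pos] for s in samples)
    }


# Made-up 8 byte records, not real HLP data.
few = [b"AB\x00\x00CD\x00E", b"XY\x00\x00ZW\x00Q"]
more = few + [b"MN\x00\x01OP\x02R"]
print(sorted(common_bytes(few)))   # offsets shared by 2 samples
print(sorted(common_bytes(more)))  # fewer offsets survive a third sample
```

Each additional sample can only remove offsets from the result, so short
patterns found with very few samples deserve suspicion.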

I also got some short space patterns like:
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>674</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1092</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1228</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1296</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1298</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1369</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1545</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1772</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1840</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>1843</Pos>
   </Pattern>
When I looked at the mentioned offsets I saw ASCII text fragments there. With
so few samples, by coincidence a space character occurs at these offsets in
all inspected samples. So I deleted such patterns.

Then I also got some other short patterns like:

   <Pattern>
      <Bytes>632E</Bytes>
      <ASCII> c</ASCII>
      <Pos>140</Pos>
   </Pattern>
   <Pattern>
      <Bytes>03001E00</Bytes>
      <Pos>157</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0100</Bytes>
      <Pos>165</Pos>
   </Pattern>
   <Pattern>
      <Bytes>001800</Bytes>
      <Pos>188</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0048</Bytes>
      <ASCII> . H</ASCII>
      <Pos>172</Pos>
   </Pattern>
   <Pattern>
      <Bytes>5E</Bytes>
      <ASCII> ^</ASCII>
      <Pos>1318</Pos>
   </Pattern>
I assume that these are coincidences caused by too few samples. So I deleted
these patterns.

When you run this help system it uses a hyperlink system as you know it from
HTML pages. So you find the phrase hyperlink, in lower case or capitalised,
inside the HLP file. The first appearance occurs at a fixed offset. So these
facts are expressed inside the Front Block section by an XML construct like:
   <Bytes>797065726C696E6B00</Bytes>
   <ASCII> y p e r l i n k</ASCII>
   <Pos>204</Pos>
and inside the Global Strings section by lines like:
   <String>E HYPERLINK</String>
   <String>HHYPERLINKS</String>
These seem to be characteristic, so I keep these patterns.

Then I got inside the Global Strings section short non-word patterns,
expressed by lines like:
      <String>CH A</String>
      <String>D IN</String>
      <String>F1 F</String>
      <String>LICK</String>
I assume that these are coincidences. So I deleted such lines.

Then I got inside the Global Strings section short word-like patterns,
expressed by lines like:
      <String>EACH</String>
      <String>ENTE</String>
      <String>MARK</String>
      <String>SUCH</String>
      <String>WARE</String>
      <String>BEEN</String>
      <String>BENT</String>
      <String>ALLE</String>
      <String>SYSTEM</String>
      <String>DIALOG</String>
      <String>INSTALL</String>

When you look into the HLP files these are apparently coincidences caused by
too few samples (for example WARE is triggered by hardware and software).

The reference to DOS programs was expressed by 2 lines like:
   <String>DOS DE</String>
   <String>. DOS</String>
The second was triggered by sentences starting with DOS (like "Record. DOS
depends", "wurde. DOS überschreibt"). The first was triggered by phrases like
"DOS depends", "DOS den ersten" and "DOS den Rest". The mentioned HLP samples
each describe a DOS program, so I assume that the word DOS appears in all
such HLP samples, but maybe with other sentence structure and in other
languages. So I keep these lines for the moment.

The reference to programs was expressed by a line like:
      <String>PROGRAM</String>
This was triggered by phrases like "program as an added", "DOS FDISK program"
and "a program that". The mentioned HLP samples each describe a DOS program,
so I assume that the word program appears in all such HLP samples. With other,
more exotic languages this may not be true, but I do not know. So I keep this
line.

Then there was a line like:
   <String>UNERASE</String>
It is clear that such a phrase occurs in UNERASE.HLP, but why in the other HLP
samples? There other programs are also mentioned, by phrases like "Unformat
and UnErase which" or "UnFormat and UnErase use". I assume that there may
exist Symantec tools without a reference to the UNERASE tool. So I deleted
this line.

Then I found a longer word line like:
      <String>OPERATION</String>
This was triggered by phrases like "Suchoperation", "Operationen mit einer
Zwischenablage" and "the operational status". I assume that with more samples,
especially in other languages, this does not occur any more. So I deleted this
line.
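
The lower case trigger phrases suggest that Global Strings are matched
anywhere in the file without regard to case. The Python sketch below imitates
that with the phrases quoted above. It is only an assumption about the
matching behaviour, and the function name has_global_string and the latin-1
encoding are chosen for illustration.

```python
def has_global_string(data: bytes, string: str) -> bool:
    """Case-insensitive search for a global string anywhere in data."""
    # bytes.upper() only changes ASCII letters, which is enough here.
    return string.upper().encode("latin-1") in data.upper()


phrases = [
    b"Suchoperation",
    b"Operationen mit einer Zwischenablage",
    b"the operational status",
]
for phrase in phrases:
    print(phrase, has_global_string(phrase, "OPERATION"))
```

All three phrases contain the string even though none of them is the plain
word "operation", which is why such a line can vanish with more samples.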

If the help system is based on a hyperlink concept then it makes sense that
an index part also exists inside the HLP file. So inside the HLP samples there
are phrases like "Rescue Disk Index" or "Click [Index]". That is expressed
inside the definition by a line like:
   <String>INDEX</String>
But I do not know whether this is named differently in other, more exotic HLP
samples. So I keep it at the moment.

Then I found a longer word line like:
   <String>E INFORMATION</String>
This was triggered by phrases like "Rescue Information", "accurate
information", "the information" and "same information". I assume that there
exist HLP samples in other, more exotic languages or without such a phrase. So
I deleted this line.

As written on Wikipedia, Symantec acquired Peter Norton Computing, a developer
of various utilities for DOS. So this relation is expressed inside the
definition by lines like:
   <String>N NORTON</String>
   <String>S NORTON</String>
   <String>NORTON UTILITIES</String>
These are triggered by phrases like "von Norton Protection" and "as Norton
Disk Doctor". Symantec has now become Gen Digital Inc. Development of DOS has
stopped there, so no adapted marketing names will occur. The name Norton will
therefore still occur in such old DOS help files, but maybe in another
context. So I changed the first 2 to one line like:
   <String>NORTON</String>
I keep items that may be relevant, because there exist other HLP samples
described as "Peter Norton Computing Help" by hlp-pnc.trid.xml.

With this new TrID definition all my Symantec HLP samples are now described.
The TrID definition and output are stored in the archive hlp_symantec.zip. I
hope that my definition can be used in a future version of triddefs.

There are more HLP samples which are not recognized. I will try to handle
these items in a future session.

With best wishes
Jörg Jenderek


Mark0

« Reply #1 on: November 06, 2023, 10:41:20 PM »
Thanks for the new defs!
I have refined it by also scanning other .HLP files from previous versions of Symantec's / Norton utilities.