Topic: hlp-fm.trid.xml for FrameMaker help

jenderek

« on: December 13, 2023, 03:13:37 PM »
Hello TrID users,

A few days ago I handled files created or used by Adobe FrameMaker. This time
I will deal with FrameMaker samples that have the file name suffix HLP.

First I ran the TrID utility on such examples. Some samples are recognized by
fm.trid.xml and described as "FrameMaker document" with the file name suffix FM
(see appended trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It also recognizes the samples
and describes them as "Adobe FrameMaker Document" with MIME type
application/vnd.framemaker. Samples that the file command describes with
version 4 get the additional version "4.0" via PUID fmt/535, and samples
described with version 3 get version "3.0" via PUID fmt/534. DROID considers
the suffix HLP "bad" (see EXTENSION_MISMATCH true in droid-hlp-fm.csv).
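
To make the relation between header version and DROID PUID concrete, here is a
minimal Python sketch. It is not DROID's signature logic; the helper name
droid_puid is hypothetical, only the two PUIDs reported above are mapped, and it
assumes the version string follows "<MakerFile " directly, as the file output
("4.0 K") and the tridscan patterns suggest.

```python
import re
from typing import Optional

# Only the two PUIDs that DROID reported for the HLP samples.
PUID_BY_VERSION = {"3.0": "fmt/534", "4.0": "fmt/535"}

# Assumption: the version string follows "<MakerFile " directly.
HEADER_RE = re.compile(rb"<MakerFile (\d+\.\d+)")

def droid_puid(data: bytes) -> Optional[str]:
    """Return the DROID PUID for a FrameMaker header, or None if not mapped."""
    m = HEADER_RE.match(data)  # match() only tries offset 0
    if m is None:
        return None
    return PUID_BY_VERSION.get(m.group(1).decode("ascii"))

print(droid_puid(b"<MakerFile 4.0K>"))  # fmt/535
print(droid_puid(b"<MakerFile 3.0F>"))  # fmt/534
```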

I also ran the file command (version 5.45) on these samples. It recognizes
them as well and describes them as "FrameMaker document". It also displays
some version detail (like "4.0 K" or "3.0 F"). The MIME type here is
application/x-mif (see appended file-i-5.45.txt in output). No file name
suffix is listed here (see appended file-ext-5.45.txt in output).

On Linux, according to the shared MIME-info database, such samples are called
"Adobe FrameMaker document", with application/vnd.framemaker as MIME type. The
samples are recognized only by looking for the 10-byte sequence <MakerFile at
the beginning. The listed suffix is FM. That information can be seen in the
source freedesktop.org.xml.in, found for example on gitlab.freedesktop.org.
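
The magic rule described above can be imitated in a few lines of Python. This
is only a sketch of the idea, not the code of shared-mime-info, and the
function names are hypothetical. It shows why an HLP file gets the same MIME
type as an FM file: only the leading bytes are compared, the suffix is ignored.

```python
from typing import Optional

# 10-byte magic of the shared MIME-info rule, checked at offset 0.
MAKERFILE_MAGIC = b"<MakerFile"

def guess_mime(head: bytes) -> Optional[str]:
    """Return the MIME type for data starting with the FrameMaker magic."""
    if head.startswith(MAKERFILE_MAGIC):
        return "application/vnd.framemaker"
    return None

def guess_mime_for_file(path: str) -> Optional[str]:
    # The file name (and thus the HLP or FM suffix) plays no role here.
    with open(path, "rb") as f:
        return guess_mime(f.read(len(MAKERFILE_MAGIC)))

print(len(MAKERFILE_MAGIC))             # 10
print(guess_mime(b"<MakerFile 4.0K>"))  # application/vnd.framemaker
```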

Apparently FrameMaker's internal help uses the same format as ordinary
FrameMaker FM documents. Such samples are typically found inside the HELP
directory of the FrameMaker program directory. Unfortunately I found no
documentation about this HLP format or about how it differs from FM
documents.

Computer companies are like banks: too big to fail or to be regulated, and
with too many lobbyists trying to avoid rules and regulations. I am also angry
with the political elite, like the European Union. So we have hundreds of
computer companies storing files on my personal computer without any public
rules. But as a single person I must have my car inspected every 2 years, and I
am forbidden to buy old light bulbs and vacuum cleaners with thousands of
watts. So apparently some are more equal than others, without any reason. In
the end, that will lead to the end of our current public society.

I do not use the FrameMaker web page on the Adobe web site as reference. There
you find only marketing buzzwords without any relevant facts. Instead I now
use the FrameMaker page on the Archive Team file formats wiki. That is
expressed by a line like:
   <RefURL>http://fileformats.archiveteam.org/wiki/FrameMaker</RefURL>

For HLP samples I also chose the MIME type mentioned above. That is expressed
by a line like:
   <Mime>application/vnd.framemaker</Mime>

After running tridscan I looked at the generated hlp-fm.trid.xml. The first
pattern is characteristic of FrameMaker documents and is also found inside
fm.trid.xml. It looks like:
   <Bytes>3C4D616B657246696C6520</Bytes>
   <ASCII> . M a k e r F i l e</ASCII>
   <Pos>0</Pos>
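
Decoding the hex <Bytes> shows that this pattern is the 10-byte magic
"<MakerFile" plus a trailing space, 11 bytes in total. The small Python helper
below (a hypothetical name, not part of TrID) checks one <Bytes>/<Pos> pair
against file data; it is a simplification of what TrID does with a whole
definition.

```python
def pattern_matches(data: bytes, hex_bytes: str, pos: int) -> bool:
    """True if the hex <Bytes> of a TrID-style pattern occur at offset <Pos>."""
    needle = bytes.fromhex(hex_bytes)
    return data[pos:pos + len(needle)] == needle

magic = bytes.fromhex("3C4D616B657246696C6520")
print(magic, len(magic))  # b'<MakerFile ' 11
print(pattern_matches(b"<MakerFile 3.0F>", "3C4D616B657246696C6520", 0))  # True
```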

The second XML construct looks like:
   <Bytes>2E30</Bytes>
   <ASCII> . 0</ASCII>
   <Pos>12</Pos>
This is triggered because my HLP samples are all version 3.0 or 4.0. The
reference page also mentions document versions like 5.5, and I myself found
documents with version 10.0. So I assume that help samples with such versions
exist as well, and I remove this construct.
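
A quick check in Python shows why this construct cannot be kept. The sketch
builds hypothetical header prefixes from version strings, assuming the version
follows "<MakerFile " directly (offset 11); the headers of real 5.5 or 10.0
help files were not examined.

```python
# Assumption: the version string starts right after "<MakerFile " (offset 11).
PREFIX = b"<MakerFile "

def dot0_at_offset_12(version: str) -> bool:
    """Would the tridscan pattern ".0" at <Pos> 12 match this version?"""
    header = PREFIX + version.encode("ascii")
    return header[12:14] == b".0"

for v in ("3.0", "4.0", "5.5", "10.0"):
    print(v, dot0_at_offset_12(v))
# prints one line each: 3.0 True, 4.0 True, 5.5 False, 10.0 False
```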

Then I got many short nil byte sequences like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>37</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>48</Pos>
   </Pattern>
   ..
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>324</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>365</Pos>
   </Pattern>
These are probably triggered by chance (too few samples, whose values are too
small to fill the high bytes of 32-bit fields). So I delete such patterns.
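
The effect can be illustrated with a toy version of "bytes common to all
samples at the same offset". This is a simplified sketch, not tridscan's real
algorithm, and the byte order and values are invented: two small numbers
stored as 32-bit fields share their zero high bytes, so a scan over few samples
reports nil bytes that are not really fixed.

```python
from typing import Dict, List

def common_bytes(samples: List[bytes], limit: int) -> Dict[int, int]:
    """Map offset -> byte value for offsets where every sample agrees.

    Simplified illustration only; tridscan's real algorithm is not shown here.
    """
    length = min([limit] + [len(s) for s in samples])
    result = {}
    for off in range(length):
        first = samples[0][off]
        if all(s[off] == first for s in samples[1:]):
            result[off] = first
    return result

# Two hypothetical 32-bit big-endian fields holding small values.
a = (300).to_bytes(4, "big")    # 00 00 01 2C
b = (1200).to_bytes(4, "big")   # 00 00 04 B0
print(common_bytes([a, b], 4))  # {0: 0, 1: 0}
```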

Then I got 2 patterns matching the lowercase letter o. They look like:
   <Pattern>
      <Bytes>6F</Bytes>
      <ASCII> o</ASCII>
      <Pos>243</Pos>
   </Pattern>
   <Pattern>
      <Bytes>6F</Bytes>
      <ASCII> o</ASCII>
      <Pos>258</Pos>
   </Pattern>
For version 4 samples these were expressed by an XML construct like:
   <Bytes>0008466F6F746E6F7465000D5461626C65466F6F746E6F7465</Bytes>
   <ASCII> . . F o o t n o t e . . T a b l e F o o t n o t e</ASCII>
   <Pos>240</Pos>

For version 3 the two phrases Footnote and TableFootnote are found at slightly
lower offsets (238 and 248) and without the leading Pascal string length byte.
Maybe they also vanish or move in help files of other versions. So I delete
these patterns and mention these facts in a remark line. The phrase is still
described inside the Global Strings section by a line like:
   <String>TABLEFOOTNOTE</String>
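
Splitting the version 4 construct into counted strings shows where the two
single "o" patterns at offsets 243 and 258 come from. In this Python sketch the
helper name is hypothetical, and reading the nil byte plus length byte as one
16-bit big-endian length is an assumption; for these bytes it gives the same
result as a single length byte after a nil byte.

```python
import struct
from typing import List, Tuple

def read_counted_strings(blob: bytes) -> List[Tuple[int, str]]:
    """Split blob into (offset, text) items, each preceded by a 2-byte length.

    Assumption: the 00 before each length byte is the high byte of a
    16-bit big-endian length.
    """
    out = []
    i = 0
    while i + 2 <= len(blob):
        (n,) = struct.unpack_from(">H", blob, i)
        i += 2
        out.append((i, blob[i:i + n].decode("ascii")))
        i += n
    return out

blob = bytes.fromhex("0008466F6F746E6F7465000D5461626C65466F6F746E6F7465")
base = 240  # <Pos> of this construct in the version 4 samples
for rel, text in read_counted_strings(blob):
    print(base + rel, text, "'o' at", base + rel + text.index("o"))
# 242 Footnote 'o' at 243
# 252 TableFootnote 'o' at 258
```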

In the front block section I also got 2 longer nil sequences, expressed by
XML constructs like:
   <Pattern>
      <Bytes>0000000000000000</Bytes>
      <Pos>118</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000000000000000</Bytes>
      <Pos>278</Pos>
   </Pattern>
I keep these constructs for the moment.

Then I got a longer non-nil byte sequence after the starting magic, expressed
by an XML construct like:
   <Pattern>
      <Bytes>3E0D0A1A000008090A0D1A41617FFF</Bytes>
      <ASCII> . . . . . . . . . . . A a</ASCII>
      <Pos>15</Pos>
   </Pattern>
I do not know what this means, so I keep it for the moment.
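
Naming each byte at least makes the sequence readable. The Python sketch below
(hypothetical helper, names chosen for illustration) prints file offset and
name for each byte. One possible reading, and it is only a guess, is a check
for damage by text-mode transfers, similar in spirit to the PNG signature,
whose CR LF, Ctrl-Z and LF bytes reveal newline conversion.

```python
from typing import Iterator, Tuple

# Names for the control characters occurring in the pattern.
NAMES = {0x00: "NUL", 0x08: "BS", 0x09: "TAB", 0x0A: "LF", 0x0D: "CR",
         0x1A: "SUB (Ctrl-Z)", 0x7F: "DEL"}

def describe(hex_bytes: str, start: int) -> Iterator[Tuple[int, str]]:
    """Yield (file offset, readable name) for each byte of a TrID pattern."""
    for off, b in enumerate(bytes.fromhex(hex_bytes), start):
        if b in NAMES:
            name = NAMES[b]
        elif 0x20 <= b < 0x7F:
            name = repr(chr(b))
        else:
            name = "0x%02X" % b
        yield off, name

for off, name in describe("3E0D0A1A000008090A0D1A41617FFF", 15):
    print(off, name)
```

It lists '>' at offset 15 (probably closing the "<MakerFile ...>" header),
then CR, LF, Ctrl-Z, two NULs, BS, TAB, LF, CR, Ctrl-Z, 'A', 'a', DEL and 0xFF.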

Then in Global Strings I got short garbage phrases like:
   <String>$AMPM</String>
   <String>COMM</String>
   <String>MENT</String>
   <String>TION</String>
I assume that these are triggered by chance, so I delete such lines.

Now I looked at what seems to be characteristic of help samples. Obviously
this is matched by a line like:
   <String>.HLP</String>
When I look with the help of a patched file command (see file.tmp in output)
at the first occurrence of this 4-byte sequence, we see that it is followed by
a colon character. That is followed by a "link" name (like firstpage, Help
Overview, lastpage or Menu). Before the "help" file name is one space
character, and before that is a "keyword" (like openlink or, in most cases,
gotolink). That is expressed by a line like:
   <String>GOTOLINK</String>
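
The structure found with the patched file command can be written as a regular
expression. This Python sketch uses an invented sample (the file names
Overview.HLP and Menu.HLP are hypothetical) and assumes that the link name is
terminated by a nil byte, which is not verified.

```python
import re
from typing import List, Tuple

# keyword, one space, help file name ending in .HLP, colon, link name.
# Assumption: the link name ends at a NUL byte (terminator not verified).
LINK_RE = re.compile(rb"([A-Za-z]+) ([^\s\x00:]+\.HLP):([^\x00]+)", re.IGNORECASE)

def hypertext_links(data: bytes) -> List[Tuple[str, str, str]]:
    """Return (keyword, help file, link name) for every match in data."""
    return [(m.group(1).decode("latin-1").lower(),
             m.group(2).decode("latin-1"),
             m.group(3).decode("latin-1"))
            for m in LINK_RE.finditer(data)]

sample = b"gotolink Overview.HLP:firstpage\x00openlink Menu.HLP:Help Overview\x00"
print(hypertext_links(sample))
```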

Then I got lines which may not be needed, but I do not know for sure. So
maybe other users can improve (shrink or remove) such lines.

Some seem to be triggered by font information, like:
      <String>HELVETICA</String>
      <String>VERY THIN</String>
      <String>REGULAR</String>
      <String>DOUBLE</String>
      <String>MEDIUM</String>
      <String>TIMES</String>
      <String>BOLD</String>

Some seem to be triggered by navigation operations, like:
      <String>PREVIOUSLINK</String>
      <String>PREVIOUSPAGE</String>
      <String>FIRSTPAGE</String>
      <String>$MARKER1</String>
      <String>$MARKER2</String>
      <String>NEXTPAGE</String>
      <String>ON PAGE</String>
      <String>INDEX</String>
      <String>RIGHT</String>
      <String>LEFT</String>

Some seem to be triggered by meta information, like:
      <String>$FULLFILENAME</String>
      <String>$TBLSHEETNUM</String>
      <String>$CURPAGENUM</String>
      <String>$MONTHNAME</String>
      <String>$SHORTYEAR</String>
      <String>$FILENAME</String>
      <String>$MINUTE00</String>
      <String>$MONTHNUM</String>
      <String>$DAYNUM</String>
      <String>$AMPM</String>
      <String>$HOUR</String>
      <String>$YEAR</String>
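
Users who want to shrink these lists could start by checking which strings are
missing in some genuine HLP sample, because such a string cannot be required.
A minimal Python sketch with a hypothetical helper name and invented sample
content; it assumes a case-insensitive comparison, as the upper-case Global
Strings suggest, and ignores TrID's scoring.

```python
from typing import List

def missing_strings(data: bytes, strings: List[str]) -> List[str]:
    """Return the entries of strings that do not occur in data.

    Assumption: comparison is case-insensitive.
    """
    haystack = data.upper()  # bytes.upper() only changes ASCII letters
    return [s for s in strings if s.encode("latin-1") not in haystack]

fonts = ["HELVETICA", "TIMES", "BOLD", "REGULAR"]
sample = b"...Helvetica...Regular...Bold..."  # hypothetical file content
print(missing_strings(sample, fonts))  # ['TIMES']
```

Running it over every HLP sample gives a first hint which lines could go;
running it over plain FM documents shows which strings do not help to
distinguish help files at all.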


The TrID definition, some samples and the output are stored in the archive
hlp_fm.zip. I hope that my definition can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek


Mark0

« Reply #1 on: December 14, 2023, 06:23:59 PM »
Many thanks!