Topic: adv-wp.trid.xml for WordPerfect dictionary advise

jenderek

  • Sr. Member
  • Posts: 375
adv-wp.trid.xml for WordPerfect dictionary advise
« on: October 03, 2021, 08:03:13 PM »
Hello TrID users,

Some time ago I handled some WordPerfect documents. Now I am looking at
companion samples with the file name extension ADV. The file names often look
like WTnnLL.adv, where nn is a version number such as 13 or 21 and LL is a
language abbreviation such as de, fr, nl, us, etc.
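The naming convention can be checked with a small Python sketch. The regular expression is only inferred from the names described above (it is not an official Corel rule), and the helper name `split_adv_name` and the sample name WT21de.adv are hypothetical illustrations.

```python
import re

# Hypothetical pattern inferred from observed names of the form WTnnLL.adv:
# "WT", a two-digit version (nn), a two-letter language code (LL), ".adv".
# IGNORECASE because names like WT13US.ADV may also occur on Windows.
ADV_NAME = re.compile(r"^WT(?P<version>\d{2})(?P<lang>[a-z]{2})\.adv$", re.IGNORECASE)


def split_adv_name(name: str):
    """Return (version, language) for names like WTnnLL.adv, else None."""
    m = ADV_NAME.match(name)
    if m is None:
        return None
    return int(m.group("version")), m.group("lang").lower()


if __name__ == "__main__":
    for name in ("WT21de.adv", "WT13US.ADV", "readme.txt"):
        print(name, split_adv_name(name))
```

A file name is of course only a hint; TrID itself identifies files by content, not by name.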

These samples are normally found in the WritingTools subdirectory of a
WordPerfect installation directory such as "WordPerfect Office 2021" or
"CorelDRAW Graphics Suite 2019".

When I run TrID on such samples, they are identified correctly but only
unspecifically as "WordPerfect (generic)" by the definition wp-generic.trid.xml
(see the attached output/trid-v-old.txt).

For comparison, I also run the file utility (version 5.40). It describes the
samples in a similar way as "Corel/Wordperfect", but more specifically with
"product 34, file type 11" and "v6.0" (see the attached
output/file-5.40.txt).

When I run a file command patched according to the format documentation, it
shows that the ADV samples contain text phrases that look like advice, for
example which character (point or comma, depending on the language) to use
for decimal numbers (see the attached output/file.tmp). So these files are
obviously part of the Corel WordPerfect writing tools (dictionary, thesaurus,
etc.). This is expressed inside the definition by a line like:
   <FileType>WordPerfect dictionary advise</FileType>

I found no relevant information on official websites. However, I did find a
document about the WordPerfect file format, so it is used as a reference by a
line like:
 <RefURL>
 https://github.com/OneWingedShark/WordPerfect/blob/master/doc/SDK_Help/FileFormats/WPFF_DocumentStructure.htm
 </RefURL>

The first four bytes of a WP file are \xffWPC.
At offset 4, a long pointer to the document area is stored as a 4-byte
integer. In the samples the value was hexadecimal 10 (16 decimal). That means
this area comes directly after the 16-byte file header.
At offset 8, the product number is stored as a 1-byte integer. For ADV this
is 34 (hexadecimal 22).
At offset 9, the file type is stored as a 1-byte integer. For ADV this is 11
(hexadecimal 0B).
At offset 10, the major version is stored as a 1-byte integer. In the samples
this was always 6.
At offset 11, the minor version is stored as a 1-byte integer. In the samples
this was always 0. These observations are expressed by an XML construct like:
   <Bytes>FF57504310000000220B0600</Bytes>
   <ASCII> . W P C . . . . "</ASCII>
   <Pos>0</Pos>
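The header layout above can be decoded with a minimal Python sketch. It reads only the fields listed (offsets 0 to 11) and assumes little-endian byte order, which matches the pointer bytes 10 00 00 00 giving 16. The function names `parse_wp_prefix` and `looks_like_adv` are hypothetical and not part of TrID or the file utility.

```python
import struct

WP_MAGIC = b"\xffWPC"
# The 12 bytes from the proposed definition: magic, pointer 0x10,
# product 0x22 (34), file type 0x0B (11), version 6.0.
ADV_SIGNATURE = bytes.fromhex("FF57504310000000220B0600")


def parse_wp_prefix(data: bytes) -> dict:
    """Decode the first 12 bytes of a WordPerfect file header."""
    if len(data) < 12 or not data.startswith(WP_MAGIC):
        raise ValueError("no \\xffWPC signature")
    # Offset 4: 4-byte pointer (little-endian); offsets 8..11: four 1-byte fields.
    doc_ptr, product, file_type, major, minor = struct.unpack_from("<IBBBB", data, 4)
    return {"doc_ptr": doc_ptr, "product": product,
            "file_type": file_type, "version": (major, minor)}


def looks_like_adv(data: bytes) -> bool:
    """True if the first 12 bytes equal the observed ADV signature."""
    return data[:12] == ADV_SIGNATURE


if __name__ == "__main__":
    header = ADV_SIGNATURE + bytes(4)  # pad to a 16-byte header for the demo
    print(parse_wp_prefix(header))
    # {'doc_ptr': 16, 'product': 34, 'file_type': 11, 'version': (6, 0)}
```

Comparing the whole 12-byte block, as the definition does, is stricter than checking only product and file type; a sample with another pointer value or version would not match.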

Then there are short byte sequences at 6-byte steps like:
   <Bytes>0000</Bytes>
   <Pos>18</Pos>
   ...
   <Bytes>0000</Bytes>
   <Pos>24</Pos>
   ...
   <Bytes>0000</Bytes>
   <Pos>2046</Pos>
I assume that these are something like short pointers, so other (higher)
values may also be possible. So I delete these patterns.
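Why such patterns are fragile can be illustrated with a Python sketch: it lists the offsets 18, 24, ..., 2046 where a 2-byte word is 00 00 and keeps only offsets that are zero in every sample, which roughly mimics keeping bytes common to all samples. The helper names are hypothetical, and treating these words as pointers remains an assumption.

```python
def zero_word_offsets(data: bytes, start: int = 18, step: int = 6,
                      end: int = 2046) -> list:
    """Offsets start, start+step, ..., end where the 2-byte word is 00 00."""
    return [pos for pos in range(start, end + 1, step)
            if data[pos:pos + 2] == b"\x00\x00"]


def common_zero_offsets(samples) -> list:
    """Offsets whose word is 00 00 in every sample (empty list if no samples)."""
    sets = [set(zero_word_offsets(s)) for s in samples]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    a = bytes(2048)          # synthetic sample: all words zero
    b = bytearray(2048)
    b[24] = 0x05             # synthetic sample: one non-zero "pointer"
    print(len(zero_word_offsets(a)))                      # 339 candidate offsets
    print(24 in common_zero_offsets([a, bytes(b)]))       # False
```

With only a few samples, many of the 339 offsets may happen to be zero everywhere; one more sample with a non-zero value at such an offset would break a fixed 0000 pattern.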

Because ADV files contain text passages, I get many short fragments common to
all languages in the global strings section. That is expressed by lines like:
      <String>E PERSO</String>
      <String>E FORM</String>
      <String>E VERB</String>
      <String>N FORM</String>
      <String>N VERB</String>
      <String>E GEN</String>
      <String>E MEN</String>
      <String>E _VERB</String>
      <String>_ EN</String>
But when looking at such text fragments inside the ADV samples, I believe
that these lines are generally not characteristic and only appear by lucky
circumstances (too few samples). So I delete all string patterns.

With the new definition, all my ADV samples are now described in detail (see
the attached trid-v.txt). The TrID definition and output are stored in the
archive adv.zip. I hope that the XML file can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek


Mark0

  • Administrator
  • Hero Member
  • Posts: 2743
Re: adv-wp.trid.xml for WordPerfect dictionary advise
« Reply #1 on: October 05, 2021, 03:24:19 PM »
Thanks!