Hello TrID users,
Some days ago I handled WordPerfect files with the file name extension CBT.
These are found in the subdirectory WritingTools inside the WordPerfect
program directory "c:\Program Files (x86)\Corel\WordPerfect Office 2021". So
I looked in the Programs subdirectory for other WordPerfect samples. There I
found samples with the QRS file name extension and names like wpen.qrs and
wpDE.qrs.
The names start with the two letters WP, which is apparently the
abbreviation for WordPerfect. These are followed by two letters that
correspond to the language used: DE for German and EN for English.
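A minimal Python sketch of this naming scheme. It assumes, based only on the two samples, that every QRS name is "wp" plus a two-letter language code; the helper name qrs_language is hypothetical.

```python
import re
from typing import Optional

# Hypothetical helper. Assumes QRS names are "wp" + a two-letter language
# code + ".qrs", as in the samples wpen.qrs and wpDE.qrs (case varies).
QRS_NAME = re.compile(r"^wp([a-z]{2})\.qrs$", re.IGNORECASE)

def qrs_language(name: str) -> Optional[str]:
    """Return the language code (upper case) of a QRS file name, or None."""
    match = QRS_NAME.match(name)
    return match.group(1).upper() if match else None

print(qrs_language("wpen.qrs"))  # EN
print(qrs_language("wpDE.qrs"))  # DE
```

Names that do not follow the pattern simply yield None, so other files in the Programs directory are ignored.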
After looking at how others describe the QRS extension, I chose a
description text different from the one used by the file command. That is
expressed by a line like:
<FileType>WordPerfect Equation Editor Resource Support</FileType>
I found no file format specification specifically for such WordPerfect
files, but luckily some basic information is found in an unofficial
WordPerfect File Format description. So I chose that page as reference.
That is expressed by a line like:
<RefURL>
https://github.com/OneWingedShark/WordPerfect/blob/master/doc/SDK_Help/FileFormats/WPFF_DocumentStructure.htm
</RefURL>
When I run TrID on such examples, they are described correctly but
unspecifically as "WordPerfect (generic)" by the definition
wp-generic.trid.xml (see the appended output/trid-v-old.txt).
For comparison I also ran the file utility (version 5.42). It describes the
examples as "WordPerfect" with the subclassification "equation resource
data" and "v1.0" (that is 31 hexadecimal; see the appended
output/file-5.42.txt). I also ran a file command patched according to the
unofficial WordPerfect File Format description (see the appended
output/file.txt).
So I ran tridscan on my examples to generate qrs-wp.trid.xml. The result
shows that not only the first 4 bytes (\xFFWPC, which is generic for all
WordPerfect samples) are the same, but that all of the first 6083 bytes are
the same. That is expressed by an XML construct like:
<Bytes>FF575043C4170000011E01000000000090010800C417000092040000561C0000DC05
<Pos>0</Pos>
Unfortunately, the definition is based on only 2 samples, and these are
nearly identical. There is only one difference: at offset 17C4 the byte has
the hexadecimal value 44 (="D") in the English example and 79 (="y") in the
German one.
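Such a single difference can be located with a short byte-by-byte comparison. The Python sketch below uses synthetic stand-ins rather than the real samples (identical bytes up to 0x17C4, then "D" versus "y"); the function name byte_differences is hypothetical, and real data would be read from the files instead.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

def byte_differences(a: bytes, b: bytes) -> List[Tuple[int, Optional[int], Optional[int]]]:
    """Return (offset, byte_in_a, byte_in_b) for every offset where a and b differ.
    If one input is shorter, its missing bytes are reported as None."""
    return [
        (offset, x, y)
        for offset, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]

# Synthetic stand-ins for the two samples: identical bytes up to 0x17C4,
# then "D" (English) versus "y" (German). Real data would come from e.g.
# pathlib.Path("wpen.qrs").read_bytes().
english = bytes(0x17C4) + b"D"
german = bytes(0x17C4) + b"y"
for offset, x, y in byte_differences(english, german):
    print(f"{offset:X}: {x:02X} vs {y:02X}")  # 17C4: 44 vs 79
```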
At offset 4 the pointer to the document area is stored as a 4-byte
little-endian integer. In my examples this value was always 17C4
hexadecimal. At offsets 8 and 9 the product type and file type are stored.
In my examples these values were always 1 and 30 (that is 1 and 1E in
hexadecimal). At offset 10 the major version and minor version fields are
stored as byte values. In my examples the version was always 1.0. At offset
12 the encryption field is stored as a 2-byte little-endian integer. In my
examples this value was always 0. At offset 14 the pointer to the index area
is stored as a 2-byte little-endian integer. In my examples this value was
always 0.
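The header fields listed above can be decoded with Python's struct module. This is a sketch that follows the offsets and widths given in this post (including the 2-byte index pointer); the class and function names are hypothetical, and the input is the first 16 bytes of the tridscan pattern at offset 0.

```python
import struct
from typing import NamedTuple

class WPHeader(NamedTuple):
    signature: bytes    # offset 0: b"\xffWPC"
    document_ptr: int   # offset 4: 4-byte little-endian pointer to document area
    product_type: int   # offset 8
    file_type: int      # offset 9
    major_version: int  # offset 10
    minor_version: int  # offset 11
    encryption: int     # offset 12: 2-byte little-endian
    index_ptr: int      # offset 14: 2-byte little-endian (width as described above)

# "<" = little-endian, no padding: 4s + I + 4 x B + H + H = 16 bytes
HEADER = struct.Struct("<4sI4BHH")

def parse_header(data: bytes) -> WPHeader:
    return WPHeader(*HEADER.unpack_from(data, 0))

# First 16 bytes of the tridscan pattern at offset 0
hdr = parse_header(bytes.fromhex("FF575043C4170000011E010000000000"))
print(hex(hdr.document_ptr), hdr.product_type, hdr.file_type,
      f"{hdr.major_version}.{hdr.minor_version}", hdr.encryption, hdr.index_ptr)
# 0x17c4 1 30 1.0 0 0
```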
At offset 16 the extended file header starts with 4 reserved bytes. In my
examples this sequence was always 90010800.
At offset 20 the file size (not including pad characters at EOF) should be
stored as a 4-byte little-endian integer. In my examples this value was 17C4
hexadecimal, which is the same as the pointer to the document area. But the
real file size is 17040 (4290 hexadecimal). So in the QRS samples the stored
"file size" is not correct.
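The size check can be sketched in Python by reading the stored size at offset 20, comparing it with the real length, and comparing it with the document area pointer at offset 4. The input below is synthetic: the first 24 header bytes of the samples, zero-filled to the real sample length of 17040 bytes. The names check_size_field and SizeCheck are hypothetical.

```python
import struct
from typing import NamedTuple

class SizeCheck(NamedTuple):
    stored_size: int            # 4-byte little-endian value at offset 20
    actual_size: int            # real length of the data
    equals_document_ptr: bool   # stored size == pointer at offset 4?

def check_size_field(data: bytes) -> SizeCheck:
    (document_ptr,) = struct.unpack_from("<I", data, 4)
    (stored_size,) = struct.unpack_from("<I", data, 20)
    return SizeCheck(stored_size, len(data), stored_size == document_ptr)

# Synthetic stand-in: the first 24 header bytes of the samples, zero-filled
# to the real sample length of 17040 bytes.
header = bytes.fromhex("FF575043C4170000011E01000000000090010800C4170000")
print(check_size_field(header + bytes(17040 - len(header))))
# SizeCheck(stored_size=6084, actual_size=17040, equals_document_ptr=True)
```

Because pad characters at EOF are not counted, a stored value slightly below the real length would be expected; 6084 versus 17040 is far more than that.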
Assuming that there exist examples with other document area pointers, other
"file sizes" and other extended file headers, the definition becomes like:
<Bytes>FF575043</Bytes>
<Pos>0</Pos>
<Bytes>011E01000000000090010800</Bytes>
<Pos>8</Pos>
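The effect of these two fixed-position patterns can be emulated with plain slice comparisons. This is a simplified Python sketch, not TrID's actual matching or scoring; the names are hypothetical, and the pointer value 0x2000 is invented to show that a different document area pointer still matches.

```python
# Fixed-position patterns of the generalized definition: (offset, bytes)
QRS_PATTERNS = [
    (0, bytes.fromhex("FF575043")),
    (8, bytes.fromhex("011E01000000000090010800")),
]

def matches_qrs_patterns(data: bytes) -> bool:
    """True if every pattern occurs at its offset (no TrID-style scoring)."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in QRS_PATTERNS)

header = bytes.fromhex("FF575043C4170000011E01000000000090010800C4170000")
other_ptr = header[:4] + (0x2000).to_bytes(4, "little") + header[8:]
print(matches_qrs_patterns(header), matches_qrs_patterns(other_ptr))  # True True
```

Bytes 4 to 7 (pointer) and 20 onward (stored size and the rest of the extended header) are not checked, which is the point of splitting the pattern.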
So at offset 0x17C4 the document area begins. It obviously often contains
ASCII strings ("Fraction: x OVER y", "Superscript:" and so on) and maybe some
additional information. This is expressed inside the global strings section
by lines like:
<String>X OVER Y'''''''''S'SUPERSCRIPT</String>
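A rough Python sketch of how such strings could be listed: read the document area pointer at offset 4, then collect runs of printable ASCII from there, similar to the Unix strings tool. The input is synthetic (the header zero-filled up to 0x17C4, followed by an invented document area), and the function name is hypothetical.

```python
import re
import struct
from typing import List

def document_area_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Printable-ASCII runs of at least min_len bytes, searched from the
    document area pointer stored at offset 4 (similar to Unix `strings`)."""
    (document_ptr,) = struct.unpack_from("<I", data, 4)
    printable = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in printable.finditer(data, document_ptr)]

# Synthetic stand-in: header zero-filled up to 0x17C4, then an invented
# document area with control bytes between two strings.
header = bytes.fromhex("FF575043C4170000011E01000000000090010800C4170000")
data = header + bytes(0x17C4 - len(header)) + b"\x07Fraction: x OVER y\x01\x02Superscript:\x00"
print(document_area_strings(data))  # ['Fraction: x OVER y', 'Superscript:']
```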
But I do not know whether these mathematical expressions are always present.
So I keep them at the moment, although they are not needed for recognition,
because that should be done by the bytes at offsets 8 and 9. That is
probably not always true, but I do not know. So I keep these patterns.
With the new definition, all my WordPerfect QRS examples are now described
more precisely (see the appended output/trid-v-new.txt). The TrID definition
and the output are stored in the archive qrs.zip. I hope that the XML file
can be used in a future version of triddefs.
With best wishes
Jörg Jenderek