Topic: updated pagemaker-generic.trid.xml for Adobe PageMaker document (generic)

jenderek

Hello TrID users,

Some time ago I installed the Adobe PageMaker software.

The "newer" documents and templates have file name extensions like PM6, P65,
PMD, PT6, T65 and PMT.

When I run TrID on my examples, they are identified correctly, with a low
rate, as "Generic OLE2 / Multistream Compound" by docfile.trid.xml. Most
examples are also described as "Adobe PageMaker document (generic)" with MIME
type application/x-pagemaker by pagemaker-generic.trid.xml, but one example,
Charset.pmt, is not described as a PageMaker document. The version 6 examples
with file name extensions PM6 and PT6 should be described by
pagemaker-pm6.trid.xml as "Adobe PageMaker document (v6)". The version 6.5
examples with file name extensions P65 and T65 should be described by
pagemaker-pm65.trid.xml as "Adobe PageMaker document (v6.5)". The version 7
examples with file name extensions PMD and PMT should be described by
pmd-pm7.trid.xml as "Page Maker 7 Document" (see appended
output/trid-v-old.txt).
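
The expected mapping from file name extension to TrID definition described
above can be written down as a small lookup table. This is only an
illustrative sketch; the names EXPECTED and expected_definition are made up
here and are not part of TrID.

```python
# Expected TrID definition and description per file name extension,
# taken from the list in the post (hypothetical helper, not part of TrID).
EXPECTED = {
    "PM6": ("pagemaker-pm6.trid.xml", "Adobe PageMaker document (v6)"),
    "PT6": ("pagemaker-pm6.trid.xml", "Adobe PageMaker document (v6)"),
    "P65": ("pagemaker-pm65.trid.xml", "Adobe PageMaker document (v6.5)"),
    "T65": ("pagemaker-pm65.trid.xml", "Adobe PageMaker document (v6.5)"),
    "PMD": ("pmd-pm7.trid.xml", "Page Maker 7 Document"),
    "PMT": ("pmd-pm7.trid.xml", "Page Maker 7 Document"),
}


def expected_definition(filename):
    """Return (definition file, description) for a file name, or None."""
    if "." not in filename:
        return None
    ext = filename.rsplit(".", 1)[-1].upper()
    return EXPECTED.get(ext)
```

For example, expected_definition("Charset.pmt") returns the pmd-pm7.trid.xml
entry, which is the definition that misses this one file.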

For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). It identifies all newer PageMaker
examples as "Pagemaker Document (Generic)" with MIME type
application/vnd.pagemaker by PUID fmt/876. But it only accepts the three
extensions PMD, PMT and P65 as valid. The three extensions PM6, PT6 and T65
are shown as mismatched (see appended output/pagemaker-cdf.csv).

When running the file command (newest version > 5.41) with the -e cdf option
on such documents, all are correctly described generically as "OLE 2 Compound
Document" and also as "Adobe PageMaker document". The PM6 and PT6 examples
are described correctly as "version 6", but here the file command is not able
to distinguish a template (file name extension PT6) from a pure PageMaker
document (file name extension PM6). The four other examples (P65, T65, PMD,
PMT) are described as "version 6.50", which is wrong for the last two file
name extensions (see appended output/file.txt).

With the --extension option the file command shows the correct extensions
"pm6/pt6" for version 6 examples. For higher version examples it shows the
extensions "p65/t65/pmd/pmt", which are correct, but obviously it is not able
to distinguish a template from a pure document, or version 6.5 from 7 (see
appended output/file-extension.txt).

I found a page about PageMaker on the Archive Team file formats web site. It
mentions at least some information about the file format and its
version-dependent details.

There is no officially registered MIME type, but DROID uses
application/vnd.pagemaker. This is also used by the file command (see appended
output/file-i.txt). So I replaced the application/x-pagemaker used in the
definition with this one. This is now expressed by a line like:
   <Mime>application/vnd.pagemaker</Mime>

So I updated the definition pagemaker-generic.trid.xml by running tridscan on
the undetected example. Then I checked what had changed and what was kept. The
starting byte sequence is the generic characteristic and is also used inside
docfile.trid.xml for "Generic OLE2 / Multistream Compound". It is expressed
by an XML construct like:
   <Bytes>D0CF11E0A1B11AE100</Bytes>
   <Pos>0</Pos>
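
To make the pattern above concrete, here is a minimal Python sketch that
checks whether data starts with the same nine bytes at offset 0. The function
names are invented for this illustration; this is not how TrID itself is
implemented.

```python
# The byte pattern from the <Bytes> element, expected at <Pos> 0.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE100")


def starts_with_ole2_signature(data: bytes) -> bool:
    """True if the data begins with the pattern used in the definition."""
    return data.startswith(OLE2_SIGNATURE)


def file_has_ole2_signature(path: str) -> bool:
    """Read only as many bytes as the pattern needs and compare them."""
    with open(path, "rb") as fh:
        return starts_with_ole2_signature(fh.read(len(OLE2_SIGNATURE)))
```

Because this pattern is shared by every OLE2 compound file, it cannot
separate PageMaker from other OLE2 based formats on its own; the strings
discussed below do that job.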

Most examples contain the three word phrase "PageMaker 5.0 CMYK". That was
expressed inside the global strings section by a line like:
   <String>PAGEMAKER 5.0 CMYK</String>
In the undetected example I found the three word phrase "PageMaker 5.0 RGB"
instead. Obviously another color system (RGB instead of CMYK) is used here.
So the above line now becomes:
   <String>PAGEMAKER 5.0</String>
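
The effect of shortening the string can be shown with a small Python sketch.
It assumes that a global string matches anywhere in the file and that case is
ignored (the definition stores the phrase in upper case); the helper
contains_string is hypothetical and only imitates that behaviour.

```python
def contains_string(data: bytes, pattern: str) -> bool:
    """Case-insensitive search for an ASCII pattern anywhere in the data.

    Assumption: this imitates how a global string is matched; it is not
    TrID's actual code.
    """
    return pattern.upper().encode("ascii") in data.upper()


# Made-up byte snippets standing in for the two kinds of example files.
cmyk_example = b"\x00\x01PageMaker 5.0 CMYK\x00"
rgb_example = b"\x00\x01PageMaker 5.0 RGB\x00"

old_pattern = "PAGEMAKER 5.0 CMYK"
new_pattern = "PAGEMAKER 5.0"

old_hits = [contains_string(d, old_pattern) for d in (cmyk_example, rgb_example)]
new_hits = [contains_string(d, new_pattern) for d in (cmyk_example, rgb_example)]
```

With the old pattern only the CMYK snippet matches; with the shortened
pattern both match.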
      
According to the documentation, the compound document just contains the real
content inside a stream. When I inspect and extract this stream, for example
with Michal Mutl's Structured Storage Viewer, I get real PageMaker content in
the "old" format. Apparently this stream always has the name "PageMaker".
The name is stored inside the directory entries as a UTF-16 string. So this is
the special characteristic, and it is also found in the global strings section
of the TrID definition as a line like:
   <String>P'A'G'E'M'A'K'E'R</String>
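
The link between the UTF-16 stream name and the string above can be sketched
in Python. The sketch assumes little-endian UTF-16 and assumes that the
apostrophes in the definition stand for the zero bytes between the letters;
both helper names are invented for this illustration.

```python
def trid_style(name: str) -> str:
    """Render a name as UTF-16LE with each zero byte shown as an apostrophe."""
    raw = name.upper().encode("utf-16-le")
    text = raw.decode("latin-1").replace("\x00", "'")
    # The string in the definition ends on the last letter, so drop the
    # apostrophe that stands for the final zero byte.
    return text.rstrip("'")


def has_stream_name(data: bytes, name: str = "PageMaker") -> bool:
    """True if the UTF-16LE encoded stream name occurs in the data."""
    return name.encode("utf-16-le") in data


# Made-up bytes standing in for part of a directory entry.
fake_entry = b"\x00" * 4 + "PageMaker".encode("utf-16-le") + b"\x00\x00"
```

Here trid_style("PageMaker") gives P'A'G'E'M'A'K'E'R, the same text as in the
definition line.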

The original definition mentions no file name extensions, but the
documentation mentions six extensions. So I mention this in the remark line,
and it is now expressed by a line like:
   <Ext>PM6/P65/PMD/PT6/T65/PMT</Ext>

Furthermore, the description "Adobe PageMaker document (generic)" is
misleading, because it implies that the definition covers all PageMaker
variants, which is not true. The "older" versions 3, 4 and 5 are not described
by that definition. It describes only the "newer" versions 6, 6.5 and 7, where
the real content is wrapped as a stream inside a Multistream Compound file. If
you want to emphasize the version aspect, I would use a text like "Adobe
PageMaker document (v6 and above)". If you want to emphasize the wrapping
aspect, I would use a text like "Adobe PageMaker document (generic OLE2
based)". I chose the latter and use a changed line like:
   <FileType>Adobe PageMaker document (generic OLE2 based)</FileType>

Theoretically the definition pmd-pm7.trid.xml must also be updated, but I
found only one example, Charset.pmt, which it misses. I do not know where this
example comes from. It is dated 18 June 2001, which is before the release date
of PageMaker version 7, 9 July 2001. So maybe this example is from an older
version and was given a wrong file name extension by somebody. So I left this
definition untouched.

With the updated TrID definition all of my inspected newer PageMaker
examples are now described correctly as "Adobe PageMaker document" (see
appended output/trid-v-new.txt).

The TrID definitions, some examples and the output are stored in the archive
pm_ole2.zip.

With best wishes
Jörg Jenderek

Mark0

Thanks!