Topic: updated eps-adobe.trid.xml for *.ept + 2 variants for TIFF/WMF preview

jenderek

Hello TrID users,

Some weeks ago I had to handle some binary Encapsulated PostScript
files with previews.

When running TrID on such files, all examples are recognized as "Adobe
Encapsulated Postscript" by eps-adobe.trid.xml (see appended
output/trid-old.txt).

For comparison I also ran other file type identification tools.
The newest version of the file command, 5.38 (see
https://en.wikipedia.org/wiki/File_(command)), correctly describes the
inspected examples as "DOS EPS Binary File" (see appended
output/file-5.38.txt) and reports "image/x-eps" as the MIME type (see
appended output/file-ik-5.38.txt).

The DROID tool (found at http://droid.sourceforge.net/) also recognizes
such examples as "Encapsulated PostScript File Format" with PUIDs fmt/122
and fmt/124, and uses "application/postscript" as the MIME type (see
appended output/droid-ept.csv).

The identify command-line tool of the ImageMagick graphics software (found
at https://imagemagick.org/) also recognizes such examples as "EPT
(Encapsulated PostScript with TIFF preview)" (see appended
output/identify-verbose.txt).

TrID only mentions the file name extension "eps", but according to
ImageMagick "ept" is also used for Encapsulated PostScript with a TIFF
preview, as in Bitmap_VS_SVG.ept, generated from a Wikipedia SVG example.
So I ran tridscan to update the definition file. Now both file name
extensions are expressed by the line

   <Ext>EPS/EPT</Ext>

For the MIME type I chose the value reported by the file command. This is
now expressed by the additional line

   <Mime>image/x-eps</Mime>

Information about this file format can be found on the Archive Team file
formats wiki. This is now expressed by the XML line

   <RefURL>http://fileformats.archiveteam.org/wiki/Encapsulated_PostScript</RefURL>

According to that site and other descriptions, the phrase "Adobe
Encapsulated PostScript" is not well suited. A phrase like "Encapsulated
PostScript Binary" or "Encapsulated PostScript with TIFF or WMF preview"
would be better. For the moment I only mention this in the remark line.

I also do not like the "DOS" phrase used by the file command. In the early
days of computing, classic Mac OS allowed a preview image to be stored in
the resource fork, but DOS has no such concept. So a binary format was
"invented" to combine plain PostScript text with a binary TIFF or WMF
preview image in one file. Nowadays hardly anybody uses DOS, but that
binary format can still be read and written by software like CorelDRAW and
ImageMagick.
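To make the layout of that binary wrapper concrete, here is a minimal
Python sketch of a header parser. It assumes the commonly documented
30-byte header: the magic bytes C5 D0 D3 C6, then six little-endian
32-bit values giving offset and length of the PostScript, WMF and TIFF
sections, then a 16-bit checksum. The names parse_header and
EpsBinaryHeader are hypothetical and not part of TrID or any tool
mentioned in this thread.

```python
import struct
from typing import NamedTuple

# Assumed magic number of a binary ("DOS") EPS file: bytes C5 D0 D3 C6
EPS_BINARY_MAGIC = b"\xc5\xd0\xd3\xc6"


class EpsBinaryHeader(NamedTuple):
    ps_offset: int    # where the plain PostScript text starts
    ps_length: int
    wmf_offset: int   # 0 if no WMF preview is embedded
    wmf_length: int
    tiff_offset: int  # 0 if no TIFF preview is embedded
    tiff_length: int
    checksum: int


def parse_header(data: bytes) -> EpsBinaryHeader:
    """Parse the assumed 30-byte little-endian header of a binary EPS file."""
    if len(data) < 30 or data[:4] != EPS_BINARY_MAGIC:
        raise ValueError("not a binary EPS file")
    # 4-byte magic, six unsigned 32-bit offsets/lengths, one 16-bit checksum
    fields = struct.unpack("<4s6IH", data[:30])
    return EpsBinaryHeader(*fields[1:])
```

Reading the first 30 bytes of a sample and passing them to parse_header
should show which preview (if any) the file carries, provided the sample
follows this layout.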

That also means that the definition eps-dos.trid.xml, with the text
"Encapsulated PostScript (with DOS style preview)", in principle describes
the same format and should be removed. However, there seem to be samples
with the file name extension PS. If this is true, that extension must be
added to the TrID definition.

Looking at the output of the file command ("Postscript starts at byte")
and at the Encapsulated PostScript file format summary in the Encyclopedia
of Graphics File Formats, it is apparent that the plain PostScript part is
embedded inside the binary. When this plain text is extracted, the
resulting file is described by eps.trid.xml as "Encapsulated PostScript".
So, to be consistent, all strings like "PS-ADOBE-" and "%%CREATOR" listed
in the global strings section of eps.trid.xml should also appear in
eps-adobe.trid.xml.
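A rough sketch of that extraction step in Python, assuming the standard
30-byte header in which the PostScript offset and length are stored as
little-endian 32-bit values at bytes 4 and 8. extract_postscript is a
hypothetical helper, not an existing tool:

```python
import struct

# Assumed magic number of a binary ("DOS") EPS file
EPS_BINARY_MAGIC = b"\xc5\xd0\xd3\xc6"


def extract_postscript(data: bytes) -> bytes:
    """Return the embedded plain PostScript section of a binary EPS file."""
    if len(data) < 30 or data[:4] != EPS_BINARY_MAGIC:
        raise ValueError("not a binary EPS file")
    # Offset and length of the PostScript section live at bytes 4..11
    ps_offset, ps_length = struct.unpack_from("<II", data, 4)
    if ps_offset + ps_length > len(data):
        raise ValueError("PostScript section exceeds file size")
    return data[ps_offset:ps_offset + ps_length]
```

Feeding it the bytes of a file like Bitmap_VS_SVG.ept and saving the
result should give a plain text file starting with "%!PS-Adobe-", which
eps.trid.xml can then identify, as long as the header values are correct.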

There may be one exception, though. Normally the plain PostScript part
starts right after the header, at offset 30 or 32, and is followed by the
preview image, but in example.eps the preview image comes first and the
plain text part follows. So characteristic PostScript strings like
"%%CREATOR" may lie beyond the search range limit of the TrID program.
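The ordering problem can be checked with a small Python sketch. It
assumes the standard header layout (PostScript, WMF and TIFF offset/length
pairs as little-endian 32-bit values starting at byte 4). The window
parameter is only a stand-in for a scanner's search limit; the actual
limit used by TrID is not stated in this thread, and both function names
are hypothetical.

```python
import struct

# Assumed magic number of a binary ("DOS") EPS file
EPS_BINARY_MAGIC = b"\xc5\xd0\xd3\xc6"


def preview_precedes_postscript(data: bytes) -> bool:
    """True if a TIFF or WMF preview is stored before the PostScript section."""
    if len(data) < 30 or data[:4] != EPS_BINARY_MAGIC:
        raise ValueError("not a binary EPS file")
    ps_offset, _ps_len, wmf_offset, wmf_len, tiff_offset, tiff_len = (
        struct.unpack_from("<6I", data, 4)
    )
    # Only previews that are actually present (non-zero length) count
    previews = [(wmf_offset, wmf_len), (tiff_offset, tiff_len)]
    return any(length > 0 and offset < ps_offset for offset, length in previews)


def string_within_window(data: bytes, needle: bytes, window: int) -> bool:
    """Case-insensitive check whether needle occurs in the first `window` bytes."""
    return needle.upper() in data[:window].upper()
```

For a file laid out like example.eps, preview_precedes_postscript should
return True, and a string like "%%CREATOR" may then fall outside any
fixed-size window near the start of the file.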

According to the documentation, it is also possible to distinguish between
the variant with a TIFF preview image and the variant with a WMF preview
image. So I ran tridscan on samples that the file command recognizes as
having a TIFF preview to generate eps-tiff.trid.xml. Because these binaries
contain a TIFF image, no WMF image is embedded. This means the offset and
length values of the WMF image are zero (8 zero bytes starting at offset
12). This fact is expressed by the additional XML construct:

   <Bytes>0000000000000000</Bytes>
   <Pos>12</Pos>

I did the same for the WMF variant, described by eps-wmf.trid.xml. Here
the offset and length values of the TIFF image are zero, which is
expressed by the additional XML construct:

   <Bytes>0000000000000000</Bytes>
   <Pos>20</Pos>
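The two zero-byte constructs can be mirrored in a few lines of Python, for
example to sort samples before running tridscan. This is only a sketch
assuming the standard header layout (WMF offset/length at bytes 12 to 19,
TIFF offset/length at bytes 20 to 27); preview_variant is a hypothetical
name.

```python
# Assumed magic number of a binary ("DOS") EPS file
EPS_BINARY_MAGIC = b"\xc5\xd0\xd3\xc6"
# Offset and length both zero, like <Bytes>0000000000000000</Bytes>
ZERO_PAIR = bytes(8)


def preview_variant(data: bytes) -> str:
    """Classify a binary EPS header as 'tiff', 'wmf', 'none' or 'both'."""
    if len(data) < 30 or data[:4] != EPS_BINARY_MAGIC:
        raise ValueError("not a binary EPS file")
    wmf_absent = data[12:20] == ZERO_PAIR   # matches <Pos>12</Pos>
    tiff_absent = data[20:28] == ZERO_PAIR  # matches <Pos>20</Pos>
    if wmf_absent and not tiff_absent:
        return "tiff"
    if tiff_absent and not wmf_absent:
        return "wmf"
    if wmf_absent and tiff_absent:
        return "none"
    return "both"
```

A file classified as "tiff" here would correspond to eps-tiff.trid.xml and
one classified as "wmf" to eps-wmf.trid.xml.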

It is not certain, but for the WMF variant only one file name extension
seems to be used. I expressed this by the line:

   <Ext>EPS</Ext>

With the updated TrID definition and the two variants, the examples are
now described more precisely (see appended output/trid-new-v.txt). The
TrID definitions, some examples and the output are stored in the archive
eps_ept.zip. I hope my XML files can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

« Reply #1 on: January 18, 2020, 01:10:12 AM »
Thanks for the defs and the info.
I scanned some other files, and removed all the strings (the header patterns should be enough).
I have added an EPSI def too.
« Last Edit: January 18, 2020, 01:22:42 AM by Mark0 »