Topic: updated afm.trid.xml for Outline Font Metric *.afm + MacBinary packed variant

jenderek

Hello trid users,

A few days ago I installed an older Adobe program that came with some
fonts, so I took a look at other font related files. The examples I
inspected have the file name extension AFM.

When I run TrID on thousands of examples, most are identified as "Outline
Font Metric" by afm.trid.xml. Unfortunately a few dozen are reported as
"Unknown!" (see appended output/trid-v-old.txt).

For comparison I also ran the file utility (version 5.41). It identifies
the examples as "ASCII font metrics" by the starting phrase StartFont, or
as "ASCII font bits" when the --keep-going option is used (see appended
output/file-k-5.41.txt).

I also ran a patched file command that displays more information (see
appended output/file.tmp).

With that information I found a page about Adobe Type 1, which also covers
AFM, on the Archive Team file formats wiki. This is now expressed by a line
like:
 <RefURL>http://fileformats.archiveteam.org/wiki/Adobe_Type_1</RefURL>

So I updated afm.trid.xml by running tridscan with a few dozen new
examples, and then checked what had changed. The starting phrase followed by
one space character still remains. That is expressed by an XML construct like:
   <Bytes>5374617274466F6E744D65747269637320</Bytes>
   <ASCII> S t a r t F o n t M e t r i c s</ASCII>
   <Pos>0</Pos>
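
As a rough illustration of what this construct tests, here is a minimal
Python sketch. The names AFM_MAGIC and looks_like_afm are made up for this
example and are not part of TrID; the sketch only mirrors the byte pattern
at position 0 and ignores the global strings.

```python
# Byte pattern from the <Bytes> element: "StartFontMetrics" plus one space.
AFM_MAGIC = bytes.fromhex("5374617274466F6E744D65747269637320")


def looks_like_afm(data: bytes) -> bool:
    """Return True if the data starts with the AFM starting phrase (Pos 0)."""
    return data.startswith(AFM_MAGIC)
```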

After that a version string is apparently stored. In thousands of examples
I found the 3 byte string 2.0, and in hundreds the string 3.0. I also found
a few examples with the strings 1.0 and 4.1. All these examples have version
strings of the form X.Y, so a point character always appears at the same
position inside the version part. That was expressed by an XML construct like:
   <Bytes>2E</Bytes>
   <Pos>18</Pos>

Unfortunately I found 2 examples (cmbxti10.afm and cmti10.afm, in the afm
sub directory of the R statistical software sources) with the 5 byte version
string [2.0], so here the version string is surrounded by brackets. After
some thinking I believe that such version strings are not accidents and are
also valid, so the above XML construct vanishes. The comment lines say that
these examples were created by the script gf2afm written by Pierre A.
MacKay. Maybe this script is a little bit sloppy, but the result still
seems to be valid. About the version, the Adobe Font Metrics File Format
Specification says that fractional increases in the version number indicate
minor, upwards-compatible revisions to the format, and whole-number
increases indicate major, potentially incompatible, changes. But nowhere
does it say that the version string must look exactly like X.Y.
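
To show why the fixed point character at position 18 no longer fits, here is
a small Python sketch. Both function names are invented for this example, and
the regular expression is only my lenient guess at what a tolerant reader
could accept (an optional pair of brackets around the number); it is not
taken from the specification.

```python
import re

# Lenient pattern: optional brackets around a numeric version string.
_VERSION_RE = re.compile(r"StartFontMetrics\s+\[?(\d+(?:\.\d+)?)\]?\s*$")


def afm_version(first_line: str):
    """Return the version text from the first AFM line, or None."""
    match = _VERSION_RE.match(first_line)
    return match.group(1) if match else None


def dot_at_offset_18(first_line: bytes) -> bool:
    """The fixed-position test used by the old <Pos>18</Pos> construct."""
    return first_line[18:19] == b"."
```

With the bracketed form the point character moves to offset 19, so the old
fixed-position test fails although the version itself is still readable.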

Then I found about a dozen examples (like: pcrb-o.afm pcrbo-o.afm
pcrr-o.afm pcrro-o.afm phvb-o.afm phvbo-o.afm phvr-o.afm phvro-o.afm
ptmb-o.afm ptmbi-o.afm ptmr-o.afm ptmri-o.afm ADLIB.AFM ALGERIA.AFM
ARCHTURA.AFM BEEBOPP.AFM BUSORAMA.AFM CARASTRO.AFM FRANKFRT.AFM GLYPIC.AFM
GRAFSHAD.AFM GRAPHIK.AFM KABEL.AFM LITHOS.AFM) without the keyword
IsFixedPitch. According to the specification this boolean key is optional.
So in the global strings section the following line vanishes:
   <String>ISFIXEDPITCH</String>

After running tridscan heavily, the following line also vanished from the
global strings section:
   <String>FONTBBOX</String>

This was triggered by one example, pcfont.afm, with the font name PCFont.
This file is included in the GNU package a2ps (at least in the sources of
version 4.14). A comment line inside the file says that the file is not
correct. In its StartCharMetrics section no character has a bounding box,
and according to the documentation a missing "B llx lly urx ury" means that
all bounding box dimensions are zero. That makes no sense. Maybe this
example was created for testing some aspects. According to the
specification the keyword FontBBox with the corner coordinates is also
required, and it is missing in that example too. So I undid the changes
triggered by that example.

After running tridscan heavily, the trailing s character of EndFontMetrics
vanished from the following line in the global strings section, so it
looked like:
   <String>ENDFONTMETRIC</String>
This was triggered by the examples AFUTBLK_.AFM and SYDNEY.AFM. These files
are found on the web site cd.textfiles.com below the maxfonts directory.
Even worse, this line finally vanished completely. That was triggered by the
example UPSILON.AFM, where the last phrase is just the 3 byte phrase End.
After the StartCharMetrics line each character’s metrics consist of a list
of keys and values separated by semicolons, on one line. A character metric
data line might look like this:
   C 102 ; WX 333 ; N f ; B 20 0 383 682 ; L i fi ; L l fl ;
This part is then terminated by a line with the phrase EndCharMetrics.
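
The key/value layout of such a line can be sketched in Python as follows.
The function name parse_char_metric is made up for this example; values are
kept as strings, and repeated keys such as L are collected in a list. This is
only a sketch of the layout described above, not a complete AFM parser.

```python
def parse_char_metric(line: str) -> dict:
    """Split one character metric line into {key: [list of value lists]}."""
    fields = {}
    for part in line.split(";"):
        tokens = part.split()
        if not tokens:
            continue  # skip the empty piece after the final semicolon
        key, values = tokens[0], tokens[1:]
        # A key like L (ligature) may occur several times on one line.
        fields.setdefault(key, []).append(values)
    return fields
```

For the example line above this gives one entry each for C, WX, N and B, and
two entries for L.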

According to the documentation this section is optional, but it is found in
all my examples. That is expressed by two lines inside the global strings
section like:
   <String>STARTCHARMETRICS</String>
   <String>ENDCHARMETRICS</String>

That means that the metric information is also complete in these three
examples. I am no font expert, so I do not know how AFM parsing software
behaves. Maybe some software ignores the missing EndFontMetrics line or just
gives a warning, and it probably does no harm if the metric information is
complete as in the three examples mentioned above. But according to the
specification the StartFontMetrics keyword is required and must be the first
line in the file, and the keyword EndFontMetrics is also required and must
be the last non-empty line in the AFM file. So I undid the changes caused by
these three examples.
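
The rule about the first and last line can be sketched in Python like this.
The function name has_required_frame is made up; the sketch looks at the
first non-empty line, which is slightly more lenient than the specification,
and it does not handle a trailing Control-Z character.

```python
def has_required_frame(text: str) -> bool:
    """Check for StartFontMetrics at the start and EndFontMetrics at the end."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return (
        bool(lines)
        and lines[0].startswith("StartFontMetrics")
        and lines[-1] == "EndFontMetrics"
    )
```

A file whose last phrase is just End, like the UPSILON.AFM case described
above, would fail this check.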

Because AFM samples are just plain text files, nearly all of them are
described by the file command with the mime type text/plain. But about a
dozen examples like AFUTBLK_.AFM have Control-Z (0x1A) as the last character
instead of the usual linefeed character Control-J (0x0A). Such examples are
described by the file command with the generic application/octet-stream (see
appended output/file-i-5.41.txt). But according to shared-mime-info, found
for example on the web site reposcope.com, AFM files get their own
user-defined mime type. That is now expressed by a line like:
   <Mime>application/x-font-afm</Mime>
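
Spotting the trailing Control-Z is simple. A small Python sketch, with
function names invented for this example:

```python
CTRL_Z = b"\x1a"  # old DOS end-of-file marker


def ends_with_ctrl_z(data: bytes) -> bool:
    """Return True if the last byte of the file is Control-Z (0x1A)."""
    return data.endswith(CTRL_Z)


def strip_dos_eof(data: bytes) -> bytes:
    """Drop a single trailing Control-Z so the rest can be read as text."""
    return data[:-1] if data.endswith(CTRL_Z) else data
```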

When running tridscan I also found about a dozen AFM examples where the
actual font metric part is packed inside a MacBinary file with the AFM file
name extension. Because AFM metric files are just plain text, such examples
are classified as "Macintosh plain text (MacBinary)" by
macbin-gen-txt.trid.xml (see appended macbin/output/trid-v-old.txt) and by
the file command as "MacBinary" with type ASCII and creator Fontographer
(see appended macbin/output/file-5.41.txt).

So I generated macbin-afm.trid.xml by running tridscan. It starts with the
characteristics of Macintosh plain text (MacBinary) described by
macbin-gen-txt.trid.xml.

The first byte stores the old version number, which must be kept at zero
for compatibility. That is expressed by the first XML construct like:
   <Bytes>00</Bytes>
   <Pos>0</Pos>

The next byte stores the length of the embedded metric file name (must be
in the range 1-63). The name itself, like PixieFont.AFM, follows. Because of
the file name extension AFM this is expressed inside the global strings
section by a line like:
   <String>.AFM</String>

At offset 65 the file type (normally expressed as four characters) is
stored. At offset 69 the file creator (normally expressed as four
characters) is stored.

The second XML construct looks like:
   <Bytes>00000000000000000000000   00000000005445585461436132</Bytes>
   <ASCII> . . . . . . . . . . .     . . . . . T E X T a C a 2</ASCII>
   <Pos>21</Pos>
Because the maximal file name length (63) was not used in my inspected
examples, the remaining file name bytes are filled with nulls. Because AFM
files are considered pure ASCII text, the 4 byte file type TEXT is used. All
my MacBinary AFM metrics were obviously packed by Fontographer software
(4 byte creator id aCa2). Assuming that longer file names also occur and
that such packed files are always created by Fontographer software, this XML
construct becomes:
   <Bytes>5445585461436132</Bytes>
   <ASCII> T E X T a C a 2</ASCII>
   <Pos>65</Pos>
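
The header fields mentioned so far can be read with a few lines of Python.
This is only a sketch using the offsets given above; the function name is
made up, and decoding the name as Mac Roman is my assumption about the
character set.

```python
def macbinary_name_type_creator(header: bytes):
    """Return (file name, file type, creator) from a MacBinary header."""
    name_len = header[1]  # length byte at offset 1
    if not 1 <= name_len <= 63:
        raise ValueError("file name length must be in the range 1-63")
    # The name starts at offset 2; Mac Roman is an assumed encoding.
    name = header[2:2 + name_len].decode("mac_roman")
    file_type = header[65:69]  # e.g. b"TEXT"
    creator = header[69:73]    # e.g. b"aCa2" for Fontographer
    return name, file_type, creator
```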

According to the documentation the byte at offset 74 is zero fill for
compatibility. That is expressed by the third XML construct like:
   <Bytes>00</Bytes>
   <Pos>74</Pos>

The fourth XML construct looks like:
   <Bytes>000000000000</Bytes>
   <Pos>79</Pos>
According to the documentation the window or folder ID of the file is
stored at offset 79, the "Protected" flag byte at offset 81, and at offset
82 a zero fill byte that must be zero for compatibility. Assuming that other
folder IDs and other protected flags can occur, this construct now becomes:
   <Bytes>00</Bytes>
   <Pos>82</Pos>

XML construct number 5 looks like:
   <Bytes>00000000</Bytes>
   <Pos>87</Pos>
According to the documentation the resource fork length is stored at offset
87 as a 4 byte integer. So none of my inspected examples has a resource
fork. Assuming that examples with a resource fork exist, this construct must
be deleted.
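
Reading that field is a one-liner in Python. A minimal sketch, assuming the
usual big-endian byte order of MacBinary headers; the function name is
invented for this example.

```python
import struct


def resource_fork_length(header: bytes) -> int:
    """Return the 4 byte resource fork length stored at offset 87."""
    (length,) = struct.unpack_from(">I", header, 87)  # big-endian unsigned
    return length
```

A result of 0 corresponds to the 00000000 pattern seen in all my examples.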

XML construct number 6 looks like:
 <Bytes>0000  005374617274466F6E744D65747269637320322E300D436F6D6D656E74204765
 <ASCII> . .   . S t a r t F o n t M e t r i c s   2 . 0 . C o m m e n t   G e
 <Pos>99</Pos>

At offset 99 variables like the length of the Get Info comment are stored.
Often these variables are zero. At offset 128 the data fork begins. Here we
find the actual AFM metric content. In all my inspected examples the first
line looks like:
   StartFontMetrics 2.0
So all examples are version 2.0. The second line of the AFM looks like:
   Comment Generated by Fontographer 10/11/90
   Comment Generated by Fontographer 3.2 5/3/91
So the phrase "Comment Generated by Fontographer" appears in all examples
at the same position, and by lucky coincidence one slash character of the
date appears in all examples at the same position. That is expressed by XML
construct number 8, which looks like:
   <Bytes>2F</Bytes>
   <ASCII> /</ASCII>
   <Pos>188</Pos>

Assuming that non-zero values, AFM versions other than 2.0, and a different
second AFM line may occur, XML construct number 8 vanishes and construct
number seven now becomes:
   <Bytes>5374617274466F6E744D65747269637320</Bytes>
   <ASCII> S t a r t F o n t M e t r i c s  </ASCII>
   <Pos>128</Pos>
In the global strings section these observations are expressed by patterns
like:
   <String>STARTFONTMETRICS 2.0</String>
   <String>COMMENT GENERATED BY FONTOGRAPHER</String>
So this now becomes:
   <String>STARTFONTMETRICS</String>
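
Put together, the byte patterns kept for the MacBinary variant (offsets 0,
65, 74, 82 and 128) amount to roughly the following check. This Python
sketch is only my illustration of those patterns; the function name is made
up, it ignores the global strings, and it is not how TrID itself works
internally.

```python
AFM_MAGIC = b"StartFontMetrics "  # starting phrase plus one space


def looks_like_macbinary_afm(data: bytes) -> bool:
    """Apply the fixed-position patterns of the MacBinary AFM definition."""
    if len(data) < 128 + len(AFM_MAGIC):
        return False
    return (
        data[0] == 0                      # old version number
        and data[65:73] == b"TEXTaCa2"    # file type and creator
        and data[74] == 0                 # zero fill
        and data[82] == 0                 # zero fill
        and data[128:128 + len(AFM_MAGIC)] == AFM_MAGIC  # data fork start
    )
```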

In the global strings section there were short patterns like:
   <String>C 48</String>
   <String>C 49</String>
   ...
   <String>C 90</String>
These define the individual character metrics of the digits 0, 1 and so on
up to the upper case letter Z. But metrics may exist where such characters
are missing (then the integer is -1) or where the value is given in
hexadecimal form (CH <hex>) instead of decimal. So I deleted such patterns.
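
The two ways of writing the character code can be sketched in Python like
this. The function name char_code is made up, and I assume that the
hexadecimal form is written with angle brackets as in "CH <30>".

```python
def char_code(field: str) -> int:
    """Return the character code from a 'C n' or 'CH <hex>' field."""
    key, value = field.split()
    if key == "C":
        return int(value)  # decimal, -1 for a missing character
    if key == "CH":
        return int(value.strip("<>"), 16)  # hexadecimal form
    raise ValueError("not a character code field: " + field)
```

So "C 48" and "CH <30>" describe the same code, but only the first one
would match a fixed string pattern like C 48.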

In the global strings section there were short patterns like:
   <String>N ZERO</String>
   <String>N ONE</String>
   ...
   <String>N NINE</String>
The optional "N name" gives the PostScript language name of the character.
For metrics of fonts without digits these optional parts do not exist. So I
deleted such patterns.

In the global strings section there were short patterns like:
   <String>WX 58</String>
   <String>WX 6</String>
The optional "WX number" gives the character width in x for writing
direction 0. So I deleted such patterns.

In the global strings section there was a short pattern like:
   <String>B 0 0</String>
The optional "B llx lly urx ury" gives the character bounding box, where
llx, lly, urx, and ury are all numbers. If a character makes no marks on the
page (for example, the space character), this field reads B 0 0 0 0. So I
deleted this pattern.

In the global strings section there was a short pattern like:
   <String>NOTICE</String>
The optional "Notice string" gives a font name trademark or copyright
notice. Because it is optional I deleted this pattern.

In the global strings section there was a short pattern like:
   <String>XHEIGHT</String>
The optional "XHeight number" gives the y-value of the top of the lowercase
x. If the font program contains no lowercase x, this keyword might be
missing or the number might be 0. Because it is optional I deleted this
pattern.

In the global strings section there was a short pattern like:
   <String>SPACE</String>
This was apparently triggered by the optional "N space" or "N nbspace". For
fonts without space characters this is missing. Because it is optional I
deleted this pattern.

In the global strings section there was a short pattern like:
   <String>RIGHT</String>
This was apparently triggered by optional PostScript language character
names like "N parenright", "N copyright", "N quotedblright", "N
quoteright" or by optional notice strings like "All Rights Reserved".
Because these are optional I deleted this pattern.

In the global strings section there was a pattern like:
   <String>FULLNAME</String>
The optional "FullName string" gives the full text name of the font, like
"PixieFont", "TC Garamond Light" or "Ryumin Light V". Because it is optional
I deleted this pattern.

In the global strings section there was a pattern like:
   <String>FAMILYNAME</String>
The optional "FamilyName string" is the name of the typeface family to which
the font belongs. Because it is optional I deleted this pattern.

In the global strings section there were patterns like:
   <String>DESCENDER</String>
   <String>ASCENDER</String>
The optional "Descender number" for roman font programs typically is the
y-value of the bottom of the lowercase p. If the font program contains no
lowercase p, this keyword might be missing or the number might be 0. The
optional "Ascender number" for roman font programs usually is the y-value of
the top of the lowercase d. If the font program contains no lowercase d,
this keyword might be missing or the number might be 0. Because these are
optional I deleted these patterns.

In the global strings section there was a pattern like:
   <String>CAPHEIGHT</String>
The optional "CapHeight number" usually is the y-value of the top of the
capital H. If the font program contains no capital H, this keyword might be
missing or the number might be 0. Because it is optional I deleted this
pattern.

In the global strings section there were patterns like:
   <String>STARTKERNDATA</String>
   <String>ENDKERNDATA</String>
   <String>STARTKERNPAIRS</String>
   <String>ENDKERNPAIRS</String>
The kerning data section is surrounded by the lines StartKernData and
EndKernData. It is optional, therefore I deleted these patterns. The
pair-wise kerning data is surrounded by the keywords "StartKernPairs
integer" and "EndKernPairs". These are required only if pair-wise kerning
data is present, so I assume they can also be absent, and I deleted these
patterns too.

In the global strings section there was a pattern like:
   <String>WEIGHT STANDARD</String>
The optional "Weight string" is the weight of the font, like Roman, Bold or
Light. Because it is optional I deleted this pattern.

In the global strings section there was a pattern like:
   <String>ITALICANGLE 0.0</String>
The optional "ItalicAngle number" is the angle (in degrees counter-clockwise
from the vertical) of the dominant vertical strokes of the font. For
non-italic fonts this angle is zero. Because it is optional I deleted this
pattern.

In the global strings section there was a pattern like:
   <String>UNDERLINETHICKNESS</String>
The optional "UnderlineThickness number" is the stroke width for
underlining, and is generally proportional to the stroke widths of
characters in the font program. Because it is optional I deleted this
pattern.

In the global strings section there was a pattern like:
   <String>UNDERLINEPOSITION -</String>
The optional "UnderlinePosition number" is the distance from the baseline
for centering underlining strokes. Because it is optional I deleted this
pattern.

In the global strings section there was a pattern like:
   <String>ISFIXEDPITCH FALSE</String>
The optional "IsFixedPitch boolean", if true, indicates that the font
program is a fixed pitch (monospaced) font. Because it is optional I
deleted this pattern.

In the global strings section there was a pattern like:
   <String>VERSION 001.000</String>
The optional "Version string" is the font program version identifier. It
matches the string found in the FontInfo dictionary of the font program
itself. Because it is optional I deleted this pattern.

In the global strings section there was a pattern like:
   <String>ENCODINGSCHEME APPLESTANDARD</String>
The optional "EncodingScheme string" indicates the default encoding vector
for the font program. Common ones are AdobeStandardEncoding and
JIS12-88-CFEncoding. Because it is optional I deleted this pattern.

With the 2 TrID definitions all of my inspected "valid" AFM examples are now
described correctly as "Outline Font Metric", or as "Outline Font Metric
(MacBinary)" for the MacBinary variant (see appended output/trid-v.txt and
macbin/output/trid-v.txt). The TrID definitions, some examples and the
output are stored in the archive AFM_.zip. I hope that the XML files can be
used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Thanks Jörg!