Topic: TrID variants for TrueType / OpenType font collection

jenderek

TrID variants for TrueType / OpenType font collection
« on: July 07, 2017, 01:32:46 AM »
Hello,

when I run TrID on hundreds of *.TTC files, dozens are not recognised
( "Unknown!", see appended ttc_v2/output/trid-old.txt ).

The newest version of the file(1) command (http://darwinsys.com/file/)
identifies such examples correctly as "TrueType font collection"
( see appended ttc_v2/output/file-new.txt ).

Furthermore I found a few dozen TTC files which the file command identifies
as "OpenType font collection" ( see otc/output/file-new.txt ) and which are not
recognised by TrID ( "Unknown!", see otc/output/trid-old.txt ).

According to the specification, TTC files are collections of TrueType fonts
( see also ttf.trid.xml ) or OpenType fonts ( see also otf.trid.xml ).

So ttc.trid.xml describes only what all these font collections have in common,
and a single XML pattern is left, describing the font collection tag:
      <Pattern>
         <Bytes>74746366</Bytes>
         <ASCII> t t c f</ASCII>
         <Pos>0</Pos>
      </Pattern>

So the definitions in ttc.trid.xml are a little too general: any text file
starting with the string "ttcf" and containing additional keywords like "MAXP",
such as the example ttfc-test.txt, is misidentified as "TrueType Font
Collection" ( see otc/output/trid-old.txt ).

Therefore I did not spend effort on refining this definition file and created
more specific ones instead.
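
To illustrate the problem, here is a minimal Python sketch. The function names
are hypothetical and not part of TrID, and the global strings are ignored. It
compares a check on the 4-byte tag alone with one that also requires a known
version at offset 4:

```python
import struct

TTC_TAG = b"ttcf"
# Versions 1.0 and 2.0, stored as four bytes at offset 4.
KNOWN_VERSIONS = {0x00010000, 0x00020000}

def loose_match(data: bytes) -> bool:
    """Match on the 4-byte tag only, like a single pattern at position 0."""
    return data[:4] == TTC_TAG

def strict_match(data: bytes) -> bool:
    """Additionally require a known version in the four bytes at offset 4."""
    if len(data) < 8 or data[:4] != TTC_TAG:
        return False
    (version,) = struct.unpack(">I", data[4:8])
    return version in KNOWN_VERSIONS

# A plain text file that merely starts with "ttcf" (made-up content).
text_file = b"ttcf is a tag, MAXP is a table\n"
# A synthetic version 1.0 header announcing 2 fonts.
v1_header = b"ttcf" + struct.pack(">II", 0x00010000, 2)

print(loose_match(text_file), strict_match(text_file), strict_match(v1_header))
```

The text file passes the loose check but fails the strict one, because the bytes
after the tag are not a valid version.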

At offset 4 the version is stored in four bytes. According to the specification
only two versions exist ( 1.0 and 2.0 ), so I need to create two sets of TrID
definition files, ttc-v?-*.trid.xml. Furthermore I distinguish between the
TrueType (*-ttf.trid.xml) and OpenType (*-otf.trid.xml) variants. I found no
OpenType font collection with version 1.0, so I did not create a definition
file ttc-v1-otf.trid.xml.
At offset 8 the number of fonts is stored as a big-endian 32-bit integer.
Typically a font collection contains only a few fonts, so the three upper bytes
are null. So for TrueType Font Collection version 1.0 the definition file
ttc-v1-ttf.trid.xml starts with the pattern
         <Bytes>7474636600010000000000</Bytes>
         <ASCII> t t c f</ASCII>
         <Pos>0</Pos>
The version 2.0 variants ttc-v2-*.trid.xml start with the pattern
         <Bytes>7474636600020000000000</Bytes>
         <ASCII> t t c f</ASCII>
         <Pos>0</Pos>
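
As a cross-check, the two byte patterns can be rebuilt from the header fields.
This is only a sketch: the function name is hypothetical, and it assumes a
collection holds fewer than 256 fonts, so that the upper three bytes of the
font count are zero:

```python
import struct

def ttc_prefix_pattern(major_version: int) -> str:
    """Hex string for tag + version + upper three bytes of the font count."""
    tag = b"ttcf"
    # The version is stored as two big-endian 16-bit halves: major, minor.
    version = struct.pack(">HH", major_version, 0)
    # Assumption: fewer than 256 fonts, so only the lowest byte is non-zero.
    num_fonts_upper = b"\x00\x00\x00"
    return (tag + version + num_fonts_upper).hex().upper()

print(ttc_prefix_pattern(1))
print(ttc_prefix_pattern(2))
```

The two strings printed correspond to the <Bytes> values used for version 1.0
and version 2.0.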

After creating the definition files with tridscan I manually fine-tuned the XML
files. The MIME type is now given by the line:
      <Mime>application/font-sfnt</Mime>
This will probably become "font/collection" in the near future.

Furthermore I added the Wikipedia font page as reference URL. For TrueType this
is done by the line:
   <RefURL>https://en.wikipedia.org/wiki/TrueType</RefURL>
And for OpenType in ttc-v2-otf.trid.xml the URL is given by the line:
   <RefURL>https://en.wikipedia.org/wiki/OpenType</RefURL>

The file name extension for a TrueType Font Collection is "ttc". This is
expressed by the line:
      <Ext>TTC</Ext>
The file name extension for an OpenType Font Collection is "otc". I myself found
no such examples. For compatibility reasons the "ttc" extension is used in most
cases. This is expressed in ttc-v2-otf.trid.xml by the line:
      <Ext>TTC/OTC</Ext>

At position 12 the offsets to the fonts are stored as big-endian 32-bit
integers. Normally the first font is stored right after this offset array, or
after the additional signature section for version 2. So the first offset is
normally 0x14 for a collection with 2 fonts, as in YOzRAP.TTC, or 0x1c for a
collection with 4 fonts, as in uming.ttc ( see ttc_v1/output/file-new.txt ).
That means the three upper bytes of the first offset are null. This is
expressed by the XML construct:
      <Pattern>
         <Bytes>000000</Bytes>
         <Pos>12</Pos>
      </Pattern>
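
The values 0x14 and 0x1c follow from the header layout. A small sketch with a
hypothetical function name, assuming the first font directly follows the
header, which is usual but not guaranteed:

```python
def first_font_offset(num_fonts: int, major_version: int) -> int:
    """Header size, i.e. the offset of a font placed right after the header."""
    size = 4 + 4 + 4          # tag, version, number of fonts
    size += 4 * num_fonts     # one 32-bit offset per font
    if major_version == 2:
        size += 12            # signature fields: tag, length, offset
    return size

print(hex(first_font_offset(2, 1)))
print(hex(first_font_offset(4, 1)))
print(hex(first_font_offset(2, 2)))
```

Under this assumption the first offset stays below 0x100 for any collection with
a few dozen fonts, so its three upper bytes are zero.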

Normally the second font is stored after the first one. If the first font is not
too big, the upper bytes of the second offset are also null. This is expressed by
      <Pattern>
         <Bytes>0000</Bytes>
         <Pos>16</Pos>
      </Pattern>

In the ttc-v2-otf.trid.xml generated by tridscan I found low values for the
third font offset. This was expressed by the XML construct:
      <Pattern>
         <Bytes>0000</Bytes>
         <Pos>20</Pos>
      </Pattern>

For OpenType font collections only 24 examples were inspected. If you look at
more examples with few but large fonts, the trailing null patterns shrink or
vanish. For TrueType Font Collection version 2 (ttc-v2-ttf.trid.xml), which is
based on more examples, no pattern for the third offset exists any more. So I
synchronised the XML files and removed the pattern for the third offset from
ttc-v2-otf.trid.xml.
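
To see how many null bytes each offset really has in a given file, the offset
array can be read directly. A minimal sketch: the function names and the sample
header are made up for illustration:

```python
import struct
from typing import List

def read_offsets(data: bytes) -> List[int]:
    """Read the font count at offset 8 and the offset array at offset 12."""
    (num_fonts,) = struct.unpack_from(">I", data, 8)
    return list(struct.unpack_from(f">{num_fonts}I", data, 12))

def leading_zero_bytes(offset: int) -> int:
    """Count the null bytes at the start of a big-endian 32-bit value."""
    raw = offset.to_bytes(4, "big")
    return len(raw) - len(raw.lstrip(b"\x00"))

# Synthetic version 1.0 header with 2 fonts at offsets 0x14 and 0x9A40.
sample = b"ttcf" + struct.pack(">HHI", 1, 0, 2) + struct.pack(">II", 0x14, 0x9A40)

for off in read_offsets(sample):
    print(hex(off), leading_zero_bytes(off))
```

An offset needs more bytes as soon as the preceding fonts exceed 64 KiB in total,
which is why the patterns for the later offsets are the first to disappear when
more examples are scanned.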

Because the fonts are stored after the TTC header, we should get the same
strings as in ttf.trid.xml for TrueType fonts and otf.trid.xml for OpenType
fonts. That means in the GlobalStrings section only 4-byte table identifiers
should occur. So I changed the lines from
   <String>6HHEA</String>
   <String>$HMTX</String>
to
   <String>HHEA</String>
   <String>HMTX</String>
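
This clean-up could also be scripted. A sketch under my own assumptions: the
tag list is only a small hand-picked subset of sfnt table tags in upper case,
and the function name is hypothetical:

```python
# Small, incomplete subset of 4-byte sfnt table tags, in upper case.
KNOWN_TAGS = {"CMAP", "GASP", "GLYF", "HEAD", "HHEA", "HMTX", "LOCA", "MAXP",
              "NAME", "POST"}

def trim_to_tag(text: str) -> str:
    """Drop leading junk if the last four characters are a known table tag."""
    tail = text[-4:]
    return tail if tail in KNOWN_TAGS else text

for s in ("6HHEA", "$HMTX", "FOOBAR"):
    print(s, "->", trim_to_tag(s))
```

Strings that do not end in a known tag are left untouched, so they can still be
reviewed by hand.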

In the ttc-v1-ttf.trid.xml generated by tridscan from only 21 examples I found
in the GlobalStrings section the line:
   <String>GASP</String>
That means the inspected font examples have a grid-fitting and scan-conversion
table. But according to the specification this table is optional, and in
ttc-v2-ttf.trid.xml, which is based on many examples, it was no longer found in
general. So I synchronised the XML files and removed this line.

OpenType fonts start with the sfnt version 'OTTO'. So for OpenType font
collections we find in the GlobalStrings section the line
      <String>OTTO</String>

This line is missing in the definitions ttc-v?-ttf.trid.xml for TrueType
collections, which typically have sfnt version 00010000h. According to the
specification some tables are related only to TrueType outlines. The glyph data
table with identifier "glyf" is found only in the TrueType variant. So here we
find the additional line
      <String>GLYF</String>
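
The TrueType / OpenType distinction can also be checked per member font by
looking at the sfnt version at each offset. A minimal sketch: the function name,
the labels and the synthetic sample are hypothetical, and other sfnt versions
are simply reported as unknown:

```python
import struct
from typing import List

def member_flavors(data: bytes) -> List[str]:
    """Classify each font in a collection by the sfnt version at its offset."""
    (num_fonts,) = struct.unpack_from(">I", data, 8)
    offsets = struct.unpack_from(f">{num_fonts}I", data, 12)
    flavors = []
    for off in offsets:
        sfnt_version = data[off:off + 4]
        if sfnt_version == b"OTTO":
            flavors.append("OpenType")
        elif sfnt_version == b"\x00\x01\x00\x00":
            flavors.append("TrueType")
        else:
            flavors.append("unknown")
    return flavors

# Synthetic version 1.0 collection: header, two offsets, two 4-byte stubs.
sample = (b"ttcf" + struct.pack(">HHI", 1, 0, 2) + struct.pack(">II", 20, 24)
          + b"OTTO" + b"\x00\x01\x00\x00")

print(member_flavors(sample))
```

The stubs in the sample contain only the sfnt version, not a real table
directory, which is enough for this check.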

The TrID definitions and output are stored in the archive ttc_otc.zip.
I hope that my 3 XML files can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: TrID variants for TrueType / OpenType font collection
« Reply #1 on: July 07, 2017, 03:33:12 PM »
OK, I'll keep the original as "generic", and add these 3 more specific ones.
Thanks as usual!