Hello TrID users,
some weeks ago I sent updates for aup3.trid.xml. While working on them I came
across a useful feature, the "application id", which is also used by some
other SQLite 3.x databases.
The standard file name suffix for SQLite 3.x databases is SQLITE3, SQLITE or
DB. Nowadays many companies and developers use this file format to store their
data.
Some use a different file name suffix. I assume they do not want "normal"
users to open the database manually with tools for handling SQLite databases.
Worse, some do not explain in a transparent way why they take such steps and
what they changed compared with a standard database. If all goes well (the
file name suffix is correct or known and the samples are found in well-known
subdirectories), this does no harm, but in the real world you must also
consider all other points of view. Some software developers behave like Putin
claiming "the world belongs to me", as if the whole disk belonged to them. But
what if a hard disk crashes, or extracting or packing a software archive
fails? Then you often end up with hundreds of "unknown" samples lying
somewhere on your disk, and undoing the chaos is nearly impossible. Luckily,
the current SQLite database file format has a 4-byte "application id" field
at offset 68 that makes sub-classification possible, and some people do use
this feature.
In this session I will only consider GeoPackage databases.
The file name of such samples normally ends with GPKG after the dot. This is
described on the GeoPackage page of the Archive Team file formats web site.
That information is again expressed inside the new definitions by a line
like:
<RefURL>http://fileformats.archiveteam.org/wiki/GeoPackage</RefURL>
So I ran the trid utility on such GeoPackage database samples. My samples are
correctly described generically as "SQLite 3.x database" with MIME type
application/x-sqlite3 by sqlite-3x.trid.xml. Sub-classification is done by
gpkg.trid.xml for the examples: with highest priority the samples are
described as "GeoPackage" by gpkg.trid.xml, with MIME type
application/octet-stream and the correct file name suffix .GPKG (see appended
output/trid-v-old.txt).
For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). Here the examples are also
recognized. They are described generically as "OGC GeoPackage" with
version "1.0-1.31" and MIME type application/geopackage+sqlite3 by PUID
fmt/1700.
For comparison I also ran the file command (version 5.45) on such
samples. Here the GeoPackage samples are also described as "SQLite 3.x
database", but with additional information (OGC GeoPackage, see appended
output/file-5.45.txt) and MIME type application/vnd.sqlite3 (see appended
output/file-i-5.44.txt). The correct file name suffix is not recognized
either (see appended output/file-ext-5.45.txt). For a few samples like
rivers.gpkg and rte.gpkg that is all, but for the others "version 1.0" is
additionally displayed.
Here the developers have done their work correctly, because an officially
registered MIME type, application/geopackage+sqlite3, exists at iana.org for
this SQLite 3.x database variant instead of application/vnd.sqlite3. That is
now expressed inside the new TrID definitions by a line like:
<Mime>application/geopackage+sqlite3</Mime>
At offset 68 the "Application ID" set by PRAGMA application_id is stored as a
4-byte big-endian integer. That is the most important sub-classification
feature for distinguishing GeoPackage samples from others. The value is the
hexadecimal byte sequence 47503130, or 47504B47 for version 1.2 and higher,
according to IANA. Expressed as strings, the IDs are GP10 and GPKG.
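To make the offset and byte order concrete, here is a minimal Python sketch of reading that field. It is only an illustration, not part of TrID; the names application_id and read_application_id are made up for this example, and it assumes the first 100 bytes of the file are the SQLite header.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"  # 16-byte signature at offset 0
APP_ID_OFFSET = 68                      # 4-byte big-endian application id


def application_id(header: bytes):
    """Return (integer value, 4 raw bytes) of the application id,
    or None if the data does not look like an SQLite 3.x header."""
    if len(header) < APP_ID_OFFSET + 4 or not header.startswith(SQLITE_MAGIC):
        return None
    raw = header[APP_ID_OFFSET:APP_ID_OFFSET + 4]
    (value,) = struct.unpack(">I", raw)  # ">I" = big-endian unsigned 32 bit
    return value, raw


def read_application_id(path):
    """Hypothetical helper: read only the header of a file on disk."""
    with open(path, "rb") as f:
        return application_id(f.read(100))
```

For a GeoPackage sample the raw bytes would be expected to read GP10 or GPKG; an ordinary SQLite database that never set the pragma has four zero bytes there.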
So now I can create gpkg-v1.trid.xml and gpkg-v12.trid.xml manually without
running tridscan on dozens of examples. The sub-classification done by the
"Application ID" is expressed inside gpkg-v1.trid.xml by an XML construct like:
<Bytes>47503130</Bytes>
<ASCII> G P 1 0</ASCII>
<Pos>68</Pos>
The main classification, as in sqlite-3x.trid.xml, is expressed by an XML
construct like:
<Bytes>53514C69746520666F726D61742033</Bytes>
<ASCII> S Q L i t e f o r m a t 3</ASCII>
<Pos>0</Pos>
The sub-classification done by the "Application ID" is expressed inside
gpkg-v12.trid.xml by an XML construct like:
<Bytes>47504B47</Bytes>
<ASCII> G P K G</ASCII>
<Pos>68</Pos>
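The three constructs above boil down to "these bytes at this position". The following Python sketch shows how such position/byte pairs combine. It is a simplification written for this post and does not reproduce how TrID really scores its definitions; the function name matching_definitions is an assumption of mine.

```python
# (position, bytes) pairs mirroring the <Pos>/<Bytes> elements quoted above
SQLITE_PATTERN = (0, bytes.fromhex("53514C69746520666F726D61742033"))

DEFINITIONS = {
    "gpkg-v1.trid.xml": [SQLITE_PATTERN, (68, bytes.fromhex("47503130"))],
    "gpkg-v12.trid.xml": [SQLITE_PATTERN, (68, bytes.fromhex("47504B47"))],
    "sqlite-3x.trid.xml": [SQLITE_PATTERN],
}


def matching_definitions(data: bytes):
    """Return the names of all definitions whose patterns all match.

    Simplified: every pattern of a definition must be found at its
    position; no scoring or ranking is attempted here.
    """
    return [
        name
        for name, patterns in DEFINITIONS.items()
        if all(data[pos:pos + len(b)] == b for pos, b in patterns)
    ]
```

A GeoPackage header then matches both the generic SQLite definition and exactly one of the two GeoPackage definitions, while a plain SQLite database matches only the generic one.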
With the 2 new TrID definitions my GeoPackage database examples are now
described in more detail (correct MIME type and version information, see
appended output/trid-v-new.txt). The TrID definitions, a few samples and the
output are stored in the archive gpkg_.zip. I hope that my definitions can be
used in a future version of triddefs.
If I understand the facts correctly, the current gpkg.trid.xml, which may need
some refinements, is no longer needed, because the sub-classification is now
done by the application id check inside gpkg-v1.trid.xml and gpkg-v12.trid.xml.
There are other database samples that use the application id feature. I
will try to handle such samples in a future session.
With best wishes
Jörg Jenderek