Hello trid users,
some weeks ago ago i send updates for aup3.trid.xml. There i detected a trick,
feature "application id" that can be also used by some other SQLite 3.x
databases.
The standard file name suffix for SQLite 3.x databases is SQLITE3, SQLITE or
DB. Nowadays many companies and developers use this file format to store
their data.
Some use other file name suffix. I assume they do not want that "normal" users
open the database manually by tools for handling SQLite databases. But worse
is that some does not explain in a transparent way why they do such steps and
what they did change comparing with standard database. If all goes well (file
name suffix is OK or known and samples found in well known sub directories)
then it does not hurt, but in real world you must consider also all other
point views. So some behave like Putin claiming the world belongs to me or the
whole disc belongs to me for software developers. But what if hard disc crash,
extracting or packing of software archives failed. Then you often get hundreds
of "unknown" samples lying somewhere on your disc and undoing the chaos is
then nearly impossible. Luckily in current SQLite database file format there
exist a 4 byte field application id at offset 68 that make it possible to do
sub classification. Luckily some people use this feature.
In this session i will only consider MBTiles tile set.
Luckily i found MBTiles Specification in the internet. So that information is
expressed inside new TrID definitions by line like:
<RefURL>
https://github.com/mapbox/mbtiles-spec/tree/master </RefURL>
So i run trid utility on such tile set samples. My samples are described
correctly generic as "SQLite 3.x database" with mime type
application/x-sqlite3 by sqlite-3x.trid.xml. But not sub classification is
done. That means file name is not correctly shown (See appended
output/trid-v-old.txt).
For comparison reason i also run the file format identification utility DROID
( See
https://sourceforge.net/projects/droid/). Here the examples are also
recognized. These are described here also generic as "SQLite Database File
Format" with version "3" and mime type application/x-sqlite3 by PUID fmt/729.
For comparison reason i also run file command (version 5.45) on such
samples. Here these Fossil samples are also described as "SQLite 3.x database"
but some with additional information (MBTiles tileset see appended
output/file-5.45.txt) and mime type application/vnd.sqlite3 (see appended
output/file-i-5.44.txt). The correct file suffix is also not recognized (see
appended output/file-ext-5.45.txt).
There exist an official registered mime type application/vnd.sqlite3 at
iana.org for SQLite 3.x database. For the inspected Tilsit samples i found no
mime type. Because the tile set samples are just SQLite 3 database these
should at least get that mime type instead of generic application/octet-stream
mime type or deprecated application/x-sqlite3. That is now expressed inside
TrID definitions by line like:
<Mime>application/vnd.sqlite3</Mime>
At offset 68 the "Application ID" set by PRAGMA application_id is stored as 4
byte big endian integer. That is the most important sub classification feature
to distinguish the tile set samples from others. For tile set database this is
hexadecimal byte sequences 4d504258 or expressed as MPBX string.
So now i can create mbtiles-MPBX.trid.xml manually without running tridscan on
dozen of examples. The sub classification done by "Application ID" is
expressed by XML construct like:
<Bytes>4D504258</Bytes>
<ASCII> M P B X</ASCII>
<Pos>68</Pos>
The main classification like in sqlite-3x.trid.xml is expressed by XML
construct like:
<Bytes>53514C69746520666F726D61742033</Bytes>
<ASCII> S Q L i t e f o r m a t 3</ASCII>
<Pos>0</Pos>
In the specification no file name suffix is mentioned. But apparently no
standard sqlite is used here. So the correct file name suffix are now shown by
line like:
<Ext>MBTILES</Ext>
In version 1.2 and 1.3 version of specification is written that MBTiles
databases MAY use the officially assigned magic number to be easily identified
as MBTiles. So a few do like samples i downloaded from bbbike.org follow the
specification and are therefor detected by mbtiles-MPBX.trid.xml. But many
(5/7) do not follow the specification. So i must create a "generic"
mbtiles.trid.xml by running tridscan on my tile set samples.
I could try to refine the second definition by running tridscan with more
examples. But often samples occupy GB of my disc space. So i am not willing
to download dozen of probably big examples to improve my definition. I could
also try like for aup3.trid.xml to look in SQLite specification for fields and
values and eliminate byte sequences, but i am to tired to be the garbage man
for all the lazy developers. So i hope that other user improve that
definition.
With the new trid definitions now my tile set database examples are described
with more details (correct file name suffix see appended trid-v-new.txt in
output). TrID definition and output are stored in archive MBTile_.zip. I hope
that my definitions can be used in future version of triddefs.
With best wishes
Jörg Jenderek
e