Hello trid users,
some weeks ago I sent updates for aup3.trid.xml. While working on them I
noticed a useful feature, the "application id", which is also used by some
other SQLite 3.x databases.
The standard file name suffix for SQLite 3.x databases is SQLITE3, SQLITE or
DB. Nowadays many companies and developers use this file format to store
their data.
Some use another file name suffix. I assume they do not want "normal" users
to open the database manually with tools for handling SQLite databases. Worse,
some do not explain in a transparent way why they take such steps or what
they changed compared with a standard database. If all goes well (the file
name suffix is OK or known and the samples are found in well-known sub
directories) this does not hurt, but in the real world you must also consider
all other points of view. So some behave like Putin claiming "the world
belongs to me", or for software developers "the whole disc belongs to me".
But what if a hard disc crashes, or extracting or packing of software
archives fails? Then you often get hundreds of "unknown" samples lying
somewhere on your disc, and undoing the chaos is nearly impossible. Luckily
the current SQLite database file format has a 4-byte "application id" field
at offset 68 that makes sub classification possible, and luckily some people
use this feature.
In this session I will only consider the Fossil repository database.
The file name of such samples normally ends with .fossil. This is described
on the page about the Fossil repository database on the Archive Team file
formats web site. That information is expressed inside the new
fossil-repo.trid.xml by a line like:
<RefURL>http://fileformats.archiveteam.org/wiki/Fossil_repository_database</RefURL>
I installed the software package fossil on my Raspbian OS. The corresponding
README.Debian in directory /usr/share/doc/fossil also mentions an example
fossil.fsl found on a Debian web server. The URL is
https://people.debian.org/~bap/fossil.fsl
The FSL suffix is not listed in the mentioned documentation. I assume that for
historical reasons (old DOS FAT 8+3 limitations) FSL is used there instead of
the FOSSIL suffix. So these 2 suffixes are now expressed by a line like:
<Ext>FOSSIL/FSL</Ext>
So I ran the trid utility on such Fossil database samples. My samples are
correctly described generically as "SQLite 3.x database" with mime type
application/x-sqlite3 by sqlite-3x.trid.xml. But no sub classification is
done, for example by fossil.trid.xml. That means the file name suffix is not
shown correctly (see appended output/trid-v-old.txt).
For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the examples are also
recognized. They are likewise described generically as "SQLite Database File
Format" with version "3" and mime type application/x-sqlite3 by PUID fmt/729.
For comparison I also ran the file command (version 5.45) on such samples.
Here these Fossil samples are also described as "SQLite 3.x database", but
with additional information (Fossil repository, see appended
output/file-5.45.txt) and mime type application/vnd.sqlite3 (see appended
output/file-i-5.44.txt). The correct file name suffix is also not recognized
(see appended output/file-ext-5.45.txt).
There exists an officially registered mime type application/vnd.sqlite3 at
iana.org for SQLite 3.x databases. For the inspected Fossil samples I found no
specific mime type. Because the Fossil samples are just SQLite 3 databases,
they should at least get that mime type instead of the generic
application/octet-stream or the deprecated application/x-sqlite3. That is now
expressed inside the TrID definition by a line like:
<Mime>application/vnd.sqlite3</Mime>
At offset 68 the "Application ID" set by PRAGMA application_id is stored as a
4-byte big-endian integer. That is the most important sub classification
feature to distinguish the Fossil samples from others. For the Fossil
repository database this is decimal 252006673 or the hexadecimal byte
sequence 0F055111.
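As a small illustration of the field described above, here is a Python sketch
that extracts the application id from the first bytes of a file. It is not part
of TrID; the function name read_application_id and the constants are
hypothetical names chosen for this example, and the header used is synthetic
rather than taken from a real Fossil sample.

```python
import struct

# First 15 header bytes, the same text the TrID pattern at position 0 checks.
SQLITE3_MAGIC = b"SQLite format 3"
APP_ID_OFFSET = 68  # position of the 4-byte big-endian application id


def read_application_id(header: bytes):
    """Return the application id of a SQLite 3 header, or None if not SQLite 3.

    Hypothetical helper for illustration only.
    """
    if not header.startswith(SQLITE3_MAGIC) or len(header) < APP_ID_OFFSET + 4:
        return None
    # ">I" means: big-endian, unsigned 32-bit integer.
    (app_id,) = struct.unpack_from(">I", header, APP_ID_OFFSET)
    return app_id


# Synthetic 100-byte header carrying the application id quoted above.
header = bytearray(100)
header[:len(SQLITE3_MAGIC)] = SQLITE3_MAGIC
header[APP_ID_OFFSET:APP_ID_OFFSET + 4] = (252006673).to_bytes(4, "big")

app_id = read_application_id(bytes(header))
print(app_id, format(app_id, "08X"))  # prints: 252006673 0F055111
```

For a real sample, the first 100 bytes of the file could be passed to the same
function, for example after reading them with open(path, "rb").read(100).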
So now I can create fossil-repo.trid.xml manually without running tridscan
on dozens of examples. The sub classification done by the "Application ID" is
expressed by an XML construct like:
<Bytes>0F055111</Bytes>
<ASCII> . . Q</ASCII>
<Pos>68</Pos>
The main classification, as in sqlite-3x.trid.xml, is expressed by an XML
construct like:
<Bytes>53514C69746520666F726D61742033</Bytes>
<ASCII> S Q L i t e f o r m a t 3</ASCII>
<Pos>0</Pos>
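The two patterns above can be read as "these bytes at this position". The
following Python sketch shows that idea only; it is not how TrID itself is
implemented or how it scores matches, and the names matches, classify,
SQLITE3_PATTERNS and FOSSIL_REPO_PATTERNS are hypothetical. The type labels
are simply the descriptions used in this post.

```python
# (position, hex bytes) pairs copied from the Bytes/Pos values shown above.
SQLITE3_PATTERNS = [
    (0, "53514C69746520666F726D61742033"),  # "SQLite format 3"
]
FOSSIL_REPO_PATTERNS = SQLITE3_PATTERNS + [
    (68, "0F055111"),  # application id of a Fossil repository database
]


def matches(data: bytes, patterns) -> bool:
    """True if every pattern's bytes occur at its position in data."""
    for pos, hex_bytes in patterns:
        wanted = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(wanted)] != wanted:
            return False
    return True


def classify(data: bytes) -> str:
    """Try the more specific definition first, then the generic one."""
    if matches(data, FOSSIL_REPO_PATTERNS):
        return "Fossil repository database"
    if matches(data, SQLITE3_PATTERNS):
        return "SQLite 3.x database"
    return "unknown"


# Synthetic headers for demonstration, not real samples.
generic = bytearray(100)
generic[:15] = bytes.fromhex("53514C69746520666F726D61742033")
fossil = bytearray(generic)
fossil[68:72] = bytes.fromhex("0F055111")

print(classify(bytes(fossil)))   # prints: Fossil repository database
print(classify(bytes(generic)))  # prints: SQLite 3.x database
```

Checking the specific pattern set before the generic one mirrors the goal of
the new definition: a file that passes both checks is reported as a Fossil
repository and not only as a generic SQLite 3.x database.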
With the new TrID definition my Fossil repository database examples are now
described in more detail (correct file name suffix, see appended
output/trid-v-new.txt). The TrID definition and the outputs are stored in
archive fossil-config.zip. I hope that my definition can be used in a future
version of triddefs.
If I understand the facts correctly, the current fossil.trid.xml, which may
need some refinements, is no longer needed, because the application id check
inside fossil-repo.trid.xml now does the sub classification.
There exist more database samples using the application id feature. I
will try to handle such samples in a future session.
With best wishes
Jörg Jenderek