jenderek

fossil-checkout.trid.xml for Fossil checkout database
« on: August 08, 2023, 10:59:19 PM »
Hello trid users,

Some weeks ago I sent updates for aup3.trid.xml. While doing that I came
across a useful feature, the "application id", which is also used by some
other SQLite 3.x databases.

The standard file name suffixes for SQLite 3.x databases are SQLITE3, SQLITE
and DB. Nowadays many companies and developers use this file format to store
their data.

Some of them use other file name suffixes. I assume they do not want "normal"
users to open the database manually with tools for handling SQLite databases.
Worse, some do not explain transparently why they take such steps or what
they changed compared with a standard database. If all goes well (the file
name suffix is correct or known, and the samples are found in well-known
subdirectories), this does no harm, but in the real world you must also
consider other points of view. Some software developers behave like Putin,
as if the world, or at least the whole disk, belonged to them. But what if a
hard disk crashes, or extracting or packing a software archive fails? Then
you often end up with hundreds of "unknown" samples lying somewhere on your
disk, and undoing the chaos is nearly impossible. Fortunately, the current
SQLite database file format has a 4-byte "application id" field at offset 68
that makes sub-classification possible, and some developers actually use it.

In this post I will only consider the Fossil checkout database.

Such files are named _FOSSIL_ (apparently on Windows systems) or .fslckout
(on UNIX-like systems). This information can be found, for example, on the
page about the Fossil checkout database on the Archive Team file formats
wiki, so it is recorded in the new fossil-checkout.trid.xml by a line like:
     <RefURL>
     http://fileformats.archiveteam.org/wiki/Fossil_checkout_database
     </RefURL>

So I ran the trid utility on such Fossil database samples. My samples are
correctly, but only generically, described as "SQLite 3.x database" with MIME
type application/x-sqlite3 by sqlite-3x.trid.xml. No sub-classification is
done, so the file name suffix is not shown correctly (see the attached
output/trid-v-old.txt).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It also recognizes the
examples, again generically, as "SQLite Database File Format" version "3"
with MIME type application/x-sqlite3 (PUID fmt/729).

For comparison I also ran the file command (version 5.45) on these samples.
It also describes the Fossil samples as "SQLite 3.x database", but adds the
information "Fossil checkout" (see the attached output/file-5.45.txt) and
reports the MIME type application/vnd.sqlite3 (see the attached
output/file-i-5.44.txt). The correct file suffix is not recognized either
(see the attached output/file-ext-5.45.txt).

There is an officially registered MIME type application/vnd.sqlite3 at
iana.org for SQLite 3.x databases. For the inspected Fossil samples I found
no specific MIME type. Because the Fossil samples are just SQLite 3
databases, they should at least get that MIME type instead of the generic
application/octet-stream or the deprecated application/x-sqlite3. This is now
expressed inside the TrID definition by a line like:
   <Mime>application/vnd.sqlite3</Mime>

At offset 68 the "Application ID", set by PRAGMA application_id, is stored
as a 4-byte big-endian integer. This is the most important feature for
distinguishing the Fossil samples from other SQLite databases. For a checkout
database the value is decimal 252006674, i.e. hexadecimal 0x0F055112 (byte
sequence 0F 05 51 12).
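As a quick cross-check of the numbers above, here is a minimal Python sketch (an illustration only, not part of TrID) that uses the standard sqlite3 module to create a throwaway database with PRAGMA application_id = 252006674 and then reads the 4 bytes at offset 68 back as a big-endian integer. The function name write_and_read_app_id is hypothetical.

```python
import os
import sqlite3
import struct
import tempfile

# Application ID of a Fossil checkout database, as stated above.
FOSSIL_CHECKOUT_APP_ID = 252006674


def write_and_read_app_id(app_id: int) -> tuple[int, bytes]:
    """Hypothetical helper: create a temporary SQLite database with the given
    application_id, then return the value and raw bytes found at offset 68."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        con = sqlite3.connect(path)
        # PRAGMA does not accept bound parameters, so the int is formatted in.
        con.execute(f"PRAGMA application_id = {int(app_id)}")
        # Creating a table makes sure the file and its header are written.
        con.execute("CREATE TABLE t(x)")
        con.commit()
        con.close()
        with open(path, "rb") as f:
            header = f.read(100)  # the SQLite file header is 100 bytes long
    raw = header[68:72]
    (value,) = struct.unpack(">I", raw)  # big-endian unsigned 32-bit integer
    return value, raw


value, raw = write_and_read_app_id(FOSSIL_CHECKOUT_APP_ID)
print(value, hex(value), raw.hex().upper())  # 252006674 0xf055112 0F055112
```

Python's hex() drops the leading zero (0xf055112), so when writing a TrID <Bytes> pattern all four bytes, 0F055112, have to be spelled out.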

So now I can create fossil-checkout.trid.xml manually, without running
tridscan on dozens of examples. The sub-classification by "Application ID" is
expressed by an XML construct like:
   <Bytes>0F055112</Bytes>
   <ASCII> . . Q</ASCII>
   <Pos>68</Pos>

The main classification, as in sqlite-3x.trid.xml, is expressed by an XML
construct like:
   <Bytes>53514C69746520666F726D61742033</Bytes>
   <ASCII> S Q L i t e   f o r m a t   3</ASCII>
   <Pos>0</Pos>
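To illustrate how the two patterns work together, here is a minimal Python sketch that checks whether the first 100 bytes of a file contain "SQLite format 3" at position 0 and 0F055112 at position 68. It is only an assumed yes/no approximation: real TrID weighs patterns and reports percentage scores, which this sketch does not attempt. The names PATTERNS, matches_all, check_file and make_header are hypothetical.

```python
# Hypothetical pattern list mirroring the two <Bytes>/<Pos> fragments above.
PATTERNS = [
    (0, bytes.fromhex("53514C69746520666F726D61742033")),  # "SQLite format 3"
    (68, bytes.fromhex("0F055112")),  # Fossil checkout application_id
]


def matches_all(data: bytes, patterns=PATTERNS) -> bool:
    """True if every (position, bytes) pattern occurs exactly at its position."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in patterns)


def check_file(path: str) -> bool:
    """Read the 100-byte SQLite header of a file and test both patterns."""
    with open(path, "rb") as f:
        return matches_all(f.read(100))


def make_header(app_id: int) -> bytes:
    """Build a fake 100-byte header with only the magic and app id filled in."""
    hdr = bytearray(100)
    hdr[0:16] = b"SQLite format 3\x00"
    hdr[68:72] = app_id.to_bytes(4, "big")
    return bytes(hdr)


print(matches_all(make_header(252006674)))  # True: Fossil checkout
print(matches_all(make_header(0)))          # False: plain SQLite 3 database
```

Because the file name plays no role here, a renamed or misplaced checkout database (the chaos scenario described at the start) would still be recognized.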

The correct file name suffix is now given by a line like:
   <Ext>FSLCKOUT</Ext>

With the new TrID definition my Fossil checkout database examples are now
described in more detail, including the correct file name suffix (see the
attached output/trid-v-new.txt). The TrID definition, a few samples and the
output are stored in the archive fossil-checkout.zip. I hope my definition
can be used in a future version of triddefs.

There are more database formats that use the application id feature. I will
try to handle them in a future post.

With best wishes
Jörg Jenderek