Hello trid users,
some days ago i send definition for SQLite Write-Ahead-Log Index
shared memory. Because i get there many different file name extensions
so then for the related SQLite database i expected also more file name
extensions. So i look on my system for more such databases.
Some samples have a suffix that is different from standard SQLITE or
SQLITE3. That is expressed in new definition by line like:
<Ext>MAPLE</Ext>
These belong to mathematical software maple found at
https://www.maplesoft.com/.
So i run trid utility on such MAPLE examples. Theses are described in
a expected way as "SQLite 3.x database" by sqlite-3x.trid.xml (See
appended output/trid-v-old.txt).
For comparison reason i also run file command ( version 5.44 and 5.40)
on such samples. Here these maple samples are described as "SQLite
3.x database" with sub classification as "Maple Workbook" and with
additional information like version, file counter, number of pages and
UTF-8 encoding (See appended file-5.44.txt and file-5.40.txt in
output).
In version 5.44 now the correct suffix MAPLE is shown (See appended
file-ext-5.44.txt in output) compared with standard extensions
(sqlite/sqlite3/db/dbe) shown by version 5.40 (See appended
file-ext-5.40.txt in output).
On maplesoft web site exist a page about Maple Workbook (.maple)
format. There it is said that underlying file format is an SQLite
database. This information is used as reference URL by line like:
<RefURL>
https://www.maplesoft.com/support/help/errors/view.aspx?path=Formats/Maple </RefURL>
Because maple samples are sqlite database files the samples should get
at least the mime type of this database format. There exist an
official registered mime at iana.org. That is
application/vnd.sqlite3. So instead of generic mime type
application/octet-stream i choose this mime type. So this is now
expressed by line like:
<Mime>application/vnd.sqlite3</Mime>
That is also shown by newest file command (See appended
output/file-i-5.44.txt) whereas version 5.40 show the deprecated type
application/x-sqlite3 (See appended output/file-i-5.40.txt).
For comparison reason i also run the file format identification
utility DROID ( See
https://sourceforge.net/projects/droid/). Here the
database examples are recognized. These are described as "SQLite
Database File Format" with version "3" and with mime type
application/x-sqlite3 by PUID fmt/729.
So i run tridscan on such MAPLE samples to generate
sqlite-maple.trid.xml. With the help of file format specification i
look at my TrID definition and try to understand why some constructs
appear or where i can refine the definition.
The database start with 16 byte string "SQLite format 3\000". At
offset 16 the database page size in bytes is stored as 2 byte big
endian integer. This is in all my examples the standard value 4096
(1000 hexadecimal).
At offset 18 the file format write version is stored as byte
integer. According to documentation this is 1 for legacy.
At offset 19 the file format read version is stored as byte
integer. According to documentation this is 1 for legacy.
At offset 20 the bytes of unused "reserved" space at the end of each
page is stored as byte integer. Usually this is 0.
At offset 21 the maximum embedded payload fraction is stored as byte
integer. According to documentation this must be 64 (40 hexadecimal).
At offset 22 the minimum embedded payload fraction is stored as byte
integer. According to documentation this is must be 32 (20 hexadecimal).
At offset 23 the Leaf payload fraction is stored as byte
integer. According to documentation this is must be 32 (20 hexadecimal).
At offset 24 the file change counter is stored as 4-byte big endian integer.
So these facts are expressed by the first construct:
<Bytes>53514C69746520666F726D617420330010000101004020200000</Bytes>
<ASCII> S Q L i t e f o r m a t 3 . . . . . . @</ASCII>
<Pos>0</Pos>
In my examples i get low values for file counter ( like 202 247 263
320 322 411). If these value reach 4 GiB limit then the above construct
will shrink and become like:
<Bytes>53514C69746520666F726D61742033001000010100402020</Bytes>
<ASCII> S Q L i t e f o r m a t 3 . . . . . . @</ASCII>
<Pos>0</Pos>
At offset 28 the size of the database file in pages is stored as
4-byte big endian integer. These fact was expressed by construct like:
<Bytes>0000</Bytes>
<Pos>28</Pos>
In my examples i get low values for file counter ( like 141 143 205
1102 8194 ). If these value reach 4 GiB limit then the above construct
will shrink and vanish.
At offset 32 the page number of the first free list trunk page is
stored as 4-byte big endian integer. At offset 36 the total number of
free list pages is stored as 4-byte big endian integer. In all my
examples the both values are nil.
At offset 40 the schema cookie is stored as 4-byte big endian integer.
So these 3 facts were expressed by construct like:
<Bytes>00000000000000000000</Bytes>
<Pos>32</Pos>
In my examples i get low values for cookie ( like 0x8f 0x93 0xa4
0x103). If these value reach 4 GiB limit then the above construct will
shrink and become like:
<Bytes>0000000000000000</Bytes>
<Pos>32</Pos>
At offset 44 the schema format number is stored as 4-byte big endian
integer. Supported schema formats are 1, 2, 3, and 4. In my examples i
get only value 4.
At offset 48 the default page cache size is stored as 4-byte big
endian integer. In my examples i get only value 0.
At offset 52 the page number of the largest root b-tree page is stored
as 4-byte big endian integer. In my examples i get only value 0.
At offset 56 the database text encoding is stored as 4-byte big endian
integer. In my examples i get only value 1. That means UTF-8.
At offset 60 the "user version" as read and set by the user_version
pragma. is stored as 4-byte big endian integer. In my examples i get
low values like 9 10 15 19. So these facts were expressed by construct
like:
<Bytes>00000004000000000000000000000001000000</Bytes>
<Pos>44</Pos>
Assuming that for user version higher values reaching 4 GiB limit are
possible then the above construct shrink and become like:
<Bytes>00000004000000000000000000000001</Bytes>
<Pos>44</Pos>
At offset 64 the incremental-vacuum mode is stored as 4-byte big
endian integer. In my examples i get value 0 for false.
At offset 68 the "Application ID" set by PRAGMA application_id is
stored as 4-byte big endian integer. In my examples i get value
5CDE09EFh. So this is the most significant characteristic and should
be an unique pattern to distinguish MAPLE samples from other sqlite
database samples.
At offset 72 20 bytes reserved for expansion are stored. These Must be
zero.
At offset 92 the version-valid-for number is stored as 4-byte big
endian integer. In my examples i get low values like 202 247 345 411
991.
So these facts are expressed by construct like:
<Bytes>000000005CDE09EF00000000000000000000000000000000000000000000</Bytes>
<ASCII> . . . . \</ASCII>
<Pos>64</Pos>
Assuming that version-valid-for number can reach higher values and
reach 4 GiB limit the above construct will shrink and become like:
<Bytes>000000005CDE09EF0000000000000000000000000000000000000000</Bytes>
<ASCII> . . . . \</ASCII>
<Pos>64</Pos>
At offset 96 the SQLITE_VERSION_NUMBER is stored 4-byte big endian
integer. In my examples i get values like 3013000 3027002. This fact
was expressed by construct like:
<Bytes>00</Bytes>
<Pos>96</Pos>
Assuming that this version can reach higher values and reach 4 GiB
limit the above construct will vanish.
The header contain 100 bytes. So i assume that all constructs above
this limit are triggered by lucky circumstances ( too few examples and
not reaching 4 GiB limit). So i delete such patterns like:
<Bytes>05000000070FDD000000006D0FFB0FF60FF10FEC0FE70FE20FDD0BF00A8F09830931082307A4072A05</Bytes>
<ASCII> . . . . . . . . . . . m . . . . . . . . . . . . . . . . . . . . . 1 . # . . . *</ASCII>
<Pos>100</Pos>
...
<Bytes>6E</Bytes>
<ASCII> n</ASCII>
<Pos>1647</Pos>
With the new trid definition now my MAPLE examples are described more
precisely (see appended output/trid-v-new.txt). TrID definitions, and
output are stored in archive maple_.zip. I hope that my definition can
be used in future version of triddefs.
Unfortunately there seems to exist a password protected variant with same
suffix, but the file structure there does seem to have no characteristic pattern.
With best wishes
Jörg Jenderek