Author Topic: sqlite-help.trid.xml for Maple help ; sqlite based  (Read 741 times)

jenderek

  • Sr. Member
  • ****
  • Posts: 375
sqlite-help.trid.xml for Maple help ; sqlite based
« on: January 16, 2023, 12:18:45 AM »
Hello trid users,

some days ago i send definition for SQLite Write-Ahead-Log Index shared
memory. Because i get there many different file name extensions so then for
the related SQLite database i expected also more file name extensions. So i
look on my system for more such databases.

Some samples have a suffix that is different from standard SQLITE or
SQLITE3. That is expressed in new definition by line like:
   <Ext>HELP</Ext>
These are used as help files of the mathematical software maple found at
https://www.maplesoft.com/.

So i run trid utility on such HELP examples. Theses are described in a
expected way as "SQLite 3.x database" by sqlite-3x.trid.xml (See appended
output/trid-v-old.txt).

For comparison reason i also run file command ( version 5.44 and 5.40) on
such samples. Here these maple samples are described as "SQLite 3.x
database" with no sub classification and with additional information like
version, file counter, number of pages and UTF-8 encoding (See appended
file-5.44.txt and file-5.40.txt in output).

In version 5.44 now the correct suffix HELP is also shown inside list
(sqlite/sqlite3/db/db3/dbe/sdb/help see appended file-ext-5.44.txt in
output) compared with standard extensions (sqlite/sqlite3/db/dbe) shown by
version 5.40 (see appended file-ext-5.40.txt in output).

On maplesoft web site exist a page about Maple Workbook (.maple) format.
This information is used as reference URL by line like:
 <RefURL>
 https://www.maplesoft.com/support/help/Maple/view.aspx?path=worksheet/reference/helpdatabase
 </RefURL>
Unfortunately the maple software developers also behaves like i am the king
of the user system and not explaining what is their HELP files. That is
clear when you find the samples like maple.help and update.help inside lib
sub directory of Maple Program directory like "c:\Program Files\Maple
2021". But this is not clear when you find such samples as FILE0001.CHK
after a file system crash for example. Obviously the help system for the
ample software is SQLite based, but this is not mentioned on the web
site. There it is only written prior to Maple 18, help databases used the
HDB format.

The HELP files are also not registered inside system. Because such HELP
samples are sqlite database files the samples should get at least the mime
type of this database format. There exist an official registered mime at
iana.org. That is application/vnd.sqlite3. So instead of generic mime type
application/octet-stream i choose this mime type. So this is now expressed
by line like:
   <Mime>application/vnd.sqlite3</Mime>
That is also shown by newest file command (See appended
output/file-i-5.44.txt) whereas version 5.40 show the deprecated type
application/x-sqlite3 (See appended output/file-i-5.40.txt).

For comparison reason i also run the file format identification utility
DROID ( See https://sourceforge.net/projects/droid/). Here the HELP examples
are also recognized in a generic way. These are described as "SQLite
Database File Format" with version "3" and with mime type
application/x-sqlite3 by PUID fmt/729.

So i run tridscan on such HELP samples to generate
sqlite-help.trid.xml. With the help of file format specification i look at
my TrID definition and try to understand why some constructs appear or where
i can refine the definition.

The database start with 16 byte string "SQLite format 3\000".  So this fact
is expressed by the first construct:
   <Bytes>53514C69746520666F726D6174203300</Bytes>
   <ASCII> S Q L i t e   f o r m a t   3</ASCII>
   <Pos>0</Pos>

At offset 16 the database page size in bytes is stored as 2 byte big endian
integer. In all my examples i found samples with standard value 4096 (1000
hexadecimal) and value 1024 (0400 hexadecimal).
At offset 18 the file format write version is stored as byte
integer. According to documentation this is 1 for legacy.
At offset 19 the file format read version is stored as byte
integer. According to documentation this is 1 for legacy.
At offset 20 the bytes of unused "reserved" space at the end of each
page is stored as byte integer. Usually this is 0.
At offset 21 the maximum embedded payload fraction is stored as byte
integer. According to documentation this must be 64 (40 hexadecimal).
At offset 22 the minimum embedded payload fraction is stored as byte
integer. According to documentation this is must be 32 (20 hexadecimal).
At offset 23 the Leaf payload fraction is stored as byte
integer. According to documentation this is must be 32 (20 hexadecimal).
At offset 24 the file change counter is stored as 4-byte big endian integer.
So these facts are expressed by the second construct:
   <Bytes>000101004020200000</Bytes>
   <ASCII> . . . . @</ASCII>
   <Pos>17</Pos>
In my examples i get low values for file counter ( like 101 1024 35509). If
these value reach 4 GiB limit then the above construct will shrink and
become like:
   <Bytes>00010100402020</Bytes>
   <ASCII> . . . . @</ASCII>
   <Pos>17</Pos>

At offset 28 the size of the database file in pages is stored as 4-byte big
endian integer. These fact was expressed by construct like:
   <Bytes>00</Bytes>
   <Pos>28</Pos>

In my examples i get low values for number of pages ( like 187 755
210163). If these value reach 4 GiB limit then the above construct will
shrink and vanish.

At offset 32 the page number of the first free list trunk page is
stored as 4-byte big endian integer.  At offset 36 the total number of
free list pages is stored as 4-byte big endian integer. In all my
examples both values are nil.
At offset 40 the schema cookie is stored as 4-byte big endian integer.
In my examples i get same values for cookie (like 0x35).
At offset 44 the schema format number is stored as 4-byte big endian
integer. Supported schema formats are 1, 2, 3, and 4. In my examples i
get only value 4.
At offset 48 the default page cache size is stored as 4-byte big
endian integer. In my examples i get only value 0.
At offset 52 the page number of the largest root b-tree page is stored
as 4-byte big endian integer. In my examples i get only value 0.
At offset 56 the database text encoding is stored as 4-byte big endian
integer. In my examples i get only value 1. That means UTF-8.
At offset 60 the "user version" as read and set by the user_version
pragma is stored as 4-byte big endian integer. In my examples i get
value 0.
At offset 64 the incremental-vacuum mode is stored as 4-byte big
endian integer. In my examples i get value 0 for false.
At offset 68 the "Application ID" is stored as 4-byte big endian integer. In
my examples i get value 0. So there exist no significant and easy
characteristic that make it easy to distinguish HELP samples from other
sqlite database samples.
At offset 72 20 bytes reserved for expansion are stored. These must be
zero.
At offset 92 the version-valid-for number is stored as 4-byte big endian
integer. In my examples i get low values like 84 101 35509.  So these facts
are expressed by construct like:
 <Bytes>0000000000000000000000350000000400000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000</Bytes>
 <ASCII> . . . . . . . . . . . 5</ASCII>
 <Pos>32</Pos>
Assuming that version-valid-for number can reach higher values and reach 4
GiB limit the above construct will shrink and become like:
 <Bytes>0000000000000000000000350000000400000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000</Bytes>
 <ASCII> . . . . . . . . . . . 5</ASCII>
 <Pos>32</Pos>

At offset 96 the SQLITE_VERSION_NUMBER is stored 4-byte big endian
integer. In my examples i get values like 3008001 3027002. This fact was
expressed by construct like:
   <Bytes>00</Bytes>
   <Pos>96</Pos>
Assuming that this version can reach higher values and reach 4 GiB limit the
above construct will vanish.

The header contain 100 bytes. So i assume that all constructs above this
limit are triggered by lucky circumstances ( too few examples and not
reaching 4 GiB limit). So i delete such patterns like:
   <Bytes>05000000</Bytes>
   <Pos>100</Pos>
   ...
   <Bytes>0617451F0100696E64657873716C6974655F6175746F696E6465785F</Bytes>
   <ASCII> . . E . . . i n d e x s q l i t e _ a u t o i n d e x _</ASCII>
   <Pos>541</Pos>

Because the HELP samples have neither an not zero value for user version nor
for application ID then hopefully some keywords or phrases inside global
strings are unique enough for maple HELP samples, but i do not know.

With the new trid definition now my maple HELP examples are described more
precisely (see appended output/trid-v-new.txt). TrID definitions, and output
are stored in archive help.zip. I hope that my definition can be used in
future version of triddefs.

With best wishes
Jörg Jenderek


Mark0

  • Administrator
  • Hero Member
  • *****
  • Posts: 2743
    • Mark0's Home Page
Re: sqlite-help.trid.xml for Maple help ; sqlite based
« Reply #1 on: January 16, 2023, 09:23:37 PM »
Thanks!
I'll try to collect some other file samples.