Author Topic: sqlite-shm.trid.xml for SQLite Write-Ahead-Log Index shared memory *-shm  (Read 758 times)

jenderek

  • Sr. Member
  • ****
  • Posts: 375
Hello trid users,

some days ago i run Pirisoft ccleaner. Under item for file extension under
registry cleaner i can scan for errors. There it complains about suffix
srd-shm. In the same category i get files with suffix like:
   sqlite-shm/db-shm/db3-shm/dbx-shm/aup3-shm/srd-shm
For every file exist a companion file without 4 byte phrase -shm at the end.

So i run trid utility on such SQLite examples. The SHM samples are described
as "Unknown!". The corresponding database samples without shm phrase are
described as "SQLite 3.x database" by sqlite-3x.trid.xml (See appended
output/trid-v-old.txt).

For comparison reason i also run file command (newest version sql,v 1.25
2023/01/06) on such samples. Here these SHM samples are described as
"SQLite Write-Ahead Log shared memory" with additional information like
counter, page size, page numbers and check sums (See appended
output/file-new.44.txt) and all possible extensions (see appended
file-ext-new.txt).

For comparison reason i also run the file format identification utility
DROID ( See https://sourceforge.net/projects/droid/). Here only the database
examples are recognized. These are described as "SQLite Database File
Format" with version "3" and with mime type application/x-sqlite3 by PUID
fmt/729.

With the help of the information shown by newest file command i found an
official document about the WAL-Index File Format on sqlite.org.  This is
expressed by line like:
   <RefURL>http://www.sqlite.org/draft/walformat.html#walidxfmt</RefURL>

There exist an official registered mime type application/vnd.sqlite3 at
iana.org. That is application/vnd.sqlite3, but according to documentation
this only explicitly applies to the database itself. So instead of generic
mime type application/octet-stream i choose the deprecated mime type of
database. So this is now expressed by line like:
      <Mime>application/x-sqlite3</Mime>


The suffix is generated by adding 4 byte phrase -shm at name of
corresponding SQLite database. Typical the two suffix sqlite and db are used
for such databases. But i also found other extensions. db3-shm is used in
Acronis Backup And Recovery F4CEEE47-042C-4828-95A0-DE44EC267A28.db3-shm
sample. The dbx-shm is probably used for Dropbox sample
filecache.dbx-shm. The suffix aup3-shm is used for Audacity project like
tada.aup3-shm. The srd-shm is used for Microsoft Windows StateRepository
service samples like StateRepository-Deployment.srd-shm
StateRepository-Machine.srd-shm found inside
c:\ProgramData\Microsoft\Windows\AppRepository. So these facts are expressed
by line like:
   <Ext>SQLITE-SHM/DB-SHM/DBX-SHM/DB3-SHM/AUP3-SHM/SRD-SHM</Ext>

In the documentation is written that values except of salt values are stored
in the native byte-order of the host machine. My inspected samples come from
little endian machines. So i expect that on big endian machines the bytes in
definition must be swapped.

So i run tridscan on such SHM samples to generate sqlite-shm.trid.xml.  With
the help of file format specification i look at my TrID definition and try
to understand why some constructs appear or where i can refine the
definition.

In the first 4 bytes the WAL-index format version number is stored. This is
always always 3007000 (2DE218 hexadecimal big endian). The next 4 bytes are
unused padding space and must be zero. So these 2 facts are expressed by the
first construct:
   <Bytes>18E22D0000000000</Bytes>
   <ASCII> . . -</ASCII>
   <Pos>0</Pos>
The bytes from offset 48 until 95 contain a second copy of the WAL index
information part. So i also get a construct like:
   <Bytes>18E22D0000000000</Bytes>
   <ASCII> . . -</ASCII>
   <Pos>48</Pos>

At offset 8 an unsigned 4 byte integer counter iChange is stored. This is
incremented with each transaction.  At offset 12 a flag byte is stored. The
value is 1 when the shm file has been initialized. All my examples are
initialized ( that means 1).  At offset 13 byte variable bigEndCksum is
stored. This is true if the WAL file uses big-ending check sums and 0 if the
WAL uses little-endian check sums. Because all my examples are little endian
i get here value 0. These 3 facts were expressed by construct like:
   <Bytes>000100</Bytes>
   <Pos>11</Pos>
In my examples the iChange counter is "low" and not reaching 4 GiB
limit. For samples reaching that limit the construct must shrink and become
like:
   <Bytes>0100</Bytes>
   <Pos>12</Pos>
Then the second copy part also must shrink and become like:
   <Bytes>0100</Bytes>
   <Pos>60</Pos>

At offset 16 an unsigned 4 byte integer mxFrame is stored. That is the value
for number of valid and committed frames in the WAL file.  This was
expressed by XML construct like:
   <Bytes>0000</Bytes>
   <Pos>18</Pos>
In my my examples i get "low" values, but when reaching 4 GiB limit this
construct will vanish. Then the second copy will also vanish. That was
expressed by XML construct:
   <Bytes>0000</Bytes>
   <Pos>66</Pos>

At offset 20 an unsigned 4 byte integer nPage is stored. That is the value
for the size of the database file in pages. This was expressed by XML
construct like:
   <Bytes>0000</Bytes>
   <Pos>22</Pos>
In my my examples i get "low" values, but when reaching 4 GiB limit this
construct will vanish. Then the second copy will also vanish. That was
expressed by XML construct:
   <Bytes>0000</Bytes>
   <Pos>70</Pos>

At offset 96 an unsigned 4 byte integer nBackfill is stored. That is the
value for number of WAL frames that have already been back filled into the
database by prior checkpoints. At offset 100 5 unsigned 4 byte integer
read-mark are stored. The first has value nil in all my examples. These 2
facts are expressed by XML construct like:
   <Bytes>000000000000</Bytes>
   <Pos>98</Pos>
In my my examples i get "low" values for back filled, but when reaching 4
GiB limit this construct will shrink. I do not know if first read-mark is
always nil. So i make there no assumption about the first. So at least the
above construct must shrink and become like:
   <Bytes>00000000</Bytes>
   <Pos>100</Pos>

The second and third read-mark values are different, but fourth and fifth
read-mark starting at offset 122 always have values 0xFFffFFff.  At offset
120 are unused space bytes set aside for 8 file locks. In my examples i
always found there nil bytes, what apparently means no file is locked by
operating system. These two facts were expressed by XML construct like:
   <Bytes>FFFFFFFFFFFFFFFF0000000000000000</Bytes>
   <Pos>112</Pos>
I do not know if other read-mark are possible at this place. So i keep
them. When i interpret the documentation right then non nil values are
possible for file lock bytes. So the above construct now becomes like:
   <Bytes>FFFFFFFFFFFFFFFF</Bytes>
   <Pos>112</Pos>

At offset 128 an unsigned 4 byte integer nBackfillAttempted is stored. That
is the value for the number of WAL frames that have attempted to be back
filled but which might not have been back filled successfully.  At offset
132 4 unused space bytes reserved for further expansion are stored. These
two facts were expressed by XML construct like:
   <Bytes>000000000000</Bytes>
   <Pos>130</Pos>
In my examples the value for attempts was the same as for back filled. So
that value was "low" but when reaching 4 GiB limit this construct will
shrink. So this now become like:
   <Bytes>00000000</Bytes>
   <Pos>132</Pos>

The header has a length of 136 bytes. So that means all short nil constructs
at higher offsets belong to some index parts. These looks like:
      <Pattern>
         <Bytes>0000</Bytes>
         <Pos>138</Pos>
      </Pattern>
      <Pattern>
         <Bytes>0000</Bytes>
         <Pos>142</Pos>
      </Pattern>
      ...

      <Pattern>
         <Bytes>000000</Bytes>
         <Pos>2045</Pos>
      </Pattern>
I assume that this are triggered by too few samples and not reaching 4 GiB
limit. So when scanning more samples with values reaching 4 GiB values these
last constructs will vanish. So i delete them.

With the new trid definition now my SHM examples are described (see appended
output/trid-v-new.txt). TrID definitions, and output are stored in archive
sqlite-shm_.zip. I hope that my definition can be used in future version of
triddefs.

Because for SHM samples i get so many different file name extensions. So i
expect that this also applies to definitions for corresponding SQLite
database and the related WAL files. I will try to handle this in a future
session.

With best wishes
Jörg Jenderek

Mark0

  • Administrator
  • Hero Member
  • *****
  • Posts: 2732
    • Mark0's Home Page
Re: sqlite-shm.trid.xml for SQLite Write-Ahead-Log Index shared memory *-shm
« Reply #1 on: January 12, 2023, 03:44:11 PM »
Thanks!

I found various samples with a lot of different extensions. Will probably leave the naming convention note on the remarks, and just put SHM as the ext.