Topic: db-mlocate.trid.xml for mlocate database

jenderek

db-mlocate.trid.xml for mlocate database
« on: September 09, 2023, 04:39:29 AM »
Hello trid users,

some weeks ago i handled some SQLite database samples. Often these have
the file name suffix DB, so i looked for such samples on my systems.
Unfortunately this suffix is also used for other database formats. In this
session i will handle the mlocate database. It is used by the "standard"
search utility locate on UNIX-like systems. Typically the database is stored
as /var/lib/mlocate/mlocate.db (on SUSE 13.2 and Raspbian/Debian 10), but
another database name and path can be chosen by option parameters. The
utility is described, for example, on the Wikipedia page:
     https://en.wikipedia.org/wiki/Locate_(Unix)

How to call this program is described in the Linux User Manual page
locate(1). You can find it on the web, for example at:
      https://manned.org/locate.1

The companion program that creates or updates a database is called updatedb.
This command is described by the Linux User Manual page updatedb(8). So i use
that URL as the reference inside the new trid definition db-mlocate.trid.xml.
This is expressed by lines like:
   <RefURL>
   https://manned.org/updatedb.8
   </RefURL>

So i ran the trid utility on such mlocate database samples. Most of my samples
are described as "Unknown!" or wrongly as "Mac Compact Pro archive" by
mac-cpt.trid.xml (see appended output/trid-v-old.txt).

For comparison i also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the examples are
wrongly described as "Thumbs DB file" with version "XP" and mime type
application/vnd.microsoft.windows.thumbnail-cache by PUID fmt/682 via the DB
extension.

For comparison i also ran the file command (version 5.45) on such samples.
Here the samples are described as "mlocate database" with additional
information (version, root and visibility, see appended output/file-5.45.txt)
and with the generic mime type application/octet-stream (see appended
output/file-i-5.44.txt). The correct file suffix is also not recognized (see
appended output/file-ext-5.45.txt).

Instead of the generic mime type i chose a user defined one. That is expressed
by a line like:
   <Mime>application/x-mlocate</Mime>

Luckily mlocate is open source. So with the help of the header file db.h i
tried to understand the format and improve the trid patterns. According to
that the samples start with the 8 byte magic \0mlocate. At offset 8 the
configuration block size is stored as a 32 bit unsigned integer in big endian.
In real examples i get sizes of some hundreds of bytes. The highest value i
observed was 0x258 and the lowest 0x52 (see appended output/file.tmp). So this
is expressed inside the Front block section by an XML construct like:

   <Bytes>006D6C6F636174650000</Bytes>
   <ASCII> . m l o c a t e . .</ASCII>
   <Pos>0</Pos>

In theory this size value could reach the 4 GB limit, but in reality this
should never occur. I will explain later why. On the other hand i consider it
a security issue. Maybe it is possible to append some malicious data (up to
nearly 4 GB) to the end of the typical configuration block and increase the
size value. If locate reads the configuration block and does not explicitly
check for unknown key value combinations, then you will get in trouble. So i
keep the 2 upper nil bytes of the size value in the pattern.
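
The effect of keeping the two upper size bytes in the pattern can be checked
with a small Python sketch. It is not part of the TrID definition; the helper
names and the two synthetic 12 byte prefixes are made up for illustration.
The pattern only matches while the stored size stays below 0x10000.

```python
import struct

# The 10 pattern bytes at offset 0: the 8 byte magic plus the two upper
# bytes of the big endian 32 bit configuration block size (required nil).
FRONT_PATTERN = bytes.fromhex("006D6C6F636174650000")


def front_block_matches(data: bytes) -> bool:
    """Return True if data starts with the 10 pattern bytes."""
    return data[:len(FRONT_PATTERN)] == FRONT_PATTERN


def conf_block_size(data: bytes) -> int:
    """Read the configuration block size stored at offset 8 (big endian)."""
    (size,) = struct.unpack_from(">I", data, 8)
    return size


# Synthetic prefixes (not real samples): magic followed by a size value.
small = b"\0mlocate" + struct.pack(">I", 0x258)
huge = b"\0mlocate" + struct.pack(">I", 0x10000)

print(front_block_matches(small), hex(conf_block_size(small)))
print(front_block_matches(huge), hex(conf_block_size(huge)))
```

A file with an inflated size value would therefore simply not be identified
by this pattern.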

At offset 12 the file format version is stored as a byte value. This is
expressed by an XML construct like:
   <Bytes>00</Bytes>
   <Pos>12</Pos>

At the moment the version value is 0, expressed by the constant DB_VERSION_0.
In my opinion a higher version will not occur, because mlocate is being
replaced by plocate.

At offset 14 a 2 byte pad for 32-bit total alignment is stored. In my examples
i got nil bytes here, so i assume that this is always true.
At offset 16 the NUL-terminated absolute path of the database root is
stored. In the standard configuration this is the 1 byte string /, but you can
force another one like /tmp/mlocate by the updatedb option -U. I also tried to
use a relative path, but the path stored in the database is translated into an
absolute path. So the stored path always starts with a slash character. These
observations are expressed by an XML construct like:
   <Bytes>00002F</Bytes>
   <ASCII> . . /</ASCII>
   <Pos>14</Pos>
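
The header layout described so far can be summarized by a minimal Python
sketch. It is only an illustration under assumptions: the names parse_header
and MlocateHeader are hypothetical, the sample bytes are synthetic, and the
byte at offset 13, which is not discussed above, is assumed to be the
visibility flag that the file command reports. That assumption should be
checked against db.h.

```python
import struct
from dataclasses import dataclass

MAGIC = b"\0mlocate"
# magic (8), conf size (4, big endian), version (1),
# assumed visibility flag (1), pad (2): 16 bytes in total
HEADER = struct.Struct(">8sIBB2s")


@dataclass
class MlocateHeader:
    conf_size: int
    version: int
    check_visibility: int  # assumption: meaning of the byte at offset 13
    root: str


def parse_header(data: bytes) -> MlocateHeader:
    """Decode the fixed 16 byte header and the NUL-terminated root path."""
    magic, conf_size, version, visibility, _pad = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not an mlocate database")
    end = data.index(b"\0", HEADER.size)  # end of the root path string
    root = data[HEADER.size:end].decode("utf-8", errors="replace")
    return MlocateHeader(conf_size, version, visibility, root)


# Synthetic example, not a real sample: size 0x52, version 0, root "/".
sample = MAGIC + struct.pack(">IBB2s", 0x52, 0, 1, b"\0\0") + b"/\0"
print(parse_header(sample))
```

For a real database the first few dozen bytes of the file are enough as
input, because the root path follows directly after the 16 byte header.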

The behaviour of the updatedb program can be influenced by options or
configuration variables. These are typically stored inside /etc/updatedb.conf
and explained by the manual page updatedb.conf(5). The 4 variables are
expressed inside the global strings section by lines like:
      <String>PRUNE_BIND_MOUNTS</String>
      <String>PRUNENAMES</String>
      <String>PRUNEPATHS</String>
      <String>PRUNEFS</String>

When we look for examples inside updatedb.conf we typically see key value
combinations like:
PRUNEFS="afs anon_inodefs auto autofs bdev binfmt binfmt_misc
cgroup cifs coda configfs cramfs cpuset debugfs devfs devpts devtmps
ecryptfs eventpollfs exofs futexfs ftpfs fusectl gfs gfs2 hostfs hugetlbfs
inotifyfs iso9660 jffs2 lustre  mqueue ncpfs nfs NFS nfs4 nfsd nnpfs
ocfs ocfs2 pipefs proc ramfs rpc_pipefs
securityfs selinuxfs sfs shfs smbfs sockfs spufs sshfs subfs supermount sysfs
tmpfs ubifs udf usbfs vboxsf vperfctrfs"
PRUNEPATHS="/tmp /var/tmp /var/cache /var/lock /var/run /var/spool
/cdrom /usr/tmp /proc  /sys /.snapshots /var/run/media"
PRUNENAMES=".git .hg .svn CVS"
PRUNE_BIND_MOUNTS="no"

When counting the bytes of all these variables and their keys we get about the
size of the configuration block, which is in the range of some hundreds of
bytes. I see no reason why this would change drastically, and especially why
it would increase.
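
The byte counting can be reproduced roughly with a short Python sketch. It
assumes that every key and every value is stored as a NUL-terminated string
and that each value list ends with one more NUL byte, which is how i read the
mlocate.db(5) manual page. The function name is hypothetical, the stored
values can differ from the text in updatedb.conf, and PRUNEFS is left out
only to keep the sketch short. So the result is an estimate, not the exact
value from a database.

```python
def estimated_conf_size(variables: dict[str, list[str]]) -> int:
    """Estimate the configuration block size in bytes (assumed layout)."""
    total = 0
    for name, values in variables.items():
        total += len(name.encode()) + 1                     # key plus NUL
        total += sum(len(v.encode()) + 1 for v in values)   # values plus NULs
        total += 1                                          # end of value list
    return total


# Values taken from the updatedb.conf example above (without PRUNEFS).
example = {
    "PRUNE_BIND_MOUNTS": ["no"],
    "PRUNENAMES": ".git .hg .svn CVS".split(),
    "PRUNEPATHS": ("/tmp /var/tmp /var/cache /var/lock /var/run /var/spool "
                   "/cdrom /usr/tmp /proc  /sys /.snapshots "
                   "/var/run/media").split(),
}

print(estimated_conf_size(example))
```

Even with the long PRUNEFS list added, the estimate stays far below the
0x10000 limit implied by the two nil bytes in the pattern.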

With the new trid definition my mlocate database examples are now recognized
and described (see appended trid-v-new.txt in output). The TrID definition,
some samples and the output are stored in the archive db_locate.zip. I hope
that my definition can be used in a future version of triddefs.

Unfortunately there exist other variants of the locate utility. So i do not
know if the other variants are also described by my definition. What i can say
is that plocate is similar to mlocate, but is not matched by the current
definition. I will try to handle this variant in a future session.

With best wishes
Jörg Jenderek


Mark0

Re: db-mlocate.trid.xml for mlocate database
« Reply #1 on: September 09, 2023, 03:19:39 PM »
Many thanks for the new def!