Hello TrID users,
some weeks ago I installed Linux Mint 21.1. The main partition was 89 GB,
but I ran out of space and reached 100% usage. I tried to delete some
unnecessary files, but the freed space was immediately filled again. That
was annoying. Nowadays you get a notification for every little thing, but
not for the really important ones. The problem for me was deciding what can
be deleted. I know I can remove old log files, backup files, downloads and
cache files, but I could not do this in a guided way with BleachBit or
Czkawka because the graphical desktop environment no longer started. I tried
command line tools like du, df and ncdu, but these do not work reliably on a
btrfs file system. Furthermore, it is difficult to find many small files.
For this purpose I tried many different disk space visualization tools,
running from a rescue system or another operating system. For me, tools like
baobab, k4dirstat and Filelight are not useful: I get a colored map of my
disk, but the colors are not correlated to file type, which is what I
needed. At least gdmap has this feature, but only a few file types are
predefined by extension. So I spent one day adding more colors for "big" or
"many" file types that were shown in gray. The other solution was the tool
SequoiaView, but this requires a Wine environment. Nearly 2 GB were occupied
by files beneath /var/log/journal. When I looked into the subdirectory named
after the "machine id", I found many similar files.
So I ran the trid utility on my journal examples. All samples are described
as "systemd journal" with the generic mime type application/octet-stream by
journal-sysd.trid.xml. Only one suffix, JOURNAL, is listed, but some samples
have the suffix JOURNAL~ (see appended output/trid-v-old.txt).
For comparison I also ran the file format identification utility DROID (see
https://sourceforge.net/projects/droid/). Here none of the samples is
recognized.
For comparison I also ran the file command (version 5.44 and a newer one
with linux,v 1.85 2023/07/17) on these samples. Here the samples are
described as "Journal file", and more information is shown as well,
including the status (offline, online, archived). Some samples are described
as empty. All of these empty samples have the suffix JOURNAL~ (see appended
file-5.44.txt and file-ext.txt in output). The newest version also
interprets more flag bits (keyed hash siphash24, compressed zstd; see
appended output/file.txt). For non-empty variants it also shows a timestamp
like "Sat Jul 8 20:52:22 2023" and the number of entries. The latter can be
verified by a command line like:
journalctl --file=user-1000.journal | wc -l
The newest version now also shows application/x-linux-journal instead of the
generic application/octet-stream (see appended output file-i.txt).
So I updated the TrID definition journal-sysd.trid.xml. Now two suffixes are
possible. That is expressed by a line like:
<Ext>JOURNAL/JOURNAL~</Ext>
The mime type is now given by a line like:
<Mime>application/x-linux-journal</Mime>
Then I ran tridscan on my empty samples to generate
journal-sysd-empty.trid.xml. As reference I used the Linux manual page
systemd-journald.service(8) instead of the page about the Journal File
Format on the freedesktop.org web site. That is expressed by a line like:
<RefURL>https://man7.org/linux/man-pages/man8/systemd-journald.service.8.html</RefURL>
The Linux manual page systemd-journald.service(8) says that if the daemon is
stopped uncleanly, or if the files are found to be corrupted, they are
renamed using the ".journal~" suffix, and the daemon starts writing to a new
file. Unfortunately it does not explain how this is expressed inside the
journal structure itself. The suffix journal~ is not used the way I
intuitively expected. So by trial and error I can only say that for the
empty variants, whether offline or online, I always got the suffix journal~.
So the file name suffix is now given by a line like:
<Ext>JOURNAL~</Ext>
After running tridscan I looked at the generated patterns, tried to
understand why they come out this way, and tried to refine them.
The first byte sequence is the magic signature[8] LPKSHHRH. At offset 8 the
compatible_flags are stored as a 32-bit little-endian value. Here only the
first bit can be set, where HEADER_COMPATIBLE_SEALED means value 1. In my
samples I get only value 0.
At offset 12 the incompatible_flags are stored as a 32-bit little-endian
value. According to the newest documentation only the lowest 5 bits can be
set here, where the highest value 16 means HEADER_INCOMPATIBLE_COMPACT. In
my samples I get the value 0Ch = 12 = 8+4, that is, compressed zstd (8) and
keyed hash siphash24 (4). These facts were expressed by a first XML
construct like:
<Bytes>4C504B5348485248000000000C000000</Bytes>
<ASCII> L P K S H H R H</ASCII>
<Pos>0</Pos>
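The flag arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of TrID: the function name decode_header_start is made up, and the values 1 and 2 for the XZ and LZ4 flags come from the journal file format documentation rather than from my samples.

```python
import struct

# Bit values of incompatible_flags. 4, 8 and 16 are the ones discussed
# in the text; 1 (XZ) and 2 (LZ4) are taken from the format documentation.
INCOMPATIBLE_FLAGS = {
    1: "COMPRESSED_XZ",
    2: "COMPRESSED_LZ4",
    4: "KEYED_HASH",
    8: "COMPRESSED_ZSTD",
    16: "COMPACT",
}


def decode_header_start(data: bytes):
    """Decode signature and both flag words from the first 16 bytes.

    Hypothetical helper for illustration only.
    """
    # "<" = little endian without padding, 8s = signature, I = 32-bit unsigned
    signature, compatible, incompatible = struct.unpack_from("<8sII", data, 0)
    if signature != b"LPKSHHRH":
        raise ValueError("not a systemd journal signature")
    names = [name for bit, name in INCOMPATIBLE_FLAGS.items()
             if incompatible & bit]
    return compatible, incompatible, names


# The 16 bytes from the <Bytes> line above.
sample = bytes.fromhex("4C504B5348485248000000000C000000")
print(decode_header_start(sample))
```

For the 16 bytes shown in the construct this yields compatible_flags 0 and incompatible_flags 12, split into KEYED_HASH and COMPRESSED_ZSTD.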
Assuming that other flag values can also occur, I get 3 constructs like:
<Pattern>
<Bytes>4C504B5348485248</Bytes>
<ASCII> L P K S H H R H</ASCII>
<Pos>0</Pos>
</Pattern>
<Pattern>
<Bytes>000000</Bytes>
<Pos>9</Pos>
</Pattern>
<Pattern>
<Bytes>000000</Bytes>
<Pos>13</Pos>
</Pattern>
According to the documentation, reserved[7] is stored at offset 17. All
fields marked as "reserved" must be initialized with 0 when writing. So this
is expressed by an XML construct like:
<Pattern>
<Bytes>00000000000000</Bytes>
<Pos>17</Pos>
</Pattern>
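To show what splitting the first construct buys, here is a rough Python sketch of how such position/bytes patterns behave. It is not how TrID itself is implemented; matches_patterns and make_header are hypothetical helpers, and treating the unconstrained byte at offset 16 as the state field follows the format documentation.

```python
import struct

# Each pattern is (offset, required bytes), mirroring <Pos> and <Bytes>.
# Bytes not covered by any pattern (offsets 8, 12 and 16) are wildcards.
PATTERNS = [
    (0, bytes.fromhex("4C504B5348485248")),   # signature "LPKSHHRH"
    (9, bytes.fromhex("000000")),             # upper 3 bytes of compatible_flags
    (13, bytes.fromhex("000000")),            # upper 3 bytes of incompatible_flags
    (17, bytes.fromhex("00000000000000")),    # reserved[7]
]


def matches_patterns(data: bytes, patterns=PATTERNS) -> bool:
    """Return True if every pattern is found at its fixed offset."""
    return all(data[pos:pos + len(raw)] == raw for pos, raw in patterns)


def make_header(compatible: int, incompatible: int, state: int) -> bytes:
    """Build the first 24 header bytes for testing (hypothetical helper)."""
    return (b"LPKSHHRH"
            + struct.pack("<IIB", compatible, incompatible, state)
            + bytes(7))


# Flags seen in my samples, and a different combination of low flag bits.
print(matches_patterns(make_header(0, 12, 1)))
print(matches_patterns(make_header(1, 16, 2)))
# A flag value above 255 touches the bytes the pattern fixes to zero.
print(matches_patterns(make_header(0, 0x100, 0)))
```

The first two calls match because only the low byte of each flag word varies; the last one does not, since any flag bit beyond the lowest 8 would break the zero bytes at offset 13.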
According to the documentation, the header size is stored at offset 88 as
the 8-byte little-endian integer header_size. This seems to be always 100h.
Afterwards come the field arena_size and so on until the end of the header.
For empty journals all these fields are apparently zero. That is expressed
by a last XML construct like:
<Bytes>00010000000000000000000000000000000000000000000000000000000000
<Pos>88</Pos>
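The same check can be sketched in Python for anyone who wants to test their own samples. This is only an approximation of the pattern: the hypothetical function looks_like_empty_journal reads just header_size and the arena_size field that follows it, and the "arena_size is zero for empty journals" rule is my observation from these samples, not something the documentation states.

```python
import struct


def looks_like_empty_journal(header: bytes) -> bool:
    """Check header_size == 100h and arena_size == 0 (hypothetical helper)."""
    if len(header) < 104:
        return False
    # Two consecutive 64-bit little-endian integers starting at offset 88.
    header_size, arena_size = struct.unpack_from("<QQ", header, 88)
    return header_size == 0x100 and arena_size == 0


# Synthetic 256-byte header for illustration, not a real journal file.
header = bytearray(0x100)
header[0:8] = b"LPKSHHRH"
header[88:96] = (0x100).to_bytes(8, "little")   # bytes 00 01 00 00 00 00 00 00
print(looks_like_empty_journal(bytes(header)))

header[96:104] = (4096).to_bytes(8, "little")   # non-zero arena_size
print(looks_like_empty_journal(bytes(header)))
```

With a real file, the first 256 bytes read in binary mode can be passed in the same way.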
With the updated definition and the new one, all my journal samples are now
described. The TrID definitions, some samples and the output are stored in
journal-mint.zip. I hope that my definitions can be used in a future version
of triddefs.
With best wishes
Jörg Jenderek