Hello TrID users,
some days ago I ran TrID inside the LibreOffice directories. The sub
directory gallery holds the files of the Office galleries. The
samples with the file name extension sdv are described as "Generic OLE2 /
Multistream Compound File" by docfile.trid.xml (see appended
sdv_big/output/trid-old.txt).
The newest file command {en.wikipedia.org/wiki/File_(command)}
describes some of the sdv files as "StarOffice Gallery view". See
appended sdv_big/output/file-new.txt.
So I ran tridscan to generate a definition file sdv.trid.xml. Unfortunately
I found no documentation about this gallery file format. There is at least
a note on the sub page about StarOffice binary formats on the file formats
archive team site. This is expressed by the reference line:
<RefURL>
http://fileformats.archiveteam.org/wiki/StarOffice_binary_formats </RefURL>
Apparently such SDV files always seem to start with the same 44-byte
sequence. This is expressed by the first XML construct:
<Bytes>
D0CF11E0A1B11AE1000000000000000000000000000000003B000300FEFF09000600
</Bytes>
<Pos>0</Pos>
The first 8 bytes are the magic of a Generic OLE2 / Multistream Compound
File, as used by docfile.trid.xml, where the first XML construct looks like:
<Bytes>D0CF11E0A1B11AE1</Bytes>
<Pos>0</Pos>
I kept the remaining bytes after offset 8. According to the OLE2
documentation the 2-byte sequence FEFF means little endian. If these
two bytes were swapped, this would be a big-endian variant. The
0900 is the exponent to base 2. This means the block size used is
512=2**9. The byte sequence 3B000300 is the OLE2 format version 3.59
(minor version 0x3B = 59, major version 3).
The byte sequence 0600 means the size of a short sector is 64=2**6.
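As a rough illustration of the field values above, here is a minimal Python sketch that decodes them. The header is rebuilt from the byte values quoted in this post (8-byte magic, 16 zero bytes, then five 16-bit fields starting at offset 24). The function name parse_ole2_fields is made up for this example.

```python
import struct

# Header bytes rebuilt from the values quoted above: the 8-byte magic,
# 16 zero bytes, then the five 16-bit fields that start at offset 24.
MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
header = MAGIC + bytes(16) + bytes.fromhex("3B000300FEFF09000600")

def parse_ole2_fields(data):
    """Decode the 16-bit little-endian fields at offsets 24..33."""
    if data[:8] != MAGIC:
        raise ValueError("not an OLE2 compound file")
    minor, major, byte_order, sector_shift, short_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "version": f"{major}.{minor}",
        # The stored bytes FE FF read as 0xFFFE in little-endian order.
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,
        "short_sector_size": 1 << short_shift,
    }

print(parse_ole2_fields(header))
```

For the bytes above this reports version 3.59, little endian, a sector size of 512 and a short-sector size of 64.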
I removed the remaining patterns, because they do not seem to be
interesting. For example the end of the first block is padded with FF
bytes.
According to the OLE2 documentation the root storage has the UTF-16 name
"Root Entry". This is described in the global string section by the line
<String>R'O'O'T' 'E'N'T'R'Y</String>
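To show where the apostrophes in that line come from, here is a small Python sketch. It assumes that the apostrophe stands for a zero byte and that strings are stored in upper case. The function name to_trid_string is made up for this example.

```python
def to_trid_string(text):
    """Render text as UTF-16LE in the notation used above.

    Assumption: an apostrophe stands for a zero byte and the
    string is stored in upper case.
    """
    raw = text.upper().encode("utf-16-le")
    # Drop the final zero byte so the result ends on the last letter.
    return raw[:-1].decode("latin-1").replace("\x00", "'")

print(to_trid_string("Root Entry"))
```

Under these assumptions the output is the same string as in the line above.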
The SDV files recognized by the file command are always bigger than the
others. In other words, their file size is apparently larger than 2048
bytes.
When inspecting such big SDV files with software like Michal Mutl's
Structured Storage Viewer {https://www.mitec.cz/ssv.html}, we see that
the big samples contain some sub directories, or streams in OLE2 terms. The
names of these streams always seem to start with dd followed by
decimal digits. This is described in the global string section by a line like:
<String>D'D'2</String>
And if the streams are not empty, they seem to start with the string
SVRLE. This is expressed in the global string section by a line like:
<String>SVRLE</String>
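The three global strings mentioned so far can be checked against a file with a few lines of Python. This is only a sketch: it assumes that a global string may occur anywhere in the file, that the comparison ignores case, and that an apostrophe stands for a zero byte. The helper names and the sample bytes are made up for this example and do not come from a real SDV file.

```python
BIG_SDV_STRINGS = ["R'O'O'T' 'E'N'T'R'Y", "D'D'2", "SVRLE"]

def trid_string_to_bytes(s):
    """Turn the apostrophe notation back into raw bytes (assumed: ' = 0x00)."""
    return s.replace("'", "\x00").encode("latin-1")

def contains_all(data, strings=BIG_SDV_STRINGS):
    """True if every string occurs somewhere in data, ignoring ASCII case."""
    upper = data.upper()
    return all(trid_string_to_bytes(s) in upper for s in strings)

# Synthetic sample: a UTF-16 root name, a stream name dd2, some stream data.
sample = (
    "Root Entry".encode("utf-16-le")
    + bytes(8)
    + "dd2".encode("utf-16-le")
    + bytes(8)
    + b"SVRLE"
)
print(contains_all(sample))
print(contains_all(sample.replace(b"SVRLE", b"XXXXX")))
```

The first call reports a match; the second does not, because the SVRLE marker is missing.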
I also looked at older versions. The same gallery format is used in the old
StarOffice 5.2, dated about May 2000. The format is still used in the current
LibreOffice version 6.3.2.2, and it is also used in OpenOffice
4.1.6.
Some sites on the net call the SDV samples "OpenOffice.org
gallery storage", but such files were already used in the predecessor
software StarOffice. And on ask.libreoffice.org the sdv parts are
called "gallery view". That phrase is also used by the file command.
So I chose as describing text a line like:
<FileType>StarOffice Gallery view (big)</FileType>
I used the user-defined MIME type shown by the file command, in the line:
<Mime>application/x-star-sdv</Mime>
With the new TrID definition sdv.trid.xml all big SDV examples are now
described (see appended output/sdv_big/trid-new-v.txt).
The files of the other SDV variant have a file size of 2048 bytes. When
inspecting such samples with Structured Storage Viewer, we see that
they do not contain any streams. That means no characteristic
strings like SVRLE or stream names starting with dd are found.
So I ran tridscan to generate a TrID definition file
sdv-small.trid.xml, which seems to be unique for such SDV files and
does not match other OLE2 files. I rearranged the XML constructs to get
a clean TrID definition.
The first XML construct is nearly the same as in the other variant, but
with an additional 01000000 byte sequence at offset 44. This means the total
number of sectors used for the sector allocation table is always 1.
That is not surprising, because with a file size of 2048 and a block size of
512 such SDV samples consist of only 4 blocks.
According to the output of the file command (see appended
output\file-new-soft.txt) the directory sector identifier (SecID)
is 1 for most samples and 2 for some. This number is stored as a 4-byte
integer at offset 48. So the byte at that offset is 1 or 2, we get a gap in
the constant byte sequence, and the second XML construct looks like
<Bytes>0000000000000000100000FEFFFFFF00000000</Bytes>
<Pos>49</Pos>
That also means that in most cases block 3 is the directory structure and
in some cases block 4 is the directory structure, which starts with
the OLE2 typical UTF-16 string "ROOT ENTRY".
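The step from SecID to block number can be written out in Python. This sketch assumes the usual OLE2 layout in which sector 0 begins directly after the 512-byte header, so the file offset is (SecID + 1) * 512. The helper make_header only builds an otherwise empty test header and is made up for this example.

```python
import struct

SECTOR_SIZE = 512  # 2**9, the block size taken from the header

def directory_offset(header):
    """File offset of the first directory sector, from the SecID at offset 48."""
    (sec_id,) = struct.unpack_from("<i", header, 48)
    # Assumption: sector 0 starts right after the 512-byte header.
    return (sec_id + 1) * SECTOR_SIZE

def make_header(sec_id):
    """Hypothetical helper: an empty 512-byte header with only the SecID set."""
    buf = bytearray(SECTOR_SIZE)
    struct.pack_into("<i", buf, 48, sec_id)
    return bytes(buf)

for sec_id in (1, 2):
    offset = directory_offset(make_header(sec_id))
    block = offset // SECTOR_SIZE + 1  # blocks counted from 1
    print(sec_id, offset, block)
```

SecID 1 gives offset 1024 (block 3) and SecID 2 gives offset 1536 (block 4).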
The last 2 blocks are always constant. So the patterns at positions 1024
and 1536 start like
<ASCII> R . o . o . t . . E . n . t . r . y .
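The header patterns of the small variant, including the gap at offset 48, can be pictured as (position, bytes) pairs. The following Python sketch only mimics the idea of position-bound patterns and is not how TrID itself is implemented. The names PATTERNS, matches and sample_header are made up for this example, and the test header is synthetic.

```python
# (position, bytes) pairs taken from the values quoted in this post.
PATTERNS = [
    (0, bytes.fromhex("D0CF11E0A1B11AE1") + bytes(16)
        + bytes.fromhex("3B000300FEFF09000600")),
    (44, bytes.fromhex("01000000")),
    # Offset 48 (SecID low byte, 1 or 2) is left out on purpose.
    (49, bytes.fromhex("000000 00000000 00100000 FEFFFFFF 00000000")),
]

def matches(data, patterns=PATTERNS):
    """True if every pattern is found at its fixed position."""
    return all(data[pos:pos + len(raw)] == raw for pos, raw in patterns)

def sample_header(sec_id_byte):
    """Hypothetical helper: a synthetic 512-byte header with the patterns set."""
    buf = bytearray(512)
    for pos, raw in PATTERNS:
        buf[pos:pos + len(raw)] = raw
    buf[48] = sec_id_byte
    return bytes(buf)

print(matches(sample_header(1)), matches(sample_header(2)))
```

Both synthetic headers match, because the byte at offset 48 is not covered by any pattern.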
With the second TrID definition sdv-small.trid.xml all small SDV
examples are now described (see appended output/trid-new-v.txt). The 2 TrID
definitions, some examples and the output are stored in the archive sdv.zip.
I hope that my XML files can be used in a future version of triddefs.
Furthermore, a gallery consists of several files. For every gallery, beside
the SDG file, there seem to exist files with the same base name but with
the extensions sdv, thm and sometimes str. The last one is not identified by
TrID. I will try to handle that file type in a future session.
With best wishes
Jörg Jenderek