Hello TrID users,
some days ago I had trouble booting a UEFI system with GRUB and secure
boot. So I first looked at the files on the EFI partition mounted on /boot/efi. In a
subdirectory with a name like ubuntu I found files like BOOTX64.CSV with the file name
suffix CSV.
So I ran the TrID utility on my CSV samples. Most of the samples are not
recognized. Some "artificial" samples with a BOM (byte order mark) are described
as "Text - UTF-16 (LE) encoded" by txt-utf-16-le.trid.xml with MIME type
text/plain and file name suffix (.TXT) (see appended trid-old.txt and
trid-v-old.txt in output).
For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here all samples are
recognized. They are described as "Comma Separated Values" by PUID x-fmt/18
with MIME type text/csv. But the recognition here is based only on the file
name suffix (see appended droid-sbat.csv in output).
For comparison I also ran the file command (version 5.45) on these
samples. Here most CSV samples, especially all real-world examples, are not
recognized and are described as "data". A few like sbat.csv are described as
"CSV Unicode text" with UTF-16, little-endian encoding. A few are described as
"Unicode text" with UTF-16, little-endian encoding (see appended file-5.45.txt
in output). For the "recognized" samples the MIME type text/csv or
text/plain is shown (see appended file-i-5.45.txt in output). No file
name suffix is shown (see appended file-ext-5.45.txt in output).
Luckily I found a page about the UEFI shim boot loader on the GitHub web site, so I
use this. The reference URL in the new definition is expressed by a line like:
<RefURL>https://github.com/rhboot/shim/blob/main/SBAT.md</RefURL>
Unfortunately no precise file format for this CSV is given there.
Some more details can be found on Rod Smith's page about managing EFI boot
loaders for Linux and fallback. See:
https://www.rodsbooks.com/efi-bootloaders/fallback.html
The CSV samples contain comma-separated values (CSV). One line contains 4 data
elements separated by commas (filename, label, options, description).
Unfortunately the exact encoding is not mentioned, but apparently for real-world
examples this is UTF-16 (little endian) without a BOM (byte order
mark). But I do not know if this is always true. So I mention my observations in
the remark line. Because ASCII-like strings in the samples are stored as UTF-16 LE,
I got nil bytes at odd offsets. That is expressed by XML constructs like:
<Pattern>
<Bytes>00</Bytes>
<Pos>1</Pos>
</Pattern>
<Pattern>
<Bytes>00</Bytes>
<Pos>3</Pos>
</Pattern>
...
<Pattern>
<Bytes>00</Bytes>
<Pos>107</Pos>
</Pattern>
There may exist samples with the label or description field in
Chinese. Then the full 16 bits are used for the characters and the nil bytes at
higher offsets will vanish. At the beginning the file name of the UEFI bootable is
stored. For system files, all operating systems I know use English-based names. So
the nil bytes at lower offsets will probably always be present.
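The odd-offset observation can be checked on a sample with a few lines of Python. This is only a sketch: the function names are hypothetical, the offset limit 107 is taken from the pattern list above, UTF-16 LE without BOM is assumed as described, and the example line is made up to resemble the real-world samples (it is not copied from a real file).

```python
import csv
import io

def has_nil_at_odd_offsets(data, last_offset=107):
    """Return True if every odd offset from 1 to last_offset holds a nil byte."""
    if len(data) <= last_offset:
        return False  # sample too short to test all offsets
    return all(data[pos] == 0 for pos in range(1, last_offset + 1, 2))

def parse_boot_csv(data):
    """Decode as UTF-16 LE (assumed, no BOM) and split each line into fields."""
    text = data.decode("utf-16-le")
    return [row for row in csv.reader(io.StringIO(text)) if row]

# Made-up line in the layout filename,label,options,description
line = "shimx64.efi,ubuntu,,This is the boot entry for ubuntu\n"
sample = line.encode("utf-16-le")

print(has_nil_at_odd_offsets(sample))
print(parse_boot_csv(sample))
```

A sample with a non-ASCII label would fail the first check from the offset of that label onward, which is the reason for keeping the tested offsets low.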
At the beginning the file name of the bootable executable is stored. In real-world
examples I found similar strings here (shimx64.efi, shimia32.efi,
refind_x64.efi). The UEFI stuff is mainly pushed by Intel and Microsoft. The
standard bootable on such systems is named like BOOTX64.EFI or bootia32.efi. So I
assume that other partners use the Windows convention of marking such
bootable executables with the 4-character string .efi at the end of the file name.
That is expressed inside the global strings section by a line like:
<String>.'E'F'I</String>
So this is probably always true.
The description field in the real-world samples I inspected starts with the phrase
"This is the boot entry for " followed by the Linux distribution name (like redhat
or ubuntu). That is expressed inside the global strings section by a line like:
<String>T'H'I'S' 'I'S' 'T'H'E' 'B'O'O'T' 'E'N'T'R'Y' 'F'O'R</String>
There may exist some exotic samples with Chinese text. Then the above
construct would no longer be true.
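To see whether a given sample contains the two strings, a search like the following can be used. It is a sketch under assumptions: I assume that the apostrophes in the <String> lines stand for the nil bytes between the UTF-16 LE characters and that the match ignores case; the function name and the sample line are made up for illustration.

```python
def contains_utf16le_ascii(data, needle):
    """Case-insensitive search for an ASCII string stored as UTF-16 LE."""
    # bytes.upper() only changes ASCII letters; the nil bytes stay as they are.
    return needle.upper().encode("utf-16-le") in data.upper()

# Made-up line in the layout filename,label,options,description
sample = "shimx64.efi,ubuntu,,This is the boot entry for ubuntu\n".encode("utf-16-le")

print(contains_utf16le_ascii(sample, ".efi"))
print(contains_utf16le_ascii(sample, "This is the boot entry for"))
```

Whether TrID itself compares strings exactly this way is not something I verified; the code only mirrors how the strings appear in the definition.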
Most Linux (and Windows and macOS) text editors create ASCII files by default
for CSV samples, or, because of the nil bytes, do not handle these samples as
text when opening them. But since "Text - UTF-16 (LE) encoded" gets MIME type
text/plain instead of application/octet-stream, a similar text MIME type fits
here too. So I chose what is shown by the file command and DROID. This is expressed
by a line like:
<Mime>text/csv</Mime>
With the new definition such CSV samples are now recognized and described (see
appended trid-v-new.txt and trid-new.txt in output).
The TrID definitions, some samples and the output are stored in the archive
CSV_.zip. I hope that my definition can be used in a future version of triddefs.
With best wishes
Jörg Jenderek