Topic: updated mat-l5.trid.xml for Matlab MAT-File *.mat + level 7 variant

jenderek

Hello trid users,

A few days ago I inspected some Matlab examples with the file name extension mat.

For the level 4 variants I sent 2 definitions to recognize such "older"
matrices. Now I have run trid again on all my MAT examples.

But a few examples are misidentified as "SMS Material" by
mat-sms.trid.xml (see appended output/trid-v-old.txt).

For comparison I also ran the file utility (version 5.40 and
newer). It still describes most examples as "Matlab v5 mat-file" and one
example, testhdf5_7.4_GLNX86.mat, as "Matlab v7.0 mat-file" (see appended
output/file.txt).

For the examples, the page about MAT on the File Formats Archive Team
website was mentioned as a related URL. I also used it in the new variant
definitions, with lines like:
   <RefURL>
   http://fileformats.archiveteam.org/wiki/MAT
   </RefURL>

On that page the MAT-File Format documentation matfile_format.pdf is
mentioned. It explains the Level 5 MAT-File Format.


According to the documentation, the first 116 bytes of the header can contain
text data in human-readable form. This text typically provides information
that describes how the MAT-file was created. MAT-files created by MATLAB
include the following information in their headers:

1) Level of the MAT-file
2) Platform on which the file was created
3) Date and time the file was created

This often looks like the following string:
MATLAB 5.0 MAT-file, Platform: SOL2, Created on: Thu Nov 13 10:10:27 1997
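
As a rough illustration, the three fields could be pulled out of such a header text with a regular expression. This is only a sketch in Python: the helper name parse_mat_header_text and the regular expression are assumptions made for this example, not taken from the documentation, and they cover only the typical MATLAB-written form shown above.

```python
import re

# Hypothetical pattern: matches only the typical form written by MATLAB itself.
HEADER_RE = re.compile(
    r"MATLAB (?P<level>\d+\.\d+) MAT-file, "
    r"Platform: (?P<platform>[^,]+), "
    r"Created on: (?P<created>.+)"
)

def parse_mat_header_text(raw: bytes):
    """Return level, platform and creation time from the first 116 bytes, or None."""
    # Only the first 116 bytes are descriptive text; strip padding spaces/NULs.
    text = raw[:116].decode("ascii", errors="replace").rstrip(" \x00")
    match = HEADER_RE.match(text)
    return match.groupdict() if match else None

sample = b"MATLAB 5.0 MAT-file, Platform: SOL2, Created on: Thu Nov 13 10:10:27 1997"
info = parse_mat_header_text(sample.ljust(116))  # pad with spaces to 116 bytes
print(info)
```

Headers written by other tools may not follow this layout, in which case the helper simply returns None.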


This is used inside mat-l5.trid.xml by an XML construct like:

 <Bytes>4D41544C414220352E30204D41542D66696C652C20506C6174666F726D3A20</Bytes>
 <ASCII> M A T L A B   5 . 0   M A T - f i l e ,   P l a t f o r m :</ASCII>
 <Pos>0</Pos>

So I looked at the platform part (which is like: GLNX86 PCWIN PCWIN64 SOL2 Windows_7 nt posix)
and at the creation time. In a few examples, like malformed1.mat
and miuint32_for_miint32.mat, the leading comma (0x2C) before the platform
part is missing. And in one example not created by MATLAB,
one_by_zero_char.mat, the leading ASCII string looks like
"MATLAB 5.0 MAT-file, written by Octave 3.2.3, 2011-01-25 19:30:48
UTC". So here the platform part is missing and the creation time is stored in
another format.

After updating the definition with tridscan, the above pattern now becomes:

 <Bytes>4D41544C414220352E30204D41542D66696C65</Bytes>
 <ASCII> M A T L A B   5 . 0   M A T - f i l e</ASCII>
 <Pos>0</Pos>
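
The effect of shortening the pattern can be checked with a small Python sketch. It decodes both hex patterns and tests them against the start of three header texts. The "no comma" text is made up for illustration (the real bytes of malformed1.mat are not reproduced here), and a plain startswith() check is only a simplified stand-in for a match at Pos 0, not TrID's actual matching engine.

```python
# Old and new <Bytes> patterns from the definition, decoded from hex.
OLD_PATTERN = bytes.fromhex(
    "4D41544C414220352E30204D41542D66696C652C20506C6174666F726D3A20"
)
NEW_PATTERN = bytes.fromhex("4D41544C414220352E30204D41542D66696C65")

headers = {
    "typical MATLAB": b"MATLAB 5.0 MAT-file, Platform: SOL2, "
                      b"Created on: Thu Nov 13 10:10:27 1997",
    # Made-up text: only shows the missing comma before the platform part.
    "no comma (illustrative)": b"MATLAB 5.0 MAT-file Platform: posix",
    "Octave": b"MATLAB 5.0 MAT-file, written by Octave 3.2.3, "
              b"2011-01-25 19:30:48 UTC",
}

def matches_at_start(data: bytes, pattern: bytes) -> bool:
    """Simplified stand-in for a pattern match at offset 0."""
    return data.startswith(pattern)

# For each header: (matched by old pattern, matched by new pattern)
results = {
    name: (matches_at_start(h, OLD_PATTERN), matches_at_start(h, NEW_PATTERN))
    for name, h in headers.items()
}
for name, (old_hit, new_hit) in results.items():
    print(f"{name}: old={old_hit} new={new_hit}")
```

Under these assumptions only the typical MATLAB header matches the old pattern, while all three match the shorter one.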

One MAT example, testhdf5_7.4_GLNX86.mat, was not identified by TrID
because it starts with the ASCII string "MATLAB 7.0" instead of the string
"MATLAB 5.0" as in the other examples. So this is a variant with the higher
version level 7. This is also visible in the hexadecimal version field, which is
0x0200 in that case, whereas for level 5 this value is 0x0100.
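
The version value can be read with a few lines of Python. This sketch assumes the header layout as described in the Level 5 format document: 116 bytes of text, 8 bytes of subsystem data offset, then a 2-byte version field at offset 124 and a 2-byte endian indicator ("IM" or "MI") at offset 126. The function names and the synthetic test headers are hypothetical, so the offsets should be checked against real files before relying on them.

```python
import struct

def read_mat_version(header: bytes) -> int:
    """Return the 16-bit version field of a 128-byte MAT-file header."""
    if len(header) < 128:
        raise ValueError("need at least 128 header bytes")
    indicator = header[126:128]
    if indicator == b"IM":      # bytes appear swapped: little-endian writer
        order = "<"
    elif indicator == b"MI":    # bytes appear in order: big-endian writer
        order = ">"
    else:
        raise ValueError("unexpected endian indicator")
    (version,) = struct.unpack(order + "H", header[124:126])
    return version

def make_header(text: bytes, version: int, order: str) -> bytes:
    """Build a synthetic header for testing (not a real MAT-file)."""
    indicator = b"IM" if order == "<" else b"MI"
    return text.ljust(116) + bytes(8) + struct.pack(order + "H", version) + indicator

level5 = make_header(b"MATLAB 5.0 MAT-file", 0x0100, "<")
level7 = make_header(b"MATLAB 7.0 MAT-file", 0x0200, ">")
print(hex(read_mat_version(level5)), hex(read_mat_version(level7)))
```

Reading the endian indicator first avoids confusing 0x0100 with 0x0001 on files written with the other byte order.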

So I generated the variant mat-l7.trid.xml based on the level 5 definition.

With the second definition, my previously unrecognized MAT examples are now
described as "Matlab Level 5 MAT-File" or "Matlab Level 7 MAT-File" (see appended
output/trid-v-new.txt).

The TrID definitions, some examples and the output are stored in the archive
mat_trid.zip. I hope that my 2 XML files can be used in a future version of
triddefs.


According to the page about MAT-File Versions on the mathworks.com web site,
versions 7.3 and 6 also exist, but without a file specification or examples.
So maybe some additional trid definitions are needed for these versions.

With best wishes
Jörg Jenderek


Mark0

Re: updated mat-l5.trid.xml for Matlab MAT-File *.mat + level 7 variant
« Reply #1 on: July 31, 2021, 03:42:15 PM »
Thanks Jörg!
I'll add a more generic def too.