Author Topic: 3 replacement macbin-?.trid.xml for MacBinary and 2 variants  (Read 4270 times)

jenderek

« on: December 11, 2017, 06:46:35 PM »
When I run TrID on my examples, files are often misidentified as "MacBinary 2
header" or similar (see appended bad/output/trid-old.txt).

The file(1) command does not classify those examples as MacBinary (see
appended bad/output/file.txt).

Because the 3 TrID definitions for MacBinary are not specific enough, I
replaced them. So macbinary-2.trid.xml is replaced by macbin-2.trid.xml, etc.

The current definitions contain a line like:
   <Ext></Ext>

That would mean that MacBinary files have no file name extension. That is
wrong. According to the Wikipedia page https://en.wikipedia.org/wiki/MacBinary
the "bin" extension is used. This is also what I see for most examples.

Following the links on that page, the extension "macbin" is used as well. This
is now expressed by the line:

   <Ext>BIN/MACBIN</Ext>

For further inspection I looked at the MacBinary II specification, found for
example at http://files.stairways.com/other/macbinaryii-standard-info.txt .
According to that document a MacBinary header contains 3 zero fill bytes. This
is expressed by the XML construct:

   <Bytes>00</Bytes>
   <Pos>0</Pos>
   ...
   <Bytes>00</Bytes>
   <Pos>74</Pos>
   ...
   <Bytes>00</Bytes>
   <Pos>82</Pos>

The TrID definition macbinary-2.trid.xml contains no patterns beyond these.
Because such null bytes also occur in master boot records and in CD images
without a boot loader, files like Samsung5_Z-pt.vmdk and MSDOS8.iso are always
misidentified as "MacBinary 2 header".
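
To illustrate why three zero bytes alone are such a weak pattern, here is a
minimal Python sketch. It is not how TrID works internally; the function name
and the all-zero test block are made up for the illustration.

```python
# Offsets of the three zero fill bytes named in the MacBinary II document.
ZERO_FILL_OFFSETS = (0, 74, 82)


def matches_zero_fill_only(header: bytes) -> bool:
    """Mimic a definition that checks nothing except the zero fill bytes."""
    if len(header) < 128:
        return False
    return all(header[pos] == 0 for pos in ZERO_FILL_OFFSETS)


# Hypothetical stand-in for the start of an image without boot code:
# 128 bytes of zeros, which is certainly not a MacBinary header.
blank_block = bytes(128)
print(matches_zero_fill_only(blank_block))  # True, i.e. a false positive
```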

At offset 122 the version is stored. For the oldest version this byte is null,
and for the newer versions the value starts at 129 (81h). Luckily there exist
only 3 major versions (1/II/III). So for MacBinary 2 I also added this XML
construct to the TrID definition:

   <Bytes>81</Bytes>
   <Pos>122</Pos>

At offset 123 the minimum version needed to read the file is stored.
Theoretically 6 version combinations may exist, but in the real world I only
found 3 combinations (0, 8181h and 8281h). So if further distinguishing is
needed, create branches like a definition for "MacBinary III with II
to read" with a construct like:
   <Bytes>8281</Bytes>
   <Pos>122</Pos>
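
The three combinations can be told apart with a few lines of Python. This is
only a sketch: the function name and the returned labels are chosen for the
illustration, and any combination not listed above is reported as unknown.

```python
def describe_version_bytes(header: bytes) -> str:
    """Classify bytes 122 (writer version) and 123 (minimum reader version)."""
    pair = bytes(header[122:124])
    labels = {
        bytes.fromhex("0000"): "MacBinary 1",
        bytes.fromhex("8181"): "MacBinary II",
        bytes.fromhex("8281"): "MacBinary III with II to read",
    }
    return labels.get(pair, "unknown combination")


# Hypothetical 128-byte header with only the version bytes filled in.
sample = bytearray(128)
sample[122:124] = bytes.fromhex("8281")
print(describe_version_bytes(sample))  # MacBinary III with II to read
```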

Not mentioned in the specification, but always found: the "mBIN" string for
MacBinary 3. This was already expressed by the construct:

   <Bytes>6D42494E</Bytes>
   <ASCII> m B I N</ASCII>
   <Pos>102</Pos>

At offset 99 the length of the Get Info comment is stored. According to the
docs apparently no program has implemented this feature, which means the
length is null. So for MacBinary 2 and 3 I added an expression like:

   <Bytes>0000</Bytes>
   <Pos>99</Pos>

For MacBinary 1 most variables at the end of the header are not defined.

According to the specs any bytes in the header that are not defined should be
set to zero. This was already expressed correctly for MacBinary 1 by a
construct like:

   <Bytes>0000000000000000000000000000000000000000000000000000000000</Bytes>
   <Pos>99</Pos>

Undefined bytes are now expressed for MacBinary 2 by an additional construct:

   <Bytes>0000000000000000</Bytes>
   <Pos>108</Pos>
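
Put together, the MacBinary 2 patterns quoted so far can be tried out in
Python. This sketch only uses the (offset, bytes) pairs shown in this post;
the real macbin-2.trid.xml may contain more, and the names below are made up.

```python
# (offset, expected bytes) pairs taken from the constructs quoted above.
MACBIN2_PATTERNS = [
    (0, bytes.fromhex("00")),                  # zero fill
    (74, bytes.fromhex("00")),                 # zero fill
    (82, bytes.fromhex("00")),                 # zero fill
    (99, bytes.fromhex("0000")),               # Get Info comment length
    (108, bytes.fromhex("0000000000000000")),  # undefined bytes
    (122, bytes.fromhex("81")),                # version of MacBinary II
]


def matches_all(header: bytes, patterns) -> bool:
    """True if every pattern is found at its offset in the header."""
    return all(
        bytes(header[pos:pos + len(expected)]) == expected
        for pos, expected in patterns
    )


# An all-zero block no longer matches, because offset 122 is not 81h.
print(matches_all(bytes(128), MACBIN2_PATTERNS))  # False

# Hypothetical header that carries only the version byte.
sample = bytearray(128)
sample[122] = 0x81
print(matches_all(sample, MACBIN2_PATTERNS))  # True
```

The second result shows that these patterns are still a necessary check, not
a sufficient one.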

At offset 83 the data fork length is stored as a big-endian long, and at 87
the resource fork length. The documentation contains a sentence which I
finally interpreted to mean that the length of the forks should be in the
range 0-007FFFFFh for MacBinary 1. That would mean an 8 MB size limit for
embedded forks. In a TrID definition I can only require that the upper byte is
null, which matches all forks with a size below 16 MB. So for MacBinary 1 I
add/change two XML constructs like:

   <Bytes>0000</Bytes>
   <Pos>82</Pos>
   ...
   <Bytes>00</Bytes>
   <Pos>87</Pos>

With macstream (version 2.0b3 22Oct1992, part of the macutils package on
Linux) and the -d option (files assumed to be plain text) I created from a big
(29096865 bytes) input the MacBinary 1 example NOOBS_lite_v2_1.zip.macbin
(see output of patched file command in 1/output/file-new.txt). This example is
no longer recognized as pure MacBinary 1 by the new TrID definition. At first
glance the size limitation seems to be wrong.

But Wikipedia says MacBinary 1 was released in 1985 and updated to
MacBinary 2 in 1987. I am no Mac expert, but in that era values like 8 MB were
in the size range of the whole RAM of a PC. So MacBinary 1 files of 8 MB or
more can not occur in real-world examples from that time. Real-world examples
have sizes of only some hundred KB (see 215500 bytes for
MacGzip-1.1.3.sea.bin in 1/output/file-new.txt). So I keep that size
limitation in the TrID definition for MacBinary 1.
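
The effect of the upper-byte check can be reproduced in Python. This is a
sketch under assumptions: the helper names are made up, and the two sizes
quoted above are simply used as hypothetical data fork lengths.

```python
import struct


def make_header(data_len: int, rsrc_len: int) -> bytes:
    """Build a hypothetical 128-byte header with only the fork lengths set."""
    header = bytearray(128)
    # Two big-endian unsigned 32-bit values at offsets 83 and 87.
    struct.pack_into(">II", header, 83, data_len, rsrc_len)
    return bytes(header)


def fork_lengths(header: bytes) -> tuple:
    """Return (data fork length, resource fork length)."""
    return struct.unpack_from(">II", header, 83)


def upper_bytes_are_zero(header: bytes) -> bool:
    """Equivalent of the patterns at offsets 83 and 87: forks below 16 MB."""
    return header[83] == 0 and header[87] == 0


small = make_header(215500, 0)
big = make_header(29096865, 0)
print(fork_lengths(small), upper_bytes_are_zero(small))  # (215500, 0) True
print(fork_lengths(big), upper_bytes_are_zero(big))      # (29096865, 0) False
```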

With the replacement macbin-1.trid.xml all DOS 2.0-3.2 backup parts like
ATIH_255.EXE are no longer misidentified (see bad/output/trid-new.txt).

But many examples are still misidentified, whereas the patched file command
identifies them correctly. So I looked at what the file command does. It also
checks whether the length of the file name is in the range 1-63 and whether
the file name starts with printable characters. So I added this information to
the remark line.
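
Such a file name test could look like the following Python sketch. It is not
the actual file(1) magic: the function name is made up, only the first
character is tested, and "printable" is assumed here to mean ASCII 20h-7Eh.

```python
def plausible_filename(header: bytes) -> bool:
    """Check the length byte at offset 1 and the first name byte at offset 2."""
    if len(header) < 3:
        return False
    name_len = header[1]
    if not 1 <= name_len <= 63:
        return False
    # Assumption: printable means the ASCII range 0x20..0x7E.
    return 0x20 <= header[2] <= 0x7E


# Hypothetical header for a file called "TERM.MAC".
name = b"TERM.MAC"
sample = bytearray(128)
sample[1] = len(name)
sample[2:2 + len(name)] = name
print(plausible_filename(sample))      # True
print(plausible_filename(bytes(128)))  # False, name length is 0
```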

All misidentifications for MacBinary 2 vanish with macbin-2.trid.xml (see
bad/output/trid-new.txt).

For MacBinary 3, identification by macbin-3.trid.xml is the same as by the old
macbinary-3.trid.xml, but the recognition rate has changed a little bit (see
3/output/trid-new.txt).


Another approach to overcome the weak pattern, especially for MacBinary 1, is
to look for the apple type (normally expressed as four characters) stored at
offset 65 and/or the apple file creator stored at offset 69, using TrID
definitions of the form macbin-gen-*.trid.xml.

So an example like gnupg-logo-eps.macbin is also identified as "Macintosh
Encapsulated Postscript (MacBinary)" by macbin-gen-eps.trid.xml, which
contains the additional XML construct:
   <Bytes>45505346</Bytes>
   <ASCII> E P S F</ASCII>
   <Pos>65</Pos>
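
To look at these two fields by hand, a few lines of Python are enough. This is
a sketch only: the function name is made up, the creator value in the sample
is a placeholder, and decoding as Mac Roman is an assumption.

```python
def type_and_creator(header: bytes) -> tuple:
    """Return the 4-character apple type (offset 65) and creator (offset 69)."""
    file_type = bytes(header[65:69]).decode("mac_roman")
    creator = bytes(header[69:73]).decode("mac_roman")
    return file_type, creator


# Hypothetical header: type bytes as in the construct above, creator made up.
sample = bytearray(128)
sample[65:69] = bytes.fromhex("45505346")
sample[69:73] = b"????"
print(type_and_creator(sample))  # ('EPSF', '????')
```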

The reference link used in such TrID definitions is now dead:
   <RefURL>http://www.macdisk.com/macsigen.php3</RefURL>
The updated link should have no "3" at the end and look like:
   <RefURL>http://www.macdisk.com/macsigen.php</RefURL>

Furthermore, such definitions should contain a line for the MIME type like:
   <Mime>application/x-macbinary</Mime>

I am no Mac expert, so I am not sure about the following. According to
macbin-gen-eps.trid.xml such MacBinary files have the file name extension
"eps" by the line:
   <Ext>EPS</Ext>

But for my example I still found the typical MacBinary "macbin" extension,
which would be expressed by a line like:

   <Ext>BIN/MACBIN</Ext>
   
and only the source file has the extension "eps", like my example
gnupg-logo.eps (see 1/output/file-new.txt). So I leave the existing
macbin-gen*.trid.xml definitions untouched.

So I only created 2 additional definitions with apple type for my examples
where they were missing.

GohuFont.bin is now identified as "Macintosh font suitcase (MacBinary)" (see
3/output/trid-new.txt) by macbin-gen-font.trid.xml by also looking for apple
type FFIL via additional XML construct:

   <Bytes>4646494C</Bytes>
   <ASCII> F F I L</ASCII>
   <Pos>65</Pos>


TERM.MAC.bin is now identified as "Macintosh Stuffit Deluxe archive
(MacBinary)" (see 2/output/trid-new.txt) by
macbin-gen-stuffitDeluxe.trid.xml by also looking for apple type SITD via
additional XML construct:

   <Bytes>53495444</Bytes>
   <ASCII> S I T D</ASCII>
   <Pos>65</Pos>


The TrID definitions, some examples and the output are stored in the archive
macbina.zip. I hope that my 5 XML files can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: 3 replacement macbin-?.trid.xml for MacBinary and 2 variants
« Reply #1 on: December 12, 2017, 03:03:41 PM »
Many thanks Joerg!