Author Topic: exe-cafe.trid.xml for Mac OS X Mach-O universal executables with CAFEBABE magic  (Read 1723 times)

jenderek

Hello TrID users,

some days ago I ran TrID on hundreds of Mac OS X Mach-O universal
executables found in the /sbin and /usr/bin directories of a Mac OS X
system.

Many are misidentified as "Mac OS X Mach-O universal Dynamically
linked shared Library" (*.dylib) by dylib-cafe.trid.xml, and a few,
such as db_codegen, are misidentified by java-class.trid.xml as Java
byte code (*.class). All are also described generically by
exe-ub.trid.xml as "Mac OS X Universal Binary executable" (see appended
output/trid-v-old.txt), because all such files start with the CAFEBABE
magic string.
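
The CAFEBABE clash can be illustrated with a short sketch. Java class
files store a minor and a major version (major 45 or higher) in bytes 4
to 7, while a universal binary stores its number of architectures
there, so reading those bytes as one big-endian integer usually
separates the two. The function name classify_cafebabe and the
threshold of 45 are assumptions made for illustration; this is not
claimed to be the exact rule used by the file command or by TrID.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # shared by universal binaries and Java class files


def classify_cafebabe(header: bytes) -> str:
    """Guess the kind of a CAFEBABE file from its first 8 bytes (heuristic)."""
    if len(header) < 8:
        return "unknown"
    magic, second = struct.unpack(">II", header[:8])
    if magic != FAT_MAGIC:
        return "unknown"
    # Java class: bytes 4-5 = minor version, bytes 6-7 = major version (>= 45).
    # Universal binary: bytes 4-7 = number of architectures.
    # Assumption: no real universal binary carries 45 or more architectures.
    return "mach-o-universal" if second < 45 else "java-class"
```

With the header from the samples discussed here, CAFEBABE00000002, the
sketch reports a universal binary, while a class file compiled for
Java 8 (major version 52) is reported as Java.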

The file command {see https://en.wikipedia.org/wiki/File_(command)}
describes my inspected examples correctly as "Mach-O universal
binary" with the subtype classification "executable" (see appended
output/file-5.39.txt), because the file command uses another method to
detect such executables.
So I ran tridscan on these executables to create the exe-cafe.trid.xml
definition file.

As a reference I again added the Wikipedia page about the Mach-O file
format. That is now expressed by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Mach-O</RefURL>

Instead of the generic application/octet-stream, the file command shows
a user-defined type (see appended output/file-i-5.39.txt). So I changed
the MIME type in the TrID definition file. This is now shown by the
updated line:
   <Mime>application/x-mach-binary</Mime>

As on other UNIX-like systems, executables normally have no file
extension. But in my /usr/bin directory different versions of the perl
executable exist (perl5.18, perl5.16, perl). There the version string
is appended to the main name. In the TrID definition this becomes a
line like:
   <Ext>16/18</Ext>

When looking into exe-cafe.trid.xml I see lines in the global string
section which may have been generated by coincidence, like:
   <String>060425214036Z</String>
Apparently all executables are digitally signed by Apple. This is
expressed by lines like:
   <String>*APPLE CODE SIGNING CERTIFICATION AUTHORITY0</String>
   <String>CRL.APPLE.COM</String>
   <String>APPLE CERTIFICATION AUTHORITY1301</String>
I am no Apple expert, so I do not know whether examples exist which
are not digitally signed or are signed by others. If other users know
about this, then please inform me or run tridscan on such samples to
improve the TrID definition file.

All my inspected samples are binaries with 2 architectures. This,
together with the CAFEBABE magic, is expressed by a pattern like:
   <Bytes>CAFEBABE00000002</Bytes>
   <Pos>0</Pos>

All my samples start with an x86 (i386, or x86_64 as in uuidgen) CPU
binary as the first one, at offset 0x1000. This is expressed by
patterns like:
   <Pattern>
      <Bytes>000007</Bytes>
      <Pos>9</Pos>
   </Pattern>
   <Pattern>
      <Bytes>0000030000100000</Bytes>
      <Pos>13</Pos>
   </Pattern>
So I hope that other users can improve the definition file by running
tridscan on binaries with other and more CPU architectures.
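
To see which header fields the patterns above cover, here is a minimal
sketch that decodes the universal (fat) header: a big-endian magic and
architecture count, followed by one 20-byte entry per architecture
(cputype, cpusubtype, offset, size, align). The constants follow Apple's
mach-o/fat.h and mach/machine.h; the function name parse_fat_header and
the CPU_NAMES table are hypothetical helpers, not part of TrID.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_ARCH_ABI64 = 0x01000000   # flag OR-ed into cputype for 64-bit CPUs
CPU_SUBTYPE_LOW = 0x00FFFFFF  # drops capability bits in the top byte
CPU_NAMES = {                 # hypothetical lookup table, deliberately short
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    12: "arm",
    12 | CPU_ARCH_ABI64: "arm64",
    18: "ppc",
    18 | CPU_ARCH_ABI64: "ppc64",
}


def parse_fat_header(data: bytes) -> list:
    """Return one dict per architecture listed in a universal binary."""
    if len(data) < 8:
        raise ValueError("too short for a fat header")
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("missing CAFEBABE magic")
    if len(data) < 8 + 20 * nfat_arch:
        raise ValueError("truncated architecture table")
    archs = []
    for index in range(nfat_arch):
        # Each fat_arch entry holds five big-endian 32-bit fields.
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + 20 * index)
        archs.append({
            "cpu": CPU_NAMES.get(cputype, hex(cputype)),
            "cpusubtype": cpusubtype & CPU_SUBTYPE_LOW,
            "offset": offset,
            "size": size,
            "align": align,
        })
    return archs
```

Read against this layout, the 000007 at position 9 is the low three
bytes of the first cputype (i386 or x86_64), and the pattern at position
13 covers the low three bytes of cpusubtype (3), the offset 0x00001000
and the first byte of the size field.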

With the new TrID definition file my Mac OS X Mach-O universal
executables are now described correctly (see appended
output/trid-new.txt). The TrID definition, some examples and the output
are stored in the archive exe-macho.zip. I hope that the new XML file
can be used in a future version of triddefs.

When comparing the new definition file with others like o-cafe.trid.xml
or dylib-cafe.trid.xml, they look much the same. The only difference
I see is expressed by 1 line like:
   <String>__MH_EXECUTE_HEADER'_</String>

The value 2 is declared as MH_EXECUTE in the Mach-O header file
loader.h and is used for demand paged executables. This method of
recognition is used by the file command. For my samples the offset to
the first mach_header was 0x1000 (4096). There I found MH_MAGIC. For
little endian this appears as MH_CIGAM, or 0xcefaedfe in hexadecimal.
At relative offset 12 the file type is stored as a 32-bit integer. For
demand paged executables this value is 2, declared via the MH_EXECUTE
constant. I do not know why tridscan does not recognize that structure.
So I created a variant, exe-cafe-id1.trid.xml, where I check for that
magic and file type value with additional constructs like:
   <Bytes>CEFAEDFE</Bytes>
   <Pos>4096</Pos>
   <Bytes>02000000</Bytes>
   <Pos>4108</Pos>

For dylibs the offset to the first Mach-O binary was also 0x1000 in
many cases, but not always. So there may exist samples with other
offsets, which of course are not recognized by the variant.
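
Outside of TrID's fixed-position patterns, the dependence on offset
4096 can be avoided by reading each architecture's offset from the fat
header and checking the file type there. The following is only a sketch
under that assumption; slice_filetypes and is_universal_executable are
hypothetical names, and it cannot be expressed as a TrID pattern.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
MH_EXECUTE = 0x2
# On-disk magic bytes of a thin Mach-O header -> struct byte-order prefix.
MACH_MAGICS = {
    bytes.fromhex("CEFAEDFE"): "<",  # 32-bit, little endian
    bytes.fromhex("CFFAEDFE"): "<",  # 64-bit, little endian
    bytes.fromhex("FEEDFACE"): ">",  # 32-bit, big endian
    bytes.fromhex("FEEDFACF"): ">",  # 64-bit, big endian
}


def slice_filetypes(data: bytes) -> list:
    """Return the Mach-O file type of every architecture (None if unreadable)."""
    if len(data) < 8:
        raise ValueError("too short for a fat header")
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("missing CAFEBABE magic")
    if len(data) < 8 + 20 * nfat_arch:
        raise ValueError("truncated architecture table")
    filetypes = []
    for index in range(nfat_arch):
        # The offset field sits 8 bytes into each 20-byte fat_arch entry.
        (offset,) = struct.unpack_from(">I", data, 8 + 20 * index + 8)
        order = MACH_MAGICS.get(data[offset:offset + 4])
        if order is None or len(data) < offset + 16:
            filetypes.append(None)
            continue
        # filetype is the fourth 32-bit field of mach_header (relative offset 12).
        (filetype,) = struct.unpack_from(order + "I", data, offset + 12)
        filetypes.append(filetype)
    return filetypes


def is_universal_executable(data: bytes) -> bool:
    """True if every architecture is readable and has file type MH_EXECUTE."""
    filetypes = slice_filetypes(data)
    return bool(filetypes) and all(t == MH_EXECUTE for t in filetypes)
```

For a sample whose first architecture starts at 0x1000 this reads the
same bytes as the fixed patterns at positions 4096 and 4108, but it
also follows other offsets.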

According to the header file there seem to exist about a dozen
different file types. The highest value is 0xb, the constant
MH_KEXT_BUNDLE for x86_64 kexts. At the moment only 4 different file
types (MH_OBJECT=0x1, MH_EXECUTE=0x2, MH_DYLIB=0x6 and MH_BUNDLE=0x8)
are described by dedicated TrID definitions.
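
For orientation, the file type constants from 0x1 to 0xb can be listed
next to the four values named above. The names and values are taken
from Apple's loader.h; the COVERED set only repeats the four types
mentioned in this post, and uncovered_filetypes is a hypothetical
helper.

```python
# Mach-O file type constants from loader.h, up to MH_KEXT_BUNDLE (0xb).
MH_FILETYPES = {
    0x1: "MH_OBJECT",
    0x2: "MH_EXECUTE",
    0x3: "MH_FVMLIB",
    0x4: "MH_CORE",
    0x5: "MH_PRELOAD",
    0x6: "MH_DYLIB",
    0x7: "MH_DYLINKER",
    0x8: "MH_BUNDLE",
    0x9: "MH_DYLIB_STUB",
    0xA: "MH_DSYM",
    0xB: "MH_KEXT_BUNDLE",
}

# File types that, according to this post, have a dedicated TrID definition.
COVERED = {0x1, 0x2, 0x6, 0x8}


def uncovered_filetypes() -> list:
    """Names of file types without a dedicated definition, in numeric order."""
    return [name for value, name in sorted(MH_FILETYPES.items())
            if value not in COVERED]
```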

With best wishes
Jörg Jenderek

Mark0

TrID/TrIDScan looks for patterns in the first 2KB, so that's why the one at offset 4096 isn't detected.
I'll definitely have to find a bigger sample set of this filetype.

Thanks!