Recent Posts

21
TrID File Identifier / bundle-cafe.trid.xml for Mac OS X Mach-O universal bundle
« Last post by jenderek on August 25, 2020, 09:51:43 AM »
Hello trid users,

Some days ago I ran TrID on hundreds of Mac OS X Mach-O universal bundles
(*.bundle). Some of the inspected samples, like Bzip2.bundle, are misidentified as
"Mac OS X Mach-O universal Dynamically linked shared Library" by
dylib-cafe.trid.xml, and all of them are also described generically by exe-ub.trid.xml
as "Mac OS X Universal Binary executable" (see appended output/trid-v.txt).

The file command {See https://en.wikipedia.org/wiki/File_(command)}
describes most of my inspected examples correctly as "Mach-O universal
binary" with the sub type classification "bundle" (see appended
output/file-5.39.txt), because the file command uses another method to detect
such bundles.
So I ran tridscan on such bundle files to create the bundle-cafe.trid.xml
definition file.

Here I again added the Wikipedia page about the Mach-O file format as reference.
That is now expressed by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Mach-O</RefURL>

Instead of the generic application/octet-stream, the file command shows a
user-defined type (see appended output/file-i-5.39.txt). So I changed the mime
type in the TrID definition file. This is now shown by the updated line:
   <Mime>application/x-mach-binary</Mime>

When looking in bundle-cafe.trid.xml, I see lines in the global string section
that were obviously generated by chance, like:
   <String>8'''__LINKEDIT</String>
I was not able to remove such strings, although the definition file is based on
186 bundles. Probably the reason is that many bundles belong to the same package,
like Python or Perl, so the same string phrases apparently occur often.

All my inspected samples are binaries with 2 architectures, with the i386 CPU
binary first. This, together with the CAFEBABE magic, is expressed by a pattern
like:
   <Bytes>CAFEBABE0000000200000007000000030000100000</Bytes>
So I hope that other users can improve the definition file by running
tridscan on bundles with other and more CPU architectures.
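To see what the pattern bytes stand for, they can be decoded with Python's struct module. This is a small sketch assuming Apple's universal (fat) layout of big-endian 32-bit fields: fat_header (magic, architecture count) followed by one fat_arch entry per architecture (CPU type, CPU sub type, offset, size, align). The function name is hypothetical.

```python
import struct

# Pattern bytes from bundle-cafe.trid.xml (21 bytes; the last byte is the
# first byte of the "size" field of the first fat_arch entry).
PATTERN = bytes.fromhex("CAFEBABE0000000200000007000000030000100000")

def decode_fat_prefix(data: bytes) -> dict:
    """Decode fat_header plus the first three fields of the first fat_arch.

    Assumed big-endian layout:
      fat_header: magic (4), nfat_arch (4)
      fat_arch:   cputype (4), cpusubtype (4), offset (4), size (4), align (4)
    """
    magic, nfat_arch, cputype, cpusubtype, offset = struct.unpack_from(">5I", data, 0)
    return {
        "magic": hex(magic),
        "nfat_arch": nfat_arch,    # number of embedded architectures
        "cputype": cputype,        # 7 = CPU_TYPE_X86 (i386)
        "cpusubtype": cpusubtype,  # 3 = CPU_SUBTYPE_I386_ALL
        "offset": hex(offset),     # file offset of the first embedded Mach-O
    }

print(decode_fat_prefix(PATTERN))
```

The decoded offset of 0x1000 for the first slice matches the offset observed in many samples.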

With the new TrID definition file my Mac OS X Mach-O universal bundles
are now described correctly (see appended output/trid-new.txt). The TrID
definition, some examples and output are stored in archive bundle.zip. I
hope that the new XML file can be used in a future version of triddefs.

In the mach_header, file type value 8 (stored at offset 12) is declared as
MH_BUNDLE, which is used for dynamically bound bundle files. That method of
recognition is used by the file command. I had hoped that I could adopt this
method for TrID, but the mach_header structure appears at varying offsets.
In many cases this offset was 0x1000.
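A program, unlike a fixed TrID pattern, can follow the varying offset: the fat_arch table gives the offset of each embedded Mach-O, and the file type sits 12 bytes into each mach_header. The following Python sketch assumes the classic 32-bit fat_arch table and that each slice's mach_header is in the byte order revealed by its magic (little-endian for i386 and x86_64). The function and constant names are hypothetical.

```python
import struct

MH_DYLIB = 6
MH_BUNDLE = 8

# mach_header magic (0xFEEDFACE / 0xFEEDFACF) as raw bytes; the byte order
# of the magic tells the byte order of the rest of the header.
_BIG = {b"\xfe\xed\xfa\xce", b"\xfe\xed\xfa\xcf"}
_LITTLE = {b"\xce\xfa\xed\xfe", b"\xcf\xfa\xed\xfe"}

def slice_filetypes(data: bytes) -> list:
    """Return (slice_offset, filetype) for every architecture in a fat binary."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a universal (fat) binary")
    result = []
    for i in range(nfat_arch):
        # Each fat_arch entry is 20 bytes and follows the 8-byte fat_header.
        _cpu, _sub, offset, _size, _align = struct.unpack_from(">5I", data, 8 + 20 * i)
        mh_magic = data[offset:offset + 4]
        if mh_magic in _BIG:
            order = ">"
        elif mh_magic in _LITTLE:
            order = "<"
        else:
            raise ValueError(f"no mach_header at offset {offset:#x}")
        # filetype is the 4th 32-bit field of mach_header, i.e. offset + 12.
        (filetype,) = struct.unpack_from(order + "I", data, offset + 12)
        result.append((offset, filetype))
    return result
```

Called on the bytes of a universal bundle, it would be expected to report file type 8 (MH_BUNDLE) for every slice, wherever the slices start.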

With best wishes
Jörg Jenderek
22
Hello trid users,

Some days ago I ran TrID on dozens of Mac OS X Mach-O universal Dynamically
linked shared Libraries (*.dylib), Mach-O bundles (*.bundle), Mach-O
executables without filename extension and thousands of Java byte-code files
(*.class).

All inspected samples are described by exe-ub.trid.xml as "Mac OS X
Universal Binary executable". So all Java byte-code files are misidentified with
40% probability as such executables. The other way round, an executable like
"file" is misidentified with a 60% rate as Java byte-code (see appended
output/trid-v-old.txt).

The file command {See https://en.wikipedia.org/wiki/File_(command)} also has
some difficulties distinguishing them, but it identifies the file example correctly
as "Mach-O universal binary" (see appended output/file-k-5.39.txt), because
the file command uses another method to detect such binaries.

TrID identifies all such examples by the 4-byte magic string at the beginning.
This is identical for Java byte-code and Mach-O universal binaries. So in both
definition files this is expressed in the pattern section by an XML construct like:
   <Bytes>CAFEBABE</Bytes>
   <Pos>0</Pos>

The difference between the two definition files is that java-class.trid.xml
contains an additional line in the global string section like:
   <String>JAVA</String>
So every Mach-O universal binary that contains the string Java will be
misidentified as a Java class file.

So I looked at how the file command distinguishes them and then tried to adopt
this method for TrID in the replacement java-class-new.trid.xml.

According to the Java class file page on Wikipedia, the major version is stored
at offset 6 as a 2-byte value in big-endian order. As of 2020 it ranges from 45
(=0x2D for JDK 1.1) to 58 (=0x3A for Java SE 14). The file command takes values
above 30 as characteristic for Java class files. Unfortunately TrID has no
construct for a "greater than" test, but the upper byte of the major version is
always null, assuming that Java never reaches a major version number of 256 or
higher, which I consider very unlikely. So the major version part is expressed
by an additional XML construct like:
   <Bytes>00</Bytes>
   <Pos>6</Pos>

At offset 4 the minor version is stored as a 2-byte value in big-endian
order. Theoretically a minor version of 65535 can exist. But after testing
some thousand Java class files I only found low values like 0 or 3. So it is
very unlikely that a minor version number of 256 or higher exists, and the upper
byte of the minor version is then also null. That is expressed together with the
magic string by an XML construct like:
   <Bytes>CAFEBABE00</Bytes>
   <Pos>0</Pos>

Unfortunately this was not sufficient to distinguish Mach-O from Java class
files. At offset 8 the constant pool count is stored, which is apparently
always non-zero. That can be verified by the output of a patched file command
(see appended output/file.txt). Furthermore I now mention this in the remark
line instead of the old comment.
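For comparison, the header conditions behind java-class-new.trid.xml can be written as a small Python check. This sketches the reasoning only, not TrID's matching engine (which compares fixed bytes); the function name is hypothetical.

```python
import struct

CAFEBABE = b"\xca\xfe\xba\xbe"

def java_class_conditions(data: bytes) -> dict:
    """Evaluate the Java class header conditions discussed above.

    Big-endian offsets: 4 minor_version, 6 major_version, 8 constant_pool_count.
    """
    if len(data) < 10:
        raise ValueError("need at least 10 header bytes")
    minor, major, cp_count = struct.unpack_from(">HHH", data, 4)
    return {
        "magic": data[:4] == CAFEBABE,
        # Pattern CAFEBABE00 at offset 0: upper byte of minor version is null.
        "minor_upper_null": data[4] == 0,
        # Pattern 00 at offset 6: upper byte of major version is null.
        "major_upper_null": data[6] == 0,
        # Not a fixed byte value, so only mentioned in the remark line.
        "constant_pool_nonzero": cp_count != 0,
        "version": (major, minor),
    }
```

Run on the Mach-O pattern from the bundle post (i386 as first architecture), the two byte conditions hold as well, and only the constant pool count, which there falls on the upper half of the CPU type field, comes out zero. That illustrates why the byte patterns alone did not separate the formats.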

In the current java-class.trid.xml the mime type application/java-byte-code was
used. But when looking at the IANA site, such a mime type is not officially
registered. So I replaced it with a user-defined one (starting with x-) that
is shown by the file command (see appended output/file-ik-5.39.txt) and by
http://extension.nirsoft.net/class. That is now expressed by a line like:
   <Mime>application/x-java-applet</Mime>

In exe-ub.trid.xml there is no reference URL. So in the replacement definition I
added the Wikipedia page about the Mach-O file format. That is now expressed by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Mach-O</RefURL>

From the documentation it becomes visible that not only Mac OS X
Universal Binary executables like sgdisk and file without file name extension
are described, but also dynamically linked shared libraries (file name
extension dylib) and Universal Binary bundles (file name
extension bundle). So I removed the phrase "executable" and replaced it by the phrase
"(generic)" in the replacement definition. I also changed the TrID definition name to
ub-gen.trid.xml instead of exe-ub.trid.xml. The possible file name
extensions are now shown by a line like:
   <Ext>DYLIB/BUNDLE/O</Ext>

Instead of the generic mime type application/octet-stream, I now use the
user-defined one that is shown by the file command (see appended output/file-ik-5.39.txt).
That is now expressed by a line like:
   <Mime>application/x-mach-binary</Mime>

According to the file command, the number of architectures is stored at offset 4
as a 4-byte value in big-endian order. Typical values are 2 for Mac OS X
Universal Binaries for the i386 and x86_64 architectures. Often I also find
examples with value 1 for one of these x86 architectures. I also found a few
samples with value 3 and 1 example with value 4, like
libclang_rt.asan_watchos_dynamic.dylib (see appended output/file.txt).

There seem to exist only about two dozen CPU architectures for embedding.
The highest number listed by the file command is 18 for ppc. So in the worst or
biggest case a Mach-O universal binary apparently contains at most 18 binaries.
So the file command considers values below 20 as characteristic for Mach-O files,
compared with "high" values for Java classes. Such "low" values mean that
the 3 upper bytes of the architecture count are null. That is expressed
together with the magic string by an XML construct like:
   <Bytes>CAFEBABE000000</Bytes>
   <Pos>0</Pos>
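The distinction can be sketched by reading the four bytes at offset 4 as one big-endian number: for a Mach-O universal binary this is the architecture count, while for a Java class file it combines the minor version (high half) and the major version (low half), so any class file with major version 45 or higher yields a value above 30. The thresholds 20 and 30 come from the text above; this is a simplified illustration, not the file command's actual magic rules, and the function name is hypothetical.

```python
import struct

def classify_cafebabe(data: bytes) -> str:
    """Rough CAFEBABE disambiguation using the 4-byte value at offset 4."""
    if len(data) < 8 or data[:4] != b"\xca\xfe\xba\xbe":
        return "not CAFEBABE"
    (value,) = struct.unpack_from(">I", data, 4)
    if value < 20:
        # Small value: number of embedded architectures.
        return f"Mach-O universal binary with {value} architecture(s)"
    if value > 30:
        # For a class file this is (minor_version << 16) | major_version.
        minor, major = value >> 16, value & 0xFFFF
        return f"Java class, version {major}.{minor}"
    return "unknown"
```

For example, a class file starting with CAFEBABE0003002D is reported as version 45.3, while CAFEBABE00000002 is reported as a universal binary with 2 architectures.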

According to the file command, the CPU type of the first architecture is stored
at offset 8 as a 4-byte value. The upper byte seems to be 1 for 64-bit architectures
and 0 for 32-bit architectures. The remaining 3 bytes apparently contain low
values like 7 for x86 CPUs and 0xC for ARM CPUs. The highest mentioned number is 18
for PowerPC. That means the 2 bytes in the middle (offsets 9 and 10) are null.
That is now expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>9</Pos>
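Together, the two byte patterns of the replacement ub-gen.trid.xml can be mirrored as below. In a Java class file byte 10 is the tag of the first constant pool entry, and constant pool tags are never zero, so a class file cannot satisfy the second comparison. The function name is hypothetical; the sketch covers only the fixed-byte comparisons, not TrID's scoring.

```python
def matches_ub_gen(data: bytes) -> bool:
    """Mirror the two fixed-byte patterns of ub-gen.trid.xml."""
    # Pattern 1: CAFEBABE000000 at offset 0 (magic + 3 null upper bytes of nfat_arch).
    p1 = data[0:7] == bytes.fromhex("CAFEBABE000000")
    # Pattern 2: 0000 at offset 9 (middle bytes of the first CPU type).
    p2 = data[9:11] == b"\x00\x00"
    return p1 and p2
```

Both an i386-first universal binary and one starting with a 64-bit ARM slice (CPU type 0x0100000C) match, while a Java 8 class header with minor version 0 passes the first comparison but fails the second.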

That was sufficient for me to distinguish Java class files from Mach-O.
In Mach-O universal binaries the CPU sub type is stored at offset 12 as a 4-byte
value, so the same consideration as for the CPU type can be applied to the sub type if needed.

With the 2 TrID definitions all my Mac OS X Universal Binaries and Java
class files are described correctly (see appended output/trid-new.txt) and the
recognition rate is raised (see appended output/trid.txt).

The TrID definitions, some examples and output are stored in archive
class_macho.zip. I hope that the 2 XML files can be used in a future version
of triddefs as replacements after additional tests with other exotic CPU
architectures.

With best wishes
Jörg Jenderek
23
Definitions DB change log / Re: Current
« Last post by Mark0 on August 24, 2020, 03:10:13 AM »
Updated:
  • PrintFox/Pagefox bitmap (640x400) (BG/BIN)
  • PrintFox/Pagefox bitmap (320x200) (BS/BIN)
  • Mac OS X Mach-O universal Dynamically linked shared Library (DYLIB)
  • Eureka/Mercury Configuration (EKA/CFG)
  • PrintFox/Pagefox bitmap (640x800) (PG/BIN)
Added:
  • Team F1 Circuit data (CDR)
  • Eureka/Mercury Help (HLP)
  • QMovie Video (QMV)
  • Eureka/Mercury Report (RPT)
  • Lunar DSP script manifest (XML)
25
Hello trid users,

Some days ago I ran TrID on hundreds of Mac OS X Mach-O universal
Dynamically linked shared Libraries (*.dylib). These should be described by
dylib-cafe.trid.xml as "Mac OS X Mach-O universal Dynamically linked shared
Library". Some inspected samples, like libasprintf.0.dylib or
libX11-xcb.1.dylib, are only described generically by exe-ub.trid.xml as "Mac
OS X Universal Binary executable" (see appended output/trid-v.txt).

The file command {See https://en.wikipedia.org/wiki/File_(command)}
describes most of my inspected examples correctly as "Mach-O universal
binary" with the sub type classification "dynamically linked shared library"
(see appended output/file-5.39.txt), because the file command uses another
method to detect such libraries.

The definition file dylib-cafe.trid.xml does not contain a reference URL.
So I added the Wikipedia page about the Mach-O file format. That is now
expressed by a line like:
   <RefURL>https://en.wikipedia.org/wiki/Mach-O</RefURL>

Instead of the generic application/octet-stream, the file command shows a
user-defined type (see appended output/file-i-5.39.txt). So I changed the mime
type in the TrID definition file. This is now shown by the updated line:
   <Mime>application/x-mach-binary</Mime>

When looking in dylib-cafe.trid.xml, I see lines in the global string section
that were obviously generated by chance, like:
   <String>TION</String>
   <String>D_INFO</String>
So with the help of the grep command I searched on a Mac OS X system for
dylib libraries without such patterns. When I ran tridscan on such samples,
many patterns vanished from the updated TrID definition file.
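The sample selection done with grep can be sketched in Python as well: keep only the files whose bytes contain none of the suspicious strings, so that tridscan cannot pick them up. Matching here is case-sensitive, like grep without -i; the function name and the pattern list (the two strings quoted above) are illustrative assumptions.

```python
from pathlib import Path

# Strings that should not end up in the definition (from the listing above).
SUSPICIOUS = [b"TION", b"D_INFO"]

def samples_without(paths, patterns):
    """Return the paths whose content contains none of the byte patterns."""
    keep = []
    for path in paths:
        data = Path(path).read_bytes()
        if not any(p in data for p in patterns):
            keep.append(path)
    return keep
```

The resulting list can then be passed to tridscan as the sample set.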

With the updated TrID definition file most Mach-O universal Dynamically
linked shared Library files are described correctly (see appended
output/trid-new.txt). The TrID definition, some examples and output are stored
in archive dylib.zip. I hope that the updated XML file can be used in a
future version of triddefs.

Wikipedia also mentions Mach-O formats with the file name extensions bundle and
o. As far as I can see, there is no TrID definition for such variants. I
will try to handle this in a future session.

After looking deeper into the documentation it is visible that the file type is
stored at offset 12 of the mach_header as a 4-byte value (in the byte order of
the embedded architecture, so little-endian for i386 and x86_64). The value
ranges from 1 to 11. Value 6 is declared as MH_DYLIB, which is used for
dynamically bound shared libraries. That method of recognition is used by the
file command. So maybe it is better to arrange the TrID definitions for Mach-O
based on this method instead of on the file name extension.

A few examples like AMDil_r700.dylib are described by the file command with
the sub type classification "bundle", which matches file type value 8,
declared as MH_BUNDLE for dynamically bound bundle files.
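An arrangement by file type rather than by extension could start from a lookup table. The constant names and values below are those of Apple's mach-o/loader.h (MH_OBJECT = 1 up to MH_KEXT_BUNDLE = 11); the mapping to definition files is a hypothetical illustration using only the definition names mentioned in these posts.

```python
# File type values from Apple's <mach-o/loader.h> (mach_header offset 12).
MH_FILETYPES = {
    0x1: "MH_OBJECT",       # relocatable object file (.o)
    0x2: "MH_EXECUTE",      # demand paged executable file
    0x3: "MH_FVMLIB",       # fixed VM shared library file
    0x4: "MH_CORE",         # core file
    0x5: "MH_PRELOAD",      # preloaded executable file
    0x6: "MH_DYLIB",        # dynamically bound shared library
    0x7: "MH_DYLINKER",     # dynamic link editor
    0x8: "MH_BUNDLE",       # dynamically bound bundle file
    0x9: "MH_DYLIB_STUB",   # shared library stub for static linking only
    0xA: "MH_DSYM",         # companion file with only debug sections
    0xB: "MH_KEXT_BUNDLE",  # x86_64 kernel extension
}

# Hypothetical mapping from file type to the TrID definitions named in the posts.
DEFINITION_FOR = {
    "MH_DYLIB": "dylib-cafe.trid.xml",
    "MH_BUNDLE": "bundle-cafe.trid.xml",
}

def suggest_definition(filetype: int) -> str:
    """Pick a definition by Mach-O file type, falling back to the generic one."""
    name = MH_FILETYPES.get(filetype, "unknown")
    return DEFINITION_FOR.get(name, "ub-gen.trid.xml")
```

With such a table, a sample like AMDil_r700.dylib whose slices report file type 8 would go to the bundle definition despite its dylib extension.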

With best wishes
Jörg Jenderek
26
Definitions DB change log / Re: Current
« Last post by Mark0 on August 19, 2020, 12:43:24 AM »
Updated:
  • ZX Spectrum CHR$ bitmap (CH$)
  • Open Publication Structure eBook (EPUB)
  • PKZIP mini-self-extracting 16bit DOS executable (EXE)
  • GigaScreen bitmap (HLR)
  • XLD4 bitmap (Q4)
  • CompuServe RLE bitmap (hi-res) (RLE)
  • CompuServe RLE bitmap (med-res) (RLE)
Added:
  • BSP bitmap (BSP)
  • Telvox CODEC archive (v3.1x) (CDC/DQT)
  • Telvox CODEC archive (v3.2x) (CDC/DQT)
  • Image Cytometry Standard header (ICS)
  • MIDI Instrument Definition File (IDF)
  • ArtMaster88 bitmap (IMG)
  • PMG Designer bitmap (PMD)
  • XLD4 Graphic Data Document bitmap (Q4D)
Deleted:
27
Microsoft links are basically a lost cause, unfortunately, as they routinely reorganize their pages. I will probably link to the Wikipedia MIDI page.
Thanks!
28
Hello trid users,

Some days ago I looked at my RIFF based file collection. When I ran TrID on
my few Microsoft Instrument Definition Files with file name extension idf,
all were described only as Generic RIFF container by riff.trid.xml instead of by
idf.trid.xml (see appended output/trid-v.txt).

So I ran tridscan to update the definition file and looked at the differences.
I created my samples with Microsoft's IDFEDIT.EXE on a system running a German
Windows 98. So in examples like GENERAL.IDF the German translation Schlagzeug
occurs instead of the English word percussion. So the following line vanished
from the global string section of the TrID definition file:
   <String>PERCUSSION</String>

The instrument names often look like "Universal-MIDI-Instrument", but I can
create an example like My-idf2.idf without the phrase MIDI. This is probably
a rare case, but that sample is still a valid IDF file. So the following line
also vanished from the global string section of the TrID definition file:
   <String>MIDI</String>

Because IDF files are used to remap the General MIDI audio patch set, instead
of the generic application/octet-stream or the RIFF-specific application/x-riff
mime type I now show a user-defined one by a line like:
   <Mime>audio/x-idf</Mime>

The site msdn.microsoft.com does not exist any more, so the reference URL is
redirected to another web presence. That is now expressed by an updated URL line
like:
   <RefURL>https://docs.microsoft.com/en-us/previous-versions/visualstudio/aa264328(v=vs.60)</RefURL>

With the updated TrID definition all my IDF examples are now detected (see
appended output/trid-new-v.txt). The TrID definitions, some examples and output
are stored in archive idf.zip. I hope that my updated XML file can be used
in a future version of triddefs.

With best wishes
Jörg Jenderek

29
Thanks!
30
Hello trid users,

Some days ago I looked at my ZIP archive collection. When I ran TrID on
more than 300 Open Publication Structure eBooks with file name
extension epub, most were described correctly by epub.trid.xml. But a
few, like welcome.epub, were only identified as ZIP compressed archive by
ark-zip.trid.xml (see appended output/trid-v.txt).

For comparison I also ran other file identification tools.
The tool DROID (see http://digital-preservation.github.io/droid/)
describes welcome.epub as "epub format" by signature id 483 (see
output/epub-droid.csv).

The epub documents are just zip containers. This is expressed by the
XML construct:
   <Bytes>504B0304</Bytes>
   <ASCII> P K</ASCII>
   <Pos>0</Pos>
So I looked at the output of the decompression tool 7-zip with the list and show
technical information options (see appended output/7z-l-slt.txt) and at the output
of unzip with the verbose zipinfo option (see output/unzip-Zv.txt).

All epub samples contain a member with the 8-byte name mimetype. Its content
is stored uncompressed and consists of the 20-byte mime type string
"application/epub+zip". In most samples this archive member is the
first one, but not in welcome.epub, where it is the last one. This is only a
problem for recognition by the current file(1) command.
Who does not obey the conventions followed by most others? Adobe.
The "strange" epub is part of Adobe Digital Editions (version 4.5 for
me) and is found in the "My Digital Editions" sub directory inside the
Documents directory in my HOME directory.

In the "bad" examples the mimetype member has some extra fields: a
universal time field (ID 0x5455) and a Unix UID/GID field (ID 0x7875).
According to the page about the Zip file format on Wikipedia, the local file
header ends with the file name (for the inspected samples
mimetype), followed by an optional extra field of m bytes. That
is followed by the member data (for the inspected samples the mime type
string application/epub+zip). So for "good" examples without extra
fields we find these 2 ASCII strings concatenated. That was expressed
in the global section by a line like:
   <String>MIMETYPEAPPLICATION</String>
In welcome.epub the file name mimetype is followed by an extra field byte
sequence starting with UT (that is 55 54 for the time field), and the type
string application/epub+zip appears some bytes later. This is now
expressed in the updated epub.trid.xml by 2 lines like:
   <String>APPLICATION</String>
   <String>MIMETYPE</String>
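The effect of the extra field on the local file header can be reproduced with a small sketch. It builds two stored mimetype members by hand, one without and one with a 9-byte universal time extra field (ID 0x5455), and checks whether "mimetypeapplication" appears as one contiguous string. The header layout follows the local file header described on the Wikipedia Zip page; the time/date fields are left at zero, and the helper names are hypothetical.

```python
import struct
import zlib

MIME = b"application/epub+zip"

def local_header(name: bytes, payload: bytes, extra: bytes = b"") -> bytes:
    """Build a stored (method 0) ZIP local file header followed by its data."""
    fixed = struct.pack(
        "<4sHHHHHIIIHH",
        b"PK\x03\x04",         # local file header signature
        10,                    # version needed to extract
        0,                     # general purpose flags
        0,                     # compression method: stored
        0, 0,                  # last modification time, date (zero here)
        zlib.crc32(payload),   # CRC-32 of the data
        len(payload),          # compressed size
        len(payload),          # uncompressed size
        len(name),             # file name length (n)
        len(extra),            # extra field length (m)
    )
    return fixed + name + extra + payload

# Universal time extra field: header ID 0x5455 ("UT" on disk), data size 5,
# flags 1 (modification time present), 4-byte Unix time.
UT_EXTRA = struct.pack("<HHBI", 0x5455, 5, 1, 0)

good = local_header(b"mimetype", MIME)
bad = local_header(b"mimetype", MIME, UT_EXTRA)
```

In the "good" member the name is directly followed by the data, while in the "bad" one the 9 bytes starting with UT sit in between, so only the two separate strings remain as reliable matches.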

Similar considerations apply to the central directory file
header. At the end of an entry the file name is stored, followed by the
optional extra field and file comment. Afterwards comes the
next record, starting with the ZIP magic string PK. So for "good" examples
without extra fields and file comments we find the concatenated string
mimetypePK. That was expressed in the global section by a line like:
   <String>MIMETYPEPK</String>
In the "bad" welcome.epub the file name mimetype is here followed by the extra
field byte sequence, again starting with UT for the time stamps. So in the updated
TrID definition the global string section now contains:
   <String>MIMETYPE</String>
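The same check can be sketched for the central directory: the code builds a central directory entry for mimetype, followed by the start of the end of central directory record (as when mimetype is the last member, like in welcome.epub), and looks at what directly follows the file name. The layout follows the central directory file header on the Wikipedia Zip page; fields irrelevant here are zero, and the helper name is hypothetical.

```python
import struct
import zlib

MIME = b"application/epub+zip"
UT_EXTRA = struct.pack("<HHBI", 0x5455, 5, 1, 0)  # "UT" universal time field
END = b"PK\x05\x06"  # start of the end of central directory record

def central_entry(name: bytes, payload: bytes, extra: bytes = b"", comment: bytes = b"") -> bytes:
    """Build one central directory file header for a stored member at offset 0."""
    fixed = struct.pack(
        "<4sHHHHHHIIIHHHHHII",
        b"PK\x01\x02",                 # central directory file header signature
        0x031E,                        # version made by (Unix, 3.0)
        10,                            # version needed to extract
        0, 0,                          # flags, compression method (stored)
        0, 0,                          # last modification time, date
        zlib.crc32(payload),           # CRC-32
        len(payload), len(payload),    # compressed and uncompressed size
        len(name), len(extra), len(comment),
        0, 0,                          # disk number, internal attributes
        0,                             # external attributes
        0,                             # offset of the local file header
    )
    return fixed + name + extra + comment

good = central_entry(b"mimetype", MIME) + END
bad = central_entry(b"mimetype", MIME, UT_EXTRA) + END
```

Without extra field and comment the name runs straight into the next PK signature; with the time field it runs into UT instead, which is why only MIMETYPE is kept as a string.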

With the updated TrID definition all examples are now detected (see
appended output/trid-new-v.txt). The TrID definitions, some examples and
output are stored in archive epub.zip. I hope that my XML file can be
used in a future version of triddefs.

With best wishes
Jörg Jenderek