Author Topic: updated dll-os2-no-dos-stub.trid.xml + 5 replacements for *.dll *.sys  (Read 1355 times)

jenderek


Hello trid users,

Some days ago, just out of interest, I inspected some OS/2 disks. I ran TrID
on libraries with the DLL file name extension and on device drivers with the
SYS file name extension.

Many, like GCC335.DLL, are described correctly by dll-os2-no-dos-stub.trid.xml
as "OS/2 Dynamic Link Library (no DOS stub)". But some examples, like
UFAT32.DLL, are not recognized (see appended output/trid-v-old.txt).

For comparison I also ran other identification tools on these examples.
Newer versions of the file command (version >5.40) identify them as "LX
executable", "for OS/2" and "(library)" (see appended output/file-new.txt).

So I ran tridscan on these 6 undetected samples to update the TrID
definition file dll-os2-no-dos-stub.trid.xml. Then I looked at the
differences. Some nil patterns in the front block section vanished, like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>199</Pos>
   </Pattern>
And some patterns became shorter.

Instead of the generic MIME type application/octet-stream I chose another
one. On Windows systems the MIME type application/x-msdownload is registered
for DLL, but such OS/2 libraries are not made for Windows systems. So I chose
the user-defined MIME type displayed by the file command (see appended
output/file-i-new.txt). That is now expressed by a line like:

   <Mime>application/x-lx-executable</Mime>

When I looked into the definition, I saw that it contains only nil patterns,
and these became shorter or vanished. So it is probably not very
distinctive. So I looked for other definitions with nil bytes. Of course
one candidate is null_bytes.trid.xml and another is iso-9660-image.trid.xml.
The latter describes CD-ROM images like ReactOS-LiveCD.iso as "ISO 9660 CD
image". These CD images are also described as "OS/2 Dynamic Link Library (no
DOS stub)" with a low rate.

There is also a third definition with nil bytes, macbinary-1.trid.xml for
"MacBinary 1". An example with the name INFO is described as "MacBinary 1",
and it is also described as "OS/2 Dynamic Link Library (no DOS stub)" with a
low rate.
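
The weakness of a nil-only definition can be illustrated with a small Python
sketch. It is not how TrID scores matches internally; the matches() helper
and the two pattern lists are hypothetical and only loosely modelled on the
<Pattern> entries quoted above. The zero-filled block stands in for any data
that happens to be zero at the tested positions.

```python
def matches(data: bytes, patterns: list) -> bool:
    """Return True if every (pos, expected_bytes) pattern is found in data."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in patterns)

# Hypothetical nil-only front block (positions taken from snippets in the post).
nil_only = [(199, b"\x00"), (209, b"\x00")]

# The same patterns plus the LX magic at offset 0.
with_magic = [(0, b"LX")] + nil_only

zero_block = bytes(512)            # stand-in for zero-filled foreign data
lx_like = b"LX" + bytes(510)       # stand-in for the start of an LX module

print(matches(zero_block, nil_only))     # the nil-only set accepts foreign data
print(matches(zero_block, with_magic))   # the magic rejects it
print(matches(lx_like, with_magic))      # an LX-like header still matches
```

Any block that is zero at the tested positions satisfies the nil-only set,
which fits the low-rate false hits on CD images and MacBinary files.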

The reference for such DLLs is a page on Wikipedia. That is expressed by a
line like:
   <RefURL>http://en.wikipedia.org/wiki/EXE</RefURL>

According to the Wikipedia page, most DLLs start with the MZ magic. Such
variants are described by dll-os2-dos-stub.trid.xml as "OS/2
Dynamic Link Library (with DOS stub)".

For the variants without DOS stub only a few start magics can occur. My
inspected examples start with the 2-byte string LX. So I generated a new
definition dll-os2-lx.trid.xml for such "OS/2 Dynamic Link Library
(LX)". This is described by the first XML construct like:
   <Bytes>4C58000000000000</Bytes>
   <ASCII> L X</ASCII>
   <Pos>0</Pos>
Another description of the format is the text file lxexe.txt, found for example at:
   http://www.textfiles.com/programming/FORMATS/lxexe.txt

The same information, presented in another way, can also be found in the
header file exeflat.h of the Open Watcom compiler suite. It is found at:
https://github.com/open-watcom/open-watcom-v2/blob/master/bld/watcom/h/

After the LX signature the byte order and the word order are each stored as
one byte. A big endian variant also exists (value 1), but all my inspected
examples are little endian (value 0). At offset 4 the EXE format level is
stored as a 4-byte integer. According to the documentation the Linear EXE
Format Level is set to 0 for the initial version of the 32-bit linear EXE
format. Each incompatible change to the linear EXE format must increment
this value. Because development of OS/2 is dead I assume that incrementing
never happened and this level value is always 0, although according to
dll-os2-no-dos-stub.trid.xml a low positive (<256) value may exist. That is
expressed by an XML construct like:
   <Bytes>4C58000000000000</Bytes>
   <ASCII> L X</ASCII>
   <Pos>0</Pos>

At offset 8 the CPU type is stored as a 2-byte little endian value. All my
real examples have the value 2, which means 386 CPU. The artificial example
test-486.DLL has the value 3 (OSF_CPU_486), which means 486 CPU. According
to the documentation the highest value is 41h for MIPS Mark II (R6000). So
the upper byte is always 0. At offset 10 the target operating system is
stored as a 2-byte little endian value in the range from 0 to 4. So the
upper byte is always 0. Value 4 is used for Windows 386 and 2 is used for
lower Windows versions. Value 0 is used for Unknown (any "new-format") OS
and value 1 means OS/2. When we consider only OS/2 modules, only the value 1
can occur here. Maybe there exist other LX modules belonging to DOS
extenders, but then the value 3 may occur there. These 2 facts are described
by an XML construct like:
   <Bytes>000100</Bytes>
   <Pos>9</Pos>
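
The first 12 header bytes described above can be decoded with a short Python
sketch. The function name parse_lx_start and the two lookup tables are
hypothetical; the tables contain only values named in this post, and a little
endian header (byte order 0) is assumed.

```python
import struct

# Only the values mentioned in the post; other codes are left undecoded.
CPU_NAMES = {2: "386", 3: "486"}
OS_NAMES = {0: "unknown", 1: "OS/2", 2: "Windows", 4: "Windows 386"}

def parse_lx_start(header: bytes) -> dict:
    """Decode signature, byte/word order, format level, CPU and OS type."""
    if len(header) < 12:
        raise ValueError("need at least 12 bytes")
    # 2s = magic, B = byte order, B = word order, I = level, H = CPU, H = OS
    magic, byte_order, word_order, level, cpu, os_type = struct.unpack_from(
        "<2sBBIHH", header, 0)
    if magic != b"LX":
        raise ValueError("not an LX header")
    return {
        "byte_order": byte_order,
        "word_order": word_order,
        "level": level,
        "cpu": CPU_NAMES.get(cpu, hex(cpu)),
        "os": OS_NAMES.get(os_type, hex(os_type)),
    }

# Sample bytes: LX, orders 0/0, level 0, CPU 2 (386), OS 1 (OS/2).
sample = bytes.fromhex("4C58000000000000" "0200" "0100")
print(parse_lx_start(sample))
```

With this layout the bytes at positions 9, 10 and 11 are 00 01 00 for an OS/2
module, which is what the pattern at position 9 tests.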

In the definition dll-os2-no-dos-stub.trid.xml low values other than 1 also
occur here. That makes no sense to me. So I assume that this definition
describes an average of different DLL examples (maybe LX and LE variants).

At offset 12 the module version is stored as a 4-byte little endian value. I
found examples with the values 0, 020000h and 020002h (see appended
output/file.tmp). Probably other values are also possible. At the moment
that is expressed by XML constructs like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>13</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>15</Pos>
   </Pattern>
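
Why exactly the bytes at positions 13 and 15 stay nil can be checked with a
few lines of Python. This is only a worked example for the three version
values found above; the helper name version_bytes is hypothetical.

```python
import struct

def version_bytes(version: int) -> bytes:
    """Return the 4 bytes stored at offsets 12..15 for a module version."""
    return struct.pack("<I", version)

for version in (0x000000, 0x020000, 0x020002):
    raw = version_bytes(version)
    # raw[1] is file offset 13, raw[3] is file offset 15
    print(f"{version:06X}h -> {raw.hex(' ')}  offset13={raw[1]:02X} offset15={raw[3]:02X}")
```

For all three values the second and fourth byte are 00, while the bytes at
offsets 12 and 14 vary.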

At offset 20 the number of pages in the module is stored as a 4-byte little
endian (LE) integer pages. In theory the upper limit is FFFFFFFFh, but the
highest value in my examples was 106h for example UNIAUD32.SYS. That gives
an upper limit of 65536, so the 2 upper bytes are nil. That is expressed by
an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>22</Pos>

At offset 24 the number of the object to which the entry address (EIP) is
relative is stored as a 4-byte LE value start_obj. I found the values 0, 1
and 2. That gives an upper limit of 256, so the 3 upper bytes are nil. That
is expressed by an XML construct like:
   <Bytes>000000</Bytes>
   <Pos>25</Pos>
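
The step from "highest observed value" to "pattern of nil upper bytes" used
here and in the following fields can be sketched in Python. The function
nil_pattern is a hypothetical helper; it assumes a little endian field and
rounds the observed maximum up to whole bytes, which is an assumption about
real-world values and not a guarantee of the format.

```python
def nil_pattern(offset: int, max_seen: int, width: int = 4):
    """Return (pos, hex_bytes) for the upper bytes assumed to be always nil.

    offset   -- file offset of the little endian field
    max_seen -- highest value observed in the samples
    width    -- field width in bytes
    """
    # Bytes needed to hold the largest observed value (at least one).
    used = max(1, (max_seen.bit_length() + 7) // 8)
    nil = width - used
    if nil <= 0:
        return None                      # no byte can be assumed nil
    # In little endian the upper bytes come last, so they start at offset + used.
    return offset + used, "00" * nil

print(nil_pattern(20, 0x106))   # number of pages
print(nil_pattern(24, 2))       # EIP object number
```

For the two fields above this yields position 22 with 2 nil bytes and
position 25 with 3 nil bytes, matching the quoted constructs.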

At offset 28 the address of the entry point (EIP) is stored as a 4-byte LE
value. In theory the upper limit is FFFFFFFFh, but the highest value was
5BE0h for example LVMLAYER.DLL. I assume an upper limit of 16777216
(16 MiB), so the highest byte is nil.
At offset 32 the object number to which the starting ESP is relative is
stored as a 4-byte LE value stack_obj. This must be a nonzero value for a
program module, so apparently it is 0 for a library. At offset 36 the entry
stack pointer is stored as a 4-byte LE value esp. According to the
documentation this field is ignored for a library module, so apparently it
is also 0 for a library. At offset 40 the page size is stored as a 4-byte LE
value page_size. For the initial LX format the page size is 4096
(1000h = 4 KiB).
These 4 facts are expressed by an XML construct like:
   <Bytes>00000000000000000000100000</Bytes>
   <Pos>31</Pos>
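
How the adjacent observations merge into the single 13-byte pattern at
position 31 can be reproduced with a short Python check. The variable names
are illustrative; the values are the ones stated above.

```python
import struct

eip_top_byte = b"\x00"                  # offset 31: highest byte of EIP
stack_obj = struct.pack("<I", 0)        # offsets 32..35
esp = struct.pack("<I", 0)              # offsets 36..39
page_size = struct.pack("<I", 0x1000)   # offsets 40..43, stored as 00 10 00 00

block = eip_top_byte + stack_obj + esp + page_size
print(len(block), block.hex().upper())
```

The only non-nil byte, 10h, lands at offset 41, the second byte of page_size.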

At offset 44 the left shift for page offsets is stored as a 4-byte LE value
page_shift. A page offset shift of 9 would align all pages on a 512 byte
(disk sector) basis. The default value for this field is 12 (decimal), which
gives a 4096 byte alignment. So the 3 upper bytes are nil. That is expressed
by an XML construct like:
   <Bytes>000000</Bytes>
   <Pos>45</Pos>

At offset 48 the fixup section size is stored as a 4-byte LE value
fixup_size. In theory the upper limit is FFFFFFFFh, but the highest value in
my examples was 1A362h for example LIBC063.DLL. That gives an upper limit of
16777216 (16 MiB), so the highest byte is nil.
At offset 52 the fixup section checksum is stored as a 4-byte LE value
fixup_cksum. If the checksum feature is not implemented, the linker sets
this field to zero. These 2 facts are expressed by an XML construct like:
   <Bytes>0000000000</Bytes>
   <Pos>51</Pos>

At offset 56 the loader section size is stored as a 4-byte LE value
loader_size. In theory the upper limit is FFFFFFFFh, but the highest value
in my examples was 59DEh for example LIBC063.DLL. That gives an upper limit
of 65536 (64 KiB), so the 2 upper bytes are nil.
At offset 60 the loader section checksum is stored as a 4-byte LE value
loader_cksum. If the checksum feature is not implemented, the linker sets
this field to zero.
These 2 facts are expressed by an XML construct like:
   <Bytes>000000000000</Bytes>
   <Pos>58</Pos>

At offset 64 the object table offset is stored as a 4-byte LE value
objtab_off. In my examples I found the values 0 and C4h. That gives an upper
limit of 256, so the 3 upper bytes are nil. That is expressed by an XML
construct like:
   <Bytes>000000</Bytes>
   <Pos>65</Pos>

At offset 68 the number of objects is stored as a 4-byte LE value
num_objects. In my examples I found the values 0, 3, 4 and 8. That gives an
upper limit of 256, so the 3 upper bytes are nil. That is expressed by an
XML construct like:
   <Bytes>000000</Bytes>
   <Pos>69</Pos>

At offset 72 the object page map offset is stored as a 4-byte LE value
objmap_off. The highest value was 184h for example UFAT32.DLL. That gives an
upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>74</Pos>

At offset 76 the object iterated pages offset is stored as a 4-byte LE value
idmap_off. The highest value was 1A4Ch for example LVMLAYER.DLL. That gives
an upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>78</Pos>

At offset 80 the resource table offset is stored as a 4-byte LE value
rsrc_off. The highest value was 174h for example VRSPLITB.DLL. That gives
an upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>82</Pos>

At offset 84 the number of resource entries is stored as a 4-byte LE value
num_rsrcs. In my examples I found the values 0 and 2. That gives an upper
limit of 256, so the 3 upper bytes are nil. That is expressed by an XML
construct like:
   <Bytes>000000</Bytes>
   <Pos>85</Pos>

At offset 88 the resident name table offset is stored as a 4-byte LE value
resname_off. The highest value was 97Ch for example LIBC063.DLL. That gives
an upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>90</Pos>

At offset 92 the offset of the entry table is stored as a 4-byte LE value
entry_off. The highest value was 254h for example MPG.DLL. That gives an
upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>94</Pos>

At offset 96 the offset of the module directives table is stored as a 4-byte
LE value moddir_off. The highest value was 1F9h for example IPLUGINW.DLL.
That gives an upper limit of 65536 (64 KiB), so the 2 upper bytes are nil.
That is expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>98</Pos>

At offset 100 the number of module directives is stored as a 4-byte LE value
num_moddirs. The highest value was 1 for example IPLUGINW.DLL. That gives an
upper limit of 256, so the 3 upper bytes are nil. That is expressed by an
XML construct like:
   <Bytes>000000</Bytes>
   <Pos>101</Pos>

At offset 104 the fixup page table offset is stored as a 4-byte LE value
fixpage_off. The highest value was 71B2h for example LIBC062.DLL. That gives
an upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>106</Pos>

At offset 108 the fixup record table offset is stored as a 4-byte LE value
fixrec_off. The highest value was 71B6h for example LIBC062.DLL. That gives
an upper limit of 65536 (64 KiB), so the 2 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>0000</Bytes>
   <Pos>110</Pos>

At offset 112 the import module name table offset is stored as a 4-byte LE
value impmod_off. The highest value was 1FDE4h for example LIBC063.DLL. That
gives an upper limit of 16777216 (16 MiB), so the highest byte is nil. That
is expressed by an XML construct like:
   <Bytes>00</Bytes>
   <Pos>115</Pos>

At offset 116 the number of entries in the import module name table is
stored as a 4-byte LE value num_impmods. The values I found are 1, 4, 7 and
10. That gives an upper limit of 256, so the 3 upper bytes are nil. That is
expressed by an XML construct like:
   <Bytes>000000</Bytes>
   <Pos>117</Pos>

At offset 120 the import procedure name table offset is stored as a 4-byte
LE value impproc_off. The highest value was 1FE03h for example LIBC063.DLL.
That gives an upper limit of 16777216 (16 MiB), so the highest byte is nil.
At offset 124 the per-page checksum table offset is stored as a 4-byte LE
value cksum_off. This value was 0 in all my examples. These 2 observations
are expressed by an XML construct like:
   <Bytes>0000000000</Bytes>
   <Pos>123</Pos>

At offset 128 the offset of the enumerated data pages is stored as a 4-byte
LE value page_off. The highest value was 20000h for example LIBC063.DLL.
That gives an upper limit of 16777216 (16 MiB), so the highest byte is nil.
That is expressed by an XML construct like:
   <Bytes>00</Bytes>
   <Pos>131</Pos>

At offset 132 the number of preload pages is stored as a 4-byte LE integer
num_preload. In my examples I found values like 0, 2 or 9. That gives an
upper limit of 256, so the 3 upper bytes are nil. That is expressed by an
XML construct like:
   <Bytes>000000</Bytes>
   <Pos>133</Pos>

At offset 136 the non-resident names table offset is stored as a 4-byte LE
value nonres_off. The highest value was 12B000h for example LIBC063.DLL.
That gives an upper limit of 16777216 (16 MiB), so the highest byte is nil.
That is expressed by an XML construct like:
   <Bytes>00</Bytes>
   <Pos>139</Pos>

At offset 140 the size of the non-resident names table is stored as a 4-byte
LE value nonres_size. The highest value was 1E5C4h for example LIBC063.DLL.
That gives an upper limit of 16777216 (16 MiB), so the highest byte is nil.
At offset 144 the non-resident name table checksum is stored as a 4-byte LE
value nonres_cksum. This value was 0 in all my examples. These 2
observations are expressed by an XML construct like:
   <Bytes>0000000000</Bytes>
   <Pos>143</Pos>

At offset 148 the object number of the autodata segment is stored as a
4-byte LE value autodata_obj. The highest value was 8 for example
UFAT32.DLL. That gives an upper limit of 256, so the 3 upper bytes are nil.
At offset 152 the offset of the debugging information is stored as a 4-byte
LE value debug_off. In my examples this was 0, so all 4 bytes are nil.
At offset 156 the length of the debugging information is stored as a 4-byte
LE value debug_len. In my examples this was 0, so all 4 bytes are nil. These
3 observations are expressed by an XML construct like:
   <Bytes>0000000000000000000000</Bytes>
   <Pos>149</Pos>

At offset 160 the number of instance pages in the preload section is stored
as a 4-byte LE integer num_inst_preload. The highest value was 2 for example
VRSPLITB.DLL. That gives an upper limit of 256, so the 3 upper bytes are
nil. That is expressed by an XML construct like:
   <Bytes>000000</Bytes>
   <Pos>161</Pos>

At offset 164 the number of instance pages in the demand load section is
stored as a 4-byte LE value num_inst_demand. The highest value was 5 for
example UFAT32.DLL. That gives an upper limit of 256, so the 3 upper bytes
are nil.
At offset 168 (A8h) the size of the heap for 16-bit applications is stored
as a 4-byte LE value heapsize. This field is supported for 16-bit
compatibility only and is not used by 32-bit modules. So in my examples this
was always 0, and all 4 bytes are nil.

At offset 172 (ACh) the stack size is stored as a 4-byte LE value stacksize.
In my examples this was always 0, so all 4 bytes are nil.
The following 20 (=OSF_FLAT_RESERVED) bytes from offset 176 up to offset 196
are apparently only used for Windows VxD. So for my OS/2 examples they are
nil. These 4 observations are expressed by an XML construct like:
 <Bytes>00000000000000000000000000000000000000000000000000000000000000</Bytes>
 <Pos>165</Pos>

After the LX header, 8 short nil patterns occur by lucky circumstances,
like:
   <Bytes>00</Bytes>
   <Pos>209</Pos>

I also found OS/2 device drivers with the SYS name extension that are also
LX executables. They are not recognized by the current TrID definitions, and
their structure is nearly the same. So I ran tridscan on these 5 samples and
generated the TrID definition file sys-os2.trid.xml.

So I looked for differences to distinguish SYS from DLL examples. The only
relevant part is the 4-byte little endian module flags value at offset 16.

According to the documentation bit 15 of the flags is described by the mask
value 8000h=OSF_IS_DLL. If this bit is 0 the module is a program, and if it
is set the module is a library. For all DLL samples this bit is set, and for
my inspected SYS examples it is 0. Unfortunately this does not separate the
two cleanly: according to the documentation the bit is also set for a
virtual device driver module, like the artificial test-virtualDevice.tmp
with mask value 28000h=OSF_VIRT_DEVICE. TrID cannot match single bits, but
together with the 7 other bits of that byte a byte value is formed that TrID
can match. So we look for all possible values 8?h at offset 17. Luckily not
many bit combinations exist. Some bits are described as reserved for system
use. These seem to be always 0. And the bit 2000h=OSF_LINK_ERROR is only set
when the build result contains an error, so in the final state of a module
this bit is never set.

According to the documentation the only relevant additional bits for that
byte are bits 8-10. These must be considered as a group. The flag value
100h=OSF_NOT_PM_COMPATIBLE is described as "incompatible with PM
windowing". The flag value 200h=OSF_PM_COMPATIBLE is described as
"compatible with PM windowing". And if both bits are set, the flag value
300h=OSF_PM_APP is described as "Uses PM windowing API". If I understand the
documentation text right, this value stands for a Graphical User Interface
(GUI), like in the artificial example test-GUI.SYS. Conversely, the other
values mean that the library or driver is for the console. So only four
values can occur at offset 17: 80h, 81h, 82h and 83h. In my inspected
examples I only found 80h and 82h.

So I generated 4 TrID definition variants like dll-os2-lx-0x82.trid.xml,
which contain an additional XML construct like:
   <Bytes>82</Bytes>
   <Pos>17</Pos>

According to the documentation bit 17 of the flags is described by the mask
value 20000h=OSF_PHYS_DEVICE. If this bit is set the module is a device
driver. Only one other bit of this byte can be set. According to the
documentation bit 16 of the flags is described as "Protected Memory Library
module" with mask value 10000h=OSF_IS_PROT_DLL. If I understand the
documentation right, this can only occur for libraries and does not happen
for OS/2 device drivers. So the only possible value for OS/2 SYS examples is
02h. This is expressed by an XML construct inside sys-os2-lx.trid.xml
like:
   <Bytes>02</Bytes>
   <Pos>18</Pos>

According to the documentation the value 01 occurs only for a protected
memory library module with mask value 18000h. The value 02 can also occur
for mask value 28000h=VXD_DEVICE_DRIVER_STATIC and the value 03 can occur
for mask value 38000h=VXD_DEVICE_DRIVER_DYNAMIC, but the latter two cases
with VXD apply only to Windows and not to OS/2. And the Windows VXD variants
start with the LE magic instead of the LX pattern. So these cases do not
occur for OS/2.
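
The mapping from the 32-bit flags value to the two byte values that TrID can
test (offsets 17 and 18) can be sketched in Python. The OSF_* values are the
ones quoted in this post; MODULE_TYPE_MASK, PM_MASK and describe_flags are
hypothetical names introduced only for this illustration.

```python
OSF_IS_DLL = 0x8000
OSF_IS_PROT_DLL = 0x10000
OSF_PHYS_DEVICE = 0x20000
OSF_VIRT_DEVICE = 0x28000

MODULE_TYPE_MASK = 0x38000   # hypothetical name: bits 15-17 as one group
PM_MASK = 0x300              # hypothetical name: the two PM windowing bits

MODULE_TYPES = {
    0x00000: "program",
    OSF_IS_DLL: "library",
    OSF_IS_DLL | OSF_IS_PROT_DLL: "protected memory library",
    OSF_PHYS_DEVICE: "physical device driver",
    OSF_VIRT_DEVICE: "virtual device driver",
}
PM_TYPES = {
    0x000: "not specified",
    0x100: "incompatible with PM windowing",
    0x200: "compatible with PM windowing",
    0x300: "uses PM windowing API",
}

def describe_flags(flags: int) -> dict:
    """Split the LE flags at offset 16 into the bytes TrID can match."""
    raw = flags.to_bytes(4, "little")    # raw[0] is offset 16
    return {
        "byte17": raw[1],                # bits 8-15
        "byte18": raw[2],                # bits 16-23
        "type": MODULE_TYPES.get(flags & MODULE_TYPE_MASK, "unknown"),
        "pm": PM_TYPES[flags & PM_MASK],
    }

print(describe_flags(0x8200))    # DLL, compatible with PM windowing
print(describe_flags(0x20000))   # physical device driver
```

A library with flags 8200h yields 82h at offset 17, and a physical device
driver with flags 20000h yields 02h at offset 18, which are the two byte
patterns used in the definitions above.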

Then I deleted the short nil patterns after the LX header (196 byte limit).
To also match examples with debugging information I deleted the
corresponding nil patterns. Many stored count and offset values are merely
"low", so I deleted such patterns too. I kept the unused (that means zero)
checksum values.

With the 5 definitions all LX executables are now identified, and the
misidentification of CD-ROM images and MacBinary files has vanished (see
appended output/trid-v-new.txt).

The TrID definitions, some examples and the output are stored in the archive
dll_sys.zip. I hope that my 4 XML files can be used in a future version of
triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: updated dll-os2-no-dos-stub.trid.xml + 5 replacements for *.dll *.sys
« Reply #1 on: April 23, 2021, 03:32:17 AM »
Thanks for all the work, Jörg!

I feel that TrID isn't the best tool for this kind of analysis (i.e. discerning between very similar filetypes without using some specific rules), but your updated defs will most probably do a better job than the current ones, so I will surely adopt them!