Topic: 3 or more replacements for Microsoft Visual C Library *.LIB lib-msvc.trid.xml

jenderek

Hello trid users,

A few days ago I handled some libraries with the file name extension LIB.

When I run TrID on such libraries and related files, most are identified
correctly as "Microsoft Visual C Library" by lib-msvc.trid.xml (see appended
lib_16/output/trid-v-old.txt and lib_32/output/trid-v-old.txt).

But some examples like MOUSE.LIB, BZ2.LIB and ZLIB.LIB are not recognized
and are described as "Unknown!" (see appended output/trid-v-old.txt).

When running the file command (version 5.41) on such libraries, some are also
described as "Microsoft Visual C library". But with the keep-going option -k
these examples are additionally misdescribed as "SysEx File - ADA", while
real SYX examples are described as "SysEx File -" (see appended
lib_16/output/file-k-5.41.txt). Some examples are only misdescribed as
"SysEx File - Inventronic" (see appended lib_32/output/file-k-5.41.txt).
Some examples are only described as "data" (see appended
output/file-5.41.txt).

When running the newest file command (version >5.41) on such libraries, all
are now described only as "Microsoft Visual C/OMF library" with additional
information like the page size. The misidentified "Inventronic" libraries are
described with "page size 32" (see appended lib_32/output/file.txt). The
"data" examples are now described with "page size 512" (see appended
file.txt). The "good" examples are described with "page size 16" (see
appended lib_16/output/file.txt).

Instead of the generic MIME type application/octet-stream, the type
application/x-omf-lib is now shown (see appended lib_16/output/file-i.txt).

One solution is to update lib-msvc.trid.xml by running tridscan on the
undetected examples. I did this and called the resulting definition
lib-msvc-test.trid.xml. But I decided against this step. I will explain this
later.

In the definition the Wikipedia page about Microsoft Visual C++ was used,
but that was not very helpful. On the File Formats Archive Team website I
found a page about Microsoft Library (*.lib). That is expressed by a line
like:
 <RefURL>http://fileformats.archiveteam.org/wiki/Microsoft_Library</RefURL>

From there I got the hint that the file format used is the relocatable
Object Module Format (OMF). So I use this as the new descriptive text for
the file type, with lines like:
   <FileType>
   relocatable Object Module Format (OMF) library (page size 16)
   </FileType>

Now I explain why I changed this text. My oldest inspected example is
MOUSE.LIB, dated September 1984. According to Wikipedia the first Visual
C compiler suite appeared in February 1993. So to be generally true, the
word Visual must be removed from the phrase "Microsoft Visual C library".

According to the reference page about Microsoft Library, such libraries can
be compiled from different source languages (BASIC, C, Pascal, etc.). That
can be verified with an example like QB4UTIL.LIB. Here the second record
name is "QB4UTIL.ASM", which means the source was assembler code. So the
upper-case letter C in the phrase must be expanded to something like
"C, Assembler, Pascal, BASIC" or must be removed.

This library format was used not only by Microsoft but also by other
companies like Borland. This can be seen in the example CATDB.LIB, where the
third record is a translator comment like "TC86 Borland Turbo C++ 3.00". So
the word Microsoft must go as well, and the descriptive phrase has shrunk to
the single word "library". The correct descriptive text should therefore
look like "relocatable Object Module Format (OMF) library". With the help of
the Relocatable Object Module Format (OMF) Specification Version 1.1, found
for example as document OMF_v1.1.pdf, it is possible to understand what is
going on.

The definition lib-msvc.trid.xml describes the average of the different OMF
variants. So in the end lib-msvc-test.trid.xml uses only 2 bytes to identify
the library file type, with lines like:
   <Pattern>
      <Bytes>F0</Bytes>
      <Pos>0</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>3</Pos>
   </Pattern>

According to the file command recommendations, at least 4 bytes should be
used to describe different file types in a unique manner. The older file
command version had difficulties distinguishing such libraries from audio
SYX examples. Luckily one step avoids this catastrophic lack of
differentiation. The libraries contain the ASCII word DATA as a segment
name, whereas the SYX examples seem to be pure binary files. That is
expressed inside the global string section by a line like:
   <String>DATA</String>

So I ran tridscan to generate definitions for the different page sizes.
The first byte stores the OMF record type. For OMF libraries this has the
value F0h (Library Header Record). Next, the length of the first record's
data is stored as a 2-byte little-endian integer. By adding three (1 for the
type byte and 2 for the length) you get the length of the whole first
record. For libraries this is apparently called the page size. According to
the documentation the page size must be a power of two (page size=2**n). The
lowest possible value is sixteen (16=2**4) and the highest possible value is
32768 (=2**15). Printed in hexadecimal, the stored record length looks like
???Dh, so the lowest nibble always has the value D. In theory there may
exist 12 variants, but I found only 3. In addition, the file command has an
entry for a variant with page size 128. So these variants look like:
   definition             page size
   lib-omf-16.trid.xml     16=000Dh+3
   lib-omf-32.trid.xml     32=001Dh+3
   lib-omf-512.trid.xml   512=01FDh+3
   lib-omf-128.trid.xml   128=007Dh+3
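
The arithmetic above can be sketched in a few lines of Python. This is only
an illustration under the assumptions stated in this post (record type F0h,
little-endian length, page size = length + 3, power of two between 16 and
32768). The function name omf_page_size is made up for this example and is
not part of TrID or the file command.

```python
import struct

LIBHDR = 0xF0  # OMF Library Header Record type


def omf_page_size(header: bytes) -> int:
    """Return the page size implied by the first three bytes of the file."""
    rec_type, rec_len = struct.unpack_from("<BH", header, 0)
    if rec_type != LIBHDR:
        raise ValueError("first byte is not F0h")
    page_size = rec_len + 3  # 1 type byte + 2 length bytes
    # Accept only powers of two from 16 (2**4) to 32768 (2**15).
    if not (16 <= page_size <= 32768 and page_size & (page_size - 1) == 0):
        raise ValueError(f"implausible page size {page_size}")
    return page_size


# The four variants from the table, given as the first three file bytes.
for first3 in (b"\xF0\x0D\x00", b"\xF0\x1D\x00", b"\xF0\xFD\x01", b"\xF0\x7D\x00"):
    print(first3.hex().upper(), "->", omf_page_size(first3))

# All 12 theoretical lengths end in the hex digit D.
print([f"{(1 << n) - 3:04X}h" for n in range(4, 16)])
```

The last line lists the stored length for every n from 4 to 15, which makes
the ???Dh rule visible at a glance.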

At offset 7 the dictionary size is stored as a 2-byte integer, counted in
blocks of 512 bytes. So in theory the upper limit is 65535 blocks, but the
documentation says that the Library Manager, LIB.EXE for MS-DOS, cannot
create a library whose dictionary requires more than 251 512-byte
pages. That means that the upper byte of the dictionary size is nil. That is
expressed by the third XML construct, with lines like:
      <Bytes>00</Bytes>
      <Pos>8</Pos>

It is not explicitly written, but when the dictionary size is a multiple of
512 (that is hexadecimal 200), it obviously makes sense that the dictionary
itself starts on such a boundary. So for the dictionary offset, stored as a
4-byte integer at offset 3, the lowest byte is then always 0.

So the first XML construct now checks the first four bytes. It tests that
the record type is Library Header Record (F0 hexadecimal), that the record
length is a hexadecimal number like ???D, and that the dictionary offset is
a multiple of 512 (?????200 hexadecimal and so on). So for
lib-omf-512.trid.xml this looks like:
   <Pattern>
      <Bytes>F0FD0100</Bytes>
      <Pos>0</Pos>
   </Pattern>

In theory the upper limit for the dictionary offset is 0x100000000 (that is
4 GiB). But in the examples the upper byte was always nil. That means the
observed upper limit is 16777216 (that is 16 MiB). That probably always
holds, because in the old DOS days the usual file sizes were just a few
megabytes. That fact is expressed by the second XML construct like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>6</Pos>
   </Pattern>

After the dictionary size, a library flags byte is stored at offset 9.
According to the documentation the value one means case sensitive; all other
bits are reserved for future use and should be 0. But for the old MOUSE.LIB
I found the unexpected value 0x4d here. The bytes that follow, up to the
next record, are padding and are not significant. The first record also
contains no checksum byte. Apparently the padding bytes are often nil. The
examples covered by lib-omf-32.trid.xml are case sensitive. That is
expressed by an XML construct like:
   <Pattern>
      <Bytes>00010000000000000000000000000000000000000000000080</Bytes>
      <Pos>8</Pos>
   </Pattern>
The examples covered by lib-omf-16.trid.xml are case insensitive. That is
expressed by an XML construct like:
   <Pattern>
      <Bytes>00000000000000</Bytes>
      <Pos>8</Pos>
   </Pattern>
So if users find examples with other case settings or other padding, the
above XML constructs will shrink.
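
To make the header layout easier to follow, here is a minimal Python sketch
that decodes the fixed fields at offsets 0 to 9 and checks the regularities
described above. The names LibHeader, parse_lib_header and
matches_observations are invented for this example, and the sample bytes are
hypothetical, not taken from one of the appended libraries.

```python
import struct
from typing import NamedTuple


class LibHeader(NamedTuple):
    page_size: int        # record length + 3
    dict_offset: int      # file offset of the dictionary (4 bytes at offset 3)
    dict_blocks: int      # dictionary size in 512-byte blocks (offset 7)
    case_sensitive: bool  # bit 0 of the flags byte (offset 9)


def parse_lib_header(data: bytes) -> LibHeader:
    """Decode the fixed fields of an OMF Library Header Record."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    rec_type, rec_len, dict_offset, dict_blocks, flags = struct.unpack_from(
        "<BHIHB", data, 0
    )
    if rec_type != 0xF0:
        raise ValueError("first byte is not F0h")
    return LibHeader(rec_len + 3, dict_offset, dict_blocks, bool(flags & 0x01))


def matches_observations(h: LibHeader) -> bool:
    """True if the header agrees with what was seen in the inspected samples."""
    return (
        h.dict_offset % 512 == 0       # implies the byte at offset 3 is 00
        and h.dict_offset < (1 << 24)  # byte at offset 6 is 00
        and h.dict_blocks < 256        # byte at offset 8 is 00
    )


# Hypothetical header: page size 512, dictionary at 1400h, 1 block, flags 01.
sample = bytes.fromhex("F0FD01" "00140000" "0100" "01")
hdr = parse_lib_header(sample)
print(hdr, matches_observations(hdr))
```

Note that the flags byte is only reduced to bit 0 here. A value like the 0x4d
found in MOUSE.LIB would therefore be reported as case sensitive, which may
or may not be what the creating tool intended.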

The second record is a Translator Header Record (THEADR=80h) or a Library
Module Header Record (LHEADR=82h). In my inspected samples I only found
THEADR examples. So in lib-omf-512.trid.xml this is expressed by an XML
construct like:
   <Pattern>
      <Bytes>80</Bytes>
      <Pos>512</Pos>
   </Pattern>

Next, the record data length is stored as a 2-byte little-endian integer.
The upper byte of this value is apparently nil. That is expressed by an
XML construct like:
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>514</Pos>
   </Pattern>
That means that the record length is lower than 256. Here the record content
consists of just one Pascal string followed by a checksum byte. Often this
string is the library module source name, like "dos\crt0.asm" in mlibce.lib,
"QB4UTIL.ASM" in QB4UTIL.LIB or "C:\Documents and Settings\Allan Campbell\My
Documents\FDOSBoot\ zlib\zutil.c" in ZLIB.LIB. But the name can also be
specified directly by the programmer via the TITLE pseudo-operand or the
assembler NAME directive. So sometimes I find a title like "87INIT" in
FP87.LIB, "ACOSASIN" in MATHC.LIB or "Copyright" in calc-bcc.lib. On DOS
systems about 256 is the upper limit for full file names including the path
part. So this upper limit probably always holds.
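
As an illustration, the second record can be read like this in Python. It is
a sketch only: the function name module_name is invented, the demo library
is built by hand rather than taken from a real file, and the checksum rule
(the checksum byte is either 0 or makes all bytes of the record sum to 0
modulo 256) is an assumption based on my reading of the OMF specification.

```python
def module_name(data: bytes, page_size: int) -> str:
    """Return the name stored in the THEADR/LHEADR record at offset page_size."""
    rec_type = data[page_size]
    if rec_type not in (0x80, 0x82):  # THEADR or LHEADR
        raise ValueError(f"unexpected record type {rec_type:02X}h")
    rec_len = int.from_bytes(data[page_size + 1:page_size + 3], "little")
    record = data[page_size:page_size + 3 + rec_len]
    if rec_len < 2 or len(record) != 3 + rec_len:
        raise ValueError("record is too short or truncated")
    name_len = record[3]  # Pascal string: length byte comes first
    if rec_len != name_len + 2:  # length byte + name + checksum byte
        raise ValueError("record does not hold exactly one string")
    # Assumption: checksum is 0 or makes the byte sum 0 modulo 256.
    if record[-1] != 0 and sum(record) % 256 != 0:
        raise ValueError("bad checksum")
    return record[4:4 + name_len].decode("ascii", errors="replace")


# Hand-built demo: a 16-byte library header followed by a THEADR record.
name = b"QB4UTIL.ASM"
rec = bytes([0x80]) + (len(name) + 2).to_bytes(2, "little") + bytes([len(name)]) + name
rec += bytes([-sum(rec) & 0xFF])  # append a valid checksum byte
lib = b"\xF0\x0D\x00" + bytes(13) + rec
print(module_name(lib, 16))
```

Because the string length fits in one byte, the record length of such a
record can never exceed 257, which matches the nil upper byte seen in the
samples in nearly all cases.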

The definition lib-omf-32.trid.xml was based on only a few dozen samples
from the xharbour project. So the second record string starts with the same
part in all examples. That was expressed by an XML construct like:
   <Pattern>
      <Bytes>433A5C58484152424F55525C5352435C534F555243455C</Bytes>
      <ASCII> C : \ X H A R B O U R \ S R C \ S O U R C E \</ASCII>
      <Pos>36</Pos>
   </Pattern>
Assuming that in samples from other projects this second record looks
totally different, I deleted the above construct.

So in lib-omf-32.trid.xml the global string section contains many lines
like:
      <String>MATH387R</String>
      <String>XHARBOUR</String>
      <String>EMU387G</String>
      <String>CONST2</String>
      <String>DGROUP</String>
      <String>SOURCE</String>
      <String>_TEXT{</String>
      <String>3FOPD</String>
      <String>OS220</String>
      <String>_DATA</String>
      <String>CODE</String>
      <String>FLAT</String>
      <String>T$N(</String>
      <String>_BSS</String>
In lib-omf-512.trid.xml this section contains only two lines. They look
like:
      <String>GROUP</String>
      <String>DATA</String>
When inspecting more examples this will probably shrink to one line, as in
lib-omf-16.trid.xml. There it looks like:
      <String>DATA</String>

The definition lib-msbc7.trid.xml describes libraries as "Microsoft Basic
7.x compiled library" and lib-msbc.trid.xml describes libraries as
"Microsoft Basic compiled library (generic)", but when looking inside we see
that these are just OMF libraries with page size 16. The only difference is
that the generic Basic variant contains a segment with the name "B$SEG".
That is expressed inside the global string section by a line like:
      <String>B$SEG</String>
And in the BASIC version 7 variant there exist segments with the names
"B$SEG" and "B$SA". That is expressed inside the global string section by
lines like:
      <String>B$SA</String>
      <String>B$SEG</String>
If there are no constraints in the compiler suites, it should be possible to
build executables from Basic, Assembler or C based OMF libraries, or even
combinations. So the two definitions lib-msbc.trid.xml and
lib-msbc7.trid.xml are only needed when you explicitly want to emphasize
that the libraries are based on Microsoft Basic sources.
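
The distinction between the two Basic definitions and a plain OMF library
boils down to a substring test. A rough Python equivalent, with the invented
function name basic_flavour and a simple case-sensitive search (TrID's own
string matching may behave differently), could look like this:

```python
def basic_flavour(data: bytes) -> str:
    """Classify an OMF library by the Basic segment names it contains."""
    has_seg = b"B$SEG" in data
    has_sa = b"B$SA" in data
    if has_seg and has_sa:
        return "Microsoft Basic 7.x compiled library"
    if has_seg:
        return "Microsoft Basic compiled library (generic)"
    return "relocatable Object Module Format (OMF) library"


# Made-up byte strings that only carry the segment names.
print(basic_flavour(b"...B$SEG...B$SA..."))
print(basic_flavour(b"...B$SEG..."))
print(basic_flavour(b"..._DATA..."))
```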

Just for completeness I generated the definition syx-midi.trid.xml for "MIDI
audio System Exclusive (SysEx) message" SYX samples. Unfortunately this is
not very unique, because it tests just 1 starting byte. That is expressed by
an XML construct like:
   <Pattern>
      <Bytes>F0</Bytes>
      <Pos>0</Pos>
   </Pattern>
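
A stricter test than a single F0 byte is possible outside of TrID. The
following Python sketch relies on two properties that come from the MIDI
specification and not from the definition above: a SysEx message ends with
F7h, and all bytes between F0h and F7h have the high bit clear. The function
name looks_like_sysex is invented, and files that mix in other MIDI data
would be rejected by this heuristic.

```python
def looks_like_sysex(data: bytes) -> bool:
    """Heuristic: file is one or more F0 ... F7 messages with 7-bit payload."""
    if not data or data[0] != 0xF0 or data[-1] != 0xF7:
        return False
    inside = False
    for b in data:
        if b == 0xF0 and not inside:
            inside = True           # start of a message
        elif b == 0xF7 and inside:
            inside = False          # end of a message
        elif b >= 0x80 or not inside:
            return False            # status byte in payload, or stray data
    return not inside


# A universal identity request message, and the start of a page size 16 library.
print(looks_like_sysex(bytes.fromhex("F07E7F0601F7")))
print(looks_like_sysex(b"\xF0\x0D\x00" + bytes(13) + b"\x80"))
```

An OMF library fails this test quickly, because the THEADR type byte 80h
already has the high bit set.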

With the 4 new TrID definitions all of my inspected libraries and SYX
examples are now described correctly as "relocatable Object Module Format
(OMF) library" or "MIDI audio System Exclusive (SysEx) message" (see
appended trid-v.txt).

The TrID definitions, some examples and the output are stored in the archive
lib_trid.zip.

With best wishes
Jörg Jenderek


Mark0

Thanks Jörg!

P.S.
I see that in the ZIP file you included the lib-msvc-test.trid.xml but not the lib-omf-128.trid.xml.
« Last Edit: January 31, 2022, 10:08:00 PM by Mark0 »