Author Topic: 3 bat*.trid.xml for many batch files *.bat *.cmd *.btm  (Read 1515 times)

jenderek

3 bat*.trid.xml for many batch files *.bat *.cmd *.btm
« on: October 22, 2020, 12:35:35 AM »
Hello trid users,

Some days ago I ran TrID on 2635 batch files, including duplicates.

All of these samples are described only as "Unknown!" (see appended
@/output/trid-old.txt).

For comparison I ran other file identification tools.
DROID (see http://digital-preservation.github.io/droid/ ) describes the
samples with the bat extension as "Batch file (executable)" with PUID
x-fmt/413 (see appended @/output/batch-droid.csv).
The current file command {see https://en.wikipedia.org/wiki/File_(command)}
describes most samples as "DOS batch file" and as "text, with CRLF line
terminators" (see appended @/output/file-5.39.txt).

So I ran tridscan on these samples and got a TrID definition file
bat.trid.xml.
Some information about this file format can be found on the Batch file
page of the Archive Team file formats web site. This is expressed by a
reference URL line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/Batch_file</RefURL>

According to that site, a batch file often starts with the line "@ECHO OFF"
to turn off the echoing of commands. The construct with the at sign at the
beginning only works in "newer" DOS systems like Microsoft DOS version 3.30,
dated about February 1988. It does not exist in "older" DOS versions like
Microsoft DOS version 3.20.
176 of my inspected batch samples, like GEM.BAT, start with that expression.
Because DOS and Windows commands are case insensitive, I also found other
spellings. For the lower case variant I found 1019 examples like BENNY.BAT.
The remaining 169 batch files are slightly different. Some contain a space
after that typical phrase, as in eps2eps.bat found in the Ghostscript
software up to version 9.53.3. Or several commands are concatenated with the
ampersand character on one line, as in Get_Files.cmd with a first line like:
@echo off &MODE CON: COLS=75 LINES=20 &color 1e
I also found examples where echoing is not disabled. Instead the echo
command is used to display some message text, as in wixenv.bat found inside
the source of the graphics software Inkscape version 1.0.1. There the first
line starts like:
@echo Setting environment variables
I also found 2 examples like SIVDBG.bat (found inside the System Information
Viewer package version 5.52) where a dot directly follows the echo command,
which produces an empty message line.
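
To make the three groups above easier to reproduce, here is a small Python
sketch that sorts a sample by its first line. It is not part of TrID or
tridscan. The function name and the group labels are my own, and I assume
that "starts with that expression" means the first line is exactly the
phrase, which may not be how the counts above were made.

```python
def classify_first_line(data: bytes) -> str:
    """Sort a batch file into one of the groups mentioned in the post."""
    # splitlines() handles CRLF, LF and CR line endings alike.
    first = data.splitlines()[0] if data else b""
    if first == b"@ECHO OFF":
        return "upper"
    if first == b"@echo off":
        return "lower"
    # Trailing space, "&" chains, "@echo." or a message text end up here.
    if first[:5].upper() == b"@ECHO":
        return "other @echo variant"
    return "no leading @echo"


if __name__ == "__main__":
    print(classify_first_line(b"@ECHO OFF\r\ndir\r\n"))
    print(classify_first_line(b"@echo off &MODE CON: COLS=75 LINES=20 &color 1e\r\n"))
```

Mixed spellings like "@Echo Off" land in the third group in this sketch.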

All these 1364 batch files are described by the definition file
bat.trid.xml. All these samples start with the at sign character. That is
expressed by a pattern block like:
   <Bytes>40</Bytes>
   <ASCII> @</ASCII>
   <Pos>0</Pos>
Later in the batch files the echo command occurs. That is expressed in the
global strings section by a line like:
   <String>ECHO</String>
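
The two conditions (pattern at position 0 plus one global string) can be
sketched in Python like this. It only mimics the idea and is not TrID's real
matcher. I assume here that the global string is compared without regard to
case, which the lower case samples matching this definition suggest; the
function name is made up for illustration.

```python
# Sketch only: mimics the two conditions described above, not TrID itself.
FRONT_PATTERN = bytes.fromhex("40")   # <Bytes>40</Bytes>, the "@" character
FRONT_POS = 0                         # <Pos>0</Pos>
GLOBAL_STRING = b"ECHO"               # <String>ECHO</String>


def looks_like_at_echo_batch(data: bytes) -> bool:
    """Return True if data starts with "@" and contains "echo" somewhere."""
    end = FRONT_POS + len(FRONT_PATTERN)
    starts_ok = data[FRONT_POS:end] == FRONT_PATTERN
    # Assumption: the global string is matched case-insensitively.
    has_string = GLOBAL_STRING in data.upper()
    return starts_ok and has_string


if __name__ == "__main__":
    print(looks_like_at_echo_batch(b"@echo off\r\ndir\r\n"))   # True
    print(looks_like_at_echo_batch(b"rem build script\r\n"))   # False
```

Such a loose check also accepts any other text that starts with "@" and
mentions echo somewhere, so false positives are possible.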
Originally such files contain a batch of DOS commands, have the file name
extension bat and are executed by the DOS command interpreter COMMAND.COM.
Alternative command interpreters like the 4DOS based ones extend the range
of commands. To distinguish and handle such newer batch files another file
name extension, BTM, is used. The same happened with the Windows NT and OS/2
interpreter program CMD.EXE. There the extension CMD is used. So three
possible extensions can occur for batch files. That is expressed by a line
like:
   <Ext>BAT/CMD/BTM</Ext>
The files with CMD extension are registered as "Windows Command Script",
whereas the files with BAT extension are registered as "Windows Batch File".
In my opinion that naming is a bit misleading, because some batch files are
also or only used inside DOS systems. And there is no reliable way to
distinguish between Windows CMD and DOS BAT files. The only reliable fact is
that when special instructions like setlocal or environment variables like
%SystemRoot% or %SystemDrive% occur inside a batch file, then that file is
not a DOS batch file and is probably used inside a Windows NT like system.
Unfortunately many batch files suited only for Windows systems still use the
BAT file name extension instead of CMD. This is allowed and works. So it is
impossible to distinguish clearly between CMD and BAT variants.
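
The one-way hint described above could be checked roughly like this in
Python. The list only contains the three markers I named; the function name
is hypothetical. A hit suggests a Windows NT like system, but no hit proves
nothing about DOS.

```python
import re

# Markers named above; a hit suggests CMD.EXE rather than COMMAND.COM.
NT_HINTS = re.compile(rb"\bsetlocal\b|%SystemRoot%|%SystemDrive%", re.IGNORECASE)


def has_nt_hint(data: bytes) -> bool:
    """Return True if the batch text contains one of the NT-only markers."""
    return NT_HINTS.search(data) is not None


if __name__ == "__main__":
    print(has_nt_hint(b"@echo off\r\nsetlocal\r\n"))
    print(has_nt_hint(b"@echo off\r\ncopy a.txt b.txt\r\n"))
```

The sketch does not parse the batch syntax, so a marker inside a remark line
counts as a hit too.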

Although batch files are "simple" text files, they are executed by DOS and
Windows systems according to the file name extension. So batch files can be
misused by malicious software or potentially unwanted applications. So a
word like "executable" should also be used when defining a name for batch
files, as the DROID tool does. So I finally chose a naming convention
expressed by a line like:
   <FileType>Batch file executable</FileType>

Because batch files are "simple" text files, a mime type like "text/plain"
could be chosen. But because batch files are executable, they are not
"plain" text. So a modified mime type should be used. The file command uses
"text/x-msdos-batch", but that is not well suited because some batch files,
and especially CMD variants, are used inside Windows NT like systems. So I
chose a similar user defined name expressed by a line like:
   <Mime>text/x-batch</Mime>
With the new definition 1362 previously undetected batch files are now
described (see appended @/output/trid-new-v.txt). That is about half of my
inspected batch collection.

Instead of the echo command, batch files often (about 9%) start with the
remark command for comment lines. This construct also works in "older" DOS
versions like Microsoft DOS version 2.11. In my collection I found 240
samples starting with a remark line. None of my 4DOS batch files use that
command at the beginning. That is expressed by a line like:
   <Ext>BAT/CMD</Ext>
Because the pattern match is case sensitive, the spellings need separate
definitions. I found 67 examples like png2pnm.bat with upper case spelling.
In those examples a space character follows the command. That is expressed
inside bat-REM-upcase.trid.xml by an XML construct like:
   <Bytes>52454D20</Bytes>
   <ASCII> R E M</ASCII>
   <Pos>0</Pos>

In my collection I found 114 examples with the lower case variant, and in
108 of them a space character follows the command. There are also examples
like fallback_build.bat with a carriage return (0x0D) directly after the
remark command, and examples like pyunoenv.bat with only a line feed
character (0x0A) after it. So for such examples only the three letters are
used, which is expressed inside bat-rem.trid.xml by an XML construct like:
   <Bytes>72656D</Bytes>
   <ASCII> r e m</ASCII>
   <Pos>0</Pos>
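
For illustration, the two front patterns can be tried out with a few lines
of Python. This only checks the bytes at position 0 as shown in the two
fragments above and ignores everything else in the definition files. The
function name is hypothetical.

```python
REM_UPPER = bytes.fromhex("52454D20")  # "REM " from bat-REM-upcase.trid.xml
REM_LOWER = bytes.fromhex("72656D")    # "rem" from bat-rem.trid.xml


def matching_rem_definition(data: bytes):
    """Return the name of the definition whose front pattern fits, or None."""
    if data.startswith(REM_UPPER):
        return "bat-REM-upcase.trid.xml"
    if data.startswith(REM_LOWER):
        return "bat-rem.trid.xml"
    return None


if __name__ == "__main__":
    print(matching_rem_definition(b"REM build script\r\n"))
    print(matching_rem_definition(b"rem\r\necho done\r\n"))
```

Because the lower case pattern has no byte after the three letters, it
covers the space, CR and LF cases, but it would also fit a file starting
with a word like "remove".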

With the new definitions the previously undetected batch files starting with
remark lines are now described (see appended rem/output/trid-new-v.txt and
REM-up/output/trid-new-v.txt). TrID definitions, some examples and output
are stored in the archive bat_trid.zip. I hope that my 3 XML files can be
used in a future version of triddefs.

In my collection I also found 92 examples starting directly with the ECHO
command, 71 examples starting with 2 colon characters, 64 examples starting
with @REM and 55 starting with @IF. I also found examples starting directly
with DOS commands: 12 with MD, 10 with COPY, 23 with the Windows specific
START, 28 with the IF directive and 33 with the SET command. In many build
projects the batch files start directly by calling a compiler like TCC, BCC,
CC, NASM or MASM, or tools like WCL, DMAKE, NMAKE or MAKE. On disks for
flashing the BIOS I found batch files where flash tools like EAFUDOS or
SFLASH are called with the right parameters. To raise the recognition rate
for batch files by a few percent, an additional TrID definition file would
be needed for every such specific case. That is annoying, hard and
inefficient work.
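
A tally like the one above can be produced with a short Python sketch. This
is not the way my numbers were made, just one possible way. The function
names are hypothetical, and decoding as latin-1 is an assumption so that any
byte value is accepted.

```python
from collections import Counter
import re


def first_token(data: bytes) -> str:
    """Return the first command word of a batch file, upper-cased."""
    text = data.decode("latin-1").lstrip()
    if text.startswith("::"):
        return "::"
    # An optional at sign followed by the command name.
    match = re.match(r"@?[A-Za-z0-9_]+", text)
    return match.group(0).upper() if match else ""


def tally_first_tokens(samples) -> Counter:
    """Count how often each first token occurs in an iterable of file contents."""
    return Counter(first_token(data) for data in samples)


if __name__ == "__main__":
    demo = [b"@REM x\r\n", b"@rem y\r\n", b"SET A=1\r\n", b":: note\r\n"]
    for token, count in tally_first_tokens(demo).most_common():
        print(count, token)
```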

It would be nice if all batch file writers started their command text with
an @ECHO directive or remark lines.

With best wishes
Jörg Jenderek

Mark0

Re: 3 bat*.trid.xml for many batch files *.bat *.cmd *.btm
« Reply #1 on: October 22, 2020, 02:26:05 AM »
I think that these filetypes are too free-form to be identified, like most source code.
Maybe just the DOS .BAT with just @ECHO or @echo.
Will take a look!

Thanks!