Marco Pontello's Home Page
TrIDScan

(Last updated: 20/01/25)
 

TrIDScan - Patterns scanner

TrIDScan creates new definitions to be used with TrID. You can use it to help collect new unique definitions. Here's how.

Let's say you want to create a definition for Java Class files and you have a collection of them. Put your file collection into a directory (folder) of its own. The more varied your collection in size and compression, the better the results. Run TrIDScan against the folder. That's all there is to it; the program does the rest.

 D:\TrID>tridscan.py \test\*.class

 TrIDScan/Py v2.02 - (C) 2015-2016 By M.Pontello
 
 File(s) to scan found: 4
 Scanning for patterns...
 Checking file 1/4 '\test\corewar.class'
 Checking file 2/4 '\test\hellow.class'
   Pattern(s) found: 15
 Checking file 3/4 '\test\life.class'
   Pattern(s) found: 6
 Checking file 4/4 '\test\primes.class'
   Pattern(s) found: 3
 Last pattern end at offset: 11
 Scanning for strings...
 Analyzing file 1/4 '\test\corewar.class'
   Raw strings: 0K
   String(s) found: 162
 Analyzing file 2/4 '\test\hellow.class'
 Parsing...
   Raw strings: 0K
   Checking strings...
   Filtering strings...
   String(s) found: 5
 Analyzing file 3/4 '\test\life.class'
   String(s) found: 3
 Analyzing file 4/4 '\test\primes.class'
   String(s) found: 3
 New TrID's definition written as 'newtype.trid.xml'. 

Scanning is generally fast, even for many files. It can be slow if there isn't at least one small file (under 300-400KB) and the file contents are virtually random (ZIP files, MP3, JPEG, etc.). In that case, you can disable the strings scan (the slow part of the process) with the switch "-ns". With the strings scan disabled, the scan is very fast even for a thousand files.
When finished, TrIDScan will create a file named "newtype.trid.xml" that contains the identifying details for the files you just scanned.
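
To give a feel for why a varied collection helps, here is a minimal Python 3 sketch of the general idea behind a pattern scan: keep only the byte runs that are identical, at the same offset, in every sample. It is not TrIDScan's actual algorithm; the function name `common_patterns` and the sample header bytes are made up for illustration.

```python
def common_patterns(samples, limit=None):
    """Return (offset, bytes) runs that are identical in every sample.

    Illustrative only: this is not TrIDScan's real scanning code.
    """
    if not samples:
        return []
    # Only offsets present in every sample can be compared.
    length = min(len(s) for s in samples)
    if limit is not None:
        length = min(length, limit)
    first = samples[0]
    patterns = []
    start = None  # offset where the current run of agreement began
    for pos in range(length):
        same = all(s[pos] == first[pos] for s in samples)
        if same and start is None:
            start = pos
        elif not same and start is not None:
            patterns.append((start, first[start:pos]))
            start = None
    if start is not None:
        patterns.append((start, first[start:length]))
    return patterns


# Made-up header bytes standing in for three different .class files.
samples = [
    bytes.fromhex("CAFEBABE0000003400"),
    bytes.fromhex("CAFEBABE0003002D00"),
    bytes.fromhex("CAFEBABE0000003700"),
]
for offset, run in common_patterns(samples):
    print(offset, run.hex().upper())
```

With only near-identical samples, bytes that merely happen to coincide would survive as "patterns"; each additional, different sample removes some of those accidental matches.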

You have two steps left at this point: rename the file and edit its header. In the example given you might rename the file to "java-class.trid.xml" to indicate it applies to that kind of file. Then, open the file in a text editor and make the necessary changes to the header information. The file "java-class.trid.xml" would have a form similar to:

<TrID ver="2.00">
    <Info>
        <FileType>Java Bytecode</FileType>
        <Ext>CLASS</Ext>
        <Mime></Mime>
        <ExtraInfo>
            <Rem></Rem>
            <RefURL></RefURL>
        </ExtraInfo>
        <User>Marco Pontello</User>
        <E-Mail>marcopon@nospam@gmail.com</E-Mail>
        <Home>http://mark0.net</Home>
    </Info>
    <General>
        <FileNum>4</FileNum>
        <CheckStrings>True</CheckStrings>
        <Date>
            <Year>2003</Year>
            <Month>11</Month>
            <Day>14</Day>
        </Date>
        <Time>
            <Hour>03</Hour>
            <Min>10</Min>
            <Sec>51</Sec>
        </Time>
        <Creator>TrIDScan/Py v2.02</Creator>
    </General>
    <FrontBlock>
        <Pattern>
            <Bytes>CAFEBABE00</Bytes>
            <Pos>0</Pos>
        </Pattern>
        <Pattern>
            <Bytes>00</Bytes>
            <Pos>6</Pos>
        </Pattern>
        <Pattern>
            <Bytes>00</Bytes>
            <Pos>11</Pos>
        </Pattern>
    </FrontBlock>
    <GlobalStrings>
        <String>CODE</String>
        <String>INIT</String>
        <String>JAVA</String>
    </GlobalStrings>
</TrID>
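
To make the <FrontBlock> section concrete, the following Python 3 sketch reads the patterns from a definition and checks whether some data contains them. It assumes that <Bytes> is a hex string and <Pos> is an offset from the start of the file; how TrID itself weighs patterns and strings is not covered here, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the definition above, front block only.
DEFINITION = """<TrID ver="2.00">
    <FrontBlock>
        <Pattern>
            <Bytes>CAFEBABE00</Bytes>
            <Pos>0</Pos>
        </Pattern>
        <Pattern>
            <Bytes>00</Bytes>
            <Pos>6</Pos>
        </Pattern>
    </FrontBlock>
</TrID>"""


def load_patterns(xml_text):
    """Return a list of (offset, raw_bytes) from the <Pattern> elements."""
    root = ET.fromstring(xml_text)
    return [
        (int(p.findtext("Pos")), bytes.fromhex(p.findtext("Bytes")))
        for p in root.iter("Pattern")
    ]


def matches_front_block(data, patterns):
    """True if every pattern occurs in data at its stated offset (assumed semantics)."""
    return all(data[pos:pos + len(raw)] == raw for pos, raw in patterns)


patterns = load_patterns(DEFINITION)
print(matches_front_block(bytes.fromhex("CAFEBABE0000003400"), patterns))
print(matches_front_block(b"PK\x03\x04 not a class file", patterns))
```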

Edit the text between the <FileType> tags to describe the type of file; fill in the <Mime> element if the MIME type is known; correct the <Ext> element if TrIDScan has not guessed the extension correctly; and put your contact information in the <User>, <E-Mail>, and <Home> elements. That's it.
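
Editing in a text editor is all that is needed, but if you produce many definitions the same header edit can be scripted. The Python 3 sketch below uses the standard xml.etree.ElementTree module; `fill_header`, the trimmed-down sample definition, and the contact details are all placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for a generated "newtype.trid.xml".
NEW_DEFINITION = """<TrID ver="2.00">
    <Info>
        <FileType></FileType>
        <Ext>CLASS</Ext>
        <Mime></Mime>
        <ExtraInfo>
            <Rem></Rem>
            <RefURL></RefURL>
        </ExtraInfo>
        <User></User>
        <E-Mail></E-Mail>
        <Home></Home>
    </Info>
</TrID>"""


def fill_header(xml_text, fields):
    """Return the definition with the given <Info> elements filled in.

    Keys are paths relative to <Info>, e.g. "FileType" or "ExtraInfo/RefURL".
    """
    root = ET.fromstring(xml_text)
    info = root.find("Info")
    for path, value in fields.items():
        element = info.find(path)
        if element is None:
            raise KeyError("no such header element: " + path)
        element.text = value
    return ET.tostring(root, encoding="unicode")


updated = fill_header(NEW_DEFINITION, {
    "FileType": "Java Bytecode",
    "ExtraInfo/RefURL": "http://java.sun.com/",
    "User": "Jane Doe",              # placeholder contact details
    "E-Mail": "jane@example.com",
    "Home": "http://example.com",
})
print(updated)
```

Elements that are not listed, such as <Ext> here, are left exactly as TrIDScan wrote them.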

If you plan to do many scans, you can write your contact information in a "tridscan.cfg.xml" file, and it will be inserted into all further scans by default.

<?xml version="1.0"?>
<settings>
    <User>Marco Pontello</User>
    <E-Mail>marcopon@nospam@gmail.com</E-Mail>
    <Home>http://mark0.net</Home>
</settings>
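
If you would rather generate this file than type it, a short Python 3 sketch along these lines produces the same structure. The function name `build_config` and the contact details are placeholders; where TrIDScan expects to find the file is not stated here, so the location to save it in (presumably next to tridscan.py) is an assumption.

```python
import xml.etree.ElementTree as ET


def build_config(user, email, home):
    """Return the text of a tridscan.cfg.xml with the given contact details."""
    settings = ET.Element("settings")
    for tag, value in (("User", user), ("E-Mail", email), ("Home", home)):
        ET.SubElement(settings, tag).text = value
    body = ET.tostring(settings, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


# Placeholder details; save the printed text as "tridscan.cfg.xml".
config_text = build_config("Jane Doe", "jane@example.com", "http://example.com")
print(config_text)
```

The output is written on a single line rather than indented as in the sample above; to an XML parser the two are equivalent.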

The <Rem> and <RefURL> tags provide extra information about the file type: <Rem> holds remarks and <RefURL> holds a related or reference URL. For example:
        <ExtraInfo>
            <Rem></Rem>
            <RefURL>http://java.sun.com/</RefURL>
        </ExtraInfo>

Once you create a new definition, send me a copy of the XML file to include in the database (see the Contacts page for the address).

It's also possible to "refine" a definition: scan some files and tell TrIDScan to start from an existing definition instead of starting from scratch. The result is as if you had also scanned all the files already analyzed (perhaps by other users). To use the refining function, simply add the definition to use as a starting point to the command line. The new definition replaces the previous one, which is first saved as "newtype.trid.xml.bak" (this comes in handy if for some reason you want to recover the previous definition). For example:

 D:\TrID>tridscan.py c:\dev\programs\*.class -d java-class.trid.xml 

It's also possible to force a rescan of all unique strings, for example when refining a definition made with an earlier version of TrIDScan that didn't yet support this feature. Simply use the switch "-fs" when refining.
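
For scripted runs, the switches mentioned on this page ("-ns", "-d" and "-fs") can be collected in a small helper that builds the command line. This is only a sketch: `build_tridscan_command` is a hypothetical name, and putting the switches after the file pattern simply follows the refining example above.

```python
def build_tridscan_command(file_pattern, definition=None,
                           no_strings=False, force_strings=False,
                           script="tridscan.py"):
    """Build an argument list for a TrIDScan run (hypothetical helper)."""
    command = [script, file_pattern]
    if definition is not None:
        command += ["-d", definition]   # refine an existing definition
    if no_strings:
        command.append("-ns")           # skip the strings scan
    if force_strings:
        command.append("-fs")           # force a rescan of unique strings
    return command


print(build_tridscan_command(r"\test\*.class", no_strings=True))
print(build_tridscan_command(r"c:\dev\programs\*.class",
                             definition="java-class.trid.xml",
                             force_strings=True))
```

The resulting list can be handed to subprocess.run, prefixed with the path of a suitable Python interpreter if .py files are not directly executable on your system.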

Making definitions is easy. Use TrIDScan to scan all those data files you've been creating over the years, and send in the resulting definitions. They will be added to the master database, so your work will help others identify unknown files they might have on their systems. You'll be helping everyone!

If you find TrID a worthwhile project, please tell your friends about it and this site; this is going to work much better if many people participate and produce new or better defs!

N.B.
Please be as specific as possible when you describe the file type. If you have data files from different versions of a program, try to group them by version and then create a definition for each version of the program. If, for example, you have Excel files created by a DOS version and Excel files created by a Windows version, don't just create a single Excel definition; create one for each version.

Download

 Python   TrIDScan/Py v2.02, 6KB ZIP (Python 2.7.x required)
 Python   TrIDDefsPack v1.26b, 3KB ZIP (Python 2.7.x required)
 Win32   TrIDScan v1.56, 28KB ZIP (deprecated)
 Win32   TrIDDefsPack v1.12, 25KB ZIP (deprecated)
   TrID XML defs, 2214KB 7Z (archive with 18772 definitions, 20/01/25)

 

Change Log

TrIDDefsPack/Py v1.21 - 26/01/16:
+ Rewritten in Python, released under AGPL v3.0 license
+ Support new directory structure for file definitions (.\defs\0,a-z)

TrIDScan/Py v2.01 - 16/03/15:
+ re-added sorting of files to analyze by size - could lead to major speed up in some circumstances

TrIDScan/Py v2.00 - 25/02/15:
+ Rewritten in Python, released under AGPL v3.0 license
+ Improved accuracy and (generally) speed of strings scanning
+ Can recurse subdirs
+ Added element for MIME type
- Old Win32 version now deprecated

TrIDDefsPack v1.12 - 24/02/11:
* Fixed an inconsistency with the Tag element

TrIDScan/32 v1.56 - 22/11/04:
+ Unique strings are sorted by length

TrIDScan/32 v1.55 - 20/11/03:
+ Unique strings scanning now is case insensitive
+ Possible bug fixed in the refining function

TrIDScan/32 v1.50 - 15/11/03:
+ New unique strings detection function. It could be forced with the switch "/FS", or disabled with "/NS"

TrIDScan/32 v1.23 - 13/08/03:
+ New definition refining function
+ Added section <ExtraInfo> with elements <Rem> (for some remarks) and <RefURL> (for a related/reference URL)

TrIDScan/32 v1.00.1 - 13/07/03:
+ Added element <ASCII> with an ASCII dump of the pattern