Recent Posts

Pages: 1 ... 6 7 [8] 9 10
71
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 25, 2024, 03:40:46 PM »
Updated:
  • Construct 2 Project (XML) (CAPROJ)
  • Maya ASCII Scene (MA)
  • Maya Binary Scene (32bit) (MB)
  • Maya Binary Scene (64bit) (MB)
  • Nuke script (NK)
  • QLWA hard disk image (WIN)
Added:
  • QubIDE disk image (BIN)
  • Construct 3 Add-On (C3ADDON)
  • Construct 3 Project/Package (C3P)
  • Construct 3 Project (JSON) (C3PROJ)
  • Whacker Tracker audio Driver (DRV)
  • MilkDrop preset (v2) (MILK)
  • MilkDrop preset (v3) (MILK)
  • MilkDrop double-preset (v3) (MILK2)
  • MilkDrop Shape (SHAPE)
  • MilkDrop Wave (WAVE)
  • Construct 2 UI state (XML) (XML)
  • Construct 2 layout (XML) (XML)
72
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 24, 2024, 12:46:50 AM »
Updated:
Added:
  • PotPlayer Skin (DSF)
  • Dramatica story (generic) (DSF/DR5)
  • Fade In document (FADEIN)
  • Kingsoft Antivirus data (FSG)
  • HueForge Project (HFP)
  • Kingsoft Antivirus component install info (KID)
  • KMPlayer Skin File (KSF)
  • Kingsoft Antivirus data (KSG)
  • WebArt Designer graphics (MIF)
  • Kingsoft Antivirus data (PSG)
  • Amiga Symbolizer Function Definition (SFD)
  • HueForge STereoLithography (binary) (STL)
  • MakerWorld STereoLithography (binary) (STL)
  • Kingsoft Antivirus data (VSG)
  • Open Screenplay Format document (generic) (XML)
73
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 22, 2024, 02:22:54 PM »
Updated:
  • PureBasic library ()
  • PureBasic resident data (Amiga) ()
  • SPIP ASCII export SPM text (ASC/TXT)
  • Hamamatsu NanoZoomer Digital Pathology Images Set (NDPIS)
  • PureBasic Resident data (Win/Linux/OS X) (RES)
  • SoundFont 1.0 (SBK)
  • SoundFont 2.0 (SF2)
  • sfArk compressed SoundFont (SFARK)
Added:
  • 3DX 3D model (ASC/TXT)
  • 16bit COM self displaying KrisCard (Jr, v1.2) (COM)
  • 16bit COM self displaying KrisCard (v2.0) (COM)
  • DIZ2EXE Win32 executable (v1.0) (EXE)
  • CRI FMS text strings data (FMS)
  • HP 48 binary (generic) (HP48/48P)
  • HP 49 binary (generic) (HP49/49P)
  • Karaoke track info (KRK)
  • KRadio Preset (KRP)
  • KrisCards card/template (KRS)
  • PureHELPMaker Project (PHM)
  • PureBasic IDE exported Preferences (PREFS)
  • RivaTuner data base Build (RTB)
  • Rusted Warfare Saved game (RWSAVE)
  • Sprout game data archive (SAF)
  • Scheduling Export File (SEX/RPT)
  • Urban Chaos Story Script (STY)
  • CrossTie profile (TIE)
  • Panda Security System vulnerabilities info (XML)
  • Panda Security System vulnerabilities info (UTF-8) (XML)
74
Hello trid users,

A few days ago I looked at the contents of an exotic CD-ROM. It also holds
samples that are misidentified. The samples have the TFM file name suffix.

Other variants exist, so in this session I will handle only the variant with
lh=0x40. I forgot to attach the latest definition, tfm-tex-0x40.trid.xml,
so it is now in tfm-tex-0x40.trid.zip.

I thought I had found all variants. Looking at TFM samples on Windows, I
was done after about a dozen. But looking at samples on Linux Mint
(version 21.3) I found some more variants, though at most 30 as far
as I can see. A problem for me was that tridscan does not work with
paths on Mint, so I have to transfer the TFM files to a Windows system or
into a sample directory and then run the scan. The results I have so far
are summarized in the following table:

definition or lh                    #files    #files    #files   "name"
                                      trid   windows      mint
tfm-tex-0x12.trid.xml                  647     12328     11634   data~2
tfm-tex-0x11.trid.xml                  170       245      1431   data~1
tfm-tex-0x02.trid.xml                  314        11       392   data~4
tfm-tex-0x15.trid.xml                   69        81        21   data~3
tfm-tex-0x40.trid.xml                   46        14        63   data~8
tfm-tex-0x40-foo.trid.xml               28   include   include   data~8
tfm-tex-0x78.trid.xml                   24        25        31   data~7
01h                                      9         9      1603   data~5
21h                                      8         8        17   data~6
75h                                                0         2   data~9
79h                                                0        18   data~10
2ah                                                0        12   data~11
23h                                                0       196   data~12
247h                                               0        14   data~13
45h                                                0        14   data~14
272                                                0        14   data~15
77h                                                0        98   data~16
5bh                                                0        14   data~17
38h                                                0        98   data~18
e0h                                                0       296   data~19
24h                                                0         4   data~20
126h                                               0       296   data~21
33h                                                0         4   data~22
17h                                                0       258   data~23
71h                                                0         4   data~24
32h                                                0         6   data~25
2dh                                                0         4   data~26
30h                                                0         2   data~27
36h                                                0         4   data~28
92h                                                0         2   data~29
5ah                                                0         2   data~30
else $RECYCLE.BIN $IX19EC1.tfm           -         1
sum                                   1315     12722     16554

The "name" column lists what I currently call each variant in my patched file
command (see the attached file.tmp in the output).

As always, you have to weigh the pros and cons.

Samples described as "data~2" alone give a recognition rate of about 70
percent (on the Linux Mint samples), and the variant described as "data~1"
adds about 8 percent. At first glance a combined rate of about 80 % sounds
great.
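
The rates quoted here can be recomputed from the "mint" column of the table.
The following is only a small Python sketch of that arithmetic, not part of
the original tooling; the variable and function names are made up for
illustration, and the counts are copied from the table.

```python
# Sample counts copied from the "mint" column of the table.
mint_counts = {"data~2": 11634, "data~1": 1431, "data~5": 1603}
mint_total = 16554  # "sum" row, "mint" column


def share(count, total):
    """Return the share of `count` in `total` as a percentage."""
    return 100.0 * count / total


# Share of each variant among all TFM samples found on Linux Mint.
rates = {name: share(n, mint_total) for name, n in mint_counts.items()}

# Combined share of the two most common variants.
top_two = rates["data~2"] + rates["data~1"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} %")
print(f"data~2 + data~1: {top_two:.1f} %")
```

The same function can be applied to the "windows" column to see how much the
picture depends on the system the samples come from.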

If you only want to keep the "most important" variants, then at
least "data~5", with samples in the thousands (1603), must be kept as well.

Then you must look at the aim of the TrID utility. One purpose is
"restoration". If, after a file system crash, you have lost the
mapping of files to their names and directories and are left with
ten thousand files, then in the case of TFM about two thousand of them
remain "unrecognized". So this rate is still too low to "restore" your
system, for example.

I do not hate gamers, but before you add definitions for games like GTA
or exotic audio formats like "PC9801 rip" (m-mod.trid.xml), you should
first make sure that the files of the operating system and of
important components like the web browser are already
described. So you must ask how relevant the TeX system is compared
with games, for example. Some decades ago, when I was studying, all
scientific institutes at my university were using TeX/LaTeX. It was the only
software that could handle formulas and did not cost money like the
products from Adobe or Apple. Today many office suites can also
do this work, but older publications were typeset with TeX. So in my
opinion this system has a higher relevance than games or similar.

As with virus scanners, wrong results are annoying. For many
samples I get, with low priority, the description "Adobe PhotoShop
Brush" from abr.trid.xml. If you keep such an unreliable definition, why
refuse to add definitions for TFM?

The next question is how much effort it takes to describe TFM
samples. It is not complicated and not rocket science; it is just
hard work and a little knowledge. The main classification is done by the
data header size (the lh value). Unfortunately this field is only 16 bits
wide, so recognition based on this field alone is not reliable. When
considering all 24 bytes of the header I get more patterns, such as values
being below 256, which may not always hold but set a TFM definition apart
from other file formats. The lh value determines the size of the data
header that follows. So you may add patterns describing the coding, the font
family name and the seven-bit-safe byte. Then the definition should be unique
enough, and the effort for TFM files is manageable.
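
To make the classification by lh value concrete, here is a minimal Python
sketch that counts TFM files per lh value. It is not the tooling used for the
table; the function names are hypothetical, and it only assumes that lh is the
second big-endian 16-bit integer of the file (offset 2), as described for the
TFM header.

```python
import struct
from collections import Counter
from pathlib import Path


def lh_of(head):
    """Return the lh field (header length in words) or None if too short."""
    if len(head) < 4:
        return None
    # lf and lh are the first two big-endian unsigned 16-bit integers.
    _lf, lh = struct.unpack(">HH", head[:4])
    return lh


def tally_lh(heads):
    """Count how often each lh value occurs in an iterable of file heads."""
    return Counter(lh for lh in map(lh_of, heads) if lh is not None)


def heads_below(directory):
    """Yield the first 4 bytes of every *.tfm file below `directory`."""
    # Note: the "*.tfm" pattern is case-sensitive on most Linux file systems.
    for path in Path(directory).rglob("*.tfm"):
        with open(path, "rb") as fh:
            yield fh.read(4)
```

A call like `tally_lh(heads_below("samples")).most_common()` (with "samples"
standing in for your own directory) lists the lh values from most to least
frequent.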

You may complain that these are so many definitions, but in the end
30 definitions, each covering with some hard work only about 24 bytes, give
720 bytes in total that describe all TeX TFM samples with a rate of about
100% (at least on a typical Linux Mint system).

This is also important for another aspect that TrID serves:
"security", as provided by VirusTotal for example. If you only have a
classification rate of about 80 %, then 20 % of the files are unclassified
and must be considered a potential location for malicious code.

So I will continue and deliver definitions for all 30 variants.

With best wishes
Jörg Jenderek
75
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 19, 2024, 12:06:36 AM »
Updated:
  • DK Multimedia Animation (22kHz audio) (ANI)
  • Gwyddion XYZ field SPM data (v1.0) (GXYZF)
  • PC9801 rip (M)
Added:
  • AIX Common Object File Format (COFF) executable ()
  • SPIP ASCII export SPM text (ASC)
  • WITec exported text data (ASC)
  • Wyko OPD ASCII data (ASC)
  • Omicron flat SPM data (_FLAT)
  • Quesant AFM data (AFM)
  • Thermicroscopes SPMLab FP data (FLT)
  • Unisoku SPM data Header (HDR)
  • Omicron MATRIX SPM image data (MTRX)
  • Omicron MATRIX SPM parameter data (MTRX)
  • Wyko OPD binary data (OPD)
  • Nanosurf PLT SPM text data (PLT)
  • Sensofar PLUx SPM data (PLUX)
  • Pacific Nanotechnology Nano-R SPM data (PNI)
  • SymPhoTime TTTR data (v2.0) (PT3)
  • FEI Tecnai imaging and analysis (S)TEM data (SER)
  • Surface Imaging Systems SPM data (SIS)
  • Shimadzu SPM File Format (binary) (SPH/SPP)
  • Gwyddion STereoLithography (binary) (STL)
  • HyperStudio Stack (Mac) (STK)
  • mtronix Stream (STM)
  • WinSTM data (STM)
  • Molecular Imaging STP SPM data (STP)
  • Surf SPM data (SUR)
  • WSxM SPM data (TOM/STP/TOP)
  • Shimadzu SPM File Format (ASCII) (TXT)
  • OpenGPS X3P (ISO 5436-2) (X3P)
  • VCL Style (v2.0) (VSF)
  • Renishaw WiRE Data (WDF)
  • WITec Project data (WIP)
  • WHDLoad DB (XML)
76
Thanks for the updated PC9801 rip def!
For the TFM, see my reply to the other, latest post.
77
Thanks, but it seems that the actual latest *.trid.xml file is missing (there are only a lot of backups).

Also, I see that there seem to be really a lot of possible variants, sometimes with very few patterns/differences...
Maybe it would be better to just keep the 2 or 3 most common ones, if you can identify them.
78
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 16, 2024, 01:59:48 PM »
Updated:
  • Olympus RAW format (FluoView) (OIR)
Added:
  • Mac ICOL colors LUT ()
  • Mac PICT bitmap ()
  • Veeco D3100 data (old) (001/002/003)
  • attocube Systems 2D data (ASCII) (ASC)
  • Hitachi AFM data (AFM)
  • BCR-STM SPM data (16bit int) (BCR)
  • BCR-STM SPM data (16bit int, UTF-16) (BCR)
  • BCR-STM SPM data (32bit FP) (BCRF)
  • BCR-STM SPM data (32bit FP, UTF-16) (BCRF)
  • Createc SPM data (16bit int) (DAT)
  • Createc SPM data (32bit FP compressed) (DAT)
  • Createc SPM data (32bit FP) (DAT)
  • Zygo MetroPro SPM data (var.1) (DAT)
  • Zygo MetroPro SPM data (var.2) (DAT)
  • Zygo MetroPro SPM data (var.3) (DAT)
  • EPOC/Symbian Library (DLL)
  • Nano Measuring Machine profile data (DSC)
  • Nanosurf EZD SPM data (EZD/NID)
  • MicroProf FRT profilometry data (FRT)
  • Danish Micro Engineering General Data Exchange Format (GDF)
  • Gwyddion XYZ field SPM data (v1.0) (GXYZF)
  • Danish Micro Engineering Rasterscope SPM data (IMG)
  • MapVue profilometry data (intensity) (MAP)
  • MapVue profilometry data (phase) (MAP)
  • NT-MDT SPM data (MDT)
  • Molecular Imaging MI image SPM data (MI)
  • Molecular Imaging MI spectroscopy SPM data (MI)
  • Danish Micro Engineering MIF SPM data (MIF)
  • Nanoeducator SPM data (MSPM)
  • Nanonics NAN SPM data (NAN)
  • Nanomagnetics SPM data (v3) (NMI)
  • Nanomagnetics SPM data (v5) (NMI)
  • Olympus RAW format (OIR)
  • Dektak OPDx profilometry data (OPDX)
  • Anfatec SPM Parameters (PAR)
  • Hitachi SEM data (SEM)
  • ISO 28600:2011 SPM data transfer format (SPM)
  • Veeco Nanoscope II SPM data (SPM)
  • Veeco Nanoscope III SPM data (binary) (SPM)
  • Veeco Nanoscope III SPM data (text) (SPM)
  • Veeco Nanoscope III SPM force data (binary) (SPM)
  • Nanonis SXM data (SXM)
  • Keyence VK3 profilometry data (VK3)
  • Keyence VK4 profilometry data (VK4)
  • Keyence VK6 profilometry data (VK6)
  • NanoScan SPM data (XML)
79
Hello trid users,

A few days ago I looked at the contents of an exotic CD-ROM. It also holds
samples that are misidentified. The samples have the TFM file name suffix.

Other variants exist, so in this session I will handle only the variant with
lh=0x02. I will explain later what this means.

I found such samples after installing MiKTeX version 23.12 on Windows. On
Linux Mint 21.3 I found such samples as part of packages (like texlive-base
texlive-fonts-recommended texlive-lang-greek texlive-lang-other
texlive-latex-extra texlive-music texlive-pictures texlive-science).

So I ran the trid utility on my TFM samples with lh=0x02. The samples are not
recognized. Many are described as "Unknown!". For the others I get many
different descriptions, all of them wrong (see the attached trid-v-old.txt in
the output). I have many hundreds of such TFM samples. It took some time to
collect a dozen non-TFM samples corresponding to the misidentified TFM
samples. Many are described as "Adobe PhotoShop Brush" by abr.trid.xml. A few
are described as "Commodore 128 BASIC V7.0 program" by prg-c128.trid.xml or as
"Commodore 128 BASIC V7.0 program (graph mode on)" by prg-c128-gfx.trid.xml.
A few (like cmman.tfm gen9.tfm) are described as "MacBinary 1" by
macbinary-1.trid.xml. A few samples (like yarborn.tfm) are described as
"PC9801 rip" by m-mod.trid.xml.

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here, too, most of the samples
are not recognized. Samples with the bin suffix (like ttcomp-bin-4k.bin)
are therefore described as "Binary File" by PUID fmt/208. Samples with the m
suffix (like DIES_13.M EVE_18.M EVE_19A.M EVE_26.M) are therefore
described as "MATLAB Script File" by PUID fmt/1678.

For comparison I also ran the file command (version 5.45) on such
samples. Here none of these samples are recognized and described as "TeX
font metric data". A few samples with the M suffix (like EVE_18.M) are
misidentified as "TeX font metric data". A few samples (like cmfibs8.tfm
fcitt12.tfm rgrbf10.tfm) are described as "executable" for the "MIPS" or
"amd 29k" architecture. A few samples (like cmrgrsl10.tfm yrcmex10.tfm) are
described as "object" for the "Tower/XP" architecture. This behaviour does
not improve without the keep-going option of the file command (see the
attached file-5.45.txt in the output). For the TFM samples the MIME type
application/x-tex-tfm is not shown (see the attached file-i-5.45.txt in the
output), and no file name suffix is shown (see the attached
file-ext-5.45.txt in the output).

Luckily I found a page about TeX Font Metrics on the File Formats Archive
Team web site and on Wikipedia. I use the former because it also mentions the
Wikipedia link and furthermore lists links to download samples. So
the reference URL in the new definition is expressed by a line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/TeX_Font_Metrics</RefURL>

So I ran tridscan on my samples to generate tfm-tex-0x02.trid.xml. Afterwards
I tried to understand the generated constructs and checked whether they
always hold. According to the specification, the six-word (24-byte) file
header contains twelve unsigned 16-bit integers that describe general TFM
characteristics (the length of the file, the range of character codes
contained in the font, and the size of each of the tables). I patched the
file command according to the specification (see the attached file.tmp in
the output).

The specification lists some formulas like:
    bc-1 <= ec <= 255
    ne <= 256
    lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np
That means that at least two fields (bc, ec) are always below 256, and ne is
at most 256. Because the fields are stored in big-endian format, the upper
byte of bc and ec is always nil, and that of ne is nil unless ne is exactly
256. Apparently nearly all the others of these twelve fields are also below
256, so at even offsets we have nil bytes. That is expressed by XML
constructs like:
   ...
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>20</Pos>
   </Pattern>
   <Pattern>
      <Bytes>00</Bytes>
      <Pos>22</Pos>
   </Pattern>
   <Pattern>

The only exceptions are the number of words in the lig/kern table (nl) and
the file length (lf). The former is stored at offset 16 as field nl and is
sometimes bigger than 255. The file length is also sometimes bigger than 255
(like casyll10.tfm cmfibs8.tfm fcbx10.tfm fcitt12.tfm mrgrsl10.tfm
rgrbf10.tfm wasysl10.tfm yrcmex10.tfm). That value is stored at offset 0 as
field lf in word units. Multiplying this value by 4 gives the file size in
bytes.
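
As an illustration of these header rules, here is a minimal Python sketch that
decodes the twelve header fields and applies the formulas quoted from the
specification, plus the file size check (4 * lf). The function names are
hypothetical; the field order lf, lh, bc, ec, nw, nh, nd, ni, nl, nk, ne, np
is taken from the TFM specification. It is only a plausibility check and does
not replace a real tool such as tftopl.

```python
import struct

# The twelve header fields in file order, two bytes each, big endian.
FIELDS = ("lf", "lh", "bc", "ec", "nw", "nh", "nd", "ni", "nl", "nk", "ne", "np")


def parse_tfm_header(data):
    """Decode the twelve big-endian unsigned 16-bit header fields."""
    if len(data) < 24:
        raise ValueError("a TFM file needs at least a 24-byte header")
    return dict(zip(FIELDS, struct.unpack(">12H", data[:24])))


def looks_like_tfm(data):
    """Apply the consistency rules quoted from the specification."""
    try:
        h = parse_tfm_header(data)
    except ValueError:
        return False
    n_chars = h["ec"] - h["bc"] + 1
    expected_lf = (6 + h["lh"] + n_chars + h["nw"] + h["nh"] + h["nd"]
                   + h["ni"] + h["nl"] + h["nk"] + h["ne"] + h["np"])
    return (h["bc"] - 1 <= h["ec"] <= 255
            and h["ne"] <= 256
            and h["lf"] == expected_lf
            and len(data) == 4 * h["lf"])  # lf counts 4-byte words
```

Passing the full content of a file, for example `looks_like_tfm(open(name,
"rb").read())`, returns True only when all four conditions hold.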

Whether samples are real TFM files can be verified by running a command line
tool like:
   tftopl yarborn.tfm yarborn.pl

Now comes the interesting part. At offset 2 is the length of the header data
in word units. For some of my inspected TFM samples this value is 2
(=0x02). The samples in this session all have this value. Together with the
nil upper byte of bc (the first character code in the font), this significant
part is expressed by an XML construct like:
   <Bytes>000200</Bytes>
   <Pos>2</Pos>

As far as I can see there are fewer than a dozen variants with other lh
values. In the other variants the lh value differs only slightly. I will
handle the other variants in a future session.

When the header size is 2, the header has only two elements: header[0] is a
32-bit check sum, and header[1] is the design size of the font (a fix_word in
units of TeX points). So in this variant there is no header[2..11] (coding
name) and no header[12..16] (font family name), and these samples apparently
contain no ASCII-like strings.
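
A minimal Python sketch of reading these two header words, under the
assumption (from the TFM specification, not stated in this post) that a
fix_word is a signed 32-bit big-endian number with 20 fractional bits. The
function name is hypothetical.

```python
import struct


def decode_short_header(data):
    """Return (checksum, design_size_pt) for a TFM file with lh == 2."""
    lh = struct.unpack(">H", data[2:4])[0]
    if lh != 2:
        raise ValueError(f"expected lh == 2, got {lh}")
    # The header data starts right after the 24-byte (six-word) file header:
    # header[0] is an unsigned 32-bit check sum, header[1] a signed fix_word.
    checksum, raw_size = struct.unpack(">Ii", data[24:32])
    # Assumption: a fix_word has 20 fractional bits, so divide by 2**20.
    return checksum, raw_size / 2**20
```

For a font with a design size of 10 pt, header[1] would hold the raw value
10 * 2**20 under this assumption.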

With the new definition all TFM samples with header size 0x02 are now
recognized and described (see the attached trid-v-new.txt and trid-new.txt in
the output). The definition is "good" in that it does not misidentify non-TFM
samples. Unfortunately, for some TFM samples the description as TeX Font
Metric is not the first one. The main reason is that the significant
characteristic is only the 16-bit lh value.

Luckily I found a page about audio samples with the m suffix on the File
Formats Archive Team web site, so I use it. The reference URL in the
definition is expressed by lines like:

 <RefURL>
 http://fileformats.archiveteam.org/wiki/Professional_Music_Driver_PMD
 </RefURL>

TrID definitions, some samples and output are stored in the archive
tfm_0x02.zip. I hope that my definition can be used in a future version of
triddefs. As mentioned, other variants of TFM exist. I will try to handle
these in a future session.

With best wishes
Jörg Jenderek
80
Definitions DB change log / Re: Current - Year 2024
« Last post by Mark0 on April 13, 2024, 07:53:34 PM »
Updated:
  • CompDisk compressed disk image ()
  • Astound / Animation Works Movie (AWM)
  • Csound unified file format (CSD)
  • Lynx archive (LNX)
  • Simulation Description Format (SDF)
  • XMILE Model (STMX)
  • Transport Neutral Encapsulation Format (TNEF/DAT)
Added:
  • Alicona 3D image (AL3D)
  • Ambios AMB (AMB)
  • Ambisonic B-Format audio (AMB)
  • Fox Engine DeForm (DFRM)
  • Fox Engine Model (FMDL)
  • Fox Engine Texture (FTEX)
  • Fox Engine Form Variation (FV2)
  • Gwyddion Simple Field (v1.0) (GSF)
  • Gwyddion Container (GWY)
  • gfxboot compiled HTML Help (main) (HLP)
  • gfxboot compiled HTML Help (opt) (HLP)
  • Sibelius Scorch (SCO)
  • TsiLang binary translation data (SIB)
  • TsiLang translation data (SIL)
  • ANDOR SIF (SIF)
  • 3shape STereoLithography (binary) (STL)
  • TeX Font Metric (0x78) (TFM)
Deleted:
  • gfxboot compiled html help (HLP)
  • Csound Score (SCO)