Author Topic: gxt-gta3.trid.xml gxt-gta4.trid.xml for "newer" Grand Theft Auto text data  (Read 673 times)

jenderek

Hello TrID users,

A few days ago I had to handle some patch files. Sometimes the suffix DIF is
used for them. Unfortunately, other file formats use that extension too, such
as Data Interchange Format. For such samples, together with game samples with
the GXT suffix, I got unexpected identifications.

So I ran the TrID utility on such examples with the DIF and GXT suffixes. The
DIF samples are recognized and described as "Data Interchange Format" with
MIME type text/plain by dif.trid.xml. The GXT samples are not recognized and
are described as "Unknown!" (see the appended trid-v-old.txt in output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here the DIF samples are also
recognized. Some are described as "VisiCalc Database" by PUID x-fmt/368
because of the extension. The samples with CRLF line terminators are described
as "Data Interchange Format" by PUID x-fmt/41. The DBT samples are described
as "dBASE Text Memo" by PUID x-fmt/311 based on the file name extension. The
GXT samples are not recognized.

For comparison I also ran the file command (version 5.45) on such samples.
Here the GXT samples are recognized. Some are described as "GTA in-game text
(GXT), version 3" (see the appended file-k-5.45.txt in output). Some are
described as "GTA in-game text (GXT), version 4" (see the appended
file-k-5.45.txt in v4/output). The DIF samples are also wrongly described as
version 3 GXT. The version 4 samples are also wrongly described as
"dBase IV DBT".

Luckily I found information about GXT on the gtamods web server. This is
expressed inside the new TrID definitions by a line like:
   <RefURL>https://gtamods.com/wiki/GXT</RefURL>

No MIME type is mentioned there. Because GXT samples are binary, they get the
generic MIME type. That is expressed by a line like:
   <Mime>application/octet-stream</Mime>

There already exists a TrID definition for the older version 2 of such GXT
samples. It is called gxt-gta2.trid.xml and has the description text "Grand
Theft Auto 2 text data". So the new definitions are named similarly
(gxt-gta3.trid.xml and gxt-gta4.trid.xml).

The characteristics of version 3 GXT are described uniquely by the first XML
construct. This looks like:
   <Bytes>5441424CB40300004D41494E00000000BC030000414D42554C414500</Bytes>
   <ASCII> T A B L . . . . M A I N . . . . . . . . A M B U L A E</ASCII>
   <Pos>0</Pos>
According to the documentation, the 4-byte string TABL is stored at offset 0.
This also matches the start of the TABLE string used in other matching
definitions (such as the one for DIF).
At offset 4 the size of the TABL block is stored as a 4-byte integer. Here I
get the value 0x000003b4 in my few samples. Maybe that value is different for
game mods. Afterwards come n TABL entries. The first entry at offset 8 has the
name MAIN stored in an 8-byte field. Afterwards comes the absolute offset of
the corresponding TKEY block. In my examples I get the value 0x000003bc. At
offset 20 the second entry starts. All entries are sorted in alphabetical
order, so in my examples this is AMBULAE. Apparently all 8-character fields
are right-padded with NUL bytes. At least that is the reason why these samples
are considered "binary", whereas the non-GXT samples are just plain text.
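
The layout described above can be sketched in Python roughly as follows. This
is only an illustration under assumptions: the function name parse_tabl is
made up, the integers are assumed to be little-endian (which is how the bytes
B4 03 00 00 give 0x3b4), and the two-entry test buffer is synthetic, not a
real game file.

```python
import struct

ENTRY_SIZE = 12  # 8-byte NUL-padded name + 4-byte absolute offset


def parse_tabl(data: bytes):
    """Parse the leading TABL block of a version 3 GXT buffer (sketch)."""
    if data[0:4] != b"TABL":
        raise ValueError("missing TABL magic at offset 0")
    # 4-byte block size at offset 4, assumed little-endian
    (size,) = struct.unpack_from("<I", data, 4)
    entries = []
    pos = 8
    # Stop at the end of the block or of the buffer, whichever comes first
    end = min(8 + size, len(data))
    while pos + ENTRY_SIZE <= end:
        raw_name, offset = struct.unpack_from("<8sI", data, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        entries.append((name, offset))
        pos += ENTRY_SIZE
    return size, entries


# Synthetic two-entry buffer (hypothetical offsets, not from a real file)
sample = (b"TABL" + struct.pack("<I", 24)
          + b"MAIN\0\0\0\0" + struct.pack("<I", 0x20)
          + b"AMBULAE\0" + struct.pack("<I", 0x100))
size, entries = parse_tabl(sample)
print(size, entries)
```

Run on the 28 bytes of the pattern above, the same function reports the block
size 0x3b4 and the MAIN entry with offset 0x3bc; the second entry is cut off
because the pattern ends before its offset field.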

The third entry is described by an XML construct like:
   <Bytes>00415353494E310000</Bytes>
   <ASCII> . A S S I N 1</ASCII>
   <Pos>31</Pos>
At offset 32 the name of the third entry is stored in its 8-byte field. In my
samples this is ASSIN1. The zero byte before it is included because it is the
highest byte of the preceding 4-byte offset, and the offset values never get
large enough to set it.

The last of these entries is expressed by a construct like:
   <Bytes>005441584957413300</Bytes>
   <ASCII> . T A X I W A 3</ASCII>
   <Pos>943</Pos>

Adding the block size value 3B4h to 8 (for the leading bytes) gives offset
0x3BC (956), where the next structure starts. In my examples this is
expressed by an XML construct like:
   <Bytes>00544B4559</Bytes>
   <ASCII> . T K E Y</ASCII>
   <Pos>955</Pos>
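
The positions used so far follow from simple arithmetic on the block size.
The following Python lines are only a sketch of that calculation; the constant
names are made up, and the 12-byte entry size is derived from the 8-byte name
plus the 4-byte offset described above.

```python
TABL_HEADER = 8    # 4-byte "TABL" magic + 4-byte block size
ENTRY_SIZE = 12    # 8-byte name + 4-byte offset

block_size = 0x3B4                          # value observed in the samples
entry_count = block_size // ENTRY_SIZE      # number of TABL entries
tkey_start = TABL_HEADER + block_size       # where the TKEY structure begins
third_entry = TABL_HEADER + 2 * ENTRY_SIZE  # start of the third entry name
last_entry = tkey_start - ENTRY_SIZE        # start of the last entry name

print(entry_count)                  # 79
print(hex(tkey_start), tkey_start)  # 0x3bc 956
# Each pattern starts one byte earlier, on the zero byte of the previous offset
print(third_entry - 1, last_entry - 1, tkey_start - 1)  # 31 943 955
```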

Afterwards comes the first entry of the next structure. That is expressed by
an XML construct like:
   <Bytes>00004143435552410000</Bytes>
   <ASCII> . . A C C U R A</ASCII>
   <Pos>966</Pos>
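
To try the five Bytes/Pos pairs above against a file without TrID, a simple
fixed-position comparison can be written in Python. This is only a sketch: it
is not how TrID itself evaluates or scores definitions, the function name
matches_v3 is made up, and the test buffer is synthetic filler rather than a
real GXT file.

```python
# (position, hex bytes) pairs copied from the version 3 patterns above
PATTERNS = [
    (0,   "5441424CB40300004D41494E00000000BC030000414D42554C414500"),
    (31,  "00415353494E310000"),
    (943, "005441584957413300"),
    (955, "00544B4559"),
    (966, "00004143435552410000"),
]


def matches_v3(data: bytes, patterns=PATTERNS) -> bool:
    """Return True if every pattern occurs at its fixed position."""
    for pos, hex_bytes in patterns:
        needle = bytes.fromhex(hex_bytes)
        if data[pos:pos + len(needle)] != needle:
            return False
    return True


# Synthetic buffer: zero filler with the patterns copied in
buf = bytearray(976)
for pos, hex_bytes in PATTERNS:
    needle = bytes.fromhex(hex_bytes)
    buf[pos:pos + len(needle)] = needle

print(matches_v3(bytes(buf)))            # True
print(matches_v3(b"TABLE\r\n0,1\r\n"))   # False (DIF-like text)
```

For a real sample the buffer would come from reading the file in binary mode
instead of being built by hand.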

In my samples only the message fragments differ, depending on the translation
used. The number of entries, their ordering and their labels are always the
same. I have a lot of work to do and have no time to play games, so I do not
know whether modified samples exist where this structure looks different. I
hope that other users can improve my definitions based on the observations I
described. I will not make further improvements, because I already spent some
hours searching the net for samples. That is really difficult for me, because
I am not a gamer, so most gaming phrases are like Egyptian hieroglyphs to me.

For the version 4 variant the structure is similar, but at the very beginning
comes a 16-bit version followed by a 16-bit encoding value in bits (8: ASCII,
16: UTF-16). In my few samples this is expressed by a first XML construct
like:
   <Bytes>040008005441424C</Bytes>
   <ASCII> . . . . T A B L</ASCII>
   <Pos>0</Pos>
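
The 8 header bytes of the version 4 variant can be decoded with a few lines of
Python. Again this is just a sketch: the function name parse_v4_header is made
up, and both 16-bit fields are assumed to be little-endian, which fits the
bytes 04 00 08 00 seen in the samples.

```python
import struct


def parse_v4_header(data: bytes):
    """Decode version, bits per character and the TABL magic (sketch)."""
    # Two 16-bit fields, assumed little-endian
    version, bits = struct.unpack_from("<HH", data, 0)
    if data[4:8] != b"TABL":
        raise ValueError("missing TABL magic at offset 4")
    encoding = {8: "ASCII", 16: "UTF-16"}.get(bits, "unknown")
    return version, bits, encoding


header = bytes.fromhex("040008005441424C")
print(parse_v4_header(header))  # (4, 8, 'ASCII')
```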

With the 2 new definitions my inspected GXT samples are now recognized and
described (see the appended trid-v-new.txt in output and v4/output). But for
version 4 I get only a 19.9% rate instead of 100%, which I do not understand.

The TrID definitions and output are stored in the archive gxt_dif.zip. I hope
that my definitions can be used in a future version of triddefs.

In the definition dif.trid.xml the generic MIME type text/plain is used. That
is not wrong, but here is what others use:
# https://www.pcmatic.com/company/libraries/fileextension/detail.asp?ext=dif.html
#!:mime   application/x-dif-spreadsheet   Gnumeric
# https://github.com/LibreOffice/online/blob/master/discovery.xml
#!:mime   application/x-dif-document   LibreOffice
# https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/File_formats/Lists/File_formats
#!:mime   application/x-dif
# https://extension.nirsoft.net/dif
#!:mime   application/vnd.ms-excel

So in my opinion application/x-dif-spreadsheet or text/x-dif seems to be more
suitable.

With best wishes
Jörg Jenderek

Mark0

Re: gxt-gta3.trid.xml gxt-gta4.trid.xml for "newer" Grand Theft Auto text data
« Reply #1 on: February 03, 2024, 03:43:33 PM »
Thanks for the new defs!
There are definitely too many strings & patterns, so I tried to trim both defs, and hopefully they will still work well enough.