Topic: updated cpx.trid.xml for Windows codepage translator (Cyrillic) *.cpx

jenderek

Hello TrID users,

some days ago I installed an old Windows program. Out of curiosity I ran
TrID inside its installation directory.

For most examples with the CPX file name extension I get the correct first
description "Windows codepage translator" from cpx.trid.xml, and I also get a
description "Generic INI configuration" from ini.trid.xml, which is also
correct but less specific. But one example, 12510866.CPX, is described only as
Generic INI configuration, with the wrong file name extension INI (see appended
output/trid-v-old.txt).

For comparison I also ran the file utility (version 5.40). It behaves
similarly. Most examples are described as "Windows codepage translator" with
the CPX extension, but example 12510866.CPX is described only as ASCII text
with CRLF line terminators and an unknown extension (see appended
file-5.40.txt and file-extension-5.40.txt in the output directory).

Most examples start with a line like:
[Windows Latin 1/437 (English)]
[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]
[Windows Latin 1(1252)/860 (Portugal)]
[Windows Latin 1(1252)/861 (Iceland)]
[Windows Latin 1(1252)/863 (French Canada)]

This is expressed in the front block section of cpx.trid.xml by an XML
construct like:
   <Bytes>5B57696E646F7773204C6174696E20</Bytes>
   <ASCII> [ W i n d o w s   L a t i n</ASCII>
   <Pos>0</Pos>

Because all examples start with a left bracket character, they are also
identified by ini.trid.xml via an XML construct like:
   <Bytes>5B</Bytes>
   <ASCII> [</ASCII>
   <Pos>0</Pos>
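
As a rough illustration (not how TrID itself is implemented), the two front block patterns quoted above can be checked in Python. The helper name `front_block_matches` is made up for this sketch; the hex strings and the header line are the ones from this post.

```python
# Sketch only: emulate one <Bytes>/<Pos> pattern check.
# front_block_matches is a hypothetical helper, not part of TrID.

def front_block_matches(data: bytes, hex_bytes: str, pos: int) -> bool:
    """Return True if data contains the given bytes at offset pos."""
    pattern = bytes.fromhex(hex_bytes)
    return data[pos:pos + len(pattern)] == pattern

CPX_OLD = ("5B57696E646F7773204C6174696E20", 0)  # "[Windows Latin "
INI = ("5B", 0)                                   # "["

# One of the Latin header lines, with the CRLF terminator reported by file.
header = b"[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]\r\n"

cpx_hit = front_block_matches(header, *CPX_OLD)
ini_hit = front_block_matches(header, *INI)
print(cpx_hit, ini_hit)
```

Both checks succeed for a Latin header, which is consistent with TrID reporting both descriptions for those files.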

The example 12510866.CPX starts with the line:
[Windows Cyrillic(1251)/866 (Russian)]

So here the second word is Cyrillic instead of Latin. Both words have the
letter i at offset 12, so I account for this in the updated TrID definition,
where identification now happens via an XML construct like:
   <Bytes>5B57696E646F777320</Bytes>
   <ASCII> [ W i n d o w s</ASCII>
   <Pos>0</Pos>
   <Bytes>69</Bytes>
   <ASCII> i</ASCII>
   <Pos>12</Pos>
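
To see why the two-part pattern covers both cases, here is a small Python sketch that applies the old and the updated front block to header lines quoted in this post. It only imitates the byte-at-offset comparison and is not TrID's own code; `matches_all` and the pattern lists are names invented for the example.

```python
# Sketch only: emulate a front block made of several <Bytes>/<Pos> pairs.
# matches_all is a hypothetical helper, not part of TrID.

OLD_PATTERNS = [
    (0, bytes.fromhex("5B57696E646F7773204C6174696E20")),  # "[Windows Latin "
]
NEW_PATTERNS = [
    (0, bytes.fromhex("5B57696E646F777320")),  # "[Windows "
    (12, bytes.fromhex("69")),                 # "i" in "Latin" and "Cyrillic"
]

def matches_all(data: bytes, patterns) -> bool:
    """Return True if every (offset, bytes) pair is found in data."""
    return all(data[pos:pos + len(pat)] == pat for pos, pat in patterns)

HEADERS = [
    b"[Windows Latin 1/437 (English)]",
    b"[Windows Latin 1(1252)/850 (Multilingual-Latin 1)]",
    b"[Windows Cyrillic(1251)/866 (Russian)]",
]

for header in HEADERS:
    old_hit = matches_all(header, OLD_PATTERNS)
    new_hit = matches_all(header, NEW_PATTERNS)
    print(header.decode("ascii"), old_hit, new_hit)
```

The old pattern misses only the Cyrillic header, while the updated pair of patterns matches all three lines.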

Because the CPX examples are simple text files, they can be described by the
generic MIME type text/plain, as the file command does (see appended
output/file-i-5.40.txt). Instead of the generic MIME type I chose a
user-defined one. This is now expressed by a line like:
   <Mime>text/x-ms-cpx</Mime>

I did at least find a page about Character Translation on the web site
documentation.basis.com, so I use it as the reference URL. This is expressed
by a line like:
   <RefURL>https://documentation.basis.com/BASISHelp/WebHelp/b3odbc/ODBC_Driver/obdcdriv_character_translation.htm</RefURL>

With the updated definition all CPX codepage translators are now described
in more detail (see appended output/trid-v.txt). The TrID definitions,
some examples and the output are stored in cpx_ini.zip. I hope that my XML
files can be used in a future version of triddefs.

There may be more non-Latin codepage translator files.

With best wishes
Jörg Jenderek

Mark0

Re: updated cpx.trid.xml for Windows codepage translator (Cyrillic) *.cpx
« Reply #1 on: August 28, 2021, 02:45:30 PM »
Thanks!