Topic: updated ark-cpio*.trid.xml for CPIO archive + 3 variants for portable

jenderek

Hello TrID users,

Some days ago I looked at the installation medium of my last Linux
installation, which was a SUSE Leap. There I looked at the files in the
boot/x86_64/loader directory and found a file named bootlogo.

When running the TrID command on such samples and on other cpio archives,
I get reasonable-looking output (see appended output/trid-v-old.txt).
TrID recognizes all samples and describes 3 different variants. As related
URL it lists the cpio page on the GNU web site. That is triggered inside the
definition by a line like:
   <RefURL>http://www.gnu.org/software/cpio/cpio.html</RefURL>

That is too specific, because there are also other cpio programs that
behave a little bit differently. During my tests I also used the 7-Zip
archive program to handle cpio archives with the type switch -tcpio (see
appended output/7z-l.txt). So I replaced the GNU cpio URL. I could have
chosen the cpio page on Wikipedia instead, but in the end I chose the cpio
page on the File Formats wiki of the Archive Team web site. This is now
expressed inside the definition by the updated line:
   <RefURL>http://fileformats.archiveteam.org/wiki/Cpio</RefURL>
Why did I choose that web site? It also links to the Wikipedia page, but in
addition it lists samples to download, suitable software to handle such
archives, and identification patterns, which are relevant for TrID
definitions.

TrID also shows only the generic MIME type application/octet-stream, but
according to the above web site a user-defined type is used. That is now
expressed by the changed line:
   <Mime>application/x-cpio</Mime>
The file command shows the same MIME type with the -i option (see appended
output/file-i-5.44.txt).

The Wikipedia page also mentions another file identification scheme,
called Uniform Type Identifier (UTI), which is mainly pushed by Apple. There
"public.cpio-archive" is used for these archives. I am not a prophet and
cannot say which system will win. In the worst case the worst system wins
(like "VHS" versus "Beta" in the video recorder format war). And since
Putin's war against Ukraine we should be prepared for all evil cases. The
same applies to file identification methods. I think it would not be
difficult to implement this Apple-driven identification method.

The standard suffix apparently is cpio. That is expressed inside the
definition by the line:
   <Ext>CPIO</Ext>
But I also found samples like message.cpi or cinema.cpi with the 3-letter
suffix cpi. The reason is not explained, but I assume it comes from the
8.3 file name limitation of DOS FAT.
On Unix-like systems the file type is not indicated by the file name
suffix. So here I find samples without any suffix, like bootlogo or pcmcia.
But I also found samples like VOL.000.008 or VOL.000.012, where the suffix
consists of digits. I assume these come from a backup procedure that
numbers the backup parts.
But I am not sure about these additional name extensions. The file command
(version 5.44) shows no file name suffix with its extension option (see
appended output/file-ext-5.44.txt). For comparison I also ran the file
format identification utility DROID (see
https://sourceforge.net/projects/droid/). It only recognizes the binary
variant, as in the samples fmt-635-signature-id-960.cpio and bootlogo,
which TrID describes as "CPIO archive (binary)" via ark-cpio-bin.trid.xml.
DROID describes them as "CPIO" with PUID fmt/635 and shows no MIME type.
Also only the suffix CPIO is considered valid (EXTENSION_MISMATCH is false
only for it), so the samples bootlogo and message.cpi are marked as "bad".
Because of this uncertainty I leave only CPIO in the extension line.
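For readers who want to check the binary variant by hand: according to the
cpio(5) man page, its 26-byte header starts with the 16-bit value 070707
(octal, i.e. 0x71C7), written in the byte order of the machine that created
the archive. A minimal Python sketch of that test, assuming only this rule
(the function name binary_cpio_byte_order is my own, not part of TrID or
DROID):

```python
import struct
from typing import Optional

# Per cpio(5): the old binary header begins with the 16-bit magic 070707 (octal),
# stored in the native byte order of the writing machine.
BINARY_MAGIC = 0o070707  # == 0x71C7


def binary_cpio_byte_order(data: bytes) -> Optional[str]:
    """Return 'little' or 'big' if data starts with the binary cpio magic, else None."""
    if len(data) < 2:
        return None
    # "<H" / ">H" read an unsigned 16-bit value in little- / big-endian order.
    if struct.unpack("<H", data[:2])[0] == BINARY_MAGIC:
        return "little"
    if struct.unpack(">H", data[:2])[0] == BINARY_MAGIC:
        return "big"
    return None
```

On disk the magic therefore appears as C7 71 (little-endian writer) or
71 C7 (big-endian writer), so a byte pattern for this variant may need to
cover both orders.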

Now comes a surprising part. For samples that TrID describes as "CPIO
archive (portable)" via ark-cpio.trid.xml, the file command does a further
level of sub-classification. The common level is expressed by the phrase
"ASCII cpio archive" (see appended output/file-5.44.txt).
The specifications can be found, for example, in the cpio(5) format man
page. All these variants start with the 5-byte string 07070. That is
expressed inside ark-cpio.trid.xml by an XML construct like:
   <Bytes>3037303730</Bytes>
   <ASCII> 0 7 0 7 0</ASCII>
   <Pos>0</Pos>
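The <Bytes> element holds the pattern in hex, while <ASCII> shows the same
bytes as text. A small Python sketch, assuming nothing beyond the pattern
above (is_portable_cpio is a hypothetical helper name), that decodes the hex
string and applies it at offset 0:

```python
# Hex pattern copied from the <Bytes> element; decodes to the text "07070".
PORTABLE_PREFIX = bytes.fromhex("3037303730")


def is_portable_cpio(data: bytes) -> bool:
    """True if data starts (at <Pos>0</Pos>) with the prefix shared by all ASCII cpio variants."""
    return data.startswith(PORTABLE_PREFIX)
```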

The first subclass variant is described by the new ark-cpio-odc.trid.xml.
Its characteristic pattern is expressed by an XML construct like:
   <Bytes>303730373037</Bytes>
   <ASCII> 0 7 0 7 0 7</ASCII>
   <Pos>0</Pos>
This variant is called "ASCII cpio archive (pre-SVR4 or odc)" by the file
command, "cpio/odc" by GNU cpio and "Cpio/Portable ASCII" by 7-Zip, so I
mention these different naming schemes in the remark line. In the TrID
definition I now express this by the line:
   <FileType>CPIO archive (portable old)</FileType>
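Behind the 070707 magic, cpio(5) describes a fixed 76-byte header in which
every field is an octal number written as ASCII digits; the pathname
(namesize counts its trailing NUL) and then the file data follow without
padding. A Python sketch of decoding such a header, based only on that
layout (parse_odc_header and the field names are my own labels):

```python
# Field names and widths of the odc header as listed in cpio(5).
ODC_FIELDS = [
    ("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6), ("uid", 6),
    ("gid", 6), ("nlink", 6), ("rdev", 6), ("mtime", 11),
    ("namesize", 6), ("filesize", 11),
]
ODC_HEADER_SIZE = sum(width for _, width in ODC_FIELDS)  # 76 bytes


def parse_odc_header(data: bytes) -> dict:
    """Decode one odc header; raise ValueError if the 070707 magic is missing."""
    if len(data) < ODC_HEADER_SIZE or not data.startswith(b"070707"):
        raise ValueError("not an odc cpio header")
    header = {}
    offset = 0
    for name, width in ODC_FIELDS:
        raw = data[offset:offset + width]
        # Keep the magic as text; every other field is octal ASCII.
        header[name] = raw.decode("ascii") if name == "magic" else int(raw, 8)
        offset += width
    return header
```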

The second subclass variant is described by the new ark-cpio-newc.trid.xml.
Its characteristic pattern is expressed by an XML construct like:
   <Bytes>303730373031</Bytes>
   <ASCII> 0 7 0 7 0 1</ASCII>
   <Pos>0</Pos>
This variant is called "ASCII cpio archive (SVR4 with no CRC)" by the file
command, "cpio/newc" by GNU cpio and "Cpio/New ASCII" by 7-Zip, so I
mention these different naming schemes in the remark line. In the TrID
definition I now express this by the line:
   <FileType>CPIO archive (portable new)</FileType>
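The newc header described in cpio(5) is different: 110 bytes, namely the
magic followed by 13 fields of 8 hexadecimal ASCII digits each, and the
header plus pathname as well as the file data are padded to a multiple of
4 bytes. A Python sketch of that layout (parse_newc_header and padded are
hypothetical helpers, not taken from GNU cpio or 7-Zip):

```python
# The 13 header fields after the 6-byte magic, in cpio(5) order.
NEWC_FIELDS = [
    "ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
    "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check",
]
NEWC_HEADER_SIZE = 6 + 8 * len(NEWC_FIELDS)  # 110 bytes


def parse_newc_header(data: bytes, magic: bytes = b"070701") -> dict:
    """Decode one newc header (pass magic=b"070702" for the crc variant)."""
    if len(data) < NEWC_HEADER_SIZE or not data.startswith(magic):
        raise ValueError("not a newc cpio header")
    # Field i occupies bytes 6 + 8*i up to (but excluding) 14 + 8*i.
    return {
        name: int(data[6 + 8 * i:14 + 8 * i], 16)
        for i, name in enumerate(NEWC_FIELDS)
    }


def padded(length: int) -> int:
    """Round up to the next multiple of 4, the alignment newc uses."""
    return (length + 3) & ~3
```

With these helpers the next entry would start at
padded(110 + namesize) + padded(filesize) bytes after the current one.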

The third subclass variant is described by the new ark-cpio-crc.trid.xml.
Its characteristic pattern is expressed by an XML construct like:
   <Bytes>303730373032</Bytes>
   <ASCII> 0 7 0 7 0 2</ASCII>
   <Pos>0</Pos>
This variant is called "ASCII cpio archive (SVR4 with CRC)" by the file
command, "cpio/crc" by GNU cpio and "Cpio/New CRC" by 7-Zip, so I mention
these different naming schemes in the remark line. In the TrID definition
I now express this by the line:
   <FileType>CPIO archive (portable with CRC)</FileType>
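Despite the name, the "CRC" of this variant is not a real CRC: according to
cpio(5), the header layout is identical to newc, and the check field holds
the unsigned sum of all bytes of the file data, keeping only the lowest 32
bits. A Python sketch of that computation (the function names are mine, and
the choice of uppercase hex digits is an assumption):

```python
def crc_check_value(file_data: bytes) -> int:
    """c_check value of a 070702 entry: unsigned byte sum, low 32 bits only."""
    # Iterating over bytes yields ints 0..255, so sum() adds them unsigned.
    return sum(file_data) & 0xFFFFFFFF


def crc_check_field(file_data: bytes) -> bytes:
    """The same value as 8 hexadecimal ASCII digits, as stored in the header."""
    return b"%08X" % crc_check_value(file_data)
```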

If the subclass variants are used, then the definition ark-cpio.trid.xml
is no longer needed, because the 3 new definitions already give a more
detailed description.
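To illustrate why the generic definition becomes redundant, here is a
Python sketch that maps the 6-byte magic at offset 0 to the FileType
strings of the three new definitions. The dictionary and the function
classify_portable_cpio are my own illustration, not TrID code; the fallback
to the generic "CPIO archive (portable)" only fires for a sixth character
other than 7, 1 or 2, which none of the three variants covers.

```python
from typing import Optional

# Magic at offset 0 -> FileType string of the matching new definition.
PORTABLE_VARIANTS = {
    b"070707": "CPIO archive (portable old)",       # ark-cpio-odc.trid.xml
    b"070701": "CPIO archive (portable new)",       # ark-cpio-newc.trid.xml
    b"070702": "CPIO archive (portable with CRC)",  # ark-cpio-crc.trid.xml
}


def classify_portable_cpio(data: bytes) -> Optional[str]:
    """Return the most specific portable cpio description, or None."""
    if not data.startswith(b"07070"):
        return None  # not a portable (ASCII) cpio archive at all
    # Unknown sixth character: only the generic ark-cpio.trid.xml would match.
    return PORTABLE_VARIANTS.get(data[:6], "CPIO archive (portable)")
```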

With the updated and additional TrID definitions my cpio samples are still
recognized, but now with more detail (see appended output/trid-v-new.txt).
The TrID definitions, some samples and the output are stored in the archive
cpio_trid.zip. I hope that my updated XML files and the variants can be
used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: updated ark-cpio*.trid.xml for CPIO archive + 3 variants for portable
Reply #1 on: March 04, 2023, 10:44:51 PM
Thanks!