Mark0's Forum
		Software => TrID File Identifier => Topic started by: jenderek on March 03, 2023, 02:17:54 AM
		
			
			- 
				Hello trid users,
 
 Some days ago I looked at the installation medium of my last Linux
 installation, a SUSE Leap. In the boot/x86_64/loader directory there was a
 file named bootlogo.
 
 When I run the TrID command on such examples and on other cpio archives, I
 get reasonable-looking output (see appended output/trid-v-old.txt). TrID
 recognizes all samples and describes 3 different variants. It lists the
 cpio page on the GNU web site as Related URL. That comes from a line in the
 definition like:
 <RefURL>http://www.gnu.org/software/cpio/cpio.html</RefURL>
 
 That is too specific, because there are also other cpio programs, which
 behave a little differently. During my tests I also used the 7-Zip archive
 program to handle cpio archives with the type directive -tcpio (see
 appended output/7z-l.txt). So I replaced the GNU cpio URL. I could have
 chosen the cpio page on Wikipedia, but in the end I chose the cpio page on
 the Archive Team file formats web site. This is now expressed inside the
 definition by an updated line like:
 <RefURL>http://fileformats.archiveteam.org/wiki/Cpio</RefURL>
 Why did I choose that web site? It lists the Wikipedia page as a link, but
 it also lists samples to download and suitable software to handle such
 archives. It also lists identification patterns, which are relevant for
 TrID definitions.
 
 TrID also showed only the generic MIME type application/octet-stream, but
 according to the above web site a user-defined type is used. That is now
 expressed by a changed line, which looks like:
 <Mime>application/x-cpio</Mime>
 That information is also shown by the file command with the -i option (see
 appended output/file-i-5.44.txt).
 
 The Wikipedia page also mentions another file identification type, the
 Uniform Type Identifier (UTI), which is mainly pushed by Apple. There
 "public.cpio-archive" is used for the archives. I am not a prophet and
 cannot say which system will win. In the bad case the worst system will win
 (like "VHS" versus "Beta" in the video recorder format wars). And since
 Putin's war against Ukraine we should be prepared for all bad cases. The
 same applies to file identification methods. I think it would not be
 difficult to implement this identification method pushed by Apple.
 
 The standard suffix apparently is cpio. That is expressed inside the
 definition by a line like:
 <Ext>CPIO</Ext>
 But I also found samples like message.cpi or cinema.cpi with the 3 byte
 suffix cpi. It is not explained why, but I assume this comes from the DOS
 FAT limitation to 8+3 name length.
 On Unix-like systems the file type is not indicated by the file name
 suffix. So there I find samples without a file name suffix, like bootlogo
 or pcmcia.
 But I also found samples like VOL.000.008 or VOL.000.012. Here the suffix
 consists of digits. I assume that this comes from a backup procedure that
 numbers the backup parts.
 But I am not sure about the additional name extensions. The file command
 (version 5.44) shows no file name suffix with the extension option (see
 appended output/file-ext-5.44.txt). For comparison I also ran the file
 format identification utility DROID (see
 https://sourceforge.net/projects/droid/). It recognizes only the one binary
 variant, as in samples fmt-635-signature-id-960.cpio and bootlogo, which
 the TrID command describes as "CPIO archive (binary)" via
 ark-cpio-bin.trid.xml. DROID describes these as "CPIO" with PUID fmt/635.
 No MIME type is shown. Also only the suffix CPIO is considered valid
 (EXTENSION_MISMATCH is false). So samples bootlogo and message.cpi are
 marked as "bad".
 Because of this uncertainty I leave only the 1 entry with CPIO in the line
 for name extensions.
 
 Now comes a surprising part. For samples described by TrID as "CPIO archive
 (portable)" via ark-cpio.trid.xml, the file command does a further level of
 subclassification. The common level is expressed by the phrase "ASCII cpio
 archive" (see appended output/file-5.44.txt).
 The specification can be found, for example, on the cpio(5) format man
 page. All variants start with the 5 byte string 07070. That is expressed
 inside ark-cpio.trid.xml by an XML construct like:
 <Bytes>3037303730</Bytes>
 <ASCII> 0 7 0 7 0</ASCII>
 <Pos>0</Pos>
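
 To make the relation between the Bytes and ASCII lines visible, here is a
 small Python sketch. It is not how TrID itself works internally; the
 function name pattern_matches is only made up for this illustration. It
 decodes the hex string of a Bytes element and compares it with the start of
 a file header at the given Pos.

```python
def pattern_matches(header: bytes, hex_bytes: str, pos: int = 0) -> bool:
    """Return True if the decoded hex pattern is found in header at pos.

    hex_bytes is the content of a <Bytes> element, pos the content of <Pos>.
    (Hypothetical helper, only for illustration.)
    """
    pattern = bytes.fromhex(hex_bytes)
    return header[pos:pos + len(pattern)] == pattern


# The hex string 3037303730 is just the ASCII text "07070".
print(bytes.fromhex("3037303730"))

# A header that begins with the ASCII digits 070701 matches the pattern.
print(pattern_matches(b"070701" + b"0" * 8, "3037303730", 0))
```

 The same check works for the 6 byte patterns of the subclass variants,
 because they only add one more digit to this common prefix.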
 
 The first subclass variant is described by the new ark-cpio-odc.trid.xml.
 The characteristic pattern is expressed by an XML construct like:
 <Bytes>303730373037</Bytes>
 <ASCII> 0 7 0 7 0 7</ASCII>
 <Pos>0</Pos>
 This variant is called "ASCII cpio archive (pre-SVR4 or odc)" by the file
 command, "cpio/odc" by GNU cpio and "Cpio/Portable ASCII" by 7-Zip. So I
 mention these different naming schemes in the remark line. In the TrID
 definition I now express this by a line like:
 <FileType>CPIO archive (portable old)</FileType>
 
 The second subclass variant is described by the new ark-cpio-newc.trid.xml.
 The characteristic pattern is expressed by an XML construct like:
 <Bytes>303730373031</Bytes>
 <ASCII> 0 7 0 7 0 1</ASCII>
 <Pos>0</Pos>
 This variant is called "ASCII cpio archive (SVR4 with no CRC)" by the file
 command, "cpio/newc" by GNU cpio and "Cpio/New ASCII" by 7-Zip. So I mention
 these different naming schemes in the remark line. In the TrID definition I
 now express this by a line like:
 <FileType>CPIO archive (portable new)</FileType>
 
 The third subclass variant is described by the new ark-cpio-crc.trid.xml.
 The characteristic pattern is expressed by an XML construct like:
 <Bytes>303730373032</Bytes>
 <ASCII> 0 7 0 7 0 2</ASCII>
 <Pos>0</Pos>
 This variant is called "ASCII cpio archive (SVR4 with CRC)" by the file
 command, "cpio/crc" by GNU cpio and "Cpio/New CRC" by 7-Zip. So I mention
 these different naming schemes in the remark line. In the TrID definition I
 now express this by a line like:
 <FileType>CPIO archive (portable with CRC)</FileType>
 
 If the subclass variants are used, then the definition ark-cpio.trid.xml is
 no longer needed, because the 3 new additional definitions already give a
 more detailed description.
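
 The three patterns can be put together in a short Python sketch. This is
 only an illustration of the idea and not the way TrID evaluates its
 definitions; the names ASCII_CPIO_VARIANTS and classify_cpio are made up
 here. The fallback to the generic "CPIO archive (portable)" text is also
 only my assumption, for headers that start with 07070 but have an
 unexpected sixth digit.

```python
from typing import Optional

# 6 byte ASCII magic -> file type text used in the three new definitions.
ASCII_CPIO_VARIANTS = {
    b"070707": "CPIO archive (portable old)",       # odc
    b"070701": "CPIO archive (portable new)",       # newc
    b"070702": "CPIO archive (portable with CRC)",  # crc
}


def classify_cpio(header: bytes) -> Optional[str]:
    """Classify the start of a file as one of the ASCII cpio variants.

    Returns None when the header does not start with 07070 at all
    (the binary variant, for example, is not handled here).
    """
    variant = ASCII_CPIO_VARIANTS.get(header[:6])
    if variant is not None:
        return variant
    if header[:5] == b"07070":
        # Assumption: keep the generic description for unknown sixth digits.
        return "CPIO archive (portable)"
    return None


def classify_cpio_file(path: str) -> Optional[str]:
    """Read the first 6 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_cpio(handle.read(6))
```

 For example, classify_cpio(b"070702") gives the text of the CRC variant,
 and with the three specific entries in the table the generic branch is
 only reached for a sixth digit other than 7, 1 or 2.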
 
 With the updated and additional TrID definitions my cpio examples are
 still recognized, but now described with more details (see appended
 output/trid-v-new.txt). The TrID definitions, some samples and the output
 are stored in the archive cpio_trid.zip. I hope that my updated XML files
 and the variants can be used in a future version of triddefs.
 
 With best wishes
 Jörg Jenderek
 
- 
				Thanks!