Mark0's Forum
		Software => TrID File Identifier => Topic started by: jenderek on May 14, 2021, 12:34:46 AM
		
			
			- 
				Hello trid users,
 
 Some days ago I wanted to transfer some contacts from an old PC system to a new
 Android system. These contacts are stored as vCards with the file name extension
 VCF.
 
 Because I had trouble with some contacts, I checked all my VCF samples by
 running the TrID command on them and looked at the output.
 
 Most samples, like foo.b_w, are correctly described by vcf-v3.trid.xml as
 "vCard - Business Card". But a few, like $R00H5MZ.vcf and
 unknown-2.1.vcf, are only described as "Unknown!" (see appended
 output/trid-v-old.txt).
 
 For comparison I also ran the file utility (version 5.40). It
 describes the recognized samples as "vCard visiting card" and the other
 samples as "ASCII text, with CRLF line terminators" (see appended
 output/file-5.40.txt).
 
 The Wikipedia page about vCard states that all vCards begin with BEGIN:VCARD.
 That is not correct: most samples do start this way, but the unrecognized
 samples do not.
 
 In RFC 2425, on which the older vCard version 3.0 (RFC 2426) is based, type
 names and parameter names are case-insensitive (e.g., the type name "fn" is
 the same as "FN" and "Fn"). If I understand this correctly, a vCard could even
 start with a phrase like BeGiN:vCARd, but in the real world, besides the
 common upper-case variant, I only sometimes found an all lower-case variant.
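
A minimal Python sketch of that lenient reading follows. The function name starts_with_begin_vcard is hypothetical (not part of TrID or file); it treats both the property name and the VCARD value as case-insensitive (RFC 6350 says so explicitly for the value) and assumes the BEGIN property sits on the first physical line, ignoring line folding:

```python
def starts_with_begin_vcard(data: bytes) -> bool:
    """Hypothetical helper: True if the first line is BEGIN:VCARD in any case.

    Assumes BEGIN is on the first physical line; folding is ignored.
    """
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    # bytes.upper() only changes ASCII letters, which is all we compare here
    return first_line.upper() == b"BEGIN:VCARD"


print(starts_with_begin_vcard(b"BEGIN:VCARD\r\nVERSION:3.0\r\n"))  # True
print(starts_with_begin_vcard(b"begin:vcard\r\nversion:2.1\r\n"))  # True
print(starts_with_begin_vcard(b"BeGiN:vCARd\r\n"))                 # True
print(starts_with_begin_vcard(b"Name: John\r\n"))                  # False
```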
 
 So I generated a TrID definition vcf-lowcase.trid.xml for such samples. The
 lower-case first line is expressed by an XML construct in the front block
 section:
 
 <Bytes>626567696E3A76636172640D0A</Bytes>
 <ASCII> b e g i n : v c a r d . .</ASCII>
 <Pos>0</Pos>
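
The hex string in <Bytes> is just the raw byte sequence, and <Pos>0</Pos> anchors it at the start of the file. This short Python sketch decodes the pattern and tests two samples; the names PATTERN and front_block_matches are illustrative only, and the exact byte-for-byte comparison is an assumption about how a front-block pattern is applied:

```python
PATTERN = bytes.fromhex("626567696E3A76636172640D0A")  # from <Bytes>
POS = 0                                                 # from <Pos>


def front_block_matches(data: bytes, pattern: bytes = PATTERN, pos: int = POS) -> bool:
    """True if `pattern` occurs in `data` exactly at offset `pos`."""
    return data[pos:pos + len(pattern)] == pattern


print(PATTERN)  # b'begin:vcard\r\n'
print(front_block_matches(b"begin:vcard\r\nversion:2.1\r\nend:vcard\r\n"))  # True
print(front_block_matches(b"BEGIN:VCARD\r\nVERSION:3.0\r\nEND:VCARD\r\n"))  # False
```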
 
 and in the global strings section by three lines:
 
 <String>VERSION</String>
 <String>BEGIN</String>
 <String>VCARD</String>
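
To see how the two parts fit together, here is a deliberately simplified model in Python. It is not TrID's real matching or scoring code: the Definition class is hypothetical, the front block is compared byte for byte, and the global strings are assumed to be searched case-insensitively anywhere in the file (which is what would let the upper-case strings match the lower-case samples):

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    """Hypothetical, simplified stand-in for a TrID definition."""
    name: str
    pattern: bytes          # front block bytes
    pos: int = 0            # front block offset
    strings: list[bytes] = field(default_factory=list)  # global strings

    def matches(self, data: bytes) -> bool:
        # Front block: exact bytes at the given offset
        if data[self.pos:self.pos + len(self.pattern)] != self.pattern:
            return False
        # Global strings: each must occur somewhere (assumed case-insensitive)
        haystack = data.upper()
        return all(s.upper() in haystack for s in self.strings)


lowcase = Definition(
    name="vcf-lowcase",
    pattern=bytes.fromhex("626567696E3A76636172640D0A"),
    strings=[b"VERSION", b"BEGIN", b"VCARD"],
)

print(lowcase.matches(b"begin:vcard\r\nversion:2.1\r\nfn:Jane Doe\r\nend:vcard\r\n"))  # True
print(lowcase.matches(b"begin:vcard\r\nfn:Jane Doe\r\nend:vcard\r\n"))                # False
print(lowcase.matches(b"BEGIN:VCARD\r\nVERSION:3.0\r\nEND:VCARD\r\n"))               # False
```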
 
 The existing TrID definition referenced a page on imc.org, expressed by the
 line:
 
 <RefURL>http://www.imc.org/pdi/</RefURL>
 
 The web site still exists, but the Internet Mail Consortium closed down in
 2002, so no information about vCard can be found there any more.
 
 Some information about the vCard file format can be found on Wikipedia.
 This is now expressed by an updated reference URL line:
 
 <RefURL>https://en.wikipedia.org/wiki/VCard</RefURL>
 
 RFC 6350, which defines vCard version 4.0, states that the content entity
 MUST begin with the BEGIN property with a value of "VCARD", that this value is
 case-insensitive, and that, based on experience with vCard 3
 interoperability, it is RECOMMENDED that property and parameter names be
 upper-case on output. That is what the current definition vcf.trid.xml
 describes with an XML construct in the front block section:
 
 <Bytes>424547494E3A</Bytes>
 <ASCII> B E G I N :</ASCII>
 <Pos>0</Pos>
 
 and in the global strings section by two lines:
 
 <String>BEGIN</String>
 <String>VCARD</String>
 
 The string VERSION does not appear in the global strings section. That is
 OK, because according to RFC 6350 earlier versions of vCard allowed this
 property to be absent. I myself have not found a sample without a version.
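
Assuming exact front-block bytes at offset 0 and a case-insensitive search for global strings, a hypothetical Python check for the generic definition shows both effects: a sample without VERSION still passes, while an all lower-case sample fails at the front block. The names GENERIC_FRONT and generic_vcf_matches are illustrative only:

```python
GENERIC_FRONT = bytes.fromhex("424547494E3A")  # b"BEGIN:"
GENERIC_STRINGS = [b"BEGIN", b"VCARD"]


def generic_vcf_matches(data: bytes) -> bool:
    """Hypothetical model of vcf.trid.xml: front block plus two global strings."""
    if not data.startswith(GENERIC_FRONT):  # exact bytes at <Pos>0</Pos>
        return False
    upper = data.upper()  # assumed case-insensitive string search
    return all(s in upper for s in GENERIC_STRINGS)


print(generic_vcf_matches(b"BEGIN:VCARD\r\nVERSION:3.0\r\nEND:VCARD\r\n"))  # True
print(generic_vcf_matches(b"BEGIN:VCARD\r\nFN:Jane Doe\r\nEND:VCARD\r\n"))  # True
print(generic_vcf_matches(b"begin:vcard\r\nversion:2.1\r\nend:vcard\r\n"))  # False
```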
 
 The German Wikipedia page on vCard gives more details. Apparently there are
 currently three different versions (2.1, 3.0 and 4.0). The file command also
 displays this version information for most samples (see appended
 output/file-5.40.txt). So I ran tridscan on such samples to generate three
 variants: vcf-v2.trid.xml, vcf-v3.trid.xml and vcf-v4.trid.xml.
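
The version can also be read directly from the VERSION property. A small Python sketch; vcard_version is a hypothetical helper that assumes one vCard per file, does no line unfolding, and accepts VERSION on any line (as vCard 2.1 allows):

```python
from typing import Optional


def vcard_version(data: bytes) -> Optional[str]:
    """Hypothetical helper: value of the first VERSION property, or None."""
    text = data.decode("utf-8", errors="replace")
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Property names are case-insensitive, so compare in upper case
        if sep and name.strip().upper() == "VERSION":
            return value.strip()
    return None


print(vcard_version(b"BEGIN:VCARD\r\nVERSION:2.1\r\nN:Doe;Jane\r\nEND:VCARD\r\n"))  # 2.1
print(vcard_version(b"BEGIN:VCARD\r\nFN:Jane Doe\r\nVERSION:3.0\r\nEND:VCARD\r\n"))  # 3.0
print(vcard_version(b"BEGIN:VCARD\r\nFN:Jane Doe\r\nEND:VCARD\r\n"))                 # None
```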
 
 The first definition contains an XML construct in the front block:
 
 <Bytes>424547494E3A56434152440D0A</Bytes>
<ASCII> B E G I N : V C A R D . .</ASCII>
 <Pos>0</Pos>
 
 and in the global strings section three lines:
 
 <String>VERSION:2.1</String>
 <String>BEGIN</String>
 <String>VCARD</String>
 
 I expanded the version line manually, because RFC 6350 notes that
 earlier versions of vCard, such as 2.1, allowed the VERSION property to be
 placed anywhere in the vCard object.
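
The effect of the expanded string can be sketched in Python. The helper v2_matches is hypothetical; the front block is compared as exact bytes at offset 0, and the global string search is assumed to be case-insensitive and position-independent, so VERSION:2.1 is found wherever it sits:

```python
V2_FRONT = bytes.fromhex("424547494E3A56434152440D0A")  # b"BEGIN:VCARD\r\n"
V2_STRING = b"VERSION:2.1"


def v2_matches(data: bytes) -> bool:
    """Hypothetical model of vcf-v2.trid.xml: front block plus one global string."""
    # Front block: exact bytes at <Pos>0</Pos>
    if not data.startswith(V2_FRONT):
        return False
    # Global string: anywhere in the file (assumed case-insensitive)
    return V2_STRING in data.upper()


print(v2_matches(b"BEGIN:VCARD\r\nVERSION:2.1\r\nN:Doe;Jane\r\nEND:VCARD\r\n"))  # True
print(v2_matches(b"BEGIN:VCARD\r\nN:Doe;Jane\r\nVERSION:2.1\r\nEND:VCARD\r\n"))  # True
print(v2_matches(b"BEGIN:VCARD\r\nVERSION:3.0\r\nN:Doe;Jane\r\nEND:VCARD\r\n"))  # False
```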
 
 One of my 334 version 2.1 samples is not described by that definition:
 basic_vcard_addressbook.vcf, found in the Thunderbird sources (at
 least in versions 60.5.3 and 78.10.1). When I inspected this sample I saw
 that only the line feed character 0x0A terminates its lines, but according
 to RFC 6350 individual lines within a vCard are delimited by a line break,
 which is a CRLF sequence (U+000D followed by U+000A). So this non-standard
 sample is still described by vcf.trid.xml but not by vcf-v2.trid.xml (see
 appended output/trid-v-new.txt).
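
A quick way to spot such files is to compare the number of LF bytes with the number of CRLF pairs. In this Python sketch (uses_only_crlf is a hypothetical helper), an LF-only file also fails the 13-byte front block (BEGIN:VCARD followed by 0D0A) used by vcf-v2.trid.xml, while the shorter BEGIN: prefix of vcf.trid.xml still matches:

```python
def uses_only_crlf(data: bytes) -> bool:
    """True if the data has line breaks and every LF is preceded by CR."""
    lf_count = data.count(b"\n")
    # b"\r\n" cannot overlap itself, and each pair holds exactly one LF
    crlf_count = data.count(b"\r\n")
    return lf_count > 0 and lf_count == crlf_count


V2_FRONT = bytes.fromhex("424547494E3A56434152440D0A")  # b"BEGIN:VCARD\r\n"
GENERIC_FRONT = bytes.fromhex("424547494E3A")            # b"BEGIN:"

crlf_sample = b"BEGIN:VCARD\r\nVERSION:2.1\r\nEND:VCARD\r\n"
lf_sample = b"BEGIN:VCARD\nVERSION:2.1\nEND:VCARD\n"

print(uses_only_crlf(crlf_sample))           # True
print(uses_only_crlf(lf_sample))             # False
print(lf_sample.startswith(V2_FRONT))        # False
print(lf_sample.startswith(GENERIC_FRONT))   # True
```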
 
 Unfortunately I found only 2 samples (example-4.vcf and vcard4.0.vcf) for
 version 4. So I manually cleaned up the vcf-v4.trid.xml generated by
 tridscan, according to the documentation. One shortened XML construct
 remains in the front block:
 
 <Bytes>424547494E3A56434152440D0A56455253494F4E3A342E300D0A</Bytes>
 <ASCII> B E G I N : V C A R D . . V E R S I O N : 4 . 0 . .</ASCII>
 <Pos>0</Pos>
 
 And in the global strings section the three required lines remain:
 
 <String>VERSION</String>
 <String>BEGIN</String>
 <String>VCARD</String>
 
 According to RFC 6350, the VERSION property MUST appear immediately
 after BEGIN:VCARD and its value MUST be "4.0". That is the main difference
 between the variants. So the non-standard version 4 sample test.vcf is
 not recognized by this definition and is still described by the generic
 vcf.trid.xml (see appended output/trid-new.txt).
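
Because the cleaned-up front block is one fixed byte sequence at offset 0, the rule reduces to a prefix test. A Python sketch follows; is_strict_v4 is hypothetical, the comparison is exact (so a lower-case version:4.0 would also fail), and the second sample is just one illustrative way of breaking the rule:

```python
V4_FRONT = bytes.fromhex(
    "424547494E3A56434152440D0A"  # BEGIN:VCARD CR LF
    "56455253494F4E3A342E300D0A"  # VERSION:4.0 CR LF
)


def is_strict_v4(data: bytes) -> bool:
    """Hypothetical check: VERSION:4.0 directly follows BEGIN:VCARD."""
    return data.startswith(V4_FRONT)


print(V4_FRONT)  # b'BEGIN:VCARD\r\nVERSION:4.0\r\n'
print(is_strict_v4(b"BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"))  # True
print(is_strict_v4(b"BEGIN:VCARD\r\nFN:Jane Doe\r\nVERSION:4.0\r\nEND:VCARD\r\n"))  # False
```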
 
 
 For version 3 the situation is a little unclear. RFC 6350 says that
 earlier versions of vCard allowed the VERSION property to be placed
 anywhere in the vCard object, but the German Wikipedia page about vCard
 says that VERSION must directly follow the BEGIN property, except for
 vCard 2.1. That seems to be common practice: in 487 of my inspected v3
 samples the VERSION property occurs on the second line. That is expressed
 by an XML construct:
 
 <Bytes>424547494E3A56434152440D0A56455253494F4E3A332E300D0A</Bytes>
<ASCII> B E G I N : V C A R D . . V E R S I O N : 3 . 0 . .</ASCII>
 <Pos>0</Pos>
 
 and in the global strings section by three lines:
 
 <String>VERSION</String>
 <String>BEGIN</String>
 <String>VCARD</String>
 
 Three "non-standard" samples, like std.vcf, are not recognized by this
 definition; they are still described by vcf.trid.xml but not by
 vcf-v3.trid.xml (see appended output/trid-v-new.txt). Just as a check, I
 created a definition vcf-v3-nonstandard.trid.xml with the XML construct:
 
 <Bytes>424547494E3A56434152440D0A</Bytes>
<ASCII> B E G I N : V C A R D . .</ASCII>
 <Pos>0</Pos>
 
 and in the global strings section I changed one line to:
 
 <String>VERSION:3.0</String>
 
 With this definition these 3 samples are also recognized (see appended
 output/trid-v.txt).
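
The difference between the two v3 definitions comes down to which line carries VERSION. A Python sketch of that classification; classify_v3 is a hypothetical helper that assumes BEGIN:VCARD is the first line, does no line unfolding and ignores line endings, so it is looser than the TrID patterns:

```python
def classify_v3(data: bytes) -> str:
    """Hypothetical helper: where does the first VERSION property sit?"""
    lines = data.decode("utf-8", errors="replace").splitlines()
    for index, line in enumerate(lines):
        name, sep, value = line.partition(":")
        if sep and name.strip().upper() == "VERSION":
            if value.strip() != "3.0":
                return "not v3"
            # index 1 is the second line, directly after BEGIN:VCARD
            return "v3 standard" if index == 1 else "v3 non-standard"
    return "no VERSION"


print(classify_v3(b"BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Jane Doe\r\nEND:VCARD\r\n"))  # v3 standard
print(classify_v3(b"BEGIN:VCARD\r\nFN:Jane Doe\r\nVERSION:3.0\r\nEND:VCARD\r\n"))  # v3 non-standard
print(classify_v3(b"BEGIN:VCARD\r\nVERSION:2.1\r\nEND:VCARD\r\n"))                 # not v3
```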
 
 
 All my inspected samples use only the VCF file name extension; on my system
 I found no sample with a vCard extension. So for now this is expressed
 in the variant definitions by the line:
 
 <Ext>VCF</Ext>
 
 Now, with the definition variants, all my samples are described, and the
 file format version is shown as well (see appended output/trid-v-new.txt).
 
 The TrID definitions, some samples and the output are stored in the archive
 vcf.zip. I hope that my 4 XML files can be used in a future version of triddefs.
 
 With the identification of the VCF samples I was able to solve my
 problem: one program handles only version 2 and another handles only
 version 3. But instead of a clear error message like "I can only import
 version x", I got an error message like "this is not a valid VCF".
 
 With best wishes
 Jörg Jenderek
 
- 
				Thanks!