Hello TrID users,
some days ago I wanted to transfer some contacts from an old PC system to a new
Android system. These contacts are stored as vCards with the file name extension
VCF.
Because I had trouble with some contacts, I checked all my VCF examples by
running the TrID command on them.
Most examples, like foo.b_w, are correctly described by vcf.trid.xml as
"vCard - Business Card". But a few examples, like $R00H5MZ.vcf and
unknown-2.1.vcf, are only described as "Unknown!" (see appended
output/trid-v-old.txt).
For comparison I also ran the file utility (version 5.40). It describes
the recognized examples as "vCard visiting card" and the other
examples as "ASCII text, with CRLF line terminators" (see appended
output/file-5.40.txt).
The Wikipedia page about vCard says that all vCards begin with BEGIN:VCARD.
That is not quite right. Most examples start this way, but the unrecognized
examples do not.
The definition RFC 2425 for the older vCard version 2.1 says that type
names and parameter names are case insensitive (e.g., the type name "fn" is
the same as "FN" and "Fn"). If I understand this correctly, a vCard could even
start with a phrase like BeGiN:vCARd, but in the real world, besides the
commonly used upper-case variant, I have only occasionally found the
all-lower-case variant.
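A minimal sketch of such a case-insensitive check in Python. The function name starts_like_vcard is only my illustration and is not part of TrID or of any vCard library:

```python
def starts_like_vcard(data: bytes) -> bool:
    """Return True if the first line is BEGIN:VCARD in any letter case."""
    # Take everything up to the first line feed and drop a trailing CR,
    # so both CRLF and LF terminated files are handled.
    first_line = data.split(b"\n", 1)[0].rstrip(b"\r")
    return first_line.upper() == b"BEGIN:VCARD"


print(starts_like_vcard(b"BEGIN:VCARD\r\nVERSION:3.0\r\n"))
print(starts_like_vcard(b"begin:vcard\r\nversion:2.1\r\n"))
print(starts_like_vcard(b"BeGiN:vCARd\n"))
```

All three calls should report True, while a file that starts with any other text gives False.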
So I generated a TrID definition vcf-lowcase.trid.xml for such examples. The
lower-case first line is now expressed by an XML construct inside the front
block section like:
<Bytes>626567696E3A76636172640D0A</Bytes>
<ASCII> b e g i n : v c a r d . .</ASCII>
<Pos>0</Pos>
and in the global strings section by three lines like:
<String>VERSION</String>
<String>BEGIN</String>
<String>VCARD</String>
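To double-check such hex patterns by hand, something like the following Python sketch may help. The helper names decode_pattern and ascii_preview are my own, and the preview only imitates the look of the ASCII line (a dot for each non-printable byte); I do not claim this is how tridscan produces it:

```python
def decode_pattern(hex_bytes: str) -> bytes:
    """Turn the hex text of a Bytes element into raw bytes."""
    return bytes.fromhex(hex_bytes)


def ascii_preview(raw: bytes) -> str:
    """Show printable ASCII as-is and every other byte as a dot."""
    return " ".join(chr(b) if 32 <= b < 127 else "." for b in raw)


pattern = decode_pattern("626567696E3A76636172640D0A")
print(pattern)                 # b'begin:vcard\r\n'
print(ascii_preview(pattern))  # b e g i n : v c a r d . .

# The reverse direction, for the upper-case first line:
print(b"BEGIN:VCARD\r\n".hex().upper())  # 424547494E3A56434152440D0A
```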
The TrID definition referenced a page on imc.org. That was expressed by a
line like:
<RefURL>http://www.imc.org/pdi/</RefURL>
The web site still exists, but the Internet Mail Consortium closed down in
2002, so no information about vCard can be found there any more.
Some information about the vCard file format can be found on Wikipedia.
This is now expressed by an updated reference URL line like:
<RefURL>https://en.wikipedia.org/wiki/VCard</RefURL>
The definition RFC 6350 for vCard version 4.0 says that the content
entity MUST begin with the BEGIN property with a value of "VCARD", that the
value is case-insensitive, and that, based on experience with vCard 3
interoperability, it is RECOMMENDED that property and parameter names be
upper-case on output. That is what the current definition vcf.trid.xml
describes with an XML construct inside the front block section like:
<Bytes>424547494E3A</Bytes>
<ASCII> B E G I N :</ASCII>
<Pos>0</Pos>
and in the global strings section by two lines like:
<String>BEGIN</String>
<String>VCARD</String>
We see that the string VERSION does not exist in the global strings section.
That is OK because, according to RFC 6350, this property could be omitted in
earlier versions of vCard. I myself have not found an example without a version.
More details can be found on the German Wikipedia page about vCard.
Apparently three different versions exist at the moment (2.1, 3.0, 4.0).
That version information is also displayed by the file command for
most examples (see appended output/file-5.40.txt). So I ran tridscan on such
examples to generate 3 variants: vcf-v2.trid.xml, vcf-v3.trid.xml and
vcf-v4.trid.xml.
The first definition contains in the front block an XML construct like:
<Bytes>424547494E3A56434152440D0A</Bytes>
<ASCII> B E G I N : V C A R D</ASCII>
<Pos>0</Pos>
and in the global strings section three lines like:
<String>VERSION:2.1</String>
<String>BEGIN</String>
<String>VCARD</String>
I manually expanded the line with the version to VERSION:2.1. According to
RFC 6350, earlier versions of vCard, such as 2.1, allowed the VERSION property
to be placed anywhere in the vCard object, so it is matched as a global string
rather than at a fixed position in the front block.
One of my 334 version 2.1 examples is not described by that definition:
basic_vcard_addressbook.vcf, found inside the sources of Thunderbird (at
least for versions 60.5.3 and 78.10.1). When I inspected this example I saw
that only the line feed character 0x0A is used to terminate the lines, but
according to RFC 6350 individual lines within a vCard are delimited by the
line break, which is a CRLF sequence (U+000D followed by U+000A). So this
non-standard sample is still described by vcf.trid.xml but not by
vcf-v2.trid.xml (see appended output/trid-v-new.txt).
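Such a sample can be spotted quickly with a small Python sketch like this one. The function name line_endings is my own, and lone CR characters (old Mac style) are deliberately ignored here:

```python
def line_endings(data: bytes) -> str:
    """Classify line terminators as 'CRLF', 'LF', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains one LF, so subtract those to get bare LFs.
    lf_only = data.count(b"\n") - crlf
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf_only:
        return "LF"
    return "none"


print(line_endings(b"BEGIN:VCARD\r\nVERSION:2.1\r\nEND:VCARD\r\n"))  # CRLF
print(line_endings(b"BEGIN:VCARD\nVERSION:2.1\nEND:VCARD\n"))        # LF
```

A result of "LF" or "mixed" explains why a front block pattern ending in 0D0A does not match.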
Unfortunately I found only 2 samples (example-4.vcf, vcard4.0.vcf) for version
4. So I manually cleaned up the vcf-v4.trid.xml generated by tridscan according
to the documentation. In the front block one shortened XML construct remains,
like:
<Bytes>424547494E3A56434152440D0A56455253494F4E3A342E300D0A</Bytes>
<ASCII> B E G I N : V C A R D . . V E R S I O N : 4 . 0 . .</ASCII>
<Pos>0</Pos>
And in the global strings section the three required lines remain, like:
<String>VERSION</String>
<String>BEGIN</String>
<String>VCARD</String>
According to RFC 6350 the VERSION property MUST appear immediately
after BEGIN:VCARD and the value MUST be "4.0". That is the main difference
when comparing the variants. So the non-standard version 4 example test.vcf is
not recognized by that definition and is still described by the generic
vcf.trid.xml (see appended output/trid-new.txt).
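The position rule can be checked with a short Python sketch. The function name version_info is my own, and the sketch assumes that the first line is BEGIN:VCARD and that the VERSION line carries no group prefix or parameters and is not folded:

```python
from typing import Optional, Tuple


def version_info(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (version value, True if VERSION is on the second line)."""
    # splitlines() accepts CRLF as well as bare LF terminators.
    for index, line in enumerate(data.splitlines()):
        name, sep, value = line.strip().partition(b":")
        if sep and name.upper() == b"VERSION":
            return value.decode("ascii", "replace"), index == 1
    return None, False


v4 = b"BEGIN:VCARD\r\nVERSION:4.0\r\nFN:Test\r\nEND:VCARD\r\n"
late = b"BEGIN:VCARD\r\nFN:Test\r\nVERSION:3.0\r\nEND:VCARD\r\n"
print(version_info(v4))    # ('4.0', True)
print(version_info(late))  # ('3.0', False)
```

A sample whose second value is False has its VERSION line somewhere later, so a front block pattern that expects VERSION directly after BEGIN:VCARD cannot match it.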
For version 3 the situation is a little unclear. RFC 6350 says
that earlier versions of vCard allowed the VERSION property to be placed
anywhere in the vCard object, but the German Wikipedia page about vCard
says that VERSION must directly follow the BEGIN property, except for
vCard 2.1. That seems to be common practice: in 487 of my inspected v3
examples the VERSION property occurs on the second line. That is expressed
by an XML construct like:
<Bytes>424547494E3A56434152440D0A56455253494F4E3A332E300D0A</Bytes>
<ASCII> B E G I N : V C A R D . . V E R S I O N : 3 . 0</ASCII>
<Pos>0</Pos>
and in the global strings section by three lines like:
<String>VERSION</String>
<String>BEGIN</String>
<String>VCARD</String>
Three "non-standard" examples, like std.vcf, are not recognized by this
definition and are still described only by the generic vcf.trid.xml (see
appended output/trid-v-new.txt). Just as a cross-check I created a definition
vcf-v3-nonstandard.trid.xml with an XML construct like:
<Bytes>424547494E3A56434152440D0A</Bytes>
<ASCII> B E G I N : V C A R D</ASCII>
<Pos>0</Pos>
and in the global strings section I changed one line to:
<String>VERSION:3.0</String>
With this definition these 3 examples are also recognized (see appended
output/trid-v.txt).
All my inspected examples have only the VCF file name extension. On my system I
found no example with the vCard extension. So at the moment this is expressed
in the variant definitions by a line like:
<Ext>VCF</Ext>
Now, with the definition variants, all my examples are described, and
information about the file format version is also shown (see appended
output/trid-v-new.txt).
The TrID definitions, some examples and the output are stored in the archive
vcf.zip. I hope that my 4 XML files can be used in a future version of triddefs.
By identifying the VCF examples I was able to solve my problem. One piece of
software handles only the version 2 variant and another handles only the
version 3 variant. But instead of a clear error message like "I can only
import version x" I got an error message like "this is not a valid VCF".
With best wishes
Jörg Jenderek