Topic: replacements for cvd.trid.xml for Clam AntiVirus *.cvd,cld,cud,info

jenderek

Hello,

I saw that only the most popular Clam AntiVirus files are detected by
cvd.trid.xml (see the appended output/trid-old.txt).

The Clam AntiVirus User Manual (clamdoc.pdf) describes the virus
database format in chapter 6.4, around page 35. The header is a
string of colon-separated fields, and according to that documentation
the first field is ClamAV-VDB. So I created a generic TrID definition,
cvd-clamav.trid.xml, that matches all Clam AntiVirus files with this
XML construct:

   <Pattern>
      <Bytes>436C616D41562D5644423A</Bytes>
      <ASCII> C l a m A V - V D B :</ASCII>
      <Pos>0</Pos>
   </Pattern>

As the reference URL I use the Wikipedia page on ClamAV, with the line:
   <RefURL>https://en.wikipedia.org/wiki/Clam_AntiVirus</RefURL>
With this new definition file, examples like db-3.info are also
identified.
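
As a rough illustration of what the generic pattern matches, here is a
minimal Python sketch. The function name looks_like_clamav_db is
hypothetical and not part of TrID; it only mirrors the byte comparison
at offset 0.

```python
# Hex bytes copied from the <Bytes> element of the generic pattern.
MAGIC = bytes.fromhex("436C616D41562D5644423A")  # decodes to b"ClamAV-VDB:"


def looks_like_clamav_db(data: bytes) -> bool:
    """Return True if data starts with the ClamAV-VDB header tag (Pos 0)."""
    return data.startswith(MAGIC)
```

Because only the header tag is tested, this accepts every variant
discussed in this post, whatever follows the first colon.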

During a database update, a partially downloaded virus database gets a
name like:
   clamav-3b59cf0efc1bfbb5781568327134ce3b.000010ac.clamtmp
So I also added clamtmp to the file name extensions with the line:
   <Ext>CVD/CLD/CUD/CLAMTMP/INFO</Ext>

Such files do not exist on a normal end-user computer. They are
created, for example, during the virus database update process or when
building your own signature database.

If you build your own virus signature database, like the example
db-3.cud, the digital signature field is probably not used and is then
marked by an X character. A build time is not needed either, so that
field is empty, which shows up in the header as two consecutive colons
(::). Such a virus database is therefore not recognized by the current
TrID definition. According to the document the header is 512 bytes
long, and the fields are not fixed size. Unless fields such as the
builder name (for example "google") are extremely long, the header
ends with padding bytes. In all inspected cases these are space
characters. This can be expressed by an XML construct like:

   <Pattern>
      <Bytes>20202020</Bytes>
      <Pos>508</Pos>
   </Pattern>

With this additional construct all ClamAV virus databases (including
*.cud) should be identified by cvd-cud.trid.xml.
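
To make the header layout more concrete, here is a small Python sketch
that splits a header into its colon-separated fields and checks the
space padding at offset 508. The helper names and the sample header are
made up for illustration; only the tag, the empty build time, the X
marker and the "google" builder name are taken from the description
above, and the other values and the field order are placeholders.

```python
HEADER_SIZE = 512  # header length according to the ClamAV documentation


def split_header(header: bytes) -> list:
    """Split the colon-separated header into fields, ignoring trailing spaces."""
    text = header[:HEADER_SIZE].decode("ascii", errors="replace")
    return text.rstrip(" ").split(":")


def has_space_padding(header: bytes) -> bool:
    """Mirror the pattern 20202020 at Pos 508."""
    return header[508:512] == b"    "


# Hypothetical self-built header: empty build time ("::") and X markers.
# The numeric values and the field order are placeholders, not real data.
sample = b"ClamAV-VDB::1:2:3:X:X:google:0".ljust(HEADER_SIZE, b" ")
fields = split_header(sample)
```

Here fields[0] is the tag and fields[1] is the empty build time; the
padding check only holds as long as the fields leave at least 4 spare
bytes at the end of the header.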

The official databases contain the build time as the second field,
which looks like:
   08 Nov 2006 16-10 +0000
The first 3 spaces separating the words are expressed by the XML construct:
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>13</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>17</Pos>
   </Pattern>
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>22</Pos>
   </Pattern>

Then, for the minus character separating hours and minutes, add the XML
construct:
   <Pattern>
      <Bytes>2D</Bytes>
      <ASCII> -</ASCII>
      <Pos>25</Pos>
   </Pattern>

For the space before the time zone offset also add:
   <Pattern>
      <Bytes>20</Bytes>
      <Pos>28</Pos>
   </Pattern>

The minutes part of the time zone was 00 in all files found. Together
with the field-separating colon this is expressed by the XML construct:
   <Pattern>
      <Bytes>30303A</Bytes>
      <ASCII> 0 0 :</ASCII>
      <Pos>32</Pos>
   </Pattern>
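
The offsets in the patterns above can be checked against the example
build time with a few lines of Python. This is only a sketch: the
names PATTERNS and matches_official_header are hypothetical, and the
list simply restates the positions from this post together with the
ClamAV-VDB: tag at offset 0.

```python
# (offset, expected bytes) pairs restating the <Pattern> elements above.
PATTERNS = [
    (0, b"ClamAV-VDB:"),  # header tag
    (13, b" "),           # after day of month
    (17, b" "),           # after month name
    (22, b" "),           # after year
    (25, b"-"),           # between hours and minutes
    (28, b" "),           # before time zone offset
    (32, b"00:"),         # time zone minutes plus field separator
]


def matches_official_header(data: bytes) -> bool:
    """Return True if every pattern matches at its fixed offset."""
    return all(data[pos:pos + len(want)] == want for pos, want in PATTERNS)


# Start of a header using the build time example from this post.
header = b"ClamAV-VDB:08 Nov 2006 16-10 +0000:"
```

A header with an empty build time (ClamAV-VDB::...) fails already at
offset 13, which is why self-built databases need the separate, more
generic definition.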

After the header with the metadata, the real virus database starts at
offset 512. Looking at ark-gz.trid.xml, we see that the current TrID
definition file describes only the gzip-compressed variant, by the
XML construct:
   <Pattern>
      <Bytes>1F8B08</Bytes>
      <Pos>512</Pos>
   </Pattern>

Besides the CVD file name extension, CLD is also found. This is now
expressed by the line:
   <Ext>CVD/CLD</Ext>
So the old cvd.trid.xml should be replaced by cvd-gz.trid.xml.
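
The gzip check at offset 512 can be sketched in Python like this. The
function name payload_is_gzip and the synthetic sample are assumptions
for illustration; the three bytes are the gzip magic 1F 8B followed by
08, the deflate compression method.

```python
import gzip

HEADER_SIZE = 512  # the database payload starts right after the header


def payload_is_gzip(data: bytes) -> bool:
    """Mirror the pattern 1F8B08 at Pos 512."""
    return data[HEADER_SIZE:HEADER_SIZE + 3] == b"\x1f\x8b\x08"


# Synthetic file: a space-padded header followed by a gzip stream.
sample = b"ClamAV-VDB:".ljust(HEADER_SIZE, b" ") + gzip.compress(b"payload")
```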

According to the PDF document, the other variant contains a tar archive
at offset 512. This can also be seen in the output of the patched
file(1) command (see output/file.txt).

The null bytes that pad the end of the 512-byte tar header (it starts
at file offset 512, so its last 12 bytes begin at offset 1012) are
expressed by a construct like:

   <Pattern>
      <Bytes>00000000</Bytes>
      <Pos>1012</Pos>
   </Pattern>

Looking at the tar part, we see that the inspected archives contain
a COPYING member and also some *.INFO files. This is expressed in the
global string section by the XML construct:
   <String>COPYING</String>
   <String>.INFO</String>

With these modifications the second variant is also recognized, by the
cld-tar.trid.xml definition file. The file name extension is now always
"CLD". This is expressed by the line:
   <Ext>CLD</Ext>
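
To show where offset 1012 and the two strings come from, the following
Python sketch builds a synthetic file in the tar variant layout and
runs the same checks. Everything here is illustrative: the helper
names, the member name main.info and the member contents are made up,
and the case-insensitive search for .INFO is an assumption about how
the strings are compared, not something taken from the TrID
documentation.

```python
import io
import tarfile

HEADER_SIZE = 512  # ClamAV header; the tar archive starts right after it


def payload_looks_like_tar(data: bytes) -> bool:
    """Mirror the pattern 00000000 at Pos 1012 (end of the first tar header)."""
    return data[1012:1016] == b"\x00" * 4


def build_sample() -> bytes:
    """Build a space-padded header followed by a small ustar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, body in (("COPYING", b"license text"), ("main.info", b"info")):
            info = tarfile.TarInfo(name)
            info.size = len(body)
            tar.addfile(info, io.BytesIO(body))
    return b"ClamAV-VDB:".ljust(HEADER_SIZE, b" ") + buf.getvalue()


sample = build_sample()
has_strings = b"COPYING" in sample and b".INFO" in sample.upper()
```

The first member name sits at file offset 512, and the ustar magic
appears at offset 512 + 257, which is another spot a stricter
definition could test.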

This variant seems to be created if the configuration file
freshclam.conf contains lines like:
   CompressLocalDatabase no
   ScriptedUpdates yes

When freshclam updates the ClamAV database, it seems to try to
download small diff files if possible and then applies these patches
to build the updated database.

With these variant definition files all inspected examples are now
detected correctly (see the appended output/trid-new.txt).

The TrID definitions, some examples and the output are stored in the archive cvd_cld.zip.
I hope that my 4 XML files can be used in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

Re: replacements for cvd.trid.xml for Clam AntiVirus *.cvd,cld,cud,info
« Reply #1 on: November 05, 2017, 11:40:14 PM »
At the moment I think that I'll just make the current CVD definition more generic, but I'll keep the more specific definitions in mind.
Thanks as always!