Topic: updated lit.trid.xml for Microsoft Reader eBook + feature request

jenderek

Hello TrID users,

every now and then, for control reasons, I check my systems with a disk usage
visualization tool. In my case I chose SequoiaView, because I can easily add
specific colors for different file name extensions as ASCII text to its
configuration file. I then look for big or numerous areas that are uncolored
or gray, which means the tool does not recognize the extensions found there.
One uncolored extension is LIT, so I looked for such samples. On my systems
these are part of an e-book collection.

So I ran the TrID utility on these LIT examples. Some of them are correctly
described as "Microsoft Reader eBook" by lit.trid.xml, but without a MIME
type. Others are only described as "Unknown!" (see the attached
output/trid-v-old.txt). I verified that these are valid books by opening
samples with the calibre software.

For comparison I also ran the file command (newest version, 5.44) on these
samples. Here all real examples are described as "MS Windows HtmlHelp Data"
(see the attached output/file-5.44.txt). With the -i option it shows a MIME
type (see the attached output/file-i-5.44.txt).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). Here all samples are
described as "Microsoft Reader eBook" with PUID fmt/867, but without a MIME
type (see the attached output/droid-lit.csv).

The MIME type reported by the file command is also listed on the web site
extension.nirsoft.net. Apparently it is used by the Microsoft Reader software
itself. So this MIME type is now expressed in the updated lit.trid.xml by a
line like:
   <Mime>application/x-ms-reader</Mime>

I also looked at what else has changed. As far as I can see, only one line
inside the global strings section has been removed, namely:
   <String>.HTM</String>
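Comparing the <String> entries of the old and the updated definition is essentially a set difference. The following Python sketch assumes only that the strings appear as <String> elements somewhere in the XML; the enclosing element names used in any test data are stand-ins, not the real layout of lit.trid.xml.

```python
import xml.etree.ElementTree as ET


def global_strings(root: ET.Element) -> set[str]:
    """Return the text of every <String> element found anywhere under root."""
    return {s.text for s in root.iter("String") if s.text}


def diff_strings(old_root: ET.Element, new_root: ET.Element) -> tuple[set[str], set[str]]:
    """Return (removed, added) <String> values between two definitions."""
    old, new = global_strings(old_root), global_strings(new_root)
    # Strings only in the old definition were removed; only in the new, added.
    return old - new, new - old
```

With two hypothetical copies on disk, something like diff_strings(ET.parse("old/lit.trid.xml").getroot(), ET.parse("new/lit.trid.xml").getroot()) would list ".HTM" under removed if that is the only dropped string.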

With the updated TrID definition, all my real LIT examples are now recognized
(see the attached output/trid-v-news.txt). The TrID definition and the outputs
are stored in the archive lit_.zip. I hope that my updated definition can be
used in a future version of triddefs.

I also have a feature request. It would be nice if, similar to the MIME type,
a line for the DROID identification could be added inside the definition.
That might look like:
   <PUID>fmt/867</PUID>
This information should then be shown by the TrID command. Even nicer would
be if it were expanded to a URL like:
   https://www.nationalarchives.gov.uk/PRONOM/fmt/867
And best of all would be if this became clickable text that opens the web
browser with that URL. This is what the DROID tool itself does.
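To make the request concrete, here is a minimal Python sketch of how a tool could read the proposed <PUID> element next to <Mime> and expand it into the PRONOM URL quoted above. <PUID> does not exist in TrID definitions today; it is the hypothetical element proposed here, and the other element names (<Info>, <FileType>) as well as the one-line output format are assumptions for illustration only.

```python
import webbrowser
import xml.etree.ElementTree as ET

# Base URL taken from the PRONOM link quoted in the request.
PRONOM_BASE = "https://www.nationalarchives.gov.uk/PRONOM/"


def pronom_url(puid: str) -> str:
    """Expand a PRONOM format PUID such as 'fmt/867' into its web page URL."""
    puid = puid.strip()
    prefix, sep, number = puid.partition("/")
    # File format PUIDs use the 'fmt/' or 'x-fmt/' prefix followed by a number.
    if not sep or prefix not in ("fmt", "x-fmt") or not number.isdigit():
        raise ValueError(f"not a PRONOM format PUID: {puid!r}")
    return PRONOM_BASE + puid


def describe(root: ET.Element) -> str:
    """Build a one-line summary from a definition, including the hypothetical <PUID>."""
    name = root.findtext(".//FileType") or "Unknown!"
    mime = root.findtext(".//Mime")
    puid = root.findtext(".//PUID")  # hypothetical element proposed in this thread
    parts = [name]
    if mime:
        parts.append(f"[{mime}]")
    if puid:
        parts.append(f"PUID {puid.strip()} <{pronom_url(puid)}>")
    return " ".join(parts)


def open_pronom_page(puid: str) -> bool:
    """Open the PRONOM page in the default web browser, like a clickable link."""
    return webbrowser.open(pronom_url(puid))
```

Calling open_pronom_page("fmt/867") would launch the default browser via the standard webbrowser module, which roughly covers the clickable part of the request.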

Why do I think this is important? A similar situation exists for anti-virus
software. Every company uses its own naming scheme. So when you have trouble
with a suspicious file, in case of doubt you must consult dozens of different
anti-virus vendor pages to get a solid idea of what the inspected file is.
File identification tools have the same problem. I use 3 different tools:
   file
   trid
   droid
Each of these tools has advantages and disadvantages. In other words, none of
them is the single, universal source of truth.

With best wishes
Jörg Jenderek

Mark0

Re: updated lit.trid.xml for Microsoft Reader eBook + feature request
Reply #1 on: December 30, 2022, 05:35:24 PM
Thanks for the new def!
Will surely consider the addition of a <PUID> field.