Author Topic: updated onepkg.trid.xml for Microsoft OneNote Package  (Read 774 times)

jenderek

  • Sr. Member
  • ****
  • Posts: 375
updated onepkg.trid.xml for Microsoft OneNote Package
« on: September 06, 2022, 02:36:56 AM »
Hello trid users,

A few days ago I ran the cleaning tool czkawka, found at
https://qarmin.github.io/czkawka/. One of its menu items deals with bad
extensions. After running the tool I looked through the saved result list
results_bad_extensions.txt for examples of bad extensions.

One listed extension is ONEPKG. So I ran the trid utility on my ONEPKG
examples. Many are described correctly as "Microsoft OneNote Package" by
onepkg.trid.xml. All are also described generically, with a lower rate, as
"Microsoft Cabinet Archive" by ark-cab.trid.xml (see appended
output/trid-v-old.txt).

For comparison I checked these examples with the file command utility. When
running the file command (version 5.42) here, all examples are likewise
described only generically as "Microsoft Cabinet archive data" (see appended
output/file-5.42.txt), with mime type application/vnd.ms-cab-compressed
(see appended output/file-i-5.42.txt).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It identifies a few examples
only generically as "Windows Cabinet File" with mime type
application/vnd.ms-cab-compressed by PUID x-fmt/414, and it complains about
the file suffix ONEPKG. Many examples are described as "Microsoft OneNote
Package File" by fmt/987 (see appended output/droid-onepkg.csv). This
utility identifies onepkg by looking for the file name extension onetoc
within the first 2 KB of the file.
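
As a rough illustration of that kind of check, here is a minimal Python
sketch. It is only my reading of the description above, not the real DROID
signature for fmt/987: the function name looks_like_onepkg, the 2048 byte
window and the synthetic sample bytes are all assumptions.

```python
def looks_like_onepkg(data: bytes, window: int = 2048) -> bool:
    """Rough check: CAB magic plus 'onetoc' within the first `window` bytes."""
    head = data[:window]
    # A CAB archive starts with the ASCII signature "MSCF".
    return head.startswith(b"MSCF") and b"onetoc" in head.lower()


# Synthetic sample for illustration only, not a real package.
sample = b"MSCF" + bytes(40) + b"Open Notebook.onetoc2\x00"
print(looks_like_onepkg(sample))
```

A plain CAB archive without an onetoc member name near the start would fail
this check and fall back to the generic cabinet description.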

Because ONEPKG files are CAB archives, we can inspect them with suitable
unpacking tools such as 7z. With the list command we get output like
7z-l.txt, or, with more detail by adding the -slt option, 7z-l-slt.txt.

After running tridscan to update the definition onepkg.trid.xml, I looked at
what had changed and why. The second XML construct used to look like this:
      <Pattern>
         <Bytes>000000002C0000000000000003010100</Bytes>
         <ASCII> . . . . ,</ASCII>
         <Pos>12</Pos>
      </Pattern>
For many examples the first member entry starts at offset 0x2c (stored at
offset 16 as a 4-byte little-endian integer), but for the unrecognized
examples like ONGuide.onepkg and Notebook03.onepkg the first entry starts at
offset 0x44, because the head contains two dozen more bytes. So the
construct is now split in two, leaving out only the varying byte at offset
16:
      <Pattern>
         <Bytes>00000000</Bytes>
         <Pos>12</Pos>
      </Pattern>
      <Pattern>
         <Bytes>0000000000000003010100</Bytes>
         <Pos>17</Pos>
      </Pattern>
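
To see why the split helps, here is a small Python sketch that applies the
old and the new patterns to two synthetic headers. It assumes a simplified
model in which a pattern is just fixed bytes at a fixed offset. The helpers
cab_header and matches are made up for this illustration, and the headers
are built by hand, not taken from my sample files.

```python
import struct


def cab_header(coff_files: int) -> bytes:
    """Build the first 28 bytes of a synthetic CAB header (not a real file)."""
    return struct.pack(
        "<4sIIIIIBBH",
        b"MSCF",      # offset 0: signature
        0,            # offset 4: reserved
        0,            # offset 8: cabinet size (not needed here)
        0,            # offset 12: reserved
        coff_files,   # offset 16: offset of the first member entry
        0,            # offset 20: reserved
        3, 1,         # offsets 24 and 25: minor and major version
        1,            # offset 26: number of CFFOLDER entries
    )


def matches(data: bytes, patterns) -> bool:
    """True if every (pos, hex string) pattern is found at its offset."""
    for pos, hexbytes in patterns:
        raw = bytes.fromhex(hexbytes)
        if data[pos:pos + len(raw)] != raw:
            return False
    return True


OLD = [(12, "000000002C0000000000000003010100")]
NEW = [(12, "00000000"), (17, "0000000000000003010100")]

for coff in (0x2C, 0x44):
    head = cab_header(coff)
    print(hex(coff), "old:", matches(head, OLD), "new:", matches(head, NEW))
```

The old pattern only matches the header with 0x2c at offset 16, while the
new pair matches both headers.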
At offset 12, 4 reserved bytes are stored; apparently these are always nil.
At offset 20, another 4 reserved bytes are stored, apparently also nil. At
offset 24 the cabinet file format version is stored as 2 bytes, minor
version first; currently the major version is 1 and the minor version is 3.
At offset 26 the number of CFFOLDER entries is stored as a 2-byte
little-endian integer. In all examples this value was 1.
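
The header fields mentioned here can be decoded with a few lines of Python.
This is only a sketch: the names CabHead and read_cab_head and the field
names are my own choice, and the demo header is built by hand with the
values described in this post.

```python
import struct
from typing import NamedTuple


class CabHead(NamedTuple):
    signature: bytes    # offset 0, "MSCF"
    reserved1: int      # offset 4
    cab_size: int       # offset 8
    reserved2: int      # offset 12
    first_entry: int    # offset 16, 0x2c or 0x44 in my examples
    reserved3: int      # offset 20
    ver_minor: int      # offset 24
    ver_major: int      # offset 25
    folders: int        # offset 26, number of CFFOLDER entries


def read_cab_head(data: bytes) -> CabHead:
    """Decode the first 28 bytes of a CAB/ONEPKG file (little endian)."""
    if len(data) < 28 or data[:4] != b"MSCF":
        raise ValueError("not a cabinet header")
    return CabHead(*struct.unpack_from("<4sIIIIIBBH", data, 0))


# Synthetic header for illustration, modelled on the values in this post.
demo = struct.pack("<4sIIIIIBBH", b"MSCF", 0, 0, 0, 0x44, 0, 3, 1, 1)
head = read_cab_head(demo)
print(f"first entry at {head.first_entry:#x}, "
      f"version {head.ver_major}.{head.ver_minor}, folders {head.folders}")
```

For a real file, the same function could be fed the first 28 bytes, for
example read_cab_head(open("ONGuide.onepkg", "rb").read(28)).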

Many packages contain members with names like test-onenote.one or "Open
Notebook.onetoc2". That was expressed inside the Global Strings section by
a line like:
      <String>NOTE</String>
But in some packages, like Notebook03.onepkg and ONGuide.onepkg, the member
file names do not contain the phrase note. So the above line vanished from
the global section.
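
The effect can be sketched in Python under two assumptions of mine: a
global string is kept only if it occurs in every scanned sample, and the
comparison ignores case. For simplicity the sketch searches member names
instead of raw file bytes. The function common_string is made up, and the
member names of "package C" are hypothetical, not taken from my files.

```python
def common_string(needle: str, samples: dict) -> bool:
    """True only if `needle` occurs (ignoring case) in every sample."""
    return all(
        any(needle.upper() in name.upper() for name in names)
        for names in samples.values()
    )


# The first two member names are quoted in this post.
samples = {
    "package A": ["test-onenote.one"],
    "package B": ["Open Notebook.onetoc2"],
}
print(common_string("NOTE", samples))

# HYPOTHETICAL member names without the phrase "note".
samples["package C"] = ["Guide.onetoc2", "Section 1.one"]
print(common_string("NOTE", samples))
```

As soon as one sample lacks the phrase, the string is no longer common to
all samples and drops out.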

The current definition contains no mime type. Because ONEPKG files are CAB
archives they could get the generic mime type
application/vnd.ms-cab-compressed. But when looking up the extension on the
website nirsoft.net, a dedicated mime type is listed there. That is
expressed by a line like:
      <Mime>application/msonenote</Mime>

With the updated trid definition all my ONEPKG examples are now described
(see appended output/trid-v-new.txt). The TrID definition and the outputs
are stored in the archive onepkg_.zip. I hope that my XML file can be used
in a future version of triddefs.

With best wishes
Jörg Jenderek

Mark0

  • Administrator
  • Hero Member
  • *****
  • Posts: 2731
    • Mark0's Home Page
Re: updated onepkg.trid.xml for Microsoft OneNote Package
« Reply #1 on: September 08, 2022, 02:43:13 AM »
Thanks Jörg!