Author Topic: oform.trid.xml for ONLYOFFICE form for online completion  (Read 751 times)

jenderek

« on: September 18, 2022, 12:06:42 AM »
Hello trid users,

Some days ago I tried an alternative office suite to escape from Microsoft,
because monopolies are bad and formally forbidden. But hey, if you are too
big, then you never really get punished.  That suite is called ONLYOFFICE
and can be found at the web site onlyoffice.com. It is said that the software
is compatible with Microsoft Office. Modern Word documents use the file name
extension DOCX. Such samples can be read and written by ONLYOFFICE. Such
examples are described as "Word Microsoft Office Open XML Format document"
by docx.trid.xml.

Among other standard file types, ONLYOFFICE offers two file types of its
own. One has the suffix OFORM and is called in the German variant
"ONLYOFFICE-Form für Online

So I ran the trid utility on my OFORM examples. All are described correctly,
with a low rate, generically as "ZIP compressed archive" by ark-zip.trid.xml
with mime type application/zip. All examples are also described "wrongly"
(suffix DOCX instead of OFORM) as "Word Microsoft Office Open XML Format
document" by docx.trid.xml (see appended output/trid-v-old.txt).

For comparison I checked these examples with the file command utility
(version 5.43). Here all examples are also described generically as "Zip
archive data" (see appended output/file-k-5.43.txt) with the application/zip
mime type (see appended output/file-ki-5.43.txt).  86 of 154 examples are
also described more specifically as "Microsoft Word 2007+". For these
examples the same wrong extension DOCX and mime type as by TrID are shown
(see appended file-msooxml-ext-5.43.txt and file-msooxml-i-5.43.txt in
output).

For comparison I also ran the file format identification utility DROID
(see https://sourceforge.net/projects/droid/). It also describes all
examples as "Microsoft Word for Windows" with version "2007 onwards" by
PUID fmt/412. But the software complains about the file name suffix OFORM
instead of DOCX.

The identification by all tools as "new" Microsoft Word is not surprising.
As described on a page on their web site, OFORM is based on DOCX. This fact
is represented in the new definition oform.trid.xml by lines like:
 <FileType>ONLYOFFICE form for online completion</FileType>
 <RefURL>
 https://www.onlyoffice.com/blog/2022/01/7-interesting-facts-about-onlyoffice-forms/
 </RefURL>

Unfortunately they do not describe what exactly the difference from
Microsoft DOCX is. In the program I used (variant for Windows, version
7.1.1.57) they registered the file format as ASC.Oform, but the connection
to the OFORM file name suffix is missing. Also no mime type is found in the
Windows registry for that file type. Those are too many easy elementary
errors in my opinion. So the blame falls on the heads of the ONLYOFFICE
team. Because OFORM is "different" from Microsoft DOCX, that mime type can
not be used. But because OFORM files are ZIP archives, at least the mime
type of ZIP should be applied. So I do this by a line like:
   <Mime>application/zip</Mime>

After running tridscan to generate the definition oform.trid.xml I looked at
which XML constructs were created and tried to understand them.
I would like to reduce the XML constructs, but I was not able to do this
because the ONLYOFFICE team does not explain what exactly their additions
to, or differences from, the DOCX format are. So I do not know if the XML
constructs are always true or just triggered by lucky circumstances. So for
the moment I keep all XML constructs.
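
The worry about "lucky circumstances" can be made concrete with a small
Python sketch. It is NOT how tridscan works internally (that is not
described here). It only assumes that a definition is built from byte runs
that are identical in every sample. The function name constant_runs and the
64-byte window are made up for illustration.

```python
def constant_runs(samples, window=64):
    """Return (pos, bytes) runs within the first `window` bytes that are
    identical in every sample. Naive illustration, not tridscan's method."""
    if not samples:
        return []
    # Only compare as far as the shortest sample (and the window) allows.
    limit = min(window, min(len(s) for s in samples))
    first = samples[0]
    runs = []
    start = None
    for i in range(limit):
        same = all(s[i] == first[i] for s in samples)
        if same and start is None:
            start = i                      # a constant run begins here
        elif not same and start is not None:
            runs.append((start, first[start:i]))
            start = None                   # the run ended at i - 1
    if start is not None:
        runs.append((start, first[start:limit]))
    return runs


if __name__ == "__main__":
    # Two hypothetical samples that agree everywhere except bytes 6 and 7.
    a = b"PK\x03\x04AAxxBB"
    b = b"PK\x03\x04AAyyBB"
    for pos, run in constant_runs([a, b]):
        print(pos, run.hex().upper())
```

With only a few samples, long runs may be constant by coincidence. Each
additional sample from a different source can only shorten the runs, which is
why a definition built from one producer's files may be too strict.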

Because OFORM files are ZIP containers, we can inspect such examples with
suitable unpacking tools, like 7-zip for example. There we see that all
archive members have a time stamp of midnight of 1 January 1980 (see
appended output/7z-l.txt). But I do not know if this is a bug or a feature.
In the ZIP local file header the bytes at offsets 4 to 9 hold the needed
version (1400), the flags and the compression method (both nil). So the 2
bytes for the modification time of the first member at offset 10 are nil,
and the 2 bytes for the DOS modification date of the first member at offset
12 are the byte sequence 2100. This was expressed by the first XML
construct, which looks like:
   <Bytes>504B030414000000000000002100</Bytes>
   <ASCII> P K . . . . . . . . . . !</ASCII>
   <Pos>0</Pos>
That is different from the construct for generic ZIP archives in
ark-zip.trid.xml, which looks like:
   <Bytes>504B0304</Bytes>
   <ASCII> P K</ASCII>
   <Pos>0</Pos>
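
To double-check how the 14 pattern bytes map to ZIP header fields, here is a
small Python sketch using only the standard library. It decodes the pattern
as the start of a ZIP local file header, compares it with the generic 4-byte
pattern, and lists member time stamps the way a 7-zip listing would show
them. The function names are invented for this example, and the matching
helper is only a plain byte comparison at a position, not TrID's real
matching code.

```python
import io
import struct
import zipfile

# Front patterns quoted above: the new OFORM one and the generic ZIP one.
OFORM_PATTERN = bytes.fromhex("504B030414000000000000002100")
ZIP_PATTERN = bytes.fromhex("504B0304")


def decode_local_header_prefix(data):
    """Decode the first 14 bytes of a ZIP local file header (little endian)."""
    sig, version, flags, method, dos_time, dos_date = struct.unpack_from(
        "<4sHHHHH", data, 0)
    return {
        "signature": sig,              # offset 0, 4 bytes
        "version_needed": version,     # offset 4
        "flags": flags,                # offset 6
        "method": method,              # offset 8, 0 means stored
        # offset 10: hhhhh mmmmmm sssss (seconds are stored divided by 2)
        "time": (dos_time >> 11, (dos_time >> 5) & 0x3F, (dos_time & 0x1F) * 2),
        # offset 12: yyyyyyy mmmm ddddd (year counted from 1980)
        "date": (1980 + (dos_date >> 9), (dos_date >> 5) & 0x0F, dos_date & 0x1F),
    }


def matches_at(data, pattern, pos=0):
    """True if `pattern` occurs in `data` exactly at offset `pos`."""
    return data[pos:pos + len(pattern)] == pattern


def member_timestamps(data):
    """Map each archive member name to its (Y, M, D, h, m, s) time stamp."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {info.filename: info.date_time for info in zf.infolist()}


if __name__ == "__main__":
    for name, value in decode_local_header_prefix(OFORM_PATTERN).items():
        print(name, value)
    # Every file matching the OFORM pattern also matches the generic one.
    print(matches_at(OFORM_PATTERN, ZIP_PATTERN))
```

The date bytes 2100 are the little endian value 0x0021, which means year
offset 0, month 1, day 1, so 1 January 1980. To check a real sample, read its
bytes and pass them to member_timestamps().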

With the new trid definition all my OFORM examples are now described more
precisely (see appended output/trid-v-new.txt). The TrID definition and
output are stored in the archive oform_.zip. I hope that my XML file can be
used in a future version of triddefs. I also hope that others can refine
the definition.

There exists a second file format which is specific to ONLYOFFICE. That is
DOCXF. The same problems exist as seen for OFORM. So I will try to handle
this in a future session.

With best wishes
Jörg Jenderek


Mark0

« Reply #1 on: September 19, 2022, 09:02:02 PM »
Thanks! I scanned some other OFORM files and refined the def a bit.