Topic: Feature request: Configurable result output format

wolfy

Feature request: Configurable result output format
« on: January 26, 2009, 06:41:10 AM »
Hi all,

I intended to use trid to search for certain file types in Opera's (browser) cache and was very happy to see that the files were identified correctly (Opera removed extensions from the filenames in the cache some time ago, which is really dumb, but nothing can be done about it until they change their mind...).

Then I came across a drawback: I simply want to search for e.g. flash files and use "grep" to further process the found files:
$ trid *
File: opr1P0VY
100.0% (.FLV) Flash Video (4000/1)

File: opr1P0ZX
100.0% (.FLV) Flash Video (4000/1)
...

$ trid * | grep FLV
100.0% (.FLV) Flash Video (4000/1)
100.0% (.FLV) Flash Video (4000/1)
...

Not good, because "grep" outputs only the lines matching FLV, which do not contain the file name.

Matching the line with the file name is impossible, because an extension is not present (that's why I need trid in the first place ;o) and the names are synthetic and do not carry any useful information.

At first I expected that using -ae and grepping for ".flv" would do the trick, until I realized that -ae permanently renames the files in place, which is not a very good idea in a cache... ;o)

Of course I could alternatively either
*) use a more complex script to catch the two-line output and parse the filename appropriately, or
*) use -ae and remove the extensions again afterwards (e.g. with rename 's/\..*$//' *)

Both solutions are far from elegant, however.
trid just does not lend itself very smoothly to automated processing of its output, as far as I can see at the moment - perhaps I have overlooked something?

But what I think is really needed is an option to influence the output format, such as "put the output on a single line":

$ trid -1 *
100.0% (.FLV) Flash Video (4000/1)  File: opr1P0VY
100.0% (.FLV) Flash Video (4000/1)  File: opr1P0ZX
...
which would allow for easy grepping (BTW, "-1" is "dash one", not "dash L" ;o)
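
Until such an option exists, the proposed one-line format could be approximated by post-processing the current output. The following is only a sketch: "trid_oneline" is a hypothetical helper name (not a TrID option), and it assumes the layout shown above, i.e. a "File: <name>" line followed by one or more result lines containing "% (".

```shell
# Hypothetical helper, not part of TrID: emulates the proposed "-1" output.
# Assumption: every file block starts with "File: <name>" and every
# result line looks like "100.0% (.FLV) Flash Video (4000/1)".
trid_oneline() {
    awk '
        /^File: / { name = substr($0, 7); next }   # remember the current file name
        /% \(/ {                                   # a result line
            sub(/^[ \t]+/, "")                     # drop leading indentation, if any
            print $0 "  File: " name               # append the file name to the result
        }
    '
}

# Intended usage (requires trid):
#   trid * | trid_oneline | grep -F "(.FLV)"
```

Unlike a plain "grep FLV", every matching line now carries its file name, and files with several candidate types produce one line per candidate.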

Still, it would be necessary to use "sed" or "cut" to get the filename only, so a more flexible approach is needed - how about a syntax similar to the Unix "ps" command for selecting individual output fields:
-o (output) [optional] accepts a list of pre-defined column names ("ps" also supports header renaming, which is not really necessary here because there are no headers, but this could be added as well later)
-f (filter) [optional] selects which criteria shall be used to determine whether the line should be included in the output (perhaps this should allow a simple grep-like syntax?):

$ trid -o type,ext,name -f Flash *
"Flash Video" .FLV opr1P0VY
"Flash Video" .FLV opr1P0ZX
...
(see below for the necessity of "")

$ trid -o name -f .FLV *
opr1P0VY
opr1P0ZX
...
$ trid -o path -f video/x-flv *
/home/wolfy/.opera/cache4/opr1P0VY
/home/wolfy/.opera/cache4/opr1P0ZX
...

Note the use of either "Flash", ".FLV" or "video/x-flv" as filter criterion - the whole line is matched to check this.
Speaking of "video/x-flv": Mime types should be supported as well!
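
For illustration, the column selection and filtering proposed here could be approximated with awk on top of the current output. This is a sketch under assumptions, not how TrID works: "trid_select" and the column name "pct" are made up, the result lines are assumed to look exactly like the examples above, and "path" and mime types are left out because the current output does not contain them.

```shell
# Hypothetical helper, not part of TrID: approximates the proposed -o / -f.
#   $1 = comma-separated columns, any of: pct, ext, type, name
#   $2 = optional fixed string the result line must contain
# Note: awk -v interprets backslash escapes, so avoid "\" in the filter.
trid_select() {
    awk -v cols="$1" -v filter="${2:-}" '
        BEGIN { n = split(cols, want, ",") }
        /^File: / { name = substr($0, 7); next }
        /% \(/ {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (filter != "" && index(line, filter) == 0) next
            f["name"] = name
            f["pct"]  = substr(line, 1, index(line, "%"))
            rest = substr(line, index(line, "(") + 1)      # ".FLV) Flash Video (4000/1)"
            f["ext"]  = substr(rest, 1, index(rest, ")") - 1)
            desc = substr(rest, index(rest, ")") + 2)      # "Flash Video (4000/1)"
            sub(" \\([0-9/]+\\)[ \t]*$", "", desc)         # drop the trailing "(4000/1)"
            f["type"] = "\"" desc "\""                     # quoted, since it may contain spaces
            out = ""
            for (i = 1; i <= n; i++) out = out (i > 1 ? " " : "") f[want[i]]
            print out
        }
    '
}

# Intended usage (requires trid):
#   trid * | trid_select type,ext,name Flash
```

The filter is matched as a fixed string against the whole result line, which mirrors the "whole line is matched" idea without any regular-expression escaping.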

Alternatively, dedicated options could be implemented:
-t for a type name (like "Flash Video")
-e for an extension (like "FLV")
-m for a mime type

$ trid -o name,ext -m video/x-flv *
opr1P0VY .FLV
opr1P0ZX .FLV

Another idea: Using "+" instead of "," in the output list could be used for concatenation:

$ trid -o path+ext -m video/x-flv *
/home/wolfy/.opera/cache4/opr1P0VY.FLV
/home/wolfy/.opera/cache4/opr1P0ZX.FLV

Note that now there is no whitespace between the name and the extension - we have a complete canonical file name including extension, which was not present before, but without renaming the file itself as is the case with "-ae".
To allow upper and lower case extensions, two different IDs for extensions could be implemented:
name+ext    ->   opr1P0VY.flv
name+EXT   ->   opr1P0VY.FLV
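
The concatenation idea can be tried out without touching the files themselves. A minimal sketch, assuming the two-line output shown earlier and using only the first (highest-ranked) result per file; "trid_fullname" and its "lower"/"upper" argument are hypothetical names chosen for this example:

```shell
# Hypothetical helper, not part of TrID: prints "<name><ext>" per file,
# similar to the proposed "name+ext" / "name+EXT" columns.
#   $1 = "lower" for a lower-case extension; anything else keeps it as reported
trid_fullname() {
    awk -v mode="${1:-upper}" '
        /^File: / { name = substr($0, 7); done = 0; next }
        /% \(/ && !done {
            done = 1                                    # only the first result per file
            ext = substr($0, index($0, "(") + 1)        # ".FLV) Flash Video (4000/1)"
            ext = substr(ext, 1, index(ext, ")") - 1)   # ".FLV"
            print name (mode == "lower" ? tolower(ext) : ext)
        }
    '
}

# Intended usage (requires trid):
#   trid * | trid_fullname lower
```

Nothing is renamed; the extended names exist only in the output, which is the point of the suggestion compared to "-ae".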

Additionally, special care must be taken to escape whitespace inside e.g. filenames or type identifiers. This is not done currently, because it is not *really* necessary as long as the output format is fixed.
Still, the lack of escaping is already inelegant today when trying to process the output further (e.g. with "cut" or "sed").

With -o and the possibility to change the order of columns, proper escaping is important, because whitespace separates the columns:

$ trid -o name,type *
"Some longer filename" "Flash Video"

or

Some\ longer\ filename Flash\ Video

which makes the only unescaped whitespace character the delimiter between the two strings.
Speaking of which: an additional option "-d" could specify a user defined delimiter:

$ trid -d "::" -o name,type *
Some longer filename::Flash Video

This means that not whitespace, but the current delimiter character (defaulting to whitespace) is escaped in the output :)
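
The escaping rule described here (escape the delimiter wherever it occurs inside a field, then join the fields with it) can be sketched as follows. "join_escaped" is a hypothetical helper for illustration only; it escapes with a backslash and, to keep the sketch short, does not escape backslashes themselves.

```shell
# Hypothetical helper: join_escaped DELIM FIELD...
# Prefixes every occurrence of DELIM inside a field with a backslash,
# then joins the fields with DELIM. Fields must not contain newlines.
join_escaped() {
    delim=$1
    shift
    printf '%s\n' "$@" | awk -v d="$delim" '
        function esc(s,    out, p) {
            out = ""
            while ((p = index(s, d)) > 0) {          # next occurrence of the delimiter
                out = out substr(s, 1, p - 1) "\\" d
                s = substr(s, p + length(d))
            }
            return out s
        }
        { line = line (NR > 1 ? d : "") esc($0) }
        END { print line }
    '
}

join_escaped " "  "Some longer filename" "Flash Video"
join_escaped "::" "Some longer filename" "Flash Video"
```

The first call prints "Some\ longer\ filename Flash\ Video", the second "Some longer filename::Flash Video", matching the two variants described above.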


BTW, not using -1 or -o and/or -f should make trid behave exactly like it does now, in order not to break legacy scripts.

Sorry for the overly long post, hope I managed to bring my points across  ;D

Mark0

Re: Feature request: Configurable result output format
« Reply #1 on: January 26, 2009, 11:23:52 AM »
Hi!

Quote from: wolfy
"trid just does not lend itself very smoothly to automated processing of output, as far as I can see at the moment - perhaps I have overlooked something?"

No, that's right. It just wasn't a priority, or an interesting thing, when TrID was "born".
But it's surely something that's worth considering now, so most probably a next version will be improved in this regard.

Thanks for the detailed post & suggestions,
Bye!

wolfy

Re: Feature request: Configurable result output format
« Reply #2 on: January 26, 2009, 03:14:41 PM »
Sounds good!
Perhaps it is due to my more *NIXy background that all my programs take automated post-processing into account, even if they are not normally used from the shell  ;)
For a quick fix perhaps it is possible to implement the one-line-output option ("-1") soon? (this should not take too much work I think).

The other possible improvements surely take some more consideration - I'm sure I could come up with lots of additional ideas if necessary  ;D

Greetings & keep up the great work!


PS: BTW, I almost forgot... If anyone needs the features described above right now, on Linux/Unix systems you can do something along the lines of:

trid * | sed '/File:.*[^\.]$/N;s/\n */ /;/^$/d' | grep "\.FLV" | cut -d" " -f2

I use this for my Opera cache: "sed" looks for the line containing "File:" and appends the following line to the pattern space, so that the newline between the two becomes accessible ("/File:.*[^\.]$/N"). It then replaces that newline and any spaces after it with a single space ("s/\n */ /") and deletes the empty lines ("/^$/d"), which yields lines like these (similar to what my suggested "one-line" option "-1" would produce):

File: opr1P7HZ 100.0% (.FLV) Flash Video (4000/1)
File: opr1P6IC 100.0% (.TXT) Text - UTF-8 encoded (3000/1)
File: opr1P6IP 60.0% (.GIF) GIF89a Bitmap (6000/1)
File: opr1P5AR 100.0% (.FLV) Flash Video (4000/1)

The subsequent "grep" keeps only the desired type (Flash Video in this case), and "cut" then extracts the second field (the filename), using a space " " as separator. This assumes file names without spaces, as in the cache, and results in a list of file names:

opr1P7HZ
opr1P5AR

Just replace the "\.FLV" pattern by anything you want to filter for (e.g. the long type name like "GIF89a Bitmap" or the percentage etc.). You can also chain more filter commands, like

... | grep "GIF89a Bitmap" | grep "59\..%"

which keeps all lines from "59.0%" to "59.9%". Be careful to escape all special characters like "." etc.: "\." means the decimal point, the second "." means "exactly one arbitrary character after the decimal point":

File: opr1P6NV 59.7% (.GIF) GIF89a Bitmap (6000/1)
File: opr1P6UQ 59.9% (.GIF) GIF89a Bitmap (6000/1)
File: opr1P6W3 59.9% (.GIF) GIF89a Bitmap (6000/1)

Alternatively, you could create one grep expression for this, but this tends to be more complicated as you have to take the order of the information into account (percentage is in front of the type info):

... | grep "59\..%.* GIF89a Bitmap"

yields the same result as above.
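
If the percentage should be compared as a number rather than matched as text, awk is an alternative to a second grep. A sketch, assuming the joined one-line format shown above with the percentage in the third whitespace-separated field (so file names must not contain spaces); "pct_between" is a hypothetical helper name:

```shell
# Hypothetical helper: pct_between MIN MAX
# Keeps joined lines ("File: <name> <pct>% ...") with MIN <= pct < MAX.
# "$3 + 0" turns a field such as "59.7%" into the number 59.7.
pct_between() {
    awk -v lo="$1" -v hi="$2" '$3 + 0 >= lo && $3 + 0 < hi'
}

# Intended usage, continuing the pipeline from above:
#   ... | grep "GIF89a Bitmap" | pct_between 59 60
```

This avoids regular-expression pitfalls such as the unescaped "." and makes ranges like "at least 50%" easy to express.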

Hope this is of use for someone!  :)


EDIT:

I originally wrote                  

grep ".FLV"

which should of course read   

grep "\.FLV"

"." is a special character in the pattern and must be escaped; otherwise it matches any single character and not just the dot!

Alternatively, the option -F makes grep treat the pattern as a fixed string, so "." is no longer read as a special character and no "\" is necessary:

grep -F ".FLV"
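
The difference between the three variants can be checked on two sample lines. A small sketch; the second line with its ".XFLV" extension is made up purely for illustration and is not a real TrID type:

```shell
# Two sample lines in the joined format; the ".XFLV" entry is invented
# so that an unescaped "." has something wrong to match.
sample_lines() {
    printf '%s\n' \
        'File: a 100.0% (.FLV) Flash Video (4000/1)' \
        'File: b 100.0% (.XFLV) Made-up type (1000/1)'
}

sample_lines | grep -c ".FLV"      # 2: "." also matches the "X" in "XFLV"
sample_lines | grep -c "\.FLV"     # 1: only a literal dot followed by FLV
sample_lines | grep -cF ".FLV"     # 1: fixed string, no escaping needed
```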

Changed that in the text too.
Note to self: TEST regular expression patterns before posting them  ;)
« Last Edit: January 27, 2009, 02:47:51 PM by wolfy »