Mark0's Forum
Software => TrID File Identifier => Topic started by: m^2 on December 29, 2006, 11:26:15 PM
-
There's no way to identify a file that is only a couple of bytes long, but in many cases 100 bytes is enough. Do you have any idea, from your knowledge or experience, where the lower bound on identifiable file size lies?
I don't call TrID for files under 16 bytes, but for performance reasons it's important for me to make this limit as high as possible without losing correctness.
I could look for the shortest signature in your definitions, but that wouldn't guarantee that there will never be a shorter one in the future, so it's more a general question than a technical one.
-
Not an easy question. Some filetypes can be identified by very small patterns: Lua compiled scripts, for example, need just 4 bytes (of course that's only the unique header; the smallest real script will be larger). The same goes for RIFF containers. I think filetypes with just a couple of bytes of header plus another two of minimum content could exist, so 4 bytes is probably the "right" minimum. It could also be made configurable, anyway.
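To make the "4 bytes" figure concrete, here is a minimal sketch of header-based matching with a per-definition minimum size. The two magic values are real (Lua bytecode starts with ESC followed by "Lua", RIFF containers start with "RIFF"), but the `SIGNATURES` table, `min_size`, and `identify` are hypothetical illustrations, not TrID's actual definition format or matching engine.

```python
# Hypothetical signature table: name -> list of (offset, magic bytes).
# Not TrID's real definition format; just enough to show the size bound.
SIGNATURES = {
    "Lua compiled script": [(0, b"\x1bLua")],
    "RIFF container": [(0, b"RIFF")],
}


def min_size(patterns):
    """Smallest file length in which every pattern of a definition fits."""
    return max(offset + len(magic) for offset, magic in patterns)


def identify(data: bytes):
    """Return the names of all definitions whose patterns all match `data`."""
    hits = []
    for name, patterns in SIGNATURES.items():
        if len(data) < min_size(patterns):
            continue  # file too short for this definition to match at all
        if all(data[off:off + len(magic)] == magic for off, magic in patterns):
            hits.append(name)
    return hits


# Lowest size at which any definition could possibly match.
LOWER_BOUND = min(min_size(p) for p in SIGNATURES.values())
```

With this table, `identify(b"\x1bLua")` returns `["Lua compiled script"]`, a 3-byte input matches nothing, and `LOWER_BOUND` is 4. As noted above, this bound only reflects the current table: adding a shorter definition later would lower it.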
-
Are you sure that the probability of misidentifying a 32-bit (4-byte) file is low?
I don't think configurability is a good idea: there IS a lower limit (greater than 0 ;) ) and no configuration changes it.
ADDED:
Let's say that "low" means no more than 60% ;)
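One rough way to put a number on "low": if file contents were uniformly random bytes, the chance that a 4-byte file matches one fixed 4-byte magic number is 1 in 2^32. The sketch below (the function name is hypothetical) computes this exactly. It assumes uniformly random data, which real files are not (text files, for instance, have very skewed first bytes), so it is only an order-of-magnitude guide for well-chosen binary headers.

```python
from fractions import Fraction


def false_match_probability(pattern_len_bytes: int, num_signatures: int = 1) -> Fraction:
    """Chance that uniformly random bytes match one of `num_signatures`
    distinct fixed patterns, each `pattern_len_bytes` long at the same offset.

    Distinct equal-length patterns at the same offset are mutually exclusive
    events, so their probabilities simply add up.
    """
    return Fraction(num_signatures, 256 ** pattern_len_bytes)


p4 = false_match_probability(4)  # 1/4294967296, roughly 2.3e-10
p2 = false_match_probability(2)  # 1/65536: 2-byte headers are far weaker
```

Under these assumptions, a 4-byte header is far below any "60%" threshold, while a 2-byte one is already 65,536 times more likely to match by accident.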
-
Sorry, I'm not sure I understand what you mean (you know, my English...)?
-
In other words:
Are you sure that when TrID gets a 32-bit file and says it's, e.g., a compiled Lua script, I can believe it really is one?
-
For Lua, yes. Some filetypes have a pretty strong characterization.
With others, you can't be too sure even if you have a couple of MB available.
-
Thank you, I'll have to lower the limit I use.