For example, a string of two spaces gets picked up as hexadecimal because that format is checked first, but it is most likely just meant to be plain ASCII.
Some sort of scoring system might be useful. On a scale of 0 to 1:
- 0.0 - definitely not this format. For example, it can't really be hex if it has a W in it.
- between 0 and 1 - likelihood of this format.
abc123, abc 123, and ab c1 23 can all be interpreted as hex or ASCII. Intuition says the one grouped in pairs is more likely hex than the others. Find a way to capture this algorithmically.
- 1.0 - can only be this format. (not sure if this is really possible)
CyberChef's "magic" mode might also be a useful reference, since it does some scoring of its own.
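
As a starting point for discussion, here is a minimal sketch of what a hex scorer on the 0 to 1 scale could look like. The function name `hex_score` and the specific values (0.9, 0.6, 0.4) are placeholders made up for illustration, not tuned or taken from any existing tool. The only things it tries to capture are the rules above: a non-hex character means 0.0, whitespace-only input is not treated as hex, and input grouped in pairs scores higher than other groupings.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def hex_score(text: str) -> float:
    """Return a rough 0.0-1.0 guess at how likely `text` is hex.

    All numeric scores below are arbitrary placeholder values.
    """
    tokens = text.split()  # split on any run of whitespace
    if not tokens:
        # Empty or whitespace-only input: nothing to decode as hex.
        return 0.0
    if any(ch not in HEX_DIGITS for tok in tokens for ch in tok):
        # A single non-hex character (such as "W") rules hex out.
        return 0.0
    if len(tokens) > 1 and all(len(tok) == 2 for tok in tokens):
        # Space-separated byte pairs, e.g. "ab c1 23".
        return 0.9
    if len(tokens) == 1 and len(tokens[0]) % 2 == 0:
        # One unbroken run with a whole number of bytes, e.g. "abc123".
        return 0.6
    # Valid hex digits, but grouped oddly, e.g. "abc 123".
    return 0.4


for sample in ["  ", "W123", "ab c1 23", "abc123", "abc 123"]:
    print(repr(sample), hex_score(sample))
```

Each format could expose a scorer like this, and the detector would pick the highest score instead of the first format that matches. The scorer never returns 1.0, which matches the doubt above about whether certainty is really possible.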