Encoding Normalization
From Exterior Memory
Revision as of 01:13, 23 September 2017 by MacFreek
What do you do if you are stuck with a file that contains non-printable or non-ASCII characters? You normalise the content, by either removing or converting the unwanted characters.
Exactly what to do depends on the input (what is wrong with the file, if that is known) and on the desired output.
File contains control characters
Example: a file with NUL characters:
Example: a file with a Byte Order Mark (BOM):
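A quick way to check whether a file has either problem is to look at its raw bytes. The sketch below is an illustration, not part of the original page: the helper names `has_bom` and `count_nul` are hypothetical, and it assumes `head -c`, `od`, `tr` and `wc` behave as they do on common Linux and macOS systems.

```shell
# has_bom FILE: succeed if FILE starts with the UTF-8 byte order mark (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# count_nul FILE: print the number of NUL (0x00) bytes in FILE.
# tr deletes everything except NUL, and wc counts what is left.
count_nul() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, `has_bom syseeprom.txt && echo "BOM found"` would report a leading BOM, and `count_nul syseeprom.txt` would print how many NUL bytes the file contains.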
Resolution:
strings
tr -cd '\11\12\15\40-\176' < syseeprom.txt > cleansander.txt
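The octal list in the `tr` command keeps only tab (`\11`), line feed (`\12`), carriage return (`\15`) and the printable ASCII range from space (`\40`) through tilde (`\176`). Everything else is deleted, including NUL bytes and the three bytes of a UTF-8 BOM. The sketch below wraps the same filter in a function to show the effect on a small sample. The name `strip_nonprintable` and the sample input are illustrative assumptions, and `LC_ALL=C` is added here only as a precaution so that `tr` treats the input as plain bytes.

```shell
# strip_nonprintable: read stdin, write only tab, LF, CR and printable ASCII.
# LC_ALL=C is an added precaution (not in the original command) that forces
# byte-wise processing regardless of the current locale.
strip_nonprintable() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Sample input: UTF-8 BOM, "a", a NUL byte, "b", a tab, "c", newline.
# Only "a", "b", the tab, "c" and the newline survive the filter.
printf '\357\273\277a\000b\tc\n' | strip_nonprintable
```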
File has unknown (non-UTF-8) encoding
Resolution:
enca
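Once the source encoding has been identified or guessed, for instance with `enca`, the conversion to UTF-8 can be done with `iconv`. The sketch below is an assumption about how that step might look: the function name `to_utf8` is hypothetical, and ISO-8859-1 is used only as an example of a guessed source encoding.

```shell
# to_utf8 ENCODING: convert stdin from ENCODING to UTF-8 on stdout.
to_utf8() {
    iconv -f "$1" -t UTF-8
}

# Byte 0xE9 is "é" in ISO-8859-1; in UTF-8 it becomes the two bytes C3 A9.
# od prints the converted bytes in hex so the change is visible.
printf 'caf\351\n' | to_utf8 ISO-8859-1 | od -An -tx1
```

If the guess is wrong, the result is still valid UTF-8 but shows the wrong characters, so it is worth checking a few known words in the output.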
Unwanted line endings
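For Windows-style CRLF line endings, one common approach is to delete the carriage returns with `tr`, which leaves Unix-style LF endings. This is offered as a sketch under that assumption rather than as the author's method. The function name `crlf_to_lf` is hypothetical, and note that it removes every CR byte, including any that are not part of a line ending.

```shell
# crlf_to_lf: read stdin, delete every CR (0x0D) byte, write to stdout.
crlf_to_lf() {
    tr -d '\r'
}

# Two CRLF-terminated lines go in; od -c shows that only \n remains.
printf 'one\r\ntwo\r\n' | crlf_to_lf | od -c
```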
Unwanted normalization
Example: NFD-normalized text
Example: non-ASCII characters
Resolution:
iconv
strings
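One way to apply `iconv` here is to convert to ASCII and drop whatever cannot be represented. The sketch below is an assumption about how the tool might be used, not the author's exact command, and `ascii_only` is a hypothetical name. With NFD input the base letter is kept and only the combining mark is dropped, whereas a precomposed (NFC) accented letter disappears entirely. Some `iconv` implementations return a non-zero exit status when `-c` discards characters, so the status is deliberately ignored.

```shell
# ascii_only: read UTF-8 on stdin, write only the ASCII characters to stdout.
# -c tells iconv to discard characters that cannot be converted.
# "|| true" ignores the non-zero status some implementations return
# after discarding input (it also hides genuine errors, so use with care).
ascii_only() {
    iconv -c -f UTF-8 -t ASCII || true
}

# NFD input: "e" followed by the combining acute accent U+0301 (bytes CC 81).
printf 'cafe\314\201\n' | ascii_only

# NFC input: the precomposed "é", U+00E9 (bytes C3 A9).
printf 'caf\303\251\n' | ascii_only
```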