Encoding Normalization

From Exterior Memory
Revision as of 01:13, 23 September 2017 by MacFreek

What can you do if you are stuck with a file that contains non-printable or non-ASCII characters? Normalise the content, by either removing or converting the unwanted characters.

Exactly what to do depends on the input (what is wrong with the file, if that is known) and on the desired output.

This article is unfinished.

File contains control characters

Example: a file with NULL characters:



Example: a file with a Byte Order Mark (BOM):


Resolution:

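 strings syseeprom.txt
 # keep only TAB (\11), LF (\12), CR (\15) and printable ASCII (\40-\176); delete every other byte
 tr -cd '\11\12\15\40-\176' < syseeprom.txt > cleansander.txt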
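Note that `tr` works on bytes: with the set above it deletes NUL, the BOM and other control bytes, but also every byte of a multi-byte UTF-8 character, so accented letters disappear too. A minimal Python sketch contrasting the two approaches, assuming the input is available as bytes and, for the second function, that it is valid UTF-8. Both function names are hypothetical: `tr_equivalent` mimics the `tr -cd` command, while `strip_controls` keeps non-ASCII text and removes only control characters (Unicode category Cc, except TAB, LF and CR) and the BOM (U+FEFF).

```python
import unicodedata

# Bytes kept by: tr -cd '\11\12\15\40-\176'
# TAB (0o11), LF (0o12), CR (0o15) and printable ASCII 0o40..0o176.
ALLOWED = {0o11, 0o12, 0o15} | set(range(0o40, 0o177))


def tr_equivalent(data: bytes) -> bytes:
    """Drop every byte outside ALLOWED, like `tr -cd` (non-ASCII bytes are lost too)."""
    return bytes(b for b in data if b in ALLOWED)


def strip_controls(data: bytes) -> str:
    """Decode UTF-8 and remove control characters and the BOM, keeping non-ASCII text.

    Assumption: the input is UTF-8. TAB, LF and CR are kept; every other
    character in Unicode category Cc (this includes NUL and DEL) is removed,
    as is U+FEFF (the BOM).
    """
    text = data.decode("utf-8")
    return "".join(
        ch
        for ch in text
        if ch in "\t\n\r"
        or (unicodedata.category(ch) != "Cc" and ch != "\ufeff")
    )


if __name__ == "__main__":
    # BOM + "café" + NUL + " ok" + CRLF
    sample = b"\xef\xbb\xbfcaf\xc3\xa9\x00 ok\r\n"
    print(tr_equivalent(sample))         # b'caf ok\r\n'
    print(repr(strip_controls(sample)))  # 'café ok\r\n'
```

Only category Cc and U+FEFF are removed; other format characters (category Cf, such as the zero-width joiner used in emoji sequences) are deliberately kept.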

File has unknown (non-UTF-8) encoding

Resolution:

 enca
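`enca` guesses the encoding by analysing the content. If you only need a fallback and can name a few likely encodings, a simpler approach is to try them in order. A minimal Python sketch; the candidate list (UTF-8 with an optional BOM, then Windows-1252, then ISO-8859-1) and the function names are assumptions to adapt to your data, and this trial decoding is not the detection method `enca` uses.

```python
from typing import Tuple

# Candidate encodings, tried in order (assumption: adjust to the data at hand).
# "utf-8-sig" decodes UTF-8 and drops a leading BOM if present.
# "latin-1" maps every byte to a character, so it never fails and acts as a
# last resort, but it may produce the wrong characters.
CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")


def decode_best_effort(data: bytes, candidates=CANDIDATES) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings could decode the input")


def to_utf8(data: bytes) -> bytes:
    """Re-encode input of uncertain encoding as UTF-8 (without BOM)."""
    text, _ = decode_best_effort(data)
    return text.encode("utf-8")
```

UTF-8 is tried first because text in a legacy 8-bit encoding rarely happens to be valid UTF-8, whereas Windows-1252 and ISO-8859-1 accept almost any byte sequence, so putting them first would silently misread genuine UTF-8.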

Unwanted line endings
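A common case is a file with Windows (CRLF, `\r\n`) or classic Mac OS (CR, `\r`) line endings where Unix (LF, `\n`) endings are wanted. A minimal Python sketch, assuming LF is the desired output; `normalize_file_newlines` is a hypothetical helper name.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so that its CR is not turned into an extra LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_file_newlines(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: copy src to dst with LF line endings only."""
    # newline="" disables Python's own newline translation on read and write,
    # so the conversion is done solely by normalize_newlines().
    with open(src, encoding=encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=encoding, newline="") as f:
        f.write(normalize_newlines(text))
```

On the command line, `tr -d '\r' < in.txt > out.txt` handles the CRLF case, but in a file with lone CR line endings it would join all lines into one.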

Unwanted normalization

Example: NFD normalized
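In NFD (canonical decomposition) an accented letter such as é is stored as a base letter followed by a combining mark (e + U+0301 COMBINING ACUTE ACCENT); file names from macOS are a well-known source. Most tools expect the composed NFC form. A minimal Python sketch using the standard `unicodedata` module; `to_nfc` is a hypothetical name.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose decomposed (NFD) sequences into precomposed (NFC) characters."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    nfd = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
    nfc = to_nfc(nfd)
    print(len(nfd), len(nfc))  # 5 4
    print(nfc == "caf\u00e9")  # True
```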

Example: non-ASCII characters


Resolution:

 iconv
 strings
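`iconv` can transliterate to ASCII with a target such as `ASCII//TRANSLIT`, but the result may depend on the iconv implementation and the current locale. A locale-independent alternative, sketched in Python: decompose with NFKD, drop nonspacing marks (accents), then replace whatever is still outside ASCII. This is a swapped-in technique, not what `iconv` does internally, and `to_ascii` is a hypothetical name. It only handles characters that decompose into an ASCII base (é becomes e, the ligature ﬁ becomes fi); characters such as ß or Ø have no such decomposition and are removed (or replaced).

```python
import unicodedata


def to_ascii(text: str, replacement: str = "") -> str:
    """Approximate text in ASCII: strip accents and expand compatibility forms.

    Characters that do not decompose into ASCII are replaced by `replacement`
    (removed by default).
    """
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.category(ch) == "Mn":
            continue  # drop nonspacing marks, e.g. U+0301 COMBINING ACUTE ACCENT
        out.append(ch if ord(ch) < 128 else replacement)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii("Caf\u00e9 \ufb01nal Stra\u00dfe"))  # Cafe final Strae
```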