[$] Working with UTF-8 in the kernel

Thursday, March 28, 2019, 06:34 PM, from LWN.net

In the real world, text is expressed in many languages using a wide variety
of character sets; those character sets can be encoded in many different
ways. In the kernel, life has always been simpler; file names
and other string data are just opaque streams of bytes. In the few cases
where the kernel must interpret text, nothing more than ASCII is required.
The proposed addition of case-insensitive file-name lookups to the ext4
filesystem changes things, though; now some kernel code must deal with the
full complexity of Unicode. A look at the API being provided to handle
encodings illustrates nicely just how complicated this task is.