Patent attributes
Embodiments are directed to managing character encoding. A plurality of characters that are each encoded as code units based on a character code may be provided such that the code units for each character represent a code point of a character encoding scheme. An encoding model may be determined based on the character code, one or more processor features, and a target character code. The processor features may be employed to transform the code units into target code units based on the encoding model such that the target code units are based on the target character code and such that the target code units encode the code point for each character. A plurality of target characters may be provided to a target stream such that each target character is encoded as the target code units.
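
The flow described above may be illustrated with a minimal sketch. It assumes UTF-8 as the character code and UTF-16 as the target character code, and it models the processor features as a plain set of strings. The names `determine_encoding_model`, `transcode_scalar`, `transcode_ascii_fast`, and the `"simd"` feature label are hypothetical and chosen only for illustration; they are not taken from the embodiments. The "fast" path merely stands in for a routine that would use wide processor registers.

```python
from typing import Callable, FrozenSet, List


def utf8_to_code_points(units: bytes) -> List[int]:
    # Recover the code point of each character from its UTF-8 code units.
    # Python's built-in decoder performs the validation.
    return [ord(ch) for ch in units.decode("utf-8")]


def code_points_to_utf16(code_points: List[int]) -> List[int]:
    # Encode each code point as one or two 16-bit target code units.
    out: List[int] = []
    for cp in code_points:
        if cp < 0x10000:
            out.append(cp)
        else:
            # Code points above the Basic Multilingual Plane need a surrogate pair.
            cp -= 0x10000
            out.append(0xD800 | (cp >> 10))
            out.append(0xDC00 | (cp & 0x3FF))
    return out


def transcode_scalar(units: bytes) -> List[int]:
    # General path: one character at a time.
    return code_points_to_utf16(utf8_to_code_points(units))


def transcode_ascii_fast(units: bytes) -> List[int]:
    # Stand-in for a feature-specific path: ASCII code units map one-to-one
    # onto UTF-16 code units, so no per-character decoding is needed.
    if units.isascii():
        return list(units)
    return transcode_scalar(units)


def determine_encoding_model(
    source: str, features: FrozenSet[str], target: str
) -> Callable[[bytes], List[int]]:
    # Hypothetical selection rule: pick a transform from the source character
    # code, the available processor features, and the target character code.
    if (source, target) != ("utf-8", "utf-16"):
        raise ValueError("this sketch only covers UTF-8 to UTF-16")
    return transcode_ascii_fast if "simd" in features else transcode_scalar


model = determine_encoding_model("utf-8", frozenset({"simd"}), "utf-16")
target_units = model("aé😀".encode("utf-8"))
print([hex(u) for u in target_units])
```

Both paths yield the same target code units for the same input; only the means of transformation differs, which is the role the processor features play in selecting the model.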