Mirror of https://github.com/TinyCC/tinycc.git (synced 2026-02-06 22:02:37 +00:00)
As the standard requires, take exactly 4 hex digits after the \u opener of a universal character name, or exactly 8 hex digits after \U; reject shorter digit sequences and do not consume more (https://port70.net/~nsz/c/c11/n1570.html#6.4.3, https://port70.net/~nsz/c/c99/n1256.html#6.4.3).

The Unicode code point used to be truncated to 1 byte. Now it is expanded into UTF-8, matching gcc and clang behavior on Linux.

TODO: Universal character names should also be supported in identifiers, e.g. char \u010dau_sv\u011bte[]="čau_světe";
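
The two rules described above (a fixed digit count, then UTF-8 expansion) can be illustrated with a minimal sketch. This is not the code from this commit: the function names hex_value, parse_ucn and utf8_encode are hypothetical, and the sketch assumes the caller has already consumed the \u or \U opener. It also does not enforce the standard's other constraints on universal character names (for example the excluded surrogate range D800 through DFFF).

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: p points just past "\u" (ndigits = 4) or
   "\U" (ndigits = 8). Reads exactly ndigits hex digits into *cp.
   Returns the number of characters consumed, or 0 if fewer than
   ndigits hex digits are present. Never reads beyond ndigits, and
   stops at a terminating NUL because NUL is not a hex digit. */
size_t parse_ucn(const char *p, int ndigits, unsigned long *cp)
{
    unsigned long v = 0;
    int i;

    for (i = 0; i < ndigits; i++) {
        int d = hex_value((unsigned char)p[i]);
        if (d < 0)
            return 0;               /* too few digits: reject */
        v = (v << 4) | (unsigned long)d;
    }
    *cp = v;
    return (size_t)ndigits;
}

/* Encode cp as UTF-8 into out, which must hold at least 4 bytes.
   Returns the number of bytes written, or 0 if cp > 0x10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With these helpers, parsing "010dau" with ndigits = 4 consumes only "010d" and leaves "au" for the rest of the literal, and encoding U+010D produces the two bytes 0xC4 0x8D rather than a single truncated byte.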