Java Programming Glossary: surrogates
Comparing a char to a code-point? http://stackoverflow.com/questions/1029897/comparing-a-char-to-a-code-point pair, i.e. two chars, with the first being the high-surrogates code unit in the range \uD800-\uDBFF, the second being the low..
UTF-16 to ASCII conversion in Java http://stackoverflow.com/questions/1490218/utf-16-to-ascii-conversion-in-java 21-bit values ... 0x000000 to 0x10FFFF ... and uses surrogates to represent codes > 0x00FFFF. In other words, a Unicode codepoint..
How can I iterate through the unicode codepoints of a Java String? http://stackoverflow.com/questions/1527856/how-can-i-iterate-through-the-unicode-codepoints-of-a-java-string char at an index, testing whether the char is in the high-surrogates range; if so, use String#codePointAt(int) to get the codepoint.. sure whether codepoints which are naturally in the high-surrogates range will be stored as two char values or one; this seems like..
Java Can't Open a File with Surrogate Unicode Values in the Filename? http://stackoverflow.com/questions/1545625/java-cant-open-a-file-with-surrogate-unicode-values-in-the-filename and if a filename contains Unicode characters that require surrogates, the JVM can't seem to locate the file. For example, my test file.. \xB6 it outputs a UTF-8 encoded sequence for each of the surrogates: \xED \xA1 \x9B \xED \xBF \xB6. This isn't a valid UTF-8 sequence, but..
Howto unescape a Java string literal in Java http://stackoverflow.com/questions/3537706/howto-unescape-a-java-string-literal-in-java trailing c"); } cp = oldstr.codePointAt(i); /* don't need to grok surrogates, as next line blows them up */ if (cp > 0x7f) { die("expected ASCII after.. false; if (oldstr.charAt(i) == '{') { /* ^^^^^^ ok to ignore surrogates here */ i++; saw_brace = true; } int j; for (j = 0; j < 8; j++) { if (!saw_brace.. j++) { if (!saw_brace && j == 2) { break; /* for */ } /* ASCII test also catches surrogates */ int ch = oldstr.charAt(i + j); if (ch > 127) { die("illegal non-ASCII..
Unicode equivalents for \w and \b in Java regular expressions? http://stackoverflow.com/questions/4304928/unicode-equivalents-for-w-and-b-in-java-regular-expressions characters in logical code points, not in idiotic UTF-16 surrogates. It's hard to overstress how important that is. And that's just..
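
Several of the excerpts above rest on the same mechanics: a Java char is a UTF-16 code unit, and a code point above U+FFFF is stored as a high surrogate followed by a low surrogate. The following is a minimal sketch of iterating a String by code point rather than by char. The class name SurrogateDemo and the helper codePoints are hypothetical names chosen for illustration and do not come from the linked questions. The sketch relies only on String#codePointAt(int), String#codePointCount(int, int), Character.charCount(int), and Character.isHighSurrogate(char).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not taken from any of the linked questions.
class SurrogateDemo {

    /** Collects the code points of s, advancing by one or two chars per step. */
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);     // joins a valid surrogate pair into one code point
            result.add(cp);
            i += Character.charCount(cp);  // 2 for supplementary code points, otherwise 1
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D50A lies above U+FFFF, so UTF-16 needs a surrogate pair for it.
        String s = "a\uD835\uDD0Ab";

        System.out.println(s.length());                            // 4 chars (code units)
        System.out.println(s.codePointCount(0, s.length()));       // 3 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        for (int cp : codePoints(s)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Advancing the index by Character.charCount(cp) instead of by 1 is what keeps the loop from visiting the low surrogate as if it were a character of its own.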