
The ByteStringUtil.longestCommonPrefix(...) method doesn't work between a non-ASCII String and an internal CharSequence #165

Open
ate47 opened this issue Jul 19, 2022 · 0 comments


@ate47
Contributor

ate47 commented Jul 19, 2022

The ByteStringUtil.longestCommonPrefix(...) method doesn't work when one of its parameters is a String and the other is a CompactString or ReplazableString. In the internal strings (ReplazableString/CompactString), charAt(i) returns byte[i] of the UTF-8 encoding, whereas in a String it returns the character at index i. Non-ASCII characters take more than one byte in UTF-8, so as soon as they appear, the two kinds of index no longer refer to the same thing. For example (a shortened value of a Wikidata literal of Q101213907):

```java
String s1 = "\u00C2\u00A0normal";
CompactString s2 = new CompactString("\u00A0normal");
Assert.assertEquals(0, ByteStringUtil.longestCommonPrefix(s1, s2));
// java.lang.AssertionError:
// Expected :0
// Actual   :8
```

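The String value is
"\u00C2\u00A0" = char[] {0xC2, 0xA0}
and the internal value is
utf8("\u00A0") = byte[] {0xC2, 0xA0}
so a position-by-position comparison of charAt(i) sees the same values on both sides and never detects that the strings already differ at index 0.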

cf. UTF-8
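A small, self-contained check of the two encodings above, using only the JDK (the class and method names are made up for illustration): it prints the UTF-16 code units that String.charAt(i) sees next to the UTF-8 bytes that a byte-backed internal string stores.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Collision {
    // UTF-16 code units of a Java String, i.e. the values String.charAt(i) returns
    static int[] chars(String s) {
        return s.chars().toArray();
    }

    // Unsigned UTF-8 bytes, i.e. what a byte-backed internal string stores
    static int[] utf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF; // mask so 0xC2/0xA0 are not sign-extended
        }
        return out;
    }

    // Formats values as hex, e.g. {0xC2, 0xA0}
    static String hex(int[] values) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(String.format("0x%02X", values[i]));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // String side: U+00C2 followed by U+00A0 (no-break space)
        System.out.println("String chars: " + hex(chars("\u00C2\u00A0"))); // {0xC2, 0xA0}
        // Internal side: a single U+00A0, encoded as UTF-8
        System.out.println("UTF-8 bytes:  " + hex(utf8("\u00A0")));        // {0xC2, 0xA0}
    }
}
```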

Within the library, the method is only called with two internal strings, but because it is public, it would be better to fix it in case users of the library call it directly.
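One possible direction for a fix, sketched without the library: convert both arguments to UTF-8 bytes before comparing, so the result is a byte count on both paths, as it already is when both arguments are internal strings. This is an assumption about the intended semantics, not the project's code. `ByteCharSequence` is a hypothetical stand-in for CompactString/ReplazableString that mimics the charAt behaviour described above, and `naivePrefix` only models the reported behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PrefixSketch {
    // Hypothetical stand-in for CompactString/ReplazableString: stores UTF-8
    // bytes, and charAt(i) returns the i-th byte as described in the issue.
    static final class ByteCharSequence implements CharSequence {
        final byte[] data;
        ByteCharSequence(String s) { this.data = s.getBytes(StandardCharsets.UTF_8); }
        ByteCharSequence(byte[] data) { this.data = data; }
        public int length() { return data.length; }
        public char charAt(int i) { return (char) (data[i] & 0xFF); }
        public CharSequence subSequence(int start, int end) {
            return new ByteCharSequence(Arrays.copyOfRange(data, start, end));
        }
        @Override public String toString() { return new String(data, StandardCharsets.UTF_8); }
    }

    // Models the reported behaviour: compares charAt values directly,
    // mixing UTF-16 code units with UTF-8 bytes
    static int naivePrefix(CharSequence a, CharSequence b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    // Hypothetical helper: brings either representation to UTF-8 bytes
    static byte[] toUtf8(CharSequence s) {
        if (s instanceof ByteCharSequence) return ((ByteCharSequence) s).data;
        return s.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Sketch of a fix: compares UTF-8 bytes on both sides; result is a byte count
    static int utf8Prefix(CharSequence a, CharSequence b) {
        byte[] x = toUtf8(a);
        byte[] y = toUtf8(b);
        int n = Math.min(x.length, y.length);
        int i = 0;
        while (i < n && x[i] == y[i]) i++;
        return i;
    }

    public static void main(String[] args) {
        String s1 = "\u00C2\u00A0normal";
        CharSequence s2 = new ByteCharSequence("\u00A0normal");
        System.out.println(naivePrefix(s1, s2)); // 8, as in the failing assertion
        System.out.println(utf8Prefix(s1, s2));  // 0: U+00C2 encodes as C3 82, which differs from C2 at byte 0
    }
}
```

A byte count can end in the middle of a multi-byte character (U+00C2 and U+00C3 share the lead byte 0xC3), so a caller that needs a character boundary would have to back the result off to one.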
