encoder: Add transliteration processor & help JavaDoc link updates #5815

Open: wants to merge 1 commit into base: main
@kingthorin (Member) commented Oct 15, 2024

Overview

  • CHANGELOG.md > Add note.
  • EncodeDecodeProcessors > Register the new processor.
  • encoder.html > Add help content and behavior examples. (Update JavaDoc links to version 17 references.)
  • Messages.properties > Add key/value pair for the name.
  • Transliterate > New encoder.
  • TransliterateUnitTest > Unit Test for the new encoder.

Related Issues

n/a

Checklist

  • Update help
  • Update changelog
  • Run ./gradlew spotlessApply for code formatting
  • Write tests
  • Check code coverage
  • Sign-off commits
  • Squash commits
  • Use a descriptive title


Signed-off-by: kingthorin <kingthorin@users.noreply.github.com>
@kingthorin changed the title from "encoder: Add transliteration processor" to "encoder: Add transliteration processor & help JavaDoc link updates" on Oct 18, 2024
@kingthorin (Member Author)

Fixed

@Override
protected String processInternal(String value) throws IOException {
    // Normalize with compatibility decomposition (NFKD), then remove anything non-ASCII
    return Normalizer.normalize(value, Normalizer.Form.NFKD).replaceAll("[^\\p{ASCII}]", "");
Member


Could also consider using the Apache Commons Lang utility StringUtils#stripAccents (if it's already a dependency): https://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringUtils.html#stripAccents-java.lang.String-

Member Author


I did come across that when I was considering this, but I didn't see any benefit over the native implementation.
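
For readers following the thread, here is a minimal stand-alone sketch of what the native approach quoted above does to a few inputs. The class name TransliterateSketch and its static helper are hypothetical and exist only for illustration; only the Normalizer/replaceAll expression is taken from the diff, and the real encoder in this PR may differ in other respects.

```java
import java.text.Normalizer;

/** Hypothetical stand-alone sketch; not the Transliterate class from this PR. */
final class TransliterateSketch {

    // Same expression as the quoted diff line: apply NFKD compatibility
    // decomposition, then drop every character outside the ASCII range.
    static String transliterate(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFKD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // e with acute accent decomposes to "e" plus a combining mark, which is removed
        System.out.println(transliterate("caf\u00e9")); // prints "cafe"
        // the "fi" ligature has a compatibility decomposition to plain "fi"
        System.out.println(transliterate("\ufb01le")); // prints "file"
        // sharp s has no decomposition, so the whole character is dropped
        System.out.println(transliterate("Stra\u00dfe")); // prints "Strae"
    }
}
```

The last case is worth noting when comparing options: characters with no Unicode decomposition are removed outright rather than replaced with an ASCII look-alike.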
