encoder: Add transliteration processor & help JavaDoc link updates #5815

3 changes: 2 additions & 1 deletion addOns/encoder/CHANGELOG.md
@@ -5,7 +5,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased

### Added
- A predefined processor "Transliterate" which converts text by removing accents, diacritics, and ligatures, leaving only ASCII characters (the conversion may be incomplete because it relies on compatibility decomposition).

## [1.5.0] - 2024-05-07
### Added
@@ -55,6 +55,7 @@
import org.zaproxy.addon.encoder.processors.predefined.utility.LowerCase;
import org.zaproxy.addon.encoder.processors.predefined.utility.RemoveWhitespace;
import org.zaproxy.addon.encoder.processors.predefined.utility.Reverse;
import org.zaproxy.addon.encoder.processors.predefined.utility.Transliterate;
import org.zaproxy.addon.encoder.processors.predefined.utility.UpperCase;
import org.zaproxy.addon.encoder.processors.script.ScriptBasedEncodeDecodeProcessor;
import org.zaproxy.zap.extension.script.ScriptWrapper;
@@ -103,6 +104,7 @@ public class EncodeDecodeProcessors {
addPredefined("lowercase", LowerCase.getSingleton());
addPredefined("uppercase", UpperCase.getSingleton());
addPredefined("powershellencode", PowerShellEncoder.getSingleton());
addPredefined("transliterate", Transliterate.getSingleton());
}

private Map<String, EncodeDecodeProcessorItem> scriptProcessors = new HashMap<>();
@@ -0,0 +1,39 @@
/*
* Zed Attack Proxy (ZAP) and its related class files.
*
* ZAP is an HTTP/HTTPS proxy for assessing web application security.
*
* Copyright 2024 The ZAP Development Team
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.zaproxy.addon.encoder.processors.predefined.utility;

import java.io.IOException;
import java.text.Normalizer;
import org.zaproxy.addon.encoder.processors.predefined.DefaultEncodeDecodeProcessor;

public class Transliterate extends DefaultEncodeDecodeProcessor {

    private static final Transliterate INSTANCE = new Transliterate();

    @Override
    protected String processInternal(String value) throws IOException {
        // Normalize with compatibility decomposition (NFKD), then remove anything non-ASCII
        return Normalizer.normalize(value, Normalizer.Form.NFKD).replaceAll("[^\\p{ASCII}]", "");
Review comment (Member):
Could also consider using the Apache commons util StringUtils#stripAccents (if it's already a dependency): https://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringUtils.html#stripAccents-java.lang.String-

Reply (Member Author):
I did come across that when I was considering this but I didn't see any benefit vs the native impl.

    }

    public static Transliterate getSingleton() {
        return INSTANCE;
    }
}
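
To make the "perhaps not fully" caveat concrete, here is a minimal standalone sketch of the same two steps used in `processInternal` (NFKD normalization, then removal of non-ASCII characters). The class name `TransliterateSketch` and the sample inputs are illustrative assumptions and are not part of the PR; only `java.text.Normalizer` from the JDK is used.

```java
import java.text.Normalizer;

// Hypothetical standalone class, not part of the add-on.
public class TransliterateSketch {

    // Same two steps as processInternal in the diff: NFKD, then drop non-ASCII.
    static String transliterate(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFKD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // U+00E9 decomposes to 'e' plus a combining acute accent; the accent is then removed.
        System.out.println(transliterate("\u00e9trange"));
        // The ligature U+FB01 has a compatibility decomposition to the two letters "fi".
        System.out.println(transliterate("\ufb01n"));
        // U+00F8 (o with stroke) and U+00DF (sharp s) have no decomposition,
        // so they are removed outright instead of being mapped to ASCII letters.
        System.out.println(transliterate("S\u00f8ren Stra\u00dfe"));
    }
}
```

The last call shows the limitation: letters without a Unicode decomposition disappear from the output rather than being replaced by a look-alike ASCII letter.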
@@ -152,7 +152,7 @@ <H4>ASCII Hex Decode</H4>

<H4>Base 64 Decode</H4>
Will display the base 64 decoding of the text you enter.<br/>
- Leveraging a <a href="https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Base64.html#getMimeDecoder()">Mime decoder</a> to handle wrapped lines.
+ Leveraging a <a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Base64.html#getMimeDecoder()">Mime decoder</a> to handle wrapped lines.

<H4>Base 64 URL Decode</H4>
Will display the base 64 URL decoding of the text you enter. Base64URL is a modification to the primary base 64 standard
@@ -198,14 +198,23 @@ <H4>To Lower Case</H4>
Converts the input to all lower case characters.

<H4>Remove Whitespace</H4>
- Removes all whitespace characters from the text, based on <a href="https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Character.html#isWhitespace(char)">Character.isWhiteSpace(char)</a>.
+ Removes all whitespace characters from the text, based on <a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Character.html#isWhitespace(char)">Character.isWhiteSpace(char)</a>.

<H4>Reverse</H4>
Reverses the order of the input.

<H4>To Upper Case</H4>
Converts the input to all upper case characters.

<H4>Transliterate</H4>
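Converts text by removing accents, diacritics, and ligatures, leaving only ASCII characters. The conversion may be incomplete because it relies on compatibility decomposition: characters with no ASCII decomposition are removed rather than replaced.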
Ex: <code>Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ: fi. étrange.</code> becomes <code>This is a funky String: fi. etrange.</code>.<br>
See also:<br>
<ul>
<li><a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/text/Normalizer.html">Normalizer JavaDoc</a></li>
<li><a href="https://en.wikipedia.org/wiki/Transliteration">Wikipedia: Transliteration</a></li>
</ul>

<H3>Miscellaneous</H3>

<H4>PowerShell Encode</H4>
@@ -59,6 +59,7 @@ encoder.predefined.tab.encode = Encode
encoder.predefined.tab.hash = Hash
encoder.predefined.tab.illegalUTF8 = Illegal UTF8
encoder.predefined.tab.unicode = Unicode
encoder.predefined.transliterate = Transliterate (Strip accents, etc)
encoder.predefined.unicodedecode = Unicode Unescaped Text
encoder.predefined.unicodeencode = Unicode Escaped Text
encoder.predefined.uppercase = To Upper Case
@@ -0,0 +1,45 @@
/*
* Zed Attack Proxy (ZAP) and its related class files.
*
* ZAP is an HTTP/HTTPS proxy for assessing web application security.
*
* Copyright 2024 The ZAP Development Team
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.zaproxy.addon.encoder.processors.predefined.utility;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.is;

import org.junit.jupiter.api.Test;
import org.zaproxy.addon.encoder.processors.EncodeDecodeResult;
import org.zaproxy.addon.encoder.processors.predefined.ProcessorTests;

class TransliterateUnitTest extends ProcessorTests<Transliterate> {

    @Override
    protected Transliterate createProcessor() {
        return Transliterate.getSingleton();
    }

    @Test
    void shouldEncodeWithoutError() throws Exception {
        // Given / When
        EncodeDecodeResult result = processor.process("Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ: fi. étrange.");
        // Then
        assertThat(result.hasError(), is(equalTo(false)));
        assertThat(result.getResult(), is(equalTo("This is a funky String: fi. etrange.")));
    }
}
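
The changelog entry mentions compatibility mode. As a rough illustration of why the normalization form matters for ligatures, the sketch below runs the same ASCII-stripping step after canonical (NFD) and compatibility (NFKD) decomposition. The class name `NormalizationFormSketch`, the helper `stripNonAscii`, and the sample input are hypothetical and do not come from the PR.

```java
import java.text.Normalizer;

// Hypothetical standalone class, not part of the add-on or its tests.
public class NormalizationFormSketch {

    // Decompose with the given form, then drop everything outside ASCII.
    static String stripNonAscii(String value, Normalizer.Form form) {
        return Normalizer.normalize(value, form).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "fiance" written with the fi ligature (U+FB01) and an accented e (U+00E9).
        String input = "\ufb01anc\u00e9";
        // NFD leaves the ligature intact, so the stripping step removes it entirely.
        System.out.println(stripNonAscii(input, Normalizer.Form.NFD));
        // NFKD expands the ligature to "fi" first, so both letters survive.
        System.out.println(stripNonAscii(input, Normalizer.Form.NFKD));
    }
}
```

Under these assumptions, only the NFKD variant keeps the letters of the ligature, which is consistent with the processor's choice of `Normalizer.Form.NFKD`.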