Jdk19 regexp fix #10972

Open
wants to merge 11 commits into
base: master

arysin
Contributor

@arysin arysin commented Oct 29, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced regex patterns to support Unicode characters across various language modules, improving text processing accuracy.
  • Bug Fixes

    • Refined handling of contractions, apostrophes, and number representations in multiple languages, ensuring correct grammatical structures.
  • Documentation

    • Updated comments in code to clarify dependency management and compatibility checks.
  • Chores

    • Updated dependency version for net.loomchild.segment to improve project stability.
  • Refactor

    • Simplified testing structure in MainTest by removing inheritance from AbstractSecurityTestCase and adopting process execution for command-line tests.

Contributor

coderabbitai bot commented Oct 29, 2024

Walkthrough

This pull request introduces several modifications across various classes in the LanguageTool project, primarily focusing on enhancing regular expression (regex) handling by incorporating the Pattern.UNICODE_CHARACTER_CLASS flag. This addition improves the matching capabilities for Unicode characters in multiple language modules, including Catalan, Spanish, French, German, and Portuguese. Additionally, changes in XML rule handling and segmentation rules are made to refine text processing and improve accuracy. Minor adjustments to comments and formatting are also included without altering the core logic or functionality.
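As a rough illustration of what the flag changes, the sketch below compares `\w` with and without `Pattern.UNICODE_CHARACTER_CLASS`. The class and method names are hypothetical and not part of LanguageTool; only `java.util.regex` is assumed.

```java
import java.util.regex.Pattern;

public class UnicodeClassDemo {
  // Returns true if the whole input consists of \w characters under the given flags.
  static boolean isWord(String input, int flags) {
    return Pattern.compile("\\w+", flags).matcher(input).matches();
  }

  public static void main(String[] args) {
    String word = "per\u00f2"; // Catalan "però", with a precomposed ò
    // Without the flag, \w is limited to [a-zA-Z_0-9], so the accented letter fails.
    System.out.println(isWord(word, 0));
    // With the flag, \w follows the Unicode definition of a word character.
    System.out.println(isWord(word, Pattern.UNICODE_CHARACTER_CLASS));
  }
}
```

The same flag also widens `\d` and `\s`, which is why a change like this can alter the behavior of existing rules and deserves tests.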

Changes

| File | Change Summary |
| --- | --- |
| languagetool-core/src/main/java/org/languagetool/rules/AbstractUnitConversionRule.java | Added Pattern.UNICODE_CHARACTER_CLASS to regex patterns for unit conversions. Minor comments and formatting adjustments made. |
| languagetool-core/src/main/java/org/languagetool/rules/patterns/PatternRuleHandler.java | Updated regex compilation flags to include Pattern.UNICODE_CHARACTER_CLASS. No changes to method signatures or overall logic. |
| languagetool-core/src/main/java/org/languagetool/rules/patterns/RegexAntiPatternFilter.java | Modified acceptRuleMatch method to include Pattern.UNICODE_CHARACTER_CLASS in regex pattern compilation. Logic remains unchanged. |
| languagetool-core/src/main/java/org/languagetool/rules/patterns/XMLRuleHandler.java | Added imports and introduced a new member variable phraseMap. Updated methods to utilize phraseMap. Enhanced error handling in setExceptions. |
| languagetool-core/src/main/java/org/languagetool/tokenizers/SrxTools.java | Updated tokenize method to include an additional parameter for Pattern.UNICODE_CHARACTER_CLASS. |
| languagetool-core/src/main/resources/org/languagetool/resource/segment.srx | Adjusted regex patterns for Ukrainian and other languages, removing (?U) flag and refining segmentation rules. |
| languagetool-language-modules/ca/src/main/java/org/languagetool/language/Catalan.java | Added Pattern.UNICODE_CHARACTER_CLASS to multiple regex patterns for improved Unicode handling. Updated methods for handling contractions and apostrophes. |
| languagetool-language-modules/ca/src/main/java/org/languagetool/rules/ca/PronomsFeblesHelper.java | Added Pattern.UNICODE_CHARACTER_CLASS to pronoun_missing_apostrophation regex pattern in fixApostrophes method. |
| languagetool-language-modules/de/src/main/java/org/languagetool/language/German.java | Updated TYPOGRAPHY_PATTERN to include Pattern.UNICODE_CHARACTER_CLASS. |
| languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml | Modified regex patterns for whitespace handling in abbreviations. |
| languagetool-language-modules/es/src/main/java/org/languagetool/language/Spanish.java | Updated ES_CONTRACTIONS pattern to include Pattern.UNICODE_CHARACTER_CLASS. Refined filterRuleMatches method for better suggestion filtering. |
| languagetool-language-modules/es/src/main/java/org/languagetool/tokenizers/es/SpanishWordTokenizer.java | Updated ORDINAL_POINT pattern to use Pattern.UNICODE_CHARACTER_CLASS. |
| languagetool-language-modules/fr/src/main/java/org/languagetool/language/French.java | Added Pattern.UNICODE_CHARACTER_CLASS to apostrophe handling patterns. |
| languagetool-language-modules/fr/src/main/java/org/languagetool/rules/fr/MakeContractionsFilter.java | Updated contraction regex patterns to use Pattern.UNICODE_CHARACTER_CLASS. |
| languagetool-language-modules/fr/src/main/java/org/languagetool/tokenizers/fr/FrenchWordTokenizer.java | Replaced Pattern.UNICODE_CASE with Pattern.UNICODE_CHARACTER_CLASS in tokenization patterns. |
| languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/entities/hyphenised.ent | Modified regex pattern in hyphenised_expressions to include (?U) flag. |
| languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/pt.sor | Updated number representations and grammatical rules for Brazilian Portuguese, enhancing linguistic accuracy. |
| pom.xml | Updated net.loomchild.segment dependency version from 2.0.3 to 2.0.4. Added comments regarding dependency management. |

Possibly related PRs

Suggested reviewers

  • fabrichter

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 47c9805 and 3d94338.

📒 Files selected for processing (1)
  • languagetool-core/src/main/resources/org/languagetool/resource/segment.srx (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • languagetool-core/src/main/resources/org/languagetool/resource/segment.srx

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (9)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml (1)

Line range hint 79409-79541: Document whitespace handling strategy.

Given that these changes are part of a larger JDK19 regexp fix, consider adding a comment in the XML file explaining the rationale behind using [ \t]? instead of \s? for abbreviation patterns. This will help maintain consistency as new rules are added.

Add a comment at the beginning of the abbreviation rules section:

<!-- 
  Abbreviation rules use explicit space/tab matching ([ \t]?) instead of \s? 
  to ensure consistent behavior across different JDK versions and to prevent 
  matching other whitespace characters.
-->
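The trade-off between `\s?` and `[ \t]?` can be sketched in plain Java. The patterns below are simplified stand-ins for the abbreviation rules, not the actual expressions from grammar.xml, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class AbbreviationWhitespaceDemo {
  // Explicit space/tab only, as in the updated rules.
  static final Pattern EXPLICIT = Pattern.compile("d\\.[ \\t]?h\\.");
  // \s without the flag: space, tab, newline, vertical tab, form feed, carriage return.
  static final Pattern ASCII_S = Pattern.compile("d\\.\\s?h\\.");
  // \s with the flag: any character with the Unicode White_Space property.
  static final Pattern UNICODE_S =
      Pattern.compile("d\\.\\s?h\\.", Pattern.UNICODE_CHARACTER_CLASS);

  static boolean matches(Pattern p, String s) {
    return p.matcher(s).matches();
  }

  public static void main(String[] args) {
    String[] inputs = {"d. h.", "d.\nh.", "d.\u00A0h."};
    for (String in : inputs) {
      System.out.println(matches(EXPLICIT, in) + " "
          + matches(ASCII_S, in) + " " + matches(UNICODE_S, in));
    }
  }
}
```

In this sketch the explicit class rejects both the newline and the no-break space (U+00A0), while `\s` accepts the newline in either mode and the no-break space only when the flag is set. Whether no-break spaces need to be matched by these rules is a separate question for the rule authors.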
languagetool-core/src/main/java/org/languagetool/rules/patterns/RegexAntiPatternFilter.java (1)

45-45: Consider adding Unicode test cases.

Since this change affects Unicode character handling, it would be beneficial to add test cases that specifically verify the behavior with non-ASCII text in antipatterns.

Would you like me to help create test cases that cover Unicode scenarios for the RegexAntiPatternFilter?

languagetool-language-modules/es/src/main/java/org/languagetool/tokenizers/es/SpanishWordTokenizer.java (1)

46-46: Consider adding test cases for non-ASCII digits.

To ensure the new Unicode digit matching behavior works correctly, consider adding test cases that include ordinal numbers with non-ASCII digits (e.g., Eastern Arabic numerals, Devanagari digits).

Example test cases to consider:

assertEquals(Arrays.asList("٢", "º"), tokenizer.tokenize("٢º"));  // Eastern Arabic
assertEquals(Arrays.asList("२", "º"), tokenizer.tokenize("२º"));  // Devanagari
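Independently of the tokenizer itself, the effect of the flag on `\d` can be checked with a small standalone sketch. The class below is hypothetical and does not reproduce the ORDINAL_POINT pattern, which is not shown in this review.

```java
import java.util.regex.Pattern;

public class UnicodeDigitDemo {
  // Returns true if the input is exactly one \d character under the given flags.
  static boolean isDigit(String input, int flags) {
    return Pattern.compile("\\d", flags).matcher(input).matches();
  }

  public static void main(String[] args) {
    String[] digits = {"2", "\u0662", "\u0968"}; // ASCII, Arabic-Indic, Devanagari "two"
    for (String d : digits) {
      System.out.println(isDigit(d, 0) + " "
          + isDigit(d, Pattern.UNICODE_CHARACTER_CLASS));
    }
  }
}
```

Without the flag, `\d` is equivalent to `[0-9]`; with it, `\d` accepts any character in the Unicode decimal digit category, so only the ASCII digit matches in both modes.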
languagetool-language-modules/fr/src/main/java/org/languagetool/tokenizers/fr/FrenchWordTokenizer.java (1)

77-77: Consider updating related patterns for consistency.

While the change to Pattern.UNICODE_CHARACTER_CLASS is correct, there are similar patterns in the file that would benefit from the same update:

  • SPACE_DIGITS0 pattern (line 73)
  • DECIMAL_POINT pattern (line 64)
  • DECIMAL_COMMA pattern (line 66)

These patterns also deal with digit matching and should be updated for consistency.

Apply this update to related patterns:

  private static final Pattern SPACE_DIGITS0 = Pattern.compile("([\\d]{4}) ",
-     Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
+     Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
  private static final Pattern DECIMAL_POINT = Pattern.compile("([\\d])\\.([\\d])",
-     Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
+     Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
  private static final Pattern DECIMAL_COMMA = Pattern.compile("([\\d]),([\\d])",
-     Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
+     Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
languagetool-language-modules/ca/src/main/java/org/languagetool/language/Catalan.java (2)

348-348: Consider adding a comment explaining the diacritics rule.

The pattern correctly handles old diacritics with proper Unicode support. Consider adding a brief comment explaining which version of Catalan orthography these diacritics correspond to.


422-442: LGTM! Consider grouping related patterns.

The patterns correctly handle various cases of contractions and apostrophes with proper Unicode support. Consider grouping related patterns (e.g., all apostrophe patterns) into separate constant groups with explanatory comments for better maintainability.

Example organization:

// Group 1: Basic contractions
private static final Pattern CA_CONTRACTIONS = ...

// Group 2: Apostrophe patterns
private static final Pattern CA_APOSTROPHES1 = ...
private static final Pattern CA_APOSTROPHES2 = ...
// ... more apostrophe patterns

// Group 3: Possessive patterns
private static final Pattern POSSESSIUS_v = ...
private static final Pattern POSSESSIUS_V = ...
languagetool-core/src/main/java/org/languagetool/rules/AbstractUnitConversionRule.java (1)

199-199: LGTM! Consider extracting the pattern for better readability.

The addition of Pattern.UNICODE_CHARACTER_CLASS flag improves Unicode word boundary handling. However, the pattern string is quite complex.

Consider extracting the pattern string to a constant for better readability:

+  private static final String UNIT_PATTERN_TEMPLATE = 
+    NUMBER_REGEX_WITH_BOUNDARY + "[\\s\u00A0]{0," + WHITESPACE_LIMIT + "}%s\\b";

-    unitPatterns.put(Pattern.compile(NUMBER_REGEX_WITH_BOUNDARY + "[\\s\u00A0]{0," + WHITESPACE_LIMIT + "}" + pattern + "\\b", Pattern.UNICODE_CHARACTER_CLASS), unit);
+    unitPatterns.put(Pattern.compile(String.format(UNIT_PATTERN_TEMPLATE, pattern), Pattern.UNICODE_CHARACTER_CLASS), unit);
languagetool-core/src/main/java/org/languagetool/rules/patterns/XMLRuleHandler.java (2)

Line range hint 392-392: Initialize phraseMap in the constructor to prevent NPE.

The phraseMap field is lazily initialized in finalizePhrase(), but it would be safer to initialize it in the constructor to prevent potential NPEs if finalizePhrase() is not called first.

Apply this change to the constructor:

 public XMLRuleHandler() {
+  this.phraseMap = new HashMap<>();
 }
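A minimal sketch of the eager-initialization idea, using a hypothetical stand-alone class rather than the real XMLRuleHandler (plain strings stand in for PatternToken, and all method names are assumptions):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseRegistrySketch {
  // Initialized at construction time, so lookups never see a null map.
  private final Map<String, List<List<String>>> phraseMap = new HashMap<>();

  // Registers one alternative token sequence for a phrase id.
  void addPhrase(String phraseId, List<String> tokens) {
    phraseMap.computeIfAbsent(phraseId, k -> new ArrayList<>())
        .add(new ArrayList<>(tokens));
  }

  // Returns the registered alternatives, or an empty list for unknown ids.
  List<List<String>> getPhrase(String phraseId) {
    return phraseMap.getOrDefault(phraseId, Collections.emptyList());
  }
}
```

With the map created up front, a lookup before any phrase has been added returns an empty list instead of throwing a NullPointerException.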

Line range hint 392-392: Add documentation for the phraseMap data structure.

The phraseMap field and its usage in preparePhrase and finalizePhrase methods would benefit from detailed documentation explaining:

  • The structure and purpose of the nested collections
  • The lifecycle of phrase handling
  • Example usage scenarios

Add Javadoc to the field:

/**
 * Stores phrases by their IDs. The structure is:
 * - Key: phraseId (String)
 * - Value: List of alternative pattern token sequences for the phrase
 *   where each sequence is a List<PatternToken>
 */
protected Map<String, List<List<PatternToken>>> phraseMap;

Also applies to: 486-509

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 7423a0e and bd299bd.

📒 Files selected for processing (18)
  • languagetool-core/src/main/java/org/languagetool/rules/AbstractUnitConversionRule.java (2 hunks)
  • languagetool-core/src/main/java/org/languagetool/rules/patterns/PatternRuleHandler.java (1 hunks)
  • languagetool-core/src/main/java/org/languagetool/rules/patterns/RegexAntiPatternFilter.java (1 hunks)
  • languagetool-core/src/main/java/org/languagetool/rules/patterns/XMLRuleHandler.java (1 hunks)
  • languagetool-core/src/main/java/org/languagetool/tokenizers/SrxTools.java (2 hunks)
  • languagetool-core/src/main/resources/org/languagetool/resource/segment.srx (10 hunks)
  • languagetool-language-modules/ca/src/main/java/org/languagetool/language/Catalan.java (3 hunks)
  • languagetool-language-modules/ca/src/main/java/org/languagetool/rules/ca/PronomsFeblesHelper.java (1 hunks)
  • languagetool-language-modules/de/src/main/java/org/languagetool/language/German.java (1 hunks)
  • languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml (9 hunks)
  • languagetool-language-modules/es/src/main/java/org/languagetool/language/Spanish.java (1 hunks)
  • languagetool-language-modules/es/src/main/java/org/languagetool/tokenizers/es/SpanishWordTokenizer.java (1 hunks)
  • languagetool-language-modules/fr/src/main/java/org/languagetool/language/French.java (1 hunks)
  • languagetool-language-modules/fr/src/main/java/org/languagetool/rules/fr/MakeContractionsFilter.java (1 hunks)
  • languagetool-language-modules/fr/src/main/java/org/languagetool/tokenizers/fr/FrenchWordTokenizer.java (1 hunks)
  • languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/entities/hyphenised.ent (1 hunks)
  • languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/pt.sor (1 hunks)
  • pom.xml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • languagetool-language-modules/ca/src/main/java/org/languagetool/rules/ca/PronomsFeblesHelper.java
🔇 Additional comments (44)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/grammar.xml (2)

79523-79523: LGTM! Academic title patterns are well structured.

The changes to academic title patterns are consistent and well-documented with appropriate examples. The whitespace handling aligns with the intended behavior for formal title abbreviations.

Also applies to: 79531-79531, 79541-79541


79409-79409: Consider implications of restrictive whitespace matching.

The change from \s? to [ \t]? makes the whitespace matching more explicit but potentially less flexible. While this might be intentional for abbreviation handling, consider that it will no longer match other valid Unicode whitespace characters.

Let's check if there are any instances in the codebase using other whitespace characters with these abbreviations:

Also applies to: 79418-79418, 79426-79426

✅ Verification successful

Let me try a different approach to search for these specific abbreviations:

And let's also check if there are any test cases or documentation specifically addressing whitespace handling:


The whitespace pattern change is safe and consistent with usage patterns

Based on the search results, the change from \s? to [ \t]? is actually appropriate for these German abbreviations:

  1. All examples in the grammar rules and test cases show these abbreviations (s.o., d.h., m.E.) are used either:
    • Without any space (d.h., m.E.)
    • With a regular space (d. h.)
    • With a non-breaking space entity (d.&nbsp;h.)

The search results show no instances where these abbreviations are used with other types of whitespace characters (like newlines or form feeds), confirming that the more restrictive pattern [ \t]? correctly covers all legitimate use cases in German text.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for abbreviations with Unicode whitespace characters
rg -U "([sd]|[mM])\.\X{1}([ouhE])\." --multiline

Length of output: 158


Script:

#!/bin/bash
# Search for abbreviations with any kind of whitespace characters
rg -U "([sd]|[mM])\.[[:space:]]*([ouhE])\." --multiline languagetool-language-modules/de/

# Also check test files and example texts for real-world usage
rg -U "([sd]|[mM])\.[[:space:]]*([ouhE])\." --type txt --type xml languagetool-language-modules/de/src/test/

Length of output: 2286


Script:

#!/bin/bash
# Search for specific abbreviations with surrounding context
rg -B1 -A1 "(s\.o\.|d\.h\.|m\.E\.)" languagetool-language-modules/de/

Length of output: 2026


Script:

#!/bin/bash
# Look for test cases or documentation about whitespace handling
rg -i "whitespace|[[:space:]]|\bs\b" languagetool-language-modules/de/src/test/ --type xml

Length of output: 92

languagetool-core/src/main/java/org/languagetool/rules/patterns/RegexAntiPatternFilter.java (1)

45-45: LGTM! Verify existing antipatterns.

The addition of Pattern.UNICODE_CHARACTER_CLASS flag is appropriate and aligns with the JDK 19 Unicode improvements across the codebase. However, this change could affect how character classes behave with Unicode text.

Let's verify the existing antipatterns in the codebase:

languagetool-core/src/main/java/org/languagetool/tokenizers/SrxTools.java (2)

34-34: LGTM: Import addition is appropriate.

The Pattern import is necessary for using the UNICODE_CHARACTER_CLASS flag and follows Java import conventions.


63-64: Implementation improves Unicode handling but verify existing behavior.

The addition of Pattern.UNICODE_CHARACTER_CLASS will enhance Unicode character handling in tokenization, which aligns with the JDK19 regexp improvements. However, this change might subtly affect existing tokenization behavior.

Let's verify the impact:

✅ Verification successful

Let me gather more information about the usage and potential impact by checking the SRXSentenceTokenizer implementation.


Let me check how the tokenize method is used in SRXSentenceTokenizer to understand the impact of the change.


Let me check the SRXSentenceTokenizerTest to see if there are existing Unicode-related tests and how they might be affected.


Based on the code analysis, I can now provide a final response:

Unicode handling improvement is safe and beneficial

The addition of Pattern.UNICODE_CHARACTER_CLASS flag in SrxTools.tokenize() is a safe enhancement that improves Unicode character handling in text tokenization across all languages. The change:

  1. Is applied consistently through the central tokenization method used by all SRXSentenceTokenizer implementations
  2. Only affects the pattern matching behavior to better handle Unicode, without changing the tokenization logic
  3. Does not modify any method signatures or public APIs
  4. Is backward compatible as it only enhances the Unicode support without breaking existing functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any test cases or usages that might be affected by the tokenization change

# Check for test files that use SrxTools.tokenize
echo "Checking for test files using SrxTools.tokenize:"
rg -l "SrxTools\.tokenize" --type java

# Look for direct usages of the tokenize method
echo "Checking for direct usages of tokenize method:"
ast-grep --pattern 'tokenize($text, $doc, $code)'

Length of output: 388


Script:

#!/bin/bash
# Check the implementation in SRXSentenceTokenizer to understand how SrxTools.tokenize is used
echo "Checking SRXSentenceTokenizer implementation:"
rg -A 5 -B 5 "SrxTools\.tokenize" languagetool-core/src/main/java/org/languagetool/tokenizers/SRXSentenceTokenizer.java

# Check for tests of SRXSentenceTokenizer
echo -e "\nChecking for SRXSentenceTokenizer tests:"
fd -e java -x rg -l "SRXSentenceTokenizer" {} \;

# Look for any existing Unicode-related tests
echo -e "\nChecking for Unicode-related tests:"
rg -l "unicode|Unicode" $(fd -e java -t f)

Length of output: 201637


Script:

#!/bin/bash
# Check SRXSentenceTokenizer implementation to understand how SrxTools.tokenize is used
echo "Checking SRXSentenceTokenizer implementation:"
rg -A 10 -B 10 "SRXSentenceTokenizer" languagetool-core/src/main/java/org/languagetool/tokenizers/SRXSentenceTokenizer.java

# Check for any existing tests that verify Unicode handling
echo -e "\nChecking for Unicode-related tests in tokenizers:"
rg -l "Unicode|unicode.*test" $(fd -e java -t f | grep "test.*Tokenizer")

Length of output: 6491


Script:

#!/bin/bash
# Check the core SRXSentenceTokenizerTest implementation
echo "Checking SRXSentenceTokenizerTest implementation:"
rg -A 5 -B 5 "Unicode|unicode" languagetool-standalone/src/test/java/org/languagetool/tokenizers/SRXSentenceTokenizerTest.java

# Check SrxTools implementation to understand the Unicode flag usage
echo -e "\nChecking SrxTools implementation around tokenize method:"
rg -A 10 -B 10 "tokenize\(" languagetool-core/src/main/java/org/languagetool/tokenizers/SrxTools.java

Length of output: 1300

languagetool-language-modules/es/src/main/java/org/languagetool/tokenizers/es/SpanishWordTokenizer.java (1)

46-46: Verify the necessity of UNICODE_CHARACTER_CLASS flag.

The change from UNICODE_CASE to UNICODE_CHARACTER_CLASS looks correct, but let's verify if Spanish texts actually contain non-ASCII digits that would benefit from this change.

languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/pt.sor (1)

181-181: LGTM! The Unicode flag addition improves text processing.

The addition of (?U) flag enables UNICODE_CHARACTER_CLASS mode, which ensures proper handling of word boundaries with Unicode characters in Portuguese text.
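The behavior of the embedded flag in `java.util.regex` can be sketched as follows. The class name and sample strings are hypothetical, and the patterns are not taken from the Portuguese resources.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmbeddedUnicodeFlagDemo {
  // True if the regex matches the entire input.
  static boolean wholeMatch(String regex, String input) {
    return Pattern.compile(regex).matcher(input).matches();
  }

  // Returns the first match of the regex in the input, or null if there is none.
  static String firstMatch(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String word = "a\u00e7\u00e3o"; // Portuguese "ação"
    System.out.println(wholeMatch("\\w+", word));     // ASCII-only \w
    System.out.println(wholeMatch("(?U)\\w+", word)); // Unicode-aware \w
    // Quantifiers stay greedy under (?U) in Java.
    System.out.println(firstMatch("(?U)a+", "aaa"));
  }
}
```

In Java, `(?U)` is the inline form of `Pattern.UNICODE_CHARACTER_CLASS` and does not affect greediness, unlike the flag with the same letter in PCRE-style engines.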

Let's verify if similar patterns in other language modules need this improvement:

languagetool-language-modules/fr/src/main/java/org/languagetool/tokenizers/fr/FrenchWordTokenizer.java (2)

Line range hint 41-57: Consider reviewing other regex patterns in the file.

Since this is part of a JDK 19 regexp fix, consider reviewing other patterns in the file that use Pattern.UNICODE_CASE. For example:

  • TYPEWRITER_APOSTROPHE
  • TYPOGRAPHIC_APOSTROPHE
  • NEARBY_HYPHENS
  • HYPHENS

These patterns use character classes (\p{L}) and might benefit from Pattern.UNICODE_CHARACTER_CLASS for consistent Unicode handling.

Let's check if there are similar patterns in other language modules:

#!/bin/bash
# Description: Find similar patterns in other language modules that might need updating
# Look for patterns using UNICODE_CASE with \p{L} or \d

rg --type java "Pattern\.compile.*\\\\p\{L\}.*UNICODE_CASE" ./languagetool-language-modules/
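One point worth checking before touching those patterns: Unicode property classes such as `\p{L}` are Unicode-aware in `java.util.regex` regardless of flags, whereas `UNICODE_CHARACTER_CLASS` changes predefined classes such as `\w` and `\d`. A small sketch with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class PropertyClassDemo {
  // True if the regex, compiled with the given flags, matches the entire input.
  static boolean wholeMatch(String regex, int flags, String input) {
    return Pattern.compile(regex, flags).matcher(input).matches();
  }

  public static void main(String[] args) {
    String word = "\u00e9t\u00e9"; // French "été"
    // \p{L} matches accented letters even with no flags set.
    System.out.println(wholeMatch("\\p{L}+", 0, word));
    // \w needs the flag to do the same.
    System.out.println(wholeMatch("\\w+", 0, word));
    System.out.println(wholeMatch("\\w+", Pattern.UNICODE_CHARACTER_CLASS, word));
  }
}
```

Patterns that rely only on `\p{L}` and literal characters may therefore be unaffected by the flag; those that also use `\b`, `\w`, `\d`, or `\s` are the ones to review.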

75-75: LGTM! The Unicode flag change improves digit matching.

The switch to Pattern.UNICODE_CHARACTER_CLASS is appropriate here as it ensures proper matching of all Unicode digits, not just ASCII ones. This is particularly relevant for French text that might contain non-ASCII digits.

Let's verify the pattern behavior with non-ASCII digits:

✅ Verification successful

The Unicode flag change is appropriate and consistent with the codebase

Looking at the complete implementation, the change from UNICODE_CASE to UNICODE_CHARACTER_CLASS in the SPACE_DIGITS pattern is correct because:

  1. The pattern only matches digits and spaces, making UNICODE_CASE unnecessary since there's no case-sensitivity involved
  2. UNICODE_CHARACTER_CLASS is more appropriate as it ensures \d matches all Unicode digits, which is important for proper number formatting in French texts
  3. The change is consistent with the pattern's usage in the tokenizer, where it's specifically used for handling number formatting with spaces (e.g., "1 234")
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for non-ASCII digit usage in French texts
# Look for Unicode digits (like ٠١٢٣٤٥٦٧٨٩) in French files

rg --type java --type xml "[\u0660-\u0669]|[\u06F0-\u06F9]" ./languagetool-language-modules/fr/

Length of output: 834


Script:

#!/bin/bash
# Let's check the actual pattern usage in FrenchWordTokenizer.java
cat ./languagetool-language-modules/fr/src/main/java/org/languagetool/tokenizers/fr/FrenchWordTokenizer.java

Length of output: 11034

languagetool-language-modules/es/src/main/java/org/languagetool/language/Spanish.java (1)

318-318: LGTM! Good improvement for Unicode handling.

The addition of the Pattern.UNICODE_CHARACTER_CLASS flag to the ES_CONTRACTIONS pattern is a valuable enhancement. This flag makes word boundaries (\b) Unicode-aware, which is particularly important for Spanish text containing accented characters. It matters on JDK 19 and later, where \b without the flag considers only ASCII word characters, consistent with \w. With the flag set, Spanish contractions (e.g., "a el" → "al", "de el" → "del") are detected only at real word boundaries.
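A minimal sketch of why Unicode-aware boundaries matter for contractions. The pattern below is a simplified, hypothetical stand-in and not the real ES_CONTRACTIONS expression.

```java
import java.util.regex.Pattern;

public class ContractionBoundaryDemo {
  // Hypothetical contraction pattern: "a el" -> "al", "de el" -> "del".
  static final Pattern CONTRACTION = Pattern.compile("\\b(a|de) el\\b",
      Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);

  // Replaces each match with its first group followed by "l".
  static String contract(String text) {
    return CONTRACTION.matcher(text).replaceAll("$1l");
  }

  public static void main(String[] args) {
    System.out.println(contract("Voy a el parque"));
    System.out.println(contract("Vengo de el cine"));
    // "ñ" counts as a word character with the flag, so there is no
    // boundary inside "España" and the sentence is left unchanged.
    System.out.println(contract("Visit\u00e9 Espa\u00f1a el lunes"));
  }
}
```

With an ASCII-only `\b`, the "ñ" in "España" would be treated as a non-word character, creating a boundary before the final "a" and a false contraction; the flag avoids that.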

languagetool-language-modules/ca/src/main/java/org/languagetool/language/Catalan.java (1)

48-50: LGTM! Proper Unicode handling for apostrophe patterns.

The addition of Pattern.UNICODE_CHARACTER_CLASS flag ensures correct handling of Unicode characters in Catalan text, particularly for apostrophe-related patterns.

languagetool-language-modules/fr/src/main/java/org/languagetool/language/French.java (1)

53-55: LGTM! The addition of Pattern.UNICODE_CHARACTER_CLASS improves Unicode handling.

The addition of Pattern.UNICODE_CHARACTER_CLASS flag to the apostrophe patterns is a good enhancement that ensures proper handling of Unicode apostrophe characters in French text processing. This change aligns with similar improvements across other language modules and makes the regex patterns more robust.

languagetool-core/src/main/java/org/languagetool/rules/AbstractUnitConversionRule.java (1)

83-83: LGTM! Good Unicode support enhancement.

Adding Pattern.UNICODE_CHARACTER_CLASS flag improves the handling of Unicode word boundaries in number range detection.

languagetool-core/src/main/java/org/languagetool/rules/patterns/XMLRuleHandler.java (2)

21-27: LGTM: Import statements are well-organized.

The new imports are properly organized and necessary for the collection classes and function interface used in the implementation.


Line range hint 486-509: Consider performance impact of deep copying pattern tokens.

The finalizePhrase method creates multiple copies of pattern tokens. For rules with many phrases, this could impact memory usage and performance. Consider:

  1. Adding a comment explaining why deep copying is necessary
  2. Measuring the performance impact with large rule sets
languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/entities/hyphenised.ent (1)

2-2: Verify the impact of adding the (?U) flag.

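In java.util.regex, the embedded (?U) flag enables UNICODE_CHARACTER_CLASS, so predefined and POSIX character classes such as \w, \d, and \s follow their Unicode definitions. It does not make quantifiers ungreedy, which is what the same flag letter means in PCRE-style engines. Because the flag can change what the existing expression matches, please ensure this change has been thoroughly tested with various inputs.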

languagetool-language-modules/de/src/main/java/org/languagetool/language/German.java (1)

58-58: LGTM! Consider verifying Unicode character handling.

The addition of Pattern.UNICODE_CHARACTER_CLASS flag improves the pattern's ability to handle Unicode characters in German text, which is particularly important for typography-related transformations.

Let's verify the pattern's behavior with Unicode characters:

✅ Verification successful

Pattern change is appropriate and consistent with codebase practices

The addition of Pattern.UNICODE_CHARACTER_CLASS flag to TYPOGRAPHY_PATTERN in German.java aligns with established practices across the codebase:

  • Other language modules (Ukrainian, Spanish, French, Catalan) consistently use UNICODE_CHARACTER_CLASS flag for patterns handling language-specific text
  • The German pattern matches letter sequences ([a-zA-Z]), making Unicode support particularly relevant for handling German-specific characters
  • Similar typography-related patterns in other modules (e.g., French) demonstrate careful handling of Unicode characters
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for similar typography patterns in other language modules
# to ensure consistent Unicode handling across the codebase.

# Test: Search for typography patterns in other language modules
rg -l 'Pattern\.UNICODE_CHARACTER_CLASS.*TYPOGRAPHY_PATTERN' 'languagetool-language-modules/'

Length of output: 93


Script:

#!/bin/bash
# Let's try a broader search to understand typography pattern usage across language modules
# and check if other modules have similar Unicode character handling

# Search for TYPOGRAPHY_PATTERN declarations
rg -A 2 'TYPOGRAPHY_PATTERN.*=' languagetool-language-modules/

# Search for toAdvancedTypography method implementations to understand pattern usage
ast-grep --pattern 'toAdvancedTypography($$$) { $$$ }'

# Search for other patterns using UNICODE_CHARACTER_CLASS flag
rg 'Pattern\.UNICODE_CHARACTER_CLASS' languagetool-language-modules/

Length of output: 10039

languagetool-core/src/main/java/org/languagetool/rules/patterns/PatternRuleHandler.java (1)

779-780: LGTM! Improved Unicode support for regex patterns.

The addition of Pattern.UNICODE_CHARACTER_CLASS flag enhances the regex pattern matching by making character classes like \w, \d, \s match Unicode characters. This is a good improvement that will help better handle text in various languages.
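A small sketch of what the flag changes for the predefined classes mentioned above. The class and helper names are hypothetical, and the sample characters are chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of LanguageTool.
class UnicodeCharacterClassDemo {

    // True if the whole input matches regex when compiled with the given flags.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;
        String word = "\u0457\u0436\u0430"; // Cyrillic word "їжа"
        String digit = "\u0663";            // ARABIC-INDIC DIGIT THREE
        String nbsp = "\u00a0";             // NO-BREAK SPACE

        // Default: \w, \d and \s are ASCII-only, so none of these match.
        System.out.println(fullMatch("\\w+", 0, word));
        System.out.println(fullMatch("\\d", 0, digit));
        System.out.println(fullMatch("\\s", 0, nbsp));

        // With UNICODE_CHARACTER_CLASS the same classes follow Unicode properties.
        System.out.println(fullMatch("\\w+", unicode, word));
        System.out.println(fullMatch("\\d", unicode, digit));
        System.out.println(fullMatch("\\s", unicode, nbsp));
    }
}
```

The `\s` case is worth keeping in mind for rules that previously relied on `\s` not matching a non-breaking space.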

pom.xml (2)

194-194: Verify compatibility with the new segment library version.

The update of net.loomchild.segment from 2.0.3 to 2.0.4 aligns with the PR's focus on improving regex handling. However, we should verify that this update doesn't introduce any breaking changes.

Let's check for any breaking changes or compatibility issues:

✅ Verification successful

Let me gather more information about the specific usage of the segment library to verify if there are any potential breaking changes.


Version update from 2.0.3 to 2.0.4 appears safe to proceed

The code inspection shows that the segment library is used in a straightforward manner for text segmentation with SRX rules:

  • Usage is limited to core tokenization functionality in SrxTools and SRXSentenceTokenizer
  • The API usage is basic and stable (TextIterator, SrxDocument, SrxParser)
  • The update aligns with the PR's focus on regex improvements, particularly with the Pattern.UNICODE_CHARACTER_CLASS flag usage
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential breaking changes in segment library usage

# Test: Search for segment library usage patterns
rg -A 5 "net\.loomchild\.segment"

# Test: Search for any existing issues or workarounds related to the segment library
rg -l "TODO.*segment|FIXME.*segment|XXX.*segment|HACK.*segment|BUG.*segment"

Length of output: 2813


Script:

#!/bin/bash
# Check the actual usage of segment library classes in the codebase
ast-grep --pattern 'class $_ {
  $$$
  TextIterator $$$
  $$$
}'

# Check for any test files related to segmentation
fd -e java -e xml test.*segment

# Look at the implementation details in SrxTools.java
cat languagetool-core/src/main/java/org/languagetool/tokenizers/SrxTools.java

# Look at the implementation details in SRXSentenceTokenizer.java
cat languagetool-core/src/main/java/org/languagetool/tokenizers/SRXSentenceTokenizer.java

Length of output: 5437


194-194: Address dependency-related TODO comments.

Several TODO comments in the dependency block indicate potential issues that need attention:

  1. Compile errors in wikipedia, dev, and rpm modules need to be checked
  2. Missing dependencies after update:
    • jopt-simple.jar
    • commons-jxpath.jar
    • log4j.jar
  3. Compatibility checks needed for various dependencies

Please verify these issues and ensure all dependencies are properly resolved.

Let's check for the mentioned dependencies and potential issues:

languagetool-core/src/main/resources/org/languagetool/resource/segment.srx (24)

5747-5747: Regex pattern correctly handles prepositions followed by ellipsis

The regex accurately matches Ukrainian prepositions followed by an ellipsis, ensuring proper sentence segmentation.


5760-5760: Regex for matching numbered points is correctly defined

The pattern correctly identifies numbered list items, aiding in appropriate segmentation.


5765-5765: Regex efficiently matches lowercase words ending with punctuation

This regex effectively captures words ending with punctuation marks, which is important for accurate sentence detection.


5774-5774: Regex correctly matches one or two-letter abbreviations

The pattern accurately identifies abbreviations consisting of one or two letters followed by a period.


5779-5779: Regex for uppercase abbreviations with optional non-breaking space is appropriate

The regex effectively matches uppercase single-letter abbreviations, optionally preceded by a non-breaking space.


5784-5784: Regex pattern for complex abbreviation contexts is acceptable

The pattern accurately captures complex abbreviation scenarios, enhancing the segmentation rules for Ukrainian text.


5808-5808: Regex correctly matches years followed by 'р.' abbreviation

This regex effectively identifies years followed by the Ukrainian abbreviation for 'year', ensuring accurate processing of dates.


5813-5813: Negative lookbehind correctly avoids matching 'р.' after digits

The negative lookbehind ensures that 'р.' is not matched when preceded by digits, preventing incorrect segmentation.


5828-5828: Regex correctly matches year ranges with 'рр.' abbreviation

The pattern effectively captures ranges of years followed by 'рр.', the Ukrainian plural abbreviation for 'years'.


5833-5833: Regex matches common Ukrainian financial abbreviations appropriately

This regex accurately identifies financial abbreviations such as 'тис.', 'млн.', 'млрд.', and 'грн.'.


5838-5838: Regex correctly matches language abbreviations

The pattern effectively captures abbreviations for various languages, enhancing linguistic processing.


5846-5846: Regex correctly matches abbreviation 'кін.'

The pattern accurately matches the abbreviation for 'кін.', enhancing abbreviation detection.


5850-5850: Regex correctly matches abbreviation 'ст.'

This regex effectively identifies the abbreviation 'ст.', commonly used for 'сторінка' (page).


5859-5859: Regex correctly matches abbreviation 'нар.'

The pattern successfully matches 'нар.', the abbreviation for 'народження' (birth).


5863-5863: Regex correctly matches abbreviation 'дол.'

This regex accurately identifies 'дол.', the abbreviation for 'долар' (dollar).


5868-5868: Negative lookbehind correctly excludes 'т.' prefix

The negative lookbehind ensures that abbreviations 'п.' or 'д.' are not matched when preceded by 'т.', preventing incorrect segmentation in specific contexts.


5873-5873: Regex correctly matches abbreviation 'див.'

The pattern effectively captures 'див.', the abbreviation for 'дивись' (see), commonly used in references.


5885-5885: Regex correctly matches 'див.' in specific contexts

This regex accurately identifies 'див.' when it appears within parentheses, brackets, or after a comma, enhancing reference handling.


5890-5890: Regex matches professional title abbreviations effectively

The pattern successfully matches various professional titles such as 'проф.', 'доц.', 'акад.', which is important for proper name recognition.


5894-5894: Negative lookbehind correctly avoids matching 'др.' after 'і' or 'и'

The negative lookbehind prevents matching 'др.' when preceded by 'і' or 'и', ensuring accurate abbreviation detection.


5898-5898: Regex correctly matches abbreviation 'Ман.'

The pattern effectively captures 'Ман.', which may represent 'Манчестер' or similar abbreviations.


5908-5908: Regex matches section abbreviations correctly

This regex accurately identifies abbreviations like 'Арт.', 'Мал.', 'Рис.', and 'Спр.', common in Ukrainian texts.


5918-5918: Negative lookbehind ensures 'м.' is not matched after digits

The negative lookbehind prevents matching 'м.' (meter) when preceded by digits, avoiding incorrect segmentation.


5942-5942: Regex correctly matches 'ред.' abbreviation

The pattern effectively captures 'ред.', the abbreviation for 'редактор' (editor), enhancing proper noun recognition.

@@ -79485,7 +79485,7 @@ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA
<example correction="n.&nbsp;V.">Sprechstunde <marker>n.V.</marker></example>
</rule>
<rule><!--12-->
-<regexp>(u)\.\s?(v)\.\s?(m|a)\.</regexp>
+<regexp>(u)\.\s?(v)\.[ \t]?(m|a)\.</regexp>

⚠️ Potential issue

Fix inconsistent whitespace pattern in rule 12.

The pattern (u)\.\s?(v)\.[ \t]?(m|a)\. mixes \s? and [ \t]?. This inconsistency could lead to different matching behavior within the same expression.

Suggest applying the same pattern consistently:

-(u)\.\s?(v)\.[ \t]?(m|a)\.
+(u)\.[ \t]?(v)\.[ \t]?(m|a)\.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<regexp>(u)\.\s?(v)\.[ \t]?(m|a)\.</regexp>
(u)\.[ \t]?(v)\.[ \t]?(m|a)\.
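The difference between the mixed and the consistent pattern can be sketched as follows. This assumes the rule's regexp is compiled with Pattern.UNICODE_CHARACTER_CLASS and matched against the whole candidate string; the class name and inputs are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of LanguageTool.
class WhitespacePatternDemo {

    // Assumption: the rule is compiled with UNICODE_CHARACTER_CLASS, so \s also covers U+00A0.
    static final Pattern MIXED = Pattern.compile(
        "(u)\\.\\s?(v)\\.[ \\t]?(m|a)\\.", Pattern.UNICODE_CHARACTER_CLASS);
    static final Pattern CONSISTENT = Pattern.compile(
        "(u)\\.[ \\t]?(v)\\.[ \\t]?(m|a)\\.", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        String nbspFirstGap = "u.\u00a0v.m.";  // non-breaking space after "u."
        String nbspSecondGap = "u.v.\u00a0m."; // non-breaking space after "v."

        // The mixed pattern treats the two gaps differently.
        System.out.println(matches(MIXED, nbspFirstGap));  // \s? accepts the non-breaking space
        System.out.println(matches(MIXED, nbspSecondGap)); // [ \t]? does not

        // The consistent pattern rejects a non-breaking space in either gap.
        System.out.println(matches(CONSISTENT, nbspFirstGap));
        System.out.println(matches(CONSISTENT, nbspSecondGap));
    }
}
```

Whether the non-breaking space should be accepted or rejected is a rule-design decision; the point is only that both gaps should behave the same way.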

Comment on lines +29 to +32
private static final Pattern DE_LE = Pattern.compile("\\bde le\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
private static final Pattern A_LE = Pattern.compile("\\bà le\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
private static final Pattern DE_LES = Pattern.compile("\\bde les\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
private static final Pattern A_LES = Pattern.compile("\\bà les\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);

💡 Codebase verification

Several patterns need Unicode character class updates for consistent handling

The verification revealed that there are several patterns in French-related files that use Pattern.UNICODE_CASE without Pattern.UNICODE_CHARACTER_CLASS, particularly in files that deal with word boundaries and character classes. Key locations that need similar updates:

  • FrenchWordTokenizer.java:

    • TYPEWRITER_APOSTROPHE, TYPOGRAPHIC_APOSTROPHE patterns using \p{L} with word boundaries
    • NEARBY_HYPHENS, HYPHENS patterns using \p{L} with word boundaries
    • SPACE_DIGITS0 pattern (only using UNICODE_CASE)
  • FrenchTagger.java:

    • PREFIXES_FOR_VERBS and PREFIXES_FOR_NOUN_ADJ patterns using case-insensitive matching with accented characters

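These patterns would benefit from the same Pattern.UNICODE_CHARACTER_CLASS flag to ensure consistent Unicode handling across the French module, especially for patterns involving word boundaries (\b) or predefined classes such as \w. Note that \p{L} itself is already Unicode-aware without the flag; in those patterns it is the accompanying \b that changes behavior.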

🔗 Analysis chain

LGTM! Improved Unicode handling for French contractions.

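The update to use Pattern.UNICODE_CHARACTER_CLASS is correct: with the flag set, \b is evaluated against Unicode word characters, so boundaries next to accented letters such as à are detected consistently. This matters from JDK 19 onward, where \b considers only ASCII word characters unless the flag is set.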
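As an illustration of the word-boundary behavior, here is a minimal sketch built around the A_LE pattern from the diff. The class name and the `contractALe` helper are hypothetical, and the replacement text "au" is an assumption about how the pattern is used.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of LanguageTool.
class FrenchContractionDemo {

    // Same regex and flags as A_LE in the diff above.
    static final Pattern A_LE = Pattern.compile(
        "\\b\u00e0 le\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);

    // Assumed usage: replace a standalone "à le" with the contraction "au".
    static String contractALe(String text) {
        return A_LE.matcher(text).replaceAll("au");
    }

    public static void main(String[] args) {
        // Standalone "à le": \b holds before "à" and after "le", so it is replaced.
        System.out.println(contractALe("Je vais \u00e0 le march\u00e9"));

        // "voilà le": with the flag, "l" and "à" are both word characters,
        // so there is no boundary inside "voilà" and the text is left unchanged.
        System.out.println(contractALe("Voil\u00e0 le chat"));
    }
}
```

Without the flag, the result for the second input may differ between JDK versions, because the definition of `\b` for non-ASCII letters changed in JDK 19.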

Let's verify if similar patterns in other French-related classes have been updated:


This script will help us examine all Pattern.compile usages in French-related files to verify if similar Unicode handling updates are needed elsewhere. The previous search revealed multiple files using Pattern.compile, so we need to inspect their actual pattern definitions to ensure consistent Unicode handling across the French module.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other regex patterns in French-related classes that might need similar updates

# Search for Pattern.compile in French-related files
echo "Checking for other regex patterns in French-related files..."
rg -l "Pattern\.compile" "languagetool-language-modules/fr/src/main/java"

# Search specifically for UNICODE_CASE without UNICODE_CHARACTER_CLASS
echo "Checking for patterns that might need updating..."
rg --pcre2 "Pattern\.compile.*UNICODE_CASE(?!.*UNICODE_CHARACTER_CLASS)" "languagetool-language-modules/fr/src/main/java"

Length of output: 1824


Script:

#!/bin/bash
# Let's check the content of these files for Pattern.compile usage
for file in $(rg -l "Pattern\.compile" "languagetool-language-modules/fr/src/main/java"); do
  echo "=== $file ==="
  # Show Pattern.compile lines with surrounding context
  rg -B1 -A1 "Pattern\.compile" "$file"
  echo
done

Length of output: 17710

@@ -1,2 +1,2 @@

<!ENTITY hyphenised_expressions "\b(?!feij(?:ão|ões)-frade)((?:c(?:a(?:r(?:rap(?:icho(?:s-(?:c(?:a(?:(?:rneir|val)o|lçada)|igana)|(?:agu|ove)lha|l(?:inho|ã)|boi)|-(?:c(?:a(?:(?:rneir|val)o|lçada)|igana)|(?:agu|ove)lha|l(?:inho|ã)|boi))|ato(?:s-(?:p(?:assarinho|eixe)|(?:caval|sap)o|galinha|boi)|-(?:p(?:assarinho|eixe)|(?:caval|sap)o|galinha|boi)))|u(?:ru(?:s-(?:(?:s(?:oldad|ap)|cach|vead)o|espi(?:nho|ga)|po(?:mba|rco))|-(?:(?:s(?:oldad|ap)|cach|vead)o|espi(?:nho|ga)|po(?:mba|rco)))|atás?-pau)|d(?:o(?:s-(?:(?:(?:bur|ou)r|vis[cg])o|co(?:chonilha|alho|mer)|isca)|-(?:(?:(?:bur|ou)r|vis[cg])o|co(?:chonilha|alho|mer)|isca))|ea(?:is|l)-poupa)|á(?:s-(?:(?:sapateir|cabocl|espinh)o|(?:angol|pedr|águ)a|jardim)|-(?:(?:sapateir|cabocl|espinh)o|(?:pedr|águ)a|jardim))|a(?:n(?:ha(?:s-(?:viveir|toc|ri)|-(?:viveir|toc))o|guejos?-pedra)|guatás?-jardim)|(?:v(?:ões|ão)-ferreir|mim-cártam)o|nes-donzela)|p(?:i(?:ns-(?:b(?:o(?:t(?:ão|a)|lota|de)|a(?:ndeira|tatais)|u(?:cha|rro)|ezerro)|c(?:a(?:(?:rneir|val)o|(?:piva|b)ra)|o(?:ntas|rte|co)|heiro|uba)|t(?:(?:artarug|ouceir)a|e(?:nerife|so))|(?:m(?:a(?:rrec|nad)|ul)|esteir|égu)a|p(?:(?:ernambuc|ast|omb)o|lanta)|f(?:o(?:rquilha|go)|lecha|eixe)|r(?:o(?:[ls]a|des)|ebanho|aiz)|a(?:n(?:gola|dar)|çude)|s(?:apo|oca)|diamante|lastro|natal|itu)|m-(?:b(?:o(?:t(?:ão|a)|lota|de)|a(?:ndeira|tatais)|u(?:cha|rro)|ezerro)|c(?:a(?:(?:rneir|val)o|(?:piva|b)ra)|o(?:ntas|rte|co)|heiro|uba)|(?:m(?:a(?:rrec|nad)|ul)|esteir|égu|soc)a|t(?:(?:artarug|ouceir)a|e(?:nerife|so))|p(?:(?:ernambuc|ast|omb)o|lanta)|f(?:o(?:rquilha|go)|lecha|eixe)|r(?:o(?:[ls]a|des)|ebanho|aiz)|a(?:n(?:gola|dar)|çude)|d(?:iamante|eus)|lastro|natal|itu)|tã(?:es|o)-sa(?:ír|l)a|xinguis?-bicho)|elas?-viúva)|n(?:a(?:s-(?:(?:(?:chei|bur)r|passarinh|macac)o|(?:v(?:asso[iu]|íbo)r|frech|roc)a|(?:jacar|imb)é|elefante|açúcar|urubu)|-(?:(?:v(?:asso[iu]|íbo)r|frech|roc)a|(?:passarinh|macac|burr)o|(?:jacar|imb)é|elefante|açúcar|urubu)|fístula(?:s-(?:igapó|lagoa|boi)|-(?:igapó|lagoa|boi)))|el(?:a(?:s-(?:c(?:a
(?:poeira|tarro)|(?:eilã|heir)o|utia)|v(?:e(?:ad|lh)o|argem)|g(?:arça|oiás)|papagaio|jacamim|ema)|-(?:c(?:a(?:poeira|tarro)|utia)|v(?:argem|eado)|g(?:arça|oiás)|papagaio|jacamim|ema))|eira(?:s-(?:cheiro|ema)|-(?:cheiro|ema)))|udo(?:s-(?:cachimbo|lagoa)|-(?:cachimbo|lagoa))|(?:ários?-franç|iços?-águ)a|sanç(?:ões|ão)-leite|oés?-botão)|s(?:t(?:a(?:nh(?:a(?:-(?:(?:á(?:fric|gu)|a(?:rar|nt))a|m(?:oçambique|acaco|inas)|c(?:aiaté|utia)|p(?:eixe|uri)|jatobá|bugre)|s-(?:(?:á(?:fric|gu)|a(?:rar|nt))a|m(?:oçambique|acaco|inas)|c(?:aiaté|utia)|p(?:eixe|uri)|bugre))|eiros?-minas)|s?-correr)|or(?:es)?-montanha)|c(?:a(?:s-(?:carvalho|jacaré|anta)|-(?:carvalho|jacaré|noz))|o(?:s-(?:cavalo|jabuti|tatu)|-(?:cavalo|jabuti|tatu))|udo(?:s-(?:enfeite|aranha)|-(?:enfeite|aranha))))|m(?:a(?:r(?:á(?:s-(?:(?:c(?:aval|heir)|espinh)o|b(?:ilro|oi)|flecha)|-(?:(?:c(?:aval|heir)|espinh)o|b(?:ilro|oi)|flecha))|ões-(?:pe(?:nedo|dra)|estalo|areia)|ão-(?:pe(?:nedo|dra)|estalo|areia))|le(?:ões-(?:pedreira|asas)|ão-(?:pedreira|asas)))|b(?:ará(?:s-(?:c(?:h(?:eir|umb)o|apoeira)|espinho|lixa)|-(?:c(?:h(?:eir|umb)o|apoeira)|espinho|lixa))|oatãs?-leite)|urus?-cheiro)|c(?:himbo(?:s-(?:(?:maca|tur)co|jabuti)|-(?:(?:maca|tur)co|jabuti))|au(?:s-(?:ca(?:racas|iena)|mico)|-(?:ca(?:racas|iena)|mico))|tos?-cabeça)|b(?:a(?:(?:ças?-trombet|cinhas?-cobr)a|s-(?:igreja|ladrão|peixe)|-(?:igreja|ladrão|peixe))|u(?:mbos?-azeite|rés?-orelha))|val(?:inho(?:-(?:judeu|deus|cão)|s-judeu)|o-cão)|fé(?:s-b(?:agueio|ugre)|-b(?:agueio|ugre))|t(?:ingueiros?-porc|otas?-espinh)o|avuranas?-cunhã)|o(?:c(?:o(?:-(?:(?:b(?:acai(?:aú|u)b|ocaiuv)|p(?:almeir|indob|urg)|quar(?:esm|t)|oitav)a|v(?:a(?:queiro|ssoura)|inagre|eado)|c(?:(?:a(?:cho|ta)rr|igan)o|olher)|(?:espinh|rosári|macac|óle)o|i(?:ndaiá|ri)|(?:gur|a)iri|na(?:tal|iá)|dendê)|s-(?:(?:b(?:acai(?:aú|u)b|ocaiuv)|p(?:almeir|indob|urg)|quar(?:esm|t)|oitav)a|v(?:a(?:queiro|ssoura)|inagre|eado)|c(?:(?:a(?:cho|ta)rr|igan)o|olher)|(?:espinh|rosári|macac|óle)o|i(?:ndaiá|ri)|na(?:tal|iá)|dendê|gu
riri))|(?:honilhas?-cer|as?-águ)a)|bra(?:-(?:c(?:a(?:p(?:elo|im)|scavel|belo|ju)|o(?:lchete|ral)|ipó)|(?:es[cp]ad|ferradur|barat|águ)a|(?:v(?:ead|idr)|lix|oc)o|a(?:r(?:eia)?|sa)|pernas|ratos?)|s-(?:c(?:a(?:p(?:elo|im)|scavel|belo|ju)|o(?:lchete|ral)|ipó)|(?:ferradur|barat|espad|águ)a|(?:v(?:ead|idr)|lix|oc)o|a(?:r(?:eia)?|sa)|pernas|ratos?))|l(?:a(?:-(?:(?:(?:sapatei|zor)r|caval)o|peixe)|s-(?:(?:caval|zorr)o|peixe))|eir(?:o(?:s-(?:(?:band|choc)o|sapé)|-(?:(?:band|choc)o|sapé))|as?-sapé))|r(?:(?:uj(?:as?|ão)-igrej|tiças?-montanh|-ros)a|vina(?:s-(?:corso|linha)|-(?:corso|linha))|d(?:ões|ão)-frade|reias?-inverno)|e(?:rana(?:s-(?:(?:caravel|min)as|pernambuco)|-(?:(?:caravel|min)as|pernambuco))|ntro(?:s-caboclos|-caboclo))|gumelo(?:s-(?:c(?:aboclo|hapéu)|(?:sangu|leit)e|paris)|-(?:c(?:aboclo|hapéu)|(?:sangu|leit)e|paris))|uve(?:s-(?:a(?:dorno|reia)|(?:saboi|águ)a|cortar)|-(?:a(?:dorno|reia)|(?:saboi|águ)a|cortar))|n(?:gonha(?:s-(?:caixeta|bugre|goiás)|-(?:caixeta|bugre|goiás))|durus?-sangue|tas?-cabra)|irana(?:s-(?:(?:caravel|min)as|pernambuco)|-(?:(?:caravel|min)as|pernambuco))|(?:xas?-(?:d(?:am|on)|freir)|mer(?:es)?-arar|tovias?-poup)a|queiro(?:s-(?:vassoura|dendê)|-(?:vassoura|dendê))|paibeiras?-minas)|ipó(?:-(?:c(?:a(?:r(?:neiro|ijó)|b(?:oclo|aça)|noa)|o(?:r(?:ação|da)|(?:br|l)a)|h(?:agas|umbo)|u[mn]anã|esto)|a(?:l(?:caçuz|ho)|r(?:acuã|c)o|marrar|gulha)|m(?:a(?:inibu|caco)|o(?:fumb|rceg)o|ucuna)|b(?:a(?:(?:mburra|rri)l|tata)|reu|oi)|j(?:a(?:b(?:ut[ái]|ota)|rrinha)|unta)|p(?:(?:a(?:in|lm)|oit)a|enas)|t(?:amanduá|ucunaré|imbó)|l(?:avadeira|eite)|v(?:aqueiro|iúva)|e(?:mbiri|scada)|im(?:pingem|bé)|g(?:ato|ota)|s(?:apo|eda)|(?:fo|re)go|quati|água)|s-(?:c(?:a(?:r(?:neiro|ijó)|b(?:oclo|aça)|noa)|o(?:r(?:ação|da)|(?:br|l)a)|u[mn]anã|hagas|esto)|a(?:l(?:caçuz|ho)|r(?:acuã|c)o|marrar|gulha)|m(?:a(?:inibu|caco)|o(?:fumb|rceg)o|ucuna)|b(?:a(?:(?:mburra|rri)l|tata)|reu|oi)|j(?:a(?:b(?:ut[ái]|ota)|rrinha)|unta)|p(?:(?:a(?:in|lm)|oit)a|enas)|t(?:amanduá|ucunaré|imbó)|l(?:avadeira|e
ite)|v(?:aqueiro|iúva)|e(?:mbiri|scada)|im(?:pingem|bé)|g(?:ato|ota)|s(?:apo|eda)|(?:fo|re)go|quati|água))|r(?:av(?:o(?:s-(?:(?:cabe(?:cinh|ç)|esperanç|sear)a|b(?:(?:astã|urr)o|ouba)|p(?:oeta|au)|defunto|tunes|urubu|amor)|-(?:(?:cabe(?:cinh|ç)|esperanç|sear)a|b(?:(?:astã|urr)o|ouba)|p(?:oeta|au)|defunto|tunes|urubu|amor))|in(?:a(?:s-(?:(?:lagartix|águ)a|ambrósio|tunes|pau)|-(?:(?:lagartix|águ)a|tunes|pau))|ho(?:s-(?:(?:lagartix|campin)a|defunto)|-(?:lagartixa|defunto))))|ista(?:s-(?:gal(?:inha|o)|mutum|peru)|-(?:gal(?:inha|o)|mutum|peru)|(?:is|l)-rocha))|e(?:bol(?:(?:a(?:s-(?:cheir|lob)|-lob)|inhas?-cheir)o|etas?-frança)|r(?:ej(?:as?-(?:caien|purg)|eiras?-purg)a|vejas?-pobre)|n(?:táureas?-jardim|ouras?-creta)|vadas?-jardim)|h(?:a(?:ga(?:s-(?:bauru|jesus)|-bauru)|scos?-leque)|u(?:p(?:ões|ão)-arroz|vas?-imbu))|u(?:tia(?:s-(?:rabo|pau)|-(?:rabo|pau))|(?:mbuc|i)as?-macaco)|ânhamo-manila)|p(?:a(?:u(?:s-(?:c(?:a(?:n(?:(?:galh|inan)a|deeiro|oas?|til)|m(?:peche|arão)|r(?:rapato|ne)|c(?:himbo|a)|i(?:bro|xa)|pitão|stor)|o(?:r(?:tiça|al)|n(?:ch|t)a|lher|bre)|h(?:a(?:pad|nc)a|i(?:cl|fr)e|eiro)|u(?:rt(?:ume|ir)|nanã|biú|tia)|erc?a|inzas|ruz)|s(?:a(?:n(?:t(?:ana|o)|gue)|p(?:ateir)?o|b(?:ão|iá)|ssafrás|lsa)|e(?:(?:rr|d)a|bo)|urriola|olar)|m(?:a(?:(?:n(?:jeriob|teig)|ri)a|(?:cac|str|lh)o)|o(?:(?:njol|rceg)o|quém|có)|(?:utamb|erd)a)|b(?:u(?:jarrona|gre|rro)|i(?:ch?o|lros)|a(?:rbas|lso)|o(?:[lt]o|ia)|r(?:incos|eu)|álsamo)|p(?:e(?:r(?:nambuco|eira)|nte)|r(?:eg(?:uiça|o)|aga)|i(?:ranha|lão)|o(?:mb|rc)o|ólvora)|l(?:a(?:g(?:arto|oa)|cre|nça)|e(?:(?:br|it)e|tras|pra)|i(?:vros|xa)|ágrima)|f(?:(?:a(?:[iv]|rinh)|ormig)a|(?:u[ms]|ígad)o|l(?:echas?|or)|e(?:bre|rro))|r(?:e(?:(?:spost|nd)a|(?:[gm]|in)o|de)|os(?:eira|as?)|a(?:inha|to))|e(?:s(?:p(?:inh|et)o|teira)|(?:rv(?:ilh)?|mbir)a|lefante)|a(?:(?:bóbor|ngol)a|r(?:ara|co)|l(?:ho|oé))|t(?:a(?:rtaruga|manco)|in(?:gui|ta)|ucano)|v(?:i(?:n(?:tém|ho)|ola)|e(?:ado|ia)|aca)|g(?:(?:asolin|om)a|ui(?:tarra|né))|j(?:erimum?|angada|udeu)|d(?:igestão|edal)|
n(?:avalha|ovato)|o(?:rvalho|laria)|(?:incens|óle)o|(?:zebr|águ)a|qui(?:abo|na))|-(?:c(?:a(?:n(?:(?:galh|inan)a|deeiro|oas?|til)|m(?:peche|arão)|r(?:rapato|ne)|c(?:himbo|a)|i(?:bro|xa)|pitão|stor)|o(?:r(?:tiça|al)|n(?:ch|t)a|lher|bre)|h(?:a(?:pad|nc)a|i(?:cl|fr)e|eiro)|u(?:rt(?:ume|ir)|nanã|biú|tia)|erc?a|inzas|ruz)|s(?:a(?:n(?:t(?:ana|o)|gue)|p(?:ateir)?o|b(?:ão|iá)|ssafrás|lsa)|e(?:(?:rr|d)a|bo)|urriola|olar)|m(?:a(?:(?:n(?:jeriob|teig)|ri)a|(?:cac|str|lh)o)|o(?:(?:njol|rceg)o|quém|có)|(?:utamb|erd)a)|b(?:u(?:jarrona|gre|rro)|i(?:ch?o|lros)|a(?:rbas|lso)|o(?:[lt]o|ia)|r(?:incos|eu)|álsamo)|p(?:e(?:r(?:nambuco|eira)|nte)|r(?:eg(?:uiça|o)|aga)|i(?:ranha|lão)|o(?:mb|rc)o|ólvora)|l(?:a(?:g(?:arto|oa)|cre|nça)|e(?:(?:br|it)e|tras|pra)|i(?:vros|xa)|ágrima)|f(?:(?:a(?:[iv]|rinh)|ormig)a|(?:u[ms]|ígad)o|l(?:echas?|or)|e(?:bre|rro))|r(?:e(?:(?:spost|nd)a|(?:[gm]|in)o|de)|os(?:eira|as?)|a(?:inha|to))|e(?:s(?:p(?:inh|et)o|teira)|(?:rv(?:ilh)?|mbir)a|lefante)|t(?:a(?:rtaruga|manco)|in(?:gui|ta)|ucano)|v(?:i(?:n(?:tém|ho)|ola)|e(?:ado|ia)|aca)|a(?:(?:bóbor|ngol)a|l(?:ho|oé)|rco)|g(?:(?:asolin|om)a|ui(?:tarra|né))|j(?:erimum?|angada|udeu)|d(?:igestão|edal)|n(?:avalha|ovato)|o(?:rvalho|laria)|(?:incens|óle)o|(?:zebr|águ)a|qui(?:abo|na))|xis?-pedra)|l(?:m(?:eir(?:a(?:s-(?:(?:(?:palmi|ce)r|igrej)a|madagascar|dendê|leque|tebas|vinho)|-(?:(?:(?:palmi|ce)r|igrej)a|madagascar|dendê|leque|tebas|vinho))|inhas?-petrópolis)|a(?:s-(?:c(?:hicote|acho)|igreja|leque)|-(?:c(?:hicote|acho)|igreja|leque)|tórias?-espinho)|i(?:tos?-ferrão|lhas?-papa))|ha(?:s-(?:(?:penach|caniç)o|guiné|água)|-(?:(?:penach|caniç)o|guiné|água))|os-(?:calenturas|maria))|r(?:ic(?:á(?:s-(?:esponjas|curtume)|-(?:esponjas|curtume))|aranas?-espinhos)|a(?:cuuba(?:s-lei(?:te)?|-lei(?:te)?)|sitas?-samambaiaçu|tudos?-praia)|go(?:s-(?:m(?:itra|orro)|cótula)|-(?:m(?:itra|orro)|cótula)))|p(?:o(?:ila(?:s-(?:espinho|holanda)|-(?:espinho|holanda))|ula(?:s-(?:espinho|holanda)|-(?:espinho|holanda)))|agaio(?:s-cole(?:ira|te)|-cole(?:ir
a|te)))|in(?:a(?:s-(?:s(?:apo|eda)|arbusto|penas|cuba)|-(?:s(?:apo|eda)|arbusto|penas|cuba))|eira(?:s-(?:c(?:ipó|uba)|leite)|-(?:c(?:ipó|uba)|leite)))|(?:c(?:o(?:vas?-macac|s?-golung)|as-rab)|ssarinhos?-(?:arribaç|ver)ã|nelas?-bugi)o|t(?:os?-c(?:a(?:rúncul|ien)|rist)a|i(?:nhos?-igapó|s?-goiás))|v(?:ões|ão)-java)|i(?:nh(?:eir(?:o(?:s-(?:(?:(?:pur|ri)g|casquinh)a|jerusalém|alepo)|-(?:(?:(?:pur|ri)g|casquinh)a|jerusalém|alepo))|inho(?:s-(?:jardim|sala)|-(?:jardim|sala)))|ões-(?:(?:cerc|purg)a|madagascar|ratos?)|ão-(?:(?:cerc|purg)a|madagascar|rato)|o(?:s-(?:flandres|riga)|-riga)|as?-raiz)|ment(?:a(?:s-(?:c(?:(?:aien|oro)a|heiro)|(?:ra[bt]|macac)o|g(?:alinha|entio)|bu(?:gre|ta)|queimar|água)|-(?:c(?:(?:aien|oro)a|heiro)|(?:ra[bt]|macac)o|g(?:alinha|entio)|bu(?:gre|ta)|queimar|água))|ões-c(?:aiena|heiro)|ão-c(?:aiena|heiro))|t(?:o(?:mb(?:a(?:s-(?:macaco|leite)|-(?:macaco|leite))|eiras?-marajó)|s-(?:água|saci)|-(?:água|saci))|a(?:ng(?:ueira(?:s-(?:cachorro|jardim)|-(?:cachorro|jardim))|as?-cachorro)|s?-erva)|eiras?-sinal)|olho(?:s-(?:(?:galinh|balei|onç)a|(?:soldad|tubarã)o|p(?:lanta|adre)|c(?:ação|obra)|faraó|urubu)|-(?:(?:galinh|balei|onç)a|(?:soldad|tubarã)o|p(?:lanta|adre)|c(?:ação|obra)|faraó|urubu))|(?:piras?-(?:máscar|prat)|quiás?-pedr|ão-purg)a|c(?:ões-tr(?:opeiro|epar)|ão-tropeiro)|xiricas?-bolas|raíbas?-pele)|e(?:r(?:a(?:s-(?:a(?:(?:guieir|lmeid)a|dvogado)|r(?:e(?:fego|i)|osa)|(?:cris|un)to|jesus|água)|-(?:a(?:(?:guieir|lmeid)a|dvogado)|r(?:e(?:fego|i)|osa)|(?:cris|un)to|jesus|água))|oba(?:s-(?:(?:pernambuc|reg)o|ca(?:ntagalo|mpos)|go(?:iás|mo)|minas)|-(?:(?:pernambuc|reg)o|ca(?:ntagalo|mpos)|go(?:iás|mo)|minas))|(?:iquit(?:o(?:s-(?:campin|ant)|-ant)|inhos?-vassour)|cevejos?-(?:ca[ms]|galinh))a|diz(?:es)?-alqueive|us?-sol)|na(?:chos?-capim|s-avestruz)|pinos?-(?:papagai|burr)o|ssegueiros?-abrir|quiás?-pedra)|u(?:rga(?:s-(?:c(?:a(?:i(?:tité|apó)|(?:boc|va)lo|rijó)|ereja)|(?:ve(?:ad|nt)|marinheir|genti)o|pa(?:ulista|stor)|nabiça)|-(?:c(?:a(?:i(?:tité|apó)|(?:boc|va
)lo|rijó)|ereja)|(?:ve(?:ad|nt)|marinheir|genti)o|pa(?:ulista|stor)|nabiça))|lg(?:a(?:s-(?:(?:a(?:rei|nt)|galinh|águ)a|bicho)|-(?:(?:a(?:rei|nt)|galinh|águ)a|bicho))|(?:ões|ão)-planta))|o(?:mb(?:a(?:s-(?:(?:(?:arribaç|sert)ã|espelh|band)o|mulata)|-(?:(?:(?:arribaç|sert)ã|espelh|band)o|mulata))|o(?:s-(?:montanha|leque)|-(?:montanha|leque)))|rco(?:s-(?:verrugas|ferro)|-(?:verrugas|ferro))|aia(?:s-(?:minas|cipó)|-(?:minas|cipó)))|ã(?:es-(?:p(?:o(?:rc(?:in)?o|bre)|ássaros)|gal(?:inha|o)|leite|cuco)|o-(?:p(?:orc(?:in)?o|ássaros)|gal(?:inha|o)|cuco))|l(?:uma(?:s-(?:príncipe|capim)|-(?:príncipe|capim))|átanos?-gênio|antas?-neve)|r(?:eguiça(?:s-(?:bentinho|coleira)|-(?:bentinho|coleira))|imaveras?-caiena)|ássaros?-f(?:andan|i)go|êssegos?-abrir)|f(?:lor(?:es-(?:c(?:a(?:(?:(?:r(?:nav|de)|m)a)?l|(?:sament|chimb|bocl)o)|o(?:(?:[iu]r|elh|c)o|ntas|bra|ral)|e(?:tim|ra)|hagas|iúme|uco)|p(?:a(?:(?:ssarinh|pagai|raís|vã)o|dre|lha|u)|e(?:licano|dra)|érolas)|m(?:a(?:r(?:acujá|iposa)|deira|io)|(?:(?:eren|osca)d|us)a|ico)|b(?:a(?:(?:b(?:eir|ad)|rbeir)o|unilha|ile)|eso[iu]ro)|a(?:(?:lgodã|njinh)o|ranha|bril|zar)|s(?:a(?:p(?:at)?o|ngue)|(?:ed|ol)a)|v(?:(?:iúv|ac)a|e(?:lu|a)do)|n(?:(?:espereir|oiv)a|atal)|l(?:is(?:ado)?|agartixa|ã)|(?:quaresm|dian|águ)a|(?:invern|índi|fog)o|e(?:spírito|nxofre)|t(?:rombeta|anino)|g(?:ra[mx]a|elo)|jesus)|-(?:c(?:a(?:(?:(?:r(?:nav|de)|m)a)?l|(?:sament|chimb|bocl)o)|o(?:(?:[iu]r|elh|c)o|ntas|bra|ral)|e(?:tim|ra)|hagas|iúme|uco)|p(?:a(?:(?:ssarinh|pagai|raís|vã)o|dre|lha|u)|e(?:licano|dra)|érolas)|m(?:a(?:r(?:acujá|iposa)|deira|io)|(?:(?:eren|osca)d|us)a|ico)|b(?:a(?:(?:b(?:eir|ad)|rbeir)o|unilha|ile)|eso[iu]ro)|a(?:(?:lgodã|njinh)o|ranha|bril|zar)|s(?:a(?:p(?:at)?o|ngue)|(?:ed|ol)a)|v(?:(?:iúv|ac)a|e(?:lu|a)do)|n(?:(?:espereir|oiv)a|atal)|(?:quaresm|dian|águ)a|(?:invern|índi|fog)o|e(?:spírito|nxofre)|l(?:agartixa|is|ã)|t(?:rombeta|anino)|g(?:ra[mx]a|elo)|jesus))|rut(?:a(?:s-(?:c(?:o(?:n(?:de(?:ssa)?|ta)|(?:dorn|ruj)a)|a(?:chorro|scavel|iapó)|utia)|g(?:(?:enti|a
l)o|uar(?:iba|á)|rude)|m(?:a(?:n(?:teig|il)a|caco)|orcego)|p(?:(?:a(?:pagai|vã)|omb|ã)o|erdiz)|sa(?:(?:pucainh|ír)a|b(?:ão|iá))|a(?:n(?:ambé|el)|rara)|v(?:(?:íbor|i)a|eado)|b(?:abad|urr)o|t(?:ucano|atu)|l(?:epra|obo)|jac(?:aré|u)|árvore|faraó|ema)|-(?:c(?:o(?:n(?:de(?:ssa)?|ta)|(?:dorn|ruj)a)|a(?:chorro|scavel|iapó)|utia)|g(?:(?:enti|al)o|uar(?:iba|á)|rude)|m(?:a(?:n(?:teig|il)a|caco)|orcego)|p(?:(?:a(?:pagai|vã)|omb|ã)o|erdiz)|sa(?:(?:pucainh|ír)a|b(?:ão|iá))|a(?:n(?:ambé|el)|rara)|v(?:(?:íbor|i)a|eado)|b(?:abad|urr)o|t(?:ucano|atu)|l(?:epra|obo)|jac(?:aré|u)|árvore|faraó|ema))|eira(?:s-(?:c(?:onde(?:ssa)?|achorro|utia)|(?:macac|tucan|burr|lob)o|p(?:(?:avã|omb)o|erdiz)|jac(?:aré|u)|arara|faraó)|-(?:c(?:onde(?:ssa)?|achorro|utia)|(?:macac|tucan|burr|lob)o|p(?:(?:avã|omb)o|erdiz)|jac(?:aré|u)|arara|faraó))|o(?:s-(?:c(?:a(?:xinguelê|chorro)|o(?:br|nt)a)|m(?:a(?:nteiga|caco)|orcego)|p(?:apagaio|erdiz)|burro|sabiá|imbé)|-(?:c(?:a(?:xinguelê|chorro)|o(?:br|nt)a)|m(?:a(?:nteiga|caco)|orcego)|p(?:apagaio|erdiz)|burro|sabiá|imbé)))|o(?:r(?:m(?:iga(?:s-(?:f(?:e(?:rrão|bre)|ogo)|r(?:a(?:spa|bo)|oça)|c(?:emitério|upim)|m(?:andioca|onte)|b(?:entinho|ode)|(?:imbaúv|onç)a|n(?:ovato|ós)|defunto)|-(?:f(?:e(?:rrão|bre)|ogo)|r(?:a(?:spa|bo)|oça)|c(?:emitério|upim)|m(?:andioca|onte)|b(?:entinho|ode)|(?:imbaúv|onç)a|n(?:ovato|ós)|defunto))|osa(?:s-(?:besteiros|darei)|-(?:besteiros|darei)))|no(?:s-ja(?:çanã|caré)|-ja(?:çanã|caré)))|lha(?:s-(?:s(?:a(?:n(?:tana|gue)|bão)|e(?:rr|d)a)|f(?:(?:ígad|og)o|igueira|ronte)|p(?:a(?:pagaio|dre|jé)|irarucu)|(?:comichã|bold?|gel)o|l(?:ança|eite|ouco)|(?:zeb|he|ta)ra|mangue|urubu)|-(?:s(?:a(?:n(?:tana|gue)|bão)|erra)|f(?:(?:ígad|og)o|igueira|ronte)|p(?:a(?:pagaio|dre|jé)|irarucu)|(?:comichã|bold?|gel)o|l(?:ança|eite|ouco)|(?:zeb|he|ta)ra|mangue|urubu))|cas?-capuz)|a(?:v(?:a(?:s-(?:(?:a(?:ngol|rar)|(?:mal|v)ac|r(?:osc|am)|sucupir|holand)a|c(?:a(?:labar|valo)|h(?:eiro|apa)|obra)|b(?:e(?:souro|lém)|ol(?:ach|ot)a)|(?:quebrant|engenh|ordáli)o|l(?:(?:ázar|ob
)o|ima)|t(?:ambaqui|onca)|p(?:orco|aca)|impin?gem)|-(?:(?:a(?:ngol|rar)|(?:mal|v)ac|r(?:osc|am)|sucupir|holand)a|b(?:e(?:souro|lém)|ol(?:ach|ot)a)|c(?:a(?:labar|valo)|(?:hap|obr)a)|(?:quebrant|ordáli)o|t(?:ambaqui|onca)|p(?:orco|aca)|l(?:ima|obo)|impin?gem))|eira(?:s-(?:impin?gem|berloque)|-(?:impin?gem|berloque))|inhas?-capoeira)|lc(?:ões|ão)-coleira)|e(?:ij(?:õe(?:s-(?:c(?:(?:o(?:br|rd)|er|ub)a|avalo)|g(?:u(?:ando|izos)|ado)|(?:árvor|azeit|frad)e|l(?:i(?:sbo|m)a|eite)|(?:jav|rol|soj)a|m(?:acáçar|etro)|po(?:mbinha|rco)|va(?:[cr]a|gem)|tropeiro|boi)|zinhos-capoeira)|ão(?:-(?:c(?:(?:ord|er|ub)a|avalo)|g(?:u(?:ando|izos)|ado)|(?:árvor|azeit|frad)e|l(?:i(?:sbo|m)a|eite)|(?:jav|rol|soj)a|m(?:acáçar|etro)|po(?:mbinha|rco)|va(?:[cr]a|gem)|tropeiro|boi)|zinho-capoeira))|(?:l(?:es)?-genti|nos?-cheir|tos?-botã)o)|i(?:g(?:ueira(?:s-(?:(?:lombrigueir|pit|go)a|b(?:engala|aco)|to(?:car|que)|jardim)|-(?:(?:lombrigueir|pit|go)a|b(?:engala|aco)|to(?:car|que)|jardim))|o(?:s-(?:(?:figueir|banan)a|r(?:echeio|ocha)|(?:tord|verã)o)|-(?:(?:figueir|banan)a|r(?:echeio|ocha)|(?:tord|verã)o)))|lária(?:s-(?:medina|guiné)|-(?:medina|guiné))|andeiras?-algodão)|u(?:mo(?:s-(?:(?:rapos|cord|folh)a|pa(?:isan|raís)o|jardim)|-(?:(?:rapos|cord|folh)a|pa(?:isan|raís)o|jardim))|ncho(?:s-(?:(?:florenç|águ)a|porco)|-(?:(?:florenç|águ)a|porco)))|éis-gentio)|b(?:a(?:nan(?:eir(?:a(?:s-(?:(?:madag[áa]sca|flo)r|(?:italian|papagai)o|sementes|jardim|corda|leque)|-(?:(?:madag[áa]sca|flo)r|(?:italian|papagai)o|sementes|jardim|corda|leque))|inha(?:s-(?:touceira|salão|flor)|-(?:touceira|salão|flor)))|a(?:s-(?:(?:m(?:orceg|acac)|papagai)o|s(?:ementes|ancho)|imbé)|-(?:(?:m(?:orceg|acac)|papagai)o|s(?:ementes|ancho)|imbé)))|tat(?:a(?:s-(?:p(?:e(?:rdiz|dra)|ur(?:ga|i)|orco)|a(?:(?:ngol|rrob)a|maro)|b(?:(?:ranc|ugi)o|ainha)|(?:cabocl|vead)o|t(?:aiuiá|iú)|escamas|rama)|-(?:p(?:e(?:rdiz|dra)|ur(?:ga|i)|orco)|a(?:(?:ngol|rrob)a|maro)|b(?:(?:ranc|ugi)o|ainha)|(?:cabocl|vead)o|t(?:aiuiá|iú)|escamas|rama))|inhas?-cobra)|g(?:a(
?:s-(?:(?:cabocl|tucan|lour)o|p(?:ombo|raia))|-(?:(?:cabocl|tucan|lour)o|p(?:ombo|raia)))|re(?:s-(?:(?:arei|lago)a|man(?:gue|ta)|penacho)|-(?:(?:arei|lago)a|man(?:gue|ta)|penacho))|os?-chumbo)|r(?:r(?:ete(?:s-(?:clérigo|eleitor|padre)|-(?:clérigo|eleitor|padre))|ig(?:udas?-espinho|as?-freira))|ba(?:-(?:(?:chib|timã)o|pa(?:ca|u)|lagoa)|s-(?:(?:chib|timã)o|lagoa|boi|pau)))|b(?:osa(?:s-(?:árvore|espiga|pau)|-(?:árvore|espiga|pau))|a(?:-(?:(?:camel|sap)o|boi)|s-(?:sapo|boi)))|c(?:u(?:r(?:aus?-(?:lajea|ban)do|is?-cerca)|(?:paris?-capoei|s?-ped)ra)|abas?-(?:azeit|lequ)e)|mbu(?:s-(?:(?:espinh|caniç)o|pescador|mobília)|-(?:(?:espinh|caniç)o|pescador|mobília))|leia(?:s-(?:b(?:arbatana|ico)|corcova|gomo)|-(?:b(?:arbatana|ico)|corcova|gomo))|i(?:acu(?:s-(?:espinho|chifre)|-(?:espinho|chifre))|nhas?-(?:espad|fac)a)|st(?:(?:i(?:ões|ão)-arrud|ardos?-rom)a|ões-velho)|d(?:ianas?-cheiro|ejos?-lista)|unilhas?-auacuri|únas?-fogo)|i(?:c(?:h(?:o(?:-(?:c(?:(?:a(?:rpintei|chor)r|est)o|o(?:nta|co)|hifre)|(?:(?:ester|bura)c|ouvid|rum)o|(?:(?:gali|u)nh|taquar|sed)a|p(?:a(?:rede|u)|orco|ena|é)|m(?:(?:edranç|osc)a|ato)|v(?:areja|eludo)|f(?:rade|ogo))|s-(?:c(?:(?:a(?:rpintei|chor|nast)r|est)o|o(?:nta|co)|hifre)|(?:m(?:edranç|osc)|(?:gali|u)nh|taquar|sed)a|(?:(?:ester|bura)c|ouvid|rum)o|p(?:a(?:rede|u)|orco|ena|é)|v(?:areja|eludo)|f(?:rade|ogo)))|eiros?-conta)|udas?-corso)|ribás?-pernambuco|telos?-gente)|o(?:r(?:boleta(?:s-(?:p(?:êssego|iracema)|a(?:moreira|lface)|(?:carvalh|band)o|gás)|-(?:p(?:êssego|iracema)|a(?:moreira|lface)|(?:carvalh|band)o|gás))|d(?:ões|ão)-(?:santiag|macac)o)|i(?:s-(?:carro|guará|deus)|-(?:carro|guará|deus)|tas?-bigodes)|a(?:is|l)-alicante|fes?-burro|tos-óculos)|r(?:edo(?:s-(?:(?:namor(?:ad)?|porc|vead|mur)o|espi(?:nho|ga)|cabeça|jardim)|-(?:(?:namor(?:ad)?|porc|vead|mur)o|espi(?:nho|ga)|cabeça|jardim))|inco(?:s-(?:s(?:a(?:guim?|uim)|urubim)|passarinho)|-(?:sa(?:guim?|uim)|passarinho))|(?:ucos?-salvaterr|ancos?-barit)a|ocas?-raiz)|e(?:s(?:ouro(?:s-(?:(?:limeir|águ)a|chif
re|maio)|-(?:(?:limeir|águ)a|chifre|maio))|ugos?-ovas)|l(?:droega(?:s-(?:inverno|cuba)|-(?:inverno|cuba))|a(?:s-felgueiras?|-felgueiras?))|ngalas?-camarão|tónicas?-água|ijus?-potó)|álsamo(?:s-(?:c(?:a(?:rtagena|nudo)|heiro)|(?:arce|tol)u|enxofre)|-(?:c(?:a(?:rtagena|nudo)|heiro)|(?:arce|tol)u|enxofre))|u(?:ch(?:o(?:s-(?:veado|boi|rã)|-(?:veado|boi|rã))|as?-purga)|t(?:iás?-vinagre|uas?-corvo)|xos?-holanda))|a(?:l(?:f(?:a(?:vaca(?:s-(?:c(?:(?:abocl|heir)o|obra)|vaqueiro)|-(?:c(?:(?:abocl|heir)o|obra)|vaqueiro))|ce(?:s-(?:(?:c(?:ordeir|ã)|porc)o|alger)|-(?:(?:c(?:ordeir|ã)|porc)o|alger))|zemas?-caboclo|fas?-provença)|inete(?:s-(?:toucar|dama)|-toucar))|m(?:a(?:s-(?:c(?:a(?:(?:boc|va)lo|çador)|(?:hichar|ânta)ro)|(?:tapui|pomb|gat)o|biafada|mestre)|-(?:c(?:a(?:(?:boc|va)lo|çador)|(?:hichar|ânta)ro)|(?:tapui|pomb|gat)o|biafada))|ecegueira(?:s-(?:cheiro|minas)|-(?:cheiro|minas)))|e(?:cri(?:ns-(?:c(?:ampina|heiro)|angola)|m-(?:c(?:ampina|heiro)|angola))|trias?-pau)|ho(?:s-(?:espanha|cheiro)|-(?:espanha|cheiro))|ba(?:troz(?:es)?-sobrancelha|coras?-laje)|g(?:odoeiros?-pernambuco|ibeiras?-dama)|cachofras?-jerusalém|amandas?-jacobina)|r(?:a(?:ç(?:á(?:s-(?:c(?:o(?:mer|roa)|heiro)|(?:umbig|vead)o|(?:pomb|ant)a|tinguijar|minas)|-(?:c(?:o(?:mer|roa)|heiro)|(?:umbig|vead)o|(?:pomb|ant)a|tinguijar|minas))|aris?-minhoca)|ticu(?:ns-(?:(?:espinh|cheir)o|(?:jangad|pac)a|boia?)|m-(?:(?:espinh|cheir)o|(?:jangad|pac)a|boia?))|nha(?:s-(?:água|coco)|-(?:água|coco))|(?:pocas?-cheir|rutas?-porc)o)|r(?:aia(?:s-(?:coroa|fogo)|-(?:coroa|fogo))|ozes-(?:telhad|rat)o|udas?-campinas)|oeira(?:s-(?:(?:goiá|mina)s|capoeira|bugre)|-(?:(?:goiá|mina)s|capoeira|bugre))|(?:lequi(?:ns|m)-caien|cos?-pip)a)|n(?:g(?:ico(?:s-(?:m(?:onte|ina)s|banhado|curtume)|-(?:m(?:onte|ina)s|banhado|curtume))|eli(?:ns|m)-(?:espinh|morceg)o|élicas?-rama)|a(?:n(?:ases-(?:caraguatá|agulha)|ás-(?:caraguatá|agulha))|mbés?-capuz)|dorinha(?:s-(?:bando|casa)|-(?:bando|casa))|ingas?-(?:espinh|macac)o|u(?:n?s|m)?-enchente|z(?:óis|ol)-lon
tra)|m(?:or(?:e(?:ira(?:s-(?:espinho|árvore)|-(?:espinho|árvore))|s-(?:(?:(?:vaquei|bur)r|hortelã)o|moça))|-(?:(?:(?:vaquei|bur)r|hortelã)o|moça))|e(?:ndo(?:i(?:ns-(?:árvore|veado)|m-(?:árvore|veado))|eiras?-coco)|ixa(?:s-(?:madagascar|espinho)|-(?:madagascar|espinho)))|êndoas?-(?:espinh|coc)o)|b(?:elha(?:s-(?:c(?:(?:achorr|hã)o|upim)|(?:rein|fog|our|sap)o|p(?:urga|au))|-(?:(?:rein|fog|our|sap)o|p(?:urga|au)|cupim))|(?:utuas?-batat|óbora-coro)a|r(?:icós?-macaco|aços?-vide))|s(?:a(?:s-(?:pa(?:pagaio|lha)|(?:barat|telh)a|sabre)|-(?:pa(?:pagaio|lha)|(?:barat|telh)a|sabre))|pargo(?:s-(?:jardim|sala)|-(?:jardim|sala))|so[bv]ios?-(?:cobr|folh)a)|ça(?:f(?:ate(?:s-(?:o[iu]ro|prata)|-(?:o[iu]ro|prata))|roeiras?-pernambuco)|ís?-caatinga)|g(?:ulh(?:(?:ões|ão)-(?:(?:trombe|pra)t|vel)a|as?-pastor)|rílicas?-rama)|zed(?:inha(?:s-(?:corumbá|goiás)|-(?:corumbá|goiás))|as-ovelha)|ve(?:nca(?:s-(?:espiga|minas)|-(?:espiga|minas))|s?-crocodilo)|(?:ipos?-montevid|carás?-v)éu|tu(?:ns|m)-galha)|m(?:a(?:r(?:acujá(?:s-(?:c(?:a(?:iena|cho)|o(?:rtiç|br)a|heiro)|(?:ga(?:rap|vet)|mochil)a|pe(?:riquito|dra)|est(?:rada|alo)|(?:alh|rat)o)|-(?:c(?:a(?:iena|cho)|o(?:rtiç|br)a|heiro)|(?:ga(?:rap|vet)|mochil)a|pe(?:riquito|dra)|est(?:rada|alo)|(?:alh|rat)o))|m(?:el(?:adas?-(?:ca(?:chorr|val)|invern|verã)o|(?:eir)?os?-bengala)|itas?-macaco)|reco(?:s-(?:pequim|ruão)|-(?:pequim|ruão))|imbondos?-chapéu|quesas?-belas)|c(?:a(?:co(?:s-(?:(?:cheir|band)o|noite|sabá)|-(?:(?:cheir|band)o|noite|sabá))|mbira(?:s-(?:(?:flech|pedr)a|serrote)|-(?:(?:flech|pedr)a|serrote))|quinhos?-bambá)|ieira(?:s-(?:(?:anáfeg|coro)a|boi)|-(?:(?:anáfeg|coro)a|boi))|elas?-(?:tabuleir|botã)o|ucus?-paca)|n(?:gue(?:s-(?:(?:(?:pend|bot)ã|sapateir|espet)o|obó)|-(?:(?:(?:pend|bot)ã|sapateir|espet)o|obó))|t(?:imento(?:s-(?:araponga|pobre)|-(?:araponga|pobre))|as?-bretão)|jeric(?:ões|ão)-(?:ceilã|molh)o|d(?:ibis?-juntas|acarus?-boi))|çã(?:s-(?:c(?:(?:[au]c|rav)o|ipreste|obra)|a(?:náfega|rrátel)|(?:espelh|prat)o|rosa|vime|boi)|-(?:c(?:(?:[au]c
|rav)o|ipreste|obra)|a(?:náfega|rrátel)|(?:espelh|prat)o|rosa|vime|boi))|t(?:inho(?:s-(?:agulhas|lisboa|sargo)|-(?:agulhas|lisboa|sargo))|o(?:s-(?:engodo|salema)|-(?:engodo|salema)))|m(?:(?:icas?-(?:ca(?:chorr|del)|porc)|(?:ões|ão)-cord)a|oeiro(?:s-(?:espinho|corda)|-(?:espinho|corda)))|lva(?:s-(?:(?:cheir|pendã)o|marajó)|-(?:marajó|pendão)|íscos?-pernambuco)|d(?:ressilvas?-cheiro|eiras?-rei)|itacas?-maximiliano|parás?-cametá)|o(?:s(?:ca(?:s-(?:b(?:a(?:nheir|gaç)o|ich(?:eira|o))|e(?:lefante|stábulo)|ca(?:valos?|sa)|f(?:reira|ogo)|(?:madei|u)ra|inverno)|-(?:b(?:a(?:nheir|gaç)o|ich(?:eira|o))|e(?:lefante|stábulo)|ca(?:valos?|sa)|(?:madei|u)ra|fogo)|t(?:éis-(?:setúbal|jesus)|el-(?:setúbal|jesus)))|quitos?-parede)|ela(?:s-(?:mutum|ema)|-(?:mutum|ema))|n(?:stros?-gila|cos?-peru|tes?-ouro)|(?:uriscos?-sement|reias?-mangu)e|c(?:itaíbas?-leite|hos?-orelhas)|longós?-colher)|u(?:r(?:ici(?:s-(?:(?:tabuleir|porc)o|lenha)|-(?:(?:tabuleir|porc)o|lenha))|uré(?:s-(?:canudo|pajés)|-(?:canudo|pajé))|ta(?:s-(?:cheiro|parida)|-parida))|s(?:go(?:s-(?:irlanda|perdão)|-(?:irlanda|perdão))|aranhos?-água)|tu(?:ns-(?:asso[bv]io|fava)|m-(?:asso[bv]io|fava))|çambés?-espinhos|ngunzás?-cortar)|i(?:lh(?:o(?:s-(?:cobr|águ)|-águ)a|ãs?-pendão)|mos(?:as?-vereda|os?-cacho)|neiras?-petrópolis|cos?-topete|jos?-cavalo|olos-capim)|el(?:(?:(?:ões|ão)-(?:cabocl|morceg|soldad)|oeiros?-soldad)o|(?:ros?-(?:coleir|águ)|ancias?-cobr)a))|e(?:rv(?:a(?:s-(?:m(?:a(?:l(?:eitas|aca)|caé)|u(?:lher|ro)|o[iu]ra|endigo)|p(?:a(?:(?:rid|in)a|ssarinho)|(?:ântan|iolh)o|ontada)|a(?:n(?:(?:dorinh|t)a|jinho|il)|l(?:finete|ho)|mor)|b(?:(?:(?:ascul|ic)h|otã)o|(?:esteir|álsam)os|ugre)|c(?:(?:abr(?:it)?|obr)a|h(?:eir|umb)o)|sa(?:n(?:t(?:iago|ana)|gue)|(?:le)?po)|l(?:a(?:(?:vadeir|c)a|garto)|ouco)|g(?:o(?:[mt]|iabeir)a|uiné|elo)|f(?:(?:og|um|i)o|ebra)|(?:r(?:ober|a)t|our)o|ja(?:raraca|buti)|impingem|esteira)|-(?:m(?:a(?:l(?:eitas|aca)|caé)|u(?:lher|ro)|o[iu]ra|endigo)|p(?:a(?:(?:rid|in)a|ssarinho)|(?:ântan|iolh)o|ontada)|a(?:l(?:fine
te|míscar|ho)|n(?:(?:dorinh|t)a|il)|mor)|b(?:(?:(?:ascul|ic)h|otã)o|(?:esteir|álsam)os|ugre)|sa(?:n(?:t(?:iago|ana)|gue)|(?:le)?po)|c(?:(?:abr(?:it)?|obr)a|(?:humb|ã)o)|l(?:a(?:(?:vadeir|c)a|garto)|ouco)|g(?:o(?:[mt]|iabeir)a|uiné|elo)|f(?:(?:og|um|i)o|ebra)|(?:r(?:ober|a)t|our)o|ja(?:raraca|buti)|impingem|esteira))|i(?:lha(?:s-(?:(?:cheir|pomb)o|(?:árvo|leb)re|(?:angol|vac)a)|-(?:(?:cheir|pomb)o|(?:árvo|leb)re|(?:angol|vac)a))|nhas?-parida))|s(?:p(?:i(?:n(?:h(?:o(?:s-(?:c(?:a(?:(?:chor|rnei)ro|çada)|r(?:isto|uz)|erca)|(?:bananeir|agulh|roset)a|(?:ladrã|tour|urs)o|j(?:erusalém|udeu)|mari(?:ana|cá)|vintém|deus)|-(?:c(?:a(?:(?:chor|rnei)ro|çada)|r(?:isto|uz)|erca)|(?:bananeir|agulh|roset)a|(?:ladrã|tour|urs)o|j(?:erusalém|udeu)|mari(?:ana|cá)|vintém|deus))|eiro(?:s-(?:c(?:a(?:rneiro|iena)|risto|erca)|j(?:erusalém|udeu)|a(?:gulh|meix)a|vintém)|-(?:c(?:a(?:rneiro|iena)|risto|erca)|j(?:erusalém|udeu)|a(?:gulh|meix)a|vintém))|as?-(?:carneir|vead)o)|afres?-cuba)|ga(?:s-(?:(?:sangu|leit)e|ferrugem|água)|-(?:(?:sangu|leit)e|ferrugem|água)))|onjas?-raiz)|c(?:a(?:móneas?-alepo|das?-jabuti)|ovas?-macaco|umas?-sangue)|tercos?-jurema)|mbira(?:s-(?:ca(?:rrapato|çador)|(?:porc|sap)o)|-(?:ca(?:rrapato|çador)|(?:porc|sap)o))|n(?:xertos?-passarinho|redadeiras?-borla))|g(?:r(?:a(?:m(?:a(?:s-(?:p(?:(?:ernambuc|ast)o|onta)|(?:forquilh|sananduv)a|c(?:oradouro|idade)|ja(?:cobina|rdim)|ma(?:rajó|caé)|adorno)|-(?:p(?:(?:ernambuc|ast)o|onta)|(?:forquilh|sananduv)a|c(?:oradouro|idade)|ja(?:cobina|rdim)|ma(?:rajó|caé)|adorno))|inha(?:s-(?:campinas|jacobina|raiz)|-(?:campinas|jacobina|raiz)))|vatá(?:s-(?:(?:moquec|agulh)a|c(?:o[iu]ro|erca)|(?:ganch|lajed)o|r(?:aposa|ede)|árvore|tingir)|-(?:(?:moquec|agulh)a|c(?:o[iu]ro|erca)|(?:ganch|lajed)o|r(?:aposa|ede)|árvore|tingir))|lhas?-crista)|ão(?:s-(?:(?:c(?:aval|humb)|(?:malu|bi)c|gal)o|p(?:orco|ulha))|-(?:(?:(?:malu|bi)c|(?:cav|g)al)o|p(?:orco|ulha))|zinhos?-galo)|inaldas?-viúva)|a(?:l(?:o(?:s-(?:p(?:enacho|luma)|b(?:ando|riga)|rebanho|fita|ebó)|-(?
:p(?:enacho|luma)|b(?:ando|riga)|rebanho|fita|ebó))|inha(?:s-(?:bugre|faraó|água)|-(?:bugre|faraó|água)))|fanhoto(?:s-(?:(?:(?:marmel|coqu)eir|arribaçã)o|(?:jurem|prag)a)|-(?:(?:(?:marmel|coqu)eir|arribaçã)o|(?:jurem|prag)a))|vi(?:ões-(?:(?:(?:colei|ser)r|queimad)a|a(?:nta|ruá)|penacho)|ão-(?:(?:(?:colei|ser)r|queimad)a|a(?:nta|ruá)|penacho))|meleira(?:-(?:(?:lombrigueir|p(?:in|ur)g)a|(?:cansaç|venen)o)|s-(?:(?:cansaç|venen)o|lombrigueiras|p(?:in|ur)ga))|to(?:-(?:madagáscar|algália)|s-algália)|r(?:oupas?-segunda|gantas-ferro))|u(?:a(?:birob(?:eira(?:s-(?:cachorro|minas)|-(?:cachorro|minas))|a(?:s-(?:cachorro|minas)|-(?:cachorro|minas)))|ricangas?-bengala)|iratãs?-coqueiro)|o(?:iab(?:a(?:s-(?:(?:espinh|macac)o|anta)|-(?:(?:espinh|macac)o|anta))|eiras?-(?:cuti|pac)a)|meiros?-minas|gós?-guariba|elas?-lobo)|e(?:rgeli(?:ns|m)-laguna|ngibres?-dourar)|irass(?:óis|ol)-batatas)|t(?:r(?:e(?:vo(?:s-(?:c(?:ar(?:retilha|valho)|heiro)|(?:se[ar]r|águ)a)|-(?:c(?:ar(?:retilha|valho)|heiro)|(?:se[ar]r|águ)a))|moço(?:s-(?:cheiro|jardim|minas)|-(?:cheiro|jardim|minas)))|i(?:go(?:s-(?:p(?:rioste|erdiz)|milagre|israel|verão)|-(?:p(?:rioste|erdiz)|milagre|israel|verão))|colino(?:s-c(?:hifre|rista)|-c(?:hifre|rista))|nca(?:is|l)-pau)|(?:aças?-bibliotec|épanos?-coro)a|omb(?:as?-elefante|etas?-arauto)|utas?-lago)|a(?:i(?:uiá(?:s-(?:c(?:omer|ipó)|pimenta|jardim|quiabo|goiás)|-(?:c(?:omer|ipó)|pimenta|jardim|quiabo|goiás))|nhas?-(?:cors|ri)o)|r(?:taruga(?:s-(?:couro|pente)|-(?:couro|pente))|umã(?:s-espinhos?|-espinhos?))|m(?:b(?:etarus?-espinh|ori[ls]-brav)o|anqueiras?-leite)|j(?:ujás?-(?:cabacinh|quiab)o|ás?-cobra)|(?:xizeiros?-tint|tus?-folh)a|b(?:ocas?-marajó|acos?-cão)|quaris?-cavalo)|i(?:n(?:gui(?:s-(?:c(?:(?:aien|ol)a|ipó)|(?:leit|peix)e)|-(?:c(?:(?:aien|ol)a|ipó)|(?:leit|peix)e))|hor(?:ões|ão)-lombriga)|mbó(?:s-(?:boticário|caiena|jacaré|peixe|raiz)|-(?:boticário|caiena|jacaré|peixe|raiz))|gres?-bengala)|o(?:m(?:at(?:e(?:s-(?:princesa|árvore)|-(?:princesa|árvore))|inhos?-capucho)|ilhos?
-creta)|(?:rós?-espinh|adas-cour)o|petes?-cardeal)|u(?:c(?:u(?:ns-(?:carnaúba|redes)|m-(?:carnaúba|redes))|anos?-cinta)|bar(?:ões|ão)-focinho|lipas?-jardim|ias?-areia)|e(?:m(?:betarus?-espinho|porãos?-coruche)|rebintina-quio)|úberas?-(?:invern|verã)o)|r(?:a(?:to(?:s-(?:p(?:a(?:lmatória|iol)|entes|raga)|(?:es(?:pinh|got)|algodã)o|(?:t(?:aquar|romb)|águ)a|c(?:ouro|asa)|fa(?:raó|va)|bambu)|-(?:p(?:a(?:lmatória|iol)|entes|raga)|(?:es(?:pinh|got)|algodã)o|(?:t(?:aquar|romb)|águ)a|c(?:ouro|asa)|fa(?:raó|va)|bambu))|ízes-(?:c(?:(?:edr|urv)o|o(?:bra|rvo)|h(?:eiro|á)|âmaras|ana)|b(?:(?:ar?beir|randã)o|ugre)|(?:angélic|mostard|quin)a|l(?:agarto|opes)|sol(?:teira)?|t(?:ucano|iú)|f(?:rade|el)|guiné|pipi)|iz-(?:c(?:(?:edr|urv)o|o(?:bra|rvo)|h(?:eiro|á)|âmaras|ana)|b(?:(?:ar?beir|randã)o|ugre)|(?:angélic|mostard|quin)a|l(?:agarto|opes)|sol(?:teira)?|t(?:ucano|iú)|f(?:rade|el)|guiné|pipi)|b(?:uge(?:ns|m)-cachorr|anetes?-caval)o|m(?:as?-bezerro|os?-seda)|pés?-saci)|o(?:s(?:a(?:-(?:(?:c(?:a(?:chorr|bocl)|h?ã)|[bl]ob|defunt|o[iu]r|musg)o|p(?:áscoa|au)|jericó|toucar)|s-(?:(?:c(?:a(?:chorr|bocl)|h?ã)|[bl]ob|defunt|o[iu]r|musg)o|jericó|páscoa|toucar))|ário(?:s-(?:jamb[ou]|ifá)|-(?:jamb[ou]|ifá))|e(?:tas?-pernambu|iras?-damas)co)|uxin(?:óis-(?:m(?:uralha|anaus)|(?:espadan|jav)a|caniços)|ol-(?:m(?:uralha|anaus)|(?:espadan|jav)a|caniços))|(?:balos?-(?:arei|galh)|az(?:es)?-bandeir)a|ca(?:s-(?:flores|eva)|-(?:flores|eva)))|(?:e(?:sedás?-cheir|des?-leã)|ábanos?-caval)o|inocerontes?-Java)|s(?:a(?:l(?:sa(?:s-(?:c(?:a(?:stanheiro|valos)|heiro|upim)|(?:roch|águ)a|burro)|-(?:c(?:a(?:stanheiro|valos?)|upim)|(?:roch|águ)a|burro)|parrilhas?-lisboa)|va(?:s-(?:pernambuco|marajó)|-(?:pernambuco|marajó))|amandras?-água)|r(?:a(?:ndis?-(?:(?:carangu|gargar)ej|espinh)o|(?:magos?-águ|s?-pit)a)|dinha(?:s-(?:ga(?:lha|to)|laje)|-(?:ga(?:lha|to)|laje))|go(?:s-(?:beiço|dente)|-(?:beiço|dente))|ros?-pito)|n(?:haç(?:o(?:s-(?:(?:(?:coqu|mamo)eir|fog)o|encontros)|-(?:(?:(?:coqu|mamo)eir|fog)o|encontros))|us?-(?:encon
t|mamoei)ro)|ãs?-samambaia)|p(?:(?:ucaias?-castanh|és?-capoeir)a|o(?:-chifres?|s-chifre))|gui(?:n?s|m)?-bigode|mambaias?-penacho|[bv]acus?-coroa)|u(?:rucucu(?:s-(?:p(?:ati|ind)oba|fogo)|-(?:p(?:ati|ind)oba|fogo))|ma(?:umeiras?-macaco|gres?-provença|rés?-pedras))|orgo(?:s-(?:(?:vassour|espig)a|pincel|alepo)|-(?:(?:vassour|espig)a|pincel|alepo))|iris?-coral)|j(?:a(?:ca(?:r(?:andá(?:s-(?:campinas|espinho|sangue)|-(?:campinas|espinho|sangue))|és?-óculos)|(?:tir(?:ões|ão)-capot|s?-pobr)e)|smi(?:ns-(?:c(?:a(?:chorro|iena)|erca)|soldado|leite)|m-(?:c(?:a(?:chorro|iena)|erca)|soldado|leite))|(?:mb(?:eir)?os?-malac|lapas?-lisbo|puçá-coleir)a|tobá(?:s-(?:porco|anta)|-(?:porco|anta))|buticab(?:eiras?-campinas|as?-cipó)|r(?:aracas?-agosto|rinhas?-franja))|u(?:n(?:co(?:s-(?:c(?:a(?:ngalh|br)|obr)a|banhado)|-(?:c(?:a(?:ngalh|br)|obr)a|banhado))|ta(?:s-c(?:alangro|obra)|-c(?:alangro|obra))|ças-c(?:heiro|onta))|á(?:s-c(?:apote|omer)|-c(?:apote|omer))|rubebas?-espinho|ciris?-comer|quis?-cerca)|o(?:ões-(?:santarém|barros?|leite|puça)|ão-(?:santarém|barro|leite|puça))|e(?:quitibás?-agulheir|taís?-pernambuc)o|i(?:queranas?-goiás|tiranas?-leite))|l(?:i(?:m(?:(?:a(?:s-(?:cheir|umbig|bic)|-(?:umbig|bic))|eiras?-umbig)o|ões-(?:c(?:aiena|heiro)|galinha)|ão-(?:c(?:aiena|heiro)|galinha)|os?-manta)|n(?:ho(?:s-(?:raposa|cuco)|-(?:raposa|cuco))|gu(?:eir(?:ões|ão)-canud|ados?-ri)o)|x(?:a(?:s-(?:lei|pau)|-(?:lei|pau))|inhas?-fundura))|a(?:ranj(?:a(?:s-(?:(?:terr|onç)a|umbigo)|-(?:umbigo|onça))|eiras?-vaqueiro)|g(?:art(?:as?-(?:vidr|fog)o|os?-água)|ostas?-espinho)|lás?-cintura)|e(?:it(?:e(?:s-(?:ga(?:meleir|linh)a|cachorro)|-(?:ga(?:meleir|linh)a|cachorro)|iras?-espinho)|ugas?-burro)|sma-conchinha)|o(?:ur(?:eiro(?:s-(?:jardim|apolo)|-(?:jardim|apolo))|os?-cheiro)|ireiros?-apolo)|u(?:(?:tos-quaresm|vas-pastor)a|zernas?-sequeiro)|írios?-petrópolis)|v(?:a(?:sso(?:urinha(?:s-(?:(?:relógi|botã)o|varrer)|-(?:(?:relógi|botã)o|varrer))|ira(?:s-(?:fe(?:iticeira|rro)|bruxa)|-(?:feiticeir|brux)a))|ra(?:s-(?:f
oguete|o[iu]ro|canoa)|-o[iu]ro)|les?-arinto)|e(?:r(?:g(?:onhas?-estudante|as?-jabuti)|(?:melhinhas?-galh|ças?-cã)o)|spa(?:s-(?:rodeio|cobra)|-(?:rodeio|cobra))|l(?:ames?-cheiro|udos?-penca)|ados?-virgínia|nenos?-porco)|i(?:d(?:eiras?-enforcado|oeiros?-papel)|oletas?-(?:par|da)ma|nháticos?-espinho)|oador(?:es)?-pedra)|qu(?:i(?:n(?:a(?:s-(?:(?:per(?:nambuc|iquit)|vead)o|c(?:errado|aiena|ipó)|r(?:e(?:mígi|g)o|aiz)|goiás)|-(?:(?:per(?:nambuc|iquit)|vead)o|c(?:errado|aiena|ipó)|r(?:e(?:mígi|g)o|aiz)|goiás))|gombós?-(?:espinh|cheir)o)|ab(?:o(?:s-(?:c(?:aiena|heiro|ipó)|(?:angol|quin)a)|-(?:c(?:aiena|heiro|ipó)|(?:angol|quin)a)|ranas?-espinho)|eiros?-angola)|(?:gombós?-cheir|to-pernambuc)o|bondos?-água)|ati(?:s-(?:bando|vara)|-(?:bando|vara))|ássias?-caiena)|á(?:rvore(?:s-(?:(?:bálsam|incens|ranc?h|seb)o|(?:gra(?:lh|x)|orquíde)a|c(?:hocalho|oral|uia)|l(?:ótus|eite|ã)|(?:jud|vel)as|a(?:rr|n)oz|pagode|natal)|-(?:(?:bálsam|incens|ranc?h|seb)o|(?:gra(?:lh|x)|orquíde)a|c(?:hocalho|oral|uia)|l(?:ótus|eite|ã)|(?:jud|vel)as|a(?:rr|n)oz|pagode|natal))|(?:gu(?:as?-colóni|ias?-poup)|caros?-galinh)a)|i(?:n(?:ha(?:me(?:s-(?:c(?:oriolá|ão)|lagartixa|enxerto|benim)|-(?:c(?:oriolá|ão)|lagartixa|enxerto|benim))|íbas?-rego)|censos?-caiena|gás?-fogo)|mb(?:(?:ur(?:ana(?:s-(?:c(?:ambã|heir)|espinh)|-(?:espinh|cambã))|is?-cachorr)|aúbas?-(?:cheir|vinh))o|és?-(?:amarra|come)r)|p(?:ês?-impingem|ecas?-cuiabá)|xoras-cheiro|scas?-sola)|n(?:o(?:z(?:-(?:b(?:a(?:tauá|nda)|ugre)|co(?:(?:br|l)a|co)|(?:arec|galh)a)|es-(?:(?:co(?:br|l)|arec|galh)a|b(?:a(?:tauá|nda)|ugre)))|gueira(?:s-(?:cobra|pecã)|-(?:cobra|pecã)))|a(?:(?:rciso(?:-(?:invern|cheir)|s-cheir)|valhas?-macac)o|nás?-raposa)|iqui(?:ns-(?:areia|saco)|m-areia)|ené(?:ns|m)-galinha|ós-cachorro)|u(?:va(?:s-(?:(?:(?:espin|fac)h|g(?:enti|al)|c(?:heir|ã)|urs)o|r(?:ato|ei)|praia|obó)|-(?:(?:(?:espin|fac)h|g(?:enti|al)|urs|cã)o|praia|obó|rei))|(?:irapurus?-band|xis?-morceg|bás?-fach)o|queté(?:s-(?:água|obó)|-(?:água|obó))|m(?:baranas?-abelha|iris?-cheiro)
|apuçás?-coleira|ntués?-obó)|h(?:ortelã(?:s-(?:c(?:a(?:mpina|valo)|heiro)|b(?:urro|oi)|leite)|-(?:c(?:a(?:mpina|valo)|heiro)|b(?:urro|oi)|leite))|idras?-água)|o(?:liveira(?:s-(?:marrocos|cheiro)|-(?:marrocos|cheiro))|iti(?:s-porco|-porco)|stras?-pobre)|óleos?-copaíba|ç(?:ana-áçúcar|or-Rosa)|ébanos?-zanzibar|xexéu-bananeira|Grão-Bico))\b">
<!ENTITY hyphenised_expressions "(?U)\b(?!feij(?:ão|ões)-frade)((?:c(?:a(?:r(?:rap(?:icho(?:s-(?:c(?:a(?:(?:rneir|val)o|lçada)|igana)|(?:agu|ove)lha|l(?:inho|ã)|boi)|-(?:c(?:a(?:(?:rneir|val)o|lçada)|igana)|(?:agu|ove)lha|l(?:inho|ã)|boi))|ato(?:s-(?:p(?:assarinho|eixe)|(?:caval|sap)o|galinha|boi)|-(?:p(?:assarinho|eixe)|(?:caval|sap)o|galinha|boi)))|u(?:ru(?:s-(?:(?:s(?:oldad|ap)|cach|vead)o|espi(?:nho|ga)|po(?:mba|rco))|-(?:(?:s(?:oldad|ap)|cach|vead)o|espi(?:nho|ga)|po(?:mba|rco)))|atás?-pau)|d(?:o(?:s-(?:(?:(?:bur|ou)r|vis[cg])o|co(?:chonilha|alho|mer)|isca)|-(?:(?:(?:bur|ou)r|vis[cg])o|co(?:chonilha|alho|mer)|isca))|ea(?:is|l)-poupa)|á(?:s-(?:(?:sapateir|cabocl|espinh)o|(?:angol|pedr|águ)a|jardim)|-(?:(?:sapateir|cabocl|espinh)o|(?:pedr|águ)a|jardim))|a(?:n(?:ha(?:s-(?:viveir|toc|ri)|-(?:viveir|toc))o|guejos?-pedra)|guatás?-jardim)|(?:v(?:ões|ão)-ferreir|mim-cártam)o|nes-donzela)|p(?:i(?:ns-(?:b(?:o(?:t(?:ão|a)|lota|de)|a(?:ndeira|tatais)|u(?:cha|rro)|ezerro)|c(?:a(?:(?:rneir|val)o|(?:piva|b)ra)|o(?:ntas|rte|co)|heiro|uba)|t(?:(?:artarug|ouceir)a|e(?:nerife|so))|(?:m(?:a(?:rrec|nad)|ul)|esteir|égu)a|p(?:(?:ernambuc|ast|omb)o|lanta)|f(?:o(?:rquilha|go)|lecha|eixe)|r(?:o(?:[ls]a|des)|ebanho|aiz)|a(?:n(?:gola|dar)|çude)|s(?:apo|oca)|diamante|lastro|natal|itu)|m-(?:b(?:o(?:t(?:ão|a)|lota|de)|a(?:ndeira|tatais)|u(?:cha|rro)|ezerro)|c(?:a(?:(?:rneir|val)o|(?:piva|b)ra)|o(?:ntas|rte|co)|heiro|uba)|(?:m(?:a(?:rrec|nad)|ul)|esteir|égu|soc)a|t(?:(?:artarug|ouceir)a|e(?:nerife|so))|p(?:(?:ernambuc|ast|omb)o|lanta)|f(?:o(?:rquilha|go)|lecha|eixe)|r(?:o(?:[ls]a|des)|ebanho|aiz)|a(?:n(?:gola|dar)|çude)|d(?:iamante|eus)|lastro|natal|itu)|tã(?:es|o)-sa(?:ír|l)a|xinguis?-bicho)|elas?-viúva)|n(?:a(?:s-(?:(?:(?:chei|bur)r|passarinh|macac)o|(?:v(?:asso[iu]|íbo)r|frech|roc)a|(?:jacar|imb)é|elefante|açúcar|urubu)|-(?:(?:v(?:asso[iu]|íbo)r|frech|roc)a|(?:passarinh|macac|burr)o|(?:jacar|imb)é|elefante|açúcar|urubu)|fístula(?:s-(?:igapó|lagoa|boi)|-(?:igapó|lagoa|boi)))|el(?:a(?:s-(?:c
(?:a(?:poeira|tarro)|(?:eilã|heir)o|utia)|v(?:e(?:ad|lh)o|argem)|g(?:arça|oiás)|papagaio|jacamim|ema)|-(?:c(?:a(?:poeira|tarro)|utia)|v(?:argem|eado)|g(?:arça|oiás)|papagaio|jacamim|ema))|eira(?:s-(?:cheiro|ema)|-(?:cheiro|ema)))|udo(?:s-(?:cachimbo|lagoa)|-(?:cachimbo|lagoa))|(?:ários?-franç|iços?-águ)a|sanç(?:ões|ão)-leite|oés?-botão)|s(?:t(?:a(?:nh(?:a(?:-(?:(?:á(?:fric|gu)|a(?:rar|nt))a|m(?:oçambique|acaco|inas)|c(?:aiaté|utia)|p(?:eixe|uri)|jatobá|bugre)|s-(?:(?:á(?:fric|gu)|a(?:rar|nt))a|m(?:oçambique|acaco|inas)|c(?:aiaté|utia)|p(?:eixe|uri)|bugre))|eiros?-minas)|s?-correr)|or(?:es)?-montanha)|c(?:a(?:s-(?:carvalho|jacaré|anta)|-(?:carvalho|jacaré|noz))|o(?:s-(?:cavalo|jabuti|tatu)|-(?:cavalo|jabuti|tatu))|udo(?:s-(?:enfeite|aranha)|-(?:enfeite|aranha))))|m(?:a(?:r(?:á(?:s-(?:(?:c(?:aval|heir)|espinh)o|b(?:ilro|oi)|flecha)|-(?:(?:c(?:aval|heir)|espinh)o|b(?:ilro|oi)|flecha))|ões-(?:pe(?:nedo|dra)|estalo|areia)|ão-(?:pe(?:nedo|dra)|estalo|areia))|le(?:ões-(?:pedreira|asas)|ão-(?:pedreira|asas)))|b(?:ará(?:s-(?:c(?:h(?:eir|umb)o|apoeira)|espinho|lixa)|-(?:c(?:h(?:eir|umb)o|apoeira)|espinho|lixa))|oatãs?-leite)|urus?-cheiro)|c(?:himbo(?:s-(?:(?:maca|tur)co|jabuti)|-(?:(?:maca|tur)co|jabuti))|au(?:s-(?:ca(?:racas|iena)|mico)|-(?:ca(?:racas|iena)|mico))|tos?-cabeça)|b(?:a(?:(?:ças?-trombet|cinhas?-cobr)a|s-(?:igreja|ladrão|peixe)|-(?:igreja|ladrão|peixe))|u(?:mbos?-azeite|rés?-orelha))|val(?:inho(?:-(?:judeu|deus|cão)|s-judeu)|o-cão)|fé(?:s-b(?:agueio|ugre)|-b(?:agueio|ugre))|t(?:ingueiros?-porc|otas?-espinh)o|avuranas?-cunhã)|o(?:c(?:o(?:-(?:(?:b(?:acai(?:aú|u)b|ocaiuv)|p(?:almeir|indob|urg)|quar(?:esm|t)|oitav)a|v(?:a(?:queiro|ssoura)|inagre|eado)|c(?:(?:a(?:cho|ta)rr|igan)o|olher)|(?:espinh|rosári|macac|óle)o|i(?:ndaiá|ri)|(?:gur|a)iri|na(?:tal|iá)|dendê)|s-(?:(?:b(?:acai(?:aú|u)b|ocaiuv)|p(?:almeir|indob|urg)|quar(?:esm|t)|oitav)a|v(?:a(?:queiro|ssoura)|inagre|eado)|c(?:(?:a(?:cho|ta)rr|igan)o|olher)|(?:espinh|rosári|macac|óle)o|i(?:ndaiá|ri)|na(?:tal|iá)|dend
ê|guriri))|(?:honilhas?-cer|as?-águ)a)|bra(?:-(?:c(?:a(?:p(?:elo|im)|scavel|belo|ju)|o(?:lchete|ral)|ipó)|(?:es[cp]ad|ferradur|barat|águ)a|(?:v(?:ead|idr)|lix|oc)o|a(?:r(?:eia)?|sa)|pernas|ratos?)|s-(?:c(?:a(?:p(?:elo|im)|scavel|belo|ju)|o(?:lchete|ral)|ipó)|(?:ferradur|barat|espad|águ)a|(?:v(?:ead|idr)|lix|oc)o|a(?:r(?:eia)?|sa)|pernas|ratos?))|l(?:a(?:-(?:(?:(?:sapatei|zor)r|caval)o|peixe)|s-(?:(?:caval|zorr)o|peixe))|eir(?:o(?:s-(?:(?:band|choc)o|sapé)|-(?:(?:band|choc)o|sapé))|as?-sapé))|r(?:(?:uj(?:as?|ão)-igrej|tiças?-montanh|-ros)a|vina(?:s-(?:corso|linha)|-(?:corso|linha))|d(?:ões|ão)-frade|reias?-inverno)|e(?:rana(?:s-(?:(?:caravel|min)as|pernambuco)|-(?:(?:caravel|min)as|pernambuco))|ntro(?:s-caboclos|-caboclo))|gumelo(?:s-(?:c(?:aboclo|hapéu)|(?:sangu|leit)e|paris)|-(?:c(?:aboclo|hapéu)|(?:sangu|leit)e|paris))|uve(?:s-(?:a(?:dorno|reia)|(?:saboi|águ)a|cortar)|-(?:a(?:dorno|reia)|(?:saboi|águ)a|cortar))|n(?:gonha(?:s-(?:caixeta|bugre|goiás)|-(?:caixeta|bugre|goiás))|durus?-sangue|tas?-cabra)|irana(?:s-(?:(?:caravel|min)as|pernambuco)|-(?:(?:caravel|min)as|pernambuco))|(?:xas?-(?:d(?:am|on)|freir)|mer(?:es)?-arar|tovias?-poup)a|queiro(?:s-(?:vassoura|dendê)|-(?:vassoura|dendê))|paibeiras?-minas)|ipó(?:-(?:c(?:a(?:r(?:neiro|ijó)|b(?:oclo|aça)|noa)|o(?:r(?:ação|da)|(?:br|l)a)|h(?:agas|umbo)|u[mn]anã|esto)|a(?:l(?:caçuz|ho)|r(?:acuã|c)o|marrar|gulha)|m(?:a(?:inibu|caco)|o(?:fumb|rceg)o|ucuna)|b(?:a(?:(?:mburra|rri)l|tata)|reu|oi)|j(?:a(?:b(?:ut[ái]|ota)|rrinha)|unta)|p(?:(?:a(?:in|lm)|oit)a|enas)|t(?:amanduá|ucunaré|imbó)|l(?:avadeira|eite)|v(?:aqueiro|iúva)|e(?:mbiri|scada)|im(?:pingem|bé)|g(?:ato|ota)|s(?:apo|eda)|(?:fo|re)go|quati|água)|s-(?:c(?:a(?:r(?:neiro|ijó)|b(?:oclo|aça)|noa)|o(?:r(?:ação|da)|(?:br|l)a)|u[mn]anã|hagas|esto)|a(?:l(?:caçuz|ho)|r(?:acuã|c)o|marrar|gulha)|m(?:a(?:inibu|caco)|o(?:fumb|rceg)o|ucuna)|b(?:a(?:(?:mburra|rri)l|tata)|reu|oi)|j(?:a(?:b(?:ut[ái]|ota)|rrinha)|unta)|p(?:(?:a(?:in|lm)|oit)a|enas)|t(?:amanduá|ucunaré|imbó)|l(?:avadei
ra|eite)|v(?:aqueiro|iúva)|e(?:mbiri|scada)|im(?:pingem|bé)|g(?:ato|ota)|s(?:apo|eda)|(?:fo|re)go|quati|água))|r(?:av(?:o(?:s-(?:(?:cabe(?:cinh|ç)|esperanç|sear)a|b(?:(?:astã|urr)o|ouba)|p(?:oeta|au)|defunto|tunes|urubu|amor)|-(?:(?:cabe(?:cinh|ç)|esperanç|sear)a|b(?:(?:astã|urr)o|ouba)|p(?:oeta|au)|defunto|tunes|urubu|amor))|in(?:a(?:s-(?:(?:lagartix|águ)a|ambrósio|tunes|pau)|-(?:(?:lagartix|águ)a|tunes|pau))|ho(?:s-(?:(?:lagartix|campin)a|defunto)|-(?:lagartixa|defunto))))|ista(?:s-(?:gal(?:inha|o)|mutum|peru)|-(?:gal(?:inha|o)|mutum|peru)|(?:is|l)-rocha))|e(?:bol(?:(?:a(?:s-(?:cheir|lob)|-lob)|inhas?-cheir)o|etas?-frança)|r(?:ej(?:as?-(?:caien|purg)|eiras?-purg)a|vejas?-pobre)|n(?:táureas?-jardim|ouras?-creta)|vadas?-jardim)|h(?:a(?:ga(?:s-(?:bauru|jesus)|-bauru)|scos?-leque)|u(?:p(?:ões|ão)-arroz|vas?-imbu))|u(?:tia(?:s-(?:rabo|pau)|-(?:rabo|pau))|(?:mbuc|i)as?-macaco)|ânhamo-manila)|p(?:a(?:u(?:s-(?:c(?:a(?:n(?:(?:galh|inan)a|deeiro|oas?|til)|m(?:peche|arão)|r(?:rapato|ne)|c(?:himbo|a)|i(?:bro|xa)|pitão|stor)|o(?:r(?:tiça|al)|n(?:ch|t)a|lher|bre)|h(?:a(?:pad|nc)a|i(?:cl|fr)e|eiro)|u(?:rt(?:ume|ir)|nanã|biú|tia)|erc?a|inzas|ruz)|s(?:a(?:n(?:t(?:ana|o)|gue)|p(?:ateir)?o|b(?:ão|iá)|ssafrás|lsa)|e(?:(?:rr|d)a|bo)|urriola|olar)|m(?:a(?:(?:n(?:jeriob|teig)|ri)a|(?:cac|str|lh)o)|o(?:(?:njol|rceg)o|quém|có)|(?:utamb|erd)a)|b(?:u(?:jarrona|gre|rro)|i(?:ch?o|lros)|a(?:rbas|lso)|o(?:[lt]o|ia)|r(?:incos|eu)|álsamo)|p(?:e(?:r(?:nambuco|eira)|nte)|r(?:eg(?:uiça|o)|aga)|i(?:ranha|lão)|o(?:mb|rc)o|ólvora)|l(?:a(?:g(?:arto|oa)|cre|nça)|e(?:(?:br|it)e|tras|pra)|i(?:vros|xa)|ágrima)|f(?:(?:a(?:[iv]|rinh)|ormig)a|(?:u[ms]|ígad)o|l(?:echas?|or)|e(?:bre|rro))|r(?:e(?:(?:spost|nd)a|(?:[gm]|in)o|de)|os(?:eira|as?)|a(?:inha|to))|e(?:s(?:p(?:inh|et)o|teira)|(?:rv(?:ilh)?|mbir)a|lefante)|a(?:(?:bóbor|ngol)a|r(?:ara|co)|l(?:ho|oé))|t(?:a(?:rtaruga|manco)|in(?:gui|ta)|ucano)|v(?:i(?:n(?:tém|ho)|ola)|e(?:ado|ia)|aca)|g(?:(?:asolin|om)a|ui(?:tarra|né))|j(?:erimum?|angada|udeu)|d(?:igestão|ed
al)|n(?:avalha|ovato)|o(?:rvalho|laria)|(?:incens|óle)o|(?:zebr|águ)a|qui(?:abo|na))|-(?:c(?:a(?:n(?:(?:galh|inan)a|deeiro|oas?|til)|m(?:peche|arão)|r(?:rapato|ne)|c(?:himbo|a)|i(?:bro|xa)|pitão|stor)|o(?:r(?:tiça|al)|n(?:ch|t)a|lher|bre)|h(?:a(?:pad|nc)a|i(?:cl|fr)e|eiro)|u(?:rt(?:ume|ir)|nanã|biú|tia)|erc?a|inzas|ruz)|s(?:a(?:n(?:t(?:ana|o)|gue)|p(?:ateir)?o|b(?:ão|iá)|ssafrás|lsa)|e(?:(?:rr|d)a|bo)|urriola|olar)|m(?:a(?:(?:n(?:jeriob|teig)|ri)a|(?:cac|str|lh)o)|o(?:(?:njol|rceg)o|quém|có)|(?:utamb|erd)a)|b(?:u(?:jarrona|gre|rro)|i(?:ch?o|lros)|a(?:rbas|lso)|o(?:[lt]o|ia)|r(?:incos|eu)|álsamo)|p(?:e(?:r(?:nambuco|eira)|nte)|r(?:eg(?:uiça|o)|aga)|i(?:ranha|lão)|o(?:mb|rc)o|ólvora)|l(?:a(?:g(?:arto|oa)|cre|nça)|e(?:(?:br|it)e|tras|pra)|i(?:vros|xa)|ágrima)|f(?:(?:a(?:[iv]|rinh)|ormig)a|(?:u[ms]|ígad)o|l(?:echas?|or)|e(?:bre|rro))|r(?:e(?:(?:spost|nd)a|(?:[gm]|in)o|de)|os(?:eira|as?)|a(?:inha|to))|e(?:s(?:p(?:inh|et)o|teira)|(?:rv(?:ilh)?|mbir)a|lefante)|t(?:a(?:rtaruga|manco)|in(?:gui|ta)|ucano)|v(?:i(?:n(?:tém|ho)|ola)|e(?:ado|ia)|aca)|a(?:(?:bóbor|ngol)a|l(?:ho|oé)|rco)|g(?:(?:asolin|om)a|ui(?:tarra|né))|j(?:erimum?|angada|udeu)|d(?:igestão|edal)|n(?:avalha|ovato)|o(?:rvalho|laria)|(?:incens|óle)o|(?:zebr|águ)a|qui(?:abo|na))|xis?-pedra)|l(?:m(?:eir(?:a(?:s-(?:(?:(?:palmi|ce)r|igrej)a|madagascar|dendê|leque|tebas|vinho)|-(?:(?:(?:palmi|ce)r|igrej)a|madagascar|dendê|leque|tebas|vinho))|inhas?-petrópolis)|a(?:s-(?:c(?:hicote|acho)|igreja|leque)|-(?:c(?:hicote|acho)|igreja|leque)|tórias?-espinho)|i(?:tos?-ferrão|lhas?-papa))|ha(?:s-(?:(?:penach|caniç)o|guiné|água)|-(?:(?:penach|caniç)o|guiné|água))|os-(?:calenturas|maria))|r(?:ic(?:á(?:s-(?:esponjas|curtume)|-(?:esponjas|curtume))|aranas?-espinhos)|a(?:cuuba(?:s-lei(?:te)?|-lei(?:te)?)|sitas?-samambaiaçu|tudos?-praia)|go(?:s-(?:m(?:itra|orro)|cótula)|-(?:m(?:itra|orro)|cótula)))|p(?:o(?:ila(?:s-(?:espinho|holanda)|-(?:espinho|holanda))|ula(?:s-(?:espinho|holanda)|-(?:espinho|holanda)))|agaio(?:s-cole(?:ira|te)|-cole(
?:ira|te)))|in(?:a(?:s-(?:s(?:apo|eda)|arbusto|penas|cuba)|-(?:s(?:apo|eda)|arbusto|penas|cuba))|eira(?:s-(?:c(?:ipó|uba)|leite)|-(?:c(?:ipó|uba)|leite)))|(?:c(?:o(?:vas?-macac|s?-golung)|as-rab)|ssarinhos?-(?:arribaç|ver)ã|nelas?-bugi)o|t(?:os?-c(?:a(?:rúncul|ien)|rist)a|i(?:nhos?-igapó|s?-goiás))|v(?:ões|ão)-java)|i(?:nh(?:eir(?:o(?:s-(?:(?:(?:pur|ri)g|casquinh)a|jerusalém|alepo)|-(?:(?:(?:pur|ri)g|casquinh)a|jerusalém|alepo))|inho(?:s-(?:jardim|sala)|-(?:jardim|sala)))|ões-(?:(?:cerc|purg)a|madagascar|ratos?)|ão-(?:(?:cerc|purg)a|madagascar|rato)|o(?:s-(?:flandres|riga)|-riga)|as?-raiz)|ment(?:a(?:s-(?:c(?:(?:aien|oro)a|heiro)|(?:ra[bt]|macac)o|g(?:alinha|entio)|bu(?:gre|ta)|queimar|água)|-(?:c(?:(?:aien|oro)a|heiro)|(?:ra[bt]|macac)o|g(?:alinha|entio)|bu(?:gre|ta)|queimar|água))|ões-c(?:aiena|heiro)|ão-c(?:aiena|heiro))|t(?:o(?:mb(?:a(?:s-(?:macaco|leite)|-(?:macaco|leite))|eiras?-marajó)|s-(?:água|saci)|-(?:água|saci))|a(?:ng(?:ueira(?:s-(?:cachorro|jardim)|-(?:cachorro|jardim))|as?-cachorro)|s?-erva)|eiras?-sinal)|olho(?:s-(?:(?:galinh|balei|onç)a|(?:soldad|tubarã)o|p(?:lanta|adre)|c(?:ação|obra)|faraó|urubu)|-(?:(?:galinh|balei|onç)a|(?:soldad|tubarã)o|p(?:lanta|adre)|c(?:ação|obra)|faraó|urubu))|(?:piras?-(?:máscar|prat)|quiás?-pedr|ão-purg)a|c(?:ões-tr(?:opeiro|epar)|ão-tropeiro)|xiricas?-bolas|raíbas?-pele)|e(?:r(?:a(?:s-(?:a(?:(?:guieir|lmeid)a|dvogado)|r(?:e(?:fego|i)|osa)|(?:cris|un)to|jesus|água)|-(?:a(?:(?:guieir|lmeid)a|dvogado)|r(?:e(?:fego|i)|osa)|(?:cris|un)to|jesus|água))|oba(?:s-(?:(?:pernambuc|reg)o|ca(?:ntagalo|mpos)|go(?:iás|mo)|minas)|-(?:(?:pernambuc|reg)o|ca(?:ntagalo|mpos)|go(?:iás|mo)|minas))|(?:iquit(?:o(?:s-(?:campin|ant)|-ant)|inhos?-vassour)|cevejos?-(?:ca[ms]|galinh))a|diz(?:es)?-alqueive|us?-sol)|na(?:chos?-capim|s-avestruz)|pinos?-(?:papagai|burr)o|ssegueiros?-abrir|quiás?-pedra)|u(?:rga(?:s-(?:c(?:a(?:i(?:tité|apó)|(?:boc|va)lo|rijó)|ereja)|(?:ve(?:ad|nt)|marinheir|genti)o|pa(?:ulista|stor)|nabiça)|-(?:c(?:a(?:i(?:tité|apó)|(?:bo
c|va)lo|rijó)|ereja)|(?:ve(?:ad|nt)|marinheir|genti)o|pa(?:ulista|stor)|nabiça))|lg(?:a(?:s-(?:(?:a(?:rei|nt)|galinh|águ)a|bicho)|-(?:(?:a(?:rei|nt)|galinh|águ)a|bicho))|(?:ões|ão)-planta))|o(?:mb(?:a(?:s-(?:(?:(?:arribaç|sert)ã|espelh|band)o|mulata)|-(?:(?:(?:arribaç|sert)ã|espelh|band)o|mulata))|o(?:s-(?:montanha|leque)|-(?:montanha|leque)))|rco(?:s-(?:verrugas|ferro)|-(?:verrugas|ferro))|aia(?:s-(?:minas|cipó)|-(?:minas|cipó)))|ã(?:es-(?:p(?:o(?:rc(?:in)?o|bre)|ássaros)|gal(?:inha|o)|leite|cuco)|o-(?:p(?:orc(?:in)?o|ássaros)|gal(?:inha|o)|cuco))|l(?:uma(?:s-(?:príncipe|capim)|-(?:príncipe|capim))|átanos?-gênio|antas?-neve)|r(?:eguiça(?:s-(?:bentinho|coleira)|-(?:bentinho|coleira))|imaveras?-caiena)|ássaros?-f(?:andan|i)go|êssegos?-abrir)|f(?:lor(?:es-(?:c(?:a(?:(?:(?:r(?:nav|de)|m)a)?l|(?:sament|chimb|bocl)o)|o(?:(?:[iu]r|elh|c)o|ntas|bra|ral)|e(?:tim|ra)|hagas|iúme|uco)|p(?:a(?:(?:ssarinh|pagai|raís|vã)o|dre|lha|u)|e(?:licano|dra)|érolas)|m(?:a(?:r(?:acujá|iposa)|deira|io)|(?:(?:eren|osca)d|us)a|ico)|b(?:a(?:(?:b(?:eir|ad)|rbeir)o|unilha|ile)|eso[iu]ro)|a(?:(?:lgodã|njinh)o|ranha|bril|zar)|s(?:a(?:p(?:at)?o|ngue)|(?:ed|ol)a)|v(?:(?:iúv|ac)a|e(?:lu|a)do)|n(?:(?:espereir|oiv)a|atal)|l(?:is(?:ado)?|agartixa|ã)|(?:quaresm|dian|águ)a|(?:invern|índi|fog)o|e(?:spírito|nxofre)|t(?:rombeta|anino)|g(?:ra[mx]a|elo)|jesus)|-(?:c(?:a(?:(?:(?:r(?:nav|de)|m)a)?l|(?:sament|chimb|bocl)o)|o(?:(?:[iu]r|elh|c)o|ntas|bra|ral)|e(?:tim|ra)|hagas|iúme|uco)|p(?:a(?:(?:ssarinh|pagai|raís|vã)o|dre|lha|u)|e(?:licano|dra)|érolas)|m(?:a(?:r(?:acujá|iposa)|deira|io)|(?:(?:eren|osca)d|us)a|ico)|b(?:a(?:(?:b(?:eir|ad)|rbeir)o|unilha|ile)|eso[iu]ro)|a(?:(?:lgodã|njinh)o|ranha|bril|zar)|s(?:a(?:p(?:at)?o|ngue)|(?:ed|ol)a)|v(?:(?:iúv|ac)a|e(?:lu|a)do)|n(?:(?:espereir|oiv)a|atal)|(?:quaresm|dian|águ)a|(?:invern|índi|fog)o|e(?:spírito|nxofre)|l(?:agartixa|is|ã)|t(?:rombeta|anino)|g(?:ra[mx]a|elo)|jesus))|rut(?:a(?:s-(?:c(?:o(?:n(?:de(?:ssa)?|ta)|(?:dorn|ruj)a)|a(?:chorro|scavel|iapó)|utia)|g(?:(?:en
ti|al)o|uar(?:iba|á)|rude)|m(?:a(?:n(?:teig|il)a|caco)|orcego)|p(?:(?:a(?:pagai|vã)|omb|ã)o|erdiz)|sa(?:(?:pucainh|ír)a|b(?:ão|iá))|a(?:n(?:ambé|el)|rara)|v(?:(?:íbor|i)a|eado)|b(?:abad|urr)o|t(?:ucano|atu)|l(?:epra|obo)|jac(?:aré|u)|árvore|faraó|ema)|-(?:c(?:o(?:n(?:de(?:ssa)?|ta)|(?:dorn|ruj)a)|a(?:chorro|scavel|iapó)|utia)|g(?:(?:enti|al)o|uar(?:iba|á)|rude)|m(?:a(?:n(?:teig|il)a|caco)|orcego)|p(?:(?:a(?:pagai|vã)|omb|ã)o|erdiz)|sa(?:(?:pucainh|ír)a|b(?:ão|iá))|a(?:n(?:ambé|el)|rara)|v(?:(?:íbor|i)a|eado)|b(?:abad|urr)o|t(?:ucano|atu)|l(?:epra|obo)|jac(?:aré|u)|árvore|faraó|ema))|eira(?:s-(?:c(?:onde(?:ssa)?|achorro|utia)|(?:macac|tucan|burr|lob)o|p(?:(?:avã|omb)o|erdiz)|jac(?:aré|u)|arara|faraó)|-(?:c(?:onde(?:ssa)?|achorro|utia)|(?:macac|tucan|burr|lob)o|p(?:(?:avã|omb)o|erdiz)|jac(?:aré|u)|arara|faraó))|o(?:s-(?:c(?:a(?:xinguelê|chorro)|o(?:br|nt)a)|m(?:a(?:nteiga|caco)|orcego)|p(?:apagaio|erdiz)|burro|sabiá|imbé)|-(?:c(?:a(?:xinguelê|chorro)|o(?:br|nt)a)|m(?:a(?:nteiga|caco)|orcego)|p(?:apagaio|erdiz)|burro|sabiá|imbé)))|o(?:r(?:m(?:iga(?:s-(?:f(?:e(?:rrão|bre)|ogo)|r(?:a(?:spa|bo)|oça)|c(?:emitério|upim)|m(?:andioca|onte)|b(?:entinho|ode)|(?:imbaúv|onç)a|n(?:ovato|ós)|defunto)|-(?:f(?:e(?:rrão|bre)|ogo)|r(?:a(?:spa|bo)|oça)|c(?:emitério|upim)|m(?:andioca|onte)|b(?:entinho|ode)|(?:imbaúv|onç)a|n(?:ovato|ós)|defunto))|osa(?:s-(?:besteiros|darei)|-(?:besteiros|darei)))|no(?:s-ja(?:çanã|caré)|-ja(?:çanã|caré)))|lha(?:s-(?:s(?:a(?:n(?:tana|gue)|bão)|e(?:rr|d)a)|f(?:(?:ígad|og)o|igueira|ronte)|p(?:a(?:pagaio|dre|jé)|irarucu)|(?:comichã|bold?|gel)o|l(?:ança|eite|ouco)|(?:zeb|he|ta)ra|mangue|urubu)|-(?:s(?:a(?:n(?:tana|gue)|bão)|erra)|f(?:(?:ígad|og)o|igueira|ronte)|p(?:a(?:pagaio|dre|jé)|irarucu)|(?:comichã|bold?|gel)o|l(?:ança|eite|ouco)|(?:zeb|he|ta)ra|mangue|urubu))|cas?-capuz)|a(?:v(?:a(?:s-(?:(?:a(?:ngol|rar)|(?:mal|v)ac|r(?:osc|am)|sucupir|holand)a|c(?:a(?:labar|valo)|h(?:eiro|apa)|obra)|b(?:e(?:souro|lém)|ol(?:ach|ot)a)|(?:quebrant|engenh|ordáli)o|l(?:(?:áza
r|ob)o|ima)|t(?:ambaqui|onca)|p(?:orco|aca)|impin?gem)|-(?:(?:a(?:ngol|rar)|(?:mal|v)ac|r(?:osc|am)|sucupir|holand)a|b(?:e(?:souro|lém)|ol(?:ach|ot)a)|c(?:a(?:labar|valo)|(?:hap|obr)a)|(?:quebrant|ordáli)o|t(?:ambaqui|onca)|p(?:orco|aca)|l(?:ima|obo)|impin?gem))|eira(?:s-(?:impin?gem|berloque)|-(?:impin?gem|berloque))|inhas?-capoeira)|lc(?:ões|ão)-coleira)|e(?:ij(?:õe(?:s-(?:c(?:(?:o(?:br|rd)|er|ub)a|avalo)|g(?:u(?:ando|izos)|ado)|(?:árvor|azeit|frad)e|l(?:i(?:sbo|m)a|eite)|(?:jav|rol|soj)a|m(?:acáçar|etro)|po(?:mbinha|rco)|va(?:[cr]a|gem)|tropeiro|boi)|zinhos-capoeira)|ão(?:-(?:c(?:(?:ord|er|ub)a|avalo)|g(?:u(?:ando|izos)|ado)|(?:árvor|azeit|frad)e|l(?:i(?:sbo|m)a|eite)|(?:jav|rol|soj)a|m(?:acáçar|etro)|po(?:mbinha|rco)|va(?:[cr]a|gem)|tropeiro|boi)|zinho-capoeira))|(?:l(?:es)?-genti|nos?-cheir|tos?-botã)o)|i(?:g(?:ueira(?:s-(?:(?:lombrigueir|pit|go)a|b(?:engala|aco)|to(?:car|que)|jardim)|-(?:(?:lombrigueir|pit|go)a|b(?:engala|aco)|to(?:car|que)|jardim))|o(?:s-(?:(?:figueir|banan)a|r(?:echeio|ocha)|(?:tord|verã)o)|-(?:(?:figueir|banan)a|r(?:echeio|ocha)|(?:tord|verã)o)))|lária(?:s-(?:medina|guiné)|-(?:medina|guiné))|andeiras?-algodão)|u(?:mo(?:s-(?:(?:rapos|cord|folh)a|pa(?:isan|raís)o|jardim)|-(?:(?:rapos|cord|folh)a|pa(?:isan|raís)o|jardim))|ncho(?:s-(?:(?:florenç|águ)a|porco)|-(?:(?:florenç|águ)a|porco)))|éis-gentio)|b(?:a(?:nan(?:eir(?:a(?:s-(?:(?:madag[áa]sca|flo)r|(?:italian|papagai)o|sementes|jardim|corda|leque)|-(?:(?:madag[áa]sca|flo)r|(?:italian|papagai)o|sementes|jardim|corda|leque))|inha(?:s-(?:touceira|salão|flor)|-(?:touceira|salão|flor)))|a(?:s-(?:(?:m(?:orceg|acac)|papagai)o|s(?:ementes|ancho)|imbé)|-(?:(?:m(?:orceg|acac)|papagai)o|s(?:ementes|ancho)|imbé)))|tat(?:a(?:s-(?:p(?:e(?:rdiz|dra)|ur(?:ga|i)|orco)|a(?:(?:ngol|rrob)a|maro)|b(?:(?:ranc|ugi)o|ainha)|(?:cabocl|vead)o|t(?:aiuiá|iú)|escamas|rama)|-(?:p(?:e(?:rdiz|dra)|ur(?:ga|i)|orco)|a(?:(?:ngol|rrob)a|maro)|b(?:(?:ranc|ugi)o|ainha)|(?:cabocl|vead)o|t(?:aiuiá|iú)|escamas|rama))|inhas?-cobra)|g(
?:a(?:s-(?:(?:cabocl|tucan|lour)o|p(?:ombo|raia))|-(?:(?:cabocl|tucan|lour)o|p(?:ombo|raia)))|re(?:s-(?:(?:arei|lago)a|man(?:gue|ta)|penacho)|-(?:(?:arei|lago)a|man(?:gue|ta)|penacho))|os?-chumbo)|r(?:r(?:ete(?:s-(?:clérigo|eleitor|padre)|-(?:clérigo|eleitor|padre))|ig(?:udas?-espinho|as?-freira))|ba(?:-(?:(?:chib|timã)o|pa(?:ca|u)|lagoa)|s-(?:(?:chib|timã)o|lagoa|boi|pau)))|b(?:osa(?:s-(?:árvore|espiga|pau)|-(?:árvore|espiga|pau))|a(?:-(?:(?:camel|sap)o|boi)|s-(?:sapo|boi)))|c(?:u(?:r(?:aus?-(?:lajea|ban)do|is?-cerca)|(?:paris?-capoei|s?-ped)ra)|abas?-(?:azeit|lequ)e)|mbu(?:s-(?:(?:espinh|caniç)o|pescador|mobília)|-(?:(?:espinh|caniç)o|pescador|mobília))|leia(?:s-(?:b(?:arbatana|ico)|corcova|gomo)|-(?:b(?:arbatana|ico)|corcova|gomo))|i(?:acu(?:s-(?:espinho|chifre)|-(?:espinho|chifre))|nhas?-(?:espad|fac)a)|st(?:(?:i(?:ões|ão)-arrud|ardos?-rom)a|ões-velho)|d(?:ianas?-cheiro|ejos?-lista)|unilhas?-auacuri|únas?-fogo)|i(?:c(?:h(?:o(?:-(?:c(?:(?:a(?:rpintei|chor)r|est)o|o(?:nta|co)|hifre)|(?:(?:ester|bura)c|ouvid|rum)o|(?:(?:gali|u)nh|taquar|sed)a|p(?:a(?:rede|u)|orco|ena|é)|m(?:(?:edranç|osc)a|ato)|v(?:areja|eludo)|f(?:rade|ogo))|s-(?:c(?:(?:a(?:rpintei|chor|nast)r|est)o|o(?:nta|co)|hifre)|(?:m(?:edranç|osc)|(?:gali|u)nh|taquar|sed)a|(?:(?:ester|bura)c|ouvid|rum)o|p(?:a(?:rede|u)|orco|ena|é)|v(?:areja|eludo)|f(?:rade|ogo)))|eiros?-conta)|udas?-corso)|ribás?-pernambuco|telos?-gente)|o(?:r(?:boleta(?:s-(?:p(?:êssego|iracema)|a(?:moreira|lface)|(?:carvalh|band)o|gás)|-(?:p(?:êssego|iracema)|a(?:moreira|lface)|(?:carvalh|band)o|gás))|d(?:ões|ão)-(?:santiag|macac)o)|i(?:s-(?:carro|guará|deus)|-(?:carro|guará|deus)|tas?-bigodes)|a(?:is|l)-alicante|fes?-burro|tos-óculos)|r(?:edo(?:s-(?:(?:namor(?:ad)?|porc|vead|mur)o|espi(?:nho|ga)|cabeça|jardim)|-(?:(?:namor(?:ad)?|porc|vead|mur)o|espi(?:nho|ga)|cabeça|jardim))|inco(?:s-(?:s(?:a(?:guim?|uim)|urubim)|passarinho)|-(?:sa(?:guim?|uim)|passarinho))|(?:ucos?-salvaterr|ancos?-barit)a|ocas?-raiz)|e(?:s(?:ouro(?:s-(?:(?:limeir|águ)a|
chifre|maio)|-(?:(?:limeir|águ)a|chifre|maio))|ugos?-ovas)|l(?:droega(?:s-(?:inverno|cuba)|-(?:inverno|cuba))|a(?:s-felgueiras?|-felgueiras?))|ngalas?-camarão|tónicas?-água|ijus?-potó)|álsamo(?:s-(?:c(?:a(?:rtagena|nudo)|heiro)|(?:arce|tol)u|enxofre)|-(?:c(?:a(?:rtagena|nudo)|heiro)|(?:arce|tol)u|enxofre))|u(?:ch(?:o(?:s-(?:veado|boi|rã)|-(?:veado|boi|rã))|as?-purga)|t(?:iás?-vinagre|uas?-corvo)|xos?-holanda))|a(?:l(?:f(?:a(?:vaca(?:s-(?:c(?:(?:abocl|heir)o|obra)|vaqueiro)|-(?:c(?:(?:abocl|heir)o|obra)|vaqueiro))|ce(?:s-(?:(?:c(?:ordeir|ã)|porc)o|alger)|-(?:(?:c(?:ordeir|ã)|porc)o|alger))|zemas?-caboclo|fas?-provença)|inete(?:s-(?:toucar|dama)|-toucar))|m(?:a(?:s-(?:c(?:a(?:(?:boc|va)lo|çador)|(?:hichar|ânta)ro)|(?:tapui|pomb|gat)o|biafada|mestre)|-(?:c(?:a(?:(?:boc|va)lo|çador)|(?:hichar|ânta)ro)|(?:tapui|pomb|gat)o|biafada))|ecegueira(?:s-(?:cheiro|minas)|-(?:cheiro|minas)))|e(?:cri(?:ns-(?:c(?:ampina|heiro)|angola)|m-(?:c(?:ampina|heiro)|angola))|trias?-pau)|ho(?:s-(?:espanha|cheiro)|-(?:espanha|cheiro))|ba(?:troz(?:es)?-sobrancelha|coras?-laje)|g(?:odoeiros?-pernambuco|ibeiras?-dama)|cachofras?-jerusalém|amandas?-jacobina)|r(?:a(?:ç(?:á(?:s-(?:c(?:o(?:mer|roa)|heiro)|(?:umbig|vead)o|(?:pomb|ant)a|tinguijar|minas)|-(?:c(?:o(?:mer|roa)|heiro)|(?:umbig|vead)o|(?:pomb|ant)a|tinguijar|minas))|aris?-minhoca)|ticu(?:ns-(?:(?:espinh|cheir)o|(?:jangad|pac)a|boia?)|m-(?:(?:espinh|cheir)o|(?:jangad|pac)a|boia?))|nha(?:s-(?:água|coco)|-(?:água|coco))|(?:pocas?-cheir|rutas?-porc)o)|r(?:aia(?:s-(?:coroa|fogo)|-(?:coroa|fogo))|ozes-(?:telhad|rat)o|udas?-campinas)|oeira(?:s-(?:(?:goiá|mina)s|capoeira|bugre)|-(?:(?:goiá|mina)s|capoeira|bugre))|(?:lequi(?:ns|m)-caien|cos?-pip)a)|n(?:g(?:ico(?:s-(?:m(?:onte|ina)s|banhado|curtume)|-(?:m(?:onte|ina)s|banhado|curtume))|eli(?:ns|m)-(?:espinh|morceg)o|élicas?-rama)|a(?:n(?:ases-(?:caraguatá|agulha)|ás-(?:caraguatá|agulha))|mbés?-capuz)|dorinha(?:s-(?:bando|casa)|-(?:bando|casa))|ingas?-(?:espinh|macac)o|u(?:n?s|m)?-enchente|z(?:óis|ol)
-lontra)|m(?:or(?:e(?:ira(?:s-(?:espinho|árvore)|-(?:espinho|árvore))|s-(?:(?:(?:vaquei|bur)r|hortelã)o|moça))|-(?:(?:(?:vaquei|bur)r|hortelã)o|moça))|e(?:ndo(?:i(?:ns-(?:árvore|veado)|m-(?:árvore|veado))|eiras?-coco)|ixa(?:s-(?:madagascar|espinho)|-(?:madagascar|espinho)))|êndoas?-(?:espinh|coc)o)|b(?:elha(?:s-(?:c(?:(?:achorr|hã)o|upim)|(?:rein|fog|our|sap)o|p(?:urga|au))|-(?:(?:rein|fog|our|sap)o|p(?:urga|au)|cupim))|(?:utuas?-batat|óbora-coro)a|r(?:icós?-macaco|aços?-vide))|s(?:a(?:s-(?:pa(?:pagaio|lha)|(?:barat|telh)a|sabre)|-(?:pa(?:pagaio|lha)|(?:barat|telh)a|sabre))|pargo(?:s-(?:jardim|sala)|-(?:jardim|sala))|so[bv]ios?-(?:cobr|folh)a)|ça(?:f(?:ate(?:s-(?:o[iu]ro|prata)|-(?:o[iu]ro|prata))|roeiras?-pernambuco)|ís?-caatinga)|g(?:ulh(?:(?:ões|ão)-(?:(?:trombe|pra)t|vel)a|as?-pastor)|rílicas?-rama)|zed(?:inha(?:s-(?:corumbá|goiás)|-(?:corumbá|goiás))|as-ovelha)|ve(?:nca(?:s-(?:espiga|minas)|-(?:espiga|minas))|s?-crocodilo)|(?:ipos?-montevid|carás?-v)éu|tu(?:ns|m)-galha)|m(?:a(?:r(?:acujá(?:s-(?:c(?:a(?:iena|cho)|o(?:rtiç|br)a|heiro)|(?:ga(?:rap|vet)|mochil)a|pe(?:riquito|dra)|est(?:rada|alo)|(?:alh|rat)o)|-(?:c(?:a(?:iena|cho)|o(?:rtiç|br)a|heiro)|(?:ga(?:rap|vet)|mochil)a|pe(?:riquito|dra)|est(?:rada|alo)|(?:alh|rat)o))|m(?:el(?:adas?-(?:ca(?:chorr|val)|invern|verã)o|(?:eir)?os?-bengala)|itas?-macaco)|reco(?:s-(?:pequim|ruão)|-(?:pequim|ruão))|imbondos?-chapéu|quesas?-belas)|c(?:a(?:co(?:s-(?:(?:cheir|band)o|noite|sabá)|-(?:(?:cheir|band)o|noite|sabá))|mbira(?:s-(?:(?:flech|pedr)a|serrote)|-(?:(?:flech|pedr)a|serrote))|quinhos?-bambá)|ieira(?:s-(?:(?:anáfeg|coro)a|boi)|-(?:(?:anáfeg|coro)a|boi))|elas?-(?:tabuleir|botã)o|ucus?-paca)|n(?:gue(?:s-(?:(?:(?:pend|bot)ã|sapateir|espet)o|obó)|-(?:(?:(?:pend|bot)ã|sapateir|espet)o|obó))|t(?:imento(?:s-(?:araponga|pobre)|-(?:araponga|pobre))|as?-bretão)|jeric(?:ões|ão)-(?:ceilã|molh)o|d(?:ibis?-juntas|acarus?-boi))|çã(?:s-(?:c(?:(?:[au]c|rav)o|ipreste|obra)|a(?:náfega|rrátel)|(?:espelh|prat)o|rosa|vime|boi)|-(?:c(?:(?:[
au]c|rav)o|ipreste|obra)|a(?:náfega|rrátel)|(?:espelh|prat)o|rosa|vime|boi))|t(?:inho(?:s-(?:agulhas|lisboa|sargo)|-(?:agulhas|lisboa|sargo))|o(?:s-(?:engodo|salema)|-(?:engodo|salema)))|m(?:(?:icas?-(?:ca(?:chorr|del)|porc)|(?:ões|ão)-cord)a|oeiro(?:s-(?:espinho|corda)|-(?:espinho|corda)))|lva(?:s-(?:(?:cheir|pendã)o|marajó)|-(?:marajó|pendão)|íscos?-pernambuco)|d(?:ressilvas?-cheiro|eiras?-rei)|itacas?-maximiliano|parás?-cametá)|o(?:s(?:ca(?:s-(?:b(?:a(?:nheir|gaç)o|ich(?:eira|o))|e(?:lefante|stábulo)|ca(?:valos?|sa)|f(?:reira|ogo)|(?:madei|u)ra|inverno)|-(?:b(?:a(?:nheir|gaç)o|ich(?:eira|o))|e(?:lefante|stábulo)|ca(?:valos?|sa)|(?:madei|u)ra|fogo)|t(?:éis-(?:setúbal|jesus)|el-(?:setúbal|jesus)))|quitos?-parede)|ela(?:s-(?:mutum|ema)|-(?:mutum|ema))|n(?:stros?-gila|cos?-peru|tes?-ouro)|(?:uriscos?-sement|reias?-mangu)e|c(?:itaíbas?-leite|hos?-orelhas)|longós?-colher)|u(?:r(?:ici(?:s-(?:(?:tabuleir|porc)o|lenha)|-(?:(?:tabuleir|porc)o|lenha))|uré(?:s-(?:canudo|pajés)|-(?:canudo|pajé))|ta(?:s-(?:cheiro|parida)|-parida))|s(?:go(?:s-(?:irlanda|perdão)|-(?:irlanda|perdão))|aranhos?-água)|tu(?:ns-(?:asso[bv]io|fava)|m-(?:asso[bv]io|fava))|çambés?-espinhos|ngunzás?-cortar)|i(?:lh(?:o(?:s-(?:cobr|águ)|-águ)a|ãs?-pendão)|mos(?:as?-vereda|os?-cacho)|neiras?-petrópolis|cos?-topete|jos?-cavalo|olos-capim)|el(?:(?:(?:ões|ão)-(?:cabocl|morceg|soldad)|oeiros?-soldad)o|(?:ros?-(?:coleir|águ)|ancias?-cobr)a))|e(?:rv(?:a(?:s-(?:m(?:a(?:l(?:eitas|aca)|caé)|u(?:lher|ro)|o[iu]ra|endigo)|p(?:a(?:(?:rid|in)a|ssarinho)|(?:ântan|iolh)o|ontada)|a(?:n(?:(?:dorinh|t)a|jinho|il)|l(?:finete|ho)|mor)|b(?:(?:(?:ascul|ic)h|otã)o|(?:esteir|álsam)os|ugre)|c(?:(?:abr(?:it)?|obr)a|h(?:eir|umb)o)|sa(?:n(?:t(?:iago|ana)|gue)|(?:le)?po)|l(?:a(?:(?:vadeir|c)a|garto)|ouco)|g(?:o(?:[mt]|iabeir)a|uiné|elo)|f(?:(?:og|um|i)o|ebra)|(?:r(?:ober|a)t|our)o|ja(?:raraca|buti)|impingem|esteira)|-(?:m(?:a(?:l(?:eitas|aca)|caé)|u(?:lher|ro)|o[iu]ra|endigo)|p(?:a(?:(?:rid|in)a|ssarinho)|(?:ântan|iolh)o|ontada)|a(?:l(?:
finete|míscar|ho)|n(?:(?:dorinh|t)a|il)|mor)|b(?:(?:(?:ascul|ic)h|otã)o|(?:esteir|álsam)os|ugre)|sa(?:n(?:t(?:iago|ana)|gue)|(?:le)?po)|c(?:(?:abr(?:it)?|obr)a|(?:humb|ã)o)|l(?:a(?:(?:vadeir|c)a|garto)|ouco)|g(?:o(?:[mt]|iabeir)a|uiné|elo)|f(?:(?:og|um|i)o|ebra)|(?:r(?:ober|a)t|our)o|ja(?:raraca|buti)|impingem|esteira))|i(?:lha(?:s-(?:(?:cheir|pomb)o|(?:árvo|leb)re|(?:angol|vac)a)|-(?:(?:cheir|pomb)o|(?:árvo|leb)re|(?:angol|vac)a))|nhas?-parida))|s(?:p(?:i(?:n(?:h(?:o(?:s-(?:c(?:a(?:(?:chor|rnei)ro|çada)|r(?:isto|uz)|erca)|(?:bananeir|agulh|roset)a|(?:ladrã|tour|urs)o|j(?:erusalém|udeu)|mari(?:ana|cá)|vintém|deus)|-(?:c(?:a(?:(?:chor|rnei)ro|çada)|r(?:isto|uz)|erca)|(?:bananeir|agulh|roset)a|(?:ladrã|tour|urs)o|j(?:erusalém|udeu)|mari(?:ana|cá)|vintém|deus))|eiro(?:s-(?:c(?:a(?:rneiro|iena)|risto|erca)|j(?:erusalém|udeu)|a(?:gulh|meix)a|vintém)|-(?:c(?:a(?:rneiro|iena)|risto|erca)|j(?:erusalém|udeu)|a(?:gulh|meix)a|vintém))|as?-(?:carneir|vead)o)|afres?-cuba)|ga(?:s-(?:(?:sangu|leit)e|ferrugem|água)|-(?:(?:sangu|leit)e|ferrugem|água)))|onjas?-raiz)|c(?:a(?:móneas?-alepo|das?-jabuti)|ovas?-macaco|umas?-sangue)|tercos?-jurema)|mbira(?:s-(?:ca(?:rrapato|çador)|(?:porc|sap)o)|-(?:ca(?:rrapato|çador)|(?:porc|sap)o))|n(?:xertos?-passarinho|redadeiras?-borla))|g(?:r(?:a(?:m(?:a(?:s-(?:p(?:(?:ernambuc|ast)o|onta)|(?:forquilh|sananduv)a|c(?:oradouro|idade)|ja(?:cobina|rdim)|ma(?:rajó|caé)|adorno)|-(?:p(?:(?:ernambuc|ast)o|onta)|(?:forquilh|sananduv)a|c(?:oradouro|idade)|ja(?:cobina|rdim)|ma(?:rajó|caé)|adorno))|inha(?:s-(?:campinas|jacobina|raiz)|-(?:campinas|jacobina|raiz)))|vatá(?:s-(?:(?:moquec|agulh)a|c(?:o[iu]ro|erca)|(?:ganch|lajed)o|r(?:aposa|ede)|árvore|tingir)|-(?:(?:moquec|agulh)a|c(?:o[iu]ro|erca)|(?:ganch|lajed)o|r(?:aposa|ede)|árvore|tingir))|lhas?-crista)|ão(?:s-(?:(?:c(?:aval|humb)|(?:malu|bi)c|gal)o|p(?:orco|ulha))|-(?:(?:(?:malu|bi)c|(?:cav|g)al)o|p(?:orco|ulha))|zinhos?-galo)|inaldas?-viúva)|a(?:l(?:o(?:s-(?:p(?:enacho|luma)|b(?:ando|riga)|rebanho|fita|ebó)
|-(?:p(?:enacho|luma)|b(?:ando|riga)|rebanho|fita|ebó))|inha(?:s-(?:bugre|faraó|água)|-(?:bugre|faraó|água)))|fanhoto(?:s-(?:(?:(?:marmel|coqu)eir|arribaçã)o|(?:jurem|prag)a)|-(?:(?:(?:marmel|coqu)eir|arribaçã)o|(?:jurem|prag)a))|vi(?:ões-(?:(?:(?:colei|ser)r|queimad)a|a(?:nta|ruá)|penacho)|ão-(?:(?:(?:colei|ser)r|queimad)a|a(?:nta|ruá)|penacho))|meleira(?:-(?:(?:lombrigueir|p(?:in|ur)g)a|(?:cansaç|venen)o)|s-(?:(?:cansaç|venen)o|lombrigueiras|p(?:in|ur)ga))|to(?:-(?:madagáscar|algália)|s-algália)|r(?:oupas?-segunda|gantas-ferro))|u(?:a(?:birob(?:eira(?:s-(?:cachorro|minas)|-(?:cachorro|minas))|a(?:s-(?:cachorro|minas)|-(?:cachorro|minas)))|ricangas?-bengala)|iratãs?-coqueiro)|o(?:iab(?:a(?:s-(?:(?:espinh|macac)o|anta)|-(?:(?:espinh|macac)o|anta))|eiras?-(?:cuti|pac)a)|meiros?-minas|gós?-guariba|elas?-lobo)|e(?:rgeli(?:ns|m)-laguna|ngibres?-dourar)|irass(?:óis|ol)-batatas)|t(?:r(?:e(?:vo(?:s-(?:c(?:ar(?:retilha|valho)|heiro)|(?:se[ar]r|águ)a)|-(?:c(?:ar(?:retilha|valho)|heiro)|(?:se[ar]r|águ)a))|moço(?:s-(?:cheiro|jardim|minas)|-(?:cheiro|jardim|minas)))|i(?:go(?:s-(?:p(?:rioste|erdiz)|milagre|israel|verão)|-(?:p(?:rioste|erdiz)|milagre|israel|verão))|colino(?:s-c(?:hifre|rista)|-c(?:hifre|rista))|nca(?:is|l)-pau)|(?:aças?-bibliotec|épanos?-coro)a|omb(?:as?-elefante|etas?-arauto)|utas?-lago)|a(?:i(?:uiá(?:s-(?:c(?:omer|ipó)|pimenta|jardim|quiabo|goiás)|-(?:c(?:omer|ipó)|pimenta|jardim|quiabo|goiás))|nhas?-(?:cors|ri)o)|r(?:taruga(?:s-(?:couro|pente)|-(?:couro|pente))|umã(?:s-espinhos?|-espinhos?))|m(?:b(?:etarus?-espinh|ori[ls]-brav)o|anqueiras?-leite)|j(?:ujás?-(?:cabacinh|quiab)o|ás?-cobra)|(?:xizeiros?-tint|tus?-folh)a|b(?:ocas?-marajó|acos?-cão)|quaris?-cavalo)|i(?:n(?:gui(?:s-(?:c(?:(?:aien|ol)a|ipó)|(?:leit|peix)e)|-(?:c(?:(?:aien|ol)a|ipó)|(?:leit|peix)e))|hor(?:ões|ão)-lombriga)|mbó(?:s-(?:boticário|caiena|jacaré|peixe|raiz)|-(?:boticário|caiena|jacaré|peixe|raiz))|gres?-bengala)|o(?:m(?:at(?:e(?:s-(?:princesa|árvore)|-(?:princesa|árvore))|inhos?-capucho)|il
hos?-creta)|(?:rós?-espinh|adas-cour)o|petes?-cardeal)|u(?:c(?:u(?:ns-(?:carnaúba|redes)|m-(?:carnaúba|redes))|anos?-cinta)|bar(?:ões|ão)-focinho|lipas?-jardim|ias?-areia)|e(?:m(?:betarus?-espinho|porãos?-coruche)|rebintina-quio)|úberas?-(?:invern|verã)o)|r(?:a(?:to(?:s-(?:p(?:a(?:lmatória|iol)|entes|raga)|(?:es(?:pinh|got)|algodã)o|(?:t(?:aquar|romb)|águ)a|c(?:ouro|asa)|fa(?:raó|va)|bambu)|-(?:p(?:a(?:lmatória|iol)|entes|raga)|(?:es(?:pinh|got)|algodã)o|(?:t(?:aquar|romb)|águ)a|c(?:ouro|asa)|fa(?:raó|va)|bambu))|ízes-(?:c(?:(?:edr|urv)o|o(?:bra|rvo)|h(?:eiro|á)|âmaras|ana)|b(?:(?:ar?beir|randã)o|ugre)|(?:angélic|mostard|quin)a|l(?:agarto|opes)|sol(?:teira)?|t(?:ucano|iú)|f(?:rade|el)|guiné|pipi)|iz-(?:c(?:(?:edr|urv)o|o(?:bra|rvo)|h(?:eiro|á)|âmaras|ana)|b(?:(?:ar?beir|randã)o|ugre)|(?:angélic|mostard|quin)a|l(?:agarto|opes)|sol(?:teira)?|t(?:ucano|iú)|f(?:rade|el)|guiné|pipi)|b(?:uge(?:ns|m)-cachorr|anetes?-caval)o|m(?:as?-bezerro|os?-seda)|pés?-saci)|o(?:s(?:a(?:-(?:(?:c(?:a(?:chorr|bocl)|h?ã)|[bl]ob|defunt|o[iu]r|musg)o|p(?:áscoa|au)|jericó|toucar)|s-(?:(?:c(?:a(?:chorr|bocl)|h?ã)|[bl]ob|defunt|o[iu]r|musg)o|jericó|páscoa|toucar))|ário(?:s-(?:jamb[ou]|ifá)|-(?:jamb[ou]|ifá))|e(?:tas?-pernambu|iras?-damas)co)|uxin(?:óis-(?:m(?:uralha|anaus)|(?:espadan|jav)a|caniços)|ol-(?:m(?:uralha|anaus)|(?:espadan|jav)a|caniços))|(?:balos?-(?:arei|galh)|az(?:es)?-bandeir)a|ca(?:s-(?:flores|eva)|-(?:flores|eva)))|(?:e(?:sedás?-cheir|des?-leã)|ábanos?-caval)o|inocerontes?-Java)|s(?:a(?:l(?:sa(?:s-(?:c(?:a(?:stanheiro|valos)|heiro|upim)|(?:roch|águ)a|burro)|-(?:c(?:a(?:stanheiro|valos?)|upim)|(?:roch|águ)a|burro)|parrilhas?-lisboa)|va(?:s-(?:pernambuco|marajó)|-(?:pernambuco|marajó))|amandras?-água)|r(?:a(?:ndis?-(?:(?:carangu|gargar)ej|espinh)o|(?:magos?-águ|s?-pit)a)|dinha(?:s-(?:ga(?:lha|to)|laje)|-(?:ga(?:lha|to)|laje))|go(?:s-(?:beiço|dente)|-(?:beiço|dente))|ros?-pito)|n(?:haç(?:o(?:s-(?:(?:(?:coqu|mamo)eir|fog)o|encontros)|-(?:(?:(?:coqu|mamo)eir|fog)o|encontros))|us?-(?:e
ncont|mamoei)ro)|ãs?-samambaia)|p(?:(?:ucaias?-castanh|és?-capoeir)a|o(?:-chifres?|s-chifre))|gui(?:n?s|m)?-bigode|mambaias?-penacho|[bv]acus?-coroa)|u(?:rucucu(?:s-(?:p(?:ati|ind)oba|fogo)|-(?:p(?:ati|ind)oba|fogo))|ma(?:umeiras?-macaco|gres?-provença|rés?-pedras))|orgo(?:s-(?:(?:vassour|espig)a|pincel|alepo)|-(?:(?:vassour|espig)a|pincel|alepo))|iris?-coral)|j(?:a(?:ca(?:r(?:andá(?:s-(?:campinas|espinho|sangue)|-(?:campinas|espinho|sangue))|és?-óculos)|(?:tir(?:ões|ão)-capot|s?-pobr)e)|smi(?:ns-(?:c(?:a(?:chorro|iena)|erca)|soldado|leite)|m-(?:c(?:a(?:chorro|iena)|erca)|soldado|leite))|(?:mb(?:eir)?os?-malac|lapas?-lisbo|puçá-coleir)a|tobá(?:s-(?:porco|anta)|-(?:porco|anta))|buticab(?:eiras?-campinas|as?-cipó)|r(?:aracas?-agosto|rinhas?-franja))|u(?:n(?:co(?:s-(?:c(?:a(?:ngalh|br)|obr)a|banhado)|-(?:c(?:a(?:ngalh|br)|obr)a|banhado))|ta(?:s-c(?:alangro|obra)|-c(?:alangro|obra))|ças-c(?:heiro|onta))|á(?:s-c(?:apote|omer)|-c(?:apote|omer))|rubebas?-espinho|ciris?-comer|quis?-cerca)|o(?:ões-(?:santarém|barros?|leite|puça)|ão-(?:santarém|barro|leite|puça))|e(?:quitibás?-agulheir|taís?-pernambuc)o|i(?:queranas?-goiás|tiranas?-leite))|l(?:i(?:m(?:(?:a(?:s-(?:cheir|umbig|bic)|-(?:umbig|bic))|eiras?-umbig)o|ões-(?:c(?:aiena|heiro)|galinha)|ão-(?:c(?:aiena|heiro)|galinha)|os?-manta)|n(?:ho(?:s-(?:raposa|cuco)|-(?:raposa|cuco))|gu(?:eir(?:ões|ão)-canud|ados?-ri)o)|x(?:a(?:s-(?:lei|pau)|-(?:lei|pau))|inhas?-fundura))|a(?:ranj(?:a(?:s-(?:(?:terr|onç)a|umbigo)|-(?:umbigo|onça))|eiras?-vaqueiro)|g(?:art(?:as?-(?:vidr|fog)o|os?-água)|ostas?-espinho)|lás?-cintura)|e(?:it(?:e(?:s-(?:ga(?:meleir|linh)a|cachorro)|-(?:ga(?:meleir|linh)a|cachorro)|iras?-espinho)|ugas?-burro)|sma-conchinha)|o(?:ur(?:eiro(?:s-(?:jardim|apolo)|-(?:jardim|apolo))|os?-cheiro)|ireiros?-apolo)|u(?:(?:tos-quaresm|vas-pastor)a|zernas?-sequeiro)|írios?-petrópolis)|v(?:a(?:sso(?:urinha(?:s-(?:(?:relógi|botã)o|varrer)|-(?:(?:relógi|botã)o|varrer))|ira(?:s-(?:fe(?:iticeira|rro)|bruxa)|-(?:feiticeir|brux)a))|ra(?:s-
(?:foguete|o[iu]ro|canoa)|-o[iu]ro)|les?-arinto)|e(?:r(?:g(?:onhas?-estudante|as?-jabuti)|(?:melhinhas?-galh|ças?-cã)o)|spa(?:s-(?:rodeio|cobra)|-(?:rodeio|cobra))|l(?:ames?-cheiro|udos?-penca)|ados?-virgínia|nenos?-porco)|i(?:d(?:eiras?-enforcado|oeiros?-papel)|oletas?-(?:par|da)ma|nháticos?-espinho)|oador(?:es)?-pedra)|qu(?:i(?:n(?:a(?:s-(?:(?:per(?:nambuc|iquit)|vead)o|c(?:errado|aiena|ipó)|r(?:e(?:mígi|g)o|aiz)|goiás)|-(?:(?:per(?:nambuc|iquit)|vead)o|c(?:errado|aiena|ipó)|r(?:e(?:mígi|g)o|aiz)|goiás))|gombós?-(?:espinh|cheir)o)|ab(?:o(?:s-(?:c(?:aiena|heiro|ipó)|(?:angol|quin)a)|-(?:c(?:aiena|heiro|ipó)|(?:angol|quin)a)|ranas?-espinho)|eiros?-angola)|(?:gombós?-cheir|to-pernambuc)o|bondos?-água)|ati(?:s-(?:bando|vara)|-(?:bando|vara))|ássias?-caiena)|á(?:rvore(?:s-(?:(?:bálsam|incens|ranc?h|seb)o|(?:gra(?:lh|x)|orquíde)a|c(?:hocalho|oral|uia)|l(?:ótus|eite|ã)|(?:jud|vel)as|a(?:rr|n)oz|pagode|natal)|-(?:(?:bálsam|incens|ranc?h|seb)o|(?:gra(?:lh|x)|orquíde)a|c(?:hocalho|oral|uia)|l(?:ótus|eite|ã)|(?:jud|vel)as|a(?:rr|n)oz|pagode|natal))|(?:gu(?:as?-colóni|ias?-poup)|caros?-galinh)a)|i(?:n(?:ha(?:me(?:s-(?:c(?:oriolá|ão)|lagartixa|enxerto|benim)|-(?:c(?:oriolá|ão)|lagartixa|enxerto|benim))|íbas?-rego)|censos?-caiena|gás?-fogo)|mb(?:(?:ur(?:ana(?:s-(?:c(?:ambã|heir)|espinh)|-(?:espinh|cambã))|is?-cachorr)|aúbas?-(?:cheir|vinh))o|és?-(?:amarra|come)r)|p(?:ês?-impingem|ecas?-cuiabá)|xoras-cheiro|scas?-sola)|n(?:o(?:z(?:-(?:b(?:a(?:tauá|nda)|ugre)|co(?:(?:br|l)a|co)|(?:arec|galh)a)|es-(?:(?:co(?:br|l)|arec|galh)a|b(?:a(?:tauá|nda)|ugre)))|gueira(?:s-(?:cobra|pecã)|-(?:cobra|pecã)))|a(?:(?:rciso(?:-(?:invern|cheir)|s-cheir)|valhas?-macac)o|nás?-raposa)|iqui(?:ns-(?:areia|saco)|m-areia)|ené(?:ns|m)-galinha|ós-cachorro)|u(?:va(?:s-(?:(?:(?:espin|fac)h|g(?:enti|al)|c(?:heir|ã)|urs)o|r(?:ato|ei)|praia|obó)|-(?:(?:(?:espin|fac)h|g(?:enti|al)|urs|cã)o|praia|obó|rei))|(?:irapurus?-band|xis?-morceg|bás?-fach)o|queté(?:s-(?:água|obó)|-(?:água|obó))|m(?:baranas?-abelha|iris?-che
iro)|apuçás?-coleira|ntués?-obó)|h(?:ortelã(?:s-(?:c(?:a(?:mpina|valo)|heiro)|b(?:urro|oi)|leite)|-(?:c(?:a(?:mpina|valo)|heiro)|b(?:urro|oi)|leite))|idras?-água)|o(?:liveira(?:s-(?:marrocos|cheiro)|-(?:marrocos|cheiro))|iti(?:s-porcoóleo-copaíbaóleos-copaíba|-porco)|stras?-pobre)|ç(?:ana-áçúcar|or-Rosa)|ébanos?-zanzibar|xexéu-bananeira|Grão-Bico))\b">
Contributor

🛠️ Refactor suggestion

Consider improving maintainability of the regex pattern.

The regex pattern is quite extensive and complex. Consider:

  1. Breaking it down into smaller, named sub-patterns
  2. Adding comments to explain different sections
  3. Using XML entities for common patterns

Example structure:

-<!ENTITY hyphenised_expressions "(?U)\b(?!feij(?:ão|ões)-frade)(...)">
+<!-- Common prefixes -->
+<!ENTITY prefix_ca "c(?:a(?:r(?:rap(?:icho(?:s-(?:c(?:a(?:(?:rneir|val)o|lçada)|igana)|(?:agu|ove)lha|l(?:inho|ã)|boi)|-(?:c(?:a(?:(?:rneir|val)o|lçada)|igana)|(?:agu|ove)lha|l(?:inho|ã)|boi))...">
+<!-- Common suffixes -->
+<!ENTITY suffix_common "(?:eiro|eira|eiros|eiras)">
+<!-- Main pattern -->
+<!ENTITY hyphenised_expressions "(?U)\b(?!feij(?:ão|ões)-frade)(&prefix_ca;|...)">

Committable suggestion was skipped due to low confidence.
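
As an illustration of the decomposition idea, the sketch below assembles a much smaller pattern of the same shape from named pieces. It is written in Java rather than as XML entities, and the class name, the two sample lists, and the `compound` helper are all hypothetical; the real entity holds far more alternatives. Non-ASCII letters are written as `\u` escapes so the file compiles regardless of source encoding.

```java
import java.util.List;
import java.util.regex.Pattern;

public class HyphenatedPatternSketch {
  // Hypothetical sample data: a few second elements for two head words.
  static final List<String> FEIJAO_TAILS = List.of("corda", "cavalo", "frade");
  static final List<String> ERVA_TAILS = List.of("cheiro", "cobra");

  // Singular and plural of the first head word (a-tilde + o, o-tilde + es).
  static final String FEIJAO = "feij(?:\u00e3o|\u00f5es)";

  // Builds "head-(?:tail1|tail2|...)" for one head word.
  static String compound(String head, List<String> tails) {
    return head + "-(?:" + String.join("|", tails) + ")";
  }

  static Pattern build() {
    String alternatives = String.join("|",
        compound(FEIJAO, FEIJAO_TAILS),
        compound("ervas?", ERVA_TAILS));
    // (?U) turns on UNICODE_CHARACTER_CLASS, so \b treats accented letters as word characters.
    return Pattern.compile("(?U)\\b(?!" + FEIJAO + "-frade)(?:" + alternatives + ")\\b");
  }

  public static void main(String[] args) {
    Pattern p = build();
    System.out.println(p.matcher("um feij\u00e3o-corda").find()); // true
    System.out.println(p.matcher("feij\u00e3o-frade").find());    // false, excluded by the lookahead
  }
}
```

Keeping each head word and its list in a separate constant makes a single entry reviewable in a diff, at the cost of losing the prefix factoring that the generated pattern performs.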

<afterbreak></afterbreak>
</rule>
<rule break="no">
-<beforebreak>(?U)\b(абз|арк|ауд|бл|буд|бульв|вул|держ|дод|зав|зб|зв|зовн|екон|к|кв|канд|кн|напр|нпр|нац|обл|оп|пл|пол|поч|пп|пор|просп|розд|стор|табл|[Тт]]ел|ч|част)\.[\h\v]*</beforebreak>
+<beforebreak>\b(абз|арк|ауд|бл|буд|бульв|вул|держ|дод|зав|зб|зв|зовн|екон|к|кв|канд|кн|напр|нпр|нац|обл|оп|пл|пол|поч|пп|пор|просп|розд|стор|табл|[Тт]]ел|ч|част)\.[\h\v]*</beforebreak>
Contributor

@coderabbitai coderabbitai bot Oct 29, 2024

⚠️ Potential issue

Typographical error in character class in regex pattern

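There's an extra closing bracket ']' after the character class in [Тт]]ел. Java's regex engine treats a ']' outside a character class as a literal, so the pattern still compiles, but this alternative then matches "Т]ел" or "т]ел" instead of "Тел" or "тел". The intended form is [Тт]ел.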

Apply this diff to fix the regex:

-<beforebreak>\b(абз|арк|ауд|бл|буд|бульв|вул|держ|дод|зав|зб|зв|зовн|екон|к|кв|канд|кн|напр|нпр|нац|обл|оп|пл|пол|поч|пп|пор|просп|розд|стор|табл|[Тт]]ел|ч|част)\.[\h\v]*</beforebreak>
+<beforebreak>\b(абз|арк|ауд|бл|буд|бульв|вул|держ|дод|зав|зб|зв|зовн|екон|к|кв|канд|кн|напр|нпр|нац|обл|оп|пл|пол|поч|пп|пор|просп|розд|стор|табл|[Тт]ел|ч|част)\.[\h\v]*</beforebreak>
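
The effect of the stray bracket can be checked with a small Java sketch. It uses a shortened, hypothetical version of the abbreviation list (only three alternatives) and keeps `(?U)` so that `\b` treats Cyrillic letters as word characters; the class and method names are made up for the example. Cyrillic letters are written as `\u` escapes so the file compiles regardless of source encoding.

```java
import java.util.regex.Pattern;

public class StrayBracketSketch {
  // Shortened list: "tabl", "[Tt]]el" (with the typo), "ch", all in Cyrillic.
  static final Pattern WITH_TYPO = Pattern.compile(
      "(?U)\\b(\u0442\u0430\u0431\u043b|[\u0422\u0442]]\u0435\u043b|\u0447)\\.");
  // Same list with the extra ']' removed.
  static final Pattern FIXED = Pattern.compile(
      "(?U)\\b(\u0442\u0430\u0431\u043b|[\u0422\u0442]\u0435\u043b|\u0447)\\.");

  static boolean found(Pattern pattern, String text) {
    return pattern.matcher(text).find();
  }

  public static void main(String[] args) {
    String tel = "\u0442\u0435\u043b.";          // "tel." in Cyrillic
    String telBracket = "\u0442]\u0435\u043b.";  // the same with a literal ']' after the first letter
    System.out.println(found(WITH_TYPO, tel));        // false
    System.out.println(found(WITH_TYPO, telBracket)); // true
    System.out.println(found(FIXED, tel));            // true
  }
}
```

Both patterns compile, which is why a typo of this kind does not surface as an error when the rules are loaded.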

Member

@arysin There's a typo in the regex here (]])

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@danielnaber
Member

Thanks. It's good to be compatible with JDK19+, but we shouldn't make it a requirement, as Grails (used for community.languagetool.org) won't work with Java > 17 yet (grails/grails-core#13522).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bd299bd and 47c9805.

📒 Files selected for processing (2)
  • languagetool-commandline/src/test/java/org/languagetool/commandline/AbstractSecurityTestCase.java (0 hunks)
  • languagetool-commandline/src/test/java/org/languagetool/commandline/MainTest.java (5 hunks)
💤 Files with no reviewable changes (1)
  • languagetool-commandline/src/test/java/org/languagetool/commandline/AbstractSecurityTestCase.java
🔇 Additional comments (3)
languagetool-commandline/src/test/java/org/languagetool/commandline/MainTest.java (3)

108-111: ⚠️ Potential issue

Ensure setup and teardown processes remain effective after superclass removal.

With the removal of extends AbstractSecurityTestCase, the setUp() and tearDown() methods no longer call super.setUp() and super.tearDown(). Verify that any essential initialization or cleanup previously handled by the superclass is now adequately managed within these methods to prevent any unintended side effects during test execution.

Also applies to: 118-120


131-131: ⚠️ Potential issue

Confirm that the exit code 1 for the help command is intentional.

The test expects an exit code of 1 when displaying the usage message with -h. Typically, help commands exit with status code 0 to indicate successful execution. Verify whether returning 1 is intended behavior for this application or if it should be updated to 0.

Run the following script to check the actual exit code when running the help command:

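#!/bin/bash
# Description: Execute the help command and display the exit code.
# Assumes the module is already compiled and the script runs from languagetool-commandline/.

# Let Maven write the dependency classpath; globbing ~/.m2/repository/* does not produce a usable classpath.
mvn -q dependency:build-classpath -Dmdep.outputFile=cp.txt
# Expected result: Exit code should be 0 if the help command executed successfully.
java -cp "target/classes:$(cat cp.txt)" org.languagetool.commandline.Main -h
echo "Exit code: $?"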

37-37: ⚠️ Potential issue

Verify the impact of removing extends AbstractSecurityTestCase.

The MainTest class no longer extends AbstractSecurityTestCase. This superclass may have provided important functionality, such as handling System.exit calls during tests to prevent the JVM from exiting. Ensure that this change doesn't adversely affect the test behavior and that any necessary functionality from the superclass is replicated or no longer required.

Run the following script to check for any reliance on AbstractSecurityTestCase methods:

Comment on lines +125 to +131
Process process = new ProcessBuilder(
"java", "-cp", System.getProperty("java.class.path"), "org.languagetool.commandline.Main", "-h"
).start();
int exitCode = process.waitFor();
String output = readProcessOutput(process);
assertTrue(output.contains("Usage: java -jar languagetool-commandline.jar"));
assertEquals("Exit status", 1, exitCode);
Contributor

🛠️ Refactor suggestion

Capture and assert the error stream when running subprocesses.

In the test method, only the standard output from the subprocess is captured. To ensure comprehensive testing, consider capturing the error stream (process.getErrorStream()) as well. This allows you to assert that no unexpected errors are occurring during execution and helps in diagnosing issues that may not appear in standard output.

Modify the test to read and assert the error output:

String errorOutput = readProcessError(process);
assertTrue("Error output should be empty", errorOutput.isEmpty());

And add a method to read the error stream:

private String readProcessError(Process process) throws IOException {
  try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getErrorStream()))) {
    StringBuilder output = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
      output.append(line).append(System.lineSeparator());
    }
    return output.toString();
  }
}
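
One way to capture both streams without one read blocking the other is to drain them concurrently. The following is a minimal sketch, not the PR's code: `ProcessCaptureSketch`, `Result`, `drain`, and `run` are hypothetical names, and `readAllBytes()` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ProcessCaptureSketch {
  /** Hypothetical holder for what the child process produced. */
  static final class Result {
    final int exitCode;
    final String stdout;
    final String stderr;

    Result(int exitCode, String stdout, String stderr) {
      this.exitCode = exitCode;
      this.stdout = stdout;
      this.stderr = stderr;
    }
  }

  // Reads a stream to the end and closes it.
  static String drain(InputStream in) {
    try (InputStream stream = in) {
      return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static Result run(List<String> command) {
    try {
      Process process = new ProcessBuilder(command).start();
      // Drain both pipes on background threads so neither can fill up and stall the child.
      CompletableFuture<String> out = CompletableFuture.supplyAsync(() -> drain(process.getInputStream()));
      CompletableFuture<String> err = CompletableFuture.supplyAsync(() -> drain(process.getErrorStream()));
      int exitCode = process.waitFor();
      return new Result(exitCode, out.join(), err.join());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the child process", e);
    }
  }
}
```

Keeping the two streams separate lets a test decide per command whether stderr is expected to be empty. For example, `java -version` writes its banner to stderr, so a blanket "stderr must be empty" assertion would fail for it.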

Comment on lines +659 to +668
private String readProcessOutput(Process process) throws IOException {
  try (BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
    StringBuilder output = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
      output.append(line).append(System.lineSeparator());
    }
    return output.toString();
  }
}
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Prevent potential deadlocks by consuming both output streams of the subprocess.

The readProcessOutput method reads only from the subprocess's standard output stream. If the subprocess writes enough data to the error stream without it being read, it can block due to the buffer being full, leading to a deadlock. To mitigate this risk, read from both the standard output and error streams.

Refactor the method to consume both streams:

private String readProcessOutput(Process process) throws IOException {
  StringBuilder output = new StringBuilder();
  try (
    BufferedReader stdOutReader = new BufferedReader(new InputStreamReader(process.getInputStream()));
    BufferedReader stdErrReader = new BufferedReader(new InputStreamReader(process.getErrorStream()))
  ) {
    String line;
    while ((line = stdOutReader.readLine()) != null) {
      output.append(line).append(System.lineSeparator());
    }
    while ((line = stdErrReader.readLine()) != null) {
      // Optionally append to output or handle separately
      output.append(line).append(System.lineSeparator());
    }
  }
  return output.toString();
}
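
Note that reading stdout to the end and only then reading stderr, as in the suggestion above, can still stall if the child fills the stderr pipe while the parent is blocked on stdout. The same applies when `waitFor()` is called before any output is read. A simpler alternative, sketched below under the assumption that the test does not need to tell the two streams apart, is to merge stderr into stdout with `ProcessBuilder.redirectErrorStream(true)` and read the single stream before waiting. `MergedOutputSketch`, `Result`, and `run` are hypothetical names, and `readAllBytes()` assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class MergedOutputSketch {
  /** Hypothetical holder for the exit code and the merged output. */
  static final class Result {
    final int exitCode;
    final String output;

    Result(int exitCode, String output) {
      this.exitCode = exitCode;
      this.output = output;
    }
  }

  static Result run(List<String> command) {
    try {
      Process process = new ProcessBuilder(command)
          .redirectErrorStream(true) // stderr is written to the same pipe as stdout
          .start();
      String output;
      // Read until the child closes its end of the pipe, then collect the exit code.
      try (InputStream in = process.getInputStream()) {
        output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
      int exitCode = process.waitFor();
      return new Result(exitCode, output);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the child process", e);
    }
  }
}
```

With only one pipe to drain, no extra threads are needed; the trade-off is that the test can no longer assert on stderr separately.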
