Cb changes 20241030 #10974

tiff · 2024-10-30T14:54:35Z

Summary by CodeRabbit

Release Notes

New Features
- Expanded vocabulary for German and English spell-checking, including proper nouns, compound words, and modern vernacular.
- New rules added for improved punctuation handling in greetings for German and unnecessary punctuation in French.
- New entries added to the ignore lists for both German and English, covering a variety of terms and acronyms.
- Enhanced multi-token suggestions for German and English to support more phrases.
Bug Fixes
- Enhanced disambiguation rules for Spanish and French to improve accuracy in grammar checking.

These enhancements aim to improve the overall user experience by providing more accurate language processing and spell-checking capabilities.

coderabbitai · 2024-10-30T14:54:45Z

Walkthrough

The pull request introduces a significant expansion of LanguageTool's vocabulary across multiple language modules. Key changes include the addition of numerous proper nouns to spelling_global.txt, updates to ignore lists and spelling dictionaries in both German and English, and the introduction of new rules in the disambiguation XML files for Spanish and French. These modifications aim to enhance the tool's ability to recognize and process a broader range of terms and linguistic constructs without altering existing structures.

Changes

File Path	Change Summary
languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt	Added a large number of proper nouns related to various topics, including many Olympic entries.
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt	Added two new entries: `IZF` and `Martijn/S #name`.
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt	Added multiple new compound words and variations related to finance and other terms.
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt	Added new multi-token suggestions: `Fair Market Value/S`, `Bami Goreng/S`, `Sambal Ulek/S`, `Sambal Oelek/S`.
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml	Introduced a new rule for handling greetings followed by a comma.
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt	Added numerous entries, including `MBR`, `QBR`, `navbar`, and `Martijn`.
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt	Added new words and phrases, including `outgroup` and `ingroup`, and various scientific terms.
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt	Added various multi-word terms with corresponding part-of-speech tags, including `sambal ulek`.
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml	Modified the `TALK_AT_TO` rule, adding antipatterns and changing its default state to "off".
languagetool-language-modules/es/src/main/resources/org/languagetool/resource/es/disambiguation.xml	Added multiple new rules and modified existing ones to improve disambiguation in Spanish.
languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/disambiguation.xml	Added new rules for handling nominal groups and ambiguous cases, and removed some unnecessary rules.
languagetool-language-modules/fr/src/main/resources/org/languagetool/rules/fr/remote-rule-filters.xml	Introduced a new rule group for unnecessary punctuation handling.

Possibly related PRs

Cb changes 20240830 #10856: This PR adds numerous proper nouns to spelling_global.txt, similar to the main PR's focus on expanding proper nouns in the same file.
Cb changes 20240905 #10865: Similar to the main PR, this PR also adds new entries to spelling_global.txt, enhancing the vocabulary for spell-checking.
[de] add words + gGEC AP + removed currencies #10961: This PR adds new entries to ignore.txt, which may relate to the overall expansion of recognized terms in the language tool, aligning with the main PR's goal of enhancing the knowledge base.

Suggested reviewers

jaumeortola
p-goulart
susanaboatto

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (9)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (1)

65220-65223: LGTM: Financial terms with proper formatting

The financial instrument terms follow proper German compound noun formation rules with correct hyphenation and capitalization. The /S and /SN suffixes are appropriately used for singular and plural forms.

Consider adding other common variants like "Hebel-Bear-Zertifikat" for completeness, as these typically come in pairs in financial contexts.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (1)

104447-104458: LGTM! Consider adding examples in comments

The antipatterns effectively handle common cases that should not trigger the rule. The skip="6" flexibility in the "give talks" pattern is particularly useful for various phrasings.

Consider adding XML comments with example matches for each antipattern to improve maintainability:

      <antipattern>
+       <!-- Matches possessive forms like "John's" -->
        <token postag="NNP" />
        <token>'s</token>
      </antipattern>
      <antipattern>
+       <!-- Matches compound proper nouns like "John Smith" -->
        <token postag="NNP" />
        <token case_sensitive="yes" regexp="yes">[A-Z].*</token>
      </antipattern>
      <antipattern>
+       <!-- Matches phrases like "give an interesting talk" -->
        <token inflected="yes" skip="6">give</token>
        <token regexp="yes">talks?</token>
      </antipattern>

languagetool-language-modules/fr/src/main/resources/org/languagetool/rules/fr/remote-rule-filters.xml (2)

741-754: LGTM! The new rule group for handling unnecessary punctuation is well-structured.

The implementation follows the existing pattern structure and correctly handles unnecessary punctuation marks at sentence endings and standalone quotation marks.

Consider enhancing the rules by:

Adding more examples to cover different scenarios
Including message attributes to provide specific error messages

 <rulegroup id="AI_FR_GGEC_UNNECESSARY.*" name="">
     <rule>
         <pattern>
             <token postag="SENT_END" regexp="yes">[","]</token>
         </pattern>
+        <message>Remove unnecessary punctuation mark at the end of the sentence.</message>
         <example correction="">Je suis Chris.<marker>"</marker></example>
+        <example correction="">L'histoire continue,<marker>,</marker></example>
     </rule>
     <rule>
         <pattern>
             <token>"</token>
         </pattern>
+        <message>Remove standalone quotation mark.</message>
         <example correction="">Je suis Chris.<marker>"</marker></example>
+        <example correction="">Il a dit<marker>"</marker>bonjour.</example>
     </rule>
 </rulegroup>

744-744: Consider expanding the pattern to catch more cases of unnecessary punctuation.

The current pattern only catches comma and quotation marks. Consider including other punctuation marks that might be unnecessarily repeated.

-    <token postag="SENT_END" regexp="yes">[","]</token>
+    <token postag="SENT_END" regexp="yes">[",";:!?]</token>

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1)

3113-3114: Consider consolidating the Sambal variants.

"Sambal Ulek" and "Sambal Oelek" refer to the same Indonesian chili paste. Consider keeping only the more common spelling variant to maintain consistency and reduce redundancy in the dictionary.

languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1)

2453-2462: Consider enhancing the greeting pattern coverage.

The rule implementation for handling commas after greetings is logically sound. However, consider these improvements:

Add support for additional greeting variations like "Schönen" or "Herzlichen".
Consider reducing the skip="2" to be more precise and avoid potential false positives.

Apply this diff to enhance the pattern coverage:

             <rule>
                 <pattern>
-                    <token>Guten</token>
-                    <token skip="2">Morgen|Abend|Tag</token>
+                    <token regexp="yes">Guten|Schönen|Herzlichen</token>
+                    <token skip="1">Morgen|Abend|Tag</token>
                     <marker>
                         <token postag="SENT_END">,</token>
                     </marker>
                 </pattern>
                 <example correction="">Guten Morgen Frank<marker>,</marker></example>
+                <example correction="">Schönen Tag Herr Schmidt<marker>,</marker></example>
             </rule>

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (1)

26704-26710: Consider standardizing the Brazilian league entries.

The Brazilian league is listed with three variations. While all are valid, consider standardizing to the official name to maintain consistency.

-Série A
-Campeonato Brasileiro Série A
-Brasileirão Série A
+Campeonato Brasileiro Série A

languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/disambiguation.xml (1)

11217-11225: Add examples and consider refining the acronym pattern.

The rule correctly handles acronyms at sentence start, but could benefit from some improvements:

Add example cases in XML comments for better maintainability:

 <rule>
+    <!-- Examples:
+         Matches: SNCF, RATP, CNRS
+         Non-matches: ABC123, XYZabc -->
     <pattern case_sensitive="yes">

Consider making the pattern more specific to known French acronym patterns:

-    <token regexp="yes" postag="SENT_END">[BCDFGHJKLMNPQRSTVWXZ]{3,6}s?</token>
+    <token regexp="yes" postag="SENT_END">(?:SNCF|RATP|CNRS|[BCDFGHJKLMNPQRSTVWXZ]{3,6})s?</token>

This would help prevent false positives while still maintaining flexibility for common French acronyms.

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml (1)

16392-16400: Consider refining the acronym pattern to reduce false positives.

The rule correctly handles common acronym patterns at sentence start, but the current regexp [BCDFGHJKLMNPQRSTVWXZ]{3,6}s? might be too permissive and could match invalid consonant sequences.

Consider these improvements:

-                    <token regexp="yes" postag="SENT_END">[BCDFGHJKLMNPQRSTVWXZ]{3,6}s?</token>
+                    <token regexp="yes" postag="SENT_END">(?:UNESCO|NATO|NASA|[A-Z][BCDFGHJKLMNPQRSTVWXZ]{2,5})s?</token>

This would:

Explicitly allow common acronyms
Require at least one valid letter start
Still maintain flexibility for other valid acronyms

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 42f2c0e and 2830bf9.

📒 Files selected for processing (14)

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/disambiguation.xml (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (2 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (2 hunks)
languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt (1 hunks)
languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (1 hunks)
languagetool-language-modules/es/src/main/resources/org/languagetool/resource/es/disambiguation.xml (1 hunks)
languagetool-language-modules/fr/src/main/resources/org/languagetool/resource/fr/disambiguation.xml (1 hunks)
languagetool-language-modules/fr/src/main/resources/org/languagetool/rules/fr/remote-rule-filters.xml (1 hunks)

🔥 Files not summarized due to errors (2)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/disambiguation.xml: Error: Server error: no LLM provider could handle the message
languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/disambiguation.xml: Error: Server error: no LLM provider could handle the message

✅ Files skipped from review due to trivial changes (1)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/ignore.txt

🔇 Additional comments (11)

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt (2)

62885-62889: LGTM: Correct comparative adjective forms

The added comparative forms of "strukturiert" follow proper German adjective declension rules and cover all necessary grammatical cases.

65217-65219: LGTM: Valid compound adjective formations

The new compound adjectives follow proper German word formation rules and are marked correctly with the /A suffix for adjective variations.

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml (1)

104460-104462: LGTM! Pattern changes improve precision

The addition of chunk_re attributes makes the rule more accurate by considering phrase boundaries. The pattern will now better handle complex sentence structures.

Let's verify there are no conflicting chunk patterns in other rules:

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/spelling.txt (2)

Line range hint 1-4: LGTM! Clear documentation of file purpose.

The header comment effectively explains:

The file's purpose (extending spell checker)
Its relationship with ignore.txt
Guidelines for handling multi-token phrases

493-495: LGTM! Valid additions of non-hyphenated variants.

The additions of "outgroup" and "ingroup" complement the existing hyphenated forms ("out-group" and "in-group"). Both variants are widely accepted in academic and professional writing, particularly in social psychology and sociology.

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/multitoken-suggest.txt (1)

3091-3091: LGTM! The additions follow the established pattern.

The new entries are correctly formatted and appropriately use the /S suffix for plural forms in German text. They represent commonly used terms in financial and culinary contexts.

Also applies to: 3112-3114

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/hunspell/ignore.txt (1)

10932-10939: LGTM! The additions are appropriate for the spellchecker ignore list.

The new entries include common technical abbreviations (MBR, QBR, EOW), UI elements (navbar), and proper nouns (Martijn). These are valid terms that should be ignored during spell checking.

languagetool-language-modules/en/src/main/resources/org/languagetool/resource/en/multiwords.txt (2)

Line range hint 1-5: LGTM! Clear and comprehensive header comments.

The file header provides excellent guidance on:

The file's purpose for English disambiguation
Clear criteria for adding terms
Case-sensitivity rules
Relationship with spelling.txt

8027-8027: LGTM! Correct POS tagging for culinary term.

The term "sambal ulek" is correctly tagged as NN:U (uncountable noun) as it refers to an Indonesian chili paste that is typically used as a mass noun.

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt (1)

26704-26710: LGTM for the remaining entries!

The other additions (Brandeis University, Carl Boese, Le Méridien, Albert Heijn, Tiff Macklem, Carel Fabritius) are valid proper nouns that will enhance the spell checker's vocabulary.

languagetool-language-modules/de/src/main/resources/org/languagetool/resource/de/disambiguation.xml (1)

6037-6045: Rule looks good, consider pattern refinements.

The rule appropriately handles capitalized consonant sequences (acronyms/initialisms) at sentence start. The ignore_spelling action is suitable for this case.

A few suggestions to consider:

The pattern could be more specific to avoid matching invalid consonant combinations
Consider adding a test case to verify the rule behaves correctly with various inputs

Let's verify the test coverage:

coderabbitai · 2024-10-30T15:01:41Z

...ol-language-modules/de/src/main/resources/org/languagetool/resource/de/hunspell/spelling.txt

@@ -62879,6 +62879,14 @@ hinaus_gleiten
 heraus_flutschen
 raus_flutschen
 hinaus_flutschen
+hinein_flutschen
+herin_flutschen


⚠️ Potential issue

Fix typo in compound verb formation

The word "herin_flutschen" appears to contain a typo. The correct prefix should be "herein" instead of "herin".

-herin_flutschen +herein_flutschen

Committable suggestion was skipped due to low confidence.

coderabbitai · 2024-10-30T15:01:41Z

languagetool-language-modules/en/src/main/resources/org/languagetool/rules/en/grammar.xml

@@ -104442,12 +104442,24 @@ USA
      <url>https://dictionary.cambridge.org/grammar/british-grammar/prefer</url>
      <example correction='prefers overalls to skirts'>She much <marker>prefers overalls instead of skirts</marker>, because she enjoys playing in the mud</example>
    </rule>
-    <rule id='TALK_AT_TO' name='Collocation: Talk at/to'>
+    <rule id='TALK_AT_TO' name='Collocation: Talk at/to' default="off">


💡 Codebase verification

Setting default="off" is a new change that requires attention

The git history shows this is a recent change where default="off" was explicitly added to the rule. This modification makes the "TALK_AT_TO" rule opt-in rather than enabled by default, which is a significant behavioral change that could affect users' grammar checking experience.

The rule was previously enabled by default since its creation in 2017

The change coincides with the addition of new antipatterns for proper nouns

This suggests the rule might have had false positives, leading to it being disabled by default

🔗 Analysis chain

Verify the impact of setting default="off"

Setting the rule to disabled by default is a significant change that affects all users. While this can help prevent false positives, we should ensure this aligns with user expectations.

Let me search for any previous versions or history of this rule to verify if this is a change in default state.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Check if this rule was previously enabled by default rg -B 5 -A 5 'id=.TALK_AT_TO.' --glob '*.xml' | grep -B 5 -A 5 'default='

Length of output: 1661

Script:

#!/bin/bash # Search git history for changes to this rule git log -p | grep -A 10 -B 10 'TALK_AT_TO.*default='

Length of output: 1242

coderabbitai · 2024-10-30T15:01:41Z

...etool-language-modules/es/src/main/resources/org/languagetool/resource/es/disambiguation.xml

+        <rule>
+            <pattern case_sensitive="yes">
+                <token postag="SENT_START" />
+                <marker>
+                    <token regexp="yes" postag="SENT_END">[BCDFGHJKLMNPQRSTVWXZ]{3,6}s?</token>
+                </marker>
+            </pattern>
+            <disambig action="ignore_spelling"/>
+        </rule>


🛠️ Refactor suggestion

Consider enhancing the abbreviation rule pattern.

The rule for handling uppercase abbreviations at sentence end is good, but could be improved:

The pattern could be more specific to avoid false positives

Documentation would help explain the rule's purpose

Consider this enhanced version:

- <rule> - <pattern case_sensitive="yes"> - <token postag="SENT_START" /> - <marker> - <token regexp="yes" postag="SENT_END">[BCDFGHJKLMNPQRSTVWXZ]{3,6}s?</token> - </marker> - </pattern> - <disambig action="ignore_spelling"/> - </rule> +  + <rule id="uppercase_abbrev_sent_end" name="uppercase abbreviations at sentence end"> + <pattern case_sensitive="yes"> + <token postag="SENT_START" /> + <marker> +  + <token regexp="yes" postag="SENT_END">(?:[BCDFGHJKLMNPQRSTVWXZ]{3,6})s?</token> + </marker> + </pattern> + <disambig action="ignore_spelling"/> + </rule>

The improvements include:

Added descriptive comment explaining the rule's purpose

Added rule ID and name for better maintainability

Enhanced regexp pattern with non-capturing group

Added inline comment explaining the pattern

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

<rule>

<pattern case_sensitive="yes">

<token postag="SENT_START" />

<marker>

<token regexp="yes" postag="SENT_END">[BCDFGHJKLMNPQRSTVWXZ]{3,6}s?</token>

</marker>

</pattern>

<disambig action="ignore_spelling"/>

</rule>



<rule id="uppercase_abbrev_sent_end" name="uppercase abbreviations at sentence end">

<pattern case_sensitive="yes">

<token postag="SENT_START" />

<marker>



<token regexp="yes" postag="SENT_END">(?:[BCDFGHJKLMNPQRSTVWXZ]{3,6})s?</token>

</marker>

</pattern>

<disambig action="ignore_spelling"/>

</rule>

coderabbitai · 2024-10-30T15:01:42Z

languagetool-core/src/main/resources/org/languagetool/resource/spelling_global.txt

+Carl Boese
+Série A
+Campeonato Brasileiro Série A
+Brasileirão Série A
+Le Méridien
+Albert Heijn
+Tiff Macklem


💡 Codebase verification

Redundant name entries confirmed - consolidation needed

Found duplicate entries in spelling_global.txt:

"Thomas Chatterton Williams" and "Thomas Chatterton"

"Frank Ramsey" and "Frank P. Ramsey"

Please keep only the full name versions:

"Thomas Chatterton Williams"

"Frank P. Ramsey"

This will maintain consistency while preserving the most complete form of each name.

🔗 Analysis chain

Review redundant name entries.

Some names have multiple variations that might need consolidation:

"Thomas Chatterton Williams" and "Thomas Chatterton"

"Frank Ramsey" and "Frank P. Ramsey"

Consider keeping only the most commonly used or full versions to avoid potential confusion in spell checking.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Check for other instances of these names in the codebase # to ensure consistent usage echo "Checking for existing name variations..." rg "Thomas Chatterton|Frank.*Ramsey" --type txt

Length of output: 546

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a08d58b and 93ba5ee.

📒 Files selected for processing (1)

languagetool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml (1 hunks)

coderabbitai · 2024-10-30T15:45:18Z

...ool-language-modules/de/src/main/resources/org/languagetool/rules/de/remote-rule-filters.xml

+            <rule>
+                <pattern>
+                    <token>Guten</token>
+                    <token skip="2" regexp="yes">Morgen|Abend|Tag</token>
+                    <marker>
+                        <token postag="SENT_END">,</token>
+                    </marker>
+                </pattern>
+                <example correction="">Guten Morgen Frank<marker>,</marker></example>
+            </rule>


💡 Codebase verification

Overlapping rules need to be consolidated

The new rule in remote-rule-filters.xml conflicts with existing rules that handle greeting punctuation:

GUTEN_TAG_KOMMA in grammar.xml is active and handles comma placement after greetings when followed by questions/phrases

ANREDE_KOMMA_2 in de-DE-AT/grammar.xml appears to be commented out (ends with -->)

The new rule should be removed or consolidated with GUTEN_TAG_KOMMA because:

Both rules target the same greeting patterns ("Guten Tag/Morgen/Abend")

GUTEN_TAG_KOMMA is more comprehensive, handling various follow-up phrases

Having multiple rules for the same pattern can lead to inconsistent corrections

🔗 Analysis chain

LGTM: New rule for greeting punctuation looks good.

The rule correctly handles comma placement after formal German greetings ("Guten Morgen", "Guten Abend", "Guten Tag"). The pattern matching is well-structured and the example demonstrates proper usage.

Let's verify there are no conflicting rules:

Potential conflict with existing greeting punctuation rules detected

The search results reveal two potentially conflicting rules in the German grammar files:

ANREDE_KOMMA_2 rule in de-DE-AT/grammar.xml specifically handles missing commas after greeting formulas

GUTEN_TAG_KOMMA rule in grammar.xml handles comma placement after "Guten Tag" followed by "wie gehts"

Let's verify if these rules are actually active and how they interact with our new rule:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Description: Check for any existing rules that might conflict with the new greeting punctuation rule # Look for patterns containing "Guten" and greeting words rg -A 5 'Guten\s+(Morgen|Abend|Tag)' --type xml

Length of output: 9586

Script:

#!/bin/bash # Check if the potentially conflicting rules are commented out and get their full context rg -B 2 -A 10 'id="(ANREDE_KOMMA_2|GUTEN_TAG_KOMMA)"' --type xml

Length of output: 3893

tiff added 6 commits October 30, 2024 15:53

[core] add global spelling

1aa6b0e

[de] improve rules

d62f21c

[de] add spelling

53ebb19

[en] improve rules

897c016

[es] improve disambig rule

40dcf74

[fr] improve rules

2830bf9

coderabbitai bot reviewed Oct 30, 2024

View reviewed changes

tiff added 2 commits October 30, 2024 16:21

[de] fix typo, thanks @coderabbitai

a08d58b

[de] fix antipattern

93ba5ee

coderabbitai bot reviewed Oct 30, 2024

View reviewed changes

tiff merged commit 0ac604f into master Oct 30, 2024
5 checks passed

coderabbitai bot mentioned this pull request Nov 5, 2024

Cb changes 20241105 #10987

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cb changes 20241030 #10974

Cb changes 20241030 #10974

tiff commented Oct 30, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 30, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Oct 30, 2024

coderabbitai bot Oct 30, 2024

coderabbitai bot Oct 30, 2024

coderabbitai bot Oct 30, 2024

coderabbitai bot left a comment

coderabbitai bot Oct 30, 2024

Cb changes 20241030 #10974

Cb changes 20241030 #10974

Conversation

tiff commented Oct 30, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Oct 30, 2024 • edited Loading

Walkthrough

Changes

Possibly related PRs

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 30, 2024

Choose a reason for hiding this comment

coderabbitai bot Oct 30, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Oct 30, 2024

Choose a reason for hiding this comment

tiff commented Oct 30, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 30, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)