jWordSplitter Change Log

2023-xx-yy (4.8-SNAPSHOT)

...

2023-09-21 (4.7)

requires Java 1.8 or later now
extended the dictionary

2023-09-16 (4.6)

extended the dictionary
load compound parts from a plain text file in the JAR, not a binary file (it's even faster, about the same size and makes development easier)

2021-06-15 (4.5)

extended the dictionary
added several exceptions

2018-05-05 (4.4)

New method AbstractWordSplitter.setMaximumWordLength() that sets a maximum length for the input: splitting a longer word throws an InputTooLongException, which avoids excessively long processing times for artificial compounds. NOTE: The default is 70 and this is a change in behavior - there used to be no limit.
The recursive method that does most work now properly checks if the current thread is interrupted using Thread.interrupted() and throws a RuntimeException if so.
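
The two 4.4 changes can be illustrated with a minimal, self-contained sketch. It is not jWordSplitter's source code: the class LengthGuardSketch, its toy dictionary, and the nested stand-in for InputTooLongException (assumed here to be an unchecked exception) are hypothetical. Only the default of 70, the method name setMaximumWordLength() and the use of Thread.interrupted() come from the entries above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; this is not jWordSplitter's source code.
class LengthGuardSketch {

    // Stand-in for the library's InputTooLongException (assumed unchecked here).
    static class InputTooLongException extends RuntimeException {
        InputTooLongException(String message) {
            super(message);
        }
    }

    private final Set<String> dictionary;
    private int maximumWordLength = 70; // default named in the change log

    LengthGuardSketch(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    void setMaximumWordLength(int maximumWordLength) {
        this.maximumWordLength = maximumWordLength;
    }

    List<String> splitWord(String word) {
        // Reject over-long input before any recursion starts.
        if (word.length() > maximumWordLength) {
            throw new InputTooLongException(
                    "Input longer than " + maximumWordLength + " characters");
        }
        List<String> parts = new ArrayList<>();
        boolean found = split(word.toLowerCase(Locale.ROOT), parts);
        // Words that cannot be decomposed are returned unchanged.
        return found ? parts : Collections.singletonList(word);
    }

    // Recursive search that tries the longest dictionary prefix first
    // and backtracks if the remainder cannot be decomposed.
    private boolean split(String rest, List<String> parts) {
        // Thread.interrupted() also clears the thread's interrupt flag.
        if (Thread.interrupted()) {
            throw new RuntimeException("Interrupted while splitting");
        }
        if (rest.isEmpty()) {
            return true;
        }
        for (int end = rest.length(); end > 0; end--) {
            String head = rest.substring(0, end);
            if (dictionary.contains(head)) {
                parts.add(head);
                if (split(rest.substring(end), parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(
                Arrays.asList("sauer", "stoff", "sauerstoff", "flasche"));
        LengthGuardSketch splitter = new LengthGuardSketch(dict);
        System.out.println(splitter.splitWord("Sauerstoffflasche"));

        splitter.setMaximumWordLength(5);
        try {
            splitter.splitWord("Sauerstoffflasche");
        } catch (InputTooLongException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the length once at the entry point keeps the recursive method simple, while the interrupt check inside the recursion lets a caller cancel a long-running split from another thread.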

2018-01-27 (4.3)

Fix: Whenever we split a word, we should first check whether it is a defined split exception (#18)

2017-09-10 (4.2)

New method getSubWords() that also gets the shortest matches. For example, Sauerstoffflasche will get Sauer, stoff, Sauerstoff, flasche. Thanks to GitHub user Tobulus.
Added new constructor GermanWordSplitter(boolean hideInterfixCharacters, Set<String> words) (issue #11)
Fixed an UnsupportedOperationException that could occur in non-strict mode

2015-05-09 (4.1)

extended the dictionary
added several exceptions
constructor GermanWordSplitter(boolean hideInterfixCharacters) now makes sure the dictionary is read only once
fixed getAllSplits() to properly obey minimum word length

2015-04-20 (4.0)

requires Java 1.7 or later
moved classes to package de.danielnaber.jwordsplitter
rewrote algorithm to make it simpler, slightly faster and more correct: it now always returns the longest match
default minimum length of the compound parts is now 3, leading to many more decompositions
dictionary update
internal: binary dictionary not part of repository anymore. The source files are in the repository now and everything can be built with build.sh.

2012-09-24 (3.4)

Important note for users who extend AbstractWordSplitter: getConnectingCharacters() must return lowercase characters now
added new constructor: GermanWordSplitter(boolean hideConnectingCharacters, InputStream plainTextDict)
added several words to the list of exceptions
about 40% speedup if you split a lot of words (for many use cases, time of initialization will probably still outweigh runtime)
Fixed an UnsupportedOperationException that could occur when strict mode is false (which is the default)
Fixed the JAR so starting it with java -jar jWordSplitter-x.y.jar <filename> works again
Removed EnglishWordSplitter as its dictionary was not included anyway. Extend AbstractWordSplitter if you want to add support for languages other than German.

2011-12-16 (3.3)

renamed jWordSplitter in package path to jwordsplitter to be in accordance with Java conventions
New file exceptionsGerman.txt added to JAR that contains special cases to override the algorithm. With this we now get a correct split for e.g. Klimasünderecke -> Klima + Sünder + Ecke (used to be: Klima + Sünde + Recke)
new method addException() to set the desired splitting for words, without touching the dictionary
small dictionary cleanup (e.g. removing some three-letter words)
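
How an exception list can take precedence over the dictionary-based algorithm is sketched below. This is a hypothetical illustration, not the library's implementation: the class SplitExceptionSketch, the two-argument addException() signature and the toy dictionary (which deliberately lacks "sünder") are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; this is not jWordSplitter's source code.
class SplitExceptionSketch {

    private final Set<String> dictionary;
    // Maps a lowercased word to the split that should always be used for it.
    private final Map<String, List<String>> exceptions = new HashMap<>();

    SplitExceptionSketch(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    // Hypothetical signature; the library's addException() may look different.
    void addException(String word, List<String> parts) {
        exceptions.put(word.toLowerCase(Locale.ROOT), parts);
    }

    List<String> splitWord(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        // A registered exception wins before the dictionary is consulted.
        List<String> fixed = exceptions.get(key);
        if (fixed != null) {
            return fixed;
        }
        List<String> parts = new ArrayList<>();
        return split(key, parts) ? parts : Collections.singletonList(word);
    }

    // Toy dictionary lookup: longest prefix first, with backtracking.
    private boolean split(String rest, List<String> parts) {
        if (rest.isEmpty()) {
            return true;
        }
        for (int end = rest.length(); end > 0; end--) {
            String head = rest.substring(0, end);
            if (dictionary.contains(head)) {
                parts.add(head);
                if (split(rest.substring(end), parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1); // backtrack
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(
                Arrays.asList("klima", "sünde", "recke", "ecke"));
        SplitExceptionSketch splitter = new SplitExceptionSketch(dict);

        // Without an exception the toy dictionary yields the unwanted split.
        System.out.println(splitter.splitWord("Klimasünderecke"));

        // Registering the desired split leaves the dictionary untouched.
        splitter.addException("Klimasünderecke",
                Arrays.asList("Klima", "Sünder", "Ecke"));
        System.out.println(splitter.splitWord("Klimasünderecke"));
    }
}
```

Keeping the exceptions in a separate map means a wrong split can be corrected for a single word without adding or removing dictionary entries that other compounds depend on.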

2011-02-06 (3.2)

now built with Maven, no other changes

2010-09-18 (3.1)

fixed a bug: compound parts that ended in "s" caused the splitting not to work

2009-10-25 (3.0)

using generics (i.e. at least Java 1.5 is required now for jWordSplitter)
AbstractWordSplitter.splitWords(String) is now called AbstractWordSplitter.splitWord(String)
slightly better handling of German compounds with hyphens
small extensions to the dictionary

2008-11-09 (2.2)

in strict mode the minimum length of words is no longer ignored
major dictionary update: using a smaller dictionary with fewer words but hopefully better quality (it's the one that was used for LanguageTool already)

2008-07-01 (2.1)

now distributed as a ZIP
simplified output format of TestjWordSplitterGerman
comes with a larger German dictionary
ImportTxtFile is now called SerializeDict

2007-05-12 (2.0)

The JAR can now be used directly to split a list of German words: java -jar jWordSplitter.jar <filename>

2007-05-08

checked in original version from Sven Abels
removed log4j dependency
improved exception handling, i.e. exceptions aren't caught and logged but thrown
code cleanup
hardcoded three examples of German compound parts that don't exist as stand-alone words ("miet-", "grenz-", "ess-")
