STWFSA includes a potential security vulnerability and is no longer maintained. Please consider the new Python implementation instead. The Python implementation does not provide a REST API, but it includes a scoring mechanism for matches.
This is a dictionary matching component for the STW Thesaurus for Economics (STW). The software builds upon monq, a finite-state automaton (FSA) text filtering tool.
In particular, STWFSA generates regular expressions that recognize the preferred and alternative terms of the STW. Test cases are provided to ensure that several term patterns are recognized correctly.
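The Java sketch below illustrates the general idea of compiling a concept's preferred and alternative labels into one regular expression. It is hypothetical: it uses java.util.regex rather than monq, and its matching rules (case-insensitive, any run of whitespace or hyphens between tokens, longest label first, word boundaries) are assumptions for illustration, not the patterns STWFSA actually generates. The class TermRegexSketch and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch of label-to-regex generation; NOT the STWFSA/monq code.
final class TermRegexSketch {

    // Turn one label, e.g. "labour market", into a regex whose tokens may be
    // separated by any run of whitespace and/or hyphens.
    static String labelToRegex(String label) {
        List<String> tokens = new ArrayList<>();
        for (String token : label.trim().split("\\s+")) {
            tokens.add(Pattern.quote(token)); // treat label text literally
        }
        return String.join("[\\s-]+", tokens);
    }

    // Combine all labels of one concept into a single alternation.
    // Longer labels come first, so the longest alternative is tried first.
    static Pattern conceptPattern(List<String> labels) {
        String alternation = labels.stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(TermRegexSketch::labelToRegex)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Return the covered text of every non-overlapping match.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

With the labels "labour market" and "labor market", this pattern finds both "Labour-market" and "labor  market" in a text, but not "labourmarket". An FSA-based tool such as monq is designed to combine many such expressions into one automaton, whereas this sketch handles one concept at a time.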
STWFSA is meant to be integrated into more complex automatic subject indexing pipelines, where it may be combined with other dictionary matching tools. Appropriately configured, STWFSA can serve as a candidate generator whose output is filtered with machine learning techniques, which in particular makes it possible to reject ambiguous matches.
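A minimal sketch of the candidate-generator pattern, assuming matches have already been produced by STWFSA: a scorer decides which candidates to keep. The class CandidateFilterSketch, its Candidate type, and the threshold are hypothetical; the scorer is only a stand-in for a trained classifier.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical sketch: a scorer (standing in for a trained classifier)
// decides which dictionary matches to keep.
final class CandidateFilterSketch {

    // A dictionary match proposed by the candidate generator.
    static final class Candidate {
        final String docId;
        final String conceptId;
        final String coveredText;

        Candidate(String docId, String conceptId, String coveredText) {
            this.docId = docId;
            this.conceptId = conceptId;
            this.coveredText = coveredText;
        }
    }

    // Keep candidates whose score reaches the threshold; reject the rest.
    static List<Candidate> accept(List<Candidate> candidates,
                                  ToDoubleFunction<Candidate> scorer,
                                  double threshold) {
        return candidates.stream()
                .filter(c -> scorer.applyAsDouble(c) >= threshold)
                .collect(Collectors.toList());
    }
}
```

For illustration only, a toy scorer such as `c -> c.coveredText.length() > 3 ? 0.9 : 0.1` rejects very short matches; in a real pipeline the score would presumably come from a model trained on labelled documents.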
Please visit zbw.eu for more information about ZBW's automatic subject indexing working group.
Author: Martin Toepfer, 2017-2018
ZBW - Leibniz Information Centre for Economics
Before testing and building STWFSA, adapt the file pom.xml, e.g., set STW_PTH. Once STWFSA is built and monq is on your classpath, you can run the tool as follows:
export STW_DIR=~/kb/stw
java -cp zbw-a1-match-fsa-$VERSION.jar eu.zbw.stwfsa.app.StwRecApp -in content_unlabeled.tsv -out predicted.tsv
The directory STW_DIR must contain the file stw.nt.
Add the argument -info to print offsets and the matched text. The call
java -cp zbw-a1-match-fsa-$VERSION.jar eu.zbw.stwfsa.app.StwRecApp -help
explains all arguments and usage in more detail.
Have a look at StwRecServe as a starting point for offering dictionary matching as a web service.
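As a rough illustration of what such a service involves, here is a hypothetical sketch using the JDK's built-in com.sun.net.httpserver package. It does not reproduce StwRecServe: the endpoint /match, the plain-text request and response, and the `matcher` function (which would wrap STWFSA's matcher, whose API is not shown here) are all assumptions.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a minimal matching web service; not StwRecServe.
final class MatchServiceSketch {

    // Pure part: run the matcher on the request text and encode one
    // concept id per line as UTF-8.
    static byte[] respond(String text, Function<String, List<String>> matcher) {
        return String.join("\n", matcher.apply(text)).getBytes(StandardCharsets.UTF_8);
    }

    // Wire the matcher to POST /match on the given port (0 = any free port).
    static HttpServer create(int port, Function<String, List<String>> matcher) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/match", exchange -> handle(exchange, matcher));
        return server;
    }

    private static void handle(HttpExchange exchange, Function<String, List<String>> matcher)
            throws IOException {
        try {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                return;
            }
            String text;
            try (InputStream in = exchange.getRequestBody()) {
                text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            byte[] body = respond(text, matcher);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            if (body.length > 0) {
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            }
        } finally {
            exchange.close();
        }
    }
}
```

`MatchServiceSketch.create(8080, myMatcher).start()` would then serve requests, where `myMatcher` is a hypothetical function mapping a text to concept ids. Keeping `respond` separate from the HTTP wiring makes the matching logic testable without a running server.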
STWFSA reads and writes tab-separated (TSV) data.
Input: two columns per row, the document id and the content (a short text, e.g., a title).
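A hypothetical helper for preparing such an input file is sketched below. It replaces tabs and line breaks inside the content so that each document stays on one two-column row. The class InputTsvSketch is an invented name, and the UTF-8 encoding and the absence of a header row are assumptions, not documented STWFSA requirements.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for preparing STWFSA input; not part of STWFSA.
final class InputTsvSketch {

    // Replace each run of tabs and line breaks with a single space so that
    // every document stays on one row with exactly two columns.
    static String clean(String value) {
        return value.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // One "document id <TAB> content" row per map entry, in iteration order.
    static List<String> toRows(Map<String, String> docs) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            rows.add(clean(doc.getKey()) + "\t" + clean(doc.getValue()));
        }
        return rows;
    }

    // Write the rows to a file (UTF-8, no header row: both are assumptions).
    static void write(Path file, Map<String, String> docs) throws IOException {
        Files.write(file, toRows(docs), StandardCharsets.UTF_8);
    }
}
```

Passing a LinkedHashMap keeps the documents in insertion order; the resulting file could then be given to the tool via -in.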
Output:
default: document id, concept id
option -compressed: document id, list of concept ids (tab-separated)
option -info: document id, concept id, begin, end, covered text
Please note that option -info takes precedence over option -compressed.
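Reading the output back could look like the hypothetical sketch below. Because the default and the -compressed layouts both have the document id in the first column followed by concept ids, one function handles both; a second function parses a single -info row. The class OutputTsvSketch is an invented name, and it assumes no header row and integer begin/end offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader for STWFSA output; not part of STWFSA.
final class OutputTsvSketch {

    // Default and -compressed output: column 0 is the document id and every
    // further column is a concept id, so both layouts reduce to the same map.
    static Map<String, List<String>> conceptsByDoc(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split("\t");
            result.computeIfAbsent(cols[0], k -> new ArrayList<>())
                  .addAll(Arrays.asList(cols).subList(1, cols.length));
        }
        return result;
    }

    // One row of -info output.
    static final class Match {
        final String docId;
        final String conceptId;
        final int begin;
        final int end;
        final String coveredText;

        Match(String docId, String conceptId, int begin, int end, String coveredText) {
            this.docId = docId;
            this.conceptId = conceptId;
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }
    }

    // Split into at most 5 columns so the covered text is kept whole.
    static Match parseInfoRow(String line) {
        String[] cols = line.split("\t", 5);
        if (cols.length != 5) {
            throw new IllegalArgumentException("expected 5 columns: " + line);
        }
        return new Match(cols[0], cols[1],
                Integer.parseInt(cols[2]), Integer.parseInt(cols[3]), cols[4]);
    }
}
```

The lines can be obtained with `java.nio.file.Files.readAllLines`, for example from the file passed via -out.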
If you want to use STWFSA programmatically, please have a look at StwAnnotator.
Changelog:
- add simple server example
This is the first entry of the public changelog.