Intervention_Normalizer

An automated pipeline that normalizes complex interventions into a computable representation to enable structured queries.

Prerequisites

Install 'QuickUMLS' locally. Reference: https://github.com/Georgetown-IR-Lab/QuickUMLS
Install 'scispaCy' locally. Reference: https://github.com/allenai/scispacy. The pipeline uses the pre-trained model "en_core_sci_lg"; follow the instructions in the scispaCy reference to download it. The pipeline also uses scispaCy's abbreviation detector component, so make sure this component is added.
Check 'package-list.txt' for the full list of required packages.
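The abbreviation component matters because an abstract often defines an abbreviation once and then uses only the short form, so an intervention snippet may contain the short form alone. The sketch below is not this repository's code. It assumes the short-form/long-form pairs have already been collected from the abstract, for example with scispaCy's documented usage (`spacy.load("en_core_sci_lg")`, `nlp.add_pipe("abbreviation_detector")`, then reading `doc._.abbreviations` and each entry's `._.long_form`), and shows one plausible way such pairs could be substituted into a snippet. The helper name `expand_abbreviations` and the example pairs are hypothetical.

```python
from __future__ import annotations

import re


def expand_abbreviations(snippet: str, abbreviations: dict[str, str]) -> str:
    """Replace whole-token short forms in `snippet` with their long forms.

    Hypothetical helper. `abbreviations` maps short form -> long form, e.g. as
    collected from the abstract with scispaCy:

        from scispacy.abbreviation import AbbreviationDetector  # registers the factory
        nlp = spacy.load("en_core_sci_lg")
        nlp.add_pipe("abbreviation_detector")
        doc = nlp(abstract_text)
        pairs = {abrv.text: abrv._.long_form.text for abrv in doc._.abbreviations}
    """
    if not abbreviations:
        return snippet
    # Try longer short forms first so that a short form that is a prefix of
    # another one (e.g. "IL" vs "IL-17") does not win the alternation.
    alternatives = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(s) for s in alternatives) + r")(?!\w)"
    )
    # A single pass means an inserted long form is never expanded again.
    return pattern.sub(lambda m: abbreviations[m.group(1)], snippet)


if __name__ == "__main__":
    pairs = {"BRO": "brodalumab", "UST": "ustekinumab"}  # hypothetical pairs
    print(expand_abbreviations(
        "BRO 210 mg every 2 weeks after receiving UST through 52 weeks", pairs))
    # prints: brodalumab 210 mg every 2 weeks after receiving ustekinumab through 52 weeks
```

Matching on token boundaries (`(?<!\w)` and `(?!\w)`) keeps a short form such as "UST" from being replaced inside a longer token like "USTX".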

How to use

Configure the resource, QuickUMLS, input, and output paths in configure.py. Currently, the only supported input is a pair of files per abstract: a .txt file containing the raw abstract text (e.g., 88754.txt, where '88754' is the file id) and an .ann file containing the annotated raw-text intervention snippets of that abstract (e.g., 88754.ann, with the same file id). The abstract text is used only to extract abbreviations for preprocessing the intervention snippets. Examples are provided in the "example/dataset" folder. The default resource folder is "resource/", the default input folder is "example/dataset", and the default output folder is "example/result".
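The input layout (one .txt and one .ann file per file id) can be illustrated with a small loader. This is a sketch under stated assumptions, not the repository's implementation: the constants mirror the default folders given in this README, but their names are hypothetical and the real names in configure.py may differ. The README does not specify the .ann format, so the parser assumes brat standoff format, where a text-bound annotation line looks like `T1<TAB>Label START END<TAB>snippet text`; check the files in "example/dataset" before relying on it.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical constant names; the defaults are the folders named in this README.
RESOURCE_DIR = Path("resource")
INPUT_DIR = Path("example/dataset")
OUTPUT_DIR = Path("example/result")


def parse_ann(ann_text: str) -> list[dict]:
    """Parse brat-style text-bound annotations (assumed format).

    Expected line shape: "T1\tLabel START END\tsnippet text". Discontinuous
    spans ("START END;START END") are collapsed to the outermost offsets.
    Lines that are not text-bound annotations (notes, relations) are skipped.
    """
    snippets = []
    for line in ann_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or not parts[0].startswith("T"):
            continue
        ann_id, span_info, text = parts
        label, offsets = span_info.split(" ", 1)
        fragments = [frag.split() for frag in offsets.split(";")]
        snippets.append({
            "id": ann_id,
            "label": label,
            "start": int(fragments[0][0]),
            "end": int(fragments[-1][1]),
            "text": text,
        })
    return snippets


def load_pairs(input_dir: Path = INPUT_DIR) -> list[tuple[str, str, list[dict]]]:
    """Return (file_id, abstract_text, snippets) for each id that has both files."""
    records = []
    for txt_path in sorted(input_dir.glob("*.txt")):
        ann_path = txt_path.with_suffix(".ann")
        if not ann_path.exists():
            continue  # no annotated snippets to normalize for this abstract
        file_id = txt_path.stem  # "88754" for 88754.txt
        abstract = txt_path.read_text(encoding="utf-8")
        snippets = parse_ann(ann_path.read_text(encoding="utf-8"))
        records.append((file_id, abstract, snippets))
    return records
```

Pairing by file stem mirrors the README's convention that the .txt and .ann files of one abstract share the same file id; a .txt file without a matching .ann file is skipped because it has no snippets to normalize.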
Run the program.

cd intervention_normalizer
python main.py

Example

input:

"brodalumab 210 mg every 2 weeks after receiving ustekinumab through 52 weeks"

output:

{
    "file_id": "10010",
    "start": "285",
    "end": "297",
    "text": "brodalumab 210 mg every 2 weeks after receiving ustekinumab through 52 weeks",
    "has_drug": [
        {
            "text": "brodalumab",
            "maps_to": "C3491331:brodalumab",
            "start": 0,
            "end": 10,
            "has_strength": [
                "210 mg"
            ],
            "has_frequency": [
                "every 2 weeks"
            ],
            "has_negation": "affirmed"
        },
        {
            "text": "ustekinumab",
            "maps_to": "C1608841:ustekinumab",
            "start": 48,
            "end": 59,
            "has_duration": [
                "52 weeks"
            ],
            "has_negation": "affirmed"
        }
    ],
    "has_relation": "before (C1608841->C3491331)"
}
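Because the output is plain JSON, it can be queried with the standard library. The sketch below is illustrative only: `summarize` is a hypothetical helper, and it assumes every record has the shape shown above, in particular that `maps_to` is `CUI:name` and that `has_relation` follows the `label (CUI->CUI)` pattern of the example. Reading `before (C1608841->C3491331)` as "ustekinumab before brodalumab" matches the snippet text, but the README does not define the relation syntax formally.

```python
from __future__ import annotations

import re

# Assumed relation format, taken from the example: "before (C1608841->C3491331)".
RELATION_RE = re.compile(r"^(\w+) \((C\d+)->(C\d+)\)$")


def summarize(record: dict) -> dict:
    """Index drugs by UMLS CUI and resolve the relation to drug names (hypothetical helper)."""
    drugs = {}
    for drug in record.get("has_drug", []):
        # "C3491331:brodalumab" -> CUI "C3491331", concept name "brodalumab"
        cui, _, name = drug["maps_to"].partition(":")
        drugs[cui] = {
            "name": name,
            "strength": drug.get("has_strength", []),
            "frequency": drug.get("has_frequency", []),
            "duration": drug.get("has_duration", []),
            "negation": drug.get("has_negation"),
        }

    relation = None
    match = RELATION_RE.match(record.get("has_relation") or "")
    if match:
        label, first, second = match.groups()
        # Fall back to the raw CUI if it is not among this record's drugs.
        relation = (
            label,
            drugs.get(first, {}).get("name", first),
            drugs.get(second, {}).get("name", second),
        )
    return {"drugs": drugs, "relation": relation}
```

For the example record above (loaded with `json.loads`), `summarize(record)["relation"]` is `("before", "ustekinumab", "brodalumab")` and `summarize(record)["drugs"]["C3491331"]["strength"]` is `["210 mg"]`. Keying drugs by CUI rather than by surface text lets records that spell a drug differently be grouped under the same concept.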

WengLab-InformaticsResearch/Intervention_Normalizer

Folders and files

Latest commit

History

Repository files navigation

Intervention_Normalizer

Prerequest

How to use

Example

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages