- MtSamples
  - Medical transcriptions
  - Versions annotated by three different annotators are under data/processed/mtsamples
- UMLS
  - MRCONSO.RRF file
  - Put under data/raw
- ICD10 2017
  - icd10cm_order_2017.txt file
  - Put under data/raw
- big.txt
  - A variety of e-books merged into one document:
    - Project Gutenberg's The Adventures of Sherlock Holmes, by Arthur Conan Doyle http://www.gutenberg.org/files/1661/1661-0.txt
    - The Project Gutenberg EBook of History of the United States by Charles A. Beard and Mary R. Beard http://www.gutenberg.org/cache/epub/16960/pg16960.txt
    - The Project Gutenberg EBook of War and Peace, by Leo Tolstoy http://www.gutenberg.org/files/2600/2600-0.txt
  - Put under data/raw
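The README does not describe how the raw files are read. As a minimal sketch, assuming the standard release formats (the pipe-delimited column order of the UMLS concept-names file MRCONSO.RRF given in the UMLS Reference Manual, and the fixed-width layout of the CMS ICD-10-CM order file), the two files placed under data/raw could be parsed as below. The functions parse_mrconso and parse_icd10_order are hypothetical and are not taken from this repository.

```python
import os
from typing import Dict, Iterable, Iterator, Optional

# Column order of MRCONSO.RRF as listed in the UMLS Reference Manual.
MRCONSO_FIELDS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_mrconso(lines: Iterable[str], language: Optional[str] = "ENG") -> Iterator[Dict[str, str]]:
    """Yield one dict per MRCONSO row, keeping only `language` unless it is None."""
    for line in lines:
        # Rows are pipe-delimited and end with a trailing '|', so the last empty field is dropped.
        values = line.rstrip("\r\n").split("|")[: len(MRCONSO_FIELDS)]
        if len(values) < len(MRCONSO_FIELDS):
            continue  # skip blank or truncated rows
        row = dict(zip(MRCONSO_FIELDS, values))
        if language is None or row["LAT"] == language:
            yield row


def parse_icd10_order(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per line of the fixed-width ICD-10-CM order file (hypothetical helper)."""
    for line in lines:
        line = line.rstrip("\r\n")
        if len(line) < 17:
            continue  # too short to hold the order number, code, and flag
        yield {
            "order": int(line[0:5]),           # columns 1-5: order number
            "code": line[6:13].strip(),        # columns 7-13: code, written without the dot
            "is_valid_code": line[14] == "1",  # column 15: 0 = header, 1 = valid code
            "short": line[16:76].strip(),      # columns 17-76: short description
            "long": line[77:].strip(),         # column 78 onward: long description
        }


if __name__ == "__main__":
    # Assumed location, following the "Put under data/raw" instruction above.
    path = os.path.join("data", "raw", "icd10cm_order_2017.txt")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            codes = {rec["code"]: rec["long"] for rec in parse_icd10_order(f)}
        print(f"Loaded {len(codes)} ICD-10-CM codes")
```

Both parsers accept any iterable of lines, so they work on an open file object as well as on in-memory strings; streaming rows this way avoids loading the large MRCONSO.RRF file into memory at once.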
- Tune models: train/tune_models.py
  - Tunes dataset-model pairs using grid search
  - Saves results to Weights & Biases (wandb)
- Generate model performance reports: train/run_reports.py
  - Loads the best dataset-model pair of a trained model, together with its config file, from wandb
  - Generates and saves result figures
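A minimal sketch of the tune-then-report flow, assuming a plain grid search over every hyperparameter combination for every dataset-model pair, with "higher is better" for the tracked metric. evaluate is a hypothetical stand-in for training and validating one configuration, and the dict records only mimic what a run tracker stores. In an actual wandb setup one would typically start a run per trial with wandb.init(config=...), report the metric with wandb.log, and later read runs back through wandb.Api().runs(...), whose config and summary attributes correspond to the keys used here. None of these names come from train/tune_models.py or train/run_reports.py.

```python
import itertools
from typing import Any, Callable, Dict, Iterable, List

# A record holds the trial's "config" and its "summary" metrics.
Record = Dict[str, Dict[str, Any]]


def grid_search(
    datasets: Iterable[str],
    models: Iterable[str],
    param_grid: Dict[str, List[Any]],
    evaluate: Callable[[str, str, Dict[str, Any]], float],
    log: Callable[[Dict[str, Any]], None] = print,
    metric: str = "score",
) -> List[Record]:
    """Score every dataset-model pair under every hyperparameter combination."""
    keys = sorted(param_grid)  # fixed key order so configs are reproducible
    records: List[Record] = []
    for dataset, model in itertools.product(datasets, models):
        for values in itertools.product(*(param_grid[k] for k in keys)):
            config = {"dataset": dataset, "model": model, **dict(zip(keys, values))}
            score = evaluate(dataset, model, config)  # hypothetical train + validate step
            log({**config, metric: score})  # e.g. wandb.log in a real setup
            records.append({"config": config, "summary": {metric: score}})
    return records


def best_record(records: List[Record], metric: str = "score") -> Record:
    """Return the record with the highest value of `metric`."""
    return max(records, key=lambda r: r["summary"][metric])
```

Passing the logger in as a parameter lets the same loop run offline (for example with log=some_list.append) or against a real tracker, and best_record then recovers the winning configuration from whatever was logged.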
@article{ozyegen2022word,
title={Word-level text highlighting of medical texts for telehealth services},
author={Ozyegen, Ozan and Kabe, Devika and Cevik, Mucahit},
journal={Artificial Intelligence in Medicine},
pages={102284},
year={2022},
publisher={Elsevier}
}