Fuzzy Segmentation Source Code

This project depends on the BlockML framework, which I also developed. Fuzzy Segmentation is a segmentation model that uses fuzzy systems and syntax parsing to segment text in a number of different ways. My PhD project mainly concerns text segmentation with RST (Rhetorical Structure Theory); however, the fuzzy system can be trained to segment according to other patterns.

BlockML Example Project Structure

1. Each folder defines a particular block that the interface will run through.
2. Each block can contain sub-blocks, modules, etc. that run in the order of your choosing, as defined in the init file of the respective block.
3. Each block must take an input and output file, defined in the IO.py file in the root of the folder.
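
As a rough illustration of the layout described above, the following sketch lists the block folders of a project and checks that IO.py is present. It is not part of BlockML: the function name `describe_project` is hypothetical, and it assumes that a block is any sub-folder containing an `__init__.py` and that IO.py sits at the project root.

```python
from pathlib import Path


def describe_project(project_root):
    """Return (block_names, has_io) for a BlockML-style project folder.

    Assumptions (not confirmed by BlockML itself): a block is any sub-folder
    that contains an __init__.py, and IO.py lives at the project root.
    """
    root = Path(project_root)
    blocks = sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and (p / "__init__.py").is_file()
    )
    has_io = (root / "IO.py").is_file()
    return blocks, has_io
```

Calling `describe_project("fuzzy_segmentation")` from the parent directory would return the sorted block names and whether IO.py was found.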

Example commands I use

Example of running syntax-parse training for fuzzy system

python3 mlflow.py run fuzzy_segmentation preprocess_fis phd_datasets/raw_dataset_inputs syntax

Example of running dependency-parse training for fuzzy system

python3 mlflow.py run fuzzy_segmentation preprocess_fis phd_datasets/raw_dataset_inputs dependency

Running SLSeg

python3 run_all.py ../phd_datasets/gum_outputs/original_gum_text ../phd_datasets/slseg_outputs/gum ./parser05Aug16 -T50

Run on the smaller set that is used by FuzzySeg.

python3 run_all.py ../phd_datasets/gum_outputs/original_gum_text ../phd_datasets/slseg_outputs/gum ./parser05Aug16 -T50

Running Segbot (GUM Dataset)

python3 run_segbot.py '../phd_datasets/gum_outputs/original_gum_text' '../phd_datasets/segbot_outputs/gum'

Running HILDA to generate the segmentations.

python3 hilda.py -s texts/bbc_20081227.txt

Training and running fuzzy segmentation

This runs over everything at the given path and produces outputs. It exists specifically to return results (using k-fold) for the original analysis of the model; it does not actually take in a file and segment it. For that, see "Running FuzzySeg as a Segmenter" below.

python3 mlflow.py run fuzzy_segmentation train "phd_datasets/fuzzyseg_outputs/fis_training/" '{"none":"none"}'
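
For readers unfamiliar with k-fold evaluation, here is a minimal sketch of how the folds are typically formed. It is a generic illustration, not the splitting code used by this project; the function name `kfold_indices` is hypothetical, and it assumes contiguous, unshuffled folds.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds.

    Generic k-fold sketch: each sample lands in exactly one test fold,
    and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

Each pair is then used to train on `train_idx` and score on `test_idx`, and the k scores are averaged.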

This runs using a single dataset only, since we only want to train once in this instance.

python3 mlflow.py run fuzzy_segmentation train "phd_datasets/fuzzyseg_outputs/fis_training/train_0-1_k3.dat" '{"training_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/train_0-1_k3.dat", "test_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/train_2_k3.dat"}'

Run the run + validation flow

python3 mlflow.py run-flow fuzzy_segmentation train-and-run-flow-syntax "phd_datasets/fuzzyseg_outputs/fis_training/charniak/train_0-1_k3_char.dat" '{"training_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/charniak/train_0-1_k3_char.dat", "test_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/charniak/train_2_k3_char.dat"}'

python3 mlflow.py run-flow fuzzy_segmentation train-and-run-flow-syntax "phd_datasets/fuzzyseg_outputs/fis_training/charniak/train_0-1_k5_char.dat" '{"training_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/charniak/train_0-1_k5_char.dat", "test_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/charniak/train_2_k5_char.dat"}'

python3 mlflow.py run-flow fuzzy_segmentation train-and-run-flow-syntax "phd_datasets/fuzzyseg_outputs/fis_training/syntax/train_0-1_k3.dat" '{"training_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/syntax/train_0-1_k3.dat", "test_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/syntax/train_2_k3.dat"}'

# KFOLD STUFF
python3 mlflow.py run-flow fuzzy_segmentation train-and-run-flow-syntax "../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/generated/train_12_k3_syntax.dat" '{"training_data_path":"phd_datasets/fuzzyseg_outputs/fis_training/generated/train_12_k3_syntax.dat", "test_data_path":"phd_datasets/fuzzyseg_outputs/fis_training/generated/test/train_12_k3_syntax.dat", "kfold":10}'


> This uses data that was generated before we added the full-stop logic (now commented out) in the comparewordavg function.
python3 mlflow.py run-flow fuzzy_segmentation train-and-run-flow-syntax "../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/generated/old_train_12_k3_syntax.dat" '{"training_data_path":"phd_datasets/fuzzyseg_outputs/fis_training/generated/old_train_12_k3_syntax.dat", "test_data_path":"phd_datasets/fuzzyseg_outputs/fis_training/generated/test/train_12_k3_syntax.dat", "kfold":10}'
# END KFOLD

python3 mlflow.py run-flow fuzzy_segmentation train-and-run-flow-syntax "phd_datasets/fuzzyseg_outputs/fis_training/academic/5050split/train_50_k3_syntax.dat" '{"training_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/academic/5050split/train_50_k3_syntax.dat", "test_data_path":"../dependencies/phd_datasets/fuzzyseg_outputs/fis_training/academic/5050split/test_50_k3_syntax.dat"}'

Running FuzzySeg as a Segmenter

To run FuzzySeg as a standalone segmenter: this process takes in a file plus training data and returns a list of segments in HILDA or array format, for use in subsequent RST or text summarisation models.

python3 mlflow.py run fuzzy_segmentation run "phd_datasets/fuzzyseg_inputs/001a.txt" '{
    "training_data_path":"/phd_datasets/fuzzyseg_outputs/fis_training/generated/train_12_k3_syntax.dat", 
    "output_data_path":"/phd_datasets/fuzzyseg_outputs/gum",
    "parse_type":"syntax",
    "parser_output_form":"hilda"
}'
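
Hand-quoting the JSON argument in a shell is error-prone, so it can help to build the command from Python. The sketch below reproduces the command above and assumes, as the examples suggest, that mlflow.py reads its last argument as a JSON object. The helper name `build_run_command` is hypothetical and not part of this repository.

```python
import json
import shlex


def build_run_command(input_path, params):
    """Return the argument list for the FuzzySeg `run` block shown above."""
    return [
        "python3", "mlflow.py", "run", "fuzzy_segmentation", "run",
        input_path,
        json.dumps(params),  # final argument: the options as a JSON object
    ]


# Same values as the command above.
params = {
    "training_data_path": "/phd_datasets/fuzzyseg_outputs/fis_training/generated/train_12_k3_syntax.dat",
    "output_data_path": "/phd_datasets/fuzzyseg_outputs/gum",
    "parse_type": "syntax",
    "parser_output_form": "hilda",
}
argv = build_run_command("phd_datasets/fuzzyseg_inputs/001a.txt", params)

# Shell-quoted form that can be pasted into a terminal.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` from the directory containing mlflow.py would run the command without going through a shell, which avoids quoting problems altogether.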
