Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate DataStore with MultiPack #834

Merged
merged 7 commits into from
Jun 23, 2022

Conversation

mylibrar
Copy link
Collaborator

This PR fixes #833.

Description of changes

  • base_pack.py
    • Refactor _pending_entries
    • Add pack version checking in deserialization
    • Implement delete_entry
    • Update on_entry_creation to support new approach for entry creation
  • base_store.py
    • Update some interfaces to make it consistent with data_store.py
  • multi_pack.py
    • Remove all the SortedList containers and store the entries to DataStore
    • Update _add_entry_with_check, get, delete_entry accordingly
  • data_pack.py
    • Move some of the implementation to BasePack
    • Create a separate file for EntryConverter: entry_converter.py
  • data_store.py
    • Add add_multipack_entries_raw()
    • Remove allow_duplicate option when adding some some non-annotation entries
  • top.py
    • Refactor the MultiEntrys and store their attributes to DataStore. Any calls that may update these entries' attributes will be directed to DataStore.

Possible influences of this PR.

The integration is not fully tested and it might be unstable.

Test Conducted

The integration should successfully pass all the current CI tests. Will need more dedicated test cases in future #808.

@mylibrar mylibrar self-assigned this Jun 15, 2022
@codecov
Copy link

codecov bot commented Jun 15, 2022

Codecov Report

Merging #834 (06dfacc) into master (3ed661e) will increase coverage by 0.08%.
The diff coverage is 85.89%.

@@            Coverage Diff             @@
##           master     #834      +/-   ##
==========================================
+ Coverage   80.54%   80.63%   +0.08%     
==========================================
  Files         252      253       +1     
  Lines       19363    19295      -68     
==========================================
- Hits        15596    15558      -38     
+ Misses       3767     3737      -30     
Impacted Files Coverage Δ
forte/data/container.py 76.47% <ø> (-1.03%) ⬇️
forte/processors/nlp/ner_predictor.py 18.00% <0.00%> (ø)
forte/data/base_store.py 73.19% <67.74%> (-2.93%) ⬇️
forte/data/entry_converter.py 82.66% <82.66%> (ø)
forte/data/ontology/top.py 74.39% <86.66%> (+2.11%) ⬆️
forte/data/multi_pack.py 82.12% <86.79%> (+3.34%) ⬆️
forte/data/base_pack.py 77.15% <87.67%> (+1.53%) ⬆️
forte/data/data_pack.py 84.29% <100.00%> (-0.31%) ⬇️
forte/data/data_store.py 92.53% <100.00%> (+1.35%) ⬆️
forte/data/ontology/core.py 76.47% <100.00%> (+0.85%) ⬆️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ed661e...06dfacc. Read the comment docs.

Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is quite a bit changes in the code, but we probably need to add test cases specifically designed for them.

forte/data/__init__.py Outdated Show resolved Hide resolved
forte/data/base_pack.py Show resolved Hide resolved
forte/data/base_pack.py Outdated Show resolved Hide resolved
forte/data/base_pack.py Show resolved Hide resolved
forte/data/base_pack.py Show resolved Hide resolved
forte/data/entry_converter.py Outdated Show resolved Hide resolved
forte/data/entry_converter.py Outdated Show resolved Hide resolved
forte/data/entry_converter.py Outdated Show resolved Hide resolved
forte/data/entry_converter.py Outdated Show resolved Hide resolved
forte/data/multi_pack.py Outdated Show resolved Hide resolved
@mylibrar mylibrar marked this pull request as draft June 16, 2022 20:58
@mylibrar mylibrar marked this pull request as ready for review June 16, 2022 23:51
Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forte/data/base_pack.py Show resolved Hide resolved
forte/data/base_pack.py Outdated Show resolved Hide resolved
forte/data/base_pack.py Outdated Show resolved Hide resolved
forte/data/base_pack.py Outdated Show resolved Hide resolved
forte/data/base_pack.py Show resolved Hide resolved
forte/data/base_pack.py Show resolved Hide resolved
forte/data/data_store.py Show resolved Hide resolved
Copy link
Member

@hunterhector hunterhector left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. there is a big decrease in coverage on core.py for 43 lines: https://app.codecov.io/gh/asyml/forte/compare/834/changes. What happens?
  2. Ci failed after merging to master

@mylibrar
Copy link
Collaborator Author

  1. there is a big decrease in coverage on core.py for 43 lines: https://app.codecov.io/gh/asyml/forte/compare/834/changes. What happens?
  2. Ci failed after merging to master
  1. I guess this is because we discard the entry serialization/deserialization approach which uses Pointer and MpPointer to resolve tid. Maybe we can just remove them.
  2. This is caused by the changes in BasePack.__del__. I'll fix it in the next update.

@hunterhector
Copy link
Member

  1. there is a big decrease in coverage on core.py for 43 lines: https://app.codecov.io/gh/asyml/forte/compare/834/changes. What happens?
  2. Ci failed after merging to master
  1. I guess this is because we discard the entry serialization/deserialization approach which uses Pointer and MpPointer to resolve tid. Maybe we can just remove them.
  2. This is caused by the changes in BasePack.__del__. I'll fix it in the next update.

If the test cases are no longer needed (using pointers), we can delete those branches.

@mylibrar mylibrar added this to the 0.3 stable version milestone Jun 22, 2022
@mylibrar mylibrar merged commit 330a667 into asyml:master Jun 23, 2022
@mylibrar mylibrar deleted the datastore-multipack branch June 23, 2022 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate DataStore with MultiPack
2 participants