auto and user confirmed deletion of book titles #27

Draft: wants to merge 72 commits into master


@kamauln commented May 9, 2023

Auto deletion currently works well, removing roughly a quarter (~0.25) of the text, but user-confirmed deletion is still too broad and doesn't capture all possible matches. The current workaround is for the user to manually delete the remaining book titles.

Kamau Njendu and others added 28 commits March 21, 2023 17:52
replaces articles json file with fixed article texts and returns pd dataframe of articles that cannot be fixed.
the final cell block replaces the initial article json file with one with the fixed text > ie a permanent fix
Auto deletion currently works well - removing ~.25 of text, but user confirmed deletion is still too broad and doesn't capture all possible matches. Current workaround is user just manually deletes remaining book titles.
PDF to Text with page number detection
@JonathanReeve
Owner

Hi @kamauln and @gracexu7! Thanks so much for all this great work. I'm happy to merge it at some point, provided it's cleaned up. At minimum, you'll want to:

  • remove everything unnecessary from the PR: .DS_Store files, .ipynb_checkpoints directories, anything that isn't code.
  • squash commits that are all part of the same feature, bugfix, or task, and write good commit messages. "Add files via upload" is not a good commit message, since it doesn't tell us anything useful.
  • remove any experiments that use text-matcher but aren't part of text-matcher itself. Example use cases are OK, but they should go in something like examples/ rather than gender-trouble/, which isn't helpful to others who may be using the repo.

But tests would be good, too, especially if you're changing any functionality.

Let me know when you've done this by requesting my review and unmarking it as a draft. Or close this PR and open a different one later, if you prefer. Thanks again!
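
For the first cleanup item above, a small script can list stray files before they are committed. This is only a minimal sketch and is not part of text-matcher: the function name `find_clutter` and the `CLUTTER_NAMES` set are assumptions made for illustration, and the set should be adjusted to whatever the repo considers non-code.

```python
from pathlib import Path

# Hypothetical set of file/directory names treated as clutter (an assumption,
# based on the examples named in the review); extend as needed.
CLUTTER_NAMES = {".DS_Store", ".ipynb_checkpoints"}


def find_clutter(root):
    """Return sorted paths (relative to root) whose name is in CLUTTER_NAMES."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        # Skip Git's own metadata directory.
        if ".git" in relative.parts:
            continue
        if path.name in CLUTTER_NAMES:
            hits.append(relative.as_posix())
    return sorted(hits)
```

Calling `find_clutter(".")` from the repository root returns the paths to review. It only reports them; removing them from the branch and adding matching `.gitignore` entries is still a manual step.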

@milanterlunen milanterlunen deleted the deleting-book-titles branch February 20, 2024 11:41

8 participants