auto and user confirmed deletion of book titles #27

Draft: wants to merge 72 commits into base: master
72 commits
c956829
Delete matcher.py
gracexu7 Oct 25, 2022
b1f812a
Add files via upload
gracexu7 Oct 25, 2022
74afb1f
Add files via upload
gracexu7 Nov 1, 2022
86d8004
Create placeholder.txt
gracexu7 Nov 1, 2022
5c53eff
Rename annotated-by-size.html to gender-trouble/annotated-by-size.html
gracexu7 Nov 1, 2022
8be4f12
Rename annotated-by-size.ipynb to gender-trouble/annotated-by-size.ipynb
gracexu7 Nov 1, 2022
e11f294
Delete placeholder.txt
gracexu7 Nov 1, 2022
1a31302
character index finder
Nov 3, 2022
b6083c1
character index finder
Nov 3, 2022
b98c802
Add files via upload
gracexu7 Nov 4, 2022
8bd864f
jump to visualization
Dec 1, 2022
0f1cd0b
Add files via upload
gracexu7 Dec 1, 2022
90744d3
albrecht
Dec 2, 2022
52b6e94
Create file.txt
gracexu7 Jan 14, 2023
3440db7
Add files via upload
gracexu7 Jan 14, 2023
11f164b
Delete file.txt
gracexu7 Jan 14, 2023
7648665
Add files via upload
gracexu7 Jan 14, 2023
f6207b7
added notebook to retrieve character indexes
milanterlunen Feb 14, 2023
e668413
added index retrieval notebook
Feb 14, 2023
1b01546
updated files needed
Feb 14, 2023
d502363
update ID source in spreadsheet
Feb 14, 2023
4ef2e08
updated ID workflow
Feb 14, 2023
9651a89
new branch
irinazoccolini Mar 6, 2023
5bae9de
Create input_re-run-ocr.ipynb
Mar 6, 2023
4b822f3
added jupyter notebook for running matcher algorithm
fleuriie Mar 7, 2023
2b87e08
finalized matcher
fleuriie Mar 10, 2023
d46aba3
finalized matcher
fleuriie Mar 10, 2023
4a4a4c4
fixed merge problems
fleuriie Mar 10, 2023
cdaa97a
spell check for all articles
Mar 10, 2023
a249e71
matcher formatting
fleuriie Mar 10, 2023
8f3fc1b
start cell for re-running ocr
irinazoccolini Mar 10, 2023
4fdc101
remove prints
irinazoccolini Mar 10, 2023
e0e7a60
merge
irinazoccolini Mar 13, 2023
159c3f9
changed language detection
Mar 17, 2023
4c292c3
parsed data to be ready for analysis
fleuriie Mar 17, 2023
9536ddf
add to ocr cells
irinazoccolini Mar 17, 2023
d1a227f
Merge branch 'rerun-ocr' of https://github.com/milanterlunen/quotatio…
irinazoccolini Mar 17, 2023
e5aee71
fix print statement
irinazoccolini Mar 17, 2023
92968c3
added article number
Mar 17, 2023
c33bf82
loop over folder of pdfs
irinazoccolini Mar 17, 2023
f8bd244
Merge branch 'rerun-ocr' of https://github.com/milanterlunen/quotatio…
irinazoccolini Mar 17, 2023
2a2209d
clean up code
irinazoccolini Mar 17, 2023
90b778e
added what portions of texts are most quoted
gtipan Mar 17, 2023
fa87d61
added modifiability of distribution for what portions of texts are mo…
gtipan Mar 21, 2023
3282b99
replace fixed article text - need to test
Mar 21, 2023
fd25d6b
added replace article text code
Mar 24, 2023
f611581
initial pdf to text notebook
irinazoccolini Mar 24, 2023
67f2006
added benjamin and fanon data
fleuriie Mar 24, 2023
aa0aefc
Update analyze_data.ipynb
kingstonlyw Apr 7, 2023
37baaf2
Update analyze_data.ipynb
kingstonlyw Apr 7, 2023
24e077f
finish rerun ocr notebook
irinazoccolini Apr 11, 2023
4efbdde
updated matcher_notebook and annotated_by_size
fleuriie Apr 14, 2023
8b135d7
Merge branch 'master' of https://github.com/milanterlunen/quotation-d…
fleuriie Apr 14, 2023
dde2f18
get page number
irinazoccolini Apr 14, 2023
57ed239
Update analyze_data.ipynb
kingstonlyw Apr 21, 2023
0d2470b
Update analyze_data.ipynb
kingstonlyw Apr 21, 2023
8c8614a
Update analyze_data.ipynb
kingstonlyw Apr 21, 2023
9acac93
Update analyze_data.ipynb
kingstonlyw Apr 21, 2023
215b42d
added histogram of quotation lengths
fleuriie Apr 21, 2023
4b6964c
add initial page num detection w/ user input
irinazoccolini Apr 24, 2023
664e756
Merge pull request #22 from milanterlunen/rerun-ocr
irinazoccolini Apr 24, 2023
e2889d4
make edits to regexes
irinazoccolini Apr 25, 2023
cd3ff06
add cell for installing packages when running notebook in browser
irinazoccolini May 2, 2023
f925aa9
cleaning up analysis notebook
fleuriie May 5, 2023
791bed0
add final edits to notebook
irinazoccolini May 8, 2023
16e3baa
add more comments
irinazoccolini May 8, 2023
c342c9b
final comments
irinazoccolini May 8, 2023
e274d08
auto and user confirmed deletion of book titles
May 9, 2023
b58a2be
finish notebook
irinazoccolini May 9, 2023
c701f39
Merge branch 'master' of https://github.com/milanterlunen/quotation-d…
irinazoccolini May 9, 2023
6ce5378
Merge pull request #23 from milanterlunen/fix-pdf-to-text
irinazoccolini May 9, 2023
7a0ea00
Merge branch 'master' into deleting-book-titles
milanterlunen May 9, 2023
Binary file added .DS_Store
Binary file not shown.
14 changes: 14 additions & 0 deletions .gitignore
@@ -1,3 +1,17 @@
__pycache__
text_matcher.egg-info
dist/
preprocessing\gendertrouble-text.txt
preprocessing\gendertrouble-text-cleaned.txt
preprocessing/incorrect_articles_pdfs
preprocessing/gender_trouble.pdf
algorithm-testing/jstor-gender-trouble-all-articles.jsonl
algorithm-testing/jstor-middlemarch-articles.json
algorithm-testing/jstor-middlemarch-articles.json
algorithm-testing/jstor-gender-trouble-all-articles.jsonl
preprocessing/incorrect_articles_pdfs
preprocessing/gender_trouble.pdf
preprocessing/foucault.pdf
preprocessing/kuhn.pdf
preprocessing/hooks.pdf
preprocessing/benjamin.pdf
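
Review note: several entries in the .gitignore hunk above appear twice, and two use backslashes. In gitignore patterns the forward slash is the directory separator and the backslash is an escape character, so the backslash entries may not match the intended files. The following is a minimal sketch of a check that could catch both issues. The function name `lint_gitignore` is hypothetical and not part of this repository, and the sketch assumes every backslash was meant as a Windows path separator rather than a deliberate escape.

```python
def lint_gitignore(lines):
    """Return (cleaned_patterns, warnings) for a list of .gitignore lines.

    Hypothetical helper for illustration only. Assumes backslashes were
    intended as path separators, which is not true of legitimate escapes
    such as a leading "\\#".
    """
    seen = set()
    cleaned = []
    warnings = []
    for raw in lines:
        pattern = raw.strip()
        # Keep blank lines and comments untouched.
        if not pattern or pattern.startswith("#"):
            cleaned.append(pattern)
            continue
        # Flag backslashes and suggest the forward-slash form.
        if "\\" in pattern:
            fixed = pattern.replace("\\", "/")
            warnings.append(f"backslash in {pattern!r}; suggested {fixed!r}")
            pattern = fixed
        # Drop entries that repeat an earlier pattern.
        if pattern in seen:
            warnings.append(f"duplicate entry {pattern!r}")
            continue
        seen.add(pattern)
        cleaned.append(pattern)
    return cleaned, warnings


# Sample entries taken from the hunk above.
entries = [
    "dist/",
    r"preprocessing\gendertrouble-text.txt",
    "preprocessing/gender_trouble.pdf",
    "preprocessing/gender_trouble.pdf",
]
cleaned, warnings = lint_gitignore(entries)
for message in warnings:
    print(message)
```

Running this kind of check before committing would reduce the hunk to one entry per ignored path, but whether to rewrite the backslash entries is a judgment call for the author.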