
Add nightly scrape action #22

Merged Dec 19, 2023 (47 commits)

Commits:
0000434  Add datasette metadata (Dec 1, 2023)
274a01f  Add first scraping workflow (Dec 1, 2023)
1003daa  Cleanup workflow (Dec 1, 2023)
c2c5c7d  Add job for Heroku deploy (Dec 1, 2023)
737bc6c  Temp change to run workflow (Dec 1, 2023)
740ed8b  Fetch cases from releases (Dec 1, 2023)
dfa8856  Fix workflow (Dec 1, 2023)
7176f6d  Add debugger (Dec 1, 2023)
e8da99f  Revert "Add debugger" (Dec 1, 2023)
f6b5d0c  Download assets to folder (Dec 1, 2023)
dc5d93e  Debug (Dec 1, 2023)
63259f0  Revert "Download assets to folder" (Dec 1, 2023)
6eb692d  Add quotes around pw (Dec 1, 2023)
f52e0ec  Add heroku login (Dec 1, 2023)
1b2f836  Fix workflow (Dec 1, 2023)
98246da  Install heroku builds plugin (Dec 1, 2023)
35ffe23  Add heroku org (Dec 1, 2023)
f7dcc30  Hash datasette pw in workflow (Dec 1, 2023)
db10f10  Install datasette plugins (Dec 1, 2023)
4b71a82  Add quotes around hash (Dec 1, 2023)
a80739e  Try single quotes (Dec 1, 2023)
558d358  Revert "Temp change to run workflow" (Dec 1, 2023)
5470df4  Add nightly scrape action (Dec 18, 2023)
b31ab2b  Fix workflow (Dec 18, 2023)
91567d4  Run on PR (for testing) (Dec 18, 2023)
68787f6  Remove unnecessary package (Dec 18, 2023)
5917293  Update reqs (Dec 18, 2023)
834d9d9  Merge branch 'civil' into feature/nightly-action (antidipyramid, Dec 18, 2023)
81731d3  Try installing crypto packages (Dec 18, 2023)
8c60d03  Update default year (Dec 18, 2023)
9a20f44  Limit scraping for testing (Dec 18, 2023)
37c3647  Test action (Dec 18, 2023)
db01b3e  Try single quotes (Dec 18, 2023)
9d5abcb  Revert "Try single quotes" (Dec 18, 2023)
8d2ec9f  Debug with tmate (Dec 18, 2023)
92d1e3f  Revert "Debug with tmate" (Dec 19, 2023)
4ad3d7a  Remove quotes (Dec 19, 2023)
a8e7f6a  Change comment (Dec 19, 2023)
26df58a  Only get new chancery cases for now (Dec 19, 2023)
f3888d5  Debug (Dec 19, 2023)
64ac43b  Remove test session and fix workflow (Dec 19, 2023)
0591b84  Remove test lines (Dec 19, 2023)
af930e0  Don't run on cron schedule yet (Dec 19, 2023)
a165ee3  Merge branch 'civil' into feature/nightly-action (Dec 19, 2023)
2352d2c  Remove unnecessary file (Dec 19, 2023)
d4202e1  Remove comments (Dec 19, 2023)
318989d  Remove testing line (Dec 19, 2023)
99 changes: 0 additions & 99 deletions .github/workflows/build.yml

This file was deleted.

116 changes: 116 additions & 0 deletions .github/workflows/nightly.yml
@@ -0,0 +1,116 @@
name: Nightly case scrape

on:
  workflow_dispatch:
  # schedule:
  #   - cron: '15 4 * * *'

jobs:
  scrape:
    name: Scrape new cases
    runs-on: ubuntu-latest

    steps:
      - name: Set current date as env variable
        run: echo "BEGIN_COURTS_RUN=$(date +'%s')" >> $GITHUB_ENV
      - uses: actions/checkout@v3
      - name: upgrade sqlite3
        run: |
          sudo apt-get update
          sudo apt-get install sqlite3
      - name: Install requirements
        run: |
          pip install -U pyopenssl cryptography
          pip install -r requirements.txt
      - name: Download latest database zip
        uses: robinraju/release-downloader@v1.8
        with:
          latest: true
          tag: "nightly"
          fileName: "*.db.zip"

      - name: Decrypt database
        run: |
          unzip -P '${{ secrets.CASE_DB_PW }}' cases.db.zip && rm cases.db.zip
      - name: Run scrape
        run: |
          echo $BEGIN_COURTS_RUN
          make get_new_records
      - name: Setup database for upload
        run: |
          zip -P '${{ secrets.CASE_DB_PW }}' cases.db.zip cases.db
      - name: Upload new release
        uses: WebFreak001/deploy-nightly@v3.0.0
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          upload_url: https://uploads.github.com/repos/datamade/court-scrapers/releases/131985702/assets{?name,label}
          release_id: 131985702
          asset_path: ./cases.db.zip
          asset_name: cases.db.zip
          asset_content_type: application/zip # required by GitHub API
          max_releases: 7

      - name: Keepalive
        uses: gautamkrishnar/keepalive-workflow@v1

  deploy:
    name: Deploy to Heroku
    needs: scrape
    runs-on: ubuntu-latest

    env:
      HEROKU_ORGANIZATION: ${{ secrets.HEROKU_ORG }}

    steps:
      - uses: actions/checkout@v3

      - name: Install requirements
        run: pip install -r requirements.txt

      - name: Download latest database zip
        uses: robinraju/release-downloader@v1.8
        with:
          latest: true
          tag: "nightly"
          fileName: "*.db.zip"

      - name: Decrypt database
        run: |
          unzip -P '${{ secrets.CASE_DB_PW }}' cases.db.zip
      - name: Install heroku-builds plugin
        run: |
          heroku plugins:install heroku-builds
      - name: Login to Heroku CLI
        uses: akhileshns/heroku-deploy@v3.12.14
        with:
          heroku_api_key: ${{ secrets.HEROKU_API_KEY }}
          heroku_app_name: ""
          heroku_email: ${{ secrets.HEROKU_EMAIL }}
          justlogin: true

      - name: Install Datasette plugins
        run: |
          datasette install datasette-auth-passwords datasette-auth-tokens
      - name: Get hashed Datasette password
        run: |
          # Store hash as an environment variable
          hash=$(echo '${{ secrets.DATASETTE_INSTANCE_PW }}' \
            | datasette hash-password --no-confirm); \
            echo "hash=$hash" >> $GITHUB_ENV
      - name: Deploy Datasette instance to Heroku
        run: |
          datasette publish heroku cases.db \
            -n court-scraper \
            -m metadata.json \
            --install datasette-auth-passwords \
            --plugin-secret datasette-auth-passwords root_password_hash '${{ env.hash }}'
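The "Get hashed Datasette password" step relies on `datasette hash-password` to turn the plaintext secret into a salted hash before it is baked into the deployed instance, so the plaintext never ships. As a rough illustration of the underlying idea (a standard-library sketch, not datasette's exact output format), salted PBKDF2 hashing and verification look like:

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=260_000):
    """Salted PBKDF2-SHA256 hash in a self-describing string format."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, iterations, salt_hex, _ = stored.split("$")
    recomputed = hash_password(password, bytes.fromhex(salt_hex), int(iterations))
    return hmac.compare_digest(recomputed, stored)
```

Because only the hash reaches the Heroku app, a leaked config exposes nothing directly reusable as a login credential.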
6 changes: 1 addition & 5 deletions Makefile
@@ -1,5 +1,3 @@
-.INTERMEDIATE: *.csv *.jl *.json
-
 .PHONY: all
 all: upload

@@ -43,9 +41,7 @@ new_plaintiffs.csv: cases.json
 new_defendants.csv: cases.json
 	cat $^ | jq '.[] | . as $$p | .defendants[] | [., $$p.case_number] | @csv' -r > $@

-cases.json : civil-2.jl civil-3.jl civil-4.jl civil-5.jl \
-	civil-6.jl civil-101.jl civil-104.jl civil-11.jl \
-	civil-13.jl civil-14.jl civil-15.jl civil-17.jl chancery.jl
+cases.json : chancery.jl
 	cat $^ | sort | python scripts/remove_dupe_cases.py | jq --slurp '.' > $@

 # Query parameterized by civil case subdivision
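The `cases.json` recipe pipes sorted, line-delimited scrape output through `scripts/remove_dupe_cases.py` before `jq --slurp` collects it into one JSON array. That script isn't shown in this diff; a minimal sketch of what such a filter could do, assuming it keys on `case_number` (the keying logic is a guess, not confirmed by this PR), is:

```python
import json
import sys


def remove_dupe_cases(lines):
    """Yield JSON lines, keeping only the first record seen per case_number."""
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record["case_number"] in seen:
            continue
        seen.add(record["case_number"])
        yield line.rstrip("\n")


if __name__ == "__main__":
    # Filter stdin to stdout, matching the Makefile's pipe usage.
    for line in remove_dupe_cases(sys.stdin):
        print(line)
```

Sorting upstream (as the Makefile does with `sort`) groups duplicate case numbers together, though the set-based approach above works on unsorted input too.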
2 changes: 1 addition & 1 deletion courtscraper/spiders/base.py
@@ -13,7 +13,7 @@ class UnsuccessfulAutomation(Exception):

 class CourtSpiderBase(ABC, Spider):
     def __init__(
-        self, division="2", year=2022, start=0, case_numbers_file=None, **kwargs
+        self, division="2", year=2023, start=0, case_numbers_file=None, **kwargs
     ):
         self.year = year
         self.misses = set()
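These default-year bumps (2022 to 2023, per the "Update default year" commit) have to be repeated by hand each January. One hedged alternative, not used in this PR, is deriving the default from the current date; a simplified stand-in for the spider base class sketches the idea:

```python
from datetime import date


def default_scrape_year():
    """Current calendar year, so spider defaults never go stale."""
    return date.today().year


class CourtSpiderBase:
    """Simplified stand-in for the real Scrapy spider base class."""

    def __init__(self, division="2", year=None, **kwargs):
        # Fall back to the current year when the caller doesn't pin one.
        self.year = year if year is not None else default_scrape_year()
        self.division = division
```

Callers that need a specific year (e.g. backfills) can still pass `year=2022` explicitly.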
2 changes: 1 addition & 1 deletion courtscraper/spiders/chancery.py
@@ -7,7 +7,7 @@ class ChancerySpider(CourtSpiderBase):
     name = "chancery"
     url = "https://casesearch.cookcountyclerkofcourt.org/CivilCaseSearchAPI.aspx"

-    def __init__(self, year=2022, **kwargs):
+    def __init__(self, year=2023, **kwargs):
         self.case_type = CASE_FORMAT
         super().__init__(**kwargs)

2 changes: 1 addition & 1 deletion courtscraper/spiders/civil.py
@@ -7,7 +7,7 @@ class CivilSpider(CourtSpiderBase):
     name = "civil"
     url = "https://casesearch.cookcountyclerkofcourt.org/CivilCaseSearchAPI.aspx"

-    def __init__(self, division="2", year=2022, **kwargs):
+    def __init__(self, division="2", year=2023, **kwargs):
         self.case_type = DIVISIONS[division]
         super().__init__(**kwargs)

1 change: 1 addition & 0 deletions requirements.txt
@@ -4,3 +4,4 @@ datasette
 csvs-to-sqlite
 sqlite-utils
 csvkit
+sqlean.py
2 changes: 1 addition & 1 deletion scripts/nightly_civil_start.sql
@@ -14,7 +14,7 @@ WITH serials AS (
     court_case
   WHERE
     court = 'civil'
-    AND subdivision = ':subdivision'
+    AND subdivision = :subdivision /* noqa */
     AND substr(case_number, 1, 4) = strftime('%Y', current_timestamp)
 )

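The fix above swaps the quoted `':subdivision'` (which SQL reads as a literal string that matches no row) for a bare `:subdivision` placeholder the driver can bind at execution time. The difference is easy to demonstrate with Python's `sqlite3` module and a toy table echoing the schema used here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE court_case (case_number TEXT, court TEXT, subdivision TEXT)"
)
conn.execute("INSERT INTO court_case VALUES ('2023M1000001', 'civil', '2')")

# Quoted placeholder: ':subdivision' is just a string literal, so nothing matches.
broken = conn.execute(
    "SELECT count(*) FROM court_case "
    "WHERE court = 'civil' AND subdivision = ':subdivision'"
).fetchone()[0]

# Bare placeholder: the driver binds the parameter value at execution time.
fixed = conn.execute(
    "SELECT count(*) FROM court_case "
    "WHERE court = 'civil' AND subdivision = :subdivision",
    {"subdivision": "2"},
).fetchone()[0]

print(broken, fixed)  # 0 1
```

Binding also avoids SQL injection, since the parameter value is never spliced into the query text.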