Scrape Filings from candidate and PAC committee details pages #24

Open
fgregg opened this issue Apr 8, 2024 · 3 comments

Comments

@fgregg
Member

fgregg commented Apr 8, 2024

In order to get the opening and closing balance for candidates and committees we need to get their filings from the candidate/committee detail pages.

[Screenshot: New Mexico Campaign Finance System]
https://login.cfis.sos.state.nm.us/#/exploreDetails/RiiKoPNxtHg4P69Mc3r0NH1lK5MpzTLbNw12UnzEQ-I1/14/22/120/2024

We need to download the filings, e.g. https://login.cfis.sos.state.nm.us//ReportsOutput//103/b6375ec9-9605-474c-843a-f7cb732c0f35.pdf

and extract this table:
[Screenshot: table from the rpt_File_ExpAndConReport PDF, b6375ec9-9605-474c-843a-f7cb732c0f35.pdf]

We already have a scraper that can visit every detail page: https://github.com/datamade/nmid-scrapers/blob/main/scrapers/office/scrape_search.py

Let's hook into the scrape method to make the AJAX call that returns the details about the filings. We will then need to fetch each PDF and scrape the info out of it.

Right now the scraper yields rows for candidates, one for each campaign year.

I would like the scraper to yield an object like {'years': [...current info that we are scraping], 'filings': [all the metadata about the filing from the AJAX call plus the information scraped out of the PDF]}
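The target shape could be sketched roughly as follows. This is only an illustration, not the scraper's actual code: `with_filings`, `fetch_filings`, and `parse_filing_pdf` are hypothetical names standing in for the existing per-year rows, the AJAX call, and the PDF extraction, and it assumes the per-year rows are already grouped by candidate or committee.

```python
from typing import Any, Callable, Iterable, Iterator


def with_filings(
    grouped_rows: Iterable[tuple[str, list[dict[str, Any]]]],
    fetch_filings: Callable[[str], list[dict[str, Any]]],
    parse_filing_pdf: Callable[[dict[str, Any]], dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield one object per candidate/committee with 'years' and 'filings'.

    All three arguments are assumptions made for this sketch:
    - grouped_rows: (entity_id, [one row per campaign year]) pairs
    - fetch_filings: stand-in for the AJAX call returning filing metadata
    - parse_filing_pdf: stand-in for downloading a filing PDF and
      extracting the balance table from it
    """
    for entity_id, year_rows in grouped_rows:
        filings = []
        for metadata in fetch_filings(entity_id):
            # Combine the AJAX metadata with the values scraped from the PDF.
            filings.append({**metadata, **parse_filing_pdf(metadata)})
        yield {"years": year_rows, "filings": filings}
```

Passing the two fetch steps in as callables keeps the sketch independent of how the real requests are made, and makes the output shape easy to check with stubs.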

@fgregg
Member Author

fgregg commented Apr 8, 2024

Let's get to this point, and then we can talk about CSV outputs.

@antidipyramid
Contributor

@fgregg I've noticed a few things:

  1. For some candidates and committees, the year parameter in the filings endpoint isn't what you'd expect. For example, the URL for this candidate's 2024 filings requires a year param of 3528 (it doesn't work with the value 2024). Instead of querying a specific year, you can use electionYear=All.

  2. The filing URL for some candidates (and all committees?) requires a committeeID value that differs from the IDNumber returned by the search endpoint (e.g. this candidate)
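One way to account for both observations when building the filings request is sketched below. The parameter names `electionYear` and `committeeID` come from the comment above; the function name `build_filings_url` and the `base_url` argument are hypothetical, since the actual endpoint path is not given here.

```python
from urllib.parse import urlencode


def build_filings_url(base_url: str, committee_id: str, election_year: str = "All") -> str:
    """Build a filings request URL (hypothetical helper for illustration).

    committee_id is passed explicitly rather than reusing the IDNumber from
    the search endpoint, because the two can differ. election_year defaults
    to "All" so we don't depend on year values such as 3528.
    """
    query = urlencode({"committeeID": committee_id, "electionYear": election_year})
    return f"{base_url}?{query}"
```

Defaulting to `electionYear=All` means the caller may need to filter filings by year afterwards, if only one campaign year is wanted.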

@fgregg
Member Author

fgregg commented Apr 12, 2024

closed by #25
