Scrape Filings from candidate and PAC committee details pages #24

fgregg · 2024-04-08T15:28:15Z

To get the opening and closing balances for candidates and committees, we need to get their filings from the candidate/committee detail pages.

https://login.cfis.sos.state.nm.us/#/exploreDetails/RiiKoPNxtHg4P69Mc3r0NH1lK5MpzTLbNw12UnzEQ-I1/14/22/120/2024

We need to download the filings, e.g. https://login.cfis.sos.state.nm.us//ReportsOutput//103/b6375ec9-9605-474c-843a-f7cb732c0f35.pdf

and extract this table:

We already have a scraper that can visit every detail page: https://github.com/datamade/nmid-scrapers/blob/main/scrapers/office/scrape_search.py

Let's hook into the scrape method to make the AJAX call that returns details about the filings. We will then need to fetch each PDF and scrape the information out of it.
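A minimal sketch of the "scrape the information out of the PDF" step, assuming the PDF's text has already been extracted with some PDF library and that the table labels its rows "Opening Balance" and "Closing Balance". The labels, the function name `parse_balances`, and the output keys are all hypothetical; check them against a real report like the one linked above.

```python
import re
from decimal import Decimal

# Hypothetical row labels: the real report's wording may differ.
# The gap between label and amount may not cross a newline, so a label
# is only paired with an amount on the same line.
_BALANCE_RE = re.compile(
    r"(?P<label>Opening|Closing)\s+Balance[^\d$\-\n]*\$?\s*(?P<amount>-?[\d,]+\.\d{2})",
    re.IGNORECASE,
)


def parse_balances(report_text: str) -> dict:
    """Pull opening and closing balances out of text extracted from a filing PDF."""
    balances = {}
    for match in _BALANCE_RE.finditer(report_text):
        key = match.group("label").lower() + "_balance"
        # Strip thousands separators and keep exact cents with Decimal.
        amount = Decimal(match.group("amount").replace(",", ""))
        balances.setdefault(key, amount)  # keep the first occurrence of each label
    return balances
```

Using `Decimal` rather than `float` keeps dollar amounts exact when they are later summed or compared across filings.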

Right now the scraper yields rows for candidates, one for each campaign year.

I would like the scraper to yield an object like {'years': [...the current info that we are scraping], 'filings': [...all the metadata about each filing from the AJAX call, plus the information scraped out of its PDF]}
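A sketch of the proposed output shape, assuming the existing per-year scraping, the AJAX filings call, and the PDF download/parse can each be wrapped as a function. The names `scrape_with_filings`, `get_years`, `get_filings`, and `parse_filing_pdf` are hypothetical stand-ins, not functions in `scrape_search.py`.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def scrape_with_filings(
    entities: Iterable[Record],
    get_years: Callable[[Record], List[Record]],
    get_filings: Callable[[Record], List[Record]],
    parse_filing_pdf: Callable[[Record], Record],
) -> Iterator[Record]:
    """Yield one {'years': [...], 'filings': [...]} object per candidate/committee.

    get_years stands in for the current per-year scraping, get_filings for the
    AJAX call that lists filings, and parse_filing_pdf for fetching and parsing
    a filing's PDF. All three are hypothetical hooks.
    """
    for entity in entities:
        filings = []
        for filing in get_filings(entity):
            # Merge the AJAX metadata with whatever was extracted from the PDF.
            filings.append({**filing, **parse_filing_pdf(filing)})
        yield {"years": get_years(entity), "filings": filings}
```

Passing the network-bound steps in as callables keeps the generator testable offline, for example with lambdas that return canned records.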


fgregg · 2024-04-08T15:28:33Z

Let's get to this point, and then we can talk about CSV outputs.

antidipyramid · 2024-04-09T15:58:35Z

@fgregg I've noticed a few things:

- For some candidates and committees, the year parameter in the filings endpoint isn't what you'd expect. For example, the URL for this candidate's 2024 filings requires a year param of 3528 (it doesn't work with the value 2024). Instead of querying a specific year, you can use electionYear=All.
- The filing URL for some candidates (and all committees?) requires a committeeID value that differs from the IDNumber returned by the search endpoint (e.g. this candidate).
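A sketch of how the filings request could account for both observations, assuming `committeeID` and `electionYear` are sent as query parameters (whether they are query parameters or path segments is an assumption). Where the separate committee ID comes from is not established here, so the hypothetical helper `filings_params` simply takes it as an optional argument and falls back to the search endpoint's IDNumber.

```python
from typing import Optional
from urllib.parse import urlencode


def filings_params(id_number: int, committee_id: Optional[int] = None) -> str:
    """Build a query string for the filings request (hypothetical helper).

    Always asks for electionYear=All, since some entities reject the calendar
    year (one candidate's 2024 filings needed year=3528). Prefers a distinct
    committee ID when one is known, because it can differ from IDNumber.
    """
    chosen_id = committee_id if committee_id is not None else id_number
    return urlencode({"committeeID": chosen_id, "electionYear": "All"})
```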

fgregg · 2024-04-12T18:56:20Z

closed by #25
