For some candidates and committees, the year parameter in the filings endpoint isn't what you'd expect. For example, the URL for this candidate's 2024 filings requires a year param of 3528 (it doesn't work with the value 2024). Instead of querying a specific year, you can use electionYear=All.
The filing URL for some candidates (and all committees?) requires a committeeID value that differs from the IDNumber returned by the search endpoint (e.g. this candidate).
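A minimal sketch of how the two points above might translate into a request URL. The `electionYear` and `committeeID` parameter names come from the comments above; the endpoint URL, the `filings_url` helper, and the idea that both values travel as query-string parameters are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real filings endpoint is not named in this issue.
FILINGS_ENDPOINT = "https://example.invalid/filings"


def filings_url(committee_id, endpoint=FILINGS_ENDPOINT):
    """Build a filings URL that asks for every election year at once."""
    params = {
        # Must be the committeeID the filings endpoint expects, which can
        # differ from the IDNumber returned by the search endpoint.
        "committeeID": committee_id,
        # Avoids guessing internal year codes such as 3528 for 2024.
        "electionYear": "All",
    }
    return f"{endpoint}?{urlencode(params)}"
```

Requesting all years and filtering afterwards means the scraper never needs to know the internal year codes.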
To get the opening and closing balances for candidates and committees, we need to get their filings from the candidate/committee detail pages, for example:
https://login.cfis.sos.state.nm.us/#/exploreDetails/RiiKoPNxtHg4P69Mc3r0NH1lK5MpzTLbNw12UnzEQ-I1/14/22/120/2024
We need to download the filings, e.g. https://login.cfis.sos.state.nm.us//ReportsOutput//103/b6375ec9-9605-474c-843a-f7cb732c0f35.pdf
and extract this table:
We already have a scraper that can visit every detail page: https://github.com/datamade/nmid-scrapers/blob/main/scrapers/office/scrape_search.py
Let's hook into the `scrape` method to make the AJAX call to get the details about the filings. We will then need to fetch the PDF and scrape the info out of it.

Right now the scraper yields rows for candidates, one for each campaign year.

I would like the scraper to yield an object like `{'years': [...current info that we are scraping], 'filings': [all the metadata about the filing from the ajax call plus the information scraped out of the pdf]}`.
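One way the proposed output shape could be produced, as a rough sketch rather than the scraper's actual code. It assumes each per-year row is a dict carrying an `IDNumber` key, and that `fetch_filings` is a hypothetical callable that returns the filing metadata already merged with the values parsed from each PDF; `bundle_candidates` is also a made-up name.

```python
from itertools import groupby
from operator import itemgetter


def bundle_candidates(year_rows, fetch_filings):
    """Group per-year rows by candidate and attach filings to each group.

    year_rows: iterable of dicts, one per candidate per campaign year
        (assumed to carry an "IDNumber" key).
    fetch_filings: hypothetical callable taking a candidate ID and returning
        a list of filing dicts.
    """
    key = itemgetter("IDNumber")
    # groupby only merges adjacent items, so sort by the same key first.
    for candidate_id, rows in groupby(sorted(year_rows, key=key), key=key):
        yield {
            "years": list(rows),
            "filings": fetch_filings(candidate_id),
        }
```

Passing `fetch_filings` in as a callable keeps the grouping logic testable without touching the network or any PDF parsing.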