Issue #3 (pull request #22)

Open: wants to merge 1 commit into base: main

nmolivo commented Feb 24, 2021

Summary

Issue: #3

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

  • Tests are implemented
  • All tests are passing
  • Style checks run (see documentation for more details)
  • Style checks are passing
  • Code comments from template removed

Questions

@skorasaurus please review.

@skorasaurus
Collaborator

Hi @nmolivo

Thanks for the PR; did you perhaps forget to push some additional commits?

The URL for the library's board meetings has changed; the new URLs are https://cpl.org/board-agendas/ or https://cpl.org/aboutthelibrary/board-of-trustees/board-agenda-archive/ (both pages are dynamically generated and will be updated simultaneously).

The layout at https://cpl.org/board-agendas/ is pretty similar to what was at the previous URL, so that is probably the better page to scrape.

I made some changes in a couple of commits (skorasaurus@6a05d16 and skorasaurus@d5a48a4), but my code DOES NOT work: when I run `scrapy crawl cle_library` on it, no results are returned, so something is wrong with the changes I made (I don't know what ;)). I'm not good at Python, so there's probably a simple mistake somewhere, and I have to get back to work right now ;). Still, it may help lead you down the right path; use as much or as little of it as you want, or none at all.

(For the test, I used the 2nd item in the array instead of the first, because the formatting for the first item (the February meeting) was really unusual: we had delayed the meeting from Tuesday the 16th (the library unexpectedly closed on the 16th because of a snowstorm) to Thursday the 18th. That was only the second time in my 4 years at CPL that a board meeting's date was rescheduled.)

(Typically, 'regular' board meetings are held on the 3rd Thursday of the month, but because Wednesday the 17th was CPL's 152nd anniversary, this one was originally rescheduled for the 16th.)

Also, this piece of code, along with https://github.com/City-Bureau/city-scrapers-core/blob/main/city_scrapers_core/templates/spider.tmpl, helps explain a bit of the schema of the items to return.
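A side note on the scheduling rule mentioned above: if a spider ever needs to reason about when a 'regular' meeting would fall, the third Thursday of a month can be computed with Python's standard library. This is a minimal sketch; `nth_weekday` and `regular_board_meeting_date` are hypothetical helpers, not part of city-scrapers-core. Because meetings do get moved (as February's was), dates listed on the agenda page should always take precedence over this rule.

```python
import calendar
from datetime import date


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th (1-based) occurrence of `weekday` in a month.

    `weekday` follows Python's convention: Monday == 0 ... Sunday == 6.
    Raises ValueError if the month has no such occurrence.
    """
    first = date(year, month, 1).weekday()
    # Days from the 1st of the month to the first matching weekday.
    offset = (weekday - first) % 7
    day = 1 + offset + 7 * (n - 1)
    days_in_month = calendar.monthrange(year, month)[1]
    if n < 1 or day > days_in_month:
        raise ValueError(f"no occurrence #{n} of weekday {weekday} in {year}-{month:02d}")
    return date(year, month, day)


def regular_board_meeting_date(year: int, month: int) -> date:
    """Hypothetical helper: the third Thursday of the month."""
    return nth_weekday(year, month, calendar.THURSDAY, 3)


print(regular_board_meeting_date(2021, 2))  # 2021-02-18
```

For February 2021 this gives the 18th, the Thursday the meeting was ultimately held on.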

@nmolivo
Author

nmolivo commented Feb 25, 2021

Hi, yes, thank you! I didn't see that the URL had changed, only that the spider was successfully picking up info from the existing page, so I wasn't sure whether work on this issue had already been merged and the pagination just needed removing. Thanks, looking forward to diving in!

@skorasaurus
Collaborator

No problemo.
You'll know it's scraping correctly if you run `scrapy crawl cle_library` and see several JSON-like objects. Each object represents a meeting/event and consists of several key: value pairs.

For example, this is one of the objects returned when you run the Cuyahoga County library scraper (`scrapy crawl cuya_library`):

```
{'all_day': False,
 'classification': 'Committee',
 'description': '',
 'end': datetime.datetime(2021, 1, 26, 18, 45),
 'id': 'cuya_library/202101261645/x/commission',
 'links': [{'href': 'https://cuyahogalibrary.org/getattachment/About-Us/Our-Organization/Board-Committee-Meetings/Agenda-Jan-26-2021-Records-Commission-Meeting.pdf.aspx?lang=en-US',
            'title': 'Agenda'}],
 'location': {'address': '2111 Snow Rd Parma, OH 44134',
              'name': 'Administration Building Auditorium'},
 'source': 'https://cuyahogalibrary.org/About-Us/Our-Organization.aspx',
 'start': datetime.datetime(2021, 1, 26, 16, 45),
 'status': 'passed',
 'time_notes': 'Details may change, confirm with staff before attending',
 'title': 'Commission'}
```
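The `id` and `status` values in this example follow a visible pattern: the `id` joins the spider name, the start time formatted as `YYYYMMDDHHMM`, a literal `x`, and a lower-cased, underscore-separated title, and `status` is `passed` because the meeting had already started. The city-scrapers-core spider template linked earlier appears to set these two fields via helper methods on the base spider class rather than by hand, so the stand-alone functions below (`make_meeting_id` and `meeting_status`, both hypothetical) only illustrate the pattern. They are a sketch, not the library's implementation, and the `tentative` label for upcoming meetings is an assumption.

```python
import re
from datetime import datetime
from typing import Optional


def make_meeting_id(spider_name: str, start: datetime, title: str) -> str:
    """Hypothetical: build an id shaped like 'cuya_library/202101261645/x/commission'."""
    # Lower-case the title and collapse every run of non-alphanumeric
    # characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return "/".join([spider_name, start.strftime("%Y%m%d%H%M"), "x", slug])


def meeting_status(start: datetime, now: Optional[datetime] = None) -> str:
    """Hypothetical: 'passed' for meetings that have already started.

    Only 'passed' is confirmed by the example item; 'tentative' for
    upcoming meetings is an assumption.
    """
    now = now or datetime.now()
    return "passed" if start < now else "tentative"


start = datetime(2021, 1, 26, 16, 45)
print(make_meeting_id("cuya_library", start, "Commission"))
# cuya_library/202101261645/x/commission
print(meeting_status(start, now=datetime(2021, 2, 25)))
# passed
```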

If you're not in your virtual environment, prefix the command with `pipenv run` (e.g. `pipenv run scrapy crawl cle_library`).
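Once `scrapy crawl cle_library` does print items, one quick way to compare them with the Cuyahoga County example is to check their field names and a couple of types. This is a hypothetical helper (not part of Scrapy or city-scrapers-core); the expected keys are simply copied from the example item above, and treating `end` as optional is an assumption.

```python
from datetime import datetime

# Field names copied from the example cuya_library item above.
EXPECTED_KEYS = {
    "all_day", "classification", "description", "end", "id", "links",
    "location", "source", "start", "status", "time_notes", "title",
}


def check_item(item: dict) -> list:
    """Hypothetical check: return a list of problems (empty if the item looks OK)."""
    problems = []
    keys = set(item)
    if EXPECTED_KEYS - keys:
        problems.append(f"missing keys: {sorted(EXPECTED_KEYS - keys)}")
    if keys - EXPECTED_KEYS:
        problems.append(f"unexpected keys: {sorted(keys - EXPECTED_KEYS)}")
    if not isinstance(item.get("start"), datetime):
        problems.append("'start' should be a datetime")
    # Assumption: 'end' may be None when the end time is unknown.
    if item.get("end") is not None and not isinstance(item["end"], datetime):
        problems.append("'end' should be a datetime or None")
    links = item.get("links", [])
    if not isinstance(links, list) or not all(
        isinstance(link, dict) and {"href", "title"} <= set(link) for link in links
    ):
        problems.append("'links' should be a list of {'href': ..., 'title': ...} dicts")
    return problems


# The Cuyahoga County example item from this thread.
example = {
    "all_day": False,
    "classification": "Committee",
    "description": "",
    "end": datetime(2021, 1, 26, 18, 45),
    "id": "cuya_library/202101261645/x/commission",
    "links": [{"href": "https://cuyahogalibrary.org/getattachment/About-Us/Our-Organization/Board-Committee-Meetings/Agenda-Jan-26-2021-Records-Commission-Meeting.pdf.aspx?lang=en-US",
               "title": "Agenda"}],
    "location": {"address": "2111 Snow Rd Parma, OH 44134",
                 "name": "Administration Building Auditorium"},
    "source": "https://cuyahogalibrary.org/About-Us/Our-Organization.aspx",
    "start": datetime(2021, 1, 26, 16, 45),
    "status": "passed",
    "time_notes": "Details may change, confirm with staff before attending",
    "title": "Commission",
}
print(check_item(example))  # []
```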
