
15 fix cle design review #34

Merged Dec 27, 2021 · 12 commits

Conversation

zabrahams (Contributor)

Summary

Issue: #15

Checklist

All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.

  • Tests are implemented
  • All tests are passing
  • Style checks run (see documentation for more details)
  • Style checks are passing
  • Code comments from template removed

Questions

I had a couple of questions!
The first is that the page changed in such a way that it's not trivial to figure out upcoming meetings. There are lists of existing agendas, and then it just says the meeting is on the 1st and 3rd Tuesday of the month, or something like that. The strategy I adopted was to treat existing links as authoritative and then to calculate meeting dates for the 60 days following the last agenda item for each subcommittee. I'm not sure if that's reliable enough - since there's no way to know if meetings are cancelled until after the next agenda pops up? But maybe it's better than not including future meetings? I don't know?
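
A rough sketch of that calculation (the names here are illustrative, not the actual scraper code, and it assumes the python-dateutil library is available):

```python
# Minimal sketch of the 60-day projection described above; illustrative only.
from datetime import datetime, timedelta

from dateutil.rrule import FR, MONTHLY, rrule


def upcoming_meetings(last_agenda_date, weekday=FR, ordinals=(1, 3), days_ahead=60):
    """Project meeting dates (e.g. the 1st and 3rd Friday of each month)
    for the `days_ahead` days after the last scraped agenda date."""
    return list(
        rrule(
            MONTHLY,
            byweekday=[weekday(n) for n in ordinals],
            dtstart=last_agenda_date + timedelta(days=1),
            until=last_agenda_date + timedelta(days=days_ahead),
        )
    )


# upcoming_meetings(datetime(2021, 12, 3)) ->
# [datetime(2021, 12, 17, 0, 0), datetime(2022, 1, 7, 0, 0), datetime(2022, 1, 21, 0, 0)]
```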

I think this problem comes up in a few places - the city planning scraper (#12) will have a similar issue, and I saw something similar discussed in #30 and #5. If we want to do something like the calculator I wrote, it might make sense to extract it into a mixin or utility package so that we can reuse it in multiple scrapers? But again, I'm not super sure!

I also wasn't sure about how to handle the webex and covid information. I put it in the time notes since that seemed to be the only place for generic notes about how the meeting would occur? But I'm not sure if that will work, especially with how the meeting gets displayed to end users in the calendar?

Lastly I know my python is kind of rough so I'm totally open to any stylistic comments! Like with setting the tmp var on line 121 in the scraper - I'm not sure that it's needed but I did it to be safe! Just let me know any changes that would help!

Thanks!!!!

> 2. There is no mention of the year anywhere in the text of the site. We can extract it from the agenda link - at least for now. But it will be important to keep an eye on how the site is changed in January.

> 3. Meetings are currently not being held in person but over webex. I've included this information in the time_notes section of the
Collaborator

This sounds good for now (re: point 3).

@skorasaurus (Collaborator) commented Dec 6, 2021

Thanks soo much for the detailed write-up here!

I'll try to answer some of the questions you've raised below.

I've repeated some of your comments so that @lcaswell and @PaulR-Docs from the Documenters can pick them up more easily.

I think for our use case, we would probably continue to split out the Planning Commission and Design Review meetings as separate organizations (meaning the Planning Commission is listed separately at https://cleveland.documenters.org/agencies/cleveland-city-planning-commission-179/ and the Design Review Committees at https://cleveland.documenters.org/agencies/cleveland-design-review-advisory-committees-157/).

Would like to hear @PaulR-Docs's thoughts on that ^^

Regarding point 3:

> 3. Meetings are currently not being held in person but over webex. We've

Putting the web_ex meeting information in the time_notes field is probably a good short-term solution.

The schema that we use to parse out the information about a meeting (start time of meeting, end time, day of meeting, physical location, etc) and assign it to a corresponding field does not have a field for information about how to access a meeting online.

Other information that doesn't fall neatly into one of the meeting fields in the schema has been included in the time_notes field for other scrapers.

(For example, the Cuyahoga County Library scraper sets

`time_notes="Details may change, confirm with staff before attending",  # noqa`

which is also shown visibly at https://cleveland.documenters.org/meetings/finance-committee-43120/)
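
For context, here's a minimal sketch of how a scraper populates that field through the `Meeting` item from `city_scrapers_core` (the values below are placeholders, not this scraper's actual output):

```python
# Placeholder values only; not the Design Review scraper's actual output.
from datetime import datetime

from city_scrapers_core.constants import ADVISORY_COMMITTEE
from city_scrapers_core.items import Meeting

meeting = Meeting(
    title="Design Review Committee",
    description="Meetings will be held via WebEx.",
    classification=ADVISORY_COMMITTEE,
    start=datetime(2022, 1, 7, 9, 0),
    end=None,
    all_day=False,
    time_notes="Details may change, confirm with staff before attending",
    location={"name": "Remote (WebEx)", "address": ""},
    links=[],
    source="https://planning.clevelandohio.gov/designreview/schedule.php",
)
```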

@lcaswell, @PaulR-Docs
What this means is that the information to join the meeting online would appear just below the meeting time on a meeting's page (similar to how "Details may change, confirm with staff before attending" appears at https://cleveland.documenters.org/meetings/finance-committee-43120/).

Regarding:

> 1. The way the data is presented doesn't make it easy to know whether or

To sum up what @zabrahams has written:

The Design Review Committees meet on fixed days (e.g. "the 1st and 3rd Friday of the month"), but the specific meeting dates (e.g. January 1, 2022) are not explicitly listed on the website (to scrape).

From a technical perspective, a best practice hasn't yet emerged for how to automatically scrape these recurring meetings, even across other cities' scrapers. #5

As I understand it (please correct me if I'm wrong!), @zabrahams's approach is to calculate which meeting dates and times would occur in the next 60 days based on those date rules (e.g. 1st and 3rd Friday of the month), create meetings for them, and populate them into the Documenters calendar (since they aren't listed on the https://planning.clevelandohio.gov/designreview/schedule.php webpage).

The risk with this approach (assuming the code works exactly as intended), as @zabrahams mentioned, is that meeting dates would be listed at https://cleveland.documenters.org/meetings/ but those meetings might be rescheduled (e.g. for holidays) or cancelled. We are assuming that those meetings always happen.

cc'ing @PaulR-Docs and @lcaswell - I'd like your thoughts on this ^^^ Would this be problematic for the Documenters?

@skorasaurus (Collaborator) commented Dec 6, 2021

> Lastly I know my python is kind of rough so I'm totally open to any stylistic comments! Like with setting the tmp var on line 121 in the scraper - I'm not sure that it's needed but I did it to be safe! Just let me know any changes that would help!

(@RaghavRao, interested in your thoughts from a technical perspective on this ^^^ )

@PaulR-Docs

@zabrahams @skorasaurus Thanks for working on this! I think it is perfectly fine to keep the Design Review Board meetings and the Planning Commission as separate agencies. We also keep the Board of Zoning Appeals as a separate agency even though it is part of the Planning Commission too.

As far as the Web-ex meeting links go, those don't really need to be scraped, since our assignment-making process includes manually entering instructions and meeting links for the documenters to access the meetings. Also, these are instructions that only assigned documenters can see, so there is less of a risk of zoombombing by bots that could potentially scrape our site for meeting links.

For example, some GCRTA meetings, like the CAC meetings, are held via web-ex only and are not streamed on FB live. For those we will add the web-ex links when we turn the scraped meeting into an assignment. The RTA/transit scraper never scrapes the meeting links directly.

I would suggest making the meeting description for these Design Review Board meetings say something simple like "Meetings will be held via Web-ex." Then when our team creates those assignments, we can just go to the source site, copy the meeting webex or zoom link and put that directly into the assignment instructions.

Let me know if that works.

@PaulR-Docs

@zabrahams @skorasaurus We also check that meetings are happening before we make them into assignments, so having the scraper just stick with the published schedule (1st and 3rd Friday) is fine for now as far as assigning work to documenters is concerned.

Having a line in the description about checking with staff is great too, in case any member of the public is looking at our list of public meetings and wants to attend one.

@zabrahams (Contributor, Author)

@skorasaurus @PaulR-Docs Thanks for the comments! The discussion is super helpful! I updated the PR to move what had been in the time notes into the meeting description - and I added a sentence to the upcoming meeting description saying to check with staff!

I also removed that tmp var since it wasn't doing anything 😀

@skorasaurus (Collaborator) commented Dec 16, 2021

So I don't forget: I think this is ready to go, but I proposed to @zabrahams in Slack abstracting out _calculate_meeting_days_per_month, _parse_meeting_schedule_info, and a couple of related functions so they can be reused by other scrapers; I defer to @zabrahams on whether we should abstract them out (would the most appropriate place be https://github.com/City-Bureau/city-scrapers-core/blob/main/city_scrapers_core/spiders/spider.py ?)

I totally defer to you and don't want to hold this up much.

@zabrahams (Contributor, Author) commented Dec 16, 2021

Hi! Sorry, I've had a super busy week, but I'm actually working on this now! I think it totally makes sense to abstract them out. I can see 4 candidate places to put them:

  1. In https://github.com/City-Bureau/city-scrapers-core/blob/main/city_scrapers_core/spiders/spider.py
  2. As a mixin, which gets added via multiple inheritance -- like city_scrapers/mixins/cuya_county.py
  3. In city_scrapers/utils.py (which doesn't exist here but does in other city scraper projects)
  4. In a new city_scrapers/utils folder

I think my preference would be to go with (4)? (I put my reasoning below.) Let me know what you think? And either way it wouldn't be too hard to move it around if someone changes their mind?

(1) feels to me like we'd be building functionality that only applies to a small number of scrapers into all our scrapers - it seems better to me to separate the functionality so that only the scrapers that need it pull it in? That way we keep the core API lean and flexible and treat this more like an extension?
(2) makes a lot of sense, since we already have mixins, but it would be kind of inconsistent with the way the other mixins are used - they're treated more like partial scrapers that just need some gaps filled in, as opposed to small utilities? It might cause confusion about what should count as a mixin in the future?
(3) and (4) are similar, but the current utils.py file doesn't seem to be getting used at the moment, and with util files there's always the danger that they become a grab bag of different weird stuff, so I guess I'd prefer to create a folder so we can give the calculator its own file? (There's a sketch of that below.)
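
As a rough sketch of what option (4) could look like (the module path and function body below are guesses; only the function name comes from this PR's discussion):

```python
# city_scrapers/utils/meeting_schedule.py -- hypothetical module for option (4).
# Keeping the calculator here means only the spiders that need it import it,
# instead of baking it into every spider via core (option 1).
from datetime import date, timedelta


def calculate_meeting_days_per_month(year, month, weekday, ordinals):
    """Return the dates in the given month matching e.g. the 1st and 3rd
    occurrences of `weekday` (Monday=0 ... Sunday=6)."""
    first = date(year, month, 1)
    # Days from the 1st to the first occurrence of `weekday` in the month
    offset = (weekday - first.weekday()) % 7
    first_match = first + timedelta(days=offset)
    days = []
    for n in ordinals:
        candidate = first_match + timedelta(weeks=n - 1)
        if candidate.month == month:  # e.g. a 5th occurrence may not exist
            days.append(candidate)
    return days


# calculate_meeting_days_per_month(2022, 1, 4, (1, 3))
# -> [datetime.date(2022, 1, 7), datetime.date(2022, 1, 21)]
```

A spider would then opt in explicitly with `from city_scrapers.utils.meeting_schedule import calculate_meeting_days_per_month`, which keeps the core API lean.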

@zabrahams (Contributor, Author) commented Dec 16, 2021

Though maybe it would make sense to make it a utility in core if we think other cities are going to need it too?

@zabrahams (Contributor, Author)

@skorasaurus I think this is ready for rereview with abstracting out the calculator! Let me know what you think!

@skorasaurus (Collaborator)

Thanks! On initial thought, I like option 4, since it doesn't shove everything into one file. I'll try to take a look at it more in depth tomorrow, or more likely Friday.

@skorasaurus (Collaborator)

Thanks so much for all of this work! We should keep an eye on this to ensure that it's working as intended in 2022.
