
Scraper Development Guide #861

Open
4 of 5 tasks
strangetom opened this issue Sep 15, 2023 · 6 comments

Comments

@strangetom
Collaborator

strangetom commented Sep 15, 2023

Hi guys

One of the suggestions for improving the developer experience in #617 is writing some developer guidance documentation. It's also been mentioned in a few issues and PRs lately, so I thought I would have a go at starting something.

I've come up with a rough outline of what the docs could cover:

  • A step by step guide to developing a new scraper.
    This would start from identifying a website, and cover generating the scraper and tests, adding functionality to the scraper, and adding functionality to the test cases.
    This would be the main piece of documentation, and it would then link out to some more in-depth articles covering the following specific topics:
  • A more detailed definition of the Scraper methods: what each one should return (in terms of datatype and content), and which methods are 'mandatory' (e.g. title, ingredients, instructions ...) versus more 'optional' (e.g. ingredient groups, ratings, reviews ...).
  • A more detailed guide on scraping data from the HTML. I see this being a bit like a cookbook of common patterns and best practices.
  • A detailed guide for adding ingredient groups. This would effectively take the guidance I wrote in "Updating existing scrapers to support ingredient groups" #799 and tidy it up.
  • A more detailed guide on debugging scrapers during development.

A couple of questions I have:

  1. What format should this take?
    a. GitHub wiki?
    b. Markdown files in a docs folder?
    c. Sphinx (or similar) generated pages?
  2. Are there any topics people would like to see covered that I haven't mentioned above?

Progress

Contributions to any of the currently unwritten guides, or any additional documentation, are welcome.

@jayaddison
Collaborator

What format should this take?

I'd vote for markdown files within the repository, with a wiki as my second preference.

Reasoning: Markdown is fairly straightforward and readable with or without supporting tooling, and GitHub renders it automatically, so casual visitors to our repository could read it easily too. It's also available while working with the code (whether in an IDE, online, or on the command line), which is a benefit over the web-based wiki. Finally, some documentation changes are closely related to code changes, and being able to include both in the same pull request or commit (when beneficial) could be useful.

@jayaddison
Collaborator

(also: thanks for getting this discussion going!)

@strangetom strangetom mentioned this issue Sep 16, 2023
6 tasks
@strangetom
Collaborator Author

Thanks @jayaddison.

I'm glad you've voted for Markdown files, as that was my preference too. I've created a draft PR #862 as a starting point, and I'll continue adding to it as I get the chance.

@hhursev
Owner

hhursev commented Sep 16, 2023

I second the Markdown files, yep. MkDocs with the Material theme seems to be the popular pick in the Python community nowadays. I'd vote for that specific combination, with the search plugin included. Sounds like a nice starting point.

@jayaddison
Collaborator

@strangetom maybe worth updating the issue description to use a Markdown checklist, and ticking off the items completed? (most of them :)) I'm thinking it might help some other contributor to see where they can help.

@strangetom
Collaborator Author

Updated :)

@jayaddison jayaddison pinned this issue Oct 3, 2023

3 participants