burchill/web-scraping

Update!

Hey to anyone using this! Starting April 2nd, 2020, I will be abandoning this repo for a private, forked version of it. Every so often, I might make some updates to this repo if they're general enough, but I'll probably forget to, if I'm honest. There's still good stuff here, but I no longer want to deal with the hassle of constantly guessing what should be public and what should be private.

Thanks!


My web-scraping projects

This is most of the code for my personal web-scraping Python projects. Feel free to use the source code to learn from, but if you borrow stuff, please credit me.

Warning

This code is mostly for me. That means I'm still working on it, playing around with it, and I understand things about it that you, a stranger, may not. Although I "try" to document things and be clear, do not assume any of the code works in the way that you think it will.

Note:

Additionally, much of this code is aimed at scraping particular websites, so please do not start running stuff willy-nilly. For example, if you want the data from the website I scraped for the manga_updates.py/manga_project project, please just help yourself to what I've collected (everything_json.json and everything_json_issues_slimmed.json).
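If you just want to poke at the collected data, a minimal sketch like the one below may be enough. It assumes only that the files are valid JSON; the helper names `load_collected` and `describe` are made up for illustration and are not part of this repo, and nothing here assumes a particular structure inside the files.

```python
import json
from pathlib import Path


def load_collected(path):
    """Load one of the collected JSON files (hypothetical helper, not in the repo)."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data):
    """Summarize the top-level shape without assuming anything about the contents."""
    if isinstance(data, dict):
        return f"dict with {len(data)} keys"
    if isinstance(data, list):
        return f"list with {len(data)} items"
    return type(data).__name__
```

Calling `describe(load_collected("everything_json.json"))` from the directory holding the file should tell you whether the top level is a dict or a list before you dig further.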

Please do not try to scrape it again yourself; they don't need multiple people bombarding their site with my code.

If you want to see how it's done by running through the code yourself, please limit yourself to ~20 requests per run.
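One way to hold yourself to that limit is to route every request through a small wrapper that counts calls and refuses to go past the cap. This is only a sketch, not how the code in this repo does it: the `RequestBudget` class, its parameter names, and the 2-second default delay are all assumptions made for illustration.

```python
import time
import urllib.request


class RequestBudget:
    """Caps the number of requests in one run and pauses between them.

    Hypothetical helper: the opener and sleep functions are injectable so
    the class can be exercised without touching the network.
    """

    def __init__(self, max_requests=20, delay_seconds=2.0,
                 opener=urllib.request.urlopen, sleep=time.sleep):
        self.max_requests = max_requests
        self.delay_seconds = delay_seconds
        self.used = 0
        self._opener = opener
        self._sleep = sleep

    def fetch(self, url):
        """Fetch a URL as bytes, or raise once the budget is spent."""
        if self.used >= self.max_requests:
            raise RuntimeError(
                f"Request budget of {self.max_requests} exhausted for this run"
            )
        if self.used > 0:
            # Space out consecutive requests to go easy on the site.
            self._sleep(self.delay_seconds)
        self.used += 1
        with self._opener(url) as response:
            return response.read()
```

Create one `RequestBudget()` at the start of a run and call `budget.fetch(url)` wherever the code would otherwise open a URL directly; the 21st call raises instead of hitting the site.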
