new ways for scraping data. #2

Open
hsfzxjy opened this issue Dec 21, 2017 · 0 comments

Comments

@hsfzxjy
Collaborator

hsfzxjy commented Dec 21, 2017

Because of the way MediaWiki renders pages, the iGEM Parts Registry may use non-semantic tags in its documentation pages (e.g. <table> for layout). html2markdown can parse these incorrectly and mess up the final result. By comparison, accessing the Edit page directly (take BBa_K2042000 as an example) returns the raw page source in WikiText format, which makes parsing simpler and more precise.
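
A minimal sketch of the Edit-page approach, not a definitive implementation. It assumes the Registry exposes the usual MediaWiki `index.php?title=...&action=edit` entry point and that the edit form keeps the source in the standard `<textarea id="wpTextbox1">`. The base URL, the `Part:` title prefix, and the function names are assumptions for illustration and should be checked against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: entry point of the Registry's MediaWiki installation.
BASE_URL = "http://parts.igem.org/wiki/index.php"


def edit_url(part_name):
    """Build the Edit page URL for a part (assumes a 'Part:' title prefix)."""
    query = urlencode({"title": "Part:" + part_name, "action": "edit"})
    return BASE_URL + "?" + query


class _TextareaExtractor(HTMLParser):
    """Collect the text inside <textarea id="wpTextbox1">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("id") == "wpTextbox1":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        # Entities such as &lt; are already decoded by convert_charrefs.
        if self._inside:
            self.chunks.append(data)


def extract_wikitext(edit_page_html):
    """Return the raw WikiText embedded in an Edit page's HTML."""
    parser = _TextareaExtractor()
    parser.feed(edit_page_html)
    parser.close()
    return "".join(parser.chunks)


def fetch_wikitext(part_name):
    """Download the Edit page for a part and return its WikiText."""
    with urlopen(edit_url(part_name)) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_wikitext(html)
```

With this, `fetch_wikitext("BBa_K2042000")` would return the page source without going through html2markdown at all, provided the assumptions above hold.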

Besides, the History page in wikitools lists the recent changes to a page, so there is no need to re-fetch the whole page when updating.
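
One way the History page could drive incremental updates, sketched under assumptions: the history HTML contains links carrying `oldid=<revision id>` (as stock MediaWiki history pages do), and revision ids only grow over time. The function names and the cached id are hypothetical. The scraper would store the last revision id it processed and skip the page when nothing newer appears.

```python
import re

# Assumption: every revision listed on the History page has a link
# containing "oldid=<numeric revision id>".
_OLDID = re.compile(r"oldid=(\d+)")


def latest_revision_id(history_html):
    """Return the highest revision id found in the history HTML, or None."""
    ids = [int(value) for value in _OLDID.findall(history_html)]
    return max(ids) if ids else None


def needs_refetch(history_html, cached_revision_id):
    """Decide whether a page changed since the cached revision.

    If no revision id can be found, re-fetch to be safe.
    """
    latest = latest_revision_id(history_html)
    return latest is None or latest != cached_revision_id
```

Re-fetching when no id is found is a deliberately conservative choice: a layout change on the History page then costs extra requests instead of silently stale data.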

@hsfzxjy hsfzxjy added this to the v3.0 schedule milestone Dec 21, 2017