Scout

Scout is a data aggregation framework for scouting and collecting scientific data.

The framework contains three modules:

scider - a scientific data spider
sanitizer - sanitises the aggregated data so that it can be used for text mining and further processing
db - database module that stores the data in a database (currently supports MongoDB only)

How to install

# Install the development version of scout (there is no stable release yet)
pip install -e git+https://github.com/invaana/scout.git#egg=scout

How to use

Step 1: Create a scider input JSON file, then pass it to the scrape task
# example: examples/configs/github.json

from scout.scider.tasks import scrape_website_task
from scout.scider import helpers

config_file = "configs/github.json"
config = helpers.read_json_file(config_file)

scrape_website_task(config=config, max_limit=30, save=True) 

:param config: the contents of the config file, as a dict
:param max_limit: maximum number of entries to scrape before the scraper halts
:param save: whether the scraped data should be saved to the database
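The config argument is a plain dict rather than a file path, so the JSON file has to be read first. The README uses helpers.read_json_file for this; its implementation is not shown here, so the following is only a minimal standard-library sketch of the same step. The name load_config is hypothetical and is not part of scout.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON file and return its top-level object as a dict.

    Hypothetical stand-in for helpers.read_json_file; not scout's own code.
    """
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    # A scraper config is expected to be a JSON object, not a list or scalar.
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    return config
```

Under that assumption, load_config("configs/github.json") would return a dict that can be passed as config=... to scrape_website_task.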

To run the job in a queue instead, call scrape_website_task.delay(config=config, max_limit=30, save=True)

This module was designed by the Data Science team at Invaana for internal use. If you are a scientific data enthusiast, we'd love to know more about your interests. Let us know @invaana!
