Data Scraping (link collector)

Script to get links from a given URL (with BeautifulSoup).

How it works

You set two parameters: URL and DEPTH.

DEPTH controls how far the crawl goes. If set to 0, the script collects all hyperlinks from the given URL; if set to 1, it does the same for every hyperlink found in the previous pass, and so on.
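Here is a minimal sketch of that recursion, assuming requests and BeautifulSoup are used for fetching and parsing. The collect_links function and its signature are illustrative only, not the repository's actual API:

from datetime import datetime

import requests
from bs4 import BeautifulSoup

def collect_links(url, depth, found=None):
    """Map each hyperlink found on `url` to the datetime it was seen,
    recursing `depth` more levels into the links themselves."""
    if found is None:
        found = {}
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return found  # skip pages that fail to load
    soup = BeautifulSoup(html, 'html.parser')
    for anchor in soup.find_all('a', href=True):
        link = anchor['href']
        if link.startswith('http') and link not in found:
            found[link] = datetime.now()
            if depth > 0:  # depth=0 stops at the given page's own links
                collect_links(link, depth - 1, found)
    return found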

How to run

1 - Clone the repository

git clone https://github.com/gph/link-collector.git

2 - Install dependencies (from inside the cloned folder)

pip install -r requirements.txt

3 - Import and use the search function

# The module name below is assumed; adjust the import to match this
# repository's layout.
from link_collector import search

url = 'https://example.com/'

# search() appears to return a mapping of each link found to the datetime
# it was seen, so .items() yields (link, datetime) pairs.
list_links_found = search(url=url, depth=1).items()

# Sort the pairs by discovery datetime (the second element of each pair).
sorted_by_datetime = sorted(list_links_found, key=lambda d: d[1])

for link, dt in sorted_by_datetime:
    print(f'{dt} {link}')
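With depth=1 as above, search collects the links on https://example.com/ and then the links on every page it found there, so the sorted printout covers two levels of discovery.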

PS: I made this for a job interview assignment.
