just want to ask is this tool good for scraping forums? #185

Open
iorilu opened this issue Oct 21, 2024 · 2 comments
iorilu commented Oct 21, 2024

I want to scrape a forum to collect data, possibly for LLM fine-tuning.

Forums have boards.
Boards have threads.
Each thread includes details such as title, author, datetime, etc.
Support for paginated threads would be a plus.

Is this tool a good fit for these requirements?
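
For readers following along, the hierarchy described above (forum, boards, threads, pages) could be modeled roughly as below. This is only a sketch: the class names, field names, and the `to_jsonl` helper are hypothetical and are not part of the library discussed in this issue.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Thread:
    title: str
    author: str
    posted_at: str  # datetime kept as an ISO 8601 string for easy serialization
    page_urls: list[str] = field(default_factory=list)  # one URL per page of the thread


@dataclass
class Board:
    name: str
    threads: list[Thread] = field(default_factory=list)


@dataclass
class Forum:
    name: str
    boards: list[Board] = field(default_factory=list)


def to_jsonl(forum: Forum) -> str:
    """Flatten the forum into one JSON record per thread, one record per line."""
    lines = []
    for board in forum.boards:
        for thread in board.threads:
            record = {
                "forum": forum.name,
                "board": board.name,
                "title": thread.title,
                "author": thread.author,
                "posted_at": thread.posted_at,
                "page_urls": thread.page_urls,
            }
            lines.append(json.dumps(record))
    return "\n".join(lines)
```

One record per thread keeps the output easy to filter or deduplicate before it is used as training data.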

unclecode self-assigned this Oct 21, 2024
unclecode added the question label (Further information is requested) Oct 21, 2024
unclecode (Owner) commented

Thanks for your interest in our library. I'm happy to explain where things stand to help you decide. So far, our main focus has been a crawl function that is robust and fast and extracts the proper information from a given URL, and I can say we have achieved that. The second part, currently under development and hopefully available within two to three weeks, is our scraper module. While crawling focuses on a single URL, the scraper module's goal is to traverse the website as a graph and extract all the information in a neat, organized manner.

Right now, you can use the crawler on a page, extract its internal and external links, decide which of those links you care about, and crawl them in turn. Alternatively, you can wait for the scraper module to be released. Keep in mind that the library is progressing quickly and keeps growing as more people use it and raise issues.
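
As a rough illustration of that "crawl, collect links, crawl again" loop, here is a minimal breadth-first sketch using only the Python standard library. It does not use this library's API: `fetch_html` is a hypothetical callable you would supply (for example, a thin wrapper around whatever crawler you use that returns a page's HTML), and `split_links`, `crawl_site`, and `max_pages` are names made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def split_links(base_url, html):
    """Return (internal, external) lists of absolute http(s) links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    internal, external = [], []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so the same page is not queued twice.
        absolute = urldefrag(urljoin(base_url, href)).url
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue
        (internal if parts.netloc == base_host else external).append(absolute)
    return internal, external


def crawl_site(start_url, fetch_html, max_pages=100):
    """Breadth-first crawl over internal links; returns {url: html}.

    fetch_html(url) -> str is supplied by the caller (hypothetical hook).
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch_html(url)
        pages[url] = html
        internal, _external = split_links(url, html)
        for link in internal:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For a forum, you would likely add a filter before queuing a link so that only board, thread, and pagination URLs are followed; the `max_pages` cap is there to keep an experiment from walking the whole site.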

This also means that if you start using the project now, you will get really good support at this stage. We learn from your projects and improve the library, and in return you receive our support. Feel free to try it and let us know how it goes; we'll help each other along the way. Thank you again.

iorilu (Author) commented Oct 21, 2024

Thanks for the details. I will start by trying this first.
