Scraping other olx sites #7

Open
delighttechnology opened this issue Jul 29, 2023 · 4 comments

@delighttechnology

Hi,
Just a question: is it possible to scrape olx.bg, olx.ua, olx.pl, or other OLX sites with this scraper?

@Allama272
Owner

Allama272 commented Jul 30, 2023

Since every OLX site has slightly different formatting and class names, you would need to change some parts of the code, such as the CSS class names used to find the price, bedrooms, area, etc.
Additionally, olx.pl and the others do not show the number of bedrooms on the main grid listing page, so you would need to visit each listing URL to get it. This significantly increases the time needed to scrape each page.
Rather than requesting only 100 main pages, the scraper would also need to visit every listing on them (100 pages x ~50 listings per page = ~5,000 extra requests), unless you are willing to omit such features.
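
To make that arithmetic concrete, here is a rough sketch. The function name `estimate_requests` is made up for illustration and is not part of the scraper; the figures of 100 pages and ~50 listings per page are the ones quoted above.

```python
def estimate_requests(main_pages: int, listings_per_page: int, visit_listings: bool) -> int:
    """Return the total number of HTTP requests for one full scrape.

    Hypothetical helper: it only counts requests, it does not fetch anything.
    """
    # Each listing needs one extra request when we open its detail page.
    detail_requests = main_pages * listings_per_page if visit_listings else 0
    return main_pages + detail_requests


grid_only = estimate_requests(100, 50, visit_listings=False)
with_details = estimate_requests(100, 50, visit_listings=True)
print(grid_only, with_details)  # prints: 100 5100
```

So the detailed mode costs about 50 times as many requests as the grid-only mode, before any delay between requests is taken into account.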

It shouldn't take much time to make those changes, though. If you are interested, I can make another branch for those sites.
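
One possible way to keep the per-site differences in one place is a small configuration table. This is only a sketch under assumptions: `SiteConfig`, `SITES`, and `needs_detail_pages` are hypothetical names, the base URLs are assumed, and the selector strings are placeholders that would have to be replaced with the real class names read from each site's HTML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Per-site settings (hypothetical structure, not the repo's actual code)."""
    base_url: str
    price_selector: str     # placeholder, not a real class name
    area_selector: str      # placeholder, not a real class name
    bedrooms_on_grid: bool  # False means bedrooms only appear on the listing page


# Placeholder values: the selectors below are NOT real and must be filled in by hand.
SITES = {
    "olx.pl": SiteConfig("https://www.olx.pl", "PRICE_SELECTOR_PL", "AREA_SELECTOR_PL", False),
    "olx.bg": SiteConfig("https://www.olx.bg", "PRICE_SELECTOR_BG", "AREA_SELECTOR_BG", False),
    "olx.ua": SiteConfig("https://www.olx.ua", "PRICE_SELECTOR_UA", "AREA_SELECTOR_UA", False),
}


def needs_detail_pages(site: str) -> bool:
    """Return True when bedrooms can only be read from each listing's own page."""
    return not SITES[site].bedrooms_on_grid
```

With a table like this, supporting another site would mostly mean adding one entry rather than editing the parsing code.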

@delighttechnology
Author

@Allama272
It would be super cool if you did. I am currently looking for an apartment to buy with a bank loan and want to analyze possible options on olx.pl. I hope to find something cheaper than I would by going through realtors. In the end, I want to create some kind of Power BI dashboard to follow price trends. I can share my dashboard afterwards :)

@Allama272
Owner

@delighttechnology
I was actually thinking of creating a dashboard for this project when I get the chance.
As for the scraper itself, as I explained earlier, a fast and reliable scraper would only get the features shown on the main page, so only the price, general address, and area.

Alternatively, we can get all the other features by visiting each listing on each page, which will take much longer.

Which option would you like?
Note: I will not schedule autoruns, so you will have to run this locally.

@delighttechnology
Author

delighttechnology commented Jul 31, 2023

@Allama272,
I think that for this kind of analysis the second option, with detailed listings, would be more useful.
Regarding scheduled runs: of course, I will do that on my side and will probably schedule autoruns at night.
I expect the first run to take much longer, but after that I will only fetch new listings. I am interested in only one city, so this limits the results significantly.
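
The "only new listings" idea could be sketched roughly like this: remember the listing URLs seen on earlier runs in a local file and visit only the ones that are not in it yet. The function names and the JSON file format are assumptions for illustration, not something the scraper provides.

```python
import json
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Load listing URLs recorded by earlier runs (empty set on the first run)."""
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")))
    return set()


def filter_new(urls: list[str], seen: set[str]) -> list[str]:
    """Return URLs not seen before, keeping their order and dropping repeats."""
    new: list[str] = []
    for url in urls:
        if url not in seen and url not in new:
            new.append(url)
    return new


def save_seen(path: Path, seen: set[str]) -> None:
    """Persist the set of seen URLs for the next run."""
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
```

A nightly run would then call `load_seen`, scrape the grid pages for one city, pass the collected URLs through `filter_new`, visit only those, and finish with `save_seen` on the union of old and new URLs.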
