cannot import name 'WebCrawler' from 'crawl4ai' #122

gulnihalk · 2024-10-02T16:11:22Z

Hi, when I try to run crawl4ai with Microsoft Edge on Windows, I get the error below (the same code works on Ubuntu with Chrome):

Traceback (most recent call last):
  File "d:\work\indexing\scrapper.py", line 1, in <module>
    from crawl4ai import WebCrawler
ImportError: cannot import name 'WebCrawler' from 'crawl4ai' (C:\Users\abc..\Local\Programs\Python\Python310\lib\site-packages\crawl4ai\__init__.py)

and here is my code below:

from crawl4ai import WebCrawler
import json
 
with open(r'D:\work\indexing\com\scrapped_urls.json', 'r') as file:
    json_data = json.load(file)
    print(type(json_data))
 
# Create an instance of WebCrawler
crawler = WebCrawler()
 
# Warm up the crawler (load necessary models)
crawler.warmup()
 
scrapped_file = r'D:\work\indexing\com\xyz.txt'
 
# Iterate through the JSON array
for item in json_data:
    #print("The url ", item["url"], " is scrapping...")
    # Run the crawler on a URL
    result = crawler.run(url=item["url"])
    # Put the scrapped text into file
    with open(scrapped_file, "a") as f:
        f.write(result.markdown)

Do you have any idea?

unclecode · 2024-10-03T13:10:18Z

Thanks for using our library. I do have a question: when you say you are running our library with Microsoft Edge on Windows, could you please clarify what you mean? Crawl4AI does not have any integration with Microsoft Edge or any other browser on your computer, so I'm guessing you are hitting an error related to Windows. If that's the case, I will run some additional tests on Windows to determine the root cause. I will also review the code you shared to see if I can identify the problem. Meanwhile, we are working on adding a scraping engine to the library, so please stay tuned for that update.

asumansaree · 2024-10-04T11:41:57Z

Hi @unclecode, thanks for your interest in our problem (I work together with @gulnihalk).
I wrote this code on Ubuntu, where my browser is Chrome, and it scrapes all the URLs in the JSON file very well. The library is really good work! But when we run exactly the same code (except for the file paths) on a Windows machine that has only the Microsoft Edge browser, we get this error:
ImportError: cannot import name 'WebCrawler' from 'crawl4ai' (C:\Users\abc..\Local\Programs\Python\Python310\lib\site-packages\crawl4ai\__init__.py)
Even after installing all the dependencies of crawl4ai, and even after changing the classes in the source code for Edge (for example, changing self.driver = webdriver.Chrome(service=self.service) to self.driver = webdriver.Edge(service=self.service) in crawler_strategy.py), it still doesn't work. Maybe those parts of the source are related to Selenium. The Selenium part appears in the source code like this:

unclecode · 2024-10-08T10:10:06Z

@asumansaree Sorry for my late response, I've been on a short trip. I figured out why it behaves this way. Your code is written for the previous version, which was synchronous by default; the library is now asynchronous by default. To keep using sync mode, you have to import the web crawler directly from its module: from crawl4ai.web_crawler import WebCrawler. I suggest you switch to async mode, which uses Playwright and is faster and more capable. Please refer to the documentation and examples; it's a significant improvement. Here is a code example for the async version:

import asyncio

from crawl4ai import AsyncWebCrawler

async def simple_crawl():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(url="https://www.nbcnews.com/business")
        print(result.markdown[:500])  

async def main():
    await simple_crawl()

if __name__ == "__main__":
    asyncio.run(main())
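
For the original use case (a JSON array of URLs whose markdown is appended to one text file), the loop from the first post could be ported to the async API roughly as follows. This is only a sketch under assumptions: the names crawl_to_file and main are hypothetical helpers introduced here, the JSON items are assumed to carry a "url" key as in the original script, and only AsyncWebCrawler, arun and result.markdown are taken from this thread.

```python
import asyncio
import json
from pathlib import Path


async def crawl_to_file(crawler, urls, out_path):
    """Crawl each URL with an already-open crawler and append its markdown.

    `crawler` only needs an async `arun(url=...)` method whose result has a
    `markdown` attribute, as in the example above. (Hypothetical helper.)
    """
    with open(out_path, "a", encoding="utf-8") as out:
        for url in urls:
            result = await crawler.arun(url=url)
            # Guard against an empty result so write() never receives None.
            out.write(result.markdown or "")
            out.write("\n")


async def main(json_path, out_path):
    # Imported here so the helper above can be exercised without crawl4ai.
    from crawl4ai import AsyncWebCrawler

    # Assumption: the file holds a JSON array of objects with a "url" key.
    items = json.loads(Path(json_path).read_text(encoding="utf-8"))
    urls = [item["url"] for item in items]

    # One crawler instance is reused for every URL in the list.
    async with AsyncWebCrawler(verbose=True) as crawler:
        await crawl_to_file(crawler, urls, out_path)
```

With crawl4ai installed, this could be started with asyncio.run(main(json_path, out_path)), passing the Windows paths as raw strings (for example r'D:\work\indexing\com\xyz.txt') so that the backslashes are not read as escape sequences.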

unclecode self-assigned this Oct 3, 2024

unclecode added the question Further information is requested label Oct 8, 2024
