Hi,
We are trying to run LLM extraction using the sample code provided here.
This is how we have added the LLM details:
async with AsyncWebCrawler(verbose=True) as crawler:
    result = await crawler.arun(
        url="https://www.nbcnews.com/business",
        extraction_strategy=LLMExtractionStrategy(
            provider="openai/gpt-4o",
            base_url="https://xxx.openai.azure.com/openai/deployments/xx/chat/completions?api-version=xx",
            api_token="xxxx",
            instruction="Extract only content related to technology"
        ),
        bypass_cache=True,
    )
These same credentials work in other code we have for other use cases. However, when we run the sample code, we get the following error:
[LOG] 🌤️ Warming up the AsyncWebCrawler
[LOG] 🌞 AsyncWebCrawler is ready to crawl
[LOG] 🕸️ Crawling https://www.nbcnews.com/business using AsyncPlaywrightCrawlerStrategy...
[LOG] ✅ Crawled https://www.nbcnews.com/business successfully!
[LOG] 🚀 Crawling done for https://www.nbcnews.com/business, success: True, time taken: 8.29 seconds
[LOG] 🚀 Content extracted for https://www.nbcnews.com/business, success: True, time taken: 0.34 seconds
[LOG] 🔥 Extracting semantic blocks for https://www.nbcnews.com/business, Strategy: AsyncWebCrawler
[LOG] Call LLM for https://www.nbcnews.com/business - block index: 0
[LOG] Call LLM for https://www.nbcnews.com/business - block index: 1
[LOG] Call LLM for https://www.nbcnews.com/business - block index: 2
[LOG] Call LLM for https://www.nbcnews.com/business - block index: 3
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
[LOG] Call LLM for https://www.nbcnews.com/business - block index: 4
Error in thread execution: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
[LOG] Call LLM for https://www.nbcnews.com/business - block index: 5
Error in thread execution: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
Error in thread execution: litellm.NotFoundError: NotFoundError: OpenAIException - Error code: 404 - {'error': {'code': '404', 'message': 'Resource not found'}}
[LOG] 🚀 Extraction done for https://www.nbcnews.com/business, time taken: 33.02 seconds.
Number of tech-related items extracted: 6
Traceback (most recent call last):
  File "C:\test.py", line 31, in <module>
    asyncio.run(extract_tech_content())
  File "C:\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\test.py", line 28, in extract_tech_content
    with open(".data/tech_content.json", "w", encoding="utf-8") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '.data/tech_content.json'
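The final `FileNotFoundError` is separate from the LLM errors: `open(".data/tech_content.json", "w")` creates the file but not its parent folder, so it fails whenever `.data` does not exist relative to the current working directory. A minimal sketch of a save step that creates the folder first (the helper name `save_json` is hypothetical, not part of Crawl4AI):

```python
import json
import os


def save_json(data, path=".data/tech_content.json"):
    """Write `data` as JSON to `path`, creating the parent folder first.

    Hypothetical helper. open(path, "w") raises FileNotFoundError when the
    parent directory (here ".data") is missing, so it is created up front.
    Relative paths resolve against the current working directory.
    """
    parent = os.path.dirname(path)
    if parent:  # a bare filename has no directory to create
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    return path
```

In the sample, this would presumably be called with the parsed extraction result, e.g. `save_json(json.loads(result.extracted_content))`, assuming `extracted_content` holds a JSON string as in the Crawl4AI examples.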
@MeghanaSrinath Thanks for using Crawl4AI. The error message comes from the litellm library, which we use to communicate with the language model. It seems that litellm cannot find a standard OpenAI interface at the base URL you passed. As a first step, try the standard OpenAI base URL (do not pass base_url at all) and make sure that works. If it does, the problem must be something about the base URL you are passing. In the worst case, you can create a temporary API token for me, and I'll test it on my end, figure out why it doesn't work, and fix it for you. Also, please share the full code, including the part where you save the data into tech_content.json.
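One plausible reading of the 404 (an assumption, not confirmed in this thread): the `base_url` in the snippet above is a complete Azure chat-completions endpoint, while the client expects only the resource root and builds the deployment path itself. Azure OpenAI's REST endpoints have the shape `https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=<version>`. The hypothetical helper below only splits such a URL into the three pieces that would be supplied separately; it does not call any API.

```python
from urllib.parse import parse_qs, urlsplit


def split_azure_chat_url(url):
    """Split a full Azure OpenAI chat-completions URL into its parts.

    Hypothetical helper. Expected shape:
    https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=<version>
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Expect ["openai", "deployments", <deployment>, "chat", "completions"].
    if len(segments) < 3 or segments[:2] != ["openai", "deployments"]:
        raise ValueError("not an Azure OpenAI deployment URL: " + url)
    api_version = parse_qs(parts.query).get("api-version", [None])[0]
    return {
        "api_base": f"{parts.scheme}://{parts.netloc}/",  # resource root only
        "deployment": segments[2],
        "api_version": api_version,
    }
```

Under that assumption, litellm would address the deployment as `azure/<deployment>` rather than `openai/gpt-4o`, with the resource root and API version supplied as the API base and API version (litellm can read them from `AZURE_API_BASE` and `AZURE_API_VERSION`).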
For my part, I use a .env file with these variables and don't pass base_url to LLMExtractionStrategy:
AZURE_API_BASE=https://xxxxx.openai.azure.com/
AZURE_DEPLOYMENT=gpt4o-mini
AZURE_API_VERSION="2024-06-01"
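According to litellm's Azure documentation, its Azure provider reads `AZURE_API_BASE` and `AZURE_API_VERSION` (plus `AZURE_API_KEY`) from the environment and addresses a deployment as `azure/<deployment>`. `AZURE_DEPLOYMENT` is presumably the commenter's own variable for building that string; this is an assumption. A minimal stdlib sketch with hypothetical helper names (python-dotenv's `load_dotenv` does the loading part more thoroughly):

```python
import os


def load_env_file(path=".env"):
    """Minimal .env reader (hypothetical helper).

    Reads KEY=VALUE lines, skips blanks and comments, strips matching
    surrounding quotes, and sets each key in os.environ unless it is
    already set. Returns the values parsed from the file.
    """
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip()
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]  # e.g. "2024-06-01" -> 2024-06-01
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded


def azure_provider_from_env():
    """Build a litellm-style provider string such as "azure/gpt4o-mini".

    Assumes AZURE_DEPLOYMENT holds the Azure deployment name; AZURE_API_BASE
    and AZURE_API_VERSION stay in the environment for litellm to read.
    """
    return "azure/" + os.environ["AZURE_DEPLOYMENT"]
```

The resulting string could then be passed as `provider=` to `LLMExtractionStrategy`, together with `api_token` and `instruction`, leaving `base_url` out.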