
bro, browser select options? #141

Open
ZengMingDa opened this issue Oct 7, 2024 · 2 comments

ZengMingDa commented Oct 7, 2024

bro, browser select options? The crawl returns:

"IE 11 is not supported. For an optimal experience visit our site on another\nbrowser."

unclecode (Owner) commented

Hi, could you please let me know what URL you're trying to crawl and details about your operating system, such as whether it's a Windows machine, Mac, or Ubuntu? I'll then test it and provide you with the results. Thank you.

unclecode self-assigned this on Oct 8, 2024
unclecode added the question (Further information is requested) label on Oct 8, 2024

DengyiLiu commented Oct 9, 2024

Hi unclecode, I'm also hitting this issue. The extracted content is:

{
    "index": 0,
    "tags": [],
    "content": "IE 11 is not supported. For an optimal experience visit our site on another\nbrowser."
}

I'm running this code:

```python
import asyncio
import json

from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy


async def main():
    # CSS-based extraction schema: one record per article card.
    schema = {
        "name": "News Articles",
        "baseSelector": "article.tease-card",
        "fields": [
            {
                "name": "title",
                "selector": "h2",
                "type": "text",
            },
            {
                "name": "summary",
                "selector": "div.tease-card__info",
                "type": "text",
            }
        ],
    }

    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            extraction_strategy=JsonCssExtractionStrategy(schema, verbose=True)
        )
        extracted_data = json.loads(result.extracted_content)
        print(f"Extracted {len(extracted_data)} articles")
        print(json.dumps(extracted_data[0], indent=2))


asyncio.run(main())
```


I'm running Ubuntu on WSL. I would appreciate your answer.
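
For what it's worth, the "IE 11 is not supported" text is NBC News serving its unsupported-browser fallback page to the headless browser, most likely keyed off the user agent. Below is a minimal sketch of one possible workaround: passing a modern desktop user agent to the crawler. It assumes the installed crawl4ai version accepts a `user_agent` keyword on `AsyncWebCrawler` and forwards it to the underlying Playwright browser; the UA string and the `bypass_cache` flag are illustrative, so check the docs for your version.

```python
import asyncio
import json

from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

# Illustrative modern desktop user agent; any current Chrome/Firefox UA should do.
MODERN_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

schema = {
    "name": "News Articles",
    "baseSelector": "article.tease-card",
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "summary", "selector": "div.tease-card__info", "type": "text"},
    ],
}


async def main():
    # Assumption: this crawl4ai version forwards user_agent to the browser it launches.
    async with AsyncWebCrawler(verbose=True, user_agent=MODERN_UA) as crawler:
        result = await crawler.arun(
            url="https://www.nbcnews.com/business",
            extraction_strategy=JsonCssExtractionStrategy(schema, verbose=True),
            bypass_cache=True,  # avoid re-serving a previously cached fallback page
        )
        data = json.loads(result.extracted_content)
        print(f"Extracted {len(data)} articles")
        if data:
            print(json.dumps(data[0], indent=2))


asyncio.run(main())
```

If the fallback page still comes back with a modern user agent, the trigger is probably something other than the UA (e.g. headless detection), and knowing the exact crawl4ai version would help narrow it down.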
