PDF Parser Error #4

Open
nsamarin opened this issue Jul 5, 2021 · 0 comments

nsamarin commented Jul 5, 2021

Traceback (most recent call last):
  File "/Users/nsamarin/Projects/ccpa-compliance/scripts/scraper/main.py", line 159, in <module>
    scrape_policies(**kwargs)
  File "/Users/nsamarin/Projects/ccpa-compliance/scripts/scraper/main.py", line 132, in scrape_policies
    future.result()
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/concurrent/futures/_base.py", line 433, in result
    return self.__get_result()
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/polipy.py", line 292, in download_policy
    policy.extract(extractors=extractors)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/polipy.py", line 112, in extract
    content = extract(extractor, **vargs)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/extractors.py", line 11, in extract
    content = extract_text(**kwargs)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/extractors.py", line 18, in extract_text
    content = extract_pdf(static_source)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/polipy/extractors.py", line 28, in extract_pdf
    text = parse_pdf(f)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/pdfminer/high_level.py", line 114, in extract_text
    for page in PDFPage.get_pages(
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/pdfminer/pdfpage.py", line 128, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/opt/anaconda3/envs/ccpa/lib/python3.9/site-packages/pdfminer/pdfdocument.py", line 596, in __init__
    raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?
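
The final error message suggests the downloaded file may not be a PDF at all (for example, an HTML error page saved under a PDF URL). A possible guard, sketched below under that assumption, is to check for the `%PDF-` signature before handing the stream to the parser. The names `looks_like_pdf` and `extract_pdf_or_none` are hypothetical and not part of polipy or pdfminer; `parse_pdf` stands in for whatever parsing callable is in use.

```python
import io

# Signature that begins a PDF file header, e.g. b"%PDF-1.7".
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(stream, probe_size=1024):
    """Return True if the PDF signature appears within the first probe_size bytes."""
    start = stream.tell()
    head = stream.read(probe_size)
    stream.seek(start)  # restore the position so a parser can read from the start
    return PDF_MAGIC in head


def extract_pdf_or_none(stream, parse_pdf):
    """Hypothetical wrapper: call parse_pdf only when the stream looks like a PDF.

    parse_pdf is any callable that accepts a binary stream (an assumption here).
    """
    if not looks_like_pdf(stream):
        return None  # skip non-PDF content instead of raising inside a worker thread
    return parse_pdf(stream)
```

A signature check only filters out obviously wrong content. A truncated or corrupt PDF can still pass it and fail later, so catching `pdfminer.pdfparser.PDFSyntaxError` around the parse call would likely still be needed.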