fields with periods are truncated #324
So I used qpdf's QDF mode (
So if you look at the
So I guess what pdf2json needs to do is recursively walk up the field's parents until there is no parent left, prepending each parent's name to the …
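The parent walk described above can be sketched roughly as follows. This is a minimal illustration, not pdf2json's or pdf.js's actual code. `FieldNode` and `fullyQualifiedName` are hypothetical names, and the model keeps only each field's partial name (its `/T` entry) and its `/Parent` link. It uses a loop instead of recursion, which gives the same result. Skipping ancestors that have no partial name is an assumption about how a reader should treat them.

```typescript
// Hypothetical, simplified model of an AcroForm field dictionary:
// only the partial name (/T) and the /Parent link matter for naming.
interface FieldNode {
  partialName?: string; // value of /T, if present
  parent?: FieldNode;   // value of /Parent, if present
}

// Build the fully qualified name by walking up the /Parent chain and
// prepending each ancestor's partial name, joining the parts with ".".
// Assumption: ancestors without a non-empty partial name are skipped.
function fullyQualifiedName(field: FieldNode): string {
  const parts: string[] = [];
  const seen = new Set<FieldNode>(); // guards against malformed, cyclic /Parent links
  for (let node: FieldNode | undefined = field; node !== undefined; node = node.parent) {
    if (seen.has(node)) break;
    seen.add(node);
    if (node.partialName !== undefined && node.partialName !== "") {
      parts.unshift(node.partialName); // ancestors end up in front
    }
  }
  return parts.join(".");
}

// Usage: a parent field "xxx" with a child field "yyy", as in test.pdf.
const parent: FieldNode = { partialName: "xxx" };
const child: FieldNode = { partialName: "yyy", parent };
console.log(fullyQualifiedName(child)); // prints xxx.yyy
```

Reporting only the child's own `/T` value would give "yyy". That matches the truncated name described in this issue.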
After I encountered this issue, I tried parsing the PDF that demonstrates it with other PDF parsers. https://github.com/smalot/pdfparser was able to parse fields with one dot in the name, but when a field name had two or more dots, it broke. I filed a bug report against that package and they fixed it. Quoting their response:
I mean, maybe this is more of an issue with https://github.com/mozilla/pdf.js than with pdf2json, but if that were the case, I would think that the devs of this package, with their superior knowledge of the pdf.js API, should be able to create a reproducible example of the issue using that API and then file a bug report with them...
So I have a PDF with just one field on it - a field named "xxx.yyy". When I run pdf2json 3.0.5 on the PDF I'm told that the only field on that PDF is "yyy".
test.pdf demonstrates the problem.
Here's what Adobe Acrobat Pro 2020 shows:
pdftk 2.02 also finds "xxx.yyy" when I run
pdftk test.pdf dump_data_fields
Unfortunately, pdftk doesn't return the coordinates, whereas pdf2json does.
According to
qpdf test.pdf --json
the field's alternativename, fullname and mappingname are "xxx.yyy", whereas its partialname is "yyy", so maybe that's the issue?
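A rough way to spot every field that would be affected is to compare `partialname` with `fullname` in qpdf's JSON output. A field whose two values differ inherits part of its name from a parent. The sketch below assumes the fields sit under `acroform.fields` with those key names. Check that against the JSON documentation for your qpdf version. `QpdfJson` and `fieldsWithInheritedNames` are hypothetical names, and reading and parsing the qpdf output is left out.

```typescript
// Assumed subset of the `qpdf --json` output layout (acroform section).
interface QpdfField {
  fullname: string;
  partialname: string;
  alternativename?: string;
  mappingname?: string;
}
interface QpdfJson {
  acroform?: { fields?: QpdfField[] };
}

// Return the fields whose fully qualified name differs from their
// partial name, i.e. fields that inherit part of their name from a parent.
function fieldsWithInheritedNames(doc: QpdfJson): { full: string; partial: string }[] {
  const fields = doc.acroform?.fields ?? [];
  return fields
    .filter((f) => f.fullname !== f.partialname)
    .map((f) => ({ full: f.fullname, partial: f.partialname }));
}

// Usage with the values qpdf reports for test.pdf:
const sample: QpdfJson = {
  acroform: { fields: [{ fullname: "xxx.yyy", partialname: "yyy" }] },
};
console.log(JSON.stringify(fieldsWithInheritedNames(sample)));
// prints [{"full":"xxx.yyy","partial":"yyy"}]
```

In practice, `sample` would be replaced by `JSON.parse` applied to the text that `qpdf test.pdf --json` writes to stdout.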