makebox doesn't output horizontal coordinates of textangle 90 content #3590
Reproduced with:
That gives in
The character-level bounding box coordinates in the output are wrong. I would expect them to be in the coordinate system of the page image, like the word-level bounding boxes. That means they inherit the …
I opened a pull request with a fix:
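One way to make "the coordinate system of the page image" concrete: Tesseract box files put the origin at the bottom-left of the image, while hOCR puts it at the top-left. After flipping a character box into hOCR orientation, it should lie inside the bbox of the word it belongs to. The sketch below assumes the page height is known; `box_to_hocr`, `inside`, the 2000-pixel height and the character boxes are hypothetical, and only the word bbox of `2084` (112 1470 133 1532) comes from the issue.

```python
from typing import Tuple

Bbox = Tuple[int, int, int, int]  # (x0, y0, x1, y1), origin at top-left as in hOCR


def box_to_hocr(left: int, bottom: int, right: int, top: int,
                image_height: int) -> Bbox:
    """Convert box-file coordinates (origin bottom-left) to an hOCR-style bbox."""
    return (left, image_height - top, right, image_height - bottom)


def inside(inner: Bbox, outer: Bbox) -> bool:
    """True if `inner` lies completely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


# Hypothetical example: a 2000-pixel-high page and the bbox of the word "2084".
word = (112, 1470, 133, 1532)
good_char = box_to_hocr(112, 468, 133, 500, image_height=2000)
print(good_char, inside(good_char, word))    # (112, 1500, 133, 1532) True
zero_width = box_to_hocr(0, 468, 0, 500, image_height=2000)
print(zero_width, inside(zero_width, word))  # (0, 1500, 0, 1532) False
```

A box with both horizontal coordinates at 0 can never fall inside a word whose x-range starts at 112, so this containment check catches the reported symptom.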
Environment
Language: nld. The nld (and eng) traineddata files come from https://github.com/tesseract-ocr/tessdata and have these sizes (in bytes):
15400601 eng.traineddata
8903736 nld.traineddata
For a region of the image that contains vertically oriented text (textangle 90), makebox outputs the following (all horizontal coordinates and widths are 0):
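To spot this symptom automatically, a makebox result can be scanned for boxes with no horizontal extent. Tesseract box files hold one glyph per line as `<symbol> <left> <bottom> <right> <top> <page>`. The helper names `parse_box_line` and `zero_width_boxes` are hypothetical, and the sample lines are invented to mirror the described symptom, not copied from the real output.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One line of a Tesseract box file."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right: the last five fields are always integers.
    symbol, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def zero_width_boxes(lines: List[str]) -> List[Box]:
    """Return the boxes whose width (right - left) is zero or negative."""
    boxes = [parse_box_line(line) for line in lines if line.strip()]
    return [b for b in boxes if b.right - b.left <= 0]


# Invented sample lines mimicking the reported symptom (left = right = 0).
sample = ["2 0 468 0 500 0", "- 124 541 127 549 0"]
for box in zero_width_boxes(sample):
    print("zero width:", box)
```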
The image used to produce this output is attached:
210913.nog.2-000na.zip
A similar issue, #2340, was filed earlier, but its reporter (https://github.com/dev884) did not point to his fix, and his account contains no code at all.
Expected Behavior:
I would expect the horizontal coordinates to resemble those in the word-level hOCR output for the same region of the image:
    <span class='ocr_line' id='line_1_1' title="bbox 111 1289 133 1532; textangle 90; x_size 28.416666; x_descenders 7.1041665; x_ascenders 7.1041665">
     <span class='ocrx_word' id='word_1_1' title='bbox 112 1470 133 1532; x_wconf 88'>2084</span>
     <span class='ocrx_word' id='word_1_2' title='bbox 124 1451 127 1459; x_wconf 88'>-</span>
     <span class='ocrx_word' id='word_1_3' title='bbox 111 1403 133 1441; x_wconf 96'>2/2</span>
     <span class='ocrx_word' id='word_1_4' title='bbox 112 1289 133 1384; x_wconf 96'>251980</span>
    </span>
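To pull the expected horizontal extent out of hOCR like the snippet above, a short parser can read the `bbox` field from the `title` of each `ocrx_word` span. This sketch uses only Python's standard library (`html.parser`, `re`); `WordBoxes` and `horizontal_range` are hypothetical helpers, not part of Tesseract.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class WordBoxes(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._pending = None  # bbox of the word span currently open

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "span" and a.get("class") == "ocrx_word":
            m = BBOX_RE.search(a.get("title") or "")
            if m:
                self._pending = tuple(int(v) for v in m.groups())

    def handle_data(self, data):
        # Attach the first non-blank text inside a word span to its bbox.
        if self._pending is not None and data.strip():
            self.words.append((data.strip(), self._pending))
            self._pending = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._pending = None


def horizontal_range(words):
    """Smallest x0 and largest x1 over all word boxes."""
    return min(b[0] for _, b in words), max(b[2] for _, b in words)
```

Fed the four words above, `horizontal_range` gives (111, 133), the same x-range as the enclosing `ocr_line` bbox. Character boxes for this line would be expected to stay within it.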
Suggested Fix:
I have not thought of one yet. I don't know whether the previous reporter's workaround could be made watertight.