multi-part figures often not merged #104

Open
cambro opened this issue Apr 19, 2020 · 2 comments
Labels
bug Something isn't working enhancement New feature or request

Comments

@cambro
Contributor

cambro commented Apr 19, 2020

Much of the time, figures with multiple parts are segmented as separate regions and are not merged properly in post-processing, leaving multiple "figures" that cannot be matched to multiple "captions". As a result, key figure parts are impossible to retrieve (i.e., they have no text and are associated with no captions). This is also likely lowering confidence on Figure proposals, because "chopped up" figure and table elements are out of distribution (by definition, relative to our training data).

The merging step needs to be revisited, for figures specifically. Tables would also benefit from more sophisticated merging.
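
To make the idea concrete, here is a minimal sketch of one possible merging heuristic: repeatedly fuse figure boxes that overlap or sit within a small gap of each other. This is not the project's actual post-processing code. The names `merge_figure_boxes` and `boxes_are_close`, the `(x1, y1, x2, y2)` box format, and the `gap` threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in page pixel coordinates.
Box = Tuple[float, float, float, float]


def boxes_are_close(a: Box, b: Box, gap: float) -> bool:
    """Return True if two boxes overlap or lie within `gap` of each other."""
    # Separation along each axis; negative values mean the boxes overlap there.
    dx = max(a[0], b[0]) - min(a[2], b[2])
    dy = max(a[1], b[1]) - min(a[3], b[3])
    return dx <= gap and dy <= gap


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_figure_boxes(boxes: List[Box], gap: float = 10.0) -> List[Box]:
    """Fuse nearby figure boxes until no pair is within `gap` (assumed threshold)."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_are_close(merged[i], merged[j], gap):
                    # Replace the pair with its union and rescan from the start,
                    # since the grown box may now touch other boxes.
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


parts = [(0, 0, 100, 100), (105, 0, 200, 100), (400, 400, 500, 500)]
result = merge_figure_boxes(parts, gap=10.0)
```

With these inputs the first two boxes are 5 pixels apart horizontally and would be fused into `(0, 0, 200, 100)`, while the distant third box stays separate. A real fix would probably also need to check that the candidates share the Figure class and that no caption sits between them.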

@cambro cambro added bug Something isn't working enhancement New feature or request labels Apr 19, 2020
@cambro
Contributor Author

cambro commented Apr 19, 2020

This is somewhat related to #103 and may be causing it in some cases.

@ankur-gos
Contributor

I'd like to revisit this and also come up with some test cases once we have the new coronavirus vertical live, @cambro.
