multi-part figures often not merged #104

cambro · 2020-04-19T17:12:38Z

Much of the time, figures with multiple parts are segmented separately and not merged properly in post-processing, leaving multiple "figures" that cannot be matched to their captions. As a result, key figure parts are impossible to retrieve (i.e., they have no text and are associated with no caption). This is also likely lowering confidence on Figure proposals, because "chopped up" figure and table elements are out of distribution (by definition, relative to our training data).

Revisiting the merging step for figures specifically is needed. Tables would also benefit from more sophisticated merging.
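
As a rough illustration of what a figure-specific merge pass could look like, here is a minimal sketch. It is not the current post-processing code: `merge_figure_boxes`, the `max_gap` threshold, and the `(x1, y1, x2, y2)` box format are all assumptions made for the example. It repeatedly unions any two Figure boxes whose axis-aligned gap is below the threshold, so the panels of one multi-part figure collapse into a single box that can then be paired with one caption.

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in page pixels.
Box = Tuple[float, float, float, float]


def gap(a: Box, b: Box) -> float:
    """Largest axis-aligned gap between two boxes; 0.0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return max(dx, dy)


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_figure_boxes(boxes: List[Box], max_gap: float = 20.0) -> List[Box]:
    """Greedily merge boxes that lie within max_gap of each other.

    max_gap is an assumed tuning parameter, not a value from the project.
    """
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if gap(merged[i], merged[j]) <= max_gap:
                    # Replace box i with the union and drop box j, then rescan,
                    # since the larger box may now be close to other boxes.
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged


# Two side-by-side panels 10 px apart, plus an unrelated figure far below.
panels = [(0, 0, 100, 100), (110, 0, 210, 100), (0, 400, 100, 500)]
result = merge_figure_boxes(panels, max_gap=20.0)
```

A fixed pixel threshold is the simplest possible rule. A real fix would probably also need to consider caption positions and page layout, which this sketch ignores.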

cambro · 2020-04-19T17:13:17Z

This is somewhat related to #103 and may be causing it in some cases.

ankur-gos · 2020-10-19T04:58:23Z

I'd like to revisit this and also come up with some test cases once we have the new coronavirus vertical live, @cambro.

cambro added the bug and enhancement labels on Apr 19, 2020
