@gracielagon Unfortunately I am not aware of any recent ones. A former colleague did something like that a few years ago. Paper: https://t.co/RvMP2n2nxB Software: https://t.co/GMJU32cROj
@aarontay there have been some attempts to create PDF "layout-aware" tools, but with neither perfect precision nor perfect recall, why not just use AAM instead to avoid the many problems of PDFs? If the AAM is a Word (.docx) file that's even better tbh htt
To clarify. If publishers published HTML5 it would provide all I want and I wouldn't need XML. I have to convert their HTML to HTML5. https://t.co/OBiARkvQfM
@petermurrayrust HTML5 is not meant as a typesetting format. We need PDF extraction to recover text & layout like in https://t.co/zcxCI9O2D5