extraction - Can Apache Tika (or any JVM library) produce ordered extract data referencing images, as well as text, e.g. from a PDF?
Basically, I want to be able to create a set of 'slides' based on extracts from documents such as PDFs or Word docs, programmatically.
For this, I need to know [roughly] where in the text the embedded images are placed; as such, just dumping the image resources out to disk wouldn't help*.
I'm a Java dev, so I don't fear code ;-)
*Unless of course there are references within the [Tika] extract output, at the appropriate position(s) or line(s).
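
To make the goal concrete, here is a minimal sketch of the kind of ordered output I'm after. It assumes the extractor emits XHTML in which each embedded image shows up as an `<img>` element at its position in the text; whether Tika (or anything else) actually does that for a given format and configuration is exactly what I'm asking. The class name `OrderedExtract`, the `embedded:image0.png` reference and the sample markup are all made up for illustration, and only the JDK's own SAX parser is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OrderedExtract {

    /** Collects text runs and image references in document order. */
    static class OrderedHandler extends DefaultHandler {
        final List<String> items = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // Emit any buffered text as a single item, then reset the buffer.
        private void flushText() {
            String t = text.toString().trim();
            if (!t.isEmpty()) {
                items.add("TEXT: " + t);
            }
            text.setLength(0);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            // Assumption: images appear as <img src="..."/> at their position in the flow.
            if ("img".equals(qName)) {
                flushText();
                items.add("IMAGE: " + atts.getValue("src"));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            // Treat each paragraph as one text item.
            if ("p".equals(qName)) {
                flushText();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses well-formed XHTML and returns text and image items in order. */
    static List<String> extract(String xhtml) throws Exception {
        OrderedHandler handler = new OrderedHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical extractor output, not real Tika output.
        String sample = "<html><body><p>Intro text</p>"
                + "<img src=\"embedded:image0.png\"/>"
                + "<p>After image</p></body></html>";
        extract(sample).forEach(System.out::println);
    }
}
```

With a list like this, each 'slide' could be built from consecutive items, and the image references matched up with whatever files get dumped to disk.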