Can Apache Tika (or any JVM library) produce ordered extract data referencing images, as well as text, e.g. from a PDF?


Basically, I want to be able to create a set of 'slides' programmatically, based on extracts from documents such as PDFs or Word docs.

For this, I need to know [roughly] where in the text the embedded images are placed, so simply dumping the image resources out to disk wouldn't help*.

I'm a Java dev, so I don't fear code ;-)

*Unless, of course, there are references to the images within the [Tika] extract output, at the appropriate position(s) or line(s).
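One possible direction, hedged because I haven't checked it against every Tika version: if you capture Tika's XHTML output (e.g. with `ToXMLContentHandler`) and, for PDFs, turn on inline-image extraction (`PDFParserConfig#setExtractInlineImages(true)`), Tika is expected to write `<img src="embedded:...">` placeholders into the XHTML. For PDFs these tend to land per page rather than at the exact position on the page. The stdlib-only sketch below assumes that XHTML shape and walks it in document order, producing an ordered list of text blocks and image references that could each seed a 'slide'. The class name `OrderedExtract` and the `TEXT:`/`IMAGE:` labels are hypothetical, not part of Tika.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: turns Tika-style XHTML (assumed to contain
// <img src="embedded:..."> placeholders) into an ordered list of
// "TEXT: ..." and "IMAGE: <src>" entries, preserving document order.
public class OrderedExtract {

    public static List<String> parse(String xhtml) {
        List<String> items = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private boolean inBody = false; // ignore <head>/<title> text

            // Emit buffered text (whitespace collapsed) as one TEXT entry, then reset.
            private void flushText() {
                String t = text.toString().trim().replaceAll("\\s+", " ");
                if (!t.isEmpty()) {
                    items.add("TEXT: " + t);
                }
                text.setLength(0);
            }

            private boolean isBlock(String name) {
                return name.equals("p") || name.equals("div");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (localName.equals("body")) {
                    inBody = true;
                } else if (isBlock(localName)) {
                    flushText();
                } else if (localName.equals("img")) {
                    // Close off preceding text so the image keeps its relative position.
                    flushText();
                    items.add("IMAGE: " + atts.getValue("src"));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (isBlock(localName)) {
                    flushText();
                } else if (localName.equals("body")) {
                    flushText();
                    inBody = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inBody) {
                    text.append(ch, start, length);
                }
            }
        };

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // populates localName for XHTML elements
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
        return items;
    }

    public static void main(String[] args) {
        // Hand-written sample in the shape assumed above, not real Tika output.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<div class=\"page\"><p>Intro text</p>"
                + "<img src=\"embedded:image0.png\" alt=\"image0.png\"/>"
                + "<p>After image</p></div></body></html>";
        for (String item : parse(xhtml)) {
            System.out.println(item);
        }
    }
}
```

The `embedded:` names could then be matched against the embedded resources Tika hands you (for example through an `EmbeddedDocumentExtractor` set on the `ParseContext`) to write out the actual image bytes for each slide.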