JSoup - preserve HTML entities when outputting as UTF-8?
I want to preserve HTML entities while using jsoup. Here is a UTF-8 test string from a website:
import org.jsoup.Jsoup;

String html = "<html><body>hello &#151; world</body></html>";
String parsed = Jsoup.parse(html).toString();
If I print the parsed output in UTF-8, it looks like the sequence &#151; gets transformed into a character with a code point value of 151.
Is there a way to have jsoup preserve the original entity when outputting UTF-8? If I output in ASCII encoding:
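import java.nio.charset.Charset;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Characters that ASCII cannot encode are written out as escape sequences.
Document.OutputSettings settings = new Document.OutputSettings();
settings.charset(Charset.forName("ascii"));
String ascii = Jsoup.parse(html).outputSettings(settings).toString();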
I'll get:
hello &#x97; world
which is what I'm looking for.
You have hit a missing feature of jsoup (as of writing, jsoup 1.8.3).
I can see three options:
Option 1
Send a feature request at https://github.com/jhy/jsoup. I'm not sure it will be added soon...
Option 2
Use the workaround provided in this answer: https://stackoverflow.com/a/34493022/363573
Option 3
Write a custom NodeVisitor that turns each such character's code point value into its HTML equivalent escape sequence.
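The conversion step in option 3 could look roughly like the sketch below. It uses only the Java standard library and does not touch jsoup; the class name NumericEntityEscaper and the method escapeNonAscii are hypothetical names made up for this example, and wiring the method into a NodeVisitor is left out. It assumes that every code point above 127 should become a decimal reference. Since the parser has already decoded the input, this produces an equivalent reference, not necessarily the original spelling (decimal, hex, or named).

```java
// Hypothetical helper, not part of jsoup: a minimal sketch of the
// "code point to escape sequence" step only.
final class NumericEntityEscaper {
    private NumericEntityEscaper() {}

    /**
     * Replaces every code point above 127 with a decimal character
     * reference such as &#151;. ASCII characters are copied unchanged,
     * so markup characters like & and < are NOT escaped here.
     */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read a full code point so surrogate pairs stay together.
            int cp = text.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, NumericEntityEscaper.escapeNonAscii("hello \u0097 world") returns "hello &#151; world".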