JSoup - preserve HTML entities when outputting as UTF-8?


I want to preserve HTML entities while using jsoup. Here is a UTF-8 test string from a website:

import org.jsoup.Jsoup;

String html = "<html><body>hello &#151; world</body></html>";
String parsed = Jsoup.parse(html).toString();

When I print the parsed output in UTF-8, it looks like the sequence &#151; gets transformed into the character with a code point value of 151.
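To make it clearer what that character is, here is a small sketch in plain Java with no jsoup involved. The class name `CodePoint151` and the method `utf8Hex` are made up for illustration. It builds the character with code point 151 and encodes it as UTF-8, which suggests why the printed output looks wrong: the result is a C1 control character (U+0097) stored as two bytes, and most terminals will not display it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class CodePoint151 {

    // Returns the UTF-8 bytes of the given code point as a hex string.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            // Mask to an unsigned value before formatting.
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Code point 151 is U+0097, which Java classifies as an ISO control character.
        System.out.println(Character.isISOControl(151)); // true
        System.out.println(utf8Hex(151));                // C2 97
    }
}
```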

Is there a way to have jsoup preserve the original entity when outputting UTF-8? If I output in ASCII encoding:

import java.nio.charset.Charset;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Document.OutputSettings settings = new Document.OutputSettings();
settings.charset(Charset.forName("ASCII"));
Jsoup.parse(html).outputSettings(settings).toString();

I'll get:

hello &#x97; world 

which is what I'm looking for.

You have hit a missing feature of jsoup (as of this writing, jsoup 1.8.3).

I can see 3 options:

Option 1

Send a feature request on https://github.com/jhy/jsoup. I'm not sure it'll be added soon...

Option 2

Use the workaround provided in this answer: https://stackoverflow.com/a/34493022/363573

Option 3

Write a custom NodeVisitor that turns each character with such a code point value into its equivalent HTML escape sequence.
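For Option 3, the escaping step such a visitor would need can be sketched in plain Java without any jsoup dependency. This is only a rough sketch under assumptions: the class `EntityEscaper` and the method `escapeNonAscii` are hypothetical names, not part of jsoup, and the rule "escape everything above 0x7F as a hexadecimal reference" is just one possible choice, modelled on the ASCII output shown in the question. How the result is wired into a NodeVisitor is left out here.

```java
// Hypothetical helper, not part of jsoup: escapes every non-ASCII code point
// as a hexadecimal numeric character reference.
final class EntityEscaper {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Iterate by code point so that a surrogate pair becomes one reference.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                // Plain ASCII is copied through unchanged.
                out.append((char) cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("hello \u0097 world")); // hello &#x97; world
    }
}
```

Note that this sketch does not escape `&`, `<`, or `>`, and it cannot recover the entity exactly as it was written in the source (for example `&#151;` versus `&#x97;`), because that distinction is no longer available once the document has been parsed.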

