javascript - ParseHub Selection Node Syntax
I'm trying to use ParseHub to extract data from a website. Using the selection tool, I am able to isolate the title header of each section, but I am unable to deselect the first cell of the second header row using alt-click. The selection node criteria change, but the actual selection does not. The block of HTML in question is:
<tr>
  <td width="100%" align="center">
    <table width="493">
      <tr><td></td></tr>
      <tr><td colspan="3"> </td></tr>
      <tr bgcolor="#99cc00" height="17">
        <th height="17" colspan="3" title="scratcher name"><div align="center" class="txt_white_bold">lucky 7`s #348</div></th>
      </tr>
      <tr bgcolor="#99cc00" height="17">
        <th height="17"><div align="center" class="txt_white_bold">prize amount</div></th>
        <th align="right"><div align="center" class="txt_white_bold">prizes remaining</div></th>
        <th align="right"><div align="center" class="txt_white_bold">total prizes</div></th>
      </tr>
The selection node code follows.
Selection 1:
{ "op": "select", "tag": "tr", "alldescendants": true, "flags": [ { "position": 4 } ] }
Selection 2:
{ "op": "select", "tag": "th", "position": 1 }
Selection 3:
{ "op": "select", "tag": "div", "classes": [ "txt_white_bold" ], "position": 1 }
The current output is:
{ "selection1":[ { "extract1":"lucky 7`s #348" }, { "extract1":"prize amount" },
and so on. How do I select only the "scratcher name" header and not "prize amount"?
My first thought was to change 'selection 3' so that it selects only items within a th that has title="scratcher name", but I have not been successful in coding this correctly.
ParseHub's learning algorithms don't yet take attributes into account, so in some (fairly rare) cases they won't do what you expect. In this case, you can use a CSS or XPath selector to manually select the elements you want.
To do so:
- Make an arbitrary selection.
- Click the green edit button in the node details.
- Delete all the textareas that exist except one.
- Replace the JSON in the remaining textarea with:
{ "op": "cssselect", "selector": "th[title='scratcher name'] div.txt_white_bold", "alldescendants": true }
You can also use xpathselect if you'd rather use XPath instead.
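To sanity-check the attribute-based selector outside ParseHub, here is a minimal Python sketch. It does not use ParseHub at all: it runs against a trimmed, well-formed copy of the table from the question (closing tags added by hand) with the standard library's ElementTree, and the XPath expression is only an assumed rough equivalent of the CSS selector above, not something taken from ParseHub's documentation.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table from the question, with closing tags added
# so that it parses as XML. This is an illustration, not the live page.
HTML = """
<table width="493">
  <tr bgcolor="#99cc00" height="17">
    <th height="17" colspan="3" title="scratcher name"><div align="center" class="txt_white_bold">lucky 7`s #348</div></th>
  </tr>
  <tr bgcolor="#99cc00" height="17">
    <th height="17"><div align="center" class="txt_white_bold">prize amount</div></th>
    <th align="right"><div align="center" class="txt_white_bold">prizes remaining</div></th>
    <th align="right"><div align="center" class="txt_white_bold">total prizes</div></th>
  </tr>
</table>
"""

root = ET.fromstring(HTML)

# Tag and class only: matches every header div, including "prize amount".
all_headers = [d.text for d in root.findall(".//th/div[@class='txt_white_bold']")]

# Adding the title attribute test narrows the match to the scratcher name.
XPATH = ".//th[@title='scratcher name']//div[@class='txt_white_bold']"
names = [d.text for d in root.findall(XPATH)]

print(all_headers)
print(names)
```

The first query returns all four header cells, while the second returns only the cell inside the th carrying title="scratcher name", which is the filtering the question asks for.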