Web scraping: web scraper for dynamic forms in Python
I am trying to fill in the form on the website http://www.marutisuzuki.com/maruti-price.aspx.
It consists of three drop-down lists: the first is the model of the car, the second is the state, and the third is the city. The first two are static, but the third, the city, is generated dynamically depending on the value of the state. There is an onclick JavaScript event running which gets the values of the corresponding cities in a state.
I am familiar with the mechanize module in Python. I came across several links telling me that I cannot handle dynamic content in mechanize. However, the link http://toddhayton.com/2014/12/08/form-handling-with-mechanize-and-beautifulsoup/, in the section "Adding item dynamically", states that I can use mechanize to handle dynamic content, but I did not understand this line of code in it:
    item = Item(br.form.find_control(name='searchauxcountryid'), {'contents': '3', 'value': '3', 'label': 3})
What does "Item" in this line of code correspond to for the city field in the form? I also came across the selenium module, which might help me handle the dynamic drop-down list, but I was not able to find anything in its documentation, or a good blog post, on how to use it.
Can anyone suggest how to submit this form for different models, states and cities? Any links on how to solve the problem would be appreciated. Sample code in Python showing how to submit the form would be helpful. Thanks in advance.
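If you look at the requests being sent to the site in your browser's developer tools, you'll see that a POST is sent as soon as you select a state. The response that is sent back contains the form with the values in the city dropdown populated.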
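To make that POST a bit more concrete, here is a rough, standard-library-only sketch (Python 3) of how such a request body is put together from a form. Everything in it is made up for illustration: the `SAMPLE_HTML` markup, the field names `model`, `state` and `city`, the `__VIEWSTATE` value, and the helpers `FormFields` and `build_post_body`. The real page will have different names and more hidden fields, and mechanize does this bookkeeping for you when you call `submit()`.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Invented stand-in for the page's form; the real markup will differ.
SAMPLE_HTML = """
<form id="form1" method="post" action="maruti-price.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <select name="model"><option value="ak" selected>Alto K10</option></select>
  <select name="state">
    <option value="0" selected>Select</option>
    <option value="dl">Delhi</option>
  </select>
  <select name="city"><option value="0" selected>Select</option></select>
</form>
"""


class FormFields(HTMLParser):
    """Collect the values a browser would submit for a simple form."""

    def __init__(self):
        super().__init__()
        self.fields = {}     # field name -> value that would be submitted
        self._select = None  # name of the <select> we are currently inside

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden" and "name" in a:
            self.fields[a["name"]] = a.get("value", "")
        elif tag == "select":
            self._select = a.get("name")
        elif tag == "option" and self._select and "selected" in a:
            self.fields[self._select] = a.get("value", "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def build_post_body(html, **overrides):
    """Return the urlencoded body, with some field values overridden."""
    parser = FormFields()
    parser.feed(html)
    fields = dict(parser.fields)
    fields.update(overrides)  # e.g. the state the user just picked
    return urlencode(fields)


# Picking a state changes one field; hidden fields are sent back unchanged.
body = build_post_body(SAMPLE_HTML, state="dl")
print(body)
```

The point is only that selecting a state re-submits the whole form, hidden fields included, with the new state value. The city list arrives in the HTML of the response, not through a separate API.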
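So, to replicate this in your script you will want to do the following:
- Open the page
- Select the form
- Select values for the model and the state
- Submit the form
- Select the form in the response that is sent back
- Select a value for the city (it should be populated now)
- Submit the form
- Parse the response for the table of results
That will look something like: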
    #!/usr/bin/env python
    import re

    import mechanize
    from bs4 import BeautifulSoup

    # Control names and element ids are kept exactly as they appear in this
    # post; they must match the page's HTML exactly, including letter case.


    def select_form(form):
        # Predicate for select_form(): pick the form whose id is 'form1'.
        return form.attrs.get('id', None) == 'form1'


    def get_state_items(browser):
        browser.select_form(predicate=select_form)
        ctl = browser.form.find_control('ctl00$contentplaceholder1$ddlstate')
        state_items = ctl.get_items()
        return state_items[1:]  # skip the first entry (presumably a placeholder)


    def get_city_items(browser):
        browser.select_form(predicate=select_form)
        ctl = browser.form.find_control('ctl00$contentplaceholder1$ddlcity')
        city_items = ctl.get_items()
        return city_items[1:]  # skip the first entry (presumably a placeholder)


    br = mechanize.Browser()
    br.open('http://www.marutisuzuki.com/maruti-price.aspx')
    br.select_form(predicate=select_form)
    br.form['ctl00$contentplaceholder1$ddlmodel'] = ['ak']  # model = Maruti Suzuki Alto K10

    for state in get_state_items(br):
        # 1 - Submit the form for state.name to get the cities for this state
        br.select_form(predicate=select_form)
        br.form['ctl00$contentplaceholder1$ddlstate'] = [state.name]
        br.submit()

        # 2 - Now the city dropdown is filled for state.name
        for city in get_city_items(br):
            br.select_form(predicate=select_form)
            br.form['ctl00$contentplaceholder1$ddlcity'] = [city.name]
            br.submit()

            s = BeautifulSoup(br.response().read(), 'html.parser')
            t = s.find('table', id='contentplaceholder1_dtdealer')
            if t is None:
                continue  # no results table for this city
            r = re.compile(r'^contentplaceholder1_dtdealer_lblname_\d+$')

            header_printed = False
            for p in t.find_all('span', id=r):
                tr = p.find_parent('tr')
                td = tr.find_all('td')

                if not header_printed:
                    header = '%s, %s' % (city.attrs['label'], state.attrs['label'])
                    print(header)
                    print('-' * len(header))
                    header_printed = True

                print(' '.join(['%s' % x.text.strip() for x in td]))
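If the regular expression in the parsing step looks opaque: it is only there to pick out one span per dealer row by its id. A small sketch of what it matches, using invented sample ids (the real ids on the page may be spelled or cased differently):

```python
import re

# Same pattern as in the script above; the ids below are invented samples.
DEALER_NAME_ID = re.compile(r'^contentplaceholder1_dtdealer_lblname_\d+$')

sample_ids = [
    'contentplaceholder1_dtdealer_lblname_0',     # matches: numeric suffix
    'contentplaceholder1_dtdealer_lblname_12',    # matches: numeric suffix
    'contentplaceholder1_dtdealer_lbladdress_0',  # no match: different label
    'contentplaceholder1_dtdealer_lblname_',      # no match: no digits
]

matching = [i for i in sample_ids if DEALER_NAME_ID.match(i)]
print(matching)
```

Each matching span sits inside one table row, which is why the script then climbs to the parent `tr` and prints all of its `td` cells.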