web scraping - Web scraper for dynamic forms in Python


I am trying to fill in the form on the website http://www.marutisuzuki.com/maruti-price.aspx.

It consists of three drop-down lists: the first is the car model, the second the state, and the third the city. The first two are static, but the third (city) is generated dynamically depending on the value of the state. There is an onclick JavaScript event that fetches the cities for the selected state.

I am familiar with the mechanize module in Python. I came across several links telling me that mechanize cannot handle dynamic content. However, the post at http://toddhayton.com/2014/12/08/form-handling-with-mechanize-and-beautifulsoup/, in the section "Adding item dynamically", states that mechanize can be used to handle dynamic content, but I did not understand this line of code in it:

item = Item(br.form.find_control(name='searchauxcountryid'), {'contents': '3', 'value': '3', 'label': 3})

What does "Item" in this line of code correspond to for the city field in my form? I also came across the selenium module, which might help me handle the dynamic drop-down list, but I was not able to find anything in its documentation, or a blog post, on how to use it.

Can anyone suggest how to submit this form for different models, states, and cities? Links on how to solve the problem would be appreciated, and sample Python code showing how to submit the form would be helpful. Thanks in advance.

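If you look at the requests being sent to the site in your browser's developer tools, you'll see that a POST is sent as soon as you select a state. The response that is sent back contains the form with the values in the city dropdown populated.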
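As a rough illustration of what such a POST carries, here is a standard-library-only sketch that collects a form's hidden inputs and adds the chosen state. It assumes the page is an ASP.NET WebForms page (suggested by the .aspx URL and the ctl00$ control names), where hidden fields such as __VIEWSTATE are echoed back and __EVENTTARGET names the control that triggered the postback. The sample HTML, the value "DL" and the helper name build_postback_body are made up for the example and are not taken from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def build_postback_body(html, event_target, selections):
    """Hypothetical helper: build a URL-encoded POST body from a page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)            # echo hidden fields back unchanged
    payload["__EVENTTARGET"] = event_target  # assumed: control that changed
    payload.update(selections)               # the dropdown values we choose
    return urlencode(payload)


# Made-up page fragment, for illustration only.
SAMPLE_PAGE = """
<form id="form1" method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <select name="ctl00$contentplaceholder1$ddlstate"></select>
</form>
"""

body = build_postback_body(
    SAMPLE_PAGE,
    event_target="ctl00$contentplaceholder1$ddlstate",
    selections={"ctl00$contentplaceholder1$ddlstate": "DL"},  # hypothetical value
)
print(body)
```

With mechanize you do not have to build this body by hand, because submitting the selected form sends its hidden fields along with your selections.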

So, to replicate this in your script, you want to do the following:

  • Open the page
  • Select the form
  • Select values for model and state
  • Submit the form
  • Select the form from the response that is sent back
  • Select a value for city (it should be populated now)
  • Submit the form
  • Parse the response for the table of results

That would look something like this:

#!/usr/bin/env python

import re
import mechanize
from bs4 import BeautifulSoup

# Control names, element ids and option values are reproduced as they appear
# in this post; check their exact case against the live page source.

def select_form(form):
    # Pick the form whose id attribute is 'form1'
    return form.attrs.get('id', None) == 'form1'

def get_state_items(browser):
    browser.select_form(predicate=select_form)
    ctl = browser.form.find_control('ctl00$contentplaceholder1$ddlstate')
    state_items = ctl.get_items()
    return state_items[1:]  # skip the first item in the dropdown

def get_city_items(browser):
    browser.select_form(predicate=select_form)
    ctl = browser.form.find_control('ctl00$contentplaceholder1$ddlcity')
    city_items = ctl.get_items()
    return city_items[1:]  # skip the first item in the dropdown

br = mechanize.Browser()
br.open('http://www.marutisuzuki.com/maruti-price.aspx')
br.select_form(predicate=select_form)
br.form['ctl00$contentplaceholder1$ddlmodel'] = ['ak']  # model = Maruti Suzuki Alto K10

for state in get_state_items(br):
    # 1 - Submit the form for state.name to get the cities for this state
    br.select_form(predicate=select_form)
    br.form['ctl00$contentplaceholder1$ddlstate'] = [state.name]
    br.submit()

    # 2 - The city dropdown is now filled for state.name
    for city in get_city_items(br):
        br.select_form(predicate=select_form)
        br.form['ctl00$contentplaceholder1$ddlcity'] = [city.name]
        br.submit()

        s = BeautifulSoup(br.response().read(), 'html.parser')
        t = s.find('table', id='contentplaceholder1_dtdealer')
        if t is None:
            continue  # no results table in the response for this city
        r = re.compile(r'^contentplaceholder1_dtdealer_lblname_\d+$')

        header_printed = False
        for p in t.find_all('span', id=r):
            tr = p.find_parent('tr')
            td = tr.find_all('td')

            if not header_printed:
                # Print a "City, State" heading once per city
                heading = '%s, %s' % (city.attrs['label'], state.attrs['label'])
                print(heading)
                print('-' * len(heading))
                header_printed = True

            print(' '.join(['%s' % x.text.strip() for x in td]))
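To check the "city dropdown is populated in the response" step without hitting the site, a small standard-library sketch like the following can list the options of a named select in a saved response. The sample HTML, its option values and the names SelectOptionParser and city_options are invented for illustration. Skipping the first option mirrors the [1:] slice in the script and assumes that entry is a placeholder.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect (value, label) pairs for the <option>s of one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.options = []
        self._in_select = False
        self._current_value = None
        self._label_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._close_option()  # tolerate a missing </option>
            self._current_value = attrs.get("value") or ""
            self._label_parts = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._label_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._close_option()
        elif tag == "select":
            self._close_option()
            self._in_select = False

    def _close_option(self):
        if self._current_value is not None:
            label = "".join(self._label_parts).strip()
            self.options.append((self._current_value, label))
            self._current_value = None


def city_options(html, select_name):
    """Hypothetical helper: options of the select, minus the first entry."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    return parser.options[1:]


# Made-up response fragment, for illustration only.
SAMPLE_RESPONSE = """
<select name="ctl00$contentplaceholder1$ddlcity">
  <option value="0">--Select City--</option>
  <option value="C1">City One</option>
  <option value="C2">City Two</option>
</select>
"""

cities = city_options(SAMPLE_RESPONSE, "ctl00$contentplaceholder1$ddlcity")
print(cities)
```

If the list comes back empty for a real response, the state submission probably did not go through, which is worth checking before looping over cities.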
