python - Optimizing py2neo's Cypher insertion


I am using py2neo to import several hundred thousand nodes. I've created a defaultdict to map neighborhoods to cities. One motivation was to import these relationships more efficiently, having been unsuccessful with Neo4j's load tool.

Because the batch documentation suggests avoiding it, I veered away from an implementation like the one by the OP of this post. Instead, the documentation suggests using Cypher. However, I like being able to create nodes from the defaultdict I have created. Plus, I found it difficult to import this information the way the first link demonstrates.

To speed up the import, should I create a Cypher transaction (and submit every 10,000) instead of the following loop?

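    from py2neo import Node, Relationship

    # graph is an existing py2neo Graph; city nodes are assumed to exist already
    for city_name, neighborhood_names in city_neighborhood_map.items():
        # one lookup per city
        city_node = graph.find_one(label="city", property_key="name", property_value=city_name)
        for neighborhood_name in neighborhood_names:
            neighborhood_node = Node("neighborhood", name=neighborhood_name)
            rel = Relationship(neighborhood_node, "in", city_node)
            # one round trip to the server per relationship
            graph.create(rel)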

I get a time-out, and it appears to be pretty slow, when I do the following. Even when I break up the transaction to commit every 1,000 neighborhoods, it still processes slowly.

    tx = graph.cypher.begin()
    statement = "MERGE (city {name:{city_name}}) CREATE (neighborhood { name : {neighborhood_name}}) CREATE (neighborhood)-[:in]->(city)"
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            # the MERGE looks the city up again for every neighborhood
            tx.append(statement, {"city_name": city_name, "neighborhood_name": neighborhood_name})
    tx.commit()

It would be fantastic to save pointers to each city so that I don't need to look it up each time with the merge.

It may be faster to do this in two runs, i.e. create all the nodes first with unique constraints (which should be fast) and then create the relationships in a second round.

Create the constraints first and use the labels city and neighborhood, which makes the later match faster:

    graph.schema.create_uniqueness_constraint('city', 'name')
    graph.schema.create_uniqueness_constraint('neighborhood', 'name')

Create all the nodes:

    tx = graph.cypher.begin()

    statement = "CREATE (:city {name: {name}})"
    for city_name in city_neighborhood_map.keys():
        tx.append(statement, {"name": city_name})

    statement = "CREATE (:neighborhood {name: {name}})"
    # neighborhood_names is assumed to hold all neighborhood names
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"name": neighborhood_name})

    tx.commit()
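The loop above assumes `neighborhood_names` already holds every neighborhood name. A minimal sketch of how that collection might be built from `city_neighborhood_map`, assuming the map's values are iterables of names. The helper name `all_neighborhood_names` and the sample data are hypothetical. Duplicates are dropped because, with the uniqueness constraint in place, a second `CREATE` for the same name would fail.

```python
from collections import defaultdict


def all_neighborhood_names(city_neighborhood_map):
    """Return the distinct neighborhood names across all cities, sorted."""
    names = set()
    for neighborhood_names in city_neighborhood_map.values():
        names.update(neighborhood_names)
    return sorted(names)


# Hypothetical sample data, for illustration only.
sample_map = defaultdict(list)
sample_map["Springfield"].extend(["Downtown", "Riverside"])
sample_map["Shelbyville"].extend(["Downtown", "Old Town"])

neighborhood_names = all_neighborhood_names(sample_map)
```

Note that this treats identically named neighborhoods in different cities as a single node, which is what a uniqueness constraint on `name` implies.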

Creating the relationships should now be fast (fast match due to the constraints/index):

    tx = graph.cypher.begin()
    # two patterns in one MATCH; a second MATCH keyword after the comma is a syntax error
    statement = "MATCH (city:city {name: {city_name}}), (n:neighborhood {name: {neighborhood_name}}) CREATE (n)-[:in]->(city)"
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            tx.append(statement, {"city_name": city_name, "neighborhood_name": neighborhood_name})

    tx.commit()
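With several hundred thousand relationships, a single transaction may still be too large, so committing in chunks (as mentioned in the question) could be combined with the two-run approach. The following is only a sketch: the helper names `chunks`, `relationship_params` and `run_in_batches` are hypothetical, the default batch size of 1,000 is simply the figure from the question, and `graph` is assumed to expose the same `graph.cypher.begin()`, `tx.append()` and `tx.commit()` calls used above.

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def relationship_params(city_neighborhood_map):
    """Flatten the map into one parameter dict per (city, neighborhood) pair."""
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            yield {"city_name": city_name, "neighborhood_name": neighborhood_name}


def run_in_batches(graph, statement, param_dicts, batch_size=1000):
    """Append `statement` once per parameter dict, committing every `batch_size`.

    `graph` is assumed to behave like the py2neo graph used in this post.
    Returns the number of commits made.
    """
    commits = 0
    for batch in chunks(param_dicts, batch_size):
        tx = graph.cypher.begin()
        for params in batch:
            tx.append(statement, params)
        tx.commit()
        commits += 1
    return commits
```

It might be called as `run_in_batches(graph, statement, relationship_params(city_neighborhood_map))`, reusing the relationship statement above. The best batch size depends on the server and is worth measuring rather than assuming.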
