python - Optimizing py2neo's cypher insertion
I am using py2neo to import several hundred thousand nodes. I've created a defaultdict to map neighborhoods to cities. One motivation was to import these relationships more efficiently, having been unsuccessful with Neo4j's load tool.
Because the batch documentation suggests avoiding it, I veered away from an implementation like the OP of this post. Instead, the documentation suggests using Cypher. However, I like being able to create nodes from the defaultdict I have created. Plus, I found it difficult to import the information the way the first link demonstrates.
To speed up the import, should I create a Cypher transaction (and submit every 10,000) instead of the following loop?
    from py2neo import Node, Relationship

    # graph is assumed to be an existing py2neo Graph instance
    for city_name, neighborhood_names in city_neighborhood_map.iteritems():
        city_node = graph.find_one(label="city", property_key="name", property_value=city_name)
        for neighborhood_name in neighborhood_names:
            neighborhood_node = Node("neighborhood", name=neighborhood_name)
            rel = Relationship(neighborhood_node, "in", city_node)
            graph.create(rel)
I get a time-out, and it appears to be pretty slow, when I do the following. Even when I break up the transaction to commit every 1,000 neighborhoods, it still processes slowly.
    tx = graph.cypher.begin()
    statement = "MERGE (city {name:{city_name}}) CREATE (neighborhood { name : {neighborhood_name}}) CREATE (neighborhood)-[:in]->(city)"
    for city_name, neighborhood_names in city_neighborhood_map.iteritems():
        for neighborhood_name in neighborhood_names:
            tx.append(statement, {"city_name": city_name, "neighborhood_name": neighborhood_name})
    tx.commit()
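The commit-every-1,000 variant mentioned above might look roughly like the sketch below. `commit_in_batches` is a hypothetical helper name, not part of py2neo; it only assumes that `graph.cypher.begin()` returns a transaction with `append()` and `commit()`, as in the snippet above, and that a fresh transaction is started after each commit.

```python
def commit_in_batches(graph, statement, rows, batch_size=1000):
    """Append `statement` once per parameter dict in `rows`,
    committing every `batch_size` statements.

    Assumption: `graph` behaves like a py2neo 2.0 Graph, i.e.
    graph.cypher.begin() returns an object with append() and commit().
    Returns the number of commits performed.
    """
    commits = 0
    pending = 0
    tx = None
    for params in rows:
        if tx is None:
            # Start a transaction only when there is something to send.
            tx = graph.cypher.begin()
        tx.append(statement, params)
        pending += 1
        if pending >= batch_size:
            tx.commit()
            commits += 1
            pending = 0
            tx = None  # the next row opens a fresh transaction
    if tx is not None:
        # Flush the final, partially filled batch.
        tx.commit()
        commits += 1
    return commits
```

It would be called with the statement from the snippet above and one `{"city_name": ..., "neighborhood_name": ...}` dict per neighborhood. Whether smaller batches actually help depends on where the time goes; if each `MERGE` scans unlabelled nodes, batching alone is unlikely to fix the slowness.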
It would be fantastic to save pointers to each city so that I don't need to look it up each time with the merge.
It may be faster to do this in 2 runs, i.e. create all nodes first with unique constraints (which should be very fast) and then create the relationships in a second round.
Constraints first; use the labels city and neighborhood, which gives a faster match later:
    graph.schema.create_uniqueness_constraint('city', 'name')
    graph.schema.create_uniqueness_constraint('neighborhood', 'name')
Create all nodes:
    tx = graph.cypher.begin()

    statement = "CREATE (:city {name: {name}})"
    for city_name in city_neighborhood_map.keys():
        tx.append(statement, {"name": city_name})

    # all neighborhood names, de-duplicated because of the uniqueness constraint
    all_neighborhood_names = set(n for names in city_neighborhood_map.values() for n in names)
    statement = "CREATE (:neighborhood {name: {name}})"
    for neighborhood_name in all_neighborhood_names:
        tx.append(statement, {"name": neighborhood_name})

    tx.commit()
Relationships should now be fast (fast match due to the constraints/index):
    tx = graph.cypher.begin()
    # one MATCH with two comma-separated patterns; ", match" is not valid Cypher
    statement = "MATCH (city:city {name: {city_name}}), (n:neighborhood {name: {neighborhood_name}}) CREATE (n)-[:in]->(city)"
    for city_name, neighborhood_names in city_neighborhood_map.iteritems():
        for neighborhood_name in neighborhood_names:
            tx.append(statement, {"city_name": city_name, "neighborhood_name": neighborhood_name})
    tx.commit()
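The parameter dictionaries for this second pass can also be produced lazily from the map, which keeps the nested loops out of the transaction code. A minimal sketch, assuming the map has the shape described in the question (city name to a collection of neighborhood names); `relationship_rows` is a hypothetical helper name, and `.items()` is used instead of `iteritems()` so the snippet runs on both Python 2 and 3.

```python
def relationship_rows(city_neighborhood_map):
    """Yield one parameter dict per (city, neighborhood) pair.

    The keys match the {city_name} and {neighborhood_name} placeholders
    used in the relationship statement.
    """
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            yield {"city_name": city_name,
                   "neighborhood_name": neighborhood_name}
```

Each yielded dict can be passed straight to `tx.append(statement, params)`.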