csv - Grouping data first by frequency, then by category in Python
I have a large CSV file, a log of caller data.
An example of the file:
companyname  high priority  qualityissue
customer1    yes            user
customer1    yes            user
customer2    no             user
customer3    no             equipment
customer1    no             neither
customer3    no             user
customer3    yes            user
customer3    yes            equipment
customer4    no             user
My code can sort the data to find the top callers.
However, I next need to find, for each caller, a count of each type of call. The format of the CSV I want is below:
top calling customers, equipment, user, neither,
customer 3, 2, 2, 0,
customer 1, 0, 2, 1,
customer 2, 0, 1, 0,
customer 4, 0, 1, 0,
I've tried all sorts of combinations of groupby, counters, and loops, but cannot for the life of me get past the first column.
Here is the code I have to sort the top calling customers:
import pandas

data = pandas.read_csv('copy of heat data.csv', delimiter=',')
topcustomercallers = data['companyname'].value_counts()
However, my original issue remains: I have to use topcustomercallers to count the qualityissue values for each customer and sort by it. I hope the question makes sense.
Edit: I took out the example file, which had irrelevant information in it, and added a new example. I also took out the previous 70 lines of code and replaced them with the 2-liner I figured out after asking the question.
Edit: Added more example data. The real data is over 5000 rows long and goes out to column AA, but I'm only interested in the frequency of each customer and the types of calls.
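import csv
from collections import defaultdict, OrderedDict

# Every company starts with a zero count for each call type
counts = defaultdict(lambda: {"user": 0, "equipment": 0, "neither": 0})

# Python 2: the csv module expects the file to be opened in binary mode
with open('filename.tsv', 'rb') as fh:
    reader = csv.reader(fh, delimiter='\t')  # assuming it's formatted like the example above
    for row in reader:
        company, calltype = row[0], row[2]
        counts[company][calltype] += 1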
At this point, you have something that looks like this:
In [14]: dict(counts)
Out[14]:
{'customer1': {'equipment': 0, 'neither': 1, 'user': 2},
 'customer2': {'equipment': 0, 'neither': 0, 'user': 1},
 'customer3': {'equipment': 2, 'neither': 0, 'user': 2},
 'customer4': {'equipment': 0, 'neither': 0, 'user': 1}}
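For readers on Python 3, the counting step might look roughly like the sketch below. It is not the answer's exact code: it assumes the file is tab-delimited and starts with the header row shown in the question, reads columns by name with csv.DictReader, and uses an in-memory `SAMPLE` string (an illustrative stand-in) instead of a real file.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question, assumed here to be tab-delimited with a header row.
SAMPLE = (
    "companyname\thigh priority\tqualityissue\n"
    "customer1\tyes\tuser\n"
    "customer1\tyes\tuser\n"
    "customer2\tno\tuser\n"
    "customer3\tno\tequipment\n"
    "customer1\tno\tneither\n"
    "customer3\tno\tuser\n"
    "customer3\tyes\tuser\n"
    "customer3\tyes\tequipment\n"
    "customer4\tno\tuser\n"
)

# Every company starts with a zero count for each known call type.
counts = defaultdict(lambda: {"user": 0, "equipment": 0, "neither": 0})

# io.StringIO stands in for open('filename.tsv', newline='') on a real file.
with io.StringIO(SAMPLE) as fh:
    reader = csv.DictReader(fh, delimiter="\t")  # the first row supplies the column names
    for row in reader:
        counts[row["companyname"]][row["qualityissue"]] += 1

print(dict(counts))
```

Reading by column name means the header row is consumed as field names rather than counted as a call, and the code keeps working if the columns are reordered.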
Depending on how you want the output structured, it might be easy to call csv.DictWriter, or you might want to leverage collections.OrderedDict to sort the items before writing them.
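As a rough illustration of the csv.DictWriter route, the Python 3 sketch below writes one row per customer in the column order the question asks for. The `write_counts` helper, the `fieldnames` list, and the in-memory buffer are illustrative assumptions, and sorting with `reverse=True` is added here so that the busiest customer comes first.

```python
import csv
import io

# Counts in the shape shown above (values copied from the sample output).
counts = {
    "customer1": {"equipment": 0, "neither": 1, "user": 2},
    "customer2": {"equipment": 0, "neither": 0, "user": 1},
    "customer3": {"equipment": 2, "neither": 0, "user": 2},
    "customer4": {"equipment": 0, "neither": 0, "user": 1},
}

# Column order taken from the desired output in the question.
fieldnames = ["top calling customers", "equipment", "user", "neither"]


def write_counts(counts, out):
    """Write one CSV row per customer, ordered by total calls (descending)."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    ranked = sorted(
        counts.items(),
        key=lambda item: sum(item[1].values()),  # total calls for this customer
        reverse=True,
    )
    for company, by_type in ranked:
        row = {"top calling customers": company}
        row.update(by_type)  # adds the equipment / user / neither columns
        writer.writerow(row)


buffer = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
write_counts(counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because `sorted` is stable, customers with the same total keep the order in which they appear in `counts`.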
Edit: For instance, to turn the defaultdict into an OrderedDict, you can do:
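# Orders customers by their total number of calls, ascending
sorted_counts = OrderedDict(
    sorted(counts.iteritems(),  # Python 2; use counts.items() in Python 3
           key=lambda counts_tup: sum(counts_tup[1].values())))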
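Since the question starts from pandas, the same table can probably also be built without leaving pandas. The sketch below is an alternative to the approach above, not part of it: it uses pandas.crosstab to count call types per customer, then reuses the question's value_counts ordering to put the busiest customers first. The inline `SAMPLE` string is an illustrative stand-in for the real CSV file, and the column names are assumed to match the header shown in the question.

```python
import io

import pandas as pd

# Sample rows from the question, written as comma-separated text for this sketch.
SAMPLE = """companyname,high priority,qualityissue
customer1,yes,user
customer1,yes,user
customer2,no,user
customer3,no,equipment
customer1,no,neither
customer3,no,user
customer3,yes,user
customer3,yes,equipment
customer4,no,user
"""

data = pd.read_csv(io.StringIO(SAMPLE))

# One row per customer, one column per call type; each cell is a call count.
table = pd.crosstab(data["companyname"], data["qualityissue"])

# Put the call types in the requested column order (missing types would become 0).
table = table.reindex(columns=["equipment", "user", "neither"], fill_value=0)

# value_counts() lists customers from most to fewest calls; reuse that order for the rows.
order = data["companyname"].value_counts().index
table = table.reindex(order)

print(table)
```

From there, `table.to_csv(...)` would write the result out. The relative order of customers with equal totals is whatever value_counts returns, so it should not be relied on.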