csv - Grouping Data first by frequency, then by category in python -


I have a large CSV file, a log of caller data.

An example of the file:

    companyname    high priority    qualityissue
    customer1      yes              user
    customer1      yes              user
    customer2      no               user
    customer3      no               equipment
    customer1      no               neither
    customer3      no               user
    customer3      yes              user
    customer3      yes              equipment
    customer4      no               user

My code can sort the data by top caller.

However, I next need to find, for each caller, the count of each type of call. The output CSV should be formatted like this:

    top calling customers,    equipment,    user,    neither,
    customer 3,               2,            2,       0,
    customer 1,               0,            2,       1,
    customer 2,               0,            1,       0,
    customer 4,               0,            1,       0,

I've tried all sorts of combinations of groupby, counters, and loops, but I cannot for the life of me get past the first column.

Here is the code I have to sort the top calling customers:

    import pandas

    data = pandas.read_csv('copy of heat data.csv', delimiter=',')
    topcustomercallers = data['companyname'].value_counts()

However, my original issue remains: I have to use topcustomercallers to count qualityissue and then sort it. I hope the question makes sense.

Edit: I took out the example file, which had irrelevant information in it, and added a new example. I also took out the previous 70 lines of code and replaced them with the two-liner I figured out after asking the question.

Edit: added more example data. The real data is over 5000 rows long and goes out to column AA, but I'm only interested in the frequency of each customer and the types of calls.

    import csv
    from collections import defaultdict, OrderedDict

    counts = defaultdict(lambda: {"user": 0, "equipment": 0, "neither": 0})
    with open('filename.tsv', newline='') as fh:
        reader = csv.reader(fh, delimiter='\t')  # assuming it's formatted like the example above
        next(reader)  # skip the header row
        for row in reader:
            company, calltype = row[0], row[2]
            counts[company][calltype] += 1

At this point, you have something that looks like this:

    In [14]: dict(counts)
    Out[14]:
    {'customer1': {'equipment': 0, 'neither': 1, 'user': 2},
     'customer2': {'equipment': 0, 'neither': 0, 'user': 1},
     'customer3': {'equipment': 2, 'neither': 0, 'user': 2},
     'customer4': {'equipment': 0, 'neither': 0, 'user': 1}}

Depending on how you want the output structured, it might be easy to call csv.DictWriter, or you might want to leverage collections.OrderedDict to sort the items before writing them.

Edit: for instance, to turn the defaultdict into an OrderedDict, do:
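A minimal sketch of the csv.DictWriter route, assuming the counts look like the dict(counts) output above. The write_counts helper, the header names, and the in-memory buffer are illustrative choices rather than part of the original code; in practice you would pass a file opened with newline=''.

```python
import csv
import io

# Counts copied from the dict(counts) output shown above.
counts = {
    'customer1': {'equipment': 0, 'neither': 1, 'user': 2},
    'customer2': {'equipment': 0, 'neither': 0, 'user': 1},
    'customer3': {'equipment': 2, 'neither': 0, 'user': 2},
    'customer4': {'equipment': 0, 'neither': 0, 'user': 1},
}


def write_counts(counts, fh):
    """Write one CSV row per company, busiest callers first (hypothetical helper)."""
    fieldnames = ['top calling customers', 'equipment', 'user', 'neither']
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    # Rank companies by their total number of calls, largest first.
    ranked = sorted(counts.items(),
                    key=lambda item: sum(item[1].values()),
                    reverse=True)
    for company, by_type in ranked:
        row = {'top calling customers': company}
        row.update(by_type)  # adds the equipment / user / neither columns
        writer.writerow(row)


# Write to an in-memory buffer so the sketch runs without touching disk.
buffer = io.StringIO()
write_counts(counts, buffer)
print(buffer.getvalue())
```

Because DictWriter looks values up by field name, the column order in the file follows fieldnames rather than the key order of the inner dictionaries.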

    # Sorts ascending by total calls; pass reverse=True to sorted() to put the top callers first.
    sorted_counts = OrderedDict(sorted(counts.items(), key=lambda counts_tup: sum(counts_tup[1].values())))
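Since the question already loads the file with pandas, the same table could probably also be built there. The following is only a sketch using pandas.crosstab: the sample rows are inlined from the question (comma-separated here for convenience), and the column order and the descending sort by row total are assumptions about the desired output.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
sample = io.StringIO(
    "companyname,high priority,qualityissue\n"
    "customer1,yes,user\n"
    "customer1,yes,user\n"
    "customer2,no,user\n"
    "customer3,no,equipment\n"
    "customer1,no,neither\n"
    "customer3,no,user\n"
    "customer3,yes,user\n"
    "customer3,yes,equipment\n"
    "customer4,no,user\n"
)
data = pd.read_csv(sample)

# Rows are companies, columns are call types, cells are call counts.
table = pd.crosstab(data['companyname'], data['qualityissue'])

# Put the call types in the desired column order (missing types become 0).
table = table.reindex(columns=['equipment', 'user', 'neither'], fill_value=0)

# Order the companies by their total number of calls, busiest first.
order = table.sum(axis=1).sort_values(ascending=False).index
table = table.loc[order]

print(table)
```

Calling table.to_csv with an output path would then write the result, with the company names as the first column.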
