python - Generate pairings within World Cup tournament groups


I put together some data for the 2015 FIFA Women's World Cup:

    import pandas as pd

    df = pd.DataFrame({
        'team': ['germany','usa','france','japan','sweden','england','brazil','canada','australia','norway','netherlands','spain',
                 'china','new zealand','south korea','switzerland','mexico','colombia','thailand','nigeria','ecuador','ivory coast','cameroon','costa rica'],
        'group': ['b','d','f','c','d','f','e','a','d','b','a','e','a','a','e','c','f','f','b','d','c','b','c','e'],
        'fifascore': [2168,2158,2103,2066,2008,2001,1984,1969,1968,1933,1919,1867,1847,1832,1830,1813,1748,1692,1651,1633,1485,1373,1455,1589],
        'ftescore': [95.6,95.4,92.4,92.7,91.6,89.6,92.2,90.1,88.7,88.7,86.2,84.7,85.2,82.5,84.3,83.7,81.1,78.0,68.0,85.7,63.3,75.6,79.3,72.8]
    })

    df.groupby(['group', 'team']).mean()

Now I want to generate a new dataframe that contains the 6 possible pairings (matches) within each group of df, in a format like:

    group    team1        team2
    a        canada       china
    a        canada       netherlands
    a        canada       new zealand
    a        china        netherlands
    a        china        new zealand
    a        netherlands  new zealand
    b        germany      ivory coast
    b        germany      norway
    ...

What is a concise, clean way to do this? I can write a bunch of loops through each group and team, but I feel there should be a cleaner, vectorized way to do it with pandas and the split-apply-combine paradigm.

Edit: R answers are welcome too; I think it'd be interesting to compare the R and pandas ways here. I've added the r tag.

Here's the data in R form, as requested in the comments:

    team <- c('germany','usa','france','japan','sweden','england','brazil','canada','australia','norway','netherlands','spain',
              'china','new zealand','south korea','switzerland','mexico','colombia','thailand','nigeria','ecuador','ivory coast','cameroon','costa rica')
    group <- c('b','d','f','c','d','f','e','a','d','b','a','e','a','a','e','c','f','f','b','d','c','b','c','e')
    fifascore <- c(2168,2158,2103,2066,2008,2001,1984,1969,1968,1933,1919,1867,1847,1832,1830,1813,1748,1692,1651,1633,1485,1373,1455,1589)
    ftescore <- c(95.6,95.4,92.4,92.7,91.6,89.6,92.2,90.1,88.7,88.7,86.2,84.7,85.2,82.5,84.3,83.7,81.1,78.0,68.0,85.7,63.3,75.6,79.3,72.8)

    df <- data.frame(team, group, fifascore, ftescore)

Here's a two-line solution:

    import itertools

    for grpname, grpteams in df.groupby('group')['team']:
        # No need to call grpteams.tolist() to convert the pandas Series to a Python list
        print(list(itertools.combinations(grpteams, 2)))

    [('canada', 'netherlands'), ('canada', 'china'), ('canada', 'new zealand'), ('netherlands', 'china'), ('netherlands', 'new zealand'), ('china', 'new zealand')]
    [('germany', 'norway'), ('germany', 'thailand'), ('germany', 'ivory coast'), ('norway', 'thailand'), ('norway', 'ivory coast'), ('thailand', 'ivory coast')]
    [('japan', 'switzerland'), ('japan', 'ecuador'), ('japan', 'cameroon'), ('switzerland', 'ecuador'), ('switzerland', 'cameroon'), ('ecuador', 'cameroon')]
    [('usa', 'sweden'), ('usa', 'australia'), ('usa', 'nigeria'), ('sweden', 'australia'), ('sweden', 'nigeria'), ('australia', 'nigeria')]
    [('brazil', 'spain'), ('brazil', 'south korea'), ('brazil', 'costa rica'), ('spain', 'south korea'), ('spain', 'costa rica'), ('south korea', 'costa rica')]
    [('france', 'england'), ('france', 'mexico'), ('france', 'colombia'), ('england', 'mexico'), ('england', 'colombia'), ('mexico', 'colombia')]
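The question asks for a dataframe rather than printed lists, so here is a minimal sketch of collecting the same combinations into one. The two-group stand-in `df` and the names `rows` and `pairings` are assumptions made for illustration; the column names `group`, `team1` and `team2` follow the format requested in the question.

```python
import itertools
import pandas as pd

# Small stand-in for the question's df (groups a and b only, same column names).
df = pd.DataFrame({
    'team': ['germany', 'canada', 'norway', 'netherlands',
             'china', 'new zealand', 'thailand', 'ivory coast'],
    'group': ['b', 'a', 'b', 'a', 'a', 'a', 'b', 'b'],
})

# One (group, team1, team2) row per pairing; groupby iterates the groups in sorted order.
rows = [
    (grpname, team1, team2)
    for grpname, grpteams in df.groupby('group')['team']
    for team1, team2 in itertools.combinations(grpteams, 2)
]

pairings = pd.DataFrame(rows, columns=['group', 'team1', 'team2'])
print(pairings)
```

Within each group the teams appear in the order they have in `df`, so the rows are not sorted alphabetically the way the example in the question is.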

Explanation:

First we get the list of teams within each group: use df.groupby('group'), iterate through it and, by accessing the 'team' Series, get the list of 4 teams within each group:

    for grpname, grpteams in df.groupby('group')['team']:
        teamlist = grpteams.tolist()
        ...

    ['canada', 'netherlands', 'china', 'new zealand']
    ['germany', 'norway', 'thailand', 'ivory coast']
    ['japan', 'switzerland', 'ecuador', 'cameroon']
    ['usa', 'sweden', 'australia', 'nigeria']
    ['brazil', 'spain', 'south korea', 'costa rica']
    ['france', 'england', 'mexico', 'colombia']

Then we generate the all-play-all list of tuples of teams. David Arenburg's post reminded me to use itertools.combinations(..., 2), but we could also have used a generator or nested for-loops:

    def all_play_all(teams):
        for team1 in teams:
            for team2 in teams:
                if team1 < team2:  # [Note] no need to generate indices into teamlist; compare the strings directly
                    yield (team1, team2)

    >>> [match for match in all_play_all(grpteams)]
    [('france', 'mexico'), ('england', 'france'), ('england', 'mexico'), ('colombia', 'france'), ('colombia', 'england'), ('colombia', 'mexico')]

Note that we're taking a shortcut here: the longer route is to first generate all possible tuples of indices, then use them to index into teamlist:

    >>> t = len(teamlist)
    >>> [(i, j) for i in range(t) for j in range(t) if i < j]
    [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
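To finish the index-based route, the index tuples still have to be mapped back to team names. The sketch below assumes `teamlist` holds group a in the order shown earlier; `index_pairs` and `matches` are names introduced here for illustration.

```python
import itertools

# Group a, in the order the teams appear in the data.
teamlist = ['canada', 'netherlands', 'china', 'new zealand']

# All index pairs (i, j) with i < j, for valid indices 0 .. len(teamlist) - 1.
t = len(teamlist)
index_pairs = [(i, j) for i in range(t) for j in range(t) if i < j]

# Map each index pair back to the corresponding team names.
matches = [(teamlist[i], teamlist[j]) for i, j in index_pairs]
print(matches)

# Same pairs, in the same order, as itertools.combinations gives directly.
same_as_combinations = matches == list(itertools.combinations(teamlist, 2))
print(same_as_combinations)
```

Because the comparison is on positions rather than names, this keeps each pair in the list's original order.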

(Note: the approach of directly comparing team names has the slight side effect of re-sorting the team names within each pairing alphabetically (they were originally sorted by seeding, not alphabetically). For example, 'china' < 'netherlands', so their pairing shows up as ('china', 'netherlands'), not ('netherlands', 'china') as in the itertools.combinations output above.)
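The ordering difference can be seen side by side with a pandas self-join, which is also one way to get closer to the vectorized style the question asks about. This is only a sketch under assumptions: the one-group stand-in `df`, the helper column `pos`, and the names `ranked`, `both`, `by_position` and `by_name` are all introduced here for illustration.

```python
import pandas as pd

# Group a only, in the data's original order; same column names as df.
df = pd.DataFrame({
    'team': ['canada', 'netherlands', 'china', 'new zealand'],
    'group': ['a', 'a', 'a', 'a'],
})

# Remember each team's original row position before pairing.
ranked = df.assign(pos=range(len(df)))

# Self-join on group: every team is paired with every team in the same group.
both = ranked.merge(ranked, on='group', suffixes=('1', '2'))

# pos1 < pos2 drops self-pairings and mirrored duplicates,
# and keeps the earlier-listed team in the team1 column.
by_position = both.loc[both['pos1'] < both['pos2'], ['group', 'team1', 'team2']]

# Comparing names instead puts the alphabetically smaller team first.
by_name = both.loc[both['team1'] < both['team2'], ['group', 'team1', 'team2']]

print(by_position.reset_index(drop=True))
print(by_name.reset_index(drop=True))
```

Both filters return 6 rows for a group of 4; they differ only in which team of a pair lands in `team1`. The self-join builds all 16 combinations per group before filtering, which is harmless for groups of 4 but grows quadratically with group size.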

