python - Generate pairings within World Cup tournament groups
I put together some data on the 2015 FIFA Women's World Cup:
    import pandas as pd

    df = pd.DataFrame({
        'team': ['germany','usa','france','japan','sweden','england','brazil','canada','australia','norway','netherlands','spain',
                 'china','new zealand','south korea','switzerland','mexico','colombia','thailand','nigeria','ecuador','ivory coast','cameroon','costa rica'],
        'group': ['b','d','f','c','d','f','e','a','d','b','a','e','a','a','e','c','f','f','b','d','c','b','c','e'],
        'fifascore': [2168,2158,2103,2066,2008,2001,1984,1969,1968,1933,1919,1867,1847,1832,1830,1813,1748,1692,1651,1633,1485,1373,1455,1589],
        'ftescore': [95.6,95.4,92.4,92.7,91.6,89.6,92.2,90.1,88.7,88.7,86.2,84.7,85.2,82.5,84.3,83.7,81.1,78.0,68.0,85.7,63.3,75.6,79.3,72.8]
    })

    df.groupby(['group', 'team']).mean()
Now I'd like to generate a new dataframe that contains the 6 possible pairings or matches within each group in df, in a format like:

    group  team1        team2
    a      canada       china
    a      canada       netherlands
    a      canada       new zealand
    a      china        netherlands
    a      china        new zealand
    a      netherlands  new zealand
    b      germany      ivory coast
    b      germany      norway
    ...
What is a concise, clean way to do this? I can do a bunch of loops through each group and team, but I feel there should be a cleaner, vectorized way to do it with pandas and the split-apply-combine paradigm.
Edit: I welcome R answers too; I think it'd be interesting to compare the R and pandas ways here. I've added the r tag.
Here's the data in R form, as requested in the comments:
    team <- c('germany','usa','france','japan','sweden','england','brazil','canada','australia','norway','netherlands','spain',
              'china','new zealand','south korea','switzerland','mexico','colombia','thailand','nigeria','ecuador','ivory coast','cameroon','costa rica')
    group <- c('b','d','f','c','d','f','e','a','d','b','a','e','a','a','e','c','f','f','b','d','c','b','c','e')
    fifascore <- c(2168,2158,2103,2066,2008,2001,1984,1969,1968,1933,1919,1867,1847,1832,1830,1813,1748,1692,1651,1633,1485,1373,1455,1589)
    ftescore <- c(95.6,95.4,92.4,92.7,91.6,89.6,92.2,90.1,88.7,88.7,86.2,84.7,85.2,82.5,84.3,83.7,81.1,78.0,68.0,85.7,63.3,75.6,79.3,72.8)
    df <- data.frame(team, group, fifascore, ftescore)
Here's a two-line solution:
    import itertools

    for grpname, grpteams in df.groupby('group')['team']:
        # No need to use grpteams.tolist() to convert from a pandas Series to a Python list
        print(list(itertools.combinations(grpteams, 2)))

    [('canada', 'netherlands'), ('canada', 'china'), ('canada', 'new zealand'), ('netherlands', 'china'), ('netherlands', 'new zealand'), ('china', 'new zealand')]
    [('germany', 'norway'), ('germany', 'thailand'), ('germany', 'ivory coast'), ('norway', 'thailand'), ('norway', 'ivory coast'), ('thailand', 'ivory coast')]
    [('japan', 'switzerland'), ('japan', 'ecuador'), ('japan', 'cameroon'), ('switzerland', 'ecuador'), ('switzerland', 'cameroon'), ('ecuador', 'cameroon')]
    [('usa', 'sweden'), ('usa', 'australia'), ('usa', 'nigeria'), ('sweden', 'australia'), ('sweden', 'nigeria'), ('australia', 'nigeria')]
    [('brazil', 'spain'), ('brazil', 'south korea'), ('brazil', 'costa rica'), ('spain', 'south korea'), ('spain', 'costa rica'), ('south korea', 'costa rica')]
    [('france', 'england'), ('france', 'mexico'), ('france', 'colombia'), ('england', 'mexico'), ('england', 'colombia'), ('mexico', 'colombia')]
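The loop above only prints the pairings. To get the group / team1 / team2 dataframe the question asks for, one possible sketch is to collect the tuples into rows first. The names `rows` and `pairings` are just illustrative, and the small `df` here is a two-group stand-in for the question's data, to keep the example short:

```python
import itertools
import pandas as pd

# Stand-in for the question's df: only groups 'a' and 'b', same row order as the original
df = pd.DataFrame({
    'team':  ['germany', 'canada', 'norway', 'netherlands',
              'china', 'new zealand', 'thailand', 'ivory coast'],
    'group': ['b', 'a', 'b', 'a', 'a', 'a', 'b', 'b'],
})

# One (group, team1, team2) row per pairing within each group
rows = [
    (grpname, team1, team2)
    for grpname, grpteams in df.groupby('group')['team']
    for team1, team2 in itertools.combinations(grpteams, 2)
]

pairings = pd.DataFrame(rows, columns=['group', 'team1', 'team2'])
print(pairings)
```

With the full 24-team df this should give 36 rows, 6 per group. The pairs keep the row order of df within each group, so they are not alphabetical.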
Explanation:
First we get the list of teams within each group using df.groupby('group'): we iterate through it and, accessing the 'team' series, get the list of 4 teams within each group:
    for grpname, grpteams in df.groupby('group')['team']:
        teamlist = grpteams.tolist()
        ...

    ['canada', 'netherlands', 'china', 'new zealand']
    ['germany', 'norway', 'thailand', 'ivory coast']
    ['japan', 'switzerland', 'ecuador', 'cameroon']
    ['usa', 'sweden', 'australia', 'nigeria']
    ['brazil', 'spain', 'south korea', 'costa rica']
    ['france', 'england', 'mexico', 'colombia']
Then we generate the all-play-all list of tuples of teams. David Arenburg's post reminded me to use itertools.combinations(..., 2). But we could have used a generator or nested for-loops:
    def all_play_all(teams):
        for team1 in teams:
            for team2 in teams:
                # [Note] We don't need to generate indices to index into teamlist;
                # we just use direct string comparison
                if team1 < team2:
                    yield (team1, team2)

    >>> [match for match in all_play_all(grpteams)]
    [('france', 'mexico'), ('england', 'france'), ('england', 'mexico'), ('colombia', 'france'), ('colombia', 'england'), ('colombia', 'mexico')]
Note that we're taking a shortcut here, compared with first generating the possible tuples of indices and then using those to index into teamlist:
    >>> t = len(teamlist)
    >>> [(i,j) for i in range(t) for j in range(t) if i<j]
    [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
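For comparison, the longer index-based route could look roughly like this. It is a sketch that assumes teamlist holds the four group 'a' teams in the order shown in the output above; `index_pairs` and `matches` are illustrative names:

```python
import itertools

# Group 'a' teams in their original (seeding) order
teamlist = ['canada', 'netherlands', 'china', 'new zealand']

# Step 1: all index pairs (i, j) with i < j
t = len(teamlist)
index_pairs = [(i, j) for i in range(t) for j in range(t) if i < j]

# Step 2: use the index pairs to look up the team names
matches = [(teamlist[i], teamlist[j]) for i, j in index_pairs]
print(matches)

# This should agree with itertools.combinations, which also works by position
same_as_combinations = matches == list(itertools.combinations(teamlist, 2))
print(same_as_combinations)
```

Because the pairs are built by position rather than by comparing names, the seeding order within each pair is preserved.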
(Note: if we had used the approach of directly comparing team names, it would have the slight side-effect of reordering each pairing alphabetically (the teams were originally sorted by seeding, not alphabetically). E.g. 'china' < 'netherlands', so their pairing would show up as ('china', 'netherlands'), not ('netherlands', 'china').)
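A small check of that side-effect, again assuming the group 'a' team order from the output above. `by_position` and `by_name` are illustrative names for the two approaches:

```python
import itertools

# Group 'a' teams in their original (seeding) order
teamlist = ['canada', 'netherlands', 'china', 'new zealand']

# Position-based pairing keeps the seeding order inside each tuple
by_position = list(itertools.combinations(teamlist, 2))

# Name comparison puts the alphabetically smaller team first
by_name = [(t1, t2) for t1 in teamlist for t2 in teamlist if t1 < t2]

print(('netherlands', 'china') in by_position)  # True
print(('china', 'netherlands') in by_name)      # True

# Ignoring the order inside each tuple, both approaches give the same matches
same_matches = ({frozenset(p) for p in by_position}
                == {frozenset(p) for p in by_name})
print(same_matches)  # True
```

So the two approaches agree on which matches are played, and differ only in which team is listed first and in the order of the rows.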