Applying transformations to DataFrames with multi-level indices in Python's pandas
I'm trying to apply some simple functions to numeric data in pandas. The data is a set of matrices indexed by time. I wanted to use hierarchical/multilevel indices to represent this, then use a split-apply-combine operation to group the data, apply an operation, and summarize the result as a DataFrame. I'd like the results of these operations to be DataFrames and not Series objects.
Below is a simple example with two matrices (two time points) represented as a multi-level DataFrame. I want to subtract a matrix from each time point, then collapse the data by taking the mean, and get back a DataFrame that preserves the original column names of the data.
Everything I try either fails or gives an odd result. I tried to follow http://pandas.pydata.org/pandas-docs/stable/groupby.html since this is a split-apply-combine operation, I think, but the documentation is hard to understand and the examples are dense.
How can this be achieved in pandas? My annotated code, with comments on the lines that fail, is below:
    import pandas
    import numpy as np

    t1 = pandas.DataFrame([[0, 0, 0],
                           [0, 1, 1],
                           [5, 5, 5]], columns=[1, 2, 3], index=["a", "b", "c"])
    t2 = pandas.DataFrame([[10, 10, 30],
                           [5, 1, 1],
                           [2, 2, 2]], columns=[1, 2, 3], index=["a", "b", "c"])
    m = np.ones([3, 3])
    c = pandas.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])
    #print("c: ", c)

    # How to view just the 'time' column values?
    #print(c.ix["time"])  # fails
    #print(c["time"])     # fails

    # How to group the matrix by time, subtract a value from each matrix, and
    # then take the mean across the columns and get a DataFrame back?
    result = c.groupby(level="time").apply(lambda x: np.mean(x - m, axis=1))

    # Why does 'result' appear to have TWO "time" columns?!
    print(result)
    # Why is 'result' a Series and not a DataFrame?
    print(type(result))

    # Attempt to get a DataFrame
    df = pandas.DataFrame(result)
    # Why does 'df' have a weird '0' outer (hierarchical) column??
    print(df)
    #                         0
    # time time name
    # t1   t1   a     -1.000000
    #           b     -0.333333
    #           c      4.000000
    # t2   t2   a     15.666667
    #           b      1.333333
    #           c      1.000000
In short, the operation I'd like to do is:
for each time point:
    subtract m from the time point matrix
    collapse the result matrix across the columns by taking the mean (preserving the row labels "a", "b", "c")
    return the result as a DataFrame
How to view just the 'time' column values?

    In [11]: c.index.levels[0].values
    Out[11]: array(['t1', 't2'], dtype=object)
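Note that `levels[0]` lists each distinct label only once. If the goal is the 'time' label of every row, or the block of rows for one time point, `get_level_values` and `.loc` may be closer to what was asked. A minimal sketch, assuming the same `c` as in the question (rebuilt here so the snippet runs on its own; the variable names `time_per_row`, `unique_times` and `t1_block` are just illustrative):

```python
import pandas as pd

t1 = pd.DataFrame([[0, 0, 0], [0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
t2 = pd.DataFrame([[10, 10, 30], [5, 1, 1], [2, 2, 2]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
c = pd.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])

# One 'time' label per row of c (six entries for this data).
time_per_row = c.index.get_level_values("time")

# The distinct 'time' labels, in order of first appearance.
unique_times = time_per_row.unique()

# The rows belonging to a single time point, selected by the outer label.
t1_block = c.loc["t1"]

print(list(time_per_row))
print(list(unique_times))
print(t1_block)
```

`c["time"]` fails in the question because 'time' is a level of the row index, not a column, so it has to be reached through `c.index`.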
How to group the matrix by time, subtract a value from each matrix, and then take the mean across the columns and get a DataFrame back?
Your attempt was pretty close:

    In [46]: c.groupby(level='time').apply(lambda x: x - m).mean(axis=1)
    Out[46]:
    time  name
    t1    a       -1.000000
          b       -0.333333
          c        4.000000
    t2    a       15.666667
          b        1.333333
          c        1.000000
    dtype: float64
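The result above is still a Series, while the question asks for a DataFrame. One way to get there is `to_frame` or `unstack`. The sketch below is only one possible approach, not necessarily what the answerer had in mind. It passes `group_keys=False` because, depending on the pandas version, `apply` may otherwise prepend the group label and produce the doubled 'time' level seen in the question. The column name `"mean"` and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

t1 = pd.DataFrame([[0, 0, 0], [0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
t2 = pd.DataFrame([[10, 10, 30], [5, 1, 1], [2, 2, 2]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
m = np.ones([3, 3])
c = pd.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])

# Subtract m from each time point's 3x3 block. group_keys=False asks pandas
# not to prepend the group label to the index of the combined result.
shifted = c.groupby(level="time", group_keys=False).apply(lambda x: x - m)

# Mean across the columns: one value per (time, name) row, as a Series.
means = shifted.mean(axis=1)

# Option 1: a one-column DataFrame that keeps the (time, name) row index.
df_long = means.to_frame(name="mean")

# Option 2: rows "a", "b", "c" with one column per time point.
df_wide = means.unstack(level="time")

print(df_long)
print(df_wide)
```

The long form keeps the hierarchical index from the question. The wide form is easier to read when there are only a few time points.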