numpy - Applying transformations to dataframes with multi-level indices in Python's pandas -

- May 15, 2010

I'm trying to apply some simple functions to numeric data in pandas. The data is a set of matrices indexed by time. I wanted to use hierarchical/multilevel indices to represent this, and then use a split-apply-combine operation to group the data, apply an operation, and summarize the result as a dataframe. I'd like the results of these operations to be dataframes and not Series objects.

Below is a simple example with two matrices (two time points) represented as a multi-level dataframe. I want to subtract a matrix from each time point, then collapse the data by taking the mean, and get back a dataframe that preserves the original column names of the data.

Everything I try either fails or gives an odd result. I tried to follow http://pandas.pydata.org/pandas-docs/stable/groupby.html since this is a split-apply-combine operation, I think, but the documentation is hard to understand and the examples are dense.

How can this be achieved in pandas? I annotated my code where it fails along the relevant lines:

    import pandas
    import numpy as np

    t1 = pandas.DataFrame([[0, 0, 0],
                           [0, 1, 1],
                           [5, 5, 5]], columns=[1, 2, 3], index=["a", "b", "c"])
    t2 = pandas.DataFrame([[10, 10, 30],
                           [5, 1, 1],
                           [2, 2, 2]], columns=[1, 2, 3], index=["a", "b", "c"])
    m = np.ones([3, 3])
    c = pandas.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])
    #print("c: ", c)

    # How to view the 'time' column values?
    #print(c.ix["time"])  # fails
    #print(c["time"])  # fails

    # How to group each matrix by time, subtract a value from each matrix, and
    # then take the mean across the columns and get a dataframe back?
    result = c.groupby(level="time").apply(lambda x: np.mean(x - m, axis=1))

    # Why does 'result' appear to have two "time" columns?!
    print(result)

    # Why is 'result' a Series and not a dataframe?
    print(type(result))

    # Attempt to get a dataframe
    df = pandas.DataFrame(result)

    # Why does 'df' have a weird '0' outer (hierarchical) column??
    print(df)
    #                         0
    # time time name
    # t1   t1   a     -1.000000
    #           b     -0.333333
    #           c      4.000000
    # t2   t2   a     15.666667
    #           b      1.333333
    #           c      1.000000

In short, the operation I'd like to do is:

    for each time point:
      subtract m from the time point matrix
      collapse the result matrix across the columns by taking the mean (preserving the row labels "a", "b", "c")
      return the result as a dataframe

How to view the 'time' column values?

    In [11]: c.index.levels[0].values
    Out[11]: array(['t1', 't2'], dtype=object)
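Note that 'time' is a level of the row index rather than a column, which is why `c["time"]` fails. As a rough sketch of some other ways to read that level, assuming a reasonably recent pandas (the variable names `time_per_row`, `time_unique` and `t1_block` are only illustrative), you could look it up by name instead of by position:

```python
import pandas as pd

# Rebuild the frame from the question
t1 = pd.DataFrame([[0, 0, 0], [0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
t2 = pd.DataFrame([[10, 10, 30], [5, 1, 1], [2, 2, 2]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
c = pd.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])

# One 'time' label per row, aligned with the rows of c
time_per_row = c.index.get_level_values("time")

# The distinct 'time' labels, looked up by level name
time_unique = c.index.unique(level="time")

# The sub-frame for a single time point (the 'time' level is dropped)
t1_block = c.xs("t1", level="time")

print(list(time_per_row))
print(list(time_unique))
print(t1_block)
```

`get_level_values` is usually the more convenient one when you want to filter or label rows, since it has the same length as the frame.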

How to group each matrix by time, subtract a value from each matrix, and then take the mean across the columns and get a dataframe back?

Your attempt is pretty close:

    In [46]: c.groupby(level='time').apply(lambda x: x - m).mean(axis=1)
    Out[46]:
    time  name
    t1    a       -1.000000
          b       -0.333333
          c        4.000000
    t2    a       15.666667
          b        1.333333
          c        1.000000
    dtype: float64
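The result above is still a Series, because averaging across the columns leaves one value per row. If a dataframe is wanted, one possible sketch is below. It assumes a pandas version where `groupby` accepts `group_keys=False`, which on newer releases should keep `apply` from adding a second 'time' level; the names `means`, `df_long`, `df_wide` and the column label "mean" are my own choices, not something pandas requires.

```python
import numpy as np
import pandas as pd

# Rebuild the data from the question
t1 = pd.DataFrame([[0, 0, 0], [0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
t2 = pd.DataFrame([[10, 10, 30], [5, 1, 1], [2, 2, 2]],
                  columns=[1, 2, 3], index=["a", "b", "c"])
m = np.ones([3, 3])
c = pd.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])

# Subtract m from each time block, then average across the columns.
# group_keys=False keeps the original (time, name) index.
means = (
    c.groupby(level="time", group_keys=False)
     .apply(lambda x: x - m)
     .mean(axis=1)
)

# Long form: a one-column dataframe with a named column instead of '0'
df_long = means.to_frame(name="mean")

# Wide form: rows "a", "b", "c" and one column per time point
df_wide = means.unstack(level="time")

print(df_long)
print(df_wide)
```

Passing a name to `to_frame` avoids the unnamed `0` column you saw, and `unstack` is handy if you would rather compare the time points side by side.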
