making multi-index in pandas dataframes in Python? -
i have data set there matrix of numeric values indexed time variable. each matrix numpy array (that can converted dataframe columns corresponding columns of matrix). if have these matrices how can make them single dataframe each matrix has time index? specifically:
# time t1 d1 = pandas.dataframe({"a": [1,2,3,4], "b":[10,20,30,40]}) # time t2 d2 = pandas.dataframe({"a": [10,20,30,40], "b": [1,2,5,6]}) # time t3 d3 = ...
i want make index called "time" index these dataframes, , aggregate values columns "a" , "b" across time index. how can in pandas?
my attempt:
d=pandas.dataframe([d1,d2],index=(0, 1),columns=["time"])
update: unutbu's solution adding 2 hierarchical columns is:
c = pd.concat([d1, d2], keys=[('t1', 'p1'), ('t2', 'p2')], names=['time', 'position'])
my final question how access resulting structure? example how do vectorized operations across time
, or across position
? eg take average of rows each value of time
.
also, how compare encoding time
, position
each dataframe , using groupby
? in other words when use levels versus flat columns grouped? here's alternative solution using flat dataframe groupby:
d1["time"] = 1 d1["position"] = "x" d2["time"] = 2 d2["position"] = "y" c = pandas.concat([d1, d2]) # take mean time values c.groupby("time").apply(lambda x: np.mean(x, axis=1))
given
import pandas pd d1 = pd.dataframe({"a": [1,2,3,4], "b":[10,20,30,40]}) d2 = pd.dataframe({"a": [10,20,30,40], "b": [1,2,5,6]})
then pd.concat([d1, d2], keys=['t1', 't2'])
returns:
in [177]: pd.concat([d1, d2], keys=['t1', 't2']) out[177]: b t1 0 1 10 1 2 20 2 3 30 3 4 40 t2 0 10 1 1 20 2 2 30 5 3 40 6
if wish add more 1 level new multiindex, can instead pass list of tuples keys
parameter:
in [237]: pd.concat([d1, d2], keys=[('t1', 'p1'), ('t2', 'p2')], names=['time', 'position']) out[237]: b time position t1 p1 0 1 10 1 2 20 2 3 30 3 4 40 t2 p2 0 10 1 1 20 2 2 30 5 3 40 6
note, important here keys
receives list of tuples, rather list of lists.
Comments
Post a Comment