How do I compute a weighted average in a pandas crosstab, and how do I add row and column totals?
I have a pandas DataFrame with two categorical variables (city and colour in my example), a column of percentages (perc), and a column of weights (weight). I want a crosstab of city by colour that shows, for each combination of the two, the weighted average of perc.
I have managed to do it with the code below: I first create a column holding weight x perc, then build one crosstab with the sum of weight x perc and another with the sum of weight, and finally divide the first by the second.
It works, but is there a faster or more elegant way?
import pandas as pd
import numpy as np

# Build a small reproducible example
myrows = 10
df = pd.DataFrame()
np.random.seed(123)
df['weight'] = np.random.rand(myrows) * 100
np.random.seed(321)
df['perc'] = np.random.rand(myrows)
df['weight x perc'] = df['weight'] * df['perc']
df['colour'] = np.where(df['perc'] < 0.5, 'red', 'yellow')
np.random.seed(555)
df['city'] = np.where(np.random.rand(myrows) < 0.5, 'NY', 'LA')

# Numerator: sum of weight x perc per cell; denominator: sum of weight per cell
num = pd.crosstab(df['city'], df['colour'], values=df['weight x perc'], aggfunc='sum', margins=True)
den = pd.crosstab(df['city'], df['colour'], values=df['weight'], aggfunc='sum', margins=True)

# Weighted average = sum(weight x perc) / sum(weight), margins included
out = num / den
print(out)
Here is an approach using groupby with apply() and NumPy's weighted average function, np.average:
df.groupby(['colour','city']).apply(lambda x: np.average(x.perc, weights=x.weight)).unstack(level=0)
This gives:
colour       red    yellow
city
LA      0.173870  0.865636
NY      0.077912  0.687400
It does not include the All margins, though.
These produce the totals:
df.groupby(['colour']).apply(lambda x: np.average(x.perc, weights=x.weight))
df.groupby(['city']).apply(lambda x: np.average(x.perc, weights=x.weight))
Granted, the totals are still not packaged into a single frame.
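One possible way to package the cell averages and both sets of totals into a single frame is sketched below. The helper name weighted_crosstab, its parameters, the inner wavg function, and the small demo frame are all made up for illustration; they are not part of pandas. It assumes every group has a nonzero total weight, since np.average raises an error otherwise.

```python
import numpy as np
import pandas as pd


def weighted_crosstab(df, row, col, value, weight, margins_name="All"):
    """Weighted mean of `value` per (row, col) cell, with an 'All' row and column.

    Hypothetical helper for illustration; not a pandas function.
    """
    def wavg(g):
        # Weighted average of the value column within one group
        return np.average(g[value], weights=g[weight])

    cols = [value, weight]
    # Cell-level weighted means, reshaped so `col` values become columns
    table = df.groupby([row, col])[cols].apply(wavg).unstack(col)
    # Row margin: weighted mean over all columns for each row label
    table[margins_name] = df.groupby(row)[cols].apply(wavg)
    # Column margin: weighted mean over all rows for each column label,
    # plus the grand weighted mean in the corner
    col_totals = df.groupby(col)[cols].apply(wavg)
    col_totals[margins_name] = wavg(df)
    table.loc[margins_name] = col_totals.reindex(table.columns)
    return table


# Tiny made-up frame, only to show the call
demo = pd.DataFrame({
    "city": ["NY", "NY", "LA", "LA"],
    "colour": ["red", "yellow", "red", "red"],
    "perc": [0.2, 0.8, 0.1, 0.3],
    "weight": [10, 30, 20, 20],
})
out = weighted_crosstab(demo, "city", "colour", "perc", "weight")
print(out)
```

Combinations that never occur (LA and yellow in the demo frame) come out as NaN. The margins are computed from the raw rows rather than by averaging the cell values, so they remain true weighted means.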