Los pandas de Python calculan el material rodante beta utilizando la aplicación de balanceo al objeto groupby en forma vectorializada

Tengo un dataframe grande, df, que contiene 4 columnas:

id period ret_1m mkt_ret_1m 131146 CAN00WG0 199609 -0.1538 0.047104 133530 CAN00WG0 199610 -0.0455 -0.014143 135913 CAN00WG0 199611 0.0000 0.040926 138334 CAN00WG0 199612 0.2952 0.008723 140794 CAN00WG0 199701 -0.0257 0.039916 143274 CAN00WG0 199702 -0.0038 -0.025442 145754 CAN00WG0 199703 -0.2992 -0.049279 148246 CAN00WG0 199704 -0.0919 -0.005948 150774 CAN00WG0 199705 0.0595 0.122322 153318 CAN00WG0 199706 -0.0337 0.045765 id period ret_1m mkt_ret_1m 160980 CAN00WH0 199709 0.0757 0.079293 163569 CAN00WH0 199710 -0.0741 -0.044000 166159 CAN00WH0 199711 0.1000 -0.014644 168782 CAN00WH0 199712 -0.0909 -0.007072 171399 CAN00WH0 199801 -0.0100 0.001381 174022 CAN00WH0 199802 0.1919 0.081924 176637 CAN00WH0 199803 0.0085 0.050415 179255 CAN00WH0 199804 -0.0168 0.018393 181880 CAN00WH0 199805 0.0427 -0.051279 184516 CAN00WH0 199806 -0.0656 -0.011516 id period ret_1m mkt_ret_1m 143275 CAN00WO0 199702 -0.1176 -0.025442 145755 CAN00WO0 199703 -0.0074 -0.049279 148247 CAN00WO0 199704 -0.0075 -0.005948 150775 CAN00WO0 199705 0.0451 0.122322 

etc.

Estoy intentando calcular una medida financiera común, conocida como beta, utilizando una función, que toma dos de las columnas, ret_1m, el stock_return mensual y ret_1m_mkt, el rendimiento de 1 mes del mercado para el mismo período (period_id). Quiero aplicar una función (calc_beta) para calcular el resultado de 12 meses de esta función de forma continua durante 12 meses.

Para hacer esto, estoy creando un objeto groupby:

 grp = df.groupby('id') 

Lo que me gustaría hacer es usar algo como:

 period = 12 for stock, sub_df in grp: arg = sub_df[['ret_1m', 'mkt_ret_1m']] beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period) 

Ahora, aquí está el primer problema. Según la documentación, pd.rolling_apply arg puede ser una serie o un dataframe. Sin embargo, parece que el dataframe que proporciono se convierte en una matriz numpy que solo puede contener una columna de datos, en lugar de los dos que he intentado suministrar. Por lo tanto, mi código de abajo para calc_beta no funcionará, ya que necesito aprobar los resultados de las acciones y del mercado:

 def calc_beta(np_array) s = np_array[:,0] # stock returns are column zero from numpy array m = np_array[:,1] # market returns are column one from numpy array covariance = np.cov(s,m) # Calculate covariance between stock and market beta = covariance[0,1]/covariance[1,1] return beta 

Así que mis preguntas son las siguientes, creo que tiene sentido enumerarlas de esta manera:

 (i) How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply? (ii) How can I return more than one value (eg the beta) from the calc_beta function? (iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column? (iv) Is there a better (vectorized) way of achieving this? I have seen some similar questions using eg df.apply(pd.rolling_apply,period,??) but I did not understand how these worked. 

Supongo que rolling_apply anteriormente no podía manejar marcos de datos, pero la documentación sugiere que ahora puede hacerlo. Mis pandas. La versión es 0.16.1.

¡Gracias por cualquier ayuda! He perdido 1,5 días tratando de resolver esto y estoy totalmente perplejo.

En última instancia, lo que quiero es algo como esto:

  id period ret_1m mkt_ret_1m beta other_quantities 131146 CAN00WG0 199609 -0.1538 0.047104 0.521 xxx 133530 CAN00WG0 199610 -0.0455 -0.014143 0.627 xxxx 135913 CAN00WG0 199611 0.0000 0.040926 0.341 xxx 138334 CAN00WG0 199612 0.2952 0.008723 0.567 xx 140794 CAN00WG0 199701 -0.0257 0.039916 0.4612 xxx 143274 CAN00WG0 199702 -0.0038 -0.025442 0.215 xxx 145754 CAN00WG0 199703 -0.2992 -0.049279 0.4678 xxx 148246 CAN00WG0 199704 -0.0919 -0.005948 -0.4225 xxx 150774 CAN00WG0 199705 0.0595 0.122322 0.780 xxx 153318 CAN00WG0 199706 -0.0337 0.045765 0.623 xxx id period ret_1m mkt_ret_1m beta other_quantities 160980 CAN00WH0 199709 0.0757 0.079293 -0.913 xx 163569 CAN00WH0 199710 -0.0741 -0.044000 0.894 xxx 166159 CAN00WH0 199711 0.1000 -0.014644 0.563 xxx 168782 CAN00WH0 199712 -0.0909 -0.007072 0.734 xxx 171399 CAN00WH0 199801 -0.0100 0.001381 0.894 xxxx 174022 CAN00WH0 199802 0.1919 0.081924 0.789 xx 176637 CAN00WH0 199803 0.0085 0.050415 0.1563 xxxx 179255 CAN00WH0 199804 -0.0168 0.018393 -0.64 xxxx 181880 CAN00WH0 199805 0.0427 -0.051279 -0.742 xxx 184516 CAN00WH0 199806 -0.0656 -0.011516 0.925 xxx id period ret_1m mkt_ret_1m beta 143275 CAN00WO0 199702 -0.1176 -0.025442 -1.52 xx 145755 CAN00WO0 199703 -0.0074 -0.049279 -0.632 xxx 148247 CAN00WO0 199704 -0.0075 -0.005948 1.521 xx 150775 CAN00WO0 199705 0.0451 0.122322 0.0321 xxx 

etc.

Supongo que pd.rolling_apply no ayuda en este caso, ya que me parece que esencialmente solo toma una Series (incluso si se pasa un dataframe, se procesa una columna cada vez). Pero siempre puedes escribir tu propia aplicación rolling que toma un dataframe.

 import pandas as pd import numpy as np from StringIO import StringIO df = pd.read_csv(StringIO(''' id period ret_1m mkt_ret_1m 131146 CAN00WG0 199609 -0.1538 0.047104 133530 CAN00WG0 199610 -0.0455 -0.014143 135913 CAN00WG0 199611 0.0000 0.040926 138334 CAN00WG0 199612 0.2952 0.008723 140794 CAN00WG0 199701 -0.0257 0.039916 143274 CAN00WG0 199702 -0.0038 -0.025442 145754 CAN00WG0 199703 -0.2992 -0.049279 148246 CAN00WG0 199704 -0.0919 -0.005948 150774 CAN00WG0 199705 0.0595 0.122322 153318 CAN00WG0 199706 -0.0337 0.045765 160980 CAN00WH0 199709 0.0757 0.079293 163569 CAN00WH0 199710 -0.0741 -0.044000 166159 CAN00WH0 199711 0.1000 -0.014644 168782 CAN00WH0 199712 -0.0909 -0.007072 171399 CAN00WH0 199801 -0.0100 0.001381 174022 CAN00WH0 199802 0.1919 0.081924 176637 CAN00WH0 199803 0.0085 0.050415 179255 CAN00WH0 199804 -0.0168 0.018393 181880 CAN00WH0 199805 0.0427 -0.051279 184516 CAN00WH0 199806 -0.0656 -0.011516 143275 CAN00WO0 199702 -0.1176 -0.025442 145755 CAN00WO0 199703 -0.0074 -0.049279 148247 CAN00WO0 199704 -0.0075 -0.005948 150775 CAN00WO0 199705 0.0451 0.122322'''), sep='\s+') def calc_beta(df): np_array = df.values s = np_array[:,0] # stock returns are column zero from numpy array m = np_array[:,1] # market returns are column one from numpy array covariance = np.cov(s,m) # Calculate covariance between stock and market beta = covariance[0,1]/covariance[1,1] return beta def rolling_apply(df, period, func, min_periods=None): if min_periods is None: min_periods = period result = pd.Series(np.nan, index=df.index) for i in range(1, len(df)+1): sub_df = df.iloc[max(i-period, 0):i,:] #I edited here if len(sub_df) >= min_periods: idx = sub_df.index[-1] result[idx] = func(sub_df) return result df['beta'] = np.nan grp = df.groupby('id') period = 6 #I'm using 6 to see some not NaN values, since sample data don't have longer than 12 groups for stock, sub_df in grp: beta = rolling_apply(sub_df[['ret_1m','mkt_ret_1m']], period, calc_beta, min_periods = period) beta.name = 'beta' df.update(beta) print df 

Salida

  id period ret_1m mkt_ret_1m beta 131146 CAN00WG0 199609 -0.1538 0.047104 NaN 133530 CAN00WG0 199610 -0.0455 -0.014143 NaN 135913 CAN00WG0 199611 0.0000 0.040926 NaN 138334 CAN00WG0 199612 0.2952 0.008723 NaN 140794 CAN00WG0 199701 -0.0257 0.039916 NaN 143274 CAN00WG0 199702 -0.0038 -0.025442 -1.245908 145754 CAN00WG0 199703 -0.2992 -0.049279 2.574464 148246 CAN00WG0 199704 -0.0919 -0.005948 2.657887 150774 CAN00WG0 199705 0.0595 0.122322 1.371090 153318 CAN00WG0 199706 -0.0337 0.045765 1.494095 ... ... ... ... ... ... 171399 CAN00WH0 199801 -0.0100 0.001381 NaN 174022 CAN00WH0 199802 0.1919 0.081924 1.542782 176637 CAN00WH0 199803 0.0085 0.050415 1.605407 179255 CAN00WH0 199804 -0.0168 0.018393 1.571015 181880 CAN00WH0 199805 0.0427 -0.051279 1.139972 184516 CAN00WH0 199806 -0.0656 -0.011516 1.101890 143275 CAN00WO0 199702 -0.1176 -0.025442 NaN 145755 CAN00WO0 199703 -0.0074 -0.049279 NaN 148247 CAN00WO0 199704 -0.0075 -0.005948 NaN 150775 CAN00WO0 199705 0.0451 0.122322 NaN 

Pruebe pd.rolling_cov () y pd.rolling.var () de la siguiente manera:

 import pandas as pd import numpy as np from StringIO import StringIO df = pd.read_csv(StringIO(''' id period ret_1m mkt_ret_1m 131146 CAN00WG0 199609 -0.1538 0.047104 133530 CAN00WG0 199610 -0.0455 -0.014143 135913 CAN00WG0 199611 0.0000 0.040926 138334 CAN00WG0 199612 0.2952 0.008723 140794 CAN00WG0 199701 -0.0257 0.039916 143274 CAN00WG0 199702 -0.0038 -0.025442 145754 CAN00WG0 199703 -0.2992 -0.049279 148246 CAN00WG0 199704 -0.0919 -0.005948 150774 CAN00WG0 199705 0.0595 0.122322 153318 CAN00WG0 199706 -0.0337 0.045765 160980 CAN00WH0 199709 0.0757 0.079293 163569 CAN00WH0 199710 -0.0741 -0.044000 166159 CAN00WH0 199711 0.1000 -0.014644 168782 CAN00WH0 199712 -0.0909 -0.007072 171399 CAN00WH0 199801 -0.0100 0.001381 174022 CAN00WH0 199802 0.1919 0.081924 176637 CAN00WH0 199803 0.0085 0.050415 179255 CAN00WH0 199804 -0.0168 0.018393 181880 CAN00WH0 199805 0.0427 -0.051279 184516 CAN00WH0 199806 -0.0656 -0.011516 143275 CAN00WO0 199702 -0.1176 -0.025442 145755 CAN00WO0 199703 -0.0074 -0.049279 148247 CAN00WO0 199704 -0.0075 -0.005948 150775 CAN00WO0 199705 0.0451 0.122322'''), sep='\s+') df['beta'] = pd.rolling_cov(df['ret_1m'], df['mkt_ret_1m'], window=6) / pd.rolling_var(df['mkt_ret_1m'], window=6) print df 

Salida:

  id period ret_1m mkt_ret_1m beta 131146 CAN00WG0 199609 -0.1538 0.047104 NaN 133530 CAN00WG0 199610 -0.0455 -0.014143 NaN 135913 CAN00WG0 199611 0.0000 0.040926 NaN 138334 CAN00WG0 199612 0.2952 0.008723 NaN 140794 CAN00WG0 199701 -0.0257 0.039916 NaN 143274 CAN00WG0 199702 -0.0038 -0.025442 -1.245908 145754 CAN00WG0 199703 -0.2992 -0.049279 2.574464 148246 CAN00WG0 199704 -0.0919 -0.005948 2.657887 150774 CAN00WG0 199705 0.0595 0.122322 1.371090 153318 CAN00WG0 199706 -0.0337 0.045765 1.494095 160980 CAN00WH0 199709 0.0757 0.079293 1.616520 163569 CAN00WH0 199710 -0.0741 -0.044000 1.630411 166159 CAN00WH0 199711 0.1000 -0.014644 0.651220 168782 CAN00WH0 199712 -0.0909 -0.007072 0.652148 171399 CAN00WH0 199801 -0.0100 0.001381 0.724120 174022 CAN00WH0 199802 0.1919 0.081924 1.542782 176637 CAN00WH0 199803 0.0085 0.050415 1.605407 179255 CAN00WH0 199804 -0.0168 0.018393 1.571015 181880 CAN00WH0 199805 0.0427 -0.051279 1.139972 184516 CAN00WH0 199806 -0.0656 -0.011516 1.101890 143275 CAN00WO0 199702 -0.1176 -0.025442 1.372437 145755 CAN00WO0 199703 -0.0074 -0.049279 0.031939 148247 CAN00WO0 199704 -0.0075 -0.005948 -0.535855 150775 CAN00WO0 199705 0.0451 0.122322 0.341747