ValueError: Must pass DataFrame with boolean values only

Question

In this data file, the United States is divided into four regions using the "REGION" column.

Create a query that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE2014.

This function should return a 5×2 DataFrame with the columns = ['STNAME', 'CTYNAME'] and the same index ID as census_df (sorted by index).

CODE

    def answer_eight():
        counties=census_df[census_df['SUMLEV']==50]
        regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
        washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
        grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]
        return grew[grew['STNAME'],grew['COUNTY']]

    outcome = answer_eight()
    assert outcome.shape == (5,2)
    assert list (outcome.columns)== ['STNAME','CTYNAME']
    print(tabulate(outcome, headers=["index"]+list(outcome.columns),tablefmt="orgtbl"))

ERROR

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
     in ()
          6     return grew[grew['STNAME'],grew['COUNTY']]
          7
    ----> 8 outcome = answer_eight()
          9 assert outcome.shape == (5,2)
         10 assert list (outcome.columns)== ['STNAME','CTYNAME']

     in answer_eight()
          1 def answer_eight():
          2     counties=census_df[census_df['SUMLEV']==50]
    ----> 3     regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
          4     washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
          5     grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

    /opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
       1991             return self._getitem_array(key)
       1992         elif isinstance(key, DataFrame):
    -> 1993             return self._getitem_frame(key)
       1994         elif is_mi_columns:
       1995             return self._getitem_multilevel(key)

    /opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_frame(self, key)
       2066     def _getitem_frame(self, key):
       2067         if key.values.size and not com.is_bool_dtype(key.values):
    -> 2068             raise ValueError('Must pass DataFrame with boolean values only')
       2069         return self.where(key)
       2070

    ValueError: Must pass DataFrame with boolean values only

I'm clueless. Where am I going wrong?

Thanks

You are trying to use a DataFrame as the mask for your DataFrame, which is incorrect; the way you build the conditions is wrong as well. When you compare a column (a Series) with a scalar, the result is already a boolean mask. Pass just that condition to the indexer; do not index the DataFrame with it a second time.
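To make the difference concrete, here is a minimal sketch on a toy DataFrame (the values are made up for illustration and are not taken from the census file). It shows that a column comparison yields a boolean Series, while indexing with that comparison yields an already filtered DataFrame, which is not a mask:

```python
import pandas as pd

# Toy data; invented values, only the column names mirror the question.
df = pd.DataFrame({
    "REGION": [1, 2, 3, 4],
    "CTYNAME": ["Washington County", "Adams County",
                "Washington County", "Clark County"],
})

# Comparing a column with a scalar yields a boolean Series: this is the mask.
mask = (df["REGION"] == 1) | (df["REGION"] == 2)
print(mask.dtype)  # bool

# Indexing with the mask keeps only the rows where it is True.
subset = df[mask]

# df[df["REGION"] == 1] is an already filtered DataFrame, not a mask,
# so it must not be passed to df[...] again.
not_a_mask = df[df["REGION"] == 1]
print(type(not_a_mask).__name__)  # DataFrame
```

Note the parentheses around each comparison: `|` binds more tightly than `==` in Python, so they are required when combining conditions.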

    def answer_eight():
        counties=census_df[census_df['SUMLEV']==50]
        # this is wrong: you're passing the df here multiple times
        regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
        # here you're doing it again
        washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
        # and here you're doing it again as well
        grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]
        return grew[grew['STNAME'],grew['COUNTY']]

You want:

    def answer_eight():
        # keep county-level rows only
        counties = census_df[census_df['SUMLEV']==50]
        # each comparison is already a boolean mask; combine them with |
        regions = counties[(counties['REGION']==1) | (counties['REGION']==2)]
        # county names are in 'CTYNAME'
        washingtons = regions[regions['CTYNAME'].str.startswith("Washington")]
        grew = washingtons[washingtons['POPESTIMATE2015'] > washingtons['POPESTIMATE2014']]
        # select several columns with a list of names
        return grew[['STNAME','CTYNAME']]

    def answer_eight():
        df = census_df[census_df['SUMLEV']==50]
        df = df[(df['REGION']==1) | (df['REGION']==2)]
        df = df[df['CTYNAME'].str.startswith('Washington')]
        df = df[df['POPESTIMATE2015'] > df['POPESTIMATE2014']]
        df = df[['STNAME','CTYNAME']]
        print(df.shape)
        return df.head(5)

    def answer_eight():
        county = census_df[census_df['SUMLEV']==50]
        req_col = ['STNAME','CTYNAME']
        # all three conditions combined into a single mask
        region = county[(county['REGION']<3)
                        & (county['POPESTIMATE2015']>county['POPESTIMATE2014'])
                        & (county['CTYNAME'].str.startswith('Washington'))]
        region = region[req_col]
        return region

    answer_eight()
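
For anyone who wants to try the filtering without the course data, here is a minimal self-contained sketch. It assumes a tiny made-up stand-in for `census_df` (every value is invented), and the helper name `washington_growth` is hypothetical. It also uses `Series.isin` instead of two separate `REGION` comparisons, which is a stylistic choice and not something the assignment requires:

```python
import pandas as pd

# Hypothetical miniature stand-in for census_df; all values are invented.
census_df = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50, 50],
    "REGION": [1, 1, 2, 3, 2],
    "STNAME": ["Maine", "Maine", "Iowa", "Texas", "Ohio"],
    "CTYNAME": ["Maine", "Washington County", "Washington County",
                "Washington County", "Adams County"],
    "POPESTIMATE2014": [100, 10, 20, 30, 40],
    "POPESTIMATE2015": [101, 11, 19, 31, 41],
})

def washington_growth(df):
    """Return STNAME/CTYNAME for region 1 or 2 'Washington' counties that grew."""
    # Build one boolean Series; every term is a mask aligned on df's index.
    mask = (
        (df["SUMLEV"] == 50)
        & df["REGION"].isin([1, 2])
        & df["CTYNAME"].str.startswith("Washington")
        & (df["POPESTIMATE2015"] > df["POPESTIMATE2014"])
    )
    # .loc filters rows with the mask and selects columns in a single step.
    return df.loc[mask, ["STNAME", "CTYNAME"]]

result = washington_growth(census_df)
print(result)
```

Because the rows are only filtered, never reordered, the result keeps the original index labels of `census_df`.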