SQLALCHEMY / PANDAS – SQLAlchemy leyendo la columna como CLOB para Pandas to_sql

He escrito un conjunto de datos en un dataframe.

inv.info()  Int64Index: 43839 entries, 0 to 43838 Data columns (total 16 columns): MST_CO 43839 non-null object LOAD_DATE 43839 non-null object WHSE_CODE 43839 non-null object ITEM_NO 43839 non-null object ITEM_ID 43839 non-null int64 LOCATION 43839 non-null object LOT_NO 43839 non-null object LOT_STATUS 43833 non-null object LOT_CREATED_DATE 43839 non-null datetime64[ns] LOT_EXPIRATION_DATE 43839 non-null object DATE_RECEIVED 43839 non-null datetime64[ns] ONHAND_QTY1 43839 non-null float64 UOM1 43839 non-null object ONHAND_QTY2 43418 non-null float64 UOM2 43408 non-null object SOURCE 43839 non-null object dtypes: datetime64[ns](2), float64(2), int64(1), object(11) 

Cuando bash escribir el dataframe en SQL, obtengo los siguientes errores.

 inv.to_sql('inventory', db2, 'MST', if_exists='append', index=False, chunksize=3000) 2015-03-30 09:33:10,656 INFO sqlalchemy.engine.base.Engine SELECT "SYSCAT"."TABLES"."TABNAME" FROM "SYSCAT"."TABLES" WHERE "SYSCAT"."TABLES"."TABSCHEMA" = ? AND "SYSCAT"."TABLES"."TABNAME" = ? 2015-03-30 09:33:10,656 INFO sqlalchemy.engine.base.Engine ('DWETL', 'INVENTORY') 2015-03-30 09:33:10,731 INFO sqlalchemy.engine.base.Engine CREATE TABLE inventory ( "MST_CO" CLOB, "LOAD_DATE" DATE, "WHSE_CODE" CLOB, "ITEM_NO" CLOB, "ITEM_ID" BIGINT, "LOCATION" CLOB, "LOT_NO" CLOB, "LOT_STATUS" CLOB, "LOT_CREATED_DATE" TIMESTAMP, "LOT_EXPIRATION_DATE" DATE, "DATE_RECEIVED" TIMESTAMP, "ONHAND_QTY1" FLOAT(53), "UOM1" CLOB, "ONHAND_QTY2" FLOAT(53), "UOM2" CLOB, "SOURCE" CLOB ) 2015-03-30 09:33:10,731 INFO sqlalchemy.engine.base.Engine () 2015-03-30 09:33:10,755 INFO sqlalchemy.engine.base.Engine ROLLBACK Traceback (most recent call last): File "", line 1, in  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 977, in to_sql dtype=dtype) File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 538, in to_sql chunksize=chunksize, dtype=dtype) File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 1176, in to_sql table.create() File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 649, in create self._execute_create() File "/usr/local/lib/python2.7/dist-packages/pandas/io/sql.py", line 634, in _execute_create self.table.create() File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/schema.py", line 707, in create checkfirst=checkfirst) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1728, in _run_visitor conn._run_visitor(visitorcallable, element, **kwargs) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1357, in _run_visitor **kwargs).traverse_single(element) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/visitors.py", line 120, in traverse_single return meth(obj, **kw) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/ddl.py", line 732, in visit_table self.connection.execute(CreateTable(table)) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 841, in execute return meth(self, multiparams, params) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/ddl.py", line 69, in _execute_on_connection return connection._execute_ddl(self, multiparams, params) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 895, in _execute_ddl compiled File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1070, in _execute_context context) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1271, in _handle_dbapi_exception exc_info File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1063, in _execute_context context) File "/usr/local/lib/python2.7/dist-packages/ibm_db_sa/ibm_db.py", line 106, in do_execute cursor.execute(statement, parameters) File "/usr/local/lib/python2.7/dist-packages/ibm_db_dbi.py", line 1335, in execute self._execute_helper(parameters) File "/usr/local/lib/python2.7/dist-packages/ibm_db_dbi.py", line 1247, in _execute_helper raise self.messages[len(self.messages) - 1] sqlalchemy.exc.ProgrammingError: (ProgrammingError) ibm_db_dbi::ProgrammingError: Statement Execute Failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL1666N The table definition statement failed because some functionality was specified in the table definition that is not supported with the table type. Unsupported functionality: "CLOB". SQLSTATE=42613 SQLCODE=-1666 '\nCREATE TABLE inventory (\n\t"MST_CO" CLOB, \n\t"LOAD_DATE" DATE, \n\t"WHSE_CODE" CLOB, \n\t"ITEM_NO" CLOB, \n\t"ITEM_ID" BIGINT, \n\t"LOCATION" CLOB, \n\t"LOT_NO" CLOB, \n\t"LOT_STATUS" CLOB, \n\t"LOT_CREATED_DATE" TIMESTAMP, \n\t"LOT_EXPIRATION_DATE" DATE, \n\t"DATE_RECEIVED" TIMESTAMP, \n\t"ONHAND_QTY1" FLOAT(53), \n\t"UOM1" CLOB, \n\t"ONHAND_QTY2" FLOAT(53), \n\t"UOM2" CLOB, \n\t"SOURCE" CLOB\n)\n\n' () 

¿Cómo puedo hacer que SQLAlchemy y Pandas jueguen bien? Solo necesito convertir el CLOB para que se lea como STR. ¡Gracias!

Puede especificar el tipo de SQL que se usará para una determinada columna con el argumento de palabra clave dtype (ver documentos ):

 from sqlalchemy.types import String inv.to_sql('inventory', db2, dtype={'col_name': String}) 

De forma predeterminada, pandas usa el tipo de TEXT para las columnas objeto / cadena, que sqlalchemy asigna a CLOB o TEXT , pero al parecer su base de datos no conoce este tipo. Así que con lo anterior, puede especificar manualmente un tipo que su base de datos sabe.

Parece que tienes múltiples columnas. Así que para hacer esta asignación automática para todas las columnas dtyped objeto. Puedes seleccionar las columnas con objeto dtype:

 cols = df.dtypes[df.dtypes=='object'].index type_mapping = {col : String for col in cols } inv.to_sql('inventory', db2, dtype=type_mapping)