Importing .csv files into MySQL database can be annoying, especially when there are many columns in the .csv file. Using the conventional way you have to create the columns in MySQL first and specify the data types of each columns. In this article we demonstrate how to use Python to convert .csv file to MySQL database tables by creating the table and storing the data automatically
1. Connecting Python with MySQL¶
First we import necessary Python packages
import pandas as pd # pandas is used to read csv files
import MySQLdb # MySQLdb is used to connect Python with MySQL
Then we connect Python to MySQL database. We will not discuss details on the use of MySQL, such as how to create new users or table in MySQL.
db = MySQLdb.connect("localhost", "test", "mysql", "test")
# MySQLdb.connect(database server IP, username, password, database name)
Let's read a .csv file and show the first 5 lines of the .csv file
quandl_stock_w_fundamental_list = pd.read_csv(
"https://s3.amazonaws.com/quandl-static-content/Ticker+CSV%27s/" + \
"secwiki_tickers.csv").dropna()
quandl_stock_w_fundamental_list[:5]
We convert all the numbers in the dataframe to float to simply the process
for col in quandl_stock_w_fundamental_list.columns:
try:
quandl_stock_w_fundamental_list[col] = \
quandl_stock_w_fundamental_list[col].astype(float)
except:
pass
2. Create table in database and insert data¶
Below we define a function which takes a pandas dataframe as input, creates a table in MySQL and insert the data in the dataframe into the table in MySQL
def create_MySQL_table(db, tableName, dataframe, verbose = False):
"""
db: MySQLdb connection
tableName: the name of the table to be created in MySQL
dataframe: the dataframe read from the .csv file
verbose: will print the generated sql command if True
"""
cursor = db.cursor()
# remove the table if the tableName already exists
cursor.execute("DROP TABLE IF EXISTS " + tableName)
# create the table
sql = "CREATE TABLE " + tableName + " ("
for col in dataframe.columns:
if (dataframe[col].dtypes) == dtype('object'):
sql = sql + col + " VARCHAR(255), "
else:
sql = sql + col + " DOUBLE, "
sql = sql.strip(", ") + ")"
if verbose:
print sql
try:
cursor.execute(sql)
db.commit()
except:
db.rollback()
print "SQL operation failed: "
print sql
# insert the values into the table
colStr = "" # generate a string of column names separated by ", "
for colName in dataframe.columns:
colStr += colName + ", "
colStr = colStr.strip(", ")
for row in dataframe.iterrows():
sql = "INSERT INTO " + tableName + " ("
sql += colStr + ") VALUES ("
formatStr = ""; valueStr = ""
sql += ", ".join(str(row[1].values).strip("[]").split(' ')) + ")"
if verbose:
print sql
try:
cursor.execute(sql)
db.commit()
except:
db.rollback()
print "SQL operation failed: "
print sql
create_MySQL_table(db, "testTable", quandl_stock_w_fundamental_list[:5],
verbose = False)
We read from the table to make sure that the table creation and data insertion was successful
cursor = db.cursor()
cursor.execute("SELECT * FROM test.testtable;")
cursor.fetchall()
Clearly the table was successfully created and data insertion was succesful as well. Finally we close the connection to the database
db.close()
This comment has been removed by the author.
ReplyDeleteCan you write the updated code for Python 3? I am getting the following error :
ReplyDeletein
1 create_MySQL_table(db, "testTable", quandl_stock_w_fundamental_list[:5],
----> 2 verbose = False)
in create_MySQL_table(db, tableName, dataframe, verbose)
12 sql = "CREATE TABLE " + tableName + " ("
13 for col in dataframe.columns:
---> 14 if (dataframe[col].dtypes) == (dtype('object')):
15 sql = sql + col + " VARCHAR(255), "
16 else:
NameError: name 'dtype' is not defined
import numpy as np # add numpy libs
Delete#and change if (dataframe[col].dtypes) == (dtype('object')): on
if (dataframe[col].dtypes) == np.dtype('object'):
How much of an unique article, keep on posting better half programming socks
ReplyDelete