Tuesday, October 28, 2014

Import .csv file into MySQL database using Python

Notebook

Importing .csv files into MySQL database can be annoying, especially when there are many columns in the .csv file. Using the conventional way you have to create the columns in MySQL first and specify the data types of each columns. In this article we demonstrate how to use Python to convert .csv file to MySQL database tables by creating the table and storing the data automatically

1. Connecting Python with MySQL

First we import necessary Python packages

In [1]:
import pandas as pd # pandas is used to read csv files
import MySQLdb # MySQLdb is used to connect Python with MySQL

Then we connect Python to MySQL database. We will not discuss details on the use of MySQL, such as how to create new users or table in MySQL.

In [2]:
db = MySQLdb.connect("localhost", "test", "mysql", "test")
# MySQLdb.connect(database server IP, username, password, database name)

Let's read a .csv file and show the first 5 lines of the .csv file

In [3]:
quandl_stock_w_fundamental_list = pd.read_csv(
    "https://s3.amazonaws.com/quandl-static-content/Ticker+CSV%27s/" + \
    "secwiki_tickers.csv").dropna()
quandl_stock_w_fundamental_list[:5]
Out[3]:
Ticker Name Sector Industry Price Collection
0 A Agilent Technologies Inc. Healthcare Medical Laboratories & Research WIKI/A stocks/a
1 AA Alcoa Inc. Basic Materials Aluminum WIKI/AA stocks/aa
3 AAN Aaron's, Inc. Services Rental & Leasing Services WIKI/AAN stocks/aan
4 AAON AAON Inc. Industrial Goods General Building Materials WIKI/AAON stocks/aaon
5 AAP Advance Auto Parts Inc. Services Auto Parts Stores WIKI/AAP stocks/aap

5 rows × 6 columns

We convert all the numbers in the dataframe to float to simply the process

In [59]:
for col in quandl_stock_w_fundamental_list.columns:
    try:
        quandl_stock_w_fundamental_list[col] = \
            quandl_stock_w_fundamental_list[col].astype(float)
    except:
        pass

2. Create table in database and insert data

Below we define a function which takes a pandas dataframe as input, creates a table in MySQL and insert the data in the dataframe into the table in MySQL

In [4]:
def create_MySQL_table(db, tableName, dataframe, verbose = False):
    """
    db: MySQLdb connection
    tableName: the name of the table to be created in MySQL
    dataframe: the dataframe read from the .csv file
    verbose: will print the generated sql command if True
    """
    cursor = db.cursor()
    # remove the table if the tableName already exists
    cursor.execute("DROP TABLE IF EXISTS " + tableName)
    # create the table
    sql = "CREATE TABLE " + tableName + " ("
    for col in dataframe.columns:
        if (dataframe[col].dtypes) == dtype('object'):
            sql = sql + col + " VARCHAR(255), "
        else:
            sql = sql + col + " DOUBLE, "
    sql = sql.strip(", ") + ")"
    if verbose:
        print sql
    try:
        cursor.execute(sql) 
        db.commit()
    except:
        db.rollback()
        print "SQL operation failed: "
        print sql
    
    # insert the values into the table
    colStr = "" # generate a string of column names separated by ", "
    for colName in dataframe.columns:
        colStr += colName + ", "
    colStr = colStr.strip(", ")
    for row in dataframe.iterrows():
        sql = "INSERT INTO " + tableName + " ("
        sql += colStr + ") VALUES ("
        formatStr = ""; valueStr = ""
        sql += ", ".join(str(row[1].values).strip("[]").split(' ')) + ")"
        if verbose:
            print sql
        try:
            cursor.execute(sql) 
            db.commit()
        except:
            db.rollback()
            print "SQL operation failed: "
            print sql
        
In [8]:
create_MySQL_table(db, "testTable", quandl_stock_w_fundamental_list[:5], 
                   verbose = False)

We read from the table to make sure that the table creation and data insertion was successful

In [9]:
cursor = db.cursor()
cursor.execute("SELECT * FROM test.testtable;")
cursor.fetchall()
Out[9]:
(('A',
  'Agilent, Technologies, Inc.',
  'Healthcare',
  'Medical, Laboratories, &, Research',
  'WIKI/A',
  'stocks/a'),
 ('AA', 'Alcoa, Inc.', 'Basic, Materials', 'Aluminum', 'WIKI/AA', 'stocks/aa'),
 ('AAN',
  "Aaron's,, Inc.",
  'Services',
  'Rental, &, Leasing, Services',
  'WIKI/AAN',
  'stocks/aan'),
 ('AAON',
  'AAON, Inc.',
  'Industrial, Goods',
  'General, Building, Materials',
  'WIKI/AAON',
  'stocks/aaon'),
 ('AAP',
  'Advance, Auto, Parts, Inc.',
  'Services',
  'Auto, Parts, Stores',
  'WIKI/AAP',
  'stocks/aap'))

Clearly the table was successfully created and data insertion was succesful as well. Finally we close the connection to the database

In [74]:
db.close()

4 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Can you write the updated code for Python 3? I am getting the following error :


    in
    1 create_MySQL_table(db, "testTable", quandl_stock_w_fundamental_list[:5],
    ----> 2 verbose = False)

    in create_MySQL_table(db, tableName, dataframe, verbose)
    12 sql = "CREATE TABLE " + tableName + " ("
    13 for col in dataframe.columns:
    ---> 14 if (dataframe[col].dtypes) == (dtype('object')):
    15 sql = sql + col + " VARCHAR(255), "
    16 else:

    NameError: name 'dtype' is not defined

    ReplyDelete
    Replies
    1. import numpy as np # add numpy libs
      #and change if (dataframe[col].dtypes) == (dtype('object')): on
      if (dataframe[col].dtypes) == np.dtype('object'):

      Delete
  3. How much of an unique article, keep on posting better half programming socks

    ReplyDelete