Monday, December 22, 2014

Properly Installing MySQLdb python package in Windows environment

A few problems can happen when you install MySQLdb python package in Windows. I summarize the problems and their solutions here:
  1. You should install Pythonxy, and install it on your computer using complete installation
  2. type "pip install MySQL-python" in your windows command line window, you may have the following problem: "unable to find vcvarsall.bat". This is because python can't find the visual studio C++ compiler properly. Execute the following command based on the version of Visual Studio installed:
    • Visual Studio 2010 (VS10): SET VS90COMNTOOLS=%VS100COMNTOOLS%
    • Visual Studio 2012 (VS11): SET VS90COMNTOOLS=%VS110COMNTOOLS%
    • Visual Studio 2013 (VS12): SET VS90COMNTOOLS=%VS120COMNTOOLS%
  3. After setting the C++ compiler correctly, you may have the following problem: "Cannot open include file: 'config-win.h'". This is because MySQL Connector was not installed or is not configured properly. Install MySQL Connector, and you can refer to the following post for configuration: http://stackoverflow.com/questions/1972259/mysql-python-install-problem-using-virtualenv-windows-pip

If you can read Chinese, actually this post is a good summary:
http://f.dataguru.cn/thread-133159-1-1.html

Tuesday, October 28, 2014

Import .csv file into MySQL database using Python

Notebook

Importing .csv files into MySQL database can be annoying, especially when there are many columns in the .csv file. Using the conventional way you have to create the columns in MySQL first and specify the data types of each columns. In this article we demonstrate how to use Python to convert .csv file to MySQL database tables by creating the table and storing the data automatically

1. Connecting Python with MySQL

First we import necessary Python packages

In [1]:
import pandas as pd # pandas is used to read csv files
import MySQLdb # MySQLdb is used to connect Python with MySQL

Then we connect Python to MySQL database. We will not discuss details on the use of MySQL, such as how to create new users or table in MySQL.

In [2]:
db = MySQLdb.connect("localhost", "test", "mysql", "test")
# MySQLdb.connect(database server IP, username, password, database name)

Let's read a .csv file and show the first 5 lines of the .csv file

In [3]:
quandl_stock_w_fundamental_list = pd.read_csv(
    "https://s3.amazonaws.com/quandl-static-content/Ticker+CSV%27s/" + \
    "secwiki_tickers.csv").dropna()
quandl_stock_w_fundamental_list[:5]
Out[3]:
Ticker Name Sector Industry Price Collection
0 A Agilent Technologies Inc. Healthcare Medical Laboratories & Research WIKI/A stocks/a
1 AA Alcoa Inc. Basic Materials Aluminum WIKI/AA stocks/aa
3 AAN Aaron's, Inc. Services Rental & Leasing Services WIKI/AAN stocks/aan
4 AAON AAON Inc. Industrial Goods General Building Materials WIKI/AAON stocks/aaon
5 AAP Advance Auto Parts Inc. Services Auto Parts Stores WIKI/AAP stocks/aap

5 rows × 6 columns

We convert all the numbers in the dataframe to float to simply the process

In [59]:
for col in quandl_stock_w_fundamental_list.columns:
    try:
        quandl_stock_w_fundamental_list[col] = \
            quandl_stock_w_fundamental_list[col].astype(float)
    except:
        pass

2. Create table in database and insert data

Below we define a function which takes a pandas dataframe as input, creates a table in MySQL and insert the data in the dataframe into the table in MySQL

In [4]:
def create_MySQL_table(db, tableName, dataframe, verbose = False):
    """
    db: MySQLdb connection
    tableName: the name of the table to be created in MySQL
    dataframe: the dataframe read from the .csv file
    verbose: will print the generated sql command if True
    """
    cursor = db.cursor()
    # remove the table if the tableName already exists
    cursor.execute("DROP TABLE IF EXISTS " + tableName)
    # create the table
    sql = "CREATE TABLE " + tableName + " ("
    for col in dataframe.columns:
        if (dataframe[col].dtypes) == dtype('object'):
            sql = sql + col + " VARCHAR(255), "
        else:
            sql = sql + col + " DOUBLE, "
    sql = sql.strip(", ") + ")"
    if verbose:
        print sql
    try:
        cursor.execute(sql) 
        db.commit()
    except:
        db.rollback()
        print "SQL operation failed: "
        print sql
    
    # insert the values into the table
    colStr = "" # generate a string of column names separated by ", "
    for colName in dataframe.columns:
        colStr += colName + ", "
    colStr = colStr.strip(", ")
    for row in dataframe.iterrows():
        sql = "INSERT INTO " + tableName + " ("
        sql += colStr + ") VALUES ("
        formatStr = ""; valueStr = ""
        sql += ", ".join(str(row[1].values).strip("[]").split(' ')) + ")"
        if verbose:
            print sql
        try:
            cursor.execute(sql) 
            db.commit()
        except:
            db.rollback()
            print "SQL operation failed: "
            print sql
        
In [8]:
create_MySQL_table(db, "testTable", quandl_stock_w_fundamental_list[:5], 
                   verbose = False)

We read from the table to make sure that the table creation and data insertion was successful

In [9]:
cursor = db.cursor()
cursor.execute("SELECT * FROM test.testtable;")
cursor.fetchall()
Out[9]:
(('A',
  'Agilent, Technologies, Inc.',
  'Healthcare',
  'Medical, Laboratories, &, Research',
  'WIKI/A',
  'stocks/a'),
 ('AA', 'Alcoa, Inc.', 'Basic, Materials', 'Aluminum', 'WIKI/AA', 'stocks/aa'),
 ('AAN',
  "Aaron's,, Inc.",
  'Services',
  'Rental, &, Leasing, Services',
  'WIKI/AAN',
  'stocks/aan'),
 ('AAON',
  'AAON, Inc.',
  'Industrial, Goods',
  'General, Building, Materials',
  'WIKI/AAON',
  'stocks/aaon'),
 ('AAP',
  'Advance, Auto, Parts, Inc.',
  'Services',
  'Auto, Parts, Stores',
  'WIKI/AAP',
  'stocks/aap'))

Clearly the table was successfully created and data insertion was succesful as well. Finally we close the connection to the database

In [74]:
db.close()