- You should install Pythonxy, and install it on your computer using complete installation
- type "pip install MySQL-python" in your windows command line window, you may have the following problem: "unable to find vcvarsall.bat". This is because python can't find the visual studio C++ compiler properly. Execute the following command based on the version of Visual Studio installed:
- Visual Studio 2010 (VS10):
SET VS90COMNTOOLS=%VS100COMNTOOLS%
- Visual Studio 2012 (VS11):
SET VS90COMNTOOLS=%VS110COMNTOOLS%
- Visual Studio 2013 (VS12):
SET VS90COMNTOOLS=%VS120COMNTOOLS%
- After setting the C++ compiler correctly, you may have the following problem: "Cannot open include file: 'config-win.h'". This is because MySQL Connector was not installed or is not configured properly. Install MySQL Connector, and you can refer to the following post for configuration: http://stackoverflow.com/questions/1972259/mysql-python-install-problem-using-virtualenv-windows-pip
This blog discuss data science, machine learning, statistics and big data technologies
Monday, December 22, 2014
Properly Installing MySQLdb python package in Windows environment
Tuesday, October 28, 2014
Import .csv file into MySQL database using Python
Importing .csv files into MySQL database can be annoying, especially when there are many columns in the .csv file. Using the conventional way you have to create the columns in MySQL first and specify the data types of each columns. In this article we demonstrate how to use Python to convert .csv file to MySQL database tables by creating the table and storing the data automatically
1. Connecting Python with MySQL¶
First we import necessary Python packages
import pandas as pd # pandas is used to read csv files
import MySQLdb # MySQLdb is used to connect Python with MySQL
Then we connect Python to MySQL database. We will not discuss details on the use of MySQL, such as how to create new users or table in MySQL.
db = MySQLdb.connect("localhost", "test", "mysql", "test")
# MySQLdb.connect(database server IP, username, password, database name)
Let's read a .csv file and show the first 5 lines of the .csv file
quandl_stock_w_fundamental_list = pd.read_csv(
"https://s3.amazonaws.com/quandl-static-content/Ticker+CSV%27s/" + \
"secwiki_tickers.csv").dropna()
quandl_stock_w_fundamental_list[:5]
Ticker | Name | Sector | Industry | Price | Collection | |
---|---|---|---|---|---|---|
0 | A | Agilent Technologies Inc. | Healthcare | Medical Laboratories & Research | WIKI/A | stocks/a |
1 | AA | Alcoa Inc. | Basic Materials | Aluminum | WIKI/AA | stocks/aa |
3 | AAN | Aaron's, Inc. | Services | Rental & Leasing Services | WIKI/AAN | stocks/aan |
4 | AAON | AAON Inc. | Industrial Goods | General Building Materials | WIKI/AAON | stocks/aaon |
5 | AAP | Advance Auto Parts Inc. | Services | Auto Parts Stores | WIKI/AAP | stocks/aap |
5 rows × 6 columns
We convert all the numbers in the dataframe to float to simply the process
for col in quandl_stock_w_fundamental_list.columns:
try:
quandl_stock_w_fundamental_list[col] = \
quandl_stock_w_fundamental_list[col].astype(float)
except:
pass
2. Create table in database and insert data¶
Below we define a function which takes a pandas dataframe as input, creates a table in MySQL and insert the data in the dataframe into the table in MySQL
def create_MySQL_table(db, tableName, dataframe, verbose = False):
"""
db: MySQLdb connection
tableName: the name of the table to be created in MySQL
dataframe: the dataframe read from the .csv file
verbose: will print the generated sql command if True
"""
cursor = db.cursor()
# remove the table if the tableName already exists
cursor.execute("DROP TABLE IF EXISTS " + tableName)
# create the table
sql = "CREATE TABLE " + tableName + " ("
for col in dataframe.columns:
if (dataframe[col].dtypes) == dtype('object'):
sql = sql + col + " VARCHAR(255), "
else:
sql = sql + col + " DOUBLE, "
sql = sql.strip(", ") + ")"
if verbose:
print sql
try:
cursor.execute(sql)
db.commit()
except:
db.rollback()
print "SQL operation failed: "
print sql
# insert the values into the table
colStr = "" # generate a string of column names separated by ", "
for colName in dataframe.columns:
colStr += colName + ", "
colStr = colStr.strip(", ")
for row in dataframe.iterrows():
sql = "INSERT INTO " + tableName + " ("
sql += colStr + ") VALUES ("
formatStr = ""; valueStr = ""
sql += ", ".join(str(row[1].values).strip("[]").split(' ')) + ")"
if verbose:
print sql
try:
cursor.execute(sql)
db.commit()
except:
db.rollback()
print "SQL operation failed: "
print sql
create_MySQL_table(db, "testTable", quandl_stock_w_fundamental_list[:5],
verbose = False)
We read from the table to make sure that the table creation and data insertion was successful
cursor = db.cursor()
cursor.execute("SELECT * FROM test.testtable;")
cursor.fetchall()
(('A', 'Agilent, Technologies, Inc.', 'Healthcare', 'Medical, Laboratories, &, Research', 'WIKI/A', 'stocks/a'), ('AA', 'Alcoa, Inc.', 'Basic, Materials', 'Aluminum', 'WIKI/AA', 'stocks/aa'), ('AAN', "Aaron's,, Inc.", 'Services', 'Rental, &, Leasing, Services', 'WIKI/AAN', 'stocks/aan'), ('AAON', 'AAON, Inc.', 'Industrial, Goods', 'General, Building, Materials', 'WIKI/AAON', 'stocks/aaon'), ('AAP', 'Advance, Auto, Parts, Inc.', 'Services', 'Auto, Parts, Stores', 'WIKI/AAP', 'stocks/aap'))
Clearly the table was successfully created and data insertion was succesful as well. Finally we close the connection to the database
db.close()