PostHeaderIcon Merge two datasets in Pandas

Pandas has a built-in functionality to perform SQL-like joins of two DataFrames.
just like python extend two list.

Create two DataFrames, one each from the companys and conpanys_an datasets:

import pandas as pd

company_data = 'company.csv'
company_an_data = 'company1.csv'
path = '~/Downloads/data/'
file_one = path+company_data
file_two = path+company_an_data
companys = pd.read_csv(file_one,
                    sep=',',
                    header=0,
                    index_col=0,
                    parse_dates=False,
                    tupleize_cols=False,
                    nrows=1000)

companys_an = pd.read_csv(file_two,
                    sep=',',
                    header=0,
                    index_col=0,
                    parse_dates=False,
                    tupleize_cols=False,
                    nrows=1000)
merged_data = pd.merge(companys, companys_an,
                    how='left', left_index=True,
                    right_index=True)

print merged_data

the result of it:

We first create two DataFrames, one for the companys data and another for the companys_an data. Each of these datasets contains millions of rows; so, for this recipe, we are only importing the first thousand records from each file. The index for each of the DataFrames will be the first row of data, which in this case is the companys index.

With the DataFrames created, we perform an SQL-like left join using the merge() method, specifying the DataFrames to be joined, the way to join them, and telling Pandas to use the index from both the left (companys) and the right (companys_an) DataFrames. What Pandas then does is attempt to join the rows from the DataFrames using the index values. Because Pandas requires a column to exist in both the DataFrames, we used the index to perform the join. By virtue of how we created the DataFrames, and by using the companys_index as the index of the DataFrame, Pandas is able to join the two DataFrames.

The result is a single DataFrame containing all the columns from both the DataFrames. Merge() can be used with any two DataFrames as long as the column that you use to join exists in both the DataFrames.

1656 views

Leave a Reply

Your email address will not be published. Required fields are marked *

*


*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>


Copyright © 2010 - C++ Technology. All Rights Reserved.

Powered by Jerry | Free Space Provided by connove.com