Convert text categories to numbers in Pandas

When you have text categories in your data, you can dramatically speed up the processing of that data using Pandas categoricals. Categoricals encode the text as numerics, which allows us to take full advantage of Pandas’ fast C code. Examples of times when you’d use categoricals are stock symbols, gender, experiment outcomes, states, and in this case, a customer loyalty level.
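To make the "text as numerics" idea concrete, here is a minimal sketch that uses the same loyalty labels as the example in this post, repeated to mimic a larger column. The variable names and the repetition factor are illustrative assumptions, and the exact byte counts will vary with your Pandas version.

```python
import pandas as pd

# The four loyalty labels used in this post, repeated to mimic a larger column.
levels = pd.Series(['not at all', 'moderate', 'moderate', 'highly loyal'] * 10000)
as_cat = levels.astype('category')

# Each distinct label is stored once, in sorted order ...
categories = as_cat.cat.categories.tolist()
# ... and every row holds a small integer code pointing into that list.
first_codes = as_cat.cat.codes[:4].tolist()

print(categories)
print(first_codes)

# Compare the memory footprint of the text column and the categorical column.
text_bytes = levels.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(text_bytes, cat_bytes)
```

The categories come out as ['highly loyal', 'moderate', 'not at all'] and the first four codes as [2, 1, 1, 0], so each row stores only a small integer instead of a full string.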

We create a new DataFrame to work with, as shown below:

import pandas as pd

# Build a small DataFrame of sample customer records
df = pd.DataFrame({
'people' :
    ["cole o'brien", "lise heidenreich", "zilpha skiles", "damion wisozk"],
'age' :
    [24, 35, 46, 57],
'ssn':
    ['6439', '689 24 9939', '306-05-2792', '992245832'],
'birth_date':
    ['1987-08-01', '1988-02-14', '1992-10-23', '1980-01-26'],
'customer_loyalty_level' :
    ['not at all', 'moderate', 'moderate', 'highly loyal']})

# Convert the text column to a categorical column
df.customer_loyalty_level = df.customer_loyalty_level.astype('category')
print('----after----')
print(df.customer_loyalty_level)
# Check the data type of every column
print(df.dtypes)

A closer look at the code above:
First, convert the customer_loyalty_level column to a category type column:

df.customer_loyalty_level = df.customer_loyalty_level.astype('category')

Next, print out the column:

df.customer_loyalty_level

The result still shows the original text labels rather than numbers.

After creating the DataFrame, we use a single line of code to convert the customer_loyalty_level column to a categorical. Because printing the column shows the original text, the way to confirm that the conversion worked is to print the dtypes (data types) with df.dtypes, which lists the type of data in each column. The customer_loyalty_level column should now be reported as category.
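The dtype check described above can also be done programmatically. The following is a small sketch rather than part of the original recipe: it uses a cut-down, hypothetical version of the DataFrame and compares the dtype name before and after the conversion.

```python
import pandas as pd

# Cut-down, hypothetical version of the DataFrame from this post.
df = pd.DataFrame({
    'age': [24, 35, 46, 57],
    'customer_loyalty_level':
        ['not at all', 'moderate', 'moderate', 'highly loyal']})

# Record the dtype name before the conversion (a text dtype at this point).
before = df['customer_loyalty_level'].dtype.name

df['customer_loyalty_level'] = df['customer_loyalty_level'].astype('category')

# After the conversion, the dtype name is 'category'.
after = df['customer_loyalty_level'].dtype.name

print(before, '->', after)
print(df.dtypes)
```

The printed values of the column look the same before and after the conversion, so the dtype name is the reliable signal.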

The astype() method converts a column from one data type to another. In this recipe, we use it to convert an object type column to a category type column. Another common use is to convert text to numeric values such as integers and floats.
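As an illustration of that other use of astype(), here is a minimal sketch that converts numbers stored as text into integers and floats. The Series name and its values are hypothetical (they mirror the ages used earlier), and the conversion assumes every string is a clean number, since astype() raises an error otherwise.

```python
import pandas as pd

# Hypothetical column of ages that were read in as text.
ages_text = pd.Series(['24', '35', '46', '57'])

# Convert the text to integers and to floats.
ages_int = ages_text.astype(int)
ages_float = ages_text.astype(float)

# Arithmetic now works on the numeric values.
print(ages_int.sum())
print(ages_float.mean())
```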


Copyright © 2010 - C++ Technology. All Rights Reserved.