Python Visualizing Data

“I believe that visualization is one of the most powerful means of achieving personal goals.”
Harvey Mackay

A wide variety of tools exists for visualizing data.
We will be using the matplotlib library.

In particular, we will be using the matplotlib.pyplot module. In its simplest use, pyplot maintains an internal state in which you build up a visualization step by step. Once you are done, you can save it (with savefig()) or display it (with show()).
Let us look at one simple example:
[Figure: Python Visualizing Data]

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Jerry'
from matplotlib import pyplot as cplt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
price = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# create a line chart: years on the x-axis, price on the y-axis
cplt.plot(years, price, color='blue', marker='o', linestyle='solid')

# add a title
cplt.title("years of company price")

# add a label to the y-axis
cplt.ylabel("Millions of RMB")

# display the chart
cplt.show()
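
pyplot can also write a chart to a file with savefig() instead of opening a window. The following is a minimal sketch of that option using the same data. The file name company_price.png, the temporary directory, and the non-interactive Agg backend are assumptions made only for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as cplt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
price = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

cplt.plot(years, price, color='blue', marker='o', linestyle='solid')
cplt.title("years of company price")
cplt.ylabel("Millions of RMB")

# hypothetical output location, chosen only for this sketch
out_path = os.path.join(tempfile.gettempdir(), "company_price.png")
cplt.savefig(out_path)

# close the figure so pyplot's internal state is clean for the next chart
cplt.close()
```

The image format is inferred from the file extension, so a name ending in .pdf would produce a PDF instead.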

Python Bar Charts

A bar chart is a good choice when you want to show how some quantity varies among some discrete set of items.

For instance, the figure below shows how many movie awards were won by each star.
[Figure: Python Bar Charts]
# bars are by default width 0.8, so we add 0.1 to the left coordinates so that each bar is centered
# then we plot bars with left x-coordinates [xs] and heights [num_ads]
# last we label the x-axis with star names at the bar centers
Here’s the code:

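#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Jerry'
from matplotlib import pyplot as cplt

stars = ["TingFeng Xie", "DeHua Liu", "ChaoWei Liang", "QingYun Liu",
          "JiaHui Liang", "JiaHui Zhang"]
num_ads = [5, 11, 13, 8, 10, 7]

# left x-coordinates: shift by 0.1 so each 0.8-wide bar sits in the middle of its slot
xs = [i + 0.1 for i, _ in enumerate(stars)]

# plot bars with left x-coordinates [xs] and heights [num_ads];
# align='edge' tells matplotlib that xs are left edges
# (recent matplotlib versions center bars on x by default)
cplt.bar(xs, num_ads, align='edge')

cplt.ylabel("# of Movie Awards")

cplt.title("Super Stars")

# label the x-axis with star names at the bar centers
cplt.xticks([i + 0.5 for i, _ in enumerate(stars)], stars)

cplt.show()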
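
In recent matplotlib releases, bar() centers each bar on its x-coordinate by default, so the manual 0.1 and 0.5 offsets are optional. Here is a minimal sketch of that variant, assuming the default align='center'. The `positions` name is introduced here for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as cplt

stars = ["TingFeng Xie", "DeHua Liu", "ChaoWei Liang", "QingYun Liu",
         "JiaHui Liang", "JiaHui Zhang"]
num_ads = [5, 11, 13, 8, 10, 7]

# one integer position per star; each bar is centered on its position
positions = list(range(len(stars)))
bars = cplt.bar(positions, num_ads)

cplt.ylabel("# of Movie Awards")
cplt.title("Super Stars")

# tick labels go at the same positions as the bar centers
cplt.xticks(positions, stars)

# call cplt.show() or cplt.savefig(...) here to see the chart
```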

Python Distribute Value Bar Chart

A bar chart can also be a good choice for plotting histograms of bucketed numeric values.
Let us see a short example. We have an array of scores:
# array [83,95,91,87,70,0,85,82,100,67,73,77,0]
The resulting chart for this array is shown below:
[Figure: distribute value bar chart]
The code is also simple:

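#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'Jerry'
from matplotlib import pyplot as cplt

# A Counter turns a sequence of values into
# a defaultdict(int)-like object mapping keys to counts.

from collections import Counter

scores = [83,95,91,87,70,0,85,82,100,67,73,77,0]

# round each score down to its bucket of ten (floor division, so 83 -> 80)
den = lambda g: g // 10 * 10

hgram = Counter(den(score) for score in scores)

# shift each bar left by 4 and give it width 8, so it is centered on its bucket;
# align='edge' tells matplotlib that the x values are left edges
cplt.bar([x - 4 for x in hgram.keys()], list(hgram.values()), 8, align='edge')

# x-axis from -5 to 105, y-axis from 0 to 5
cplt.axis([-5, 105, 0, 5])

# x-axis labels at 0, 10, ..., 100
cplt.xticks([10*i for i in range(11)])

cplt.ylabel("# of Students")
cplt.title("Exam On Class One")
cplt.show()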
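
To see what the bars represent, the bucket counts can be computed without plotting anything. This is a small sketch using the same scores. The helper name `decile` is introduced here for illustration, and floor division (//) is used so that scores round down to a multiple of ten under Python 3.

```python
from collections import Counter

scores = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]


def decile(score):
    """Round a score down to the nearest multiple of ten."""
    return score // 10 * 10


hgram = Counter(decile(score) for score in scores)

# print each bucket with its count, lowest bucket first
for bucket in sorted(hgram):
    print(bucket, hgram[bucket])
# prints:
# 0 2
# 60 1
# 70 3
# 80 4
# 90 2
# 100 1
```

The tallest bar is the 80 bucket with four students, which is why a y-axis limit of 5 leaves a little headroom.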

Create a Pandas DataFrame from a MongoDB query

To create a Pandas DataFrame from a MongoDB query, we will leverage our knowledge of creating MongoDB queries to get the information that we want.

Before running a query against MongoDB, determine the information you want to look at. By creating a query filter, you will save time by only retrieving the information that you want. This is very important when you have millions or billions of rows of data.
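
As a rough sketch of that idea, the helper below passes a query filter and a list of wanted fields to a collection's find() method and loads the matching documents into a DataFrame. The function name `query_to_dataframe`, the database and collection names, the field names, and the connection URL in the comments are all hypothetical; only find(filter, projection) from PyMongo and pd.DataFrame are real calls.

```python
import pandas as pd


def query_to_dataframe(collection, query_filter, fields):
    """Run a filtered query and load the matching documents into a DataFrame.

    `collection` is expected to behave like a pymongo Collection, i.e. to
    offer find(filter, projection) returning an iterable of dicts.
    """
    # ask the server for the wanted fields only, and drop MongoDB's _id
    projection = {name: 1 for name in fields}
    projection["_id"] = 0
    cursor = collection.find(query_filter, projection)
    return pd.DataFrame(list(cursor))


# Hypothetical usage against a local server (names are assumptions):
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# orders = client["shop"]["orders"]
# df = query_to_dataframe(orders, {"status": "shipped"}, ["customer", "total"])
```

Filtering and projecting on the server side means only the matching documents and requested fields travel over the network, which matters most on very large collections.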


Convert text categories to numbers in Pandas

When you have text categories in your data, you can dramatically speed up the processing of that data using Pandas categoricals. Categoricals encode the text as numerics, which allows us to take full advantage of Pandas’ fast C code. Examples of times when you’d use categoricals are stock symbols, gender, experiment outcomes, states, and in this case, a customer loyalty level.
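
A minimal sketch of the conversion, assuming a hypothetical loyalty column with made-up values: astype("category") stores each distinct string once, and .cat.codes exposes the integer code assigned to each row.

```python
import pandas as pd

# made-up sample data; the column name "loyalty" is an assumption
df = pd.DataFrame({"loyalty": ["gold", "silver", "bronze", "gold", "silver"]})

# convert the text column to a categorical
df["loyalty"] = df["loyalty"].astype("category")

# distinct labels (sorted alphabetically by default for strings)
categories = list(df["loyalty"].cat.categories)
print(categories)  # ['bronze', 'gold', 'silver']

# integer code per row, indexing into the categories above
codes = df["loyalty"].cat.codes.tolist()
print(codes)  # [1, 2, 0, 1, 2]
```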


Copyright © 2010 - C++ Technology. All Rights Reserved.
