Generate summary statistics for the entire dataset in MongoDB

One of the first steps that business intelligence professionals perform on a new dataset is creating summary statistics. These statistics can be generated for an entire dataset or a part of it. I’ll show how to create summary statistics for the entire dataset.

How to do it:

To generate summary statistics for the entire dataset, begin by importing the libraries that you need:

import pandas as pd

Next, import the dataset from MongoDB:

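from pymongo import MongoClient

# Connect to the MongoDB server running locally on the default port
client = MongoClient('localhost', 27017)

# Select the database and the collection that hold the data
db = client.smallbusiness
collection = db.company

# Fetch every document and load the results into a DataFrame
data = collection.find()
company = pd.DataFrame(list(data))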
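
Every MongoDB document carries an `_id` field, so the DataFrame built above will include an `_id` column that is rarely useful in a statistical summary. The following is a minimal sketch of one way to drop it. The helper name `documents_to_frame` and the sample field names (`name`, `employees`, `revenue`) are assumptions made for illustration; they are not taken from the original dataset, and plain dictionaries stand in for the documents returned by `collection.find()`.

```python
import pandas as pd


def documents_to_frame(documents, drop_id=True):
    """Build a DataFrame from an iterable of MongoDB-style documents (dicts).

    Hypothetical helper: optionally removes the `_id` column that MongoDB
    adds to every document.
    """
    frame = pd.DataFrame(list(documents))
    if drop_id and "_id" in frame.columns:
        frame = frame.drop(columns=["_id"])
    return frame


# Hypothetical documents standing in for the results of collection.find().
sample_docs = [
    {"_id": "a1", "name": "Acme", "employees": 12, "revenue": 150000.0},
    {"_id": "a2", "name": "Birch", "employees": 5, "revenue": 48000.0},
    {"_id": "a3", "name": "Cedar", "employees": 30, "revenue": 910000.0},
]

company_sample = documents_to_frame(sample_docs)
print(company_sample.columns.tolist())
```

With a live connection, the same effect can be obtained on the server side by passing a projection to the query, for example `collection.find({}, {"_id": 0})`.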

After that, use the describe function to generate summary stats for the entire dataset:

company.describe()
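
By default, describe() summarizes only the numeric columns when the DataFrame contains any; passing include="all" also covers text columns. A small sketch of the difference, using a hypothetical in-memory DataFrame (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical data with one text column and one numeric column.
frame = pd.DataFrame(
    {
        "name": ["Acme", "Birch", "Cedar"],
        "employees": [12, 5, 30],
    }
)

# Default behaviour: only the numeric column is summarized.
numeric_stats = frame.describe()

# include="all" adds count/unique/top/freq for the text column as well.
all_stats = frame.describe(include="all")

print(numeric_stats)
print(all_stats)
```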

Finally, transpose the results provided by describe() to make the results more readable:

company.describe().transpose()

OK, putting it all together:

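import pandas as pd
from pymongo import MongoClient

# Connect to the local MongoDB server
client = MongoClient('localhost', 27017)

# Select the database and the collection
db = client.smallbusiness
collection = db.company

# Fetch every document in the collection
data = collection.find()

# Load the documents into a DataFrame
company = pd.DataFrame(list(data))

# Summarize the data, one row per column
company.describe().transpose()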

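The result is a table with one row per numeric column and the statistics (count, mean, std, min, 25%, 50%, 75%, max) as columns.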
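
The original output is not reproduced here. To see the shape of the result without a running MongoDB server, the steps can be sketched against an in-memory stand-in. Everything specific in this sketch is an assumption for illustration: the `FakeCollection` class, the `summarize_collection` helper, and the sample field names and values are hypothetical and do not come from the smallbusiness database.

```python
import pandas as pd


class FakeCollection:
    """Hypothetical in-memory stand-in for a pymongo collection."""

    def __init__(self, documents):
        self._documents = documents

    def find(self):
        # A real collection returns a cursor; an iterator is enough here.
        return iter(self._documents)


def summarize_collection(collection):
    """Load every document into a DataFrame and return transposed statistics.

    Note: describe() raises ValueError if the collection is empty, because
    the resulting DataFrame has no columns.
    """
    frame = pd.DataFrame(list(collection.find()))
    return frame.describe().transpose()


# Made-up documents; string ids mimic the non-numeric ObjectId values.
fake = FakeCollection(
    [
        {"_id": "a1", "name": "Acme", "employees": 12, "revenue": 150000.0},
        {"_id": "a2", "name": "Birch", "employees": 5, "revenue": 48000.0},
        {"_id": "a3", "name": "Cedar", "employees": 30, "revenue": 910000.0},
    ]
)

summary = summarize_collection(fake)
print(summary)
```

Because `summarize_collection` only calls `find()`, the same function should also accept a real collection such as `db.company`.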

Copyright © 2010 - C++ Technology. All Rights Reserved.
