Quality blog content since 2006
- the imjtk.com successor


Python Pandas Prolonged

January 31, 2018 Dev

The other day I posted a Pandas for Python Primer. I promised a follow-up before the end of the week, and away we go…

Pandas for Python, A Slight Return

Previously I talked about creating DataFrames, their rows and columns, and how one might apply a function to them. In my experience, more often than not you’ll want to import some data. You start by importing Pandas, of course. A note here – by convention many coders import Pandas as pd. If you do that, replace my pandas. with pd. in the examples.
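For anyone unsure what the alias changes, here is a minimal sketch. The rows are made up for illustration and are not from the spreadsheet used later in this post.

```python
import pandas as pd  # "pd" is only a shorter name for the same module

# Hypothetical rows, invented for this example
df = pd.DataFrame({
    "Name": ["Diner A", "Cafe B"],
    "Ratings Average": [4.5, 3.9],
})

print(type(df).__name__)  # DataFrame
print(list(df.columns))
```

Nothing else changes: every pandas.something call in this post becomes pd.something.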

Assume that there is an Excel spreadsheet in the current working directory called MoultrieEateries.xlsx. We’ll need some additional help for Pandas to understand Excel, so open your shell and run: pip install xlrd. If you don’t use pip you should switch lol – j/k. Use your package manager to get and install xlrd.
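One thing worth knowing if you are following along on a newer setup: xlrd 2.0 and later only reads the old .xls format, and recent Pandas versions rely on openpyxl for .xlsx files. The helper below is a hypothetical convenience function (not part of Pandas) that just reports which of those two packages can be imported in your environment.

```python
import importlib.util


def installed_excel_engines(candidates=("openpyxl", "xlrd")):
    """Return the candidate engine packages importable in this environment."""
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]


# An empty list means neither package is installed yet
print(installed_excel_engines())
```

If the list comes back empty, install one of them with pip before trying to read the spreadsheet.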

import pandas
# Read the first sheet of the workbook into a DataFrame
df = pandas.read_excel("MoultrieEateries.xlsx", sheet_name=0)
print(df.columns)

We’re importing the Pandas library, creating a DataFrame named df, and then printing the names of the columns. I just like to print something out to make sure nothing looks outta whack. You can also call something like:

df.head()
df.tail()

to see a few lines of the top or the bottom of the DataFrame. Head and tail are safer than calling the entire DataFrame when you have no idea how big your data is and you just want to check the structure. By default each returns 5 rows; you can specify how many rows are returned like so: df.head(3).
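Here is a small sketch of that structure check. The DataFrame is stand-in data with invented names, since the real spreadsheet isn’t reproduced here.

```python
import pandas

# Hypothetical stand-in data; 10 rows is enough to see head and tail differ
df = pandas.DataFrame({
    "ID": range(1, 11),
    "Name": [f"Eatery {i}" for i in range(1, 11)],
})

print(df.shape)    # (rows, columns), without printing any of the data
print(df.head(3))  # first 3 rows
print(df.tail(2))  # last 2 rows
```

Checking df.shape first is a cheap way to find out how big the data is before printing any of it.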

[Screenshot: python pandas example]

In my example here Pandas is showing me 4 rows and all of the columns. I would only be interested in some of the columns in this DataFrame. They are also indexed by ID, which is not what I want either. One of the best uses for Pandas is cleaning data. That is, taking a bunch of information and manipulating it into just the data you need. Remember, the DataFrame we created is called df:

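# Drop the columns at positions 5 through 13 (the slice end is exclusive)
df = df.drop(df.columns[5:14], axis=1)
# Returns a sorted copy; df itself is unchanged unless you reassign it
df.sort_values("Ratings Average", ascending=False)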

The first line of code up there deletes all of the columns that I don’t need. The numbers in the brackets are a slice: the columns at positions 5 through 13, since the end of a slice is exclusive. The second line re-orders my data so that it makes more sense by showing the eateries by rating instead of by ID. Keep in mind that sort_values() does not work in-place by default, so you’ll have to pass inplace=True when calling it or assign the result to a variable if you want to retain the returned, sorted data. Also notice that I had to add another parameter, ascending=False, so those joints with the highest rating come first.
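To see the drop and the not-in-place sort on something you can run without the spreadsheet, here is a sketch on made-up data. Every column name except "Ratings Average" is invented for the example, and the slice is shortened to fit the smaller table.

```python
import pandas

# Hypothetical data; only "Ratings Average" matches the real spreadsheet
df = pandas.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["Diner A", "Cafe B", "Grill C"],
    "Ratings Average": [3.9, 4.7, 4.2],
    "Notes": ["", "", ""],
    "Extra": [0, 0, 0],
})

# A slice end is exclusive: columns[3:5] selects positions 3 and 4
df = df.drop(df.columns[3:5], axis=1)

# sort_values returns a new DataFrame and leaves df untouched
ranked = df.sort_values("Ratings Average", ascending=False)
print(list(df["Name"]))      # still in ID order
print(list(ranked["Name"]))  # highest rated first

# With inplace=True the call sorts df itself and returns None
df.sort_values("Ratings Average", ascending=False, inplace=True)
print(list(df["Name"]))      # now sorted as well
```

Assigning the result to a new name, as with ranked here, keeps the original order around in case you need it later.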

[Screenshot: pandas dataframe]

This has barely scratched the surface – there are tons of details about DataFrames over here. Hopefully with these first two posts you have an idea about how to get started with Pandas. I’ll write one more post in a couple of days that will show you how to get some longitude and latitude from these addresses.

Not for nothin’, but this is my favorite Pandas reference link on the web.

  • This Coder Here 2:47 pm March 26, 2018

    Why not just write one article on Pandas instead of all of these different articles?

    • tripkendall 1:54 pm March 30, 2018

      I was writing about doing different things in Pandas in the different articles, was doing them over a time frame, and it would have been a l o n g article if I mashed it all up. Cheers.

  • IWannaCode 12:43 pm April 12, 2018

    This is very helpful. I am trying to learn Pandas and Python and some articles are difficult to grasp. You make Python simple! Maybe you could post the MoultrieEateries.xlsx sample data?

  • Margie 10:18 am May 11, 2018

    Thanks for these Pandas posts. I clicked on a couple of others in Google before I found your site and I am following along but what software are you using for Python in your screen shots?

    • tripkendall 1:43 pm May 11, 2018

      Hey Now 🙂 I think you are looking for the Jupyter Notebook

