Code 501: Introduction to Data Analysis and Visualization with Python

Overview

It's crucial for modern companies to have clean, easy-to-understand data to inform business direction and measure outcomes. Data literacy and organization allow for better decision-making, faster interpretation, and more widespread comprehension throughout an organization. Being able to collect, manage, understand, contemplate, and communicate with data will separate those who experience change from those who drive it.

In this course, you will learn how to harness the power of Python to gain highly coveted skills in data analysis and visualization. We’ll cover how to use standard packages for the organization, analysis, and visualization of data, such as NumPy, SciPy, Matplotlib, and scikit-learn. You’ll get to apply these skills on a daily basis and, in the end, produce a substantial project showcasing your new abilities as a data analyst.

Outcomes

At the end of this course, students will:

  • Be familiar with the standard data analysis tools of Python.
  • Know how to visualize their data, whether processed or not, so that they can communicate its relevance to those with and without the appropriate domain knowledge.
  • Understand how to clean data and prune for quality without losing depth of meaning (or at least be able to adequately explain information loss) such that any analysis using that data isn't hampered by regular or irregular artifacts.
  • Be equipped to attack small to mid-size data sets with one of the most popular modern programming languages, giving them the agency to handle the massive amounts of data that the world generates daily.
  • Be able to back up claims based in data with solid reporting and data-driven analyses, adding to the legitimacy of their work and the credibility of their decisions.
  • Complete a deep dive and thorough analysis of a publicly available data set. This will be done in class as a group project.

Prerequisites

Students enrolling in this course should:

  • Know arithmetic and basic algebra
  • Have seen a mathematical matrix (even if it's not yet understood)
  • Understand Git and GitHub
  • Have a basic understanding of Python:
    • Built-in Python data structures and functions (list, tuple, dict, int, float, string, len, sum, min, max)
    • Performing basic mathematical operations in Python (with and without math from the standard library)
    • Writing custom functions and classes
    • Importing packages
    • Running scripts
    • Reading/writing files
    • Installing new packages

Topics

Class 1: Introduction to Descriptive Statistics

  • Learn to create virtual environments in Jupyter Notebook
  • Be introduced to the fundamentals of descriptive statistics in the Jupyter environment
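
As a rough preview of what descriptive statistics look like in code, here is a minimal sketch that uses only Python's standard library. The `daily_orders` values are made up for illustration and are not course data.

```python
import statistics

# Hypothetical sample: daily order counts for one week (made-up values).
daily_orders = [12, 15, 11, 15, 20, 18, 14]

# Common descriptive statistics: center (mean, median, mode)
# and spread (sample standard deviation, range).
summary = {
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "mode": statistics.mode(daily_orders),
    "stdev": statistics.stdev(daily_orders),
    "range": max(daily_orders) - min(daily_orders),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same summary can be produced with NumPy or pandas; the standard library version is shown here only because it needs no installation.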

Class 2: NumPy and Thinking Outside Data

  • Be introduced to the mathematical superpowers of the NumPy package
  • Construct testable data-driven hypotheses
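
To give a sense of what NumPy adds over plain Python lists, the sketch below applies arithmetic to whole arrays at once. The `sales` array and its interpretation (stores by days) are assumptions made up for this example.

```python
import numpy as np

# Hypothetical data: each row is a store, each column is one day's sales.
sales = np.array([[10.0, 12.0, 14.0],
                  [20.0, 18.0, 22.0]])

# Mean along axis 1 gives one average per store (per row).
per_store_mean = sales.mean(axis=1)

# Broadcasting: reshape the means into a column so each row
# has its own mean subtracted from every entry.
centered = sales - per_store_mean[:, None]

# Element-wise relative change from the first day to the last day.
growth = sales[:, -1] / sales[:, 0] - 1

print(per_store_mean)
print(centered)
print(growth)
```

No explicit loops are needed; each operation runs over every element of the array.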

Class 3: Pandas Basics

  • Clean and engage data using pandas DataFrames within the Jupyter environment
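
A typical cleaning pass on a pandas DataFrame might look like the sketch below. The `raw` table, its column names, and the choice to drop rows with a missing city are all assumptions for illustration; real data sets call for their own decisions.

```python
import pandas as pd

# Hypothetical messy input: inconsistent casing and whitespace,
# a missing city, and a non-numeric temperature entry.
raw = pd.DataFrame({
    "city": ["Seattle", "seattle ", "Portland", None, "Portland"],
    "temp_f": ["61", "61", "n/a", "58", "65"],
})

clean = (
    raw.dropna(subset=["city"])  # drop rows with no city
       .assign(
           # normalize whitespace and capitalization
           city=lambda df: df["city"].str.strip().str.title(),
           # convert to numbers; unparseable entries become NaN
           temp_f=lambda df: pd.to_numeric(df["temp_f"], errors="coerce"),
       )
       .drop_duplicates()        # remove rows that are now identical
       .reset_index(drop=True)
)

print(clean)
```

Note that the unparseable temperature is kept as a missing value rather than silently dropped, so the information loss stays visible and can be explained later.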

Class 4: Data Viz Basics and Advanced Pandas

  • Learn data visualization best practices to keep in mind when creating plots and visuals
  • Visualize distributions with histograms, bar charts, scatter plots, and box plots created with Matplotlib, seaborn, and Bokeh
  • Learn some advanced pandas techniques, including aggregating data into a pandas DataFrame
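
As one possible illustration of visualizing a distribution, the sketch below draws a histogram and a box plot side by side with Matplotlib. The `values` list and the output filename `distribution.png` are made up for this example; the `Agg` backend line is only there so the script runs without a display and can be removed inside a Jupyter notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical measurements (made-up values with one high outlier).
values = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: how many values fall into each of 4 equal-width bins.
counts, bin_edges, _ = ax_hist.hist(values, bins=4)
ax_hist.set_title("Histogram")
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

# Box plot: median, quartiles, and outliers of the same data.
ax_box.boxplot(values)
ax_box.set_title("Box plot")

fig.tight_layout()
fig.savefig("distribution.png")
```

Labeling axes and titling each panel are small habits that make a plot readable to people without domain knowledge.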

Class 5: Exploratory Visualization

  • Be introduced to exploratory visualization with Python in the Jupyter environment
  • Be introduced to data reporting in Jupyter Notebook

Class 6: Mapbox and GeoJSON

  • Revisit identifying and managing problematic data with GeoJSON
  • Learn the basics of mapping GeoJSON in the Jupyter environment with Mapbox and PySAL

Class 7: Exploratory Visualization of Large Datasets

  • Visualize large and high-dimensional data sets
  • Brainstorm portfolio project ideas
  • Task teams to perform data munging of sections of a given data set

Class 8: Quick Reporting and Data Viz Extras

  • Evaluate portfolio project ideas
  • Bring the parts submitted by teams together in a quick reporting of the data
  • Be introduced to the Altair library
  • Learn how to create word cloud images and waffle charts in Python

Classes 9 & 10: Final Project

  • Put it all together with a portfolio-worthy data project
  • Find and clean a data set of interest
  • Create graphics to tell your data story
  • Write a robust analysis and interpretation of the data and your findings
  • Share your analysis with a final public Jupyter Notebook
  • Final notes on Data Analysis and Visualization with Python

Learn with Stacked Modules

Concepts in each of our courses are taught using stacked modules, where a new concept is introduced in each class session, building upon what came before it. This is a challenging style that requires persistence, practice, and collaboration, but allows more concepts to be introduced over the length of the course. This method helps students learn and retain more information in a short period of time.

Computer Requirements

Students are required to bring their own laptop with plenty of free space on the hard drive. Most students use Macs, simply because they are easy to work with. Others install Linux on their laptops. We don’t encourage students to use any version of Windows unless they are already Windows development gurus. By the first day of class, students will need:

  • Python version 3
  • Linux: Windows machines should be set up to dual-boot a Linux operating system (we recommend Ubuntu). A C compiler and the Python development headers will also need to be installed.
  • The latest version of Google Chrome
  • A GitHub account

If you would like to set up your Windows machine to dual boot to Linux, check out these guides: