Python Pandas Course for Data Analysis

Pandas are not only bears

Pandas is a Python package for handling and analyzing spreadsheet-like data. This course covers loading, cleaning, manipulating, merging, and visualizing such data.

This course assumes familiarity with Python. It is aimed at data analysts and at people who work a lot with Excel and want to automate repetitive tasks or analyze more complex data.

About 60% of the course consists of exercises, with one trainer for every 1 to 9 participants providing individual help.

At the end of the course, participants will have a thorough knowledge of Pandas and practical experience with its core tools and capabilities. The topics of the course are:

1) NumPy

  • ndarray creation routines.
  • Array element access.
  • Array slicing.
  • Elementwise operations.
  • Attributes of ndarray.
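
As a rough illustration of the topics in this chapter, the following minimal sketch touches each bullet once. The variable names are made up for the example and are not taken from the course material.

```python
import numpy as np

# Creation routines
a = np.arange(6)             # the integers 0..5 as a 1-d array
m = a.reshape(2, 3)          # the same values arranged as 2 rows x 3 columns
z = np.zeros((2, 3))         # a 2 x 3 array filled with 0.0

# Element access and slicing
first = a[0]                 # single element
middle = a[1:4]              # elements at positions 1, 2 and 3
col = m[:, 1]                # second column of the 2-d array

# Elementwise operations
doubled = a * 2              # every element multiplied by 2
summed = m + z               # element-by-element addition of two arrays

# Attributes of ndarray
print(m.shape, m.ndim, m.size, m.dtype)
```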

2) The Series object

  • Construct a Series object. Different methods.
  • Series object behaves like a NumPy array in certain aspects.
  • Checking if index key is present.
  • Series object behaves like a dict in certain aspects.
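
The array-like and dict-like sides of a Series can be sketched as follows. This is only an illustrative example with invented data, not an excerpt from the course exercises.

```python
import pandas as pd

# Two ways to construct the same Series
s = pd.Series([10, 20, 30], index=["a", "b", "c"])   # values plus explicit index
t = pd.Series({"a": 10, "b": 20, "c": 30})           # from a dict

# Array-like behaviour: elementwise arithmetic and boolean masks
scaled = s * 2
large = s[s > 15]

# Dict-like behaviour: key lookup, membership test, default values
value = s["b"]
has_b = "b" in s             # checks the index, not the values
has_z = "z" in s
fallback = s.get("z", 0)     # returns the default if the key is absent
```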

3) The DataFrame object

  • Construct a DataFrame object. Various methods to do so.
  • Add / delete columns.
  • Row selection and slicing.
  • df.loc[], df.iloc[], df.at[], df.iat[] selection and access methods.
  • head(), tail(), transpose() methods.
  • DataFrame attributes.
  • Column-wise, row-wise methods.
  • DataFrame behaves like a 2-dimensional NumPy array in certain aspects.
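
A minimal sketch of the operations listed above might look like this. The table contents and row labels are hypothetical and chosen only to keep the example short.

```python
import pandas as pd

# Construct a DataFrame from a dict of columns with explicit row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [31, 45, 27]},
    index=["r1", "r2", "r3"],
)

# Add a column, then delete it again
df["age_next_year"] = df["age"] + 1
df = df.drop(columns=["age_next_year"])

# Selection and access
by_label = df.loc["r2", "age"]        # label-based
by_position = df.iloc[0, 0]           # position-based
scalar_label = df.at["r3", "name"]    # fast scalar access by label
scalar_pos = df.iat[2, 1]             # fast scalar access by position
first_two = df.iloc[0:2]              # row slicing

# head(), transpose and attributes
print(df.head(2))
print(df.shape, list(df.columns))
flipped = df.T

# Column-wise method
mean_age = df["age"].mean()
```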

4) Cleaning and replacing data in a DataFrame

  • How to deal with missing data.
  • The replace() method.
  • Reading or writing a DataFrame from / to a CSV file or Excel file.
  • String operations on string Series.
  • Iterating over rows, columns or cells.
  • Renaming certain columns or rows.
  • Sorting a DataFrame with respect to a user-defined criterion.
  • Calculating covariances and correlations of pairs of columns.
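
To give an idea of how these steps fit together, here is a small sketch. The data, the use of -999 as a placeholder for missing values, and the column names are assumptions made for the example. It reads CSV text from memory instead of a file, and Excel input and output is left out because it needs an additional package.

```python
import io

import numpy as np
import pandas as pd

# Read CSV data (from an in-memory string here, normally from a file path)
csv_text = "city,temp\n berlin ,21.5\nparis,\nrome,-999\n"
df = pd.read_csv(io.StringIO(csv_text))

# Missing data: turn the assumed placeholder -999 into NaN, then fill with the mean
df["temp"] = df["temp"].replace(-999, np.nan)
df["temp"] = df["temp"].fillna(df["temp"].mean())

# String operations on a string column
df["city"] = df["city"].str.strip().str.title()

# Rename a column and sort by a custom criterion (length of the city name)
df = df.rename(columns={"temp": "temp_c"})
df = df.sort_values("city", key=lambda col: col.str.len())

# Iterate over rows
for row in df.itertuples(index=False):
    print(row.city, row.temp_c)

# Write back to CSV (returns a string when no path is given)
out = df.to_csv(index=False)

# Covariance and correlation between two columns
nums = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})
corr_xy = nums["x"].corr(nums["y"])
cov = nums.cov()
```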

5) SQL-like operations on DataFrames

  • The Split-Calculate-Combine principle.
  • Adding data to Series or DataFrames.
  • Joining DataFrames with SQL-like join-operations.
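
The following sketch illustrates the three topics with two small invented tables. The principle named above is referred to as split-apply-combine in the pandas documentation, and `groupby` is the usual entry point for it.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [50.0, 20.0, 30.0, 10.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "name": ["Ann", "Bob", "Dee"],
})

# Split by customer, calculate the sum per group, combine into one Series
totals = orders.groupby("customer_id")["amount"].sum()

# Adding data: append further rows by concatenating DataFrames
more = pd.DataFrame({"customer_id": [2], "amount": [5.0]})
all_orders = pd.concat([orders, more], ignore_index=True)

# SQL-like joins on a shared key column
inner = orders.merge(customers, on="customer_id", how="inner")  # matching keys only
left = orders.merge(customers, on="customer_id", how="left")    # keep every order
```

In the left join, the order of customer 3 has no matching customer record, so its name is filled with a missing value.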

6) Data visualization

  • The plot method of DataFrame.
  • The Seaborn and Matplotlib plotting packages.
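
A minimal plotting sketch, assuming Matplotlib is installed, could look like this. The data is invented, the non-interactive "Agg" backend is selected only so that the example runs without a display, and Seaborn is not shown.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs

import pandas as pd

df = pd.DataFrame({"month": [1, 2, 3, 4], "sales": [10, 14, 9, 17]})

# DataFrame.plot returns a Matplotlib Axes object that can be customized further
ax = df.plot(x="month", y="sales", kind="line", title="Monthly sales")
ax.set_ylabel("sales")

# Render the figure as PNG into memory (a file path would work the same way)
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```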

Each of the above chapters has one or more exercise units. The course duration is 3 to 4 days.

On request, this course can be combined with other courses or shortened, giving a duration of between 2 and 5 days. If you are interested in this course, please send us a message, since we schedule courses dynamically on demand. (Price list).