Scikit-Learn

Scikit-Learn is a very powerful Python package for machine learning. The Scikit-Learn course is intended for engineers, economists (insurance, banking), marketing experts, and consultants who want to apply machine learning algorithms to create software that acts intelligently by learning from data. About 50% to 60% of the course consists of exercises, with one trainer for every 5 to 9 participants providing individual help. At the end of the course, participants will be able to program powerful, state-of-the-art predictive algorithms in a few lines of code and use them in production for forecasting.

The course assumes familiarity with core Python, linear algebra, and basic calculus. It covers the theoretical aspects of the 3 to 4 most powerful machine learning algorithms of Scikit-Learn in more detail. Otherwise, the focus is on the main principles and tools of Scikit-Learn that are necessary to apply it successfully to real-world problems. The topics:

1) NumPy

  • ndarray creation routines.
  • Array element access.
  • Array slicing.
  • Elementwise operations.
  • Attributes of ndarray.
  • Reshaping an array.
  • Advanced indexing.
  • Looping over ndarray.
  • The numpy.where function.
  • Important ndarray methods.
  • NumPy mathematical functions.
  • Dealing with NaN and infinite numbers.
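
To give an impression of the NumPy topics listed above, here is a minimal sketch. It is only an illustration, not course material; the array values and variable names are made up for the example.

```python
import numpy as np

# Creation routine plus reshaping: 12 values arranged as 3 rows x 4 columns.
a = np.arange(12, dtype=float).reshape(3, 4)

# Slicing: the second column of every row.
col = a[:, 1]

# Introduce a NaN and an infinite value to demonstrate cleaning.
a[1, 2] = np.nan
a[2, 3] = np.inf

# numpy.where: keep finite entries, replace everything else with 0.0.
cleaned = np.where(np.isfinite(a), a, 0.0)

# Column means that ignore NaN (infinite values are mapped to NaN first).
col_means = np.nanmean(np.where(np.isinf(a), np.nan, a), axis=0)
```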

2) Supervised learning - basics

  • Linear regression.
  • Measuring the quality of a prediction model.
  • The sklearn.model_selection.train_test_split() function.
  • Methods to find relevant features.
  • Visualization of the interaction of variables with seaborn.
  • Main methods of sklearn Predictor instances.
  • The cross_val_score() function.
  • Logistic regression.
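
The basic supervised learning workflow named above might look roughly like the following sketch. The synthetic data from make_regression and all parameter values are assumptions chosen for illustration; the course exercises may use different data sets.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic regression data (an assumption for this example).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold back 25% of the samples to measure prediction quality.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit() and score() are among the main predictor methods.
model = LinearRegression()
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)  # R^2 on the held-out data

# 5-fold cross-validation gives one score per fold.
cv_scores = cross_val_score(model, X, y, cv=5)
```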

3) Important tools

  • sklearn.preprocessing.LabelEncoder
  • How Predictors and Transformers interact with pandas.
  • Quality metrics for models.
  • Visualization of the learning curve.

4) Analysis and improvement techniques

  • Regularization to avoid overfitting.
  • Evaluation of classification models.
  • Visualization of the quality of classification models.
  • Feature engineering.

5) Support Vector Machines

  • Support vector classifiers.
  • The SVC Predictors of the sklearn.svm package.
  • Support vector regression.
  • How to find good model parameters: GridSearchCV and other classes.
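
As a rough illustration of parameter search for a support vector classifier, consider the sketch below. The iris data set and the small parameter grid are assumptions made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values for the regularization strength C and the RBF kernel width.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}

# Try every combination with 5-fold cross-validation and keep the best one.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_  # mean cross-validated accuracy
```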

6) Transformation of data to improve models

  • Principal Components Transformer.
  • Space Density Transformer.
  • Pipelines.
  • Transform non-numerical to numerical data.
  • Scikit-Learn Transformers.
  • The to_numeric_frame module.
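
A pipeline that converts non-numerical columns to numbers before fitting a model could be sketched as follows. The small data frame, its column names, and the choice of OneHotEncoder and StandardScaler are assumptions for illustration; the to_numeric_frame module mentioned above is not shown here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with two numerical columns and one categorical column.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "income": [30.0, 42.0, 80.0, 65.0, 52.0, 35.0, 90.0, 58.0],
    "region": ["north", "south", "south", "west",
               "north", "west", "south", "north"],
})
y = [0, 0, 1, 1, 0, 0, 1, 1]

# Scale the numerical columns and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

# The pipeline applies the transformers first, then fits the classifier.
pipe = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
pipe.fit(df, y)

predictions = pipe.predict(df)
n_features = pipe.named_steps["preprocess"].transform(df).shape[1]
```

Bundling the preprocessing and the model in one Pipeline object means the same transformations are applied automatically at prediction time.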

7) Decision Trees, Random Forests, Gradient Boosting

  • Decision trees for classification.
  • Decision trees for regression.
  • Random forests.
  • Gradient boosting and other ensemble methods.
  • Save and load prediction models.
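
Training a random forest and then saving and reloading it might look like the sketch below. Using joblib for persistence, the iris data set, and the file name are assumptions for this example; other persistence approaches exist.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# An ensemble of 50 decision trees.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Save the fitted model to disk and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "forest.joblib")  # hypothetical file name
    joblib.dump(forest, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
same = bool((forest.predict(X) == restored.predict(X)).all())
```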

8) Unsupervised learning

  • Clustering.
  • K-means clustering.
  • The elbow method.
  • Mean shift clustering.
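
The elbow method for K-means can be sketched as follows: fit the model for several cluster counts and compare the within-cluster sum of squares (inertia). The synthetic blob data and the range of cluster counts are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups (an assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Record the inertia for k = 1..6 clusters.
inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Plotting inertias against k typically shows a bend ("elbow") near the
# number of clusters beyond which adding more brings little improvement.
```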

Each of the listed topics has one or more exercise units. The course duration is 5 days. On request, this course can be combined with other courses or shortened to a duration of between 2 and 5 days. If you are interested in this course, please send us a message, since we plan courses dynamically on demand.