We all know how important models are in data science, and selecting an appropriate model for the problem statement matters just as much. In the Unmask Machine Learning Models workshop, we learnt, starting from the basics, how to analyze a problem statement in order to select the best model.

We started by understanding the difference between statistical modeling and machine learning. Ms. Rutuja explained this very creatively by calling it the 10 year challenge: over the years, statistical modeling has evolved into machine learning. Statistics provides the theory; when new data is added and the model learns from it, we call that machine learning.

Then we discussed the importance of understanding the data before choosing an appropriate model, which led us to the types of features. Data can be of two types, structured and unstructured, and each can be further divided into subcategories:

1. Structured data

  • Ordinal (Categorical) – Ordinal data is categorical data that has an order. Eg: low, medium, high

  • Nominal (Categorical) – Nominal data is categorical data that does not have any particular order. Eg: Male and Female.

  • Discrete (Numeric) – Discrete data is numeric data that can be counted. Eg: Number of people attending a workshop.

  • Continuous (Numeric) – Continuous data is numeric data that can take any value within a range. Eg: Age

  • Time/date – The data that represents time/date of an event.

2. Unstructured data

  • Audio

  • Images

  • Text

  • Video
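To make the structured feature types concrete, here is a minimal sketch of how they might be turned into numbers before modeling. The records, column names and the `RISK_ORDER` mapping are hypothetical and were not part of the workshop; the point is only that ordinal values can keep their order as integers, while nominal values get one indicator column per category.

```python
# Hypothetical toy records; column names and values are illustrative only.
records = [
    {"risk": "low", "city": "Pune", "attendees": 40, "age": 23.5},
    {"risk": "high", "city": "Mumbai", "attendees": 25, "age": 31.0},
    {"risk": "medium", "city": "Pune", "attendees": 40, "age": 27.2},
]

# Ordinal: the categories have an order, so map them to increasing integers.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def encode_ordinal(value, order=RISK_ORDER):
    """Replace an ordinal category with its rank."""
    return order[value]


def one_hot(value, categories):
    """Nominal: no order, so use one 0/1 indicator per category."""
    return [1 if value == category else 0 for category in categories]


# Sorted so the indicator columns always appear in the same order.
cities = sorted({record["city"] for record in records})

# Discrete (attendees) and continuous (age) features are already numeric.
encoded = [
    [encode_ordinal(r["risk"])] + one_hot(r["city"], cities) + [r["attendees"], r["age"]]
    for r in records
]

for row in encoded:
    print(row)
```

Each encoded row holds the risk rank, the city indicators, and then the two numeric features unchanged.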

Once we have a fair understanding of our data, we need to explore it in depth. We can do that with exploratory data analysis, which relies heavily on data visualization. You can read more about data visualization at https://rishitabansal.wordpress.com/2019/01/15/data-visualisation/

We discussed feature importance using null hypothesis testing.

[Table: null hypothesis condition table]

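We went through the null hypothesis condition table and discussed type-1 and type-2 errors: a type-1 error is rejecting a null hypothesis that is actually true, and a type-2 error is failing to reject one that is actually false. We also learnt that, in industry, the acceptable type-1 error rate is used as the threshold value for risk.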

The normal distribution curve was used to explain the concept of p-value.

Let us consider the following normal distribution curve:

[Figure: normal distribution curve with the risk threshold region shaded in red]

Suppose we set up the null hypothesis (H0) as: there is no difference between the averages of two groups. Then the difference between the averages should ideally be zero, so in this curve H0 sits at the center. The red part marks the threshold chosen for the risk (the type-1 error explained above).

The p-value is the probability, calculated assuming the null hypothesis is true, of observing a difference at least as extreme as the one we measured. If the p-value is very small, the observed difference falls in the red region. This means it is too far away from H0, so we reject the null hypothesis. If the p-value is large enough that the observed difference does not fall in the red region, we fail to reject the null hypothesis.
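The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a two-sided z-test based on the normal curve (reasonable for large samples; a t-test is the usual choice for small ones), the two groups are made-up numbers, and the 0.05 threshold in `ALPHA` is an assumed value for the red region, not one given in the workshop.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

ALPHA = 0.05  # assumed type-1 error threshold (size of the red region)


def two_sample_p_value(group_a, group_b):
    """Two-sided p-value for H0: both groups have the same average."""
    difference = mean(group_a) - mean(group_b)
    # Standard error of the difference between the two sample means.
    std_err = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = difference / std_err
    # Probability of a difference at least this extreme if H0 is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=ALPHA):
    """Reject H0 only when the p-value falls below the threshold."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3, 5.8]

p = two_sample_p_value(group_a, group_b)
print(decide(p))
```

Here the averages differ by about 1.0 while the spread within each group is small, so the p-value lands far below the threshold and H0 is rejected. Comparing a group with itself gives a difference of zero and a p-value of 1, so H0 is not rejected.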

We saw basic Machine Learning Algorithms:

1. Unsupervised

Clustering

  • Hierarchical

  • DBSCAN

  • K-means – numerical

  • K-medoids/K-modes – categorical

2. Supervised (Features available) –

  • Regression – Linear, Lasso, Ridge, Regression Tree

  • Classification – Decision Trees, Random Forest, Logistic Regression, SVM, Naive Bayes

3. Supervised (Features not available) –

  • Deep Learning
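As an illustration of one algorithm from the list, here is a minimal sketch of K-means on numerical data. It is not code from the workshop: the points are made up, and the centroids are initialised naively from the first k points, whereas real libraries use smarter initialisation.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Plain K-means: assign each point to its nearest centroid, then move the centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

No labels are passed in, which is what makes this unsupervised: the groups come only from the distances between points.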

We were taught the vital principles of machine learning:

  • Practice

  • Master the fundamentals

  • Plan before execution

  • Better data beats fancier algorithms

After understanding the basics, it was time to apply them. We were divided into teams and given a dataset. We analyzed every feature in the dataset and selected the ones that were important for our results. Then, based on the problem statement, we decided it was a classification problem. We used the RapidMiner tool to run different classification models on our dataset. After applying the models, we got a report showing the accuracy and execution time of each model, which helped us identify the best model for our problem statement.
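The kind of comparison report we got can be sketched in plain Python. This is not how RapidMiner works internally and it is not our workshop dataset: the data is a made-up toy set, and the two "models" (a majority-class baseline and a 1-nearest-neighbour classifier) are simple stand-ins chosen so the example stays self-contained.

```python
from collections import Counter
from math import dist
from time import perf_counter


def majority_class(train_X, train_y, test_X):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label for _ in test_X]


def nearest_neighbour(train_X, train_y, test_X):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return [
        train_y[min(range(len(train_X)), key=lambda i: dist(x, train_X[i]))]
        for x in test_X
    ]


def compare(models, train_X, train_y, test_X, test_y):
    """Return {model name: (accuracy, seconds taken)}."""
    report = {}
    for name, model in models.items():
        start = perf_counter()
        predictions = model(train_X, train_y, test_X)
        elapsed = perf_counter() - start
        correct = sum(p == t for p, t in zip(predictions, test_y))
        report[name] = (correct / len(test_y), elapsed)
    return report


# Hypothetical toy data with two classes.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = ["a", "a", "a", "b", "b"]
test_X = [(0.9, 1.0), (5.1, 5.1), (4.8, 5.2), (1.1, 1.2)]
test_y = ["a", "b", "b", "a"]

models = {"majority": majority_class, "1-NN": nearest_neighbour}
report = compare(models, train_X, train_y, test_X, test_y)
for name, (accuracy, seconds) in report.items():
    print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

Reading accuracy and execution time side by side is what lets you trade one off against the other when picking a model.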

It was a day full of fun and learning. Thank you WiDS Pune and e-zest for organizing such an amazing workshop.