We all know the importance of models in data science, and selecting an appropriate model for the problem statement is a big part of that. In the Unmask Machine Learning Models workshop, we learnt, starting from the basics, how to analyze a problem statement in order to select the best model.
We started by understanding the difference between statistical modeling and machine learning. Ms. Rutuja explained this very creatively by calling it the 10 year challenge: over the years, statistical modeling has evolved into machine learning. Statistics provides the theory; when new data is added and the model learns from it, that is machine learning.
Then we discussed the importance of understanding the data before choosing an appropriate model, so we learnt about the types of features. Our data can be of 2 types, structured and unstructured, and these can be further divided into subcategories:
1. Structured data

Ordinal (Categorical) – Ordinal data is categorical data that has an order. Eg: low, medium, high

Nominal (Categorical) – Nominal data is categorical data that does not have any particular order. Eg: Male and Female.

Discrete (Numeric) – Discrete data is numeric data that can be counted. Eg: Number of people attending a workshop.

Continuous (Numeric) – Continuous data is numeric data that can take any value within a range. Eg: Age

Time/date – The data that represents time/date of an event.
2. Unstructured data

Audio

Images

Text

Video
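To make the structured feature types above a little more concrete, here is a minimal sketch of how they might be turned into numbers before modeling. The records, the column names and the `RISK_ORDER` mapping are all made up for illustration; they are not from the workshop dataset.

```python
# Hypothetical records; column names and values are invented for illustration.
records = [
    {"risk": "low", "gender": "Female", "attendees": 12, "age": 31.5},
    {"risk": "high", "gender": "Male", "attendees": 40, "age": 45.2},
    {"risk": "medium", "gender": "Female", "attendees": 25, "age": 28.0},
]

# Ordinal: the categories have an order, so map them to ranked integers.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def encode_ordinal(value, order=RISK_ORDER):
    """Return the rank of an ordinal category."""
    return order[value]


def encode_nominal(value, categories):
    """One-hot encode a nominal category: one 0/1 column per category."""
    return [1 if value == c else 0 for c in categories]


# Nominal: no order, so collect the distinct categories and one-hot encode them.
genders = sorted({r["gender"] for r in records})

# Discrete and continuous features are already numeric and are kept as they are.
encoded = [
    [encode_ordinal(r["risk"])]
    + encode_nominal(r["gender"], genders)
    + [r["attendees"], r["age"]]
    for r in records
]

for row in encoded:
    print(row)
```

The point of the sketch is only that ordinal values keep their order as integers, while nominal values get separate 0/1 columns so that no artificial order is introduced.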
Once we have a fair understanding of our data, we need to explore it in depth. We can do that with exploratory data analysis, which is largely data visualization. You can read more about data visualization in detail at https://rishitabansal.wordpress.com/2019/01/15/datavisualisation/
We discussed feature importance with null hypothesis testing.
We went through the null hypothesis decision table and discussed Type 1 and Type 2 errors. A Type 1 error is rejecting a null hypothesis that is actually true; a Type 2 error is failing to reject one that is actually false. We realized that the acceptable Type 1 error rate is what industry treats as the threshold value for risk.
The normal distribution curve was used to explain the concept of the p-value.
Let us consider the following normal distribution curve:
Suppose we set up the null hypothesis (H0) as: there is no difference between the averages of two groups. Then the difference between the averages should ideally be zero, so on this curve H0 holds exactly at the center. The red part is the region chosen as the threshold for risk (the Type 1 error explained above).
The probability of seeing a difference at least as extreme as the one observed is calculated assuming that the null hypothesis is true. This probability is called the p-value. If the p-value is very small, the observed difference falls in the red region. This means it is too far from what H0 predicts, so we reject the null hypothesis. If the p-value is big enough that the observed difference does not fall in the red region, we fail to reject the null hypothesis.
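The decision rule above can be sketched in a few lines of Python. This is only a rough illustration under some assumptions of mine: it uses a normal (z) approximation for the difference between two group averages, a threshold `ALPHA` of 0.05, and two tiny made-up groups. The function name `two_sample_p_value` is hypothetical, and for small samples a t-test would normally be the better choice.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_p_value(group_a, group_b):
    """Two-sided p-value for H0: both groups have the same average.

    Uses a normal (z) approximation, which is only reasonable
    for fairly large samples.
    """
    diff = mean(group_a) - mean(group_b)
    std_err = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    z = diff / std_err
    # Probability of a difference at least this extreme if H0 is true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # assumed Type 1 error threshold (the "red region")

# Made-up measurements, kept tiny so the example stays short.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
group_b = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0, 5.9, 6.1]

p_value = two_sample_p_value(group_a, group_b)
if p_value < ALPHA:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

Comparing a group with itself gives a difference of zero, which sits at the center of the curve, so the p-value is as large as it can be and we fail to reject H0.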
We saw basic Machine Learning Algorithms:
1. Unsupervised
Clustering

Hierarchical

DBSCAN

K-means – numerical

K-medoids / K-modes – categorical
2. Supervised (Features available)

Regression – Linear, Lasso, Ridge, Regression Tree

Classification – Decision Trees, Random Forest, Logistic Regression, SVM, Naive Bayes
3. Supervised (Features not available)

Deep Learning
We were taught the vital principles of machine learning:

Practice

Master the fundamentals

Plan before execution

Better data beats fancier algorithms
After understanding the basics clearly, it was time to apply them. We were divided into teams and we were given a dataset. We analyzed every feature of the dataset and selected the ones that were important for our results. Then, based on the problem statement, we decided it was a classification problem. We used the RapidMiner tool to run different classification models on our dataset. After applying the models, we got a report displaying the accuracy and execution time of each model. This report helped us understand which model was best for our problem statement.
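We did this part in RapidMiner, so there was no code involved, but the idea of the report can be sketched in plain Python. Everything below is an assumption for illustration: the data is synthetic, and the two "models" (a majority-class baseline and a 1-nearest-neighbour classifier) are simple stand-ins, not the models we actually ran in the workshop.

```python
import random
import time
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable


def make_point(label):
    """Create one synthetic row: two numeric features and a class label."""
    centre = 0.0 if label == 0 else 3.0  # class 1 is shifted away from class 0
    return ([random.gauss(centre, 1.0), random.gauss(centre, 1.0)], label)


data = [make_point(i % 2) for i in range(200)]
random.shuffle(data)
train, test = data[:150], data[150:]


def majority_class(train_rows):
    """Baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: most_common


def nearest_neighbour(train_rows):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(features):
        closest = min(
            train_rows,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
        )
        return closest[1]
    return predict


def evaluate(name, build):
    """Train a model, score it on the test rows and time the whole thing."""
    start = time.perf_counter()
    predict = build(train)
    correct = sum(predict(x) == y for x, y in test)
    elapsed = time.perf_counter() - start
    return {"model": name, "accuracy": correct / len(test), "seconds": elapsed}


report = [
    evaluate("majority class", majority_class),
    evaluate("1-nearest neighbour", nearest_neighbour),
]
for row in report:
    print(f"{row['model']:<20} accuracy={row['accuracy']:.2f} time={row['seconds']:.4f}s")
```

Reading such a report is the same trade-off we looked at in the workshop: a model that is a little more accurate but much slower is not automatically the best choice for the problem statement.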
It was a day full of fun and learning. Thank you WiDS Pune and ezest for organizing such an amazing workshop.