What is cross-validation, and what are its types?
Cross-Validation:
Cross-validation is a technique in machine learning.
It is used to evaluate how well a predictive model performs on data it was not trained on.
The dataset is divided into subsets (folds).
The model is trained on some folds and tested on the remaining folds.
This process is repeated several times with a different fold held out each time (k times in k-fold cross-validation), so the performance estimate does not depend on a single split.
Cross-validation helps detect overfitting and provides a better estimate of how well the model generalizes.
Types of Cross-Validation:
Leave-One-Out Cross-Validation (LOOCV):
The dataset is partitioned into N subsets, where N is the number of data points, so each subset holds exactly one data point.
One data point is tested in each cycle while the other N-1 data points are used for training.
Each data point serves as the test set exactly once, so the model is trained and evaluated N times.
It is helpful when you have little data or require a detailed assessment of how well your model generalizes. However, it becomes computationally expensive on a huge dataset, because the model must be retrained once for every data point.
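The LOOCV cycle described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name leave_one_out_splits, the toy target values, and the "predict the mean of the training targets" model are all hypothetical stand-ins for a real dataset and a real model.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, one pair per data point."""
    for test_idx in range(n_samples):
        train_idx = [i for i in range(n_samples) if i != test_idx]
        yield train_idx, test_idx


# Hypothetical toy targets; the "model" simply predicts the training mean.
y = [2.0, 4.0, 6.0, 8.0]

squared_errors = []
for train_idx, test_idx in leave_one_out_splits(len(y)):
    prediction = sum(y[i] for i in train_idx) / len(train_idx)
    squared_errors.append((y[test_idx] - prediction) ** 2)

# One error per data point, averaged into a single LOOCV score.
loocv_mse = sum(squared_errors) / len(squared_errors)
print(len(squared_errors), loocv_mse)
```

The loop runs once per data point, which is why the cost grows with the size of the dataset.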
k-Fold Cross-Validation:
The dataset is partitioned into k equal-sized subsets for k-fold cross-validation.
Each fold is utilized as the test set exactly once while the other k-1 folds are used for training. The model is trained and tested k times.
Let’s use an example where we divide a dataset of 12 records into three equal portions.
Data points 1, 2, 3, and 4 in Fold 1
Data points 5, 6, 7, and 8 in Fold 2
Data points 9, 10, 11, and 12 in Fold 3
1st iteration:
Training set: Folds 2 and 3
Testing set: Fold 1
The model is trained on data points 5, 6, 7, 8, 9, 10, 11, and 12, and tested on data points 1, 2, 3, and 4.
2nd iteration:
Training set: Folds 1 and 3
Testing set: Fold 2
The model is trained on data points 1, 2, 3, 4, 9, 10, 11, and 12, and tested on data points 5, 6, 7, and 8.
3rd iteration:
Training set: Folds 1 and 2
Testing set: Fold 3
The model is trained on data points 1, 2, 3, 4, 5, 6, 7, and 8, and tested on data points 9, 10, 11, and 12.
You then average the accuracy across the three iterations to obtain an overall assessment of the model’s performance.
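The 12-record, 3-fold example can be reproduced with a short sketch. The helper k_fold_splits is a hypothetical name written for this illustration, and it assumes contiguous folds with no shuffling, which matches the example but is not the only way to form folds.

```python
def k_fold_splits(n_samples, k):
    """Yield (train, test) lists of 0-based indices using contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


records = list(range(1, 13))  # data points 1..12, as in the example

splits = []
for train_idx, test_idx in k_fold_splits(len(records), k=3):
    train = [records[i] for i in train_idx]
    test = [records[i] for i in test_idx]
    splits.append((train, test))
    print("train:", train, "test:", test)
```

In a real workflow the model would be fitted on each training list and scored on the matching test list, and the three scores would then be averaged. A library routine such as scikit-learn's KFold would usually replace this helper.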
Stratified Cross-Validation:
Stratified cross-validation ensures that every fold keeps roughly the same class distribution as the full dataset.
It is advantageous for imbalanced datasets where one class has a disproportionately low number of examples.
Stratification helps produce more representative performance estimates for classification models.
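One simple way to obtain stratified folds is to group the indices by class and deal each group out to the folds in turn. The sketch below assumes that approach; the function name stratified_k_fold and the "neg"/"pos" labels are hypothetical, and it does not shuffle the data, which a practical implementation normally would.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Return k folds of indices, each with a similar class mix."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the members of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return [sorted(fold) for fold in folds]


# Hypothetical imbalanced labels: 9 negatives and 3 positives.
labels = ["neg"] * 9 + ["pos"] * 3
folds = stratified_k_fold(labels, k=3)

for fold in folds:
    counts = {c: sum(labels[i] == c for i in fold) for c in ("neg", "pos")}
    print(fold, counts)
```

Each fold then serves as the test set once, exactly as in ordinary k-fold cross-validation. Without stratification, a contiguous split of these labels would place all three positives in the last fold.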
Time Series Cross-Validation:
Time series data is a record of observations gathered over time, which lets us track how a phenomenon changes.
The data is divided into training and testing segments on the basis of time, so the model is always trained on earlier observations and tested on later ones.
We begin by training on a small initial portion of the data and forecasting the observations that immediately follow it.
Those observations are then added to the training data, and the model forecasts the next segment.
This process continues until all the data has been consumed.
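The growing training window can be sketched as an index generator. This is a minimal illustration, assuming an expanding-window scheme; the function name expanding_window_splits and its parameters initial_train_size and test_size are hypothetical choices made for this example.

```python
def expanding_window_splits(n_samples, initial_train_size, test_size):
    """Yield (train, test) index lists in which training always precedes testing."""
    train_end = initial_train_size
    while train_end + test_size <= n_samples:
        train = list(range(train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        # Fold the segment just tested into the next training window.
        train_end += test_size


ts_splits = list(expanding_window_splits(n_samples=10, initial_train_size=4, test_size=2))
for train, test in ts_splits:
    print("train:", train, "test:", test)
```

Unlike k-fold, the data is never shuffled here, so no observation from the future leaks into training. scikit-learn's TimeSeriesSplit offers a comparable expanding-window split.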