Posts

Showing posts from October, 2011

Sampling strategies for Imbalanced Learning

As discussed in my previous post, imbalanced data poses serious challenges in machine learning. One approach to combating this imbalance is to alter the training set so that it has a more balanced class distribution, allowing the resulting sampled data set to be used with traditional data-mining algorithms. This can be achieved through:
Under-sampling, where the size of the majority class is reduced using techniques such as reducing redundancy or removing boundary candidates.
Over-sampling, where the size of the minority class is increased by adding more candidates that augment the data set.
Hybrid approach, where over-sampling of the minority class is combined with under-sampling of the majority class.

Each of these techniques is discussed below.

Random Over Sampling

In random over-sampling, the minority class instances are duplicated in the data set until a more balanced distribution is reached. As an illustration, consider a data set of 100 it…
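
To make the duplication step concrete, here is a minimal sketch of random over-sampling in plain Python. It is not the worked example from this post: the function name `random_oversample`, the toy data, and the choice to stop only when every class matches the size of the largest class are all assumptions made for illustration. In practice you might stop earlier, at any ratio you consider "more balanced".

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen instances of each smaller class until
    every class has as many instances as the largest class.

    Assumption: full balance is the stopping point; the original text
    only asks for a "more balanced" distribution.
    """
    rng = random.Random(seed)          # seeded for reproducible duplicates
    counts = Counter(labels)
    target = max(counts.values())      # size of the majority class

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original instances belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement; the majority class draws zero extras.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Hypothetical toy data: 18 majority ("neg") and 2 minority ("pos") instances.
X = list(range(20))
y = ["neg"] * 18 + ["pos"] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because the new instances are exact copies, no new information is added to the data set; the classifier simply sees the existing minority instances more often.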