The last 8 months have been really hectic with delivery commitments, which kept me away from my blog. Now that I have some time, I am thinking of reviving my blog posts, which were offline for quite a while because my membership with my hosting provider expired. Until I can make proper arrangements for that, I plan to repost all my old posts here.
In many real-world machine learning problems we see an imbalance in the data, where one class is underrepresented relative to the others. Classifiers trained on such data tend to favor the majority class, which leads to misclassification of minority-class examples. The cost of misclassification is often unknown at learning time and can be very high. We often see this type of imbalanced classification scenario in fraud/intrusion detection, medical diagnosis/monitoring, bioinformatics, text categorization, and so on. To better understand the problem, consider the “Mammography Data Set,” a collection of images acquired from a series of mammography examinations performed on a set of distinct patients. For such a data set, the natural classes that arise are “Positive” or “Negative” for an image representative of a “cancerous” or “healthy” patient, respectively. From experience, one would expect the number of noncancerous patients to greatly exceed the number of cancerous patients; indeed, this data set contains 10,923 “Negative” (majority class) and