Monday, December 8, 2014

Machine Learning Challenges with Imbalanced Data

Abstract:

Applying machine learning algorithms to real-world problems in areas such as fraud and intrusion detection, medical diagnosis and monitoring, bioinformatics, and text categorization, where the classes in the data set are not approximately equally distributed, often results in reduced performance. Imbalance in the class distribution often causes machine learning algorithms to perform poorly on the minority class. The cost of misclassifying the minority class is often unknown at learning time and can be very high. A number of data sampling techniques, predominantly over-sampling and under-sampling, have been proposed to address issues related to imbalanced data, often without discussing exactly how or why such methods work or what underlying issues they address. This paper highlights some of the key challenges in classifying imbalanced data with standard classification techniques. It also discusses some of the prevalent methods for balancing imbalanced data sets and their shortcomings, in search of better methods for handling imbalanced data.
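To make the two sampling strategies named in the abstract concrete, here is a minimal sketch of random over-sampling and random under-sampling in plain Python. The function names, the toy 95/5 data set, and the "normal"/"fraud" labels are assumptions made for illustration; they are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Map each class label to the list of samples carrying it."""
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    return groups


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority samples until all classes match the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = max(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        extras = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extras)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(samples, labels)
    target = min(len(group) for group in groups.values())
    out_samples, out_labels = [], []
    for label, group in groups.items():
        out_samples.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy data set: 95 "normal" records and 5 "fraud" records (made-up values).
samples = list(range(100))
labels = ["normal"] * 95 + ["fraud"] * 5

_, over_labels = random_oversample(samples, labels)
_, under_labels = random_undersample(samples, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

On this toy set, over-sampling leaves both classes with 95 records and under-sampling leaves both with 5. The trade-off is visible in the code: over-sampling only repeats existing minority records, so it adds no new information, while under-sampling throws away most of the majority class.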

Awaiting session recording. Will post it soon.

Intrinsically Motivated Systems

Abstract:

Motivation is a very complex psychological behavior arising out of one's current physiological and psychological state. Motivation in humans is commonly studied in terms of incentive theories. According to human psychology, intrinsic motivation centers on intrinsic rewards, which are considered critical for the development of cognitive intelligence. In that case, can an artificial learning machine be motivated to develop cognitive intelligence? What factors would lead a machine learning system to motivate itself intrinsically? This paper discusses some of these questions based on the latest research in developmental psychology, active learning, neuroscience, adaptive curiosity, and related fields, and examines how that work can be applied to developing intrinsically motivated systems.

Awaiting session recording. Will post it soon.

Effective Means of Handling Curse of Dimensionality

Abstract:

An increase in the number of dimensions of the data decreases the performance of machine learning systems, because each added dimension enlarges the problem space under analysis and makes the data sparse. As the effectiveness of machine learning algorithms relates directly to the volume of the training data, a larger space demands more data for better learning opportunities. To address this challenge, we usually try to drop some of the dimensions, looking for those that are not directly related to the problem under analysis. To reduce dimensions efficiently, we need to answer the question: "What is the ideal dimensionality we can work with without compromising the sensitivity of the dimensions?" This paper outlines the problem of dimensionality not just from the angle of high-dimensional data issues that lead to dimension reduction, but also analyzes how to efficiently balance the dimensions through better data projection techniques for more accurate results.
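The sparsity effect described in the abstract can be observed with a small experiment. The sketch below is an illustration, not the paper's method: it draws random points from a unit hypercube and measures how much farther the farthest point is from a query point than the nearest one. The function name `relative_contrast`, the point count, and the chosen dimensions are all assumptions.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


for dim in (2, 10, 100, 500):
    print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.3f}")
```

As the dimension grows, the contrast shrinks: every point ends up at roughly the same distance from the query, which is one reason distance-based learners need far more data in high-dimensional spaces.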

Awaiting session recording. Will post it soon.

Effective Pattern Identification Model for DDoS Attack Detection

Abstract:

Distributed Denial of Service (DDoS) attacks are one of the major challenges facing the Internet community. Attackers send legitimate packets with frequently changing information from various compromised systems at random and at a very high frequency, rendering the target unresponsive to normal traffic. DDoS attacks are difficult to detect with traditional detection methods and standard Intrusion Detection Systems (IDS). A standard IDS analyzes network traffic or system logs to identify emerging patterns in the traffic. However, because of the randomness of the packet origins, it is difficult to separate true positives, false positives, and normal traffic. This paper proposes a model based on Artificial Neural Networks (ANN) to identify anomalies and detect DDoS patterns. In the proposed system, sets of known characteristic features that can separate attacks from normal traffic are fed to the system to train the ANN. This self-learning system improves with each new attack, as false positives decrease and detection accuracy improves.
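The idea of training a network on characteristic features can be sketched with the simplest possible building block, a single-neuron perceptron. This is not the model proposed in the paper. The two features (`packet_rate`, `source_spread`), their value ranges, and the synthetic data are hypothetical and chosen only so that the two classes are cleanly separable.

```python
import random


def make_traffic(n_per_class, seed=0):
    """Generate synthetic, well-separated feature vectors (illustrative only).

    Each sample is (packet_rate, source_spread), both scaled to [0, 1].
    Label +1 marks attack traffic and -1 marks normal traffic.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_class):
        data.append(((rng.uniform(0.0, 0.3), rng.uniform(0.0, 0.3)), -1))
        data.append(((rng.uniform(0.7, 1.0), rng.uniform(0.7, 1.0)), +1))
    return data


def predict(weights, bias, features):
    """Return +1 (attack) or -1 (normal) for one feature vector."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else -1


def train_perceptron(data, max_epochs=100):
    """Classic perceptron rule: adjust the weights on every misclassified sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            if predict(weights, bias, features) != label:
                weights = [w + label * x for w, x in zip(weights, features)]
                bias += label
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights, bias


data = make_traffic(n_per_class=20)
weights, bias = train_perceptron(data)
accuracy = sum(predict(weights, bias, f) == y for f, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Real traffic is rarely this cleanly separable, which is why the abstract talks about multi-feature sets and a network that keeps improving as new attacks are observed; a single neuron can only draw one straight boundary between the classes.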

Awaiting session recording. Will post it soon.