Posts

Showing posts from 2014

Machine Learning Challenges with Imbalanced Data

Abstract: Machine learning algorithms applied to real-world problems in areas such as fraud and intrusion detection, medical diagnosis and monitoring, bioinformatics, and text categorization often perform worse when the classes in the data set are not approximately equally distributed. Imbalance in the class distribution often causes machine learning algorithms to perform poorly on the minority class. The cost of misclassifying the minority class is often unknown at learning time and can be very high. A number of data sampling techniques, predominantly over-sampling and under-sampling, have been proposed to address issues related to imbalanced data, without discussing exactly how or why such methods work or what underlying issues they address. This paper highlights some of the key challenges in classifying imbalanced data with standard classification techniques. It discusses some of the prevalent methods related
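To make the over-sampling idea mentioned in the abstract concrete, here is a minimal sketch of random over-sampling in plain Python. It is only an illustration, not the method from the paper: the function name `random_oversample` and the toy "normal"/"fraud" data are assumptions made up for this example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical toy data: 8 "normal" samples and 2 "fraud" samples.
samples = [[0.1 * i] for i in range(10)]
labels = ["normal"] * 8 + ["fraud"] * 2

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Random over-sampling only repeats existing minority samples, so it balances the class counts without adding new information. That is one reason the question of why such methods work deserves attention.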

Intrinsically Motivated Systems

Abstract: Motivation is a very complex psychological behavior arising out of one's current physiological and psychological state of affairs. Motivation in humans is always associated or studied with incentive theories. As per human psychology, our intrinsic motivation factors are centered around intrinsic rewards, which are considered critical for the development of cognitive intelligence. In that case, can an artificial learning machine be motivated to develop cognitive intelligence? What are the factors that would lead a machine learning system to motivate itself intrinsically? This paper discusses some of these questions based on the latest research in the fields of developmental psychology, active learning, neuroscience, adaptive curiosity, and others, and looks at how this can be applied to our context of developing intrinsically motivated systems. Awaiting session recording. Will post it soon.

Effective Means of Handling Curse of Dimensionality

Abstract: Increasing the number of dimensions in the data decreases the performance of machine learning systems, because each added dimension enlarges the problem space under analysis and makes the data sparse. As the efficiency of machine learning algorithms relates directly to the volume of data available, a larger space demands more data for better learning opportunities. To address this challenge, most of the time we tend to reduce the data dimensions by searching for dimensions that are not directly related to the problem under analysis. For efficient reduction of dimensions we need to address the question "what is the ideal dimensionality we can work with without compromising the sensitivity of the dimensions?" This paper outlines the problem of dimensionality not just from the angle of issues with high-dimensional data leading to the reduction of dimensions, but also analyses how to efficiently balance the dimensions through better data projection techniques for more accurate resu
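The sparsity argument in the abstract can be illustrated with simple arithmetic. The sketch below assumes each dimension is split into 10 equal bins and asks what fraction of the resulting grid cells a fixed number of samples could occupy at best. The function name `max_occupied_fraction` and the sample count of 1000 are assumptions chosen for illustration, not figures from the paper.

```python
def max_occupied_fraction(n_samples, n_dims, bins_per_dim=10):
    """Upper bound on the fraction of grid cells that can contain
    at least one sample when each dimension is split into bins."""
    cells = bins_per_dim ** n_dims
    return min(1.0, n_samples / cells)


# With a fixed budget of 1000 samples, coverage collapses as dimensions grow.
for n_dims in (1, 2, 3, 6, 9):
    fraction = max_occupied_fraction(1000, n_dims)
    print(f"{n_dims} dimensions: at most {fraction:.7f} of cells occupied")
```

The number of cells grows exponentially with the number of dimensions, while the number of samples stays fixed. That is why the same data set looks sparse once more dimensions are added.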

Effective Pattern Identification Model for DDoS Attack Detection

Abstract: Distributed Denial of Service (DDoS) attacks are one of the major challenges to the Internet community. Attackers send legitimate packets with frequently changing information from various compromised systems at random and at a very high frequency, rendering the target non-responsive to normal traffic. DDoS attacks are difficult to detect with traditional detection methods and standard Intrusion Detection Systems (IDS). A standard IDS analyzes the network traffic or system logs, trying to identify emerging patterns in the network traffic. But due to the randomness of the packet origins, it is difficult to segregate true positives, false positives, and normal traffic. This paper proposes a model based on Artificial Neural Networks (ANN) to identify anomalies and detect DDoS patterns. In the proposed system, sets of known characteristic features, which can separate attacks from normal traffic, are fed to the system to train the ANN. This self-learning system improves with each n
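As a rough illustration of training on characteristic features to separate attack traffic from normal traffic, the sketch below trains a single perceptron, the simplest form of artificial neural network. It is not the model proposed in the paper. The two features (normalised request rate and share of previously unseen source addresses) and all the numbers are hypothetical.

```python
def train_perceptron(features, labels, epochs=20, learning_rate=0.1):
    """Train a single perceptron with the classic error-correction rule."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            activation = sum(w * v for w, v in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted  # 0 when the prediction is already right
            weights = [w + learning_rate * error * v
                       for w, v in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (attack) or 0 (normal) for one feature vector."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


# Hypothetical, already normalised features:
# [request rate, share of previously unseen source addresses]
traffic = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3],
           [0.9, 0.8], [0.8, 0.95], [0.95, 0.7]]
is_attack = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(traffic, is_attack)
print([predict(weights, bias, x) for x in traffic])
```

A single perceptron can only separate classes with a straight line, so real traffic would likely need a multi-layer network and far more features. The training loop still shows the basic idea of a system that adjusts itself from labelled examples.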

CI ready for Hanlon

Good news, Hanlon developers: the Hanlon Git repo is now integrated with Travis CI to perform automated unit tests and code validation. This step improves the current contribution model in a number of ways:

1. All pull requests are automatically tested with RSpec
2. It is easier for contributors to validate a pull request and give it a thumbs-up
3. It improves the overall turnaround time for merging a PR

Setting up Hanlon Development Environment (Update)

This is an update to my previous post, Setting up Hanlon Development Environment. It reflects the latest script changes.

Pre-requisites

Operating System: any standard Linux distribution (Ubuntu LTS 14.04)
Database: MongoDB
Web Server: Trinidad
Dev Tools: JRuby development environment with the necessary gems
Others: git, make, openjdk7, etc.

Setting up

Update your Linux system to get the latest package lists

    apt-get update

1. Install pre-requisites

    apt-get install -y git make mongodb openjdk-7-jre-headless g++ curl

2. Install a Ruby environment with your preferred Ruby environment manager, such as rvm or rbenv (I prefer rvm), and set up JRuby as your default environment

    \curl -sSL https://get.rvm.io | bash -s stable --ruby=jruby
    source /usr/local/rvm/scripts/rvm
    rvm use jruby --default

3. Set up the Hanlon working directory (my preferred location is ~/wspace/hanlon) and clone the Hanlon git repository

    cd
    mkdir wspace
    git clone https://github.com

Hanlon server can now run on Java

The Hanlon server code has been migrated and refactored to run on JRuby so that the Hanlon server can be deployed on Java application servers (JAS). This post outlines the process of creating the hanlon.jar file for JAS deployment.

Dependencies

1. OpenJDK 7
2. JRuby

Hanlon Jar Creation

This assumes the Hanlon git repo is cloned and a suitable dev environment is set up (see Setting up Hanlon Development Environment for further details).

1. Create hanlon.jar

    # go to hanlon home directory
    cd scripts
    ./create_war.sh

create_war.sh downloads the necessary dependencies (xxx_jdbc.jar, Ruby gems, etc.) and includes them in the war file.

2. Deploy war

The created war file can be found under the builds directory of the Hanlon home directory. This file can be deployed onto your favorite Java application server. It has been tested on Tomcat, JBoss, and GlassFish. If you get a chance to test it on other JAS containers, please post your feedback for any improvements

Building Hanlon Micro-Kernel (MK)

I am posting a quick cookbook on creating the Hanlon microkernel for easy reference. Detailed information on how the microkernel is organized and built, along with details on each of the commands listed here, can be found on the Hanlon microkernel wiki on git.

1. Install dependencies

    sudo apt-get install squashfs-tools -y
    sudo apt-get install -y fakeroot
    sudo apt-get install p7zip-full -y
    sudo apt-get install curl -y

2. Install Ruby (I prefer using rvm)

    \curl -sSL https://get.rvm.io | bash -s stable --ruby
    source /home/user/.rvm/scripts/rvm

3. Clone the Hanlon micro-kernel project into your working directory (mine is ~/wspace/hanlon/hanlon-mk)

    cd
    mkdir wspace
    mkdir hanlon
    git clone hanlon-mk
    cd hanlon-mk

4. Create bundle file: This would create a temporary tar file containing all necessary files to complete iso cr

Setting up Hanlon Development Environment

Pre-requisites

Operating System: any standard Linux distribution (Ubuntu LTS 14.04)
Database: MongoDB
Web Server: Trinidad
Dev Tools: JRuby development environment with the necessary gems
Others: git, make, openjdk7, etc.

Setting up

Update your Linux system to get the latest package lists

    apt-get update

1. Install pre-requisites

    apt-get install -y git make mongodb openjdk-7-jre-headless g++ isc-dhcp-server ipxe tftp tftpd curl

2. Install a Ruby environment with your preferred Ruby environment manager, such as rvm or rbenv (I prefer rvm), and set up JRuby as your default environment

    \curl -sSL https://get.rvm.io | bash -s stable --ruby=jruby
    source /usr/local/rvm/scripts/rvm
    rvm use jruby --default

3. Set up the Hanlon working directory (my preferred location is ~/wspace/hanlon) and clone the Hanlon git repository

    cd
    mkdir wspace
    git clone https://github.com/csc/Hanlon.git hanlon

4. Install / update Ruby gems

    cd hanlon
    bundle install
    gem install bundler trinidad

5. Run trinida

Hanlon Announced Today

The CSC Open Source Program launched today with the first production-ready version of Hanlon, a node provisioning solution. It is a major rewrite of the Razor project, which was originally written by Tom McSweeney and Nick Weaver two years ago, with an improved architecture and design. For people not familiar with Razor, it is an automated, policy-driven OS provisioning and node control solution for both bare-metal and virtual machine provisioning. A detailed overview can be found in Nick's blog. Tom McSweeney and Nick Weaver, who originally built Razor during their EMC days, launched it as open source through Puppet Labs, where it grabbed a lot of attention from the community. A detailed history of Razor and the events that led to the birth of Hanlon can be found in Tom McSweeney's Hanlon announcement blog. Coming back to Hanlon, it is released as two open source projects: Hanlon (the web server component to manage Hanlon nodes) and the Hanlon-Microke

Phone's Internet 1,000 Times Faster Than 4G

Yes, I am not joking. Artemis, a startup, has come up with a concept called pCell that uses small reception units instead of big towers. They found a way to use the interference between two towers to strengthen the signal. This interference is the biggest challenge in the current model: it forces mobile providers to position towers a minimum distance apart, which leads to signal strength issues. Moreover, the small pCells also address the issue of bandwidth choking in crowded places.

artemis pcell visualization 1080p 1920x1080 - YouTube
pcell will change the world Sculley & Perlman: Wireless World Is Breaking Open - YouTube
Artemis pCell technology demonstration of the future of wireless networking - YouTube
Discover Artemis Networks' pCell Technology New Thinking - YouTube
pCell academic demonstration at Columbia by Steve Perlman, CEO, Artemis Networks - YouTube
5 things to know about pCell

This is a much needed technology for Internet of Everyth

Java 8 GA Build Ready

Finally, two years, seven months, and eighteen days after the release of JDK 7, production-ready builds of JDK 8 are now available for download. While the what's new list is pretty exhaustive, cloud-centric features are, as we know, nowhere to be seen.

25 amazing social media quotes

Exciting quotes... reposting here: http://www.businessesgrow.com/2014/03/10/20-amazing-social-media-quotes-sxsw-2014/

Sandy Carter, GM Ecosystems and Social Business, IBM

1) “Social media does not change your culture, it reveals it.”
2) “We embed social media inside our processes. Let’s look at our processes and see how we can enhance them with social.”
3) “If all you did to improve your commercial presence was to train your sales people on the importance of influencers … how much more effective could they be?”
4) “We have a constantly-changing portfolio of social media experiments. The first time we tried applying social technologies in a customer service department it became the most productive department in the company.”
5) “In just 200 tweets we can assess and identify 52 different personality traits of a customer. We ran an analysis over 500,000 people and we really nailed this. Think of providing this powerful insight to a retailer. We can see what they value, no