Monday, December 8, 2014

Machine Learning Challenges with Imbalanced Data

Abstract:

Applying machine learning algorithms to real-world problems in areas such as fraud/intrusion detection, medical diagnosis/monitoring, bioinformatics, text categorization and others, where the classes in the data set are not approximately equally distributed, often results in reduced performance. Imbalance in the class distribution often causes machine learning algorithms to perform poorly on the minority class. The cost of misclassifying the minority class is often unknown at learning time and can be far too high. A number of data sampling techniques, predominantly over-sampling and under-sampling, have been proposed to address issues related to imbalanced data, often without discussing exactly how or why such methods work or what underlying issues they address. This paper highlights some of the key challenges of classifying imbalanced data with standard classification techniques. It discusses some of the prevalent methods for balancing imbalanced data sets, and their shortcomings, in a search for better ways to handle imbalanced data.
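
To make the over-sampling idea concrete, here is a minimal Python sketch of random over-sampling, the simplest of the balancing methods mentioned in the abstract. It is only an illustration: the function name random_oversample, the toy "normal"/"fraud" labels and the seed are assumptions made for this example and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original (not yet resampled) data.
        rows = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(rows))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: 8 "normal" rows and 2 "fraud" rows.
X = [[i] for i in range(10)]
y = ["normal"] * 8 + ["fraud"] * 2

X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)
print(balanced_counts)
```

Note that the duplicated rows carry no new information, which is one of the shortcomings of naive over-sampling: a classifier can overfit the few repeated minority examples.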

Awaiting session recording. Will post it soon.

Intrinsically Motivated Systems

Abstract:

Motivation is a very complex psychological behavior arising out of one's current physiological and psychological state. In humans, motivation is almost always studied in association with incentive theories. According to human psychology, our intrinsic motivation factors are centered around intrinsic rewards, which are considered critical for the development of cognitive intelligence. In that case, can an artificially learning machine be motivated to develop cognitive intelligence? What are the factors that would lead a machine learning system to motivate itself intrinsically? This paper discusses some of these questions based on the latest research in the fields of developmental psychology, active learning, neuroscience, adaptive curiosity and others, and looks at how this work can be applied to our context of developing intrinsically motivated systems.

Awaiting session recording. Will post it soon.

Effective Means of Handling Curse of Dimensionality

Abstract:

An increase in the dimensions of the data decreases the performance of machine learning systems, because each added dimension enlarges the problem space under analysis and makes the data sparse. As the efficiency of machine learning algorithms relates directly to the volume of the data available, a larger space demands more data for better learning opportunities. To address this challenge, we usually try to drop some of the data dimensions, looking for those that are not directly related to the problem under analysis. To reduce dimensions efficiently we need to answer the question "what is the ideal dimensionality we can work with without compromising the sensitivity of the dimensions?" This paper outlines the problem of dimensionality not just from the angle of the issues with high-dimensional data that lead to dimension reduction, but also analyses how to efficiently balance the dimensions through better data projection techniques for more accurate results.
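
One way to see the sparsity effect described above is to measure how the nearest and farthest neighbours of a point drift together as dimensions are added. The following Python sketch is an illustration only: the helper relative_contrast, the uniform random data, the point count and the seed are assumptions chosen for this example, not part of the paper.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest for the distances from one
    random query point to n_points random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


# Compare a low-dimensional space with increasingly high-dimensional ones.
contrast_by_dim = {dim: relative_contrast(dim) for dim in (2, 10, 100, 500)}
for dim, contrast in contrast_by_dim.items():
    print(f"dim={dim:4d}  relative contrast={contrast:.3f}")
```

With the same number of points, the contrast shrinks as the dimension grows, so "near" and "far" become harder to tell apart. This is why more data, or fewer and better chosen dimensions, is needed.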

Awaiting session recording. Will post it soon.

Effective Pattern Identification Model for DDoS Attack Detection

Abstract:

Distributed Denial of Service (DDoS) attacks are one of the major challenges facing the Internet community. Attackers send legitimate packets with frequently changing information from various compromised systems, at random and at a very high frequency, rendering the target unresponsive to normal traffic. DDoS attacks are difficult to detect with traditional detection methods and standard Intrusion Detection Systems (IDS). A standard IDS analyzes network traffic or system logs to identify emerging patterns in the traffic, but because of the randomness of the packet origins it is difficult to separate true positives, false positives and normal traffic. This paper proposes a model based on Artificial Neural Networks (ANN) to identify anomalies and detect DDoS patterns. In the proposed system, sets of known characteristic features that can separate attacks from normal traffic are fed to the system to train the ANN. This self-learning system improves with each new attack, as the false positives decrease and detection accuracy improves.
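
As a rough illustration of training on characteristic features, the Python sketch below trains a single logistic neuron, the simplest building block of an ANN, on toy traffic features. It is not the model proposed in the paper: the two features (a scaled packet rate and the fraction of previously unseen source addresses), the sample values and the learning settings are all hypothetical.

```python
import math


def train_neuron(samples, labels, epochs=500, lr=0.5):
    """Train one logistic neuron with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (attack) if the neuron's activation is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Hypothetical features scaled to [0, 1]:
# (packet rate, fraction of previously unseen source addresses)
normal = [[0.1, 0.1], [0.2, 0.15], [0.15, 0.05], [0.3, 0.2]]
attack = [[0.9, 0.8], [0.8, 0.9], [0.95, 0.7], [0.7, 0.85]]
samples = normal + attack
labels = [0] * len(normal) + [1] * len(attack)

weights, bias = train_neuron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)
```

Real traffic is far less cleanly separable than this toy data, which is why the abstract emphasizes choosing features that actually distinguish attacks from normal traffic.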

Awaiting session recording. Will post it soon.

Tuesday, November 25, 2014

CI ready for Hanlon

Good news, Hanlon developers: the Hanlon Git repo is now integrated with Travis CI to perform automated unit testing and code validation. This step improves the current contribution model in a number of ways...

1. All pull requests are automatically tested with RSpec
2. It is easier for contributors to validate a pull request and give it a thumbs-up
3. It improves the overall turnaround time on PR merges

Sunday, September 28, 2014

Setting up Hanlon Development Environment (Update)

This is an update to my previous post, Setting up Hanlon Development Environment. It reflects the latest script changes.

Pre-requisites

Operating System: Any standard Linux distribution (Ubuntu LTS 14.04)
Database: MongoDB
Web Server: Trinidad
Dev Tools: JRuby development environment with necessary gems
Others: git, make, openjdk7, etc.

Setting up

Update your Linux package index so that you get the latest package versions

apt-get update

1. Install pre-requisites

apt-get install -y git make mongodb openjdk-7-jre-headless g++ curl

 

2. Install Ruby with your preferred Ruby environment manager, such as rvm or rbenv (I prefer rvm), and set up JRuby as your default environment

\curl -sSL https://get.rvm.io | bash -s stable --ruby=jruby
source /usr/local/rvm/scripts/rvm
rvm use jruby --default

 

3. Set up the Hanlon working directory (my preferred location is ~/wspace/hanlon) and clone the Hanlon git repository

cd
mkdir -p wspace
cd wspace
git clone https://github.com/csc/Hanlon.git hanlon

 

4. Install / update ruby gems

cd hanlon
gem install bundler trinidad
bundle install

Hanlon now comes with scripted support both for trinidad and puma. (I prefer running trinidad)

 

5. Run hanlon initialization script

./hanlon_init.rb

This script creates the config files and the directory structure needed to run Hanlon.

 

6. Edit hanlon client and server configuration files

The Hanlon client and server configuration files are now separate, so that each holds only the parameters relevant to it. The files created by hanlon_init (samples included below) are suitable for most practical purposes. Do review them (just in case) before starting the server.

~/wspace/hanlon/cli/config/hanlon_client.config

# This file is the main configuration for ProjectHanlon
#
# -- this was system generated --
#
#
--- !ruby/object:ProjectHanlon::Config::Client
noun: config
admin_port: 8025
api_port: 8026
api_version: v1
base_path: /hanlon/api
hanlon_log_level: Logger::ERROR
hanlon_server: 192.168.190.11
http_timeout: 60

~/wspace/hanlon/cli/config/hanlon_server.config

#
# This file is the main configuration for ProjectHanlon
#
# -- this was system generated --
#
#
--- !ruby/object:ProjectHanlon::Config::Server
noun: config
admin_port: 8025
api_port: 8026
api_version: v1
base_path: /hanlon/api
daemon_min_cycle_time: 30
force_mk_uuid: ''
hanlon_log_level: Logger::ERROR
hanlon_server: 192.168.190.11
hnl_mk_boot_debug_level: Logger::ERROR
hnl_mk_boot_kernel_args: ''
image_path: /home/user/wspace/hanlon/image
ipmi_password: ''
ipmi_username: ''
ipmi_utility: ''
mk_checkin_interval: 60
mk_checkin_skew: 5
mk_gem_mirror: http://localhost:2158/gem-mirror
mk_gemlist_uri: /gems/gem.list
mk_kmod_install_list_uri: /kmod-install-list
mk_log_level: Logger::ERROR
mk_tce_install_list_uri: /tce-install-list
mk_tce_mirror: http://localhost:2157/tinycorelinux
node_expire_timeout: 300
persist_dbname: project_hanlon
persist_host: 127.0.0.1
persist_mode: :mongo
persist_password: ''
persist_port: 27017
persist_timeout: 10
persist_username: ''
register_timeout: 120
sui_allow_access: 'true'
sui_mount_path: /docs

 

7. Run trinidad server

cd web
./run-trinidad.sh

Tuesday, August 12, 2014

Hanlon server can now run on Java

The Hanlon server code has been migrated / refactored to run on JRuby so that the Hanlon server can be deployed on Java Application Servers (JAS). This post outlines the process of creating the hanlon.jar file for JAS deployment.

Dependencies

1. OpenJDK 7
2. JRuby

Hanlon Jar Creation

This assumes the Hanlon git repo is cloned and a suitable dev environment is set up (see Setting up Hanlon Development Environment for further details).

1. Create hanlon.jar

# go to hanlon home directory
cd scripts
./create_war.sh

create_war.sh downloads the necessary dependencies (xxx_jdbc.jar, ruby gems, etc.) and includes them in the war file.

 

2. Deploy war

The created war file can be found under the builds directory of the Hanlon home directory. It can be deployed onto your favorite Java application server. It has been tested on Tomcat, JBoss and GlassFish. If you get a chance to test it on other JAS containers, please post your feedback so it can be improved.

Monday, August 4, 2014

Building Hanlon Micro-Kernel (MK)

I am posting a quick cookbook for creating the Hanlon microkernel for easy reference. Detailed information on how the microkernel is organized and built, along with details on each of the commands listed here, can be found on the Hanlon microkernel git wiki.

 

1. Install dependencies

sudo apt-get install squashfs-tools -y
sudo apt-get install -y fakeroot
sudo apt-get install p7zip-full -y
sudo apt-get install curl -y

 

2. Install Ruby (I prefer using rvm)

\curl -sSL https://get.rvm.io | bash -s stable --ruby
source /home/user/.rvm/scripts/rvm

 

3. Clone the Hanlon microkernel project into your working directory (mine is ~/wspace/hanlon/hanlon-mk)

cd
mkdir -p wspace/hanlon
cd wspace/hanlon
git clone hanlon-mk

4. Create the bundle file: this creates a temporary tar file containing all the files needed to complete the ISO creation process

cd hanlon-mk
./build-bundle-file.sh -d -t test1234 -b additional-build-files/builtin-extensions.lst -m additional-build-files/mirror-extensions.lst

This creates hanlon-microkernel-bundle-<mode>.tar.gz. The mode in the file name depends on the dev/debug/prod switch selected with build-bundle-file.sh.

5. Create the ISO file structure from the tar file

cd bundle_files
tar zxvf hanlon-microkernel-bundle-debug.tar.gz
fakeroot ./build_initial_directories.sh

This creates the directory structure to be used for the microkernel ISO file. build_initial_directories.sh should be run as root. Because I prefer not to install Ruby as root, I use fakeroot to work around the issue.

6. Create the final ISO file

./rebuild_iso.sh

This builds the MK ISO (in my case it is hnl_mk_debug-image.2.0.0+2-g99e078f.iso). The file name follows the convention hnl_mk_<mode>-image.<version>-<git-stamp>.iso

Sunday, May 25, 2014

Setting up Hanlon Development Environment

Pre-requisites

Operating System: Any standard Linux distribution (Ubuntu LTS 14.04)
Database: MongoDB
Web Server: Trinidad
Dev Tools: JRuby development environment with necessary gems
Others: git, make, openjdk7, etc.

Setting up

Update your Linux package index so that you get the latest package versions

apt-get update

1. Install pre-requisites

apt-get install -y git make mongodb openjdk-7-jre-headless g++ isc-dhcp-server ipxe tftp tftpd curl

2. Install ruby environment with your preferred Ruby environment manager like rvm or rbenv (I prefer using rvm) and setup jruby as your default environment

\curl -sSL https://get.rvm.io | bash -s stable --ruby=jruby
source /usr/local/rvm/scripts/rvm
rvm use jruby --default

3. Set up the Hanlon working directory (my preferred location is ~/wspace/hanlon) and clone the Hanlon git repository

cd
mkdir -p wspace
cd wspace
git clone https://github.com/csc/Hanlon.git hanlon

4. Install / update ruby gems

cd hanlon
gem install bundler trinidad
bundle install

5. Run trinidad server

trinidad --address 0.0.0.0 -p 8026 2>&1 | tee /tmp/trinidad.log

Thursday, May 22, 2014

Hanlon Announced Today

The CSC Open Source Program launched today with the first production ready version of Hanlon, a node provisioning solution. It is a major rewrite of the Razor project, which was originally written by Tom McSweeney and Nick Weaver two years ago, with an improved architecture and design.
For people not familiar with Razor, Razor is an automated, policy driven OS provisioning and node control solution for both bare metal and virtual machine provisioning. A detailed overview can be found in Nick's blog. Tom McSweeney and Nick Weaver, who originally built Razor during their EMC days, launched it as open source through Puppet Labs, which grabbed a lot of attention from the community. A detailed history of Razor and the events that led to the birth of Hanlon can be found in Tom McSweeney's Hanlon announcement blog.

Coming back to Hanlon, Hanlon is released as two open source projects: Hanlon (the web server component to manage Hanlon nodes) and the Hanlon-Microkernel (a lightweight Linux kernel built out of Tiny Core Linux to boot and monitor Hanlon nodes). Hanlon and the Hanlon-Microkernel are distributed under the Apache 2.0 and GPLv2 licenses respectively. Please read the Hanlon License for details. Production ready builds are available through the Hanlon and the Hanlon-Microkernel project pages.

Following the Hanlon philosophy --- when you are seeking an explanation or solution to a problem, “Everything should be made as simple as possible, but no simpler”---  Hanlon components are built to be very simple to solve the problem of policy-driven node provisioning, but not simpler in terms of what can be achieved out of it. 

Setting up and running Hanlon is very simple. All the information related to installation, configuration and command line instructions can be found on the Hanlon Wiki. Additional links can be found below.
As one of the contributors I am pretty excited about the release. Will keep posting more about Hanlon in the coming days.


Wednesday, March 19, 2014

Phone's Internet 1,000 Times Faster Than 4G

Yes, I am not joking.

Artemis, a startup, has come up with a concept called pCell that uses small reception units instead of big towers. They found a way to use the interference between towers to strengthen the signal. This interference is the biggest challenge in the current model: it forces mobile providers to keep towers a minimum distance apart, which leads to signal strength issues.

Moreover, the small pCells also address the issue of bandwidth choking at crowded places.

5 things to know about pCell

This is a much needed technology for the Internet of Everything and pervasive computing to flourish.

The future of wireless internet seems bright :)

Java 8 GA Build Ready

Finally, two years, seven months, and eighteen days after the release of JDK 7, production-ready builds of JDK 8 are available for download.

While the what's new list is pretty exhaustive, cloud-centric features are, as we know, nowhere to be seen.

Wednesday, March 12, 2014

25 amazing social media quotes

Exciting Quotes... reposting here
http://www.businessesgrow.com/2014/03/10/20-amazing-social-media-quotes-sxsw-2014/

Sandy Carter GM Ecosystems and Social Business IBM

1) “Social media does not change your culture, it reveals it.”
2) “We embed social media inside our processes. Let’s look at our processes and see how we can enhance them with social.”
3) “If all you did to improve your commercial presence was to train your sales people on the importance of influencers … how much more effective could they be?”
4) “We have a constantly-changing portfolio of social media experiments. The first time we tried applying social technologies in a customer service department it became the most productive department in the company.”
5) “In just 200 tweets we can assess and identify 52 different personality traits of a customer. We ran an analysis over 500,000 people and we really nailed this. Think of providing this powerful insight to a retailer. We can see what they value, not just what they are buying.  We have found a 40-45% increase in sales when you recommend upsales based on values instead of past buying behavior.”
6) “Some CEOs feel like if they ‘opt out of social’ they are somehow protected. That is just crazy.”
7) “Security is a big concern on the social web. People are going to try to destroy social media just like they are trying to breach data in other areas.”
8) “The number one use case for social media among our customers is around innovation – innovating with employees and with customers. For most businesses this is going to deliver the highest ROI.”
9) “We have our own internal version of Klout. We do rate people in this way – their effectiveness on social media. Tying social into a performance measurement works. The productivity of a sales who has an effective social media presence is 3x an employee who is not active on the web.”

Mike Stenberg VP web & infrastructure Siemens

10) “In some ways, social has done too good of a job in the field of marketing. So much attention is there, it has been difficult to get it into other venues because it has been so successful.”
11) “If you have the leadership team be social, it will set an example for others. It can’t flow from the ground up. To get the middle managers involved, it has to be demonstrated from the top.”
12) “Data is a political tool. If you put data on the table it can adjust the way you manage, the way you lead.”

Andrew Bowins SVP External Communications MasterCard

13) “As long as we are not showing the real value of this effort in the language and terms of the business, we’re just teenage social media gurus generating Facebook Likes.”
14) “To drive employee adoption, you need to give them a message to rally around. Something to inspire and create pride. And then you need to take down the internal rules that stifle engagement and creativity.”
15) “We are in era of content pollution. We need to go beyond the world of filling the world with **** content and step back and listen to people and see what we’re really doing to them out there.”
16) “Curating and sharing something meaningful, driving participation — that’s where we need to go. Once we stopped shouting and started creating targeted, meaningful communication, we saw a 400% increase in engagement.”
17) “Look at content exchanges with customers. Use data and content to earn trust by being helpful and informative.”
18) “We need to get beyond ‘click accept.’ We need to have loud voice and encourage our customers to think about what they are doing, where the limits are. If we don’t, we will have practices driven by fear and invite regulation.”
19) “Do you have the right data? If you have bad data you will get bad answers.”

Natanya Anderson Social Media Coordinator – Whole Foods

20) “Creating a social enterprise with hourly employees is incredibly hard. There are labor laws, policies to deal with, personalities and people that are always churning. You always have to wonder, is this a person helping us with social or are they just texting their friends?”
21) “Company culture is the hardest part of the puzzle. After that it is just policy and procedures and that is pretty easy to solve.”
22) “We are always striving to make social customer data real and meaningful at the store level. We spend a lot of time on internal infographics to help explain what we do and why this is important. It’s not just about data. This still takes some human care and insight to make it work.”
23) “Business is about service to our customers. If you don’t put that value first in everything we do, including your presence, your measurement, and analytics you are lost.”
24) “Getting buy-in for social from the C-suite is usually not so difficult. It is the next level of management who present the biggest challenge to enterprise social media. They are actually responsible for the human resources to get the job done.”
25) “To overcome adoption hurdles, you have to make it easy to integrate social into the work employees are already doing. It can’t be a scary commitment – it has to be a natural extension of what you do. We have trained our employees to do their jobs with an eye toward social – if they admire a new product on our shelf, use a camera to give a visual image into the store and all that they love. Make it crazy easy to participate.”