[DL] Digit Recognition with TensorFlow

This time I am going to continue with the Kaggle 101-level competition, digit recogniser, using the deep learning tool TensorFlow.
In the previous post, I used PCA and pooling to reduce the dimensionality of the dataset and trained a linear SVM. Because of the limited efficiency of the R SVM package, I sampled only 500 records and performed a 10-fold cross-validation. The resulting accuracy was about 82.7%.
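To make the pooling step concrete, here is a minimal sketch in plain Python. It assumes 2x2 max pooling on a row-major 28x28 image, which is only one possible choice; the excerpt does not say which pooling variant or window size the earlier post used, and the function name `max_pool_2x2` is made up for illustration.

```python
def max_pool_2x2(pixels, side=28):
    """Downsample a flattened side x side image with 2x2 max pooling.

    Assumption: pixels are stored row by row (row-major order).
    """
    if side % 2 != 0:
        raise ValueError("side must be even for 2x2 pooling")
    if len(pixels) != side * side:
        raise ValueError("expected %d pixel values" % (side * side))
    pooled = []
    for r in range(0, side, 2):
        for c in range(0, side, 2):
            # The four pixels of one non-overlapping 2x2 window.
            block = (
                pixels[r * side + c],
                pixels[r * side + c + 1],
                pixels[(r + 1) * side + c],
                pixels[(r + 1) * side + c + 1],
            )
            pooled.append(max(block))
    return pooled


# Toy image: all black except one bright pixel at row 5, column 7.
image = [0] * 784
image[5 * 28 + 7] = 200
reduced = max_pool_2x2(image)
print(len(reduced))  # 784 features shrink to 14 * 14 = 196
```

Each 2x2 window collapses into one value, so the feature count drops by a factor of four before any model is trained.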

Read More

[RecSys] Implementation of Model Based Recommendation System in R

The most straightforward recommendation systems are either user-based CF (collaborative filtering) or item-based CF, both of which are categorized as memory-based methods. User-based CF recommends products based on the behaviour of similar users, while item-based CF recommends products similar to those the user has already purchased. Whichever method is used, the user-user or item-item similarity matrix, which can be sizable, has to be computed.
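As a rough illustration of why that matrix grows quickly, the sketch below builds an item-item similarity matrix in plain Python. Cosine similarity and the toy `ratings` table are assumptions made for the example; the post itself works in R and may use a different similarity measure.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an item nobody rated is similar to nothing
    return dot / (norm_u * norm_v)


def item_similarity(ratings):
    """Item-item similarity from a user x item rating table (0 = not rated)."""
    n_items = len(ratings[0])
    # One column per item: the ratings that item received from every user.
    columns = [[row[j] for row in ratings] for j in range(n_items)]
    return [
        [cosine(columns[i], columns[j]) for j in range(n_items)]
        for i in range(n_items)
    ]


# Hypothetical data: 3 users (rows) x 3 items (columns).
ratings = [
    [5, 4, 0],
    [4, 5, 0],
    [0, 0, 3],
]
sim = item_similarity(ratings)
print(round(sim[0][1], 3))  # items 0 and 1 are rated alike by the same users
```

The result has one entry per pair of items, so its size grows with the square of the catalogue size; the user-user version grows the same way with the number of users.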

Read More

[exploratory analysis] Job Hunting Like A Data Analyst (Part II)

Continuing from the previous post, I’ve added a few lines of code to fetch the job description of each job post. This takes a bit longer, about 1.5 hours for me, because I set a delay of ~10 seconds between requests.
This week I will continue with an overview of the Data Analyst job market and develop a simple recommender based on skill and experience requirements.

Read More

[vis] Getting Started With Tableau

Intro to Tableau

Inspired by the course ‘Data Visualization’ offered by the University of Illinois on Coursera, I have worked on interactive data visualization using Tableau. A free version, Tableau Public, is available, and you can upload your visualizations online for sharing.
Tableau is one of the Business Intelligence tools that make it easier to plot aesthetic charts and generate interactive reports. There are 3 main components in Tableau: Worksheet, Dashboard and Story.

  • A Worksheet is a single chart or plot
  • A Dashboard is a single page that can be composed of multiple charts or plots
  • A Story is like a PowerPoint presentation in MS Office: it puts a series of pages of charts in sequence.
Read More

[kaggle] Recognize the Digits

This time I am going to demonstrate the Kaggle 101-level competition, digit recogniser. In this competition, we are asked to train a model to recognize digits from pixel data. The data set is available here. Description of the data:

  1. label: the integers from 0 - 9;
  2. features: pixel001-pixel784, which are unrolled from a 28x28 digit image;
  3. pixel values range from 0 to 255, indicating the brightness of the pixel in grey scale;
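To make the layout described above concrete, here is a small sketch in plain Python that turns one data row back into a 28x28 grid and rescales the brightness to the range 0 to 1. It assumes the label sits in the first column and the pixels are stored row by row; the helper name `row_to_image` is made up for illustration.

```python
def row_to_image(row, side=28):
    """Split one data row into (label, side x side grid of values in [0, 1]).

    Assumptions: row[0] is the label and the remaining values are the
    pixels in row-major order, each between 0 and 255.
    """
    label = row[0]
    pixels = row[1:]
    if len(pixels) != side * side:
        raise ValueError("expected %d pixel values" % (side * side))
    image = [
        [p / 255.0 for p in pixels[r * side:(r + 1) * side]]
        for r in range(side)
    ]
    return label, image


# Toy row: label 7, all pixels dark except one fully bright pixel
# at grid position (row 3, column 4).
row = [7] + [0] * 784
row[1 + 3 * 28 + 4] = 255
label, image = row_to_image(row)
print(label, len(image), len(image[0]))
```

Once a row is in grid form it can be handed to any plotting routine to draw the digit.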

Visualize the digit:

Let’s look at 100 randomly chosen digit examples:

Read More