Recent posts

Shiny + shinydashboard + googleVis = Powerful Interactive Visualization

4 minute read

If you are a data scientist who has spent several weeks developing a fantastic model, you’d like to have an equally awesome way to visualize and demo your results. For R users, ggplots are a good option, but no longer sufficient. R Shiny + shinydashboard + googleVis can be a wonderful combination for a quick demo application. For illustration, I downloaded a sample data file, test.csv, from one of Kaggle’s latest competitions: https://www.kaggle.com/c/new-york-city-taxi-fare-pre...

Digit Recognition with TensorFlow

7 minute read

This time I am going to continue with the Kaggle 101-level competition, digit recogniser, using the deep learning tool TensorFlow. In the previous post, I used PCA and pooling to reduce the dimensions of the dataset and trained a linear SVM. Due to the limited efficiency of the R SVM package, I sampled only 500 records and performed 10-fold cross-validation. The resulting accuracy was about 82.7%. This time, with TensorFlow, we can address the problem differently: Deep Lea...

Implementation of Model Based Recommendation System in R

1 minute read

The most straightforward recommendation systems are either user-based CF (collaborative filtering) or item-based CF, which are categorized as memory-based methods. User-based CF recommends products based on the behaviour of similar users, and item-based CF recommends products similar to those the user has purchased. Whichever method is used, the user-user or item-item similarity matrix, which could be sizable, must be computed. In contrast, a model based app...
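To make the size of the similarity matrix concrete, here is a minimal sketch in plain Python. The toy `ratings` data, the choice of cosine similarity, and all function names are assumptions for illustration; they are not taken from the post, which works in R.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Not from the original post.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 4, "b": 2, "c": 1},
    "u3": {"b": 5, "c": 4},
}


def item_vectors(ratings):
    """Transpose user -> item ratings into item -> user ratings."""
    items = {}
    for user, rated in ratings.items():
        for item, score in rated.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(v, w):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v[k] * w[k] for k in v.keys() & w.keys())
    norm = sqrt(sum(x * x for x in v.values())) * sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors):
    """Full pairwise matrix: n vectors produce n * n entries."""
    keys = sorted(vectors)
    return {(p, q): cosine(vectors[p], vectors[q]) for p in keys for q in keys}


user_sim = similarity_matrix(ratings)                # user-user matrix
item_sim = similarity_matrix(item_vectors(ratings))  # item-item matrix
print(len(user_sim), len(item_sim))
```

With 3 users and 3 items each matrix already holds 9 entries; the count grows quadratically with the number of users or items, which is why memory-based methods can become expensive.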

Revisit Titanic Data using Apache Spark

5 minute read

This post mainly demonstrates the pyspark API (Spark 1.6.1) using the Titanic dataset, which can be found here (train.csv, test.csv). Another post analysing the same dataset using R can be found here. Content: Data Loading and Parsing; Data Manipulation; Feature Engineering; Apply Spark ml/mllib models. 1. Data loading & parsing. Data loading: sc is the SparkContext launched together with pyspark. Using sc.textFile, we can read a csv file as text in RDD data format and data is sep...
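The post's own parsing step is cut off in this excerpt, so the following is only a hedged sketch of one way to parse each text line once a CSV file has been read line by line. The helper `parse_line` and the shortened sample rows are assumptions for illustration. It uses Python's standard `csv` module because a plain `split(",")` miscounts fields when a quoted value contains a comma.

```python
import csv


def parse_line(line):
    """Parse a single CSV line into a list of fields, respecting quotes."""
    return next(csv.reader([line]))


# Illustrative lines with a shortened set of columns (an assumption, not the full file).
lines = [
    "PassengerId,Survived,Pclass,Name,Sex,Age",
    '1,0,3,"Braund, Mr. Owen Harris",male,22',
]

header = parse_line(lines[0])
rows = [dict(zip(header, parse_line(line))) for line in lines[1:]]

# A naive split breaks the quoted name into two pieces.
naive = lines[1].split(",")
print(len(naive), len(parse_line(lines[1])))

# With pyspark, the same function could be applied to every line of the RDD:
#   rdd = sc.textFile("train.csv").map(parse_line)
```

The header line would still need to be filtered out of the RDD before converting fields to numeric types.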

Tableau Intersection Filter Tutorial

less than 1 minute read

If you have used Tableau before, you will know that filters in Tableau are union (OR) selections. Take the table below as an example. If you create a filter and select products a and b, Tableau will show clients A, B, C and E instead of A and C. This is because the filter shows the list of clients who purchased product a or b, not product a and b. The idea: first, create a variable to count the selection of products. Then create another variable to count the selection...
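The difference between the two behaviours can be sketched outside Tableau in a few lines of Python. The `purchases` table below is hypothetical, chosen only to match the clients named above, and the counting rule is one plausible reading of the idea, since the excerpt is cut off before the second variable is described.

```python
# Hypothetical purchases, consistent with the example above: client -> products bought.
purchases = {
    "A": {"a", "b"},
    "B": {"a"},
    "C": {"a", "b", "c"},
    "D": {"c"},
    "E": {"b"},
}


def union_filter(purchases, selected):
    """Default filter behaviour: keep clients who bought ANY selected product."""
    return sorted(c for c, prods in purchases.items() if prods & selected)


def intersection_filter(purchases, selected):
    """Keep clients whose count of matched products equals the count selected."""
    n_selected = len(selected)
    return sorted(
        c for c, prods in purchases.items() if len(prods & selected) == n_selected
    )


selected = {"a", "b"}
print(union_filter(purchases, selected))         # ['A', 'B', 'C', 'E']
print(intersection_filter(purchases, selected))  # ['A', 'C']
```

Comparing two counts, rather than testing each product separately, keeps the rule the same no matter how many products are selected.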