Getting Started With Tableau

2 minute read

Intro to Tableau

Aspired by the course ‘Data Visualization’ offered by University of Illinois on Cousera, I have worked on the interactive data visualization using Tableau. There is a free version of Tableau Public is available and you can upload the visualization online for sharing.
Tableau is one of the Business Intelligence tools that makes it easier to do with aesthetic chart plotting and interactive report generating. There are 3 main components used in Tableau: Worksheet, Dashboard and Story.

  • Worksheets are single chart or plot
  • Dashboard is a single page can compose with mupliple charts or plots
  • Story is like powerpoint in MS Office, which put a series of pages of charts in sequence.

There are two kinds of charts that interest me: smart map and bubble chart.
I’m going to make use of the two chart to visual the U.S. Weather data set.

Description of the data set

The data is taken by U.S. Weather Serivce, the link of data set is available here. The comphrensive description of the data is in here.

Data manipulation using R

As Tableau is not ideal for data munging, I would like to do the first-hand pre-processing with R.

head(data[c("EVTYPE",   "LATITUDE"  , "LONGITUDE" , "STATE" ,"BGN_DATE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP" )])
   EVTYPE LATITUDE LONGITUDE STATE           BGN_DATE FATALITIES INJURIES
1 TORNADO     3040      8812    AL  4/18/1950 0:00:00          0       15
2 TORNADO     3042      8755    AL  4/18/1950 0:00:00          0        0
3 TORNADO     3340      8742    AL  2/20/1951 0:00:00          0        2
4 TORNADO     3458      8626    AL   6/8/1951 0:00:00          0        2
5 TORNADO     3412      8642    AL 11/15/1951 0:00:00          0        2
6 TORNADO     3450      8748    AL 11/15/1951 0:00:00          0        6
  PROPDMG PROPDMGEXP
1    25.0          K
2     2.5          K
3    25.0          K
4     2.5          K
5     2.5          K
6     2.5          K

Formatting variable Year, Longitude, Latitude

data$BGN_DATE<-as.Date(data$BGN_DATE,format='%m/%d/%Y %H:%M:%S')
library(lubridate)
data$Year<-year(data$BGN_DATE)
data$LONGITUDE<-(-data$LONGITUDE/100)
data$LATITUDE<-data$LATITUDE/100
head(data[c("EVTYPE",   "LATITUDE"  , "LONGITUDE" , "STATE" ,"Year","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP" )])
   EVTYPE LATITUDE LONGITUDE STATE Year FATALITIES INJURIES PROPDMG
1 TORNADO    30.40    -88.12    AL 1950          0       15    25.0
2 TORNADO    30.42    -87.55    AL 1950          0        0     2.5
3 TORNADO    33.40    -87.42    AL 1951          0        2    25.0
4 TORNADO    34.58    -86.26    AL 1951          0        2     2.5
5 TORNADO    34.12    -86.42    AL 1951          0        2     2.5
6 TORNADO    34.50    -87.48    AL 1951          0        6     2.5
  PROPDMGEXP
1          K
2          K
3          K
4          K
5          K
6          K

Creating Economic Loss = Property Loss + Crop Loss

data$PROPDMGEXP<-transformEXP(data$PROPDMGEXP)
data$CROPDMGEXP<-transformEXP(data$CROPDMGEXP)
data$economicLoss<-with(data,(PROPDMG*(10**PROPDMGEXP)+CROPDMG*(10**CROPDMGEXP)))

Creating Health Impact = Fatalities + Injuries

data$healthImpact<-data$FATALITIES+data$INJURIES
head(data[c("EVTYPE",   "LATITUDE"  , "LONGITUDE" , "STATE" ,"BGN_DATE","healthImpact","economicLoss" )])
   EVTYPE LATITUDE LONGITUDE STATE   BGN_DATE healthImpact economicLoss
1 TORNADO    30.40    -88.12    AL 1950-04-18           15        25000
2 TORNADO    30.42    -87.55    AL 1950-04-18            0         2500
3 TORNADO    33.40    -87.42    AL 1951-02-20            2        25000
4 TORNADO    34.58    -86.26    AL 1951-06-08            2         2500
5 TORNADO    34.12    -86.42    AL 1951-11-15            2         2500
6 TORNADO    34.50    -87.48    AL 1951-11-15            6         2500

Subset the top 5 weather

Too many EVTYPE, let’s look at the top 5 in frequency, after removing the disasters without impact or loss.

data<-data[data$economicLoss!=0 | data$healthImpact!=0,]
freqEVTYPE<-data.frame(table(data$EVTYPE))
freqEVTYPE<-sortDF(freqEVTYPE,freqEVTYPE$Freq)
head(freqEVTYPE)
                 Var1  Freq
423         TSTM WIND 63234
364 THUNDERSTORM WIND 43655
407           TORNADO 39944
134              HAIL 26130
73        FLASH FLOOD 20967
258         LIGHTNING 13293
top5<-as.character(freqEVTYPE[1:5,1])
data<-data[data$EVTYPE %in% top5,]

Select the variables we need

var<-c("EVTYPE",   "LATITUDE"  , "LONGITUDE" , "STATE" ,"Year","healthImpact","economicLoss" )
data<-data[,var]

Output the data.csv

write.csv(data,'data.csv',row.names=F)

Leave a comment