Spatial Visualization with ggmap R package


ggmap is an R package for visualizing data on maps, and it works much like ggplot2. Its output is of class ‘ggplot’, which means it supports layered visualization just like ggplot2. I will demonstrate it with two examples, “Crime in San Francisco” and “Taxi in Porto”. The data for both examples are taken from Kaggle.

Some basics of ggmap:

get_map: the function that downloads a map from a source (e.g. Google Maps or OpenStreetMap).

Some parameters you can play with:

location: the longitude and latitude coordinates, or a place name to be geocoded (e.g. ‘San Francisco’)
maptype: could be ‘terrain’, ‘satellite’, ‘roadmap’, ‘hybrid’ … I usually prefer ‘terrain’.
zoom: from 3 (continental level) to 21 (building level). I usually use 10-14 (city level).

ggmap: the function that plots the map. ggplot2 geom functions can be appended to it to add layers.

Crime in San Francisco

This data set consists of crime records of different categories in San Francisco from 2003 to 2015. Let’s look only at the records from 2014.

head(data_2014[c('Category','DayOfWeek','PdDistrict','X','Y','Year')])

##            Category DayOfWeek PdDistrict         X        Y Year
## 27585 LARCENY/THEFT    Sunday    MISSION -122.4356 37.76080 2014
## 27586 LARCENY/THEFT    Sunday   NORTHERN -122.4336 37.80004 2014
## 27587 VEHICLE THEFT    Sunday    CENTRAL -122.4128 37.78683 2014
## 27588       ASSAULT    Sunday   NORTHERN -122.4217 37.77364 2014
## 27589       ASSAULT    Sunday   NORTHERN -122.4217 37.77364 2014
## 27590       ASSAULT    Sunday   NORTHERN -122.4217 37.77364 2014

The most frequent categories are Larceny/Theft, Other Offenses, Non-Criminal, Assault and Vehicle Theft. Of these, Other Offenses and Non-Criminal are too unspecific to be interesting. Therefore I’m going to plot the top 3 crimes with specific categories, namely Larceny/Theft, Assault and Vehicle Theft.

# count the records per category and sort from most to least frequent
Crime=data.frame(table(data_2014$Category))
Crime=Crime[order(Crime$Freq,decreasing=TRUE),]
head(Crime)

##              Var1  Freq
## 17  LARCENY/THEFT 18901
## 22 OTHER OFFENSES 10290
## 21   NON-CRIMINAL  9669
## 2         ASSAULT  6254
## 35  VEHICLE THEFT  3564
## 34      VANDALISM  3538
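
The plotting code further down filters on a vector called Top3, which this post never defines. Below is a minimal sketch of one way it could be derived from the frequency table. The toy data_2014 and its counts, as well as the names excluded and specific, are assumptions made for illustration and are not the author's code.

```r
# Toy stand-in for data_2014 (made-up counts, for illustration only)
categories <- c("LARCENY/THEFT", "OTHER OFFENSES", "NON-CRIMINAL",
                "ASSAULT", "VEHICLE THEFT", "VANDALISM")
data_2014 <- data.frame(Category = rep(categories, times = c(6, 5, 4, 3, 2, 1)),
                        stringsAsFactors = FALSE)

# Same frequency table as in the post, sorted from most to least frequent
Crime <- data.frame(table(data_2014$Category))
Crime <- Crime[order(Crime$Freq, decreasing = TRUE), ]

# Drop the two unspecific categories, then keep the three most frequent
excluded <- c("OTHER OFFENSES", "NON-CRIMINAL")   # hypothetical name
specific <- Crime[!(Crime$Var1 %in% excluded), ]  # hypothetical name
Top3 <- as.character(head(specific$Var1, 3))
print(Top3)
```

With the real data this should pick Larceny/Theft, Assault and Vehicle Theft, the three categories named above.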

Use ggmap to plot the map and use geom_jitter to plot the crimes in San Francisco:

library(ggmap)
map=get_map(location='San Francisco',maptype='roadmap',zoom=12)

## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false

# Top3 holds the three categories named above; data0 is a data frame not shown in this post
plot_crime=ggmap(map)+
  geom_jitter(aes(X,Y,colour=Category),
    data=data_2014[data_2014$Category %in% Top3,],alpha=0.2)+
  geom_jitter(aes(X,Y,colour=Category),
    data=data0[data0$Category %in% Top3,],alpha=1)+
  labs(x='Longitude',y='Latitude')
plot_crime

[Figure: map of San Francisco with the 2014 Larceny/Theft, Assault and Vehicle Theft records overlaid]

Some Insights:

There were far more Larceny/Theft cases (in green) in 2014 than cases of the other two crimes, and most of them happened near the north coast. Is it because the area is closer to Alcatraz Island? I guess not. The Assault cases (in red) are mostly clustered near the Tenderloin, which is indeed a high-crime neighbourhood in San Francisco according to Wikipedia. Vehicle Theft (in blue) is much less frequent than the other two, and no area stands out as particularly bad.

Taxi in Porto

This data set contains a collection of taxi trajectories in Porto, the second-largest city in Portugal.
Let’s take a look at the data:

head(test[c('TRIP_ID','TAXI_ID','TIMESTAMP','DAY_TYPE','MISSING_DATA')])

##   TRIP_ID  TAXI_ID  TIMESTAMP DAY_TYPE MISSING_DATA
## 1      T1 20000542 1408039037        A        False
## 2      T2 20000108 1408038611        A        False
## 3      T3 20000370 1408038568        A        False
## 4      T4 20000492 1408039090        A        False
## 5      T5 20000621 1408039177        A        False
## 6      T6 20000607 1408037146        A        False

This is interesting data, but we are not going to use it. We will only use the last column of the data set, POLYLINE (the trajectory of the taxi). We define the first location as the pick-up point and the last location as the drop-off point.

Pre-Processing

test$POLYLINE[1]

## [1] "[[-8.585676,41.148522],[-8.585712,41.148639],[-8.585685,41.148855],[-8.58573,41.148927],[-8.585982,41.148963],[-8.586396,41.148954],[-8.586072,41.14872],[-8.586324,41.147847],[-8.586999,41.14746],[-8.586576,41.147154],[-8.584884,41.146623]]"

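# Parse one POLYLINE string, e.g. "[[lon1,lat1],[lon2,lat2],...]",
# into a data frame with one row per recorded point
getCoord=function(x){
  x=gsub('[','',x,fixed=TRUE)    # drop opening brackets
  x=gsub(']','',x,fixed=TRUE)    # drop closing brackets
  x=strsplit(x,',')[[1]]         # flat vector: lon1, lat1, lon2, lat2, ...
  n=length(x)
  lon=as.numeric(x[seq(1,n,2)])  # odd positions are longitudes
  lat=as.numeric(x[seq(2,n,2)])  # even positions are latitudes
  df=data.frame(lon=lon,lat=lat)
  df$status='moving'
  df$status[1]='pickup'          # first point of the trip
  df$status[nrow(df)]='dropoff'  # last point of the trip
  return(df)
}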

After some pre-processing, we transform each POLYLINE into a data frame containing a series of longitude, latitude and status (pick-up/moving/drop-off) values.

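# polyline is not defined in this post; it is assumed to hold the POLYLINE column as character strings
loc=NULL
for (i in seq_along(polyline)){
  loc=rbind(loc,getCoord(polyline[i]))
}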
head(loc)

##         lon      lat status
## 1 -8.585676 41.14852 pickup
## 2 -8.585712 41.14864 moving
## 3 -8.585685 41.14885 moving
## 4 -8.585730 41.14893 moving
## 5 -8.585982 41.14896 moving
## 6 -8.586396 41.14895 moving
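
getCoord above assumes that every trajectory has at least one point. If some trips carry an empty POLYLINE such as "[]" (an assumption about the data, not something shown in this post), a more tolerant variant could look like the sketch below. The name parse_polyline and the regular expression are hypothetical choices for illustration, and the second input string is made up.

```r
# Hypothetical variant of getCoord that returns zero rows for an empty trajectory
parse_polyline <- function(x) {
  # extract every signed decimal number in order: lon1, lat1, lon2, lat2, ...
  nums <- as.numeric(regmatches(x, gregexpr("-?[0-9]+\\.?[0-9]*", x))[[1]])
  if (length(nums) == 0) {
    return(data.frame(lon = numeric(0), lat = numeric(0),
                      status = character(0), stringsAsFactors = FALSE))
  }
  pts <- matrix(nums, ncol = 2, byrow = TRUE)  # one row per (lon, lat) pair
  df <- data.frame(lon = pts[, 1], lat = pts[, 2], status = "moving",
                   stringsAsFactors = FALSE)
  df$status[1] <- "pickup"          # first point of the trip
  df$status[nrow(df)] <- "dropoff"  # last point of the trip
  df
}

# Toy input: the start of the trajectory shown above, plus an empty one
polyline <- c("[[-8.585676,41.148522],[-8.585712,41.148639],[-8.585685,41.148855]]",
              "[]")

# Build the combined data frame in one step instead of growing it in a loop
loc <- do.call(rbind, lapply(polyline, parse_polyline))
print(loc)
```

Collecting the pieces with lapply and a single rbind also avoids copying loc on every iteration, which may matter when there are many trips.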

Similarly, we use ggmap to plot the map of Porto with the taxi trajectories overlaid.

library(ggmap)
map=get_map(location=c(lon=median(loc$lon),lat=median(loc$lat)),
  maptype='roadmap',zoom=13)

## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=41.161833,-8.612033&zoom=13&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false

# centers holds the k-means cluster centres described below (not defined in this post)
plot_taxi=ggmap(map)+labs(x='Longitude',y='Latitude')+
  geom_jitter(aes(lon,lat,colour=factor(status)),data=loc[loc$status=='moving',],alpha=0.4)+
  geom_point(aes(lon,lat,colour=factor(status)),data=centers,size=10,alpha=0.6)+
  geom_jitter(aes(lon,lat,colour=factor(status)),data=loc[loc$status!='moving',],alpha=0.8)
plot_taxi

## Warning: Removed 2616 rows containing missing values (geom_point).

## Warning: Removed 1 rows containing missing values (geom_point).

## Warning: Removed 53 rows containing missing values (geom_point).

[Figure: map of Porto with taxi trajectories, pick-up and drop-off points, and cluster centres overlaid]

Some Insights:

I’ve never been to Portugal, but I would guess that Vitoria is the CBD or an area with many attractions, as the taxi trajectories are very dense in southern Porto, around Vitoria.
There are two main pick-up and drop-off areas, in the northeast and the northwest, which I guess are two residential areas.
You may have noticed some big spots on the map. These are the centres of the pick-up and drop-off clusters, found with simple k-means clustering.
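
The centers data frame used in the plot is not defined in this post. Below is a minimal sketch of how such cluster centres could be computed with base R's kmeans. It assumes that pick-up and drop-off points are clustered separately and that k = 2 clusters per status are wanted; both are assumptions, and the toy loc stands in for the real one. The helper name cluster_centers is hypothetical.

```r
set.seed(42)  # k-means starts from random centres

# Toy stand-in for loc: two well-separated groups of trip end points
loc <- data.frame(
  lon = c(rnorm(20, -8.61, 0.001), rnorm(20, -8.58, 0.001)),
  lat = c(rnorm(20, 41.15, 0.001), rnorm(20, 41.17, 0.001)),
  status = rep(c("pickup", "dropoff"), 20),
  stringsAsFactors = FALSE
)

k <- 2  # assumed number of clusters per status

# Hypothetical helper: run k-means on lon/lat and return the centres
cluster_centers <- function(points, k) {
  km <- kmeans(points[, c("lon", "lat")], centers = k, nstart = 10)
  data.frame(km$centers, row.names = NULL)
}

# One set of centres per status, labelled so colour=factor(status) still works
centers <- do.call(rbind, lapply(c("pickup", "dropoff"), function(s) {
  cbind(cluster_centers(loc[loc$status == s, ], k), status = s)
}))
print(centers)
```

The result has the lon, lat and status columns that the geom_point layer expects. The right value of k for the real data would have to be chosen by inspection.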

The complete R code is available here, if you are interested in checking it out.


Chaoran Liu
