starter-blog/.Rhistory
2021-01-09 17:50:45 +08:00

191 lines
11 KiB
R

library(blogdown)
serve_site()
installr::install.pandoc()
install.packages("htmlwidgets")
serve_site()
install.packages("plotly")
install.packages("plotly")
serve_site()
install.packages("htmlwidgets")
---
title: "Scraping SG's GDP data using SingStat's API"
author: "Timothy Lin"
date: '2017-05-31'
slug: scraping-sg-s-gdp-data-using-singstat-s-api
subtitle: ''
tags:
- Singapore
- SG Economy
- Web-Scraping
- R
- Dashboard
categories: []
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE, cache=TRUE)
```
I have been trying to catch-up on the latest release of Singapore's economic results. Unfortunately, the official press release or media reports are not very useful. They either contain too much irrelevant information or not enough details for my liking. Maybe I just like looking at the numbers and letting the figures speak for themselves. Hence, I decided to obtain the data from the official SingStat's Table Builder website. Turns out that they have an API available that provides data for selected data series including the [GDP figures](http://www.tablebuilder.singstat.gov.sg/publicfacing/initApiList.action).
This makes it easy to obtain the required data without having to navigate through the complicated tree structure of table builder. The best part about having an API is that I can just re-run the same set of codes and produce my own customise GDP report tailored to my liking! For the first technical post of this blog, I decided to do a write-up on using the API to extract GDP figures and generate some graphs in R. A more detailed set of codes is available on [github](https://github.com/timlrx/sg-econ-api).
I use the `httr` package to read in JSON data before manipulating the data in `dplyr` and `reshape2`. `ggplot2` is used for graphing and to add a little interactivity we can also play around with the `plotly` package. First, we have to read in the JSON data and convert it to string format. I decided to read in both the seasonally-adjusted VA figures and raw VA figures as they are used to calculate quarter-on-quarter and year-on-year growth respectively.
```{r setup}
library(httr)
library(reshape2)
library(dplyr)
library(ggplot2)
url_va_sa <- "http://www.tablebuilder.singstat.gov.sg/publicfacing/api/json/title/12411.json"
url_va <- "http://www.tablebuilder.singstat.gov.sg/publicfacing/api/json/title/12407.json"
raw.result <- GET(url=url_va_sa)
names(raw.result)
raw.content <- rawToChar(raw.result$content) # Convert raw bytes to string
raw.content <- content(raw.result)
# levels correspond to the statistics level of aggregation
# 1 - overall economy, 2 - goods / services, 3 - sector level
```
Browsing the raw.content variable, we can see that it is a messy list of lists.
```{r namescontent}
names(raw.content)
```
Level1, Level2, Level3 contain the data we are interested in at various levels of industry aggregation. Let's convert it to a more managable dataframe format and extract the required data. The do.call function along with lapply helps to unpack the list.
```{r unpacklist}
data <- do.call(what = "rbind", args = lapply(raw.content$Level1, as.data.frame))
data.sector <- do.call(what = "rbind", args = lapply(raw.content$Level3, as.data.frame))
data$value <- as.numeric(as.character(data$value))
data.sector$value <- as.numeric(as.character(data.sector$value))
data <- data[,c("quarter","value")]
data.sector <- data.sector[,c("quarter","value","level_3")]
names(data)[names(data) == 'quarter'] <- 'period'
names(data.sector)[names(data.sector) == 'quarter'] <- 'period'
names(data.sector)[names(data.sector) == 'level_3'] <- 'industry'
```
Now we can use the levels data and calculate the quarter-on-quarter seasonally-adjusted annualised growth rate (qoqsaa).
```{r qoqsaa}
data <- data %>% mutate(qoqsaa = ((value/lag(value))^4-1)*100)
data.sector <- data.sector %>% group_by(industry) %>%
mutate(qoqsaa = ((value/lag(value))^4-1)*100) %>% ungroup()
```
We can repeat the process for the non-seasonally-adjusted data and calculate the year-on-year growth rates.[Code omitted due to repetition.]
```{r nonsa_yoy, include=FALSE}
# Non-seasonally adjusted series for YoY calculations
raw.result <- GET(url=url_va)
names(raw.result)
raw.content <- rawToChar(raw.result$content) # Convert raw bytes to string
raw.content <- content(raw.result)
names(raw.content)
# levels correspond to the statistics level of aggregation
# 1 - overall economy, 2 - goods / services, 3 - sector level
# Converts the list of list into a dataframe
va <- do.call(what = "rbind", args = lapply(raw.content$Level1, as.data.frame))
va.sector <- do.call(what = "rbind", args = lapply(raw.content$Level3, as.data.frame))
va$value <- as.numeric(as.character(va$value))
va.sector$value <- as.numeric(as.character(va.sector$value))
va <- va[,c("quarter","value")]
va.sector <- va.sector[,c("quarter","value","level_3")]
names(va)[names(va) == 'quarter'] <- 'period'
names(va.sector)[names(va.sector) == 'quarter'] <- 'period'
names(va.sector)[names(va.sector) == 'level_3'] <- 'industry'
#calculate qoq SA annualised growth rate (qoqsaa)
va<-va %>% mutate(year = as.numeric(substr(period, 1, 4)), quarter = substr(period, 6, 7)) %>%
group_by(quarter) %>% mutate(yoy = (value/lag(value)-1)*100) %>% ungroup()
va.sector<-va.sector %>% mutate(year = as.numeric(substr(period, 1, 4)), quarter = substr(period, 6, 7)) %>%
group_by(quarter, industry) %>% mutate(yoy = (value/lag(value)-1)*100) %>%
ungroup()
# Combine va data from both series
data$industry<-as.factor("Overall Economy")
va$industry<-as.factor("Overall Economy")
va.sector.comb<-merge(rbind(va, va.sector), rbind(data, data.sector), by=c("industry","period"))
names(va.sector.comb)[names(va.sector.comb) == 'value.x'] <- 'va.sa'
names(va.sector.comb)[names(va.sector.comb) == 'value.y'] <- 'va'
va.sector.comb<-arrange(va.sector.comb, year, quarter)
va.sector.comb$industry.short<-va.sector.comb$industry
va.sector.comb$industry.short<-plyr::revalue(va.sector.comb$industry,
c("Overall Economy"="Overall",
"Manufacturing"="Mfg",
"Other Goods Industries"="Other Goods",
"Wholesale & Retail Trade"="Wholesale Retail",
"Transportation & Storage"="Tpt & Storage",
"Accommodation & Food Services"="Acc & Food",
"Finance & Insurance"="F&I",
"Information & Communications"="Infocomms",
"Business Services"="Business Ser",
"Other Services Industries"="Other Ser"))
```
Let's take a look at some plots of the data.
```{r qoqsaplot, fig.cap='QoQ GDP Trend'}
ggplot(data[(nrow(data)-16+1):nrow(data),], aes(period, qoqsaa, group=1)) +
geom_point() + geom_line() + ylab("QoQ SA (%)") + xlab("Period") + theme_classic() +
theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1))
```
And here's a plot comparing qoq and yoy growth for all the industries:
```{r reshapeplot, echo=FALSE}
melt.va.sector.comb <- melt(va.sector.comb[(nrow(va.sector.comb)-11):nrow(va.sector.comb),],
measure.vars=c("yoy","qoqsaa"))
names(melt.va.sector.comb)[names(melt.va.sector.comb) == 'variable'] <- 'series'
melt.va.sector.comb$industry.short <- factor(melt.va.sector.comb$industry.short,
levels=rev(levels(melt.va.sector.comb$industry.short)))
sector.bar<-ggplot(melt.va.sector.comb,aes(industry.short, value)) +
geom_bar(aes(fill=series), stat="identity", position=position_dodge(width=1)) +
ylab("Growth (%)") + xlab("Industry") + theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1),
axis.title.x=element_blank()) +
theme_classic() + coord_flip()
sector.bar
```
Wondering what the actual percentage change is? We can convert the graph into a more interactive one using a simple line of code. And since it is locally generated, it can be hosted on github as well.
```{r sector_bar_plotly, include=FALSE}
library(plotly)
ggplotly(sector.bar)
```
While the precision of our calculated data is satisfying, the scale of the graph is being skewed by one particular sector. The other goods sector (Agriculture and Fishing, Mining and Quarrying) contracted 16.38% yoy (yikes!).^[I manually verified the figures with the official ones due to this huge number.] The other goods sector is normally not given any coverage as it is a very small component of GDP (just 30m). Could the decline be mostly due to noisy fluctuations?
```{r other_goods_plot}
other.goods.data <- va.sector.comb %>% filter(year>=2012 & industry=="Other Goods Industries") %>% select(period, yoy, qoqsaa) %>% melt(measure.vars=c("yoy","qoqsaa"))
other.goods.plot<-ggplot(other.goods.data, aes(period, value, group=variable, color=variable)) + geom_point() + geom_line() + ylab("QoQ SA (%)") + xlab("Period") + theme_classic() + theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1))
```
Not suprisingly there is a downward trend in the sector, but the recent quarter's dip is the sharpest in all 4 years that I have plotted. The drastic decline in quarter-on-quarter VA figures for the finance and insurance industry is of potentially greater concern. Let's take a closer look by plotting an extended time series.
```{r fi, echo=FALSE, fig.cap=c('yoy plot', 'qoq plot')}
fi.data <- va.sector.comb %>% filter(year>=2010 & industry=="Finance & Insurance") %>%
select(period, yoy, qoqsaa, quarter) %>% melt(measure.vars=c("yoy","qoqsaa"))
## Looks like there is some seasonality in the data
ggplot(fi.data[fi.data$variable=="yoy",], aes(period, value,group=1)) +
geom_point() + geom_line() + ylab("YoY (%)") + xlab("Period") + theme_classic() +
theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1))
library(ggrepel)
ggplot(fi.data[fi.data$variable=="qoqsaa",], aes(period, value,group=1)) +
geom_point() + geom_line() + geom_text_repel(aes(label=quarter)) + ylab("QoQ SA (%)") + xlab("Period") + theme_classic() +
theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1))
```
Interestingly, from the yoy data it appears that the contribution of finance and insurance to Singapore's GDP growth is plateauing off. The qoq data is even more suprising showing a strong cyclical pattern since 2014. This is of particular concern as the F&I figures are not seasonally-adjusted. DOS believes that the sector does not exhibit seasonality, but looking at the recent data, a case could be made that the fundamentals of the sector has changed substantially over the past 3 years and one should definitely treat the qoq figures extremely cautiously.
serve_site()
build_site()
serve_site()
build_site()
build_site()
build_site()
build_site()
library(httr)
library(reshape2)
library(dplyr)
library(ggplot2)
url_va_sa <- "http://www.tablebuilder.singstat.gov.sg/publicfacing/api/json/title/12411.json"
url_va <- "http://www.tablebuilder.singstat.gov.sg/publicfacing/api/json/title/12407.json"
raw.result <- GET(url=url_va_sa)
names(raw.result)
raw.content <- rawToChar(raw.result$content) # Convert raw bytes to string
raw.content <- content(raw.result)
names(raw.content)
serve_site()
install.packages("cowplot")
serve_site()
install.packages("DT")
serve_site()
serve_site()
install.packages("kableExtra")
install.packages("kableExtra")
serve_site()
serve_site()
serve_site()
build_dir("static/dashboard")
build_dir("static/dashboard")
ls
library(blogdown)
build_dir("./static/dashboard")
build_dir("./static/dashboard")