Choosing a Control Group in an RCT with Multiple Treatment Periods

Came across a fun little problem over the past few weeks that is related to the topic of policy impact evaluation - a long-time interest of mine! Here’s the setting: we have a large population of individuals and a number of treatments whose effectiveness we want to gauge. The treatments are not necessarily the same but are targeted towards certain sub-segments of the population. Examples of such situations include online ad targeting or marketing campaigns. [Read More]

November Reflections

A collection of thoughts to start the month off. On the blog - Had a look at the Google Analytics data. There are about 750 views in total since the blog’s inception with a few users clocking in 5-10 minutes per post - so thank you for bumping up the stats if you are a regular reader! The most popular post…is the SG dashboard. This was a little surprising. I thought my thesis or any of the mathy stuff would be more interesting but who knows? [Read More]

Notes on Regression - Singular Value Decomposition

Here’s a fun take on the OLS that I picked up from The Elements of Statistical Learning. It applies the Singular Value Decomposition, the method that underlies principal component analysis, to the regression framework. Singular Value Decomposition (SVD): First, a little background on the SVD. The SVD could be thought of as a generalisation of the eigendecomposition. An eigenvector \(v\) of a matrix \(\mathbf{A}\) is a non-zero vector that is mapped to a scaled version of itself: \[ \mathbf{A}v = \lambda v \] where \(\lambda\) is known as the eigenvalue. [Read More]
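As a rough sketch of where this is heading, here is how the SVD ties the eigenvalue equation to the OLS fit. The notation is an assumption borrowed from The Elements of Statistical Learning and may differ from the full post: \(\mathbf{X}\) is an \(N \times p\) design matrix with full column rank, \(\mathbf{U}\) is \(N \times p\) with orthonormal columns, \(\mathbf{V}\) is a \(p \times p\) orthogonal matrix with columns \(v_{j}\), and \(\mathbf{D}\) is diagonal with entries \(d_{j} > 0\).

\[ \mathbf{X} = \mathbf{U}\mathbf{D}\mathbf{V}^{T} \quad\Rightarrow\quad \mathbf{X}^{T}\mathbf{X} = \mathbf{V}\mathbf{D}^{2}\mathbf{V}^{T} \quad\Rightarrow\quad \mathbf{X}^{T}\mathbf{X}\,v_{j} = d_{j}^{2}\,v_{j} \]

\[ \hat{y} = \mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}y = \mathbf{U}\mathbf{D}\mathbf{V}^{T}\,\mathbf{V}\mathbf{D}^{-2}\mathbf{V}^{T}\,\mathbf{V}\mathbf{D}\mathbf{U}^{T}y = \mathbf{U}\mathbf{U}^{T}y \]

So the columns of \(\mathbf{V}\) are eigenvectors of \(\mathbf{X}^{T}\mathbf{X}\) with eigenvalues \(d_{j}^{2}\), and the OLS fitted values are simply the projection of \(y\) onto the columns of \(\mathbf{U}\).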

Mapping SG - Shiny App

While my previous posts on the Singapore census data focused mainly on the distribution of religious beliefs, there are many interesting trends that could be observed on other characteristics. I decided to pool the data which I have cleaned and processed into a Shiny app. Took a little longer than I expected but it is done. Have fun with it and hope you learn a little bit more about Singapore! [Read More]

Comparing the Population and Group Level Regression

I was planning to write a post that uses region level data to infer the underlying relationship at the population level. However, after thinking through the issue over the past few days and working out the math (below), I realised that the question I had in mind could not be answered using the aggregate data at hand. Nonetheless, here is a formal description of the problem outlining the assumptions needed to infer population level trends from more aggregated data. [Read More]

Notes on Regression - Maximum Likelihood

Part 4 in the series of notes on regression analysis derives the OLS formula through the maximum likelihood approach. Maximum likelihood involves finding the values of the parameters that maximise the probability of the observed data under an assumed functional form for the distribution. Bernoulli example: Take for example a dataset consisting of results from a series of coin flips. The coin may be biased and we want to find an estimator for the probability of the coin landing heads. [Read More]
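To make the coin flip example concrete, here is a minimal Python sketch. The data and the function names are made up for illustration and are not taken from the full post. It relies on the standard result that the Bernoulli log-likelihood is maximised at the sample proportion of heads, and checks this against a brute-force search over a grid of candidate probabilities.

```python
import math


def bernoulli_log_likelihood(p, flips):
    """Log-likelihood of heads-probability p, with flips coded 1 (heads) or 0 (tails)."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1 - p)


def bernoulli_mle(flips):
    """Closed-form maximiser of the log-likelihood: the sample proportion of heads."""
    return sum(flips) / len(flips)


# Hypothetical data: 7 heads in 10 flips.
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
p_hat = bernoulli_mle(flips)

# Sanity check: search candidate values 0.01, 0.02, ..., 0.99 for the best one.
grid = [i / 100 for i in range(1, 100)]
best_on_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, flips))

print(p_hat, best_on_grid)
```

The grid search is only there as a check on the closed form; with real data one would use the sample proportion directly.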

Using Leaflet in R - Tutorial

Here’s a tutorial on using Leaflet in R. While the leaflet package supports many options, the documentation is not the clearest and I had to do a bit of googling to customise the plot to my liking. This walkthrough documents the key features of the package which I find useful in generating choropleth overlays. Compared to the simple tmap approach documented in the previous post, creating a visualisation using leaflet gives more control over the final outcome. [Read More]

Examining the Changes in Religious Beliefs - Part 2

In a previous post, I took a look at the distribution of religious beliefs in Singapore. Having compiled additional characteristics across 3 time periods (2000, 2010, 2015), I decided to write a follow-up post to examine the changes across time. The dataset that I will be using is aggregated from the 2000 and 2010 Census as well as the 2015 General Household Survey. [Read More]

Notes on Regression - Method of Moments

Another way of establishing the OLS formula is through the method of moments approach. This method supposedly goes way back to Pearson in 1894. It could be thought of as replacing a population moment with a sample analogue and using it to solve for the parameter of interest. Example 1: To find an estimator for the population mean, \(\mu=E[X]\), one replaces the expected value with its sample analogue, \(\hat{\mu}=\frac{1}{n}\sum_{i=1}^{n} X_{i} = \bar{X}\). [Read More]
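The same idea carries over to a simple regression. The sketch below is an illustration under stated assumptions rather than the derivation in the full post: it assumes the model \(y = a + bx + u\) with moment conditions \(E[u]=0\) and \(E[xu]=0\), replaces both with sample averages, and solves for \(a\) and \(b\). The data and function names are hypothetical.

```python
def mean(values):
    """Sample analogue of an expected value."""
    return sum(values) / len(values)


def mom_simple_regression(x, y):
    """Solve the sample versions of E[u] = 0 and E[x*u] = 0, where u = y - a - b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                   # from the sample version of E[x*u] = 0
    intercept = y_bar - slope * x_bar   # from the sample version of E[u] = 0
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = mom_simple_regression(x, y)

# Both sample moment conditions hold at the solution.
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
moment_u = mean(residuals)
moment_xu = mean([xi * ri for xi, ri in zip(x, residuals)])

print(intercept, slope)
```

The resulting slope and intercept coincide with the usual OLS formulas for a single regressor, which is the point of the exercise.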