An Overview of the Singapore Hiring Landscape

A 360-degree view of the entire job seeking and matching landscape has always been the dream of any labour economist. Just imagine, a dataset of CVs and job seekers matched with job advertisements and openings! The potential of such a dataset to answer existing questions on the labour market is incredible. One could investigate market power between workers and firms, information asymmetry within the matching process, or identify new growth clusters and the skills needed to support them. [Read More]

Visualising Networks in ASOIAF - Part II

This is the second post of a character network analysis of George R. R. Martin’s A Song Of Ice and Fire (ASOIAF) series as well as my first submission to the R Bloggers community. A warm welcome to all readers out there! In my first post, I touched on the tidygraph package to manipulate dataframes and ggraph for network visualisation, as well as some tricks to fix the position of nodes when plotting multiple graphs containing the same node set and to label nodes based on polar coordinates. [Read More]

Visualising Networks in ASOIAF

While waiting for the winds of winter to arrive, there is plenty of time to revisit the 5 books. One of my favourite aspects of the series is the character and world building. As the song of ice and fire universe is so big, many characters are mentioned in passing while the major characters meet each other only occasionally. I thought it would be interesting to see how various characters are connected and how that progresses through the series. [Read More]

Applications of DAGs in Causal Inference

Introduction Two years ago I came across Pearl’s work on using directed acyclic graphs (DAGs) to model the problem of causal inference and read the debate between academics on Pearl’s framework vs Rubin’s potential outcomes framework. At the time, I found it quite intriguing from a scientific methods and history perspective how two different formal frameworks could be developed to solve a common problem. I read a few papers on the DAG approach but, without fully understanding how it could be useful to my work, filed it away in the back of my mind (and computer folder). [Read More]

Feature Selection Using Feature Importance Score - Creating a PySpark Estimator

In this post I discuss how to create a new PySpark estimator to integrate into an existing machine learning pipeline. This is an extension of my previous post where I discussed how to create a custom cross validation function. Recently, I have been looking at integrating existing code into the PySpark ML pipeline framework. A pipeline is a fantastic concept of abstraction since it allows the analyst to focus on the main tasks that need to be carried out and allows the entire piece of work to be reusable. [Read More]

Statistical Musings

No technical details in this post. Just a few scattered thoughts and some stories that have kept me semi-entertained over the last month. Some are inspired by work and others are just my take on the world, exaggerated to some degree. Rademacher Coins As we move towards a cashless society, maybe it would make sense to do away with coins and decimal values. Decimal points make billed amounts and account balances unnecessarily messy. [Read More]

Creating a Custom Cross-Validation Function in PySpark

Introduction Lately, I have been using PySpark in my data processing and modeling pipeline. While Spark is great for most data processing needs, the machine learning component is slightly lacking. Coming from R and Python’s scikit-learn, where there are so many machine learning packages available, this limitation is frustrating. Having said that, there are ongoing efforts to improve the machine learning library, so hopefully there will be more functionality in the future. [Read More]

Uploading Jupyter Notebook Files to Blogdown

I have been working quite a bit with Python recently, using the popular Jupyter Notebook interface. I have been thinking about uploading some of my machine learning experiments and notes to the blog, but integrating Python with a blog built on blogdown seemed problematic as I could not google a solution. Turns out it is actually quite simple (maybe that’s why nobody posted a tutorial on it, or maybe people who blog in R do not really use Python). [Read More]

Notes on Regression - Approximation of the Conditional Expectation Function

The final installment in my ‘Notes on Regression’ series! For a review of ways to derive the Ordinary Least Squares formula as well as various algebraic and geometric interpretations, check out the previous 5 posts: Part 1 - OLS by way of minimising the sum of squared errors; Part 2 - Projection and Orthogonality; Part 3 - Method of Moments; Part 4 - Maximum Likelihood; Part 5 - Singular Value Decomposition [Read More]

February Thoughts

Sorry about the lack of posts over the past few months. I hope to regain some work-life balance and update the blog more regularly. To start off the first blog post of 2018, I thought it would be nice to share some interesting things that I have been reading over the past few weeks and create a to-do list to function as my commitment device. Fun Facts Did you know that the skin color of a cat is heavily determined by a gene located on the X chromosome? [Read More]