February Thoughts

Sorry about the lack of posts over the past few months. I hope to regain some work-life balance and update the blog more regularly. To start off the first blog post of 2018, I thought it would be nice to share some interesting things that I have been reading over the past few weeks and create a to-do list to function as my commitment device.

Fun Facts

Did you know that the coat color of a cat is heavily determined by a gene located on the X chromosome? Another interesting aspect of this gene is that only one copy is active per cell. This creates the spotted and patchwork patterns in female cats, which carry two alleles of the gene, but not in male cats, which carry only one. I came across this trivia while reading The Gene: An Intimate History, which provides a detailed yet comprehensible history of the study of genetics. A very enjoyable read on the evolution of scientific ideas, the brilliance of human creativity and the socio-political elements of the gene.
Actually, cat coat color is not totally random. Randomness does not generate patches. If it were truly random, the cat would look like a scrambled chessboard. There are probably some factors that locally control which particular allele is switched off...
It is hard and computationally intensive to prove that two graphs are isomorphic (whether the graph isomorphism problem can be solved in polynomial time is an unsolved problem), but often easy to show that they are different, for example when they differ in a simple invariant such as the degree sequence. I remember reading the Quanta Magazine article in 2015 about a claimed breakthrough on the problem, but supposedly a flaw was found in the proof, so as far as I know the polynomial-time question is still unresolved.
Recently I have been reading up on using graphlets (small connected subgraphs of a larger network) to measure the similarity between two networks.
Fascinating edible stuff: The physics of bread
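Going back to the two graph items above, here is a rough sketch of how cheap it can be to show that two graphs are different. It compares the degree sequence and the counts of the two 3-node graphlets (triangles and open paths). The function names and the two toy graphs are my own illustration, not taken from any particular paper or library. Note that matching signatures would not prove isomorphism; only a mismatch is conclusive.

```python
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an edge list.

    Isolated nodes are not represented, since they never appear in an edge.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def signature(edges):
    """Return (sorted degree sequence, triangle count, open 3-node path count)."""
    adj = adjacency(edges)
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    triangle_corners = 0
    open_paths = 0
    for nbrs in adj.values():
        # Every pair of neighbours of a node forms a 3-node graphlet centred on it.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                triangle_corners += 1  # closed: each triangle is seen from 3 corners
            else:
                open_paths += 1  # open path: seen only from its middle node
    return degrees, triangle_corners // 3, open_paths


# Two toy graphs with 6 nodes, 6 edges and identical degree sequences.
hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

if __name__ == "__main__":
    print(signature(hexagon))
    print(signature(two_triangles))
    print("provably different:", signature(hexagon) != signature(two_triangles))
```

The degree sequence alone cannot separate these two graphs, but the graphlet counts do, which is roughly the intuition behind using graphlets as a similarity measure.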

To-Do List

Maintain my blog a bit more regularly. I still have a few sets of notes on algebraic graph theory to post as well as one more piece on regression which I plan to write to complete the series.
Read up on data pipelines and workflow integration. Currently I find that a lot of my time is being wasted moving data around various systems and trying to bring a model from development into production. I am sure there are some nice ways to combine data exploration, model building, implementation and evaluation in a unified framework, so this is probably my top priority over the next few months.
Learn more about text analysis. Machine learning advances over the past few years have made tremendous improvements in image recognition capabilities, but text analytics has been lagging behind. Not to mention plenty of man-hours are wasted on mundane text analytics activities (drafting, summarisation, fact checking, etc.). Advancements in this field would truly be the next billion-dollar technological change.
Start a deep learning project. Heard a lot of hype, have a decent knowledge of the theory, now it's time to start playing around with it.
Read up more about randomised controlled trials (A/B testing if you are from the marketing / engineering world). There's the Bayesian side to explore, and issues around early stopping are worth knowing about.
Play around with more datasets! Let me know what you would like to see. Maybe I should do something that is related to the labour market again. Always interesting to see how things change over time.
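On the early stopping point in the list above, here is a small simulation sketch of why it matters. It runs A/A tests (both arms share the same true conversion rate, so any "significant" result is a false positive) and compares checking a two-proportion z-test once at the end against peeking at regular intervals. The function names, sample sizes, conversion rate and the 1.96 threshold are all assumptions chosen for illustration.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms of n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate(n_sims=400, n_per_arm=1000, looks=10, rate=0.1, seed=0):
    """Return (false positive rate with peeking, false positive rate with one final look)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peek_hits = 0
    final_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        any_look_significant = False
        for i in range(1, n_per_arm + 1):
            # Both arms convert at the same true rate: there is no real effect.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peek every `step` users; record whether any look crosses the threshold.
            if i % step == 0 and abs(z_stat(conv_a, conv_b, i)) > 1.96:
                any_look_significant = True
        peek_hits += any_look_significant
        final_hits += abs(z_stat(conv_a, conv_b, n_per_arm)) > 1.96
    return peek_hits / n_sims, final_hits / n_sims


if __name__ == "__main__":
    peeking, final_only = simulate()
    print(f"false positive rate with peeking: {peeking:.3f}")
    print(f"false positive rate, single look: {final_only:.3f}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the single-look rate; the interesting part is how much higher it gets as the number of looks grows.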