
Thesis Thursday 6 - The Final Stretch


Since my presentation about two weeks ago, I have been working on incorporating some of the suggestions and performing additional robustness tests. The updated version of the slides can be found here.

A recap of the key results: I find a positive correlation between the foreign-born share and consumption shares within U.S. counties, but this result does not hold across Asian countries. In fact, an increase in the foreign-born share led to a decline in consumption of Asian-related consumer packaged goods.

Main changes

The main change is that I now use the 1980 NHGIS foreign-born data as an instrument instead of data from the 1970 IPUMS. The 1980 dataset contains many more counties, giving me more power for the analysis. Unfortunately, the countries covered in that survey are not as comprehensive as those in the 1970 IPUMS. I have included an additional table of results using various datasets to show that the results are consistent across the choice of countries and instrument.
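To make the instrumenting idea concrete, here is a toy sketch of a just-identified IV estimate with a single instrument and no controls. It is not the estimation code behind the thesis results: the function names and all the numbers are made up, with `z` standing in for a historical (e.g. 1980) foreign-born share, `x` for the current foreign-born share, and `y` for a consumption share.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def iv_estimate(y, x, z):
    """Just-identified IV estimator with one instrument: cov(z, y) / cov(z, x)."""
    first_stage = covariance(z, x)
    if first_stage == 0:
        raise ValueError("instrument is uncorrelated with the regressor")
    return covariance(z, y) / first_stage


# Hypothetical county-level data (illustrative values only).
z = [0.01, 0.02, 0.05, 0.10]          # historical foreign-born share
x = [2 * zi + 0.01 for zi in z]       # current foreign-born share
y = [0.5 * xi + 0.1 for xi in x]      # consumption share

beta_iv = iv_estimate(y, x, z)
```

With real data one would add controls and proper standard errors through a 2SLS routine; the ratio above only shows where the identifying variation comes from.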

I have included additional tests on the robustness of the expenditure share measure. I received quite a few questions about the way I model food culture as a bag of words and about the accuracy of such a measure. To alleviate those concerns, I include a table showing a sample of products at various weights. I also implemented a few alternative measures, including a majority allocation weighting method (instead of a weighted average) and a version that considers only products with a strong signal (weight > 0.5).
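The three variants can be sketched roughly as follows. The details here are assumptions for illustration, not the thesis implementation: each purchase is taken to be a spend amount plus a hypothetical mapping from country to weight, "majority" assigns the whole spend to the highest-weight country, and "strong" keeps the weighted allocation but drops any weight at or below the threshold.

```python
def expenditure_shares(purchases, method="weighted", threshold=0.5):
    """Return each country's share of total spending.

    purchases: list of (spend, weights), where weights maps country -> weight.
    method: "weighted", "majority", or "strong" (assumed definitions, see text).
    """
    total = sum(spend for spend, _ in purchases)
    if total == 0:
        return {}
    allocated = {}
    for spend, weights in purchases:
        if not weights:
            continue  # product carries no country signal
        if method == "weighted":
            parts = weights
        elif method == "majority":
            top = max(weights, key=weights.get)
            parts = {top: 1.0}  # all spend goes to the top country
        elif method == "strong":
            parts = {c: w for c, w in weights.items() if w > threshold}
        else:
            raise ValueError(f"unknown method: {method}")
        for country, weight in parts.items():
            allocated[country] = allocated.get(country, 0.0) + spend * weight
    return {country: value / total for country, value in allocated.items()}


# Made-up basket: two products with a country signal, one without.
basket = [
    (10.0, {"japan": 0.8, "china": 0.2}),
    (20.0, {"china": 0.6, "japan": 0.4}),
    (70.0, {}),
]
weighted = expenditure_shares(basket, "weighted")
majority = expenditure_shares(basket, "majority")
strong = expenditure_shares(basket, "strong")
```

Comparing the three dictionaries for the same basket shows how sensitive a country's share is to the allocation rule.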

New tests of heterogeneity have also been included. I tried interacting the share of foreign-born with a dissimilarity index (to capture how integrated the migrant community is within the county) and splitting the panel by age, but the results do not change much in either test.
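For readers unfamiliar with the dissimilarity index, a minimal sketch of the standard two-group version is below. It assumes the county is split into sub-areas (census tracts, say) with foreign-born and native-born counts for each; whether the thesis uses exactly this variant or these sub-areas is an assumption on my part.

```python
def dissimilarity_index(foreign_born, native_born):
    """Two-group dissimilarity index for one county.

    D = 0.5 * sum_i |f_i / F - n_i / N|, where i indexes sub-areas,
    f_i and n_i are group counts, and F and N are county totals.
    0 means the groups are evenly spread; 1 means fully separated.
    """
    total_fb = sum(foreign_born)
    total_nb = sum(native_born)
    if total_fb == 0 or total_nb == 0:
        raise ValueError("both groups need a positive county total")
    return 0.5 * sum(
        abs(f / total_fb - n / total_nb)
        for f, n in zip(foreign_born, native_born)
    )


# Made-up tract counts for three hypothetical counties.
even = dissimilarity_index([10, 20], [100, 200])
segregated = dissimilarity_index([10, 0], [0, 50])
mixed = dissimilarity_index([30, 10], [50, 50])
```

The interaction term would then be this county-level index multiplied by the county's foreign-born share.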

The updated presentation features accurate 1970 U.S. maps and plots. Previously, I used existing R packages which make mapping U.S. counties straightforward, but a limitation of those packages is that they do not accurately reflect historical county boundaries.1 The new plots are based on GIS shapefiles available from NHGIS which means that they correspond exactly to the data I am using.2

Computationally, I also streamlined the code so that it runs in half the time, by cutting out repetitive loops and other unnecessary operations. Generating the main dataset for my regressions was simply taking too long.


The results look pretty robust to the many ways I cut the data, but I still plan to run a few more checks. I might use the 2012 Nielsen Consumer Panel data instead of the 2011 data for additional samples.3 I could also include multiple instruments (one for each county) and let the IV procedure choose the optimal weights.

I also want to try aggregating the counties to a broader level, e.g. commuting zones, to account for the possibility that spillover effects extend beyond the county level.
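A rough sketch of that aggregation step, assuming a county-to-zone crosswalk is available: counts are summed within each zone and the share is recomputed from the totals, rather than averaging the county shares. The county and zone identifiers and all the counts below are invented for illustration.

```python
from collections import defaultdict


def aggregate_to_zones(county_rows, crosswalk):
    """Sum county counts within each zone, then recompute the foreign-born share."""
    totals = defaultdict(lambda: {"foreign_born": 0, "population": 0})
    for row in county_rows:
        zone = crosswalk[row["county"]]  # KeyError if a county is unmapped
        totals[zone]["foreign_born"] += row["foreign_born"]
        totals[zone]["population"] += row["population"]
    return {
        zone: t["foreign_born"] / t["population"]
        for zone, t in totals.items()
        if t["population"] > 0
    }


# Hypothetical crosswalk and county counts.
crosswalk = {"county_a": "zone_1", "county_b": "zone_1", "county_c": "zone_2"}
counties = [
    {"county": "county_a", "foreign_born": 10, "population": 100},
    {"county": "county_b", "foreign_born": 50, "population": 900},
    {"county": "county_c", "foreign_born": 5, "population": 50},
]
zone_shares = aggregate_to_zones(counties, crosswalk)
```

Summing counts first matters because a simple average of county shares would give a small county the same weight as a large one.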

It would be nice to find out why the share of consumption related to Asian goods is inversely related to the foreign-born share. My current hypothesis is that migration also expanded the number of restaurants. Since restaurants started by Asian immigrants tend to be in the cheaper range, they satisfy consumers' demand well enough that there is little incentive to cook Asian food at home. I could possibly test for such an effect using Yelp data, but at this stage the additional analysis feels like meandering off to a new topic, and I already have way too much material on hand to write up, so that will probably go on the future paper (if ever) shelf.

  1. There were some ugly holes in the maps from the old presentation but this is finally fixed!
  2. Maybe I should write a separate blog post on GIS mapping in R some time.
  3. I could also pool the various years together and control for year of survey, but I doubt that it would affect my results; it would only add to the time it takes to generate the dataset.