Multilevel Modeling

Posted on July 26, 2022

Neville Dubash

Data can be expensive to obtain. When the data is naturally clustered or hierarchical, multilevel modeling can help you get more out of the data you already have.

Imagine a simple model that predicts whether a Coanda employee cycles to work given their commute distance. The model has a single parameter, ϴ, and predicts that an employee cycles to work if they live less than ϴ km away.

Coanda has offices in multiple cities, so maybe ϴ depends on the office. One option is to pool all employees together (complete pooling). This assumes that every location is identical and that ϴ is the same everywhere. Another option is to treat each office as completely independent (no pooling) and fit a separate model for each office. However, the estimates for offices with few employees will have high variance and can be skewed by a handful of unusual people. Instead, by using a multilevel model with partial pooling, we can let ϴ vary by office while the data from all locations still informs each office's estimate of ϴ. Then, if we have far more data for our Edmonton office than for our Calgary office, the Edmonton data will help inform the Calgary office's ϴ value. This reduces the variance of the Calgary estimate and regularizes against an exceptionally keen Calgary cyclist.
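To make the three options concrete, here is a minimal sketch of partial pooling, not a full multilevel fit. It assumes a simplified setup that is not in the example above: each employee gives one noisy measurement of their office's ϴ (say, the longest commute they would still cycle), the within-office variance `sigma_y2` and between-office variance `sigma_theta2` are known, and the overall mean is approximated by the mean of all observations. The function name `partial_pool`, the variance values, and the numbers in `offices` are all made up for illustration. Under those assumptions, each office's estimate is a precision-weighted average of its own mean and the overall mean.

```python
from statistics import fmean


def partial_pool(groups, sigma_y2, sigma_theta2):
    """Shrink each group's mean toward the overall mean.

    Assumes a normal-normal model with known variances:
    sigma_y2 is the variance of observations within a group,
    sigma_theta2 is the variance of the true group means.
    """
    all_obs = [y for obs in groups.values() for y in obs]
    grand_mean = fmean(all_obs)  # complete-pooling estimate

    estimates = {}
    for name, obs in groups.items():
        group_mean = fmean(obs)           # no-pooling estimate
        w_data = len(obs) / sigma_y2      # precision of the group's own data
        w_prior = 1.0 / sigma_theta2      # precision of the shared information
        estimates[name] = (
            w_data * group_mean + w_prior * grand_mean
        ) / (w_data + w_prior)
    return grand_mean, estimates


# Hypothetical thresholds in km: ten Edmonton employees, two in Calgary,
# one of whom is an exceptionally keen cyclist.
offices = {
    "Edmonton": [4.0, 5.0, 6.0, 5.0, 4.5, 5.5, 5.0, 6.0, 4.0, 5.0],
    "Calgary": [5.0, 25.0],
}

grand_mean, pooled = partial_pool(offices, sigma_y2=25.0, sigma_theta2=4.0)
for name, obs in offices.items():
    print(f"{name}: no pooling {fmean(obs):.2f}, partial pooling {pooled[name]:.2f}")
print(f"complete pooling: {grand_mean:.2f}")
```

The weights do the work here: an office with many employees has a large `w_data` and keeps an estimate close to its own mean, while a small office is pulled more strongly toward the overall mean. A real multilevel model would estimate the variances and the overall mean jointly from the data rather than taking them as given.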


View original post on LinkedIn