
Code from:

 

Household Pulse UI Analytic Code

 

A Note About Variance Estimation for this Project

The US Census recommends using the replicate weights available with each week’s data release for variance estimation with Household Pulse Survey data, using a balanced repeated replication (BRR) strategy. We anticipated using this strategy when we began this project. However, for reasons explained below, we ultimately decided that this strategy was not the best fit for our research questions.
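As a rough illustration of how replicate-weight variance estimation works in general, the sketch below re-estimates a weighted mean under each set of replicate weights and sums the squared deviations from the full-sample estimate. It is not the Census Bureau's code or the code used in this project. The function names, the toy data, and the `scale` argument are assumptions made for illustration; the correct scale factor depends on the replication method and should be taken from the survey's technical documentation.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using survey weights `weights`."""
    total_weight = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total_weight


def replicate_variance(values, full_weights, replicate_weights, scale):
    """Replicate-weight variance of a weighted mean.

    replicate_weights: list of weight vectors, one per replicate.
    scale: multiplier on the sum of squared deviations. It depends on the
           replication method (1/R for standard BRR with R replicates), so
           it is left as an argument rather than hard-coded.
    """
    theta = weighted_mean(values, full_weights)
    squared_devs = [
        (weighted_mean(values, rep) - theta) ** 2 for rep in replicate_weights
    ]
    return scale * sum(squared_devs)


# Toy data (hypothetical): 4 responses and 2 replicate weight sets.
y = [1.0, 0.0, 1.0, 1.0]
w = [1.0, 1.0, 1.0, 1.0]
reps = [[2.0, 0.0, 2.0, 0.0], [0.0, 2.0, 0.0, 2.0]]
var_hat = replicate_variance(y, w, reps, scale=1 / len(reps))
```

Because the replicate weights only reweight the same rows, the point estimate from the full-sample weights does not change; only the spread across replicates feeds the variance.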

 

In a survey with a complex sampling design, such as Household Pulse, respondents within a primary sampling unit (PSU) may be more similar to each other than to respondents in other PSUs. This correlation should be accounted for in variance estimation, or else the estimated standard errors may be too low. However, in the Household Pulse survey, there is another important source of correlation: respondents can complete the survey up to 3 times, so the combined data sets we used contain multiple responses per respondent. In general, correlations within respondents are much stronger than correlations within PSUs; for example, a person’s responses in one week and the next are likely to be much more similar than that person’s responses and their neighbor’s in the same week. Although it is possible to account for this correlation using BRR, we were worried that the specific BRR weights provided by the US Census did not account for correlations within individuals across weeks, as these weights did not change when new weeks of data were released, even though the Census could not know a priori who would complete the survey more than once. Because the Census does not release data on PSU or other design characteristics for the Household Pulse survey (to help preserve respondent anonymity), we were forced to choose between a variance estimation strategy that ignored the correlation within PSUs and one that ignored the correlation among responses from the same individual in different weeks.
We chose robust variance estimation (also known as an empirical ‘sandwich’ estimator) with standard errors clustered at the level of the respondent ID as our primary analytic strategy, as we thought the within-respondent correlation was the stronger of the two and thus more important to account for.
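To make the idea of clustering at the respondent level concrete, here is a minimal sketch of a cluster-robust (sandwich) standard error for the simplest possible case, a weighted mean. It is an illustration under stated assumptions, not the project's analytic code: the function name and toy data are hypothetical, and no small-sample correction is applied, which statistical packages typically add.

```python
from collections import defaultdict
import math


def clustered_se_of_weighted_mean(values, weights, cluster_ids):
    """Weighted mean and its cluster-robust standard error.

    Residual contributions are summed within each cluster before squaring,
    so correlated responses from the same cluster are not treated as
    independent pieces of information.
    """
    total_weight = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_weight

    # Sum the weighted residuals (score contributions) within each cluster.
    cluster_scores = defaultdict(float)
    for y, w, cid in zip(values, weights, cluster_ids):
        cluster_scores[cid] += w * (y - mean)

    # Sandwich form: the "bread" is 1 / total_weight on each side and the
    # "meat" is the sum of squared cluster-level scores.
    variance = sum(s ** 2 for s in cluster_scores.values()) / total_weight ** 2
    return mean, math.sqrt(variance)


# Toy data (hypothetical): 2 respondents, each answering 3 times identically.
y = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
w = [1.0] * 6
respondent = ["a", "a", "a", "b", "b", "b"]

mean_c, se_clustered = clustered_se_of_weighted_mean(y, w, respondent)
# Treating every response as its own cluster ignores the repeated measures.
mean_n, se_naive = clustered_se_of_weighted_mean(y, w, list(range(6)))
```

In this toy example the two calls return the same mean, but the respondent-clustered standard error is larger than the one that treats all six responses as independent.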

 

We also had a second reason for selecting robust variance estimation over BRR. Because pre-pandemic income was a key covariate for our analyses but had relatively high missingness, we thought it was important to use multiple imputation to address this. However, our review of the literature at the time we conducted the study indicated that BRR variance estimation was not compatible with multiple imputation, whereas robust variance estimation is. This was another factor in our decision to use robust variance estimation rather than BRR.
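For readers unfamiliar with how multiple imputation and a per-dataset variance estimate fit together, the sketch below applies Rubin's rules: the analysis is run once per imputed dataset, and the resulting estimates and variances are pooled. This is a generic illustration, not the project's code. The function name and the numbers are hypothetical, and the per-imputation variances are assumed to come from whatever variance estimator is used within each imputed dataset (for example, a cluster-robust one).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules.

    estimates: point estimate from each imputed dataset.
    variances: squared standard error from each imputed dataset.
    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical results from 3 imputed datasets.
est = [0.10, 0.12, 0.14]
var = [0.0004, 0.0004, 0.0004]
pooled_est, pooled_se = pool_rubin(est, var)
```

The pooled standard error exceeds any single-dataset standard error here because the between-imputation term reflects the extra uncertainty from the missing values.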

 

To check that our findings were not sensitive to the choice of variance estimation strategy, we also conducted sensitivity analyses in the complete case dataset comparing robust variance estimation to BRR (these were not reported in the published manuscript owing to space limitations with the Research Letter format). We anticipated that the point estimates would be identical between the two approaches (because variance estimation does not affect the point estimates), but that standard errors would be larger for the robust variance estimation strategy than for the BRR strategy (because the within-individual correlation is stronger than the within-PSU correlation, it reduces the effective sample size more, leading to larger standard errors). This is exactly what we found: standard errors were considerably larger with the robust variance estimation strategy than with the BRR strategy. This further suggested to us that the replicate weights may not be accounting for correlations within individuals.
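The link between within-cluster correlation and effective sample size can be sketched with the Kish design effect, 1 + (m - 1) * rho, where m is the cluster size and rho is the intraclass correlation. The numbers below are purely illustrative and are not estimates from the Household Pulse data; the calculation assumes equal cluster sizes and holds cluster size fixed at 3 (the maximum number of responses per respondent) so that only the correlation varies.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations that n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)


# Hypothetical values: 3000 responses in clusters of 3.
n_responses = 3000
n_eff_strong = effective_sample_size(n_responses, 3, icc=0.8)  # strong correlation
n_eff_weak = effective_sample_size(n_responses, 3, icc=0.1)    # weak correlation
```

With the stronger correlation, the same 3000 responses carry the information of far fewer independent observations, which is why the corresponding standard errors are larger.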

 

Taken together, we judged that, for this project, robust variance estimation was the conservative approach, in the sense of providing more protection against Type I error. Since the majority of our analyses suggested a statistically significant association between unemployment insurance and study outcomes even using the robust variance estimation strategy, and since BRR analyses provide identical point estimates with smaller standard errors, we think it is unlikely that the choice of variance estimation strategy had a meaningful impact on the findings of this study.