Very often, I encounter practitioners interested in estimating an “interaction effect,” also known as a “moderation effect” in management. The goal is to determine whether the effect of a certain intervention is influenced by a third factor. For instance, in a marketing context, the effectiveness of advertising (independent variable) on sales performance (dependent variable) might be influenced by demographics or socioeconomic status (moderator).
Since reading Andrew Gelman’s viral blog post on this topic some time ago, I always remind researchers that the number of observations required for a well-powered test of an interaction effect is usually much higher than the number needed to detect a main effect. Gelman’s post seems to have gone viral as a “16 times rule,” but the actual requirement depends on your assumptions.
A low-power test means a low chance of detecting a true effect, which can lead to wasted resources. Moreover, when an underpowered test does reach statistical significance, the estimated effect is likely to be a gross overestimation of the true effect.
To provide a practical understanding of these requirements and to illustrate the issue, my colleague Francisco Sesto and I built a small dashboard based on simulations. This tool allows you to manipulate the magnitudes of the main and interaction effects and visualize the power and the number of observations required.
The dashboard
This dashboard allows you to explore the relationship between sample size and the power to detect: i) a main effect and ii) an interaction effect.
Power is defined as the probability of detecting an effect (i.e., rejecting the null hypothesis) when an actual effect exists.
Generally, the larger the actual effect, the higher the power. Similarly, the larger the sample size, the higher the power. This analysis is typically employed to determine the sample size required to detect a given effect at a specified power. Alternatively, if additional data collection is not possible, understanding the test's power can help identify potential limitations in the analysis.
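To make this concrete, here is a minimal sketch of the standard normal-approximation formula for the sample size needed to detect a main effect in a two-sample comparison of means (the function name and defaults are mine for illustration, not part of the dashboard):

```python
import math

# Two-sided z critical values (normal approximation)
Z_ALPHA = 1.959964   # significance level alpha = 0.05, two-sided
Z_POWER = 0.841621   # target power = 0.80

def n_per_group(effect, sigma=1.0):
    """Observations per arm to detect `effect` at 5% significance and
    80% power, comparing two group means with outcome s.d. `sigma`."""
    return math.ceil(2 * (sigma * (Z_ALPHA + Z_POWER) / effect) ** 2)

# An effect of half a standard deviation needs about 63 observations per arm
n_per_group(0.5)   # -> 63
```

Note that halving the effect size roughly quadruples the required sample, since the sample size scales with the inverse square of the effect.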
Consider a scenario where you want to test the effect of an intervention (T). Define the main effect as the expected difference in the outcome between treated and untreated individuals.
Additionally, suppose you are interested in testing for a larger effect in a specific subgroup (e.g., a larger effect for women than for men). This interaction effect is defined as the difference between the intervention effects for men and for women.
Insights
The reason so many more observations are often needed for interaction effects is that, when measuring such an effect, you are typically:
- Trying to measure a much smaller effect than the main effect (e.g., the expected difference between demographic groups is much smaller than the average effect being studied).
- Dividing the data into more groups (to estimate differences-in-differences), leading to larger standard errors.
Additionally, our dashboard reveals that these requirements are not linear. The "16 times" rule is a rough guideline and is highly sensitive to the relative sizes of the effects you aim to capture.
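The arithmetic behind the “16 times” figure can be sketched directly (a back-of-the-envelope calculation, not the dashboard’s simulation): the interaction estimate is built from four cells instead of two, which doubles its standard error, and the required sample size scales with the square of (standard error / effect size):

```python
def required_n_factor(effect_ratio, se_ratio=2.0):
    """How much larger the sample must be for the interaction test.

    se_ratio: the interaction's standard error relative to the main
      effect's; it defaults to 2 because the difference-in-differences
      averages over four cells instead of two.
    effect_ratio: main effect size divided by interaction effect size.
    Required n scales with (SE / effect)^2, hence the squared product.
    """
    return (se_ratio * effect_ratio) ** 2

# Gelman's headline case: interaction half the size of the main effect
required_n_factor(2.0)   # -> 16.0
```

If the interaction is instead, say, a quarter of the main effect, the factor jumps to 64, which is why the rule is so sensitive to the assumed effect sizes.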
By understanding these nuances and using tools like our dashboard, practitioners can better plan their studies to ensure sufficient power for detecting interaction effects. This not only improves the reliability of their findings but also optimizes resource allocation.
Technical Notes
The objective of this dashboard is to provide an approximation of sample size requirements under a set of basic assumptions.
The outcome variable is assumed to be normally distributed.
For simplicity, the sizes of the two subgroups in the population are assumed to be equal.
The parameters are defined in the following regression model:

Y = β₀ + β₁ T + β₂ W + β₃ (T × W) + ε

where W is a dummy variable indicating the subgroup, such as 1 for men and 0 for women, β₁ is the main effect, and β₃ is the interaction effect. Power was computed assuming 5% significance and 1,000 iterations per scenario.
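Under these assumptions, a simulation along these lines could look like the following sketch (the function name, defaults, and the use of a known-σ z-test are my simplifications, not necessarily what the dashboard implements):

```python
import numpy as np

Z_CRIT = 1.959964  # two-sided critical value at 5% significance

def simulate_power(n=200, b_main=0.5, b_int=0.25, sigma=1.0,
                   sims=1000, seed=42):
    """Monte Carlo power for the main and interaction effects.

    Assumes a normally distributed outcome, balanced treatment (T),
    equal-sized subgroups (W), and a z-test with known sigma.
    """
    rng = np.random.default_rng(seed)
    q = n // 4  # observations per T-by-W cell
    hits_main = hits_int = 0
    for _ in range(sims):
        y00 = rng.normal(0.0, sigma, q)             # T=0, W=0
        y10 = rng.normal(b_main, sigma, q)          # T=1, W=0
        y01 = rng.normal(0.0, sigma, q)             # T=0, W=1
        y11 = rng.normal(b_main + b_int, sigma, q)  # T=1, W=1
        # Main effect: treated minus control means, SE = 2*sigma/sqrt(n)
        est_main = (y10.mean() + y11.mean()) / 2 - (y00.mean() + y01.mean()) / 2
        se_main = 2 * sigma / np.sqrt(4 * q)
        # Interaction: difference-in-differences, SE = 4*sigma/sqrt(n)
        est_int = (y11.mean() - y01.mean()) - (y10.mean() - y00.mean())
        se_int = 4 * sigma / np.sqrt(4 * q)
        hits_main += abs(est_main / se_main) > Z_CRIT
        hits_int += abs(est_int / se_int) > Z_CRIT
    return hits_main / sims, hits_int / sims
```

With these illustrative defaults (a main effect of 0.5 s.d. and an interaction of 0.25 s.d. on 200 observations), the main-effect test is well powered while the interaction test is not, mirroring what the dashboard shows.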
References
This dashboard was inspired by Andrew Gelman's "You need 16 times the sample size to estimate an interaction than to estimate a main effect" blog post (dated March 15, 2018), which is also discussed in the book "Regression and Other Stories" by Gelman, A., Hill, J., & Vehtari, A. (2021).