The Fundamental Flaws of The Only Meta-Analysis of Social Media Reduction Experiments (And Why It Matters), Part 3
The recent meta-analysis of social media experiments achieved a statistically insignificant result only due to a string of errors.
Introduction from Zach Rausch and Jon Haidt:
In the previous two posts (one and two) in this series, we examined and highlighted numerous problems with a recent meta-analysis on social media reduction experiments and mental health. The study, conducted by Stetson University psychologist Chris Ferguson, is titled Do Social Media Experiments Prove a Link with Mental Health? A Methodological and Meta-Analytic Review. We uncovered significant factual errors, including inaccuracies in effect sizes, sample sizes, and study inclusion/exclusion criteria, which were used to conclude that the impact of social media use on mental health is indistinguishable from zero.
In his paper, Ferguson argued that his meta-analysis “undermines causal claims by some scholars (e.g., Haidt, 2020; Twenge, 2020) that reductions in social media time would improve adolescent mental health.” This claim is problematic for a variety of reasons, but we want to emphasize one of them, as it is a point that has been missed in the debate around our critique of Ferguson’s meta-analysis.
The central causal claim that we have put forth in The Anxious Generation and on this Substack is that heavy social media use causes increases in internalizing disorders (e.g., anxiety and depression) through a variety of mechanisms laid out in chapters 5 and 6 of the book, including increased social comparison, perfectionism, sleep deprivation, social deprivation, behavioral addiction, and emotional contagion. One way of testing this causal theory is to examine the set of experiments in which people were randomly assigned to reduce their social media use for a period long enough to get past withdrawal effects, and to measure changes in symptoms of internalizing disorders. We also predict that effects will be greater for girls than for boys, and greater when the reduction is done as a group rather than individually. See this footnote for quotes where we lay out these claims in The Anxious Generation.1
Any analysis that combines short-term studies (e.g., a one-day reduction), which we predict would cause withdrawal, with multi-week reduction studies, which we predict would lead to benefits, is not a test of our causal theory. Similarly, merging symptoms of internalizing disorders with broader measures of life satisfaction or temporary mood complicates any test of our causal claims, as each of these measures will be affected differently by social media reduction. Ferguson did both of these things, and his findings thus do not “undermine” our causal claims; he failed to accurately test them.
Beyond the misunderstanding of our causal claims, we identified several additional methodological choices that we believe suppressed the real effects of social media use on mental health: Ferguson combined very different kinds of experiments together, such as three-week reduction studies with five-minute lab exposure studies. His creation of composite effect sizes (e.g., averaging together items like loneliness, self-esteem, social connectedness, depression, etc.) lacked a consistent methodology and transparency, making it difficult for other researchers to replicate his work and test the sensitivity of his results to alternative approaches. Together, these flaws render the meta-analysis difficult to replicate and difficult to interpret.
In this third post of our series, David Stein demonstrates that correcting Ferguson’s factual errors changes his results from statistical insignificance to statistical significance. Ferguson’s reliance on confidence intervals that include zero is central to his conclusions, and David will explain why his conclusions are incorrect. In future posts, we will be providing our case for causality, starting our analysis from scratch.
Since our last post, we have been in contact with Ferguson, who has acknowledged that he made some errors in his article, including inaccuracies in sample sizes. He also told us that he will be going through the effect sizes and will “correct those as well as needed.” He has said that he will update his study with corrections in the coming weeks.
[Correction: In the original version of the post, we said that Ferguson acknowledged errors in both effect sizes and sample sizes.]
As we outlined in our previous posts, much hinges on the question of correlation versus causation. Social media companies have consistently dismissed claims—and potential liability—regarding harm reported by parents and teens, arguing that current scientific evidence is merely correlational and does not establish causation. This is why scrutinizing this meta-analysis of experiments is so critical.
A prime example of this dismissal and the false claim of ‘no evidence’ can be seen in a recent interview from September 25th on The Verge's show Decoder. In the conversation, Mark Zuckerberg tells Alex Heath, “That, I think, is the next big fight,” referring to social media and youth mental health. He goes on to say, “I think a lot of the data on this is just not where the narrative is … the majority of high-quality research out there suggests there’s no causal connection at a broad scale between these things.”
And now, here’s David’s post.
– Zach and Jon
[Guest Post from David Stein]
In parts 1 and 2 of The Fundamental Flaws of The Only Meta-Analysis of Social Media Reduction Experiments, Jonathan Haidt and Zach Rausch demonstrated a number of errors and shortcomings in the recent meta-analytic review of social media (SM) experiments by Chris Ferguson.
In this part, we will see that the centerpiece of Ferguson’s arguments — a statistically insignificant result within his random-effects model — would not have been possible without the string of errors demonstrated in Part 2. We will also, in the Appendix, examine some of the ways Ferguson failed to conduct a proper review of the experimental evidence (and why that matters).
Statistical Insignificance
Ferguson selected 27 experiments and then assigned each of them a sample size and an effect size (Cohen’s d). He then analyzed the data using a random-effects model in Jamovi, which produced a weighted average of d = 0.088 with a confidence interval of (-0.018, +0.197).
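Ferguson ran this model in Jamovi, but the computation itself is standard and easy to inspect. Below is a minimal sketch in Python of a DerSimonian-Laird random-effects meta-analysis, a common default estimator; we do not know the exact estimator or settings Ferguson used, so this illustrates the method rather than reproducing his analysis:

```python
import numpy as np
from scipy import stats

def cohens_d_variance(d, n1, n2):
    """Approximate sampling variance of Cohen's d for a two-group comparison."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def random_effects(d, v, alpha=0.05):
    """DerSimonian-Laird random-effects pooling of effect sizes d with
    sampling variances v. Returns (pooled d, CI low, CI high, p-value)."""
    d, v = np.asarray(d, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    d_fixed = np.sum(w * d) / np.sum(w)
    Q = np.sum(w * (d - d_fixed) ** 2)            # heterogeneity statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(d) - 1)) / c)       # between-study variance
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    pooled = np.sum(w_re * d) / np.sum(w_re)
    se = 1.0 / np.sqrt(np.sum(w_re))
    z = stats.norm.ppf(1 - alpha / 2)
    p = 2 * (1 - stats.norm.cdf(abs(pooled) / se))
    return pooled, pooled - z * se, pooled + z * se, p
```

The point to hold onto for what follows is that the pooled estimate is simply a precision-weighted average of the individual d values: a single badly wrong effect size, or a wrong sample size (which changes a study’s variance and hence its weight), moves both the estimate and its confidence interval.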
Ferguson relied greatly on the fact that this confidence interval includes zero, writing in his paper that “mean effect sizes are no different from zero” and then inferring that this “undermines causal claims by some scholars (e.g., Haidt, 2020; Twenge, 2020) that reductions in social media time would improve adolescent mental health.”
Later, Ferguson even declared in a tweet that his finding showed that “reducing social media time has NO impact on mental health.”2
In this post, we will see that Ferguson’s model produced a statistically insignificant result only due to a string of errors and that correcting these errors would change the result from insignificance to significance.
Erroneous Data
In Part 2, we saw that Ferguson’s data contained a number of important errors: two experiments were assigned indefensible effect sizes, three sample sizes were substantially wrong, and Ferguson included several studies that violated his own selection criteria (while failing to include several studies that did fit his criteria).
Each one of these errors biased the analysis in the same direction: toward the negative, thus helping to achieve statistical insignificance.
One of the incorrect effect size assignments was in Lepp 2022 (see Part 2).3
After assigning a conservative effect size4 of d = +0.27 to this experiment (in contrast to Ferguson’s original d = -0.365), the confidence interval ceases to include zero and p is below 0.05:
d = +0.106 (+0.002, +0.210) with p = 0.046
Ferguson’s finding that the confidence interval includes zero is so tenuous that even a single correction of an effect size within his data can switch the insignificance to significance.
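To make that sensitivity concrete, here is a toy illustration using the pooling function sketched above. The six effect sizes and variances below are invented for illustration and are not Ferguson’s data; they merely mimic a pool whose interval barely spans zero because of one badly mis-signed entry:

```python
import numpy as np

# Hypothetical data (NOT Ferguson's 27 studies): five plausible entries
# plus one mis-signed effect size, mimicking the Lepp 2022 situation.
d = np.array([0.15, 0.05, 0.20, -0.365, 0.10, 0.02])
v = np.array([0.02, 0.03, 0.04,  0.02, 0.03, 0.05])

print(random_effects(d, v))   # interval spans zero: (~ -0.18, ~ +0.20)

d[3] = 0.27                   # correct the single mis-signed entry
print(random_effects(d, v))   # interval now excludes zero, p < 0.05
```

A single correction can flip significance whenever the interval only just covers zero, which is exactly the situation in Ferguson’s model.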
If we fix the two erroneous effect sizes (Lepp 2022 and Brailovskaia 2022) as well as the three large sample size errors (Przybylski 2021, Lepp 2022, and the Kleefeld dissertation) and exclude the two studies that violated Ferguson’s own criteria (Deters 2013 and Gajdics 2022),5 the magnitude of the estimated effect increases:
d = +0.146 (+0.048, +0.244)
Including several studies that match Ferguson's inclusion criteria but were left out of his analysis for reasons that are unclear6 — each indicating that SM use is harmful — would increase d substantially further.
The above demonstrates that Ferguson was able to reach an insignificant result only by making a string of errors and that correcting these errors would greatly increase the magnitude of the estimated effect within Ferguson’s model and invalidate his conclusions.
The Impacts of Social Media Reduction
In my initial critique of Ferguson’s meta-analytic review, I pointed out that he did not provide a valid model of the impacts of SM reduction on mental health as theorized by Haidt and Twenge. This matters because Ferguson argues that his study undermines their causal claims.
One of the reasons that his model is not a valid test of the theories of Haidt and Twenge is that Ferguson included many experiments that did not measure outcomes such as depression or anxiety — the mental health disorders that are the main focus of the theories proposed by Haidt and Twenge. The theories of Haidt and Twenge do not blame declines in every imaginable aspect of well-being on social media.7
Let us therefore restrict Ferguson’s model to the 11 studies (out of his original 27) that do measure depression or anxiety. Once we do so, it produces the following effect size estimate and confidence interval:
d = +0.26 (+0.13, +0.40)
Note that none of these 11 studies are SM exposure lab experiments. Instead, these are all SM time reduction field experiments lasting at least one week. The 11 studies that included a measure of anxiety or depression were:
Study | Duration | d
Mahalingham 2023 | 1 week | +0.175
Kleefeld dissertation | 1 week | -0.277
Lambert 2022 | 1 week | +0.797
Tromholt 2016 | 1 week | +0.310
Brailovskaia 2020 | 2 weeks | +0.154
Brailovskaia 2022 | 2 weeks | +0 [incorrect effect size]
Faulhaber 2023 | 2 weeks | +0.484
Hunt 2018 | 3 weeks | +0.232
Thai 2021 | 3 weeks | +0.576
Allcott 2020 | 4 weeks | +0.090
Hunt 2021 | 4 weeks | +0.374
Note: Effect sizes are given as assigned by Ferguson.
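As a quick arithmetic check on the table above, the plain unweighted mean of these 11 effect sizes comes out to roughly +0.26. The random-effects estimate weights studies by precision, so the two numbers need not agree, but their closeness suggests the result is not driven by one heavily weighted study:

```python
# Effect sizes from the table above, as assigned by Ferguson
d_11 = [0.175, -0.277, 0.797, 0.310, 0.154, 0.0,
        0.484, 0.232, 0.576, 0.090, 0.374]
print(sum(d_11) / len(d_11))   # ~0.265, close to the weighted +0.26
```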
Degree of Evidence
It might be tempting to conclude that the above constitutes strong evidence that reducing social media use for at least one week reduces the risk of mental health disorders such as depression and anxiety.
Unfortunately, the model still relies on Ferguson’s notion of ‘composite’ effect sizes, wherein each study is assigned a single effect size that is the average of its impacts on several disparate aspects of well-being selected by Ferguson. Thus one study might be assigned the average of depression and loneliness impacts, while another gets the average of anxiety, life satisfaction, and self-esteem impacts. The interpretation of such composite effect sizes is problematic at best.
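A hypothetical example shows how such composites can dilute the effects that the causal theory is actually about. Suppose a reduction study finds d = +0.50 on depression but only d = +0.05 on loneliness, which (per footnote 7) is expected to move slowly; both numbers are invented for illustration:

```python
d_depression = 0.50   # hypothetical: the outcome the causal theory targets
d_loneliness = 0.05   # hypothetical: expected to improve only slowly
composite = (d_depression + d_loneliness) / 2
print(composite)      # 0.275: the depression effect is nearly halved
```

Averaging in outcomes that the theory predicts will not move in the short run mechanically pulls the composite toward zero.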
Note that Ferguson had to calculate effect sizes for depression and anxiety in order to determine the composite effect size for those studies that contained these outcomes, but he did not include these results in his paper. When asked to provide the effect sizes he calculated for depression and anxiety, Ferguson replied that he is unable to do so. Furthermore, Ferguson even declined to reveal which aspects of well-being he selected for his calculation of each composite effect size (see Part 2 for details).
In his paper, Ferguson did not define a consistent method of determining the component effect sizes, and it appears unlikely, in view of the disparate types of data provided by the various studies, that he achieved any consistency.
In other words, Ferguson’s composite effects are so disparate, fuzzy, and opaque as to provide only very weak evidence for any claims about the mental health outcomes of SM experiments.
Preview of Evidence
There are at least 17 published studies of SM time reduction impacts on depression or anxiety. According to these study reports, out of the 26 impacts on depression or anxiety measured within these studies, the great majority (62%) resulted in statistically significant evidence of a beneficial impact — and none resulted in statistically significant evidence of a detrimental impact.
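That pattern is lopsided in a way a back-of-the-envelope check can make vivid. If reduction truly had no effect, a two-sided test at α = 0.05 would land on a significant benefit only about 2.5% of the time. Treating the 26 measured impacts as independent (they are not fully independent, since several come from the same studies, so this is only a rough check), and noting that 62% of 26 corresponds to 16 impacts:

```python
from scipy.stats import binomtest

# 16 of 26 impacts significantly beneficial; under a no-effect null,
# each lands there by chance only ~2.5% of the time.
result = binomtest(16, n=26, p=0.025, alternative="greater")
print(result.pvalue)   # vanishingly small
```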
Since methods of effect size determination vary among the studies, we are in the process of obtaining sufficient data from each experiment to allow us to apply a consistent methodology across all of them. Once we are done collecting the necessary data, we should be able to provide a robust and reliable evidentiary base for a systematic review of the experiments.
Conclusion
Correcting even just a few of the numerous errors Ferguson made increases the effect size estimate substantially and shifts the result to statistical significance. This is yet another reason why Ferguson’s meta-analytic review should not be used to argue that there’s no experimental evidence of social media use reduction impacts on mental health.
Note: See Perils of Flawed Meta-Analytic Methodology for further notes on methodological flaws of Ferguson’s review.
Regarding withdrawal: See Anxious Generation, p. 134. “If dopamine release is pleasurable, dopamine deficit is unpleasant. Ordinary life becomes boring and even painful without the drug. Nothing feels good anymore, except the drug. The addicted person is in a state of withdrawal, which will go away only if she can stay off the drug long enough for her brain to return to its default state (usually a few weeks).”
Regarding internalizing disorders: In The Anxious Generation as well as in Jon’s earlier writing, we have repeatedly argued that social media’s central effects on mental health are increases in “internalizing disorders,” particularly anxiety and depression. From p. 25 of Anxious Generation: “We found important clues to this mystery by digging into more data on adolescent mental health. The first clue is that the rise is concentrated in disorders related to anxiety and depression, which are classed together in the psychiatric category known as internalizing disorders. These are disorders in which a person feels strong distress and experiences the symptoms inwardly. The person with an internalizing disorder feels emotions such as anxiety, fear, sadness, and hopelessness. They ruminate. They often withdraw from social engagement.”
The tweet, initially posted in April, was recently deleted.
A note by Jon Haidt: In that study, participants who were assigned to spend 30 minutes on social media showed a significant decline in positive affect within 15 minutes, while those who spent the time studying or walking on a treadmill showed significant increases in positive affect. Despite the drop in positive affect, Ferguson assigned Lepp the largest of all 27 negative effect sizes (meaning that exposure to social media was beneficial). How? Because he compared the social media condition to an aversive condition in which participants were required to sit alone in a room and do nothing for 30 minutes. The authors had labeled this a “control condition,” but it is not a true control condition; it is a highly aversive condition (the author, Lepp, agreed about that, by email).
In Lepp, a social media lab exposure experiment, there is no valid control group or condition; in that respect it is similar to Gajdics, where it seems Ferguson simply compared the before and after group means to calculate Cohen’s d. The situation is complicated in Lepp by the availability of outcome measures at the mid-point of the intervention; using a repeated-measures ANOVA would yield an effect size of d = 0.45, but the effect size from the plain change score is d = 0.27. I therefore use the smaller value, i.e., the weaker estimate of harmful SM impact.
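For readers unfamiliar with the plain change-score option mentioned here, it can be computed along the following lines; this is one common convention, not necessarily the exact formula used in any particular study:

```python
import numpy as np

def change_score_d(pre, post):
    """Cohen's d for a before/after contrast, pooling the pre and post SDs.
    The sign convention depends on how the outcome is scored; as written,
    a positive value means scores dropped from pre to post."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    pooled_sd = np.sqrt((np.var(pre, ddof=1) + np.var(post, ddof=1)) / 2)
    return (np.mean(pre) - np.mean(post)) / pooled_sd
```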
Ferguson includes two studies that violate his own inclusion criterion, which was to include only experiments that manipulate time spent on social media. Both Gajdics and Deters violate this criterion. More in Part 2.
In his meta-analysis, Ferguson credits two blog posts as valuable sources he consulted when searching for experimental studies: “Studies identified by previous commentators (e.g., Hanania, 2023; Smith, 2023) as this was valuable locating studies in other fields such as economics.” Smith, however, refers to Hanania for a list of studies, and Hanania explicitly states that the studies he mentions came from our Google doc: “Jonathan Haidt and Jean Twenge have put together a Google doc that contains all studies they can find relevant to whether social media use harms the mental health of young people.”
Furthermore, Ferguson provided only composite effects, which would make sense only if Haidt and Twenge expected all aspects of well-being to improve at the same rate during social media reduction experiments. This is not the case – for example, per the theories of Haidt and Twenge, feelings of loneliness could take considerable time to improve after SM reduction, because it takes a while to develop relationships offline, and loneliness may not improve greatly until other peers also reduce their SM use substantially.