The Fundamental Flaws of The Only Meta-Analysis of Social Media Reduction Experiments (And Why It Matters), Part 1
A recent meta-analysis contains yet overlooks evidence that multi-week social media reduction experiments consistently improve mental health
Introductory Note, September 10th, 2024
We have made a number of significant changes to this post, based on helpful feedback from David Stein, Matthew Jané, and others. The changes include:
We are splitting the series (previously called “The Case For Causality”) into two different series to reduce confusion about the goals of each set of posts. In the first series (posts one, two, and three), we address the fundamental problems with the only available meta-analysis of social media reduction experiments. In the second series, we will show that multi-week social media reduction experiments generally cause improvements in depression and anxiety, and short exposures to putatively harmful features of social media generally cause distress.
In the initial version of this post, we included “type of study” as a moderator. In retrospect, we realize we should have simply stated that these are categories of fundamentally distinct types of experiments that need to be examined separately. The primary focus of this post is on the moderating effect of duration. Some text has been changed to address this.
We also classify our major tables as “field” vs. “lab” studies, instead of primarily classifying them as “Reduction” or “Exposure” (even though nearly all the field studies are reduction studies, and all of the lab studies are exposure studies). One significant implication of this change is that Deters & Mehl, 2013 is moved from Table 3 (lab studies) to Table 2 (one week field studies). The Deters experiment took place over the course of one week, in the participants' natural environment. It was not a social media reduction experiment and it was also not exactly an exposure study either. In post 2, we explain why Deters should have been removed entirely.
We now include both the simple average for the effect sizes (as we did in the earlier draft) and the average effect sizes with studies weighted by sample size and confidence intervals.
We want to note that our omission of these statistics from the earlier draft was intentional. David Stein convinced us that Ferguson’s particular collection of experiments and composite effect sizes cannot be subjected to a fully valid ‘random effects’ analysis. The primary obstacle is that the effect sizes reported are a blend of very different dependent variables across the set of 27 studies, which themselves used very different methods. David lays out why this is a problem in part 3 of this series.
Note that we are using Ferguson’s erroneous sample sizes and effect size calculations in this post, which means that the weighted averages and confidence intervals are incorrect. The reason we do not fix the errors in this post is to make the point that even without making any changes to Ferguson’s underlying data, Ferguson’s data still undermines his own conclusion that his meta-analytic review finds that “reducing social media time has NO impact on mental health.” In post 3 of this series, David Stein shows what happens when these errors are corrected.
Why are we publishing this long series? Because a great deal hangs on the question of causation. Social media companies have consistently dismissed claims—and liability—regarding the harm that parents and teens believe was caused by their platforms. The companies claim that the scientific evidence at present is merely correlational, and that it does not point to causation. Experiments using random assignment are very important because they do indicate that a manipulation caused an effect, rather than merely being correlated with it. So it is very important to know what the existing experiments show, when taken together, and that is why Ferguson’s meta-analysis is so important. Do the existing experiments suggest causation, or do they indicate an overall effect size that is indistinguishable from zero?
For full transparency, we have created a backup Google Doc with the original text of this post, which you can find here.
Introduction: The Debate over Correlation vs. Causation
At the heart of the academic debate over our claims in The Anxious Generation has been the question of correlation versus causation. While the book is about two major interlocking changes to childhood—the loss of the play-based childhood and the rise of the phone-based childhood—nearly all of the academic debate has been focused on the phone side, and most of that has focused just on the social media component of the phone side, and most of that has been focused on social media’s impact on mental health. Notably, our claims about the importance of real-world independence, unsupervised play, and responsibility in Chapters 2 and 3, and the impact of smartphones and social media on school climate and academic performance in Chapters 5 and 11, have elicited hardly any objections or critiques.
A great deal hangs on this question of causation because social media companies have consistently dismissed claims—and liability—regarding the harms that parents and teens believe were caused by their platforms. They claim that the scientific evidence at present does not point to causation. Mark Zuckerberg used this defense in his opening statement to a Senate subcommittee last January:
Mental health is a complex issue and the existing body of scientific work has not shown a causal link between using social media and young people having worse mental health outcomes.
Is he right? What kinds of scientific studies could show causal links, if they were there?
In two series of posts, we zoom in on the center of the debate: our claim that social media use (especially heavy use) is a substantial contributing cause of mental health problems (at least in some substantial minority of adolescent users). Most participants in the debate agree that heavy social media use is associated with many different kinds of harms — mental health deterioration, sleep deprivation, attention fragmentation, etc. (see, for example, Orben 2020 for the typical range of the associations, and see ch. 4 of the recent National Academies of Sciences report for a long list of associated harms). Regarding mental health, heavy users—who often qualify as having “problematic use” that interferes with other areas of functioning—are nearly always found to have higher rates of anxiety and depression than light users or non-users, and these differences are often quite large, especially for girls (among whom heavy users of social media were found to be three times as likely to be depressed as light users; Kelly et al., 2019).
It is, however, a challenge to establish whether something about social media in general causes mental health harms such as depression and anxiety, or whether the association is due primarily to reverse causation (meaning that depression or anxiety is what is causing some adolescents to use social media more often), as is claimed by Candice Odgers at UC-Irvine and by Michael Rich at Harvard.1
The Importance of Experiments
To address questions of causality, social scientists usually turn to experimental studies that use random assignment of people to conditions. (These are sometimes called RCTs, for “Randomized Controlled Trials.”) For example, Davis and Goldfield (2024) randomly assigned 220 distressed college students to either an experimental group, which was asked to reduce its social media usage to no more than 60 minutes per day for three weeks, or to a control group that had no social media restrictions. The study found that the intervention group showed significant reductions in symptoms of depression, anxiety, and FoMO (fear of missing out), and they showed increases in sleep. Such a finding in an RCT supports the inference (though does not prove) that social media reduction caused these benefits, which supports the inference that social media use (especially heavy use) causes such harms.
Since 2019, we (Jon and Zach) and Jean Twenge have been gathering all such experiments we can find into our main open-source Google document (titled Social Media and Mental Health: A Collaborative Review). We search for and continually add studies on all sides, including those that fail to find any benefits from social media reduction. However, as we note in the document, scientific questions are not decided simply by counting up studies on both sides. Conclusions are more reliable when studies are weighted for measures of quality, such as having a large sample size as opposed to a small one.
This kind of analysis is called a “meta-analytical review,” meaning it is a study of studies. In a meta-analysis, the researcher gathers all the studies meeting specific criteria and then extracts measures of effect sizes from each study for the relevant dependent variables one is studying. How much benefit or harm is there when we compare the experimental group to the control group in each study for each particular outcome? All effect sizes are converted to a common scale, such as Cohen’s d, which estimates the standardized difference between two means. (That is the number of standard deviations by which the means vary. That number is usually well below 1 because groups rarely vary by a full standard deviation.) The average of these effect sizes is then calculated, but not as a simple average. Instead, it is a weighted average where studies with larger sample sizes and lower variance are given more weight than those with smaller sample sizes and higher variance.
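For readers who want to see the arithmetic, here is a minimal Python sketch of the two steps just described: computing Cohen's d for a single study, and then averaging across studies with weights. The study numbers are hypothetical, the function names are ours, and the weighting shown is plain inverse-variance (fixed-effect) weighting with a common large-sample variance approximation. That is a simplification for illustration, not the random-effects model that Ferguson used.

```python
import math

def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference, using the pooled standard deviation."""
    pooled_var = ((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2) / (
        n_treat + n_ctrl - 2
    )
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)

def d_variance(d, n_treat, n_ctrl):
    """Large-sample approximation to the sampling variance of d."""
    return (n_treat + n_ctrl) / (n_treat * n_ctrl) + d**2 / (2 * (n_treat + n_ctrl))

def weighted_mean_d(ds, variances):
    """Inverse-variance weighted average of effect sizes, with a 95% CI."""
    weights = [1 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# HYPOTHETICAL studies, not real data: (d, n_treatment, n_control).
# One small study with a larger effect, one large study with a smaller effect.
studies = [(0.40, 20, 20), (0.10, 200, 200)]
ds = [d for d, _, _ in studies]
variances = [d_variance(d, nt, nc) for d, nt, nc in studies]

simple_mean = sum(ds) / len(ds)
weighted_mean, ci = weighted_mean_d(ds, variances)

print(f"simple average:   {simple_mean:.3f}")
print(f"weighted average: {weighted_mean:.3f}  95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this made-up case the weighted average sits much closer to the large study's effect than the simple average does, which is the point of weighting.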
The Ferguson Meta-Analysis
Stetson University psychology professor Chris Ferguson recently carried out a meta-analysis2, which was published in early 2024 in the Journal of Psychology of Popular Media under the title “Do social media experiments prove a link with mental health: A methodological and meta-analytic review.”3
Ferguson selected 27 studies for analysis (25 were published studies, two were dissertations). Most of the studies asked participants in the experimental condition to reduce their use of social media in real life, for a period of time, as with the Davis and Goldfield (2024) study that we described. We will call these “reduction studies.” But Ferguson also included seven studies that exposed participants in the experimental condition to some aspect of social media and then looked for effects related to mental health or wellbeing. We will call these “exposure studies.” These exposures usually took place in a psychology lab at a university, and all but one lasted between 5 and 30 minutes.
Ferguson merged all of the 27 studies together (and all of their outcome variables, from satisfaction to loneliness to depression) to calculate an average effect size of d = .088 (which means that the experimental groups differed from the control groups by about 9% of one standard deviation). This finding suggests a small benefit from reducing social media consumption, but when Ferguson calculated a confidence interval around that number, it included zero (though just barely), meaning that the possibility could not be excluded that there was no effect overall. Ferguson summarized what he thought to be the implications of his meta-analysis like this:
Currently, experimental studies should not be used to support the conclusion that social media use is associated with mental health. Taken at surface value, mean effect sizes are no different from zero. Put very directly, this undermines causal claims by some scholars (e.g., Haidt, 2020; Twenge, 2020) that reductions in social media time would improve adolescent mental health. [Bolding added by Zach and Jon]
Are Ferguson’s conclusions valid? Well, one of the most essential prerequisites for any meta-analysis is to ensure that there is no obvious moderator that greatly influences the outcome. For instance, consider an experiment testing the efficacy of a drug for reducing anxiety. Suppose there are two slightly different versions of the drug made by two different companies. If version A consistently reduces anxiety while version B consistently increases it, then 'drug version' would be a moderator. In this case, the result of averaging the effects across experiments would depend on how many of the experiments investigated versions A and how many versions B. Any assertion about ‘the effect’ without regard to the version of the drug would be meaningless, and concluding that the drug has no overall impact would also be a serious error. It would mislead the medical community and discourage doctors from prescribing version A. Instead, the two versions should be analyzed separately, with effect sizes reported for each.
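The drug example can be made concrete with a few lines of Python. This is only a sketch: the effect sizes (-0.5 for version A, +0.5 for version B, measured as change in anxiety in standard deviation units) and the function name are assumptions we made up for illustration.

```python
def pooled_average(n_version_a, n_version_b, effect_a=-0.5, effect_b=0.5):
    """Simple average effect across experiments when two drug versions
    with opposite (hypothetical) effects are lumped together."""
    effects = [effect_a] * n_version_a + [effect_b] * n_version_b
    return sum(effects) / len(effects)

# The same two drugs, pooled with three different mixes of experiments.
for n_a, n_b in [(5, 5), (8, 2), (2, 8)]:
    avg = pooled_average(n_a, n_b)
    print(f"{n_a} experiments on A, {n_b} on B -> pooled effect {avg:+.2f}")
```

Neither drug changes across the three scenarios; only the mix of experiments does, yet the pooled "effect" moves from zero to negative to positive. That is why a moderator like this has to be analyzed separately.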
The most obvious candidate for a major moderator in Ferguson’s study is duration.
We would not expect the benefits of social media reduction to kick in within the first few days, given that withdrawal effects are common when people have been heavy users of social media, or cigarettes, or any addictive substance or activity. According to Anna Lembke, who studies and treats behavioral addictions as well as biological addictions, withdrawal symptoms generally include anxiety, irritability, insomnia, depression, and craving. Acute withdrawal symptoms typically peak after a few days, but often last for up to two weeks.4 So, it makes little sense to combine one-day abstinence studies with four-week reduction studies when it is only the longer studies that get participants past the withdrawal period. It is these multi-week studies that offer us the best test of the hypothesis that social media use causes declines in mental health.5 Parents who are considering delaying the age at which their children get social media accounts should look to the long-term reduction studies for guidance, not to one-day abstinence studies.
In addition to the problem of duration, Ferguson blended together two very different kinds of studies: social media reduction studies and exposure studies that use entirely different methods. The conflation of these two types of studies is problematic because it does not make sense to treat a 5-minute exposure to Facebook that measures momentary mood and a 3-week abstinence from social media that measures risk of clinical depression on a validated scale as measures of the same effect.
Ferguson’s 27 Studies, Reordered and Reconsidered
We think that Ferguson’s 27 studies should be divided up into three categories, which should be analyzed separately:
Multi-week field experiments. These ten studies examined the impact of reducing social media use for at least two weeks, allowing withdrawal symptoms to dissipate, and occurred in the participants' natural environment.
Short (one week or less) field experiments. Ten out of eleven6 of these studies examined either reduction or brief periods of abstinence from social media use, which are likely to pick up withdrawal symptoms from heavy users, and occurred in the participants' natural environment.
Lab experiments. In these six studies, the ‘treatment’ is typically brief exposure to some kind of social media, such as requiring high school students to look at their Facebook or Instagram page for 10 minutes.
We have produced three tables corresponding to those three categories, which collectively show all of Ferguson’s 27 studies. Positive numbers (shown in orange) indicate that reducing or quitting social media had beneficial effects (or that exposure to SM in lab experiments was detrimental), which indicates that social media is harmful. (That’s why we use orange—a widely used warning color—to mark such findings).
In contrast, negative numbers (shown in green) indicate that people who quit or reduced social media use were worse off, at least by the well-being measures that Ferguson chose to analyze. This suggests that social media is helpful, at least in the case of multi-week studies which get past withdrawal effects. (We mark such findings in green, which indicates “go” or “go ahead and use social media”).
In writing this post we draw heavily from an in-depth analysis made by David Stein,7 an independent scholar in the Czech Republic who has long studied and written about suicide rates at his blog (now Substack) The Shores of Academia. You can read Stein’s original post here: Fundamental Flaws in Meta-Analytical Review of Social Media Experiments. We have checked Stein’s findings carefully and we report them here, with additions and with his permission, in a form that we think will be more accessible to readers.
We begin with the first bin. Table 1 shows the 10 multi-week field (all reduction) experiments selected by Ferguson.8 In this table and in Tables 2 and 3 we list the effect sizes as calculated by Ferguson (which are reported in an online OSF supplement). As you can see, six of the studies are in orange and just one is in green. (Numbers that fell below .10 but above -.10 we left uncolored.) If we take the simple average of all the effect sizes we get d = .20, meaning, taken at face value, that there are consistent mental health benefits to participants for reducing their social media consumption for at least two weeks.9 If we take the weighted average, the effect size goes to +0.16 with a confidence interval of +0.06 to +0.26.
You can replicate our work by following all instructions in the online supplement. Note: These are uncorrected averages and CIs, using Ferguson’s erroneous effect sizes and sample sizes. This also does not address how Ferguson blended very different dependent variables across the set of 27 studies.
Table 1. Multi-Week Social Media Field Studies (Two Weeks or More)
In contrast, Table 2 gives Ferguson’s effect sizes for the short-term reduction studies (one week or less). Most of these produced negative numbers for their effect sizes, which Ferguson took to mean that social media is beneficial, since people felt worse when they quit. For consistency with Table 1 we colored such cells in green, but of course if these short-term studies are primarily measuring withdrawal effects, then this apparent benefit is really indicative of a larger harm.
We separated the short-term studies into two sections in Table 2 to highlight an interesting finding: all four of the very short term studies produced negatively valenced effect sizes (in green), whereas the six one-week studies produced more variable results (three in orange, three in green).
Table 2. Short Term Field Studies (One Week or Less)
*Note that Deters & Mehl 2013 did not actually manipulate time spent on social media. It was a field study, but unlike all the others, it did not involve reducing social media. It is not clear that it increased exposure either.
The simple average effect size of the seven one-week studies is .04. The simple average of the four less-than-one-week studies is -.17.
When we weight the studies by sample size and include confidence intervals, we find:
Seven one-week studies: d = .08 (-.21, .37)
Four less-than-one-week studies: d = -.17 (-.28, -.05)
You can replicate our work by following all instructions in the online supplement. Note: These are uncorrected averages and CIs, using Ferguson’s erroneous effect sizes and sample sizes. This also does not address how Ferguson blended very different dependent variables across the set of 27 studies.
Ferguson interpreted all of the negative numbers in Table 2 as backfire effects (that is, getting off social media was harmful, which means that social media is good), but we think that what looks like a backfire effect is really just the withdrawal effect we should expect from some users in the first few days (note that the average teen spends 5 hours a day on social media). The fact that the effect size increases so substantially as the duration of the reduction increases from one day to one week to several weeks supports this interpretation.
Now, let’s look at the six lab experiments, shown below in Table 3. These studies employ many methods, from spending ten minutes looking at your own Facebook page to spending twenty minutes communicating with others through Facebook.10 It is not clear that they should be combined. But if we do combine them, and if we draw only from Ferguson’s calculations, we find that the weighted average effect size is d = .12 (-.09, .34). In our next post in this series, we’ll correct some errors in the analysis of the exposure studies and show that they produce mostly harmful effects. Once again, it would be misleading to merge these exposure studies with the reduction studies and report that there is no effect, and no benefit, from reducing social media.
Table 3. Social Media Lab Studies
The simple average effect size of the six exposure studies is .06.
When we weight the studies by sample size and include confidence intervals, we find d = .12 (-.09, .34)
You can replicate our work by following all instructions in the online supplement. Note: These are uncorrected averages and CIs, using Ferguson’s erroneous effect sizes and sample sizes. This also does not address how Ferguson blended very different dependent variables across the set of 27 studies.
These results indicate a clear moderating effect of duration in the social media reduction experiments. Lab studies using heterogeneous methods produce heterogeneous findings. Merging all of these studies together, as Ferguson did, and then reporting that the overall effect is close to zero is as misleading as merging those two anxiety drugs together and reporting that they don’t work.
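To put the four sets of numbers side by side, here is a short Python sketch that checks which of the confidence intervals reported in this post exclude zero. The values are the uncorrected weighted averages and intervals given above (based on Ferguson's numbers); the labels and the helper function are ours.

```python
# Weighted average d and 95% CI bounds as reported in this post
# (uncorrected, computed from Ferguson's effect sizes and sample sizes).
results = {
    "multi-week field studies": (0.16, 0.06, 0.26),
    "one-week field studies": (0.08, -0.21, 0.37),
    "less-than-one-week field studies": (-0.17, -0.28, -0.05),
    "lab studies": (0.12, -0.09, 0.34),
}

def describe(low, high):
    """Say whether a confidence interval excludes zero, and in which direction."""
    if low > 0:
        return "excludes zero (positive)"
    if high < 0:
        return "excludes zero (negative)"
    return "includes zero"

summary = {name: describe(low, high) for name, (d, low, high) in results.items()}

for name, (d, low, high) in results.items():
    print(f"{name}: d = {d:+.2f} ({low:+.2f}, {high:+.2f}) -> {summary[name]}")
```

Only two intervals exclude zero, and they point in opposite directions: the multi-week studies (positive) and the less-than-one-week studies (negative). Averaging across them hides both.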
Conclusion
The Ferguson meta-analysis is the first and only meta-analysis of experiments that we know of, and once we organize its studies by type and duration, a consistent pattern emerges: when young people reduce their social media consumption for at least two weeks, their mental health improves.
Note that in this post, we used only the studies that Ferguson collected, we used only the effect sizes that Ferguson calculated, and we still found that his data contradicts his contention that his study shows that “reducing social media time has NO impact on mental health.”11
Was Ferguson’s selection of studies correct? Were his calculations for those studies correct? We looked into all twenty-seven studies, and in our next post we’ll show that Ferguson made a number of errors and questionable decisions. To offer a preview of Part 2, we will show that Ferguson made two calculation errors in his effect sizes, made multiple sample size calculation errors, included two studies that did not meet his inclusion criteria, did not include at least two studies that he should have known about, and included two failed experiments.
In our second series of posts, we will present our own analysis of the relevant experiments, starting from scratch. We find many more experiments to analyze (including five multi-week experiments that Ferguson did not include in his analysis, all of which found mental health benefits from reducing use), and we focus on anxiety and depression as the key outcome variables. We find stronger and more consistent evidence that social media causes mental health harms.
So, was Zuckerberg correct to say that “the existing body of scientific work has not shown a causal link between using social media and young people having worse mental health outcomes?”
We think he was wrong. Stay tuned for more.
Jamovi File
Weighted averages and confidence intervals were computed using Jamovi, in an effort to mirror the methods used by Ferguson in his meta-analysis. You can download our Jamovi file here. We include instructions in the online supplement.
Postscript, added Sept 1, 2024:
One final point: a blog post by Matthew Jané used Ferguson’s numbers and concluded that there is no relationship between the duration of reduction and the effect size. This is obviously not true. All the studies lasting less than a week produced substantial negative effect sizes while only one of the multi-week studies did so: Collis and Eggers (2022). Because that study has such a long duration (10 weeks) while all the other studies lasted between 1 and 28 days, this one study caused the regression line to appear flat in Jané’s graph. Jané says there is no reason to remove or ignore this outlier, but there is, which we were going to describe in Post 2: The manipulation largely failed. The experimental group, which was told to reduce its use of Facebook, Instagram, and Snapchat, did reduce its use of those three platforms, but its members switched to other apps and platforms, mostly WhatsApp. In fact, their total screen time did not drop at all. As the authors write:
Remarkably, although students in the treatment group significantly reduced their social media activities, their overall digital activities overall are not affected but, in fact, exceed those of the control group in block 2 (t-test, p = 0.026). This result indicates that students substituted or even overcompensated their social media usage with other activities.
In addition, there is no evidence that participants reduced their use of other social media platforms such as TikTok, YouTube, Twitter, and Reddit (all of which were widely used by college students in 2019).
In other words, Collis & Eggers was not a 10-week reduction study of social media overall; it was a reduction study about three specific platforms, which caused students to migrate to other apps and spend even more time on their devices. It should probably not be included in any meta-analysis of the effects of social media reduction. At the very least, this one unusual study should not be allowed to outweigh the very tight association shown in the other reduction studies: Multi-week reduction studies produced benefits, while one-day studies backfired, producing withdrawal effects.
Nonetheless, all of this is to show why we are not trying to do our own meta-analysis with Ferguson’s data. In a future post we plan to do a review of the evidence from scratch so that we avoid these many problems. (It’s worth noting that we have also found five additional multi-week reduction studies, all of which find benefits to reduction, which are not in Ferguson’s meta-analysis.)
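The leverage argument in this postscript can be illustrated with a small Python sketch. The data points below are HYPOTHETICAL and chosen only to mimic the general shape we describe (negative effects for very short studies, positive effects for multi-week studies, and one long study with a negative effect); they are not the actual effect sizes from any study. The fit is ordinary unweighted least squares written by hand, which may differ from the method used in the blog post we are responding to.

```python
def ols_slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# HYPOTHETICAL (duration in days, effect size d) pairs, for illustration only.
studies_up_to_28_days = [
    (1, -0.20), (1, -0.15), (7, 0.00), (7, 0.05),
    (14, 0.15), (21, 0.20), (28, 0.30),
]
long_outlier = (70, -0.15)  # a single 10-week study with a negative effect

slope_without = ols_slope(studies_up_to_28_days)
slope_with = ols_slope(studies_up_to_28_days + [long_outlier])

print(f"slope without the 70-day study: {slope_without:.4f} d per day")
print(f"slope with the 70-day study:    {slope_with:.4f} d per day")
```

With these made-up numbers, a clearly positive slope becomes nearly flat once the single long study is added, because a point far from the others on the duration axis pulls strongly on the fitted line.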
1. There is also the possibility that the correlation of two variables is caused by a third variable. An example might be a hypothesis that neglectful parenting causes both depression and heavy social media use.
2. Ferguson carried out a “random effects model meta-analysis” that is meant to approximate the size of a single effect based on data from a collection of experiments measuring this single effect. For such a meta-analysis to make sense, the 'effect' must be well-defined, and all the experiments must measure the same effect — at least practically the same effect. Furthermore, the word 'random' indicates the presumption that the variation in experimental outcomes is indiscernible from the influence of a mere chance. As we will see, these requirements for a valid 'random effects model meta-analysis' are all violated. In addition, we will also see that Ferguson fails to conduct some basic elements of a 'meta-analytical review' (study of studies). We will discuss this in more detail in Part 3 of this series.
3. In this post we respond only to the ‘meta-analytic’ half of Ferguson’s paper. Ferguson also devotes much of the paper to a critique of the methods that are generally used in these studies.
4. It is estimated that nearly 10% of adolescent users experience problematic use, with the average teenager spending around five hours a day on these platforms. One-third of all U.S. teens say they are on one of the main social media platforms “almost constantly.” Such teens would likely find the first few days of abstinence uncomfortable and unpleasant.
5. In ch. 5 of The Anxious Generation, Jon gave Lembke’s definition of withdrawal symptoms and then added “This is basically what many teens say they feel—and what parents and clinicians observe—when kids who are heavy users of social media or video games are separated from their phones and game consoles involuntarily. Symptoms of sadness, anxiety, and irritability are listed as the signs of withdrawal for those diagnosed with internet gaming disorder.” Our claim that only the long-term reduction studies test the relevant hypothesis is based on what we have been writing for a long time.
6. The eleventh is Deters. We will explain in post 2 why Deters should have been excluded from the meta-analysis.
7. David Stein has criticized arguments and analysis by scholars on both sides of the debate, including Jon, who encouraged Stein to further critique his posts on After Babel (see the After Babel section of The Shores of Academia). We hired Stein this summer as a research assistant to help us analyze and present the current state of evidence and scholarship (note that Stein wrote his critique of the meta-analysis paper back in April).
We also thank our research assistant, Jakey Lebwohl, for his invaluable support, insightful contributions, and meticulous editing of this post.
8. We are not able to create tables on Substack. You can see (and find links) to all of the tables and studies in the online supplement for this post.
9. In random effects meta-analysis, the weights usually do not differ much, and indeed Ferguson's weighted average (0.088) is nearly identical to the plain average (0.084) of all the study effect sizes.
10. We note that five of the seven of these studies are about Facebook, and thus, tell us little about Instagram or TikTok, the platforms which teens use most.
11. This public tweet has recently been deleted. Nonetheless, Ferguson’s paper makes a similar bold claim that his meta-analysis “undermines causal claims by some scholars (e.g., Haidt, 2020; Twenge, 2020) that reductions in social media time would improve adolescent mental health.”