The Fundamental Flaws of The Only Meta-Analysis of Social Media Reduction Experiments (And Why It Matters), Part 2
Five types of errors in a recent meta-analysis all bias the analysis in the same direction: toward the conclusion that there is no effect of social media use on mental health
On August 29, we (Zach and Jon) published the first post in our series designed to systematically examine dozens of experiments testing the hypothesis that reducing social media usage benefits mental health.
In that initial post, we aimed to address several fundamental flaws in the only available meta-analysis on social media experiments and adolescent mental health, conducted by Stetson University psychologist Chris Ferguson, titled Do Social Media Experiments Prove a Link with Mental Health? A Methodological and Meta-Analytic Review.
After publishing our review, we received feedback regarding our framing and data analysis. Specifically, some criticized us for averaging simple effect sizes rather than weighting them by sample size, and not including confidence intervals. They argued that this was a major problem with our analysis. We want to thank Matthew Jané for providing a variety of constructive critiques. In hindsight, we understand that it would have been helpful to better clarify our intentions and the rationale behind our decisions. As a result, we have now updated that post to provide both the unweighted and weighted effect sizes and to provide confidence intervals for the four key averages we calculated across different types of experiments.
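For readers who want to see concretely what the weighting and confidence intervals involve, here is a minimal sketch in Python with made-up numbers (these are hypothetical values, not the effect sizes from our review or from Ferguson’s spreadsheet):

```python
import numpy as np

# Hypothetical Cohen's d values and sample sizes for four studies
# (illustrative only; not the actual numbers from either analysis).
d = np.array([0.25, 0.10, -0.05, 0.40])
n = np.array([120, 80, 300, 60])

# Unweighted average: every study counts equally.
unweighted_mean = d.mean()

# Sample-size-weighted average: larger studies count more.
weighted_mean = np.average(d, weights=n)

# Approximate 95% CI for an inverse-variance-weighted mean, using the
# usual large-sample variance of d for two groups of roughly n/2 each:
# var(d) ~ 4/n + d^2 / (2n).
var_d = 4 / n + d**2 / (2 * n)
w = 1 / var_d
pooled = np.sum(w * d) / np.sum(w)
se = np.sqrt(1 / np.sum(w))
ci_95 = (pooled - 1.96 * se, pooled + 1.96 * se)

print(unweighted_mean, weighted_mean, pooled, ci_95)
```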
We do want to note that our omission of these statistics was intentional. David Stein has explained to us that he doesn't think Ferguson’s particular collection of experiments and composite effect sizes can be subject to a valid ‘random effects’ analysis. The primary obstacle is that the effect sizes reported are a blend of very different dependent variables across the set of 27 studies, which themselves used very different methods. David lays out why this is a problem in our next post (part 3 of the series).
A second obstacle is the numerous errors in Ferguson’s study, which will be the focus of this post. We will address five types of errors: miscalculated effect sizes, incorrect sample sizes, inclusion errors, exclusion errors, and the inclusion of failed experiments.
These errors significantly impact Ferguson’s results, making it appear that social media reduction has no (or practically no) effect on mental health outcomes. Correcting these errors would substantially increase the overall effect size and would flip the result of the meta-analysis from statistically insignificant to significant, as we’ll discuss in part 3.1
Addressing these errors—and the claims Ferguson asserts as a result of them—is important because much hinges on the question of correlation versus causation. Social media companies have consistently dismissed claims—and liability—regarding the harm that parents and teens believe was caused by their platforms. The companies claim that the scientific evidence at present is merely correlational, and that it does not point to causation. Experiments using random assignment are very important because they do indicate that a manipulation caused an effect, rather than merely being correlated with it. So it is very important to know what the existing experiments show, when taken together, and that is why Ferguson’s meta-analysis is so important. Do the existing experiments suggest causation, or do they indicate an overall effect size that is indistinguishable from zero?
But before we dive into the errors, we first need to address a fundamental problem underlying the entire study.
A Lack of Transparency
One of the most significant decisions Ferguson made in his meta-analysis was to create composite effect sizes by averaging a wide variety of dependent variables within each study—ranging from depression and anxiety to life satisfaction, loneliness, temporary mood, and more—to generate effect sizes for each of the 27 studies he examined.
For instance, in Brailovskaia et al. (2022), the researchers measured the effects of a two-week social media reduction on various outcomes, including depressive symptoms, life satisfaction, subjective happiness, smoking behavior, and the burden of COVID-19. Ferguson averaged together some combination of these variables, but we do not know which variables he chose in this study, or in many other studies.
In our next post, David Stein will demonstrate why merging mental health variables (such as anxiety and depression) with other well-being variables (such as loneliness, happiness, or temporary mood) into different composites for different studies is a major problem. In this post, we simply want to highlight the fact that we don’t actually know which variables Ferguson included under “mental health” in each study. At no point did Ferguson describe (either in his paper or in his open science materials) which mental health or well-being variables he averaged. Nor did he provide a consistent methodology for calculating effect sizes, even for a single variable. There are multiple ways effect sizes could have been calculated. Ferguson might have used a simple difference in post- vs. pre-intervention scores for the treatment group, or he might have employed a difference-in-difference comparison of the treatment and control groups (where changes over the intervention period are compared between the groups), or used regression results provided by authors.
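To make the ambiguity concrete, here is a rough sketch, using hypothetical summary statistics (not data from any of the 27 studies), of three common ways a standardized effect size could be computed from the same study; we do not know which, if any, of these Ferguson used:

```python
import numpy as np

# Hypothetical pre/post means and SDs for a treatment and a control group
# on one outcome (illustrative values only).
m_t_pre, m_t_post, sd_t_pre = 12.0, 10.5, 5.0   # treatment group
m_c_pre, m_c_post, sd_c_pre = 12.2, 12.0, 5.2   # control group
n_t, n_c = 60, 60

# (a) Simple pre-vs-post change in the treatment group alone,
#     standardized by the treatment group's pre-test SD.
d_prepost = (m_t_post - m_t_pre) / sd_t_pre

# (b) Difference-in-differences: the treatment group's change minus the
#     control group's change, standardized by the pooled pre-test SD.
sd_pooled = np.sqrt(((n_t - 1) * sd_t_pre**2 + (n_c - 1) * sd_c_pre**2)
                    / (n_t + n_c - 2))
d_did = ((m_t_post - m_t_pre) - (m_c_post - m_c_pre)) / sd_pooled

# (c) Converted from a reported test statistic, e.g. an independent-samples
#     t-test on post-scores: d = t * sqrt(1/n_t + 1/n_c).
t_reported = -2.1   # hypothetical value
d_from_t = t_reported * np.sqrt(1 / n_t + 1 / n_c)

print(d_prepost, d_did, d_from_t)
```

Even when applied to the same data, these approaches can yield meaningfully different values of d, which is why knowing which one was used matters.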
We reached out to Ferguson for clarification on how he calculated his effect sizes. Unfortunately, we still don’t have answers, as Ferguson has not shared this information with us. This lack of transparency makes it impossible to verify his claims or replicate his analysis.
Now, let’s examine the problems we’ve identified in his study.
Five Major Problems in the Ferguson Meta-Analysis
Problem 1. Errors in effect sizes reported
Two of the twenty-seven studies were assigned effect sizes by Ferguson that appear to be indefensible, based on what is reported in the studies themselves. These errors involved the misinterpretation of a control group (Lepp & Barkley, 2022) and the assignment of a zero effect size contrary to the provided data (Brailovskaia et al., 2022).
Lepp and Barkley (2022)
The effect size that Ferguson assigned to Lepp and Barkley (2022), d = -0.365, is the largest negative effect of all 27 studies, indicating that it offers the strongest evidence that social media is beneficial, not harmful. Yet the authors themselves reached the opposite conclusion.
How did Ferguson come to such a conclusion? Let’s examine the study itself.
Lepp and Barkley was a social media exposure experiment. Participants were randomly assigned to one of four conditions. As stated in the study: “All participants completed the following 30-minute activity conditions: treadmill walking, self-selected schoolwork (i.e., studying), social media use, and a control condition where participants sat in a quiet room (i.e., do nothing).”
You can see the results in Figure 1, below. Participants in the “social media” condition (3rd line from the top) showed a significant drop in positive affect, compared to their baseline ratings. That seems to suggest that spending time on social media reduces positive affect, especially when compared to participants who walked on a treadmill or who spent the time studying (shown in the top two lines). Yet Ferguson labeled this study as having the opposite effect, indicating that social media exposure benefited the participants. How did Ferguson get to this conclusion?
Figure 1. Illustrates the condition (control, studying, treadmill, social media) by time (0 min, 15 min, 30 min) interaction for positive affect scores. *Significantly different score than the corresponding baseline value. †Significantly different score from the corresponding 15 min value. (p < 0.05 for all). Source: Lepp and Barkley (2022).
Stein dug into the study and concluded that the cause of the differing interpretations was that the authors had labeled the “sit and do nothing” condition as a “control” condition, when in fact it was not really a control condition; it was itself an aversive activity. Ferguson had simply compared the social media condition to the “sit and do nothing” condition and found that while positive affect declined significantly within 15 minutes, it didn’t decline by as much as it did in the “sit and do nothing” condition. This smaller decline in positive affect does not justify coding Lepp and Barkley as providing evidence of a benefit, let alone as providing the strongest evidence of benefit of any of the 27 studies.
A control condition usually shows us what would have happened to participants if nothing had been done to them. We usually don’t expect much movement in the control condition. But in this study, the fourth condition was designed to be an unpleasant task, as the authors’ own stated hypotheses make clear.
In other words, the authors never intended the label ‘control’ to mean a control condition against which the impacts of the other treatments should be judged, which is why they never used it in that manner.
In fact, an experiment by Wilson et al. (2014) found that college students assigned to sit around and do nothing found the task to be so aversive that most of them chose to administer electric shocks to themselves, just to break up the boredom. In a similar fashion, college students in the Lepp study who were required to sit in a quiet room and do nothing for 30 minutes were understandably irritated by this, and it showed up on the post-test results, as you can see in the bottom line in Figure 2.
David Stein reached out to Ferguson to inquire about his calculations, with Andrew Lepp CC’d. Lepp responded by thanking David “for the accurate description of our referenced study” and urging Ferguson to answer David’s inquiry. Ferguson replied solely to Lepp, stating: “I’m comfortable with my interpretation of your data as related to the specific questions of the meta.”
Thus, out of the 27 studies examined by Ferguson, the one listed as providing the strongest evidence against harmful impacts of social media (per Ferguson’s effect sizes) comes from a study whose authors concluded the opposite and who dispute Ferguson’s effect size determination.
Brailovskaia et al. (2022)
Brailovskaia et al. (2022) randomly assigned 642 participants to either (1) reduce social media by 30 minutes a day for two weeks, (2) increase physical activity by 30 minutes a day for two weeks, (3) follow both instructions, or (4) make no changes to their routines. The relevant group for the Ferguson study is the social media reduction group.
The effect size that Ferguson reports for Brailovskaia et al. is d = 0. Yet the authors report numerous benefits to well-being from reducing social media use:
Results: In the experimental groups, (addictive) SMU, depression symptoms, and COVID-19 burden decreased, while physical activity, life satisfaction, and subjective happiness increased.
There is no indication of any other well-being measures that could counter the ones mentioned by the authors. (Even smoking declined in the social media reduction group.) The benefits of social media reduction remained even six months after the experiment ended. An effect size of d = 0 must be an error.
We corresponded with Ferguson by email about this but he did not believe that there was any mistake made. His response:
“Thanks for the inquiry. Brail 2022 was a difficult one as I was hoping to get more detail from her on the effect sizes. Unfortunately, I didn't hear back from her on her inquiry. She reported non-significant results for most group differences, and I was interested in pattern of results through the 6-month outcome. By that point means were largely very close or overlapping.”
We are puzzled by his response, because this study is one of the very best in terms of reporting its outcomes (e.g., Table 2 includes all the relevant outcomes and standard deviations all the way through the 6-month follow-up). Furthermore, we see no justification for assigning d = 0 as the effect size. Lack of statistical significance did not deter Ferguson from calculating an effect size in other cases. For example, the dissertation by Ward (2017) reported no significant results on any of the outcomes measured, and yet Ferguson assigned it d = -0.298, one of the strongest effect sizes favoring Ferguson’s views. In short, lack of statistical significance is no excuse for assigning a zero effect, especially when the information included in the study (tables and graphs) clearly contradicts such an assignment.
Problem 2. Incorrect Sample Size Calculations
When estimating the average effect size in most meta-analyses, studies with larger samples and lower variances are given more weight, while studies with smaller samples and greater variances are given less. There are many ways to determine how much weight each study should receive. According to the information Ferguson provided, the weights were computed by a Jamovi software module from the effect sizes and sample sizes he supplied.
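To illustrate why the reported sample sizes matter so much, here is a simplified sketch of how a study’s inverse-variance weight depends on its sample size (hypothetical numbers and a generic formula; we do not know the exact settings of the Jamovi module Ferguson used):

```python
import numpy as np

def study_weight(d, n, tau2=0.0):
    """Approximate meta-analytic weight for one study reporting Cohen's d.

    Assumes two groups of roughly n/2 each, so var(d) ~ 4/n + d^2/(2n);
    tau2 is the between-study variance (0 gives a fixed-effect weight).
    """
    var_d = 4 / n + d**2 / (2 * n)
    return 1 / (var_d + tau2)

# Roughly halving a study's reported sample size roughly halves its weight,
# so large sample-size errors directly shift the pooled estimate.
print(study_weight(d=-0.3, n=80))
print(study_weight(d=-0.3, n=40))
```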
Ferguson reported the sample size of each of the 27 experiments in his meta-analysis in his OSF spreadsheet. However, we found a number of errors and inconsistencies in his sample sizes. It is important to point out and correct these errors because each study’s sample size determines the weight it is assigned, which in turn contributes to the overall estimate of the meta-analysis.
We reached out to Ferguson about several of these errors, some large and some small. He agreed to change only two of the five we questioned.
Below are six studies for which we believe the sample size was in error. (Note that what matters is the relative size of the error: a doubling of a sample size will have a significant impact, while minor increases or decreases will not.)2
Table: Sample size mistakes. Relative changes greater than 50% are highlighted in red.
Another complication is that Ferguson did not provide information on how he calculated effect sizes when a study had several groups manipulating social media time. For example, in Hunt 2021 there is a pure social media reduction group, but also a social media reduction group whose participants were additionally required to increase active use. It is unclear which groups in these studies Ferguson used to calculate the composite effect size. This problem exists in three studies: Sagioglou 2014, Ozimek 2020, and Hunt 2021. Because of this, we do not know whether the sample sizes for these studies are correct, since we do not know how he calculated his effect sizes.
Here are the details for the three studies which have significant implications for Ferguson’s results (those with greater than a 50% relative change):
Kleefield Dissertation
Kleefield was a week-long field study that assigned 82 participants to one of three groups: social media reduction (by 50%), control, and a video condition in which participants watched a twenty-minute video to raise awareness about social media use. Ferguson assigned this study a sample size N = 82, meaning he counted the participants in all three experimental groups.
However, Ferguson stated in his paper that each study must “include an experimental comparison of social media with a control condition.” He should therefore have counted only the participants assigned to conditions relevant to his meta-analysis, that is, social media reduction or control, and excluded the 29 participants in the educational video group. This is what he did in other studies with irrelevant experimental groups, such as Brailovskaia 2022. Had he done so here as well, he would have obtained the correct sample size of 53.
Note that this is a study with a substantially negative effect size (meaning that social media reduction made participants feel worse than the control), to which Ferguson assigned a substantially overestimated sample size.
Lepp & Barkley (2022)
Lepp & Barkley, 2022 was an exposure study in which 40 participants completed four 30-minute conditions: exercise, studying, social media use, and idly sitting in a room. As we previously mentioned, the participants who used social media for thirty minutes felt significantly worse afterwards. However, Ferguson came to the conclusion that social media use is beneficial by comparing the social media condition to the “do-nothing” condition, which was even less pleasant for the forty participants than using social media.
Ferguson assigned this study a sample size N = 80. We reached out to Ferguson, and he agreed that this is an error and the appropriate number should be N = 40.
Again, this is a study with a substantially negative effect size, as assigned by Ferguson, for which Ferguson substantially overestimated the sample size.
Przybylski et al. (2021)
Przybylski et al., 2021 was a field experiment in which 600 participants were asked to abstain from social media for a single day (which led, predictably, to substantial negative withdrawal effects). Ferguson assigned this study a sample size N=600. However, even though 600 participants enrolled in the experiment, only 297 made it to the end and were included in the statistical analysis. Ferguson should have only counted the participants who were included in the analysis, as he did for most of the studies on his list. The correct sample size is N = 297.
Once again, this is a study with a substantially negative effect size, as assigned by Ferguson, for which Ferguson substantially overestimated the sample size.
Problem 3. Including studies that should have been excluded
Ferguson includes two studies that violate his own inclusion criteria, which require that a study manipulate time spent on social media. As Ferguson explains,
To assess the relevance of studies, we identified that they should meet the following inclusion criteria: include an experimental comparison of social media with a control condition, and studies must examine time spent on social media use, not other variables such as motivations for use, problematic use, etc.
Gajdics 2021 (d = -0.364) and Deters 2013 (d = -0.207) should therefore be removed from his analysis.
Gajdics and Jagodics (2021)
Gajdics and Jagodics (2021) (d = -0.364) isn’t really a social media experiment, because it only tested what happens when high school students go without their phones for one school day. The strong effects the authors observed, which they attribute to ‘nomophobia’ (fear of being without a mobile phone), likely reflect the anxiety of being without a phone for kids who always have their phones with them, rather than the specific effects of not using social media during school hours. Taking kids’ phones away and then asking them questions at the end of the day is a good way to study withdrawal effects; it cannot be taken as evidence that social media is good for kids. Yet Ferguson presents it as the study with the second-largest negative effect, d = -0.364 (indicating that social media benefits mental health in Ferguson’s methodology), essentially tied with Lepp and Barkley.
If Gajdics and Jagodics’s study is included, then other similar phone abstinence studies, like Brailovskaia et al. (2023), should also be included, along with the many other studies that have examined the effects of smartphone detoxes.3
Deters and Mehl (2013)
In Deters and Mehl (2013) (d = -0.207), the treatment was to increase the frequency of Facebook status updates for a week, which resulted in a decline in feelings of loneliness. However, the study included no measure of time spent on social media, and this was a requirement that Ferguson had listed explicitly as one of his inclusion criteria: “studies must examine time spent on social media use” (p. 3). Most users spend a substantial amount of time on social media, including both active and passive use, so the instruction to post more updates may have simply shifted a bit of time from passive to active use. It is plausible that posting more status updates, and receiving responses from friends and family, reduced participants’ passive use of Facebook, as well as other activities such as arguing with strangers. We simply do not know how time spent on social media was affected, because it was not measured. This study should not have been included.
Problem 4. Inclusion of Failed Experiments
Two of the studies Ferguson included did not really contain an “experimental comparison of social media with a control condition,” as he required, due to failed experimental manipulations. As Ferguson states,
“To assess the relevance of studies, we identified that they should meet the following inclusion criteria: include an experimental comparison of social media with a control condition, and studies must examine time spent on social media use, not other variables such as motivations for use, problematic use, etc. The studies must include enough information to calculate an effect size d”
In one study, the control group reduced its social media use nearly as much as the experimental condition (van Wezel et al., 2021). In another, participants in the reduction group decreased their time spent on three social media platforms (Facebook, Snapchat, and Instagram), but because they were only told to reduce those three platforms, they spent much more time on instant messaging platforms like WhatsApp, and likely also on video platforms like YouTube and TikTok, and their total screen time actually increased (Collis & Eggers, 2022).
van Wezel et al. (2021)
In van Wezel et al., 2021, the study authors assigned 102 undergraduate students to one of two conditions: in the treatment condition, participants were to reduce their social media usage by 50%. In the control condition, participants were to reduce their social media usage by only 10%.
The researchers measured participants’ actual time spent on social media through the built-in screen time reports on their smartphones. By the end of the study, they found that those in the treatment condition had reduced their social media use by 58.3%, a substantial reduction. However, the participants in the control group reduced their time on social media by 49.7%, almost as much as the treatment group! It does not make sense to compare these two conditions and conclude that the difference in their outcomes can be used to estimate the effects of substantially reducing time on social media.
In other words, since this study did not compare a meaningfully modified social media condition to an unmodified control group, it is not fit to be included in a meta-analysis testing the claim that social media is detrimental to mental health and that significant social media reduction will result in improved mental health.
Collis & Eggers (2022)
In Collis & Eggers, 2022, 121 undergraduates completed one of two conditions: an experimental group, in which they were told to limit their time on each of Snapchat, Facebook, and Instagram to 10 minutes per day, and a control group, in which no specific instructions were given.
The study authors measured the amount of time spent by their participants on all digital activities, including social media. Sure enough, the participants in the treatment group substantially reduced the amount of time they spent on the three platforms, while those in the control group did not. However, the treatment group actually spent more time on digital technology use than the control group.
How is this possible? While the treatment group reduced its time spent on Facebook, Snapchat, and Instagram, participants increased their time spent on instant messaging platforms (e.g., WhatsApp). There is also no indication that they spent any less time on other social media platforms, such as Twitter, Reddit, YouTube, and TikTok, which were all widely used by college students at the time the study was conducted.
Problem 5. Excluding studies that should have been included
Ferguson excluded two studies that met his inclusion criteria, which he should have known about because they were in our main collaborative review doc: Mosquera et al. (2019) and Engeln et al. (2020).4 In his meta-analysis, Ferguson credits two blog posts as valuable sources he consulted when searching for experimental studies: “Studies identified by previous commentators (e.g., Hanania, 2023; Smith, 2023) as this was valuable locating studies in other fields such as economics.” Smith, however, refers to Hanania for a list of studies, and Hanania explicitly states that the studies he mentions came from our Google doc: “Jonathan Haidt and Jean Twenge have put together a Google doc that contains all studies they can find relevant to whether social media use harms the mental health of young people.” Ferguson is aware of the document and has left comments on it from 2019 to 2023. He is also aware of the experimental studies section, because on August 15, 2023, he added a comment to the doc regarding one of the experimental studies (Faulhaber 2023) on the list. (In an email exchange, Ferguson told us that he did his search for studies in August 2023.)
The first study, Mosquera et al., was a one-week abstention from Facebook that found significant declines in depression and minimal impacts on other variables, such as life satisfaction and feeling that life is worthwhile. This study has received attention from the media, and, more importantly, it has been in our collaborative review doc for years.
The second study, Engeln et al., was a social media exposure study that compared the mood effects of seven minutes on Facebook and Instagram to the effects of a matching game on an iPad (the control condition). It found a significant decline in body image, as well as a worsened mood, for those who used Instagram compared to the control condition. Similarly, it has been in the Google doc for years.
It is worth noting that we have found many additional studies that were not included in his paper, though, aside from those published after August 2023, we do not know whether Ferguson should have found them. After a brief Google Scholar and Elicit search, we found five additional multi-week reduction studies. All five report mental health benefits from reducing use. We also found five one-week reduction studies and one exposure study. We will discuss these studies in our own analysis in a future post.
Conclusion
In this post, we have shown that Ferguson has made numerous factual and procedural errors in his meta-analysis. All of these errors impact his results. Collectively, they make it appear as though social media reduction has little to no effect on mental health. We believe that numerous corrections are needed.
Here are the five, in brief:
1. He made two errors in his effect size calculations.
How does this influence his results? The largest effect size showing that social media benefits mental health was an error. The reality is that the study’s authors came to the opposite conclusion: that social media use was harmful. The second error also hides improvement from social media reduction. Both of these errors drag the average effect size down, closer to zero.
2. He made at least five errors in his sample size calculations.
How does this influence his results? Sample sizes are used to determine the weighting of each study, which influences how much each study “matters” in the analysis. The three largest errors Ferguson made inflate the importance of studies that pull the average effect size down, closer to zero.
3. He included two studies that should have been excluded.
How does this influence his results? The two studies he included both had strong negative effect sizes. Including them dragged the average effect size down, closer to zero.
4. He included two failed experiments.
How does this influence his results? The two studies he included both had strong negative effect sizes, even though the manipulations in the studies failed or backfired in important ways. Including them dragged the average effect size down, closer to zero.
5. He failed to include two experiments he should have known about.
How does this influence his results? The two studies he did not include both found that social media is harmful. Excluding them dragged the average effect size down, closer to zero.
In sum, all of Ferguson’s major errors biased his analysis in the same direction: toward the conclusion that social media use has no effect on mental health, and that his meta-analysis ultimately “undermines causal claims by some scholars (e.g., Haidt, 2020; Twenge, 2020) that reductions in social media time would improve adolescent mental health.” However, as we have demonstrated so far in this series, this conclusion was only reached at the end of a long series of errors. We also note that the lack of transparency renders Ferguson's meta-analysis largely unverifiable and impossible to replicate. We hope that he will make these changes and provide researchers with the necessary data for full replication.
In the next post in this series, David Stein will show how correcting even one major error in the data flips the result to significance. Stay tuned for more.
In all of our posts in this series, we draw heavily from an in-depth analysis made by David Stein, an independent scholar in the Czech Republic who has long studied and written about suicide rates at his blog (now Substack) The Shores of Academia. You can read Stein’s original post here: Fundamental Flaws in Meta-Analytical Review of Social Media Experiments.
We thank Jakey Lebwohl for catching many of these errors.
Note also that Gajdics and Jagodics (2021) is not a randomized trial – neither participants nor conditions are randomized – and has no control group; instead, there is just a pre-intervention and post-intervention measurement for a single group. While we do not count this as an error, we acknowledge that many psychologists would deem Gajdics and Jagodics (2021) to be a ‘quasi-experiment’ rather than an experiment.
There are numerous other relevant experiments we have found, but these are the only two for which we can be certain Ferguson should have known about them.