Is The Gallup World Poll Reliable?
The way some researchers use the GWP can explain why they fail to find effects of digital technology on youth mental health
Intro from Zach Rausch and Jon Haidt:
For some time, Jon, Jean Twenge, and I have been engaging in a productive academic debate with a few researchers about the scale of the youth mental health crisis. We argue that the mental health of adolescents—measured by levels of anxiety, depression, and other indicators—began to decline in the early 2010s across the Anglosphere, the Nordic nations, and numerous Western European countries.
Is the decline global, or is it limited to a subset of developed Western nations? In a previous post here at After Babel, Dartmouth economist Danny Blanchflower independently found that young people (ages 18-25) now report worse mental health than all other age groups in 82 countries around the world. In nations where we have long-term data, this was not true before 2015. It used to be that young adults and older adults were the happiest, producing a “U-shaped curve” when you graph happiness by age. (Note that youth suicide rates do vary more across nations and are not the focus of this post. See Zach’s posts on suicide variation here, here, and here.)
Although many researchers now agree that there is a youth mental health crisis in the United States, some argue that this is not a multi-national phenomenon and does not seem to be impacting youth from the developing world in the same way. While we acknowledge some regional and between-country variation (e.g., the changes since 2010 appear strongest in the most wealthy, secular, and individualistic nations), along with within-country variation, data from the developing world has been relatively sparse.
But we are starting to accumulate more data, and we are seeing that young people are reporting worse mental health than all other age groups in nations all over the world. A number of surveys, including the Program for International Student Assessment (PISA), one of the few that surveys teens, and more recent large-scale surveys such as the Global Minds Dataset (created in 2019), found declines in many non-Western nations, especially in recent years. (For an overview of all available global mental health datasets we can find, see here.)
At the same time, there is another dataset that does not show these same widespread trends: The Gallup World Poll (GWP).1 This discrepancy between datasets has naturally caused disagreement about what has been happening to young people. For those who rely on the Gallup World Poll, the mental health problems of the young are concentrated in some developed countries. For those who rely on other datasets, mental health problems are worsening on a larger scale. Which surveys are more accurate, and why do they differ so much?
In this post, Dartmouth College economist Danny Blanchflower examines the Gallup World Poll in-depth and identifies four major problems that he believes explain why the GWP finds such different results from other international datasets.
His insights about the Gallup World Poll matter not only for understanding youth mental health around the world but also for the academic studies that rely on the GWP and cast doubt on the idea that the Internet (and social media in particular) is harming youth mental health.
* Note that we will soon have a Substack post that addresses these three articles directly.
As you will see below, relying on the Gallup World Poll to make these kinds of claims may be a problem.
– Zach and Jon
[Note: You can read a response to Blanchflower from Gallup in a post published July 3rd, 2024 titled, A Debate on the Strengths, Limitations, Uses and Misuses of the Gallup World Poll]
Evidence of an international decline in the well-being of young people since the early 2010s has been mounting. In the U.S., recent declines in youth mental health are evident across various datasets, including the Household Pulse Survey conducted by the Census Bureau, the CDC’s Behavioral Risk Factor Surveillance System (BRFSS), and the National Health Interview Surveys. Additionally, Household Pulse Survey data indicate that young people experienced the highest incidence of COVID-19 and the worst mental health from 2020 to 2022. These datasets also show that, in the U.S., the youngest adults are now the least happy, while older generations are progressively happier.
Internationally, similar evidence is found across various datasets, including the Global Minds Database and the International Social Survey Program. In a previous Substack post, I demonstrated that these declines have altered the traditional U-shape in happiness and the hump shape in unhappiness (which you can see for the U.S. in the blue and green lines in Figure 1), with young people now the least happy across all age groups (see the red and yellow lines in Figure 1; the hump shape is gone).
Figure 1. Changes in Despair by Age. Age is on the X-axis; we measure despair as the percentage of respondents reporting 30 days of poor mental health in the last 30 days (depicted on the Y-axis). Source: Behavioral Risk Factor Surveillance System (BRFSS) surveys.
However, one dataset stands out for seeming to contradict this finding: the Gallup World Poll (GWP). The GWP is a publicly available individual-level survey that has been conducted since 2005 and currently includes data from 168 countries. It has been used by several researchers to examine well-being, along with a couple of my own papers on the midlife crisis.
However, upon shifting my focus to the well-being of the young, I have identified several major issues with using the Gallup World Poll to examine well-being. This has led me to conclude that the GWP is unreliable for understanding global mental health trends or their relationship with digital media use, especially when breaking results down by sub-groups and sub-regions (e.g., young men or women in South America). In this post, I will discuss four primary concerns with the GWP and the broad implications that come along with them.
The four, in brief:
Samples are small, especially for large countries, and do not appear to be nationally representative.
The survey’s well-being measures did not decline during the Great Recession or COVID-19, which seems unlikely to reflect reality, given that most other surveys show declines.
The timing of the survey varies by month within and between countries.
The GWP shows that internet “access” is positively correlated with well-being, but the survey does not ask about the intensity of use, which is what matters.
Taken together, these problems cast doubt on the validity of the Gallup World Poll results and studies that rely on this dataset to measure the impact of the internet or social media on the well-being of the young, or the old for that matter.
Here are those four concerns in longer form.
1. Samples are small, especially for large countries, and do not appear to be nationally representative
From 2005 to 2023, the Gallup World Poll (GWP) sampled 168 countries with a total sample size of 2,734,564. Although this may seem like a large and representative sample, problems with the dataset become evident upon closer inspection. Notably, the sample sizes in large and small countries are not very different. For example, between 2005 and 2022, there were 15,097 observations in Malta (population 0.5 million in 2024), 15,684 in Cyprus (1.3 million), 17,271 in Kosovo (1.9 million), and 20,507 in the United States (340 million).
In the years 2020-2023, there were 4,020 observations for the entire U.S., with only 319 of those under the age of 25. Breaking it down by year, only 33 young women were surveyed in 2021 and 69 in 2022. This is too small a sample upon which to claim anything about trends facing American youth. (In contrast, the Behavioral Risk Factor Surveillance System (BRFSS) surveys 350,000+ Americans annually.)
Table 1. Total # of young women aged 15-25 (2010, 2015, 2020-2022) in twelve countries
Does this impact the validity of the results? Yes. We can check the quality of one of these samples—the U.S.—and compare it with a much larger sample for the U.S., also surveyed by Gallup, with exactly the same variables and years as the GWP. The results are very different. In other words, the GWP does not appear to be nationally representative when compared to a much larger survey by the same company.
I took the microdata from Gallup’s U.S. Daily Tracker (USDT) for the period 2008-2017 (n=3,530,270) and the microdata from the GWP for the United States (n=12,231). I examined the most widely used measure of well-being in the GWP—Cantril’s ladder—which was also used in Gallup’s U.S. Daily Tracker.
Q1. Please imagine a ladder with steps numbered from zero at the bottom to ten at the top. Suppose we say that the top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you. If the top step is 10 and the bottom step is 0, on which step of the ladder do you feel you personally stand at the present time?
The first thing to note is that neither the average Cantril (life satisfaction) scores for each year nor the direction of change over time corresponded between the two datasets.
To get a sense of just how different the two datasets are, we can compare how U.S. states rank against each other in each survey, as Figure 2 shows. If the surveys were consistent, all the data points would fall on the 45-degree line, but they are far from it (the R-squared is only 0.08).
For example, Hawaii ranks #1 on the Daily Tracker but #26 on the GWP. Conversely, West Virginia, which ranks 51st on the Daily Tracker, ranks 15th on the GWP.
State Cantril Rankings: Gallup World Poll vs. U.S. Daily Tracker
Figure 2. Comparing U.S. state Cantril rankings between the Gallup World Poll and the U.S. Daily Tracker.
In sum: The GWP does not seem to be nationally representative. If the results are not representative in the United States, this casts doubt on the representativeness of all other GWP countries, where we do not have an equivalent of the Daily Tracker. We have no idea how representative the small surveys in the GWP are for other large countries, including France, Germany, Italy, China, India, the UK, Indonesia, Pakistan, and Brazil, or any other country for that matter.
2. The survey’s well-being measures did not decline during the Great Recession or COVID-19, which seems unlikely to reflect reality, given that most other surveys show declines.
In a previous paper, my research partner Alex Bryson and I examined how the GWP data for Europe was impacted by two major shocks: the Great Recession (2008-2009) and COVID-19 (2020-2022).
Surprisingly, Cantril's life satisfaction measure, administered as part of the GWP, dropped between 2006 and 2007, the years preceding the Great Recession, only to rise in 2008 before falling again in 2009 and 2010. This pattern suggests that people became more satisfied with life as the Great Recession took hold, which strains credulity.
We observed similar unexpected trends during the COVID-19 pandemic. Cantril's life satisfaction scores, according to the GWP, rose in 2019 and 2020, coinciding with the onset of the pandemic, and increased further in 2021. Also note that the binary well-being items in the GWP for Europe, such as enjoyment, pain, smiling, sadness, anger, and worry, showed little to no change in response to these significant shocks. This seems unlikely.
These findings contrast sharply with evidence from almost all other available surveys, including the European Social Survey, the Eurobarometer, and the IPSOS Happiness Survey (2018-2023), as reported in Blanchflower and Bryson (forthcoming in PLOS ONE). These surveys typically show declines in well-being during these two crises.
Why do the GWP findings differ so significantly from other surveys?
3. The timing of the survey varies by month within and between countries.
A major problem with the GWP survey is its inconsistent timing, which partly explains the lack of decline in various well-being measures in response to shocks like the Great Recession and COVID-19. The GWP collects data in different countries at different times of the year, and this timing varies annually.
For example, within countries, the GWP collected data in India in July 2008 but not again until November 2009. Between countries, the timing also varied: in 2008, the U.S. sample was drawn in August, the Indian sample in July, and the Spanish sample in April. In 2009, the samples were drawn in July, November, and April, respectively. (Table 2 shows the dates of data collection for the three countries: the USA, India, and Spain.)
Table 2. Sampling dates in the U.S., India, and Spain
This irregular timing means that some of the impacts of significant events, like the Financial Crisis, were missed in some countries. Additionally, the timing of data collection can affect well-being measures due to seasonal variations. As research has shown, seasons significantly influence well-being. Collecting data in the middle of winter versus the middle of summer can yield very different results.
4. The GWP shows that internet “access” is positively correlated with well-being, but the survey does not ask about the intensity of use, which is what matters.
In three widely discussed papers, Vuorre and Przybylski (2023a, 2023b, 2024) examined well-being in the GWP using Cantril’s life satisfaction measure and several binary variables, including pain, worry, sadness, stress, anger, smiling, laughing, and enjoyment (e.g., “Did you smile or laugh a lot yesterday?” and “Did you experience the following feelings during a lot of the day yesterday? How about worry?”, each answered yes or no). All three papers found, on the whole, that gaining access to the internet, mobile phones, or Facebook had little or no harmful effect on people’s well-being and that there were even some signs of benefits. (Note that one of the studies did show some signs of worsening effects for young women.)
All three papers are plagued by the issues with the GWP that I have discussed in this essay: unrepresentative and small samples (especially when breaking down by age, region, and sex), findings inconsistent with other representative datasets, and a sampling methodology that leads to a number of strange findings (e.g., sampling before the recession began and then after it, thus missing any change in well-being during the financial crisis).
It should be said that the three papers are also very hard to make sense of. For example, Figure 1 in Vuorre and Przybylski (2023a) has 22 separate boxes, sixteen of which have about a hundred lines plotted in each tiny box. Figure 2 of that paper has six columns and around a hundred rows, all of which are difficult to read, with several hundred uninterpretable squiggly lines in each column. Figure 1 (with 20 boxes), Figure 2 (9 boxes), and Figure 3 (with 40 boxes plus eight empty ones) likewise contain many lines too small to see; Vuorre and Przybylski (2023b) and Vuorre and Przybylski (2024) have the same issues. To illustrate, see this figure from Vuorre and Przybylski (2023a) showing the mean responses of 72 countries on three GWP well-being measures:
Figure 3. Figure A and B from Vuorre and Przybylski (2023a). “72 countries’ mean responses to three well-being scales in the Gallup World Poll from 2008 to 2019, separated by age category and sex.”
Some of the papers also suffer from issues with additional variables. In one, they found that access to the internet at home and having used the internet in the past seven days are positively correlated with well-being and negatively correlated with ill-being. I replicated these findings, confirming that the internet access variables are positively correlated with well-being.2 But what does this really mean? What Vuorre and Przybylski have shown is that developed countries with almost 100% internet access are happier than developing countries with less access. That, in my view, tells us nothing about the impact of internet and smartphone usage on the well-being of the young. I outline my reasons below.
I examined these data and focused on one of the variables used (wp15862), which measures whether the respondent used the internet in the last 7 days and is available for every year from 2015-2022 (n=813,776). This variable has a mean of 91% across all countries. In developed nations, it is substantially higher. In the U.S., Netherlands, Lebanon, Saudi Arabia, UK, Canada, Belgium, Spain, Italy, Czechia, Sweden, Denmark, Hong Kong, Israel, Vietnam, Thailand, New Zealand, South Korea, Austria, Bahrain, Estonia, Ireland, Luxembourg, UAE, and Norway, the mean usage rate is over 95%. In other words, in developed countries, there’s just not much variance, so it will be difficult for that variable to correlate substantially with anything.3
In the U.S. GWP sample, just 133 people reported not using the Internet in the prior week between 2020 and 2022. Among young people, not one of the 313 individuals under the age of 25 had gone without the Internet. Nobody. Most non-users are over 65 with low incomes.
To further illustrate this problem, I examined how many females under age 25 did not use the internet in the last seven days. In total, 7% of women (2,693 out of 38,710) did not use the internet between 2020-2023. Notably, there are 38 countries with five observations or fewer, including: Albania (2); Armenia (2); Austria (3); Azerbaijan (1); Bosnia and Herzegovina (1); Brazil (2); Costa Rica (5); Cyprus (2); Denmark (1); Georgia (3); Greece (1); Hong Kong (2); Iceland (1); Ireland (3); Jamaica (3); Jordan (4); Kazakhstan (4); Kyrgyzstan (4); Lithuania (2); Luxembourg (2); Malta (1); Moldova (1); North Macedonia (1); Northern Cyprus (4); Poland (1); Puerto Rico (1); Singapore (1); Slovakia (2); South Korea (2); Spain (1); Switzerland (1); Thailand (1); Turkey (1); Ukraine (1); United Kingdom (3); Uruguay (3); USA (0); and Vietnam (1). When there is no data, it is essentially impossible to produce reliable estimates.
This matters for the analyses of Vuorre and Przybylski because this variable is clearly a poor indicator of how much internet usage impacts well-being. There is no variation in the data, and hence, these are entirely inappropriate measures to examine the impact of the digital age on well-being and, especially, youth well-being. Essentially, all the authors have identified is that the 95% of people who used the internet the prior week are happier than the 5% who didn’t.
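The statistical intuition behind this point can be sketched with simulated data: a binary indicator that is almost always 1 has variance p(1−p) close to zero, so for any fixed well-being gap between users and non-users, the correlation it can produce shrinks. The numbers below are illustrative and not drawn from the GWP.

```python
import numpy as np

rng = np.random.default_rng(42)

def corr_with_binary(p, n=200_000, gap=1.0, noise_sd=2.0):
    """Correlation between a binary 'used the internet' indicator with
    prevalence p and a well-being score that is `gap` points higher for
    users, plus individual noise."""
    used = (rng.random(n) < p).astype(float)
    wellbeing = gap * used + rng.normal(0.0, noise_sd, n)
    return np.corrcoef(used, wellbeing)[0, 1]

# The same underlying user/non-user gap yields very different correlations:
balanced = corr_with_binary(p=0.50)  # indicator has maximal variance
skewed = corr_with_binary(p=0.95)    # GWP-like: nearly everyone is a user
print(round(balanced, 2), round(skewed, 2))
```

With p = 0.5 the indicator’s standard deviation is 0.5; at p = 0.95 it is about 0.22, and the measured correlation falls by more than half even though the underlying effect is identical.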
Conclusion
In my view, the Gallup World Poll (GWP) is too unreliable for making sense of global mental health trends. It has significant issues related to sample size, especially for large countries like the United States, whose sample sizes are similar to those of small countries like Malta. I found little resemblance between the well-being findings from the U.S. GWP file and a much larger Gallup file with the same questions and years (2008-2017).
In many countries, sample sizes are extremely small, making it difficult to identify any separate effect for young people in general and young women in particular.
The questions used in the GWP, including Cantril’s ladder, a retrospective, integral measure of life satisfaction, do not respond to major macro shocks like COVID-19 and the Great Recession.
The fact that the GWP doesn’t show the declining well-being of the young over the last decade or so outside of the developed world does not mean that this decline isn’t happening. The decline in the well-being of the young means that mental health now improves with age. These trends are clear in datasets with better sample designs, such as Global Minds and national surveys in Australia, the UK, the USA, France, Germany, Spain, Sweden, New Zealand, and Italy.
All of this is to say: The Gallup World Poll is an unreliable dataset for understanding global well-being. From the World Happiness Report, which relies on the GWP, to the three studies by Vuorre and Przybylski that depend heavily on this data, the findings should be taken with a big pinch of salt.
Gallup Responds to Danny Blanchflower
By Jonathan Rothwell, PhD and Rajesh Srinivasan, PhD4
We were surprised to see the comment from Danny Blanchflower on this Substack, which questioned the reliability of the Gallup World Poll. We know he has frequently published academic articles using Gallup’s global wellbeing data, including one this year, and has evidently not retracted any of them.5
While Gallup welcomes constructive criticism of our data collection methods, findings and any other component of research, Blanchflower’s grounds for declaring that "The Gallup World Poll is an unreliable dataset for understanding global well-being" are based on analytical errors and poor reasoning. Hence, we feel it is important to correct the record.
Let’s review each point.
Claim 1. The World Poll sample sizes are too small for large countries.
Reply 1. The World Poll annually conducts roughly 1,000 interviews with a nationally representative sample of adults per country, per wave. This is more than enough to provide population-wide estimates at the national level, which is what the World Poll is designed to measure. Importantly, nothing prevents scholars from pooling data across years to boost sample sizes for sub-populations like youth. The U.S. Census Bureau often uses a similar approach, providing multi-year estimates through its American Community Survey, for example.
Blanchflower claims that because the United States is a large country, it requires a sample size that is scaled proportionally to produce reliable estimates. This is an inaccurate view of statistics. Statistical software packages and scientific organizations provide online power calculators to help researchers determine the sample size needed to test hypotheses. To test whether a 16-percentage-point decline in the share of youth who are thriving in wellbeing is significant in the United States (see below), one only needs 143 respondents per period. The World Poll’s U.S. sample includes 470 people between the ages of 15 and 24 from 2018 to 2023 and 561 between 2005 and 2011.6 The capacity to draw reliable inferences from a randomly sampled population is what allows small-sample U.S. election polls to consistently come within a reasonable margin of error of predicting the presidential vote every four years.7 These points should be obvious to any quantitative researcher.
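Gallup’s figure can be sanity-checked with a standard two-proportion power calculation. The sketch below uses the normal approximation via Cohen’s h; the 60% baseline is an assumption (Gallup does not state theirs), and the required sample moves with both the baseline and the chosen power, which may explain small differences from the 143 quoted above.

```python
import math

def n_per_period(p1, p2, alpha=0.05, power=0.80):
    """Respondents needed in each period to detect a shift from proportion
    p1 to p2 (two-sided z-test, normal approximation via Cohen's h)."""
    h = abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))
    z_alpha = 1.959964  # z for two-sided alpha = 0.05
    z_beta = 0.841621   # z for power = 0.80
    if power == 0.95:   # allow a higher-powered variant
        z_beta = 1.644854
    return math.ceil((z_alpha + z_beta) ** 2 / h ** 2)

# A 16-point drop in the share "thriving", assuming (hypothetically) a fall
# from 60% to 44%:
print(n_per_period(0.60, 0.44))             # about 76 per period at 80% power
print(n_per_period(0.60, 0.44, power=0.95)) # rises to about 126 at 95% power
```

The point stands either way: detecting a change this large needs only a few hundred respondents per period at most, not a sample scaled to the country’s population.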
Gallup’s World Poll is also similar in size to other well-established and recurring international surveys, including those that Blanchflower uses. The World Values Survey (WVS) averaged 1,705 responses per country for Wave 7, collected over six years from 2017 to 2022 in 92 countries. During that same period, the World Poll collected about 6,000 responses per country across 142 countries. Blanchflower has not mentioned any concerns about the WVS sample size in his recent publications, despite its much smaller size than the World Poll.8 If he had compared the two surveys, he’d find Gallup’s measure of current life evaluation (using the Cantril ladder) has a correlation of 0.59 with the World Values Survey measure of life satisfaction, using data from 2017 to 2022.9
Claim 2. The World Poll wellbeing results do not match other Gallup wellbeing data.
Reply 2. This is incorrect and appears to be based on several errors.
Despite different data collection methods (a different mix of mail, phone, and web), Gallup’s U.S. wellbeing database (Gallup National Health and Well-Being Index) closely aligns with World Poll data.10 Aggregating data from 2009 to 2023, when both sources have available data collected with comparable survey methods, the mean share of U.S. adults who are thriving in wellbeing is 54.6% using the World Poll and 53.5% using the national database.
Blanchflower tries a different test to determine reliability. He looks at the correlation between state-level measures of wellbeing using the U.S. Well-Being Index database and World Poll, ignoring the fact that the World Poll is not designed for state-level analysis and has small state-level sample sizes for many states like Hawaii, Vermont and North Dakota. It is puzzling that Blanchflower conducts an analysis on a subset of a database, while complaining about small sample sizes in the aggregate database. Gallup methodologists do not create state-level weights for the World Poll, as they do for other Gallup databases that are designed for state-level reporting. This weighting issue alone can create large discrepancies between databases.
Even with these limitations, we replicated Blanchflower’s analysis, using the same data and period (2008 to 2017), and found that he made a basic error. Blanchflower says he used the Cantril ladder to measure wellbeing,11 and he reports an R-squared value of 0.08 (implying a correlation of around 0.28). We replicated that correlation (r = 0.30) using unweighted data, but the correlation jumps to 0.45 when the sample size is used as a weight. Because the precision of each state’s point estimate is proportional to its sample size, and sample sizes vary widely across states (they are based on population), Blanchflower made an analytical error by not weighting his analysis by sample size.
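The weighting point is straightforward to implement: weight each state by its sample size when computing the Pearson correlation, so that imprecisely estimated small-sample states count for less. The state means and sample sizes below are hypothetical, for illustration only.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation of x and y with observation weights w
    (here: each state's sample size, so noisy small-sample states
    contribute less)."""
    w = np.asarray(w, dtype=float) / np.sum(w)
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    sx = np.sqrt(np.sum(w * (x - mx) ** 2))
    sy = np.sqrt(np.sum(w * (y - my) ** 2))
    return cov / (sx * sy)

# Hypothetical state-level Cantril means from two surveys; the third
# array stands in for GWP state sample sizes:
gwp = np.array([6.9, 7.1, 6.6, 7.4, 6.8, 7.2])
tracker = np.array([7.0, 7.2, 6.8, 7.3, 6.7, 7.1])
n_state = np.array([400, 3500, 150, 2800, 90, 1900])

print(weighted_corr(gwp, tracker, n_state))
# With equal weights this reduces to the ordinary Pearson correlation:
print(weighted_corr(gwp, tracker, np.ones_like(gwp)))
```

Whether weighting is appropriate here is itself part of the dispute, but the mechanics of the two estimates differ only in the weight vector.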
Blanchflower’s use of a single measure of wellbeing, rather than two (current and future), further injects error, and thus he finds that states with small populations and small sample sizes in the World Poll (Hawaii and West Virginia) rank differently across the two data sources. When we calculated the percentage of the population that is thriving in wellbeing (using both the current and expected Cantril ladder), Hawaii’s and West Virginia’s rankings look similar across the Gallup Well-Being Index and the World Poll (Hawaii: 1st and 3rd; West Virginia: 50th and 42nd).
Claim 3: Blanchflower argues that Gallup measures of wellbeing did not decline during the Great Recession or pandemic, according to World Poll.
Reply 3: The first point is bizarre because it is so easily refuted by publicly available summary data for the United States. According to World Poll data, the share of U.S. adults thriving in wellbeing fell from 66% in 2007 to 61% in 2008, just after the Great Recession began, and dipped further to 57% in 2009, when unemployment spiked.
While it is true that the World Poll data for the United States do not show a drop from 2019 to 2020 or 2021 in overall wellbeing (percentage thriving), World Poll data collected in May 2019 and May 2020 in the United States show a large increase in the share of adults who “worried a lot” the day before the survey, rising from 35% to 43%. Gallup fielded the same item about worry on its U.S. Panel. The May 2020 point estimate for worry is 46%, not meaningfully different from the World Poll estimate, especially since the two surveys were not fielded on the same days.
As non-World Poll Gallup data for the United States show, wellbeing fluctuated greatly from month to month and even day to day, according to things like spikes in unemployment, hospitalization counts, pandemic relief checks, business re-openings, and vaccination announcements and implementation. It would be foolish to expect a point-in-time national survey to reflect all of this variation, and Gallup never intended for the World Poll to be used to analyze monthly national trends.
Claim 4: Gallup data are collected during different months and are thus unreliable because happiness varies by season of the year.
Reply 4. The World Poll has tried to stay consistent in its annual measurement to minimize the effect of seasonality by collecting data between April and November, in most years. Given that much of the data are collected in-person and so many countries are involved, Gallup also needs to consider religious observances, weather patterns, pandemics, war, and other local factors in deciding when to enter the field.
From a research perspective, variation in research collection timing is not a serious obstacle to analysis. There are well-known techniques to test for seasonal effects and adjust for them. In our recent paper, for example, we used month-effects to adjust for cross-country differences in the timing of economic-related items. A quick analysis suggests that, globally, month effects are small (on the order of 0 to 1.7 percentage points on the share of the population that is thriving).
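The adjustment Gallup describes can be sketched with synthetic data: regressing the outcome on a period indicator plus month dummies absorbs seasonal differences in fielding dates, so the period coefficient recovers the underlying trend. All numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic respondents: a true 0.3-point decline between two survey waves,
# plus a summer lift in a 0-10 score. The early wave was fielded mostly in
# summer, the late wave mostly in winter (as fielding dates shift in the GWP).
n = 5000
wave = rng.integers(0, 2, n)                # 0 = early period, 1 = late period
month = np.where(wave == 0,
                 rng.integers(4, 10, n),    # early wave: April-September
                 rng.integers(1, 7, n))     # late wave: January-June
summer_boost = np.isin(month, (6, 7, 8)) * 0.4
score = 7.0 - 0.3 * wave + summer_boost + rng.normal(0.0, 1.5, n)

# A naive comparison confounds the trend with the season of fielding:
naive_gap = score[wave == 1].mean() - score[wave == 0].mean()

# OLS with month dummies absorbs the seasonal component:
months = np.unique(month)
X = np.column_stack([np.ones(n), wave]
                    + [(month == m).astype(float) for m in months[1:]])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
adjusted_gap = beta[1]   # close to the true -0.3; the naive gap overstates it
print(round(naive_gap, 2), round(adjusted_gap, 2))
```

Identification requires some overlap in the months sampled across waves (here April-June); with completely disjoint fielding windows, no statistical adjustment can separate season from trend, which is the force of Blanchflower’s original complaint.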
Claim 5. The World Poll is not aligned with other and better data sources showing falling mental health.
Reply 5. The countries that Blanchflower claims have seen a decline in youth wellbeing using “good data” also show a decline in youth wellbeing in the Gallup World Poll, further validating the World Poll.
Blanchflower states that "The decline in the well-being of the young means that mental health now improves with age.” He lists Australia, the UK, the USA, France, Germany, Spain, Sweden, New Zealand, and Italy as countries that have seen declining youth wellbeing, using data “with better sample designs,” presumably from national surveys. He interprets this as evidence that the World Poll is unreliable.
This is truly head-scratching. World Poll data for every one of these countries shows a decline in youth wellbeing among those aged 15 to 24. Thus, the World Poll is consistent with evidence from the national sources Blanchflower prefers, based on what he has disclosed.
Overall, we identify 29 countries that have seen a statistically significant decline in youth wellbeing from period one (2005 to 2011) to period three (2018 to 2023). By contrast, 79 countries saw rising youth wellbeing over this period, and the remaining 38 saw no significant change.
Yet, the World Poll contains several other measures of wellbeing, including a Daily Experiences Index, which is coded such that more positive daily emotions (like enjoyment, laughter, respect) increase the score and more negative experiences (like worry, sadness, anger) diminish the score. On this measure, youth in many countries (60 of 146) around the world have seen worsening daily experiences in recent years (2018 to 2023) compared to the first period (2005 to 2011).
For example, youth in 70% of countries reported increased experiences of sadness and 70% reported increased worry, so Gallup World Poll data are consistent with the view that youth around the world are experiencing worse mental health, even if the trends in daily emotional experiences are not always aligned with summary evaluative measures.
In the introduction to Blanchflower’s blog, Haidt states that the OECD’s PISA, which collects mental health data in addition to test scores, shows declining mental health in many “non-Western nations.” To assess whether PISA data align with World Poll data, we downloaded the latest PISA data (2022) and created a variable “highly satisfied with life” for 15-year-olds who responded with a 7 or higher on a zero to 10 scale when asked by PISA, “Overall, how satisfied are you with your life as a whole these days?”
We compared this to the percentage in the World Poll who are thriving in wellbeing. We restricted the World Poll sample to 15- to 24-year-olds, because the sample of only 15-year-olds would be very small, and we used the years 2018 to 2022. The correlation between the two databases is 0.33 without any weight for sample size and 0.38 after accounting for variation in sample size. The correlation between the PISA measure of wellbeing and the Daily Experiences Index is even higher: 0.52 unweighted and 0.62 using sample size weights.
PISA, it should be clear, is not a representative sample of 15-year-olds, as others have pointed out. In the PISA data, Palestinian 15-year-olds are more satisfied with their life than Swedish 15-year-olds, which is difficult to believe if based on random sampling, so it is quite possible that Gallup data offer a better measure in many countries, despite a smaller sample size. There are also important context effects with the PISA. Students stressed out about a lengthy exam may have a temporarily distorted sense of their wellbeing. In any case, PISA wellbeing data are well-aligned with Gallup wellbeing data for the 67 countries we matched.
Claim 6. Blanchflower argues that the World Poll doesn’t measure the intensity of social media/internet use.
Reply 6. The reliability of a survey is unrelated to its thematic content. It is strange to claim that the World Poll is unreliable because it doesn’t address the topics that Blanchflower wants to study.
A compelling research project would examine the factors that potentially affect youth wellbeing, including social media use, which Gallup has studied within the United States. That work builds on the strong work of the World Happiness Report, which, like Blanchflower, notes a flattening of the usual U-shaped curve when happiness is plotted by age. Such a project should include the many countries where internet access and smartphones are luxury items. Gallup is committed to collecting data in all of them, not only in places where young people routinely binge on social media. It is entirely possible, indeed extremely likely, that smartphones do not have the same effects on youth wellbeing in every country.
Claim 7. Blanchflower claims that the Global Minds Database is a “better dataset” than the Gallup World Poll and confuses his own cross-sectional (point-in-time) finding about wellbeing with a longitudinal trend, which he has not even attempted to measure.
Reply 7. Blanchflower’s claim that the Global Minds data are “better” reflects a complete misunderstanding of probability sampling, representation, and data quality. First, it contradicts the creators of the Global Minds database themselves. On their website, they compare the Gallup World Poll to their data and point out that the Global Minds survey recruits respondents through online Google and Facebook ads and is not representative of national populations, especially in lower-income countries. Across all countries included in the Global Minds Project 2023 report, 53% of the sample had a tertiary education (i.e., a bachelor’s degree or higher), which exceeds even the actual U.S. rate. To illustrate this bias, consider the Global Minds bachelor’s-degree-or-higher attainment rates of 58% for Zimbabwe and 80% for India. Meanwhile, Our World in Data reports the tertiary education rates of Zimbabwe and India as 0.7% and 7.9%, respectively, for the population between 25 and 65. The Gallup World Poll estimates these rates as 1.3% and 6.1%, respectively, for the population 15 and over.
Moreover, the Global Minds project was limited to English in 2019 and 2020; it expanded to 11 languages in 2022 and to 16 in 2023. By contrast, the Gallup World Poll is translated into over 100 languages and uses an extremely rigorous process to recruit and contact a random sample of residents in each country, including face-to-face interviews in low- and middle-income countries. There are surely differences between people who respond to Facebook and Google ads and those who do not, including those who never see the ads because they do not have internet access or do not speak the language of the survey. These sources of recruitment bias appear to account for the heavily biased final sample in the Global Minds data.
Blanchflower also confuses, or misleadingly portrays, his own findings. In recent work, he pooled Global Minds Project data from 2020 to 2024 for 34 countries and found that people aged 18 to 24 had lower mental health than older people. This analysis did not compare mental health at some point in the past to a later time, either in the entire sample or at the country level. Yet this does not stop him from citing it as evidence of a global decline in youth mental health that, in his opinion, somehow also contradicts Gallup data. Ironically, Blanchflower’s claims about a downward trend also contradict the Global Minds Project’s own report, which finds no change in youth mental health over the brief period of 2021 to 2023 in the 32 countries it has been tracking.
Blanchflower’s post is far from a careful, rigorous critique. The only place in the blog where he attempts to correlate World Poll measures of wellbeing with alternative measures from similar high-quality samples is his comparison of U.S. states, which he got badly wrong. Thus, Blanchflower does not even attempt to test the reliability of the World Poll in the 141 or more countries with data outside the United States. When we do so here, we confirm the well-established view that the World Poll reliably measures wellbeing. Nor does Blanchflower consider the thousands of academic papers that have been published using the Gallup World Poll and reached the same conclusion as ours: the World Poll is a valid and reliable database with remarkable geographic coverage.
Conclusion
Gallup would be happy to work with any organization to address how social media use affects youth mental health, expanding on our U.S. research that was recently cited by the U.S. Surgeon General. Doing so is neither cheap nor easy. Gallup is proud of the intense effort and high-quality work that make the World Poll a highly useful database for scholars around the world, who know it is highly reliable and valid. Blanchflower should reconsider whether he wants to be on record with such unreliable claims to the contrary.
Note that the GWP did show declines in life satisfaction among young people in the Anglosphere and some Western European nations. Also note that another regularly used dataset, the Global Burden of Disease (GBD), has been used to show little change in youth mental health in the United States, but the GBD is not actual survey data: it is a set of estimates of what health data might look like, modeled from other variables. Zach has shown in a previous post why the GBD is unreliable, as it does not reflect the most reliable population mental health trends available.
This can be done simply by running a life satisfaction regression, by country, with internet usage as a right-hand-side variable, with or without controls for, say, gender, age, education, and labor force status. Running the regressions country by country also makes the results much easier to report.
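The suggested regression can be sketched as follows. This is a minimal illustration with invented respondent-level data and no controls; `ols_slope` is a hypothetical helper, and with a binary internet-use regressor the OLS slope is simply the difference in group mean life satisfaction.

```python
def ols_slope(y, x):
    """Slope and intercept from a bivariate OLS of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)
    return beta, my - beta * mx

# Invented rows: (country, life satisfaction 0-10, used internet 0/1)
rows = [
    ("A", 7, 1), ("A", 6, 1), ("A", 4, 0), ("A", 5, 0),
    ("B", 8, 1), ("B", 5, 0), ("B", 6, 1), ("B", 4, 0),
]

# Group respondents by country, then regress within each country
by_country = {}
for country, ls, net in rows:
    by_country.setdefault(country, ([], []))
    by_country[country][0].append(ls)
    by_country[country][1].append(net)

for country, (y, x) in sorted(by_country.items()):
    beta, alpha = ols_slope(y, x)
    print(f"{country}: internet coefficient = {beta:+.2f}")
```

Adding controls would require multiple regression (e.g., via a statistics package), but the per-country structure stays the same.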
The weighted percent of respondents by country who said they used the internet in the past week: 68 countries had 95% or higher usage. The countries with below 70% usage tended to be African (Uganda, Zimbabwe, Chad, Burkina Faso, Niger, Tanzania, Ethiopia, Malawi, and Madagascar), and within the pooled sample, non-users tended to be older, lower-income, less educated, and sampled in an early year such as 2015 rather than more recently. In 2022, coverage was 100% in Kuwait, 99.6% in Taiwan, and 99.3% in Sweden.
Rothwell is Gallup’s Principal Economist. Srinivasan is the Global Director of Research for the Gallup World Poll.
Blanchflower, David G., and Alex Bryson. "Wellbeing Rankings." Social Indicators Research 171, no. 2 (2024): 513-565; Blanchflower, David G. "Is Happiness U-shaped Everywhere? Age and Subjective Well-being in 145 Countries." Journal of Population Economics 34, no. 2 (2021): 575-624; Blanchflower, David G., and Carol Graham. "Subjective Well-being Around the World: Trends and Predictors Across the Life Span: A Response." Working paper, 2020. http://www.dartmouth.edu/~blnchflr/papers/dgbcg%206%20March%20commentary.pdf
This result is from the Stata V18 power command: “power twomeans .67 .51, sd(.48).” 0.48 is approximately the standard deviation in the share thriving in the United States among 15-24-year-olds over the entire period (2005-2023). For other analyses, see tools like https://clincalc.com/stats/samplesize.aspx
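The cited Stata command can be mirrored with the standard normal-approximation formula for a two-sample test of means, using only the Python standard library. This sketch assumes Stata's defaults of a two-sided alpha of 0.05 and 80% power; `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(m1, m2, sd, alpha=0.05, power=0.80):
    """Sample size per group for a two-sample z-test of means:
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / |m1 - m2|)^2, rounded up."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_b = z.inv_cdf(power)           # ~0.84 for power = 0.80
    return ceil(2 * ((z_a + z_b) * sd / abs(m1 - m2)) ** 2)

# Same inputs as the Stata command cited above
print(n_per_group(0.67, 0.51, 0.48))  # prints 142
```

That is, roughly 142 respondents per group are needed to detect a 16-point gap in the share thriving at these settings.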
https://fivethirtyeight.com/features/the-death-of-polling-is-greatly-exaggerated/
Blanchflower, David G., and Andrew J. Oswald. "Is Well-being U-shaped over the Life Cycle?" Social Science & Medicine 66 (2008): 1733-1749; Blanchflower, David G. "Is Happiness U-shaped Everywhere? Age and Subjective Well-being in 145 Countries." Journal of Population Economics 34, no. 2 (2021): 575-624.
The reported correlation above is unweighted. It is slightly higher using the WVS sample size as a weight (r = 0.60). The World Values Survey item reads: “All things considered, how satisfied are you with your life as a whole these days?” Respondents answer on a 1-to-10 scale. The Gallup item is on a 0-to-10 scale, with 10 being the best life you can imagine for yourself and 0 the worst. See https://www.gallup.com/394505/indicator-life-evaluation-index.aspx
See Dan Witters and Kayley Bayne, "New Normal: Lower U.S. Life Ratings" Jan 18, 2024, https://news.gallup.com/poll/548618/new-normal-lower-life-ratings-persisted-2023.aspx
Gallup routinely measures wellbeing using both current and expected wellbeing (in 5 years) using the Cantril ladder. Respondents who score a 7 or higher on current wellbeing and an 8 or higher on expected wellbeing are coded as “thriving.” See https://www.gallup.com/394505/indicator-life-evaluation-index.aspx
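The coding rule described in this footnote can be sketched directly; `is_thriving` is a hypothetical helper name, and the rule shown is only the "thriving" threshold stated above (Gallup's other categories are not reproduced here).

```python
def is_thriving(current, expected):
    """Apply the footnote's coding rule: a respondent is "thriving" when
    current wellbeing is 7 or higher AND expected wellbeing (in 5 years)
    is 8 or higher, both on the 0-10 Cantril ladder."""
    return current >= 7 and expected >= 8

print(is_thriving(7, 8))   # True
print(is_thriving(8, 7))   # False: expected score is below 8
```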