A Debate on the Strengths, Limitations, Uses and Misuses of the Gallup World Poll
Gallup researchers respond to "Is the Gallup World Poll Reliable?" Blanchflower responds back.
Intro from Zach Rausch and Jon Haidt:
On June 17, we published an essay by Dartmouth economist Danny Blanchflower titled, "Is the Gallup World Poll Reliable?" In the essay, Blanchflower makes two major claims:
The Gallup World Poll is not a fully reliable tool for analysis of youth mental health trends. (Reasons include sample sizes that may be too small to accurately capture smaller subgroups, such as young adult women, and survey timing that varies by month within and between countries.)
Some researchers misuse the Gallup World Poll, such as breaking down the data into very small subgroups, resulting in questionable analyses. These analyses are then misinterpreted by others, leading to the downplaying of concerns about the impact of digital technology on youth mental health.
Today, we present a response from two Gallup researchers, Jonathan Rothwell and Rajesh Srinivasan (Rothwell is Gallup’s Principal Economist. Srinivasan is the Global Director of Research for the Gallup World Poll). While they do not contest Blanchflower's point regarding the misuse of the poll by other researchers, they strongly disagree with his assessment of the poll's reliability.
We also include a counter-response from Blanchflower to Rothwell and Srinivasan's arguments at the bottom.
Our aim in sharing these essays is to deepen our understanding of what the Gallup World Poll can and cannot reveal about the state of youth mental health worldwide.
— Zach and Jon
NOTE: This post is long and highly technical.
Part 1. Gallup Researchers Respond to Blanchflower
The Gallup World Poll Is Reliable
By Jonathan Rothwell, PhD and Rajesh Srinivasan, PhD
We were surprised to see the comment from Danny Blanchflower on this Substack, which questioned the reliability of the Gallup World Poll. We know he has frequently published academic articles using Gallup’s global wellbeing data, including one this year, and has evidently not retracted any of them.1
While Gallup welcomes constructive criticism of our data collection methods, findings and any other component of research, Blanchflower’s grounds for declaring that "The Gallup World Poll is an unreliable dataset for understanding global well-being" are based on analytical errors and poor reasoning. Hence, we feel it is important to correct the record.
Let’s review each point.
Claim 1. The World Poll sample sizes are too small for large countries.
Reply 1. The World Poll annually conducts roughly 1,000 interviews with a nationally representative sample of adults per country, per wave. This is more than enough to provide population-wide estimates at the national level, which is what the World Poll is designed to measure. Importantly, nothing prevents scholars from pooling data across years to boost sample sizes for sub-populations like youth. The U.S. Census Bureau often uses a similar approach, providing multi-year estimates through its American Community Survey, for example.
Blanchflower claims that because the United States is a large country, it requires a sample size that is scaled proportionally to produce reliable estimates. This is an inaccurate view of statistics. Statistical software packages and scientific organizations provide online power calculators to help researchers determine the sample size needed to test hypotheses. To test whether a 16-percentage-point decline in the share of youth who are thriving in wellbeing is significant in the United States (see below), one only needs 143 respondents per period. The World Poll’s U.S. sample includes 470 people between the ages of 15 and 24 from 2018 to 2023 and 561 between 2005 and 2011.2 The capacity to draw reliable inferences from a randomly sampled population is what allows small-sample U.S. election polls to consistently come within a reasonable margin of error of predicting the presidential vote every four years.3 These points should be obvious to any quantitative researcher.
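For readers who want to reproduce this calculation, here is a minimal sketch in Python that mirrors the footnoted Stata command (“power twomeans .67 .51, sd(.48)”), assuming a two-sided two-sample test with alpha = 0.05 and 80% power; the library choice (statsmodels) is ours, not Gallup's.

```python
from math import ceil
from statsmodels.stats.power import TTestIndPower

p1, p2, sd = 0.67, 0.51, 0.48        # thriving shares in the two periods and their SD
effect_size = (p1 - p2) / sd         # standardized difference (Cohen's d), roughly 0.33

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(ceil(n_per_group))             # roughly 143 respondents per period
```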
Gallup’s World Poll is also similar in size to other well-established and recurring international surveys, including those that Blanchflower uses. The World Values Survey (WVS) averaged 1,705 responses per country for Wave 7, collected over six years from 2017 to 2022 in 92 countries. During that same period, the World Poll collected about 6,000 responses per country across 142 countries. Blanchflower has not mentioned any concerns about the WVS sample size in his recent publications, despite its much smaller size than the World Poll.4 If he had compared the two surveys, he’d find Gallup’s measure of current life evaluation (using the Cantril ladder) has a correlation of 0.59 with the World Values Survey measure of life satisfaction, using data from 2017 to 2022.5
Claim 2. The World Poll wellbeing results do not match other Gallup wellbeing data.
Reply 2. This is incorrect and appears to be based on several errors.
Despite different data collection methods (a different mix of mail, phone, and web), Gallup’s U.S. wellbeing database (Gallup National Health and Well-Being Index) closely aligns with World Poll data.6 Aggregating data from 2009 to 2023, when both sources have available data collected with comparable survey methods, the mean share of U.S. adults who are thriving in wellbeing is 54.6% using the World Poll and 53.5% using the national database.
Blanchflower tries a different test to determine reliability. He looks at the correlation between state-level measures of wellbeing using the U.S. Well-Being Index database and World Poll, ignoring the fact that the World Poll is not designed for state-level analysis and has small state-level sample sizes for many states like Hawaii, Vermont and North Dakota. It is puzzling that Blanchflower conducts an analysis on a subset of a database, while complaining about small sample sizes in the aggregate database. Gallup methodologists do not create state-level weights for the World Poll, as they do for other Gallup databases that are designed for state-level reporting. This weighting issue alone can create large discrepancies between databases.
Even with these limitations, we replicated Blanchflower’s analysis, using the same data and period (2008 to 2017), and found that he made a basic error. Blanchflower says he used the Cantril ladder to measure wellbeing.7 He reports an r² value of 0.08 (implying a correlation of around 0.28). We replicated that correlation (r = 0.30) using unweighted data, but the correlation jumps to 0.45 when the sample size is used as a weight. Because the precision of each state’s point estimate depends on its sample size, and the sample size varies widely across states (because it is based on population), Blanchflower made an analytical error by not weighting by sample size in his analysis.
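For readers who wish to replicate this step, a sample-size-weighted Pearson correlation can be computed along the following lines; the state-level inputs shown in the usage comment are hypothetical placeholders, not the actual Gallup extracts.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation of x and y with observation weights w
    (e.g., each state's World Poll sample size)."""
    x, y, w = map(np.asarray, (x, y, w))
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov_xy = np.average((x - mx) * (y - my), weights=w)
    var_x = np.average((x - mx) ** 2, weights=w)
    var_y = np.average((y - my) ** 2, weights=w)
    return cov_xy / np.sqrt(var_x * var_y)

# Hypothetical usage with state-level means from the two databases:
# r_unweighted = np.corrcoef(wp_state_means, wbi_state_means)[0, 1]
# r_weighted   = weighted_corr(wp_state_means, wbi_state_means, wp_state_n)
```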
Blanchflower’s use of a single measure of wellbeing — rather than two (current and future) — further injects error, and thus he finds that states with small populations and small World Poll sample sizes (Hawaii and West Virginia) rank differently across the two data sources. When we calculated the percentage of the population that is thriving in wellbeing (using both the current and expected Cantril ladder), Hawaii’s and West Virginia’s rankings look similar across the Gallup Well-Being Index and the World Poll (Hawaii: 1st and 3rd; West Virginia: 50th and 42nd).
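For concreteness, here is a minimal sketch of how a “thriving” indicator of this kind can be coded from the two Cantril items, using the thresholds described in the footnotes (current score of 7 or higher and expected score of 8 or higher); the data frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical respondent-level frame with the two Cantril items (0-10 scales):
# "ladder_now" (current life evaluation) and "ladder_future" (expected in 5 years).
df = pd.DataFrame({
    "ladder_now":    [8, 6, 7, 9, 3],
    "ladder_future": [9, 8, 7, 10, 5],
})

# Code "thriving" using the thresholds described in the footnotes:
# current >= 7 AND expected >= 8.
df["thriving"] = (df["ladder_now"] >= 7) & (df["ladder_future"] >= 8)

print(df["thriving"].mean())   # share thriving in this toy sample
```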
Claim 3. Blanchflower argues that Gallup measures of wellbeing did not decline during the Great Recession or pandemic, according to World Poll.
Reply 3. The first point is bizarre because it is so easily refuted by publicly available summary data for the United States. According to World Poll data, the share of U.S. adults thriving in wellbeing fell from 66% in 2007 to 61% in 2008, just after the Great Recession began, and dipped further to 57% in 2009, when unemployment spiked.
While it is true that the World Poll data for the United States do not show a drop from 2019 to 2020 or 2021 in overall wellbeing (percentage thriving), World Poll data collected in May 2019 and May 2020 in the United States show a large increase in the share of adults who “worried a lot” the day before the survey, rising from 35% to 43%. Gallup fielded the same item about worry on its U.S. Panel. The May 2020 point estimate for worry is 46%, not meaningfully different from the World Poll estimate, especially since the two surveys were not fielded on the same days.8
As non-World Poll Gallup data for the United States show, wellbeing fluctuated greatly from month to month and even day to day, according to things like spikes in unemployment, hospitalization counts, pandemic relief checks, business re-openings, and vaccination announcements and implementation.9 It would be foolish to expect a point-in-time national survey to reflect all of this variation, and Gallup never intended for the World Poll to be used to analyze monthly national trends.
Claim 4. Gallup data are collected during different months and are thus unreliable because happiness varies by season of the year.
Reply 4. The World Poll has tried to stay consistent in its annual measurement to minimize the effect of seasonality by collecting data between April and November, in most years. Given that much of the data are collected in-person and so many countries are involved, Gallup also needs to consider religious observances, weather patterns, pandemics, war, and other local factors in deciding when to enter the field.
From a research perspective, variation in data collection timing is not a serious obstacle to analysis. There are well-known techniques to test for seasonal effects and adjust for them. In our recent paper, for example, we used month effects to adjust for cross-country differences in the timing of economic-related items.10 A quick analysis suggests that, globally, month effects are small (on the order of 0 to 1.7 percentage points on the share of the population that is thriving).
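To make the idea concrete, here is a minimal sketch (not Gallup's actual procedure) of a month-fixed-effects check of the kind described above, estimated as a linear probability model on synthetic data; all variable names and values are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical respondent-level data: a binary "thriving" outcome plus the
# country and calendar month of interview (synthetic, for illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "thriving": rng.integers(0, 2, 3000),
    "country":  rng.choice(["A", "B", "C"], 3000),
    "month":    rng.integers(4, 12, 3000),    # April-November field period
})

# Linear probability model with country and month fixed effects; the month
# coefficients (multiplied by 100) bound the seasonal shift in the share thriving.
model = smf.ols("thriving ~ C(country) + C(month)", data=df).fit()
print(model.params.filter(like="C(month)") * 100)   # month effects in percentage points
```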
Claim 5. The World Poll is not aligned with other and better data sources showing falling mental health.
Reply 5. The countries that Blanchflower claims have seen a decline in youth wellbeing using “good data” also show a decline in youth wellbeing in the Gallup World Poll, further validating the World Poll.
Blanchflower states that "The decline in the well-being of the young means that mental health now improves with age.” He lists Australia, the UK, the USA, France, Germany, Spain, Sweden, New Zealand, and Italy as countries that have seen declining youth wellbeing, using data “with better sample designs,” presumably from national surveys. He interprets this as evidence that the World Poll is unreliable.
This is truly head-scratching. World Poll data for every one of these countries shows a decline in youth wellbeing among those aged 15 to 24. Thus, the World Poll is consistent with evidence from the national sources Blanchflower prefers, based on what he has disclosed.
Overall, we identify 29 countries that have seen a statistically significant decline in youth wellbeing from period one (2005 to 2011) to period three (2018 to 2023). By contrast, 79 countries saw rising youth wellbeing over this period, and the remaining 38 saw no significant change.
Yet, the World Poll contains several other measures of wellbeing, including a Daily Experiences Index, which is coded such that more positive daily emotions (like enjoyment, laughter, respect) increase the score and more negative experiences (like worry, sadness, anger) diminish the score.11 On this measure, youth in many countries (60 of 146) around the world have seen worsening daily experiences in recent years (2018 to 2023) compared to the first period (2005 to 2011).
For example, youth in 70% of countries reported increased experiences of sadness and 70% reported increased worry, so Gallup World Poll data are consistent with the view that youth around the world are experiencing worse mental health, even if the trends in daily emotional experiences are not always aligned with summary evaluative measures.
In the introduction to Blanchflower’s blog, Haidt states that the OECD’s PISA, which collects mental health data in addition to test scores, shows declining mental health in many “non-Western nations.” To assess whether PISA data align with World Poll data, we downloaded the latest PISA data (2022) and created a variable “highly satisfied with life” for 15-year-olds who responded with a 7 or higher on a zero to 10 scale when asked by PISA, “Overall, how satisfied are you with your life as a whole these days?”
We compared this to the percentage in the World Poll who are thriving in wellbeing. We restricted the World Poll sample to 15- to 24-year-olds, because the sample of only 15-year-olds would be very small, and we used the years 2018 to 2022. The correlation between the two databases is 0.33 without any weight for sample size and 0.38 after accounting for variation in sample size. The correlation between the PISA measure of wellbeing and the Daily Experiences Index is even higher: 0.52 unweighted and 0.62 using sample size weights.
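As an illustration of the comparison described above, here is a minimal sketch, using tiny synthetic stand-ins for the two sources, of constructing the “highly satisfied with life” indicator (a response of 7 or higher) and correlating country-level shares with a World Poll thriving share; all values and column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative only: PISA-style respondent-level life satisfaction (0-10 scale).
pisa = pd.DataFrame({
    "country": ["FIN", "FIN", "SWE", "SWE", "USA", "USA"],
    "life_satisfaction": [8, 5, 7, 9, 6, 4],
})
# Illustrative only: country-level share of 15-24-year-olds thriving.
wp = pd.DataFrame(
    {"pct_thriving": [0.55, 0.60, 0.48]}, index=["FIN", "SWE", "USA"]
)

# "Highly satisfied with life": responded 7 or higher.
pisa["highly_satisfied"] = pisa["life_satisfaction"] >= 7
pisa_country = pisa.groupby("country")["highly_satisfied"].mean()

merged = wp.join(pisa_country, how="inner")   # align the matched countries
print(np.corrcoef(merged["pct_thriving"], merged["highly_satisfied"])[0, 1])
# A sample-size weight can be applied as in the weighted_corr sketch above.
```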
PISA, it should be clear, is not a representative sample of 15-year-olds, as others have pointed out.12 In the PISA data, Palestinian 15-year-olds are more satisfied with their life than Swedish 15-year-olds, which is difficult to believe if based on random sampling, so it is quite possible that Gallup data offer a better measure in many countries, despite a smaller sample size. There are also important context effects with the PISA. Students stressed out about a lengthy exam may have a temporarily distorted sense of their wellbeing. In any case, PISA wellbeing data are well-aligned with Gallup wellbeing data for the 67 countries we matched.
Claim 6. Blanchflower argues that the World Poll doesn’t measure the intensity of social media/internet use.
Reply 6. The reliability of a survey is unrelated to its thematic content. It is strange to claim that the World Poll is unreliable because it doesn’t address the topics that Blanchflower wants to study.
A compelling research project would look at the factors that potentially affected youth wellbeing -- including social media use, which Gallup has studied within the United States,13 building on the strong work of the World Happiness Report, which, like Blanchflower, notes a flattening of the usual U-shaped curve when happiness is plotted by age.14 Such a project should include the many countries for which internet access and smartphone use are luxury items. Gallup is committed to collecting data in all of them, not only the places where young people routinely binge on social media. It may very well be that smartphones do not have the same effects on youth wellbeing across all countries. In fact, it is extremely likely.
Claim 7. Blanchflower claims that the Global Minds Database is a “better dataset” than the Gallup World Poll and confuses his own cross-sectional (point-in-time) finding about wellbeing with a longitudinal trend, which he has not even attempted to measure.
Reply 7. Blanchflower’s claim that the Global Minds data are “better” conveys a complete misunderstanding about probability sampling, representation, and data quality. First, it contradicts the creators of the Global Minds database themselves. On their website, they compare the Gallup World Poll to their data and point out that the Global Minds survey uses respondents recruited from online Google and Facebook ads and is not representative of the national populations, especially in lower-income countries.15 Across all countries included in the Global Minds Project 2023 report, 53% of the sample had a tertiary education (e.g., a bachelor’s degree or higher), which exceeds even the actual U.S. rate.16 To illustrate this bias, consider the Global Minds bachelor’s degree or higher attainment rate of 58% for Zimbabwe and 80% for India.17 Meanwhile, Our World in Data reports the tertiary education rates of Zimbabwe and India as 0.7% and 7.9%, respectively, for the population between 25 and 65. Gallup World Poll estimates these rates as 1.3% and 6.1%, respectively, for the population 15 and over.18
Moreover, the Global Minds project was limited to English in 2019 and 2020. In 2022, it expanded to 11 languages, and to 16 in 2023. By contrast, the Gallup World Poll is translated into over 100 languages and uses an extremely rigorous process to recruit and contact a random sample of residents in each country, including the use of face-to-face interviews in low- and middle-income countries. There are surely differences between people who respond to Facebook and Google ads and those who do not, including those who never see the ads because they do not have internet access or do not speak the language of the survey. These sources of recruitment bias seem to account for the heavily biased final sample in the Global Minds data.
Blanchflower also confuses or misleadingly portrays his own findings. In recent work, he pooled Global Minds Project data from 2020 to 2024 for 34 countries and found that people aged 18 to 24 had lower mental health than older people.19 This analysis did not involve comparing mental health at some point in the past to a later time in the entire sample or at the country level. Yet, this does not stop him from citing this as evidence of a global decline in youth mental health (that somehow -- in his opinion -- also contradicts Gallup data). Ironically, Blanchflower’s claims about the downward trend also contradict the Global Mind Project’s own report. They find no change in youth mental health over the brief period of 2021 to 2023 in the 32 countries they have been tracking.20
Blanchflower’s post is far from a careful, rigorous critique. Nowhere in the blog does he attempt to correlate World Poll measures of wellbeing with alternative measures of wellbeing using similar high-quality samples, except in his flawed comparison of U.S. states, which he got badly wrong. Thus, Blanchflower does not even attempt to test the reliability of the World Poll in the 141 or more countries with data outside the United States. When we do so here, we confirm the well-established view that the World Poll reliably measures wellbeing. Nor does Blanchflower consider the thousands of academic papers that have been published using the Gallup World Poll and reached the same conclusion as ours: The World Poll is a valid and reliable database, with remarkable geographic coverage.
Conclusion
Gallup would be happy to work with any organization to address how social media use affects youth mental health, expanding on our U.S. research that was recently cited by the U.S. Surgeon General. Doing so is not cheap and not easy. Gallup is proud of the intense effort and high-quality work that make the World Poll a highly useful database for scholars around the world, who know it is highly reliable and highly valid. Blanchflower should reconsider whether he wants to be on record with such unreliable claims to the contrary.
Part 2. Blanchflower Responds to Rothwell and Srinivasan
Improving the Gallup World Poll: A Response to My Critics
by David Blanchflower, PhD
Moving forward, I am hoping to calm this debate and focus on science and what we can learn about the human condition. My post on the Gallup World Poll (GWP) may have been overly confrontational, so this attempt addresses the critics constructively and, I hope, provides a way forward. The title of my last post, "Is The Gallup World Poll Reliable?", was also too provocative, and my long-time colleague Alex told me off for it! My goal is to engage constructively and foster dialogue to improve the Gallup World Poll, which, in many respects, is a pathbreaking survey. I want to help make it better in a post-Covid world.
In my original post, I also critiqued the misuse of GWP data in arguments that downplay concerns about the impact of digital technology on youth mental health. Rothwell and Srinivasan did not disagree about the misuse of the GWP — they disagreed about my broader claims around the reliability of the poll.
I do not intend to critique Gallup data in a way that implies it cannot shed light on important social and economic issues: it can. The Gallup World Poll is an invaluable data source for research, which I have utilized many times. Like any data file with extensive coverage across countries, it has strengths and weaknesses. The GWP's greatest strength is its broad coverage, especially in less developed regions where data is scarce, allowing researchers to track changes over time.
However, when a survey like the GWP covers numerous countries with varying populations using the same sample size, it becomes challenging to make detailed and accurate conclusions about specific subgroups, such as age groups or genders, due to insufficient sample sizes. As Gallup researchers Jonathan Rothwell and Rajesh Srinivasan point out, the individual country files are not intended to provide representative estimates by state, month, age, or gender.
Another significant limitation is the nature of the survey questions, most of which are just on/off variables. First, the well-being variables are not well-suited for a world that has experienced four major shocks: the Great Recession, the digital revolution, COVID-19, and a European war. Second, the survey's ability to track the declining mental health or unhappiness of young people is inadequate because it includes no variables that can identify those struggling with serious mental health problems, who are potentially at risk of self-harm or even homelessness. The GWP struggles to identify negative effects, especially when severe mental health symptoms are of particular interest. We don’t know why yet, but it is harder to detect severe symptoms using questions about happiness/positive affect than with unhappiness/negative affect data. Even though the GWP has questions on pain, stress, sadness, and worry, they are not adequate.
The Cantril variable, a heavily used positive affect measure from the GWP, assesses well-being as a "ladder of life" but is not well-suited to capture the broad changes since 2008. It asks how someone’s life has turned out so far rather than something more immediate that is affected by life events like unemployment or financial worries. Indeed, the variables are often contradictory. Of the 426 respondents to the US GWP from 2015-2023 who reported a Cantril score of 3 or less on a scale of 0-10, 206 (48%) said they had enjoyment yesterday. Of the 4,275 who scored 8 or higher on Cantril, 1,073 (25%) said they experienced worry yesterday. These kinds of contradictions are unusual in well-being surveys.
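A cross-tabulation of this kind can be produced along the following lines; this is a sketch with hypothetical column names and toy values, not the actual GWP extract.

```python
import pandas as pd

# Hypothetical respondent-level data: Cantril ladder (0-10) plus the binary
# "enjoyment yesterday" and "worry yesterday" items (illustrative values only).
df = pd.DataFrame({
    "cantril":   [2, 3, 9, 8, 10, 1, 8],
    "enjoyment": [1, 1, 1, 0, 1, 0, 1],
    "worry":     [1, 0, 0, 1, 0, 1, 0],
})

low  = df[df["cantril"] <= 3]    # very low life evaluation
high = df[df["cantril"] >= 8]    # very high life evaluation

print("Low Cantril (<=3) reporting enjoyment yesterday:", low["enjoyment"].mean())
print("High Cantril (>=8) reporting worry yesterday:   ", high["worry"].mean())
```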
The four main positive affect variables - enjoyment, smiling and laughing, well-rested, and happiness - and the five negative affect variables - worry, sadness, stress, anger, and pain - are all binary yes/no (1,0). Moving forward, this can be improved by including new measures of ill-being, such as the GHQ score used in the UK panel Understanding Society and the English and Scottish Health Surveys; the MHQ score used in Global Minds; the Kessler score used in the NHIS and NSDUH; and the PHQ and GAD scores used in Come-Here, which are commonly used validated screening tools. If not, why not?
It seems the well-documented U-shape in happiness and the hump shape in unhappiness in age have now gone. We have found evidence that, especially in the years since 2020, ill-being now declines with age in 90 countries and the young are especially unhappy.
I do need, though, to respond to several of the points made by the Gallup team.
Point-by-point responses to Jonathan Rothwell and Rajesh Srinivasan
My major claim was that the GWP seems to be an outlier once I moved to an analysis of youth well-being, which does not seem to be picked up in the GWP. I argued there were four reasons:
Samples are small, especially for large countries, and do not appear to be nationally representative.
The survey’s well-being measures did not decline during the Great Recession or COVID-19, which seems unlikely to reflect reality, given that most other surveys show declines.21
The timing of the survey varies by month within and between countries.
The GWP shows that internet “access” is positively correlated with well-being.
Rothwell and Srinivasan responded to these criticisms, and I will address their main points in turn.
1. Academic Publications
“We know he has frequently published academic articles using Gallup’s global wellbeing data, including one this year, and has evidently not retracted any of them.”
This is true. In a previous paper, for example, my co-authors and I showed that GWP data for 2008–2017 produced U-shaped curves in happiness across the lifespan, and the results, as reported, are correct. However, we have also pointed out flaws with the Gallup World Poll in several papers, including its lack of responsiveness to shocks and issues with gender effects and well-being rankings. Once I became aware of these problems, I stopped using the GWP and started using other sources like the ISSP, Global Minds, and Come-Here, which do not appear to have the same issues. These data files show that the U-shape has gone; this is much less apparent in the GWP.
2. Sample Size
“The World Poll annually conducts roughly 1,000 interviews with a nationally representative sample of adults per country, per wave. This is more than enough to provide population-wide estimates at the national level, which is what the World Poll is designed to measure.”
It is the case that a well-designed survey can allow researchers to produce population estimates with 1,000 unweighted respondents, provided we are able to relate the achieved sample to the population through sample probability weighting and adjustments for non-response. I have been concerned that many using GWP are not as aware as they should be of the survey’s design and the methodology used to reweight the raw data with the weights provided. I’m keen to see more information about how successful the reweighting strategy is in allowing us to extrapolate from the sample to country populations, and to know more about adjustments for survey non-response.
It is certainly true that we researchers are always greedy for more. But sometimes there is a good reason. In this case, as Rothwell and Srinivasan say, the precision of estimates is dependent upon sample sizes, and a survey’s ability to pick up statistically significant changes of a certain size is dependent upon sufficient unweighted sample sizes. So when we think of a sample size being ‘more than enough’ we need to remember that it might not pick up significant changes when those changes are imprecisely estimated. While 1,000 interviews per country may be sufficient for population-wide estimates at the national level, small sample sizes can lead to concerns about the stability of estimates, the size of coefficients, and the ability to generate disaggregated estimates.
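One way to make the precision point concrete is to compute a weighted national estimate together with its Kish effective sample size, which shows how much precision the weighting itself costs; the sketch below uses synthetic data and is not tied to Gallup's actual weighting variables.

```python
import numpy as np

# Illustrative only: a binary "thriving" outcome and post-stratification
# survey weights (synthetic values).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 1000)             # 1 = thriving
w = rng.uniform(0.3, 3.0, 1000)          # weights supplied with the data

weighted_share = np.average(y, weights=w)

# Kish effective sample size: (sum of weights)^2 / sum of squared weights.
n_eff = w.sum() ** 2 / (w ** 2).sum()
se = np.sqrt(weighted_share * (1 - weighted_share) / n_eff)

print(f"weighted share thriving = {weighted_share:.3f}, "
      f"effective n = {n_eff:.0f}, approx SE = {se:.3f}")
```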
For example, in 2023, the Current Population Survey, which is conducted by the Census Bureau for the Bureau of Labor Statistics to calculate the monthly unemployment rate, had an annual sample of about 1 million respondents aged 16 and over. This is roughly a thousand times larger than the Gallup World Poll's annual U.S. sample. The survey, which is publicly available for researchers at no charge, allows disaggregated estimates to be published and analyzed by month, state, age, gender, and race and ethnicity.
The other issue with the GWP, of course, is being able to examine well-being and other outcomes of interest in sub-populations. This was more easily done with the US Daily Tracker conducted by Gallup, which had 350,000 observations a year, far more than the roughly 1,000 for the US in the World Poll. The global ranking of the US was very different when we used data from the Daily Tracker than when we used US GWP data, across all of the GWP wellbeing variables. In our Wellbeing Rankings paper, we found systematic differences in the US scores between those reported in the GWP and the Daily Tracker, with those in the Daily Tracker reporting much lower negative affect. (In all cases, the differences between the US scores in the two surveys are statistically significant.)
3. Youth Well-Being
“To test whether a 16-percentage-point decline in the share of youth who are thriving in wellbeing is significant in the United States, one only needs 143 respondents per period. The World Poll’s U.S. sample includes 470 people between the ages of 15 and 24 from 2018 to 2023 and 561 between 2005 and 2011.”
Rothwell and Srinivasan do not deny that the sample sizes for youth in the US GWP are small. From 2018 to 2023, there were only 470 people aged 15-24 in the US GWP sample, averaging 78 per year. A larger sample size would provide more reliable insights, especially for phenomena impacting specific subgroups like less-educated females. For instance, the GWP has three education categories: completed elementary education or less, completed secondary education, and completed four years of higher education. In the US sample, the number of females under 25 who had not completed elementary education was zero in 2020, one in each of 2021 and 2023, and two in 2022. One also needs to be concerned not just about significance but also about coefficient size and stability, which are much bigger issues in small samples than in larger ones.
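A cell-size check of the kind described here is straightforward to run; the sketch below uses hypothetical column names and toy values rather than the actual GWP file.

```python
import pandas as pd

# Hypothetical respondent-level extract with year, gender, age, and the three
# GWP education categories (illustrative values only).
df = pd.DataFrame({
    "year":      [2020, 2021, 2022, 2022, 2023],
    "female":    [1, 1, 1, 1, 1],
    "age":       [19, 22, 18, 24, 21],
    "education": ["elementary or less", "secondary", "elementary or less",
                  "elementary or less", "elementary or less"],
})

# Count unweighted respondents per year in the cell of interest:
# females under 25 who have not completed more than elementary education.
cell = df[(df["female"] == 1) & (df["age"] < 25) &
          (df["education"] == "elementary or less")]
print(cell.groupby("year").size())
```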
Rothwell and Srinivasan say that because of small sample sizes, the World Poll ‘is not designed for state-level analysis,’ or indeed monthly analysis, which presumably means it is not designed for analyzing subgroups, including the young. This also means it is not designed for analysis conducted on very small cells using these data, as done by some researchers.
I believe I am correct in saying that the GWP does not stratify its sampling by location, e.g., by state in the case of the USA. Consequently, the small sample sizes mean one cannot undertake analyses of geographical variation in well-being. That’s a shame because we know that, for economic and social reasons, geographic differences can be important. The way around this would be to raise sample sizes very substantially, so that one can think of the sample as ‘as good as random’, or to stratify by location so that representativeness at a local level is assured. However, that would only be meaningful if the sample size were large enough to ensure a sizeable N of respondents in each state cell.
4. Comparison with World Values Survey (WVS)
“The World Values Survey (WVS) averaged 1,705 responses per country for Wave 7, collected over six years from 2017 to 2022 in 92 countries. During that same period, the World Poll collected about 6,000 responses per country across 142 countries. Blanchflower has not mentioned any concerns about the WVS sample size in his recent publications, despite its much smaller size than the World Poll.”
Rothwell and Srinivasan pointed out that the WVS averages 1,705 responses per country for Wave 7. It is true that in a previous paper, my co-authors and I used the WVS data to look at U-shapes between 1990-2014. However, the structure of the WVS is different from the GWP; each country was only interviewed in one of the years during the wave period. For example, in the Great Recession years, the US was interviewed in 2006 and not again until 2011, missing significant periods. This is why we have not used the WVS to analyze responses to shocks.
5. Inconsistency with the Gallup US Daily Tracker
Rothwell and Srinivasan also do not address my point from my previous post (also here) that the findings in the US GWP are entirely different from those in the much larger Gallup US Daily Tracker, which asks the same questions. That is a concern, especially as we have no other large surveys for any other country against which to check results. This is likely a major issue in large countries that have the same sample size as small countries – the US has approximately the same sample size as Malta and Cyprus.
6. Use of a Single Well-Being Measure
“Blanchflower’s use of a single measure of wellbeing — rather than two (current and future) — further injects error.”
Rothwell and Srinivasan claimed that using a single measure of well-being injects error. It is unclear what error this introduces. In prior work, we have used various measures, including Cantril, life satisfaction, and positive and negative affect variables. We found that different variables produce different rankings, highlighting inconsistencies within the GWP. The World Happiness Report uses single measures. In the expansive literature of over 625 published papers on happiness U-shapes, I don’t believe any has used thriving.22
Nonetheless, I agree with Rothwell and Srinivasan that we should typically not use a single metric of well-being. Wellbeing is multifaceted, and we might reasonably expect rankings to vary quite a bit across metrics. This is what Ed Diener taught us many years ago. So, when we study wellbeing, we like to use as many of the available metrics as possible to fully understand what’s going on.
To illustrate the wide variability in well-being when using different measures, we can look at my analysis from 2008-2017 of the GWP. In it, we find that Finland ranked 4th on the Cantril measure, 93rd on enjoyment, 99th on laughing and smiling, 122nd on being well-rested, 51st on pain, 15th on sadness, 130th on worry, 1st on anger, and 51st overall, where we rank the negative affect variables from least to worst.
When we looked at the GWP data from 2021-2023, Finland ranked 1st and Denmark 2nd using the Cantril measure out of 145 countries. However, when we examined smiling and laughing, Finland ranked 73rd and Denmark 55th. If we used worry, ranking the least worried as top-ranked, Finland ranked 43rd and Denmark 41st. Alternatively, using sadness or stress produced different rankings again. For example, Guatemala and Panama were the top two on the smiling and laughing measure, and Kazakhstan and Taiwan were the least worried countries. The various GWP measures are inconsistent with each other, which is a problem because this degree of inconsistency generally isn't seen in other surveys.
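One simple way to quantify this kind of disagreement is to rank countries on each measure and compare the rankings, for example with Spearman rank correlations; the sketch below uses made-up values purely for illustration.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical country-level shares for three measures (illustrative values only).
df = pd.DataFrame({
    "cantril_thriving": [0.70, 0.66, 0.40, 0.55],
    "smile_laugh":      [0.60, 0.58, 0.85, 0.70],
    "worry":            [0.38, 0.36, 0.30, 0.45],
}, index=["Finland", "Denmark", "Guatemala", "Kazakhstan"])

# Rank countries on each measure (worry reversed so "least worried" ranks first),
# then check how well the rankings agree across measures.
ranks = pd.DataFrame({
    "cantril": df["cantril_thriving"].rank(ascending=False),
    "smile":   df["smile_laugh"].rank(ascending=False),
    "worry":   df["worry"].rank(ascending=True),
})
print(ranks)
print(spearmanr(ranks["cantril"], ranks["smile"]))
```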
An example of another inconsistency comes from Eurostat's data. In the third wave of the European Health Interview Survey, 7.2% of the European population reported experiencing chronic depression, with a higher proportion of women (8.7%) than men (5.7%). Country rankings showed Iceland with the highest incidence of chronic depression among 31 countries, followed by Portugal and Sweden, with Romania having the lowest incidence. In contrast, GWP-based rankings placed Sweden (7), Iceland (4), Denmark (2), and Finland (1) very high for overall well-being, with the rankings from the 2020 World Happiness Report, which used GWP data for 2017-2019, shown in parentheses.23 Romania ranked 48th. Despite claims based on GWP data that Finland is the happiest country in the world, it doesn’t seem to be. It also has high suicide rates.
7. Thriving in Well-Being
“The first point is bizarre because it is so easily refuted by publicly available summary data for the United States. According to World Poll data, the share of U.S. adults thriving in well-being fell from 66% in 2007 to 61% in 2008, just after the Great Recession began, and dipped further to 57% in 2009, when unemployment spiked.”
The authors say this is ‘easily refuted’ but do not refute anything that I stated, as I did not make any assertion about ‘thriving’ in my post. Indeed, I have never used thriving in any of my thirty peer-reviewed papers and know of no other peer-reviewed papers in the economics of happiness that use that concept. I was focusing on the lack of change in other single variables, not two combined unvalidated variables, in the last post.
8. Thriving and Other Measures
“According to World Poll data, the share of U.S. adults thriving in wellbeing fell from 66% in 2007 to 61% in 2008, just after the Great Recession began, and dipped further to 57% in 2009, when unemployment spiked. While it is true that the World Poll data for the United States do not show a drop from 2019 to 2020 or 2021 in overall wellbeing (percentage thriving), World Poll data collected in May 2019 and May 2020 in the United States show a large increase in the share of adults who “worried a lot” the day before the survey; increasing from 35% to 43%.”
My colleagues and I have not used the 'thriving' variable before. Nor is it widely used; indeed, it isn’t included at all in the World Happiness Report. Of course, I stand to be corrected. It is indeed true that both thriving and Cantril rose for the US in the pandemic, as the authors claim in their note, and we confirm that worry did rise. But it turns out that stress and pain both fell.
Here is an interesting observation for Europe, including both Western and Eastern Europe, when weights (variable = wgt) are applied. Both Cantril and thriving scores rose during the Great Recession and the COVID-19 pandemic, which seems unlikely given the economic and social disruptions caused by these events.
Table. Cantril and Thriving throughout Europe during the Great Recession and COVID.
9. Month-to-Month Analysis
“As non-World Poll Gallup data for the United States show, wellbeing fluctuated greatly from month to month and even day to day, according to things like spikes in unemployment, hospitalization counts, pandemic relief checks, business re-openings, and vaccination announcements and implementation. It would be foolish to expect a point-in-time national survey to reflect all of this variation, and Gallup never intended for the World Poll to be used to analyze monthly national trends.”
One benefit of large sample surveys is that they can be used to analyze month-to-month variations, as we have done in previous studies on the gender well-being gap. The Eurobarometer, for example, allows for month-by-month analysis. In one paper, we show major inconsistencies in the GWP overall, where we find a positive female differential using happiness and worry measures but a negative one using enjoyment measures.
I think good progress could be made in reconciling some of the inconsistencies in results across surveys if the timing of fieldwork, and the specific time of each survey interview, were available in the data. That way the researcher is able to account for timing differences in fieldwork when analyzing the data. But it would also help if survey fieldwork could be coordinated more: other surveys, such as the Eurobarometers, accomplish that.
10. Mental Health Trends
“Blanchflower states that, ‘The decline in the well-being of the young means that mental health now improves with age.’”
This contrasts with what we found in an earlier paper (see Blanchflower, D.G., Graham, C.L. The Mid-Life Dip in Well-Being: A Critique. Soc Indic Res 161, 287–344 (2022)), which found U-shapes in well-being across 155 countries using Cantril and stress from the GWP, 2005-2019 (their Table 3). As we show in footnotes 24 and 25, we now also find U-shapes in over a hundred countries in the GWP from 2015-2023 using thriving.
In other datasets, we find evidence that ill-being declines with age while well-being rises. In one paper, we find that in the International Social Survey Programme series pooled country files, the percentage of people saying they were often or very often unhappy and depressed in the prior four weeks rose with age in 2011 but fell with age in 2021. This change is illustrated in Chart 1, clearly indicating a shift over time.
Changes in those who report being “often or very often unhappy and depressed” by age (2011 and 2021)
Chart 1. Percent change in those who report being “often and very often unhappy and depressed” (2011 and 2021). Source: International Social Survey Programme.
Declining well-being among youth means that mental health now improves with age, contrasting with previous findings of a U-shape in happiness. Various datasets show this trend, indicating significant changes over time.
11. PISA Data Comparison
“To assess whether PISA data align with World Poll data, we downloaded the latest PISA data (2022) and created a variable “highly satisfied with life” for 15-year-olds who responded with a 7 or higher on a zero to 10 scale when asked by PISA.”
I also downloaded the 2022 PISA data and the 2018 data, which also contained the life satisfaction item that Rothwell and Srinivasan examined. The PISA data over these years, for fifteen-year-olds, showed big declines (the size of the effect matters!) in life satisfaction in 55 out of 62 countries, as reported in Appendix 1. This aligns with our findings of declining youth well-being, which the GWP fails to capture adequately.
12. Thriving Regression Analysis
I have not used the 'thriving' variable before, and to my knowledge it is not used in the World Happiness Report, or by any economists, for example, but since the authors consider it their go-to measure, let's examine it. To determine if the GWP shows declining well-being among the young, we ran a regression analysis with thriving as the dependent variable using the GWP from 2015 in each country, with thriving regressed on age, age squared, gender, and year. Using age and age squared is the standard way to test for U-shapes with the expected signs negative for age and positive for age squared with a minimum around age fifty.
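For readers who want to replicate this kind of test, here is a minimal sketch of the specification just described, estimated as a linear probability model on synthetic data; the turning point is recovered from the two age coefficients. It is an illustration of the method, not our actual estimation code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only: a binary "thriving" outcome with a
# built-in U-shape in age (minimum near 50), plus gender and year controls.
rng = np.random.default_rng(2)
n = 5000
age = rng.integers(15, 90, n)
df = pd.DataFrame({
    "age": age,
    "age2": age ** 2,
    "female": rng.integers(0, 2, n),
    "year": rng.integers(2015, 2024, n),
    "thriving": (rng.random(n) < 0.4 + 0.0002 * (age - 50) ** 2).astype(int),
})

# Linear probability model: thriving on age, age squared, gender, and year dummies.
m = smf.ols("thriving ~ age + age2 + female + C(year)", data=df).fit()

b_age, b_age2 = m.params["age"], m.params["age2"]
print("age coefficient:", round(b_age, 4), "| age^2 coefficient:", round(b_age2, 6))
print("implied turning point (years):", round(-b_age / (2 * b_age2), 1))  # U-shape minimum
```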
We found that, for the thriving variable, there are U-shapes in age since 2015 in 112 countries,24 meaning that the age term was significantly negative and the squared term significantly positive, with t-values > 1.65. In 46 countries, there was no significant U-shape relationship.25 These include nearly all of the major advanced countries we studied in the European Come-Here file, such as France, Germany, Italy, Spain, and Sweden, plus the US, UK, Canada, Australia, New Zealand, and the Netherlands, whose national agencies have reported declining ill-being in age. It is also notable that, in contrast, for the majority of the countries where the GWP shows U-shapes, we find no evidence of U-shapes in age using Global Minds, ISSP, BRFSS, and UK Understanding Society data, but rather declining ill-being and rising well-being with age throughout the lifespan. The GWP is an outlier.
13. Internet Use and Disaggregated Estimates
The GWP's suitability for disaggregated analysis is questionable, particularly for country/year/age/gender cells. Other datasets, like Global Minds, offer more comprehensive data, including a variety of mental health measures. This issue is less about the Gallup World Poll itself and more about the way researchers have misused it.
14. Global Analysis
“Blanchflower does not even attempt to test the reliability of the World Poll in the 141 or more countries with data outside the United States.”
In fact, I’ve co-authored a number of papers which compare GWP and other data across multiple countries. That is precisely what Alex Bryson and I have done in a number of earlier papers, especially here and here. A major weakness of the GWP is the very poor quality of questions used, particularly for negative affect, which are binary (1,0) variables.
The issue with questions like 'Did you experience enjoyment yesterday Y/N' is that they are not sensitive to changes in the degree of enjoyment. Almost everyone experiences some enjoyment every day, so the question doesn't capture shifts in well-being. This makes the data not unreliable but not very useful.
The problem with Yes/No variables is that they miss changes at the extremes, such as severe mental health issues. For example, in the 2022 BRFSS, 37% of those under 25 said they had no bad mental health days in the past month, but 8.9% said all 30 days were bad, up from 3.5% in the 1993-2000 period.
In the Global Minds surveys from 2020-2024 across more than 100 countries, mental health is measured using the MHQ score which is an aggregate measure of 47 elements of mental function assessed on a 1-9 scale. This measure has been validated against productivity and clinical diagnoses so it is an aggregate measure of mental function beyond just mood. A sub-dimension called Mood & Outlook captures the mood aspects specifically. Similarly, the NHIS 1997-2018 surveys use the Kessler-6 measure, which ranges from 0-24, to identify serious mental distress. The National Survey on Drug Use and Health (NSDUH), as reported by Jean Twenge and co-authors, also utilizes similar measures. The Come-Here survey uses the PHQ-9 (0-27) for depression and the GAD-7 (0-21) for anxiety. The UK Household Longitudinal Survey uses the GHQ score (0-36), with scores of 20 or higher indicating 'despair'. In 2009, 8% were classified as in despair, compared to 12% in 2021. The GWP does not capture these changes well, especially among the most vulnerable.
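As an illustration of how such threshold-based indicators work, here is a minimal sketch that codes a "despair" flag from a GHQ total of 20 or more, assuming the standard 12-item Likert scoring (each item 0-3, summed to a 0-36 total); the data are synthetic.

```python
import numpy as np

# Illustrative only: GHQ-12 Likert scoring assumed (12 items, each 0-3, summed
# to a 0-36 total); "despair" coded as a total of 20 or more, per the text.
rng = np.random.default_rng(3)
items = rng.integers(0, 4, size=(1000, 12))     # synthetic item responses

ghq_total = items.sum(axis=1)                   # 0-36 scale
despair = ghq_total >= 20

print("share classified as in despair:", despair.mean())
```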
As I said before, the Gallup World Poll is problematic for understanding global ill-being, especially since 2014. This is especially true in less developed countries. The weakness is especially apparent in documenting the worsening mental health of the young, where the GWP seems to be the outlier. Other data sources show that the U-shape in happiness and the hump shape in unhappiness have disappeared, with happiness now increasing with age. The GWP has largely missed this trend.
Proposals for Improvement
Having been an avid user of the GWP to rank countries by wellbeing, to examine the age and sex profiles of wellbeing over time, and to consider the links between wellbeing and behaviors such as internet usage (both intensity and content), I have some proposals which might, if implemented, increase the value we obtain from GWP data.
Consider raising sample sizes for GWP countries and, in the meantime, provide clear guidance to users on how GWP is designed and thus what the analyst can and can’t do when studying the role of age and location.
Revisit both the nature of the wellbeing questions in the GWP and the way that wellbeing questions are coded and validated. Over-reliance on (0,1) outcomes and over-reliance on Cantril’s ladder mean we obtain only a partial view of what’s going on in the world in terms of people’s well- and ill-being. One possibility is to include questions like the MHQ, PHQ, GHQ, and GAD, which would permit comparison with the findings from the existing variables.
Experiment with different positive affect questions, including life satisfaction and happiness on 1-10 scales, perhaps also experimenting with the time periods covered. This is likely preferable to asking about the integral of satisfaction over the life course.
Seek to align the timing of GWP surveys and, if this is not possible, issue guidance on the timing of survey fieldwork to researchers so they can incorporate this information into their analyses.
Introduce dedicated questions to address new challenges such as those relating to internet usage.
Conclusion
The trend of declining well-being among the young seems to have started around 2015, and hence was not caused by Covid, though it was likely exacerbated by it.
All of the datasets researchers have used around the world to document well-being in the post-COVID world have strengths and weaknesses. Some have data available prior to Covid, but many, such as Global Minds and the US Household Pulse Survey, do not.
My research strategy over the years has been to look at multiple data sources. When estimates are contradictory, researchers need to collaborate to understand why. In this instance, the GWP seems to be an outlier, and we need to understand the reason for that.
I am happy to engage in a constructive dialogue about appropriate questions to use, the form they should take (not 1,0 binary yes/no questions), what weighting schemes to use to make them representative (which I am currently working on), and what level of disaggregation is appropriate. We also need to understand the extent to which findings on positive affect and happiness are consistent with those on negative affect, including unhappiness.
There are more serious issues at stake than a disagreement over one Gallup poll. The mental health of adolescents is a grave concern, with youth depression and suicide rates having more than doubled. Addressing these critical issues requires accurate and reliable data, and improving our understanding and measurement of well-being is a crucial step toward better mental health outcomes for our youth in particular.
My intention is to help improve the GWP, not to tear it down. We are all in this together.
Short Follow-up Response from Jonathan Rothwell and Rajesh Srinivasan
We are glad Danny Blanchflower reaffirms what many scholars and organizations around the world believe; as he puts it: “The Gallup World Poll is an invaluable data source for research.” We welcome suggestions from scholars about how to best measure wellbeing, illbeing, mental health and other important constructs. Indeed, Gallup is pioneering novel efforts to measure wellbeing using in-depth measures through the Global Flourishing Study, which is a five-year longitudinal data collection effort, involving scholars at Baylor University, Harvard University, the Center for Open Science, and a consortium of funders who prioritize the study of wellbeing. The initial sample comprises about 9,000 respondents per country across 22 diverse countries. Projects like this inform what is collected on the World Poll, and with additional funding, the World Poll could expand its sample size to the same level. It cannot be assumed that trends in English-speaking high-income countries align with trends in low-income countries or anywhere else. The World Poll gives voice to people who are otherwise ignored in social science, such as those living in Afghanistan, Somalia, and Honduras. Gallup’s local partners deploy best-practice scientific methods to speak with a representative sample from these and many other places and record their feelings and opinions in their native language, regardless of their level of education or access to telecommunications. Every day, we look for ways to build on this work and help scholars and organizations around the world make new discoveries.
Follow Up Response from Danny Blanchflower
Sign me up!
Blanchflower, David G., and Alex Bryson. "Wellbeing rankings." Social Indicators Research 171, no. 2 (2024): 513-565; Blanchflower, David G. "Is happiness U-shaped everywhere? Age and subjective well-being in 145 countries." Journal of population economics 34, no. 2 (2021): 575-624; Blanchflower DG, Graham C (2020b) Subjective well-being around the world: trends and predictors across the life span: a response, working paper. http://www.dartmouth.edu/~blnchflr/papers/dgbcg%206%20March%20commentary.pdf
This result is from the Stata V18 power command: “power twomeans .67 .51, sd(.48).” 0.48 is approximately the standard deviation of the thriving indicator in the United States among 15-24-year-olds over the entire period (2005-2023). For other analyses, see sites like https://clincalc.com/stats/samplesize.aspx
https://fivethirtyeight.com/features/the-death-of-polling-is-greatly-exaggerated/
Blanchflower DG, Oswald AJ (2008a) Is well-being U-shaped over the life cycle? Soc Sci Med 66:1733–1749; Blanchflower, David G. "Is happiness U-shaped everywhere? Age and subjective well-being in 145 countries." Journal of population economics 34, no. 2 (2021): 575-624.
The reported correlation above is unweighted. It is slightly higher using the WVS sample size as a weight (r=0.60). The World Values Survey item reads: “All things considered, how satisfied are you with your life as a whole these days?” Respondents choose on a 1-10 scale. The Gallup item is on 0 to 10 scale, with 10 being the best life you can imagine for yourself and 0 the worst. See here.
See Dan Witters and Kayley Bayne, "New Normal: Lower U.S. Life Ratings" Jan 18, 2024.
Gallup routinely measures wellbeing using both current and expected wellbeing (in 5 years) using the Cantril ladder. Respondents who score a 7 or higher on current wellbeing and an 8 or higher on expected wellbeing are coded as “thriving.” See here.
Megan Brenan, "U.S. Adults Report Less Worry, More Happiness," May 18, 2020.
See Dan Witters and Kayley Bayne, "New Normal: Lower U.S. Life Ratings" Jan 18, 2024.
Rothwell, J.T., Cojocaru, A., Srinivasan, R. and Kim, Y.S., 2024. Global evidence on the economic effects of disease suppression during COVID-19. Humanities and Social Sciences Communications, 11(1), pp.1-14
Here is the list of items, all of which are clearly documented in every download package of World Poll: Did you feel well-rested yesterday? (WP60); Were you treated with respect all day yesterday? (WP61); Did you smile or laugh a lot yesterday? (WP63); Did you learn or do something interesting yesterday? (WP65); Did you experience the following feelings during a lot of the day yesterday? How about enjoyment? (WP67); How about physical pain? (WP68); How about worry? (WP69); How about sadness? (WP70); How about stress? (WP71); How about anger? (WP74)
Tom Loveless, "Lessons from the PISA-Shanghai Controversy."
Jonathan Rothwell "How Parenting and Self-control Mediate the Link Between Social Media Use and Youth Mental Health," Institute for Family Studies. Jonathan Rothwell "Parenting Mitigates Social Media-Linked Mental Health Issues," October 7, 2023. Jonathan Rothwell, "Teens Spend Average of 4.8 Hours on Social Media Per Day," October 13, 2023.
Wellbeing Research Center, Oxford University. World Happiness Report, https://worldhappiness.report/ed/2024/#appendices-and-data, accessed June 18, 2024.
https://sapienlabs.org/a-comparison-of-measures-and-methodologies-of-the-global-mind-project-world-mental-health-survey-initiative-world-happiness-report/
The U.S. Census Bureau estimates bachelor’s or higher attainment as 37.9% in 2021 for the population aged 25 and older.
These are calculated using the Excel data provided by Global Minds for their 2023 report.
Our World in Data, https://ourworldindata.org/grapher/share-of-the-population-with-completed-tertiary-education?time=2020
Blanchflower, David G., Alex Bryson, and Xiaowei Xu. The Declining Mental Health Of The Young And The Global Disappearance Of The Hump Shape In Age In Unhappiness. No. w32337. National Bureau of Economic Research, 2024.
https://sapienlabs.org/wp-content/uploads/2024/03/4th-Annual-Mental-State-of-the-World-Report.pdf
Plus, these two shocks were the biggest negative shocks in a generation, and it is hard to believe they didn’t lower wellbeing. We know, for example, that a rise in the unemployment or inflation rates has a major negative impact on wellbeing – see Blanchflower DG, DNF Bell, A Montagnoli, and M Moro, ‘The happiness tradeoff between unemployment and inflation’, Journal of Money Credit and Banking, Supplement to vol 46(2), October, 2014, pp. 117-141
Vuorre and Przybylski only used single measures of wellbeing. See https://tmb.apaopen.org/pub/a2exdqgg/release/1
The 112 countries that we found U-shapes for are Afghanistan; Albania; Algeria; Argentina; Armenia; Azerbaijan; Bahrain; Bangladesh; Belarus; Benin; Bolivia; Bosnia Herz; Botswana; Brazil; Bulgaria; Burkina Faso; Cambodia; Cameroon; CAR; Chad; Chile; China; Colombia; Comoros; Congo Brazzaville; Costa Rica; Croatia; Cyprus; Dominica; Ecuador; Egypt; El Salvador; Estonia; Eswatini; Ethiopia; Gabon; Georgia; Greece; Guatemala; Haiti; Honduras; Hungary; India; Indonesia; Iran; Iraq; Ivory Coast; Jamaica; Japan; Jordan; Kazakhstan; Kenya; Kosovo; Kuwait; Kyrgyzstan; Laos; Lebanon; Lesotho; Liberia; Libya; Madagascar; Malawi; Malaysia; Mali; Malta; Mauritius; Moldova; Mongolia; Montenegro; Morocco; Mozambique; Myanmar; Namibia; Nepal; Nicaragua; Nigeria; North Macedonia; Pakistan; Palestine; Panama; Paraguay; Peru; Philippines; Portugal; Puerto Rico; Romania; Russia; Rwanda; Senegal; Serbia; Sierra Leone; Slovenia; Somalia; Sri Lanka; Taiwan; Tajikistan; Tanzania; Thailand; The Gambia; Togo; Tunisia; Turkey; UAE; Uganda; Ukraine; Uruguay; Uzbekistan; Venezuela; Vietnam; Yemen; Zambia and Zimbabwe.
The 46 countries that we did not find u-shapes for are Australia; Austria; Belgium; Bhutan; Burundi; Canada; Congo Kinshasa; Czechia; Denmark; Finland; France; Germany; Ghana; Guinea; HK; Iceland; Ireland; Israel; Italy; Latvia; Lithuania; Luxembourg; Maldives; Mauritania; Mexico; Netherlands; New Zealand; Niger; Northern Cyprus; Norway; Poland; Qatar; Saudi; Singapore; Slovakia; South Africa; South Korea; South Sudan; Spain; Sweden; Switzerland; Syria; Trinidad; Turkmenistan; UK and USA