NBER WORKING PAPER SERIES
WHY IS ALL COVID-19 NEWS BAD NEWS?
Bruce Sacerdote
Ranjan Sehgal
Molly Cook
Working Paper 28110
http://www.nber.org/papers/w28110
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
November 2020
We thank Max Grozovsky and Nashe Mutenda for superb research assistance. The views
expressed herein are those of the authors and do not necessarily reflect the views of the National
Bureau of Economic Research.
NBER working papers are circulated for discussion and comment purposes. They have not been
peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies
official NBER publications.
© 2020 by Bruce Sacerdote, Ranjan Sehgal, and Molly Cook. All rights reserved. Short sections
of text, not to exceed two paragraphs, may be quoted without explicit permission provided that
full credit, including © notice, is given to the source.
Why Is All COVID-19 News Bad News?
Bruce Sacerdote, Ranjan Sehgal, and Molly Cook
NBER Working Paper No. 28110
November 2020
JEL No. I12,J01
ABSTRACT
We analyze the tone of COVID-19 related English-language news articles written since January
1, 2020. Ninety-one percent of stories by U.S. major media outlets are negative in tone versus
fifty-four percent for non-U.S. major sources and sixty-five percent for scientific journals. The
negativity of the U.S. major media is notable even in areas with positive scientific developments
including school re-openings and vaccine trials. Media negativity is unresponsive to changing
trends in new COVID-19 cases or the political leanings of the audience. U.S. major media readers
strongly prefer negative stories about COVID-19, and negative stories in general. Stories of
increasing COVID-19 cases outnumber stories of decreasing cases by a factor of 5.5 even during
periods when new cases are declining. Among U.S. major media outlets, stories discussing
President Donald Trump and hydroxychloroquine are more numerous than all stories combined
that cover companies and individual researchers working on COVID-19 vaccines.
Bruce Sacerdote
6106 Rockefeller Hall
Department of Economics
Dartmouth College
Hanover, NH 03755-3514
and NBER
Ranjan Sehgal
6106 Rockefeller
Dartmouth College
Hanover, NH 03755
Molly Cook
Brown University
69 Brown Street
Providence, RI 02912
Introduction
On February 18th, the Oxford Mail published a story that Professor Sarah Gilbert and her
colleagues at Oxford’s Jenner Institute were working on a vaccine for the novel coronavirus and
that rapid vaccine development could be possible given the scientists’ existing work and
experience with a possible MERS vaccine.[1] In contrast to Oxford Mail’s reporting, the U.S. major
media outlets of Fox News, CNN, The New York Times, and The Washington Post did not begin
coverage of Professor Gilbert’s COVID-19 related work until late April.[2] The U.S. based stories
emphasized caveats from health officials and experts downplaying the optimistic timeline and past
success of the Oxford researchers. The earliest available (major outlet) U.S. story is from CNN
on April 23rd and begins with a quote from England's Chief Medical Officer Chris Whitty saying
that the probability of having a vaccine or treatment "anytime in the next calendar year" is
"incredibly small."
There is a similar disconnect between U.S. major media reporting on school reopenings and
scientific findings on the same topic; the reporting is overwhelmingly negative, while the scientific
literature tells a more optimistic story. Oster (2020) collects data on school reopenings and
COVID-19 infections within schools and districts.[3] She finds that infection rates among students
remain low (at 0.14 percent) and schools have not become the super-spreaders many feared.[4]
Guthrie et al (2020) and Viner et al (2020) review the available evidence and reach similar
conclusions. However, ninety percent of school reopening articles from U.S. mainstream media
are negative versus only 56 percent for the English-language major media in other countries.
[1] https://www.oxfordmail.co.uk/news/18243665.scientists-working-coronavirus-vaccine-oxford/
[2] We base this statement on a LexisNexis search for the terms “Sarah Gilbert” or “Sarah Gilbert and vaccine” since
January 1, 2020.
[3] https://statsiq.co1.qualtrics.com/public-dashboard/v0/dashboard/5f62eaee4451ae001535c839#/dashboard/5f62eaee4451ae001535c839?pageId=Page_1ac6a6bc-92b6-423e-9f7a-259a18648318
[4] https://www.theatlantic.com/ideas/archive/2020/10/schools-arent-superspreaders/616669/
The tone of media coverage impacts both human health and attitudes towards preventative
measures including vaccination, mask wearing, and social distancing (Bursztyn et al 2020, Van
Bavel, Baicker et al 2020, Simonov et al 2020, Kearney and Levine 2015, Ash et al 2020).[5]
The proportion of U.S. adults who exhibit depression symptoms has risen threefold since the start
of the novel coronavirus pandemic (Ettman et al 2020, Fetzer et al 2020). In discussing this increase
in mental health problems, the U.S. Centers for Disease Control and Prevention recommends against
heavy consumption of news stories about the pandemic.[6]
Our results suggest the CDC’s warning is prescient. We categorize by topic over 9.4 million
published news stories on COVID-19 since January 1, 2020. We then conduct several forms of
textual analysis on roughly 20,000 COVID-19 news stories to examine levels of negativity by
subtopic, source of the news, and time period. We have five major findings. First, COVID-19
stories published by the top 15 U.S. media outlets (by readership/viewership) are 25 percentage
points more likely to be negative in content than more general U.S. sources or major media outlets
outside the U.S.[7] Second, the time pattern in observed negativity is at most weakly related to the
actual time trend in new weekly cases of COVID-19 in the U.S. Third, the most popular stories in
The New York Times have high levels of negativity, particularly for COVID-19-related articles.[8]
Fourth, negativity appears to be unrelated to the political leanings of the newspaper’s or network’s
audience (Niven 2001). Finally, U.S. major media stories that discuss the benefits of social
distancing or alternatively the benefits of mask wearing are less numerous than stories about
President Trump not wearing a mask. Similarly, the terms “Trump and hydroxychloroquine”
receive more coverage than do all stories about companies and researchers developing vaccines.
[5] Banerjee et al (2020) find that text messaging can significantly increase reporting of COVID symptoms and use of
social distancing and other health promoting measures. Nyhan et al (2014) find that it’s difficult to correct
misperceptions around vaccine safety.
[6] https://www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/managing-stress-anxiety.html
[7] This regression-based estimate controls flexibly for article length and week of publication. The unadjusted
probability of an article being negative is 91 percent for US major media versus 54 percent for English-language
non-US major media.
Overall, we find that relative to other media sources, the most influential U.S. news sources are
outliers in terms of the negative tone of their coronavirus stories and their choices of stories
covered. We are unable to explain these patterns using differential political views of their
audiences or time patterns in infection rates. This is analogous to Niven (2001), which finds a
strong negative bias in the U.S. media when covering unemployment and limited evidence of
partisanship. U.S. major outlets do demonstrate an above-average interest in promoting prosocial
behavior like mask wearing and social distancing. Consistent with the existing literature
(Gentzkow and Shapiro 2010 and Gentzkow, Glaeser and Goldin 2006), our results suggest that
U.S. major outlets publish unusually negative COVID-19 stories in response to reader demand and
interest.
Data Description
We obtain counts of COVID-19 articles and separately the text of COVID-19 articles using the
LexisNexis database. We use all English news sources and a date range of January 1, 2020 to July
31, 2020. We divide our universe of sources into the top (most widely read or watched) sources
and all other sources. We further stratify by U.S. versus non-U.S. sources. The top non-TV
sources for the U.S. that are also included in LexisNexis are Newsweek, the New York Post, Los
Angeles Times, USA Today, Politico, The Hill, and the New York Times. For the top television
sources we include both written articles and television transcripts from ABC, CBS, CNN, Fox
News, MSNBC and NBC. Further details for our data downloading procedure and the search
terms used are contained in Appendix 1.
[8] This is consistent with the findings of Gentzkow and Shapiro (2010), who find that media respond strongly to
consumer preferences. Eshbaugh-Soha (2010) finds that the negativity of media coverage of the President responds to
local support for the President.
We also gather the text of articles discussing COVID-19 vaccines from five widely read scientific
and medical journals namely Science, JAMA, The New England Journal of Medicine, The Lancet,
and Nature. We gather the New York Times most popular articles from their website from
September 4 to October 6th, 2020. We rely on the New York Times most read articles in our current
investigation, but future versions of this paper will also incorporate “most emailed” articles, outlets
beyond the New York Times, and a larger date range.
We analyze the text of 20,000 articles that fall within three subtopics regarding COVID-19:
vaccines, increases and decreases in case counts, and reopenings (of businesses, schools, parks,
restaurants, government facilities, etc). We limit ourselves to roughly 20,000 articles given the
legal requirement to “manually” download the articles from LexisNexis 100 articles at a time.[9]
We classify all articles using two different but related methods. First, we measure the fraction of
words that are negative according to established dictionaries of negative words. See Liu 2012,
Tetlock 2007, Loughran and McDonald 2011 for canonical examples of this approach.[10] The
results reported here use the Hu-Liu (2004) dictionary of positive and negative words.[11] We
compute the fraction of total words that are negative according to the dictionary and standardize
this variable to be mean 0 variance 1.[12]
[9] LexisNexis does not permit automated downloading of the text of stories. We manually downloaded articles in
batches of 100 articles.
[10] Riffe, Lacy, Fico and Watson (2019) is an in-depth presentation of these methods. Grimmer and Stewart (2013)
review the value of text analysis for summarizing political documents and transcripts.
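The dictionary-based measure described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the code used for the reported results: the word list `NEGATIVE_WORDS` is a tiny hypothetical stand-in for the Hu-Liu dictionary, the tokenization rule is an assumption, and the three example articles are invented.

```python
import re
import statistics

# Hypothetical stand-in for a sentiment lexicon such as Hu-Liu (2004);
# the real dictionary contains thousands of entries.
NEGATIVE_WORDS = {"death", "crisis", "fear", "risk", "worse", "fail"}

def negative_share(text, negative_words=NEGATIVE_WORDS):
    """Fraction of word tokens in `text` that appear in the negative-word list."""
    tokens = re.findall(r"[a-z']+", text.lower())  # assumed tokenization rule
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in negative_words)
    return hits / len(tokens)

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population formula)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumes the values are not all identical
    return [(v - mean) / sd for v in values]

# Invented example articles.
articles = [
    "Officials fear the death toll will get worse",
    "Early vaccine trial shows promising results",
    "Schools reopen with low infection rates",
]
shares = [negative_share(a) for a in articles]
z_scores = standardize(shares)
```

In this toy example the first article has three negative tokens out of eight, so its raw share is 0.375, and after standardization it lies above the sample mean while the other two lie below it.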
Second, we create a predicted probability that an article has a negative tone. We identify
characteristics of negative and positive media reports in a set of 200 articles classified as strongly
positive or negative by human readers. We use the two- and three-word phrases appearing in the
training articles combined with machine learning techniques to find the phrases that best predict
whether the human reader will classify an article as strongly negative. We implement a Naïve
Bayes classification scheme (Zhang 2004, Pazzani 1996, Antweiler and Frank 2004).[13] Naïve
Bayes assumes that each phrase in the article contributes independently to the probability that the
article is negative and maximizes the number of correct predictions given the phrases.
We use the resulting model to predict whether each of the 20,000 articles in our sample is
negative. For example, the phrases “clinical trial” and “Jenner Institute” are strong
predictors of an article being positive while “White House” and “death toll” are strong predictors
of a negative article.
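To make the classification step concrete, the following Python sketch fits a multinomial Naïve Bayes model on two- and three-word phrases. It is a simplified illustration, not the WordStat procedure used for the results: the four training texts are invented stand-ins for the human-coded articles, and the add-one smoothing and the decision to ignore phrases unseen in training are assumptions of the sketch.

```python
import math
import re
from collections import Counter

def phrases(text):
    """Return the two- and three-word phrases (n-grams) found in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in (2, 3):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

def train(labeled_articles):
    """Count phrases per class; labels are 'neg' or 'pos'."""
    counts = {"neg": Counter(), "pos": Counter()}
    docs = Counter()
    for text, label in labeled_articles:
        counts[label].update(phrases(text))
        docs[label] += 1
    vocab = set(counts["neg"]) | set(counts["pos"])
    return counts, docs, vocab

def prob_negative(text, model):
    """Posterior probability that `text` is negative, with add-one smoothing."""
    counts, docs, vocab = model
    log_score = {}
    for label in ("neg", "pos"):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))  # class prior
        for gram in phrases(text):
            if gram in vocab:  # ignore phrases never seen in training
                score += math.log((counts[label][gram] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Convert the two log scores into a probability in a numerically stable way.
    top = max(log_score.values())
    weight = {k: math.exp(v - top) for k, v in log_score.items()}
    return weight["neg"] / (weight["neg"] + weight["pos"])

# Tiny invented training set standing in for the human-coded articles.
training = [
    ("the death toll keeps rising as the white house downplays the virus", "neg"),
    ("hospitals overwhelmed as the death toll climbs again", "neg"),
    ("jenner institute reports an encouraging clinical trial result", "pos"),
    ("the clinical trial shows the vaccine candidate is safe", "pos"),
]
model = train(training)
p_neg = prob_negative("the death toll rose again", model)
p_pos = prob_negative("a new clinical trial began", model)
```

Here `p_neg` exceeds one half because “death toll” appears only in the negative training texts, while `p_pos` falls below one half because “clinical trial” appears only in the positive ones. The independence assumption shows up in the code as a simple sum of per-phrase log probabilities.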
Table 1 reports summary statistics at the article level for our main sample, which excludes New
York Times most popular articles and a comparison sample of non-COVID articles. We analyze
roughly 23,500 articles from January 1, 2020 to July 31, 2020. On average the articles have 1,652
words. This count and our subsequent statistics are measured after we apply a truncation procedure
to limit the text to be within 10 lines of the words “COVID” or “coronavirus”. We applied this
truncation to deal with very long television transcripts that switched to non-COVID topics in the
middle of the transcript. However, results are quite similar with or without truncation.
[11] We have conducted the same analysis using the Harvard General Inquirer dictionary of positive and negative
words and obtain qualitatively similar results. http://www.wjh.harvard.edu/~inquirer/
[12] To keep the yardstick consistent, we standardize once within our broad sample which includes New York Times
most popular articles and a sample of non-Covid articles. Our main analysis sample excludes these two categories.
We standardize before dropping duplicates of articles which were published multiple times.
[13] To extract phrases and implement the Naïve Bayes classification scheme we use WordStat software created by
Provalis Research.
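The truncation rule can be sketched in Python as follows. The sketch rests on assumptions the text does not spell out: that a "line" is a newline-delimited line of the downloaded file, that the window is symmetric (10 lines before and after each keyword line), and that matching is case-insensitive. The function name `truncate_near_keywords` and the example transcript are hypothetical.

```python
import re

KEYWORDS = re.compile(r"covid|coronavirus", re.IGNORECASE)

def truncate_near_keywords(text, window=10):
    """Keep only lines within `window` lines of a line mentioning COVID or coronavirus."""
    lines = text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if KEYWORDS.search(line):
            start = max(0, i - window)
            stop = min(len(lines), i + window + 1)
            keep.update(range(start, stop))
    return "\n".join(lines[i] for i in sorted(keep))

# Invented transcript: one COVID line followed by 30 unrelated lines.
transcript = "\n".join(
    ["COVID cases rose today"] + [f"sports segment line {n}" for n in range(30)]
)
trimmed = truncate_near_keywords(transcript)
```

In this example the trimmed text keeps the keyword line and the ten lines that follow it, and drops the remaining twenty lines of unrelated material.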
The share of negative words (using the Hu-Liu dictionary) is 4 percent. As mentioned above we
standardize this variable to aid in interpreting the coefficients. By construction, our articles are
divided roughly equally between articles on increases/decreases in cases, reopenings, and
vaccines. The division among US major media, US General media, International Major Media,
and International General media is also roughly equal.[14]
Results
Figure 1 plots the time trend in media negativity for major media outlets in the U.S. (green line)
and outside the U.S. (blue line) using the scale on the left. The most striking fact is that 91 percent
of the U.S. stories are classified as negative whereas 54 percent of the non-U.S. stories are
classified as negative. Figure 1 uses our estimated probability that an article is negative. We
obtain similar results using the Hu-Liu dictionary and the fraction of words in the article that are
negative.
The red line plots the weekly average of daily new cases of COVID-19 in the U.S. using the scale
on the right. The x-axis is the week of the year within 2020. New cases per day rise sharply from
March through mid-April. Cases decline until about June 15th, then rise rapidly until late July,
when cases begin to decline again. Average media negativity over time is not correlated with new
case counts, as regression results confirm (not reported).
[14] We don’t have exactly 25% of articles in each major category because our initially drawn sample included many
articles that were repeats, which we then eliminated to arrive at this final sample. Our reopenings analysis is for all
reopenings articles. We also examine school reopenings specifically, and for these articles “school” must appear in
the title of an article that also contains “reopen” or “re-open”.
In Table 2 we regress our estimated probability that an article is negative on indicator variables for
whether the source is from the U.S. major media, U.S. general sources, or international general
sources. The omitted category is international (non-U.S.) major media sources. In the regressions
we control flexibly for the length of the article and the week the article was published. We run a
linear probability model, though results from probit and logit models are similar to those reported
here. The non-U.S. major media sources have a baseline rate of negativity of 54 percent. In
column (1) we show that relative to this omitted category, articles in the U.S. major media are 25
percentage points more likely to be negative. In contrast, U.S. general and non-U.S. general
sources have about the same level of negativity as non-U.S. major media.
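The structure of this regression can be illustrated with a minimal Python sketch of a linear probability model with source-category indicators. It is not the estimation code behind Table 2: the sixteen observations are invented, the controls for article length and week of publication are omitted, and the small `ols` helper (normal equations solved by Gauss-Jordan elimination) is written here only for illustration. Without controls, each coefficient is simply the difference between a category's share of negative articles and the share in the omitted category.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Invented article-level data: (source category, 1 if classified negative else 0).
articles = ([("intl_major", v) for v in (1, 0, 1, 0)]
            + [("us_major", v) for v in (1, 1, 1, 0)]
            + [("us_general", v) for v in (1, 0, 0, 0)]
            + [("intl_general", v) for v in (1, 1, 0, 0)])

# Indicator variables; international major media is the omitted category.
categories = ["us_major", "us_general", "intl_general"]
X = [[1.0] + [1.0 if cat == c else 0.0 for c in categories] for cat, _ in articles]
y = [float(neg) for _, neg in articles]

intercept, b_us_major, b_us_general, b_intl_general = ols(X, y)
```

In the invented data the intercept equals the omitted category's negativity rate (0.5) and the coefficient on `us_major` equals 0.25, meaning a 25 percentage point higher probability of a negative article relative to the omitted category.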
In column (2) we switch the dependent variable to the share of negative words in the article. We
standardize the outcome to be mean 0 standard deviation 1. The U.S. major media publish stories
that are .23 standard deviations more negative relative to non-U.S. major media. U.S. general
media outlets are significantly less negative than all other categories of sources. In columns (3)-
(5) we examine media negativity by subtopic within COVID-19. Relative to both types of
international media, U.S. major media vaccine articles are particularly negative. Vaccine stories
in the U.S. major media are 45 percentage points more likely to be negative relative to stories in
the non-U.S. major media (the omitted category in Table 2).
In Figure 2 we present the mean share of negative words (standardized) by source and topic
(COVID-19 versus not). For Figure 2 only, we add a large sample of non-COVID articles and the
New York Times most popular articles. Starting with the bars at the bottom of the chart, we see
that in a sample of non-COVID-19 stories (pre-January 2020), the U.S. major media are only
modestly more negative than the rest of the sample. In covering COVID-19 (the second bar from
the bottom), U.S. major media negativity is .31 standard deviation above the average while the
non-U.S. major media are .17 standard deviation below average. Notably, scientific media articles
on COVID-19 vaccines are a full standard deviation below average in negativity. In contrast, the
New York Times’ most popular articles are .6 standard deviations above the sample mean in
negativity for non-COVID-19 stories and 1.5 standard deviations above the mean when covering
COVID-19 topics. Readers of the U.S. major media (as represented by the New York Times) are
attracted to negative stories in general and negative stories about COVID-19 in particular.
The next two figures look specifically at the share of words that are negative within vaccine articles
(Figure 3) and within school reopening articles (Figure 4). We standardize across the entire sample
(all topics) and hence are comparing the negativity in the vaccine articles to the overall sample
mean. For vaccine articles, all media categories are meaningfully below the overall sample mean
for negativity, except for the U.S. major media which produces articles on COVID-19 vaccines
that are .35 standard deviations higher on negativity.
These data were gathered prior to Pfizer’s positive stage three trial result announced on November
9th. Our results show that on a relative basis, U.S. major media gave much less positive coverage
to the developments that led up to Pfizer’s breakthrough. We hypothesize (but have not yet
tested) that U.S. major media coverage of vaccines remained more negative than other categories
of media during and after the Pfizer announcement.
For school re-opening articles (Figure 4), the U.S. major media is .18 standard deviations more
negative than the overall sample mean. All other media categories are less negative than the
sample mean. The U.S. general media vaccine articles are .4 standard deviations less negative.
A natural question is whether media negativity varies greatly by the specific news source and
whether that variation is related to the political beliefs of the readership. Our results are perhaps
surprising. COVID-19 stories from all the major U.S. outlets have high levels of negativity and
the variation that does exist is not correlated with readers’ political leanings. See Figure 5. We
plot the share of negative words (standardized) by U.S. media source versus the probability that
conservative-leaning people say that this is a “trusted media source.” The latter comes from a
2019 Pew survey of 12,000 people about their consumption of election news.[15]
The estimated probability that a COVID-19 article is negative varies from 70 percent to 100
percent among major U.S. outlets. These probabilities are not correlated with the likelihood that
conservative consumers of news trust the source. COVID-19 stories from Fox News are about as
negative as those from CNN. We obtain similar results using our estimated probability that a story
is negative.
We now take a broader look at which COVID-19 topics the media choose to emphasize. Table 3
provides an overview of the number of COVID-19-related articles during our sample period
(January-July 2020) and counts of articles by topic, where one article can cover multiple topics.
Overall, we found 2.6 million articles from U.S.-based sources and 6.8 million from non-U.S.
sources. The rows represent different search terms we included while the columns represent four
broad categories of sources, namely U.S. versus non-U.S. interacted with major media outlet
versus general media. We are most interested in the relative coverage of different topics. For
example, among the U.S. major media (column 2) 15,000 stories mention increases in caseloads
while only 2,500 mention decreases, or a 6 to 1 ratio. Even when caseloads were falling nationally
(April 24th to June 27th), this ratio remains relatively high at 5.3 to 1.
[15] https://www.pewresearch.org/fact-tank/2020/01/24/qa-how-pew-research-center-evaluated-americans-trust-in-30-news-sources/
In row 3, we show results for mentions of COVID-19 vaccines and any names of the top ten
institutions or companies working on a COVID-19 vaccine. The U.S. major outlets ran 1,371 such
stories. During the same period they ran 8,756 stories involving Trump and mask wearing and
1,636 stories about Trump and hydroxychloroquine.
A natural question is whether the media is promoting prosocial behaviors (Simonov et al 2020 and
Bursztyn et al 2020). While we cannot answer whether the U.S. media are “doing enough” to
promote transmission-reducing behavior in absolute terms, we can compare how emphasis of the
benefits of mask wearing or social distancing varies across media categories. Five percent of
COVID-19 articles in U.S. major outlets mention the benefits of mask wearing compared to .6
percent for non-U.S. outlets and 2 percent for general U.S. sources. U.S. major media outlets are
also much more likely to discuss the benefits of social distancing (4 percent of stories) than their
non-U.S. counterparts (1 percent of stories). This suggests the U.S. media are outperforming the
non-U.S. media in promoting prosocial behavior, though perhaps because such messages are more
needed in the U.S.[16]
Overall, we find that COVID-19 stories from U.S. major media outlets are much more negative
than similar stories from other U.S. outlets and from non-U.S. sources. The negativity does not
respond to changes in new cases. Potentially positive developments such as vaccine stories receive
less attention from U.S. outlets than do negative stories about Trump and hydroxychloroquine.
Overall, we are unable to explain the variation in negativity with political affiliation of an outlet’s
audience, or U.S. case count changes, but we do find that U.S. readers demand negative stories (as
evidenced by article popularity). We conclude that the CDC’s implicit “warning label” against
consuming too much U.S. COVID-19 media may be warranted.
[16] See DellaVigna and La Ferrara (2015) for a summary which discusses more generally the impact of media
consumption on human behavior.
References
Antweiler, Werner, and Murray Z. Frank. "Is all that talk just noise? The information content of
internet stock message boards." The Journal of finance 59, no. 3 (2004): 1259-1294.
Ash, Elliott, Sergio Galletta, Dominik Hangartner, Yotam Margalit, and Matteo Pinna. "The
Effect of Fox News on Health Behavior During COVID-19." Available at SSRN
3636762 (2020).
Banerjee, Abhijit, Marcella Alsan, Emily Breza, Arun G. Chandrasekhar, Abhijit Chowdhury,
Esther Duflo, Paul Goldsmith-Pinkham, and Benjamin A. Olken. Messages on covid-19
prevention in india increased symptoms reporting and adherence to preventive behaviors
among 25 million recipients with similar effects on non-recipient members of their
communities. No. w27496. National Bureau of Economic Research, 2020.
Bursztyn, Leonardo, Aakaash Rao, Christopher Roth, and David Yanagizawa-Drott.
"Misinformation during a pandemic." University of Chicago, Becker Friedman Institute
for Economics Working Paper 2020-44 (2020).
Eshbaugh-Soha, Matthew. "The tone of local presidential news coverage." Political
Communication 27, no. 2 (2010): 121-140.
Ettman, Catherine K., Salma M. Abdalla, Gregory H. Cohen, Laura Sampson, Patrick M. Vivier,
and Sandro Galea. "Prevalence of depression symptoms in US adults before and during
the COVID-19 pandemic." JAMA network open 3, no. 9 (2020): e2019686-e2019686.
Fetzer, Thiemo, Lukas Hensel, Johannes Hermle, and Christopher Roth. "Coronavirus
perceptions and economic anxiety." Review of Economics and Statistics (2020): 1-36.
DellaVigna, Stefano, and Eliana La Ferrara. "Economic and social impacts of the media."
In Handbook of media economics, vol. 1, pp. 723-768. North-Holland, 2015.
Gentzkow, Matthew, and Jesse M. Shapiro. "What drives media slant? Evidence from US daily
newspapers." Econometrica 78, no. 1 (2010): 35-71.
Gentzkow, Matthew, Edward L. Glaeser, and Claudia Goldin. "The rise of the fourth estate. How
newspapers became informative and why it mattered." In Corruption and reform:
Lessons from America's economic history, pp. 187-230. University of Chicago Press,
2006.
Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. "Text as data." Journal of Economic
Literature 57, no. 3 (2019): 535-74.
Groseclose, Tim, and Jeffrey Milyo. "A measure of media bias." The Quarterly Journal of
Economics 120, no. 4 (2005): 1191-1237.
Gurun, Umit G., and Alexander W. Butler. "Don't believe the hype: Local media slant, local
advertising, and firm value." The Journal of Finance 67, no. 2 (2012): 561-598.
Guthrie, Brandon L., Jessie Seiler, Lorenzo Tolentino, Wenwen Jiang, Molly Fischer, Rodal
Issema, Sherrilynne Fuller, Dylan Green, Diana M. Tordoff, Julianne Meisner, Ashley
Tseng, Diana Louden, Jennifer M. Ross, and Alison L. Drake. "Summary of Evidence Related
to Schools During the COVID-19 Pandemic." Report from COVID-19 Literature Report
Team, Washington State Department of Public Health. October 2020.
Grimmer, Justin, and Brandon M. Stewart. "Text as data: The promise and pitfalls of automatic
content analysis methods for political texts." Political analysis 21, no. 3 (2013): 267-297.
Hu, Minqing, and Bing Liu. "Mining and Summarizing Customer Reviews." Proceedings of the
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
Kearney, Melissa S., and Phillip B. Levine. "Media influences on social outcomes: The impact of
MTV's 16 and pregnant on teen childbearing." American Economic Review 105, no. 12
(2015): 3597-3632.
Larcinese, Valentino, Riccardo Puglisi, and James M. Snyder Jr. "Partisan bias in economic
news: Evidence on the agenda-setting behavior of US newspapers." Journal of public
Economics 95, no. 9-10 (2011): 1178-1189.
Liu, Bing. "Sentiment analysis and opinion mining." Synthesis lectures on human language
technologies 5.1 (2012): 1-167.
Loughran, Tim, and Bill McDonald. "When is a liability not a liability? Textual analysis,
dictionaries, and 10‐Ks." The Journal of Finance 66, no. 1 (2011): 35-65.
Martin, Gregory J., and Ali Yurukoglu. "Bias in cable news: Persuasion and
polarization." American Economic Review 107, no. 9 (2017): 2565-99.
Niven, David. "Bias in the news: Partisanship and negativity in media coverage of presidents
George Bush and Bill Clinton." Harvard International Journal of Press/Politics 6.3
(2001): 31-46.
Nyhan, Brendan, Jason Reifler, Sean Richey, and Gary L. Freed. "Effective messages in vaccine
promotion: a randomized trial." Pediatrics 133, no. 4 (2014): e835-e842.
Oster, Emily, “Schools Aren’t Super-Spreaders”, The Atlantic, October 9, 2020.
Pazzani, M. J. 1996. Search for dependencies in Bayesian classifiers. In Fisher, D., and Lenz, H.
J., eds., Learning from Data: Artificial Intelligence and Statistics V. Springer Verlag.
Puglisi, Riccardo. "Being the New York Times: the political behaviour of a newspaper." The BE
journal of economic analysis & policy 11, no. 1 (2011).
Riffe, Daniel, Stephen Lacy, Frederick Fico, and Brendan Watson. Analyzing media messages:
Using quantitative content analysis in research. Routledge, 2019.
Simonov, Andrey, Szymon K. Sacher, Jean-Pierre H. Dubé, and Shirsho Biswas. The persuasive
effect of fox news: non-compliance with social distancing during the covid-19 pandemic.
No. w27237. National Bureau of Economic Research, 2020.
Tetlock, Paul C. "Giving content to investor sentiment: The role of media in the stock
market." The Journal of finance 62, no. 3 (2007): 1139-1168.
Van Bavel, Jay J., Katherine Baicker, Paulo S. Boggio, Valerio Capraro, Aleksandra Cichocka,
Mina Cikara, Molly J. Crockett et al. "Using social and behavioural science to support
COVID-19 pandemic response." Nature Human Behaviour (2020): 1-12.
Viner, Russell M., et al. "School closure and management practices during coronavirus outbreaks
including COVID-19: a rapid systematic review." The Lancet Child & Adolescent
Health (2020).
Zhang H. (2004). The optimality of Naive Bayes. Proc. FLAIR.
Figure 1: Media Negativity and New COVID-19 Cases Over Time
Notes: Negativity is estimated using supervised machine learning on article phrases coupled with a training data set.
Articles are manually downloaded from LexisNexis for the period January 1st, 2020 to July 31st, 2020. The red line
shows the weekly average of daily confirmed new COVID-19 cases and is accessed from the New York Times
website.
Figure 2: Media Negativity by Source for COVID-19 and Non-COVID-19 Articles
Notes: Negativity is estimated as the fraction of negative words in the article and is standardized. Dark blue bars are
for COVID related articles and light blue bars are for non-COVID related articles. The raw share of negative words
is .043 with a standard deviation of .021. Negative words are defined by the Hu-Liu (2004) dictionary. Articles and
transcripts are manually downloaded from LexisNexis for the period January 1st, 2020 to July 31st, 2020 and
websites for Science, JAMA, The New England Journal of Medicine, The Lancet, and Nature. The New York Times
website is used for the list and text of the most popular articles.
Figure 3: Media Negativity by Source for COVID-19 Vaccine Articles
Figure 4: Media Negativity by Source for School Reopening Articles
Figure 5: Media Negativity and Audience Political Leanings
Notes: Negativity is estimated as the fraction of negative words in the article and is standardized. The raw share of
negative words is .043 with a standard deviation of .021. Negative words are defined by the Hu-Liu (2004)
dictionary. Articles and transcripts are manually downloaded from LexisNexis for the period January 1st, 2020 to
July 31st, 2020. “Trusted source” is measured in a 2019 Pew Survey of U.S. adults.
Table 1: Summary Statistics
(1)
(2)
(3)
(4)
(5)
N
mean
sd
min
max
Word Count of Article
23,486
1,652
2,654
21
57,166
Estimated P(Article is Negative)
23,486
0.663
0.384
0
1
Share of Words That Are Negative
23,486
0.0420
0.0200
0
0.190
Share Words Negative Standardized
23,486
-0.118
0.999
-2.212
7.269
Is a Scientific Article on Vaccines
23,486
0.00860
0.0923
0
1
Is an Increase/Decrease in Cases Article
23,486
0.303
0.460
0
1
Is a Reopenings Article
23,486
0.350
0.477
0
1
Is a Vaccine Article
23,486
0.347
0.476
0
1
US Major Media
23,486
0.292
0.454
0
1
US General Media
23,486
0.236
0.425
0
1
International Major Media
23,486
0.270
0.444
0
1
International General Media
23,486
0.194
0.395
0
1
Fraction of Conservatives Who Trust This Source (US Major Media)
8,131
0.252
0.137
0.116
0.742
Notes: We present summary statistics for our main variables. Each article is one observation. Probability of the article being negative is estimated using
supervised machine learning on article phrases coupled with a training data set. Share of negative words is estimated as the fraction of negative words in the
article and is standardized. The raw share of negative words is .043 with a standard deviation of .021. Negative words are defined by the Hui-Lu (1997)
dictionary. Articles are manually downloaded from LexisNexis for the period January 1st, 2020 to July 31st, 2020.
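The phrase "supervised machine learning on article phrases coupled with a training data set" can be made concrete with a toy classifier. The notes do not name the algorithm, so the sketch below swaps in a naive Bayes model over unigrams and bigrams purely for illustration. The four training sentences, their labels, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def phrases(text):
    """Return unigrams and bigrams as simple 'phrase' features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams


def train(labeled_texts):
    """Collect phrase counts per class from (text, is_negative) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_negative in labeled_texts:
        counts[is_negative].update(phrases(text))
        docs[is_negative] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def prob_negative(text, model):
    """Posterior P(negative | text) under naive Bayes with add-one smoothing."""
    counts, docs, vocab = model
    log_score = {}
    for label in (True, False):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for ph in phrases(text):
            if ph in vocab:  # phrases never seen in training are ignored
                score += math.log((counts[label][ph] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Normalize the two log scores into a probability without overflow.
    top = max(log_score.values())
    exp_neg = math.exp(log_score[True] - top)
    exp_pos = math.exp(log_score[False] - top)
    return exp_neg / (exp_neg + exp_pos)


# Hypothetical hand-labeled training set (True = negative tone).
training = [
    ("deaths surge as hospitals are overwhelmed", True),
    ("cases spike and experts warn of grim winter", True),
    ("vaccine trial shows promising results", False),
    ("schools reopen safely with few infections", False),
]
model = train(training)
p_bad = prob_negative("experts warn deaths surge", model)
p_good = prob_negative("promising vaccine results", model)
```

The output is a probability between 0 and 1 per article, which is the form of the "Estimated P(Article is Negative)" variable in Table 1.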
Table 2: Negativity by Media Category and Topic
Dependent variables by column:
(1) Probability Article is Negative (All Articles)
(2) Share of Negative Words Standardized (All Articles)
(3) Probability Article is Negative (Vaccine Articles)
(4) Probability Article is Negative (Case Count Articles)
(5) Probability Article is Negative (Reopening Articles)

                                              (1)         (2)         (3)         (4)         (5)
US Major Media                           0.253***    0.234***    0.452***    0.188***    0.203***
                                        (0.00666)    (0.0192)    (0.0121)   (0.00805)   (0.00776)
US General Media                         -0.00196   -0.422***   -0.0281**   0.0521***    0.142***
                                        (0.00834)    (0.0206)    (0.0126)    (0.0111)   (0.00862)
International General Media              -0.00727  -0.0750***    0.0212**   0.0771***   0.0896***
                                        (0.00648)    (0.0184)   (0.00935)   (0.00824)   (0.00789)
Observations                               20,909      20,909       7,246       6,295       7,367
R-squared                                   0.388       0.223       0.546       0.422       0.423
Mean Negativity for Intl Major Media         .541       -.160        .242        .686        .645
(the Omitted Category)
Notes: Probability of the article being negative is estimated using supervised machine learning on article phrases coupled with a training data set. Share of
negative words is estimated as the fraction of negative words in the article and is standardized. The raw share of negative words is .043 with a standard deviation
of .021. Negative words are defined by the Hui-Lu (1997) dictionary. Articles are manually downloaded from LexisNexis for the period January 1st, 2020 to
July 31st, 2020. All columns use OLS with robust standard errors. *** p<0.01, ** p<0.05, * p<0.1
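To read the coefficients in column (1) as levels, one can add each coefficient to the mean of the omitted category. The sketch below does this arithmetic and also recomputes significance stars from the reported coefficients and standard errors. It assumes the regression contains only the category indicators and a constant, so that the omitted-category mean acts as the intercept, and it uses large-sample normal critical values; neither assumption is stated in the table, and the function names are illustrative.

```python
# Column (1) of Table 2: (coefficient, robust standard error) per category.
OMITTED_MEAN = 0.541  # mean P(negative) for International Major Media
COLUMN_1 = {
    "US Major Media": (0.253, 0.00666),
    "US General Media": (-0.00196, 0.00834),
    "International General Media": (-0.00727, 0.00648),
}


def implied_means(omitted_mean, estimates):
    """Category means implied by dummy coefficients (no other controls assumed)."""
    means = {"International Major Media": omitted_mean}
    for category, (coef, _se) in estimates.items():
        means[category] = omitted_mean + coef
    return means


def significance_stars(coef, se):
    """Stars from the ratio coef/se, using normal two-sided critical values."""
    z = abs(coef / se)
    if z > 2.576:
        return "***"
    if z > 1.960:
        return "**"
    if z > 1.645:
        return "*"
    return ""


means = implied_means(OMITTED_MEAN, COLUMN_1)
stars = {name: significance_stars(c, s) for name, (c, s) in COLUMN_1.items()}

for name, value in means.items():
    print(f"{name}: {value:.3f} {stars.get(name, '')}")
```

Under these assumptions the implied mean for US Major Media is 0.541 + 0.253 = 0.794, while the two general-media categories are statistically indistinguishable from the omitted category in column (1).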
Table 3: Total COVID-19-Related Media Articles by Topic: January 31st, 2020 to July 31st, 2020

Topic                                    U.S.          U.S.     U.S. non-      Non-U.S.      Non-U.S. Non-U.S. non-
                                        Total    mainstream    mainstream         Total    mainstream    mainstream
Coronavirus/COVID-19                2,594,510        90,600     2,503,910     6,823,410       453,900     6,369,510
Vaccines                               33,980         2,375        31,605        69,600         3,257        66,343
Vaccines + Sarah Gilbert Etc.          28,740         1,371        27,369        54,860         2,299        52,561
Increases Whole Time Period           325,550        15,200       310,350       666,895        41,386       625,509
Decreases Whole Time Period            87,550         2,462        85,088        99,630         3,067        96,563
Increases 4/24-6/27 Period            103,700         3,581       100,119       314,548        16,660       297,888
Decreases 4/24-6/27 Period             33,000           676        32,324        53,850         1,297        52,553
Reopening                             412,780        19,300       393,480       680,052        31,630       648,422
Masks                                 386,890        23,600       363,290       670,994        43,090       627,904
Masks and Trump                        56,579         8,756        47,823        46,187         2,339        43,848
Benefits Masks                         51,700         4,436        47,264        61,680         2,687        58,993
Social Distancing                     378,940        19,600       359,340       811,503        55,610       755,893
Benefits Social Distancing             60,450         3,975        56,475        86,249         4,163        82,086
Hydroxychloroquine                     21,440         2,273        19,167        33,005         2,746        30,259
Hydroxychloroquine and Trump           10,640         1,636         9,004        12,503           929        11,574
Notes: Article counts come from LexisNexis for the period January 1st, 2020 to July 31st, 2020. The leftmost column indicates the search terms used (see
methodology documents for exact searches). An article can be counted in multiple rows if it contains the terms for more than one row.
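The counts in Table 3 lend themselves to simple ratio comparisons. The sketch below copies the U.S. mainstream column for a few rows and computes how many "increase" stories appear per "decrease" story, plus the ratio of hydroxychloroquine-and-Trump stories to vaccine-researcher stories. The dictionary and function names are illustrative only.

```python
# U.S. mainstream article counts copied from Table 3.
US_MAINSTREAM = {
    "Increases Whole Time Period": 15_200,
    "Decreases Whole Time Period": 2_462,
    "Increases 4/24-6/27 Period": 3_581,
    "Decreases 4/24-6/27 Period": 676,
    "Vaccines + Sarah Gilbert Etc.": 1_371,
    "Hydroxychloroquine and Trump": 1_636,
}


def ratio(counts, numerator, denominator):
    """Number of `numerator` articles per `denominator` article."""
    return counts[numerator] / counts[denominator]


whole_period = ratio(
    US_MAINSTREAM, "Increases Whole Time Period", "Decreases Whole Time Period"
)
window = ratio(
    US_MAINSTREAM, "Increases 4/24-6/27 Period", "Decreases 4/24-6/27 Period"
)
hcq_vs_vaccine = ratio(
    US_MAINSTREAM, "Hydroxychloroquine and Trump", "Vaccines + Sarah Gilbert Etc."
)

print(f"Increase/decrease, whole period: {whole_period:.2f}")
print(f"Increase/decrease, 4/24-6/27:    {window:.2f}")
print(f"HCQ+Trump / vaccine researchers: {hcq_vs_vaccine:.2f}")
```

With these counts the increase-to-decrease ratio is roughly 6.2 over the whole period and roughly 5.3 in the 4/24-6/27 window. Because an article can appear in several rows, these are ratios of search hits rather than of mutually exclusive article sets.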
Appendix 1
Dataset Construction Details and Search Terms Used
Data Set Construction
Our dataset was assembled from LexisNexis articles. We used the following instructions:
1. Click on the link (links were derived from search terms at the bottom of this document)
(setting pages to display 50 at a time instead of 10)
2. Click the dropdown on the left that says location by publication
3. Click edit settings
4. In results display settings, switch it from 10 to 50.
5. Scroll to the bottom and hit save (you may have to do this every time for each link; it is not clear how the setting “saves”)
(downloading)
6. Before downloading, double check that you are sorting by relevance, and the slider is set to group duplicates
7. Click the little box beside the folder to select the whole page
8. Go to the next page and do the same
9. Click the download button, which looks like a downward-pointing arrow
10. In the dialog box, make sure the format is RTF and “save as individual files” is selected; these options are unlikely to be set already.
11. Download, and repeat until reaching 2,500 articles per link. In the final dataset this number may be lower due to duplicates.
LexisNexis Article Search Process

Topic        Search terms
vaccines     coronavirus or COVID-19 and ATLEAST5(vaccine)
inc/dec      coronavirus or COVID-19 and cases and increase or decrease
reopening    COVID-19 or coronavirus and reopening
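The logic of the vaccine search can be approximated offline as a text filter. The sketch below is an assumption-laden reading, not a reproduction of LexisNexis behavior: it groups the query as (coronavirus or COVID-19) and ATLEAST5(vaccine), treats ATLEAST5 as "the term occurs at least five times", and matches word prefixes so that "vaccines" counts as "vaccine". The function names are hypothetical.

```python
import re


def count_term(text, term):
    """Count case-insensitive occurrences of `term` at the start of a word."""
    pattern = r"\b" + re.escape(term)
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def matches_vaccine_search(text, min_mentions=5):
    """Approximate: (coronavirus or COVID-19) and ATLEAST5(vaccine)."""
    mentions_virus = (
        count_term(text, "coronavirus") > 0 or count_term(text, "covid-19") > 0
    )
    return mentions_virus and count_term(text, "vaccine") >= min_mentions


# Hypothetical article snippets used only to exercise the filter.
heavy = "COVID-19 update: " + "The vaccine trial continues. " * 5
light = "COVID-19 update: " + "The vaccine trial continues. " * 4
off_topic = "Flu season: " + "The vaccine trial continues. " * 6

results = [matches_vaccine_search(t) for t in (heavy, light, off_topic)]
```

Requiring repeated mentions keeps articles that discuss vaccines substantively and drops those that mention them in passing.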
Mainstream sources in our dataset consisted of:

US Mainstream     International Mainstream Sources
Sources
Fox               AFR                  IndianExpress        Hindu
MSNBC             Analysis             MetroUK              Sun (England)
ABC               AsiaPacific          Newcastle            SunHerald
CBS               AustralianFin        Northern Territory   Sydney Morning Herald
CNN               BrisbaneTimes        SundayAge            Times of India
NBC               CTV                  SundayHerald         TorontoStar
NPR               CanberraTimes        SydneyMorning        WestAZ
LAtimes           DailyMirror          Advertiser           WAToday
Newsweek          Geelong Advertiser   TheAge               Telegraph
Politico          HeraldSun            TheAustralian        Guardian
TheHill           HinduTimes           AustralianMag
NYtimes           Hobart               Courier
NYPost            IllawarraMercury     EveningStandard
USAToday          IndiaToday           GlobeMail
Appendix Table 1 (not for Publication):
Negativity by Specific Media Source
Dependent variables by column:
(1) Prob Article is Negative, U.S. Sources
(2) Share of Negative Words (Standardized), U.S. Sources

                         (1)           (2)
Fox                 0.396***      0.681***
                    (0.0131)      (0.0384)
MSNBC               0.283***        0.294*
                    (0.0532)       (0.156)
ABC                 0.367***      0.745***
                    (0.0169)      (0.0495)
CBS                 0.357***      0.707***
                    (0.0210)      (0.0616)
CNN                 0.394***      0.833***
                   (0.00757)      (0.0222)
NBC                  0.101**       0.365**
                    (0.0492)       (0.144)
NPR                 0.255***      0.572***
                    (0.0125)      (0.0367)
LATimes             0.413***      1.088***
                    (0.0175)      (0.0513)
Newsweek            0.174***         0.170
                    (0.0483)       (0.142)
Politico            0.341***      1.045***
                    (0.0243)      (0.0712)
TheHill             0.563***      1.167***
                     (0.149)       (0.436)
NYTimes             0.256***      1.126***
                    (0.0109)      (0.0320)
NYPost              0.149***      0.788***
                    (0.0463)       (0.136)
USAToday            0.293***      0.884***
                    (0.0240)      (0.0704)
Constant            0.838***     -2.169***
                     (0.139)       (0.408)
Observations          10,156        10,156
R-squared              0.394         0.298
Omitted category consists of all U.S. sources not named above. Regressions are estimated using a linear probability model. Robust standard
errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
Appendix Table 2 (not for publication):
Relationship Between Negativity and Political Leanings of Audience
                                               (1)
                                   Probability Article is Negative
                                   (Controlling for Politics)
conservative_trusted_source                 0.0507
                                          (0.0404)
Constant                                   0.300**
                                           (0.111)
Observations                                11,505
R-squared                                    0.355
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Appendix Figure 1 (not for publication):
Probability Article is Negative and Audience Political Leanings