The influence of established newspapers and other news sources has historically borne out the agenda-setting theory, a 1972 communication-studies hypothesis positing that the elite few who control the authoritative channels of mass media have the power to set public discourse and opinion. However, in his definition of Web 2.0 systems, O’Reilly specifies that web pages gain commercial and popular influence by drawing on “the wisdom of the crowds,” implying that the public now has stronger control over the content discussed than centralized sources do. Many argue that the Internet changes how we form beliefs by putting the voices of authority and of ordinary people on a more equal footing, potentially diminishing the ability of elite newsmakers to set popular discussion trends.
Does the advent of Web 2.0 and the Internet age derail the agenda-setting theory? Does the Internet’s perceived democratization of voice mean that the public now determines its own topics, or does agenda-setting by traditional news sources still dominate discussion in Web 2.0 and social media environments despite the decentralization they appear to offer?
Our analysis compares the decentralized publication of tweets with the centralized, curated publication of articles by traditional media. We overlay the two data sets to track the rise and fall of topics in traditional and social media. If a spike in interest first appears in social media and is followed by a surge of reporting on the topic in traditional media, we can infer that social media now has a stronger hand in shaping the topics of discussion, which would call into question the validity of the agenda-setting theory today. If interest first appears in traditional media and social media follows, then traditional media still controls popular opinion, and Web 2.0's decentralization of voice is an illusion, since the public merely reacts to traditional media instead of influencing it. Alternatively, if we see no correlation between the two data sets, the topics discussed in social media and traditional media are, to an extent, divorced.
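The three outcomes above amount to asking which series leads the other. One way to make that concrete, sketched here under our own assumptions rather than as the method used in this paper (which relied on visual inspection), is to bin tweets and articles into daily counts and find the day offset at which the two series correlate most strongly. The function names and the `max_lag` window are hypothetical, and the numbers in the usage lines are toy values, not our data.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)


def best_lag(articles, tweets, max_lag=3):
    """Hypothetical helper: return (lag, r) maximizing corr(articles[t], tweets[t + lag]).

    A positive lag means tweet volume trails article volume by `lag` days;
    a negative lag means tweets lead articles.
    """
    best = (0, float("-inf"))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, t = articles[:len(articles) - lag], tweets[lag:]
        else:
            a, t = articles[-lag:], tweets[:len(tweets) + lag]
        r = pearson(a, t)
        if r > best[1]:
            best = (lag, r)
    return best


# Toy daily counts (illustrative only): tweets echo articles one day later.
articles = [0, 5, 1, 0, 0, 6, 1, 0, 0, 0]
tweets = [0, 0, 50, 10, 0, 0, 60, 12, 0, 0]
lag, r = best_lag(articles, tweets)
print(lag)  # 1
```

A positive best lag (tweets trailing articles) would correspond to the second scenario above, a negative lag to the first, and a uniformly weak correlation to the third. With only a few weeks of daily counts, such a result is suggestive rather than conclusive.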
We chose to draw our data from traditional and social media coverage of the resignation and replacement of the Pope because of the event's historical and temporal relevance. The event is historically significant because it is the first time a pope has resigned since 1415. It is temporally relevant because the Twitter API only lets us access tweets from roughly the previous six to nine days; since the replacement of the Pope was still unfolding, we could record it from beginning to end, giving us a better overview than a past event would. The resignation also set off related administrative events, such as the convening of the conclave, the assembly of cardinals that elects the new Pope.
We hypothesize that spikes in Twitter traffic will follow spikes in newspaper coverage, and that Web 2.0 will do little to derail the agenda-setting theory, since a large majority of the news on the Internet and social media consists of rehashes of, or discussions about, earlier reports from more established sources.
Data and Visualization
We first gathered the data with a PHP script that queried Twitter’s API and stored the tweets in our own MySQL database, so that our archive could extend further back than the API allows. We ran this script every seven days to keep populating the database. Our tweet data begins on February 10th and ends on March 20th. We then extracted the links in the tweets, reduced each to its domain, and identified the most-tweeted websites in the Twitterverse. Using this information and Google News, we built a second database containing article information and publish dates within the same time span. Finally, we used D3.js to build an interactive tool that lets us and others compare and analyze the data, filtering it and zooming in for finer detail. We applied a logarithmic scale so that smaller spikes in tweets remain visible alongside the largest ones.
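The domain-counting step can be sketched as follows. This is a minimal Python illustration, not the PHP script we actually used; it assumes the tweet links have already been expanded from Twitter's t.co short links to their final URLs, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Hypothetical helper: reduce a URL to its lower-cased host name, dropping a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def top_domains(links, n=10):
    """Return the n most frequently linked domains as (domain, count) pairs."""
    counts = Counter(domain_of(u) for u in links if u)
    counts.pop("", None)  # discard links whose host could not be parsed
    return counts.most_common(n)


# Illustrative links (assumed already expanded from t.co short URLs).
links = [
    "http://www.nytimes.com/2013/02/12/world/europe/example.html",
    "https://www.cnn.com/2013/03/13/world/example/",
    "http://nytimes.com/another-example",
    "http://www.bbc.com/news/example",
]
print(top_domains(links, n=2))  # [('nytimes.com', 2), ('cnn.com', 1)]
```

Subdomains, such as a regional edition of a site, are counted separately here; merging them would need an extra normalization step.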
We first assigned colors to the news sources arbitrarily in order to avoid imposing our own biases. However, we soon realized that the resulting chart was unclear and that an editorial approach was needed, so we used our best judgment to group sites of similar reputation together.
For both data sources, Twitter and traditional media, we queried for the word “pope” in order to capture as much data as possible, and then filtered this data in our visualization to obtain a more refined picture. To support this, we added a feature that shows an article's publisher and title when the user hovers over the line representing it; clicking the line opens the article itself.
To check for accuracy, we overlaid two area charts, one in black for the query “Pope” and one in dark gray for the query “Conclave,” to see whether the broader “Pope” query was inflated by irrelevant tweets or jokes rather than discussion. Because “conclave” is a narrower, more serious term, a “Pope” spike with no matching “Conclave” spike would suggest noise. For the most part the two charts coincided, and the spikes occurred in both queries, so jokes and irrelevant tweets do not appear to dilute or corrupt the data; we conclude that our query produced sufficiently accurate data.
We then tried to further confirm this accuracy by reading the tweets surrounding individual news articles to check that they pertained to the articles' topics. In the screenshot above, there is a spike in tweets containing the word “conclave” just as news articles about the start of the conclave were being published (March 8th).
We noticed a direct correlation between news articles and tweets: large spikes in tweets rarely occurred without news articles, and flat periods in tweets corresponded to periods in which few or no articles were published.
Spikes in tweets usually occurred only when articles directly preceded them, suggesting that a few select news articles build interest in a subject. Sources that commonly initiated huge spikes in tweets include cnn.com, nytimes.com, and bbc.com.
Articles from nytimes.com and similar news sources were not necessarily the first in a series to be published, but they appeared close to large spikes, suggesting that the New York Times and other major news sources are the most influential in initiating trends.
Articles from most other news sources appear after spikes in tweets, on the falling side of the curve, suggesting that these sources follow trends after a select few sources have initiated a spike in discussion. Articles published by these sources seem to have no impact on discussion in the Twitterverse; like tweets, they only follow the trend. In other words, the initial reports generate the greatest interest, however few of them there are. In the graph above, sites with higher perceived quality, such as CNN (blue), clearly publish before the tweets spike, whereas sites like the Huffington Post (red) publish after it.
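The before/after pattern described above was read off the chart by eye. A rough quantitative version, which is our own suggestion and not the procedure used here, is to mark tweet spikes as local maxima above a threshold (mean plus `k` standard deviations, with `k` a tunable assumption) and then compute, for each source, what share of its articles near a spike appeared on or before the spike day. The function names, threshold, window size, and outlet names are placeholders, and the counts are toy values.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_spikes(counts, k=2.0):
    """Hypothetical helper: indices of local maxima exceeding mean + k * population std."""
    threshold = mean(counts) + k * pstdev(counts)
    last = len(counts) - 1
    return [
        i for i, c in enumerate(counts)
        if c > threshold
        and (i == 0 or c >= counts[i - 1])
        and (i == last or c >= counts[i + 1])
    ]


def lead_share(articles, spikes, window=2):
    """Per source, the share of spike-adjacent articles published on or before the spike day.

    `articles` holds (source, day_index) pairs. Each article is attributed to its
    nearest spike and counted only if it lies within `window` days of it.
    """
    if not spikes:
        return {}
    before, total = defaultdict(int), defaultdict(int)
    for source, day in articles:
        peak = min(spikes, key=lambda s: abs(s - day))
        if abs(peak - day) <= window:
            total[source] += 1
            before[source] += day <= peak  # same-day articles count as "before"
    return {src: before[src] / total[src] for src in total}


# Toy data (illustrative only): daily tweet counts and placeholder outlet names.
tweets = [10, 12, 11, 300, 90, 20, 15, 14, 250, 60, 12, 10, 11, 10]
articles = [("outlet_a", 2), ("outlet_b", 3), ("outlet_c", 4),
            ("outlet_a", 7), ("outlet_c", 9), ("outlet_d", 10)]
spikes = find_spikes(tweets, k=1.5)  # [3, 8]
print(lead_share(articles, spikes))
# {'outlet_a': 1.0, 'outlet_b': 1.0, 'outlet_c': 0.0, 'outlet_d': 0.0}
```

Under this measure, sources that tend to lead spikes would score near 1 and sources that trail them near 0. Daily binning cannot order an article and a spike that fall on the same day, so finer time bins would sharpen the distinction.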
Articles from the Huffington Post and Fox News appear after large spikes, suggesting that these outlets are less influential in setting discussion trends and more reactive to earlier articles by top-tier news sources or to popular discussion. In other words, these two outlets do not play the agenda-setting role the theory describes.
Moreover, this spectrum of publishers is corroborated by the overall trend in article publication. In the picture above, the lines representing published articles shift toward purple (red mixed with blue) as they approach the trending tweets. We call these tweets trending because at this point tweet volume rose dramatically, to almost 35,000. Before this spike, when tweets were much fewer, only a few outlets coded blue, such as CNN, published persistently.
Agenda-setting theory still holds for public discourse in Web 2.0 and Internet environments, but we noticed that only a few powerful news sources with traditional authority and credibility initiated huge spikes in tweets, after which a majority of the other news sources followed suit. In essence, a few news sources set the agenda, popular discussion follows, and most other news sources trail afterwards, suggesting that agenda-setting theory does apply, but only to top-tier newspapers. Although a larger and more diverse range of news sources is now available, many of them lack the clout of the top-tier ones. Thus, although the web seems to have diversified news by lowering overhead costs, our analysis challenges this view, since only a few news sources appear to have the clout to affect general discussion. The Internet has increased the number of news sources that exist, but not necessarily the number that are influential. Despite the glorification of the Internet as a replacement for traditional news sources, elite traditional newspapers remain at the front line of reporting and of setting topics of discussion, while most social media users, Internet blogs, and smaller newspapers respond to these elite sources’ original articles. One possible reason is that original content is best produced by institutions that specialize in it, with large staffs of dedicated journalists and budgets to travel and report, resources that individuals on social media, smaller and less credible newspapers, and Internet blogs cannot afford; this remains to be investigated.
The large number of articles published after the announcement of the new Pope epitomizes this trend, as the graph above shows: most news articles were published after the spike. It almost seems as if some sources react to larger ones, just as Twitter does. The diversification of information and news sources seems to be an illusion.
We started recording tweets immediately after news broke of Pope Benedict XVI's resignation. However, we did not save the news articles from each source to a static record, because we assumed that article listings were more stable than tweets. Instead, we used the importXML function in Google Spreadsheets with Google News RSS feed addresses, using queries like “pope+AND+(Benedict+OR+Francis)+www.nytimes.com” over a time range (February 10th to March 20th). What we did not realize is that Google News restricts queries to the most recent 30 days (whether arbitrarily or for monetary and server-load reasons), with no way to search archived articles older than that within a specific date range. Our spreadsheet therefore kept updating and cut off anything more than 30 days old, so by the end of the project we had only a month of article data even though we had started 40 days earlier, leaving 10 days with tweets but no article data. In hindsight, we should have captured the article data to a static copy instead of relying solely on a spreadsheet driven by the dynamic importXML function. This simple change in method would have given us much more data about the beginning of Pope Benedict’s resignation, and a few more spikes to observe and analyze.
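The fix described above can be sketched as a small job that, each time it runs, parses the current RSS snapshot and appends any unseen items to a permanent JSON file keyed by article link. This is a hypothetical Python illustration rather than what we ran; it assumes the feed follows the RSS 2.0 `<item>`/`<title>`/`<link>`/`<pubDate>` layout, and fetching the feed and scheduling the job (for example, daily, well inside the 30-day window) are left out. The helper names and sample feed content are invented for illustration.

```python
import json
import os
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Hypothetical helper: return title, link and pubDate for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "pubDate": item.findtext("pubDate", ""),
        }
        for item in root.iter("item")
    ]


def merge_into_archive(path, items):
    """Add items not yet archived (keyed by link) to a JSON file; return how many were new.

    Archived items are never overwritten or dropped, so articles that later fall
    out of the live feed's time window are preserved.
    """
    archive = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            archive = json.load(f)
    new = 0
    for item in items:
        link = item["link"]
        if link and link not in archive:
            archive[link] = item
            new += 1
    with open(path, "w", encoding="utf-8") as f:
        json.dump(archive, f, indent=2)
    return new


# Minimal sample feed (hypothetical content).
SAMPLE_FEED = """<rss version="2.0"><channel><title>pope</title>
<item><title>Pope Benedict XVI to resign</title><link>http://example.com/a</link>
<pubDate>Mon, 11 Feb 2013 10:00:00 GMT</pubDate></item>
<item><title>Conclave date set</title><link>http://example.com/b</link>
<pubDate>Fri, 08 Mar 2013 12:00:00 GMT</pubDate></item>
</channel></rss>"""
```

Running `merge_into_archive` on each new snapshot returns the number of newly captured articles, so a run that returns 0 for many days in a row is a cheap signal that the feed or query has stopped working.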
Further Research Questions
One trend in our data that we did not have a chance to explore fully is that there are periods when a large amount of news is published but tweets show no spike. Perhaps this is where the diversification of information reveals itself: people may not be as concerned with some published articles because they have their own reading material.
We could separate tweets by journalists with Twitter accounts from retweets by those who share them, to see how much journalists' active participation in social media drives the conversation.
Additionally, we could use the data to track and explain the distinctive behaviors of individual newspapers, in order to characterize them or to measure their trustworthiness and reliability. We could track additional data sources in hopes of answering open questions, such as why Reuters published many more articles after the announcement of Pope Francis than before. Was it an editorial decision or the result of market research? Was it based on social media, or on the resources and personnel available? Were other news stories, such as France's military intervention in Mali, taking up resources earlier in the lead-up to the conclave? Additional data would include publish dates and trending topics for other headlines published concurrently with Pope-related articles and, most importantly, Twitter geolocation information for journalists as they write about the Pope, as well as when they write about other topics before or after the papal announcement. We could then build a hypothesis or argument from this data as to why certain news agencies acted as they did. We could also use a tool that captures differences between page versions to look for revisions to articles, and check whether articles that were quick to publish contained mistakes because speed was prioritized over fact-checking. Such analyses could help newspapers and news organizations evaluate themselves, and help readers understand the biases and conditions affecting how stories are published.
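As a starting point for the revision analysis proposed above, the sketch below compares two saved copies of an article's text with Python's standard `difflib` module, producing a unified diff for reading and a count of changed lines as a crude measure of how much was corrected. How the versions are captured is assumed to be handled elsewhere, and the helper names and sample sentences are hypothetical.

```python
import difflib


def revision_diff(old_text, new_text, label="article"):
    """Hypothetical helper: unified diff between an earlier and a later saved version of an article."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{label} (earlier)",
        tofile=f"{label} (later)",
    ))


def changed_line_count(old_text, new_text):
    """Number of lines removed from or added to the article between versions."""
    return sum(
        1 for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines())
        if line.startswith(("- ", "+ "))
    )


# Hypothetical versions of the same article, captured at two different times.
v1 = "Cardinals meet Tuesday.\nVote expected soon.\n"
v2 = "Cardinals meet Tuesday.\nVote expected Wednesday.\n"
print(revision_diff(v1, v2))
print(changed_line_count(v1, v2))  # 2
```

Line-level diffs can overstate small wording changes in long paragraphs; splitting the text into sentences first would give a finer-grained count.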
References
Hayes, D. "Does the Messenger Matter? Candidate-Media Agenda Convergence and Its Effects on Voter Issue Salience." Political Research Quarterly 61.1 (2008): 134–146.
O'Reilly, Tim. "What Is Web 2.0." O'Reilly Media. N.p., n.d. Web. 23 Mar. 2013. <http://oreilly.com/web2/archive/what-is-web-20.html>.
Tubbs, Stewart L. Human Communication: Principles and Contexts. New York, NY: McGraw-Hill, 2010. Print.
This is a preliminary paper. While the data were collected over the full study period and are accurate, the analysis, write-up, and visualization were completed within two days. This information is provided without warranty.