Page 1
8/8/2019 What's on Wikipedia and What's Not
http://slidepdf.com/reader/full/whats-on-wikipedia-and-whats-not 1/34
For Peer Review
What’s on Wikipedia, and What’s Not…? Completeness of Information on the Online Collaborative Encyclopedia
Journal: Journal of Computer-Mediated Communication
Manuscript ID: JCMC-07-186
Manuscript Type: Full-length Research Article
Keywords: Wikis, Social Network Analysis, Information Richness, Online Communities
International Communication Association
Journal of Computer-Mediated Communication
What’s on Wikipedia and What’s Not…?
Completeness of Information on the Online Collaborative Encyclopedia
Abstract
The World Wide Web continues to grow closer to achieving the vision of becoming the
repository of all human knowledge. While improved search engines such as Google facilitate
access to knowledge across the Web, some sites have grown more popular and attract
more Web users than others. Wikipedia is one such site, and it is becoming an
important resource for news and information. It is an online information source that is
increasingly used as the first, and sometimes only, stop for online encyclopedic information.
Much discussion has dealt with the accuracy of information on Wikipedia. While
accuracy is important, it is not what this project measures. Using a method employed by
Tankard and Royal (2005) to judge the completeness of Web content, we assess the completeness of information
on Wikipedia. We found that some topics were covered more
comprehensively than others and that predictors of these biases included recency, importance,
population, and financial wealth.
Keywords: Wikipedia, wiki, social network, completeness of information, open source, online
community
Introduction
The World Wide Web continues to grow closer to achieving the vision of becoming the
repository of all human knowledge (Heylighten, 1995). While improved search engines such as
Google facilitate access to knowledge across the Web, some sites have grown more popular
and attract more Web users than others. Wikipedia is one such site. It is
an online information source that is increasingly used as the first, and sometimes only, stop for
online encyclopedic information.
Wikipedia (www.wikipedia.org), billed as “the free encyclopedia,” was launched on the
Web in 2001 (Wikipedia:About, 2007). It was started by Jimmy Wales, formerly a futures
trader in Chicago, as an open information source, allowing anyone with access to the Internet to
post or edit content on the site. Wikipedia uses wiki software, which provides a
collaborative editing environment. Established as a non-profit organization, Wikipedia
currently receives over 38 million unique visitors per month and is ranked #13 on the ComScore
Media Metrix Top 50 Web Properties (Holiday Fever…, 2007). This open source project
operates under the assumption that more writers and editors are better than fewer, and that the
community will develop and monitor content better than traditional information
publishing does.
The open source concept has its roots in software development. One of its most notable
projects is the Linux operating system, which is developed under conditions that allow and
encourage contributions from many developers. Raymond (1997) described this style of development using the
metaphor of the cathedral and the bazaar: “No quiet, reverent cathedral-building here - rather, the Linux
community seemed to resemble a great babbling bazaar of differing agendas and approaches
(aptly symbolized by the Linux archive sites, who'd take submissions from anyone) out of which
a coherent and stable system could seemingly emerge only by a succession of miracles.” Open
source is contrasted with proprietary development environments, in which only those with the proper
license and authority can modify and implement source code. The benefits of the open approach are the
inclusion of many and varied voices and agendas, the speed at which development can occur,
and policing of the environment by the community itself rather than by regulatory or governing
bodies.
Wikipedia is now the Web's third most popular news and information source, with more
unique visitors than Yahoo News, MSNBC, AOL News, and CNN (Half of All U.S. Internet
Users…, 2006). Wikipedia's English-language version doubled in size last year and now has over
1 million articles. By this measure, it is almost 12 times larger than the print version of the
Encyclopaedia Britannica. It has over 100,000 contributors writing in 200 languages (The Wiki
Principle, 2006).
Wikipedia has become a popular site frequented by students, scholars, business people,
family members, and government officials for finding information on a variety of topics. But
because anyone can contribute, much attention has been given to the accuracy of
information on Wikipedia. Many feel that Wikipedia’s policy of letting anyone create and edit
content causes information to be inaccurate, misleading, or generally incorrect, whether
purposefully or accidentally. Rumors and falsehoods have at times been
planted in Wikipedia articles. For example, a Wikipedia entry was created that falsely
implicated John Siegenthaler, Sr. in the Robert Kennedy assassination (Giles, 2005; Udell, 2004;
Johnson, 2006). Although the error was eventually corrected, it had already been picked
up by other information resources and seen by untold numbers of users. Still, the philosophy of
the site is that with so many people looking at the content, accuracy will prevail in the long run.
Biases in editing content have also been revealed on Wikipedia. A Caltech graduate student
developed a system to trace the IP addresses of edits made by self-interested parties
(Borland, 2007). These edits often entailed removing negative information and adding
positive or public relations material, calling into question the objectivity and democratic
potential of the Wikipedia model.
Wikipedia has sought to counter some of these criticisms by instituting measures designed
to reduce attacks on the credibility of information on the site. Volunteer
administrators monitor content on the site and can now block users from editing
specific articles. Some articles are temporarily protected from editing until the dispute that
prompted the attack has died down. Others, like the article on George W. Bush, are semi-protected and open
to editing only by people who have been registered on the site for at least four days (Hafner,
2006). But according to Wales, Wikipedia's founder, this type of protection affects a tiny
fraction of the 1.2 million entries on the English-language site. “Protection is a tool for quality
control, but it hardly defines Wikipedia,” Mr. Wales said. “What does define Wikipedia is the
volunteer community and the open participation.” (Hafner, 2006)
Some studies have actually supported Wikipedia’s standing as a reliable information source.
In a study comparing the accuracy of science entries, Nature reported that Wikipedia’s
level of accuracy is close to that of Encyclopaedia Britannica (Giles, 2005). The scientific journal
reported that, across 42 randomly selected general science articles, there were 162 mistakes in
Wikipedia versus 123 in Britannica, with the errors in Britannica being oriented toward
omissions rather than factual errors.
There is an indication that even librarians are finding value in the usage of Wikipedia
(Miller, et al., 2006). Attention to popular culture items and usage of links, objective presentation
particularly those that have become popular information destinations. Wikipedia is a likely
candidate for analysis in that its goal is to provide information created by, and accessible to, anyone with
an Internet connection, much like the Web itself.
Review of Literature
While there have been many articles questioning Wikipedia’s accuracy, few
communication studies have focused on Wikipedia. Lih (2004) studied news articles citing
Wikipedia and analyzed the trends in using Wikipedia as a source.
Denning, et al. (2005) listed several risks inherent in the Wikipedia model: accuracy,
motives, uncertain expertise, volatility, coverage, and sources. Of coverage, the authors said,
Voluntary contributions largely represent the interests and knowledge of a self-selected
set of contributors. They are not part of a careful plan to organize human knowledge.
Topics that interest the young and Internet-savvy are well covered, while events that happened “before the Web” may be covered inadequately or inaccurately, if at all. More
is written about current news than about historical knowledge.
Other studies have looked at Wikipedia’s strength as a reference source. Bill Katz
developed six fundamental evaluation criteria for reference works: purpose, authority, scope,
audience, cost, and format (Wallace and Van Fleet, 2005). Wikipedia did not perform well in
the brief analysis Wallace and Van Fleet conducted against these criteria. They did, however,
identify value in the democratic and timely circumstances under which articles are created and
revised. According to Bopp and Smith (2001), coverage in an encyclopedia reference source
“should be even across all subjects,” although “it is important to note that some subjects, by their
very nature, demand greater emphasis.” While Wikipedia boasts over 1 million articles, Wallace
and Van Fleet argued that volume of articles alone is not a useful indicator of scope.
Like the Tankard and Royal study, this project challenges the notion that the Web may be
the repository of all human knowledge by assessing the coverage on one of its most popular
information destinations, Wikipedia. By making systematic measurements of the amount of
information on Wikipedia using the same dimensions, we attempt to identify factors that predict
Wikipedia’s completeness.
Borrowing methods from the Tankard and Royal study, this project measures the content
of Wikipedia against various indexes or standards of completeness to uncover
potential inherent biases. Communication research provided direction in identifying predictor
variables. Journalism scholars have often included completeness as one of the basic concepts of
journalism. McQuail stated that completeness “is usually thought to be a precondition of proper
understanding of news, and the media generally promise completeness in the sense of a full
range of information about significant events of the day” (McQuail, 1992, p. 211).
In an early study of the completeness of newspaper coverage, Danielson and Adams
(1961) examined coverage of the 1960 presidential election campaign. They developed a list of
1,033 campaign events and then drew a random sample of 42 events to be used as a checklist
against which articles were judged.
Tankard and Showalter (1977), in their study of coverage of the 1972 Surgeon General’s
report on television violence, constructed an index of completeness by checking for presence or
absence of “three elements that were judged necessary for full reader understanding.” While the
present study does not focus on individual news stories, it borrows the technique of using a list of
facts or concepts as an effective means of measuring completeness.
Research on news flow has identified a number of factors that influence the presence or
absence of information. A related research approach—theoretical influences on mass media
content—has identified five major categories of influence on news content: the individual
journalist, media routines, the journalistic organization, extramedia sources, and ideology
(Shoemaker & Reese, 1995).
Current information is the bedrock of journalistic reporting (Berkowitz, 1990; McMillin,
1996; Curtin & Rhodenbaugh, 1999). On the Web, currency comes into play in
another sense. Shoemaker & Reese (1995) identified the individual as an influence on news content. Web
users and content creators tend to be young, with strong ties to current popular culture. The
contributors to Wikipedia are likely to mirror the demographics of the Web at large. This factor
would tend to weight the content of the Web, and presumably Wikipedia, toward material that
these individuals would be interested in: material of greater currency or recency.
Galtung & Ruge (1965) identified signal strength, or amplitude, as another significant
factor influencing the flow of news. This factor might also be thought of as the importance of
information. When considering the probability of information being on Wikipedia, importance of
the information is likely to be a useful predictor, with the more important items receiving the most
attention.
Kariel and Rosenvall (1984) identified country population as an important predictor of
international news flow. Countries with larger populations have more individuals to become the
focus of news coverage, hold greater political influence, and have more people who could
potentially create and contribute to online content.
Shoemaker and Reese (1995, p. 190) suggested that the content of capitalist-owned media tends
to favor those with economic power. In addition, larger corporations have more market
impact, larger budgets for advertising and public relations, and influence over more
people.
Research Questions
Borrowing from the Tankard and Royal study of completeness of information on the
Web, the following research questions were developed for Wikipedia:
1. Are there some systematic gaps or biases in the overall presentation of information
made available on Wikipedia?
2. Is recency (or currency) a predictor of amount of information on Wikipedia?
3. Is importance of information a predictor of amount of information on Wikipedia?
4. Is population a predictor of amount of information about particular countries on
Wikipedia?
5. Is economic power a predictor of amount of information about individual corporations
on Wikipedia?
Method
Using the same predictors as Tankard and Royal (recency, importance, country
population, and economic power), we conducted several systematic searches on Wikipedia.
Lists were developed within each of the dimensions; their contents are described in the
results section. Each term on the lists was searched using the Wikipedia search feature, and
the main content page for that term was identified. In some cases, such as the
countries of the United Nations, the list of countries on the United Nations page was used to find
the main article on a particular country. Each page was visited and the relevant content was
highlighted. Wikipedia navigation and other superfluous links unrelated to the actual
term being searched were excluded from the selection. To capture the word count of the text
selected on a page, we downloaded Word Count, an extension for the Firefox Web browser
(http://roachfiend.com/archives/2005/03/03/word-count/). This extension counted the number of
words in the selection simply by using the Ctrl key on the keyboard. Word counts
were recorded in a spreadsheet for each dimension. Items were plotted on charts, first in
ascending order of word count, then by predictor variable. Items within dimensions were then compared and
correlated with predictor variables. When possible, the same search terms that were used in the
Tankard & Royal study were employed here.
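The word-count step can be approximated in code. This is a minimal sketch, assuming the Word Count extension treated any run of non-whitespace characters as one word; the extension's actual tokenization rule is not described here, and the helpers `count_words` and `record_counts` are hypothetical.

```python
import re

# Assumption: a "word" is any maximal run of non-whitespace characters.
# The Firefox Word Count extension's real rule may differ.
WORD = re.compile(r"\S+")


def count_words(selection: str) -> int:
    """Count whitespace-delimited tokens in a highlighted text selection."""
    return len(WORD.findall(selection))


def record_counts(selections: dict) -> dict:
    """Map each search term to the word count of its selected article text,
    mirroring one spreadsheet column per dimension (hypothetical helper)."""
    return {term: count_words(text) for term, text in selections.items()}


if __name__ == "__main__":
    # Placeholder selections, not data from the study.
    sample = {
        "1900": "Events of the year 1900 ...",
        "2001": "Events of 2001, with many more paragraphs of text.",
    }
    # Print terms in ascending order of word count, as in the first charts.
    for term, n in sorted(record_counts(sample).items(), key=lambda kv: kv[1]):
        print(term, n)
```

Sorting by count before plotting is what produces the "ascending order" charts; plotting by predictor is the same table sorted by a different key.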
All statistical analyses were conducted with Spearman (rank order) correlation
coefficients because parametric statistics (such as the Pearson correlation coefficient) are
inappropriate for L-shaped distributions (Bradley, 1982), which occurred with most of our data.
The correlations represent relative, as opposed to absolute, relationships.
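Spearman's coefficient is the Pearson correlation computed on ranks instead of raw values, which is why a handful of very long articles cannot dominate it the way they would dominate a Pearson coefficient. The authors do not say which software they used; the following is a minimal pure-Python sketch that gives tied values the average of their ranks (in practice, `scipy.stats.spearmanr` computes the same coefficient).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

For a monotonic but heavily skewed pair such as `spearman_rho([1, 2, 3, 4, 5], [10, 20, 40, 900, 12000])`, the result is 1.0 (up to floating-point rounding), even though the raw values are far from linear.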
Results
Several variables were used to test the currency dimension. First, using the same method
as Tankard and Royal, years were assessed. Wikipedia conveniently provides an article
describing the highlights of each year. Figure 1a depicts the word count of each article in
ascending order, disregarding year. A backward L-shaped curve is evident. Figure 1b depicts
the word count by year in chronological order, from 1900 through 2010.
There is a clear progression in article length, with a dramatic increase
beginning in 2001. Years in the future, understandably, had shorter articles, given that there was not yet
much to write about them. The average word count for the years since 2001 was 90% greater
than the average for the entire preceding 100 years (8692 vs. 4566).
The chart in Figure 8 depicts the correlation of each dimension's variable with its predictor variable.
The Spearman correlation for Years was .79, indicating a very strong relationship between article word
count and the recency of information.
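The percentage comparison above is the ratio of two period means, minus one. A minimal sketch using the two averages reported in the text; `period_means` is a hypothetical helper showing how such averages could be derived from per-year word counts.

```python
def percent_greater(a: float, b: float) -> float:
    """Percentage by which a exceeds b, relative to b."""
    return (a / b - 1.0) * 100.0


def period_means(counts_by_year: dict, cutoff: int) -> tuple:
    """Mean word count for years >= cutoff and for years before cutoff
    (hypothetical helper; both periods must be non-empty)."""
    recent = [n for year, n in counts_by_year.items() if year >= cutoff]
    earlier = [n for year, n in counts_by_year.items() if year < cutoff]
    return sum(recent) / len(recent), sum(earlier) / len(earlier)


# Averages reported in the text: years since 2001 vs. the preceding 100 years.
print(round(percent_greater(8692, 4566)))  # 90
```

The same arithmetic gives the 80% figure reported for films (3190 vs. 1771).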
Figure 1a: Years - Ascending Order (chart of year-article word counts sorted in ascending order; y-axis 0 to 12,000 words)
Figure 1b: Years – Chronological Order (chart of year-article word counts by year, 1900 onward; y-axis 0 to 12,000 words)
Figure 2a shows the word count of Wikipedia articles on the Academy Award-winning
films in ascending order (Appendix A lists the films by year). This list was not searched
in the Tankard & Royal study, as it was difficult for them to isolate Web sites associated
with films that have common names, such as Wings or Rebecca. This was easier on
Wikipedia, where each film has a specific article associated with it. Another backward L-shaped
distribution is displayed. With few exceptions, such as Gone with the Wind (1939) and
Casablanca (1943), the analysis in Figure 2b plotted by year (1928-2005) shows a progression
favoring more recent films. This suggests that while recency is an important predictor,
some films transcend time, are deemed important for other reasons, and thus receive a large
share of coverage on Wikipedia. The average word count for the films since 2001 was 80%
higher than the average word count for films prior to 2000 (3190 vs. 1771). These last five
years accounted for 11% of the total word count for the 78 years of the award. The Spearman
correlation for films over years was .49 (see Figure 8), but it increased to .62 simply by
removing the two outliers mentioned above. This indicates a strong relationship between word
count and time for films.
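The outlier check described above amounts to computing Spearman's rho twice: once on all items and once with the named items removed. A minimal sketch; the film names and numbers below are invented for illustration and are not the study's data. The artist analysis, which drops Bing Crosby and the Beatles, follows the same pattern.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def spearman_with_and_without(data, outliers):
    """data maps item name -> (year, word_count).
    Returns (rho on all items, rho with the named outliers removed)."""
    def rho(pairs):
        years = [year for year, _ in pairs]
        words = [wc for _, wc in pairs]
        return pearson(ranks(years), ranks(words))

    kept = [pair for name, pair in data.items() if name not in outliers]
    return rho(list(data.values())), rho(kept)


# Invented example: one older film with an unusually long article.
films = {
    "Film A": (1930, 1000),
    "Film B": (1939, 9000),  # hypothetical outlier
    "Film C": (1950, 1500),
    "Film D": (1970, 2000),
    "Film E": (1990, 2500),
    "Film F": (2004, 3000),
}
before, after = spearman_with_and_without(films, {"Film B"})
print(round(before, 2), round(after, 2))  # 0.43 1.0
```

With only a few dozen items, a single early entry with a very long article can move rho substantially, which is why reporting both values is informative.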
Figure 2a: Films - Ascending Order (chart of film-article word counts sorted in ascending order; y-axis 0 to 9,000 words)
Figure 2b: Films – By Year (chart of film-article word counts by award year, 1928 to 2005; y-axis 0 to 9,000 words)
Figures 3a and 3b show another test of recency not performed in the Tankard & Royal
study, using Time Magazine’s Person of the Year (Appendix B lists the people by year).
Years in which the honor did not go to an individual were discarded (for example, 2002, “The
Whistleblowers”). Figure 3a shows a backward L-shaped distribution when time is disregarded,
although it is not as steep as some of the others found in this analysis. The progression appears
evenly distributed, only slightly skewed toward the upper end of the distribution (the median was
93% of the average). However, Figure 3b shows a more random pattern than those found for
Year and Film. The Spearman correlation with recency (see Figure 8) was close to 0,
indicating no relationship with time. This suggests that while a bias is evident in the
upward progression of Figure 3a, the bias is not due to recency in the case of Person
of the Year, but perhaps to some other measure of importance.
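The median-to-mean ratio quoted above works as a quick skew check: in a right-skewed, L-shaped distribution, a few very long articles pull the mean above the median, so a ratio close to 1 indicates mild skew. A minimal sketch with invented word counts:

```python
from statistics import mean, median


def median_to_mean(counts):
    """Ratio of the median to the mean word count (1.0 for a symmetric list)."""
    return median(counts) / mean(counts)


# Invented word counts: one long article pulls the mean up.
print(median_to_mean([1000, 2000, 3000, 4000, 10000]))  # 0.75
```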
Figure 3a: Time’s Person of the Year - Ascending Order (chart of honoree-article word counts sorted in ascending order; y-axis 0 to 18,000 words)
Figure 3b: Time’s Person of the Year - By Year (chart of honoree-article word counts by year, 1927 onward; y-axis 0 to 18,000 words)
Another search not performed in the Tankard & Royal study considered
musical artists over time. For each year since 1940, the artist holding the #1 song on the Billboard Top 100 in
the first week of February was selected (Appendix C lists the artists by
year). Figure 4a depicts the word count of the main Wikipedia article for each selected artist
in ascending order, again showing the backward L-shaped
distribution. Figure 4b shows each artist by year. While the pattern in the graph appears
random, the Spearman correlation with time was .30 (see Figure 8). Eliminating
just two outliers (Bing Crosby – 1945 and the Beatles – 1964) increases the correlation
to .40. The average word count for the artists since 1990 was 32% higher than for the
years from 1940-1989 (3332 vs. 2511). Similar to the trends found in film, this shows that while
recency is related to coverage, some artists transcend time and receive more coverage on
Wikipedia than their currency alone would predict.
Figure 4a: Artists with #1 Songs on Billboard – Ascending Order (chart of artist-article word counts sorted in ascending order; y-axis 0 to 14,000 words)
Figure 4b: Artists with #1 Songs on Billboard – By Year (chart of artist-article word counts by year, 1940 onward; y-axis 0 to 14,000 words)
To measure comprehensiveness of information, we used the same random sample of 100 topics
from the Micropaedia of the Encyclopaedia Britannica that was employed in the Tankard & Royal study
(see Appendix D). Figure 5 shows the word count of each term’s main
page on Wikipedia. Once again, a backward L-shaped distribution emerged. Of the 100 items,
14 did not have a Wikipedia entry (these included phrases such as “Russian Association of Proletariat
Writers,” “League for the Independence of Vietnam,” and “urethane”). Fifteen of the terms had
articles with a word count of 2,000 or more. The average word count for those 15 terms was five
times the average word count for the other items on the list that had Wikipedia articles.
A Spearman correlation was used to compare inches of content in the Micropaedia of the
Encyclopaedia Britannica with word count on Wikipedia. The correlation was .26,
indicating a modest relationship between the emphasis placed on a topic in the traditional
encyclopedia and its coverage in Wikipedia (see Figure 8). In some cases, the articles on Wikipedia
indicated that the content had been derived from a print encyclopedia source. No time
dimension or other predictor variable was available for comparison with the encyclopedia terms.
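The comparison of the longest entries with the rest can be sketched as follows, assuming missing entries are recorded as `None` and excluded from both groups, as the phrase "the other items on the list that had Wikipedia articles" suggests. The threshold default mirrors the 2,000-word cut in the text; the sample counts are invented.

```python
from statistics import mean


def long_vs_rest(counts, threshold=2000):
    """Summarize coverage for a list of word counts (None = no article).

    Returns (number missing, number at or above threshold,
    mean of long articles / mean of the other existing articles).
    Both groups are assumed to be non-empty.
    """
    present = [c for c in counts if c is not None]
    long_articles = [c for c in present if c >= threshold]
    others = [c for c in present if c < threshold]
    return len(counts) - len(present), len(long_articles), mean(long_articles) / mean(others)


# Invented counts, not the study's data.
print(long_vs_rest([None, None, 500, 1000, 1500, 3000, 6000]))  # (2, 2, 4.5)
```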
Figure 5: Encyclopedia Terms – Ascending Order (chart of term-article word counts sorted in ascending order; y-axis 0 to 14,000 words)
Figure 6a shows, in ascending order, the word count of the main Wikipedia article for each
member country of the United Nations. Articles were analyzed for all 192 UN member countries.
Once again, a backward L-shaped distribution emerged. The distribution is fairly even, with a
sharp increase for the top 22 countries. Figure 6b shows a gradual upward
trend when countries are ordered by population (a higher number indicates a larger population).
The Spearman correlation of word count with population was .55, indicating that larger countries
were better represented on Wikipedia in terms of word count per article (see Figure 8). The top
10% of countries by population accounted for 15% of the total word count for country articles,
and the average word count for the top 10% of countries was 63% higher than that for the rest
of the list.
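The top-10% figures can be computed by ranking items on the predictor and summing word counts. A minimal sketch, assuming the top decile is rounded to the nearest whole item (the text does not say how it was cut); the data are invented. The same summary applies to the companies ranked by revenue.

```python
from statistics import mean


def top_decile_summary(rows):
    """rows: (predictor_value, word_count) pairs, e.g. (population, words).

    Returns (share of total words held by the top 10% by predictor,
    mean words in the top 10% / mean words in the rest).
    """
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    k = max(1, round(len(ranked) * 0.10))  # assumption: round to nearest item
    top = [words for _, words in ranked[:k]]
    rest = [words for _, words in ranked[k:]]
    share = sum(top) / (sum(top) + sum(rest))
    return share, mean(top) / mean(rest)


# Invented data: ten countries, the most populous with a much longer article.
rows = [(100, 9000)] + [(p, 1000) for p in range(10, 100, 10)]
print(top_decile_summary(rows))  # (0.5, 9.0)
```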
Figure 6a: Countries in UN – Ascending Order (chart of country-article word counts sorted in ascending order; y-axis 0 to 14,000 words)
Figure 6b: Countries of the UN - Ordered By Population (chart of country-article word counts by population rank; y-axis 0 to 14,000 words)
Figure 7a shows the word count for a random selection of 86 Fortune 1000 companies in
ascending order (Appendix E lists the selected companies). This chart shows the backward
L-shaped distribution, with a sharp increase for the top 10% of the companies. Another 10% of the
companies did not have Wikipedia entries. Figure 7b shows the companies ranked by revenue
(a higher number indicates higher revenue). The chart shows a trend toward
increased word count for the companies with the highest revenue. The Spearman correlation between
the word count of these articles and company revenue was .49. The top 10% of the companies by
revenue accounted for 30% of the total word count for articles about companies.
Figure 7a: Fortune 1000 Companies – Ascending Order (chart of company-article word counts sorted in ascending order; y-axis 0 to 6,000 words)
Figure 7b: Fortune 1000 Companies - By Revenue (chart of company-article word counts by revenue rank; y-axis 0 to 6,000 words)
25
In terms of country population, coverage was biased toward larger countries and was
positively correlated with population. This indicates that the democratic nature of Wikipedia
cannot, on its own, counteract the effect of the sheer number of people available to
participate.
Similarly, Fortune 1000 companies with larger revenues and resources were more likely to
receive greater coverage on Wikipedia. This points to the power of financial resources to
circumvent any democratizing feature of an online space.
Conclusion
In some ways, this was a more straightforward study than the one performed by Tankard
& Royal. They had difficulty determining whether certain searches captured all the information
on a topic without also including irrelevant information. For example, a search-engine query for
a year can return references to the number rather than to the year. They attempted to alleviate
this problem by placing the word “year” before the numerical year and enclosing the whole string
in quotation marks, but this missed hits about years that were not preceded by the word “year”.
Searches were also difficult when topics were not presented consistently, as with the
encyclopedia terms.
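The trade-off described above, between a bare numeric query that returns non-year hits and a quoted "year ####" phrase that misses year mentions, can be illustrated with a toy analogy. The sketch below uses Python regular expressions over made-up snippets; it is not how any search engine actually works, and the snippets and function names are hypothetical.

```python
import re

# Toy analogy only: regular-expression matching over made-up snippets,
# not the behavior of any particular search engine.
SNIPPETS = [
    "In the year 1950 the Korean War began.",
    "The Korean War began in 1950.",
    "Order part number 1950 for this model.",
]

def bare_number_hits(year, texts):
    """Match the year as a standalone number; also catches non-year uses."""
    pattern = re.compile(rf"\b{year}\b")
    return [t for t in texts if pattern.search(t)]

def quoted_phrase_hits(year, texts):
    """Match only the exact phrase 'year <year>', case-insensitively."""
    phrase = f"year {year}"
    return [t for t in texts if phrase in t.lower()]

print(len(bare_number_hits(1950, SNIPPETS)))    # 3: includes the part-number false positive
print(len(quoted_phrase_hits(1950, SNIPPETS)))  # 1: misses "began in 1950"
```

The bare query over-collects (the part number), while the quoted phrase under-collects (the sentence that never uses the word "year"), which is the difficulty Tankard & Royal faced.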
Shariatmadari (2006) identified characteristics of Wikipedia that support this point as well.
Wikipedia is specifically intended as a work of reference, whereas a search engine is not: a
search engine’s purpose is to identify various sites rather than to provide immediate context.
Shariatmadari also noted coverage issues on Wikipedia, finding more content on popular culture
and science fiction than on history.
In general, searches on Wikipedia returned an individual article for each topic, making it
easier to judge that article’s relevance to the search item. Wikipedia also conveniently provided
an article stub for each year; a stub is a short, skeletal article that is ready to receive more
content. While this approach does not measure all of Wikipedia’s content related to a particular
year, it does provide one indicator of the amount of coverage and attention a year receives. To
strengthen the recency (currency) dimension, we also performed searches that were not part of
the Tankard & Royal study: Time magazine’s Person of the Year, Academy Award-winning films
by year, and artists with #1 songs by year.
The word count for each topic included only the length of the individual article. Like the
Web in general, Wikipedia makes heavy use of links: most articles linked to other articles that
enhanced or augmented their content, and these links were often tangents describing other
people or events mentioned in the article. Also capturing the word count of linked articles would
have made the study unwieldy.
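The paper does not describe how words were counted. As an illustration only, the sketch below assumes an article is available as MediaWiki wikitext and counts the words a reader would see in that one article, keeping the visible text of each link but never following it. The regular expression and function names are hypothetical, and the sketch deliberately ignores templates, references, and other markup a real count would need to handle.

```python
import re

# Assumed MediaWiki link forms: [[Target]] or [[Target|display text]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def render_links(wikitext):
    """Replace each link with the text a reader sees; the linked article is not fetched."""
    return WIKILINK.sub(lambda m: m.group(2) or m.group(1), wikitext)

def article_word_count(wikitext):
    """Count whitespace-separated words in this one article only."""
    return len(render_links(wikitext).split())

# Hypothetical article text, not taken from Wikipedia.
sample = "[[Hans Geiger]] worked with [[Ernest Rutherford|Rutherford]] in [[Manchester]]."
print(render_links(sample))        # Hans Geiger worked with Rutherford in Manchester.
print(article_word_count(sample))  # 7
```

Counting only the rendered text of the article itself keeps each topic's measure independent of how heavily it links out, which matches the study's choice not to follow tangential links.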
Information on Wikipedia is highly volatile and dynamic, and articles can change dramatically
over time. This study was performed in November 2006, and all searches within a given variable
were performed on the same day and during the same time period to keep the comparisons
consistent. The project therefore captures only the information present during the timeframe
under analysis, and some of the biases uncovered may subside or change over time. While this
study uncovered important biases in the information presented on Wikipedia, it will be important
to continue measuring both the accuracy and the completeness of information on online sites that
are becoming important information resources, particularly those that take advantage of the
democratic and open-source features of the technology.
Bibliography
Berkowitz, D. (1990). Refining the gatekeeping metaphor for local television news. Journal of
Broadcasting & Electronic Media, 34, 55-69.
Bopp, R. E., & Smith, L. C. (2001). Reference and Information Services: An Introduction (3rd ed.).
Englewood, CO: Libraries Unlimited, p. 436.
Borland, J. (2007, August 14). See Who's Editing Wikipedia - Diebold, the CIA, a Campaign. Wired.
http://www.wired.com/politics/onlinerights/news/2007/08/wiki_tracker
Bradley, J. V. (1982). The insidious L-shaped distribution. Bulletin of the Psychonomic Society,
20, 85-88.
Curtin, P. A., & Rhodenbaugh, E. (1999). It's not easy being green: Building the news media
agenda on the environment. Association for Education in Journalism and Mass
Communication, New Orleans, LA.
Danielson, W. A., & Adams, J. B. (1961). Completeness of coverage of the 1960 campaign.
Journalism Quarterly, 38, 441-452.
Denning, P., Horning, J., Parnas, D., & Weinstein, L. (2005, December). Wikipedia Risks.
Communications of the ACM.
Giles, J. (2005, December). Internet Encyclopaedias Go Head to Head. Nature.
Hafner, K. (2006, June 17). Growing Wikipedia Revises Its "Anyone Can Edit" Policy. New
York Times.
Half of All U.S. Internet Users Visited News Sites in June 2006. (2006, August 7). comScore
Media Metrix press release. http://www.comscore.com/press/release.asp?press=971
Heylighen, F. (1995). From World-Wide Web to Super-Brain. Principia Cybernetica Web.
Retrieved April 22, 2002, from http://pespmc1.vub.ac.be/SUPBRAIN.html
Holiday Fever Drives Traffic to Shopping Sites in December. (2007, January 16). comScore
Media Metrix press release. http://www.comscore.com/press/release.asp?id=1177
Johnson, G. (2006, January 3). The Nitpicking of the Masses vs. the Authority of the Experts.
New York Times.
Kariel, H. G., & Rosenvall, L. A. (1984). Factors influencing international news flow.
Journalism Quarterly, 61, 509-516.
Lih, A. (2004, April 16-17). Wikipedia as Participatory Journalism: Reliable Sources? Metrics
for evaluating collaborative media as a news resource. 5th International Symposium on
Online Journalism, University of Texas at Austin.
McMillin, D. C. (1996). Roles journalists play: An examination of journalists' roles as
manifested in samples of their best work. Association for Education in Journalism and Mass
Communication, Anaheim, CA.
McQuail, D. (1992). Media performance: Mass communication and the public interest. London: Sage.
Miller, B. X., Helicher, K., & Berry, T. (2006, April 1). I Want My Wikipedia. Library Journal.
Raymond, E. (1997). The Cathedral and the Bazaar. First Monday.
http://www.firstmonday.org/issues/issue3_3/raymond/
Shariatmadari, D. (2006, July/August). Is A Million Articles Proof of Authentic Information?
Intermedia, 34(3), 17.
Shoemaker, P., & Reese, S. (1995). Mediating the message: Theories of influences on mass
media content. Reading, MA: Addison-Wesley.
Tankard, J. W., & Royal, C. L. (2005). Finding Out What's On the World Wide Web. In S. Hornig
Priest (Ed.), Communication Impact: Designing Research that Matters (pp. 253-264).
Tankard, J. W., & Royal, C. L. (2005, Fall). What’s on the Web and What’s Not. Social Science
Computer Review.
Tankard, J. W., & Showalter, S. S. (1977). Press coverage of the 1972 report on television and
social behavior. Journalism Quarterly, 54, 293-298.
The Wiki Principle. (2006, April 22). The Economist, 379, 14-15.
Udell, J. (2004, January 9). Wikipedia’s Future. Retrieved from http://www.infoworld.com
Voss, J. (2005). Measuring Wikipedia. In Proceedings of the International Conference of the
International Society for Scientometrics and Informetrics. Stockholm, Sweden.
Wallace, D., & Van Fleet, C. (2005). The Democratization of Information? Wikipedia as a
Reference Resource. Reference and User Services Quarterly, 45.
Wikipedia:About. (2007). http://en.wikipedia.org/wiki/Wikipedia:About, accessed January 23, 2007.
Appendix A: Academy Award Winning Films
1928 Wings
1929 Broadway Melody
1930 All Quiet on the Western Front
1931 Cimarron
1932 Grand Hotel
1933 Cavalcade
1934 It Happened One Night
1935 Mutiny on the Bounty
1936 The Great Ziegfeld
1937 The Life of Emile Zola
1938 You Can't Take it With You
1939 Gone with the Wind
1940 Rebecca
1941 How Green Was My Valley
1942 Mrs. Miniver
1943 Casablanca
1944 Going My Way
1945 The Lost Weekend
1946 The Best Years of Our Lives
1947 Gentleman's Agreement
1948 Hamlet
1949 All the King's Men
1950 All about Eve
1951 An American in Paris
1952 The Greatest Show on Earth
1953 From Here to Eternity
1954 On the Waterfront
1955 Marty
1956 Around the World in 80 Days
1957 The Bridge on the River Kwai
1958 Gigi
1959 Ben-Hur
1960 The Apartment
1961 West Side Story
1962 Lawrence of Arabia
1963 Tom Jones
1964 My Fair Lady
1965 The Sound of Music
1966 A Man for All Seasons
1967 In the Heat of the Night
1968 Oliver!
1969 Midnight Cowboy
1970 Patton
1971 The French Connection
1972 The Godfather
1973 The Sting
1974 The Godfather Part II
1975 One Flew Over the Cuckoo's Nest
1976 Rocky
1977 Annie Hall
1978 The Deer Hunter
1979 Kramer vs. Kramer
1980 Ordinary People
1981 Chariots of Fire
1982 Gandhi
1983 Terms of Endearment
1984 Amadeus
1985 Out of Africa
1986 Platoon
1987 The Last Emperor
1988 Rain Man
1989 Driving Miss Daisy
1990 Dances With Wolves
1991 The Silence of the Lambs
1992 Unforgiven
1993 Schindler's List
1994 Forrest Gump
1995 Braveheart
1996 The English Patient
1997 Titanic
1998 Shakespeare in Love
1999 American Beauty
2000 Gladiator
2001 A Beautiful Mind
2002 Chicago
2003 Lord of the Rings: Return of the King
2004 Million Dollar Baby
2005 Crash
Appendix B: Time Person of the Year
1927 Charles Augustus Lindbergh
1928 Walter P. Chrysler
1929 Owen D. Young
1930 Mohandas Karamchand Gandhi
1931 Pierre Laval
1932 Franklin Delano Roosevelt
1933 Hugh Samuel Johnson
1934 Franklin Delano Roosevelt
1935 Haile Selassie
1936 Mrs. Wallis Warfield Simpson
1937 Chiang Kai-Shek
1938 Adolf Hitler
1939 Joseph Stalin
1940 Winston Leonard Spencer Churchill
1941 Franklin Delano Roosevelt
1942 Joseph Stalin
1943 George Catlett Marshall
1944 Dwight David Eisenhower
1945 Harry Truman
1946 James F. Byrnes
1947 George Catlett Marshall
1948 Harry Truman
1949 Winston Leonard Spencer Churchill
1951 Mohammed Mossadegh
1952 Elizabeth II
1953 Konrad Adenauer
1954 John Foster Dulles
1955 Harlow Herbert Curtice
1957 Nikita Krushchev
1958 Charles De Gaulle
1959 Dwight David Eisenhower
1961 John Fitzgerald Kennedy
1962 Pope John XXIII
1963 Martin Luther King Jr.
1964 Lyndon B. Johnson
1965 General William Childs Westmoreland
1967 Lyndon B. Johnson
1970 Willy Brandt
1971 Richard Milhous Nixon
1973 John J. Sirica
1974 King Faisal
1976 Jimmy Carter
1977 Anwar Sadat
1978 Teng Hsiao-P'ing
1979 Ayatullah Khomeini
1980 Ronald Reagan
1981 Lech Walesa
1984 Peter Ueberroth
1985 Deng Xiaoping
1986 Corazon Aquino
1987 Mikhail Sergeyevich Gorbachev
1989 Mikhail Sergeyevich Gorbachev
1991 Ted Turner
1992 Bill Clinton
1994 Pope John Paul II
1995 Newt Gingrich
1996 Dr. David Ho
1997 Andy Grove
1999 Jeff Bezos
2000 George W. Bush
2001 Rudy Giuliani
2004 George W. Bush
Appendix C: Artist with Billboard #1 Song First Week of February
1940 Tommy Dorsey
1941 Artie Shaw
1942 Glenn Miller
1943 Harry James
1944 Glen Gray
1945 Bing Crosby
1946 Vaughn Monroe
1947 Sammy Kaye
1948 Vaughn Monroe
1949 Evelyn Knight
1950 Andrews Sisters
1951 Patti Page
1952 Johnny Ray
1953 Perry Como
1954 Eddie Fisher
1955 Joan Weber
1956 Dean Martin
1957 Guy Mitchell
1958 Danny and the Juniors
1959 The Platters
1960 Johnny Preston
1961 The Shirelles
1962 Joey Dee & the Starliters
1963 The Rooftop Singers
1964 The Beatles
1965 Petula Clark
1966 Simon & Garfunkel
1967 The Monkees
1968 John Fred & His Playboy Band
1969 Tommy James & the Shondells
1970 The Jackson 5
1971 Dawn
1972 Don McLean
1973 Stevie Wonder
1974 Ringo Starr
1975 Neil Sedaka
1976 Ohio Players
1977 Rose Royce
1978 Player
1979 Chic
1980 Michael Jackson
1981 Blondie
1982 Daryl Hall & John Oates
1983 Men at Work
1984 Yes
1985 Madonna
1986 Dionne Warwick
1987 Billy Vera and The Beaters
1988 INXS
1989 Phil Collins
1990 Michael Bolton
1991 Surface
1992 George Michael
1993 Whitney Houston
1994 Mariah Carey
1995 TLC
1996 Boyz II Men
1997 Toni Braxton
1998 Janet Jackson
1999 Britney Spears
2000 Savage Garden
2001 Destiny's Child
2002 Nickelback
2003 B2K
2004 OutKast
2005 Mario
2006 Nelly
Appendix D: Random Selection of Encyclopedia Terms
actin
Ain River
Albert Nile
alkaline phosphatose
An Srath Ban
analytic geometry
Antigonish
Augsburg Confession
August Baron Lambermont
Big Sandy River
Bogdah
botanical garden
calamine brass
cecum
Central African Federation
cesura
Copernicus
Cote-Saint-Luc
Darcy's Law
domestication
dragon
dysmenorrhea
Earl Carroll
electromagnetic induction
equine
Fort Portal
frame design
George Edward Stanhope Molyneux Herbert
Giacomo Meyerbeer
gyroscope
Hans Geiger
Haratin
Hei-ho
Henry Jackson
Herman Busenbaum
horntail
humour
Ijma
Ismail Gasprinski
Itaipu Dam
Jack Miner
Jacobus van Looy
James H. Doolittle
Jefferson Davis
John Davis
Kobenhavn
League for the Independence of Vietnam
Lennart Torstenson
Leroy Randle Grumman
Louis-Armand de Lom d'Arce
Lydd
Mab
Maes
Maravi Confederacy
Marc A. Mitscher
Marcus Eremita
marlin
Max Weber
Mistinquett
Nicomachus
Normandy Invasion
Olympias
ostinato
paper
Paris Basin
Pastoral Epistles
Paul Signac
Philip Schaff
Pierre Nicole
Ponca
PTA
Quintus Fabius Pictor
rampion
Republican River
Robert Lansing
rose moss
Rudolph Jacob Camerarius
Russian Association of Proletarian Writers
Saint George
Saint Irenaeus
Salem
Salon
silica
Socialist Realism
Sporting Record
Sterling Price
Suceava
sun rose
temenggong
The Spectator
Time magazine
Tirso de Molina
Universal Declaration of Human Rights
urethane
videodisc
Wesselenyi Conspiracy
William Pole
William Wrigley, Jr.
Zenshin
Ziya Gokalp
Appendix E: Random Selection of Fortune 1000 companies
Air Products & Chemicals
Alberto-Culver
Allegheny Energy
America West Holdings
American Greetings
Armstrong Holdings
AT&T
Auto-Owners Insurance
AutoZone
Avnet
Avon Products
BellSouth
Benchmark Electronics
Beverly Enterprises
Boeing
Briggs & Stratton
Cablevision Systems
Chubb
Citizens Communications
Colgate-Palmolive
ConAgra
Countrywide Credit
CUNA Mutual Group
Eastman Kodak
Echostar Communications
Ecolab
El Paso
Eli Lilly
Energy East
Equity Office Properties
Expeditors International of Washington
Gap Inc.
General Motors
Genesis Health Ventures
Gold Kist
Goodrich
Great Plains Energy
H.B. Fuller
Hershey Foods
Hewlett-Packard
Hilton Hotels
Home Depot
Hovnanian Enterprises
Humana
Ikon Office Solutions
ITT Industries
J.C. Penney
KB Home
Kellogg Company
Kellwood
Knight-Ridder
Legg Mason
Lehman Brothers
Lennar
Lennox International
Lockheed Martin
LSI Logic
Mandalay Resort Group
Manor Care
Manpower Inc.
Marsh & McLennan
McKesson
MDC Holdings
Mutual of Omaha
National Fuel Gas
Northwest Airlines
Omnicare
Phelps Dodge
PNC Financial
Primedia
Safeco
Schering-Plough
Scientific-Atlanta
Sentry Insurance Group
Snap-On
Sonoco Products
SPX
Stanley Works
Starbucks
Sun Microsystems
Swift Transportation
Tenneco Automotive
Thermo Electron
Viacom
Walt Disney
Western Digital