Uncovering the Dark Web: A Case Study of Jihad on the Web
Hsinchun Chen
Artificial Intelligence Lab, Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721, USA. E-mail: [email protected]
Wingyan Chung
Department of Operations and Management Information Systems, Leavey School of Business, Santa Clara University, Santa Clara, CA 95053, USA. E-mail: [email protected]
Jialun Qin
Management Department, College of Management, University of Massachusetts Lowell, Lowell, MA 01854, USA. E-mail: [email protected]
Edna Reid
Department of Library Science, Clarion University, Clarion, PA 16214, USA. E-mail: [email protected]
Marc Sageman
The Solomon Asch Center for Study of Ethnopolitical Conflict, University of Pennsylvania, Philadelphia, PA 19104, USA. E-mail: [email protected]
Gabriel Weimann
Department of Communication, University of Haifa, Haifa 31905, Israel. E-mail: [email protected]
While the Web has become a worldwide platform for communication, terrorists share their ideology and communicate with members on the Dark Web, the reverse side of the Web used by terrorists. Currently, the problems of information overload and the difficulty of obtaining a comprehensive picture of terrorist activities hinder effective and efficient analysis of terrorist information on the Web. To improve understanding of terrorist activities, we have developed a novel methodology for collecting and analyzing Dark Web information. The methodology incorporates information collection, analysis, and visualization techniques, and exploits various Web information sources. We applied it to collecting and analyzing information from 39 Jihad Web sites and developed visualizations of their site contents, relationships, and activity levels. An expert evaluation showed that the methodology is very useful and promising, having a high potential to assist in investigation and understanding of terrorist activities by producing results that could potentially help guide both policymaking and intelligence research.
Received September 20, 2006; revised June 29, 2007; accepted January 4, 2008
© 2008 ASIS&T. Published online 7 April 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20838
1. Introduction
The Internet has evolved to become a global platform through which anyone can conveniently disseminate, share, and communicate ideas. Despite its many advantages, however, misuse of the Internet has become ever more serious. Terrorist organizations, extremist groups, hate groups, and racial supremacy groups are using the Web to promote their ideology, to facilitate internal communications, to attack their enemies, and to conduct criminal activities. Warnings have been issued that terrorists may launch attacks on such critical infrastructure as major e-commerce sites and governmental networks (Gellman, 2002). Insurgents in Iraq have posted Web messages asking for munitions, financial support, and volunteers (Blakemore, 2004). It therefore has become important to obtain from the Web intelligence that permits better understanding and analysis of terrorist and extremist groups. We define this reverse side of the Web as the "Dark Web," the portion of the World Wide Web used to help achieve the sinister objectives of terrorists and extremists.
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(8):1347-1359, 2008
Currently, intelligence from the Dark Web is scattered in diverse information repositories through which investigators need to browse manually to be aware of their content. Much of the information stored in search engine databases could be properly collected and analyzed for transformation into intelligence and knowledge that would enhance understanding of terrorists' activities. However, search engines often overwhelm users by producing laundry lists of irrelevant results and creating information overload problems. Related but unfocused information makes it difficult to obtain a comprehensive description of a terrorist group or a terrorism topic. Many Web resources contain information about terrorism, but a relatively small proportion comes from terrorist groups themselves, and data on the Web often are not persistent and may be misleading. Many terrorist Web sites do not use English, so investigators who do not read the language of a site may be unable to understand its content.
In this article, we address the aforementioned problems by proposing and implementing a semiautomated methodology for collecting and analyzing Dark Web information. Leveraging human precision and machine efficiency, the methodology consists of several steps, including collection, filtering, analysis, and visualization of Dark Web information. We used this comprehensive methodology to collect and analyze data from 39 Arabic terrorist Web sites and conducted an evaluation of the results. This research aimed to study the extent to which the methodology can assist terrorism analysts in collecting and analyzing Dark Web information. From a broader perspective, this research contributes to the development of the new science of Intelligence and Security Informatics (ISI), the study of the use and development of advanced information technologies, systems, algorithms, and databases for national security related applications through an integrated technological, organizational, and policy based approach (Chen, 2005; Strickland & Hunt, 2005). We believe that many existing computer and information systems techniques need to be reexamined and adapted for this unique domain to create new insights and innovations.
The rest of this paper is structured as follows. The second section presents a review of terrorists' use of information technologies to facilitate terrorism, information services for studying terrorism, and advanced techniques for collecting and analyzing terrorism information. The third section describes a methodology for collecting and analyzing Dark Web information. The fourth section illustrates the use of the methodology in a case study of Jihad on the Web (where "Jihad" is an Islamic term referring to a holy war waged against enemies) and discusses the evaluation results. The last section concludes the study and discusses future directions.
2. Literature Review
2.1. Terrorists' Use of the Web
Recent studies have shown how terrorists use the Web to facilitate their activities. Tsfati and Weimann used the names of terrorist organizations to query six search engines and found 16 relevant sites in 1998 and 29 such sites in 2002 (Tsfati & Weimann, 2002). Their analysis of site content revealed heavy use of the Web by terrorist organizations to share ideology, to provide news, and to justify the use of violence. Relying on open source information (e.g., court testimony, reports, Web sites), researchers at the Institute for Security Technology Studies identified five categories of terrorist use of the Web (Technical Analysis Group, 2004): propaganda (to disseminate radical messages); recruitment and training (to encourage people to join the Jihad and get online training); fundraising (to transfer funds and to conduct credit card fraud and other money laundering activities); communications (to provide instruction, resources, and support via email, digital photographs, and chat sessions); and targeting (to conduct online surveillance and identify vulnerabilities of potential targets such as airports). Among these, using the Web as a propaganda tool has been widely observed.
Identified by the U.S. Government as a terrorist site, Alneda.com called itself the "Center for Islamic Studies and Research," a bogus name, and provided information for Al Qaeda (Thomas, 2003). To group members (insiders), terrorists use the Web to share motivational stories and descriptions of operations. To mass media and non-members (outsiders), they provide analysis of and commentary on recent events on their Web sites. For example, Azzam.com urged Muslims to travel to Pakistan and Afghanistan to fight the "Jewish-backed American Crusaders." Qassam.net appealed for donations to purchase AK-47 rifles (Kelley, 2002). Al Qaeda and some humanitarian relief agencies used the same bank accounts via www.explizit-islam.de (Thomas, 2003).
Terrorists also share ideologies on the Web that provide religious commentaries to legitimize their actions. Based on a study of 172 members participating in the global Salafi Jihad, Sageman concluded that the Internet has created a concrete bond between individuals and a virtual religious community (Sageman, 2004). His study reveals that the Web appeals to isolated individuals by easing loneliness through connections to people sharing some commonality. Such a virtual community offers a number of advantages to terrorists. It is no longer tied to any nation, fostering a priority of fighting against the "far enemy" (e.g., the United States) rather than the "near enemy." Internet chat rooms tend to encourage extreme, abstract, but simplistic solutions, thus attracting most potential Jihad recruits, who are not Islamic scholars. The anonymity of Internet cafés also protects the identity of terrorists. However, Sageman does not consider the Internet to be a direct contact with Jihad, because devotion to Jihad must be fostered by an intense period of face-to-face interaction. In addition, existing studies of terrorists' use of the Web mostly rely on a manual approach to analyze voluminous data. Such an approach does not scale to the rapid growth of the Web and the frequent change of terrorists' identities on the Web.
2.2. Information Services for Studying Terrorism
Despite the public nature of the Web, terrorists often try to prevent authorities from tracing their Web addresses and activities, which has prompted several information services
to monitor the Web sites of militant Islamic groups and to provide access to translated versions of information posted there. The "Jihad and Terrorism Project" was developed by the Middle East Media Research Institute to bridge the language gap between the West and the Middle East by providing timely translations of Arabic, Farsi, and Hebrew documents (Middle East Media Research Institute, 2004). The Project for the Research of Islamist Movements (www.e-prism.org) studies radical Islam and Islamist movements, focusing primarily on Arabic sources. These projects provide access to an array of information such as translated news stories, transcripts, video clips, and training documents produced by terrorists, but they fall short of supporting analysis and visualization of terrorist data from the Dark Web (Project for the Research of Islamist Movements, 2004).
2.3. Advanced Information Technologies for Combating Terrorism
Since the 9/11 attacks, there has been increased interest in using information technologies to counter terrorism. A study conducted by the U.S. Defense Advanced Research Projects Agency shows that their collaboration, modeling, and analysis tools sped up analysis (Popp, Armour, Senator, & Numrych, 2004), but these tools were not tailored to collecting and analyzing Web information. Although new approaches to terrorist network analysis have been called for (Carley, Lee, & Krackhardt, 2001), existing efforts have remained mostly small scale; they have used manual analysis of a specific terrorist organization and did not include resources generated by terrorists in their native languages. For instance, Krebs manually collected data from English news releases after the 9/11 attacks and studied the network surrounding the 19 hijackers (Krebs, 2001). Although automated social network analysis techniques have been proposed to analyze and portray criminal networks, it is not clear whether the techniques are applicable to the mostly unstructured data in terrorist Web sites, which contain textual and multimedia data (Xu & Chen, 2005). Their use of structured data in a police department database also does not help in understanding terrorist Web sites. Other advanced information technologies having the potential to help analyze terrorist data on the Web include information visualization and Web mining.
Information visualization technologies have been used in many domains (Zhu & Chen, 2005) such as criminal analysis (Chung, Chen, Chaboya, O'Toole, & Atabakhsh, 2005) and business stakeholder analysis (Chung, 2007). For example, multidimensional scaling (MDS) algorithms consist of a family of techniques that portray a data structure in a spatial fashion, where the coordinates of data points are calculated by a dimensionality reduction procedure (Young, 1987). MDS has been used in many different applications. Chung and his colleagues developed a new browsing method based on MDS to depict the competitive landscape of businesses on the Web (Chung, Chen, & Nunamaker, 2005). He and Hui applied MDS to displaying author cluster maps in their author co-citation analysis (He & Hui, 2002). Eom and Farris applied MDS to author co-citation in the decision support systems (DSS) literature from 1971 through 1990 in order to find contributing fields to DSS (Eom & Farris, 1996). Kealy applied MDS to studying changes in knowledge maps of groups over time to determine the influence of a computer-based collaborative learning environment on conceptual understanding (Kealy, 2001). Although much has been done in different domains to visualize relationships of objects using MDS, we found no attempts to apply it to discovering terrorists' use of the Web.
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services (Chen & Chau, 2004; Etzioni, 1996). Chen and his colleagues (Chen, Fan, Chau, & Zeng, 2001) showed that the approach of integrating meta-searching with textual clustering tools achieved high precision in searching the Web. Web page classification, a process of automatically assigning Web pages to predefined categories, can be used to assign pages to meaningful classes (Mladenic, 1998). Web page clustering, a process of identifying naturally occurring subgroups among a set of Web pages, can be used to discover trends and patterns within a large number of pages (Chen, Schuffels, & Orwig, 1996). Although a number of Web mining technologies exist (e.g., Chen & Chau, 2004; Last, Markov, & Kandel, 2006), there has not yet been a comprehensive methodology to address problems of collecting and analyzing terrorist data on the Web. Unfortunately, existing frameworks using data and text mining techniques (e.g., Nasukawa & Nagano, 2001; Trybula, 1999) do not address issues specific to the Dark Web.
To our knowledge, few studies have used advanced Web and data mining technologies to collect and analyze terrorist information on the Web, though these technologies have been widely applied in such other domains as business and scientific research (e.g., Chung et al., 2004; Marshall, McDonald, Chen, & Chung, 2004). New approaches to collecting and analyzing terrorist information on the Web are needed.
3. A Methodology for Collecting and Analyzing Dark Web Information
3.1. The Methodology
To address threats from the wide range of information sources that terrorists and extremists use to spread their ideas and to conduct destructive activities, we have proposed a semiautomated methodology integrating various information collection and analysis techniques and human domain knowledge. Figure 1 shows the methodology, which aims to assist human investigators effectively in obtaining Dark Web intelligence through information sources, collection methods, filtering, and analysis.
Information sources consist of a wide range of providers of terrorist or terrorism information on the Web. Some of these are readily accessible (e.g., search engines), while others, such as terrorism incident databases and Web sites developed and maintained by terrorists and their supporters, can be reached only with the help of domain experts.
[Figure 1 diagram. The Dark Web within the Web (domestic and international terrorism; hate groups, racial supremacy, suicidal attackers, activists/extremists, anti-government, ...). Information sources: terrorist group Web sites, search engines, propaganda Web sites, publications on terrorism. Collection methods (Section 4.1.1): domain spidering, back-link search, group/personal profile search, meta-searching, downloading from Internet archives and forums; supported by searching, browsing, and spidering. Filtering (Section 4.1.2): domain knowledge, linguistic knowledge. Analysis (Section 4.1.3): indexing, extraction, classification, clustering, visualization; supporting verification, group profiling, showing relationships, and analyzing dynamics.]
FIG. 1. A methodology for collecting and analyzing Dark Web information.
Collection methods make possible automatic searching, browsing, and harvesting of information from identified sources. Domain spidering starts with a set of relevant seed URLs and relies on an automatic Web page collection program, often called a spider or crawler, to harvest Web pages linked to the seed URLs. Back-link search, supported by some search engines such as Google (www.google.com) and AltaVista (www.altavista.com, acquired by Overture, which was then acquired by Yahoo! in 2003), allows searching for Web pages that have hyperlinks pointing to a target Web domain or page. It helps investigators trace the activities of terrorist supporters and sympathizers, whose Web pages often reference terrorist sites (e.g., to glorify martyrs' actions or to show concurrence with terrorist attacks). Group/personal profile search, exemplified by major Web portals such as Yahoo! (members.yahoo.com) and MSN (groups.msn.com), reveals the profiles of groups or individuals who share the same interests. Terrorists and their supporters may put hot links in their profiles, which allow investigators to discover hidden linkages. Meta-searching uses related keywords as input to query multiple search engines, from which investigators or automated programs can collate top-ranked results and filter out duplicates to obtain highly pertinent URLs of terrorist Web sites. With careful formulation of search terms and appropriate linguistic knowledge, they can obtain highly relevant results. For example, searching the Arabic name of Usama Bin Laden in multiple search engines returns mixed results about terrorist news articles and terrorist Web sites, while augmenting "Usama Bin Laden" with the keyword "Sheikh" ("head of tribe" or "leader" in Arabic), which is frequently used by Al Qaeda to refer to Bin Laden, can give more relevant terrorist and supporter Web sites. Downloading from Internet archives and forums exploits the temporal dimension of Web information. For instance, the Internet Archive (www.archive.org) offers access to historical snapshots of Web sites. Usenet discussion forums provide a wealth of textual communication that can be mined for hidden patterns over time.
Filtering involves sifting through collected information and removing irrelevant results, a task that requires domain knowledge and linguistic knowledge. Domain knowledge refers to knowledge about terrorist groups, their relationships with other terrorist and supporter groups, their presence on and usage of the Web, as well as their histories, activities, and missions. Linguistic knowledge deals with terms, slogans, and other textual and symbolic clues in the native languages of the terrorist groups. Filtering can be automatic or manual, depending on the required efficiency of the process and precision of the results. Typically, manual filtering achieves high precision, but it is less efficient and relies on domain experts who have had years of experience in the field. Automatic filtering is very efficient, as it often uses computers and machine learning to process large amounts of data, but the results are less precise. Investigators can obtain high-quality data for analysis from filtered repositories.
Analysis provides insights into data and helps investigators identify trends and verify conjectures. Several functions support these analytical tasks. Indexing relates textual terms to individual Web pages, thereby supporting precise searching of the pages. Extraction identifies meaningful entities such as terrorist names, frequently used slogans, and suspicious terms. Classification finds common properties among entities
and assigns them to predefined categories to help investigators predict trends of terrorist activities. Clustering organizes entities into naturally occurring groups and helps to identify similar terrorist groups and their supporters. Visualization presents voluminous data in a format perceivable by human eyes, so investigators can picture the relationships within a network of terrorist groups and can recognize their underlying structure.
3.2. Discussion of the Methodology
Although the Internet has been publicly available since the 1990s, the Dark Web emerged only in recent years. The lack of a useful methodology designed for Dark Web data collection and analysis has limited the capability to fight terrorism. As discussed above, the proposed methodology incorporates various data and Web mining technologies while still allowing human domain knowledge to guide their application. Its semiautomated nature combines machine efficiency with the advantages of human precision, a useful complement to computers, which usually fail to detect deception and ambiguity on the Dark Web. Its coverage of a wide variety of data sources and techniques supports comprehensive Dark Web data collection, a challenge often faced by terrorism and intelligence analysts. Therefore, the methodology and its integration and application of data and Web mining technologies to Dark Web analysis are novel contributions to ISI research.
4. Jihad on the Web: A Case Study
To demonstrate the value and usability of our methodology, we applied it to collecting and analyzing the use of the Web for Jihad, an Islamic term referring to a holy war waged against enemies as a religious duty. Believers contend that those who die in Jihad become martyrs and are guaranteed a place in paradise. In recent decades, the concept of Jihad has been used as an ideological weapon to combat Western influences and secular governments and to establish an ideal Islamic society (Encyclopedia Britannica Online, 2007). Jihad supporters are closely related to terrorist groups while maintaining anonymity using the Web. For example, prior to the 9/11 attacks, Al-Qaeda members sent each other thousands of messages in a password-protected section of an extreme Islamic Web site (Anti-Defamation League, 2002). Terrorist groups such as Hamas, Hizbollah, and Palestinian Islamic Jihad also use Web sites as propaganda tools. We describe the steps of applying the methodology as follows (see Figure 1). The data described below were collected in 2004.
4.1. Application of the Methodology
4.1.1. Collection. To collect data, we first identified four suspicious URLs by searching the Web, consulting published terrorism reports, and performing personal profile searches on Yahoo. (For example, we searched "hizbollah" in Google and found its URL among the top-ranked results.) These URLs belong to Palestinian Islamic Jihad (PIJ; www.qudsway.com), Hizbollah (www.hizbollah.org), the military wing of Hamas (www.ezzedeen.net), and an Arabic Web site with a pro-Jihad forum (www.al-imam.net). A 2003 U.S. Department of State report confirmed PIJ, Hizbollah, and Hamas to be terrorist or terrorist-affiliated groups (Department of State, 2003). Though Al-Imam.net is not classified as a terrorist organization, it contains pro-Jihad forums in which messages and links to terrorist Web sites are posted. We then used the back-link search function of Google to obtain several hundred URLs that point to the four suspicious URLs. Because Dark Web information can be scattered across many different sources and can change quickly over time, using several methods to identify the four initial URLs enabled us to cover a broader scope and more timely content than relying only on published reports (e.g., the U.S. Department of State's annual report). While different initial URLs and different times of data collection could affect the content of the data collected, we believe that the choice of the four URLs is representative of the Dark Web. It would be an interesting future direction to study the extent to which data collection affects the quality of analysis results.
4.1.2. Filtering. We conducted two rounds of filtering. First, we manually filtered out unrelated sites, such as news or governmental Web sites that only report or discuss terrorist activities, religious Web sites with no reference to Jihad or violence, and political Web sites with no mention or approval of terrorist activities. We retained Web sites of terrorist organizations, those of terrorist leaders, and those that praise terrorists or their actions. Forty-six sites remained after this round of filtering.
Second, with the help of a native Arabic speaker (who is not a terrorism expert), we manually added 14 terrorist and supporter sites identified by querying Google with keywords (in Arabic) that we had found in the terrorist and supporter sites. Such keywords included the leaders' and organizations' names in Arabic (mojahedin iran, markazdawa, etc.). To limit the scope of analysis, we considered only the top 50 results returned by the search engine for each query. In addition, we manually removed 21 sites from the set of all sites obtained, based on their relevance to the domain. This round of filtering and refining resulted in 39 Arabic Web sites: 24 terrorist sites and 15 supporter sites.
4.1.3. Analysis. We performed clustering, classification, and visualization on the 94,326 Web pages collected by crawling the 39 terrorist and supporter sites using an exhaustive breadth-first search spidering program (with a maximum depth of 10 levels). The first analysis task we performed was clustering, for which the input was the 46 Web sites identified in the first round of filtering (see paragraph 1 of Section 4.1.2). The clustering involves calculating a similarity between each pair of Web sites in our collection to uncover hidden Web communities. We define similarity to be a real-valued multivariable function of the number of hyperlinks in
one Web site (A) pointing to another Web site (B), and the number of hyperlinks in the latter site (B) pointing to the former site (A). In addition, a hyperlink is weighted inversely to how deep it appears in the Web site hierarchy. For instance, a hyperlink appearing on the homepage of a Web site is given a higher weight than hyperlinks appearing at a deeper level. Specifically, the similarity between Web sites A and B is calculated as follows:
$$\mathrm{Similarity}(A, B) = \sum_{\text{all links } L \text{ between } A \text{ and } B} \frac{1}{1 + lv(L)}$$
where lv(L) is the level of link L in the Web site hierarchy, with the homepage as level 0 and the level increasing by 1 with each step down the hierarchy. Using these heuristics, a computer program automatically extracted hyperlinks from Web pages and calculated the similarities between sites.
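
The breadth-first crawl and the link-weighted similarity defined above can be sketched together. This is an illustrative reading, not the authors' program: it assumes that the level of a link is the breadth-first depth of the page on which the link appears, that two URLs belong to the same site when their host names match, and that the pages have already been fetched into an in-memory link graph (the `graph` dictionary and the example.com-style URLs are hypothetical).

```python
from collections import deque
from urllib.parse import urlparse


def host(url):
    """Host name of a URL, used here as the identity of a Web site."""
    return urlparse(url).netloc.lower()


def crawl_levels(graph, homepage, max_depth=10):
    """Breadth-first walk of one site; returns {page URL: level}.

    graph: dict mapping a page URL to the list of URLs it links to.
    The homepage is level 0; pages at max_depth are kept but not expanded.
    """
    site = host(homepage)
    levels = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        if levels[page] >= max_depth:
            continue
        for target in graph.get(page, []):
            if host(target) == site and target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


def directed_weight(graph, homepage_a, homepage_b, max_depth=10):
    """Sum of 1 / (1 + lv(L)) over links L from site A to site B."""
    site_b = host(homepage_b)
    total = 0.0
    for page, level in crawl_levels(graph, homepage_a, max_depth).items():
        for target in graph.get(page, []):
            if host(target) == site_b:
                total += 1.0 / (1.0 + level)
    return total


def similarity(graph, homepage_a, homepage_b, max_depth=10):
    """Similarity(A, B): weighted links in both directions."""
    return (directed_weight(graph, homepage_a, homepage_b, max_depth)
            + directed_weight(graph, homepage_b, homepage_a, max_depth))


# Hypothetical two-site link graph.
graph = {
    "http://a.example/": ["http://a.example/news", "http://b.example/"],
    "http://a.example/news": ["http://b.example/about"],
    "http://b.example/": ["http://b.example/links"],
    "http://b.example/links": ["http://a.example/"],
}
print(similarity(graph, "http://a.example/", "http://b.example/"))
```

In this toy graph, site A links to site B once from its homepage (weight 1) and once from a level-1 page (weight 1/2), and site B links back once from a level-1 page (weight 1/2).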
In the second analysis task, we classified the sites by their affiliations with terrorist groups, ideologies, and religions, and by their Web site attributes. Our native Arabic speaker manually identified the affiliations of all the Web sites according to their site content. Although this step relied on an Arabic speaker, the components of the methodology are generic enough to be applicable to other domains. The choice of this particular Arabic speaker (who, again, is not a terrorism expert) also would not affect the results. Table 1 shows the details of the Web sites and their affiliations.
In addition to using affiliations, we classified the sites by indicating how terrorists and their supporters use the Web to facilitate their activities. From our literature review, we identified six types of terrorist use of the Web and 27 unique Web site attributes. Table 2 presents these attributes categorized under the six types. Following this coding scheme, the Arabic speaker manually read through all the subject Web pages to record terrorist uses of the Web. As in a study of the openness of government Web sites (La Porte, Jong, & Demchak, 1999), our coding recorded only whether an attribute existed on a Web site (i.e., binary scoring). Manual coding of each Web site required 45 minutes to 1 hour.
To reveal patterns of terrorist Web site existence and the degree of a site's activities, we performed two types of visualization in the third analysis task: multidimensional scaling and snowflake visualizations.
Multidimensional scaling visualization provided a high-level picture of all the terrorist groups and their relationships. We used multidimensional scaling (MDS) to transform a high-dimensional similarity matrix into a set of two-dimensional coordinates (Young, 1987). While other visualization techniques might have been applicable, we chose MDS because it suits the current data structure and provides a vivid picture summarizing terrorist groups' relationships. Figure 2 shows these relationships: the sites appear as nodes, and lines connect pairs of sites that have at least one hyperlink pointing from one site to the other. Using the similarity matrix as input, the MDS algorithm calculated coordinates for each site and placed the sites in a two-dimensional space where proximity reflects similarity. Upon closer examination of the figure, seven clusters of sites emerge. (The numbers in parentheses refer to the sites in Table 1. Sites listed by URL were removed in the second round of filtering but appeared in the collection after the first round.)
(1) Hizballah Cluster (# 7, 11, 12, hizbollah.org, and intiqad.org) contains the Web site of the Hizballah group (www.hizbollah.org) and its affiliated sites such as Hizbollah E-magazine (www.intiqad.org), Hizbollah Support Association (#11), and the site of Sayyed Hassan Nasrollah (#12), a major leader of Hizbollah.
(2) Palestinian Cluster (# 4, 5, 6, 9, 13, 14, 15, 36, and h4palestine.com) includes militant groups fighting against Israel (e.g., Al-Aqsa Martyrs Brigade, Hamas). There are links between sites of the same group (e.g., # 4 and 14) and links between sites of different groups (e.g., # 9 and 6).
(3) Al Qaeda Cluster (# 26, 28, 31, 35, 37, and sahwah.com) includes Web sites of supporters of Salafi groups, which often are linked to each other in their "Other friendly Web sites" section. They use their Web sites heavily to propagate their ideology. For example, Al-ansar.biz posted a video of the beheading of Nicholas Berg, one of the first civilians killed by terrorists (Newman, 2004). Alsakifah.org provides an online discussion forum.
(4) Caucasian Cluster (# 10, 34, kavkazcenter.com, kavkaz.tv, kavkazcenter.net, and kavkazcenter.info) consists of Web sites that link to Chechen rebels and provide news updates from Chechen areas. For example, Qoqaz.com has documented operations against the Russian military.
(5) Jihad Supporters (# 29, 30, 32, 33, clearguidance.blogspot.com, and ummanews.com) consist of Web sites providing news and general information on the global Jihad movement. These sites rarely are linked to each other and often play a propaganda role that targets outsiders.
(6) Hizb-Ut-Tahrir (# 27, hizb-ut-tahrir.org, expliciet.nl, khilafah.com, and hilafet.com) contains sites of a non-terrorist political group, Hizb-Ut-Tahrir, dedicated to the restoration of Islamic law and Khilafah (global leadership of Muslims). It has a presence in many Arab countries (e.g., Lebanon, Jordan) and some European countries. For instance, Expliciet.nl is a Dutch Web site based in the Netherlands.
(7) Tanzeem-e-Islami Cluster (tanzeem.org) consists of a single site representing the Pakistani Tanzeem-e-Islami party, with no clear ties to terrorism.
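
The MDS step that precedes the cluster list, turning a site-by-site similarity matrix into two-dimensional coordinates, can be sketched with classical (Torgerson) MDS. The paper does not say which MDS variant was used or how similarities were converted to distances, so both are assumptions here: the sketch uses the conversion d = 1 / (1 + s) and a plain power iteration for the two leading eigenvectors. It is written in pure Python for self-containment; a numerical library would normally be preferred.

```python
import math


def similarity_to_distance(sim):
    """Assumed conversion: higher similarity -> smaller distance."""
    n = len(sim)
    return [[0.0 if i == j else 1.0 / (1.0 + sim[i][j]) for j in range(n)]
            for i in range(n)]


def double_center(dist):
    """B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def _matvec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


def _top_eigenpair(mat, iters=500):
    """Power iteration for the eigenvalue of largest magnitude."""
    n = len(mat)
    vec = [1.0 + 0.1 * i for i in range(n)]  # fixed, non-constant start
    for _ in range(iters):
        nxt = _matvec(mat, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    value = sum(v * w for v, w in zip(vec, _matvec(mat, vec)))
    return value, vec


def classical_mds(dist, dims=2):
    """Return one coordinate list per item; stops at non-positive eigenvalues."""
    n = len(dist)
    b = double_center(dist)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        value, vec = _top_eigenpair(b)
        if value <= 1e-12:
            break
        scale = math.sqrt(value)
        for i in range(n):
            coords[i][d] = scale * vec[i]
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Hypothetical similarity matrix for three sites.
sims = [[0.0, 2.0, 0.5], [2.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
layout = classical_mds(similarity_to_distance(sims))
```

Sites with many shallow links between them receive small distances and therefore land close together in the layout, which is the property the cluster reading above relies on.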
Snowflake visualization supports analysis of different dimensions (or categories) of activities of a Web site cluster. It originates from the star plot, which has been widely used to display multivariate data (Chambers, Cleveland, Kleiner, & Tukey, 1983). A snowflake shown in Figure 2 represents a terrorist site cluster. Figure 3 shows five snowflake diagrams, each representing the degree of activity of terrorist/supporter groups in the five terrorist clusters (Clusters 1-5) described above. (Clusters 6 and 7 are not included because they do not contain terrorist sites.) The six sides of a snowflake represent the six dimensions of terrorist use of the Web, as shown in Table 2 and explained above. Each of these six dimensions is measured on a normalized scale between 0 and 1 (the activity index), showing the degree of activity on that dimension.
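The geometry of a snowflake (star plot) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the six axes are spaced evenly around a circle and that each vertex sits at a distance from the center equal to the activity index on that axis. The function name `snowflake_vertices`, the axis order, and the example values are all hypothetical.

```python
import math

# Assumed axis order; the six dimension names come from Table 2.
DIMENSIONS = [
    "Communications",
    "Fundraising",
    "Sharing ideology",
    "Propaganda (insiders)",
    "Propaganda (outsiders)",
    "Virtual community",
]


def snowflake_vertices(activity):
    """Return the (x, y) vertices of a star plot, one per dimension.

    `activity` maps each dimension name to an activity index in [0, 1].
    Axis k points at angle 2*pi*k/6 (an assumed layout), and the vertex
    lies at a distance from the origin equal to the index.
    """
    vertices = []
    for k, name in enumerate(DIMENSIONS):
        value = activity[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}: activity index must lie in [0, 1]")
        angle = 2 * math.pi * k / len(DIMENSIONS)
        vertices.append((value * math.cos(angle), value * math.sin(angle)))
    return vertices


# Illustrative values only; they are not taken from the study.
example_values = [0.5, 0.1, 0.9, 0.7, 0.4, 0.2]
example = dict(zip(DIMENSIONS, example_values))
points = snowflake_vertices(example)
```

Connecting the returned points in order yields the snowflake outline; a larger polygon along an axis indicates more activity on that dimension.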
1352 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND
TECHNOLOGYJune 2008DOI: 10.1002/asi
TABLE 1. Analysis of Jihad terrorist groups and their supporters' sites.
No. | Name | URL (a) | Description (b) | Terrorist group (c) | Religion
Terrorist Groups' Web Sites (total: 24)
1 | Special Force | www.specialforce.net | Provides computer game replicating the fighting scenes between Lebanese resistance and Israeli occupiers | Hizballah | Shia Muslim
2 | Palestine Info in Urdu | palestine-info-urdu.com | Hamas news Web site in Urdu | Hamas | Sunni Muslim
3 | Al-Manar | web.manartv.org | The Web site of Al-Manar, the TV channel of Lebanese Hizballah | Hizballah | Shia Muslim
4 | Abrarway | www.abrarway.com | News Web site of Islamic Jihad of Palestine Guerrilla group | Palestinian Islamic Jihad | Sunni Muslim
5 | Islamic Jihad Mail | www.jimail.com | News Web site of Islamic Jihad of Palestine Guerrilla group | Palestinian Islamic Jihad | Sunni Muslim
6 | Ezz-al-dine Al-Qassam | www.ezzedeen.net | A general portal of Izz-Edeen Al-Qasam | Hamas | Sunni Muslim
7 | Hizbollah | www.hizbollah.tv | The official Web site of Hizballah Organization | Hizballah | Shia Muslim
8 | Info Palestina | www.infopalestina.com | Hamas information and news Web site in Malay | Hamas | Sunni Muslim
9 | Kataeb Al Aqsa | www.kataebalaqsa.com | The official Web site of Al Aqsa Martyrs Brigades | Al-Aqsa Martyrs Brigade | Secular
10 | Kavkaz | www.kavkaz.org.uk | The news Web site of Chechen guerrilla fighters | Islamic International Brigade, Special Purpose Islamic Regiment, Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs | Sunni Muslim
11 | Moqawama | www.moqawama.tv | Web site of the Hizballah's support group | Hizballah | Shia Muslim
12 | Nasrollah | www.nasrollah.org | Hizballah leader's site (Sheikh Hassan Nasrollah) | Hizballah | Shia Muslim
13 | Alshohada | www.b-alshohda.com | Web site of Hamas and Islamic Jihad dedicated to martyrs | Hamas, Palestinian Islamic Jihad | Sunni Muslim
14 | Quds Way | www.qudsway.com | Provides general news of Islamic Jihad of Palestine | Palestinian Islamic Jihad | Sunni Muslim
15 | Rantisi | www.rantisi.net | Web site of Abdel Aziz Al Rantisi, a Hamas leader | Hamas | Sunni Muslim
16 | People's Mojahedin of Iran | www.iran.mojahedin.org | Web site posting statements by the People's Mojahedin Organization | Mujahedin-e Khalq Organization | Secular
17 | National Council of Resistance of Iran | www.iranncrfac.org | Official Web site of the Foreign Affairs Committee of the National Council of Resistance of Iran | Mujahedin-e Khalq Organization | Secular
18 | Iranian People's Fadaee Guerrillas | www.siahkal.com | The memorial Web site of the Iranian People's Fadaee Guerrillas | Mujahedin-e Khalq Organization | Secular
19 | The Organization of Iranian People's Fedaian | www.fadai.org | The Organization of Iranian People's Fedaian (Majority) official Web site | Mujahedin-e Khalq Organization | Secular
20 | Organization of Iranian People's Fedayee Guerrillas | www.fadaian.org | Organization of Iranian People's Fedayee Guerrillas memorial Web site | Mujahedin-e Khalq Organization | Secular
21 | The Union of People's Fedaian of Iran | www.etehadefedaian.org | News and information Web site of the Union of People's Fedaian of Iran | Mujahedin-e Khalq Organization | Secular
22 | Revolutionary People's Liberation Front | www.dhkc.net | Revolutionary People's Liberation Front official Web site. Provides news and statements of the organization | Revolutionary People's Liberation Army/Front | Secular
23 | DHKC International | www.dhkc.info | Web site of DHKC in Turkish | Revolutionary People's Liberation Army/Front | Secular
24 | Crusade Begins | jorgevinhedo.sites.uol.com.br | The Brazil-based Web site links to Lashkar-e-Taiba, a terrorist organization based in Pakistan | Lashkar-e Tayyiba | Sunni Muslim
Supporters' Web Sites (total: 15)
25 | Al Ansar | www.al-ansar.biz | Provides support to Al Qaeda organization, as well as articles about the Salafi Sunni Ideology | Al Qaeda | Sunni Muslim
26 | Alokab | www.alokab.com | Provides articles about the Salafi Sunni Ideology and the Jihadist movement | Al Qaeda | Sunni Muslim
27 | Alsakifah Forum | www.alsakifah.org | Provides educational services and a forum dedicated to the discussion of the Salafi Ideology | Al Qaeda | Sunni Muslim
28 | Cihad | www.cihad.net | A general Jihad Web site providing information about all Jihad activities around the world | Al Qaeda | Sunni Muslim
29 | Clear Guidance Forum | www.clearguidance.com | Forum of Jihad supporters | Al Qaeda | Sunni Muslim
30 | Sheikh Hamid Bin Abdallah Al Ali | www.h-alali.net | Salafi Educational Web site with some Jihad ideas | Al Qaeda | Sunni Muslim
31 | Jihadunspun | www.jihadunspun.com | Pro-Jihad news Web site | Al Qaeda | Sunni Muslim
32 | Maktab-Al-Jihad | www.maktab-al-jihad.com | Pro-Jihad news Web site | Al Qaeda | Sunni Muslim
33 | Qoqaz | www.qoqaz.com | Jihad news from the Caucasus | Islamic International Brigade, Special Purpose Islamic Regiment, Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs | Sunni Muslim
34 | Supporters of Shareeah | www.shareeah.org | A general portal dedicated to the Jihadist movement | Al Qaeda | Sunni Muslim
35 | Moltaqa | www.almoltaqa.org | Hamas Forum | Hamas | Sunni Muslim
36 | Saraya | www.saraya.com | Pro-Jihad Web site | Al Qaeda | Sunni Muslim
37 | Osama Bin Laden | 1osamabinladen.5u.com | A Web site dedicated to Osama Bin Laden | Al Qaeda | Sunni Muslim
38 | Tawhed | www.tawhed.ws | Pro-Jihad Web site | Al Qaeda | Sunni Muslim
39 | The Right Word | www.rightword.net | Pro-Al Qaeda Web Portal | Al Qaeda | Sunni Muslim
(a) Some of the URLs and sites may have been changed at the time of reading due to the rapid change of the Dark Web.
(b) The descriptions are obtained from the Web sites.
(c) Descriptions of these terrorist groups appear in the U.S. Department of State Report Pattern of Global Terrorism, 2002.
TABLE 2. Categories of terrorist use of the Web and Web site attributes.
Category | Attribute | Description
Communications | E-mail | Any listed e-mail address or feedback form.
Communications | Telephone (including Web phone) | Telephone numbers of organization officials.
Communications | Multimedia tools | Video clips of bombings and other activities. Video, sound recording & game (e.g., leaders' messages and instructions).
Communications | Online feedback form | Allow the user to give feedback or ask questions to the Web site owners and maintainers.
Communications | Documentation | Report, book, letter, memo and other resources provided (e.g., in pdf, Word, and Excel formats).
Fundraising | External aid mentioned | Other groups or governments supporting the organization.
Fundraising | Fund transfer | Fund transfer methods.
Fundraising | Donation | Donations under the form of direct bank deposits.
Fundraising | Charity | Donations to religious welfare organizations associated with terrorist organization.
Fundraising | Support groups | Suborganizational structures charged with the fundraising program.
Fundraising | Others | Other attributes belonging to this category.
Sharing ideology | Mission | The major goals of the organization (e.g., destruction of an enemy state, liberation of occupied territories).
Sharing ideology | Doctrine | The beliefs of the group (e.g., religious, communist, extreme right).
Sharing ideology | Justification of the use of violence | Ideology condones the use of violence to accomplish goals (e.g., suicide bombing).
Sharing ideology | Pinpointing enemies | Classifies others as either enemies or friends (e.g., U.S. is enemy, Taliban regime is friendly).
Propaganda (insiders) | Slogans | Short phrases with religious or ideological connotations.
Propaganda (insiders) | Dates | Mentions dates in the history of the terrorist group, such as the date of a major attack.
Propaganda (insiders) | Martyrs' description | Lists the names of members who died in terrorism related operations or descriptions of the circumstances.
Propaganda (insiders) | Leaders' name(s) | Terrorist group's leader(s) name as claimed by the Web site.
Propaganda (insiders) | Banner and seal | Banner depicting representative figures, graphical symbols, or seals of the organization.
Propaganda (insiders) | Narratives of operations and events | Provides narratives of the operations and attacks of the group.
Propaganda (insiders) | Others | Other attributes belonging to this category.
Propaganda (outsiders) | Reference to media coverage | For example, the Web site criticizes Western media coverage of events with explicit mention of outlets of events such as CNN, CBS.
Propaganda (outsiders) | News reporting | Group's own interpretation of events.
Virtual community | Listserv | Automatic mailing list server that broadcasts to everyone on the list.
Virtual community | Text chat room | Virtual room where a chat session takes place. Text messaging chat session such as ICQ.
Virtual community | Message board | Allows members to post and read messages online.
Virtual community | Web ring | A series of Web sites linked together in a ring that by clicking through all of the sites in the ring the visitor will eventually come back to the originating site.
[Figure 2: map of the Web sites, identified by their Table 1 numbers, grouped into labeled clusters: 1. Hizballah Cluster; 2. Palestinian Cluster; 3. Al-Qaeda Cluster; 4. Caucasian Cluster; 5. Jihad Supporters; 6. Hizb-Ut-Tahrir Cluster.]
FIG. 2. Clustering and visualization of terrorist Web sites (the numbers refer to those appearing in Table 1).
The activity index of Cluster c on dimension d was calculated by the following formula:
\[
\text{Activity Index}(c, d) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} w_{i,j}}{m \times n}
\]
where
\[
w_{i,j} =
\begin{cases}
1 & \text{if attribute } i \text{ occurs in Web site } j \\
0 & \text{otherwise}
\end{cases}
\]
n = total number of attributes in the specified dimension d; m = total number of Web sites belonging to the specified Cluster c.
The closer the activity index is to 1, the more active a cluster is on that dimension. This index reveals in what areas the terrorist groups are active and hence provides investigators and analysts with clues about how to devise strategies to combat a group.
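As a worked illustration of the activity index defined above, the short sketch below computes it from a binary attribute-by-site matrix. The function name `activity_index` and the toy matrix are assumptions made for illustration; they are not part of the original study.

```python
def activity_index(occurrence):
    """Compute the activity index of one cluster on one dimension.

    `occurrence[i][j]` is 1 if attribute i occurs in Web site j and 0
    otherwise, so the matrix has n rows (attributes in the dimension)
    and m columns (Web sites in the cluster).
    """
    n = len(occurrence)
    if n == 0:
        raise ValueError("at least one attribute is required")
    m = len(occurrence[0])
    if m == 0 or any(len(row) != m for row in occurrence):
        raise ValueError("every attribute row needs the same, nonzero number of sites")
    total = sum(sum(row) for row in occurrence)
    return total / (m * n)


# Toy data (hypothetical): 4 attributes coded across 3 Web sites.
w = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]
index = activity_index(w)  # 6 occurrences / (3 sites * 4 attributes) = 0.5
```

Because the sum is divided by the number of attribute-site pairs, the index is the fraction of possible attribute occurrences actually observed, which is why it always falls between 0 and 1.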
4.2. Results and Discussions
Our preliminary observations show that the methodology yielded promising results. For example, it identified Web sites affiliated with 10 of the 26 groups classified as Jihad terrorist organizations in the U.S. State Department report on terrorism. Al-Ansar.biz (# 26), the site that posted the beheading video of Nicholas Berg, posted messages from Al Qaeda leaders such as Osama Bin Laden, Ayman Al-Zawahiri, and Al-Zarqawi, praising their attacks on enemies. Another site, Tawhed.com (site 39), posted a poem praising the 9/11 attacks. The rhetoric of the poem commonly appears in many Al Qaeda affiliated Web sites, referring to the Americans as "crusaders." Words like "Sunna" and "Jamah" reflect the branch of Islam to which the Salafi groups belong.
From the snowflake diagrams (Figure 3), we found that terrorists and supporters use the Web heavily to share ideology and to propagate ideas, especially to their members. For example, the Palestinian cluster (Cluster 2) actively shares its ideology and heavily uses the Web as a propaganda tool for members. The Web sites in this cluster support liberation of Palestine, pinpoint and criticize their enemies, and describe details of operations and rationales supported by Quran verses. In contrast, Jihad supporters (Cluster 5) rarely use the Web for propaganda but share ideology and communicate there. The Hizbollah cluster (Cluster 1) resembles the Palestinian cluster in heavy use of the Web for sharing ideology and insider propaganda. For example, the sites in this cluster glorify martyrs and leaders; they also were used moderately for outsider propaganda and communications. In all five clusters, we found little evidence of using the Web for fundraising or building a virtual community. Such uses probably have gone underground or do not appear on the Web.
[Figure 3: five snowflake diagrams, each with six axes: Communications, Fundraising, Sharing ideology, Propaganda (insiders), Propaganda (outsiders), Virtual community. Values printed on each panel, in the order extracted:
Cluster 1: Hizballah Cluster: 0.53, 0.20, 0.92, 0.72, 0.50, 0.13
Cluster 2: Palestinian Cluster: 0.43, 0.10, 0.81, 0.81, 0.44, 0.35
Cluster 3: Al-Qaeda Cluster: 0.52, 0.12, 0.85, 0.30, 0.30, 0.32
Cluster 4: Caucasian Cluster: 0.60, 0.10, 0.50, 0.50, 0.50, 0.40
Cluster 5: Jihad Supporters: 0.40, 0.05, 0.50, 0.21, 0.38, 0.20]
FIG. 3. Snowflake visualization of five terrorist site clusters.
4.3. Expert Evaluation and Results
Based on the above results, we invited a terrorism expert to conduct an evaluation of the methodology. A senior fellow of the U.S. Institute of Peace in Washington, D.C., the expert is a professor of communication at a major research university in Israel. Having expertise in modern terrorism and the Internet, he has published more than 80 refereed journal articles and books and is a frequent speaker at international conferences on counterterrorism. This expert also leads a team of about 16 research assistants who regularly monitor 4,300 sites on the Dark Web for terrorist activities. The approach he and his team use to collect and analyze terrorists' use of the Web is largely manual, relying on laborious human browsing and monitoring of selected Web sites. His experience in manual analysis served as a contrast to our methodology, which automated part of the Dark Web data collection and analysis. We decided to use expert validation instead of other evaluation methods for two reasons: (1) a lab experiment is not suitable because typical experimental subjects do not have much knowledge of the Dark Web, and (2) it is not feasible to invite terrorists to participate in an interview or empirical evaluation. The expert was not involved in writing this article.
The evaluation was conducted using an unbiased structured questionnaire and a formal procedure. We showed the results to our expert and asked him to provide detailed comments on the categorization of Web sites and attributes, the visualization and clustering of terrorist groups, and the usability of the snowflake visualization. In general, he deemed the results to be very promising and the methodology design to be excellent. He believed that this was the start of a very important line of research that will result in a useful database and a reliable methodology to update and maintain the database.
The expert was greatly impressed by the visualization and clustering capabilities of the methodology, and he provided valuable comments on our work. However, he said that the 39 Web sites shown in Table 1 do not represent the entire population of all terrorist Web sites, the number of which he estimated to be over four thousand. Because we focused only on Middle Eastern terrorist groups (rather than all terrorist groups in the world), we believe that our methodology has yielded representative results and has automated much of the manual work of identifying and analyzing terrorist Web sites. He suggested adding qualitative measures such as persuasive appeals, rhetoric, and attribution of guilt to the Web site attributes shown in Table 2. We believe that these important attributes are difficult to incorporate into the automated processing of our methodology because of their qualitative nature. He considered the clustering and visualization shown in Figure 2 to be very important because of its usefulness to investigation of terrorist activities on the Web. He called the snowflake visualization "very accurate" and "very useful" to investigation of terrorist Web sites but criticized the way we created linkages among Web sites. He suggested considering textual citations and other references in addition to hyperlinks.
Overall, the expert agreed that the results were very promising because they offer useful investigation leads and would be very helpful to improve understanding of terrorist activities on the Web. Because of the high qualifications and relevant experience of this expert, we believe that the evaluation results can accurately reflect the effectiveness of the methodology. These results also contribute to advancing the ISI discipline by showing the applicability of the methodology to Dark Web data collection and analysis.
5. Conclusions and Future Directions
Collecting and analyzing Dark Web information has challenged investigators and researchers because terrorists can easily hide their identities and remove traces of their activities on the Web. The abundance of Web information has made it difficult to obtain a comprehensive picture of terrorists' activities. In this article, we have proposed a methodology to address these problems. Using advanced Web mining, content analysis, visualization techniques, and human domain knowledge, the methodology exploited various information sources to identify and analyze 39 Jihad Web sites. Information visualization was used to help identify terrorist clusters and to understand terrorist use of the Web. Our expert evaluation showed that the methodology yielded promising results that would be very useful to assist investigation of terrorism. The expert considered the visualization results very useful, having potential to guide policymaking and intelligence research. Therefore, this research has contributed to developing a useful methodology for collecting and analyzing Dark Web information, applying the methodology to studying and analyzing 39 Jihad Web sites, and providing formal evaluation results of the usability of the methodology.
We are pursuing a number of directions to further our research. As terrorists often change their Web sites to remove traces of their activities, we plan to archive the Dark Web content digitally and apply our methodology to tracing terrorist activities over time. We will develop scalable techniques to collect such volatile yet valuable content, to visualize large volumes of Dark Web data, and to extract meaningful entities from terrorist Web sites. These efforts will help investigators trace and prevent terrorist attacks.
6. Acknowledgments
This research was partly supported by funding from the U.S. Government Department of Homeland Security and Corporation for National Research Initiatives and by Santa Clara University. We thank contributing members of the University of Arizona Artificial Intelligence Lab for their support and assistance.
References
Anti-Defamation League. (2002). Jihad Online: Islamic Terrorists and the Internet. Retrieved March 26, 2008, from http://www.adl.org/internet/jihad_online.pdf
Blakemore, B. (November 23, 2004). Web posting may provide insight into Iraq insurgency. ABC News. Retrieved March 26, 2008, from http://abcnews.go.com/WNT/story?id=277421
Carley, K.M., Lee, J.-S., & Krackhardt, D. (2001). Destabilizing networks. Connections, 24(3), 31-34.
Chambers, J., Cleveland, W., Kleiner, B., & Tukey, P. (1983). Graphical methods for data analysis. Wadsworth International Group (Belmont, CA) and Duxbury Press (Boston, MA).
Chen, H. (2005). Introduction to the special topic issue: Intelligence and security informatics. Journal of the American Society for Information Science and Technology, 56(3), 217-220.
Chen, H., & Chau, M. (2004). Web mining: Machine learning for Web applications. In M.E. Williams (Ed.), Annual review of information science and technology (Vol. 38, pp. 289-329). Medford, NJ: Information Today, Inc.
Chen, H., Fan, H., Chau, M., & Zeng, D. (2001). MetaSpider: Meta-searching and categorization on the Web. Journal of the American Society for Information Science and Technology, 52(13), 1134-1147.
Chen, H., Schuffels, C., & Orwig, R. (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88-102.
Chung, W. (2008). Visualizing E-Business Stakeholders on the Web: A Methodology and Experimental Results. International Journal of Electronic Business, 6(1), 25-46.
Chung, W., Chen, H., Chaboya, L.G., O'Toole, C., & Atabakhsh, H. (2005). Evaluating event visualization: A usability study of COPLINK Spatio-Temporal Visualizer. International Journal of Human-Computer Studies, 62(1), 127-157.
Chung, W., Chen, H., & Nunamaker, J.F. (2005). A visual framework for knowledge discovery on the Web: An empirical study on business intelligence exploration. Journal of Management Information Systems, 21(4), 57-84.
Chung, W., Zhang, Y., Huang, Z., Wang, G., Ong, T.-H., & Chen, H. (2004). Internet searching and browsing in a multilingual world: An experiment on the Chinese business intelligence portal (CBizPort). Journal of the American Society for Information Science and Technology, 55(9), 818-831.
Department of State. (2003). Patterns of Global Terrorism 2002: The United States Government. Retrieved March 26, 2008, from http://www.state.gov/s/ct/rls/crt/2002/
Encyclopedia Britannica Online. (2007). Jihad. Britannica Concise Encyclopedia. Retrieved March 26, 2008, from http://www.britannica.com/ebc/article-9368558
Eom, S.B., & Farris, R.S. (1996). The contributions of organizational science to the development of decision support systems research subspecialties. Journal of the American Society for Information Science, 47(12), 941-952.
Etzioni, O. (1996). The World Wide Web: Quagmire or gold mine? Communications of the ACM, 39(11), 65-68.
Gellman, B. (June 27, 2002). Cyber-attacks by Al Qaeda feared. Washington Post, page A01. Retrieved March 26, 2008, from http://www.washingtonpost.com/ac2/wp-dyn/A50765-2002Jun26
He, Y., & Hui, S.C. (2002). Mining a Web citation database for author co-citation analysis. Information Processing and Management, 38(4), 491-508.
Kealy, W.A. (2001). Knowledge maps and their use in computer-based collaborative learning. Journal of Educational Computing Research, 25(4), 325-349.
Kelley, J. (July 10, 2002). Militants Wire Web With Links to Jihad. USA Today. Retrieved March 26, 2008, from http://www.usatoday.com/news/world/2002/07/10/web-terror-cover.htm
Krebs, V.E. (2001). Mapping network of terrorist cells. Connections, 24(3), 43-52.
La Porte, T.M., Jong, M. d., & Demchak, C.C. (1999). Public Organizations on the World Wide Web: Empirical Correlates of Administrative Openness. Paper presented at the Proceedings of the 5th National Public Management Research Conference, College Station, TX.
Last, M., Markov, A., & Kandel, A. (2006). Multi-Lingual Detection of Terrorist Content on the Web. Paper presented at the Proceedings of the PAKDD'06 International Workshop on Intelligence and Security Informatics, Singapore, Springer, Berlin / Heidelberg, pp. 16-30.
Marshall, B., McDonald, D., Chen, H., & Chung, W. (2004). EBizPort: Collecting and analyzing business intelligence information. Journal of the American Society for Information Science and Technology, 55(10), 873-891.
Middle East Media Research Institute. (2004). Jihad and Terrorism Studies Project. Retrieved March 2004 and March 26, 2008, from http://www.memri.org/jihad.html
Mladenic, D. (1998). Turning Yahoo into an automatic web page classifier. Paper presented at the Proceedings of the 13th European Conference on Artificial Intelligence, Brighton, UK.
Nasukawa, T., & Nagano, T. (2001). Text analysis and knowledge mining system. IBM Systems Journal, 40(4), 967-984.
Newman, M. (2004, May 11). Video appears to show beheading of American civilian. The New York Times.
Popp, R., Armour, T., Senator, T., & Numrych, K. (2004). Countering terrorism through information technology. Communications of the ACM, 47(3), 36-43.
Project for the Research of Islamist Movements. (2004). PRISM, 2004. Retrieved March 26, 2008, from http://www.e-prism.org
Sageman, M. (2004). Understanding terror networks. Philadelphia, PA: University of Pennsylvania Press.
Strickland, L.S., & Hunt, L.E. (2005). Technology, security, and individual privacy: New tools, new threats, and new public perceptions. Journal of the American Society for Information Science and Technology, 56(3), 221-234.
Technical Analysis Group. (2004). Examining the cyber capabilities of Islamic terrorist groups. Hanover, NH: Institute for Security Technology Studies at Dartmouth College.
Thomas, T.L. (2003, Spring). Al Qaeda and the Internet: The danger of cyberplanning. Parameters, 112-123.
Trybula, W.J. (1999). Text mining. In M.E. Williams (Ed.), Annual review of information science and technology (Vol. 34, pp. 385-419). Medford, NJ: Information Today, Inc.
Tsfati, Y., & Weimann, G. (2002). www.terrorism.com: Terror on the Internet. Studies in Conflict & Terrorism, 25, 317-332.
Xu, J., & Chen, H. (2005). Criminal network analysis and visualization. Communications of the ACM, 48(6), 100-107.
Young, F.W. (1987). Multidimensional scaling: History, theory, and applications. Hillsdale, NJ: Lawrence Erlbaum Associates.
Zhu, B., & Chen, H. (2005). Chapter 4: Information Visualization. In B. Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 39, pp. 139-177). Medford, NJ: Information Today, Inc.