Research Article Analyses of Crime Patterns in …downloads.hindawi.com/journals/tswj/2014/492461.pdfResearch Article Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Research ArticleAnalyses of Crime Patterns in NIBRS Data Based on a NovelGraph Theory Clustering Method Virginia as a Case Study
Peixin Zhao1 Marjorie Darrah2 Jim Nolan3 and Cun-Quan Zhang2
1 School of Management Shandong University Jinan Shandong China2Department of Mathematics West Virginia University Morgantown WV USA3Department of Sociology and Anthropology West Virginia University Morgantown WV USA
Correspondence should be addressed to Peixin Zhao pxzhao126com
Received 28 January 2014 Accepted 25 February 2014 Published 20 March 2014
Academic Editors Y Mao X Meng and J Zhou
Copyright copy 2014 Peixin Zhao et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data whichinclude the determination of correlation of different crime types the development of a likelihood index for crimes to occur in ajurisdiction and the clustering of jurisdictions based on crime type The method was tested by using the 2005 assault data from121 jurisdictions in Virginia as a test case The analyses of these data show that some different crime types are correlated and somedifferent crime parameters are correlatedwith different crime typesThe analyses also show that certain jurisdictionswithinVirginiashare certain crime patterns This information assists with constructing a pattern for a specific crime type and can be used todetermine whether a jurisdiction may be more likely to see this type of crime occur in their area
1 Introduction
The National Incident-Based Reporting System (NIBRS) isa crime reporting program for local state and federal lawenforcement agencies that provides a wealth of incident leveldata for use in analysis It is part of the Uniform CrimeReporting (UCR) Programwhich is administered by the FBITheUCRProgramprovides a nationwide view of crime basedon data submitted through state programs or directly to thenational UCR Program and has been operational for around70 years The NIBRS was implemented in the late 1970s tomeet law enforcement need for the 21st century This vastsystem houses information on offenses victims offendersproperty and persons arrested as well as the incident itselfThe data of NIBRS are well structured and readily availablefor researchers and law enforcement agencies to assist withunderstanding the intricate nature of crime
Akiyama and Nolan [1] outlined the structure of theNIBRS data set and provided methods for understandingand analyzing the data Dunn and Zelenock [2] also describeand test procedures to facilitate the use of this vast systemBuilding on these initial works many authors continue
to investigate how this storehouse of data can be turnedinto useful information for researchers and law enforcementagencies Much of the work employs descriptive statisticsapplied in various sophisticated ways to extract informationfrom the files For example Thompson et al [3] applydescriptive statistics to examine intimate partner violenceand make connections between this crime and other crimesthat occurred in the same incident They were able to showa link between intimidation and more serious violent crimesand extract information about the relationships between thevictim and the offender from the NIRBS data that helpsto better understand this type of crime Snyder [4] usedlogistic regression techniques to predict the arrest of juvenilerobbery offenders More recently Addington and Rennison[5] used logistic regressionmodels along with the NIBRS andthe National Crime Victimization Survey (NCVS) data forpredicting rape cooccurrence to provide a critical initial lookat rapes that occur with other crimes
For criminologists the NIBRS data holds the answersto many long-standing questions about crime criminaloffending and crime victimization However gaining accessto some of these answers has remained difficult because of
Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 492461 8 pageshttpdxdoiorg1011552014492461
2 The Scientific World Journal
the size and complexity of the data Effective techniquessuch as data mining and clustering for criminal justice dataare of increasing importance to both the research and lawenforcement communities [6] In recent years clusteringcategorical data has gainedmore importance because it is oneof the fundamental methods in data mining [7] Clusteringcrime data as with other categorical data is unsupervisedlearning that aims at partitioning a data set into groups ofsimilar items The goal is to create clusters of data objectswhere the within-cluster similarity is maximized and thebetween-cluster similarity is minimized Over the yearsmany clustering algorithms have been developed and testedSome clustering techniques have developed specifically foruse with categorical data Abdu [8] presented three newclustering algorithms thatwere applied to the clustering of theNIBRS data Two of his approaches combine spectral analysisand clustering techniques that are scalable to large data setssuch as the NIBRS
Clustering categorical data poses a challenge not encoun-tered in clustering numerical data because the attributecategories are not ordered and defining a metric with whichto measure the distance between data objects in a dataset becomes a challenge Many of the algorithms that haveemerged for clustering categorical data rely on the occur-rencecooccurrence frequencies of attribute values in thedata set to determine clusters of similar data objects Thebasic goal is to choose a set of attribute categories thatprovide a summary of the data objects in a cluster There area wide range of clustering algorithms for categorical dataincluding K-modes [9] STIRR [10] CACTUS [11] ROCK[12] COOLCAT [13] LIMBO [14] and CLICKS [15] Agood summary can be found in [8] The clustering algorithmused in this research is a mathematically well-defined modelimplemented in a polynomial time algorithm that guaranteesan optimal solution [16]
Due to the lack of well-defined mathematical modelsand optimization goals most existing graph theory clusteringapproaches could not guarantee a proper clustering result ingeneral cases For example agglomerative hierarchical clus-tering methods could not produce proper clusters with largersizes while divisive hierarchical methods could not produceclusters with smaller sizes and clusters with large differencein their sizes and k-core method may produce clusters withsmall edge-cuts and so forth Many papers and articles havementioned these problems and frustration among users (egsee [17ndash19]) Even the most popular commercial softwareSAS is unable to produce proper outputs for some simpledata
The purpose of this paper is to present a novel multidi-mensional clustering method for the NIBRS data We firstlyoutlines a newmeasure called the likelihood index that helpsexamine quantitatively how likely a crime is to occur in aparticular jurisdiction This measure compares a vector thatdescribes a jurisdiction with a vector that represents a crimetype Then according to the defined distance between thesetwo vectors we can determine how closely the jurisdictionaligns with that crime type The data used in this studywere obtained from the 2005 NIBRS which is stored at theNational Archive for Criminal Justice Data at the University
Table 1 NIBRS indexes used
Segment Index Number ofsubindexes
Victim indexes
Type of victim 3Victim age 3Victim sex 2Victim race 5Victim ethnicity 3Victim residence status 3Aggravated assaulthomicidecircumstances 10
Type of injury 6
Offender indexesOffender age 3Offender sex 3Offender race 5
Additionalindexes
Injury 2Juvenile 1Violent crime 1Juvict 1Multiple victims 1Multiple offenders 1Multiple offenders and victims 1Multiple offenders and onevictim 1
One offender and multiplevictims 1
One offender and one victims 1
of Michigan This work explores the following researchquestions Do specific crimes exhibit certain quantifiablecharacteristics Do different types of crimes share similarquantifiable characteristics Do jurisdictions of a state clusterwith respect to different crime types What is the likelihoodthat if one type of crime is occurring in an area other types ofcrime with similar quantifiable characteristic will also occurin that area
The rest of the paper is organized as follows In Section 2we summarize the data unit of analysis and preparation InSection 3 we introduce the methods to deal with the datamatrix and take Virginia as a case study Section 4 providessome additional results and Section 5 gives the conclusionsof the research
2 Data Unit of Analysis and Preparation
The data sets available in the NIBRS provide a wealthof incident level data about each reported crime As for2010 approximately 40 states contribute their data to themassive data set The data and tools are made available byUniversity of Michigan for use by law enforcement agenciesand researchers In order to devise a manageable set of datafor preliminary testing of techniques and for preliminary dataanalysis only the 2005 data on assaults were explored Fromthe 2005 assault data 121 jurisdictions (counties or cities) in
The Scientific World Journal 3
Virginia were selected for examination These represent alljurisdictions within Virginia with populations greater than10000 There were 10183 incidents reported in these 121chosen jurisdictions
For this study 21 indexes from the NIBRS were chosenfrom the 246 available indexesThese 21 indexes were deemedimportant to provide the relevant characteristics of the vic-tim(s) offender(s) and the circumstances of each incidentThe selected particular indexes were listed in Table 1
In order to facilitate the selected analysis techniques thedata was expanded from one column with many possibleentries to multi columns that contained zero or one Forexample the Offender Segment index contains the sex ofthe offender and has the possible entries of male female orunknown This index column was split into three individualcolumns where an entry in the three columns of (1 0 0) meansfemale (0 1 0) means male and (0 0 1) means unknownThisturns the column for sex of the offender to three columns Allcreated columns were binary (01) columns that were usedto help classify the characteristics of the incident From theexpansion of the original 21 indexes 57 binary columns were
created This led to the creation of a 121 times 57 Crime DataMatrix where each row 119894 represents the 119894th jurisdiction andeach column 119895 represents the 119895th parameter related to theincident (eg offender sex victim resident status) or a crimetype (eg hate crime drug dealing) The Crime Data Matrixconstruction is pictured below
Crime Data Matrix = [[
[
11988611 119886
157
d
1198861211 11988612157
]]
]
(1)
where 119886119894119895
is the entry of the 119895th crime index from the 119894thjurisdiction
Normalization of the rows of thematrixwas completed bydividing each row entry by the population of the jurisdictionThis gave a per person rate for each crime parameter or eachcrime type Normalization of the columns was completedby averaging the columns and subtracting the average fromeach entry in the column Then each entry in the columnwas divided by the vector length of the column Equation (2)shows these normalization step-by-step operations
119886119894119895larr997888
number of occurances of crime parameter or crime typepopulation of the jurisdiction
where 119892119895is the average of the 119895th column
119886119894119895larr997888
119886119894119895
119899119895
where 119899119895is the vector length of the 119895th column
(2)
3 Finding Patterns in the Data
This section explains several different analyses that wereperformed on the data in thematrix described above in orderto attempt to answer the research questions listed in Section 1These analyses include comparing the columns of the matrixto determine the correlation of crime parameters to crimetypes and to determine the correlation of different crimetypes Also there were two analyses performed comparing therow vectors to develop the likelihood index and to cluster thejurisdictions by crime types
31 Correlation of Crime Parameters and Crime Types (Com-paring the Column Vectors) The motivation for comparingthe different crime parameters to crime types is to determineif there are some characteristics that can tell us about thelikelihood of a crime type to occur in a certain jurisdictionEach crime type may have factors that contribute to a specificcrime appearing in a certain place An overall increase incrime in an area may or may not correlate to an increase inany one particular type of crime say hate crime in that areaHowever there may be individual parameters whose increasemay indicate an increase in a particular type of crime Forexample if juvenile offenders are up in a certain area thismay indicate that hate crimes will also be up in that areaAlso crime type vectors were compared against each other
in a similar way For example crimes like juvenile gang werecompared against hate crime to see if these crime types alsohave a correlation
In order to determine the relationship between the 119895thparameter (119895th column of the Crime Data Matrix) and anyother column (another parameter or another crime type) thecorrelation coefficient (cosine of the angle between the twocolumn vectors over the norm of the vectors) was calculatedEach of the columns of the crimedatamatrix forms a vector ina 121-dimensional space and the vectors can be geometricallycompared with the correlation between two parametersrepresented by the cosine of the angle in this space Forexample the column for the juvenile offender could becompared against the column for hate crime or the columnfor hate crime can be compared to the column for gang-related crimeThese comparisons aremade by calculating theangle between these two columns to determine if there is anyrelationship and how strong that relationship may be Thismethod can assist in determining whether two columns varydirectly inversely or separately
For this comparison of two column vectors the variationin two vectors must be transformed to eliminate the effectsof mean differences Once the mean deviation is determinedthen the correlation can be determined by the cosine ofthe angle between the vectors As an example let 119895 =(1198951 1198952 119895
The sign of the answer is ignored since either a strong positiverelationship (close to 1 meaning that the two angles are inthe same direction and close to one another) or a strongnegative relationship (close to minus1 meaning that the two anglesare in opposite directions but nearly opposite one another)indicates that the vectors appear to be related in some way
In doing this comparison with one vector fixed andcomparing it to all other vectors and itself we form arow vector of size 57 Consider the example that comparesthe hate crime vector with all other vectors We constructanother vector that we refer to as the Hate Crime CharacterVector of size 57 which contains all the cosine 120579
119895(the
correlation coefficient of the 119895th parameter with respect tohate crime) A new vector is formed from all these correlationcoefficients (120572
1 1205722 120572
57) where each of the 57 parameters
has a correlation coefficient vector associated with it Nowconsider the corresponding components in the Hate CrimeCharacter Vector Table 2 tells us how hate crimes are relatedto aggravated assault crimes Among those crimes juvenilegang (04058) has higher correlation with hate crime whilethe others for example argument (01325) and drug dealing(00265) have relatively lower correlation
Table 3 also shows similar evidence that for victimrsquos agethe age group less than 18 is alsomore correlated to hate crime
than the other two age groups Many such comparisons canbe observed using these correlation coefficient vectors
32 Analyses of Jurisdictions (Comparing the Row Vectors)By using the correlation coefficients a 57-dimensional vectortwo data analyses can be performed each of which indicatesthe likelihood of a particular crime for each jurisdictionThe first one is a numerical index called the likelihood indexassigned to each of the 112 jurisdictions The second one isa clustering analysis of all jurisdictions Jurisdictions withsimilar recorded crime patterns (adjusted corresponding tocorrelation coefficients) form clusters
Let 120573 = 1205731 1205732 120573
121 be the likelihood index vector
where 120573119894is the likelihood index between jurisdiction 119894 and a
particular crime (eg hate crime) or crime parameter (egjuvenile offender) For this comparison of two row vectorsthe variation in two vectors must be transformed to eliminatethe effects of mean differences Once the mean deviation isdetermined then the correlation can be determined by thecosine of the angle between the vectors As an example let 119896 =(1198961 1198962 119896
57) be the row for theNorfolk jurisdiction and let
119897 = (1198971 1198972 11989757) be the row for the hate crime indicator To
find the correlation coefficient of these two rows calculate thefollowing
120573119894= cos120601
119894=
sum119895119896119895119897119895
119896 119897
(4)
Table 4 gives the top 30 jurisdictions of Virginia with respectto the hate crime likelihood index and also provides the dataon the clustering of the jurisdictions discussed in the nextsection
The Scientific World Journal 5
Table 2 Hate crime correlation coefficient vector values for othercrimes
Crime type Hate crime correlationcoefficient (120572
119895)
Argument 01325Assault on law enforcement officers 00371Drug dealing 00265Gangland (organized crimeinvolvement) 01565
Juvenile gang 04058Loversrsquo quarrel minus00281
Other felony involved minus00375
Other circumstances 00961Unknown circumstance 01356
Table 3 Hate crime correlation coefficients for age of victim
Age of victim Hate crime correlation coefficient (120572119895)
Age lt 18 0494418 le Age le 60 00540Age gt 60 00474
33 Using the Correlation Coefficients to Cluster the Jurisdic-tions To begin the clustering a weighted complete graphwith 121 vertices is formed The weight on each edge is thecorrelation coefficient between the jurisdictions The novelgraph theory clustering method we proposed in [20] is usedto find all the dense (highly weighted) subgraphs of thecomplete graph A distinguished feature of this methodnonbinary hierarchical tree clearly highlights meaningfulclusters which significantly reduces furthermanual efforts forcluster selections The results of clustering the jurisdictionswith respect to hate crimes are also displayed in Figure 1 andTable 4
It can be seen that of the top 30 jurisdictions with respectto hate crimes 12 of the top 15 are in Cluster A and the othersare in Cluster B The remaining 91 jurisdictions also fall intoCluster B The 12 higher hate crime rate counties (cities) areshowed in Figure 1 (in red and blue)
4 Additional Results
Similar analyses where performedwith respect to other crimetypes drug-dealing juvenile gang and gangland (organizedcrime involvement) The results are summarized in Tables 5ndash10
Table 6 summarizes the drug dealing vector comparisonwith the offender parameter vectors
Tables 5 and 6 summarize the comparison of the drugdealing vector with other crime parameter vectors Each ofthe other crime parameters related to the victim and theoffender was compared to the drug dealing vector A highvalue for this correlation coefficient implies that there is acorrelation between this crime parameter and drug dealingAgain the motivation for comparing the different crimeparameters to crime types is an attempt to determine if there
Table 4 Likelihood index and clustering for hate crime
Name Cluster Hate crimelikelihood index
1 NEWPORT NEWS B 08672 NORFOLK B 08373 CHESAPEAKE B 08104 GREENSVILLE B 07625 PORTSMOUTH B 07596 RICHMOND B 07167 WYTHE B 06818 ALEXANDRIA B 06789 BRISTOL B 066710 NEW KENT B 061811 FAUQUIER B 053112 ROANOKE A 043913 RICHMOND A 038614 WILLIAMSBURG A 037715 VIRGINIA BEACH B 031816 CHARLOTTESVILLE A 031217 PETERSBURG A 028018 SPOTSYLVANIA A 022619 HOPEWELL A 020520 RUSSELL A 012121 CLARKE A 012122 WINCHESTER A 011823 SUFFOLK A 011124 MARTINSVILLE A 007525 STAUNTON A 007326 GALAX A 007027 SHENANDOAH A 006328 CAROLINE A 004429 SURRY A minus0020
30 DANVILLE A minus0094
are some characteristics that can tell us about the likelihoodof a crime type to occur in a certain jurisdiction Forexample the correlation between the drug dealing vector andindividual victim is 05789 which is much higher than thecorrelation between drug dealing and victim type business(01721) or victim type societypublic (01729) This wouldseem to imply that for jurisdictions where the individual vic-tim crimes are evident they may also have the likelihood fordrug dealing related crimes The tables show the correlationcoefficient values the correlation values in bold show a highercorrelation with the drug dealing crime type than the otherparameters in that category
Table 7 summarizes the drug dealing vector comparisonwith the other crime types The highest correlation is withargument
Each of the other crime parameters related to the victimand the offender was compared to the gangland (organized
6 The Scientific World Journal
Table 5 Drug dealing correlation with other crime parametersrelated to victim
Drug dealing correlation coefficientType of victim
individual 05789Business 01721Societypublic 01729
Age of victimAge lt 18 02933Age ge 60 0387218 le age lt 60 05993
Sex of victimMale 05774Female 04989
Race of victimWhite 03906Black 05028AsiaPacific Islander 00764Unknown 00003American Indian 00393
Ethnicity of victimHispanic origin 00688Not of Hispanic origin 06009Unknown minus00616
Resident status of victimNonresident 02631Resident 05740Unknown 01290
Table 6 Drug dealing correlation with other crime parametersrelated to offender
Offender ageAge lt 18 03514Age ge 60 0376218 le age lt 60 05678
crime involvement) vector There were no significant rela-tionships to report from this comparison
Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect
Table 7 Drug dealing correlation with other crime types
Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265
Table 8 Likelihood index and clustering for drug dealing
Name Cluster Drug dealinglikelihood index
1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395
to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the
The Scientific World Journal 7
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
the size and complexity of the data Effective techniquessuch as data mining and clustering for criminal justice dataare of increasing importance to both the research and lawenforcement communities [6] In recent years clusteringcategorical data has gainedmore importance because it is oneof the fundamental methods in data mining [7] Clusteringcrime data as with other categorical data is unsupervisedlearning that aims at partitioning a data set into groups ofsimilar items The goal is to create clusters of data objectswhere the within-cluster similarity is maximized and thebetween-cluster similarity is minimized Over the yearsmany clustering algorithms have been developed and testedSome clustering techniques have developed specifically foruse with categorical data Abdu [8] presented three newclustering algorithms thatwere applied to the clustering of theNIBRS data Two of his approaches combine spectral analysisand clustering techniques that are scalable to large data setssuch as the NIBRS
Clustering categorical data poses a challenge not encoun-tered in clustering numerical data because the attributecategories are not ordered and defining a metric with whichto measure the distance between data objects in a dataset becomes a challenge Many of the algorithms that haveemerged for clustering categorical data rely on the occur-rencecooccurrence frequencies of attribute values in thedata set to determine clusters of similar data objects Thebasic goal is to choose a set of attribute categories thatprovide a summary of the data objects in a cluster There area wide range of clustering algorithms for categorical dataincluding K-modes [9] STIRR [10] CACTUS [11] ROCK[12] COOLCAT [13] LIMBO [14] and CLICKS [15] Agood summary can be found in [8] The clustering algorithmused in this research is a mathematically well-defined modelimplemented in a polynomial time algorithm that guaranteesan optimal solution [16]
Due to the lack of well-defined mathematical modelsand optimization goals most existing graph theory clusteringapproaches could not guarantee a proper clustering result ingeneral cases For example agglomerative hierarchical clus-tering methods could not produce proper clusters with largersizes while divisive hierarchical methods could not produceclusters with smaller sizes and clusters with large differencein their sizes and k-core method may produce clusters withsmall edge-cuts and so forth Many papers and articles havementioned these problems and frustration among users (egsee [17ndash19]) Even the most popular commercial softwareSAS is unable to produce proper outputs for some simpledata
The purpose of this paper is to present a novel multidi-mensional clustering method for the NIBRS data We firstlyoutlines a newmeasure called the likelihood index that helpsexamine quantitatively how likely a crime is to occur in aparticular jurisdiction This measure compares a vector thatdescribes a jurisdiction with a vector that represents a crimetype Then according to the defined distance between thesetwo vectors we can determine how closely the jurisdictionaligns with that crime type The data used in this studywere obtained from the 2005 NIBRS which is stored at theNational Archive for Criminal Justice Data at the University
Table 1 NIBRS indexes used
Segment Index Number ofsubindexes
Victim indexes
Type of victim 3Victim age 3Victim sex 2Victim race 5Victim ethnicity 3Victim residence status 3Aggravated assaulthomicidecircumstances 10
Type of injury 6
Offender indexesOffender age 3Offender sex 3Offender race 5
Additionalindexes
Injury 2Juvenile 1Violent crime 1Juvict 1Multiple victims 1Multiple offenders 1Multiple offenders and victims 1Multiple offenders and onevictim 1
One offender and multiplevictims 1
One offender and one victims 1
of Michigan This work explores the following researchquestions Do specific crimes exhibit certain quantifiablecharacteristics Do different types of crimes share similarquantifiable characteristics Do jurisdictions of a state clusterwith respect to different crime types What is the likelihoodthat if one type of crime is occurring in an area other types ofcrime with similar quantifiable characteristic will also occurin that area
The rest of the paper is organized as follows In Section 2we summarize the data unit of analysis and preparation InSection 3 we introduce the methods to deal with the datamatrix and take Virginia as a case study Section 4 providessome additional results and Section 5 gives the conclusionsof the research
2 Data Unit of Analysis and Preparation
The data sets available in the NIBRS provide a wealthof incident level data about each reported crime As for2010 approximately 40 states contribute their data to themassive data set The data and tools are made available byUniversity of Michigan for use by law enforcement agenciesand researchers In order to devise a manageable set of datafor preliminary testing of techniques and for preliminary dataanalysis only the 2005 data on assaults were explored Fromthe 2005 assault data 121 jurisdictions (counties or cities) in
The Scientific World Journal 3
Virginia were selected for examination These represent alljurisdictions within Virginia with populations greater than10000 There were 10183 incidents reported in these 121chosen jurisdictions
For this study 21 indexes from the NIBRS were chosenfrom the 246 available indexesThese 21 indexes were deemedimportant to provide the relevant characteristics of the vic-tim(s) offender(s) and the circumstances of each incidentThe selected particular indexes were listed in Table 1
In order to facilitate the selected analysis techniques thedata was expanded from one column with many possibleentries to multi columns that contained zero or one Forexample the Offender Segment index contains the sex ofthe offender and has the possible entries of male female orunknown This index column was split into three individualcolumns where an entry in the three columns of (1 0 0) meansfemale (0 1 0) means male and (0 0 1) means unknownThisturns the column for sex of the offender to three columns Allcreated columns were binary (01) columns that were usedto help classify the characteristics of the incident From theexpansion of the original 21 indexes 57 binary columns were
created This led to the creation of a 121 times 57 Crime DataMatrix where each row 119894 represents the 119894th jurisdiction andeach column 119895 represents the 119895th parameter related to theincident (eg offender sex victim resident status) or a crimetype (eg hate crime drug dealing) The Crime Data Matrixconstruction is pictured below
Crime Data Matrix = [[
[
11988611 119886
157
d
1198861211 11988612157
]]
]
(1)
where 119886119894119895
is the entry of the 119895th crime index from the 119894thjurisdiction
Normalization of the rows of thematrixwas completed bydividing each row entry by the population of the jurisdictionThis gave a per person rate for each crime parameter or eachcrime type Normalization of the columns was completedby averaging the columns and subtracting the average fromeach entry in the column Then each entry in the columnwas divided by the vector length of the column Equation (2)shows these normalization step-by-step operations
119886119894119895larr997888
number of occurances of crime parameter or crime typepopulation of the jurisdiction
where 119892119895is the average of the 119895th column
119886119894119895larr997888
119886119894119895
119899119895
where 119899119895is the vector length of the 119895th column
(2)
3 Finding Patterns in the Data
This section explains several different analyses that wereperformed on the data in thematrix described above in orderto attempt to answer the research questions listed in Section 1These analyses include comparing the columns of the matrixto determine the correlation of crime parameters to crimetypes and to determine the correlation of different crimetypes Also there were two analyses performed comparing therow vectors to develop the likelihood index and to cluster thejurisdictions by crime types
31 Correlation of Crime Parameters and Crime Types (Com-paring the Column Vectors) The motivation for comparingthe different crime parameters to crime types is to determineif there are some characteristics that can tell us about thelikelihood of a crime type to occur in a certain jurisdictionEach crime type may have factors that contribute to a specificcrime appearing in a certain place An overall increase incrime in an area may or may not correlate to an increase inany one particular type of crime say hate crime in that areaHowever there may be individual parameters whose increasemay indicate an increase in a particular type of crime Forexample if juvenile offenders are up in a certain area thismay indicate that hate crimes will also be up in that areaAlso crime type vectors were compared against each other
in a similar way For example crimes like juvenile gang werecompared against hate crime to see if these crime types alsohave a correlation
In order to determine the relationship between the 119895thparameter (119895th column of the Crime Data Matrix) and anyother column (another parameter or another crime type) thecorrelation coefficient (cosine of the angle between the twocolumn vectors over the norm of the vectors) was calculatedEach of the columns of the crimedatamatrix forms a vector ina 121-dimensional space and the vectors can be geometricallycompared with the correlation between two parametersrepresented by the cosine of the angle in this space Forexample the column for the juvenile offender could becompared against the column for hate crime or the columnfor hate crime can be compared to the column for gang-related crimeThese comparisons aremade by calculating theangle between these two columns to determine if there is anyrelationship and how strong that relationship may be Thismethod can assist in determining whether two columns varydirectly inversely or separately
For this comparison of two column vectors the variationin two vectors must be transformed to eliminate the effectsof mean differences Once the mean deviation is determinedthen the correlation can be determined by the cosine ofthe angle between the vectors As an example let 119895 =(1198951 1198952 119895
The sign of the answer is ignored since either a strong positiverelationship (close to 1 meaning that the two angles are inthe same direction and close to one another) or a strongnegative relationship (close to minus1 meaning that the two anglesare in opposite directions but nearly opposite one another)indicates that the vectors appear to be related in some way
In doing this comparison with one vector fixed andcomparing it to all other vectors and itself we form arow vector of size 57 Consider the example that comparesthe hate crime vector with all other vectors We constructanother vector that we refer to as the Hate Crime CharacterVector of size 57 which contains all the cosine 120579
119895(the
correlation coefficient of the 119895th parameter with respect tohate crime) A new vector is formed from all these correlationcoefficients (120572
1 1205722 120572
57) where each of the 57 parameters
has a correlation coefficient vector associated with it Nowconsider the corresponding components in the Hate CrimeCharacter Vector Table 2 tells us how hate crimes are relatedto aggravated assault crimes Among those crimes juvenilegang (04058) has higher correlation with hate crime whilethe others for example argument (01325) and drug dealing(00265) have relatively lower correlation
Table 3 also shows similar evidence that for victimrsquos agethe age group less than 18 is alsomore correlated to hate crime
than the other two age groups Many such comparisons canbe observed using these correlation coefficient vectors
32 Analyses of Jurisdictions (Comparing the Row Vectors)By using the correlation coefficients a 57-dimensional vectortwo data analyses can be performed each of which indicatesthe likelihood of a particular crime for each jurisdictionThe first one is a numerical index called the likelihood indexassigned to each of the 112 jurisdictions The second one isa clustering analysis of all jurisdictions Jurisdictions withsimilar recorded crime patterns (adjusted corresponding tocorrelation coefficients) form clusters
Let 120573 = 1205731 1205732 120573
121 be the likelihood index vector
where 120573119894is the likelihood index between jurisdiction 119894 and a
particular crime (eg hate crime) or crime parameter (egjuvenile offender) For this comparison of two row vectorsthe variation in two vectors must be transformed to eliminatethe effects of mean differences Once the mean deviation isdetermined then the correlation can be determined by thecosine of the angle between the vectors As an example let 119896 =(1198961 1198962 119896
57) be the row for theNorfolk jurisdiction and let
119897 = (1198971 1198972 11989757) be the row for the hate crime indicator To
find the correlation coefficient of these two rows calculate thefollowing
120573119894= cos120601
119894=
sum119895119896119895119897119895
119896 119897
(4)
Table 4 gives the top 30 jurisdictions of Virginia with respectto the hate crime likelihood index and also provides the dataon the clustering of the jurisdictions discussed in the nextsection
The Scientific World Journal 5
Table 2 Hate crime correlation coefficient vector values for othercrimes
Crime type Hate crime correlationcoefficient (120572
119895)
Argument 01325Assault on law enforcement officers 00371Drug dealing 00265Gangland (organized crimeinvolvement) 01565
Juvenile gang 04058Loversrsquo quarrel minus00281
Other felony involved minus00375
Other circumstances 00961Unknown circumstance 01356
Table 3 Hate crime correlation coefficients for age of victim
Age of victim Hate crime correlation coefficient (120572119895)
Age lt 18 0494418 le Age le 60 00540Age gt 60 00474
33 Using the Correlation Coefficients to Cluster the Jurisdic-tions To begin the clustering a weighted complete graphwith 121 vertices is formed The weight on each edge is thecorrelation coefficient between the jurisdictions The novelgraph theory clustering method we proposed in [20] is usedto find all the dense (highly weighted) subgraphs of thecomplete graph A distinguished feature of this methodnonbinary hierarchical tree clearly highlights meaningfulclusters which significantly reduces furthermanual efforts forcluster selections The results of clustering the jurisdictionswith respect to hate crimes are also displayed in Figure 1 andTable 4
It can be seen that of the top 30 jurisdictions with respectto hate crimes 12 of the top 15 are in Cluster A and the othersare in Cluster B The remaining 91 jurisdictions also fall intoCluster B The 12 higher hate crime rate counties (cities) areshowed in Figure 1 (in red and blue)
4 Additional Results
Similar analyses where performedwith respect to other crimetypes drug-dealing juvenile gang and gangland (organizedcrime involvement) The results are summarized in Tables 5ndash10
Table 6 summarizes the drug dealing vector comparisonwith the offender parameter vectors
Tables 5 and 6 summarize the comparison of the drugdealing vector with other crime parameter vectors Each ofthe other crime parameters related to the victim and theoffender was compared to the drug dealing vector A highvalue for this correlation coefficient implies that there is acorrelation between this crime parameter and drug dealingAgain the motivation for comparing the different crimeparameters to crime types is an attempt to determine if there
Table 4 Likelihood index and clustering for hate crime
Name Cluster Hate crimelikelihood index
1 NEWPORT NEWS B 08672 NORFOLK B 08373 CHESAPEAKE B 08104 GREENSVILLE B 07625 PORTSMOUTH B 07596 RICHMOND B 07167 WYTHE B 06818 ALEXANDRIA B 06789 BRISTOL B 066710 NEW KENT B 061811 FAUQUIER B 053112 ROANOKE A 043913 RICHMOND A 038614 WILLIAMSBURG A 037715 VIRGINIA BEACH B 031816 CHARLOTTESVILLE A 031217 PETERSBURG A 028018 SPOTSYLVANIA A 022619 HOPEWELL A 020520 RUSSELL A 012121 CLARKE A 012122 WINCHESTER A 011823 SUFFOLK A 011124 MARTINSVILLE A 007525 STAUNTON A 007326 GALAX A 007027 SHENANDOAH A 006328 CAROLINE A 004429 SURRY A minus0020
30 DANVILLE A minus0094
are some characteristics that can tell us about the likelihoodof a crime type to occur in a certain jurisdiction Forexample the correlation between the drug dealing vector andindividual victim is 05789 which is much higher than thecorrelation between drug dealing and victim type business(01721) or victim type societypublic (01729) This wouldseem to imply that for jurisdictions where the individual vic-tim crimes are evident they may also have the likelihood fordrug dealing related crimes The tables show the correlationcoefficient values the correlation values in bold show a highercorrelation with the drug dealing crime type than the otherparameters in that category
Table 7 summarizes the drug dealing vector comparisonwith the other crime types The highest correlation is withargument
Each of the other crime parameters related to the victimand the offender was compared to the gangland (organized
6 The Scientific World Journal
Table 5 Drug dealing correlation with other crime parametersrelated to victim
Drug dealing correlation coefficientType of victim
individual 05789Business 01721Societypublic 01729
Age of victimAge lt 18 02933Age ge 60 0387218 le age lt 60 05993
Sex of victimMale 05774Female 04989
Race of victimWhite 03906Black 05028AsiaPacific Islander 00764Unknown 00003American Indian 00393
Ethnicity of victimHispanic origin 00688Not of Hispanic origin 06009Unknown minus00616
Resident status of victimNonresident 02631Resident 05740Unknown 01290
Table 6 Drug dealing correlation with other crime parametersrelated to offender
Offender ageAge lt 18 03514Age ge 60 0376218 le age lt 60 05678
crime involvement) vector There were no significant rela-tionships to report from this comparison
Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect
Table 7 Drug dealing correlation with other crime types
Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265
Table 8 Likelihood index and clustering for drug dealing
Name Cluster Drug dealinglikelihood index
1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395
to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the
The Scientific World Journal 7
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
Virginia were selected for examination These represent alljurisdictions within Virginia with populations greater than10000 There were 10183 incidents reported in these 121chosen jurisdictions
For this study 21 indexes from the NIBRS were chosenfrom the 246 available indexesThese 21 indexes were deemedimportant to provide the relevant characteristics of the vic-tim(s) offender(s) and the circumstances of each incidentThe selected particular indexes were listed in Table 1
In order to facilitate the selected analysis techniques thedata was expanded from one column with many possibleentries to multi columns that contained zero or one Forexample the Offender Segment index contains the sex ofthe offender and has the possible entries of male female orunknown This index column was split into three individualcolumns where an entry in the three columns of (1 0 0) meansfemale (0 1 0) means male and (0 0 1) means unknownThisturns the column for sex of the offender to three columns Allcreated columns were binary (01) columns that were usedto help classify the characteristics of the incident From theexpansion of the original 21 indexes 57 binary columns were
created This led to the creation of a 121 times 57 Crime DataMatrix where each row 119894 represents the 119894th jurisdiction andeach column 119895 represents the 119895th parameter related to theincident (eg offender sex victim resident status) or a crimetype (eg hate crime drug dealing) The Crime Data Matrixconstruction is pictured below
Crime Data Matrix = [[
[
11988611 119886
157
d
1198861211 11988612157
]]
]
(1)
where 119886119894119895
is the entry of the 119895th crime index from the 119894thjurisdiction
Normalization of the rows of thematrixwas completed bydividing each row entry by the population of the jurisdictionThis gave a per person rate for each crime parameter or eachcrime type Normalization of the columns was completedby averaging the columns and subtracting the average fromeach entry in the column Then each entry in the columnwas divided by the vector length of the column Equation (2)shows these normalization step-by-step operations
119886119894119895larr997888
number of occurances of crime parameter or crime typepopulation of the jurisdiction
where 119892119895is the average of the 119895th column
119886119894119895larr997888
119886119894119895
119899119895
where 119899119895is the vector length of the 119895th column
(2)
3 Finding Patterns in the Data
This section explains several different analyses that wereperformed on the data in thematrix described above in orderto attempt to answer the research questions listed in Section 1These analyses include comparing the columns of the matrixto determine the correlation of crime parameters to crimetypes and to determine the correlation of different crimetypes Also there were two analyses performed comparing therow vectors to develop the likelihood index and to cluster thejurisdictions by crime types
31 Correlation of Crime Parameters and Crime Types (Com-paring the Column Vectors) The motivation for comparingthe different crime parameters to crime types is to determineif there are some characteristics that can tell us about thelikelihood of a crime type to occur in a certain jurisdictionEach crime type may have factors that contribute to a specificcrime appearing in a certain place An overall increase incrime in an area may or may not correlate to an increase inany one particular type of crime say hate crime in that areaHowever there may be individual parameters whose increasemay indicate an increase in a particular type of crime Forexample if juvenile offenders are up in a certain area thismay indicate that hate crimes will also be up in that areaAlso crime type vectors were compared against each other
in a similar way For example crimes like juvenile gang werecompared against hate crime to see if these crime types alsohave a correlation
In order to determine the relationship between the 119895thparameter (119895th column of the Crime Data Matrix) and anyother column (another parameter or another crime type) thecorrelation coefficient (cosine of the angle between the twocolumn vectors over the norm of the vectors) was calculatedEach of the columns of the crimedatamatrix forms a vector ina 121-dimensional space and the vectors can be geometricallycompared with the correlation between two parametersrepresented by the cosine of the angle in this space Forexample the column for the juvenile offender could becompared against the column for hate crime or the columnfor hate crime can be compared to the column for gang-related crimeThese comparisons aremade by calculating theangle between these two columns to determine if there is anyrelationship and how strong that relationship may be Thismethod can assist in determining whether two columns varydirectly inversely or separately
For this comparison of two column vectors the variationin two vectors must be transformed to eliminate the effectsof mean differences Once the mean deviation is determinedthen the correlation can be determined by the cosine ofthe angle between the vectors As an example let 119895 =(1198951 1198952 119895
The sign of the answer is ignored since either a strong positiverelationship (close to 1 meaning that the two angles are inthe same direction and close to one another) or a strongnegative relationship (close to minus1 meaning that the two anglesare in opposite directions but nearly opposite one another)indicates that the vectors appear to be related in some way
In doing this comparison with one vector fixed andcomparing it to all other vectors and itself we form arow vector of size 57 Consider the example that comparesthe hate crime vector with all other vectors We constructanother vector that we refer to as the Hate Crime CharacterVector of size 57 which contains all the cosine 120579
119895(the
correlation coefficient of the 119895th parameter with respect tohate crime) A new vector is formed from all these correlationcoefficients (120572
1 1205722 120572
57) where each of the 57 parameters
has a correlation coefficient vector associated with it Nowconsider the corresponding components in the Hate CrimeCharacter Vector Table 2 tells us how hate crimes are relatedto aggravated assault crimes Among those crimes juvenilegang (04058) has higher correlation with hate crime whilethe others for example argument (01325) and drug dealing(00265) have relatively lower correlation
Table 3 also shows similar evidence that for victimrsquos agethe age group less than 18 is alsomore correlated to hate crime
than the other two age groups Many such comparisons canbe observed using these correlation coefficient vectors
32 Analyses of Jurisdictions (Comparing the Row Vectors)By using the correlation coefficients a 57-dimensional vectortwo data analyses can be performed each of which indicatesthe likelihood of a particular crime for each jurisdictionThe first one is a numerical index called the likelihood indexassigned to each of the 112 jurisdictions The second one isa clustering analysis of all jurisdictions Jurisdictions withsimilar recorded crime patterns (adjusted corresponding tocorrelation coefficients) form clusters
Let 120573 = 1205731 1205732 120573
121 be the likelihood index vector
where 120573119894is the likelihood index between jurisdiction 119894 and a
particular crime (eg hate crime) or crime parameter (egjuvenile offender) For this comparison of two row vectorsthe variation in two vectors must be transformed to eliminatethe effects of mean differences Once the mean deviation isdetermined then the correlation can be determined by thecosine of the angle between the vectors As an example let 119896 =(1198961 1198962 119896
57) be the row for theNorfolk jurisdiction and let
119897 = (1198971 1198972 11989757) be the row for the hate crime indicator To
find the correlation coefficient of these two rows calculate thefollowing
120573119894= cos120601
119894=
sum119895119896119895119897119895
119896 119897
(4)
Table 4 gives the top 30 jurisdictions of Virginia with respectto the hate crime likelihood index and also provides the dataon the clustering of the jurisdictions discussed in the nextsection
The Scientific World Journal 5
Table 2 Hate crime correlation coefficient vector values for othercrimes
Crime type Hate crime correlationcoefficient (120572
119895)
Argument 01325Assault on law enforcement officers 00371Drug dealing 00265Gangland (organized crimeinvolvement) 01565
Juvenile gang 04058Loversrsquo quarrel minus00281
Other felony involved minus00375
Other circumstances 00961Unknown circumstance 01356
Table 3 Hate crime correlation coefficients for age of victim
Age of victim Hate crime correlation coefficient (120572119895)
Age lt 18 0494418 le Age le 60 00540Age gt 60 00474
33 Using the Correlation Coefficients to Cluster the Jurisdic-tions To begin the clustering a weighted complete graphwith 121 vertices is formed The weight on each edge is thecorrelation coefficient between the jurisdictions The novelgraph theory clustering method we proposed in [20] is usedto find all the dense (highly weighted) subgraphs of thecomplete graph A distinguished feature of this methodnonbinary hierarchical tree clearly highlights meaningfulclusters which significantly reduces furthermanual efforts forcluster selections The results of clustering the jurisdictionswith respect to hate crimes are also displayed in Figure 1 andTable 4
It can be seen that of the top 30 jurisdictions with respectto hate crimes 12 of the top 15 are in Cluster A and the othersare in Cluster B The remaining 91 jurisdictions also fall intoCluster B The 12 higher hate crime rate counties (cities) areshowed in Figure 1 (in red and blue)
4 Additional Results
Similar analyses where performedwith respect to other crimetypes drug-dealing juvenile gang and gangland (organizedcrime involvement) The results are summarized in Tables 5ndash10
Table 6 summarizes the drug dealing vector comparisonwith the offender parameter vectors
Tables 5 and 6 summarize the comparison of the drugdealing vector with other crime parameter vectors Each ofthe other crime parameters related to the victim and theoffender was compared to the drug dealing vector A highvalue for this correlation coefficient implies that there is acorrelation between this crime parameter and drug dealingAgain the motivation for comparing the different crimeparameters to crime types is an attempt to determine if there
Table 4 Likelihood index and clustering for hate crime
Name Cluster Hate crimelikelihood index
1 NEWPORT NEWS B 08672 NORFOLK B 08373 CHESAPEAKE B 08104 GREENSVILLE B 07625 PORTSMOUTH B 07596 RICHMOND B 07167 WYTHE B 06818 ALEXANDRIA B 06789 BRISTOL B 066710 NEW KENT B 061811 FAUQUIER B 053112 ROANOKE A 043913 RICHMOND A 038614 WILLIAMSBURG A 037715 VIRGINIA BEACH B 031816 CHARLOTTESVILLE A 031217 PETERSBURG A 028018 SPOTSYLVANIA A 022619 HOPEWELL A 020520 RUSSELL A 012121 CLARKE A 012122 WINCHESTER A 011823 SUFFOLK A 011124 MARTINSVILLE A 007525 STAUNTON A 007326 GALAX A 007027 SHENANDOAH A 006328 CAROLINE A 004429 SURRY A minus0020
30 DANVILLE A minus0094
are some characteristics that can tell us about the likelihoodof a crime type to occur in a certain jurisdiction Forexample the correlation between the drug dealing vector andindividual victim is 05789 which is much higher than thecorrelation between drug dealing and victim type business(01721) or victim type societypublic (01729) This wouldseem to imply that for jurisdictions where the individual vic-tim crimes are evident they may also have the likelihood fordrug dealing related crimes The tables show the correlationcoefficient values the correlation values in bold show a highercorrelation with the drug dealing crime type than the otherparameters in that category
Table 7 summarizes the drug dealing vector comparisonwith the other crime types The highest correlation is withargument
Each of the other crime parameters related to the victimand the offender was compared to the gangland (organized
6 The Scientific World Journal
Table 5 Drug dealing correlation with other crime parametersrelated to victim
Drug dealing correlation coefficientType of victim
individual 05789Business 01721Societypublic 01729
Age of victimAge lt 18 02933Age ge 60 0387218 le age lt 60 05993
Sex of victimMale 05774Female 04989
Race of victimWhite 03906Black 05028AsiaPacific Islander 00764Unknown 00003American Indian 00393
Ethnicity of victimHispanic origin 00688Not of Hispanic origin 06009Unknown minus00616
Resident status of victimNonresident 02631Resident 05740Unknown 01290
Table 6 Drug dealing correlation with other crime parametersrelated to offender
Offender ageAge lt 18 03514Age ge 60 0376218 le age lt 60 05678
crime involvement) vector There were no significant rela-tionships to report from this comparison
Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect
Table 7 Drug dealing correlation with other crime types
Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265
Table 8 Likelihood index and clustering for drug dealing
Name Cluster Drug dealinglikelihood index
1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395
to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the
The Scientific World Journal 7
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
The sign of the answer is ignored since either a strong positiverelationship (close to 1 meaning that the two angles are inthe same direction and close to one another) or a strongnegative relationship (close to minus1 meaning that the two anglesare in opposite directions but nearly opposite one another)indicates that the vectors appear to be related in some way
In doing this comparison with one vector fixed andcomparing it to all other vectors and itself we form arow vector of size 57 Consider the example that comparesthe hate crime vector with all other vectors We constructanother vector that we refer to as the Hate Crime CharacterVector of size 57 which contains all the cosine 120579
119895(the
correlation coefficient of the 119895th parameter with respect tohate crime) A new vector is formed from all these correlationcoefficients (120572
1 1205722 120572
57) where each of the 57 parameters
has a correlation coefficient vector associated with it Nowconsider the corresponding components in the Hate CrimeCharacter Vector Table 2 tells us how hate crimes are relatedto aggravated assault crimes Among those crimes juvenilegang (04058) has higher correlation with hate crime whilethe others for example argument (01325) and drug dealing(00265) have relatively lower correlation
Table 3 also shows similar evidence that for victimrsquos agethe age group less than 18 is alsomore correlated to hate crime
than the other two age groups Many such comparisons canbe observed using these correlation coefficient vectors
32 Analyses of Jurisdictions (Comparing the Row Vectors)By using the correlation coefficients a 57-dimensional vectortwo data analyses can be performed each of which indicatesthe likelihood of a particular crime for each jurisdictionThe first one is a numerical index called the likelihood indexassigned to each of the 112 jurisdictions The second one isa clustering analysis of all jurisdictions Jurisdictions withsimilar recorded crime patterns (adjusted corresponding tocorrelation coefficients) form clusters
Let 120573 = 1205731 1205732 120573
121 be the likelihood index vector
where 120573119894is the likelihood index between jurisdiction 119894 and a
particular crime (eg hate crime) or crime parameter (egjuvenile offender) For this comparison of two row vectorsthe variation in two vectors must be transformed to eliminatethe effects of mean differences Once the mean deviation isdetermined then the correlation can be determined by thecosine of the angle between the vectors As an example let 119896 =(1198961 1198962 119896
57) be the row for theNorfolk jurisdiction and let
119897 = (1198971 1198972 11989757) be the row for the hate crime indicator To
find the correlation coefficient of these two rows calculate thefollowing
120573119894= cos120601
119894=
sum119895119896119895119897119895
119896 119897
(4)
Table 4 gives the top 30 jurisdictions of Virginia with respectto the hate crime likelihood index and also provides the dataon the clustering of the jurisdictions discussed in the nextsection
The Scientific World Journal 5
Table 2 Hate crime correlation coefficient vector values for othercrimes
Crime type Hate crime correlationcoefficient (120572
119895)
Argument 01325Assault on law enforcement officers 00371Drug dealing 00265Gangland (organized crimeinvolvement) 01565
Juvenile gang 04058Loversrsquo quarrel minus00281
Other felony involved minus00375
Other circumstances 00961Unknown circumstance 01356
Table 3 Hate crime correlation coefficients for age of victim
Age of victim Hate crime correlation coefficient (120572119895)
Age lt 18 0494418 le Age le 60 00540Age gt 60 00474
33 Using the Correlation Coefficients to Cluster the Jurisdic-tions To begin the clustering a weighted complete graphwith 121 vertices is formed The weight on each edge is thecorrelation coefficient between the jurisdictions The novelgraph theory clustering method we proposed in [20] is usedto find all the dense (highly weighted) subgraphs of thecomplete graph A distinguished feature of this methodnonbinary hierarchical tree clearly highlights meaningfulclusters which significantly reduces furthermanual efforts forcluster selections The results of clustering the jurisdictionswith respect to hate crimes are also displayed in Figure 1 andTable 4
It can be seen that of the top 30 jurisdictions with respectto hate crimes 12 of the top 15 are in Cluster A and the othersare in Cluster B The remaining 91 jurisdictions also fall intoCluster B The 12 higher hate crime rate counties (cities) areshowed in Figure 1 (in red and blue)
4 Additional Results
Similar analyses where performedwith respect to other crimetypes drug-dealing juvenile gang and gangland (organizedcrime involvement) The results are summarized in Tables 5ndash10
Table 6 summarizes the drug dealing vector comparisonwith the offender parameter vectors
Tables 5 and 6 summarize the comparison of the drugdealing vector with other crime parameter vectors Each ofthe other crime parameters related to the victim and theoffender was compared to the drug dealing vector A highvalue for this correlation coefficient implies that there is acorrelation between this crime parameter and drug dealingAgain the motivation for comparing the different crimeparameters to crime types is an attempt to determine if there
Table 4 Likelihood index and clustering for hate crime
Name Cluster Hate crimelikelihood index
1 NEWPORT NEWS B 08672 NORFOLK B 08373 CHESAPEAKE B 08104 GREENSVILLE B 07625 PORTSMOUTH B 07596 RICHMOND B 07167 WYTHE B 06818 ALEXANDRIA B 06789 BRISTOL B 066710 NEW KENT B 061811 FAUQUIER B 053112 ROANOKE A 043913 RICHMOND A 038614 WILLIAMSBURG A 037715 VIRGINIA BEACH B 031816 CHARLOTTESVILLE A 031217 PETERSBURG A 028018 SPOTSYLVANIA A 022619 HOPEWELL A 020520 RUSSELL A 012121 CLARKE A 012122 WINCHESTER A 011823 SUFFOLK A 011124 MARTINSVILLE A 007525 STAUNTON A 007326 GALAX A 007027 SHENANDOAH A 006328 CAROLINE A 004429 SURRY A minus0020
30 DANVILLE A minus0094
are some characteristics that can tell us about the likelihoodof a crime type to occur in a certain jurisdiction Forexample the correlation between the drug dealing vector andindividual victim is 05789 which is much higher than thecorrelation between drug dealing and victim type business(01721) or victim type societypublic (01729) This wouldseem to imply that for jurisdictions where the individual vic-tim crimes are evident they may also have the likelihood fordrug dealing related crimes The tables show the correlationcoefficient values the correlation values in bold show a highercorrelation with the drug dealing crime type than the otherparameters in that category
Table 7 summarizes the drug dealing vector comparisonwith the other crime types The highest correlation is withargument
Each of the other crime parameters related to the victimand the offender was compared to the gangland (organized
6 The Scientific World Journal
Table 5 Drug dealing correlation with other crime parametersrelated to victim
Drug dealing correlation coefficientType of victim
individual 05789Business 01721Societypublic 01729
Age of victimAge lt 18 02933Age ge 60 0387218 le age lt 60 05993
Sex of victimMale 05774Female 04989
Race of victimWhite 03906Black 05028AsiaPacific Islander 00764Unknown 00003American Indian 00393
Ethnicity of victimHispanic origin 00688Not of Hispanic origin 06009Unknown minus00616
Resident status of victimNonresident 02631Resident 05740Unknown 01290
Table 6 Drug dealing correlation with other crime parametersrelated to offender
Offender ageAge lt 18 03514Age ge 60 0376218 le age lt 60 05678
crime involvement) vector There were no significant rela-tionships to report from this comparison
Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect
Table 7 Drug dealing correlation with other crime types
Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265
Table 8 Likelihood index and clustering for drug dealing
Name Cluster Drug dealinglikelihood index
1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395
to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the
The Scientific World Journal 7
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
Table 2 Hate crime correlation coefficient vector values for othercrimes
Crime type Hate crime correlationcoefficient (120572
119895)
Argument 01325Assault on law enforcement officers 00371Drug dealing 00265Gangland (organized crimeinvolvement) 01565
Juvenile gang 04058Loversrsquo quarrel minus00281
Other felony involved minus00375
Other circumstances 00961Unknown circumstance 01356
Table 3 Hate crime correlation coefficients for age of victim
Age of victim Hate crime correlation coefficient (120572119895)
Age lt 18 0494418 le Age le 60 00540Age gt 60 00474
33 Using the Correlation Coefficients to Cluster the Jurisdic-tions To begin the clustering a weighted complete graphwith 121 vertices is formed The weight on each edge is thecorrelation coefficient between the jurisdictions The novelgraph theory clustering method we proposed in [20] is usedto find all the dense (highly weighted) subgraphs of thecomplete graph A distinguished feature of this methodnonbinary hierarchical tree clearly highlights meaningfulclusters which significantly reduces furthermanual efforts forcluster selections The results of clustering the jurisdictionswith respect to hate crimes are also displayed in Figure 1 andTable 4
It can be seen that of the top 30 jurisdictions with respectto hate crimes 12 of the top 15 are in Cluster A and the othersare in Cluster B The remaining 91 jurisdictions also fall intoCluster B The 12 higher hate crime rate counties (cities) areshowed in Figure 1 (in red and blue)
4 Additional Results
Similar analyses where performedwith respect to other crimetypes drug-dealing juvenile gang and gangland (organizedcrime involvement) The results are summarized in Tables 5ndash10
Table 6 summarizes the drug dealing vector comparisonwith the offender parameter vectors
Tables 5 and 6 summarize the comparison of the drugdealing vector with other crime parameter vectors Each ofthe other crime parameters related to the victim and theoffender was compared to the drug dealing vector A highvalue for this correlation coefficient implies that there is acorrelation between this crime parameter and drug dealingAgain the motivation for comparing the different crimeparameters to crime types is an attempt to determine if there
Table 4 Likelihood index and clustering for hate crime
Name Cluster Hate crimelikelihood index
1 NEWPORT NEWS B 08672 NORFOLK B 08373 CHESAPEAKE B 08104 GREENSVILLE B 07625 PORTSMOUTH B 07596 RICHMOND B 07167 WYTHE B 06818 ALEXANDRIA B 06789 BRISTOL B 066710 NEW KENT B 061811 FAUQUIER B 053112 ROANOKE A 043913 RICHMOND A 038614 WILLIAMSBURG A 037715 VIRGINIA BEACH B 031816 CHARLOTTESVILLE A 031217 PETERSBURG A 028018 SPOTSYLVANIA A 022619 HOPEWELL A 020520 RUSSELL A 012121 CLARKE A 012122 WINCHESTER A 011823 SUFFOLK A 011124 MARTINSVILLE A 007525 STAUNTON A 007326 GALAX A 007027 SHENANDOAH A 006328 CAROLINE A 004429 SURRY A minus0020
30 DANVILLE A minus0094
are some characteristics that can tell us about the likelihoodof a crime type to occur in a certain jurisdiction Forexample the correlation between the drug dealing vector andindividual victim is 05789 which is much higher than thecorrelation between drug dealing and victim type business(01721) or victim type societypublic (01729) This wouldseem to imply that for jurisdictions where the individual vic-tim crimes are evident they may also have the likelihood fordrug dealing related crimes The tables show the correlationcoefficient values the correlation values in bold show a highercorrelation with the drug dealing crime type than the otherparameters in that category
Table 7 summarizes the drug dealing vector comparisonwith the other crime types The highest correlation is withargument
Each of the other crime parameters related to the victimand the offender was compared to the gangland (organized
6 The Scientific World Journal
Table 5 Drug dealing correlation with other crime parametersrelated to victim
Drug dealing correlation coefficientType of victim
individual 05789Business 01721Societypublic 01729
Age of victimAge lt 18 02933Age ge 60 0387218 le age lt 60 05993
Sex of victimMale 05774Female 04989
Race of victimWhite 03906Black 05028AsiaPacific Islander 00764Unknown 00003American Indian 00393
Ethnicity of victimHispanic origin 00688Not of Hispanic origin 06009Unknown minus00616
Resident status of victimNonresident 02631Resident 05740Unknown 01290
Table 6 Drug dealing correlation with other crime parametersrelated to offender
Offender ageAge lt 18 03514Age ge 60 0376218 le age lt 60 05678
crime involvement) vector There were no significant rela-tionships to report from this comparison
Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect
Table 7 Drug dealing correlation with other crime types
Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265
Table 8 Likelihood index and clustering for drug dealing
Name Cluster Drug dealinglikelihood index
1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395
to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the
The Scientific World Journal 7
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
crime involvement) vector There were no significant rela-tionships to report from this comparison
Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect
Table 7 Drug dealing correlation with other crime types
Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265
Table 8 Likelihood index and clustering for drug dealing
Name Cluster Drug dealinglikelihood index
1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395
to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the
The Scientific World Journal 7
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)
Name Cluster Ganglandlikelihood index
1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071
26 FRANKLIN A minus00508
27 SUSSEX A minus00763
28 MANASSAS A minus00812
29 WILLIAMSBURG A minus01001
30 GOOCHLAND A minus01112
gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison
Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison
Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to
Table 10 Likelihood index and clustering for juvenile gang
Name Cluster Juvenile ganglikelihood index
1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773
the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30
5 Conclusion
The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation
The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering
8 The Scientific World Journal
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011
method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions
The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime
Conflict of Interests
Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper
Acknowledgments
First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)
References
[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999
[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999
[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999
[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999
[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008
[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004
[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer
National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS
[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009
[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998
[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998
[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999
[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999
[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002
[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004
[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007
[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007
[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979
[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001
[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002
[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011