Research Article Analyses of Crime Patterns in …downloads.hindawi.com/journals/tswj/2014/492461.pdfResearch Article Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph

Research ArticleAnalyses of Crime Patterns in NIBRS Data Based on a NovelGraph Theory Clustering Method Virginia as a Case Study

Peixin Zhao1 Marjorie Darrah2 Jim Nolan3 and Cun-Quan Zhang2

1 School of Management Shandong University Jinan Shandong China2Department of Mathematics West Virginia University Morgantown WV USA3Department of Sociology and Anthropology West Virginia University Morgantown WV USA

Correspondence should be addressed to Peixin Zhao pxzhao126com

Received 28 January 2014 Accepted 25 February 2014 Published 20 March 2014

Academic Editors Y Mao X Meng and J Zhou

Copyright copy 2014 Peixin Zhao et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data whichinclude the determination of correlation of different crime types the development of a likelihood index for crimes to occur in ajurisdiction and the clustering of jurisdictions based on crime type The method was tested by using the 2005 assault data from121 jurisdictions in Virginia as a test case The analyses of these data show that some different crime types are correlated and somedifferent crime parameters are correlatedwith different crime typesThe analyses also show that certain jurisdictionswithinVirginiashare certain crime patterns This information assists with constructing a pattern for a specific crime type and can be used todetermine whether a jurisdiction may be more likely to see this type of crime occur in their area

1 Introduction

The National Incident-Based Reporting System (NIBRS) isa crime reporting program for local state and federal lawenforcement agencies that provides a wealth of incident leveldata for use in analysis It is part of the Uniform CrimeReporting (UCR) Programwhich is administered by the FBITheUCRProgramprovides a nationwide view of crime basedon data submitted through state programs or directly to thenational UCR Program and has been operational for around70 years The NIBRS was implemented in the late 1970s tomeet law enforcement need for the 21st century This vastsystem houses information on offenses victims offendersproperty and persons arrested as well as the incident itselfThe data of NIBRS are well structured and readily availablefor researchers and law enforcement agencies to assist withunderstanding the intricate nature of crime

Akiyama and Nolan [1] outlined the structure of theNIBRS data set and provided methods for understandingand analyzing the data Dunn and Zelenock [2] also describeand test procedures to facilitate the use of this vast systemBuilding on these initial works many authors continue

to investigate how this storehouse of data can be turnedinto useful information for researchers and law enforcementagencies Much of the work employs descriptive statisticsapplied in various sophisticated ways to extract informationfrom the files For example Thompson et al [3] applydescriptive statistics to examine intimate partner violenceand make connections between this crime and other crimesthat occurred in the same incident They were able to showa link between intimidation and more serious violent crimesand extract information about the relationships between thevictim and the offender from the NIRBS data that helpsto better understand this type of crime Snyder [4] usedlogistic regression techniques to predict the arrest of juvenilerobbery offenders More recently Addington and Rennison[5] used logistic regressionmodels along with the NIBRS andthe National Crime Victimization Survey (NCVS) data forpredicting rape cooccurrence to provide a critical initial lookat rapes that occur with other crimes

For criminologists the NIBRS data holds the answersto many long-standing questions about crime criminaloffending and crime victimization However gaining accessto some of these answers has remained difficult because of

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 492461 8 pageshttpdxdoiorg1011552014492461

2 The Scientific World Journal

the size and complexity of the data Effective techniquessuch as data mining and clustering for criminal justice dataare of increasing importance to both the research and lawenforcement communities [6] In recent years clusteringcategorical data has gainedmore importance because it is oneof the fundamental methods in data mining [7] Clusteringcrime data as with other categorical data is unsupervisedlearning that aims at partitioning a data set into groups ofsimilar items The goal is to create clusters of data objectswhere the within-cluster similarity is maximized and thebetween-cluster similarity is minimized Over the yearsmany clustering algorithms have been developed and testedSome clustering techniques have developed specifically foruse with categorical data Abdu [8] presented three newclustering algorithms thatwere applied to the clustering of theNIBRS data Two of his approaches combine spectral analysisand clustering techniques that are scalable to large data setssuch as the NIBRS

Clustering categorical data poses a challenge not encoun-tered in clustering numerical data because the attributecategories are not ordered and defining a metric with whichto measure the distance between data objects in a dataset becomes a challenge Many of the algorithms that haveemerged for clustering categorical data rely on the occur-rencecooccurrence frequencies of attribute values in thedata set to determine clusters of similar data objects Thebasic goal is to choose a set of attribute categories thatprovide a summary of the data objects in a cluster There area wide range of clustering algorithms for categorical dataincluding K-modes [9] STIRR [10] CACTUS [11] ROCK[12] COOLCAT [13] LIMBO [14] and CLICKS [15] Agood summary can be found in [8] The clustering algorithmused in this research is a mathematically well-defined modelimplemented in a polynomial time algorithm that guaranteesan optimal solution [16]

Due to the lack of well-defined mathematical modelsand optimization goals most existing graph theory clusteringapproaches could not guarantee a proper clustering result ingeneral cases For example agglomerative hierarchical clus-tering methods could not produce proper clusters with largersizes while divisive hierarchical methods could not produceclusters with smaller sizes and clusters with large differencein their sizes and k-core method may produce clusters withsmall edge-cuts and so forth Many papers and articles havementioned these problems and frustration among users (egsee [17ndash19]) Even the most popular commercial softwareSAS is unable to produce proper outputs for some simpledata

The purpose of this paper is to present a novel multidi-mensional clustering method for the NIBRS data We firstlyoutlines a newmeasure called the likelihood index that helpsexamine quantitatively how likely a crime is to occur in aparticular jurisdiction This measure compares a vector thatdescribes a jurisdiction with a vector that represents a crimetype Then according to the defined distance between thesetwo vectors we can determine how closely the jurisdictionaligns with that crime type The data used in this studywere obtained from the 2005 NIBRS which is stored at theNational Archive for Criminal Justice Data at the University

Table 1 NIBRS indexes used

Segment Index Number ofsubindexes

Victim indexes

Type of victim 3Victim age 3Victim sex 2Victim race 5Victim ethnicity 3Victim residence status 3Aggravated assaulthomicidecircumstances 10

Type of injury 6

Offender indexesOffender age 3Offender sex 3Offender race 5

Additionalindexes

Injury 2Juvenile 1Violent crime 1Juvict 1Multiple victims 1Multiple offenders 1Multiple offenders and victims 1Multiple offenders and onevictim 1

One offender and multiplevictims 1

One offender and one victims 1

of Michigan This work explores the following researchquestions Do specific crimes exhibit certain quantifiablecharacteristics Do different types of crimes share similarquantifiable characteristics Do jurisdictions of a state clusterwith respect to different crime types What is the likelihoodthat if one type of crime is occurring in an area other types ofcrime with similar quantifiable characteristic will also occurin that area

The rest of the paper is organized as follows In Section 2we summarize the data unit of analysis and preparation InSection 3 we introduce the methods to deal with the datamatrix and take Virginia as a case study Section 4 providessome additional results and Section 5 gives the conclusionsof the research

2 Data Unit of Analysis and Preparation

The data sets available in the NIBRS provide a wealthof incident level data about each reported crime As for2010 approximately 40 states contribute their data to themassive data set The data and tools are made available byUniversity of Michigan for use by law enforcement agenciesand researchers In order to devise a manageable set of datafor preliminary testing of techniques and for preliminary dataanalysis only the 2005 data on assaults were explored Fromthe 2005 assault data 121 jurisdictions (counties or cities) in

The Scientific World Journal 3

Virginia were selected for examination These represent alljurisdictions within Virginia with populations greater than10000 There were 10183 incidents reported in these 121chosen jurisdictions

For this study 21 indexes from the NIBRS were chosenfrom the 246 available indexesThese 21 indexes were deemedimportant to provide the relevant characteristics of the vic-tim(s) offender(s) and the circumstances of each incidentThe selected particular indexes were listed in Table 1

In order to facilitate the selected analysis techniques thedata was expanded from one column with many possibleentries to multi columns that contained zero or one Forexample the Offender Segment index contains the sex ofthe offender and has the possible entries of male female orunknown This index column was split into three individualcolumns where an entry in the three columns of (1 0 0) meansfemale (0 1 0) means male and (0 0 1) means unknownThisturns the column for sex of the offender to three columns Allcreated columns were binary (01) columns that were usedto help classify the characteristics of the incident From theexpansion of the original 21 indexes 57 binary columns were

created This led to the creation of a 121 times 57 Crime DataMatrix where each row 119894 represents the 119894th jurisdiction andeach column 119895 represents the 119895th parameter related to theincident (eg offender sex victim resident status) or a crimetype (eg hate crime drug dealing) The Crime Data Matrixconstruction is pictured below

Crime Data Matrix = [[

[

11988611 119886

157

d

1198861211 11988612157

]]

]

(1)

where 119886119894119895

is the entry of the 119895th crime index from the 119894thjurisdiction

Normalization of the rows of thematrixwas completed bydividing each row entry by the population of the jurisdictionThis gave a per person rate for each crime parameter or eachcrime type Normalization of the columns was completedby averaging the columns and subtracting the average fromeach entry in the column Then each entry in the columnwas divided by the vector length of the column Equation (2)shows these normalization step-by-step operations

119886119894119895larr997888

number of occurances of crime parameter or crime typepopulation of the jurisdiction

119886119894119895larr997888 119886119894119895minus 119892119895

where 119892119895is the average of the 119895th column

119886119894119895larr997888

119886119894119895

119899119895

where 119899119895is the vector length of the 119895th column

(2)

3 Finding Patterns in the Data

This section explains several different analyses that wereperformed on the data in thematrix described above in orderto attempt to answer the research questions listed in Section 1These analyses include comparing the columns of the matrixto determine the correlation of crime parameters to crimetypes and to determine the correlation of different crimetypes Also there were two analyses performed comparing therow vectors to develop the likelihood index and to cluster thejurisdictions by crime types

31 Correlation of Crime Parameters and Crime Types (Com-paring the Column Vectors) The motivation for comparingthe different crime parameters to crime types is to determineif there are some characteristics that can tell us about thelikelihood of a crime type to occur in a certain jurisdictionEach crime type may have factors that contribute to a specificcrime appearing in a certain place An overall increase incrime in an area may or may not correlate to an increase inany one particular type of crime say hate crime in that areaHowever there may be individual parameters whose increasemay indicate an increase in a particular type of crime Forexample if juvenile offenders are up in a certain area thismay indicate that hate crimes will also be up in that areaAlso crime type vectors were compared against each other

in a similar way For example crimes like juvenile gang werecompared against hate crime to see if these crime types alsohave a correlation

In order to determine the relationship between the 119895thparameter (119895th column of the Crime Data Matrix) and anyother column (another parameter or another crime type) thecorrelation coefficient (cosine of the angle between the twocolumn vectors over the norm of the vectors) was calculatedEach of the columns of the crimedatamatrix forms a vector ina 121-dimensional space and the vectors can be geometricallycompared with the correlation between two parametersrepresented by the cosine of the angle in this space Forexample the column for the juvenile offender could becompared against the column for hate crime or the columnfor hate crime can be compared to the column for gang-related crimeThese comparisons aremade by calculating theangle between these two columns to determine if there is anyrelationship and how strong that relationship may be Thismethod can assist in determining whether two columns varydirectly inversely or separately

For this comparison of two column vectors the variationin two vectors must be transformed to eliminate the effectsof mean differences Once the mean deviation is determinedthen the correlation can be determined by the cosine ofthe angle between the vectors As an example let 119895 =(1198951 1198952 119895

121) be the column for juvenile offender and let


Independent cities(1) Alexandria(2) Bedford(3) Bristol(4) Buena Vista(5) Charlottesville(6) Chesapeake(7) Clifton Forge(8) Colonial Heights(9) Covington(10) Danville

(11) Emporia(12) Fairfax(13) Falls Church(14) Franklin(15) Fredericksburg(16) Galax(17) Hampton(18) Harrisonburg(19) Hopewell(20) Lexington

(21) Lynchburg(22) Manassas(23) Manassas Park(24) Martinsville(25) Newport News(26) Norfolk(27) Norton(28) Petersburg(29) Poquoson(30) Portsmouth

(31) Radford(32) Richmond(33) Roanoke(34) Salem(35) Staunton(36) Suffolk(37) Virginia Beach(38) Waynesboro(39) Williamsburg(40) Winchester

Figure 1 The clustering results of 121 counties in Virginia according to hate crime

ℎ = (ℎ1 ℎ2 ℎ

121) be the column for the hate crime

indicator To find the correlation coefficient of this twocolumns calculate the following

120572119895= cos 120579

119895=

sum119894119895119894ℎ119894

10038171003817100381710038171198951003817100381710038171003817ℎ

(3)

The sign of the answer is ignored since either a strong positiverelationship (close to 1 meaning that the two angles are inthe same direction and close to one another) or a strongnegative relationship (close to minus1 meaning that the two anglesare in opposite directions but nearly opposite one another)indicates that the vectors appear to be related in some way

In doing this comparison with one vector fixed andcomparing it to all other vectors and itself we form arow vector of size 57 Consider the example that comparesthe hate crime vector with all other vectors We constructanother vector that we refer to as the Hate Crime CharacterVector of size 57 which contains all the cosine 120579

119895(the

correlation coefficient of the 119895th parameter with respect tohate crime) A new vector is formed from all these correlationcoefficients (120572

1 1205722 120572

57) where each of the 57 parameters

has a correlation coefficient vector associated with it Nowconsider the corresponding components in the Hate CrimeCharacter Vector Table 2 tells us how hate crimes are relatedto aggravated assault crimes Among those crimes juvenilegang (04058) has higher correlation with hate crime whilethe others for example argument (01325) and drug dealing(00265) have relatively lower correlation

Table 3 also shows similar evidence that for victimrsquos agethe age group less than 18 is alsomore correlated to hate crime

than the other two age groups Many such comparisons canbe observed using these correlation coefficient vectors

32 Analyses of Jurisdictions (Comparing the Row Vectors)By using the correlation coefficients a 57-dimensional vectortwo data analyses can be performed each of which indicatesthe likelihood of a particular crime for each jurisdictionThe first one is a numerical index called the likelihood indexassigned to each of the 112 jurisdictions The second one isa clustering analysis of all jurisdictions Jurisdictions withsimilar recorded crime patterns (adjusted corresponding tocorrelation coefficients) form clusters

Let 120573 = 1205731 1205732 120573

121 be the likelihood index vector

where 120573119894is the likelihood index between jurisdiction 119894 and a

particular crime (eg hate crime) or crime parameter (egjuvenile offender) For this comparison of two row vectorsthe variation in two vectors must be transformed to eliminatethe effects of mean differences Once the mean deviation isdetermined then the correlation can be determined by thecosine of the angle between the vectors As an example let 119896 =(1198961 1198962 119896

57) be the row for theNorfolk jurisdiction and let

119897 = (1198971 1198972 11989757) be the row for the hate crime indicator To

find the correlation coefficient of these two rows calculate thefollowing

120573119894= cos120601

119894=

sum119895119896119895119897119895

119896 119897

(4)

Table 4 gives the top 30 jurisdictions of Virginia with respectto the hate crime likelihood index and also provides the dataon the clustering of the jurisdictions discussed in the nextsection


Table 2 Hate crime correlation coefficient vector values for othercrimes

Crime type Hate crime correlationcoefficient (120572

119895)

Argument 01325Assault on law enforcement officers 00371Drug dealing 00265Gangland (organized crimeinvolvement) 01565

Juvenile gang 04058Loversrsquo quarrel minus00281

Other felony involved minus00375

Other circumstances 00961Unknown circumstance 01356

Table 3 Hate crime correlation coefficients for age of victim

Age of victim Hate crime correlation coefficient (120572119895)

Age lt 18 0494418 le Age le 60 00540Age gt 60 00474

33 Using the Correlation Coefficients to Cluster the Jurisdic-tions To begin the clustering a weighted complete graphwith 121 vertices is formed The weight on each edge is thecorrelation coefficient between the jurisdictions The novelgraph theory clustering method we proposed in [20] is usedto find all the dense (highly weighted) subgraphs of thecomplete graph A distinguished feature of this methodnonbinary hierarchical tree clearly highlights meaningfulclusters which significantly reduces furthermanual efforts forcluster selections The results of clustering the jurisdictionswith respect to hate crimes are also displayed in Figure 1 andTable 4

It can be seen that of the top 30 jurisdictions with respectto hate crimes 12 of the top 15 are in Cluster A and the othersare in Cluster B The remaining 91 jurisdictions also fall intoCluster B The 12 higher hate crime rate counties (cities) areshowed in Figure 1 (in red and blue)

4 Additional Results

Similar analyses where performedwith respect to other crimetypes drug-dealing juvenile gang and gangland (organizedcrime involvement) The results are summarized in Tables 5ndash10

Table 6 summarizes the drug dealing vector comparisonwith the offender parameter vectors

Tables 5 and 6 summarize the comparison of the drugdealing vector with other crime parameter vectors Each ofthe other crime parameters related to the victim and theoffender was compared to the drug dealing vector A highvalue for this correlation coefficient implies that there is acorrelation between this crime parameter and drug dealingAgain the motivation for comparing the different crimeparameters to crime types is an attempt to determine if there

Table 4 Likelihood index and clustering for hate crime

Name Cluster Hate crimelikelihood index

1 NEWPORT NEWS B 08672 NORFOLK B 08373 CHESAPEAKE B 08104 GREENSVILLE B 07625 PORTSMOUTH B 07596 RICHMOND B 07167 WYTHE B 06818 ALEXANDRIA B 06789 BRISTOL B 066710 NEW KENT B 061811 FAUQUIER B 053112 ROANOKE A 043913 RICHMOND A 038614 WILLIAMSBURG A 037715 VIRGINIA BEACH B 031816 CHARLOTTESVILLE A 031217 PETERSBURG A 028018 SPOTSYLVANIA A 022619 HOPEWELL A 020520 RUSSELL A 012121 CLARKE A 012122 WINCHESTER A 011823 SUFFOLK A 011124 MARTINSVILLE A 007525 STAUNTON A 007326 GALAX A 007027 SHENANDOAH A 006328 CAROLINE A 004429 SURRY A minus0020

30 DANVILLE A minus0094

are some characteristics that can tell us about the likelihoodof a crime type to occur in a certain jurisdiction Forexample the correlation between the drug dealing vector andindividual victim is 05789 which is much higher than thecorrelation between drug dealing and victim type business(01721) or victim type societypublic (01729) This wouldseem to imply that for jurisdictions where the individual vic-tim crimes are evident they may also have the likelihood fordrug dealing related crimes The tables show the correlationcoefficient values the correlation values in bold show a highercorrelation with the drug dealing crime type than the otherparameters in that category

Table 7 summarizes the drug dealing vector comparisonwith the other crime types The highest correlation is withargument

Each of the other crime parameters related to the victimand the offender was compared to the gangland (organized


Table 5 Drug dealing correlation with other crime parametersrelated to victim

Drug dealing correlation coefficientType of victim

individual 05789Business 01721Societypublic 01729

Age of victimAge lt 18 02933Age ge 60 0387218 le age lt 60 05993

Sex of victimMale 05774Female 04989

Race of victimWhite 03906Black 05028AsiaPacific Islander 00764Unknown 00003American Indian 00393

Ethnicity of victimHispanic origin 00688Not of Hispanic origin 06009Unknown minus00616

Resident status of victimNonresident 02631Resident 05740Unknown 01290

Table 6 Drug dealing correlation with other crime parametersrelated to offender

Offender ageAge lt 18 03514Age ge 60 0376218 le age lt 60 05678

Offender sexMale 05331Female 05685Unknown 01972

Offender raceWhite 03145Black 05133American IndianAlaskan native minus00479Unknown 01948AsianPacific Islander 0113

crime involvement) vector There were no significant rela-tionships to report from this comparison

Table 8 gives the top 30 jurisdictions of Virginia withrespect to the drug dealing likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect

Table 7 Drug dealing correlation with other crime types

Other crime typesArgument 04854Assault on law enforcement officer(s) 03117Gangland (organized crime involvement) 03456Juvenile gang 00706Loversrsquo quarrel 02967Other felony involved 02955Hate crime 00265

Table 8 Likelihood index and clustering for drug dealing

Name Cluster Drug dealinglikelihood index

1 CHARLOTTESVILLE B 090942 NEWPORT NEWS B 090123 CHESAPEAKE B 088514 PETERSBURG B 086375 RICHMOND B 083656 PORTSMOUTH A 082817 HOPEWELL A 079178 ROANOKE B 07919 SUFFOLK B 0783910 NORFOLK B 0775611 BRISTOL B 0736512 DANVILLE A 070713 GREENSVILLE A 0703214 GALAX A 0688115 CAROLINE A 0685216 SUSSEX A 0639617 WINCHESTER A 0632718 FREDERICKSBURG B 0603119 CLARKE A 0602120 FRANKLIN A 0580821 RICHMOND A 0566722 LYNCHBURG A 0539523 MECKLENBURG A 0518524 RADFORD A 0513325 GOOCHLAND A 0511426 MANASSAS A 049427 HENRY B 0484428 NORTON A 0445629 WISE B 0438130 WILLIAMSBURG A 0395

to the likelihood index twelve are clustered together Onlythree others jurisdictions from Cluster B appear outside thetop 30 these jurisdictions are Smyth (35) Tazewell (37)and Pittsylvania (50) Each of the other crime parametersrelated to the victim and the offender was compared to the


Table 9 Likelihood index and clustering for gangland (organizedcrime involvement)

Name Cluster Ganglandlikelihood index

1 NEWPORT NEWS B 092172 ROANOKE B 087863 CHESAPEAKE B 087134 PETERSBURG B 084215 NORFOLK B 081296 RICHMOND B 079537 LYNCHBURG B 069658 FREDERICKSBURG B 06899 BRISTOL B 0678710 ALEXANDRIA B 0656911 NORTHAMPTON B 063212 LOUDOUN B 0493913 CHARLOTTESVILLE A 0415214 STAFFORD B 0385315 PORTSMOUTH A 0307516 HOPEWELL A 0272417 GALAX A 0212118 CAROLINE A 0137219 SUFFOLK A 0107720 GREENSVILLE A 0081721 CLARKE A 0068922 DANVILLE A 0051923 HENRY A 0051824 RICHMOND A 0007425 WINCHESTER A minus00071

26 FRANKLIN A minus00508

27 SUSSEX A minus00763

28 MANASSAS A minus00812

29 WILLIAMSBURG A minus01001

30 GOOCHLAND A minus01112

gangland (organized crime involvement) vector There wereno significant relationships to report from this comparison

Table 9 gives the top 30 jurisdictions of Virginia withrespect to the gangland (organized crime involvement) like-lihood index and also shows how the jurisdictions are clus-tered Given the 121 Virginia jurisdictions being consideredof the top 30 with respect to the likelihood index thirteen areclustered together Only one other jurisdiction from clusterB appears outside the top 30 this one jurisdiction is FairfaxCounty PD (31) Each of the other crime parameters relatedto the victim and the offender was compared to the juvenilegang vector There were no significant relationships to reportfrom this comparison

Table 10 gives the top 30 jurisdictions of Virginia withrespect to the juvenile gang likelihood index and also showshow the jurisdictions are clustered Given the 121 Virginiajurisdictions being considered of the top 30 with respect to

Table 10 Likelihood index and clustering for juvenile gang

Name Cluster Juvenile ganglikelihood index

1 NORFOLK B 084242 ROANOKE B 083873 RICHMOND B 079874 PORTSMOUTH B 079775 GREENSVILLE B 075756 CHESAPEAKE B 072117 NEWPORT NEWS B 071128 WILLIAMSBURG B 066399 WYTHE B 0571910 ALEXANDRIA B 0559711 LYNCHBURG B 0545212 MARTINSVILLE B 0535513 PULASKI B 0488414 HAMPTON B 0484815 SPOTSYLVANIA B 045116 POWHATAN B 044217 PETERSBURG A 0420718 CHARLOTTESVILLE A 0406419 HOPEWELL A 0351920 RICHMOND A 020821 GALAX A 0202522 BRISTOL A 0201223 SUFFOLK A 0192524 CAROLINE A 012125 CLARKE A 0093926 WINCHESTER A 0089327 DANVILLE A 0083428 SUSSEX A 005129 CAMPBELL A 0011930 FRANKLIN A minus00773

the likelihood index sixteen are clustered together No otherjurisdiction from Cluster B appears outside the top 30

5 Conclusion

The NIBRS provides a wealth of incident level data for usein analysis The methods investigated in this research yieldedpromising preliminary results The methods were appliedonly to the assault data from 2005 but can easily be extendedto other crime types and to other years to validate these resultsand also provide longitudinal investigation

The comparison between the crime type vector and theindividual parameters vectors helped in two cases (hatecrimes and drug dealing) to determine which factors wasmore related to those crimes The different types of analysesthat were conducted on the jurisdictions helped to validateone anotherThe likelihood index looked at whether a certaincrime pattern existed in that jurisdiction while the clustering


method sought to cluster all the jurisdictions based on thecrime patterns of that jurisdictionThis information could beuseful to assist law enforcement agencies or policy makersin determining which jurisdictions share common challengesthat could possibly be addressed through cooperation andsharing resources between jurisdictions

The next steps would be to utilize this same approach fordata from other states or perhaps a larger region to examineif the same information is observed from the analyses It willbe interesting to see if Virginia data and other states havethe same patterns or if different patterns emerge Furtherresearch and refinement of these methods should yield toolsthat would provide researchers law enforcement agenciesand government officials with a means to find patterns ofdifferent crime types and possibly identify jurisdictions thatmay be likely to experience that type of crime

Conflict of Interests

Peixin Zhao Marjorie Darrah Jim Nolan and Cun-QuanZhang certify that there is no actual or potential conflict ofinterests in relation to this paper

Acknowledgments

First author was partially supported by the China Post-doctoral Science Foundation Funded Project (2011M501149)the Humanity and Social Science Foundation of Ministryof Education of China (12YJCZH303) the Special FundProject for Postdoctoral Innovation of Shandong Province(201103061) the Informationization Research Project ofShandong Province (2013EI153) the National Statistical Sci-ence Project (2013LZ38) and Independent Innovation Foun-dation of Shandong University (IIFSDU) (IFW12109)

References

[1] Y Akiyama and J Nolan ldquoMethods for understanding andanalyzingNIBRS datardquo Journal of Quantitative Criminology vol15 no 2 pp 225ndash238 1999

[2] C Dunn and T Zelenock ldquoNIBRS data available for secondaryanlaysisrdquo Journal of Quantitative Criminology vol 15 no 2 pp239ndash248 1999

[3] M PThompson L E Saltzman and D Bibel ldquoApplying NIBRSdata to the study of intimate partner violence Massachusetts asa case studyrdquo Journal of Quantitative Criminology vol 15 no 2pp 163ndash180 1999

[4] H N Snyder ldquoThe Overpresentation of juvenile crime pro-portions in robbery clearance statisticsrdquo Journal of QuantitativeCriminology vol 13 no 2 pp 151ndash161 1999

[5] L A Addington and C M Rennison ldquoRape co-occurrence doadditional crimes affect victim reporting and police clearanceof raperdquo Journal of Quantitative Criminology vol 24 no 2 pp205ndash226 2008

[6] H Chen W Chung J J Xu G Wang Y Qin and M ChauldquoCrime data mining a general framework and some examplesrdquoComputer vol 37 no 4 pp 50ndash56 2004

[7] O Maimon and L Rokach ldquoData mining and knowl-edge discovery handbookrdquo New York NY USA Springer

National Incident-Based Reporting System National Archiveof Criminal Justice Data 2005 httpwwwicpsrumicheduNACJDNIBRS

[8] E Abdu Clustering categorical data using data summaries andspectral techniques [PhD thesis] The City University of NewYork 2009

[9] Z Huang ldquoExtensions to the k-means algorithm for clusteringlarge data sets with categorical valuesrdquo Data Mining andKnowledge Discovery vol 2 no 3 pp 283ndash304 1998

[10] D Gibson J Kleinberg and P Raghavan ldquoClustering cate-gorical data an approach based on dynamical systemsrdquo inProceedings of the 24th InternationalConference on Very LargeDatabases 1998

[11] V Ganti J Gehrke and R Ramakrishnan ldquoCACTUS clusteringcategorical data using summariesrdquo in proceedings of the 5thACM SIGKDD International Conference on Knowledge Discov-ery and data mining pp 73ndash83 1999

[12] S Guha R Rastogi and K Shim ldquoROCK a robust clusteringalgorithm for categorical attributesrdquo in Proceedings of the 15thInternational Conference on Data Engineering (ICDE rsquo99) pp512ndash521 March 1999

[13] D Barbara J Couto and Y Li ldquoCOOLCAT an entropy-based algorithm for categorical clusteringrdquo in Proceedings ofthe 11th International Conference on Information and KnowledgeManagement (CIKM rsquo02) pp 582ndash589 November 2002

[14] P Andritsos Scalable clustering of categorical data and applica-tions [PhD thesis] University of Toronto 2004

[15] M Zaki M Peters I Assent and T Seidl ldquoClicks an effectivealgorithm for mining subspace clusters in categorical datasetsrdquoData and Knowledge Engineering vol 60 no 1 pp 51ndash70 2007

[16] Y Ou and C Q Zhang ldquoA new multimembership clusteringmethodrdquo Journal of Industrial and Management Optimizationvol 3 no 4 pp 619ndash624 2007

[17] RN Shepard andPArabie ldquoAdditive clustering representationof similarities as combinations of discrete overlapping proper-tiesrdquo Psychological Review vol 86 no 2 pp 87ndash123 1979

[18] A V Lukashin and R Fuchs ldquoAnalysis of temporal geneexpression profiles clustering by simulated annealing anddetermining the optimal number of clustersrdquo Bioinformaticsvol 17 no 5 pp 405ndash414 2001

[19] Y Xu V Olman and D Xu ldquoClustering gene expression datausing a graph-theoretic approach an application of minimumspanning treesrdquo Bioinformatics vol 18 no 4 pp 536ndash545 2002

[20] P Zhao and C Q Zhang ldquoA new clustering method and itsapplication in social networksrdquo Pattern Recognition Letters vol32 no 15 pp 2109ndash2118 2011

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics


Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of


Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks



the size and complexity of the data Effective techniquessuch as data mining and clustering for criminal justice dataare of increasing importance to both the research and lawenforcement communities [6] In recent years clusteringcategorical data has gainedmore importance because it is oneof the fundamental methods in data mining [7] Clusteringcrime data as with other categorical data is unsupervisedlearning that aims at partitioning a data set into groups ofsimilar items The goal is to create clusters of data objectswhere the within-cluster similarity is maximized and thebetween-cluster similarity is minimized Over the yearsmany clustering algorithms have been developed and testedSome clustering techniques have developed specifically foruse with categorical data Abdu [8] presented three newclustering algorithms thatwere applied to the clustering of theNIBRS data Two of his approaches combine spectral analysisand clustering techniques that are scalable to large data setssuch as the NIBRS

Clustering categorical data poses a challenge not encoun-tered in clustering numerical data because the attributecategories are not ordered and defining a metric with whichto measure the distance between data objects in a dataset becomes a challenge Many of the algorithms that haveemerged for clustering categorical data rely on the occur-rencecooccurrence frequencies of attribute values in thedata set to determine clusters of similar data objects Thebasic goal is to choose a set of attribute categories thatprovide a summary of the data objects in a cluster There area wide range of clustering algorithms for categorical dataincluding K-modes [9] STIRR [10] CACTUS [11] ROCK[12] COOLCAT [13] LIMBO [14] and CLICKS [15] Agood summary can be found in [8] The clustering algorithmused in this research is a mathematically well-defined modelimplemented in a polynomial time algorithm that guaranteesan optimal solution [16]

Due to the lack of well-defined mathematical modelsand optimization goals most existing graph theory clusteringapproaches could not guarantee a proper clustering result ingeneral cases For example agglomerative hierarchical clus-tering methods could not produce proper clusters with largersizes while divisive hierarchical methods could not produceclusters with smaller sizes and clusters with large differencein their sizes and k-core method may produce clusters withsmall edge-cuts and so forth Many papers and articles havementioned these problems and frustration among users (egsee [17ndash19]) Even the most popular commercial softwareSAS is unable to produce proper outputs for some simpledata

The purpose of this paper is to present a novel multidi-mensional clustering method for the NIBRS data We firstlyoutlines a newmeasure called the likelihood index that helpsexamine quantitatively how likely a crime is to occur in aparticular jurisdiction This measure compares a vector thatdescribes a jurisdiction with a vector that represents a crimetype Then according to the defined distance between thesetwo vectors we can determine how closely the jurisdictionaligns with that crime type The data used in this studywere obtained from the 2005 NIBRS which is stored at theNational Archive for Criminal Justice Data at the University

Table 1 NIBRS indexes used

Segment Index Number ofsubindexes

Victim indexes

Type of victim 3Victim age 3Victim sex 2Victim race 5Victim ethnicity 3Victim residence status 3Aggravated assaulthomicidecircumstances 10

Type of injury 6

Offender indexesOffender age 3Offender sex 3Offender race 5

Additionalindexes

Injury 2Juvenile 1Violent crime 1Juvict 1Multiple victims 1Multiple offenders 1Multiple offenders and victims 1Multiple offenders and onevictim 1

One offender and multiplevictims 1

One offender and one victims 1

of Michigan This work explores the following researchquestions Do specific crimes exhibit certain quantifiablecharacteristics Do different types of crimes share similarquantifiable characteristics Do jurisdictions of a state clusterwith respect to different crime types What is the likelihoodthat if one type of crime is occurring in an area other types ofcrime with similar quantifiable characteristic will also occurin that area

The rest of the paper is organized as follows In Section 2we summarize the data unit of analysis and preparation InSection 3 we introduce the methods to deal with the datamatrix and take Virginia as a case study Section 4 providessome additional results and Section 5 gives the conclusionsof the research

2 Data Unit of Analysis and Preparation

The data sets available in the NIBRS provide a wealthof incident level data about each reported crime As for2010 approximately 40 states contribute their data to themassive data set The data and tools are made available byUniversity of Michigan for use by law enforcement agenciesand researchers In order to devise a manageable set of datafor preliminary testing of techniques and for preliminary dataanalysis only the 2005 data on assaults were explored Fromthe 2005 assault data 121 jurisdictions (counties or cities) in







[

11988611 119886

157

d

1198861211 11988612157

]]

]

(1)

where 119886119894119895



119886119894119895larr997888


119886119894119895larr997888 119886119894119895minus 119892119895


119886119894119895larr997888

119886119894119895

119899119895


(2)














ℎ = (ℎ1 ℎ2 ℎ



120572119895= cos 120579

119895=

sum119894119895119894ℎ119894

10038171003817100381710038171198951003817100381710038171003817ℎ

(3)



119895(the


1 1205722 120572






Let 120573 = 1205731 1205732 120573







120573119894= cos120601

119894=

sum119895119896119895119897119895

119896 119897

(4)





119895)


























































5 Conclusion








Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation















[

11988611 119886

157

d

1198861211 11988612157

]]

]

(1)

where 119886119894119895



119886119894119895larr997888


119886119894119895larr997888 119886119894119895minus 119892119895


119886119894119895larr997888

119886119894119895

119899119895


(2)














ℎ = (ℎ1 ℎ2 ℎ



120572119895= cos 120579

119895=

sum119894119895119894ℎ119894

10038171003817100381710038171198951003817100381710038171003817ℎ

(3)



119895(the


1 1205722 120572






Let 120573 = 1205731 1205732 120573







120573119894= cos120601

119894=

sum119895119896119895119897119895

119896 119897

(4)





119895)


























































5 Conclusion








Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation















ℎ = (ℎ1 ℎ2 ℎ



120572119895= cos 120579

119895=

sum119894119895119894ℎ119894

10038171003817100381710038171198951003817100381710038171003817ℎ

(3)



119895(the


1 1205722 120572






Let 120573 = 1205731 1205732 120573







120573119894= cos120601

119894=

sum119895119896119895119897119895

119896 119897

(4)





119895)


























































5 Conclusion








Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation












119895)


























































5 Conclusion








Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation














































5 Conclusion








Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation

























5 Conclusion








Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation














Acknowledgments


References
























RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation











RoboticsJournal of





Journal of



RotatingMachinery





VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation









Research Article Analyses of Crime Patterns in …downloads.hindawi.com/journals/tswj/2014/492461.pdfResearch Article Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph

Documents