International Journal Of Scientific & Engineering Research, Volume 4, Issue 1, January-2013, ISSN 2229-5518
Comparison of Various Classification Techniques for Satellite Data
¹Manoj Pandya, ¹Astha Baxi, ¹M.B. Potdar, ¹M.H. Kalubarme, ²Bijendra Agarwal
Abstract - Computer interpretation of remote sensing data is referred to as quantitative analysis because of its ability to identify pixels based upon their numerical properties and to count pixels for area estimates. It is also generally called classification: a method by which labels are attached to pixels in view of their spectral character. This labeling is implemented by a computer that has been trained beforehand to recognize pixels with spectral similarities. Clearly, the image data for quantitative analysis must be available in digital form. This is an advantage of image data such as that from Landsat, SPOT, IRS, etc., as against more traditional aerial photographs, which require digitization before quantitative analysis can be performed. The classification techniques adopted in this study include the unsupervised classifiers K-Means and ISODATA and the supervised classifiers Minimum Distance, Maximum Likelihood, Parallelepiped and Enhanced Seeded Region Growing.
Index Terms - Satellite Data, Classification, supervised, unsupervised, K-Means, ISODATA, MXL, Parallelepiped, Seeded Region Growing
—————————— ——————————
1. INTRODUCTION
Satellite imagery is a source of a large amount of information. A two-dimensional image recorded by satellite sensors is a mapping of the three-dimensional visual world. The captured two-dimensional signals are sampled and quantized to yield digital images. An image is worth a thousand words: satellite image interpreters from various domains extract information by marking polygon features on an image, and various classification methods make it possible to classify objects from an image automatically. There are several methods which can be used to extract features.
2. IMAGE SEGMENTATION
Segmentation of an image is defined by a set of regions that are connected and non-overlapping, so that each pixel acquires a unique label indicating the region it belongs to. Segmentation is one of the most important elements in automated image analysis, mainly because at this step the objects or other entities of interest are extracted from the image for subsequent processing, such as description and recognition.
3. CLASSIFICATION METHODS
There are basically two kinds of classification methods: unsupervised and supervised.
Unsupervised Classifiers:
These classifiers do not require training sites or training data to classify objects.
K-MEANS
In statistics and data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean. It is an iterative procedure.
• In the first step, arbitrary initial cluster mean vectors are assigned.
• The second step classifies each pixel to the closest cluster.
• In the third step, the new cluster mean vectors are calculated from all the pixels in each cluster.
• The second and third steps are repeated until the "change" between iterations is small. The "change" can be defined in several ways: either by measuring the distances the mean cluster vectors have moved from one iteration to another, or by the percentage of pixels that have changed cluster between iterations.
• The objective of the k-means algorithm is to minimize the within-cluster variability. The objective function (which is to be minimized) is the sum of squared distances (errors) between each pixel and its assigned cluster centre:

SSdistances = Σx [x - C(x)]²
————————————————
Manoj Pandya, Astha Baxi, M.B. Potdar & M.H. Kalubarme are currently affiliated with Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected]
Dr. Bijendra Agarwal is Director, VJMS College, VADU, Kadi, Mahesana, Gujarat, India
Where, C(x) is the mean of the cluster that pixel x is
assigned to.
Minimizing the SSdistances is equivalent to minimizing the
Mean Squared Error (MSE). The MSE is a measure of the
within cluster variability.
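The iterative procedure above can be sketched in a few lines of Python (an illustrative sketch; the function and variable names are our own, not from the paper):

```python
import math
import random

def kmeans(pixels, k, max_iter=100, tol=1e-6):
    """Cluster n-band pixel vectors by iterating steps 2 and 3 of the text:
    assign each pixel to the nearest cluster mean, then recompute the means."""
    random.seed(0)
    means = random.sample(pixels, k)  # step 1: arbitrary initial cluster vectors
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # step 2: classify each pixel to the closest cluster
        clusters = [[] for _ in range(k)]
        for p in pixels:
            i = min(range(k), key=lambda j: math.dist(p, means[j]))
            clusters[i].append(p)
        # step 3: new cluster mean vectors from all pixels in each cluster
        new_means = [tuple(sum(band) / len(cl) for band in zip(*cl)) if cl else means[j]
                     for j, cl in enumerate(clusters)]
        # stop when the "change" (movement of the mean vectors) is small
        shift = max(math.dist(m, nm) for m, nm in zip(means, new_means))
        means = new_means
        if shift < tol:
            break
    return means, clusters
```

The re-assignment and re-averaging steps jointly drive down the within-cluster sum of squared distances described above.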
ISODATA (Iterative Self-Organizing Data Analysis Technique) Classifier
ISODATA stands for Iterative Self-Organizing Data Analysis Technique. The algorithm allows the number of clusters to be adjusted automatically during the iterations by merging similar clusters and splitting clusters.
Clusters are merged if either the number of members (pixels) in a cluster is less than a certain threshold or the centres of two clusters are closer than a certain threshold. A cluster is split into two if its standard deviation exceeds a predefined value and its number of members (pixels) is at least twice the threshold for the minimum number of members. [1]
The ISODATA algorithm is similar to the k-means algorithm, with the distinct difference that ISODATA allows a varying number of clusters, while k-means assumes that the number of clusters is known a priori.
• Each pixel is compared to each cluster mean and
assigned to the cluster whose mean is closest in Euclidean
distance.
• A new cluster center is computed by averaging the
locations of all the pixels assigned to that cluster.
• The Sum of Squared Errors (SSE) computes the
cumulative squared difference (in the various bands) of each
pixel from its cluster center for each cluster individually, and
then sums these measures over all the clusters.
• The algorithm stops either when the iteration-count threshold is reached or when the maximum percentage of unchanged pixels is reached.
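The merge/split rules described above can be sketched as follows (a simplified one-band illustration; the thresholds MIN_MEMBERS, MERGE_DIST and MAX_STD are illustrative assumptions, not values from the study):

```python
import math

# Illustrative thresholds (assumptions for this sketch, not values from the study)
MIN_MEMBERS = 4   # clusters smaller than this are merged away
MERGE_DIST = 2.0  # merge clusters whose centres are closer than this
MAX_STD = 3.0     # split a cluster whose standard deviation exceeds this

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def isodata_adjust(clusters):
    """One merge/split pass over a list of 1-band clusters, following the
    rules in the text (simplified: a small cluster merges with any partner)."""
    clusters = [c[:] for c in clusters if c]
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = abs(mean(clusters[i]) - mean(clusters[j])) < MERGE_DIST
                small = min(len(clusters[i]), len(clusters[j])) < MIN_MEMBERS
                if close or small:
                    clusters[i] += clusters.pop(j)  # merge cluster j into i
                    merged = True
                    break
            if merged:
                break
    out = []
    for c in clusters:
        # split: std-dev too large and at least twice the minimum membership
        if std(c) > MAX_STD and len(c) >= 2 * MIN_MEMBERS:
            m = mean(c)
            out.append([x for x in c if x < m])
            out.append([x for x in c if x >= m])
        else:
            out.append(c)
    return out
```

In the full algorithm this pass alternates with the k-means-style re-assignment step until the stopping criteria above are met.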
Supervised Classifiers:
These kinds of classifiers require adequate training sites or
training signatures for each class of a given image.
Minimum Distance Classifier
This classifier, also referred to as the centroid classifier, is the simplest of the supervised classifiers.
• The analyst first computes the mean of each training class.
• Next, the Euclidean distance of each pixel from each class mean is calculated.
• The pixel is assigned to the class whose mean is nearest (i.e. the Euclidean distance between the pixel and the mean is minimum).
The distance is used as an index of similarity, so that minimum distance corresponds to maximum similarity.
Figure 1 shows the concept of a minimum distance classifier.
The following distances are often used in this procedure. [7]
Figure 1: Minimum Distance Classifier
It is used to classify unknown image data into the class that minimizes the distance between the image data and the class in multi-feature space. In the Euclidean plane, if p = (p1, p2) and q = (q1, q2), the distance is given by

d(p, q) = √((q1 - p1)² + (q2 - p2)²)
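The classifier's three steps (class means from training data, Euclidean distances, nearest mean) can be sketched as follows (an illustrative sketch; the class labels and sample values are hypothetical):

```python
import math

def class_means(training):
    """Step 1: mean vector of each training class, from its training pixels."""
    return {label: tuple(sum(band) / len(pixels) for band in zip(*pixels))
            for label, pixels in training.items()}

def classify_min_distance(pixel, means):
    """Steps 2-3: assign the pixel to the class whose mean is nearest
    in Euclidean distance."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))
```

For example, with two-band training sites {"water": [(10, 40), (20, 50)], "vegetation": [(60, 90), (80, 120)]}, the pixel (18, 48) falls nearest the water mean (15, 45) and is labeled "water".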
Parallelepiped Classifier:
The parallelepiped classifier (often termed multi-level slicing) divides each axis of multi-spectral feature space, as shown in the example in Figure 2.
Figure 2: Schematic concept of the parallelepiped classifier in three-dimensional feature space
The decision region for each class is defined by a lowest and a highest value on each axis. The accuracy of classification depends on the selection of these lowest and highest values in consideration of the population statistics of each class. In two-dimensional feature space the decision region forms a rectangular box, and there are as many boxes as classes. All data points that fall within a box are labeled with that class. In this respect, it is most important that the distribution of the population of each class is well understood. [2]
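A minimal sketch of the box decision rule (the training samples and class labels below are hypothetical):

```python
def train_box(samples):
    """Lowest and highest training value on each axis (band) of feature space."""
    return [(min(vals), max(vals)) for vals in zip(*samples)]

def classify_parallelepiped(pixel, boxes):
    """Label the pixel with the first class whose box contains it on every
    band; return None (unclassified) if it falls outside all boxes."""
    for label, box in boxes.items():
        if all(lo <= v <= hi for v, (lo, hi) in zip(pixel, box)):
            return label
    return None
```

Note that, unlike the minimum distance classifier, a pixel outside every box stays unclassified, and overlapping boxes are resolved here simply by taking the first match.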
Maximum Likelihood Classifier (MXL):
The maximum likelihood classifier is one of the most popular classification methods in remote sensing: a pixel is classified into the class for which its likelihood is maximum. The likelihood Lk is defined as the posterior probability of a pixel belonging to class k: [4]

Lk = P(k) P(X|k) / Σi P(i) P(X|i)

where P(k) is the prior probability of class k and P(X|k) is the conditional probability of observing X from class k (the probability density function). Usually the P(k) are assumed equal for all classes, and the denominator Σi P(i) P(X|i) is common to all classes, so Lk depends only on P(X|k), the probability density function.
For mathematical reasons, a multivariate normal distribution is adopted as the probability density function, in which case the likelihood can be expressed as follows: [5]

Lk(X) = (2π)^(-n/2) |Σk|^(-1/2) exp( -½ (X - Xk)ᵀ Σk⁻¹ (X - Xk) )

where
n: number of bands
X: image data of n bands
Lk(X): likelihood of X belonging to class k
Xk: mean vector of class k
Σk: variance-covariance matrix of class k
|Σk|: determinant of Σk
In the special case where the variance-covariance matrix is the identity matrix, maximizing the likelihood is equivalent to minimizing the Euclidean distance.
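Under the equal-priors assumption above, classification reduces to evaluating the normal density for each class. A two-band sketch (illustrative; the 2x2 matrix inverse is written out to keep it self-contained, and the class statistics are hypothetical):

```python
import math

def gaussian_likelihood(x, mean, cov):
    """Lk(X) for a 2-band pixel: the multivariate normal density with the
    class mean vector and 2x2 variance-covariance matrix (n = 2)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    # quadratic form (X - Xk)^T Sigma_k^-1 (X - Xk)
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def classify_mxl(pixel, classes):
    """Assign the pixel to the class of maximum likelihood (equal priors);
    classes maps label -> (mean vector, covariance matrix)."""
    return max(classes, key=lambda k: gaussian_likelihood(pixel, *classes[k]))
```

In practice the mean vector and covariance matrix of each class are estimated from the training sites.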
Figure 3: concept of the maximum likelihood method
Seeded Region Growing Technique:
Seed point selection is based on some user criterion, such as pixels in a certain gray-level range or pixels evenly spaced on a grid. An image is segmented into regions with respect to a set of q seeds. Given the set of seeds S1, S2, ..., Sq, each step of SRG adds one additional pixel to one of the seed sets. Moreover, these initial seeds are replaced by the centroids of the generated homogeneous regions R1, R2, ..., Rq as additional pixels are incorporated step by step. Pixels in the same region are labeled by the same symbol and pixels in different regions by different symbols. All these labeled pixels are called allocated pixels, and the others are called unallocated pixels. Let H be the set of all unallocated pixels which are adjacent to at least one of the labeled regions: [14]

H = { (x, y) ∉ ∪i Ri : N(x, y) ∩ (∪i Ri) ≠ ∅ }
where N(x, y) is the second-order neighborhood of the pixel (x, y), as shown in Figure 1. If, for the unlabeled pixel (x, y) ∈ H, N(x, y) meets just one of the labeled image regions, define i(x, y) ∈ {1, 2, ..., q} to be the index such that N(x, y) ∩ Ri(x, y) ≠ ∅. δ(x, y, Ri) is defined as the difference between the testing pixel at (x, y) and its adjacent labeled region Ri, and is calculated as

δ(x, y, Ri) = ‖ g(x, y) - g(Xci, Yci) ‖

where g(x, y) indicates the values of the three color components of the testing pixel (x, y) and g(Xci, Yci) represents
the average values of the three color components of the homogeneous region Ri, with (Xci, Yci) the centroid of Ri.
Figure 1 identification of pixels around the seed pixel
If N(x, y) meets two or more of the labeled regions, i(x, y) takes the value of the i for which N(x, y) meets Ri and δ(x, y, Ri) is minimized.
This seeded region growing procedure is repeated until all
pixels in the image have been allocated to the corresponding
regions.
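The growing procedure can be sketched as follows (a simplified illustration: a single-band image, a 4-neighbourhood instead of the paper's second-order neighbourhood, and one seed pixel per region):

```python
import heapq

def seeded_region_growing(image, seeds):
    """Grow labeled regions from seeds: repeatedly allocate the unallocated
    boundary pixel with the smallest difference (delta) from the mean of an
    adjacent region, updating that region's mean as it grows."""
    h, w = len(image), len(image[0])
    label = [[None] * w for _ in range(h)]
    total, count = {}, {}
    heap = []

    def push_neighbours(y, x, lbl):
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and label[ny][nx] is None:
                delta = abs(image[ny][nx] - total[lbl] / count[lbl])
                heapq.heappush(heap, (delta, ny, nx, lbl))

    for lbl, (y, x) in seeds.items():
        label[y][x] = lbl
        total[lbl], count[lbl] = image[y][x], 1
    for lbl, (y, x) in seeds.items():
        push_neighbours(y, x, lbl)

    while heap:
        _, y, x, lbl = heapq.heappop(heap)
        if label[y][x] is not None:
            continue  # already allocated via a smaller delta
        label[y][x] = lbl
        total[lbl] += image[y][x]
        count[lbl] += 1  # the region mean used for delta updates as it grows
        push_neighbours(y, x, lbl)
    return label
```

The priority queue realizes "one additional pixel per step": the globally best boundary pixel is always allocated first, and the loop runs until every pixel belongs to a region.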
Enhanced Seeded Region Growing:
This technique was specifically designed by Baxi et al. (2012) for mobile applications, along with a Mahalanobis distance classifier [16]. SRG is also very attractive for semantic image segmentation, since high-level knowledge of image components can be involved in the seed selection procedure.
The FEATURE TABLE stores a collection of features. Location information with metadata is considered as seed points, which can be superimposed on the geo-referenced image using an Application Programming Interface (API). These seed points generate training sites for the SRG technique.
Enhanced Classifier:
The distance of a pixel from the mean of a given class is calculated using the variance-covariance matrix of the class. It differs from the Euclidean distance in that it takes into account the correlations of the data set; it is a multivariate effect size. The Mahalanobis distance is defined as

D²(X) = (X - Xk)ᵀ Σk⁻¹ (X - Xk)

where
X: vector of image data (n bands), X = [x1, x2, ..., xn]
Xk: mean vector of the kth class, Xk = [m1, m2, ..., mn]
Σk: variance-covariance matrix of class k
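A two-band sketch of the distance (illustrative; with the identity matrix as covariance it reduces to the squared Euclidean distance, which is exactly the contrast drawn above):

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (X - Xk)^T Sigma_k^-1 (X - Xk) for a
    2-band pixel; the 2x2 matrix inverse is written out directly."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
```

A non-identity covariance re-weights the bands by their variances and correlations, so a pixel along a class's natural spread is "closer" than one the same Euclidean distance away but across the spread.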
4 COMPARISONS OF CLASSIFIERS
Figure 1 (a): FCC LISS-III image of Surendranagar district, India, with cotton seeds marked in yellow circles
Figure 1 (b): Paddy seeds marked in yellow circles
CLASSIFIED IMAGE
Figure 3 Enhanced SRG for both classes
ACCURACY ASSESSMENT
The Kappa coefficient is used to evaluate the accuracy. The original intent of Cohen's Kappa was to measure the degree of agreement or disagreement of two or more people observing the same phenomenon. Cohen's Kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. The equation for K is:

K = (Pr(a) - Pr(e)) / (1 - Pr(e))

where Pr(a) is the relative observed agreement among raters (the total agreement probability) and Pr(e) is the hypothetical probability of chance agreement, using the observed data to calculate the probability of each observer randomly assigning each category. If the raters are in complete agreement then K = 1; if there is no agreement among the raters then K ≤ 0. The better-performing classifiers should therefore have the higher K.
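Computed from a confusion matrix (classification result versus reference data), the formula above looks like this (an illustrative sketch):

```python
def cohens_kappa(confusion):
    """Kappa from a C x C confusion matrix: K = (Pr(a) - Pr(e)) / (1 - Pr(e))."""
    n = sum(sum(row) for row in confusion)
    pr_a = sum(confusion[i][i] for i in range(len(confusion))) / n  # observed agreement
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    pr_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)  # chance agreement
    return (pr_a - pr_e) / (1 - pr_e)
```

A diagonal matrix (perfect agreement) gives K = 1, while agreement no better than chance drives K toward 0 or below.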
Table: Comparison of various classifiers
Figure: Classifier vs. Kappa co-efficient chart
5 CONCLUSIONS
The various classification techniques for satellite images differ in accuracy. The hybrid method of SRG and the Mahalanobis technique leads to better accuracy. Possible extensions of the ESRG classifier are Artificial Neural Networks (ANN) and fuzzy logic. Object-oriented classification also helps to identify objects such as trees, cars and buses precisely.
ACKNOWLEDGEMENTS
The authors would like to thank the Director, BISAG, T. P. Singh, for his inspiration and motivation.
…status of plots, property account details, overlay of revenue maps and satellite imagery on the estate map, customized query and analysis functions for the Planning, Land and Engineering divisions of GIDC, and linkage with the Oracle database to provide up-to-date data on land transactions and interactive decisions through a report-generation mechanism.
2 RATIONALE OF THE MODEL
Industrial estates have led to the development of large urban regions, especially in the states of Gujarat and Maharashtra, where large-scale city/town development has taken place [2]. Effective development can be achieved by implementing a Geographical Information System (GIS): maps improve decision-making capabilities. At the user-interface level, the most essential requirement for the stakeholders is that MIS and GIS be incorporated into a single application, so that the user has a single interface with which to interact. The spatial dimension allows harnessing various analyses such as visibility, buffer, intersection and union. Effective decisions can be carried out by leveraging the potential of GIS. The system should be efficient, scalable and cost-effective.
3 DATABASE PLATFORMS
A database is a foundation for various kinds of research activities and projects. In this paper a scenario of the Gujarat Industrial Development Corporation (GIDC) is considered, where the department's industrial-estate database is stored in an Oracle database. There are several database platforms which can be scaled for enterprise solutions. The geodatabase of the industrial estates is warehoused in Microsoft SQL Server 2005, as the solution has been developed with Microsoft .NET 2005, which performs better with a native platform. The GIS system used here is indigenous and low-cost.
Fig: 1 A synoptic view of industrial Estate System
————————————————
Manoj Pandya and Krunal Patel are currently working as a senior project scientist in Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected], [email protected]
Dr. N.N. Jani is affiliated with S.K. Patel Institute, Gandhinagar, Gujarat, India, E-mail: [email protected]
International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September-2012, ISSN 2229-5518
4 METHODOLOGIES
Spatial database creation is a vital step where interactive decisions are required, such as finding the associativity of industrial plots and buffer analysis, and hence proposing site suitability based on custom criteria. Spatial entities are digitized on Cartosat satellite imagery having 2.5-metre resolution. Cadastral maps have also been co-registered with the satellite imagery.
Fig: 2 Workflow of the System
Spatial database:
To construct the spatial data, Cartosat satellite imagery with a resolution of 2.5 metres is used to create base maps, including industrial plots. Departmental data is linked with the spatial database to make a dynamic, seamless information matrix. Spatial datasets are derived for the location of industrial estates; the layout of industrial estates; plot-wise details such as unsold notes, taxation and maintenance; the layout of utilities such as industrial zones, the water supply system, the waste-water treatment plant, solid waste management, drainage and power supply; infrastructure facilities; the superimposed revenue map; amenities; etc.
5 DECISION SUPPORT SYSTEM (DSS)
A DSS is a knowledge-based system that serves the management, operations and planning levels of an organization and helps to make decisions, which may be rapidly changing and not easily specified in advance [3]. The web interface uses authentication to allow only specific users to access the portal. The portal has a built-in query builder to process users' custom criteria. In this scenario, bridging spatial and aspatial information has led to an enterprise-level solution by facilitating tools such as daily land-data creation; MIS report generation (monthly, quarterly, vacant-plot list, etc.); automatic notice generation; updating of regional-office transactions to the central database server; receipt and pay-order generation; on-demand auto-generation of offer letters (lease deed, scrutiny report, etc.); and selective dispatch of generated reports (e-mail and/or hard copy).
Satellite Imagery and geographical entities are created at
Bhaskaracharya Institute for Space Applications and Geo-
Informatics (BISAG).
Fig: 3 Digitized layout map of Viramgam GIDC estate overlaid on
satellite imagery
The web interface has a facility to superimpose multiple amenities on industrial plots and to select plots by interactive criteria, as shown in the figures below.
Fig 4: An example of Criteria for Outstanding amount >= 500000
By executing the criterion Outstanding Amount >= 500000, the plots which satisfy the criterion are highlighted in yellow.
Fig 5: Super-imposing BORE and E.S.R. on Plot of Industrial Estates
International Journal of Scientific & Engineering Research, Volume 3, Issue 9, September-2012 3 ISSN 2229-5518
By superimposing the BORE and E.S.R. locations, it is easy to find sources of water. The nearest source of water supply can be derived easily using buffer analysis.
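The buffer-based search can be sketched as follows (a simplified planar illustration, not the actual GIS engine's API; the source names and coordinates are hypothetical):

```python
import math

def nearest_source(plot, sources, buffer_radius):
    """Return the name of the closest water source (bore or E.S.R.) lying
    within buffer_radius of the plot centroid, or None if the buffer is
    empty; coordinates are planar (e.g. metres in a projected CRS)."""
    candidates = [(math.dist(plot, pt), name)
                  for name, pt in sources.items()
                  if math.dist(plot, pt) <= buffer_radius]
    return min(candidates)[1] if candidates else None
```

A real GIS engine would perform the same test with true buffer geometries and a spatial index rather than brute-force point distances.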
6 APPLICATIONS
The proposed methodology is capable of converging and extracting datasets from the given data warehouse. Real-time decisions can be taken by implementing a GIS engine that considers the spatial dimension. The system aids planning to regulate and control land use in the vicinity of the estates by providing buffer zones around them. Decision makers can recommend land-use controls around the estates for controlling and minimizing adverse environmental impacts; recommend the necessary effluent-treatment, waste-disposal and other abatement infrastructure to be used in common by all industries of the estate; and monitor the various planning and allotment details.
7 LIMITATIONS
Processing huge volumes of data in the form of spatial layers is time-consuming. The proposed methodology can be optimized by introducing distributed computing: parallel processing of tasks can improve the efficiency of producing spatial and aspatial output from complex user-defined queries.
8 CONCLUSIONS
The proposed solution is scalable to the enterprise level. Industrial estates are crucial for the financial growth of any nation. Geospatial technology is useful not only for better planning but also for better management of assets in industries. Convergence of MIS and GIS technologies can provide near-real-time information, resulting in an efficient and effective decision support system (DSS) that can help in multiple areas such as health, defense and disaster management.
ACKNOWLEDGEMENTS
The authors would like to thank the Director, BISAG, T. P. Singh, for his inspiration and motivation, and the Manager S&A, GIDC, Mr. H. V. Bagtharia, for providing valuable support.
3. MAP PROJECTION
A map projection is any method of representing the surface of a sphere or other three-dimensional body on a plane. Map projections can be constructed to preserve one or more properties such as area, shape and distance, though not all of them simultaneously. Geographical entities can be overlaid in a single map projection; to overlay entities from different projections, the map projections and datums must be the same. To overcome this problem, an 'on-the-fly' map projection can be calculated and the entities then overlaid on the reference map layer.
3.1 LOCATION BASED SERVICES (LBS)
LBS include services to identify the location of a person or object. LBS utilize Google Maps, as the Google API renders services in a customized and
simplified fashion. Hence Google Maps services are accepted in industry.
INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN ENGINEERING, TECHNOLOGY AND SCIENCES (IJ-CA-ETS)
Organizations and institutions are accustomed to the conventional shapefile format, and every organization stores geographic information in its own map projection. The issue arises when the geometry is superimposed on Google Maps, whose coordinate reference system (CRS) is geographic: location is measured in latitude and longitude rather than in the easting and northing of a local projection system. If geographical entities are overlaid without applying an on-the-fly map projection, a shift in map extent arises that creates trouble during superimposition. The proposed solution is to convert the map projection using a mathematical model.
4. MAP PROJECTION MATH-MODEL
Mathematical formulae can be derived to transform one type of map projection system into another. For Google Maps, the geographic map projection is used. The mathematical model converts Transverse Mercator projection coordinates (N, E) to their geographic equivalents.
Map projections can be modeled in the form of mathematical formulae and equations. Universal Transverse Mercator (UTM) or Transverse Mercator (TM) projections are generally used in organizations. The conversion equations are divided into two parts: I. projection constants; II. Transverse Mercator (TM) to geographic.
I. Projection Constants: Several additional parameters need to be computed before transformations can be undertaken. They are functions of various projection parameters and are constant for a given projection.
m0 is obtained by evaluating m at ∅0. II. Transverse Mercator (TM) to Geographic: the conversion of Transverse Mercator projection coordinates (N, E) to geographic coordinates (∅, λ) is achieved in several steps. First N′, m′, n, G, σ and ∅′ are determined; it is then required to determine σ′, ν′, ψ′, t′ and E′, from which the latitude and then the longitude of the computation point are computed.
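As a simple, self-contained illustration of projection math (the ordinary spherical Mercator rather than the Transverse Mercator inverse derived above, whose closed-form series is much longer):

```python
import math

R = 6378137.0  # sphere radius in metres (WGS84 semi-major axis)

def mercator_forward(lat_deg, lon_deg):
    """Geographic (lat, lon) -> projected (E, N), spherical Mercator."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    return R * lam, R * math.log(math.tan(math.pi / 4 + phi / 2))

def mercator_inverse(E, N):
    """Projected (E, N) -> geographic (lat, lon) in degrees."""
    lat = math.degrees(2 * math.atan(math.exp(N / R)) - math.pi / 2)
    lon = math.degrees(E / R)
    return lat, lon
```

The forward and inverse formulae round-trip exactly, which is the property the on-the-fly re-projection of overlay layers relies on.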
Hence the values of latitude and longitude are derived.
5. XML AS TRANSITIONAL STORAGE
Extensible Markup Language (XML) can be used to store the vertices of geographical datasets as intermediate storage. This is required as the quantum of vertices is
very large. The sample XML file below contains tags for markers, polylines and polygons. This XML file can be overlaid on a Google Map; hence XML can store geometry and attributes that can be parsed to display geometry on the map.

<markers>
  <marker lat="43.65654" lng="-79.90138" label="Marker One">
    <infowindow><![CDATA[Some stuff to display in the<br>First Info Window]]></infowindow>
  </marker>
  <marker lat="23.65654" lng="72.90138" label="Marker two">
    <infowindow><![CDATA[Some stuff to display in the<br>Second Info Window]]></infowindow>
  </marker>
  <marker lat="24.65654" lng="74.90138" label="Marker Third">
    <infowindow><![CDATA[Some stuff to display in the<br>Third Info Window]]></infowindow>
  </marker>
  <line colour="#FF0000" width="4" html="You clicked the red polyline">
    <point lat="43.65654" lng="-79.90138" />
    <point lat="43.91892" lng="-78.89231" />
    <point lat="43.82589" lng="-79.10040" />
  </line>
  <line colour="#10aa00" width="8" html="You clicked the green polyline">
    <point lat="43.9078" lng="-79.0264" />
    <point lat="44.1037" lng="-79.6294" />
    <point lat="43.5908" lng="-79.2567" />
    <point lat="44.2248" lng="-79.2567" />
    <point lat="43.7119" lng="-79.6294" />
    <point lat="43.9078" lng="-79.0264" />
  </line>
  <line colour="#00dfff" width="2" html="You clicked the yellow polygon">
    <point lat="23.2478" lng="73.0264" />
    <point lat="24.1027" lng="72.6294" />
    <point lat="23.2908" lng="75.2567" />
    <point lat="24.2448" lng="70.2567" />
    <point lat="23.4119" lng="62.6294" />
    <point lat="23.9048" lng="71.0264" />
  </line>
</markers>

6. APPLICATION
Superimposition of geographical datasets in custom projections becomes easier to overlay on Google Maps. Without any physical transformation, the geographical layers can be overlaid and thematic
maps can be generated. The following figure shows an organization's data (in red border color) overlaid on a Google Map.
Figure 2. Overlaid geographical datasets on Google Map (using XML datasets)
7. CONCLUSION
The mathematical model provides leverage to integrate geographical datasets from various map projections on the same platform. The mathematical model can be extended to numerous kinds of coordinate reference systems to produce thematic maps for various geo-spatial analyses.
8. REFERENCES
[1] http://mathworld.wolfram.com/MercatorProjection.html
[2] http://en.wikipedia.org/wiki/Mercator_projection
[3] W. R. Tobler (1962), "A classification of map projections", Annals of the Association of American Geographers, Vol. 52, pp. 167-175.
[4] http://www.gpsy.com/gpsinfo/geotoutm/
[5] J. Pickles, ed. (1995), Ground Truth: The Social Implications of Geographic Information Systems, New York: Guilford Press.
[6] S. Berthon and A. Robinson (1991), The Shape of the World: The Mapping and Discovery of the Earth, Chicago: Rand McNally.
[7] http://mysite.du.edu/~jcalvert/math/mercator.htm
[8] http://surveying.wb.psu.edu/sur162/UTM/UTM.htm
[9] http://gisremote.blogspot.com/2008/02/about-map-projection-what-is-map.html
[10] C. P. Lo and Albert K. W. Yeung (2004), Concepts and Techniques of Geographic Information Systems.
[11] David Salomon (2006), Transformations and Projections in Computer Graphics.
[12] Earl F. Burkholder (2008), The 3-D Global Spatial Data Model: Foundation of the Spatial Data Infrastructure.
International Journal of Computer Applications (0975 – 8887)
Volume 52– No.19, August 2012
Distributed Commuting Augmented Shortest Path Finding for Geo Spatial Datasets
ABSTRACT Geo Spatial information is a large collection of datasets referring to real-world entities. Geo Spatial information has evolved over the last decade, producing a vast platform in government administration, scientific analysis and various other sectors, especially disaster management (DM) and site suitability of check-dams for irrigation departments. It is required to obtain imperative geographically analyzed solutions such as finding the shortest path from sources (i.e. locations stocking food packets, clinical, remedial and therapeutic kits, etc.) to the destination (i.e. the location where a disaster emerges). Departmental data (i.e. village maps) covers more detailed spatial and attribute information than readily available sources. Hence custom solutions based on Information Technology are required to be constructed and processed efficiently and quickly to deliver seamless performance that facilitates mission critical
Abstract - In the last two decades, Geographic Information Systems (GIS) have been widely developed and applied in many fields, including resource management, environmental management, disaster prevention, area planning, education and national defense. However, traditional GIS applications could not interact with each other and were considered isolated islands of information, facing a problem of interoperability. This is because different GIS applications adopt different data models. The spatial data interoperability concept enables a huge amount of geospatial information to be managed effectively, shared and put to value-added use. OGC standards are developed in a unique consensus process supported by the OGC's industry, government and academic members to enable geo-processing technologies to interoperate, or "plug and play".
Index Terms – Geo Database, Geo processing, Oracle Spatial, OGC, 3D Rendering, Extrude, GIS
1. INTRODUCTION Within an enterprise, various software packages are used, and in many cases they cannot talk to each other. This situation is ubiquitous and also exists between different departments. The Open GIS Consortium (OGC) is an international industry consortium of more than 400 companies, government agencies and universities participating in a consensus process to develop publicly available interface standards. Within a local government or enterprise, spatial data is centrally stored. Interoperable metadata enables organizations to use the right tool for the job while eliminating complicated data transfers and multiple copies of the same data throughout the enterprise or department. The application server is a mediation system; this model uses Oracle Application Server as the mediation system, through which the application client sends WMS or WFS requests and reaches the map server of the background application. The three-tier structure model exposes a GIS portal, an online GIS for users outside the department. Any client that conforms to the WMS or WFS specification can request the server.
2 DATABASE PLATFORMS There are several database platforms that provide spatial support. Oracle® has matured to support OGC standards. Other database platforms are also available, such as Microsoft® SQL Server 2008 and IBM DB2. Open-source database platforms exist alongside the proprietary products: MySQL and PostgreSQL with the PostGIS extension can serve large-scale geo-databases.
3 ORACLE SPATIAL DATA MODEL Oracle® offers a spatial data model that provides basic enterprise access to geospatial information. The model defines a standard structure for point, line, and area features. Oracle Spatial 10g XE (Express Edition) is used in this prototype model; this database platform is free to use for up to 4 GB of data, whereas the enterprise edition has no limit. Basic DDL (CREATE, ALTER, DROP) and DML (INSERT, UPDATE, DELETE) statements can be used for database management. With Spatial, the geometric description of a spatial object is stored in a single row, in a single column of object type SDO_GEOMETRY in a user-defined table.
Fig: 1 Schema for feature tables using pre-defined data types
The GEOMETRY_COLUMNS table describes the available feature tables and their Geometry properties.
———————————————— • Manoj Pandya is currently working as a senior project scientist in
Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar, Gujarat, India PIN 382007 E-mail: [email protected]
• Pooja Nair, Parthi Gandhi, Shubhada Pareek are pursuing project at BISAG
• Dr. Bijendra Agarwal is Head Of Department (HOD), VJMS College, VADU, Kadi, Mahesana, Gujarat, India
International Journal Of Scientific & Engineering Research, Volume 3, Issue 3, March-2012, ISSN 2229-5518
GEOMETRY_COLUMNS provides information on the feature table, spatial reference, geometry type, and co-ordinate dimension for each geometry column in the database.

CREATE TABLE GEOMETRY_COLUMNS (
  F_TABLE_CATALOG   CHARACTER VARYING NOT NULL,
  F_TABLE_SCHEMA    CHARACTER VARYING NOT NULL,
  F_TABLE_NAME      CHARACTER VARYING NOT NULL,
  F_GEOMETRY_COLUMN CHARACTER VARYING NOT NULL,
  ………………..
  GEOMETRY_TYPE     INTEGER,
  COORD_DIMENSION   INTEGER,
  MAX_PPR           INTEGER,
  SRID              INTEGER NOT NULL REFERENCES SPATIAL_REF_SYS,
  CONSTRAINT GC_PK PRIMARY KEY
    (F_TABLE_CATALOG, F_TABLE_SCHEMA, F_TABLE_NAME, F_GEOMETRY_COLUMN)
)
Example 1: Creation of Geometry Table

The SPATIAL_REF_SYS table describes the coordinate system and transformations for geometry.

CREATE TABLE SPATIAL_REF_SYS (
  SRID      INTEGER NOT NULL PRIMARY KEY,
  AUTH_NAME CHARACTER VARYING,
  AUTH_SRID INTEGER,
  SRTEXT    CHARACTER VARYING(2048)
)
Example 2: Creation of Spatial Reference System

The FEATURE TABLE stores a collection of features.
Metadata is required to store ancillary information about the geo-database. Metadata is data about data; it allows common information to be stored at a common level. Attributes such as owner name, version number, layer description, remarks and annotation are stored in metadata.

CREATE TABLE ANNOTATION_TEXT_METADATA AS {
  F_TABLE_CATALOG AS CHARACTER VARYING NOT NULL,
  …………………
  …………………
  A_TEXT_DEFAULT_MAP_BASE_SCALE AS CHARACTER VARYING,
  A_TEXT_DEFAULT_ATTRIBUTES AS CHARACTER VARYING
}
Example 3: Creation of Metadata

The geometry metadata describing the dimensions, lower and upper bounds and tolerance in each dimension is stored in a global table owned by MDSYS (which users should never directly update). Each Spatial user has the following views available in the schema associated with that user: USER_SDO_GEOM_METADATA contains metadata for all spatial tables owned by the user (schema); this is the only view that you can update, and it is the one in which Spatial users must insert metadata related to spatial tables. ALL_SDO_GEOM_METADATA contains metadata for all spatial tables on which the user has SELECT permission. Spatial users are responsible for populating these views: for each spatial column, you must insert an appropriate row into the USER_SDO_GEOM_METADATA view. A metadata prototype is given below.

INSERT INTO spatial_ref_sys VALUES (101, 'POSC', 32214,
  'PROJCS["UTM_ZONE_14N",
     GEOGCS["World Geodetic System 72",
       DATUM["WGS_72", ELLIPSOID["NWL_10D", 6378135, 298.26]],
       PRIMEM["Greenwich", 0],
       UNIT["Meter", 1.0]],
     PROJECTION["Transverse_Mercator"],
     PARAMETER["False_Easting", 500000.0],
     PARAMETER["False_Northing", 0.0],
     PARAMETER["Central_Meridian", -99.0],
     PARAMETER["Scale_Factor", 0.9996],
     PARAMETER["Latitude_of_origin", 0.0],
     UNIT["Meter", 1.0]]');
Example 4: Inserting Map Projection parameters
4 RATIONALE OF THE MODEL
This model is scalable toward several objectives. It centralizes enterprise spatial data: with a centralized database, it is more convenient for the data manager to update, maintain and distribute data. The model enables organizations to use the right tool for the job while eliminating complicated data transfers and multiple copies of the same data throughout the enterprise. It also supports data sharing between enterprises, especially as a data source for a local emergency department, which can then access and maintain all the data it needs. Overall, data redundancy is reduced.
Fig: 2 Data translation component model
The Oracle spatial data server can be established as a centralized server, and different users can access geospatial information through a web browser. The application server is the main tier in the internet model, comprising the GIS application server, the GIS application manager server and the application connector. The application in this scenario is developed in Java using the NetBeans 6.8 Integrated Development Environment (IDE).
5 GEO PROCESSING Geo-processing is a GIS operation used to manipulate spatial data. A typical geo-processing operation takes an input dataset, performs an operation on it, and returns the result as an output dataset. Common geo-processing operations include geographic feature overlay, feature selection and analysis, topology processing, raster processing, and data conversion. Geo-processing allows for definition, management, and analysis of information used to form decisions. The state of Gujarat, India, is used in this example. The proposed system is developed on the Java platform; JavaServer Pages (JSP) serve the solution through a web interface. A code snippet for buffer analysis using the geo-database is given below.

CODE SNIPPET:
// Assign variables parsed from the request
int sDragx = Integer.parseInt(startDragx);
int sDragy = Integer.parseInt(startDragy);
int eDragx = Integer.parseInt(endDragx);
int eDragy = Integer.parseInt(endDragy);
…
BufferedImage buffimage1 = null;
JGeometry gmtry1, gmtry2;   // oracle.spatial.geometry.JGeometry
STRUCT dbobj1, dbobj2;      // oracle.sql.STRUCT holding SDO_GEOMETRY
Graphics2D g11, g22;
Shape sing_sh = null, sh_rect = null;
Example 5: A prototype Code in Java Server Page to create Buffer using Geo database
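The Oracle-specific snippet above cannot run without a geo-database. As a self-contained illustration of the buffering idea itself, the sketch below approximates a buffer with the standard Java2D classes: the buffer of a shape is taken as the union of the shape with its outline stroked at twice the buffer distance. This is an assumption-laden approximation for illustration, not the method Oracle Spatial uses internally.

```java
import java.awt.BasicStroke;
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

/** Sketch: approximate a buffer around a shape using only Java2D. */
public class BufferSketch {

    /** Buffers a shape by `distance`: the shape unioned with its outline
     *  stroked 2*distance wide (round caps and joins round the corners). */
    public static Area buffer(Shape shape, double distance) {
        BasicStroke stroke = new BasicStroke((float) (2 * distance),
                BasicStroke.CAP_ROUND, BasicStroke.JOIN_ROUND);
        Area buffered = new Area(shape);
        buffered.add(new Area(stroke.createStrokedShape(shape)));
        return buffered;
    }

    public static void main(String[] args) {
        // A 10x10 Area of Interest buffered by 5 units.
        Shape aoi = new Rectangle2D.Double(0, 0, 10, 10);
        Area buffered = buffer(aoi, 5);
        System.out.println(buffered.contains(-3, 5)); // 3 units outside the AOI, inside the buffer
        System.out.println(buffered.contains(-8, 5)); // beyond the 5-unit buffer
    }
}
```

The resulting Area can be drawn with a Graphics2D, which is essentially what the JSP snippet does with the shapes returned from the database.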
Fig 3(a): Buffer creation for a given Area of Interest (AOI). Data source: BISAG
3 (b) Data inside Yellow Ring (buffered)
3 (c) Data inside Red Ring (buffered)
3 (d) Union of Two Rings (buffered)
Performing geo-processing in a GIS system requires a very efficient algorithm and good processing power. Geo-processing must determine the association between two objects: if the two objects are A and B, they can be associated as A touches B, A inside B, A equals B, and so on. Other GIS operations can also be performed, such as the difference between two geometries, union and intersection.
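The set-theoretic associations described above can be tried out directly with the JDK's java.awt.geom.Area class, which supports intersection, union and difference of arbitrary shapes. The helper methods below are illustrative, not part of the paper's system.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

/** Sketch: set-theoretic operations between two geometries A and B. */
public class GeometryRelations {

    /** A n B */
    public static Area intersection(Area a, Area b) {
        Area r = new Area(a); r.intersect(b); return r;
    }

    /** A u B */
    public static Area union(Area a, Area b) {
        Area r = new Area(a); r.add(b); return r;
    }

    /** A - B */
    public static Area difference(Area a, Area b) {
        Area r = new Area(a); r.subtract(b); return r;
    }

    public static void main(String[] args) {
        Area a = new Area(new Rectangle2D.Double(0, 0, 4, 4));
        Area b = new Area(new Rectangle2D.Double(2, 0, 4, 4));

        System.out.println(intersection(a, b).getBounds2D()); // the 2x4 overlap strip
        System.out.println(union(a, b).getBounds2D());        // the combined 6x4 extent
        System.out.println(difference(a, b).getBounds2D());   // the part of A outside B

        // "A inside B" holds when A minus B leaves nothing.
        Area inner = new Area(new Rectangle2D.Double(1, 1, 2, 2));
        System.out.println(difference(inner, a).isEmpty());   // true when inner lies inside a
    }
}
```

A spatial database evaluates the same relations with spatial indexes instead of exact polygon arithmetic, but the semantics match this small model.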
Table 1: Various operations based on Set theory
6 TOPOLOGICAL RELATIONSHIPS Topology is a major area of mathematics concerned with properties that are preserved under continuous deformations of objects, such as stretching, but not tearing or gluing. It emerged through the development of concepts from geometry and set theory, such as space, dimension, and transformation.
Fig 4 Possible association combinations between objects A and B
7 METHODOLOGY The Oracle geo-database is utilized by converging it with Information and Communication Technology (ICT). The geo-database communicates with a web-based GIS map engine. The
GIS map engine communicates with the client, with authentication handled through a Java Application Programming Interface (API).
Fig: 5 Interface to access Oracle Spatial Geometry
8 3D RENDERING Oracle Spatial can extrude a two-dimensional geometry to three dimensions. Two-dimensional footprints of buildings can be extruded using the EXTRUDE function in the SDO_UTIL package, erecting a building on the two-dimensional footprint by specifying the ground height and the top height for each vertex of the two-dimensional geometry.
Fig 6(a) Example of a two-dimensional solid with the top heights and ground heights specified and Fig 6(b) the extruded solid object

SDO_UTIL.EXTRUDE (
  geometry               SDO_GEOMETRY,
  groundheights          SDO_NUMBER_ARRAY,
  topheights             SDO_NUMBER_ARRAY,
  result_to_be_validated VARCHAR2,
  tolerance              NUMBER
) RETURN SDO_GEOMETRY
Example 6: Extrude schema

The parameters are interpreted as follows:
geometry: the input two-dimensional SDO_GEOMETRY object to be extruded.
groundheights: an array of numbers, one per vertex, used as the ground height (minimum z value). If only one number is specified, all vertices get that value.
topheights: an array of numbers, one per vertex, used as the top height (maximum z value). If only one number is specified, all vertices get that value.
result_to_be_validated: a character string set to either 'TRUE' or 'FALSE', informing Oracle whether to validate the resulting geometry.
tolerance: the tolerance used to validate the geometry.

A prototype example extruding a polygon to a three-dimensional solid is given below.

SELECT SDO_UTIL.EXTRUDE(
  SDO_GEOMETRY(            -- first argument: the geometry to extrude
    2003,                  -- 2-D polygon
    NULL, NULL,
    SDO_ELEM_INFO_ARRAY(1, 1003, 1),             -- a polygon element
    SDO_ORDINATE_ARRAY(0,0, 2,0, 2,2, 0,2, 0,0)  -- vertices of polygon
  ),
  SDO_NUMBER_ARRAY(-1),    -- one ground height value applied to all vertices
  SDO_NUMBER_ARRAY(1),     -- one top height value applied to all vertices
  'FALSE',                 -- no need to validate
  0.5                      -- tolerance value
) EXTRUDED_GEOM FROM DUAL;
Example 7: Extruding a sample data using SQL statement

EXTRUDED_GEOM(SDO_GTYPE, SDO_SRID, SDO_POINT(X, Y, Z), SDO_ELEM_INFO, SDO_ORDINA
--------------------------------------------------------------------------------
SDO_GEOMETRY(
  3008,                    -- 3-dimensional solid type
  NULL, NULL,
  SDO_ELEM_INFO_ARRAY(
    1, 1007, 1,            -- solid element
    1, 1006, 6,            -- 1 outer composite surface made up of 6 polygons
    1, 1003, 1,            -- first polygon element starts at offset 1 in SDO_ORDINATES
    16, 1003, 1,           -- second polygon element starts at offset 16
    31, 1003, 1,           -- third polygon element starts at offset 31
    46, 1003, 1,           -- fourth polygon element starts at offset 46
    61, 1003, 1,           -- fifth polygon element starts at offset 61
    76, 1003, 1),          -- sixth polygon element starts at offset 76
  SDO_ORDINATE_ARRAY(      -- ordinates storing the vertices of the polygons
    0,0,-1, 0,2,-1, 2,2,-1, 2,0,-1, 0,0,-1,
    0,0,1, 2,0,1, 2,2,1, 0,2,1, 0,0,1,
    0,0,-1, 2,0,-1, 2,0,1, 0,0,1, 0,0,-1,
    2,0,-1, 2,2,-1, 2,2,1, 2,0,1, 2,0,-1,
    2,2,-1, 0,2,-1, 0,2,1, 2,2,1, 2,2,-1,
    0,2,-1, 0,0,-1, 0,0,1, 0,2,1, 0,2,-1))
Example 8: Output of extruded sample

The following example shows the satellite image of a building in New York City, USA, taken from Google Maps. The building ground height and top height are measured in the form of latitude, longitude and altitude with the Global Positioning System (GPS). The building corner points are marked and the respective co-ordinates are extruded using the SDO_UTIL.EXTRUDE method.
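Independently of Oracle, the geometric idea behind extrusion can be sketched in a few lines: each footprint vertex is duplicated once at the ground height and once at the top height, giving the bottom and top faces of the solid (the side faces are then formed between consecutive vertex pairs, as in the ordinate array above). The class below is a hypothetical sketch, not SDO_UTIL.EXTRUDE itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the extrusion idea: lift a 2-D footprint to a 3-D solid
 *  by generating bottom and top faces at the given heights. */
public class ExtrudeSketch {

    /** Returns [x, y, z] vertices: the footprint at groundHeight (bottom face)
     *  followed by the footprint at topHeight (top face). */
    public static List<double[]> extrude(double[][] footprint,
                                         double groundHeight, double topHeight) {
        List<double[]> vertices = new ArrayList<>();
        for (double[] p : footprint) {
            vertices.add(new double[] {p[0], p[1], groundHeight}); // bottom face
        }
        for (double[] p : footprint) {
            vertices.add(new double[] {p[0], p[1], topHeight});    // top face
        }
        return vertices;
    }

    public static void main(String[] args) {
        // The 2x2 square from Example 7, ground height -1, top height 1.
        double[][] square = { {0, 0}, {2, 0}, {2, 2}, {0, 2} };
        for (double[] v : extrude(square, -1, 1)) {
            System.out.printf("(%.0f, %.0f, %.0f)%n", v[0], v[1], v[2]);
        }
    }
}
```

Running this on the square of Example 7 reproduces the corner coordinates visible in the bottom and top faces of the SDO_ORDINATE_ARRAY output.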
Fig: 7(a) Building in a satellite image with edges marked with dark spots. The extruded geometry returns a vertex array that is plotted as the object shown in the figure below.
Fig 7(b): Extruded building
9 APPLICATIONS The proposed method is used to derive new datasets from the data warehouse. Geo-processing tasks that require very efficient algorithms can be carried out effectively by leveraging the potential of the Oracle Spatial geo-database. A 3D elevation model or a Triangulated Irregular Network (TIN) can also be generated from the geo-database. Moreover, centralized, data-centric and secured applications can be developed and scaled at enterprise level.
10 LIMITATIONS The geo-processing and extrude tasks are memory intensive. The proposed methodology can be optimized by introducing cloud-based storage and routing mechanisms. Parallel processing can also improve the performance of rendering the output as geographical maps.
11 CONCLUSIONS The proposed solution is scalable to enterprise level. A Spatial Data Infrastructure (SDI) can be established to derive spatial data and attributes from datasets of different organizations in multiple combinations, providing near real-time information and resulting in an efficient and effective decision support system (DSS) for areas such as health, defense and disaster management.
ACKNOWLEDGEMENTS
The authors would like to thank Mr. T. P. Singh, Director of BISAG, for his inspiration and motivation.
International Journal of Applied Information Systems (IJAIS) – ISSN : 2249-0868 Foundation of Computer Science FCS, New York, USA Volume – No. , January 2013 – www.ijais.org
1. INTRODUCTION In cloud computing, hardware, software and other resources are made available to the end user in the form of services. The user can subscribe to the required service and is charged as he goes; he need not pay while the resources are not in use. Cloud security has three basic characteristics: confidentiality, integrity and availability.
Availability is an important aspect of cloud security: a service should be available whenever the user wants to consume it, but security threats can prevent this. A DDoS attack is one such threat. It is the distributed form of the Denial of Service attack, in which the service is consumed by the attacker so that legitimate users cannot use it.
Solutions against DDoS attacks exist, but they are single-host based and lack performance, so a Hadoop system for distributed processing is used here.
2. Cloud Computing In cloud computing, every resource is available to the end user in the form of a service, and the user pays only for what he has used. This kind of freedom makes cloud services popular. Cloud computing services are separated into the layers shown in fig 1.
Fig 1 Layers of Cloud Computing
1. Application Layer
This layer provides Software as a Service (SaaS). Applications are provided on an on-demand basis in this layer: highly scalable internet-based applications are hosted on the cloud and offered as a service to the end user. Google Docs, SalesForce.com and Acrobat.com are popular SaaS offerings.
2. Platform Layer
This layer provides Platform as a Service (PaaS): the platforms used to design, develop, build and test applications are provided by the cloud infrastructure. Popular PaaS providers include Google App Engine and the Azure Services Platform.
3. Infrastructure Layer
This layer provides Infrastructure as a Service (IaaS): hardware and network equipment are provided as a service, along with storage, database management and computing capabilities on demand. Some well-known IaaS providers are Amazon Web Services, GoGrid and 3Tera.
3. Cloud Security Cloud security has three basic aspects, called confidentiality, integrity and availability; these are also called the basic security goals. Fig 2 shows the cloud security goals; availability is the focus of this research paper. A DDoS attack is an issue for the availability of any cloud service.
A. Confidentiality
Confidentiality is the prevention of the intentional or unintentional unauthorized disclosure of contents; it is sometimes referred to as secrecy or privacy. Only authorized persons may read, write, view, print or otherwise access the content.
To ensure confidentiality, network security protocols, data encryption services and authentication services are used.
Fig 2 Cloud Security Goals
B. Integrity
Integrity is the guarantee that the message received is the same as the message sent, and that the message has not been intentionally or unintentionally altered.
To ensure integrity, firewalls, intrusion detection systems etc. are used.
C. Availability
Availability is the element that creates reliability and stability in the system: the data is available whenever it is asked for.
To ensure availability, redundant disks, redundant network systems and backup utilities are used.
4. Distributed Denial of Service Attack
4.1 Definition and Purpose A denial-of-service attack (DoS attack) is an attempt to make a computer resource unavailable to its intended users. A distributed denial-of-service attack (DDoS attack) is a DoS attack in which the attackers are distributed and target a single victim. It generally consists of the concerted efforts of one or more people to prevent an Internet site or service from functioning efficiently or at all, temporarily or indefinitely.
Attackers' motives for performing a Denial of Service attack vary. An attacker might want to show his power by attacking a large, popular Web site, gaining recognition in the underground community. Another possible motive is political: a political party may ask an attacker to perform a DoS attack on the Web site of a rival party. The most common motive is commercial: a company can ask an attacker to attack a competitor. During the time the commercial website is unavailable, it loses a great deal of money and, worse, the trust of its users, who may go shopping on another website.
4.2 How a DDoS Attack Works A DDoS attack is performed by infected machines called bots; a group of bots is called a botnet. These bots (zombies) are controlled by an attacker, who installs malicious code or software that acts on the commands the attacker passes. Bots are ready to attack at any time upon receiving a command from the attacker. Many types of agents have scanning capabilities that identify open ports across a range of machines. When the scanning is finished, the agent takes the list of machines with open ports and launches vulnerability-specific scanning to detect machines with un-patched vulnerabilities. If the agent finds a vulnerable machine, it can launch an attack to install another agent on that machine.
Fig 3 shows the DDoS attack architecture and explains its working mechanism.
Fig 3 DDoS attack Architecture
To control the agents, attackers use a Command and Control (C&C) server; currently, the majority of botnets use Internet Relay Chat (IRC). Other possibilities exist, such as Peer-to-Peer (P2P) C&C, Instant Messaging (IM) C&C, or even Web-based C&C servers. When an agent is connected to the C&C, it can ask for updates such as IP addresses of other C&Cs, software updates or exploit software. The agent can also ask for orders: a newly installed agent may ask for orders to protect itself, for example by asking the C&C for the location of the latest anti-virus software and preventing it from detecting the agent by stopping its service.
A DoS attack's main characteristic is that the attacker attempts to prevent one or more legitimate users of a service from using the required resources. There are two general forms of DoS attack: those that crash services and those that flood services. The attacker therefore attempts (1) to inhibit legitimate network traffic by flooding the network with useless traffic,
(2) to deny access to a service by disrupting connections between two parties, (3) to block a particular individual's access to a service, or (4) to disrupt the specific system or service itself. Here, HTTP flooding as a DDoS attack is considered and a solution is proposed.
5. Hadoop MapReduce
5.1 Introduction MapReduce, developed by Google, is a software paradigm for processing a large data set in a distributed, parallel way. Since Google's MapReduce and the Google File System (GFS) are proprietary, an open-source MapReduce project, Hadoop, was launched to provide similar capabilities using thousands of cluster nodes. The Hadoop Distributed File System (HDFS) is another important component of Hadoop, corresponding to GFS. Hadoop consists of two core components: the job management framework, which handles the map and reduce tasks, and HDFS. Hadoop's job management framework is highly reliable and available, using techniques such as replication and automated restart of failed tasks. The framework has optimizations for heterogeneous environments and workloads, e.g., speculative (redundant) execution that reduces delays due to stragglers.
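The map/reduce pattern that Hadoop distributes across a cluster can be imitated on a single JVM. The classic word-count example below shows the two phases; this sketch uses plain Java streams and involves no Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Single-JVM sketch of the MapReduce idea: map each record to (key, 1),
 *  then reduce by key. Hadoop distributes exactly this pattern. */
public class MapReduceSketch {

    /** Counts words: the "map" phase emits one token per word,
     *  the "reduce" phase sums occurrences per key. */
    public static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))              // map phase
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));  // reduce phase
    }

    public static void main(String[] args) {
        Map<String, Long> counts = wordCount(Arrays.asList("hadoop maps", "hadoop reduces"));
        System.out.println(counts.get("hadoop")); // prints 2
    }
}
```

In a real Hadoop job the map and reduce phases would be Mapper and Reducer classes, and the input would be split across HDFS blocks rather than an in-memory list.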
Fig 4 Hadoop Multinode Cluster Architecture
Fig 4 shows the Hadoop multinode cluster architecture, which works in a distributed manner for MapReduce problems.
5.2 DDoS Attack and Hadoop DDoS attacks are critical and commonly encountered, but conventional solutions are single-host oriented. A honeypot system can also be used to counter them. Network monitoring tools such as Wireshark and nmap are available, but they merely sniff the network and generate huge log files. Handling such gigantic files requires a special file system like HDFS and a MapReduce framework to process the logs effectively and efficiently, so Hadoop is a well-suited solution here.
6. MapReduce for Counter-based DDoS Detection Method
In this method, a simple MapReduce algorithm that detects DDoS attacks by URL counting is implemented. The algorithm needs three input parameters: time interval, threshold and unbalanced ratio. These can be loaded through the configuration property or the distributed cache mechanism of MapReduce. The time interval limits the monitoring duration of page requests. The threshold indicates the permitted frequency of page requests to the server relative to the previous normal status, and determines whether the server should be alarmed or not. The unbalanced ratio denotes the anomaly ratio of responses per page request between a specific client and a server; this value is used for picking out attackers from among the clients.
The map function filters out non-HTTP-GET packets and generates key values of server IP address, masked timestamp, and client IP address. The masked timestamp, derived from the time interval, is used for counting the number of requests from a specific client to a specific URL within the same time window. The reduce function summarizes the number of URL requests, page requests, and server responses between a client and a server. Finally, the algorithm aggregates values per server. When the total requests for a specific server exceed the threshold, the MapReduce job emits the records whose response ratio against requests is greater than the unbalanced ratio, marking them as attackers. While this algorithm has low computational complexity and is easily converted to a MapReduce implementation, it has a prerequisite: the threshold value must be known in advance from historical monitoring data.
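The counter-based method can be prototyped on a single JVM before porting it to Hadoop. The sketch below implements only the counting core: it filters non-GET records, masks the timestamp by the time interval, counts requests per (server, window, client) key, and flags clients above the threshold. The record format and the omission of the unbalanced-ratio check are simplifying assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Single-JVM sketch of counter-based DDoS detection. Hypothetical record
 *  format: "timestamp serverIP clientIP method" (space separated). */
public class CounterBasedDetection {

    /** Flags clients whose GET-request count to a server within one
     *  time window exceeds the threshold. */
    public static Set<String> findAttackers(List<String> records,
                                            long interval, long threshold) {
        Map<String, Long> counts = new HashMap<>();
        for (String record : records) {
            String[] f = record.split(" ");
            if (!"GET".equals(f[3])) continue;                          // map: drop non-GET packets
            long window = (Long.parseLong(f[0]) / interval) * interval; // masked timestamp
            String key = f[1] + "|" + window + "|" + f[2];              // (server, window, client)
            counts.merge(key, 1L, Long::sum);                           // reduce: count per key
        }
        Set<String> attackers = new TreeSet<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > threshold) {
                attackers.add(e.getKey().split("\\|")[2]);              // emit offending client IP
            }
        }
        return attackers;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            log.add(i + " 10.0.0.1 10.9.9.9 GET");   // flooding client: 50 requests in one window
        }
        log.add("5 10.0.0.1 10.1.1.1 GET");          // normal client: 1 request
        System.out.println(findAttackers(log, 60, 10)); // prints [10.9.9.9]
    }
}
```

In the Hadoop version, the loop body becomes the Mapper and the per-key aggregation becomes the Reducer, so the same logic scales across cluster nodes.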
Fig 5 MapReduce for counter-based DDoS detection
7. Result To assess scalability, the performance of the counter-based DDoS detection method is measured while varying the number of cluster nodes. Figure 6 shows that the detection job with ten worker nodes completed within 25 minutes for 500 GB and 47 minutes for 1 TB, over 8 times faster than one node and 2.9 times faster than 3 nodes, respectively. The evaluation results show that the performance gain grows as the volume of input traffic becomes large.
Fig 6 Completion time of a counter-based DDoS detection job for various numbers of Hadoop datanodes and traffic sizes
8. Conclusion This paper focuses on the scalability of DDoS attack detection and presents a Hadoop-based detection model. The Hadoop-based approach solves the scalability issue through parallel data processing of huge volumes of traffic for DDoS attack detection. Other approaches are single-host based and suggest increasing memory for efficiency, whereas the solution proposed here scales by parallel data processing. This kind of MapReduce job is well suited to Hadoop, as it is a computation-oriented framework.