Page 1: Chap1

STEGANOGRAPHY: Data Mining:

SOUNDARARAJAN EZEKIEL

Department of Computer Science

Indiana University of Pennsylvania

Indiana, PA 15705

Page 2: Chap1

Steganography, Cryptography, Data Mining

Steganography
– The art of hiding information in ways that prevent the detection of hidden messages
– The existence of the message is not known

Cryptography
– The science of writing in secret code
– It encodes a message so that it cannot be understood

Data Mining
– Discovering hidden values in your data warehouse
– That is, the extraction of hidden predictive information from large databases
– A knowledge discovery method: the extraction of implicit and interesting patterns from large data collections

Page 3: Chap1

Data Mining: Introduction

• It started when we (businesses) began to store data on computers
• Continued improvements: technology that navigates through data in real time
• Examples:
  – Single case:
    – A web server collects data for every single click
    – Logs are too big and contain gibberish
    – Lots of data and statistics
    – What we collect is not really useful
  – Multiple case:
    – A collection of web servers with large bandwidth
    – Think about the size of the data we collect

Page 4: Chap1

Data Mining: Continued

• It helps to design better and more intelligent businesses (and e-learning environments) because it is supported by
  – Massive data collection
  – Powerful multiprocessor computers
  – Good data mining algorithms
• It has existed for at least 10 years, but it has become popular only recently
• Example: Winter Corporation Report
  – Data warehouses with as much as 100 to 200 terabytes of raw data will be operational by next year, performing nearly 2,000 concurrent queries and occupying nearly 1 petabyte (1,000 terabytes) of disk space. In the same time period, transaction-processing databases will handle workloads of nearly 66,000 transactions per second.

Page 5: Chap1

Evolution of Data Mining

Each evolutionary step is listed with its business question, enabling technology, product providers, and characteristics.

Data collection (1960s)
– Question: What was my total revenue over the last few years?
– Technology: computers, tapes, disks
– Product providers: IBM, CDC
– Characteristics: retrospective, static data delivery

Data access (1980s)
– Question: What were unit sales in India last January?
– Technology: RDBMS (relational databases), SQL (Structured Query Language), ODBC
– Product providers: Oracle, Sybase, Informix, IBM, Microsoft
– Characteristics: dynamic data delivery

Data warehouse and decision support (1990s)
– Question: What was the unit sales price in India last March?
– Technology: on-line analytic processing (OLAP), multidimensional databases, data warehouses
– Product providers: Pilot, Comshare, Arbor, Cognos, Microstrategy
– Characteristics: dynamic data delivery at multiple levels

Data mining (now)
– Question: What will the unit price be in India next month? Why?
– Technology: advanced algorithms, multiprocessor computers, massive databases
– Product providers: Pilot, Lockheed, IBM, SGI, many more
– Characteristics: prospective, proactive information delivery

Page 6: Chap1

The Scope of Data Mining

• It is similar to sifting gold from an immense amount of dirt: searching for valuable information in gigabytes of data
• Automated prediction of trends and behaviors: data mining automates the process of finding predictive information in a large database
  – Example: a question related to target marketing. Data mining can use mailing list data and other previous data to identify the solution
  – Another example: forecasting bankruptcy by identifying segments of a population likely to respond similarly to given events

Page 7: Chap1

• Automated discovery of previously unknown patterns: data mining tools sweep through the database and identify previously hidden patterns in one step
  – Example: unrelated items that are purchased together in a store
  – Detecting fraudulent credit card transactions, etc.
• Databases can be larger in both depth and breadth
  – High-performance data mining needs to analyze the full depth of a database without pre-selecting subsets
  – Larger samples yield lower estimation errors and variances
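The "items purchased together" example can be sketched in a few lines of Python. This is only an illustration: the function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are assumptions made for this sketch, not part of the talk.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least min_support.

    Hypothetical helper: counts every unordered pair once per basket.
    """
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair a single canonical ordering
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


# Made-up shopping baskets, for illustration only
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
]
pairs = frequent_pairs(baskets, min_support=0.75)
```

Here only one pair appears in at least three of the four baskets. Real market basket tools prune the search rather than counting every pair, so this brute-force count is suitable only for small data.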

Page 8: Chap1

Research Rank

• 2001: According to MIT's Technology Review, data mining is a top-10 research area
• Recently: According to a Gartner Group Advanced Technology Research Note, data mining and AI are among the top 5 key research areas

Page 9: Chap1

Multi-disciplinary Field with Broad Applicability

• Has several applications
  – Market-based analysis
  – Customer relationship management
  – Fraud detection
  – Network intrusion detection
  – Non-destructive evaluation
  – Astronomy (look-up data)
  – Remote sensing data (look-down data)
  – Text and multimedia mining
  – Medical imaging
  – Automated target recognition
• Combines ideas from several different fields
  – Steganography
  – Cryptography

My Point of View of Data Mining

Borrowing ideas from
• Machine learning
• Artificial intelligence
• Statistics
• High-performance computing
• Signal and image processing
• Mathematical optimization
• Pattern recognition
• Natural language processing
• Steganography
• Cryptography

Page 10: Chap1

General View of Data Mining

[Figure: Raw data -> Target data -> Preprocessed data -> Transformed data -> Patterns -> Knowledge]

• Data processing: data fusion, sampling, MRA, de-noising, object identification, feature extraction, normalization, dimension reduction
• Pattern recognition: classification, clustering, regression
• Interpreting results: visualization, validation

An Iterative and Interactive Process

Page 11: Chap1

Our Research Is Based On

• Data preprocessing
  – Multiresolution analysis
  – De-noising (wavelet-based methods)
  – Object classification
  – Feature extraction
• Pattern recognition
  – Classification
  – Clustering
• Visualization and validation
  – Steganography
  – Cryptography

Page 12: Chap1

Where We Are Going from Here

• More robust, accurate, scalable algorithms
  – For pre-processing and pattern recognition
  – Wavelets
  – And fractals
• Newer data types
  – Video and multimedia
  – Multi-sensor data
• More complex problems
  – Dynamic tracking in video
  – Mining text, audio, video, images
• Investigating steganography in images, analysis of data hiding methods, attacks against hidden information, and countermeasures to attacks against digital watermarking (detection and distortion)

Page 13: Chap1

How Does Data Mining Work?

• How exactly is data mining able to tell you important things that you did not know, or what is going to happen next?
• The technique used to perform these feats in data mining is called modeling
  – Modeling is simply the act of building a model in one situation where you know the answer and then applying it to another situation where you don't
  – Example 1: a sunken treasure ship. Keep all the information about the Bermuda shore, other ships, and their paths; build the model; if the model is good, you find the treasure in the ocean
  – Example 2: identifying telephone customers. Suppose the model tells you that 98% of customers who make $60K per year spend more than $80 per month on long distance
    • With this model, new customers can be selectively targeted
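A minimal Python sketch of the modeling idea in Example 2: fit a rule on customers whose spending is known, then apply it to prospects whose spending is not. The field names, the `min_confidence` cutoff, and the customer records are assumptions invented for this illustration.

```python
def fit_rule(customers, income_cutoff=60_000, spend_cutoff=80):
    """Share of high-income customers who spend more than spend_cutoff per month."""
    high_income = [c for c in customers if c["income"] >= income_cutoff]
    if not high_income:
        return 0.0
    hits = sum(1 for c in high_income if c["monthly_long_distance"] > spend_cutoff)
    return hits / len(high_income)


def select_targets(prospects, confidence, income_cutoff=60_000, min_confidence=0.9):
    """Apply the rule to new prospects only if it held often enough on known data."""
    if confidence < min_confidence:
        return []
    return [p["name"] for p in prospects if p["income"] >= income_cutoff]


# Situation where the answer is known (made-up records)
known = [
    {"income": 72_000, "monthly_long_distance": 95},
    {"income": 65_000, "monthly_long_distance": 110},
    {"income": 61_000, "monthly_long_distance": 85},
    {"income": 40_000, "monthly_long_distance": 20},
    {"income": 30_000, "monthly_long_distance": 90},
]
# Situation where it is not: spending of these prospects is unknown
prospects = [{"name": "Ann", "income": 80_000}, {"name": "Bob", "income": 45_000}]

confidence = fit_rule(known)
targets = select_targets(prospects, confidence)
```

The rule here is fixed by hand; a data mining tool would search for such rules and cutoffs automatically.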

Page 14: Chap1

Most Commonly Used Techniques

• Artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure
• Decision trees: tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID)
• Genetic algorithms: optimization techniques that use processes such as genetic combination, mutation, and selection in a design based on the concept of evolution
• Nearest neighbor method
• Rule induction

OUR METHODS WILL BE BASED ON WAVELETS, FRACTALS, STEG, AND CRYPT

Page 15: Chap1

Steganography Methods

Let us discuss a few methods and their advantages and disadvantages.

1. Least significant bit (LSB) method
  – Idea: hide the message in the least significant bits (LSBs) of the pixels
  – Example:
  – Advantage: quick and easy; works well in grayscale images
  – Disadvantage: inserting into 8-bit images changes colors and the change is noticeable; vulnerable to image processing such as cropping and compression
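The LSB idea can be sketched in Python on a flat list of grayscale pixel values. This is a minimal sketch under stated assumptions: the function names, the bit order (most significant bit of each byte first), and the sample pixels are choices made for this illustration, and a real tool would also store the message length in the image.

```python
def embed_lsb(pixels, message):
    """Write the bits of message (bytes) into the LSBs of the first pixels."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for this message")
    stego = list(pixels)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & ~1) | bit  # clear the LSB, then set it
    return stego


def extract_lsb(pixels, n_bytes):
    """Read n_bytes back out of the LSBs (the length is assumed to be known)."""
    out = bytearray()
    for start in range(0, 8 * n_bytes, 8):
        value = 0
        for pixel in pixels[start:start + 8]:
            value = (value << 1) | (pixel & 1)
        out.append(value)
    return bytes(out)


# 16 grayscale pixels carry 2 bytes of hidden data
cover = [219, 215, 214, 216, 218, 218, 217, 216,
         219, 216, 216, 216, 215, 215, 215, 215]
stego = embed_lsb(cover, b"Hi")
recovered = extract_lsb(stego, 2)
```

Each pixel changes by at most 1, which is why the method is hard to see in grayscale images, and also why any re-encoding of the pixel values destroys the message.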

Page 16: Chap1

Redundant method
  – Store the message more than once so that it withstands cropping

Spread spectrum
  – Store the hidden message everywhere
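One way to picture the redundant method is to repeat the message bits several times and recover each bit by majority vote. In this hedged Python sketch the function names and the number of copies are assumptions, and cropping is modeled very crudely as one copy being overwritten with zeros.

```python
def repeat_bits(bits, copies=3):
    """Lay out the whole message `copies` times, one copy after another."""
    return [bit for _ in range(copies) for bit in bits]


def recover_bits(received, n_bits, copies=3):
    """Majority vote across the copies of each bit position."""
    recovered = []
    for position in range(n_bits):
        votes = sum(received[copy * n_bits + position] for copy in range(copies))
        recovered.append(1 if 2 * votes > copies else 0)
    return recovered


message_bits = [1, 0, 1, 1]
payload = repeat_bits(message_bits)
payload[0:4] = [0, 0, 0, 0]  # simulate losing the first copy to cropping
restored = recover_bits(payload, n_bits=len(message_bits))
```

With three copies the vote survives the loss of any single copy; losing two copies would not be recoverable in this scheme.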

STEGANALYSIS: Detection and Distortion

• Detection: the analyst observes various relationships between the cover, the message, the stego-media, and the steganography tool
• Distortion: the analyst manipulates the stego-media to render the embedded information useless or remove it altogether

Seeing the Unseen

Page 17: Chap1

DCT: Discrete Cosine Transformation

– Encode
  • Take the image
  • Divide it into 8x8 blocks
  • Apply the 2-D DCT to obtain the DCT coefficients
  • Apply a threshold value
  • Store the hidden message in that place
  • Take the inverse and store the result as an image
– Decode
  • Start with the modified image
  • Apply the DCT
  • Find the coefficients less than T
  • Extract the bits
  • Combine the bits to make the message

Example 8x8 pixel block:

219 215 214 216 218 218 217 216
219 216 216 216 215 215 215 215
217 217 218 216 212 212 213 215
215 215 215 215 211 212 214 216
217 216 214 216 215 215 217 218
216 216 215 214 215 215 215 216
215 214 210 210 211 215 215 216
218 215 211 211 213 214 216 216

DCT coefficients of the block:

1720     1.524   7.683   1.234    1.625     0.9234   -0.07047  -1.055
5.667    3.475  -4.181  -1.524    1.152     1.637     1.016     0.3802
0.3711  -1.442   1.067   5.944    0.3943   -0.4591    0.1313    0.7812
3.888   -3.356  -1.97    3.265    0.5632   -0.939    -0.2434    0.2354
1.625   -2.279   0.4735  1.392    1.375     0.6552   -1.143     0.03459
-4.049  -1.223   0.5466 -0.5425  -1.013    -0.2651    0.5696   -0.9296
1.876    1.924  -1.369  -1.132   -0.02802  -0.4646    0.1831    0.9729
0.8995  -0.7233  0.667   0.436    0.1325   -0.03665  -0.3141   -0.4749
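The encode and decode steps above can be sketched in pure Python for a single 8x8 block. Several details are assumptions made only for this illustration, since the slides do not specify them: the orthonormal DCT-II normalization, encoding each bit as the sign of a coefficient set to plus or minus T/2, row-major scanning of the coefficients, and keeping the stego pixels as unrounded floats (rounding to integer pixels would need a more robust encoding).

```python
import math

N = 8
BLOCK = [
    [219, 215, 214, 216, 218, 218, 217, 216],
    [219, 216, 216, 216, 215, 215, 215, 215],
    [217, 217, 218, 216, 212, 212, 213, 215],
    [215, 215, 215, 215, 211, 212, 214, 216],
    [217, 216, 214, 216, 215, 215, 217, 218],
    [216, 216, 215, 214, 215, 215, 215, 216],
    [215, 214, 210, 210, 211, 215, 215, 216],
    [218, 215, 211, 211, 213, 214, 216, 216],
]


def _alpha(k):
    # Orthonormal scaling factor of the DCT-II
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _cos(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct2(block):
    """2-D DCT-II of an 8x8 block (direct O(N^4) evaluation)."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _cos(x, u) * _cos(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _cos(x, u) * _cos(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def _small_slots(coeffs, threshold):
    # Positions of coefficients with magnitude below T, in row-major order
    return [(u, v) for u in range(N) for v in range(N) if abs(coeffs[u][v]) < threshold]


def embed(block, bits, threshold=1.0):
    """Overwrite below-threshold coefficients with +T/2 (bit 1) or -T/2 (bit 0)."""
    coeffs = dct2(block)
    slots = _small_slots(coeffs, threshold)
    if len(bits) > len(slots):
        raise ValueError("not enough small coefficients in this block")
    for (u, v), bit in zip(slots, bits):
        coeffs[u][v] = threshold / 2 if bit else -threshold / 2
    return idct2(coeffs)


def extract(stego_block, n_bits, threshold=1.0):
    """Read the signs of the first n_bits below-threshold coefficients."""
    coeffs = dct2(stego_block)
    slots = _small_slots(coeffs, threshold)[:n_bits]
    return [1 if coeffs[u][v] > 0 else 0 for u, v in slots]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(BLOCK, bits)
decoded = extract(stego, len(bits))
```

The block used here is the example block shown above; with this normalization its first (DC) coefficient is the pixel sum divided by 8, which agrees with the 1720 in the coefficient table.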

Page 18: Chap1

Wavelet Transformation

Wavelets are basis functions $w_{jk}(t)$ in continuous time. A basis is a set of linearly independent functions that can be used to produce all admissible functions $f(t)$:

$$f(t) = \text{combination of basis functions} = \sum_{j,k} b_{jk}\, w_{jk}(t)$$

The special feature of the wavelet basis is that all functions $w_{jk}(t)$ are constructed from a single mother wavelet $w(t)$. This wavelet is a small wave (a pulse). Normally it starts at time $t = 0$ and ends at time $t = N$.

Compressed $j$ times: $w_{j0}(t) = w(2^j t)$

Shifted $k$ times: $w_{0k}(t) = w(t - k)$

Combining both, we have $w_{jk}(t) = w(2^j t - k)$

Haar wavelet history: 1909 Haar; 1984 theory; 1988 Daubechies; 1989 Mallat (2-D, MRA); 1992 bi-orthogonal

Haar=
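As an illustration of a wavelet transform in practice, here is a minimal Python sketch of one level of the discrete Haar transform in its standard orthonormal textbook form (pairwise sums and differences scaled by the square root of 2). It is not taken from the slides; the function names are assumptions, and the input is the first row of the pixel block from the DCT example.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: averages and details."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    """Rebuild the signal from one level of Haar coefficients."""
    scale = math.sqrt(2)
    signal = []
    for a, d in zip(approx, detail):
        signal.extend([(a + d) / scale, (a - d) / scale])
    return signal


row = [219, 215, 214, 216, 218, 218, 217, 216]
approx, detail = haar_step(row)
rebuilt = inverse_haar_step(approx, detail)
```

For smooth image rows the detail coefficients are small, which is what makes thresholding and hiding data in them attractive.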

Page 19: Chap1

[Figure: block diagram with the labels Message to be Hidden, Carrier, Wavelet Transformation, Thresholding, Compression, Stego Image, Error Image, Inverse Transformation, Extract the Hidden Message]

Page 20: Chap1

Information Security and Data Mining

• Goal of intrusion detection: discover intrusions into a computer or network
• With the internet and the available tools for attacking networks, security becomes a critical component of the network
• Misuse detection: finds intrusions by looking for activity corresponding to known intrusion techniques
• Anomaly detection: the system defines the expected behavior of the network in advance and flags significant deviations from it
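The difference between the two detection styles can be sketched in Python. Everything specific here is an assumption for illustration: the signature strings, the z-score cutoff of 3, and the sample traffic numbers are invented, and real intrusion detection systems use far richer features.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attack techniques
KNOWN_SIGNATURES = ("/etc/passwd", "' OR '1'='1")


def misuse_alerts(requests, signatures=KNOWN_SIGNATURES):
    """Misuse detection: flag requests that match a known attack signature."""
    return [r for r in requests if any(sig in r for sig in signatures)]


def anomaly_alerts(baseline, observed, z_cutoff=3.0):
    """Anomaly detection: flag values far from the behavior defined in advance."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return [x for x in observed if x != mu]
    return [x for x in observed if abs(x - mu) / sigma > z_cutoff]


requests = ["GET /index.html", "GET /../../etc/passwd"]
flagged_requests = misuse_alerts(requests)

baseline_requests_per_minute = [100, 102, 98, 101, 99]
flagged_rates = anomaly_alerts(baseline_requests_per_minute, [101, 250])
```

Misuse detection cannot see attacks without a signature, while anomaly detection can, at the cost of false alarms whenever legitimate behavior changes.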

Page 21: Chap1

What We Want

• Tools to filter and classify information
• Tools to find and retrieve the relevant information when you need it
• Tools that adapt to your pace and needs
• Tools to predict information needs
• Tools to recommend tasks and information sources
• Tools that can be personalized, manually or automatically

Page 22: Chap1

The Tools Should Be…

• Non-intrusive
• Secure
• Integrated
• Adaptable
• Controllable
• Automatic or semi-automatic
• Useful
  – For learners
  – For educators
• Integrate operational data with customers, suppliers, and market

Page 23: Chap1

Profitable Applications

• A wide range of companies have deployed successful applications of data mining
• Some application areas include
  – A pharmaceutical company can analyze its recent sales force activity and its results to improve the targeting of high-value physicians and determine which marketing activities will have the greatest impact in the next few months
  – A credit card company can leverage its vast warehouse of customer transaction data to identify the customers most likely to be interested in a new credit product
  – A diversified transportation company with a large direct sales force can apply data mining to identify the best prospects for its services
  – A large consumer packaged goods company can apply data mining to improve its sales process to retailers

Page 24: Chap1

Conclusion

• In this talk, we have discussed data mining related topics
• Our goals
  – Research
  – Software and algorithms
  – Application
• Our main focus is science data, though the methods are applicable to other data sets as well
• More information: check out the website http://www.cosc.iup.eud/sezekiel
• Contact: [email protected]