Scott Clements, Monash University Software Engineering, Copyright 2003.

Post on 14-Jan-2016

Web Document Analysis: Improving Search Technology using Image Processing

Scott Clements
Bachelor of Software Engineering, Monash University
www.csse.monash.edu.au/~sdcle1/

Supervisor: Dr. Sid Ray

Interests and Expertise

Dr. Sid Ray
• Image Processing expert

Scott Clements
• Internet Technology
• Software Engineering
• Database Management
• Interface Design

Union of Expertise

Engineering a product which uses:

Image processing & Internet Technology

Primary Goals

• To improve search quality using Image Processing.

• To investigate:
  – Image histogram matching to find similar images
  – Colour predominance in images

Secondary Goals

• To engineer a product which has industry potential.
  – Project Management
  – Interface Design
  – Database Management
  – Information Retrieval

Background

• Popular search technology [mcbryan94, brin98, pinkerton00]
  – Text based
  – Quality of results can be poor
  – Difficult to find images

• Multimedia search technology [ogle95, smith97]
  – Text, image and video based
  – Poor interface design
    • Aimed at Image Processing experts
  – Good use of Database Management systems

Software Engineering Methods

Stages
• Initial program: Grey-scale image matching
• Refinement 1: Colour image matching
• Refinement 2: Colour predominance image matching

[Figure: System refinement stages. Boxes: Initial program; Integration; Implement; Test; Document Results and Findings]

Image processing technique

Data types:

• Histogram Data

• Colour Predominance Data

[Figure: Image (input) → Image processing → Data (output)]

System Architecture

Pre-processing
- Image analysis
- Format information
- Add information to database

Database

Post-processing
- Send query information
- Process query information
- Calculate search results
- Return search results

User

System Architecture continued

Pre-processing
- C
- Monash Image Lib.
- PHP/HTML

Database
- MySQL

Post-processing
- PHP/HTML

User

Colour histogram matching

• Method:
  – Using:
    • Group 16 configuration
    • Total difference Algorithm

• Requirements:
  – Database design
  – Histogram analysis

• Investigate:
  – Interface design
  – Relevance Feedback

Histograms (Group 16 Configuration)

Colour Histograms

- Count the number of occurrences of each colour intensity

- 256 intensities for each RGB component (24-bit image)

- Insert this information into the database

Problem: Excessive amount of information

Solution: Convert to Group 16 Configuration.
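
The slides do not define the Group 16 configuration. A minimal sketch, assuming it means collapsing the 256 intensities of each RGB component into 16 groups of 16 adjacent intensities, so that each image needs 3 x 16 values instead of 3 x 256. The function names are hypothetical and pixels are taken as plain (r, g, b) tuples rather than Monash Image Library structures.

```python
def channel_histogram(values):
    """Count occurrences of each intensity (0-255) in one colour component."""
    counts = [0] * 256
    for value in values:
        counts[value] += 1
    return counts


def group16(hist256):
    """Assumed 'Group 16' step: sum each run of 16 adjacent intensities."""
    return [sum(hist256[i:i + 16]) for i in range(0, 256, 16)]


def image_histograms(pixels):
    """Return grouped (red, green, blue) histograms for (r, g, b) pixels."""
    reds, greens, blues = zip(*pixels)
    return tuple(group16(channel_histogram(c)) for c in (reds, greens, blues))


# Three example pixels; intensities 0 and 15 share group 0, 16 falls in group 1.
pixels = [(0, 128, 255), (15, 130, 240), (16, 0, 255)]
r16, g16, b16 = image_histograms(pixels)
print(r16[:2], g16[8], b16[15])
```

Under this assumption the database stores 48 counts per image, which is what the 16-column tables on the Database Design slide suggest.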

Database Design

Table r2red
identification | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | ... | r15 | r16

Table r2blue
identification | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | ... | b15 | b16

Table r2green
identification | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8 | ... | g15 | g16

Table r2
identification | name | ... | **Reserved Space**
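
The table layout above can be sketched as a schema. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for the MySQL database named in the slides, the column types are assumptions, and the elided and reserved columns of table r2 are left out because the slides do not say what they hold. The helper names create_schema and add_image are hypothetical.

```python
import sqlite3

# Colour name used in the table name, and the letter used in its column names.
CHANNELS = (("red", "r"), ("green", "g"), ("blue", "b"))


def create_schema(conn, prefix="r2"):
    """Create the main image table plus one 16-column table per component."""
    conn.execute(
        f"CREATE TABLE {prefix} (identification INTEGER PRIMARY KEY, name TEXT)"
    )
    for colour, letter in CHANNELS:
        columns = ", ".join(f"{letter}{i} INTEGER" for i in range(1, 17))
        conn.execute(
            f"CREATE TABLE {prefix}{colour} "
            f"(identification INTEGER PRIMARY KEY, {columns})"
        )


def add_image(conn, name, histograms, prefix="r2"):
    """Store one image; histograms is (red, green, blue), 16 counts each."""
    cursor = conn.execute(f"INSERT INTO {prefix} (name) VALUES (?)", (name,))
    image_id = cursor.lastrowid
    placeholders = ", ".join(["?"] * 17)
    for (colour, _), hist in zip(CHANNELS, histograms):
        conn.execute(
            f"INSERT INTO {prefix}{colour} VALUES ({placeholders})",
            (image_id, *hist),
        )
    return image_id


conn = sqlite3.connect(":memory:")
create_schema(conn)
image_id = add_image(
    conn, "sunset.jpg", ([3] + [0] * 15, [0] * 16, [0] * 15 + [3])
)
row = conn.execute(
    "SELECT r1, r16 FROM r2red WHERE identification = ?", (image_id,)
).fetchone()
print(image_id, row)
```

Keeping one table per colour component mirrors the slide; a single 48-column table would work equally well for a sketch like this.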

Algorithm

•Aim: To find other similar images

•Method: Compare each of the histograms with the query histogram

•Algorithm: Total difference

Total Difference Algorithm

- Query image versus images in the database
- Compare each histogram
- Find the positive difference between each histogram (Total Difference)
- Convert the 0-300% range to a similarity rating between 0-100%
- Return the results which are within a user-defined similarity rating
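
The steps above can be sketched as follows. The slides give no formula, so this is one possible reading: each component histogram is normalised to percentages, the absolute bin differences are summed and halved so that one component contributes 0-100%, and the three components together give the 0-300% range. The halving, the linear conversion to a similarity rating, and all function names are assumptions.

```python
def normalise(hist):
    """Express each bin as a percentage of the image's pixel count."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [100.0 * count / total for count in hist]


def channel_difference(h1, h2):
    """Positive difference for one component: 0 (identical) to 100 (disjoint)."""
    p, q = normalise(h1), normalise(h2)
    return sum(abs(a - b) for a, b in zip(p, q)) / 2.0


def total_difference(query, candidate):
    """Sum over the red, green and blue histograms: 0 to 300."""
    return sum(channel_difference(a, b) for a, b in zip(query, candidate))


def similarity(query, candidate):
    """Map the 0-300% total difference onto a 0-100% similarity rating."""
    return 100.0 - total_difference(query, candidate) / 3.0


def search(query, database, min_similarity):
    """Return (name, rating) pairs at or above the user-defined rating,
    sorted from most similar to least similar."""
    scored = [(name, similarity(query, hists)) for name, hists in database.items()]
    hits = [(name, rating) for name, rating in scored if rating >= min_similarity]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def rgb(hist):
    """Test helper: use the same 16-bin histogram for all three components."""
    return (hist, hist, hist)


query = rgb([2, 2] + [0] * 14)
database = {
    "same": rgb([4, 4] + [0] * 14),
    "half": rgb([2, 0, 2] + [0] * 13),
    "disjoint": rgb([0] * 15 + [4]),
}
results = search(query, database, min_similarity=40.0)
print(results)
```

Normalising by pixel count is one way to sidestep the need to resize images to a standard size before comparison; whether the original system did this is not stated in the slides.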

Interface Design

Relevance Feedback

User Feedback:
– Clicking the similarity button
– Showing interest in a particular image

Relevance:
– Sorting results:
  • most similar to least similar

Results and Findings

Method                        | Accuracy
Grey-scale histogram matching | 64%
Colour histogram matching     | 84%

• Test set:
  – Real-life photos
  – Computer-generated images

• Weaknesses:
  – Grey-scale histogram matching (unacceptable results)
  – Images with many different colours
  – Spatial arrangements
  – Needing to resize the images (standardisation for histograms)

Colour predominance

• Assign each pixel a colour value (if possible)

• RGB was found to be unsuitable for this

• With HSB it was much easier to find colour ranges

• Method: Using an image program, find the Hue, Saturation and Brightness ranges for each colour.

Algorithm Design

Analysis
• Count each occurrence of a certain colour
• Convert the occurrence result to a percentage of predominance between 0-100%

Query
• Query the database to find images which have predominant colours.
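
The analysis and query steps can be sketched as below, using Python's standard colorsys module, whose HSV model corresponds to HSB. The hue, saturation and brightness thresholds are illustrative guesses and not the ranges the author measured with an image program; the colour names are the column names of the predominance table. Pixels that fit no range are left unassigned, in line with "if possible". All function names are hypothetical.

```python
import colorsys

# Assumed hue ranges in degrees; red wraps around 0/360.
HUE_RANGES = [
    ("red", 0, 15), ("orange", 15, 45), ("yellow", 45, 75),
    ("green", 75, 165), ("cyan", 165, 195), ("blue", 195, 255),
    ("purple", 255, 285), ("magenta", 285, 345), ("red", 345, 360),
]


def classify(r, g, b):
    """Return a colour name for one 24-bit pixel, or None if none applies."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if v < 0.2:                      # assumed threshold for "dark"
        return "dark"
    if s < 0.15:                     # nearly colourless pixel
        return "bright" if v > 0.85 else None
    hue = h * 360.0
    for name, low, high in HUE_RANGES:
        if low <= hue < high:
            return name
    return None


def predominance(pixels):
    """Count each colour and convert the counts to percentages (0-100)."""
    counts = {}
    for r, g, b in pixels:
        name = classify(r, g, b)
        if name is not None:
            counts[name] = counts.get(name, 0) + 1
    total = len(pixels)
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


def find_predominant(images, colour, min_percent):
    """Query step: names of images where colour reaches min_percent."""
    return [name for name, table in images.items()
            if table.get(colour, 0.0) >= min_percent]


pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (10, 10, 10)]
table = predominance(pixels)
print(table)
print(find_predominant({"example": table}, "red", 40.0))
```

Only ten percentages per image need to be stored under this scheme, compared with 48 histogram counts.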

Database Refinement

Table r3red
identification | r1 | r2 | r3 | r4 | r5 | r6 | r7 | r8 | ... | r15 | r16

Table r3blue
identification | b1 | b2 | b3 | b4 | b5 | b6 | b7 | b8 | ... | b15 | b16

Table r3green
identification | g1 | g2 | g3 | g4 | g5 | g6 | g7 | g8 | ... | g15 | g16

Table r3
identification | name | ... | **Reserved Space**

Table r3predominance
identification | red | magenta | purple | blue | cyan | green | yellow | orange | dark | bright

Interface design

Interface design continued

Relevance Feedback

• Not fully suitable for Colour predominance

• Use a subset of Relevance Feedback to improve usability

• Sort the result from most to least relevant

Results and Findings

• Test set:
  – Real-life photos
  – Computer-generated images

• Findings:
  – Easy method for users to understand
  – Less information stored in the database
  – Accurate and efficient method to use

Algorithm           | Similarity results
Colour Predominance | 86%

Conclusion and Applications

Small to medium sized system
• Example: local image database
• Colour histogram matching
• Colour predominance

Medium to large system
• Example: Internet search engine
• Only colour predominance
  – More efficient
  – Less information to store about images
  – Easy to understand

Future Research
• Parallelism in image analysis
• Alternative image data for histogram matching (e.g. HSB)
• Replace or extend Monash Image Library (MIL) to directly support popular internet image formats
• Improve the documentation for colour image manipulation in MIL
• More extensive testing of colour predominance
• Addition of predominance levels

Questions?

Are there any questions?
