Scott Clements, Monash University Software Engineering, Copyright 2003.

Web Document Analysis: Improving Search Technology using Image Processing

Scott Clements
Bachelor of Software Engineering
Monash University
www.csse.monash.edu.au/~sdcle1/

Supervisor: Dr. Sid Ray

Interests and Expertise

Dr. Sid Ray
• Image Processing expert

Scott Clements
• Internet Technology
• Software Engineering
• Database Management
• Interface Design

Union of Expertise

Engineering a product which uses:

Image processing & Internet Technology

Primary Goals

• To improve search quality using Image Processing.

• To investigate:
  – Image histogram matching to find similar images
  – Colour predominance in images

Secondary Goals

• To engineer a product which has industry potential.
  – Project management
  – Interface design
  – Database management
  – Information retrieval

Background

• Popular search technology [mcbryan94, brin98, pinkerton00]
  – Text based
  – Quality of results can be poor
  – Difficult to find images

• Multimedia search technology [ogle95, smith97]
  – Text, image and video based
  – Poor interface design
    • Aimed at image processing experts
  – Good use of database management systems

Software Engineering Methods

Stages
• Initial program: Grey-scale image matching
• Refinement 1: Colour image matching
• Refinement 2: Colour predominance image matching

[Figure: System refinement stages. Labels: Initial program; Implement; Test; Integration; Document results and findings.]

Image processing technique

Data types:

• Histogram Data

• Colour Predominance Data

[Figure: Image (input) → Image processing → Data (output)]

System Architecture

Pre-processing
- Image analysis
- Format information
- Add information to database

Database

Post-processing
- Send query information
- Process query information
- Calculate search results
- Return search results

User

System Architecture continued

Pre-processing
- C
- Monash Image Library (MIL)
- PHP/HTML

Database
- MySQL

Post-processing
- PHP/HTML

User

Colour histogram matching

• Method:
  – Using:
    • Group 16 configuration
    • Total difference algorithm

• Requirements:
  – Database design
  – Histogram analysis

• Investigate:
  – Interface design
  – Relevance feedback

Histograms (Group 16 Configuration)

Colour Histograms

-Count the number of occurrences of each colour intensity

-256 intensities for each RGB component (24-bit image)

-Insert this information into the database

Problem: Excessive amount of information

Solution: Convert to Group 16 Configuration.
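A minimal sketch of building Group 16 histograms, assuming the configuration folds the 256 intensities of each RGB component into 16 bins of 16 consecutive intensities (which matches the 16 columns per colour in the database design). Pixels are passed in as (red, green, blue) tuples rather than read through the Monash Image Library, and the function name `group16_histograms` is hypothetical.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255 in a 24-bit image


def group16_histograms(pixels: Iterable[Pixel]) -> Tuple[List[int], List[int], List[int]]:
    """Count pixel intensities into 16 bins per RGB component.

    Assumption: bin k holds intensities 16*k to 16*k + 15, so an
    intensity v lands in bin v // 16 (0-15 -> bin 0, 240-255 -> bin 15).
    """
    red, green, blue = [0] * 16, [0] * 16, [0] * 16
    for pixel in pixels:
        for value, hist in zip(pixel, (red, green, blue)):
            if not 0 <= value <= 255:
                raise ValueError(f"intensity out of range: {value}")
            hist[value // 16] += 1
    return red, green, blue


# Usage: a black pixel and the pixel (255, 128, 15).
r, g, b = group16_histograms([(0, 0, 0), (255, 128, 15)])
print(r[0], r[15], g[8], b[0])  # 1 1 1 2
```

Each component shrinks from 256 counts to 16, which is what makes one row per image and per colour practical in the database.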

Database Design

r2red
identification, r1, r2, r3, r4, r5, r6, r7, r8, ..., r15, r16

r2blue
identification, b1, b2, b3, b4, b5, b6, b7, b8, ..., b15, b16

r2green
identification, g1, g2, g3, g4, g5, g6, g7, g8, ..., g15, g16

r2
identification, name, ... (Reserved Space)
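The table layout above can be sketched with Python's built-in `sqlite3` module standing in for MySQL. Column types, the foreign key, and the helper names `create_schema`, `add_image` and `load_histograms` are assumptions for illustration, and the reserved columns of `r2` are left out.

```python
import sqlite3
from typing import List, Sequence, Tuple

# (table, column prefix) pairs from the r2 database design.
HISTOGRAM_TABLES = (("r2red", "r"), ("r2green", "g"), ("r2blue", "b"))


def create_schema(conn: sqlite3.Connection) -> None:
    """Create r2 plus one 16-bin histogram table per RGB component."""
    conn.execute("CREATE TABLE r2 (identification INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    for table, prefix in HISTOGRAM_TABLES:
        bins = ", ".join(f"{prefix}{i} INTEGER NOT NULL" for i in range(1, 17))
        conn.execute(
            f"CREATE TABLE {table} (identification INTEGER PRIMARY KEY "
            f"REFERENCES r2(identification), {bins})"
        )


def add_image(conn: sqlite3.Connection, name: str, red: Sequence[int],
              green: Sequence[int], blue: Sequence[int]) -> int:
    """Store one image's name and its three histograms; return its identification."""
    image_id = conn.execute("INSERT INTO r2 (name) VALUES (?)", (name,)).lastrowid
    for (table, _), hist in zip(HISTOGRAM_TABLES, (red, green, blue)):
        if len(hist) != 16:
            raise ValueError(f"{table} needs 16 bins, got {len(hist)}")
        placeholders = ", ".join("?" * 17)  # identification + 16 bins
        conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", (image_id, *hist))
    conn.commit()
    return image_id


def load_histograms(conn: sqlite3.Connection, image_id: int) -> Tuple[List[int], ...]:
    """Read back the (red, green, blue) histograms for one image."""
    result = []
    for table, _ in HISTOGRAM_TABLES:
        row = conn.execute(
            f"SELECT * FROM {table} WHERE identification = ?", (image_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no {table} row for image {image_id}")
        result.append(list(row[1:]))  # drop the identification column
    return tuple(result)
```

Splitting the three components into separate tables keyed by `identification` mirrors the slide; a single wide table would work equally well for this sketch.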

Algorithm

•Aim: To find other similar images

•Method: Compare each of the histograms with the query histogram

•Algorithm: Total difference

Total Difference Algorithm

-Query image versus images in the database

-Compare each histogram
-Find the positive difference between each histogram (Total Difference)
-Convert the 0-300% range to a similarity rating between 0-100%

-Return the results which are within a user-defined similarity rating
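One reading of the Total Difference algorithm that reproduces the 0-300% range: normalise each 16-bin histogram by its pixel count, sum the positive bin differences per channel (equivalently, half the sum of absolute differences, giving 0% for identical and 100% for disjoint histograms), add the three channels, and map the 0-300% total linearly onto a 0-100% similarity. Normalising by pixel count is a substitute for resizing images; the per-channel formula, the linear mapping and the names `similarity` and `search` are assumptions, not the project's code.

```python
from typing import Dict, List, Sequence, Tuple

# (red, green, blue) histograms, 16 bins each.
Histograms = Tuple[Sequence[int], Sequence[int], Sequence[int]]


def channel_difference(query: Sequence[int], candidate: Sequence[int]) -> float:
    """Positive difference between two normalised histograms, in percent (0-100).

    Summing the positive bin differences of two histograms that each sum to 1
    equals half their L1 distance, so identical histograms give 0 and
    histograms with no bins in common give 100.
    """
    q_total, c_total = sum(query), sum(candidate)
    if q_total == 0 or c_total == 0:
        raise ValueError("cannot compare an empty histogram")
    return 50.0 * sum(abs(q / q_total - c / c_total) for q, c in zip(query, candidate))


def similarity(query: Histograms, candidate: Histograms) -> float:
    """Map the 0-300% total difference over three channels onto 0-100% similarity."""
    total = sum(channel_difference(q, c) for q, c in zip(query, candidate))
    return 100.0 - total / 3.0


def search(query: Histograms, database: Dict[str, Histograms],
           min_similarity: float) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above the user-defined rating, best first."""
    scored = ((name, similarity(query, hists)) for name, hists in database.items())
    hits = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Sorting the hits in descending order gives the "most similar to least similar" ordering used for relevance feedback.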

Interface Design

Relevance Feedback

User feedback:
– Clicking the similarity button
– Indicating interest in a particular image

Relevance:
– Sorting results:
  • most similar to least similar

Results and Findings

Method                            Accuracy
Grey-scale histogram matching     64%
Colour histogram matching         84%

• Test set:
  – Real life photos
  – Computer generated images

• Weaknesses:
  – Grey-scale histogram matching (unacceptable results)
  – Images with many different colours
  – Spatial arrangements
  – Needing to resize the images (standardisation for histograms)

Colour predominance

• Assign each pixel a colour value (if possible)

• Found that RGB was not suitable in this case

• Colour ranges were much easier to find in HSB

• Method: using an image program, find the hue, saturation and brightness ranges for each colour.
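A sketch of assigning a pixel one of the colour names used in the predominance table, using the standard library's `colorsys.rgb_to_hsv` (HSV and HSB are the same colour model). The slides say the real hue, saturation and brightness ranges were measured with an image program but do not list them, so every threshold below, and the treatment of "dark" and "bright", is a hypothetical placeholder.

```python
import colorsys
from typing import Optional

# Hypothetical ranges: the measured HSB ranges are not given in the slides.
# Each entry is (upper hue bound in degrees, colour name).
HUE_BANDS = (
    (15.0, "red"),
    (45.0, "orange"),
    (70.0, "yellow"),
    (160.0, "green"),
    (200.0, "cyan"),
    (260.0, "blue"),
    (290.0, "purple"),
    (330.0, "magenta"),
    (360.0, "red"),  # hue wraps around back to red
)
DARK_BRIGHTNESS = 0.2    # hypothetical: below this the pixel counts as "dark"
GREY_SATURATION = 0.2    # hypothetical: below this the pixel has no clear hue
BRIGHT_BRIGHTNESS = 0.8  # hypothetical: unsaturated and above this counts as "bright"


def classify_pixel(r: int, g: int, b: int) -> Optional[str]:
    """Assign an 8-bit RGB pixel a colour name, or None if no colour value fits."""
    hue, saturation, brightness = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if brightness < DARK_BRIGHTNESS:
        return "dark"
    if saturation < GREY_SATURATION:
        # Mid greys get no colour value ("if possible" in the slide).
        return "bright" if brightness > BRIGHT_BRIGHTNESS else None
    degrees = hue * 360.0  # colorsys reports hue in [0, 1)
    for upper, name in HUE_BANDS:
        if degrees < upper:
            return name
    return "red"
```

Checking brightness and saturation before hue reflects why HSB was easier here: a single hue threshold separates colours that would need several RGB comparisons.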

Algorithm Design

Analysis
• Count each occurrence of a certain colour
• Convert the occurrence result to a percentage of predominance between 0-100%

Query
• Query the database to find images which have predominant colours.
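The analysis step can be sketched as counting the colour label of every pixel and expressing each count as a percentage of all pixels. The slide does not say whether unclassified pixels belong in the denominator; this sketch assumes they do, so the values for one image may add up to less than 100%. The function name `predominance` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Column names of the r3predominance table.
COLOURS = ("red", "magenta", "purple", "blue", "cyan",
           "green", "yellow", "orange", "dark", "bright")


def predominance(labels: Iterable[Optional[str]]) -> Dict[str, float]:
    """Turn per-pixel colour labels into a 0-100% predominance value per colour.

    Assumption: unclassified pixels (None) count towards the total.
    """
    pixel_labels = list(labels)
    if not pixel_labels:
        raise ValueError("image has no pixels")
    counts = Counter(label for label in pixel_labels if label is not None)
    return {colour: 100.0 * counts[colour] / len(pixel_labels) for colour in COLOURS}
```

Only ten numbers per image are kept, which is the "less information stored in the database" advantage reported for this method.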

Database Refinement

r3red
identification, r1, r2, r3, r4, r5, r6, r7, r8, ..., r15, r16

r3blue
identification, b1, b2, b3, b4, b5, b6, b7, b8, ..., b15, b16

r3green
identification, g1, g2, g3, g4, g5, g6, g7, g8, ..., g15, g16

r3
identification, name, ... (Reserved Space)

r3predominance
identification, red, magenta, purple, blue, cyan, green, yellow, orange, dark, bright
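A sketch of the `r3predominance` table and the query step, with Python's `sqlite3` standing in for MySQL. The REAL column type, the default of 0 and the helper names are assumptions. Because a column name cannot be bound as a query parameter, the colour is checked against the fixed column list before it is placed in the SQL text; rows come back from most to least predominant.

```python
import sqlite3
from typing import Dict, List, Tuple

# Column names of r3predominance, in the order shown in the database refinement.
COLOURS = ("red", "magenta", "purple", "blue", "cyan",
           "green", "yellow", "orange", "dark", "bright")


def create_predominance_table(conn: sqlite3.Connection) -> None:
    """Create r3predominance: one 0-100% value per colour for each image."""
    columns = ", ".join(f"{colour} REAL NOT NULL DEFAULT 0" for colour in COLOURS)
    conn.execute(f"CREATE TABLE r3predominance (identification INTEGER PRIMARY KEY, {columns})")


def store_predominance(conn: sqlite3.Connection, image_id: int,
                       percentages: Dict[str, float]) -> None:
    """Insert one image's predominance values; missing colours are stored as 0."""
    values = [percentages.get(colour, 0.0) for colour in COLOURS]
    placeholders = ", ".join("?" * (len(COLOURS) + 1))  # identification + colours
    conn.execute(f"INSERT INTO r3predominance VALUES ({placeholders})", (image_id, *values))
    conn.commit()


def query_predominant(conn: sqlite3.Connection, colour: str,
                      min_percent: float) -> List[Tuple[int, float]]:
    """Return (identification, percent) where colour reaches min_percent, most predominant first."""
    if colour not in COLOURS:
        raise ValueError(f"unknown colour: {colour}")
    # Safe to interpolate: colour was checked against the fixed column list above.
    sql = (f"SELECT identification, {colour} FROM r3predominance "
           f"WHERE {colour} >= ? ORDER BY {colour} DESC")
    return conn.execute(sql, (min_percent,)).fetchall()
```

Letting the database do the threshold and ordering keeps the post-processing side thin, which fits the PHP/MySQL split in the system architecture.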

Interface design

Interface design continued

Relevance Feedback

• Not fully suitable for Colour predominance

• Use a subset of relevance feedback to improve usability

• Sort the results from most to least relevant

Results and Findings

• Test set:
  – Real life photos
  – Computer generated images

• Findings:
  – Easy method for users to understand
  – Less information stored in the database
  – Accurate and efficient method to use

Algorithm                Similarity results
Colour predominance      86%

Conclusion and Applications

Small to medium sized system
• Example: local image database
• Colour histogram matching
• Colour predominance

Medium to large system
• Example: Internet search engine
• Only colour predominance
  – More efficient
  – Less information to store about images
  – Easy to understand

Future Research
• Parallelism in image analysis
• Alternative image data for histogram matching (e.g. HSB)
• Replace or extend the Monash Image Library (MIL) to directly support popular internet image formats
• Improve the documentation for colour image manipulation in MIL
• More extensive testing of colour predominance
• Addition of predominance levels

Questions?

Are there any questions?