
Content Based Search Final Document

Nov 12, 2014

Page 1: Content Based Search Final Document

PROJECT REPORT

CONTENT BASED SEARCH BY RETRIEVING THE FILES

A thesis submitted in partial fulfillment of the requirement for the Award of Degree Of

BACHELOR OF TECHNOLOGY(Computer Sciences)

NIMRA COLLEGE OF ENGINEERING AND TECHNOLOGY, VIJAYAWADA

Zulfikar Ali.Md (06231A05C0)
Sudeesha.M (06231A05A4)
Ritesh Abhishekh (06231A0570)
Sajida Bhanu (06231A0572)
Pramod.G (06231A0562)

Under the esteemed guidance of

Miss. G.ANITHA, B.Tech (CSE), Lecturer

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

NIMRA COLLEGE OF ENGINEERING AND TECHNOLOGY

(Affiliated to Jawaharlal Nehru Technological University)

AN ISO 9001-2000 CERTIFIED INSTITUTION

JUPUDI, VIJAYAWADA, AP.

MAY, 2009

ABSTRACT

Content Based File Search is a Java application to find files that

contain (or don’t contain) a given string. The string may be in plain

text or it may be a Java regular expression. Such a trivial search

should be part of the operating system, and in fact, once was. As

bigger and more impressive features were added to Windows, it lost

the ability to search files for arbitrary bytes of text. Windows

98/ME/2000 could find words buried in files with unknown formats;

Windows XP and Vista search only their supported file types. Through

the creation of files of content through applications, downloading of

content from the Internet, or receiving content via email, this file

system can become quite full of important content located throughout

the system. Whether these files are carefully filed away in deeply

nested hierarchical folders, or haphazardly filed away in a nearly flat

system, at some point that data probably needs to be accessed again.

It is at this point the problem of desktop search becomes apparent. In

a system consisting of gigabytes and gigabytes of thousands or even

millions of files, it is important to have a more efficient search engine for

the desktop. The speed of this program depends upon the speed of the

computer’s hardware and the complexity of the search string. When

searching for plain ASCII text or Unicode characters from 0x20 to

0x7E, the “(raw data bytes)” encoding is about 40% faster than the

local system’s “(default encoding)”.

Introduction

The capacity of our hard-disk drives has increased

tremendously over the past decade, and so has the number of files we

usually store on our computer. It is no wonder that sometimes we

cannot find a document any more, even when we know we saved it

somewhere. The recent arrival of desktop search applications, which

index all data on a PC, promises to increase search efficiency on the

desktop. Still, these search applications are weaker than their web

counterparts. Unfortunately, they also fall short of utilizing desktop

specific characteristics, especially context information. For example,

one file might contain a question describing the object one is looking

for, and another file in the same thread might include the answer to

that question in the form of an attached document. The search

functionality in earlier versions of Windows searches all files for the

specified string and may return a large number of irrelevant files such

as program and configuration files. As a result of this change, the

search functionality can find the same set of files if the Content Index

service is turned on or off. In previous versions of Windows, the

computer exhibited different behavior if you turned on the Content

Index service.

Content-based file retrieval was initially proposed in the 1990s to overcome
the difficulties encountered in keyword-based file search. Since then, it has
been an active research topic, and many algorithms have been published in the
literature. In keyword-based search, files have to be manually annotated with
keywords. Because keyword annotation is a tedious process, it is impractical
to annotate a large collection of files, and the annotations may be
inconsistent. In content-based retrieval, by contrast, feature extraction can
be performed automatically, so the human labeling process can be avoided.

Context and Content

This metric brings about two points. First, the context of the

search - what documents and text you have open or have recently

modified - could help immensely, and since this is search done on a

local computer that information could be accessible. Second, it points

out that a text-based keyword search may not be the whole answer. A

content-based information retrieval system that allows you to

construct search queries based on the kind of content you're searching

for could be an important area for research. This isn't the best

example, but rather than just searching for a company name in your

email to find correspondence with members of that company, if you

have one email from that company the fact that all email from that

company will be from the same domain name is something your

search tool could notice. It might rank email to and from that specific

person as most relevant, and email to and from that company as also

relevant. When you think of your documents and content as query

statements, interesting possibilities open up.

When the domain switches from email to media - like music or

images - the possibilities for content-based image retrieval seem even

more interesting. Especially considering the relatively impoverished

state of metadata, text-based searching for media content on the

desktop is extremely difficult.

Existing System

Through the creation of files of content through applications,

downloading of content from the Internet, or receiving content via

email, this file system can become quite full of important content

located throughout the system. Whether these files are carefully filed

away in deeply nested hierarchical folders, or haphazardly filed away

in a nearly flat system, at some point that data probably needs to be

accessed again. It is at this point the problem of desktop search

becomes apparent. In a system consisting of gigabytes and gigabytes

of thousands or even millions of files, how does one locate a specific

file? If it is filed away "properly," that is, in a manner the user was

conscious of and remembers, perhaps it will be easily located in that

folder. But what if the user has put the file in a folder he can't

remember? Or software automatically saved it somewhere he does not

expect? Or the folder it is in contains over a hundred files, and the

user can't remember the file's name? Or he knows the folder it is in,

but can't remember where the folder is? There are many reasons to

not be able to instantly remember the folder location of a file,

especially if it was created months or even years earlier.

Disadvantages

Speed is a major issue. By default, neither file metadata nor content is

indexed in such a way that results are returned quickly. Although

Windows XP includes something called "Indexing Service" that will

index files for quick access, it is not enabled by default. It was not

examined for the purposes of this paper since it is so seldom used or

mentioned by normal users. There is no meaningful ranking of the

results. That is, although you can re-sort the results by the common file

system metadata (name, folder location, file type, and date modified),

results seem to be returned simply in the order they are found as

Windows XP Search linearly searches through files and folders.

Proposed System

The string may be in plain text or it may be a Java regular

expression. Such a trivial search should be part of the operating

system, and in fact, once was. As bigger and more impressive

features were added to Windows, it lost the ability to search files for

arbitrary bytes of text. Windows 98/ME/2000 could find words buried

in files with unknown formats; Windows XP and Vista search only their

supported file types. A regular expression is a way of specifying

relationships between elements of a complex pattern. You don’t need

to understand regular expressions to use this program. This program

can be executed from both the command prompt and the graphical

user interface. By supporting regular expressions, it can overcome

the disadvantages of the existing system.
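The choice between a plain-text search string and a Java regular expression can be illustrated with a minimal sketch. It is not the FileSearch source code: the class name ContentMatcher and its methods are hypothetical, and it assumes Java 7 or later for java.nio.file (later than the Java versions named in this report). A plain string is compiled with Pattern.LITERAL so that characters such as '.' are matched as written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the project's actual class.
class ContentMatcher {
    private final Pattern pattern;

    // regex == false: the search string is treated literally.
    // caseSensitive == false: letter case is ignored while matching.
    ContentMatcher(String search, boolean regex, boolean caseSensitive) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE;
        if (!regex) {
            flags |= Pattern.LITERAL;
        }
        this.pattern = Pattern.compile(search, flags);
    }

    // True if the pattern occurs anywhere in the given text.
    boolean matchesText(CharSequence text) {
        return pattern.matcher(text).find();
    }

    // Reads the whole file into memory (acceptable for a sketch, not for
    // very large files) and maps every byte to one character.
    boolean matchesFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return matchesText(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

With this sketch, new ContentMatcher("a.c", false, true) finds only the literal text "a.c", while new ContentMatcher("a.c", true, false) also finds "ABC", because '.' then stands for any character and case is ignored. Files that do not contain the string can be listed by negating the result of matchesFile.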

System Specifications

Hardware Specification:

The speed of this program depends upon the speed of your

computer’s hardware. When searching for plain ASCII text or Unicode

characters from 0x20 to 0x7E, the “(raw data bytes)” encoding is

about 40% faster than the local system’s “(default encoding)”. Even

an old Intel Pentium 4 processor at 3.0 GHz should be able to scan

large files at 15 megabytes per second (MB/s) as raw data bytes with

the “case” option enabled.
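The difference between the two encodings can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the class RawBytesDemo and both method names are hypothetical, and the value 0xFF for BYTE_MASK is assumed (the report only names the constant). In the raw mode each byte is turned directly into one character, so no charset decoder is involved; in the default mode the platform charset decides how bytes map to characters.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; names and the mask value are assumptions.
class RawBytesDemo {
    static final int BYTE_MASK = 0xFF; // keeps the low 8 bits of a signed byte

    // "Raw data bytes": one byte becomes one char in the range 0x00 to 0xFF.
    static String decodeRaw(byte[] data, int length) {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = (char) (data[i] & BYTE_MASK);
        }
        return new String(chars);
    }

    // "Default encoding": the platform charset performs the decoding,
    // which may combine several bytes into one character.
    static String decodeDefault(byte[] data, int length) {
        return new String(data, 0, length, Charset.defaultCharset());
    }
}
```

For text limited to the characters 0x20 to 0x7E, both methods are expected to give the same string on ASCII-compatible platforms, which is why the cheaper raw mode is sufficient for such searches.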

PROCESSOR     Pentium Series
RAM           64 MB
KEYBOARD      104 Keys
FLOPPY DISK   1.44 MB
HARD DISK     6 GB
MOUSE         Serial Mouse

Software Specification:

FileSearch was developed with Java 1.4 and should run on later

versions. It may also run on earlier versions, but this has not been

tested. For Macintosh computers, the version of Java is determined by

your version of MacOS. For Windows, Linux, and Solaris, you can

download the JRE from Sun Microsystems:

Sun Java

JRE for end users: http://www.java.com/getjava/

SDK for programmers: http://developers.sun.com/downloads/

IDE for programmers: http://www.netbeans.org/

Because the application is developed using Java technology, compiling the
project requires the Java SDK to be installed on the system, but running it
requires only a JVM. Nowadays many operating systems come with a JVM
preinstalled. If no JVM is found on the system, one can be downloaded free
of charge from the Sun Microsystems web site. Once the JVM is installed, the
application can be run. Developing and executing this project requires the
following software:

Operating Systems      Windows XP, 2000
Technologies           Java 5 or 6, JVM
Run time environment   Java Virtual Machine
IDE                    NetBeans IDE

System Analysis

Requirement Analysis

Digital data volume has been increasing at a phenomenal rate during

the past decade. The “Moore's law curve” (doubling every 18 months)

no longer refers only to the exponential improvement rate of

processor performance, storage density and network bandwidth, but

also to the data growth rates of many disciplines. The dominating data

types are feature-rich data such as audio, digital photos, videos, and

scientific sensor data. As we are moving into a digital society where all

information is digitized and where the world is interconnected by

digital means, it is highly desirable for next-generation systems to

provide users with abilities to access, search, explore and manage

feature-rich data.

Although several new operating systems attempt to provide users with

content-based search capabilities, they are limited to text documents.

A key challenge in implementing a content-based similarity search

system for feature-rich data is that such data is noisy and complex.

For example, consider two different photographs of an identical scene,

or two separate recordings of a person speaking the same sentence.

Despite the high degree of similarity between the two images or

between the audio recordings, the digital representations are different

at the bit level. Comparing noisy, feature-rich data requires matching

based on similarity instead of exact match, and thus searching for

noisy data requires similarity search instead of exact search. However,

similarity search in high-dimensional spaces is notoriously difficult (the

so-called curse of dimensionality). Hence, practical advanced search

solutions, such as database tools and search engines (e.g. Google),

have been limited to searching for exact matches and tend to work

only for text documents and text annotations. To date, there is no

practical content-based search engine for massive amounts of

inherently noisy, feature-rich data.

A key component in our research is a general-purpose similarity

search engine. To deliver high-quality similarity search results with

minimal CPU cycles and memory resources, we have developed novel

techniques based on dimension-reduction ideas recently developed in

the theory community. We use these to construct sketches -- tiny data

structures that can be used to estimate properties of the original data

-- from feature vectors as highly compact metadata for the similarity

search engine. This approach allows us to attack the “curse of

dimensionality” problem in the design of the similarity search engine

for feature-rich data.
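The idea of a sketch can be illustrated with one well-known construction, the sign of random projections. The text above does not say which sketching technique is used, so this is a substitute chosen only for illustration; the class SignSketch and its parameters are hypothetical. Each sketch bit records on which side of a random hyperplane the feature vector lies, and similar vectors tend to agree on most bits.

```java
import java.util.Random;

// Illustrative only: sign-of-random-projection sketches, not the engine
// described in the text.
class SignSketch {
    private final double[][] planes; // one random hyperplane per sketch bit

    SignSketch(int dimensions, int bits, long seed) {
        Random rng = new Random(seed);
        planes = new double[bits][dimensions];
        for (double[] plane : planes) {
            for (int d = 0; d < dimensions; d++) {
                plane[d] = rng.nextGaussian();
            }
        }
    }

    // Bit i records on which side of hyperplane i the feature vector lies.
    boolean[] sketch(double[] vector) {
        boolean[] bits = new boolean[planes.length];
        for (int i = 0; i < planes.length; i++) {
            double dot = 0.0;
            for (int d = 0; d < vector.length; d++) {
                dot += planes[i][d] * vector[d];
            }
            bits[i] = dot >= 0.0;
        }
        return bits;
    }

    // Number of positions in which two sketches of equal length differ.
    static int hamming(boolean[] a, boolean[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                distance++;
            }
        }
        return distance;
    }
}
```

For this construction, the fraction of differing bits estimates the angle between the two original vectors divided by pi, so a few dozen bits can stand in for a long feature vector when comparing items.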

2. SYSTEM ANALYSIS:

System analysis is the first stage of the System

Development Life Cycle model. It is a process that

starts with the analyst.

Analysis is a detailed study of the various operations performed

by a system and their relationships within and outside the system.

One aspect of analysis is defining the boundaries of the system and

determining whether or not a candidate should consider other related

systems. During analysis, data is collected from the available files,

decision points, and transactions handled by the present system.

Logical system models and tools are used in analysis. Training,

experience, and common sense are required for collection of the

information needed to do the analysis.

2.1 SYSTEM OBJECTIVES:

1. To automate selection process

2. To provide a rich graphical user interface to the user

3. To provide better functioning and accurate information in

time

4. To provide data maintenance features.

5. To improve the efficiency and to reduce the overload of

work

6. To generate appropriate and concerned information to the

user using dynamic queries

7. To generate appropriate reports

8. To provide security.

2.2 FEASIBILITY STUDY:

All projects are feasible, given unlimited resources and infinite time.

But the development of software is plagued by the scarcity of

resources and difficult delivery dates. It is both necessary and prudent

to evaluate the feasibility of a project at the earliest possible time.

Three key considerations are involved in the feasibility analysis.

2.2.1 Economic Feasibility:

This procedure is to determine the benefits and savings that are

expected from a candidate system and compare them with costs. If

benefits outweigh costs, then the decision is made to design and

implement the system. Otherwise, further justification or alterations in

proposed system will have to be made if it is to have a chance of being

approved. This is an ongoing effort that improves in accuracy at each

phase of the system life cycle.

2.2.2 Technical Feasibility:

Technical feasibility centers on the existing computer system

(hardware, software, etc.) and to what extent it can support the

proposed addition. If the budget is a serious constraint, then the

project is judged not feasible.

2.2.3 Operational Feasibility:

People are inherently resistant to change, and computers have been

known to facilitate change. It is understandable that the introduction

of a candidate system requires special effort to educate, sell, and train

the staff on new ways of conducting business.

2.3 FEASIBILITY STUDY IN THIS PROJECT:

In this test, the operational scope of the system is checked. The

system under consideration should have enough operational reach. It

is observed that the proposed system is very user friendly and since

the system is built with enough help, even persons with little

knowledge of Windows can find the system very easy to use.

2.3.1 Technical Feasibility:

This test includes a study of function, performance, and
constraints that may affect the ability to achieve an acceptable
system. This test begins with an assessment of the technical viability
of the proposed system. One of the main factors to be assessed is the
need for various kinds of resources for the successful implementation
of the proposed system.

2.3.2 Economical Feasibility:

An evaluation of development cost weighed against the ultimate

income or benefit derived from the development of the proposed

system is made. Care must be taken that the costs incurred in developing
the proposed system do not exceed the income derived from it. The
income can be in terms of money or goodwill; since the software
brings in both, the system is highly viable.

SYSTEM DESIGN

3.1 SYSTEM DESIGN:

Once software requirements have been analyzed and
specified, software design is the first of three technical activities (design,
code generation, and test) that are required to build and verify software.

Each of the elements of the analysis model provides

necessary information for the specification of the designs.

Systems design goes through two phases of development:

3.1.1 Logical Design:

The data flow diagram (DFD) shows the logical flow of the system and defines
the boundaries of the system. For a candidate system, it describes the
inputs (sources), outputs (destinations), databases (data stores), and
procedures (data flows), all in a format that meets the user
requirements. The DFDs are explained in the previous section.

3.1.2 Physical Design:

This produces the working system by defining the design
specifications that tell programmers exactly what the candidate
system must do. In turn, the programmer writes the necessary programs
or modifies the software package that accepts input from the user,
performs the necessary calculations through the existing file or database,
produces the report on hard copy or displays it on a screen, and
maintains an updated database at all times.

3.1.3 Design Principles:

Software design is both a process and a model. Basic design

principles enable the analyst to navigate the design process.

The design process should not suffer from “tunnel vision”.

The design should be traceable to analysis model.

The design should “minimize the intellectual distance” between

software and problem that exists in the real world.

The design should exhibit uniformity and integration.

The design should be structured to accommodate change.

The design should be structured to degrade gently, even when

aberrant data, events or operating conditions are encountered.

The design should be reviewed to minimize conceptual (semantic)

errors.

3.1.4 Input Design:

Inaccurate input data are the most common cause of

errors in data processing. Errors entered by data entry operator can

be controlled by input design. Input design is the process of converting

user-originated inputs to a computer-based format.

Once defined, appropriate input media are selected for

processing. The goal of the designing input data is to make data entry

as easy, logical and free from errors as possible.

3.1.5 Output Design:

Computer output is the most important and direct source
of information to the user. Efficient, intelligible output design should
improve the system's relationship with the user and help in decision
making.

3.3 SYSTEM PLANNING:

Planning information systems in business has become

increasingly important during the past decade. First, information is
recognized as a vital resource and must be managed. Second, more
and more financial resources are committed to information systems.
Third, there is a growing need for long-range planning, whether to use a
common database or to gain a greater competitive edge.

3.3.1 Initial Investigation:

The user request identifies the need for change and authorizes

the initial investigation. It undergoes several modifications before it

becomes a written commitment. In this system the following are done:

Background investigation, fact finding and analysis.

3.3.2 Needs Identification:

User needs identification and analysis is concerned with what

the user needs rather than what they want. Often problems come into

focus after a joint meeting between the user and the analyst.

INTRODUCTION TO JAVA

JAVA

Its creators have called Java “programming for the

Internet”. What makes Java a good language for networking are the

classes defined in the java.net package. These networking classes

encapsulate the “socket” paradigm pioneered in the Berkeley software

distribution from the University of California at Berkeley.

The Java Programming language is a high-level language that can be

characterized by all the following buzz words:

Simple            Architecture neutral
Object oriented   Portable
Distributed       High performance
Interpreted       Multithreaded
Robust            Dynamic
Secure

With most programming languages, you either compile

or interpret a program so that you can run it on your computer. The

Java programming language is unusual in that a program is both

compiled and interpreted. With the compiler, first you translate a

program into an intermediate language called Java byte codes —the

platform-independent codes interpreted by the interpreter on the Java

platform. The interpreter parses and runs each Java byte code

instruction on the Computer. Compilation happens just once;

interpretation occurs each time the program is executed. The following

figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for

the Java Virtual Machine (Java VM). Every Java interpreter, whether it's

a development tool or a Web browser that can run applets, is an

implementation of the Java VM.

Java byte codes help make "write once, run anywhere" possible. You

can compile your program into byte codes on any platform that has a

Java compiler. The byte codes can then be run on any implementation

of the Java VM. That means that as long as a computer has a Java VM,

the same program written in the Java programming language can run

on Windows 2000, a Solaris workstation, or on an iMac.

4.1 The Java Platform

A platform is the hardware or software environment in

which a program runs. We've already mentioned some of the most

popular platforms like Windows 2000, Linux, Solaris, and MacOS. Most

platforms can be described as a combination of the operating system

and hardware. The Java platform differs from most other platforms in

that it's a software-only platform that runs on top of other hardware-

based platforms.

The Java platform has two components:

The Java Virtual Machine (Java VM)

The Java Application Programming Interface (Java API)

JVM is the base for the Java platform and is ported onto various

hardware-based platforms.

The Java API is a large collection of ready-made software components

that provide many useful capabilities, such as graphical user interface

(GUI) widgets. The Java API is grouped into libraries of related classes

and interfaces; these libraries are known as packages. The next

section, What Can Java Technology Do?, highlights what functionality

some of the packages in the Java API provide.

The following figure depicts a program that's running on the Java

platform. As the figure shows, the Java API and the virtual machine

insulate the program from the hardware.

Native code is code that after you compile it, the compiled code runs

on a specific hardware platform. As a platform-independent

environment, the Java platform can be a bit slower than native code.

However, smart compilers, well-tuned interpreters, and just-in-time

bytecode compilers can bring performance close to that of native code

without threatening portability.

What Can Java Technology Do?

The most common types of programs written in the Java

programming language are applets and applications. If you've surfed

the Web, you're probably already familiar with applets. An applet is a

program that adheres to certain conventions that allow it to run within

a Java-enabled browser.

However, the Java programming language is not just for writing cute,

entertaining applets for the Web. The general-purpose, high-level Java

programming language is also a powerful software platform. Using the

generous API, you can write many types of programs.

An application is a standalone program that runs directly on the Java

platform. A special kind of application known as a server serves and

supports clients on a network. Examples of servers are Web servers,

proxy servers, mail servers, and print servers. Another specialized

program is a servlet. A servlet can almost be thought of as an applet

that runs on the server side. Java Servlets are a popular choice for

building interactive web applications, replacing the use of CGI scripts.

Servlets are similar to applets in that they are runtime extensions of

applications. Instead of working in browsers, though, servlets run

within Java Web servers, configuring or tailoring the server.

How does the API support all these kinds of programs? It does so with

packages of software components that provide a wide range of

functionality. Every full implementation of the Java platform gives you

the following features:

The essentials: Objects, strings, threads, numbers,

input and output, data structures, system properties,

date and time, and so on.

Applets: The set of conventions used by applets.

Networking: URLs, TCP (Transmission Control

Protocol), UDP (User Datagram Protocol) sockets, and

IP (Internet Protocol) addresses.

Internationalization: Help for writing programs that

can be localized for users worldwide. Programs can

automatically adapt to specific locales and be displayed

in the appropriate language.

Security: Both low level and high level, including

electronic signatures, public and private key

management, access control, and certificates.

Software components: Known as JavaBeans, can plug

into existing component architectures.

Object serialization: Allows lightweight persistence

and communication via Remote Method Invocation

(RMI).

Java Database Connectivity (JDBC): Provides uniform

access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility,

servers, collaboration, telephony, speech, animation, and more. The

following figure depicts what is included in the Java 2 SDK.

Java Patterns:

Java programs make use of several design patterns, the Singleton pattern being
among the most commonly used. The Singleton pattern belongs to the family of
creational design patterns, which govern the instantiation process. This design
pattern requires that at any time there can be only one instance of a
singleton (object) created by the JVM.

The class’s default constructor is made private, which prevents the

direct instantiation of the object by others (Other Classes). A static

modifier is applied to the instance method that returns the object as it

then makes this method a class level method that can be accessed

without creating an object.

We write a public static getter or access method to get the instance of
the Singleton object at runtime. The first time this method is called, the
instance is null, so the object is created inside the method. Subsequent calls
return the same object, because the reference is held in a private static
field of the class.

public static synchronized SingletonObjectDemo getSingletonObject()

The access method may be called from two
different threads at the same time, so that more than one object
is created. This would violate the design pattern's principle. To
prevent simultaneous invocation of the getter method by two
threads, we add the synchronized keyword to
the method declaration.

It is still possible to create a copy of the object by cloning it using

the Object’s clone method. This can be done as shown below

SingletonObjectDemo clonedObject = (SingletonObjectDemo) obj.clone();

This again violates the Singleton design pattern's objective. To deal

with this, we override the Object class's clone method so that it throws

a CloneNotSupportedException.
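The points made in this section (private constructor, synchronized static access method, lazy creation, and a clone method that refuses to copy) can be put together in one minimal sketch. The class and method names follow the fragments quoted above; the private field name instance is an assumption, and the @Override annotation assumes Java 5 or later.

```java
// Minimal sketch of the Singleton pattern as described in this section.
class SingletonObjectDemo {
    // The single shared instance; created lazily on first request.
    private static SingletonObjectDemo instance;

    // Private constructor prevents direct instantiation by other classes.
    private SingletonObjectDemo() {
    }

    // synchronized ensures that two threads cannot both see a null
    // instance and create two objects.
    public static synchronized SingletonObjectDemo getSingletonObject() {
        if (instance == null) {
            instance = new SingletonObjectDemo();
        }
        return instance;
    }

    // Cloning would produce a second object, so it is always refused.
    @Override
    public Object clone() throws CloneNotSupportedException {
        throw new CloneNotSupportedException("A singleton cannot be cloned");
    }
}
```

Every call to SingletonObjectDemo.getSingletonObject() then returns the same reference, and any attempt to call clone() on it ends in a CloneNotSupportedException.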

Unified Modeling Language Diagrams

The Unified Modeling Language (UML) allows the software engineer to
express an analysis model using a modeling notation that is
governed by a set of syntactic, semantic, and pragmatic rules.
A UML system is represented using five different views that
describe the system from distinctly different perspectives. Each
view is defined by a set of diagrams, as follows.

User Model View

i. This view represents the system from the user's
perspective.

ii. The analysis representation describes a usage scenario
from the end user's perspective.

Structural model view

i. In this model the data and functionality are viewed from
inside the system.

ii. This model view models the static structures.

Behavioral model view

i. It represents the dynamic or behavioral aspects of the
system, depicting the interactions or collaborations between
the various structural elements described in the user model
and structural model views.

Implementation model view

In this view the structural and behavioral aspects of the system
are represented as they are to be built.

Environmental model view

In this view the structural and behavioral aspects of the
environment in which the system is to be implemented are represented.

UML modeling is carried out in two different domains:

UML analysis modeling, which focuses on the user model and
structural model views of the system.

UML design modeling, which focuses on the behavioral model,
implementation model, and environmental model views.

Use Case Diagrams

Actor: User

Use cases:
Set the parameters for the searching operation
Enter the search string (“Search String”)
Select the directory in which to search
Set case, subfolders, and hidden options
Set regular text
Save the result output

Use case diagram

Class FileSearch1

static final long BIG_FILE_SIZE ;

static final int BUFFER_SIZE ;

static final int BYTE_MASK ;

static final String LOCAL_ENCODING;

static final String[] FONT_SIZES;

static final String[] REPORT_CHOICES;

static final int TIMER_DELAY;

ActionListener action;

FontName;

FontSize: new Font(arial, 20);

String textsearch = null;

JButton open, close, save, cancel;

JTextField textString(10);

JComboBox fontSize, fontName, regularText;

JCheckBox nullchk, caseChk;

Class FileSearch1User

actionPerformed(ActionEvent ae);

public FileSearch1User();

public void actionPerformed(ActionEvent event);

FileSearch1.userButton(event);

public void run();

doOpenButton();

doCancelButton();

doSaveButton();

doOpenRunner();

formatMatchWindow();

makeRegularPlain(String text);

prettyPlural();

processFileOrFolder();

processUnknownFile(File givenFile);

putError(String text);

putOutput(String text);

setStatusMessage(String text);

static void showHelp();

static File[] sortFileList(File[] input);
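The listing above suggests that the helper class receives button events and thread starts and passes them on to static operations of the main class. The following sketch illustrates that delegation pattern under stated assumptions: `MainSketch` and `UserSketch` are hypothetical stand-ins, the action command strings are invented for the example, and the handlers only record their names instead of doing real work.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the main class; only records which handler ran. */
final class MainSketch {
    static final List<String> log = new ArrayList<>();

    /** Central dispatcher: decides what to do from the event's action command. */
    static void userButton(ActionEvent event) {
        String command = event.getActionCommand();
        if ("open".equals(command)) {
            log.add("doOpenButton");
        } else if ("cancel".equals(command)) {
            log.add("doCancelButton");
        } else if ("save".equals(command)) {
            log.add("doSaveButton");
        } else {
            log.add("unknown:" + command);
        }
    }

    /** Work that would run on a separate thread so the GUI stays responsive. */
    static void doOpenRunner() {
        log.add("doOpenRunner");
    }
}

/** Thin helper object: forwards GUI events and thread starts to the main class. */
final class UserSketch implements ActionListener, Runnable {
    @Override
    public void actionPerformed(ActionEvent event) {
        MainSketch.userButton(event);
    }

    @Override
    public void run() {
        MainSketch.doOpenRunner();
    }
}
```

In a Swing program an instance of such a helper would typically be registered with `addActionListener` on each button and handed to `new Thread(...)` when a long search is started, so the main class does not itself need to implement the listener interfaces.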

[Diagram labels, in order of appearance: Run the Program; Enter the String; Settings; Open the Folder and select the folder; Cancel; Write the search file to the output text; Search process Starts; Stop Search Process; Write the search file to the output text]
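The steps named in the diagram (select a folder, start the search, allow it to be stopped, write the result to an output text file) can be sketched as follows. This is an illustration under simplifying assumptions, not the project's implementation: the class `SearchFlowSketch` and its members are hypothetical, each file is read into memory in one piece, and bytes are decoded as ISO-8859-1 so that files of unknown format can still be scanned.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the search flow; not the project's actual code. */
final class SearchFlowSketch {
    /** Set to true (for example by a Cancel button) to stop the search early. */
    final AtomicBoolean cancelled = new AtomicBoolean(false);

    /** Walks the folder and returns every regular file whose content matches. */
    List<Path> search(Path folder, Pattern pattern) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> matches = new ArrayList<>();
        for (Path file : files) {
            if (cancelled.get()) {
                break; // the user asked to stop the search process
            }
            // ISO-8859-1 maps every byte to one char, so any file can be scanned as text.
            String content = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
            if (pattern.matcher(content).find()) {
                matches.add(file);
            }
        }
        return matches;
    }

    /** Writes one matching file path per line to the output text file. */
    static void save(List<Path> matches, Path output) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path match : matches) {
            lines.add(match.toString());
        }
        Files.write(output, lines, StandardCharsets.UTF_8);
    }
}
```

Checking the cancel flag once per file keeps the loop simple; a version that reads large files through a fixed-size buffer would also check the flag between buffer reads.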

OUTPUT SCREENS

Fig 1: Running of project

Fig 2: Main Page

Fig 3: Entering of String to Search

Fig 4: Opening the Directory to Search

Fig 5: Select the Drive

Fig 6: Searching Process

Fig 7: Searching process where the Cancel button is enabled

Fig 8: End of File Search

Fig 9: Saving the output to Text file

Fig 10: File1.txt generated by the project and saved in the C drive

Fig 11: Content of saved file

TESTING

5.1 Testing and testing types

Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Testing is one part of a broader topic, often referred to as verification and validation, that covers all the activities that ensure that the software built is traceable to user requirements. System testing consists of the following steps:

1. Modular Testing

2. Integrated Testing

3. User Acceptance Testing

5.1.1 Modular Testing:

A module represents a logical element of a system. For a module to run satisfactorily, it must compile, process test data correctly, and tie in properly with other modules. Modular testing checks for two types of errors: syntax and logic. A syntax error is a program statement that violates one or more rules of the language in which it is written. A logic error, on the other hand, deals with incorrect data fields, out-of-range items, and invalid combinations.

5.1.2 Integrated Testing:

Individual modules are invariably related to one another and interact in a total system. Each portion of the system is tested against the entire module with both test and live data before the entire system is ready to be implemented.

When the individual modules were found to work satisfactorily, the system integration test was carried out. Data was collected in such a way that all program paths could be covered. Using this data a complete test was made, and all outputs were generated. Different users were allowed to work on the system to check its performance.

5.1.3 User Acceptance Testing:

An acceptance test has the objective of selling the user on the validity and reliability of the system. It verifies that the system's procedures operate to system specifications and that the integrity of vital data is maintained. Performance of an acceptance test is actually the user's show. User motivation and knowledge are critical for the successful performance of the system. A comprehensive test report is then prepared. The report indicates the system's tolerance, performance range, error rate and accuracy. In the development of the system both verification and validation were done. Testing was carried out in two phases.

Each module was thoroughly tested with various test cases. Testing methodologies such as input/output testing and validation testing were conducted at each step so as to get more efficient performance.

5.2 TESTING STRATEGIES:

A strategy for software testing integrates software test cases into a series of well-planned steps that result in the successful construction of software. Software testing is one element of a broader topic that is often referred to as verification and validation. Verification refers to the set of activities that ensure that the software correctly implements a specific function. Validation refers to the set of activities that ensure that the software that has been built is traceable to the customer's requirements.

5.2.1 Unit Testing:

Unit testing focuses verification effort on the smallest unit of software design, that is, the module. Using the procedural design description as a guide, important control paths are tested to uncover errors within the boundaries of the module. The unit test is normally white-box oriented, and the step can be conducted in parallel for multiple modules.
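As an illustration of testing each control path of a small unit, consider the sketch below. The helper `plural` is hypothetical: it is loosely inspired by the `prettyPlural` operation named in the class diagram, whose real signature and behavior are not given in this report. The test class performs one check per path through the method, which is the white-box idea described above.

```java
/** Hypothetical unit under test: formats a count with a singular or plural noun. */
final class PluralSketch {
    static String plural(long count, String noun) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative");
        }
        if (count == 1) {
            return "1 " + noun;          // singular path
        }
        return count + " " + noun + "s"; // plural path, also used for zero
    }
}

/** White-box unit test: one check per control path of plural(). */
final class PluralSketchTest {
    static void runAll() {
        check(PluralSketch.plural(1, "file").equals("1 file"), "singular path");
        check(PluralSketch.plural(0, "file").equals("0 files"), "zero uses plural path");
        check(PluralSketch.plural(2, "file").equals("2 files"), "plural path");
        boolean rejected = false;
        try {
            PluralSketch.plural(-1, "file");
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "error path");
    }

    private static void check(boolean condition, String label) {
        if (!condition) {
            throw new AssertionError("failed: " + label);
        }
    }
}
```

Calling `PluralSketchTest.runAll()` completes silently when every path behaves as expected and throws an `AssertionError` naming the failed path otherwise.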

5.2.2 Integration Testing:

Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with interfacing. The objective is to take unit-tested modules and build a program structure that has been dictated by design.

5.2.2.1 Top-down Integration:

Top-down integration is an incremental approach to the construction of the program structure. Modules are integrated by moving downward through the control hierarchy, beginning with the main control program. Modules subordinate to the main program are incorporated into the structure in either a breadth-first or a depth-first manner.
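In top-down integration the main control module is tested before its subordinate modules exist, so the missing modules are replaced by stubs. The sketch below shows the idea with hypothetical names (`FileLister`, `FileListerStub`, `SearchController`), none of which come from the project source: the controller is exercised against a stub that returns a canned list of files.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical interface of a subordinate module that lists files to search. */
interface FileLister {
    List<String> listFiles(String folder);
}

/** Stub: stands in for the real lower-level module that is not yet integrated. */
final class FileListerStub implements FileLister {
    @Override
    public List<String> listFiles(String folder) {
        return Arrays.asList(folder + "/a.txt", folder + "/b.txt"); // canned answer
    }
}

/** Main control module, integrated and tested first in the top-down approach. */
final class SearchController {
    private final FileLister lister;

    SearchController(FileLister lister) {
        this.lister = lister;
    }

    /** Returns a status line; only the control logic is exercised here. */
    String summarize(String folder) {
        int count = lister.listFiles(folder).size();
        return "Found " + count + " file(s) in " + folder;
    }
}
```

With the stub in place, `new SearchController(new FileListerStub()).summarize("docs")` returns `Found 2 file(s) in docs`. When the real module is ready it replaces the stub without any change to the controller.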

5.2.2.2 Bottom-up Integration:

This method, as the name suggests, begins construction and testing with atomic modules, i.e., modules at the lowest level. Because the modules are integrated from the bottom up, the processing required for the modules subordinate to a given level is always available and the need for stubs is eliminated.
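Bottom-up integration needs no stubs, but each low-level module needs a temporary driver that calls it with test cases until its real caller is integrated. The sketch below is an assumption-laden illustration: `ByteMatcher` is a hypothetical atomic module that performs a naive byte sequence search, and `ByteMatcherDriver` is the driver that feeds it cases and counts failures.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical atomic module: naive search for a byte sequence inside a buffer. */
final class ByteMatcher {
    static boolean contains(byte[] data, byte[] needle) {
        if (needle.length == 0) {
            return true; // an empty needle is found everywhere
        }
        for (int start = 0; start + needle.length <= data.length; start++) {
            int i = 0;
            while (i < needle.length && data[start + i] == needle[i]) {
                i++;
            }
            if (i == needle.length) {
                return true;
            }
        }
        return false;
    }
}

/** Driver: temporary caller that feeds test cases to the low-level module. */
final class ByteMatcherDriver {
    /** Runs all cases and returns the number of failures. */
    static int run() {
        int failures = 0;
        failures += expect("content based search", "based", true);
        failures += expect("content based search", "basic", false);
        failures += expect("abc", "abcd", false);
        failures += expect("aab", "ab", true);
        return failures;
    }

    private static int expect(String data, String needle, boolean expected) {
        boolean actual = ByteMatcher.contains(
                data.getBytes(StandardCharsets.US_ASCII),
                needle.getBytes(StandardCharsets.US_ASCII));
        return actual == expected ? 0 : 1;
    }
}
```

The driver is discarded once the module that really calls the matcher has been integrated and is itself driven from the next level up.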

5.2.3 Validation Testing:

At the end of integration testing the software is completely assembled as a package. Validation testing is the next stage; it can be defined as successful when the software functions in the manner reasonably expected by the customer. Reasonable expectations are those defined in the software requirements specification. The information contained in that specification forms the basis for the validation testing approach.

5.2.4 System Testing:

System testing is actually a series of different tests whose

primary purpose is to fully exercise the computer-based system.

Although each test has a different purpose, all work to verify that all

system elements have been properly integrated to perform allocated

functions.

5.2.5 Security Testing:

Security testing attempts to verify the protection mechanisms built into the system.

5.2.6 Performance Testing:

Performance testing is designed to test the run-time performance of software within the context of an integrated system.
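A very simple way to observe run-time performance is to time a task over several runs. The harness below is a hypothetical sketch (the class `TimingSketch` is not part of the project) that uses `System.nanoTime` and discards one warm-up run; it gives only a rough indication, since a single in-process measurement is affected by JIT compilation and system load.

```java
/** Hypothetical timing harness; any Runnable can be supplied as the workload. */
final class TimingSketch {
    /** Runs the task the given number of times and returns the average time in nanoseconds. */
    static long averageNanos(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        task.run(); // warm-up run, not measured
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / runs;
    }
}
```

For example, `TimingSketch.averageNanos(() -> text.contains("needle"), 100)` would report the average time of one in-memory search, where `text` is assumed to be a string loaded beforehand.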

These testing strategies were used in this project.

BIBLIOGRAPHY:

Java: The Complete Reference, Herbert Schildt

www.3gpp.com
