AN APPROACH TO CATEGORIZATION OF TEXT IN WEBSITES USING PARALLEL SEARCH BAKTAVATCHALAM.G (08MW03) MASTER OF ENGINEERING Branch: SOFTWARE ENGINEERING of Anna University May 2009 DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PSG COLLEGE OF TECHNOLOGY (Autonomous Institution) COIMBATORE – 641 004

DCA Mini Project Report

Nov 18, 2014

Page 1: DCA Mini Project Report

AN APPROACH TO CATEGORIZATION OF TEXT IN WEBSITES USING PARALLEL SEARCH

BAKTAVATCHALAM.G (08MW03)

MASTER OF ENGINEERING

Branch: SOFTWARE ENGINEERING

of Anna University

May 2009

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING PSG COLLEGE OF TECHNOLOGY

(Autonomous Institution)

COIMBATORE – 641 004


PSG COLLEGE OF TECHNOLOGY (Autonomous Institution)

COIMBATORE – 641 004

AN APPROACH TO CATEGORIZATION OF TEXT IN WEBSITES USING

PARALLEL SEARCH

Bona fide record of work done by

BAKTAVATCHALAM.G (08MW03)

MASTER OF ENGINEERING

Branch: COMPUTER SCIENCE AND ENGINEERING

of Anna University, Coimbatore.

May 2009


ACKNOWLEDGEMENT

We wish to express our sincere gratitude to our respected Principal, Dr. R. Rudramoorthy, for giving us the opportunity to undertake this project.

We also wish to express our sincere thanks to Dr. S. N. Sivanandam, Professor and Head of the Department of Computer Science and Engineering, for the encouragement and support he extended towards our project work.

We extend our sincere thanks to our internal guide, Mrs. D. Indumathi, Asst. Professor, Department of Computer Science and Engineering, for her guidance and help towards the successful completion of our project.


CONTENTS

CHAPTER

Synopsis
List of Figures
List of Tables

1. INTRODUCTION
    1.1 Problem Definition
    1.2 Objective of the Project
    1.3 Significance of the Project
    1.4 Outline of the Project

2. SYSTEM STUDY
    2.1 Proposed System

3. SYSTEM ANALYSIS
    3.1 Requirement Analysis
    3.2 Feasibility Analysis

4. SYSTEM DESIGN
    4.1 Use Case Diagram
    4.2 Class Diagram
    4.3 Sequence Diagram
    4.4 Collaboration Diagram
    4.5 State Chart / Activity Diagram
    4.6 Deployment Diagram

5. IMPLEMENTATION
    5.1 Server Module
    5.2 Parser Module

6. TESTING
    6.1 Unit Testing
    6.2 Integration Testing
    6.3 Sample Test Cases

7. SNAPSHOT
    7.1 Finding Document Category
    7.2 Finding Keyword Document

CONCLUSION
FUTURE ENHANCEMENTS
BIBLIOGRAPHY


SYNOPSIS

This project searches for a given set of keywords in categorized documents. Searching is done only after categorization is complete and the categories of the given documents are available.

The work is done in two separate operations. First, we generate the categories and their related categories. We then supply the links of the required websites in order to find their categories. The contents of each website are parsed into a list of keywords, and the corresponding category is determined from those keys. The documents and their categories are then available for keyword search.

Second, we give keywords to the search engine to find the matching documents and their categories. If the keyword is a composite of multiple keywords, every key is searched, and the corresponding documents and their categories are retrieved. Each category consists of a name, a set of keys, and a weight for each key. Categories are sorted using those weights and the key occurrences.


LIST OF FIGURES

FIGURE NO.    TITLE

Fig. 2.1      System Architecture


LIST OF TABLES

TABLE NO.     NAME

Table 6.1     Sample Test Cases


CHAPTER 1

INTRODUCTION

This chapter provides a brief overview of the problem definition, objectives and

significance of the project and an outline of the report.

1.1 PROBLEM DEFINITION

The system searches for a given set of keywords in a given set of websites and categorizes those websites. Given a keyword set, it determines the documents most relevant to that keyword set and the category to which they belong.

1.2 OBJECTIVE OF THE PROJECT

Most users are interested in the parts of a website that contain the information they want, and they also want to know where that information is located. This project lets a user find where a particular piece of text occurs in a given set of websites, together with the corresponding category.

1.3 SIGNIFICANCE OF THE PROJECT

With the enormous growth of information on the Internet, there is a corresponding need for tools that enable fast and efficient searching, browsing and delivery of textual data. Executing the search concurrently across several clients reduces the time a search takes.

1.4 OUTLINE OF THE PROJECT

The rest of the report is structured as follows. Chapter 2 presents a study of the system and the basic ideas of the proposed system. Chapter 3 discusses the requirements for the development of the system and analyzes its feasibility. Chapter 4 presents the overall design of the system. Chapter 5 discusses the implementation details. Chapter 6 explains the various testing procedures conducted on the system. Chapter 7 contains snapshots of the various forms in the system. The last section summarizes the project.


CHAPTER 2

SYSTEM STUDY

This chapter gives a brief description of the proposed system.

2.1 PROPOSED SYSTEM

In our project, we search for a given set of keywords in categorized documents. Searching is done only after categorization is complete and the categories of the given documents are available. The work is done in two separate operations. First, we generate the categories and their related categories. We then supply the links of the required websites in order to find their categories. The contents of each website are parsed into a list of keywords, and the corresponding category is determined from those keys. The documents and their categories are then available for keyword search. Second, we give keywords to the search engine to find the matching documents and their categories. If the keyword is a composite of multiple keywords, every key is searched, and the corresponding documents and their categories are retrieved. Each category consists of a name, a set of keys, and a weight for each key. Categories are sorted using those weights and the key occurrences.

Figure 2.1 System Architecture (diagram not reproduced; its labels are: Keywords, Websites, Document Finder, Categorizer, Search Keyword, Documents + Categories)


CHAPTER 3

SYSTEM ANALYSIS

This chapter describes the hardware and software specifications for the development of the system and analyzes the feasibility of the system.

3.1 REQUIREMENT ANALYSIS

3.1.1 Software Requirements

After experimenting with various commercially available software and analyzing their pros and cons, the following were chosen.

• Operating System – Platform Independent

• Programming Language – Java 1.6+

• Front End – Java

3.1.2 Hardware Requirements

The hardware requirements of the proposed system are as follows:

• Pentium-III machine or above

• RAM – 256 MB

• Hard disk with a capacity of 10 GB

3.2 FEASIBILITY ANALYSIS

Feasibility analysis examines, step by step, whether the system can be built and operated. The analysis showed that this project is feasible in all respects. Three kinds of feasibility factors are considered:

• Economic Feasibility

• Technical Feasibility

• Operational Feasibility


3.2.1 Economic Feasibility

The system is developed using only software that is already widely used, so no new software needs to be installed. Hence, the cost incurred for this project is negligible.

3.2.2 Technical Feasibility

3.2.2.1 Searching

The main aim of the project is to search for a specific set of keywords in a specific set of websites only.

3.2.2.2 Categorizing

The next important task is to categorize the documents, so that they can be searched for a specific keyword set.

3.2.3 Operational Feasibility

The functions to be performed by the system are all valid and free of conflicts. All functions and constraints specified in the requirements are completely operational, and the stated requirements are realistically testable.

The requirements can be changed without large-scale effects on other system requirements, and the system is capable of accommodating future requirements if they arise.


CHAPTER 4

SYSTEM DESIGN

This chapter describes the functional decomposition of the system and illustrates the movement of data between the external entities, the processes and the data stores within the system, with the help of UML diagrams.

4.1 USE CASE DIAGRAM

Actors: User, Client, Server

Use cases: IP List, URL List, Keywords, Specification, Send Jobs, Process Jobs, Searching, Results

(Diagram not reproduced.)


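4.2 CLASS DIAGRAM

(Diagram not reproduced; the classes and members recoverable from it are listed below.)

• ClientRead: attribute S : Socket; operation dataFS()

• ClientWrite: attribute S : Socket; operation send()

• ServerRead: attribute S : Socket; operation dataFS()

• ServerWrite: attribute S : Socket; operation send()

• ServerGUI: operation main()

• ClientGUI: operation main()

• ServerManager: attributes S : Socket, kN : int, key[] : String, URL : String

• ClientManager: attributes S : Socket, kN : int, key[] : String, URL : String; operations search(), parseURL(), dataFS()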

4.3 SEQUENCE DIAGRAM

Participants: User, Server, Client(s)

1: IP List

2: Keywords

3: URL List

4: Init Process

5: Allocate Jobs

6: Distribute Jobs

7: Process Searching

8: Result

9: Combined Result
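Steps 5 and 6 (Allocate Jobs, Distribute Jobs) can be illustrated with a short Java sketch. It assumes jobs are handed to clients in round-robin order, which is how the `_split()` method in the appendix cycles through the client sockets. The class name `RoundRobinAllocator` and its method are hypothetical and are not part of the project source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assigns jobs (URLs) to clients in round-robin order. */
final class RoundRobinAllocator {

    /** Returns one job list per client; job i goes to client (i mod clients). */
    static List<List<String>> allocate(List<String> jobs, int clients) {
        if (clients <= 0) {
            throw new IllegalArgumentException("at least one client is required");
        }
        List<List<String>> perClient = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            perClient.add(new ArrayList<>());
        }
        for (int i = 0; i < jobs.size(); i++) {
            perClient.get(i % clients).add(jobs.get(i));
        }
        return perClient;
    }
}
```

With five URLs and two clients, the first client receives the first, third and fifth URL, and the second client receives the other two. Round-robin keeps the job counts balanced but ignores how long each URL takes to process.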


4.4 COLLABORATION DIAGRAM

Participants: User, Server, Client(s)

Messages, in numbered order: 1: IP List, 2: Keywords, 3: URL List, 4: Init Process, 5: Allocate Jobs, 6: Distribute Jobs, 7: Process Searching, 8: Result, 9: Combined Result

(Diagram not reproduced.)

4.5 STATE CHART / ACTIVITY DIAGRAM

Swimlanes: Client, Server

Labels recoverable from the diagram (in extracted order, diagram not reproduced):

• Read IP List, URL List and Keywords

• Send Keywords to all IPs

• Send URL List to all IPs

• Display Results

• Results Found? (Yes / No)

• Receive all Data

• Search each keyword count in each URL

• Compute all keyword counts from all URLs

• Results


4.6 DEPLOYMENT DIAGRAM

Nodes: Server, Client(s)

Inputs: IP List, URL List, Keywords

(Diagram not reproduced.)


CHAPTER 5

IMPLEMENTATION

This phase is divided into two parts: development and implementation. The individual system components are built during the development period, when programs are written and tried by users. During implementation, the components built during development are put into operational use.

In the development phase of our system, the following components were built:

• Server module

• Parser module

Both the Server and Parser modules are developed using Java.

5.1 Server Module

This module contains the following sub-modules:

• Load Details

• Categorizing

• Searching

5.1.1 Load Details

This sub-module loads the categories and their related categories, the documents and their categories, and the categories and their keys with weights.

5.1.2 Categorizing

This sub-module categorizes a given document using the key set parsed from that document and the weights of those keys in the available categories.
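A minimal sketch of how such weight-based categorization might look is given below. It assumes a flat scheme in which each category scores the sum of weight times occurrence count over its keys, and the document is assigned to the highest-scoring category. The class `CategoryScorer`, its methods and the scoring formula are assumptions for illustration (written for Java 8 or later); the report does not state the exact formula used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical flat classifier: score(category) = sum of weight * occurrences. */
final class CategoryScorer {

    // category name -> (lower-case key -> weight); filled in by the caller
    private final Map<String, Map<String, Double>> categories = new LinkedHashMap<>();

    void addCategory(String name, Map<String, Double> keyWeights) {
        categories.put(name, keyWeights);
    }

    /** Returns the best-scoring category, or null when no category key occurs. */
    String categorize(List<String> documentKeys) {
        // Count how often each key occurs in the parsed document.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : documentKeys) {
            counts.merge(key.toLowerCase(), 1, Integer::sum);
        }
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Map<String, Double>> category : categories.entrySet()) {
            double score = 0.0;
            for (Map.Entry<String, Double> kw : category.getValue().entrySet()) {
                score += kw.getValue() * counts.getOrDefault(kw.getKey(), 0);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category.getKey();
            }
        }
        return best;
    }
}
```

In this sketch, ties go to the category that was added first, and category keys are expected in lower case because document keys are lower-cased before counting.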

5.1.3 Searching

This sub-module searches for documents and their categories using a given key set.
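The search step can be sketched as follows, assuming an in-memory index that stores, for each document, its category and the occurrence count of each key. A composite keyword is split on whitespace, and every key is looked up. The class `KeywordSearch` and its `Hit` result type are hypothetical, and ranking by summed occurrences alone is a simplification of the weight-and-occurrence ordering described in the synopsis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory index; names are illustrative, not from the project source. */
final class KeywordSearch {

    /** One search result: a document, its category and the summed key occurrences. */
    static final class Hit {
        final String document;
        final String category;
        final int occurrences;

        Hit(String document, String category, int occurrences) {
            this.document = document;
            this.category = category;
            this.occurrences = occurrences;
        }
    }

    // document URL -> (lower-case key -> occurrence count)
    private final Map<String, Map<String, Integer>> countsByDocument = new LinkedHashMap<>();
    // document URL -> category assigned during categorization
    private final Map<String, String> categoryByDocument = new LinkedHashMap<>();

    void addDocument(String url, String category, Map<String, Integer> keyCounts) {
        countsByDocument.put(url, keyCounts);
        categoryByDocument.put(url, category);
    }

    /** Splits a composite keyword into keys and ranks documents by total occurrences. */
    List<Hit> search(String query) {
        String[] keys = query.trim().toLowerCase().split("\\s+");
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> doc : countsByDocument.entrySet()) {
            int total = 0;
            for (String key : keys) {
                total += doc.getValue().getOrDefault(key, 0);
            }
            if (total > 0) {
                hits.add(new Hit(doc.getKey(), categoryByDocument.get(doc.getKey()), total));
            }
        }
        // Highest total first; the sort is stable, so ties keep insertion order.
        hits.sort((a, b) -> Integer.compare(b.occurrences, a.occurrences));
        return hits;
    }
}
```

Documents in which none of the keys occur are left out of the result list.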


5.2 Parser Module

This module contains the following sub-modules:

• Load Module

• URL Content Grabber Module

5.2.1 Load Module

This sub-module loads the keywords from the server and then retrieves a URL to begin searching.

5.2.2 URL Content Grabber Module

Whenever a URL arrives from the server, the parser connects to that URL, retrieves its contents, and searches them to collect the key sets for that site.
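The grabbing and counting step could be sketched as below. The sketch assumes that the page is read as UTF-8 text, that the address includes a scheme such as `http://`, and that matching is case-sensitive and non-overlapping; none of these details are stated in the report. The class `KeywordCounter` and both method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical parser-side helper: fetch a page and count keyword occurrences. */
final class KeywordCounter {

    /** Downloads the content at the given absolute address (assumed UTF-8 text). */
    static String fetch(String address) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    /** Counts non-overlapping, case-sensitive occurrences of key in content. */
    static int countOccurrences(String content, String key) {
        if (key.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = content.indexOf(key, from)) != -1) {
            count++;
            from += key.length();
        }
        return count;
    }
}
```

A parser client would call `fetch` once per URL and then `countOccurrences` once per key, sending each (URL, key, count) triple back to the server.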


CHAPTER 6

TESTING

This chapter explains the various testing procedures conducted on the system.

Testing is the process of executing a program with the intent of finding errors. A successful test is one that uncovers an as-yet undiscovered error. Testing cannot show the absence of defects; it can only show that software errors are present. It also checks that defined inputs produce actual results that agree with the required results. A good testing methodology should include the following:

• Clearly define testing roles, responsibilities and procedures

• Establish consistent testing process

• Streamline testing requirements

• Overcome “requirements slow me down” mentality

• Common sense process approach

• Use some elements of existing Process

• Not an attempt to replace, rewrite or redefine Process

• To find defects early and to give good time to developers for bug fixes

• Independent perspective in testing

Some of the testing principles used in this project are:

• Unit Testing

• Integration Testing

6.1 UNIT TESTING

Unit testing is a strategy in which the individual components that make up the system are tested first, to ensure that the system works to the desired extent. It focuses the verification effort on the smallest unit of the software design, the module. The various modules of the system are tested to see whether they perform their intended functions. Using the procedural design description, important control paths are tested to uncover errors within the boundary of the module. For example, the functions that accept a connection are unit tested within their respective modules. The unit test is normally a white box test (a testing method in which the control structure of the procedural design is used to derive test cases).

6.1.1 Process Objectives

To test every unit of the software in isolation before integrating it with other units.

6.1.2 Definition of Unit

A unit is a module, as identified during the size estimation process, with a size estimate that does not exceed 1000 LOC.

For GUI applications, each screen is a unit.

If the size estimate for a unit exceeds 1000 LOC and it is not feasible to break it into smaller, logically independent units that can be tested in isolation, the project lead, in concurrence with the SQA, can decide to define it as a unit.

6.1.3 Entry Criteria

The entry criteria for this process are the following:

• Unit completed

• Unit peer reviewed

6.1.4 Exit Criteria

The exit criteria for this process are the following:

• Unit test cases executed

• Any defects identified during unit testing that are not fixed before the unit enters component testing are listed in the test report and verified

• 100% statement coverage

If a unit will be tested before its code review, this must be identified in the project plan. In such projects the developer will self-review (desk check) the code before unit testing.

Where exception handling for error conditions that are difficult to generate makes it impossible to achieve 100% statement coverage, the code should be formally reviewed against this additional criterion.


6.2 INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program structure while conducting tests to uncover errors associated with interfacing. The individual modules of the system are combined and tested to check whether they work properly as a whole. The objective is to take unit-tested modules and build the program structure dictated by the design. Integration testing can be either ‘incremental’ or ‘non-incremental’.

The objective of the integration testing is to help engineers plan and execute the

component and Integration testing for their respective projects.

Integration testing should include the following objectives:

• Performed by the product group/Dev test team after feature complete

• Determines that all product components on a list of specific platforms function

successfully together (The List specified in Master test plan)

• Performed in a basic product / platform environment (Basic environment

specified in Master test plan)

• Tests the product functionality against the specification

• Tests functionality of fake languages with sample single and double byte

languages

• Tests scaling to an acceptable minimum level as called out in the master test

plan

• Tests performance, reliability to an acceptable level as called out in the master

test plan

• Final integration tests done after all components are integrated, with the build in

production format

The tasks of the project have been integrated, and the functioning of the entire system was found to be satisfactory. The functionality of the entire system was subjected to a series of tests, and all the modules were found to interoperate properly.

Finally, integration testing was performed on the integrated system, which was found to work properly.


6.3 SAMPLE TEST CASES

Some of the sample test cases employed, along with their results, are described in the table below.

Table 6.1 Sample Test Cases

Test Description                                              Result

Is the server stable when running more than one key set?     OK

Does the parser return the results properly?                  OK

Is searching done correctly?                                  OK

Does the server use few resources?                            OK

Is the result obtained in a short time?                       OK


CHAPTER 7

SNAPSHOT

This chapter contains the snapshot of various forms in our system.

7.1 Finding Category of given document


7.2 Finding the Document & its Category using given keyword


CONCLUSION

The analysis, design and implementation of text categorization and searching have been completed successfully. The user can search for a set of keywords in a list of websites and view the count of each keyword for a particular website. This kind of search is useful for crawling websites with a particular view of specific content. Because the search runs concurrently, higher performance can be obtained.


FUTURE ENHANCEMENTS

Currently a flat classification scheme is used to find categories; in future it will be extended to a hierarchical, tree-structured classification to reduce time complexity and improve relevance. Currently the set of websites for classification is supplied by the user; in future, classification will be done by automatically parsing sites.


BIBLIOGRAPHY

• [Lorenz 1994] Lorenz, L. and Kidd, J. Object Oriented Software Metrics. Prentice Hall, 1994. ISBN 0-13-179292-X.

• Saturnino Luz. Implementing a Text Categorization System: a step-by-step tutorial.

• A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pages 41–48. AAAI Press, 1998.

• Y. Yang and J. O. Pedersen. A comparative study on feature selection in text categorization. In D. H. Fisher, editor, Proceedings of ICML-97, 14th International Conference on Machine Learning, pages 412–420, Nashville, 1997. Morgan Kaufmann Publishers.

• Java Network Programming, Second Edition. O'Reilly & Associates, Inc.

• Herbert Schildt and Patrick Naughton. Java 2: The Complete Reference, Fourth Edition. Tata McGraw-Hill Publishing Company Limited, 2001.

Websites

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/

http://paul.luminos.nl/documents/show_document.php?d=197


APPENDIX

SOURCE CODE LISTINGS

This chapter provides source code listings.

INPUT FILES

IP.TXT

2
127.0.0.1
127.0.0.1
127.0.0.1

JOBS.TXT

0
5
www.google.co.in
www.yahoo.com
www.chennaionline.com
www.psgtech.edu
www.psgtech.edu

KEY.TXT

4
page
href
www
Tamil

OUTPUT (In Server)

Sockets created

Keys distributed

Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---page---1

Is Found:true


Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---href---36

Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---www---18

Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.google.co.in---Tamil---1

Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---page---1

Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---href---48

Is Found:true

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---www---5

Is Found:false

Socket[addr=/127.0.0.1,port=4926,localport=5678]:www.yahoo.com---Tamil---0
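Each result line above has the shape `Socket[...]:<url>---<key>---<count>`. As a hedged illustration of how such a line could be split back into its fields, the sketch below strips the socket prefix at the first `]:` and splits the remainder on `---`. The class `ResultLine` is hypothetical and not part of the listings that follow; it assumes that neither the URL nor the key contains `---`.

```java
/** Hypothetical value class for one result line of the server output. */
final class ResultLine {
    final String url;
    final String key;
    final int count;

    ResultLine(String url, String key, int count) {
        this.url = url;
        this.key = key;
        this.count = count;
    }

    /** Parses "Socket[...]:url---key---count"; the socket prefix is optional. */
    static ResultLine parse(String line) {
        int prefixEnd = line.indexOf("]:");
        String payload = prefixEnd >= 0 ? line.substring(prefixEnd + 2) : line;
        String[] parts = payload.split("---");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected result line: " + line);
        }
        return new ResultLine(parts[0], parts[1], Integer.parseInt(parts[2].trim()));
    }
}
```

Parsed results could then be grouped by URL to build the per-website keyword counts mentioned in the conclusion.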

SERVER

/*
 * ServerGUI.java
 *
 * Created on November 2, 2008, 3:09 PM
 */
import java.io.*;
import java.util.*;
import javax.swing.*;

/**
 *
 * @author SuperStar
 */
interface ServerI {
    public void setErr(String err);
    public void setInfo(String info);
}

public class ServerGUI extends javax.swing.JFrame implements ServerI {
    String[] ip;
    int ipN=0,rN=0,jN=0,jT=0,kN=0;
    String[] jobs;
    String[] rank;
    String[] key;
    ServerManager SM;

    /** Creates new form ServerGUI */
    public ServerGUI() {
        initComponents();
        this.jTextArea2.setText("Err Stream:");
        this.jList1.removeAll();
        // this.jList2.removeAll();
        this.jList3.removeAll();
        (new MessageBox("welcome To SuperStar's Network!")).setVisible(true);
    }

    /** This method is called from within the constructor to
     * initialize the form.
     * WARNING: Do NOT modify this code. The content of this method is
     * always regenerated by the Form Editor.
     */
    // <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents
    private void initComponents() {
        jScrollPane1 = new javax.swing.JScrollPane();
        jList1 = new javax.swing.JList();
        jLabel1 = new javax.swing.JLabel();
        jButton1 = new javax.swing.JButton();
        jScrollPane3 = new javax.swing.JScrollPane();
        jList3 = new javax.swing.JList();
        jLabel2 = new javax.swing.JLabel();
        jScrollPane2 = new javax.swing.JScrollPane();
        jTextArea1 = new javax.swing.JTextArea();
        jButton3 = new javax.swing.JButton();
        jScrollPane4 = new javax.swing.JScrollPane();
        jTextArea2 = new javax.swing.JTextArea();
        jButton2 = new javax.swing.JButton();
        jScrollPane5 = new javax.swing.JScrollPane();
        jTextArea3 = new javax.swing.JTextArea();

        setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
        setTitle("Server");

        jList1.setModel(new javax.swing.AbstractListModel() {
            String[] strings = { "Item 1", "Item 2", "Item 3", "Item 4", "Item 5" };
            public int getSize() { return strings.length; }
            public Object getElementAt(int i) { return strings[i]; }
        });
        jScrollPane1.setViewportView(jList1);

        jLabel1.setText("Clients IP :");

        jButton1.setText("Load Details");
        jButton1.addActionListener(new java.awt.event.ActionListener() {
            public void actionPerformed(java.awt.event.ActionEvent evt) {
                jButton1ActionPerformed(evt);
            }
        });

        jList3.setModel(new javax.swing.AbstractListModel() {
            String[] strings = { "Item 1", "Item 2", "Item 3", "Item 4", "Item 5" };
            public int getSize() { return strings.length; }
            public Object getElementAt(int i) { return strings[i]; }
        });
        jScrollPane3.setViewportView(jList3);

        jLabel2.setText("Clients Rank :");

        jTextArea1.setColumns(20);
        jTextArea1.setEditable(false);
        jTextArea1.setLineWrap(true);
        jTextArea1.setRows(5);
        jTextArea1.setWrapStyleWord(true);
        jTextArea1.setOpaque(false);
        jScrollPane2.setViewportView(jTextArea1);

        jButton3.setText("Exit");
        jButton3.addActionListener(new java.awt.event.ActionListener() {
            public void actionPerformed(java.awt.event.ActionEvent evt) {
                jButton3ActionPerformed(evt);
            }
        });

        jTextArea2.setColumns(20);
        jTextArea2.setRows(5);
        jScrollPane4.setViewportView(jTextArea2);

        jButton2.setText("Process");
        jButton2.addActionListener(new java.awt.event.ActionListener() {
            public void actionPerformed(java.awt.event.ActionEvent evt) {
                jButton2ActionPerformed(evt);
            }
        });

        jTextArea3.setColumns(20);
        jTextArea3.setRows(5);
        jScrollPane5.setViewportView(jTextArea3);

        javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
        getContentPane().setLayout(layout);
        layout.setHorizontalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                    .addGroup(javax.swing.GroupLayout.Alignment.TRAILING, layout.createSequentialGroup()
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addComponent(jLabel1)
                            .addComponent(jScrollPane1, javax.swing.GroupLayout.DEFAULT_SIZE, 330, Short.MAX_VALUE))
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addComponent(jLabel2)
                            .addComponent(jScrollPane3, javax.swing.GroupLayout.PREFERRED_SIZE, 333, javax.swing.GroupLayout.PREFERRED_SIZE)))
                    .addGroup(layout.createSequentialGroup()
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING)
                            .addComponent(jScrollPane5, javax.swing.GroupLayout.PREFERRED_SIZE, 195, javax.swing.GroupLayout.PREFERRED_SIZE)
                            .addComponent(jScrollPane4, javax.swing.GroupLayout.PREFERRED_SIZE, 195, javax.swing.GroupLayout.PREFERRED_SIZE))
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addComponent(jScrollPane2, javax.swing.GroupLayout.DEFAULT_SIZE, 371, Short.MAX_VALUE)
                        .addGap(6, 6, 6)
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addComponent(jButton2, javax.swing.GroupLayout.DEFAULT_SIZE, 91, Short.MAX_VALUE)
                            .addGroup(layout.createSequentialGroup()
                                .addGap(10, 10, 10)
                                .addComponent(jButton3, javax.swing.GroupLayout.PREFERRED_SIZE, 60, javax.swing.GroupLayout.PREFERRED_SIZE))
                            .addComponent(jButton1, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))))
                .addContainerGap())
        );
        layout.setVerticalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addGap(11, 11, 11)
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING)
                    .addGroup(layout.createSequentialGroup()
                        .addComponent(jLabel2)
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addComponent(jScrollPane3, javax.swing.GroupLayout.PREFERRED_SIZE, 88, javax.swing.GroupLayout.PREFERRED_SIZE))
                    .addGroup(layout.createSequentialGroup()
                        .addComponent(jLabel1)
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addComponent(jScrollPane1, javax.swing.GroupLayout.PREFERRED_SIZE, 88, javax.swing.GroupLayout.PREFERRED_SIZE)))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                    .addComponent(jScrollPane2, javax.swing.GroupLayout.DEFAULT_SIZE, 104, Short.MAX_VALUE)
                    .addGroup(layout.createSequentialGroup()
                        .addComponent(jButton1)
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addComponent(jButton3)
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addComponent(jButton2))
                    .addGroup(layout.createSequentialGroup()
                        .addComponent(jScrollPane4, javax.swing.GroupLayout.PREFERRED_SIZE, 49, javax.swing.GroupLayout.PREFERRED_SIZE)
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addComponent(jScrollPane5, javax.swing.GroupLayout.PREFERRED_SIZE, 49, javax.swing.GroupLayout.PREFERRED_SIZE)))
                .addContainerGap())
        );
        pack();
    }// </editor-fold>//GEN-END:initComponents

    private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton1ActionPerformed
        // TODO add your handling code here:
        _getIPList();
        _getRankList();

        _getJobs();
        _getKeyList();
    }//GEN-LAST:event_jButton1ActionPerformed

    private void jButton3ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton3ActionPerformed
        // TODO add your handling code here:
        this.dispose();
        System.exit(0);
    }//GEN-LAST:event_jButton3ActionPerformed

    private void jButton2ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton2ActionPerformed
        // TODO add your handling code here:
        SM=new ServerManager(ipN,rN,jN,kN,ip,jobs,rank,key,this);
    }//GEN-LAST:event_jButton2ActionPerformed

    /**
     * @param args the command line arguments
     */
    public static void main(String args[]) {
        java.awt.EventQueue.invokeLater(new Runnable() {
            public void run() {
                new ServerGUI().setVisible(true);
            }
        });
    }

    // Variables declaration - do not modify//GEN-BEGIN:variables
    private javax.swing.JButton jButton1;
    private javax.swing.JButton jButton2;
    private javax.swing.JButton jButton3;
    private javax.swing.JLabel jLabel1;
    private javax.swing.JLabel jLabel2;
    private javax.swing.JList jList1;
    private javax.swing.JList jList3;
    private javax.swing.JScrollPane jScrollPane1;
    private javax.swing.JScrollPane jScrollPane2;
    private javax.swing.JScrollPane jScrollPane3;
    private javax.swing.JScrollPane jScrollPane4;
    private javax.swing.JScrollPane jScrollPane5;
    private javax.swing.JTextArea jTextArea1;
    private javax.swing.JTextArea jTextArea2;
    private javax.swing.JTextArea jTextArea3;
    // End of variables declaration//GEN-END:variables

    //
    // ip.txt: the first line holds the number of entries, followed by one client IP per line.
    public void _getIPList() {
        this.jList1.removeAll();
        try {
            BufferedReader in=new BufferedReader(new FileReader("ip.txt"));
            ipN=Integer.parseInt(in.readLine());
            ip=new String[ipN];
            for(int i=0;i<ipN;i++) {
                ip[i]=in.readLine();
            }
            in.close();
            this.jList1.setListData(ip);
            this.jButton1.setEnabled(false);
        } catch(Exception e) {
            setErr(e.getMessage());
        }
    }

    //
    // rank.txt: the first line holds the number of entries, followed by one rank per line.
    public void _getRankList() {
        this.jList3.removeAll();
        try {
            BufferedReader in=new BufferedReader(new FileReader("rank.txt"));
            rN=Integer.parseInt(in.readLine());
            rank=new String[rN];
            for(int i=0;i<rN;i++) {
                rank[i]=in.readLine();
            }
            in.close();
            this.jList3.setListData(rank);
        } catch(Exception e) {
            setErr(e.getMessage());
        }
    }

    //
    // key.txt: the first line holds the number of keys, followed by one key per line.
    public void _getKeyList() {
        this.jTextArea3.setText("");
        try {
            BufferedReader in=new BufferedReader(new FileReader("key.txt"));
            kN=Integer.parseInt(in.readLine());
            key=new String[kN];
            for(int i=0;i<kN;i++) {
                key[i]=in.readLine();
                this.jTextArea3.setText(this.jTextArea3.getText()+"\n"+key[i]);
            }
            in.close();
            //this.jList3.setListData(rank);
        } catch(Exception e) {
            setErr(e.getMessage());
        }
    }

    //
    // jobs.txt: the first line holds the job type; for type 0 the next line holds
    // the number of jobs, followed by one URL per line.
    public void _getJobs() {
        this.jTextArea2.setText("");
        try {
            BufferedReader in=new BufferedReader(new FileReader("jobs.txt"));
            jT=Integer.parseInt(in.readLine());
            this.jTextArea2.setText("Job Type:"+jT);
            switch(jT) {
                case 0:
                    jN=Integer.parseInt(in.readLine());
                    jobs=new String[jN];
                    for(int i=0;i<jN;i++) {
                        jobs[i]=in.readLine();
                        this.jTextArea2.setText(this.jTextArea2.getText()+"\n"+jobs[i]);
                    }
                    break;
            }
            in.close();
        } catch(Exception e) {
            setErr(e.getMessage());
        }
    }

    //
    public void setErr(String err) {
        this.jTextArea1.setText(this.jTextArea1.getText()+"\n"+err);
        System.out.println(err);
    }

    public void setInfo(String info) {
        setErr(info);
    }
}

/**
 *
 * @author SuperStar
 */
import java.net.*;
import java.io.*;

interface ServerIF {
    final int PORT=5678;
    public void dataFC(String data);
}

public class ServerManager extends Thread implements ServerIF {
    String IP[],R[],J[],K[];
    int rN,ipN,jN,kN;
    Socket[] sock;
    ServerWriteThread[] SWT;
    ServerReadThread[] SRT;
    ServerI SI=null;

    public ServerManager(int i,int r,int j,int k,String[] ip1,String[] j1,String[] r1,String[] k1,ServerI si) {
        rN=r;
        ipN=i;
        jN=j;
        kN=k;
        IP=ip1;
        J=j1;
        R=r1;
        K=k1;
        SI=si;
        start();
    }

    public void run() {
        try {
            sock=new Socket[ipN];
            SWT=new ServerWriteThread[ipN];
            SRT=new ServerReadThread[ipN];
            //SI.setInfo("ipn:"+ipN);
            for(int i=0;i<ipN;i++) {
                sock[i]=new Socket(IP[i],5678);
                //SI.setInfo("ip:"+IP[i]);
                SWT[i]=new ServerWriteThread(sock[i],SI,this);
                SRT[i]=new ServerReadThread(sock[i],SI,this);
                //SI.setInfo("soc:"+sock[i].toString());
            }
            SI.setInfo("Sockets created");
            _split();
        } catch(Exception e1) {
            SI.setErr("Sock Cre:"+e1.toString());
        }
    }

    public void _split() {
        //java.util.Arrays.sort(R);
        for(int i=0;i<ipN;i++) {
            SWT[i].send(""+kN);
            //SI.setInfo(""+kN);
        }

        for(int i=0;i<ipN;i++) {
            for(int j=0;j<kN;j++) {
                SWT[i].send(K[j]);
                //SI.setInfo(K[j]);
            }
        }
        //
        SI.setInfo("Keys distributed");
        for(int i=0,j=0;i<jN;i++) {
            SWT[j].send(J[i]);
            //SI.setInfo(J[i]);
            if(j<ipN-1)
                j++;
            else
                j=0;
        }
    }

    public void dataFC(String data) {
        SI.setInfo(data);
    }

    public void _quit() {
        //
    }
}

//////////////
class ServerWriteThread {
    Socket S;
    ServerI SI=null;
    ServerIF SIF;

    public ServerWriteThread(Socket s,ServerI si,ServerIF sif) {
        SIF=sif;
        SI=si;
        S=s;
        //SI.setInfo(s.toString());
    }

    public void send(String msg) {
        try {
            //SI.setInfo(msg);
            PrintWriter out=new PrintWriter(new BufferedWriter(new OutputStreamWriter(S.getOutputStream())),true);
            out.println(msg);
        } catch(Exception e3) {
            SI.setErr(e3.getMessage());
        }
    }

}

//////////////
class ServerReadThread extends Thread {
    Socket S;
    ServerI SI=null;
    ServerIF SIF;

    public ServerReadThread(Socket s,ServerI si,ServerIF sif) {
        S=s;
        SIF=sif;
        SI=si;
        //SI.setInfo(s.toString());
        start();
    }

    public void run() {
        try {
            BufferedReader in=new BufferedReader(new InputStreamReader(S.getInputStream()));
            String line;
            // readLine() returns null once the client closes the connection;
            // stop there instead of forwarding null results forever.
            while((line=in.readLine())!=null) {
                //PrintWriter out=new PrintWriter(new BufferedWriter(new OutputStreamWriter(os.getOutputStream())),true);
                SIF.dataFC(line);
            }
        } catch(Exception e2) {
            SI.setErr(e2.getMessage());
        }
    }
}

/*
 * MessageBox.java
 *
 * Created on November 2, 2008, 9:15 PM
 */

/**
 *
 * @author SuperStar
 */
public class MessageBox extends javax.swing.JFrame {
    String MSG="SuperStar";

    /** Creates new form MessageBox */
    public MessageBox(String msg) {
        MSG=msg;
        initComponents();
        this.jTextArea1.setText(MSG);
    }

    /** This method is called from within the constructor to
     * initialize the form.
     * WARNING: Do NOT modify this code. The content of this method is

     * always regenerated by the Form Editor.
     */
    // <editor-fold defaultstate="collapsed" desc="Generated Code">//GEN-BEGIN:initComponents
    private void initComponents() {

        jButton1 = new javax.swing.JButton();
        jScrollPane1 = new javax.swing.JScrollPane();
        jTextArea1 = new javax.swing.JTextArea();

        setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);
        setTitle("MessageBox");
        setAlwaysOnTop(true);
        setBackground(new java.awt.Color(183, 226, 252));
        setForeground(new java.awt.Color(0, 0, 0));

        jButton1.setText("OK");
        jButton1.addActionListener(new java.awt.event.ActionListener() {
            public void actionPerformed(java.awt.event.ActionEvent evt) {
                jButton1ActionPerformed(evt);
            }
        });

        jTextArea1.setColumns(20);
        jTextArea1.setRows(5);
        jTextArea1.setOpaque(false);
        jScrollPane1.setViewportView(jTextArea1);

        javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
        getContentPane().setLayout(layout);
        layout.setHorizontalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(javax.swing.GroupLayout.Alignment.TRAILING, layout.createSequentialGroup()
                .addComponent(jScrollPane1, javax.swing.GroupLayout.DEFAULT_SIZE, 315, Short.MAX_VALUE)
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addComponent(jButton1)
                .addContainerGap())
        );
        layout.setVerticalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addComponent(jScrollPane1, javax.swing.GroupLayout.PREFERRED_SIZE, 46, javax.swing.GroupLayout.PREFERRED_SIZE)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addComponent(jButton1))
        );

        pack();
    }// </editor-fold>//GEN-END:initComponents

    private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {//GEN-FIRST:event_jButton1ActionPerformed
        // TODO add your handling code here:
        this.dispose();
    }//GEN-LAST:event_jButton1ActionPerformed

    // Variables declaration - do not modify//GEN-BEGIN:variables
    private javax.swing.JButton jButton1;
    private javax.swing.JScrollPane jScrollPane1;
    private javax.swing.JTextArea jTextArea1;
    // End of variables declaration//GEN-END:variables
}

CLIENT

/**
 *
 * @author SuperStar
 */
import java.io.*;
import java.net.*;
import java.util.*;

public class ClientGUI {
    public static void main(String[] s) throws Exception {
        ServerSocket SS=new ServerSocket(5678);
        new ClientManager(SS.accept());
    }
}

/////////
interface ClientIF {
    final int PORT=5678;
    public void dataFS(String s);
    public void setErr(String err);
    public void setInfo(String info);
    public void setKLen(int kn);
    public void setKeys(String[] k);
}

////////
class ClientManager implements ClientIF {
    Socket S;
    ClientWriteThread CWT;
    ClientReadThread CRT;
    int kN=0;
    String[] key;
    String URL;

    public ClientManager(Socket s) {
        S=s;
        //setInfo(s.toString());
        CWT=new ClientWriteThread(S,this);
        CRT=new ClientReadThread(S,this);
    }

    //
    public void _search(String src,String key) {
        //
        //java.util.Scanner ss=new java.util.Scanner(src);
        //StringTokenizer ss=new StringTokenizer(src,key,true);
        // Count occurrences of key in src; i bounds the loop by the page length.
        int c=0,i=0,j=-1;
        while(i<src.length()) {
            if((j=src.indexOf(key,(j+1)))!=-1)
                ++c;
            else
                break;
            //ss.next(key);
            //System.out.println(c);
            ++i;
        }
        CWT.send("Is Found:"+src.contains(key));
        CWT.send(S.toString()+"\n:"+URL+"---"+key+"---"+c);
        //setInfo(URL+"---"+key+"---"+c);
    }

    //
    public String _parseURL(String u) {
        String r="";
        try {
            URL url=new URL("http",u,"/");
            URLConnection con=url.openConnection();
            con.connect();
            InputStream in=con.getInputStream();
            int ch=-1;
            while((ch=in.read())!=-1) {
                r+=((char)ch);
            }
            in.close();
            System.out.println(r);
        } catch(Exception e1) {
            setErr("URL Err:"+e1.toString());
        }
        //setInfo(r);
        return r;
    }

    //
    public void dataFS(String s) {
        URL=s;
        //setInfo(s);
        // Download the page once and search it for every keyword,
        // rather than fetching it again for each key.
        String page=_parseURL(s);
        for(int i=0;i<kN;i++)
            _search(page,key[i]);
    }

    public void setErr(String err)

    {
        System.out.println(err);
    }

    public void setInfo(String info) {
        setErr(info);
    }

    public void setKLen(int kn) {
        kN=kn;
    }

    public void setKeys(String[] k) {
        key=k;
    }
}

///////////
class ClientWriteThread {
    Socket S;
    ClientIF CIF;

    public ClientWriteThread(Socket s,ClientIF cif) {
        CIF=cif;
        S=s;
        //CIF.setInfo(S.toString());
    }

    public void send(String msg) {
        try {
            //CIF.setInfo(msg);
            PrintWriter out=new PrintWriter(new BufferedWriter(new OutputStreamWriter(S.getOutputStream())),true);
            out.println(msg);
        } catch(Exception e3) {
            CIF.setErr("Send:"+e3.getMessage());
        }
    }
}

//////////////
class ClientReadThread extends Thread {
    Socket S;
    //ServerI SI=null;
    ClientIF CIF;
    int kN=0;
    String[] key;

    public ClientReadThread(Socket s,ClientIF cif) {
        S=s;
        CIF=cif;
        //SI=si;
        //CIF.setInfo(s.toString());

        //CIF.setInfo(""+kN);
        start();
    }

    public void run() {
        try {
            BufferedReader in=new BufferedReader(new InputStreamReader(S.getInputStream()));
            // The server first sends the keyword count, then one keyword per line.
            kN=Integer.parseInt(in.readLine());
            key=new String[kN];
            //CIF.setInfo(""+kN);
            for(int i=0;i<kN;i++) {
                key[i]=in.readLine();
                //CIF.setInfo(key[i]);
            }
            CIF.setKLen(kN);
            //CIF.setInfo(""+kN);
            CIF.setKeys(key);
            //CIF.setInfo(key.toString());
            String line;
            // Every further line is a job (host name); readLine() returns null
            // once the server closes the connection, so stop there.
            while((line=in.readLine())!=null) {
                //PrintWriter out=new PrintWriter(new BufferedWriter(new OutputStreamWriter(os.getOutputStream())),true);
                CIF.dataFS(line);
            }
        } catch(Exception e2) {
            CIF.setErr("Read:"+e2.getMessage());
        }
    }
}
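
The two core steps of the listing above, round-robin job distribution in _split and keyword counting in _search, can be tried without sockets or a GUI. The following is a minimal sketch under stated assumptions, not the project's own code: the class KeywordCountSketch and its methods countOccurrences and assignRoundRobin are hypothetical names introduced only for illustration, and the host names in main are placeholders. Like _search, the counter advances one character after each hit, so overlapping matches are counted.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these names are hypothetical and do not appear in the project.
final class KeywordCountSketch {

    // Counts occurrences of key in src, overlaps included, in the manner of _search.
    static int countOccurrences(String src, String key) {
        if (key.isEmpty()) {
            return 0; // an empty key would otherwise match at every position
        }
        int count = 0;
        int from = 0;
        while ((from = src.indexOf(key, from)) != -1) {
            count++;
            from++; // resume one character later so overlapping matches are counted
        }
        return count;
    }

    // Deals jobs to nodes in turn (job i goes to node i % nodes), in the manner of _split.
    static List<List<String>> assignRoundRobin(String[] jobs, int nodes) {
        List<List<String>> perNode = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            perNode.add(new ArrayList<>());
        }
        for (int i = 0; i < jobs.length; i++) {
            perNode.get(i % nodes).add(jobs[i]);
        }
        return perNode;
    }

    public static void main(String[] args) {
        // Placeholder inputs chosen for illustration.
        System.out.println(countOccurrences("to be or not to be", "be")); // prints 2
        System.out.println(assignRoundRobin(new String[] {"a.example", "b.example", "c.example"}, 2));
    }
}
```

The empty-key guard is a design choice of this sketch: String.indexOf reports a match for an empty string at every index, which is why the original loop needs its separate length bound.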