
INTERNATIONAL JOURNAL OF TECHNOLOGY AND COMPUTING (IJTC)

ISSN-2455-099X,

Volume 3, Issue 1 January 2017

IJTC201701002 www. ijtc.org 3

Performance Evaluation of Apriori Algorithm on Reservation Policy

Jasleen Kaur¹, Rasbir Singh², Rupinder Kaur Gurm³

¹M.Tech Scholar, ²,³Asst. Professor

¹,²,³RIMT, Mandi Gobindgarh, Punjab

[email protected], [email protected],

[email protected]

Abstract—The advantages and ill effects of the reservation policy affect the Indian education system and employment sector. Because of the disturbance the reservation system has created in education and jobs, society is divided into two groups, those in favor of reservation and those against it, and this division aggravates the other causes associated with the negative effects of reservation. A survey was therefore conducted among teachers and students to identify the main causes associated with reservation and to record their points of view. A statistical t-test was run on the surveyed dataset to filter the high-impact factors of reservation from all the factors considered, and the results were interpreted graphically. Apriori association rule mining was then applied, using a data mining tool, to study the interdependence of the main causes behind the growing impact of reservation and to find possible ways of making the system unbiased. An improvement to the Apriori algorithm is proposed, and the proposed and existing techniques are compared graphically in terms of candidate generation, number of cycles performed and minimum support, with the aim of obtaining faster results by reducing repeated database scans.

Keywords- Apriori algorithm, Association Rules, Statistical test, reservation, data mining

I. INTRODUCTION

The Constitution specifically prohibits discrimination on the basis of caste, and 22.5% of seats in institutions of higher education and government employment are reserved for Scheduled Castes and Scheduled Tribes. The Mandal Commission recommended extending reservation to Other Backward Classes (OBCs), increasing the total share of seats subject to reservation from 22.5% to 49.5%. Many people think reservation is necessary for lower-caste people because those sections of society are economically much weaker.

Because of the disturbance the reservation system has created in education and the job sector, society is divided into two groups: people in favor of reservation and people against it.

Affirmative action has also been studied outside India. A study of Israeli selective universities [2] examines the effect of eligibility for affirmative action on admission and enrolment. A study of earning patterns [1] finds that changing the admission and financial aid rules at colleges affects future earnings.

The authors of [3] survey the literature on the impact of racial preferences in college admissions on both minority and majority students in US higher education.

Data mining is the process of extracting hidden patterns from large datasets or data warehouses. Its effectiveness depends on the data supplied, which ranges from small to large and from structured to unstructured datasets. Common data mining techniques include association rule mining, classification, clustering, and prediction.

II. APRIORI ALGORITHM

Association rule mining finds frequently occurring patterns of data items and the relations among them; the resulting association rules are characterised by two measures, support and confidence. Association rules are used in fields such as bioinformatics, web mining, and customer relationship management. The support of an item set P is the fraction of transactions T that contain every item of P. For example, if the item set {bread, butter} has 90% support, bread and butter are bought together in 90 of every 100 transactions. The confidence of a rule P => Q is the fraction of the transactions containing P that also contain Q. For example, a confidence of 100% for {bread, butter} => {milk} means that every person who buys bread and butter also buys milk. These two measures give four main types of rule: high support and high confidence, low support and high confidence, low support and low confidence, and high support and low confidence. Two algorithms for association rule mining are Apriori and Eclat.
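The two measures can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions: the transactions below are made up for illustration, and the function names `support` and `confidence` are hypothetical helpers, not part of any tool used in this paper.

```python
# Illustrative transactions; the items and counts are made up for this sketch.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"bread", "butter"}, transactions)
c = confidence({"bread", "butter"}, {"milk"}, transactions)
print(f"support({{bread, butter}}) = {s:.2f}")
print(f"confidence({{bread, butter}} => {{milk}}) = {c:.2f}")
```

In this toy data, bread and butter appear together in three of five transactions, and milk accompanies them in two of those three, so the rule has moderate support and moderate confidence.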

The Apriori algorithm finds the frequent item sets in a database by scanning the data multiple times. Strong association rules are then generated from these frequent item sets. R. Agrawal and colleagues introduced frequent item set generation algorithms in 1993 to speed up mining. The basic Apriori algorithm consists of two actions, join and prune.

(i) Join action: Let Lk be the set of frequent item sets containing k items. To find Lk, a candidate set Ck is first generated by joining Lk-1 with itself.

(ii) Prune action: Ck is a superset of Lk. By the Apriori property, every subset of a frequent item set must itself be frequent, so item sets in Ck whose support is below the threshold are removed; the item sets that remain form Lk.

The steps of the Apriori algorithm are:

Step 1: Set the user-defined minimum support and confidence.

Step 2: Construct the first candidate set C1, which contains every single item as a 1-item set. Prune it by removing item sets whose support is below the threshold. This yields the frequent 1-item sets (L1).


Step 3: Join L1 with itself to obtain C2, the candidate 2-item sets. Remove the infrequent item sets from C2 to obtain L2, the frequent 2-item sets.

Step 4: Repeat step 3 for increasing k until no more candidate sets are generated.
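The steps above can be sketched in Python as follows. This is an illustrative implementation of the generic level-wise Apriori procedure, not the code of the tool used later in this paper; the function name `apriori`, the demo transactions and the threshold of 0.5 are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(item set): support} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One database scan: count each candidate, keep those meeting the threshold.
        result = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                result[c] = sup
        return result

    # Step 2: C1 holds every single item; pruning it gives L1.
    c1 = {frozenset([item]) for t in transactions for item in t}
    current = frequent(c1)
    all_frequent = dict(current)
    k = 2
    # Steps 3 and 4: join, prune and scan until a level yields nothing.
    while current:
        prev = set(current)
        # Join: unions of two members of L(k-1) that contain exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: discard candidates with any infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = frequent(candidates)
        all_frequent.update(current)
        k += 1
    return all_frequent


# Made-up transactions for illustration only.
demo = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]
frequent_sets = apriori(demo, min_support=0.5)
for items, sup in sorted(frequent_sets.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(items), round(sup, 2))
```

Each pass of the `while` loop performs one scan of the data, which is why reducing the number of candidates and passes is the usual target of Apriori improvements.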

Apriori can be improved further by minimising the number of candidate item sets. In [4], the authors proposed the Hospital Exam Reservation System (HERS), which applies the Apriori algorithm to patient and clinical exam data and uses a multi-examination pattern-mining algorithm to generate rules that find the best schedule for patient satisfaction. In [5], the implementation of the Apriori algorithm in the WEKA tool is explained step by step, from pre-processing to the generation of association rules, using a new dataset created for that study and stored in ARFF files.

In [6], a voting database is studied with association rule mining algorithms to find the interests of voters among the given attributes. The algorithms identify the frequent items in the database, and a comparative study of FP-Growth and Apriori is carried out. The quality of the generated association rules, and how near the top of the ranking they appear, is discussed in [7], which considers seven association rule quality measures. In [8], important rules are generated to measure the correlation among various attributes, with the aim of improving students' academic performance, using WEKA and a real-time dataset available on the college premises. In [9], the authors investigate a novel web recommender system that combines usage data, content data, and structure data from a web site to generate user navigational models. The study in [10] describes enrolment prediction using support vector machines and rule-based predictive models.

In [11], Agrawal et al. (1993) introduced association rule mining, on which the Apriori algorithm is based, and applied their algorithm to sales data obtained from a large retailing company, demonstrating its effectiveness.

III. LITERATURE SURVEY

Bamrah [12] focused on reducing variability by observing student potential together with the reservation policy in order to predict each student's admission to a course. Linear regression was applied to find the relation, and hypothesis testing reported the difference between two samples in different branches.

Jangir [13] reports the connection between caste discrimination and the government plan under which eleven criteria are adopted to identify people in the OBC category. The reservation policy was first implemented to reduce mass poverty, but in today's scenario it is dividing society on the basis of the caste system.

Falguni [14] presents a model that combines the action of agents with data mining techniques. Data mining applied to the education system is known as educational data mining (EDM). Various visualization methods, based on K-means, are used to predict serious issues related to students' degrading performance and to find ways to improve it.

Dinesh Kumar [15] presents a new algorithm that uses a binary search tree to store global rules, obtained by consolidating the local rules generated at each site, which can then be used to predict students' admission to college.

Marianne Bertrand [16] examines an affirmative action program for “lower-caste” groups in engineering colleges in India, based on a survey in which a total of 721 households agreed to participate. The paper concluded that the reservation policy may benefit only those who are already economically better off within the lower-caste groups.

Ajay Kumar [17] compares two popular algorithms, Apriori and frequent pattern growth (FP-Growth), on the SPECT heart dataset available from the TunedIT Machine Learning Repository. Using WEKA, the authors found that Apriori performs better in terms of the frequent item sets generated and the number of cycles performed.

D. Bansal [18] applies Apriori to a real dataset on crimes against women to extract hidden information, such as which age group is responsible and where the real culprit is hiding. Apriori is compared with the Predictive Apriori algorithm and found to be better and faster.

M. Girotra [19] discusses the characteristics and shortcomings of algorithms for mining association rules, and provides a comparative study of different association rule mining techniques stating which algorithm is best suited to which case.

Haripriya [20] found interesting patterns in unstructured mixed data using association mining, with a method that automatically computes the number of clusters formed and a pairwise distance measure. Experiments were run on real mixed data from the UCI repository, and the proposed algorithm was reported to generate accurate results.

Kenneth Lai [21] describes two types of association rule algorithms and compares the performance of the MinHash algorithm against DLG in terms of various parameters. He concluded that the performance of an algorithm depends not only on execution speed, support or confidence but also on other factors such as memory usage. MinHash differed from DLG in that it used a confidence-then-support approach, used less memory and had a low support requirement.

IV. OBJECTIVES OF RESEARCH

To evaluate effectiveness of reservation policy on job and

admission.

To identify the problems faced by students of all

categories.

To suggest the strategies for the improvement of

reservation policy.

To implement Apriori algorithm for generating best


association rules on reservation policy.

To propose an improvement to the Apriori algorithm that reduces the number of iterations.

To compare the results of the proposed and existing techniques in terms of candidate generation, number of cycles performed and minimum support.

V. EXPERIMENTAL STUDY AND IMPLEMENTATION

The proposed work was implemented in two stages: statistical testing with the SPSS tool, followed by the application of data mining techniques to the filtered dataset, using the WEKA tool, for further qualitative analysis. The design of the proposed work is shown in Fig. 1.

A. Statistical Testing

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It can be used to determine whether two sets of data are significantly different from each other.

The independent samples t-test, shown in equation (1), is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. In this research, the t-test is used to filter out low-impact factors so that the study rests on high-impact factors; graphs are also generated for the same.

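$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}} \qquad (1) $$

Where, $\bar{x}_1$ = mean of sample 1, $\bar{x}_2$ = mean of sample 2

$N_1$ = number of entries in sample 1

$N_2$ = number of entries in sample 2

$S_1^2$ = variance of sample 1

$S_2^2$ = variance of sample 2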
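As a worked illustration, the t value can be computed directly from the quantities defined above. The sketch assumes that equation (1) is the usual two-sample form, the difference of the sample means divided by the square root of S1²/N1 + S2²/N2; the ratings below are hypothetical and are not taken from the survey, and the function name `t_statistic` is chosen for this example only.

```python
import math
from statistics import mean, variance


def t_statistic(sample1, sample2):
    """t value: difference of the sample means divided by its standard error."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance (divides by n - 1).
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    standard_error = math.sqrt(s1_sq / n1 + s2_sq / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


# Hypothetical 1-5 ratings of one survey factor from two groups of respondents.
group_a = [4, 5, 4, 5, 4, 5]
group_b = [2, 3, 2, 3, 2, 3]
t = t_statistic(group_a, group_b)
print(f"t = {t:.3f}")
```

A large absolute t value, judged against the t-distribution, indicates that the two groups rate the factor differently; factors without such a difference are the ones filtered out before rule mining.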

B. Dataset description

Three datasets are used in the experiment: two are survey datasets collected during this research, and the third is a training dataset used to test the functionality of the improved Apriori algorithms.

The datasets on which the three Apriori algorithms (Basic, Improved and Filtered) are implemented, tested and compared are:

Student dataset (Survey dataset)

Student responses were recorded in a spreadsheet and supplied to the three Apriori algorithms. When Basic Apriori was applied to the student dataset, the first six rules of the output had a confidence value of 1, i.e. 100% confidence.

Teacher dataset (Survey dataset)

Teachers' responses were recorded to analyse whether they are in favor of or against reservation. The data was collected via a questionnaire filled in by the respective teachers.

Spect_Test dataset (Training set)

This dataset is taken from online sources[?] and is used as the training set. It is supplied to the algorithms to test that they function properly.

Fig. 1. Flowchart for proposed work

C. Techniques Used

Basic Apriori

The basic Apriori algorithm is applied to the dataset in the WEKA tool with the minimum confidence set to 0.9 and the number of rules ‘n’ set to 10. The ten best association rules are presented as output, with the rules of highest confidence placed first. The minimum support is calculated automatically by the tool; here it came out as 0.2. This is also known as the confidence-then-support approach: the user predefines the minimum confidence and the number of rules to be found, and the support threshold starts at 1.0 (100% support) and is decreased by a delta value each time until the predefined confidence and number of rules are satisfied. By default delta is set to 0.05.
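The support-lowering procedure described above can be sketched as a simple loop. This is an illustration of the described behaviour only, not WEKA's implementation: the rule search is brute force and suits only tiny data, the names `rules_at` and `confidence_then_support` are hypothetical, and the `lower_bound` floor of 0.1 is an assumed stopping point that the text does not specify.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for comparing floating-point supports


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_at(transactions, min_support, min_confidence):
    """Brute-force rule search; adequate only for tiny illustrative data."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            sup = support(whole, transactions)
            if sup == 0 or sup < min_support - EPS:
                continue
            for r in range(1, size):
                for lhs in combinations(combo, r):
                    lhs = frozenset(lhs)
                    conf = sup / support(lhs, transactions)
                    if conf >= min_confidence - EPS:
                        found.append((lhs, whole - lhs, sup, conf))
    return found


def confidence_then_support(transactions, min_confidence=0.9, num_rules=10,
                            delta=0.05, lower_bound=0.1):
    """Lower the support threshold from 1.0 by `delta` until enough rules exist."""
    passes = 0
    while True:
        min_support = 1.0 - passes * delta
        rules = rules_at(transactions, min_support, min_confidence)
        passes += 1
        # Stop when enough rules are found or the assumed floor is reached.
        if len(rules) >= num_rules or min_support - delta < lower_bound - EPS:
            break
    rules.sort(key=lambda rule: rule[3], reverse=True)  # highest confidence first
    return min_support, passes, rules[:num_rules]


# Made-up transactions for illustration only.
demo = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
final_support, passes, top_rules = confidence_then_support(demo, num_rules=2)
print(f"min support reached: {final_support:.2f} after {passes} passes")
for lhs, rhs, sup, conf in top_rules:
    print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

The loop makes the trade-off visible: the stricter the confidence requirement and the more rules requested, the further the support threshold has to fall and the more passes are needed.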

Improved Apriori

The results of association rule mining can be refined using a different measure, lift, which measures the association (dependency) between the attributes of a rule. Rules with high lift but low confidence can be very important for decision making; ranked by confidence they are placed at the bottom, whereas ranked by lift they are displayed among the top results. A lift of 1 means that the L.H.S. and R.H.S. of the rule are independent. If the lift is less than 1, it is known as negative lift: the L.H.S. and R.H.S. occur together less often than expected under independence. If the lift is more than 1, it is known as positive lift: the L.H.S. and R.H.S. occur together more often than expected, and the higher the lift, the stronger the association between the attributes.

The metric to which the Min Metric threshold applies can be chosen from four options: confidence, lift, leverage and conviction. In Improved Apriori, lift is selected instead of confidence, with the minimum lift set to 1.0 and the number of rules set to 10. The ten best association rules are displayed as output. This is also known as the lift-then-support approach.
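The four metrics can be computed from three support values. The sketch below uses the standard textbook definitions of confidence, lift, leverage and conviction; the function name `rule_metrics` and the support values are hypothetical and chosen only to illustrate the arithmetic.

```python
def rule_metrics(support_lhs, support_rhs, support_both):
    """Return confidence, lift, leverage and conviction for the rule LHS => RHS.

    All inputs are supports expressed as fractions in [0, 1].
    """
    confidence = support_both / support_lhs
    lift = confidence / support_rhs
    leverage = support_both - support_lhs * support_rhs
    # Conviction is unbounded when the rule never fails (confidence equals 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_rhs) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# Hypothetical supports chosen only to illustrate the calculation.
m = rule_metrics(support_lhs=0.4, support_rhs=0.5, support_both=0.3)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

With these values the rule has a confidence of 0.75, which would fail a 0.9 confidence threshold, yet its lift of 1.5 marks a positive association; this is the kind of rule that ranking by lift brings to the top.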

Filtered Apriori

In the filtered associator algorithm, Apriori can be combined with various filters. In this research, the ‘Add cluster’ filter with the Simple K-Means method, which is an unsupervised filter, is used. It is essentially a fusion of the association and clustering data mining techniques: it adds clusters to the association rules, and the clusters are shown on the R.H.S. of the rule.

VI. RESULTS

A. Statistical T-test

According to the students' responses, the parameters “Rebate in fees”, “Quota Based On Economic Status”, “Cannot Be Accepted By Increasing Seats”, “Direct Recruitments Basis On Open Competition”, “Equality of Opportunity”, “Reservations Should Not Be Based On Caste” and “Reservation Is More Dangerous than admission through Donation Or Management Quota” are highly significant. Other parameters, such as “Students Are Severely Restricted on Choice of Occupation”, “Reservation Is a Self Destructive Process Adopted by the Government” and “Reservation Disrespect Students’ Ability and Intellect”, were found to be significant.

According to the teachers' responses, the parameters “Quota Based On Economic Status”, “Cannot Be Accepted By Increasing Seats”, “Direct Recruitments Basis On Open Competition”, “Reservation Hampers The Autonomy Of Educational Institution” and “Reservation Divides the Students by Recognizing the Caste System In Sophisticated Way” are highly significant. Other parameters, such as “Rebate In Fees”, “Students Are Severely Restricted On Choice Of Occupation”, “Equality of Opportunity” and “Reservation In Jobs Produces Bad Effect In The Work Areas”, were found to be significant.

Insignificant factors of the reservation policy were left out of the further steps, because studying them would only generate results of little importance.

B. Comparison of Existing and Proposed Apriori

Algorithm

The three algorithms, Basic Apriori, Improved Apriori and Filtered Apriori, are compared on three measures of association rule mining: number of cycles, candidate item set generation and minimum support value.

Number of cycles

Compared with Basic Apriori, Improved Apriori reduced the number of cycles on every dataset. Filtered Apriori gave variable results: on the student dataset it reduced the number of cycles and performed better than Improved Apriori, but on the other two datasets its number of cycles was equal to that of Basic Apriori. Improved Apriori thus consistently reduces the number of cycles, making the system more efficient.

The number of cycles is also known as the number of iterations. More cycles mean more effort, more memory space and more resource allocation, which slows the algorithm down. An ideal data mining algorithm generates its output in few iterations and at maximum speed.

Fig.2. No. of cycles performed

Number of candidates generated

Candidate item sets are generated in each cycle of the Apriori algorithm. The total number of item sets is calculated by adding the number of item sets generated in each cycle. As with the number of cycles performed (Fig. 2), Improved Apriori gave the best results, reducing the number of item sets almost by half. Filtered Apriori, on the other hand, doubled the number of item sets and so requires more memory space.

Fig. 3 . No. of item sets generated

Minimum Support

The third parameter, minimum support, increased by nearly 0.05 in Improved Apriori; in Filtered Apriori it either remained the same or increased in some cases. The minimum support should not be too small, as a very small value does not give desirable results.

Fig. 4. Minimum Support value


VII. CONCLUSION

The best rules did not include the reservation causes of lesser importance, which were ignored. The Improved Apriori algorithm is faster and better than Basic Apriori, as evidenced by the smaller number of cycles performed and candidates generated. The minimum support value increases in Improved Apriori, resulting in more accurate rules. Filtered Apriori gives variable results on different datasets in comparison with Basic and Improved Apriori.

VIII. FUTURE SCOPE

The responses gathered from teachers and students are confined to colleges with reservation criteria in a particular region. This research can be extended to a wider area, such as different states and countries. Other data mining techniques, such as the Eclat algorithm, neural networks and Predictive Apriori, could be explored for better results. More parameters, such as leverage and accuracy, can be added for the comparison of Apriori algorithms.

ACKNOWLEDGMENT

I want to express my gratitude towards my guide Mr. Rasbir

Singh who supported me and guided me through every

mistake which I committed during the writing of this paper.

And I am thankful to Mrs. Rupinder Kaur Gurm who gave

me precious time and gave me more clarity about the topics

I discussed. This paper would never be possible without

support of my parents and family.

REFERENCES

[1] P. Arcidiacono. "Affirmative action in higher education: How do

admission and financial aid rules affect future earnings"

Econometrica 73, no. 5, pp. 1477-1524, Sep 2005.

[2] S. Alon, and O. Malamud. "The impact of Israel's class-based

affirmative action policy on admission and academic

outcomes." Economics of Education Review 40, pp.123-139, Jun

2014.

[3] P. Arcidiacono, M. Lovenheim, and M. Zhu. "Affirmative Action in

Undergraduate Education." Annu. Rev. Econ. 7, no. 1, pp. 487-518,

Aug 2015.

[4] H. S. Cha, T.S. Yoon, K. C. Ryu, I. W. Shin, Y. H. Choe, K. Y. Lee,

J. D. Lee, K. H. Ryu, and S. H. Chung. "Implementation of Hospital

Examination Reservation System Using Data Mining

Technique." Healthcare informatics research 21, no. 2, pp. 95-101,

Apr 2015.

[5] A. K. Shrivastava, and R. N. Panda. "Implementation of Apriori

Algorithm using WEKA.", KIET International Journal of Intelligent

Computing and Informatics, Vol. 1, Issue 1, January 2014.

[6] K. Padmavathi, and R. A. Kirithika. "Performance Based Study of

Association Rule Algorithms On Voter DB." International Journal

of Innovative Science, Engineering & Technology, Vol. 1 Issue 4,

June 2014.

[7] J. L. Balcázar, and F. Dogbey. "Evaluation of association rule quality

measures through feature extraction." In International Symposium on

Intelligent Data Analysis, pp. 68-79. Springer Berlin Heidelberg,

Aug 2013.

[8] S. Borkar, and K. Rajeswari. "Predicting students academic

performance using education data mining." IJCSMC International

Journal of Computer Science and Mobile Computing, ISSN, pp.

273-279, July 2013.

[9] J. Li, and O. R. Zaïane. "Combining usage, content, and structure

data to improve web site recommendation." In International

Conference on Electronic Commerce and Web Technologies, pp.

305-315. Springer Berlin Heidelberg, Aug 2004.

[10] S. S. Aksenova, D. Zhang, and M. Lu. "Enrollment prediction

through data mining." In 2006 IEEE International Conference on

Information Reuse & Integration, pp. 510-515. IEEE, Sep 2006.

[11] R. Agrawal, T. Imieliński, and A. Swami. "Mining association rules

between sets of items in large databases." In Acm sigmod record, vol.

22, no. 2, pp. 207-216. ACM, June 1993.

[12] I. S. Bamrah, and A. Girdhar. "Investigation on impact of reservation

policy on student enrollment using data mining." In 2015 IEEE

International Conference on Computational Intelligence and

Computing Research (ICCIC), pp. 1-5. IEEE, Dec 2015.

[13] Dr SK Jangir, "Reservation system and indian constitution-special

refrence to mandal commission." American International Journal of

Research in Humanities, Arts and Social Sciences , 2013.

[14] F. Ranadive, and A. Z. Surti. "Hybrid Agent Based Educational Data

Mining Model for Student Performance Improvement." International

Journal of Modern Communication Technologies & Research

(IJMCTR) ISSN: 2321-0850, Vol.-2, Issue-4, April 2014.

[15] D. B. Vaghela, and P. Sharma. "Students' Admission Prediction

using GRBST with Distributed Data Mining.", Communications on

Applied Electronics (CAE) – ISSN : 2394-4714 Foundation of

Computer Science FCS, New York, USA, Vol. 2 – No.1, June 2015.

[16] M. Bertrand, R. Hanna and S. Mullainathan, "Affirmative action in

education: Evidence from engineering college admissions in India,"

Journal of Public Economics, vol. 94, no. 1, pp. 16-29, 2010.

[17] A. K. Mishra, S. K. Pani, and B. K. Ratha. "Association rule mining

with Apriori and FP growth using weka.", 2 nd international

conference of science, technology and management, University of

Delhi (DU), New Delhi, India, Sep 2015.

[18] D. Bansal, and L. Bhambhu. "Execution of APRIORI Algorithm of

Data Mining Directed Towards Tumultuous Crimes Concerning

Women." International Journal of Advanced Research in Computer

Science and Software Engineering 3, no. 9, pp. 54-62, Sep 2013.

[19] M. Girotra, K. Nagpal, S. Minocha, and N. Sharma. "Comparative

Survey on Association Rule Mining Algorithms." International

Journal of Computer Applications 84, no. 10, Jan 2013.

[20] H. Haripriya, S. Amrutha, R. Veena, and P. Nedungadi. "Integrating

Apriori with paired k-means for Cluster fixed mixed data."

In Proceedings of the Third International Symposium on Women in

Computing and Informatics, pp. 10-16. ACM, Aug 2015.

[21] K. Lai and N. Cerpa. "Support v/s confidence in association rule

algorithms." In Proceedings of the OPTIMA Conference, Curicó.

Oct 2001
