INTERNATIONAL JOURNAL OF TECHNOLOGY AND COMPUTING (IJTC)
ISSN-2455-099X,
Volume 3, Issue 1 January 2017
IJTC201701002 www. ijtc.org 3
Performance Evaluation of Apriori Algorithm on Reservation Policy
Jasleen Kaur1, Rasbir Singh2, Rupinder Kaur Gurm3
1M.Tech Scholar, 2,3Asst. Professor
1,2,3RIMT, Mandi Gobindgarh, Punjab
[email protected] , [email protected] ,
[email protected]
Abstract—Various advantages and ill effects of reservation policy are affecting the Indian education system and employment sector.
Due to the disturbance created by the reservation system in the education and job sectors, society is divided into two groups,
i.e. people who are in favor of reservation and people who are against it, which accelerates the other causes associated with the
negative effects of reservation and further worsens the situation. So, a survey is conducted with teachers and students to detect the
main causes of reservation and their point of view. A statistical t-test is conducted on the surveyed dataset to filter the high impact
factors of reservation out of all factors taken, and the results are interpreted graphically. Further, Apriori association rule mining is
applied using a data mining tool to study the interdependence of the main causes behind the growing impact of reservation and to find
possible solutions to make the system unbiased. An improvement in the Apriori algorithm is proposed, and the proposed and existing
techniques are compared graphically in terms of candidate generation, number of cycles performed and minimum support, with the aim
of obtaining faster results by reducing repeated database scans.
Keywords- Apriori algorithm, Association Rules, Statistical test, reservation, data mining
I. INTRODUCTION
The Constitution specifically prohibits discrimination on the
basis of caste, and 22.5% of seats in institutions of
higher education and government employment are reserved for
Scheduled Castes and Scheduled Tribes. The Mandal
Commission recommended extending reservation to Other
Backward Classes (OBCs), increasing the total number of seats
subject to reservation from 22.5% to 49.5%. Many people think
reservation is necessary for lower caste people because those
sections of society are economically much weaker.
Due to the disturbance created by the reservation system in
the education and job sectors, society is divided into two
groups, i.e. people who are in favor of reservation and
people who are against it.
Affirmative action has also been studied outside India. A
study of Israeli selective universities [2] examines the
effect of eligibility for affirmative action on admission and
enrolment, while the study of earning patterns in [1] shows
that changing the admission and financial aid rules at
colleges affects future earnings.
The authors of [3] survey the literature on the impact of
racial preferences in college admissions on both minority
and majority students in US higher education.
Data mining is the process of extracting hidden patterns from
large datasets or data warehouses. Effective data mining
depends upon the data being supplied, which varies from large
to small and from structured to unstructured datasets. Various
data mining techniques are used, such as association rule
mining, classification, clustering, and prediction.
II. APRIORI ALGORITHM
Association means finding frequently occurring patterns of
data items and then finding the relations among them.
Association rules are defined using the support and
confidence attributes. Association rules are used in fields
such as bioinformatics, web mining, and customer relationship
management.
Each rule is evaluated mainly by two measures, support and
confidence. The support of an item set P is the fraction of
the transactions T that contain P. For e.g., a support of 90%
for the item set {bread, butter} means that bread and butter
are bought together 90 times in every 100 transactions. The
confidence of a rule P => Q is the fraction of the
transactions containing P that also contain Q. For e.g.,
{bread, butter} => {milk} with 100% confidence means that a
person who buys bread and butter definitely buys milk also.
From these two measures we obtain four main types of rule:
rules with high support and high confidence, rules with low
support and high confidence, rules with low support and low
confidence, and rules with high support and low confidence.
Two algorithms based on association rules are the Apriori and
Eclat algorithms.
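The two measures can be illustrated with a minimal Python sketch. The basket data and the function names `support` and `confidence` are hypothetical and chosen only for illustration; they are not taken from the survey datasets or from any tool used in this paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support(frozenset(antecedent) | frozenset(consequent), transactions)
    return both / base


# Hypothetical toy basket data (assumption, for illustration only).
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
]

bread_butter_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread", "butter"}, {"milk"}, baskets)
```

In this toy data {bread, butter} occurs in three of the five baskets, and two of those three also contain milk.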
The Apriori algorithm finds the frequent item sets existing
in a database by scanning the data multiple times. From
these frequent item sets, the strong association rules are
generated. R. Agrawal introduced frequent item set
generation algorithms in 1993 to increase the speed of mining.
The basic Apriori algorithm consists mainly of two actions,
join and prune.
(i) Join action: Let Lk be the set of frequent item sets of
size k. To find Lk, a candidate set Ck is generated by joining
Lk-1 with itself.
(ii) Prune action: Ck is a candidate set which is a superset
of Lk. According to the Apriori property, item sets in Ck whose
support is less than the threshold are removed to obtain Lk.
Steps for Apriori algorithm are:
Step 1: Set the user-defined minimum support and
confidence.
Step 2: Construct the first candidate set C1 (k = 1), whose
item sets each contain a single item. Now perform the prune
operation by removing item sets with support values lower than
the threshold. This yields the frequent 1-item sets (L1).
Step 3: Join L1 with itself to obtain C2, the candidate
2-item sets (k = 2). Again remove infrequent item sets from C2
to get L2, the frequent 2-item sets.
Step 4: Keep repeating step 3 with increasing k until no more
candidate sets are generated.
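The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the WEKA implementation used in this paper: the function name `apriori_frequent_itemsets` and the toy transactions are hypothetical, and support is recounted by scanning all transactions for every candidate.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori_frequent_itemsets(transactions: List[FrozenSet[str]],
                              min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent item set with its support (Steps 1 to 4)."""
    n = len(transactions)
    if n == 0:
        return {}

    def supp(candidate: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if candidate <= t) / n

    # Step 2: build C1 from the single items and prune it to L1.
    items = sorted({item for t in transactions for item in t})
    current: Dict[FrozenSet[str], float] = {}
    for item in items:
        candidate = frozenset([item])
        s = supp(candidate)
        if s >= min_support:
            current[candidate] = s
    frequent = dict(current)

    k = 2
    while current:  # Step 4: repeat until no candidates survive.
        previous = list(current)
        # Join: unite pairs from L(k-1) whose union has exactly k items.
        candidates = {a | b for a, b in combinations(previous, 2) if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        # Count support and keep only the frequent candidates, giving Lk.
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions (assumption, for illustration only).
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
]
frequent_sets = apriori_frequent_itemsets(baskets, min_support=0.6)
```

With a minimum support of 0.6, the three single items and the three pairs are frequent in this toy data, while the three-item set (support 0.4) is pruned.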
Further improvement of Apriori is achieved by minimising the
candidate item sets. In [4], the authors proposed the
Hospital Exam Reservation System (HERS), which applies the
Apriori data mining method to patient and clinical exam data
and generates rules with a multi-examination pattern-mining
algorithm to find the best schedule for patient satisfaction.
In [5], the implementation of the Apriori algorithm using the
WEKA tool is explained step by step, starting from the
pre-processing step through to the generation of association
rules. A new dataset was created for that study and tested
using ARFF files.
In [6], a voting database is studied using association rule
data mining algorithms to find out the interests of the
voters among the given attributes. The association rule
algorithm studies the frequent items in the database, and a
comparative study of the FP-Growth and Apriori algorithms is
done. The quality of generated association rules, and how
near the top they rank, is measured and discussed in [7],
where seven association rule quality measures are evaluated.
A study is conducted in [8] in which important rules are
generated to measure the correlation among various
attributes, which helps to improve students' academic
performance, using Weka and a real-time dataset available on
the college premises. In [9], the authors investigated a novel
web recommender system which combines usage data, content
data, and structure data in a web site to generate user
navigational models. Paper [10] describes a study on
enrolment prediction using support vector machines and
rule-based predictive models.
In [11], Agrawal et al. (1993) proposed and defined the
Apriori algorithm and applied it to sales data obtained
from a large retailing company, which showed the
effectiveness of the algorithm.
III. LITERATURE SURVEY
Bamrah [12] focused on reducing variability by observing
student potential along with the reservation policy in order
to predict the admission of each student to a course. Linear
regression was applied to find the relation, and hypothesis
testing reported the difference between two samples in
different branches.
Dr Sunil Kumar Jangir, 2013 [13] reports the connection
between caste discrimination and the government plan, in
which eleven criteria are adopted to identify the people
under the OBC category. The reservation policy was first
implemented to reduce mass poverty but, in today's scenario,
this policy is dividing society on the basis of the caste
system.
Falguni [14] presents a model which combines the action of
agents and data mining techniques. Data mining used in the
education system is known as EDM (educational data mining).
Various visualization methods, based on K-means, are used to
predict serious issues related to students' degrading
performance and the way to improve it.
Dinesh Kumar [15] presents a new algorithm with a Binary
Search Tree which stores the global rules by consolidating
the local rules generated at each site; these can be further
used in the prediction of students' admission to college.
Marianne Bertrand [16] examines an affirmative action
program for “lower-caste” groups in engineering colleges in
India, based on a survey in which a total of 721 households
agreed to participate. The paper concluded that the
reservation policy may provide benefits only to those who are
already economically better off within the lower caste
groups.
Ajay Kumar [17] uses the two most popular algorithms, namely
Apriori and frequent pattern growth (FP-Growth), on the SPECT
heart dataset available from the TunedIT Machine Learning
Repository. They found, using WEKA, that the Apriori
algorithm performs better in terms of frequent item sets
generated and number of cycles performed during execution of
the two algorithms.
D. Bansal [18] applies Apriori to a real dataset on crimes
against women to extract hidden information, such as which
age group is responsible and where the real culprit is
hiding. A comparison between Apriori and the Predictive
Apriori algorithm finds Apriori to be better and faster.
M. Girotra [19] discusses the respective characteristics and
shortcomings of the algorithms for mining association rules.
The paper also provides a comparative study of different
association rule mining techniques, stating which algorithm
is best suited to which case.
Haripriya [20] found some interesting patterns in
unstructured mixed data using association mining, which can
automatically compute the number of clusters formed and the
pairwise distance measure. Experimentation is done with real
mixed data taken from the UCI repository, and the proposed
algorithm is reported to generate accurate results.
Kenneth Lai [21] presents a description of two types of
association rule algorithms and compares the performance of
the MinHash algorithm against DLG in terms of various
parameters. He concluded that the performance of an algorithm
depends not only on execution speed, support or confidence
but also on other factors such as memory usage. MinHash
differed from DLG in that it used a confidence-then-support
approach, used less memory and had a low support
requirement.
IV. OBJECTIVES OF RESEARCH
To evaluate effectiveness of reservation policy on job and
admission.
To identify the problems faced by students of all
categories.
To suggest the strategies for the improvement of
reservation policy.
To implement Apriori algorithm for generating best
association rules on reservation policy.
To propose an improvement in the Apriori algorithm to
reduce the number of iterations.
To compare the results of the proposed and existing
techniques in terms of candidate generation, number of
cycles performed and minimum support.
V. EXPERIMENTAL STUDY AND IMPLEMENTATION
The proposed work has been implemented by first performing
statistical testing using the SPSS tool and then applying data
mining techniques to the filtered dataset for further
qualitative analysis using the WEKA tool. The design of the
proposed work is shown in Fig. 1.
A. Statistical Testing
A t-test is any statistical hypothesis test in which the test
statistic follows a Student's t-distribution under the null
hypothesis. It can be used to determine if two sets of data
are significantly different from each other.
The independent samples t-test, shown in equation (1), is used
when two separate sets of independent and identically
distributed samples are obtained, one from each of the two
populations being compared. In my research I have used the
t-test to filter out low impact factors, so that the study is
based on high impact factors only; graphs are also generated
for the same.
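$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}} \qquad (1) $$
Where, $\bar{x}_1$ = mean of sample 1
$\bar{x}_2$ = mean of sample 2
$N_1$ = number of entries in sample 1
$N_2$ = number of entries in sample 2
$S_1^2$ = variance of sample 1
$S_2^2$ = variance of sample 2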
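As a worked illustration, the statistic can be computed in a few lines of Python. This sketch assumes that equation (1) is the standard unpooled two-sample form, i.e. the difference of the sample means divided by the square root of S1²/N1 + S2²/N2, which is the form consistent with the symbols defined above. The function name `t_statistic` and the response values are hypothetical and are not the survey data; the paper itself used the SPSS tool.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def t_statistic(sample1: Sequence[float], sample2: Sequence[float]) -> float:
    """Two-sample t statistic: (mean1 - mean2) / sqrt(S1^2/N1 + S2^2/N2)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    # statistics.variance is the sample variance (divides by N - 1).
    v1, v2 = variance(sample1), variance(sample2)
    return (m1 - m2) / sqrt(v1 / n1 + v2 / n2)


# Hypothetical 5-point questionnaire responses from two groups (assumption).
group_a = [4, 5, 4, 5, 4]
group_b = [2, 3, 2, 3, 2]
t_value = t_statistic(group_a, group_b)
```

A large absolute value of t, judged against the t-distribution for the relevant degrees of freedom, indicates that the two groups answered a question differently, which is the basis on which a factor would be treated as significant.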
B. Dataset description
Three datasets are used in my experiment: two are survey
datasets resulting from the survey conducted during my
research, and the third is a training dataset used to test
the functionality of the improved Apriori algorithms.
The datasets on which the three Apriori algorithms described
in Section C are implemented, tested and compared are:
Student dataset (Survey dataset)
Student responses are recorded in a spreadsheet and the three
Apriori algorithms are applied to them. When basic Apriori is
applied to the student dataset, the first six rules of the
output have confidence value = 1, i.e. 100% confidence.
Teacher dataset (Survey dataset)
Teachers' responses are recorded to analyse whether they are
in favor of or against reservation. Data is collected via a
questionnaire form filled in by the respective teachers.
Spect_Test dataset (Training set)
This dataset is taken from online sources[?] and is used as
the training set. The training set is applied to the
algorithms to test that they function properly.
Fig. 1. Flowchart for proposed work
C. Techniques Used
Basic Apriori
The basic Apriori algorithm is applied to each dataset in the
WEKA tool with minimum confidence = 0.9 and number of rules
‘n’ = 10. The best ten association rules are presented as
output, with the rules of highest confidence placed first.
The minimum support is determined automatically by the tool;
here it came out as 0.2. This is also known as a
confidence-then-support approach: the user predefines the
minimum confidence and the number of rules to be found, and
support starts at 1.0 (100%) and is decreased by a delta value
in each cycle until the required number of rules at the
predefined confidence is found. By default, delta is set to
0.05.
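The support-lowering loop described above can be sketched as follows. This is a simplified illustration, not WEKA's implementation: `mine_rules` is a hypothetical brute-force miner suitable only for tiny data, `confidence_then_support` is a hypothetical name, the lower bound of 0.1 on support is an assumption, and the transactions are toy data.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # lhs, rhs, support, confidence


def mine_rules(transactions: List[FrozenSet[str]], min_support: float,
               min_confidence: float) -> List[Rule]:
    """Brute-force rule miner (hypothetical helper, exponential in the number of items)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def supp(s: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if s <= t) / n

    rules: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            s = supp(whole)
            if s == 0 or s < min_support:
                continue
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    conf = s / supp(lhs)
                    if conf >= min_confidence:
                        rules.append((lhs, whole - lhs, s, conf))
    return rules


def confidence_then_support(transactions: List[FrozenSet[str]], num_rules: int = 10,
                            min_confidence: float = 0.9, upper: float = 1.0,
                            lower: float = 0.1, delta: float = 0.05):
    """Lower the support by `delta` per cycle until `num_rules` rules are found."""
    min_support = upper
    cycles = 0
    while True:
        cycles += 1
        rules = mine_rules(transactions, min_support, min_confidence)
        next_support = round(min_support - delta, 10)  # avoid float drift
        if len(rules) >= num_rules or next_support < lower:
            break
        min_support = next_support
    rules.sort(key=lambda rule: (-rule[3], -rule[2]))  # highest confidence first
    return rules[:num_rules], min_support, cycles


# Hypothetical toy transactions (assumption, for illustration only).
baskets = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
]
top_rules, final_support, cycles = confidence_then_support(
    baskets, num_rules=1, min_confidence=0.7)
```

In this sketch the number of cycles is simply the number of support values tried, so a run that stops at a higher minimum support also performs fewer cycles.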
Improved Apriori
The results of association rule mining can be refined using a
different measure, lift. Lift measures the association
(dependency) between attributes. When rules are ranked by
confidence, rules with high lift but low confidence, which can
be very important for the decision making process, are placed
at the bottom, whereas ranking by lift displays them among the
top results. A lift of 1 indicates that the two sides of the
rule are independent, so the minimum lift threshold is set to
1. If the value of lift is less than 1, it is known as
negative lift,
which means the L.H.S. and R.H.S. of the rule occur together
less often than would be expected if they were independent.
If the value of lift is more than 1, it is known as positive
lift, which means the L.H.S. and R.H.S. occur together more
often than expected under independence; the higher the lift,
the stronger the association between attributes.
The parameter Min Metric offers four options, of which one is
predefined: confidence, lift, leverage and conviction. In
Improved Apriori, lift is predefined instead of confidence.
The properties of Improved Apriori are adjusted so that the
minimum lift is set to 1.0 and the number of rules is 10. The
best ten association rules are displayed as output. This is
also known as a lift-then-support approach.
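The four metrics named above can be computed from three support values, as the following minimal sketch shows. It uses the standard textbook definitions; the function name `rule_metrics` and the example support values are hypothetical and are not results from the survey datasets.

```python
from math import inf
from typing import Dict


def rule_metrics(supp_x: float, supp_y: float, supp_xy: float) -> Dict[str, float]:
    """Metrics for a rule X => Y given supp(X), supp(Y) and supp(X and Y)."""
    conf = supp_xy / supp_x
    lift = conf / supp_y                   # 1 means X and Y are independent
    leverage = supp_xy - supp_x * supp_y   # 0 means X and Y are independent
    # Conviction is unbounded when the rule never fails (confidence = 1).
    conviction = inf if conf >= 1.0 else (1 - supp_y) / (1 - conf)
    return {"confidence": conf, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# Hypothetical support values (assumption, for illustration only).
independent = rule_metrics(supp_x=0.5, supp_y=0.5, supp_xy=0.25)
dependent = rule_metrics(supp_x=0.5, supp_y=0.5, supp_xy=0.5)
```

The first call describes two sides that co-occur exactly as often as independence predicts (lift 1, leverage 0); the second describes a rule that always holds, so its lift is above 1.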
Filtered Apriori
In the Filtered Associator algorithm, the Apriori algorithm
can be combined with various filters. In my research, I have
implemented the ‘AddCluster’ filter with the Simple K-Means
method, which is a type of unsupervised filter. Basically, it
is a fusion of the association and clustering data mining
techniques: it adds clusters to the association rules, and
the clusters are shown on the R.H.S. of the rule.
VI. RESULTS
A. Statistical T-test
According to the responses from students, parameters like
“Rebate in fees”, “Quota Based On Economic Status”,
“Cannot Be Accepted By Increasing Seats”, “Direct
Recruitments Basis On Open Competition”, “Equality of
Opportunity”, “Reservations Should Not Be Based On
Caste” and “Reservation Is More Dangerous than admission
through Donation Or Management Quota” are highly
significant. Other parameters such as “Students Are
Severely Restricted on Choice of Occupation”, “Reservation
Is a Self Destructive Process Adopted by the Government”
and “Reservation Disrespect Students’ Ability and Intellect”
were found to be significant.
According to the responses collected from teachers,
parameters such as “Quota Based On Economic Status”,
“Cannot Be Accepted By Increasing Seats”, “Direct
Recruitments Basis On Open Competition”, “Reservation
Hampers The Autonomy Of Educational Institution” and
“Reservation Divides the Students by Recognizing the Caste
System In Sophisticated Way” are highly significant. Other
parameters such as “Rebate In Fees”, “Students Are
Severely Restricted On Choice Of Occupation”, “Equality
of Opportunity” and “Reservation In Jobs Produces Bad
Effect In The Work Areas” were found to be significant.
Insignificant factors of the reservation policy were left out
of further steps, because studying such factors would only
generate results of little importance.
B. Comparison of Existing and Proposed Apriori
Algorithm
The three algorithms, i.e. Basic Apriori, Improved Apriori
and Filtered Apriori, are compared on the basis of three
measures of an association data mining algorithm, i.e.
number of cycles, candidate item set generation and
minimum support value.
Number of cycles
Compared with basic Apriori, Improved Apriori gave positive
results: the number of cycles is reduced on every dataset.
With Filtered Apriori the results were variable: on the
student dataset the cycles are reduced, even below Improved
Apriori, whereas on the other two datasets the cycles remain
equal to basic Apriori. As a result, Improved Apriori reduces
the number of cycles, making the system more efficient.
The number of cycles is also known as the number of
iterations. Fewer cycles mean less effort, whereas more cycles
require more memory space and more resource allocation, thus
slowing the algorithm down. An ideal data mining algorithm is
one which generates its output in as few iterations and as
quickly as possible.
Fig.2. No. of cycles performed
Number of candidates generated
Candidate item sets are generated in each cycle executed
while running the Apriori algorithm. The total number of item
sets is calculated by adding the number of item sets
generated in each cycle. As Fig. 3 shows, the results were
similar to those for the number of cycles performed: with
Improved Apriori the number of item sets is reduced to almost
half, giving the best results. On the other hand, Filtered
Apriori doubles the number of item sets, thus allocating more
memory space.
Fig. 3 . No. of item sets generated
Minimum Support
The third parameter, the minimum support value, increases by
nearly 0.05 in Improved Apriori, and in Filtered Apriori it
either remains the same or increases in some cases. Minimum
support should not be too small, as a very small value does
not give desirable results.
Fig. 4. Minimum Support value
VII. CONCLUSION
The best rules did not include reservation causes of less
importance, which were ignored. The Improved Apriori
algorithm is faster and better than basic Apriori, as
evidenced by the smaller number of cycles performed and
candidates generated. The minimum support value increases in
Improved Apriori, resulting in more accurate rules. Filtered
Apriori gives variable results for different datasets in
comparison to basic and Improved Apriori.
VIII. FUTURE SCOPE
The responses gathered from teachers and students are
confined to colleges having reservation criteria in a
particular region. This research can be extended to a wider
area, such as different states and countries. Other data
mining techniques, such as the Eclat algorithm, neural
networks and Predictive Apriori, can be explored for better
results. More parameters, such as leverage and accuracy, can
be added for the comparison of Apriori algorithms.
ACKNOWLEDGMENT
I want to express my gratitude towards my guide Mr. Rasbir
Singh who supported me and guided me through every
mistake which I committed during the writing of this paper.
And I am thankful to Mrs. Rupinder Kaur Gurm who gave
me precious time and gave me more clarity about the topics
I discussed. This paper would never be possible without
support of my parents and family.
REFERENCES
[1] P. Arcidiacono. "Affirmative action in higher education: How do
admission and financial aid rules affect future earnings"
Econometrica 73, no. 5, pp. 1477-1524, Sep 2005.
[2] S. Alon, and O. Malamud. "The impact of Israel's class-based
affirmative action policy on admission and academic
outcomes." Economics of Education Review 40, pp.123-139, Jun
2014.
[3] P. Arcidiacono, M. Lovenheim, and M. Zhu. "Affirmative Action in
Undergraduate Education." Annu. Rev. Econ. 7, no. 1, pp. 487-518,
Aug 2015.
[4] H. S. Cha, T.S. Yoon, K. C. Ryu, I. W. Shin, Y. H. Choe, K. Y. Lee,
J. D. Lee, K. H. Ryu, and S. H. Chung. "Implementation of Hospital
Examination Reservation System Using Data Mining
Technique." Healthcare informatics research 21, no. 2, pp. 95-101,
Apr 2015.
[5] A. K. Shrivastava, and R. N. Panda. "Implementation of Apriori
Algorithm using WEKA.", KIET International Journal of Intelligent
Computing and Informatics, Vol. 1, Issue 1, January 2014.
[6] K. Padmavathi, and R. A. Kirithika. "Performance Based Study of
Association Rule Algorithms On Voter DB." International Journal
of Innovative Science, Engineering & Technology, Vol. 1 Issue 4,
June 2014.
[7] J. L. Balcázar, and F. Dogbey. "Evaluation of association rule quality
measures through feature extraction." In International Symposium on
Intelligent Data Analysis, pp. 68-79. Springer Berlin Heidelberg,
Aug 2013.
[8] S. Borkar, and K. Rajeswari. "Predicting students academic
performance using education data mining." IJCSMC International
Journal of Computer Science and Mobile Computing, ISSN, pp.
273-279, July 2013.
[9] J. Li, and O. R. Zaïane. "Combining usage, content, and structure
data to improve web site recommendation." In International
Conference on Electronic Commerce and Web Technologies, pp.
305-315. Springer Berlin Heidelberg, Aug 2004.
[10] S. S. Aksenova, D. Zhang, and M. Lu. "Enrollment prediction
through data mining." In 2006 IEEE International Conference on
Information Reuse & Integration, pp. 510-515. IEEE, Sep 2006.
[11] R. Agrawal, T. Imieliński, and A. Swami. "Mining association rules
between sets of items in large databases." In Acm sigmod record, vol.
22, no. 2, pp. 207-216. ACM, June 1993.
[12] I. S. Bamrah, and A. Girdhar. "Investigation on impact of reservation
policy on student enrollment using data mining." In 2015 IEEE
International Conference on Computational Intelligence and
Computing Research (ICCIC), pp. 1-5. IEEE, Dec 2015.
[13] Dr SK Jangir, "Reservation system and indian constitution-special
refrence to mandal commission." American International Journal of
Research in Humanities, Arts and Social Sciences , 2013.
[14] F. Ranadive, and A. Z. Surti. "Hybrid Agent Based Educational Data
Mining Model for Student Performance Improvement." International
Journal of Modern Communication Technologies & Research
(IJMCTR) ISSN: 2321-0850, Vol.-2, Issue-4, April 2014.
[15] D. B. Vaghela, and P. Sharma. "Students' Admission Prediction
using GRBST with Distributed Data Mining.", Communications on
Applied Electronics (CAE) – ISSN : 2394-4714 Foundation of
Computer Science FCS, New York, USA, Vol. 2 – No.1, June 2015.
[16] M. Bertrand, R. Hanna and S. Mullainathan, "Affirmative action in
education: Evidence from engineering college admissions in India,"
Journal of Public Economics, vol. 94, no. 1, pp. 16-29, 2010.
[17] A. K. Mishra, S. K. Pani, and B. K. Ratha. "Association rule mining
with Apriori and FP growth using weka." 2nd International
Conference of Science, Technology and Management, University of
Delhi (DU), New Delhi, India, Sep 2015.
[18] D. Bansal, and L. Bhambhu. "Execution of APRIORI Algorithm of
Data Mining Directed Towards Tumultuous Crimes Concerning
Women." International Journal of Advanced Research in Computer
Science and Software Engineering 3, no. 9, pp. 54-62, Sep 2013.
[19] M. Girotra, K. Nagpal, S. Minocha, and N. Sharma. "Comparative
Survey on Association Rule Mining Algorithms." International
Journal of Computer Applications 84, no. 10, Jan 2013.
[20] H. Haripriya, S. Amrutha, R. Veena, and P. Nedungadi. "Integrating
Apriori with paired k-means for Cluster fixed mixed data."
In Proceedings of the Third International Symposium on Women in
Computing and Informatics, pp. 10-16. ACM, Aug 2015.
[21] K. Lai and N. Cerpa. "Support v/s confidence in association rule
algorithms." In Proceedings of the OPTIMA Conference, Curicó.
Oct 2001