Learning from Ordinal Data with ILP in Description Logic. Nunung Nurul Qomariyah and Dimitar Kazakov, Computer Science, University of York, UK. Presented at the 27th International Conference on Inductive Logic Programming, Orléans, France, 4-6 September 2017.
several benefits of representing both data and models in a logic-based language, as this allows for the use of reasoner tools that can infer logical consequences from a given knowledge base. Our method employs an ontology reasoner to recommend items consistent with the user preference hypotheses produced by ILP. This approach has the potential to make suggestions about items that have never been explicitly discussed with the user.
In summary, our contribution in this paper is as follows:

• We propose a new approach to learning preferences from multi-attribute items for recommender systems based on Inductive Logic Programming (ILP).

• We propose a new architecture for knowledge representation and inference using the Semantic Web Rule Language (SWRL) and an ontology reasoner.

• We also describe a way to tune the settings of our learning algorithm based on experiments with a real-world dataset in order to improve system performance.

We divide the rest of the paper as follows. In Section 2, we explain the details of our proposed approach. We describe the dataset and the experiments carried out in Section 3. Then, we discuss the results in Section 4. We explore related work in Section 5. Finally, we conclude and suggest future work in Section 6.
2. PROPOSED APPROACH

We have guided our choice of learning algorithm by the need for an expressive representation formalism and a learning algorithm capable of handling a variety of hypotheses based on the user preferences, along with the desire to be able to learn robust hypotheses from a limited number of examples and express the result in a human-readable form. While other researchers [9] have used linear SVM to approximate user preferences, we opted for the flexibility of Inductive Logic Programming. In addition, the performance of our system is boosted through the use of constraints on the range of hypotheses considered, which reduces the time complexity of the learning task. We divide this section into four subsections, explaining each step in more detail.
2.1 Problem Formalization

Arguably, the use of data that genuinely reflects the user preferences is essential for the success of any recommender system. Therefore we have opted for a form of knowledge elicitation that minimises the subjectivity of the user's replies by limiting the complexity of the query asked and restricting the feedback provided to qualitative information alone. In practice, this is achieved through queries consisting of pairs of items along with their descriptions, where the user only needs to select the better of the two items. Such pairwise comparisons are used to learn which items will be classified as "Good", i.e. ones that the user would consider buying. We use the user's answers to classify the unlabelled data and make a prediction about classes. We illustrate the general annotation process in Figure 1.
The figure shows how we derive conclusions about preferences regarding individual attributes from data pairs of the form "Car 1 is-better-than Car 2". The bold arrow represents the annotation from the user and the dotted arrows show possible implications about individual attributes that the learning algorithm will consider. Note that in general, ILP makes it possible to compare combinations of attributes, e.g. ⟨price1, mileage1⟩ vs. ⟨price2, mileage2⟩, through the
Figure 1: User annotation (the pair "car 1 better than car 2", with dotted links to the mileage, price, year and type attributes of each car)
use of appropriately defined relations (so-called background knowledge), but this aspect of ILP is not explored here. The way in which we build hypotheses is explained in more detail in Section 2.2.
Definition 1 (Items). An item I is described by a set of attribute names and their values: {A1 = v1, A2 = v2, . . . , An = vn}.
Definition 2 (Comparison Pair). Given a set of items E, we define a comparison pair P as any (e, e′) ∈ E × E, e ≠ e′. We shall then refer to the variable represented by the attribute A of the first element of the pair as A_first, while A_second will refer to the variable represented by the attribute A of the second element of the pair. The values of these variables will be denoted as value(A_first), resp. value(A_second).
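The definitions above can be sketched in code. This is our own illustration, not the paper's implementation: the dictionary encoding of items and the `value` helper are hypothetical names chosen to mirror Definitions 1 and 2.

```python
from itertools import permutations

# Items (Definition 1): attribute-name -> value mappings.
items = [
    {"bodytype": "sedan", "transmission": "manual"},
    {"bodytype": "suv", "transmission": "automatic"},
    {"bodytype": "hatchback", "transmission": "manual"},
]

# Comparison pairs (Definition 2): all ordered pairs (e, e') with e != e'.
pairs = list(permutations(items, 2))

def value(pair, attribute, position):
    """value(A_first) / value(A_second) for a comparison pair."""
    element = pair[0] if position == "first" else pair[1]
    return element[attribute]

p = pairs[0]                      # (items[0], items[1])
value(p, "bodytype", "first")     # "sedan"
value(p, "bodytype", "second")    # "suv"
```

With 3 items this yields 3 × 2 = 6 ordered pairs, matching the e ≠ e′ condition of Definition 2.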
Definition 3 (User Annotation). The annotation provided by the user is binary: a given pair (e, e′) is given a class label 1 if the user considers e to be better than e′, or the class label is set to 0 if the user considers e′ to be better than e. The relationship better than represents strict inequality, and the user is forced to choose between these two alternatives. We therefore define a predicate C, such that:

C(⟨e, e′⟩) = 1 if e is better than e′
C(⟨e, e′⟩) = 0 if e′ is better than e.    (1)
Definition 4 (Training Examples). The set of training examples S consists of the union of all pairs ⟨e, e′⟩ such that C(⟨e, e′⟩) = 1, along with all pairs ⟨e′, e⟩ such that C(⟨e, e′⟩) = 0.
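The construction of S in Definitions 3 and 4 can be sketched as follows; the function name and the tuple encoding of annotated pairs are our own illustrative choices.

```python
def training_examples(labelled_pairs):
    """Build S (Definition 4): keep <e, e'> when C = 1, and swap to
    <e', e> when C = 0, so every stored pair reads as 'the first
    element is better than the second'."""
    S = []
    for (e, e_prime), label in labelled_pairs:
        if label == 1:
            S.append((e, e_prime))
        else:
            S.append((e_prime, e))
    return S

annotated = [(("car1", "car2"), 1), (("car3", "car4"), 0)]
training_examples(annotated)  # [("car1", "car2"), ("car4", "car3")]
```

After this normalisation every example carries the same orientation, which is what allows the learner to treat the whole of S uniformly.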
Now we state our main learning task as:
Definition 5 (Learning Problem). Find a model T that is consistent with the set of training examples S.
2.2 Learning Algorithm

We use the annotated data as input for our learning algorithm. We build an algorithm that searches the space of possible hypotheses starting from the most general hypotheses, i.e. the ones based on the least number of constraints, and progresses towards the most specific rule possible, given by a Progol-like bottom clause [8]. The difference between the ILP system Progol and ours is that Progol searches the hypothesis space in a greedy way, throwing away all positive examples that are already covered by the hypothesis, while we derive all parts of the hypothesis in a cautious way,
bottom clause contains the conjunction of n constraints (of type class membership) on the Domain side, and the same number of constraints again on the Range side of the relation. This will produce n × n possible pairs on the first level of generalisation. (We have chosen not to consider hypotheses only constraining one of the arguments.) We evaluate all combinations of constraints, except the ones that imply the same class membership of both arguments (i.e. X is better than Y because they both share the same property/class membership) and those that have already been considered. This is illustrated in Figure 1.
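The first level of generalisation described above could be enumerated as in the following sketch (the function and variable names are ours, not the paper's):

```python
# Sketch: the bottom clause contributes n candidate class constraints for
# the Domain (better) side and n for the Range (worse) side. We pair them
# up, skipping candidates that impose the same class on both arguments
# and any pair already considered.

def first_level(domain_classes, range_classes):
    seen = set()
    candidates = []
    for d in domain_classes:
        for r in range_classes:
            if d == r:            # would say X betterthan Y because both share d
                continue
            if (d, r) in seen:    # already considered
                continue
            seen.add((d, r))
            candidates.append((d, r))
    return candidates

domain_bottom = ["Manual", "NonHybrid", "SmallCar", "Sedan"]
range_bottom = ["LargeCar", "Manual", "NonHybrid", "Suv"]
cands = first_level(domain_bottom, range_bottom)
# 4 x 4 = 16 pairs, minus Manual/Manual and NonHybrid/NonHybrid = 14
```

The 4 × 4 grid minus the two shared-class pairs leaves 14 first-level candidates for this bottom clause.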
(Manual ⊓ NonHybrid ⊓ SmallCar ⊓ Sedan) betterthan (LargeCar ⊓ Manual ⊓ NonHybrid ⊓ Suv)
Fig. 1: Refinement Operator
We use a common ILP scoring function, P × (P − N), where P is the number of positive examples covered, and N the number of negative examples covered. In the case that a solution has the same score as another alternative, Aleph will only return the first solution found. In our algorithm, we consider all the non-redundant hypotheses that are consistent with the examples (i.e. that cover zero negative and more than two positive examples). The search will not stop until all the possible combinations have been considered.
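The scoring and consistency checks can be sketched as below. Here `hypothesis_covers` is a hypothetical stand-in for the reasoner's membership check, which the paper describes but does not show in code:

```python
def score(hypothesis_covers, positives, negatives):
    """The common ILP scoring function P * (P - N)."""
    P = sum(1 for e in positives if hypothesis_covers(e))
    N = sum(1 for e in negatives if hypothesis_covers(e))
    return P * (P - N)

def is_consistent(hypothesis_covers, positives, negatives):
    """Consistent: covers zero negatives and more than two positives."""
    P = sum(1 for e in positives if hypothesis_covers(e))
    N = sum(1 for e in negatives if hypothesis_covers(e))
    return N == 0 and P > 2

covers_even = lambda e: e % 2 == 0    # toy coverage predicate
score(covers_even, [2, 4, 6, 8], [1, 3])          # P=4, N=0 -> 16
is_consistent(covers_even, [2, 4, 6, 8], [1, 3])  # True
```

Note that P × (P − N) rewards broad positive coverage while penalising each covered negative by a factor of P.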
If we have not yet found a consistent hypothesis, we continue to refine the one with the highest non-negative score, which means that we add a pair of literals to constrain each of the two objects in the relation. We stop at 2 literals each for the Domain and Range (this is the same as Aleph's default clause length of 5). Similarly to Aleph, we also treat any examples for which we cannot find a consistent generalisation as exceptions. In this case, we add the bottom clause as the consistent rule.
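The refinement loop just described can be sketched loosely as follows. This is a simplified toy, assuming abstract `refine`, `score` and `consistent` callables; the real algorithm additionally prunes redundant candidates and exhausts all combinations at each level.

```python
def learn(candidates, refine, score, consistent, max_literals=2):
    """Cautious general-to-specific search sketch: return consistent
    hypotheses if any exist; otherwise refine the best non-negatively
    scored candidate (adding one literal to each side of the relation),
    up to max_literals per side."""
    found = [h for h in candidates if consistent(h)]
    frontier = [h for h in candidates if not consistent(h)]
    depth = 1
    while not found and depth < max_literals and frontier:
        best = max(frontier, key=score)
        if score(best) < 0:
            break
        frontier = refine(best)   # specialise by one literal pair
        found = [h for h in frontier if consistent(h)]
        depth += 1
    return found  # if empty, the caller falls back to the bottom clause
```

With toy stubs (hypotheses as integers, refinement as multiplication), `learn([3, 4], lambda h: [h*2, h*3], lambda h: h, lambda h: h >= 10)` refines the best candidate 4 into [8, 12] and returns [12].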
4 Algorithm complexity
We implement our algorithm in one of the DL family of languages, namely ALC (attributive language with complement) [14], the basic DL language which has the least expressivity. ALC allows one to construct complex concepts from simpler ones using various language constructs. The capabilities include direct or indirect expression of, e.g., concept disjointness and the domain and range of roles, including the empty role.
The most expensive step is membership checking: for every candidate hypothesis, the reasoner needs to check which examples the hypothesis covers in order to score it. One possible way to reduce the complexity is to minimise the search tree and check for redundancy without reducing the accuracy.
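One illustrative way to implement such a redundancy check (our own sketch, not from the paper) is to memoise coverage results keyed on a canonical form of each hypothesis, so that syntactically different but equivalent conjunctions trigger only one reasoner call:

```python
from functools import lru_cache

def canonical(domain_classes, range_classes):
    """Order-independent key for a conjunction-of-classes hypothesis."""
    return (frozenset(domain_classes), frozenset(range_classes))

@lru_cache(maxsize=None)
def coverage(key):
    # Stand-in for the expensive reasoner membership check: in the real
    # system this would query the ontology reasoner once per unique key.
    domain, range_ = key
    return len(domain) + len(range_)   # dummy result for illustration

k1 = canonical(["Manual", "Hybrid"], ["Suv"])
k2 = canonical(["Hybrid", "Manual"], ["Suv"])
# k1 == k2, so the (simulated) reasoner is consulted only once for both.
```

Since conjunction in ALC is commutative, the frozenset key safely identifies reordered variants of the same hypothesis.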
5 Evaluation
Dataset. We use two publicly available preference datasets [3] [4]. Both the sushi and the car datasets have 10 items to rank, which leads to 45 preference pairs per user. We take 60 users from each dataset and perform 10-fold cross validation for each user's individual preferences. The car dataset has 4 attributes: body type, transmission, fuel consumption and engine size, while the sushi dataset has 7 attributes: style, major, minor, heaviness, how frequently consumed by a user, price and how frequently sold. Despite the difference in the number of attributes in the two datasets, we found that the maximum clause length of 4 (in Aleph and in our algorithm) is sufficient to produce consistent hypotheses.
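The 45 pairs per user follow directly from choosing 2 of the 10 ranked items, which a quick check confirms:

```python
from math import comb

# 10 items to rank give C(10, 2) = 45 unordered preference pairs per user.
n_items = 10
n_pairs = comb(n_items, 2)   # 45
```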
Evaluation method. The goal of this evaluation is to assess the predictive accuracy of each algorithm on the preference learning problem. We compare our algorithm with three other machine learning algorithms: SVM, the Matlab CART Decision Tree (DT) learner, and Aleph. SVM is a very common statistical classification algorithm that is used in many domains, and similar work on pairwise preference learning by Qian et al. [8] shows that SVM can also be used to learn in this domain. Both DT and Aleph can be included in the evaluation since both of them are logic-based learners, the first in propositional logic and the latter in first-order logic.
We learn each individual user's preferences and test them using 10-fold cross validation. The result is shown in Table 1 and Figure 2a. According to the ANOVA test, there is a significant difference amongst the algorithms, with a p-value of 2.0949 × 10⁻²¹ for the car dataset and 7.3234 × 10⁻³⁶ for the sushi dataset.
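Such a comparison amounts to a one-way ANOVA over the per-user accuracy scores of each algorithm; with SciPy one would call `scipy.stats.f_oneway(svm, dt, aleph, ours)`. As the per-user scores are not reproduced here, the sketch below computes the F statistic directly on synthetic data for illustration only:

```python
def f_statistic(*groups):
    """One-way ANOVA F statistic: between-group variance over
    within-group variance, with the usual degrees of freedom."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total observations
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

svm_acc = [0.70, 0.72, 0.68]   # synthetic per-user accuracies
ours_acc = [0.90, 0.88, 0.92]
F = f_statistic(svm_acc, ours_acc)
```

A large F (here 150 for the toy data) corresponds to the tiny p-values reported above.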
Table 1: Mean and standard deviation of the 10-fold cross validation test (columns: SVM, DT, Aleph, our algorithm)
Fig. 2: Evaluation results; panel (b) shows accuracy by varying the number of training examples
We also perform several experiments with the algorithms by varying the proportion of training examples and testing on 10% of the examples. For a more robust result, we validate each cycle with 10-fold cross validation. The result of these experiments is shown in Figure 2b. We show that our algorithm still works better even with a smaller number of training examples.
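An experiment of this shape could be scripted as in the sketch below, where `fit` and `accuracy` are hypothetical placeholders for the actual learner and its evaluation (the paper's own pipeline additionally wraps each cycle in 10-fold cross validation):

```python
import random

def learning_curve(pairs, fractions, fit, accuracy, seed=0):
    """For each training fraction, train on that share of the preference
    pairs and evaluate on a fixed 10% held-out split."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, len(shuffled) // 10)          # fixed 10% test split
    test, pool = shuffled[:n_test], shuffled[n_test:]
    results = {}
    for f in fractions:
        train = pool[: max(1, int(f * len(pool)))]
        model = fit(train)
        results[f] = accuracy(model, test)
    return results
```

For example, with 45 pairs, a 10% split holds out 4 pairs, and the fractions 0.5 and 1.0 train on 20 and 41 pairs respectively.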
Sample solutions found. Our algorithm can produce more readable results for a novice user compared to Aleph. An example of a consistent hypothesis found by our algorithm is shown below:

Automatic ⊓ Hybrid betterthan MediumCar ⊓ Suv
While Aleph produces rules such as:

betterthan(A,B) :- hasfuelcons(B,nonhybrid), hasbodytype(B,suv).
6 Conclusion and Further Work
In this paper, we have shown that the implementation of ILP in DL can be useful to learn a user's preferences from pairwise comparisons. We are currently working to address the following limitations of our algorithm:
– We only consider a one-level class hierarchy in the ontology, for simplicity. In the real world, the class hierarchy can be more complex.
– Currently, our algorithm uses the Closed World Assumption, which makes it easier to find a consistent hypothesis. This is not in line with the fact that most DL-based knowledge bases and their reasoners operate under the Open World Assumption.