Automatic ingredient replacement in

digital recipes: combining machine

learning with expert knowledge

Swaan [email protected]

10437495

June 23, 2017

Master Information StudiesData Science

Faculty of ScienceUniversity of Amsterdam

Internally supervised byDr. Maarten Marx

University of Amsterdam

Externally supervised byDr. Vladimir Nedovic

Flavourspace


Abstract

Over the past years, people have become increasingly aware of their eating habits and the corresponding effects on mental and physical health. This thesis argues that eating habits could be further improved by user empowerment, which could be achieved by giving people the possibility to adjust recipes to their needs and preferences. One of the most prominent examples of recipe adjustment is ingredient substitution when the proposed ingredient is unavailable or contains allergenic substances. This thesis explores how to identify suitable ingredient substitutes in a broad range of recipes. The following two methods have been implemented and evaluated: the traditional use of domain expert knowledge and the data-driven method of word embeddings. The main assumption of the word embedding method was inspired by the distributional hypothesis in linguistics, namely that words that are often used in similar contexts tend to convey similar meanings. In the context of cooking, this hypothesis suggests that ingredients used in similar recipes are likely to be suitable substitutes for each other. The two methods were extensively evaluated with a user test, which led to an adaptation of the expert knowledge model and, consequently, to its better performance.


1 Introduction

1.1 Motivation

Over the past years, there has been an increasing awareness of the effects of eating habits on mental and physical health. People, but also companies and the government, are more and more concerned with consuming and promoting 'healthy' eating standards. Improvement of eating habits could be further encouraged by user empowerment. This thesis argues that user empowerment could be achieved by providing people the possibility to easily adjust recipes to their needs and preferences.

Studies show that home cooking is a complex process. Recipe choice and food consumption are influenced by different factors such as time, food preferences, allergies, eating cultures, cooking equipment and dependency upon seasonal products. A way to simplify cooking and enhance user empowerment is to provide different substitutes for ingredients that are, for example, unavailable or contain allergenic substances. However, how can we offer the right substitute for certain ingredients in such a complex process?

Flavourspace (FS), an Amsterdam-based company, is currently working on this issue. The company aims to improve cooking practices using artificial intelligence. In collaboration with Flavourspace, this research has resulted in an ingredient substitution system to enhance their recipe search engine. This engine assists users in finding a fitting recipe, which could be further improved by making the recipes adaptable, so customers can easily change ingredients to their own tastes. If a user, for example, dislikes coriander, the system can offer a substitute ingredient such as parsley. Other applications arise when the user wants to reduce the calories of a certain dish or prefers a vegetarian or vegan version.

This research has used two methods for obtaining ingredient substitutes, namely expert knowledge (EK) from a domain expert and the data-driven method of word embeddings (WE). The expert knowledge method is based on the implicit and explicit substitution rules derived from the expert knowledge source Cook's Thesaurus, which will be explained later on. Using word embeddings to extract substitution rules is inspired by the distributional hypothesis in linguistics: words that occur in similar contexts tend to convey similar meanings. When applying this hypothesis to cooking practices, ingredients used in similar recipes are likely to be suitable substitutes for each other. The two methods are evaluated with a user test and combined in an ingredient substitution system.


1.2 Research Question

The design of the ingredient substitution system is based on the following question:

– How can we develop a system that effectively determines ingredient substitutes for recipes based on a combination of word embeddings and expert knowledge?

To answer this question, this research proposes the following sub-questions:

1. Can the expert data also support implicit substitution rules, next to the explicit substitution rules defined by the expert?

2. Which similarity formula can best be applied when calculating the distance between two vectors in the word embedding model, with the aim of ranking the vectors to find ingredient substitutes?

3. Which of the following systems results in better performance: the expert knowledge system or the word embedding system?

1.3 Overview

The following Section 2 provides an extensive literature review of earlier studies on this issue. Based on this theoretical background, several experiments are designed in which the two ingredient substitution systems are evaluated. Section 3 is an in-depth explanation of the methodology of the substitution system and the designed experiments. The results of these experiments are discussed in Section 4. Finally, Section 5 presents a conclusion based on the outcome of these experiments.

2 Related work

2.1 Ingredient substitution in recipes

Earlier studies have explored different methods for obtaining ingredient substitutes in recipes. One method used to extract substitute ingredients from recipes is network analysis [37]. The substitute network is derived from user-generated suggestions for modifications. This substitution network is then further used to uncover novel recommendation algorithms suitable for recipe recommendations. However, the network itself has not been evaluated.


Another approach to extracting substitutes uses a statistical model [5]. This method shows that the topic densities of recipe documents' topic mixtures can be used for ranking candidate substitutes drawn from expert-generated rules. The model uses a topic space in which recipes are placed, and rankings are generated by considering the closest topics according to the Kullback-Leibler divergence.

Thirdly, Shidochi et al. [35] proposed an algorithm to extract substitution ingredients from recipes. Their method was based on matching the cooking practices with the corresponding ingredient. According to their hypothesis, substitutable ingredients are subjected to the same processing methods.

Furthermore, other research explores the effectiveness of using ontologies to improve ingredient substitution [9, 2, 7, 15, 41]. The first preliminary results of this research were positive; however, this method would need a more extensive evaluation. Additionally, it is a very costly method to implement.

To the best of our knowledge, this research is the first to extensively evaluate the used methods and conduct an extensive user test. Also, using word embeddings, the substitution relationship is identified directly from the data instead of relying on external knowledge sources [5, 37, 9, 2, 7, 35]. This is a very cost-effective approach due to its reliance on data rather than human expertise.

2.2 Expert knowledge

Expert knowledge is used in a wide range of scientific domains such as ecological research [18, 40], hydrology [8] and medicine [1]. It provides a valuable source of information, often because of the complexity of problems or a lack of data. Expert knowledge is defined as “the substantive information on a particular topic that is not widely known by others” [25].

With the help of knowledge-representation theory, expert knowledge can be transformed into a representation that is understandable for computer systems. In knowledge representation, different types of knowledge are defined, such as objects, events, procedures, relations, mental states and meta knowledge [33]. These types provide different sorts of information. A specific type of object information is category and subcategory information. From this categorical information, a taxonomy can be constructed in which all categories and subcategories are structured. By identifying specific properties of the objects and the relations between them, the researcher can construct an ontology, resulting in the substitution system as mentioned


above [5, 37, 9, 2, 7, 35].

In regard to cooking practices, expert knowledge is most often used as a source of ingredient substitute information. In recipes provided by recipe books or websites, authors often provide complementary or possible replacements for the proposed ingredients. The quality of such a substitute depends on the subjective factor of taste. Until now, there has been no prior research on the satisfaction of users with the substitutes offered in recipes.

2.3 Word embedding

Vector-based word representations have a long tradition of usage in the Natural Language Processing (NLP) research community. The vectors are used to compute similarity between terms, but can also be used as a representational basis for downstream NLP tasks like clustering, POS tagging, classification and sentiment analysis. Recently, there has been an increasing usage of word embeddings as input in machine learning tasks [29, 20, 3].

Word embeddings are based on the idea that contextual information constitutes a representation of linguistic items. This idea is derived from the distributional hypothesis in linguistics: words occurring in similar contexts tend to have similar meanings [14]. This hypothesis found its origins in several works published in the 1950s by Zellig Harris, John Firth, and Ludwig Wittgenstein [14, 12, 39], and has previously been explored by Flavourspace using factor analysis and topic modelling [5].

A popular training technique for word embeddings, word2vec [30], consists of a 2-layer neural network that is trained to identify a certain word in relation to its context. A neural network cannot be fed with word strings, so each word has to be transformed into a form the network can make sense of. Every unique word in the training corpus is represented as a one-hot vector. This vector has the length of the number of unique words in the corpus: it has a "1" in the position of the corresponding word, and 0s in all other positions. The output of the network is a single vector with the length of the number of unique words in the training corpus. If two different words are used in similar contexts, that is, when the same words are likely to appear in the same setting, the model will output similar results for these words.
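The one-hot input encoding described above can be sketched as follows. This is an illustrative example, not code from the thesis; the three-word vocabulary is hypothetical.

```python
def one_hot(word, vocab):
    """Return the one-hot vector for `word` given an ordered vocabulary."""
    vec = [0] * len(vocab)          # one position per unique word in the corpus
    vec[vocab.index(word)] = 1      # "1" at the position of the word itself
    return vec

# Hypothetical mini-vocabulary of a tiny recipe corpus.
vocab = ["coriander", "parsley", "basil"]
print(one_hot("parsley", vocab))  # → [0, 1, 0]
```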

Word2vec has been widely used because it achieves high accuracy on a wide range of tasks [32] and is robust across a wide range of semantic tasks [34]. The aim of this thesis is finding ingredient substitutes with word2vec. Although this exact task has never been examined before, there have been experiments with finding lexical substitutions with word embeddings, which


led to positive results [27, 26, 4, 17].

A very fascinating property of word2vec is its ability to capture linguistic relationships of both a semantic (king is to man as queen is to woman) and a syntactic nature (ran is to run as laughed is to laugh) [29, 22]. Analogical reasoning is therefore a promising line of research, since it can be used for many tasks like word sense disambiguation [11], morphological analysis [19], semantic search [6], and even for broad-range detection of both morphological and semantic features [21]. However, it remains unclear to what extent word embedding models are able to capture relations between words. Research has shown that derivational and lexicographic relations such as synonymy remain a major challenge [16, 13]. This thesis examines whether analogical reasoning improves the task of finding ingredient substitutes.

3 Methodology

Figure 2 illustrates the flowchart of the proposed method of this research. The following Section 3.1 will first explain the data that has been used.

The logic combiner, where all input comes together, ranks different substitutions and eventually outputs an ordered list of substitutes. The first input to the logic combiner consists of the substitution rules derived from the domain expert knowledge. The methodology for this data is described in Section 3.2. This section also describes the methodology of the experiments conducted to find implicit substitution rules, next to the rules explicitly defined by the expert, addressing the first sub-question proposed in this research.

The second input consists of the word embedding substitution rules, derived from WE models and explained in Section 3.3. An experiment has been conducted to determine which combination of model and similarity measure performs best, addressing sub-question 2. A more detailed explanation of the logic combiner is given in Section 3.4. Finally, a user test has been performed to decide how to rank potential substitution candidates within the logic combiner. The outcome of the user test will answer the third sub-question of this research. An explanation of the user test evaluation method is given in Section 3.5.

3.1 Data

For this research, Cook's Thesaurus (CT) will function as the source of expert knowledge. CT is a cooking encyclopedia that contains information about 2373 ingredients. The data has been crawled, parsed and tokenized for this thesis.

Fig. 2: Flowchart of the proposed methodology. In the lower left corner of the grey boxes, the corresponding section number can be found.

The CT proposes possible substitutes for certain ingredients, often accompanied by a note on how the substitute should be prepared and how it would influence the end result of the dish. It also provides synonyms for the ingredient. An example CT substitute can be found in Figure 3. As explained in Section 2.2, there are different types of knowledge, several of which can be found in the CT. The first one is object knowledge, which consists of both specific and general object knowledge. Specific object knowledge comprises the notes given on a certain ingredient. General object knowledge comprises the category and subcategory an ingredient belongs to. For example, chili bean is of the subcategory dry beans. Dry beans is in turn a category of legumes & nuts. In this manner a taxonomy, a system of categories and subcategories, is constructed. Next to the object information, the main reason to use CT as a source of information is the relational information it provides between objects. Object relations that can be derived from the expert data are substitute relationships and synonym relationships. For example: SubstituteOf(pinto bean, chili bean) and SynonymOf(pink bean, chili bean).
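One way the extracted CT knowledge could be represented is sketched below. The dictionaries and their contents are hypothetical and only illustrate the taxonomy plus the SubstituteOf/SynonymOf relations named in the text; they are not the actual parsed CT dataset.

```python
# Hypothetical representation of the crawled CT data (names are illustrative).
taxonomy = {"chili bean": ["legumes & nuts", "dry beans"]}   # category path
substitute_of = {"chili bean": {"pinto bean"}}               # SubstituteOf(pinto bean, chili bean)
synonym_of = {"chili bean": {"pink bean"}}                   # SynonymOf(pink bean, chili bean)

def subcategory(ingredient):
    """Most specific (last) category in the ingredient's taxonomy path."""
    return taxonomy[ingredient][-1]

print(subcategory("chili bean"))  # → dry beans
```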


Fig. 3: Screenshot of the information provided by CT for chili bean; for further reference see: http://www.foodsubs.com/Beans.html.

The second data source is a collection of 112 000 recipes from Flavourspace, used to train the word embedding model. The collection is obtained by scraping and parsing recipes from different websites (i.e. Allrecipes, Epicurious, MyRecipes, Jamie Oliver, and others); the recipes are represented as lists of ingredients. For example, a Caesar salad would be represented as: tuna, vegetable oil, kosher salt, black pepper, tomato, egg, ciabatta bread, mayonnaise, olive tapenade.

3.2 Expert knowledge based substitutions

The ingredients and the corresponding substitute relationships form an ingredient network, with the ingredients as nodes and the substitute relationships as edges. The ingredient network is based on ingredient relations derived from the expert knowledge and consists of directed and undirected edges. If ingredient A is a substitute for ingredient B, there will be an edge going from A to B. Some ingredient pairs are linked both ways, with directed edges going in both directions; these could also be interpreted as a single undirected edge. For other pairs, the relationship is only given in one direction. The number of edges going into a node is known as the indegree of that node, and the number of edges coming out of a node as the outdegree.
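A minimal sketch of this directed network and its degree counts, using hypothetical edges (the edge A → B reads "A is a substitute for B"):

```python
from collections import defaultdict

# Illustrative edge list, not the real CT data.
edges = [("pinto bean", "chili bean"), ("chili bean", "pinto bean"),
         ("kidney bean", "chili bean")]

out_edges = defaultdict(set)   # v -> nodes v points to
in_edges = defaultdict(set)    # v -> nodes pointing at v
for a, b in edges:
    out_edges[a].add(b)
    in_edges[b].add(a)

def indegree(v):
    return len(in_edges[v])

def outdegree(v):
    return len(out_edges[v])

print(indegree("chili bean"), outdegree("chili bean"))  # → 2 1
```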

3.2.1 Explicit substitution rules

The expert provides substitutes for certain ingredients: the explicit substitution rules. CT provided substitution rules for 2373 unique ingredients, amounting to 14536 explicit substitution rules.

Because a domain expert has written the explicit substitute relations, they are considered to be very reliable. The explicit substitution rules that can be derived from the example given in Figure 3 are shown in Figure 4.

Fig. 4: Substitution rules derived from the explicit substitution rules of chili bean, given in Figure 3.

The first column contains the categories of the ingredient, starting with the main category and ending with the subcategory. The second column contains the ID of the ingredient. As mentioned before, an ingredient can have multiple names. To prevent offering synonyms to the user, the first ingredient name mentioned by the expert is taken as the ID, because it is the most commonly used name. The third column contains the ingredient name out of all synonym names given. In the fourth column the substitute ingredients are listed. The last column contains the type of rule, for example an explicit substitution rule, an implicit substitution rule or a word embedding substitution rule. All ingredients are lemmatized to simplify future comparison. So, if the expert had given the substitute pinto beans, lemmatization would have turned it into pinto bean.

3.2.2 Implicit substitution rules

Other substitution rules can be derived as implicit substitution rules. There has been no proof for these implicit substitution rules in cooking literature. Various experiments were conducted as part of this research in order to determine whether or not to apply these implicit substitution rules. These experiments answer our first sub-question.

The first implicit substitution hypothesis is based on symmetry. Applying this rule would mean that all node pairs that have either a directed edge from A to B, from B to A, or in both directions would get an undirected edge, turning the graph into an undirected graph. To determine whether or not to apply this rule, an experiment is designed to validate it on the CT data. For each node v, the number of substitute edges going in both directions, deg^{-+}(v), is divided by the indegree, deg^{+}(v). The symmetry of the whole ingredient graph G is the average symmetry over all N nodes in the network:

$$S(G) = \frac{1}{N} \sum_{v=1}^{N} \frac{\deg^{-+}(v)}{\deg^{+}(v)}$$
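The symmetry measure can be sketched directly from its definition: for each node, the fraction of its incoming substitute edges that are reciprocated, averaged over all nodes with at least one incoming edge. The tiny example graph is hypothetical.

```python
def symmetry(in_edges, out_edges, nodes):
    """Average fraction of reciprocated incoming edges over all nodes."""
    total, counted = 0.0, 0
    for v in nodes:
        incoming = in_edges.get(v, set())
        if not incoming:               # nodes without incoming edges contribute nothing
            continue
        # Edge u -> v is reciprocated if the edge v -> u also exists.
        reciprocated = sum(1 for u in incoming if u in out_edges.get(v, set()))
        total += reciprocated / len(incoming)
        counted += 1
    return total / counted if counted else 0.0

# Hypothetical graph: a <-> b reciprocated, a -> c not.
out_edges = {"a": {"b", "c"}, "b": {"a"}}
in_edges = {"a": {"b"}, "b": {"a"}, "c": {"a"}}
print(round(symmetry(in_edges, out_edges, ["a", "b", "c"]), 3))  # → 0.667
```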

The second implicit substitution hypothesis is based on triadic closure [10]. The principle of triadic closure, applied to this ingredient network, entails that if two ingredients have a substitute relation to a certain ingredient in common, there is an increased likelihood that they also have a substitute relation to each other. A method to measure the presence of this triadic closure is the clustering coefficient [31, 38]. The clustering coefficient of node A is defined as the probability that two randomly selected nodes that have an edge with A will also have an edge with each other. The clustering coefficient of a node ranges from 0 (when none of the node's substitutes are substitutes of each other) to 1 (when all of the node's substitutes are substitutes of each other). For node v, let λ_G(v) be the number of triangles on v ∈ G for undirected graph G; that is, λ_G(v) is the number of subgraphs of G with 3 edges and 3 vertices, one of which is v. Let τ_G(v) be the number of triples on v ∈ G; that is, τ_G(v) is the number of subgraphs with 2 edges and 3 vertices, one of which is v and such that v is incident to both edges. Then, the clustering coefficient is defined as:

$$C(v) = \frac{\lambda_G(v)}{\tau_G(v)}$$

The clustering coefficient of the whole ingredient graph G is the average clustering coefficient over all N nodes in the network:

$$C(G) = \frac{1}{N} \sum_{v=1}^{N} C(v)$$

When applying the rule of triadic closure, all the missing edges would be added to the database. To determine whether this rule should be applied, the average clustering coefficient is calculated over all nodes in the network. This calculation is applied to the directed graph based on the explicit substitution rules and to the undirected graph constructed after applying rule one.
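For the undirected case, the clustering coefficient can be sketched on an adjacency-dict graph: triangles on v divided by connected triples centred on v. The three-node triangle used as the example is hypothetical.

```python
from itertools import combinations

def clustering(adj, v):
    """Local clustering coefficient C(v) = triangles on v / triples on v."""
    neighbours = adj.get(v, set())
    if len(neighbours) < 2:
        return 0.0
    triples = len(neighbours) * (len(neighbours) - 1) / 2
    # A triangle exists for each neighbour pair that is itself connected.
    triangles = sum(1 for a, b in combinations(neighbours, 2)
                    if b in adj.get(a, set()))
    return triangles / triples

def average_clustering(adj):
    """C(G): mean of C(v) over all nodes."""
    return sum(clustering(adj, v) for v in adj) / len(adj)

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}  # a triangle
print(average_clustering(adj))  # → 1.0
```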

The third rule is based on the taxonomy that can be derived from the Cook's Thesaurus. The taxonomy of CT is the classification and sub-classification of ingredients in an ordered system. Distance through the taxonomy denotes a form of gastronomic similarity. The assumption is that ingredients in the same category, e.g. types of nuts or types of greens, have a higher probability of being interchangeable [9, 2, 7, 15, 41]. To determine the strength of this assumption in the case of CT, the taxonomic strength is calculated. For node v, the number of incoming edges from nodes with the same subcategory as v, deg^{+}_{similar}(v), is divided by the total indegree, deg^{+}(v). The taxonomic strength of the whole ingredient graph G is the average taxonomic strength over all N nodes:

$$T(G) = \frac{1}{N} \sum_{v=1}^{N} \frac{\deg^{+}_{similar}(v)}{\deg^{+}(v)}$$
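The taxonomic strength can be sketched the same way: the fraction of a node's incoming substitute edges that come from ingredients in the same subcategory, averaged over the nodes. The subcategory labels and edges below are hypothetical.

```python
def taxonomic_strength(in_edges, subcat, nodes):
    """Average fraction of same-subcategory incoming edges over all nodes."""
    scores = []
    for v in nodes:
        incoming = in_edges.get(v, set())
        if not incoming:
            continue
        same = sum(1 for u in incoming if subcat.get(u) == subcat.get(v))
        scores.append(same / len(incoming))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical data: one of chili bean's two substitutes shares its subcategory.
subcat = {"chili bean": "dry beans", "pinto bean": "dry beans", "tofu": "soy"}
in_edges = {"chili bean": {"pinto bean", "tofu"}}
print(taxonomic_strength(in_edges, subcat, ["chili bean"]))  # → 0.5
```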

3.3 Word embedding based substitutions

Two models are used to obtain ingredient substitutes through word embeddings. The first word embedding model is based on the 112 000 ingredient lists of recipes mentioned before and is referred to as the Recipe model.

The second word2vec model is trained by Google and is freely available [30, 29, 28]. Its dictionary is based on a corpus of about 100 billion words from Google News. Using this dictionary may seem like a surprising choice, since this research only needs very specific information on ingredients. However, Google News also offers a lot of recipes. It is not possible to look into the exact corpus the Google model uses, but because the corpus is so extremely extensive, we assume that it contains the information needed for this task.

To find ingredients that are substitutes of each other, the two models above are compared. For calculating similarities between vectors, three possible similarity measures are implemented, explained in Section 3.3.1. The performance of each combination of the two models and three similarity measures is evaluated with three evaluation metrics, see Section 3.3.2.

3.3.1 Similarity measures

To find substitutes for ingredient A, first the vector location of A is obtained from the word embedding model. Possible substitutes are found by finding vectors that are similar to vector A. For calculating similarity, different formulas are used. Finally, the output is a list of vectors, ranked from high to low similarity.


The first similarity measure is cosine similarity. For ingredient A, possible substitutes B are scored with the following cosine similarity formula:

$$\cos(A, B) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
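The cosine formula translates directly into code; this sketch works on plain Python lists rather than the actual embedding vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))  # same direction → 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal → 0.0
```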

The other two proposed similarity measures are based on the linguistic regularities explained in Section 2.3. The idea is that word pairs sharing a specific relation will have a similar offset in the vector space. This is still a relatively new idea; this thesis further examines whether it holds for searching ingredient substitutes.

To calculate relational similarity, a word pair is needed whose offset resembles the ingredient substitute relationship. To find such word pairs, the cosine similarity of all the ingredient substitution pairs proposed by the domain expert is computed, and the top hundred pairs with the highest cosine similarity are selected.

Given the analogy 'A is to B as C is to D', the following measures are proposed for finding a possible substitute D for ingredient C, given the predefined substitution pair (A, B). The first formula, CosAdd, proposed by Mikolov et al. [29], calculates the similarity and ranks candidates according to:

$$\mathrm{CosAdd}(A : B, C : D) = \cos(B - A + C, D)$$

The CosAdd measure can be deconstructed into the summation of three cosine similarities, where in practice one of the three terms often dominates the sum. To overcome this bias, Levy and Goldberg [23] proposed the CosMul formula:

$$\mathrm{CosMul}(A : B, C : D) = \frac{\cos(B, D) \cos(C, D)}{\cos(A, D) + \varepsilon}$$
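Both analogy scores can be sketched on top of a plain cosine function. This is an illustrative implementation of the two formulas above on list vectors, not the thesis code; the ε default is an assumption.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def cos_add(A, B, C, D):
    """CosAdd: cosine between the analogy target B - A + C and candidate D."""
    target = [b - a + c for a, b, c in zip(A, B, C)]
    return cosine(target, D)

def cos_mul(A, B, C, D, eps=1e-3):
    """CosMul: multiplicative combination, eps avoids division by zero."""
    return cosine(B, D) * cosine(C, D) / (cosine(A, D) + eps)

# Toy vectors: the pair (A, B) has the same offset as (C, D).
A, B, C, D = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]
print(cos_add(A, B, C, D))  # → 1.0
```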

To determine whether the Google or the Recipe model should be used, and which similarity method should then be applied, three quantitative performance metrics are evaluated. For CosAdd and CosMul, the metrics are calculated for the hundred selected word pairs.

The outcome of this evaluation shows whether using linguistic regularities is an improvement over cosine similarity. Additionally, it shows which word pair and which of the two models performs best. The best-performing model will be used in the user test and implemented in the logic combiner.


3.3.2 Quantitative performance evaluation

As metrics for evaluating the different similarity methods, recall, precision and Mean Average Precision (MAP) are used. The substitution rules derived from the expert knowledge are used as the ground truth. For this task, precision is the fraction of retrieved substitute ingredients that are suitable replacements:

$$\mathrm{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$

Recall is the fraction of the relevant ingredients that are successfully retrieved:

$$\mathrm{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$$
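The two set-based metrics can be sketched as follows; the retrieved and relevant lists are hypothetical examples:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    return len(set(retrieved) & set(relevant)) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant items that were retrieved."""
    return len(set(retrieved) & set(relevant)) / len(relevant) if relevant else 0.0

retrieved = ["pinto bean", "tofu"]
relevant = ["pinto bean", "kidney bean", "rattlesnake bean"]
print(precision(retrieved, relevant))        # 1 of 2 retrieved is relevant → 0.5
print(round(recall(retrieved, relevant), 3)) # 1 of 3 relevant retrieved → 0.333
```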

Precision and recall are single-value metrics based on the whole list of ingredients returned by the system. The system implemented in this thesis is a ranked system, ordered by the similarity between the query ingredient vector and the found substitute ingredient vector. For systems that return a ranked sequence, it is desirable to also consider the order in which the returned documents are presented; that is why MAP is implemented.

When calculating the MAP, one first needs to calculate the average precision for each query, defined as the mean of the precision-at-k values computed after each relevant document is retrieved. The final MAP value is defined as the mean of the average precision over all queries in the test set:

$$\mathrm{ap@}n = \frac{\sum_{k=1}^{n} P(k) \cdot \mathrm{rel}(k)}{\min(m, n)}, \qquad \mathrm{rel}(k) = \begin{cases} 1 & \text{if the item at rank } k \text{ is relevant} \\ 0 & \text{otherwise} \end{cases}$$

where P(k) refers to the precision at cut-off k in the item list, and rel(k) is an indicator function equaling 1 if the item at rank k is a relevant document and zero otherwise. The summation is divided by the minimum of the number of relevant retrieved documents (m) and the number of predicted ingredients (n). The mean average precision for N ingredients at position n is the average of ap@n over all ingredients:

$$\mathrm{MAP@}n = \frac{1}{N} \sum_{i=1}^{N} \mathrm{ap@}n_i$$
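The ap@n and MAP@n definitions can be sketched as below; the ranked prediction lists and ground-truth sets are hypothetical:

```python
def ap_at_n(ranked, relevant, n):
    """Average precision at cut-off n for one query."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for k, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            hits += 1
            score += hits / k          # precision at cut-off k, counted at each hit
    return score / min(len(relevant), n) if relevant else 0.0

def map_at_n(queries, n):
    """queries: list of (ranked predictions, relevant set) pairs."""
    return sum(ap_at_n(ranked, rel, n) for ranked, rel in queries) / len(queries)

queries = [(["pinto bean", "tofu"], {"pinto bean"}),
           (["tofu", "kidney bean"], {"kidney bean"})]
print(map_at_n(queries, 2))  # (1.0 + 0.5) / 2 → 0.75
```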


For this task, precision is more important than recall. The system does not have to return all possible substitute ingredients, as only the top two are used. It is most important that the correct substitutes are offered at the top of the ranking, making MAP the most important evaluation metric. For this reason, all metrics are considered up to rank two (@2).

In addition to the ingredient level, evaluation is performed at the level of subcategories. This evaluation is based on the taxonomy hypothesis described in Section 3.2.2. For example, instead of the expert ground truth that chili bean can be replaced by rattlesnake bean, it could now be substituted by all ingredients from the subcategory dried beans, because rattlesnake bean belongs to that subcategory.

3.4 The logic combiner

The final ingredient substitute is determined using a logic combiner, which ranks methods according to their reliability. The reliability of the EK and WE approaches is, in turn, determined by a user test.

3.5 User test evaluation

A user test is conducted to decide whether the expert knowledge and word embedding substitutes are good enough to use in a system and, if so, which method performs better. This user test is designed with the help of professional market researchers and extensive testing.

The user test consists of seventeen questions. The first two questions assess the user's cooking level and cuisine preferences. The other fifteen questions, of which Figure 5 is an example, are used to evaluate the two methods.

For each of the four proposed substitutes, the user rates on a scale from 1 to 6 the extent to which they think the ingredient is a suitable substitute for the missing ingredient (1 = I would never use this as a replacement, 6 = This would be a perfect substitute for the missing ingredient). Two of these substitutes are provided by the EK model, and two by the WE model. In the example, mustard and roasted garlic are proposed by the EK model and cream cheese and sour cream by the WE model.

The supermarket case is used for various reasons. Firstly, it is a context all users know. Providing a clear context to respondents improves the accuracy of the answers, because respondents rely less on their own imagination [36]. Secondly, it excludes many side factors, as the proposed ingredients are all available in the supermarket. Beforehand, the user is also asked to assume that all ingredients are equal in price. In this manner, the user test tries to capture the effect of the quality of the substitution ingredient and externalize other influences like money or availability. Furthermore, users can choose the don't know option if they are unfamiliar with the product. In this manner, the results show whether one of the methods proposes many relatively unknown ingredients, indicating a lower usability of that method.

Fig. 5: Example question of the user test for the ingredient mayonnaise, asking the user to rate four possible substitutes, of which two are derived from the WE model and two from the EK model.

When all the results are collected, the average ratings of the answers are calculated. Assuming that users answer the questions randomly, the probability mass function for a rating A_i would be:

P(A_i = x) = \frac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}

This results in an expected mean of \mu_0 = E(A_i) = 3.5 and an expected variance of \sigma_0^2 = Var(A_i) = 2.917. The hypotheses of this experiment are therefore H_0: \mu = 3.5 and H_1: \mu \neq 3.5.

Since the sample of respondents is large, the standardized sample mean should be approximately standard normal under the null hypothesis that respondents rate randomly. The p-value is calculated by conducting a two-tailed Z-test. Using \alpha = 0.05, a p-value smaller than 0.05 results in the rejection of H_0, which would imply that the results are not random and can be meaningfully interpreted.
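The test statistic can be computed directly from the collected ratings. The sketch below assumes a plain list of integer ratings (don't know answers already removed) and approximates the standard normal CDF via the error function:

```python
import math

MU0 = 3.5                      # expected mean under random answering
SIGMA0 = math.sqrt(2.917)      # expected sd of a uniform rating in {1..6}

def z_test_two_tailed(ratings):
    """Two-tailed Z-test of H0: mu = 3.5 against random uniform answering."""
    n = len(ratings)
    mean = sum(ratings) / n
    z = (mean - MU0) / (SIGMA0 / math.sqrt(n))
    # two-tailed p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return mean, z, p
```

A sample whose mean sits exactly at 3.5 yields z = 0 and p = 1, while a large sample with mean 4 rejects H_0 at \alpha = 0.05.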


When calculating the mean, variance and p-value per question, the score of that question for a model is based on all the user ratings of the two answers obtained from that model. So in the example question, the ratings of mustard and roasted garlic are combined. Don't know answers are not taken into account when determining the N used in the Z-test. When calculating the overall mean, variance and p-value, the ratings from all fifteen questions on the answers proposed by the same model are combined.

4 Results

4.1 Expert knowledge experiments

To determine whether to apply the symmetry rule to the expert data, this research used the experiment explained in Section 3.2.2. The result of this experiment is an average symmetry coefficient of 0.43. This number implies that on average, for almost half of an ingredient's substitute relations, the reverse substitution relationship is also given by the expert. When manually checking the cases where the assumption does not hold, the missing substitute is often a lesser-known ingredient that is not offered by CT, probably because the expert suggests more common ingredients. Applying this rule to the data-set adds 3831 new substitute relationships, covering 689 ingredients.
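The symmetry coefficient can be sketched as follows, assuming the expert data is stored as a mapping from each ingredient to its set of listed substitutes (a hypothetical representation, not the thesis data structure):

```python
def symmetry_coefficient(subs):
    """Average fraction of an ingredient's substitutes that also
    list it back.  `subs` maps ingredient -> set of substitutes."""
    scores = []
    for ing, targets in subs.items():
        if targets:
            back = sum(1 for t in targets if ing in subs.get(t, set()))
            scores.append(back / len(targets))
    return sum(scores) / len(scores) if scores else 0.0
```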

For the second experiment, concerning triadic closure, the average clustering coefficient is calculated. The average clustering coefficient, calculated on the original directed graph based on CT, is 0.216. After applying the first implicit substitution rule, which turns the graph into an undirected network, the average clustering coefficient becomes 0.469, locating 1587 triangles. An example representing a small part of the ingredient network can be found in Figure 6. The Alphonso olive node has a clustering coefficient of 1 because its two neighbours are also connected with each other. Kalamanta olive has a clustering coefficient of 2/6 because, of the six possible connections between its neighbours, only two are actually made.
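The local clustering coefficient used in this example can be sketched for an undirected graph stored as adjacency sets; the representation is an assumption for illustration:

```python
def clustering_coefficient(adj, node):
    """Local clustering coefficient of `node` in an undirected graph
    given as adjacency sets: realised links among its neighbours
    divided by the number of possible links."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return links / (k * (k - 1) / 2)
```

A node whose two neighbours are themselves connected (as for Alphonso olive) scores 1.0; a node whose neighbours share no edges scores 0.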

The third experiment is based on the taxonomy of the ingredients. The average taxonomic strength, calculated with the formula presented in 3.2.2, is 0.72 when looking at the subcategory and 0.877 at the category level. This implies that on average, 72 to 88 percent of the substitutes of a given ingredient have the same subcategory or category as that ingredient.
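Analogously, the taxonomic strength can be sketched, assuming hypothetical mappings from ingredients to their substitutes and to their (sub)category labels:

```python
def taxonomic_strength(subs, category):
    """Average fraction of an ingredient's substitutes sharing its
    (sub)category; `category` maps ingredient -> category label."""
    scores = []
    for ing, targets in subs.items():
        if targets:
            same = sum(1 for t in targets
                       if category.get(t) == category.get(ing))
            scores.append(same / len(targets))
    return sum(scores) / len(scores) if scores else 0.0
```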

Fig. 6: Example of the clustering coefficients of different types of olives. Alphonso olive has a clustering coefficient of 1, and Kalamanta olive of 2/6.

Finally, only the first rule, concerning symmetry, is applied to the data, because new ingredients can be added to the database and applying this rule turns the network into an undirected graph. The second and third rule were not applied for two reasons: firstly, no new ingredients would have been added to the database, and secondly, applying these rules made the ingredient network too generic. The high score on the taxonomy experiment did, however, provide a basis for calculating the earlier proposed evaluation metrics at the level of subcategories.

4.2 Experiments on word embedding models and similarity measures

The results of MAP, recall and precision, evaluated at the level of the ingredient and the level of the subcategory, can be found in Appendix A.1. As explained before, MAP is considered the most important metric; those results are shown in Table 1.

The highest results for all performance metrics are achieved by the Google model when applying the CosAdd formula. The word pair used in this similarity measure is cointreau - curacao, both fruit liqueurs. Other word pairs that scored high on the evaluation metrics are: button mushroom - oyster mushroom, black mustard seed - brown mustard seed, and white chocolate - milk chocolate. Overall, the metrics do not score very high. When analyzing the metrics at the level of the subcategory, an improvement can be found. In both cases, the highest score is achieved by using the Google-trained model and the CosAdd formula to calculate similarity. Therefore, these settings are used in the user test.
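The two analogy-based measures can be sketched directly on raw vectors. This follows the 3CosAdd and 3CosMul objectives of Levy and Goldberg [23]; the shifted-cosine variant used for CosMul to keep scores positive is an implementation choice of this sketch, not necessarily the thesis setup:

```python
import math

def cos(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def rank_substitutes(vectors, query, pair, method="cosadd", eps=1e-3):
    """Rank candidate substitutes b* for `query` (b) using the analogy
    pair (a, a*), e.g. ('cointreau', 'curacao')."""
    a, a_star = pair
    scores = {}
    for cand, v in vectors.items():
        if cand in (query, a, a_star):
            continue
        cb = cos(v, vectors[query])
        ca = cos(v, vectors[a])
        cas = cos(v, vectors[a_star])
        if method == "cosadd":
            scores[cand] = cb - ca + cas                 # 3CosAdd
        else:
            # 3CosMul with cosines shifted into [0, 1]
            scores[cand] = ((1 + cb) / 2) * ((1 + cas) / 2) / ((1 + ca) / 2 + eps)
    return sorted(scores, key=scores.get, reverse=True)
```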


Tab. 1: Results of Mean Average Precision of the Google and Recipe model, using three different similarity measures.

(a) MAP applied at the ingredient level

              CoSim   CosAdd  CosMul
MAP   Google  0.047   0.055   0.055
      Recipe  0.031   0.050   0.045

(b) MAP applied at the subcategory level

              CoSim   CosAdd  CosMul
MAP   Google  0.208   0.229   0.222
      Recipe  0.174   0.173   0.166

Fig. 7: A 2-dimensional visualization of the Recipe word2vec model, constructed with the t-SNE algorithm, which is designed for visualizing high-dimensional data [24]. (a) Zoom-in on the representation of various types of onions. (b) Zoom-in on various types of rice and the ingredient red lentil.

4.3 Results user test

The user test had a total of 333 respondents. Each respondent answered 15 questions consisting of 4 proposed substitutes, two obtained from the expert knowledge model and two from the word embedding model, resulting in 333 * 15 * 4 = 19980 answers.

The results of the combined ratings of the 15 questions can be found in Table 2. Looking at the overall mean, the EK model performs slightly better than the WE model: the WE model scores slightly below 3.5, while the EK model averages almost 3.6.

Tab. 2: Results of the user test when all ratings from the 15 questions are summed per model. * means significant with α = 0.1, ** means significant with α = 0.05. DK gives the percentage of respondents that did not know the proposed substitute.

       Expert knowledge               Word embedding
Mean   Var    DK     p-value    Mean   Var    DK     p-value
3.59   3.08   18.0%  0.007**    3.44   3.13   19.9%  0.093*

The variance of the WE model is 0.05 higher than that of the EK model. For both models, the variance is high, which indicates that the data points are spread out from the mean and from one another. This could be caused by the subjectivity of a taste judgment, leading to strong differences between the ratings of the substitutes. The ingredients proposed by the expert were unknown 1274 times, which is 18.0% of the cases; for the WE model, this is 19.9%. In this respect, the usability of the expert knowledge model is slightly better.

The results of the user test shown per question can be found in Table 3. For each question, the mean, variance and p-value are calculated. Although the overall mean of the EK model is higher, the WE model scores better on the average mean per question due to its higher variance. In eight of the fifteen questions, the mean of the WE model was higher. Four times, both models had a mean below 3.5, which is considered insufficient. For both models, 10 of the 15 calculated means are significant. Concluding, neither model obviously outperforms the other, but the EK model performs slightly better than the WE model because its overall mean is higher and its percentage of don't know answers is lower.


Tab. 3: Results of the user test per individual question. * means significant with α = 0.1, ** means significant with α = 0.05. DK refers to the percentage of respondents that did not know the proposed substitute.

Expert knowledge Word embedding

mean var DK p-value mean var DK p-value

q1 2.90 2.67 43.2% 0.001** 3.32 3.02 25.8% 0.250

q2 2.42 2.45 18.9% 0.000** 2.71 2.92 15.0% 0.000**

q3 2.99 2.85 36.3% 0.003** 3.77 2.88 12.7% 0.059*

q4 3.72 2.53 19.3% 0.139 3.04 3.19 34.5% 0.005**

q5 3.50 3.01 11.7% 0.992 2.74 2.34 13.6% 0.000**

q6 2.88 2.84 7.4% 0.000** 3.78 2.78 8.9% 0.049**

q7 3.01 2.88 30.9% 0.003** 3.71 3.23 28.8% 0.190

q8 4.75 1.76 3.2% 0.000** 4.14 3.26 4.4% 0.000**

q9 3.77 2.95 16.5% 0.064* 3.96 2.77 28.9% 0.002**

q10 3.45 2.60 20.8% 0.719 4.31 2.29 5.5% 0.000**

q11 4.02 2.67 12.7% 0.000** 2.57 2.43 20.6% 0.000**

q12 3.88 2.90 13.6% 0.008** 2.89 2.45 29.7% 0.000**

q13 3.52 2.79 8.5% 0.873 3.48 2.84 10.0% 0.865

q14 3.33 2.75 20.1% 0.271 3.46 3.32 39.8% 0.826

q15 4.95 1.18 7.0% 0.000** 3.48 3.30 30.1% 0.936

5 Conclusion

This research aimed to answer the following main question:

– How can we develop a system that effectively determines ingredientsubstitutes for recipes based on a combination of word embeddings andexpert knowledge?

This question is answered by the following sub-questions:


1. Can the expert data also support implicit substitution rules, next to the explicit substitution rules defined by the expert?

2. Which similarity formula can best be applied when calculating the distance between two vectors in the word embedding model, intending to rank the vectors to find ingredient substitutes?

3. Which of the following systems results in a better performance: the expert knowledge system or the word embedding system?

To answer the first sub-question, three experiments were designed to test the three implicit substitution rules. This resulted in the implementation of the first rule, based on symmetry. The two other rules, based on triadic closure and taxonomy, were not applied: while evidence for these rules was found in the CT data, applying them caused the network to become too generic, and no other positive effects, like adding new ingredients, were found.

In order to answer the second sub-question, the performance of three similarity measures and two word embedding models was evaluated. The best results were achieved when applying the model trained by Google and using the CosAdd similarity formula, showing that the implementation of analogical reasoning improves the results of the substitute search task.

A user test was conducted to determine which model performed best, answering the third sub-question. The user test showed that the expert knowledge model performed slightly better: the overall mean of the EK ratings is slightly higher than that of the WE ratings. Yet, the WE model scores better on the average mean per question due to its higher variance. However, not all results were significant, and the WE model also had a higher percentage of substitutes that were unknown to the user. Concluding, if the EK model and the WE model suggest different substitutes, this research shows that the EK model should be applied first. The final flow of the logic combiner is visualized in Figure 8.

5.1 Discussion

When conducting a user test, both the results and the proposed questions should be critically examined. Firstly, because the context of the Flavourspace recipe search engine may differ from the context of choosing a substitute in the supermarket, extensive A/B testing of the questionnaire could help develop a more precise context and formulation of the questions.


Fig. 8: Final flow of the logic combiner. The ingredient that the user wants to substitute is the input of the logic combiner. The output is a ranked list of possible substitutes.

Secondly, the vast majority of the user test respondents are Dutch citizens, while the expert knowledge is derived from an American website. These two factors could possibly lead to a bias towards the traditional North American cuisine and similar cuisines like the Northern European cuisine. This bias should be taken into account when using the substitute system.

Lastly, the reason why the Google model outperformed the Recipe model might be its training corpus, which is much larger than that of the Recipe model. If a certain word is not frequently used in the corpus, its probability in the model will be lower anyway, irrespective of the actual substitute quality. Extending the Recipe corpus with more ingredient lists would improve the quality of the Recipe model.

5.2 Future work

The quality of a substitute ingredient also depends on the recipe it is used in. In these models, the substitution rules are determined based on the ingredient that should be replaced. Further research is needed to explore how the recipe could be taken into account when ranking the final substitutes.

Another improvement of the model is to incorporate more generic user characteristics, like vegetarianism or lactose intolerance. Using the taxonomy derived from the expert knowledge, the ingredient tree can exclude all categories and subcategories of meat or dairy. To incorporate this into the word embedding model, a similar theory of linguistic relations could be used.

Furthermore, the logic combiner could be developed in more detail. A differentiation can be made between implicit and explicit expert substitutions, of which the individual quality of both could be tested.

Also, the final decision of the user in the Flavourspace engine with regard to the substitute they prefer to use can be incorporated in the system. Over time, the system would be able to improve itself based on user judgments.

Lastly, the system could also propose the subcategory of the substitutes, instead of a specific substitute itself. Instead of offering two types of beans as a substitute for chili beans, the system would then offer a type of dried bean, as all given substitutes belong to that subcategory.

Concluding, this thesis has contributed to user empowerment in cooking practices by providing people the possibility to easily adjust recipes to their needs and preferences with a newly developed ingredient substitution system. This research has used two methods for obtaining ingredient substitutes: expert knowledge from a domain expert and the data-driven method of word embeddings. The expert knowledge, derived from Cook's Thesaurus, has been interpreted and tested in order to be usable by the Flavourspace search engine. At the same time, this research has also shown how a substitute could be derived from recipe data without the need for human expertise, which would be a highly cost-effective method for finding new ingredient substitutes.


References

[1] Fernando Alonso et al. “Combining expert knowledge and data miningin a medical diagnosis domain”. In: Expert Systems with Applications23.4 (2002), pp. 367–375.

[2] Fadi Badra et al. “TAAABLE: Text Mining, Ontology Engineering,and Hierarchical Classification for Textual Case-Based Cooking.” In:ECCBR Workshops. 2008, pp. 219–228.

[3] Yoshua Bengio, Aaron Courville, and Pascal Vincent. “Representationlearning: A review and new perspectives”. In: IEEE transactions onpattern analysis and machine intelligence 35.8 (2013), pp. 1798–1828.

[4] Chris Biemann and Martin Riedl. “Text: Now in 2D! a framework forlexical expansion with contextual similarity”. In: Journal of LanguageModelling 1.1 (2013), pp. 55–95.

[5] Corrado Boscarino et al. “Automatic extraction of ingredient’s sub-stitutes”. In: Proceedings of the 2014 ACM International Joint Con-ference on Pervasive and Ubiquitous Computing: Adjunct Publication.ACM. 2014, pp. 559–564.

[6] Trevor Cohen, Dominic Widdows, and Thomas Rindflesch. “Expansion-by-analogy: A vector symbolic approach to semantic search”. In: Inter-national Symposium on Quantum Interaction. Springer. 2014, pp. 54–66.

[7] Amelie Cordier et al. “Taaable: a case-based system for personalizedcooking”. In: Successful Case-based Reasoning Applications-2. Springer,2014, pp. 121–162.

[8] Louise Crochemore et al. “Comparing expert judgement and numericalcriteria for hydrograph evaluation”. In: Hydrological Sciences Journal60.3 (2015), pp. 402–423.

[9] Valmi Dufour-Lussier et al. “Improving case retrieval by enrichmentof the domain ontology”. In: International Conference on Case-BasedReasoning. Springer. 2011, pp. 62–76.

[10] David Easley and Jon Kleinberg. Networks, crowds, and markets: Rea-soning about a highly connected world. Cambridge University Press,2010.


[11] Stefano Federici, Simonetta Montemagni, and Vito Pirrelli. “Inferringsemantic similarity from distributional evidence: an analogy-based ap-proach to word sense disambiguation”. In: Proceedings of the ACL/EACLWorkshop on Automatic Information Extraction and Building of Lex-ical Semantic Resources for NLP Applications. 1997, pp. 90–97.

[12] John R Firth. “A synopsis of linguistic theory, 1930-1955”. In: (1957).

[13] Anna Gladkova, Aleksandr Drozd, and Satoshi Matsuoka. "Analogy-based detection of morphological and semantic relations with word embeddings: what works and what doesn't". In: Proceedings of NAACL-HLT. 2016, pp. 8–15.

[14] Zellig S Harris. “Distributional structure”. In: Word 10.2-3 (1954),pp. 146–162.

[15] P Javier Herrera et al. “JaDaCook: Java Application Developed andCooked Over Ontological Knowledge.” In: ECCBR Workshops. 2008,pp. 209–218.

[16] Maximilian Koper, Christian Scheible, and Sabine Schulte im Walde. "Multilingual reliability and 'semantic' structure of continuous word spaces". In: Proceedings of the 11th International Conference on Computational Semantics. 2015, pp. 40–45.

[17] Gerhard Kremer et al. "What Substitutes Tell Us: Analysis of an 'All-Words' Lexical Substitution Corpus". In: EACL. 2014, pp. 540–549.

[18] Petra M Kuhnert, Tara G Martin, and Shane P Griffiths. “A guide toeliciting and using expert knowledge in Bayesian ecological models”.In: Ecology letters 13.7 (2010), pp. 900–914.

[19] Jean-Francois Lavallee and Philippe Langlais. “Unsupervised morpho-logical analysis by formal analogy”. In: Multilingual Information Ac-cess Evaluation I. Text Retrieval Experiments (2010), pp. 617–624.

[20] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deep learning”.In: Nature 521.7553 (2015), pp. 436–444.

[21] Yves Lepage and Chooi-ling Goh. “Towards automatic acquisition oflinguistic features”. In: Proceedings of the 17th Nordic Conference onComputational Linguistics (NODALIDA 2009), eds., Kristiina Joki-nen and Eckard Bick. 2009, pp. 118–125.

[22] Omer Levy and Yoav Goldberg. “Neural word embedding as implicitmatrix factorization”. In: Advances in neural information processingsystems. 2014, pp. 2177–2185.


[23] Omer Levy, Yoav Goldberg, and Israel Ramat-Gan. “Linguistic Reg-ularities in Sparse and Explicit Word Representations.” In: CoNLL.2014, pp. 171–180.

[24] Laurens van der Maaten and Geoffrey Hinton. “Visualizing data us-ing t-SNE”. In: Journal of Machine Learning Research 9.Nov (2008),pp. 2579–2605.

[25] Tara G Martin et al. “Eliciting expert knowledge in conservation sci-ence”. In: Conservation Biology 26.1 (2012), pp. 29–38.

[26] Diana McCarthy and Roberto Navigli. “Semeval-2007 task 10: En-glish lexical substitution task”. In: Proceedings of the 4th InternationalWorkshop on Semantic Evaluations. Association for ComputationalLinguistics. 2007, pp. 48–53.

[27] Oren Melamud et al. “A simple word embedding model for lexicalsubstitution”. In: Proceedings of the 1st Workshop on Vector SpaceModeling for Natural Language Processing. 2015, pp. 1–7.

[28] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. “Linguistic Reg-ularities in Continuous Space Word Representations.” In: Hlt-naacl.Vol. 13. 2013, pp. 746–751.

[29] Tomas Mikolov et al. “Distributed representations of words and phrasesand their compositionality”. In: Advances in neural information pro-cessing systems. 2013, pp. 3111–3119.

[30] Tomas Mikolov et al. “Efficient estimation of word representations invector space”. In: arXiv preprint arXiv:1301.3781 (2013).

[31] Mark EJ Newman. “The structure and function of complex networks”.In: SIAM review 45.2 (2003), pp. 167–256.

[32] Arvid Osterlund, David Odling, and Magnus Sahlgren. “Factorizationof Latent Variables in Distributional Semantic Models.” In: EMNLP.2015, pp. 227–231.

[33] Stuart J Russell and Peter Norvig. “Artificial intelligence: a modernapproach (International Edition)”. In: (2002).

[34] Tobias Schnabel et al. “Evaluation methods for unsupervised wordembeddings.” In: EMNLP. 2015, pp. 298–307.

[35] Yuka Shidochi et al. “Finding replaceable materials in cooking recipetexts considering characteristic cooking actions”. In: Proceedings ofthe ACM multimedia 2009 workshop on Multimedia for cooking andeating activities. ACM. 2009, pp. 9–14.


[36] Seymour Sudman and Norman M Bradburn. “Asking questions: apractical guide to questionnaire design.” In: (1982).

[37] Chun-Yuen Teng, Yu-Ru Lin, and Lada A Adamic. “Recipe recommen-dation using ingredient networks”. In: Proceedings of the 4th AnnualACM Web Science Conference. ACM. 2012, pp. 298–307.

[38] Duncan J Watts and Steven H Strogatz. "Collective dynamics of 'small-world' networks". In: Nature 393.6684 (1998), pp. 440–442.

[39] Ludwig Wittgenstein. Philosophical investigations. John Wiley & Sons,2010.

[40] Marion E Wittmann et al. “Use of structured expert judgment toforecast invasions by bighead and silver carp in Lake Erie”. In: Con-servation Biology 29.1 (2015), pp. 187–197.

[41] Qian Zhang et al. “Back to the future: Knowledge light case basecookery”. In: Conference papers. 2008, p. 15.


A Appendix

A.1 Performance results of MAP, recall and precision.

Tab. 4: Results of MAP, recall and precision of the Google and Recipe model, using three different similarity measures.

(a) Metrics applied at the ingredient level

                    CoSim   CosAdd  CosMul
MAP       Google    0.047   0.055   0.055
          Recipe    0.031   0.050   0.045
Recall    Google    0.056   0.066   0.065
          Recipe    0.038   0.052   0.050
Precision Google    0.054   0.067   0.067
          Recipe    0.035   0.052   0.050

(b) Metrics applied at the subcategory level

                    CoSim   CosAdd  CosMul
MAP       Google    0.208   0.229   0.222
          Recipe    0.174   0.173   0.166
Recall    Google    0.227   0.253   0.247
          Recipe    0.186   0.188   0.182
Precision Google    0.206   0.251   0.246
          Recipe    0.150   0.178   0.172