Recommender system providing easy reachable technical assistance

Recommender System for identifying qualified people providing technical support for small technical problems

Giovanni Mahlknecht, Martin Reinstadler

June 20, 2011

Contents

1 Introduction

2 Domain description
2.1 Technical support
2.2 Problem definition
2.3 Existing applications

3 The application
3.1 Target groups
3.2 System features
3.3 System architecture

4 Information retrieval and Recommendation techniques
4.1 Information retrieval
4.2 Text Classification
4.3 Recommendation techniques
4.4 Proposed solution

5 Advantages

6 Further considerations

7 Conclusion


Abstract

1 Introduction

The fast technological progress of recent years has led to a constantly growing gap in our society. Society is increasingly divided into one group that grows up with new technologies and is used to managing new activities with new devices, and another group that has many difficulties in using new devices. Consider, for instance, the switch from analog TV to digital TV, where many customers had trouble understanding how to make the change. A young person can probably select and install the appropriate equipment intuitively, while an older person may encounter some difficulties. But is there any technical assistance available for such small problems? We observed that in most cases it is difficult to find an appropriate person willing to help with such small technical questions. This observation motivates us to build a system that brings together the knowledge of some persons to address the questions asked by others.

A person willing to participate describes his/her skills on a platform (free text description) and also specifies some extra features (e.g. where he/she lives, which languages he/she speaks).

If somebody then needs technical help, he/she can enter a free text query into the system and receives a contact list of the most appropriate persons providing help for the specific problem.

2 Domain description

2.1 Technical support

Most companies offer a support service for their own products, which can be divided into:

• Customer Service (or After Sales Service) for non-technical support. According to Turban et al. [1], Customer Service is a series of activities designed to enhance the level of customer satisfaction. This includes complaints, maintenance questions, and questions on how to use a certain product.

• Technical support or application support is typically a service providing assistance for technology products like TVs, mobile phones, computers and more. This service attempts to help users with specific problems with a product. Special training, customization of products and other services are mostly not covered by technical support.

• Help Desk is mainly designed for hardware and software support.

Summing up, a support service tries to provide qualified contact persons so that a customer can easily reach such a person and hopefully find a solution for a specific problem. Obviously this service influences customer satisfaction and consequently the image of the company.

[1] http://en.wikipedia.org/wiki/Customer_service


2.2 Problem definition

We observed that the problem can be divided into two subproblems. First, customers may encounter difficulties in choosing an appropriate product for a certain need, since companies offer a large variety of products and services. Second, it may be difficult to find a qualified contact person for a specific problem even if a support service is provided by the company. This can happen for the following reasons:

• support service is very expensive and may therefore be available only for the top-selling products

• products have a very short life cycle, so support for older products may not be available

• companies have a very large range of products and it may be difficult to reach a qualified person for the given product (e.g. a customer has to preselect his own product through a hierarchy of products)

• support service is very often managed by generalist call centers, which may not have qualified persons for the specific question

• the service may be available only in a language the customer has difficulty understanding

In many cases support service works fine, but the examples above show some cases where it can be very difficult to obtain that service.

2.3 Existing applications

There exist several approaches that provide help for users if the support from the company is not satisfactory. In order to explain the approaches we will use the following terms:

support provider: the (company-independent) institution providing the support

customer: the person asking for assistance

problem: a product with the associated problem

1. Time banking stands for the idea of reciprocally exchanging services between members. The currency used is time units (e.g. hours) and every service is measured in time units (e.g. one hour of technical support has the same value as one hour of babysitting). Time banking associations have existed for some years in many cities and play an important social role for a small part of society. However, they are limited to members of the association, and a member must act as both support provider and customer. Unfortunately, service descriptions are very general and it may be difficult to find an expert for a complex problem.
http://www.zeitbank.net

http://www.zeitbank-meran.it/dienste_xls.htm

2. In 2009 a network of 65 high-school students in Alto-Adige helped people to install new digital television receivers. By calling a central phone number a customer could get direct support on the telephone or look for a student in his neighbourhood to get personal assistance. This service was very successful; unfortunately it was limited to a short time period and to one specific problem (digital television receivers).
http://www.stol.it/Artikel/Chronik-im-Ueberblick/Lokal/Digitalisierung-Schueler-in-ganz-Suedtirol-helfen

Figure 1: www.timebanking.org

3. User communities (like forums and blogs) are very useful places where users can share their knowledge about a product. If a customer has some background knowledge about the problem and is able to formulate a detailed question, he can post that question and it is very likely that some other user (support provider) answers. Unfortunately, not all customers can describe their problem in detail. The answers may also be inaccurate or poorly explained. Personal contact (even by telephone) is unlikely in this approach, since support providers are spread around the world.
http://www.howardforums.com/

http://forum.chip.de/windows-7/

Besides these existing applications we found an American patent [GE01] describing a network-based customer service. For a customer requesting technical help, three consecutive levels of interaction are considered: if the first level (self-help by searching a database) is not sufficient, a customer can describe the problem on a platform and wait for a response (asynchronous help in the second level). The third level of interaction provides direct communication with an advisor, who is informed about the information the user has previously searched for. Unfortunately, the patent gives no specification of the group of advisors or of the process that matches a technical help query with a capable advisor. Furthermore, we could not find any application that implements the described approach.

In summary, there exist some (company-independent) approaches based on different paradigms to provide technical support; however, all of them encounter difficulties:

• the description of the provided help is very general and it is difficult to find the appropriate person for a complex problem

• the service is limited by a time period


Figure 2: System architecture of US patent 6,177,932 B1

• the service considers just a small number of products and activities

• the customer must have enough background knowledge about the problem to get significant help

• the customer must differentiate between relevant and non-relevant answers to the posted question.

3 The application

The goal of the application is to provide a new way for customers to get technical support. Combining the three ideas from section 2.3, the application makes use of the technical skills of persons who participate voluntarily as support providers. The descriptions of their skills (free text descriptions) are collected and indexed in a central place. They are also associated with at least one of the predefined categories of technical support (e.g. TV, telephone, computer). Once a customer needs technical help, he can post the question (free text query) and the application provides a contact list with the most qualified support providers for the given problem.

3.1 Target groups

The system is designed for two groups of users:

1. users with special technical skills participating voluntarily as technical support providers. This first group of users may consist of young students of technical high schools or people with some technical background.

2. users who require technical assistance or need help finding an appropriate product for their needs. Obviously they expect a system that is easy and intuitive to use.

3.2 System features

Since the application is designed for two target groups with different perspectives, there are also two different kinds of features available:

support providers

The first group of users consists of voluntarily participating people offering some support service. To participate each user must:

1. register to reserve his own web space and

2. create a profile. In the profile he can describe his technical skills in the form of a free text description. He must also specify to which of the predefined technical domains (for instance TV, phone, computer) the described skills belong. To complete the profile the user must enter his contact details and some additional information such as spoken languages, age, location and the days on which he is available (demographic data).

users requiring technical help

The second group of users can search for technical assistance for a problem. They have the possibility to:

1. use the simple search for assistance by entering just a simple text query. This simple search considers only the indexed free text descriptions of the support providers and retrieves the documents with the highest calculated score. The results shown to the user are dynamic summaries of the top-scored documents. The user can then click the preferred results to see the detailed description of the offered service. However, some of the personal details (for instance the phone number) are unavailable to unregistered users to protect the privacy of support providers.

2. create an optional profile specifying some personal information such as age, address and language (demographic data). Once a user has set up a personal profile he can use the following advanced features:
- advanced search, where the original query can be augmented with the personal demographic data (for instance spoken language, distance from domicile, urgency). This service also includes the recommendation process described in section 4. Unlike unregistered users, registered users can access private information of support providers.
- give feedback to the support providers.

3.3 System architecture

Most of the information used for the search is stored in documents with free text descriptions (descriptions of the technical skills of support providers). Each text is also associated with at least one of the predefined technical domains (for example TV, telephone, computer). Furthermore, some extra information (for example spoken language, domicile, availability) is available for each document. A simple inverted index can be created from these text documents with the associated fields. Besides this simple architecture we propose to integrate the following technologies in the system:

1. a thesaurus, since there may exist different names for the same technical device. Once a thesaurus is set up, it is simple to expand the original search terms to the associated terms in the thesaurus (query augmentation). The difficult part is the creation and maintenance of such a thesaurus. Manual generation requires domain experts and is much more expensive than automatic derivation. On the other hand it is much more effective. Since the voluntarily participating support providers are considered domain experts, we propose to ask them every time an unknown term is considered important. For example, a term could be important if it appears several times in the queries. If we present such an important term to all support providers together with the existing thesaurus, they should be able to update the thesaurus easily.

2. a translation tool for cross-language information retrieval. Queries in one language can retrieve documents in another language as well as in the original language. This feature is obviously very important for the system, since a support provider may describe his skills in one language while specifying that he speaks some other languages as well. Since a query is usually short, we propose to first identify the language and then translate the most important terms (terms with a high idf score). For the language identification task an existing LID system can be integrated, while the translation task can be accomplished by an external system, such as SYSTRAN [2] or PROMPT [3] [NA10].

3. a recommendation technique. After the information retrieval part has built a ranking of the most appropriate support providers based on the cosine similarity between skill description and query, the ratings of other users can be used to re-evaluate the ranking. Furthermore, this service can be very useful for a user who does not know how to formulate a precise query. However, the recommendation technique must be planned very carefully, since customers rarely give much feedback. Unlike in common domains of recommender systems (e-commerce, audio, video, travel), users of our system use the service only from time to time. Furthermore, a new user cannot give any feedback on experts as long as he has not received any service from them. In order to generate recommendations the system makes use of the following information:

• technical domain associated to the technical support

• language of the support provider and the customer

• distance of the support provider from the user's location. This can be useful if direct contact is necessary

• age of the customer. A customer may place more trust in feedback from other users of the same age group

[2] http://www.systran.it/
[3] http://www.online-translator.com/
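The cosine-based ranking mentioned in item 3 can be illustrated with a minimal sketch. It assumes plain tf-idf weights, whitespace tokenization and a logarithmic idf; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) and the sample skill descriptions are hypothetical and are not taken from the system itself.

```python
import math
from collections import Counter

def tokenize(text):
    # Simplified normalization (assumption): lower-case and split on whitespace.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query, docs):
    """Return the indices of matching documents, best match first."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    ordered = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [i for score, i in ordered if score > 0]

# Hypothetical skill descriptions of three support providers.
skills = [
    "installation of digital television receiver",
    "mobile phone configuration and repair",
    "computer network and printer setup",
]
ranking = rank("digital receiver installation", skills)
```

A recommendation step could then re-order the indices in `ranking` using ratings or demographic data, as discussed in section 4.3.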

4 Information retrieval and Recommendation techniques

In order to retrieve the best support providers for a given problem the application implements several information retrieval and recommendation techniques. In this section we briefly evaluate the different techniques, pointing out the advantages and disadvantages they could bring to our system.

4.1 Information retrieval

Information Retrieval is an active area of research that attempts to help users find relevant information in the available documents. The user specifies his information need in a free text query and the system answers with a list of the most appropriate documents, which in our case are skill descriptions of support providers. This task can be achieved through a sequence of steps.

Modeling documents

First of all the information contained in each document must be modeled.

• Tokenization: By scanning a document it is possible to partition it into sequences of characters, called tokens. Since we deal only with European languages, we can assume that tokens are separated by spaces.

• Language identification: Each document is written in a specific language, which can be identified automatically by one of the existing LID systems. This is very useful for our system, since documents may be written in different languages.

• Normalization to terms: Each token is just a sequence of characters representing a word in a certain form (for example, it may be a word in singular or plural form). Since it is not efficient to store all the different forms of each word in the dictionary, we should normalize each token to a term. Later on, queries are normalized in the same way. This task includes reducing letters to lower case, eliminating numbers, removing accents and stemming. A stemming algorithm reduces each token to its root form by following a set of rules. Porter's algorithm is a well-known stemming algorithm originally developed for the English language [4]. It is now available for various languages [5]. Only the rules differ from language to language, which makes the importance of prior language identification evident.

• Inverted index: Each term found in a document is compared to the in-memory dictionary and added if not yet present. Two important values, namely the term frequency and the inverse document frequency (idf), are also attached. Furthermore, the documentID is inserted in the associated posting list (a list of the documents that contain the term). This structure is very useful for identifying documents containing some required information: queries consisting of just one term can easily be answered by looking up that term in the dictionary and simply returning the corresponding posting list. Queries containing more than one term can be evaluated by taking the intersection of the retrieved posting lists. The inverted index is a common structure in information retrieval systems and should be present in our system as well.

[4] http://www.tartarus.org/~martin/PorterStemmer/
[5] http://snowball.sourceforge.net/german/stemmer.html

• Index for phrase queries: The inverted index does not contain any information about the position of terms in the text. Consider for example the query "digital television". With an inverted index the search result could contain a document that has one sentence with the words "...digital world..." and another sentence with the words "...no knowledge on television...". This is called a false positive result, since it is in the set of retrieved documents (positive) but not relevant (false). To overcome this problem a biword index could be very useful. For each consecutive pair of terms a new dictionary entry is created and the documents containing that entry are added to the associated posting list. Unfortunately, search results can still contain false positives for phrase queries with more than two words. A positional index could overcome this problem. Such an index increases the size of the posting lists substantially, since for each term every occurrence in the documents is stored in the posting list. However, this should not be a major problem for our system, since the documents are usually short and the number of documents is rather small.

• Index for wild-card queries: A system that allows wild-card queries must integrate an additional index. For example, a permuterm index could be created for each term. A special symbol is defined and the original term is rotated around that symbol. Each rotation in the permuterm index then points back to the original dictionary term.
Using permuterm indices, wild-card queries can be answered very efficiently; however, we are not planning to integrate wild-card queries in our system. A user normally either knows the terms he must put in the query for his information need, or does not know the terms at all. In the second case even wild-card queries cannot solve the problem.

• Index for mis-spelled query terms: We assume that support providers use an automatic spell check when they describe their skills, which ensures that single terms are spelled correctly. For queries the situation is different, since a query may contain mis-spelled terms and no matching documents may be found if the mis-spelled terms of the query do not appear in the dictionary. To overcome this problem the system should identify the in-memory dictionary terms closest to each mis-spelled term and present them to the user as possible corrections. The query can then optionally be run again with the corrected terms. The distance between two words can be calculated with the edit distance algorithm or through overlapping n-grams. The second option requires an additional index, but is faster and cheaper than the first one. Since space efficiency is not an issue for our system, we propose to integrate n-gram indices for dictionary terms. A common method to measure the overlap of n-grams between two terms is the Jaccard coefficient, which normalizes the overlap. As an example consider the trigram overlap of the term television and the mis-spelled term tevision:

television: tel, ele, lev, evi, vis, isi, sio, ion
tevision: tev, evi, vis, isi, sio, ion

$$J(\text{television}, \text{tevision}) = \frac{|A \cap B|}{|A \cup B|} = \frac{5}{9} \approx 0.56$$

• Index for zones and fields: Each document includes some extra information such as category, language and domicile. This metadata can be very important for the search results, because a user may look for a technical assistant who speaks a certain language. Language = english is an example of a field value. In order to deal with field values we propose to integrate a field index. For each field value a posting list of documents is created during the indexing process. Only for the domicile field is a range of values combined into one single value.
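Two of the structures described in this list, the inverted index with posting-list intersection and the trigram overlap used for spelling suggestions, can be sketched as follows. This is only an illustration under simplifying assumptions: normalization is reduced to lower-casing and whitespace splitting, the dictionary is scanned linearly instead of through an n-gram index, and the names `build_index`, `and_query`, `suggest` and the threshold of 0.4 are hypothetical choices.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def intersect(p1, p2):
    """Merge two sorted posting lists, keeping the IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def and_query(index, terms):
    """Return the documents that contain every query term."""
    lists = [index.get(t, []) for t in terms]
    if not lists:
        return []
    result = lists[0]
    for posting in lists[1:]:
        result = intersect(result, posting)
    return result

def trigrams(term):
    """Set of overlapping character trigrams of a term."""
    return {term[i:i + 3] for i in range(len(term) - 2)}

def jaccard(a, b):
    """Jaccard coefficient of the trigram sets of two terms."""
    ga, gb = trigrams(a), trigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

def suggest(term, dictionary, threshold=0.4):
    """Dictionary terms whose trigram overlap with `term` reaches the threshold."""
    scored = [(jaccard(term, d), d) for d in dictionary]
    return [d for score, d in sorted(scored, reverse=True) if score >= threshold]

# Hypothetical skill descriptions.
docs = [
    "digital television receiver",
    "digital camera repair",
    "analog television antenna",
]
index = build_index(docs)
matches = and_query(index, ["digital", "television"])
overlap = round(jaccard("television", "tevision"), 2)
corrections = suggest("tevision", index.keys())
```

With the values from the worked example above, `jaccard("television", "tevision")` evaluates to 5/9. A production system would look up candidate corrections through the n-gram index rather than scoring every dictionary term.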

Modeling Queries

Obviously, after modeling the documents, the information contained in a query must be modeled as well. A query must be tokenized and normalized and its language identified in the same way as for documents, but then some other techniques must be applied to the query terms:

• Translation technique: A user may enter a query in a certain language, while an appropriate document is written in another language. Even if the metadata of that document states that the support provider speaks the language of the query, the document will not be found in the inverted index. For this reason we propose to translate the query terms into other languages as explained in section 3.3.

• Query assist: Once a user enters a query term the system proposes possible query sentences containing that term. Based on query log mining, this simple technique offers a very useful service to users. Especially users who do not know how to formulate a good query may benefit from this service.

• Thesaurus: Human languages sometimes have several words for the same meaning. Such synonyms can be captured with a thesaurus. We propose to augment a query with the associated terms in the thesaurus, as explained in section 3.3.
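The thesaurus-based query augmentation can be sketched in a few lines. The thesaurus is assumed here to be a simple mapping from a term to its associated terms; the entries and the function name `expand_query` are hypothetical and serve only as an illustration.

```python
def expand_query(terms, thesaurus):
    """Append the thesaurus entries of each query term, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

# Hypothetical thesaurus maintained by the support providers.
thesaurus = {
    "tv": ["television"],
    "television": ["tv"],
    "phone": ["telephone"],
}
expanded = expand_query(["tv", "receiver"], thesaurus)
```

Terms without a thesaurus entry, such as "receiver" in this example, are passed through unchanged.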

4.2 Text Classification

At the moment a user posts a query, the system has no information about the technical domain the search belongs to. For example, the query "digital receiver installation" belongs to the domain television and not to the domain telephone. The user encodes his information need in a very short sentence or just a sequence of words and expects search results from the correct technical domain. This is a classification problem. The query terms represent the instance and a classification function should map the instance to one of the predefined technical domains (classes). There are several ways to build a classification function: manual classification and classification through hand-coded rules are very accurate methods, but also very expensive. A third method is called supervised learning, where a chosen classification function observes a set of hand-classified training documents and learns from them how to classify documents whose class is unknown. We have a set of descriptions from support providers with associated categories available and could use them as training documents. Several well-known classification functions are available. In some experiments with the data-mining tool Weka we tried to predict the quality of wines (the quality lies in a range between 3 and 8) from known chemical attributes such as residual sugar, acidity and alcohol. We used a freely available dataset of 1100 items and compared different classification functions. We observed that the Naive Bayes classifier is one of the simplest and most powerful functions. Unlike k-Nearest-Neighbours, it requires no parameters to be set in advance; the classifier is nevertheless very fast and the results are highly accurate.

There are two models for Naive Bayes Classifier:

• Multivariate Naive Bayes Classifier, which considers terms from the dictionary just as present or not present in the document.

• Multinomial Naive Bayes Classifier, which goes beyond the presence or absence of terms in a document by considering the number of times a word appears in a document. Since we have a positional index available, this model is the better alternative, also because it usually works better for text classification. The classification function

$$c_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_{i} P(x_i \mid c_j)$$

where $c_j$ represents the j-th class and $x_i$ the i-th term of the query, searches for the class $c_j$ with the largest value of that product. Two small improvements for the Naive Bayes Classifier are important: first, Laplace smoothing prevents a zero probability for a class if there are single zero factors in the multiplication. A factor $P(x_i \mid c_j)$ can be zero if the term $x_i$ is not present in class $c_j$. Second, taking the log of each factor and summing the logs prevents floating-point underflow.
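A minimal sketch of the multinomial model with both improvements (add-one Laplace smoothing and log-space scoring) is given below. The training documents, the class labels and the function names `train` and `classify` are hypothetical. Query terms that never occur in the training data are skipped, which is one possible design choice among several.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """labelled_docs: list of (list_of_terms, class_label) pairs."""
    class_counts = Counter()            # number of documents per class
    term_counts = defaultdict(Counter)  # term occurrences per class
    vocabulary = set()
    for terms, label in labelled_docs:
        class_counts[label] += 1
        term_counts[label].update(terms)
        vocabulary.update(terms)
    return class_counts, term_counts, vocabulary

def classify(query_terms, model):
    """Return the class maximizing log P(c) + sum of log P(x_i | c)."""
    class_counts, term_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_terms = sum(term_counts[label].values())
        for term in query_terms:
            if term not in vocabulary:
                continue  # assumption: ignore terms unseen in training
            count = term_counts[label][term]
            # Laplace smoothing: add one to every term count.
            score += math.log((count + 1) / (total_terms + len(vocabulary)))
        if score > best_score:
            best_class, best_score = label, score
    return best_class

# Hypothetical skill descriptions with their technical domains.
training = [
    ("digital television receiver installation".split(), "television"),
    ("satellite television antenna".split(), "television"),
    ("mobile telephone contract".split(), "telephone"),
    ("telephone line installation".split(), "telephone"),
]
model = train(training)
domain = classify("digital receiver installation".split(), model)
```

Summing logarithms instead of multiplying probabilities leaves the arg max unchanged, because the logarithm is monotonically increasing.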

Before classifying real queries it is better to evaluate the classifier on test documents. Therefore the available classified documents should be partitioned into a larger set of training documents and a smaller set of test documents. Putting 2/3 of the documents in the training set and 1/3 in the test set is one of the simplest ways to partition the documents. A better approach is to partition the documents into k mutually exclusive subsets (k-fold cross-validation). Then, in k iterations, the classifier takes k-1 subsets as training data and 1 subset as test data. The test set changes in every iteration. The overall evaluation of the classifier is obtained as the average of the evaluations from all iterations.
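The k-fold procedure can be sketched as follows. The helper names `k_fold_splits` and `cross_validate` are hypothetical, and `train_fn` and `accuracy_fn` stand for whatever training and evaluation routines the classifier provides. In practice the documents would be shuffled before splitting.

```python
def k_fold_splits(items, k):
    """Yield (training, test) pairs over k mutually exclusive subsets."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, test

def cross_validate(items, k, train_fn, accuracy_fn):
    """Average the evaluation score obtained in each of the k iterations."""
    scores = []
    for training, test in k_fold_splits(items, k):
        model = train_fn(training)
        scores.append(accuracy_fn(model, test))
    return sum(scores) / len(scores)

# Ten placeholder documents split into five folds.
documents = list(range(10))
splits = list(k_fold_splits(documents, 5))
```

Every document appears in exactly one test set, so the averaged score uses all available labelled data for evaluation.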


4.3 Recommendation techniques

Recommender systems are personalized agents that provide suggestions for items a user may like. The way of user interaction distinguishes recommender systems from information retrieval systems: in a recommender system a user specifies some personal preferences or defines some personal details in order to obtain a personalized set of recommended items. In contrast, an information retrieval system does not require any personal information from users, but if different users post the same query, the result is the same for all of them. A variety of techniques for generating recommendations are widely used in modern recommender systems:

• Collaborative filtering: The idea behind collaborative filtering comes from real life: if a person wants to know something about a product, he tends to ask his like-minded friends first. Bringing this concept to digital networks, in order to generate recommendations for a target user a system must first collect all of his previously expressed opinions about other products (expressed as numerical ratings). They are then compared to the ratings of all other users to identify the group of users with the most similar opinions (like-minded users or neighbours). After this the system predicts a rating for all the products rated by the neighbours but not by the target user. Finally, the set of products with the highest predicted ratings is presented to the target user. Collaborative filtering is a widely used technique in various domains. Since the technique does not require any product descriptions, but only user ratings, it is used especially by systems that have difficulties in describing their products (for example music or film recommender systems). Despite these advantages we do not think collaborative filtering is useful for our system. First, we have no particular difficulties in describing our products: each textual description of a support provider represents a product in the system and the terms or features it contains can be used to describe that document. Second, in most cases users will not give much feedback, since they may use the service only from time to time. They also cannot give any feedback at the moment they join the network, so it would be impossible to identify neighbours. This well-known problem of handling new items or new users is called the cold-start problem.

• Item-to-item collaborative filtering: This method considers the items a user has liked in the past and looks for similar items to recommend to the user. The similarity of items can be computed through the ratings, through item descriptions or through co-occurrence in the same sets of items. Item-to-item filtering normally provides better predictions than collaborative filtering. However, item-to-item filtering also suffers from cold start: a new support provider without feedback from users has little chance of being recommended. For our system this is not acceptable.

• Content-based recommendations: A content-based recommender system tries to find items that are similar to those the target user has liked (rated) in the past. Those items are considered possible candidates for recommendation. In contrast to the collaborative filtering technique, this approach does not depend on ratings from other users. It is used mainly for text-based items, since they can be described by their associated features (keywords) and compared to the user model. When a user gives some feedback, our system could store an entry in the user model with the following information: the most important keywords of the description of the support provider, the associated technical domain and some demographic information such as spoken language, age and address. When a user posts a query, first of all the technical domain could be identified (see section 4.2). Since each entry in the user model is associated with a technical domain, only entries from the same technical domain as the query are useful for identifying similar support providers. The information contained in the identified entries (for example keywords that appear often, demographic information) can then be used to find similar support providers the user has not yet rated. Content-based filtering seems very interesting for our system, because we deal only with textual documents. Items with other content types (for example multimedia) would be hard to describe. Other disadvantages, such as over-specialization and the recommendation of expected items, do not seem to be relevant for our system. The only problem could be user feedback. We already observed in the collaborative filtering approach that our system cannot expect much user feedback, and content-based systems also suffer from the cold-start problem.

• Demographic methods: Considering the personal attributes of the target user (for example language, address, age), a demographic recommender system aims to categorize the user into a demographic class. The recommended items are those associated with the derived demographic class. In contrast to the previous methods this approach does not necessarily require ratings from the user, but it does require the user's demographic information and predefined demographic classes. Ratings could be useful if the recommended items are chosen among the highest rated items of users belonging to the same demographic class. Unfortunately our system holds only a few personal details of users. Age, language and domicile may not be enough information to build a classifier for the users.

• Knowledge-based systems: A knowledge-based recommender system recommends items that meet the user's preferences. Such preferences are encoded in a knowledge structure that supports inference. One possible technique, case-based reasoning, uses solutions from similar past problems in order to solve a new problem. This sounds reasonable, since similar problems usually have similar solutions. Typically knowledge-based recommender systems focus on short-term user profiles to cope with users and items that have few or no ratings; cold start is rather unproblematic. This can be very interesting for items that are rarely bought or very expensive. The features of a knowledge-based recommender system seem excellent for our system as well; however, the effort to define the knowledge structure is probably too high. We expect social benefit rather than high profit from the system and this technique does not seem cost-effective.

• Utility methods: A utility function describes the degree of user happiness with a certain recommendation. It can capture either the long-term or the short-term (ephemeral) utility of a user. Each item is described by a list of attributes (either real values or boolean attributes) and can be modeled together with a set of weights derived from the user model. The items with the highest output from the utility function should be recommended to the target user. Utility methods do not suffer from the cold-start problem, but the challenge is to build a good utility function. We have several attributes available (for example the distance from the user to the support provider or the age difference) which can be expressed as numerical values. Furthermore, we could ask the user to provide the weights for the attributes (which features are more important for the specific problem and which are less important). For every problem the weights can be adjusted again. We think that a utility method is ideal for our system and explain the details in the next section.

• Hybrid systems: In order to overcome the unsatisfactory characteristics of the previous approaches, hybrid systems combine several techniques to produce the final output. According to [Bur07] there are seven strategies for combining different recommender systems:

– Weighted: the scores of different recommendation techniques are combined.

– Switching: depending on the situation, the system chooses one among the different techniques.

– Mixed: recommendations from different techniques are presented together, explaining their meaning.

– Feature combination: features from different sources are combined as input for a recommendation algorithm.

– Feature augmentation: one technique computes the inputs for the next technique.

– Cascade: recommender techniques are applied in sequence.

– Meta-level: one technique produces a model which is used by the subsequent technique.

The idea of gaining from the strengths of different techniques is very interesting. Obviously the individual recommender techniques must be chosen very carefully, since they influence the final result. For our system we propose a hybrid recommender technique which could produce high-quality recommendations and overcome the cold-start problem at the same time. In the following section we explain our decision in detail.

4.4 Proposed solution

Before explaining the proposed solution we would like to summarize the key points of the system. First of all, the main factors that influence user happiness are:

• a user wants to easily find an appropriate support provider for a given problem.

• the support provider should speak the language of the user (even if the textual description may be in another language).

• the support provider should be close to the user if direct support is requested.

• the support provider should be available within a short timespan if the problem is urgent.

• the support provider should be a competent and reliable person, especially if direct support is requested.

Next we would like to recap all background data of the system that is used for the information retrieval and recommendation process:

• free text descriptions of support providers

• associated technical domain

• profiles of support providers with some demographic data

• a few explicit ratings from users who received technical support (values from 0 to 5)

• implicit feedback from users if they click a summary link of a retrieved document (always value 1, since it is considered less important than explicit feedback)

Besides background data we would also like to highlight the input data, which comes from the target user:

• a simple free text query without wildcards

• associated technical domain derived from the classifier (see section 4.2)

• weights for the utility function set by the user

• explicit ratings if the user gives feedback to a support provider

• implicit ratings if the user clicks a summary link of a retrieved document (description of a support provider)

Based on this information, we decided to combine information retrieval and a utility-based recommender system into a hybrid system. Both techniques are applied together (weighted strategy) to calculate the final score for the recommended documents. Several techniques help to improve information retrieval: the integrated thesaurus used for query augmentation improves recall, as does the language translation tool. Spelling correction helps to overcome misspelled query terms, and a query assistant supports users in formulating the query. The information retrieval component gives each retrieved document a score between 0 and 1:

Inf(d, q) = (D + 1 − k_d)/D

where d is the description of the support provider, q is the query, D is the number of retrieved documents and k_d is the position of the document d in the result set. For example, the query "digital television" could retrieve 5 documents, where document d_x is in the third position:

Inf(d_x, "digital television") = (5 + 1 − 3)/5 = 0.6

If the document d_x had been the first document in the result set, the score would be:

Inf(d_x, "digital television") = (5 + 1 − 1)/5 = 1
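The rank-based score above can be sketched in a few lines of Python. This is only an illustration of the formula, not the system's implementation; the function name `inf_score` and its parameter names are assumptions introduced here.

```python
def inf_score(position: int, num_retrieved: int) -> float:
    """Rank-based retrieval score Inf(d, q) = (D + 1 - k_d) / D.

    position      -- k_d, the 1-based position of the document in the result set
    num_retrieved -- D, the number of retrieved documents
    """
    if num_retrieved < 1:
        raise ValueError("at least one document must be retrieved")
    if not 1 <= position <= num_retrieved:
        raise ValueError("position must lie between 1 and num_retrieved")
    return (num_retrieved + 1 - position) / num_retrieved


print(inf_score(3, 5))  # third of five documents: 0.6
print(inf_score(1, 5))  # first of five documents: 1.0
```

Note that the last document of a result set still receives a score of 1/D, so a retrieved document never gets a score of exactly 0.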

A utility function calculates a second score for the documents. Several attributes valued from zero to ten are used for the utility function. For all attributes the value ten represents the highest utility for the user and zero the lowest:

• dist: distance from the support provider to the user. It should be small if a user requires direct technical support and is insignificant if telephone or mail support is sufficient. The value zero denotes a distance of more than 100 km, while the value ten denotes a distance of less than 10 km.

• urg: urgency of the problem and availability of the support provider. If a user has an urgent problem, the support provider should be available as soon as possible. The value zero expresses that the support provider is not available during the next 10 days, while the value ten states that the support provider is ready to help the next day.

• posfee: positive feedback for the support provider. For some users it may be fundamental that a support provider has the best possible feedback. Since feedback values are in a range between zero and five, we must multiply the mean value by two. For example, if a support provider has five votes with a mean value of four, the value of the attribute used by the utility function will be 8.

• agdif: age difference between user and support provider. This may be interesting if a user has a problem that is common in a certain age group. For example, a young user may have a problem with a computer game, and it is very likely that a support provider from the same age group can help easily in that case. The value zero denotes an age difference of more than 50 years, while the value ten denotes an age difference of less than 5 years.
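The text fixes only the end points of each attribute scale. As a rough sketch, the raw values could be mapped to the range 0 to 10 as follows. Linear interpolation between the end points, the rounding, and all function names are assumptions made here for illustration; other mappings (for example fixed bins) are equally compatible with the description.

```python
def linear_scale(value: float, best: float, worst: float) -> int:
    """Map a raw value to 0..10: `best` or better gives 10, `worst` or worse gives 0.

    Linear interpolation between the two end points is an assumption.
    """
    fraction = (worst - value) / (worst - best)
    fraction = max(0.0, min(1.0, fraction))  # clamp values outside the end points
    return round(10 * fraction)


def dist_attribute(distance_km: float) -> int:
    # ten: less than 10 km, zero: more than 100 km
    return linear_scale(distance_km, best=10, worst=100)


def urg_attribute(days_until_available: float) -> int:
    # ten: available the next day, zero: not available during the next 10 days
    return linear_scale(days_until_available, best=1, worst=10)


def posfee_attribute(mean_rating: float) -> int:
    # ratings lie in 0..5, so doubling the mean maps them to 0..10
    return round(2 * mean_rating)


def agdif_attribute(age_difference_years: float) -> int:
    # ten: less than 5 years, zero: more than 50 years
    return linear_scale(abs(age_difference_years), best=5, worst=50)


print(dist_attribute(29), urg_attribute(2), posfee_attribute(4))
```

Under this assumed mapping, a mean rating of four yields the value 8 stated above for posfee; the values of the other attributes depend on the interpolation that is actually chosen.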

Since each technical problem is different, the importance of each of these attributes may vary. Therefore the user can specify the weight of each attribute for every technical problem by simply adjusting a few sliders. For example, if the user does not need direct support, the distance could be marked as non-relevant (low weight), while the age difference could be very important (high weight). The weights always sum to one.

Ut(u, d) = \sum_{i=1}^{4} (a_i ∗ w_i) / 10

where u is the user, d is the description of the support provider together with the demographic data, a_i denotes the i-th attribute and w_i denotes the weight of the i-th attribute. For example, a user could set the weights for a specific problem in the following way: w_dist = 0.7, w_urg = 0.3, w_posfee = 0 and w_agdif = 0. A support provider who lives at a distance of 29 km, has time in two days, has only the best feedback and an age difference of 11 years gets the following values: dist = 8, urg = 9, posfee = 10 and agdif = 8. Based on the attribute values and weights, the utility function calculates the following score:

Ut(u, d) = (8 ∗ 0.7)/10 + (9 ∗ 0.3)/10 + (10 ∗ 0)/10 + (8 ∗ 0)/10 = 0.56 + 0.27 + 0 + 0 = 0.83
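The worked example can be reproduced with a short Python sketch. The dictionary layout, the helper `normalize_weights` (which turns arbitrary slider positions into weights that sum to one) and the function name `utility` are assumptions for illustration; only the formula itself comes from the text.

```python
ATTRIBUTES = ("dist", "urg", "posfee", "agdif")


def normalize_weights(raw: dict) -> dict:
    """Rescale non-negative slider values so that the weights sum to one."""
    total = sum(raw[name] for name in ATTRIBUTES)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: raw[name] / total for name in ATTRIBUTES}


def utility(attributes: dict, weights: dict) -> float:
    """Ut(u, d) = sum over the four attributes of a_i * w_i / 10."""
    if abs(sum(weights[name] for name in ATTRIBUTES) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(attributes[name] * weights[name] for name in ATTRIBUTES) / 10


# Slider positions 7 : 3 : 0 : 0 correspond to the weights 0.7, 0.3, 0 and 0.
weights = normalize_weights({"dist": 7, "urg": 3, "posfee": 0, "agdif": 0})
attributes = {"dist": 8, "urg": 9, "posfee": 10, "agdif": 8}

print(round(utility(attributes, weights), 2))  # 0.83
```

Dividing by ten keeps the utility score in the range 0 to 1, so it is directly comparable with the information retrieval score.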

The final score for a description of a support provider is given by:

Score(d, q, u) = α ∗ Ut(u, d) + (1− α) ∗ Inf(d, q)

where the variable α is a value larger than 0 and smaller than 1. It is used to set the impact of the two components on the final score. For example, α could be 0.3 and Ut(u, d) could be 0.7 for user_x:

Score(d_x, "digital television", user_x) = 0.3 ∗ 0.7 + (1 − 0.3) ∗ 0.6 = 0.63

If we change the weight α to 0.6, the final score is:

Score(d_x, "digital television", user_x) = 0.6 ∗ 0.7 + (1 − 0.6) ∗ 0.6 = 0.66

which means that more importance is given to the utility function and less importance to the information retrieval component. The value of α is initially set to 0.5 (equal impact of information retrieval and utility method). Without training data it is difficult to estimate a good value of α, so it will have to be tuned in the future.
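A minimal Python sketch of the weighted combination and the resulting ranking is given below. The function names and the candidate tuples (provider name, utility score, retrieval score) are hypothetical and only serve to show how α shifts the ranking.

```python
def final_score(utility_score: float, retrieval_score: float, alpha: float = 0.5) -> float:
    """Score(d, q, u) = alpha * Ut(u, d) + (1 - alpha) * Inf(d, q)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be larger than 0 and smaller than 1")
    return alpha * utility_score + (1 - alpha) * retrieval_score


def rank(candidates, alpha: float = 0.5):
    """Sort (name, Ut, Inf) tuples by descending final score."""
    return sorted(candidates, key=lambda c: final_score(c[1], c[2], alpha), reverse=True)


print(round(final_score(0.7, 0.6, alpha=0.3), 2))  # 0.63
print(round(final_score(0.7, 0.6, alpha=0.6), 2))  # 0.66

# Hypothetical candidates: (provider name, Ut(u, d), Inf(d, q))
candidates = [("provider_a", 0.7, 0.6), ("provider_b", 0.4, 1.0), ("provider_c", 0.9, 0.2)]
print([name for name, _, _ in rank(candidates, alpha=0.5)])
print([name for name, _, _ in rank(candidates, alpha=0.8)])
```

With α = 0.5 the provider with the best retrieval rank comes first, while with α = 0.8 the provider with the highest utility does, which illustrates why the choice of α matters once feedback data is available.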

5 Advantages

The system has several advantages. Let us start with the advantages for the two user groups:

1. Volunteers can apply their skills to concrete, real-world problems. This can be very valuable, especially for students: they can gather first experience with customers and customer satisfaction. Such experience can be crucial for future careers, since many private companies ask for practical experience when they hire new employees.

2. People requiring technical help can easily find a qualified volunteer for many different kinds of problems. They can feel secure about the provided assistance because they can see the feedback from other users. This improves customer satisfaction significantly.

The mentioned advantages could also bring benefits for other parts of society. As we mentioned in section 2, customer support aims to enhance the level of customer satisfaction. If a customer is happy with a product, he or she is more likely to buy that product again in the future, which can be a significant advantage for the producing companies.

6 Further considerations

Until now we have not taken payment into consideration. Technical support is provided by volunteers who decide freely to participate. This sounds nice; however, it may be advantageous to include a payment system. First, paid support providers would probably be more motivated to accumulate positive feedback and to give technical assistance regularly. They may describe their skills in great detail, which improves the information retrieval process, and solve the technical questions accurately, which improves their feedback. Second, we think that a high-quality service should cost a reasonable amount of money in order to preserve its quality.

7 Conclusion

Customer support influences customer satisfaction and consequently the image of the company. Even though most companies offer this service, it is not always easy to reach. Besides the employees of the customer service department, there may be several people around who know how to handle certain devices. If these persons agreed to share their knowledge with other people requiring technical assistance, customer support could be improved. An information retrieval and recommender system is a very good way to bring volunteer support providers and people asking for technical assistance together. Support providers can describe their skills and can be located easily if they are qualified to solve a certain technical problem. Customers can set up a personal profile and provide feedback, either implicitly or explicitly, to improve future personalized search results. Furthermore, their feedback helps other users to feel confident about support providers. By using the latest technologies, the system can be used very intuitively: support providers can set up their profile without translating it into other languages. On the other side, people find a support provider even if the query terms are misspelled or the query language is different from the document language.

References

[ABH+08] Fabian Abel, Ig Ibert Bittencourt, Nicola Henze, Daniel Krause, and Julita Vassileva. A rule-based recommender system for online discussion forums. Technical report, Conf. on Adaptive Hypermedia and Adaptive Web-Based Systems, 2008.

[Bur07] Robin Burke. Hybrid web recommender systems. In The Adaptive Web, pages 377–408. Springer-Verlag, Berlin, Heidelberg, 2007.

[Dum97] Susan T. Dumais. Automatic cross-language retrieval using latent semantic indexing, 1997.

[GE01] Frank A. Glades and Mark A. Ericson. Method and apparatus for network based customer service. US-Patent No 6177932 B1, January 2001.

[GRBF11] Abdallah Gomah, Samir Abdel Rahman, Amr Badr, and Ibrahim Farag. An auto-recommender based intelligent e-learning system. International Journal of Computer Science and Network Security, 11(1):67–70, January 2011.

[NA10] Nurul Amelina Nasharuddin and Muhamad Taufik Abdullah. Cross-lingual information retrieval. Electronic Journal of Computer Science and Information Technology, 2(1), 2010.

[TH01] Loren Terveen and Will Hill. Beyond recommender systems: Helping people help each other. In HCI in the New Millennium, pages 487–509. Addison-Wesley, 2001.

[TM03] Tiffany Ya Tang and Gordon McCalla. Smart recommendation for an evolving e-learning system. In Workshop on Technologies for Electronic Documents for Supporting Learning, International Conference on Artificial Intelligence in Education (AIED), 2003.
