DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

Virtual Assistants and Their Performance in Professional Environments

ERIK PERSSON
JOHAN TORSSELL

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Svensk Sammanfattning (Swedish Summary, translated)

Since the mid-1900s, virtual assistants have been developed and refined, with the technology evolving from sets of rules to assistants driven by artificial intelligence. Today, virtual assistants can add value to organisations and contribute to a sustainable society, for instance by performing simple and recurring tasks and by reducing inequalities caused by biased advisors on sensitive matters. Despite this success, current research has not focused on the evaluation of virtual assistants in industrial contexts.

The purpose of this report is to evaluate virtual assistants from a technical, economic and organisational perspective in order to understand their performance in industrial environments. The work has been carried out in collaboration with IBM and one of their clients, who prefers to remain anonymous. At this company, two IBM Watson Assistants are under development: one for their IT Service Desk and one for their Ethics & Compliance department. The study has used both quantitative and qualitative methods, including user testing and questionnaires, to cover all aspects of the virtual assistants' performance. In this process, discussions have been held with experts at IBM and with employees of the company for which the practical implementation was studied, to gain an understanding of both general and specific knowledge from different perspectives.

From this report, the following conclusions can be drawn. One, technical performance can be determined with quantitative metrics such as coverage, confidence, precision and helpfulness, complemented with qualitative metrics such as user satisfaction and perceived understanding of the user. Two, specific technical performance is relative, and the technical limitations and maturity should be used as complements to the evaluation of the assistants. Three, identified organisational benefits include reduced time-to-resolution, reduced handling time, all-hour support, scalability and user understanding.

The conclusions in the specific cases show that a virtual assistant implemented within a narrower area, such as an assistant for Ethics & Compliance, is easier to implement and performs relatively well even in a less developed environment. Broader areas, such as an assistant for IT support, require more work to perform at a high level but may be even more valuable than the assistant in the narrow area once sufficiently developed.
Abstract—Contributors from the mid-20th century up to now have developed and refined virtual assistants, taking the technology from a set of rules to assistants driven by Artificial Intelligence. Today, virtual assistants can provide value to organisations and support a sustainable society by conducting basic and repetitive tasks, and by helping to reduce inequalities caused by biased advisors on sensitive topics. Despite this prosperity, current research lacks focus on the evaluation of virtual assistants in industrial applications.

The purpose of this paper is to evaluate virtual assistants from a technical, economic and organisational perspective, in order to understand their performance and value in an industrial environment. This has been done in collaboration with IBM and a client company which prefers to remain anonymous in this report. In this company, two IBM Watson Assistants are under development; one for the IT Service Desk, and one for the Ethics & Compliance department. To cover all aspects of the virtual assistants' performance, quantitative and qualitative methods were used by conducting user tests and surveys. In this process, discussions have been held with IBM experts and employees of the firm for which the practical implementation has been studied, to gain a general and specific understanding from different perspectives.

From this paper, the following can be concluded. First, technological performance can be described using quantitative metrics such as coverage, confidence, precision and helpfulness, and should be complemented with qualitative measures such as user satisfaction and perceived user understanding. Second, specific technological performance is relative, and the technical limitations as well as the technology's maturity should be used as a complement to the evaluation of the assistants. Third, identified organisational benefits include:
• reduced time-to-resolution,
• reduced handling time,
• all-hour support,
• scalability, and
• user understanding.

Conclusions specific to the use cases show that an assistant implemented in a narrower use case, that is the Ethics & Compliance assistant, can be implemented more easily and performs relatively well even in less developed environments. A broader use case, such as the IT assistant, requires more effort to perform at a high level but may be even more beneficial than the narrow use case once sufficiently refined.

Index Terms—Virtual Assistants, Watson Assistant, Virtual Assistant Evaluation, Potential Cost Savings, Organisational Value
I. INTRODUCTION
In 1950, Alan Turing asked the question, "Can machines think?", and proposed an experiment for this. He described a game consisting of three entities - A, B and C - where C is an interrogator with the mission to identify the gender of A and B. A's objective is to deceive the interrogator, while B's is to help the interrogator. The interrogator would write questions, and A or B would answer. Turing then proposed the question, "What will happen when a machine takes the part of A in this game?". This idea formed The Turing Test - a test of a machine's ability to display intelligence indistinguishable from that of a human. [1]
Around the mid-1960s, MIT professor Joseph Weizenbaum created ELIZA, an early Natural Language Processing (NLP) computer program able to attempt The Turing Test. ELIZA used keyword recognition and context identification to simulate an understanding of the users. [2] However, considering the primitive state of NLP and Natural Language Understanding (NLU) at the time, the machine lacked the ability to uphold a conversation and was limited to its narrow skills, and the Turing Test was thus concluded unsuccessful. [3] Still today, NLP is key for all modern virtual assistants. Virtual assistants have a broad range of definitions; Cambridge Dictionary defines one as "a computer program or device that is connected to the internet and can understand questions and instructions, designed to help you to make plans, find answers to questions, etc.". [4] Due to this wide definition, assistants are commonly categorised as those purely driven by rules and those driven by machine learning and artificial intelligence. The latter are further divided into rule-driven dialog flows and machine-learnt stories. [5] [6]
From a socioeconomic perspective, virtual assistants can provide unbiased information and advice on sensitive subjects. For instance, an assistant can advise on harassment in the workplace and thereby indirectly reduce harassment and inequalities. Moreover, it can respond objectively to questions regarding career change that might otherwise be difficult to ask a manager. It is notable, however, that assistants implemented in industry are not designed to pass the Turing Test, but rather to answer specific questions or help users through processes. This differs from general virtual assistants, such as Google Assistant, Siri or Alexa, where part of the goal is to be as human-like as possible. Furthermore, considering the assistants' ability to perform simple tasks, employee well-being can improve as the tasks left for the employee are more complex and thus stimulating. Undoubtedly, these benefits are important worldwide, and the value assistants can provide is aligned with the UN's Sustainable Development Goals, particularly (3) Good Health and Well-being, (8) Decent Work and Economic Growth, and (10) Reduced Inequalities. [7]
Despite the virtual assistants' advantages and benefits for society and firms, questions on technical aspects such as implementation complexity and technological maturity arise. Also, questions on the firms' potential cost savings and intangible benefits remain. Consequently, this paper will address three key problem formulations:
1) How can virtual assistants' performance be evaluated?
2) What value can virtual assistants provide to an organisation?
3) What are the potential cost savings of using virtual assistants?
In order to study the research questions, this paper is divided into six parts. The first part deals with the technical aspect of how virtual assistants work, as well as the current research on organisational value and cost savings when using virtual assistants. The next chapter covers this paper's relation to current research and focuses on how it distinguishes itself from previous research. The next part describes the methodology used for this study, including a summarised view of how this paper's research questions have been answered. The fourth section presents the findings of the research, focusing on the two key themes that have been taken into consideration, i.e.,
the technical aspects of virtual assistants and what socioeconomic value they can provide to an organisation. Chapter five provides a deeper discussion of the results, and the paper is then summarised with the conclusions from the study to answer the research questions.
Thanks to the collaboration with IBM and the company for which the implementation was studied, the problem formulations are investigated using the IBM Watson Assistant. In this case, two assistants are currently being developed for the company in question; one for Ethics & Compliance and one for the IT Service Desk. The work has been ongoing for eighteen weeks with a small team of IBM consultants. Both assistants are the results of a pilot project, and are not yet considered production ready. These two assistants are used for this paper, both to gain insights into implementation processes and to evaluate them in an industrial environment.
II. BACKGROUND - TECHNICAL PERSPECTIVE
This section focuses on the two core challenges discovered when creating virtual assistants: how to understand user intents and how to perform the requested tasks. Due to the complex nature of these key functions, the former is typically solved using advanced and combined methods for Natural Language Understanding. [6] [8]

To further break down the concepts of virtual assistants, a range of terms are frequently used. These terms are described below, using the example sentence "What are the opening hours for the store in Silicon Valley?" to clarify them:
• Utterance. A user's request or statement. In the example, the utterance is "What are the opening hours for the store in Silicon Valley?".
• Intent. A specific goal or idea conveyed by the utterance. In the example sentence, the intent would be to find out the opening hours of the store.
• Entity. A term or object which provides context for an intent. [9] Here, Silicon Valley is recognised as a location.
• Domain of knowledge or Skill. A domain covers a range of intents, which sets the limit of the assistant's ability. [10] [6] The skill in the example could be "customer support", including topics such as opening hours, locations, return of goods, et cetera.
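The relationships between these terms can be illustrated with a small sketch. The names and structure below are illustrative only, not Watson Assistant's actual API; the intent label, confidence value and skill set are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Understanding:
    """What an assistant extracts from a single utterance."""
    utterance: str                    # the raw user input
    intent: str                       # the classified goal
    confidence: float                 # classifier certainty, 0.0-1.0
    entities: dict = field(default_factory=dict)  # contextual terms

# The example sentence from the text, annotated by hand:
example = Understanding(
    utterance="What are the opening hours for the store in Silicon Valley?",
    intent="opening_hours",           # the goal conveyed by the utterance
    confidence=0.93,                  # hypothetical confidence value
    entities={"location": "Silicon Valley"},
)

# A skill (domain of knowledge) is simply the set of intents covered:
customer_support_skill = {"opening_hours", "store_locations", "return_goods"}

assert example.intent in customer_support_skill
```

The skill acts as the boundary of the assistant's ability: an utterance whose intent falls outside `customer_support_skill` should trigger the off-topic handling discussed below.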
Saloni Potdar, Senior Software Engineer in Cognitive Analytics and Deep Learning, identified some key factors to consider for virtual assistants for this paper. Apart from intent classification and entity recognition, she acknowledged off-topic identification, to realise when not to answer, as well as profanity filtering, to avoid obscene language in conversations. Furthermore, Potdar stated that spellchecking algorithms should be considered essential, as they make intents and entities easier to identify.
A. Dialog Flow Configuration

One way to create dialog between the user and the virtual assistant is by using predetermined rules. The rule-based dialog flows are configured by first manually adding intents and entities, and then building a conversation path for the users. Example sentences for the intents are constructed, whilst synonyms for the entities are added. After this process, the assistant is trained to recognise and classify intents from the user-inputted utterance. The entities are either based on a list of words, rule-based, or machine learnt. In the first case, the model is not trained; rather, the list is purely maintained by the developer. In the latter, the entities are apprehended based on the context in which they are used. [9] [6]
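A minimal sketch of this configuration style is shown below. The hand-written example sentences and the list-based entity synonyms mirror the process described above, but the word-overlap matching is a naive stand-in: a production system such as Watson Assistant trains a classifier on the example sentences rather than comparing words directly. All intent names and synonyms here are invented.

```python
# Hand-configured intents: each maps to a few example sentences.
INTENT_EXAMPLES = {
    "opening_hours": ["what are your opening hours",
                      "when does the store open"],
    "reset_password": ["i forgot my password",
                       "how do i reset my password"],
}

# List-based entity: a plain synonym list maintained by the developer.
LOCATION_SYNONYMS = {"silicon valley": "Silicon Valley"}

def classify(utterance):
    """Naive stand-in for intent classification: pick the intent whose
    example sentences share the most words with the utterance."""
    words = set(utterance.lower().replace("?", "").split())
    best_intent, best_score = None, 0
    for intent, examples in INTENT_EXAMPLES.items():
        score = max(len(words & set(ex.split())) for ex in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    entities = {v for k, v in LOCATION_SYNONYMS.items()
                if k in utterance.lower()}
    return best_intent, entities

intent, entities = classify("When does the store in Silicon Valley open?")
```

Even this toy version shows the division of labour: intents come from trained (here: matched) example sentences, while the list-based entity is recognised by plain lookup with no training at all.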
Another, less common, way to configure dialogues is to use machine learning. In this approach, positive and negative sample conversations are used to determine the conversation with the user. At each conversational turn, the model uses probabilistic estimations to determine the next direction of the dialog. Consequently, if a large amount of data and labeled user conversations are present, the machine learning approach might be more practical. In addition, the machine will become more practical over time due to its ability to self-improve. It can do so since it does not require an explicit dialog flow, unlike the rule-based configuration. [6]
B. Comparison

The rule-based approach works well with incremental development because of the straightforward process of creating predictable functionality and testing the feature. Predetermined conversation flows can be practical when there is little data on user interactions and the desire to create the system quickly is present. However, rule sets tend to become large and impractical in more complex systems, since the machine needs to behave naturally towards the user. In contrast, an initialised system with a large amount of available data works well with the machine-learnt conversation flow. Due to its independence, the model can be trained to create elaborate dialog. In spite of its capabilities, this approach reduces the possibility of controlling specific conversation flows. Additionally, a large set of example stories would be necessary to ensure that all entities for an intent are present, something that could be configured faster with a rule policy. [6]
Admittedly, both approaches have advantages and disadvantages. Therefore, a mixture of predetermined rules combined with machine-learnt stories can be used. However, the most common configuration today, also used by IBM Watson Assistant, is rule-based. [6]
C. Limitations

Despite the prosperity and developments in the area, virtual assistants still have limitations. For instance, a domain of knowledge or question category - such as email issues - generally supports 10-50 intents, and if the number exceeds 600 the domain would typically become impracticable. [6] Additionally, the different intents are not equally frequently required. However trivial it might sound, the assistants are also limited by the number of topics they can cover. A key factor here is to consider not only how accurate and precise the response is, but also how well the reply can help and satisfy the user.
In addition to limitations specific to virtual assistants,fundamental challenges are undoubtedly present in NLP and
NLU. In particular, multiple or conditional utterances have proved difficult for the classifiers to identify. For example, given the phrase "I want to buy an umbrella, unless you sell rain jackets", a human would identify that the intent is to buy a rain jacket. The virtual assistant would, however, typically be confused by the sentence. Furthermore, intents can only be trained using already existing data and phrasing. As a result, if one were to write a completely different utterance than those used to train the model, the virtual assistant might have difficulty understanding the user's intents.
III. BACKGROUND - ORGANISATIONAL VALUE
This section focuses on the organisational value a virtual assistant can provide to an organisation in general, as well as to the specific company studied. It includes the current process for the studied use cases and their key challenges, amongst other common challenges discussed with the company. The benefits of virtual assistants are divided into two parts, where the tangible section describes the benefits possible to quantify and the intangible section those that are not.
A. Ethics & Compliance's Internal Investigation

Before deciding to implement a virtual assistant, the Ethics & Compliance department wanted insight into whether virtual assistants would be beneficial, and made an internal investigation by questioning nine employees within the firm. The questions were created to gather information on what the users consider important in a virtual assistant, their current frequency of contact with the Ethics & Compliance team, their sentiment towards virtual assistants in general, and how experienced they were with virtual assistants.
As a result, some key conclusions could be drawn. First, it was found that users value response speed as much as receiving comprehensive advice. Second, users tend to have psychological barriers to using a virtual assistant, hence wanting to use it only for advice and not allowing the assistant to take direct action. Third, it was concluded that users find it difficult to search through the business practice policy, something that a virtual assistant could simplify. In terms of organisational benefits for the Ethics & Compliance team, it was estimated that between one half and one full-time equivalent workload would be freed by using the virtual assistant to answer questions of a simpler character. Additionally, there are intangible benefits, such as insights from conversation analytics, which could be used to increase user understanding, direct incremental improvements and clarify the content. Also, the investigation found that a virtual assistant would be valuable for users, as it would be available at all times and able to provide answers instantly, as well as enabling users to ask otherwise sensitive questions, such as questions making the employee appear incompetent or disloyal.
B. Current Process

1) IT Service Desk: The current process for the studied company's service desk is initiated by someone reporting an issue using a self-service tool in their service portal. Next, an IT Service Desk agent manually sets the category, priority and other parameters based on the issue description. Based on these parameters, the ticket is routed to the correct team to handle the inquiry. If the inquiry is routed correctly, the assigned agent starts working on a solution to the issue; otherwise, it is transferred to the correct group. The service desk is open 24/7, and issues are usually responded to within one hour.
2) Ethics & Compliance: The current process for inquiries regarding ethics and compliance begins with a generalist legal counsel being contacted over email, based on the division the inquiry comes from. The counsel then searches the company policy for relevant rules, as well as assessing the question or issue to make sure every aspect is covered. An answer is then formulated with insights into best practice, which may not be available in the policy. If the generalist is in need of further guidance, an internal specialist is contacted as a first escalation and, if necessary, an external expert is consulted.
C. Key Challenges
A key challenge for the specified use cases is long time-to-resolution. According to Forrester, one of the reasons for long time-to-resolution is limited service hours, especially for international companies operating in different time zones with centralised service desks. Other reasons are queues and multistep routing between agents, which decrease the user experience as well as incur costs to the service organisation. [11] Furthermore, repetitive basic tasks are common, time-consuming and costly. For instance, password resets commonly account for 20-50 % of IT support requests. [12]
The key challenges for the company for which the practical implementation was studied differed between the use cases. For the IT Service Desk, a key challenge was inconsistency between different cases and countries, which may be a result of poor documentation and a lack of predefined answers for frequently asked questions. Due to the lack of documentation and the structure of the organisation, with multiple different IT Service Desks, new agents have a long learning process. Furthermore, recurring simple tasks are a key challenge. The global service desk manager mentioned that around 40 % of the inquiries regard Outlook issues, distribution lists, shared mailboxes, or are otherwise email related. Those inquiries could potentially be automated using a virtual assistant.
For the team working with ethics and compliance, a key challenge was a lack of standardisation and consistency, resulting in answers to inquiries being highly dependent on the advisors' individual interpretation of the policies. Moreover, as there is no standardisation and as the team is spread out around the globe with close to no native English speakers, it requires a lot of energy and time to keep the language at a professional level and to make sure it cannot be misinterpreted. Another key challenge is recurring questions regarding basic rules, where even the simplest inquiries require at least 10 minutes of time, according to a legal counsel at the company in question. Furthermore, availability is a key challenge, as limited service
hours and different time zones result in inconvenient waitingtimes.
D. Use Cases

There are three general types of interactions for virtual assistants.

Agent assist refers to a solution where service agents are augmented by blending automation with human labour. One approach to this is to let the virtual assistant handle routine tasks like gathering relevant information and authenticating the users, and then routing them to a relevant human agent to resolve the issue. A different approach is to have a virtual assistant that monitors the conversation and provides the human agent with suggested responses. A suggested response can either be sent to the user at the click of a button, modified, or rejected. The agent's decision can in this case be used to further train the assistant to improve its accuracy. [13]
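The second agent-assist approach can be sketched as a small feedback loop: the assistant proposes a response, the human agent accepts, modifies or rejects it, and the decision is logged as a training signal. Everything here is a hypothetical illustration; the function names, the canned-response lookup and the log format are not from any real system.

```python
def suggest_response(utterance):
    """Hypothetical suggestion lookup; a real system would use the
    assistant's intent classifier and response templates instead."""
    canned = {"password": "You can reset your password at the self-service portal."}
    for keyword, response in canned.items():
        if keyword in utterance.lower():
            return response
    return None

training_log = []  # (utterance, suggestion, agent_decision) triples

def agent_review(utterance, decision, final_text=None):
    """The human agent accepts, modifies or rejects the suggestion;
    the decision is recorded so the model can later be retrained."""
    suggestion = suggest_response(utterance)
    sent = {"accept": suggestion,
            "modify": final_text,
            "reject": final_text}[decision]
    training_log.append((utterance, suggestion, decision))
    return sent

reply = agent_review("I forgot my password", "accept")
```

The point of the design is the last line of `agent_review`: every accept, modify or reject becomes labelled data, which is how the agent's decisions "further train the assistant" as described above.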
The second use case refers to customer-facing virtual assistants that fully answer basic questions and run predefined dialogs. More complex inquiries can be handled by searching a knowledge base or by handing the conversation over to a suitable human agent. [11]
The third use case, which is the focus of this report, refers to internal virtual assistants. These assistants are designed similarly to the virtual assistants in the customer self-service use case, with the exception of the audience. IT Service Desks and human resources are common support functions where virtual assistants are implemented. [11]
IV. THE PAPER’S RELATION TO CURRENT RESEARCH
Current research shows how virtual assistants function, and which key factors to consider in an assistant. From a technical perspective, these key factors are good indicators to use when evaluating the assistants. [3] [6] When considering organisational value and cost savings, the literature provides guidelines as to how virtual assistants can be evaluated using both tangible and intangible metrics. [11]

The purpose of this paper is to combine technical and economic evaluations in order to cover virtual assistants from both perspectives. Through this combination, the aim is to understand virtual assistants using qualitative and quantitative measurements to investigate their performance and value.
V. METHOD
The method was divided into eight steps (see figure 1):
1) Pre-study
2) Practical implementation
3) Understanding the organisation
4) Data collection by user testing
5) Performance evaluation
6) Calculation of key performance indicators
7) Calculation of potential cost savings
8) Analysis of deployment timing
Phase 1. To investigate and answer the research questions, quantitative and qualitative methods were combined. The research was initiated by exploring current research on the topic from an organisational and technological perspective. The literature study was complemented by discussions with IBM subject matter experts. In the next step, a team of IBM consultants was followed in their practical implementation of the virtual assistants, to gain an understanding of the effort and skills required for implementation. During this step, a deeper understanding of virtual assistants, and of Watson Assistant in practice, was acquired. Following this, discussions and interviews were conducted with employees of the firm to get a thorough understanding of the organisation, their processes and their key challenges. This was structured as multiple meetings, going from understanding the purpose of the project to understanding their processes and key challenges in detail. The meetings were conducted with the IT Service Desk managers, an Ethics & Compliance legal counsel and the head of Ethics & Compliance.
Phase 2. The second phase of the research method focused on collecting, analysing and drawing conclusions from data. Before collecting the data, research objectives and questions were developed in a formulation stage. To reach the objectives, a quantitative sampling method was used by conducting user testing and sending out questionnaires (see appendices D and F). The user was first asked to answer general questions regarding their usage of the services provided by the IT Service Desk and the Ethics & Compliance team. This part was followed by a testing process where the user was asked to get familiar with the virtual assistant by chitchatting, that is, writing things such as "How are you?" or "Tell me a joke". Once the user was familiar with the virtual assistant, they were asked to phrase multiple questions within pre-specified topics. To help the user, example questions on the topics were given. The reason for dividing the testing into different topics was to get a broader range of questions with a more realistic distribution. Otherwise, the risk would be that some testers only asked questions within a specific topic, for example email-related issues or corruption. The testing phase was followed by a survey about the users' general experience of the virtual assistant and how they thought they would use it. This survey and testing process was sent to employees within the company for which the practical implementation was studied, specifically a group of Ethics & Compliance ambassadors working in various countries and positions with a shared interest in ethics and compliance. Furthermore, a questionnaire was sent out to the Ethics & Compliance team as well as the IT Service Desk to collect data regarding time spent on tasks of different complexity levels (see appendices E and B).
Once the survey and testing process was completed, the conversation logs were extracted from Watson Assistant and cleaned. The full data cleaning process is described under Data Cleaning.
The conversations were then analysed using existing tools from the IBM Watson Assistant team, primarily the Measure Notebook and the Effectiveness Notebook, to calculate performance metrics such as precision and helpfulness. An estimate of the amount of traffic that would go through the assistant rather than a live agent was made based on historical statistics and the questionnaire answers. [14] The results were then combined with other contextual dimensions, such as the average cost rate of the different divisions' personnel, to calculate potential cost savings.
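The cost-saving calculation described above can be sketched as follows. All input figures here are placeholders; the company's real ticket volumes, automation share and cost rates are confidential and are not reproduced in this report.

```python
def potential_savings(tickets_per_year, automation_share,
                      minutes_per_ticket, cost_rate_per_hour):
    """Annual cost saving if `automation_share` of inquiries are fully
    handled by the assistant instead of a human agent."""
    automated = tickets_per_year * automation_share
    hours_saved = automated * minutes_per_ticket / 60
    return hours_saved * cost_rate_per_hour

# Placeholder figures, not the studied company's actual numbers:
saving = potential_savings(tickets_per_year=10_000,
                           automation_share=0.40,  # e.g. an email-related share
                           minutes_per_ticket=10,
                           cost_rate_per_hour=50)
# 10_000 * 0.40 = 4_000 tickets; 4_000 * 10 / 60 hours; times the cost rate
```

The estimate of `automation_share` is the step the questionnaire answers and historical statistics feed into; the other factors come from the time-per-task survey and the divisions' cost rates.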
The report was then concluded with an analysis of whether the virtual assistants were ready for deployment and whether the timing is right to invest in the technology. This was based on performance, IBM expert statements, current research and potential cost savings. The method is visualised in figure 1.
Fig. 1: Visual representation of the applied method.
A. Technical Performance

To evaluate the virtual assistants' technical performance in the specific use cases, two IBM-created Jupyter Notebooks were used together with conversation logs from the conducted user tests. The first notebook, the Measure Notebook, contains a set of metrics describing an assistant's overall performance, and was developed to identify well-performing areas as well as lagging areas. The second notebook, the Effectiveness Notebook, focuses on the relative performance of each intent and entity. For the technical evaluation, the Measure Notebook was used first. From this, an annotation file was created and used for the Effectiveness Notebook. The Effectiveness Notebook could then analyse the intents after a partly manual annotation. From this, several metrics could be calculated, of which the most relevant for this report were precision and helpfulness.
1) Measure Notebook: This notebook gave two key performance indicators, coverage and average confidence. Coverage measures the system on an utterance level, compared to confidence, which measures the system on a conversation level. In other words, the two metrics represent the portion of inquiries the system can respond to and how certain it is of the identified intents. Coverage is measured based on a predetermined confidence threshold on the classified intents. [15] Another useful measure based on helpfulness is the task fulfillment rate, which describes how well the assistant can help the users reach their end goals without the help of a human agent.
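The relationship between the confidence threshold and coverage can be illustrated with a simplified sketch. This is not the Measure Notebook's implementation: the threshold value and log confidences are invented, and the sketch averages confidence per utterance rather than per conversation.

```python
CONFIDENCE_THRESHOLD = 0.2   # hypothetical; the real threshold is configurable

# Top-intent classifier confidence for each logged utterance (made-up data):
log_confidences = [0.93, 0.61, 0.15, 0.72, 0.08, 0.55]

# An utterance counts as covered when the classifier is confident enough
# to answer rather than fall back to a human agent or an "I don't know".
covered = [c for c in log_confidences if c >= CONFIDENCE_THRESHOLD]

coverage = len(covered) / len(log_confidences)   # share of inquiries answered
avg_confidence = sum(covered) / len(covered)     # certainty on what was answered
```

The sketch makes the trade-off visible: raising the threshold lowers coverage but raises the average confidence of what the assistant does answer.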
2) Effectiveness Notebook: In this notebook, a confusion matrix can be built and used to calculate True Positives, False Positives, False Negatives and True Negatives in the intent classification. Other summary metrics, such as the number of utterances, average helpfulness, precision and variance over intents, were displayed using the notebook. An important note on helpfulness is that this measure has a subjective definition, and can therefore vary depending on the goals of the assistant. This notebook was modified for this paper to be able to examine precision and helpfulness with and without chitchat as a contributing factor. Chitchat refers to conversations containing non-use-case-specific information and is often associated with utterances such as "How are you?" or "Tell me a joke". [16]
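The with-and-without-chitchat modification can be sketched in a few lines. The annotation records below are invented, and the simple correct-over-classified ratio is a stand-in for the notebook's per-intent confusion-matrix computation.

```python
# Annotated utterances: (predicted_intent, prediction_correct, is_chitchat)
annotations = [
    ("email_issue",    True,  False),
    ("email_issue",    False, False),
    ("reset_password", True,  False),
    ("chitchat",       True,  True),
    ("chitchat",       True,  True),
]

def precision(records, include_chitchat=True):
    """Share of classified utterances whose predicted intent was correct."""
    rows = [r for r in records if include_chitchat or not r[2]]
    return sum(r[1] for r in rows) / len(rows)

with_chitchat = precision(annotations)
without_chitchat = precision(annotations, include_chitchat=False)
```

In this toy data, precision drops once chitchat is excluded: chitchat intents are easy to classify, so including them flatters the score, which is exactly why the notebook was modified to report both figures.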
3) Data Cleaning: In order to use the Effectiveness Notebook, an annotation file created from the Measure Notebook was used. This annotation file was partly generated by the notebook, but needed a human's annotation to determine precision and helpfulness. The manual work consisted of noting whether the correct intent had been chosen, whether the response was correct, and whether the response was helpful for the user. For this paper, incorrect usage was removed from the logs, for instance when users asked questions in unsupported languages or wrote utterances incomprehensible to a human assistant. Furthermore, the assistant's user interface presents pre-written response options in some dialog stories, instead of letting the user write freely and then recognising the intents within the dialog story. When these alternatives occurred in the logs, they were removed, with the motivation that they would otherwise skew the results and display stronger intent classification, while in reality the user simply chose between alternatives.
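The two filtering rules described above can be sketched as a small cleaning pass. The record format and flag names are hypothetical; in practice the incomprehensible-utterance flag came from the manual annotation, not from the logs themselves.

```python
# Each log entry: utterance text plus flags set during annotation.
raw_logs = [
    {"text": "My Outlook calendar is broken",
     "from_button": False, "comprehensible": True},
    {"text": "asdkjh qq",                     # gibberish: dropped
     "from_button": False, "comprehensible": False},
    {"text": "Yes, create the ticket",        # button click: dropped
     "from_button": True,  "comprehensible": True},
]

def clean(logs):
    """Keep only free-text, comprehensible utterances. Button clicks are
    removed because counting them would overstate intent-classification
    strength: the user picked an option rather than being understood."""
    return [e for e in logs if e["comprehensible"] and not e["from_button"]]

cleaned = clean(raw_logs)   # only the first entry survives
```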
4) Assumptions: In the evaluation, two primary assumptions were made. First, the distribution of low, medium and high complexity tasks determined from the surveys is representative of the logs. This means that the same complexity levels will be used when calculating coverage on the different complexity levels. Second, coverage and task fulfillment rate differ between complexity levels by a factor of 1.7. That is, coverage and task fulfillment rate are 70 % higher for low complexity tasks than for medium complexity tasks. The medium complexity coverage and task fulfillment rate are consequently 70 % higher than for high complexity tasks. These numbers have been determined with the help of the two contacts at the company. These contacts have experience within their respective fields and have been involved in developing the virtual assistants, and are hence familiar with the assistants' abilities and limitations.
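Given these two assumptions, per-complexity coverage follows from the overall figure: overall coverage is the complexity-share-weighted mean, and each level lies a factor 1.7 above the next. The sketch below shows that back-calculation. It is an assumed reading of the assumptions above, with illustrative inputs, not a calculation reproduced from the paper:

```python
# Back out per-complexity coverage from the overall coverage, assuming:
#   overall = s_low * c_low + s_med * c_med + s_high * c_high
#   c_low = 1.7 * c_med  and  c_med = 1.7 * c_high  (the factor-1.7 assumption)

def split_by_complexity(overall, s_low, s_med, s_high, factor=1.7):
    c_high = overall / (s_low * factor**2 + s_med * factor + s_high)
    return factor**2 * c_high, factor * c_high, c_high

# Illustrative inputs: 72 % overall coverage with workload shares of
# 46 % / 35 % / 19 % for low / medium / high complexity.
c_low, c_med, c_high = split_by_complexity(0.72, 0.46, 0.35, 0.19)
```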
VI. RESULTS
The following section displays the results from the practical investigation and has been divided into two parts for each assistant. The first part shows results from the user tests, based on metrics from the two notebooks presented in the Method section. The second subsection concerns the economic evaluation and displays results from the surveys for the IT Service Desk and the Ethics & Compliance department.
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS, STOCKHOLM, SWEDEN 2020

A. IT Service Desk

1) Technical Evaluation: For this paper, 108 conversations and roughly 1,000 messages were initially logged from the user tests. According to the results from the Measure Notebook, 72 % of the questions were covered by the virtual assistant. Furthermore, the assistant had an average confidence of 61 % when determining the intents.
From the Effectiveness Notebook, as Figure 2 below displays, precision was calculated to 55 % and helpfulness to 72 %.
Fig. 2: Visual representation of precision, helpfulness and their variances (red line) for the IT Service Desk virtual assistant. In this representation, chitchat is included.
Next, all chitchat was excluded, which resulted in a precision of 48 % and a helpfulness of 69 % (see Figure 3).
Fig. 3: Visual representation of precision, helpfulness and their variances (red line) for the IT Service Desk virtual assistant. In this representation, chitchat is excluded.
2) Economic Evaluation: This section focuses on the potential cost savings of using virtual assistants, based on the survey taken by 8 IT Service Desk agents as well as the conversation logs extracted from the user tests.
TABLE I: IT Service Desk workload survey

                       Low complexity   Medium complexity   High complexity
Average percent            46 %              35 %                19 %
Standard deviation         20 pp              9 pp               13 pp
Average time on task        8 min            27 min              68 min
Range time on task       5 - 11 min       19 - 35 min         48 - 88 min
From the survey regarding the IT Service Desk virtual assistant, the following results were acquired.
TABLE II: IT Service Desk virtual assistant survey

                                          Average     Range
IT related issues per quarter               7.1       5 - 9.1
Issues brought up with service desk         4.6       2.7 - 6.4
Have jobs related to IT                    53 %
The assistant's overall understanding      31 %
Assistant usage instead of own research    30 %
Will use assistant next time               43 %
Overall satisfaction                       44 %
B. Ethics & Compliance
1) Technical Evaluation: The user tests for the Ethics & Compliance assistant resulted in 90 logged conversations and just under 700 messages. Of these, the virtual assistant was able to cover 74 %. The average confidence for this assistant was 64 %, showing how certain the assistant was when identifying the intents from the utterances.
From the Effectiveness Notebook, as Figure 4 displays, precision was calculated to 74 % and helpfulness to 73 %.
Fig. 4: Visual representation of precision, helpfulness and their variances for the Ethics & Compliance virtual assistant. In this representation, chitchat is included.
Next, all chitchat was excluded, which, as Figure 5 displays, resulted in a precision of 73 % and a helpfulness of 68 %.
Fig. 5: Visual representation of precision, helpfulness and their variances for the Ethics & Compliance virtual assistant. In this representation, chitchat is excluded.
2) Economic Evaluation: This section focuses on the potential cost savings of using virtual assistants, based on the survey taken by 7 legal counsels working in Ethics & Compliance as well as the conversation logs extracted from the user tests.
TABLE III: Tangible metrics

Average salary             90 000 €
Vacation pay                9 000 €
Overhead costs                 40 %
Average labor tax rate         40 %
CREC                      178 200 €/year
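For reference, the CREC value in Table III is consistent with applying the overhead and labor tax rates as surcharges on salary plus vacation pay. This formula is an assumed reading of the table, but it reproduces the stated figure:

```python
# CREC reproduced from Table III, assuming overhead and labor tax are both
# applied as surcharges on top of salary plus vacation pay.
salary, vacation = 90_000, 9_000
overhead, tax = 0.40, 0.40

crec = (salary + vacation) * (1 + overhead + tax)  # 178 200 €/year
```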
From the survey given to the Ethics & Compliance team about their workload, the following results were acquired.
TABLE IV: Ethics & Compliance workload survey

                              Low complexity   Medium complexity   High complexity
Average inquiries per month        19               12.5                6.2
Standard deviation                 22.2             14.2                6.9
Average time on task               24 min            2.2 h              5.9 h
Range time on task              15 - 32 min      1.4 - 2.9 h        4.4 - 7.3 h
From the survey regarding the Ethics & Compliance virtual assistant, the following results were acquired.
TABLE V: Ethics & Compliance virtual assistant survey

                                          Average     Range
E&C related questions per quarter           2         0.9 - 3.1
Questions brought up with E&C               1.4       0.4 - 2.5
The assistant's overall understanding      51 %
Assistant usage instead of own research    50 %
Will ask assistant next time               73 %
Overall satisfaction                       63 %
VII. DISCUSSION
This section discusses the observed results, divided into the three key aspects mentioned throughout the paper. The first part discusses the technical performance of IBM Watson Assistant in the two specific use cases, the IT Service Desk and the Ethics & Compliance department. The second part presents the organisational value of the two virtual assistants, examined in regard to tangible as well as intangible benefits. Finally, deployment timing and technological maturity are discussed for the specific use cases and from a general perspective.
A. Technical Performance
1) IT Service Desk in Comparison to Ethics & Compliance: The results show better coverage and precision for the assistant implemented for the Ethics & Compliance department compared to the one for the IT Service Desk. Equivalent results could be seen in user satisfaction and perceived understanding. This result can have several causes, divided here into the subtopics Subject differences in IT and Ethics & Compliance, Survey differences, and Difference in user groups.
Subject differences in IT and Ethics & Compliance. One factor as to why the performance was better for the Ethics & Compliance assistant could be differences between the use cases. IT as a subject is broad, and it therefore requires more effort to cover all questions employees might have. In comparison, Ethics & Compliance follows the company policy with quite specific information. The topics and questions are of a less diverse nature, making it easier to foresee what might be asked. In terms of independence, the IT Service Desk assistant could be made more independent once sufficiently refined, due to the nature of the questions. The Ethics & Compliance assistant would in contrast be difficult or impossible to make independent, due to the risks of giving wrong or ambiguous answers.
Survey differences. Despite efforts to make the surveys as similar as possible, differences could be found in terms of example utterances. The survey for the Ethics & Compliance assistant had more specific example questions on topics, partly to ease the process of inventing a new question for the user and partly to make the topic itself easier to understand. As an example, creating a question on competition law can be considered more difficult than a question on email issues. Unlike email issues, competition law needs more specific example utterances to avoid over-representation of utterances such as "What is competition law?" or "Tell me about competition law". This might however skew the results, as the Ethics & Compliance test group displayed a greater ability to ask questions similar to those the Ethics & Compliance counsel had foreseen and implemented. In contrast, the IT test group asked more general questions not foreseen by the IT Service Desk, such as "How do I respond to an email?".
Difference in user groups. Groups of testers will have different experiences using the virtual assistants due to varying expectations and phrasing that may be more or less suited to the assistant. These differences may stem from the users' backgrounds and experience with similar technology. The company where the practical implementation was studied chose applicants for the user test groups based on what they thought was representative of future usage. Despite this, the risk of over- and under-represented groups is still present. This evaluation could therefore have been improved by using several different user test groups and calculating an average over them, to diminish over- and under-representation in the tests.
2) User Understanding: For the purpose of displaying accurate measurements of the different metrics - that is, coverage, confidence, precision and helpfulness - data cleaning of various incorrect uses and repetition errors has been performed. In reality, however, one must also consider these when evaluating the assistant. In the user tests, cases occurred of people using the assistant as a search engine. Queries such as "covid-19" and other utterances not specific to the use cases can needlessly harm user satisfaction. Users were also found to repeat an utterance instead of rephrasing it when the virtual assistant displayed a lack of understanding. Due to this, one should consider the effects of increased knowledge of how to use virtual assistants effectively. This would result in better performance, which leads to improved user satisfaction.
3) Confidence Threshold: When the virtual assistant considers itself confident enough in its intent classification to give a response, it does so based on a confidence threshold. Increasing this threshold would in theory lower the calculated coverage but potentially increase satisfaction, since a higher confidence in intent classification is often connected to accurate responses. However, the analysed logs showed that some intents with a confidence close to the threshold gave more accurate responses than those with significantly higher confidence. Consequently, an analysis of threshold determination is left for future research, where the focus should be on choosing a threshold that minimises dissatisfaction.
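The trade-off described above can be sketched as a threshold sweep: coverage shrinks monotonically as the threshold rises, while the accuracy effect must be judged from annotated logs. The confidences below are illustrative, not the study's data:

```python
# Hypothetical sketch of the threshold analysis left for future research:
# sweep the confidence threshold and observe how coverage falls as the
# threshold rises. The log of confidences is an assumed example.

def coverage_at(confidences, threshold):
    return sum(c >= threshold for c in confidences) / len(confidences)

log = [0.9, 0.8, 0.3, 0.55, 0.2, 0.7, 0.95, 0.45]
sweep = {t / 10: coverage_at(log, t / 10) for t in range(2, 9)}
# Coverage is non-increasing in the threshold, e.g. sweep[0.2] >= sweep[0.8].
```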
4) Chitchat and How it Affects Performance: Another relevant factor from a technical perspective is the precision and helpfulness when evaluated with and without the presence of chitchat. Chitchat is not necessarily considered part of the assistant's domain and would therefore arguably be unnecessary to display. Furthermore, chitchat has intents which are comparatively easy to classify and therefore increase the overall performance. This could skew the results when chitchat is over-represented and thus yield higher scores. For the assistant in the IT Service Desk, the precision was significantly lower when chitchat was excluded from the analysis. The assistant for Ethics & Compliance was however only slightly affected. Notable, however, is the importance of chitchat in a virtual assistant, and that user satisfaction could lessen without it. The primary reason for this is that chitchat makes the assistant more human and fun to use. Furthermore, the user might not understand the assistant without being able to ask questions such as "What can you do?" or "Who are you?". Chitchat is also especially important in the beginning, when the user wants to know the assistant's abilities and discover its traits. Once familiar with the assistant's limitations and possibilities, users' chitchat would decrease and focus would shift to the actual questions.
5) Coverage, Precision and Helpfulness: In this paper, these metrics have been used to evaluate the technical performance and potential cost savings. Helpfulness has in this case been used to understand how well the conversations could have been contained, but it can also be presented as a quantitative metric of how satisfied a user would be with the responses from the assistant. Precision has been chosen over coverage in this paper to understand how well the virtual assistant can identify the intent based on the utterance. This is primarily because coverage was based on a threshold rather than statistical facts and gold labels, unlike precision. However, one could also combine the two, either by taking an average of them or by manually calculating coverage and using it instead.
6) The Assistants' Abilities over Different Complexity Levels: One might argue that the coverage and task fulfillment rate based on the assumptions made under Assumptions could be considered high, especially for low complexity tasks. However, this assumption has been made based on the quantitative analysis and on discussions with the contacts at the company (see section Assumptions) and is therefore deemed reasonable in this paper. To understand how much time the company can save by using the assistant, one can see that the Ethics & Compliance use case would result in 21 minutes saved on average per low complexity task. Furthermore, 40 minutes would be saved on medium complexity tasks, and 37 minutes on high complexity tasks. Despite this, a deeper analysis of the different complexity levels and how the assistant performs on each level should be made in future research.

B. Organisational Value
1) Tangible Benefits: Tangible benefits are defined as benefits that can be measured within the organisation. This could be potential cost savings in terms of a reduced workforce or a reduction of the average handle time. The main tangible benefits are divided into two parts:
• Task fulfillment rate
• Benefits specific for the IT Service Desk use case
Task fulfillment rate. Task fulfillment rate refers to the portion of a task the virtual assistant on average can fulfill, i.e. how well the assistant can help users reach their end goal without the help of a human. The task fulfillment rate will increase as the virtual assistant is improved and users become more comfortable using the technology.
Benefits specific for the IT Service Desk use case.
Automatically setting ticket parameters. As the virtual assistant can automatically set parameters in the tickets for the IT Service Desk, the manual labour required from the human agents will decrease, leading to a potential cost saving.
2) Intangible Benefits: Intangible benefits are defined as benefits that are hard or impossible to measure. This could be improved user experience or decreased time to resolution. In this section, the intangible benefits are divided into two parts:
• General intangible benefits realised by the entire firm
• Benefits specific for the Ethics & Compliance use case
General Benefits
Reduced time-to-resolution and 24 x 7 x 365 support.
As the virtual assistant is available 24 x 7 x 365 and can instantly answer questions, it will reduce the time to resolution, especially for international companies operating in different time zones with centralised service desks. Reduced time-to-resolution improves the user experience and allows users to get back to their day-to-day activities quicker.
Competitive advantage as an early adopter. Being an early adopter of virtual assistants, and of digitalisation in general, yields a competitive advantage as the company will be better adapted to the technology and therefore use it more efficiently. This is due to the technology being built into the organisational structure, culture and processes. Furthermore, it will enable user testing that generates statistics which can be evaluated to improve the performance of the implemented virtual assistants.
Increased brand value. With the use of virtual assistants, companies are seen as innovative with their AI solutions, something that could increase the brand value. This benefit is more valuable for companies with client-facing virtual assistants but can be realised for internal roles as well.
Improved employee satisfaction. As the virtual assistant will handle repetitive basic inquiries, human agents can focus on the more advanced and challenging tasks. This will lead to improved employee satisfaction and performance.
Statistics to further improve ways of working. The virtual assistant generates valuable data that, when analysed, could be used to optimise ways of working. This would create more user-centered departments with an increased user understanding and a better user experience. The data could also be used to find employee pain points and system flaws. For example, if many users ask how to connect to the VPN, the VPN solution might need to be made more user friendly. An example from the Ethics & Compliance use case is that if many users ask about offering gifts, the company might want to include this in corporate training.
Scalability. An important factor for IT solutions is scalability. As the virtual assistants can be scaled to handle an unlimited number of users, the increase in workforce needed for an increased service demand would be reduced. Despite an increased total service demand, the human workload would be less affected than without automation, thanks to the virtual assistant's ability to handle a portion of the cases. This would result in fewer new hires and therefore decrease the total cost of an increased service demand. This will be especially important for the studied IT Service Desk, as one of their key challenges, a long learning process for new hires due to a lack of documentation and a complex organisation, results in an expensive on-boarding process. Furthermore, the process of finding and hiring new employees is expensive in itself.
Benefits Specific for Ethics & Compliance
Increased awareness of ethics and compliance. By using a virtual assistant, users might feel more comfortable discussing difficult topics such as suspected fraud, corruption or harassment than if they were to speak with a human legal counsel. According to a legal counsel at the studied company, this also applies to non-difficult conversations, since employees do not wish to give the appearance of not understanding and do not wish to consume too much of the counsel's time. The use of a virtual assistant will therefore lead to increased awareness of ethics and compliance, an improved user experience and an increase in reported cases of harassment, fraud and corruption. Furthermore, this will increase the compliance of the firm and help reduce the amount of harassment, fraud and corruption.
C. Deployment Timing and Technological Maturity
According to Saloni Potdar, algorithm development team lead for IBM Watson Assistant, the technological advancement has come far enough to be efficient in everyday applications. Potdar reckons it is time to implement a virtual assistant, and that the sooner the deployment is made, the better. This is due to two main reasons: internal technological maturity and external technological expectations. On the first point, she considers IBM Watson Assistant technologically mature enough to be deployed even if the assistant is not fully developed. This way, the system can be provided data to build up the service, while giving the advantages of automation in the already developed areas. She explains that a way to maintain a good outward impression while building the assistant is to initially trust the system less and let a human supervise its responses, then let the system become more independent as it evolves. In terms of external technological expectations, Potdar notes the technological hype which has surrounded virtual assistants and concludes that the current hype, and therefore expectations, are lower than a few years ago. Due to a general acceptance of the current limitations of natural language processing and understanding, users accept failures today that might not be accepted in the future.
Potdar's statement is directly connected to Gartner's Hype Cycle (GHC), a concept that graphically depicts the common pattern that follows each newly arisen technology or
other invention. Through the GHC, all new developments consist of five phases, where emphasis is laid on the inflated expectations around the invention. [17]
Fig. 6: The GHC for Artificial Intelligence in July 2019, with Virtual Assistants circled. [18]
According to Gartner (see Figure 6), virtual assistants are in the Trough of Disillusionment phase. In this phase, interest wanes due to unmet expectations and investments are low; technologies only survive if the providers improve the product in a manner that satisfies early adopters. In the next phase, the Slope of Enlightenment, the organisational benefits of the technology become concrete and tangible, which leads to more enterprises starting pilots, given that the technology survives. [19]
This introduces a discussion on whether or not to deploy a virtual assistant before the Slope of Enlightenment. Gartner estimates that the Plateau of Productivity will be reached within two to five years, so the time between phases is short. The risk of waiting is that others will have developed and deployed their assistants by then, resulting both in competitors with already developed assistants and in increased customer expectations due to the technology's maturity. In contrast, if one were to deploy today, the risks of unmet expectations and a longer payback period are present. However, if the implementation is successful, there is the clear advantage of having a mature and experienced virtual assistant as an early adopter of the technology.
VIII. CONCLUSION
The conclusions of this paper are the following. First, technological performance can be represented using quantitative metrics such as coverage, confidence, precision and helpfulness, and can be complemented with qualitative measures such as user satisfaction and perceived user understanding. From a technical perspective, one should not only consider the specific use case the assistant is implemented for, but also how chitchat can improve the user experience. Furthermore, one should consider the effects of increased knowledge of how to use virtual assistants effectively. This will result in an increased user understanding and better performance, which will lead to improved user satisfaction. A deeper analysis of the confidence threshold's effect on coverage is left for future research.
In terms of technological performance, one should also consider the technology's limitations and maturity. The specific technological performance is subjective, and the technology's limitations as well as its maturity can be used to complement the evaluation of virtual assistants. When considering maturity, one should include the firm's internal technological and organisational maturity as well as the external technological expectations. Further analysed was the timing of market entry, where one should especially consider the Gartner Hype Cycle as well as the risks and advantages of being an early adopter of the technology.
Next, the value a virtual assistant can provide to an organisation includes reduced time-to-resolution as well as the advantage of all-hours support. It can also improve employee satisfaction and simplify scalability. Moreover, the statistics from the conversations are valuable for further improvements, both of the virtual assistant and of ways of working. Finally, it can be concluded that due to its ability to primarily help with low and medium complexity tasks, a virtual assistant can make agents more efficient and more satisfied with their workplace while lessening their workload.
Conclusions specific to the use cases show that an assistant implemented for a narrower use case, that is, the Ethics & Compliance assistant, can be implemented more easily and perform better in a less developed environment. The potential cost savings for the Ethics & Compliance department were calculated to 139 508 €/year. It is however difficult in this use case to make the assistant independent of human legal counsel, considering the importance of clear and unambiguous replies. A broader use case, such as the IT assistant with distinct questions within specific domains, requires more effort to perform at a high level but may be even more beneficial and independent once sufficiently refined. The potential cost savings were calculated to 65 958 €/year for the IT Service Desk assistant in its current state.
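As a rough sanity check on the scale of these savings, the Ethics & Compliance per-task minutes saved reported in the Discussion (21, 40 and 37 minutes) can be combined with the average monthly inquiry volumes from Table IV. This only illustrates the time dimension; converting to euros requires the CREC-based hourly cost and survey details not reproduced here, and whether the volumes are per counsel or team-wide is left open:

```python
# Rough scale of the Ethics & Compliance time savings: per-task minutes saved
# times the average monthly inquiry volumes from Table IV.
inquiries_per_month = {"low": 19, "medium": 12.5, "high": 6.2}
minutes_saved = {"low": 21, "medium": 40, "high": 37}

total_min = sum(inquiries_per_month[k] * minutes_saved[k] for k in minutes_saved)
hours_per_month = total_min / 60  # roughly 18.8 hours per month
```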
APPENDIX A
IT SERVICE DESK COST SAVINGS CALCULATIONS
1) What amount of tasks (in %) do you deal with on the following complexity levels?
• Low complexity
• Medium complexity
• High complexity
2) How much time does it take you on average to handle an inquiry of the following complexity? Please note that this is your total time spent on the question, including reading and understanding the question, searching for information on the topic, formulating an answer and following up. Please exclude waiting/idle time.
• Low complexity
– Less than 5 min
– 5 - 10 min
– 10 - 20 min
– 20 - 40 min
– 40 - 60 min
– 1 h or more
• Medium complexity
– Less than 5 min
– 5 - 10 min
– 10 - 20 min
– 20 - 40 min
– 40 - 60 min
– 1 h or more
• High complexity
– Less than 5 min
– 5 - 10 min
– 10 - 20 min
– 20 - 40 min
– 40 - 60 min
– 1 h or more
APPENDIX D
IT SERVICE DESK ASSISTANT SURVEY & TESTING
A. Introduction

This is a survey and a testing process to gain valuable insights into how the IT Service Desk virtual assistant performs.
In the following steps, you are going to be asked 3 general questions about how you work with IT related issues. You are then going to be guided through a testing process to test the assistant by asking a series of questions on different topics. After the testing process, we would appreciate if you could answer 3 questions regarding the virtual assistant and how you might utilise it.
NOTE: The following will NOT be presented to your supervisor. When presenting the conclusions of this survey, you will be anonymous.
B. Survey Part 1, General Questions

1) How many IT related issues do you have each quarter?
• Less than 2
• 2 - 6
• 7 - 11
• 12 - 15
• 16 or more
2) For how many of those issues do you ask the IT Service Desk for assistance?
• Less than 2
• 2 - 6
• 7 - 11
• 12 - 15
• 16 or more
3) Is your job related to IT?
• Yes
• No
C. Testing Process

1) Begin by getting familiar with the virtual assistant by trying out general conversations, for example starting with:
• Hello!
• Who are you?
• What can you do?
• Tell me a joke
2) Please ask 2-3 questions regarding email issues/inquiries, for example: "I would like to create a shared mailbox".
3) How often did the assistant understand your email-related questions?
• Always
• Almost always
• Usually
• Sometimes
• Almost never
4) Please ask 2-3 questions regarding MS Teams, for example: "I have audio issues in Teams".
5) How often did the assistant understand your Teams-related questions?
• Always
• Almost always
• Usually
• Sometimes
• Almost never

6) Please ask 2 or more questions related to IT issues, for example: "I have received a suspicious email" or "I need WIFI access for a guest".
7) How often did the assistant understand your general IT questions?
• Always
• Almost always
• Usually
• Sometimes
• Almost never
D. Survey Part 2, Questions Regarding The Virtual Assistant

1) How often will you use the virtual assistant instead of searching for a solution yourself?
• Always
• Almost always
• Usually
• Sometimes
• Almost never
2) Will you ask the assistant next time you need IT support?
• Very likely
• Likely
• Neutral
• Unlikely
• Very unlikely
3) How satisfied are you with the virtual assistant?
• Scale from 1 - 10
4) Do you have any other comments? For example: What functionality would make you use the assistant more?
1) How many tasks do you deal with of the following complexity levels each month? (Examples were included in the survey but are not presented here due to confidentiality)
• Low complexity
• Medium complexity
• High complexity
2) How much time does it take you on average to handle an inquiry of the following complexity? Please note that this is your total time spent on the question, including reading and understanding the question, searching for information on the topic, formulating an answer and following up.
• Low complexity
– Less than 15 min
– 15 - 30 min
– 30 - 60 min
– 1 - 3 h
– 3 - 5 h
– 5 h or more
• Medium complexity
– Less than 15 min
– 15 - 30 min
– 30 - 60 min
– 1 - 3 h
– 3 - 5 h
– 5 h or more
• High complexity
– Less than 15 min
– 15 - 30 min
– 30 - 60 min
– 1 - 3 h
– 3 - 5 h
– 5 h or more
APPENDIX F
ETHICS & COMPLIANCE SURVEY AND TESTING PROCESS
A. Introduction

This is a survey and a testing process to gain valuable insights into how the Ethics & Compliance virtual assistant performs. In the following steps, you are going to be asked 2 general questions about how you work with Ethics & Compliance. You are then going to be guided through a testing process to test the assistant by asking a series of questions on different topics. After the testing process, we would appreciate if you could answer 3 questions regarding the virtual assistant and how you might utilise it.
NOTE: The following will NOT be presented to your supervisor. When presenting the conclusions of this survey, you will be anonymous.
B. Survey Part 1, General Questions

1) How many questions do you have regarding the Business Practice Policy each quarter?
• Less than 2
• 2 - 4
• 5 or more
2) How many of those questions do you ask the Ethics & Compliance team each quarter?
• Less than 2
• 2 - 4
• 5 or more
C. Testing Process

1) Begin by getting familiar with the virtual assistant by trying out general conversations, for example starting with:
• Hello!
• Who are you?
• What can you do?
• Tell me a joke
2) Please ask 1-3 questions about gifts and hospitalities, for example: "Can I invite a customer for dinner?"
3) Please ask 1-3 questions regarding corruption or conflict of interest, for example: "I suspect corruption" or "Can I hire my brother?"
4) How often did the assistant understand these questions?
5) Please ask 2 or more other questions regarding ethics and compliance, for example: "What do I do if I have been harassed?" or "What can I speak about with a competitor?"
6) How often did the assistant understand you?
• Always
• Almost always
• Usually
• Sometimes
• Almost never
D. Survey Part 2, Questions Regarding The Virtual Assistant

1) How often will you use the virtual assistant instead of searching for clarity yourself?
• Always
• Almost always
• Usually
• Sometimes
• Almost never
2) Will you ask the assistant next time you have a question regarding ethics and compliance?
• Very likely
• Likely
• Neutral
• Unlikely
• Very unlikely
3) How satisfied are you with the virtual assistant?
• Scale from 1 - 10
4) Do you have any other comments? For example: What functionality would make you use the assistant more?
• Free text
ACKNOWLEDGMENT
The authors wish to thank both IBM and the company where the assistants are developed for the opportunity to write this thesis and their help in collecting the data for this paper. Furthermore, thanks should be given to Dr. Mattias Wiggberg for the aid and advice as project supervisor from KTH. The authors would finally like to give special thanks to Mr. Andreas Herman for his professional guidance and valuable support as research project supervisor from IBM.
Johan Torssell J. Torssell is currently pursuing his B.S. degree in industrial engineering and management at the Royal Institute of Technology (KTH), Stockholm, Sweden.
Since 2018, he has been a Project Manager within IoT and Industry 4.0 at IBM, Sweden.
In this thesis, Mr. Torssell has been working on all sections with an extra focus on the organisational benefits.
Erik Persson E. Persson is currently pursuing his B.S. degree in industrial engineering and management at the Royal Institute of Technology (KTH), Stockholm, Sweden.
In this thesis, Mr. Persson has been working on all sections with an extra focus on the technical perspective.