Graphical vs. Tabular Notations for Risk Models: On the Role of Textual Labels and Complexity
Katsiaryna (Kate) Labunets, TU Delft, The Netherlands (E: [email protected])
Joint work with Fabio Massacci, University of Trento, Italy, and Alessandra Tedeschi, DeepBlue srl, Rome, Italy
International Symposium on Empirical Software Engineering and Measurement, Toronto, Canada, November 10th, 2017
Slides: homepage.tudelft.nl/6d93v/talks/labunets-ESEM2017-slides.pdf
Transcript
1
Graphical vs. Tabular Notations for Risk Models: On the Role of Textual Labels and Complexity
Katsiaryna (Kate) Labunets, TU Delft, The Netherlands
• Risk recommendations are mostly "consumed" by non-experts in security
• What if the security representation is not easy to understand?
– The stakeholder does not understand you
– The security recommendations are not implemented
• "Understand" != "Believe to have understood"
3
Example Risk Models
A simple example of one attack path represented in graphical and tabular notation.

[CORAS diagram] Threat "Customer" exploits the vulnerability "Lack of compliance with terms of use", leading to the threat scenario "Customer shares credentials with next-of-kin" and the unwanted incident "Unauthorized account login" [unlikely], with a severe consequence for the asset "Integrity of account data". Treatment: "Regularly inform customers of terms of use". Element types: threat, vulnerability, threat scenario, unwanted incident, likelihood, consequence, asset, treatment.

[NIST table row entry]
| Threat event | Threat source | Vulnerability | Impact | Overall likelihood | Level of impact | Asset | Security control |
| Customer shares credentials with next-of-kin | Customer | Lack of compliance with terms of use | Unauthorized account login | Unlikely | Severe | Integrity of account data | Regularly inform customers of terms of use |

[UML-style diagram] Threat "Customer"; Vulnerability "Lack of compliance with terms of use"; Threat scenario "Customer shares credentials with next-of-kin"; Treatment "Regularly inform customers of security best practices"; Unwanted incident "Unauthorized account login" [Likelihood: unlikely]; Asset "Integrity of account data"; Consequence "Severe".
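The tabular (NIST-style) row in this example can be mirrored as a simple typed record. A minimal sketch; the class and field names follow the table headers above and are not an official NIST schema:

```python
from dataclasses import dataclass

# Hypothetical record mirroring one row of the NIST-style risk table;
# field names follow the table headers, not any standardized schema.
@dataclass
class RiskTableRow:
    threat_event: str
    threat_source: str
    vulnerability: str
    impact: str
    overall_likelihood: str
    level_of_impact: str
    asset: str
    security_control: str

row = RiskTableRow(
    threat_event="Customer shares credentials with next-of-kin",
    threat_source="Customer",
    vulnerability="Lack of compliance with terms of use",
    impact="Unauthorized account login",
    overall_likelihood="Unlikely",
    level_of_impact="Severe",
    asset="Integrity of account data",
    security_control="Regularly inform customers of terms of use",
)
```

Each cell of the row becomes an explicitly named field, which is essentially the "textual label" advantage the talk attributes to tables.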
4
Previous work
• Published in the EMSE journal:
– Labunets, K., Massacci, F., Paci, F., Marczak, S. and de Oliveira, F.M., 2017. Model comprehension for security risk assessment: an empirical comparison of tabular vs. graphical representations. Empirical Software Engineering, pp. 1-40.
• Included two studies:
– [2014] 69 MSc and BSc students from Italy and Brazil
– [2015] 83 professional post-master course and MSc students from Italy
• Treatment
– graphical and tabular risk modeling notations
• Findings
– Tabular is more effective than the graphical representation for simple comprehension tasks
– Less difference for complex tasks, but tabular is still better
5
Research questions
We address the following questions for participants with significant work experience:
RQ1: Does task complexity have an impact on the comprehensibility of the models?
RQ2: Does the availability of textual labels improve the participants' effectiveness in extracting correct information about security risks?
6
Experiment Description [1/2]
• Goal:
– Compare tabular vs. graphical risk models w.r.t. comprehensibility
• Task: open questions of different levels of complexity about information presented in the model
– 7 questions (originally 12, but 5 questions were discarded due to an implementation error)
• Between-subject design
– one treatment per participant
7
Experiment Description [2/2]
• Application scenario:
– Online Banking scenario by Poste Italiane
• Recruitment process:
– Email invitation distributed via mailing lists by UNITN and DeepBlue
– Offered a reward of 20 euro (via PayPal)
– 572 attempts to start the experiment
• Participants:
– 61 professionals (avg. 9 years of working experience)

[Figure: the number of participants who reached each experimental phase]
Used Risk Models: Tabular
[Table excerpt: "System failure" causes "Web-application goes down"; "Immature technology" leads to "Online banking service goes down", harming the asset "Availability of service"; likelihood: Certain, impact: Minor; security controls: 1. Monitor network traffic. 2. Increase bandwidth. Strengthen verification and validation procedures.]
10
Used Risk Models: CORAS
11
Used Risk Models: UML-style
12
Comprehension Questions
We ask participants to identify a risk element of a specific type that is related to another element of a different type.
“Which threats can exploit the vulnerability ‘Poor security awareness’? Please specify all threats:”
At least one question per element type. Graphical element types:
1. Threat
2. Vulnerability
3. Threat scenario
4. Unwanted incident
5. Likelihood
6. Consequence
7. Asset
8. Treatment
• Task complexity is based on [Wood, 1986].
• Our mapping of Wood's components to the elements of modeling notations:
– Information cues (IC) describe characteristics that help to identify the desired element of the model.
• Which are the assets that can be harmed by the unwanted incident "Unauthorized access to HCN"?
– Required acts (A) are judgment acts that require selecting a subset of elements meeting some explicit or implicit criteria.
• What is the highest consequence?
– Relationships (R) between a desired element and other elements of the model that must be identified.
• … the assets that can be harmed by Cyber criminal …

Complexity(Q_i) = |IC_i| + |R_i| + |A_i|
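Under this mapping, a question's complexity is just the count of its annotated components. A minimal sketch; the per-question decompositions below are illustrative, not the authors' official coding:

```python
def question_complexity(information_cues, relationships, required_acts):
    """Complexity(Q_i) = |IC_i| + |R_i| + |A_i| (mapping from Wood, 1986)."""
    return len(information_cues) + len(relationships) + len(required_acts)

# "What are the assets that can be harmed by Cyber criminal?"
# One information cue ("Cyber criminal") and one relationship ("harmed by").
simple = question_complexity(["Cyber criminal"], ["harmed by"], [])

# "Which unwanted incidents may be exploited by Hacker to harm the asset
#  'Integrity of account data' with the highest likelihood?"
complex_q = question_complexity(
    ["Hacker", "Integrity of account data"],   # information cues
    ["exploited by", "harm"],                  # relationships
    ["select the highest likelihood"],         # required act
)
```

With these annotations, the simple question scores 2 and the complex one 5, matching the simple (= 2) vs. complex (> 2) split used in RQ1.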
14
Comprehension Questions: Complexity
• Simple question:
– "What are the assets that can be harmed by Cyber criminal?"
• Complex question:
– "Which unwanted incidents may be exploited by Hacker to harm the asset 'Integrity of account data' with the highest likelihood?"
Legend:
– Information cue
– Relationship
– Required act
15
Measurements
• Precision of the response to a question:
– # of identified correct elements / # of all elements in the response
• Recall of the response to a question:
– # of identified correct elements / # of all expected correct elements
• F-measure is the weighted harmonic mean of precision and recall
• Subject's comprehension:
– aggregated F-measure over all questions about the assigned risk model
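These per-question measures can be computed directly from the set of elements a participant names versus the expected answer key. A minimal sketch, assuming unweighted F1 (i.e. precision and recall weighted equally); the example response is hypothetical:

```python
def precision_recall_f(response, expected):
    """Per-question precision, recall, and F-measure over sets of risk elements."""
    response, expected = set(response), set(expected)
    correct = len(response & expected)          # identified correct elements
    precision = correct / len(response) if response else 0.0
    recall = correct / len(expected) if expected else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# A response naming one correct threat and one spurious one:
p, r, f = precision_recall_f({"Customer", "Hacker"}, {"Customer"})
# precision 0.5, recall 1.0, F ≈ 0.667
```

A subject's comprehension score is then the F-measure aggregated over all of their questions.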
16
Results: All questions
[Figure: scatter plot of aggregated precision (y-axis) vs. aggregated recall (x-axis) per participant, by notation (Tabular, UML, CORAS); medians marked at 0.83 and 0.91. Group counts per plot region: T N=0, UML N=1, CORAS N=1; T N=13, UML N=8, CORAS N=5; T N=4, UML N=3, CORAS N=3; T N=4, UML N=8, CORAS N=11.]
17
Results: RQ1 – Complexity effect
• F-measure of simple (complexity level = 2) vs. complex (complexity level > 2) questions:
– UML & CORAS: small difference, but not statistically significant
– Tabular: statistical equivalence using an equivalence test (TOST)
No effect has been observed in our experiment
Table: F-measure by task complexity
18
Results: RQ2 – Textual labels
• Precision:
– Tabular and UML are equivalent
• p_TOST-MW = 0.04
– Tabular > CORAS
• p_MW = 0.0009, p_KW = 0.002
• Recall:
– Tabular > UML
• p_MW = 0.004, p_KW = 0.008
– Tabular > CORAS
• p_MW = 1.4·10⁻⁵, p_KW = 6.6·10⁻⁵
– UML > CORAS
• p_MW = 0.04
Availability of textual labels helps to elicit better responses
Table: Precision and recall by modeling notation
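The pairwise comparisons above rely on one-sided Mann-Whitney (MW) tests. A minimal pure-Python sketch of the U statistic with a normal approximation (no tie correction, so only indicative for small samples); the scores below are made up for illustration, not the study's data:

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_greater(x, y):
    """One-sided Mann-Whitney U test (H1: x tends to be greater than y).

    Normal approximation without tie correction -- a sketch, not a
    replacement for a statistics package.
    """
    n1, n2 = len(x), len(y)
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average rank for tied values (1-based)
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg_rank
        i = j + 1
    r1 = sum(ranks[:n1])                    # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, 1 - NormalDist().cdf(z)      # one-sided p-value

# Hypothetical per-subject F-measure scores (invented for illustration):
tabular = [0.90, 0.95, 0.85, 1.00, 0.90]
coras = [0.60, 0.70, 0.65, 0.80, 0.55]
u, p = mann_whitney_greater(tabular, coras)
```

In practice one would use a library routine (e.g. `scipy.stats.mannwhitneyu`), which also handles tie corrections and exact small-sample p-values.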
19
RQ2: Discussion
• Tables are good, but textual labels also help
– Tables offer the possibility to use computer-aided search and to copy & paste information
– The CORAS group used the search feature more often than the UML group
• CORAS does not have textual labels to identify elements of the necessary type
[Figure: two bar charts — usage of the search feature (%) and usage of the copy & paste feature (%) by model type (Tabular, UML, CORAS), split by response Yes/No.]
20
Conclusions
• Electronic tables may be your best choice to communicate security risk assessment results to stakeholders
• Pure graphical models are prone to comprehension errors (compared to tables). Mitigation options:
– Textual labels [cheap]
– Invest more in training stakeholders on the notation [expensive]
21
Open questions
• How well do models support memorization of information about security risks?
– What information can you recall from memory?
• Task complexity
– Can we measure it better?
– Which questions are complex enough to trigger