The CASS Technique for Evaluating the Performance of Argument Mining Rory Duthie, John Lawrence, Katarzyna Budzynska, Chris Reed

Aug 25, 2020


The CASS Technique for Evaluating the Performance of Argument Mining

Rory Duthie, John Lawrence, Katarzyna Budzynska, Chris Reed


Centre for Argument Technology, University of Dundee

Rory Duthie John Lawrence Katarzyna Budzynska Chris Reed

IFiS Polish Academy of Sciences


Outline

Motivation and Aim

• Problems when publishing evaluation and results

• CASS (Combined Argument Similarity Score)

Metric

• How CASS is calculated

Automation

• Deployment of CASS


Motivation

• Consistency for the Argument Mining community

• A metric which does not double-penalise mismatches

• Automate the calculations


Motivation: Consistency for the community

From the 2nd Workshop on Argument(ation) Mining:

• Inter-annotator agreement:

  3 papers - Cohen’s Kappa
  3 papers - percentage agreement
  2 papers - precision and recall
  3 papers - other methods

• Automatic Argument Mining results:

  4 papers - accuracy
  5 papers - precision, recall and F-score
  1 paper - macro-averaged F-score

• Other Metrics in Comp Ling: ROUGE, in text summarization


Motivation: Metric (1/3)

(Kirschner et al., 2015) provides:

• Graph-based approach, APA, Weighted Average

Problems:

• Segmentation differences

• Propositional content relations only

• Not all nodes in an analysis (Distance < 6)

• Relation direction ignored

• Set metrics


Motivation: Metric (2/3)

CASS extends (Kirschner et al., 2015):

• Segmentation differences

• Propositional content relations and dialogical content relations:

• confusion matrices

• all nodes

• differing segmentation


Motivation: Metric (3/3)

• Use CASS to combine scores

• CASS with any metric

• Annotator agreement and Argument Mining results

• Comparison of analysis in different annotation schemes


Motivation: Automatic Solution

Manual vs. Manual: Cohen’s Kappa, Fleiss’ Kappa…

Manual vs. Automatic: Precision, Recall, F-score, Accuracy…


Outline

Motivation and Aim

• Problems when publishing evaluation and results

• Aim of CASS (Combined Argument Similarity Score)

Metric

• How CASS is calculated

Automation

• Deployment of CASS


Metric: Segmentation (1/4)


Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.


Metric: Segmentation (2/4)


That is especially so if they do not learn lessons from recent wars and take corrective steps. The weapon most likely to produce such harm is the cluster bomb.


Metric: Segmentation (3/4)

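[Figure: the sentence below under two segmentations, with the values attached to each segment. S1: 20, 18, 29, 39, 31, 18. S2: 17, 12, 27, 12, 31, 18, 10, 28.]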

Still, it is possible that, should war erupt in Iraq, American and British forces might fall foul of, for example, the provision of the ICC treaty outlawing attacks on military targets that cause "clearly excessive" harm to civilians.


Metric: Segmentation (4/4)

• Pk (Beeferman et al., 1999)

• WindowDiff (Pevzner and Hearst, 2002)

• Segmentation Similarity (Fournier and Inkpen, 2012)
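To make the first two metrics concrete, here is a minimal Python sketch of Pk and WindowDiff. It is an illustration under stated assumptions, not the authors' implementation: a segmentation is given as a list of segment sizes, the window defaults to half the mean reference segment size, and the function names are hypothetical. Segmentation Similarity is edit-distance based and is not covered here.

```python
from itertools import accumulate


def boundaries(sizes):
    """Internal boundary positions for a list of segment sizes, e.g. [4, 4] -> {4}."""
    return set(list(accumulate(sizes))[:-1])


def _windows(ref_sizes, hyp_sizes, k=None):
    """Yield (reference count, hypothesis count) of boundaries inside each window."""
    total = sum(ref_sizes)
    if total != sum(hyp_sizes):
        raise ValueError("both segmentations must cover the same number of units")
    if k is None:
        # Assumed default: half the mean reference segment size, at least 1.
        k = max(1, round(total / len(ref_sizes) / 2))
    if k >= total:
        raise ValueError("window size must be smaller than the text")
    ref, hyp = boundaries(ref_sizes), boundaries(hyp_sizes)
    for i in range(total - k):
        # A boundary b separates units i and i + k when i < b <= i + k.
        yield (sum(i < b <= i + k for b in ref),
               sum(i < b <= i + k for b in hyp))


def window_diff(ref_sizes, hyp_sizes, k=None):
    """Share of windows in which the two boundary counts differ."""
    counts = list(_windows(ref_sizes, hyp_sizes, k))
    return sum(r != h for r, h in counts) / len(counts)


def pk(ref_sizes, hyp_sizes, k=None):
    """Share of windows where the segmentations disagree on 'same segment or not'."""
    counts = list(_windows(ref_sizes, hyp_sizes, k))
    return sum((r == 0) != (h == 0) for r, h in counts) / len(counts)
```

Both functions return an error rate, so 0.0 means identical segmentations. An extra boundary close to a correct one can be missed by Pk but is still counted by WindowDiff, which is the behaviour Pevzner and Hearst set out to fix.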


Metric: Calculating Relations

• A guaranteed matching formula is used for all propositions and locutions

• We use the Levenshtein distance

• Levenshtein distance and word positions are combined to give node matches
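The matching step can be sketched as follows. The slides do not give the exact formula, so the way text distance and word position are combined here (a normalised Levenshtein distance plus a normalised position gap) is an assumption, as are the `(text, word_position)` node tuples and the function names.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def match_nodes(nodes_a, nodes_b):
    """Pair every node in nodes_a with its cheapest counterpart in nodes_b.

    Each node is a hypothetical (text, word_position) tuple. Every node in
    nodes_a receives a match, so the pairing is guaranteed but not one-to-one.
    """
    span = max(pos for _, pos in nodes_a + nodes_b) + 1
    matches = []
    for i, (text_a, pos_a) in enumerate(nodes_a):
        def cost(j):
            text_b, pos_b = nodes_b[j]
            text_cost = levenshtein(text_a, text_b) / max(len(text_a), len(text_b), 1)
            # Assumed combination: text cost plus relative distance in the source text.
            return text_cost + abs(pos_a - pos_b) / span
        matches.append((i, min(range(len(nodes_b)), key=cost)))
    return matches
```

Because each node simply takes its closest counterpart, two nodes from one annotation can map to the same node in the other, which is one way differing segmentations could show up.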


Metric: Propositional Relations (1/3)

[Figure: two argument graphs over numbered nodes, labelled Annotation 1 and Annotation 2]


Metric: Propositional Relations (2/3)

[Figure: two argument graphs over numbered nodes, labelled Annotation 1 and Annotation 2]


Metric: Propositional Relations (3/3)

• Pair nodes and check the relation attached

• When there is a differing segmentation, consider fine-grained and convergent arguments

• All node pairs are considered to give a confusion matrix
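As a rough sketch of the last bullet, the code below tallies a 2x2 confusion matrix over all ordered node pairs. It assumes the nodes of both annotations have already been matched to shared identifiers and that each annotation is reduced to a set of directed `(source, target)` edges; the representation and names are hypothetical, and the handling of fine-grained and convergent arguments is left out.

```python
from itertools import permutations


def relation_confusion(nodes, edges_a, edges_b):
    """Count agreement on directed relations over every ordered pair of nodes."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for pair in permutations(nodes, 2):
        in_a, in_b = pair in edges_a, pair in edges_b
        if in_a and in_b:
            counts["both"] += 1
        elif in_a:
            counts["only_a"] += 1
        elif in_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts
```

Using ordered pairs means a relation drawn in the opposite direction counts as a disagreement, so relation direction is not ignored.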


Metric: Dialogical Relations (1/3)


Metric: Dialogical Relations (2/3)


Metric: Dialogical Relations (3/3)

• Split the calculation into parts

• When there is a differing segmentation, consider matched pairs

• All node pairs are considered to give a confusion matrix


CASS technique

• Combine scores for the CASS technique

• Applied to any consistent combination of scores
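The slides do not state how the scores are combined, so the following is only one plausible sketch: it assumes the propositional and dialogical scores are averaged, and that this mean is then combined with the segmentation score by a harmonic mean, in the style of an F-score. The function and parameter names are hypothetical, and all three scores are assumed to be similarities in [0, 1] computed with the same underlying metric.

```python
def combine_scores(segmentation, propositional=None, dialogical=None):
    """Combine a segmentation score with the mean of the available relation scores."""
    relation_scores = [s for s in (propositional, dialogical) if s is not None]
    if not relation_scores:
        raise ValueError("at least one relation score is required")
    relations = sum(relation_scores) / len(relation_scores)
    if relations + segmentation == 0:
        return 0.0
    # Assumed combination: harmonic mean of relation agreement and segmentation score.
    return 2 * relations * segmentation / (relations + segmentation)
```

With a harmonic mean, a low score on either side pulls the combined value down, but a segmentation mismatch is counted once rather than also being charged against every relation it touches.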


CASS: Evaluation

• Use CASS – Kappa, as it adjusts the score for chance

• It is not the only score that can be used with CASS
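For reference, the chance adjustment that Kappa provides can be computed directly from a confusion matrix. This is a generic sketch of Cohen's Kappa, not code from the CASS tooling; the matrix is assumed to be a square list of lists with one annotation on the rows and the other on the columns.

```python
def cohens_kappa(matrix):
    """Cohen's Kappa for a square confusion matrix given as a list of rows."""
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, a matrix with 70% raw agreement and 50% expected agreement gives a Kappa of 0.4.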


CASS: Extension

• Any metric with a confusion matrix can be applied to CASS

• E.g. Balanced Accuracy, Informedness…

• We provide a select set, but no metric is ruled out
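The two example metrics follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the argument names (`tp`, `fn`, `fp`, `tn`) are illustrative and not taken from the CASS tooling.

```python
def _rates(tp, fn, fp, tn):
    """True positive rate and true negative rate of a 2x2 confusion matrix."""
    return tp / (tp + fn), tn / (tn + fp)


def balanced_accuracy(tp, fn, fp, tn):
    """Mean of the true positive rate and the true negative rate."""
    tpr, tnr = _rates(tp, fn, fp, tn)
    return (tpr + tnr) / 2


def informedness(tp, fn, fp, tn):
    """True positive rate plus true negative rate minus one."""
    tpr, tnr = _rates(tp, fn, fp, tn)
    return tpr + tnr - 1
```

Both metrics weight the two classes equally, which matters when most node pairs carry no relation at all.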


Outline

Motivation and Aim

• Problems when publishing evaluation and results

• Aim of CASS (Combined Argument Similarity Score)

Metric

• How CASS is calculated

Automation

• Deployment of CASS


Automation: AIF (Argument Interchange Format)

• AIF allows us to split calculations into component parts: segmentation, propositional and dialogical

• AIF allows the translation of other representation models to AIF format

• This allows for comparison of corpora in different representations

• However, the CASS technique is independent of AIF
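To illustrate the first bullet, here is a small sketch that separates an AIF-style graph into the parts each calculation would use. The input shape (a dict with a `nodes` list whose entries carry a `type` field) and the type codes (`I` for propositions, `L` for locutions, `RA`/`CA`/`MA` for propositional relations, `TA` for dialogical transitions) are assumptions about the JSON export and should be checked against real data; the function name is hypothetical.

```python
PROPOSITIONAL_RELATIONS = {"RA", "CA", "MA"}  # assumed codes: inference, conflict, rephrase
DIALOGICAL_RELATIONS = {"TA"}                 # assumed code: transitions between locutions


def split_aif(graph):
    """Group the nodes of an AIF-style graph by the part of the calculation they feed."""
    parts = {"propositions": [], "locutions": [],
             "propositional_relations": [], "dialogical_relations": []}
    for node in graph["nodes"]:
        kind = node["type"]
        if kind == "I":
            parts["propositions"].append(node)
        elif kind == "L":
            parts["locutions"].append(node)
        elif kind in PROPOSITIONAL_RELATIONS:
            parts["propositional_relations"].append(node)
        elif kind in DIALOGICAL_RELATIONS:
            parts["dialogical_relations"].append(node)
        # Any other node type is ignored in this sketch.
    return parts
```

Propositions and locutions would feed the segmentation and matching steps, while the two relation groups would feed the propositional and dialogical confusion matrices.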


Automation: AIFdb


http://www.aifdb.org/search


Automation: AIFcorpora

http://corpora.aifdb.org/


Automation: Argument Analytics

http://analytics.arg-tech.org


Thank You.

[email protected]


Find out more at http://arg.tech

Come to COMMA 2016: Conference on Computational Models of Argument (Potsdam)

Investigate the datasets at http://aifdb.org


References

Christian Kirschner, Judith Eckle-Kohler, and Iryna Gurevych. 2015. Linking the thoughts: Analysis of argumentation structures in scientific publications. In Proceedings of the Second Workshop on Argumentation Mining, pages 1–11. Association for Computational Linguistics.

Doug Beeferman, Adam Berger, and John Lafferty. 1999. Statistical models for text segmentation. Machine Learning, 34(1-3):177–210.

Lev Pevzner and Marti A. Hearst. 2002. A critique and improvement of an evaluation metric for text segmentation. Computational Linguistics, 28(1):19–36.

Chris Fournier and Diana Inkpen. 2012. Segmentation similarity and agreement. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 152–161. Association for Computational Linguistics.
