A clustering-based approach to detect probable outcomes of lawsuits

Undergraduate thesis/final project
Escola de Informática Aplicada - UNIRIO

Author: Daniel Lemes Gribel <[email protected]>

Committee:
Leonardo G. Azevedo 1,2 (supervisor)
Maíra A. C. Gatti 2 (supervisor)
Adriana C. de F. Alvim 1
Sean W. M. Siqueira 1

1 UNIRIO, 2 IBM Research

December 19, 2014

Jul 25, 2015


Page 2: A clustering-based approach to detect probable outcomes of lawsuits

The project idea

IBM Research, 2013: inspired by a Social Media Simulator (the SMSim project) developed to predict the behavior of Twitter users.

First idea: model judges' behavior and then predict lawsuit outcomes through multi-agent simulation, as in SMSim.

New proposal: develop an approach to suggest possible outcomes for a given lawsuit based on modelling, similarity detection and clustering.


Page 3: A clustering-based approach to detect probable outcomes of lawsuits

Project contributions

Results showed that, by analysing past data, it was possible to identify the most likely outcome and to estimate its degree of uncertainty.


Page 4: A clustering-based approach to detect probable outcomes of lawsuits

Problem statement

A large amount of unstructured data coming from numerous lawsuits ⇒ a large amount of hidden or unknown information

★ How do we know which similar lawsuits can be a reference to a new lawsuit?

★ How do we estimate the time needed to reach a decision?

★ How do we estimate a likelihood for the possible emergent results?


Page 5: A clustering-based approach to detect probable outcomes of lawsuits

The STF and its responsibilities

The Brazilian Supreme Court (STF) is a body of the Brazilian Judiciary System, responsible for safeguarding and interpreting the Constitution. The STF decides matters related to the Constitution, or cases in which there is doubt or controversy regarding legal actions ².

² STF. Institucional. 2011. Available from internet: http://www.stf.jus.br/portal/cms/verTexto.asp?servico=sobreStfConhecaStfInstitucional


Page 6: A clustering-based approach to detect probable outcomes of lawsuits

STF judgement configuration

The STF currently has 11 judges, who sit on its Panels as well as in its Plenary.

1. Monocratic: decision taken by a single judge.

2. Collegial: one of the judges acts as rapporteur, each judge votes individually, and the majority decision prevails.

a. First Panel (Primeira Turma): 5 judges.

b. Second Panel (Segunda Turma): 5 judges.

c. Plenary: 11 judges – currently, there is an open position.


Page 7: A clustering-based approach to detect probable outcomes of lawsuits

Law classes

There are several lawsuit classes in the Brazilian judicial system: Habeas Corpus, Interlocutory Appeal, Extraordinary Appeal, etc.

In this work, only lawsuits belonging to the Appeal class are considered *.

* The choice of the Appeal class was supported by conversations with a professor and a student at the Law School of Fundação Getúlio Vargas (FGV).


Page 8: A clustering-based approach to detect probable outcomes of lawsuits

Law classes

Appeal: “the instrument to cause a review of a decision by the same judicial authority, or other hierarchically higher, in order to obtain their reform or modification” ³

● More than 50% of the ~1.5M lawsuits judged by the STF, which is important in terms of the heterogeneity of the data.

● Have similar dynamics in their life cycles, which is important in terms of pattern detection.

³ Moacyr Amaral Santos, professor, lawyer and minister of the Supreme Court.

Page 9: A clustering-based approach to detect probable outcomes of lawsuits

Mental modelling

1. Look for an appeal lawsuit page on the STF website and identify its metadata: lawsuit id, period (start and end date), state of origin, rapporteur, author, defendant, type (area of Law) and subjects associated with the lawsuit.

2. Identify the summary and the claim of the lawsuit, found in a document called “Acórdão”.

3. Extract decisions and votes from “Acórdão”.


Page 10: A clustering-based approach to detect probable outcomes of lawsuits

Mental modelling


Page 11: A clustering-based approach to detect probable outcomes of lawsuits

Classification and clustering

Clustering goals 4:

1. Development of a typology or classification.

2. Investigation of conceptual schemes for grouping entities.

3. Hypothesis generation through data exploration.

4. Hypothesis testing, or the attempt to determine if types defined through other procedures are in fact present in a dataset.

4 ALDENDERFER, M. S.; BLASHFIELD, R. K. Cluster Analysis. Beverly Hills: Sage, 1984.


Page 12: A clustering-based approach to detect probable outcomes of lawsuits

Classification and clustering


Adapted from WOOYOUNG, K. Parallel Clustering Algorithms: Survey. Available from internet: http://www.solver.com/hierarchical-clustering-intro

Page 13: A clustering-based approach to detect probable outcomes of lawsuits

Hierarchical clustering

[Figure: dendrogram over elements A, B, C, D, E, with groups A,B and D,E, then C,D,E, then A,B,C,D,E. Labels: Agglomerative, Divisive, and two tree cuts.]

Adapted from Frontline Solvers. Cluster Analysis. Available from internet: http://www.solver.com/hierarchical-clustering-intro

Page 14: A clustering-based approach to detect probable outcomes of lawsuits

Hierarchical clustering

+ Advantages:

● Does not require a pre-defined number of clusters.

● Accepts any valid measure of distance.

● Less influenced by cluster shapes and less sensitive to clusters with different densities.

- Disadvantages:

● Complexity, in general at least O(n²), which makes these algorithms too slow for large datasets.

Page 15: A clustering-based approach to detect probable outcomes of lawsuits

Ward’s algorithm

In Ward’s minimum variance criterion, a special case of Ward’s general method, the objective is to minimize the total within-cluster variance.

As a general result, Ward’s minimum variance method leads to compact and spherical clusters.
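As a hedged illustration, the usual textbook form of this criterion (not taken from the slides) can be written as below. The symbols are assumptions introduced here: $C_i$ are the $k$ clusters, $\mu_i$ their centroids, $W$ the total within-cluster sum of squares, and $\Delta(A, B)$ the increase in $W$ caused by merging clusters $A$ and $B$.

```latex
W = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,
\qquad
\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|} \, \lVert \mu_A - \mu_B \rVert^2
```

At each step, the pair of clusters with the smallest $\Delta(A, B)$ is merged. This form assumes the elements are points in a Euclidean space.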


Page 16: A clustering-based approach to detect probable outcomes of lawsuits

Single-linkage algorithm

In single-linkage clustering, the objective function is defined by the two elements (one in each cluster) that are closest to each other.

The shortest of these links causes the fusion of the two clusters whose elements are involved.

Page 17: A clustering-based approach to detect probable outcomes of lawsuits

Complete-linkage algorithm

In complete-linkage clustering, the objective function is defined by the two elements (one in each cluster) that are farthest away from each other.

The shortest of these links causes the fusion of the two clusters whose elements are involved.
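The two linkage rules can be compared with a minimal pure-Python sketch. It is an illustration only, not the project's code: the function name `agglomerative`, the stopping rule (merge until `k` clusters remain) and the toy points are assumptions, and the input is a distance matrix (for similarities, one could assume distance = 1 - similarity).

```python
from itertools import combinations


def agglomerative(dist, k, linkage="single"):
    """Merge clusters bottom-up until only k remain.

    dist: symmetric matrix (list of lists) of pairwise distances.
    linkage: "single" (closest pair) or "complete" (farthest pair).
    """
    pick = min if linkage == "single" else max
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        # Link between two clusters: their closest or farthest pair of elements.
        def link(pair):
            a, b = pair
            return pick(dist[i][j] for i in clusters[a] for j in clusters[b])

        # The shortest link decides which two clusters are fused.
        a, b = min(combinations(range(len(clusters)), 2), key=link)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return [sorted(c) for c in clusters]


# Hypothetical one-dimensional points with slowly growing gaps.
points = [0.0, 1.0, 2.1, 3.3, 4.6]
dist = [[abs(p - q) for q in points] for p in points]
single = agglomerative(dist, 2, "single")
complete = agglomerative(dist, 2, "complete")
```

On these points, single linkage chains the first four elements together, while complete linkage produces two more balanced groups, which is the typical difference between the two rules.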

Page 18: A clustering-based approach to detect probable outcomes of lawsuits

Proposed solution


Page 19: A clustering-based approach to detect probable outcomes of lawsuits

Similarity calculation

From the modelled dataset, calculate the similarities between lawsuits:

1. Each pair of lawsuits receives a similarity coefficient for each property.

2. Then, a mean (resultant) matrix is obtained from the per-property matrices.

Output: Similarity matrix


Page 20: A clustering-based approach to detect probable outcomes of lawsuits

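Similarity calculation

Similarity metric - Jaccard index: J(A, B) = |A ∩ B| / |A ∪ B|

Mean similarity: the mean of the per-property similarity coefficients of each pair of lawsuits.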
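A minimal sketch of the two quantities follows. It assumes that each property of a lawsuit is represented as a set of values, that the mean is unweighted, and that two empty sets count as fully similar; all three are assumptions, and the function names and toy lawsuits are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (assumed 1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_similarity(x, y, properties):
    """Unweighted mean of the per-property Jaccard indexes of two lawsuits."""
    return sum(jaccard(x[p], y[p]) for p in properties) / len(properties)


def similarity_matrix(lawsuits, properties):
    """Mean (resultant) similarity matrix over all pairs of lawsuits."""
    return [[mean_similarity(x, y, properties) for y in lawsuits] for x in lawsuits]


# Hypothetical lawsuits described by two set-valued properties.
a = {"rapporteur": {"Judge A"}, "subjects": {"tax", "pension"}}
b = {"rapporteur": {"Judge A"}, "subjects": {"tax", "labor"}}
matrix = similarity_matrix([a, b], ["rapporteur", "subjects"])
```

Here the rapporteur property matches completely (1.0) and the subjects share one of three values (1/3), so the mean similarity of the pair is 2/3.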

Page 21: A clustering-based approach to detect probable outcomes of lawsuits

Lawsuits clustering

From the similarities observed, run the hierarchical clustering algorithm.

Output: lawsuits classified into clusters.


Page 22: A clustering-based approach to detect probable outcomes of lawsuits

Lawsuit instance assigning

From the detected clusters, calculate the similarities between the new lawsuit instance and the other lawsuits already classified.

Output: new instance assigned to the most similar cluster.
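The assignment step can be sketched as follows, assuming that "most similar cluster" means the cluster whose members have the highest mean similarity to the new instance. That reading, the function name `assign_to_cluster` and the toy values are assumptions, not the project's actual rule.

```python
def assign_to_cluster(new_sims, clusters):
    """Return the index of the cluster with the highest mean similarity.

    new_sims: similarity of the new lawsuit to each already-clustered
              lawsuit, indexed by lawsuit position.
    clusters: list of clusters, each a non-empty list of lawsuit positions.
    """
    def mean_sim(members):
        return sum(new_sims[i] for i in members) / len(members)

    return max(range(len(clusters)), key=lambda c: mean_sim(clusters[c]))


# Hypothetical similarities of a new lawsuit to five clustered lawsuits.
new_sims = [0.9, 0.8, 0.1, 0.2, 0.3]
clusters = [[0, 1], [2, 3, 4]]
chosen = assign_to_cluster(new_sims, clusters)
```

Other rules, such as the single most similar member, would also be consistent with the slide.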


Page 23: A clustering-based approach to detect probable outcomes of lawsuits

Decisions compilation

Considering a list of judges that will decide the lawsuit:

1. Collect their past votes observed in the cluster.

2. Compute the degree of agreement between them.

For each judge jx, compare his/her decisions with the decisions taken by each other judge in the input list, lawsuit by lawsuit.

The ratio (number of common votes) / (number of common decisions) determines the degree of agreement for each judge.

Output: the likely outcome – a number between 0 and 1, indicating the probable decision.
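The degree of agreement can be sketched as below, under one possible reading of the ratio: a "common decision" is a decision in which both judges voted, and a "common vote" is a common decision in which they cast the same vote. The function name, the vote labels and the data layout are hypothetical, and the sketch stops at the agreement degree rather than guessing how it is turned into the final outcome.

```python
def agreement_degree(judge, judges, decisions):
    """Fraction of shared decisions in which `judge` voted like the other judge.

    decisions: list of dicts mapping judge name -> vote, one per decision.
    Returns None when the judge shares no decision with the others.
    """
    common_decisions = 0
    common_votes = 0
    for other in judges:
        if other == judge:
            continue
        for votes in decisions:
            if judge in votes and other in votes:
                common_decisions += 1
                if votes[judge] == votes[other]:
                    common_votes += 1
    return common_votes / common_decisions if common_decisions else None


# Hypothetical past votes observed in one cluster.
decisions = [
    {"A": "grant", "B": "grant", "C": "deny"},
    {"A": "deny", "B": "deny"},
    {"B": "grant", "C": "grant"},
]
judges = ["A", "B", "C"]
degrees = {j: agreement_degree(j, judges, decisions) for j in judges}
```

In this toy data, judge A agrees with the others in 2 of 3 shared decisions, judge B in 3 of 4, and judge C in 1 of 3.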


Page 24: A clustering-based approach to detect probable outcomes of lawsuits

Datasets

lawsuit_16.csv: 16 lawsuits

decision_16.csv: 24 decisions

Lawsuits: lawsuit id, start/end date of lawsuit, state of origin, rapporteur, defendant, author, type, subjects, summary and claim.

Decisions: associated lawsuit id, decision id, type of decision, date, votes tuple <judge name, vote> and resultant decision.
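The two record types listed above could be modelled as in the sketch below. The field names, types and sample values are assumptions chosen to mirror the listed attributes; the actual column names of the CSV files are not given here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Decision:
    decision_id: str
    lawsuit_id: str                 # associated lawsuit
    decision_type: str
    decided_on: date
    votes: list[tuple[str, str]]    # <judge name, vote>
    result: str                     # resultant decision


@dataclass
class Lawsuit:
    lawsuit_id: str
    start_date: date
    end_date: date
    state_of_origin: str
    rapporteur: str
    defendant: str
    author: str
    lawsuit_type: str               # area of Law
    subjects: set[str]
    summary: str
    claim: str
    decisions: list[Decision] = field(default_factory=list)


# Hypothetical record, for illustration only.
example = Lawsuit(
    lawsuit_id="L-001", start_date=date(2010, 1, 1), end_date=date(2011, 1, 1),
    state_of_origin="RJ", rapporteur="Judge A", defendant="Party X",
    author="Party Y", lawsuit_type="Tax", subjects={"tax"},
    summary="summary text", claim="claim text",
)
example.decisions.append(
    Decision("D-001", "L-001", "Collegial", date(2010, 6, 1),
             [("Judge A", "grant"), ("Judge B", "deny")], "grant")
)
```

Keeping the decisions as a list on each lawsuit reflects that the dataset holds more decisions (24) than lawsuits (16).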


Page 25: A clustering-based approach to detect probable outcomes of lawsuits

Similarity analysis

[Figure: two similarity panels, Rapporteur and Summary, on a scale from completely similar to completely different.]

Page 26: A clustering-based approach to detect probable outcomes of lawsuits

Similarity analysis

[Figure: two panels, Mean similarity and Mean similarity (Pearson correlation), on a scale from completely similar to completely different.]

Page 27: A clustering-based approach to detect probable outcomes of lawsuits

Clustering analysis

[Figure: clustering result, on a scale from completely similar to completely different.]

Page 28: A clustering-based approach to detect probable outcomes of lawsuits

Agglomerative algorithms performances


Page 29: A clustering-based approach to detect probable outcomes of lawsuits

Prediction results


Page 30: A clustering-based approach to detect probable outcomes of lawsuits

Prediction results

reveals an… Optimization problem!

● The correct choice of the number k of clusters is not trivial, depending on the distribution of points in a dataset and on the desired clustering resolution.

● Possible approach: define a search space, overestimate k, and then develop optimization heuristics to determine a new stopping point (k2) when the algorithm finds a good solution.

● A stopping point, in this case, could be when the algorithm finds a cluster that is similar enough to the instance being tested and struggles to improve on the best rate found so far.

Page 31: A clustering-based approach to detect probable outcomes of lawsuits

Main contributions

● Analysing past data can reveal similar cases that were already judged.

● Results showed that it was possible to identify the most likely outcome and to estimate its degree of uncertainty.

● Prediction results were satisfactory: lawsuit instances were correctly assigned to clusters and similarity comparison revealed a good coefficient between lawsuits.


Page 32: A clustering-based approach to detect probable outcomes of lawsuits

Future work

● Use more sophisticated machine learning techniques.

● Investigate a more efficient clustering method than hierarchical clustering, considering optimization issues.

● Discriminate decisions by type.

● Develop a better mechanism to find the weights of lawsuit properties.

● Have a training and a testing dataset. Then, use evaluation metrics to check if predictions match real outcomes.

● Investigate stochastic simulation approaches.


Page 33: A clustering-based approach to detect probable outcomes of lawsuits

Code and datasets are in a Git repository at bitbucket.org. Contact [email protected] to get access!

Thank you! Questions?
