Faculty of Informatics Technical University of Munich The Smart Workspace What can technology do for the legal profession? Bernhard Waltl, 2018
Introduction
• Processes of legal experts (scientists and lawyers) are
  • ... time-intensive
  • ... knowledge-intensive
  • ... data-intensive.
• Legal Data Science is becoming more and more attractive, because
  • ... processing time and memory are cheap
  • ... algorithms can process data quickly and accurately.
• In order to achieve the highest accuracy,
  • algorithms (e.g., importer, segmenter, named entity recognition),
  • models and patterns (e.g., machine learning models, linguistic models),
  • training and test data sets
  have to be adapted.
180322 Waltl - The Smart Workspace
Motivation
Gartner Hype Cycle, July 2017
[Figure: Gartner Hype Cycle with the following entries highlighted: Deep Learning, Artificial General Intelligence, Smart Workspaces, Machine Learning, Cognitive Computing]
The „Smart“ Workspace
1. Support of search and exploration processes
   • eDiscovery, forensics, etc.
   • E.g., due diligence, technology-assisted review (TAR), etc.
2. Support during creation of contracts
   • Document assembly
   • Consistency checks and smart recommendations
3. Structured computational contracts
   • Machine-readable representation of contracts
   • Clause-dependent, e.g., hybrid contracts
Use Cases
Huge research potential
Structuring a legal contract is a journey
[Figure: processing pipeline leading from unstructured information to structured information]
Phases: Raw Text Phase, Lexical Phase, Morphological Phase, Syntactic Phase, Semantic Phase, Formalization Phase
Steps and artifacts shown in the figure:
• Different document types
• Different document formats
• Extraction and crawling of text
• Normalization of text
• Tokenization
• Stream of tokens
• Segmentation
• Stop word removal
• POS tagging
• Recognition of named entities
• Normalization of named entities
• Stemming
• Lemmatization
• Dependency parsing
• Constituency parsing
• Extraction of auxiliary sentences
• Recognition of subjects, objects, etc.
• Semantic role labeling
• Extraction of relations
• Determination of functional types
• Creation of controlled natural language
• Formal representation of concepts
• Creation and reuse of ontologies
Waltl, Bernhard, Semantic Analysis and Computational Modeling of Legal Documents, Dissertation, 2018 (to appear)
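The early steps of this journey (tokenization, stop word removal, stemming) can be illustrated with a minimal Python sketch. It is not the implementation from the dissertation: the stop word list, the suffix rules, and all function names are assumptions chosen for illustration, and the stemmer is a toy rather than a real algorithm such as Porter's.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "and", "in", "shall"}


def tokenize(text: str) -> list[str]:
    """Split raw text into a stream of lowercase word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that carry little content of their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token: str) -> str:
    """Strip one common English suffix (toy stemmer, for illustration only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Chain the steps: tokenize, remove stop words, stem."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


tokens = preprocess("The landlord terminated the leases.")
print(tokens)
```

Later steps such as dependency parsing or semantic role labeling need trained linguistic models and are out of scope for a sketch of this size.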
Technology is language-agnostic
English works even better
Classification of Norms
Reference Architecture
Open-Source Software Stack
Implementation Details
• Web Application
• ElasticSearch
• Apache UIMA
• Apache Spark
[Figure: reference architecture. Components shown:]
• User interface
• Exploration
• Navigation
• Importer
• Exporter
• Data access layer
• Database and search engine
• Text analysis engine
• Information extraction component
• Complex pattern recognizer (Apache Ruta)
• Pattern definitions
• Active machine learning component
• Metric calculation component
• Splitter
• Tokenizer
• Stemmer
• Lemmatizer
• POS tagger
• Dependency parser
• Subject tagger
• Named entity recognizer
• Thesauri
• Knowledge bases
Classification of Norms
Rule-based approach
UIMA (Unstructured Information Management Architecture)
• Pipes & Filters architecture
• Thread-safe
• Apache Ruta: complex pattern specification
[Figure: rule-based text analysis pipeline]
• Components: Splitter, Segmenter, Tokenizer, Reference Extractor, Money Value Extractor, Date & Time Extractor, Named Entity Extractor, Sentence Classifier
• Intermediate data: Document, Sections, Sentences, Tokens (Words)
• Output: Structured Text, Named Entities, Classified Sentences
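The pipes-and-filters idea behind this pipeline can be sketched in plain Python. This is not the Apache UIMA or Apache Ruta API: every filter name, the regular expressions, and the two classification rules are hypothetical and serve only to show how independent components enrich a shared document step by step.

```python
import re
from functools import reduce


def segmenter(doc: dict) -> dict:
    """Split the text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", doc["text"])
    doc["sentences"] = [p.strip() for p in parts if p.strip()]
    return doc


def money_extractor(doc: dict) -> dict:
    """Annotate amounts such as '500 EUR' (illustrative pattern only)."""
    doc["money"] = re.findall(r"\d+(?:[.,]\d+)?\s?(?:EUR|USD)", doc["text"])
    return doc


# Hypothetical rules mapping a label to a trigger pattern.
RULES = {
    "obligation": re.compile(r"\b(?:shall|must)\b", re.IGNORECASE),
    "permission": re.compile(r"\bmay\b", re.IGNORECASE),
}


def sentence_classifier(doc: dict) -> dict:
    """Assign each sentence the first rule label that matches, else 'other'."""
    doc["classified"] = [
        (s, next((name for name, pat in RULES.items() if pat.search(s)), "other"))
        for s in doc["sentences"]
    ]
    return doc


def run_pipeline(text: str, filters: list) -> dict:
    """Pass one document through all filters in order."""
    return reduce(lambda doc, f: f(doc), filters, {"text": text})


result = run_pipeline(
    "The tenant shall pay 500 EUR per month. The tenant may keep pets.",
    [segmenter, money_extractor, sentence_classifier],
)
for sentence, label in result["classified"]:
    print(label, "|", sentence)
```

Because each filter only reads and adds annotations, filters can be reordered, replaced, or tested in isolation, which is the main appeal of the architecture.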
Classification of Norms
Active Machine Learning: A Hybrid Approach
Lexia + LexML
• Machine Learning as a Service (MLaaS)
• Apache Spark + Apache UIMA
[Figure: active machine learning loop. Elements shown:]
• Rule-based component
• Classifier (e.g., naive Bayes, logistic regression, perceptron, etc.)
• Unlabeled norms
• Labeled norms
• Query strategy (e.g., most/least uncertain)
• Knowledge engineer, domain expert
• Interaction with the system
• Parameters & options
• Labour-intensive & tedious!
• Classification of sentences, phrases, etc.
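A query strategy decides which unlabeled norms the domain expert should label next. The sketch below shows one common choice, least-confidence sampling, under the assumption that the classifier returns a probability per class; the function name, the norm identifiers, and the probabilities are made up for illustration and do not come from Lexia or LexML.

```python
def least_confident(probabilities: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k items whose most likely class has the lowest probability."""
    ranked = sorted(probabilities, key=lambda item_id: max(probabilities[item_id]))
    return ranked[:k]


# Hypothetical classifier output: class probabilities per unlabeled norm.
predicted = {
    "norm-1": [0.95, 0.03, 0.02],
    "norm-2": [0.40, 0.35, 0.25],
    "norm-3": [0.70, 0.20, 0.10],
}

# The two norms the classifier is least sure about go to the expert first.
to_label = least_confident(predicted, k=2)
print(to_label)
```

In a full loop, the expert's labels would be added to the labeled norms, the classifier retrained, and the query repeated until the accuracy is acceptable or the labeling budget runs out.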
Data set
German tenancy law
Features
• Bag-of-Words
• Stopword removal
• Tf-idf vectorization
Classes can have very low support!
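The tf-idf weighting named in the feature list can be sketched in a few lines of Python. This uses the plain variant, term frequency times log(N / document frequency); libraries often apply smoothed or normalized variants, and the slides do not say which one was used. The example documents are invented.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Compute a sparse tf-idf vector (term -> weight) for each tokenized document."""
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Invented, already tokenized bag-of-words documents with stop words removed.
docs = [
    ["tenant", "pay", "rent"],
    ["landlord", "terminate", "lease"],
    ["tenant", "terminate", "lease"],
]
vectors = tf_idf(docs)
print(vectors[0])
```

Terms that occur in few documents, such as "pay" here, receive a higher weight than terms shared across documents, such as "tenant".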
Results on the classification of norms
Remarkable results?
Document creation
• A lot of effort is needed to structure documents:
  1. Pre-processing
  2. Training
  3. Feature extraction
  4. Data mining
  5. Post-processing
• Why are they unstructured?
• What if we could structure them as we create them?
→ This would completely change the situation!
Hybrid contracts
Definition
• A hybrid contract is a document that contains contractual content, combining unstructured, structured, and computable information in a machine-readable and executable format.
What does this mean?
• Unstructured information → Text, images, etc.
• Structured information → Parties, metadata, expiration dates, etc.
• Computable information → Decision structures, code, etc.
→ It is technologically feasible for a document to contain various kinds of information.
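One possible way to picture the three kinds of information in a single object is the Python sketch below. The class name, its fields, and the example rule are assumptions made for illustration; the slides do not prescribe a concrete data model or file format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass
class HybridContract:
    text: str                # unstructured information: the contract wording
    parties: list[str]       # structured information
    expiration_date: date    # structured information
    # Computable information: named decision rules evaluated for a given date.
    rules: dict[str, Callable[[date], bool]] = field(default_factory=dict)

    def evaluate(self, rule_name: str, on: date) -> bool:
        """Run one of the contract's decision rules for the given date."""
        return self.rules[rule_name](on)


# Invented example contract.
contract = HybridContract(
    text="The tenant shall pay the rent monthly until the lease expires.",
    parties=["Landlord GmbH", "Jane Tenant"],
    expiration_date=date(2019, 12, 31),
)
contract.rules["is_active"] = lambda on: on <= contract.expiration_date

print(contract.evaluate("is_active", date(2019, 6, 1)))
```

Keeping the wording, the metadata, and the executable rules in one object means that a tool can display the text, search the structured fields, and run the rules without a separate extraction step.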
Closing remarks and follow-ups
• Technology is ready, but there is no "one-size-fits-all" solution
• The "Smart Workspace" is more than just software
  • Data
  • Methods
  • Workflow integration
• Follow-up
  • What is the "Smart Workspace" for the audience?
  • What does the "Smart Workspace" for lawyers look like?
  • Which kinds of workflows need to be supported?
  • What kind of data is relevant in the United States?
Bernhard Waltl
Research Associate

Technische Universität München
Faculty of Informatics
Chair of Software Engineering for Business Information Systems

Boltzmannstraße 3
85748 Garching bei München

Tel +49.89.289.17124
Fax +49.89.289.17136

wwwmatthes.in.tum.de