UIMA Introduction SHARPn Summit June 11, 2012


Dec 22, 2015


UIMA Introduction

SHARPn Summit

June 11, 2012


Outline

UIMA Terminology (not just TLAs)

Parts of a UIMA pipeline

Running a pipeline

Viewing annotations interactively


UIMA Terminology

CAS, XCAS, JCas

View

Analysis Engine (AE) / Annotator

XML output: XCAS, XMI

Type System, JCasGen

CAS Visual Debugger (CVD)

CPE (Collection Processing Engine)


UIMA

Framework

– Defining data types

– Passing data from one component to another

Tooling

– Viewing results

– Debugging

– Editing XML visually


Data Through a Pipeline

Type System

– Defines the data types passed along

CAS (Common Analysis Structure)

– Container for the data passed along

– Created by UIMA from the Type System
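
The relationship between the Type System and the CAS can be sketched in plain Java. This is a toy model under stated assumptions, not the UIMA API: `SketchTypeSystem`, `SketchCas`, and `TypeSystemDemo` are hypothetical names invented for illustration. The only idea it shows is that the container is created from a set of declared types and accepts data only of those types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a type system: the set of type names a pipeline may use. */
final class SketchTypeSystem {
    private final Set<String> typeNames;

    SketchTypeSystem(String... typeNames) {
        this.typeNames = new HashSet<>(Arrays.asList(typeNames));
    }

    boolean declares(String typeName) {
        return typeNames.contains(typeName);
    }
}

/** Hypothetical stand-in for a CAS: document text plus typed spans over that text. */
final class SketchCas {
    private final SketchTypeSystem typeSystem;
    private final String documentText;
    private final List<String> annotations = new ArrayList<>();

    // The container is built from the type system, mirroring "created from the Type System".
    SketchCas(SketchTypeSystem typeSystem, String documentText) {
        this.typeSystem = typeSystem;
        this.documentText = documentText;
    }

    /** Records a span of a declared type; a type missing from the type system is rejected. */
    void addAnnotation(String typeName, int begin, int end) {
        if (!typeSystem.declares(typeName)) {
            throw new IllegalArgumentException("Type not in type system: " + typeName);
        }
        annotations.add(typeName + "[" + documentText.substring(begin, end) + "]");
    }

    List<String> annotations() {
        return annotations;
    }
}

final class TypeSystemDemo {
    /** Adds two annotations of declared types and returns what the container holds. */
    static List<String> run() {
        SketchTypeSystem typeSystem = new SketchTypeSystem("Sentence", "Token");
        SketchCas cas = new SketchCas(typeSystem, "Family history of hyperlipidemia.");
        cas.addAnnotation("Sentence", 0, 33);
        cas.addAnnotation("Token", 0, 6);
        return cas.annotations();
    }

    /** Returns true when an annotation of an undeclared type is refused. */
    static boolean rejectsUndeclaredType() {
        SketchCas cas = new SketchCas(new SketchTypeSystem("Token"), "text");
        try {
            cas.addAnnotation("Sentence", 0, 4);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(rejectsUndeclaredType());
    }
}
```

In real UIMA the types are declared in an XML Type System descriptor and JCasGen generates Java classes for them; the sketch only mirrors the "declare types first, then store data of those types" idea.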


Parts of a UIMA Pipeline

Collection Reader

– Read input document

Analysis Engine(s) / Annotator(s)

– Process document

CAS Consumer

– Output data


Tying a Pipeline Together

CPE descriptor (Collection Processing Engine)

– Collection Reader

– Analysis Engine(s)

– CAS Consumer

Aggregate analysis engine

– Multiple Analysis Engines and their order
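
The idea that an aggregate is itself an analysis engine wrapping several engines in a fixed order can be sketched in plain Java. This is a toy model, not the UIMA API: `SketchEngine`, `NamedEngine`, `SketchAggregate`, and `AggregateDemo` are hypothetical names, and each engine here only records its own name so the execution order is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an analysis engine: it updates shared data in place. */
interface SketchEngine {
    void process(List<String> sharedData);
}

/** A primitive engine that records its own name when it runs. */
final class NamedEngine implements SketchEngine {
    private final String name;

    NamedEngine(String name) {
        this.name = name;
    }

    @Override
    public void process(List<String> sharedData) {
        sharedData.add(name);
    }
}

/** An aggregate is itself an engine; it runs its delegates in the declared order. */
final class SketchAggregate implements SketchEngine {
    private final List<SketchEngine> delegates;

    SketchAggregate(SketchEngine... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public void process(List<String> sharedData) {
        for (SketchEngine delegate : delegates) {
            delegate.process(sharedData);
        }
    }
}

final class AggregateDemo {
    /** Builds a three-engine aggregate, runs it once, and returns the recorded order. */
    static List<String> run() {
        SketchEngine aggregate = new SketchAggregate(
                new NamedEngine("SentenceDetector"),
                new NamedEngine("Tokenizer"),
                new NamedEngine("PosTagger"));
        List<String> sharedData = new ArrayList<>();
        aggregate.process(sharedData);
        return sharedData;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [SentenceDetector, Tokenizer, PosTagger]
    }
}
```

Because the aggregate has the same interface as a single engine, it can be dropped in wherever one engine is expected, which is why a tool that accepts only one Analysis Engine can still run a whole chain.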


Pipeline Example

UIMA term            Example

Collection Reader    Read files from a dir

Analysis Engine      Sentence detector

Analysis Engine      Tokenizer annotator

Analysis Engine      Part of Speech tagger

CAS Consumer         Output tokens to DB
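
The flow of the example pipeline can be mimicked in plain Java. This is a minimal sketch, not the UIMA API and not how cTAKES implements these steps: every class name (`ToyCas`, `ToyAnnotation`, `ToyAnnotator`, `ToySentenceDetector`, `ToyTokenizer`, `ToyPipeline`) is hypothetical, the sentence and token rules are deliberately naive, the part-of-speech step is left out, and a list stands in for the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for an annotation: a typed span given by character offsets. */
final class ToyAnnotation {
    final String type;
    final int begin;
    final int end;

    ToyAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

/** Toy stand-in for a CAS: the document text plus all annotations added so far. */
final class ToyCas {
    final String text;
    final List<ToyAnnotation> annotations = new ArrayList<>();

    ToyCas(String text) {
        this.text = text;
    }

    /** Returns a copy of the annotations of one type, so callers may add while iterating. */
    List<ToyAnnotation> select(String type) {
        List<ToyAnnotation> result = new ArrayList<>();
        for (ToyAnnotation a : annotations) {
            if (a.type.equals(type)) {
                result.add(a);
            }
        }
        return result;
    }
}

/** Toy stand-in for an analysis engine: reads the CAS and adds annotations to it. */
interface ToyAnnotator {
    void process(ToyCas cas);
}

/** Ends a Sentence at each period; text after the last period is ignored. */
final class ToySentenceDetector implements ToyAnnotator {
    @Override
    public void process(ToyCas cas) {
        int begin = 0;
        for (int i = 0; i < cas.text.length(); i++) {
            if (cas.text.charAt(i) == '.') {
                cas.annotations.add(new ToyAnnotation("Sentence", begin, i + 1));
                begin = i + 1;
            }
        }
    }
}

/** Marks each run of letters or digits inside a Sentence as a Token. */
final class ToyTokenizer implements ToyAnnotator {
    @Override
    public void process(ToyCas cas) {
        for (ToyAnnotation sentence : cas.select("Sentence")) {
            int i = sentence.begin;
            while (i < sentence.end) {
                if (!Character.isLetterOrDigit(cas.text.charAt(i))) {
                    i++;
                    continue;
                }
                int begin = i;
                while (i < sentence.end && Character.isLetterOrDigit(cas.text.charAt(i))) {
                    i++;
                }
                cas.annotations.add(new ToyAnnotation("Token", begin, i));
            }
        }
    }
}

final class ToyPipeline {
    /** Runs the reader, engine, and consumer steps over one in-memory document. */
    static List<String> run(String document) {
        ToyCas cas = new ToyCas(document); // "collection reader": a single document
        // Order matters: the tokenizer relies on the Sentence annotations.
        List<ToyAnnotator> engines = Arrays.asList(new ToySentenceDetector(), new ToyTokenizer());
        for (ToyAnnotator engine : engines) {
            engine.process(cas);
        }
        List<String> tokens = new ArrayList<>(); // "CAS consumer": a list instead of a DB
        for (ToyAnnotation token : cas.select("Token")) {
            tokens.add(cas.text.substring(token.begin, token.end));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(run("Family history of hyperlipidemia."));
        // prints [Family, history, of, hyperlipidemia]
    }
}
```

Swapping the two engines makes the tokenizer find no sentences and produce no tokens, which is why the order of Analysis Engines is part of the pipeline definition.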


UIMA plugin for Eclipse

Provides visual editors for descriptors

– Mini GUI for selecting options

– Rather than editing XML directly

An “Update site” exists for installing the plugin

http://www.apache.org/dist/incubator/uima/eclipse-update-site


UIMA Tooling Options

Tools:

– CPE Configurator

– CVD (CAS Visual Debugger)

Options:

– Command line scripts/.bat files

– Run within Eclipse


Running a Pipeline - CPE

cTAKES provides a script and a bat file: runctakesCPE

Choose a CPE descriptor, such as test_plaintext.xml from cTAKESdesc/cdpdesc/collection_processing_engine


Viewing Annotations - CVD

Viewing annotations using the CVD

– Load the Type System

– Load the XCAS or XMI


Annotation Viewers

UIMA tools

– CVD (CAS Visual Debugger)

– Annotation viewer

Viewing XML output

– Any XML viewer

– Any text editor


Questions?

http://uima.apache.org/


Supplemental slides follow


Options to Run a Pipeline

CPE GUI

CVD GUI

– Single Aggregate Analysis Engine

– No Collection Reader

Instantiate a CpeDescription, produce a CollectionProcessingEngine from it, and invoke its process() method

uimaFIT

– Removes the dependency on XML descriptors


Creating a New Annotator

Within Eclipse

– Create Java project

– Right click -> Add UIMA Nature

– Add UIMA jars to .classpath (Build Path)

– Create Analysis Engine (AE) descriptor

– Add types to AE descriptor, or optionally create separate Type System descriptor

– Write code!


Running an AE in CVD

Using CVD to run an Analysis Engine

– No Collection Reader

– Single Analysis Engine (can be an aggregate)

– No CAS Consumer

– Load an Analysis Engine

– Paste/type in text to process

Family history of hyperlipidemia.


Modifying a parameter

UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.


Links

Getting started with UIMA http://uima.apache.org/doc-uima-annotator.html

UIMA Update site for use in Eclipse http://www.apache.org/dist/incubator/uima/eclipse-update-site