

UIMA Introduction

SHARPn Summit

June 11, 2012

Outline

UIMA Terminology (not just TLAs)

Parts of a UIMA pipeline

Running a pipeline

Viewing annotations interactively

UIMA Terminology

– CAS, XCAS, JCas, View

– Analysis Engine (AE) / Annotator

– XML output: XCAS, XMI

– Type System, JCasGen

– CAS Visual Debugger (CVD)

– CPE (Collection Processing Engine)

UIMA

Framework

– Defining data types

– Passing data from one component to another

Tooling

– Viewing results

– Debugging

– Editing XML visually

Data Through a Pipeline

Type System

– Defines the data types passed along

CAS (Common Analysis Structure)

– Container for the data passed along

– Created by UIMA from the Type System
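The Type System / CAS relationship can be pictured with a small, stdlib-only Java toy. Everything in it (ToyCas, addAnnotation, coveredText) is a hypothetical stand-in, not the UIMA API: it only illustrates the idea that the CAS holds the document text plus typed annotations over character offsets, and that it accepts only the types the Type System declares. In real UIMA, JCasGen generates a Java class for each type in the Type System descriptor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: ToyCas and its members are hypothetical illustrations,
// not the Apache UIMA CAS or TypeSystem API.
final class ToyCas {

    // One annotation: a [begin, end) character span of the document, tagged with a type name.
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // The "type system": annotation type names this CAS is allowed to hold.
    private final Set<String> typeSystem;
    private final String documentText;
    private final List<Annotation> annotations = new ArrayList<>();

    ToyCas(Set<String> typeSystem, String documentText) {
        this.typeSystem = new LinkedHashSet<>(typeSystem);
        this.documentText = documentText;
    }

    // Adds an annotation, rejecting undeclared types and out-of-range spans.
    void addAnnotation(String type, int begin, int end) {
        if (!typeSystem.contains(type)) {
            throw new IllegalArgumentException("Type not in type system: " + type);
        }
        if (begin < 0 || begin > end || end > documentText.length()) {
            throw new IllegalArgumentException("Bad span: " + begin + ".." + end);
        }
        annotations.add(new Annotation(type, begin, end));
    }

    // Returns the text covered by each annotation of the given type, in insertion order.
    List<String> coveredText(String type) {
        List<String> result = new ArrayList<>();
        for (Annotation a : annotations) {
            if (a.type.equals(type)) {
                result.add(documentText.substring(a.begin, a.end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> types = new LinkedHashSet<>(Arrays.asList("Sentence", "Token"));
        ToyCas cas = new ToyCas(types, "Family history of hyperlipidemia.");
        cas.addAnnotation("Sentence", 0, 33);
        cas.addAnnotation("Token", 0, 6);
        cas.addAnnotation("Token", 7, 14);
        System.out.println(cas.coveredText("Token")); // [Family, history]
    }
}
```

Annotations store only offsets into the shared document text, so every component downstream sees the same text and can look up what earlier components found.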

Parts of a UIMA Pipeline

Collection Reader

– Read input document

Analysis Engine(s) / Annotator(s)

– Process document

CAS Consumer

– Output data

Tying a Pipeline Together

CPE descriptor (Collection Processing Engine)

– Collection Reader

– Analysis Engine(s)

– CAS Consumer

Aggregate Analysis Engine

– Multiple Analysis Engines and their order

Pipeline Example

UIMA term: Example

– Collection Reader: Read files from a directory

– Analysis Engine: Sentence detector

– Analysis Engine: Tokenizer annotator

– Analysis Engine: Part-of-speech tagger

– CAS Consumer: Output tokens to a DB
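The Collection Reader -> Analysis Engines -> CAS Consumer flow in the example can be sketched as a stdlib-only Java toy. All names (ToyPipeline, Doc, Engine, tokensOf) are hypothetical and are not UIMA classes. An in-memory list stands in for files read from a directory, a returned list stands in for the database, and the part-of-speech tagger is left out rather than faked. In a real setup these would be UIMA components wired together by a CPE descriptor and an aggregate analysis engine descriptor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: every name here is hypothetical, not the Apache UIMA API.
final class ToyPipeline {

    // Stand-in for a CAS: document text plus [begin, end) spans for two annotation types.
    static final class Doc {
        final String text;
        final List<int[]> sentences = new ArrayList<>();
        final List<int[]> tokens = new ArrayList<>();

        Doc(String text) {
            this.text = text;
        }
    }

    // Stand-in for an Analysis Engine: reads and adds annotations on a Doc.
    interface Engine {
        void process(Doc doc);
    }

    // Records a sentence span after skipping leading whitespace.
    private static void addSentence(Doc doc, int begin, int end) {
        while (begin < end && Character.isWhitespace(doc.text.charAt(begin))) {
            begin++;
        }
        if (begin < end) {
            doc.sentences.add(new int[] {begin, end});
        }
    }

    // Sentence detector: ends a sentence after '.', '?' or '!' (deliberately naive).
    static final Engine SENTENCE_DETECTOR = doc -> {
        int start = 0;
        for (int i = 0; i < doc.text.length(); i++) {
            char c = doc.text.charAt(i);
            if (c == '.' || c == '?' || c == '!') {
                addSentence(doc, start, i + 1);
                start = i + 1;
            }
        }
        addSentence(doc, start, doc.text.length());
    };

    // A token is a run of letters/digits, or any single other non-space character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]");

    // Tokenizer: only looks inside sentence spans, so it must run after the sentence detector.
    static final Engine TOKENIZER = doc -> {
        for (int[] s : doc.sentences) {
            Matcher m = TOKEN.matcher(doc.text).region(s[0], s[1]);
            while (m.find()) {
                doc.tokens.add(new int[] {m.start(), m.end()});
            }
        }
    };

    // Runs each document from the reader through the engines in order, then the consumer.
    static void run(Iterator<String> reader, List<Engine> aggregate, Consumer<Doc> consumer) {
        while (reader.hasNext()) {
            Doc doc = new Doc(reader.next());
            for (Engine engine : aggregate) {
                engine.process(doc);
            }
            consumer.accept(doc);
        }
    }

    // Collects token text into a list (a stand-in for writing tokens to a DB).
    static List<String> tokensOf(List<String> documents, List<Engine> aggregate) {
        List<String> out = new ArrayList<>();
        run(documents.iterator(), aggregate, doc -> {
            for (int[] t : doc.tokens) {
                out.add(doc.text.substring(t[0], t[1]));
            }
        });
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Family history of hyperlipidemia. No diabetes.");
        // prints [Family, history, of, hyperlipidemia, ., No, diabetes, .]
        System.out.println(tokensOf(docs, Arrays.asList(SENTENCE_DETECTOR, TOKENIZER)));
        // prints [] because the tokenizer runs before any sentences exist
        System.out.println(tokensOf(docs, Arrays.asList(TOKENIZER, SENTENCE_DETECTOR)));
    }
}
```

The second call shows why an aggregate records the order of its engines: the tokenizer depends on sentence annotations, so running it first produces nothing.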

UIMA plugin for Eclipse

Provides visual editors for descriptors

– Mini GUI for selecting options

– Rather than editing XML directly

An update site exists for installing the plugin: http://www.apache.org/dist/incubator/uima/eclipse-update-site

UIMA Tooling Options

Tools:

– CPE Configurator

– CVD (CAS Visual Debugger)

Options:

– Command line scripts/.bat files

– Run within Eclipse

Running a Pipeline - CPE

cTAKES provides a script and a .bat file: runctakesCPE

Choose a CPE descriptor, such as test_plaintext.xml from desc/cdpdesc/collection_processing_engine in cTAKES

Viewing Annotations - CVD

Viewing annotations using the CVD

– Load the Type System

– Load the XCAS or XMI

Annotation Viewers

UIMA tools

– CVD (CAS Visual Debugger)

– Annotation viewer

Viewing XML output

– Any XML viewer

– Any text editor

Questions?

http://uima.apache.org/

Supplemental slides follow

Options to Run a Pipeline

CPE GUI

CVD GUI

– Single Aggregate Analysis Engine

– No Collection Reader

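Instantiate a CpeDescription, pass it to UIMAFramework.produceCollectionProcessingEngine(), and invoke the process() method of the resulting CollectionProcessingEngine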

uimaFIT

– Removes the dependency on XML

Creating a New Annotator

Within Eclipse

– Create a Java project

– Right-click -> Add UIMA Nature

– Add the UIMA jars to .classpath (Build Path)

– Create an Analysis Engine (AE) descriptor

– Add types to the AE descriptor, or optionally create a separate Type System descriptor

– Write code!
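For the "Write code!" step, the core of an annotator is a method that reads the document text and records annotations at character offsets. The sketch below is a hypothetical, stdlib-only stand-in: ToyDictionaryAnnotator, Mention and process are illustrative names, and dictionary lookup is just one example of annotator logic. A real UIMA annotator would typically extend JCasAnnotator_ImplBase, override process(JCas), and create annotations of the types declared in its descriptor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy annotator, not UIMA code. In UIMA this logic would typically go in
// process(JCas) of a class extending JCasAnnotator_ImplBase, creating annotations in the CAS.
final class ToyDictionaryAnnotator {

    // One finding: the matched text and its [begin, end) character offsets.
    static final class Mention {
        final String term;
        final int begin;
        final int end;

        Mention(String term, int begin, int end) {
            this.term = term;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + begin + "," + end + ")";
        }
    }

    private final List<String> dictionary;

    ToyDictionaryAnnotator(List<String> dictionary) {
        this.dictionary = new ArrayList<>(dictionary);
    }

    // True if [begin, end) is not glued to a letter or digit on either side.
    private static boolean isWholeWord(String text, int begin, int end) {
        boolean startOk = begin == 0 || !Character.isLetterOrDigit(text.charAt(begin - 1));
        boolean endOk = end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
        return startOk && endOk;
    }

    // Case-insensitive, whole-word lookup of every dictionary term; results sorted by offset.
    List<Mention> process(String text) {
        List<Mention> found = new ArrayList<>();
        for (String term : dictionary) {
            for (int i = 0; i + term.length() <= text.length(); i++) {
                int end = i + term.length();
                if (text.regionMatches(true, i, term, 0, term.length()) && isWholeWord(text, i, end)) {
                    found.add(new Mention(text.substring(i, end), i, end));
                }
            }
        }
        found.sort(Comparator.comparingInt(m -> m.begin));
        return found;
    }

    public static void main(String[] args) {
        ToyDictionaryAnnotator annotator =
                new ToyDictionaryAnnotator(Arrays.asList("hyperlipidemia", "family history"));
        System.out.println(annotator.process("Family history of hyperlipidemia."));
        // prints [Family history[0,14), hyperlipidemia[18,32)]
    }
}
```

Keeping the text as written (rather than the lower-cased dictionary entry) and reporting offsets mirrors how UIMA annotations point back into the original document text.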

Running an AE in CVD

Using CVD to run an Analysis Engine

– No Collection Reader

– Single Analysis Engine (can be an aggregate)

– No CAS Consumer

– Load an Analysis Engine

– Paste/type in text to process

Example input text: Family history of hyperlipidemia.

Modifying a parameter

UIMA’s descriptor editors allow you to modify most parameters without looking at the XML itself.

Links

Getting started with UIMA http://uima.apache.org/doc-uima-annotator.html

UIMA Update site for use in Eclipse http://www.apache.org/dist/incubator/uima/eclipse-update-site

Email address

masanz.james@mayo.edu
