Table mining and data curation from biomedical literature

Nikola Milosevic
Supervisors: Dr Goran Nenadic, Robert Hernandez

Why are we doing this?

Growth of published research

Information growth

Text mining

Text mining has developed tools and methods to help scientists
Focused mainly on the body of the article
Tables and figures are typically ignored

What about tables?

Challenge

Visually structured text
May be ungrammatical and ambiguous
Various layouts
Value representation types
◦ Numeric
◦ Text
◦ Ranges
◦ Formulas
◦ Complex
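To make the value representation types concrete, here is a minimal sketch of how a cell's content might be sorted into the categories listed above. The patterns, the `classify_value` name, and the reading of "complex" as a number with a deviation or a bracketed extra value are assumptions made for illustration, not the rules used in this work.

```python
import re

# Hypothetical patterns, checked from most to least specific.
_NUMBER = r"[-+]?\d+(?:\.\d+)?"
_PATTERNS = [
    # Two numbers joined by a dash or "to", e.g. "18-65".
    ("range", re.compile(rf"^{_NUMBER}\s*(?:-|–|to)\s*{_NUMBER}$")),
    # A number with a deviation or a bracketed extra, e.g. "25.3 ± 4.1", "12 (34%)".
    ("complex", re.compile(rf"^{_NUMBER}\s*(?:±|\+/-)\s*{_NUMBER}$|^{_NUMBER}\s*\(.+\)$")),
    # A single number, optionally a percentage, e.g. "42", "45%".
    ("numeric", re.compile(rf"^{_NUMBER}\s*%?$")),
    # Assumption: anything containing "=" is treated as a formula.
    ("formula", re.compile(r"=")),
]


def classify_value(cell: str) -> str:
    """Return a coarse value type for the text of one table cell."""
    text = cell.strip()
    if not text:
        return "empty"
    for label, pattern in _PATTERNS:
        if pattern.search(text):
            return label
    # Everything that matches no pattern is treated as free text.
    return "text"
```

Real tables are far less regular than these patterns assume, which is part of the challenge described on this slide.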

Method overview

Table decomposition

Aim: decompose the table into structures suitable for further processing
Cell structures that keep information about the navigational path (headers, stubs, etc.)
Heuristic-based approach
Cell structure, alignment, content, neighbourhood
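The idea of a cell structure that keeps its navigational path can be sketched as follows. This is a simplified illustration: the `Cell` class, the `decompose` function, and the assumption that the first row holds headers and the first column holds stubs are all hypothetical, and the heuristics named above (alignment, content, neighbourhood) are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """A data cell together with the path needed to interpret it."""
    value: str
    row: int
    col: int
    headers: List[str]  # column headers above the cell
    stubs: List[str]    # row labels to the left of the cell


def decompose(grid: List[List[str]], header_rows: int = 1, stub_cols: int = 1) -> List[Cell]:
    """Turn a rectangular grid of strings into self-describing cells.

    Assumption: the first `header_rows` rows are headers and the first
    `stub_cols` columns are stubs; real tables need heuristics to find these.
    """
    cells = []
    for r in range(header_rows, len(grid)):
        for c in range(stub_cols, len(grid[r])):
            headers = [grid[h][c] for h in range(header_rows)]
            stubs = [grid[r][s] for s in range(stub_cols)]
            cells.append(Cell(grid[r][c], r, c, headers, stubs))
    return cells


# Made-up example table, for illustration only.
grid = [
    ["Characteristic", "Placebo", "Treatment"],
    ["Number of patients", "52", "49"],
    ["BMI (kg/m2)", "27.1 ± 3.2", "26.8 ± 2.9"],
]
for cell in decompose(grid):
    print(cell.stubs, cell.headers, cell.value)
```

Once each cell carries its own headers and stubs, later extraction steps can work on single cells without re-reading the table layout.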

Information extraction

Performed a number of experiments
Extraction of number of patients, weight, BMI
Approaches:
◦ Rules
◦ MetaMap
◦ White and black lists
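A rough sketch of how rules might combine with white and black lists to extract the number of patients is given below. The term lists, the `extract_patient_counts` function, and the `(stub, header, value)` cell layout are illustrative assumptions rather than the lists and rules used in the experiments; the MetaMap-based approach is not shown.

```python
import re

# Hypothetical cue terms: a row label must contain a white-list term
# and must not contain any black-list term.
WHITE_LIST = ("number of patients", "no. of patients", "patients", "subjects", "participants")
BLACK_LIST = ("age", "weight", "bmi", "%", "percent")

# Rule: a patient count is a plain non-negative integer.
_INTEGER = re.compile(r"^\d+$")


def extract_patient_counts(cells):
    """Return (header, count) pairs from (stub, header, value) cells."""
    results = []
    for stub, header, value in cells:
        label = stub.lower()
        if any(term in label for term in BLACK_LIST):
            continue  # the row describes something other than a patient count
        if not any(term in label for term in WHITE_LIST):
            continue  # no cue that the row is about patients
        value = value.strip()
        if _INTEGER.match(value):
            results.append((header, int(value)))
    return results


# Made-up cells, for illustration only.
cells = [
    ("Number of patients", "Placebo", "52"),
    ("Number of patients", "Treatment", "49"),
    ("Patients aged over 65 (%)", "Placebo", "31"),
    ("BMI (kg/m2)", "Placebo", "27.1 ± 3.2"),
]
print(extract_patient_counts(cells))
```

The black list is checked first so that rows such as percentages of patients are rejected even though they contain a white-list term.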

Results

Achieved promising results
Some information classes are easier to extract than others

Conclusion & Future work

Information extraction from tables is feasible
Future work:
◦ Value and table type categorisation
◦ Development of normalisation and extraction engine
◦ Extraction rules
◦ Data storage format (triple store, linked data)
◦ Data curation interface
◦ Data querying interface

Thank you! Q&A

Email: [email protected]
