Table mining and data curation from biomedical literature
Nikola Milosevic
Supervisors: Dr Goran Nenadic, Robert Hernandez
Why are we doing this?
Growth of published research
Information growth
Text mining
Text mining has developed tools and methods to help scientists
Focused mainly on the body of the article
Tables and figures are typically ignored
What about tables?
Challenge
Visually structured text
May be ungrammatical and ambiguous
Various layouts
Value representation types
◦ Numeric
◦ Text
◦ Ranges
◦ Formulas
◦ Complex
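The value representation types listed above can be illustrated with a small classifier. This is only a sketch under stated assumptions: the slides name the types but not how they are recognised, so the regular expressions, the `classify_value` function, and the idea that "complex" means a value with a deviation or a bracketed part are all hypothetical.

```python
import re

# Hypothetical patterns: the slides list the value types, not how to detect them.
NUM = r"[-+]?\d+(?:\.\d+)?"
RANGE = re.compile(rf"{NUM}\s*(?:-|–|to)\s*{NUM}")
NUMERIC = re.compile(rf"{NUM}\s*%?")
# Assumption: "complex" covers mean ± deviation and count (percentage) cells.
COMPLEX = re.compile(rf"{NUM}\s*(?:±|\+/-)\s*{NUM}|{NUM}\s*\(.+\)")


def classify_value(cell_text: str) -> str:
    """Assign one of the value representation types to a cell's text."""
    text = cell_text.strip()
    if not text:
        return "empty"
    if RANGE.fullmatch(text):
        return "range"
    if NUMERIC.fullmatch(text):
        return "numeric"
    if COMPLEX.fullmatch(text):
        return "complex"
    if "=" in text:  # crude stand-in for formula detection
        return "formula"
    return "text"


if __name__ == "__main__":
    for sample in ["42", "18-65", "25.3 ± 4.1", "12 (34%)", "Placebo"]:
        print(sample, "->", classify_value(sample))
```

The checks run from the most specific pattern to the most general, so a cell such as `18-65` is treated as a range before the plain numeric test is tried.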
Method overview
Table decomposition
Aim: decompose the table into structures suitable for further processing
Cell structures that keep information about the navigational path (headers, stubs, etc.)
Heuristic-based approach: cell structure, alignment, content, neighbourhood
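A cell structure that keeps its navigational path could look roughly like the sketch below. It assumes the simplest possible layout, where the first `header_rows` rows are headers and the first `stub_cols` columns are stubs; the heuristics from the slides that would detect these boundaries are not reproduced here. The `Cell` class, the `decompose` function, and the sample table are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    """A data cell together with the path needed to interpret it."""
    value: str
    row: int
    col: int
    headers: List[str]  # column headers above the cell, top to bottom
    stubs: List[str]    # row stubs to the left of the cell, left to right


def decompose(grid: List[List[str]], header_rows: int = 1,
              stub_cols: int = 1) -> List[Cell]:
    """Turn a rectangular grid into data cells with navigational paths.

    Assumption: header and stub positions are given, not inferred.
    """
    cells = []
    for r in range(header_rows, len(grid)):
        for c in range(stub_cols, len(grid[r])):
            headers = [grid[h][c] for h in range(header_rows)]
            stubs = [grid[r][s] for s in range(stub_cols)]
            cells.append(Cell(grid[r][c], r, c, headers, stubs))
    return cells


if __name__ == "__main__":
    # Made-up table used only to show the shape of the output.
    table = [
        ["Characteristic", "Placebo", "Drug"],
        ["Patients (n)", "50", "48"],
        ["BMI (kg/m2)", "27.1", "26.8"],
    ]
    for cell in decompose(table):
        print(cell.stubs, cell.headers, cell.value)
```

Keeping the headers and stubs on every cell means later extraction steps can work on one cell at a time without going back to the table layout.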
Information extraction
Performed a number of experiments
Extraction of number of patients, weight, BMI
Approaches:
◦ Rules
◦ MetaMap
◦ White and black lists
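The rule and list-based approaches above might be combined along the following lines. This is a minimal sketch, not the method used in the experiments: the list contents, the `extract_bmi` function, and the input format of (stub label, value text) pairs are assumptions, and MetaMap is not called at all.

```python
import re

# Hypothetical lists: a label must contain a white-list term
# and must not contain any black-list term.
WHITE_LIST = {"bmi", "body mass index"}
BLACK_LIST = {"change", "difference"}

FIRST_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def matches_lists(label, white, black):
    """Return True if the label passes the white list and the black list."""
    text = label.lower()
    return any(w in text for w in white) and not any(b in text for b in black)


def extract_bmi(rows):
    """Collect BMI values from (stub_label, value_text) pairs.

    Rule (an assumption): take the first number in the value text.
    """
    results = []
    for label, value in rows:
        if not matches_lists(label, WHITE_LIST, BLACK_LIST):
            continue
        match = FIRST_NUMBER.search(value)
        if match:
            results.append(float(match.group()))
    return results


if __name__ == "__main__":
    # Made-up rows used only to exercise the lists.
    sample_rows = [
        ("BMI (kg/m2)", "27.1 ± 3.2"),
        ("Change in BMI", "-0.4"),
        ("Weight (kg)", "81.0"),
    ]
    print(extract_bmi(sample_rows))
```

The black list is what keeps a row such as "Change in BMI" out of the results even though it mentions the target term.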
Results
Achieved promising results
Some information classes are easier to extract than others
Conclusion & Future work
Information extraction from tables is feasible
Future work:
◦ Value and table type categorisation
◦ Development of a normalisation and extraction engine
◦ Extraction rules
◦ Data storing format (triple store, linked data)
◦ Data curation interface
◦ Data querying interface
Thank you! Q&A
Email: [email protected]