Text Mining Sophia Ananiadou [email protected] National Centre for Text Mining www.nactem.ac.uk

TextMining% - University of Manchesterstudentnet.cs.manchester.ac.uk/pgr/2012/CDTSeminar/... · 2012-10-22 · NaCTeM- ! The 1st publicly funded national text mining centre in the

Jul 10, 2020


Page 1:

Text Mining

Sophia Ananiadou [email protected]

National Centre for Text Mining www.nactem.ac.uk

Page 2:

NaCTeM - www.nactem.ac.uk

• The 1st publicly funded national text mining centre in the world
• Location: Manchester Interdisciplinary Biocentre
• Phase I - Biology (2005-2008)
• Phase II - Biology, Medicine, Social Sciences (2008-2011)
• Phase III - Medicine, Biology (2012-2016)

Sophia Ananiadou, John McNaught

Page 3:

Text Mining Research Group

• Sophia Ananiadou [email protected] (MIB)
• John McNaught [email protected] (MIB)
• Goran Nenadic [email protected] IT building, IT308

Page 4:

The problem with information overload and knowledge discovery

• Humans cannot easily:
– Keep up-to-date with all relevant literature
– Find relevant and precise information
– Synthesize information from many diverse sources
– Exploit the mass of information to generate hypotheses
– Discover new knowledge

S.Ananiadou    

Page 5:

What is text mining?

• Extracts and discovers knowledge hidden in text
• Information access
• Knowledge discovery
• Semantic search, semantic metadata
– identifying concepts
– extracting facts/relations
– discovering implicit links

Page 6:

The Need for Text Mining

§ Full Papers
§ Abstracts
§ Clinical trials
§ Reports, discharge summaries
§ EHR
§ Textbooks, monographs
§ Grey content, online discussion forums

MEDLINE
• 2005: ~14M
• 2009: ~18M
• 2011: 21.2M (1/10/11)

Overwhelming information in textual, unstructured format

Page 7:

A new paradigm of sharing information and knowledge

Information Retrieval, Databases

Semantic Web

Text Mining, NLP

Disciplines merging, knowledge sharing

Page 8:

From Text to Knowledge: tackling the data deluge through text mining

Unstructured Text (implicit knowledge)

Structured content (explicit knowledge)

Information extraction

Semantic metadata

Knowledge Discovery

Information Retrieval

Page 9:

Text mining steps

• Information Retrieval yields all relevant texts
– Gathers, selects, filters documents that may prove useful
– Finds what is known
• Information Extraction extracts facts & events of interest to user
– Finds relevant concepts, facts about concepts
– Finds only what we are looking for
• Data Mining discovers unsuspected associations
– Combines & links facts and events
– Discovers new knowledge, finds new associations
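
The three steps on this slide can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the corpus, the term list and the function names (retrieve, extract, mine) are invented here, and real systems use far richer retrieval, NLP and statistics than substring and dictionary matching.

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus and term list: both are invented for illustration.
DOCS = [
    "Insulin resistance is common in type 2 diabetes.",
    "Low albumin levels were observed in patients with diabetes.",
    "Caffeine intake may affect fatigue.",
]
TERMS = ["insulin", "albumin", "diabetes", "caffeine", "fatigue"]


def retrieve(docs, query):
    """Information retrieval (toy): keep documents that mention the query."""
    return [d for d in docs if query.lower() in d.lower()]


def extract(doc, terms):
    """Information extraction (toy): find known terms in one document."""
    words = set(re.findall(r"[a-z0-9]+", doc.lower()))
    return {t for t in terms if t in words}


def mine(term_sets):
    """Data mining (toy): count how often two terms share a document."""
    pairs = Counter()
    for terms in term_sets:
        pairs.update(combinations(sorted(terms), 2))
    return pairs


hits = retrieve(DOCS, "diabetes")           # step 1: relevant texts
facts = [extract(d, TERMS) for d in hits]   # step 2: concepts per text
associations = mine(facts)                  # step 3: links between concepts
```

Each stage narrows or restructures the output of the previous one, which is the point the slide makes: retrieval finds what is known, extraction finds what we are looking for, and mining combines the extracted pieces.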

Page 10:

• Extraction of terms and named entities (names of people, organisations, diseases, genes, etc)
• Discovery of concepts allows semantic annotation and enrichment of documents
• Going a step further: extracting facts, events from text
• And even further… opinions, attitudes, certainty, contradictions…
– meta-knowledge

Impact of NLP-based text mining

• Improves clustering, classification of documents
• Improves information access by going beyond index terms, enabling semantic querying
• Enables even more advanced text mining applications
• Linking text with pathways

Page 11:

From Text to Knowledge: NLP and Knowledge Extraction

Structured Knowledge

Lexicons and ontologies

Knowledge Extraction

Tools

Text Annotation Tools

Page 12:

Who needs this stuff?

• Semantic Web community: we provide the semantics
• Computational Biology: we link text with networks/pathways
• Ontology: we populate ontologies from text, linking with Protégé
• Database curators: automatic update using evidence from text…

Page 13:

How TM is embedded in applications

• Semantic search from full papers, abstracts
• Hypothesis generator: mining direct and indirect associations
• Supporting systematic reviews
• Developing clinical trial recommender systems
• Extracting bioprocesses for cancer research
• Enriching, curating pathways with literature evidence
• Annotation environment for curators…

Page 14:

Which User Communities?

• Pharma
• Health/Medicine
• Finance
• Social sciences
• Digital Economy, Digital Libraries
• Google, IBM, Microsoft: all investing in text mining
• Everyone needs text mining to solve their knowledge management problems!

Page 15:

Text Mining: Layers upon layers

[Diagram: Layers of Sophistication]
Layers: Interactions, Facts, Terms, Entities, POS, Words
Scale: General Solution to Highly Customised Solution (Improved Accuracy)
Applications: Simple keyword search à la Google™, Term identification, Named Entity Recognition, Metadata Extraction, Information Extraction, Indicative Summarisation, Informative Summarisation, Database Curation, Q&A Services, Generate hypotheses

What does this do?
• Enhance searching by looking for related keywords and phrases
• Choose between different meanings - ‘a dog lead’ or ‘a lead balloon’?
• Semantically annotate Names, Addresses, Organisations or Proteins
• Who, What, When and Where?

Page 16:

Retrieving related concepts

[Diagram: FACTA+ over MEDLINE (21 million abstracts)]

Query: diabetes

216,000 documents relevant to diabetes

Insulin, albumin, …

Diabetes is …

… when insulin is …

… lower albumin level

http://refine1-nactem.mc.man.ac.uk/facta/
Tsuruoka, Y. et al (2008) Bioinformatics 24(21)
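
The idea of retrieving concepts related to a query can be sketched with an inverted index. This is a minimal sketch, not a description of how FACTA+ is implemented: the index contents, TOTAL_DOCS and the function name related_concepts are invented, and ranking by raw co-occurrence count with pointwise mutual information (PMI) as a secondary statistic is an assumption made for illustration.

```python
import math

# Invented toy index: concept -> IDs of the documents that mention it.
INDEX = {
    "diabetes": {1, 2, 3, 4},
    "insulin": {1, 2, 3},
    "albumin": {3, 5},
    "caffeine": {5, 6},
}
TOTAL_DOCS = 6


def related_concepts(query, index, total_docs):
    """Rank concepts that share documents with the query concept."""
    query_docs = index[query]
    ranked = []
    for concept, docs in index.items():
        shared = len(query_docs & docs)
        if concept == query or shared == 0:
            continue
        # Pointwise mutual information: log of observed vs. expected overlap.
        pmi = math.log(total_docs * shared / (len(query_docs) * len(docs)))
        ranked.append((concept, shared, pmi))
    # Most frequently co-occurring concepts first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


results = related_concepts("diabetes", INDEX, TOTAL_DOCS)
```

Because the concept-to-document sets are computed once, a query only needs set intersections, which is one plausible way to keep such a lookup fast over millions of abstracts.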

Page 17:

Click!

Page 18:

Extracting snippets of information

… However, further decreases in branched-chain amino acid levels indicate that caffeine might promote deeper fatigue than placebo

Page 19:

Extracting indirect associations

E-cadherin is associated with Parkinson’s disease via CASS4, SNAIL3, transcription factor EB, etc.

Page 20:

Directly associated concepts

Query: E-cadherin and GENIA:Negative_regulation

E-cadherin often appears with cancers

Page 21:

Indirectly associated concepts

Query: E-cadherin and GENIA:Negative_regulation

E-cadherin is indirectly associated with nervous system disorders (e.g., Alzheimer’s disease, Parkinson’s disease, epilepsy)
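
The difference between direct and indirect associations can be sketched as a two-hop lookup over a table of direct links. This is a minimal sketch under stated assumptions: the DIRECT table is a hand-written toy that only echoes the examples named on these slides, the function name indirect_associations is hypothetical, and the actual ranking used by the tool shown is not reproduced here.

```python
from collections import defaultdict

# Toy table of direct associations, echoing only the slide examples.
DIRECT = {
    "E-cadherin": {"CASS4", "SNAIL3", "transcription factor EB", "cancer"},
    "CASS4": {"E-cadherin", "Parkinson's disease"},
    "SNAIL3": {"E-cadherin", "Parkinson's disease"},
    "transcription factor EB": {"E-cadherin", "Parkinson's disease"},
    "cancer": {"E-cadherin"},
}


def indirect_associations(start, direct):
    """Find concepts reachable only through an intermediate (pivot) concept.

    Returns a mapping: indirectly associated concept -> set of pivots.
    """
    neighbours = direct.get(start, set())
    found = defaultdict(set)
    for pivot in neighbours:
        for target in direct.get(pivot, set()):
            # Skip the start concept and anything already directly linked.
            if target != start and target not in neighbours:
                found[target].add(pivot)
    return dict(found)


links = indirect_associations("E-cadherin", DIRECT)
```

Keeping the pivots alongside each result matters for hypothesis generation, since the pivots are the evidence a researcher would inspect before trusting the indirect link.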

Page 22:

Project: TM for cancer genomics

• Enhancing FACTA+ to deal with cancer genomics
• Mutations, oncogenes
• Relations between treatments, genes, drugs
• Research into Information Extraction (Named entity, relation, event mining)
• Collaboration with Medical School

Page 23:

Event extraction (EE)

Information extraction with
• Typed associations of arbitrary numbers of participants (n-ary)
• Events (processes / reactions) can participate in other events (recursive)
• Explicit identification of roles that participants play (Theme, Cause, ...)

Many resources, methods and applications introduced since 2009
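
The three properties listed on this slide (typed, n-ary, recursive, with explicit roles) map naturally onto a small data structure. The sketch below is an assumption made for illustration: the class names Entity and Event, the example sentence and the helper entities are invented here and do not reproduce any particular annotation format.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    text: str
    type: str


@dataclass(frozen=True)
class Event:
    type: str     # typed association, e.g. "Negative_regulation"
    trigger: str  # the word in the text that signals the event
    # Any number of (role, participant) pairs; a participant may itself
    # be an Event, which is what makes the structure recursive.
    args: Tuple[Tuple[str, Union[Entity, "Event"]], ...] = ()


def entities(event):
    """Collect all entities taking part in an event, at any nesting level."""
    found = []
    for _role, participant in event.args:
        if isinstance(participant, Event):
            found.extend(entities(participant))
        else:
            found.append(participant)
    return found


# Invented example: "Protein A inhibits expression of Protein B".
protein_a = Entity("Protein A", "Protein")
protein_b = Entity("Protein B", "Protein")
expression = Event("Gene_expression", "expression", (("Theme", protein_b),))
regulation = Event(
    "Negative_regulation",
    "inhibits",
    (("Theme", expression), ("Cause", protein_a)),
)
```

Here the inner expression event is the Theme of the outer regulation event, while Protein A fills the Cause role, so the roles that participants play stay explicit rather than being implied by argument order.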

Page 24:

Project: extracting intentions

• Extracting information from full papers
• Classify facts according to the authors’ intentions
• http://www.nactem.ac.uk/meta-knowledge/
• Negation, speculation, contradiction

Page 25:

Nuances of language

• Argumentation, rhetorical intent, meta-knowledge
• Speculation
– Probable, possible, …
– Suggest, indicate, …
– May, might, would, …
• Manner: slightly, rapidly, greatly, …
• Polarity (negative, positive): no, never, …
• Such knowledge required for: discourse analysis, opinion mining, …
• If not taken into account, then results can be invalid and misleading
• Collaborative project with publishing company.
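
A very rough way to see how the cue words above could be used is a dictionary lookup over tokens. This is a deliberately naive sketch, assuming exact word matches only: the cue sets copy the slide's examples, the function name annotate is hypothetical, and real meta-knowledge annotation needs context, scope and inflected forms that this ignores.

```python
import re

# Cue words copied from the slide; exact matches only (no inflections).
SPECULATION_CUES = {"probable", "possible", "suggest", "indicate",
                    "may", "might", "would"}
MANNER_CUES = {"slightly", "rapidly", "greatly"}
NEGATION_CUES = {"no", "never"}


def annotate(sentence):
    """Tag a sentence with the speculation, manner and polarity cues found."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {
        "speculation": [t for t in tokens if t in SPECULATION_CUES],
        "manner": [t for t in tokens if t in MANNER_CUES],
        "polarity": "negative" if any(t in NEGATION_CUES for t in tokens)
                    else "positive",
    }


snippet = ("Further decreases in branched-chain amino acid levels indicate "
           "that caffeine might promote deeper fatigue than placebo")
tags = annotate(snippet)
```

Run on the snippet from the earlier slide, the lookup flags "indicate" and "might", which shows why a fact extracted from that sentence should be stored as speculative rather than as established knowledge.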

Page 26:

Meta-knowledge annotation

Certainty level

Polarity

Analysis

Manner

Source

Page 27:

Public Health reviews

Page 28:

Unsupervised methods for Public Health Search

• Building on the clinical trials project
• Extracting information from literature
• Unsupervised methods + machine learning
• Summarisation
• Cooperation with Public Health (NICE: National Institute for Health and Clinical Excellence)

Page 29:

Finding evidence from full text

• In context of UKPMC
• Beyond full text search and pattern matching
• Deeply analyse documents off-line
• Index relationships
• Key off search term to dynamically generate from indexed relationships questions that have known answers
– Not auto-completion…
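
The idea of generating questions with known answers from indexed relationships can be sketched as follows. This is an illustration under stated assumptions, not the UKPMC implementation: the triples use placeholder names ("compound X", "drug D"), the question templates are invented, and build_index and questions_for are hypothetical function names.

```python
from collections import defaultdict

# Placeholder (subject, past participle, object) relationships, as if
# they had been extracted off-line from analysed documents.
TRIPLES = [
    ("GO", "produced", "compound X"),
    ("GO", "produced", "compound Y"),
    ("drug D", "inhibited", "GO"),
]


def build_index(triples):
    """Index each relationship under both of its arguments.

    Result: term -> question -> set of known answers.
    """
    index = defaultdict(lambda: defaultdict(set))
    for subj, participle, obj in triples:
        index[subj][f"what is {participle} by {subj}"].add(obj)
        index[obj][f"what is {obj} {participle} by"].add(subj)
    return index


def questions_for(term, index):
    """Return the questions (with sorted answers) keyed off a search term."""
    return {q: sorted(answers) for q, answers in index.get(term, {}).items()}


INDEX = build_index(TRIPLES)
suggestions = questions_for("GO", INDEX)
```

Unlike auto-completion, every suggested question is backed by at least one indexed relationship, so selecting it is guaranteed to return an answer.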

Page 30:

http://labs.ukpmc.ac.uk/evf

Page 31:

Fewer hits, now we click on a question

Page 32:

Known answers to “what is produced by GO”

We can find out more facts by investigating a document

Page 33:

Extracted subject-verb-object triples

Verbs are “domain verbs of interest”
Deep analysis reveals “hidden” subjects (passives undone)
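
What "passives undone" means can be shown with a toy normaliser that turns a simple passive clause into a subject-verb-object triple. This is only a sketch: the slide refers to deep syntactic analysis, whereas the regular expression below is a stand-in that handles one rigid sentence shape; the pattern, the function name undo_passive and the example sentence are all invented for illustration.

```python
import re

# Matches only the rigid shape "<object> is/are/was/were <verb> by <subject>".
PASSIVE = re.compile(
    r"^(?P<object>.+?) (?:is|are|was|were) (?P<verb>\w+) by (?P<subject>.+?)\.?$"
)


def undo_passive(sentence):
    """Return (subject, verb, object) for a simple passive clause, else None."""
    match = PASSIVE.match(sentence.strip())
    if match is None:
        return None
    # The grammatical subject of a passive is the logical object, and the
    # "by" phrase holds the logical ("hidden") subject.
    return (match["subject"], match["verb"], match["object"])


triple = undo_passive("Compound X is produced by GO.")
```

Normalising passives this way lets "GO produces compound X" and "compound X is produced by GO" land in the same index entry.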

Page 34:

Biomedical causality recognition

• Discovering new facts and connections
• Enriching existing pathways
• Creating new pathways

[Diagram labels: Raw text, Named entities, Events, Causality, Pathways; relation shown: CAUSES]

Page 35:

Twitter analysis using text mining tools

• Twitter is:
– one of the most popular social media
– a new means of mass communication
– accessible to all
• The load of information is immense, thus automatic analysis is essential.
• In this project, the student will:
• use the text mining tools of NaCTeM (e.g. topic extraction, summarisation)
• exploit patterns and trends in Twitter feeds concerning specific topics or events
• perform statistical analysis based on text mining analytics, noisy data
• attempt to answer questions about the nature of Twitter, for example:
– the way Twitter influences human behaviour
– whether Twitter strengthens positive or negative emotions about an event
– whether it can motivate people to participate in a public protest
– whether it can agitate or allay panic during extreme natural phenomena such as floods, earthquake sequences and typhoons, etc.

Page 36:

Opinion and trend analysis using text mining

• Synthesis of multiple views about a topic, issue or product.
• Sources: reviews, newswire articles, blogs, and social media, such as Facebook, Twitter, Google+ and Myspace
• These sources are opinion repositories and logs of trends and lifestyle
• Opinion and trend analysis cuts across:
– information retrieval
– text mining
– automatic summarisation
– sentiment analysis.
• Research in this area includes:
– learning the semantic orientation and emotional stress of words
– scoring the sentiment of documents
– analysing opinions and attitudes etc.
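
Scoring the sentiment of a document can be sketched with a small polarity lexicon. This is a minimal illustration under stated assumptions: the lexicon values, the negator list and the function name sentiment_score are invented, and the one-token negation rule is a simplification rather than a method taken from the slides.

```python
import re

# Invented polarity lexicon: word -> semantic orientation.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum word polarities, flipping a word that directly follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        if token in LEXICON:
            value = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATORS:
                value = -value
            score += value
    return score


review_score = sentiment_score("The service was great but the food was bad")
```

A positive total suggests an overall favourable document; learning the lexicon values from data, rather than writing them by hand, is the "semantic orientation of words" research topic the slide mentions.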

Page 37:

John McNaught

Text Mining Research Group and NaCTeM (Deputy Director) [email protected]

Page 38:

It’s your PhD, not mine

• If you want me to supervise you in an area of interest to me, then I expect you to come up with at least a rough idea for a research proposal
– You’ll be more interested in working on something you “own”
– Whom would a top restaurant be more interested in employing?
• A cook who could show he was good at buying ready-made meals?
• Or a chef who could show he was capable of inventing a novel dish?

 

Page 39:

Proposals welcome in areas such as:

• Text mining
– Information extraction
• named entity recognition, relation extraction, fact or event extraction
– Opinion mining (sentiment mining)
– Presentation of complex text mining results to users, interaction aspects, search aspects
• Issues in resource building for NLP/TM
– Lexicons, terminologies, annotated corpora

 

Page 40:

Proposals welcome…

• Mapping between the language of experts and the language of non-experts
– Many non-experts attempt to use/understand specialised sources (health problems, …)
• Writing aids
– TM is applied post-creation of document, no author present
• Ambiguity greatest problem
– Why not create semantic metadata as author constructs document, resolve ambiguities, propose extracted events, link document to knowledge sphere?

Page 41:

Proposals welcome…

• If you have domain or language expertise
– Proposals can be oriented towards that domain or language
– Although finding appropriate resources (lexica, corpora, language processing tools) may be a severe issue where NLP/TM is underdeveloped or nascent for some language
• (so that might give further ideas)
• TM is of interest also to those in humanities, social sciences, law, ..., so plenty of scope for topics in such domains (e.g. linking historical personages and historical events)

Page 42:

Some projects PhD students of mine have worked on or are working on

• Arabic named entity recognition
– Hard because no capitalisation, lack of diacritics in MSA, ambiguity of names with common nouns
• Opinion mining for Arabic
• Machine learning of template extraction rules
– To help grammar rule writers
• Automatic generation of semantic clusters from definitions
– To help with “tip of tongue” phenomenon and with communication among experts from different domains
• Lexical simplification for accessibility and low-literacy support

Page 43:

Information on NaCTeM

• All our services are here: http://www.nactem.ac.uk/services.php
• Our tools are here: http://www.nactem.ac.uk/software.php
• Our publications: http://www.nactem.ac.uk/aigaion2/index.php?/publications

Page 44:

Page 45:

Possible projects

Identification of conflicting information in biological literature

• Aim: finding statements that express some degree of difference/conflict, e.g.
– Protein A is highly expressed in T-cells
– T-cells show reduced expression of Protein A
• Build on previous work (completed PhD)
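
The example pair on this slide can be used to sketch what a conflict check might look like. This is a toy under stated assumptions and not the method of the completed PhD mentioned above: the cue word sets, the fixed entity list and the function names summarise and in_conflict are invented, and real statements would need proper entity and event extraction first.

```python
import re

# Invented cue words for the direction of an expression change.
UP_CUES = {"highly", "increased", "elevated"}
DOWN_CUES = {"reduced", "decreased", "low"}
# Entities are given up front here; a real system would recognise them.
ENTITIES = ["Protein A", "T-cells"]


def summarise(statement):
    """Reduce a statement to (entities mentioned, direction of change)."""
    lowered = statement.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))
    mentioned = frozenset(e for e in ENTITIES if e.lower() in lowered)
    if tokens & UP_CUES and not tokens & DOWN_CUES:
        direction = "up"
    elif tokens & DOWN_CUES and not tokens & UP_CUES:
        direction = "down"
    else:
        direction = None
    return mentioned, direction


def in_conflict(first, second):
    """Two statements conflict if they involve the same entities but
    report opposite directions."""
    ents_a, dir_a = summarise(first)
    ents_b, dir_b = summarise(second)
    return (bool(ents_a) and ents_a == ents_b
            and None not in (dir_a, dir_b) and dir_a != dir_b)


conflict = in_conflict(
    "Protein A is highly expressed in T-cells",
    "T-cells show reduced expression of Protein A",
)
```

The two sentences differ in word order and syntax, so the comparison has to happen on a normalised representation rather than on the surface text, which is where event extraction would come in.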

Page 46:

Possible projects

Support for logical modelling in systems biomedicine

• Aim: extract information to construct quantitative computational models of metabolic functions or diseases
– involves literature mining and data integration, but also some mathematical skills (e.g. logical models and simulations)
– one modelling project already running in a similar area
• Multi-disciplinary supervisory team (from Life Sciences)

Page 47:

Possible projects

Clinical and health-care text mining

– Aim: support clinical decision support by extracting and aggregating textual health data
– Extraction and structuring of patient-specific information from health-care records, literature and patient generated sources
• combining text mining, ontologies and data analytics
– Multi-disciplinary supervisory teams (local hospitals: Christie, Children hospital, Hope)

Page 48:

Possible projects

Integrated data and text mining

• Aim: combine data that comes from multi-modal sources, e.g. structured and unstructured
– e.g. integration of clinical/experimental data
• Many challenging questions to be asked:
– how to combine different types of data, weights etc
– defining kernel-based similarity methods to be used in machine learning
• Requires good maths and computing skills
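
One common way to combine structured and unstructured data in a kernel method is a weighted sum of one kernel per data type. The sketch below is an assumption made for illustration, not the project's design: the record layout, the choice of an RBF kernel for numeric values and a Jaccard overlap for extracted terms, and all names and numbers are invented. The one general fact it relies on is that a sum of kernels with non-negative weights is itself a kernel.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Similarity of two numeric vectors (structured data)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def jaccard_kernel(a, b):
    """Set-overlap similarity of two term sets (from unstructured text)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_kernel(p, q, weight=0.5):
    """Weighted sum of the structured and the text-based similarity."""
    return (weight * rbf_kernel(p["values"], q["values"])
            + (1 - weight) * jaccard_kernel(p["terms"], q["terms"]))


# Invented records: numeric measurements plus terms mined from notes.
record_p = {"values": [1.0, 2.0], "terms": {"fever", "cough"}}
record_q = {"values": [1.0, 3.0], "terms": {"fever", "rash"}}
similarity = combined_kernel(record_p, record_q)
```

The weight parameter is exactly the open question the slide raises: how much each data type should count could be fixed by hand, tuned by cross-validation, or learned.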

Page 49:

Contact

• Goran Nenadic email: [email protected] IT building, IT308 http://gnode1.mib.man.ac.uk
• Small scale pilot projects around these topics will be available