Text Mining for Evidence Based Medicine Diego Mollá Centre for Language Technology, Macquarie University Staff Seminar, 1 May 2015
Page 1: Text Mining for Evidence Based Medicine

Text Mining for Evidence Based Medicine

Diego Mollá

Centre for Language Technology, Macquarie University

Staff Seminar, 1 May 2015

Page 2: Text Mining for Evidence Based Medicine

Evidence Based Medicine Our Research on Text Mining for EBM In Progress/Future Research

Contents

Evidence Based Medicine
- What is Evidence Based Medicine?
- EBM and Language Technology

Our Research on Text Mining for EBM
- Clustering
- Evidence Grading
- Single-document Summarisation

In Progress/Future Research

EBM Summarisation Diego Molla 2/59

Page 5: Text Mining for Evidence Based Medicine

Evidence Based Medicine

http://laikaspoetnik.wordpress.com/2009/04/04/evidence-based-medicine-the-facebook-of-medicine/

Page 6: Text Mining for Evidence Based Medicine

Suggested Steps in EBM

http://hlwiki.slais.ubc.ca/index.php?title=Five_steps_of_EBM

Page 7: Text Mining for Evidence Based Medicine

Where to Search for External Evidence?

1. Clinical Practice Guidelines, Recommendations:
   - Clinical Practice Guidelines Portal (http://www.clinicalguidelines.gov.au)
   - Clinical Inquiries from the Journal of Family Practice (http://jfponline.com)

2. Evidence-based Summaries (Systematic Reviews):
   - The Cochrane Library (http://www.thecochranelibrary.com/).
   - EBM Online (http://ebm.bmj.com).

3. Search the Medical Literature:
   - E.g. PubMed (http://www.ncbi.nlm.nih.gov/pubmed/).

Page 10: Text Mining for Evidence Based Medicine

Searching Cochrane

Page 11: Text Mining for Evidence Based Medicine

Searching PubMed

Page 13: Text Mining for Evidence Based Medicine

Where can LT/Text Mining Help?

- Questions:
  - Help formulate answerable questions.
  - From natural question to PICO frames.
  - Question analysis and classification.

- Search:
  - Retrieve and rank relevant literature.
  - Extract the evidence-based information.

- Summarise the results.

Page 15: Text Mining for Evidence Based Medicine

Where can LT/Text Mining Help? (II)

- Appraisal: Classify the evidence.

Page 16: Text Mining for Evidence Based Medicine

PICO for Asking the Right Question

Page 17: Text Mining for Evidence Based Medicine

The Vision

Which treatments work best for hemorrhoids?

(SOR B) Excision is the most effective treatment for thrombosed external hemorrhoids

(SOR A) Hemorrhoidectomy is the best treatment for prolapsed internal hemorrhoids

(SOR A) Rubber band ligation produces the lowest level of recurrence among nonoperative techniques

Page 19: Text Mining for Evidence Based Medicine

The Vision

Which treatments work best for hemorrhoids?

(SOR B) Excision is the most effective treatment for thrombosed external hemorrhoids

A retrospective study of 231 patients treated conservatively or surgically found that the 48.5% of patients treated surgically had a lower recurrence rate than the conservative group (number needed to treat [NNT]=2 for recurrence at mean follow-up of 7.6 months) and earlier resolution of symptoms (average 3.9 days compared with 24 days for conservative treatment).

- Reference: Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.

A retrospective analysis of 340 patients who underwent outpatient excision of thrombosed external hemorrhoids under local anesthesia reported a low recurrence rate of 6.5% at a mean follow-up of 17.3 months.

- Reference: Jongen J, Bach S, Stubinger SH, et al. Excision of thrombosed external hemorrhoids under local anesthesia: a retrospective evaluation of 340 patients. Dis Colon Rectum. 2003; 46: 1226-1231.

Page 21: Text Mining for Evidence Based Medicine

Journal of Family Practice’s “Clinical Inquiries”

Page 22: Text Mining for Evidence Based Medicine

The XML Contents I

<record id="7843">
  <url>http://www.jfponline.com/Pages.asp?AID=7843&amp;issue=September 2009&amp;UID=</url>
  <question>Which treatments work best for hemorrhoids?</question>
  <answer>
    <snip id="1">
      <sniptext>Excision is the most effective treatment for thrombosed external hemorrhoids.</sniptext>
      <sor type="B">retrospective studies</sor>
      <long id="1 1">
        <longtext>A retrospective study of 231 patients treated conservatively or surgically found that the 48.5% of patients treated surgically had a lower recurrence rate than the conservative group (number needed to treat [NNT]=2 for recurrence at mean follow-up of 7.6 months) and earlier resolution of symptoms (average 3.9 days compared with 24 days for conservative treatment).</longtext>
        <ref id="15486746" abstract="Abstracts/15486746.xml">Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.</ref>
      </long>
      <long id="1 2">
        <longtext>A retrospective analysis of 340 patients who underwent outpatient excision of thrombosed external hemorrhoids under local anesthesia reported a low recurrence rate of 6.5% at a

Page 23: Text Mining for Evidence Based Medicine

The XML Contents II

        mean follow-up of 17.3 months.</longtext>
        <ref id="12972967" abstract="Abstracts/12972967.xml">Jongen J, Bach S, Stubinger SH, et al. Excision of thrombosed external hemorrhoids under local anesthesia: a retrospective evaluation of 340 patients. Dis Colon Rectum. 2003; 46: 1226-1231.</ref>
      </long>
      <long id="1 3">
        <longtext>A prospective, randomized controlled trial (RCT) of 98 patients treated nonsurgically found improved pain relief with a combination of topical nifedipine 0.3% and lidocaine 1.5% compared with lidocaine alone. The NNT for complete pain relief at 7 days was 3.</longtext>
        <ref id="11289288" abstract="Abstracts/11289288.xml">Perrotti P, Antropoli C, Molino D, et al. Conservative treatment of acute thrombosed external hemorrhoids with topical nifedipine. Dis Colon Rectum. 2001; 44: 405-409.</ref>
      </long>
    </snip>
  </answer>
</record>
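A record in this layout can be read with any standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree. The element and attribute names are the ones in the listing on these two slides; the embedded sample is an abridged copy of the record (texts shortened), and the function name parse_record is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record from the slides; long texts are shortened.
SAMPLE = """
<record id="7843">
  <question>Which treatments work best for hemorrhoids?</question>
  <answer>
    <snip id="1">
      <sniptext>Excision is the most effective treatment for thrombosed external hemorrhoids.</sniptext>
      <sor type="B">retrospective studies</sor>
      <long id="1 1">
        <longtext>A retrospective study of 231 patients ...</longtext>
        <ref id="15486746" abstract="Abstracts/15486746.xml">Greenspon J, et al.</ref>
      </long>
      <long id="1 2">
        <longtext>A retrospective analysis of 340 patients ...</longtext>
        <ref id="12972967" abstract="Abstracts/12972967.xml">Jongen J, et al.</ref>
      </long>
    </snip>
  </answer>
</record>
"""


def parse_record(xml_text):
    """Return the question and, for each snip, its text, SOR grade and PMIDs."""
    record = ET.fromstring(xml_text)
    snips = []
    for snip in record.iter("snip"):
        snips.append({
            "text": snip.findtext("sniptext").strip(),
            "sor": snip.find("sor").get("type"),
            "pmids": [ref.get("id") for ref in snip.iter("ref")],
        })
    return {"question": record.findtext("question").strip(), "snips": snips}


parsed = parse_record(SAMPLE)
print(parsed["question"], parsed["snips"][0]["sor"], parsed["snips"][0]["pmids"])
```

The same traversal gives the inputs for the later tasks: the ref ids per snip are the documents to cluster, and the sor type is the grading target.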

Page 24: Text Mining for Evidence Based Medicine

Components of the Corpus

Question: Direct extract from the source.

Answer: Split from the source and manually checked.

Evidence: Extracted from the source.

Additional text: Manually extracted from the source and massaged.

References: PMID looked up in PubMed (automatic and manual procedure).

Page 25: Text Mining for Evidence Based Medicine

Corpus Statistics

Size

- 456 questions (“records”).
- 1,396 answer parts (“snips”).
- 3,036 answer justifications (“longs”).
- 3,705 references:
  - 2,908 unique references.
  - 2,657 XML abstracts from PubMed.

Page 26: Text Mining for Evidence Based Medicine

Answer parts per Question

Avg=3.06

Page 27: Text Mining for Evidence Based Medicine

Answer justifications per answer part

Avg=2.17

Page 28: Text Mining for Evidence Based Medicine

References per answer justification

Avg=1.22
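As a quick check, the three averages reported so far (answer parts per question, justifications per answer part, references per justification) follow directly from the counts on the Corpus Statistics slide. The variable names below are illustrative only.

```python
# Counts as reported on the "Corpus Statistics" slide.
questions, snips, longs, references = 456, 1396, 3036, 3705

snips_per_question = snips / questions
longs_per_snip = longs / snips
refs_per_long = references / longs

print(f"{snips_per_question:.2f} {longs_per_snip:.2f} {refs_per_long:.2f}")
# prints: 3.06 2.17 1.22
```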

Page 29: Text Mining for Evidence Based Medicine

References per question

Avg=6.57

Page 30: Text Mining for Evidence Based Medicine

Evidence Grade

Page 31: Text Mining for Evidence Based Medicine

References

Page 34: Text Mining for Evidence Based Medicine

Clustering for EBM Summarisation

Input

QUESTION: Which treatments work best for hemorrhoids?

DOCUMENTS: [11289288] [12972967] [1442682] [15486746] [16235372] [16252313] [17054255] [17380367]

clustering =⇒

Output

1. [11289288] [12972967] [15486746]

2. [17054255] [17380367]

3. [1442682] [16252313] [16235372]

Page 35: Text Mining for Evidence Based Medicine

Clustering Approach (Shash & Molla 2013)

- K-means (non-overlapping clustering).
- Unigram-based features.
- lowercased, stopwords removed, tf.idf of remaining words.
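The recipe above (lowercased unigrams, stopword removal, tf.idf weights, K-means) can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the authors' implementation: the stopword list is a toy one, the distance is Euclidean, the tf.idf variant is tf * log(N/df), and the initial centroids are picked by hand through the hypothetical seeds argument.

```python
import math
from collections import Counter

STOPWORDS = {"the", "of", "a", "for", "and", "in", "with"}  # toy list (assumption)


def tfidf_vectors(docs):
    """Lowercase, drop stopwords, weight each unigram by tf * log(N / df)."""
    bags = [Counter(w for w in d.lower().split() if w not in STOPWORDS) for d in docs]
    df = Counter(w for bag in bags for w in bag)
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in bag.items()} for bag in bags]


def sq_dist(x, y):
    """Squared Euclidean distance between two sparse vectors (dicts)."""
    return sum((x.get(w, 0.0) - y.get(w, 0.0)) ** 2 for w in set(x) | set(y))


def kmeans(vectors, seeds, iterations=10):
    """Plain K-means; `seeds` are indices of documents used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = {w: s / len(members) for w, s in total.items()}
    return labels


docs = [
    "excision of thrombosed external hemorrhoids",
    "surgical excision for thrombosed hemorrhoids",
    "rubber band ligation recurrence",
    "band ligation and recurrence rates",
]
labels = kmeans(tfidf_vectors(docs), seeds=[0, 2])
print(labels)
```

Each document receives exactly one label, which is what makes the clustering non-overlapping.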

Page 36: Text Mining for Evidence Based Medicine

Results

Table 1: Average entropy for optimal K clusters.

Measure       Whole XML   Abstract only   UMLS concepts only   UMLS semantic types
Euclidean     0.260       0.264           0.274                0.310
Correlation   0.348       0.362           0.349                0.347
Cosine        0.249       0.266           0.277                0.298
Dice          0.332       0.328           0.324                0.334
Jaccard       0.320       0.330           0.317                0.327
Manhattan     0.288       0.299           0.305                0.296

Entropy of pure random clustering is -log2(1/K) = 1.263.
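The slide does not spell out how the entropy of a clustering is computed. A common definition, assumed here, is the size-weighted average of the per-cluster entropies of the gold labels; lower is better, and 0 means every cluster is pure. The function names are illustrative.

```python
import math
from collections import Counter


def cluster_entropy(gold_labels):
    """Base-2 entropy of the gold-label distribution inside one cluster."""
    n = len(gold_labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(gold_labels).values())


def clustering_entropy(clusters):
    """Size-weighted average of the per-cluster entropies."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


def random_baseline(k):
    """Entropy of a uniform spread over k classes: -log2(1/k) = log2(k)."""
    return -math.log2(1 / k)


pure = clustering_entropy([["a", "a", "a"], ["b", "b"]])
mixed = clustering_entropy([["a", "b"], ["a", "b"]])
print(pure, mixed, random_baseline(2))
```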

Page 39: Text Mining for Evidence Based Medicine

Appraisal

Input: group of references (optional: question, answer part)

<snip>
  <question>Which treatments work best for hemorrhoids?</question>
  <sniptext>Excision is the most effective treatment for thrombosed external hemorrhoids.</sniptext>
  <ref id="15486746"/> <ref id="12972967"/> <ref id="11289288"/>
</snip>

Target: Strength of Recommendation (SOR)

<sor type="B">retrospective studies</sor>

The SORT Taxonomy

A Consistent and good-quality patient-oriented evidence.

B Inconsistent or limited-quality patient-oriented evidence.

C Consensus, usual practice, opinion, disease-oriented evidence, or case series for studies of diagnosis, treatment, prevention, or screening.

Page 40: Text Mining for Evidence Based Medicine

Cascaded Classification (Molla & Sarker, ALTA 2011)

Process: Cascaded SVMs

1. Default class: B.

2. SVMs with abstract n-grams to identify A and C.

3. SVMs with publication types to identify A and C.

4. SVMs with title n-grams to identify A and C.
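The slide lists the stages but not how their decisions are combined. One plausible reading, assumed here, is that the stages run in the listed order, the first stage that outputs A or C decides, and B is returned when none does. The stage functions below are hypothetical stand-ins for trained SVMs and simply read a precomputed vote.

```python
def cascade(stages, item, default="B"):
    """Run the stages in order; the first one that answers A or C decides."""
    for stage in stages:
        grade = stage(item)
        if grade in ("A", "C"):
            return grade
    return default  # step 1 of the process: fall back to the default class


# Hypothetical stand-ins for the three trained SVM stages.
def abstract_stage(item):
    return item.get("abstract_vote")


def pubtype_stage(item):
    return item.get("pubtype_vote")


def title_stage(item):
    return item.get("title_vote")


STAGES = [abstract_stage, pubtype_stage, title_stage]
grade = cascade(STAGES, {"pubtype_vote": "A"})
print(grade)
```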

Results

Method          Accuracy   CI
Majority (B)    48.63%     41.5 – 55.83
Cascaded SVMs   62.84%

http://corine13.c.o.pic.centerblog.net/h7f1xcsu.jpg

Page 43: Text Mining for Evidence Based Medicine

Single-document Query-based Summarisation

Input

- Which treatments work best for hemorrhoids?
- Abstract of Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.

Target Output

A retrospective study of 231 patients treated conservatively or surgically found that the 48.5% of patients treated surgically had a lower recurrence rate than the conservative group (number needed to treat [NNT]=2 for recurrence at mean follow-up of 7.6 months) and earlier resolution of symptoms (average 3.9 days compared with 24 days for conservative treatment).

Page 44: Text Mining for Evidence Based Medicine

Extractive Summarisation (Sarker et al. CBMS 2012)

Input

- Which treatments work best for hemorrhoids?
- Abstract of Greenspon J, Williams SB, Young HA, et al. Thrombosed external hemorrhoids: outcome after conservative or surgical management. Dis Colon Rectum. 2004; 47: 1493-1498.

Actual Output

The aim was to test the efficacy of local application of nifedipine ointment in healing acute thrombosed external hemorrhoids.

Results obtained were as follows: complete relief of pain in 43 patients (86 percent) of the nifedipine-treated group as opposed to 24 patients (50 percent) of the control group after 7 days of therapy (P < 0.01); oral analgesics were used by 4 patients (8 percent) in the nifedipine-treated group as opposed to 26 patients (54.1 percent) of the control group after 7 days of therapy (P < 0.01); and resolution of acute thrombosed external hemorrhoids was achieved after 14 days of therapy in 46 patients (92 percent) of the nifedipine-treated group, as opposed to 22 patients (45.8 percent) of the control group (P < 0.01).

Our study clearly demonstrates that the use of topical nifedipine, which at present is for treatment of cardiovascular disorders, is a reliable new option in the conservative treatment of thrombosed external hemorrhoids.

Page 45: Text Mining for Evidence Based Medicine

General Approach (Sarker et al., CBMS 2012)

In a Nutshell

1. Gather statistics from the best 3-sentence extracts.
   - Exhaustive search to find these best extracts.

2. Build three classifiers, one per sentence in the final extract.
   - Classifier 1 based on statistics from best 1st sentence.
   - Classifier 2 based on statistics from best 2nd sentence.
   - Classifier 3 based on statistics from best 3rd sentence.
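Step 1 can be sketched as a brute-force search over all 3-sentence combinations of an abstract, scored against the human-written target. The scoring function here is a simple unigram-overlap F1 that stands in for the evaluation metric actually used; it is not ROUGE. The function names and the toy sentences are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def unigram_f1(candidate, reference):
    """Unigram-overlap F1; a simple stand-in for ROUGE, not ROUGE itself."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def best_extract(sentences, reference, size=3):
    """Score every combination of `size` sentences (kept in document order)."""
    candidates = combinations(range(len(sentences)), min(size, len(sentences)))
    return max(candidates,
               key=lambda idx: unigram_f1(" ".join(sentences[i] for i in idx), reference))


sentences = [
    "background on hemorrhoids",
    "surgery lowered recurrence",
    "symptoms resolved earlier",
    "funding came from grants",
    "surgery was safe",
]
reference = "surgery lowered recurrence and symptoms resolved earlier and was safe"
best = best_extract(sentences, reference)
print(best)
```

The positions, lengths and sentence types of the sentences in these best extracts are then the raw material for the statistics gathered in the next slides.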

Page 46: Text Mining for Evidence Based Medicine

The Statistics Gathered

1. Source sentence position.

2. Sentence length.

3. Sentence similarity.

4. Sentence type.

Page 47: Text Mining for Evidence Based Medicine

1. Source Sentence Position

- Compute relative positions (0 ... 1).
- Create normalised frequency histograms $f_1, f_2, \ldots, f_{10}$.
- Score every relative position in bin $i$ with its bin frequency: $S_{pos}(i) = f_{bin(i)}$.
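A minimal sketch of the position score follows. It assumes that the relative position of a sentence is its index divided by the index of the last sentence, and that the ten bins are equal-width with 1.0 falling into the last bin; neither detail is stated on the slide.

```python
def position_histogram(best_positions, bins=10):
    """Normalised frequency of relative positions (0..1) over equal-width bins."""
    counts = [0] * bins
    for p in best_positions:
        counts[min(int(p * bins), bins - 1)] += 1  # 1.0 goes into the last bin
    total = len(best_positions)
    return [c / total for c in counts]


def relative_position(index, n_sentences):
    """Map a 0-based sentence index to [0, 1]; a lone sentence maps to 0."""
    return index / (n_sentences - 1) if n_sentences > 1 else 0.0


def position_score(index, n_sentences, histogram):
    """Score a sentence with the frequency of the bin its position falls into."""
    bins = len(histogram)
    p = relative_position(index, n_sentences)
    return histogram[min(int(p * bins), bins - 1)]


# Relative positions of sentences found in the best extracts (toy values).
hist = position_histogram([0.0, 0.05, 0.5, 1.0])
print(position_score(0, 11, hist), position_score(10, 11, hist))
```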

Page 48: Text Mining for Evidence Based Medicine

2. Sentence Length

Reward longer sentences and penalise shorter sentences:

Normalised sentence length

$$S_{len}(i) = \frac{l_s - l_{avg}}{l_d}$$

$l_s$: sentence length

$l_{avg}$: average sentence length in the corpus

$l_d$: document length
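As a worked instance of the formula, the sketch below measures all three lengths in words, which is an assumption; the slide does not give the unit.

```python
def length_score(sentence_len, avg_len, doc_len):
    """(l_s - l_avg) / l_d: positive for longer-than-average sentences."""
    return (sentence_len - avg_len) / doc_len


long_score = length_score(30, 20, 200)   # 10 words above average
short_score = length_score(10, 20, 200)  # 10 words below average
print(long_score, short_score)
```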

Page 49: Text Mining for Evidence Based Medicine

3. Sentence Similarity

Sentence Similarity

- Lowercase, stem, remove stop words.
- Build vector of tf.idf with remaining words and UMLS semantic types.
- $CosSim(X, Y) = \frac{X \cdot Y}{|X||Y|}$

Maximal Marginal Relevance (Carbonell & Goldstein, 1998)

Reward sentences similar to the query and penalise those similar to other summary sentences.

$$MMR = \lambda\,CosSim(S_i, Q) - (1 - \lambda)\max_{S_j \in S} CosSim(S_i, S_j)$$
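Both formulas translate directly into code. In this sketch the vectors are sparse dicts of weights, and the redundancy term is taken as 0 when no summary sentence has been selected yet; that default is an assumption made for the example.

```python
import math


def cos_sim(x, y):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * y.get(k, 0.0) for k, v in x.items())
    norm = (math.sqrt(sum(v * v for v in x.values()))
            * math.sqrt(sum(v * v for v in y.values())))
    return dot / norm if norm else 0.0


def mmr(candidate, query, selected, lam):
    """lam * sim(candidate, query) - (1 - lam) * max sim(candidate, selected)."""
    redundancy = max((cos_sim(candidate, s) for s in selected), default=0.0)
    return lam * cos_sim(candidate, query) - (1 - lam) * redundancy


query = {"hemorrhoids": 1.0}
sentence = {"hemorrhoids": 1.0}
fresh = mmr(sentence, query, [], lam=0.5)            # nothing selected yet
repeated = mmr(sentence, query, [sentence], lam=0.5)  # identical sentence already chosen
print(fresh, repeated)
```

A sentence that merely repeats an already selected one loses exactly what it gains from matching the query, which is the intended penalty.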

Page 50: Text Mining for Evidence Based Medicine

4. PIBOSO (Kim et al. 2011) I

1. Classify all sentences into PIBOSO types (a variant of PICO).

2. Generate normalised frequency histograms of resulting PIBOSO types.

Page 51: Text Mining for Evidence Based Medicine

4. PIBOSO (Kim et al. 2011) II

Position independent

$$S_{PIPS}(i) = \frac{P_{best}}{P_{all}}$$

Position dependent

$$S_{PDPS}(i) = \frac{P_{pos}}{P_{best}}$$

$P_{best}$: proportion of this PIBOSO type among all best summary sentences.

$P_{all}$: proportion of this PIBOSO type among all sentences.

$P_{pos}$: proportion of this PIBOSO type among all best summary sentences at this position.
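The two ratios can be computed from three lists of sentence-type labels: all sentences, the sentences in the best extracts, and the best-extract sentences at one summary position. The sketch below uses toy label lists and returns 0 when a denominator is 0, which is an assumption for the example.

```python
from collections import Counter


def proportions(labels):
    """Relative frequency of each label in a list."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


def pips(label, best_labels, all_labels):
    """Position-independent score: P_best / P_all."""
    p_all = proportions(all_labels).get(label, 0.0)
    return proportions(best_labels).get(label, 0.0) / p_all if p_all else 0.0


def pdps(label, best_labels_at_position, best_labels):
    """Position-dependent score: P_pos / P_best."""
    p_best = proportions(best_labels).get(label, 0.0)
    return proportions(best_labels_at_position).get(label, 0.0) / p_best if p_best else 0.0


all_labels = ["background"] * 6 + ["outcome"] * 4    # P_all(outcome) = 0.4
best_labels = ["outcome"] * 3 + ["background"]       # P_best(outcome) = 0.75
best_at_third = ["outcome", "outcome"]               # P_pos(outcome) = 1.0

print(pips("outcome", best_labels, all_labels),
      pdps("outcome", best_at_third, best_labels))
```

A value above 1 means the type is over-represented: among best-summary sentences relative to all sentences for the first score, and at this summary position relative to best-summary sentences overall for the second.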

Page 52: Text Mining for Evidence Based Medicine

Classification

Edmundsonian Formula

$$S_{S_i} = \alpha S_{rpos_i} + \beta S_{len_i} + \gamma S_{PIPS_i} + \delta S_{PDPS_i} + \epsilon S_{MMR_i}$$

- MMR is replaced with cosine similarity for the first sentence.
- In case of ties, the sentence with greatest length is chosen.
- Parameters are fine-tuned through exhaustive search (grid search) using the training set.

α = 1.0, β = 0.8, γ = 0.1, δ = 0.8, ε = 0.1, λ = 0.1.
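The weighted sum and the tie-break rule can be sketched as follows. The weights are the tuned values from the slide; the feature names, the candidate structure and the function names are illustrative assumptions, and the five feature scores are taken as already computed.

```python
# Tuned weights from the slide: alpha, beta, gamma, delta, epsilon.
WEIGHTS = {"rpos": 1.0, "len": 0.8, "pips": 0.1, "pdps": 0.8, "mmr": 0.1}


def sentence_score(features, weights=WEIGHTS):
    """Weighted linear combination of the five feature scores."""
    return sum(weights[name] * features[name] for name in weights)


def pick_sentence(candidates):
    """Highest combined score wins; ties go to the longest sentence."""
    return max(candidates, key=lambda c: (sentence_score(c["features"]), c["length"]))


same = {"rpos": 0.2, "len": 0.1, "pips": 1.5, "pdps": 1.0, "mmr": 0.3}
candidates = [
    {"id": "s1", "length": 12, "features": same},
    {"id": "s2", "length": 25, "features": same},  # same score, longer sentence
    {"id": "s3", "length": 40, "features": {**same, "rpos": 0.0}},
]
chosen = pick_sentence(candidates)
print(chosen["id"])
```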

Page 53: Text Mining for Evidence Based Medicine

Systems

For comparison

L3 Last three sentences.

O3 Last three PIBOSO outcome sentences.

R Random.

O All outcome sentences.

PI Sentence position independent.

Our proposal

PD Sentence position dependent.

Page 54: Text Mining for Evidence Based Medicine

Results (ROUGE-L F score) (Sarker et al. AIME 2013)

System   F-Score   95% CI         Percentile (%)
L3       0.155     0.151–0.158    55.9
O3       0.160     0.158–0.164    78.1
R        0.152     0.149–0.156    46.1
O        0.159     0.155–0.164    74.2
PI       0.160     0.157–0.164    78.1
PD       0.168     0.164–0.172    96.8

Page 59: Text Mining for Evidence Based Medicine

In Progress: A Proof-of-Concept System I

- http://144.6.224.235:8000
- Michael van Treeck (ITEC810 project 2014)
- Yan Moiseev (Summer internship 2015)

Page 60: Text Mining for Evidence Based Medicine

In Progress: A Proof-of-Concept System II

Page 61: Text Mining for Evidence Based Medicine

In Progress: A Proof-of-Concept System III

Page 62: Text Mining for Evidence Based Medicine

In Progress: Identifying Keywords of the Answer (OSP2013)

Keyword Extraction Techniques

- tf.idf.
- Filter with Part of Speech Patterns.
- Filter with C-Value, NC-Value.
- Topic modelling variants (e.g. LDA).

Page 63: Text Mining for Evidence Based Medicine

Further Research

- Use the actual output of PubMed.
- Fine-tune search techniques.
- Incorporate question types.
- Overlapping clustering.
- Label the clusters.
- Combine single summaries.
- Test with real people.

Page 64: Text Mining for Evidence Based Medicine

About Me: Diego Molla Aliod

Research interests

- Question Answering.
- Summarisation.
- Information Extraction.

Questions?

Further information about this research: http://web.science.mq.edu.au/~diego/medicalnlp/
