Social media mining for pharmacovigilance Graciela Gonzalez-Hernandez @gracielagon email: gragon@upenn.edu CPeRT - Feb 20, 2017 Funded by NLM/NIH grant number 5R01LM011176

Social media mining for pharmacovigilance

Graciela Gonzalez-Hernandez @gracielagon

email: gragon@upenn.edu

CPeRT - Feb 20, 2017

Funded by NLM/NIH grant number 5R01LM011176


2

Social media as an “online health report”?

26% of internet users actively discuss health information. Of that group …1

– 30% changed behavior as a result

– 42% discussed current medical conditions

“Extrapolating” this to Twitter...2,3

– Given 317 million monthly active users (Q3 2015): about 24.7 million would change their health behavior

– Given 350,000 tweets/minute: about 38,220 tweets/minute about their current medical conditions

1 http://www.pewinternet.org/fact-sheets/health-fact-sheet/

2 http://www.statisticbrain.com/twitter-statistics/

2 http://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/

3 www.internetlivestats.com/twitter-statistics/
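The “extrapolation” on this slide is simple proportion arithmetic. A minimal Python sketch, assuming (as the slide does) that the Pew percentages carry over unchanged to Twitter users and to individual tweets; integer arithmetic avoids floating-point rounding. The function name `extrapolate` is hypothetical.

```python
def extrapolate(base: int, *percentages: int) -> int:
    """Apply successive whole-number percentages to a base count (floor division)."""
    result = base
    for pct in percentages:
        result = result * pct // 100
    return result

# 26% discuss health online; of those, 30% changed behavior / 42% discussed current conditions.
users_changing_behavior = extrapolate(317_000_000, 26, 30)
tweets_per_minute = extrapolate(350_000, 26, 42)
print(f"{users_changing_behavior:,} users; {tweets_per_minute:,} tweets/minute")
```

This prints `24,726,000 users; 38,220 tweets/minute`. Both figures are upper-bound style estimates, since they assume Twitter users behave like the surveyed internet population.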


3

Social media in public health monitoring

Growing interest: PubMed publications including “social media” or “social network” grew from just over 100 to 2,000 over the last 10 years:

• Identifying smoking cessation patterns (Struik and Baskerville 2014),

• Identifying user social circles with common experiences (like prescription drug abuse) (Hanson et al 2014),

• Monitoring malpractice (Nakhasi et al 2012),

• Tracking infectious/viral disease spread (Broniatowski et al 2013; Paul and Dredze 2011)

In September 2015, our JBI paper “Utilizing Social Media Data for Pharmacovigilance: A Review” was nominated as one of the 10 articles with the greatest potential social impact from the more than 2,500 journals published by Elsevier (Elsevier Atlas).

Public health monitoring challenge: how do we get observations over time for specific groups that share interesting characteristics?


4

Social Media for health monitoring?

What do systematic reviews tell us?

• Under-reporting is a problem in current surveillance systems. A review of 37 studies from 12 countries found a median under-reporting rate of 94% (82–98%); for serious/severe ADRs, 85% (Hazell & Shakir, Drug Saf 2006, PMID 16689555).

• Abundant reports in social media (SM). A review of 29 studies comparing SM with other sources found a higher frequency of adverse events in social media, particularly for ‘symptom’-related and ‘mild’ adverse events (Golder et al., Br J Clin Pharmacol 2015, PMID 26271492).

• Patient reporting brings a different perspective and more information. A review of 34 studies found that patient reporting contributes novel information, more detail, and information on the severity and impact of ADRs on daily life (Inacio et al., Br J Clin Pharmacol 2017, PMID 27558545).

Targeted, diverse cohorts may be more easily accessible through social media.

People reveal information in social media that may not be available from FAERS or health records

• e.g., information about medication abuse, co-ingestion, sentiment regarding medications, impact on daily life…

Recruitment of cohorts via social media is already being considered:

• Shere et al “The Role of Social Media in Clinical Trials” (PMC3966825)

• Admon et al “Recruiting Pregnant Patients for Survey Research: A Head to Head Comparison of Social Media-Based Versus Clinic-Based Approaches” (PMC5215244)


5

Difficulties with social media data

Incompleteness:

• Not all health conditions may be revealed through social media posts

• While social media data may provide access to a larger population, complete data about individual cases may be difficult to obtain: e.g., a pregnant woman can be identified and detected to be taking drug X, but dosage, frequency, and similar information may be missing

• Participants from the cohort may drop out at higher rates

Accessibility:

• Data from social media is dependent on the available APIs

• Data collection methods may have to be changed frequently over time

Authenticity:

• Bots – a large portion of social media content is now generated by bots, making it harder to mine reliable data


6

“Typical” Social media mining pipeline

Data collection

Annotation

Resource adaptation

Classification

Information extraction

Normalization

Case studies / validation


7

HA! Not if you're on #Seroquil. EXTREMELY vivid dreams that stay in conscious memory. Very #Freaky! Any idea why?

I'm def suing cymbalta. I can't wait until its out of my system. Get out!!!!!!! Nowwww!!!!! You turn peaceful people into the hulk!. (C0034634 – Rage)

Apparently, Baclofen greatly exacerbates the "AD" part of my ADHD. Average length of focus today: about 30 seconds. (C0235198 – cerebration impaired)

The 100mg tabs of trazodone my gp prescribed are too much, now that I don't take them every night. Still zombieish after an hour awake

Gone from 50mg to 150mg of Serequel last night. Could barely wake up this morning and I feel like my body is made of lead

A taste of Twitter ADR lingo


8

Data collection and annotation

Phonetic spelling variants for capturing misspelled medication names1 (http://diego.asu.edu/Publications/ADRSpell/ADRSpell.html)

• Seroquel -> siroquil, seroquil etc.

Binary and full ADR annotations2,3

Multiple trained annotators + pharmacology expert to resolve annotation disagreements

1 Pimpalkhute et al. Phonetic Spelling Filter for Keyword Selection. AMIA Jt Summits Transl Sci Proc. 2014.

2 O’Connor et al. Pharmacovigilance on Twitter. AMIA Annu Symp Proc. 2014.

3 Ginn et al. Mining Twitter for adverse drug reaction mentions. BioTxtM. 2014.
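The phonetic spelling filter itself is described in Pimpalkhute et al.; the sketch below is not that method but a simplified stand-in under our own assumptions. Candidate tokens (e.g., from a hypothetical tweet vocabulary) are kept as variants of a drug name if they share its American Soundex code and lie within a small edit distance. The vocabulary, the function names, and the `max_edits=2` threshold are all illustrative.

```python
from typing import Iterable, List

# American Soundex digit classes; vowels, y, h and w have no code.
_SOUNDEX_CLASSES = {
    **dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6",
}

def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    prev = _SOUNDEX_CLASSES.get(letters[0], "")
    digits = []
    for ch in letters[1:]:
        code = _SOUNDEX_CLASSES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert / delete / substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def spelling_variants(drug: str, vocabulary: Iterable[str], max_edits: int = 2) -> List[str]:
    """Tokens that sound like `drug` (same Soundex code) and are spelled similarly."""
    drug = drug.lower()
    target = soundex(drug)
    return sorted(tok for tok in {t.lower() for t in vocabulary}
                  if tok != drug and soundex(tok) == target
                  and levenshtein(tok, drug) <= max_edits)

vocab = ["seroquil", "siroquil", "Seroquel", "seroquell", "serotonin", "zeroquel"]
print(spelling_variants("seroquel", vocab))  # ['seroquell', 'seroquil', 'siroquil']
```

Soundex keys on the first letter, so a variant such as "zeroquel" is missed; a real filter would need a more forgiving phonetic model.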


9

Annotation example

Works to calm mania or depression but zonks me and scares me about diabetes issues reported.

• Indication: mania (C0338831)

• Indication: depression (C001157)

• ADR: drowsiness (C0013144)

• Other: diabetes

stops me from crying most of the time, blocks most of my feelings

• Indication: crying (C0010399)

• Adverse reaction: emotional indifference (C0001726)
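One way to store such annotations is as character-offset spans, each carrying a label, a normalized concept, and a UMLS CUI. The sketch below assumes this representation (the `Annotation` class and `annotate` helper are hypothetical, not the project's actual format) and assumes the ADR span covers the whole phrase "blocks most of my feelings".

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Annotation:
    start: int              # character offset where the mention begins
    end: int                # character offset just past the mention
    label: str              # e.g. "Indication" or "ADR"
    concept: str            # normalized concept name
    cui: Optional[str]      # UMLS concept identifier, if assigned

def annotate(text: str, mention: str, label: str, concept: str,
             cui: Optional[str] = None) -> Annotation:
    """Wrap the first occurrence of `mention` in `text` as an Annotation."""
    start = text.index(mention)  # raises ValueError if the mention is absent
    return Annotation(start, start + len(mention), label, concept, cui)

post = "stops me from crying most of the time, blocks most of my feelings"
annotations: List[Annotation] = [
    annotate(post, "crying", "Indication", "crying", "C0010399"),
    # Assumption: the annotated ADR span is the whole phrase below.
    annotate(post, "blocks most of my feelings", "ADR", "emotional indifference", "C0001726"),
]
for a in annotations:
    print(a.label, repr(post[a.start:a.end]), "->", a.concept, a.cui)
```

Keeping offsets rather than only strings lets the same records feed both span-extraction training and concept normalization.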


10

Text classification

Generate a large set of features representing semantic properties (e.g., sentiment, polarity, and topic) from short text nuggets1

• Combine training data from different corpora in an attempt to boost classification accuracy

• Effort in resource creation pays off

Other text classification tasks:

• Drug abuse classification2

• Drug safety classification3

1 Sarker and Gonzalez. Portable automatic text classification. J Biomed Inform. 2015.

2 Sarker et al. Social media mining for toxicovigilance. Drug Saf. 2016.

3 Patki et al. Mining adverse drug .. going beyond extraction. BioLinkSig. 2014.
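To make “a large set of features” concrete, here is a minimal sketch of a sparse feature extractor for a short post: unigrams, bigrams, a lexicon-based polarity score, and word-cluster counts. The lexicons and cluster map are tiny hypothetical placeholders, and synset features are omitted; the feature set in the cited work is considerably richer.

```python
import re
from collections import Counter

# Tiny hypothetical lexicons and cluster map, for illustration only.
POSITIVE = {"better", "calm", "works", "helps"}
NEGATIVE = {"worse", "scares", "freaky", "zombieish"}
CLUSTER_OF = {"headache": "c2", "rash": "c2", "anxiety": "c3", "mania": "c3", "mom": "c6"}

def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9']+", text.lower())

def extract_features(text: str) -> Counter:
    """Sparse feature vector: unigrams, bigrams, a polarity score, and cluster counts."""
    tokens = tokenize(text)
    feats = Counter(f"uni={t}" for t in tokens)
    feats.update(f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    feats["polarity"] = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    feats.update(f"cluster={CLUSTER_OF[t]}" for t in tokens if t in CLUSTER_OF)
    return feats

print(extract_features("Works to calm mania but scares me"))
```

The resulting sparse vectors can be fed to any standard linear classifier; cluster features let posts with different but related words share evidence.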


11

ADR extraction

To automatically extract exact mentions of ADRs and other information

Traditional, lexicon-based approaches perform poorly on social media text
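Why lexicon lookup struggles on this text can be shown with a naive whole-word matcher run on the tweets from the “Twitter ADR lingo” slide. The mini-lexicon below is hypothetical; a real one would come from a curated ADR vocabulary, but it would face the same problem with colloquial phrasing.

```python
import re
from typing import List

# Hypothetical mini-lexicon of formal ADR terms.
ADR_LEXICON = {"drowsiness", "somnolence", "fatigue", "nightmares", "rage", "vivid dreams"}

def lexicon_match(text: str, lexicon=ADR_LEXICON) -> List[str]:
    """Return lexicon terms that occur in `text` as whole words (case-insensitive)."""
    lowered = text.lower()
    return sorted(term for term in lexicon
                  if re.search(r"\b" + re.escape(term) + r"\b", lowered))

tweets = [
    "HA! Not if you're on #Seroquil. EXTREMELY vivid dreams that stay in conscious memory.",
    "Still zombieish after an hour awake",
    "Could barely wake up this morning and I feel like my body is made of lead",
]
for t in tweets:
    print(lexicon_match(t), "<-", t)
```

Only the first tweet yields a match; "zombieish" and "body is made of lead" describe drowsiness but share no surface form with any lexicon entry.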


12

ADRMine: deep learning

Our approach, using conditional random fields (CRFs), outperforms lexicon-based approaches1

The shared task at PSB 2016 showed it outperforming all other participating systems.

Ambiguous ADRs in particular are captured by the “cluster” feature

1 Nikfarjam et al. Pharmacovigilance from social media.. sequence labeling with word embedding cluster features. JAMIA. 2015.

Publication resources: http://diego.asu.edu/Publications/ADRMine.html
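ADRMine itself is described in Nikfarjam et al. (2015). This sketch only shows, under our own assumptions, the input side of a CRF sequence labeler: ADR spans encoded as BIO tags, and a per-token feature dictionary that includes a word-cluster ID (the list-of-dicts format accepted by CRF wrappers such as sklearn-crfsuite). The cluster map, feature names, and example span are hypothetical; training the CRF is out of scope.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical word -> cluster map (compare the unsupervised clusters in this talk).
CLUSTER_OF = {"seroquel": "c1", "adderall": "c1", "headache": "c2",
              "rash": "c2", "anxiety": "c3", "100mg": "c4"}

Feature = Dict[str, Union[str, bool]]

def bio_tags(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Encode ADR spans (token start, exclusive end) as B-ADR / I-ADR / O tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-ADR"
        for i in range(start + 1, end):
            tags[i] = "I-ADR"
    return tags

def token_features(tokens: List[str], i: int) -> Feature:
    """Features for token i: the word, shape cues, neighbors, and its cluster ID."""
    word = tokens[i].lower()
    return {
        "word": word,
        "suffix3": word[-3:],
        "is_digit": word.isdigit(),
        "cluster": CLUSTER_OF.get(word, "none"),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

tokens = "Seroquel gave me a terrible headache".split()
X = [token_features(tokens, i) for i in range(len(tokens))]
y = bio_tags(tokens, [(4, 6)])  # assumption: "terrible headache" is the ADR span
print(list(zip(tokens, y)))
```

The cluster feature is what lets an unseen but related word (say, another symptom in cluster c2) receive the same evidence as training words from that cluster.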


13

Unsupervised learned clusters

Cluster | Topic | Examples of clustered words

c1 | Drug | abilify, adderall, ambien, ativan, aspirin, citalopram, effexor, paxil, …

c2 | Signs/Symptoms | hangover, headache, rash, hive, …

c3 | Signs/Symptoms | anxiety, depression, disorder, ocd, mania, stabilizer, …

c4 | Drug dosage | 1000mg, 100mg, .10, 10mg, 600mg, 0.25, .05, ...

c5 | Treatment | anti-depressant, antidepressant, drug, med, medication, medicine, treat, …

c6 | Family member | brother, dad, daughter, father, husband, mom, mother, son, wife, …

c7 | Date | 1992, 2011, 2012, 23rd, 8th, april, aug, august, december, …
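The slide does not name the clustering algorithm; k-means over word embeddings is one common choice, and we assume it in this sketch. The 2-D vectors are made up for illustration (real embeddings are learned from large unlabeled corpora and have hundreds of dimensions), and the seed words are chosen by hand so the result is deterministic.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

def kmeans(points: Dict[str, Vector], seeds: Sequence[str], iterations: int = 20) -> List[List[str]]:
    """Lloyd's k-means; initial centroids are the vectors of the given seed words."""
    centroids = [points[s] for s in seeds]
    clusters: List[List[str]] = []
    for _ in range(iterations):
        # Assignment step: each word joins its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for word, vec in points.items():
            nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
            clusters[nearest].append(word)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*(points[w] for w in members)))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return clusters

# Hypothetical toy "embeddings".
vectors = {"abilify": (0.9, 0.1), "paxil": (0.85, 0.15), "headache": (0.1, 0.9),
           "rash": (0.15, 0.85), "mom": (-0.8, 0.2), "husband": (-0.85, 0.25)}
print(kmeans(vectors, seeds=["abilify", "headache", "mom"]))
```

The cluster index a word lands in then becomes its "cluster" feature for the sequence labeler.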


14

Concept normalization

A set of rule-based techniques followed by semantic-similarity-based techniques1

Best F-score: 0.603
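The two-stage idea (rules first, similarity as a fallback) can be sketched as follows. Assumptions: the concept names and CUIs are the ones shown earlier in this talk, the synonym rules are hypothetical, and surface string similarity (`difflib.SequenceMatcher`) stands in for the semantic similarity used in the actual work, so this sketch only catches misspellings, not paraphrases.

```python
from difflib import SequenceMatcher
from typing import Optional

# Concept names and CUIs taken from the examples in this talk.
CONCEPTS = {
    "drowsiness": "C0013144",
    "emotional indifference": "C0001726",
    "rage": "C0034634",
    "cerebration impaired": "C0235198",
}
# Hypothetical rule table mapping colloquial phrases to concept names.
SYNONYM_RULES = {"zonks me": "drowsiness", "zombieish": "drowsiness", "hulk": "rage"}

def normalize(mention: str, threshold: float = 0.8) -> Optional[str]:
    """Map a mention to a CUI, or return None if nothing is close enough."""
    text = mention.lower().strip()
    if text in CONCEPTS:              # rule 1: exact concept name
        return CONCEPTS[text]
    if text in SYNONYM_RULES:         # rule 2: hand-written synonym rules
        return CONCEPTS[SYNONYM_RULES[text]]
    # Fallback: most similar concept name by string similarity.
    best_name, best_score = None, 0.0
    for name in CONCEPTS:
        score = SequenceMatcher(None, text, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return CONCEPTS[best_name] if best_score >= threshold else None

for m in ["Drowsiness", "zombieish", "drowsyness", "my body is made of lead"]:
    print(m, "->", normalize(m))
```

The last mention stays unnormalized, which illustrates why a genuinely semantic fallback is needed for figurative ADR descriptions.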


15

Frequency comparison: signal analysis

carbamazepine (Tegretol)

• Primary indications: epilepsy, trigeminal neuralgia

• Documented adverse effects: dizziness, somnolence or fatigue, unsteadiness, nausea, vomiting

• Adverse effects found in user comments (frequency): somnolence or fatigue (12.3%), allergy (5.2%), weight gain (4.1%), rash (3.5%), depression (3.2%), dizziness (2.4%), tremor/spasm (1.7%), headache (1.7%), appetite increased (1.5%), nausea (1.5%)

olanzapine (Zyprexa)

• Primary indications: schizophrenia, bipolar disorder

• Documented adverse effects (frequency): weight gain (65%), alteration in lipids (40%), somnolence or fatigue (26%), increased cholesterol (22%), diabetes (2%)

• Adverse effects found in user comments (frequency): weight gain (30.0%), somnolence or fatigue (15.9%), appetite increased (4.9%), depression (3.1%), tremor (2.7%), diabetes (2.6%), mania (2.3%), anxiety (1.4%), hallucination (0.7%), edema (0.6%)

trazodone (Oleptro)

• Primary indications: depression

• Documented adverse effects (frequency): somnolence or fatigue (46%), headache (33%), dry mouth (25%), dizziness (25%), nausea (21%)

• Adverse effects found in user comments (frequency): somnolence or fatigue (48.2%), nightmares (4.6%), insomnia (2.7%), addiction (1.7%), headache (1.6%), depression (1.3%), hangover (1.2%), anxiety attack (1.2%), panic reaction (1.1%), dizziness (0.9%)
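A sketch of the comparison, assuming that "frequency" means the share of a drug's comments that mention an ADR at least once. The first example uses hypothetical annotated comments; the second reuses the olanzapine figures from the comparison above to list user-reported effects absent from the documented column. Because that column lists only the most common documented effects, the difference is a crude screen for candidate signals, not confirmed signals.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

def adr_frequencies(comment_adrs: List[Set[str]], top: int = 10) -> List[Tuple[str, float]]:
    """Percentage of comments that mention each ADR (counted once per comment)."""
    n = len(comment_adrs)
    counts = Counter(adr for adrs in comment_adrs for adr in set(adrs))
    return [(adr, round(100 * c / n, 1)) for adr, c in counts.most_common(top)]

def not_in_documented(observed: Iterable[Tuple[str, float]], documented: Set[str]) -> List[str]:
    """User-reported ADRs that do not appear in the documented list (candidate signals)."""
    return [adr for adr, _ in observed if adr not in documented]

# Hypothetical annotated comments for one drug (one set of ADRs per comment).
comments = [{"weight gain"}, {"weight gain", "somnolence or fatigue"},
            {"somnolence or fatigue"}, set(), {"weight gain"}]
print(adr_frequencies(comments))  # [('weight gain', 60.0), ('somnolence or fatigue', 40.0)]

# Olanzapine figures copied from the comparison above.
documented = {"weight gain", "alteration in lipids", "somnolence or fatigue",
              "increased cholesterol", "diabetes"}
observed = [("weight gain", 30.0), ("somnolence or fatigue", 15.9), ("appetite increased", 4.9),
            ("depression", 3.1), ("tremor", 2.7), ("diabetes", 2.6), ("mania", 2.3),
            ("anxiety", 1.4), ("hallucination", 0.7), ("edema", 0.6)]
print(not_in_documented(observed, documented))
```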


16

Exploring Health Timelines: longitudinal data

We want to be able to relate a condition or event of interest to prior or subsequent events or conditions.

Thus, if in the timeline of a pregnant woman we find:

• “6 years to this day that I was diagnosed with depression, bi polar disorder and anxiety disorder. But I am still standing. God is good”

• “5 yrs today since I was diagnosed with type 1 diabetes”

• “Stop vaping !! I have Asthma !!”

• “I took a Zyrtec this morning and I guess youre not suppose to consume more than 1 in 24hrs the struggle”

we would want to include this information in the user's health timeline for further analysis.
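As an illustration of placing such posts on a timeline, this sketch handles only one pattern: anniversary-style disclosures ("N years to this day / today since I was diagnosed with X"), which let us back-date the diagnosis from the post date. The regular expression, helper names, and posting dates are hypothetical; real timelines need many more patterns and entity taggers.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Hypothetical pattern for anniversary-style disclosures of a past diagnosis.
ANNIVERSARY = re.compile(
    r"(\d+)\s+(?:years?|yrs?)\s+(?:to this day|today)\s+(?:that|since)\s+"
    r"i\s+was\s+diagnosed\s+with\s+([^.!?]+)",
    re.IGNORECASE,
)

def years_before(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 mapped onto a non-leap year
        return d.replace(year=d.year - years, day=28)

def diagnosis_event(text: str, posted: date) -> Optional[Tuple[date, str]]:
    """Return (estimated diagnosis date, condition text), or None if no match."""
    m = ANNIVERSARY.search(text)
    if not m:
        return None
    return years_before(posted, int(m.group(1))), m.group(2).strip()

posts = [  # posting dates are made up
    ("6 years to this day that I was diagnosed with depression, bi polar disorder "
     "and anxiety disorder. But I am still standing. God is good", date(2015, 6, 1)),
    ("5 yrs today since I was diagnosed with type 1 diabetes", date(2015, 3, 10)),
    ("Stop vaping !! I have Asthma !!", date(2015, 7, 4)),
]
timeline = sorted(e for e in (diagnosis_event(t, d) for t, d in posts) if e is not None)
for when, condition in timeline:
    print(when.isoformat(), condition)
```

The asthma post produces no event here even though it carries health information, which is the kind of gap a timeline extractor has to cover with other patterns.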


17

Focus of this work

Address the gap in longitudinal social media based public health surveillance

Develop natural language processing (NLP), machine learning, and information retrieval (IR) methods to help accurately identify a cohort of pregnant women and collect their social media timelines

Perform preliminary analyses of the extracted health timelines to identify limitations, and establish future research goals.


18


19

Data collection & classification

Collect tweets mentioning pregnancy announcements

• Based on search queries

• Time period: Jan 2014 to Sept 2015

• Example query: “i am * weeks/months pregnant”

• Query count: 18

Not all tweets from the search queries were legitimate pregnancy announcements

• Example: “…I look like Im 3 months pregnant”

• Classification features: n-grams and synsets, sentiment, word clusters

Collect the timelines of users with positively classified announcements using the Twitter API

DailyStrength Data

• Individual forums for different cohorts.

• Collected data from 5 forums (Pregnancy, Pregnancy After Loss Or Infertility, Pregnancy Teens, Stillbirth, and Miscarriage)
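To see why the query-based collection step needs a classifier, here is a sketch that compiles the slide's query notation into a regular expression. The semantics are our assumption: "*" matches exactly one token and "weeks/months" means either word. The broader query `"* months pregnant"` is hypothetical, included to show how a looser query also retrieves the non-announcement "I look like Im 3 months pregnant".

```python
import re

def query_to_regex(query: str) -> "re.Pattern":
    """Compile a search query into a regex.
    Assumed conventions: '*' matches exactly one token, 'a/b' means a or b."""
    parts = []
    for token in query.split():
        if token == "*":
            parts.append(r"\S+")
        else:
            alternatives = "|".join(re.escape(t) for t in token.split("/"))
            parts.append(f"(?:{alternatives})")
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

strict = query_to_regex("i am * weeks/months pregnant")
loose = query_to_regex("* months pregnant")  # hypothetical broader query

for tweet in ["Yay! I am 12 weeks pregnant today", "I look like Im 3 months pregnant"]:
    print(bool(strict.search(tweet)), bool(loose.search(tweet)), tweet)
```

Query matching alone cannot separate an announcement from a comparison ("look like"), which is what the n-gram, sentiment, and cluster features of the classifier are for.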


20

Information extraction: concept tagging

• Tweets by pregnancy period

• Extract relevant tweets from pregnancy period, if possible.

• Tag trimester: combination of term and pattern matching.

• “I'm officially 20 weeks pregnant….” : 2nd trimester.

• The proposed algorithm covers most of the cases.

• It fails to cover ambiguous and relative time cases:

• “next week is gone b my last week pregnant who want to make a bet lol”

Tag medications mentioned

• Dictionary of 7396 drugs total

• FDA drug classification of medication safety: 1916 drugs collected from 3 sources

• Expanded using RxNorm database

Tag health conditions (diseases, side effects…): not reported here
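A minimal sketch of trimester tagging that combines term matching ("third trimester") with pattern matching ("20 weeks pregnant"). The regular expressions and the week boundaries (weeks 1–13, 14–27, 28–42; months 1–3, 4–6, 7–9) are our assumptions, not the deck's actual rules. Relative expressions such as "my last week pregnant" deliberately fall through to `None`, mirroring the failure case on the slide.

```python
import re
from typing import Optional

TERM = re.compile(r"\b(first|second|third|1st|2nd|3rd)\s+trimester\b", re.IGNORECASE)
PATTERN = re.compile(r"\b(\d{1,2})\s*(weeks?|wks?|months?)\s+pregnant\b", re.IGNORECASE)
ORDINAL = {"first": 1, "1st": 1, "second": 2, "2nd": 2, "third": 3, "3rd": 3}

def trimester(text: str) -> Optional[int]:
    """Return 1, 2 or 3 if the text states a pregnancy stage explicitly, else None."""
    term = TERM.search(text)
    if term:                          # term matching
        return ORDINAL[term.group(1).lower()]
    m = PATTERN.search(text)          # pattern matching
    if not m:
        return None                   # ambiguous / relative time expressions
    n, unit = int(m.group(1)), m.group(2).lower()
    if n < 1:
        return None
    if unit.startswith("m"):          # months 1-3, 4-6, 7-9
        return (n - 1) // 3 + 1 if n <= 9 else None
    if n <= 13:
        return 1
    if n <= 27:
        return 2
    return 3 if n <= 42 else None

print(trimester("I'm officially 20 weeks pregnant…."))  # 2
```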


21

Evaluation

• Annotation

• 1200 tweets annotated by two human annotators

• Inter-annotator agreement (kappa) was 0.79.

• 10-fold cross-validation was used to estimate the classifier's accuracy.

• 15,523 of 35,355 users were found to have posted legitimate pregnancy announcements

• Extracting these users' timelines yielded over 30 million tweets, all of which were indexed with Lucene.

Classification results | Precision | Recall | F-measure

isPreg | 0.83 | 0.79 | 0.81

notPreg | 0.84 | 0.77 | 0.80
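Both evaluation numbers on this slide come from short formulas. The sketch assumes Cohen's kappa (the usual two-annotator form; the slide says only "kappa") and uses made-up label sequences for it. The F-measure check reproduces the table from its precision and recall columns.

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1:  # both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels from two annotators on six tweets.
ann1 = ["preg", "preg", "not", "not", "preg", "not"]
ann2 = ["preg", "not", "not", "not", "preg", "not"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
print(round(f_measure(0.83, 0.79), 2), round(f_measure(0.84, 0.77), 2))  # 0.81 0.8
```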


22

Use case: medication mentions

Distribution of top 10 drug mentions by trimester in Twitter:

Note on ibuprofen: it is generally not recommended during pregnancy, especially during the third trimester. ... Ibuprofen may cause premature closure of the fetal ductus arteriosus and prolongation of bleeding time (https://www.drugs.com/pregnancy/ibuprofen.html)

Note on codeine: Codeine use anytime during pregnancy was associated with planned Cesarean delivery. Third-trimester use was associated with acute Cesarean and postpartum hemorrhage (PMC3214255)
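The "top 10 drug mentions by trimester" distribution is a grouped count once each tweet carries a trimester tag and a list of tagged drugs. A sketch, assuming each drug is counted at most once per tweet and that tweets without a trimester tag are skipped; the input records are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

def top_drugs_by_trimester(
    tagged: Iterable[Tuple[Optional[int], List[str]]], k: int = 10
) -> Dict[int, List[Tuple[str, int]]]:
    """Count tweets mentioning each drug, separately per trimester."""
    per_trimester: Dict[int, Counter] = defaultdict(Counter)
    for trimester, drugs in tagged:
        if trimester is None:  # untagged tweets cannot be placed in a trimester
            continue
        per_trimester[trimester].update({d.lower() for d in drugs})
    return {t: counts.most_common(k) for t, counts in sorted(per_trimester.items())}

# Hypothetical (trimester, drugs) records produced by the tagging steps.
tagged = [(1, ["Ibuprofen"]), (3, ["ibuprofen", "codeine"]), (3, ["ibuprofen"]),
          (2, ["Zyrtec"]), (None, ["ibuprofen"])]
print(top_drugs_by_trimester(tagged))
```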


23

Prescription drug abuse monitoring

Users post information about medication abuse on social media

- about to be cracked on adderall to survive today

- i’m just gonna shower and overdose on Seroquel so I’ll sleep until morning.

- popped Adderall tonight hahahah let’s finish this 100 page paper

- an oxycodone high from snorting lasts for one hour, if it is swallowed, your looking at three hour high.


24

Adderall® vs. oxycodone abuse patterns

Supervised classification to investigate patterns of abuse-related tweets1

1 Sarker et al. Social media mining for toxicovigilance. Drug Saf. 2016.
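For readers unfamiliar with supervised text classification, here is a toy multinomial naive Bayes classifier with add-one smoothing. This is a swapped-in technique for illustration only; the classifiers actually used are described in Sarker et al. (2016). The three "abuse" training examples are tweets from the previous slide, while the three "other" examples are hypothetical; six examples are far too few for real use.

```python
import math
import re
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text: str, c: str) -> float:
        score = self.log_prior[c]
        for tok in tokenize(text):
            if tok in self.vocab:  # ignore words never seen in training
                score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(text, c))

texts = [
    "about to be cracked on adderall to survive today",
    "popped Adderall tonight hahahah let's finish this 100 page paper",
    "an oxycodone high from snorting lasts for one hour, if it is swallowed, "
    "your looking at three hour high.",
    "my doctor prescribed adderall for my adhd",            # hypothetical
    "picked up my oxycodone prescription after surgery",    # hypothetical
    "taking my adderall as prescribed this morning",        # hypothetical
]
labels = ["abuse", "abuse", "abuse", "other", "other", "other"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("snorting oxycodone tonight for the high"))
```

Because "adderall" and "oxycodone" appear in both classes, the decision rests on context words such as "snorting" or "prescribed", which is the kind of pattern that separates abuse-related from ordinary medication mentions.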