Original Paper

Identifying Acute Low Back Pain Episodes in Primary Care Practice from Clinical Notes

Riccardo Miotto 1,2,3, PhD; Bethany L. Percha 2,3, PhD; Benjamin S. Glicksberg 1,2,3, PhD; Hao-Chih Lee 2,3, PhD; Lisanne Cruz 4, MD, MSc, FAAPMR; Joel T. Dudley 1,2,3, PhD; and Ismail Nabeel 5, MD, MPH

(1) Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, USA
(2) Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, New York, USA
(3) Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
(4) Department of Physical Medicine and Rehabilitation, Icahn School of Medicine at Mount Sinai, New York, USA
(5) Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, USA

Abstract

Background: Acute and chronic low back pain (LBP) are different conditions with different treatments. However, they are coded in electronic health records with the same ICD-10 code (M54.5) and can be differentiated only by retrospective chart reviews. This prevents efficient definition of data-driven guidelines for billing and therapy recommendations, such as return-to-work options.

Objective: To solve this issue, we evaluate the feasibility of automatically distinguishing acute LBP episodes by analyzing free text clinical notes.

Methods: We used a dataset of 17,409 clinical notes from different primary care practices; of these, 891 documents were manually annotated as “acute LBP” and 2,973 were generally associated with LBP via the recorded ICD-10 code. We compared different supervised and unsupervised strategies for automated identification: keyword search; topic modeling; logistic regression with bag-of-n-grams and manual features; and deep learning (ConvNet). We trained the supervised models using either manual annotations or ICD-10 codes as positive labels.

Results: ConvNet trained using manual annotations obtained the best results with an AUC-ROC of 0.97 and F-score of 0.69. ConvNet’s results were also robust to reduction of the number of manually annotated documents. In the absence of manual annotations, topic models performed better than methods trained using ICD-10 codes, which were unsatisfactory for identifying LBP acuity.

Conclusions: This study uses clinical notes to delineate a potential path toward systematic learning of therapeutic strategies, billing guidelines, and management options for acute LBP at the point of care.

medRxiv preprint doi: https://doi.org/10.1101/19010462; this version posted November 11, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Keywords: electronic health records; clinical notes; low back pain; natural language processing; machine learning.

Introduction

Low back pain (LBP) is one of the most common causes of disability in US adults under the age of 45 [1], with 10-20% of American workers reporting persistent back pain [2]. LBP impacts one’s ability to work and affects the quality of life. For example, in 2015 Luckhaupt et al. showed that, from a pool of 19,441 people, 16.9% of workers with any LBP and 19.0% of those with frequent and severe LBP missed at least one full day of work over a period of three months [3]. LBP events also lead to significant financial burden for both individuals and clinical facilities, with combined direct and indirect costs of treatment for musculoskeletal injuries and associated pain estimated to be approximately $213 billion annually [4].

LBP events fall into two major categories: acute and chronic [5]. Acute LBP occurs suddenly, usually associated with trauma or injury with subsequent pain, whereas chronic LBP is often reported by patients in regular checkups and has led to a significant increase in the use of healthcare services over the past two decades. It is very important to differentiate between acute and chronic LBP in the clinical setting as these conditions, as well as their management and billing, are substantively different. Chronic back pain is generally treated with spinal injections [6,7], surgery [8,9], and/or pain medications [10,11], while anti-inflammatories and a rapid return to normal activities of daily living are generally the best recommendations for acute LBP [12].

However, acute and chronic LBP are usually not explicitly separated in electronic health records (EHRs) due to a lack of distinguishing codes. The ICD-10-CM (International Classification of Diseases, Tenth Revision, Clinical Modification) standard only includes the code M54.5 to characterize “Low back pain” diagnosis, and does not provide modifiers to distinguish different LBP acuities [13]. Acuity is usually reported in clinical notes, requiring retrospective chart review of the free text to characterize LBP events, which is time-consuming and not scalable [14]. Moreover, acuity can be expressed in different ways. For example, the text could mention “acute low back pain” or “acute lbp”, but could also simply report “shooting pain down into the lower extremities”, “limited spine range of motion”, “vertebral tenderness”, “diffuse pain in lumbar muscles”, and so on [15]. This variability makes it difficult for clinical facilities and researchers to group LBP episodes by acuity to perform key tasks, such as defining appropriate diagnostic and billing codes; evaluating the effectiveness of prescribed treatments; and deriving therapeutic guidelines and improved diagnostic methods that could reduce time, disability and cost.

This paper is the first to explore the use of automated approaches based on machine learning and information retrieval to analyze free-text clinical notes and identify the acuity of LBP episodes. Specifically, we use a set of manually annotated notes to train and evaluate various
machine learning architectures based on logistic regression, n-grams, topic models, word embeddings and convolutional neural networks, and to demonstrate that some of these models are able to identify acute LBP episodes with promising precision. In addition, we demonstrate the ineffectiveness of using ICD-10 codes alone to train the models, reinforcing the idea that they are not sufficient to differentiate the acuity of LBP. Our overall objective is to develop an automated framework that can help frontline primary care providers in the development of targeted strategies and return-to-work (RTW) options for acute LBP episodes in clinical practice.

Background and Significance

Primary care providers (PCPs) are commonly the first medical practitioners to assess patients’ musculoskeletal injuries and pain associated with these injuries and are therefore in a unique position to offer reassurance, treatment options, and RTW recommendations catered to the acuity of the injury and pain associated with it. Several studies have documented increases in medication prescriptions and visits to physicians, physical therapists, and chiropractors for LBP episodes [16–18]. Since individuals with chronic LBP seek care and use healthcare services more frequently than those with acute LBP, increases in healthcare use and costs for back pain are driven more by chronic than acute cases [19].

A rapid return to normal activities of daily living, including work, is generally the best activity recommendation for acute LBP management [12]. The number of workdays that are lost due to acute LBP can be reduced by implementing clinical practice guidelines in the primary care setting [20]. In previous work, Cruz et al. built a RTW protocol tool for PCPs based on guidelines from the LBP literature [21]. Based on the type of work (e.g., clerical, manual, or heavy) and the severity of the condition, the doctor would recommend RTW options (in partial or full duty capacity) within a certain number of days. The study found that physicians were likely to use this protocol, especially when it was integrated into the EHRs. The protocol was not always used for patients suffering from acute LBP, however, as the research team was unable to quickly identify the acuity using only the structured EHR data (e.g., ICD-10 codes). Acuity information was only available in the progress notes and was thus not incorporated into the automated recommendations. This prevented the research team from providing accurate feedback to PCPs based on a full picture of the patient’s condition.

A similar tool that could incorporate acuity information from notes could provide much more specific recommendations to PCPs that incorporate best practice guidelines for each acuity level. Besides leading to more precise care, this would streamline billing for LBP [22]. Similar needs arise for other musculoskeletal conditions, such as knee, elbow, and shoulder pain, where ICD-10 codes do not differentiate by pain level and acuity [23,24].

Machine learning methods for EHR data processing are enabling improved understanding of patient clinical trajectories, creating opportunities to derive new clinical insights [25,26]. In recent years, the application of deep learning, a hierarchical computational design based on
layers of neural networks [27], to structured EHRs has led to promising results on clinical tasks like disease phenotyping and prediction [28–33]. However, a wealth of relevant clinical information remains locked behind clinical narratives in the free text of notes. Natural Language Processing (NLP), a branch of computer science that enables machines to understand and process human language [34] for applications like machine translation [35], text generation [36], and image captioning [37], has been used to parse clinical notes to extract relevant insights that can guide clinical decisions [38]. Recent applications of deep learning to clinical NLP have classified clinical notes according to diagnosis or disease codes [39–41], predicted disease onset [32,42], and extracted primary cancer sites and their laterality in pathology reports [43,44]. However, while deep learning has successfully been applied to analyze clinical notes, traditional methods are still preferable when training data are limited [45,46].

Regardless of the specific methodology, tools based on NLP applied to clinical narratives have not been widely used in clinical settings [31,38], despite the fact that physicians are likely to follow computer-assisted guidelines if recommendations are tied to their own observations [47]. In this paper, we present an NLP-based framework that can help physicians adhere to best practices and RTW recommendations for LBP. To the best of our knowledge, there are no studies to date that have applied machine learning to clinical notes to distinguish the acuity of a musculoskeletal condition in cases where it is not explicitly coded.

Methods

The conceptual steps of this study are summarized in Figure 1, specifically: dataset composition; text processing; clinical notes modeling; and experimental evaluation. The overall goal was to evaluate the feasibility of automatically identifying clinical notes reporting “acute LBP” episodes.

Figure 1: Conceptual framework used to evaluate the use of automated approaches based on machine learning and information retrieval to analyze free-text clinical notes and identify “acute low back pain” episodes.

Dataset

We used a set of free-text clinical notes extracted from the Mount Sinai data warehouse, made available for use under IRB approval following HIPAA guidelines. The Mount Sinai Health System is an urban tertiary care hospital located on the Upper East Side of Manhattan in New York City. It generates a high volume of structured, semi-structured, and unstructured data as part of its routine healthcare and clinical operations, which include inpatient, outpatient, and emergency room visits. These clinical notes were collected during a previous pilot study evaluating a RTW tool based on EHR data that included nearly 40,000 encounters for 15,715 patients spanning the years 2016-2018 and clinical notes written by
81 different providers (Cruz et al. [21]). In that study, we used the published literature to develop a list of guidelines to determine the assessment and management of acute LBP episodes in clinical practice. In particular, we used ICD-10 codes as well as other parameters, such as “presenting complaint”, “pre-existing conditions”, “management factors”, “imaging/radiology/test ordered”, and so on, to define and label the acuity of LBP in a clinical encounter. Following these guidelines, 14 individuals (physical medicine and rehabilitation fellows, residents, and medical students) manually reviewed a random set of 4,291 clinical notes associated with these encounters and labeled all “acute low back pain” events. Each note was reviewed by at least two individuals and was further checked by a lead physician researcher if it was marked as ambiguous and/or there was discordance between reviewers.

This project leveraged the entire set of clinical notes that were collected in the previous study. In particular, we joined all the progress notes of these encounters under the same initial visit, and we eliminated duplicate, short (fewer than 3 words), and non-meaningful reports. The final dataset was composed of 17,409 distinct clinical notes, with length ranging from seven to 6,638 words. Of this set, 3,092 notes were manually reviewed in the previous study and 891 of them were annotated as “acute LBP”. The remaining 14,317 notes were not manually evaluated and were related to different clinical domains, including various musculoskeletal disorders and potentially LBP events. In this final dataset, 1,973 notes were also associated with an encounter billed with an ICD-10 M54.5 “Low back pain” code.

Text Processing

Every note in the dataset was tokenized, divided into sentences, and checked to remove punctuation, numbers, and non-relevant concepts such as URLs, emails, dates, etc. Each note was then represented as a list of sentences, with every sentence being a list of lemmatized words represented as one-hot encodings. The vocabulary was composed of all the words appearing at least five times in the training set. The discarded words were corrected to the terms in the vocabulary having the minimum edit distance, i.e., the minimum number of operations required to transform one string into the other [48]. This step reduced the number of misspelled words and prevented the accidental discarding of relevant information; at the same time it also limited the size of the vocabulary to improve scalability [39]. Overall, the vocabulary covering the whole dataset comprised 56,142 unique words.

Clinical Note Modeling

We evaluated different approaches for identifying clinical notes that refer to acute LBP episodes. These included both supervised and unsupervised methods. While we benefited from the use of high-quality manual annotations to train the supervised models, we also investigated alternatives that did not require manual annotation of notes. All of these
methods provided straightforward explanations of their predictions, enabling us to validate each model and to identify parts of text and patterns that are relevant to the “acute LBP” predictions.

Keyword Search

We searched for a set of relevant keywords in the text. In particular, we looked for “acute low back pain”, “acute lbp”, “acute lowbp”, and “acute back pain” and we counted their occurrences in the text. We used NegEx [49] to remove negated occurrences of the keywords. In the evaluation, we refer to this model as “WordSearch”.

Topic Modeling

We used topic modeling on the full set of words contained in the notes to capture abstract topics referred to in the dataset [50]. Topic modeling is an unsupervised inference process, in this case implemented using latent Dirichlet allocation [51], that captures patterns of word co-occurrences within documents to define interpretable topics (i.e., multinomial distribution of words) and represent a document as a multinomial over these topics. Every document can then be classified as talking about one or (usually) more topics. Topic modeling is often used in healthcare to generalize clinical notes, improve the automatic processing of patient data, and explore clinical datasets [52–55].

In this study, we assumed that one or more of these topics might refer to acute LBP. In order to discover them, we identified the most likely topics for a set of keywords (i.e., “acute”, “low”, “back”, “pain”, “lbp”, “bp”) and we manually reviewed them to retain only those that seemed more likely to characterize acute LBP episodes (i.e., that included most of the keywords with high probability). We then considered the maximum likelihood among these topics as the probability that a report referred to acute LBP (i.e., “TopicModel” in the experiments).

Bag of N-grams

Each clinical note was represented as a bag of n-grams (with n = 1, …, 5), with Term Frequency-Inverse Document Frequency (tf-idf) weights (determined from the corpus of documents). Each n-gram is a contiguous sequence of n words from the text. We considered all the words in the vocabulary and filtered the common stop words based on the English dictionary before building all the n-grams. The classification was implemented using Logistic Regression with Lasso (i.e., “BoN-LR”).

Feature Engineering

We used the protocol built by Cruz et al. [21] to define acute LBP episodes in the clinical notes. In particular, we used all the concepts described in that guideline, pre-processed them with the same algorithm used for the clinical notes, and built a set of 5,154 distinct n-grams (with n = 1, …, 5), that we refer to as “FeatEng”. We then represented each clinical note as a
bag of FeatEng (i.e., we counted the occurrences of only these n-grams in the text), normalized with tf-idf weights, and classified them using Logistic Regression with Lasso (i.e., “FeatEng-LR”).

Deep Learning

We implemented an end-to-end deep neural network architecture (i.e., “ConvNet”) that takes as input the full note and outputs its probability of being related to “acute LBP”. The first layer of the architecture maps the words to dense vector representations (i.e., “embeddings”), which attempt to contextualize the semantic meaning of each word by creating a metric space where vectors of semantically similar words are close to each other. We applied word2vec with the skip-gram algorithm to the parsed notes [56] to initialize the embedding of each word in the vocabulary. Word2vec is commonly used with EHRs to learn embeddings of medical concepts from structured data as well as from clinical notes [46,57–59].

The embeddings were then fed to a Convolutional Neural Network (CNN) inspired by the model described by Kim [60] and by Liu et al. [42]. This architecture concatenates representations of the text at different levels of abstraction, by essentially choosing the most relevant n-grams at each level. Here, we first applied a set of parallel 1D convolutions on the input sequence with kernel sizes ranging from 1 to 5, thus simulating n-grams with n = 1, …, 5. The outputs of each of these convolutions were then max-pooled over the whole sequence and concatenated to a 5×d dimensional vector, where d is the number of 1D convolutional filters. This representation was then fed to sequences of fully connected layers, which learn the interactions between the text features, and finally to a sigmoid layer that outputs the prediction probability.

The n-grams that are most relevant to the prediction, in this architecture, are those that activate the neurons in the max-pooling layer. Therefore, we used the log-odds that the n-gram contributes to the sigmoid decision function [42] as an indication of how much each n-gram influences the decision.

Evaluation Design

We evaluated all the architectures using a 10-fold cross-validation experiment, with every note appearing in the test set only once. In each training set we used a random 90/10 split to train and validate all the model configurations. As baseline we also report the results obtained by considering as “acute LBP” all the notes associated with the “Low back pain” M54.5 ICD-10 code (i.e., “ICD-10” in the results).
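
As a rough illustration of the feature extraction described in the Deep Learning subsection above, the following pure-Python sketch applies parallel 1D convolutions with kernel widths 1 to 5, max-pools each filter over the whole sequence, and concatenates the results into a 5×d vector. It is not the authors' implementation: the function names (`conv1d_max_pool`, `note_features`), the random toy weights, the toy sizes, and the placement of the ReLU before pooling are assumptions made for illustration, and the embedding lookup, fully connected layers, dropout, batch normalization, and sigmoid output are omitted.

```python
import random


def conv1d_max_pool(embedded, kernels):
    """Slide each filter over the note, then max-pool over the sequence.

    embedded: list of word vectors (seq_len x emb_dim).
    kernels:  list of filters of equal width, each a list of weight
              vectors (width x emb_dim).
    Returns one pooled activation per filter.
    Assumes the note has at least `width` words.
    """
    width = len(kernels[0])
    pooled = []
    for kernel in kernels:
        activations = []
        for start in range(len(embedded) - width + 1):
            window = embedded[start:start + width]
            total = sum(w * x
                        for w_vec, x_vec in zip(kernel, window)
                        for w, x in zip(w_vec, x_vec))
            activations.append(max(0.0, total))  # ReLU (assumed placement)
        pooled.append(max(activations))  # max over all n-gram positions
    return pooled


def note_features(embedded, kernel_bank):
    """Concatenate pooled outputs for kernel widths 1..5 into a 5*d vector."""
    features = []
    for width in sorted(kernel_bank):
        features.extend(conv1d_max_pool(embedded, kernel_bank[width]))
    return features


# Toy sizes for illustration only.
rng = random.Random(0)
emb_dim, d = 4, 3
note = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(12)]
kernel_bank = {
    width: [[[rng.uniform(-1, 1) for _ in range(emb_dim)]
             for _ in range(width)]
            for _ in range(d)]
    for width in range(1, 6)
}
features = note_features(note, kernel_bank)  # length 5 * d
```

With the sizes reported under Model Hyperparameters (200 filters per kernel width), the same procedure would yield a 1,000-dimensional note representation. Because each pooled value comes from a single window, the position that produced the maximum identifies the n-gram that activated that filter.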

Training Annotations

We considered two different sets of annotations as gold standards to train the supervised models. In the first experiment, we used the manually curated annotations provided with the dataset from previous work [21], whereas in the second experiment we trained the models using the ICD-10 codes associated with each note encounter. Both experiments were evaluated using manual annotations. The rationale was to compare the feasibility of identifying acute LBP events when manual annotations are and are not available. We trained the classifier to output “acute LBP” vs. “other” because the goal of the project was to identify clinical notes with acute LBP events, rather than discriminate different facets of LBP events (e.g., “chronic LBP” vs. “acute LBP”).

Metrics

For all experiments, we report area under the receiver operating characteristic curve (AUC-ROC), micro-precision, recall, F-score, and area under the precision-recall curve (AUC-PRC) [61]. The ROC curve is a plot of true positive rate versus false positive rate found over the set of predictions. F-score is the harmonic mean of classification precision and recall per annotation, where precision is the number of correct positive results divided by the number of all positive results, and recall is the number of correct positive results divided by the number of positive results that should have been returned. The PRC is a plot of precision and recall for different thresholds. The areas under the ROC and PRC curves are computed by integrating the corresponding curves.

Model Hyperparameters

The model hyperparameters were empirically tuned using the validation sets to optimize the results with both training annotations. In the topic modeling method, we inferred topics using the whole training set of documents and 200 topics (derived using perplexity analysis). While seemingly more intuitive, using only the notes associated with the M54.5 “Low back pain” ICD-10 code actually produced worse results. For each fold, the most relevant topics associated with acute LBP were manually reviewed and used to annotate the notes.

In the deep learning architecture, we used embeddings with size 300, and full-length notes. We trained word2vec just on the clinical note dataset to initialize embeddings. Pre-initializing the embeddings with a general-purpose corpus did not lead to any improvement. Each CNN had 200 filters and used a ReLU activation function. We added two fully connected layers of size 600 following the CNNs with ReLU activations and batch normalization. Dropout values across the layers were all set to 0.5. The architecture was trained using cross-entropy loss with the Adam optimizer for five epochs and batch size 32 (learning rate = 0.001). The classification thresholds for precision, recall, and F-score were found by ranging
the value from 0.1 to 1, with 0.1 increments, and retaining, for each model, the value leading to the best results on the validation set.
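
The evaluation protocol described above (10-fold cross-validation with every note tested exactly once, a random 90/10 train/validation split inside each training set, and a threshold sweep from 0.1 to 1 in 0.1 steps) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the helper names are hypothetical, the folds are built by simple striding over shuffled indices, the toy labels and scores are made up, and "best results" is taken to mean the highest validation F-score, which the text does not state explicitly.

```python
import random


def cross_validation_splits(n_notes, n_folds=10, val_fraction=0.1, seed=0):
    """Yield (train, validation, test) index lists, one triple per fold."""
    rng = random.Random(seed)
    indices = list(range(n_notes))
    rng.shuffle(indices)
    # Striding puts every note in exactly one test fold.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k, test in enumerate(folds):
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        rng.shuffle(rest)
        n_val = round(val_fraction * len(rest))  # random 90/10 split
        yield rest[n_val:], rest[:n_val], test


def f_score(labels, scores, threshold):
    """F-score (harmonic mean of precision and recall) at one threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(val_labels, val_scores):
    """Sweep 0.1, 0.2, ..., 1.0; keep the threshold with the best F-score
    (assumed selection criterion)."""
    candidates = [k / 10 for k in range(1, 11)]
    return max(candidates, key=lambda t: f_score(val_labels, val_scores, t))


# Toy usage with made-up validation labels and model scores.
splits = list(cross_validation_splits(n_notes=200))
val_labels = [1, 1, 1, 0, 0, 0, 0, 1]
val_scores = [0.92, 0.75, 0.41, 0.35, 0.10, 0.55, 0.05, 0.62]
best_threshold = select_threshold(val_labels, val_scores)
```

In a full pipeline, each model would be fit on the training indices of a fold, its threshold chosen on the validation indices, and precision, recall, and F-score then reported on the held-out test indices before averaging across folds.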

Results

Table 1 and Figure 2 show the average results of the 10-fold cross-validation experiment for all the models considered. The best results were obtained by ConvNet when trained with the manual annotations. While this is not entirely surprising given the success of deep learning for NLP when high-quality annotations and a large amount of data (i.e., on the order of millions of training examples) are available, this was not certain in this domain where the training dataset was much smaller. As expected, the results obtained by the baseline and by training the models using the ICD-10 codes were not as good, confirming that the M54.5 ICD-10 code is not a sufficient indicator of acute LBP. TopicModel leads to similar performance but provides a more intuitive and potentially effective way for exploring the collection, extracting meaningful patterns that are related to acute LBP episodes (see Figure 3). While this approach might not be robust enough for clinical application, a refined and manually curated version of TopicModel promises to allow an efficient pre-filtering of clinical reports that can speed up the manual work required to annotate them. On the contrary but as expected, WordSearch performed poorly as the condition is mentioned in too many different ways across the text and simple keywords were not sufficient.

Table 1: Classification results in identifying clinical notes with “acute LBP” episodes in terms of Precision, Recall, F-score, Area Under the ROC (AUC-ROC) and Precision-Recall (AUC-PRC) Curves. Results are averaged over the 10-fold cross validation experiment. We compared different supervised and unsupervised strategies: keyword search (“WordSearch”); topic modeling (“TopicModel”); logistic regression with bag-of-n-grams (“BoN-LR”) and manual features (“FeatEng-LR”); and deep learning (“ConvNet”). The supervised models (i.e., BoN-LR, FeatEng-LR and ConvNet) were trained using manual annotations or M54.5 ICD-10 codes. The “ICD-10” baseline simply considered as “acute LBP” all the notes associated with the generic M54.5 “Low back pain” ICD-10 code.

Model                                 Precision  Recall  F-score  AUC-ROC  AUC-PRC

Baseline
  ICD-10                              0.32       0.68    0.41     0.81     0.42

Unsupervised Methods
  WordSearch                          0.71       0.03    0.06     0.52     0.40
  TopicModel                          0.44       0.58    0.50     0.92     0.46

Trained with the M54.5 ICD-10 Code
  BoN-LR                              0.50       0.70    0.59     0.83     0.42
  FeatEng-LR                          0.47       0.59    0.52     0.88     0.41
  ConvNet                             0.55       0.68    0.61     0.89     0.46

Trained with Manual Annotations
  BoN-LR                              0.53       0.64    0.58     0.93     0.56
  FeatEng-LR                          0.58       0.66    0.62     0.93     0.58
  ConvNet                             0.65       0.73    0.70     0.98     0.72

Figure 2: ROC and Precision-Recall curves obtained when using as training data for BoN-LR, FeatEng-LR and ConvNet the manual annotations (a) and the M54.5 ICD-10 codes (b). ConvNet trained using the manual annotations obtained the best results. In the absence of manual annotations to use for training, TopicModel worked better than methods trained using ICD-10 codes, which proved not to be a good indicator to identify acuity in LBP episodes.


Figure 3: Representative “acute LBP”-related topic derived by averaging the word likelihood in all the relevant topics (i.e., those that were manually verified) inferred across the 10-fold cross-validation experiment. We report the top 30 words, with the biggest words being the most relevant.

As can be seen, most of the words are indeed related to acute LBP, including several medications that are usually prescribed to treat inflammation and pain (e.g., Cyclobenzaprine, Flexeril, Advil). A manually refined version of TopicModel can help pre-filter the notes in an intuitive semi-automatic way, promising to speed up the manual annotation process.

Figure 4 reports the classification results in terms of AUC-ROC and AUC-PRC when randomly subsampling the “acute LBP” manual annotations in the training set. We found that ConvNet always outperforms the other methods based on LR as well as TopicModel. In addition, we notice that using just 30% of the manual annotations (i.e., about 240 clinical notes) already leads to better results than using ICD-10 codes as training data. This is a particularly interesting insight, as it shows that only minimal manual work is required to achieve good classifications; these can then be further improved by adding automatically annotated notes to the model (after manual verification) and retraining.
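The subsampling step of this experiment can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the note identifiers, the pool size of 1,000, and the seed are hypothetical, and the text does not state how the unsampled annotations were handled, so the sketch only selects which annotated notes remain available as positive training examples.

```python
import random
from typing import List, Sequence


def subsample_annotations(positive_ids: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Return a random subset holding `fraction` of the manually annotated notes."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = max(1, round(fraction * len(positive_ids)))
    return rng.sample(list(positive_ids), k)


# Hypothetical identifiers for the annotated "acute LBP" notes in one training fold.
train_positive_ids = [f"note_{i}" for i in range(1000)]

for fraction in (0.1, 0.3, 0.5, 1.0):
    subset = subsample_annotations(train_positive_ids, fraction, seed=42)
    # A real experiment would retrain each supervised model on `subset`
    # and evaluate AUC-ROC and AUC-PRC on the held-out fold.
    print(f"{fraction:.0%} of annotations -> {len(subset)} notes")
```

Repeating the draw with several seeds for each fraction would give an estimate of how much the curves in such an experiment vary with the particular notes that were sampled.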


Figure 4: Area under the ROC (AUC-ROC) and Precision-Recall (AUC-PRC) curves obtained when training the supervised models using random sub-samples of the manual annotations. TopicModel is reported as a reference baseline. ConvNet obtained satisfactory results when trained using fewer manually annotated documents, showing robustness and scalability with respect to the size of the gold standard.

Figure 5 highlights the distributions of the classification scores (predicted probability of the label “acute LBP”) derived by several supervised models (trained with manual annotations) and TopicModel. ConvNet shows clear separation between acute LBP notes and the rest of the dataset. In particular, all acute LBP notes had scores greater than 0.2, with 82% of them (i.e., 727 notes) having scores greater than 0.5. In contrast, only 347 controls had scores greater than 0.5, meaning that only a few notes were highly likely to be misclassified. Similarly, TopicModel had no controls with scores greater than 0.7 and all “acute LBP” notes had scores greater than 0.2.
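The separation statistics quoted above are simple shares of notes whose score exceeds a cut-off. The sketch below illustrates the computation on made-up scores; the function name and the toy score lists are assumptions, and only the figures 727 and 891 come from the paper.

```python
from typing import Sequence


def fraction_above(scores: Sequence[float], threshold: float) -> float:
    """Share of notes whose predicted probability is strictly above `threshold`."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s > threshold) / len(scores)


# Toy predicted probabilities (hypothetical values).
acute_scores = [0.25, 0.55, 0.62, 0.81, 0.93]
control_scores = [0.02, 0.05, 0.11, 0.30, 0.58, 0.07, 0.01, 0.15]

acute_above_half = fraction_above(acute_scores, 0.5)      # acute LBP notes ranked as positive
control_above_half = fraction_above(control_scores, 0.5)  # controls at risk of misclassification
all_acute_above_floor = min(acute_scores) > 0.2

# Consistency check on the reported numbers: 727 of the 891 annotated notes.
reported_share = 727 / 891

print(acute_above_half, control_above_half, all_acute_above_floor, round(reported_share, 2))
```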


Figure 5: Representation of the probability distribution of the scores obtained by BoN-LR, FeatEng-LR, ConvNet (trained with manual annotations) and TopicModel. ConvNet led to good separation between “acute LBP” clinical notes and all the other documents. In the other cases, such separation is not as clear, explaining the worse classification results obtained by those models.

Finally, Table 2 summarizes some of the n-grams driving the “acute LBP” predictions obtained by ConvNet (trained with manual annotations) across the experiments. While some of these are obvious and refer to the disease itself (e.g., “acute lbp”), others refer to medications (e.g., “prescribed muscle relaxant”, “flexeril”), and recommendations (e.g., “rtw full duty quick”). Given their clinical meaning and relevance, all these patterns can be further analyzed and reviewed to potentially drive the development of guidelines for, e.g., treatment and RTW options.


Table 2: Examples of n-grams that were relevant in identifying “acute LBP” notes when using ConvNet trained with manual annotations. The relevance of each n-gram was determined by analyzing the neurons of the CNNs activating the max-pooling layers and their log-odds contribution to the final output. Log-odds were filtered per note and then averaged over all the notes and evaluation folds.

Type: Acute LBP-related Predictive N-grams

Diagnosis: muscle spasm lower back acute low back pain flare been having acute back pain acute midline low back pain sports acute bilateral low back pain acute low back pain acute lbp

Related Conditions: gait abnormality showed significant disk herniation intermittent sciatica spinal stenosis

Medications: back pain flare prescribed flexeril cyclobenzaprine flexeril naproxen for acute low back prescribed muscle relaxant

Recommendations: back brace for back pain obtain lumbar spine MRI recommendation rtw visit rtw full duty quick
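One way to read such n-grams out of a CNN text classifier is sketched below under strong simplifying assumptions: a single convolutional filter, a ReLU activation, toy two-dimensional embeddings, and a linear output layer, so that the filter's contribution to the log-odds is its pooled activation times one output weight. All names and values are hypothetical and this is not the authors' implementation; it only illustrates how max-over-time pooling ties a filter to one specific n-gram in a note.

```python
from typing import Dict, List, Sequence, Tuple


def top_ngram_for_filter(
    tokens: List[str],
    embeddings: Dict[str, Sequence[float]],
    filter_weights: Sequence[Sequence[float]],
    bias: float = 0.0,
) -> Tuple[str, float]:
    """Slide one convolutional filter over the note and return the n-gram
    selected by max-over-time pooling, together with its ReLU activation."""
    width = len(filter_weights)
    if len(tokens) < width:
        raise ValueError("note is shorter than the filter width")
    best_ngram, best_activation = "", float("-inf")
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        pre_activation = bias + sum(
            w * x
            for token, position_weights in zip(window, filter_weights)
            for w, x in zip(position_weights, embeddings[token])
        )
        activation = max(0.0, pre_activation)  # ReLU
        if activation > best_activation:
            best_ngram, best_activation = " ".join(window), activation
    return best_ngram, best_activation


# Toy 2-dimensional embeddings (hypothetical values).
toy_embeddings = {
    "patient": (0.0, 0.0),
    "reports": (0.0, 0.0),
    "acute": (1.0, 0.0),
    "lbp": (0.0, 1.0),
    "today": (0.0, 0.0),
}
# A width-2 filter that responds to "acute" followed by "lbp".
toy_filter = [(1.0, 0.0), (0.0, 1.0)]
# Hypothetical weight linking this filter to the "acute LBP" output logit.
output_weight = 1.5

note = "patient reports acute lbp today".split()
ngram, activation = top_ngram_for_filter(note, toy_embeddings, toy_filter)
log_odds_contribution = output_weight * activation
print(ngram, activation, log_odds_contribution)
```

Collecting the selected n-gram and its contribution for every filter and every note, and then averaging per n-gram, would give a ranking in the spirit of the procedure described in the caption of Table 2.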

Discussion

In this work we evaluated the use of several machine learning approaches to identify acute LBP episodes in free text clinical notes in order to better personalize the treatment and management of this condition in primary care. The experimental results showed that it is possible to extract acute LBP episodes with promising precision, especially when at least some manually curated annotations are available. In this scenario, ConvNet, a deep learning architecture based on CNNs, significantly outperformed other shallow techniques based on bag-of-n-grams and logistic regression, opening the possibility to boost performances using more complex architectures from current research in the NLP community. The implemented deep architecture also provides an easy mechanism to explain the predictions, leading to informed decision support based on model transparency [62,63] and the identification of meaningful patterns that can drive clinical decision making. If no annotations are available, experiments showed that the use of topic modeling is preferred to training a classifier using only the M54.5 ICD-10 codes (i.e., “Low back pain”) associated with the clinical note encounter, which proved to be a poor indicator to discriminate LBP episodes. In addition, the topics identified can serve as an intuitive tool to inform guidelines and recommendations as well as to pre-filter the documents and reduce the manual work required to annotate the notes.

The proposed framework is inherently domain agnostic and does not require any manual supervision to identify relevant features from the free text. Therefore, it can be leveraged in other musculoskeletal condition domains where acuity is not expressed in the ICD-10/diagnostic codes, such as knee, elbow, and shoulder pain.

Potential Applications

Medical care decisions are often based on heuristics and manually derived rule-based models constructed on prior knowledge and expertise [64]. Cognitive biases and personality traits, such as aversion to risk or ambiguity, overconfidence, and the anchoring effect, may lead to diagnostic inaccuracies and medical errors resulting in mismanagement or inadequate utilization of resources [65]. In the LBP domain, this may lead to: delays in finding the right therapy and in helping patients return to normal activities; an increased risk of the condition transitioning from acute to chronic; discomfort for patients; and a greater economic burden on clinical facilities to adequately treat and manage this patient population. Deriving data-driven guidelines for treatment recommendations can help reduce the effect of these cognitive biases and personality traits, leading to more consistent and accurate decisions.

In this scenario, the proposed framework integrates seamlessly with the RTW tool proposed by Cruz et al. [21] by including acuity-relevant information in the clinical notes and addressing one of the limitations of that study (i.e., recommending the RTW tool at the point of care by accurately identifying the condition as acute LBP). Similarly, an understanding of the patterns driving the predictions can lead to the development of new and improved treatment strategies for various types of injuries, which can be presented to the clinicians at the time of patient encounter to help them with better management of the condition. While physicians will continue to have autonomy in determining optimal care pathways for their patients, the recommendations provided by the supporting framework will be useful to systematize and support their activities within the realm of the busy clinical practice. Posterior analysis of the clinical notes to infer acute LBP episodes can also help in assigning the proper diagnostic and billing codes for the encounter. In a foreseeable future scenario where clinical observations are automatically transcribed via voice and EHRs are processed in real-time, an automated tool that identifies acuity information could also improve the accuracy of diagnosis and billing in real-time, with no need to wait for posterior evaluations.

Limitations

This work evaluated the feasibility of using machine learning to identify acute LBP episodes in clinical notes. Therefore, we compared different types of models (shallow vs. deep) and learning frameworks (unsupervised vs. supervised) to identify the best directions for implementation and deployment in real clinical settings. While several of the architectures evaluated in this work obtained promising results, more sophisticated models are likely to improve these performances, especially in the deep learning domain. For example, algorithms based on attention models [66], BERT [67], or XLNet [68] have shown encouraging results on similar NLP tasks and are likely to obtain better results in this domain as well. In this work we only focused on processing clinical notes; however, embedding structured EHR data, especially medications, imaging studies and/or lab tests, into the method should improve the results.

The dataset of clinical notes used in this study originated from a geographically diverse set of primary care clinics serving the New York population across the NY metro area over a limited period (i.e., 2016-2018). Providers were enrolled and randomized into the study on a rolling basis, with the number of encounters for LBP varying for each individual provider, based on his/her own practice. The majority of the primary care providers were assistant professors serving on the front lines. No specialists were included in the initial study as the pilot project was only geared towards the primary care providers. Consequently, the results of this study might not be applicable to specialty care practice.

Future Work

The classification of LBP episodes as acute or chronic at the point of care level within primary care practice is imperative for an RTW tool to be effectively used to render evidence-based guidelines. First, we plan to classify a large set of notes, derive patterns related to acute LBP and extend the tool proposed by Cruz et al. [21] according to them. We also plan to identify cases where the RTW tool can be easily deployed based on EHR integration in the clinical domain. Second, we will begin to address some of the methodological limitations of this study to optimize performance and evaluate its generalizability outside primary care. Finally, we aim to evaluate the feasibility of this type of approach for other musculoskeletal conditions, in particular, shoulder and knee pain.

Conclusions

This article demonstrates the feasibility of using machine learning to automatically identify acute LBP episodes from clinical reports using only unstructured free text data. In particular, manually annotating a set of notes to use as a gold standard can lead to effective results, especially when using deep learning. Topic modeling can help in speeding up the annotation process, initiating an iterative process where initial predictions are validated and then used to refine and optimize the model. This approach provides a generalizable framework for learning to differentiate disease acuity in primary care, which can more accurately and specifically guide the diagnosis and treatment of LBP. It also provides a clear path toward improving the accuracy of coding and billing of clinical encounters for LBP.

Acknowledgements

I.N. and L.C. would like to thank the Pilot Projects Research Training Program of the NY and NJ Education and Research Center (ERC), National Institute for Occupational Safety and Health, for their funding (grant #T42OH008422). R.M. acknowledges support from the Hasso Plattner Foundation and a courtesy GPU donation from NVIDIA.

Competing Interests

None.

Contributions

R.M. and I.N. initiated the idea and wrote the article; I.N. collected the data and provided clinical support; R.M. conducted the research and the experimental evaluation; B.L.P. advised on evaluation strategies and refined the article; B.S.G., H.L., and L.C. refined the article; J.T.D. supported the research. All the authors edited and reviewed the manuscript.

Abbreviations

AUC-PRC: Area Under the Precision-Recall Curve
AUC-ROC: Area Under the Receiver Operating Characteristic Curve
BoN: Bag Of N-grams
CNN: Convolutional Neural Network
EHR: Electronic Health Record
HIPAA: Health Insurance Portability and Accountability Act
ICD-CM: International Classification of Diseases, Clinical Modification
IRB: Institutional Review Board
LBP: Low Back Pain
LR: Logistic Regression
NLP: Natural Language Processing
NY: New York
PCP: Primary Care Provider
RTW: Return To Work
TF-IDF: Term Frequency-Inverse Document Frequency


References

1 Centers for Disease Control and Prevention (CDC). Prevalence and most common causes of disability among adults--United States, 2005. MMWR Morb Mortal Wkly Rep 2009;58:421–6.

2 Ricci JA, Stewart WF, Chee E, et al. Back pain exacerbations and lost productive time costs in United States workers. Spine 2006;31:3052–60.

3 Luckhaupt SE, Dahlhamer JM, Gonzales GT, et al. Prevalence, Recognition of Work-Relatedness, and Effect on Work of Low Back Pain Among US Workers. Ann Intern Med Published Online First: 2019. https://annals.org/aim/article-abstract/2733500/prevalence-recognition-work-relatedness-effect-work-low-back-pain-among?searchresult=1

4 Health Care Utilization and Economic Cost. BMUS: The Burden of Musculoskeletal Diseases in the United States. https://www.boneandjointburden.org/2014-report/if0/health-care-utilization-and-economic-cost (accessed 22 Apr 2019).

5 Fairbank J, Gwilym SE, France JC, et al. The role of classification of chronic low back pain. Spine 2011;36:S19–42.

6 Weiner DK, Kim Y-S, Bonino P, et al. Low back pain in older adults: are we utilizing healthcare resources wisely? Pain Med 2006;7:143–50.

7 Friedly J, Chan L, Deyo R. Increases in lumbosacral injections in the Medicare population: 1994 to 2001. Spine 2007;32:1754–60.

8 Deyo RA, Mirza SK. Trends and Variations in the Use of Spine Surgery. Clin Orthop Relat Res 2006;443:139.

9 Deyo RA, Nachemson A, Mirza SK. Spinal-Fusion Surgery—The Case for Restraint. Spine J 2004;4:S138–42.

10 Ballantyne JC. Opioids for the Treatment of Chronic Pain: Mistakes Made, Lessons Learned, and Future Directions. Anesth Analg 2017;125:1769–78.

11 Luo X, Pietrobon R, Hey L. Patterns and trends in opioid use among individuals with back pain in the United States. Spine 2004;29:884–90; discussion 891.

12 Malmivaara A, Häkkinen U, Aro T, et al. The Treatment of Acute Low Back Pain—Bed Rest, Exercises, or Ordinary Activity? New England Journal of Medicine. 1995;332:351–5. doi:10.1056/nejm199502093320602

13 2019 ICD-10-CM Diagnosis Code M54.5: Low back pain. https://www.icd10data.com/ICD10CM/Codes/M00-M99/M50-M54/M54-/M54.5 (accessed 24 Apr 2019).

14 Petersen T, Laslett M, Juhl C. Clinical classification in low back pain: best-evidence diagnostic rules based on systematic reviews. BMC Musculoskelet Disord 2017;18:188.

15 Casazza BA. Diagnosis and treatment of acute low back pain. Am Fam Physician 2012;85:343.

16 Feuerstein M, Marcus SC, Huang GD. National trends in nonoperative care for nonspecific back pain. Spine J 2004;4:56–63.

17 Kessler RC, Davis RB, Foster DF, et al. Long-term trends in the use of complementary and alternative medical therapies in the United States. Ann Intern Med 2001;135:262–8.

18 Martin BI, Deyo RA, Mirza SK, et al. Expenditures and health status among adults with back and neck problems. JAMA 2008;299:656–64.

19 Freburger JK, Holmes GM, Agans RP, et al. The rising prevalence of chronic low back pain. Arch Intern Med 2009;169:251–8.

20 Rossignol M, Abenhaim L, Séguin P, et al. Coordination of primary health care for back pain. A randomized controlled trial. Spine 2000;25:251–8; discussion 258–9.

21 Cruz LC, Alamgir HA, Sheth P, et al. Development of a return to work tool for primary care providers for patients with low back pain: A pilot study. J Family Med Prim Care 2018;7:1185–92.

22 Owens JD, Hegmann KT, Thiese MS, et al. Impacts of Adherence to Evidence-Based Medicine Guidelines for the Management of Acute Low Back Pain on Costs of Worker’s Compensation Claims. J Occup Environ Med 2019;61:445–52.

23 Reporting Pain in ICD-10-CM. Coding Strategies. https://www.codingstrategies.com/news/reporting-pain-icd-10-cm (accessed 21 Jun 2019).

24 Gross DP, Armijo-Olivo S, Shaw WS, et al. Clinical Decision Support Tools for Selecting Interventions for Patients with Disabling Musculoskeletal Disorders: A Scoping Review. J Occup Rehabil 2016;26:286–318.

25 Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012;13:395–405.

26 Glicksberg BS, Johnson KW, Dudley JT. The next generation of precision medicine: observational studies, electronic health records, biobanks and continuous monitoring. Hum Mol Genet 2018;27:R56–62.

27 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.

28 Miotto R, Li L, Kidd BA, et al. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Sci Rep 2016;6:26094.

29 Miotto R, Wang F, Wang S, et al. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform Published Online First: 6 May 2017. doi:10.1093/bib/bbx044

30 Choi E, Bahadori MT, Schuetz A, et al. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. arXiv [cs.LG]. 2015. http://arxiv.org/abs/1511.05942v11

31 Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc Published Online First: 8 June 2018. doi:10.1093/jamia/ocy068

32 Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine 2018;1:18.

33 Miotto R, Li L, Dudley JT. Deep Learning to Predict Patient Future Diseases from the Electronic Health Records. In: Ferro N, Crestani F, Moens M-F, et al., eds. Advances in Information Retrieval. Cham: Springer International Publishing 2016. 768–74.

34 Goldberg Y. A Primer on Neural Network Models for Natural Language Processing. Journal of Artificial Intelligence Research. 2016;57:345–420. doi:10.1613/jair.4992

35 Wu Y, Schuster M, Chen Z, et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv [cs.CL]. 2016. http://arxiv.org/abs/1609.08144

36 Kannan A, Kurach K, Ravi S, et al. Smart Reply: Automated Response Suggestion for Email. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM 2016. 955–64.

37 Vinyals O, Toshev A, Bengio S, et al. Show and tell: A neural image caption generator. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. 3156–64.

38 Sheikhalishahi S, Miotto R, Dudley JT, et al. Natural Language Processing of clinical notes: A systematic review for Chronic Diseases. JMIR Medical Informatics. 2018. doi:10.2196/12239

39 Baumel T, Nassour-Kassis J, Cohen R, et al. Multi-Label Classification of Patient Notes a Case Study on ICD Code Assignment. arXiv [cs.CL]. 2017. http://arxiv.org/abs/1709.09587

40 Mullenbach J, Wiegreffe S, Duke J, et al. Explainable Prediction of Medical Codes from Clinical Text. arXiv [cs.CL]. 2018. http://arxiv.org/abs/1802.05695

41 Shi H, Xie P, Hu Z, et al. Towards Automated ICD Coding Using Deep Learning. arXiv [cs.CL]. 2017. http://arxiv.org/abs/1711.04075

42 Liu J, Zhang Z, Razavian N. Deep EHR: Chronic Disease Prediction Using Medical Notes. arXiv [cs.LG]. 2018. http://arxiv.org/abs/1808.04928

43 Yoon H-J, Ramanathan A, Tourassi G. Multi-task Deep Neural Networks for Automated Extraction of Primary Site and Laterality Information from Cancer Pathology Reports. In: Advances in Big Data. Springer International Publishing 2017. 195–204.

44 Qiu JX, Yoon H-J, Fearn PA, et al. Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports. IEEE J Biomed Health Inform 2018;22:244–51.

45 Turner CA, Jacobs AD, Marques CK, et al. Word2Vec inversion and traditional text classifiers for phenotyping lupus. BMC Med Inform Decis Mak 2017;17:126.

46 Gehrmann S, Dernoncourt F, Li Y, et al. Comparing Rule-Based and Deep Learning Models for Patient Phenotyping. arXiv [cs.CL]. 2017. http://arxiv.org/abs/1703.08705

47 Davis DA, Taylor-Vaisey A. Translating guidelines into practice. A systematic review of theoretic concepts, practical experience and research evidence in the adoption of clinical practice guidelines. CMAJ 1997;157:408–16.

48 Navarro G. A Guided Tour to Approximate String Matching. ACM Comput Surv 2001;33:31–88.

49 Chapman WW, Bridewell W, Hanbury P, et al. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 2001;34:301–10.

50 Blei DM. Probabilistic topic models. Communications of the ACM. 2012;55:77. doi:10.1145/2133806.2133826

51 Blei DM, Ng AY, Jordan MI. Latent Dirichlet Allocation. J Mach Learn Res 2003;3:993–1022.

52 Miotto R, Weng C. Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials. J Am Med Inform Assoc 2015;22:e141–50.

53 Perotte AJ, Wood F, Elhadad N, et al. Hierarchically Supervised Latent Dirichlet Allocation. In: Shawe-Taylor J, Zemel RS, Bartlett PL, et al., eds. Advances in Neural Information Processing Systems 24. Curran Associates, Inc. 2011. 2609–17.

54 Chang KR, Lou X, Karaletsos T, et al. An Empirical Analysis of Topic Modeling for Mining Cancer Clinical Notes. bioRxiv. 2016:062307. doi:10.1101/062307

55 Cohen R, Aviram I, Elhadad M, et al. Redundancy-aware topic modeling for patient record notes. PLoS One 2014;9:e87555.

56 Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and their Compositionality. In: Burges CJC, Bottou L, Welling M, et al., eds. Advances in Neural Information Processing Systems 26. Curran Associates, Inc. 2013. 3111–9.

57 Glicksberg BS, Miotto R, Johnson KW, et al. Automated disease cohort selection using word embeddings from Electronic Health Records. Pac Symp Biocomput 2018;23:145–56.

58 Choi Y, Chiu CY-I, Sontag D. Learning Low-Dimensional Representations of Medical Concepts. AMIA Jt Summits Transl Sci Proc 2016;2016:41–50.

59 Wang Y, Liu S, Afzal N, et al. A comparison of word embeddings for the biomedical natural language processing. J Biomed Inform 2018;87:12–20.

60 Kim Y. Convolutional Neural Networks for Sentence Classification. arXiv [cs.CL]. 2014. http://arxiv.org/abs/1408.5882

61 Baeza-Yates R, Ribeiro-Neto B. Modern Information Retrieval: the concepts and technology behind search. Second edition. Addison Wesley 2011.

62 Holzinger A, Biemann C, Pattichis CS, et al. What do we need to build explainable AI systems for the medical domain? arXiv [cs.AI]. 2017. http://arxiv.org/abs/1712.09923

63 Lipton ZC. The Mythos of Model Interpretability. arXiv [cs.LG]. 2016. http://arxiv.org/abs/1606.03490

64 Marewski JN, Gigerenzer G. Heuristic decision making in medicine. Dialogues Clin Neurosci 2012;14:77–89.

65 Saposnik G, Redelmeier D, Ruff CC, et al. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak 2016;16:138.

66 Vaswani A, Shazeer N, Parmar N, et al. Attention is All you Need. In: Guyon I, Luxburg UV, Bengio S, et al., eds. Advances in Neural Information Processing Systems 30. Curran Associates, Inc. 2017. 5998–6008.

67 Devlin J, Chang M-W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv [cs.CL]. 2018. http://arxiv.org/abs/1810.04805

68 Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv [cs.CL]. 2019. http://arxiv.org/abs/1906.08237
