stm.sciencemag.org/cgi/content/full/12/545/eaay1548/DC1

Supplementary Materials for

Using genetics to prioritize diagnoses for rheumatology outpatients with inflammatory arthritis

Rachel Knevel, Saskia le Cessie, Chikashi C. Terao, Kamil Slowikowski, Jing Cui, Tom W. J. Huizinga, Karen H. Costenbader, Katherine P. Liao, Elizabeth W. Karlson, Soumya Raychaudhuri*

*Corresponding author. Email: [email protected]

Published 27 May 2020, Sci. Transl. Med. 12, eaay1548 (2020)
DOI: 10.1126/scitranslmed.aay1548

The PDF file includes:

Fig. S1. Flowchart of the simulation study.
Fig. S2. Test characteristics of different ICD9 cutoffs for identification of RA cases using reviewed medical record data as the gold standard.
Fig. S3. Flowchart of patient selection in setting I.
Fig. S4. Flowchart of patient selection in setting II.
Fig. S5. Flowchart of patient selection in setting III.
Fig. S6. Flowchart of the medical record review procedure.
Fig. S7. Density plots of G-probabilities per disease.
Fig. S8. Precision recall curves.
Fig. S9. Sensitivity analysis of the performance of G-PROB per disease.
Fig. S10. Sensitivity analysis of the influence of individual diseases on G-PROB’s performance.
Fig. S11. Sensitivity analysis comparing different shrinkage factors.
Fig. S12. Test characteristics for the probabilities at different cutoffs.
Table S1. ICD9 and ICD10 codes used to identify patients in setting I (eMERGE).
Table S2. Patient characteristics in setting I.
Table S3. Patient characteristics in setting II.
Table S4. Patient characteristics in setting III.
Table S5. Area under the receiver operating curve per disease.
Table S6. McFadden’s R² from multinomial logistic regression testing how much of the variance in the final disease diagnosis was explained by clinical, genetic, or serologic information.
Legends for data files S1 and S2
Other Supplementary Material for this manuscript includes the following: (available at stm.sciencemag.org/cgi/content/full/12/545/eaay1548/DC1)
Data file S1 (Microsoft Excel format). ORs of curated risk variants for RA, RAneg, SLE, PsA, SpA, and gout.
Data file S2 (Microsoft Excel format). Disease prevalence used in G-PROB per setting.
Fig. S1. Flowchart of the simulation study. We started by generating a simulated
healthy population and then identified theoretical cases based on genetic profiles
corresponding to the different rheumatologic diseases. RA = rheumatoid arthritis, SLE = systemic lupus erythematosus.
[Flowchart of patient selection; figure not reproduced. Box labels and counts:]
≥ 3 ICD codes given at a rheumatology outpatient clinic
Genotyped in Biobank, Caucasians only: n = 12,604
n = 1,808
n = 282
    RA: n = 134 (CCP+ n = 64; CCP- n = 51; Unknown# n = 19)
    SLE: n = 7
    SpA: n = 8
    PsA: n = 22
    Gout: n = 22
    Other: n = 69
    Excluded: n = 20 (no synovitis n = 8; info lacking n = 7; juvenile n = 1; multiple n = 3)
Fig. S6. Flowchart of the medical record review procedure.
#Patients were excluded because no clear decision could be made on whether the patient had undifferentiated arthritis or
one of the diseases of interest: either the rheumatologist diagnosed the patient without the classification criteria being met
(making it undifferentiated arthritis for our study), or the rheumatologist had more information than was registered in the
notes.
[Flowchart; figure not reproduced. Decision nodes and outcomes:]
Decision nodes:
    Synovitis (YES / NO)
    Rheumatologist's diagnosis at last visit (Possibly one of the diagnoses / Other / No clear diagnosis)
    Meets classification criteria (YES / NO)
    Same diagnosis as rheumatologist (YES / NO)
    Additional expert review
    Consensus between two reviewers on diagnosis (YES / NO)
Outcomes:
    Classify as case according to criteria
    Other phenotype
    Undifferentiated arthritis
    Exclude
Fig. S7. Density plots of G-probabilities per disease. These graphs depict the density of
probabilities for each disease subset in each setting (A-D). Green shows the probabilities assigned
to a patient’s real disease; orange shows the probabilities assigned to diseases other than the
patient’s real disease. Panel E shows the results of a subanalysis of setting III in which we applied
a flat prevalence to G-PROB, avoiding skewed results due to an overrepresentation of (pre-)RA cases.
RA = rheumatoid arthritis, SLE = systemic lupus erythematosus, SpA = spondyloarthropathy, PsA =
psoriatic arthritis.
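The effect of a flat prevalence in panel E can be illustrated with a generic Bayes re-weighting. This is a minimal sketch under assumptions, not the G-PROB implementation: it assumes each per-disease probability is proportional to a likelihood term multiplied by the prevalence used as prior, so that dividing by that prevalence and renormalizing yields the probabilities under equal prevalences. The function name and all numbers are hypothetical.

```python
def reweight_to_flat_prior(probs, prevalence):
    """Convert probabilities computed under `prevalence` to a flat (equal) prior.

    Assumption: probs[d] is proportional to likelihood[d] * prevalence[d].
    Dividing by the prevalence removes the prior; renormalizing makes the
    result sum to 1 again, which corresponds to equal prevalences.
    """
    unnormalized = {d: probs[d] / prevalence[d] for d in probs}
    total = sum(unnormalized.values())
    return {d: value / total for d, value in unnormalized.items()}


# Hypothetical numbers: a patient whose probabilities merely mirror an
# RA-heavy prevalence, i.e. the genetic data carry no information.
example_prevalence = {"RA": 0.6, "SLE": 0.1, "SpA": 0.1, "PsA": 0.1, "Gout": 0.1}
example_probs = {"RA": 0.6, "SLE": 0.1, "SpA": 0.1, "PsA": 0.1, "Gout": 0.1}

flat = reweight_to_flat_prior(example_probs, example_prevalence)
for disease, p in flat.items():
    print(f"{disease}: {p:.2f}")
```

In this toy case the apparent preference for RA disappears once the prior is flattened, which is the kind of skew the subanalysis is described as avoiding.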
Fig. S8. Precision recall curves. These graphs depict the precision-recall curve (PRC), which plots
precision (positive predictive value) against recall (sensitivity). The fourth graph shows the PRC of
a random classifier at a disease prevalence of 20%, as is the case in the datasets of our study.
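Why a random classifier serves as the reference can be checked with a small simulation. The sketch below is illustrative only and uses simulated data, not study data: labels are drawn at 20% prevalence and scores are drawn independently of the labels, so precision is expected to stay near the prevalence at every threshold while recall falls as the threshold rises. Function names, sample size, and thresholds are assumptions.

```python
import random


def precision_recall(labels, scores, threshold):
    """Precision (PPV) and recall (sensitivity) when predicting score >= threshold."""
    tp = fp = fn = 0
    for is_case, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and is_case:
            tp += 1
        elif predicted and not is_case:
            fp += 1
        elif is_case:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def simulate_random_classifier(n=50_000, prevalence=0.20, seed=0):
    """Simulated labels at the given prevalence with scores unrelated to the labels."""
    rng = random.Random(seed)
    labels = [rng.random() < prevalence for _ in range(n)]
    scores = [rng.random() for _ in range(n)]
    return labels, scores


labels, scores = simulate_random_classifier()
thresholds = (0.1, 0.3, 0.5, 0.7, 0.9)
curve = [precision_recall(labels, scores, t) for t in thresholds]
for t, (precision, recall) in zip(thresholds, curve):
    print(f"threshold={t:.1f} precision={precision:.3f} recall={recall:.3f}")
```

The resulting PRC is approximately a horizontal line at the prevalence, so a classifier adds information only to the extent that its curve lies above that line.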
Fig. S9. Sensitivity analysis of the performance of G-PROB per disease. This graph depicts
the receiver operating characteristic (ROC) curve from Fig. 2B (main manuscript) subdivided for each
individual disease in setting II. The table shows the area under the curve (AUC) for each disease. RA