
www.sciencemag.org/cgi/content/full/science.aaq1327/DC1

Supplementary Materials for

Co-regulatory networks of human serum proteins link genetics to disease

Valur Emilsson*†, Marjan Ilkov*, John R. Lamb*†, Nancy Finkel, Elias F. Gudmundsson, Rebecca Pitts, Heather Hoover, Valborg Gudmundsdottir, Shane R. Horman, Thor Aspelund, Le Shu, Vladimir Trifonov, Sigurdur Sigurdsson, Andrei Manolescu, Jun Zhu, Örn Olafsson,

Johanna Jakobsdottir, Scott A. Lesley, Jeremy To, Jia Zhang, Tamara B. Harris, Lenore J. Launer, Bin Zhang, Gudny Eiriksdottir, Xia Yang, Anthony P. Orth, Lori L. Jennings‡,

Vilmundur Gudnason†‡

*These authors contributed equally to this work. †Corresponding author. Email: [email protected] (V.E.); [email protected] (V.G.);

[email protected] (J.R.L.) ‡These authors contributed equally to this work.

Published 2 August 2018 on Science First Release

DOI: 10.1126/science.aaq1327

This PDF file includes:

Materials and Methods Figs. S1 to S14 Tables S2, S5, S8, S11, S12, S16, and S18 to S20 Captions for tables S1, S3, S4, S6, S7, S9, S10, S13 to S15, S17, S21, and S22 References

Other Supplementary Materials for this manuscript include the following: (available at www.sciencemag.org/cgi/content/full/science.aaq1327/DC1)

Tables S1, S3, S4, S6, S7, S9, S10, S13 to S15, S17, S21 and S22 (Excel)

Materials and Methods

1. The study cohort

Cohort participants aged 66 through 96 were from the Age, Gene/Environment Susceptibility (AGES) – Reykjavik Study (12), a

single-center prospective population-based study of deeply phenotyped subjects (N = 5,457,

mean age 76.6±6 years). AGES Reykjavik is the study of the elderly and all survivors of the

50-year-long prospective Reykjavik study (N = 19,360), an epidemiologic study focusing on

four biologic systems: vascular, neurocognitive (including sensory), musculoskeletal, and

body composition/metabolism. Blood samples were collected at the AGES-Reykjavik

baseline after an overnight fast; serum was prepared using a standardized protocol and stored

in 0.5 ml aliquots at -80°C.

Prevalent coronary heart disease (CHD) was defined as previous myocardial infarction

(MI), coronary artery bypass graft or percutaneous coronary intervention (PCI) obtained from hospital

records or prevalent MI, and according to echocardiography at AGES visit. Incident CHD

events included fatal CHD or incident non-fatal CHD (International Classification of Diseases

(ICD) 9th edition, codes 410, 411, 414, 429, and ICD-10th edition, codes I21–I25), obtained

from cause of death registries and hospitalization records. The criteria for heart failure (HF)

were based on symptoms, signs, chest X-ray, and echocardiographic findings from hospital

records adjudicated by examining every record for both prevalent and incident HF (8-year

follow-up). Metabolic syndrome (MetS) is defined by three or more of the following:

fasting glucose ≥ 5.6 mmol/L, blood pressure ≥ 140/90 mmHg, triglycerides ≥ 1.7 mmol/L,

0 < HDL < 0.9 mmol/L for males or 0 < HDL < 1.0 mmol/L for females, BMI > 30 kg/m². Roughly

20.7% of the AGES study population falls under the criteria of having developed MetS.
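
The MetS rule above (three or more of five criteria) can be sketched as a small classifier. This is only an illustration: the function and parameter names are hypothetical, and treating the blood pressure criterion as met when either the systolic or the diastolic value reaches its cutoff is an assumption, since the text gives only the threshold "140/90".

```python
def has_metabolic_syndrome(glucose_mmol_l, systolic_mmhg, diastolic_mmhg,
                           triglycerides_mmol_l, hdl_mmol_l, bmi, sex):
    """Return True when at least three of the five listed criteria are met."""
    # Sex-specific HDL cutoff taken from the text (0.9 for males, 1.0 for females).
    hdl_cutoff = 0.9 if sex == "male" else 1.0
    criteria = [
        glucose_mmol_l >= 5.6,
        # Assumption: either pressure component reaching its cutoff counts.
        systolic_mmhg >= 140 or diastolic_mmhg >= 90,
        triglycerides_mmol_l >= 1.7,
        # The lower bound of 0 mirrors the "0 < HDL" condition in the text.
        0 < hdl_mmol_l < hdl_cutoff,
        bmi > 30,
    ]
    return sum(criteria) >= 3
```

For example, a female participant with fasting glucose 6.0 mmol/L, blood pressure 150/85 mmHg and triglycerides 2.0 mmol/L meets three criteria and would be classified as having MetS under these assumptions.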

Systolic and diastolic blood pressure were measured twice using a mercury sphygmomanometer

in a supine position; BMI was calculated as weight (kg) divided by height (in meters)

squared; and TG, HDL cholesterol, and plasma glucose levels were measured on fasting blood

samples. TG was measured using enzymatic colorimetry (Roche Triglyceride Assay Kit),

HDL with an enzymatic in vitro assay (Roche Direct HDL Cholesterol Assay Kit), and

glucose was measured using photometry (Roche Hitachi 717 Photometric Analysis System).

T2D (all prevalent cases) was determined from self-reported diabetes, diabetes medication

use, or fasting plasma glucose ≥ 7 mmol/L according to ADA (31). Computed tomography

(CT) imaging of the mid-thigh and abdomen at the L4/L5 vertebrae was performed with a 4-

row detector system (Sensation; Siemens Medical Systems, Erlangen, Germany). Visceral

adipose tissue (VAT) and abdominal subcutaneous adipose tissue (SAT) were estimated from

a single 10-mm-thick trans-axial section via CT. Images were loaded into an AVS5 display

environment. VAT was distinguished from SAT by tracing along the fascial plane defining the

internal abdominal wall. Adipose areas were calculated by multiplying the number of pixels

by the pixel area using specialized software (University of California, San Francisco). Finally,

we assessed survival probability for individual proteins and the eigenvectors ($E^{(q)}$s, where q denotes

a specific module as explained below) of protein modules in a 12-14 year follow-up study, i.e.

both overall survival with 2,982 events as well as survival post incident CHD with 692 events.

Follow-up time for overall survival was defined as the time from entry into AGES until death

from any cause or end of follow-up (end of year 2016), while follow-up time for survival post

incident CHD was defined as the time from 28 days after an incident CHD-event until death

from any cause or end of follow-up time. Table S2 reports the baseline characteristics of the

study cohort.
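
The two follow-up definitions above can be sketched as date arithmetic. This is a minimal illustration: the function names are hypothetical, "end of year 2016" is taken to mean 31 December 2016, and returning `None` for participants who die within 28 days of the event is an assumption about how such cases would be handled.

```python
from datetime import date, timedelta

# Assumption: "end of year 2016" is interpreted as 31 December 2016.
END_OF_FOLLOW_UP = date(2016, 12, 31)


def overall_follow_up_days(entry, death=None):
    """Days from entry into AGES until death from any cause or end of follow-up."""
    end = min(death, END_OF_FOLLOW_UP) if death else END_OF_FOLLOW_UP
    return (end - entry).days


def post_chd_follow_up_days(chd_event, death=None):
    """Days from 28 days after an incident CHD event until death or end of follow-up."""
    start = chd_event + timedelta(days=28)
    end = min(death, END_OF_FOLLOW_UP) if death else END_OF_FOLLOW_UP
    if end < start:
        # Assumption: participants who do not survive the 28-day window are excluded.
        return None
    return (end - start).days
```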

Given the frequent associations of the serum proteins with prevalent disease throughout

the present study, we wanted to learn if the association of proteins with prevalent disease

like CHD was influenced by the time of diagnosis prior to the time of sampling. Note

however, that numerous protein biomarkers linked to CHD have been identified using

prevalent disease in the discovery phase (32), including CRP, LpPLA2, NT-proBNP, cTnT

and Lp(a) to mention a few. More to the point, the mean±SD time from diagnosis of CHD to the

time of sampling was 6.08±5.37 years in the AGES cohort. We performed forward logistic

regression analysis using either all 1,217 prevalent CHD cases or 700 CHD cases that were

diagnosed with the disease within 5 years before the entry into the AGES study. We found

that 927 proteins were correlated with CHD at a Bonferroni adjusted P-value <0.05 using all

prevalent cases, while 859 proteins were associated with the disease in the time restricted

analysis (fig. S8). Of these, 768 proteins were found in both analyses. The

list of top proteins and the effect sizes of the overlapping protein set were unchanged between

the analyses, while the P-values were numerically higher in the time-restricted analysis, most

likely due to reduced power as there were 517 fewer cases (fig. S8). In summary, variable

levels of proteins associated with prevalent disease like CHD were not affected by restricting

the analysis to cases diagnosed within 5 years before sampling.

The AGES-Reykjavik study was approved by the National Bioethics Committee (NBC) in Iceland (approval number

VSN-00-063), and by the National Institute on Aging Intramural Institutional Review Board

(US), and the Data Protection Authority in Iceland. Informed consent was obtained from all

study participants.

2. Protein measurements and assessment of aptamer specificity

Each protein has its own detection reagent selected from chemically modified DNA libraries,

referred to as Slow Off-rate Modified Aptamers (SOMAmers). With the focus set on proteins

known or predicted to be present extracellularly or on the surface of cells, a new custom-

designed Novartis SOMAscan 5K platform was developed that measures 5,034 protein

analytes in a single serum sample, of which 4,783 SOMAmers bind specifically to human

proteins (4,137 distinct human proteins) and 250 SOMAmers recognize non-human

targets (47 non-human vertebrate proteins and 203 targeting human pathogens). SOMAmers

are small single-stranded 40-mer DNA aptamers with modified nucleic acids selected to

specifically recognize target proteins in their native three-dimensional state. SOMAmer

reagents were selected for slow dissociation kinetics ($t_{1/2} > 30$ min), which in combination with

stringent wash steps impedes nonspecific binding (11).

Protein levels were measured at SomaLogic Inc. (Boulder, US) essentially as

previously described (10, 33). In brief, 5,457 individual serum samples were treated with the

detergent Tween-20 to prevent loss of reagent material to tube walls and for lysis of

exosomes, and then incubated with the mixture of 5,034 SOMAmers to generate SOMAmer-

protein complexes. Unbound SOMAmers and unbound or nonspecifically bound proteins

were eliminated by 2 bead-based immobilization wash steps and the use of polyanionic

competitors. After eluting the enriched SOMAmers from their target proteins they were

directly quantified on an Agilent hybridization array (Agilent Technologies). Hybridization

controls were used to correct for systematic variability in detection and calibrator samples of

three dilution sets (40%, 1% and 0.005%) were included so that the degree of fluorescence

was a quantitative reflection of protein concentration. All scale factors were then used to

normalize the protein data. We note that albumin-tolerance testing is a part of standard assay

development at SomaLogic and has been evaluated for all analytes on the new custom-

designed aptamer-based platform, showing no effect of albumin addition on the SOMAmer-

protein interactions.

To avoid batch or time of processing biases, both sample collection and sample

processing for protein measurements were randomized and all samples run as a single set.

The 5,034 SOMAmers that passed quality control had median intra-assay and inter-assay

coefficients of variation, CV = 100×σ/µ, of <5%, similar to that reported on variability in the

SOMAscan assays (34). More specifically, we aliquoted samples from 30 subjects into two

separate plates (any two of the 67 plates processed), and assessed the inter-plate variability of

those proteins relevant to the present study, notably the serum protein network. First we

interrogated the inter-plate CV for the top 20% most connected proteins (kTotal) or 1,000

proteins/aptamers and found the median CV to be 0.60%. Secondly, we checked the inter-

plate CV for all the 390 proteins that constitute the PM26 module as an example, and found

that the median inter-plate CV for the proteins in PM26 was 0.42%. This exercise

demonstrates a remarkably low inter-plate CV in our dataset.
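
The coefficient of variation used above, CV = 100×σ/µ, and a median inter-plate CV across proteins can be sketched as follows. This is an illustrative sketch only: the text does not state how the duplicate measurements were aggregated, so computing a CV per subject from the two plate values, averaging within each protein, and then taking the median across proteins is an assumption, as are the function names and the use of the sample standard deviation.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sigma / mu (sample sigma assumed)."""
    return 100 * stdev(values) / mean(values)


def median_inter_plate_cv(duplicates):
    """Median across proteins of the mean per-subject CV between two plates.

    duplicates: {protein: [(plate1_value, plate2_value), ...]}, one pair per subject.
    """
    per_protein = []
    for pairs in duplicates.values():
        # Assumption: one CV per subject from its two plate values, averaged per protein.
        per_protein.append(mean(cv_percent(pair) for pair in pairs))
    return median(per_protein)
```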

Information related to annotation of the 4,137 human proteins is provided in table S1.

Prior to the analysis of the protein data, we applied a Yeo-Johnson transformation on the

proteins to improve normality, symmetry and to maintain all protein variables on a similar

scale (35). Furthermore, we examined the pairwise correlation between all proteins measured

in the AGES population and found the median rho to be close to zero (rho = 0.0286,

interquartile range Q1 = -0.0577 and Q3 = 0.1268) consistent with there being no bias in the

data due to potential off-target binding of the aptamers.
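
For reference, the Yeo-Johnson transformation mentioned above has a simple closed form for a given parameter λ. The sketch below implements that standard formula for a single value; how λ was chosen for each protein is not stated in the text, so it is left as an input here (in practice it is often estimated by maximum likelihood, for instance with `scipy.stats.yeojohnson`). The function name is hypothetical.

```python
import math


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value x for parameter lam."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)                    # lam == 0 branch
        return ((x + 1) ** lam - 1) / lam
    if abs(lam - 2) < 1e-12:
        return -math.log1p(-x)                      # lam == 2 branch
    return -(((-x + 1) ** (2 - lam)) - 1) / (2 - lam)
```

With λ = 1 the transform is the identity, which is a convenient sanity check; unlike the Box-Cox transform, it is also defined for zero and negative values.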

2.1 Direct measures of aptamer specificity: confirmation of SOMAmer enrichment from

complex biological samples using mass spectrometry

SOMAmer reagents are selected not just for their affinity but also for low dissociation rates

(slow off-rates) with their target proteins (11). This kinetics-based element together with the

use of excess poly-anionic competitors and stringent wash steps during the screening process

can overcome non-specific binding interactions (11). To verify this, the authors of that study selected and

purified a random subset of 20 SOMAmer-protein complexes, followed by mass

spectrometry (MS) sequencing, and confirmed the specificity for all of them with negligible

amounts of contaminants (11). We have now expanded this work significantly to confirm

the specificity of a much larger set of 779 SOMAmers (tables S3 and S4). Thus, in an effort to

confirm binding of SOMAmers on the custom designed SOMAscan platform to their

respective targets in an endogenous matrix, we have conducted a series of experiments using

SOMAmers to enrich proteins in complex biological matrices followed by measurements

using two mass spectrometry techniques: data dependent analysis (DDA) and multiple

reaction monitoring (MRM). Description of the experiments, results and data release is

detailed below.

For data dependent analysis, the library of 4,783 SOMAmers was multiplexed in sets

of 8 and used for enrichment of target proteins from cell lysate, conditioned media, human

plasma, and human serum. A subset was also screened in urine. Cell lines were selected

from the Cancer Cell Line Encyclopedia (CCLE) (36) for screening by comparing gene

expression of the target proteins measured by RNA sequencing across the CCLE. Using the

criterion of a Fragments Per Kilobase of transcript per Million mapped reads (FPKM) value greater than 5 for the target

protein transcript, a reduced number of cell lines were cultured to maximize coverage across

the proteins represented in the SOMAmer library. For a given cell line, the presence of at

least 8 target proteins with FPKM values greater than 5 was applied as a cutoff for inclusion.

It was not feasible to cover approximately 400 target proteins applying these criteria and these

proteins were screened in serum and plasma only. For enrichment from biological matrices,

SOMAmers were combined into sets of 8 such that the potential interaction with similar

proteins or binding partners was minimized, for example, isoforms and closely related

homologs were not included in the same batch of 8 targets. Non-specific binding of the

SOMAmers was assessed in each matrix by using a SOMAmer generated against the bacterial

protein phosphoadenosine phosphosulfate reductase (CysH). Target protein spectral counts in

the SOMAmer enriched samples were compared to the respective CysH control. A positive

hit is defined as target protein detection with a minimum of 2 spectral counts and a signal

more than 10-fold over the CysH background.
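
The positive-hit rule above (at least 2 spectral counts and more than 10-fold signal over the CysH control) can be sketched as a simple predicate. The function name and the handling of a zero-count CysH background, which the text does not address, are assumptions made for illustration.

```python
def is_positive_hit(target_counts, cysh_counts, min_counts=2, min_fold=10.0):
    """Apply the DDA hit rule to spectral counts from one enrichment experiment."""
    if target_counts < min_counts:
        return False
    if cysh_counts == 0:
        # Assumption: no background counts is treated as an unbounded fold change.
        return True
    return target_counts / cysh_counts > min_fold
```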

The SOMAmers were screened following established protocols (37). The SOMAmer

mix was combined with 1 mg lysate, 500 µL of conditioned media or 100 µL of plasma,

serum or urine and diluted to 1000 µL total volume with buffer (40 mM HEPES, pH 7.5, 100

mM NaCl, 5 mM KCl, 5 mM MgCl₂, 1 mM EDTA, 0.5% NP-40). Enriched proteins were

released from the SOMAmers via denaturation, followed by reduction, alkylation, and

digestion with Trypsin/Lys-C. For plasma and serum matrices, a deglycosylation step with

PNGaseF was added before Trypsin/Lys-C digestion. Peptides were reconstituted in 40 µL

2% acetonitrile 0.1% formic acid and analyzed using a nanoflow liquid chromatography

system (Proxeon nano-LC) coupled to a data dependent mass spectrometer (LTQ-Orbitrap-

Velos or Q Exactive Plus mass spectrometer, Thermo Scientific, San Jose, CA). The sample

(2 µL) was loaded onto an EASY-Column™ (2 cm x 100 µm ID, 5 µm, 120Å C18-A1,

ThermoFisher Scientific) and separated using a 75 µm ID Picotip emitter with a 15 µm

diameter tip (Cat No PF360-75-15-N-5, New Objective, Woburn, MA) hand packed with

Magic C18 100Å 3 µm resin to a length of 12 cm. The following gradient conditions at a

flow rate of 400 nL/min (Mobile Phase A: 100% water 0.1% formic acid (FA); Mobile Phase

B: 90% acetonitrile (ACN), 10% water 0.1% FA) were used: 2-30% B over 40 min, 30-80%

B over 5 min, 80% B for 2 min, followed by column washing and equilibration. The mass

spectrometer was operated in the standard scan mode with positive ionization, electrospray

voltage of 2.75 kV and ion transfer tube temp of 275°C. Full MS spectra were acquired in the

Orbitrap mass analyzer over the 325-2000 m/z range with 60,000 mass resolution, AGC target

of 1x10^6, and 500 ms injection time. The 10 most intense peaks with a charge state ≥2 were

acquired with the LTQ ion trap with 1 microscan, minimum signal threshold of 5,000 counts,

100 ms injection time and 10,000 AGC. Dynamic exclusion was enabled with a repeat

duration of 10 sec, exclusion list size of 500, exclusion duration of 10 sec, and exclusion mass

width relative to reference mass (+/- 10 ppm). Comparable parameters were used on the Q

Exactive Plus. The data were searched using Uniprot Human canonical database (v Jan 2014)

with common contaminants and reverse database appended (43,136 sequences; 23,452,844

residues). Fifteen proteins on the SOMAmer list are present in the contaminants database, so

this subset of proteins was searched both with and without this database. Raw data were

processed with Mascot (v 2.4) using default parameters: trypsin enzyme specificity allowed

for up to 2 missed cleavages, monoisotopic mass values, unrestricted protein mass, peptide

mass tolerance ±15 ppm, and fragment mass tolerance ±0.8 Da. The fixed modification

Carbamidomethyl (C) and the following variable modifications were selected: Oxidation (M),

phospho (ST), and phospho (Y). In samples treated with PNGaseF, the additional variable

modification of deamidation (NQ) was used. The PeptideProphet and ProteinProphet

algorithms were used for peptide and protein identification, respectively (ISB/SPC Trans

Proteomic Pipeline TPP v4.3 JETSTREAM rev 1, Build 200909091257 (MinGW)). Protein

results were filtered using a false discovery rate (FDR) of less than 1%.
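
As a small worked illustration of the ±15 ppm peptide mass tolerance quoted above, the sketch below computes a relative mass error in parts per million and checks it against a tolerance. It is not a description of how Mascot applies the setting internally; the function names are hypothetical and the inputs are assumed to be neutral monoisotopic masses in Da.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=15.0):
    """True when the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm
```

For a peptide of 1000 Da, a 15 ppm window corresponds to ±0.015 Da, so an observed mass of 1000.010 Da would be accepted and 1000.020 Da would not.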

The MRM method was employed for selected protein targets in follow-up work in the

same biological matrices (cell lysate, conditioned media, serum, plasma and urine). The

higher sensitivity of MRM detection results in a higher success rate when compared to DDA,

however the additional time and expense requirements preclude the use of this methodology

for large-scale screening of the large SOMAmer library on the current custom designed

SOMAscan. Multiple tryptic peptides (minimum of 3 per protein) were selected using

standard criteria (38). Heavy-labeled (¹³C₆¹⁵N₄-arginine, ¹³C₆¹⁵N₂-lysine) peptides were

synthesized to act as internal standards (crude or >97% purity with concentration determined

by AAA, JPT and ThermoFisher Scientific). Peptide optimization and transition selection was

completed using Skyline software (MacCoss Lab Software, University of Washington) (37,

39). SOMAmer enrichment was performed as above. Peptides were reconstituted in 10 µL

2% acetonitrile 0.1% formic acid, diluted in a mixture of respective internal standard peptides,

and analyzed using a nanoflow liquid chromatography system (Proxeon nano-LC) coupled to

a triple quadrupole mass spectrometer (TSQ Vantage or TSQ Altis, Thermo Scientific, San

Jose, CA). The sample (2 µL) was loaded onto an EASY-Column™ (2 cm x 100 µm ID, 5

µm, 120Å C18-A1, ThermoFisher Scientific) and separated using a 75 µm ID Picotip emitter

with a 15 µm diameter tip (Cat No PF360-75-15-N-5, New Objective, Woburn, MA) hand

packed with Magic C18 100Å 3 µm resin to a length of 12 cm. The following gradient

conditions at a flow rate of 250 nL/min (Mobile Phase A: 100% water 0.1% formic acid

(FA); Mobile Phase B: 90% acetonitrile (ACN), 10% water 0.1% FA) were used: 2-40% B

over 25 min, 40-80% B over 1 min, 80% B for 3 min, followed by column washing and

equilibration. The mass spectrometer was operated in the SRM scan mode with positive

ionization, electrospray voltage of 1800 V, capillary temperature of 225°C (TSQ Vantage) or

325°C (TSQ Altis), Q1 and Q3 resolution settings of 0.7 FWHM, and a cycle time of 1.0

second. Collision energy (CE) parameters were calculated using linear equations in Skyline

and collision cell gas pressure of 1.0 mTorr was used for fragmentation. Positive detection

was defined using standard criteria of co-elution and equivalent transition patterns with the

internal standard, as well as the absence of interferences. Data were processed using Skyline

software.
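
To illustrate how a heavy-labeled internal standard is separated from its endogenous (light) counterpart in MRM, the sketch below computes the expected precursor m/z shift for the two labels named above. The isotope mass differences are standard values; the function name is hypothetical, and the assumption that each tryptic peptide carries exactly one labeled residue at its C-terminus is ours, not stated in the text.

```python
# Mass differences between heavy and light isotopes, in Da (standard values).
DELTA_13C = 1.0033548
DELTA_15N = 0.9970349

# Mass shift per labeled residue: 13C6 15N4-arginine and 13C6 15N2-lysine.
HEAVY_SHIFT_DA = {
    "R": 6 * DELTA_13C + 4 * DELTA_15N,
    "K": 6 * DELTA_13C + 2 * DELTA_15N,
}


def heavy_precursor_mz(light_mz, charge, c_terminal_residue):
    """Precursor m/z of the heavy standard, assuming one labeled C-terminal R or K."""
    return light_mz + HEAVY_SHIFT_DA[c_terminal_residue] / charge
```

For a doubly charged peptide ending in arginine, the heavy precursor would be expected about 5.004 m/z units above the light one under these assumptions.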

Results of the mass spectrometry experiments were combined into a database

containing confirmatory evidence of 779 SOMAmer reagents binding their endogenous

targets (736 by DDA and 104 by MRM). The raw DDA data have been deposited to the

ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE

partner repository (40), with the five dataset identifiers PXD008819-PXD008823. The raw

MRM data have been deposited to ProteomeXchange Consortium via the Peptide Atlas

PASSEL repository with the dataset identifier PASS01145. These databases can be used to

aid in the prioritization of SOMAscan results for technical and biological validation studies.

Results from the larger DDA screening efforts are utilized to set up targeted MRM assays for

additional follow-up. The annotation of the 779 SOMAmer reagents is provided in tables S3

and S4.

2.2 Inferential support for SOMAmer specificity towards target proteins

Here we highlight the use of inferred measures of SOMAmer specificity towards their

respective targets including: 1) Cross-platform validation of a number of known protein

biomarkers. Figure S2A demonstrates strong correlation between different measurements of

the well characterized serum proteins insulin (INS), C-reactive protein (CRP) and natriuretic

peptide B (NPPB) in the AGES population using either the custom-designed SOMAscan or

standard immunoassays. Although highly significant, the correlation for INS (r = 0.680, $P = 1\times10^{-264}$)

is apparently not as marked as for NPPB (r = 0.915, $P < 1\times10^{-300}$) and CRP (r = 0.984, $P < 1\times10^{-300}$).

It is noted, however, that the aim of the present study is not clinical assay

validation and development but discovery of new biomarkers that will enable us, and others,

to expand the toolkit for the development of novel diagnostics and therapeutics. Next, we

applied the SOMAscan to confirm the associations of 20 different protein biomarkers with

relevant phenotypic measures previously found through standard immunoassays (table S5 and

figs. S2B and S3). Figure S2C highlights cross-platform validation of the known associations

of elevated serum levels of NPPB and growth differentiation factor 15 (GDF15) with reduced

survival probability post incident CHD (41, 42). 2) Assessment of cis-pSNPs is an internal

measure of specificity for the SOMAmer-protein interactions (see section “S4.1 Identification

of cis- and trans-acting protein SNPs” below). Here, a cis SNP is proximal to a given protein

encoding gene and affecting variable levels of the cognate protein, detected by a SOMAmer

designed to bind the cognate protein. In other words, the proximal cis variant localizes the

SOMAmer to the intended target protein, thus supporting its target specificity. 3) We note that

many of the results presented in the current work including for instance the functional

annotation of network modules and the expected stronger links of hub proteins to disease,

indirectly support the specificity of the aptamers towards the intended target. Overall, these

data in combination with the mass spectrometry approaches described above indicate

consistent target specificity across the platform. Table S6 lists all direct and inferential

measures of aptamer specificity in the present study. We note, however, that direct validation

of aptamers which still lack information regarding their target binding specificity is an

ongoing process.

3. Construction of the protein co-regulation network

We used a previously described method coupled with the Weighted Gene Co-expression Network

Analysis (WGCNA) R package (43). Biological metabolic networks are scale-free as regards

topology and any network that doesn't reflect this property is unreliable (44). Scale-free

networks have a degree distribution that follows a power law. The measure of the network's

scale-free topology is the $r^2$ coefficient, which is the fitting index for the linear model

regressing $\log(p(k))$ on $\log(k)$, $k$ being the connectivity and $p(k)$ its distribution, where $r^2 = 1$

signifies a perfectly scale-free network. This is never the case with real-world biological

networks, so the criterion requires a nearly scale-free topology ($r^2 \gtrsim 0.8$). The

method starts by collecting in a matrix the Pearson correlation, $s_{ij} = \mathrm{cor}(x_i, x_j)$, between each

pair of proteins. This correlation matrix is then transformed through $a_{ij} = |s_{ij}|^{\beta}$, which are

the elements of the adjacency matrix, $A$. This power transformation penalizes weak

correlations far more than strong ones, so that less meaningful weak correlations are suppressed

relative to strong ones, which in turn decreases noise and increases network

robustness. This condition alone constrains the value of $\beta$. Due to our large number of

samples and dynamic range in protein levels we were able to afford using $\beta = 5$ even though

the community standard for unsigned networks like ours is $\beta = 6$. Further details regarding

the scale-free property of the serum protein network are presented in the section “3.1

Assessing the robustness of the serum protein network” below.
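
The soft-thresholding step and the scale-free fit index described above can be sketched in a few lines. This is a simplified illustration and not the WGCNA implementation: the function names are hypothetical, the diagonal is set to zero so that a protein does not count as its own neighbor, and the binning of connectivities into 10 equal-width bins before the log-log regression is an assumption loosely modeled on common practice.

```python
import math


def adjacency(corr, beta=5):
    """Soft-threshold a correlation matrix: a_ij = |s_ij|**beta, zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def connectivity(adj):
    """Connectivity k_i as the row sum of the adjacency matrix."""
    return [sum(row) for row in adj]


def scale_free_r2(k, n_bins=10):
    """r^2 of the linear fit of log10 p(k) on log10 k over binned connectivities."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins            # assumes connectivities are not all equal
    counts = [0] * n_bins
    totals = [0.0] * n_bins
    for value in k:
        b = min(int((value - lo) / width), n_bins - 1)
        counts[b] += 1
        totals[b] += value
    xs, ys = [], []
    for count, total in zip(counts, totals):
        if count > 0 and total > 0:       # skip empty bins
            xs.append(math.log10(total / count))     # mean connectivity in the bin
            ys.append(math.log10(count / len(k)))    # fraction of nodes in the bin
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

As a numerical illustration of the penalty on weak correlations, with $\beta = 5$ a correlation of 0.9 maps to an adjacency of about 0.59, whereas a correlation of 0.2 maps to 0.00032.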

In hyperspace, the distance between two proteins is given by $d_{ij} = 1 - a_{ij}$ and is

called the dissimilarity measure. The WGCNA package uses a hierarchical clustering

algorithm to group closely co-expressed proteins into a tree, and the Dynamic

Tree Cut package (29) cuts branches according to specific morphological characteristics

(branch size, structure, etc.). Each cut branch represents a module, a group of closely related

proteins. The connectivity of a protein is simply the sum of all the adjacencies with all the

other proteins, $k_i = \sum_{j \neq i} a_{ij}$, where intra-module connectivity (kWithin) is the same concept

but only for proteins inside a specific module. The maximum connectivity, $k_{max}$, is simply the

largest connectivity among all proteins or, for intra-module connectivity, among all

proteins within a specific module. For ease of comparison between modules, connectivity

values were scaled as $K_i = k_i / k_{max}$. Protein significance is the absolute value of the correlation

between a protein and a disease trait raised to the power $\beta$, $PS_i = |\mathrm{cor}(x_i, T)|^{\beta}$. By plotting the scaled intra-modular

connectivity of all proteins versus the protein significances we can uncover which proteins

were most strongly associated with each trait. The slope of the regression line is the hub

protein significance, defined as $\mathrm{HubProtSignif} = \sum_i PS_i K_i / \sum_i (K_i)^2$. Studies have shown that highly

connected hub proteins are essential for yeast survival and are preserved across species (17,

45-49). Next, we characterized each module's eigenvector, or eigenprotein ($E^{(q)}$, i.e. the

first principal component of a given module q), through a singular value decomposition

and transformation of the variable protein levels for any given module. The $E^{(q)}$s represent most

closely the behavior and biological relevance of each of the 27 modules as these modules can

be viewed as independent sub-networks. Finally, we carried out 1,000 permutations to test

whether the network modules could be derived from random data and performed module

preservation analysis as described in more details in the section “3.1 Assessing the robustness

of the serum protein network” below. Network visualization was performed with the igraph

package in R, a circle graph for smaller modules and spring graph for the larger modules (30).
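
The scaled connectivity and hub protein significance defined above can be sketched directly from their formulas. The function names are hypothetical, and the inputs are assumed to be plain lists of connectivities and protein significances for the proteins of one module. Note that the ratio $\sum_i PS_i K_i / \sum_i (K_i)^2$ is the least-squares slope of a regression of $PS_i$ on $K_i$ without an intercept term.

```python
def scaled_connectivity(k):
    """Scale connectivities by the maximum: K_i = k_i / k_max."""
    k_max = max(k)
    return [ki / k_max for ki in k]


def hub_protein_significance(ps, scaled_k):
    """Slope of the no-intercept regression of protein significance on K."""
    numerator = sum(p * c for p, c in zip(ps, scaled_k))
    denominator = sum(c * c for c in scaled_k)
    return numerator / denominator
```

For instance, if protein significance were exactly proportional to scaled connectivity, $PS_i = 0.4 K_i$, the hub protein significance would be 0.4.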

3.1 Assessing the robustness of the serum protein network

In the random network model of Erdős–Rényi, the degree connectivity of nodes follows a

Poisson distribution (13). Random networks, however, neither capture the degree distribution

nor the clustering coefficient of networks based on real data (13). Instead, real networks are

more clustered and consist of few highly connected hubs (13). In other words, biological

networks are not random but follow a scale-free power-law distribution (13, 14). The scale-

free criterion is imposed for the simple reason of cleaning out spurious and/or noise induced

connections. If we build the serum protein network without the scale-free based constraints

using power transformation, we should still get a modular network like before with several

expected differences like fewer and larger clusters since there is no punishment on the weak

connections between proteins. To confirm this, we reconstructed the network by omitting any

power transformation ($\beta = 1$), hence removing the scale-free criterion altogether. This resulted

in a network consisting of 11 large modules as opposed to 27 modules of smaller sizes (fig.

S4A, B). The larger size of modules is due to the fact that there was no punishment imposed

on weak connections between protein nodes. Thus the group of unconnected proteins goes

from 716 to 2 proteins. The smaller number of clusters in the un-tuned network is due to the

fact that all proteins are included, both the weakly and strongly connected; thus many bridges

appear in the space between different clusters and close ones merge. Furthermore, the proteins

within these 11 modules show characteristic differential degree of connectivity as some

proteins were more strongly connected than others within the network. Thus the baseline un-

tuned serum protein network is scale-free as was initially anticipated.
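
The effect of the power transformation on weak versus strong connections can be illustrated with a minimal sketch. It assumes, as in standard weighted co-expression network analysis, that the adjacency between two proteins is the absolute correlation raised to the power β. The function names and the toy correlation matrix are hypothetical and are not taken from the study's code.

```python
import numpy as np

def soft_threshold_adjacency(corr, beta):
    """Assumed adjacency: |correlation| ** beta, with the diagonal set to zero."""
    adj = np.abs(np.asarray(corr, dtype=float)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def total_connectivity(adj):
    """kTotal of each node: the sum of its adjacencies to all other nodes."""
    return adj.sum(axis=1)

# Toy correlation matrix: proteins 0 and 1 are strongly correlated,
# protein 2 is only weakly correlated with both.
corr = np.array([[1.0, 0.9, 0.3],
                 [0.9, 1.0, 0.3],
                 [0.3, 0.3, 1.0]])

k_beta1 = total_connectivity(soft_threshold_adjacency(corr, beta=1))
k_beta5 = total_connectivity(soft_threshold_adjacency(corr, beta=5))
```

In this toy case the weak correlation of 0.3 keeps its full weight at β = 1 but shrinks to 0.3^5 ≈ 0.0024 at β = 5, whereas the strong correlation of 0.9 remains at about 0.59. This is the sense in which the transformation penalizes weak connections.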

The significance of the serum protein network was assessed by comparing the true

network to potential networks derived from 1,000 permutation tests through randomization of

each protein's data across the AGES subjects. Here, we ran the co-expression algorithm

repeatedly and counted the number of modules created with the same parameters and

restrictions as were used for the true non-randomized protein data, i.e. that no cluster shall

have fewer than 20 proteins and the β value was 5 for transforming the data. In our 1,000

permutation tests, we did not detect a single protein module based on these criteria beyond a

single group of a handful of proteins that clustered by chance. In contrast, with these

constraints, the true non-randomized protein data presented with 27 clusters/modules which

we in addition have shown to contain a deeper biological meaning through enrichment of

distinct functional categories and links to disease (see main text). Because of this, a z-test,

$z = \frac{x - \mu}{\sigma}$, where $x$ is the number of modules from the real network, $\mu$ is the mean number of

modules from the permutation tests and $\sigma$ is the standard deviation of the permutation test

results, is trivial. This is also apparent in the comparison between the degree connectivity

(kTotal) of proteins from the real network and corresponding proteins from the network based

on the permuted data (fig. S4C). More to the point, the mean kTotal for proteins from the real

network was 9.950, while kTotal was 0.000018 for the randomized protein data. In summary,

we have shown that the modularity and degree connectivity of the serum protein network is

highly robust and could not be explained by random chance. Finally, by systematically

changing the tuning parameter β between 1 and 10, no network structures appeared based on

the randomized dataset.
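
The permutation scheme described above can be sketched as follows. This is a minimal illustration on simulated data, not the pipeline used in the study: each protein's values are shuffled independently across subjects, which preserves each protein's own distribution but breaks any co-regulation, and the mean connectivity is then compared between the original and the shuffled data. The |correlation|^β adjacency, the simulated data and all names are assumptions.

```python
import numpy as np

def mean_connectivity(data, beta=5):
    """Mean kTotal under an assumed |correlation| ** beta adjacency.

    data: array of shape (n_subjects, n_proteins).
    """
    adj = np.abs(np.corrcoef(data, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj.sum(axis=1).mean()

def permute_within_proteins(data, rng):
    """Shuffle every protein (column) independently across subjects."""
    return np.column_stack([rng.permutation(col) for col in data.T])

rng = np.random.default_rng(0)
n_subjects, n_proteins = 500, 30

# Simulated data: a shared latent factor induces co-regulation among proteins.
latent = rng.normal(size=(n_subjects, 1))
data = latent + 0.5 * rng.normal(size=(n_subjects, n_proteins))

k_real = mean_connectivity(data)
k_perm = np.array([mean_connectivity(permute_within_proteins(data, rng))
                   for _ in range(100)])
```

In such a simulation the permuted connectivities cluster near zero with very little spread, so any z-type comparison against the real value is extreme, which mirrors the argument in the text that a formal z-test adds little.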

To test the preservation of the network structure, we split the AGES cohort into a

training set (main network) and an independent test set (to compare to), either by 2/3 vs.

1/3 split or 1/2 vs. 1/2 split, applying a suite of statistics for quantifying the preservation of a

module’s topology as described in Langfelder et al. (15). Langfelder et al. applied the

preservation model (summary Z score statistics) successfully on many independent datasets to

show preservation of networks and pathways within and across species and datasets, as well

as preservation of sex differences across datasets (15).

We applied the summary Z score statistics, which produce a multitude of indicators

showing many facets of a given network:

$$Z_{summary}^{(q)} = \frac{Z_{density}^{(q)} + Z_{connectivity}^{(q)}}{2}$$

Here we raised the following questions:

1. Density Z score: Are the modules denser than the background density created from

randomized data?

2. Connectivity Z score: Is hub protein status preserved between the training and test

datasets?

The results are exhibited in Fig. 1B for the 1/2 vs. 1/2 split and fig. S10 for the 2/3 vs. 1/3

split. Here the summary Z score <2 indicates no preservation, 2< summary Z score <10

indicates moderate evidence of preservation, while a summary Z score >10 indicates strong

evidence of preservation. The summary Z-score thresholds were derived empirically as

previously described (15), applying multiple types of simulation tests and aggregating

multiple preservation statistics into a summary preservation statistic. The Bonferroni adjusted

P-value significance of the summary Z-score in our dataset for the different protein modules

was between 1×10⁻³⁵⁸ and 1×10⁻²², revealing strong preservation of the serum protein

network. More to the point, all the 27 modules of the serum protein network showed summary

Z score >10 indicating strong validation of the network topology including connectivity status

and module density (Figs. 1B and S10A, B).
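
The interpretation thresholds quoted above can be written down as a short sketch. It assumes that the summary statistic is the mean of the density and connectivity Z scores, as defined by Langfelder et al. (15); the function names and the example values are hypothetical.

```python
def summary_z(z_density, z_connectivity):
    """Average of the density and connectivity Z scores for one module."""
    return (z_density + z_connectivity) / 2.0

def preservation_evidence(z_summary):
    """Map a summary Z score to the evidence categories quoted in the text.

    Scores exactly equal to 2 or 10 are not covered by the text; assigning
    them to the 'moderate' category is an assumption of this sketch.
    """
    if z_summary < 2:
        return "none"
    if z_summary > 10:
        return "strong"
    return "moderate"

# Hypothetical module with a density Z of 14 and a connectivity Z of 9.
example_z = summary_z(14.0, 9.0)
example_label = preservation_evidence(example_z)
```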

4. Genetic studies and statistical analyses

Genotyping was conducted using the Illumina Hu370CNV Array on 3,200 of the AGES study

subjects and SNPs and gene targets mapped using the GRCh37 build reference sequence.

AGES subjects were imputed with the imputation reference panel 1000G v3 for all ethnicities,

through the use of MACH v 1.0.16 (50). Also, genotypes assayed through the exome-wide

genotyping array Illumina HumanExome Beadchip were available for all the 5,457 AGES

subjects. For detection of network-associated protein SNPs (npSNPs) we applied the conventional

P-value threshold of 5×10⁻⁸ for genome-wide significance and a P-value between 5×10⁻⁶ and

5×10⁻⁸ for suggestive evidence of associations. All AGES study cohort members were

European Caucasians. For a cis effect we considered an arbitrary 300kb window

across and including the protein coding gene in question, given that the majority of cis-acting signals

detected for mRNA levels are found within that window (8), and assessed the window-wide

significance by correcting for the number of SNPs tested in each window. For detection of cis-trans

pairs we used the Bonferroni corrected P-value threshold of 1×10⁻⁸ (adjusted for number

of proteins and cis SNPs tested). A more detailed description of the detection and

characterization of the cis and trans effects in the present study is found below.

For all single-point SNP association analyses we applied linear regression using an

additive genetic model. For the associations of individual proteins and modules (eigenvectors)

to different phenotypic measures we used forward linear or logistic regression or Cox

proportional hazards regression, depending on the outcome being continuous, binary or a time

to an event. Given consistency in terms of sample handling including time from blood draw to

processing (between 9-11 am), same personnel handling all specimens and the ethnic

homogeneity of the population we adjusted only for age and sex in all our regression and

network-based analyses unless otherwise noted.
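
A single-point association under the additive genetic model, adjusted for age and sex, can be sketched as an ordinary least-squares fit. This is an illustration on simulated data only; the function name, the simulated allele frequency and the true effect size are assumptions and do not come from the study.

```python
import numpy as np

def additive_snp_association(protein, dosage, age, sex):
    """Ordinary least squares of protein level on allele dosage, age and sex.

    dosage counts copies of the effect allele (0, 1 or 2; imputed dosages
    may be fractional). Returns the per-allele effect estimate (beta).
    """
    design = np.column_stack([np.ones_like(dosage, dtype=float), dosage, age, sex])
    coef, *_ = np.linalg.lstsq(design, protein, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
n = 2000
dosage = rng.binomial(2, 0.3, size=n).astype(float)
age = rng.uniform(66, 96, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
# Simulated protein level with a true per-allele effect of 0.4.
protein = 0.4 * dosage + 0.01 * age + 0.2 * sex + rng.normal(size=n)

beta_hat = additive_snp_association(protein, dosage, age, sex)
```

Coding the genotype as an allele count is what makes the model additive: each extra copy of the effect allele shifts the expected protein level by the same amount.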

4.1 Identification of cis- and trans-acting protein SNPs

To identify proximal cis-acting effects on serum proteins we classified a pSNP-protein

pair as cis-acting if the pSNP was no more than 150kbp up- or downstream of the protein

coding gene, or within the introns and exons of the corresponding gene. We defined the

window-wide significance for the cis effects, by adjusting the P-value for number of SNPs

tested in each gene window (P-value threshold between 6×10⁻⁶ and 5×10⁻⁴). We identified

1,046 significant cis pSNP-protein associations, or 25.3% of all human proteins screened.

Table S13 lists all significant cis-acting pSNP-proteins detected in the present study, as a

single lead pSNP per region showing the best P-value. For now, independent cis effects per

region were not considered. It is of note that 39.5% of all pSNPs were located within the

introns, exons or untranslated regions of the corresponding protein gene. Figure S11A

highlights some cis-acting effects on protein levels depending on if the lead pSNP was

missense, in UTR regions, intronic or intergenic.
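
The cis definition and the window-wide correction described above can be sketched as follows. The 150kb flank comes from the text; the significance level of 0.05, the coordinates in the example and the function names are assumptions made for illustration.

```python
CIS_FLANK_BP = 150_000  # distance up- or downstream of the gene, from the text

def is_cis(snp_pos, gene_start, gene_end, flank=CIS_FLANK_BP):
    """True if the SNP lies within the gene or within `flank` bp of it.

    Assumes the SNP and gene are on the same chromosome and that
    gene_start <= gene_end in the same coordinate system.
    """
    return gene_start - flank <= snp_pos <= gene_end + flank

def window_wide_threshold(n_snps_tested, alpha=0.05):
    """Bonferroni threshold for one gene window (alpha = 0.05 is assumed)."""
    return alpha / n_snps_tested

# Hypothetical gene spanning 1,200,000-1,250,000 bp.
near_snp = is_cis(1_100_000, 1_200_000, 1_250_000)   # 100 kb upstream
far_snp = is_cis(1_000_000, 1_200_000, 1_250_000)    # 200 kb upstream
threshold_100_snps = window_wide_threshold(100)
```

Under the assumed significance level of 0.05, a window with 100 tested SNPs gives a threshold of 5×10⁻⁴, which would be consistent with the upper end of the range quoted in the text.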

To provide some insights into how the human serum proteome variation is regulated,

we cross-referenced the cis pSNP-proteins detected in serum to previously identified

expression SNPs/QTLs (eSNPs/eQTLs) identified in >30 different tissues and cell types using

PhenoScanner (20), and applying stringent cutoffs of P < 5×10⁻⁸ and SNP proxies at r²≥0.8

for significant matches. We found that 37.3% of the pSNP-proteins matched corresponding

eSNP-transcripts identified in one or more solid tissues (table S14). This suggests that about 60%

of the genetic effects on serum protein levels are either mediated by as yet unknown

transcriptional effects and/or post-transcriptional mechanisms, which is consistent with

previous data from human and yeast studies (51).

Given that cis-acting pSNPs are functionally annotated variants, in that they affect population

variation of adjacent proteins, they can be useful as genetic instruments in Mendelian

randomization studies to infer causality between a protein and disease (pSNP → Protein →

Disease). We cross-referenced all cis pSNPs detected in the present study with GWAS lead

SNPs reported in the PhenoScanner using P < 5×10⁻⁸ for genome-wide significance and

r²≥0.8 for relevant SNP proxies (20). We found that 232 or 20.7% of all cis-acting serum

pSNPs matched GWAS lead SNPs associated with various disease-related phenotypes

including inflammatory bowel disease (IBD), adiposity, age related macular degeneration

(AMD), blood pressure, CHD, Crohn's disease, hematological parameters, lipoprotein

fractions, late-onset Alzheimer's disease (LOAD), DNA methylation status, MetS, multiple

sclerosis (MS), prostate cancer, rheumatoid arthritis (RA), systemic lupus erythematosus

(SLE), diabetes and venous thrombosis (table S15).

Theoretical and experimental studies suggest that network hubs are evolutionarily

conserved and robust against disturbances like deleterious mutations or hub removal (7, 17,

18, 21, 52). In other words, a removal of a hub protein in biological networks will have a

larger effect on phenotype outcome than a removal of a random protein. In fact we have

shown that protein hubs are more strongly connected to various disease outcomes than less

well connected proteins within the serum protein network (Figs. 3 and S9). We explored if the

proteins affected by cis pSNPs showed differential degree of connectivity depending on either

the strength of the β-coefficient or in comparison to other proteins across the serum protein

network. First we found that the mean connectivity was significantly lower among proteins

with a detected cis effect compared to proteins with no detected cis effect (fig. S11B). Here,

the mean kTotal was 10.8 for proteins with no cis effects vs. 7.7 for proteins with significant

cis effects (down by 28.2%, P = 3×10⁻¹⁶). Secondly, there was a significant negative

correlation between the standardized β-coefficient (absolute values) of the cis effects and the

network connectivity of corresponding cis serum proteins (r = -0.231, P = 1×10⁻¹⁵) (fig.

S11C). Thus cis pSNP-protein effects were significantly under-represented among highly

connected protein nodes which may reflect a relaxed selective constraint on proteins with low

connectivity. These results are in agreement with previous observations showing hub proteins

to be essential and evolutionarily conserved (7, 17, 18, 21, 52), and to have a greater effect on

disease outcome (17). The observed phenomenon described above is not restricted to

humans, but has been noted in other kingdoms as well including plants (21).

We tested the association of each cis-acting pSNP to all proteins screened in the

present study (table S17). Here, 16.0% of the cis pSNPs affected one or more proteins in

trans at a Bonferroni adjusted P-value <1×10⁻⁸, or 911 proteins in trans (table S17). Thus

together with the proteins regulated in cis, the cis-acting pSNPs affected levels of 1,954 serum

proteins. Of interest, 40.7% of the cis pSNPs that were trans-acting were also associated with

GWAS lead SNPs, which is an increase of 20 percentage points compared with 20.7% for all cis-acting

pSNPs (see above). This indicates that the trans effects on protein levels could be a critical

part of the mechanism(s) underlying the genetic risk at GWAS loci.

In the past 10 years, GWASs have discovered thousands of disease-associated genetic

loci providing insights into the genetic architecture of complex disease (19). Here, many

common SNPs, each SNP contributing only a small amount to the total risk, act

synergistically to influence susceptibility to a complex disease. The majority of GWAS lead

SNPs are located outside the coding regions of genes, suggesting a key role for gene

regulation in the disease aetiology. In fact, a strong enrichment of cis-acting eSNP/eQTLs

among the GWAS signals has been observed (25, 53). A recent analytical study demonstrated

that GWAS SNPs that contribute most to the heritability of a given disease are not necessarily

located near genes with disease-specific effects or found in core pathways (25). In other

words, the numerous small peripheral GWAS effects converge onto a common biological

network that integrates other signals (e.g. environmental) as well, influencing activity/levels of

core protein hub(s) which in turn can cause a disease (25). The accumulated data suggest that

cis acting pSNPs affect proteins that are located at the periphery of the network and similar to

GWAS signals may individually or synergistically affect activity/levels of neighboring

proteins including protein hubs to affect disease.

4.2 Validation of cis and trans pSNP-protein findings across different study populations and

proteomic platforms

In this section we tested the replication of previously reported cis and trans pSNP-protein

findings identified in different study populations and across different or related proteomic

profiling platforms. Given the differences in the genotyping and proteomic platforms, and the

definition of cis and trans effects between the different studies, we have used a moderate

proxy threshold of r2 ≥ 0.5 between pSNPs for any comparison of pSNP-protein pairs

between studies. Generally, however, we interrogated the associations of the reported pSNP-

protein pairs directly in our dataset, at least for the large studies.

The percentage confirmation in the AGES of previous findings was only computed for

those proteins that are detected with the present multiplex aptamer-based platform. Proteins

encoded by genes on the X chromosome were excluded from the analysis as they were not

tested for cis linked association. Given our definition of cis effects within a 300kb window

was not necessarily applied in the other studies, we have followed the study-specific

definition. In some cases, therefore, the study-specific SNPs were not the strongest

cis or trans effects identified in the present study. For these we considered P < 1×10⁻⁴ to be a

significant replication provided the effect is directionally consistent across studies.
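
The replication criterion stated above can be expressed as a small check. This is a sketch only: the function name and the example values are hypothetical, and it assumes that effect sizes from the two studies have already been aligned to the same effect allele.

```python
def is_replicated(p_value, beta_discovery, beta_replication, p_threshold=1e-4):
    """Replication rule from the text: P below the threshold and same effect sign.

    Both betas must refer to the same effect allele; harmonizing alleles
    between studies is assumed to have been done beforehand.
    """
    same_direction = beta_discovery * beta_replication > 0
    return p_value < p_threshold and same_direction

# Hypothetical pSNP-protein pairs.
consistent_pair = is_replicated(3e-6, 0.21, 0.35)
opposite_direction = is_replicated(3e-6, 0.21, -0.35)
weak_signal = is_replicated(0.01, 0.21, 0.35)
```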

Johansson et al. (54) used mass spectrometry (MS) to quantify 163 proteins in

1,060 subjects and identified cis acting effects for five proteins. These effects were all

replicated in our dataset (table S18). Kim et al. (55) screened 132 proteins in plasma of

521 subjects from the ADNI cohort using a multiplex immunoassay-based platform,

identifying 28 cis pSNP-proteins. We confirmed 73.9% of these cis effects in the AGES

(table S18). Further, Enroth et al. (56) applied a multiplex immunoassay-based platform

that quantified 92 inflammation related plasma proteins screened in 1,005 individuals

identifying cis acting effects for 23 proteins, of which 63.5% were replicated in the AGES

(table S18). Liu et al. (57) applied a SWATH mass spectrometry technique to measure 342

unique plasma proteins in 232 samples and identified cis-acting pSNPs affecting 13 proteins.

Out of the 13 proteins, eight proteins were measured with our aptamer-based platform of

which seven cis effects (87.5%) were replicated in the AGES (table S18). Thus on average,

we confirmed 74.6% of all pSNP-protein associations detected with non-aptamer based

technology.

Next, we tested replication of pSNP-protein findings in studies applying the aptamer-

based platform (58, 59). We note that these studies often report multiple pSNPs per locus,

thus we explored all cis and trans pSNPs detected in their studies for association to

corresponding serum protein(s) in the AGES cohort. For the cis and trans effects reported in

Suhre et al. (59), we confirmed 88.3% of all cis and 84.5% of all trans effects in the AGES

dataset (table S18). For instance, Suhre et al. reported 14 trans effects mediated by six

independent pSNPs at the ABO locus (59). We ran two of these trans acting pSNPs rs651007

and rs8176749 proximal to the ABO locus and confirmed all of these trans effects except for

NOTCH1. For the cis and trans effects reported in Sun et al. (58), 75.7% of the cis effects and

72.8% of the trans effects were confirmed in the AGES (table S18). For instance, Sun et al.

detected 115 proteins regulated in trans by the rs704 missense variant (NP_000629.3:

p.Thr400Met) in VTN while we detected 488 trans regulated proteins at their Bonferroni

adjusted P-value < 1.5×10⁻¹¹. The overlap between the rs704 mediated trans effects of the two

studies was 81.7%. In another example, Sun et al. detected 36 proteins that were affected by

20 independent pSNPs acting in trans at the ABO locus (58). We find that 13 of these pSNPs

affected 88 proteins in trans in the AGES dataset at P < 1.5×10⁻¹¹, with an 81% overlap of the

trans regulated proteins at the ABO locus between the two studies.

Of the aptamer-based studies mentioned above (table S18), Sun et al. (58) comes

closest to the present study as regards sample size and number of proteins measured.

However, they used a smaller version of the aptamer-based platform, with 28.3% fewer proteins

and 40% fewer study participants, who were predominantly of young age. Below we present

the reproducibility of selected examples of cis and trans effects described in Sun et al. in the

AGES dataset. Sun et al. reported a pSNP mediating a cis effect on WFIKKN2 as well as

mediating a trans effect on the myostatin protein GDF11/8 (58). We note that the cis effect

for WFIKKN2 was also reported in Suhre et al. (59). Using a window size of 300kb across

WFIKKN2, we detected a strong cis acting effect for WFIKKN2 (P = 2×10⁻⁹³), with the lead SNP also

mediating a trans effect on GDF11/8 serum levels (P = 2×10⁻⁹) (fig. S14A). Here, the lead

SNP, the synonymous variant rs9675120 (NP_783165:p.Ser135=) in WFIKKN2, was highly

correlated (r²=0.928) with pSNP rs11079936 (58). The common allele T for rs9675120 was

associated with lower levels of both WFIKKN2 and GDF11/8 (fig. S14A). Furthermore, we

find that the proteins WFIKKN2 and GDF11/8 were positively correlated in the AGES data

(fig. S14B). The direction of all effects is consistent with that reported in Sun et al. (58).

GDF11/8 has been implicated in muscular dystrophy (60), and experimental studies have

shown that WFIKKN2 has strong affinity for GDF11/8 (61). Interestingly, we found that both

WFIKKN2 and GDF11/8 map to the same protein module PM27 (table S7), a module

enriched for proteins involved in extracellular matrix organization and vascular disease. This

module is also enriched in fibrosis related signatures (62), where 8 out of the 16 well-established

fibrosis-related proteins are found in PM27 (Fisher exact test P-value = 6×10⁻⁷).

The second example from Sun et al. (58) is the GWAS locus for inflammatory bowel

disease (IBD) at the missense variant rs3197999 (NP_066278.3: p.Arg703Cys) in MST1.

This locus also affected five other proteins in trans including PRDM1 (aka BLIMP1) at

chromosome 6 (58). We find a strong cis acting effect on MST1 and significant trans acting

effects on 11 proteins including three of the five reported in Sun et al. (fig. S14C,D). In the

third and final example of replicated findings from the work of Sun et al., we focused on the

pQTL hotspot at the vasculitis associated missense variant rs28929474 (NP_001121179:

p.Glu366Lys) in SERPINA1 that was associated with 13 proteins (58). We find that

rs28929474 was associated with 17 proteins in trans, of which 8 were reported in Sun et al.

(58) and were directionally consistent across both studies (fig. S14E, F). Also, we

find that rs28929474 mediated a weak cis effect on SERPINA1 (T allele, β = -0.471, P = 8×10⁻⁶),

directionally consistent with that of Sun et al. (58).

In summary, this extensive validation and comparative study not only reveals the

robustness of our multiplex aptamer-based platform to confirm findings across independent

study populations and proteomics platforms, but highlights the added information the present

study can provide in terms of identifying links to new proteins and the relationship between

proteins in the context of the serum protein network. Although the study cohorts were

different in terms of subject recruitment, age range, health status and ethnic homogeneity, and

in the genotyping and proteomic platforms applied, on average 80% of all reported cis effects

and 74% of all trans effects were confirmed in our dataset. It is possible that study-specific cis

and trans effects exist that appear in a single study only. Finally, a lack of replication of cis

and trans effects may indicate false positive findings in the discovery study.

5. Assessment of tissue specificity of cis and trans proteins and protein modules

Transcript expression data for 53 different human tissues, as median RPKM by tissue, were

downloaded from GTEx (https://www.gtexportal.org) on 07/25/2017. The GTEx project

provides RNA-Seq based transcriptome data in over 40 tissues from hundreds of human

donors and since multiple tissues are collected from the same individuals, cross-tissue

analysis is feasible (63). The specificity score for a gene in a tissue was calculated by

subtracting from its RPKM value the mean value in all other tissues for that gene and dividing

by the standard deviation of those values. The top 0.5 to 2.5% (Z >9.24 to >2.75) were

declared as tissue specific and mapped to modules after removal of duplicate matches.

Similarly, the subset of cis-trans protein pairs was selected where mRNA levels for both

scored in the top 2.5% for tissue specificity (Z>2.75). Here, 158 cis-trans pairs showed the

same tissue specific expression while 2,119 pairs exhibited different tissue specific expression

(table S21).
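
The specificity score described above amounts to a leave-one-out Z score per gene and tissue. The sketch below is an illustration under stated assumptions: the function name and the toy expression values are hypothetical, and the use of the sample standard deviation (rather than the population standard deviation) is a choice the text does not specify.

```python
import numpy as np

def tissue_specificity_scores(expr, ddof=1):
    """Leave-one-out specificity score of one gene in every tissue.

    expr: 1-D array of expression values (e.g. median RPKM), one per tissue.
    For tissue t: (expr[t] - mean(other tissues)) / sd(other tissues).
    Using the sample standard deviation (ddof=1) is an assumption.
    """
    expr = np.asarray(expr, dtype=float)
    scores = np.empty_like(expr)
    for t in range(expr.size):
        others = np.delete(expr, t)
        scores[t] = (expr[t] - others.mean()) / others.std(ddof=ddof)
    return scores

# Hypothetical gene expressed mainly in the first of four tissues.
scores = tissue_specificity_scores([10.0, 1.0, 2.0, 3.0])
```

For the first tissue the other values are 1, 2 and 3 (mean 2, sample standard deviation 1), so the score is (10 - 2) / 1 = 8, while the remaining tissues receive negative scores.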

The npSNP discovery also allowed us to assess if the serum networks resulted from

cross-tissue regulatory control. For example, the rs704 control of VTN protein levels occurred

primarily in liver (tissue specific Z >123), and this npSNP regulated proteins across several

modules including other tissue specific proteins. For example tissue specific proteins from

five and 19 distinct tissues were regulated by VTN in the PM7 and PM10 modules

respectively (18 non-liver tissues, table S22). These results provide evidence that in a number

of cases, npSNPs affected serum levels of a tissue specific protein and that subsequently

affected variable serum levels of other proteins synthesized in distinct tissues.

Finally, we interrogated how well the protein modules agree with the gene mRNA co-

expression modules constructed in solid tissues and evaluated if similar network organization

is shared at both the protein and gene expression mRNA levels. In addition, this may help

indicate the potential tissues of origin of the serum protein network modules. The assessment

of overlaps between serum protein modules and 2,672 gene mRNA co-expression modules

constructed from whole-genome transcript information from multiple solid human tissues

(16), was based on how well two modules shared similar set of genes encoding either mRNA

or proteins. Here, we counted the number of genes/proteins that were common between a

protein module and a gene mRNA co-expression module in a given tissue and calculated the

overlap ratio of the match (fig. S5). Next, we assessed the significance of this overlap against

random expectation using Fisher's exact test. Heatmaps of overlap ratio values were used to

show that most module pairs have a very low gene member overlap (<8%) (fig. S5). A

heatmap based on the statistical test P-values showed three protein modules with weak but

significant overlaps with the tissue (mainly liver, muscle and adipose tissue) mRNA co-

expression modules (fig. S5). The accumulated data suggest that the serum protein network

arose at least in part via systemic cross-tissue regulation.
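
The module overlap assessment can be sketched with a Jaccard index and a one-sided Fisher's exact test computed from the hypergeometric distribution. This is a minimal illustration: the function names, the toy modules and the background size are hypothetical, and the choice of background set is an assumption that the text does not spell out.

```python
from math import comb

def jaccard_index(set_a, set_b):
    """Shared members divided by the number of unique members in both sets."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def fisher_enrichment_p(overlap, size_a, size_b, background):
    """One-sided Fisher's exact test P-value for an overlap at least this large.

    Computed as the upper tail of the hypergeometric distribution; the
    choice of background (all genes/proteins eligible for both modules)
    is an assumption the analyst has to make.
    """
    total = comb(background, size_b)
    upper = min(size_a, size_b)
    tail = sum(comb(size_a, k) * comb(background - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return tail / total

# Toy modules sharing two members out of eight unique members.
protein_module = {"A", "B", "C", "D"}
gene_module = {"C", "D", "E", "F", "G", "H"}
overlap_ratio = jaccard_index(protein_module, gene_module)
p_overlap = fisher_enrichment_p(2, len(protein_module), len(gene_module), background=20)
```

With many module pairs tested, the resulting P-values would still need a multiple-testing correction, such as the Bonferroni correction mentioned in the caption of fig. S5.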

Fig. S1. A general workflow of the present study. The figure demonstrates the datasets used

in the present study and the analyses of the datasets including the construction of the serum

protein network, identification of its individual protein modules and their association to

genetic variants and disease related outcomes.

Fig. S2. Cross-platform validation of protein measurements. (A) A comparison between

the SOMAmer-based technology and immunoassays measuring serum levels of C-reactive

protein (CRP), r=0.984, P<1×10⁻³⁰⁰, insulin (INS), r=0.680, P=1×10⁻²⁶⁴, and natriuretic

peptide B (NPPB), r=0.915, P<1×10⁻³⁰⁰. (B) Cross-platform validation of the correlation of

five known plasma protein biomarkers to the phenotypic measures previously observed using

immunoassays including prevalent heart failure (prev HF), metabolic syndrome (MetS), type

2 diabetes (T2D), lean (BMI<25), overweight (25≤BMI<30) or obese (BMI≥30) (see table

S5). (C) The custom-designed SOMAscan was used to confirm the association of elevated

serum levels of NPPB (red curve) and growth differentiation factor 15 (GDF15) (red curve) to

lower probability of survival post incident coronary heart disease (CHD) (highest vs. lowest

quartiles of the respective protein levels). General: controls are subjects free of the disease in

question. Data were analyzed using forward linear or logistic regression or Cox proportional

hazards regression, depending on the outcome being continuous, binary or a time to an event.

Kaplan-Meier plots were used to display survival probabilities.

Fig. S3. A correlation matrix for selected candidate proteins. The correlation matrix

demonstrates the relationship between the candidate proteins from table S5, and includes as

well the highly connected hub proteins highlighted in Figs. 3, S9 and S10.

Fig. S4. Clustering and robustness of the serum protein network. (A) Hierarchical

clustering by applying dynamic tree cut and a power transformation of 5 (β = 5) resulting in 27

protein modules each containing a minimum of 20 proteins (table S7). (B) A dynamic tree cut

using power transformation of 1 (β = 1) and a minimum of 20 proteins per module, resulting

in 11 relatively large modules compared to using power transformation β = 5, thus

maintaining the scale-free property of the network. (C) Comparison between connectivity

(kTotal) of proteins from the real network (blue curve) and corresponding proteins from a

network based on random protein data (cyan curve). Proteins (x-axis) were ordered by

annotation and increasing kTotal (y-axis). The mean kTotal for proteins from the real network

was 9.950, while the mean kTotal was 0.000018 for the randomized protein data.

Fig. S5. Heat plots of the overlap between modules of the serum protein network and

gene mRNA co-expression modules generated from solid tissues. Limited overlap was

found between protein modules within the serum protein networks and 2,672 gene co-

expression modules constructed from multiple solid tissues (21). Top panel is the overlap

heatmap, representing the overlap ratio between each pair of protein module (rows) vs gene

co-expression module (columns). Proteins in each protein module were assessed for overlaps

with genes in each gene co-expression module by Jaccard Index, defined as the number of

shared genes between the two modules divided by the sum of unique genes in both modules.

Jaccard index values are plotted on a color scale at the intersections of each protein module-

gene module pair. The best Jaccard index is only 8%, a very low overlap ratio. The bottom

panel is the overlap heatmap based on statistical significance of the module overlap analysis

as evaluated by Fisher's Exact Test with Bonferroni correction. -log10 (adjusted P-values)

were used in this heatmap. Similarly, protein modules are in rows and gene modules are in

columns, and the intersection between a row and a column is colored based on the

significance of the -log10 (adjusted P-values). Only three protein modules demonstrated

significant overlap with gene co-expression modules at the cutoff of Jaccard Index > 5% and

Bonferroni-corrected P-value < 0.05 (shown in the heatmap to the right). Among these, PM23

overlaps with only liver gene co-expression modules, PM27 overlaps with liver and muscle

gene-coexpression modules, whereas PM24 overlaps with adipose, hypothalamus and liver

gene co-expression modules. Hierarchical clustering was applied to the rows and columns of

both heatmaps, and dendrograms were plotted accordingly.

Fig. S6. A dendrogram showing the inter-module clustering of the different protein

modules via correlation of their eigenproteins ($E^{(q)}$s). PM1 does not link to any other

modules, while the other modules form four major super-clusters reflecting the functionality

shared between modules (tables S8 and S11). The numbers at the branches of the dendrogram

refer to the number of proteins found in a given protein module. Functional categories and

tissue/cell specific signatures enriched in the different super-clusters were obtained using

annotation tools like WebGestalt, DAVID, GeneMANIA and CTen, also reported in table

S11. Modules are ordered and annotated according to their inter-module relationship here as

well as throughout the present study.


Fig. S7. The relationship between module E(q)s and disease-related measures and outcomes.
(A) The modules PM7 and PM10 are members of super-cluster II. (B) Inverse association of the
modules E(PM7) and E(PM10) with prevalent heart failure (prev HF), ***P ≤ 1×10⁻¹⁶. (C) Reduced
overall survival probability for low E(PM7) levels (cyan curve) compared to high E(PM7) levels
(red curve). (D) PM16, a 170-protein module, is a member of super-cluster IV. (E) Positive
association of E(PM16) quintiles with variation (cm²) in visceral adipose tissue (VAT),
P = 3×10⁻¹⁶, with the metabolic syndrome (MetS) and with prevalent coronary heart disease
(prev CHD) and HF, ***P < 1×10⁻¹¹. (F) Reduced overall survival probability for high E(PM16)
levels (red curve) compared to low E(PM16) levels (cyan curve). (G) PM26, a 390-protein
module, is a member of super-cluster V. (H) Positive association of E(PM26) with prevalent CHD
and HF as well as incident CHD (inc CHD) and HF (inc HF), ***P ≤ 1×10⁻⁸. (I) Reduced post-CHD
and overall survival probability for high E(PM26) levels (red curve) compared to low E(PM26)
levels (cyan curve). Controls are subjects free of the disease in question. Data were
analyzed using forward linear or logistic regression or Cox proportional hazards regression,
depending on whether the outcome was continuous, binary or a time to an event. Kaplan-Meier
plots were used to display survival probabilities. For more details see fig. S6 and tables S7
and S12. The number of proteins per module is denoted at the branches of the dendrogram.
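Survival probabilities in these panels are displayed as Kaplan-Meier curves. As a minimal sketch of that estimator (the function name and the toy follow-up data are hypothetical, and this is not the authors' code), the product-limit calculation can be written as:

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (years), not taken from the AGES cohort.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
```

Comparing two such curves, for example subjects with high versus low module eigenprotein levels, is what the red and cyan curves in panels C, F and I represent.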


Fig. S8. A volcano plot of the association of global serum proteins to prevalent CHD

diagnosed at different times before sampling. The plot shows the significance,
-log(Bonferroni-adjusted P-value), as a function of effect size (log odds ratio), either when
all prevalent CHD cases (N=1,217) were included in the analysis (blue circles) or when only
CHD cases diagnosed within five years before entry into the AGES study (N=700) were included
(orange circles). Two different aptamers were used to detect and measure PCSK9. The effect
sizes of proteins associated with prevalent disease such as CHD were not affected by
restricting the analysis by the time from diagnosis to sampling (see Materials and Methods).
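The coordinates of such a volcano plot can be derived from per-protein regression results. The sketch below is illustrative only: it assumes a base-10 logarithm and a Bonferroni factor equal to the number of proteins tested, and the protein names and values are hypothetical.

```python
import math


def volcano_points(results):
    """Convert (protein, log_odds_ratio, raw_p) tuples into plot coordinates.

    x = effect size (log odds ratio)
    y = -log10 of the Bonferroni-adjusted P-value (raw P-values must be > 0)
    """
    n_tests = len(results)
    points = []
    for protein, log_odds_ratio, raw_p in results:
        adjusted_p = min(1.0, raw_p * n_tests)  # Bonferroni, capped at 1
        points.append((protein, log_odds_ratio, -math.log10(adjusted_p)))
    return points


# Hypothetical results, not values from the study.
points = volcano_points([("protein_A", 0.5, 1e-6), ("protein_B", -0.2, 0.5)])
```

Running the same function on the full case set and on the restricted case set gives the two point clouds that the caption compares.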


Fig. S9. The relationship between network connectivity of proteins and disease-related
measures and outcomes. (A) Spring graph of PM10 highlighting the hub protein DYRK3 located in
the hub region of the module. (B) Positive correlation between within-module connectivity
(Ki) (x-axis) of PM10 proteins and the absolute value of the effect size (β-coefficient) of
their association with prevalent heart failure (HF) (y-axis), Pearson's r = 0.782,
P = 1×10⁻⁷². (C) Positive association of DYRK3 with prevalent HF, P < 1×10⁻³⁰, and reduced
overall survival (all-cause mortality post entry into the AGES study cohort) associated with
low serum DYRK3 levels (cyan curve). (D) Spring graph of PM16 showing the location of the hub
HNRNPA1 within the hub region. (E) Positive correlation between Ki (x-axis) and the
association with incident coronary heart disease (inc CHD), r = 0.712, P = 1×10⁻²².
(F) Positive association of HNRNPA1 with incident CHD, P = 1×10⁻¹⁰, and high serum levels of
HNRNPA1 (red curve) predict reduced overall survival. (G) A spring graph of PM26 highlighting
the module's hub FSTL3. (H) Positive correlation between Ki (x-axis) and the association with
prevalent HF, r = 0.431, P = 1×10⁻¹⁶. (I) Positive association of FSTL3 with prevalent HF,
P < 1×10⁻³⁰, and reduced overall survival associated with high serum FSTL3 levels (red curve).
Network visualization was performed with the igraph package in R (30). Controls are subjects
free of the disease in question. Data were analyzed using forward linear or logistic
regression or Cox proportional hazards regression, depending on whether the outcome was
continuous, binary or a time to an event. Kaplan-Meier plots were used to display survival
probabilities.
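Panels B, E and H relate each protein's within-module connectivity to the absolute size of its disease association. A minimal sketch of that correlation, with hypothetical connectivity and coefficient values rather than study data, might look like:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-protein values: within-module connectivity and signed
# regression coefficients for association with an outcome.
connectivity = [1.0, 2.0, 3.0, 4.0]
beta_coefficients = [-0.1, 0.2, -0.3, 0.4]

# The caption correlates connectivity with the absolute effect size.
r = pearson_r(connectivity, [abs(b) for b in beta_coefficients])
```

Taking the absolute value first means that strongly protective and strongly harmful associations both count as large effects.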


Fig. S10. Preservation analysis of the serum protein network structure. (A) The cohort

was randomly split into two parts, 2/3 for the training set and 1/3 for the test set, and the
summary Z score statistic is plotted for each of the 27 modules, shown as colored data
points. Here a summary Z score <2 (blue dotted line) indicates no preservation, 2< summary
Z score <10 (between the blue and green dotted lines) indicates moderate evidence of
preservation, while a summary Z score >10 (green dotted line) indicates strong evidence of
preservation. All the modules showed strong preservation, i.e. a summary Z score >10. (B) Preservation of

the connectivity status for the top 10 hubs within each module (kWithin). The modules and

protein hubs highlighted are also presented in Fig. 3 and figs. S3 and S9.
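The thresholds in panel A translate directly into a small classification rule. In this sketch the function name is hypothetical, and the handling of scores exactly equal to 2 or 10 is an assumption, since the caption does not specify it.

```python
def preservation_evidence(z_summary):
    """Map a summary Z score to the evidence categories used in the caption."""
    if z_summary < 2:
        return "none"
    if z_summary <= 10:  # boundary values treated as moderate (assumption)
        return "moderate"
    return "strong"


# Hypothetical summary Z scores for three modules.
labels = [preservation_evidence(z) for z in (1.5, 5.0, 12.3)]
```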


Fig. S11. Highlighted examples of cis-acting SNPs depending on genomic location and in
relation to network connectivity. (A) Cis-acting pSNPs may be located in intergenic
regions (rs7547965), or within genes, including missense (rs1250259, NP_997647.1:
p.Gln15Leu), 5′-UTR (rs16923189), 3′-UTR (rs15881) or intronic (rs76426991) variants.
(B) Mean total connectivity ±2CI (2× 95% Confidence Interval) for all significant cis effects
(yes) compared to proteins with no detectable cis effect (no), Student's t-test
P = 3×10⁻¹⁶. (C) Pearson correlation between the absolute value of the β-coefficient of all
cis effects (x-axis) vs. total connectivity of the corresponding cis-regulated proteins
(y-axis), r = -0.231, P = 1×10⁻¹⁵.


Fig. S12. Examples of GWAS risk loci affecting serum protein levels. (A) A box plot of

five cis regulated proteins by known GWAS loci listed in table S16. (B) Trans acting effects

at the rs1050362 GWAS locus and a corresponding boxplot of two proteins affected. (C)

Trans acting effects at the rs964184 GWAS locus and a boxplot of two proteins affected. (D)

The strong cis and trans acting effects at the CHD-associated locus rs579459 affecting 43

proteins in trans. The rs579459 mediates a strong proximal cis acting effect on serum ABO

levels as highlighted in the boxplot. Also shown are boxplots for two proteins regulated in

trans by rs579459. (E) The Venn diagram demonstrates a significant enrichment of the
rs579459 trans-affected proteins within the PM27 module (Fisher's Exact Test P = 2×10⁻¹⁰).
Here, 18 out of 25 proteins regulated by rs579459 map to PM27. Chromosomal ideograms
were reprinted from the NCBI chromosome Map Viewer. The genotypes and pSNPs are on the
x-axis of each box plot while the normalized levels of serum proteins are denoted on the
y-axis.


Fig. S13. Selected examples of known GWAS risk loci for CHD, T2D and/or adiposity.

(A) A box plot of the trans regulated protein PROC at the CHD locus rs867186. (B) Trans

acting effects at the rs1892094 GWAS CHD locus and a boxplot of a protein affected by the

pSNP. (C) A trans acting effect at the rs1165669, another CHD GWAS locus, and a boxplot


of a protein affected by the locus. (D) The well documented T2D locus rs7756992 at

CDKAL1 affects the protein MLN in trans as highlighted in the boxplot. (E) The T2D GWAS

locus rs3132524 exerts trans effects on five proteins including proteins in the corresponding

box plots. (F) The distribution of the ABO protein serum levels in the AGES study population

as per genotypes for the CHD lead SNP rs579459. (G) A strong cis acting effect on ABO

serum levels using a 300kb window across the ABO locus, also representing many well

established GWAS risk lead SNPs for various disease related outcome data (right panel). (H)

The distribution of the VTN protein serum levels in the AGES study population as per

genotypes for the npSNP rs704. (I) The E(PM11) representing module PM11 is strongly

associated with LDL cholesterol and triglycerides (TG) but not HDL cholesterol, using

forward linear regression analysis. Chromosomal ideograms were reprinted from the NCBI

chromosome Map Viewer. The genotypes and pSNPs are at the x-axis of each box plot while

the normalized levels of proteins are denoted at the y-axis.


Fig. S14. Examples of replicated cis and trans effects reported by others. (A) Applying a
genomic window of 300kb across WFIKKN2, we detected a strong cis-acting effect for
WFIKKN2. The lead pSNP rs9675120 is also associated with GDF11/8 levels, acting in trans.
The T allele represents the major allele in the AGES. The rs9675120 is highly correlated
(r² = 0.928) with the rs11079936 reported in Sun et al. (58). (B) There was a significant
positive correlation between the protein levels of WFIKKN2 and GDF11/8 in the AGES
cohort, Pearson's r = 0.498, P = 1×10⁻²⁴¹. (C) The missense variant rs3197999 (NP_066278.3:
p.Arg703Cys) in MST1 mediated trans effects on 11 proteins in the AGES dataset. (D) The
boxplot shows a strong cis effect on the proximal protein MST1 (P < 1×10⁻³⁰⁰). Two trans
effects are highlighted as well. (E) The pSNP hotspot at rs28929474 (NP_001121179:
p.Glu366Lys) in SERPINA1 affects 17 proteins in trans at P < 1×10⁻⁵. The regression values
in the table are per copy of the T allele (also called the Z allele). Subjects homozygous for
the Z allele are not found in the AGES cohort. Many of these effects were also reported in
Sun et al. (58). (F) Boxplots of three proteins affected by the pQTL hotspot rs28929474. The
genotypes and pSNPs are on the x-axis of each box plot while the normalized levels of serum
proteins are denoted on the y-axis.


Table S1. Annotation of the human proteins targeted in the present study

Annotation of the 4,137 human protein targets detected with the custom-designed SOMAscan

platform.

(Excel table hosted online)


Table S2. Descriptive statistics of the present study cohort for relevant measures

Baseline characteristics of the AGES Reykjavik study cohort: Numbers are mean(SD) for

continuous-, N(%) for categorical- and median[IQR] for skewed variables. Abbreviations:

SBP, systolic blood pressure; DBP, diastolic blood pressure; TOT-C, total cholesterol; LDL-

C, LDL cholesterol; TG, triglyceride; FG, fasting blood glucose; VAT, visceral adipose

tissue; SAT, subcutaneous adipose tissue; T2D, type 2 diabetes; MetS, metabolic syndrome;

CHD, coronary heart disease; HF, heart failure; N/A, not applicable.

*For sex differences, obtained from a two-sided t-test for continuous-, χ² test for categorical- and

quantile regression for skewed variables.

| Characteristic | Variable | Males | Females | P-value* | Total |
|---|---|---|---|---|---|
| Demographics | Numbers | 2330 (42.7%) | 3127 (57.3%) | N/A | 5457 |
| | Age (years) | 76.7 (5.4) | 76.5 (5.7) | 0.280 | 76.6 (5.6) |
| Anthropometry | BMI (kg/m²) | 26.9 (3.8) | 27.2 (4.8) | 0.004 | 27.1 (4.4) |
| | Obese (BMI>30) | 439 (18.9%) | 777 (24.9%) | <0.001 | 1216 (22.3%) |
| Physiological | SBP (mmHg) | 143.2 (20.4) | 142.2 (20.9) | 0.075 | 142.6 (20.7) |
| | DBP (mmHg) | 76.2 (9.6) | 72.2 (9.5) | <0.001 | 73.9 (9.7) |
| | TOT-C (mmol/L) | 5.2 (1.1) | 6.0 (1.1) | <0.001 | 5.6 (1.2) |
| | LDL-C (mmol/L) | 3.2 (1.0) | 3.7 (1.0) | <0.001 | 3.5 (1.0) |
| | TG (mmol/L) | 1.0 [0.8,1.4] | 1.1 [0.8,1.5] | <0.001 | 1.0 [0.8,1.4] |
| | FG (mmol/L) | 5.9 (1.2) | 5.7 (1.1) | <0.001 | 5.8 (1.2) |
| | VAT (cm²) | 203.0 (86.2) | 150.3 (67.2) | <0.001 | 172.8 (80.2) |
| | SAT (cm²) | 203.4 (86.8) | 294.9 (112.3) | <0.001 | 255.7 (111.7) |
| Medication | Antihypertension | 1460 (62.7%) | 2016 (64.5%) | 0.169 | 3476 (63.7%) |
| | Lipid lowering | 656 (28.2%) | 575 (18.4%) | <0.001 | 1231 (22.6%) |
| Lifestyle | Smoker | 265 (11.7%) | 390 (12.8%) | 0.199 | 655 (12.3%) |
| Metabolic | T2D | 363 (15.6%) | 291 (9.3%) | <0.001 | 654 (12.0%) |
| | MetS | 486 (20.9%) | 641 (20.5%) | 0.746 | 1127 (20.7%) |
| Heart disease | CHD prevalent | 777 (33.6%) | 440 (14.2%) | <0.001 | 1217 (22.5%) |
| | CHD incl recurrent | 938 (40.6%) | 681 (22.0%) | <0.001 | 1619 (30.0%) |
| | CHD incident | 421 (27.4%) | 451 (17.0%) | <0.001 | 872 (20.8%) |
| | HF prevalent | 101 (4.4%) | 71 (2.3%) | <0.001 | 172 (3.2%) |
| | HF incl recurrent | 287 (12.4%) | 242 (7.8%) | <0.001 | 529 (9.8%) |
| | HF incident | 233 (10.5%) | 207 (6.9%) | <0.001 | 440 (8.4%) |
| | Followup yrs CHD | 7.4 [3.2,10.1] | 9.7 [5.8,10.8] | <0.001 | 9.2 [4.4,10.6] |
| | Followup yrs death | 10.5 [6.2,12.3] | 11.6 [8.1,12.8] | <0.001 | 11.3 [7.2,12.6] |


Table S3. Direct assessment of aptamer specificity via DDA mass spectrometry

List of proteins with confirmation by data dependent analysis (DDA) mass spectrometry after

SOMAmer enrichment in biological matrices. Column Biological Matrix: cell line name if

detected in lysate or conditioned media (cm), otherwise noted as blood serum, blood plasma,

or urine biofluid. Column File Name: refers to the raw data file name uploaded to PRIDE

Proteome Exchange with five dataset identifiers PXD008819-PXD008823.

(Excel table hosted online)

Table S4. Direct assessment of aptamer specificity via MRM mass spectrometry

List of proteins with confirmation by multiple reaction monitoring (MRM) mass spectrometry

after SOMAmer enrichment in biological matrices. Cell line name if detected in lysate or

conditioned media (cm), otherwise noted as blood serum, blood plasma, or urine biofluid. The

MRM dataset has been deposited to Peptide Atlas PASSEL repository with the dataset

identifier PASS01145.

(Excel table hosted online)


Table S5. Cross-platform validation of known links of proteins to disease related traits

Confirmation, via application of the custom designed SOMAscan platform, in the AGES

cohort, of known associations of protein biomarkers to relevant disease related outcomes

detected with conventional immunoassays. The beta coefficients (β-coeff) were estimated

through either linear or logistic regression analysis. N/A, not applicable.

| Protein | Reference | Trait | Reported levels | Prevalent disease, AGES: β-coeff | Prevalent disease, AGES: P-value | Incident disease, AGES: β-coeff | Incident disease, AGES: P-value |
|---|---|---|---|---|---|---|---|
| IL-18 | 21481392 | CHD | Elevated | 0.117 | 0.0007 | 0.094 | 0.002 |
| CRP | 20182820 | CHD | Elevated | 0.077 | 0.02 | 0.176 | 1e-08 |
| SAA | 20182820 | CHD | Elevated | 0.075 | 0.02 | 0.087 | 0.004 |
| IL6 | 10769275 | CHD | Elevated | 0.066 | 0.045 | 0.111 | 0.0002 |
| NPPB | 20182820 | CHD | Elevated | 0.656 | 1e-64 | 0.401 | 1e-26 |
| MPO | 20182820 | CHD | Elevated | 0.277 | 9e-16 | 0.159 | 3e-07 |
| PAPPA | 20182820 | CHD | Elevated | 0.163 | 6e-07 | 0.128 | 0.0002 |
| GDF15 | 27811204 | CHD | Elevated | 0.327 | 3e-19 | 0.300 | 2e-18 |
| LGALS3 | 22230397 | CHD | Elevated | 0.285 | 9e-16 | 0.179 | 2e-08 |
| ADIPOQ | 19029992 | T2D | Reduced | 0.543 | 1e-55 | N/A | N/A |
| LEP | 29236298 | T2D | Elevated | 0.491 | 1e-41 | N/A | N/A |
| IGFBP2 | 22554827 | T2D | Reduced | -0.632 | <1e-258 | N/A | N/A |
| ADIPOQ | 11479627 | SAT | Reduced | -19.140 | 3e-37 | N/A | N/A |
| LEP | 27906690 | SAT | Elevated | 88.848 | <1e-300 | N/A | N/A |
| sLEPR | 12075576 | SAT | Reduced | -15.472 | 4e-29 | N/A | N/A |
| ADIPOQ | 11479627 | MetS | Reduced | -0.903 | <1e-300 | N/A | N/A |
| RBP4 | 18239568 | MetS | Elevated | 0.398 | 4e-24 | N/A | N/A |
| FABP4 | 17553506 | MetS | Elevated | 1.043 | <1e-300 | N/A | N/A |
| EDN1 | 8149524 | HF | Elevated | 0.527 | 1e-13 | 0.239 | 2e-07 |
| NPPB | 24807464 | HF | Elevated | 1.303 | <1e-300 | 0.807 | <1e-300 |
| UCN3 | 19961889 | HF | Elevated | 0.250 | 0.001 | 0.165 | 0.0005 |
| LECT2 | 28278265 | VAT | Elevated | 10.670 | 2e-23 | N/A | N/A |
| PAI-1 | 8673927 | VAT | Elevated | 8.190 | 7e-15 | N/A | N/A |
| PTX3 | 21900125 | VAT | Elevated | 3.511 | 0.0008 | N/A | N/A |


Table S6. The degree of validation of aptamer specificity for all human proteins measured

in the present study

A summary of direct and/or inferred validation of aptamer specificity for the 4,137 human

proteins detected in the present study.

(Excel table hosted online)

Table S7. The modules of the serum protein network and corresponding proteins

Annotation of the modules and the proteins that constitute each module of the serum protein

network together with information related to degree connectivity (kWithin, kOut, and kTotal).

(Excel table hosted online)


Table S8. Enrichment of functional categories in the different modules

Functional categories and tissue/cell specific signatures enriched in the different protein

modules using annotation tools like WebGestalt, DAVID, GeneMANIA and CTen (64-67).

Modules are ordered and annotated according to their inter-module relationship. N/A, not

applicable.

| Module | Size | Over-represented pathways & tissue signatures | FDR | P-value (Bonferroni adjusted) | Database |
|---|---|---|---|---|---|
| PM1 | 31 | Signal peptide | N/A | 0.0007 | DAVID |
| | | Autoimmunity | 0.03 | N/A | WebGestalt |
| | | Notch signaling | 0.00001 | N/A | GeneMANIA |
| | | BDCA4+ dendritic cells | 0.01 | N/A | CTen |
| PM2 | 86 | Signal peptide | N/A | 2e-07 | DAVID |
| | | Circadian rhythm | N/A | 0.006 | DAVID |
| | | Adenocarcinoma | 0.00002 | N/A | WebGestalt |
| | | Lymphocyte mediated immunity | 0.0002 | N/A | GeneMANIA |
| | | Whole blood | 0.006 | N/A | CTen |
| PM3 | 921 | Signal peptide | N/A | 1e-78 | DAVID |
| | | Growth factor activity | N/A | 1e-25 | DAVID |
| | | MAPK cascade | N/A | 1e-12 | DAVID |
| | | Zymogen | N/A | 1e-07 | DAVID |
| | | Cytokine-cytokine receptor | 1e-30 | N/A | WebGestalt |
| | | JAK-STAT signaling | 2e-11 | N/A | WebGestalt |
| | | PI3K-AKT signaling | 1e-10 | N/A | WebGestalt |
| | | Immune system diseases | 1e-06 | N/A | WebGestalt |
| | | Hypotension | 0.0008 | N/A | WebGestalt |
| | | Smooth muscle | 0.002 | N/A | CTen |
| | | Pancreas | 0.028 | N/A | CTen |
| PM4 | 86 | Signal peptide | N/A | 0.002 | DAVID |
| | | Pattern recognition receptor activity | 1e-06 | N/A | GeneMANIA |
| | | Hepatitis B | 0.00005 | N/A | WebGestalt |
| | | SIDS | 0.005 | N/A | WebGestalt |
| | | Rheumatoid arthritis | 0.02 | N/A | WebGestalt |
| | | Leukemia lymphoblastic | 0.002 | N/A | CTen |
| PM5 | 65 | Extracellular exosome | N/A | 0.002 | DAVID |
| | | IkB / NF-kB signaling pathway | 0.0001 | N/A | GeneMANIA |
| | | CD33+ Myeloid | 0.002 | N/A | CTen |
| | | Skin | 0.007 | N/A | CTen |
| PM6 | 157 | Signal peptide | N/A | 5e-15 | DAVID |
| | | Calcium ion transport | 0.004 | N/A | GeneMANIA |
| | | Heart valve disease | 0.04 | N/A | WebGestalt |
| | | Cardiac myocytes | 0.002 | N/A | CTen |
| PM7 | 88 | Protein binding | N/A | 0.02 | DAVID |
| PM8 | 84 | Signal peptide | N/A | 3e-07 | DAVID |
| | | Four helical cytokine core | N/A | 0.01 | DAVID |
| | | Natural killer cell activation | 2e-06 | N/A | GeneMANIA |
| | | Intravascular coagulation | 0.03 | N/A | WebGestalt |
| PM9 | 286 | Signal peptide | N/A | 1e-31 | DAVID |
| | | Growth factor binding | 0.0002 | N/A | GeneMANIA |
| | | Complement and coagulation | 0.002 | N/A | WebGestalt |
| | | Liver | 0.002 | N/A | CTen |
| | | Pancreatic islets | 0.006 | N/A | CTen |
| PM10 | 312 | Signal peptide | N/A | 8e-28 | DAVID |
| | | Leukocyte differentiation | 0.0004 | N/A | GeneMANIA |
| | | Fc-epsilon receptor | 0.0005 | N/A | GeneMANIA |
| | | Innate immune system | 0.001 | N/A | WebGestalt |
| | | Lung diseases | 0.004 | N/A | WebGestalt |
| | | Globus pallidus | 0.002 | N/A | CTen |
| | | Cingulate cortex subthalamic | 0.002 | N/A | CTen |
| PM11 | 26 | Secreted proteins | N/A | 0.01 | DAVID |
| | | Lipoprotein particles | 1e-12 | N/A | GeneMANIA |
| | | Sterol homeostasis | 1e-10 | N/A | GeneMANIA |
| | | Familial hypercholesterolemia | 0.002 | N/A | WebGestalt |
| | | Adrenal gland | 0.02 | N/A | CTen |
| | | Fetal liver | 0.03 | N/A | CTen |
| PM12 | 69 | Signal peptide | N/A | 0.00003 | DAVID |
| | | Telomere maintenance | 0.00004 | N/A | GeneMANIA |
| | | Ovary | 0.01 | N/A | CTen |
| | | Atrioventricular node | 0.01 | N/A | CTen |
| PM13 | 318 | Signal peptide | N/A | 1e-09 | DAVID |
| | | Biological rhythms | N/A | 0.006 | DAVID |
| | | Epstein-Barr virus infection | 0.009 | N/A | WebGestalt |
| | | Skeletal muscle | 0.0002 | N/A | CTen |
| | | Uterus | 0.006 | N/A | CTen |
| PM14 | 81 | Signal peptide | N/A | 0.00003 | DAVID |
| | | Bone marrow | 0.01 | N/A | CTen |
| PM15 | 118 | Signal peptide | N/A | 1e-08 | DAVID |
| | | TNF mediated signaling | N/A | 0.0005 | DAVID |
| | | Kaposi sarcoma | 0.01 | N/A | WebGestalt |
| | | T-cell activation | 0.004 | N/A | GeneMANIA |
| PM16 | 170 | Poly(A) RNA binding | N/A | 1e-08 | DAVID |
| | | Acetylation | N/A | 3e-07 | DAVID |
| | | Ubiquitin conjugation | N/A | 8e-07 | DAVID |
| | | Secreted proteins | N/A | 0.00001 | DAVID |
| | | Antibiotic activity | N/A | 0.00002 | DAVID |
| | | Neutrophil degranulation | 1e-12 | N/A | WebGestalt |
| | | Inflammation | 1e-06 | N/A | WebGestalt |
| | | Liver carcinoma | 0.01 | N/A | WebGestalt |
| | | RNA spliceosome | 1e-07 | N/A | GeneMANIA |
| | | Bone marrow | 5e-14 | N/A | CTen |
| | | CD33+ myeloid | 4e-13 | N/A | CTen |
| PM17 | 53 | Acetylation | N/A | 1e-06 | DAVID |
| | | Phosphoprotein | N/A | 3e-06 | DAVID |
| | | Stress | 0.0009 | N/A | WebGestalt |
| | | Vesicle mediated transport | 0.003 | N/A | WebGestalt |
| | | SNARE complex | 0.002 | N/A | GeneMANIA |
| PM18 | 83 | Cytoplasm | N/A | 1e-15 | DAVID |
| | | ERBB signaling pathway | 2e-13 | N/A | GeneMANIA |
| | | Platelet activation | 3e-13 | N/A | GeneMANIA |
| | | EGF / EGFR signaling pathway | 1e-12 | N/A | WebGestalt |
| | | Drug-drug interaction | 1e-10 | N/A | WebGestalt |
| | | CD28 costimulation | 1e-08 | N/A | WebGestalt |
| | | Focal adhesion | 5e-07 | N/A | WebGestalt |
| | | FCg mediated phagocytosis | 3e-06 | N/A | WebGestalt |
| | | CCKR signaling | 0.0004 | N/A | WebGestalt |
| | | Angiogenesis | 0.01 | N/A | WebGestalt |
| | | CD56+ NK Cells | 0.0003 | N/A | CTen |
| | | CD19+ B cells | 0.0028 | N/A | CTen |
| PM19 | 81 | Acetylation | N/A | 1e-28 | DAVID |
| | | Extracellular exosomes | N/A | 2e-14 | DAVID |
| | | Hereditary hemolytic anemia | N/A | 0.00001 | DAVID |
| | | Cofactor metabolic process | 0.0005 | N/A | GeneMANIA |
| | | Protein folding | 0.002 | N/A | GeneMANIA |
| | | CD71+ early erythroid | 1e-07 | N/A | CTen |
| | | CD105+ endothelial | 0.0001 | N/A | CTen |
| PM20 | 32 | Signal peptide | N/A | 4e-10 | DAVID |
| | | Immunoglobulin C1 | N/A | 2e-06 | DAVID |
| | | Lymph node | 0.01 | N/A | CTen |
| | | Small intestine | 0.02 | N/A | CTen |
| PM21 | 18 | Cellular ion homeostasis | 0.004 | N/A | GeneMANIA |
| PM22 | 39 | Signal peptide | N/A | 3e-08 | DAVID |
| | | Calcium ion binding | N/A | 0.00002 | DAVID |
| | | Bronchial epithelial cells | 0.01 | N/A | CTen |
| | | Adipocyte | 0.02 | N/A | CTen |
| PM23 | 35 | Extracellular exosome | N/A | 3e-08 | DAVID |
| | | Biosynthesis of antibiotics | N/A | 1e-07 | DAVID |
| | | NAD(P)-binding domains | N/A | 3e-06 | DAVID |
| | | Disease mutation | N/A | 0.0001 | DAVID |
| | | Metabolic pathways | 1e-10 | N/A | WebGestalt |
| | | Amino acid metabolism | 5e-10 | N/A | WebGestalt |
| | | Carbon metabolism | 0.00001 | N/A | WebGestalt |
| | | Metabolism, inborn errors | 0.0001 | N/A | WebGestalt |
| | | Ethanol oxidation | 1e-09 | N/A | GeneMANIA |
| | | Oxidoreductase | 3e-07 | N/A | GeneMANIA |
| | | Liver | 2e-14 | N/A | CTen |
| | | Kidney | 7e-06 | N/A | CTen |
| | | Small intestine | 0.00008 | N/A | CTen |
| | | Adrenal gland | 0.0003 | N/A | CTen |
| PM24 | 37 | Secreted proteins | N/A | 1e-27 | DAVID |
| | | Protein activation cascade | 1e-18 | N/A | GeneMANIA |
| | | Vesicle lumen | 1e-15 | N/A | GeneMANIA |
| | | Complement activation | 2e-10 | N/A | GeneMANIA |
| | | Platelet degranulation | 2e-09 | N/A | WebGestalt |
| | | Thrombosis | 3e-09 | N/A | WebGestalt |
| | | Fetal liver | 1e-21 | N/A | CTen |
| | | Fetal lung | 5e-10 | N/A | CTen |
| | | Lymph node | 6e-06 | N/A | CTen |
| PM25 | 30 | Signal peptide | N/A | 0.003 | DAVID |
| PM26 | 390 | Signal peptide | N/A | 6e-101 | DAVID |
| | | Extracellular exosome | N/A | 1e-19 | DAVID |
| | | Ephrin receptor signaling | 2e-08 | N/A | GeneMANIA |
| | | Inflammation | 5e-08 | N/A | WebGestalt |
| | | Glomerular filtration rate | 0.00001 | N/A | WebGestalt |
| | | Spontaneous abortion | 0.00002 | N/A | WebGestalt |
| | | Axon guidance | 0.00002 | N/A | WebGestalt |
| | | Osteoporosis | 0.04 | N/A | WebGestalt |
| | | Prostatic neoplasms | 0.04 | N/A | WebGestalt |
| | | Smooth muscle | 1e-06 | N/A | CTen |
| | | Adipocyte | 4e-06 | N/A | CTen |
| | | Lung | 6e-06 | N/A | CTen |
| PM27 | 378 | Signal peptide | N/A | 1e-113 | DAVID |
| | | Extracellular exosome | N/A | 1e-20 | DAVID |
| | | Cell adhesion (CAMs) | 1e-30 | N/A | WebGestalt |
| | | Extracellular matrix organization | 1e-18 | N/A | WebGestalt |
| | | Collagen diseases | 7e-07 | N/A | WebGestalt |
| | | Vascular diseases | 1e-06 | N/A | WebGestalt |
| | | Axon guidance | 0.00001 | N/A | WebGestalt |
| | | Neoplasm metastasis | 0.0005 | N/A | WebGestalt |
| | | Osteoblast signaling | 0.005 | N/A | WebGestalt |
| | | Adipocyte | 2e-15 | N/A | CTen |
| | | Uterus | 6e-12 | N/A | CTen |
| | | Smooth muscle | 1e-09 | N/A | CTen |


Table S9. Tissue specific expression of individual serum proteins

GTEx gene expression data (https://www.gtexportal.org) related to potential tissue of origin

of individual proteins. The Z>9.24 represents the top 0.5% of all tissue-specific Z-scores for

the proteins measured.

(Excel table hosted online)

Table S10. Tissue specific expression of serum protein modules

GTEx gene expression data (https://www.gtexportal.org) related to potential tissue of origin

of individual protein modules using a Z>2.75 cut-off, i.e. the top 2.5% of tissue specificity.

The numbers refer to percentage of all proteins in each module passing this cut-off.

(Excel table hosted online)
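The module-level summary described for table S10 amounts to counting, per tissue, the share of a module's proteins whose tissue-specificity Z-score exceeds the cut-off. A minimal sketch follows; the data layout (`z_scores` as a nested dictionary), the function name and the toy values are assumptions for illustration, not the format of the hosted Excel tables.

```python
def percent_tissue_specific(module_proteins, z_scores, cutoff=2.75):
    """Percentage of a module's proteins passing the Z-score cut-off, per tissue.

    module_proteins : list of protein identifiers in the module
    z_scores        : {protein: {tissue: tissue-specificity Z-score}}
    """
    n_proteins = len(module_proteins)
    if n_proteins == 0:
        return {}
    counts = {}
    for protein in module_proteins:
        for tissue, z in z_scores.get(protein, {}).items():
            if z > cutoff:
                counts[tissue] = counts.get(tissue, 0) + 1
    return {tissue: 100.0 * c / n_proteins for tissue, c in counts.items()}


# Hypothetical Z-scores, not GTEx values.
toy_z = {
    "ALB": {"liver": 5.0, "muscle": 0.1},
    "APOB": {"liver": 3.0},
    "CRP": {"liver": 2.0},
    "MB": {"muscle": 9.5},
}
percentages = percent_tissue_specific(["ALB", "APOB", "CRP", "MB"], toy_z)
```

Passing `cutoff=9.24` instead would apply the stricter per-protein threshold quoted for table S9.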


Table S11. Enrichment of functional categories in the different superclusters

Functional categories enriched in the five super-clusters using annotation tools like

WebGestalt, DAVID, GeneMANIA and CTen (64-67). N/A, not applicable. GEFs,

guanine nucleotide exchange factors.

| Modules | Super-cluster | Over-representation of pathways & tissues | FDR | P-value (Bonferroni adjusted) | Database |
|---|---|---|---|---|---|
| PM1 | I | Signal peptide | N/A | 0.0007 | DAVID |
| | | Autoimmunity | 0.03 | N/A | WebGestalt |
| | | Notch signaling | 0.00001 | N/A | GeneMANIA |
| | | BDCA4+ dendritic cells | 0.01 | N/A | CTen |
| PM2-10 | II | Signal peptide | N/A | 1e-169 | DAVID |
| | | Immune diseases | 1e-100 | N/A | WebGestalt |
| | | Necrosis | 1e-90 | N/A | WebGestalt |
| | | Inflammation | 1e-90 | N/A | WebGestalt |
| | | Cytokine | 1e-34 | N/A | WebGestalt |
| | | Growth factor | 1e-33 | N/A | WebGestalt |
| | | Jak STAT signaling | 3e-18 | N/A | WebGestalt |
| | | PI3K-AKT signaling | 1e-15 | N/A | WebGestalt |
| PM11-15 | III | Signal peptide | N/A | 1e-31 | DAVID |
| | | MAPK cascade | N/A | 3e-18 | DAVID |
| | | Ras GEFs | N/A | 2e-18 | DAVID |
| | | Extracellular matrix | 3e-30 | N/A | WebGestalt |
| PM16-19 | IV | Extracellular exosomes | N/A | 4e-28 | DAVID |
| | | Kit receptor signaling | 1e-30 | N/A | WebGestalt |
| | | Drug-drug interaction | 1e-20 | N/A | WebGestalt |
| | | Nucleotide binding | 7e-12 | N/A | WebGestalt |
| | | Fc epsilon RI pathway | 1e-06 | N/A | WebGestalt |
| | | Bone marrow | 1e-14 | N/A | CTen |
| | | CD33+ myeloid | 1e-12 | N/A | CTen |
| PM20-27 | V | Signal peptide | N/A | 1e-251 | DAVID |
| | | Extracellular exosome | N/A | 3e-56 | DAVID |
| | | Biological adhesion | 1e-43 | N/A | WebGestalt |
| | | Neoplasm invasiveness | 1e-20 | N/A | WebGestalt |
| | | Angiogenesis | 1e-09 | N/A | WebGestalt |
| | | Axon guidance | 1e-10 | N/A | WebGestalt |
| | | Adipocyte | 1e-20 | N/A | CTen |
| | | Smooth muscle | 1e-15 | N/A | CTen |
| | | Lung | 1e-13 | N/A | CTen |


Table S12. Association of the modules E(q)s to disease related phenotypic measures

Correlation of different modules E(q)s to various disease related outcomes in the AGES study

cohort. The significance threshold of module trait correlations to outcome data was set at a

conservative P-value < 1×10⁻⁷. N/A, not applicable; NS, not significant.

*Survival probability was estimated either as post incident CHD or overall survival post entry into the
AGES study (see material and methods). Data were analyzed using forward linear or logistic
regression or Cox proportional hazards regression, depending on the outcome being continuous, binary
or a time to an event. Abbreviations: MetS, metabolic syndrome; VAT, visceral adipose tissue via CT;
SAT, subcutaneous adipose tissue via CT; T2D, type 2 diabetes; CHD, coronary heart disease; HF,
heart failure. See table S2 for descriptive statistics of the study cohort.

| E(module) | Size | Super-cluster | Outcome* | Data | N cases, events, measurements | Direction of effect | P-value |
|---|---|---|---|---|---|---|---|
| PM1 | 31 | I | VAT | Prevalent | 5239 | Direct | 1e-65 |
| | | | MetS | Prevalent | 1127 | Direct | 1e-55 |
| | | | SAT | Prevalent | 5239 | Direct | 5e-27 |
| | | | T2D | Prevalent | 654 | Direct | 2e-19 |
| | | | Survival | Post CHD | 692 | Direct | 1e-17 |
| | | | Survival | Overall | 2982 | Direct | <1e-14 |
| | | | CHD | Incident | 872 | Direct | 6e-13 |
| | | | HF | Incident | 440 | Direct | 1e-11 |
| PM2 | 86 | II | N/A | N/A | N/A | N/A | NS |
| PM3 | 921 | II | N/A | N/A | N/A | N/A | NS |
| PM4 | 86 | II | CHD | Prevalent | 1217 | Inverse | 2e-13 |
| | | | VAT | Prevalent | 5239 | Direct | 6e-12 |
| PM5 | 65 | II | HF | Prevalent | 172 | Inverse | 4e-18 |
| | | | MetS | Prevalent | 1127 | Inverse | 2e-14 |
| | | | CHD | Prevalent | 1217 | Inverse | 8e-14 |
| | | | HF | Incident | 440 | Inverse | 8e-09 |
| PM6 | 157 | II | HF | Prevalent | 172 | Inverse | 1e-26 |
| | | | CHD | Prevalent | 1217 | Inverse | 3e-14 |
| | | | HF | Incident | 440 | Inverse | 2e-12 |
| | | | Survival | Overall | 2982 | Inverse | 8e-08 |
| PM7 | 88 | II | HF | Prevalent | 172 | Inverse | 3e-20 |
| | | | Survival | Overall | 2982 | Inverse | 2e-10 |
| PM8 | 84 | II | VAT | Prevalent | 5239 | Direct | 2e-22 |
| | | | HF | Prevalent | 172 | Inverse | 4e-11 |
| | | | Survival | Overall | 2982 | Inverse | 1e-09 |
| PM9 | 286 | II | HF | Prevalent | 172 | Inverse | 2e-23 |
| | | | VAT | Prevalent | 5239 | Direct | 1e-13 |
| | | | CHD | Prevalent | 1217 | Inverse | 1e-09 |
| | | | HF | Incident | 440 | Inverse | 2e-09 |
| | | | Survival | Overall | 2982 | Inverse | 3e-09 |
| PM10 | 312 | II | HF | Prevalent | 172 | Inverse | 7e-17 |
| | | | Survival | Overall | 2982 | Inverse | 1e-12 |
| PM11 | 26 | III | MetS | Prevalent | 1127 | Direct | 1e-15 |
| | | | CHD | Prevalent | 1217 | Direct | 1e-14 |
| PM12 | 69 | III | N/A | N/A | N/A | N/A | NS |
| PM13 | 318 | III | N/A | N/A | N/A | N/A | NS |
| PM14 | 81 | III | N/A | N/A | N/A | N/A | NS |
| PM15 | 118 | III | N/A | N/A | N/A | N/A | NS |
| PM16 | 170 | IV | CHD | Prevalent | 1217 | Direct | 1e-18 |
| | | | CHD | Incident | 872 | Direct | 5e-17 |
| | | | VAT | Prevalent | 5239 | Direct | 3e-16 |
| | | | MetS | Prevalent | 1127 | Direct | 1e-12 |
| | | | Survival | Overall | 2982 | Direct | 2e-12 |
| | | | HF | Prevalent | 172 | Direct | 3e-12 |
| | | | HF | Incident | 440 | Direct | 2e-09 |
| PM17 | 53 | IV | HF | Prevalent | 172 | Direct | 2e-22 |
| | | | CHD | Prevalent | 1217 | Direct | 6e-22 |
| | | | HF | Incident | 440 | Direct | 1e-18 |
| | | | CHD | Incident | 872 | Direct | 3e-18 |
| | | | Survival | Overall | 2982 | Direct | <1e-16 |
| | | | Survival | Post CHD | 692 | Direct | 3e-11 |
| | | | SAT | Prevalent | 5339 | Direct | 1e-10 |
| | | | MetS | Prevalent | 1127 | Direct | 1e-09 |
| PM18 | 83 | IV | N/A | N/A | N/A | N/A | NS |
| PM19 | 81 | IV | N/A | N/A | N/A | N/A | NS |
| PM20 | 32 | V | N/A | N/A | N/A | N/A | NS |
| PM21 | 18 | V | N/A | N/A | N/A | N/A | NS |
| PM22 | 39 | V | N/A | N/A | N/A | N/A | NS |
| PM23 | 35 | V | VAT | Prevalent | 5239 | Direct | 6e-90 |
| | | | MetS | Prevalent | 1127 | Direct | 8e-65 |
| | | | SAT | Prevalent | 5239 | Direct | 4e-42 |
| | | | T2D | Prevalent | 654 | Direct | 5e-32 |
| | | | CHD | Prevalent | 1217 | Direct | 2e-14 |
| PM24 | 37 | V | VAT | Prevalent | 5239 | Inverse | 2e-18 |
| | | | MetS | Prevalent | 1127 | Inverse | 3e-12 |
| | | | SAT | Prevalent | 5239 | Inverse | 4e-10 |
| PM25 | 30 | V | N/A | N/A | N/A | N/A | NS |

PM26 390 V HF

HF

Survival

CHD

CHD

Survival

Prevalent

Incident

Overall

Prevalent

Incident

Post CHD

172

440

2982

1217

872

692

Direct

Direct

Direct

Direct

Direct

Direct

5e-20

2e-18

<1e-16

5e-13

2e-10

1e-08

PM27 378 V VAT

MetS

HF

Survival

Prevalent

Prevalent

Prevalent

Overall

5239

1127

172

2982

Inverse

Inverse

Direct

Direct

6e-34

4e-11

6e-09

9e-08
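
The footnote to Table S12 states that the regression family was chosen by outcome type (continuous, binary, or time to an event). A minimal sketch of that choice follows; the string labels and the function name are hypothetical and do not come from the source, which names only the three regression families.

```python
# Hypothetical outcome-type labels (assumption); the source names only the
# three regression families, not how outcomes were encoded.
MODEL_BY_OUTCOME_TYPE = {
    "continuous": "linear regression",
    "binary": "logistic regression",
    "time_to_event": "Cox proportional hazards regression",
}


def choose_model(outcome_type: str) -> str:
    """Return the regression family used for a given outcome type."""
    try:
        return MODEL_BY_OUTCOME_TYPE[outcome_type]
    except KeyError:
        raise ValueError(f"unknown outcome type: {outcome_type!r}") from None


if __name__ == "__main__":
    for kind in MODEL_BY_OUTCOME_TYPE:
        print(kind, "->", choose_model(kind))
```

Which table outcomes fall into which category is not spelled out row by row in the footnote, so any such mapping would be an additional assumption.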

Table S13. Cis-acting serum pSNP-protein pairs

All cis-acting pSNP-protein pairs detected within a 300 kb window across and including a given serum protein encoding gene. For each specific cis effect we report the single strongest one (lead pSNP), and do not consider multiple independent cis effects per region.

(Excel table hosted online)
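
The Table S13 caption describes two steps: keep associations that fall in the cis window of the protein-encoding gene, then report the single strongest one as the lead pSNP. The sketch below illustrates both under stated assumptions: the window is taken as 150 kb on each side of the gene (matching the Table S16 caption), "strongest" is taken to mean smallest P-value, and the `Assoc` record, function names, SNP identifiers, and coordinates are all hypothetical.

```python
from typing import Iterable, NamedTuple, Optional

CIS_FLANK_BP = 150_000  # assumption: 150 kb flank on each side of the gene


class Assoc(NamedTuple):
    """One SNP-protein association (hypothetical record layout)."""
    snp: str
    chrom: str
    pos: int
    p_value: float


def is_cis(assoc: Assoc, gene_chrom: str, gene_start: int, gene_end: int,
           flank: int = CIS_FLANK_BP) -> bool:
    """True if the SNP lies within the gene body or its flanking window."""
    return (assoc.chrom == gene_chrom
            and gene_start - flank <= assoc.pos <= gene_end + flank)


def lead_cis_psnp(assocs: Iterable[Assoc], gene_chrom: str,
                  gene_start: int, gene_end: int) -> Optional[Assoc]:
    """Return the cis association with the smallest P-value, or None."""
    cis = [a for a in assocs if is_cis(a, gene_chrom, gene_start, gene_end)]
    return min(cis, key=lambda a: a.p_value, default=None)


if __name__ == "__main__":
    # Made-up associations for a made-up gene at chr1:1,050,000-1,080,000.
    example = [
        Assoc("snp_a", "1", 1_000_000, 1e-12),
        Assoc("snp_b", "1", 1_100_000, 1e-30),
        Assoc("snp_c", "1", 5_000_000, 1e-50),  # same chromosome, outside window
    ]
    print(lead_cis_psnp(example, "1", 1_050_000, 1_080_000))
```

Associations outside the window are ignored even when their P-value is smaller, which mirrors the caption's restriction to one lead pSNP per cis region.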

Table S14. Cross-referencing cis acting serum pSNP-proteins with eSNP-transcript pairs

Matching cis pSNP-proteins to expression eSNPs-transcripts pairs identified in >30 solid tissues or cell types, using the stringent cutoffs of P < 5×10⁻⁸ for significance and r² ≥ 0.8 for SNP proxy.

(Excel table hosted online)

Table S15. Cross-referencing cis acting serum pSNPs with GWAS lead SNPs

Cross-referencing cis pSNPs to genome-wide significant GWAS lead SNPs, using the stringent cutoffs of P < 5×10⁻⁸ for significance and r² ≥ 0.8 for pSNP proxy.

(Excel table hosted online)

Table S16. Highlighted examples of serum pSNP-protein pairs underlying GWAS risk

Selected examples of genome-wide significant GWAS risk loci for various disease-related outcomes (68), showing cis and/or trans acting effects on serum protein levels in the AGES study population. P-value threshold for significant trans effects was set at P < 5×10⁻⁷ based on Bonferroni corrections (20 GWAS loci and SOMAmers tested). The proximal cis acting window was 150 kb from 5′ and 3′ of a gene and including the gene in question. MA, minor allele.

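GWAS risk locus | Phenotype | Reported gene | PMID | Protein(s) affected | Cis or trans | Effect (β) per MA | P-value
rs1892094 | CHD | ATP1B1 | 28530674 | SIGLEC11 | Trans | -0.134 | 3e-10
rs1892094 | CHD | ATP1B1 | 28530674 | EHBP1 | Trans | -0.157 | 7e-08
rs2820315 | CHD | LMOD1 | 28530674 | RNPEP | Cis | -0.157 | 2e-09
rs2258287 | CHD | HNF1A | 28530674 | CRP | Trans | -0.156 | 4e-09
rs1050362 | CHD | DHX38 | 28530674 | APOL1 | Trans | 0.373 | 6e-51
rs1050362 | CHD | DHX38 | 28530674 | SERPIND1 | Trans | -0.148 | 1e-08
rs1050362 | CHD | DHX38 | 28530674 | APOA1 | Trans | 0.130 | 5e-08
rs867186 | CHD | PROCR | 28530674 | PROC | Trans | 0.630 | 6e-59
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | APOA5 | Cis | -0.371 | 8e-24
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | NXPH2 | Trans | -0.276 | 2e-13
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | PCSK7 | Trans | -0.272 | 3e-13
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | ANGPTL3 | Trans | 0.213 | 8e-09
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | FAM159B | Trans | 0.206 | 3e-08
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | APOC3 | Cis | 0.194 | 6e-08
rs964184 | CHD | ZNF259; APOA1; APOC3; APOA4; APOA5 | 21378990 | LRP1B | Trans | 0.189 | 7e-08
rs1165669 | CHD | HSP90B1 | 21626137; 26343387 | HSP90B1 | Cis | 0.836 | 6e-228
rs1165669 | CHD | HSP90B1 | 21626137; 26343387 | HNRNPM | Trans | 0.564 | 7e-95
rs10840293 | CHD | SWAP70 | 26343387 | SWAP70 | Cis | 0.281 | 7e-30
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ABO | Cis | 1.087 | 8e-244
rs579459 | CHD | ABO; LCN1P2 | 21378990 | SELE | Trans | -0.963 | 9e-193
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ADGRF5 | Trans | -0.815 | 8e-125
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ROBO4 | Trans | -0.730 | 1e-114
rs579459 | CHD | ABO; LCN1P2 | 21378990 | IL3RA | Trans | -0.613 | 5e-81
rs579459 | CHD | ABO; LCN1P2 | 21378990 | QSOX2 | Trans | 0.625 | 4e-73
rs579459 | CHD | ABO; LCN1P2 | 21378990 | INSR | Trans | -0.595 | 3e-70
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ICAM2 | Trans | -0.593 | 3e-67
rs579459 | CHD | ABO; LCN1P2 | 21378990 | FAM3D | Trans | 0.581 | 8e-66
rs579459 | CHD | ABO; LCN1P2 | 21378990 | KDR | Trans | -0.508 | 5e-50
rs579459 | CHD | ABO; LCN1P2 | 21378990 | EPHA4 | Trans | -0.492 | 2e-49
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ICAM5 | Trans | -0.457 | 5e-40
rs579459 | CHD | ABO; LCN1P2 | 21378990 | FLT4 | Trans | -0.442 | 1e-36
rs579459 | CHD | ABO; LCN1P2 | 21378990 | F8 | Trans | 0.393 | 9e-32
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ENG | Trans | -0.394 | 1e-30
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ISLR2 | Trans | -0.387 | 1e-28
rs579459 | CHD | ABO; LCN1P2 | 21378990 | KIN | Trans | -0.376 | 2e-28
rs579459 | CHD | ABO; LCN1P2 | 21378990 | GOLM1 | Trans | 0.379 | 2e-28
rs579459 | CHD | ABO; LCN1P2 | 21378990 | CD200 | Trans | -0.357 | 2e-25
rs579459 | CHD | ABO; LCN1P2 | 21378990 | MET | Trans | -0.367 | 3e-25
rs579459 | CHD | ABO; LCN1P2 | 21378990 | GLCE | Trans | 0.360 | 1e-24
rs579459 | CHD | ABO; LCN1P2 | 21378990 | LIFR | Trans | -0.337 | 1e-22
rs579459 | CHD | ABO; LCN1P2 | 21378990 | C1GALT1C1 | Trans | 0.326 | 2e-20
rs579459 | CHD | ABO; LCN1P2 | 21378990 | SHANK3 | Trans | 0.314 | 4e-20
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ICAM4 | Trans | -0.323 | 5e-20
rs579459 | CHD | ABO; LCN1P2 | 21378990 | ACE | Trans | -0.316 | 2e-19
rs579459 | CHD | ABO; LCN1P2 | 21378990 | CHST15 | Trans | -0.292 | 1e-17
rs579459 | CHD | ABO; LCN1P2 | 21378990 | SELP | Trans | -0.248 | 7e-13
rs579459 | CHD | ABO; LCN1P2 | 21378990 | IGF1R | Trans | -0.237 | 1e-12
rs579459 | CHD | ABO; LCN1P2 | 21378990 | CDH5 | Trans | -0.242 | 1e-12
rs579459 | CHD | ABO; LCN1P2 | 21378990 | VWF | Trans | 0.244 | 2e-12
rs579459 | CHD | ABO; LCN1P2 | 21378990 | SEMA6A | Trans | -0.236 | 6e-12
rs579459 | CHD | ABO; LCN1P2 | 21378990 | L1CAM | Trans | -0.232 | 2e-11
rs579459 | CHD | ABO; LCN1P2 | 21378990 | CD109 | Trans | -0.211 | 7e-10
rs579459 | CHD | ABO; LCN1P2 | 21378990 | CCL28 | Trans | -0.217 | 8e-10
rs579459 | CHD | ABO; LCN1P2 | 21378990 | IL6ST | Trans | -0.202 | 3e-09
rs579459 | CHD | ABO; LCN1P2 | 21378990 | CHST12 | Trans | 0.199 | 1e-08
rs579459 | CHD | ABO; LCN1P2 | 21378990 | DPEP2 | Trans | -0.196 | 2e-08
rs579459 | CHD | ABO; LCN1P2 | 21378990 | JAG1 | Trans | -0.185 | 2e-08
rs579459 | CHD | ABO; LCN1P2 | 21378990 | MBL2 | Trans | 0.194 | 2e-08
rs579459 | CHD | ABO; LCN1P2 | 21378990 | B3GNT2 | Trans | 0.191 | 5e-08
rs579459 | CHD | ABO; LCN1P2 | 21378990 | GNS | Trans | -0.189 | 7e-08
rs579459 | CHD | ABO; LCN1P2 | 21378990 | PEAR1 | Trans | -0.184 | 9e-08
rs6235 | Adiposity; Proinsulin | PCSK1; PCSK1 | 18604207; 21873549 | PCSK1 | Cis | 0.979 | 1e-300
rs7756992 | T2D | CDKAL1 | 24509480 | MLN | Trans | 0.245 | 2e-12
rs3132524 | T2D | TCF19; POU5F1 | 24509480 | C4A/B | Trans | 0.244 | 2e-20
rs3132524 | T2D | TCF19; POU5F1 | 24509480 | TGM3 | Trans | 0.215 | 6e-17
rs3132524 | T2D | TCF19; POU5F1 | 24509480 | DNAJC10 | Trans | 0.204 | 3e-15
rs3132524 | T2D | TCF19; POU5F1 | 24509480 | KIR2DS2 | Trans | 0.164 | 2e-10
rs3132524 | T2D | TCF19; POU5F1 | 24509480 | H6PD | Trans | 0.145 | 4e-08
rs16861329 | T2D | ST6GAL1 | 21874001 | ST6GAL1 | Trans | 0.193 | 2e-08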
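
The Table S16 caption derives its trans threshold from a Bonferroni correction over the 20 GWAS loci and the SOMAmers tested. As a rough sketch of that arithmetic: a Bonferroni threshold is the family-wise error rate divided by the number of tests. The family-wise rate of 0.05 and the SOMAmer count of 5000 used below are illustrative assumptions only (a round number chosen so the result lands near the stated 5×10⁻⁷); the study's actual count is not given in this section.

```python
def bonferroni_threshold(alpha: float, n_loci: int, n_somamers: int) -> float:
    """Per-test P-value threshold after Bonferroni correction.

    The number of tests is taken as loci x SOMAmers (every locus tested
    against every SOMAmer), which is an assumption about the test family.
    """
    if n_loci <= 0 or n_somamers <= 0:
        raise ValueError("counts must be positive")
    return alpha / (n_loci * n_somamers)


def passes(p_value: float, threshold: float) -> bool:
    """True if an association is significant at the given threshold."""
    return p_value < threshold


if __name__ == "__main__":
    # alpha=0.05 and n_somamers=5000 are illustrative assumptions, not
    # values taken from the source; n_loci=20 is from the caption.
    thr = bonferroni_threshold(alpha=0.05, n_loci=20, n_somamers=5000)
    print(f"threshold = {thr:.1e}")
    print("7e-08 passes:", passes(7e-08, thr))
```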

Table S17. Identification of cis-to-trans effects on serum proteins

All cis-to-trans pSNP-protein pairs effects detected in the present study using P < 1×10⁻⁸ after Bonferroni corrections (number of proteins and number of cis effects) for significant hits.

(Excel table hosted online)

Table S18. Replication of previously reported cis and trans pSNPs

Confirmation and comparison, via application of the custom designed SOMAscan platform in the AGES cohort, of known cis and trans acting pSNPs-proteins across different study populations and proteomic technologies. The percentage confirmed applies to proteins detected in the AGES. N/A, not applicable. See material and methods for more details.

Study (reference) | Platform | Number of subjects | Number of proteins | Number of cis | Number of trans | % Confirmed cis | % Confirmed trans
Johansson et al. (54) | Mass spectrometry | 1,060 | 163 | 5 | 0 | 100 | N/A
Kim et al. (55) | Immunoassay | 521 | 132 | 28 | 0 | 73.9 | N/A
Enroth et al. (56) | Immunoassay | 1,005 | 92 | 23 | 0 | 62.5 | N/A
Liu et al. (57) | Mass spectrometry | 232 | 342 | 13 | 0 | 87.5 | N/A
Suhre et al. (59) | SOMAmers | 1,000 | 1,124 | 384 | 148 | 88.3 | 84.5
Sun et al. (58) | SOMAmers | 3,301 | 2,994 | 552 | 1,104 | 75.7 | 72.8
Emilsson et al. (present study) | SOMAmers | 5,492 | 4,173 | 1,046 | 911 | N/A | N/A

Table S19. Common variants associated with module E(q)s

Identification of genetic variants associated with different modules E(q)s. Associations were considered genome-wide significant when P < 5×10⁻⁸ (Bonferroni adjusted) while the P-values for suggestive evidence of association were between 5×10⁻⁸ and 5×10⁻⁶. N/A, not applicable.

E(q) | Lead npSNP | P-value | Known GWAS SNP (r² ≥ 0.8)* | GWAS phenotype** | PMID***
PM1 | rs204896 | 1e-09 | rs204896 | RA | 24390342
PM2 | rs704 | 3e-11 | rs704 | OPG | 25080503
PM3 | rs6813952; rs7144389 | 3e-09; 2e-12 | None; None | N/A; N/A | N/A; N/A
PM4 | rs10761731 | 1e-08 | rs10761731 | Platelets, TG | 22139419; 20686565
PM6 | rs13026392 | 6e-07 | None | N/A | N/A
PM7 | rs704; rs887829 | <1e-300; 1e-25 | rs704; rs887829 | OPG; Bilirubin | 25080503; 19414484
PM9 | rs1250229 | 2e-39 | rs1250229 | LDL | 24097068
PM10 | rs704 | 1e-70 | rs704 | OPG | 25080503
PM11 | rs445925; rs157582; rs6857; rs1803274 | 1e-88; 1e-86; 1e-66; 1e-15 | rs157582; rs445925; rs6857; rs844200; rs6445035 | LDL, CHD, Lp-PLA2; LOAD, TG; LOAD, LDL; BCHE; ASPA | 28334899; 22005930; 24162737; 21862451; 23508960
PM12 | rs17836931 | 1e-07 | None | N/A | N/A
PM13 | rs1329424; rs541862 | 9e-13; 2e-11 | rs1329424; rs541862 | AMD, NV; RA, AMD, NV | 23455636; 24390342; 23455636
PM14 | rs541862 | 3e-10 | rs541862 | RA, AMD, NV | 24390342; 23455636
PM15 | rs1329424; rs389512 | 6e-18; 1e-17 | rs1329424; rs389512; rs406936 | AMD, NV; AMD, RA, NV; T1D | 23455636; 22694956; 24390342
PM16 | rs1970793 | 2e-07 | None | N/A | N/A
PM17 | rs17080938 | 1e-07 | None | N/A | N/A
PM18 | rs2562545 | 1e-09 | None | N/A | N/A
PM19 | rs17091323 | 3e-08 | None | N/A | N/A
PM20 | rs719482; rs2885162 | 4e-66; 2e-07 | None; None | N/A; N/A | N/A; N/A
PM21 | rs719482 | 1e-180 | None | N/A | N/A
PM23 | rs357707 | 3e-10 | None | N/A | N/A
PM26 | rs881029 | 1e-07 | None | N/A | N/A
PM27 | rs6683597 | 4e-08 | None | N/A | N/A

*For a qualified proxy the correlation between npSNP and corresponding lead GWAS SNP was r² ≥ 0.8.

**RA, rheumatoid arthritis; OPG, Osteoprotegerin levels; Metabolites, blood metabolites; LDL, LDL-cholesterol levels; LOAD, late-onset Alzheimer´s disease; TG, triglyceride levels; BCHE, butyrylcholinesterase; ASPA, plasma aspirin activity; AMD, age-related macular degeneration; NV, neovascularization

***Known GWAS findings are reported in the PhenoScanner (20), and/or the GWAS catalogue (68).
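
The Table S19 caption sorts associations into two tiers by P-value: genome-wide significant below 5×10⁻⁸, and suggestive between 5×10⁻⁸ and 5×10⁻⁶. A minimal sketch of that rule follows. The function name and the returned labels are hypothetical, and treating both boundaries of the suggestive range as inclusive is an assumption, since the caption does not say how exact boundary values were handled.

```python
GENOME_WIDE = 5e-8   # from the Table S19 caption
SUGGESTIVE = 5e-6    # upper bound of the suggestive range, from the caption


def classify_association(p_value: float) -> str:
    """Label a module-SNP association by the caption's P-value tiers."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P-value must lie in [0, 1]")
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value <= SUGGESTIVE:  # inclusive bounds are an assumption
        return "suggestive"
    return "not significant"


if __name__ == "__main__":
    # P-values as listed for three lead npSNPs in Table S19.
    for module, p in [("PM1", 1e-09), ("PM6", 6e-07), ("PM19", 3e-08)]:
        print(module, p, classify_association(p))
```

Entries reported as bounds in the table, such as "<1e-300", would need to be parsed before being passed to a function like this.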

Table S20. Effects of network associated SNP (npSNP) on individual serum proteins

The SNPs associated with module E(q)s listed in table S19 mediated cis and trans acting effects on multiple proteins which cluster within specific protein modules. The genome-wide significant association threshold for individual cis and trans effects mediated by the npSNPs was set at Bonferroni adjusted P < 5×10⁻⁷ (corrected for number of aptamers and npSNPs tested). FET, Fisher exact test. N/A, not applicable.

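E(q) | Lead npSNP | Adjacent cis effect(s) | #Trans effects | Module affected | #Cis and trans effects in module | FET P-value
PM1 | rs204896 | C4B, TNXB | 78 | PM13 | 33 | 1e-19
PM1 | rs204896 | C4B, TNXB | 78 | PM15 | 18 | 6e-13
PM2 | rs704 | VTN | 698 | PM2 | 27 | 1e-06
PM2 | rs704 | VTN | 698 | PM4 | 34 | 3e-10
PM2 | rs704 | VTN | 698 | PM6 | 68 | 4e-21
PM2 | rs704 | VTN | 698 | PM7 | 87 | 1e-75
PM2 | rs704 | VTN | 698 | PM10 | 160 | 4e-54
PM3 | rs6813952 | None | 81 | PM3 | 67 | 4e-39
PM4 | rs10761731 | None | 27 | PM3 | 18 | 4e-09
PM6 | rs13026392 | None | 61 | PM6 | 22 | 1e-17
PM6 | rs13026392 | None | 61 | PM7 | 15 | 4e-13
PM7 | rs704 | VTN | 698 | PM2 | 27 | 1e-06
PM7 | rs704 | VTN | 698 | PM4 | 34 | 3e-10
PM7 | rs704 | VTN | 698 | PM6 | 68 | 4e-21
PM7 | rs704 | VTN | 698 | PM7 | 87 | 1e-75
PM7 | rs704 | VTN | 698 | PM10 | 160 | 4e-54
PM7 | rs887829 | UGT1A6 | 8 | PM1 | 7 | 1e-14
PM9 | rs1250229 | FN1 | 6 | None | N/A | N/A
PM10 | rs704 | VTN | 698 | PM2 | 27 | 1e-06
PM10 | rs704 | VTN | 698 | PM4 | 34 | 3e-10
PM10 | rs704 | VTN | 698 | PM6 | 68 | 4e-21
PM10 | rs704 | VTN | 698 | PM7 | 87 | 1e-75
PM10 | rs704 | VTN | 698 | PM10 | 160 | 4e-54
PM11 | rs445925 | APOE | 37 | PM11 | 16 | 4e-25
PM11 | rs157582 | APOE | 37 | PM11 | 19 | 1e-31
PM11 | rs6857 | None | 35 | PM11 | 19 | 5e-32
PM11 | rs1803274 | BCHE | 20 | PM11 | 9 | 3e-15
PM12 | rs17836931 | None | 27 | PM12 | 6 | 2e-06
PM12 | rs17836931 | None | 27 | PM14 | 8 | 1e-08
PM13 | rs1329424 | CFHR1, 4, 5 | 129 | PM13 | 48 | 1e-24
PM13 | rs1329424 | CFHR1, 4, 5 | 129 | PM15 | 32 | 3e-22
PM13 | rs541862 | C4A/B, CFB | 106 | PM13 | 55 | 6e-37
PM13 | rs541862 | C4A/B, CFB | 106 | PM15 | 37 | 3e-31
PM14 | rs541862 | C4A/B, CFB | 106 | PM13 | 55 | 6e-37
PM14 | rs541862 | C4A/B, CFB | 106 | PM15 | 37 | 3e-31
PM15 | rs1329424 | CFHR1, 4, 5 | 129 | PM13 | 48 | 1e-25
PM15 | rs1329424 | CFHR1, 4, 5 | 129 | PM15 | 32 | 3e-22
PM15 | rs389512 | C4A/B, CFB | 158 | PM13 | 68 | 1e-38
PM15 | rs389512 | C4A/B, CFB | 158 | PM15 | 46 | 4e-34
PM16 | rs1970793 | None | 19 | PM16 | 15 | 4e-19
PM17 | rs17080938 | None | 7 | PM17 | 5 | 3e-09
PM18 | rs2562545 | None | 9 | None | N/A | N/A
PM19 | rs17091323 | None | 11 | PM19 | 3 | 0.0006
PM19 | rs17091323 | None | 11 | PM26 | 5 | 0.0007
PM20 | rs719482 | IGHG1-4 | 68 | PM20 | 25 | 7e-34
PM20 | rs719482 | IGHG1-4 | 68 | PM25 | 23 | 6e-31
PM20 | rs2885162 | None | 10 | PM20 | 6 | 2e-11
PM21 | rs719482 | IGHG1-4 | 68 | PM20 | 25 | 7e-34
PM21 | rs719482 | IGHG1-4 | 68 | PM25 | 23 | 6e-31
PM23 | rs357707 | None | 20 | PM23 | 11 | 2e-18
PM26 | rs881029 | None | 43 | PM26 | 31 | 8e-26
PM27 | rs6683597 | None | 28 | PM27 | 15 | 1e-10
PM27 | rs6683597 | None | 28 | PM26 | 9 | 0.0001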
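
Table S20 reports a Fisher exact test (FET) P-value for how strongly each npSNP's protein effects concentrate in a given module. One common way to compute such an enrichment is the one-sided Fisher exact test, which equals the upper tail of the hypergeometric distribution. The sketch below assumes that one-sided form and uses a small made-up example; the source does not state the sidedness or the background protein count, so it is not expected to reproduce the table's values.

```python
from math import comb


def enrichment_p(n_background: int, n_module: int,
                 n_hits: int, n_hits_in_module: int) -> float:
    """One-sided Fisher exact (hypergeometric upper-tail) P-value.

    Probability of seeing at least `n_hits_in_module` module members among
    `n_hits` proteins drawn without replacement from `n_background`
    proteins, of which `n_module` belong to the module.
    """
    if not (0 <= n_module <= n_background and 0 <= n_hits <= n_background):
        raise ValueError("module size and hit count must fit the background")
    if not 0 <= n_hits_in_module <= min(n_module, n_hits):
        raise ValueError("overlap cannot exceed module size or hit count")
    upper = min(n_module, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_module, i) * comb(n_background - n_module, n_hits - i)
        for i in range(n_hits_in_module, upper + 1)
    )
    return tail / total  # exact integer arithmetic until this division


if __name__ == "__main__":
    # Toy numbers (not from the study): 20 proteins in total, 5 in the
    # module, 4 affected by the SNP, 3 of those inside the module.
    print(enrichment_p(20, 5, 4, 3))
```

Summing exact binomial coefficients before dividing avoids the underflow that floating-point probabilities would hit at very small P-values such as those in the table.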

Table S21. Tissue specificity of cis-to-trans protein pairs

Tissue specific expression of transcripts encoding the cis-to-trans regulated proteins based on 53 different human tissues (median RPKM by tissue) downloaded from GTEx (https://www.gtexportal.org) on 07/25/2017.

(Excel table hosted online)

Table S22. Tissue specificity of npSNPs

Tissue-specificity of network-associated protein SNPs (npSNPs).

(Excel table hosted online)

References and Notes

1. J. M. Schwenk, G. S. Omenn, Z. Sun, D. S. Campbell, M. S. Baker, C. M. Overall, R. Aebersold, R. L. Moritz, E. W. Deutsch, The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from Mass Spectrometry and Complementary Assays. J. Proteome Res. 16, 4299–4310 (2017). doi:10.1021/acs.jproteome.7b00467 Medline

2. M. Uhlén, L. Fagerberg, B. M. Hallström, C. Lindskog, P. Oksvold, A. Mardinoglu, Å. Sivertsson, C. Kampf, E. Sjöstedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A.-K. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P.-H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen, F. Pontén, Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015). doi:10.1126/science.1260419 Medline

3. M. Stastna, J. E. Van Eyk, Secreted proteins as a fundamental source for biomarker discovery. Proteomics 12, 722–735 (2012). doi:10.1002/pmic.201100346 Medline

4. I. M. Conboy, M. J. Conboy, A. J. Wagers, E. R. Girma, I. L. Weissman, T. A. Rando, Rejuvenation of aged progenitor cells by exposure to a young systemic environment. Nature 433, 760–764 (2005). doi:10.1038/nature03260 Medline

5. S. A. Villeda, J. Luo, K. I. Mosher, B. Zou, M. Britschgi, G. Bieri, T. M. Stan, N. Fainberg, Z. Ding, A. Eggel, K. M. Lucin, E. Czirr, J.-S. Park, S. Couillard-Després, L. Aigner, G. Li, E. R. Peskind, J. A. Kaye, J. F. Quinn, D. R. Galasko, X. S. Xie, T. A. Rando, T. Wyss-Coray, The ageing systemic milieu negatively regulates neurogenesis and cognitive function. Nature 477, 90–94 (2011). doi:10.1038/nature10357 Medline

6. E. E. Schadt, Molecular networks as sensors and drivers of common human diseases. Nature 461, 218–223 (2009). doi:10.1038/nature08454 Medline

7. B. Zhang, C. Gaiteri, L.-G. Bodea, Z. Wang, J. McElwee, A. A. Podtelezhnikov, C. Zhang, T. Xie, L. Tran, R. Dobrin, E. Fluder, B. Clurman, S. Melquist, M. Narayanan, C. Suver, H. Shah, M. Mahajan, T. Gillis, J. Mysore, M. E. MacDonald, J. R. Lamb, D. A. Bennett, C. Molony, D. J. Stone, V. Gudnason, A. J. Myers, E. E. Schadt, H. Neumann, J. Zhu, V. Emilsson, Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013). doi:10.1016/j.cell.2013.03.030 Medline

8. V. Emilsson, G. Thorleifsson, B. Zhang, A. S. Leonardson, F. Zink, J. Zhu, S. Carlson, A. Helgason, G. B. Walters, S. Gunnarsdottir, M. Mouy, V. Steinthorsdottir, G. H. Eiriksdottir, G. Bjornsdottir, I. Reynisdottir, D. Gudbjartsson, A. Helgadottir, A. Jonasdottir, A. Jonasdottir, U. Styrkarsdottir, S. Gretarsdottir, K. P. Magnusson, H. Stefansson, R. Fossdal, K. Kristjansson, H. G. Gislason, T. Stefansson, B. G. Leifsson, U. Thorsteinsdottir, J. R. Lamb, J. R. Gulcher, M. L. Reitman, A. Kong, E. E. Schadt, K. Stefansson, Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008). doi:10.1038/nature06758 Medline

9. Y. Chen, J. Zhu, P. Y. Lum, X. Yang, S. Pinto, D. J. MacNeil, C. Zhang, J. Lamb, S. Edwards, S. K. Sieberts, A. Leonardson, L. W. Castellini, S. Wang, M.-F. Champy, B. Zhang, V. Emilsson, S. Doss, A. Ghazalpour, S. Horvath, T. A. Drake, A. J. Lusis, E. E. Schadt, Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435 (2008). doi:10.1038/nature06757 Medline

10. D. R. Davies, A. D. Gelinas, C. Zhang, J. C. Rohloff, J. D. Carter, D. O’Connell, S. M. Waugh, S. K. Wolk, W. S. Mayfield, A. B. Burgin, T. E. Edwards, L. J. Stewart, L. Gold, N. Janjic, T. C. Jarvis, Unique motifs and hydrophobic interactions shape the binding of modified DNA ligands to protein targets. Proc. Natl. Acad. Sci. U.S.A. 109, 19971–19976 (2012). doi:10.1073/pnas.1213933109 Medline

11. L. Gold, D. Ayers, J. Bertino, C. Bock, A. Bock, E. N. Brody, J. Carter, A. B. Dalby, B. E. Eaton, T. Fitzwater, D. Flather, A. Forbes, T. Foreman, C. Fowler, B. Gawande, M. Goss, M. Gunn, S. Gupta, D. Halladay, J. Heil, J. Heilig, B. Hicke, G. Husar, N. Janjic, T. Jarvis, S. Jennings, E. Katilius, T. R. Keeney, N. Kim, T. H. Koch, S. Kraemer, L. Kroiss, N. Le, D. Levine, W. Lindsey, B. Lollo, W. Mayfield, M. Mehan, R. Mehler, S. K. Nelson, M. Nelson, D. Nieuwlandt, M. Nikrad, U. Ochsner, R. M. Ostroff, M. Otis, T. Parker, S. Pietrasiewicz, D. I. Resnicow, J. Rohloff, G. Sanders, S. Sattin, D. Schneider, B. Singer, M. Stanton, A. Sterkel, A. Stewart, S. Stratford, J. D. Vaught, M. Vrkljan, J. J. Walker, M. Watrobka, S. Waugh, A. Weiss, S. K. Wilcox, A. Wolfson, S. K. Wolk, C. Zhang, D. Zichi, Aptamer-based multiplexed proteomic technology for biomarker discovery. PLOS ONE 5, e15004 (2010). doi:10.1371/journal.pone.0015004 Medline

12. T. B. Harris, L. J. Launer, G. Eiriksdottir, O. Kjartansson, P. V. Jonsson, G. Sigurdsson, G. Thorgeirsson, T. Aspelund, M. E. Garcia, M. F. Cotch, H. J. Hoffman, V. Gudnason, Age, Gene/Environment Susceptibility-Reykjavik Study: Multidisciplinary applied phenomics. Am. J. Epidemiol. 165, 1076–1087 (2007). doi:10.1093/aje/kwk115 Medline

13. A. L. Barabási, R. Albert, Emergence of scaling in random networks. Science 286, 509–512 (1999). doi:10.1126/science.286.5439.509 Medline

14. B. Zhang, S. Horvath, A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, e17 (2005). doi:10.2202/1544-6115.1128 Medline

15. P. Langfelder, R. Luo, M. C. Oldham, S. Horvath, Is my network module preserved and reproducible? PLOS Comput. Biol. 7, e1001057 (2011). doi:10.1371/journal.pcbi.1001057 Medline

16. L. Shu, K. H. K. Chan, G. Zhang, T. Huan, Z. Kurt, Y. Zhao, V. Codoni, D.-A. Trégouët, J. Yang, J. G. Wilson, X. Luo, D. Levy, A. J. Lusis, S. Liu, X. Yang; Cardiogenics Consortium, Shared genetic regulatory networks for cardiovascular disease and type 2 diabetes in multiple populations of diverse ethnicities in the United States. PLOS Genet. 13, e1007040 (2017). doi:10.1371/journal.pgen.1007040 Medline

17. H. Jeong, S. P. Mason, A. L. Barabási, Z. N. Oltvai, Lethality and centrality in protein networks. Nature 411, 41–42 (2001). doi:10.1038/35075138 Medline

18. A. L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: A network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011). doi:10.1038/nrg2918 Medline

19. M. Muñoz, R. Pong-Wong, O. Canela-Xandri, K. Rawlik, C. S. Haley, A. Tenesa, Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank. Nat. Genet. 48, 980–983 (2016). Medline

20. J. R. Staley, J. Blackshaw, M. A. Kamat, S. Ellis, P. Surendran, B. B. Sun, D. S. Paul, D. Freitag, S. Burgess, J. Danesh, R. Young, A. S. Butterworth, PhenoScanner: A database of human genotype-phenotype associations. Bioinformatics 32, 3207–3209 (2016). doi:10.1093/bioinformatics/btw373 Medline

21. N. Mähler, J. Wang, B. K. Terebieniec, P. K. Ingvarsson, N. R. Street, T. R. Hvidsten, Gene co-expression network connectivity is an important determinant of selective constraint. PLOS Genet. 13, e1006402 (2017). doi:10.1371/journal.pgen.1006402 Medline

22. J. K. Pickrell, T. Berisa, J. Z. Liu, L. Ségurel, J. Y. Tung, D. A. Hinds, Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016). doi:10.1038/ng.3570 Medline

23. M. Franchini, G. Lippi, The intriguing relationship between the ABO blood group, cardiovascular disease, and cancer. BMC Med. 13, 7 (2015). doi:10.1186/s12916-014-0250-y Medline

24. M. Franchini, F. Capra, G. Targher, M. Montagnana, G. Lippi, Relationship between ABO blood group and von Willebrand factor levels: From biology to clinical implications. Thromb. J. 5, 14 (2007). doi:10.1186/1477-9560-5-14 Medline

25. E. A. Boyle, Y. I. Li, J. K. Pritchard, An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186 (2017). doi:10.1016/j.cell.2017.05.038 Medline

26. D. Alfego, U. Rodeck, A. Kriete, Global mapping of transcription factor motifs in human aging. PLOS ONE 13, e0190457 (2018). doi:10.1371/journal.pone.0190457 Medline

27. J. Yang, T. Huang, F. Petralia, Q. Long, B. Zhang, C. Argmann, Y. Zhao, C. V. Mobbs, E. E. Schadt, J. Zhu, Z. Tu; GTEx Consortium, Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases. Sci. Rep. 5, 15145 (2015). doi:10.1038/srep15145 Medline

28. J. M. Zahn, S. Poosala, A. B. Owen, D. K. Ingram, A. Lustig, A. Carter, A. T. Weeraratna, D. D. Taub, M. Gorospe, K. Mazan-Mamczarz, E. G. Lakatta, K. R. Boheler, X. Xu, M. P. Mattson, G. Falco, M. S. H. Ko, D. Schlessinger, J. Firman, S. K. Kummerfeld, W. H. Wood 3rd, A. B. Zonderman, S. K. Kim, K. G. Becker, AGEMAP: A gene expression database for aging in mice. PLOS Genet. 3, e201 (2007). doi:10.1371/journal.pgen.0030201 Medline

29. P. Langfelder, B. Zhang, S. Horvath, Defining clusters from a hierarchical cluster tree: The Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008). doi:10.1093/bioinformatics/btm563 Medline

30. G. Csardi, T. Nepusz, The igraph software package for complex network research. InterJournal. Complex Syst. 1695, 1 (2006).

31. American Diabetes Association, Diagnosis and classification of diabetes mellitus. Diabetes Care 36 (suppl. 1), S67–S74 (2013). doi:10.2337/dc13-S067 Medline

32. A. Agarwala, S. Virani, D. Couper, L. Chambless, E. Boerwinkle, B. C. Astor, R. C. Hoogeveen, J. Coresh, A. R. Sharrett, A. R. Folsom, T. Mosley, C. M. Ballantyne, V. Nambi, Biomarkers and degree of atherosclerosis are independently associated with incident atherosclerotic cardiovascular disease in a primary prevention cohort: The ARIC study. Atherosclerosis 253, 156–163 (2016). doi:10.1016/j.atherosclerosis.2016.08.028 Medline

33. Y. Hathout, E. Brody, P. R. Clemens, L. Cripe, R. K. DeLisle, P. Furlong, H. Gordish-Dressman, L. Hache, E. Henricson, E. P. Hoffman, Y. M. Kobayashi, A. Lorts, J. K. Mah, C. McDonald, B. Mehler, S. Nelson, M. Nikrad, B. Singer, F. Steele, D. Sterling, H. L. Sweeney, S. Williams, L. Gold, Large-scale serum protein biomarker discovery in Duchenne muscular dystrophy. Proc. Natl. Acad. Sci. U.S.A. 112, 7153–7158 (2015). doi:10.1073/pnas.1507719112 Medline

34. J. Candia, F. Cheung, Y. Kotliarov, G. Fantoni, B. Sellers, T. Griesman, J. Huang, S. Stuccio, A. Zingone, B. M. Ryan, J. S. Tsang, A. Biancotto, Assessment of Variability in the SOMAscan Assay. Sci. Rep. 7, 14248 (2017). doi:10.1038/s41598-017-14755-5 Medline

35. M. Kuhn, K. Johnson, Applied Predictive Modeling (Springer, 2013).

36. J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehár, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jané-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi Jr., M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill, W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, L. A. Garraway, The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). doi:10.1038/nature11003 Medline

37. B. MacLean, D. M. Tomazela, S. E. Abbatiello, S. Zhang, J. R. Whiteaker, A. G. Paulovich, S. A. Carr, M. J. Maccoss, Effect of collision energy optimization on the measurement of peptides by selected reaction monitoring (SRM) mass spectrometry. Anal. Chem. 82, 10116–10124 (2010). doi:10.1021/ac102179j Medline

38. Y. Mohammed, D. Domański, A. M. Jackson, D. S. Smith, A. M. Deelder, M. Palmblad, C. H. Borchers, PeptidePicker: A scientific workflow with web interface for selecting appropriate peptides for targeted proteomics experiments. J. Proteomics 106, 151–161 (2014). doi:10.1016/j.jprot.2014.04.018 Medline

39. B. MacLean, D. M. Tomazela, N. Shulman, M. Chambers, G. L. Finney, B. Frewen, R. Kern, D. L. Tabb, D. C. Liebler, M. J. MacCoss, Skyline: An open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010). doi:10.1093/bioinformatics/btq054 Medline

40. J. A. Vizcaíno, R. G. Côté, A. Csordas, J. A. Dianes, A. Fabregat, J. M. Foster, J. Griss, E. Alpi, M. Birim, J. Contell, G. O’Kelly, A. Schoenegger, D. Ovelleiro, Y. Pérez-Riverol, F. Reisinger, D. Ríos, R. Wang, H. Hermjakob, The PRoteomics IDEntifications (PRIDE) database and associated tools: Status in 2013. Nucleic Acids Res. 41, D1063–D1069 (2013). doi:10.1093/nar/gks1262 Medline

41. M. M. Chan, R. Santhanakrishnan, J. P. C. Chong, Z. Chen, B. C. Tai, O. W. Liew, T. P. Ng, L. H. Ling, D. Sim, K. T. G. Leong, P. S. D. Yeo, H.-Y. Ong, F. Jaufeerally, R. C.-C. Wong, P. Chai, A. F. Low, A. M. Richards, C. S. P. Lam, Growth differentiation factor 15 in heart failure with preserved vs. reduced ejection fraction. Eur. J. Heart Fail. 18, 81–88 (2016). doi:10.1002/ejhf.431 Medline

42. P. G. van Peet, A. J. de Craen, J. Gussekloo, W. de Ruijter, Plasma NT-proBNP as predictor of change in functional status, cardiovascular morbidity and mortality in the oldest old: The Leiden 85-plus study. Age (Dordr.) 36, 9660 (2014). doi:10.1007/s11357-014-9660-1 Medline

43. P. Langfelder, S. Horvath, WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). doi:10.1186/1471-2105-9-559 Medline

44. E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, A. L. Barabási, Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002). doi:10.1126/science.1073374 Medline

45. S. L. Carter, C. M. Brechbühler, M. Griffin, A. T. Bond, Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics 20, 2242–2250 (2004). doi:10.1093/bioinformatics/bth234 Medline

46. M. C. Oldham, S. Horvath, D. H. Geschwind, Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc. Natl. Acad. Sci. U.S.A. 103, 17973–17978 (2006). doi:10.1073/pnas.0605938103 Medline

47. R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks. Nature 406, 378–382 (2000). doi:10.1038/35019019 Medline

48. R. Albert, A.-L. Barabási, Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002). doi:10.1103/RevModPhys.74.47

49. J.-D. J. Han, N. Bertin, T. Hao, D. S. Goldberg, G. F. Berriz, L. V. Zhang, D. Dupuy, A. J. M. Walhout, M. E. Cusick, F. P. Roth, M. Vidal, Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88–93 (2004). doi:10.1038/nature02555 Medline

50. G. Chauhan, C. R. Arnold, A. Y. Chu, M. Fornage, A. Reyahi, J. C. Bis, A. S. Havulinna, M. Sargurupremraj, A. V. Smith, H. H. H. Adams, S. H. Choi, S. L. Pulit, S. Trompet, M. E. Garcia, A. Manichaikul, A. Teumer, S. Gustafsson, T. M. Bartz, C. Bellenguez, J. S. Vidal, X. Jian, O. Kjartansson, K. L. Wiggins, C. L. Satizabal, F. Xue, S. Ripatti, Y. Liu, J. Deelen, M. den Hoed, S. Bevan, J. C. Hopewell, R. Malik, S. R. Heckbert, K. Rice, N. L. Smith, C. Levi, P. Sharma, C. L. M. Sudlow, A. M. Nik, J. W. Cole, R. Schmidt, J. Meschia, V. Thijs, A. Lindgren, O. Melander, R. P. Grewal, R. L. Sacco, T. Rundek, P. M. Rothwell, D. K. Arnett, C. Jern, J. A. Johnson, O. R. Benavente, S. Wassertheil-Smoller, J.-M. Lee, Q. Wong, H. J. Aparicio, S. T. Engelter, M. Kloss, D. Leys, A. Pezzini, J. E. Buring, P. M. Ridker, C. Berr, J.-F. Dartigues, A. Hamsten, P. K. Magnusson, M. Traylor, N. L. Pedersen, L. Lannfelt, L. Lindgren, C. M. Lindgren, A. P. Morris, J. Jimenez-Conde, J. Montaner, F. Radmanesh, A. Slowik, D. Woo, A. Hofman, P. J. Koudstaal, M. L. P. Portegies, A. G. Uitterlinden, A. J. M. de Craen, I. Ford, J. W. Jukema, D. J. Stott, N. B. Allen, M. M. Sale, A. D. Johnson, D. A. Bennett, P. L. De Jager, C. C. White, H. J. Grabe, M. R. P. Markus, U. Schminke, G. B. Boncoraglio, R. Clarke, Y. Kamatani, J. Dallongeville, O. L. Lopez, J. I. Rotter, M. A. Nalls, R. F. Gottesman, M. E. Griswold, D. S. Knopman, B. G. Windham, A. Beiser, H. S. Markus, E. Vartiainen, C. R. French, M. Dichgans, T. Pastinen, M. Lathrop, V. Gudnason, T. Kurth, B. M. Psaty, T. B. Harris, S. S. Rich, A. L. deStefano, C. O. Schmidt, B. B. Worrall, J. Rosand, V. Salomaa, T. H. Mosley, E. Ingelsson, C. M. van Duijn, C. Tzourio, K. M. Rexrode, O. J. Lehmann, L. J. Launer, M. A. Ikram, P. Carlsson, D. I. Chasman, S. J. Childs, W. T. Longstreth, S. Seshadri, S. 
Debette; Neurology Working Group of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the Stroke Genetics Network (SiGN), and the International Stroke Genetics Consortium, Identification of additional risk loci for stroke and small vessel disease: A meta-analysis of genome-wide association studies. Lancet Neurol. 15, 695–707 (2016). doi:10.1016/S1474-4422(16)00102-2

51. E. J. Foss, D. Radulovic, S. A. Shaffer, D. R. Goodlett, L. Kruglyak, A. Bedalov, Genetic variation shapes protein networks mainly through non-transcriptional mechanisms. PLOS Biol. 9, e1001144 (2011). doi:10.1371/journal.pbio.1001144

52. C. Gaiteri, Y. Ding, B. French, G. C. Tseng, E. Sibille, Beyond modules and hubs: The potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. Genes Brain Behav. 13, 13–24 (2014). doi:10.1111/gbb.12106

53. D. L. Nicolae, E. Gamazon, W. Zhang, S. Duan, M. E. Dolan, N. J. Cox, Trait-associated SNPs are more likely to be eQTLs: Annotation to enhance discovery from GWAS. PLOS Genet. 6, e1000888 (2010). doi:10.1371/journal.pgen.1000888

54. Å. Johansson, S. Enroth, M. Palmblad, A. M. Deelder, J. Bergquist, U. Gyllensten, Identification of genetic variants influencing the human plasma proteome. Proc. Natl. Acad. Sci. U.S.A. 110, 4673–4678 (2013). doi:10.1073/pnas.1217238110

55. S. Kim, S. Swaminathan, M. Inlow, S. L. Risacher, K. Nho, L. Shen, T. M. Foroud, R. C. Petersen, P. S. Aisen, H. Soares, J. B. Toledo, L. M. Shaw, J. Q. Trojanowski, M. W. Weiner, B. C. McDonald, M. R. Farlow, B. Ghetti, A. J. Saykin; Alzheimer’s Disease Neuroimaging Initiative (ADNI), Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel. PLOS ONE 8, e70269 (2013). doi:10.1371/journal.pone.0070269

56. S. Enroth, A. Johansson, S. B. Enroth, U. Gyllensten, Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs. Nat. Commun. 5, 4684 (2014). doi:10.1038/ncomms5684

57. Y. Liu, A. Buil, B. C. Collins, L. C. Gillet, L. C. Blum, L.-Y. Cheng, O. Vitek, J. Mouritsen, G. Lachance, T. D. Spector, E. T. Dermitzakis, R. Aebersold, Quantitative variability of 342 plasma proteins in a human twin population. Mol. Syst. Biol. 11, 786 (2015). doi:10.15252/msb.20145728

58. B. B. Sun, J. C. Maranville, J. E. Peters, D. Stacey, J. R. Staley, J. Blackshaw, S. Burgess, T. Jiang, E. Paige, P. Surendran, C. Oliver-Williams, M. A. Kamat, B. P. Prins, S. K. Wilcox, E. S. Zimmerman, A. Chi, N. Bansal, S. L. Spain, A. M. Wood, N. W. Morrell, J. R. Bradley, N. Janjic, D. J. Roberts, W. H. Ouwehand, J. A. Todd, N. Soranzo, K. Suhre, D. S. Paul, C. S. Fox, R. M. Plenge, J. Danesh, H. Runz, A. S. Butterworth, Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018). doi:10.1038/s41586-018-0175-2

59. K. Suhre, M. Arnold, A. M. Bhagwat, R. J. Cotton, R. Engelke, J. Raffler, H. Sarwath, G. Thareja, A. Wahl, R. K. DeLisle, L. Gold, M. Pezer, G. Lauc, M. A. El-Din Selim, D. O. Mook-Kanamori, E. K. Al-Dous, Y. A. Mohamoud, J. Malek, K. Strauch, H. Grallert, A. Peters, G. Kastenmüller, C. Gieger, J. Graumann, Connecting genetic risk to disease end points through the human blood plasma proteome. Nat. Commun. 8, 14357 (2017). doi:10.1038/ncomms14357

60. I. Bhattacharya, Z. Manukyan, P. Chan, A. Heatherington, L. Harnisch, Application of Quantitative Pharmacology Approaches in Bridging Pharmacokinetics and Pharmacodynamics of Domagrozumab From Adult Healthy Subjects to Pediatric Patients With Duchenne Muscular Disease. J. Clin. Pharmacol. 58, 314–326 (2018).

61. K. Kondás, G. Szláma, M. Trexler, L. Patthy, Both WFIKKN1 and WFIKKN2 have high affinity for growth and differentiation factors 8 and 11. J. Biol. Chem. 283, 23677–23684 (2008). doi:10.1074/jbc.M803025200

62. H. Sun, Y. Zhu, H. Pan, X. Chen, J. L. Balestrini, T. T. Lam, J. E. Kanyo, A. Eichmann, M. Gulati, W. H. Fares, H. Bai, C. A. Feghali-Bostwick, Y. Gan, X. Peng, M. W. Moore, E. S. White, P. Sava, A. L. Gonzalez, Y. Cheng, L. E. Niklason, E. L. Herzog, Netrin-1 Regulates Fibrocyte Accumulation in the Decellularized Fibrotic Sclerodermatous Lung Microenvironment and in Bleomycin-Induced Pulmonary Fibrosis. Arthritis Rheumatol. 68, 1251–1261 (2016).

63. GTEx Consortium, The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015). doi:10.1126/science.1262110

64. J. Wang, D. Duncan, Z. Shi, B. Zhang, WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): Update 2013. Nucleic Acids Res. 41, W77–W83 (2013). doi:10.1093/nar/gkt439

65. J. E. Shoemaker, T. J. S. Lopes, S. Ghosh, Y. Matsuoka, Y. Kawaoka, H. Kitano, CTen: A web-based platform for identifying enriched cell types from heterogeneous microarray data. BMC Genomics 13, 460 (2012). doi:10.1186/1471-2164-13-460

66. D. Warde-Farley, S. L. Donaldson, O. Comes, K. Zuberi, R. Badrawi, P. Chao, M. Franz, C. Grouios, F. Kazi, C. T. Lopes, A. Maitland, S. Mostafavi, J. Montojo, Q. Shao, G. Wright, G. D. Bader, Q. Morris, The GeneMANIA prediction server: Biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38 (suppl. 2), W214–W220 (2010). doi:10.1093/nar/gkq537

67. D. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009). doi:10.1038/nprot.2008.211

68. D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek, T. Manolio, L. Hindorff, H. Parkinson, The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42 (D1), D1001–D1006 (2014). doi:10.1093/nar/gkt1229