Identification of personalized multi-omic disease modules in asthma

Master Degree Project in Systems Biology

A2E (60 ECTS)

Autumn 2017 - Spring 2018

Version 2

Author

David Martínez Enguita

Main supervisor

Mika Gustafsson
Department of Physics, Chemistry and Biology
Linköping University
581 83 Linköping
Sweden

Co-supervisor

Tejaswi V. S. Badam

Examiner

Björn Olsson
School of Bioscience
University of Skövde
PO Box 408
541 28 Skövde
Sweden

Local supervisor

Zelmina Lubovac

Abstract

Asthma is a respiratory syndrome associated with airflow limitation, bronchial hyperresponsiveness and inflammation of the airways in the lungs. Despite the ongoing research efforts, the outstanding heterogeneity displayed by the multiple forms in which this condition presents often hampers the attempts to determine and classify the phenotypic and endotypic biological structures at play, even when considering a limited assembly of asthmatic subjects. To increase our understanding of the molecular mechanisms and functional pathways that govern asthma from a systems medicine perspective, a computational workflow focused on the identification of personalized transcriptomic modules from the U-BIOPRED study cohorts, by the use of the novel MODifieR integrated R package, was designed and applied. A feature selection of candidate asthma biomarkers was implemented, accompanied by the detection of differentially expressed genes across sample categories, the production of patient-specific gene modules and the subsequent construction of a set of core disease modules of asthma, which were validated with genomic data and analyzed for pathway and disease enrichment. The results indicate that the approach utilized is able to reveal the presence of components and signaling routes known to be crucially involved in asthma pathogenesis, while simultaneously uncovering candidate genes closely linked to the latter. The present project establishes a valuable pipeline for the module-driven study of asthma and other related conditions, which can provide new potential targets for therapeutic intervention and contribute to the development of individualized treatment strategies.

Popular scientific summary

Asthma is a respiratory disease with very diverse manifestations that involves long-term inflammation of the airways in the lungs. This condition usually develops in early childhood but can appear at any stage in life. The latest estimates indicate that one out of ten children has experienced asthma symptoms (such as shortness of breath, chest tightness, and wheezing) and one out of twenty young adults has been diagnosed with asthma or is under treatment for it. Although asthma was previously considered a disease of high-income areas, the majority of affected individuals currently come from less economically developed countries, where asthma prevalence has been rapidly increasing in recent years. Some asthmatics present a more serious form of the disease, named "severe asthma", which is characterized by the recurrent risk of suffering life-threatening episodes known as "asthma attacks". In order to prevent these attacks, bronchodilators and corticosteroids are administered to patients, with differing efficacy.

The wide variety of forms in which asthma can occur poses a challenge to researchers and physicians when trying to discern which presentation of the disease is affecting a particular individual and what treatment will provide the optimal results in that case. To help solve this problem, a well-established strategy is the classification of the clinical and physiological aspects of asthma into several categories (phenotypes) and the identification of the biological mechanisms governing them (endotypes). Thus, the more information that can be gathered about the phenotypic and endotypic features of a patient, the better the control that can be achieved over the diagnosis of asthma, the required medication type and dosage, the potential side effects of that medication, and its interplay with other drugs. The fullest expression of this approach is personalized medicine, in which the scope is narrowed to the smallest component: the concrete manifestation and causative mechanism of a disease in a single individual.

Biological mechanisms and pathways are complex and dynamic structures with highly intricate regulation systems. With the objective of displaying them in a coherent and understandable manner, representations known as biological networks can be constructed, formed by nodes (genes, proteins, or other molecules) and edges (interactions between nodes). Densely connected areas of a network are defined as modules, which might be associated with a certain function or, due to a defective performance of that function, with the presence of a form of a disease. Once the disrupted modules are found (disease modules), the patients can be sorted according to them, and their constituents can be inspected to determine novel candidate targets for drugs, which would be specific for each of these groups. The aim of this work was the identification of asthma-linked modules obtained from the combination of data from the gene expression analysis and the genome-wide study of genetic variants of participants in the U-BIOPRED project, by the development of a workflow for the personalized profiling of samples, from which the corresponding modules could be extracted, validated and interpreted.

The initial outcome of the project consisted of the production of individualized modules for samples of the U-BIOPRED study of asthma biomarkers. These served as the basis for the subsequent construction and validation of a core disease module of severe asthma, which was found to contain a variety of genes that have been tied to asthma occurrence, progression, or severity. Additionally, the analysis of the core module revealed the presence of multiple molecular signaling routes that are known to correspond to distinct asthma phenotypes. In conclusion, these results suggest that the module-based approach is a valuable strategy for the detection of specific disease genes and their related biological pathways, able to generate a core network of elements enriched for a certain condition, in comparison to healthy controls or other sample categories. Likewise, new insights regarding the pathology of the studied syndrome can be gained by further examining genes of interest found in the modules that had no previous association with the disease. The workflow designed and tested is not restricted to asthma; it can be applied to any set of inputs with an analogous source or format.

Table of Contents

Introduction

Materials and Methods

Project outline and workflow

Microarray transcriptomic profiling of asthma

Feature selection and linear modeling

MODifieR: a robust module identification method

Module production and workflow optimization

Genome-wide profiling of asthma

PASCAL module scoring and pathway analysis

Core module development and network modeling

Results

Linear modeling and covariate analysis

SNP association with asthma in U-BIOPRED

Identification of individual modules

Core modules I: PASCAL validation

Core modules II: Network modeling and enrichment analysis

Discussion

Ethical aspects and impact of the research on the society

Future perspectives

Acknowledgments

References

Appendices

Appendix A

Appendix B

Appendix C

Appendix D

Appendix E

Appendix F

Appendix G

Appendix H

Appendix I

Appendix J

Appendix K

Abbreviations

BMI Body Mass Index

DEG Differentially expressed gene

DO Disease Ontology

ErbB Erythroblastic leukemia viral oncogene

FAMD Factor analysis of mixed data

FDR False discovery rate

FeNO Fractional exhaled nitric oxide

FEV1 Forced expiratory volume in 1 second

GLM Generalized linear model

GO Gene Ontology

GWAS Genome wide association study

ICS Inhaled corticosteroids

LABA Long-acting β2-agonists

LD Linkage disequilibrium

limma Linear models for microarray data

LR Likelihood ratio

MFA Multiple factor analysis

MLDM Multi-layer disease module

mTOR Mammalian target of rapamycin

OCS Oral corticosteroids

PCA Principal component analysis

PI3K Phosphatidylinositol 3-kinase

PPI Protein-protein interaction network

SNP Single nucleotide polymorphism

Introduction

Asthma is commonly recognized as a variable condition associated with the chronic inflammation and remodeling of the airways of the lungs. Currently, the term "asthma" has evolved to refer to a highly heterogeneous assembly of disease variants, tied together by the presence of a set of recurring clinical symptoms, which converge around three main characteristics: reversible expiratory airflow limitation, bronchial hyperresponsiveness, and chronic airway inflammation (Busse et al., 2007). However, particularly in large-cohort studies, some of these features are often absent, or one or two are found to be predominant (Lötvall et al., 2011). The existing, albeit incomplete, knowledge of the diverse underlying pathophysiologies of asthma and the progressive realization of their complexity led to the definition of the "asthma syndrome", a concept introduced by Wenzel (2012) and ultimately ratified by the Global Initiative for Asthma (GINA, 2017). Estimates regarding the prevalence and incidence of asthma, applying the most conservative criteria for its diagnosis, indicate that approximately 300 million individuals are affected worldwide, irrespective of their age, sex and ethnicity, of whom around 250,000 die prematurely each year (Bousquet and Khaltaev, 2007).

While the majority of asthma patients can be successfully treated with the gold standard medication (inhaled corticosteroids, ICS), up to 20 % of them remain refractory to any available therapy (Bateman et al., 2004). In addition, within this subgroup, a population representing between 5 and 10 % of the total patients is afflicted with a more severe form of asthma, which also lacks an adequate treatment and accounts for a substantial portion of the health care cost (Hekking et al., 2015). A patient with severe asthma can be identified by a lower than predicted forced expiratory volume in 1 second (FEV1) value, multiple exacerbations per day (asthma attacks; episodes of shortness of breath, cough or wheezing, of varying intensity), a history of respiratory infections, no atopy, and frequent comorbidity, usually in the form of rhinosinusitis, aspirin hypersensitivity or gastroesophageal reflux disease (Moore et al., 2007). The difference between "difficult-to-control" asthma and "severe refractory" asthma was emphasized in the consensus statement published by the Innovative Medicines Initiative (IMI) in 2011. In the case of difficult-to-control asthma patients, treatment failure originates from factors unrelated to asthma itself, such as inappropriate inhalation techniques, whereas individuals with severe refractory asthma experience little or no improvement of their condition even when no external influences are involved (Bel et al., 2011). Thus, severe asthma may either persist in an uncontrolled state or require treatment with high-dose ICS plus a second controller, such as long-acting β2-agonists (LABAs) and/or systemic corticosteroids, to prevent it from becoming uncontrolled (Chung et al., 2014). Risks of uncontrolled asthma include adverse reactions to medication, chronic morbidity, and increasingly acute exacerbations, which may result in death by fatal asthma attack (Bousquet et al., 2010).

Disease phenotypes represent the external manifestation of the interaction between an individual's genetics, with regard to a certain disease, and the environment, without necessarily relating to its latent causes (Skloot, 2016). In an effort to categorize the multiple subgroups encompassed by the asthma syndrome, several asthma phenotypes have been suggested, based upon clinical or physiological aspects (e.g., severity-defined, exacerbation-prone, chronic airflow restriction asthma), disease triggers (e.g., exercise-induced, aspirin-induced, occupational, allergic asthma), and inflammatory processes (e.g., eosinophilic, neutrophilic, paucigranulocytic asthma). Nevertheless, asthma phenotypes are dynamic in time and not mutually exclusive; that is, they are known to occasionally overlap with each other (Wenzel, 2006). A more recent classification of phenotypes by Desai and Oppenheimer (2016) proposed the separation of the phenotypical subgroups into two levels: Th2-mediated asthma and non-Th2-mediated asthma phenotypes, depending on the immunological mechanisms of the adaptive immune system that are implicated. Accordingly, the Th2-mediated asthma phenotype, which presents strong links to atopy and eosinophilic inflammation, consists of four subsets: a) early-onset allergic asthma, b) late-onset persistent eosinophilic asthma, with aspirin-induced asthma as a subtype, c) allergic bronchopulmonary mycoses, and d) exercise-induced asthma.

On the other hand, non-Th2-mediated asthma comprises patients with poor corticosteroid response, whose symptoms cannot be attributed solely to the Th2 pathway, and is divided into: a) neutrophilic asthma, b) extensive remodeling asthma, and c) obesity-related asthma. Despite the significant share of asthma cases it represents, the understanding of non-Th2-mediated asthma is still lacking.

By extending the interpretation of a phenotype to include the biological mechanisms at work behind its observable properties, a specific endotype can be determined. Asthma endotypes are "mechanistically coherent disease entities" (Lötvall et al., 2011) that can be constructed by the aggregation of phenotypic clusters, developed from multivariate analyses of the diverse dimensions of the disease (e.g., immunology, hereditary components, response to treatment), and pathophysiological studies. Lötvall et al. (2011) also defined a seven-parameter criterion for the identification of asthma endotypes, all associated with asthma pathogenesis. Therefore, true endotypes should fulfill at least five of these parameters: clinical characteristics (history, comorbidity, physical examination), biomarkers (eosinophilia, fractional exhaled nitric oxide (FeNO), IgE), lung physiology (bronchial hyperresponsiveness, FEV1, reversibility), genetics (single nucleotide polymorphisms (SNPs), biological pathways), histopathology (tissue and lung characteristics), epidemiology (prevalence, risk factors, natural history), and treatment response (responsiveness to available medication). In order to underscore the intricacy of endotypes, due to the heterogeneity and inconstant nature of pathogenic pathways, Agache et al. (2015) recommended the use of "complex endotype" to label those endotypes connected with more than a single molecular mechanism.
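The five-of-seven rule described above can be expressed as a simple threshold check. The following is only an illustrative sketch: the parameter names are taken from the text, whereas the function name and the example candidate are hypothetical and do not correspond to any published tool.

```python
# The seven parameters attributed to Lötvall et al. (2011) in the text.
ENDOTYPE_PARAMETERS = (
    "clinical characteristics",
    "biomarkers",
    "lung physiology",
    "genetics",
    "histopathology",
    "epidemiology",
    "treatment response",
)
MIN_FULFILLED = 5  # "at least five of these parameters"


def qualifies_as_endotype(fulfilled):
    """Return True if at least five of the seven parameters are fulfilled."""
    fulfilled = set(fulfilled)
    unknown = fulfilled - set(ENDOTYPE_PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return len(fulfilled) >= MIN_FULFILLED


# Hypothetical candidate, invented purely for illustration.
candidate = {
    "clinical characteristics",
    "biomarkers",
    "lung physiology",
    "genetics",
    "treatment response",
}
print(qualifies_as_endotype(candidate))
```

In practice, deciding whether a single parameter is fulfilled requires clinical and experimental evidence; the sketch only captures the final counting step.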

Correspondingly, three major asthma endotypes have been established, each related to the known asthma phenotypes and containing one or several sub-endotypes: a) type-2-driven endotype, characterized by a Th2 immune response to an allergic-like reaction and a satisfactory response to ICS, often with the involvement of atopy and allergic rhinitis (including the systemic-IgE-high, local-IgE-high, interleukin-4-high, interleukin-5-high, and interleukin-13-high sub-endotypes); b) non-type-2-driven endotype, usually referred to as non-eosinophilic or neutrophilic asthma, defined by neutrophilic or neurogenic inflammation, activation of the interleukin-17-dependent pathway, epithelial barrier dysfunction, and resistance to ICS; and c) mixed type-2/Th17-driven endotype, rooted in a dual Th2/Th17 cell response, and linked with airway obstruction, bronchial hyperresponsiveness, and resistance to ICS (Agache and Akdis, 2016). Yet, it is worth noting that these endotypes are subject to revision, as alternative benchmarks for endotype categorization might appear in the future, alongside new insights into the nature of asthma.

Difficulties in the diagnosis and therapeutic treatment of asthma, as well as of a broad range of other prevailing diseases, presumably stem from the high level of variability displayed by their underlying disease mechanisms, generally involving from dozens to hundreds of interconnected disease-associated genes with distinct effects. While the task of individually identifying these genes and their products can be facilitated by high-throughput techniques, such as omics technologies or genome-wide association studies (GWAS), comprehending and dissecting their functionality as a whole continues to pose a challenge for researchers (Zhang et al., 2014). Given this context, new scientific fields with the potential of answering these questions are on the rise. One of the most relevant is systems biology, a discipline that aims for the integration of known biological interactions into highly-descriptive network models with the capacity of validating existing hypotheses and, notably, generating new ones from observations (Davidson, 2002). Systems approaches to disease are focused on the expected divergence between the protein and gene regulatory networks of healthy individuals or populations and their altered states as a result of the disease (Hood, 2004).

The emerging inter-specialty of systems medicine fuses basic research data generation with clinical information and mathematical modeling workflows, bringing together bioinformatics and systems biology (Wolkenhauer et al., 2014). It advocates obtaining and improving robust pathological networks, definitively abandoning the "single drug for a single target" dogma, and unravelling functional biological relationships via high-throughput experimental techniques coupled with big data routines (Sun, Vilar and Tatonetti, 2013). Hence, systems medicine aims to supervise the complete procedure of data production, with the objective of optimizing the pathway models and treatment strategies that are generated, in contrast with systems biology, where the usefulness and accuracy of this material often depend on third parties (Ayers and Day, 2015). If any downside to this discipline is to be considered, it should be the lack of dynamicity displayed by a majority of the available models, in that data concerning the intermediate states between full health and full illness of an individual are limited at best (Tillmann et al., 2015). However, this issue is likely to be addressed in upcoming studies, as increasingly complex models continue to be developed.

The bottom-up application of systems medicine involves the use of network-based analyses, in combination with large-scale clinical data, to predict and classify disease-associated genes or proteins by utilizing different procedures, including linkage methods, diffusion-based methods, and disease module-based methods. The latter are founded on the assumption that all cellular components that belong to well-defined neighborhoods of the interactome with an above-average local density ("topological modules"), which present biologically related functions ("functional modules"), or which jointly contribute to a cellular function whose perturbation leads to the development of a certain disease phenotype ("disease modules") have a high chance of being linked to a specific disease. In other words, this approach hypothesizes that topological, functional and disease modules overlap to some extent, so that a disease arises when a functional module, formed by topologically close genes or products, is disrupted (Barabási et al., 2011). Several studies support this claim, providing evidence that proteins within disease modules have a tendency to interact with one another rather than with other proteins (Goh et al., 2007), that interacting proteins preferentially lead to similar disease phenotypes when mutated and thus protein-protein interactions (PPIs) may be used to predict candidate disease genes (Oti et al., 2006), that a high interconnectivity exists between disease-causative gene nodes in the form of clustered subnetworks (Gandhi et al., 2006), that there is a significant propensity for these disease genes to follow a certain topological property pattern in the human PPI network (Xu and Li, 2006), or that genotype and phenotype are complementary and diseases sharing a common set of genes are likely to present a comparable symptomatology (Halu et al., 2017).
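The notion of a topological module as a neighborhood with above-average local density can be illustrated with a toy computation. The sketch below is not the method used in this project; the network, the node names and the helper functions are hypothetical, and edge density (observed edges divided by possible edges) is assumed here as one simple way of quantifying "local density".

```python
from itertools import combinations

# Toy undirected interactome; node names are hypothetical placeholders.
EDGES = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),  # densely connected region
    ("D", "E"), ("E", "F"), ("F", "G"),              # sparse chain
]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj, nodes):
    """Fraction of possible node pairs in `nodes` that are joined by an edge."""
    nodes = list(nodes)
    possible = len(nodes) * (len(nodes) - 1) / 2
    if possible == 0:
        return 0.0
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / possible


adjacency = build_adjacency(EDGES)
global_density = density(adjacency, adjacency)        # whole network
module_density = density(adjacency, {"A", "B", "C"})  # candidate module

print(f"global density: {global_density:.2f}")
print(f"candidate module density: {module_density:.2f}")
```

A candidate set whose density clearly exceeds that of the network as a whole would be a plausible topological module under this simplified definition; whether it is also a functional or disease module requires additional biological evidence.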

The analysis of disease modules applies conventional network principles to disease mechanisms, biomarkers and therapeutic targets. Biological networks are mainly scale-free structures, exhibiting many nodes with few edges and a small number of "hub" nodes with numerous interactions, whose prominent presence in the network's architecture implies a certain degree of essentiality in the functions they are related to (Jeong et al., 2001). The removal of a hub component tends to result in larger phenotypic alterations than those caused by the deletion of several non-hub nodes (Yu et al., 2008). Depending on their biological role, hubs are classified into "party" hubs, embedded in modules that organize specialized cellular processes (Taylor et al., 2009), and "date" hubs, which involve and coordinate more diverse pathways of the interactome (Han et al., 2004). Furthermore, most known networks satisfy the small world property, which states that short paths join any pair of nodes in the network, meaning that the behaviour of the entire network or some of its neighborhoods can be altered by disturbances in the state of a single node (Watts and Strogatz, 1998).
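The roles of hubs and short paths can be made concrete on a small example. The following sketch uses a hypothetical six-node network in which a single node, H, has far more edges than the rest; the node names and helper functions are assumptions introduced for illustration. It identifies the hub by degree, measures the longest shortest path with a breadth-first search, and compares the effect of deleting the hub with that of deleting a non-hub node.

```python
from collections import deque

# Toy undirected network; node names are hypothetical placeholders.
EDGES = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("H", "E"), ("A", "B")]


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def remove_node(adj, node):
    """Return a copy of the network with one node and its edges deleted."""
    return {n: nbrs - {node} for n, nbrs in adj.items() if n != node}


def largest_component_size(adj):
    """Number of nodes in the largest connected component."""
    return max(len(bfs_distances(adj, n)) for n in adj)


network = build_adjacency(EDGES)
hub = max(network, key=lambda n: len(network[n]))  # node with the highest degree
longest_shortest_path = max(
    max(bfs_distances(network, n).values()) for n in network
)

print("hub:", hub)
print("longest shortest path:", longest_shortest_path)
print("largest component, intact:", largest_component_size(network))
print("largest component, hub removed:",
      largest_component_size(remove_node(network, hub)))
print("largest component, non-hub C removed:",
      largest_component_size(remove_node(network, "C")))
```

In this toy case, deleting the hub fragments the network, whereas deleting a peripheral node leaves the remaining nodes connected. Real interactomes are far larger, and phenotypic impact depends on more than connectivity alone.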

Disease modules can be structured as network layers according to their components (e.g., DNA, mRNA, proteins, metabolites, lipids, SNPs), which are later superimposed in order to form multi-layer disease modules (MLDMs), based upon the links between the variables on each layer. For instance, a microRNA and a transcription factor belonging to separate layers of the model can be connected together if they both regulate the expression of the same gene. MLDMs serve as valuable tools in the identification of multi-layered diagnostic markers, can help in the design of human disease networks by merging modules formed from clinical data, and demonstrate a remarkable potential in disease progression tracking. While several limitations to the use of MLDMs do exist, such as the difficulty of obtaining a sample size sufficiently large to produce statistically significant results, the technical problems related to the accuracy and sensitivity of high-throughput methods, or the simultaneous involvement of multiple cell types in a disease, MLDMs are regarded as promising templates for the integration and study of disease-relevant data (Gustafsson et al., 2014).
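The cross-layer linking rule given as an example above (a microRNA and a transcription factor are connected if they regulate the same gene) can be sketched as follows. All regulator and gene names are hypothetical placeholders, and the data structure is an assumption chosen for brevity rather than the representation used by any particular MLDM tool.

```python
from itertools import combinations

# Hypothetical layers: each maps a regulator to the set of genes it regulates.
LAYERS = {
    "microRNA": {
        "miR-x": {"GENE1", "GENE2"},
        "miR-y": {"GENE3"},
    },
    "transcription factor": {
        "TF-a": {"GENE2", "GENE4"},
        "TF-b": {"GENE5"},
    },
}


def cross_layer_links(layers):
    """Link regulators from different layers that share at least one target gene."""
    links = []
    for (_, regs_a), (_, regs_b) in combinations(layers.items(), 2):
        for reg_a, targets_a in regs_a.items():
            for reg_b, targets_b in regs_b.items():
                shared = targets_a & targets_b
                if shared:
                    links.append((reg_a, reg_b, sorted(shared)))
    return links


links = cross_layer_links(LAYERS)
for reg_a, reg_b, shared in links:
    print(f"{reg_a} <-> {reg_b} via {', '.join(shared)}")
```

Here only miR-x and TF-a share a target (GENE2), so a single inter-layer edge is created. The same function accepts additional layers, since every pair of layers is compared.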

The primary source of data for the present project consists of a series of gene expression microarray datasets, publicly available for download from the Gene Expression Omnibus (GEO) repository, belonging to the adult cohorts of the Unbiased Biomarkers for the Predictions of Respiratory Disease Outcomes (U-BIOPRED) research study. U-BIOPRED (2010-2014) was a multicenter prospective cohort project resulting from a private-public partnership, which was conducted across 20 academic centers in 11 European countries and six patient organizations, within the framework of the Innovative Medicines Initiative program. It was funded by the European Commission and the European Federation of Pharmaceutical Industries and Associations (EFPIA). As stated by the European Lung Foundation (2013), its fundamental goals were the advancement of the current understanding of severe asthma, the identification and investigation of the differences between asthma patients, and the discovery of new ideas that might help future efforts in the development of effective treatments against asthma.

The four groups (610 adults) recruited and classified according to the GINA guidelines were: a) severe non-smoking asthma group, formed by 311 non-smokers for at least the past 12 months, with uncontrolled symptoms and more than two exacerbations per year, even when treated with ICS; b) smokers and ex-smokers with severe asthma group, formed by 110 individuals with more than 5 pack-year smoking history, with asthma symptoms similar to the previous group; c) mild/moderate non-smoking asthma group, formed by 88 non-smokers for at least the past 12 months, with controlled or partially controlled asthma symptoms, and ICS-treated; and d) healthy non-smoking control group, formed by 101 non-smokers for at least the past 12 months, with no history of asthma or any other chronic respiratory disease. An extensive phenotypic characterization of the participants was performed for a set of candidate biomarkers of asthma, including spirometry (FEV1) values, FeNO, periostin, existence of atopy and comorbidities (e.g., allergic rhinitis, nasal polyps, sinusitis, eczemas), sputum eosinophil and neutrophil counts, white blood cell population counts, and treatment with corticosteroids, among other measurements depending on the sample type (blood, induced sputum, bronchial biopsy or epithelial brushing) (Shaw et al., 2015).

Data gathered in the U-BIOPRED project have already been studied to some extent. Lefaudeux et al. (2015) evaluated 418 samples containing complete information for eight particular physiological parameters (e.g., body mass index, FEV1, age of onset of asthma symptoms). They found that the asthma samples could be grouped into four reproducible clusters, named T1 - T4, which were associated with certain pathobiological pathways and differed in their sputum proteomics and transcriptomics. While cluster T1 consisted of asthma patients of varying severity, clusters T2 - T4 gathered a major percentage of the severe asthmatics in the dataset, presenting a higher sputum eosinophilia than T1, without any significant changes in sputum neutrophilia, FeNO or serum IgE. Clusters T2 and T3 were similar regarding chronic airflow obstruction, but the former was composed of smokers whereas T3 was predominantly a non-smoking cluster. Finally, T4 diverged from the rest due to its enrichment in obese female asthmatics, with frequent exacerbations and normal lung function.

For their part, Bigler et al. (2017) were able to define a Severe Asthma Disease Signature (SADS) comprising 1693 differentially expressed genes (DEGs) by applying a transcriptomics workflow to analyze gene expression in 498 blood samples. Two main asthma patient clusters were identified by unbiased hierarchical clustering of the entire dataset: first, a severe asthma-enriched cluster (SA-EC) containing 87 % of severe asthma samples; second, a mixed cluster (MC) formed by 56 % severe asthmatics, to which between 86 and 90 % of the total mild/moderate asthma and healthy group samples were assigned. The major differences between the two clusters were higher total white blood cell and neutrophil counts in the SA-EC, but a lower lymphocyte count, compared to the MC. Moreover, severe asthmatics presented a significant up-regulation of certain biological pathways, such as chemotaxis, mobilization or migration, as well as a decrease in pathways related to B-lymphocytes, B-cell development and hypoplasia of lymphoid organs. Treatment with corticosteroids heavily influenced the DEGs found in the SA-EC, which showed an enrichment for known corticosteroid targets. No gene expression differences were reported between corticosteroid users and non-users in the MC. Nonetheless, the annotated clinical features of the patients did not display a conclusive alignment with the clusters, meaning that the results were limited with respect to their clinical applicability.

Likewise, Kuo et al. (2017) analyzed sputum cell transcriptomics for 104 moderate or severe asthmatics and 16 controls belonging to the U-BIOPRED study. They performed a semi-biased hierarchical clustering of the 508 DEGs obtained from the comparison of eosinophilic versus non-eosinophilic asthma, by which they defined three transcriptome-associated clusters (TACs) for the classification of asthmatic subjects and genes. The enrichment analysis of the TACs revealed that TAC1 was connected with blood and sputum eosinophilia, indicating a potential link with severe asthma with mast cell and eosinophil activation; TAC2 was related to the up-regulation of neutrophil-dependent inflammation pathways, paired with a reduced severity of the chronic airflow obstruction; while TAC3 presented a paucigranulocytic and mild eosinophilic molecular signature, with the lowest use of corticosteroids, a moderate airflow obstruction, and a decreased exacerbation frequency. In summary, these outcomes have provided new insights into asthma research, with a range of potential applications in clinical management and drug development. However, further progress in this field is expected, motivated by the need to advance the categorization of asthma and to integrate the current knowledge into a useful and straightforward structure. The extension of the classification efforts to incorporate asthma endotypes and the implementation of a systems-driven methodology are emerging as new strategies to achieve these objectives.

The network-guided systems medicine approach has been the model of reference within the research group in which this project is framed. By constructing a gene regulatory network of the transcriptome and methylome profiles of human CD4+ T cells at multiple time points, Gustafsson et al. (2015) managed to determine three early hub transcription factors related to T cell-disease-associated polymorphisms. Similarly, a highly connected network module of dysregulated genes in multiple sclerosis (MS) was identified after the microarray analysis of in vitro samples from CD4+ T cells and the subsequent robust module exploration. Integration of GWAS data showed an enrichment of the dynamic response genes in the module, which, translated into protein expression, could be arranged as a set of differentially activated proteins that can potentially serve as biomarkers in MS (Hellberg et al., 2016). Additional examples include the collaborative effort to create software for the discovery of regulatory modules in whole-genome PPI networks (Vlaic et al., 2017) and the mathematical modeling analysis of the molecular mechanisms of insulin resistance in Type 2 diabetes via the mTOR complex 2 (Magnusson et al., 2017).

Notwithstanding previous research, the present project pioneers the exploration of the asthma syndrome from a personalized and network-based perspective, and may establish the foundation of a computational pipeline that could be drawn upon in the evaluation of data from asthma and other diseases. The highly heterogeneous ways in which asthma presents pose a considerable challenge for any investigation concerning the classification of asthma phenotypes and the disentanglement of the relationships between the underlying biological routes that govern this condition. The main aim of the work is the identification and examination of individualized multi-omic modules and the production of asthma core modules by the application of a computational approach for data analysis and validation, in the context of systems medicine. The core modules are expected to encompass key disease-related molecular components of the asthmatic cohorts within the U-BIOPRED study, to which a more personalized assessment based on patient-specific variation can later be applied, and to assist in the proposal of novel candidate targets for therapeutic intervention. Furthermore, they may contribute to the ongoing and future development of increasingly effective patient-specific treatments that could improve the quality of life of asthmatic patients, especially those suffering from the most severe forms of this respiratory disease.


Materials and Methods

Project outline and workflow

The approach followed in this project explored the integration of microarray transcriptomics-based individual disease modules and whole-genome SNP-omics, as a tentative step towards a personalized strategy for the diagnosis and management of asthma. The outline of the different stages that were designed and carried out during the project is depicted in Fig. 1.

Fig. 1 | Diagram of the workflow developed and applied for the identification of individualized and endotype-specific modules in asthma, by the combination of transcriptomic and genomic knowledge.

The download, pre-processing, statistical testing, and analysis of data were performed using R 3.3.2 as the central programming language, in the environment provided by RStudio 1.1.383 for Linux (RStudio, Inc.). An ordered list of the principal R packages used throughout the project is provided in Appendix A. Other languages used during the project were Bash 4.4.12 and Python 3.6. The complete catalog of scripts can be accessed online at the GitLab repository of the project¹.

Microarray transcriptomic profiling of asthma

The starting material for the individualized module production pipeline comprised a collection of five microarray gene expression data series originating from the U-BIOPRED project, submitted to the GEO database in 2015. In addition, a complementary file was provided, including a detailed, yet incomplete, record of the values for a set of candidate biomarkers for asthma and other clinical traits obtained from U-BIOPRED participants. The main features of each dataset are shown in Table 1, whereas an extensive phenotypic characterization of the asthmatic patients and healthy controls involved in each study can be found in Appendix B, Tables B1 to B5. In all cases, the microarray platform used in the experiments was the Affymetrix HT HG-U133+ PM Array Plate (GEO accession GPL13158), designed for total RNA expression profiling across 54,715 probes, accounting for 20,741 unique genes.

First, the raw data from each dataset was downloaded from GEO and stored as CEL files. The quality status of the arrays was inspected using the AffyAnalysisQC workflow (protocol available at the ArrayAnalysis online site²). Assessment plots for RNA degradation, background intensity, raw intensity distribution, intensity-dependent bias, spatial bias, correlation of expression, and cluster dendrograms were produced and evaluated, with satisfactory results for post-normalization parameters. Data was normalized with the GC-RMA (GeneChip Robust Multiarray Average) method (Wu et al., 2004).

¹ https://gitlab.com/b16davma/Asthma_Modules
² http://arrayanalysis.org/technicalDesc.php


Table 1 | Summary of the characteristics of the U-BIOPRED gene expression datasets.

| GEO Accession | Source | Samples (n) | Cohorts | Reference |
|---|---|---|---|---|
| GSE69683 | Blood | 498 | a) Healthy, non-smoking (n = 87); b) Moderate asthma, non-smoking (n = 77); c) Severe asthma, non-smoking (n = 246); d) Severe asthma, smoking (n = 88) | Bigler et al. (2017) |
| GSE76262 | Induced sputum | 139 | a) Healthy (n = 21); b) Moderate asthma (n = 25); c) Severe asthma (n = 93) | Kuo et al. (2017) |
| GSE76225 | Bronchial biopsy | 91 | a) Moderate asthma (n = 35); b) Severe asthma (n = 56) | Unpublished |
| GSE76226 | Epithelial brushing | 99 | a) Moderate asthma (n = 36); b) Severe asthma (n = 63) | Unpublished |
| GSE76227 | Bronchial biopsy and epithelial brushing | 190 | a) Moderate asthma (n = 40)¹; b) Severe asthma (n = 69)¹ | Unpublished |

¹ Differences between the number of samples in the dataset (n = 190) and the number of patients in the study (n = 109) are due to certain patients providing both epithelial brushing and bronchial biopsy samples.

Feature selection and linear modeling

According to the planned approach, a key prerequisite for the module production procedure was the identification of DEGs from the microarray datasets. For this purpose, the limma (Linear models for microarray data) R package was used. limma is a well-known component of the Bioconductor software development project, predominantly used for differential expression analysis and Bayesian estimation for microarray, protein array, and RNA sequencing data (Phipson et al., 2016).

The existence of a phenotypic information register enabled the partial or complete inclusion of a total of 21 baseline clinical features (asthma-related or not) as categorical or quantitative model covariates, to improve the relevance of the DEGs by the integration of sample-specific characteristics. Still, the large number of patient features and their lack of exhaustiveness (i.e., not every patient was tested for every trait) meant that a feature selection for differential expression significance was required, to determine the optimal balance between trait presence and model simplicity. With this aim in view, several limma linear models were developed and their performance tested by assessing their capacity to accurately classify individuals from the gene expression profiling in blood dataset (GSE69683, n = 498) or from the group of subjects with a complete phenotypic set (n = 88) into their correct sample category, in relation to the precision of a 21-covariate model. The linear models devised were: i) 21 partial models with 20 covariates each, excluding a different covariate in each model; ii) a model including only categorical covariates (five asthma comorbidities, use of corticosteroids, and gender); iii) a model including all the covariates plus the mean probe expression; iv) a model including every categorical covariate except gender; and v) a model including exclusively severe asthmatics, with diverse combinations of covariates. The adjustment method used was the Benjamini-Hochberg (BH) false discovery rate (FDR) control, for a significance threshold of adj. p ≤ 0.05. Results were visualized in the form of volcano plots for the study of differential expression, and dendrograms with heat maps for the sample clustering analysis. Likewise, comparisons between models were examined by DEG overlap and sample clustering accuracy. Clusters with an approximately unbiased (AU) p-value ≥ 95 % were considered to be strongly supported by the data.
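
The BH adjustment mentioned above can be illustrated with a minimal Python sketch. It is not taken from the project scripts (which rely on limma in R); the function name and the input p-values are hypothetical and serve only to show how adjusted p-values are compared against the adj. p ≤ 0.05 threshold.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe-level p-values (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

In this toy case only the two smallest p-values remain below the threshold after adjustment, although four of the five raw values are below 0.05.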


Simultaneously, a multiple factor analysis (MFA) of the phenotypic features was carried out, in the form of a factor analysis of mixed data (FAMD), due to the incorporation of qualitative and quantitative characteristics. MFA follows a two-step procedure. First, a principal component analysis (PCA) is applied to each data table, which is then normalized by dividing its elements by the first singular value obtained from that PCA. Second, the normalized tables are combined into a single table that is again analyzed via a non-normalized PCA (Abdi, Williams and Valentin, 2013). The purpose of this method was the identification of correlated variables by their projection in a correlation circle, a two-dimensional space delimited by the pair of dimensions with the most explained variance, in which the sample categories were also included. Hence, positively correlated covariates might be integrated into a single variable in the linear models. To further narrow down the most suitable variables for the prediction of asthma presence and severity, a generalized linear model (GLM) was designed. Input data was formed by the samples with a complete phenotypic set (n = 88) and the split for the training and validation groups was 80:20, respectively.

MODifieR: a robust module identification method

The approach of the present project revolved around the exploration and refinement of the concept of patient-specific disease modules, that is, one module per data sample. Consequently, the strategy for the production of these expression-based structures represented a critical step in the workflow. The software tool chosen to this end was MODifieR 0.1.0, an R package currently under development that aims to integrate a collection of module discovery algorithms into a single, enhanced pipeline able to generate robust consensus modules (Badam et al., 2017). Out of the ten methods planned to be included in the release version, three were implemented, owing to their optimization status and their acceptable time and computational power requirements. The selected methods were: i) MCODE, a clique-based Molecular Complex Detection graph clustering algorithm that applies vertex weighting, complex prediction and a post-processing filter to detect densely connected network regions, calculates the vertex clustering coefficient to measure the "cliquishness" of a vertex neighborhood and uses the local highest weighted vertices as seed nodes (Bader and Hogue, 2003); ii) Clique Sum, a clique-based Susceptibility Module (SuM) identification algorithm that constructs modules using a PPI network and a set of disease-specific DEGs, and defines highly interconnected SuM regions as core SuMs, which are enriched for pathways involved in complex diseases and genes harboring disease-associated SNPs from GWAS (Barrenäs et al., 2012); and iii) DIAMOnD, a seed gene-based Disease Module Detection algorithm that uses network interaction significance instead of density to infer new disease-linked genes, assumes that disease modules overlap scarcely or not at all with densely interconnected topological communities, and depends on high-degree seed nodes known to be tied to the disease (Ghiassian, Menche and Barabási, 2015).

The knowledge gained from the covariate analysis guided the design of an alternative procedure for the preparation of MODifieR input files. Instead of supplying a list of DEGs with their respective p-values to the module identification function, a novel linear model was assembled. For each sample and probe, a p-value was calculated with the limma lmFit function by comparing it to the rest of the expression values across the array, establishing no distinctions between asthma categories. Afterwards, a multiple t-testing process generated a normally distributed matrix of p-values for every dataset, which could then be formatted into the method-specific gene nomenclature and arrangement demanded by MODifieR at that stage. The microarray platform annotation file was used to map the Affymetrix probe IDs to their unique gene identifiers. The lowest p-value was kept for gene-redundant probes, while probe rows linked with two or more genes were expanded to accommodate each gene separately. Lastly, the PPI networks were obtained from the STRING 9.1 interaction database (highest confidence score, > 0.900) (Franceschini et al., 2013).
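
The probe-to-gene collapsing rule described above (lowest p-value for gene-redundant probes, expansion of multi-gene probes) can be sketched in Python as follows. This is an illustrative reading of the rule, not the project's R code; the function name, probe IDs and gene IDs are all hypothetical.

```python
def probes_to_genes(probe_pvalues, probe_annotation):
    """Collapse probe-level p-values to gene level.

    probe_pvalues:    dict mapping probe ID -> p-value
    probe_annotation: dict mapping probe ID -> list of gene IDs
    """
    gene_pvalues = {}
    for probe, p in probe_pvalues.items():
        # A probe annotated to several genes contributes to each of them.
        for gene in probe_annotation.get(probe, []):
            # Keep the lowest p-value seen so far for each gene.
            if gene not in gene_pvalues or p < gene_pvalues[gene]:
                gene_pvalues[gene] = p
    return gene_pvalues


# Hypothetical probe and gene identifiers (illustrative only).
pvals = {"probe_1": 0.03, "probe_2": 0.001, "probe_3": 0.20}
annot = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A", "GENE_B"],  # probe linked to two genes
    "probe_3": ["GENE_C"],
}
gene_level = probes_to_genes(pvals, annot)
```

Probes without an annotation entry are silently dropped in this sketch; whether the original workflow did the same is not stated in the text.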

Module production and workflow optimization

Once the input files were prepared for each expression dataset, the R scripts for the application of the MODifieR algorithms for module detection were written and tested. A thorough optimization of MODifieR methods was carried out to choose the best-performing parameter values in terms of gene number per individual module (MCODE: vertex weight percentage = 0.5, cluster density cutoff = 0.8; Clique Sum: cutoff = 0.05, minimum DEGs per clique = 3; DIAMOnD: seed weight = 1, output genes = 200). Due to the substantial requirements in the quantity of input files (1,017 samples per complete run, by three methods per sample), processing time (t = 70 s for a single-sample MCODE run, t = 80 s for DIAMOnD, t = 23 min for Clique Sum), and memory (a minimum of 8 GB of RAM per sample and method), access to the Gamma cluster (National Supercomputer Centre, Linköping University) was provided. Gamma comprises a total of 240 16-core compute nodes, using CentOS Linux 6 as operating system and Slurm Workload Manager 17.11 as batch queue system. This allowed the application of simultaneous and automated routines, and the assignment of adequate computational time and memory for each batch job. A standard module produced with this approach for a single individual and method consisted of an unranked list of a variable number of genes in Entrez ID annotation. The scripts for the submission of jobs to Gamma were written in Bash 4.4.12, while Gamma file management and result retrieval were done from a local computer, using SSHFS (Secure SHell FileSystem) remote file software.
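
As a rough illustration of why cluster access was needed, the single-sample timings quoted above can be multiplied out. The sketch below assumes that every sample takes the quoted single-sample time and that runs are executed one after another on a single core; neither assumption is stated in the text, so the figure is only an order-of-magnitude estimate.

```python
SAMPLES = 1017  # samples per complete run, as quoted in the text

# Single-sample run times in seconds, as quoted in the text.
SECONDS_PER_RUN = {
    "MCODE": 70,
    "DIAMOnD": 80,
    "Clique Sum": 23 * 60,
}

# Serial wall-clock time per method, in hours (assumes uniform run times).
per_method_hours = {
    method: SAMPLES * seconds / 3600
    for method, seconds in SECONDS_PER_RUN.items()
}
total_hours = sum(per_method_hours.values())
total_days = total_hours / 24
```

Under these assumptions a complete serial run would take roughly 430 hours (about 18 days), most of it spent in Clique Sum, which is consistent with the decision to distribute the jobs over a batch system.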

Genome-wide profiling of asthma

Every participant of the U-BIOPRED project was involved in a genome-wide association study (GWAS) of asthma and its comorbidities (Affymetrix Axiom UKB_WCSG Array, 845,487 probes, n = 610). This created a valuable input of stand-alone genomic data, but also provided an opportunity for its combination with the former expression profiles, now transformed into individualized expression modules, within a multi-dimensional framework. The original raw data (bed, bim and fam files) had already undergone quality control, pre-processing, and partial statistical analysis with the PLINK 1.07 tool set (Purcell et al., 2007) when received. Fisher's exact test p-values were calculated for every probe, for a set of comparisons between asthma categories, and patient comorbidities. Preliminarily, the data was reviewed in PLINK using Python 3.6, before importing it into R.

The individualization of GWAS data was initiated by extracting the genotype calls (AA, AB, or BB) for each SNP probe and patient. Only individuals with a matching gene expression profile were included. Furthermore, in order to avoid false-positive associations, samples that did not pertain to the most represented genetic ancestry population in the GWAS ("white Caucasian") were excluded. The remaining data set contained 365 samples, of which 230 were severe asthmatics, 65 moderate asthmatics, and 70 controls. Next, the frequency of the genotype calls per asthma category was obtained for every probe, which made it possible to calculate the likelihood ratio (LR) of each genotype for every SNP, as in Equation 1 (for the case of the AA genotype, asthma versus controls).

$$LR_{AA} = \frac{AA_{\text{asthma}} / \text{Total}_{\text{asthma}}}{AA_{\text{controls}} / \text{Total}_{\text{controls}}} \tag{1}$$

The corresponding Fisher's exact test p-value from the asthmatics versus controls comparison was assigned to the three genotype calls of its probe, and then transformed in the following manner: p-values for genotype calls with an LR > 1, indicating a higher probability of association with asthma for the genotype and SNP, were divided by that LR; whereas genotype calls with an LR < 1 received a p-value of 1 (no association). Hence, each sample was given a unique genome-wide profile of SNP rsID numbers and p-values based on its exact genotype calls.
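
The likelihood ratio of Equation 1 and the subsequent p-value transformation can be sketched as below. The function names and the genotype counts are hypothetical; only the group sizes (295 asthmatics, 70 controls) follow from the text. The case LR = 1 is not covered by the description, so the sketch leaves the p-value unchanged there, which is an assumption, and it also assumes non-zero control counts.

```python
def likelihood_ratio(count_cases, total_cases, count_controls, total_controls):
    """Equation 1: genotype frequency in asthmatics over frequency in controls."""
    return (count_cases / total_cases) / (count_controls / total_controls)


def individualized_p(p_value, lr):
    """Transform a probe-level p-value for one genotype call."""
    if lr >= 1:
        # LR > 1: divide the p-value by the LR.
        # LR == 1 is an assumption: dividing by 1 leaves p unchanged.
        return p_value / lr
    # LR < 1: no association.
    return 1.0


# Hypothetical counts: AA called in 150 of 295 asthmatics and 28 of 70 controls.
lr_aa = likelihood_ratio(150, 295, 28, 70)
p_aa = individualized_p(0.01, lr_aa)        # genotype enriched in asthmatics
p_depleted = individualized_p(0.01, 0.8)    # genotype depleted in asthmatics
```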

Nevertheless, the reduced sample size and consequent low statistical power of the U-BIOPRED GWAS meant that its results were to be considered with caution (Klein, 2007), aside from the individualization of the genotype calls. Therefore, to provide a source of SNP association with asthma of satisfactory power for the ensuing genomic validation of the expression modules, public GWAS data from the GABRIEL (A Multidisciplinary Study to Identify the Genetic and Environmental Causes of Asthma in the European Community) Consortium (582,892 probes; cases, n = 10,365; controls, n = 16,110) was used (Moffatt et al., 2010). GABRIEL results were provided as random-effects model p-values, derived from their GWAS meta-analysis approach, which accounted for the potential heterogeneity between studies, as opposed to methods based on fixed-effect modeling. Remarkably, for the scant number of SNP probes common to both studies (86,410 probes), GABRIEL GWAS p-values and U-BIOPRED GWAS Fisher's exact test p-values were not correlated (Spearman's rank correlation test, ρ ≈ 0).

PASCAL module scoring and pathway analysis

Once the individual SNP profiles were produced, their integration within their expression modules was carried out using PASCAL (Pathway scoring algorithm). Developed by Lamparter et al. (2016) as a tool to incorporate GWAS SNP p-values across genes and pathways, PASCAL was run for the coupled U-BIOPRED transcriptomic and genomic profiles of each sample to compute a p-value for the significance of the pair (significance threshold: p ≤ 0.05). PASCAL scoring corrects for linkage disequilibrium (LD) given a reference population. To this end, the European population of the 1000 Genomes Project (1KG) was used (Altshuler et al., 2012), with a minor allele frequency cutoff of 0.05 and a window size of 50 kb upstream and downstream of gene locations. Apart from single-method modules, every possible module resulting from the union or intersection of the latter was also built and analyzed (e.g., "MCODE ∩ Clique Sum"). In total, the performance of 14 module combinations per sample (5,110 modules) was assessed with PASCAL. An R-based classification system was implemented to select and store the module combination with the most significant PASCAL score in every case. These optimal combination modules were examined by hierarchical clustering based on Euclidean distance. Additionally, PASCAL pathway enrichment scores were calculated for asthmatic SNP profiles in comparison with controls.
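
The count of 14 combinations per sample can be reproduced with plain set operations, as sketched below. The grouping of the mixed expressions is an assumption: a label such as "MCODE ∪ Clique Sum ∩ DIAMOnD" is read here as (MCODE ∪ Clique Sum) ∩ DIAMOnD, which the text does not confirm. The function name and the toy gene sets are hypothetical.

```python
from itertools import combinations


def module_combinations(modules):
    """Build all union/intersection combinations of three method modules.

    modules: dict with exactly three entries, method name -> set of genes.
    """
    combos = dict(modules)  # the three single-method modules
    names = list(modules)
    for a, b in combinations(names, 2):
        combos[f"{a} ∩ {b}"] = modules[a] & modules[b]
        combos[f"{a} ∪ {b}"] = modules[a] | modules[b]
        # The remaining third method.
        (c,) = [n for n in names if n not in (a, b)]
        # Assumption: mixed expressions are grouped as (a ∪ b) ∩ c.
        combos[f"({a} ∪ {b}) ∩ {c}"] = (modules[a] | modules[b]) & modules[c]
    a, b, c = names
    combos[f"{a} ∩ {b} ∩ {c}"] = modules[a] & modules[b] & modules[c]
    combos[f"{a} ∪ {b} ∪ {c}"] = modules[a] | modules[b] | modules[c]
    return combos


# Toy modules with hypothetical gene IDs (illustrative only).
toy = {"MCODE": {1, 2, 3}, "Clique Sum": {2, 3, 4}, "DIAMOnD": {3, 4, 5}}
combos = module_combinations(toy)
total_modules = 365 * len(combos)  # 365 samples with matching SNP profiles
```

Three single modules, three pairwise intersections, three pairwise unions, three mixed combinations and the two triple combinations add up to 14, and 365 samples times 14 combinations gives the 5,110 modules quoted above.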

At the same time, a straightforward approach involving the direct overlap between SNP-associated genes and module genes was tested. To this end, every SNP probe from both GWAS below the p-value thresholds of 1 × 10⁻³ and 5 × 10⁻³ was listed. SNPs in LD were obtained (distance limit = 500 kb, r-squared limit = 0.8) and added to the initial sets, which were converted into unique SNP-linked genes that could be matched with the module genes for the 14 module combinations, distributed by sample category, for the five U-BIOPRED expression datasets. The outcome was evaluated by calculating the statistical significance of the overlaps, expressed as a p-value.
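
The text does not name the test used to score the overlaps. One common choice for this kind of gene-set overlap is the upper tail of the hypergeometric distribution, sketched below purely as an illustration; the function name, the background size and the example counts are assumptions, not values from the project.

```python
from math import comb


def overlap_p_value(n_background, n_snp_genes, n_module_genes, n_overlap):
    """P(overlap >= n_overlap) when the module genes are drawn at random."""
    upper = min(n_snp_genes, n_module_genes)
    total = comb(n_background, n_module_genes)
    # Sum the hypergeometric probabilities from the observed overlap upwards.
    tail = sum(
        comb(n_snp_genes, k) * comb(n_background - n_snp_genes, n_module_genes - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Small case that can be checked by hand: 10 genes, 4 SNP-linked, module of 3.
p_small = overlap_p_value(10, 4, 3, 2)

# Hypothetical realistic sizes; 20,741 is the number of unique genes on the array.
p_example = overlap_p_value(20741, 500, 200, 12)
```

For the small case the tail is (6 × 6 + 4 × 1) / 120 = 1/3, which gives a quick sanity check of the implementation.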

Core module development and network modeling

In line with the project's workflow, the concept of core module was introduced. In the present context, a core module simply represented a set of genes for which the modules of a sample category were significantly enriched, in contrast to the modules of another sample category. The identification of core module genes was done by Chi-squared testing for two-proportion comparisons (significance threshold: p ≤ 0.05), with Yates's continuity correction for cases with gene counts of n ≤ 5. Due to the results of the DEG detection process, the approach was applied to modules originating from samples in the expression profiling in blood (GSE69683) and sputum (GSE76262) datasets. The procedure was performed for the 14 module combinations in each case, producing a core module for every combination and sample category. A PASCAL score was computed for the resulting core module genes, using the SNP data from GABRIEL GWAS with their corresponding random-effects model p-values.
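
A two-proportion Chi-squared test of this kind can be sketched with the standard library alone. The example counts (a gene present in 192 of 230 severe asthma modules and 10 of 70 control modules) are the ones used as an illustration in the Results. The function name is hypothetical, and the rule for switching on Yates's correction (either gene count ≤ 5) is an assumed reading of "gene counts of n ≤ 5".

```python
from math import erfc, sqrt


def two_proportion_chi2(x1, n1, x2, n2, yates_max_count=5):
    """Chi-squared test (1 d.f.) comparing the proportions x1/n1 and x2/n2."""
    a, b = x1, n1 - x1  # modules with / without the gene, category 1
    c, d = x2, n2 - x2  # modules with / without the gene, category 2
    n = n1 + n2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # Gene in all or none of the modules (or an empty category).
        return 0.0, 1.0
    diff = abs(a * d - b * c)
    # Assumption: Yates's correction is applied when either gene count is small.
    if min(x1, x2) <= yates_max_count:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / denominator
    # Upper tail of the chi-squared distribution with one degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = two_proportion_chi2(192, 230, 10, 70)
in_core_module = p_value <= 0.05
```

With these counts the statistic is large (about 117), so the gene would be placed in the "severe asthma versus controls" core module.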

Subsequently, the core module with the most significant PASCAL p-value ("Clique Sum ∪ DIAMOnD ∩ MCODE", for the comparison severe asthma versus moderate asthma and controls) was evaluated by GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) over-representation tests and DO (Disease Ontology) semantic and enrichment analysis. The p-value adjustment method selected was the Benjamini-Hochberg procedure, for a significance threshold of adj. p ≤ 0.05. Finally, the network structure of the core module was examined in the STRING 10.5 database (Szklarczyk et al., 2017), filtering the protein-protein interactions at the > 0.900 confidence score level. The network data (397 nodes, 4,300 links) was imported to Cytoscape 3.6.0 and the MCODE 1.5.1 Cytoscape application was used to identify highly clustered regions of interest, which were further analyzed for enrichment in KEGG pathways, GO biological processes, and DO diseases.


Results

Linear modeling and covariate analysis

The best-performing phenotypic classification model identified by heat map clustering (Fig. 2a) was able to separate severe asthmatics (smokers and non-smokers) from moderate asthmatics and controls with an accuracy of 79.5 % for subjects with a complete feature set (n = 88). Covariates included were: age, total IgE (IU/ml), C-reactive protein (mg/l), periostin (ng/ml), sputum eosinophils (%), sputum macrophages (%), sputum lymphocytes (%), sputum mast cells (%), FeNO, and FEV1 (%). For its part, the multiple factor analysis (Fig. 2b) revealed the existence of positive correlations between certain asthma comorbidities (allergic rhinitis, atopy and eczema); between CRP, BMI, neutrophil %, total IgE, and severe asthma; or between lymphocyte and macrophage %, and FEV1, which are in turn negatively correlated with severe asthma. Differences between moderate and severe asthmatics, with the former being closer to controls than to the latter, are also showcased.

Fig. 2 | (a) Heat map clustering of U-BIOPRED gene expression samples with a complete feature set (n = 88), divided into healthy controls (green), moderate asthmatics (blue), severe asthmatic non-smokers (red), and severe asthmatic smokers (black). Phenotypic features were selected for the highest accuracy (79.5 % of samples correctly clustered). (b) Correlation circle for the multiple factor analysis of the phenotypic features of the U-BIOPRED participants and its sample categories: severe asthma (SA), moderate asthma (MA) and healthy controls (H). Arrows pointing towards the same direction denote a positive correlation between variables; arrows pointing in opposite directions indicate a negative correlation.

In spite of these early results, the outcome of the GLM indicated that none of the covariates was indispensable for the estimation of the asthma category of a sample at the α = 0.05 significance level (Appendix C). This implication was further emphasized by the drop in accuracy to almost random guessing (< 50 %) shown by the previous phenotypic model when attempting to classify the samples from the entire gene expression profiling in blood dataset (n = 498). Thus, no phenotypic features were added as explanatory variables to the definitive linear model, apart from gender. The inclusion of the latter was supported by the differences in gender proportion between asthma categories, which caused the identification of an elevated number of gender-related DEGs in the limma models that did not correct for this issue. The top up- and down-regulated probe sets from the gene expression profiling in blood (GSE69683) and induced sputum (GSE76262), and their associated genes, determined by the final linear model for severe asthmatics versus controls can be consulted in Appendix D, Tables D1-D4. No significant DEGs were found for the expression in bronchial biopsy (GSE76225), epithelial brushing (GSE76226) or bronchial biopsy and epithelial brushing (GSE76227) microarray datasets (significance threshold: adj. p ≤ 0.05, BH correction).

SNP association with asthma in U-BIOPRED

Prior to the individualization of the U-BIOPRED GWAS genotype calls, the Fisher's exact test p-values obtained from PLINK were sorted to determine candidate SNPs associated with asthma or asthma comorbidities. First, following the protocol of Clarke et al. (2011), the presence of confounders was assessed by producing a Q-Q (quantile-quantile) plot for each comparison. Then, the strongest SNP associations with asthma across chromosomal locations were displayed by generating the respective Manhattan plots. Both representations and the summary of the suggestive SNPs for linkage to asthma in U-BIOPRED are available in Appendix E, Figs. E1-E8 and Table E9. For a family-wise error rate of α = 0.05, the per-SNP genome-wide significance threshold was set to p ≤ 5 × 10⁻⁸ (Bonferroni correction) and the suggestive significance to p ≤ 1 × 10⁻⁵, in accordance with the standard of Pe'er et al. (2008).
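
The genome-wide threshold quoted above is the value that a Bonferroni correction yields for roughly one million independent tests at α = 0.05. The short sketch below makes the arithmetic explicit and, for comparison only, also shows the value for the 845,487 probes of the U-BIOPRED array; the function name is hypothetical and the array-specific figure was not used in the project.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Conventional genome-wide threshold: about one million independent tests.
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# For comparison only: one test per probe on the U-BIOPRED genotyping array.
array_specific = bonferroni_threshold(0.05, 845_487)
```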

Identification of individual modules

The MODifieR workflow for the MCODE, Clique Sum, and DIAMOnD module detection methods was successfully applied to produce a gene expression module for each U-BIOPRED sample in the five microarray datasets, accounting for a total number of 2,871 modules (957 samples). The results for each dataset, regarding the number of unique genes included in the combined modules of each method, are displayed in Table 2.

Table 2 | Summary of the MODifieR module identification procedure for the U-BIOPRED datasets. Overlaps are formed by the intersected genes between methods (MC: MCODE; CS: Clique Sum; D: DIAMOnD). Consensus unique genes comprise every gene present in at least two out of three methods and are given once per dataset.

| Dataset | Source | Modules | Method | Unique genes | Unique genes (method overlap) | Unique genes (consensus) |
|---|---|---|---|---|---|---|
| GSE69683 | Blood | 498 | MCODE | 4,061 | MC ∩ CS: 2,300 | 5,292 |
| | | | Clique Sum | 5,995 | MC ∩ D: 2,891 | |
| | | | DIAMOnD | 13,884 | CS ∩ D: 4,981 | |
| GSE76262 | Induced sputum | 139 | MCODE | 3,790 | MC ∩ CS: 2,157 | 2,999 |
| | | | Clique Sum | 5,995 | MC ∩ D: 955 | |
| | | | DIAMOnD | 3,593 | CS ∩ D: 2,035 | |
| GSE76225 | Bronchial biopsy | 91 | MCODE | 3,619 | MC ∩ CS: 2,069 | 2,696 |
| | | | Clique Sum | 5,995 | MC ∩ D: 836 | |
| | | | DIAMOnD | 2,851 | CS ∩ D: 1,670 | |
| GSE76226 | Epithelial brushing | 99 | MCODE | 3,681 | MC ∩ CS: 2,097 | 3,938 |
| | | | Clique Sum | 5,995 | MC ∩ D: 1,642 | |
| | | | DIAMOnD | 6,575 | CS ∩ D: 3,378 | |
| GSE76227 | Bronchial biopsy and epithelial brushing | 190 | MCODE | 3,652 | MC ∩ CS: 2,085 | 3,901 |
| | | | Clique Sum | 5,995 | MC ∩ D: 1,686 | |
| | | | DIAMOnD | 7,524 | CS ∩ D: 3,506 | |


In addition, the PASCAL scoring of the 365 modules with matching SNP profiles from the U-BIOPRED GWAS and their possible unions and intersections showed that the combination "MCODE ∪ Clique Sum ∪ DIAMOnD" presented the highest frequency (23.9 %) of optimal modules (individual modules with the most significant PASCAL p-value), followed by "MCODE ∪ Clique Sum" (12.4 %), and "MCODE ∪ DIAMOnD" (10.1 %). MODifieR method-wise, MCODE achieved the best result (7.9 %), whereas the performance of Clique Sum (1.4 %) and DIAMOnD (2.1 %) was lower. Overall, the intersected method "MCODE ∩ Clique Sum ∩ DIAMOnD" occupied the last place for the frequency of optimal combinations (0.2 %). The complete result is available in the Appendix F, Table F1.

Lastly, the results of the integration of GABRIEL and U-BIOPRED GWAS and expression data by the overlap between SNP-associated genes and module genes of every U-BIOPRED dataset across sample categories are collected in Appendix G, Figs. G1-G5. Although certain method combinations scored below the significance threshold (p ≤ 2.4 × 10⁻⁶), no significant difference between severe asthmatics, moderate asthmatics, and controls could be found in the same module combination by directly overlaying MODifieR expression-based modules and genome-wide information from asthma-related studies.

Core modules I: PASCAL validation

A set of core modules was generated for the module collections belonging to the expression datasets with significant DEGs (GSE69683 and GSE76262) by contrasting the frequency of the genes in the modules across sample categories (e.g., gene A is present in 192/230 severe asthma modules but only 10/70 control modules, obtaining a two-proportion test Chi-squared p < 0.05 and thus being included in the "severe asthma versus controls" core module). The results of the PASCAL validation with GABRIEL GWAS data of the core modules are shown in Fig. 3 (significance threshold: p ≤ 0.05). In the case of the expression profiling in blood dataset (Fig. 3a), the core modules of the MODifieR method combinations "MCODE", "MCODE ∩ Clique Sum", "MCODE ∩ DIAMOnD", "MCODE ∪ DIAMOnD", "MCODE ∩ Clique Sum ∩ DIAMOnD", "MCODE ∪ Clique Sum ∩ DIAMOnD", and "Clique Sum ∪ DIAMOnD ∩ MCODE" were found to be significant, mainly for the comparisons severe asthma versus moderate asthma and controls, and severe asthma versus moderate asthma. Yet, fewer significant core modules were found for the expression profiling in induced sputum (Fig. 3b), including only the "MCODE ∪ Clique Sum" combination for severe asthma versus moderate asthma, and "Clique Sum ∩ DIAMOnD" for moderate asthma versus controls.


Fig. 3 | (a) PASCAL scoring of core modules from the expression profiling in blood dataset (GSE69683), divided by sample category comparison and MODifieR method combination. (b) PASCAL scoring of core modules from the expression profiling in induced sputum dataset (GSE76262), divided by sample category comparison and MODifieR method combination. GABRIEL GWAS data was used as the source for SNP association with asthma. An asterisk denotes significance at p ≤ 0.05; a double asterisk indicates significance at p ≤ 0.01.

Core modules II: Network modeling and enrichment analysis

The core module with the most significant PASCAL score ("Clique Sum ∪ DIAMOnD ∩ MCODE" for severe asthma versus moderate asthma and controls, GSE69683, Fig. 4a) was selected for the network analysis. Its general network layout and highly connected regions (clusters) are displayed in Fig. 4b.

Fig. 4 | (a) Top 60 genes of the "Clique Sum ∪ DIAMOnD ∩ MCODE" core module by significance of the frequency comparison between sample categories (severe asthma: purple bars, moderate asthma: green bars, healthy controls: yellow bars), divided by network clusters (1: dark green, 2: red, 3: green, 4: orange, 5: pink, 6: blue, 0: unclustered genes). (b) Network visualization of the selected core module, colored by network clusters (same pattern as a; grey: unclustered module genes). Unconnected nodes are not displayed.


Regarding network statistics, the core module structure was formed by 397 nodes and 4,300 links, and presented a clustering coefficient = 0.538, network centralization = 0.210, network heterogeneity = 0.830, network density = 0.055, and average number of neighbors = 21.767. In terms of disease enrichment with DO, the core module genes were found to be significantly associated (p ≤ 0.05) with asthma (DOID:2841, p = 0.011). Pathway enrichment analysis results (GO and KEGG) for the core module are disclosed in the Appendix H, Tables H1-H2. Several pathways related to inflammation or immune response, closely linked with the biological mechanisms of particular asthma phenotypes, were significantly enriched (adj. p ≤ 0.05), such as the T cell receptor signaling pathway, PI3K-Akt signaling pathway, Th17 cell differentiation, Th1 and Th2 cell differentiation, or the IL-17 signaling pathway (Appendix H, Table H3). The core module genes from the six network clusters identified by the Cytoscape MCODE tool, ranked by score (Cluster 1: 22.81, 48 nodes and 536 links; Cluster 2: 13.88, 17 nodes and 111 links; Cluster 3: 9.90, 20 nodes and 94 links; Cluster 4: 8.72, 79 nodes and 340 links; Cluster 5: 6.74, 58 nodes and 192 links; and Cluster 6: 4.00, 26 nodes and 50 links), plus unclustered genes, are available in the Appendix I, while their network representations with integrated expression data (as exemplified for cluster 5 in Fig. 5) can be visualized in the Appendix J.
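
The reported cluster scores can be cross-checked against the node and link counts. MCODE is generally described as scoring a cluster by its density multiplied by its number of nodes; taking that definition as an assumption, the sketch below recomputes the six scores and the density of the whole core module from the figures given in the text. The function name is hypothetical.

```python
def density(nodes, links):
    """Fraction of all possible undirected links that are present."""
    return 2 * links / (nodes * (nodes - 1))


# (nodes, links, reported score) for each cluster, as given in the text.
clusters = {
    1: (48, 536, 22.81),
    2: (17, 111, 13.88),
    3: (20, 94, 9.90),
    4: (79, 340, 8.72),
    5: (58, 192, 6.74),
    6: (26, 50, 4.00),
}

# Assumed score definition: density multiplied by the number of nodes.
recomputed = {k: density(n, e) * n for k, (n, e, _) in clusters.items()}

# Density of the whole core module (397 nodes, 4,300 links).
module_density = density(397, 4300)
```

Under this definition the recomputed scores agree with the reported ones to within 0.01, and the density of the full module comes out at approximately 0.055, matching the value stated above.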

Fig. 5 | Representation of the network cluster 5 of the core module "Clique Sum ∪ DIAMOnD ∩ MCODE" from the GSE69683 dataset. Node size is directly dependent on node degree. Gene expression data from the related limma model (severe asthmatics versus moderate asthmatics and controls) has been added to each module gene: up-regulated genes are shown in green, down-regulated genes are shown in red (log2 fold change scale: -0.50 to 0.50). The cluster is displayed in the "Prefuse Force Directed" layout from Cytoscape.

Furthermore, the DO over-representation test for the network clusters indicated a significant association with asthma for Cluster 5 (p = 0.010), due to the presence of the annotated asthma-linked genes PIK3CG, FYN, SMAD2, and ITK. No significant association with asthma was found for the remaining clusters, although the results highlighted the presence of other acknowledged asthma-associated genes in them (Cluster 3: KDR, p = 0.287; Cluster 6: EDN1 and IFNG, p = 0.060). As with the core module, the network clusters were analyzed individually for GO and KEGG pathway enrichment, in order to narrow the search for mechanisms of interest in the pathophysiology of asthma that might have been captured by the core module network structure. The principal results of the enrichment analyses for the clusters of the core module "Clique Sum ∪ DIAMOnD ∩ MCODE" can be consulted in Appendix H, Tables H4-H5.
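An over-representation test of this kind is commonly computed as a one-sided hypergeometric test, and that is the assumption made in the minimal, standard-library Python sketch below. Only the cluster size (58 genes) and the four annotated genes come from the text. The background size and the number of asthma-annotated background genes are hypothetical placeholders, so the resulting p-value is not expected to reproduce the reported one.

```python
from math import comb


def overrepresentation_p(universe: int, annotated: int, drawn: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    universe  -- number of background genes
    annotated -- background genes annotated to the term (e.g. asthma)
    drawn     -- genes in the tested set (e.g. one network cluster)
    hits      -- tested genes that carry the annotation
    """
    upper = min(annotated, drawn)
    total = comb(universe, drawn)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Cluster 5 has 58 genes, 4 of them annotated to asthma (from the text).
# UNIVERSE and ANNOTATED are hypothetical placeholders, not values from the study.
UNIVERSE, ANNOTATED = 8000, 150
p_value = overrepresentation_p(UNIVERSE, ANNOTATED, drawn=58, hits=4)
print(f"p = {p_value:.4f}")
```

Exact integer binomial coefficients are used instead of floating-point approximations, which keeps the tail sum accurate even for large backgrounds at the cost of some speed.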

[Fig. 5 legend: cluster 5 of the core module CS ∪ D ∩ MC; color scale: Expression (logFC), -0.50 to 0.50]

Discussion

The exploratory nature of this project involved testing a novel methodology for the identification of biological modules and establishing a robust transcriptomic and genomic-based workflow, apt to be applied to data from asthma or other analogous diseases. This entailed laying out and scrutinizing a variety of provisional paths, so that the assembly of their intersections could lead to a reliable route for the understanding of the asthma syndrome from a systems medicine approach. First, the phenotypic feature selection implemented to determine the most suitable covariates for the limma modeling delivered an engaging yet inconclusive outcome. On the one hand, the MFA result suggested the existence of a number of correlations between the asthma categories and the candidate biomarkers covered in the U-BIOPRED consortium (Fig. 2b). A portion of them corresponded to thoroughly studied asthma predictors, such as age and BMI being associated with severe or difficult-to-treat asthma (Miller et al., 2006), the significant decrease in FEV1 and increase in FeNO observed in asthmatics (Sato et al., 2009) (Karrasch et al., 2017), or the positive link between chronic rhinosinusitis and asthma (Gillis, Crabtree and Smith, 1979a) (Loymans et al., 2016). Eosinophilia has been consistently related to asthma severity and progression, whether in peripheral blood and bronchial lavage (Bousquet et al., 1990) or in sputum (Louis et al., 2000), whereas the association between airway neutrophilia and airway obstruction in asthma has also been indicated by previous studies (Shaw et al., 2007). The negative correlation observed between severe asthma and sputum lymphocyte and macrophage populations could a priori be explained by the immunosuppressive and anti-inflammatory effects exerted by corticosteroids (Barnes, 1998) (Olnes et al., 2016), given that 39.5 % of the severe asthmatic participants reported regular use of OCS. However, a comparable behaviour for neutrophil and eosinophil levels would be expected in that situation (Goulding and Guyre, 1993) (Gillis, Crabtree and Smith, 1979b), and this was not displayed to the same extent by the multifactorial analysis.

Nevertheless, the translation of these insights into a model with the capacity to discern between asthma categories was elusive, as reflected by the low accuracy of the phenotypic clustering when evaluating the complete samples of the expression profiling in blood dataset (GSE69683, n = 498). This could be attributed to the ample diversity of mechanisms involved in the pathogenesis of asthma, coupled with the inter-individual differences in oral corticosteroid sensitivity (Hew et al., 2006), the presence of a varied set of comorbidities across asthmatic and control cohorts, or the lack of data for certain patients and clinical features. Notably, the empirically-built classification model based on phenotypic features (Fig. 2a) contained most of the characteristics reported by Bigler et al. (2017) to be elevated in U-BIOPRED severe asthmatics (age, FeNO, total IgE, and white blood cell counts). Plus, the model included CRP level, whose increment is related to non-allergic asthma (Olafsdottir et al., 2005) and inversely correlated with FEV1, as reported by Hancox et al. (2007) and shown in the MFA result; total IgE, a common indicator of eosinophilic asthma phenotypes with a Th2 cytokine profile (Manise et al., 2013), with close ties to eczema and rhinitis as asthma comorbidities, according to Ballardini et al. (2016) and depicted in the MFA, which also links it to atopy; and serum periostin level, a surrogate biomarker for asthma correlated with tissue remodeling (Izuhara, Ohta and Ono, 2016) and, in the case of the U-BIOPRED study, with nasal polyps and eosinophilia as well.

Furthermore, in the GLM, the non-significant p-values obtained for the Z-statistic of the phenotypic features (p ≈ 1) meant that, in principle, the data alone did not provide sufficient evidence for any of the covariates to be added to the model (Appendix C). This lack of evidence did not automatically imply that a particular variable was not relevant, especially when considering the large standard errors obtained for a majority of the features (Hill and Lewicki, 2007). Still, the multiple t-testing method was used to produce the MODifieR transcriptomic input files, both to avoid introducing bias in the limma models through the omission of covariates whose significance in asthma might not have been adequately captured in the feature selection process, and to accentuate the differential expression of genes, not only across asthma categories but also between single individuals.

The module identification phase resulted in the production of individualized gene collections for each of the U-BIOPRED samples. An aspect shared by the three applied methods (MCODE, Clique Sum, and DIAMOnD) was the noticeably disparate size of the modules they produced, ranging from a few dozen to several thousand genes. This represented a double-edged sword: large modules comprising very diverse profiles offered wider possibilities of analysis while raising the research complexity and volume. In terms of unique genes, DIAMOnD was the method with the most divergent output, accounting for almost 14,000 unique genes in the combined modules of the expression profiling in blood dataset. MCODE and Clique Sum modules were often formed by fewer elements, which in turn were more closely related to each other, as shown by the relatively high number of genes in the overlaps of the modules originating from these methods, in every dataset (Table 2). Future versions of MODifieR are expected to bring improvements in this respect, such as the integration of a scoring system for module genes to support a threshold-based limitation of the module size, the possibility of using specialized PPI networks (e.g., co-expression, homology, experimentally determined interactions), or the introduction of directed edges to enhance the examination of regulatory processes within the modules.

Afterwards, transcriptomic-based modules and personalized SNP profiles from the U-BIOPRED GWAS were paired for the validation with PASCAL, which evaluated both profiles and ranked the modules to find the optimal method combination for every sample (Appendix F). Nonetheless, the fraction of samples without a significant combination (36.0 %) was higher than the frequency of the best combination (23.9 %, for the module formed by the union of the three MODifieR methods). Moreover, although MCODE frequency (7.9 %) was superior compared to the other single methods (1.4 % for Clique Sum, 2.1 % for DIAMOnD), their low performance in comparison to the union of modules from different methods indicates a possible synergic effect, which in turn suggests that a consensus of MODifieR outputs is preferable to the separate module-identifying algorithms, for the surveyed conditions and data. Concerning the overlap between module genes and SNPs associated with asthma from U-BIOPRED and GABRIEL GWAS (Appendix G), despite certain module combinations being enriched for asthma-related SNPs, no significant differences were found for modules belonging to different asthma categories. A parallel procedure was tested by Barrenäs et al. (2012) in their analysis of the SNP enrichment in susceptibility modules (SuMs). In their study, they observed that SuMs were enriched for GWAS genes, though this fold enrichment lost its significance when considering exclusively DEGs for the corresponding diseases. Due to the dependence on gene expression inputs for the module identification workflow applied in this project, it seems reasonable to expect absent or very attenuated differential SNP enrichments in the straight overlaps of module and GWAS genes. Besides, the highly varying module size might have masked divergences between asthma categories if those were only marginally manifested in the modules, as the results indicated. 
The inclusion of a module gene ranking system or the implementation of GWAS data in the methods used for module detection could feasibly correct for this issue in future work.

The construction of the core modules for sample category comparisons and their validation for asthma association (core disease modules) with PASCAL identified the combination "Clique Sum ∪ DIAMOnD ∩ MCODE" for severe asthma versus moderate asthma and controls (dataset GSE69683) as the most significant. Aside from the PASCAL scoring, the size of the resulting module (397 genes) was suitable for further examination. Larger modules often lack applicability, while modules of reduced size (fewer than 100 genes) are seldom informative for the inference of novel hypotheses in a systems biology context. The GO and KEGG enrichment analysis for the core module revealed a preponderance of signaling pathways, including the positive regulation of multiple kinase activities, the mitogen-activated protein kinase (MAPK) cascade, and the phosphatidylinositol 3-kinase/Akt/mammalian target of rapamycin (PI3K/Akt/mTOR), Ras, and Rap1 signaling pathways, as well as more specific signal transduction routes such as the thyroid hormone, EGFR (epidermal growth factor receptor), and ErbB (erythroblastic leukemia viral oncogene) signaling pathways, and Fc receptor and T cell receptor signaling, among others (Tables H1, H2). Interestingly, the core module genes were significantly enriched for several cancer types, such as chronic myeloid leukemia, prostate cancer, or Kaposi's sarcoma. Although recent studies have suggested a possible link between asthma and cancer risk and prognosis, especially in the case of lung cancer (Liu et al., 2015) (Qu et al., 2017), opposing conclusions have also been reported (González-Pérez et al., 2006) (Rosenberger et al., 2012), and thus the association between asthma and cancer is currently considered unclear.
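The label of the selected core module combines the outputs of the three methods through set operations. The minimal Python sketch below builds such a combination from three toy gene sets whose members are invented placeholders. The text does not state how "Clique Sum ∪ DIAMOnD ∩ MCODE" is grouped, so both possible readings are shown; which of them corresponds to the module analyzed here is an assumption to be checked against the methods.

```python
# Toy per-method modules; gene names and memberships are invented for illustration.
clique_sum = {"GENE_A", "GENE_B", "GENE_C"}
diamond = {"GENE_C", "GENE_D", "GENE_E", "GENE_F"}
mcode = {"GENE_A", "GENE_C", "GENE_D", "GENE_G"}

# Reading 1: union first, then intersection -> (Clique Sum ∪ DIAMOnD) ∩ MCODE
union_then_intersect = (clique_sum | diamond) & mcode

# Reading 2: intersection first -> Clique Sum ∪ (DIAMOnD ∩ MCODE)
intersect_then_union = clique_sum | (diamond & mcode)

print(sorted(union_then_intersect))
print(sorted(intersect_then_union))
```

The explicit parentheses matter: in Python, `&` binds more tightly than `|`, so an unparenthesized expression would silently pick the second reading.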

Attending to the core module network clusters (Appendices H, I), cluster 1 enclosed a variety of translation-related genes, along with components of the thyroid hormone and Notch signaling pathways. Hyperthyroidism is associated with a worsening of the asthma control, which improves in hypothyroid conditions. Therefore, it has been suggested that thyroid hormones play an active role in maintaining the airway function, though the exact regulatory mechanism is unknown (Ayres and Clark, 1981) (Luong and Nguyen, 2000). Notch ligands can induce Th1/Th2 cell differentiation, in an alternative pathway to cytokine-mediated activation (Amsen, Spilianakis and Flavell, 2009). Noticeably, cluster 1 included CEBPB (NF-IL6), a transcription factor implicated in Th2 cell differentiation that has been appointed as a candidate target for asthma (Davydov, Krammer and Li-Weber, 1995); CREBBP (CBP), the specific co-activator of the CREB transcription factor, whose expression is related to inflammation and severity in asthma (Chiappara et al., 2007); and PPARG, a nuclear receptor involved in the long-term control of early onset asthma (Palmer et al., 2007). The appearance of a large number of unspecific ribosomal protein genes is believed to have originated from a defective performance of the module identification algorithms, which might incorporate them by default based on their supra-average interconnectivity, as observed in the PPI networks. For its part, cluster 2 contained different nodes responsible for the EGFR, ErbB, and MAPK activities detected for the core module, congregating the majority of the cancer-linked genes (EGF, EGFR, JAK1, MAP2K1). EGF has been shown to promote tissue remodeling after an acute asthma attack (Enomoto et al., 2009), while a CA repeat polymorphism in the intron 1 of its receptor (EGFR) has been connected to asthma sensitivity and severity (Wang et al., 2006). 
Likewise, the Janus kinase 1 (JAK1) rs2780895 SNP is significantly associated with a higher susceptibility to asthma (Hsieh et al., 2011). MEK1 (MAP2K1) participates in an IL-2/IL-4-driven mechanism that builds resistance against TGF-β and IL-10 inhibition pathways (Liang et al., 2010). Additionally, the cluster enclosed the beta-arrestin-2 gene (ARRB2), a regulator of allergic asthma development (Walker et al., 2003); and the cortactin gene (CTTN), linked to differential susceptibility to severe asthma occurrence (Ma et al., 2008).

On the other hand, cluster 3 comprised primarily PI3K/Akt signaling activities, which play a critical role in asthma by promoting airway inflammation and hyperresponsiveness, up-regulating Th2 cytokine levels and elevating the generation of mucus (Takeda et al., 2010) (Medina-Tato, Ward and Watson, 2007). PI3K and Notch signaling pathways, whose elements are dispersed across the core module (JAK1, cluster 2; AKT3, JAK2, JAK3, PIK3CD, PDPK1, SRC, cluster 3; PIK3CA, cluster 4; CHUK, HSP90AA1, PTEN, cluster 5; NOTCH1, cluster 6; BCL2, unclustered), are known to coordinately modulate T-lymphocyte activation and proliferation in asthma (Zhang et al., 2013). Specifically, the activation of the protein kinase mTOR, downstream in PI3K signaling, has been proposed as a crucial step in the onset of asthma, via the phosphorylation of the ribosomal protein S6 kinase (RPS6KB1, unclustered) (Zhang et al., 2017). The JAK2 and JAK3 products have been proposed as therapeutic targets for asthma (Wong and Leong, 2004); the VEGF product, the vascular endothelial growth factor, is up-regulated in induced sputum in childhood bronchial asthma (Hossny et al., 2009); and VEGF together with KDR, which encodes the VEGF receptor Flk-1, is overexpressed in allergic rhinitis patients with asthma (Yuksel et al., 2007). The drug-mediated inhibition of PDK1 (PDPK1) restrains experimental asthma development (Hayashi et al., 2007), whereas repression of PI3Kδ (PIK3CD) has been tested for the treatment of asthma (Park, Min and Lee, 2008). Other inflammatory pathways detected were the chemokine signaling pathway (PTK2, PXN, cluster 4; BCAR1, ITK, PTK2B, cluster 5; PAK1, VAV3, cluster 6) and the high-affinity IgE receptor (FcεRI) signaling pathway (LCP2, cluster 4; FYN, cluster 5; LAT, unclustered). Itk tyrosine kinase (ITK) is a known target of allergic airway anti-inflammatory agents (Wong, 2005); FYN polymorphisms have been tied to allergic asthma in children (Szczepankiewicz et al., 2007); and LAT is involved in Th2 polarization, again in allergic asthma patients (Li et al., 2013).

Enriched pathways in cluster 4 included immune response, focal adhesion, Fc receptor regulation, T-cell receptor, and blood coagulation activities, with a recurrent association of the module genes to cancer types and carcinogenesis. Particularly, focal adhesion (PTK2, PXN, TLN1, cluster 4) and its relation to airway smooth muscle remodeling (Dekkers et al., 2013) are relevant to the chronification of the airway inflammation that is observed in some severe asthma patients (Ammit and Panettieri, 2001). A SNP from one of these genes (TLN1, rs4879926) is associated with total IgE in asthmatics (Kim et al., 2013). Further, the activation of the receptor tyrosine kinase ErbB2 (ERBB2) from the ErbB signaling pathway (SOS1, cluster 3; GRB2, cluster 4) is known to trigger epithelial repair in a similar way to the EGFR mechanism (Polosa et al., 2002). Fittingly, ERBB2 appears to be negatively correlated to asthma severity, according to the gene expression network approach carried out by Modena et al. (2017). Two of ERBB2 SNPs, rs1058808 and rs2952156, have been associated with asthma (Song and Lee, 2013) (Demenais et al., 2018). Also in cluster 4, caveolin-1 (CAV1) plays a key role in epithelial barrier function and Th2 response in asthma (Hackett et al., 2013); IKBA (NFKBIA) rs2233407 AT polymorphism is related to atopic asthma progression (Park et al., 2010); and IKK-2 (IKBKB) inhibition has been suggested as a potential treatment for asthma (Birrell et al., 2005).

Cluster 5 was predominantly linked with cell cycle governance, due to the presence of the cyclin-dependent kinase 2 (CDK2) and the cyclin A and D genes (CCNA1 and CCND1). The increased expression of both cyclins and the decreased expression of p27kip1 (CDKN1B, in cluster 4) in proliferating CD4+ T cells of asthmatic patients are a consequence of the cited PI3K-Notch combined signal transduction (Zhang et al., 2013). Besides, a variant of CCND1 is implicated in obesity-related asthma (Thun et al., 2013). The relationship of GTPases Ras (HRAS, cluster 5) and Rap1 (RAP1A, cluster 4; RAP1B, cluster 3) with the regulation of T-cell proliferative responses has been proven before (Remans et al., 2004). Regarding GWAS association, CDK2 (rs2069408), GAB1 (rs1397527), SMAD3 (rs2033784, rs744910), and SOCS1 (rs17806299) SNPs have been connected to asthma (Hirota et al., 2011) (Demenais et al., 2018) (Moffatt et al., 2010). Moreover, SOCS1 gene is tied to adult asthma development by a functional polymorphism (Harada et al., 2007) and the induction of its product (SOCS-1) by IL-13 down-regulates allergic asthma phenotypes (Fukuyama et al., 2009). Other genes in cluster 5 that have been associated with asthma include IKK-α (CHUK) (Gagliardo et al., 2011), Hsp90 (HSP90AA1) (Perišić, Srećković and Matić, 2007), Lck (LCK) (Guo et al., 2008), and PTEN (PTEN) (Ni et al., 2011).

For cluster 6, a compelling combination of the Th1 and Th2 cell differentiation and the Th17 cell differentiation routes, paired with response to drug, morphogenesis and Fc receptor activities, was found. As displayed by the known disease phenotypes, asthma pathogenesis is tightly connected to a latent disequilibrium of the immune response, be it between Th1 and Th2 (Packard and Khan, 2003) or Th17 and regulatory T cells (Tao et al., 2015). Additional information about the causative mechanisms of these imbalances might be obtained from cluster 6 genes, although a more in-depth investigation would be required. In this instance, asthma association is mainly provided by the EDN1 (rs1800541) (Zhu et al., 2008) and IFNG (Shannon et al., 2008) genes, although CCR5 (Hall et al., 1999), CTNNA1 (de Boer et al., 2008), FOS (Takahashi et al., 2002), and RAC1 (Dilasser et al., 2018) also present significant ties with the disease. Lastly, links with asthma for unclustered genes of the core module are summarized in Appendix K.

In sum, the applied module-driven approach demonstrated a remarkable capacity to reveal disease-associated genetic components and pathways, which might not have been discerned from the direct analysis of expression or genome-wide data alone. The fact that asthma-related genes were dispersed across the module clusters, rather than exhibiting hub-like properties, is in line with the premise of Goh et al. (2007) that disease nodes are often located at the functional periphery of networks. The present project establishes a valuable workflow, applicable to asthma and other conditions, focused on the identification of transcriptomic-based modules, from which core disease modules can be constructed and validated. As shown by the results, the selected core module was able to retain recognized elements of asthma pathogenesis while incorporating an assortment of novel candidate genes, which might conceivably become the subjects of forthcoming studies on the search for therapeutic targets and the development of personalized strategies for the treatment of this respiratory syndrome.

Ethical aspects and impact of the research on the society

Under the guidelines of the Declaration of Helsinki, interventional studies involving the participation of either human or animal subjects must be approved by an institutional ethics committee, incorporate informed consent in their procedures, and respect the privacy and confidentiality of the individuals, with special attention to those belonging to vulnerable groups (World Medical Association, 2013). This project relied upon a combination of novel and well-established computational approaches, by which several sets of publicly available data were analyzed using innovative systems medicine strategies. Concerning the principal source of data, the Ethics Statement of the U-BIOPRED consortium reports that the investigatory process was approved by the Institutional Review Boards of the participating institutions and adopted the standards set by the International Conference on Harmonization and Good Clinical Practice (Bigler et al., 2017). Three advisory boards oversaw the project, specifically the Ethics Board (EB), Safety Monitoring Board (SMB), and Patient Input Platform (PIP), in order to ensure its ethical adequacy and the well-being of its subjects, provide advice on ethical issues, monitor the scientific conduct of the studies performed and coordinate the project activities with the rest of the members (U-BIOPRED EB, 2010) (U-BIOPRED PIP, 2016). All participants provided written informed consent. Therefore, no additional ethical permissions needed to be obtained in this case. The results of the present work can potentially contribute to expanding the current understanding of asthma from a systems medicine perspective, through the exploration of the network structures underlying asthma disease mechanisms and the development of a multi-omic and individualized profiling strategy. A new workflow is provided for the combination of transcriptomic and genomic data, which can be applied to multiple diseases, with the capacity to detect disease-linked components and suggest particular genes as candidate targets for therapeutic intervention.

Future perspectives

The bottom-up strategy followed throughout the project has proved its capacity to successfully extract disease-associated information from expression-based modules validated with genomic data. Due to the exploratory nature of the workflow used, a wide variety of paths are available to further refine the methodology, corroborate the results, and expand the applicational possibilities. Time limitation was the only factor preventing the latter itineraries from being examined as yet. Firstly, the evaluated core module could be compared with a new consensus module built from the genes included in the severe asthma disease signature that was assembled from the same initial data by Bigler et al. (2017). In addition, a logical step would be the analysis of the best scoring core module for the expression profiling in induced sputum dataset (GSE76262), in order to assess its members for association with asthma and determine potential overlaps across it and the blood expression core module. In this case, the U-BIOPRED sputum transcriptomics sample and gene clustering provided by Kuo et al. (2017) would serve as a relevant reference for the unmasking of asthma phenotypes present in the module. Moreover, genes and biological mechanisms comprised in both induced sputum and blood core modules could be then contrasted with the asthma disease module identified by Sharma et al. (2015) by applying the DIAMOnD module detection method to a selected seed gene-enriched section of the interactome. Regarding the application of the established workflow to other diseases, conditions characterized by an elevated inflammatory component (allergy, hepatitis, rheumatoid arthritis, autoimmune diseases, etc.) could offer an interesting topic for future research, due to the probable crosstalk between the mechanisms disrupted in them and those affected in asthma. 
Aside from that, a gene regulatory network analysis supplemented by the experimental perturbation of candidate genes from the core module might be another appealing, although complex and extensive, alternative. Finally, an approach involving the production of dynamic disease modules from asthma data encompassing several time points could also be applied, increasing the knowledge about the progression of the syndrome or the mechanism of action of a drug treatment, depending on the input.

Acknowledgments

In the first place, I would like to thank my supervisor, Dr. Mika Gustafsson of the Department of Physics, Chemistry and Biology at Linköping University, for his trust, expert advice and continuous guidance throughout this Master Thesis Project.

I would also like to thank my co-supervisor, Tejaswi Badam, for his inspiring suggestions and constant assistance; Dr. Daniel Muthas from AstraZeneca R&D, for his valuable input and well-thought-out feedback; and Dr. Zelmina Lubovac of the School of Bioscience at the University of Skövde, for her insightful counseling.

Likewise, I am grateful to my colleagues, Andreas Kalin, Andreas Tjärnberg, Ceylan Sonmez, Julia Åkesson, Olof Rundquist, Rasmus Magnusson, Sanjiv Dwivedi, Simon Söderholm, and Thomas Hillerton, at the Translational Bioinformatics group of Linköping University, for their friendship, encouragement and stimulating discussions, both at meetings and coffee breaks.

In like manner, I cannot forget my university classmates and friends back in Zaragoza, for their extraordinary companionship: Andrea, Ángela, Dani, David, Edu, Héctor, Iratxe, Isabel, Javi, Jessica, María, Patri, Perseo, Rubén, Sergio, and Vero.

Last but not least, I would like to thank my family: my parents and my brother. None of this would have been possible without their endless support and affection.

David M.

References

Abdi, H., Williams, L. J. and Valentin, D. (2013) ‘Multiple factor analysis: Principal component analysis for multitable and multiblock data sets’, Wiley Interdisciplinary Reviews: Computational Statistics, 5(2), pp. 149–179. doi: 10.1002/wics.1246.

Agache, I. and Akdis, C. A. (2016) ‘Endotypes of allergic diseases and asthma: An important step in building blocks for the future of precision medicine’, Allergology International. Elsevier B.V, 65(3), pp. 243–252. doi: 10.1016/j.alit.2016.04.011.

Agache, I., Sugita, K., Morita, H., Akdis, M. and Akdis, C. A. (2015) ‘The Complex Type 2 Endotype in Allergy and Asthma: From Laboratory to Bedside’, Current Allergy and Asthma Reports, 15(6). doi: 10.1007/s11882-015-0529-x.

Altshuler, D. M., Durbin, R. M., Abecasis, G. R., Bentley, D. R., Chakravarti, A., Clark, A. G., … Lacroute, P. (2012) ‘An integrated map of genetic variation from 1,092 human genomes’, Nature, 491(7422), pp. 56–65. doi: 10.1038/nature11632.

Ammit, A. J. and Panettieri, R. A. (2001) ‘Invited Review: The circle of life: cell cycle regulation in airway smooth muscle’, Journal of Applied Physiology, 91(3), pp. 1431–1437. doi: 10.1152/jappl.2001.91.3.1431.

Amsen, D., Spilianakis, C. G. and Flavell, R. A. (2009) ‘How are TH1 and TH2 effector cells made?’, Current Opinion in Immunology, 21(2), pp. 153–160. doi: 10.1016/j.coi.2009.03.010.

Ayers, D. and Day, P. J. (2015) ‘Systems Medicine: The Application of Systems Biology Approaches for Modern Medical Research and Drug Development’, Molecular Biology International. Hindawi Publishing Corporation, 2015, pp. 1–8. doi: 10.1155/2015/698169.

Ayres, J. and Clark, T. J. H. (1981) ‘Asthma and the thyroid’, The Lancet, 318(8255), pp. 1110–1111. doi: 10.1016/S0140-6736(81)91309-X.

Badam, T. V. S., de Weerd, H., Åkesson, J., Wu, S., Hundstad, J., Lubovac, Z. and Gustafsson, M. (2017) ‘MODifieR: Robust disease modules identification.’ R package version 0.1.0. Available at: https://gitlab.com/Gustafsson-lab/MODifieR.

Bader, G. D. and Hogue, C. W. V. (2003) ‘An automated method for finding molecular complexes in large protein interaction networks’, BMC Bioinformatics, 4, p. 2. doi: 10.1186/1471-2105-4-2.

Ballardini, N., Bergström, A., Wahlgren, C. F., Van Hage, M., Hallner, E., Kull, I., Melén, E., Antó, J. M., Bousquet, J. and Wickman, M. (2016) ‘IgE antibodies in relation to prevalence and multimorbidity of eczema, asthma, and rhinitis from birth to adolescence’, Allergy: European Journal of Allergy and Clinical Immunology, 71(3), pp. 342–349. doi: 10.1111/all.12798.

Barabási, A.-L., Gulbahce, N. and Loscalzo, J. (2011) ‘Network medicine: a network-based approach to human disease’, Nature Reviews Genetics, 12(1), pp. 56–68. doi: 10.1038/nrg2918.

Barnes, P. J. (1998) ‘Current issues for establishing inhaled corticosteroids as the antiinflammatory agents of choice in asthma’, Journal of Allergy and Clinical Immunology, 101(4 Suppl.), pp. S427–S433. doi: 10.1016/S0091-6749(98)70154-X.

Barrenäs, F., Chavali, S., Alves, A. C., Coin, L., Jarvelin, M. R., Jörnsten, R., Langston, M. A., Ramasamy, A., Rogers, G., Wang, H. and Benson, M. (2012) ‘Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms’, Genome Biology, 13(6). doi: 10.1186/gb-2012-13-6-r46.

Storey, J. D., Bass, A. J., Dabney, A. and Robinson, D. (2015) ‘qvalue: Q-value estimation for false discovery rate control.’ R package version 2.6.0.

Bateman, E. D., Boushey, H. A., Bousquet, J., Busse, W. W., Clark, T. J. H., Pauwels, R. A. and Pedersen, S. E. (2004) ‘Can guideline-defined asthma control be achieved? The gaining optimal asthma control study’, American Journal of Respiratory and Critical Care Medicine, 170(8), pp. 836–844. doi: 10.1164/rccm.200401-033OC.

Bel, E. H., Sousa, A., Fleming, L., Bush, A., Chung, K. F., Versnel, J., Wagener, A. H., Wagers, S. S., Sterk, P. J. and Compton, C. H. (2011) ‘Diagnosis and definition of severe refractory asthma: an international consensus statement from the Innovative Medicine Initiative (IMI)’, Thorax, 66(10), pp. 910–917. doi: 10.1136/thx.2010.153643.

Bhavsar, P., Khorasani, N., Hew, M., Johnson, M. and Chung, K. F. (2010) ‘Effect of p38 MAPK inhibition on corticosteroid suppression of cytokine release in severe asthma’, European Respiratory Journal, 35(4), pp. 750–756. doi: 10.1183/09031936.00071309.

Bigler, J., Boedigheimer, M., Schofield, J. P. R., Skipp, P. J., Corfield, J., Rowe, A., Sousa, A. R., Timour, M., Twehues, L., Hu, X., Roberts, G., Welcher, A. A., Yu, W., Lefaudeux, D., De Meulder, B., Auffray, C., Chung, K. F., Adcock, I. M., Sterk, P. J. and Djukanović, R. (2017) ‘A severe asthma disease signature from gene expression profiling of peripheral blood from U-BIOPRED cohorts’, American Journal of Respiratory and Critical Care Medicine, 195(10), pp. 1311–1320. doi: 10.1164/rccm.201604-0866OC.

Birrell, M. A., Hardaker, E., Wong, S., McCluskie, K., Catley, M., De Alba, J., Newton, R., Haj-Yahia, S., Pun, K. T., Watts, C. J., Shaw, R. J., Savage, T. J. and Belvisi, M. G. (2005) ‘Iκ-B kinase-2 inhibitor blocks inflammation in human airway smooth muscle and a rat model of asthma’, American Journal of Respiratory and Critical Care Medicine, 172(8), pp. 962–971. doi: 10.1164/rccm.200412-1647OC.

de Boer, W. I., Sharma, H. S., Baelemans, S. M. I., Hoogsteden, H. C., Lambrecht, B. N. and Braunstahl, G. J. (2008) ‘Altered expression of epithelial junctional proteins in atopic asthma: possible role in inflammation’, Can. J. Physiol. Pharmacol., 86(3), pp. 105–112. doi: 10.1139/Y08-004.

Bousquet, J., Chanez, P., Lacoste, J. Y., Barnéon, G., Ghavanian, N., Enander, I., Venge, P., Ahlstedt, S., Simony-Lafontaine, J., Godard, P. and Michel, F.-B. (1990) ‘Eosinophilic Inflammation in Asthma’, New England Journal of Medicine, 323(15), pp. 1033–1039. doi: 10.1056/NEJM199010113231505.

Bousquet, J. and Khaltaev, N. (2007) Global surveillance, prevention and control of chronic respiratory diseases: a comprehensive approach. Global Alliance against Chronic Respiratory Diseases. Geneva: World Health Organization. Available at: http://www.who.int/gard/publications/GARD%20Book%202007.pdf.

Bousquet, J., Mantzouranis, E., Cruz, A. A., Aït-Khaled, N., Baena-Cagnani, C. E., Bleecker, E. R., Brightling, C. E., Burney, P., Bush, A., Busse, W. W., Casale, T. B., Chan-Yeung, M., Chen, R., Chowdhury, B., Chung, K. F., Dahl, R., Drazen, J. M., Fabbri, L. M., Holgate, S. T., Kauffmann, F., Haahtela, T., Khaltaev, N., Kiley, J. P., Masjedi, M. R., Mohammad, Y., O’Byrne, P., Partridge, M. R., Rabe, K. F., Togias, A., van Weel, C., Wenzel, S., Zhong, N. and Zuberbier, T. (2010) ‘Uniform definition of asthma severity, control, and exacerbations: Document presented for the World Health Organization Consultation on Severe Asthma’, Journal of Allergy and Clinical Immunology, 126(5), pp. 926–938. doi: 10.1016/j.jaci.2010.07.019.

Bruno, A., Pace, E., Chanez, P., Gras, D., Vachier, I., Chiappara, G., La Guardia, M., Gerbino, S., Profita, M. and Gjomarkaj, M. (2009) ‘Leptin and leptin receptor expression in asthma’, Journal of Allergy and Clinical Immunology. Elsevier Ltd, 124(2), p. 230–237.e4. doi: 10.1016/j.jaci.2009.04.032.

Busse, W. W., Camargo, C. A., Boushey, H. A. and Evans, D. (2007) Expert panel report 3: guidelines for the diagnosis and management of asthma. Available at: https://www.nhlbi.nih.gov/files/docs/guidelines/asthsumm.pdf.

Carlson, M. (2017) ‘org.Hs.eg.db: Genome wide annotation for Human.’ R package version 3.4.0.

Carvalho, B. S. and Irizarry, R. A. (2010) ‘A framework for oligonucleotide microarray preprocessing’, Bioinformatics, 26(19), pp. 2363–2367. doi: 10.1093/bioinformatics/btq431.

Chamberlain, S., Ushey, K. and Zhu, H. (2016) ‘rsnps: Get “SNP” (“Single-Nucleotide” “Polymorphism”) Data on the Web.’ R package version 0.2.0.

Chavent, M., Kuentz, V., Labenne, A., Liquet, B. and Saracco, J. (2017) ‘PCAmixdata: Multivariate Analysis of Mixed Data.’ R package version 3.1.

Cheng, J., Galili, T., Bostock, M. and Palmer, J. (2018) ‘d3heatmap: Interactive Heat Maps Using “htmlwidgets” and “D3.js”’. R package version 0.6.1.2.

Chiang, C.-H., Chuang, C.-H., Liu, S.-L. and Shen, H.-D. (2013) ‘Genetic polymorphism of transforming growth factor β1 and tumor necrosis factor α is associated with asthma and modulates the severity of asthma.’, Respiratory care, 58(8), pp. 1343–50. doi: 10.4187/respcare.02187.

Chiappara, G., Chanez, P., Bruno, A., Pace, E., Pompeo, F., Bousquet, J., Bonsignore, G. and Gjomarkaj, M. (2007) ‘Variable p-CREB expression depicts different asthma phenotypes’, Allergy: European Journal of Allergy and Clinical Immunology, 62(7), pp. 787–794. doi: 10.1111/j.1398-9995.2007.01417.x.

Chung, K. F., Wenzel, S. E., Brozek, J. L., Bush, A., Castro, M., Sterk, P. J., Adcock, I. M., Bateman, E. D., Bel, E. H., Bleecker, E. R., Boulet, L. P., Brightling, C., Chanez, P., Dahlen, S. E., Djukanovic, R., Frey, U., Gaga, M., Gibson, P., Hamid, Q., Jarjour, N. N., Mauad, T., Sorkness, R. L. and Teague, W. G. (2014) ‘International ERS/ATS guidelines on definition, evaluation and treatment of severe asthma’, European Respiratory Journal, 43(2), pp. 343–373. doi: 10.1183/09031936.00202013.

Clarke, G. M., Anderson, C. A., Pettersson, F. H., Cardon, L. R., Morris, A. P. and Zondervan, K. T. (2011) ‘Basic statistical analysis in genetic case-control studies’, Nature Protocols, 6(2), pp. 121–133. doi: 10.1038/nprot.2010.182.

Davidson, E. H. (2002) ‘A Genomic Regulatory Network for Development’, Science, 295(5560), pp. 1669–1678. doi: 10.1126/science.1069883.

Davydov, I. V., Krammer, P. H. and Li-Weber, M. (1995) ‘Nuclear factor-IL6 activates the human IL-4 promoter in T cells.’, J Immunol, 155(11), pp. 5273–79. Available at: http://www.ncbi.nlm.nih.gov/pubmed/7594540.

Dekkers, B. G. J., Spanjer, A. I. R., van der Schuyt, R. D., Kuik, W. J., Zaagsma, J. and Meurs, H. (2013) ‘Focal adhesion kinase regulates collagen I-induced airway smooth muscle phenotype switching.’, The Journal of pharmacology and experimental therapeutics, 346(1), pp. 86–95. doi: 10.1124/jpet.113.203042.

Demenais, F., Margaritte-Jeannin, P., Barnes, K. C., Cookson, W. O. C., Altmüller, J., Ang, … Nicolae, D. L. (2018) ‘Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks’, Nature Genetics, 50(1), pp. 42–53. doi: 10.1038/s41588-017-0014-7.

Desai, M. and Oppenheimer, J. (2016) ‘Elucidating asthma phenotypes and endotypes: Progress towards personalized medicine’, Annals of Allergy, Asthma and Immunology. American College of Allergy, Asthma & Immunology, 116(5), pp. 394–401. doi: 10.1016/j.anai.2015.12.024.

Dilasser, F., Klein, M., Magnan, A., Loirand, G. and Sauzeau, V. (2018) ‘Essential role of smooth muscle Rac1 in airway hyperresponsiveness and airway remodelling associated to severe asthma’, Revue Française d’Allergologie. Elsevier Masson SAS, 58(3), p. 285. doi: 10.1016/j.reval.2018.02.167.

Ek, W. E., Rask-Andersen, M., Karlsson, T. and Johansson, A. (2017) ‘Genome-wide association analysis identifies 26 novel loci for asthma, hay fever and eczema’, bioRxiv (under review Nat. Commun.), p. 195933. doi: 10.1101/195933.

Enomoto, Y., Orihara, K., Takamasu, T., Matsuda, A., Gon, Y., Saito, H., Ra, C. and Okayama, Y. (2009) ‘Tissue remodeling induced by hypersecreted epidermal growth factor and amphiregulin in the airway after an acute asthma attack’, Journal of Allergy and Clinical Immunology. Elsevier Ltd, 124(5), p. 913–920.e7. doi: 10.1016/j.jaci.2009.08.044.

European Lung Foundation (2013) U-BIOPRED Project. Available at: http://www.europeanlung.org/en/projects-and-research/projects/u-biopred/home.

Fraley, C., Raftery, A. E., Scrucca, L., Murphy, T. B. and Fop, M. (2017) ‘mclust: Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation’. R package version 5.4.

Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., Lin, J., Minguez, P., Bork, P., Von Mering, C. and Jensen, L. J. (2013) ‘STRING v9.1: Protein-protein interaction networks, with increased coverage and integration’, Nucleic Acids Research, 41(D1), pp. 808–815. doi: 10.1093/nar/gks1094.

Fukuyama, S., Nakano, T., Matsumoto, T., Oliver, B. G. G., Burgess, J. K., Moriwaki, A., Tanaka, K., Kubo, M., Hoshino, T., Tanaka, H., McKenzie, A. N. J., Matsumoto, K., Aizawa, H., Nakanishi, Y., Yoshimura, A., Black, J. L. and Inoue, H. (2009) ‘Pulmonary suppressor of cytokine signaling-1 induced by IL-13 regulates allergic asthma phenotype’, American Journal of Respiratory and Critical Care Medicine, 179(11), pp. 992–998. doi: 10.1164/rccm.200806-992OC.

Gagliardo, R., Chanez, P., Profita, M., Bonanno, A., Albano, G. D., Montalbano, A. M., Pompeo, F., Gagliardo, C., Merendino, A. M. and Gjomarkaj, M. (2011) ‘IκB kinase-driven nuclear factor-κB activation in patients with asthma and chronic obstructive pulmonary disease’, Journal of Allergy and Clinical Immunology. Elsevier Ltd, 128(3), p. 635–645.e2. doi: 10.1016/j.jaci.2011.03.045.

Gandhi, T. K. B., Zhong, J., Mathivanan, S., Karthick, L., Chandrika, K. N., Mohan, S. S., Sharma, S., Pinkert, S., Nagaraju, S., Periaswamy, B., Mishra, G., Nandakumar, K., Shen, B., Deshpande, N., Nayak, R., Sarker, M., Boeke, J. D., Parmigiani, G., Schultz, J., Bader, J. S. and Pandey, A. (2006) ‘Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets.’, Nature genetics, 38(3), pp. 285–93. doi: 10.1038/ng1747.

Gautier, L., Cope, L., Bolstad, B. M. and Irizarry, R. A. (2004) ‘Affy - Analysis of Affymetrix GeneChip data at the probe level’, Bioinformatics, 20(3), pp. 307–315. doi: 10.1093/bioinformatics/btg405.

Gentleman, R. (2018) ‘annotate: Annotation for microarrays’. R package version 1.52.1.

Gentleman, R., Carey, V., Huber, W. and Hahne, F. (2017) ‘genefilter: methods for filtering genes from high-throughput experiments’. R package version 1.56.0.

Ghiassian, S. D., Menche, J. and Barabási, A. L. (2015) ‘A DIseAse MOdule Detection (DIAMOnD) Algorithm Derived from a Systematic Analysis of Connectivity Patterns of Disease Proteins in the Human Interactome’, PLoS Computational Biology, 11(4), pp. 1–21. doi: 10.1371/journal.pcbi.1004120.

Gillis, S., Crabtree, G. R. and Smith, K. A. (1979a) ‘Glucocorticoid-induced inhibition of T cell growth factor production. I. The effect on mitogen-induced lymphocyte proliferation.’, Journal of immunology (Baltimore, Md. : 1950), 123(4), pp. 1624–31. Available at: http://www.ncbi.nlm.nih.gov/pubmed/314468.


Gillis, S., Crabtree, G. R. and Smith, K. A. (1979b) ‘Glucocorticoid-induced inhibition of T cell growth factor production. I. The effect on mitogen-induced lymphocyte proliferation.’, Journal of immunology (Baltimore, Md. : 1950), 123(4), pp. 1624–31. Available at: http://www.ncbi.nlm.nih.gov/pubmed/314468.

GINA (2017) Global Initiative for Asthma. GINA guidelines. Global strategy for Asthma Management and Prevention. Available at: http://ginasthma.org/wp-content/uploads/2016/01/wms-GINA-2017-main-report-tracked-changes-for-archive.pdf.

Goh, K.-I., Cusick, M. E., Valle, D., Childs, B., Vidal, M. and Barabási, A.-L. (2007) ‘The human disease network.’, Proceedings of the National Academy of Sciences of the United States of America, 104(21), pp. 8685–90. doi: 10.1073/pnas.0701361104.

González-Pérez, A., Fernández-Vidaurre, C., Rueda, A., Rivero, E. and García Rodríguez, L. A. (2006) ‘Cancer incidence in a general population of asthma patients’, Pharmacoepidemiology and Drug Safety, 15(2), pp. 131–138. doi: 10.1002/pds.1163.

Goulding, N. J. and Guyre, P. M. (1993) ‘Glucocorticoids, lipocortins and the immune response’, Current Opinion in Immunology, 5(1), pp. 108–113. doi: 10.1016/0952-7915(93)90089-B.

Guo, X. J., Li, J., Ni, P. H., Ren, L. P. and Xu, W. G. (2008) ‘The transcription levels of linker for activation of T cell and its upstream regulatory factors in T cells of asthmatic patients’, Zhonghua Jie He He Hu Xi Za Zhi, 31(2), pp. 125–128.

Gustafsson, M., Gawel, D. R., Alfredsson, L., Baranzini, S., Bjorkander, J., Blomgran, R., Hellberg, S., Eklund, D., Ernerudh, J., Kockum, I., Konstantinell, A., Lahesmaa, R., Lentini, A., Liljenstrom, H. R. I., Mattson, L., Matussek, A., Mellergard, J., Mendez, M., Olsson, T., Pujana, M. A., Rasool, O., Serra-Musach, J., Stenmarker, M., Tripathi, S., Viitala, M., Wang, H., Zhang, H., Nestor, C. E. and Benson, M. (2015) ‘A validated gene regulatory network and GWAS identifies early regulators of T cell-associated diseases’, Science Translational Medicine, 7(313), p. 313ra178. doi: 10.1126/scitranslmed.aad2722.

Gustafsson, M., Nestor, C. E., Zhang, H., Barabási, A.-L., Baranzini, S., Brunak, S., Chung, K. F., Federoff, H. J., Gavin, A.-C., Meehan, R. R., Picotti, P., Pujana, M. À., Rajewsky, N., Smith, K. G., Sterk, P. J., Villoslada, P. and Benson, M. (2014) ‘Modules, networks and systems medicine for understanding disease and aiding diagnosis.’, Genome medicine, 6(10), p. 82. doi: 10.1186/s13073-014-0082-6.

Hackett, T.-L., de Bruin, H. G., Shaheen, F., van den Berge, M., van Oosterhout, A. J., Postma, D. S. and Heijink, I. H. (2013) ‘Caveolin-1 Controls Airway Epithelial Barrier Function. Implications for Asthma’, American Journal of Respiratory Cell and Molecular Biology, 49(4), pp. 662–671. doi: 10.1165/rcmb.2013-0124OC.

Hall, I. P., Wheatley, A., Christie, G., McDougall, C., Hubbard, R. and Helms, P. J. (1999) ‘Association of CCR5 Δ32 with reduced risk of asthma’, The Lancet, 354(9186), pp. 1264–1265. doi: 10.1016/S0140-6736(99)03425-X.

Halu, A., De Domenico, M., Arenas, A. and Sharma, A. (2017) ‘The multiplex network of human diseases’, bioRxiv, pp. 1–41. doi: 10.1101/100370.

Han, J.-D. J., Bertin, N., Hao, T., Goldberg, D. S., Berriz, G. F., Zhang, L. V., Dupuy, D., Walhout, A. J. M., Cusick, M. E., Roth, F. P. and Vidal, M. (2004) ‘Evidence for dynamically organized modularity in the yeast protein–protein interaction network’, Nature, 430(6997), p. 380. doi: 10.1038/nature02795.

Hancox, R. J., Poulton, R., Greene, J. M., Filsell, S., McLachlan, C. R., Rasmussen, F., Taylor, D. R., Williams, M. J. A., Williamson, A. and Sears, M. R. (2007) ‘Systemic inflammation and lung function in young adults’, Thorax, 62(12), pp. 1064–1068. doi: 10.1136/thx.2006.076877.

Harada, M., Nakashima, K., Hirota, T., Shimizu, M., Doi, S., Fujita, K., Shirakawa, T., Enomoto, T., Yoshikawa, M., Moriyama, H., Matsumoto, K., Saito, H., Suzuki, Y., Nakamura, Y. and Tamari, M. (2007) ‘Functional polymorphism in the suppressor of cytokine signaling 1 gene associated with adult asthma’, American Journal of Respiratory Cell and Molecular Biology, 36(4), pp. 491–496. doi: 10.1165/rcmb.2006-0090OC.

Hayashi, T., Mo, J.-H., Gong, X., Rossetto, C., Jang, A., Beck, L., Elliott, G. I., Kufareva, I., Abagyan, R., Broide, D. H., Lee, J. and Raz, E. (2007) ‘3-Hydroxyanthranilic acid inhibits PDK1 activation and suppresses experimental asthma by inducing T cell apoptosis’, Proceedings of the National Academy of Sciences, 104(47), pp. 18619–18624. doi: 10.1073/pnas.0709261104.

Hekking, P. P. W., Wener, R. R., Amelink, M., Zwinderman, A. H., Bouvy, M. L. and Bel, E. H. (2015) ‘The prevalence of severe refractory asthma’, Journal of Allergy and Clinical Immunology. Elsevier Ltd, 135(4), pp. 896–902. doi: 10.1016/j.jaci.2014.08.042.

Hellberg, S., Eklund, D., Gawel, D. R., Köpsén, M., Zhang, H., Nestor, C. E., Kockum, I., Olsson, T., Skogh, T., Kastbom, A., Sjöwall, C., Vrethem, M., Håkansson, I., Benson, M., Jenmalm, M. C., Gustafsson, M. and Ernerudh, J. (2016) ‘Dynamic Response Genes in CD4+ T Cells Reveal a Network of Interactive Proteins that Classifies Disease Activity in Multiple Sclerosis’, Cell Reports, 16(11), pp. 2928–2939. doi: 10.1016/j.celrep.2016.08.036.

Hersberger, M., Thun, G. A., Imboden, M., Brandstätter, A., Waechter, V., Summerer, M., Schmid-Grendelmeier, P., Bircher, A., Rohrer, L., Berger, W., Russi, E. W., Rochat, T., Kronenberg, F. and Probst-Hensch, N. (2010) ‘Association of STR polymorphisms in CMA1 and IL-4 with asthma and atopy: The SAPALDIA Cohort’, Human Immunology, 71(11), pp. 1154–1160. doi: 10.1016/j.humimm.2010.08.008.

Hew, M., Bhavsar, P., Torrego, A., Meah, S., Khorasani, N., Barnes, P. J., Adcock, I. and Chung, K. F. (2006) ‘Relative corticosteroid insensitivity of peripheral blood mononuclear cells in severe asthma’, American Journal of Respiratory and Critical Care Medicine, 174(2), pp. 134–141. doi: 10.1164/rccm.200512-1930OC.

Hill, T. and Lewicki, P. (2007) STATISTICS: Methods and Applications. Edited by StatSoft. Tulsa, OK.

Hirota, T., Takahashi, A., Kubo, M., Tsunoda, T., Tomita, K., Doi, S., Fujita, K., Miyatake, A., Enomoto, T., Miyagawa, T., Adachi, M., Tanaka, H., Niimi, A., Matsumoto, H., Ito, I., Masuko, H., Sakamoto, T., Hizawa, N., Taniguchi, M., Lima, J. J., Irvin, C. G., Peters, S. P., Himes, B. E., Litonjua, A. A., Tantisira, K. G., Weiss, S. T., Kamatani, N., Nakamura, Y. and Tamari, M. (2011) ‘Genome-wide association study identifies three new susceptibility loci for adult asthma in the Japanese population’, Nature Genetics. Nature Publishing Group, 43(9), pp. 893–896. doi: 10.1038/ng.887.

Hood, L. (2004) ‘Systems Biology and New Technologies Enable Predictive and Preventative Medicine’, Science, 306(5696), pp. 640–643. doi: 10.1126/science.1104635.

Hoshino, M., Aoike, N., Takahashi, M., Nakamura, Y. and Nakagawa, T. (2003) ‘Increased immunoreactivity of stromal cell-derived factor-1 and angiogenesis in asthma’, European Respiratory Journal, 21(5), pp. 804–809. doi: 10.1183/09031936.03.00082002.

Hossny, E., El-Awady, H., Bakr, S. and Labib, A. (2009) ‘Vascular endothelial growth factor overexpression in induced sputum of children with bronchial asthma’, Pediatric Allergy and Immunology, 20(1), pp. 89–96. doi: 10.1111/j.1399-3038.2008.00730.x.

Hsieh, Y.-Y., Chang, C.-C., Hsu, C.-M., Wan, L., Chen, S.-Y., Lin, W.-H. and Tsai, F.-J. (2011) ‘JAK-1 rs2780895 C-Related Genotype and Allele but Not JAK-1 rs10789166, rs4916008, rs2780885, rs17127114, and rs3806277 Are Associated with Higher Susceptibility to Asthma’, Genetic Testing and Molecular Biomarkers, 15(12), pp. 841–847. doi: 10.1089/gtmb.2011.0002.

Huang, C.-D., Lin, S.-M., Chang, P.-J., Liu, W.-T., Wang, C.-H., Liu, C.-Y., Lin, H.-C., Hsieh, L.-L. and Kuo, H.-P. (2009) ‘Matrix metalloproteinase-1 polymorphism is associated with persistent airway obstruction in asthma in the Taiwanese population.’, The Journal of asthma : official journal of the Association for the Care of Asthma, 46(1), pp. 41–6. doi: 10.1080/02770900802252077.

Huber, W., Carey, V. J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B. S., Bravo, H. C., Davis, S., Gatto, L., Girke, T., Gottardo, R., Hahne, F., Hansen, K. D., Irizarry, R. A., Lawrence, M., Love, M. I., MacDonald, J., Obenchain, V., Oleś, A. K., Pagès, H., Reyes, A., Shannon, P., Smyth, G. K., Tenenbaum, D., Waldron, L. and Morgan, M. (2015) ‘Orchestrating high-throughput genomic analysis with Bioconductor’, Nature Methods. Nature Publishing Group, 12(2), pp. 115–121. doi: 10.1038/nmeth.3252.

Huerta-Yepez, S., Baay-Guzman, G. J., Bebenek, I. G., Hernandez-Pando, R., Vega, M. I., Chi, L., Riedl, M., Diaz-Sanchez, D., Kleerup, E., Tashkin, D. P., Gonzalez, F. J., Bonavida, B., Zeidler, M. and Hankinson, O. (2011) ‘Hypoxia Inducible Factor promotes murine allergic airway inflammation and is increased in asthma and rhinitis’, Allergy, 66(7), pp. 909–918. doi: 10.1111/j.1398-9995.2011.02594.x.

Husson, F., Le, S. and Pages, J. (2017) ‘FactoMineR: Multivariate Exploratory Data Analysis and Data Mining’. R package version 1.40.

Irizarry, R. A. and Cawley, S. (2017) ‘affycomp: Graphics Toolbox for Assessment of Affymetrix Expression Measures.’ R package version 1.50.0.

Izuhara, K., Ohta, S. and Ono, J. (2016) ‘Using periostin as a biomarker in the treatment of asthma’, Allergy, Asthma and Immunology Research, 8(6), pp. 491–498. doi: 10.4168/aair.2016.8.6.491.

Jeong, H., Mason, S. P., Barabási, A.-L. and Oltvai, Z. N. (2001) ‘Lethality and centrality in protein networks’, Nature, 411(6833), pp. 41–42. doi: 10.1038/35075138.

Kariyawasam, H. H., Pegorier, S., Barkans, J., Xanthou, G., Aizen, M., Ying, S., Kay, A. B., Lloyd, C. M. and Robinson, D. S. (2009) ‘Activin and transforming growth factor-β signaling pathways are activated after allergen challenge in mild asthma’, Journal of Allergy and Clinical Immunology, 124(3), pp. 454–462. doi: 10.1016/j.jaci.2009.06.022.

Kariyawasam, H. H., Xanthou, G., Barkans, J., Aizen, M., Kay, A. B. and Robinson, D. S. (2008) ‘Basal expression of bone morphogenetic protein receptor is reduced in mild asthma’, American Journal of Respiratory and Critical Care Medicine, 177(10), pp. 1074–1081. doi: 10.1164/rccm.200709-1376OC.

Karrasch, S., Linde, K., Rücker, G., Sommer, H., Karsch-Völk, M., Kleijnen, J., Jörres, R. A. and Schneider, A. (2017) ‘Accuracy of FENO for diagnosing asthma: a systematic review’, Thorax, 72(2), pp. 109–116. doi: 10.1136/thoraxjnl-2016-208704.

Kassambara, A. and Mundt, F. (2017) ‘factoextra: Extract and Visualize the Results of Multivariate Data Analyses.’ R package version 1.0.5.

Kim, J.-H., Cheong, H. S., Park, J. S., Jang, A.-S., Uh, S.-T., Kim, Y.-H., Kim, M.-K., Choi, I. S., Cho, S. H., Choi, B. W., Bae, J. S., Park, C.-S. and Shin, H. D. (2013) ‘A Genome-Wide Association Study of Total Serum and Mite-Specific IgEs in Asthma Patients’, PLoS ONE, 8(8), p. e71958. doi: 10.1371/journal.pone.0071958.

Klein, R. J. (2007) ‘Power analysis for genome-wide association studies’, BMC Genetics, 8, pp. 1–8. doi: 10.1186/1471-2156-8-58.

Kuo, C.-H. S., Pavlidis, S., Loza, M., Baribaud, F., Rowe, A., Pandis, I., Sousa, A., Corfield, J., Djukanovic, R., Lutter, R., Sterk, P. J., Auffray, C., Guo, Y., Adcock, I. M., Chung, K. F. and U-BIOPRED Study Group (2017) ‘T-helper cell type 2 (Th2) and non-Th2 molecular phenotypes of asthma using sputum transcriptomics in U-BIOPRED.’, The European respiratory journal, 49(2), p. 1602135. doi: 10.1183/13993003.02135-2016.

Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. and Bergmann, S. (2016) ‘Fast and Rigorous Computation of Gene and Pathway Scores from SNP-Based Summary Statistics’, PLoS Computational Biology, 12(1), pp. 1–20. doi: 10.1371/journal.pcbi.1004714.

Lee, C. M., Jung, I. D., Noh, K. T., Lee, J. S., Park, J. W., Heo, D. R., Park, J. H., Chang, J. H., Choi, I. W., Kim, J. S., Shin, Y. K., Park, S. J., Han, M. K., Lee, C. G., Cho, W. K. and Park, Y. M. (2012) ‘An essential regulatory role of downstream of kinase-1 in the ovalbumin-induced murine model of asthma’, PLoS ONE, 7(4), pp. 1–12. doi: 10.1371/journal.pone.0034554.

Lefaudeux, D., De Meulder, B., Loza, M. J., Peffer, N., Rowe, A., Baribaud, F., Bansal, A. T., Lutter, R., Sousa, A. R., Corfield, J., Pandis, I., Bakke, P. S., Caruso, M., Chanez, P., Dahlén, S. E., Fleming, L. J., Fowler, S. J., Horvath, I., Krug, N., Montuschi, P., Sanak, M., Sandstrom, T., Shaw, D. E., Singer, F., Sterk, P. J., Roberts, G., Adcock, I. M., Djukanovic, R., Auffray, C. and Chung, K. F. (2015) ‘U-BIOPRED clinical adult asthma clusters linked to a subset of sputum omics’, Journal of Allergy and Clinical Immunology. Elsevier Ltd. doi: 10.1016/j.jaci.2016.08.048.

Leisch, F. and Dimitriadou, E. (2012) ‘mlbench: Machine Learning Benchmark Problems’. R package version 2.1-1.

Li, C.-Y., Peng, J., Ren, L.-P., Gan, L.-X., Lu, X.-J., Liu, Q., Gu, W. and Guo, X.-J. (2013) ‘Roles of histone hypoacetylation in LAT expression on T cells and Th2 polarization in allergic asthma’, Journal of Translational Medicine, 11(1), p. 26. doi: 10.1186/1479-5876-11-26.

Liang, Q., Guo, L., Gogate, S., Karim, Z., Hanifi, A., Leung, D. Y., Gorska, M. M. and Alam, R. (2010) ‘IL-2 and IL-4 Stimulate MEK1 Expression and Contribute to T Cell Resistance against Suppression by TGF-β and IL-10 in Asthma’, The Journal of Immunology, 185(10), pp. 5704–5713. doi: 10.4049/jimmunol.1000690.

Liu, X., Hemminki, K., Försti, A., Sundquist, J., Sundquist, K. and Ji, J. (2015) ‘Cancer risk and mortality in asthma patients: A Swedish national cohort study’, Acta Oncologica, 54(8), pp. 1120–1127. doi: 10.3109/0284186X.2014.1001497.

Lötvall, J., Akdis, C. A., Bacharier, L. B., Bjermer, L., Casale, T. B., Custovic, A., Lemanske, R. F., Wardlaw, A. J., Wenzel, S. E. and Greenberger, P. A. (2011) ‘Asthma endotypes: A new approach to classification of disease entities within the asthma syndrome’, Journal of Allergy and Clinical Immunology, 127(2), pp. 355–360. doi: 10.1016/j.jaci.2010.11.037.

Louis, R., Lau, L. C. K., Bron, A. O., Roldaan, A. C., Radermecker, M. and Djukanović, R. (2000) ‘The relationship between airways inflammation and asthma severity’, American Journal of Respiratory and Critical Care Medicine, 161(1), pp. 9–16. doi: 10.1164/ajrccm.161.1.9802048.

Loymans, R. J. B., Honkoop, P. J., Termeer, E. H., Snoeck-Stroband, J. B., Assendelft, W. J. J., Schermer, T. R. J., Chung, K. F., Sousa, A. R., Sterk, P. J., Reddel, H. K., Sont, J. K. and ter Riet, G. (2016) ‘Identifying patients at risk for severe exacerbations of asthma: development and external validation of a multivariable prediction model’, Thorax, 71(9), pp. 838–846. doi: 10.1136/thoraxjnl-2015-208138.

Luong, K. V. and Nguyen, L. T. (2000) ‘Hyperthyroidism and asthma.’, The Journal of asthma : official journal of the Association for the Care of Asthma, 37(2), pp. 125–30. Available at: http://www.ncbi.nlm.nih.gov/pubmed/10805201.

Ma, S.-F., Flores, C., Wade, M. S., Dudek, S. M., Nicolae, D. L., Ober, C. and Garcia, J. G. N. (2008) ‘A common cortactin gene variation confers differential susceptibility to severe asthma’, Genetic Epidemiology, 32(8), pp. 757–766. doi: 10.1002/gepi.20343.

Magnusson, R., Gustafsson, M., Cedersund, G., Strålfors, P. and Nyman, E. (2017) ‘Cross-talks via mTORC2 can explain enhanced activation in response to insulin in diabetic patients’, Bioscience Reports, 37(1), p. BSR20160514. doi: 10.1042/BSR20160514.

Manise, M., Holtappels, G., Van Crombruggen, K., Schleich, F., Bachert, C. and Louis, R. (2013) ‘Sputum IgE and Cytokines in Asthma: Relationship with Sputum Cellular Profile’, PLoS ONE, 8(3). doi: 10.1371/journal.pone.0058388.

Medina-Tato, D. A., Ward, S. G. and Watson, M. L. (2007) ‘Phosphoinositide 3-kinase signalling in lung disease: Leucocytes and beyond’, Immunology, 121(4), pp. 448–461. doi: 10.1111/j.1365-2567.2007.02663.x.

Miller, M. K., Lee, J. H., Blanc, P. D., Pasta, D. J., Gujrathi, S., Barron, H., Wenzel, S. E. and Weiss, S. T. (2006) ‘TENOR risk score predicts healthcare in adults with severe or difficult-to-treat asthma’, European Respiratory Journal, 28(6), pp. 1145–1155. doi: 10.1183/09031936.06.00145105.

Modena, B. D., Bleecker, E. R., Busse, W. W., Erzurum, S. C., Gaston, B. M., Jarjour, N. N., Meyers, D. A., Milosevic, J., Tedrow, J. R., Wu, W., Kaminski, N. and Wenzel, S. E. (2017) ‘Gene expression correlated with severe asthma characteristics reveals heterogeneous mechanisms of severe disease’, American Journal of Respiratory and Critical Care Medicine, 195(11), pp. 1449–1463. doi: 10.1164/rccm.201607-1407OC.

Moffatt, M. F., Gut, I. G., Demenais, F., Strachan, D. P., Bouzigon, E., Heath, S., von Mutius, E., Farrall, M., Lathrop, M. and Cookson, W. O. C. M. (2010) ‘A Large-Scale, Consortium-Based Genomewide Association Study of Asthma’, New England Journal of Medicine, 363(13), pp. 1211–1221. doi: 10.1056/NEJMoa0906312.

Moore, W. C., Bleecker, E. R., Curran-Everett, D., Erzurum, S. C., Ameredes, B. T., Bacharier, L., Calhoun, W. J., Castro, M., Chung, K. F., Clark, M. P., Dweik, R. A., Fitzpatrick, A. M., Gaston, B., Hew, M., Hussain, I., Jarjour, N. N., Israel, E., Levy, B. D., Murphy, J. R., Peters, S. P., Teague, W. G., Meyers, D. A., Busse, W. W. and Wenzel, S. E. (2007) ‘Characterization of the severe asthma phenotype by the National Heart, Lung, and Blood Institute’s Severe Asthma Research Program’, Journal of Allergy and Clinical Immunology, 119(2), pp. 405–413. doi: 10.1016/j.jaci.2006.11.639.

Ni, Z. H., Tang, J. H., Cai, Z. Y., Yang, W., Zhang, L., Chen, Q., Zhang, L. and Wang, X. B. (2011) ‘A new pathway of glucocorticoid action for asthma treatment through the regulation of PTEN expression’, Respiratory Research. BioMed Central Ltd, 12(1), p. 47. doi: 10.1186/1465-9921-12-47.

Olafsdottir, I. S., Gislason, T., Thjodleifsson, B., Olafsson, I., Gislason, D., Jögi, R. and Janson, C. (2005) ‘C reactive protein levels are increased in non-allergic but not allergic asthma: a multicentre epidemiological study.’, Thorax, 60(6), pp. 451–4. doi: 10.1136/thx.2004.035774.

Olnes, M. J., Kotliarov, Y., Biancotto, A., Cheung, F., Chen, J., Shi, R., Zhou, H., Wang, E., Tsang, J. S. and Nussenblatt, R. (2016) ‘Effects of Systemically Administered Hydrocortisone on the Human Immunome’, Scientific Reports, 6(1), p. 23002. doi: 10.1038/srep23002.

Oti, M., Snel, B., Huynen, M. A. and Brunner, H. G. (2006) ‘Predicting disease genes using protein-protein interactions.’, Journal of medical genetics, 43(8), pp. 691–8. doi: 10.1136/jmg.2006.041376.

Ozyilmaz, E., Canbakan, S., Capan, N., Erturk, A. and Gulhan, M. (2009) ‘Correlation of plasma transforming growth factor beta 1 with asthma control test’, Allergy and Asthma Proceedings, 30(1), pp. 35–40. doi: 10.2500/aap.2009.30.3192.

Packard, K. A. and Khan, M. M. (2003) ‘Effects of histamine on Th1/Th2 cytokine balance’, International Immunopharmacology, 3(7), pp. 909–920. doi: 10.1016/S1567-5769(02)00235-7.

Palmer, C. N. A., Doney, A. S. F., Ismail, T., Lee, S. P., Murrie, I., Macgregor, D. F. and Mukhopadhyay, S. (2007) ‘PPARG locus haplotype variation and exacerbations in asthma’, Clinical Pharmacology and Therapeutics, 81(5), pp. 713–718. doi: 10.1038/sj.clpt.6100119.

Park, S. J., Min, K. H. and Lee, Y. C. (2008) ‘Phosphoinositide 3-kinase δ inhibitor as a novel therapeutic agent in asthma’, Respirology, 13(6), pp. 764–771. doi: 10.1111/j.1440-1843.2008.01369.x.

Park, S. M., Chang, H. S., Rhim, T., Park, S. W., Jang, A. S., Park, J. S., Uh, S. T., Na, J. O., Hwang, H. G., Kim, Y. H., Lee, M. Y., Chung, I. Y., Park, B. L., Shin, H. D. and Park, C. S. (2010) ‘Association of IKBA gene polymorphisms with the development of asthma’, Human Immunology. Elsevier Inc., 71(11), pp. 1147–1153. doi: 10.1016/j.humimm.2010.07.002.

Parman, C., Halling, C. and Gentleman, R. (2016) ‘affyQCReport: QC Report Generation for affyBatch objects’. R package version 1.52.0.

Pe’er, I., Yelensky, R., Altshuler, D. and Daly, M. J. (2008) ‘Estimation of the multiple testing burden for genomewide association studies of nearly all common variants’, Genetic Epidemiology, 32(4), pp. 381–385. doi: 10.1002/gepi.20303.

Perišić, T., Srećković, M. and Matić, G. (2007) ‘An imbalance in antioxidant enzymes and stress proteins in childhood asthma’, Clinical Biochemistry, 40(15), pp. 1168–1171. doi: 10.1016/j.clinbiochem.2007.06.006.

Phipson, B., Lee, S., Majewski, I. J., Alexander, W. S. and Smyth, G. K. (2016) ‘Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression’, Annals of Applied Statistics, 10(2), pp. 946–963. doi: 10.1214/16-AOAS920.

Polosa, R., Puddicombe, S. M., Krishna, M. T., Tuck, A. B., Howarth, P. H., Holgate, S. T. and Davies, D. E. (2002) ‘Expression of c-erbB receptors and ligands in the bronchial epithelium of asthmatic subjects’, Journal of Allergy and Clinical Immunology, 109(1), pp. 75–81. doi: 10.1067/mai.2002.120274.

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., Maller, J., Sklar, P., de Bakker, P. I. W., Daly, M. J. and Sham, P. C. (2007) ‘PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses’, The American Journal of Human Genetics, 81(3), pp. 559–575. doi: 10.1086/519795.

Qi, X., Gurung, P., Malireddi, R. K. S., Karmaus, P. W. F., Sharma, D., Vogel, P., Chi, H., Green, D. R. and Kanneganti, T. D. (2017) ‘Critical role of caspase-8-mediated IL-1 signaling in promoting Th2 responses during asthma pathogenesis’, Mucosal Immunology, 10(1), pp. 128–138. doi: 10.1038/mi.2016.25.

Qu, Y.-L., Liu, J., Zhang, L.-X., Wu, C.-M., Chu, A.-J., Wen, B.-L., Ma, C., Yan, X., Zhang, X., Wang, D.-M., Lv, X. and Hou, S.-J. (2017) ‘Asthma and the risk of lung cancer: a meta-analysis’, Oncotarget, 8(7). doi: 10.18632/oncotarget.14595.

Remans, P. H. J., Gringhuis, S. I., van Laar, J. M., Sanders, M. E., Papendrecht-van der Voort, E. A. M., Zwartkruis, F. J. T., Levarht, E. W. N., Rosas, M., Coffer, P. J., Breedveld, F. C., Bos, J. L., Tak, P. P., Verweij, C. L. and Reedquist, K. A. (2004) ‘Rap1 Signaling Is Required for Suppression of Ras-Generated Reactive Oxygen Species and Protection Against Oxidative Stress in T Lymphocytes’, The Journal of Immunology, 173(2), pp. 920–931. doi: 10.4049/jimmunol.173.2.920.

Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W. and Smyth, G. K. (2015) ‘Limma powers differential expression analyses for RNA-sequencing and microarray studies’, Nucleic Acids Research, 43(7), p. e47. doi: 10.1093/nar/gkv007.

Rosenberger, A., Bickeböller, H., McCormack, V., Brenner, D. R., Duell, E. J., Tjønneland, A., Friis, S., Muscat, J. E., Yang, P., Wichmann, H. E., Heinrich, J., Szeszenia-Dabrowska, N., Lissowska, J., Zaridze, D., Rudnai, P., Fabianova, E., Janout, V., Bencko, V., Brennan, P., Mates, D., Schwartz, A. G., Cote, M. L., Zhang, Z. F., Morgenstern, H., Oh, S. S., Field, J. K., Raji, O., McLaughlin, J. R., Wiencke, J., LeMarchand, L., Neri, M., Bonassi, S., Andrew, A. S., Lan, Q., Hu, W., Orlow, I., Park, B. J., Boffetta, P. and Hung, R. J. (2012) ‘Asthma and lung cancer risk: A systematic investigation by the international lung cancer consortium’, Carcinogenesis, 33(3), pp. 587–597. doi: 10.1093/carcin/bgr307.

Sato, R., Tomita, K., Sano, H., Ichihashi, H., Yamagata, S., Sano, A., Yamagata, T., Miyara, T., Iwanaga, T., Muraki, M. and Tohda, Y. (2009) ‘The Strategy for Predicting Future Exacerbation of Asthma Using a Combination of the Asthma Control Test and Lung Function Test’, Journal of Asthma, 46(7), pp. 677–682. doi: 10.1080/02770900902972160.

Sean, D. and Meltzer, P. S. (2007) ‘GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor’, Bioinformatics, 23(14), pp. 1846–1847. doi: 10.1093/bioinformatics/btm254.

Shannon, J., Ernst, P., Yamauchi, Y., Olivenstein, R., Lemiere, C., Foley, S., Cicora, L., Ludwig, M., Hamid, Q. and Martin, J. G. (2008) ‘Differences in airway cytokine profile in severe asthma compared to moderate asthma’, Chest. The American College of Chest Physicians, 133(2), pp. 420–426. doi: 10.1378/chest.07-1881.

Sharma, A., Menche, J., Huang, C. C., Ort, T., Zhou, X., Kitsak, M., Sahni, N., Thibault, D., Voung, L., Guo, F., Ghiassian, S. D., Gulbahce, N., Baribaud, F., Tocker, J., Dobrin, R., Barnathan, E., Liu, H., Panettieri, R. A., Tantisira, K. G., Qiu, W., Raby, B. A., Silverman, E. K., Vidal, M., Weiss, S. T. and Barabási, A.-L. (2015) ‘A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma’, Human Molecular Genetics, 24(11), pp. 3005–3020. doi: 10.1093/hmg/ddv001.

Sharma, S., Rajan, U. M., Kumar, A., Soni, A. and Ghosh, B. (2005) ‘A novel (TG)n(GA)m repeat polymorphism 254 bp downstream of the mast cell chymase (CMA1) gene is associated with atopic asthma and total serum IgE levels’, Journal of Human Genetics, 50(6), pp. 276–282. doi: 10.1007/s10038-005-0252-x.

Shaw, D. E., Berry, M. A., Hargadon, B., McKenna, S., Shelley, M. J., Green, R. H., Brightling, C. E., Wardlaw, A. J. and Pavord, I. D. (2007) ‘Association between neutrophilic airway inflammation and airflow limitation in adults with asthma’, Chest, 132(6), pp. 1871–1875. doi: 10.1378/chest.07-1047.

Shaw, D. E., Sousa, A. R., Fowler, S. J., Fleming, L. J., Roberts, G., Corfield, J., Pandis, I., Bansal, A. T., Bel, E. H., Auffray, C., Compton, C. H., Bisgaard, H., Bucchioni, E., Caruso, M., Chanez, P., Dahlén, B., Dahlen, S. E., Dyson, K., Frey, U., Geiser, T., De Verdier, M. G., Gibeon, D., Guo, Y. K., Hashimoto, S., Hedlin, G., Jeyasingham, E., Hekking, P. P. W., Higenbottam, T., Horváth, I., Knox, A. J., Krug, N., Erpenbeck, V. J., Larsson, L. X., Lazarinis, N., Matthews, J. G., Middelveld, R., Montuschi, P., Musial, J., Myles, D., Pahus, L., Sandström, T., Seibold, W., Singer, F., Strandberg, K., Vestbo, J., Vissing, N., Von Garnier, C., Adcock, I. M., Wagers, S., Rowe, A., Howarth, P., Wagener, A. H., Djukanovic, R., Sterk, P. J. and Chung, K. F. (2015) ‘Clinical and inflammatory characteristics of the European U-BIOPRED adult severe asthma cohort’, European Respiratory Journal, 46(5), pp. 1308–1321. doi: 10.1183/13993003.00779-2015.

Skloot, G. S. (2016) ‘Asthma phenotypes and endotypes’, Current Opinion in Pulmonary Medicine, 22(1), pp. 3–9. doi: 10.1097/MCP.0000000000000225.

Sokolowska, M., Borowiec, M., Ptasinska, A., Cieslak, M., Shelhamer, J. H., Kowalski, M. L. and Pawliczak, R. (2007) ‘85-kDa cytosolic phospholipase A2 group IValpha gene promoter polymorphisms in patients with severe asthma: a gene expression and case-control study.’, Clinical and experimental immunology, 150(1), pp. 124–131. doi: 10.1111/j.1365-2249.2007.03459.x.

Song, G. G. and Lee, Y. H. (2013) ‘Pathway analysis of genome-wide association study on asthma’, Human Immunology. American Society for Histocompatibility and Immunogenetics, 74(2), pp. 256–260. doi: 10.1016/j.humimm.2012.11.003.

Sun, X., Vilar, S. and Tatonetti, N. P. (2013) ‘High-Throughput Methods for Combinatorial Drug Discovery’, Science Translational Medicine, 5(205), p. 205rv1-205rv1. doi: 10.1126/scitranslmed.3006667.

Suzuki, R. and Shimodaira, H. (2015) ‘pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling.’ R package version 2.0-0. Available at: http://www.sigmath.es.osaka-u.ac.jp/shimo-lab/prog/pvclust/.

Szczepankiewicz, A., Brȩborowicz, A., Skibińska, M., Wiłkość, M., Tomaszewska, M. and Hauser, J. (2007) ‘Association analysis of tyrosine kinase FYN gene polymorphisms in asthmatic children’, International Archives of Allergy and Immunology, 145(1), pp. 43–47. doi: 10.1159/000107465.

Szklarczyk, D., Morris, J. H., Cook, H., Kuhn, M., Wyder, S., Simonovic, M., Santos, A., Doncheva, N. T., Roth, A., Bork, P., Jensen, L. J. and Von Mering, C. (2017) ‘The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible’, Nucleic Acids Research, 45(D1), pp. D362–D368. doi: 10.1093/nar/gkw937.

Takahashi, E., Onda, K., Hirano, T., Oka, K., Maruoka, N., Tsuyuguchi, M., Matsumura, Y., Niitsuma, T. and Hayashi, T. (2002) ‘Expression of c-fos, rather than c-jun or glucocorticoid-receptor mRNA, correlates with decreased glucocorticoid response of peripheral blood mononuclear cells in asthma’, International Immunopharmacology, 2(10), pp. 1419–1427. doi: 10.1016/S1567-5769(02)00083-8.

Takeda, M., Ito, W., Tanabe, M., Ueki, S., Kihara, J., Kato, H., Tanigai, T., Kayaba, H., Sasaki, T. and Chihara, J. (2010) ‘The pathophysiological roles of PI3Ks and therapeutic potential of selective inhibitors in allergic inflammation’, International Archives of Allergy and Immunology, 152 (Suppl. 1), pp. 90–95. doi: 10.1159/000312132.

Tao, B., Ruan, G., Wang, D., Li, Y., Wang, Z. and Yin, G. (2015) ‘Imbalance of peripheral Th17 and regulatory T cells in children with allergic rhinitis and bronchial asthma’, Iranian Journal of Allergy, Asthma and Immunology, 14(3), pp. 273–279.

Taylor, I. W., Linding, R., Warde-Farley, D., Liu, Y., Pesquita, C., Faria, D., Bull, S., Pawson, T., Morris, Q. and Wrana, J. L. (2009) ‘Dynamic modularity in protein interaction networks predicts breast cancer outcome.’, Nature biotechnology, 27(2), pp. 199–204. doi: 10.1038/nbt.1522.

Thun, G. A., Imboden, M., Berger, W., Rochat, T. and Probst-Hensch, N. M. (2013) ‘The association of a variant in the cell cycle control gene CCND1 and obesity on the development of asthma in the Swiss SAPALDIA study’, Journal of Asthma, 50(2), pp. 147–154. doi: 10.3109/02770903.2012.757776.

Tillmann, T., Gibson, A. R., Scott, G., Harrison, O., Dominiczak, A. and Hanlon, P. (2015) ‘Systems medicine 2.0: Potential benefits of combining electronic health care records with systems science models’, Journal of Medical Internet Research, 17(3), p. e64. doi: 10.2196/jmir.3082.

Turner, S. (2017) ‘qqman: Q-Q and Manhattan Plots for GWAS Data.’ R package version 0.1.4.

U-BIOPRED EB (2010) Charter for the Ethics Board of U-BIOPRED. Available at: http://www.europeanlung.org/assets/microsites/ubiopred/files/ethics-board-charter-and-criteria.pdf.

U-BIOPRED PIP (2016) A short guide to successful patient involvement in EU-funded research. Available at: http://www.imi.europa.eu/sites/default/files/archive/uploads/PatientWorkshop2016/UBIOPRED_guide_2016.pdf.

Vlaic, S., Tokarski-Schnelle, C., Gustafsson, M., Dahmen, U., Guthke, R. and Schuster, S. (2017) ‘ModuleDiscoverer: Identification of regulatory modules in protein-protein interaction networks.’, bioRxiv, p. 119099. doi: 10.1101/119099.

Walker, J. K. L., Fong, A. M., Lawson, B. L., Savov, J. D., Patel, D. D., Schwartz, D. A. and Lefkowitz, R. J. (2003) ‘Arrestin-2 regulates the development of allergic asthma’, Journal of Clinical Investigation, 112(4), pp. 566–574. doi: 10.1172/JCI200317265.

Wang, L. J., Hao, L., Li, H. T., Lu, L. G., Bian, H., Liu, M. and Song, L. L. (2012) ‘Expressions of OB-R, IRF-1 and GR-beta in airway smooth muscle cells of obese rats with asthma’, Chinese journal of cellular and molecular immunology, 28(10), pp. 1037–1040.

Wang, X., Saito, J., Ishida, T. and Munakata, M. (2006) ‘Polymorphism of egfr intron1 is associated with susceptibility and severity of asthma’, Journal of Asthma, 43(9), pp. 711–715. doi: 10.1080/02770900600925247.

Warnes, G. R., Bolker, B., Bonebakker, L., Gentleman, R., Huber, W., Liaw, A., Lumley, T., Maechler, M., Magnusson, A., Moeller, S., Schwartz, M. and Venables, B. (2016) ‘gplots: Various R Programming Tools for Plotting Data.’ R package version 3.0.1.

Watts, D. J. and Strogatz, S. H. (1998) ‘Collective dynamics of “small-world” networks’, Nature, 393(6684), pp. 440–442. doi: 10.1038/30918.

Weiner, J. (2017) ‘pca3d: Three Dimensional PCA Plots’. R package version 0.10. Available at: https://cran.r-project.org/web/packages/pca3d/vignettes/pca3d.pdf.

Wenzel, S. E. (2012) ‘Severe asthma: from characteristics to phenotypes to endotypes’, Clinical & Experimental Allergy, 42(5), pp. 650–658. doi: 10.1111/j.1365-2222.2011.03929.x.

Wenzel, S. E. (2006) ‘Asthma: defining of the persistent adult phenotypes’, Lancet, 368(9537), pp. 804–813. doi: 10.1016/S0140-6736(06)69290-8.

Wickham, H. and Chang, W. (2016) ‘ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics’. R package version 2.2.1.

Wolkenhauer, O., Auffray, C., Brass, O., Clairambault, J., Deutsch, A., Drasdo, D., Gervasio, F., Preziosi, L., Maini, P., Marciniak-Czochra, A., Kossow, C., Kuepfer, L., Rateitschak, K., Ramis-Conde, I., Ribba, B., Schuppert, A., Smallwood, R., Stamatakos, G., Winter, F. and Byrne, H. (2014) ‘Enabling multiscale modeling in systems medicine’, Genome Medicine, 6(3), p. 21. doi: 10.1186/gm538.

Wong, W. S. F. (2005) ‘Inhibitors of the tyrosine kinase signaling cascade for asthma’, Current Opinion in Pharmacology, 5(3 SPEC. ISS.), pp. 264–271. doi: 10.1016/j.coph.2005.01.009.

Wong, W. S. F. and Leong, K. P. (2004) ‘Tyrosine kinase inhibitors: A new approach for asthma’, Biochimica et Biophysica Acta - Proteins and Proteomics, 1697(1–2), pp. 53–69. doi: 10.1016/j.bbapap.2003.11.013.

World Medical Association (2013) Declaration of Helsinki - Ethical Principles for Medical Research Involving Human Subjects, 64th WMA General Assembly. Fortaleza, Brazil. doi: 10.3917/jib.151.0124.

Wu, Z., Irizarry, R. A., Gentleman, R., Martinez-Murillo, F. and Spencer, F. (2004) ‘A Model-Based Background Adjustment for Oligonucleotide Expression Arrays’, Journal of the American Statistical Association, 99(468), pp. 909–917. doi: 10.1198/016214504000000683.

Xu, C. J., Söderhäll, C., Bustamante, M., Baïz, N., Gruzieva, O., Gehring, U., Mason, D., Chatzi, L., Basterrechea, M., Llop, S., Torrent, M., Forastiere, F., Fantini, M. P., Carlsen, K. C. L., Haahtela, T., Morin, A., Kerkhof, M., Merid, S. K., van Rijkom, B., Jankipersadsing, S. A., Bonder, M. J., Ballereau, S., Vermeulen, C. J., Aguirre-Gamboa, R., de Jongste, J. C., Smit, H. A., Kumar, A., Pershagen, G., Guerra, S., Garcia-Aymerich, J., Greco, D., Reinius, L., McEachan, R. R. C., Azad, R., Hovland, V., Mowinckel, P., Alenius, H., Fyhrquist, N., Lemonnier, N., Pellet, J., Auffray, C., van der Vlies, P., van Diemen, C. C., Li, Y., Wijmenga, C., Netea, M. G., Moffatt, M. F., Cookson, W. O. C. M., Anto, J. M., Bousquet, J., Laatikainen, T., Laprise, C., Carlsen, K. H., Gori, D., Porta, D., Iñiguez, C., Bilbao, J. R., Kogevinas, M., Wright, J., Brunekreef, B., Kere, J., Nawijn, M. C., Annesi-Maesano, I., Sunyer, J., Melén, E. and Koppelman, G. H. (2018) ‘DNA methylation in childhood asthma: An epigenome-wide meta-analysis’, The Lancet Respiratory Medicine, 2600(18), pp. 1–10. doi: 10.1016/S2213-2600(18)30052-3.

Xu, J. and Li, Y. (2006) ‘Discovering disease-genes by topological features in human protein-protein interaction network’, Bioinformatics, 22(22), pp. 2800–2805. doi: 10.1093/bioinformatics/btl467.

Yu, G., Wang, L.-G., Han, Y. and He, Q.-Y. (2012) ‘clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters’, OMICS: A Journal of Integrative Biology, 16(5), pp. 284–287. doi: 10.1089/omi.2011.0118.

Yu, H., Braun, P., Yildirim, M. A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., Hao, T., Rual, J.-F., Dricot, A., Vazquez, A., Murray, R. R., Simon, C., Tardivo, L., Tam, S., Svrzikapa, N., Fan, C., de Smet, A.-S., Motyl, A., Hudson, M. E., Park, J., Xin, X., Cusick, M. E., Moore, T., Boone, C., Snyder, M., Roth, F. P., Barabasi, A.-L., Tavernier, J., Hill, D. E. and Vidal, M. (2008) ‘High-Quality Binary Protein Interaction Map of the Yeast Interactome Network’, Science, 322(5898), pp. 104–110. doi: 10.1126/science.1158684.

Yuksel, H., Kose, C., Yilmaz, O., Ozbilgin, K., Degirmenci, P. B., Pinar, E. and Kirmaz, C. (2007) ‘Increased expression of tissue vascular endothelial growth factor and foetal liver kinase-1 receptor in seasonal allergic rhinitis and relevance to asthma component’, Clinical and Experimental Allergy, 37(8), pp. 1183–1188. doi: 10.1111/j.1365-2222.2007.02763.x.

Zhang, H., Gustafsson, M., Nestor, C., Chung, K. F. and Benson, M. (2014) ‘Targeted omics and systems medicine: Personalising care’, The Lancet Respiratory Medicine. Elsevier Ltd, 2(10), pp. 785–787. doi: 10.1016/S2213-2600(14)70188-2.

Zhang, W., Nie, Y., Chong, L., Cai, X., Zhang, H., Lin, B., Liang, Y. and Li, C. (2013) ‘PI3K and Notch signal pathways coordinately regulate the activation and proliferation of T lymphocytes in asthma’, Life Sciences. Elsevier Inc., 92(17–19), pp. 890–895. doi: 10.1016/j.lfs.2013.03.005.

Zhang, Y., Jing, Y., Qiao, J., Luan, B., Wang, X., Wang, L. and Song, Z. (2017) ‘Activation of the mTOR signaling pathway is required for asthma onset’, Scientific Reports. Springer US, 7(1), pp. 1–13. doi: 10.1038/s41598-017-04826-y.

Zhu, G., Carlsen, K., Carlsen, K. H., Lenney, W., Silverman, M., Whyte, M. K., Hosking, L., Helms, P., Roses, A. D., Hay, D. W., Barnes, M. R., Anderson, W. H. and Pillai, S. G. (2008) ‘Polymorphisms in the endothelin-1 (EDN1) are associated with asthma in two populations’, Genes and Immunity, 9(1), pp. 23–29. doi: 10.1038/sj.gene.6364441.


Appendices

Appendix A

Table A1 | List of principal R packages used throughout the project and their specific application, ordered by appearance in the project workflow.

R package and version | Repository | Application | Reference
Biobase 2.34.0 | Bioconductor | Provides basic functions and objects needed for other Bioconductor R packages. | Huber et al. (2015)
GEOquery 2.40.0 | Bioconductor | Download and parsing of GEO SOFT format files for U-BIOPRED microarray gene expression datasets. | Sean and Meltzer (2007)
affycomp 1.50.0 | Bioconductor | Provides functions for the comparison of expression measures for Affymetrix oligonucleotide arrays. | Irizarry and Cawley (2017)
affy 1.52.0 | Bioconductor | Pre-processing and quality analysis for Affymetrix microarray gene expression datasets. | Gautier et al. (2004)
affyQCReport 1.52.0 | Bioconductor | Production of quality control reports for AffyBatch objects from the affyQC workflow. | Parman, Halling and Gentleman (2016)
pca3d 0.10 | CRAN | Creation and 3D representation of principal component analysis plots. Used during the covariate relevance testing prior to the design of linear models. | Weiner (2017)
FactoMineR 1.40 | CRAN | Multiple factor analysis of categorical or quantitative clinical covariates of U-BIOPRED individuals. | Husson, Le and Pages (2017)
PCAmixdata 3.1 | CRAN | Multiple factor analysis of the combined categorical and quantitative clinical covariates of U-BIOPRED individuals. | Chavent et al. (2017)
genefilter 1.56.0 | Bioconductor | Gene filtering from microarray data via the selection of the top 1 % genes of the normalized expression matrix. | Gentleman et al. (2017)
mlbench 2.1-1 | CRAN | Production of generalized linear models to further examine the significance of U-BIOPRED clinical covariates in asthma. | Leisch and Dimitriadou (2012)
limma 3.30.13 | Bioconductor | Production of linear models for the identification of differentially expressed genes from microarray data. | Ritchie et al. (2015)
qvalue 2.6.0 | Bioconductor | Calculation of q-values and false discovery rates from p-values obtained from simultaneous testing of multiple hypotheses. Used for DEG significance estimation prior to linear modeling. | Bass JDSwcfAJ, Dabney and Robinson (2015)
gplots 3.0.1 | CRAN | Generation of clinical covariates and gene expression heat maps with the heatmap.2 function for sample categorization prior to linear modeling. Visualization of adjacency matrices for consensus modules. | Warnes et al. (2016)
d3heatmap 0.6.1.2 | CRAN | Creation of interactive heat maps. Used for the exploration of covariate significance, and for the visualization of adjacency matrices for consensus modules and core asthma modules. | Cheng et al. (2018)
pvclust 2.0-0 | CRAN | Multiscale bootstrap resampling for hierarchical cluster uncertainty analysis. Used to determine optimal dendrogram height cutoffs for asthmatic or control cluster identification, based on approximately unbiased (AU) p-values and bootstrap probability (BP). | Suzuki and Shimodaira (2015)
mclust 5.4 | CRAN | Fitting of Gaussian finite mixture models with model-based clustering algorithms. Used for the determination of the optimal number of clusters for the classification of DEG, sample, and module genes. | Fraley et al. (2017)
org.Hs.eg.db 3.4.0 | Bioconductor | Gene annotation. Identifier mapping and conversion for DEGs, and network genes and gene products (Gene Symbol, Entrez ID, Ensembl ID, protein names). | Carlson (2017)
annotate 1.52.1 | Bioconductor | Microarray data annotation. Provides experiment level annotation resources and functions to support gene mapping strategies from other Bioconductor R packages. | Gentleman (2018)
oligo 1.38.0 | Bioconductor | Analysis of oligonucleotide arrays. Used in the preparation of MODifieR method-specific input files. Provides functions to support identifier conversion and production of gene expression p-value matrices in the particular format required by each MODifieR method. | Carvalho and Irizarry (2010)
MODifieR 0.0.1 | GitLab | Identification and refinement of robust gene expression modules from U-BIOPRED microarray data. The MODifieR methods used in the project were MCODE, Clique Sum, and DIAMOnD. | Badam et al. (2017)
factoextra 1.0.5 | CRAN | Unsupervised clustering of distance measures of MODifieR module genes from U-BIOPRED individuals. | Kassambara and Mundt (2017)
clusterProfiler 3.2.14 | Bioconductor | Statistical analysis and visualization of functional profiles for module and core module genes. Gene Ontology, KEGG and Disease Ontology over-representation tests. | Yu et al. (2012)
qqman 0.1.4 | CRAN | Production of Q-Q and Manhattan plots for U-BIOPRED GWAS data from PLINK output. Identification of significant and suggestive asthma-associated SNPs. | Turner (2017)
rsnps 0.2.0 | CRAN | SNP annotation. Conversion of U-BIOPRED and GABRIEL project GWAS rsID numbers to SNP-associated gene identifiers. Discovery of SNPs in LD for the GWAS-module genes overlap analysis. | Chamberlain, Ushey and Zhu (2016)
ggplot2 2.2.1 | CRAN | Generation of boxplot and scatter plot overlays for the GWAS-module genes overlap analysis. | Wickham and Chang (2016)


Appendix B

Table B1 | Baseline clinical features of patients with severe or moderate asthma, and healthy controls, from the microarray gene expression profiling in blood (GEO accession GSE69683, n = 498).

Feature | Healthy | Moderate asthma | Severe asthma
N | 87 | 77 | 334
Female | 34 (39.1) | 37 (48.1) | 204 (61.1)
Age (years) | 39 ± 14 | 42 ± 16 | 52 ± 14
BMI (kg/m²) | 25.2 ± 3.4 | 25.8 ± 4.5 | 29.2 ± 6.2
Smokers | 0 | 0 | 88 (26.3)
Total IgE (IU/ml) | 26.8 (9.0 - 68.1), n = 84 | 102.0 (53.1 - 244.0), n = 74 | 119.5 (48.0 - 323.2), n = 322
CRP (mg/l) | 1.0 (1.0 - 3.0), n = 81 | 2.3 (1.0 - 5.0), n = 75 | 4.0 (1.6 - 6.0), n = 320
Blood eosinophils × 10³/μl | 0.1 (0.1 - 0.2), n = 87 | 0.2 (0.1 - 0.3), n = 77 | 0.2 (0.1 - 0.4), n = 325
Blood neutrophils × 10³/μl | 3.0 (2.4 - 3.9), n = 87 | 3.3 (2.7 - 4.5), n = 77 | 4.7 (3.6 - 6.4), n = 325
Periostin (ng/ml) | 49.2 (43.4 - 56.8), n = 79 | 50.7 (40.6 - 54.9), n = 62 | 48.2 (40.0 - 59.3), n = 271
Sputum eosinophils % | 0 (0 - 0.2), n = 35 | 0.9 (0.3 - 3.4), n = 38 | 2.9 (0.9 - 16.3), n = 146
Sputum neutrophils % | 36.5 (21.0 - 52.7), n = 35 | 38.8 (24.0 - 59.1), n = 38 | 51.5 (34.2 - 71.1), n = 146
Sputum macrophages % | 61.7 (43.6 - 76.7), n = 35 | 44.6 (36.5 - 68.0), n = 38 | 29.8 (14.3 - 50.1), n = 147
Sputum lymphocytes % | 1.3 (0.5 - 2.1), n = 35 | 1.9 (0.9 - 2.9), n = 38 | 1.0 (0.4 - 1.8), n = 145
Sputum mast cells % | 0 (0 - 0), n = 35 | 0 (0 - 0), n = 38 | 0 (0 - 0), n = 145
FeNO (ppb) | 19 (14 - 27), n = 83 | 26 (18 - 45), n = 76 | 26 (15 - 46), n = 314
FEV1 % | 102.6 (94.1 - 110.3), n = 87 | 92.0 (77.4 - 102.0), n = 76 | 66.5 (51.3 - 81.9), n = 333
Eczema | 5 (5.7) | 20 (26) | 109 (32.6)
Allergic rhinitis | 5 (5.7) | 38 (49.4) | 164 (49.1)
Atopy | 30 (34.5) | 62 (80.5) | 220 (65.9)
Sinusitis | 2 (2.3) | 12 (15.6) | 104 (31.1)
Nasal polyps | 3 (3.4) | 6 (7.8) | 98 (29.3)
Oral corticosteroids use | 0 | 0 | 132 (39.5)

BMI: Body Mass Index; IgE: immunoglobulin E; CRP: C-reactive protein; FeNO: Fractional Exhaled Nitric Oxide; FEV1: Forced Expiratory Volume in 1 second. Data are presented as n, mean ± SD, n (%) or median (interquartile range).


Table B2 | Baseline clinical features of patients with severe or moderate asthma, and healthy controls, from the microarray gene expression profiling in induced sputum (GEO accession GSE76262, n = 139).

Feature | Healthy | Moderate asthma | Severe asthma
N | 21 | 25 | 93
Female | 6 (28.6) | 13 (52.0) | 57 (61.3)
Age (years) | 38 ± 14 | 42 ± 15 | 53 ± 12
BMI (kg/m²) | 26.4 ± 3.2 | 26.2 ± 4.8 | 28.8 ± 6.2
Smokers | 0 | 0 | 0
Total IgE (IU/ml) | 32.0 (13.0 - 82.5), n = 21 | 85.6 (46.0 - 165.0), n = 25 | 113.0 (48.0 - 254.0), n = 89
CRP (mg/l) | 1.1 (1.0 - 2.1), n = 20 | 1.6 (1.0 - 3.0), n = 24 | 3.0 (1.1 - 7.2), n = 90
Blood eosinophils × 10³/μl | [missing], n = 21 | 0.2 (0.1 - 0.3), n = 25 | 0.3 (0.1 - 0.5), n = 90
Blood neutrophils × 10³/μl | [missing], n = 21 | 3.3 (3.0 - 4.0), n = 25 | 5.0 (3.9 - 6.8), n = 90
Periostin (ng/ml) | 46.3 (43.3 - 50.3), n = 19 | 47.8 (37.5 - 53.8), n = 21 | 49.2 (40.9 - 59.7), n = 77
Sputum eosinophils % | 0 (0 - 0.2), n = 21 | 0.7 (0.2 - 1.5), n = 25 | 3.9 (0.5 - 18.4), n = 92
Sputum neutrophils % | [missing], n = 21 | 41.1 (22.1 - 63.6), n = 25 | 58.1 (35.0 - 79.1), n = 92
Sputum macrophages % | [missing], n = 21 | 53.3 (35.8 - 68.1), n = 25 | 24.6 (11.4 - 39.9), n = 93
Sputum lymphocytes % | [missing], n = 21 | 1.4 (0.8 - 2.6), n = 25 | 0.9 (0.4 - 2.0), n = 92
Sputum mast cells % | [missing], n = 21 | 0 (0 - 0), n = 25 | 0 (0 - 0), n = 92
FeNO (ppb) | 16 (14 - 26), n = 19 | 27 (19 - 43), n = 24 | 27 (15 - 47), n = 90
FEV1 % | 105.4 (98.4 - 112.4), n = 21 | 91.5 (87.0 - 101.7), n = 25 | 62.9 (51.6 - 79.1), n = 93
Eczema | 1 (4.8) | 8 (32.0) | 33 (35.5)
Allergic rhinitis | 2 (9.5) | 13 (52.0) | 34 (36.6)
Atopy | 6 (28.6) | 22 (88.0) | 59 (63.4)
Sinusitis | 0 | 4 (16.0) | 30 (32.3)
Nasal polyps | 2 (9.5) | 4 (16.0) | 34 (36.6)
Oral corticosteroids use | 0 | 0 | 44 (47.3)



Table B3 | Baseline clinical features of patients with severe or moderate asthma, from the microarray gene expression profiling of bronchial biopsy samples (GEO accession GSE76225, n = 91).

Feature | Moderate asthma | Severe asthma
N | 35 | 56
Female | 19 (54.3) | 30 (53.6)
Age (years) | 41 ± 13 | 51 ± 12
BMI (kg/m²) | 26.6 ± 4.7 | 29.5 ± 5.6
Smokers | 0 | 0
Total IgE (IU/ml) | 87.0 (44.0 - 166.5), n = 35 | 119.0 (43.0 - 400.0), n = 55
CRP (mg/l) | 1.0 (1.0 - 4.5), n = 32 | 2.7 (1.0 - 5.0), n = 55
Blood eosinophils × 10³/μl | [missing], n = 35 | 0.2 (0.1 - 0.3), n = 56
Blood neutrophils × 10³/μl | [missing], n = 35 | 4.9 (3.7 - 6.6), n = 56
Periostin (ng/ml) | 43.8 (38.2 - 50.8), n = 31 | 46.0 (40.9 - 53.2), n = 48
Sputum eosinophils % | 0.6 (0 - 1.4), n = 15 | 2.8 (0.6 - 13.8), n = 29
Sputum neutrophils % | [missing], n = 15 | 52.1 (37.5 - 63.5), n = 29
Sputum macrophages % | [missing], n = 15 | 36.2 (16.6 - 53.6), n = 29
Sputum lymphocytes % | [missing], n = 15 | 0.7 (0.2 - 1.4), n = 29
Sputum mast cells % | [missing], n = 15 | 0 (0 - 0), n = 29
FeNO (ppb) | 21 (16 - 50), n = 35 | 26 (16 - 46), n = 51
FEV1 % | 92.3 (72.3 - 103.1), n = 35 | 69.5 (52.3 - 84.6), n = 56
Eczema | 10 (28.6) | 26 (46.4)
Allergic rhinitis | 16 (45.7) | 27 (48.2)
Atopy | 27 (77.1) | 38 (67.9)
Sinusitis | 5 (14.3) | 16 (28.6)
Nasal polyps | 2 (5.7) | 21 (37.5)
Oral corticosteroids use | 0 | 24 (42.9)



Table B4 | Baseline clinical features of patients with severe or moderate asthma, from the microarray gene expression profiling of epithelial brushing samples (GEO accession GSE76226, n = 99).


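Feature | Moderate asthma | Severe asthma
N | 36 | 63
Female | 21 (58.3) | 31 (49.2)
Age (years) | 39 ± 14 | 49 ± 13
BMI (kg/m²) | 26.4 ± 4.7 | 30.2 ± 6.0
Smokers | 0 | 0
Total IgE (IU/ml) | 102.0 (44.0 - 172.8), n = 36 | 124.0 (39.0 - 402.5), n = 62
CRP (mg/l) | 1.0 (1.0 - 4.8), n = 34 | 3.0 (1.0 - 5.0), n = 61
Blood eosinophils × 10³/μl | [missing], n = 36 | 0.2 (0.1 - 0.3), n = 63
Blood neutrophils × 10³/μl | [missing], n = 36 | 5.0 (3.6 - 6.6), n = 63
Periostin (ng/ml) | 45.7 (39.6 - 52.8), n = 32 | 46.1 (40.4 - 53.5), n = 55
Sputum eosinophils % | [missing], n = 15 | 1.6 (0.5 - 16.0), n = 28
Sputum neutrophils % | [missing], n = 15 | 49.6 (36.9 - 70.1), n = 28
Sputum macrophages % | [missing], n = 15 | 31.0 (12.9 - 54.3), n = 28
Sputum lymphocytes % | [missing], n = 15 | 0.7 (0.3 - 1.3), n = 28
Sputum mast cells % | [missing], n = 15 | 0 (0 - 0), n = 28
FeNO (ppb) | 21 (16 - 56), n = 36 | 27 (16 - 52), n = 58
FEV1 % | 95.1 (82.9 - 104.2), n = 36 | 72.2 (56.3 - 88.7), n = 63
Eczema | 10 (27.8) | 30 (47.6)
Allergic rhinitis | [missing] | [missing]
Atopy | 27 (75.0) | 43 (68.3)
Sinusitis | 5 (13.9) | 18 (28.6)
Nasal polyps | 2 (5.6) | 23 (36.5)
Oral corticosteroids use | [missing] | [missing]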




Table B5 | Baseline clinical features of patients with severe or moderate asthma, from the microarray gene expression profiling of bronchial biopsy and epithelial brushing samples (GEO accession GSE76227, n = 190).


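Feature | Moderate asthma | Severe asthma
N | 40* | 69*
Female | 21 (52.5) | 31 (44.9)
Age (years) | 40 ± 13 | 49 ± 13
BMI (kg/m²) | 26.3 ± 4.8 | 30.1 ± 5.9
Smokers | 0 | 0
Total IgE (IU/ml) | 102.0 (44.0 - 181.8), n = 40 | 127.5 (39.0 - 410.0), n = 68
CRP (mg/l) | 1.0 (1.0 - 4.3), n = 37 | 3.0 (1.0 - 5.5), n = 67
Blood eosinophils × 10³/μl | [missing], n = 40 | 0.2 (0.1 - 0.3), n = 69
Blood neutrophils × 10³/μl | [missing], n = 40 | 5.0 (3.6 - 6.6), n = 69
Periostin (ng/ml) | 45.7 (39.6 - 52.8), n = 36 | 45.4 (39.9 - 53.2), n = 61
Sputum eosinophils % | [missing], n = 17 | 2.3 (0.6 - 12.6), n = 34
Sputum neutrophils % | [missing], n = 17 | 51.4 (37.3 - 68.2), n = 34
Sputum macrophages % | [missing], n = 17 | 34.9 (16.4 - 55.4), n = 34
Sputum lymphocytes % | [missing], n = 17 | 0.7 (0.4 - 1.4), n = 34
Sputum mast cells % | 0 (0 - 0), n = 17 | 0 (0 - 0), n = 34
FeNO (ppb) | 22 (17 - 58), n = 40 | 27 (16 - 50), n = 63
FEV1 % | 93.8 (74.5 - 103.6), n = 40 | 70.3 (55.4 - 88.3), n = 69
Eczema | 10 (25.0) | 30 (43.5)
Allergic rhinitis | [missing] | [missing]
Atopy | 27 (67.5) | 43 (62.3)
Sinusitis | 5 (12.5) | 18 (26.1)
Nasal polyps | 2 (5.0) | 23 (33.3)
Oral corticosteroids use | [missing] | [missing]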



*Differences between the number of samples in the gene expression profiling (n = 190) and the number of patients in the study (n = 109) are due to certain patients providing both epithelial brushing and bronchial biopsy samples.
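The relation between sample and patient counts described in this footnote can be checked with simple inclusion-exclusion arithmetic. The sketch below is illustrative only. It assumes that the 190 samples of GSE76227 are the bronchial biopsy samples of Table B3 plus the epithelial brushing samples of Table B4, and that each patient contributed at most one sample of each type. The variable and function names are hypothetical.

```python
# Counts copied from Tables B3, B4 and B5 of this appendix.
biopsy_samples = {"moderate": 35, "severe": 56}    # Table B3 (GSE76225)
brushing_samples = {"moderate": 36, "severe": 63}  # Table B4 (GSE76226)
patients = {"moderate": 40, "severe": 69}          # Table B5 (GSE76227)


def patients_with_both(group):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.

    Assumes at most one biopsy and one brushing sample per patient.
    """
    return biopsy_samples[group] + brushing_samples[group] - patients[group]


total_samples = sum(biopsy_samples.values()) + sum(brushing_samples.values())
total_patients = sum(patients.values())
both = {group: patients_with_both(group) for group in patients}

print(total_samples, total_patients, both)
```

Under these assumptions the totals agree with the footnote (190 samples, 109 patients), and the difference between them is the number of patients who provided both sample types.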


Appendix C

Table C1 | Results of the GLM for sample category classification (severe asthmatics versus moderate asthmatics and controls) on U-BIOPRED participants with a complete phenotypic set (n = 88). Data were split into training and validation groups (80:20). The glm R function computes Wald test statistics for model variables (z value is the Wald statistic for H0: regression coefficient = 0; Pr(>|z|) is the tail area in a 2-tail test). The threshold for significant relevance of a parameter in the model was p ≤ 0.05.

Variable | Estimate | Std. Error | z value | Pr(>|z|)
(Intercept) | 3.91 × 10³ | 6.82 × 10⁶ | 0.00 | 1.00
Age | 6.95 | 1.20 × 10³ | 0.01 | 1.00
BMI (kg/m²) | 30.79 | 5.38 × 10³ | 0.01 | 1.00
Total IgE (IU/ml) | 0.39 | 66.42 | 0.01 | 1.00
CRP (mg/l) | -11.62 | 2.84 × 10³ | 0.00 | 1.00
Blood eosinophils × 10³/μl | 0.00 | 0.91 | 0.00 | 1.00
Blood neutrophils × 10³/μl | 0.00 | 0.05 | 0.00 | 1.00
Periostin (ng/ml) | 7.00 | 1.40 × 10³ | 0.01 | 1.00
Sputum eosinophils % | -40.21 | 6.85 × 10⁴ | 0.00 | 1.00
Sputum neutrophils % | -42.60 | 6.85 × 10⁴ | 0.00 | 1.00
Sputum macrophages % | -44.46 | 6.85 × 10⁴ | 0.00 | 1.00
Sputum lymphocytes % | -33.34 | 6.91 × 10⁴ | 0.00 | 1.00
Sputum mast cells % | 488.34 | 1.44 × 10⁵ | 0.00 | 1.00
FeNO (ppb) | -0.68 | 349.25 | 0.00 | 1.00
FEV1 % | -11.85 | 1.92 × 10³ | -0.01 | 1.00
Eczema | -36.62 | 5.70 × 10⁴ | 0.00 | 1.00
Allergic rhinitis | 386.67 | 6.81 × 10⁴ | 0.01 | 1.00
Atopy | -184.64 | 4.55 × 10⁴ | 0.00 | 1.00
Sinusitis | 16.79 | 8.46 × 10⁴ | 0.00 | 1.00
Nasal polyps | 46.35 | 5.32 × 10⁴ | 0.00 | 1.00

BMI: Body Mass Index; IgE: immunoglobulin E; CRP: C-reactive protein; FeNO: Fractional Exhaled Nitric Oxide; FEV1: Forced Expiratory Volume in 1 second.
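As a worked illustration of how the z value and Pr(>|z|) columns of Table C1 relate to the estimates and standard errors, the sketch below recomputes the Wald statistic and its two-tailed p-value for three rows of the table. It is a minimal Python illustration under the standard normal approximation, not the R code used in the project, and the function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-tailed p-value.

    z = estimate / std_error tests H0: regression coefficient = 0.
    For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# (estimate, standard error) pairs copied from Table C1.
rows = {
    "Age": (6.95, 1.20e3),
    "BMI": (30.79, 5.38e3),
    "Total IgE": (0.39, 66.42),
}

results = {name: wald_test(est, se) for name, (est, se) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, Pr(>|z|) = {p:.2f}")
```

Because every standard error in the table is orders of magnitude larger than its estimate, each z value is close to zero and each two-tailed p-value rounds to 1.00.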


Appendix D

Table D1 | Top 50 significantly up-regulated gene-associated probes in severe asthmatic samples in comparison with healthy controls, from the microarray gene expression profiling in blood (GEO accession GSE69683). P-value adjustment for multiple comparisons was done with the BH procedure for FDR control (α = 0.05).

Affymetrix Probe ID Gene Product log2 FC adj. p-value

207269_PM_at DEFA4 defensin alpha 4 1.44 8.27 x 10-10

212768_PM_at OLFM4 olfactomedin 4 1.33 2.60 x 10-7

206676_PM_at CEACAM8 carcinoembryonic antigen related cell adhesion molecule 8

1.29 3.85 x 10-9

202018_PM_s_at LTF lactotransferrin 1.21 4.98 x 10-9

231688_PM_at MMP8 matrix metallopeptidase 8 1.18 6.22 x 10-9

206851_PM_at RNASE3 ribonuclease A family member 3 1.08 5.27 x 10-9

205557_PM_at BPI bactericidal/permeability-increasing protein

1.05 6.09 x 10-9

212531_PM_at LCN2 lipocalin 2 1.04 1.74 x 10-8

207802_PM_at CRISP3 cysteine rich secretory protein 3 1.04 6.72 x 10-9

211657_PM_at CEACAM6 carcinoembryonic antigen related cell adhesion molecule 6

1.02 7.40 x 10-9

206177_PM_s_at ARG1 arginase 1 0.96 6.04 x 10-7

210549_PM_s_at CCL23 C-C motif chemokine ligand 23 0.92 5.96 x 10-6

203757_PM_s_at CEACAM6 carcinoembryonic antigen related cell adhesion molecule 6

0.92 2.58 x 10-8

206697_PM_s_at HP haptoglobin 0.88 1.16 x 10-9

209369_PM_at ANXA3 annexin A3 0.83 3.25 x 10-10

208650_PM_s_at CD24 CD24 molecule 0.82 1.01 x 10-9

205033_PM_s_at DEFA1B / DEFA3 / DEFA1

defensin alpha 1B / defensin alpha 3 / defensin alpha 1

0.81 3.44 x 10-10

205653_PM_at CTSG cathepsin G 0.81 3.90 x 10-7

203021_PM_at SLPI secretory leukocyte peptidase inhibitor

0.81 6.23 x 10-11

210548_PM_at CCL23 C-C motif chemokine ligand 23 0.78 9.54 x 10-6

210254_PM_at MS4A3 membrane spanning 4-domains A3 0.76 2.71 x 10-6

207329_PM_at MMP8 matrix metallopeptidase 8 0.76 7.58 x 10-7

206111_PM_at RNASE2 ribonuclease A family member 2 0.76 3.03 x 10-9

208470_PM_s_at HPR / HP haptoglobin-related protein / haptoglobin

0.75 6.09 x 10-9

210244_PM_at CAMP cathelicidin antimicrobial peptide 0.74 1.74 x 10-9

205041_PM_s_at ORM2 / ORM1 orosomucoid 2 / orosomucoid 1 0.74 3.20 x 10-5

205040_PM_at ORM1 orosomucoid 1 0.72 3.93 x 10-5

219669_PM_at CD177 CD177 molecule 0.70 7.25 x 10-4

45

1552773_PM_at CLEC4D C-type lectin domain family 4 member D

0.67 4.41 x 10-5

231029_PM_at F5 coagulation factor V 0.65 9.44 x 10-12

232034_PM_at LINC00537 long intergenic non-protein coding RNA 537

0.64 7.43 x 10-10

202110_PM_at COX7B cytochrome c oxidase subunit 7B 0.64 5.04 x 10-5

211517_PM_s_at IL5RA interleukin 5 receptor subunit alpha 0.62 8.66 x 10-5

1554892_PM_a_at MS4A3 membrane spanning 4-domains A3 0.62 1.01 x 10-5

1552772_PM_at CLEC4D C-type lectin domain family 4 member D

0.62 4.58 x 10-6

205513_PM_at TCN1 transcobalamin 1 0.61 2.58 x 10-7

206171_PM_at ADORA3 adenosine A3 receptor 0.61 4.07 x 10-7

230720_PM_at RNF182 ring finger protein 182 0.61 1.88 x 10-2

237340_PM_at SLC26A8 solute carrier family 26 member 8 0.60 1.25 x 10-6

231662_PM_at ARG1 arginase 1 0.60 2.80 x 10-5

206343_PM_s_at NRG1 neuregulin 1 0.60 2.20 x 10-4

214523_PM_at CEBPE CCAAT/enhancer binding protein epsilon

0.59 2.92 x 10-8

205681_PM_at BCL2A1 BCL2 related protein A1 0.59 2.05 x 10-5

204774_PM_at EVI2A ecotropic viral integration site 2A 0.58 1.47 x 10-4

238439_PM_at ANKRD22 ankyrin repeat domain 22 0.57 4.76 x 10-5

205863_PM_at S100A12 S100 calcium binding protein A12 0.57 4.33 x 10-8

1552348_PM_at PRSS33 protease, serine 33 0.57 3.13 x 10-3

222945_PM_x_at OLAH oleoyl-ACP hydrolase 0.57 1.09 x 10-4

235568_PM_at MCEMP1 mast cell expressed membrane protein 1 0.57 4.05 x 10-9

237056_PM_at INSC inscuteable homolog 0.56 1.49 x 10-6

Table D2 | Top 50 significantly down-regulated gene-associated probes in severe asthmatic samples in comparison with healthy controls, from the microarray gene expression profiling in blood (GEO accession GSE69683). P-value adjustment for multiple comparisons was done with the BH procedure for FDR control (α = 0.05).


233261_PM_at EBF1 early B-cell factor 1 -0.85 3.26 x 10-9

217979_PM_at TSPAN13 tetraspanin 13 -0.80 1.35 x 10-9

237625_PM_s_at IGK / IGKC immunoglobulin kappa locus / immunoglobulin kappa constant -0.79 1.03 x 10-8

231798_PM_at NOG noggin -0.76 1.42 x 10-7

39318_PM_at TCL1A T-cell leukemia/lymphoma 1A -0.75 9.83 x 10-6

209995_PM_s_at TCL1A T-cell leukemia/lymphoma 1A -0.73 1.05 x 10-5

209374_PM_s_at IGHM immunoglobulin heavy constant mu -0.61 1.17 x 10-6

230877_PM_at IGHD immunoglobulin heavy constant delta -0.61 1.81 x 10-4

213920_PM_at CUX2 cut like homeobox 2 -0.60 1.84 x 10-9

234474_PM_x_at IL6ST interleukin 6 signal transducer -0.59 2.17 x 10-10

227198_PM_at AFF3 AF4/FMR2 family member 3 -0.58 1.66 x 10-6

202760_PM_s_at AKAP2 A-kinase anchoring protein 2 -0.58 1.38 x 10-7

227646_PM_at EBF1 early B-cell factor 1 -0.57 2.07 x 10-6

214180_PM_at MAN1C1 mannosidase alpha class 1C member 1 -0.56 6.09 x 10-9

233252_PM_s_at STRBP spermatid perinuclear RNA binding protein -0.56 1.76 x 10-8

209167_PM_at GPM6B glycoprotein M6B -0.56 9.44 x 10-12

212827_PM_at IGHM immunoglobulin heavy constant mu -0.55 2.66 x 10-5

209841_PM_s_at LRRN3 leucine rich repeat neuronal 3 -0.54 8.32 x 10-4

234967_PM_at IL6ST interleukin 6 signal transducer -0.54 6.69 x 10-9

207655_PM_s_at BLNK B-cell linker -0.54 3.60 x 10-7

236796_PM_at BACH2 BTB domain and CNC homolog 2 -0.53 1.01 x 10-7

217418_PM_x_at MS4A1 membrane spanning 4-domains A1 -0.52 7.80 x 10-6

228613_PM_at RAB11FIP3 RAB11 family interacting protein 3 -0.52 6.52 x 10-10

218856_PM_at TNFRSF21 TNF receptor superfamily member 21 -0.52 4.27 x 10-9

243968_PM_x_at FCRL1 Fc receptor like 1 -0.52 2.68 x 10-4

212382_PM_at TCF4 transcription factor 4 -0.52 3.30 x 10-11

222073_PM_at COL4A3 collagen type IV alpha 3 chain -0.52 3.56 x 10-6

1552343_PM_s_at PDE7A phosphodiesterase 7A -0.52 2.28 x 10-11

1560861_PM_at THRA1/BTR uncharacterized LOC105371807 -0.50 9.28 x 10-7

210356_PM_x_at MS4A1 membrane spanning 4-domains A1 -0.50 6.56 x 10-6

226122_PM_at PLEKHG1 pleckstrin homology and RhoGEF domain containing G1 -0.50 4.31 x 10-5

1559078_PM_at BCL11A B-cell CLL/lymphoma 11A -0.50 6.11 x 10-9

1565818_PM_s_at IKZF1 IKAROS family zinc finger 1 -0.49 1.03 x 10-8

244011_PM_at PPM1K protein phosphatase, Mg2+/Mn2+ dependent 1K -0.49 5.81 x 10-7

1555779_PM_a_at CD79A CD79a molecule -0.49 1.24 x 10-5

218975_PM_at COL5A3 collagen type V alpha 3 chain -0.48 5.98 x 10-4

205267_PM_at POU2AF1 POU class 2 associating factor 1 -0.48 1.50 x 10-6

1561035_PM_at LOC100288282 hypothetical protein LOC100288282 -0.48 5.82 x 10-6

232164_PM_s_at EPPK1 epiplakin 1 -0.48 1.57 x 10-6

209136_PM_s_at USP10 ubiquitin specific peptidase 10 -0.48 2.89 x 10-6

217422_PM_s_at CD22 CD22 molecule -0.47 6.47 x 10-5

230983_PM_at FAM129C family with sequence similarity 129 member C -0.47 8.06 x 10-7


-0.47 2.04 x 10-5

230245_PM_s_at LINC00926 long intergenic non-protein coding RNA 926 -0.47 1.83 x 10-4

204439_PM_at IFI44L interferon induced protein 44 like -0.47 6.59 x 10-3

238070_PM_at CHD1L chromodomain helicase DNA binding protein 1 like -0.47 4.98 x 10-8

206983_PM_at CCR6 C-C motif chemokine receptor 6 -0.47 2.30 x 10-8

235401_PM_s_at FCRLA Fc receptor like A -0.47 2.46 x 10-4

219024_PM_at PLEKHA1 pleckstrin homology domain containing A1 -0.47 1.49 x 10-6

235982_PM_at FCRL1 Fc receptor like 1 -0.47 3.52 x 10-4
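The BH adjustment cited in the captions of Tables D2 to D4 can be illustrated with a minimal Python sketch. The function name `bh_adjust` and the example p-values are assumptions made for illustration only; they are not taken from the thesis pipeline or from the tables.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative raw p-values (hypothetical, not from the study).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = bh_adjust(raw)
# A probe is called significant when its adjusted p-value is at most alpha = 0.05.
significant = [p <= 0.05 for p in adj]
```

In this toy input, the third and fourth p-values are below 0.05 before adjustment but not after it, which is the behavior FDR control is meant to provide.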

Table D3 | Top 50 significantly up-regulated gene-associated probes in severe asthmatic samples in comparison with healthy controls, from the microarray gene expression profiling in induced sputum (GEO accession GSE76262). P-value adjustment for multiple comparisons was done with the BH procedure for FDR control (α = 0.05).


242809_PM_at IL1RL1 interleukin 1 receptor like 1 2.89 3.77 x 10-5

206207_PM_at CLC Charcot-Leyden crystal galectin 2.55 4.84 x 10-5

206618_PM_at IL18R1 interleukin 18 receptor 1 2.44 1.40 x 10-6

207526_PM_s_at IL1RL1 interleukin 1 receptor like 1 2.43 7.98 x 10-5

1552348_PM_at PRSS33 protease, serine 33 1.93 9.21 x 10-4

219159_PM_s_at SLAMF7 SLAM family member 7 1.82 2.17 x 10-4

211372_PM_s_at IL1R2 interleukin 1 receptor type 2 1.71 5.85 x 10-5


1.70 8.24 x 10-4

211527_PM_x_at VEGFA vascular endothelial growth factor A 1.63 4.68 x 10-5

266_PM_s_at CD24 CD24 molecule 1.63 1.64 x 10-3

222802_PM_at EDN1 endothelin 1 1.59 8.42 x 10-5

1568894_PM_at LOC101929007 WAS/WASL-interacting protein family member 1 1.53 2.97 x 10-3

206637_PM_at P2RY14 purinergic receptor P2Y14 1.49 2.93 x 10-3


207072_PM_at IL18RAP interleukin 18 receptor accessory protein 1.48 1.39 x 10-3

226905_PM_at FAM101B family with sequence similarity 101 member B 1.48 1.05 x 10-3

202948_PM_at IL1R1 interleukin 1 receptor type 1 1.47 2.77 x 10-4

205403_PM_at IL1R2 interleukin 1 receptor type 2 1.47 1.36 x 10-4

224965_PM_at GNG2 G protein subunit gamma 2 1.46 3.68 x 10-4

232213_PM_at PELI1 pellino E3 ubiquitin protein ligase 1 1.45 7.07 x 10-5

1553723_PM_at ADGRG3 adhesion G protein-coupled receptor G3 1.45 3.54 x 10-4

211269_PM_s_at IL2RA interleukin 2 receptor subunit alpha 1.45 2.03 x 10-4

1568830_PM_at IRAK3 interleukin 1 receptor associated kinase 3 1.44 2.31 x 10-4

210513_PM_s_at VEGFA vascular endothelial growth factor A 1.44 9.59 x 10-5

210512_PM_s_at VEGFA vascular endothelial growth factor A 1.43 1.97 x 10-4

211839_PM_s_at CSF1 colony stimulating factor 1 1.42 9.40 x 10-5

1569599_PM_at SAMSN1 SAM domain, SH3 domain and nuclear localization signals 1 1.42 4.05 x 10-4


1.42 1.78 x 10-3

242814_PM_at SERPINB9 serpin family B member 9 1.41 2.26 x 10-3

221385_PM_s_at FFAR3 free fatty acid receptor 3 1.41 8.48 x 10-4

204951_PM_at RHOH ras homolog family member H 1.40 1.91 x 10-4

213036_PM_x_at ATP2A3 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 3 1.40 1.08 x 10-4

216015_PM_s_at NLRP3 NLR family pyrin domain containing 3 1.39 8.61 x 10-4

216016_PM_at NLRP3 NLR family pyrin domain containing 3 1.36 1.90 x 10-4


217187_PM_at LOC100293983 similar to mucin 1.35 1.77 x 10-2

207651_PM_at GPR171 G protein-coupled receptor 171 1.35 3.98 x 10-4

224964_PM_s_at GNG2 G protein subunit gamma 2 1.34 1.10 x 10-3

209771_PM_x_at CD24 CD24 molecule 1.34 5.74 x 10-3

205624_PM_at CPA3 carboxypeptidase A3 1.34 1.21 x 10-4

202643_PM_s_at TNFAIP3 TNF alpha induced protein 3 1.34 4.62 x 10-4

228758_PM_at BCL6 B-cell CLL/lymphoma 6 1.33 4.79 x 10-5

217678_PM_at SLC7A11 solute carrier family 7 member 11 1.33 2.52 x 10-3

215561_PM_s_at IL1R1 interleukin 1 receptor type 1 1.32 9.09 x 10-4

210664_PM_s_at TFPI tissue factor pathway inhibitor 1.32 1.36 x 10-4

220404_PM_at ADGRG3 adhesion G protein-coupled receptor G3 1.31 3.53 x 10-4

207522_PM_s_at ATP2A3 ATPase sarcoplasmic/endoplasmic reticulum Ca2+ transporting 3 1.31 2.51 x 10-4

221920_PM_s_at SLC25A37 solute carrier family 25 member 37 1.30 2.65 x 10-3

208651_PM_x_at CD24 CD24 molecule 1.29 7.09 x 10-3

232094_PM_at KATNBL1 katanin regulatory subunit B1 like 1 1.29 2.18 x 10-2

Table D4 | Top 50 significantly down-regulated gene-associated probes in severe asthmatic samples in comparison with healthy controls, from the microarray gene expression profiling in induced sputum (GEO accession GSE76262). P-value adjustment for multiple comparisons was done with the BH procedure for FDR control (α = 0.05).


200832_PM_s_at SCD stearoyl-CoA desaturase -1.85 1.36 x 10-4


227761_PM_at MYO5A myosin VA -1.74 4.79 x 10-5

211162_PM_x_at SCD stearoyl-CoA desaturase -1.74 2.40 x 10-4


1569069_PM_s_at TDRD3 tudor domain containing 3 -1.71 2.06 x 10-3

204517_PM_at PPIC peptidylprolyl isomerase C -1.71 1.08 x 10-3

229954_PM_at CHDH choline dehydrogenase -1.67 2.08 x 10-4

209512_PM_at HSDL2 hydroxysteroid dehydrogenase like 2 -1.67 3.36 x 10-4

235978_PM_at FABP4 fatty acid binding protein 4 -1.66 9.34 x 10-3

226806_PM_s_at NFIA nuclear factor I A -1.62 1.08 x 10-4

220120_PM_s_at EPB41L4A erythrocyte membrane protein band 4.1 like 4A -1.62 5.16 x 10-4

219525_PM_at SLC47A1 solute carrier family 47 member 1 -1.61 1.22 x 10-4

204894_PM_s_at AOC3 amine oxidase, copper containing 3 -1.61 9.21 x 10-4

1556697_PM_at GPRIN3 GPRIN family member 3 -1.61 3.32 x 10-4

1554004_PM_a_at ARHGEF28 Rho guanine nucleotide exchange factor 28 -1.61 8.42 x 10-5

215570_PM_s_at ZNF780A / ZNF780B zinc finger protein 780A / zinc finger protein 780B -1.59 4.05 x 10-4

209616_PM_s_at CES1 carboxylesterase 1 -1.59 3.80 x 10-4

200665_PM_s_at SPARC secreted protein acidic and cysteine rich -1.58 2.77 x 10-4


201116_PM_s_at CPE carboxypeptidase E -1.54 1.93 x 10-4

231192_PM_at LPAR3 lysophosphatidic acid receptor 3 -1.54 1.09 x 10-4

224893_PM_at ATL3 atlastin GTPase 3 -1.53 1.31 x 10-4

203259_PM_s_at HDDC2 HD domain containing 2 -1.53 5.21 x 10-4

202786_PM_at STK39 serine/threonine kinase 39 -1.52 1.83 x 10-4

1569110_PM_x_at LOC728613 programmed cell death 6 pseudogene -1.51 4.09 x 10-3

201117_PM_s_at CPE carboxypeptidase E -1.50 8.04 x 10-4

214835_PM_s_at SUCLG2 succinate-CoA ligase GDP-forming beta subunit -1.50 6.81 x 10-5

219295_PM_s_at PCOLCE2 procollagen C-endopeptidase enhancer 2 -1.50 2.00 x 10-3

228728_PM_at CPED1 cadherin like and PC-esterase domain containing 1 -1.50 1.11 x 10-3

212459_PM_x_at SUCLG2 succinate-CoA ligase GDP-forming beta subunit -1.50 7.72 x 10-5

203335_PM_at PHYH phytanoyl-CoA 2-hydroxylase -1.49 3.51 x 10-4

1554003_PM_at ARHGEF28 Rho guanine nucleotide exchange factor 28 -1.49 1.49 x 10-3

214021_PM_x_at ITGB5 integrin subunit beta 5 -1.49 8.84 x 10-4

218356_PM_at MRM2 mitochondrial rRNA methyltransferase 2 -1.49 1.01 x 10-4

229070_PM_at ADTRP androgen dependent TFPI regulating protein -1.48 4.73 x 10-4

203386_PM_at TBC1D4 TBC1 domain family member 4 -1.48 1.73 x 10-4

218197_PM_s_at OXR1 oxidation resistance 1 -1.47 1.10 x 10-4


-1.47 1.15 x 10-4

210471_PM_s_at KCNAB1 potassium voltage-gated channel subfamily A member regulatory beta subunit 1 -1.46 1.22 x 10-4

229383_PM_at MARCH1 membrane associated ring-CH-type finger 1 -1.45 3.17 x 10-5

229498_PM_at MBNL3 muscleblind like splicing regulator 3 -1.44 2.30 x 10-5

205206_PM_at ANOS1 anosmin 1 -1.44 5.79 x 10-5

222714_PM_s_at LACTB2 lactamase beta 2 -1.43 4.95 x 10-4

227194_PM_at FAM3B family with sequence similarity 3 member B -1.43 1.69 x 10-4

212224_PM_at ALDH1A1 aldehyde dehydrogenase 1 family member A1 -1.43 2.95 x 10-4

234295_PM_at DBR1 debranching RNA lariats 1 -1.43 1.66 x 10-4

214770_PM_at MSR1 macrophage scavenger receptor 1 -1.42 1.34 x 10-3

234000_PM_s_at HACD3 3-hydroxyacyl-CoA dehydratase 3 -1.41 5.83 x 10-4

227040_PM_at NHLRC3 NHL repeat containing 3 -1.41 8.59 x 10-5

Appendix E

Fig. E1 | Manhattan plot of U-BIOPRED GWAS for atopy in the asthmatic group (n = 295), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Fig. E2 | Manhattan plot of U-BIOPRED GWAS for eczema in the asthmatic group (n = 295), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Labeled loci in Fig. E1: TRAF3IP1, VAT1L, NUDT12, FLJ16171. Labeled loci in Fig. E2: rs117293567, ZNF891, ZNF10. Horizontal axis in both plots: chromosomes 1 to 22, X, Y.

Fig. E3 | Manhattan plot of U-BIOPRED GWAS for allergic rhinitis in the asthmatic group (n = 295), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Fig. E4 | Manhattan plot of U-BIOPRED GWAS for moderate asthma (n = 65) versus controls (n = 70), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Labeled loci in Fig. E3: UNK, rs4482098, rs2911761, LOC105375643. Labeled locus in Fig. E4: IL17A. Horizontal axis in both plots: chromosomes 1 to 22, X, Y.

Fig. E5 | Manhattan plot of U-BIOPRED GWAS for severe asthma (n = 230) versus controls (n = 70), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Fig. E6 | Manhattan plot of U-BIOPRED GWAS for severe (n = 230) versus moderate asthma (n = 65), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Labeled loci in Fig. E5: GIN1, GADD45G, RNPS1, MAPK15, GMDS, LOC105375149, SIGLEC12, SIGLEC6. Labeled loci in Fig. E6: B3GALT1, PTPN2. Horizontal axis in both plots: chromosomes 1 to 22, X, Y.

Fig. E7 | Manhattan plot of the U-BIOPRED GWAS for severe and moderate asthma (n = 295) versus controls (n = 70), across chromosomes 1 (left) to Y. The red line represents the threshold for genome-wide significance (p ≤ 5 x 10-8); the blue line shows the threshold for suggestive significance (p ≤ 1 x 10-5).

Labeled loci in Fig. E7: GIN1, HDAC9, GADD45G, LINC00331, RNPS1, SIGLEC12, SIGLEC6. Horizontal axis: chromosomes 1 to 22, X, Y.

Fig. E8 | Normal Q-Q plots of the expected distribution of association test statistics (horizontal axis) across the U-BIOPRED GWAS SNPs compared to the experimentally observed p-values (vertical axis). (a) atopy group. (b) eczema group. (c) allergic rhinitis group. (d) moderate asthma vs. controls. (e) severe asthma vs. controls. (f) severe asthma vs. moderate asthma. (g) severe and moderate asthma vs. controls.
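The construction behind a GWAS Q-Q panel such as those in Fig. E8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the plotting positions (i - 0.5) / n for the expected uniform p-values, the function name `qq_points`, and the input values are all hypothetical and are not taken from the thesis.

```python
from math import log10


def qq_points(pvalues):
    """Pair expected and observed -log10(p) values for a Q-Q plot.

    Expected values assume p-values are uniform on (0, 1) under the null,
    using the plotting position (i - 0.5) / n for the i-th smallest p-value.
    """
    n = len(pvalues)
    observed = sorted(pvalues)
    return [
        (-log10((i - 0.5) / n), -log10(p))
        for i, p in enumerate(observed, start=1)
    ]


# Hypothetical p-values for three SNPs (not from the study).
points = qq_points([0.05, 0.5, 0.001])
```

Points that lie above the diagonal at the upper end indicate SNPs whose observed association is stronger than expected under the null distribution.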

Table E9 | Summary of the SNPs identified for suggestive association to asthma, moderate asthma, severe asthma, and asthma comorbidities (Fisher's exact test; suggestive threshold: p ≤ 1 x 10-5; association threshold: p ≤ 5 x 10-8) from the U-BIOPRED GWAS (n = 610).

Comparison Probe ID SNP (rsID) p-value Locus Associated gene

Atopy vs. Controls

AX-33671783 rs62194948 3.79 x 10-6 2q37.3 TRAF3IP1

AX-11591081 rs6869606 2.95 x 10-6 5q21.1 FLJ16171

AX-11428050 rs294087 1.11 x 10-6 5q21.2 NUDT12

AX-13114292 rs56407011 6.89 x 10-6 16q23.1 VAT1L

Eczema vs. Controls

AX-35910739 rs117293567 7.43 x 10-7 6q14.3 -

AX-11656188 rs7959401 5.89 x 10-6 12q24.33 ZNF891

AX-16864123 rs2270928 6.66 x 10-6 12q24.33 ZNF10

Allergic rhinitis vs. Controls

AX-11426644 rs2911761 8.71 x 10-7 8q13.3 -

AX-11289480 rs16915813 4.65 x 10-6 8q22.1 LOC105375643

AX-11507428 rs4482098 9.57 x 10-6 12q21.31 -

AX-83023978 - 8.01 x 10-7 17q25.1 UNK

Mod. asthma vs. Controls

AFFX-SNP-002429 rs9357726 1.37 x 10-6 6p12.2 IL17A

AX-11682831 rs9357726 7.14 x 10-6 6p12.2 IL17A

Severe asthma vs. Controls

AX-94351343 rs139236310 1.66 x 10-7 5q21.1 GIN1

AFFX-KIT-000188 rs4538755 6.32 x 10-6 6p25.3 GMDS

AX-38342115 rs117655759 5.50 x 10-6 7p21.3 LOC105375149

AX-36530641 rs73718146 8.64 x 10-6 8q24.3 MAPK15

AX-37088245 rs34575488 4.79 x 10-7 9q22.2 GADD45G

AX-94380583 rs34308697 5.20 x 10-7 16p13.3 RNPS1

AX-56898073 rs73051357 6.16 x 10-6 19q13.41 SIGLEC12

AX-63500245 rs73067315 6.16 x 10-6 19q13.41 SIGLEC6

Mod. asthma vs. Severe asthma

AX-33381785 rs2029080 6.61 x 10-6 2q24.3 B3GALT1

AX-32381281 rs73404453 2.21 x 10-7 18p11.21 PTPN2

AX-13257605 rs10502415 7.42 x 10-6 18p11.21 PTPN2

Asthma vs. Controls

AX-94351343 rs139236310 5.45 x 10-6 5q21.1 GIN1

AX-36140731 rs12537699 6.06 x 10-6 7p21.1 HDAC9

AX-37088245 rs34575488 5.43 x 10-7 9q22.2 GADD45G

AX-17240625 rs9574294 7.02 x 10-6 13q31.1 LINC00331

AX-94380583 rs34308697 9.18 x 10-7 16p13.3 RNPS1

AX-56898073 rs73051357 1.12 x 10-6 19q13.41 SIGLEC12

AX-63500245 rs73067315 1.12 x 10-6 19q13.41 SIGLEC6
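The two ingredients named in the caption of Table E9, Fisher's exact test and the suggestive and genome-wide thresholds, can be sketched in Python as below. This is an illustration only: the 2 x 2 layout (minor and major allele counts in cases and controls), the function names, and the example counts are assumptions, since the table does not state how the contingency tables were built.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / denom
        for x in range(lo, hi + 1)
    }
    # Small relative tolerance so ties with the observed table are included.
    cutoff = probs[a] * (1 + 1e-9)
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def classify(p):
    """Label a p-value using the thresholds quoted in the caption."""
    if p <= 5e-8:
        return "genome-wide significant"
    if p <= 1e-5:
        return "suggestive"
    return "not significant"


# Hypothetical allele counts: (minor, major) in cases, then in controls.
p_value = fisher_exact_two_sided(40, 420, 2, 138)
label = classify(p_value)
```

Under these thresholds every p-value listed in Table E9 falls in the suggestive range, since none is at or below 5 x 10-8.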

Appendix F

Table F1 | Results of the analysis of MODifieR methods by the selection of the optimal module combinations based on PASCAL scoring, for expression-based modules and their paired U-BIOPRED GWAS profiles (n = 365). The number of cases in which a module combination was selected as the "optimal" (most significant) is displayed as its frequency (%) for asthmatics (n = 295), healthy controls (n = 70), or both. Samples for which none of the method combinations produced a PASCAL p-value meeting the significance threshold (p ≤ 0.05) were considered not significant.

MODifieR method combination Frequency (asthmatics) Frequency (controls) Frequency (total samples)

MCODE ∪ Clique Sum ∪ DIAMOnD 26.2 % 14.4 % 23.9 %

MCODE ∪ Clique Sum 13.0 % 10.1 % 12.4 %

MCODE ∪ DIAMOnD 11.1 % 5.9 % 10.1 %

MCODE 7.6 % 9.3 % 7.9 %

Clique Sum ∪ DIAMOnD 3.0 % 3.6 % 3.1 %

DIAMOnD 0.4 % 9.5 % 2.1 %

Clique Sum 1.7 % 0.0 % 1.4 %

Clique Sum ∩ DIAMOnD 1.0 % 1.2 % 1.0 %

MCODE ∩ Clique Sum 0.8 % 1.4 % 0.9 %

MCODE ∩ DIAMOnD 0.9 % 0.7 % 0.9 %

MCODE ∩ Clique Sum ∩ DIAMOnD 0.2 % 0.1 % 0.2 %

Not significant 34.1 % 43.8 % 36.0 %
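The eleven method combinations listed in Table F1 correspond to the single modules plus every union and intersection of two or three of them. A minimal Python sketch of that enumeration and of the per-sample selection rule follows. The gene sets, the p-values passed to `select_optimal`, and both function names are hypothetical; in the thesis the p-values come from PASCAL, which is not reproduced here.

```python
from itertools import combinations


def method_combinations(modules):
    """Return single modules plus all unions and intersections of 2+ modules."""
    combos = {name: set(genes) for name, genes in modules.items()}
    names = list(modules)
    for size in range(2, len(names) + 1):
        for group in combinations(names, size):
            sets = [modules[name] for name in group]
            combos[" ∪ ".join(group)] = set.union(*sets)
            combos[" ∩ ".join(group)] = set.intersection(*sets)
    return combos


def select_optimal(pvalues, alpha=0.05):
    """Pick the combination with the smallest p-value, if it meets alpha."""
    name, p = min(pvalues.items(), key=lambda item: item[1])
    return name if p <= alpha else "Not significant"


# Toy modules for one sample (gene membership is invented for illustration).
modules = {
    "MCODE": {"IL1RL1", "IL5RA", "CLC", "MMP8"},
    "Clique Sum": {"IL1RL1", "CLC", "VEGFA"},
    "DIAMOnD": {"IL1RL1", "IL5RA", "VEGFA", "NLRP3"},
}
combos = method_combinations(modules)
# Placeholder p-values standing in for PASCAL output.
best = select_optimal({"MCODE": 0.20, "MCODE ∪ DIAMOnD": 0.01})
```

Counting how often each combination is returned across samples would give frequencies of the kind reported in the table.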

Appendix G

Fig. G1 | Evaluation of the GWAS SNPs-module genes intersection across the MODifieR method combinations, divided by sample category. Every SNP in the U-BIOPRED and GABRIEL genome-wide studies below the p-value cutoff of 1 x 10-3 was included in the analysis. Each dot represents the score of the individual module for a sample in the expression profiling in blood dataset (GSE69683). The significance threshold was p ≤ 2.4 x 10-6 (dashed line).

Fig. G2 | Evaluation of the GWAS SNPs-module genes intersection across the MODifieR method combinations, divided by sample category. Every SNP in the U-BIOPRED and GABRIEL genome-wide studies below the p-value cutoff of 1 x 10-3 was included in the analysis. Each dot represents the score of the individual module for a sample in the expression profiling in induced sputum dataset (GSE76262). The significance threshold was p ≤ 2.4 x 10-6 (dashed line, not shown).

Legend for Figs. G1 (GSE69683) and G2 (GSE76262): MC, MCODE; CS, Clique Sum; D, DIAMOnD.

Fig. G3 | Evaluation of the GWAS SNPs-module genes intersection across the MODifieR method combinations, divided by sample category. Every SNP in the U-BIOPRED and GABRIEL genome-wide studies below the p-value cutoff of 1 x 10-3 was included in the analysis. Each dot represents the score of the individual module for a sample in the expression profiling of bronchial biopsy samples (GSE76225). The significance threshold was p ≤ 2.4 x 10-6 (dashed line).

Fig. G4 | Evaluation of the GWAS SNPs-module genes intersection across the MODifieR method combinations, divided by sample category. Every SNP in the U-BIOPRED and GABRIEL genome-wide studies below the p-value cutoff of 1 x 10-3 was included in the analysis. Each dot represents the score of the individual module for a sample in the expression profiling of epithelial brushing samples (GSE76226). The significance threshold was p ≤ 2.4 x 10-6 (dashed line, not shown).

Legend for Figs. G3 (GSE76225) and G4 (GSE76226): MC, MCODE; CS, Clique Sum; D, DIAMOnD.

Fig. G5 | Evaluation of the GWAS SNPs-module genes intersection across the MODifieR method combinations, divided by sample category. Every SNP in the U-BIOPRED and GABRIEL genome-wide studies below the p-value cutoff of 1 x 10-3 was included in the analysis. Each dot represents the score of the individual module for a sample in the expression profiling of bronchial biopsy and epithelial brushing samples (GSE76227). The significance threshold was p ≤ 2.4 x 10-6 (dashed line, not shown).

Legend for Fig. G5 (GSE76227): MC, MCODE; CS, Clique Sum; D, DIAMOnD.

Appendix H

Table H1 | Top 30 results for the Gene Ontology (GO) over-representation analysis for the core module "severe asthma versus moderate asthma and controls", of the MODifieR method combination with the highest PASCAL score for GWAS significance ("Clique Sum ∪ DIAMOnD ∩ MCODE", GSE69683). P-values were BH-corrected (significance threshold: adj. p ≤ 0.05).

GO ID Description Gene ratio adj. p-value

GO:0033674 positive regulation of kinase activity 87/413 2.77 x 10-46

GO:0045860 positive regulation of protein kinase activity 80/413 8.19 x 10-42

GO:0071900 regulation of protein serine/threonine kinase activity 73/413 4.49 x 10-35

GO:0002768 immune response-regulating cell surface receptor signaling pathway 65/413 2.35 x 10-34

GO:0043405 regulation of MAP kinase activity 61/413 8.38 x 10-34

GO:0043410 positive regulation of MAPK cascade 72/413 1.16 x 10-33

GO:0018108 peptidyl-tyrosine phosphorylation 62/413 5.47 x 10-32

GO:0018212 peptidyl-tyrosine modification 62/413 6.72 x 10-32

GO:0002764 immune response-regulating signaling pathway 71/413 8.99 x 10-32

GO:0038093 Fc receptor signaling pathway 48/413 2.33 x 10-30

GO:0002429 immune response-activating cell surface receptor signaling pathway 58/413 3.04 x 10-30

GO:0043406 positive regulation of MAP kinase activity 48/413 5.04 x 10-30

GO:0071417 cellular response to organonitrogen compound 67/413 5.40 x 10-30

GO:0071902 positive regulation of protein serine/threonine kinase activity 54/413 2.80 x 10-29

GO:0032147 activation of protein kinase activity 53/413 3.16 x 10-28

GO:0071407 cellular response to organic cyclic compound 67/413 4.57 x 10-28

GO:1901652 response to peptide 62/413 7.97 x 10-28

GO:0002757 immune response-activating signal transduction 64/413 9.02 x 10-28

GO:0048015 phosphatidylinositol-mediated signaling 44/413 5.33 x 10-27

GO:0014065 phosphatidylinositol 3-kinase signaling 40/413 9.21 x 10-27

GO:0048017 inositol lipid-mediated signaling 44/413 9.46 x 10-27

GO:0014066 regulation of phosphatidylinositol 3-kinase signaling 38/413 1.32 x 10-26

GO:1901653 cellular response to peptide 52/413 1.55 x 10-26

GO:0038127 ERBB signaling pathway 38/413 5.01 x 10-26

GO:0071375 cellular response to peptide hormone stimulus 50/413 7.21 x 10-26

GO:0043434 response to peptide hormone 57/413 1.26 x 10-25

GO:0038095 Fc-epsilon receptor signaling pathway 37/413 2.96 x 10-25

GO:0007173 epidermal growth factor receptor signaling pathway 33/413 1.02 x 10-22

GO:0050852 T cell receptor signaling pathway 37/413 1.11 x 10-22

GO:0090150 establishment of protein localization to membrane 51/413 1.66 x 10-22
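The over-representation test behind a row of Table H1 can be sketched in Python with a hypergeometric upper tail. The sketch assumes that a gene ratio such as 87/413 means 87 module genes annotated to the term out of 413 annotated module genes; the background sizes `K` and `N` in the example are hypothetical, because the table does not report them, and the function name `ora_pvalue` is likewise an assumption.

```python
from math import comb


def ora_pvalue(k, n, K, N):
    """Hypergeometric upper-tail p-value P(X >= k).

    k: module genes annotated to the term
    n: module genes with any annotation
    K: background genes annotated to the term
    N: background genes with any annotation
    """
    upper = min(n, K)
    favorable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return favorable / comb(N, n)


# Gene ratio 87/413 from the first row; K and N are placeholder values.
p_value = ora_pvalue(k=87, n=413, K=600, N=18000)
```

The raw p-values obtained this way for all tested terms would then be BH-corrected, as stated in the caption, before applying the adj. p ≤ 0.05 threshold.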

Table H2 | Top 30 results for the KEGG over-representation analysis for the core module "severe asthma versus moderate asthma and controls", of the MODifieR method combination with the highest PASCAL score for GWAS significance ("Clique Sum ∪ DIAMOnD ∩ MCODE", GSE69683). P-values were BH-corrected (significance threshold: adj. p ≤ 0.05).

KEGG ID Description Gene ratio adj. p-value

hsa05220 Chronic myeloid leukemia 42/366 2.44 x 10-32

hsa04012 ErbB signaling pathway 42/366 2.56 x 10-30

hsa04510 Focal adhesion 59/366 6.92 x 10-29

hsa01521 EGFR tyrosine kinase inhibitor resistance 39/366 1.39 x 10-28

hsa04015 Rap1 signaling pathway 59/366 3.40 x 10-28

hsa05215 Prostate cancer 42/366 4.33 x 10-28

hsa04014 Ras signaling pathway 61/366 4.50 x 10-27

hsa04660 T cell receptor signaling pathway 42/366 6.39 x 10-27

hsa04919 Thyroid hormone signaling pathway 44/366 1.05 x 10-26

hsa05212 Pancreatic cancer 36/366 3.54 x 10-26

hsa01522 Endocrine resistance 40/366 3.54 x 10-26

hsa04010 MAPK signaling pathway 66/366 1.16 x 10-25

hsa05161 Hepatitis B 46/366 2.32 x 10-24

hsa04722 Neurotrophin signaling pathway 42/366 3.51 x 10-24

hsa04151 PI3K-Akt signaling pathway 70/366 5.00 x 10-24

hsa05205 Proteoglycans in cancer 53/366 1.56 x 10-23

hsa04068 FoxO signaling pathway 42/366 3.16 x 10-22

hsa04062 Chemokine signaling pathway 49/366 4.72 x 10-22

hsa05167 Kaposi's sarcoma-associated herpesvirus infection 49/366 5.80 x 10-22

hsa05214 Glioma 31/366 3.46 x 10-21

hsa05166 HTLV-I infection 56/366 3.61 x 10-21

hsa04664 Fc epsilon RI signaling pathway 30/366 8.43 x 10-21

hsa05213 Endometrial cancer 28/366 8.90 x 10-21

hsa05211 Renal cell carcinoma 30/366 1.31 x 10-20

hsa05210 Colorectal cancer 33/366 1.32 x 10-20

hsa05223 Non-small cell lung cancer 29/366 4.06 x 10-20

hsa04926 Relaxin signaling pathway 39/366 6.87 x 10-20

hsa05203 Viral carcinogenesis 48/366 7.98 x 10-20

hsa04910 Insulin signaling pathway 40/366 8.07 x 10-20

hsa05226 Gastric cancer 41/366 2.14 x 10-19

Table H3 | Top 30 results for the KEGG over-representation analysis for the core module "severe asthma versus moderate asthma and controls", of the MODifieR method combination with the highest PASCAL score for GWAS significance ("Clique Sum ∪ DIAMOnD ∩ MCODE", GSE69683), for asthma, inflammation, or immune response-related pathways. P-values were BH-corrected (significance threshold: adj. p ≤ 0.05).

KEGG ID Description Gene ratio adj. p-value

hsa04015 Rap1 signaling pathway 59/366 3.40 x 10-28

hsa04660 T cell receptor signaling pathway 42/366 6.39 x 10-27

hsa04010 MAPK signaling pathway 66/366 1.16 x 10-25

hsa04151 PI3K-Akt signaling pathway 70/366 5.00 x 10-24

hsa04062 Chemokine signaling pathway 49/366 4.72 x 10-22

hsa04664 Fc epsilon RI signaling pathway 30/366 8.43 x 10-21

hsa04659 Th17 cell differentiation 32/366 2.27 x 10-16

hsa04662 B cell receptor signaling pathway 25/366 8.61 x 10-15

hsa04370 VEGF signaling pathway 23/366 7.42 x 10-15

hsa04630 Jak-STAT signaling pathway 36/366 5.84 x 10-14

hsa04072 Phospholipase D signaling pathway 34/366 5.94 x 10-14

hsa04658 Th1 and Th2 cell differentiation 26/366 7.27 x 10-13

hsa04611 Platelet activation 29/366 3.58 x 10-12

hsa04650 Natural killer cell mediated cytotoxicity 30/366 4.80 x 10-12

hsa04668 TNF signaling pathway 27/366 5.82 x 10-12

hsa04670 Leukocyte transendothelial migration 27/366 1.42 x 10-11

hsa04666 Fc gamma R-mediated phagocytosis 23/366 1.65 x 10-10

hsa04520 Adherens junction 20/366 4.77 x 10-10

hsa04024 cAMP signaling pathway 33/366 2.04 x 10-9

hsa04620 Toll-like receptor signaling pathway 23/366 3.16 x 10-9

hsa04150 mTOR signaling pathway 26/366 6.63 x 10-8

hsa04920 Adipocytokine signaling pathway 17/366 7.36 x 10-8

hsa04750 Inflammatory mediator regulation of TRP channels 20/366 1.11 x 10-7

hsa04371 Apelin signaling pathway 23/366 5.79 x 10-7

hsa04657 IL-17 signaling pathway 18/366 1.55 x 10-6

hsa04350 TGF-beta signaling pathway 16/366 7.85 x 10-6

hsa04530 Tight junction 22/366 7.49 x 10-5

hsa04310 Wnt signaling pathway 20/366 7.49 x 10-5

hsa04064 NF-kappa B signaling pathway 15/366 1.39 x 10-4

hsa04330 Notch signaling pathway 8/366 4.83 x 10-3

Table H4 | Top 20 significant results for the GO over-representation analysis for each of the network clusters 1-6 of the core module "severe asthma versus moderate asthma and controls" of the "Clique Sum ∪ DIAMOnD ∩ MCODE" method combination (GSE69683 dataset). P-values were BH-corrected (significance threshold: adj. p ≤ 0.05).

Cluster GO ID Description Gene ratio adj. p-value

1 GO:0000184 nuclear-transcribed mRNA catabolic process, nonsense-mediated decay 22/48 3.01 x 10-32

1 GO:0000956 nuclear-transcribed mRNA catabolic process 22/48 1.50 x 10-27

1 GO:0006402 mRNA catabolic process 22/48 5.40 x 10-27

1 GO:0006413 translational initiation 21/48 3.44 x 10-26

1 GO:0006613 cotranslational protein targeting to membrane 18/48 3.44 x 10-26

1 GO:0006401 RNA catabolic process 22/48 5.56 x 10-26

1 GO:0006614 SRP-dependent cotranslational protein targeting to membrane 17/48 7.48 x 10-25

1 GO:0045047 protein targeting to ER 17/48 2.07 x 10-24

1 GO:0072599 establishment of protein localization to endoplasmic reticulum 17/48 3.81 x 10-24

1 GO:0019080 viral gene expression 19/48 5.06 x 10-23

1 GO:0070972 protein localization to endoplasmic reticulum 17/48 7.79 x 10-23

1 GO:0034655 nucleobase-containing compound catabolic process 22/48 3.00 x 10-22

1 GO:0044033 multi-organism metabolic process 19/48 4.04 x 10-22

1 GO:0019083 viral transcription 18/48 7.24 x 10-22

1 GO:0006612 protein targeting to membrane 18/48 1.36 x 10-21

1 GO:0046700 heterocycle catabolic process 22/48 1.36 x 10-21

1 GO:0044270 cellular nitrogen compound catabolic process 22/48 1.88 x 10-21

1 GO:0019439 aromatic compound catabolic process 22/48 2.09 x 10-21

1 GO:1901361 organic cyclic compound catabolic process 22/48 6.63 x 10-21

1 GO:0006364 rRNA processing 17/48 2.11 x 10-17

2 GO:0042059 negative regulation of epidermal growth factor receptor signaling pathway 4/17 5.10 x 10-5

2 GO:1901185 negative regulation of ERBB signaling pathway 4/17 5.10 x 10-5

2 GO:0070371 ERK1 and ERK2 cascade 6/17 5.10 x 10-5

2 GO:0042058 regulation of epidermal growth factor receptor signaling pathway 4/17 3.54 x 10-4

2 GO:1901184 regulation of ERBB signaling pathway 4/17 3.54 x 10-4

2 GO:0043406 positive regulation of MAP kinase activity 5/17 3.54 x 10-4

2 GO:0023061 signal release 6/17 3.54 x 10-4

2 GO:0060627 regulation of vesicle-mediated transport 6/17 3.54 x 10-4

2 GO:0032386 regulation of intracellular transport 6/17 5.50 x 10-4

2 GO:0046887 positive regulation of hormone secretion 4/17 5.50 x 10-4

2 GO:0043410 positive regulation of MAPK cascade 6/17 5.50 x 10-4

2 GO:0045055 regulated exocytosis 5/17 6.14 x 10-4

2 GO:0032388 positive regulation of intracellular transport 5/17 6.14 x 10-4

2 GO:0007173 epidermal growth factor receptor signaling pathway 4/17 6.14 x 10-4

2 GO:0038128 ERBB2 signaling pathway 3/17 6.31 x 10-4

2 GO:0071902 positive regulation of protein serine/threonine kinase activity 5/17 6.31 x 10-4

2 GO:0032147 activation of protein kinase activity 5/17 6.31 x 10-4

2 GO:0000187 activation of MAPK activity 4/17 7.84 x 10-4

2 GO:0038127 ERBB signaling pathway 4/17 7.84 x 10-4

2 GO:0043405 regulation of MAP kinase activity 5/17 7.84 x 10-4

3 GO:0048015 phosphatidylinositol-mediated signaling 10/20 7.28 x 10-12

3 GO:0048017 inositol lipid-mediated signaling 10/20 7.28 x 10-12

3 GO:0071375 cellular response to peptide hormone stimulus 10/20 1.66 x 10-10

3 GO:1901653 cellular response to peptide 10/20 2.22 x 10-10

3 GO:0043410 positive regulation of MAPK cascade 11/20 3.50 x 10-10

3 GO:0033674 positive regulation of kinase activity 11/20 4.38 x 10-10

3 GO:0046777 protein autophosphorylation 9/20 4.46 x 10-10

3 GO:0014066 regulation of phosphatidylinositol 3-kinase signaling 8/20 4.98 x 10-10

3 GO:0014065 phosphatidylinositol 3-kinase signaling 8/20 1.05 x 10-9

3 GO:0043434 response to peptide hormone 10/20 1.05 x 10-9

3 GO:0038083 peptidyl-tyrosine autophosphorylation 6/20 1.38 x 10-9

3 GO:0048008 platelet-derived growth factor receptor signaling pathway 6/20 1.38 x 10-9

3 GO:1901652 response to peptide 10/20 1.90 x 10-9

3 GO:0032147 activation of protein kinase activity 9/20 2.23 x 10-9

3 GO:0045860 positive regulation of protein kinase activity 10/20 3.44 x 10-9

3 GO:0071417 cellular response to organonitrogen compound 10/20 3.51 x 10-9

3 GO:0043405 regulation of MAP kinase activity 9/20 3.94 x 10-9

3 GO:0018108 peptidyl-tyrosine phosphorylation 9/20 1.01 x 10-8

3 GO:0018212 peptidyl-tyrosine modification 9/20 1.01 x 10-8

3 GO:0070374 positive regulation of ERK1 and ERK2 cascade 7/20 5.64 x 10-8

4 GO:0038093 Fc receptor signaling pathway 14/78 1.83 x 10-9

4 GO:0002429 immune response-activating cell surface receptor signaling pathway 16/78 2.40 x 10-9

4 GO:0038095 Fc-epsilon receptor signaling pathway 12/78 2.40 x 10-9


4 GO:0002768 immune response-regulating cell surface receptor signaling pathway 16/78 4.21 x 10-9

4 GO:0050878 regulation of body fluid levels 18/78 4.21 x 10-9

4 GO:0030168 platelet activation 12/78 4.21 x 10-9

4 GO:0050852 T cell receptor signaling pathway 12/78 5.68 x 10-9

4 GO:0002757 immune response-activating signal transduction 17/78 7.82 x 10-9



4 GO:0007596 blood coagulation 15/78 1.50 x 10-8

4 GO:0050817 coagulation 15/78 1.50 x 10-8

4 GO:0007599 hemostasis 15/78 1.50 x 10-8

4 GO:0002764 immune response-regulating signaling pathway 17/78 1.50 x 10-8

4 GO:0044839 cell cycle G2/M phase transition 12/78 1.64 x 10-8

4 GO:0050851 antigen receptor-mediated signaling pathway 12/78 7.75 x 10-8

4 GO:0044770 cell cycle phase transition 16/78 7.94 x 10-8

4 GO:0051098 regulation of binding 13/78 1.17 x 10-7


5 GO:0045930 negative regulation of mitotic cell cycle 13/58 2.96 x 10-10

5 GO:0002429 immune response-activating cell surface receptor signaling pathway 15/58 2.96 x 10-10


5 GO:0002768 immune response-regulating cell surface receptor signaling pathway 15/58 5.67 x 10-10

5 GO:0071900 regulation of protein serine/threonine kinase activity 16/58 1.02 x 10-9


5 GO:0018212 peptidyl-tyrosine modification 14/58 4.28 x 10-9

5 GO:0000082 G1/S transition of cell cycle 12/58 4.28 x 10-9

5 GO:0045786 negative regulation of cell cycle 15/58 4.72 x 10-9

5 GO:0044772 mitotic cell cycle phase transition 15/58 4.72 x 10-9

5 GO:0007050 cell cycle arrest 12/58 4.72 x 10-9

5 GO:0044843 cell cycle G1/S phase transition 12/58 4.72 x 10-9

5 GO:0002757 immune response-activating signal transduction 15/58 4.82 x 10-9


5 GO:0007346 regulation of mitotic cell cycle 15/58 5.57 x 10-9

5 GO:0044770 cell cycle phase transition 15/58 7.47 x 10-9



5 GO:0050851 antigen receptor-mediated signaling pathway 11/58 2.90 x 10-8


5 GO:0050852 T cell receptor signaling pathway 10/58 2.94 x 10-8

6 GO:2000027 regulation of organ morphogenesis 8/25 3.61 x 10-8

6 GO:0048754 branching morphogenesis of an epithelial tube 6/25 1.93 x 10-5

6 GO:0042493 response to drug 8/25 1.93 x 10-5

6 GO:1901652 response to peptide 8/25 2.23 x 10-5

6 GO:0002433 immune response-regulating cell surface receptor signaling pathway involved in phagocytosis 5/25 2.23 x 10-5

6 GO:0038096 Fc-gamma receptor signaling pathway involved in phagocytosis 5/25 2.23 x 10-5

6 GO:0061138 morphogenesis of a branching epithelium 6/25 2.23 x 10-5


6 GO:0038094 Fc-gamma receptor signaling pathway 5/25 2.23 x 10-5

6 GO:0002431 Fc receptor mediated stimulatory signaling pathway 5/25 2.23 x 10-5

6 GO:0001763 morphogenesis of a branching structure 6/25 6.01 x 10-5



6 GO:0071407 cellular response to organic cyclic compound 8/25 6.31 x 10-5


6 GO:0002768 immune response-regulating cell surface receptor signaling pathway 7/25 9.65 x 10-5

6 GO:0030335 positive regulation of cell migration 7/25 1.42 x 10-4

6 GO:0043434 response to peptide hormone 7/25 1.44 x 10-4

6 GO:0072075 metanephric mesenchyme development 3/25 1.48 x 10-4

6 GO:2000147 positive regulation of cell motility 7/25 1.48 x 10-4


Table H5 | Top significant results for the KEGG over-representation analysis for the network clusters 1-6 of the core module "severe asthma versus moderate asthma and controls" of the "Clique Sum ∪ DIAMOnD ∩ MCODE" method combination (GSE69683 dataset). P-values were BH-corrected (significance threshold: adj. p ≤ 0.05).

Cluster KEGG ID Description Gene ratio adj. p-value

1 hsa03010 Ribosome 17/46 7.77 x 10-16

1 hsa04919 Thyroid hormone signaling pathway 13/46 4.01 x 10-12

1 hsa03013 RNA transport 8/46 1.95 x 10-4

1 hsa03015 mRNA surveillance pathway 5/46 3.94 x 10-3

1 hsa04330 Notch signaling pathway 3/46 4.32 x 10-2

2 hsa05212 Pancreatic cancer 4/17 1.47 x 10-3

2 hsa01521 EGFR tyrosine kinase inhibitor resistance 4/17 1.47 x 10-3

2 hsa04130 SNARE interactions in vesicular transport 3/17 2.19 x 10-3

2 hsa05219 Bladder cancer 3/17 2.90 x 10-3

2 hsa04144 Endocytosis 5/17 3.46 x 10-3

2 hsa04926 Relaxin signaling pathway 4/17 3.46 x 10-3

2 hsa04072 Phospholipase D signaling pathway 4/17 4.10 x 10-3

2 hsa05213 Endometrial cancer 3/17 4.10 x 10-3

2 hsa04010 MAPK signaling pathway 5/17 4.81 x 10-3

2 hsa05223 Non-small cell lung cancer 3/17 4.81 x 10-3

2 hsa05214 Glioma 3/17 5.18 x 10-3

2 hsa05218 Melanoma 3/17 5.18 x 10-3

2 hsa04012 ErbB signaling pathway 3/17 7.45 x 10-3

2 hsa05210 Colorectal cancer 3/17 7.45 x 10-3

2 hsa04540 Gap junction 3/17 7.45 x 10-3

2 hsa04658 Th1 and Th2 cell differentiation 3/17 7.95 x 10-3

2 hsa05215 Prostate cancer 3/17 8.52 x 10-3

2 hsa05231 Choline metabolism in cancer 3/17 8.52 x 10-3

2 hsa04066 HIF-1 signaling pathway 3/17 8.52 x 10-3

2 hsa04660 T cell receptor signaling pathway 3/17 8.81 x 10-3

3 hsa01521 EGFR tyrosine kinase inhibitor resistance 12/20 1.61 x 10-17

3 hsa04510 Focal adhesion 14/20 1.65 x 10-16

3 hsa04151 PI3K-Akt signaling pathway 15/20 6.75 x 10-15

3 hsa04014 Ras signaling pathway 13/20 4.82 x 10-14

3 hsa05211 Renal cell carcinoma 9/20 1.42 x 10-12

3 hsa04722 Neurotrophin signaling pathway 10/20 3.13 x 10-12

3 hsa05205 Proteoglycans in cancer 11/20 1.45 x 10-11

3 hsa04015 Rap1 signaling pathway 11/20 1.49 x 10-11

3 hsa04072 Phospholipase D signaling pathway 10/20 1.68 x 10-11


3 hsa04630 Jak-STAT signaling pathway 10/20 4.33 x 10-11

3 hsa04917 Prolactin signaling pathway 8/20 6.53 x 10-11

3 hsa05214 Glioma 8/20 6.73 x 10-11

3 hsa04062 Chemokine signaling pathway 10/20 1.26 x 10-10

3 hsa05220 Chronic myeloid leukemia 8/20 1.26 x 10-10


3 hsa05231 Choline metabolism in cancer 8/20 7.80 x 10-10

3 hsa04370 VEGF signaling pathway 7/20 8.73 x 10-10


3 hsa04664 Fc epsilon RI signaling pathway 7/20 2.19 x 10-9

3 hsa04926 Relaxin signaling pathway 8/20 5.66 x 10-9

4 hsa05222 Small cell lung cancer 15/72 1.25 x 10-12


4 hsa04510 Focal adhesion 18/72 1.52 x 10-11


4 hsa01522 Endocrine resistance 13/72 1.94 x 10-10


4 hsa05169 Epstein-Barr virus infection 16/72 1.74 x 10-9

4 hsa04722 Neurotrophin signaling pathway 13/72 1.97 x 10-9

4 hsa05212 Pancreatic cancer 11/72 2.22 x 10-9

4 hsa05203 Viral carcinogenesis 15/72 1.14 x 10-8


4 hsa05214 Glioma 10/72 1.90 x 10-8

4 hsa05226 Gastric cancer 13/72 2.07 x 10-8


4 hsa05167 Kaposi's sarcoma-associated herpesvirus infection 14/72 2.74 x 10-8

4 hsa05213 Endometrial cancer 9/72 4.20 x 10-8

4 hsa05165 Human papillomavirus infection 17/72 7.15 x 10-8

4 hsa05205 Proteoglycans in cancer 14/72 7.15 x 10-8

4 hsa05161 Hepatitis B 12/72 1.10 x 10-7

4 hsa05224 Breast cancer 12/72 1.32 x 10-7

5 hsa04110 Cell cycle 15/56 1.54 x 10-12

5 hsa04218 Cellular senescence 13/56 1.01 x 10-8


5 hsa04151 PI3K-Akt signaling pathway 16/56 1.25 x 10-7


5 hsa04014 Ras signaling pathway 13/56 3.30 x 10-7



5 hsa04068 FoxO signaling pathway 10/56 8.74 x 10-7

5 hsa04360 Axon guidance 11/56 1.06 x 10-6

5 hsa05161 Hepatitis B 10/56 1.60 x 10-6

5 hsa05166 HTLV-I infection 12/56 4.87 x 10-6

5 hsa05225 Hepatocellular carcinoma 10/56 5.70 x 10-6

5 hsa05226 Gastric cancer 9/56 1.88 x 10-5




5 hsa04914 Progesterone-mediated oocyte maturation 7/56 8.09 x 10-5


5 hsa05221 Acute myeloid leukemia 6/56 8.09 x 10-5

5 hsa04010 MAPK signaling pathway 11/56 8.55 x 10-5

6 hsa05161 Hepatitis B 6/24 4.78 x 10-4

6 hsa04658 Th1 and Th2 cell differentiation 5/24 4.78 x 10-4

6 hsa04933 AGE-RAGE signaling pathway in diabetic complications 5/24 4.78 x 10-4


6 hsa04659 Th17 cell differentiation 5/24 4.78 x 10-4


6 hsa05167 Kaposi's sarcoma-associated herpesvirus infection 6/24 4.78 x 10-4

6 hsa04919 Thyroid hormone signaling pathway 5/24 5.65 x 10-4

6 hsa04110 Cell cycle 5/24 6.93 x 10-4



6 hsa05210 Colorectal cancer 4/24 1.82 x 10-3

6 hsa04217 Necroptosis 5/24 1.82 x 10-3

6 hsa05168 Herpes simplex infection 5/24 2.96 x 10-3

6 hsa04620 Toll-like receptor signaling pathway 4/24 3.25 x 10-3

6 hsa04024 cAMP signaling pathway 5/24 3.55 x 10-3


6 hsa04380 Osteoclast differentiation 4/24 5.97 x 10-3

6 hsa04650 Natural killer cell mediated cytotoxicity 4/24 6.53 x 10-3

6 hsa05418 Fluid shear stress and atherosclerosis 4/24 7.32 x 10-3
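
The gene ratios and BH-adjusted p-values reported in Table H5 can be illustrated with a minimal sketch. It assumes that over-representation is scored with a one-sided hypergeometric test, which is a common choice but is not confirmed here as the procedure used in this work. The function names, the gene universe size and the pathway size are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, universe, annotated, drawn):
    """Return P(X >= k) for a hypergeometric draw.

    universe  : number of background genes (hypothetical value in the example)
    annotated : background genes annotated to the pathway
    drawn     : genes in the cluster (denominator of the gene ratio)
    k         : cluster genes annotated to the pathway (numerator of the gene ratio)
    """
    upper = min(annotated, drawn)
    total = comb(universe, drawn)
    hits = sum(
        comb(annotated, i) * comb(universe - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjustment, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Gene ratio 5/17 as in the cluster 2 MAPK signaling pathway row; the universe
# (20000 genes) and pathway size (300 genes) are assumptions, not thesis values.
raw_p = hypergeom_sf(k=5, universe=20000, annotated=300, drawn=17)
adjusted = bh_adjust([raw_p, 0.01, 0.04, 0.03])
significant = [p <= 0.05 for p in adjusted]
```

Because the adjustment depends on how many pathways are tested together, the adjusted values in the table cannot be reproduced from a single row in isolation.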


Appendix I

Table I1 | List of genes from the core module "severe asthma versus moderate asthma and controls" of the MODifieR method combination with the higher PASCAL score for GWAS significance ("Clique Sum ∪ DIAMOnD ∩ MCODE"), ordered by network cluster (clusters 1-6 and unclustered) and by ascending Chi-squared p-value for the frequency of the sample category comparison of the core module identification procedure (significance threshold: p ≤ 0.05). The log2 fold change (FC) and BH-adjusted p-value from the limma model for the "severe asthma versus moderate asthma and controls" comparison of the expression profiling in blood dataset (GSE69683) are reported (DEG significance threshold: adj. p ≤ 0.05). Genes without expression data correspond to seed genes added through the module identification process.

Cluster Entrez ID Gene Chi-squared p-value Affymetrix Probe ID log2 FC adj. p-value

1 7311 UBA52 3.77 x 10-6 221700_PM_s_at 0.05 1.71 x 10-3

1 6194 RPS6 8.72 x 10-5 209134_PM_s_at 0.02 5.81 x 10-1

1 1387 CREBBP 1.55 x 10-4 228177_PM_at -0.10 1.11 x 10-2

1 6124 RPL4 1.90 x 10-4 200089_PM_s_at -0.02 5.67 x 10-1

1 6189 RPS3A 2.90 x 10-4 216823_PM_at -0.42 7.22 x 10-5

1 6134 RPL10 2.94 x 10-4 200724_PM_at -0.14 4.77 x 10-5

1 10499 NCOA2 5.71 x 10-4 205732_PM_s_at -0.15 7.92 x 10-2

1 6129 RPL7 6.61 x 10-4 216580_PM_at -0.20 8.24 x 10-6

1 6222 RPS18 8.03 x 10-4 201049_PM_s_at -0.03 5.86 x 10-1

1 6233 RPS27A 8.16 x 10-4 242214_PM_at 0.09 2.56 x 10-2

1 6201 RPS7 1.00 x 10-3 213941_PM_x_at -0.37 1.26 x 10-4

1 6181 RPLP2 1.66 x 10-3 200909_PM_s_at 0.11 6.16 x 10-6

1 8841 HDAC3 1.66 x 10-3 216326_PM_s_at -0.01 8.79 x 10-1

1 9862 MED24 2.96 x 10-3 213043_PM_s_at -0.06 3.82 x 10-1

1 5515 PPP2CA 3.01 x 10-3 208652_PM_at -0.08 1.10 x 10-2

1 1051 CEBPB 3.27 x 10-3 212501_PM_at -0.04 2.20 x 10-1

1 6176 RPLP1 3.39 x 10-3 200763_PM_s_at 0.05 1.96 x 10-1

1 6227 RPS21 3.78 x 10-3 200834_PM_s_at -0.10 1.33 x 10-1

1 55090 MED9 5.33 x 10-3 218372_PM_at 0.18 3.77 x 10-5

1 6130 RPL7A 5.33 x 10-3 224930_PM_x_at 0.09 9.68 x 10-4

1 8648 NCOA1 5.44 x 10-3 209106_PM_at -0.10 4.51 x 10-3

1 5469 MED1 5.63 x 10-3 225452_PM_at 0.10 2.93 x 10-3

1 6209 RPS15 5.68 x 10-3 1563014_PM_at -0.09 7.59 x 10-3

1 9611 NCOR1 6.16 x 10-3 200854_PM_at 0.03 2.78 x 10-1

1 10891 PPARGC1A 6.45 x 10-3 219195_PM_at -0.10 1.74 x 10-1

1 26019 UPF2 7.74 x 10-3 203519_PM_s_at 0.07 3.98 x 10-2

1 2033 EP300 8.34 x 10-3 213579_PM_s_at 0.05 1.79 x 10-1

1 8202 NCOA3 8.86 x 10-3 209061_PM_at 0.10 6.83 x 10-5

1 9612 NCOR2 9.25 x 10-3 236025_PM_at 0.10 6.17 x 10-3


1 9968 MED12 9.48 x 10-3 214275_PM_at 0.03 2.71 x 10-1

1 6136 RPL12 9.51 x 10-3 200088_PM_x_at 0.06 5.61 x 10-2

1 6837 MED22 1.06 x 10-2 206593_PM_s_at 0.08 9.79 x 10-2

1 8665 EIF3F 1.16 x 10-2 226014_PM_at 0.09 3.72 x 10-3

1 90390 MED30 1.17 x 10-2 227786_PM_at -0.11 1.23 x 10-2

1 6175 RPLP0 1.18 x 10-2 201033_PM_x_at -0.04 4.23 x 10-1

1 112950 MED8 1.66 x 10-2 213127_PM_s_at -0.06 6.99 x 10-2

1 6207 RPS13 1.81 x 10-2 200018_PM_at -0.01 8.00 x 10-1

1 4023 LPL 1.87 x 10-2 203549_PM_s_at -0.11 1.07 x 10-2

1 8663 EIF3C 1.87 x 10-2 236700_PM_at 0.13 7.99 x 10-2

1 9669 EIF5B 2.32 x 10-2 201024_PM_x_at 0.20 3.97 x 10-6

1 9282 MED14 2.66 x 10-2 202610_PM_s_at 0.09 1.64 x 10-2

1 1915 EEF1A1 2.85 x 10-2 1557120_PM_at -0.22 1.90 x 10-7

1 4686 NCBP1 2.92 x 10-2 209520_PM_s_at 0.05 3.03 x 10-1

1 9969 MED13 2.92 x 10-2 244611_PM_at 0.18 1.65 x 10-3

1 6746 SSR2 3.25 x 10-2 200652_PM_at -0.09 9.20 x 10-5

1 5468 PPARG 3.59 x 10-2 208510_PM_s_at -0.06 4.67 x 10-2

1 5976 UPF1 3.74 x 10-2 211168_PM_s_at 0.06 2.25 x 10-1

1 65109 UPF3B 4.39 x 10-2 218757_PM_s_at 0.18 1.98 x 10-3

2 7314 UBB 3.27 x 10-4 217144_PM_at -0.18 1.90 x 10-2

2 6845 VAMP7 1.18 x 10-2 202829_PM_s_at -0.10 1.16 x 10-1

2 915 CD3D 1.47 x 10-2 213539_PM_at 0.02 8.31 x 10-1

2 6844 VAMP2 1.49 x 10-2 214792_PM_x_at -0.05 4.22 x 10-1

2 29924 EPN1 1.67 x 10-2 226667_PM_x_at -0.05 2.78 x 10-1

2 917 CD3G 1.81 x 10-2 206804_PM_at 0.10 1.28 x 10-1

2 2017 CTTN 2.22 x 10-2 214782_PM_at -0.08 2.98 x 10-2

2 1950 EGF 2.30 x 10-2 206254_PM_at -0.08 3.02 x 10-1

2 9322 TRIP10 2.30 x 10-2 202734_PM_at 0.07 2.99 x 10-2

2 409 ARRB2 2.44 x 10-2 203388_PM_at -0.13 1.79 x 10-3

2 5604 MAP2K1 2.51 x 10-2 202670_PM_at 0.07 1.25 x 10-1

2 3716 JAK1 2.68 x 10-2 239695_PM_at 0.11 2.60 x 10-2

2 408 ARRB1 2.81 x 10-2 49111_PM_at -0.08 1.17 x 10-1

2 8673 VAMP8 3.54 x 10-2 202546_PM_at -0.11 1.39 x 10-2

2 551 AVP 3.96 x 10-2 207848_PM_at -0.06 8.08 x 10-2

2 1956 EGFR 4.21 x 10-2 201983_PM_s_at -0.05 1.01 x 10-2

2 8867 SYNJ1 4.93 x 10-2 212990_PM_at -0.07 3.75 x 10-1

3 5155 PDGFB 3.65 x 10-4 216061_PM_x_at 0.03 4.76 x 10-1

3 6464 SHC1 3.95 x 10-4 214853_PM_s_at 0.10 2.53 x 10-4


3 3688 ITGB1 4.00 x 10-4 1561042_PM_at 0.06 4.19 x 10-1

3 5781 PTPN11 7.90 x 10-4 205867_PM_at -0.06 1.10 x 10-1

3 6654 SOS1 9.20 x 10-4 227426_PM_at 0.04 3.65 x 10-1

3 3717 JAK2 1.79 x 10-3 205841_PM_at -0.28 8.51 x 10-4

3 3791 KDR 1.91 x 10-3 203934_PM_at -0.03 1.04 x 10-1

3 7422 VEGFA 2.13 x 10-3 211527_PM_x_at -0.05 1.29 x 10-1

3 3718 JAK3 3.36 x 10-3 211108_PM_s_at -0.07 3.07 x 10-3

3 5595 MAPK3 4.79 x 10-3 212046_PM_x_at -0.08 1.57 x 10-1

3 3561 IL2RG 6.13 x 10-3 204116_PM_at 0.04 2.00 x 10-1

3 5170 PDPK1 6.25 x 10-3 204524_PM_at 0.14 2.86 x 10-3

3 5293 PIK3CD 7.78 x 10-3 203879_PM_at 0.02 3.64 x 10-1

3 6714 SRC 8.56 x 10-3 213324_PM_at 0.11 2.75 x 10-3

3 5908 RAP1B 1.05 x 10-2 200833_PM_s_at 0.07 1.75 x 10-1

3 5894 RAF1 1.18 x 10-2 201244_PM_s_at -0.05 1.56 x 10-1

3 5159 PDGFRB 1.48 x 10-2 202273_PM_at 0.01 6.99 x 10-1

3 9846 GAB2 1.76 x 10-2 203853_PM_s_at -0.11 2.89 x 10-2

3 10000 AKT3 1.85 x 10-2 242876_PM_at 0.30 2.84 x 10-6

3 3667 IRS1 2.30 x 10-2 204686_PM_at 0.21 4.94 x 10-5

4 7157 TP53 2.24 x 10-4 211300_PM_s_at 0.20 3.19 x 10-5

4 2885 GRB2 3.42 x 10-4 215075_PM_s_at -0.05 2.85 x 10-1

4 1869 E2F1 3.56 x 10-4 204947_PM_at 0.02 6.97 x 10-1

4 5747 PTK2 4.17 x 10-4 208820_PM_at 0.26 2.36 x 10-8

4 23054 NCOA6 4.62 x 10-4 1568874_PM_at -0.07 4.98 x 10-2

4 1027 CDKN1B 1.02 x 10-3 209112_PM_at -0.01 8.22 x 10-1

4 1499 CTNNB1 1.02 x 10-3 223679_PM_at 0.12 5.18 x 10-4

4 351 APP 1.31 x 10-3 200602_PM_at 0.06 1.24 x 10-1

4 5829 PXN 1.46 x 10-3 201087_PM_at -0.03 5.84 x 10-1

4 3066 HDAC2 2.36 x 10-3 201833_PM_at 0.20 3.89 x 10-5

4 5906 RAP1A 2.36 x 10-3 202362_PM_at -0.12 2.83 x 10-3

4 857 CAV1 2.36 x 10-3 212097_PM_at 0.10 1.19 x 10-1

4 7189 TRAF6 2.88 x 10-3 205558_PM_at -0.05 1.29 x 10-1

4 5290 PIK3CA 3.06 x 10-3 235980_PM_at -0.05 3.63 x 10-1

4 1445 CSK 3.57 x 10-3 202329_PM_at 0.01 8.66 x 10-1

4 5914 RARA 3.57 x 10-3 216300_PM_x_at -0.08 8.21 x 10-2

4 1871 E2F3 3.62 x 10-3 203692_PM_s_at -0.01 8.12 x 10-1

4 5921 RASA1 4.96 x 10-3 202677_PM_at -0.05 5.50 x 10-1

4 5347 PLK1 5.84 x 10-3 202240_PM_at 0.01 7.43 x 10-1

4 4792 NFKBIA 6.09 x 10-3 201502_PM_s_at -0.15 5.33 x 10-5


4 2324 FLT4 6.43 x 10-3 234379_PM_at -0.03 5.19 x 10-1

4 391 RHOG 8.03 x 10-3 203175_PM_at -0.13 2.74 x 10-2

4 5594 MAPK1 8.56 x 10-3 212271_PM_at -0.20 5.35 x 10-8

4 7186 TRAF2 9.24 x 10-3 204413_PM_at 0.01 7.87 x 10-1

4 389 RHOC 9.96 x 10-3 235742_PM_at 0.10 6.84 x 10-2

4 6118 RPA2 9.99 x 10-3 210756_PM_at -0.07 9.79 x 10-3

4 3181 HNRNPA2B1 1.02 x 10-2 205292_PM_s_at 0.24 4.82 x 10-8

4 5335 PLCG1 1.09 x 10-2 202789_PM_at 0.19 5.42 x 10-4

4 5982 RFC2 1.24 x 10-2 203696_PM_s_at -0.14 7.63 x 10-5

4 7846 TUBA1A 1.36 x 10-2 209118_PM_s_at -0.13 5.48 x 10-5

4 3685 ITGAV 1.38 x 10-2 202351_PM_at -0.03 8.39 x 10-1

4 1870 E2F2 1.45 x 10-2 207042_PM_at 0.03 5.21 x 10-1

4 7094 TLN1 1.46 x 10-2 236132_PM_at -0.19 8.10 x 10-5

4 7791 ZYX 1.47 x 10-2 200808_PM_s_at -0.10 2.32 x 10-1

4 3551 IKBKB 1.52 x 10-2 211027_PM_s_at 0.11 2.96 x 10-2

4 6502 SKP2 1.52 x 10-2 203625_PM_x_at 0.11 2.01 x 10-2

4 5880 RAC2 1.58 x 10-2 207419_PM_s_at 0.04 4.65 x 10-1

4 999 CDH1 1.68 x 10-2 201131_PM_s_at 0.05 2.58 x 10-1

4 5686 PSMA5 1.72 x 10-2 230300_PM_at 0.24 6.70 x 10-3

4 6631 SNRPC 1.83 x 10-2 201342_PM_at 0.04 2.41 x 10-1

4 2869 GRK5 1.89 x 10-2 204396_PM_s_at -0.21 1.92 x 10-3

4 399 RHOH 1.96 x 10-2 204951_PM_at 0.14 2.88 x 10-3

4 3937 LCP2 2.04 x 10-2 205269_PM_at -0.06 1.92 x 10-2

4 5291 PIK3CB 2.04 x 10-2 217620_PM_s_at 0.08 1.70 x 10-1

4 7132 TNFRSF1A 2.12 x 10-2 207643_PM_s_at -0.03 4.61 x 10-1

4 4751 NEK2 2.30 x 10-2 204641_PM_at -0.19 8.96 x 10-5

4 5702 PSMC3 2.32 x 10-2 201267_PM_s_at 0.10 1.87 x 10-2

4 2064 ERBB2 2.44 x 10-2 210930_PM_s_at 0.03 2.92 x 10-1

4 6628 SNRPB 2.44 x 10-2 213175_PM_s_at 0.13 1.17 x 10-3

4 988 CDC5L 2.57 x 10-2 209057_PM_x_at -0.15 1.61 x 10-3

4 7531 YWHAE 2.82 x 10-2 210996_PM_s_at 0.08 1.23 x 10-5

4 5020 OXT 2.87 x 10-2 207576_PM_x_at 0.09 8.85 x 10-3

4 5580 PRKCD 2.96 x 10-2 202545_PM_at -0.11 2.25 x 10-2

4 7403 KDM6A 3.00 x 10-2 203992_PM_s_at -0.06 2.10 x 10-1

4 388 RHOB 3.05 x 10-2 1553963_PM_at -0.04 7.39 x 10-2

4 23433 RHOQ 3.09 x 10-2 212119_PM_at 0.09 1.65 x 10-3

4 8997 KALRN 3.10 x 10-2 236651_PM_at 0.07 2.68 x 10-3

4 991 CDC20 3.14 x 10-2 202870_PM_s_at -0.07 1.01 x 10-1


4 2149 F2R 3.23 x 10-2 1569642_PM_at -0.09 1.32 x 10-2

4 1453 CSNK1D 3.25 x 10-2 208774_PM_at 0.07 4.04 x 10-3

4 8503 PIK3R3 3.38 x 10-2 241325_PM_at -0.11 8.24 x 10-4

4 9662 CEP135 3.41 x 10-2 207268_PM_at 0.03 4.47 x 10-1

4 2962 GTF2F1 3.50 x 10-2 202354_PM_s_at 0.07 4.93 x 10-2

4 56949 XAB2 3.50 x 10-2 218110_PM_at 0.07 2.61 x 10-1

4 2321 FLT1 3.53 x 10-2 232809_PM_s_at 0.17 3.25 x 10-6

4 146 ADRA1D 3.54 x 10-2 210961_PM_s_at -0.06 1.57 x 10-2

4 7201 TRHR 3.81 x 10-2 211438_PM_at -0.10 2.93 x 10-3

4 5979 RET 3.81 x 10-2 215771_PM_x_at 0.02 6.17 x 10-1

4 5705 PSMC5 3.90 x 10-2 209503_PM_s_at 0.03 5.39 x 10-1

4 7307 U2AF1 3.95 x 10-2 202858_PM_at 0.11 3.46 x 10-5

4 8766 RAB11A 3.95 x 10-2 200863_PM_s_at -0.12 2.21 x 10-7

4 5566 PRKACA 4.14 x 10-2 216324_PM_s_at -0.01 9.14 x 10-1

4 3672 ITGA1 4.32 x 10-2 - - -

4 2963 GTF2F2 4.37 x 10-2 209595_PM_at -0.06 2.44 x 10-1

4 2147 F2 4.55 x 10-2 205754_PM_at 0.04 1.51 x 10-1

4 2776 GNAQ 4.61 x 10-2 224861_PM_at -0.20 3.58 x 10-8

4 7277 TUBA4A 4.67 x 10-2 212242_PM_at -0.11 1.03 x 10-2

4 2048 EPHB2 4.82 x 10-2 209588_PM_at 0.16 2.57 x 10-3

4 55660 PRPF40A 4.93 x 10-2 213729_PM_at 0.05 1.56 x 10-1

5 7027 TFDP1 1.41 x 10-4 242939_PM_at -0.15 4.64 x 10-3

5 5294 PIK3CG 4.28 x 10-4 239294_PM_at -0.20 6.36 x 10-5

5 2534 FYN 6.46 x 10-4 243006_PM_at 0.34 1.51 x 10-6

5 3932 LCK 7.09 x 10-4 204890_PM_s_at 0.16 5.28 x 10-4

5 2810 SFN 7.99 x 10-4 33323_PM_r_at -0.15 6.32 x 10-5

5 25 ABL1 8.00 x 10-4 202123_PM_s_at 0.16 3.56 x 10-6

5 2549 GAB1 1.27 x 10-3 214987_PM_at -0.13 9.07 x 10-5

5 7010 TEK 1.69 x 10-3 206702_PM_at 0.08 4.33 x 10-1

5 3630 INS 1.98 x 10-3 206598_PM_at 0.01 8.19 x 10-1

5 7410 VAV2 2.15 x 10-3 226063_PM_at 0.19 8.60 x 10-10

5 9564 BCAR1 2.68 x 10-3 232442_PM_at -0.02 4.58 x 10-1

5 3265 HRAS 3.19 x 10-3 212983_PM_at 0.00 9.65 x 10-1

5 6711 SPTBN1 3.36 x 10-3 212071_PM_s_at 0.22 1.76 x 10-5

5 4087 SMAD2 5.17 x 10-3 226563_PM_at -0.19 8.44 x 10-3

5 1969 EPHA2 5.42 x 10-3 203499_PM_at 0.17 1.03 x 10-9

5 6710 SPTB 5.73 x 10-3 208416_PM_s_at -0.14 7.46 x 10-2

5 5501 PPP1CC 5.88 x 10-3 200726_PM_at 0.08 8.13 x 10-2


5 8850 KAT2B 7.68 x 10-3 203845_PM_at -0.18 2.55 x 10-2

5 1945 EFNA4 8.62 x 10-3 205107_PM_s_at 0.09 1.96 x 10-2

5 2065 ERBB3 8.90 x 10-3 215638_PM_at -0.02 3.48 x 10-1

5 5925 RB1 1.00 x 10-2 211540_PM_s_at 0.00 9.54 x 10-1

5 3320 HSP90AA1 1.01 x 10-2 214328_PM_s_at 0.18 6.30 x 10-5

5 7029 TFDP2 1.01 x 10-2 226157_PM_at 0.12 6.70 x 10-3

5 5336 PLCG2 1.11 x 10-2 204613_PM_at -0.04 3.74 x 10-1

5 4088 SMAD3 1.30 x 10-2 218284_PM_at 0.23 4.04 x 10-8

5 1942 EFNA1 1.42 x 10-2 202023_PM_at 0.02 5.14 x 10-1

5 8900 CCNA1 1.47 x 10-2 205899_PM_at -0.03 5.23 x 10-1

5 2773 GNAI3 1.71 x 10-2 201179_PM_s_at -0.09 4.41 x 10-2

5 5062 PAK2 1.75 x 10-2 205962_PM_at -0.06 6.78 x 10-2

5 5934 RBL2 1.91 x 10-2 212332_PM_at 0.09 6.09 x 10-2

5 1022 CDK7 2.06 x 10-2 211297_PM_s_at 0.12 8.71 x 10-4

5 3845 KRAS 2.10 x 10-2 204010_PM_s_at -0.08 9.33 x 10-2

5 1500 CTNND1 2.16 x 10-2 208407_PM_s_at 0.21 9.55 x 10-5

5 8651 SOCS1 2.25 x 10-2 210000_PM_s_at -0.04 1.69 x 10-1

5 4331 MNAT1 2.28 x 10-2 203565_PM_s_at 0.02 6.64 x 10-1

5 1017 CDK2 2.34 x 10-2 204252_PM_at 0.07 3.37 x 10-2

5 30011 SH3KBP1 2.49 x 10-2 1553588_PM_at -0.10 1.02 x 10-3

5 1111 CHEK1 2.75 x 10-2 205394_PM_at -0.06 4.74 x 10-2

5 6709 SPTAN1 2.79 x 10-2 214925_PM_s_at 0.18 1.64 x 10-3

5 902 CCNH 2.83 x 10-2 204093_PM_at -0.02 5.94 x 10-1

5 7514 XPO1 3.16 x 10-2 235927_PM_at 0.18 1.92 x 10-4

5 5728 PTEN 3.19 x 10-2 225363_PM_at -0.10 5.10 x 10-3

5 1147 CHUK 3.41 x 10-2 209666_PM_s_at -0.11 1.17 x 10-1

5 2889 RAPGEF1 3.41 x 10-2 204543_PM_at 0.09 3.69 x 10-3

5 7704 ZBTB16 3.50 x 10-2 205883_PM_at 0.03 7.81 x 10-1

5 8312 AXIN1 3.82 x 10-2 212849_PM_at 0.05 1.98 x 10-1

5 595 CCND1 4.02 x 10-2 208711_PM_s_at -0.05 6.04 x 10-2

5 7297 TYK2 4.25 x 10-2 205546_PM_s_at -0.09 2.08 x 10-2

5 699 BUB1 4.29 x 10-2 216275_PM_at -0.06 3.76 x 10-2

5 1943 EFNA2 4.34 x 10-2 238956_PM_at -0.16 1.00 x 10-3

5 9844 ELMO1 4.34 x 10-2 204513_PM_s_at -0.04 4.63 x 10-1

5 2960 GTF2E1 4.34 x 10-2 205930_PM_at 0.12 1.07 x 10-1

5 3702 ITK 4.61 x 10-2 211339_PM_s_at 0.14 1.27 x 10-2

5 2185 PTK2B 4.76 x 10-2 203110_PM_at -0.12 1.41 x 10-3

5 4605 MYBL2 4.76 x 10-2 201710_PM_at 0.04 2.35 x 10-1


5 925 CD8A 4.76 x 10-2 205758_PM_at 0.15 1.77 x 10-2

5 940 CD28 4.92 x 10-2 206545_PM_at 0.27 1.77 x 10-3

5 7319 UBE2A 5.00 x 10-2 201898_PM_s_at -0.08 2.82 x 10-3

6 4089 SMAD4 4.09 x 10-5 1565703_PM_at 0.15 3.94 x 10-3

6 4609 MYC 1.48 x 10-4 202431_PM_s_at 0.17 4.54 x 10-4

6 2353 FOS 4.12 x 10-4 209189_PM_at -0.09 2.87 x 10-1

6 5879 RAC1 4.17 x 10-4 208640_PM_at 0.04 3.20 x 10-2

6 111 ADCY5 4.34 x 10-4 228182_PM_at -0.04 1.78 x 10-1

6 4851 NOTCH1 2.40 x 10-3 223508_PM_at -0.04 3.57 x 10-1

6 8772 FADD 2.97 x 10-3 202535_PM_at -0.23 2.82 x 10-7

6 6776 STAT5A 3.27 x 10-3 203010_PM_at -0.06 1.60 x 10-1

6 1906 EDN1 3.93 x 10-3 218995_PM_s_at 0.12 2.26 x 10-3

6 990 CDC6 4.65 x 10-3 203968_PM_s_at -0.03 3.79 x 10-1

6 983 CDK1 4.75 x 10-3 203214_PM_x_at -0.13 6.06 x 10-5

6 3065 HDAC1 6.80 x 10-3 201209_PM_at 0.08 5.43 x 10-3

6 4690 NCK1 1.06 x 10-2 229895_PM_s_at 0.09 1.90 x 10-2

6 1495 CTNNA1 1.34 x 10-2 217366_PM_at 0.03 2.98 x 10-1

6 5928 RBBP4 1.68 x 10-2 225396_PM_at 0.22 1.08 x 10-6

6 5058 PAK1 1.84 x 10-2 209615_PM_s_at -0.14 7.18 x 10-3

6 23421 ITGB3BP 2.22 x 10-2 205176_PM_s_at -0.06 4.94 x 10-1

6 25492 SIN3A 2.32 x 10-2 238006_PM_at 0.12 3.89 x 10-4

6 367 AR 2.42 x 10-2 226197_PM_at 0.04 3.02 x 10-1

6 3458 IFNG 2.71 x 10-2 210354_PM_at -0.20 1.20 x 10-3

6 6772 STAT1 2.76 x 10-2 209969_PM_s_at 0.08 1.73 x 10-1

6 8331 HIST1H2AJ 2.77 x 10-2 208583_PM_x_at -0.04 2.49 x 10-1

6 10006 ABI1 2.82 x 10-2 209027_PM_s_at 0.04 4.47 x 10-1

6 10451 VAV3 3.24 x 10-2 218807_PM_at -0.08 1.40 x 10-1

6 9448 MAP4K4 3.55 x 10-2 1558732_PM_at 0.20 2.89 x 10-5

6 1234 CCR5 4.15 x 10-2 206991_PM_s_at 0.13 2.74 x 10-2

- 3091 HIF1A 7.90 x 10-4 200989_PM_at 0.03 6.21 x 10-1

- 2057 EPOR 1.07 x 10-3 396_PM_f_at -0.13 1.51 x 10-3

- 1432 MAPK14 1.53 x 10-3 211561_PM_x_at -0.24 3.16 x 10-5

- 1399 CRKL 2.24 x 10-3 212180_PM_at 0.10 2.75 x 10-4

- 5901 RAN 2.29 x 10-3 200750_PM_s_at 0.01 7.62 x 10-1

- 3009 HIST1H1B 2.89 x 10-3 214534_PM_at 0.05 1.75 x 10-1

- 53358 SHC3 2.98 x 10-3 206330_PM_s_at -0.01 8.22 x 10-1

- 613 BCR 3.55 x 10-3 226602_PM_s_at 0.15 6.03 x 10-4

- 7040 TGFB1 4.19 x 10-3 203084_PM_at 0.02 6.86 x 10-1


- 118460 EXOSC6 4.31 x 10-3 231916_PM_at -0.06 2.72 x 10-2

- 5970 RELA 5.29 x 10-3 209878_PM_s_at 0.10 1.50 x 10-3

- 27040 LAT 5.42 x 10-3 - - -

- 25782 RAB3GAP2 5.59 x 10-3 202373_PM_s_at 0.07 4.24 x 10-3

- 23624 CBLC 5.98 x 10-3 220638_PM_s_at 0.09 8.91 x 10-4

- 10125 RASGRP1 6.47 x 10-3 205590_PM_at 0.20 3.46 x 10-3

- 6804 STX1A 7.18 x 10-3 204729_PM_s_at 0.04 2.50 x 10-1

- 5296 PIK3R2 7.60 x 10-3 229392_PM_s_at 0.05 3.10 x 10-1

- 6908 TBP 8.15 x 10-3 203135_PM_at 0.18 2.79 x 10-8

- 2260 FGFR1 8.42 x 10-3 222164_PM_at -0.19 5.06 x 10-5

- 6387 CXCL12 9.35 x 10-3 209687_PM_at 0.02 5.10 x 10-1

- 8471 IRS4 9.51 x 10-3 1560652_PM_at -0.04 1.12 x 10-1

- 8646 CHRD 9.65 x 10-3 211248_PM_s_at -0.05 5.15 x 10-2

- 7423 VEGFB 9.76 x 10-3 203683_PM_s_at 0.09 8.23 x 10-3

- 10055 SAE1 1.01 x 10-2 1555618_PM_s_at 0.19 1.08 x 10-7

- 841 CASP8 1.08 x 10-2 1553306_PM_at -0.20 4.15 x 10-3

- 4897 NRCAM 1.10 x 10-2 204105_PM_s_at 0.26 4.73 x 10-5

- 6777 STAT5B 1.12 x 10-2 1555088_PM_x_at -0.13 6.63 x 10-3

- 659 BMPR2 1.18 x 10-2 210214_PM_s_at 0.07 8.77 x 10-2

- 6605 SMARCE1 1.18 x 10-2 211989_PM_at 0.20 3.80 x 10-8

- 596 BCL2 1.22 x 10-2 203685_PM_at 0.11 5.32 x 10-2

- 5046 PCSK6 1.30 x 10-2 210553_PM_x_at -0.11 2.75 x 10-4

- 9180 OSMR 1.30 x 10-2 205729_PM_at -0.04 3.98 x 10-2

- 9368 SLC9A3R1 1.34 x 10-2 201349_PM_at -0.01 7.68 x 10-1

- 22943 DKK1 1.36 x 10-2 204602_PM_at -0.08 2.09 x 10-3

- 6809 STX3 1.39 x 10-2 209238_PM_at -0.17 1.84 x 10-4

- 10159 ATP6AP2 1.42 x 10-2 201444_PM_s_at 0.11 8.03 x 10-3

- 55290 BRF2 1.43 x 10-2 218954_PM_s_at 0.08 1.43 x 10-2

- 2923 PDIA3 1.49 x 10-2 208612_PM_at 0.17 1.82 x 10-7

- 4205 MEF2A 1.49 x 10-2 212535_PM_at -0.20 1.25 x 10-4

- 1847 DUSP5 1.58 x 10-2 209457_PM_at 0.34 1.08 x 10-6

- 598 BCL2L1 1.59 x 10-2 231228_PM_at 0.04 3.68 x 10-1

- 640 BLK 1.60 x 10-2 206255_PM_at 0.28 3.61 x 10-6

- 1796 DOK1 1.64 x 10-2 216835_PM_s_at -0.12 1.53 x 10-2

- 5778 PTPN7 1.66 x 10-2 1554860_PM_at 0.08 1.09 x 10-1

- 9712 USP6NL 1.71 x 10-2 204761_PM_at 0.17 1.74 x 10-2

- 5567 PRKACB 1.74 x 10-2 235780_PM_at -0.11 1.36 x 10-1

- 4929 NR4A2 1.81 x 10-2 216248_PM_s_at 0.07 2.36 x 10-1


- 57510 XPO5 1.83 x 10-2 223055_PM_s_at 0.11 9.66 x 10-3

- 4023 LPL 1.87 x 10-2 203549_PM_s_at -0.11 1.07 x 10-2

- 10445 MCRS1 1.88 x 10-2 202556_PM_s_at 0.04 3.40 x 10-1

- 6198 RPS6KB1 1.89 x 10-2 211578_PM_s_at 0.09 2.82 x 10-3

- 7341 SUMO1 1.92 x 10-2 211069_PM_s_at -0.02 5.51 x 10-1

- 705 BYSL 1.96 x 10-2 203612_PM_at 0.05 3.66 x 10-1

- 9314 KLF4 2.11 x 10-2 220266_PM_s_at 0.13 1.53 x 10-3

- 10253 SPRY2 2.18 x 10-2 204011_PM_at 0.04 4.61 x 10-1

- 91 ACVR1B 2.22 x 10-2 213198_PM_at -0.08 7.70 x 10-2

- 55811 ADCY10 2.23 x 10-2 217305_PM_s_at -0.09 4.96 x 10-2

- 1739 DLG1 2.23 x 10-2 230229_PM_at 0.19 7.94 x 10-5

- 2158 F9 2.27 x 10-2 207218_PM_at -0.04 1.98 x 10-2

- 5034 P4HB 2.34 x 10-2 200654_PM_at 0.10 1.28 x 10-4

- 64425 POLR1E 2.34 x 10-2 231041_PM_at 0.12 1.51 x 10-2

- 5154 PDGFA 2.34 x 10-2 229830_PM_at 0.16 1.49 x 10-2

- 3981 LIG4 2.34 x 10-2 206235_PM_at -0.09 1.58 x 10-2

- 6595 SMARCA2 2.34 x 10-2 212258_PM_at 0.09 1.73 x 10-3

- 94 ACVRL1 2.45 x 10-2 226950_PM_at 0.02 6.55 x 10-1

- 359 AQP2 2.47 x 10-2 240285_PM_at 0.04 7.39 x 10-2

- 2066 ERBB4 2.57 x 10-2 233494_PM_at -0.05 2.43 x 10-2

- 2886 GRB7 2.61 x 10-2 210761_PM_s_at -0.04 2.18 x 10-1

- 1215 CMA1 2.66 x 10-2 214533_PM_at 0.00 9.76 x 10-1

- 5770 PTPN1 2.68 x 10-2 217686_PM_at 0.08 6.99 x 10-3

- 919 CD247 2.73 x 10-2 210031_PM_at 0.17 3.72 x 10-4

- 4233 MET 2.80 x 10-2 213807_PM_x_at -0.04 1.58 x 10-1

- 4170 MCL1 2.87 x 10-2 214057_PM_at 0.04 1.48 x 10-1

- 840 CASP7 2.88 x 10-2 207181_PM_s_at 0.12 8.03 x 10-2

- 4801 NFYB 2.89 x 10-2 218129_PM_s_at -0.10 2.57 x 10-1

- 5494 PPM1A 2.89 x 10-2 235344_PM_at -0.09 1.32 x 10-1

- 5873 RAB27A 2.91 x 10-2 209514_PM_s_at -0.27 1.53 x 10-8

- 7133 TNFRSF1B 2.91 x 10-2 203508_PM_at -0.02 6.13 x 10-1

- 3172 HNF4A 2.91 x 10-2 214851_PM_at -0.04 1.00 x 10-1

- 50855 PARD6A 2.96 x 10-2 205245_PM_at 0.04 3.21 x 10-1

- 2261 FGFR3 3.00 x 10-2 204379_PM_s_at -0.06 6.70 x 10-2

- 2887 GRB10 3.03 x 10-2 215248_PM_at -0.35 1.87 x 10-6

- 8125 ANP32A 3.14 x 10-2 201051_PM_at 0.05 3.44 x 10-2

- 134353 LSM11 3.19 x 10-2 226826_PM_at 0.19 6.91 x 10-3

- 23114 NFASC 3.25 x 10-2 214799_PM_at -0.14 4.96 x 10-3


- 4654 MYOD1 3.25 x 10-2 206657_PM_s_at 0.00 9.57 x 10-1

- 1962 EHHADH 3.41 x 10-2 205222_PM_at -0.01 8.84 x 10-1

- 118788 PIK3AP1 3.41 x 10-2 226459_PM_at 0.00 9.48 x 10-1

- 7375 USP4 3.43 x 10-2 211800_PM_s_at -0.03 3.84 x 10-1

- 10376 TUBA1B 3.43 x 10-2 201090_PM_x_at -0.10 3.81 x 10-3

- 4296 MAP3K11 3.46 x 10-2 203652_PM_at -0.11 5.22 x 10-2

- 3953 LEPR 3.47 x 10-2 211356_PM_x_at -0.10 7.70 x 10-4

- 28964 GIT1 3.50 x 10-2 218030_PM_at 0.06 1.21 x 10-1

- 5423 POLB 3.53 x 10-2 234907_PM_x_at 0.08 1.34 x 10-1

- 5782 PTPN12 3.53 x 10-2 202006_PM_at -0.05 5.84 x 10-1

- 5467 PPARD 3.55 x 10-2 37152_PM_at 0.15 2.88 x 10-4

- 6997 TDGF1 3.64 x 10-2 206286_PM_s_at -0.04 2.90 x 10-1

- 4312 MMP1 3.73 x 10-2 204475_PM_at -0.14 3.57 x 10-2

- 56896 DPYSL5 3.74 x 10-2 222797_PM_at -0.05 1.09 x 10-1

- 8567 MADD 3.74 x 10-2 210252_PM_at 0.09 2.27 x 10-2

- 1263 PLK3 3.74 x 10-2 229825_PM_at 0.04 2.15 x 10-1

- 5591 PRKDC 3.74 x 10-2 208694_PM_at -0.08 6.81 x 10-2

- 27063 ANKRD1 3.80 x 10-2 206029_PM_at -0.10 1.23 x 10-3

- 23468 CBX5 3.81 x 10-2 226085_PM_at 0.28 6.43 x 10-7

- 2357 FPR1 3.81 x 10-2 205119_PM_s_at -0.07 3.38 x 10-2

- 5609 MAP2K7 3.81 x 10-2 226023_PM_at 0.19 1.07 x 10-4

- 10018 BCL2L11 3.81 x 10-2 1553096_PM_s_at -0.18 2.80 x 10-4

- 22938 SNW1 3.81 x 10-2 201575_PM_at 0.17 1.24 x 10-7

- 57162 PELI1 3.82 x 10-2 232304_PM_at 0.08 9.45 x 10-2

- 1288 COL4A6 3.96 x 10-2 213992_PM_at -0.04 6.10 x 10-2

- 8607 RUVBL1 3.96 x 10-2 201614_PM_s_at 0.19 2.77 x 10-6

- 10054 UBA2 4.07 x 10-2 229587_PM_at 0.19 4.26 x 10-4

- 3611 ILK 4.10 x 10-2 201234_PM_at -0.04 3.70 x 10-1

- 660 BMX 4.12 x 10-2 206464_PM_at -0.37 7.16 x 10-6

- 10112 KIF20A 4.12 x 10-2 218755_PM_at -0.07 6.49 x 10-3

- 5422 POLA1 4.15 x 10-2 204835_PM_at 0.12 3.02 x 10-3

- 1509 CTSD 4.16 x 10-2 200766_PM_at -0.20 4.26 x 10-3

- 3692 EIF6 4.16 x 10-2 210213_PM_s_at 0.01 8.89 x 10-1

- 84552 PARD6G 4.23 x 10-2 227204_PM_at -0.05 3.01 x 10-1

- 7185 TRAF1 4.25 x 10-2 235116_PM_at 0.06 3.10 x 10-1

- 6241 RRM2 4.26 x 10-2 209773_PM_s_at 0.11 3.23 x 10-1

- 10640 EXOC5 4.29 x 10-2 224253_PM_at -0.09 1.89 x 10-1

- 4216 MAP3K4 4.29 x 10-2 216199_PM_s_at 0.13 2.26 x 10-4


- 4041 LRP5 4.29 x 10-2 1552405_PM_at -0.01 7.22 x 10-1

- 2572 GAD2 4.34 x 10-2 211264_PM_at -0.04 5.79 x 10-2

- 817 CAMK2D 4.34 x 10-2 224994_PM_at 0.32 3.77 x 10-7

- 2002 ELK1 4.34 x 10-2 203617_PM_x_at 0.15 4.66 x 10-9

- 6016 RIT1 4.39 x 10-2 209882_PM_at -0.17 1.35 x 10-4

- 1849 DUSP7 4.39 x 10-2 213848_PM_at 0.14 2.48 x 10-5

- 56616 DIABLO 4.40 x 10-2 219350_PM_s_at 0.06 1.65 x 10-2

- 51807 TUBA8 4.62 x 10-2 220069_PM_at 0.06 3.84 x 10-2

- 9988 DMTF1 4.62 x 10-2 203301_PM_s_at 0.01 9.11 x 10-1

- 6602 SMARCD1 4.76 x 10-2 203183_PM_s_at 0.09 4.62 x 10-2

- 2533 FYB 4.80 x 10-2 224148_PM_at -0.06 3.44 x 10-1

- 5584 PRKCI 4.80 x 10-2 209678_PM_s_at 0.19 2.87 x 10-5

- 86 ACTL6A 4.93 x 10-2 202666_PM_s_at 0.08 2.69 x 10-1

- 4947 OAZ2 4.95 x 10-2 201364_PM_s_at -0.07 1.75 x 10-1

- 8861 LDB1 4.95 x 10-2 35160_PM_at 0.01 7.17 x 10-1

- 8569 MKNK1 4.95 x 10-2 209467_PM_s_at -0.23 3.14 x 10-7

- 8660 IRS2 4.97 x 10-2 209184_PM_s_at -0.07 3.90 x 10-1

- 5321 PLA2G4A 5.00 x 10-2 210145_PM_at -0.09 2.79 x 10-1

- 80854 SETD7 5.00 x 10-2 244653_PM_at -0.08 1.08 x 10-2
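
As a reading aid for Table I1, the following minimal sketch parses rows in the flattened layout used above and flags differentially expressed genes with the stated threshold (adj. p ≤ 0.05). The helper names are hypothetical, and the parser assumes the whitespace-separated layout and the "a x 10b" notation of this table; rows without expression data carry "-" placeholders.

```python
import re

# Matches the table notation, e.g. "3.77 x 10-6" meaning 3.77e-6.
SCI = re.compile(r"(-?\d+(?:\.\d+)?) x 10(-?\d+)")


def parse_sci(text):
    """Convert 'a x 10b' notation from the table into a float."""
    match = SCI.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized number: {text!r}")
    return float(f"{match.group(1)}e{match.group(2)}")


def parse_row(line):
    """Split one Table I1 row into named fields."""
    tokens = line.split()
    row = {
        "cluster": tokens[0],
        "entrez_id": int(tokens[1]),
        "gene": tokens[2],
        "chi2_p": parse_sci(" ".join(tokens[3:6])),
        "probe": None,
        "log2_fc": None,
        "adj_p": None,
    }
    if tokens[6] != "-":  # seed genes without expression data keep None
        row["probe"] = tokens[6]
        row["log2_fc"] = float(tokens[7])
        row["adj_p"] = parse_sci(" ".join(tokens[8:11]))
    return row


def is_deg(row, threshold=0.05):
    """DEG criterion from the table caption: adj. p <= 0.05."""
    return row["adj_p"] is not None and row["adj_p"] <= threshold


# Three rows copied from Table I1.
lines = [
    "1 7311 UBA52 3.77 x 10-6 221700_PM_s_at 0.05 1.71 x 10-3",
    "1 6194 RPS6 8.72 x 10-5 209134_PM_s_at 0.02 5.81 x 10-1",
    "4 3672 ITGA1 4.32 x 10-2 - - -",
]
rows = [parse_row(line) for line in lines]
deg_genes = [row["gene"] for row in rows if is_deg(row)]
```

Keeping the missing expression values as `None` rather than zero avoids treating seed genes as non-significant measurements.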


Appendix J

Figure J1 | Representation of the network clusters 1-6 of the core module "Clique Sum ∪ DIAMOnD ∩ MCODE" from the GSE69683 dataset. Node size is directly dependent on intra-cluster node degree. Gene expression data from the related limma model (severe asthmatics versus moderate asthmatics and controls) has been added to each module gene: up-regulated genes are shown in green, down-regulated genes are shown in red (log2 fold change scale: -0.50 to 0.50). The layout used for every cluster was the Prefuse Force Directed layout from Cytoscape.

[Figure J1 panels: Cluster 1, Cluster 2, Cluster 3, Cluster 4, Cluster 5, Cluster 6. Color scale: Expression (logFC), from -0.50 to 0.50.]


Appendix K

Table K1 | Summary of the associations with asthma for the unclustered genes of the core module "severe asthma versus moderate asthma and controls" of the MODifieR method combination with the higher PASCAL score for GWAS significance ("Clique Sum ∪ DIAMOnD ∩ MCODE", GSE69683).

Gene | Product | Asthma association | Reference

ACVR1B, ACVRL1 | Alk-4, Alk-1 | Increased expression of Alk-1 and Alk-4 in T cells after allergen challenge in mild asthma. | Kariyawasam et al. (2009)

BMPR2 | BMPRII | Decreased expression of BMPRII in mild asthma. | Kariyawasam et al. (2008)

CASP8 | CASP8 | IL-1 cytokine regulation in asthma pathogenesis; potential therapeutic target. | Qi et al. (2017)

CMA1 | CMA1 | (TG)n(GA)m repeat polymorphism in CMA1 is associated with atopic asthma and total serum IgE levels. | Sharma et al. (2005)

CMA1 | CMA1 | Short tandem repeat (STR) polymorphism in CMA1 is strongly associated with atopic asthma. | Hersberger et al. (2010)

CXCL12 | SDF-1 | Increased vascularity of bronchial mucosa in asthma due to SDF-1 expression. | Hoshino et al. (2003)

DOK1 | DOK-1 | Regulatory role of DOK-1 in allergen-induced Th2 inflammation and airway response, in a murine model of asthma; potential therapeutic target. | Lee et al. (2012)

HIF1A | HIF-1α | Role in the development of allergic airway inflammation, expression increased in asthma in rhinitis patients, after challenge; potential therapeutic target. | Huerta-Yepez et al. (2011)

LEPR | Ob-R | Decreased expression of Ob-R in severe asthma, associated with airway remodeling. | Bruno et al. (2009)

LEPR | Ob-R | Increased expression of Ob-R in airway smooth muscle of obese rats with asthma, associated with corticosteroid-resistant asthma. | Wang et al. (2012)

MAPK14 | p38 MAPK | Increased activation of p38 MAPK in severe asthma, associated with corticosteroid-resistant asthma. | Bhavsar et al. (2010)

MMP1 | MMP-1 | MMP-1 polymorphism associated with asthma with persistent airway obstruction. | Huang et al. (2009)

PLA2G4A | cPLA2α | (CA)(n) and (T)(n) repeat polymorphisms linked with cPLA2α overexpression in severe asthma. | Sokolowska et al. (2007)

STAT5B | STAT5B | Association with asthma for SNP rs9909628 of STAT5B. | Ek et al. (2017)

STX3 | STX3 | CpG site cg19764973 from STX3 associated with childhood asthma in DNA methylation study. | Xu et al. (2018)

TGFB1 | TGF-β1 | Association of genetic polymorphisms in TGFB1 with asthma, TGF-β1 modulates asthma severity and airway obstruction. | Chiang et al. (2013)

TGFB1 | TGF-β1 | Increased expression of TGF-β1 in severe asthma. | Ozyilmaz et al. (2009)
