VAISHNAVI GOWRISANKAR 665591371 04/07/2015 Human Disease Symptom Network Zhou et al., 2014
Dec 22, 2015
Outline
Introduction
Human Symptom Disease network (HSDN)
Construction of HSDN
Results
Performance evaluation of HSDN
Integrating gene disease associations
Integrating shared protein interactions
Diversity of disease manifestation and molecular mechanism
Disease Groups
Discussion
Limitations
Future Directions
Introduction
Networks are used to study the entangled relationships between diseases
Network construction has been widely used to infer comorbidity links between disorders and the disease history of patients1
Disease phenotypic networks based on comorbidity patterns have been used to understand disease progression patterns2
Introduction
The symptoms and signs that a patient presents are often overlooked
Symptoms are the most directly observable characteristics of a disease and the very basis of clinical disease classification
A connection between the shared symptoms and shared genes of 2 diseases could bridge the gap between biological discovery and bedside clinical observations
In this article, large-scale medical bibliographic records (PubMed, including MEDLINE) and the related Medical Subject Headings (MeSH) metadata were used to generate a symptom-based network of human diseases, the HSDN
The link weight between 2 diseases quantifies the similarity of their respective symptoms
By integrating disease-gene association and protein-protein interaction data, the correlations between the symptom similarity of diseases and their degree of shared genes were investigated
Construction of HSDN
Basic datasets: constructing a symptom-based disease network requires a basic taxonomy for diseases and symptoms (MeSH) and a corpus of data from which to extract their relations (PubMed)
The MeSH vocabulary and the PubMed literature database were chosen from several possible options, such as ICD9/10, HPO and OMIM
MeSH is used directly to index all articles in the massive PubMed database
Construction of HSDN
MeSH is designed as a hierarchical structure with general categories (e.g. Animals, Diseases, Phenomena and Processes)
The Diseases category contains the sub-category Symptoms and Signs: terms related to clinical manifestations observed by physicians and perceived by patients
All terms in the Diseases category except 'animal diseases' were included
Construction of HSDN
Finally, 4,442 distinct MeSH disease terms and 322 distinct MeSH symptom terms were used in the PubMed query, which returned 7,109,429 PubMed records
These 7,109,429 records were filtered for the co-occurrence of at least one disease term and one symptom term, leaving 849,103 records
Construction of HSDN
Extracting the disease-symptom relationships from the PubMed bibliographic literature database. The associations between symptoms and diseases are based on their co-occurrence in the MeSH metadata fields of PubMed records
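The filtering and co-occurrence counting described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the term sets, the records and the function name cooccurrence_counts are hypothetical stand-ins for the real MeSH vocabularies and PubMed metadata.

```python
from collections import Counter
from itertools import product

# Hypothetical toy vocabularies (the study used 4,442 disease and 322 symptom terms).
DISEASE_TERMS = {"Asthma", "Migraine Disorders", "Pneumonia"}
SYMPTOM_TERMS = {"Cough", "Dyspnea", "Headache", "Nausea"}

# Hypothetical records: each set holds the MeSH terms attached to one article.
records = [
    {"Asthma", "Cough", "Dyspnea"},
    {"Pneumonia", "Cough", "Humans"},
    {"Migraine Disorders", "Headache", "Nausea"},
    {"Asthma", "Adult"},               # no symptom term, so it contributes nothing
    {"Asthma", "Pneumonia", "Cough"},
]

def cooccurrence_counts(records):
    """Count how often each (symptom, disease) pair shares a record."""
    counts = Counter()
    for terms in records:
        diseases = terms & DISEASE_TERMS
        symptoms = terms & SYMPTOM_TERMS
        # Records lacking a disease or a symptom term yield an empty product,
        # which reproduces the "at least one of each" filter.
        for symptom, disease in product(symptoms, diseases):
            counts[(symptom, disease)] += 1
    return counts

W = cooccurrence_counts(records)
for (symptom, disease), count in sorted(W.items()):
    print(f"{symptom:10s} {disease:20s} {count}")
```

Counting each pair once per record keeps the sketch simple; any weighting by MeSH major/minor topic flags would be an additional assumption.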
Construction of HSDN
Symptom-based disease similarity
• To quantify the relationship between a symptom and a disease, Tf-Idf is used
• Every disease j is represented by a vector of symptoms dj, where wi,j quantifies the strength of the association between symptom i and disease j
• To correct for highly abundant symptoms and publication biases towards certain diseases, Tf-Idf weighting is used instead of the absolute co-occurrence
Construction of HSDN
Term frequency-Inverse document frequency (Tf-Idf)
Wi,j is the strength of the association between symptom i and disease j
N: number of all diseases in the dataset
ni: number of diseases in which symptom i appears
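The formula itself did not survive in the slide text. A Tf-Idf weight consistent with the symbols defined here can be written as below, assuming that W_{i,j} denotes the raw co-occurrence count of symptom i and disease j, that w_{i,j} is the resulting weighted entry of the disease vector, and that the inverse document frequency takes the usual logarithmic form:

```latex
w_{i,j} = W_{i,j} \,\log\frac{N}{n_i},
\qquad
d_j = \left(w_{1,j},\, w_{2,j},\, \dots,\, w_{n,j}\right)
```

Under this form a symptom that appears in every disease (n_i = N) receives weight zero, which is how highly abundant symptoms are discounted.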
Similarity between 2 diseases x and y is defined as the cosine similarity of their respective disease vectors dx and dy
Cosine similarity ranges from 0 (no shared symptoms) to 1 (identical symptoms)
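A small sketch of the weighting and similarity steps, assuming the standard logarithmic Tf-Idf form and hypothetical co-occurrence counts (the disease and symptom names, the counts and the helper names tfidf_vector and cosine are illustrative only):

```python
import math

# Hypothetical co-occurrence counts, indexed as W[symptom][disease].
W = {
    "Cough":    {"Asthma": 8, "Pneumonia": 6},
    "Dyspnea":  {"Asthma": 5, "Pneumonia": 2},
    "Headache": {"Migraine": 9},
    "Fever":    {"Asthma": 1, "Pneumonia": 7, "Migraine": 1},
}
symptoms = sorted(W)
diseases = sorted({d for row in W.values() for d in row})
N = len(diseases)  # number of all diseases in the dataset

def tfidf_vector(disease):
    """Symptom vector of one disease, weighted by log(N / n_i) (assumed form)."""
    vector = []
    for symptom in symptoms:
        n_i = len(W[symptom])  # number of diseases in which the symptom appears
        vector.append(W[symptom].get(disease, 0) * math.log(N / n_i))
    return vector

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

asthma = tfidf_vector("Asthma")
pneumonia = tfidf_vector("Pneumonia")
migraine = tfidf_vector("Migraine")
print(round(cosine(asthma, pneumonia), 3))
print(cosine(asthma, migraine))
```

In this toy data "Fever" occurs with every disease, so its weight is zero and it does not create a spurious link between asthma and migraine.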
Construction of HSDN
A disease network is constructed, in which nodes represent diseases and links represent symptom similarities between diseases
Construction of HSDN
Integrating gene disease association and PPI databases to obtain shared genes/PPI between diseases
Construction of HSDN
The backbone of the HSDN. Highly clustered regions of the network belong to the same broad disease categories
Results
Performance Evaluation of HSDN
Manual evaluation of retrieved co-occurrences
1000 records were randomly selected from the 849,103 PubMed records, and their disease-symptom relationships were extracted with the help of medical experts
The evaluation focused on two issues:
• whether the disease-symptom relationship is direct and not influenced by drugs or coincidental co-occurrence
• whether the reported symptom-disease relations are specific
57% of the records point to one disease, 28.5% point to 2 diseases and only 14.5% point to more than 2
False positives are minimal: only 0.8% (disease x is not related to symptom y)
Results
Performance Evaluation of HSDN
Reliability test for the disease similarity score
Construction of a benchmark disease network (HPO) and comparison with the HSDN
Construction of the HPO (Human Phenotype Ontology) network
• Manually curated database derived from OMIM (Online Mendelian Inheritance in Man, an online catalogue of human genes and genetic disorders)
• Covers phenotypic abnormalities commonly encountered in human monogenic diseases
• MeSH disease terms are typically more general, and therefore several OMIM identifiers may map to one MeSH term
• The final HPO network used to benchmark the HSDN contained 940 diseases that map to both OMIM and MeSH, with 121,945 links indicating shared symptoms
Results
This network is much smaller than the HSDN but arguably of higher quality (OMIM disease identifiers are much more specific than the MeSH terms they map to)
Higher symptom similarity in the HSDN is related to higher edge overlap with the HPO network
The Pearson correlation coefficient between the ratio of shared disease links and the disease similarity is very high (0.96), indicating that the HSDN yields a reliable disease similarity score
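For readers unfamiliar with the statistic, the sketch below computes a Pearson correlation coefficient between binned similarity scores and link-overlap ratios. The numbers are invented for illustration and are not the study's data; only the formula is standard.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical values: midpoints of similarity bins and, for each bin,
# the fraction of HSDN links that also appear in the benchmark network.
similarity_bins = [0.1, 0.3, 0.5, 0.7, 0.9]
overlap_ratio = [0.05, 0.12, 0.20, 0.35, 0.50]

r = pearson(similarity_bins, overlap_ratio)
print(round(r, 2))
```

A value close to 1 means the overlap ratio rises almost linearly with the similarity score, which is the pattern the slide reports.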
Results
Shared symptoms indicate shared genes between diseases
Integrated 3 genotype-phenotype databases and constructed a Human Disease Network (HDN) as described by Goh et al.3
In the HDN, 2 diseases are connected if they share a gene
Comparing the HSDN with the HDN, the overlapping link ratio shows a strong positive correlation with disease similarity
The overlapping link ratio is the fraction of disease pairs with both shared symptoms and shared genes among all disease pairs with shared symptoms
It can be inferred that diseases with more similar symptoms are more likely to have common gene associations
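The overlapping link ratio defined above reduces to a set intersection. A minimal sketch, with hypothetical disease labels and a hypothetical helper name overlapping_link_ratio:

```python
def overlapping_link_ratio(symptom_links, gene_links):
    """Fraction of symptom-sharing disease pairs that also share a gene."""
    if not symptom_links:
        return 0.0
    return len(symptom_links & gene_links) / len(symptom_links)

def pair(a, b):
    """Undirected disease pair, so that (a, b) and (b, a) compare equal."""
    return frozenset((a, b))

# Hypothetical links: pairs with shared symptoms (HSDN) and shared genes (HDN).
symptom_links = {pair("A", "B"), pair("A", "C"), pair("B", "C"), pair("C", "D")}
gene_links = {pair("B", "A"), pair("C", "D"), pair("D", "E")}

print(overlapping_link_ratio(symptom_links, gene_links))
```

To reproduce the trend on the slide, the same ratio would be computed separately for groups of links binned by symptom similarity score.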
Shared symptoms indicate shared protein interactions
Not only gene associations but also close interactions of proteins are considered
Integrated 5 publicly available PPI databases into 1 binary PPI network
Constructed disease networks in which 2 diseases are linked if they share first- and second-order PPI interactions
Proteins associated with the same human disease, disease category or phenotype tend to interact with each other
The HSDN focuses only on symptoms and includes all disease categories, thereby providing robust evidence that interacting proteins between diseases are also connected to similar higher-level manifestations
Results
High symptom similarity strongly correlates with shared genes as well as with first- and second-order protein interactions, suggesting a general relationship between phenotypic similarity on the one hand and path lengths in the PPI network on the other
To test this, the shortest path (Dijkstra's algorithm) is calculated for all protein pairs, together with the minimum shortest PPI path length between each disease pair
The higher the symptom similarity, the shorter the PPI network distance between diseases
Results
Dijkstra's algorithm is used to find all shortest paths in the PPI network
To quantify the PPI distance between disease pairs, the single linkage distance DSL is used
DSL is the minimum of all shortest paths between related proteins
For 2 diseases x and y with corresponding related protein sets Px and Py, the single linkage distance is given by
DSL(x, y) = min { D(pi, pj) : pi in Px, pj in Py }
D(pi, pj) is the shortest path length between 2 proteins pi and pj
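The single linkage distance can be sketched as below. The PPI network is binary (unweighted), so this sketch swaps in breadth-first search, which returns the same path lengths as Dijkstra's algorithm when every edge has weight 1. The protein identifiers and the toy network are hypothetical.

```python
import math
from collections import deque

# Hypothetical binary PPI network as an adjacency mapping.
ppi = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3"},
    "P5": set(),  # isolated protein
}

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

def single_linkage_distance(graph, proteins_x, proteins_y):
    """Minimum shortest-path length between any protein of x and any of y."""
    best = math.inf
    for protein in proteins_x:
        distance = shortest_path_lengths(graph, protein)
        for other in proteins_y:
            best = min(best, distance.get(other, math.inf))
    return best

print(single_linkage_distance(ppi, {"P1"}, {"P4"}))        # 3
print(single_linkage_distance(ppi, {"P1", "P3"}, {"P4"}))  # 1
```

Disease pairs whose proteins lie in disconnected parts of the network get an infinite distance here; how such pairs are treated in the study is not stated on the slide.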
Results
Diversity of disease manifestations and molecular mechanisms
Pleiotropism and genetic heterogeneity cause discrepancies between diverse clinical manifestations and the underlying cellular mechanisms
To understand these complex relations, genome components are mapped to intermediate phenotype components and environmental factors
To analyze the relation between the molecular and phenotypic diversity of diseases, the SGPDN is constructed
Shared genes, proteins disease network (SGPDN): an integrated disease network that combines phenotypic relations based on symptom similarity with shared molecular mechanisms based on protein interactions
Results
The HSDN is filtered for significant links with similarity score > 0.1
All disease links supported by either shared genes or 1st/2nd order protein interactions are retained
Betweenness and node diversity are used to measure disease diversity in this network
Betweenness is a centrality measure quantifying how many shortest paths run through a node
Diversity ϕ of node j is based on the node bridging coefficient
k(i) is the degree of node i, N(i) denotes its neighborhood
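The slide's formula for ϕ is not reproduced in the text, so the sketch below does not compute ϕ itself. It only illustrates the bridging coefficient that ϕ is said to build on, assuming the commonly used definition: the inverse degree of a node divided by the sum of the inverse degrees of its neighbours. The toy graph is hypothetical.

```python
# Hypothetical network: two triangles (A, B, C) and (E, F, G) joined through D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D", "F", "G"},
    "F": {"E", "G"},
    "G": {"E", "F"},
}

def bridging_coefficient(graph, node):
    """(1 / k(node)) / sum of 1 / k(j) over neighbours j (assumed definition)."""
    neighbours = graph[node]  # N(node), the neighbourhood
    if not neighbours:
        return 0.0
    inverse_degree = 1 / len(neighbours)  # 1 / k(node)
    neighbour_sum = sum(1 / len(graph[j]) for j in neighbours)
    return inverse_degree / neighbour_sum

coefficients = {node: bridging_coefficient(graph, node) for node in graph}
for node, value in sorted(coefficients.items()):
    print(node, round(value, 3))
```

Node D, which connects the two otherwise separate groups, receives the highest value in this example, matching the intuition that bridging nodes link distinct regions of a network.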
Results
A strong positive correlation was found between the 2 quantities used to measure disease diversity in the SGPDN and the corresponding maximum diversities of disease-related genes in the PPI network
These results demonstrate that a disease with diverse clinical manifestations will typically also have more diverse underlying cellular network mechanisms
Results
Disease Groups
The interrelationships between classes of diseases were studied
In the SGPDN, diseases within the same category were found to form clear, highly interconnected communities, e.g. metabolic diseases and digestive system diseases
Exceptions include bacterial and viral diseases, which link to all the communities
Discussion
Results indicate strong associations between the symptom similarity of diseases and their shared genes and PPIs
Clear correspondence between the diversity of the clinical manifestations of diseases and the underlying diversity in their cellular mechanisms
Individual level disease phenotypes (symptoms) and molecular level disease components (genes/PPIs) show robust correlations, even though their direct associations are influenced by complicated intermediate factors
Discussion
Observed correlations between clinical manifestations and molecular mechanisms of disease can be highly valuable for functional annotations of genomics and reveal regularities between different disease categories
Another promising use of this broad data across disease categories is a comparison between genetic and infectious diseases
Symptoms also play a crucial role in drug-related research, as most FDA-approved drugs are palliative (they treat symptoms rather than targeting disease-specific genes or pathways)
Limitations
The MeSH vocabulary is relatively old and rigid, with only annual updates
This could limit the extent to which the identified associations capture the latest research results of the rapidly evolving field of medicine
Future Directions
How can full-text analysis of large-scale databases be improved to increase the accuracy of the search?
How can the distinction between symptoms and diseases, which is not yet well understood, be sharpened?
How can techniques be developed that automatically extract information from clinical records?
How can symptom similarity scores be used for gene prioritization and target identification in viral and bacterial infections?
References
1) Rzhetsky, A., Wajngurt, D., Park, N. & Zheng, T. Probing genetic overlap among complex human phenotypes. Proc. Natl Acad. Sci. USA 104, 11694–11699 (2007)
2) Hidalgo, C. A., Blumm, N., Barabasi, A. L. & Christakis, N. A. A dynamic network approach for the study of human phenotypes. PLoS Comput. Biol. 5, e1000353 (2009)
3) Goh, K. I. et al. The human disease network. Proc. Natl Acad. Sci. USA 104, 8685–8690 (2007)
4) Zhou et al. Supplementary Methods. Nature Communications, 1–22 (2014)
5) Wikipedia
Definitions
Genotype – The genetic makeup of a cell, an organism, or an individual usually with reference to a specific characteristic under consideration
Phenotype – The outward appearance of an organism, the expression of genotype in the form of traits that can be seen or measured.
HSDN – Human Symptoms Disease Network
MeSH – Medical Subject Headings – defined by experts and offers a comprehensive vocabulary across all disease categories
PPI – Protein-protein interaction
Definitions
Polygenicity – multiple gene inheritance influencing the phenotypic trait
Pleiotropism – A single gene affects a number of phenotypic traits in the same organism. These affected traits often seem unrelated to each other
Genetic heterogeneity – A single phenotype or genetic disorder can be caused by any one of a number of alleles or loci. This is in contrast to pleiotropism