DELINEATING CANCER GENOMICS
THROUGH DATA VISUALIZATION
Project report submitted
in partial fulfilment of the requirement for the degree of
Bachelor of Design
By
Linu George (11020516)
Rupam Das (11020529)
Under the supervision of
Dr. Prasad Bokil
DEPARTMENT OF DESIGN
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
(July 2014 - November 2014)
Approval Sheet
This project report entitled "Delineating cancer genomics through Data Visualization" by Linu
George (11020516) and Rupam Das (11020529) is approved for the degree of Bachelor of
Design.
Examiners
____________________
____________________
____________________
Supervisor(s)
____________________
____________________
____________________
Chairman
____________________
Date: 10th November, 2014
Place: IIT Guwahati
Declaration
We declare that this written submission represents our ideas in our own words and where others'
ideas or words have been included, we have adequately cited and referenced the original. We
also declare that we have adhered to all principles of academic honesty and integrity and have
not misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We
understand that any violation of the above will be cause for disciplinary action by the institute
and can also evoke penal action from the sources which have thus not been properly cited or
from whom proper permission has not been taken when needed.
______________________ __________________
Linu George Rupam Das
11020506 11020529
Date: 10th November, 2014
Certificate
It is certified that the work contained in the project report titled "Delineating cancer genomics
through Data Visualization" by Linu George (11020506) and Rupam Das (11020529), has been
carried out under my supervision and that this work has not been submitted elsewhere for a
degree.
________________________________
Dr Prasad Bokil
Asst. Professor
Department of Design
Indian Institute of Technology
November, 2014
Acknowledgement
First and foremost, we are extremely thankful to our project supervisor Dr Prasad Bokil for his
invaluable support and guidance which made our project work productive, stimulating and
enjoyable. We feel honoured to have worked with him, and owe a great debt of gratitude for
his patience and inspiration.
We are grateful to the Department of Design, IIT Guwahati, for providing us with a rich learning
and working environment, without which many of our ideas would probably not have come to
fruition.
Last but not least, we would like to thank our family members, who have been a constant
source of motivation for everything good that we have attempted to do so far in life.
______________________ _____________________
Linu George Rupam Das
11020516 11020529
ABSTRACT
In spite of advances in technologies for working with data, people spend an undue amount of time
understanding data and manipulating it into holistic visualizations. Data visualization
software for complex datasets, such as those in cancer genomics (which we have taken as our case
study), does not provide effective visualizations for its users. The identification and
characterization of cancer are important areas of research that are based on the integrated
analysis of multiple heterogeneous genomics datasets. In this report, we review the key issues and
challenges associated with cancer genomics through an exploration of data visualization
techniques, interactions and methods, which can in turn advance the state of the art.
1. INTRODUCTION
Data visualization is the representation of raw information in visual form, which helps the
user to perceive and understand a given scenario. We started from the basics of the topic and
tried to explore the possibilities of the visualization domain. As we went through the
references and sources for our project, we found a lack of coherence between the data being
presented and the data being showcased. The raw cancer data provided to us was highly
disorganized, which made it very difficult to derive meaningful visualizations from those
datasets.
The advent of high-throughput technologies has given rise to fruitful research on large-scale
genomic data analysis. Since genomic data is usually very large and complex, visualization
tools are essential for data examination and interpretation [12]. The amount of bio-medical
data available on the Web grows exponentially with time, and the resulting large volume of
data makes manual exploration very tedious. Moreover, the velocity at which this data changes
and the variety of formats in which bio-medical data is published and documented on the Web
make it difficult to access in an integrated form. Finally, the lack of an integrated
vocabulary makes querying more difficult [13].
Data is the key component of a visualization and plays a large role in determining the
effectiveness of the visualization tool. A large set of unorganized, cluttered data leads to the
accumulation of a huge mass of information, which should be handled with the techniques of
visual graphics and visualization. These techniques help to turn huge amounts of data into a
well-organized form, inducing the user to think about the data and encouraging the eye to
compare different pieces of data.
To overcome this problem, we went through many of the existing resources for this research and
identified their similarities and dissimilarities, which led to a large set of trends and
possibilities that are documented in the later chapters.
1.1 Objectives of the project
1. To find out the recent trends in the domain of data visualization.
2. Use this knowledge to brainstorm various possibilities in the domain of cancer.
3. Explore new visualization mediums to depict the same information.
4. Generate low-fidelity concept prototypes for the cancer data set.
5. Explain the scope of work that can be done in later stages.
1.2 Need of data visualization
Visualization is the graphical presentation of information, with the goal of providing the viewer
with a qualitative understanding of the information contents. It is also the process of
transforming objects, concepts, and numbers into a form that is visible to the human eye. By
“information”, we may refer to data, processes, relations, or concepts.
It is also about understanding ratios and relationships among numbers: not about understanding
individual numbers, but about understanding the patterns, trends, and relationships that exist in
groups of numbers.
Seeing and understanding pictures is a natural human instinct, whereas understanding
numerical data takes years of schooling, and even then many people remain uncomfortable with
it. Data visualization shifts the load from numerical reasoning to visual reasoning. Getting
information from pictures is far quicker than looking through text and numbers, which is why
many decision makers would rather have information presented to them in graphical form than
in written or textual form [1].
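To make the shift from numerical to visual reasoning concrete, the following minimal sketch turns a handful of numbers into a text bar chart, so that their relative sizes can be compared at a glance. The helper name `text_bar_chart` and the sample counts are hypothetical and serve only as an illustration; they are not taken from the project data.

```python
def text_bar_chart(values, width=20, mark="#"):
    """Return one line per label, with a bar scaled to the largest value."""
    if not values:
        return []
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar so that the largest value fills the full width.
        bar_len = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {mark * bar_len} {value}")
    return lines


# Hypothetical counts, for illustration only.
counts = {"A": 12, "B": 48, "C": 24}
for line in text_bar_chart(counts):
    print(line)
```

The ratio between the groups (B is four times A, C is twice A) is visible from the bar lengths before any of the numbers are read.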
As we went through various forms of visualization, we also found that data visualization is
not the same as scientific visualization. Scientific visualization uses animation, simulation,
and sophisticated computer graphics to create visual models of structures and processes that
cannot otherwise be seen, or seen in sufficient detail [1].
Data visualization, in contrast, is a way of presenting a given set of data that helps to
minimize the cognitive load on the person who is trying to understand the data.
2. LITERATURE REVIEW
2.1 Data visualisation: An overview
• In spite of advances in technologies for working with data, analysts still spend an
inordinate amount of time diagnosing data quality issues and manipulating data into a
usable form.
• This process of ‘data wrangling’ often constitutes the most tedious and time-consuming
aspect of analysis.
• Though data cleaning and integration are longstanding issues in the database
community, relatively little research has explored how interactive visualization can
advance the state of the art.
• Data visualization is a fairly new and promising field in computer science that uses
computer graphics to reveal the patterns, trends, and relationships in datasets.
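As a minimal illustration of the 'data wrangling' step described above, the sketch below diagnoses two common quality issues (missing values and non-numeric entries) in a few records and then manipulates them into a usable form. The field names, the sample records, and the function names are assumptions made for illustration; they do not come from the cancer data set used in this project.

```python
from collections import Counter

# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"gene": "TP53", "sample": "S1", "mutations": "3"},
    {"gene": "tp53 ", "sample": "S2", "mutations": ""},
    {"gene": "BRCA1", "sample": "S3", "mutations": "n/a"},
    {"gene": "BRCA1", "sample": "S4", "mutations": "5"},
]


def diagnose(records):
    """Count records whose mutation count is missing or non-numeric."""
    issues = Counter()
    for rec in records:
        value = rec["mutations"].strip()
        if not value:
            issues["missing"] += 1
        elif not value.isdigit():
            issues["non_numeric"] += 1
    return issues


def clean(records):
    """Normalize gene labels and keep only records with a numeric count."""
    cleaned = []
    for rec in records:
        value = rec["mutations"].strip()
        if value.isdigit():
            cleaned.append({
                "gene": rec["gene"].strip().upper(),
                "sample": rec["sample"],
                "mutations": int(value),
            })
    return cleaned


def totals_per_gene(records):
    """Aggregate cleaned records into one total per gene, ready to plot."""
    totals = Counter()
    for rec in records:
        totals[rec["gene"]] += rec["mutations"]
    return dict(totals)
```

Calling `diagnose(RAW_RECORDS)` before `clean(RAW_RECORDS)` keeps the quality report separate from the transformation, so the analyst can see what will be dropped before anything is discarded.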
2.2 Background
The history of visualization has been shaped to some extent by the available technology and by
the pressing needs of the time. Its artefacts include primitive paintings on clay, maps on walls,
photographs, and tables of numbers (with the concepts of rows and columns). These are all forms
of data visualization, although they were not called by that name at the time. They
eventually led to new opportunities for the analysis and communication of data using
visualization. The current scenario is very encouraging: it lets us use and practise the
various possibilities of desktop screens and mouse- and keyboard-based systems, which are
becoming increasingly attractive [2]. When talking about graphics, we should recall what are
called graphical entities and attributes. These are the following variables, which decide what
data should be assigned to which part [1]:
Entity: point, line (curve), polyline, glyph, surface, solid, image, text