This Word file is an introduction to data visualization, which is a part of data mining. It serves as a good source for understanding the scope of data visualization.
Unit 2
Visualization
Data visualization is the display of information in a graphic or tabular format. Successful visualization requires that the data (information) be converted into a visual format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported.
Alternatively, visualization is the conversion of data into a visual or tabular format so that the characteristics of the data and the relationships among data items or attributes can be analyzed or reported.
Visualization of data is one of the most powerful and appealing techniques for data exploration.
Humans have a well-developed ability to analyze large amounts of information that is presented visually.
Goals:
o to detect general patterns and trends
o to detect outliers and unusual patterns
Examples: graphs and tables
Data sources: satellite photos, sonar measurements, surveys, or computer simulations
2.1 Motivation for visualization
People can quickly absorb large amounts of visual information and find patterns in it.
Visualization makes use of the domain knowledge that is "locked up in people's heads" and accelerates the identification of hidden patterns in data.
o “A picture is worth a thousand words”
Applications of Data Visualization Techniques
o Retail
o Banking
o Government
o Insurance
o Health Care and Medicine
o Telecommunications
o Transportation
o Capital Markets
o Asset Management
Example from software company:
Offers several data mining solutions, depending on the user's needs: IBM Information Warehouse Solutions, IBM Visualizer, Red Brick
Transform data for a consistent and understandable view
Distribute data where needed
Provide high-speed access to the data
Leading company for large-scale data mining
Data spread across multiple databases
Data spread across processors for faster queries
Example: Sea Surface Temperature
Consider Figure 3.2, which shows the Sea Surface Temperature (SST) in degrees Celsius for July 1982. This picture summarizes the information from approximately 250,000 numbers and is readily interpreted in a few seconds. For example, it is easy to see that the ocean temperature is highest at the equator and lowest at the poles.
Representation: Mapping Data to Graphical Elements
The first step in visualization is the mapping of information to a visual format; i.e., mapping the objects, attributes, and relationships in a set of information to visual objects, attributes, and relationships. That is, data objects, their attributes, and the relationships among data objects are translated into graphical elements such as points, lines, shapes, and colors.
A key challenge of visualization is to choose a technique that makes the relationships of interest easily observable.
Important points: representation, selection, arrangement
Techniques
Visualization techniques are often specialized to the type of data being analyzed. Indeed, new visualization techniques and approaches, as well as specialized variations of existing approaches, are being continuously created, typically in response to new kinds of data and visualization tasks.
Visualization techniques can be classified by:
o the number of attributes involved (1, 2, 3, or many)
o whether the data has some special characteristic, such as a hierarchical or graph structure
o the type of attributes involved
o the type of application: scientific, statistical, or information visualization
One way to classify categories of visualization is based on the type of data:
– One dimensional
– Two dimensional
– Three dimensional
– Multi-dimensional
– Hierarchical
– Graph
This chapter uses three categories of visualization techniques:
o visualization of a small number of attributes
o visualization of data with spatial and/or temporal attributes
o visualization of data with many attributes
Visualization techniques for a small number of attributes
o Stem and Leaf Plots
o Histograms
o Two-Dimensional Histograms
o Box Plots
o Pie Chart
o Percentile Plots and Empirical Cumulative Distribution Functions
o Scatter Plots
Visualizing Spatio-temporal Data
o Contour Plots
o Surface Plots
o Vector Field Plots
o Lower-Dimensional Slices
o Animation
Stem and Leaf Plots
Stem and leaf plots can be used to provide insight into the distribution of one-dimensional integer or continuous data. For the simplest type of stem and leaf plot, we split the values into groups, where each group contains those values that are the same except for the last digit. Each group becomes a stem, while the last digits of a group are the leaves. Hence, if the values are two-digit integers, e.g., 35, 36, 42, and 51, then the stems will be the high-order digits, e.g., 3, 4, and 5, while the leaves are the low-order digits, e.g., 1, 2, 5, and 6. By plotting the stems vertically and leaves horizontally, we can provide a visual representation of the distribution of the data.
The set of integers shown in Figure 3.4 is the sepal length in centimeters (multiplied by 10 to make the values integers) taken from the Iris data set. For convenience, the values have also been sorted. The stem and leaf plot for this data is shown in Figure 3.5. Each number in Figure 3.4 is first put into one of the vertical groups (4, 5, 6, or 7) according to its ten's digit. Its last digit is then placed to the right of the colon. Often, especially if the amount of data is larger, it is desirable to split the stems. For example, instead of placing all values whose ten's digit is 4 in the same "bucket," the stem 4 is repeated twice; all values 40-44 are put in the bucket corresponding to the first stem and all values 45-49 are put in the bucket corresponding to the second stem. This approach is shown in the stem and leaf plot of Figure 3.6. Other variations are also possible.
Figure 3.6. Stem and leaf plot for the sepal length from the Iris data set when buckets corresponding to digits are split.
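The grouping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name stem_and_leaf and its split parameter are made up for this example, the first call uses the four values quoted in the text (35, 36, 42, 51), and the second list is a hypothetical sample, not the Iris measurements from Figure 3.4.

```python
from collections import defaultdict


def stem_and_leaf(values, split=False):
    """Return the text lines of a stem and leaf plot for non-negative integers.

    The stem is every digit except the last one; the leaf is the last digit.
    With split=True each stem is used twice: leaves 0-4 first, then leaves 5-9.
    Buckets that receive no values are simply not printed.
    """
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 1 if (split and leaf >= 5) else 0
        groups[(stem, half)].append(leaf)

    lines = []
    for (stem, _half), leaves in sorted(groups.items()):
        lines.append(f"{stem} : {''.join(str(leaf) for leaf in leaves)}")
    return lines


# The two-digit values used as an example in the text.
for line in stem_and_leaf([35, 36, 42, 51]):
    print(line)

# Hypothetical sample (assumption, not real Iris data) with split stems.
for line in stem_and_leaf([43, 44, 44, 46, 47, 49, 50, 51], split=True):
    print(line)
```

For the values from the text, the stems 3, 4, and 5 appear on the left of the colon and the leaves 56, 2, and 1 on the right. With split stems, the hypothetical sample prints stem 4 twice, once for the leaves 0-4 and once for the leaves 5-9.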
Histograms
Stem and leaf plots are a type of histogram, a plot that displays the distribution of values for attributes by dividing the possible values into bins and showing the number of objects that fall into each bin.
Histogram
o Usually shows the distribution of values of a single variable.
o Divide the values into bins and show a bar plot of the number of objects in each bin.
o The height of each bar indicates the number of objects if all bins are of the same width.
o If bins are of different width, then often it is the *area* of the bar that indicates the number of objects in that bin.
o The shape of the histogram depends on the number of bins.
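The point about unequal bin widths can be made concrete with a small calculation. The sketch below assumes the convention that a bar's area (height times width) equals the number of objects in the bin. The helper name bar_heights, the bin edges, and the counts are all hypothetical values chosen for illustration.

```python
def bar_heights(edges, counts):
    """Return bar heights such that height * width equals the count of each bin.

    edges holds the bin boundaries, so bin i spans edges[i] to edges[i + 1].
    """
    widths = [right - left for left, right in zip(edges, edges[1:])]
    return [count / width for count, width in zip(counts, widths)]


# Hypothetical bins of width 1, 1, 2, and 4, each holding four objects.
edges = [0, 1, 2, 4, 8]
counts = [4, 4, 4, 4]

heights = bar_heights(edges, counts)
print(heights)
```

Although every bin holds the same number of objects, the wider bins get shorter bars, so reading the height alone would understate them; multiplying each height by its bin width recovers the counts.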
Example 3.8. Figure 3.7 shows histograms (with 10 bins) for sepal length, sepal width, petal length, and petal width. Since the shape of a histogram can depend on the number of bins, histograms for the same data, but with 20 bins, are shown in Figure 3.8.
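How the number of bins changes the shape of a histogram can be seen with a simple counting routine. The following is a minimal sketch, assuming equal-width bins that span the minimum to the maximum of the data. The name histogram_counts and the sample values are hypothetical and are not the Iris data shown in Figures 3.7 and 3.8.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / num_bins
    counts = [0] * num_bins
    for value in values:
        if width == 0:
            index = 0  # all values are identical, so use the first bin
        else:
            # The maximum value is placed in the last bin rather than past it.
            index = min(int((value - lowest) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample (assumption): mostly small values plus one large value.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

for num_bins in (2, 4, 8):
    counts = histogram_counts(sample, num_bins)
    print(f"{num_bins} bins: {counts}")
    for count in counts:
        print("  " + "#" * count)
```

With two bins the sample looks like one large block and one small block; with eight bins the peak around 3 and the gap before the isolated value 9 become visible.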
Two-Dimensional Histograms
Two-dimensional histograms are also possible. Each attribute is divided into intervals and the two sets of intervals define two-dimensional rectangles of values.
Example 3.9. Figure 3.9 shows a two-dimensional histogram of petal length and petal width. Because each attribute is split into three bins, there are nine rectangular two-dimensional bins. The height of each rectangular bar indicates the number of objects (flowers in this case) that fall into each bin. Most of the flowers fall into only three of the bins, those along the diagonal. It is not possible to see this by looking at the one-dimensional distributions.
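The counting behind a two-dimensional histogram can be sketched as follows. This is an illustration under stated assumptions: histogram_2d is a made-up helper, both attributes use the same number of equal-width bins, and the two lists are hypothetical paired measurements chosen so that the attributes rise together. They are not the petal length and petal width values behind Figure 3.9.

```python
def histogram_2d(xs, ys, bins_per_axis):
    """Count paired values in a bins_per_axis by bins_per_axis grid.

    counts[i][j] is the number of pairs whose x value falls in bin i
    and whose y value falls in bin j.
    """
    def bin_index(value, lowest, highest):
        width = (highest - lowest) / bins_per_axis
        if width == 0:
            return 0
        # The maximum value is placed in the last bin rather than past it.
        return min(int((value - lowest) / width), bins_per_axis - 1)

    x_low, x_high = min(xs), max(xs)
    y_low, y_high = min(ys), max(ys)
    counts = [[0] * bins_per_axis for _ in range(bins_per_axis)]
    for x, y in zip(xs, ys):
        counts[bin_index(x, x_low, x_high)][bin_index(y, y_low, y_high)] += 1
    return counts


# Hypothetical paired attributes (assumption): large x goes with large y.
xs = [1, 2, 2, 4, 5, 5, 7, 8, 10]
ys = [1, 1, 2, 4, 4, 5, 8, 9, 10]

grid = histogram_2d(xs, ys, 3)
for row in grid:
    print(row)
```

In this sample all nine objects land in the three diagonal cells. Each attribute on its own has three objects per bin, so the one-dimensional counts give no hint of that relationship.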