Dhaanish Ahmed College of Engineering
Padappai
Department of Information Technology
Sub Code & Name : CS1004 – Data Warehousing and Mining
UNIT 1
2 mark
1.Define warehousing?
2.Distinguish between data warehouse and data mart?
3.List out the components of data warehouse?
4.Define data cube? Give an example?
5.What is fact table and dimension table?
6.Compare OLAP and OLTP?
7.What is metadata?
8.What is the need for OLAP?
9.Define star schema, snowflake schema and fact constellation?
10.What is starnet query model?
11.Write down the applications of data warehousing.
12. When is a data mart appropriate?
13. What is a concept hierarchy? Give an example.
14.What are the uses of statistics in data mining?
15.Name some advanced database systems?
16.Name some specific application-oriented databases?
17.Define Relational Database?
18.Define Transactional Database?
19.Define Spatial database?
20.What is Temporal Database?
21.What is Time series database?
22.What is legacy database?
23.What is learning?
24.Why is machine learning done?
25.Give the Components of a learning system?
26.Give some factors for evaluating performance of learning system?
27.What are the steps in data mining process?
28.Define data mart?
29.List the merits of a data modeling tool?
30.What is a data warehouse performance issue?
31.What are the types of performance issues?
32.Why do you need a data warehouse life cycle process?
33.What are the merits of a data warehouse?
34.What are the steps in the data warehouse life cycle process?
35.What are the characteristics of a data warehouse?
36.List some of the data warehouse tools?
37.What is an end-user data access tool?
38.Define MOLAP?
39.Define HOLAP?
40.Define ROLAP?
41.What is an ad hoc query tool?
42.List a few data mining applications?
43.Define supervised learning scheme?
44.Define unsupervised learning scheme?
45.What is the necessity of data mining?
46.Draw the flow chart of database Evolution?
47.Define OLTP?
48.Expand OLAP?
49. Expand OLTP?
50.What is a data stream?
51.What is a data tomb?
52.What is data archaeology?
53.What is data dredging?
54.What is KDD?
55.What is a data warehouse server?
56.Point out a few advanced databases?
57.What is an attribute?
58.Define Tuple?
59.What is SQL?
60.Define heterogeneous database?
61.Define WWW?
62.List the applications of WWW?
63.What is web log mining?
64.Define Outlier Analysis?
65.Define Evolution Analysis?
66.Define Technical metadata?
67. Define Business metadata?
68.What is a Distributive measure?
69. What is an Algebraic measure?
70. What is a Holistic measure?
71.What is roll-up?
72.Define Drill down operation?
73.Define slice?
74. Define Dice?
75.What is pivot?
76. Define Drill within operation?
77. Define Drill across operation?
78. Define Drill through operation?
79.What is top down view?
80. What is Data source view?
81. What is Data warehouse view?
82. What is Business query view?
83.Define Data Cube?
84.What is a cube operator?
85.What is No Materialization?
86. What is Full Materialization?
87. What is Partial Materialization?
88.What are the merits of bitmap indexing?
89. What are the merits of join indexing?
90.Define MDDB?
91.What is Query driven approach?
92.What is Update driven Approach?
93.What is a Dimension?
94.Define Fact?
95.Define Cuboid?
96.Define Aggregation?
97.What is a statistical database?
98.What is CRM?
99.What is the use of Load?
100.Define Refresh?
4 mark
1.What is the difference between view and materialized view?
2. Explain the Difference between star and snowflake schema?
3. Mention the various tasks to be accomplished as part of data pre-processing.
4. Mention the advantages of Hierarchical clustering?
5.What is the difference between view and materialized view?
6. Explain the Difference between star and snowflake schema?
7. What is Data Warehouse Metadata?
8. What is Dimensionality Reduction?
9. What is Concept Description?
10.What is the difference between supervised and unsupervised learning schemes?
11.Discuss Join Indexing?
12. Discuss Bitmap Indexing?
13. Explain the steps in knowledge discovery?
14.Give short notes on database Evolution?
15. Give short notes on data warehouse?
16.Explain data stream?
17.Describe KDD?
18.Explain Relational database?
19.Give the importance of the ER model?
20.What are the functions of a relational database?
21.How can a customer be analyzed by a data mining system?
22.Differentiate data warehouse and data mart?
23. Differentiate data warehouse vs heterogeneous DBMS?
24. Differentiate data warehouse vs operational DBMS?
25.Discuss Star schema?
26.Discuss Snowflake Schema?
27.Discuss Fact Constellation?
28.Discuss a Starnet query model?
29.Describe top down view?
30. Describe Data source view?
31. Describe Data warehouse view?
32. Describe Business query view?
33.Explain Enterprise warehouse?
34.Explain Data mart?
36.Describe Virtual warehouse?
37.Explain ROLAP?
38. Explain MOLAP?
39. Explain HOLAP?
40. Explain Specialized SQL server?
41.Discuss efficient processing of OLAP queries?
42.Compare OLAM and OLAP?
43.Explain the Curse of Dimensionality?
44.Define Full and Iceberg cube?
45. Define closed and shell cube?
46.Explain Knowledge mining from data?
47.Explain Knowledge Extraction?
48.Describe data analysis?
49.Explain Data archaeology?
50.Discuss data dredging?
51.Describe Database?
52.Explain Information Repository?
53.Explain Knowledge base?
54.Discuss Data mining Engine?
55.Describe Pattern Evaluation Module?
56.Explain User Interface?
57.Discuss data discrimination?
58.Explain Mining different kinds of knowledge in DB?
59.Discuss Interactive mining of Knowledge at multiple levels of abstraction?
60.Describe Incorporation of background knowledge?
61. Discuss data mining Query language?
62.Explain ad hoc Data mining?
63. Discuss Presentation and visualization of data mining results?
64. Describe Handling Incomplete data?
65. Explain Scalability of data mining algorithms?
66. Describe Efficiency of data mining?
67.Explain Parallel mining algorithm?
68. Explain Distributed mining algorithm?
69. Explain Incremental mining algorithm?
70.Discuss Spreadsheets?
71.Describe Dimension table?
72.Explain Fact table?
73.Give the cube definition statement?
74.Interpret the measures for data cube?
75.Discuss Schema Hierarchy?
8 mark
1.What is overfitting and what can you do to prevent it?
2. In classification trees, what are surrogate splits, and how are they used?
3. What is the objective function of the K-Means algorithm?
4.What are the differences between the three main types of data usage: information processing,
analytical processing and data mining?
5.Discuss the motivation behind OLAP mining.
6. Discuss the various types of metadata?
7. Categorize OLAP tools?
8.Explain various data mining issues?
9.Describe Indexing technique of OLAP with example?
10. Give short notes on database Evolution with a neat flow chart?
11.Give the importance of data mining?
12.Give the functions of OLAP?
13. What are the major components of data mining?
14.Give the importance of pattern evaluation model?
15.How is decision making performed in a data warehouse?
16.Is a data warehouse suited for OLAP? Explain in brief.
17.Explain object relational database in detail?
18.What is raster format? Explain its use with an example?
19.Describe the role of DBMS in data mining?
20.Explain Information Delivery System?
21.Discuss Access Tools?
22.Explain Conceptual modeling of data warehouse?
23.Explain Three categories of measures?
24.Explain Business Analysis Framework?
25.Discuss 4 views regarding the design of a data warehouse?
25.Describe data warehouse design process?
26.Explain 3 data warehouse Models?
27.Describe Efficient Computation of Data cube?
28.Discuss data warehouse Back end tools and utilities?
29.Explain Data warehouse applications?
30.Describe the architecture of OLAM?
31.Discuss in detail about the Lattice of Cuboid?
32.How many cuboids are there in an n-dimensional data cube?
33. Describe Full cube?
34. Describe closed cube?
35. Describe Iceberg cube?
36. Describe shell cube?
37.Explain Significance Constraint?
38. Explain Probe Constraint?
39. Explain Gradient Constraint?
40.Discuss on Information System?
41.Describe Temporal Database?
42. Describe Time series Database?
43. Describe Sequence Database?
44. Describe Spatial Database?
45. Describe SpatioTemporal Database?
46. Describe Text Database?
47. Describe Multimedia Database?
48.Give short notes on Heterogeneous DBMS?
49. Give short notes on Legacy Database?
50.Explain Classification of Data mining Systems?
16 mark
1.Enumerate the building blocks of a data warehouse. Explain the
importance of metadata in a data warehouse environment. What are the
challenges in metadata management?
2. Distinguish between the entity-relationship modeling technique
and dimensional modeling. Why is the entity-relationship modeling
technique not suitable for the data warehouse?
3. Create a star schema diagram that will enable FIT-WORLD GYM
INC. to analyze their revenue. The fact table will include – for every
instance of revenue taken – attribute(s) useful for analyzing
revenue. The star schema will include all dimensions that can be
useful for analyzing revenue. Formulate query: “Find the
percentage of revenue generated by members in the last year”.
How many cuboids are there in the complete data cube?
4.Briefly compare the following concepts. Explain your points with an example
(i) Snowflake schema, fact constellation, starnet query model
5. Discuss the typical OLAP operations with an example.
6.Discuss how computations can be performed efficiently on data cubes.
(ii) Write short notes on data warehouse metadata.
7. Describe the multidimensional data model. How is it used in data warehousing?
8. Explain the architecture of data warehouse with a neat sketch?
9. Explain the operations performed on data warehouse with examples?
10. Distinguish between data mining and data warehousing?
11. Discuss various data mining issues with some examples?
12.Explain Data mining Functionalities?
13.Explain different types of data repositories on which mining can be performed?
14.What are the major components of data mining? Explain with a neat Flowchart?
15.Explain SQL in detail?
16.Discuss in detail OLAP server Architectures?
17.Explain data warehouse Implementation?
18.Describe Selected Computation of Cuboids?
19.Explain Efficient methods for data cube computation?
20.Explain Optimization technique?
21.Discuss Multiway array aggregation for full cube computation?
22.Explain BUC Algorithm?
23.Discuss Star cubing?
24.Write an algorithm for Shell Fragment computation?
25.Discuss Constrained Gradient Analysis Data cube?
UNIT 2
2 mark
1.What is the need of data preprocessing?
2.Define smoothing and Binning?
3.What is the need for discretization in data mining?
4.What is concept hierarchy? Give an example?
5.Define DMQL?
6.What are functional components of GUI in data mining?
7.Define task relevant data?
8.What is meant by concept description?
9.What is data generalization?
10.How to perform class comparison?
11. Define Data Mining?
12.What is the main goal of statistics?
13.What are the factors to be considered while selecting samples in statistics?
14.Define data cleaning?
15.Define Data integration?
16.Define Data Selection?
17. Define Data Transformation?
18. What is pattern evaluation?
19.What is Knowledge Presentation?
20.List the steps in preprocessing?
21.What is visualization?
22.Name some conventional visualization techniques?
23.Give the features included in modern visualization techniques?
24.Define Conventional visualization?
25.Define Spatial visualization?
26.Define Descriptive Data mining?
27.What is Predictive data mining?
28.What is data generalization?
29.Define attribute oriented induction?
30.What is Jack knife?
31.What is Bootstrap?
32.Give the views of Statistical approach?
33.What are the assumptions of Statistical approach?
34.What is the use of Probabilistic graphical model?
35.Give the Importance of Probabilistic graphical model?
36.Define Deterministic model?
37.Define System?
38.Define Model?
39.How to choose the best model?
40.Principles of Qualitative Formulation?
41.What is linear regression?
42.State the types of linear model?
43.What is the use of linear model?
44.What are the goals of time series analysis?
45.What is smoothing?
46.What is lag?
47.What do you mean by concept hierarchy?
48.Define inconsistency cleaning?
49.What is Column level cleaning?
50.Define Descriptive data summarization?
51.What is a missing value?
52.Define Normalization?
53.What is attribute subset selection?
54.Define Dimensionality reduction?
55. Define Numerosity reduction?
56.What is a Central Tendency?
57.Define mean?
58.Define Mode?
59.Define Median?
60.What is a mid range?
61.What is Dispersion of data?
62.Define IQR?
63.What is variance?
64.Define range?
65. List the data transformation operations?
66.Define Quartiles?
67.What is weighted arithmetic mean?
68.What is Unimode?
69. What is Bimode?
70. What is Trimode?
71.Define Multimode?
72.Give the empirical relation for unimodal frequency?
73.What is Dispersion?
74.Define Standard Deviation?
75.What is 5 number summary?
76.What is a boxplot?
77.Define first Quartile?
78.Define Third Quartile?
79.What are Whiskers?
80.Give the formula for standard deviation?
81. Give the formula for variance?
82.What is discrepancy detection?
83.Define Unique rule?
84.Define Consecutive rule?
85.Define Null rule?
86.What is a data scrubbing tool?
87.What is data auditing tool?
88.Define data migration tool?
89.What is ETL?
90.Define Redundancy?
91.What is correlation analysis?
92.Define Correlation coefficient?
93.Define attribute construction?
94.Define Discrete wavelet Transform?
95.Define Sampling?
96.What is comparison?
97.What is Discrimination?
98.Define attribute removal?
99.What is data focusing?
100.What is attribute generalization control?
4 mark
1.Distinguish between concept description and OLAP?
2.What is quantitative rule?
3.What is attribute relevance analysis?
4.What do you mean by attribute oriented induction?
5.List out the methods for implementing class comparison?
6. Write a short note on regression?
7. Write a short note on correlation?
8.Discuss Parametric methods?
9.Explain Non Parametric methods in detail?
10.Explain Data Generalization?
11.Describe Concept Hierarchy generation?
12.Explain Data mining Primitives?
13.Explain attribute oriented induction?
14.Discuss on Descriptive data summarization?
15.Explain Histogram?
16.Discuss Quantile plot?
17.Describe Q-Q plot?
18.Explain Scatter plot?
19.Discuss Loess curve?
20.Describe Missing values?
21.Explain Noisy data?
22.Describe Binning?
23.Explain Regression?
24.Discuss Clustering?
25.Describe the Mean,median,mode,mid range?
26.Explain IQR ,variance, quartiles?
27.Discuss Discrepancy detection?
28.Explain data scrubbing tools?
29.Discuss data Auditing tools?
30.Explain data migration tools?
31.Discuss Entity identification problem?
32.Explain Correlation analysis?
33.Explain smoothing?
34.Describe Aggregation?
35.Discuss Generalization?
36.Explain Normalization?
37.Describe Attribute Construction?
38.Discuss Min max Normalization?
39.Explain z-score Normalization?
40.Discuss Normalization by decimal scaling?
41.Describe Data cube aggregation?
42.Explain attribute subset selection?
43.Discuss on Dimensionality reduction?
44.Explain Numerosity reduction?
45.Explain discretization?
46.Explain concept hierarchy generation?
47.Describe stepwise forward selection?
48. Describe stepwise backward Elimination?
49. Describe the combination of forward selection and backward elimination?
50.Discuss on decision tree induction?
51.Explain DFT?
52.Explain Hierarchical pyramid algorithm?
53.Describe Orthonormal?
54.Give short notes on PCA?
55.Discuss Log Linear Models?
56.Describe Equal Width histogram?
57. Describe Equal Frequency histogram?
58.What is V-Optimal?
59.Describe MaxDiff Histogram?
60.What are the 3 data clusters?
61. Describe Multidimensional histogram?
62.Define Centroid distance?
63.Describe Multidimensional index trees?
64.Explain SRSWOR?
65.Describe SRSWR?
66.Explain Cluster sample?
67.Discuss Stratified Sample?
68.List the merits of Sampling?
69.Discuss Top down Discretization?
70.Explain Splitting?
71. Discuss bottom up Discretization?
72.Discuss Merging?
73.Draw a flow chart for stepwise forward selection?
74.Draw a flow chart for stepwise backward Elimination?
75.Draw a flow chart for the combination of forward selection and backward elimination?
8 mark
1.Mention the various tasks to be accomplished as part of data pre-processing.
2. What is overfitting and what can you do to prevent it?
3.Explain the 5 steps in the Knowledge Discovery in Databases (KDD)
process.
4.Discuss in brief the characterization of data mining algorithms.
5.Discuss in brief important implementation issues in data mining.
6. List and discuss the various data mining primitives?
7. Distinguish between statistical inference and exploratory data analysis.
8. Write a short note on machine learning. What is supervised and unsupervised learning?
9. Write a short note on regression and correlation?
10. Discuss on Descriptive data summarization with examples?
11.Explain Graphic display of basic descriptive data summaries?
12.Explain data cleaning?
13.Describe data cleaning as a process?
14.Explain the measures of Central tendency?
15.Describe the measures of Dispersion of data?
16.Explain about data integration?
17. Describe Attribute Construction with example?
18.Discuss Min max Normalization with example?
19.Explain z-score Normalization with example?
20.Discuss Normalization by decimal scaling with example?
21.Discuss about data transformation?
22.Explain about data reduction?
23.Discuss the basic heuristic methods of attribute subset selection?
24.Explain Wavelet Transforms?
25.Explain Principal Component Analysis?
26.Explain Histogram with examples?
27.Discuss Quantile plot with examples?
28.Describe Q-Q plot with examples?
29.Explain Scatter plot with examples?
30.Discuss Loess curve with examples?
31.Describe Missing values with examples?
32.Explain Noisy data with examples?
33.Describe Binning with examples?
34.Explain Regression with examples?
35.Discuss Clustering with examples?
36.Apply the binning method for data smoothing to 4,8,15,21,21,24,25,28,34?
37.Discuss that data integration is the detection and resolution of data value conflicts?
38.How can we find a good subset of original attributes?
39.Justify that Wavelet Transforms can be applied to multidimensional data?
40.How can we reduce the data volume by choosing alternative smaller forms of data
representation?
41.How are the buckets determined and the attribute values partitioned in Histogram?
42.Explain Discretization by Intuitive partitioning?
43.Explain χ² (chi-square) merging?
44.Explain 3-4-5 rule with an example?
45.Describe the specification of a partial ordering of attributes explicitly at schema level by users
or experts?
46.Explain the specification of a portion of a hierarchy by explicit data grouping?
47.Discuss on the specification of a set of attributes, but not of their partial
ordering?
48.Discuss issues to consider during data integration?
49.Data quality can be assessed in terms of accuracy, completeness and consistency. Propose
two other dimensions of data quality?
50.Suppose a group of 12 sales price records has been sorted as follows
5,10,11,13,15,35,50,55,72,92,204,215
Partition them into 3 bins by
i)Equidepth partition
ii)Equal width partitioning
iii)Clustering
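A minimal sketch of parts (i) and (ii) of question 50 above, in Python. The function names equal_depth_bins and equal_width_bins are illustrative and not taken from any library, and the sketch assumes the number of records divides evenly by the number of bins. Part (iii), clustering, depends on the clustering method chosen and is not shown.

```python
def equal_depth_bins(values, k):
    """Split sorted values into k bins holding the same number of records.

    Assumption: len(values) is divisible by k.
    """
    data = sorted(values)
    size = len(data) // k
    return [data[i * size:(i + 1) * size] for i in range(k)]


def equal_width_bins(values, k):
    """Split values into k bins whose intervals all have the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # The maximum value would land at index k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins


prices = [5, 10, 11, 13, 15, 35, 50, 55, 72, 92, 204, 215]
print(equal_depth_bins(prices, 3))
print(equal_width_bins(prices, 3))
```

With 3 bins, the equal-depth partition holds 4 records per bin, while the equal-width partition uses intervals of width (215 - 5) / 3 = 70, so most records fall into the first bin.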
16 mark
1.Explain the need and steps involved in data preprocessing?
2.List out the primitives for specifying a data mining task?
3.Describe how concept hierarchies are useful in data mining?
4.What are the various issues addressed during data integration?
5.Write in detail about attribute oriented induction with algorithm?
6.Describe the various descriptive statistical measures for data mining?
7.Explain various methods of data cleaning in detail?
8. Give an account on data mining Query language?
9. How is Attribute-Oriented Induction implemented? Explain in detail.
10. Write and explain the algorithm for mining frequent item sets without candidate generation.
Give a relevant example.
11. With relevant examples discuss the role of statistics in data mining?
12. Enumerate and discuss various statistical techniques and methods for
data analysis?
13. For class characterization, what are the main differences between a data cube based
implementation and a relational implementation such as attribute-oriented induction?
14.Explain Smoothing Techniques?
15.Explain Data Transformation in detail?
16.Explain Normalization in Detail?
17.Discuss Data Reduction in detail?
18.Describe Parametric and Non Parametric methods in detail?
19. Explain Data Generalization and Concept Hierarchy generation?
20.Describe the Alternative method for Data generalization and Concept Description?
21.Given the one-dimensional data set X={-5,0,23.0,17.6,9.23,1.11}, normalize the data set using
i) Min-Max Normalization [0,1]
ii) Min-Max Normalization [-1,1]
iii) Standard Deviation Normalization
22.Explain Designing the GUI based On DMQL?
23.A data set for analysis includes X={7,12,5,18,9,13,12,19,7,12,12,13,3,4,5,13,8,7,6} Find
Mean, median, mode and Standard Deviation for X?
24.Give the Graphical summarization of the data set X using boxplot representation. Find