
Jan 11, 2016


Bibliomining: An Introduction


Outline

• Introduction
• Bibliomining Process
• Example Applications
• Placing Bibliomining in Context
• A Research Agenda to Advance Bibliomining


Origins and Definition of Bibliomining

• “Bibliometrics” + “data mining”
– Bibliometrics focuses on the creation of works
– Data mining (Web usage mining) focuses on the access of works

• The application of data mining and bibliometric tools to data produced from library services

• Gain a better understanding of library user communities
– Frequencies and aggregate measures hide underlying patterns

• The combination of data mining, bibliometrics, statistics, and reporting tools used to extract patterns of behavior-based artifacts from library systems for aiding decision-making or justifying services


Bibliometrics

• Traditional bibliometrics is based on the quantitative exploration of document-based scholarly communication

• Data for bibliometrics
– Works: authors, collections
– Connections: citations, authorship, common terms, other aspects of the creation and publication process

• Allows researchers to understand the context in which a work was created, the long-term citation impact of the work, and the differences between fields in their scholarly output patterns


Data for Bibliometrics


Bibliometrics (Cont.)

• Frequency-based analysis, visualization, data mining
– Frequency of authorship in a subject, commonality of words used, and discovery of a core set of frequently cited works
– Integrating the citations between works allows for very rich exploration of relations between scholars and topics
– Linkages between works are used to aid in automated information retrieval and visualization of scholarship and the social networks between those involved with the creation process
– Many newer bibliometric applications involve Web-based resources and hyperlinks that enhance or replace traditional citation information


Social Network


User-based Data Mining

• One popular area: the examination of how users explore Web spaces (Web usage mining)
– Focus on accesses of different Web pages by a particular user (or IP address)
– Patterns of use are discovered through data mining and used to personalize the information presented to the user or improve the information service

• In user-based data mining, the links between works come from a commonality of use
– For example, if one user accesses two works during the same session, then another user who views one of those works might also be interested in the other
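
The commonality-of-use idea above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the slides: the session data, the work identifiers, and the helper names `co_access_counts` and `related_works` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical session log: each session is the set of works one user accessed.
sessions = [
    {"work_a", "work_b"},
    {"work_a", "work_b", "work_c"},
    {"work_b", "work_c"},
    {"work_a", "work_d"},
]

def co_access_counts(sessions):
    """Count how often each pair of works is accessed in the same session."""
    counts = Counter()
    for works in sessions:
        for pair in combinations(sorted(works), 2):
            counts[pair] += 1
    return counts

def related_works(work, counts, top_n=3):
    """Return the works most often co-accessed with `work`, most frequent first."""
    scores = Counter()
    for (first, second), n in counts.items():
        if first == work:
            scores[second] += n
        elif second == work:
            scores[first] += n
    return [other for other, _ in scores.most_common(top_n)]

counts = co_access_counts(sessions)
suggestions = related_works("work_a", counts)
```

Here a user viewing `work_a` would be pointed first to the work that most often shared a session with it. A real service would also need a rule for what counts as a session.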


Data for User-Based Data Mining

Links between works that result from the users


Data for Anonymized Community-Based Web Usage Mining

Demographic Surrogate


Bibliomining Process


Overview

• Determining areas of focus
• Identifying internal and external data sources
• Collecting, cleaning, and anonymizing the data into a data warehouse
• Selecting appropriate analysis tools
• Discovery of patterns through data mining and creation of reports with traditional analytical tools
• Analyzing and implementing the results


Determining Areas of Focus

• Might come from a specific problem in the library or may be a general area requiring exploration and decision-making

• Directed data mining: problem-focused
– Ex. Budget cuts have reduced the staff time for contacting patrons about delinquent materials. Is there a way to predict the chance patrons will return material once it is one week late in order to prioritize our calling lists?

• Undirected data mining: consider a general topical area
– Ex. How are different departments and types of patrons using the electronic journals?
– May produce an overwhelming number of patterns to explore for validation
– Should be considered only when a strong data warehouse is in place


Identifying Data Sources

• The bibliomining process requires transactional, non-aggregated, low-level data

• Privacy issue?

• Internal data sources are those already within the library system
– Patron database, transactional data, Web server logs

• External data sources
– Demographic information related to a specific ID number that is located in the computer center or personnel management system
– Demographic information for zip codes from census data


Data for Bibliomining


Conceptual Framework for Data Types in the Bibliomining Data Mining


A Framework for the Data

• Data about a work
– Three kinds of fields
  • Fields that were extracted from the work (like title or author)
  • Fields that are created about the work (like subject heading)
  • Fields that indicate the format and location of the work (like URL or collection)
– Come from a MARC record, Dublin Core information, or CMS
– Can also connect into bibliometric information, such as citations or links to other works
  • May require extraction from the original source (in the case of digital reference) or linking into a citation database
– Challenge: no article-level usage reports


A Framework for the Data (Cont.)

• Data about the user
– Demographic surrogate
– Other fields that come from inferences about the user: zip code, location/department/lab (inference from IP address)


A Framework for the Data (Cont.)

• Data about the service
– Searching, circulation, reference, interlibrary loan, and other library services
– Fields common to most services include time and date, library personnel involved, location, method, and whether the service was used in conjunction with other services
– Each library service also has a set of appropriate fields
  • Searching: the content of the search and the next steps taken
  • Interlibrary loan: cost, vendor, and time of fulfillment
  • Circulation: acquisition process of the work and circulation length
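
One way to picture the three data types (work, user, service) is as linked record types. The sketch below uses Python dataclasses. The class and field names are assumptions chosen to mirror the bullets, not a schema given in the slides, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Work:
    title: str                  # extracted from the work
    author: str                 # extracted from the work
    subject_heading: str        # created about the work
    collection: str             # format/location of the work
    url: Optional[str] = None   # format/location of the work

@dataclass
class UserSurrogate:
    # Deliberately has no name or patron ID: only a demographic surrogate.
    patron_type: str                   # e.g. "graduate student"
    department: Optional[str] = None   # may be inferred from IP address
    zip_code: Optional[str] = None

@dataclass
class ServiceEvent:
    service: str                # "circulation", "searching", "interlibrary loan", ...
    timestamp: datetime
    location: str
    method: str
    work: Optional[Work] = None
    user: Optional[UserSurrogate] = None

# Hypothetical circulation event tying the three kinds of data together.
event = ServiceEvent(
    service="circulation",
    timestamp=datetime(2005, 3, 14, 10, 30),
    location="main branch",
    method="self-checkout",
    work=Work("Example Title", "Example Author", "Data mining", "Stacks"),
    user=UserSurrogate(patron_type="graduate student", department="Chemistry"),
)
```

Service-specific fields (search content, interlibrary loan cost, circulation length) could be added as subclasses of `ServiceEvent`; that design choice is left open here.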


Creating the Data Warehouse

• A data warehouse is a database that is separate from the operational systems and contains a cleaned and anonymized version of the operational data, reformatted for analysis

• Use queries to extract the data from the identified sources, combine those data using common fields, clean the data, and write the resulting records into either a flat file or a relational database designed specifically for analysis

• Can be automated to pull data from the operational systems into the data warehouse on a regular basis
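
The extract, combine, clean, and write steps can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the patron and checkout records, the field names, and the `build_warehouse` helper. The key point is that the patron ID is used only to join the sources and is never written to the warehouse.

```python
import sqlite3

# Hypothetical operational data; names and values are invented for illustration.
patrons = {
    "P001": {"patron_type": "undergraduate", "department": "History"},
    "P002": {"patron_type": "faculty", "department": "Physics"},
}
checkouts = [
    {"patron_id": "P001", "call_number": " QA76.9 ", "date": "2005-03-01"},
    {"patron_id": "P002", "call_number": "Z678.9", "date": "2005-03-02"},
    {"patron_id": "P999", "call_number": "Z699", "date": "2005-03-02"},  # no match
]

def build_warehouse(conn, patrons, checkouts):
    """Join on patron_id, clean fields, and store rows without the patron ID."""
    conn.execute(
        "CREATE TABLE circulation ("
        "patron_type TEXT, department TEXT, call_number TEXT, date TEXT)"
    )
    for row in checkouts:
        patron = patrons.get(row["patron_id"])  # combine using a common field
        if patron is None:
            continue  # cleaning: skip records that cannot be matched
        conn.execute(
            "INSERT INTO circulation VALUES (?, ?, ?, ?)",
            (
                patron["patron_type"],
                patron["department"],
                row["call_number"].strip(),  # cleaning: trim stray whitespace
                row["date"],
            ),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
build_warehouse(conn, patrons, checkouts)
rows = conn.execute(
    "SELECT patron_type, department, call_number, date "
    "FROM circulation ORDER BY date"
).fetchall()
```

Running the same function on a schedule against fresh exports would correspond to the automated, regular load described above.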


Creating the Data Warehouse – Protecting Patron Privacy

• Going through the data warehousing process requires the library to examine its data sources

• By explicitly determining what to keep and what to destroy, libraries can save the demographic information needed to evaluate communities of users without keeping records of the individuals in those communities

• Two examples


Cleaning Transactional Records


Cleaning Web Server Transactional Records


Creating the Data Warehouse – Building the Data Warehouse

• Building the data warehouse takes much more time than mining the data

• Suggestion: start with a narrowly defined bibliomining topic and work through the entire process

• This iterative process also has the advantage of allowing those developing the data warehouse to improve their collection and cleaning algorithms early in the life of the bibliomining project


Selecting Appropriate Analysis Tools

• Traditional Reporting
• Management Information System (MIS)
• Online Analytical Processing (OLAP)
• Visualization
• Data Mining


Analysis Tools – Traditional Reporting

• Library decision-makers examine aggregates and averages to understand their service use

• The advantage of the data warehouse is that new questions can be asked not only of the present situation but also of the past
– This allows those doing evaluation or measurement to ask new questions and then create a historical view of those reports in order to understand trends

• Libraries can more easily compare behavior across different demographic groups in the library


Analysis Tools – Management information system (MIS)

• Provide a manager with the ability to ask basic questions of the data

• ILS packages have some type of basic MIS built in

• An MIS built on top of a data warehouse made for the library will be more powerful and provide information that the library needs to see

• Another addition to MIS is a critical factor alert system
– Example: if hourly circulation (factor) is below or above a certain level, a manager could be immediately notified so staffing changes could be made
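
A critical factor alert reduces to a range check on a monitored value. The sketch below is illustrative only: the thresholds, the hourly counts, and the function name `check_critical_factor` are assumptions, and a real system would send a notification rather than collect messages in a dictionary.

```python
def check_critical_factor(value, low, high):
    """Return an alert message if value is outside [low, high], else None."""
    if value < low:
        return f"below expected range ({value} < {low})"
    if value > high:
        return f"above expected range ({value} > {high})"
    return None

# Hypothetical hourly circulation counts and thresholds.
hourly_circulation = {"09:00": 42, "10:00": 7, "11:00": 130}
LOW, HIGH = 10, 120

alerts = {}
for hour, count in hourly_circulation.items():
    message = check_critical_factor(count, LOW, HIGH)
    if message is not None:
        alerts[hour] = message  # a manager would be notified here
```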


Analysis Tools – Online Analytical Processing (OLAP)

• An interactive view of the data

• Under the surface, the OLAP tool has run thousands of database queries to combine all of the selected variables along with all of the selected measures (aggregation types, timeframes…)

• All of the fields are defined ahead of time, and the system runs many queries before anyone uses it
– Response to the manager using the OLAP front-end for reports is instant, which encourages exploration

• Penn Library Data Farm (http://metrics.library.upenn.edu/prototype/datafarm/)


Analysis Tools – Online Analytical Processing (OLAP) (Cont.)

• The user will pick one of many variables from a list to examine

• Example: use of e-journals under dimensions such as time and subject
– A high-level view of this data in a tabular report (year and general classification)
– Expand the report: clicking on a year expands it into quarters, leaving the subject headings the same and recalculating the data
– The user can then click on another field to drill down into the data

• During exploration, the manager can capture any view of the data and turn it into a regular report
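
The year-to-quarter drill-down in the e-journal example can be imitated with a small aggregation function. This sketch computes each view on demand, whereas an OLAP tool would precompute the cells. The usage records and the helper names `quarter` and `roll_up` are hypothetical.

```python
from collections import defaultdict

# Hypothetical e-journal usage records: (date "YYYY-MM-DD", subject, accesses).
usage = [
    ("2004-02-10", "Science", 120),
    ("2004-05-03", "Science", 80),
    ("2004-11-21", "Humanities", 40),
    ("2005-01-15", "Science", 150),
    ("2005-08-30", "Humanities", 60),
]

def quarter(date):
    """Map an ISO date string to a label such as '2004-Q1'."""
    year, month = date[:4], int(date[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"

def roll_up(records, time_key):
    """Sum accesses for each (time period, subject) cell."""
    cube = defaultdict(int)
    for date, subject, accesses in records:
        cube[(time_key(date), subject)] += accesses
    return dict(cube)

# High-level view: year by subject.
by_year = roll_up(usage, lambda d: d[:4])

# Drill down: expand 2004 into quarters, keeping the subject dimension.
by_quarter_2004 = roll_up(
    [r for r in usage if r[0].startswith("2004")], quarter
)
```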


Analysis Tools – Visualization

• Present the characteristics of data in a visual form


Analysis Tools – Data Mining

• Discovery of valid, novel, and actionable patterns in large amounts of data using statistical and artificial intelligence tools

• Two main categories of data mining tasks
– Description: understand the data from the past and the present
  • Discover patterns for affinity groups of variables common to different patrons or clusters of demographic groups that exhibit certain characteristics (association rule mining, clustering)
– Prediction: make a statement about the unknown based upon what is known
  • Classification (place an item into a category)
  • Estimation (produce a numeric value for an unknown variable)


Analysis Tools – Data Mining (Cont.)

• Techniques: neural networks, regression, clustering, rule generation, and classification

• Process:
– Take a cleaned data set
– Generate new variables from existing ones
– Split the data into model building sets and test sets
– Apply techniques to the model building sets to discover patterns
– Use the test sets to check that the patterns generalize
– Confirm these patterns with someone who knows the domain

• Web Usage Mining, Text Mining (+ bibliometrics)
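
The process steps can be walked through on a toy version of the overdue-materials question. Everything below is an assumption made for illustration: the records, the derived `late_rate` variable, and the one-variable threshold rule, which stands in for the richer techniques (neural networks, regression, rule generation) named above.

```python
import random

# Hypothetical cleaned records: prior loans, prior late returns, and whether
# the patron returned the current overdue item (1) or not (0).
records = [
    {"prior_loans": 20, "prior_late": 1, "returned": 1},
    {"prior_loans": 12, "prior_late": 0, "returned": 1},
    {"prior_loans": 8, "prior_late": 6, "returned": 0},
    {"prior_loans": 15, "prior_late": 2, "returned": 1},
    {"prior_loans": 5, "prior_late": 4, "returned": 0},
    {"prior_loans": 30, "prior_late": 3, "returned": 1},
    {"prior_loans": 10, "prior_late": 7, "returned": 0},
    {"prior_loans": 9, "prior_late": 1, "returned": 1},
    {"prior_loans": 6, "prior_late": 5, "returned": 0},
    {"prior_loans": 25, "prior_late": 2, "returned": 1},
]

def add_late_rate(record):
    """Generate a new variable from existing ones."""
    loans = record["prior_loans"]
    rate = record["prior_late"] / loans if loans else 0.0
    return {**record, "late_rate": rate}

def split(rows, test_fraction=0.3, seed=0):
    """Shuffle and split into a model building set and a test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    test_size = max(1, round(len(rows) * test_fraction))
    return rows[:-test_size], rows[-test_size:]

def accuracy(rows, threshold):
    """Share of rows where 'late_rate <= threshold' matches 'returned'."""
    hits = sum((r["late_rate"] <= threshold) == bool(r["returned"]) for r in rows)
    return hits / len(rows)

def fit_threshold(train):
    """Pick the late_rate cut-off that best separates the training rows."""
    candidates = sorted({r["late_rate"] for r in train})
    return max(candidates, key=lambda t: accuracy(train, t))

prepared = [add_late_rate(r) for r in records]
train, test = split(prepared)
threshold = fit_threshold(train)
test_accuracy = accuracy(test, threshold)  # check generalization on unseen rows
```

The last step in the list, confirming the pattern with someone who knows the domain, has no code equivalent: a librarian would judge whether a late-rate cut-off is a sensible basis for prioritizing calls.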


Analysis Tools – Category & Cluster Results

[Screenshot labels: Category; Cluster Results]


Analysis Tools – Cluster Detail Information

[Screenshot labels: Cluster Label; Related Topic; Citation Relation; Related Article; Abstract]


Analysis Tools – Citation Relation


DREW Open Effort Project

• Digital Reference Electronic Warehouse (DREW)
– Develop an XML schema to…
  • Allow digital reference transactions from different services and in different communication forms to live together in one space
  • Allow researchers to access these archives and explore them using a variety of methods

– Capture the results of this research into a management information system, and then allow the reference services to view their own archives through the tools created by the researchers

• Knowledge base, citations and links to other works


Analysis and Implementation

• Once the results have been developed, they must be validated
– Test and tweak the model with data that were not used during the development process (training and test)
– The most important validation is to have a librarian who is familiar with that particular library context examine the models

• Implement the report/model
– It is essential to monitor the variables that power the models over time; if the mean of a variable strays too far because of changes in the library, the model may have to be reevaluated
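
Monitoring a model's input variables can be as simple as comparing a recent mean against the mean observed when the model was built. In the sketch below, the rule of "more than two baseline standard deviations" is an assumption chosen for illustration, as are the loan-length figures and the function name `has_drifted`.

```python
from statistics import mean, stdev

def has_drifted(baseline, recent, max_sigma=2.0):
    """Flag a variable whose recent mean strays from the baseline mean.

    The rule (more than max_sigma baseline standard deviations) is an
    assumption for illustration, not a prescribed test.
    """
    base_mean, base_sd = mean(baseline), stdev(baseline)
    return abs(mean(recent) - base_mean) > max_sigma * base_sd

# Hypothetical loan lengths (days) when the model was built
# and after a change in library policy.
baseline_loan_days = [14, 15, 13, 14, 16, 14, 15, 13]
recent_loan_days = [21, 22, 20, 21, 23]

needs_review = has_drifted(baseline_loan_days, recent_loan_days)
```

A flagged variable is a prompt to reevaluate the model, not proof that it has failed.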


Example Applications – See Another PPT


Placing Bibliomining in Context


Conceptual Framework for Decision-Makers


Conceptual Framework for Library and Information Scientists

Hypothetico-Deductive-Inductive Method

Induction

Inference


Understanding both Frameworks

• In both frameworks, bibliomining is not the end of the exploration process

• It is one tool to be used in combination with other methods of measurement and evaluation, such as LIBQUAL, E-metrics, cost-benefit analyses, surveys, focus groups, or other qualitative explorations

• Using only bibliomining to understand a digital library can result in biased or incomplete results

• While the information provided by bibliomining is useful, it needs to be supplemented by more user-based approaches to provide a more complete picture of the library system


A Research Agenda to Advance Bibliomining


Data Collection

• Various data sources
– Integrated library system
– Web-based front-end to digital libraries (federated search)
– A system to support interlibrary loan
– A system to support digital reference services
– External systems: citation databases, census data

• How to collect data and match it between systems
– Standards for data: Project COUNTER, NISO Z39.7-200x (library metrics and statistics); aggregate-level data
– Cooperation between system creators: easily exportable data warehouse and match between systems through common fields


User Privacy

• The bibliomining data warehouse can provide the method for keeping information about the materials used in the library without maintaining specific information about the users of the library

• What effect does this anonymization have on the power of the data mining tools to discover patterns?

• Privacy-protecting data mining

• Privacy issues coming from Digital Reference Service (DRS): personal information in the questions
– Text mining and NLP


Variable, Metric, and Model Generation

• While researchers have developed metrics for library statistics, they have primarily focused on fields from one data source

• Once the warehouse has been constructed, the possibilities grow for the discovery of interesting variables for mining and metrics for evaluation

• Start in the data mining process, looking for relationships between individual variables that allow for deeper understanding
– Through the patterns discovered with data mining, new metrics and measures can be proposed

• Example: one-time high-demand needs vs. needs that represent the general user base


Integration of Management Information System and Data Mining tools

• Integrate the found algorithms into the systems that drive digital libraries

• This combination of a built-in data warehouse, interactive reporting module, standards for report description, and modular design will make it much easier for library decision-makers to get involved with bibliomining.

• Toward developing these integrated modules for other systems that support digital libraries

Page 48: 1 Bibliomining: An Introduction. 2 Outline Introduction Bibliomining Process Example Applications Placing Bibliomining in Context A Research Agenda to.

48

Multi-system Data Warehouses and Knowledge Bases

• The creation of services that span many digital libraries
– Library consortia
– Joining together digital library sources and services while still maintaining identity for those participating (like National Science Digital Library)

• Join data warehouses with libraries that have similar user groups and similar collections
– Agree on demographic surrogates or develop a cross-walk algorithm to map demographics
– Need to ensure that these patterns apply to their own library before making decisions based upon them

• Methods for combining utilization and collection metadata between different systems
– Standardize a series of metrics (what do “Hit” and “Visit” mean?)
– Create a standard for record-level data (MARC, COUNTER…)
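
The question of what “Hit” and “Visit” mean matters because the same log yields different numbers under different definitions. The sketch below treats every request as a hit and groups hits into visits by an inactivity timeout. The 30-minute default is an assumed convention rather than a standard named in the slides, and the timestamps are invented.

```python
from datetime import datetime, timedelta

def count_visits(hit_times, timeout=timedelta(minutes=30)):
    """Group one user's hits into visits separated by gaps longer than timeout."""
    visits = 0
    previous = None
    for t in sorted(hit_times):
        if previous is None or t - previous > timeout:
            visits += 1  # a long enough gap starts a new visit
        previous = t
    return visits

# Hypothetical request times for a single (anonymized) user on one day.
hits = [
    datetime(2005, 3, 1, 9, 0),
    datetime(2005, 3, 1, 9, 5),
    datetime(2005, 3, 1, 9, 50),
    datetime(2005, 3, 1, 10, 0),
    datetime(2005, 3, 1, 14, 0),
]

hit_count = len(hits)
visits_30 = count_visits(hits)                                # 30-minute rule
visits_60 = count_visits(hits, timeout=timedelta(minutes=60)) # 60-minute rule
```

Two libraries applying different timeouts to identical logs would report different visit counts, which is why a shared definition is needed before warehouses are combined.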


Conclusion: moving beyond evaluation to understanding

• The final and most long-lasting area of research of bibliomining is improving understanding of digital libraries at a generalized, and perhaps even conceptual, level

• These data warehouses will combine resources traditionally unavailable in this combined form to researchers
– What connections can be made between patron demographics and bibliometric-based social networks of authors?
– How much influence do the works written and cited by faculty at an institution have on the patterns of student use of library services?
– How do usage patterns differ between departments or demographic groups, and what can the library do to better personalize and enhance existing services?

• Qualitative + quantitative