Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations ENUG Conference Cheng Library, William Paterson University, Wayne, New Jersey, Thursday, October 21, 2010 Ray Schwartz, Systems Specialist Librarian Cheng Library, William Paterson University, Wayne, New Jersey, USA schwartzr2 @ wpunj.edu
Data Warehousing and Mining Data from Library and University Systems for Assessment of Library Operations
ENUG Conference
Cheng Library, William Paterson University, Wayne, New Jersey
Thursday, October 21, 2010
Ray Schwartz, Systems Specialist Librarian
Cheng Library, William Paterson University, Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu
2
Outline
• What is Data Mining and Data Warehousing and Why Do We Do It?
• Our Library and University
• Patron Statistical Categories
• Application Server
• Reporting
3
Collecting Transactional Data
• ILSs collect transactional data for circulation and allocation of collection funds.
• ILL and Document Delivery services supply general transactional data.
• Reports from vendor services
– Bibliographic utilities
– Subscription agents
– Book jobbers
4
• Most ILSs have search and web server logs
• Most (if not all) Databases have usage reports
• Link Resolver logs
• Proxy Server logs
• Many other ways of collecting
• Combined usage by department/majors of more than one library service.
6
What is Data Mining and Data Warehousing
• Extracting data from legacy systems and other resources;
• cleaning, scrubbing and preparing data for decision support;
• maintaining data in appropriate data stores;
• accessing and analysing data using a variety of end user tools;
• and mining data for significant relationships.
• Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.
7
• The primary purpose of these efforts is to provide easy access to specifically prepared data that can be used with decision support applications such as management reports, queries, decision support systems, executive information systems and data mining.
• Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing: Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.
• 19 librarians and 26 library staff
• 350,000 volumes
• 18,000 audiovisual items
• 47,000 print and electronic periodicals
• 124 general and subject specific
• Voyager ILS
• Clio ILL Software
• EZProxy Server
• Banner – University ERP
• University Networked Drive K:
• University Email Server
• University Web Server
PATRON STATUS            BOOK CIRC  MEDIA CIRC  EQUIP CIRC  TOTAL CIRC  MEMBERS  BORROWERS  % BORROWING  CIRC/MEMBER  CIRC/BORROWER
GRADUATE STUDENTS              419          13          76         508       14         13          93%        36.29          39.08
UNDERGRADUATE STUDENTS       2,715         250         698       3,663      238        186          78%        15.39          19.69
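The ratio columns in this table can be recomputed from the raw counts. The following is a minimal sketch, assuming that % BORROWING is borrowers divided by members and that the two CIRC ratios are total circulation divided by members and by borrowers. The function and field names are illustrative and do not come from the presentation.

```python
def circ_ratios(book, media, equip, members, borrowers):
    """Derive the summary columns from raw counts (illustrative names)."""
    total = book + media + equip
    return {
        "total_circ": total,
        "pct_borrowing": round(100 * borrowers / members),
        "circ_per_member": round(total / members, 2),
        "circ_per_borrower": round(total / borrowers, 2),
    }


# Counts taken from the two rows of the table.
grad = circ_ratios(book=419, media=13, equip=76, members=14, borrowers=13)
undergrad = circ_ratios(book=2715, media=250, equip=698, members=238, borrowers=186)
print(grad)
print(undergrad)
```

Under these assumptions the computed values agree with the figures shown in the table for both rows.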
21
Communications Majors FY08/09
Statistical Categories // Item Type / Location / Call No Type / Call No
Columns: Communications Majors (total), Freshman, Sophomore, Junior, Senior (blank cells omitted)
M- DVD / Media Services / Other / DVD: 194 17 31 52 94
M- VideoCass / Media Services / Other / VC: 228 11 40 67 110
T- Book / 2nd Floor - Circulating / Library of Congress / B: 34 9 8 11 6
T- Book / 2nd Floor - Circulating / Library of Congress / BD: 3 1 2
T- Book / 2nd Floor - Circulating / Library of Congress / BF: 30 5 5 12 8
...
2nd Floor Circulating: 1531 222 310 403 596
T- Juvenile / CMC /: 125 14 26 20 35
T- NJDoc / Askew Documents Room / Other /: 1 1
New Jersey History: 10 0 2 7 1
T- ReserveBk / Reserves Desk /: 189 13 46 68 62
T- SpecColl / Special Collection / Library of Congress / LC: 3 3
T- Book-McNaughton / Leisure Lounge / Library of Congress / F: 2 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HF: 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / HS: 2 2
T- Book-McNaughton / Leisure Lounge / Library of Congress / HV: 5 1 2 2
T- Book-McNaughton / Leisure Lounge / Library of Congress / ML: 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / PN: 3 3
T- Book-McNaughton / Leisure Lounge / Library of Congress / PS: 29 4 10 15
T- Book-McNaughton / Leisure Lounge / Library of Congress / RC: 2 1 1
T- Book-McNaughton / Leisure Lounge / Library of Congress / TL: 1 1
Leisure Lounge: 49 9 1 19 20
22
Challenges with combining data from various services
• Little to no linkage of data
• Multiple user IDs for authentication
23
Second Step – Set Up an Application Server
24
What is an Application Server?
• A machine or its software that works in conjunction with a web server to deliver application services such as the dynamic creation of a webpage from content stored in a database. From http://www.webtools.ca.gov/help/Glossary.asp
• Web Server Software (Apache or IIS)
• Database Management System – DBMS (MySQL, Oracle, MS SQL Server)
• Scripting Language (Perl, PHP, ColdFusion, ASP)
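To make "dynamic creation of a webpage from content stored in a database" concrete, here is a minimal sketch in Python, using SQLite as a stand-in for the DBMS. The presentation does not say which stack was used beyond the options listed above. The `reports` table and the `render_report` function are hypothetical.

```python
import html
import sqlite3


def render_report(conn):
    """Build an HTML page from rows in a (hypothetical) reports table."""
    rows = conn.execute(
        "SELECT title, run_time FROM reports ORDER BY run_time"
    ).fetchall()
    items = "".join(
        f"<li>{html.escape(title)} ({html.escape(run_time)})</li>"
        for title, run_time in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"


# In-memory database standing in for the application server's DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (title TEXT, run_time TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [("Scheduled LIS Tasks", "06:00"), ("Reserves Overdues", "05:59")],
)
page = render_report(conn)
print(page)
```

In a real deployment the web server would call code like this on each request, so the page always reflects the current contents of the database.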
25
Why an Application Server?
• Relevant data in logfiles needs to be in a database to be analyzed.
• Need your own DBMS to create new tables and queries.
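As an illustration of moving logfile data into a database, the sketch below parses log lines and loads them into a table. It assumes the log uses the NCSA common log layout (host, ident, user, time, request, status, bytes); the actual log format on the library's proxy server is not stated in the presentation. The `proxy_log` table name, the `load_log` function, and the sample line are all made up for the example.

```python
import re
import sqlite3

# Assumed layout: host ident user [time] "request" status bytes
LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def load_log(lines, conn):
    """Insert parseable log lines into proxy_log; return the number loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy_log "
        "(host TEXT, username TEXT, time TEXT, request TEXT, status INTEGER)"
    )
    rows = []
    for line in lines:
        m = LINE.match(line)
        if m:  # silently skip lines that do not fit the assumed layout
            rows.append(
                (m["host"], m["user"], m["time"], m["request"], int(m["status"]))
            )
    conn.executemany("INSERT INTO proxy_log VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


sample = [
    '10.0.0.5 - jdoe [21/Oct/2010:09:15:02 -0400] '
    '"GET http://db.example.com:80/search HTTP/1.1" 200 5120',
    "not a log line",
]
conn = sqlite3.connect(":memory:")
loaded = load_log(sample, conn)
by_user = conn.execute(
    "SELECT username, COUNT(*) FROM proxy_log GROUP BY username"
).fetchall()
print(loaded, by_user)
```

Once the rows are in a table, questions such as "requests per user" become ordinary SQL queries rather than ad hoc text processing.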
26
• Decide how you will use the Application Server.
• Decide on the best and most plausible configuration.
27
Authentication of ILL and other forms is routed through the EZProxy server
28
Daily and Weekly Email Reports from the Application Server
Circ Fines Audit Daily Report - Daily at 6:05 AM.
Dupe Patron Record Report - Daily at 5:56 AM.
Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM.
Media Service Scheduling Rooms Report - Daily at 6:02 AM.
Media Services Equipment Pickup Summary - Daily at 7:00 AM.
Received Title Alert - Daily at 6:59 AM.
Reserves Overdues - Daily at 5:59 AM.
Scheduled LIS Tasks - Daily at 6:00 AM.
ILL Borrowing Overdues Report - Weekly at 5:59 AM.
ILL Lending Reports - Weekly at 6:15 AM.
29
Monthly Email Reports from the Application Server
Circ Fines Audit - Monthly at 6:10 AM.
Circulation by Location and Item Type - Monthly at 6:21 AM.
Circulation Lost and Paid - Monthly at 6:25 AM.
Circulation Online Renewal Count - Monthly at 6:30 AM.
Media Circulation - Monthly at 6:35 AM.
Reserve Circulation - Monthly at 6:40 AM.
30
31
On Demand Reports
32
Lending Services Reports
Lists of patrons with fines between $10 and $19.99
• Student and Alumni fines list - Sorted by either Name, Amount or Notice Date.
• PALS and Courtesy Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with fines over $19.99
• Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or Notes.
• PALS and Courtesy Patron fines list - Sorted by Name.
• VALE Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with overdues older than 30 days
• Student and Alumni overdues list - Sorted by either Name, IID or Notes.
• PALS and Courtesy Patron overdues list - Sorted by Name.
• All other Patron overdues list except VALE - Sorted by Name.
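An on-demand fines list of this kind reduces to a range query with a selectable sort order. The sketch below is one way it could look, not the library's actual implementation. The `fines` table, its columns, the sample patrons, and the `fines_list` function are hypothetical; amounts are stored as integer cents to avoid floating-point comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fines (name TEXT, patron_group TEXT, cents INTEGER)")
conn.executemany(
    "INSERT INTO fines VALUES (?, ?, ?)",
    [
        ("Lee, C.", "Student", 1250),
        ("Adams, A.", "Student", 1999),
        ("Baker, B.", "Alumni", 2500),
        ("Diaz, D.", "Student", 400),
    ],
)

# Only these sort keys are accepted, so no user text reaches the SQL string.
ALLOWED_SORTS = {"name": "name", "amount": "cents DESC"}


def fines_list(conn, low_cents, high_cents, sort="name"):
    """Return (name, cents) for fines in [low_cents, high_cents], sorted."""
    order = ALLOWED_SORTS[sort]
    sql = (
        "SELECT name, cents FROM fines "
        f"WHERE cents BETWEEN ? AND ? ORDER BY {order}"
    )
    return conn.execute(sql, (low_cents, high_cents)).fetchall()


between_10_and_1999 = fines_list(conn, 1000, 1999)            # $10.00 to $19.99
over_1999 = fines_list(conn, 2000, 10**9, sort="amount")      # over $19.99
print(between_10_and_1999)
print(over_1999)
```

The sort column is chosen from a fixed whitelist while the bounds are passed as bound parameters, which keeps a web-facing report form from injecting SQL.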
33
Lending Services Reports, cont.
Lists of VALE patrons with overdues older than 6 months
• VALE patron overdues list - Sorted by Name.
Miscellaneous Reports
• Patrons with the word "Collection Agency" or "CA" in their notes.
• Patrons with the word "FINE" in one of their notes.
• Patrons with the word "SOILS" in their notes.
• Patrons with the word "FALL07 SOILS" in their notes.
• Patrons with the word "HOLD" in their notes.
• Combined list of HOLD, FINE, and CA.
Circulation Reports by Item Type from 2003 to the present
• All Staff.
• All Colleges
• Undergraduates by Major.
• Graduates by Major
• Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009 and 30-Nov-2009
34
One of Our Projects
• Mining EZProxy logfiles and linking to patron statistical categories from the Voyager Patron Database
– What majors and departments are accessing which database services?
– What majors and departments are accessing the ILL services?
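The linking step can be sketched as a join between log entries and patron records on a shared username. This is a minimal illustration, not the project's actual code. The `patron` and `proxy_log` tables are hypothetical stand-ins (they are not the Voyager schema), and all usernames, service names, and rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patron (username TEXT PRIMARY KEY, major TEXT);
    CREATE TABLE proxy_log (username TEXT, service TEXT);
    """
)
conn.executemany(
    "INSERT INTO patron VALUES (?, ?)",
    [("asmith", "M- Music"), ("bjones", "M- Psychology"), ("clee", "M- Music")],
)
conn.executemany(
    "INSERT INTO proxy_log VALUES (?, ?)",
    [
        ("asmith", "Database A"),
        ("clee", "Database A"),
        ("bjones", "Database B"),
        ("asmith", "Database A"),
        ("unknown", "Database B"),  # no patron record; dropped by the inner join
    ],
)

# Count log hits and distinct users per major and service.
usage = conn.execute(
    """
    SELECT p.major, l.service,
           COUNT(*) AS hits,
           COUNT(DISTINCT l.username) AS users
    FROM proxy_log AS l
    JOIN patron AS p ON p.username = l.username
    GROUP BY p.major, l.service
    ORDER BY hits DESC, p.major
    """
).fetchall()
for row in usage:
    print(row)
```

Reporting by major rather than by username also keeps individual patrons out of the resulting tables. Log entries whose user ID has no match in the patron table fall out of an inner join, so the multiple-user-ID problem has to be resolved before this step.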
9 M- Music
9 M- Special Programs
8 M- Psychology
7 M- Biotechnology
7 M- Political Science
6 M- Anthropology
6 M- Music - Jazz Studies
4 M- Business
4 M- Communication
4 M- Nursing