Mar 27, 2015
ACCOLEDS/DLI Training
EDMONTON, DECEMBER 5, 2001
Mike Sivyer
DLI ORIENTATION PROGRAM
INTRODUCTION
GOLDILOCKS AND
THE THREE BEARS
Chuck - POPPA BEAR
Elizabeth - MOMMA BEAR
Garth - BABY BEAR
Mike - …!!!
The Need to Liberate The Data
Why the heck do we need a DLI?
Historically, Stats Canada made published data available to the public through the Depository Services Program (DSP)
These were regular paper publications and did not include electronic numeric files (i.e. public use microdata files)
Data files were available to researchers at marginal cost
Custom tables were another, more costly, method of accessing unpublished data
In the 1980s, federal budget cuts led Stats Canada to place increased emphasis on cost recovery
In the early 1990s, the cost of public use microdata files underwent a dramatic increase
This put most data files out of reach of the majority of academic researchers & students
The cost of Stats Canada data in the late 80s and early 90s can be compared to the gas prices of just a few months ago. To many academic researchers and students, these files cost...
A consortium of universities had been created to gain access to 1986 Census data
This idea was well received by STC and led to a movement within the academic community to liberate the rest of STC’s electronic data files
A 1991 paper, “Liberating the Data: Proposal for a Proposal”, led to a working group to investigate this idea further
The group was made up of representatives from universities, SSFC, CARL and CAPDU, as well as STC and DSP
Champions within both the academic community and Statistics Canada came forth to push this idea
Informal approval was received in 1995
This was followed by the creation of:
An internal STC Steering Committee
A Project Team
An External Advisory Committee
A Licence Agreement was drafted and approved
Author divisions were asked to provide their data to the Initiative
Institutions were invited to join the Initiative
Other government agencies became involved, and formal approval for a 5-year pilot was received from Treasury Board in early 1996
Funding for the initiative was to come mostly from participating institutions, with CARL members paying $12,000 and CASUL members paying $3,000
One-time funding from: TB, IC, MRCC
A 5-year funding commitment from: HC, HRDC, SSHRC, STC
Use of the Internet as a dissemination tool was seen as a key component of the initiative
Established mechanisms for communications, storage, finding and ordering data:
Created an FTP site at STC
DLILIST - a forum for questions and sharing of information
DLIORDER & WWW DLI ORDER DESK - for placing orders for products not on the FTP site
Began disseminating files in 1996
Before DLI, about 15 institutions offered a data service
Therefore, co-operative training of members was seen as an extremely important aspect, given the varying degrees of experience among members
Established a training committee and began to develop a curriculum, identify trainers and establish budgets
Regional training workshops were begun in 1997
Training workshops have been given in each region annually since then
Training was recently reviewed, and a report was presented to EAC last fall
One suggestion was to hold another Orientation session for new members who missed the one in 1997
This workshop and this special Orientation session are part of continuing co-operative training
In 1996 there were 50 post-secondary members
In 1998 there were 61
Today there are 66 members
There are over 13,000 files in the DLI collection, including data files, documentation, CDs, etc.
The collection can now be accessed via the DLI Web site as well as FTP
The DLI is now a permanent program at Stats Canada located within the Library and Information Centre
Today’s graduates have had the opportunity to use Canadian data throughout their studies
The DLI has been described as one of the most important developments in the social sciences in Canada in the past 50 years!
What is the Data Liberation Initiative?
The Products
The Licence
The Service
The Community
THE PRODUCTS
DLI provides access to Stats Canada data produced as standard electronic products available to the public
These data are digitally encoded and stored in a file structure
These include:
Microdata files
Geography files
Databases
Aggregate data in table format
The main focus of the DLI Collection is on socio-economic data:
Health
Education, Literacy
Labour Market, Income
Travel
Justice
Census, Demographic
Etc.
There are few products related to business data in the DLI Collection
Business data are not usually produced as standard electronic products for public dissemination
DLI includes some business products such as:
Trade data
Financial Performance Indicators CD
Inter-Corporate Ownership
Fleet Report
Survey of Manufacturing
THETHE PRODUCTSPRODUCTS
Standard Electronic Product
An “off-the-shelf” electronic product available to the public
Not included are standard publications available in electronic form, as these are usually part of the DSP
Is registered in the STC Catalogue of Products and Services and has a Product Number
82F0077XIE Report on smoking prevalence in Canada
82-003-XPB980034138 Attitudes toward smoking
First 2 characters = Major Subject Group
Third character = Product Class (e.g. F = Fixed, M = Microdata product, C = Custom product, etc.)
Fourth to seventh characters = unique product number
Eighth character = variable descriptor (e.g. P = Preliminary, U = Update, X = Not Applicable, G = Guide, etc.)
Ninth character = medium (e.g. T = tape, D = diskette, I = Internet, C = CD, etc.)
Tenth character = language
11-008-XPE960032868 Youth smoking in Canada
This article was published in Canadian Social Trends, catalogue number 11-008-XPE, Winter 1996, no. 43
82M0011XDB Youth Smoking Survey, 1994
82M0011GPE Youth Smoking Survey, 1994 - Microdata user's guide
82C0014 Youth Smoking Survey, 1994 - Custom tabulations
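The character layout above can be sketched as a small decoder. This is a minimal Python sketch, assuming the compact 10-character form (plus the 7-character form seen in 82C0014); the code tables hold only the examples listed on the slides, so any other letter is reported as "unknown" rather than guessed, and the language letter is returned as-is. Hyphenated catalogue numbers such as 82-003-XPB... follow a different layout and are rejected. `ProductNumber` and `parse_product_number` are hypothetical names, not part of any STC tool.

```python
from dataclasses import dataclass
from typing import Optional

# Code tables copied from the slide examples only; the slides end each list
# with "etc.", so unlisted codes are decoded as "unknown".
PRODUCT_CLASS = {"F": "Fixed", "M": "Microdata product", "C": "Custom product"}
DESCRIPTOR = {"P": "Preliminary", "U": "Update", "X": "Not applicable", "G": "Guide"}
MEDIUM = {"T": "Tape", "D": "Diskette", "I": "Internet", "C": "CD"}


@dataclass
class ProductNumber:
    subject_group: str         # characters 1-2
    product_class: str         # character 3, decoded
    product_id: str            # characters 4-7
    descriptor: Optional[str]  # character 8, decoded (None for 7-character numbers)
    medium: Optional[str]      # character 9, decoded
    language: Optional[str]    # character 10, raw letter


def parse_product_number(code: str) -> ProductNumber:
    """Split a compact product number such as '82F0077XIE' into its fields."""
    code = code.strip().upper()
    if len(code) not in (7, 10) or not (code[:2].isdigit() and code[3:7].isdigit()):
        raise ValueError(f"not a compact product number: {code!r}")
    extra = code[7:] if len(code) == 10 else None
    return ProductNumber(
        subject_group=code[:2],
        product_class=PRODUCT_CLASS.get(code[2], "unknown"),
        product_id=code[3:7],
        descriptor=DESCRIPTOR.get(extra[0], "unknown") if extra else None,
        medium=MEDIUM.get(extra[1], "unknown") if extra else None,
        language=extra[2] if extra else None,
    )
```

For example, decoding 82M0011XDB gives subject group 82, a microdata product, product number 0011, descriptor "Not applicable" and medium "Diskette". The guide 82M0011GPE decodes its medium letter P as "unknown" because P is not among the slide's listed media.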
Metadata are provided in both official languages whenever available
New data products are continually being added to the Collection
These include:
Updated data from regular ongoing surveys
Data from ad hoc special surveys (one time only)
Data from new surveys in the STC program
Updates may be provided in a different format than the earlier version, for example a move from PUMF to Beyond 20/20
As new versions are received, we have to decide whether to replace the existing data or add the new version to the Collection
Over 13,000 files in the Collection, including:
Data files
Metadata & Readme files
Census & Geography
CDs
Not all products in DLI Collection are standard electronic products
There are some “special” products just for DLI that contain non-public data:
KLEMS database - an experimental database of productivity data
Justice Statistics - a complete set of Beyond 20/20 tables normally only available to members of the CCJS Initiative
No longer included in the DLI Collection are:
E-STAT - a CD used to be received once per year; E-STAT is now free online to subscribing institutions
CANSIM - a CD of CANSIM used to be received once a year; with the introduction of CANSIM II this no longer happens, and CANSIM data are included on E-STAT
THE LICENCE
DLI is open to all accredited post-secondary institutions in Canada
Data are made available on a subscription basis
All member institutions must sign a Licence Agreement
Data are made available to educators, students and other staff only while they hold that status at the institution
E.g. a student who goes to the USA to do a Master's no longer has access to the data
Data are made available for:
Academic research and publishing
Teaching
Planning of academic/educational services
Use of data in textbooks falls under a different set of STC licences and permissions
Data are not to be used in any commercial or private activities (even if no $$ are involved)
The DLI Contact is responsible for ensuring eligible use of the data
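Decision flowchart for a data request:
DATA AVAILABLE THROUGH DLI? Yes: go on. No: contact the DLI Team; if the data are available, go on, otherwise contact the Regional Advisory Office.
USER AUTHORIZED? Yes: go on. No: contact the DLI Team if unsure; if OK, go on, otherwise contact the Regional Advisory Office.
FOR ACADEMIC PURPOSE? Yes: go on. No: contact the DLI Team if unsure; if OK, go on, otherwise contact the Regional Advisory Office.
PROVIDE FILE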
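Read as a decision procedure, the flowchart can be sketched in a few lines of Python. This sketch rests on one interpretation of the chart: a clear "no" on authorization or purpose goes straight to the Regional Advisory Office, while "unsure" is referred to the DLI Team first. `Outcome`, `route_request` and the `consult_team` callback (standing in for asking the DLI Team) are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    PROVIDE_FILE = "Provide file"
    REGIONAL_OFFICE = "Contact Regional Advisory Office"


def route_request(available: Optional[bool], authorized: Optional[bool],
                  academic: Optional[bool],
                  consult_team: Callable[[str], bool]) -> Outcome:
    """Walk the three flowchart questions in order.

    Each answer is True (yes), False (no) or None (unsure). consult_team is a
    hypothetical callback that returns True when the DLI Team says it is OK.
    """
    # Question 1: data not found in DLI, so ask the Team whether it can locate them.
    if available is not True and not consult_team("Data available through DLI?"):
        return Outcome.REGIONAL_OFFICE
    # Questions 2 and 3: "unsure" is referred to the Team; a clear "no" fails.
    for question, answer in (("User authorized?", authorized),
                             ("For academic purpose?", academic)):
        if answer is True:
            continue
        if answer is None and consult_team(question):
            continue
        return Outcome.REGIONAL_OFFICE
    return Outcome.PROVIDE_FILE
```

Passing the Team's answer in as a callback keeps the routing logic separate from how the Team is actually contacted (DLILIST, e-mail, telephone).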
Other questions to help determine whether a use falls under the definition of academic research:
If publishing, is the use strictly for publishing in an academic or scholarly journal?
Is the use under a contract with an outside agency/organization? Are any $$ involved?
Do the $$ come through the institution’s “grants dept”?
Even if no $$ are involved, did the contract come through regular institutional channels?
Are the data expected to be shared with an outside agency/organization?
Other important elements of the Licence Agreement:
Data & products are offered “as is”
STC remains owner of the intellectual property - only access to the data is provided
Users must not link data or otherwise try to identify individual respondents
DLI Contact to implement data security measures
May require users to sign before granting access
Plans to create a document on the Web to address the more common data-use eligibility questions
All questions received since 1996 have been gathered
This is to be done as soon as time and resources permit
Until then, if unsure, send a message to the Team for consideration
All questions are reviewed by the DLI Manager & Director as well as the Co-Chairs of EAC
We are in the process of finalizing a new Licence Agreement to reflect the move from pilot to permanent program
This is to be sent to Library Directors soon
Until then, we will continue to operate under the original Licence Agreement
THE SERVICES
DLI was conceived as an Internet-based means of dissemination; the Internet is the main mode of data transfer and communications
DLI Team offers both an FTP and a Web-based service for access to the Collection
DLILIST - forum for making enquiries, sharing information and general communication between and among members
DLIORDER & WWW DLI ORDER DESK - processes for ordering hard copy versions of products not available electronically
The DLI Team
DLI activities at STC are performed by a small Team situated in the Library and Information Centre; currently there are 6 members on the Team
We have just lost one Team member, who was responsible for Liaison/Communications and whose duties included:
Responding to enquiries on DLILIST
Liaison with author divisions
Producing DLI Update
Other Team members include:
André Blondin, responsible for:
Quality control of data and metadata
Maintenance of FTP site directories
Loading of files on FTP site
Overseeing creation of SPSS
Christiane Rousseau, responsible for:
Assistant Liaison/Communication
DLIORDER/WWW DLI ORDER DESK and hardcopy products
Maintaining library of CDs and print documentation
Roger Arsenault, responsible for:
Creation and validation of SPSS
Marie Josée Bourgeois, responsible for:
Web page creation and development
Web links to data and metadata
Jackie Godfrey, responsible for:
Project online infrastructure
Data security, i.e. passwords, IP validation, etc.
FTP & Web listservs
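IP validation generally means checking a client's address against the network ranges an institution has registered. A minimal Python sketch using the standard `ipaddress` module, assuming a simple in-memory registry; the institution name and ranges are made up (they are reserved documentation networks), `institution_for` is a hypothetical helper, and this does not describe how the DLI site actually implements the check.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical registry of the networks each member institution has registered.
# 192.0.2.0/24 and 198.51.100.0/24 are reserved documentation ranges, not real campuses.
REGISTERED_RANGES: Dict[str, List[str]] = {
    "Example University": ["192.0.2.0/24", "198.51.100.0/25"],
}


def institution_for(client_ip: str,
                    registry: Dict[str, List[str]] = REGISTERED_RANGES) -> Optional[str]:
    """Return the institution whose registered range contains client_ip, else None.

    Raises ValueError if client_ip is not a valid IPv4 or IPv6 address.
    """
    addr = ipaddress.ip_address(client_ip)
    for institution, ranges in registry.items():
        if any(addr in ipaddress.ip_network(r) for r in ranges):
            return institution
    return None
```

An address check like this is usually paired with passwords, since campus ranges change and off-campus users fall outside them.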
Others connected to the Team:
Anne Chartrand - Administrative Assistant
Carole Paradis - Cataloguing of DLI Collection
Ernie Boyko - Director of Library
Although the Licence states “as is”, this is not really the case
When the Team receives a product, a number of steps are performed before it is placed in the Collection:
First, check that all files (data and metadata, French & English) have been received
Open each file to ensure it is what it says it is (e.g. a .DOC file is really a Word file)
Run a program against the data file to verify:
Number of records
Record length
Overall size of file
Compare results against the codebook and/or record layout
If SAS and/or SPSS files were received, run them against the file
If there is no SPSS, create it
Rename all files to conform to DLI standards
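The record checks above can be sketched as a short Python program. This is a minimal sketch, assuming a fixed-width ASCII data file with one record per line and a known line terminator; the expected record count and record length would come from the codebook or record layout. `FileCheck`, `check_fixed_width` and `compare_with_layout` are hypothetical names, not the Team's actual program.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class FileCheck:
    records: int = 0
    record_lengths: set = field(default_factory=set)  # distinct lengths seen
    size_bytes: int = 0


def check_fixed_width(path: str) -> FileCheck:
    """Count records, collect record lengths (excluding line endings) and file size."""
    result = FileCheck(size_bytes=os.path.getsize(path))
    with open(path, "rb") as f:
        for line in f:
            result.records += 1
            result.record_lengths.add(len(line.rstrip(b"\r\n")))
    return result


def compare_with_layout(check: FileCheck, expected_records: int,
                        expected_length: int, newline_bytes: int = 1) -> list:
    """Return discrepancies against codebook / record-layout values (empty if none)."""
    problems = []
    if check.records != expected_records:
        problems.append(f"records: found {check.records}, expected {expected_records}")
    if check.record_lengths != {expected_length}:
        problems.append(f"record lengths: found {sorted(check.record_lengths)}, "
                        f"expected {expected_length}")
    # Every record, including the last, is assumed to end with a line terminator.
    expected_size = expected_records * (expected_length + newline_bytes)
    if check.size_bytes != expected_size:
        problems.append(f"size: found {check.size_bytes} bytes, expected {expected_size}")
    return problems


if __name__ == "__main__":
    # Tiny made-up file: three 5-character records, each ending in "\n".
    with tempfile.NamedTemporaryFile("wb", suffix=".dat", delete=False) as tmp:
        tmp.write(b"12345\n67890\nABCDE\n")
    print(compare_with_layout(check_fixed_width(tmp.name), 3, 5))
    os.remove(tmp.name)
```

Collecting every distinct record length, rather than stopping at the first mismatch, makes a truncated or padded record easy to spot in one pass.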
Create FTP path & directories
Create Readme file
Load all files into the appropriate place
Update all related Web pages (Site Additions, List of Products, Product Release Table, Web Site page, etc.)
Announce the addition on DLILIST
To help users keep track of product receipt, we have a Product Release Table on the Web
When data availability is announced in The Daily, we add the product to the table
We must first ensure that a DLI product (e.g. PUMF) will be produced
Contact the division to determine the release date of the PUMF
Monitor and update the Product Release Table on an ongoing basis
The Daily is the official release vehicle of STC
All data must first be announced in The Daily before being disseminated to the public
An announcement in The Daily does not mean that a public file is available
It could just be an announcement of data availability, e.g. for custom requests
See the Fall 2000 DLI Update for a detailed description of how STC releases data in The Daily
DLI Update
A newsletter designed to inform, teach and share information, containing articles written by various DLI Contacts and Team members
Produced by the Liaison/Communications Officer
Originally intended to be produced on a semi-regular basis (there were 3 issues in 1997, 2 in 1998, 1 in 1999 and 2 in 2000)
Due to a lack of resources, no new issue has been produced since fall 2000
SPSS
Many files have not come with SPSS descriptions; these are created by the DLI Team
Older files often lack French versions of the documentation, which makes it extremely difficult to create French SPSS
Creation of these SPSS labels can take some time after receipt of documentation, depending on workload, size of file, and whether any documentation is in electronic format
SPSS
We are starting to receive some SPSS descriptors from author divisions
When SPSS is supplied by the author division, it can require major editing to fit “DLI users” requirements (e.g. length of variable and value labels)
The preparation of SPSS is a major undertaking, and the DLI Team expends 1+ FTE on this activity
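To make the SPSS work more concrete, the following sketch generates SPSS command syntax (DATA LIST, VARIABLE LABELS, VALUE LABELS) from a record layout and truncates labels to fixed lengths. It is a minimal Python sketch under stated assumptions: the `Variable` structure, `quote` and `spss_syntax` are hypothetical, and the default limits of 120 characters for variable labels and 60 for value labels are assumptions, since the limits differ between SPSS versions. It does not describe the Team's actual tools.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Variable:
    """One field from a record layout (hypothetical structure)."""
    name: str                      # SPSS variable name
    start: int                     # first column, 1-based
    end: int                       # last column, inclusive
    label: str = ""                # variable label from the codebook
    is_string: bool = False        # True for alphanumeric fields
    values: Dict[object, str] = field(default_factory=dict)  # code -> value label


def quote(text: str, max_len: Optional[int] = None) -> str:
    """Optionally truncate, then wrap in SPSS single quotes, doubling apostrophes."""
    if max_len is not None:
        text = text[:max_len]
    return "'" + text.replace("'", "''") + "'"


def spss_syntax(data_file: str, layout: List[Variable],
                var_label_max: int = 120, value_label_max: int = 60) -> str:
    """Build DATA LIST, VARIABLE LABELS and VALUE LABELS commands for a fixed-width file."""
    lines = [f"DATA LIST FILE={quote(data_file)} FIXED RECORDS=1 /"]
    for v in layout:
        cols = f"{v.start}-{v.end}" if v.end > v.start else str(v.start)
        lines.append(f"  {v.name} {cols}" + (" (A)" if v.is_string else ""))
    lines[-1] += "."

    labelled = [v for v in layout if v.label]
    if labelled:
        lines.append("VARIABLE LABELS")
        for i, v in enumerate(labelled):
            sep = "  " if i == 0 else "  /"
            lines.append(f"{sep}{v.name} {quote(v.label, var_label_max)}")
        lines[-1] += "."

    for v in layout:
        if v.values:
            pairs = " ".join(
                f"{quote(str(code)) if v.is_string else code} {quote(lab, value_label_max)}"
                for code, lab in v.values.items())
            lines.append(f"VALUE LABELS {v.name} {pairs}.")
    return "\n".join(lines)
```

Truncation happens before apostrophes are doubled, so the limit counts the characters the user will actually see in the label.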
Training
DLI project subsidises annual Regional Training Workshops
DLI Web page will provide links to various training materials
Team members assist in training workshops if and when required
CA*NET3 - new service being offered by DLI
A separate Internet line connecting individual universities, federal & provincial government labs and research institutes through provincially based Regional Advanced Networks (RANs)
Developed by CANARIE and private industry, it is the world’s first national optical Internet
DLI pays an annual fee to be connected to this line
This has increased download efficiency for DLI Contacts
THE COMMUNITY
The DLI is a partnership between Statistics Canada and the participating post-secondary institutions
There are currently 66 member institutions (BCIT left but Red Deer College joined)
We have received enquiries from some CEGEPs
We are thinking about foreign institutions with Canadian Studies programs
Major activities and the direction of the project are guided by the members themselves through the External Advisory Committee (EAC)
EAC meets twice a year to review and discuss major issues
Co-Chairs are in frequent contact with the DLI Manager and Director at STC
A new governance document on EAC will be on the Web soon
EAC is made up of the following academic members:
Elizabeth Hamilton - UNB
Mary MacLeod - Acadia
Gaetan Drolet - Laval
Barbara Znamirowski - Trent
Wendy Watkins - Carleton (Co-Chair)
Mark Leggott - U of Winnipeg
Chuck Humphrey - U of Alberta (Co-Chair)
Walter Piovesan - Simon Fraser
Other voting members:
Bruno Gnassi - DSP
Jeffery Smith - Asst Director, Special Surveys
Ernie Boyko - STC
Mike Sivyer - STC
Other participants include:
Rest of DLI Team
Members of different STC divisions, often invited to make presentations to EAC on their products, etc.
There are a number of advantages to belonging to DLI:
The DLI provides the academic community with “one stop shopping” for STC products at affordable prices
Provides a forum for sharing information and obtaining advice
Value added to basic STC products (e.g. SPSS)
Participation in training workshops is also a great “community builder”