New ways to monitor the results of science investments Julia Lane American Institutes for Research University of Strasbourg University of Melbourne
Dec 14, 2015
Key messages
• United States universities and agencies are building a people-based framework drawing on STAR METRICS/UMETRICS
• The EU and Australia/New Zealand have similar potential
• CERIF and CASRAI can help inform the effort – focus on simplicity and value added of common standards
Overview
• Motivation
• Conceptual Framework
• Empirical Framework
– What
– Who
– What results
• International activities
How much should a nation spend on science? What kind of science? How much from private versus public sectors? Does demand for funding by potential science performers imply a shortage of funding or a surfeit of performers? . . . A new “science of science policy” is emerging, and it may offer more compelling guidance for policy decisions and for more credible advocacy
Key questions
We spend a lot
Note…the data don’t exist
An Opportunity . . . STAR METRICS represents a valuable step toward developing detailed, broadly accessible and nationally representative data that would allow systematic and scientific analysis of the organization, productivity, and at least some of the effects of federally funded research [but] . . .
1. . . . STAR METRICS data are largely inaccessible . . .
2. . . . data collection could usefully be expanded to include more universities and other performers . . .
3. . . . STAR METRICS data would be more useful if steps were taken to ensure the data can be flexibly linked to other data sources [such as] those maintained by the federal statistical and science agencies . . . as well as proprietary data sources . . . Creating a robust and linkable dataset may require the addition of individual and organizational identifiers.
P. 4-10
A conceptual framework
[Diagram: Science investments fund universities]
Discovery, Learning, Dissemination
Jobs, Stimulus (Hiring, Spending)
Knowledge, People, Skills
Innovation, Entrepreneurship, Economic Growth
Public Health, Food Safety
Security, Rational Policy
…
Framework
Most policy discussions focus here, on expenditures and obligations. The emphasis is grants, not people.
Science is not a jobs program, but research has substantial short-term stimulus effects (Weinberg et al. 2014)
Improving institutions for discovery and training requires understanding the knowledge production function
Estimating network effects on outcomes
- How does network structure and composition affect student research and career outcomes?
- Investments create and sustain teams, and those networks do research and train students
- Identification strategy: Use higher order features of networks as instruments
This is the real public value of university research and the United States is building a systematic basis to evaluate, explain, or improve effects.
Develop and validate measures of knowledge transmission from academia to industry
- Document and inform knowledge transfer
STAR METRICS/UMETRICS
The Empirical Framework
Source: Ian Foster, University of Chicago
Lots of data
• STAR Metrics employee transactions
– Caltech employee data contains 619,113 transactions and 11,939 employees spanning 1999-2012
– Purdue employee data contains 359,767 transactions and 27,248 employees spanning 2008-2013
• STAR Metrics vendor transactions
– Caltech vendor data contains 5,564,643 transactions and 15,646 vendors spanning 1999-2012
Describing What is funded
Source: Ian Foster, University of Chicago
Different text analytics paradigms: best is to use statistical machine learning augmented with lexicons and linguistics (Rayid Ghani)

Paradigm | Lexicon-based Rules | Linguistic Rules | Statistical Machine Learning (augmented with linguistics & lexicons)
Description | Rules based on lists of words | Rules using words and linguistic operators (parts of speech, for example) | Statistical approaches that can be trained and learn over time. Can incorporate lexicons and linguistics as well
Ease of creation & maintenance | Low | Low | High
Accuracy | Low | Medium | High
Context sensitiveness | Low | High | High
Interpretability | High (unless the rules get large) | Medium | Medium
In short | Do it by hand | Hire linguists | Do it the right way
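As a rough illustration of the third paradigm, the sketch below trains a tiny multinomial naive Bayes classifier on word tokens and adds one extra feature token whenever a word appears in a lexicon. Everything here is an assumption made for illustration: the lexicon entries, the topic labels, and the four training sentences are invented, and naive Bayes stands in for whatever statistical model a real project would choose.

```python
import math
from collections import Counter, defaultdict

# Hypothetical lexicon: the tags and terms are invented for this sketch.
LEXICON = {
    "HEALTH_TERM": {"cancer", "clinical", "disease", "tumor"},
    "ENERGY_TERM": {"solar", "battery", "grid"},
}

def featurize(text):
    """Return the word tokens plus one lexicon tag per lexicon hit."""
    words = text.lower().split()
    features = list(words)
    for tag, terms in LEXICON.items():
        features.extend(tag for word in words if word in terms)
    return features

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(featurize(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feature in featurize(text):
                score += math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for illustration only.
train_texts = [
    "clinical trial of a cancer therapy",
    "disease screening in rural clinics",
    "solar cell efficiency improvements",
    "battery storage for the power grid",
]
train_labels = ["health", "health", "energy", "energy"]
model = NaiveBayes().fit(train_texts, train_labels)
```

The word "tumor" never appears in the training sentences, yet `model.predict("tumor imaging study")` can still lean toward the health label because the lexicon maps it to a tag the model has already seen. That is the sense in which a lexicon augments the statistical approach.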
Describing WHO is funded
Source: Ian Foster, University of Chicago
[Diagram: STAR Pilot Project data flow. Labels shown: Agency; Budget; Award; State Funding; Endowment Funding; Institution; Financial System; HR System; Procurement System; Subcontracting System; Hire, Buy, Engage; Personnel, Vendor, Contractor; Disbursement; Award; Record; Research Project; Start-Up; Papers; Patents; Download State; Existing Institutional Reporting; Acquisition and Analysis; Detailed Characterization and Summary; Direct Benefit Analysis; Intellectual Property Benefit Analysis; Innovation Analysis; Jobs, Purchases, Contracts Benefit Analysis]
Award data

File: Award
File Description: Overhead charged, grouped by Award

Data Element | XML/CSV Data Element Name | Definition | Type | Format | Examples | Max Length
Period Start Date | PeriodStartDate | The start date for the period. | Date | YYYY-MM-DD | 2009-10-01 | 10
Period End Date | PeriodEndDate | The end date for the period. | Date | YYYY-MM-DD | 2009-12-31 | 10
Unique Award Number | UniqueAwardNumber | Identifier specifying an award and its funding source, as defined by concatenating the 6-position funding source code (either the CFDA code or a STAR Other Funding Source (OFS) code) with an award identifier (either the federal award ID from the awarding Federal Agency, such as the federal grant number, federal contract number, or federal loan number, or an internal award ID for non-federal awards), with a space in between the two numbers. | String | [Funding Source] [Award Identifier] (##.### Award Identifier) | 28.124 FSJ123247.000 5544111 00.000 124658100.200 State Award 112.345 047.000 0 | 57
Recipient Account Number | RecipientAccountNumber | Research Institution's internal number for the award. | String | No required format | FS111222555 | 255
Overhead Charged | OverheadCharged | Actual Overhead dollars charged to the award in the specified period. | Number | Dollar value - numeric without $ sign or commas | 1222.31 | 18,2

File: Award Specific Info
File Description: Federal grant awards information

Data Element | XML/CSV Data Element Name | Definition | Type | Format | Examples | Max Length
Unique Award Number | UniqueAwardNumber | Same definition as in the Award file. | String | [Funding Source] [Award Identifier] (##.### Award Identifier) | 28.124 FSJ123247.000 5544111 00.000 124658100.200 State Award 112.345 047.000 0 | 57
Recipient Account Number | RecipientAccountNumber | Research Institution's internal number for the award. | String | No required format | FS111222555 | 255
Award Title | AwardTitle | Title of Award | String | No required format | Collaborative Research: Empirical Analyses of Committee Voting |

Frequency: Please provide a full quarter's worth of transactional data. Only sum data to a quarterly level if transactional data is unavailable.
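The Unique Award Number definition can be read as a small parsing rule: a 6-position funding source code of the form ##.###, one space, then an award identifier, with a maximum length of 57. The sketch below is one possible reading of that rule, not the official STAR METRICS validator. The function name, the `AwardNumber` type, and the sample values in the usage note are hypothetical.

```python
import re
from typing import NamedTuple

class AwardNumber(NamedTuple):
    funding_source: str  # 6-position code in the form ##.###
    award_id: str        # federal or internal award identifier

# Funding source code, a single space, then a free-form identifier.
_AWARD_PATTERN = re.compile(r"^(\d{2}\.\d{3}) (\S.*)$")
MAX_LENGTH = 57  # maximum length listed for this data element

def parse_unique_award_number(value: str) -> AwardNumber:
    """Split a Unique Award Number into funding source and identifier."""
    if len(value) > MAX_LENGTH:
        raise ValueError(f"longer than {MAX_LENGTH} characters: {value!r}")
    match = _AWARD_PATTERN.match(value)
    if match is None:
        raise ValueError(f"expected '##.### <award identifier>': {value!r}")
    return AwardNumber(match.group(1), match.group(2))
```

For example, `parse_unique_award_number("47.000 1234567")` would return the funding source `"47.000"` and the identifier `"1234567"`, while a value without the space-separated code raises `ValueError`. Identifiers containing spaces are accepted because the definition places no format requirement on them.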
Employee data

File: Employee
File Description: Percentage of Earnings Spent on Award, grouped by Employee

Data Element | XML/CSV Data Element Name | Definition | Type | Format | Examples
Period Start Date | PeriodStartDate | The start date for the transaction. | Date | YYYY-MM-DD | 2009-10-01
Period End Date | PeriodEndDate | The end date for the transaction. | Date | YYYY-MM-DD | 2009-10-31
Unique Award Number | UniqueAwardNumber | Identifier specifying an award and its funding source, as defined by concatenating the 6-position funding source code (either the CFDA code or a STAR Other Funding Source (OFS) code) with an award identifier (either the federal award ID from the awarding Federal Agency, such as the federal grant number, federal contract number, or federal loan number, or an internal award ID for non-federal awards), with a space in between the two numbers. | String | [Funding Source] [Award Identifier] (##.### Award Identifier) | 28.124 FSJ123247.000 5544111 00.000 124658100.200 State Award 112.345 047.000 0
Recipient Account Number | RecipientAccountNumber | Research Institution's internal number for the award. | String | No required format | FS111222555
De-identified Employee ID Number | DeidentifiedEmployeeIdNumber | Unique Employee ID (not Social Security number) of grant funded personnel | String | No required format | E998811
Occupational Classification | OccupationalClassification | Occupational classification / job description of the funded personnel (e.g. Faculty, Undergrad Student, Grad Student, Research Support, Technician/Staff Scientist, Post Graduate Researcher, Clinicians) | String | No required format | Associate Professor- Biology
FTE Status | FteStatus | Designation of the status (percent) of the funded personnel (full time = 1.0, half time = .5) | Number | Decimal number between 0 and 1 (inclusive) | 0.5
Proportion of Earnings Allocated to Award | ProportionOfEarningsAllocatedToAward OR ProportionOfEarningsAllocated | Calculated portion of earnings charged by funded personnel to the award in the specified period. | Number | Decimal number between -1000 and 1000 (exclusive) | 0.33
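The format constraints in the Employee file (YYYY-MM-DD dates, FTE status between 0 and 1 inclusive, proportion of earnings between -1000 and 1000 exclusive) lend themselves to a simple row check before any grouping by employee. The sketch below assumes each row arrives as a dictionary of strings keyed by the XML/CSV element names, as a CSV reader would produce. The helper names are hypothetical, and only the shorter `ProportionOfEarningsAllocated` name is handled.

```python
from collections import defaultdict
from datetime import datetime

def is_iso_date(value):
    """True for a YYYY-MM-DD calendar date such as 2009-10-01."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except (TypeError, ValueError):
        return False
    return len(value) == 10  # reject unpadded forms such as 2009-1-1

def in_range(value, low, high, inclusive):
    """True if value parses as a number inside the stated range."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    if inclusive:
        return low <= number <= high
    return low < number < high

def validate_employee_row(row):
    """Return the names of the fields that break a stated constraint."""
    problems = []
    for field in ("PeriodStartDate", "PeriodEndDate"):
        if not is_iso_date(row.get(field)):
            problems.append(field)
    if not in_range(row.get("FteStatus"), 0, 1, inclusive=True):
        problems.append("FteStatus")
    if not in_range(row.get("ProportionOfEarningsAllocated"),
                    -1000, 1000, inclusive=False):
        problems.append("ProportionOfEarningsAllocated")
    return problems

def allocation_by_employee(rows):
    """Sum the allocated proportion per employee, skipping invalid rows."""
    totals = defaultdict(float)
    for row in rows:
        if not validate_employee_row(row):
            employee = row["DeidentifiedEmployeeIdNumber"]
            totals[employee] += float(row["ProportionOfEarningsAllocated"])
    return dict(totals)
```

Skipping invalid rows silently is a choice made to keep the sketch short; a production pipeline would more likely log or return them for follow-up with the submitting institution.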
Employee data

File: Employee Name
File Description: Link employee names to Employee File

Data Element | XML/CSV Data Element Name | Definition | Type
De-identified Employee ID Number | DeidentifiedEmployeeIdNumber | Unique Employee ID (not Social Security number) of grant funded personnel | String
Employee Last Name | Lastname | Employee last name | String
Employee Middle Name | Middlename | Employee middle name | String
Employee First Name | Firstname | Employee first name | String
Employee DOB | DOB | Employee date of birth | Date
Vendor data

File: Vendor
File Description: Vendor payments, grouped by Vendor

Data Element | XML/CSV Data Element Name | Definition | Type | Format
Period Start Date | PeriodStartDate | The start date for the transaction. | Date | YYYY-MM-DD
Period End Date | PeriodEndDate | The end date for the transaction. | Date | YYYY-MM-DD
Unique Award Number | UniqueAwardNumber | Identifier specifying an award and its funding source, as defined by concatenating the 6-position funding source code (either the CFDA code or a STAR Other Funding Source (OFS) code) with an award identifier (either the federal award ID from the awarding Federal Agency, such as the federal grant number, federal contract number, or federal loan number, or an internal award ID for non-federal awards), with a space in between the two numbers. | String | [Funding Source] [Award Identifier] (##.### Award Identifier)
Recipient Account Number | RecipientAccountNumber | Research Institution's internal number for the award. | String | No required format
Vendor ID | VendorID | Internal identifier specifying the organization or institution of the vendor organization. | String | No required format
Vendor DUNS Number | VendorDunsNumber | The Vendor's 9-digit DUNS number. If DUNS is unavailable, substitute the zip code with a "Z" prefix to distinguish it from the DUNS number. If using foreign zip codes, prefix with "F" instead of "Z". | String | DUNS: #########; ZIP/Postal Code: Z#####, Z#####-####, Z#########; Foreign_postal_code
Vendor Payment Amount | VendorPaymentAmount | The funds charged to the award by the vendor in the specified period. | Number | Dollar value - numeric without $ sign or commas
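Because the Vendor DUNS Number field can carry three different kinds of value (a 9-digit DUNS number, a "Z"-prefixed ZIP code, or an "F"-prefixed foreign postal code), downstream linkage usually needs to know which one it is looking at. The sketch below classifies a value using the listed formats. The function name and return labels are hypothetical, and since the source gives no pattern for foreign postal codes, any non-empty text after "F" is accepted as an assumption.

```python
import re

def classify_vendor_identifier(value: str) -> str:
    """Classify a VendorDunsNumber value by the format it follows."""
    if re.fullmatch(r"\d{9}", value):
        return "duns"
    # ZIP forms listed in the format column: Z#####, Z#####-####, Z#########
    if re.fullmatch(r"Z(\d{5}|\d{5}-\d{4}|\d{9})", value):
        return "us_zip"
    # Assumption: foreign postal codes have no fixed pattern after the F.
    if re.fullmatch(r"F\S.*", value):
        return "foreign_postal"
    return "invalid"
```

A bare five-digit ZIP code without the prefix is reported as `"invalid"`, which makes missing prefixes visible before any attempt to match vendors against DUNS-keyed data.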
Subaward data

File: Sub-Award
File Description: Sub-Award Payments, grouped by Sub-Award Recipient

Data Element | XML/CSV Data Element Name | Definition | Type
Period Start Date | PeriodStartDate | The start date for the transaction. | Date
Period End Date | PeriodEndDate | The end date for the transaction. | Date
Unique Award Number | UniqueAwardNumber | Identifier specifying an award and its funding source, as defined by concatenating the 6-position funding source code (either the CFDA code or a STAR Other Funding Source (OFS) code) with an award identifier (either the federal award ID from the awarding Federal Agency, such as the federal grant number, federal contract number, or federal loan number, or an internal award ID for non-federal awards), with a space in between the two numbers. | String
Recipient Account Number | RecipientAccountNumber | Research Institution's internal number for the award. | String
Institution ID | InstitutionID | Internal identifier specifying the organization or institution of the sub-recipient organization. | String
Sub-Award Recipient DUNS Number | SubAwardRecipientDunsNumber | The sub-recipient organization's 9-digit DUNS number. If DUNS is unavailable, substitute the zip code with a "Z" prefix to distinguish it from the DUNS number. If using foreign zip codes, prefix with "F" instead of "Z". | String
Sub-Award Payment Amount | SubAwardPaymentAmount | The funds charged to the award by the sub-awardee in the specified period. | Number
File: Institution / Vendor Crosswalk
File Description: Sub-Award Payments, grouped by Sub-Award Recipient

Data Element | XML/CSV Data Element Name | Definition | Type
Organization or Institution ID | OrgID | Internal identifier specifying the organization or institution of the sub-recipient organization. | String
Organization or Institution Name | SubawardName | Name of the sub-recipient or vendor organization | String
Organization or Institution City | City | City of the sub-recipient or vendor organization | String
Organization or Institution State | State | State of sub-recipient or vendor organization | String
Organization or Institution Zip Code | Zip Code | Zip Code of sub-recipient or vendor organization | String
Organization or Institution Country | Country | Country of sub-recipient or vendor organization | String
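The crosswalk exists so that payment rows, which carry only an internal organization identifier, can be grouped by recipient and labeled with a name. A minimal sketch of that grouping follows. It assumes the `InstitutionID` in the sub-award file holds the same value as `OrgID` in the crosswalk (the source does not say so explicitly), that rows are dictionaries of strings, and that the function name and the "UNKNOWN" placeholder are free choices for this sketch.

```python
from collections import defaultdict
from decimal import Decimal

def payments_by_recipient(subaward_rows, crosswalk_rows):
    """Total sub-award payments per recipient, labeled with its name."""
    names = {row["OrgID"]: row["SubawardName"] for row in crosswalk_rows}
    totals = defaultdict(Decimal)  # Decimal() starts at 0
    for row in subaward_rows:
        # Amounts are numeric strings without $ sign or commas.
        totals[row["InstitutionID"]] += Decimal(row["SubAwardPaymentAmount"])
    return {
        org_id: (names.get(org_id, "UNKNOWN"), total)
        for org_id, total in totals.items()
    }
```

`Decimal` is used instead of `float` so that dollar amounts add up exactly. Recipients missing from the crosswalk are kept rather than dropped, so gaps in the crosswalk remain visible.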
HR data
Lots of opportunities
Data issues
• Occupational coding
• Transaction data
• Standardization
• Linkages
• Gaps
Conceptual issues
• Multiple sources of grant funds
• Multiple units of analysis (individual, PI, grant, research field)
• …
Example of occupational coding
prof school project assistant | 9514-Prof School Project Assistant
graduate instructor | 9515-Graduate Instructor
md student project assistant | 9516-MD Student Project Assistant
ph d candidate grad instructor | 9517-Ph D Candidate Grad Instructor
phd candidate teaching asst | 9519-PhD Candidate Teaching Asst
research assistant | 9521-Research Assistant
undergrad research asst i | 9522-Undergrad Research Asst I
undergrad research asst ii | 9523-Undergrad Research Asst II
ugrad rsrch asst(non-univ stu) | 9524-Ugrad Rsrch Asst(Non-Univ Stu)
ugrad tchg asst (non-univ stu) | 9525-Ugrad Tchg Asst (Non-Univ Stu)
graduate research project asst | 9526-Graduate Research Project Asst
ph d cand grad rsrch proj asst | 9527-Ph D Cand Grad Rsrch Proj Asst
advanced masters research asst | 9528-Advanced Masters Research Asst
phd candidate research asst | 9529-PhD Candidate Research Asst
administrative fellow | 9531-Administrative Fellow
phd candidate admin fellow | 9533-PhD Candidate Admin Fellow
professional program assistant | 9535-Professional Program Assistant
legal proj asst (w/o tuit ben) | 9539-Legal Proj Asst (w/o Tuit Ben)
pharmacy associate | 9540-Pharmacy Associate
pre-doctoral assistant | 9545-Pre-Doctoral Assistant
post-doctoral associate | 9546-Post-Doctoral Associate
veterinary medical resident | 9548-Veterinary Medical Resident
veterinary resident-grad prgm | 9549-Veterinary Resident-Grad Prgm
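The occupational coding example pairs a lowercased job title with a coded title, which suggests a normalize-then-look-up step. The sketch below shows that step for a handful of the pairs listed in the example. The function names are hypothetical, and the normalization (lowercase, trimmed, single spaces) is an assumption about how titles were cleaned rather than a documented rule.

```python
import re

# A few title/code pairs copied from the occupational coding example.
CODE_TABLE = {
    "research assistant": "9521-Research Assistant",
    "undergrad research asst i": "9522-Undergrad Research Asst I",
    "phd candidate research asst": "9529-PhD Candidate Research Asst",
    "post-doctoral associate": "9546-Post-Doctoral Associate",
}

def normalize_title(raw: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", raw.strip().lower())

def code_title(raw: str):
    """Return the coded title, or None when the title is not in the table."""
    return CODE_TABLE.get(normalize_title(raw))
```

Titles that fall outside the table return `None`, so uncoded occupations can be counted and reviewed instead of being forced into the nearest category.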
Unit of analysis: Networks
Jason Owen-Smith
Unit of analysis: Projects
Describing the Results
Source: Ian Foster, University of Chicago
Summary Statistics: Joint Frequency by NAICS and Occupation
Most Purdue matches were Faculty members who performed Consulting and Educational Services
[Charts: match counts by occupation (Faculty, Graduate, Undergrad, Other) and by NAICS sector (54 - Professional Services, 61 - Educational Services, 81 - Other Services, 71 - Arts & Entertainment, 45 - Retail Trade, 56 - Administrative, 62 - Health Care, Other). Count axes run from 0 to 1200 and from 0 to 1400.]
Engage internationally
Example for international universities
What work is being done
Key messages
• United States universities and agencies are building a people-based framework drawing on STAR METRICS/UMETRICS
• The EU and Australia/New Zealand have similar potential
• CERIF and CASRAI can help inform the effort – focus on simplicity and value added of common standards
Comments and questions?