T2D + DATA IDENTIFICATION, CURATION & DURATION
Maxine Tedesco
ACCOLEDS: December 2-4, 2009
Jan 13, 2016
TABLE TO DATA (T2D) PROJECT
Approved March/08 at the COPPUL director’s meeting as a collaborative project seeking to implement a system of linking articles & data in open access journals published at COPPUL institutions.
T2D ACTIVITIES TO DATE
May/08: Brainstorming at IASSIST conference
July/08: Drupal Wiki established & “Outline of Activities” disseminated to project members
Fall/08: Maxine undertook a Literature Search (building on work done by Jim Jacobs, Feb/08)
December/08: Maxine reported at ACCOLEDS and renewed effort to involve project members
Spring/09: Maxine investigated related project topics in connection with Study Leave research
Additionally, Chuck liaised/advocated for the project throughout the timeline & consultation with OA publishers was undertaken by some project members.
T2D PROJECT STAGES
1. Investigating: Literature Searches re: background, tools, etc.
2. Recruiting: Open access publishers amenable to a pilot project; Researchers willing to deposit data
3. Marking: Develop a set of descriptive tags for table content; Identify which parts of a data file “should” be linked and/or archived
4. Tooling (i.e., tools for markup, searching & display)
5. Evaluating/Reporting (i.e., HOW the project results contribute to research, teaching & learning)
SO … WHAT IS IN IT FOR US?
This seemed like a reasonable question to investigate further in the research in terms of “background information”.
TAKING INTO ACCOUNT RESEARCHERS’ DISCIPLINARY DIFFERENCES, TABLES/FIGURES ARE INCREASINGLY:
used as a more effective summary of the article’s content than subject headings or other descriptors
used as a quick means of identifying types of data, methodologies &/or results
used to assess article relevance before reading the entire article
less effective if completely extracted from the surrounding explanatory text and/or complementary tables/figures
DISAGGREGATION
Disaggregation of article components such as tables/figures facilitates searching at a greater level of granularity in order to:
Improve search precision (the share of retrieved items that are relevant) & recall (the share of relevant tables/figures that are retrieved, including those a traditional search would miss)
Facilitate the REAGGREGATION of a journal article’s components into new forms/formats
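To make the precision and recall point concrete, here is a minimal sketch in Python. The item identifiers and both result lists are invented for illustration only; they are not drawn from any of the products or searches discussed in this presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four tables/figures that a researcher would judge relevant.
relevant = {"table-1", "table-2", "figure-3", "table-4"}

# Hypothetical results of a traditional article-level search.
article_level = ["table-1", "article-9", "article-12", "article-15"]

# Hypothetical results of a search over disaggregated tables/figures.
table_level = ["table-1", "table-2", "figure-3", "article-9"]

print(precision_recall(article_level, relevant))  # (0.25, 0.25)
print(precision_recall(table_level, relevant))    # (0.75, 0.75)
```

In this made-up case, searching at the table/figure level both raises the share of useful results and finds more of the relevant items, which is the argument for disaggregation.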
REAGGREGATION?
Researchers wish to easily incorporate tabular information:
into new documents (to support original research)
into multimedia documents (to support presentations - classrooms or conferences)
into other contexts (utilize data in pre-existing tables rather than generate new time-consuming and/or expensive datasets)
into a comparison of similar information (to check one’s own work against other work)
SO … WHAT CAN MAKE IT EASIER TO RETRIEVE RELEVANT TABLES/FIGURES?
The research was decidedly sparse in this area or not quite as “on-topic” as one would have hoped.
OVERVIEW OF LITERATURE REVIEW
The research mostly dealt with such topics as:
Making T&F (tables/figures) more accessible to the visually impaired.
Improved graphical presentation of T&F.
Poor quality of T&F replication in electronic versions of documents.
Improved dissemination of statistical information.
Full-text does not necessarily mean the inclusion of T&F.
FORMAT-SPECIFIC DATABASES
TableBase (Gage; 1997+)
table title, table text, and descriptor fields are searchable
text that accompanies the table is not searchable or retrievable from the product
tables are directly downloadable to Excel
Statistical Universe (Lexis-Nexis PowerTables; 2000+)
users search by “criteria”
links to full-text documents in the CIS/LEXIS-NEXIS digital archive & on WWW sites
download a PDF file or an Excel spreadsheet
SEARCH RESULTS from TableBase
TYPICAL RECORD in TableBase
DATABASES WITH “DEEP INDEXING” FEATURES
Illustrata (ProQuest/CSA; 2006+)
assigns 7-8 index terms per image (these are searchable but not the table text itself)
thumbnail images for quick preview
links to full-text and other components within the product
Selected ProQuest Databases (Oct. 1, 2009+)
deep indexing of images added along with traditional abstracting & indexing of text (at no additional cost)
ILLUSTRATA RESULTS PAGE
ILLUSTRATA ARTICLE RECORD
ILLUSTRATA OBJECT RECORD
GEOREF DATABASE’S LINK TO “DEEP INDEXING”
ABSTRACT RETRIEVED FROM GEOREF FOR “AERONOMY” AND “MAPS”
PRODUCTS THAT INDEX TABLE CONTENT
TableSeer (search engine; 2006+)
automatically identifies tables in digital documents and extracts the contents in the cells of the tables
contents are stored in a queryable table in a database which extracts table metadata and uses a novel ranking function to search for tables relevant to user queries
BioText Search Engine (freely available web-based application; 2007+)
searches over 300 open access journals
ability to search for words within a table
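The general idea of indexing table content (as opposed to indexing only titles or assigned descriptors) can be sketched in a few lines of Python. This is a toy illustration only: the `TableIndex` class, its method names, the sample tables, and the simple term-count scoring are all assumptions made for this example. It is not TableSeer's ranking function or BioText's implementation, neither of which is described in detail here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TableIndex:
    """Toy index over table captions and cell contents (hypothetical design)."""

    def __init__(self):
        self.tables = {}  # table_id -> Counter of terms in caption + cells

    def add_table(self, table_id, caption, rows):
        terms = Counter(tokenize(caption))
        for row in rows:
            for cell in row:
                terms.update(tokenize(str(cell)))
        self.tables[table_id] = terms

    def search(self, query):
        """Return table ids ranked by how often the query terms occur.

        Plain term counting is a stand-in for a real ranking function.
        """
        query_terms = tokenize(query)
        scored = []
        for table_id, terms in self.tables.items():
            score = sum(terms[t] for t in query_terms)
            if score > 0:
                scored.append((score, table_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [table_id for _, table_id in scored]


# Hypothetical tables, invented for illustration.
index = TableIndex()
index.add_table(
    "t1",
    "Cholesterol levels by education",
    [["Education", "Mean LDL"], ["High school", 130], ["University", 121]],
)
index.add_table(
    "t2",
    "Sample sizes by province",
    [["Province", "N"], ["Alberta", 250]],
)

print(index.search("education"))  # ['t1']
print(index.search("alberta"))    # ['t2']
```

The second query matches only on a cell value ("Alberta"), which a search limited to table titles or index terms would not find.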
TABLESEER IS PART OF CHEMXSEER
http://chemxseer.ist.psu.edu/
BIOTEXT SEARCH IN ARTICLES FOR: “HYPERCHOLESTEROLEMIA” & “EDUCATION”
SAME BIOTEXT SEARCH IN “FIGURE CAPTIONS” – GRID VIEW
SAME BIOTEXT SEARCH IN “TABLES”
SO … WHAT DOES THIS ALL MEAN FOR THE T2D PROJECT?
Not exactly sure, but perhaps, in seeing this trend in the Abstracting & Indexing industry, we might investigate developing a “SocioText” type of product to index open access journals such as the Canadian Journal of Sociology?
SO … WHAT ELSE NEEDS TO BE “PUT ON THE TABLE”?
What if the table information is insufficient and I want to look at the entire dataset?
Where is the entire dataset?
Who owns the entire dataset?
When will it become available for me to use?
How can I get my hands on it?
IDENTIFIC/CUR/DUR-ATION!
Personal Websites
Institutional Repositories
Subject-specific Repositories such as:
Dryad - http://datadryad.org/repo
ExLab - http://exlab.bus.ucf.edu
AND THEN PERHAPS, there’s still:
Desk Drawers (aka: LOST)
SO . . . WHAT DO WE DO NOW?
Hopefully I’ve been able to provide some context and/or “food for thought” and, well . . .
stay tuned for updates!