quads.esds.ac.uk/squad

THE PROJECT

SMART QUALITATIVE DATA: METHODS AND COMMUNITY TOOLS FOR DATA MARK-UP

SQUAD aims to explore methodological and technical solutions for ‘exposing’ digital qualitative data to make them fully shareable and exploitable. The main objectives are to:

• specify, test and propose an eXtensible Markup Language (XML) schema for storing and marking up qualitative data
• investigate requirements for contextualising qualitative data and developing standards for data documentation
• develop semi-automated tools, using natural language processing (NLP), for preparing marked-up qualitative data for sharing
• research tools for publishing and interrogating data via the web: Qualitative Data Mark-Up Tools (QDMT)

WHAT FEATURES OF TEXT CAN BE MARKED UP?

Spoken interview texts provide the clearest and most common example of the types of encoding features that can be marked up. There are three basic groups of structural features:

• utterances, specific turn takers and defining idiosyncrasies in transcription
• links to analytic annotation and other data types (e.g. thematic codes, concepts, audio or video links, researcher annotations)
• identifying information such as real names, company names, place names, occupations and temporal information

USING NLP TOOLS

Information Extraction (IE) is a sub-field of NLP which aims to identify key pieces of information in texts using 'shallow' analysis techniques. A typical IE system will perform Named Entity Recognition, in which particular kinds of proper names and terms are identified, classified and marked up. This is a means of annotating documents with semantic metadata, enabling resource discovery and data exploration. The Edinburgh LT-XML and CME tools have been used to process the data.

Example: Italy's business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of Arthur Anderson.
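
To make the idea of named entity mark-up concrete, the following is a minimal Python sketch that wraps the entities in the example sentence in XML tags. It is not the Edinburgh LT-XML or CME tooling: it assumes a small hand-written lookup table (`GAZETTEER`) in place of a trained recogniser, and the `<s>` and `<entity type="...">` element names, the type labels and the `mark_up` function are all hypothetical choices made for illustration.

```python
import re
from xml.sax.saxutils import escape, quoteattr

SENTENCE = (
    "Italy's business world was rocked by the announcement last Thursday "
    "that Mr. Verdi would leave his job as vice-president of Music Masters "
    "of Milan, Inc to become operations director of Arthur Anderson."
)

# Hypothetical hand-written lookup: entity string -> entity type.
# A real IE system would recognise these with rules or a trained model.
GAZETTEER = {
    "Italy": "location",
    "last Thursday": "date",
    "Mr. Verdi": "person",
    "Music Masters of Milan, Inc": "organisation",
    "Arthur Anderson": "organisation",
}


def mark_up(text, gazetteer):
    """Wrap every gazetteer entry found in text in an <entity> element."""
    if not gazetteer:
        return "<s>" + escape(text) + "</s>"
    # Try longer names first so a long name wins over a shorter overlapping one.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Copy the untagged text before the match, escaping &, < and >.
        parts.append(escape(text[pos:match.start()]))
        entity_type = gazetteer[match.group(0)]
        parts.append(
            "<entity type=" + quoteattr(entity_type) + ">"
            + escape(match.group(0)) + "</entity>"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<s>" + "".join(parts) + "</s>"


print(mark_up(SENTENCE, GAZETTEER))
```

The same identifying information (names, organisations, places, dates) is what would need to be located before data can be anonymised or searched, which is why a lookup like this is only a stand-in for a proper recogniser that can handle names it has not seen before.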
DEFINING CONTEXT

Rich context enables informed re-use of data, but defining how to provide context for raw data to make them more ‘usable’ is complex. ESDS Qualidata has done much to establish informal ways of documenting raw data. Micro- and macro-level features should be considered, including:

• how the research question was framed
• the research application process
• project progress
• fieldwork situations
• analysis processes

Fieldwork observations are useful, as are timelines and political chronologies. Equally, when undertaking a replication or restudy, detailed information on sampling procedures, fieldwork approaches and question guides will be essential. SQUAD has identified a minimal generic set of elements that represent a baseline for contextualising data.
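
As a rough illustration of how the contextual features listed in this section might be recorded alongside a dataset, the sketch below builds a small XML context record in Python and reports which features are still undocumented. The element names, the `study` attribute and both functions are hypothetical; they are not the element set identified by SQUAD, and the note text is a placeholder rather than real study documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, one per contextual feature named in the text.
CONTEXT_FIELDS = [
    "researchQuestionFraming",
    "researchApplicationProcess",
    "projectProgress",
    "fieldworkSituations",
    "analysisProcesses",
]


def build_context_record(study_id, notes):
    """Return a <context> element with one child per contextual feature."""
    root = ET.Element("context", {"study": study_id})
    for field in CONTEXT_FIELDS:
        child = ET.SubElement(root, field)
        # Features without a note are kept as empty elements.
        child.text = notes.get(field, "")
    return root


def missing_fields(record):
    """List the contextual features that have no documentation yet."""
    return [child.tag for child in record if not (child.text or "").strip()]


# Placeholder note, for illustration only.
record = build_context_record(
    "example-study",
    {"fieldworkSituations": "(description of the interview settings goes here)"},
)
print(ET.tostring(record, encoding="unicode"))
print(missing_fields(record))
```

Keeping undocumented features as empty elements, rather than omitting them, makes gaps in the context visible to anyone re-using the data.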