7/30/2019 Samuel Läubli, Martin-Dietrich Glessgen, Phoenix2. A Tool for Web-Based Annotation of Medieval Texts
Phoenix2: A Tool for Web-Based Annotation of Medieval Texts
COST Workshop Connecting Textual Corpora and Dictionaries
Samuel Läubli (1, 2), Martin-Dietrich Glessgen (1)
(1) Institute of Romance Studies, University of Zurich
(2) Institute of Computational Linguistics, University of Zurich
April 26, 2013
Contents
1. Background: Corpus, Digital Edition, Tools
2. Phoenix2 in Use: Import, Querying, Annotation, External Editing
3. Hands-On Session
4. Conclusion
1. Background
Les plus anciens documents linguistiques de la France
Corpus
Les plus anciens documents linguistiques de la France (DocLing)
Old French charters of the 13th century
Collection founded by Jacques Monfrin (École Nationale des Chartes)
Now pursued by Martin-Dietrich Glessgen (University of Zurich)
Currently comprises over 2000 documents from different regions
Corpus
Départements             Editors [Adaptors]             # Doc.

1. Published Volumes
Oise                     Carolus-Barré [Tock, Grübl]    202
Haute-Marne              Gigot [Tock, Kihaï]            142
Vosges                   Lanher [Trotter]               285
Aube, S.-et-M., Yonne    Coq                            103

2. Revised Volumes
Meurthe-et-Moselle       Arnod, Glessgen                290
Douai                    Mestayer, Brunner              350

3. New Volumes in Progress
Jura                     Muller                         105
Marne                    Kihaï                          230
Meuse                    Matthey                        250
Moselle                  Pitz                           180
Nièvre                   Alletsgruber                   30
Haute-Saône              Muller                         155
Saône-et-Loire           Alletsgruber                   95
Chancellerie royale      Videsott                       150 [+350]

Adapted from [Glessgen, 2011]
Digital Edition
Project lead: Martin-Dietrich Glessgen
Aimed at editing Old French charters of the 13th century
Charters are manually transcribed into a machine-readable format
Double encoding principle:
a) Original (ancient) view
b) Modern view
Use the same data for print and online editions
Digital Edition Requirements
Functional Requirements:
Editor for assisting editors in transcribing charters
Storage and management of transcribed charters
Querying of transcribed charters
Annotation
  Text level (date, genre, regest, ...)
  Word level (lemma, PoS, morphology, ...)
Export in distinct formats for:
  Print publication
  Web publication
  Research (working formats)
  Use within other tools
  External editing
Digital Edition Requirements
Functional Requirements:
[Figure: UML control flow of the working process. Programs: XML editor, tagging tool, lexicographic tool. Entities/data: charter, xml-charter, enhanced xml-charter, mapping entry. Steps shown: S-1.1, S-1.2.]
Digital Edition Requirements
Quality Requirements:
Powerful yet easy to use
Fast querying
Easily accessible (client-server architecture)
Use of non-commercial technology
Phoenix2: Architecture
Phoenix2 is a web-based tool for managing, querying, and annotating
medieval texts.
Informal overview of the technology stack:
PHOENIX2 Web Interface (browser-based)
  CSS: phoenix2-css (CSS framework)
  XHTML
  JavaScript: jQuery (JavaScript framework)
  PHP
  MySQL (RDBMS)
  Apache (web server)
2. Phoenix2 in Use
Live Demonstration
Live Demonstration
Importing Texts
Machine-Readable Format: XML/XSD
Phoenix2 builds upon texts encoded in an idiosyncratic XML format. We
use three schemata:
entry: Lightweight markup aimed at facilitating the initial transcription of charters (original format). Either tokenized or untokenized.
storage: Main format for use within Phoenix2. Thoroughly tokenized; all tokens are typed (tok/num/punct).
edit: Similar to storage, but slightly adapted for use in external XML editors.
  Extra attributes for word-level annotations
  Checksums for re-import into Phoenix2 (check-in)
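The slides do not show the schemata themselves, so the following is only a rough sketch of the two ideas named above: typing every token as tok, num, or punct, and computing a checksum that a check-in step could compare. The element name "text", the attributes "id" and "n", the tokenization rule, and the choice of SHA-256 are all assumptions made for illustration; only the three type names come from the slide.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical tokenization rule: a run of digits, a run of letters,
# or one single non-space, non-word character (punctuation).
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|[^\w\s]")


def classify(token: str) -> str:
    """Return the token type; the three names come from the slide."""
    if token.isdigit():
        return "num"
    if token[0].isalpha():
        return "tok"
    return "punct"


def to_storage(text_id: str, raw: str) -> ET.Element:
    """Build a fully tokenized tree (element/attribute names are assumed)."""
    root = ET.Element("text", {"id": text_id})
    for n, token in enumerate(TOKEN_RE.findall(raw), start=1):
        element = ET.SubElement(root, classify(token), {"n": str(n)})
        element.text = token
    return root


def checksum(element: ET.Element) -> str:
    """Hash the serialized tree; SHA-256 is an assumption, not Phoenix2's choice."""
    return hashlib.sha256(ET.tostring(element, encoding="utf-8")).hexdigest()


# Made-up sample line, not taken from the corpus.
doc = to_storage("demo-1", "Ce fu fait en l'an 1247, el mois de mai.")
print(ET.tostring(doc, encoding="unicode"))
print(checksum(doc))
```

A check-in step could recompute the checksum after external editing and compare it with the stored value to detect which parts of a text were changed.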
Indexing Texts in a Relational Database
Why does importing texts take quite a while?
Texts are indexed into a relational database
We use a relational MySQL database. This allows for
Fast querying
Linking additional entities to texts without including them in the XML
Storing system data (user accounts, settings, ...)
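To illustrate why a relational index helps, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names (token, annotation, form, lemma, pos) and the sample data are hypothetical; the actual Phoenix2 database schema is not shown in these slides.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs SQLite's REGEXP operator, which has no default implementation."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)

# Hypothetical schema: one row per token, annotations kept in a separate
# table so that they can be linked to tokens without touching the XML.
conn.executescript("""
CREATE TABLE token (
    id       INTEGER PRIMARY KEY,
    text_id  TEXT NOT NULL,
    position INTEGER NOT NULL,
    form     TEXT NOT NULL
);
CREATE TABLE annotation (
    token_id INTEGER NOT NULL REFERENCES token(id),
    lemma    TEXT,
    pos      TEXT
);
CREATE INDEX token_form ON token(form);
""")

# Made-up sample tokens, not taken from the corpus.
words = "por ce que li abbes".split()
conn.executemany(
    "INSERT INTO token (text_id, position, form) VALUES (?, ?, ?)",
    [("demo-1", i, w) for i, w in enumerate(words, start=1)],
)
conn.execute(
    "INSERT INTO annotation (token_id, lemma, pos) "
    "SELECT id, 'abbe', 'noun' FROM token WHERE form = 'abbes'"
)

# Query tokens by regular expression and join in any linked annotation.
rows = conn.execute(
    "SELECT t.form, a.lemma FROM token t "
    "LEFT JOIN annotation a ON a.token_id = t.id "
    "WHERE t.form REGEXP ? ORDER BY t.position",
    ("^(por|abbes)$",),
).fetchall()
print(rows)
```

Building such token rows and indexes for every imported text is the kind of up-front work that makes import slow and later querying fast.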
Live Demonstration
Querying Texts
Regular Expressions
Queries in Phoenix2 can be formulated using regular expressions.
abbe finds all words that contain the string abbe: abbe, abbes, ...
^pou?r$ finds por and pour.
[aeiou]{3} finds words that contain three consecutive vowels ...
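The three patterns above can be tried outside Phoenix2 with Python's re module. This is only a sketch: the word list is made up for illustration, and the regular-expression dialect used by Phoenix2 may differ in details such as case sensitivity.

```python
import re

# Made-up sample vocabulary, not taken from the corpus.
words = ["abbe", "abbes", "abbeie", "por", "pour", "pouoir", "poeir", "roi"]


def search_words(pattern, vocabulary):
    """Return the words in which the pattern matches anywhere."""
    compiled = re.compile(pattern)
    return [word for word in vocabulary if compiled.search(word)]


for pattern in ["abbe", "^pou?r$", "[aeiou]{3}"]:
    print(pattern, "->", search_words(pattern, words))
```

Without the anchors ^ and $, a pattern matches anywhere inside a word, which is why abbe also finds abbes and abbeie.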
Live Demonstration
Annotating Words
Live Demonstration
External Editing
3. Hands-On Session
Try it Yourself
Log In
All you need is
Any modern internet browser
Internet connection
Log in via
URL: tiny.uzh.ch/2A
User: cost
Password: action
Enter login credentials twice
Feel free to explore and manipulate whatever you want; it's just a copy.
4. Conclusion
Phoenix2: A Tool for Web-Based Annotation of Medieval Texts
Conclusion
Phoenix2 is an implementation based on the most recent
computational and philological standards.
It is aimed at
Transparency of all data and source code (i.e., well-documented open-source technology)
Connectivity through well-defined interfaces
Persistence of all data and interfaces
Usability for both experts and novices
We pursue the stringent and uncompromising synthesis of philology, linguistics, and information technology, based on long-term, intensive cooperation between computational linguistics and special branches of academic knowledge.
Feel free to try Phoenix2 and get in touch with us. Feedback is very welcome.
Thank You
These slides are available at www.cl.uzh.ch/people/team/laeubli.html
Bibliography
Glessgen, M.-D. (2011). Présentation générale: architecture et méthodologie du projet des Plus anciens documents linguistiques de la France, édition électronique. In Glessgen, M.-D., Kihaï, D., and Videsott, P., editors, L'élaboration philologique et linguistique des Plus anciens documents linguistiques de la France, Édition électronique (Bibliothèque de l'École des Chartes 168), pages 83–94.