IAC (ACCESS INTERFACE CORPUS) DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA. TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA), JUDITH DOMINGO (BARCELONA MEDIA), CARME COLOMINAS (UNIVERSITAT POMPEU FABRA). UCCTS, 2010 (Ormskirk)

Mar 28, 2015

Page 1:

IAC (ACCESS INTERFACE CORPUS)

DEVELOPED BY BARCELONA MEDIA & UNIVERSITAT POMPEU FABRA

TONI BADIA (BARCELONA MEDIA - UNIVERSITAT POMPEU FABRA)

JUDITH DOMINGO (BARCELONA MEDIA)

CARME COLOMINAS (UNIVERSITAT POMPEU FABRA)

UCCTS, 2010 (Ormskirk)

Page 2:

IAC CORPORA USE: REQUIREMENTS

It is easy to build a corpus from the web, but difficult to search it

We need tools that support frequency statistics, sorting of results, searches for linguistically annotated sequences, etc.

Page 3:

IAC CORPORA: SEARCHING METHODS

Concordance software (MonoConc, Concordance)

Databases

Corpus query systems (e.g. CQP, EMDROS)

Useful but tough to learn

Not useful for training, as students spend too much time learning the query system

Page 4:

IAC CORPORA: INTERFACES (SEARCHING METHODS)

DISADVANTAGES

From the user's point of view, more than one interface has to be learned

Programming and interface-design background needed (external resources)

If different attribute types are added > new interface design > new funding needed

Usually more expensive than other options

ADVANTAGES

User-friendly

No training necessary

Page 5:

IAC (ACCESS INTERFACE CORPUS)

The Translation Department (UPF) had many corpora (constantly changing and growing)

IAC was born (developed by Barcelona Media and UPF)

GOALS

Monolingual and aligned corpora

Fast and easy creation of interfaces for corpora

One interface design for all the corpora

Page 6:

IAC INTERFACES

Simple: Key Words Out of Context

Advanced: Key Words In Context

Statistics: KWIC and frequency-based results

*** For corpus indexing and searching, IAC uses the Corpus WorkBench (CWB) developed by IMS Stuttgart
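As an illustration of what a Key Words In Context (KWIC) display contains, here is a minimal Python sketch. It does not use CWB and is not how IAC is implemented; the function name kwic, the width parameter and the sample sentence are assumptions made up for this example.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Up to `width` tokens on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented sample text, already tokenized on whitespace.
tokens = "The boy buys pencils and the girl buys pens".split()

for left, kw, right in kwic(tokens, "buys", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>15} [{kw}] {right}")
```

Each hit keeps the keyword in the middle with its neighbours on both sides, which is what makes concordance lines easy to scan and sort.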

EXAMPLES IAC

Page 7:

IAC CORPUS FORMAT

<metadata title = “Demo” year=“2010”>

<func=subj>

The Det sg

boy Noun sg

</func>

buys Verb sg

<func=DO>

pencils Noun pl

</func>

</metadata>

Tabular

xml for metadata

Verticalized

xml for structural data
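To make the format above concrete, the following Python sketch reads a verticalized sample (one token per line, with tag lines marking metadata and structural regions) and attaches the open regions to each token. It is only a sketch under assumptions: the tab separator, the column names word, pos and num, and the function parse_vertical are hypothetical, and the tag syntax simply mirrors the slide.

```python
import re

# Sample modelled on the slide; columns are assumed to be tab-separated.
SAMPLE = """\
<metadata title="Demo" year="2010">
<func=subj>
The\tDet\tsg
boy\tNoun\tsg
</func>
buys\tVerb\tsg
<func=DO>
pencils\tNoun\tpl
</func>
</metadata>
"""

OPEN_TAG = re.compile(r"<(\w+)([^>]*)>")
CLOSE_TAG = re.compile(r"</(\w+)>")


def parse_vertical(text):
    """Return one dict per token, including the regions that enclose it."""
    open_regions = []  # stack of (tag name, raw attribute string)
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if CLOSE_TAG.fullmatch(line):
            open_regions.pop()  # leave the innermost region
            continue
        opened = OPEN_TAG.fullmatch(line)
        if opened:
            name, attrs = opened.group(1), opened.group(2).strip(" =")
            open_regions.append((name, attrs))
            continue
        # Anything else is a token line: word, part of speech, number.
        word, pos, num = line.split("\t")
        tokens.append({"word": word, "pos": pos, "num": num,
                       "regions": dict(open_regions)})
    return tokens


for token in parse_vertical(SAMPLE):
    print(token["word"], token["pos"], token["num"], token["regions"].get("func"))
```

Keeping the regions on a stack means a token inherits both the document-level metadata and any structural annotation around it, which is what lets a search combine the two.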

Page 8:

IAC CORPORA: INSERTING A CORPUS INTO IAC

Upload the corpus (txt file) to the server

Design the search interface with a graphical tool (included in IAC), according to the corpus type and the linguistic annotation added

Page 9:
Page 10:

IAC CONCLUSIONS

IAC is a flexible and powerful tool that goes beyond the limitations of current corpus interfaces

User-friendly tool

Access to multiple corpora from the same platform

No need for an external developer or a programming background

Fast interface creation that can be modified easily

Page 11:

Thank you!

[email protected]

Temporary web:

http://webconsultaiactemporal.barcelonamedia.org

Page 12:

SOME EXAMPLES…

Page 13:

ADVANCED SEARCH

To demonstrate the advanced search, we use an annotated corpus with translations.

Let's look for sequences of one or more words annotated with syntax errors.
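The kind of search described here can be sketched in Python as follows. This is an assumption-laden illustration, not the IAC query itself: the error attribute, its value "syntax" and the sample tokens are all invented for the example.

```python
from itertools import groupby


def error_sequences(tokens, error_type="syntax"):
    """Return maximal runs of consecutive tokens tagged with error_type."""
    runs = []
    # groupby splits the token list into alternating error / non-error runs.
    for is_error, group in groupby(tokens, key=lambda t: t["error"] == error_type):
        if is_error:
            runs.append([t["word"] for t in group])
    return runs


# Invented tokens; "error" is a hypothetical annotation attribute.
tokens = [
    {"word": "He", "error": None},
    {"word": "have", "error": "syntax"},
    {"word": "went", "error": "syntax"},
    {"word": "home", "error": None},
    {"word": "yesterday", "error": None},
    {"word": "quick", "error": "syntax"},
]

for run in error_sequences(tokens):
    print(" ".join(run))
```

Grouping adjacent tokens returns each error span once, whether it covers a single word or several.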

Page 14:

ADVANCED SEARCH

Page 15:

ADVANCED SEARCH

Page 16:

ALIGNED CORPORA WITH METADATA

As an example of aligned corpora, we use a Spanish > English corpus

Poder (verb) > Can / Could / May / Might

Our goal is to get examples of poder (Verb) translated as may or might in Economics texts.
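A search of this kind combines three conditions: a metadata filter (Economics), a linguistic condition on the source side (lemma poder tagged as Verb) and a lexical condition on the aligned target side (may or might). The Python sketch below shows that logic on invented data. The attribute names, the helper tok and the function find_translations are hypothetical, and the check works at sentence level only: it does not verify that may or might is the word actually aligned with poder.

```python
def tok(word, lemma, pos):
    """Build one annotated source token (attribute names are hypothetical)."""
    return {"word": word, "lemma": lemma, "pos": pos}


# Invented sentence pairs; "domain" stands in for the corpus metadata.
PAIRS = [
    {"domain": "Economics",
     "es": [tok("Los", "el", "Det"), tok("precios", "precio", "Noun"),
            tok("pueden", "poder", "Verb"), tok("subir", "subir", "Verb")],
     "en": "Prices may rise".split()},
    {"domain": "Economics",
     "es": [tok("Podemos", "poder", "Verb"), tok("exportar", "exportar", "Verb"),
            tok("más", "más", "Adv")],
     "en": "We can export more".split()},
    {"domain": "Law",
     "es": [tok("El", "el", "Det"), tok("juez", "juez", "Noun"),
            tok("puede", "poder", "Verb"), tok("decidir", "decidir", "Verb")],
     "en": "The judge may decide".split()},
]


def find_translations(pairs, lemma, pos, targets, domain):
    """Return (source, target) sentences matching all three conditions."""
    hits = []
    for pair in pairs:
        if pair["domain"] != domain:
            continue  # metadata filter
        in_source = any(t["lemma"] == lemma and t["pos"] == pos for t in pair["es"])
        in_target = any(w.lower() in targets for w in pair["en"])
        if in_source and in_target:
            source = " ".join(t["word"] for t in pair["es"])
            hits.append((source, " ".join(pair["en"])))
    return hits


hits = find_translations(PAIRS, "poder", "Verb", {"may", "might"}, "Economics")
for source, target in hits:
    print(source, "|", target)
```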

Page 17:

ALIGNED CORPORA WITH METADATA

Page 18:

ALIGNED CORPORA WITH METADATA

Page 19:

STATISTICS

Statistics are useful for getting quantitative results about sequences. Our goal in this case is to count the prepositions that follow the verb pensar (to think) in Spanish.
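The counting behind such a frequency result can be sketched in a few lines of Python. The data, the tag names Verb and Prep, and the function following_prepositions are assumptions for illustration only; IAC itself relies on CWB for this.

```python
from collections import Counter

# Invented (word, lemma, pos) triples; the tag names are hypothetical.
TOKENS = [
    ("Pienso", "pensar", "Verb"), ("en", "en", "Prep"), ("ti", "tú", "Pron"),
    ("Piensa", "pensar", "Verb"), ("de", "de", "Prep"), ("todo", "todo", "Pron"),
    ("Pensamos", "pensar", "Verb"), ("en", "en", "Prep"), ("ello", "ello", "Pron"),
    ("Pensaba", "pensar", "Verb"), ("mucho", "mucho", "Adv"),
]


def following_prepositions(tokens, verb_lemma):
    """Count prepositions that immediately follow a form of verb_lemma."""
    counts = Counter()
    # Walk over adjacent token pairs: (current token, next token).
    for (_, lemma, pos), (_, next_lemma, next_pos) in zip(tokens, tokens[1:]):
        if lemma == verb_lemma and pos == "Verb" and next_pos == "Prep":
            counts[next_lemma] += 1
    return counts


counts = following_prepositions(TOKENS, "pensar")
for prep, freq in counts.most_common():
    print(prep, freq)
```

Matching on the lemma rather than the word form is what lets pienso, piensa and pensamos all count as the same verb.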

Page 20:

STATISTICS

Page 21:

STATISTICS
