Supporting e-learning with automatic glossary extraction: Experiments with Portuguese

Rosa Del Gaudio, António Branco
RANLP, Borovets 2007

Presentation Plan

● LT4eL project
● ILIAS
● Corpus
● Tool
● Grammars
  ● Copula
  ● Other Verbs
  ● Punctuation
● Results
● Conclusion

LT4eL

● Improve retrieval and accessibility of learning objects (LO) in learning management systems.
● Employ language technology resources and tools for the semi-automatic generation of descriptive metadata.
● Develop new functionalities, such as a keyword extractor, a glossary candidate detector and semantic search, tuned for the various languages addressed in the project (Bulgarian, Czech, Dutch, English, German, Maltese, Polish, Portuguese, Romanian).

Objective

● Build a glossary automatically to support the e-learning process. In practice, this means extracting definitions from unstructured text (scientific papers, encyclopedias, web pages).
● Better access to information for students
● Accelerate the work of the tutor

ILIAS: Glossary Candidate Detector

The Corpus

• 274,000 tokens

• Tutorials

• PhD theses

• Scientific papers

• 3 Domains evenly represented

• e-learning

• Technology for non-experts

• Calimera

XML format

<definingText continue="y" def="m147" def_type1="is_def" id="d5">
  <markedTerm dt="y" id="m147" kw="y">
    <tok base="intranet" class="word" ctag="PNM" id="t9032" sp="y">Intranet</tok>
  </markedTerm>
  <tok base="ser" class="word" ctag="V" id="t9033" msd="pi-3s" sp="y">é</tok>
  <tok base="uma" class="word" ctag="UM" id="t9034" msd="fs" sp="y">uma</tok>
  <tok base="rede" class="word" ctag="CN" id="t9035" msd="fs" sp="y">rede</tok>
  <tok base="desenvolver,desenvolvido" class="word" ctag="PPA" id="t9036" msd="fs" sp="y">desenvolvida</tok>
  <tok base="para" class="word" ctag="PREP" id="t9037" sp="y">para</tok>
  <tok base="processamento" class="word" ctag="CN" id="t9038" msd="ms" sp="y">processamento</tok>
  <tok base="de" class="word" ctag="PREP" id="t9039" sp="y">de</tok>
  <tok base="informação" class="word" ctag="CN" id="t9040" msd="fp" sp="y">informações</tok>
  <tok base="em" class="word" ctag="PREP" id="t9041" sp="y">em</tok>
  <tok base="uma" class="word" ctag="UM" id="t9042" msd="fs" sp="y">uma</tok>
  <tok base="empresa" class="word" ctag="CN" id="t9043" msd="fs" sp="y">empresa</tok>
  <tok base="ou" class="word" ctag="CJ" id="t9044" sp="y">ou</tok>
  <tok base="organização" class="word" ctag="CN" id="t9045" msd="fs">organização</tok>
  <tok class="punctuation" ctag="PNT" id="t9046" sp="y">.</tok>
</definingText>
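To make the annotation scheme concrete, here is a minimal Python sketch that reads a shortened copy of the sentence above and recovers the definition type, the marked term and the token forms. The function name `read_definition` is hypothetical; the element and attribute names are taken from the sample, and nothing here is part of the project's own tooling.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotated sentence (only the first four tokens are kept).
SAMPLE = (
    '<definingText continue="y" def="m147" def_type1="is_def" id="d5">'
    '<markedTerm dt="y" id="m147" kw="y">'
    '<tok base="intranet" class="word" ctag="PNM" id="t9032" sp="y">Intranet</tok>'
    '</markedTerm>'
    '<tok base="ser" class="word" ctag="V" id="t9033" msd="pi-3s" sp="y">é</tok>'
    '<tok base="uma" class="word" ctag="UM" id="t9034" msd="fs" sp="y">uma</tok>'
    '<tok base="rede" class="word" ctag="CN" id="t9035" msd="fs" sp="y">rede</tok>'
    '</definingText>'
)


def read_definition(xml_text):
    """Return (definition type, defined term, list of token forms)."""
    root = ET.fromstring(xml_text)
    # The defined term is the text of the tokens wrapped in <markedTerm>.
    term = " ".join(tok.text for tok in root.findall("markedTerm/tok"))
    # iter() walks all <tok> descendants in document order.
    words = [tok.text for tok in root.iter("tok")]
    return root.get("def_type1"), term, words


def_type, term, words = read_definition(SAMPLE)
print(def_type, term, words)
```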

LxTransduce

• Input: plain text or XML

• Regular expressions

• Substitution and markup

• Output: the same file with the changes applied

• Match tree using elements

• Quick

• Unicode friendly

• Freeware

• Easy to integrate in other tools (Java)

Rules in lxtransduce

<rule name="Conj">
  <query match="tok[@ctag = 'CJ']"/>
</rule>
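For readers without LxTransduce at hand, the query in the Conj rule can be approximated with the XPath subset in Python's standard library. This is only an illustration of what the match expression selects, not a reimplementation of LxTransduce; the wrapper element `<s>` and the function name are assumptions, while the tokens are copied from the annotated sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <s>; the <tok> elements come from the sample sentence.
SENTENCE = (
    '<s>'
    '<tok base="empresa" class="word" ctag="CN" id="t9043" msd="fs" sp="y">empresa</tok>'
    '<tok base="ou" class="word" ctag="CJ" id="t9044" sp="y">ou</tok>'
    '<tok base="organização" class="word" ctag="CN" id="t9045" msd="fs">organização</tok>'
    '</s>'
)


def find_conjunctions(xml_text):
    """Return the forms of all tokens whose ctag attribute is 'CJ'."""
    root = ET.fromstring(xml_text)
    return [tok.text for tok in root.findall("tok[@ctag='CJ']")]


conjunctions = find_conjunctions(SENTENCE)
print(conjunctions)
```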

First development phase

● Less than 50% of the corpus
● Focus on the verb
● Precision: correct automatic / all automatic
● Recall: correct automatic / manually marked
● F2: 3 * (precision * recall) / (2 * precision + recall)

Grammar   Precision   Recall   F2
Gr 00     0.14        0.44     0.26
Gr 01     0.31        0.20     0.22
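The F2 score used here can be recomputed from precision and recall. The sketch below assumes the formula is read as 3 * P * R / (2 * P + R), which is the reading that agrees with the reported figures; the function name `f2` is ours. Because the precision and recall on the slide are already rounded, the recomputed values can differ from the reported F2 in the last digit.

```python
def f2(precision, recall):
    """F2 read as 3 * P * R / (2 * P + R); returns 0.0 when both inputs are 0."""
    denominator = 2 * precision + recall
    return 3 * precision * recall / denominator if denominator else 0.0


# Precision and recall as reported for the two first-phase grammars.
for name, p, r in [("Gr 00", 0.14, 0.44), ("Gr 01", 0.31, 0.20)]:
    print(name, round(f2(p, r), 3))
```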

Second development phase

• 75% of the corpus for development

• 25% of the corpus for testing

• Specific grammar/rules for each type

Copula baseline grammar

Verb “to be” (ser), third person singular or plural, present indicative
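A minimal Python sketch of such a baseline, assuming each token is reduced to a (base, ctag, msd) triple with values like those in the annotated sample, where "é" carries base="ser", ctag="V" and msd="pi-3s". The function name is hypothetical and the code only mirrors the idea of the baseline, not the LxTransduce grammar itself.

```python
def is_copula_candidate(tokens):
    """tokens: list of (base, ctag, msd) triples for one sentence.

    A sentence is a candidate if it contains the verb "ser" in the
    third person (singular or plural) of the present indicative,
    assumed here to be encoded as an msd value starting with "pi-3".
    """
    return any(
        base == "ser" and ctag == "V" and msd.startswith("pi-3")
        for base, ctag, msd in tokens
    )


# First tokens of the sample sentence "Intranet é uma rede ...".
sentence = [
    ("intranet", "PNM", ""),
    ("ser", "V", "pi-3s"),
    ("uma", "UM", "fs"),
    ("rede", "CN", "fs"),
]
print(is_copula_candidate(sentence))
```

Accepting every sentence that contains this verb form is very permissive, which is consistent with the precision problem reported for the baseline.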

Copula base result

• Sentence level results

• Problem with precision

Copula Grammar

Rules for is_type

<rule name="Serdef">
  <query match="tok[@ctag = 'V' and @base = 'ser' and
                    (@msd[starts-with(., 'fi-3')] or @msd[starts-with(., 'pi-3')])]"/>
</rule>
....

<rule name="copula1">
  <seq>
    <ref name="SERdef"/>
    <best>
      <seq>
        <ref name="Art"/>
        <ref name="adj|adv|prep|" mult="*"/>
        <ref name="Noun" mult="+"/>
      </seq>
      ....
    </best>
    <ref name="tok" mult="*"/>
    <end/>
  </seq>
</rule>
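The visible alternative of the copula1 rule (verb "ser", then an article, optional adjectives, adverbs or prepositions, then one or more nouns) can be sketched as a regular expression over one symbol per token. This is an illustration under assumptions: UM, CN and PREP occur in the annotated sample, but DA, ADJ and ADV are hypothetical tag names, and the elided alternatives of the original rule are not covered.

```python
import re

# Assumed tag classes. UM, CN and PREP appear in the sample annotation;
# DA, ADJ and ADV are hypothetical names used only for this illustration.
ARTICLE = {"UM", "DA"}
MODIFIER = {"ADJ", "ADV", "PREP"}
NOUN = {"CN"}


def symbol(base, ctag, msd):
    """Map one token to a single letter so a sentence becomes a string."""
    if ctag == "V" and base == "ser" and msd.startswith(("pi-3", "fi-3")):
        return "S"  # the verb form accepted by the Serdef query
    if ctag in ARTICLE:
        return "A"
    if ctag in MODIFIER:
        return "M"
    if ctag in NOUN:
        return "N"
    return "x"  # any other token


# ser + article + zero or more modifiers + one or more nouns
PATTERN = re.compile(r"SAM*N+")


def matches_copula1(tokens):
    """tokens: list of (base, ctag, msd) triples for one sentence."""
    encoded = "".join(symbol(*tok) for tok in tokens)
    return PATTERN.search(encoded) is not None


sample = [("intranet", "PNM", ""), ("ser", "V", "pi-3s"),
          ("uma", "UM", "fs"), ("rede", "CN", "fs")]
print(matches_copula1(sample))
```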

Comparing Results

Include the patterns that were excluded

Try to collect the syntactic patterns of non-definitions and compare them with the syntactic patterns of definitions.

Other_Verbs grammar

• Collect verbs in a lexicon
• Three different categories: reflexive, active, passive
• 22 different verbs

<rule name="Vpas">
  <seq>
    <ref name="tok"/>
    <not><ref name="not"/></not>
    <ref name="tok" mult="?"/>
    <query match="tok[mylex(@base) and (@ctag='PPA')]"
           constraint="mylex(@base)/cat='pas'"/>
  </seq>
</rule>
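The lexicon lookup in the final query of Vpas can be sketched in Python as a dictionary from verb lemma to category. The entries below are hypothetical, since the 22 verbs are not listed on the slides; only the category code 'pas' and the tag 'PPA' come from the rule, while 'act' and 'ref' are assumed codes. Splitting the base attribute on commas is also an assumption, prompted by values such as "desenvolver,desenvolvido" in the sample.

```python
# Hypothetical lexicon entries; the actual verb list is not given on the slides.
# 'pas' comes from the Vpas rule; 'act' and 'ref' are assumed category codes.
LEXICON = {
    "definir": "pas",
    "chamar": "pas",
    "significar": "act",
    "designar": "ref",
}


def is_passive_definition_verb(base, ctag):
    """True if the token is a past participle (PPA) of a verb listed as 'pas'."""
    lemmas = base.split(",")  # assumption: base may hold several lemmas
    return ctag == "PPA" and any(LEXICON.get(lemma) == "pas" for lemma in lemmas)


print(is_passive_definition_verb("definir,definido", "PPA"))
```

The sketch covers only the last element of the sequence; the preceding context checks in the rule (the leading token and the negated `not` reference) are left out.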

Results for verb_type

• Analyze each verb separately, as with is_type

• Richer syntactic patterns

Punctuation Grammar

<rule name="punct_def">
  <seq>
    <start/>
    <ref name="CompmylexSN" mult="+"/>
    <query match="tok[.~'^:$']"/>
    <ref name="tok" mult="+"/>
    <end/>
  </seq>
</rule>

● Preliminary work

● Definitions introduced by a colon (the most frequent case)
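The colon pattern can be sketched in Python as: a sentence-initial run of noun-phrase tokens, a colon token, then at least one more token. The set of tags accepted in the noun phrase is an assumption (the contents of the CompmylexSN rule are not shown), and the example sentence is constructed for illustration from words in the sample.

```python
def split_colon_definition(tokens, np_tags=frozenset({"CN", "PNM", "UM", "PREP"})):
    """tokens: list of (form, ctag) pairs for one sentence.

    Return (term, definition) if the sentence has the shape
    noun-phrase tokens + ":" + one or more tokens, otherwise None.
    The default np_tags set is an assumed stand-in for the noun-phrase rule.
    """
    for i, (form, _) in enumerate(tokens):
        if form == ":":
            break
    else:
        return None  # no colon in the sentence
    head, tail = tokens[:i], tokens[i + 1:]
    if not head or not tail:
        return None  # nothing before or after the colon
    if any(ctag not in np_tags for _, ctag in head):
        return None  # material before the colon is not a plain noun phrase
    return " ".join(f for f, _ in head), " ".join(f for f, _ in tail)


# Constructed example, not taken from the corpus.
example = [("Intranet", "PNM"), (":", "PNT"), ("rede", "CN"), ("desenvolvida", "PPA")]
print(split_colon_definition(example))
```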

All-in-one

• Combination of the previous grammars

• The definition type is not taken into account when calculating precision and recall
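Type-agnostic scoring of this kind can be sketched as a comparison of sentence identifiers only. The data below is invented for illustration; the type labels other than is_def are hypothetical, and the function name is ours.

```python
def sentence_scores(gold, predicted):
    """gold, predicted: dicts mapping sentence id -> definition type.

    Only the ids are compared, so a sentence counts as correct even if
    the grammar assigned it a different definition type than the annotator.
    """
    gold_ids, predicted_ids = set(gold), set(predicted)
    correct = len(gold_ids & predicted_ids)
    precision = correct / len(predicted_ids) if predicted_ids else 0.0
    recall = correct / len(gold_ids) if gold_ids else 0.0
    return precision, recall


# Invented example: d1 is found with the "wrong" type but still counts.
gold = {"d1": "is_def", "d2": "verb_def", "d3": "punct_def"}
predicted = {"d1": "verb_def", "d2": "verb_def", "d4": "is_def", "d5": "is_def"}
precision, recall = sentence_scores(gold, predicted)
print(precision, recall)
```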

Conclusions and Future Work

• Overall results: Recall 86%, Precision 14%

• Differences among domains: the style of a document influences the results.

• Improve the rules for verb_type and punc_type

• Combine with other techniques such as machine learning (ML)
