Information Access I Multilingual Text Summarization GSLT, Göteborg, October 2003 Barbara Gawronska, Högskolan i Skövde
Dec 21, 2015
Types of summaries
(Spärck Jones 1999, Hovy & Lin 1999)
With respect to content:
• Indicative: provide an idea of what the text is about, but do not render the content
• Informative: shortened versions of the text
With respect to the way of creating:
• Extracts: reused portions of the text
• Abstracts: re-generated text reflecting the important content
• Compressed texts (Knight & Marcu 2000): compressing syntactic parse trees in order to get a shorter text
Text compression (Knight & Marcu 2000, Lin 2003)
”Given the original sentence t, find the best short sentence s generated from t, i.e. maximize P(s|t).”
Original sentence (Lin 2003):
In Louisiana, the hurricane landed with wind speeds of about 120 miles per hour and caused severe damage in small coastal centres such as Morgan City, Franklin and New Iberia.
Text compression (2) (fragments of Fig. 1 in Lin 2003)
Number of Words | Adjusted Log-Prob | Raw Log-Prob | Sentence
14 | -9.212 | -128.967 | In Louisiana, the hurricane landed with wind speeds of about 120 miles per hour.
14 | -9.216 | -129.022 | The hurricane landed and caused severe damage in small centres such as Morgan City.
12 | -9.252 | -111.020 | In Louisiana, the hurricane landed with wind speeds and caused severe damage.
14 | -9.315 | -130.406 | In Louisiana the hurricane landed with wind speeds of about 120 miles per hour.
12 | -9.372 | -112.459 | In Louisiana the hurricane landed with wind speeds and caused severe damage.
12 | -9.680 | -116.158 | The hurricane landed with wind speeds of about 120 miles per hour.
10 | -9.821 | -98.210 | The hurricane landed with wind speeds and caused severe damage.
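The adjusted scores in the table are consistent with dividing the raw log-probability by the number of words, which keeps short candidates from winning merely because they multiply fewer probabilities. The Python sketch below checks this reading against three rows of the table. The normalization is inferred from the figures, not quoted from Lin (2003), and the names `adjusted_log_prob` and `best_candidate` are illustrative.

```python
def adjusted_log_prob(raw_log_prob: float, n_words: int) -> float:
    """Length-normalize a sentence log-probability (assumed: raw / word count)."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return raw_log_prob / n_words


# (number of words, adjusted log-prob, raw log-prob), copied from the table
ROWS = [
    (14, -9.212, -128.967),
    (12, -9.252, -111.020),
    (10, -9.821, -98.210),
]


def best_candidate(rows):
    """Return the row with the highest length-normalized log-probability."""
    return max(rows, key=lambda row: adjusted_log_prob(row[2], row[0]))


if __name__ == "__main__":
    for n_words, adjusted, raw in ROWS:
        # Recomputed value next to the value printed on the slide
        print(n_words, round(adjusted_log_prob(raw, n_words), 3), adjusted)
```

Under this reading, the 14-word candidate ranks first although its raw log-probability is the lowest of the three.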
Different genres and tasks require different summaries (informative summaries are not so good for detective stories), and different texts require different summarization techniques.
A special case: dialogue summarization
Selecting successful ’dialog transactions’ – the game-theoretical approach (Verbmobil: Wahlster, Alexandersson)
A possible combination system including multilingual summarization of news reports
[Diagram; components: Lexical databases, grammar rules; Speech recognition; Parsing; Speech synthesis; Text generation; News reports; Information Extraction; Key words and phrases]
The main objectives of the Newspeak project:
• Evaluation of different methods of semantic classification in the lexicon
• Development of a summarization module that would be well-suited for the news domain
• A comparison between ‘traditional’ machine translation (MT) on the one side, and information extraction (IE) combined with reading comprehension (RC) and multilingual text generation (MTG) on the other side
• Exploration of the interplay between textual structure, syntax, and prosodic markers
GUERILLA FIGHTS IN LEBANON
Israeli warplanes and artillery attacked suspected guerrilla hideouts Friday following a series of clashes in south Lebanon. Four guerrillas were reportedly killed. Guerrillas of the Syrian-backed Amal group attacked Israeli and allied militia positions in the Israeli-occupied zone at daybreak, Lebanese security officials said. Three guerrillas were killed in the assaults, said an Israeli army spokesman in Jerusalem. Amal said none of its fighters was killed.
One of the main problems with media texts: there is no possibility of stating what is a true fact (hence, some criticism could be raised against TREC factoid questions...)
The Theory of Mental Spaces (Fauconnier 1985, Fauconnier and Sweetser 1996)
Max believes the woman with green eyes has blue eyes
[Diagram: two linked spaces. Base space B contains a (a: green eyes); belief space M contains its counterpart a1 (a1: blue eyes).]
The notion of ’mental spaces’ (Fauconnier 1985, Sweetser & Fauconnier 1996, Sanders & Redeker 1996)
Direct quotation: "The man was clearly on the run from the police", the spokesman said.
[Diagram labels: B1; M = B2; said; ID connector; s; x(m); x'(m')]
Attributed quotation: According to the spokesman, the man was "clearly on the run from the police".
[Diagram labels: B1; M; B2; according to; quotation marks; ID connector; s; x; x'(m); x''(m')]
Legend for both diagrams:
s = spokesman
m = the man
x = clearly on the run from the police
B1 = Narrator's reality, base space
M = Character's reality, embedded space
B2 = Character's reality, new base space
’Mental Spaces’ in sample text 1
M: Sender: news agency
M1: Israel attacks guerrilla hideouts; four guerrillas killed; Time: Friday
M2: Sender: Lebanese security officials; Amal attacked Israel; Place: Israeli-occupied zone; Time: Saturday daybreak
M3: Sender: Israeli army spokesman; Result in M3: Three guerrillas dead
M4: Sender: Amal; Result in M4: No guerrillas dead
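One way to make the conflict between the Israeli and the Amal account explicit is to store each mental space with its sender and its slot values, and then look for slots on which senders disagree. The Python sketch below illustrates that idea under stated assumptions: the class `MentalSpace`, the slot name `guerrillas_dead`, and the function `conflicting_slots` are hypothetical and are not taken from the Newspeak system.

```python
from dataclasses import dataclass, field


@dataclass
class MentalSpace:
    name: str                      # label used on the slide, e.g. "M3"
    sender: str                    # who takes responsibility for the claims
    claims: dict = field(default_factory=dict)  # slot -> value (hypothetical slots)


def conflicting_slots(spaces):
    """Return {slot: {space name: value}} for slots whose values differ across spaces."""
    by_slot = {}
    for space in spaces:
        for slot, value in space.claims.items():
            by_slot.setdefault(slot, {})[space.name] = value
    return {
        slot: values
        for slot, values in by_slot.items()
        if len(set(values.values())) > 1
    }


SPACES = [
    MentalSpace("M2", "Lebanese security officials",
                {"action": "Amal attacked Israel", "place": "Israeli-occupied zone"}),
    MentalSpace("M3", "Israeli army spokesman", {"guerrillas_dead": 3}),
    MentalSpace("M4", "Amal", {"guerrillas_dead": 0}),
]

if __name__ == "__main__":
    print(conflicting_slots(SPACES))
```

Keeping the sender with every space means a summary can report both results with their sources instead of asserting either as fact.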
Sample text 2
BEIT JALA, West Bank
Israeli troops pulled out of Beit Jala before dawn on Thursday, leaving the Palestinian town quiet amid reports of fresh violence in other West Bank towns. The Palestinians said the Israel Defence Forces had staged incursions into Hebron, killing one and injuring 16 others, and Tulkarem, killing one and injuring 10. The Israel Defence Forces (IDF) had no immediate comment on the accusation that troops had entered Tulkarem, and strongly denied there was an incursion at Hebron.
’Mental Spaces’ in sample text 2
M: inf_source: CNN; place: Beit Jala; time: August 30 2001
M1: inf_source: news agency; claim: [place: Beit Jala, time: Before dawn on Thursday, action: Israeli troops pulled out, result: No dead, no injured]
M2: inf_source: Reports; claim: [place: West Bank towns, time: Thursday, action: Violence, result: Not known]
M3: inf_source: Palestinians
M3a: claim: [place: Hebron, time: Thursday, action: Israeli incursions, result: 1 dead, 16 injured]
M3b: claim: [place: Tulkarem, time: Thursday, action: Israeli incursions, result: 1 dead, 10 injured]
M4: inf_source: The Israel Defense Forces; claim: the event reported in M3a is not true
Newspeak – the extraction and generation modules
Processing steps:
• English News
• The reading component: a procedure transforming the news text file into Prolog lists
• Preparsing and preliminary subdomain identification
• Identification of mental spaces
• Template filling
• Generation of a summary in the required target language, provided with prosodic markers
• Infovox text-to-speech system
Resources:
• English lexicon
• Subdomain-specific templates, lists of key words and semantic features
• Target language lexicon and grammar
WordNet Classification: Exploding objects
WordNet definitions on the paths leading to missile and bomb:
• anything having existence (living or nonliving)
• a physical (tangible and visible) entity
• a man-made object
• an artifact (or system of artifacts) that is instrumental in accomplishing some end
• something that serves as a means of transportation
• weapons considered collectively
• weaponry used in fighting or hunting
• a conveyance that transports people or objects
• any vehicle propelled by a rocket_engine
• a rocket-propelled vehicle carrying passengers or instruments or a warhead (missile)
• a body that is thrown or projected (missile)
• an instrumentality invented for a particular purpose
• bursts with sudden violence from internal energy
• an explosive device fused to detonate under specific conditions (bomb)
WordNet vs. Newspeak noun classification
Synset (WordNet definition) | Sample Word | Category in Newspeak | Dominating category in Newspeak
a group of people who work together | army, military, troops, organisation, agency, party | Group of people |
group of people willing to obey orders | army, military, troops, police… | Armed forces | Group of people
a conveyance that transports people or objects | tank, missile, rocket.. | Means of transportation |
a vehicle that moves on wheels and usually has a container for transporting things or people | tank, car… | Means of transportation, medium: earth surface | Means of transportation
a vehicle used by the armed forces | tank | Destruction as function |
bursts with sudden violence from internal energy | bomb | Destruction as function |
WordNet vs. Newspeak noun classification (2)
Synset (WordNet definition) | Sample Word | Category in Newspeak | Dominating category in Newspeak
a speech act that conveys information | report | Source of information |
a speech act that conveys information | report | neutral | Speech act
a means or instrumentality for communicating | radio, newspaper | Source of information |
a large indefinite location on the surface of the Earth | region, town, country | place 2D + conv. borders | Place
any road or path affording passage from one place to another | street | place-path (1D) | Place
a structure that has a roof and walls and stands more or less permanently in one place | restaurant, church | place_3D | Place
the solid part of the earth's surface | peninsula, mountains | place 2D + natural borders | Place
The outline of the summarization process
Processing steps:
• English News
• Tokenization
• Named Entity Recognition
• Semantic classification, identification of closed-class words
• Identification of words denoting speech acts
• Identification of ”senders” (coreference identification included)
• Identification of ”Mental Spaces”, selection decisions
• Template filling or ”squeezing”
• TL-summary generation
Resources:
• Closed-class lexicon
• VerbNet
• WordNet
• Subdomain-specific templates
• TL-generators
• Domain-specific reclassification patterns
Named Entity Recognition and Classification
[Flowchart; the Yes/No branch structure is not recoverable from the extracted text. Nodes:]
• Start
• Place pointer at the first word in the sentence
• Move pointer to next word
• First letter uppercase?
• Word in ’NO-ProperName’ DB?
• Add to Proper Name Candidate String
• Word in Proper Name Indicator DB?
• Proper Name Candidate String empty?
• The 1st word in Proper Name Candidate String = 2nd word in the sentence?
• The 1st word in the sentence = closed-class word?
• Add to Proper Name Candidate String (initial position)
• Semantic Classification of Proper Name (clear Proper Name string)
• More words in the sentence?
• Is the sentence-first word classified?
• Is the word all uppercase and more than one token?
• Closed-class word?
• Semantic Classification
• End
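The core loop of the flowchart (walk through the sentence, collect capitalised words into a proper name candidate string, and close the candidate when a non-capitalised word follows) can be sketched in Python as below. This is a deliberately reduced illustration, not the Newspeak procedure: the special handling of the sentence-first word, the Proper Name Indicator DB, and the semantic classification step are left out, and the contents of `NO_PROPER_NAME` are invented for the example.

```python
# Hypothetical stand-in for the 'NO-ProperName' DB: capitalised words
# that should never start or extend a proper name candidate.
NO_PROPER_NAME = {"The", "In", "A", "On"}


def proper_name_candidates(tokens):
    """Group runs of adjacent capitalised tokens into candidate proper names."""
    candidates = []
    current = []  # the "Proper Name Candidate String" being built
    for token in tokens:
        if token[:1].isupper() and token not in NO_PROPER_NAME:
            current.append(token)
        elif current:
            # A non-candidate word closes the current candidate string.
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates


if __name__ == "__main__":
    sentence = "Israeli troops pulled out of Beit Jala before dawn on Thursday"
    print(proper_name_candidates(sentence.split()))
```

Each candidate would then be passed to the semantic classification step named in the flowchart.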
Iraqi President Saddam Hussein is striking a defiant tone a day after U.S. President George Bush's State of the Union address, saying his nation is ready to "destroy and defeat" any American attack.
In a televised meeting with his military commanders on Wednesday, Saddam said the U.S. had no right to attack his country, and every American soldier is coming "as an aggressor."
"If they have illusions, by God, America will be harmed," the Iraqi leader said. "[It is] not in the American people's interest that such harm come to it, its reputation and economy."
In a powerful address Tuesday evening, Bush braced Americans and the rest of the world for a possible war with Iraq, warning that America was determined in its resolve to see Saddam disarmed.
Sample text 3
[source(semcat(Iraqi President Saddam Hussein,[propername,human([]),human([high_status])])),semcat(tone,[[],speech_act(manner)]),circ([semcat(is,[[],cop([])]),semcat(striking,[[],[]]),semcat(a,[[],det([])]),semcat(defiant,[[],[]])]),said([semcat(a,[[],det([])]),semcat(day,[[],time_period([])]),semcat(after,[[],prep([])]),semcat(U.S. President George Bush_s State,[propername,place([country]),group_of_people([]),human([high_status]),human([]),place([d23,convent_borders])]),semcat(of,[[],prep([])]),semcat(the,[[],det([])]),semcat(Union,[propername,explosion([]),group_of_people([]),place([country])]),semcat(address,[[],speech_act([neutral]),place([d2])]),semcat(saying,[[],say_verb([neutral])]),semcat(his,[[],poss([])]),semcat(nation,[[],place([country]),group_of_people([])]),semcat(is,[[],cop([])]),semcat(ready,[[],[]]),semcat(to,[[],prep([])]),semcat(",[[],[]]),semcat(destroy,[[],[]]),semcat(and,[[],konj([])]),semcat(defeat,[[],[]]),semcat(",[[],[]]),semcat(any,[[],det([])]),semcat(American,[propername,human([])]),semcat(attack,[[],military_operation([])]),semcat(.,[[],[]])]),[]]
[source(semcat(Saddam,[propername,[]])),semcat(said,[[],say_verb([neutral])])…
Sample output from SemCat + speaker and speech act identification
After the coreference check, the first list is unchanged; the source of the second list is resolved to the full name:
[source(semcat(Iraqi President Saddam Hussein,[propername,human([]),human([high_status])])),semcat(said,[[],say_verb([])]),…
Sample output from SemCat + speaker and speech act identification (2)
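The difference between the two outputs is that the short mention Saddam is replaced by the sender already introduced as Iraqi President Saddam Hussein. A very simple way to approximate this step is token overlap with earlier senders, as in the Python sketch below. The matching heuristic and the function name `resolve_sender` are assumptions made for illustration; the slides do not say how Newspeak performs the coreference check.

```python
def resolve_sender(mention, earlier_senders):
    """Map a short mention to the most recent earlier sender containing it as a token.

    Falls back to the mention itself when no earlier sender matches.
    """
    for full_name in reversed(earlier_senders):
        if mention in full_name.split():
            return full_name
    return mention


if __name__ == "__main__":
    senders = ["Iraqi President Saddam Hussein"]
    print(resolve_sender("Saddam", senders))
    print(resolve_sender("Bush", senders))
```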
Searle’s classification of illocutionary acts
Macro-class | Words-world relation | The psychological state of the sender | Sample verbs
Representatives | The speaker fits his words to the world | Belief that p | Claim, announce, forecast, predict
Directives | Attempt to achieve a situation where the world fits to the words | Wanting that p | Ask, beg, order, forbid, instruct
Commissives | Commit the speaker to act in order to fit the world to the words | Intending p | Promise, offer, swear, threaten
Declarations | Alter the world | | Wed, baptise, name, call, dub
Expressives | No dynamic world-words relationship | Specified in the sincerity condition expressed by the propositional content | Thank, apologize, congratulate, regret, pardon
The classification of speech act phrases in the Newspeak lexicon (1)
Phrases | Intention: S wants R to believe that… | Feature(s) in the system lexicon | Macro group in the system lexicon
say, claim, report, announce, inform | p | informative(neutral) | informative
confirm | p | informative(positive) | informative
deny | Not p | informative(negative) | informative
call p p’ | p=p’ | interpretation(neutral) | interpretation
condemn | p & negative(p) | interpretation(negative) | interpretation
prize | p & positive(p) | interpretation(positive) | interpretation
forecast, predict, assume, hypothesise | p is placed in a hypothetical (future) mental space | hypothese(neutral) | hypothese
The classification of speech act phrases in the system lexicon (2)
Phrases | Intention: S wants R to believe that… | Feature(s) in the system lexicon | Macro group in the system lexicon
offer, promise, swear | p is placed in a hypothetical (future) mental space & positive(p) | hypothese(positive) | hypothese
warn, threaten | p is placed in a hypothetical (future) mental space & negative(p) | hypothese(negative) | hypothese
blame x on p, accuse x of p | p & cause(x,p) & negative(p) | cause_interpretation(negative) | interpretation
suspect x for p | p & negative(p) & in a hypothetical space cause(x,p) | hypothetical_cause_interpretation | interpretation, hypothese
declined to say, refused to say, neither confirmed nor denied, had no comments | preferably not p | utterance_refusal | utterance_refusal
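The two lexicon tables map speech act phrases to a feature and a macro group, which suggests a plain lookup structure. The Python sketch below encodes a subset of the rows as a dictionary; the feature and macro group strings are copied from the tables, while the dictionary layout and the function `classify_speech_act` are illustrative assumptions about how such a lexicon could be queried.

```python
# phrase -> (feature in the system lexicon, macro group in the system lexicon)
SPEECH_ACT_LEXICON = {
    "say": ("informative(neutral)", "informative"),
    "claim": ("informative(neutral)", "informative"),
    "report": ("informative(neutral)", "informative"),
    "confirm": ("informative(positive)", "informative"),
    "deny": ("informative(negative)", "informative"),
    "condemn": ("interpretation(negative)", "interpretation"),
    "predict": ("hypothese(neutral)", "hypothese"),
    "promise": ("hypothese(positive)", "hypothese"),
    "threaten": ("hypothese(negative)", "hypothese"),
    "declined to say": ("utterance_refusal", "utterance_refusal"),
}


def classify_speech_act(phrase):
    """Look up a speech act phrase; return None when it is not in the lexicon."""
    return SPEECH_ACT_LEXICON.get(phrase.strip().lower())


if __name__ == "__main__":
    print(classify_speech_act("Deny"))
    print(classify_speech_act("declined to say"))
```

A real lexicon would also need to map inflected forms such as "denied" or "said" to these entries; that step is omitted here.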
Some principles for selection of claims to be rendered:
1) Informatives:
• Neutral, the sender is not marked for high status: officials said, the news agency reported, reportedly… A claim p introduced by a neutral informative is rendered in the summary; the source is omitted if there are no denials or confirmations of p in the text and if the source is not marked for high status, like ‘President’.
• Neutral, the sender marked for high status, and ‘declarations’: the President said…, the government condemned… The source is rendered if it is marked for high status.
• Affirmative; confirmations of explicit claims: Israeli sources confirmed that… Confirmations of previous explicit claims are omitted in the summary.
• Affirmative; confirmations of claims that are not explicitly mentioned: if the speech act is a confirmation of a claim not present in the news report, both the information source and the claim, including the type of the speech act phrase, are rendered in the summary.
• Negative, or neutral followed by denied claims: The president denied…, The Israeli source said that it is not true… Both the initial claim and its denial are rendered in the summary together with the information about the senders.
2) Utterance refusal, negated speech act phrases, hypotheses, commissives, interpretations: The Israeli sources neither denied nor confirmed…, the minister did not say if…, the defense secretary declined to say…, the government had no immediate comments…
• Utterance refusals or negated speech act phrases related to an explicit claim are omitted.
• If a source refuses to confirm or deny a claim that has not been explicitly mentioned in the previous part of the text, the whole speech act is rendered, including the type of the speech act.
• Hypotheses and commissives are rendered together with their sources and marked for unsure epistemic status.
3) Epistemic spaces: e.g. no one knows if the device was planted deliberately or if it was left over from New Year’s Eve
• If two claims would exclude each other in the same mental space, and if no source in the text takes responsibility for either of these claims, both claims are to be rendered as hypotheses.
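The selection principles for informatives, utterance refusals and hypotheses read like a small decision table, so they can be sketched as a single function. The Python code below is one possible encoding under stated assumptions: the `Claim` fields, the returned labels, and the fallback for cases the principles do not cover are all hypothetical, and principle 3 (mutually exclusive claims without a responsible source) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    feature: str                            # lexicon feature, e.g. "informative(neutral)"
    high_status_source: bool = False        # sender marked for high status
    refers_to_explicit_claim: bool = False  # confirmation/refusal targets a claim in the text
    contested: bool = False                 # the text also confirms or denies this claim


def render_decision(claim):
    """Return a label saying what goes into the summary (labels are illustrative)."""
    feature = claim.feature
    if feature == "informative(neutral)":
        # The source is omitted only for uncontested claims from ordinary senders.
        if claim.contested or claim.high_status_source:
            return "claim+source"
        return "claim"
    if feature in ("informative(positive)", "utterance_refusal"):
        # Confirmations and refusals about explicit claims add nothing new.
        if claim.refers_to_explicit_claim:
            return "omit"
        return "claim+source+speech_act"
    if feature == "informative(negative)":
        return "claim+denial+sources"
    if feature.startswith("hypothese"):
        # Hypotheses and commissives share the hypothese macro group in the lexicon.
        return "claim+source+unsure"
    return "claim+source"  # assumption: cases the principles leave open keep the source


if __name__ == "__main__":
    print(render_decision(Claim("informative(neutral)")))
    print(render_decision(Claim("hypothese(negative)")))
```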
Sample input text
RAMALLAH, West Bank -- Palestinian leader Yasser Arafat said Thursday that elections as part of a reform of the Palestinian Authority will be held this winter, whether or not Israeli forces withdraw from the Palestinian territories. That represented a change of course from Arafat, who said last week that no elections would be held until the Israelis pulled back. Shortly after Arafat's announcement, a committee he had appointed to set up elections resigned, according to Israel Radio, because Arafat would not agree to a specific date for the elections. Other Palestinian leaders said the resignations were a procedural matter. Arafat also condemned Wednesday's suicide bombing in the Israeli town of Rishon Letzion. Two Israelis were killed and at least 37 others wounded when the bomber detonated explosives in the center of a crowded pedestrian district. The terror attack marked the second time in two weeks a suicide bombing directed at civilians has rocked Rishon Letzion, a town about 15 miles southeast of Tel Aviv. On May 8, a suicide attack at a pool hall killed 15 people and wounded dozens of others. "Suddenly there was an explosion," 16-year-old Shmuel Voller told The Associated Press on Wednesday. The bombing occurred on Rothschild Street in the heart of the town around 9:15 p.m. (2:15 p.m. ET).
Generation: sample summary
RAMALLAH, West Bank -- Palestinian leader Yasser Arafat said Thursday that elections as part of a reform of the Palestinian Authority will take place this winter, whether or not Israeli forces withdraw from the Palestinian territories. On Wednesday, a suicide bombing took place in the Israeli town of Rishon Letzion, on Rothschild Street in the center of a crowded pedestrian district, around 9:15 p.m. (2:15 p.m. ET). Two Israelis were killed and at least 37 others wounded. Arafat condemned the attack.
Generation
TL vocabulary more restricted than SL vocabulary
Swedish: Israeliska trupper tågade ut ur Beit Jala
Israeli+pl troops marched out of/left Beit Jala
(tågade ut ur instead of *drog ut av)
Polish: Wojska izraelskie wycofały się z Beit Jala
Troops Israeli backed out from Beit Jala
(wycofały się instead of *wyciągnęły or *wyciągały)
TL patterns fit textual/semantic relations
E: A bomb exploded in Bilbao, Spain, early Friday morning.
S: En bomb exploderade i den spanska staden Bilbao tidigt på fredagsmorgonen
a bomb explode-past in def Spanish city Bilbao early on Friday-morning-def
E: There were no injuries.
S: Inga personskador rapporterades
no person-injuries report-past-passive
E: ETA is suspected of being responsible for the attack.
S: Förmodligen ligger ETA bakom bombdådet.
Presumably lie-pres ETA behind bomb-outrage-def
The grammatical and semantic characteristics of Polish nouns
Animacy degree | Grammatical gender | Semantic features | Accusative form | Adjective ending in plural | Verb ending in plural, past tense
inanimate | +ma/+fe; +ne | -alive (+ma/+fe); +/- alive (+ne) | acc=nom | -e | -ły
semianimate | +ma | - alive, + mobile or + spherical | sg: acc=gen or acc=nom, pl: acc=nom | -e | -ły
animate | +ma/+fe | + alive | sg: acc=gen, pl: acc=nom | -e | -ły
superanimate | +ma | + human | acc=gen | -i/-y | -li
Krzesła sta-ły
Chair+PL stand+PAST+PL
’The chairs were standing (there)’
Psy sta-ły
Dog+PL stand+PAST+PL
’The dogs were standing (there)’
Duchy sta-ły
Ghost+PL stand+PAST+PL
’The ghosts were standing (there)’
Dziewczynki sta-ły
Girl+PL stand+PAST+PL
’The girls were standing (there)’
Chłopcy sta-li
Boy+PL stand+PAST+PL+MALE+HUMAN
’The boys were standing (there)’
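The examples show that the plural past tense ending depends only on whether the subject is superanimate (masculine and human): such subjects take -li, all others take -ły. The Python sketch below encodes just that choice for a generator. The feature labels follow the notation used on the slides (+ma, +fe, +ne, +hum), while the function names and the stem-plus-ending representation are simplifying assumptions.

```python
def plural_past_ending(subject_features):
    """Return 'li' for masculine human (superanimate) subjects, 'ły' otherwise."""
    if {"+ma", "+hum"} <= set(subject_features):
        return "li"
    return "ły"


def plural_past(stem, subject_features):
    """Attach the plural past ending that agrees with the subject to a verb stem."""
    return stem + plural_past_ending(subject_features)


if __name__ == "__main__":
    print("Chłopcy", plural_past("sta", {"+ma", "+hum", "+pl"}))
    print("Dziewczynki", plural_past("sta", {"+fe", "+hum", "+pl"}))
    print("Krzesła", plural_past("sta", {"+ne", "+pl"}))
```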
Extracting ’superanimate’ nouns (1)
Pojawili się więc Algierczycy, Jemeńczycy, obywatele Bangladeszu, Uzbecy, Kirgizi i Tadżycy.
’There arrived Algerians, Yemenis, citizens of Bangladesh, Uzbeks, Kirgizis, and Tadjiks’
Resources: stop-list and a suffix list with declension numbers
Stop-list entry: Pojawili | v | hum | ma | pl
Entries in the PolWN Database:
Word | Cat | SemCat | Gender | Number | Case | Decl
Algierczycy | n | hum | ma | pl | nom | 35
Jemeńczycy | n | hum | ma | pl | nom | 35
obywatele | n | hum | ma | pl | nom | 14
Uzbecy | n | hum | ma | pl | nom | 36
Kirgizi | n | hum | ma | pl | nom | 38
Tadżycy | n | hum | ma | pl | nom | 35
Extracting ’superanimate’ nouns (2)
Postverbal subjects:
We wtorek w stolicy Kataru zebrali się na nieformalnej konferencji ministrowie 22 państw Ligi Arabskiej.
‘The ministers of the 22 Arab League states gathered for an informal conference in the capital of Qatar on Tuesday.’
Preverbal subjects:
W przyjętej w Dausze wspólnej deklaracji Arabowie zdecydowanie potępili terroryzm we wszelkich formach.
‘In the joint declaration adopted in Doha, the Arabs strongly condemned terrorism in all its forms.’
Antecedents of the relative pronoun ’którzy’:
Komórka składała się z wielu dziesiątek osób, w tym dwóch pilotów, którzy kształcili się w tych samych szkołach amerykańskich, co Mohammed Atta.
‘The cell consisted of dozens of people, including two pilots, who had completed their education at the same American schools that Mohammed Atta attended.’
The decrease of unknown superanimate noun forms during the training phase (training on 4 files, ca. 11,000 words each), normalized data
[Chart: x-axis Corpus (1 to 4), y-axis Count; series: normalized unique forms, normalized unknown forms]
Percent correctly classified nouns
[Chart: x-axis Corpus (1 to 4), y-axis %]
Correctly classified nouns
[Chart; categories: +ma,+hum,+pl,+nom; +ma,+hum; +ne,+sg,+nom; other nouns; other categories]
The results of post-editing after the training phase
The lexical coverage of different text domains
 | World news | Sport | Science | Business
Nouns (types) found in the database | 166 | 47 | 64 | 48
Nouns (types) added to the database | 24 | 121 | 98 | 78
Total | 190 | 168 | 162 | 126
The general procedure for extracting and classifying different word classes in Polish
[Diagram: a cycle of steps around the Database]
A: Corpus study, linguistic hypotheses
B: Stop-list: items with a high markedness degree
C: Agreeing items in sentence or phrase context
D: Generating inflectional forms
E: Post-editing
F: Most frequent inflectional forms
Worked example (successive states of the diagram):
1) Stop-list item: pojawili, v: +pl,+ma,+hum
2) Agreeing item in sentence context: Algierczycy, n: +pl,+ma,+hum,+nom
3) Generated inflectional forms: Algierczyk +sg,+nom; Algierczyków +pl,+gen
4) Algierczyków +pl,+gen is used as a new item with a high markedness degree; agreeing item: protestujących 'protesting': prt,+pl,+gen
5) Generated inflectional forms: protestować v, inf; protestujący prt,+pl,+nom; protestującą prt,+sg,+fe,+acc
6) protestującą prt,+sg,+fe,+acc is used as a new marked item; agreeing item: grupę 'group', n,+sg,+fe,+acc
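The step from a marked stop-list item to an agreeing item can be illustrated with the first example: the verb form pojawili carries +pl,+ma,+hum, and a noun that agrees with it as its subject inherits those features plus nominative case. The Python sketch below shows only this feature transfer. How the agreeing word is located in the sentence is not modelled, and the names `STOP_LIST` and `classify_agreeing_subject` as well as the dictionary layout are assumptions made for the illustration.

```python
# Stop-list entry copied from the slides: a highly marked verb form.
STOP_LIST = {
    "pojawili": {"cat": "v", "feats": {"+pl", "+ma", "+hum"}},
}


def classify_agreeing_subject(verb, noun, stop_list):
    """Give an unknown subject noun the verb's agreement features plus +nom.

    Returns None when the verb form is not a known stop-list item.
    """
    entry = stop_list.get(verb.lower())
    if entry is None:
        return None
    # Assumption: the subject of a finite verb is in the nominative case.
    return {"word": noun, "cat": "n", "feats": entry["feats"] | {"+nom"}}


if __name__ == "__main__":
    result = classify_agreeing_subject("Pojawili", "Algierczycy", STOP_LIST)
    print(result["word"], result["cat"], sorted(result["feats"]))
```

The classified noun can then feed the next steps of the cycle: generating its other inflectional forms and using the most marked of them to classify further agreeing items.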