Question Answering from the Web Using Knowledge Annotation and Knowledge Mining Techniques by Jimmy Lin and Boris Katz, retold by: Jan Peter Alexander

Mar 30, 2019


Question Answering from the Web Using Knowledge Annotation and Knowledge Mining Techniques

by Jimmy Lin and Boris Katz

retold by: Jan Peter Alexander


Key Points of the Paper

● Question-answering system
● Data from the World Wide Web (WWW)

ARANEA
● Techniques
● Knowledge Annotation
● Knowledge Mining

START
paragraph


The Problem

● Traditional information retrieval systems
● A search returns a list of potentially relevant pages.

[gives you a headache]

● Factoid!
● The question points directly to an answer
– When was John Doe born?
● A simple answer
– 1970 [YEAR]


Concept

● Zipf's Law

“A small fraction of question types accounts for a significant portion of all question instances.”


Concept (cont'd)


Aranea

[Architecture diagram. Components: Questions, Knowledge Annotation, Knowledge Mining, Knowledge Boosting, Answers. Data sources: Web Databases, Google.]


Knowledge Annotation

Sources: CIA World Factbook, Biography.com, 50states.com, ...

When was x born?
What is the birth date of x?
...
→ {biography.com, x, birthdate}
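The mapping from question variants to a {source, x, attribute} triple can be illustrated with a minimal sketch. The regular expressions, the `SCHEMATA` table, and the `annotate` function are hypothetical names chosen for illustration; the slides do not show how Aranea actually stores or matches its schemata.

```python
import re

# Hypothetical question schemata: each pattern maps to a (source, attribute) pair.
SCHEMATA = [
    (re.compile(r"^when was (?P<x>.+?) born\?$", re.IGNORECASE),
     ("biography.com", "birthdate")),
    (re.compile(r"^what is the birth date of (?P<x>.+?)\?$", re.IGNORECASE),
     ("biography.com", "birthdate")),
]


def annotate(question):
    """Return (source, x, attribute) for the first matching schema, else None."""
    q = question.strip()
    for pattern, (source, attribute) in SCHEMATA:
        match = pattern.match(q)
        if match:
            return (source, match.group("x"), attribute)
    return None
```

Both phrasings of the birth-date question resolve to the same triple, so a single database lookup can serve a whole family of question forms.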


Knowledge Mining

Pipeline stages, in order:

Formulate Requests
Execute Requests
Generate N-grams
Vote
Filter Candidates
Combine Candidates
Score Candidates
Get Support


Knowledge Mining: Formulate Requests

Quantity over quality

There are 2 types of queries:

- Exact: pattern matching

- Inexact: a set of keywords

Example question: “When did the Mesozoic period end?”


Knowledge Mining: Formulate Requests (cont'd)

“When did the Mesozoic period end?”

Query: When did the Mesozoic period end
Type: inexact
Score: 1
Snippets to mine: 100

Query: the Mesozoic period ended
Type: inexact
Score: 1
Snippets to mine: 100

Query: the Mesozoic period ended ?x
Type: exact
Score: 2
Snippets to mine: 100
Max. length of ?x: 50
Max. length of ?x in words: 5
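The three requests for the Mesozoic question could be produced by a rule like the one sketched here. This is only an illustration under stated assumptions: the `formulate` and `make_request` helpers are hypothetical, the rule covers only questions of the form "When did <subject> <verb>?", and the past tense is formed naively by appending "ed".

```python
import re


def make_request(query, kind, score, snippets=100):
    """Bundle one request; field names mirror the labels on the slide."""
    return {"query": query, "type": kind, "score": score, "snippets": snippets}


def formulate(question):
    """Build requests for questions of the form 'When did <subject> <verb>?'."""
    bare = question.rstrip("?").strip()
    # The question itself, minus punctuation, is always sent as an inexact query.
    requests = [make_request(bare, "inexact", 1)]
    match = re.match(r"^When did (?P<subject>.+) (?P<verb>\w+)$", bare)
    if match:
        # Naive past tense (assumption): irregular verbs would need a lexicon.
        statement = f"{match.group('subject')} {match.group('verb')}ed"
        requests.append(make_request(statement, "inexact", 1))
        # The exact query binds the text that follows the statement to ?x.
        requests.append(make_request(f"{statement} ?x", "exact", 2))
    return requests
```

Calling `formulate("When did the Mesozoic period end?")` yields the same three query strings, types, and scores listed on the slide.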


Knowledge Mining: Generate N-grams

All possible n-grams (unigrams, bigrams, trigrams, and tetragrams) are generated from the text snippets returned by Execute Requests.

Each n-gram is then given an initial score based on the type (exact vs. inexact) of the text it came from.
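A minimal sketch of this step, assuming whitespace tokenization and that the initial score is simply the score of the request that returned the snippet (1 for inexact, 2 for exact, as in the example requests). The function name `generate_ngrams` is hypothetical.

```python
def generate_ngrams(snippet, initial_score, max_n=4):
    """Return (ngram, score) pairs for every n-gram of length 1..max_n."""
    tokens = snippet.split()
    candidates = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the snippet.
        for i in range(len(tokens) - n + 1):
            candidates.append((" ".join(tokens[i:i + n]), initial_score))
    return candidates
```

A four-word snippet produces 4 + 3 + 2 + 1 = 10 candidates, which shows why the later voting and filtering stages are needed to prune the candidate pool.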


Knowledge Mining: Vote

N-grams are rescored based on how many times each n-gram recurs.

Idea: an answer found in several documents is more likely to be a valid answer.
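Voting can be sketched as summing the scores of identical candidates. Summing (rather than, say, counting occurrences) and matching case-insensitively are assumptions made for this illustration; the slide only says that n-grams are rescored by how often they recur.

```python
from collections import defaultdict


def vote(candidates):
    """Sum the scores of identical n-grams, compared case-insensitively."""
    totals = defaultdict(float)
    for text, score in candidates:
        totals[text.lower()] += score
    return dict(totals)
```

An n-gram that appears in many snippets accumulates a high total, which is how redundancy on the Web stands in for deeper linguistic analysis.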


Knowledge Mining: Filter Candidates

N-grams are eliminated according to these criteria:

✔ Candidates that begin or end with a stop word are discarded.

✔ Candidates that contain words from the original question are discarded, except those containing focus words.

“How many meters. . . ”

✔ Heuristics enforce the expected answer type.

“how far”, “how fast”, “how tall” → numeric answer

✔ Fixed-list filters for answers that are closed-class items.

“what sports...”, “what nationality...”, “what language...”
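The first three criteria can be sketched as a single predicate. The stop-word list, the `NUMERIC_CUES` tuple, the "must contain a digit" test, and the `keep_candidate` function are all illustrative assumptions; the fixed-list filters for closed-class items are left out of this sketch.

```python
import re

# Tiny illustrative stop-word list (assumption, not the list Aranea uses).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was"}
NUMERIC_CUES = ("how far", "how fast", "how tall")


def keep_candidate(candidate, question, focus_words=frozenset()):
    """Return True if the candidate survives the first three filters."""
    words = candidate.lower().split()
    if not words:
        return False
    # Rule 1: drop candidates that start or end with a stop word.
    if words[0] in STOP_WORDS or words[-1] in STOP_WORDS:
        return False
    # Rule 2: drop candidates that repeat question words, unless they are focus words.
    question_words = set(re.findall(r"\w+", question.lower())) - STOP_WORDS
    if any(w in question_words and w not in focus_words for w in words):
        return False
    # Rule 3: questions with a numeric cue require a digit in the answer.
    if question.lower().startswith(NUMERIC_CUES) and not re.search(r"\d", candidate):
        return False
    return True
```

For "How many meters are in a yard?", passing `focus_words={"meters"}` lets a candidate such as "3 meters" through even though "meters" appears in the question.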


Knowledge Mining: Combine Candidates

Shorter n-grams are merged into longer n-grams when the shorter n-gram is contained in the longer one.

The score of “de Soto” is added to the score of “Hernando de Soto”.
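A minimal sketch of the merge, assuming containment means a contiguous run of whole tokens and that the shorter n-gram keeps its own score after contributing to the longer one (the slide does not say whether it is removed). The function name `combine` is hypothetical.

```python
def combine(scores):
    """Add each shorter n-gram's score to every longer n-gram that contains it."""
    combined = dict(scores)
    for short, short_score in scores.items():
        short_tokens = short.split()
        n = len(short_tokens)
        for long in scores:
            long_tokens = long.split()
            if len(long_tokens) <= n:
                continue
            # Contiguous, token-level containment check.
            if any(long_tokens[i:i + n] == short_tokens
                   for i in range(len(long_tokens) - n + 1)):
                combined[long] += short_score
    return combined
```

Contributions are computed from the original scores, so a short n-gram's score is counted once per containing n-gram and does not cascade.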


Knowledge Mining: Score Candidates

$$\text{score} = \text{score} \times \frac{1}{|A|} \sum_{w \in A} \log \frac{N}{w_c}$$
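The scoring formula can be written out as a short function. The slide does not define its symbols, so this sketch assumes that A is the list of words in the candidate answer, N is the number of documents in some reference corpus, and w_c is the number of those documents that contain the word w, which makes the factor an average inverse document frequency. The `rescore` function and its parameter names are hypothetical.

```python
import math


def rescore(score, answer_words, doc_counts, num_docs):
    """Compute score * (1/|A|) * sum over w in A of log(N / w_c)."""
    idf_sum = sum(math.log(num_docs / doc_counts[w]) for w in answer_words)
    return score * idf_sum / len(answer_words)
```

Under these assumptions, a candidate made of rare words is boosted, while one made of words that occur in almost every document is pushed toward zero.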


Knowledge Mining: Get Support

Check whether the answer actually appears in the original snippet text retrieved from the Web.
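As a rough sketch, the support check amounts to a substring test over the retrieved snippets. Case-insensitive matching and the `has_support` name are assumptions made for illustration.

```python
def has_support(answer, snippets):
    """Return True if the answer string occurs in at least one snippet."""
    needle = answer.lower()
    return any(needle in snippet.lower() for snippet in snippets)
```

This guards against answers that were assembled by the combine step but never occurred as a contiguous string in any retrieved text.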


Knowledge Boosting

Heuristic checks:

A set of procedures for recognizing geographic locations and years.

[Diagram labels: Knowledge Annotation, Knowledge Mining]


Results

                          # of q.      %
Knowledge Annotation
  Correct                    30       6.0
  Inexact                     2       0.4
  Wrong                      10       2.0
  Total                      42       8.4
Knowledge Mining
  Correct                   153      30.6
  Inexact                    43       8.6
  Wrong                     262      52.4
  Total                     458      91.6
Total
  Correct                   183      36.6
  Inexact                    45       9.0
  Wrong                     272      54.4
  Total                     500     100.0

Performance (% within each technique)

Knowledge Annotation
  Correct                  71.4
  Inexact                   4.7
  Wrong                    23.9
Knowledge Mining
  Correct                  33.4
  Inexact                   9.4
  Wrong                    57.2

● Unsupported (AQUAINT)

● Inexact


Results (cont'd)

● About 16% (30/183) of the correct answers came from knowledge annotation.
● 28 database access schemata
● 7 knowledge sources
● Only a few working days' worth of data

● Requires natural language understanding.
● Human error (Machine Learning);
● Temporal questions (“Governor in 1950”);
● Semantic value (“the second person”);


Conclusion

● Aranea uses 2 types of techniques:
● Knowledge Annotation
● Knowledge Mining


Example Query Results

● When is Gerald Ford’s birthday?
● July 14, 1913
● (extracted using knowledge annotation techniques from biography.com)

● Who founded Taoism?
● Lao Tzu
● (extracted using knowledge mining techniques)

● What was the name of the first child of English parents to be born in America?
● Virginia Dare
● (extracted using knowledge mining techniques)


Mining Problem

● (1) Wilt Chamberlain scored 100 points on March 2, 1962 against the New York Knicks.

● (2) On December 8, 1961, Wilt Chamberlain scored 78 points in a triple overtime game. It was a new NBA record, but Warriors coach Frank McGuire didn’t expect it to last long, saying, “He’ll get 100 points someday.” McGuire’s prediction came true just a few months later in a game against the New York Knicks on March 2.