Exploring Neural Networks for Entity Discovery and Linking (EDL)

Dan Liu 1, Wei Lin 1, Shiliang Zhang 2, Si Wei 1, Mingbin Xu 3, Feng Wei 3, Sed Watchara 3, Yuchen Kang 3, Hui Jiang 3

1 iFLYTEK Research, Hefei, Anhui, China
2 University of Science and Technology of China, Hefei, Anhui, China
3 Dept. of Electrical Engineering and Computer Science, York University, Toronto, Canada
Outline
• Introduction: Deep Learning for NLP
• EDL Pipeline
• Two submitted systems:
  – USTC_NELSLIP
  – YorkNRM
• Experiments and Discussions
• Conclusions
Deep Learning for NLP
• Data
• Feature: compact representation
• Model: neural networks
Deep Learning for NLP
• Data: the more the better
• Feature: compact representation
  – word: word embedding
  – sentence/paragraph/document: variable-length word sequences
• Model: neural networks
  – RNNs/LSTMs
  – CNNs
  – DNNs + FOFE
Fixed-size Ordinally-Forgetting Encoding (FOFE)
• FOFE: a fixed-size and unique encoding method for variable-length sequences [Zhang et al., 2015]
• Excels in several NLP tasks, e.g. language modelling
• Example (forgetting factor α):
  A: [1 0 0]  B: [0 1 0]  C: [0 0 1]
  ABC: [α², α, 1]
  ABCBC: [α⁴, α³ + α, 1 + α²]
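The recursion behind these codes is z_t = α·z_{t−1} + e_t, where e_t is the one-hot vector of the t-th token. A minimal sketch that reproduces the slide's example (the function name and α = 0.5 are illustrative choices, not from the talk):

```python
import numpy as np

def fofe(sequence, vocab, alpha=0.5):
    """FOFE: z_t = alpha * z_{t-1} + e_t, where e_t is the one-hot
    vector of the t-th token. Returns a fixed-size vector for any
    variable-length sequence."""
    z = np.zeros(len(vocab))
    for token in sequence:
        e = np.zeros(len(vocab))
        e[vocab[token]] = 1.0
        z = alpha * z + e
    return z

vocab = {"A": 0, "B": 1, "C": 2}
print(fofe("ABC", vocab, 0.5))    # [α², α, 1]       = [0.25, 0.5, 1.0]
print(fofe("ABCBC", vocab, 0.5))  # [α⁴, α³+α, 1+α²] = [0.0625, 0.625, 1.25]
```

For 0 < α ≤ 0.5 the encoding is provably unique (invertible), which is what makes it a lossless fixed-size substitute for the raw token sequence.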
FOFE + DNN for all NLP tasks
[Diagram: input text → FOFE codes (lossless, invertible) → deep neural nets (universal approximators) → any NLP targets]
• Theoretically sound
• No feature engineering
• Simple models
• General methodology:
  – not only sequence labelling problems
  – but also (almost) all NLP tasks
EDL Pipeline
Entity Discovery → Candidate Generation → Candidate Ranking
EDL System 1: USTC_NELSLIP
• Entity Discovery: CNN/RNN conditional LM, attention encoder-decoder, FOFE-DNN
• Candidate Generation: rule-based generation
• Candidate Ranking: NN-based ranking
EDL System 2: YorkNRM
• Entity Discovery: RNN conditional LM, attention encoder-decoder, FOFE-DNN
• Candidate Generation: rule-based generation
• Candidate Ranking: NN-based ranking
Entity Linking
(pipeline: Entity Discovery → Candidate Generation → Candidate Ranking)
• Candidate Generation: rule-based generation
• Candidate Ranking: NN-based ranking
Entity Linking: Candidate Generation
• Rule-based query expansion
• Query search (MySQL) and fuzzy match (Lucene)
Candidate Generation: Performance
Quality of generated candidate lists (average count vs. coverage rate) on the KBP2015 test set:

                   ENG      CMN      SPA
  avg. count       22.60    92.96    38.55
  coverage rate    93%      92.1%    88.4%
Entity Linking: NN-based Ranking
• Use some hand-crafted features as input:

  id   dim   feature
  e1   100   mention string embedding
  e2   100   candidate name embedding
  e3    10   mention type
  e4    10   document type
  e5    10   candidate hot value vector
  e6    10   edit distance between mention string and candidate name
  e7    10   cosine similarity of document and candidate description
  e8    10   edit distance between translations of mention and candidate

• Use feedforward DNNs to compute ranking scores
• NIL clustering based on string match
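Concretely, the eight feature vectors are concatenated (100 + 100 + 6×10 = 260 dimensions) and fed through a feedforward network that emits one scalar score per (mention, candidate) pair; candidates are then sorted by score. A minimal sketch with untrained random weights, just to show the shape of the computation (the hidden size of 128 and the single hidden layer are assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Concatenated feature dimensions from the slide's table: e1..e8.
FEATURE_DIM = 100 + 100 + 10 * 6  # = 260

def init_layer(n_in, n_out):
    # Hypothetical initialization; the talk does not specify layer sizes.
    return rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out)

W1, b1 = init_layer(FEATURE_DIM, 128)
W2, b2 = init_layer(128, 1)

def ranking_score(features):
    """Feedforward DNN: one hidden ReLU layer, scalar ranking score."""
    h = np.maximum(0.0, features @ W1 + b1)
    return float(h @ W2 + b2)

# Rank the candidate list for one mention by score (highest first).
candidates = [rng.normal(size=FEATURE_DIM) for _ in range(5)]
ranked = sorted(range(len(candidates)),
                key=lambda i: ranking_score(candidates[i]), reverse=True)
```

In training, the scores would be fit so the gold candidate outranks the rest; the sketch above only illustrates the scoring and sorting step.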
Entity Discovery (ED)
(pipeline stage: Entity Discovery → Candidate Generation → Candidate Ranking)
• ED models: CNN/RNN conditional LM, attention encoder-decoder, FOFE-DNN
USTC ED Model 1
Mention detection as sequence labelling:

  P(Y | X) = ∏_{i=1}^{N} P(y_i | X, y_{i−1}, y_{i−2}, …, y_1)

• word sequence ==> BIO tags
• CNN: 5 convolutional layers
• RNN: GRU-based model
• Viterbi decoding
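Viterbi decoding turns the per-token tag scores into the best globally valid BIO sequence, e.g. forbidding an I tag that does not follow a B or I. A minimal sketch for a single entity type (the tag set, transition scores, and emission values are illustrative, not from the talk):

```python
import numpy as np

TAGS = ["O", "B", "I"]  # simplified single-type BIO tag set

def viterbi_bio(emissions):
    """Viterbi decoding over per-token tag scores (log-domain),
    disallowing the illegal O -> I transition and an initial I."""
    n, k = emissions.shape
    # trans[i][j]: score of moving from tag i to tag j (0 = allowed)
    trans = np.zeros((k, k))
    trans[TAGS.index("O"), TAGS.index("I")] = -np.inf  # O cannot precede I
    score = emissions[0].copy()
    score[TAGS.index("I")] = -np.inf                   # cannot start with I
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + trans + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)   # best previous tag per current tag
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):       # follow backpointers
        path.append(int(back[t][path[-1]]))
    return [TAGS[i] for i in reversed(path)]

# Token-wise scores favour an illegal O -> I at position 2;
# Viterbi repairs it into a valid BIO path.
em = np.log(np.array([[0.8, 0.1, 0.1],
                      [0.9, 0.05, 0.05],
                      [0.1, 0.3, 0.6]]))
print(viterbi_bio(em))  # → ['O', 'O', 'B']
```

Greedy per-token argmax on the same scores would output O, O, I, which is not a well-formed BIO sequence; the transition constraints are what the decoding step contributes.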
USTC ED Model 2
• Introduce attention
• Tree-structured tags for nested entities, e.g.:

  Kentucky Fried Chicken
  [FAC [PER Kentucky ]PER Fried Chicken ]FAC
  [FAC [PER Z ]PER Z Z ]FAC
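The bracketed tag scheme above can be sketched as a rendering of nested, non-crossing entity spans: open brackets for the longest spans first, close brackets for the shortest first. The function below is a hypothetical illustration of that encoding, not the authors' implementation:

```python
def tree_tags(tokens, entities):
    """Render nested entity spans as bracketed tree-structured tags,
    as in the slide's Kentucky Fried Chicken example.
    entities: list of (start, end, type), end exclusive; spans may be
    nested but must not cross."""
    out = []
    for i, tok in enumerate(tokens):
        # open brackets for spans starting here, outermost (longest) first
        for s, e, t in sorted(entities, key=lambda x: -(x[1] - x[0])):
            if s == i:
                out.append(f"[{t}")
        out.append(tok)
        # close brackets for spans ending here, innermost (shortest) first
        for s, e, t in sorted(entities, key=lambda x: x[1] - x[0]):
            if e == i + 1:
                out.append(f"]{t}")
    return " ".join(out)

tokens = ["Kentucky", "Fried", "Chicken"]
entities = [(0, 3, "FAC"), (0, 1, "PER")]
print(tree_tags(tokens, entities))
# → [FAC [PER Kentucky ]PER Fried Chicken ]FAC
```

This reproduces the slide's example: the facility span wraps the whole name while the nested person span covers only "Kentucky".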
USTC ED Performance
Effect of various training data sets:
• KBP15 training data
• iFLYTEK in-house data (10,000 labelled Chinese and English documents)