GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems, 2020, EMNLP
[Abstract]
Proposes computing the metric at the dialog level rather than the turn level by using a topic-level graph
-. K-hop neighboring
-. Uses hop-based edge weights
[Architecture]
- Encode the context-response pair with BERT
- Build a topic-level dialog graph for the pair with ConceptNet, then run inference over it
- An MLP takes the outputs of both steps above and computes the final score
[Metric]
1. Utterance-level Contextualized Encoding
vc = BERT(c, r)
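As a rough illustration of what "encoding the pair" means on the input side, the sketch below packs c and r into BERT's standard sentence-pair layout ([CLS] c [SEP] r [SEP] with segment ids). The helper name `pack_pair` is hypothetical, and whitespace splitting stands in for the real WordPiece tokenizer; the encoder itself is not shown.

```python
def pack_pair(context_tokens, response_tokens):
    """Pack a context-response pair in BERT's sentence-pair layout."""
    tokens = ["[CLS]"] + context_tokens + ["[SEP]"] + response_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the context and the first [SEP];
    # segment 1 covers the response and the final [SEP].
    segment_ids = [0] * (len(context_tokens) + 2) + [1] * (len(response_tokens) + 1)
    return tokens, segment_ids


# Assumption: whitespace tokens instead of WordPiece pieces.
tokens, segment_ids = pack_pair("do you like cats".split(), "i love them".split())
```

The token and segment lists would then be mapped to ids and passed to the encoder to obtain vc.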
2. Dialogue Graph Construction (Topic-level representation)
G = (V, E)
V = topic nodes, E = set of edges between topics
* Dialogue graph
The topic nodes of G are obtained with a rule-based keyword extractor (TF-IDF + Part-Of-Speech features)
The keywords in c become the context-topic nodes of G, denoted Vc = {t1, t2, ..., tp}, while the keywords in r become the response-topic nodes of G, denoted Vr = {tp+1, tp+2, ..., tp+q}
V is the union of Vc and Vr
Edges are built with k-hop neighboring, which yields the graph (adjacency) matrix
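The construction above can be sketched as follows. This is a minimal illustration, not the repository's code: `CONCEPT_EDGES` is a toy stand-in for ConceptNet, Vc and Vr are assumed to be disjoint, only context-response topic pairs are linked, and the 1 / (hops + 1) weight is an assumed form of "hop weight".

```python
from collections import deque

# Toy undirected concept graph standing in for ConceptNet (hypothetical data).
CONCEPT_EDGES = {
    "cat": ["pet", "animal"],
    "pet": ["cat", "dog"],
    "dog": ["pet", "animal"],
    "animal": ["cat", "dog"],
    "pizza": ["food"],
    "food": ["pizza"],
}


def hop_distance(src, dst, max_hops):
    """Shortest hop count from src to dst via BFS, or None if beyond max_hops."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand past the k-hop limit
        for nxt in CONCEPT_EDGES.get(node, []):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def build_graph(context_topics, response_topics, max_hops=2):
    """Return the node list V (Vc followed by Vr) and a weighted adjacency matrix."""
    nodes = list(context_topics) + list(response_topics)  # assumes Vc and Vr are disjoint
    p, n = len(context_topics), len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(p, n):
            hops = hop_distance(nodes[i], nodes[j], max_hops)
            if hops is not None:
                # Assumed weighting: closer topics get a larger weight.
                adj[i][j] = adj[j][i] = 1.0 / (hops + 1)
    return nodes, adj


nodes, adj = build_graph(["cat", "pizza"], ["dog"], max_hops=2)
```

Here "cat" and "dog" are two hops apart in the toy graph, so they get an edge, while "pizza" is unreachable from "dog" and stays disconnected.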
3. Topic-level Graph Reasoning
3.1. Connect the node representations with a graph attention network
3.2. Update the representations
3.3. Output the topic-level graph representation
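The three steps can be illustrated with a heavily simplified sketch. It is not a real graph attention network: the attention score here is a dot product scaled by the edge weight (a GAT uses a learned scoring function), the update simply replaces each node with its attention-weighted neighbour mix, and mean pooling is an assumed readout. All function names are hypothetical.

```python
import math


def attention_update(features, adj):
    """3.1 + 3.2: replace each node by a softmax-weighted mix of its neighbours."""
    updated = []
    for i, h_i in enumerate(features):
        neigh = [j for j, w in enumerate(adj[i]) if w > 0]
        if not neigh:
            updated.append(list(h_i))  # isolated node keeps its representation
            continue
        # Assumed scorer: dot product scaled by the edge weight.
        scores = [adj[i][j] * sum(a * b for a, b in zip(h_i, features[j])) for j in neigh]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alphas = [e / total for e in exps]
        updated.append(
            [sum(a * features[j][d] for a, j in zip(alphas, neigh)) for d in range(len(h_i))]
        )
    return updated


def mean_pool(features):
    """3.3: average the node vectors into one graph-level vector."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
adj = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.0]]
updated = attention_update(features, adj)
v_g = mean_pool(updated)
```

Nodes 0 and 1 each have a single neighbour, so they swap representations, and the isolated node 2 is left unchanged.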
4. Coherence Scoring
s = FC3(FC2(FC1([vc;vg])))
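A minimal sketch of the scoring formula, assuming ReLU between the layers and no activation on the output (the note above only gives the FC stack, so both choices are assumptions). The weights are hand-picked toy values, not trained parameters.

```python
def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def coherence_score(v_c, v_g, params):
    """s = FC3(FC2(FC1([vc; vg]))) with assumed ReLU activations in between."""
    h = list(v_c) + list(v_g)  # concatenation [vc; vg]
    h = relu(linear(h, *params["fc1"]))
    h = relu(linear(h, *params["fc2"]))
    return linear(h, *params["fc3"])[0]


# Toy parameters as (weights, bias) pairs; hypothetical values for illustration.
params = {
    "fc1": ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]], [0.0, 0.0]),
    "fc2": ([[1.0, 1.0]], [-1.0]),
    "fc3": ([[0.5]], [0.0]),
}
score = coherence_score([1.0, 2.0], [3.0, 4.0], params)
```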
5. Training
5.1. Training Objective
Trained to minimize a margin ranking loss between the context-response pair and the context-false response pair
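The objective can be sketched as a standard margin ranking loss over score pairs: the true response should outscore the false one by at least the margin. The function name and the margin value used in the example are illustrative assumptions.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin):
    """Mean of max(0, margin - s_pos + s_neg) over all pairs."""
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


# First pair is ranked correctly with room to spare (loss 0),
# second pair is ranked the wrong way round (loss 1.5).
loss = margin_ranking_loss([1.0, 0.0], [0.0, 1.0], margin=0.5)
```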
5.2. Negative Sampling
Instead of random sampling, a false response similar to the ground-truth response is selected
Two sampling strategies are used
5.2.1. lexical sampling: use Lucene to retrieve utterances related to the ground-truth response and select the middle one
5.2.2. embedding-based sampling: sample 1,000 utterances, then randomly pick one of the top 5 by cosine similarity to the ground-truth response
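Both strategies can be sketched in a few lines. This is an illustration under assumptions: the Lucene retrieval step is replaced by an already ranked candidate list, the utterance vectors are toy 2-d embeddings, and the candidate pool is assumed not to contain the ground-truth response itself. Function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity, 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def lexical_negative(ranked_candidates):
    """5.2.1: pick the middle item of a relevance-ranked retrieval list."""
    return ranked_candidates[len(ranked_candidates) // 2]


def embedding_negative(gt_vec, pool, rng, sample_size=1000, top_k=5):
    """5.2.2: sample utterances, rank by cosine similarity, pick one of the top-k."""
    sampled = rng.sample(pool, min(sample_size, len(pool)))
    ranked = sorted(sampled, key=lambda item: cosine(gt_vec, item[1]), reverse=True)
    return rng.choice(ranked[:top_k])[0]


# Toy pool of (utterance, embedding) pairs; hypothetical data.
pool = [("u1", [1.0, 0.0]), ("u2", [0.9, 0.1]), ("u3", [0.0, 1.0]), ("u4", [-1.0, 0.0])]
lexical = lexical_negative(["a", "b", "c", "d", "e"])
embedded = embedding_negative([1.0, 0.0], pool, random.Random(0), top_k=2)
```

Taking the middle of the ranked list (rather than the top hit) keeps the negative related to the ground truth without being a near-duplicate of it.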
6. Limitation
[Code]
GitHub - li3cmz/GRADE: GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
[Dataset format: DailyDialog]
{
"act": [2, 1, 1, 1, 1, 2, 3, 2, 3, 4],
"dialog": "[\"Good afternoon . This is Michelle Li speaking , calling on behalf of IBA . Is Mr Meng available at all ? \", \" This is Mr Meng ...",
"emotion": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
}
- Act: a list of classification labels, with possible values including __dummy__ (0), inform (1), question (2), directive (3), commissive (4)
- Dialog: a list of string features.
- Emotion: a list of classification labels, with possible values including no emotion (0), anger (1), disgust (2), fear (3), happiness (4), sadness (5), surprise (6)
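One detail of this format is easy to miss: in the record above, "dialog" is a string holding a JSON-encoded list, so it has to be decoded before it can be aligned with "act" and "emotion". A minimal sketch, using a shortened hypothetical record and a hypothetical helper name `parse_record`:

```python
import json

ACT_LABELS = ["__dummy__", "inform", "question", "directive", "commissive"]


def parse_record(record):
    """Decode the dialog string and align each utterance with its labels."""
    utterances = json.loads(record["dialog"])  # JSON list stored as a string
    return [
        {"text": text.strip(), "act": ACT_LABELS[act], "emotion_id": emotion}
        for text, act, emotion in zip(utterances, record["act"], record["emotion"])
    ]


# Shortened, hypothetical record in the same shape as the example above.
record = {
    "act": [2, 1],
    "dialog": "[\"Is Mr Meng available at all ? \", \" This is Mr Meng . \"]",
    "emotion": [0, 0],
}
turns = parse_record(record)
```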