[Review] GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems, 2020, EMNLP

[Abstract]

Proposes a metric that uses a topic-level graph to score coherence at the dialogue level rather than the turn level.

-. K-hop neighboring

-. Uses hop-based weights

[Architecture]

1. Encode the context-response pair with BERT.
2. Build a topic-level dialogue graph for the pair using ConceptNet, then run graph reasoning over it.
3. Feed the outputs of steps 1 and 2 into an MLP to compute the final score.

[Metric]

1. Utterance-level Contextualized Encoding

vc = BERT(c, r): the context c and the response r are encoded together by BERT.
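A minimal sketch of how (c, r) could be packed for BERT, assuming c and r are fed as a standard BERT sentence pair: [CLS] context [SEP] response [SEP], with segment ids 0 for the context part and 1 for the response part. Whitespace splitting stands in for BERT's WordPiece tokenizer, and `build_pair_input` is a hypothetical helper, not code from the GRADE repository; the BERT forward pass and the pooling that yields vc are not shown.

```python
from typing import List, Tuple


def build_pair_input(context: List[str], response: str) -> Tuple[List[str], List[int]]:
    """Pack context turns and a response into BERT's sentence-pair layout.

    Whitespace tokenization is a stand-in for WordPiece (assumption).
    """
    # Joining the context turns with spaces is a hypothetical choice for this sketch.
    ctx_tokens = " ".join(context).split()
    resp_tokens = response.split()
    tokens = ["[CLS]"] + ctx_tokens + ["[SEP]"] + resp_tokens + ["[SEP]"]
    # Segment 0: [CLS] + context + first [SEP]; segment 1: response + final [SEP].
    segments = [0] * (len(ctx_tokens) + 2) + [1] * (len(resp_tokens) + 1)
    return tokens, segments


tokens, segments = build_pair_input(["how are you ?"], "fine , thanks .")
print(tokens)
print(segments)
```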

2. Dialogue Graph Construction (Topic-level representation)

G = (V, E)

V = set of topic nodes, E = set of edges between topics

* Dialogue graph

Keywords are extracted from c and r with a rule-based keyword extractor that combines TF-IDF and part-of-speech features.

The keywords in c become the context-topic nodes of G, denoted Vc = {t1, t2, ..., tp}, while the keywords in r become the response-topic nodes of G, denoted Vr = {tp+1, tp+2, ..., tp+q}.
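The keyword step can be sketched as follows. TF-IDF ranking is standard, but the part-of-speech filter is replaced here by a hypothetical stopword list (no tagger is used), and `tfidf_keywords`, `top_n`, and the smoothed IDF formula are illustrative assumptions rather than GRADE's actual extractor. The keywords returned for c and r play the role of Vc and Vr.

```python
import math
from collections import Counter
from typing import List

# Hypothetical stand-in for the POS filter: instead of keeping e.g. nouns and verbs
# via a POS tagger, this sketch simply drops a few function words.
STOPWORDS = {"a", "an", "and", "are", "do", "i", "in", "is", "my", "of", "the", "to", "what", "you", "your"}


def tfidf_keywords(doc: str, corpus: List[str], top_n: int = 3) -> List[str]:
    """Return up to top_n tokens of `doc`, ranked by TF-IDF against `corpus`."""
    tokens = [t for t in doc.lower().split() if t.isalpha() and t not in STOPWORDS]
    tf = Counter(tokens)
    corpus_tokens = [set(d.lower().split()) for d in corpus]
    scores = {}
    for term, count in tf.items():
        df = sum(1 for toks in corpus_tokens if term in toks)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1  # smoothed IDF
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


corpus = [
    "do you like music",
    "i play the guitar in a band",
    "what is your favorite food",
    "i love pizza and pasta",
]
context, response = corpus[0], corpus[1]
v_c = tfidf_keywords(context, corpus)   # context-topic nodes t1..tp
v_r = tfidf_keywords(response, corpus)  # response-topic nodes tp+1..tp+q
print(v_c, v_r)
```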

The node set V is the union of Vc and Vr.

Edges are built from k-hop neighbor relations in ConceptNet, which yields the graph's adjacency matrix.
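A sketch of building V and its weighted edges, under two loudly stated assumptions: a tiny hand-made dictionary (`TOY_KG`) replaces ConceptNet, and two topic nodes are connected when they lie within k hops of each other, with weight 1/hops as one possible reading of the hop weights mentioned in the abstract. GRADE's exact edge rule and weighting may differ; `hop_distance` and `build_graph` are hypothetical helpers.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]

# Hypothetical toy stand-in for ConceptNet: undirected adjacency lists.
TOY_KG: Graph = {
    "music": ["guitar", "song"],
    "guitar": ["music", "band"],
    "band": ["guitar", "concert"],
    "song": ["music"],
    "concert": ["band"],
    "food": ["pizza"],
    "pizza": ["food"],
}


def hop_distance(kg: Graph, src: str, dst: str, k: int) -> Optional[int]:
    """Shortest hop count from src to dst by BFS, or None if it exceeds k."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for nxt in kg.get(node, []):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def build_graph(v_c: List[str], v_r: List[str], kg: Graph, k: int = 2) -> Tuple[List[str], List[List[float]]]:
    """Return V (union of Vc and Vr, order-preserving) and a weighted adjacency matrix."""
    nodes = list(dict.fromkeys(v_c + v_r))
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                hops = hop_distance(kg, nodes[i], nodes[j], k)
                if hops is not None:
                    adj[i][j] = 1.0 / hops  # assumed weighting: fewer hops, larger weight
    return nodes, adj


nodes, adj = build_graph(["music"], ["guitar", "band", "pizza"], TOY_KG, k=2)
print(nodes)
print(adj)
```

"pizza" is more than two hops from every other topic here, so its row stays all zeros.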

3. Topic-level Graph Reasoning

3.1. Propagate information between connected node representations with a graph attention network (GAT)

3.2. Update the node representations

3.3. Output the topic-level graph representation (vg)
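Steps 3.1 to 3.3 can be illustrated with one single-head graph attention layer in plain Python. It follows the standard GAT formulation (shared projection W, LeakyReLU attention logits over [Wh_i; Wh_j], softmax over the neighbors plus a self-loop, ELU output). Using the adjacency matrix only as a mask, a single layer, and mean pooling into vg are assumptions for illustration, not GRADE's exact configuration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.exp(x) - 1.0


def gat_layer(h: List[Vector], adj: Matrix, w: Matrix, a: Vector) -> List[Vector]:
    """One single-head graph attention layer.

    h: node features (n x d_in); adj: n x n matrix whose nonzero entries mark edges;
    w: shared projection (d_out x d_in); a: attention vector of length 2 * d_out.
    """
    z = [matvec(w, hi) for hi in h]  # projected features W h_i
    d = len(z[0])
    out = []
    for i in range(len(h)):
        # Neighborhood of i: nodes with a nonzero edge, plus i itself (self-loop).
        nbrs = [j for j in range(len(h)) if j == i or adj[i][j] != 0]
        logits = [
            leaky_relu(sum(a[k] * z[i][k] + a[d + k] * z[j][k] for k in range(d)))
            for j in nbrs
        ]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(e - m) for e in logits]
        total = sum(exps)
        alpha = [e / total for e in exps]
        agg = [sum(al * z[j][k] for al, j in zip(alpha, nbrs)) for k in range(d)]
        out.append([elu(v) for v in agg])
    return out


def mean_pool(nodes: List[Vector]) -> Vector:
    """Average node vectors into one graph vector (assumed readout for vg)."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # node 2 has no edges
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gat_layer(h, adj, identity, a=[0.0, 0.0, 0.0, 0.0])
v_g = mean_pool(out)
print(out, v_g)
```

With a zero attention vector every neighbor gets equal weight, which keeps the toy output easy to check by hand.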

4. Coherence Scoring

s = FC3(FC2(FC1([vc;vg])))
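The scoring formula reads as a three-layer MLP over the concatenation [vc; vg]. In this sketch the ReLU activations between layers, the final sigmoid (so scores fall in (0, 1)), and the toy weights are all assumptions; the post does not specify them.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight matrix, bias)


def linear(w: List[Vector], b: Vector, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def coherence_score(v_c: Vector, v_g: Vector, params: Dict[str, Layer]) -> float:
    x = v_c + v_g  # [vc; vg]: list concatenation
    h1 = relu(linear(*params["fc1"], x))   # FC1 followed by an assumed ReLU
    h2 = relu(linear(*params["fc2"], h1))  # FC2 followed by an assumed ReLU
    logit = linear(*params["fc3"], h2)[0]  # FC3 maps to a single value
    return 1.0 / (1.0 + math.exp(-logit))  # assumed sigmoid, score in (0, 1)


params = {
    "fc1": ([[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]], [0.0, 0.0]),
    "fc2": ([[1.0, 1.0]], [0.0]),
    "fc3": ([[1.0]], [-2.0]),
}
score = coherence_score([1.0, 0.0], [0.0, 1.0], params)
print(score)
```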

5. Training

5.1. Training Objective

The model is trained to minimize a margin ranking loss between (context, ground-truth response) pairs and (context, false response) pairs.
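The objective can be sketched as a standard margin ranking loss: for each context, the ground-truth response's score should exceed the false response's score by at least `margin`. The margin value and the mean reduction are assumptions for illustration.

```python
from typing import Sequence


def margin_ranking_loss(pos: Sequence[float], neg: Sequence[float], margin: float = 0.25) -> float:
    """Mean over the batch of max(0, margin - s(c, r_true) + s(c, r_false))."""
    if len(pos) != len(neg) or not pos:
        raise ValueError("need equally sized, non-empty score lists")
    losses = [max(0.0, margin - p + n) for p, n in zip(pos, neg)]
    return sum(losses) / len(losses)


# Pair 1 already beats its negative by more than the margin (zero loss);
# pair 2 ties with its negative, so it pays the full margin.
loss = margin_ranking_loss([0.9, 0.5], [0.2, 0.5], margin=0.25)
print(loss)
```

This matches PyTorch's `torch.nn.MarginRankingLoss` with target y = 1, whose per-pair loss is max(0, -y·(x1 - x2) + margin).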

5.2. Negative Sampling

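Instead of random sampling, false responses that are similar to the ground-truth response are selected.

Two sampling strategies are used:

5.2.1. Lexical sampling: use Lucene to retrieve utterances related to the ground-truth response, then select the one in the middle of the retrieved list.

5.2.2. Embedding-based sampling: select 1,000 candidate utterances, then pick one at random from the top 5 by cosine similarity to the ground-truth response.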
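Both negative-sampling strategies fit in a few lines. Lucene is replaced here by a hypothetical Jaccard token-overlap ranker, `top_k` is an assumed retrieval depth, and the embeddings are assumed to be precomputed vectors (the post does not say which encoder produces them). Only the "take the middle hit" and "sample 1,000, keep the top 5, choose one at random" logic follows the description above.

```python
import math
import random
from typing import List, Sequence


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a hypothetical stand-in for Lucene's relevance score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def lexical_negative(ground_truth: str, pool: List[str], top_k: int = 10) -> str:
    """Retrieve the top_k most similar utterances and return the middle one."""
    candidates = [u for u in pool if u != ground_truth]
    retrieved = sorted(candidates, key=lambda u: jaccard(ground_truth, u), reverse=True)[:top_k]
    return retrieved[len(retrieved) // 2]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def embedding_negative(gt_vec: Sequence[float], pool_vecs: List[Sequence[float]],
                       rng: random.Random, n_candidates: int = 1000, top_n: int = 5) -> int:
    """Sample n_candidates utterances, keep the top_n by cosine similarity to the
    ground truth, and return the index of one chosen at random.
    The pool is assumed to exclude the ground-truth response itself."""
    idx = rng.sample(range(len(pool_vecs)), min(n_candidates, len(pool_vecs)))
    idx.sort(key=lambda i: cosine(gt_vec, pool_vecs[i]), reverse=True)
    return rng.choice(idx[:top_n])


pool = ["i like music a lot", "music is great", "i like pizza", "the weather is nice", "i like music"]
neg_text = lexical_negative("i like music", pool)

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.8, 0.2], [0.7, 0.3], [0.5, 0.5]]
neg_idx = embedding_negative([1.0, 0.0], vecs, random.Random(0))
print(neg_text, neg_idx)
```

Taking the middle hit rather than the top one is meant to avoid negatives that are near-paraphrases of the ground truth while still being harder than random picks.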

6. Limitation

[Code]

GitHub - li3cmz/GRADE: GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems


[Dataset format: DailyDialog]

{

"act": [2, 1, 1, 1, 1, 2, 3, 2, 3, 4],

"dialog": "[\"Good afternoon . This is Michelle Li speaking , calling on behalf of IBA . Is Mr Meng available at all ? \", \" This is Mr Meng ...",

"emotion": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

}

Act: a list of classification labels, with possible values including __dummy__ (0), inform (1), question (2), directive (3), commissive (4)
Dialog: a list of string features.
Emotion: a list of classification labels, with possible values including no emotion (0), anger (1), disgust (2), fear (3), happiness (4), sadness (5), surprise (6)
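To use DailyDialog with a context-response metric, each dialogue has to be split into (context, response) pairs. This is a minimal sketch assuming the `dialog` field is either a list or a JSON-encoded list string as in the sample above; the helper name `to_pairs`, the sliding-window context length `max_context`, and the toy record are hypothetical.

```python
import json
from typing import List, Tuple


def to_pairs(record: dict, max_context: int = 3) -> List[Tuple[List[str], str]]:
    """Split one DailyDialog-style record into (context, response) pairs."""
    dialog = record["dialog"]
    if isinstance(dialog, str):  # the utterance list may be stored as a JSON string
        dialog = json.loads(dialog)
    turns = [u.strip() for u in dialog]
    pairs = []
    for i in range(1, len(turns)):
        # Hypothetical sliding window: up to max_context previous turns as context.
        context = turns[max(0, i - max_context):i]
        pairs.append((context, turns[i]))
    return pairs


record = {  # toy record, not taken from the dataset
    "act": [2, 1, 2],
    "dialog": json.dumps(["Hi , how are you ? ", " Fine , thanks . ", " Want to grab lunch ? "]),
    "emotion": [0, 4, 0],
}
pairs = to_pairs(record)
print(pairs)
```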

Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
