Papers
arxiv:1911.12237

SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

Published on Nov 27, 2019
Authors:
,
,
,

Abstract

The SAMSum Corpus challenges automated dialogue summarization, achieving higher ROUGE scores than news summaries, despite lower human evaluations.

AI-generated summary

This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than the model-generated summaries of news -- in contrast with human evaluators' judgement. This suggests that a challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality chat-dialogues corpus, manually annotated with abstractive summarizations, which can be used by the research community for further studies.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/1911.12237 in a model README.md to link it from this page.

Datasets citing this paper 6

Browse 6 datasets citing this paper

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1911.12237 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.
OSZAR »