DialogSum Challenge

Summarizing Real-life Scenario Dialogues, INLG 2022 Shared Task
The hidden test set is released!

Submission and Important Dates

To participate in our shared task, please submit your model outputs via email.

Participants are encouraged to submit a 4-5 page report describing their model along with the model outputs, which will be lightly peer-reviewed.

The submission deadline for both outputs and reports is 23:59 1st June (UTC-12), after which we will begin human evaluation and peer review.

For all reports, please use the template that INLG uses for regular papers.

Repeated submissions before the deadline are always welcome. However, we will only report the results of your latest submission on the leaderboard.

If you would like to submit multiple models, please specify this in your email.

Please email Yulong Chen if you have any questions.

DialogSum Challenge Shared Task

DialogSum is a shared task on summarizing real-life scenario dialogues, accepted by INLG 2021, which encourages researchers to address challenges in dialogue summarization from multiple aspects. (Also see our DialogSum dataset paper for an analysis of each challenge.) DialogSum Challenge Proposal (INLG 2021)

DialogSum Dataset

DialogSum is a large-scale, manually annotated dataset for dialogue summarization. Unlike previous dialogue summarization datasets, DialogSum focuses on dialogues in rich real-life scenarios, including more diverse task-oriented dialogues. DialogSum Dataset Paper (Findings of ACL 2021) Download DialogSum Data

Automatic Evaluation

Once you are satisfied with your model's performance on the public DialogSum dataset, you are encouraged to send its outputs to yulongchen1010@gmail.com, together with your public test performance (for fairness, please use py-rouge for evaluation) and a description of your methods, to get the official scores on the hidden test set.


We thank reviewers of ACL 2021 and INLG 2021 for their comments on this project, and Xuefeng Bai, Ming Shen and Bonnie Webber for their insightful discussion and input. Also, we thank Pranav Rajpurkar for giving us the permission to build this website based on SQuAD.


If you have any questions, please feel free to raise them on our GitHub issues page or contact Yulong Chen, Yang Liu or Naihao Deng.


Call for Participants!

                        Public Test Set                    Hidden Test Set
Rank  Model            R1     R2     RL     BERTScore     R1     R2     RL     BERTScore
1     GoodBai          47.61  21.66  45.48  92.72         49.66  26.03  48.55  91.69
2     UoT              47.29  21.65  45.92  92.26         49.75  25.15  46.50  91.76
3     IITP-CUNI        47.26  21.18  45.17  92.70         45.89  21.88  43.16  91.13
4     TCS_WITM_2022*   47.02  21.20  44.90  90.13         50.32  25.59  47.40  91.81

*: we evaluate the predictions topic-wise.