Corpus of Word Importance Annotations
About the project
The Switchboard Corpus contains approximately 260 hours of recorded speech: about 2,400 two-sided telephone conversations among 543 speakers (302 male, 241 female) from across the United States. In January 2003, the Institute for Signal and Information Processing (ISIP) released written transcripts for the entire corpus, which consists of nearly 400,000 conversational turns. The ISIP transcripts include a complete lexicon list and automatic word alignment timing corresponding to the original audio files. In our project, a pair of annotators has assigned word-importance scores to these transcripts. As of September 2017, they have annotated over 25,000 tokens, with an overlap of approximately 3,100 tokens. We announce the release of these annotations as a set of supplementary files, aligned to the ISIP transcripts. Our annotation work continues, and we aim to annotate the entire Switchboard corpus with a larger group of annotators.
Corpus Available for Download
Release History
- September 2017
Below are the files distributed in this release:
- 2005
- sw2005A-ms98-a-trans.text
- sw2005A-ms98-a-word.text
- sw2005B-ms98-a-trans.text
- sw2005B-ms98-a-word.text
- 2191
- sw2191A-ms98-a-trans.text
- sw2191A-ms98-a-word.text
- sw2191B-ms98-a-trans.text
- sw2191B-ms98-a-word.text
- 2222
- sw2222A-ms98-a-trans.text
- sw2222A-ms98-a-word.text
- sw2222B-ms98-a-trans.text
- sw2222B-ms98-a-word.text
- 2348
- sw2348A-ms98-a-trans.text
- sw2348A-ms98-a-word.text
- sw2348B-ms98-a-trans.text
- sw2348B-ms98-a-word.text
- 2450
- sw2450A-ms98-a-trans.text
- sw2450A-ms98-a-word.text
- sw2450B-ms98-a-trans.text
- sw2450B-ms98-a-word.text
- 2565
- sw2565A-ms98-a-trans.text
- sw2565A-ms98-a-word.text
- sw2565B-ms98-a-trans.text
- sw2565B-ms98-a-word.text
- 2636
- sw2636A-ms98-a-trans.text
- sw2636A-ms98-a-word.text
- sw2636B-ms98-a-trans.text
- sw2636B-ms98-a-word.text
- 2710
- sw2710A-ms98-a-trans.text
- sw2710A-ms98-a-word.text
- sw2710B-ms98-a-trans.text
- sw2710B-ms98-a-word.text
- 2886
- sw2886A-ms98-a-trans.text
- sw2886A-ms98-a-word.text
- sw2886B-ms98-a-trans.text
- sw2886B-ms98-a-word.text
- 3044
- sw3044A-ms98-a-trans.text
- sw3044A-ms98-a-word.text
- sw3044B-ms98-a-trans.text
- sw3044B-ms98-a-word.text
- 3083
- sw3083A-ms98-a-trans.text
- sw3083A-ms98-a-word.text
- sw3083B-ms98-a-trans.text
- sw3083B-ms98-a-word.text
- 3203
- sw3203A-ms98-a-trans.text
- sw3203A-ms98-a-word.text
- sw3203B-ms98-a-trans.text
- sw3203B-ms98-a-word.text
- 3301
- sw3301A-ms98-a-trans.text
- sw3301A-ms98-a-word.text
- sw3301B-ms98-a-trans.text
- sw3301B-ms98-a-word.text
- 3324
- sw3324A-ms98-a-trans.text
- sw3324A-ms98-a-word.text
- sw3324B-ms98-a-trans.text
- sw3324B-ms98-a-word.text
- 3601
- sw3601A-ms98-a-trans.text
- sw3601A-ms98-a-word.text
- sw3601B-ms98-a-trans.text
- sw3601B-ms98-a-word.text
- 3817
- sw3817A-ms98-a-trans.text
- sw3817A-ms98-a-word.text
- sw3817B-ms98-a-trans.text
- sw3817B-ms98-a-word.text
- 4010
- sw4010A-ms98-a-trans.text
- sw4010A-ms98-a-word.text
- sw4010B-ms98-a-trans.text
- sw4010B-ms98-a-word.text
- 4021
- sw4021A-ms98-a-trans.text
- sw4021A-ms98-a-word.text
- sw4021B-ms98-a-trans.text
- sw4021B-ms98-a-word.text
- 4320
- sw4320A-ms98-a-trans.text
- sw4320A-ms98-a-word.text
- sw4320B-ms98-a-trans.text
- sw4320B-ms98-a-word.text
- 4400
- sw4400A-ms98-a-trans.text
- sw4400A-ms98-a-word.text
- sw4400B-ms98-a-trans.text
- sw4400B-ms98-a-word.text
- 4531
- sw4531A-ms98-a-trans.text
- sw4531A-ms98-a-word.text
- sw4531B-ms98-a-trans.text
- sw4531B-ms98-a-word.text
- 4721
- sw4721A-ms98-a-trans.text
- sw4721A-ms98-a-word.text
- sw4721B-ms98-a-trans.text
- sw4721B-ms98-a-word.text
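The file names above follow a regular pattern, which can be handy when loading the release programmatically. The sketch below is only an illustration based on the names in this list: the helper names (`parse_filename`, `expected_files`) are hypothetical, and the reading of `A`/`B` as the two sides of a conversation and of `trans`/`word` as transcript-level and word-level files is an assumption, not something this page states.

```python
import re
from typing import List, NamedTuple

# Pattern inferred from the listed names, e.g. "sw2005A-ms98-a-trans.text".
# Assumption: four-digit conversation ID, side A or B, kind "trans" or "word".
FILENAME_RE = re.compile(r"^sw(\d{4})([AB])-ms98-a-(trans|word)\.text$")


class AnnotationFile(NamedTuple):
    conversation: str  # e.g. "2005"
    side: str          # "A" or "B"
    kind: str          # "trans" or "word"


def parse_filename(name: str) -> AnnotationFile:
    """Split a release file name into its components."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return AnnotationFile(*match.groups())


def expected_files(conversation: str) -> List[str]:
    """List the four file names the release contains for one conversation."""
    return [
        f"sw{conversation}{side}-ms98-a-{kind}.text"
        for side in ("A", "B")
        for kind in ("trans", "word")
    ]
```

For example, `parse_filename("sw2191B-ms98-a-word.text")` yields conversation `"2191"`, side `"B"`, and kind `"word"`, and `expected_files("2005")` reproduces the four names listed under 2005.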
Want to participate?