Switchboard Telephone Speech Corpus

The Switchboard Telephone Speech Corpus is a corpus of spoken English consisting of almost 260 hours of speech. It was created in 1990 by Texas Instruments under a DARPA grant and released in 1992 by NIST. The corpus contains 2,400 telephone conversations among 543 speakers from the United States (302 male, 241 female).[1][2][3] The participants did not know each other, and each conversation was held on a topic from a predetermined list.[4] Switchboard-2 Phase II was collected in 1999 and includes "4,472 five-minute telephone conversations involving 679 participants".[5]

The corpus has been used in the development of speech recognition algorithms.[6]

Text example:[7]

A: All right um well [laughter-uh] let's see i'm twenty
B: How old are you Lisa. Okay that i'm older
A: Yeah how old are you. Older [laughter]
B: Older than you [laughter-are]
A: [laughter-okay]
B: Okay we are supposed to talk about places we like to go so i'm gonna and where are you from where are you calling from?
A: I'm calling from uh Provo Utah but I'm from Plano Texas
B: Oh you are from Plano my sister lives in Plano yes her husband is the new Director of Admissions at uh University of Texas at Dallas
A: Oh really. Oh wow my dad used to work at UTD also
B: Yeah so I [vocalized-noise]. Anyway so where's your favorite place to go?
A: Um. Generally we just go on family vacations to Arizona my grandparents live there that's generally our usual summer vacation
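
The excerpt above mixes speaker labels ("A:", "B:") with bracketed annotations such as [laughter-uh] and [vocalized-noise]. As a minimal sketch, assuming only the layout visible in this excerpt and not any official Switchboard file format, the turns and annotations could be separated as follows. The names parse_turns, TURN_RE and TAG_RE are hypothetical, introduced here for illustration; the sketch does not interpret what each annotation means.

```python
import re

# Assumption: each turn starts with "A:" or "B:", as in the excerpt above.
TURN_RE = re.compile(r"^([AB]):\s*(.*)$")
# Assumption: annotations are any text enclosed in square brackets.
TAG_RE = re.compile(r"\[([^\]]+)\]")


def parse_turns(text):
    """Split a transcript into turns with speaker, annotations, and plain words."""
    turns = []
    for line in text.splitlines():
        match = TURN_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like a speaker turn
        speaker, utterance = match.groups()
        tags = TAG_RE.findall(utterance)
        # Remove the bracketed annotations and normalise whitespace.
        words = " ".join(TAG_RE.sub(" ", utterance).split())
        turns.append({"speaker": speaker, "tags": tags, "words": words})
    return turns


sample = """A: All right um well [laughter-uh] let's see i'm twenty
B: Older than you [laughter-are]
A: [laughter-okay]"""

for turn in parse_turns(sample):
    print(turn)
```

Stripping the annotations is a simplification: a tag such as [laughter-okay] may carry a spoken word, so work that needs the full lexical content would have to handle these tags explicitly.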

References

  1. "Switchboard-1 Release 2 - Linguistic Data Consortium" (in en). https://catalog.ldc.upenn.edu/LDC97S62. 
  2. "Papers with Code - Switchboard-1 Corpus Dataset" (in en). https://paperswithcode.com/dataset/switchboard-1-corpus. 
  3. Godfrey, John J.; Holliman, Edward C.; McDaniel, Jane (23 March 1992). "SWITCHBOARD: Telephone speech corpus for research and development". [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE Computer Society. pp. 517–520. doi:10.1109/ICASSP.1992.225858. ISBN 0-7803-0532-9. https://dl.acm.org/doi/10.5555/1895550.1895693. Retrieved 26 January 2024. 
  4. "NXT Swbd Overview". https://groups.inf.ed.ac.uk/switchboard/overview.html. 
  5. "Switchboard-2 Phase II - Linguistic Data Consortium" (in en). https://catalog.ldc.upenn.edu/LDC99S79. 
  6. "Switchboard Transcription System". https://www1.icsi.berkeley.edu/Speech/stp/description.html. 
  7. Soni, Mayank; Spillane, Brendan; Gilmartin, Emer; Saam, Christian; Cowan, Benjamin R.; Wade, Vincent (2021). "An Empirical Study of Topic Transition in Dialogue". arXiv:2111.14188 [cs.CL].