Parsing the Oral Corpus AVIP/API

DELMONTE, Rodolfo; TONELLI, Sara
2004-01-01

Abstract

In this paper we present work carried out on the 50,000-word Italian Spontaneous Speech Corpus called AVIP, under the national project API, made available for free download from the website of the coordinator, the University of Naples – Federico II. We concentrate on tuning the parser for Italian, which had previously been used to parse a 100,000-word corpus of written Italian within the National Treebank initiative coordinated by ILC in Pisa. We also present the linguistic annotation tools needed to allow the parser to produce syntactic structures automatically. In particular, in order to produce appropriate linguistic annotations, all transcribed materials need to be transliterated from the audio transcription into a more standard orthographic format. In this way, the parser receives as input a suitably transformed orthographic transcription of the dialogues making up the corpus, in which pauses, hesitations and other disfluencies have been turned into the most likely corresponding punctuation marks, interjections or truncations of the word underlying the uttered segment. The most interesting phenomenon we discuss is undoubtedly “overlap”, i.e. a speech event in which two people speak at the same time, uttering actual words or in some cases nonwords, when one of the speakers, usually the one who is not the current turn-taker, interrupts or backchannels the current speaker. This phenomenon takes place at a certain point in time, where it has to be anchored to the speech signal; but in order to be fully parsed and subsequently semantically interpreted, it needs to be related semantically both to a following turn and to the local turn, where it may produce conversational moves to repair what has previously been said by the current speaker.
2004
Proceedings of the conference "Il Parlato Italiano"
Use this identifier to cite or link to this document: https://hdl.handle.net/10278/39555