Investigating Lexical and Syntactic Differences in Written and Spoken English Corpora

Mina Rajaei Moghadam; Mosab Rezaei; Miguel Williams; Gülşat Aygen; Reva Freedman

doi:10.32473/flairs.37.1.135598

Authors

Mina Rajaei Moghadam, Department of English, Northern Illinois University
Mosab Rezaei, Department of Computer Science, Northern Illinois University
Miguel Williams, Department of Computer Science, Northern Illinois University
Gülşat Aygen, Department of English, Northern Illinois University
Reva Freedman, Department of Computer Science, Northern Illinois University

DOI:

https://doi.org/10.32473/flairs.37.1.135598

Abstract

This paper analyzes the differences between written text and transcriptions of spoken text using current Natural Language Processing (NLP) methods. The study investigates a question with a long and rich history: attempts to differentiate spoken and written text in fields such as linguistics, communication, and rhetoric date back to the early 20th century. Given the availability of large quantities of machine-readable data and of machine learning algorithms that can handle them, it is now possible to use a large number of derived features. The research focuses on syntactic and lexical differences between written books and transcriptions of speeches by United States presidents. The analysis covers morphological, lexical, syntactic, and text-level aspects. The features considered include lexical diversity, syllable count, frequency of parts of speech, and features relating to the parse tree, such as the average length of noun phrases, and the use of interrogative sentences, among others. The study aims to improve understanding of the differences between written text and transcribed speech across disciplines including computer science, applied linguistics, communication, and related fields.
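
As a rough illustration of the simpler surface features named in the abstract, the sketch below computes a type-token ratio (one common measure of lexical diversity), a heuristic syllable count, and the share of interrogative sentences. It is not the authors' implementation: the function names are hypothetical, the regex tokenizer and vowel-group syllable heuristic are simplifying assumptions, and part-of-speech and parse-tree features are omitted because they require a tagger and parser.

```python
import re


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Distinct tokens divided by total tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def estimate_syllables(word):
    """Heuristic (assumption): one syllable per vowel group, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def mean_syllables_per_word(text):
    """Average of the heuristic syllable estimate over all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(estimate_syllables(t) for t in tokens) / len(tokens)


def interrogative_ratio(text):
    """Share of sentences ending in '?', splitting naively on . ! ?"""
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    if not sentences:
        return 0.0
    return sum(s.endswith("?") for s in sentences) / len(sentences)
```

Because the type-token ratio falls as texts get longer, comparisons between corpora are usually made on samples of similar length; how the paper handles this is not stated in the abstract.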

Published

2024-05-12

How to Cite

Rajaei Moghadam, M., Rezaei, M., Williams, M., Aygen, G., & Freedman, R. (2024). Investigating Lexical and Syntactic Differences in Written and Spoken English Corpora. The International FLAIRS Conference Proceedings, 37(1). https://doi.org/10.32473/flairs.37.1.135598

Issue

Vol. 37 (2024): Proceedings of FLAIRS-37

Section

Special Track: Applied Natural Language Processing

License

Copyright 2024 Mina Rajaei Moghadam, Mosab Rezaei, Miguel Williams, Gülşat Aygen, Reva Freedman

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
