The Nonverbal Element in Persian Verbal Multiword Expressions: A Corpus Annotation Approach

Authors

  • Vahide Tajalli NLP lab, Shahid Beheshti University, Tehran, Iran
  • Mehrnoush Shamsfard NLP lab, Shahid Beheshti University, Tehran, Iran
  • Yalda Yarandi NLP lab, Shahid Beheshti University, Tehran, Iran
  • Mahtab Sarlak NLP lab, Shahid Beheshti University, Tehran, Iran
  • Arezoo Haghbin NLP lab, Shahid Beheshti University, Tehran, Iran

DOI:

https://doi.org/10.46991/jil/2025.02.04

Keywords:

Compound Verb, Nonverbal Element, Persian, Preverb, Text Corpus, Verbal Multiword Expression

Abstract

This article presents a linguistic framework for identifying and annotating Persian (Farsi) Verbal Multiword Expressions (VMWEs), developed in alignment with the standards and methodologies set by the PARSEME Corpus, an international research network focused on the systematic analysis of multiword expressions across languages. The study aims to bridge the gap between universal annotation guidelines and language-specific grammatical features by tailoring the PARSEME framework to the structural and semantic properties of Persian. By extracting the characteristics of Persian VMWEs, particularly their nonverbal elements (preverbs) and their diverse syntactic and morphological patterns, this work contributes to a more refined understanding of Persian verbal idiomaticity and to the advancement of natural language processing tasks. The article details the development of annotation guidelines that reflect both cross-linguistic categories and Persian-specific grammatical phenomena, as well as the process of annotating a corpus of 5,617 sentences encompassing a wide range of Persian VMWEs, including light verb constructions, verbal idioms, and prefix verbs. The practical applications of these guidelines in natural language processing are also discussed, highlighting their potential to enhance machine understanding of complex verbal constructions, improve syntactic parsing accuracy, and support downstream tasks such as machine translation, information extraction, and semantic role labeling.

Published

2026-04-17

Section

Articles

How to Cite

Tajalli, V., Shamsfard, M., Yarandi, Y., Sarlak, M., & Haghbin, A. (2026). The Nonverbal Element in Persian Verbal Multiword Expressions: A Corpus Annotation Approach. Journal of Iranian Linguistics, 2(2), 87–107. https://doi.org/10.46991/jil/2025.02.04
