Detecting the Presence of Named Entities in Bengali: Corpus and Experiments

Farzana Rashid; Fahmida Hamid

doi:10.32473/flairs.v34i1.128445

Authors

Farzana Rashid, University of North Carolina at Asheville
Fahmida Hamid


Abstract

Named Entity Recognition (NER) belongs to the fields of Information Extraction (IE) and Natural Language Processing (NLP). NER aims to find named entities in textual data and categorize them into recognizable classes. Named entities play vital roles in related tasks such as question answering, relationship extraction, and machine translation. Researchers have done a significant amount of work in this direction (e.g., dataset construction and analysis) for several languages, including English, Spanish, Chinese, Russian, and Arabic. We do not find a comparable amount of work for South Asian languages such as Bengali (Bangla). Hence, as an initial phase, we have constructed a qualitative dataset in Bengali.
In this paper, we identify the presence of Named Entities (NEs) in Bengali text (sentences), classify them into standardized categories, and test whether automatic detection of NEs is possible. We present a new corpus and experimental results. Our dataset, annotated by multiple human annotators, yields promising results (F-measures ranging from 0.72 to 0.84) in different setups: support vector machine (SVM) setups with simple language features and a Long Short-Term Memory (LSTM) setup with various word embeddings.
