Abstract
In this paper, we present a morphology-based approach to automatic tagging for Telugu that requires neither a machine learning algorithm nor training data. We believe that in inflectional and agglutinating languages, the critical information required for tagging comes more from word-internal structure than from context, and we show how a well-designed morphological analyzer can assign correct tags and also resolve many cases of tag ambiguity. We use a fine-grained, hierarchical tag set that carries not only morpho-syntactic information but also those aspects of lexical and semantic information that are necessary or useful for syntactic parsing. We give details of our experiments and the results obtained. We believe our approach can also be applied to other Dravidian languages.
Keywords: Morphology, POS tagging, Tagging, Telugu, Lexicon.
1. Introduction
The ultimate goal of research on natural language processing (NLP) is to understand human, or natural, languages and to facilitate human-machine interaction through them. To achieve this goal, NLP researchers have focused on a number of sub-tasks. Part-of-speech tagging is one such sub-task.
In computational linguistics, part-of-speech tagging, also called grammatical tagging, is a classification task. Tagging is the process of assigning short labels to words in a text to indicate lexical, morphological, syntactic, semantic or other such information associated with these words. When the focus is mainly on syntactic categories and/or sub-categories, this is also known as part-of-speech or POS tagging. It may be noted that the term tagging is broader than the term POS tagging. One of the main reasons for incorporating a tagging level between the lexical and morphological levels on the one side and syntactic parsing on the other is to reduce ambiguities. Left unresolved, tag ambiguities multiply at an exponential rate as the number of ambiguous words in a sentence grows, making syntactic parsing much more difficult.
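The exponential growth of tag ambiguities can be made concrete with a small sketch. The words, tags and function names below are hypothetical illustrations and are not taken from the tag set or system described in this paper; the only point is that a parser facing unresolved tags must, in the worst case, consider the product of the per-word tag counts.

```python
from itertools import product
from math import prod

# Hypothetical candidate tags per word (illustrative only; this is
# NOT the paper's tag set, and the words are English for readability).
candidates = [
    ("saw", ["VERB", "NOUN"]),
    ("her", ["PRON", "DET"]),
    ("duck", ["NOUN", "VERB"]),
]

def count_tag_sequences(candidates):
    """Number of distinct tag sequences: the product of per-word tag counts."""
    return prod(len(tags) for _, tags in candidates)

def enumerate_tag_sequences(candidates):
    """List every possible tag sequence as a tuple of tags."""
    return list(product(*(tags for _, tags in candidates)))

if __name__ == "__main__":
    print(count_tag_sequences(candidates))
    for sequence in enumerate_tag_sequences(candidates):
        print(sequence)
```

With k candidate tags on each of n words, the count is k to the power n, so every ambiguity resolved before parsing divides the number of sequences the parser has to consider.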
Word categories or classes are crucial to the study of sentence structure. In fact, they are more important than the words themselves. Sentences differ both in the words they contain and in the order of those words. Consider, for example, "The Ram saw the running deer", "The running deer saw the Ram", and "The Ram saw the deer running". Sentences having the same set of words can vary in meaning and the difference can only be accounted for by the...