Part-of-Speech (POS) tagging is the process in which every word in a natural language sentence is marked with its corresponding part-of-speech category, such as noun, verb, adjective, or adverb, based on both its definition and context. Besides words, punctuation characters and symbols are also labeled accordingly. It is a very important process because it resolves the ambiguity of words in a sentence by assigning an accurate POS label to each word depending on the context. As Assamese is a morphologically rich and agglutinative language, several words belong to more than one POS category, which makes them ambiguous. Nouns and verbs are also inflected in a sentence in accordance with their grammatical characteristics. Therefore, POS tagging is a challenging task for Assamese.

A POS Tagger tries to assign accurate POS labels to ambiguous words in a sentence according to the context, and it has a vital role in various NLP applications because POS-tagged data is used in many other NLP tasks (Jurafsky & Martin, 2000). In Parsing, the tagged data helps in finding noun and verb groups; in Named Entity Recognition, it helps in determining proper names such as the name of a person, place, or thing; in Information Retrieval, it helps in selecting the proper nouns or other important word classes from a given text; in Speech Recognition, it helps in modeling a language; and in Machine Translation, it helps in generating the probability for word translation of the source language into the target language. It is useful for many other NLP applications as well. Thus, POS tagging is considered an initial step of the language processing task. As a POS Tagger has a great impact on other NLP systems, a tagging result with high accuracy is always encouraging.

There are several methods of POS tagging, and they fall into three main approaches: the Rule Based Approach, the Stochastic Approach, and the Hybrid Approach. Rule Based POS tagging is the most primitive approach, in which hand-written linguistic rules are used for tagging. These rules identify the appropriate tag for an ambiguous word. This method depends on a dictionary or lexicon to generate the possible POS tags for every word in the input text. The Stochastic Approach is based on the probabilities of words that occur for a particular tag. The tag that occurs most frequently in the training data is assigned to an unknown or ambiguous word. The probability of a given sequence of tags is calculated from the frequency of words in the annotated training corpus. The Hybrid Approach is a combination of more than one method and usually contains rule-based and statistical methods. This model uses the essential features of statistical approaches and uses the rules for better efficiency.

The developed POS Tagger for Assamese follows the Stochastic Approach. A bigram Hidden Markov Model (HMM), which is one of the techniques in this approach, is used. It is a probabilistic model that uses an annotated training corpus. The tagging process is done by computing the tag sequence probability and the word likelihood probability from the corpus. This method is called a supervised learning method. Therefore, an HMM requires a large annotated corpus to obtain high accuracy. On the other hand, an unsupervised learning method does not use an annotated corpus, and it calculates the probabilities by using automatic word groupings.

This paper is further divided into five more sections: the second section provides the related work, the next section shows the morphological characteristics of Assamese, the fourth section describes the approach, and the fifth section gives the evaluation of the system.
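To make the bigram HMM idea described above more concrete, the following is a minimal sketch of how tag sequence probabilities and word likelihood probabilities could be estimated from an annotated corpus and then combined to tag a sentence. It is not the tagger developed for Assamese: the toy English corpus, the function names (`train`, `log_prob`, `viterbi`), the add-one smoothing, and the use of Viterbi decoding are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (English, for readability); not the Assamese data.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
START = "<s>"  # pseudo-tag marking the start of a sentence


def train(sentences):
    """Count tag bigrams and (tag, word) pairs from an annotated corpus."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, given, event, n_outcomes):
    """Add-one smoothed log P(event | given); the smoothing is an assumption."""
    row = counts.get(given, {})
    total = sum(row.values())
    return math.log((row.get(event, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence under the bigram HMM."""
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(trans, START, tag, len(tags))
                 + log_prob(emit, tag, words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            e = log_prob(emit, tag, words[i], n_words)  # word likelihood
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(trans, p, tag, len(tags)) + e, p)
                for p in tags
            )
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


trans, emit, tags, vocab = train(TRAIN)
print(viterbi(["the", "dog", "sleeps"], trans, emit, tags, vocab))
# -> ['DET', 'NOUN', 'VERB']
```

Because both probability tables are plain counts over the annotated corpus, the sketch also shows why a supervised HMM depends on the amount of tagged data: a word or tag bigram that never appears in training receives only the small smoothed probability.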