Abstract- Transliteration is the mapping of a word or text written in one writing system into another writing system. For a specific pair of source and target languages, it maps the letters of the source language to the letters of the target language, and it must preserve sound. Transliteration can also be used for encryption. Here the source language is English and the target language is Malayalam. In some cases the letters of the source script do not correspond exactly to letters of the target script, and a transliteration scheme usually defines conventions for handling such cases. The source string is segmented into transliteration units, which are then related to the target-language units. The transliteration problem can thus be viewed as a sequence labeling problem. Here the classification is done using a Support Vector Machine (SVM).
Index Terms-Transliteration, Sequence Labeling Approach, Support Vector Machine
I. INTRODUCTION
Transliteration performs a mapping from one alphabet into another. It can be used to write words from some old scripts with good precision, for example in traditional or low-cost typesetting with a small character set, in editions of old texts whose scripts are no longer in use, and in some library catalogues. The transliteration process is quite close to a phonetic mapping of Indian language characters to the letters of the Roman alphabet; hence it should preserve the phonetic structure of words. Transliteration is useful whenever we want to express the words or concepts of one language in another script.
II. THE SEQUENCE LABELING APPROACH
Transliteration maps the letters of the source script to the letters of the target script. The process of transliteration mainly involves two steps:
* Segmentation of the source string into transliteration units.
* Mapping the source-language transliteration units to the target-language units.
Thus the transliteration problem can be viewed as a sequence labeling problem [1] from one language alphabet to another.
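The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the unit inventory `UNIT_TO_LABEL`, the greedy longest-match rule in `segment`, and the function names are assumptions made for the example, not the segmentation scheme or mapping used in this paper. The toy table covers just the word "mala".

```python
# Hypothetical toy inventory: source transliteration unit -> Malayalam label.
# A real system would use a far larger inventory or a learned model.
UNIT_TO_LABEL = {
    "ma": "മ",
    "la": "ല",
}


def segment(word, units=UNIT_TO_LABEL):
    """Step 1: split `word` into known units by greedy longest match."""
    longest = max(len(u) for u in units)
    result, i = [], 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in units:
                result.append(piece)
                i += size
                break
        else:
            # No candidate of any length matched at position i.
            raise ValueError(f"no transliteration unit matches at {word[i:]!r}")
    return result


def transliterate(word):
    """Step 2: map each source unit to its target-language label."""
    return "".join(UNIT_TO_LABEL[u] for u in segment(word))


print(segment("mala"))
print(transliterate("mala"))
```

A fixed lookup table is used here only to keep the sketch short; in a sequence labeling formulation the label for each unit is predicted by a classifier rather than read from a table.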
Here the source language is English and the target language is Malayalam. An English name X, for example, is segmented into x1, x2, ..., xn, where each xi corresponds to an alphabetic unit in the name. Let the equivalent Malayalam name be Y, segmented as y1, y2, ..., yn, where each yi is treated as a label in the label sequence. Each xi is then aligned with its phonetically equivalent yi.
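To make the labeling view concrete, the following Python sketch turns one aligned pair (x1, ..., xn) and (y1, ..., yn) into per-position training examples, one per xi, with yi as the class label. The context window, the boundary markers `<s>` and `</s>`, and the feature names are assumptions chosen for illustration; the text does not specify the feature set here. Feature dictionaries of this kind could then be vectorized and passed to a classifier such as an SVM.

```python
def window_features(units, i, size=2):
    """Describe position i by its own unit and up to `size` neighbours per side."""
    feats = {"unit": units[i]}
    for offset in range(1, size + 1):
        left, right = i - offset, i + offset
        # Pad with boundary markers when the window runs off either end.
        feats[f"prev{offset}"] = units[left] if left >= 0 else "<s>"
        feats[f"next{offset}"] = units[right] if right < len(units) else "</s>"
    return feats


def make_examples(x_units, y_labels, size=2):
    """Pair each source unit's features with its aligned target label."""
    if len(x_units) != len(y_labels):
        raise ValueError("source units and labels must be aligned one-to-one")
    return [
        (window_features(x_units, i, size), y)
        for i, y in enumerate(y_labels)
    ]


# Toy aligned pair: the units of "mala" and their Malayalam labels.
examples = make_examples(["ma", "la"], ["മ", "ല"], size=1)
for feats, label in examples:
    print(feats, label)
```

Using the neighbouring units as features lets the classifier choose a label for xi that depends on its context, which is what distinguishes sequence labeling from a plain per-unit lookup.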
$x_1 \quad x_2 \quad \ldots$