Received: 25-August-2017; Revised: 24-February-2018; Accepted: 27-February-2018
©2018 ACCENTS
Abstract
Root extraction is one of the main text-processing operations: it conflates the different forms of a word by reducing each of them to its root. This process aims to overcome the problem posed by the morphological richness of the Arabic language. Root extraction provides valuable support to many natural language processing applications, such as information retrieval, machine translation, and text summarization. In this research, a hybrid technique for extracting the roots of Arabic words has been developed. The proposed technique relies on an optimization function, an enhancement step that applies a set of non-morphological rules to improve the n-gram technique. The technique was tested on a dataset of more than 6000 distinct words belonging to 141 different roots. The results show a marked improvement with the hybrid method: the proposed technique correctly extracts about 99% of triliteral strong roots and about 86% of triliteral vowel roots.
Keywords
Arabic root extraction, Natural language processing, Hybrid technique, Similarity.
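The abstract states that the proposed technique enhances an n-gram technique; the paper's own optimization function and non-morphological rules are not reproduced here. As a hedged illustration only, the following minimal Python sketch shows a generic n-gram matcher, under the assumption that similarity is measured with the Dice coefficient over character bigrams. The function names `bigrams`, `dice_similarity`, and `best_root` are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a generic n-gram (bigram) root matcher.
# Assumption: similarity = Dice coefficient over character bigram sets;
# this is NOT the paper's optimization function or rule set.


def bigrams(text: str) -> set[str]:
    """Return the set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between bigram sets of a and b."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def best_root(word: str, roots: list[str]) -> str:
    """Pick the candidate root whose bigrams overlap most with the word."""
    return max(roots, key=lambda root: dice_similarity(word, root))


if __name__ == "__main__":
    candidates = ["كتب", "درس", "لعب"]
    print(best_root("مكتوب", candidates))
    print(best_root("يدرسون", candidates))
```

For example, `best_root("مكتوب", ["كتب", "درس", "لعب"])` selects كتب because the word and the root share the bigram كت. A plain bigram overlap of this kind is weakened by infixes that separate the root letters (the و in مكتوب breaks the bigram تب), which suggests why an additional enhancement step can be useful.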
1. Introduction
The Arabic language is one of the major languages of the world. It is spoken by nearly 400 million people and ranks fifth among the world's languages [1]. It is also the language of the Holy Quran, which is used by more than 1.5 billion Muslims in their prayers. The Arabic alphabet consists of 28 letters written from right to left in a cursive script. Arabic words are derived from their roots by adding prefixes, infixes, and suffixes, or by modifying the internal structure of the word. Many applications in the field of Arabic language computerization convert words into their roots and then operate on the roots instead of the full words. The main examples of these applications are information retrieval systems, document classification systems, text summarization, automatic translation systems, and optical character recognition (OCR) systems [2].
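The derivation described above can be illustrated with the traditional template letters ف-ع-ل, which stand for the first, second, and third root letters in a morphological pattern. The following minimal sketch assumes a triliteral root and a pattern written with these placeholder letters, and fills the pattern with the letters of the root. `apply_pattern` is a hypothetical helper for illustration, not part of the proposed technique.

```python
# Hypothetical illustration of pattern-based derivation from a triliteral root.
# Assumption: patterns use ف, ع, ل as placeholders for root letters 1, 2, 3.

RADICALS = "فعل"  # placeholder radicals of the template root


def apply_pattern(root: str, pattern: str) -> str:
    """Fill a triliteral pattern with the letters of root.

    Each placeholder ف, ع, ل in pattern is replaced by the first, second,
    and third root letter respectively; every other letter (prefix,
    infix, or suffix) is copied unchanged.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    table = str.maketrans(RADICALS, root)
    return pattern.translate(table)


if __name__ == "__main__":
    for pattern in ["فاعل", "مفعول", "يفعلون", "استفعل"]:
        print(pattern, "->", apply_pattern("كتب", pattern))
```

For instance, applying the patterns فاعل and مفعول to the root كتب yields كاتب and مكتوب; root extraction is the inverse task of recovering كتب from such words. The sketch is deliberately naive: a pattern that contains ف, ع, or ل as ordinary letters (for example, the ل of the definite article ال) would be substituted incorrectly.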
According to whether they contain vowels, Arabic roots can be classified into two types [3].
The first type, called the vowel root, contains at least one vowel. The second type, called the strong root, contains no vowel. Arabic roots can also be classified into the following four types according to the number of letters forming the root: trio, which forms...
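The two-way classification above can be expressed as a simple membership check. The following sketch assumes that the vowel letters are alif (ا), waw (و), and ya (ي); this section does not list them, so the set is an assumption, and the names `root_type` and `describe_root` are hypothetical. Because the description of the four length-based types is truncated here, the sketch reports only the number of letters rather than naming those types.

```python
# Hypothetical sketch of the vowel/strong root classification.
# Assumption: the vowel (weak) letters are alif, waw, and ya.
VOWEL_LETTERS = set("اوي")


def root_type(root: str) -> str:
    """Return 'vowel' if root contains at least one vowel letter, else 'strong'."""
    return "vowel" if any(ch in VOWEL_LETTERS for ch in root) else "strong"


def describe_root(root: str) -> tuple[str, int]:
    """Return the vowel/strong type together with the number of root letters."""
    return root_type(root), len(root)


if __name__ == "__main__":
    for root in ["كتب", "قول", "وعد", "دحرج"]:
        print(root, describe_root(root))
```

For example, قول is a vowel root because it contains و, whereas كتب and the four-letter root دحرج are strong roots.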