Serrano Morales, Mónica
2017-02-10
2017
http://hdl.handle.net/10230/28109
Master's thesis in Theoretical and Applied Linguistics (Treball de fi de màster en Lingüística Teòrica i Aplicada)
In the last few years, interest in Modern Standard Arabic has grown, and this is where computational linguistics fits in. This paper analyses the intersection between Arabic and computational linguistics, focusing on text processing and the tools developed for this purpose. The fundamental tasks of Arabic computational processing are sentence segmentation, tokenization, morphosyntactic tagging, lemmatization, diacritization and base phrase chunking. After analysing each of these tasks, the paper presents a selection of tools divided into two groups: computational morphology (BAMA, ALMORGEANA, ELIXIRFM, MAGEAD, MADA+TOKAN and AMIRA) and computational syntax (the Penn Arabic Treebank, the Prague Treebank, and the Columbia Arabic Treebank). Finally, an evaluation of these tools establishes the differences among them, showing their advantages and disadvantages. The conclusion opens a window to future work on information extraction, information retrieval, summarization, question answering and Arabic as a second language.
application/pdf
eng
Attribution-NonCommercial-NoDerivs 3.0 Spain
Computational linguistics (Lingüística computacional)
Arabic language -- Data processing (Àrab -- Processament de dades)
What is modern standard Arabic NLP? Definition and tools (or how to understand Arabic even if you do not know a word)
info:eu-repo/semantics/masterThesis
info:eu-repo/semantics/openAccess