Automatic Opinion Extraction from Textual Comments of Student Surveys

Abstract

Recruiting new students and retaining existing ones are important issues for all higher education institutions. Continuous monitoring of student satisfaction is therefore crucial. Automatic analysis of student opinions can be carried out using aspect-based sentiment analysis (ABSA). ABSA is a sub-discipline of natural language processing that focuses on identifying sentiments (negative, neutral, positive) and aspects (the carriers of sentiment) in a sentence. The goal of this doctoral dissertation is to propose a system for ABSA of textual comments from student surveys in the Serbian language. The proposed system relies on natural language processing techniques, machine learning models, rules, and dictionaries. A corpus was collected and annotated for the development and evaluation of the system; it includes student reviews of teaching staff and study programs at the Faculty of Technical Sciences. The results show that positive sentiment can be identified with an F-measure of 0.91 and negative sentiment with an F-measure of 0.97, while the F-measures for aspects range from 0.49 to 0.89, depending on their frequency in the corpus. To the author's knowledge, this is the first ABSA study conducted at the sentence-segment level for the Serbian language. The methodology and findings presented in this doctoral dissertation provide a much-needed foundation for further work on sentiment analysis for Serbian, which is under-researched in this area and lacks language resources.

Alternate abstract:

The growth in the number of websites and platforms has led to a growing volume of opinions expressed as digital text. People began to connect with one another and to express their opinions and views freely on any topic, regardless of geographical location. Consequently, since 2002 the number of studies in the field of sentiment analysis has increased, with the aim of extracting useful information such as the aspects people talk about and the opinions they hold on them. In recent years, the application of opinion and sentiment analysis has expanded to almost all domains, including consumer products, multimedia, finance, social events, and politics. The information obtained can be used in applications such as product and service recommendation systems or decision support systems.

The topic of this doctoral dissertation is automatic opinion analysis of textual comments in the Serbian language. Automatic opinion extraction, or sentiment analysis, is a sub-discipline of natural language processing that deals with discovering common patterns (characteristics) in people's opinions from unstructured content (raw text). More specifically, automatic opinion extraction from text means extracting the attitude or feeling (the sentiment) that a person has expressed towards a certain object (the aspect). Sentiment is usually classified as positive, negative, or neutral. Automatic opinion extraction can be performed at different levels: the document, the sentence, or the sentence segment.
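The segment-level view described above can be illustrated with a minimal Python sketch. It is not the dissertation's method: the splitting rule, the `Opinion` record, and the aspect labels `teacher` and `exercises` are hypothetical, and the example uses English text for readability. It only shows how a single sentence can carry several (aspect, sentiment) pairs once it is split into segments.

```python
import re
from dataclasses import dataclass


@dataclass
class Opinion:
    """One opinion unit: a sentence segment with its aspect and sentiment."""
    segment: str
    aspect: str      # hypothetical label, e.g. "teacher"
    sentiment: str   # "positive", "negative", or "neutral"


def split_segments(sentence):
    """Split a sentence at commas and at the words 'but'/'and'.

    This is a deliberately naive, assumed rule used only for illustration.
    """
    parts = re.split(r",|\bbut\b|\band\b", sentence)
    # Drop surrounding spaces and periods, and discard empty pieces.
    return [p.strip(" .") for p in parts if p.strip(" .")]


sentence = "The professor explains well, but the exercises are too hard."
segments = split_segments(sentence)

# Labels assigned by hand here; in practice they would come from annotators
# or from a trained model.
opinions = [
    Opinion(segments[0], aspect="teacher", sentiment="positive"),
    Opinion(segments[1], aspect="exercises", sentiment="negative"),
]
```

At the sentence level this example would need a single mixed label, whereas at the segment level each part keeps its own aspect and polarity.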

The main goal of this research is to develop a model for the automatic extraction of students' opinions expressed in textual comments in the Serbian language. The comments considered were written by students in mandatory surveys conducted at the Faculty of Technical Sciences, University of Novi Sad. The survey process at the Faculty of Technical Sciences includes five types of surveys for over 14,000 students annually. Each survey gives students an opportunity to express, through textual comments, their opinions on one or more aspects of their student experience. Manual processing and analysis of a large number of textual comments is time-consuming and prone to human error. Therefore, there is a clear need for a system that will automatically extract opinions from comments.

The proposed methodology combines several types of models: dictionary-based and rule-based models, and machine learning models, including deep learning models. Rule-based and dictionary-based models were shown to improve performance significantly only on the corpus for which they were developed. Integrating rule-based and dictionary-based models with common machine learning models can be beneficial regardless of the corpus. Multilingual deep learning models with transfer learning proved very effective at identifying sentiment. For aspect identification, deep learning models performed similarly to common machine learning models. Improving their performance would require a larger annotated corpus covering all aspect classes, which would provide better training for the deep learning models.
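As a rough illustration of what a dictionary- and rule-based component can look like, the following Python sketch scores a comment with a small polarity lexicon and one negation rule. Everything in it is an assumption made for the example: the lexicon entries, the negator list, and the function names are hypothetical, the text is English rather than Serbian, and the abstract does not describe the dissertation's actual dictionaries or rules.

```python
# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {
    "good": 1, "clear": 1, "helpful": 1,
    "boring": -1, "confusing": -1, "late": -1,
}

# Hypothetical negation words handled by the single rule below.
NEGATORS = {"not", "never", "no"}


def lexicon_score(tokens):
    """Sum word polarities, flipping the next polarity word after a negator."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False  # the negation is consumed by this polarity word
    return score


def classify(text):
    """Map the lexicon score to a sentiment label."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    score = lexicon_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

One way to combine such a component with a machine learning model, in the spirit of the integration mentioned above, would be to pass `lexicon_score` as an extra input feature next to the text features; that design is a suggestion here, not a description of the dissertation's system.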


Details

Title

Аутоматско издвајање мишљења из текстуалних коментара студентских анкета

Author

Nikolić, Nikola

Publication year

2021

Publisher

ProQuest Dissertations & Theses

ISBN

9798544211709

Source type

Dissertation or Thesis

Language of publication

Serbian

ProQuest document ID

2570103502

Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
