Sentiment Analysis on Code Switched and Low Resource Settings
https://doi.org/10.18997/0002001048
| Field | Value |
|---|---|
| Item type | Thesis or Dissertation |
| Publication date | 2024-11-20 |
| Resource type | doctoral thesis |
| Resource type identifier | http://purl.org/coar/resource_type/c_db06 |
| Title (en) | Sentiment Analysis on Code Switched and Low Resource Settings |
| Title (ja) | コードスイッチングと低リソース環境下での感情分析 |
| Language | jpn |
| Author | Niraj Pahari |
| Abstract | Opinion mining, also referred to as sentiment analysis, involves identifying and categorizing the sentiment or opinion expressed in textual data. Understanding the sentiment expressed in text can provide valuable insights that can be leveraged across many domains. With the growing use of social media and e-commerce, people increasingly express their opinions about all kinds of entities online. Because social media follows informal language norms, users tend to employ colloquialisms and code-switching when communicating online. Code-switching is the phenomenon of mixing two or more languages within a single communicative act. Although existing language models are proficient at processing individual languages, their performance degrades on mixed-language input. Moreover, these models require sufficient training data to perform well. This dissertation addresses the challenges of sentiment analysis in low-resource and code-switched settings. First, we use multi-task learning (MTL) to transfer knowledge learned on one task to improve model performance on a related task. We also use generative models to produce synthetic data for training more robust models. We propose a novel MTL framework based on soft parameter sharing using BERT models for three fine-grained sentiment analysis tasks, and we investigate how knowledge is distributed across the layers of the BERT architecture. Furthermore, we integrate data augmentation into the proposed MTL framework to train more robust models for each task. Experimental results demonstrate that data augmentation and MTL are effective at sharing knowledge between related tasks, helping to overcome data scarcity. We then study strategies for mitigating data scarcity in code-switched text, employing MTL, data augmentation, and semi-supervised learning. Additionally, we create a novel dataset for sentiment analysis of Nepali-English mixed text, contributing to the existing body of knowledge on this topic. We also investigate which language Nepali-English multilingual speakers prefer when expressing different opinions. The findings suggest that speakers tend to prefer their native language over their second language when expressing negative sentiment. The official script for Nepali is Devanagari; however, Romanized Nepali (written in Latin script) is increasingly used on social media. This significantly degrades the performance of language models on code-switched text, because these models are trained primarily on text written in each language's native script. To address this mismatch, we propose a cross-script knowledge transfer method that leverages the knowledge captured in each language's native script. Results on two distinct language pairs demonstrate the effectiveness of our model in understanding code-switched text. |
| Table of contents | 1 Introduction; 2 Preliminaries; 3 Handling Data Scarcity Using Multitask Learning and Data Augmentation for Monolingual Text; 4 Handling Data Scarcity Using Data Augmentation and Learning Based Approaches for Code-Switched Text; 5 Sentiment Analysis Dataset Construction and Linguistic Behaviour Study; 6 Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Switched Data; 7 Conclusion |
| Notes | Doctoral dissertation, Kyushu Institute of Technology. Diploma number: 情工博甲第401号. Date of degree conferral: September 25, Reiwa 6 (2024). |
| Degree conferral number | 甲第401号 |
| Degree name | 博士(情報工学) (Doctorate in Information Engineering) |
| Date of degree conferral | 2024-09-25 |
| Degree-granting institution identifier scheme | kakenhi |
| Degree-granting institution identifier | 17104 |
| Degree-granting institution name | 九州工業大学 (Kyushu Institute of Technology) |
| Academic year of degree conferral | 令和6年度 (academic year Reiwa 6, i.e., 2024) |
| Publication type | VoR |
| Publication type URI | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
| Access rights | open access |
| Access rights URI | http://purl.org/coar/access_right/c_abf2 |
| ID registration | 10.18997/0002001048 |
| ID registration type | JaLC |