Sentiment Analysis on Code Switched and Low Resource Settings (コードスイッチングと低リソース環境下での感情分析)

https://doi.org/10.18997/0002001048
22760bc7-0c94-4e61-a35e-de17bd429e23
File: jou_k_401.pdf (3.6 MB)
Item type: Thesis or Dissertation
Publication date: 2024-11-20
Resource type
Resource type identifier: http://purl.org/coar/resource_type/c_db06
Resource type: doctoral thesis
Title
Title: Sentiment Analysis on Code Switched and Low Resource Settings
Language: en
Title
Title: コードスイッチングと低リソース環境下での感情分析
Language: ja
Language
Language: jpn
Author: Niraj Pahari
Abstract
Description type: Abstract
Description: Opinion mining, also referred to as sentiment analysis, involves identifying and categorizing the sentiment or opinion expressed in textual data. Understanding the sentiment of a text can provide valuable insights that can be leveraged across various domains. With the increasing use of social media and e-commerce, people are increasingly expressing their opinions about all kinds of entities online. Because social media follows casual language norms, users tend to employ colloquialisms and code-switching when communicating online. Code-switching is the phenomenon of mixing two or more languages within a single communication act. While existing language models are proficient at processing individual languages, their performance degrades on mixed-language input. Moreover, these language models require sufficient training data to achieve optimal performance. This dissertation aims to address the challenges of sentiment analysis in low-resource and code-switched settings.
First, we use knowledge from one task to improve model performance on another related task through multi-task learning (MTL). Moreover, we use generative models to produce synthetic data for training more robust models. We propose a novel MTL framework, based on soft parameter sharing and BERT models, for three fine-grained sentiment analysis tasks.
We also investigate how knowledge is distributed across the layers of the BERT architecture. Furthermore, we integrate data augmentation with the proposed MTL framework to train more robust models for each task involved. Experimental results demonstrate the effectiveness of data augmentation and MTL for sharing knowledge between related tasks, which helps overcome data scarcity.
Then, we study various strategies for mitigating data scarcity in code-switching, employing MTL, data augmentation, and semi-supervised learning approaches. Additionally, we create a novel dataset for sentiment analysis of Nepali-English mixed text, contributing to the existing body of knowledge on this topic. We also investigate which language Nepali-English multilingual speakers prefer when expressing different opinions. The findings suggest that individuals tend to prefer their native language over their second language when expressing negative sentiments. The official script for the Nepali language is Devanagari. However, there is a growing trend of using the Romanized form (in Latin script) on social media, even for Nepali. This significantly affects the performance of language models on code-switched text, because these models are trained primarily on text in the native script of each language. To address this mismatch, we propose a cross-script knowledge transfer method that leverages the knowledge captured in the respective script of each language. Results on two distinct language pairs demonstrate the efficacy of our model in understanding code-switched text.
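The abstract mentions an MTL framework based on soft parameter sharing but gives no implementation details. As a rough illustration of the general idea only (not the thesis's actual framework), the sketch below keeps a separate parameter set per task and adds a penalty that pulls corresponding parameters of different tasks toward each other. All names here (`soft_sharing_penalty`, `total_loss`, `strength`) and the choice of a squared-distance penalty are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical representation: each task owns its parameters,
# stored as name -> flat list of floats.
Params = Dict[str, List[float]]


def soft_sharing_penalty(task_params: List[Params], strength: float = 0.01) -> float:
    """Squared distance between same-named parameters, summed over all task pairs.

    Assumption: every task uses the same parameter names and shapes.
    """
    penalty = 0.0
    for i in range(len(task_params)):
        for j in range(i + 1, len(task_params)):
            for name, values_i in task_params[i].items():
                values_j = task_params[j][name]
                penalty += sum((a - b) ** 2 for a, b in zip(values_i, values_j))
    return strength * penalty


def total_loss(task_losses: List[float], task_params: List[Params],
               strength: float = 0.01) -> float:
    """Sum of per-task losses plus the soft-sharing regularizer."""
    return sum(task_losses) + soft_sharing_penalty(task_params, strength)


# Three illustrative tasks; the third task's first weight differs by 1.0.
params = [{"w": [1.0, 2.0]}, {"w": [1.0, 2.0]}, {"w": [2.0, 2.0]}]
penalty = soft_sharing_penalty(params, strength=0.5)
combined = total_loss([0.2, 0.3, 0.5], params, strength=0.5)
```

Unlike hard parameter sharing, where tasks share one encoder outright, this form lets each task keep its own weights while the `strength` value controls how strongly they are tied together.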
Table of contents
Description type: TableOfContents
Description: 1 Introduction | 2 Preliminaries | 3 Handling Data Scarcity Using Multitask Learning and Data Augmentation for Monolingual Text | 4 Handling Data Scarcity Using Data Augmentation and Learning Based Approaches for Code-Switched Text | 5 Sentiment Analysis Dataset Construction and Linguistic Behaviour Study | 6 Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Switched Data | 7 Conclusion
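Notes
Description type: Other
Description: Kyushu Institute of Technology doctoral dissertation. Diploma number: 情工博甲第401号. Date degree conferred: September 25, 2024 (Reiwa 6).
Degree number
Degree number: 甲第401号
Degree name
Degree name: Doctor of Information Engineering (博士(情報工学))
Date degree conferred
Date degree conferred: 2024-09-25
Degree-granting institution
Institution identifier scheme: kakenhi
Institution identifier: 17104
Institution name: Kyushu Institute of Technology (九州工業大学)
Language: ja
Academic year degree conferred
Description type: Other
Description: Academic year Reiwa 6 (2024)
Publication type
Publication type: VoR
Publication type resource: http://purl.org/coar/version/c_970fb48d4fbd8a85
Access rights
Access rights: open access
Access rights URI: http://purl.org/coar/access_right/c_abf2
ID registration
ID registration: 10.18997/0002001048
ID registration type: JaLC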
Versions

Ver.1 2024-11-20 02:28:36.703568