限られた計算機資源やデータを有効利用した意見マイニング (Opinion Mining by Utilizing Limited Computational Resources and Data)
https://doi.org/10.18997/0002001049
| Item type | Thesis or Dissertation (学位論文) |
|---|---|
| Publication date | 2024-11-20 |
| Resource type | |
| Resource type identifier | http://purl.org/coar/resource_type/c_db06 |
| Resource type | doctoral thesis |
| Title | |
| Title | Opinion Mining by Utilizing Limited Computational Resources and Data |
| Language | en |
| Title | |
| Title | 限られた計算機資源やデータを有効利用した意見マイニング |
| Language | ja |
| Language | |
| Language | jpn |
| Author | Al-Mahmud |
| Abstract | |
| Description type | Abstract |
| Description | As the Internet’s role in daily communication expands, so does the practice of sharing opinions on platforms like Facebook, Twitter, and Amazon. This proliferation of user-generated content makes manual analysis of online opinion trends impractical, highlighting the need for automated opinion mining techniques. Opinion mining, a subfield of natural language processing (NLP), aims to extract and analyze opinions from textual data and plays a crucial role in market research, product feedback, and public opinion analysis. In NLP research, two primary approaches are used to categorize text: sentence-level and token-level classification. Sentence-level classification categorizes entire sentences into predefined classes and is commonly utilized in sentiment analysis, topic classification, and document classification. Conversely, token-level classification is more granular, assigning labels to individual words or tokens within texts, which is essential for tasks like named entity recognition (NER), part-of-speech (POS) tagging, and fine-grained sentiment analysis. This thesis makes effective use of limited computational resources (such as memory and processing power) and limited data in opinion mining by dealing with four tasks: opinion holder detection, sentiment analysis, aspect-opinion extraction, and intent analysis. In this thesis, opinion holder detection consists of two steps: detecting the presence of opinion holders in the text and identifying them. For the first step, we employ DistilBERT as a feature extractor with logistic regression (LR), namely DistilBERT+LR, to utilize limited computational resources while achieving better performance than BERT+LR. The second step employs a character-level contextual string embedding (CSE) model with a conditional random field (CRF), namely CSE+CRF, which utilizes limited computational resources while exhibiting very competitive performance compared with heavyweight models. In sentiment analysis for limited Bangla data, we apply stepwise learning utilizing transformers-based models. This technique leverages an auxiliary task with larger datasets to improve performance on the main task with smaller datasets. The effectiveness of writers’ opinion expression styles (“nativeness”) between source and target data in stepwise learning is also explored. Aspect-opinion extraction focuses on Bangla, addressing the limitations of conventional sentiment analysis, which assumes a single sentiment per text. Aspect-opinion extraction identifies multiple targets and opinions within a text, providing a more granular understanding. We propose a method that combines feature embeddings from different transformers-based models, followed by fine-tuning, to utilize limited prepared Bangla data effectively for performance improvement. The intent analysis study extends conventional sentiment analysis by introducing additional classes, such as suggestion and sarcasm, for deeper insights, and handles the task at a granular level, unlike token-level aspect-opinion extraction, which does not deal with sarcasm. Since no annotated Bangla data are available for this task, we generate ChatGPT data as auxiliary data. We also prepare limited user-generated Bangla data. We then deploy semi-supervised self-training with transformers-based models to exploit the auxiliary data and enhance performance on the prepared user-generated limited Bangla data. (A minimal illustrative sketch of the DistilBERT+LR setup follows this record.) |
| Table of contents | |
| Description type | TableOfContents |
| Description | 1 Introduction; 2 Basic Model and Technique; 3 Dataset Construction and Opinion Holder Detection Using Pre-Trained Models; 4 Demonstration of Effectiveness of Nativeness in Stepwise Learning by Performing Bangla Sentiment Analysis; 5 Dataset Construction and Evaluation for Aspect-Opinion Extraction in Bangla Fine-Grained Sentiment Analysis; 6 Demonstration of Effectiveness of Exploiting ChatGPT-Generated Data to the Transformers-Based Models by Performing Bangla Intent Analysis; 7 Conclusion |
| Remarks | |
| Description type | Other |
| Description | Doctoral dissertation, Kyushu Institute of Technology. Degree certificate number: 情工博甲第402号. Date of degree conferral: September 25, 2024 (Reiwa 6). |
| Degree grant number | |
| Degree grant number | 甲第402号 |
| Degree name | |
| Degree name | 博士(情報工学) (Doctorate in Information Engineering) |
| Date of degree conferral | |
| Date of degree conferral | 2024-09-25 |
| Degree-granting institution | |
| Degree-granting institution identifier scheme | kakenhi |
| Degree-granting institution identifier | 17104 |
| Language | ja |
| Degree-granting institution name | 九州工業大学 (Kyushu Institute of Technology) |
| Academic year of degree conferral | |
| Description type | Other |
| Description | Academic year 2024 (令和6年度) |
| Version type | |
| Version type | VoR |
| Version type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
| Access rights | |
| Access rights | open access |
| Access rights URI | http://purl.org/coar/access_right/c_abf2 |
| ID registration | |
| ID registration | 10.18997/0002001049 |
| ID registration type | JaLC |
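
The abstract above describes a DistilBERT+LR pipeline for the first step of opinion holder detection: DistilBERT serves as a lightweight, frozen feature extractor, and a logistic-regression classifier decides whether a sentence mentions an opinion holder. The snippet below is only a minimal sketch of that idea, not the thesis code; the checkpoint name, the toy sentences and labels, and the use of the first-token hidden state as the sentence vector are assumptions made purely for illustration.

```python
# Minimal sketch (assumptions, not the thesis implementation) of the DistilBERT+LR idea:
# a frozen DistilBERT encoder produces sentence embeddings, and a small logistic-regression
# classifier predicts whether an opinion holder is present (1) or absent (0).
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # assumed checkpoint
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
encoder.eval()  # frozen encoder: no fine-tuning, which keeps memory and compute low

def embed(sentences):
    """Return one fixed-size vector per sentence (hidden state at the first token)."""
    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state   # shape: (batch, seq_len, 768)
    return hidden[:, 0, :].numpy()                    # first-token embedding as features

# Toy data, purely hypothetical: 1 = an opinion holder is mentioned, 0 = none.
train_texts = ["John says the camera is great.", "The package arrived on Monday."]
train_labels = [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(embed(train_texts), train_labels)

print(clf.predict(embed(["Mary thinks the battery life is poor."])))
```

Only the logistic-regression weights are trained here, which is what keeps the resource footprint far below full BERT fine-tuning; the second step described in the abstract (CSE+CRF) would instead tag tokens to identify the opinion holder's span.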