A Study on the Effect of the Grammatical Complexity of Natural Language Descriptions in the Conversion to Semantic Triples by Sequential Prediction Models and Large Language Models
https://doi.org/10.18997/0002001057
| Item type | 学位論文 = Thesis or Dissertation(1) |
|---|---|
| Date made public | 2024-11-21 |
| Resource type | |
| Resource type identifier | http://purl.org/coar/resource_type/c_db06 |
| Resource type | doctoral thesis |
| Title | |
| Title | The Effect of the Grammatical Complexity of Natural Language Descriptions in the Translation to Semantic Triples by Sequential Models: Its Performance and Internal Mechanisms |
| Language | en |
| Title | |
| Title | 時系列予測モデル・大規模言語モデルによる意味トリプルへの変換における自然言語記述の文法的複雑さの影響に関する研究 |
| Language | ja |
| Language | |
| Language | jpn |
| Author | Manu, Shrivastava |
| Abstract | |
| Description type | Abstract |
| Description | Industry 5.0 has put machines at the forefront of various industries. Machines are involved in every aspect of human life, making it critical to keep them in proper working condition. These machines are highly complex and rapidly evolving, resulting in a scarcity of highly skilled manpower capable of repairing them. One way to bridge this gap is to develop a knowledge-based system that understands machine components, their operation, and the causes of machine failure, and thus helps in the maintenance of machines and reduces dependency on expert technicians. These knowledge bases, or ontologies, define concepts, relationships, and properties within a particular domain and can generate new inferences based on predicate logic. Manual extraction of these concepts and relationships from raw text is very time-consuming; therefore, in recent years, machine learning techniques have been employed for these tasks, where a sentence is taken as input and the concepts and the relations between them are extracted. For these knowledge-based systems to be dependable, the extracted concepts and relationships must be accurate. The task of extracting concepts and the relationships between them for ontology creation is called the Ontology Population Task (OPT). The OPT can be formulated either as a classification task or as a Neural Machine Translation (NMT) task. In the classification formulation, a sentence is given to a neural network model, and the model finds the words that belong either to a concept or to a relation. In the NMT formulation, an input sentence is translated into a Resource Description Framework (RDF) triple. The current work aims to improve the quality of the NMT task. The input sentences to a machine learning model can have different structures, and the impact of these sentence structures on sequential model performance is not well studied. Most Natural Language Processing (NLP) applications are trained on annotated data without any attention to the structure of the sentences used in training; this may skew the training data in terms of sentence structure, so that the distribution of sentences seen in training differs from the distribution of sentences in real scenarios. In this work, to improve the quality of concepts and relations extracted from natural text, we analyze the effect of sentence structure on sequential models based on Bidirectional Long Short-Term Memory (BiLSTM) and Transformer architectures. We provide insight into the learning behavior of sequential models using statistical analysis methods such as the Kolmogorov-Smirnov test (KS test) and the Cramér-von Mises test (CvM test). We also evaluate model behavior on the extraction task under the mean-seeking forward Kullback-Leibler Divergence (KLD) and the mode-seeking backward KLD loss functions. Finally, the thesis contributes mechanisms to improve the quality of concepts and relations extracted from natural text. The performance of the sequential model differs with the loss function used for learning; a Modified Jeffreys Divergence (MJD) proposed in this work, which combines the mean-seeking behavior of forward KLD with the mode-seeking behavior of backward KLD, contributes to the quality improvement. The statistical analysis shows that the sequential model's performance is affected by the structure of the sentences used for training, so the training data should contain a proper distribution of different sentence structures; our proposed Structure Dependent Weighted Loss Function and the mechanism of selecting different model checkpoints based on sentence type also helped improve the performance of the sequential model. |
| Table of contents | |
| Description type | TableOfContents |
| Description | 1 Introduction; 2 Literature Review; 3 Basic Concept; 4 Methodology and Results; 5 Discussion and Conclusion |
| Notes | |
| Description type | Other |
| Description | Kyushu Institute of Technology doctoral dissertation. Diploma number: 生工博甲第498号. Date of degree conferral: September 25, 2024 (Reiwa 6). |
| Degree number | |
| Degree number | 甲第498号 |
| Degree name | |
| Degree name | 博士(工学) (Doctor of Engineering) |
| Date of degree conferral | |
| Date of degree conferral | 2024-09-25 |
| Degree-granting institution | |
| Institution identifier scheme | kakenhi |
| Institution identifier | 17104 |
| Institution name | 九州工業大学 (Kyushu Institute of Technology) |
| Language | ja |
| Academic year of degree conferral | |
| Description type | Other |
| Description | 令和6年度 (AY 2024) |
| Version type | |
| Version type | VoR |
| Version type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
| Access rights | |
| Access rights | open access |
| Access rights URI | http://purl.org/coar/access_right/c_abf2 |
| ID registration | |
| ID registration | 10.18997/0002001057 |
| ID registration type | JaLC |
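The abstract describes combining the mean-seeking forward KLD with the mode-seeking backward KLD into a Modified Jeffreys Divergence (MJD). The record does not give the exact definition, so the following is only an illustrative sketch: it assumes a simple weighted combination of the two directed divergences (the classical Jeffreys divergence corresponds to an equal weighting), with the mixing weight `alpha` as a hypothetical parameter.

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Forward Kullback-Leibler divergence KL(p || q) for discrete distributions.

    eps guards against log(0) / division by zero; p and q are assumed to be
    probability vectors of the same length.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def modified_jeffreys(p, q, alpha=0.5):
    """Hypothetical MJD sketch: a weighted mix of mean-seeking forward KLD
    and mode-seeking backward KLD. alpha=0.5 reduces to (half) the classical
    Jeffreys divergence; the thesis's actual MJD may differ.
    """
    return alpha * kld(p, q) + (1.0 - alpha) * kld(q, p)

# Toy example: a bimodal target p and a unimodal approximation q.
p = np.array([0.45, 0.05, 0.45, 0.05])
q = np.array([0.10, 0.10, 0.70, 0.10])

print(modified_jeffreys(p, q))        # strictly positive for p != q
print(modified_jeffreys(p, p))        # (near) zero for identical distributions
```

At `alpha=0.5` the measure is symmetric in its arguments, unlike either directed KLD alone; tilting `alpha` trades off the mean-seeking and mode-seeking behaviors the abstract contrasts.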