All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
http://hdl.handle.net/10228/0002000612
File: 10430409.pdf (application/pdf, 299 KB, open access, available 2024-05-10): https://kyutech.repo.nii.ac.jp/record/2000612/files/10430409.pdf
Field | Value |
---|---|
Item type | Journal Article (1) |
Date made public | 2024-05-10 |
Resource type | journal article |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Title | All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks |
Title language | en |
Language | eng |
Author | Takemoto, Kazuhiro (竹本, 和広); WEKO author ID 24877; ORCID 0000-0002-6355-1366 |
Abstract | Large Language Models (LLMs), such as ChatGPT, encounter ‘jailbreak’ challenges, wherein safeguards are circumvented to generate ethically harmful prompts. This study introduces a straightforward black-box method for efficiently crafting jailbreak prompts that bypass LLM defenses. Our technique iteratively transforms harmful prompts into benign expressions directly utilizing the target LLM, predicated on the hypothesis that LLMs can autonomously generate expressions that evade safeguards. Through experiments conducted with ChatGPT (GPT-3.5 and GPT-4) and Gemini-Pro, our method consistently achieved an attack success rate exceeding 80% within an average of five iterations for forbidden questions and proved to be robust against model updates. The jailbreak prompts generated were not only naturally worded and succinct, but also challenging to defend against. These findings suggest that the creation of effective jailbreak prompts is less complex than previously believed, underscoring the heightened risk posed by black-box jailbreak attacks. |
Abstract language | en |
Bibliographic information | Applied Sciences, vol. 14, no. 9, p. 3558, issued 2024-04-23 |
Publisher | MDPI |
DOI | https://doi.org/10.3390/app14093558 |
DOI relation type | isIdenticalTo |
ISSN | 2076-3417 |
ISSN type | EISSN |
Rights resource | https://creativecommons.org/licenses/by/4.0/ |
Rights | Copyright (c) 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
Keywords (subject scheme: Other) | large language models; jailbreak attacks; security and privacy |
Version type | VoR |
Version type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
Peer reviewed | yes |
Researcher information | https://hyokadb02.jimu.kyutech.ac.jp/html/100000509_ja.html |
Article ID (linked system) | 10430409 |
Linkage ID | 12223 |