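[1] China Academy of Information and Communications Technology, Chinese Academy of Sciences. Blue Book on Large Model Governance: From Rules to Practice (2023) [R/OL]. (2023-11) [2025-03-17]. http://www.caict.ac.cn/kxyj/qwfb/ztbg/202311/P020231124526622371194.pdf. (in Chinese)
[2] Tencent Zhuque Lab, Tencent Research Institute, Tencent Hunyuan Large Model, et al. Large Model Safety and Ethics Research Report 2024: Leading Large Model Innovation with Responsible AI [R/OL]. (2024-01-29) [2025-03-17]. https://ncstatic-file.clewm.net/rsrc/2024/0129/13/29ef6ae159a0bd8d4e75e5380aae0c47.pdf. (in Chinese)
[3] PAN A, CHAN J S, ZOU A, et al. Do the Rewards Justify the Means? Measuring Trade-Offs between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark [C]//International Conference on Machine Learning. Honolulu, Hawaii: PMLR, 2023. https://dblp.org/db/conf/icml/index.html.
[4] PEREZ E, RINGER S, LUKOSIUTE K, et al. Discovering Language Model Behaviors with Model-Written Evaluations [C]//ROGERS A, BOYD-GRABER J, OKAZAKI N (eds.). Findings of the Association for Computational Linguistics: ACL 2023. Toronto: Association for Computational Linguistics, 2023. https://aclanthology.org/2023.findings-acl/.
[5] URBINA F, LENTZOS F, INVERNIZZI C, et al. Dual Use of Artificial-Intelligence-Powered Drug Discovery [J]. Nature Machine Intelligence, 2022(3).
[6] YI Xiaoyuan, XIE Xing. An Analysis of the Moral Value Alignment Problem in Large Models [J]. Journal of Computer Research and Development, 2023(9). (in Chinese)
[7] GAO Yuping. From Moral Construction to Political Construction: On Rawls's Idea of Institutions [J]. Morality and Civilization, 2010(4). (in Chinese)
[8] DeepSeek. "High-Flyer AI & DeepSeek GTC 2024 Invited Talk": Harmony in Diversity: Decoupling Value Alignment in Large Language Models [EB/OL]. (2024-03-20) [2025-03-17]. https://mp.weixin.qq.com/s/llnNmoQ2p3ZTrMUmS0oH2A. (in Chinese)
[9] JI J M, QIU T Y, CHEN B Y, et al. AI Alignment: A Comprehensive Survey [J]. arXiv preprint, 2023. arXiv:2310.19852.
[10] LI Siwen. An Analysis of Pathways to Artificial Intelligence Value Alignment [J]. Studies in Ethics, 2024(5). (in Chinese)
[11] HE Jing. Paradigm Shift and Cognitive Evolution Driven by DeepSeek [J]. Yuejiang Academic Journal, 2025(2). (in Chinese)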