An Analytical Exploration of Pathways for Value Alignment in Artificial Intelligence

Abstract

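As an effective means of, and a necessary path toward, the beneficial development of artificial intelligence, value alignment aims to make the capabilities and behavior of large models consistent with humans' true intentions, values, and social moral norms. The existing technical and ethical approaches to AI value alignment are feasible but have limitations and shortcomings. The technical approaches lack effectiveness and scalability and are constrained by subjective human preferences. The weak approach to value alignment suffers from an "alignment gap," the difficulty of unifying values, and static values, while the strong approach faces the problem that morality cannot be reduced to a capability, the difficulties of affective computing technology, and the complexity of aligning multiple subjects. Interactive value alignment is an effective pathway for achieving AI value alignment: endowing AI with interactive subjectivity is the precondition for value alignment, contextualized value consensus is the key to interactive value alignment, and alignment is achieved through human-machine cooperation and the simulation of social scenarios.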

Keywords: artificial intelligence, human-machine interaction, interactive value alignment

Li Siwen. An Analytical Exploration of Pathways for Value Alignment in Artificial Intelligence[J]. Studies in Ethics, 2024(5): 99-108.

李思雯. 人工智能价值对齐的路径探析[J]. 伦理学研究, 2024(5): 99-108.
