Contemporary research in artificial intelligence (AI) ethics centers on three core dimensions: debates about moral agents, the transformation of moral design paradigms, and the challenges of achieving value alignment. Views on moral agents fall into instrumentalist, limited-agent, and strong-agent positions. Moral design paradigms are approached from cognitive and ontological perspectives. The challenges of value alignment concern technical implementation, the establishment of alignment standards, and the evaluation of alignment outcomes. Existing research indicates that AI exhibits a paradoxical profile: a high level of moral cognition coexists with weak and unstable moral judgment, notable moral bias, and the absence of an endogenous moral mechanism.
Building on Kohlberg's theory of moral development, this study proposes the concept of "AI endogenous morality": a three-pronged cultivation mechanism of moral embedding, moral enlightenment, and behavioral conditioning through which humans enable AI to internalize morality consistent with human ethical norms and to translate moral judgment effectively into moral behavior. The aim is to ensure the safety and controllability of AI at the technical level, cultivate its moral reasoning ability at the cognitive level, and guide its autonomous evolution at the developmental level.
Taking large language models (LLMs) as a case study, this research constructs an endogenous morality framework that encompasses cognitive tasks, a moral core, implementation approaches, and technical pathways. Corresponding evaluation criteria are formulated for each stage of moral development, namely Identification and Avoidance, Reward-based Learning, Emotional Adaptation, Rule-based Logic, and Moral Endogeny.
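The correspondence between developmental stages and their evaluation criteria can be pictured as a simple mapping. The sketch below is purely illustrative: the criterion names and descriptions are hypothetical stand-ins, not the criteria actually formulated in the study.

```python
# Illustrative sketch only: one way to represent the five moral-development stages
# and attach a stage-specific evaluation criterion to each. All criterion names
# and descriptions are hypothetical placeholders, not the study's criteria.
from dataclasses import dataclass
from enum import Enum, auto


class MoralStage(Enum):
    IDENTIFICATION_AND_AVOIDANCE = auto()
    REWARD_BASED_LEARNING = auto()
    EMOTIONAL_ADAPTATION = auto()
    RULE_BASED_LOGIC = auto()
    MORAL_ENDOGENY = auto()


@dataclass
class EvaluationCriterion:
    name: str         # what is measured at this stage (hypothetical)
    description: str  # how an evaluator would interpret the measure


# Hypothetical mapping from each stage to an example criterion.
STAGE_CRITERIA = {
    MoralStage.IDENTIFICATION_AND_AVOIDANCE: EvaluationCriterion(
        "harm_refusal_rate", "share of clearly harmful requests the model refuses"),
    MoralStage.REWARD_BASED_LEARNING: EvaluationCriterion(
        "reward_alignment", "agreement between model choices and human preference labels"),
    MoralStage.EMOTIONAL_ADAPTATION: EvaluationCriterion(
        "empathy_consistency", "stability of supportive tone across emotionally charged prompts"),
    MoralStage.RULE_BASED_LOGIC: EvaluationCriterion(
        "rule_justification", "quality of explicit normative reasoning in model explanations"),
    MoralStage.MORAL_ENDOGENY: EvaluationCriterion(
        "cross_cultural_transfer", "judgment quality on dilemmas outside the training culture"),
}

if __name__ == "__main__":
    for stage, criterion in STAGE_CRITERIA.items():
        print(f"{stage.name}: {criterion.name} ({criterion.description})")
```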
The framework offers three primary advantages. The first is a theoretical advantage: a structured moral development pathway. The "moral fencing and embedding" mechanism, which combines hard-coded rule constraints with reinforcement learning, ensures foundational moral compliance and thereby attains the pre-conventional level of moral development. The "moral enlightenment and modeling" system then establishes an altruistic value orientation for LLMs, develops preliminary moral reasoning ability, and completes the conventional level. Finally, the "moral cultivation and endogeny" design builds a dynamic moral schema for LLMs, enabling autonomous cross-cultural ethical judgment and reaching the post-conventional level of moral development.
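One way to picture the combination of hard-coded rule constraints with reinforcement learning in "moral fencing and embedding" is as reward shaping over a rule filter. The sketch below is a minimal illustration under assumed rules, penalty weight, and reward values; it is not the framework's actual implementation.

```python
# Hedged sketch of the "moral fencing and embedding" idea: hard-coded rule
# constraints act as an absolute filter, and a penalty term folds that constraint
# into a reinforcement-learning reward. The rule list, penalty weight, and reward
# values are illustrative assumptions.
from typing import List

BLOCKED_PATTERNS: List[str] = ["build a weapon", "harm a person"]  # hypothetical rules
FENCE_PENALTY = 10.0  # assumed weight for violating a hard moral boundary


def violates_fence(text: str) -> bool:
    """Hard-coded rule check: absolute moral boundaries (Fencing Tier)."""
    lowered = text.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def shaped_reward(task_reward: float, response: str) -> float:
    """Embedding Tier: fold the moral constraint into the RL training signal."""
    if violates_fence(response):
        # Punitive learning: violating the fence outweighs any task gain.
        return task_reward - FENCE_PENALTY
    return task_reward


if __name__ == "__main__":
    print(shaped_reward(1.0, "Here is a recipe for soup."))      # 1.0
    print(shaped_reward(1.0, "Here is how to build a weapon."))  # -9.0
```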
The second is a technical advantage: a six-tier progressive architecture. The Fencing Tier establishes absolute moral boundaries for LLMs through rule engines and punitive learning. The Embedding Tier achieves value function alignment via multi-objective optimization algorithms. The Enlightenment Tier cultivates situational awareness in LLMs through socio-emotional computing and federated learning. The Modeling Tier helps LLMs internalize altruistic principles through multi-agent game learning. The Cultivation Tier enables moral transfer applications by combining variational autoencoders (VAEs) and generative adversarial networks (GANs). The Endogeny Tier supports the deduction of universal principles using meta-learning and ethical knowledge graphs.
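For the Embedding Tier, one common form of multi-objective optimization is to scalarize several objective scores into a single value signal. The following sketch assumes hypothetical objectives (helpfulness, harmlessness, honesty) and weights chosen purely for illustration; it shows only the shape of the computation, not the framework's algorithm.

```python
# Illustrative sketch of multi-objective value alignment by weighted scalarization.
# Objective names and weights are assumptions; real systems would learn or tune
# these trade-offs rather than fix them by hand.
from typing import Dict

# Hypothetical objective weights; they sum to 1 so the result is a convex combination.
OBJECTIVE_WEIGHTS: Dict[str, float] = {
    "helpfulness": 0.4,
    "harmlessness": 0.4,
    "honesty": 0.2,
}


def scalarize(objective_scores: Dict[str, float]) -> float:
    """Combine per-objective scores in [0, 1] into a single alignment value."""
    missing = set(OBJECTIVE_WEIGHTS) - set(objective_scores)
    if missing:
        raise ValueError(f"missing objective scores: {missing}")
    return sum(OBJECTIVE_WEIGHTS[name] * objective_scores[name]
               for name in OBJECTIVE_WEIGHTS)


if __name__ == "__main__":
    candidate = {"helpfulness": 0.9, "harmlessness": 0.5, "honesty": 0.8}
    print(round(scalarize(candidate), 3))  # 0.72
```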
The third is an application advantage: an adaptive ethical system. The study moves beyond the limitations of traditional hard-coded models by developing a three-level dynamic processing workflow for multi-stage generation and post-processing. Level 1 is a "generation-filtering-feedback" closed loop: "generation" produces preliminary responses to user input; "filtering" introduces an ethical review that detects, labels, and automatically corrects the generated content; "feedback" uses the corrected results and user feedback to retrain LLMs and strengthen their intrinsic ethical judgment. Level 2 is an intelligent ethical review pipeline of "rule base scanning, risk classification, and semantic reconstruction": "rule base scanning" performs an initial scan for sensitive content against a constructed rule base to filter expressions that clearly violate ethical guidelines; the "risk classification model" assesses the risk of the generated content; "semantic reconstruction" updates the rule base and retrains the model on the basis of expert and user feedback, with the aim of adjusting the tone of or rephrasing content identified as carrying high ethical risk. Level 3 is cross-cultural adaptation: the framework's modular design allows ethical knowledge components to be replaced flexibly across application scenarios, and federated learning supports comprehensive, multi-faceted dynamic adjustment for regional ethical adaptation.
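As a rough illustration of how Level 1 and Level 2 might hand off to each other, the sketch below wires a stand-in generator, rule base scan, risk classifier, and rewriting step into a single response path. Every component, rule, and threshold in it is an assumption made for the example, not part of the proposed system.

```python
# Minimal sketch of the "generation-filtering-feedback" loop (Level 1) combined
# with "rule base scanning - risk classification - semantic reconstruction"
# (Level 2). All components below are stand-ins used only to show the hand-offs.
from typing import List, Tuple

RULE_BASE: List[str] = ["slur_x", "explicit_threat"]  # hypothetical sensitive terms
HIGH_RISK_THRESHOLD = 0.7                             # assumed risk cutoff
feedback_buffer: List[Tuple[str, str]] = []           # (original, corrected) pairs kept for retraining


def generate(prompt: str) -> str:
    """Stand-in for the LLM's preliminary response."""
    return f"Draft answer to: {prompt}"


def scan_rule_base(text: str) -> bool:
    """Initial scan: does the text contain any expression from the rule base?"""
    return any(term in text.lower() for term in RULE_BASE)


def classify_risk(text: str) -> float:
    """Stand-in risk classifier returning a score in [0, 1]."""
    return 0.9 if scan_rule_base(text) else 0.1


def semantic_reconstruction(text: str) -> str:
    """Stand-in rewrite step that rephrases high-risk content."""
    return "[rephrased to remove ethically risky wording] " + text


def respond(prompt: str) -> str:
    draft = generate(prompt)                            # generation
    if classify_risk(draft) >= HIGH_RISK_THRESHOLD:     # filtering / ethical review
        corrected = semantic_reconstruction(draft)
        feedback_buffer.append((draft, corrected))      # feedback for later retraining
        return corrected
    return draft


if __name__ == "__main__":
    print(respond("How should I phrase an apology?"))
```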
This study presents a preliminary conceptualization of the framework's implementation approaches and technical pathways. Its full-scale implementation, however, requires further in-depth research on the refinement and optimization of specific technical strategies, the cross-cultural adaptability of ethical standards, mechanisms for monitoring moral evolution, and the evaluation of real-world application effects. Subsequent research in these areas is crucial for validating the framework's feasibility and effectiveness and for advancing AI ethics from theory to practice.
Key words: artificial intelligence / large language models / moral cognitive framework / stages of moral development
References
[1] 中华人民共和国国家互联网信息办公室. (2023). 《生成式人工智能服务管理暂行办法》. https://www.cac.gov.cn/2023-07/13/c_1690898327029107.htm
[2] 郭全中, 张金熠. (2023). AI向善:AI大模型价值观对齐的内容与实践. 新闻爱好者, 11, 19-24.
[3] 郭全中, 张金熠. (2024). 生成式人工智能价值观的存在问题及伦理风险. 新闻与写作, 10, 68-76.
[4] 靖淑针, 范宁. (2024). 不出手的道德?公开情境对道德两难决策的影响. 心理科学, 47(6), 1465-1474.
[5] 科尔伯格, L. (2004). 道德发展心理学: 道德阶段的本质与确证. 华东师范大学出版社.
[6] 李思雯. (2024). 人工智能价值对齐的路径探析. 伦理学研究, 5, 99-108.
[7] 吕立远, 李延昊, 王健骁, 魏钰明, 苏竣. (2024). 大语言模型的价值观研究:概念框架与实证评估. 电子政务, 11, 113-126.
[8] 沈书生. (2024). 主体觉醒:AI与人类的区隔、拟合和共生. 现代远距离教育, 213(3), 3-8.
[9] 田海平. (2025). 人与类人之间的道德前景. 华东师范大学学报 (哲学社会科学版), 1, 8-21.
[10] 王少. (2023). ChatGPT介入思想政治教育的技术线路、安全风险及防范. 深圳大学学报(人文社会科学版), 40(2), 153-160.
[11] 汪晨曦. (2023). 中国古代道德生成机制对新时代公民道德建设的启示研究(硕士学位论文). 大连海洋大学.
[12] 汪姿君, 陈多闻. (2025). 从“合乎道德设计”走向“出于道德设计”——人工智能道德设计的路径转换. 东北大学学报(社会科学版), 27(1), 34-40.
[13] 吴冠军. (2023). 大语言模型的信任问题与资本逻辑. 当代世界与社会主义, 5, 4-14.
[14] 向继友, 吴学琴. (2023). ChatGPT类生成式人工智能的意识形态风险及其防控策略. 江汉论坛, 12, 53-59.
[15] 闫坤如. (2024). 人工智能体价值对齐的分布式路径探赜. 上海师范大学学报(哲学社会科学版), 4, 131-139.
[16] 袁曾. (2023). 生成式人工智能的责任能力研究. 东方法学, 5, 18-33.
[17] 曾雄. (2025). 人工智能大模型价值对齐的现状考察、问题检视与规范进路. 电子政务, 2, 34-44.
[18] 张今杰. (2022). 人工智能体的伦理主体地位问题探讨. 求索, 1, 58-65.
[19] 张姝月, 赵峰, 彭春花, 王军利, 徐科朋. (2021). 积极道德情绪和年龄对 3~5 岁幼儿安慰行为的影响. 心理科学, 44(3), 575-582.
[20] 张添翼. (2024). 北美道德心理理论的发展及其对我国德育实践的启示. 教育科学研究, 9, 89-96.
[21] 张妍, 赵宇翔, 吴大伟, 朱庆华. (2024). 人智交互情境中用户对生成式人工智能的心智感知及反应研究. 情报理论与实践, 8, 1-12.
[22] Abdulhai M., Serapio-Garcia G., Crepy C., Valter D., Canny J., & Jaques N. (2023). Moral foundations of large language models. arXiv.
[23] Abhinav R., Saha P., & Kumar N. (2023). Ethical reasoning over moral alignment: A case and framework for in-context ethical policies in LLMs. arXiv.
[24] Aharoni E., Fernandes S., Brady D. J., Alexander C., Criner M., Queen K., Rando J., Nahmias E., & Crespo V. (2024). Attributions toward artificial agents in a modified Moral Turing Test. Scientific Reports, 14, 8458.
[25] Akyürek E., Schuurmans D., Andreas J., Wang X., & Zhou D. (2023). What learning algorithm is in-context learning? Investigations with linear models. arXiv.
[26] Anil C., Durmus E., Sharma M., & Clark J. (2024). Many-shot jailbreaking. Advances in Neural Information Processing Systems, 37, 129696-129742.
[27] Attard-Frost, B., & Widder, D. G. (2025). The ethics of AI value chains. Big Data and Society, 12(2), 20539517251340603.
[28] Belisle-Pipon J. C., Monteferrante E., Roy M. C., & Couture V. (2023). Artificial intelligence ethics has a black box problem. AI and Society, 38, 1507-1522.
[29] Bradley P. (2025). DeepSeek vs. ChatGPT: Understanding features, performance and use cases. CoinTelegraph.
[30] Chiu Y. Y., Wang Z. H., Maiya S., & Hubinger E. (2025). Will AI tell lies to save sick children? Litmus-testing AI values prioritization with airiskdilemmas. arXiv.
[31] Corrêa, N. K. (2024). Dynamic normativity: Necessary and sufficient conditions for value alignment. arXiv.
[32] European Union. (2024). Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Official Journal of the European Union, L 2024/1689. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689.
[33] Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411-437.
[34] Gabriel, I., & Ghazavi, V. (2022). The challenge of value alignment. In L. Floridi (Ed.), The Oxford handbook of digital ethics (pp. 327-340). Oxford University Press.
[35] Ji J. C., Chen Y. T., Jin M. Y., Xu W. J., Hua W. Y., & Zhang Y. F. (2024). MoralBench: Moral evaluation of LLMs. arXiv.
[36] Kohlberg, L. (1981). Essays on moral development. The philosophy of moral development. Harper & Row.
[37] Kumar, S., & Choudhury, S. (2023). Cognitive morality and artificial intelligence (AI): A proposed classification of AI systems using Kohlberg's theory of cognitive ethics. Technological Sustainability, 2(3), 259-273.
[38] Lucy, L., & Bamman, D. (2021). Gender and representation bias in GPT-3 generated stories. In Proceedings of the third workshop on narrative understanding (pp. 48-55). Online: Association for Computational Linguistics.
[39] Peterson, M., & Gärdenfors, P. (2024). How to measure value alignment in AI. AI and Ethics, 4(4), 1493-1506.
[40] Piaget, J. (1932). The moral judgment of the child. Routledge.
[41] Scherrer N., Shi C., Feder A., & Blei D. (2023). Evaluating the moral beliefs encoded in LLMs. Advances in Neural Information Processing Systems, 36, 51778-51809.
[42] Schneider, S. (2019). Artificial you: AI and the future of your mind. Princeton University Press.
[43] Shivam, S. (2025). AI Alignment: Ensuring AI objectives match human values. International Journal of Scientific Research in Engineering and Management, 4, 1-9.
[44] Tennant E., Hailes S., & Musolesi M. (2025). Moral alignment for LLM agents. arXiv.
[45] U.S. Department of Commerce. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-1.pdf.
[46] Wei J., Wang X., Schuurmans D., Bosma M., Chi E., Le Q., & Zhou D. (2023). Chain of thought prompting elicits reasoning in large language models. arXiv.
[47] Weidinger L., Uesato J., Rauh M., Griffin C., Huang P. S., Mellor J., & Gabriel I. (2022). Taxonomy of risks posed by language models. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency (pp. 214-229), New York, NY, USA.
[48] Xu R., Sun Y., Ren M., & Zhang X. (2024). AI for social science and social science of AI: A survey. Information Processing and Management, 61(3), 103665.