Constitutional AI

Constitutional AI aims to align artificial intelligence systems with human values by building explicit ethical principles into their design and training. Current research focuses on methods for deriving these principles, including aggregating diverse human feedback, learning principles from existing preference datasets, and iteratively refining principles through automated processes. The approach is promising for AI safety and trustworthiness because it offers a more scalable and potentially less biased alternative to relying solely on human oversight.

Papers

April 7, 2025

Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B
LLaMa LlamaCare Artificial Intelligence Language Model Event Collapse Multi Attribute Helpfulness Dataset Constitutional AI

March 12, 2025

SciFi-Benchmark: How Would AI-Powered Robots Behave in Science Fiction Literature?
Constitutional AI Robot Behavior Artificial Intelligence English Literature

March 3, 2025

Proportionality in Thumbs Up and Down Voting
Computational Social Choice Constitutional AI Proportionality Notion Human Finger Voting Method

February 23, 2025

Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI
Human SAFETY Large Language Model LLM Safety Responsible AI Constitutional AI

February 21, 2025

C3AI: Crafting and Evaluating Constitutions for Constitutional AI
Design Principle Artificial Intelligence Framework Constitutional AI

February 1, 2025

How Effective Is Constitutional AI in Small LLMs? A Study on DeepSeek-R1 and Its Peers
Study Feature Constitutional AI Self Feedback Large Language Model Peer Agent DeepSeek Coder

January 31, 2025

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
Red Teaming Constitutional AI Jailbreak Attack LLM Based Robust Defense Temporal Data

January 28, 2025

Decoding Human Preferences in Alignment: An Improved Approach to Inverse Constitutional AI
Inverse Task Better Alignment Artificial Intelligence Contextual Alignment Constitutional AI Concept Extraction

June 24, 2024

Public Constitutional AI
Constitutional AI Trustworthy Artificial Intelligence Artificial Intelligence Governance

June 12, 2024

Collective Constitutional AI: Aligning a Language Model with Public Input
Language Model Constitutional AI Language Model Behavior

June 2, 2024

Inverse Constitutional AI: Compressing Preferences into Principles
Human Feedback Constitutional AI Preference Pair Preference Feedback Feedback Annotation General Principle

April 16, 2024

Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Collective Decision AI Ethic AI Alignment Reinforcement Learning Expert Feedback Social Choice Constitutional AI

March 27, 2024

IterAlign: Iterative Constitutional Alignment of Large Language Models
Constitutional AI LLM Alignment Large Language Model

February 12, 2024

Suppressing Pink Elephants with Direct Principle Feedback
LLM Behavior Constitutional AI Language Model Fine Tuned Llama Feedback System Pink Elephant

November 18, 2023

Case Repositories: Towards Case-Based Reasoning for AI Alignment
Case Based Reasoning AI Alignment Constitutional AI

October 24, 2023

ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles
Constitutional AI Chatbot Response Human Feedback Feedback Mechanism General Principle Large Language Model

October 20, 2023

Specific versus General Principles for Constitutional AI
Conversational Model Ethical Behavior Constitutional AI Human Feedback General Principle

December 15, 2022

Constitutional AI: Harmlessness from AI Feedback
Constitutional AI AI Feedback Human Preference AI Assistant