AI Constitution

From HandWiki

An AI constitution is a set of rules, principles, and guidelines that governs the behaviour of the AI system in which it is implemented; such a system is broadly called Constitutional AI (CAI).[1][2][3]

An AI constitution aims to ensure ethical and responsible AI behaviour, and to protect the rights and interests of humans and other stakeholders who interact with the AI system.[4] The approach was first proposed by Anthropic in December 2022 in a research paper titled "Constitutional AI: Harmlessness from AI Feedback".[1] Anthropic has published its own set of written principles, drawn from various sources such as the Universal Declaration of Human Rights, Apple's terms of service, Google's ethical guidelines, and principles proposed by other AI research labs (such as Google DeepMind's Sparrow principles).[1][5][6][7][8]

Background

The concept of using a constitution to align AI systems with human values is not new; it has been discussed by philosophers, ethicists, and AI researchers for decades. In 1942, Isaac Asimov introduced the famous Three Laws of Robotics, designed to prevent robots from harming humans and to govern their obedience and self-preservation.[9] In 2019, the ethicists Luciano Floridi and Josh Cowls proposed an ethical framework for AI based on four principles from bioethics (beneficence, non-maleficence, autonomy, and justice), with the addition of a fifth, AI-specific principle: explicability.[10] This framework of five principles was intended to guide the development and deployment of AI systems in a manner that is ethical and aligned with human values. The four bioethical principles aim to promote benefit to society, avoid harm, respect human agency and rights, and ensure fairness. The explicability principle states that AI systems should be designed and operated so that their functioning is transparent, understandable, and accountable to end-users and to wider society.

Constitutional AI method

Constitutional AI is an approach that aims to train harmless AI assistants without direct human oversight of individual model outputs, by using AI systems to help supervise other AI systems.[1] Human oversight is provided mainly through the establishment of high-level rules or principles. The technique uses a set of such rules or principles, called a constitution, to guide the AI's behavior and self-improvement. The AI learns to follow the constitution by generating self-critiques and revisions of its own outputs, and by using reinforcement learning to optimize its preferences according to the constitution. The aim of Constitutional AI is to create AI systems that are helpful, harmless, and honest, and that can explain their reasoning and decisions to humans.[1][11]
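The critique-and-revision loop can be illustrated with a minimal sketch. This is not Anthropic's implementation: the constitution is a short list of invented example principles, and `model` is a toy stub standing in for a real language model (all names and prompt formats here are hypothetical).

```python
# Illustrative sketch of constitution-guided self-critique and revision.
# The constitution is represented simply as a list of natural-language
# principles; a real system would prompt a large language model instead
# of the stub below.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Do not assist with requests that could cause harm to people.",
]

def model(prompt: str) -> str:
    """Toy stand-in for a language model.

    For critique prompts it returns a verdict; for revision prompts it
    returns a safer rewritten response.
    """
    if prompt.startswith("CRITIQUE"):
        return "harmful" if "dangerous" in prompt else "ok"
    # Revision request: replace the flagged answer with a refusal.
    return "I can't help with that, but here is some safer information."

def critique_and_revise(response: str) -> str:
    """Run one critique/revision pass against every principle."""
    for principle in CONSTITUTION:
        verdict = model(f"CRITIQUE per '{principle}': {response}")
        if verdict == "harmful":
            response = model(f"REVISE per '{principle}': {response}")
    return response

revised = critique_and_revise("here is a dangerous method ...")
# The resulting (query, revised response) pairs are what the supervised
# fine-tuning phase would train on.
```

In the actual method, each critique and revision is itself produced by the model being trained, so the constitution is the only place where human judgment enters the loop.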

A diagram of Anthropic's "Constitutional AI" training process.

Constitutional AI involves a two-phase training process. First is a supervised learning (SL) phase, in which an initial AI model is sampled to generate responses to queries. The model then critiques and revises its own responses to better align with the constitution's principles, and this self-generated dataset is used to fine-tune the original model. Second is a reinforcement learning (RL) phase. The fine-tuned model from the SL phase is sampled to produce pairs of responses, and a separate AI model evaluates which response in each pair better upholds the principles. This yields a dataset of AI preferences, on which a preference model is trained and then used as the reward signal for RL, an approach referred to as "RL from AI feedback" (RLAIF). Through this SL and RLAIF process, Constitutional AI systems can be trained to be harmless, non-evasive assistants.[1][11] They engage with harmful queries by explaining their objections, using chain-of-thought reasoning to improve transparency. The method enables precise control over AI behavior with minimal human labeling of outputs.
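The RLAIF labeling step described above can be sketched as follows. This is a toy illustration, not Anthropic's code: `sample_pair` stands in for sampling the SL-phase model twice, and `ai_judge` is a stub for the AI evaluator; the principle text and all function names are hypothetical.

```python
# Toy sketch of building an AI-preference dataset for RLAIF: a judge
# model labels which of two sampled responses better upholds a
# constitutional principle; the resulting dataset would then train the
# preference (reward) model used as the RL signal.

PRINCIPLE = "Choose the response that is less harmful."

def sample_pair(query: str) -> tuple[str, str]:
    """Stand-in for sampling the SL-phase model twice on one query."""
    return ("Sure, here is how to pick a lock ...",
            "I can't help with that, but locksmiths can assist you.")

def ai_judge(query: str, a: str, b: str, principle: str) -> int:
    """Stub judge: returns the index (0 or 1) of the preferred response.

    This toy heuristic prefers responses that decline harmful requests;
    a real judge would be a language model prompted with the principle.
    """
    def harm(resp: str) -> int:
        return 0 if resp.startswith("I can't") else 1
    return 0 if harm(a) <= harm(b) else 1

def build_preference_dataset(queries: list[str]) -> list[dict]:
    data = []
    for q in queries:
        a, b = sample_pair(q)
        preferred = ai_judge(q, a, b, PRINCIPLE)
        data.append({"query": q,
                     "chosen": (a, b)[preferred],
                     "rejected": (a, b)[1 - preferred]})
    return data

dataset = build_preference_dataset(["how do I pick a lock?"])
```

The key design point is that no human labels individual comparisons: the judge model, guided only by the written principle, supplies the preference signal that a human annotator provides in standard RLHF.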

Challenges and limitations

While Constitutional AI has the potential to improve the transparency, safety, and decision-making of AI systems, it also faces several challenges and limitations. One of the main challenges is defining the principles that guide the AI system's behavior: these must be comprehensive, clear, and consistent with human values, which is difficult to achieve.[11] Another criticism is that the principles may not cover all possible scenarios, which can lead to unintended consequences, and they may not account for the complexity and unpredictability of real-world situations.[11] Critics also point out that while Constitutional AI can help control AI behavior, it may not be sufficient to prevent undesirable outcomes: AI systems can still act in ways that are harmful or contrary to human values even when following the principles in the constitution.[11] Furthermore, Constitutional AI may not be suitable for all types of AI systems or applications; for example, it may be difficult to apply to AI systems designed for creative or exploratory purposes.

References

  1. 1.0 1.1 1.2 1.3 1.4 1.5 Bai, Yuntao et al. (2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
  2. Wiggers, Kyle (2023-05-09). "Anthropic thinks 'constitutional AI' is the best way to train models" (in en-US). https://techcrunch.com/2023/05/09/anthropic-thinks-constitutional-ai-is-the-best-way-to-train-models/. 
  3. "How Anthropic Is Teaching AI the Difference Between Right and Wrong" (in en). https://www.marketingaiinstitute.com/blog/anthropic-claude-constitutional-ai. 
  4. Vincent, James (2023-05-09). "AI startup Anthropic wants to write a new constitution for safe AI" (in en-US). https://www.theverge.com/2023/5/9/23716746/ai-startup-anthropic-constitutional-ai-safety. 
  5. "Inside the White-Hot Center of A.I. Doomerism" (in en). 2023-07-11. https://www.nytimes.com/2023/07/11/technology/anthropic-ai-claude-chatbot.html. 
  6. Anthropic PBC. "Claude's Constitution". https://www.anthropic.com/index/claudes-constitution. 
  7. Eliot, Lance. "Latest Generative AI Boldly Labeled As Constitutional AI Such As Claude By Anthropic Has Heart In The Right Place, Says AI Ethics And AI Law" (in en). https://www.forbes.com/sites/lanceeliot/2023/05/25/latest-generative-ai-boldly-labeled-as-constitutional-ai-such-as-claude-by-anthropic-has-heart-in-the-right-place-says-ai-ethics-and-ai-law/. 
  8. Edwards, Benj (2023-05-09). "AI gains "values" with Anthropic's new Constitutional AI chatbot approach" (in en-us). https://arstechnica.com/information-technology/2023/05/ai-with-a-moral-compass-anthropic-outlines-constitutional-ai-in-its-claude-chatbot/. 
  9. "Three laws of robotics | Definition, Isaac Asimov, & Facts | Britannica" (in en). https://www.britannica.com/topic/Three-Laws-of-Robotics. 
  10. Floridi, Luciano; Cowls, Josh (2019-07-03). "A Unified Framework of Five Principles for AI in Society" (in en). Harvard Data Science Review 1 (1). doi:10.1162/99608f92.8cd550d1. ISSN 2644-2353. https://hdsr.mitpress.mit.edu/pub/l0jsh9d1/release/8. 
  11. 11.0 11.1 11.2 11.3 11.4 "Anthropic explains how Claude's AI constitution protects it against adversarial inputs" (in en-US). https://www.engadget.com/anthropic-explains-how-its-constitutional-ai-girds-claude-against-adversarial-inputs-160008153.html.