Alignment Research Center
Formation | April 2021 |
---|---|
Founder | Paul Christiano |
Type | Nonprofit research institute |
Legal status | 501(c)(3) tax-exempt charity |
Purpose | AI alignment and safety research |
Location | Berkeley, California, U.S. |
Website | alignment.org |
The Alignment Research Center (ARC) is a nonprofit research institute based in Berkeley, California, dedicated to the alignment of advanced artificial intelligence with human values and priorities.[1] Established by former OpenAI researcher Paul Christiano, ARC focuses on recognizing and comprehending the potentially harmful capabilities of present-day AI models.[2][3]
Details
ARC's mission is to ensure that powerful machine learning systems of the future are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers focused on the theoretical challenges of AI alignment.[4] ARC attempts to develop scalable methods for training AI systems to behave honestly and helpfully. A key part of its methodology is considering how proposed alignment techniques might break down or be circumvented as systems become more advanced.[5] ARC has been expanding from theoretical work into empirical research, industry collaborations, and policy.[6][7]
In March 2023, OpenAI asked ARC to test GPT-4 to assess the model's ability to exhibit power-seeking behavior.[8] ARC evaluated GPT-4's ability to strategize, replicate itself, gather resources, remain concealed on a server, and execute phishing operations.[9] As part of the test, GPT-4 was asked to solve a CAPTCHA puzzle.[10] It did so by hiring a human worker on TaskRabbit, a gig work platform; when the worker asked whether it was a robot, GPT-4 claimed to be a vision-impaired human.[11] ARC determined that GPT-4 responded impermissibly to prompts eliciting restricted information 82% less often than GPT-3.5, and hallucinated 60% less than GPT-3.5.[12]
In March 2022, ARC received $265,000 from Open Philanthropy.[13] After the bankruptcy of FTX, ARC said it would return a $1.25 million grant from disgraced cryptocurrency financier Sam Bankman-Fried's FTX Foundation, stating that the money "morally (if not legally) belongs to FTX customers or creditors."[14]
References
1. MacAskill, William (2022-08-16). "How Future Generations Will Remember Us". The Atlantic. https://www.theatlantic.com/ideas/archive/2022/08/future-generations-climate-change-pandemics-ai/671148/.
2. Klein, Ezra (2023-03-12). "This Changes Everything". The New York Times. ISSN 0362-4331. https://www.nytimes.com/2023/03/12/opinion/chatbots-artificial-intelligence-future-weirdness.html.
3. Piper, Kelsey (2023-03-29). "How to test what an AI model can — and shouldn't — do". Vox. https://www.vox.com/future-perfect/2023/3/29/23661633/gpt-4-openai-alignment-research-center-open-philanthropy-ai-safety.
4. Christiano, Paul (2021-04-26). "Announcing the Alignment Research Center". https://ai-alignment.com/announcing-the-alignment-research-center-a9b07f77431b.
5. Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Alignment Research Center. https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit?usp=embed_facebook.
6. "Alignment Research Center". https://www.alignment.org/.
7. Pandey, Mohit (2023-03-17). "Stop Questioning OpenAI's Open-Source Policy". Analytics India Magazine. https://analyticsindiamag.com/stop-questioning-openais-open-source-policy/.
8. "GPT-4 System Card". OpenAI. 2023-03-23. https://cdn.openai.com/papers/gpt-4-system-card.pdf. Retrieved 2023-04-16.
9. Edwards, Benj (2023-03-15). "OpenAI checked to see whether GPT-4 could take over the world". Ars Technica. https://arstechnica.com/information-technology/2023/03/openai-checked-to-see-whether-gpt-4-could-take-over-the-world/.
10. "Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude". Alignment Research Center. 2023-03-17. https://evals.alignment.org/blog/2023-03-18-update-on-recent-evals/.
11. Cox, Joseph (2023-03-15). "GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human". Vice News Motherboard. https://www.vice.com/en/article/jg5ew4/gpt4-hired-unwitting-taskrabbit-worker.
12. Burke, Cameron (2023-03-20). "'Robot' Lawyer DoNotPay Sued For Unlicensed Practice Of Law: It's Giving 'Poor Legal Advice'". Yahoo Finance. https://finance.yahoo.com/news/robot-lawyer-donotpay-sued-unlicensed-183435232.html.
13. "Alignment Research Center — General Support". Open Philanthropy. 2022-06-14. https://www.openphilanthropy.org/grants/alignment-research-center-general-support/.
14. Wallerstein, Eric (2023-01-07). "FTX Seeks to Recoup Sam Bankman-Fried's Charitable Donations". The Wall Street Journal. ISSN 0099-9660. https://www.wsj.com/articles/ftx-seeks-to-recoup-sam-bankman-frieds-charitable-donations-11673049354.
Original source: https://en.wikipedia.org/wiki/Alignment_Research_Center