Unsolved:AI takeover

From HandWiki
Revision as of 22:12, 4 February 2024 by Wikisleeper (talk | contribs) (correction)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Short description: Hypothetical artificial intelligence scenario
Robots revolt in R.U.R., a 1920 Czech play translated as "Rossum's Universal Robots"

An AI takeover is a scenario in which artificial intelligence (AI) becomes the dominant form of intelligence on Earth, as computer programs or robots effectively take control of the planet away from the human species. Possible scenarios include replacement of the entire human workforce, takeover by a superintelligent AI, and the popular notion of a robot uprising. Stories of AI takeovers are very popular throughout science fiction. Some public figures, such as Stephen Hawking and Elon Musk, have advocated research into precautionary measures to ensure future superintelligent machines remain under human control.[1]

Types

Automation of the economy

Main page: Engineering:Technological unemployment

The traditional consensus among economists has been that technological progress does not cause long-term unemployment. However, recent innovation in the fields of robotics and artificial intelligence has raised worries that human labor will become obsolete, leaving people in various sectors without jobs to earn a living, leading to an economic crisis.[2][3][4][5] Many small and medium size businesses may also be driven out of business if they cannot afford or licence the latest robotic and AI technology, and may need to focus on areas or services that cannot easily be replaced for continued viability in the face of such technology.[6]

Technologies that may displace workers

AI technologies have been widely adopted in recent years. While these technologies have replaced many traditional workers, they also create new opportunities. Industries that are most susceptible to AI takeover include transportation, retail, and military. AI military technologies, for example, allow soldiers to work remotely without any risk of injury. Author Dave Bond argues that as AI technologies continue to develop and expand, the relationship between humans and robots will change; they will become closely integrated in several aspects of life. AI will likely displace some workers while creating opportunities for new jobs in other sectors, especially in fields where tasks are repeatable.[7][8]

Computer-integrated manufacturing

Computer-integrated manufacturing uses computers to control the production process. This allows individual processes to exchange information with each other and initiate actions. Although manufacturing can be faster and less error-prone by the integration of computers, the main advantage is the ability to create automated manufacturing processes. Computer-integrated manufacturing is used in automotive, aviation, space, and ship building industries.

White-collar machines

The 21st century has seen a variety of skilled tasks partially taken over by machines, including translation, legal research and low level[clarification needed] journalism. Care work, entertainment, and other tasks requiring empathy, previously thought safe from automation, have also begun to be performed by robots.[9][10][11][12]

Autonomous cars

An autonomous car is a vehicle that is capable of sensing its environment and navigating without human input. Many such vehicles are being developed, but as of May 2017 automated cars permitted on public roads are not yet fully autonomous. They all require a human driver at the wheel who at a moment's notice can take control of the vehicle. Among the obstacles to widespread adoption of autonomous vehicles are concerns about the resulting loss of driving-related jobs in the road transport industry. On March 18, 2018, the first human was killed by an autonomous vehicle in Tempe, Arizona by an Uber self-driving car.[13]

Eradication

Main page: Unsolved:Existential risk from artificial general intelligence

Scientists such as Stephen Hawking are confident that superhuman artificial intelligence is physically possible, stating "there is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains".[14][15] Scholars like Nick Bostrom debate how far off superhuman intelligence is, and whether it poses a risk to mankind. According to Bostrom, a superintelligent machine would not necessarily be motivated by the same emotional desire to collect power that often drives human beings but might rather treat power as a means toward attaining its ultimate goals; taking over the world would both increase its access to resources and help to prevent other agents from stopping the machine's plans. As an oversimplified example, a paperclip maximizer designed solely to create as many paperclips as possible would want to take over the world so that it can use all of the world's resources to create as many paperclips as possible, and, additionally, prevent humans from shutting it down or using those resources on things other than paperclips.[16]

In fiction

AI takeover is a common theme in science fiction. Fictional scenarios typically differ vastly from those hypothesized by researchers in that they involve an active conflict between humans and an AI or robots with anthropomorphic motives who see them as a threat or otherwise have active desire to fight humans, as opposed to the researchers' concern of an AI that rapidly exterminates humans as a byproduct of pursuing its goals.[17] The idea is seen in Karel Čapek's R.U.R., which introduced the word robot in 1921,[18] and can be glimpsed in Mary Shelley's Frankenstein (published in 1818), as Victor ponders whether, if he grants his monster's request and makes him a wife, they would reproduce and their kind would destroy humanity.[19]

According to Toby Ord, the idea that an AI takeover requires robots is a misconception driven by the media and Hollywood. He argues that the most damaging humans in history were not physically the strongest, but that they used words instead to convince people and gain control of large parts of the world. He writes that a sufficiently intelligent AI with an access to the internet could scatter backup copies of itself, gather financial and human resources (via cyberattacks or blackmails), persuade people on a large scale, and exploit societal vulnerabilities that are too subtle for humans to anticipate.[20]

The word "robot" from R.U.R. comes from the Czech word, robota, meaning laborer or serf. The 1920 play was a protest against the rapid growth of technology, featuring manufactured "robots" with increasing capabilities who eventually revolt.[21] HAL 9000 (1968) and the original Terminator (1984) are two iconic examples of hostile AI in pop culture.[22]

Contributing factors

Advantages of superhuman intelligence over humans

Nick Bostrom and others have expressed concern that an AI with the abilities of a competent artificial intelligence researcher would be able to modify its own source code and increase its own intelligence. If its self-reprogramming leads to its getting even better at being able to reprogram itself, the result could be a recursive intelligence explosion in which it would rapidly leave human intelligence far behind. Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", and enumerates some advantages a superintelligence would have if it chose to compete against humans:[17][23]

  • Technology research: A machine with superhuman scientific research abilities would be able to beat the human research community to milestones such as nanotechnology or advanced biotechnology
  • Strategizing: A superintelligence might be able to simply outwit human opposition
  • Social manipulation: A superintelligence might be able to recruit human support,[17] or covertly incite a war between humans[24]
  • Economic productivity: As long as a copy of the AI could produce more economic wealth than the cost of its hardware, individual humans would have an incentive to voluntarily allow the Artificial General Intelligence (AGI) to run a copy of itself on their systems
  • Hacking: A superintelligence could find new exploits in computers connected to the Internet, and spread copies of itself onto those systems, or might steal money to finance its plans

Sources of AI advantage

According to Bostrom, a computer program that faithfully emulates a human brain, or that runs algorithms that are as powerful as the human brain's algorithms, could still become a "speed superintelligence" if it can think many orders of magnitude faster than a human, due to being made of silicon rather than flesh, or due to optimization increasing the speed of the AGI. Biological neurons operate at about 200 Hz, whereas a modern microprocessor operates at a speed of about 2,000,000,000 Hz. Human axons carry action potentials at around 120 m/s, whereas computer signals travel near the speed of light.[17]

A network of human-level intelligences designed to network together and share complex thoughts and memories seamlessly, able to collectively work as a giant unified team without friction, or consisting of trillions of human-level intelligences, would become a "collective superintelligence".[17]

More broadly, any number of qualitative improvements to a human-level AGI could result in a "quality superintelligence", perhaps resulting in an AGI as far above us in intelligence as humans are above non-human apes. The number of neurons in a human brain is limited by cranial volume and metabolic constraints, while the number of processors in a supercomputer can be indefinitely expanded. An AGI need not be limited by human constraints on working memory, and might therefore be able to intuitively grasp more complex relationships than humans can. An AGI with specialized cognitive support for engineering or computer programming would have an advantage in these fields, compared with humans who evolved no specialized mental modules to specifically deal with those domains. Unlike humans, an AGI can spawn copies of itself and tinker with its copies' source code to attempt to further improve its algorithms.[17]

Possibility of unfriendly AI preceding friendly AI

Is strong AI inherently dangerous?

Main page: AI alignment

A significant problem is that unfriendly artificial intelligence is likely to be much easier to create than friendly AI. While both require large advances in recursive optimisation process design, friendly AI also requires the ability to make goal structures invariant under self-improvement (or the AI could transform itself into something unfriendly) and a goal structure that aligns with human values and does not undergo instrumental convergence in ways that may automatically destroy the entire human race. An unfriendly AI, on the other hand, can optimize for an arbitrary goal structure, which does not need to be invariant under self-modification.[25]

The sheer complexity of human value systems makes it very difficult to make AI's motivations human-friendly.[17][26] Unless moral philosophy provides us with a flawless ethical theory, an AI's utility function could allow for many potentially harmful scenarios that conform with a given ethical framework but not "common sense". According to Eliezer Yudkowsky, there is little reason to suppose that an artificially designed mind would have such an adaptation.[27]

Odds of conflict

Many scholars, including evolutionary psychologist Steven Pinker, argue that a superintelligent machine is likely to coexist peacefully with humans.[28]

The fear of cybernetic revolt is often based on interpretations of humanity's history, which is rife with incidents of enslavement and genocide. Such fears stem from a belief that competitiveness and aggression are necessary in any intelligent being's goal system. However, such human competitiveness stems from the evolutionary background to our intelligence, where the survival and reproduction of genes in the face of human and non-human competitors was the central goal.[29] According to AI researcher Steve Omohundro, an arbitrary intelligence could have arbitrary goals: there is no particular reason that an artificially intelligent machine (not sharing humanity's evolutionary context) would be hostile—or friendly—unless its creator programs it to be such and it is not inclined or capable of modifying its programming. But the question remains: what would happen if AI systems could interact and evolve (evolution in this context means self-modification or selection and reproduction) and need to compete over resources—would that create goals of self-preservation? AI's goal of self-preservation could be in conflict with some goals of humans.[30]

Many scholars dispute the likelihood of unanticipated cybernetic revolt as depicted in science fiction such as The Matrix, arguing that it is more likely that any artificial intelligence powerful enough to threaten humanity would probably be programmed not to attack it. Pinker acknowledges the possibility of deliberate "bad actors", but states that in the absence of bad actors, unanticipated accidents are not a significant threat; Pinker argues that a culture of engineering safety will prevent AI researchers from accidentally unleashing malign superintelligence.[28] In contrast, Yudkowsky argues that humanity is less likely to be threatened by deliberately aggressive AIs than by AIs which were programmed such that their goals are unintentionally incompatible with human survival or well-being (as in the film I, Robot and in the short story "The Evitable Conflict"). Omohundro suggests that present-day automation systems are not designed for safety and that AIs may blindly optimize narrow utility functions (say, playing chess at all costs), leading them to seek self-preservation and elimination of obstacles, including humans who might turn them off.[31]

Precautions

Main page: Philosophy:AI control problem

The AI control problem is the issue of how to build a superintelligent agent that will aid its creators, while avoiding inadvertently building a superintelligence that will harm its creators.[32] Some scholars argue that solutions to the control problem might also find applications in existing non-superintelligent AI.[33]

Major approaches to the control problem include alignment, which aims to align AI goal systems with human values, and capability control, which aims to reduce an AI system's capacity to harm humans or gain control. An example of "capability control" is to research whether a superintelligence AI could be successfully confined in an "AI box". According to Bostrom, such capability control proposals are not reliable or sufficient to solve the control problem in the long term, but may potentially act as valuable supplements to alignment efforts.[17]

Warnings

Physicist Stephen Hawking, Microsoft founder Bill Gates, and SpaceX founder Elon Musk have expressed concerns about the possibility that AI could develop to the point that humans could not control it, with Hawking theorizing that this could "spell the end of the human race".[34] Stephen Hawking said in 2014 that "Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks." Hawking believed that in the coming decades, AI could offer "incalculable benefits and risks" such as "technology outsmarting financial markets, out-inventing human researchers, out-manipulating human leaders, and developing weapons we cannot even understand." In January 2015, Nick Bostrom joined Stephen Hawking, Max Tegmark, Elon Musk, Lord Martin Rees, Jaan Tallinn, and numerous AI researchers in signing the Future of Life Institute's open letter speaking to the potential risks and benefits associated with artificial intelligence. The signatories "believe that research on how to make AI systems robust and beneficial is both important and timely, and that there are concrete research directions that can be pursued today."[35][36]

Arthur C. Clarke's Odyssey series and Charles Stross's Accelerando relate to humanity's narcissistic injuries in the face of powerful artificial intelligences threatening humanity's self-perception.[37]

Prevention through AI alignment

In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards humans' intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances its intended objectives. A misaligned AI system pursues some objectives, but not the intended ones.[38]

It can be challenging for AI designers to align an AI system because it can be difficult for them to specify the full range of desired and undesired behavior. To avoid this difficulty, they typically use simpler proxy goals, such as gaining human approval. But that approach can create loopholes, overlook necessary constraints, or reward the AI system for merely appearing aligned.[38][39]

Misaligned AI systems can malfunction or cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful ways (reward hacking).[38][40][41] They may also develop unwanted instrumental strategies, such as seeking power or survival, because such strategies help them achieve their final given goals.[38][42][43] Furthermore, they may develop undesirable emergent goals that may be hard to detect before the system is in deployment, where it faces new situations and data distributions.[44][45]

Today, these problems affect existing commercial systems such as language models,[46][47][48] robots,[49] autonomous vehicles,[50] and social media recommendation engines.[46][43][51] Some AI researchers argue that more capable future systems will be more severely affected, since these problems partially result from the systems being highly capable.[52][40][39]

Many leading AI scientists, including Geoffrey Hinton and Stuart Russell, argue that AI is approaching human-like (AGI) and superhuman cognitive capabilities (ASI) and could endanger human civilization if misaligned.[53][43]

AI alignment is a subfield of AI safety, the study of how to build safe AI systems.[54] Other subfields of AI safety include robustness, monitoring, and capability control.[55] Research challenges in alignment include instilling complex values in AI, avoiding deceptive AI,[56] scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking.[55] Alignment research has connections to interpretability research,[57][58] (adversarial) robustness,[54] anomaly detection, calibrated uncertainty,[57] formal verification,[59] preference learning,[60][61][62] safety-critical engineering,[63] game theory,[64] algorithmic fairness,[54][65] and the social sciences.[66]

See also


Notes

References

  1. Lewis, Tanya (2015-01-12). "Don't Let Artificial Intelligence Take Over, Top Scientists Warn". Purch. http://www.livescience.com/49419-artificial-intelligence-dangers-letter.html. "Stephen Hawking, Elon Musk and dozens of other top scientists and technology leaders have signed a letter warning of the potential dangers of developing artificial intelligence (AI)." 
  2. Lee, Kai-Fu (2017-06-24). "The Real Threat of Artificial Intelligence". https://www.nytimes.com/2017/06/24/opinion/sunday/artificial-intelligence-economic-inequality.html. "These tools can outperform human beings at a given task. This kind of A.I. is spreading to thousands of domains, and as it does, it will eliminate many jobs." 
  3. Larson, Nina (2017-06-08). "AI 'good for the world'... says ultra-lifelike robot". https://phys.org/news/2017-06-ai-good-world-ultra-lifelike-robot.html. "Among the feared consequences of the rise of the robots is the growing impact they will have on human jobs and economies." 
  4. Santini, Jean-Louis (2016-02-14). "Intelligent robots threaten millions of jobs". https://phys.org/news/2016-02-intelligent-robots-threaten-millions-jobs.html#nRlv. ""We are approaching a time when machines will be able to outperform humans at almost any task," said Moshe Vardi, director of the Institute for Information Technology at Rice University in Texas." 
  5. Williams-Grut, Oscar (2016-02-15). "Robots will steal your job: How AI could increase unemployment and inequality". Business Insider. http://www.businessinsider.com/robots-will-steal-your-job-citi-ai-increase-unemployment-inequality-2016-2?r=UK&IR=T. "Top computer scientists in the US warned that the rise of artificial intelligence (AI) and robots in the workplace could cause mass unemployment and dislocated economies, rather than simply unlocking productivity gains and freeing us all up to watch TV and play sports." 
  6. "How can SMEs prepare for the rise of the robots?" (in en-US). LeanStaff. 2017-10-17. http://www.leanstaff.co.uk/robot-apocalypse/. 
  7. Frank, Morgan (2019-03-25). "Toward understanding the impact of artificial intelligence on labor". Proceedings of the National Academy of Sciences of the United States of America 116 (14): 6531–6539. doi:10.1073/pnas.1900949116. PMID 30910965. Bibcode2019PNAS..116.6531F. 
  8. Bond, Dave (2017). Artificial Intelligence. pp. 67–69. 
  9. Skidelsky, Robert (2013-02-19). "Rise of the robots: what will the future of work look like?". The Guardian (London). https://www.theguardian.com/business/2013/feb/19/rise-of-robots-future-of-work. 
  10. Bria, Francesca (February 2016). "The robot economy may already have arrived". openDemocracy. https://www.opendemocracy.net/can-europe-make-it/francesca-bria/robot-economy-full-automation-work-future. 
  11. Srnicek, Nick (March 2016). "4 Reasons Why Technological Unemployment Might Really Be Different This Time". novara wire. http://wire.novaramedia.com/2015/03/4-reasons-why-technological-unemployment-might-really-be-different-this-time/. 
  12. Brynjolfsson, Erik; McAfee, Andrew (2014). "passim, see esp Chpt. 9". The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. W. W. Norton & Company. ISBN 978-0393239355. 
  13. Wakabayashi, Daisuke (March 19, 2018). "Self-Driving Uber Car Kills Pedestrian in Arizona, Where Robots Roam". New York Times. https://www.nytimes.com/2018/03/19/technology/uber-driverless-fatality.html. 
  14. Hawking, Stephen; Stuart Russell; Max Tegmark; Frank Wilczek (1 May 2014). "Stephen Hawking: 'Transcendence looks at the implications of artificial intelligence - but are we taking AI seriously enough?'". The Independent. https://www.independent.co.uk/news/science/stephen-hawking-transcendence-looks-at-the-implications-of-artificial-intelligence-but-are-we-taking-9313474.html. 
  15. Müller, Vincent C.; Bostrom, Nick (2016). "Future Progress in Artificial Intelligence: A Survey of Expert Opinion". Fundamental Issues of Artificial Intelligence. Springer. pp. 555–572. doi:10.1007/978-3-319-26485-1_33. ISBN 978-3-319-26483-7. https://nickbostrom.com/papers/survey.pdf. Retrieved 2022-06-16. "AI systems will... reach overall human ability... very likely (with 90% probability) by 2075. From reaching human ability, it will move on to superintelligence within 30 years (75%)... So, (most of the AI experts responding to the surveys) think that superintelligence is likely to come in a few decades..." 
  16. Bostrom, Nick (2012). "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents". Minds and Machines (Springer) 22 (2): 71–85. doi:10.1007/s11023-012-9281-3. https://nickbostrom.com/superintelligentwill.pdf. Retrieved 2022-06-16. 
  17. 17.0 17.1 17.2 17.3 17.4 17.5 17.6 17.7 Bostrom, Nick. Superintelligence: Paths, Dangers, Strategies. 
  18. "The Origin Of The Word 'Robot'". Science Friday (public radio). 22 April 2011. https://www.sciencefriday.com/segments/the-origin-of-the-word-robot/. 
  19. Botkin-Kowacki, Eva (28 October 2016). "A female Frankenstein would lead to humanity's extinction, say scientists". Christian Science Monitor. https://www.csmonitor.com/Science/2016/1028/A-female-Frankenstein-would-lead-to-humanity-s-extinction-say-scientists. 
  20. Ord, Toby (2020). "Unaligned artificial intelligence". The precipice: existential risk and the future of humanity. london New York (N.Y.): Bloomsbury academic. ISBN 978-1-5266-0023-3. 
  21. Hockstein, N. G.; Gourin, C. G.; Faust, R. A.; Terris, D. J. (17 March 2007). "A history of robots: from science fiction to surgical robotics". Journal of Robotic Surgery 1 (2): 113–118. doi:10.1007/s11701-007-0021-2. PMID 25484946. 
  22. Hellmann, Melissa (21 September 2019). "AI 101: What is artificial intelligence and where is it going?". The Seattle Times. https://www.seattletimes.com/business/technology/ai-101-what-is-artificial-intelligence-and-where-is-it-going/. 
  23. Babcock, James; Krámar, János; Yampolskiy, Roman V. (2019). "Guidelines for Artificial Intelligence Containment". Next-Generation Ethics. pp. 90–112. doi:10.1017/9781108616188.008. ISBN 9781108616188. 
  24. Baraniuk, Chris (23 May 2016). "Checklist of worst-case scenarios could help prepare for evil AI". New Scientist. https://www.newscientist.com/article/2089606-checklist-of-worst-case-scenarios-could-help-prepare-for-evil-ai/. 
  25. Yudkowsky, Eliezer S. (May 2004). "Coherent Extrapolated Volition". Singularity Institute for Artificial Intelligence. http://singinst.org/upload/CEV.html. 
  26. Muehlhauser, Luke; Helm, Louie (2012). "Intelligence Explosion and Machine Ethics". Singularity Hypotheses: A Scientific and Philosophical Assessment. Springer. https://intelligence.org/files/IE-ME.pdf. Retrieved 2020-10-02. 
  27. Yudkowsky, Eliezer (2011). "Complex Value Systems in Friendly AI". Artificial General Intelligence. Lecture Notes in Computer Science. 6830. pp. 388–393. doi:10.1007/978-3-642-22887-2_48. ISBN 978-3-642-22886-5. 
  28. 28.0 28.1 Pinker, Steven (13 February 2018). "We're told to fear robots. But why do we think they'll turn on us?" (in en). Popular Science. https://www.popsci.com/robot-uprising-enlightenment-now/. 
  29. Creating a New Intelligent Species: Choices and Responsibilities for Artificial Intelligence Designers - Singularity Institute for Artificial Intelligence, 2005
  30. Omohundro, Stephen M. (June 2008). "The basic AI drives". Artificial General Intelligence 2008. pp. 483–492. https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf. Retrieved 2020-10-02. 
  31. Tucker, Patrick (17 Apr 2014). "Why There Will Be A Robot Uprising". Defense One. http://www.defenseone.com/technology/2014/04/why-there-will-be-robot-uprising/82783/. 
  32. Russell, Stuart J. (8 October 2019). Human compatible : artificial intelligence and the problem of control. ISBN 978-0-525-55862-0. OCLC 1237420037. http://worldcat.org/oclc/1237420037. Retrieved 2 January 2022. 
  33. "Google developing kill switch for AI". BBC News. 8 June 2016. https://www.bbc.com/news/technology-36472140. 
  34. Rawlinson, Kevin (29 January 2015). "Microsoft's Bill Gates insists AI is a threat". https://www.bbc.co.uk/news/31047780. 
  35. "The Future of Life Institute Open Letter". The Future of Life Institute. 28 October 2015. http://futureoflife.org/ai-open-letter. 
  36. Bradshaw, Tim (11 January 2015). "Scientists and investors warn on AI". The Financial Times. http://www.ft.com/cms/s/0/3d2c2f12-99e9-11e4-93c1-00144feabdc0.html#axzz3TNL9lxJV. 
  37. Kaminski, Johannes D. (December 2022). "On human expendability: AI takeover in Clarke's Odyssey and Stross's Accelerando" (in en). Neohelicon 49 (2): 495–511. doi:10.1007/s11059-022-00670-w. ISSN 0324-4652. https://link.springer.com/10.1007/s11059-022-00670-w. 
  38. 38.0 38.1 38.2 38.3 Russell, Stuart J.; Norvig, Peter (2021). Artificial intelligence: A modern approach (4th ed.). Pearson. pp. 5, 1003. ISBN 9780134610993. https://www.pearson.com/us/higher-education/program/Russell-Artificial-Intelligence-A-Modern-Approach-4th-Edition/PGM1263338.html. Retrieved September 12, 2022. 
  39. 39.0 39.1 Ngo, Richard; Chan, Lawrence; Mindermann, Sören (2023-02-22). "The alignment problem from a deep learning perspective". arXiv:2209.00626 [cs.AI].
  40. 40.0 40.1 Pan, Alexander; Bhatia, Kush; Steinhardt, Jacob (2022-02-14). "The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models". International Conference on Learning Representations. https://openreview.net/forum?id=JYtwGwIL7ye. Retrieved 2022-07-21. 
  41. Zhuang, Simon; Hadfield-Menell, Dylan (2020). "Consequences of Misaligned AI". 33. Curran Associates, Inc.. pp. 15763–15773. https://proceedings.neurips.cc/paper/2020/hash/b607ba543ad05417b8507ee86c54fcb7-Abstract.html. Retrieved 2023-03-11. 
  42. Carlsmith, Joseph (2022-06-16). "Is Power-Seeking AI an Existential Risk?". arXiv:2206.13353 [cs.CY].
  43. 43.0 43.1 43.2 Russell, Stuart J. (2020). Human compatible: Artificial intelligence and the problem of control. Penguin Random House. ISBN 9780525558637. OCLC 1113410915. https://www.penguinrandomhouse.com/books/566677/human-compatible-by-stuart-russell/. 
  44. Christian, Brian (2020). The alignment problem: Machine learning and human values. W. W. Norton & Company. ISBN 978-0-393-86833-3. OCLC 1233266753. https://wwnorton.co.uk/books/9780393635829-the-alignment-problem. Retrieved September 12, 2022. 
  45. Langosco, Lauro Langosco Di; Koch, Jack; Sharkey, Lee D.; Pfau, Jacob; Krueger, David (2022-06-28). "Goal Misgeneralization in Deep Reinforcement Learning". International Conference on Machine Learning. PMLR. pp. 12004–12019. https://proceedings.mlr.press/v162/langosco22a.html. Retrieved 2023-03-11. 
  46. 46.0 46.1 Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; von Arx, Sydney; Bernstein, Michael S.; Bohg, Jeannette et al. (2022-07-12). "On the Opportunities and Risks of Foundation Models". Stanford CRFM. https://fsi.stanford.edu/publication/opportunities-and-risks-foundation-models. 
  47. Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, J.; Hilton, Jacob; Kelton, Fraser; Miller, Luke E.; Simens, Maddie; Askell, Amanda; Welinder, P.; Christiano, P.; Leike, J.; Lowe, Ryan J. (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155 [cs.CL].
  48. Zaremba, Wojciech; Brockman, Greg; OpenAI (2021-08-10). "OpenAI Codex". OpenAI. https://openai.com/blog/openai-codex/. Retrieved 2022-07-23. 
  49. Kober, Jens; Bagnell, J. Andrew; Peters, Jan (2013-09-01). "Reinforcement learning in robotics: A survey" (in en). The International Journal of Robotics Research 32 (11): 1238–1274. doi:10.1177/0278364913495721. ISSN 0278-3649. http://journals.sagepub.com/doi/10.1177/0278364913495721. Retrieved September 12, 2022. 
  50. Knox, W. Bradley; Allievi, Alessandro; Banzhaf, Holger; Schmitt, Felix; Stone, Peter (2023-03-01). "Reward (Mis)design for autonomous driving" (in en). Artificial Intelligence 316: 103829. doi:10.1016/j.artint.2022.103829. ISSN 0004-3702. 
  51. Stray, Jonathan (2020). "Aligning AI Optimization to Community Well-Being" (in en). International Journal of Community Well-Being 3 (4): 443–463. doi:10.1007/s42413-020-00086-3. ISSN 2524-5295. PMID 34723107. 
  52. Russell, Stuart; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach. Prentice Hall. pp. 1003. ISBN 978-0-13-461099-3. https://aima.cs.berkeley.edu/. 
  53. Smith, Craig S.. "Geoff Hinton, AI's Most Famous Researcher, Warns Of 'Existential Threat'" (in en). https://www.forbes.com/sites/craigsmith/2023/05/04/geoff-hinton-ais-most-famous-researcher-warns-of-existential-threat/. 
  54. 54.0 54.1 54.2 Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan (2016-06-21). "Concrete Problems in AI Safety". arXiv:1606.06565 [cs.AI].
  55. 55.0 55.1 Ortega, Pedro A.; Maini, Vishal; DeepMind safety team (2018-09-27). "Building safe artificial intelligence: specification, robustness, and assurance". DeepMind Safety Research – Medium. https://deepmindsafetyresearch.medium.com/building-safe-artificial-intelligence-52f5f75058f1. Retrieved 2022-07-18. 
  56. Hagendorff, Thilo (2023-07-31). "Deception Abilities Emerged in Large Language Models". arXiv:2307.16513 [cs.CL].
  57. 57.0 57.1 Rorvig, Mordechai (2022-04-14). "Researchers Gain New Understanding From Simple AI". Quanta Magazine. https://www.quantamagazine.org/researchers-glimpse-how-ai-gets-so-good-at-language-processing-20220414/. Retrieved 2022-07-18. 
  58. Doshi-Velez, Finale; Kim, Been (2017-03-02). "Towards A Rigorous Science of Interpretable Machine Learning". arXiv:1702.08608 [stat.ML].
  59. Russell, Stuart; Dewey, Daniel; Tegmark, Max (2015-12-31). "Research Priorities for Robust and Beneficial Artificial Intelligence". AI Magazine 36 (4): 105–114. doi:10.1609/aimag.v36i4.2577. ISSN 2371-9621. https://ojs.aaai.org/index.php/aimagazine/article/view/2577. Retrieved September 12, 2022. 
  60. Wirth, Christian; Akrour, Riad; Neumann, Gerhard; Fürnkranz, Johannes (2017). "A survey of preference-based reinforcement learning methods". Journal of Machine Learning Research 18 (136): 1–46. 
  61. Christiano, Paul F.; Leike, Jan; Brown, Tom B.; Martic, Miljan; Legg, Shane; Amodei, Dario (2017). "Deep reinforcement learning from human preferences". Red Hook, NY, USA: Curran Associates Inc.. pp. 4302–4310. ISBN 978-1-5108-6096-4. 
  62. Heaven, Will Douglas (2022-01-27). "The new version of GPT-3 is much better behaved (and should be less toxic)". MIT Technology Review. https://www.technologyreview.com/2022/01/27/1044398/new-gpt3-openai-chatbot-language-model-ai-toxic-misinformation/. Retrieved 2022-07-18. 
  63. Mohseni, Sina; Wang, Haotao; Yu, Zhiding; Xiao, Chaowei; Wang, Zhangyang; Yadawa, Jay (2022-03-07). "Taxonomy of Machine Learning Safety: A Survey and Primer". arXiv:2106.04823 [cs.LG].
  64. Clifton, Jesse (2020). "Cooperation, Conflict, and Transformative Artificial Intelligence: A Research Agenda". Center on Long-Term Risk. https://longtermrisk.org/research-agenda/. Retrieved 2022-07-18. 
  65. Prunkl, Carina; Whittlestone, Jess (2020-02-07). "Beyond Near- and Long-Term" (in en). Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. New York NY USA: ACM. pp. 138–143. doi:10.1145/3375627.3375803. ISBN 978-1-4503-7110-0. https://dl.acm.org/doi/10.1145/3375627.3375803. Retrieved September 12, 2022. 
  66. Irving, Geoffrey; Askell, Amanda (2019-02-19). "AI Safety Needs Social Scientists". Distill 4 (2): 10.23915/distill.00014. doi:10.23915/distill.00014. ISSN 2476-0757. https://distill.pub/2019/safety-needs-social-scientists. Retrieved September 12, 2022. 

External links