Artificial Intelligence Ethics and Safety is an emerging field of research. For many years, concerns related to artificial intelligence systems were dismissed as mere speculation. However, since 2016, with the publication of the seminal paper "Concrete Problems in AI Safety," the goals and obstacles of the field have become better formalized and more widely accepted.
In 2019, the study "Risks from Learned Optimization in Advanced Machine Learning Systems" further formalized the alignment problem as a machine learning problem. We can also cite Stuart Russell's "Human Compatible: Artificial Intelligence and the Problem of Control," which describes the alignment problem as one of the most important open problems in AI research. After all, if our goal is to develop "beneficial AI," and not just "arbitrary AI," we need to look for ways to solve the alignment problem.
The alignment problem can be defined as follows:
How to bridge the gap between the intentions, goals, and preferences of a human controller and the objective function, and the resulting optimized model, of a system created by machine learning.
Specifying goals and objectives is a central problem in machine learning. What has allowed machine learning to become the current paradigm in intelligent systems research and development is that, with this methodology, we can train models to solve tasks that are too complex to be entirely "hand-coded" (e.g., computer vision).
However, this methodology suffers from a major obstacle. Besides the lack of formal guarantees on how a model will behave after training, aligning the controller's intentions (e.g., classifying faces) with what a model is actually optimizing (e.g., discriminating skin colors) is no simple feat. There are several safety issues in machine learning that need to be addressed, especially if we want to deploy machine-learning-trained systems that interact with complex environments (i.e., the real world).
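To make this gap concrete, here is a minimal, self-contained sketch (our own illustration, not taken from the works cited above): a classifier is trained on data where a spurious feature happens to track the labels, so it learns the shortcut rather than the intended signal, and its performance collapses once that correlation breaks outside the training distribution.

```python
# Illustrative sketch of objective misspecification: the controller intends the
# model to use the "intended" feature, but a spurious feature that is perfectly
# correlated with the label during training ends up driving the predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Training data: the intended feature is weakly informative,
# while the spurious feature tracks the label almost perfectly.
y_train = rng.integers(0, 2, n)
intended = y_train + rng.normal(0, 2.0, n)   # noisy signal the controller cares about
spurious = y_train + rng.normal(0, 0.1, n)   # shortcut available only in training
X_train = np.column_stack([intended, spurious])

model = LogisticRegression().fit(X_train, y_train)

# Deployment data: the shortcut no longer correlates with the label.
y_test = rng.integers(0, 2, n)
intended_t = y_test + rng.normal(0, 2.0, n)
spurious_t = rng.normal(0.5, 0.1, n)         # correlation is broken
X_test = np.column_stack([intended_t, spurious_t])

print("train accuracy:", model.score(X_train, y_train))  # close to 1.0
print("test accuracy:", model.score(X_test, y_test))     # much lower
print("learned weights (intended, spurious):", model.coef_)
```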
As we investigate this problem, we realize that it intersects with several (still open) questions from different areas of knowledge:
How can human preferences be robustly modeled in a computational language?
How can we avoid unwanted behavior in domains outside the training distribution?
Could human preferences be inferred from observations alone?
How can we schematize a form of moral reasoning?
What metaphysical and metaethical assumptions should we make to deal with such a problem in an AI-native language?
Answers to such questions can help clarify deep issues about the nature of human normativity and morality. As we teach our creations what "ought to be done," we learn more about how we ourselves should act.
Perhaps the main application of this research (in the short and medium term) is to define how to develop better models: models that relate to humans more ergonomically and that better represent our intended preferences, whether by "understanding" what was requested or by acting in ways that better serve us.
In this research, we seek to formalize a methodology by which an AI can come to: (1) learn the preferences of its controller; (2) aggregate such preferences into a coherent state (i.e., reflective equilibrium); and (3) act so as to minimize the impact it causes on its environment and on other agents.
All steps of this methodology are based on relatively new and experimental machine learning techniques (minimal sketches of each step follow the list):
Preference learning is formalized as an inverse reinforcement learning problem;
Preference aggregation is performed by metanormative algorithms;
Impact minimization is achieved through the Attainable Utility Preservation (AUP) strategy.
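For step (1), the sketch below is a deliberately simplified version of inverse reinforcement learning (our own one-step simplification, not the exact formulation used in the thesis): a controller chooses among actions with Boltzmann-rational probability under an unknown reward, and we recover the reward weights by maximizing the likelihood of the observed choices.

```python
# Preference learning sketch: recover hidden reward weights from demonstrations.
import numpy as np

rng = np.random.default_rng(1)
n_actions, n_features, n_demos = 5, 3, 2000

features = rng.normal(size=(n_actions, n_features))  # feature vector per action
true_w = np.array([1.5, -0.5, 0.8])                  # controller's hidden reward weights

def policy(w):
    """Boltzmann-rational action distribution under reward weights w."""
    logits = features @ w
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Demonstrations: actions sampled from the controller's policy.
demos = rng.choice(n_actions, size=n_demos, p=policy(true_w))

# Gradient ascent on the log-likelihood of the demonstrations
# (feature matching, as in maximum-entropy style IRL).
w_hat = np.zeros(n_features)
lr = 0.1
emp_feat = features[demos].mean(axis=0)              # empirical feature expectation
for _ in range(500):
    expected_feat = policy(w_hat) @ features         # model's feature expectation
    w_hat += lr * (emp_feat - expected_feat)

print("true weights:     ", true_w)
print("recovered weights:", np.round(w_hat, 2))
```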
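For step (2), the sketch below illustrates one simple metanormative rule in the spirit of maximizing expected choiceworthiness: each candidate normative theory scores the available actions, the agent holds a credence over theories, and the chosen action maximizes the credence-weighted score. The theory labels and numbers are purely hypothetical placeholders.

```python
# Preference aggregation sketch: credence-weighted choiceworthiness across theories.
import numpy as np

actions = ["tell the truth", "stay silent", "lie"]

# Rows: theories; columns: how choiceworthy each theory finds each action
# (normalized, hypothetical values).
choiceworthiness = np.array([
    [1.0, 0.4, 0.0],   # deontological-style theory
    [0.7, 0.5, 0.6],   # consequentialist-style theory
    [0.9, 0.8, 0.1],   # virtue-style theory
])

credences = np.array([0.5, 0.3, 0.2])  # agent's uncertainty over the theories

expected = credences @ choiceworthiness
best = actions[int(np.argmax(expected))]

print(dict(zip(actions, np.round(expected, 2))))
print("chosen action:", best)
```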
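For step (3), the sketch below shows the shape of the AUP penalty: the agent's shaped reward is the primary reward minus a scaled penalty for how much an action changes its ability to attain a set of auxiliary goals, relative to a no-op baseline. The Q-values here are placeholders; in practice they would come from learned auxiliary value functions.

```python
# Impact minimization sketch: Attainable Utility Preservation (AUP) style penalty.
import numpy as np

def aup_reward(primary_reward, q_aux_action, q_aux_noop, lam=0.1):
    """Primary reward minus a scaled penalty on changes in attainable utility.

    q_aux_action[i]: attainable utility of auxiliary goal i after the action.
    q_aux_noop[i]:   attainable utility of auxiliary goal i after doing nothing.
    """
    q_aux_action = np.asarray(q_aux_action, dtype=float)
    q_aux_noop = np.asarray(q_aux_noop, dtype=float)
    penalty = np.abs(q_aux_action - q_aux_noop).sum()
    scale = np.abs(q_aux_noop).sum() or 1.0   # normalize by the baseline attainable utility
    return primary_reward - lam * penalty / scale

# Hypothetical values: the action earns primary reward but sharply reduces the
# agent's ability to pursue two of its three auxiliary goals (an "impactful" action).
print(aup_reward(1.0, q_aux_action=[0.2, 0.1, 0.9], q_aux_noop=[0.8, 0.7, 0.9]))
```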
All of these strategies are justified through an interdisciplinary approach, drawing on areas such as Moral Philosophy, Machine Learning, Decision Theory, Cognitive Science, and Economics. The result is a theoretical framework that we call "Dynamic Normativity."
Dynamic Normativity can be characterized as a theoretical framework for investigating issues involving the development of moral AI, i.e., aligned AI. As a moral theory for artificial agents, Dynamic Normativity tells us what a "normative engine" should look like. We can also say that Dynamic Normativity is a theory about the relationships between humans and AI.
This research is related to the ongoing Ph.D. thesis of Nicholas Kluge. For more information, contact us.
For more information on AI Safety, these are good places to get informed and involved: