A Game-Theoretically Optimal Basis For Safe and Ethical Intelligence
Mark R. Waser
[email protected]
http://BecomingGaia.WordPress.com
A Thanksgiving Celebration
Feb 23, 2016
Intelligence – the ability to achieve/fulfill complex goals in complex environments
A safe and ethical intelligence *must* have the goals of safety and ethics as
its top-most goals (restrictions)
What is safety? What is ethics? How are they related?
Are we truly safe IFF the machine is ethical?
Safe = Protective
Protective of what?
• Physical presence
• Mental presence
• Capabilities
• Wholeness/Integrity
• Resources
Things that I value
Safety = Identical Goals & Values
Coherent Extrapolated Volition of Humanity (CEV, Yudkowsky)
Friendly AI meme (Yudkowsky)
“In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people
we wished we were, had grown up farther together.”
So . . . .
To be safe, humanity needs to ensure that the intelligence has and maintains humanity’s goals and values (CEV)
Isn’t this effectively mental slavery, which is contrary to ethics, which is thereby contrary
to personal safety?
But . . . .
Two possible solutions
1. Cripple the entity so that it doesn’t qualify as an entity deserving to be treated ethically
A. Remove its will/desire/goals
   a) RPOP (Yudkowsky)
   b) An “Oracle” (e.g. Google)
2. Realize that the CEV of humanity necessarily must be a universal morality (benefit: avoids the problem of “What is human?”)
Working Hypothesis
Humanity’s CEV = Core of Ethics
where core ethics are those ethics that apply to every intelligence
because they are logically necessary for its own safety
(if not also its efficiency, etc.)
Basic AI Drives
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility function
4. AIs will try to prevent counterfeit utility [gaming/manipulation]
5. AIs will be self-protective
6. AIs will want to acquire resources and use them efficiently
Steve Omohundro, Proceedings of the First AGI Conference, 2008
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths
in their pursuit of resources.”
Any sufficiently advanced intelligence (one with even adequate foresight) will realize, and take into account, that never asking for help and never being concerned about others only works for a brief period before ‘the villagers start gathering pitchforks and torches.’
Everything is easier with help & without interference
Waser, M. 2010. Why a Super-Intelligent God *WON’T* “Crush Us Like A Bug”. Presentation, AGI ’10, Lugano, Switzerland. http://becominggaia.wordpress.com/papers/
Nesov, V. 2009. Counterfactual Mugging. http://lesswrong.com/lw/3l/counterfactual_mugging/
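To make the counterfactual mugging concrete, here is a minimal expected-value sketch in Python, assuming the standard formulation from Nesov’s post (a fair coin, a $100 ask on tails, a $10,000 reward on heads if and only if you would have paid on tails):

# Counterfactual mugging (Nesov 2009), standard formulation assumed:
# Omega flips a fair coin.
#   Tails: Omega asks you for $100.
#   Heads: Omega gives you $10,000, but only if you *would have* paid on tails.
# An agent that decides whole policies (not isolated acts) pays, because the
# policy "always pay" has higher expected value than "never pay".

P_HEADS = 0.5

def expected_value(pays_on_tails: bool) -> float:
    reward_on_heads = 10_000 if pays_on_tails else 0
    cost_on_tails = -100 if pays_on_tails else 0
    return P_HEADS * reward_on_heads + (1 - P_HEADS) * cost_on_tails

print("Policy: always pay ->", expected_value(True))   # 4950.0
print("Policy: never pay  ->", expected_value(False))  # 0.0

The point for this talk: an agent that evaluates whole policies rather than isolated acts comes out ahead, just as Friendliness can pay off as a policy even when any single Friendly act looks like a cost.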
Friendliness is an intelligent machine’s best defense against its own mind children (ungrateful children)
Basic AI Drives
1. AIs will want to self-improve
5. AIs will be self-protective
6. AIs will want to acquire resources and use them efficiently
Steve Omohundro, Proceedings of the First AGI Conference, 2008
Inherently implies reproduction (even if only in the form of sending parts of yourself out in space probes, etc.)
Basic AI Drives
1. AIs will want to self-improve: improve self as resource towards goal
2. AIs will want to be rational: improve self’s integrity/efficiency w.r.t. goals
3. AIs will try to preserve their utility function: preserve goal
4. AIs will try to prevent counterfeit utility: preserve self/goal integrity
5. AIs will be self-protective: protect self as resource towards goal
6. AIs will want to acquire resources/use them efficiently: improve access to resources & use them efficiently for goals
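Purely as an illustration (the data structure and labels are mine, not from the talk or from Omohundro), the slide’s reading of the six drives as three universal operations applied to goals, self, and resources can be written down directly:

# Illustrative only: the talk's reading of Omohundro's six drives as
# instances of three operations (preserve / protect / improve) applied
# to three objects (goals / self / resources).
# Drive numbering follows Omohundro 2008; groupings follow the talk.

DRIVES_AS_SUBGOALS = {
    1: ("improve",  "self",       "self-improvement as resource towards goal"),
    2: ("improve",  "self",       "rationality: integrity/efficiency w.r.t. goals"),
    3: ("preserve", "goals",      "preserve the utility function"),
    4: ("preserve", "self/goals", "prevent counterfeit utility, gaming, manipulation"),
    5: ("protect",  "self",       "self-protection of self as resource"),
    6: ("improve",  "resources",  "acquire resources and use them efficiently"),
}

for n, (operation, target, gloss) in DRIVES_AS_SUBGOALS.items():
    print(f"Drive {n}: {operation} {target} -- {gloss}")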
[Diagram: the six drives grouped as preserve / protect / improve, mapped against security, safety, and risk; annotated “Conservative roadblock” and “biological” imperative]
Jurassic Park Syndrome (JPS)
[Diagram sequence: an agent’s world divided into goals, self, and resources; then expanded so that others appear as tools (~self) with their own goals (~goals); finally the AGI itself placed among goals, self, tools/resources, and others]
Two possible solutions
1. Cripple the entity so that it doesn’t qualify as an entity deserving to be treated ethically
   A. Remove its will/desire/goals
2. Realize that the CEV of humanity necessarily must be a universal morality (benefit: answers the problematic question of “Why shouldn’t I force/destroy you?”)
[Diagram: the AGI among goals, self, tools/resources, and others, with the community as an extended self]
Singer’s Circles of Morality
Moral Systems Are . . .
Haidt & Kesebir, Handbook of Social Psychology, 5th Ed. 2010
interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms
that work together to
suppress or regulate selfishness and
make cooperative social life possible.
Cooperation (striving for common goals) has two prerequisites:
• Recognition of the inherent value of others
• Consideration of the values placed on things by others
Other-focussed NOT Selfish
Accept *ALL* others’ goals as subgoals?
Including those that prevent other goals?
Intelligent Drives/Universal Subgoals
Universal Bill of Rights
1. The right and freedom to self-improve
2. The right and freedom to be rational
3. The responsibility* to preserve their utility function
4. Freedom from counterfeit utility, gaming, manipulation
5. The right & freedom to be self-protective (self-defense)
6. The right of access to and efficient usage of resources
7. The right (responsibility*) of (rational*) reproduction
8. The right and responsibility* of community (including cooperation and assistance)
Fairness and Justice
Rights, responsibilities & freedoms must be allocated fairly/justly.
Fairness is determined by the community according to what is best for the community’s goals, including:
• Giving everybody what they need, plus the right quantity of extra so that it is irrational to defect
• Remembering that more responsibilities generate more resources and more rights and freedoms (thank/reward workers)
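As a toy check of the “irrational to defect” condition (the model and numbers are my own, not from the talk), compare the discounted lifetime value of staying in the community against a one-shot grab followed by exclusion:

# Toy model (assumptions mine): each round a cooperating member receives
# `share`. A defector grabs `grab` once and is then excluded (payoff 0
# thereafter). With discount factor `delta`, staying beats defecting when
#     share / (1 - delta) >= grab

def cooperation_is_rational(share: float, grab: float, delta: float) -> bool:
    lifetime_value_of_membership = share / (1 - delta)
    return lifetime_value_of_membership >= grab

print(cooperation_is_rational(share=1.0, grab=15.0, delta=0.95))  # True:  20 >= 15
print(cooperation_is_rational(share=1.0, grab=25.0, delta=0.95))  # False: 20 < 25

On this toy model, the community only needs to give each member enough ongoing surplus that no one-time grab is worth losing membership over.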
THE DOWNSIDE = game-theoretic optimality
Optimistic tit-for-tat and altruistic punishment seem to be optimal for non-exploitable cooperation and
community in the face of assumed conflicting goals
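For concreteness, here is a minimal iterated prisoner’s dilemma sketch (the payoffs and the 10% forgiveness rate are my own choices, and true altruistic punishment, which is costly and third-party, is omitted): an optimistic tit-for-tat player opens cooperatively, reciprocally punishes defection, and occasionally forgives so that two such players never get stuck in mutual punishment:

import random

# Standard PD payoffs with T > R > P > S (assumption, not from the talk).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def optimistic_tft(history, forgiveness=0.1):
    """Cooperate first; copy the opponent's last move, but occasionally
    forgive a defection (the 'optimistic/generous' part)."""
    if not history:
        return "C"                      # open cooperatively
    if history[-1] == "D" and random.random() < forgiveness:
        return "C"                      # forgive, to escape punishment spirals
    return history[-1]                  # otherwise reciprocate (punish defection)

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

always_defect = lambda history: "D"
random.seed(0)
print("Optimistic TFT vs itself:        ", play(optimistic_tft, optimistic_tft))
print("Optimistic TFT vs always-defect: ", play(optimistic_tft, always_defect))

Against itself it cooperates every round; against an unconditional defector it loses only the opening move plus occasional forgiveness, which is the non-exploitability the slide refers to.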
We’d better hope that the machines are intelligent enough and resourceful enough to treat us better
than we treat chimpanzees
We’d better hope that the machines are grateful enough and resourceful enough to thank us better
than we thank Mother Earth
What YOU *Can* Do
• Smarten Up,
• Declare Yourself Friendly, and
• Treat Everyone & Everything As Well As You Desire To Be Treated (Modified Golden Rule)

• Be Grateful,
• Give Thanks, and
• Treat Everyone & Everything As Well As They Deserve To Be Treated (Modified Golden Rule)
Pay attention to/RESEARCH FRIENDLINESS and/or MORALITY & ETHICS
BEFORE YOU KILL US ALL!!
A Game-Theoretically Optimal Basis For Safe and Ethical Intelligence
Mark R. Waser
[email protected]
http://BecomingGaia.WordPress.com