CSCI 1300 Artificial Intelligence Lecture Mike Mozer December 4, 2003

CSCI 1300

Artificial Intelligence Lecture

Mike Mozer

December 4, 2003

Computer Science

Operating Systems

Programming Languages

Networking

Security

Theory

Artificial Intelligence

Artificial Intelligence

Natural Language Understanding

Speech Recognition

Computer Vision

Robotics

Reasoning

Planning

Machine Learning

Machine Learning

Supervised Learning
• spam filters (hotmail.com)
• ALVINN (autonomous vehicle navigation)

Unsupervised Learning
• collaborative filtering (amazon.com)
• fault monitoring

Reinforcement Learning
• TD-Gammon (champion backgammon playing program)
• elevator controller
• adaptive home lighting/heating control

Reinforcement Learning: A Simple Example

Suppose you are in one of two states
• hungry
• sleepy

Suppose you can take one of two actions
• go to Turley’s
• lie on bed

Reward contingencies
• hungry -> go to Turley’s: reward
• hungry -> lie on bed: no reward
• sleepy -> go to Turley’s: no reward
• sleepy -> lie on bed: reward

Reward depends on what action you take in a given state.

Reinforcement Learning: A Simple Example

How do you learn to take the correct action?

Trial and error!

Through experience, system can learn to predict the reward that will be obtained for some action given the current state:

reward(action | state)

This is also notated as “Q(state, action)”

Given the expected reward, agent can choose best action:
if Q(hungry, Turley’s) > Q(hungry, lie on bed) then go to Turley’s
else lie on bed
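The trial-and-error idea on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the lecture: rewards are coded as 1.0 and 0.0, states and actions are sampled uniformly at random, and the learning rate of 0.5 and the names `learn_q` and `best_action` are made up for the example.

```python
import random

STATES = ["hungry", "sleepy"]
ACTIONS = ["go to Turley's", "lie on bed"]

# Reward contingencies from the slide, coded as 1.0 (reward) or 0.0 (no reward).
REWARD = {
    ("hungry", "go to Turley's"): 1.0,
    ("hungry", "lie on bed"): 0.0,
    ("sleepy", "go to Turley's"): 0.0,
    ("sleepy", "lie on bed"): 1.0,
}


def learn_q(trials=200, learning_rate=0.5, seed=0):
    """Estimate Q(state, action) by trying random actions in random states."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(trials):
        state = rng.choice(STATES)
        action = rng.choice(ACTIONS)  # pure exploration: trial and error
        reward = REWARD[(state, action)]
        # Move the estimate part of the way toward the observed reward.
        q[(state, action)] += learning_rate * (reward - q[(state, action)])
    return q


def best_action(q, state):
    """Choose the action with the highest predicted reward in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = learn_q()
print(best_action(q, "hungry"))
print(best_action(q, "sleepy"))
```

Because the reward here is immediate and deterministic, a plain running average is enough; the delayed-reward case needs the fuller update rule discussed later in the lecture.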

Reinforcement Learning in the Real World

Issues
• Delayed reinforcement (e.g., car accident due to worn tires)
• Occasional reinforcement (e.g., chess playing)
• Short term versus long term rewards (e.g., skipping class)
• Exploration versus exploitation (e.g., trying new restaurants)
• Partially observable state (e.g., viral infection)
• Multiple agents (e.g., multiple elevators)

[Figure: timeline of time intervals 1 to 7, showing for each interval the state (s1 ... s7), the action (a1 ... a7), and the instantaneous reinforcement (r1 ... r7)]

Elevator Control

Elevator Control

Q learning (Watkins, 1989; Watkins & Dayan, 1992)

Q(x,u): If action u is taken in state x, what is the minimum cost we can expect to obtain?

Policy based on Q values:

\[
\pi(x_t) =
\begin{cases}
\arg\min_{u} Q(x_t, u) & \text{with probability } (1 - \theta) \\
\text{random} & \text{with probability } \theta
\end{cases}
\]

where θ is the exploration rate.

Incremental update rule for Q values:

\[
Q(x_t, u_t) \leftarrow (1 - \alpha)\, Q(x_t, u_t) + \alpha \min_{u} \left[ c_t + \lambda\, Q(x_{t+1}, u) \right]
\]

where α is the learning rate, λ is the discount factor, and c_t is the cost incurred in interval t.

Given fully observable state, infinite exploration, etc., guaranteed to converge on optimal policy.
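A minimal tabular sketch of the policy and update rule may help make the symbols concrete. It assumes Q is a cost-to-go, so both the policy and the backup take a minimum over actions. The toy "light in a room" environment, its cost values, and the function names `choose_action`, `q_update`, and `step` are hypothetical and not from the lecture.

```python
import random
from collections import defaultdict


def choose_action(q, state, actions, theta, rng):
    """Exploration policy: random with probability theta, else lowest-cost action."""
    if rng.random() < theta:
        return rng.choice(actions)
    return min(actions, key=lambda u: q[(state, u)])


def q_update(q, x, u, cost, x_next, actions, alpha, lam):
    """Blend the old Q(x, u) with cost plus the discounted best next Q."""
    target = cost + lam * min(q[(x_next, u2)] for u2 in actions)
    q[(x, u)] = (1 - alpha) * q[(x, u)] + alpha * target


def step(state, action, rng):
    """Hypothetical environment: energy cost for 'on', discomfort if dark while occupied."""
    energy = 1.0 if action == "on" else 0.0
    discomfort = 5.0 if (state == "occupied" and action == "off") else 0.0
    next_state = rng.choice(["occupied", "empty"])
    return energy + discomfort, next_state


rng = random.Random(0)
actions = ["on", "off"]
q = defaultdict(float)  # Q values start at zero
state = "empty"
for _ in range(5000):
    action = choose_action(q, state, actions, theta=0.1, rng=rng)
    cost, next_state = step(state, action, rng)
    q_update(q, state, action, cost, next_state, actions, alpha=0.1, lam=0.9)
    state = next_state

# Greedy policy read off the learned table.
policy = {s: min(actions, key=lambda u: q[(s, u)]) for s in ["occupied", "empty"]}
print(policy)
```

With these made-up costs the learned policy should turn the light on when the room is occupied and off when it is empty, since one unit of energy is cheaper than five units of discomfort.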

The Adaptive House

Michael Mozer+*
Marc Anderson*
Josh Anderson✩
Dan Bertini#
Matt Bronder*
Michael Colagrosso*
Robert Cruickshank#
Brian Daugherty*
Robert Dodier#
Mark Fontenot
Okechukwu Ikeako✩
Paul Kooros✩
Diane Lukianow✩
Debra Miller*
Tom Moyer
Charles Myers✩
Tom Pennell*
James Ries✩
Erik Skorpen✩
Joel Sloss✩
Lucky Vidmar*
Matthew Weeks✩

University of Colorado
* Department of Computer Science
+ Institute of Cognitive Science
# Department of Civil, Environmental, and Architectural Engineering
✩ Department of Electrical and Computer Engineering
Department of Mechanical Engineering
Department of Aerospace Engineering

http://www.cs.colorado.edu/~mozer/adaptive-house

The adaptive house

Not a programmable house, but a house that programs itself.

House adapts to the lifestyle of the inhabitants.
• House monitors environmental state and senses actions of inhabitant.
• House learns inhabitants’ schedules, preferences, and occupancy patterns.
• House uses this information to achieve two objectives:
  (1) anticipate inhabitant needs
  (2) conserve energy

Domain: home comfort systems

• air heating

• lighting

• water heating

• ventilation

The adaptive house

Residence in Marshall, Colorado, outside of Boulder

Some of the gang

Great room

Bedrooms and bathrooms

Sensors

Sensors

Water heater

Furnace

Controls

Computers

Training signals

Actions performed by inhabitant specify setpoints ➜ anticipation of inhabitant desires

Gas and electricity costs ➜ energy conservation

A reinforcement learning framework

Each constraint has an associated cost:
• discomfort cost if inhabitant preferences are neglected
• energy cost depends on device and intensity setting

The optimal control policy minimizes

\[
J(t_0) = E\left[ \lim_{\kappa \to \infty} \frac{1}{\kappa} \sum_{t = t_0 + 1}^{t_0 + \kappa} \left( d(x_t) + e(u_t) \right) \right]
\]

where
t = index over nonoverlapping time intervals
t_0 = current time interval
u_t = control decision for interval t
x_t = environmental state during interval t
d(x_t) = discomfort cost, e(u_t) = energy cost
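As a rough illustration of the quantity being minimized, the infinite-horizon average can be approximated over a finite window of κ intervals. The sketch below assumes the per-interval discomfort and energy costs have already been measured. The function name `average_cost` and the sample numbers are hypothetical.

```python
def average_cost(discomfort, energy):
    """Finite-window estimate of J: mean of d(x_t) + e(u_t) over kappa intervals."""
    if len(discomfort) != len(energy) or not discomfort:
        raise ValueError("need two equal-length, non-empty cost sequences")
    kappa = len(discomfort)
    return sum(d + e for d, e in zip(discomfort, energy)) / kappa


# Hypothetical per-interval costs (in cents) for four consecutive intervals.
d = [0.0, 1.0, 0.0, 0.0]
e = [0.5, 0.5, 0.25, 0.25]
print(average_cost(d, e))
```

A controller that lowers energy use but triggers more manual adjustments can raise this average, which is why the two costs are summed in a common currency.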

ACHE (Adaptive Control of Home Environments)

Separate control system for each task

air temperature regulation
• furnace
• space heaters
• fans
• dampers
• blinds

lighting regulation
• wall sconces
• overhead lights

water temperature regulation
• hot water heater

[Figure: block diagram of ACHE and the device it controls; labels: environmental state, inhabitant actions, setpoints and energy costs]

General architecture of ACHE

[Figure: block diagram of the ACHE architecture; components: occupancy model, state transformation, predictors, setpoint generator, device regulator; signals: instantaneous environmental state, occupied zones, state representation, future state information, setpoint profile, decision]

Lighting control

What makes lighting control a challenge?
• Twenty-two banks of lights, each with 16 intensity levels; seven banks of lights in great room alone
• Motion-triggered lighting does not work
• Lighting moods
• Two constraints must be satisfied simultaneously
  • maintaining lighting according to inhabitant preferences
  • conserving energy
• Range of time scales involved
• Sluggishness of system

Resolving the sluggishness dilemma

Anticipator: Neural network that predicts which zone(s) will become occupied in the next two seconds

Input
• 1, 3, and 6 second average of motion signals (36)
• instantaneous and 2 second average of door status (20)
• instantaneous, 1 second, and 3 second average of sound level (33)
• current zone occupancy status and durations (16)
• time of day (2)

Output
• p(zone i becomes occupied in next 2 seconds | currently unoccupied) (8)

Runs every 250 ms
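The windowed-average inputs listed above can be sketched as a small rolling buffer. This is an illustration only: it assumes one sample per 250 ms tick, and the class name `WindowedAverages` and the sample stream are made up. How the real system filtered its sensor signals is not stated in the slides.

```python
from collections import deque

SAMPLE_PERIOD_MS = 250  # assumption: one sample per 250 ms anticipator tick


class WindowedAverages:
    """Keep recent samples of one signal and report averages over several windows."""

    def __init__(self, window_seconds):
        # Convert each window length in seconds to a number of samples.
        self.window_samples = [int(w * 1000 / SAMPLE_PERIOD_MS) for w in window_seconds]
        self.history = deque(maxlen=max(self.window_samples))

    def add(self, value):
        self.history.append(float(value))

    def features(self):
        """Return one average per window, shortest window first."""
        recent = list(self.history)
        out = []
        for n in self.window_samples:
            window = recent[-n:]
            out.append(sum(window) / len(window) if window else 0.0)
        return out


# One motion detector: quiet for 5 seconds, then active for 1 second.
motion = WindowedAverages([1, 3, 6])
for sample in [0] * 20 + [1] * 4:
    motion.add(sample)
print(motion.features())
```

The short window reacts at once to the burst of motion while the longer windows change slowly, so the three averages together give the network a crude picture of how recent the activity is.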

Training anticipator

Occupancy model provides training signal

Two types of errors
• miss
• false alarm

Training procedure
• Given partially trained net, collect misses and false alarms.
• Retrain net when 200 additional examples collected.
• TD algorithm for misses

[Figure: sequence of states state(t – 2000 ms), state(t – 1750 ms), ..., state(t – 250 ms), state(t), with labels "zone i becomes occupied" and "zone i vacant"]

[Figure: hit/(miss+fa) versus number of training examples (0 to 60000)]
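The collection step of the training procedure can be sketched as a buffer that stores misses and false alarms and calls a retraining routine after every 200 new examples. This covers only the bookkeeping, not the TD algorithm itself. The class name `ErrorBuffer`, the `retrain` callback, and the choice to retrain on all stored examples are assumptions for illustration.

```python
RETRAIN_THRESHOLD = 200  # from the slide: retrain after 200 additional examples


class ErrorBuffer:
    """Collect anticipator errors and trigger retraining in batches."""

    def __init__(self, retrain, threshold=RETRAIN_THRESHOLD):
        self.retrain = retrain    # hypothetical callback that retrains the net
        self.threshold = threshold
        self.examples = []        # every error example collected so far
        self.pending = 0          # examples added since the last retraining

    def record(self, state, predicted_occupied, became_occupied):
        """Store the example if the prediction was wrong; return the error type."""
        if predicted_occupied == became_occupied:
            return None           # correct prediction, nothing to collect
        kind = "miss" if became_occupied else "false alarm"
        self.examples.append((state, float(became_occupied), kind))
        self.pending += 1
        if self.pending >= self.threshold:
            self.retrain(self.examples)
            self.pending = 0
        return kind


calls = []
buf = ErrorBuffer(retrain=lambda examples: calls.append(len(examples)))
for i in range(450):
    buf.record(state=i, predicted_occupied=False, became_occupied=True)
print(calls)
```

Here the occupancy model's verdict plays the role of `became_occupied`, matching the slide's statement that the occupancy model provides the training signal.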

Examples of anticipator performance

Lighting controller costs

Energy cost
• 7.2 cents per kW-hr

Discomfort cost
• 1 cent per device whose level is manually adjusted

Anticipator miss cost
• 0.1 cent per device that was off and should have been on

Anticipator false alarm cost
• 0.1 cent per device that was turned on
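To show how the four cost terms combine into one number, here is a small sketch that adds them up for a single period. The rates are the ones on the slide, reading ".1 cent" as 0.1 cent. The function name `lighting_cost` and the sample counts are hypothetical.

```python
ENERGY_CENTS_PER_KWH = 7.2
DISCOMFORT_CENTS_PER_DEVICE = 1.0
MISS_CENTS_PER_DEVICE = 0.1
FALSE_ALARM_CENTS_PER_DEVICE = 0.1


def lighting_cost(kwh, manual_adjustments, misses, false_alarms):
    """Total cost in cents: energy used plus per-device penalty terms."""
    return (ENERGY_CENTS_PER_KWH * kwh
            + DISCOMFORT_CENTS_PER_DEVICE * manual_adjustments
            + MISS_CENTS_PER_DEVICE * misses
            + FALSE_ALARM_CENTS_PER_DEVICE * false_alarms)


# Hypothetical period: 0.5 kWh used, 2 manual adjustments, 1 miss, 3 false alarms.
total = lighting_cost(kwh=0.5, manual_adjustments=2, misses=1, false_alarms=3)
print(round(total, 2))
```

Expressing discomfort and anticipator errors in cents puts them on the same scale as the energy bill, so a single controller can trade one against the other.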

Results

• about three months of data collection
• events logged only from 19:00 – 06:59

[Figure: cost (cents), 0 to 0.4, versus # events, 0 to 8000; curves: discomfort, energy]

Air temperature control

[Figure: three panels for Sunday March 6, 2000, each plotted against time of day (0 to 20): furnace (off/on), away/home, and p(state change) (0 to 1)]

Comparison of control policies using artificial occupancy data

[Figure: two panels, Productivity Loss = 1.0 hr. and Productivity Loss = 3.0 hr., each plotting Mean Cost ($/day) against Variability Index (0 to 1) for four policies: constant temperature, Neurothermostat, setback thermostat, occupancy triggered]

Comparison of control policies using real occupancy data

Mean Daily Cost
                        productivity loss
                        ρ = 1      ρ = 3
Neurothermostat         $6.77      $7.05
constant temperature    $7.85      $7.85
occupancy triggered     $7.49      $8.66
setback thermostat      $8.12      $9.74
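The mean daily costs reported for real occupancy data can be turned into relative savings with a few lines of arithmetic. The numbers below are copied from the slide. The dictionary layout and the function name `savings_vs` are made up for the example.

```python
# Mean daily cost in dollars, keyed by policy and then by productivity loss rho.
MEAN_DAILY_COST = {
    "Neurothermostat":      {1: 6.77, 3: 7.05},
    "constant temperature": {1: 7.85, 3: 7.85},
    "occupancy triggered":  {1: 7.49, 3: 8.66},
    "setback thermostat":   {1: 8.12, 3: 9.74},
}


def savings_vs(policy, rho, baseline="Neurothermostat"):
    """Fraction of `policy`'s mean daily cost saved by switching to `baseline`."""
    cost = MEAN_DAILY_COST[policy][rho]
    return (cost - MEAN_DAILY_COST[baseline][rho]) / cost


for rho in (1, 3):
    for policy in MEAN_DAILY_COST:
        if policy != "Neurothermostat":
            print(f"rho={rho} {policy}: {savings_vs(policy, rho):.1%}")
```

In both columns the Neurothermostat has the lowest mean daily cost, and its margin over the occupancy-triggered and setback policies is larger at ρ = 3 than at ρ = 1.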