Video title: Ambiguity and confirmation bias in reward learning
Video URL: https://www.youtube.com/watch?v=Fagr6lo7kNs
Ann is somewhat pessimistic by nature.
One day she goes to a party without expecting much. She doesn't think things will go well.
She tells a joke to lighten the mood, and people laugh loudly.
But Ann feels that they are _laughing at her_.
She ends up hurt and resolves never to go to a party again.
Now meet Bob.
Bob is a born optimist, so he thinks things will generally go well.
He goes to a party and tells a joke, and everyone laughs loudly.
Bob thinks, "Wow, my joke landed! I'm pretty cool!" and feels great.
He decides to go to the next party too.
In other words, _Ann_ and _Bob_ had exactly the same experience yet interpreted it in completely different ways.
This matters for several reasons.
It affects which _risks they will take_ in the future, that is, whether they will try again.
And because their initial expectations differed, they took the same facts differently.
This phenomenon is **confirmation bias**.
If you search for "confirmation bias" online, most of the figures you find look like this:
what we _see_ is the overlap between the "objective facts" and our "existing beliefs".
---
Confirmation bias usually gets a bad rap.
It is regarded as the best-known and most widely recognized error in human reasoning,
and it is blamed for **closed-mindedness** in important fields such as science, medicine, politics, and law.
So people think of confirmation bias as a dangerous and pernicious error.
But what if this bias could be _adaptive_?
From a _statistical (Bayesian)_ perspective, it can actually be rational.
That is, when you face an uncertain or ambiguous situation,
interpreting it in light of your existing beliefs may well be the rational thing to do.
In computer science this is called an "inductive bias":
a kind of prior orientation that helps the learning process get started efficiently.
---
So we examined _whether confirmation bias can actually be adaptive_.
This question has already been addressed in several domains,
for example in information-seeking tasks, where people look for more information that confirms a hypothesis they already hold,
or in _perception_, where ambiguous stimuli are interpreted according to existing beliefs.
But there has been relatively little research on confirmation bias in the processing of **reward feedback**.
In particular, our focus was not on **comparing outcomes of chosen actions with outcomes of unchosen actions (the choice-confirming asymmetry)**,
but simply on **how the ambiguous outcome itself is appraised**.
That is, as in the case of _Ann_ and _Bob_, we studied how the same event is interpreted optimistically or pessimistically.
---
Our study has two goals.
1. If the way people interpret ambiguous rewarding outcomes reflects a confirmation bias, does that bias **serve an adaptive function**?
2. Is this bias linked to an individual's **optimistic disposition**?
This inquiry connects with several existing literatures.
Our focus is **valence ambiguity**:
situations in which the magnitude of an outcome is known, but whether it is _good or bad_ is not.
In other words, it is like the party where people laughed, but you cannot tell whether it was _mockery_ or _appreciative laughter_.
---
Relatedly, there is neuroscience research suggesting that the brain encodes the magnitude (intensity) of a reward separately from its valence (good or bad).
Clinically, how people interpret an ambiguous emotional stimulus (for example, a surprised face) as _a pleasant surprise or an unpleasant one_
has been linked to clinical traits such as anxiety and depression.
Anxious people generally show a **negative interpretation bias**:
they tend to interpret ambiguous stimuli more negatively.
---
Our study makes two contributions.
1. We designed a new **multi-armed bandit task that incorporates valence ambiguity**.
2. We built a new **Bayesian learning model** that gives a computational account of how people learn in such situations.
The model rests on the following idea.
When part of the information about an outcome (for example, its magnitude) is known but the rest (its valence) is not,
the learner uses prior beliefs to fill in that "blank" in a Bayesian way.
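One way to make this fill-in-the-blank step concrete is sketched below. It is an illustration under stated assumptions, not necessarily the exact formulation used in the study. The learner's predictive belief about an outcome $r$ is assumed to be Gaussian with mean $\mu$ and total variance $s^2$ (prior variance plus outcome noise), and an ambiguous outcome of known magnitude $m > 0$ is assumed to be either $+m$ (gold) or $-m$ (rocks). The symbols $\mu$, $s^2$, $m$, and $p$ are introduced here for illustration only.

$$
p \;=\; P\bigl(r = +m \,\big|\, |r| = m\bigr)
\;=\; \frac{\mathcal{N}(m;\, \mu, s^2)}{\mathcal{N}(m;\, \mu, s^2) + \mathcal{N}(-m;\, \mu, s^2)}
\;=\; \frac{1}{1 + \exp\!\left(-2 m \mu / s^2\right)}
$$

$$
\mathbb{E}\bigl[r \,\big|\, |r| = m\bigr] \;=\; m\,(2p - 1) \;=\; m \tanh\!\left(\frac{m \mu}{s^2}\right),
\qquad
\operatorname{Var}\bigl[r \,\big|\, |r| = m\bigr] \;=\; m^2 - m^2 (2p - 1)^2 \;=\; 4 m^2\, p\,(1 - p)
$$

Under these assumptions, a positive prior mean gives $p > 1/2$, so the imputed value is pulled toward $+m$, and a negative prior mean pulls it toward $-m$. The extra variance is largest at $p = 1/2$ and grows with $m^2$.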
---
### Experiment overview: the "gold mining task"
Participants play the role of gold miners in the Wild West.
On each trial they choose one of two gold mines to dig in and receive an outcome.
For example, if they choose the left mine and get **+10**, that means they obtained 10 units of gold.
The _counterfactual outcome_, what the other mine would have yielded, is shown as well.
For example: "If you had chosen the other mine, you would have gotten -11 points."
Over repeated trials, participants learn which mine is better on average.
Up to this point it is a standard reinforcement learning paradigm.
---
On some trials, however, an **ambiguous outcome (dirty ore)** appears.
For example, participants are told that they got 6 units of something, but not whether it is gold or rocks.
At the end of the game the true identity is settled and reflected in their payment, but it is not revealed during learning.
So participants have to infer whether the outcome was _gold or rocks_
from **their prior beliefs and the information they have so far**.
On these trials too, the counterfactual information ("What would I have gotten from the other mine?") is shown.
---
To induce participants' **prior beliefs**, each block starts with a scene in a **virtual saloon**.
There a cowboy tells them one of the following:
- "There's a lot of gold in this area!" (positive condition)
- "This area is almost nothing but rocks." (negative condition)
- "I don't know, I'm not really sure." (neutral condition)
In other words, the actual probabilities are not changed; only the **expectation (prior)** is manipulated.
The real environment is always fixed: one mine averages +10 and the other averages -10.
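The trial structure described above can be sketched as a small simulation. This is a hypothetical illustration, not the authors' task code. The mine means of +10 and -10 and the roughly 50% rate of ambiguous trials come from the talk. The noise standard deviation, the use of independent noise for the two mines, and the names `Feedback` and `run_trial` are assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Optional

# From the talk: one mine averages +10, the other averages -10.
MINE_MEANS = (10.0, -10.0)


@dataclass
class Feedback:
    magnitude: float          # always shown to the participant
    is_gold: Optional[bool]   # None when the ore is "dirty" (valence hidden)
    true_outcome: float       # hidden signed outcome, used for payment
    counterfactual: float     # outcome of the unchosen mine, always shown


def run_trial(choice: int, rng: random.Random,
              noise_sd: float = 4.0,        # assumed value, not from the talk
              p_ambiguous: float = 0.5) -> Feedback:
    """Simulate one trial for the chosen mine (0 or 1)."""
    chosen = rng.gauss(MINE_MEANS[choice], noise_sd)
    other = rng.gauss(MINE_MEANS[1 - choice], noise_sd)
    ambiguous = rng.random() < p_ambiguous
    return Feedback(
        magnitude=abs(chosen),
        is_gold=None if ambiguous else chosen > 0,
        true_outcome=chosen,
        counterfactual=other,
    )


rng = random.Random(0)
trials = [run_trial(0, rng) for _ in range(2000)]
share_ambiguous = sum(t.is_gold is None for t in trials) / len(trials)
print(f"share of dirty-ore trials: {share_ambiguous:.2f}")
```

Note that the saloon rumor does not appear anywhere in `run_trial`: as in the experiment, it would only change the learner's prior, not the reward statistics.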
---
### Modeling
The study is built on a **Bayesian reinforcement learning** model.
The underlying _Rescorla-Wagner model_ updates the reward estimate in proportion to the "reward prediction error".
In the Bayesian version, the size of that update (the learning rate) is adjusted automatically according to uncertainty.
That is, the vaguer the prior belief, the more weight is placed on the actual data.
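The two update rules can be compared side by side in a minimal sketch. The Rescorla-Wagner rule uses a fixed learning rate, while the Kalman-filter rule (the standard Bayesian version for a Gaussian prior and Gaussian noise) derives its learning rate from the relative uncertainties. The function names and the numerical values are illustrative assumptions, not parameters from the study.

```python
def rescorla_wagner(estimate: float, outcome: float, learning_rate: float) -> float:
    """Move the estimate toward the outcome by a fixed fraction of the prediction error."""
    prediction_error = outcome - estimate
    return estimate + learning_rate * prediction_error


def kalman_update(mean: float, var: float, outcome: float, noise_var: float):
    """Bayesian update of a Gaussian belief N(mean, var) after one noisy outcome.

    The gain plays the role of the learning rate: it grows when the prior
    is vague (large var) relative to the outcome noise.
    """
    gain = var / (var + noise_var)
    new_mean = mean + gain * (outcome - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var, gain


# Same outcome, two priors with different confidence (illustrative values).
vague = kalman_update(mean=0.0, var=100.0, outcome=10.0, noise_var=25.0)
confident = kalman_update(mean=0.0, var=1.0, outcome=10.0, noise_var=25.0)
print("vague prior     -> mean %.2f, gain %.2f" % (vague[0], vague[2]))
print("confident prior -> mean %.2f, gain %.2f" % (confident[0], confident[2]))
```

With a vague prior the belief moves most of the way to the observed outcome, whereas a confident prior barely moves, which is the behavior the paragraph above describes.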
In our task, however, there are **ambiguous outcomes**.
For example, when the learner receives "10 units of dirty ore"
and cannot be sure whether it counts as plus or minus,
the model imputes the outcome with its Bayesian mean (the expected reward).
That is, if the existing belief is positive, the outcome is read as lying closer to +10,
and if it is pessimistic, closer to -10.
To reflect this ambiguity, an **additional variance (uncertainty) term** is included in the calculation.
The closer the guess is to 50/50, the larger this uncertainty, and the larger the magnitude of the outcome, the larger its effect.
Mathematically, a Gaussian approximation (assumed density filtering) is used for computational convenience.
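The imputation step can be sketched as follows. This is a simplified moment-matching heuristic in the spirit of the description above, not the authors' exact assumed-density-filtering derivation. It assumes a Gaussian predictive belief and treats the ambiguous outcome as a Gaussian observation whose mean is the imputed value and whose variance is inflated by the ambiguity. The function names and all numerical values are hypothetical.

```python
import math


def prob_gold(magnitude: float, mean: float, var: float, noise_var: float) -> float:
    """P(outcome = +magnitude) under a Gaussian predictive belief N(mean, var + noise_var)."""
    x = magnitude * mean / (var + noise_var)
    # Equal to 1 / (1 + exp(-2x)), written with tanh to avoid overflow.
    return 0.5 * (1.0 + math.tanh(x))


def update_on_ambiguous(mean: float, var: float, magnitude: float, noise_var: float):
    """Update a Gaussian belief after an outcome whose sign is hidden."""
    p = prob_gold(magnitude, mean, var, noise_var)
    imputed = magnitude * (2.0 * p - 1.0)                  # expected signed outcome
    ambiguity_var = 4.0 * magnitude ** 2 * p * (1.0 - p)   # largest when p = 0.5
    gain = var / (var + noise_var + ambiguity_var)         # ambiguity lowers the learning rate
    new_mean = mean + gain * (imputed - mean)
    new_var = (1.0 - gain) * var
    return new_mean, new_var, p


# The same "10 units of dirty ore" seen through an optimistic and a pessimistic prior.
optimist = update_on_ambiguous(mean=5.0, var=25.0, magnitude=10.0, noise_var=16.0)
pessimist = update_on_ambiguous(mean=-5.0, var=25.0, magnitude=10.0, noise_var=16.0)
print("optimist : P(gold) = %.2f, new mean = %.2f" % (optimist[2], optimist[0]))
print("pessimist: P(gold) = %.2f, new mean = %.2f" % (pessimist[2], pessimist[0]))
```

In this sketch the identical ambiguous outcome pushes the optimist's estimate up and the pessimist's estimate down, which is the confirmation-bias pattern the model is meant to capture.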
---
### Summary of results
- Participants did learn the task, and the model predicted their learning curves well.
- For the **stated beliefs** ("Was this outcome gold or rocks?"):
  - In the condition where participants had heard that the area was rich in gold, they judged ambiguous outcomes to be gold more often.
  - In the condition where they had heard it was mostly rocks, they did so less often.
  - The neutral condition was generally close to the gold condition.
- The Bayesian model reproduced these differences in beliefs across conditions.
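---
More interesting still,
the **belief curve in the neutral condition** almost coincides with the actual probability of gold.
When positive or negative expectations were induced, beliefs were biased early on,
but as learning progressed they gradually approached the actual probabilities.
This pattern, an initially optimistic or pessimistic belief that is gradually corrected by experience,
illustrates how confirmation bias can operate _adaptively_.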
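The convergence pattern can be illustrated with a toy simulation in which two learners start from opposite priors and see the same stream of outcomes. For simplicity this sketch uses only fully observed outcomes and a plain Kalman update, so it shows why induced priors wash out with experience rather than reproducing the study's model. The prior means of +8 and -8, the variances, and the simulated outcomes are all hypothetical.

```python
import random


def kalman_update(mean: float, var: float, outcome: float, noise_var: float):
    """One Bayesian update of a Gaussian belief N(mean, var)."""
    gain = var / (var + noise_var)
    return mean + gain * (outcome - mean), (1.0 - gain) * var


def belief_trajectory(prior_mean: float, outcomes, prior_var: float = 25.0,
                      noise_var: float = 16.0):
    """Return the sequence of posterior means, starting with the prior mean."""
    mean, var = prior_mean, prior_var
    path = [mean]
    for outcome in outcomes:
        mean, var = kalman_update(mean, var, outcome, noise_var)
        path.append(mean)
    return path


rng = random.Random(1)
outcomes = [rng.gauss(10.0, 4.0) for _ in range(30)]  # simulated, not real data

rich = belief_trajectory(+8.0, outcomes)   # "lots of gold" rumor
poor = belief_trajectory(-8.0, outcomes)   # "mostly rocks" rumor
gaps = [r - p for r, p in zip(rich, poor)]
print("gap between beliefs at start: %.2f, after 30 trials: %.2f" % (gaps[0], gaps[-1]))
```

Because both learners apply the same gain to the same data, the gap between their beliefs shrinks on every trial regardless of the particular outcomes drawn.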
---
### Individual differences and optimism
Next we looked at individual differences.
Participants' **optimism** was measured with the _Life Orientation Test (LOT-R)_.
This score is a psychological scale that is closely associated with happiness, health, and similar outcomes.
The analysis showed that the more optimistic a person was,
the more they judged ambiguous outcomes to be **likely gold**.
There was, however, no difference in objective accuracy.
That is, _optimistic people interpret the same ambiguous information more positively_.
---
### Conclusions
1. We presented a **computational model of learning under valence ambiguity**.
2. We showed that **confirmation bias is not simply an error but can play an adaptive role**.
   - The Bayesian model explains participants' actual belief changes well.
   - Over time, beliefs induced by the prior come into line with actual experience.
3. **An optimistic disposition is correlated with the strength of positive interpretation.**
That is, optimism acts as a kind of **prior**,
leading people to interpret ambiguous experiences more positively,
which in turn can affect learning and choice behavior.
Confirmation bias may therefore not be a wholly irrational error
but rather a natural consequence of how Bayesian learning works.
Our experiment and model provide a new framework for exploring this hypothesis.
As you mentioned, this is work with Sam Gershman and Haley Dorfman, who is now a postdoc with Liz Phelps at Harvard. Let's see if I can move my slides.

Meet Ann. Ann is going to a party. She is a little pessimistic by nature, so she doesn't expect things to generally go well. But she decides to tell a joke, and everybody laughs, and she thinks they are laughing very cruelly at her. Ann is sad, and she decides never to go to another party again.

Meet Bob. Bob goes to a party. Bob is kind of optimistic by nature, so he thinks that things generally go fairly well. He decides to tell a joke, and everybody laughs, and Bob thinks, "Wow, they're laughing at my joke. Bob is cool. I love this, and I'm going to the next party again."

Now, Ann and Bob basically had the exact same experience. They saw the same data, and yet they interpreted it in totally different ways. This is important for a number of reasons. It clearly affects their propensity to take risks in the future: it affects whether they're going to go out and have that experience again and possibly disconfirm their initial beliefs. But because they had these different expectations to begin with, they end up interpreting that data in different ways.

What happens here is basically a confirmation bias. If you Google "confirmation bias", you'll see a lot of figures that look like this, where what you see is basically the intersection between the objective facts and what confirms your prior beliefs.

Confirmation bias gets a very bad rap. Some say it's the best known and most widely accepted notion of inferential error to come out of the literature on human reasoning, and it's been blamed for closed-mindedness in fields as important and broad as science, medicine, politics, and law. So people hate confirmation bias. They think this is a terrible, nefarious, pernicious kind of bias that we should never have. But what if confirmation bias could actually be adaptive? What if it could have some useful
value to it? If you think of it from a statistical perspective, from a Bayesian perspective, it could actually make sense. That is to say, when you're faced with uncertain or ambiguous stimuli, it makes sense that you should interpret them in light of your prior beliefs. In computer science terms, you might call it an inductive bias: something that helps you jump-start your learning process.

So we have some reason to think that it could actually be adaptive, and we've seen this in a few different domains so far. Some have studied confirmation bias that might be adaptive in the way we acquire information, that is, do we seek out information that confirms our pre-existing hypotheses? Others have studied it in the interpretation of perceptual data: when you have ambiguous perceptual stimuli, how do we interpret those given our prior beliefs?

We're looking at the processing of reward feedback. So far there's only been a little work here, on what are called choice-confirming asymmetries. That reflects how people seem to place greater weight on positive outcomes for actions that they chose and less weight on positive outcomes for actions that they didn't choose, the counterfactual outcomes. Here we're looking at something different. We're looking at how you appraise these ambiguous outcomes themselves, irrespective of whether that's what you chose or what you didn't choose. So it's more like our Ann and Bob situation.

We're going to provide two things here: a new model and a new task. And we have two goals. First, we want to figure out whether the way we interpret ambiguous rewarding outcomes, if it is confirmation biased, is potentially adaptive. Second, can we connect these biases to individual differences? In particular, we'll look at differences in one's optimistic disposition.

This relates to a number of literatures, and we're particularly looking at a setting where you have what we call
valence ambiguity. That means the magnitude of the outcome is something that you know, but you're not sure about the valence: you don't know whether it was good or bad. This is just like the Ann and Bob situation, where there was some event that happened, everybody laughed, but you weren't sure whether that was in a good way or a bad way.

This links with a few different literatures. One has to do with the way that we encode valence and magnitude, perhaps separately, in the brain. There's also neuroscientific and clinical work on the way that we process ambiguous rewards. Valence ambiguity in particular has been studied in the context of emotion processing. If I show you the face of somebody who's surprised, was that a good surprise or a bad surprise? Were they surprised because of some positive experience or a negative experience? The way that we interpret ambiguous outcomes has also been linked to clinical traits and clinical outcomes. For example, people who are anxious tend to have what's called a negative interpretation bias, where they tend to read ambiguous stimuli and experiences in a more negative fashion.

So we have two contributions here. First, we are providing a new multi-armed bandit task, a new experimental paradigm, that incorporates valence ambiguity. Second, we're proposing a new Bayesian computational model of how you maybe should learn under this valence ambiguity. It captures the idea that you have almost a Bayesian missing-data imputation. That is to say, if you know part of something, like the magnitude of some outcome, and you have some prior beliefs, you should use that to guide your assessment of what the valence is. The way that you fill in the blanks should be consistent with Bayesian statistical principles.

In our experimental paradigm we turn to the Wild West, as we've done before in
a previous paper. In this task people are in the role of gold miners in the Wild West. This is an interesting way of presenting a classic multi-armed bandit paradigm that's been used in a number of different fields.

Participants are in the role of a miner with two different gold mines that they can choose to dig from. On a given trial they have to pick one of these. Once they pick one, they get an outcome. Here I picked the one on the left, and we show them "you got a reward of plus 10", which translates into a positive amount of money for them. We show that as "you got 10 units of gold", so gold is good. We also show them the counterfactual outcome: in this case, if you had picked the other mine, you would have gotten minus 11 points. We did this so that we can shut off any exploration-exploitation trade-off and just focus on the interpretation of these outcomes and on learning. So you took the action, you got the outcome, and then you go through this again and again: a classic reinforcement learning type of approach.

So far that's pretty standard. Here's the modification we're making. On some trials, when you dig, you get something that looks like this. This is dirty ore. It's an ambiguous outcome, where you know that this was six units of something, but you don't know whether it was gold or rocks because it's covered in dirt. You'll still get paid at the end of the task based on whether it was in fact gold or rocks, but you will not be told which it is. That means you have to make your best guess based on any information you have to date, any pre-existing beliefs, and the magnitude of what's given to you. And again you get the counterfactual outcome.

We gave them these ambiguous outcomes on about 50% of the trials, and always for the option that they chose. Then we asked them, "Do you think that this was rocks or gold?" So we had a stated belief question here, and that tells us more specifically
what your beliefs actually were about this ambiguous stimulus.

In order to try to influence their prior beliefs going into a particular round, we started each block by putting them into a sort of virtual saloon. In the saloon they encountered a cowboy who told them that he'd heard a rumor that maybe there's a lot of gold in this terrain, so it might be a very rich environment. Or he said, "Well, I hear a rumor that there are a lot of rocks", so not really a great place. Or he didn't really hear anything and he's not really sure, which is the neutral condition. So we can try to influence them a bit to think it's a rich, a poor, or a neutral environment.

We didn't actually change the reward statistics. It was always the case that one mine had an average value of plus 10 and the other had an average value of minus 10, and then there was just some common noise across the actual outcomes that they would receive in any given round. So we held the environment itself fixed and just tried to change their pre-existing beliefs a little bit.

Just to reiterate: in each block they were first in the saloon, where they were given some pre-existing belief. Then they were given a choice of the mines to dig from, they got the feedback, and they made a judgment about whether they thought the outcome was rocks or gold. Then they cycled back to part B, the choice, and went through that again. After a handful of trials they would go into another condition, where they were given a different pre-existing belief.

Our model is built on a Bayesian reinforcement learning model. The standard Rescorla-Wagner model is the most basic version of model-free reinforcement learning. It says that you update your estimate of rewards by taking your initial estimate and adjusting it based on the reward prediction error, which is the outcome that you actually got minus your pre-existing
estimate, and scale that by some learning rate that says how much you are adjusting based on the new data.

If you have a Bayesian version of this, then that learning rate is itself derived from optimal statistical principles. The weight that you place on your prior beliefs versus the data is basically a reflection of how much uncertainty there is in each of these relative to the other. If you have very vague pre-existing beliefs, meaning your estimate variance is quite high, you're going to update more towards the data, because you didn't really know much to start with.

To this point, this is a standard Bayesian reinforcement learning model. This is the Kalman filter from engineering. But when we look at our paradigm with ambiguous outcomes, there are some points which are not so clear. What does it mean to have a given outcome that's ambiguous? If I told you that you got 10 units of this dirty ore, should you count that as plus 10? Should you count that as minus 10? Should it be zero? Should you ignore it? Should it be somewhere in between, and if so, where exactly?

We say you're going to impute that ambiguous outcome with basically your best guess: your mean, your expectation of that reward given its magnitude. If you got a magnitude of 10 and you think that it tends to be more likely positive than negative, then you're going to treat it as somewhere between minus 10 and plus 10, but skewed more towards the plus 10 side. You can calculate that more precisely given the Bayesian machinery, which I won't go into very much here.

The other thing that you need to do is to change the uncertainty about the outcome as well. It's not only the signal noise, which says how different each outcome is on a given trial from the true mean. You also have some further uncertainty caused by the ambiguity, from the coarsening of whether it was positive or
negative. You can see that this is maximally uncertain when you're totally unsure, a 50/50 guess as to whether it was a positive or negative outcome. It also happens to be scaled by the reward magnitude. If you've got an outcome of zero or one, then plus one or minus one is almost the same, so it doesn't really matter. But if it's plus 100 or minus 100, now you're really unsure, and you should factor that in.

I should mention that this model is actually approximately Bayesian. We did what's called assumed density filtering, which basically projects everything onto a Gaussian space, because it makes the math work out really nicely and that's a lot easier than doing it other ways.

So what did we actually find? First, the Bayesian model fits learning curves fairly well. People were learning in the task, and this was captured by any of these kinds of models that we would use.

We also looked at the stated beliefs. Remember, that was when we asked them, "Do you think that this outcome was rocks or gold?" Can you see my mouse cursor here? Cool. Of course, if it was gold (this is a nice attention check), everybody said it was gold, and if it was rocks, barring some noise, they said it was rocks. But for those dirty outcomes, if people were in the rich condition, where they got the rumor of gold in this environment, they had higher stated beliefs in gold, and if they were in the poor condition, they had lower stated beliefs. Neutral, oddly, seemed to be pretty close to the rich condition. These little circles represent the predictions of a Bayesian model where we fit condition-specific prior beliefs. So the Bayesian model was able to capture these differences in stated beliefs across conditions as well.

Now, this is not model predictions; this is just the data itself, and this is pretty neat. If you
look at these dotted lines, they reflect the true prevalence of gold on each trial. That is to say, given the option you chose, how likely was that to actually be gold? This is increasing with experience because you're just getting better at the task. You're actually learning "this is the better mine to pick, I'm going to pick that more often", and that's why this increases.

What's cool is that these solid lines represent the stated beliefs that people had. In the neutral condition, the gray line, their stated beliefs track the actual true probabilities really closely. That's to say, if we didn't push them in one direction or another, their stated beliefs were pretty accurate compared to the true values.

The other neat thing is that we did see differences between these conditions. When we said the environment was rich, they were at a higher belief state, and when it was poor, it was somewhat lower. But in particular, that holds more at the start than at the finish. So it's almost as if they have different prior beliefs induced by this manipulation, but with experience they're actually converging to the truth. That's kind of neat. It is consistent with the idea of confirmation bias and optimism as prior beliefs that are then influenced by the data as well.

It turns out that if we fit that Bayesian model with different prior means for each condition, we could actually capture those differences across conditions. That is to say, you end up with differences in those beliefs at the very beginning, but then those end up washed out by the experience that you're getting, because that experience is pretty comparable. So the Bayesian model is capturing these stated belief dynamics fairly well.

It turns out the best-fitting model does seem to vary somewhat across individuals, but across the board the Bayesian version of the model fit better than a more traditional Rescorla
Wagner reinforcement learning model that just skipped those ambiguous outcomes.

Now, can we look at differences across individuals? We saw that there are these prior differences in beliefs when we experimentally vary people's beliefs, but what about individual trait differences? We looked at optimism using the well-known Life Orientation Test-Revised, the LOT-R. This is commonly used in clinical psychology to capture optimism, and it's been linked to a lot of well-known positive life and health outcomes. It's been linked to how healthy you are in terms of certain kinds of diseases, and to how happy you tend to be. You can see how this has a link with items like "In uncertain times, I usually expect the best". There is a very clear link, at least intuitively, to how you're going to interpret these ambiguous outcomes.

We found in fact that the average stated belief in gold on the ambiguous trials is positively correlated with optimism level. That is to say, optimists believe that these ambiguous outcomes tend to be positively rewarding. And it's not that there was any difference across them in terms of their objective accuracy, which was pretty much uncorrelated; the difference was only in their subjective beliefs. So optimists in fact were attributing more positive valence to these ambiguous outcomes.

There's more that we're still digging into, but in summary, we showed this task that measured learning and beliefs about ambiguous outcomes in the context of valence ambiguity. We found that the observed confirmation bias seemed to be consistent with Bayesian mechanisms, which is important for demonstrating that it's adaptive, in the sense that the stated beliefs seemed to be accurate: they were connected closely to the true probabilities. We also found that the Bayesian model could provide an adequate fit
of this data. We're still trying to look into this more, but it could be consistent with the idea that's been proposed of optimism as priors: your optimism determines your prior beliefs, which could then be modified by the data that you're getting. We're still looking into this, but it provides some support for the idea that confirmation bias in these reinforcement learning type settings could actually be adaptive, and we provide a paradigm and a model that help us to explore this idea.

So that's my talk, and I'm happy to take any questions to the extent we have time for it.