A Reinforcement Learning intelligent agent would be screwed if its reward mechanism isn't coded correctly.
Say we give it 10 reward points every time it litters and 1 reward point each time it throws rubbish into the bin, with the goal of reaching 50 reward points (yes, a stupid example..).
It should never throw rubbish into the bin again after its first "exploration" of doing so.
Because, based on its "exploit" rule, it should be exploiting its knowledge that littering is more rewarding.
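To make the point concrete, here's a rough Python sketch of that scenario. It's only an illustration under my own assumptions (the epsilon-greedy rule, the function and variable names, and the incremental-average update are all mine, not from any particular library): an agent learning under this reward scheme tries both actions once, then "exploit" means it keeps littering.

```python
import random

# Illustrative sketch (assumed names and parameters, not a specific framework):
# an epsilon-greedy agent under the badly designed reward scheme above.
REWARDS = {"litter": 10, "bin": 1}   # 10 points for littering, 1 for using the bin
GOAL = 50                            # stop once 50 reward points are reached
EPSILON = 0.1                        # small chance of exploring a random action

def run_agent(seed=0):
    random.seed(seed)
    estimates = {"litter": 0.0, "bin": 0.0}   # estimated value of each action
    counts = {"litter": 0, "bin": 0}
    total = 0
    history = []
    while total < GOAL:
        # Explore until both actions have been tried, then mostly exploit.
        if random.random() < EPSILON or counts["litter"] == 0 or counts["bin"] == 0:
            action = random.choice(list(REWARDS))          # explore
        else:
            action = max(estimates, key=estimates.get)     # exploit
        reward = REWARDS[action]
        counts[action] += 1
        # Incremental average update of the action-value estimate.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
        history.append(action)
    return history, total

if __name__ == "__main__":
    history, total = run_agent()
    print(history, total)
    # Nearly every step after the initial exploration is "litter":
    # it's the reward design that's broken, not the learning rule.
```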
Why would we expect any different from a human being?
Why would we encourage someone to do something that is of no benefit to themselves or to others, simply out of pity?
Or perhaps, simply because we think we have good manners?
Then, after all that misplaced encouragement, why would we expect them to simply realise it and stop doing that thing?
I'm a bit lost.
Hmm - a lot lost maybe.
PS: Looking at this brings back memories.. good ones, especially when I saw the paper on Cyber-Minder (I read it more than 10 times throughout my Hons year!).
I do miss dealing with such an interesting, mind-challenging problem: complex, yet far less complicated than dealing with a human.