DeepMind Used YouTube Videos To Train Game-Beating Atari Bot
Friday June 1, 2018. 04:20 AM , from Slashdot
Artem Tashkinov shares a report from The Register: DeepMind has taught artificially intelligent programs to play classic Atari computer games by making them watch YouTube videos. Exploration games like 1984's Montezuma's Revenge are particularly difficult for AI to crack, because it's not obvious where you should go, which items you need and in which order, and where you should use them. That makes defining rewards difficult without spelling out exactly how to play the thing, and thus defeating the point of the exercise. For example, Montezuma's Revenge requires the agent to direct a cowboy-hat-wearing character, known as Panama Joe, through a series of rooms and scenarios to reach a treasure chamber in a temple, where all the goodies are hidden. Pocketing a golden key, your first crucial item, takes about 100 steps, and is equivalent to 100^18 possible action sequences.
To educate their code, the researchers chose three YouTube gameplay videos for each of the three titles: Montezuma's Revenge, Pitfall, and Private Eye. Each game had its own agent, which had to map the actions and features of the title into a form it could understand. The team used two methods: temporal distance classification (TDC), and cross-modal temporal distance classification (CDC). The DeepMind code still relies on lots of small rewards, of a kind, although they are referred to as checkpoints. While playing the game, every sixteenth video frame of the agent's session is taken as a snapshot and compared to a frame in a fourth video of a human playing the same game. If the agent's game frame is close or matches the one in the human's video, it is rewarded. Over time, it imitates the way the game is played in the videos by carrying out a similar sequence of moves to match the checkpoint frame. In the end, the agent was able to exceed average human players and other RL algorithms: Rainbow, ApeX, and DQfD. The researchers documented their method in a paper this week. You can view the agent in action here.
Read more of this story at Slashdot.
Jul, Mon 23 - 02:15 CEST