Learning attentional policies for tracking and recognition in video with deep networks

de Freitas, Nando

Loris Bazzani, Nando de Freitas, Hugo Larochelle, Vittorio Murino and Jo-Anne Ting

Abstract

We propose a novel attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of the human perceptual system, the model consists of two interacting pathways: ventral and dorsal. The ventral pathway models object appearance and classification using deep (factored)-restricted Boltzmann machines. At each point in time, the observations consist of retinal images, with decaying resolution toward the periphery of the gaze. The dorsal pathway models the location, orientation, scale and speed of the attended object. The posterior distribution of these states is estimated with particle filtering. Deeper in the dorsal pathway, we encounter an attentional mechanism that learns to control gazes so as to minimize tracking uncertainty. The approach is modular (with each module easily replaceable with more sophisticated algorithms), straightforward to implement, practically efficient, and works well in simple video sequences.
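The abstract states that the posterior over the attended object's state is estimated with particle filtering. As a rough illustration only, the following is a minimal sketch of a generic bootstrap particle filter, not the authors' model. The one-dimensional (position, velocity) state, the Gaussian motion and observation noise, and the names `particle_filter_step` and `estimate_position` are all assumptions made for this example.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std=1.0, obs_std=2.0, rng=random):
    """One predict / weight / resample step of a bootstrap particle filter.

    particles: list of (position, velocity) tuples (hypothetical 1-D state).
    observation: noisy measurement of the position.
    """
    # Predict: constant-velocity motion with Gaussian noise on the velocity
    # (an assumption made for this sketch).
    predicted = []
    for pos, vel in particles:
        vel = vel + rng.gauss(0.0, motion_std)
        predicted.append((pos + vel, vel))

    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [
        math.exp(-0.5 * ((observation - pos) / obs_std) ** 2)
        for pos, _ in predicted
    ]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def estimate_position(particles):
    """Posterior mean of the position (particles are equally weighted after resampling)."""
    return sum(pos for pos, _ in particles) / len(particles)
```

In use, one would initialise a particle set from a broad prior and call `particle_filter_step` once per frame with the new measurement. The paper's state also includes orientation and scale, and its observations are retinal images rather than scalar positions, so this sketch only shows the filtering loop itself.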

Address

New York, NY, USA

Book Title

Proceedings of the 28th International Conference on Machine Learning (ICML-11)

Editor

Lise Getoor and Tobias Scheffer

ISBN

978-1-4503-0619-5

Location

Bellevue, Washington, USA

Month

June

Pages

937–944

Publisher

ACM

Series

ICML '11

Year

2011
