Deeply Learned Attributes for Crowded Scene Understanding
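Main contributions:
1. Build a new dataset: Who do What at someWhere (WWW)
2. Train a deep model on it
Motivation: most crowd research is tied to specific scenes, and performance drops when the scene changes.
Different scenes share common attributes, but these attributes have no explicit definition.
A dataset is therefore built for crowd scene understanding.
Details: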
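Building the WWW dataset:
1. Dataset contents:
10,000 videos
8,257 crowded scenes
94 attributes
Described as the largest crowd dataset to date, with all videos from the real world
2. Construction steps:
1) Collect keywords related to crowd scenes
2) Collect videos from Getty Images, Pond5, YouTube, surveillance footage, and movies
3) Collect attributes: "we first collected tags from Pond5 and Getty Images", then the tags are cleaned up, and annotators are hired to label the videos with the resulting attributes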
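The notes do not say how the tag cleanup was done. As a rough illustration only, the sketch below assumes a simple cleanup (lowercase, trim, drop tags used by fewer than `min_count` videos) and then encodes one video's labels as a multi-hot vector over the resulting attribute list. The function names, the threshold, and the example tags are all hypothetical.

```python
from collections import Counter

def build_vocabulary(raw_tag_lists, min_count=2):
    """Normalize tags and keep those used by at least min_count videos.

    Assumed cleanup rule, not the procedure from the paper.
    """
    counts = Counter()
    for tags in raw_tag_lists:
        # A set ensures each normalized tag is counted once per video.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return sorted(t for t, c in counts.items() if c >= min_count)

def multi_hot(tags, vocabulary):
    """Encode one video's tags as a 0/1 vector aligned with vocabulary."""
    normalized = {t.strip().lower() for t in tags}
    return [1 if attr in normalized else 0 for attr in vocabulary]

# Hypothetical raw tags for three videos.
raw = [
    ["Street ", "walk"],
    ["street", "Concert"],
    ["concert", "audience", "WALK"],
]
vocab = build_vocabulary(raw)
label = multi_hot(raw[2], vocab)
```

With the real dataset the vocabulary would be the 94 attributes, so each video would carry a 94-dimensional label vector.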
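Using a convolutional neural network:
1) Input: previous models take raw frames directly as input
Our model: appearance and crowd motion channels. This works because different crowd systems share similar principles that can be characterized by some generic properties.
(The model learns appearance and motion features and combines them, which lets it capture the correlations among attributes.)
Both kinds of channels are fed into a convolutional neural network model.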
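To make the "combine appearance and motion features" idea concrete, here is a minimal sketch of feature fusion followed by independent per-attribute scores. It assumes each branch has already produced a feature vector; the convolutional layers are omitted, and the function names, dimensions, and weights are made up for illustration rather than taken from the paper.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def fuse_and_predict(appearance_feat, motion_feat, weights, biases):
    """Concatenate the two branch features and score each attribute.

    weights has one row per attribute; each row spans the fused feature.
    Each attribute gets its own sigmoid, so several can be active at once.
    """
    fused = list(appearance_feat) + list(motion_feat)
    probs = []
    for w_row, b in zip(weights, biases):
        if len(w_row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
        logit = sum(w * f for w, f in zip(w_row, fused)) + b
        probs.append(sigmoid(logit))
    return probs

# Toy example: 2 appearance features, 1 motion feature, 2 attributes.
appearance = [0.5, -1.0]
motion = [2.0]
weights = [[1.0, 0.0, 0.0],   # attribute 0 looks only at appearance
           [0.0, 0.0, -1.0]]  # attribute 1 looks only at motion
biases = [0.0, 0.0]
probs = fuse_and_predict(appearance, motion, weights, biases)
```

Because every attribute row sees the full fused vector, a trained layer of this kind can draw on both appearance and motion for each attribute.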
2) Extract motion channels
Three motion channels: collectiveness, stability, conflict
All the descriptors are defined upon tracklets detected by the KLT feature point tracker, and each of them is computed on 75 frames of each video in the WWW dataset.
We first define a K-NN (K = 10) graph for the whole tracklet point set.
Collectiveness is extracted with the descriptor in [45]; stability and conflict are extracted with the descriptors in [33].
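The K-NN graph step can be sketched as follows, assuming tracklet points are given as 2D `(x, y)` coordinates in one frame. The second function is a toy neighbor-alignment score added only to show how a quantity can be computed over the graph; it is not the collectiveness descriptor of [45] nor the descriptors of [33], and all names here are hypothetical.

```python
import math

def knn_graph(points, k=10):
    """Map each point index to the indices of its k nearest other points."""
    graph = {}
    for i, p in enumerate(points):
        dists = [
            (math.hypot(p[0] - q[0], p[1] - q[1]), j)
            for j, q in enumerate(points)
            if j != i
        ]
        dists.sort()  # by distance, ties broken by index
        graph[i] = [j for _, j in dists[:k]]
    return graph

def mean_neighbor_alignment(velocities, graph):
    """Average cosine similarity between each point's velocity and its neighbors'.

    Toy measure for illustration: 1.0 means all neighbors move the same way.
    """
    total, count = 0.0, 0
    for i, neighbors in graph.items():
        vi = velocities[i]
        ni = math.hypot(*vi)
        for j in neighbors:
            vj = velocities[j]
            nj = math.hypot(*vj)
            if ni == 0 or nj == 0:
                continue  # skip stationary points
            total += (vi[0] * vj[0] + vi[1] * vj[1]) / (ni * nj)
            count += 1
    return total / count if count else 0.0

# Four points on a line, small k so the example stays readable.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
graph = knn_graph(points, k=2)
same_direction = mean_neighbor_alignment([(1, 0)] * 4, graph)
mixed = mean_neighbor_alignment([(1, 0), (-1, 0), (1, 0), (-1, 0)], graph)
```

The brute-force search is quadratic in the number of points, which is acceptable for a sketch but would likely be replaced by a spatial index for dense tracklet sets.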
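Experiments:
1. Human study
Eight participants were tested under three input conditions (background only, tracklets only, and background with tracklets) to measure how well humans perform on each
2. Train the deep model on the WWW dataset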