
[Deep Learning Paper Notes][Image Classification] Human Performance

Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision 115.3 (2015): 211-252. (Citations: 1352).

1 Errors Both CNN and Humans Are Susceptible To

1.1 Multiple Objects

Both CNN and humans struggle with images that contain multiple ILSVRC classes (usually many more than five), with little indication of which object is the focus of the image. See the first column of Fig. 3.12.

We attribute 24% of GoogLeNet errors and 16% of human errors to this category. Humans have a slight advantage in this error type, since it is sometimes easy for them to identify the most salient object in the image.

1.2 Incorrect Annotations

We found that approximately 0.3% of images were incorrectly annotated in the ground truth. This introduces an approximately equal number of errors for both humans and GoogLeNet.

2 Errors CNN Is More Susceptible To Than Humans

2.1 Small or Thin Objects

Examples include a standing person wearing sunglasses, or a small ant on the stem of a flower. We found that 21% of GoogLeNet errors fall into this category, while none of the human errors do. See the fourth column of Fig. 3.12.

2.2 Image Filters

Many people enhance their photos with filters that distort the contrast and color distributions of the image. We found that 13% of the images that GoogLeNet incorrectly classified contained a filter. See the third column of Fig. 3.12.

2.3 Abstract Representations

GoogLeNet struggles with images that depict objects of interest in an abstract form, such as 3D-rendered images, paintings, sketches, plush toys, or statues. We attribute approximately 6% of GoogLeNet errors to this type. See the fifth column of Fig. 3.12.

2.4 Miscellaneous Sources

These include extreme closeups of parts of an object, unconventional viewpoints, and objects with heavy occlusions. See the second column of Fig. 3.12.

3 Errors Humans Are More Susceptible To Than CNN

3.1 Fine-Grained Recognition

Humans are noticeably worse at fine-grained recognition, even when the objects are in clear view. We estimate that 37% of human errors fall into this category, while only 7% of GoogLeNet errors do. See the last column of Fig. 3.12.

3.2 Class Unawareness

The annotator may sometimes be unaware that the ground-truth class is present as a label option. Approximately 24% of human errors fall into this category.

3.3 Insufficient Training Data

The annotator is shown only 13 example images of each class under its category name. Approximately 5% of human errors fall into this category.
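For a side-by-side view, the per-category error shares quoted in the sections above can be collected into a small comparison table. This is a minimal sketch; the numbers are the ones reported in these notes, and categories for which a share is not given for one side are marked `None`:

```python
# Error-source breakdown: share of each classifier's errors, in percent,
# as reported in the notes above. None = no share given for that side.
breakdown = {
    # category: (GoogLeNet %, human %)
    "multiple objects":           (24, 16),
    "incorrect annotations":      (0.3, 0.3),  # share of mislabeled images; affects both roughly equally
    "small or thin objects":      (21, 0),
    "image filters":              (13, None),
    "abstract representations":   (6, None),
    "miscellaneous sources":      (None, None),
    "fine-grained recognition":   (7, 37),
    "class unawareness":          (None, 24),
    "insufficient training data": (None, 5),
}

for category, (cnn, human) in breakdown.items():
    cnn_s = "-" if cnn is None else f"{cnn}%"
    human_s = "-" if human is None else f"{human}%"
    print(f"{category:28s} GoogLeNet: {cnn_s:6s} human: {human_s}")
```

The table makes the asymmetry visible at a glance: GoogLeNet's errors concentrate in small objects and distorted or abstract imagery, while human errors concentrate in fine-grained classes and unfamiliarity with the label set.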

4 Conclusions

Human accuracy is not a single point; it lives on a tradeoff curve between effort and accuracy. It is clear that humans will soon be able to outperform state-of-the-art image classification models only through significant effort, expertise, and time.
