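Transcript compiled from: Fei-Fei Li: How computer understand the world?
With some free time on my hands, I listened to Dr. Fei-Fei Li's (a personal idol of mine) excellent TED talk. It is very accessible to a general audience and the language is fluent, which makes it good material for practicing spoken English in the computer vision field, so I transcribed the subtitles for convenient study.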
Let me show you something.
(Video) Girl: Okay, that's a cat sitting in a bed.
The boy is petting the elephant.
Those are people that are going on an airplane. That's a big airplane.
Fei-Fei Li: That's a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees.
Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task. So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.
Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot tell the difference between a crumpled paper bag on the ground, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool. Photos and videos are becoming an integral part of global life. They are being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED.
Yet, our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind. "Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these ar