A robot vision system uses computers to realize human visual functions, that is, to recognize the objective three-dimensional world. More than 70% of the information humans receive comes from vision, which provides the most detailed and reliable information about the surrounding environment.
The powerful capabilities of human vision and its sophisticated information-processing mechanisms have aroused great interest among researchers in intelligent systems. Drawing on biological vision, researchers hope to build artificial vision systems for robots so that robots can perceive their environment much as humans do. Robots rely on various sensors to perceive the external world, and, as with humans, the vision system supplies most robots with the external environmental information they need. The vision system therefore plays an important role in robotics.
According to the number and characteristics of the visual sensors used, mainstream mobile-robot vision systems fall into monocular vision, binocular stereo vision, multi-camera (trinocular) vision, and panoramic vision.
Monocular vision. A monocular vision system uses only one visual sensor. During imaging, it projects the three-dimensional objective world onto a 2D image, thereby losing depth information; this is the main drawback of such a system. However, owing to its simple structure, mature algorithms, and low computational cost, monocular vision has been widely used on autonomous mobile robots, for example in target tracking and in feature-based indoor positioning and navigation. Monocular vision is also the foundation of the other types of vision system: binocular stereo vision, multi-camera vision, and so on are realized by adding further means and measures on top of a monocular system.
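The loss of depth under a single camera can be illustrated with a minimal pinhole-projection sketch. The focal length and the sample points below are illustrative assumptions, not values from the text:

```python
# Hedged sketch: a pinhole-camera perspective projection illustrating why a
# single image loses depth. Focal length f and the sample points are assumptions.

def project(point_3d, f=1.0):
    """Perspective projection of a 3D point (X, Y, Z) onto the image plane."""
    x, y, z = point_3d
    return (f * x / z, f * y / z)  # 2D image coordinates; Z is discarded

# Two different 3D points lying on the same viewing ray...
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # twice as far, scaled by the same factor

# ...project to the identical image point, so depth cannot be recovered
# from one view alone.
print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Every point along the ray through the optical centre collapses to the same pixel, which is exactly the ambiguity that the multi-camera systems described below are designed to resolve.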
Binocular stereo vision. Consisting of two cameras, a binocular system uses the principle of triangulation to obtain depth information and can reconstruct the 3D shape and position of the surrounding scene, similar to the visual function of the human eyes. The principle is simple, but the system must know the spatial relationship between the two cameras precisely, and recovering the 3D information of the scene requires two images of the same scene taken from different viewpoints together with complex correspondence matching. Binocular stereo can restore the 3D information of the visual scene comparatively accurately and is widely used in mobile-robot positioning and navigation, obstacle avoidance, and map building. However, the difficulty of stereo vision lies in corresponding-point matching, which greatly limits its application prospects in robotics.
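For a rectified stereo pair, the triangulation the paragraph describes reduces to a single formula, Z = f·b/d. The focal length and baseline below are illustrative assumptions:

```python
# Hedged sketch of binocular triangulation for a rectified stereo pair.
# Focal length f (pixels) and baseline b (metres) are illustrative assumptions.

def depth_from_disparity(f, b, d):
    """Depth Z = f * b / d for a rectified pair with horizontal baseline b.

    d is the disparity: the horizontal shift (in pixels) of the same scene
    point between the left and right images.
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * b / d

# A 20-pixel disparity with f = 800 px and b = 0.1 m gives a depth of 4 m.
print(depth_from_disparity(800.0, 0.1, 20.0))  # 4.0
```

The formula also shows why matching is the hard part: the geometry is trivial once the disparity is known, but finding which pixel in the right image corresponds to a given pixel in the left image is what the matching algorithms must solve.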
Multi-camera (trinocular) vision. A multi-camera system uses three or more cameras, most commonly three, mainly to resolve the matching ambiguity of binocular stereo and to improve matching accuracy. Moravec was the first to study such systems: for the Stanford Cart he developed a visual navigation system using a single camera in a "sliding stereo" arrangement. Yasuda proposed a trinocular vision system to address the corresponding-point matching problem, genuinely breaking through the limitations of binocular stereo; trinocular matching achieves high accuracy. Yashi proposed a trinocular matching algorithm characterized by polygonal approximation of undulating boundary segments and applied it to mobile robots, achieving good results. The advantage of a trinocular system is that it fully exploits the information from the third camera, reducing incorrect matches, resolving the matching ambiguity of binocular stereo, and improving positioning accuracy. However, the relative positions of the three cameras must be arranged sensibly; the structural configuration is more cumbersome than that of a binocular system, and the matching algorithm is more complex, takes more time, and has poorer real-time performance.
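One common way a third camera disambiguates matches can be sketched for an L-shaped rig: camera 2 sits a horizontal baseline to the right of camera 1 and camera 3 a vertical baseline below it. The rig parameters below are illustrative assumptions, not from any system named in the text:

```python
# Hedged sketch of trinocular match verification in an assumed L-shaped rig:
# camera 2 lies a horizontal baseline b_h to the right of camera 1, and
# camera 3 a vertical baseline b_v below it. All parameters are illustrative.

def consistent(d_h, d_v, f=700.0, b_h=0.12, b_v=0.06, tol=0.5):
    """Accept a candidate cameras-1/2 match only if the vertical disparity
    it predicts agrees with what camera 3 actually observed."""
    z = f * b_h / d_h        # depth implied by the horizontal (1-2) match
    d_v_pred = f * b_v / z   # vertical (1-3) disparity that depth implies
    return abs(d_v_pred - d_v) <= tol

# Correct match: the observed vertical disparity fits the implied depth
# (with b_v / b_h = 0.5, we expect d_v to be about half of d_h).
print(consistent(d_h=28.0, d_v=14.0))  # True
# False match: camera 3 disagrees, so the candidate is rejected.
print(consistent(d_h=28.0, d_v=9.0))   # False
```

A binocular pair would have to accept both candidates on appearance alone; the third view turns the ambiguity into a simple geometric consistency test.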
Panoramic vision. A panoramic vision system is an omnidirectional imaging system whose field of view can reach 360 degrees, something no conventional lens can match. It can be realized either by image stitching or with catadioptric optical elements. The image-stitching approach rotates one or more cameras to scan the scene over a wide angle, captures consecutive frames in different directions, and then stitches them into a panoramic image. A catadioptric panoramic system consists of a CCD camera and catadioptric optics; using the principle of mirror imaging, it observes a 360-degree scene with fast imaging speed, which meets real-time requirements, gives it very important application prospects, and makes it applicable to robot navigation. In essence, however, a panoramic system is still a monocular system and cannot obtain scene depth. It also suffers from low image resolution and large image distortion, which affect the stability and accuracy of image processing: the distorted image must first be corrected according to the imaging model, which both hurts the real-time performance of the vision system and causes information loss. In addition, this type of system places high demands on the machining accuracy of the panoramic mirror; if the accuracy of the hyperbolic reflector is insufficient, image correction using the ideal model shows significant deviations.
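The distortion-correction step mentioned above is, at its simplest, a polar-to-rectangular remapping that unwraps the circular mirror image into a 360-degree strip. The image dimensions, centre, and ring radii below are illustrative assumptions; a real system would derive them from the mirror's imaging model:

```python
import math

# Hedged sketch of the polar-to-rectangular coordinate mapping used to unwarp
# a catadioptric (mirror-based) panoramic image into a 360-degree strip.
# All numeric parameters are illustrative assumptions.

def unwarp_coords(u, v, pano_w, pano_h, cx, cy, r_inner, r_outer):
    """Map pixel (u, v) of the unwrapped panorama (width pano_w, height pano_h)
    to the source pixel (x, y) in the circular omnidirectional image."""
    theta = 2.0 * math.pi * u / pano_w               # column -> bearing angle
    r = r_inner + (r_outer - r_inner) * v / pano_h   # row -> radius on the ring
    x = cx + r * math.cos(theta)
    y = cy + r * math.sin(theta)
    return x, y

# Column 0, row 0 of the panorama samples the inner ring at angle 0,
# i.e. directly to the right of the image centre.
print(unwarp_coords(0, 0, pano_w=1440, pano_h=200,
                    cx=320.0, cy=240.0, r_inner=40.0, r_outer=230.0))
```

Because each output pixel must be resampled through such a mapping (and a true hyperbolic-mirror model is more involved than this linear radius interpolation), the correction costs both time and information, which is the real-time and information-loss trade-off the text describes.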
Hybrid vision systems. A hybrid (composite) vision system absorbs the advantages of the individual vision systems: it combines two or more of them, typically taking a monocular or binocular system as the main component and equipping it with other vision systems. For example, a panoramic system built around a spherical reflector can provide wide-angle environmental information while a stereo vision system or a laser rangefinder detects obstacles. Zhu Zhigang of Tsinghua University developed a multi-scale visual sensing system, POST, that uses a single camera to achieve binocular and omnidirectional panoramic imaging, providing navigation for robots. A hybrid system combines the large field of view of panoramic vision with the high accuracy of binocular vision, but its configuration is complex and its cost is high.