PERCEPTION
The definition of AI is based on the nature of the problems it tackles, namely those at which humans currently outperform computers. It also includes cognitive tasks. Apart from these two aspects, many other tasks fall within this realm, such as basic perceptual and motor skills, in which even lower animals possess phenomenal capabilities compared to computers.
Perception involves interpreting sights, sounds, smells, and touch. Action includes the ability to navigate through the world and manipulate objects. If we want to build robots that live in the world, we must understand these processes. Figure 4.3 shows a design for a complete autonomous robot. Because most of AI is concerned only with cognition, the usual assumption is that once intelligent programs exist, we will simply add sensors and effectors to them. But the problems in perception and action are substantial in their own right and are being tackled by researchers in the field of robotics.
In the past, robotics and AI have been largely independent endeavors, and they have developed different techniques to solve different problems. One key difference between AI programs and robots is that AI programs usually operate in computer-simulated worlds, whereas robots must operate in the physical world. For example, in the case of moves in chess, an AI program can search millions of nodes in a game tree without ever having to sense or touch anything in the real world. A complete chess-playing robot, on the other hand, must be capable of grasping pieces, visually interpreting board positions, and carrying out a host of other actions. The distinction between real and simulated worlds has several implications, as given below:
[Figure 4.3: A design for an autonomous robot]
1. The input to an AI program is symbolic in form (for example, a typed English sentence), whereas the input to a robot is typically an analog signal, such as a two-dimensional video image or a speech waveform.
2. Robots require special hardware for perceiving and affecting the world, while AI programs require only general-purpose computers.
3. Robot sensors are inaccurate, and their effectors are limited in precision.
4. Many robots must react in real time. A robot fighter plane, for example, cannot afford to search optimally or to stop monitoring the world during a LISP garbage collection.
5. The real world is unpredictable, dynamic, and uncertain. A robot cannot hope to maintain a correct and complete description of the world. This means that a robot must consider the trade-off between devising and executing plans. This trade-off has several aspects. For one thing, a robot may not possess enough information about the world for it to do any useful planning. In that case, it must first engage in information-gathering activity. Furthermore, once it begins executing a plan, the robot must continually monitor the results of its actions. If the results are unexpected, then re-planning may be necessary.
6. Because robots must operate in the real world, searching and backtracking can be costly.
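The interleaving of planning, execution, monitoring, and re-planning described in point 5 can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the grid world, the breadth-first planner, and the names plan and run_robot are hypothetical and are not taken from any particular robot system. The robot starts with an empty map, senses only the cell it is about to enter, and re-plans whenever the world contradicts its map.

```python
from collections import deque

def plan(start, goal, known_obstacles, size):
    """Breadth-first search over the robot's *believed* map.
    Returns the list of cells to visit after `start`, or None if no route is known."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1][1:]  # drop the start cell itself
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in known_obstacles and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None

def run_robot(start, goal, true_obstacles, size=5, max_steps=100):
    """Execute a plan one step at a time, monitoring each step and re-planning
    when the sensed world differs from the robot's map.
    Returns (final_position, number_of_replans)."""
    known = set()            # the robot's incomplete description of the world
    position = start
    replans = 0
    path = plan(position, goal, known, size)
    for _ in range(max_steps):
        if position == goal or path is None:
            break
        next_cell = path[0]
        if next_cell in true_obstacles:      # sensing: the result is unexpected
            known.add(next_cell)             # information gathering updates the map
            path = plan(position, goal, known, size)
            replans += 1
            continue
        position = next_cell                 # acting in the world
        path = path[1:]
    return position, replans

print(run_robot((0, 0), (4, 0), {(1, 0), (1, 1)}))
```

In this sketch a wrong plan costs only one sensing step. A physical robot may already have moved before it discovers the error, which is why point 6 notes that searching and backtracking can be costly.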
Recent years have seen efforts to integrate research in robotics and AI. The old idea of simply attaching sensors and effectors to existing AI programs has given way to a serious rethinking of basic AI algorithms in light of the problems involved in dealing with the physical world. Research in robotics is likewise affected by AI techniques, since reasoning about goals and plans is essential for mapping perceptions onto appropriate actions.
At this point one might ask whether physical robots are necessary for research purposes. Since current AI programs already operate in simulated worlds, why not build more realistic simulations, which better model the real world? Such simulators do exist. There are several advantages to using a simulated world: experiments can be conducted very rapidly, conditions can easily be replicated, programs can return to previous states at no cost, sensory input can be treated at a high level, and there are no fragile, expensive mechanical parts. The major drawback to simulators is figuring out exactly which factors to build in. Experience with real robots continues to expose tough problems that do not arise even in the most sophisticated simulators. The world turns out, not surprisingly, to be an excellent model of itself, and a readily available one.
We perceive our environment through many channels: sight, sound, touch, smell, and taste. Many animals possess these same perceptual capabilities, and others are also able to monitor entirely different channels. Robots, too, can process visual and auditory information, and they can also be equipped with more exotic sensors, such as laser rangefinders, speedometers, and radar.
Two extremely important sensory channels for humans are vision and spoken language. It is through these two faculties that we gather almost all of the knowledge that drives our problem-solving behaviors.
Vision: Accurate machine vision opens up a new realm of computer applications. These applications include mobile robot navigation, complex manufacturing tasks, analysis of satellite images, and medical image processing. The question is how we can transform raw camera images into useful information about the world.
A video camera provides a computer with an image represented as a two-dimensional grid of intensity levels. Each grid element, or pixel, may store a single bit of information (that is, black/white) or many bits (perhaps a real-valued intensity measure and color information). A visual image is composed of thousands of pixels. What kinds of things might we want to do with such an image? Here are four operations, in order of increasing complexity:
1. Signal Processing:- Enhancing the image, either for human consumption or as input to another program.
2. Measurement Analysis:- For images containing a single object, determining the two-dimensional extent of the object depicted.
3. Pattern Recognition:- For single-object images, classifying the object into a category drawn from a finite set of possibilities.
4. Image Understanding:- For images containing many objects, locating the objects in the image, classifying them, and building a three-dimensional model of the scene.
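The first two operations can be sketched on a toy image. The sketch below is illustrative only: the pixel values, the threshold of 128, and the function names threshold, bounding_box, and object_area are assumptions, not part of any standard vision library. It stores the image as a grid of intensity levels, reduces each pixel to a single bit (a very simple form of signal processing), and then measures the two-dimensional extent of the single object depicted (measurement analysis).

```python
# A tiny grayscale image: a 2-D grid of intensity levels (0 = black, 255 = white).
# The values are invented for illustration.
image = [
    [10,  12,  11,  10,  9],
    [11, 200, 210,  12, 10],
    [10, 205, 220, 215, 11],
    [12,  11, 208,  10, 10],
]

def threshold(img, level):
    """Signal processing: reduce each pixel to one bit (1 = bright, 0 = dark)."""
    return [[1 if pixel >= level else 0 for pixel in row] for row in img]

def bounding_box(binary):
    """Measurement analysis: 2-D extent of the object in a single-object image.
    Returns (top, left, bottom, right) in row/column indices, or None if empty."""
    coords = [(r, c)
              for r, row in enumerate(binary)
              for c, value in enumerate(row) if value]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)

def object_area(binary):
    """Number of pixels that belong to the object."""
    return sum(sum(row) for row in binary)

binary = threshold(image, 128)   # 128 is an arbitrary, assumed threshold
print(bounding_box(binary))      # (1, 1, 3, 3)
print(object_area(binary))       # 6
```

A fixed threshold works here only because the object is much brighter than the background. With uneven lighting, the same pixel value can belong to either the object or the background.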
There are algorithms that perform the first two operations. The third operation, pattern recognition, varies in its difficulty. It is possible to classify two-dimensional (2-D) objects, such as machine parts coming down a conveyor belt, but classifying 3-D objects is harder because of the large number of possible orientations for each object. Image understanding is the most difficult visual task, and it has been the subject of the most study in AI. While some aspects of image understanding reduce to measurement analysis and pattern recognition, the entire problem remains unsolved because of difficulties that include the following:
1. An image is two-dimensional, while the world is three-dimensional; some information is necessarily lost when an image is created.
2. One image may contain several objects, and some objects may partially occlude others.
3. The value of a single pixel is affected by many different phenomena, including the color of the object, the source of the light, the angle and distance of the camera, the pollution in the air, etc. It is hard to disentangle these effects.
As a result, 2-D images are highly ambiguous. Given a single image, we could construct any number of 3-D worlds that would give rise to the image. From the image alone, it is impossible to decide which 3-D solid it portrays. In order to determine the most likely interpretation of a scene, we have to apply several types of knowledge.
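The ambiguity can be made concrete with a small worked example. The sketch below assumes an idealized pinhole camera at the origin looking along the z axis. This camera model, the focal length of 1.0, and the function name project are assumptions introduced for illustration. Under this model, every 3-D point on the same ray through the camera lands on the same image position, so depth cannot be recovered from a single image.

```python
def project(point, focal_length=1.0):
    """Perspective projection of a 3-D point (x, y, z) onto the image plane
    of an idealized pinhole camera (an assumed model, not a real camera API)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Three different 3-D points lying on one ray through the camera center.
near = (0.5, 1.0, 2.0)
middle = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

for p in (near, middle, far):
    print(p, "->", project(p))   # each one maps to (0.25, 0.5)
```

A small nearby object and a large distant object can therefore produce identical pixels. Knowledge about typical object sizes, lighting, and scene structure is what lets a vision system prefer one interpretation over another.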
Speech Recognition: Natural language understanding systems usually accept typed input, but for a number of applications this is not acceptable. Spoken language is a more natural form of communication in many human-computer interfaces. Speech recognition systems have been available for some time, but their limitations have prevented widespread use. Below are five major design issues in speech systems. These issues also provide dimensions along which systems can be compared with one another.
1. Speaker Dependence versus Speaker Independence:
A speaker-independent system can listen to any speaker and translate the sounds into written text. Speaker independence is hard to achieve because of the wide variations in pitch and accent. It is easier to build a speaker-dependent system, which can be trained on the voice