
Is depth information and optical flow helpful for visual control?

  • Johannes Hansen and Marc Ebner

Abstract

The human visual system was shaped through natural evolution. We have used artificial evolution to investigate whether depth information and optical flow are helpful for visual control. Our experiments were carried out in simulation. The task was controlling a simulated racing car. We have used The Open Racing Car Simulator for our experiments. Genetic programming was used to evolve visual algorithms that transform input images (color, optical flow, or depth information) to control commands for a simulated racing car. We found that significantly better solutions were found when color, depth, and optical flow were available as input together compared with color, depth, or optical flow alone.

Introduction

With this contribution, we investigated whether depth and motion provide an evolutionary advantage compared to color alone. As a test environment, we have used The Open Racing Car Simulator (TORCS) [1]. Simulated evolution [2] was used to evolve control algorithms for the racing car. These algorithms use screen grabs from the racing car simulator. The screen grabs are processed using elementary computer vision operators. The output of these algorithms controls the steering wheel as well as the gas/brakes of the car [3]. OpenCV [4, 5], an open source library for computer vision, was used for image processing. Genetic programming [6–8] was used to evolve visual algorithms. We will see that significantly better solutions are found if color, depth, and optical flow are all available.

This article is structured as follows. The next section gives a brief introduction to the visual system. It is followed by a description of the racing car simulator, an explanation of how data from the simulator is used to compute optical flow, and a brief introduction to genetic programming. Related work on visual control using genetic programming is discussed, as well as how genetic programming is used to evolve visual control algorithms. Finally, the study’s results and conclusions are given.

The visual system

The human visual system was shaped through natural evolution [9, 10]. Visual processing starts with light entering the eye. This light is measured by two different types of receptors inside the retina [11, 12]: rods and cones. The rods mediate vision when little light is available. They have a much higher sensitivity than the cones. Cones are in charge of color vision. Three different types of cones exist, which absorb light mainly in the red, green, and blue parts of the visual spectrum.

Some preprocessing occurs inside the retina. Information flows from the retinal receptors to the retinal ganglion cells. This information exits the eye at the blind spot, passes through the lateral geniculate nucleus, and then reaches the primary visual cortex, or area V1. Area V1 is highly structured [13]. Cells within ocular dominance columns respond primarily to stimuli from one eye or the other. V1 contains columns of orientation-sensitive cells; blobs of color- and lightness-sensitive cells also exist. Visual information is analyzed retinotopically with respect to these different aspects. Indeed, the entire visual cortex is highly structured.

Color, shape, and motion appear to be processed by separate visual areas [14, 15]. Color is a product of the brain. It is processed in visual area V4. Shape is processed in V3, and motion is processed in V5. A dedicated area for face and object recognition also exists. It is also interesting that color and motion are not perceived synchronously. Moutoussis and Zeki [16] demonstrated that color is perceived earlier than motion. The brain appears to bind visual attributes that are perceived together.

TORCS

TORCS [1] is an open-source three-dimensional (3D) racing car simulation. A sample screenshot is shown in Figure 1. We have used this simulator as a test environment to evaluate whether depth information and optical flow are helpful for visual control. The TORCS project is currently maintained by Bernhard Wymann; its original creators were Eric Espié and Christophe Guionneau. The TORCS simulator provides several different racing tracks. A player can choose among different cars when playing the game. Several different opponents are available to race against. A split-screen mode supporting up to four human players is also available.

Figure 1: The Open Racing Car Simulator.

Supported controls are a joystick, mouse, and keyboard; some steering wheels are also supported. The game features realistic 3D graphics, lighting, smoke, and skid marks. Game physics include simulation of collisions, tire and wheel properties, and a simple damage model. It even includes a simple aerodynamic model with slip-streaming and ground effects. TORCS has been used in several scientific competitions [17, 18], in which participants develop artificial intelligence methods that drive the racing car along its track. Usually, a client-server architecture is used, and participants develop a client that sends its control commands to the TORCS server.

We have modified TORCS slightly to use it for our experiments. We extract screen grabs from the graphics buffer. A dense depth map is extracted from the so-called Z-buffer of the graphics library [19], i.e. both color and depth are readily available from the graphics context. In addition to color and depth, we also provide optical flow to the visual control algorithms. How we compute optical flow is described in the next section.

Computing optical flow

Optical flow describes the pattern of motion of objects that are seen on the screen. This information is very helpful for visual control. We would like to estimate a dense flow field, i.e. optical flow for each image pixel. Let v(x, y, t) = (v_x, v_y) be the optical flow estimated for image pixel (x, y) in an image taken at time t. Then the image content in the vicinity of pixel (x, y) will be found at position (x + v_x Δt, y + v_y Δt) in an image taken at time t + Δt, provided that the object moves with constant velocity across the screen.

Several different methods have been developed to compute optical flow from visual input [20–23]. In recent years, the accuracy of these methods has improved considerably. Many methods for computing optical flow are based on partial derivatives. However, block-based methods are also used. Block-based methods search, within a neighborhood around a given pixel, for the best matching image content in a subsequent image to determine image motion.
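To make the idea of block-based matching concrete, the following is a minimal Python sketch (not the method used in this article, which derives flow from the depth map as described below). For each block of the first image, a small neighborhood in the second image is searched for the best match under the sum of absolute differences, and the displacement of the best match is taken as the flow of that block.

```python
import numpy as np

def block_matching_flow(prev, curr, block=8, search=4):
    """Estimate optical flow by block matching (sum of absolute differences).

    prev, curr: grayscale images as 2D float arrays of equal size.
    Returns one (vx, vy) vector per block.
    """
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2))
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            patch = prev[by:by + block, bx:bx + block]
            best, best_v = np.inf, (0, 0)
            # Search a small neighborhood in the next frame for the best match.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    sad = np.abs(curr[y:y + block, x:x + block] - patch).sum()
                    if sad < best:
                        best, best_v = sad, (dx, dy)
            flow[by // block, bx // block] = best_v
    return flow
```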

Estimating optical flow from two subsequent images is an expensive image operation. Therefore, we have used the depth map and the known motion of the race car to compute optical flow. The depth map is readily available from the graphics library. It is a by-product of rendering the scene. The depth map contains, for each image pixel, the distance from the object to the camera along the Z-axis. The motion of the race car is available directly from the race car simulator.

Let d(x, y) = d_{x,y} be the depth map of the current image shown on the screen. We assume that all screen coordinates are specified relative to the center of the screen. Let f be the focal length of the camera. The location of an object of the scene that is shown at image pixel (x, y) has coordinates (X_S, Y_S, Z_S) inside the camera coordinate system centered on the car driver.

$$\begin{bmatrix} X_S \\ Y_S \\ Z_S \end{bmatrix} = \frac{d_{x,y}}{f} \begin{bmatrix} x \\ y \\ f \end{bmatrix} \tag{1}$$

The coordinates (X_S, Y_S, Z_S) are relative to the viewer sitting inside the car, i.e. these are eye coordinates.

Let R be the inverse 3×3 rotation matrix that describes the rotatory motion of the racing car from one time step of the simulation to the next. Let D be the inverse vector that describes the translatory motion of the racing car from one time step of the simulation to the next. Hence, after the racing car has moved, the point (X_S, Y_S, Z_S) will have moved to a location (X'_S, Y'_S, Z'_S) relative to the eye of the driver.

$$\begin{bmatrix} X'_S \\ Y'_S \\ Z'_S \end{bmatrix} = R \begin{bmatrix} X_S \\ Y_S \\ Z_S \end{bmatrix} + D \tag{2}$$

The coordinate (X'_S, Y'_S, Z'_S) can be projected onto the screen using the known focal length of the camera. Let (x', y') be the screen coordinates of (X'_S, Y'_S, Z'_S); then we have

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = f \begin{bmatrix} X'_S / Z'_S \\ Y'_S / Z'_S \end{bmatrix} \tag{3}$$

Optical flow can then be computed by subtracting the screen coordinate before the racing car has moved from the screen coordinate after the car has moved.

$$\begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} - \begin{bmatrix} x \\ y \end{bmatrix} \tag{4}$$

As the depth map and the known motion of the racing car are correct, the optical flow will also be correct. Figure 2 shows the computed optical flow for a sample image.
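The following Python sketch illustrates the computation described by Eqs. (1)–(4). It assumes NumPy arrays, a pinhole camera with focal length f given in pixels, screen coordinates measured relative to the image center, and R and D being the inverse rotation and translation of the car's motion as defined above; the exact coordinate conventions of the TORCS graphics pipeline are not reproduced here.

```python
import numpy as np

def flow_from_depth(depth, f, R, D):
    """Compute dense optical flow from a depth map and known ego-motion,
    following Eqs. (1)-(4).

    depth: (H, W) array of distances d(x, y) along the Z-axis
    f:     focal length in pixels
    R:     3x3 inverse rotation of the camera motion
    D:     3-vector, inverse translation of the camera motion
    Returns (vx, vy), each of shape (H, W).
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = xs - w / 2.0                      # screen x relative to the image center
    y = ys - h / 2.0                      # screen y relative to the image center

    # Eq. (1): back-project each pixel into eye coordinates.
    Xs = depth / f * x
    Ys = depth / f * y
    Zs = depth

    # Eq. (2): apply the inverse camera motion.
    P = np.stack([Xs, Ys, Zs], axis=-1) @ R.T + D

    # Eq. (3): project the moved point back onto the screen.
    x2 = f * P[..., 0] / P[..., 2]
    y2 = f * P[..., 1] / P[..., 2]

    # Eq. (4): flow is the displacement of the projected point.
    return x2 - x, y2 - y
```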

It would be possible to compute optical flow directly from the input images using an algorithm based on partial derivatives or using block-based methods. However, this would require considerably more computing resources and would have the disadvantage that the estimated optical flow would not be 100% correct for all image pixels.

Figure 2: Computing optical flow from game engine data. (A) input image, (B) depth map, and (C) optical flow computed using the depth map and the known ego-motion of the car.

Next, we will describe genetic programming, which we have used to evolve visual control algorithms.

Genetic programming

Genetic programming [6–8] is an evolutionary algorithm. Evolutionary algorithms use simulated evolution to solve optimization problems. Such algorithms work with a population of individuals. Each individual represents a possible solution to the optimization problem. Darwinian selection is used to select above average individuals to create an offspring population, i.e. a new generation of individuals.

In genetic programming, individuals are represented as trees. The fitness of an individual describes how well it solves the given problem. The main operators of an evolutionary algorithm are selection, reproduction, and variation. Above-average individuals are selected to create offspring. For our experiments, we have used four genetic operators: reproduction, mutation, ephemeral random constant (ERC) mutation, and crossover. Each genetic operator is applied with a certain probability p_rep, p_mut, p_ERC-mut, and p_cross, respectively. These four probabilities sum to 1.

Individuals of the first generation are created using the so-called ramped half and half initialization [6]. To create an offspring, one genetic operator is randomly selected (using the four probabilities). The reproduction operator simply creates a copy of the genetic material of the individual, i.e. the tree. Mutation, ERC mutation, and crossover create offspring that are similar but not identical to their parents. Depending on the type of operator, one or two parents are selected from the population. Typically, tournament selection is used to select new parents. For tournament selection, nT individuals are selected with uniform probability from the population. These nT individuals form a tournament. The individual with highest fitness is the winner of the tournament and becomes a parent. This parent will then create offspring using one of the genetic operators.
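As an illustration, a minimal Python sketch of tournament selection as described above might look as follows (the population and fitness containers are hypothetical, not part of our implementation):

```python
import random

def tournament_select(population, fitness, n_t=7):
    """Tournament selection: draw n_t individuals uniformly at random from the
    population and return the one with the highest fitness."""
    contestants = random.sample(range(len(population)), n_t)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]
```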

The genetic operators are illustrated in Figure 3. Figure 3A shows the reproduction operator, which is applied with probability p_rep. Figure 3B shows the mutation operator, which is applied with probability p_mut. Figure 3C shows the ERC mutation operator, which is applied with probability p_ERC-mut. Figure 3D shows the crossover operator, which is applied with probability p_cross.

Figure 3: Genetic operators. (A) reproduction, (B) mutation, (C) ERC mutation, and (D) crossover.

If the reproduction operator is applied, a copy of the parent individual, i.e. of its tree, is created. For the mutation operator, a node of the tree is randomly selected. Internal nodes are selected with probability 0.9, whereas external nodes are selected with probability 0.1. The selected node is then replaced with a randomly generated sub-tree. The method used to create this sub-tree is the same method that is used to create the individuals of the first generation. For ERC mutation, a single node that contains an ephemeral random constant is selected, and all ERCs located within the sub-tree rooted at this node are mutated. We originally intended to use Gaussian mutation (as in an evolution strategy [24]) to slightly alter an ERC value, i.e. v := v · e^{0.01z}, where v is the original value of the constant and z is a normally distributed random value with mean 0 and standard deviation 1. However, we actually used v := v · 0.01z, which pulls the ERC toward zero and may also change the sign of the constant. For crossover, two parent individuals are selected. Next, two nodes are randomly selected (one for each tree). Again, internal nodes are selected with probability 0.9, whereas external nodes are selected with probability 0.1. The two sub-trees rooted at the selected nodes are then exchanged between the two individuals. Trees are limited to a maximum depth of 17.
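The two ERC update rules mentioned above can be sketched in Python as follows (a sketch only; the constants are assumed to be stored as plain floating point values):

```python
import math
import random

def erc_mutate_intended(v):
    """Intended evolution-strategy-style update: v := v * e^(0.01*z), z ~ N(0, 1)."""
    z = random.gauss(0.0, 1.0)
    return v * math.exp(0.01 * z)

def erc_mutate_used(v):
    """Update actually used in the experiments: v := v * 0.01 * z, z ~ N(0, 1),
    which shrinks the constant toward zero and may flip its sign."""
    z = random.gauss(0.0, 1.0)
    return v * 0.01 * z
```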

Whenever an offspring is created, it is inserted into the next generation of individuals. The process of selecting a genetic operator and creating offspring continues until the next generation is filled. Usually, an evolutionary run is terminated after a certain number of generations have been created. The individual with highest fitness that was found during all these generations is the solution that solves our problem best.

For our experiments, we have used the Evolutionary Computation Library ECJ developed by Luke [25]. ECJ has been under development since 1998 and is a mature library for evolutionary computation.

Visual control using genetic programming

Our racing car is controlled through visual input alone. We only use the images obtained from screen grabs and the optical flow. Data from the game engine is not used to control the car. It is only used to compute optical flow as described above.

Genetic programming has been used by Winkeler and Manjunath [26] for object detection. Johnson et al. [27] used it to evolve visual routines. Ebner and Tiede [28] have previously evolved controllers for TORCS using genetic programming; however, in that work, input was taken directly from the game engine and not from screen grabs. Koutník et al. [29] have used an evolutionary algorithm to evolve compressed encodings of a recurrent neural network to control a racing car in TORCS. Tanev and Shimohara [30, 31] have used a genetic algorithm to evolve parameters that control an actual scale model of a car using visual input from an overhead camera.

Other researchers have used Atari video games for training game players [32]. Hausknecht et al. [33, 34] evaluated neuro-evolutionary methods for general game playing of Atari video games. They found that HyperNEAT was the only neuro-evolution algorithm able to play based on raw-pixel input from the games. Mnih et al. [35, 36] created a deep neural network that was trained using reinforcement learning. It was able to achieve a level comparable to human players. Deep learning in combination with Monte-Carlo tree search planning was used by Guo et al. [37]. Parker and Bryant [38, 39] evolved controllers for Quake II, which used only visual input.

Materials and methods

We are using strongly typed genetic programming [40] to evolve two trees. The first tree is used to control the steering wheel. The second tree is used to control the velocity of the racing car. The terminal symbols are shown in Table 1. We work with two return types: float and image. The only terminal symbol returning a floating point value is an ERC. An ERC is a random floating point value from the range [0, 1]. Once a node with an ERC is created, it stays constant throughout the life of the node. However, it may be modified by the ERC mutation operator.

Table 1: Terminal symbols.

Name | Return type | Description
ERC | Float | ERC in the range [0, 1]
imageR | Image | Input image (red channel)
imageG | Image | Input image (green channel)
imageB | Image | Input image (blue channel)
imageC | Image | Input image (cyan channel)
imageM | Image | Input image (magenta channel)
imageY | Image | Input image (yellow channel)
imageGray | Image | Input image (gray channel, average RGB)
depthMap | Image | Input image (depth map)
opticalFlowX | Image | Optical flow (horizontal component)
opticalFlowY | Image | Optical flow (vertical component)

The remaining terminal symbols provide access to visual information obtained via screen grabs from the game engine. Each screen grab is scaled down to one third of its original size, and all pixel values are transformed to the range [0, 1]. All terminal symbols returning image data provide single-band images: red channel (imageR), green channel (imageG), blue channel (imageB), cyan channel (imageC), magenta channel (imageM), yellow channel (imageY), and gray channel (imageGray). The depth map of the input image is available through the terminal symbol depthMap. Optical flow is computed using the depth map and the known ego-motion of the car, which is available from the game engine. As optical flow is a two-dimensional vector, its x-component is made available through the terminal opticalFlowX and its y-component through the terminal opticalFlowY.
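A minimal sketch of this preprocessing with OpenCV is given below. The exact CMY conversion is not specified in the article, so the subtractive definitions C = 1 − R, M = 1 − G, Y = 1 − B used here are an assumption, while the gray channel is the average of R, G, and B as stated above.

```python
import cv2
import numpy as np

def make_terminal_images(frame_bgr):
    """Prepare single-band terminal images from a screen grab: downscale to
    one third of its size, map pixel values to [0, 1], and derive channels."""
    small = cv2.resize(frame_bgr, None, fx=1.0 / 3.0, fy=1.0 / 3.0,
                       interpolation=cv2.INTER_AREA)
    img = small.astype(np.float32) / 255.0
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    return {
        "imageR": r, "imageG": g, "imageB": b,
        "imageC": 1.0 - r, "imageM": 1.0 - g, "imageY": 1.0 - b,   # assumed CMY definition
        "imageGray": (r + g + b) / 3.0,                             # average of RGB
    }
```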

The set of elementary functions is shown in Tables 2 and 3. Table 2 shows elementary functions that return a floating point value. Table 3 shows elementary functions that return an entire image. We have used standard arithmetic functions such as addition, multiplication, and the computation of minimum and maximum. We have also included functions that search for maximum and minimum values inside an image. These functions return either the x or the y coordinate of the position where the extremum was found. A Gaussian filter is also available. If we apply a Gaussian filter to a gray-scale input image and then a function that locates the maximum, we can locate the brightest point in the image. The function extractNAME extracts smaller regions from the input image; the size of the region is one third of the image, and its location can be specified through the parameters of the function. All bands (see terminal symbols) are available for this extraction operation.
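For example, the composition of the Gaussian filter with the maximum-locating functions described above could be sketched in Python with OpenCV as follows (the kernel size and the normalization of the returned coordinates are assumptions):

```python
import cv2

def brightest_point(gray):
    """Locate the brightest image point: Gaussian smoothing followed by a
    search for the maximum, mirroring the gauss/maxLocX/maxLocY functions.
    Returns (x, y) normalized to the range [0, 1]."""
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)       # kernel size is an assumption
    _, _, _, max_loc = cv2.minMaxLoc(smoothed)
    h, w = gray.shape[:2]
    return max_loc[0] / (w - 1), max_loc[1] / (h - 1)
```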

Table 2: Elementary functions returning a floating point value.

Name | Output type | Description
abs(float v) | Float | Absolute value, o = |v|
round(float v) | Float | Round function, o = round(v)
floor(float v) | Float | Floor function, o = ⌊v⌋
ceil(float v) | Float | Ceil function, o = ⌈v⌉
neg(float v) | Float | Negate input, o = −v
sqrt(float v) | Float | Square root, o = √v
minLocX(Image c) | Float | x-coordinate (range [0, 1]) of the minimum of c(x, y)
minLocY(Image c) | Float | y-coordinate (range [0, 1]) of the minimum of c(x, y)
maxLocX(Image c) | Float | x-coordinate (range [0, 1]) of the maximum of c(x, y)
maxLocY(Image c) | Float | y-coordinate (range [0, 1]) of the maximum of c(x, y)
avg(Image c) | Float | Average value of all pixels, o = mean_{x,y} c(x, y)
min(float a, float b) | Float | Minimum value, o = (a < b) ? a : b
max(float a, float b) | Float | Maximum value, o = (a > b) ? a : b
q-quantile(Image c, float q) | Float | q-quantile of the image
add(float a, float b) | Float | Addition, o = a + b
mult(float a, float b) | Float | Multiplication, o = a · b

The return value of the node is o.

Table 3: Elementary functions returning an image.

NameOutput typeDescription
abs(Image c)ImageAbsolute value o(x, y)=|c(x, y)|
sqrt(Image c)ImageSquare root, o(x,y)=c(x,y)
min(Image c)ImageMinimum value, o(x, y)=minx,yc(x, y)
max(Image c)ImageMaximum value, o(x, y)=maxx,yc(x, y)
add(image a,image b)ImageAddition, o(x, y)=a(x, y)+b(x, y)
constImage(float v)ImageConstant image, o(x, y)=v
invert(Image c)ImageImage, o(x, y)=max–c(x, y), where max is the maximum value of all image pixels
gauss(float v,Image c)ImageGaussian filter with kernel ex2/(2σ2), where σ=0.3(|v|–1)+0.8
median(float v,Image c)ImageMedian filter with size [2|v|+1]
binary(Image c,float v)ImageBinary threshold, o(x, y)=(c(x, y)>v)?1:0
clamp(Image c,float v)Imageclamp value, o(x, y)=(c(x, y)>v)?v:c(x, y)
thresholdPass(Image c,float v)ImageThreshold o(x, y)=(c(x, y)>v)?c(x, y):0
thresholdZero(Image c,float v)ImageThreshold o(x, y)=(c(x, y)>v)?0:c(x, y)
avgThreshold(Image c)ImageAverage threshold, o(x, y)=(c(x, y)>a)?1:0 with a=x,yc(x,y).
localThreshold(Image c)ImageLocal average threshold, o(x, y)=(c(x, y)>a(x, y))?1:0 where a(x, y) is obtained by convolving the input image with a Gaussian kernel ex2/(2σ2) with standard deviation σ=1.1
extractNAME(float s,float t)Image10 variants of this elementary function exist, with NAME ∈{R, G, B C, M, Y, Gray, DeathMap, OpticalFlowX, OpticalFlowY}. The parameters (s, t) specify the upper left corner of a rectangular area that is extracted from the current input image (as specified by NAME). The width and height of this area is one third of the input image.

The return value is o(x, y) for each pixel (x, y) of the output image.

This function is very useful for controlling the steering wheel of the car. This is illustrated in Figure 4. Suppose we extract a region of the depth map from the left-hand side of the image and another region of the depth map from the right-hand side of the image. If we apply the average function, which computes the average depth within these two areas, then we can compare both average depths to control the steering wheel. This simple control algorithm drives the car in the direction where more space is available (a small code sketch of this idea is given after Figure 4).

Figure 4: Extracting sub-regions from the visual input is helpful for visual control. (A) Sample tree which evaluates information from the depth map, (B) input image, and (C) depth map with regions.
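The control idea of Figure 4 can be sketched in a few lines of Python; the placement of the two regions, the gain, and the sign convention of the steering command are assumptions for illustration only, not the evolved controller:

```python
import numpy as np

def steer_from_depth(depth):
    """Compare the average depth of a region on the left with one on the right
    and steer toward the side with more free space (cf. Figure 4).
    Returns a steering command in [-1, 1]; the sign convention is assumed."""
    h, w = depth.shape
    rh, rw = h // 3, w // 3               # regions are one third of the image
    top = h // 3                          # assumed vertical placement of the regions
    left = depth[top:top + rh, 0:rw]
    right = depth[top:top + rh, w - rw:w]
    return float(np.clip(right.mean() - left.mean(), -1.0, 1.0))
```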

We have carried out four sets of experiments to evaluate whether depth information is helpful in controlling the racing car. For our experiments, we used the same basic set of elementary functions but varied the input information that was made available to the evolved individuals. For experiment A, only color information was available. For experiment B, only the depth map was available. For experiment C, only optical flow was provided. Finally, for experiment D, all of the visual information (color, depth, and optical flow) was provided. The terminal symbols and elementary functions for the four experiments are shown in Table 4. Note that some of the functions listed under “All” accept both floating point values and images as input.

Table 4: Terminal symbols and elementary functions used for our experiments.

Experiment | Terminal symbols and elementary functions
A | imageR, imageG, imageB, imageC, imageM, imageY, imageGray, extract{R,G,B,C,M,Y,Gray}
B | imageDepthMap, extractDepthMap
C | opticalFlowX, opticalFlowY, extractOpticalFlowX, extractOpticalFlowY
D | imageR, imageG, imageB, imageC, imageM, imageY, imageGray, extract{R,G,B,C,M,Y,Gray}, imageDepthMap, extractDepthMap, opticalFlowX, opticalFlowY, extractOpticalFlowX, extractOpticalFlowY
All | ERC, abs, round, floor, ceil, neg, sqrt, minLocX, minLocY, maxLocX, maxLocY, avg, min, max, q-quantile, add, mult, constImage, invert, gauss, median, binary, clamp, thresholdPass, thresholdZero, avgThreshold, localThreshold

The track that we have used for all of our experiments is shown in Figure 5. The red arrow illustrates the direction in which the race will start. A path taken by an evolved individual is shown overlaid on this track (green line). The end of this path is marked with a green cross. At this point, the evolved driver lost control of its car and crashed into the border of the track.

Figure 5: Track used for our experiments. (A) Track and (B) path taken by an evolved individual (green line). Driving direction (red arrow).

The task is to evolve visual controllers that drive the car along the track. Therefore, fitness is computed by considering the distance traveled along the track. In addition, the damage attained is also used for the fitness computation. Individuals that stay away from the border of the track and manage to avoid damage receive higher fitness values. Let d be the distance traveled along the track (in meters). Let a be the amount of damage attained (with range [0, 1], where 1 is a completely damaged car). Then the fitness f_i of individual i is given as

$$f_i = \begin{cases} -50 & \text{controller is disqualified} \\ d\,(1-a)^2 & \text{not disqualified, only single-sided steering} \\ \max\{2\,d\,(1-a)^2,\, 0\} & \text{not disqualified, steering toward left and right} \end{cases} \tag{5}$$

A controller is disqualified if it (a) drives along the track in the wrong direction, (b) does not use the steering wheel, i.e. the first tree returns a constant value, or (c) does not use the gas pedal, i.e. the second tree returns a constant value. If the controller is disqualified, it is penalized with a fitness value of −50. Otherwise, it receives a fitness of d·(1−a)². This fitness value is doubled if the controller turns the steering wheel to the left as well as to the right.
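A direct Python transcription of this fitness computation (Eq. (5)) might look as follows; how disqualification and two-sided steering are detected in the simulator is not reproduced here:

```python
def fitness(distance, damage, disqualified, steers_both_ways):
    """Fitness according to Eq. (5).

    distance:         distance traveled along the track in meters (d)
    damage:           damage in [0, 1], where 1 is a completely damaged car (a)
    disqualified:     wrong direction, constant steering output, or constant gas output
    steers_both_ways: True if the controller steers both left and right
    """
    if disqualified:
        return -50.0
    base = distance * (1.0 - damage) ** 2
    if steers_both_ways:
        return max(2.0 * base, 0.0)
    return base
```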

For each experiment, 10 runs with different initializations of the random number generator were performed. For each run, a population of 200 individuals was evolved for 99 generations. Individuals were selected using tournament selection with n_T = 7. Crossover was applied with probability p_cross = 0.2, mutation with probability p_mut = 0.4, ERC mutation with probability p_ERC-mut = 0.3, and reproduction with probability p_rep = 0.1. Ramped half and half initialization was used to initialize the individuals of the first generation with depths ranging from 2 to 6.

Results

Figure 6 shows the best fitness values obtained for all four experiments. By far the highest fitness values were reached when only depth information was available. It is clear (as we have described above) that this type of information is helpful in controlling the car. Optical flow also seemed to be helpful. Indeed, it is well known that bees use optical flow to achieve centering behavior [41, 42]. This ability (comparison of lateral optical flow) has also been used for visual control in robotics [43, 44].

Figure 6: Best fitness values obtained for all four experiments. Ten runs were conducted for each experiment. Depth information seems to provide an evolutionary advantage. For some of the runs, it produced exceptionally high-fitness individuals. Optical flow also seems to provide an evolutionary advantage.

Figure 7 shows the average best fitness for all four experiments. After 99 generations, using only color information resulted in a lower average best fitness than using depth information, optical flow, or all three together. The average best fitness at generation 99 is summarized in Table 5. This table also shows the overall best fitness obtained over the 10 runs of each experiment (40 runs in total).

Figure 7: Average best fitness for all four experiments.

Table 5: Experimental results for all four experiments.

Experiment | Overall best fitness | Average best fitness
A | 312.8 | 168.4
B | 1978.6 | 636.2
C | 1129.7 | 341.2
D | 875.0 | 345.4

Overall best fitness is the best fitness obtained over all 10 runs for each experiment. The average best fitness obtained in generation 99 is shown in the last column.

We have used a Mann-Whitney U test to compare these averages, as shown in Table 6. A Kolmogorov-Smirnov test confirmed that the best fitness values at the end of 99 generations were not normally distributed. Results obtained using depth information or optical flow alone were not significantly better than those obtained using only color information. Although the average best fitness improved considerably when depth information or optical flow was used for visual control, the difference was not statistically significant, because the Mann-Whitney U test compares results on an ordinal scale.
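For reference, such a one-sided Mann-Whitney U test can be carried out with SciPy as sketched below; the fitness values listed are placeholders for illustration only, not the data from our experiments:

```python
from scipy.stats import mannwhitneyu

# Best fitness of the 10 runs at generation 99 for two experiments
# (placeholder values, not the data reported in this article).
fitness_color = [168.0, 150.2, 175.9, 160.3, 172.1, 155.8, 180.4, 149.7, 171.0, 158.6]
fitness_all   = [345.0, 310.5, 360.2, 330.9, 351.7, 298.4, 372.6, 340.1, 355.3, 322.8]

# One-sided test of H0: f_A = f_D against H1: f_A < f_D.
u_stat, p_value = mannwhitneyu(fitness_color, fitness_all, alternative="less")
print(f"U = {u_stat}, p = {p_value:.3f}")
```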

Table 6: Comparison of average best fitness in generation 99 using the Mann-Whitney U test.

Hypothesis | p-value
H0: f_A = f_B, H1: f_A < f_B | 0.30
H0: f_A = f_C, H1: f_A < f_C | 0.65
H0: f_A = f_D, H1: f_A < f_D | 0.02

The only significant difference is found for the comparison of experiments A and D (p=0.02).

Only when all visual data (color, depth, and optical flow) is combined do we get significantly better results than when color alone is used. Interestingly, the human brain evaluates its visual input with respect to color (in V4) and motion (in V5) [11, 45]. Ocular dominance columns are found in V1. Depth information can be computed from disparity, i.e. a lateral offset of visual information between the two eyes [46]. The human visual system somehow combines this visual information in higher areas to achieve visual control.

Conclusion

We have used an evolutionary algorithm (genetic programming) to evolve image-processing algorithms to control a racing car. These algorithms are able to process various types of visual information: color, depth information, or optical flow. Each individual consists of two trees. The first tree is used to control the steering wheel and the second tree is used to control the acceleration of the car. Several elementary visual operations such as Gaussian smoothing, addition, multiplication, thresholding, and locating a maximum or minimum response are also provided. These are all elementary operations that can be performed easily by a network of spiking neurons. In our experiments, we found that significantly better results in driving a racing car along its track are obtained when color, depth, and optical flow are provided together.


Corresponding author: Marc Ebner, Institut für Mathematik und Informatik, Ernst-Moritz-Arndt-Universität Greifswald, Walther-Rathenau-Straße 47, 17487 Greifswald, Germany, Tel.: +49-3834-86-4646, Fax: +49-3834-86-4640, E-mail:

References

1. Wymann B, Dimitrakakis C, Sumner A, Espié E, Guionneau C. TORCS, The Open Racing Car Simulator. Available at: http://www.torcs.org. Accessed: Mar 2015.

2. Holland JH. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. Cambridge, MA: MIT Press, 1992. doi:10.7551/mitpress/1090.001.0001.

3. Hansen J. Visuelle Steuerung eines simulierten Rennfahrzeugs mit Hilfe von genetischer Programmierung. Master’s thesis, Ernst-Moritz-Arndt-Universität Greifswald, Faculty of Mathematics and Natural Sciences, Greifswald, Germany, Dec. 2014.

4. Bradski G. The OpenCV library. Dr Dobb’s J Software Tools 2000 Nov.

5. Pulli K, Baksheev A, Kornyakov K, Eruhimov V. Real-time computer vision with OpenCV. Commun ACM 2012;55:61–9. doi:10.1145/2184319.2184337.

6. Koza JR. Genetic programming. On the programming of computers by means of natural selection. Cambridge, MA: MIT Press, 1992.

7. Koza JR. Genetic programming II. Automatic discovery of reusable programs. Cambridge, MA: MIT Press, 1994.

8. Banzhaf W, Nordin P, Keller RE, Francone FD. Genetic programming – an introduction: on the automatic evolution of computer programs and its applications. San Francisco, CA: Morgan Kaufmann, 1998. doi:10.1007/BFb0055923.

9. Darwin C. The origin of species. Edited with an introduction by Gillian Beer. Oxford: Oxford University Press, 1996.

10. Nilsson D-E, Pelger S. A pessimistic estimate of the time required for an eye to evolve. Proc R Soc Lond B 1994;256:53–8. doi:10.1098/rspb.1994.0048.

11. Tovée MJ. An introduction to the visual system. Cambridge: Cambridge University Press, 1996.

12. Dowling JE. The retina: an approachable part of the brain. Cambridge, MA: Belknap Press of Harvard University Press, 1987.

13. Livingstone MS, Hubel DH. Anatomy and physiology of a color system in the primate visual cortex. J Neurosci 1984;4:309–56. doi:10.1523/JNEUROSCI.04-01-00309.1984.

14. Zeki SM. Review article: functional specialisation in the visual cortex of the rhesus monkey. Nature 1978;274:423–8. doi:10.1038/274423a0.

15. Zeki S. Inner vision. An exploration of art and the brain. Oxford: Oxford University Press, 1999.

16. Moutoussis K, Zeki S. A direct demonstration of perceptual asynchrony in vision. Proc R Soc Lond B 1997;264:393–9. doi:10.1098/rspb.1997.0056.

17. Loiacono D, Cardamone L, Lanzi PL. Simulated car racing championship. Competition software manual. Available at: http://arxiv.org/abs/1304.1672. Accessed: Apr 2013.

18. Loiacono D, Togelius J, Lanzi PL, Kinnaird-Heether L, Lucas SM, Simmerson M, et al. The WCCI 2008 simulated car racing competition. In: Proceedings of the 2008 IEEE Symposium on Computational Intelligence and Games, Perth, Australia, December 15–18. Piscataway, NJ: IEEE, 2008.

19. Wright Jr RS, Haemel N, Sellers G, Lipchak B. OpenGL SuperBible. Comprehensive tutorial and reference, 5th ed. Upper Saddle River, NJ: Addison-Wesley, 2011.

20. Horn BK. Robot vision. Cambridge, MA: MIT Press, 1986.

21. Bülthoff H, Little J, Poggio T. A parallel algorithm for real-time computation of optical flow. Nature 1989;337:549–53. doi:10.1038/337549a0.

22. Wang H, Brady M, Page I. A fast algorithm for computing optic flow and its implementation on a transputer array. In: Zisserman A, editor. Proceedings of the British Machine Vision Conference. Oxford: British Machine Vision Association, 1990:175–80.

23. Brox T, Bruhn A, Papenberg N, Weickert J. High accuracy optical flow estimation based on a theory for warping. In: Pajdla T, Matas J, editors. Proceedings of the 8th European Conference on Computer Vision, Part IV, Prague, Czech Republic, May 2004. Berlin: Springer-Verlag, 2004:25–36.

24. Rechenberg I. Evolutionsstrategie ’94. Stuttgart: frommann-holzboog, 1994.

25. Luke S. The ECJ owner’s manual. A user manual for the ECJ Evolutionary Computation Library, 2015. Available at: https://cs.gmu.edu/~eclab/projects/ecj/docs/manual/manual.pdf.

26. Winkeler JF, Manjunath BS. Genetic programming for object detection. In: Koza JR, Deb K, Dorigo M, Fogel DB, Garzon M, Iba H, Riolo RL, editors. Genetic Programming 1997, Proceedings of the Second Annual Conference, Stanford University, July 13–16, 1997. San Francisco, CA: Morgan Kaufmann, 1997:330–5.

27. Johnson MP, Maes P, Darrell T. Evolving visual routines. In: Brooks RA, Maes P, editors. Artificial Life IV, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems. Cambridge, MA: MIT Press, 1994:198–209.

28. Ebner M, Tiede T. Evolving driving controllers using genetic programming. In: IEEE Symposium on Computational Intelligence & Games, Politecnico di Milano, Milano, Italy, September 7–10. Piscataway, NJ: IEEE, 2009:279–86.

29. Koutník J, Cuccu G, Schmidhuber J, Gomez F. Evolving large-scale neural networks for vision-based reinforcement learning. In: Proceedings of the Genetic and Evolutionary Computation Conference, Amsterdam, The Netherlands, July 6–10, 2013. New York, NY: ACM, 2013. doi:10.1145/2463372.2463509.

30. Tanev IT, Shimohara K. On human competitiveness of the evolved agent operating a scale model of a car. In: Proceedings of the IEEE Congress on Evolutionary Computation, Singapore, September 25–28, 2007. Piscataway, NJ: IEEE, 2007:3646–53.

31. Tanev I, Shimohara K. Evolution of agent, remotely operating a scale model of a car through a latent video feedback. J Intell Robot Syst 2008;52:263–83. doi:10.1007/s10846-008-9212-y.

32. Bellemare MG, Naddaf Y, Veness J, Bowling M. The arcade learning environment: an evaluation platform for general agents (extended abstract). In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, 2015:4148–52.

33. Hausknecht M, Lehman J, Miikkulainen R, Stone P. A neuroevolution approach to general Atari game playing. IEEE Trans Comput Intell AI Games 2014;6:355–66. doi:10.1109/TCIAIG.2013.2294713.

34. Hausknecht M, Khandelwal P, Miikkulainen R, Stone P. HyperNEAT-GGP: a HyperNEAT-based Atari general game player. In: Proceedings of the Genetic and Evolutionary Computation Conference, Philadelphia, PA, July 7–11, 2012.

35. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature 2015;518:529–33. doi:10.1038/nature14236.

36. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop, 2013.

37. Guo X, Singh S, Lee H, Lewis R, Wang X. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger K, editors. Adv Neural Inf Process Syst 27. Curran Associates, 2014:3338–46.

38. Parker M, Bryant BD. Visual control in Quake II with a cyclic controller. In: Hingston P, Barone L, editors. Proceedings of the 2008 IEEE Symposium on Computational Intelligence and Games. Piscataway, NJ: IEEE Press, 2008:151–8.

39. Parker M, Bryant BD. Visual control in Quake II with a cyclic controller. In: Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games. Piscataway, NJ: IEEE Press, 2009:287–93.

40. Montana DJ. Strongly typed genetic programming. Evol Comput 1995;3:199–230. doi:10.1162/evco.1995.3.2.199.

41. Srinivasan MV. How bees exploit optic flow: behavioural experiments and neural models. Philos Trans R Soc Lond B 1992;337:253–9. doi:10.1098/rstb.1992.0103.

42. Srinivasan MV. Distance perception in insects. Curr Direct Psychol Sci 1992;1:22–6. doi:10.1111/1467-8721.ep10767830.

43. Santos-Victor J, Sandini G, Curotto F, Garibaldi S. Divergent stereo for robot navigation: learning from bees. In: Proceedings of Computer Vision and Pattern Recognition, New York, 1993:434–9.

44. Ebner M, Zell A. Centering behavior with a mobile robot using monocular foveated vision. Robot Auton Syst 2000;32:207–18. doi:10.1016/S0921-8890(99)00127-X.

45. Zeki S. A vision of the brain. Oxford: Blackwell Science, 1993.

46. Arndt PA, Mallot HA, Bülthoff HH. Human stereovision without localized image features. Biol Cybern 1995;72:279–93. doi:10.1007/BF00202784.

Received: 2015-12-3
Accepted: 2016-1-25
Published Online: 2016-3-1
Published in Print: 2016-3-1

©2016 by De Gruyter
