Neural architecture search for image saliency fusion
Introduction
According to [1], “Visual salience (or visual saliency) is the distinct subjective perceptual quality which makes some items in the world stand out from their neighbors and immediately grab our attention”. The human vision system is able to efficiently detect salient areas in a scene and further process them to extract high-level information [2], [3]. Visual saliency has been primarily studied by neuroscientists and cognitive scientists, and has recently received attention from other research communities working in the fields of computer vision, computer graphics, and multimedia, e.g. [4]. In the areas of multimedia and computer vision, visual saliency can be used to emphasize object-level regions in the scene, serving as a pre-processing step for scene recognition [5], [6], object detection [7], [8], segmentation [9], and tracking [10]. It can also be exploited for image manipulation and visualization in applications such as image retargeting [11], image collage [12], and non-photorealistic rendering [13]. Moreover, in multimedia applications, saliency can be exploited for image and video summarization [14], [15], [16], enhancement [17], retrieval [18], and image quality or aesthetic assessment [19], [20].
Saliency detection methods can be divided into two categories: bottom-up and top-down. Bottom-up methods are stimuli-driven [21]. Saliency is usually modeled by local or global contrast on hand-crafted visual features, and knowledge about human visual attention is embedded in the model by exploiting heuristic priors such as background [22], compactness [23], or objectness [24]. These methods provide no explicit information about the semantics of the salient regions; such information is indirectly embedded via the prior assumptions made on the location, shape, or visual properties of the salient regions to be detected. Bottom-up methods can be considered general purpose.
Top-down saliency methods are designed to find regions in the images that are relevant for a given task. They are often also referred to as task-driven approaches. These methods usually formulate the saliency detection as a supervised learning problem [25]. The rationale of top-down saliency methods is to identify image regions that belong to a pre-defined object category [26]. For this reason, these methods are theoretically more robust for identifying salient regions in cluttered backgrounds where bottom-up methods may fail. Top-down approaches rely on the use of training data to build the detection model. They can be very robust for the specific task on which they are trained but may not generalize well to other tasks.
In order to make the detection more robust and to improve the generalization capabilities, saliency methods often integrate different features [27] that can be either hand-crafted or learned by Convolutional Neural Networks (CNNs) [28], [29], [30], or fuse saliency maps generated by different methods [31]. However, the feature definition and selection, and the combination strategies, are usually empirically designed.
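As a concrete point of reference, the simplest such combination strategy is a fixed weighted average of the input maps. The sketch below (the function name and weights are our own illustrative choices, not taken from any cited method) shows what an empirically designed fusion looks like:

```python
import numpy as np

def fuse_saliency_maps(maps, weights=None):
    """Fuse grayscale saliency maps (values in [0, 1]) by a weighted average.

    This is the kind of empirically designed combination discussed above:
    both the strategy (averaging) and the weights are fixed a priori
    rather than learned from data.
    """
    stack = np.stack([np.asarray(m, dtype=np.float64) for m in maps])
    if weights is None:
        weights = np.full(len(maps), 1.0 / len(maps))
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalize so the fused map stays in [0, 1]
    return np.tensordot(w, stack, axes=1)
```

Non-uniform weights encode a hand-tuned belief about which input algorithm is more reliable; choosing them well for every dataset is exactly the kind of empirical design the learned approaches below try to avoid.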
Since different observers may consider different regions of a scene salient, depending on the scene context and/or the observer’s cultural background, saliency detection is an ill-posed problem [22], [32]. Saliency detection methods proposed in the literature exploit different rationales, visual clues, and assumptions, but, as demonstrated by the experiments in [33], no single saliency detection algorithm is able to achieve good results on all the different benchmark datasets.
In our previous works [34], [35], we exploited genetic programming (GP) to build the rationale with which to combine the binary outputs of several change detection algorithms. Using a-priori-defined unary, binary, and n-ary operators, the GP approach automatically combined the inputs and built an optimal, task-driven solution (i.e. program) in the form of a hierarchical tree structure.
In this work we further investigate and extend this approach to combine gray-level saliency maps, a domain we first addressed in [36]. We first create a candidate solution for combining the saliency maps using GP with a set of operations whose parameters are fixed a priori. To further improve this solution, we should also tune these parameters, but they cannot be easily (or efficiently) optimized within the GP framework. In order to optimize them, we use the candidate solution obtained by the GP as a blueprint upon which to design the architecture of a backbone Convolutional Neural Network. Within the CNN optimization framework, it is easier and much more efficient to search for the optimal parameters of the operations of the GP solution. Another important advantage of implementing the backbone CNN is that, once the proposed solutions can be evaluated, we can easily and safely create deeper variants of the CNN by including other operations (e.g. post-processing) on intermediate results. These operations, initialized as identities, are further optimized or can be completely ignored by the CNN during training.
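To make the tree-structured programs concrete, here is a minimal sketch of how a GP fusion tree can be represented and evaluated over gray-level saliency maps. The operator pool, the fixed gamma exponent, and the example tree are illustrative assumptions, not the exact operator set used in the paper:

```python
import numpy as np

# Illustrative operator pool. The fixed exponent in "gamma" is exactly the
# kind of a-priori parameter the later CNN phase would fine-tune.
BINARY = {
    "avg": lambda a, b: (a + b) / 2.0,
    "max": np.maximum,
    "min": np.minimum,
    "prod": lambda a, b: a * b,
}
UNARY = {
    "gamma": lambda a: np.power(a, 0.5),
    "invert": lambda a: 1.0 - a,
}

def eval_tree(node, maps):
    """Recursively evaluate a GP fusion tree.

    A node is either an int (index of an input saliency map) or a tuple
    (op_name, child, ...) referencing the UNARY/BINARY operator pools.
    """
    if isinstance(node, int):
        return maps[node]
    op, *children = node
    args = [eval_tree(c, maps) for c in children]
    return UNARY[op](*args) if op in UNARY else BINARY[op](*args)

# Example program: max(gamma(map0), avg(map1, map2))
tree = ("max", ("gamma", 0), ("avg", 1, 2))
```

The GP search explores the discrete space of such trees (which operators, applied to which inputs, in which order), while the operator parameters stay frozen at their a-priori values.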
The extensive experiments on benchmark datasets, both qualitative and quantitative, validate the effectiveness of the proposed fusion strategy.
Finally, beyond the focus on saliency estimation for the scope of this paper, the proposed information fusion technique can be considered a general purpose method, with possible applications to other fields such as change detection [35] and semantic segmentation [37].
Section snippets
Saliency detection algorithms
Borji et al. [33] benchmarked 41 different saliency detection algorithms, each based on different assumptions and heuristics. For example, Li et al. [38] compute saliency from the perspective of the image reconstruction error of background images generated at different levels of detail. A graph-based approach is used instead by Yang et al. [39]. Again, superpixels are the base for the saliency computation. Foreground and background region queries are used to rank each image region using a
Proposed method
Our proposed saliency estimation approach aims at combining the advantages of Genetic Programming with those of Convolutional Neural Networks. With our approach, we design and optimize GP-generated solutions for saliency estimation in three steps. In the first step, Genetic Programming techniques are exploited to combine existing saliency maps using a set of provided operations. The output of this step is a fusion tree that encodes the optimal fusion strategy with respect to the defined
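One way to picture the identity-initialized post-processing mentioned in the introduction is a convolution whose kernel starts as the identity: inserting it into the network leaves the GP solution's output unchanged until training adjusts it (or leaves it as a no-op). The sketch below is a plain-NumPy illustration under that assumption, not the paper's actual layer configuration:

```python
import numpy as np

def identity_kernel(k=3):
    """k x k kernel initialized as the identity (center tap = 1).

    Applying it leaves a map unchanged, so adding this post-processing
    layer cannot degrade the GP solution before training starts.
    """
    w = np.zeros((k, k))
    w[k // 2, k // 2] = 1.0
    return w

def conv2d_same(m, w):
    """Naive 'same' 2-D convolution with zero padding, for illustration."""
    k = w.shape[0]
    p = k // 2
    padded = np.pad(m, p)
    out = np.zeros_like(m, dtype=np.float64)
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * w)
    return out
```

Because the layer starts as an exact identity, gradient descent is free either to shape it into a useful refinement of the intermediate map or to leave it effectively untouched, which is the "safely create deeper variants" property described above.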
Experiments
In this section, we first describe the experimental setup, by introducing the input saliency estimation algorithms, the datasets that have been adopted at different phases of the optimization, and the evaluation metrics. We then present the following experiments: we select different fusion trees from the Genetic Programming phase, generate the corresponding CNNs, and evaluate them on various datasets for a comparison with the input algorithms.
Conclusions
We have proposed a general purpose neural architecture search strategy, with a focus on the estimation of image saliency. Specifically, we have devised a three-step optimization process that combines the output of existing algorithms for saliency estimation.
First, a fusion tree is generated through genetic programming, working on a set of predefined operators. The discrete search space of the operators to be used and combined is efficiently handled by the evolutionary algorithm. This initial
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. The research leading to these results has received funding from TEINVEIN: TEcnologie INnovative per i VEicoli Intelligenti, CUP (Codice Unico Progetto – Unique Project Code): E96D17000110009 – Call “Accordi per la Ricerca e l’Innovazione”, cofunded by POR FESR 2014–2020 (Programma Operativo Regionale, Fondo Europeo di Sviluppo Regionale – Regional Operational
References (80)
- et al., Context proposals for saliency detection, Comput. Vis. Image Underst. (2018)
- et al., Evolving deep neural networks, Artificial Intelligence in the Age of Neural Networks and Brain Computing (2019)
- Visual saliency, Scholarpedia (2007)
- et al., Controlled and automatic human information processing: II. Perceptual learning, automatic attending and a general theory, Psychol. Rev. (1977)
- et al., Controlled and automatic human information processing: I. Detection, search, and attention, Psychol. Rev. (1977)
- et al., Computational modelling of visual attention, Nat. Rev. Neurosci. (2001)
- et al., Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
- et al., Region-based saliency detection and its application in object recognition, IEEE Trans. Circuits Syst. Video Technol. (2014)
- et al., Robust object detection at regions of interest with an application in ball recognition, Proceedings of the IEEE International Conference on Robotics and Automation (2005)
- et al., An integrated model of top-down and bottom-up attention for optimizing detection speed, Proceedings of the 2006 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
- Saliency based image segmentation, Proceedings of the 2011 International Conference on Multimedia Technology
- Saliency-based discriminant tracking, Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition
- Seam carving for content-aware image resizing, ACM Trans. Graph.
- Saliency for image manipulation, Vis. Comput.
- Stylization and abstraction of photographs, ACM Trans. Graph.
- Adaptive color image compression based on visual attention, Proceedings of the 11th International Conference on Image Analysis and Processing (ICIAP)
- Video summarization using a neurodynamical model of visual attention, Proceedings of the 6th IEEE Workshop on Multimedia Signal Processing
- A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression, IEEE Trans. Image Process.
- Low quality image enhancement using visual attention, Opt. Eng.
- Database saliency for fast image retrieval, IEEE Trans. Multimed.
- Color image quality assessment combining saliency and FSIM, Proceedings of the Fifth International Conference on Digital Image Processing (ICDIP 2013)
- Saliency retargeting: an approach to enhance image aesthetics, Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision (WACV)
- A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell.
- Geodesic saliency using background priors, Proceedings of the European Conference on Computer Vision
- Saliency filters: contrast based filtering for salient region detection, Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- The secrets of salient object segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Salient object detection: a discriminative regional feature integration approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Top-down saliency with locality-constrained contextual sparse coding, Proceedings of the 2015 BMVC
- Learning to detect a salient object, IEEE Trans. Pattern Anal. Mach. Intell.
- Deep contrast learning for salient object detection, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Deeply supervised salient object detection with short connections, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Multiscale fully convolutional network for image saliency, J. Electron. Imaging
- Saliency aggregation: a data-driven approach, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Revisiting salient object detection: simultaneous detection, ranking, and subitizing of multiple salient objects, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
- Salient object detection: a benchmark, IEEE Trans. Image Process.
- How far can you get by combining change detection algorithms?, Proceedings of the International Conference on Image Analysis and Processing – ICIAP 2017
- Combination of video change detection algorithms by genetic programming, IEEE Trans. Evol. Comput.
- Combining saliency estimation methods, Proceedings of the International Conference on Image Analysis and Processing – ICIAP 2019
- A CNN architecture for efficient semantic segmentation of street scenes, Proceedings of the 8th IEEE International Conference on Consumer Electronics – Berlin (ICCE-Berlin)
- Saliency detection via dense and sparse reconstruction, Proceedings of the 2013 IEEE International Conference on Computer Vision