Supervised Deep Sparse Coding Networks for Image Classification

In this paper, we propose a novel deep sparse coding network (SCN) capable of efficiently adapting its own regularization parameters to a given application. The network is trained end-to-end with a supervised, task-driven learning algorithm via error backpropagation. During training, the network learns both the dictionaries and the regularization parameters of each sparse coding layer, so that the reconstructive dictionaries are smoothly transformed into increasingly discriminative representations. The adaptive regularization also gives the network more flexibility to adjust its sparsity levels. Furthermore, we devise a sparse coding layer built on a "skinny" dictionary; central to the network's computational efficiency, these skinny dictionaries compress high-dimensional sparse codes into lower-dimensional structures. The adaptivity and discriminability of our 15-layer SCN are demonstrated on six benchmark datasets, namely CIFAR-10, CIFAR-100, STL-10, SVHN, MNIST, and ImageNet, most of which are considered difficult for sparse coding models. Experimental results show that our architecture substantially outperforms traditional one-layer sparse coding architectures while using far fewer parameters. Moreover, our multilayer architecture combines the benefits of depth with sparse coding's characteristic ability to operate on smaller datasets; in such data-constrained scenarios, our technique remains highly competitive with deep neural networks.
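
The core building block can be pictured as an unrolled ISTA loop whose regularization weight is itself trainable. The following PyTorch sketch is our illustration of that idea, not the authors' released code; the dimensions, step count, and initialization are assumptions.

```python
import torch
import torch.nn as nn

class SparseCodingLayer(nn.Module):
    """Unrolled ISTA sparse coding layer with a learnable regularization weight."""

    def __init__(self, in_dim, code_dim, n_steps=5):
        super().__init__()
        # A "skinny" dictionary: code_dim < in_dim compresses the representation.
        self.D = nn.Parameter(0.01 * torch.randn(in_dim, code_dim))
        self.log_lam = nn.Parameter(torch.zeros(1))  # regularization, kept positive via exp
        self.n_steps = n_steps

    def forward(self, x):  # x: (batch, in_dim)
        lam = self.log_lam.exp()
        # The Frobenius norm of D^T D upper-bounds its spectral norm, giving a safe step size.
        L = (self.D.t() @ self.D).norm() + 1e-8
        z = x.new_zeros(x.shape[0], self.D.shape[1])
        for _ in range(self.n_steps):  # unrolled ISTA iterations
            u = z - (z @ self.D.t() - x) @ self.D / L
            z = torch.sign(u) * torch.clamp(u.abs() - lam / L, min=0.0)  # soft threshold
        return z
```

Because both the dictionary and the regularization weight are `nn.Parameter`s, backpropagating a task loss through the unrolled iterations updates them jointly, which matches the behavior the abstract describes.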

Arc-Support Line Segments Revisited: An Efficient High-Quality Ellipse Detection

Over the years, many ellipse detection algorithms have been proposed and studied extensively, yet detecting ellipses accurately and efficiently in real-world images remains a challenge. In this paper, we propose a practical, industry-oriented ellipse detector based on arc-support line segments, which achieves high detection accuracy and efficiency simultaneously. To simplify the complicated curves in an image while retaining general properties such as convexity and polarity, arc-support line segments are extracted, which forms the basis for the successful detection of ellipses. Arc-support groups are formed by iteratively and robustly linking the arc-support line segments that latently belong to a common ellipse. Afterward, two complementary approaches, namely locally selecting the arc-support group with higher saliency and globally searching all valid paired groups, are adopted to fit the initial ellipses efficiently. The ellipse candidate set is then formed by hierarchical clustering in the 5D parameter space of the initial ellipses. Finally, the salient ellipse candidates are selected and refined as detections subject to stringent and effective verification. Extensive experiments on three public datasets show that our method achieves the best F-measure scores compared with state-of-the-art methods. The source code is available at https://github.com/AlanLuSun/High-quality-ellipse-detection.
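
For the initial ellipse fitting step, a standard choice is a direct least-squares conic fit applied to the pooled edge points of a group. The numpy sketch below is our illustration of that classic Fitzgibbon-style formulation, not the released implementation, which additionally handles grouping and verification.

```python
import numpy as np

def fit_ellipse_direct(x, y):
    """Direct least-squares ellipse fit (Fitzgibbon-style) to point arrays x, y.
    Returns conic coefficients (A, B, C, D, E, F) of Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0."""
    M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = M.T @ M                      # scatter matrix of the design matrix
    C = np.zeros((6, 6))             # constraint matrix enforcing 4AC - B^2 = 1
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    eigval, eigvec = np.linalg.eig(np.linalg.solve(S, C))
    k = np.argmax(eigval.real)       # the single positive eigenvalue yields the ellipse
    return eigvec[:, k].real
```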

Exemplar-Based Recursive Instance Segmentation With Application to Plant Image Analysis

Instance segmentation is a challenging computer vision problem that lies at the intersection of object detection and semantic segmentation. Motivated by plant image analysis in the context of plant phenotyping, a recently emerging application field of computer vision, this paper presents the exemplar-based recursive instance segmentation (ERIS) framework. A three-layer probabilistic model is first introduced to jointly represent hypotheses, voting elements, instance labels, and their connections. A recursive optimization algorithm is then developed to infer the maximum a posteriori (MAP) solution, handling one instance at a time by alternating among the three steps of detection, segmentation, and update. The proposed ERIS framework departs from previous work in two main respects. First, it is exemplar-based and model-free, achieving instance-level segmentation of a specific object class given only a handful of (typically fewer than 10) annotated exemplars. This makes it usable when no massive manually labeled dataset is available for training strong classification models, as most existing methods require. Second, instead of attempting to infer the solution in a single shot, which incurs extremely high computational complexity, our recursive optimization strategy allows reasonably efficient MAP inference over the full hypothesis space. In this work, the ERIS framework is instantiated for the specific application of plant leaf segmentation. Experiments on public benchmarks demonstrate the superiority of our method over the state-of-the-art in both effectiveness and efficiency.
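
The recursive MAP inference can be summarized as a peel-one-instance-at-a-time loop. The sketch below captures that control flow only; every helper function is a hypothetical placeholder standing in for the paper's detection, segmentation, and update steps.

```python
def recursive_instance_segmentation(image, exemplars, score_threshold=0.5):
    """Control-flow sketch of ERIS-style recursion (all helpers are hypothetical)."""
    instances = []
    evidence = extract_voting_elements(image)  # hypothetical: build voting elements
    while True:
        # Detection: find the currently most salient instance hypothesis.
        hypothesis, score = detect_best_hypothesis(evidence, exemplars)  # hypothetical
        if score < score_threshold:
            break  # no sufficiently supported hypothesis remains
        # Segmentation: label the pixels explained by this hypothesis.
        mask = segment_instance(image, hypothesis)  # hypothetical
        # Update: remove explained evidence so the next pass finds a new instance.
        evidence = remove_explained_elements(evidence, mask)  # hypothetical
        instances.append(mask)
    return instances
```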

A Novel Key-Point Detector Based on Sparse Coding

Most popular hand-crafted key-point detectors, such as Harris corner, MSER, SIFT, and SURF, rely on specific pre-designed structures for the detection of corners, blobs, or junctions in an image. This reliance on pre-designed structures is a source of inflexibility for these detectors in different contexts. Additionally, the performance of these detectors is highly affected by non-uniform changes in illumination. To the best of our knowledge, while some previous works address one of these two problems, an efficient method that solves both simultaneously is still lacking. In this paper, we propose a novel Sparse Coding based Key-point detector (SCK) that is fully invariant to affine intensity changes and independent of any particular structure. The proposed detector locates a key-point in an image based on a complexity measure calculated from the block surrounding its position. A strength measure is also proposed for comparing and selecting the detected key-points when the maximum number of key-points is limited. The desirable characteristics of the proposed detector are confirmed theoretically, and experimental results on three public datasets show that it achieves high performance in terms of repeatability and matching score.
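
One plausible reading of the detector's two ingredients, affine-intensity invariance via block normalization and a sparsity-based complexity score, is sketched below in numpy; the dictionary, coding algorithm, and thresholds are our assumptions, not the paper's exact recipe.

```python
import numpy as np

def complexity_measure(block, D, lam=0.1, n_iter=50):
    """Score one image block by how many dictionary atoms its sparse code uses.
    D has unit-norm atoms as columns; block is a 2-D patch."""
    v = block.astype(float).ravel()
    v = (v - v.mean()) / (np.linalg.norm(v - v.mean()) + 1e-8)  # cancels affine intensity change
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant (squared spectral norm)
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):              # ISTA sparse coding
        z = z - D.T @ (D @ z - v) / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return np.count_nonzero(np.abs(z) > 1e-6)  # more active atoms = more complex block
```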

Image Compressed Sensing Using Convolutional Neural Network

In the study of compressed sensing (CS), the two main challenges are the design of the sampling matrix and the development of the reconstruction method. On the one hand, the commonly used random sampling matrices (e.g., GRM) are signal-independent and ignore the characteristics of the signal. On the other hand, state-of-the-art image CS methods (e.g., GSR and MH) achieve quite good performance, but at much higher computational complexity. To address these two challenges, we propose an image CS framework using a convolutional neural network (dubbed CSNet) that includes a sampling network and a reconstruction network, which are optimized jointly. The sampling network adaptively learns the sampling matrix from the training images, so that the CS measurements retain more image structural information for better reconstruction. Specifically, three types of sampling matrices are learned: a floating-point matrix, a {0, 1}-binary matrix, and a {-1, +1}-bipolar matrix; the last two are specially designed for easy storage and hardware implementation. The reconstruction network, which contains a linear initial reconstruction network and a non-linear deep reconstruction network, learns an end-to-end mapping between the CS measurements and the reconstructed images. Experimental results demonstrate that CSNet offers state-of-the-art reconstruction quality while achieving fast running speed. In addition, CSNet with the {0, 1}-binary and {-1, +1}-bipolar matrices achieves performance comparable to existing deep-learning-based CS methods and outperforms traditional CS methods. Experimental results further suggest that the learned sampling matrices can significantly improve traditional image CS reconstruction methods.
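
The jointly learned, block-based sampling stage can be expressed as a convolution whose kernels are the rows of the sampling matrix. The PyTorch fragment below is a minimal sketch of that idea; the block size, sampling ratio, and layer shapes are illustrative, and the non-linear deep reconstruction stage is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, ratio = 32, 0.1                 # block size and sampling ratio (illustrative)
m = int(ratio * B * B)             # measurements per block

# Each of the m filters is one learned row of the sampling matrix.
sampling = nn.Conv2d(1, m, kernel_size=B, stride=B, bias=False)
# Linear initial reconstruction: map m measurements back to a B*B block.
init_recon = nn.Conv2d(m, B * B, kernel_size=1, bias=False)

x = torch.randn(1, 1, 96, 96)      # toy grayscale input
y = sampling(x)                    # CS measurements, shape (1, m, 3, 3)
x0 = F.pixel_shuffle(init_recon(y), B)  # reassemble blocks into a (1, 1, 96, 96) image
print(y.shape, x0.shape)
```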

Single-Perspective Warps in Natural Image Stitching

Image stitching results can be perceptually divided into single-perspective and multiple-perspective. Compared with a multiple-perspective result, a single-perspective result excels in perspective consistency but suffers from projective distortion. In this paper, we propose two single-perspective warps for natural image stitching. The first is a parametric warp, an incremental combination of the dual-feature-based as-projective-as-possible warp and the quasi-homography warp. The second is a mesh-based warp, determined by optimizing a total energy function that simultaneously emphasizes different characteristics of a single-perspective warp, including alignment, distortion, and saliency. A comprehensive evaluation demonstrates that the proposed warps outperform several state-of-the-art warps in urban scenes, including APAP, AutoStitch, SPHP, and GSP.
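
Conceptually, the mesh-based warp reduces to a weighted sparse least-squares problem over the mesh vertices, with one stacked block of equations per energy term. The fragment below sketches only that solver step, under our assumption that alignment, distortion, and saliency constraints have already been linearized into (weight, A, b) triples; it is not the paper's formulation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

def solve_mesh_warp(terms):
    """terms: list of (weight, A, b) blocks, A sparse, encoding alignment,
    distortion, and saliency constraints on the flattened mesh vertices."""
    A = sp.vstack([np.sqrt(w) * Ai for w, Ai, _ in terms])
    b = np.concatenate([np.sqrt(w) * bi for w, _, bi in terms])
    return lsqr(A, b)[0]  # least-squares optimum of the total energy
```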

Deep Salient Object Detection With Contextual Information Guidance

Integration of multi-level contextual information, such as feature maps and side outputs, is crucial for convolutional neural network (CNN)-based salient object detection. However, most existing methods either simply concatenate multi-level feature maps or compute the element-wise addition of multi-level side outputs, thus failing to take full advantage of them. In this paper, we propose a new strategy for guiding multi-level contextual information integration, in which feature maps and side outputs across layers are fully engaged. Specifically, shallower-level feature maps are guided by deeper-level side outputs to learn more accurate properties of the salient object. In turn, deeper-level side outputs can be propagated to high-resolution versions whose spatial details are complemented by the shallower-level feature maps. Moreover, a group convolution module is proposed to obtain highly discriminative feature maps: the backbone feature maps are divided into a number of groups, and convolution is then applied to the channels within each group. Finally, the group convolution module is incorporated into the guidance module to further strengthen its guidance role. Experiments on three public benchmark datasets verify the effectiveness and superiority of the proposed method over state-of-the-art methods.
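
Group convolution itself is a one-liner in modern frameworks; the snippet below shows the channel-splitting behavior the module relies on (the channel counts and group number are illustrative).

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 256, 28, 28)  # backbone feature maps
# groups=8 splits the 256 channels into 8 independent groups of 32 channels each.
gconv = nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=8)
out = gconv(feat)
print(out.shape)  # torch.Size([1, 256, 28, 28])
# Weights shrink from 256*256*3*3 to 256*32*3*3, while keeping per-group channel mixing.
```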

Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning

Visual saliency and semantic saliency are both important in image captioning. However, a single-phase image captioning model benefits little from limited saliency information without a saliency predictor. In this paper, a novel saliency-enhanced re-captioning framework based on two-phase learning is proposed to enhance single-phase image captioning. In the framework, both visual and semantic saliency cues are distilled from the first-phase model and fused with the second-phase model for model self-boosting. The visual saliency mechanism can generate a saliency map and a saliency mask for an image without learning a saliency predictor. The semantic saliency mechanism sheds light on the properties of words tagged as nouns in a caption. Besides these, a third type of saliency, sample saliency, is proposed to compute the saliency degree of each sample, which helps make image captioning more robust. We also examine how to combine the three types of saliency for a further performance boost. Our framework can treat an image captioning model as a saliency extractor, which may benefit other captioning models and related tasks. Experimental results on both the Flickr30k and MSCOCO datasets show that the saliency-enhanced models obtain promising performance gains.
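
The noun-focused semantic saliency cue can be approximated with an off-the-shelf part-of-speech tagger; the sketch below uses NLTK and a simple binary weighting, which is our illustrative assumption rather than the paper's exact scheme.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

caption = "a brown dog catches a frisbee in the park"
tokens = nltk.word_tokenize(caption)
tags = nltk.pos_tag(tokens)                       # [(word, POS), ...]
mask = [1.0 if tag.startswith("NN") else 0.0 for _, tag in tags]  # nouns get weight 1
print(list(zip(tokens, mask)))
```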

Inpainting Versus Denoising for Dose Reduction in Scanning-Beam Microscopies

We consider sampling strategies for reducing the radiation dose during image acquisition in scanning-beam microscopies, such as SEM, STEM, and STXM. Our basic assumption is that we may acquire subsampled image data (with some pixels missing) and then inpaint the missing data using a compressed-sensing approach. Our noise model consists of Poisson noise plus random Gaussian noise. We also include the possibility of acquiring fully sampled image data, in which case the inpainting approach reduces to a denoising procedure. We use numerical simulations to compare the accuracy of reconstructed images against the "ground truths." The results generally indicate that, for sufficiently high radiation doses, higher sampling rates achieve greater accuracy, consistent with the well-established literature. However, for very low radiation doses, where the Poisson noise and/or random Gaussian noise begins to dominate, our results indicate that subsampling with inpainting can yield smaller reconstruction errors. We also present an information-theoretic analysis, which allows us to quantify the amount of information gained through the different sampling strategies and enables a broader discussion of the main results.
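
The acquisition model being compared can be simulated in a few lines: spread a fixed total dose over the visited pixels, then add shot and detector noise. The numpy sketch below reflects our reading of that setup; the dose and noise parameters are illustrative.

```python
import numpy as np

def acquire(img, dose_per_image, sampling_rate, sigma_read=2.0, seed=0):
    """img in [0, 1]; returns masked noisy counts and the sampling mask."""
    rng = np.random.default_rng(seed)
    mask = rng.random(img.shape) < sampling_rate             # pixels the beam visits
    dose_per_pixel = dose_per_image / max(mask.sum(), 1)     # fixed total dose budget
    counts = rng.poisson(img * dose_per_pixel)               # Poisson (shot) noise
    noisy = counts + rng.normal(0.0, sigma_read, img.shape)  # additive Gaussian noise
    return noisy * mask, mask

# sampling_rate=1.0 gives the fully sampled case, where inpainting reduces to denoising.
```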

Compressive Color Pattern Detection Using Partial Orthogonal Circulant Sensing Matrix

One key issue in compressive sensing is the design of a sensing matrix that is random enough to yield good signal reconstruction quality while also enjoying desirable properties, such as orthogonality or circulant structure. The classic method for constructing such sensing matrices is to first generate a full orthogonal circulant matrix and then select only a few of its rows. In this paper, we propose a refined construction of orthogonal circulant sensing matrices that generates a circulant matrix in which only a given subset of rows is orthogonal. The generation method is thus far less constrained, leading to better sensing matrices while preserving the desired properties. The proposed partial shift-orthogonal sensing matrix is compared with random and learned sensing matrices in the context of signal reconstruction. This sensing matrix is pattern-dependent and is therefore efficient at detecting color patterns and edges from the measurements of a color image.
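
For contrast, the classic full-orthogonal-then-subsample construction that the paper refines can be written in a few numpy lines: a circulant matrix is orthogonal exactly when its spectrum has unit modulus, so one draws such a spectrum, inverts the FFT, and keeps m rows (the sizes here are illustrative).

```python
import numpy as np
from scipy.linalg import circulant

rng = np.random.default_rng(0)
n, m = 256, 64                         # signal length, number of measurements

g = rng.standard_normal(n)
spec = np.fft.fft(g)
spec /= np.abs(spec)                   # unit-modulus, conjugate-symmetric spectrum
c = np.fft.ifft(spec).real             # first column of an orthogonal circulant matrix
C = circulant(c)
rows = rng.choice(n, size=m, replace=False)
Phi = C[rows]                          # partial orthogonal circulant sensing matrix
print(np.allclose(Phi @ Phi.T, np.eye(m)))  # True: the kept rows are orthonormal
```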