Underwater Scene Prior Inspired Deep Underwater Image and Video Enhancement

Publication date: Available online 5 September 2019

Source: Pattern Recognition

Author(s): Chongyi Li, Saeed Anwar

Abstract

In underwater scenes, wavelength-dependent light absorption and scattering degrade the visibility of images and videos. The degraded underwater images and videos affect the accuracy of pattern recognition, visual understanding, and key feature extraction in underwater scenes. In this paper, we propose an underwater image enhancement convolutional neural network (CNN) model based on an underwater scene prior, called UWCNN. Instead of estimating the parameters of an underwater imaging model, the proposed UWCNN model directly reconstructs the clear latent underwater image, benefiting from the underwater scene prior, which can be used to synthesize underwater image training data. Moreover, thanks to its lightweight network structure and effective training data, our UWCNN model can be easily extended to underwater videos for frame-by-frame enhancement. Specifically, combining an underwater imaging physical model with the optical properties of underwater scenes, we first synthesize underwater image degradation datasets that cover a diverse set of water types and degradation levels. Then, a lightweight CNN model is designed for each underwater scene type and trained on the corresponding data. Finally, this UWCNN model is directly extended to underwater video enhancement. Experiments on real-world and synthetic underwater images and videos demonstrate that our method generalizes well to different underwater scenes.
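The kind of physical degradation model the abstract refers to can be sketched with the standard single-scattering imaging equation; the attenuation coefficients and background light below are illustrative placeholders, not values from the paper:

```python
import numpy as np

def synthesize_underwater(clear, depth, beta=(0.40, 0.10, 0.05),
                          background=(0.10, 0.35, 0.45)):
    """Degrade a clear RGB image with a simplified underwater imaging model:
        I_c(x) = J_c(x) * exp(-beta_c * d(x)) + B_c * (1 - exp(-beta_c * d(x)))
    where beta_c is a per-channel (wavelength-dependent) attenuation
    coefficient and B_c the homogeneous background light. The coefficients
    here are hypothetical; red is attenuated fastest, as underwater."""
    beta = np.asarray(beta)
    background = np.asarray(background)
    t = np.exp(-beta[None, None, :] * depth[:, :, None])  # transmission map
    return clear * t + background * (1.0 - t)

# toy example: a mid-gray 4x4 image at a uniform 5 m depth
clear = np.full((4, 4, 3), 0.5)
depth = np.full((4, 4), 5.0)
degraded = synthesize_underwater(clear, depth)
```

With these coefficients the red channel is suppressed most strongly, giving the bluish-green cast typical of underwater scenes; sampling `beta` and `background` over a range of water types is how such a model can generate diverse training pairs.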

A Generalized Least-Squares Approach Regularized with Graph Embedding for Dimensionality Reduction

Publication date: Available online 5 September 2019

Source: Pattern Recognition

Author(s): Xiang-Jun Shen, Si-Xing Liu, Bing-Kun Bao, Chun-Hong Pan, Zheng-Jun Zha, Jianping Fan

Abstract

In current graph embedding methods, low-dimensional projections are obtained by preserving either the global or the local geometrical structure of data. In this paper, the PCA (Principal Component Analysis) idea of minimizing least-squares reconstruction errors is regularized with graph embedding, unifying various local manifold embedding methods within a generalized framework that preserves both global and local structure in the low-dimensional subspace. Different from the well-known PCA method, our proposed generalized least-squares approach considers data distributions together with an instance penalty on each data point. In this way, PCA is viewed as a special instance of our generalized least-squares framework that preserves global projections. By applying graph-embedding regularization, we obtain projections that preserve both the intrinsic geometrical structure and the global structure of data. Experimental results on a variety of face and handwritten digit recognition tasks show that our proposed method retains lower-dimensional subspaces and achieves higher classification accuracy than state-of-the-art graph embedding methods.
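The least-squares view of PCA that the framework generalizes can be seen in a minimal sketch: the top-k principal directions are exactly the rank-k basis minimizing the reconstruction error of the centered data (shown here via SVD, without the paper's instance penalties or graph regularizer):

```python
import numpy as np

def pca_reconstruction_error(X, k):
    """PCA as least squares: find a d x k orthonormal basis W minimizing
    ||Xc - Xc W W^T||_F^2, where Xc is the centered data. The optimizer is
    the top-k right singular vectors of Xc."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                                  # d x k projection basis
    error = np.linalg.norm(Xc - Xc @ W @ W.T) ** 2
    return W, error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
W1, e1 = pca_reconstruction_error(X, 1)           # rank-1: residual remains
W5, e5 = pca_reconstruction_error(X, 5)           # full rank: error vanishes
```

A graph-embedding regularizer would add a penalty term over edges of a neighborhood graph to this objective, trading global reconstruction against local geometry.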

SCHEMA: A Discrete Cross-Modal Hashing by Preserving Multiple Similarities

Publication date: Available online 3 September 2019

Source: Pattern Recognition

Author(s): Yongxin Wang, Zhen-Duo Chen, Liqiang Nie, Wei Zhang, Huaxiang Zhang, Xin-Shun Xu

Abstract

Recently, cross-modal hashing has attracted much attention. To learn hash codes, many supervised cross-modal hashing methods construct a large semantic pairwise similarity matrix and reconstruct it from hash codes, which is time-consuming and neglects the similarity between high-level and low-level features. In addition, the binary constraints on hash codes make the optimization problem NP-hard. Most methods relax the binary constraints, leading to large quantization error. To address these issues, we present a novel cross-modal hashing method, i.e., diScrete Cross-modal Hashing by prEserving Multiple similArities, SCHEMA for short. It embeds multiple types of similarity, i.e., high-level and low-level similarity, into the learning of binary codes, so that the codes preserve more of the samples' similarity information from the original space. In addition, to solve the optimization problem, it equivalently transforms the binary constraints into an intersection of two continuous spaces; the problem is then solved with a proposed algorithm without relaxation, avoiding the large quantization error. Moreover, the computational complexity of training is linear in the size of the training set, making SCHEMA scalable to large-scale datasets. Extensive experimental results on four benchmark datasets demonstrate that SCHEMA consistently outperforms state-of-the-art hashing methods by large margins.
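The quantization-error issue the abstract raises can be made concrete with a toy sketch (generic hashing machinery, not SCHEMA's algorithm): a relaxed real-valued embedding must be thresholded to {-1, +1} codes, and the gap between the two is exactly the quantization error that relaxation-free methods avoid:

```python
import numpy as np

rng = np.random.default_rng(1)
# a relaxed, real-valued "hash" embedding, as relaxation-based methods produce
relaxed = rng.normal(size=(6, 16))
binary = np.sign(relaxed)                       # quantize to {-1, +1} codes
quantization_error = np.linalg.norm(relaxed - binary) ** 2

def hamming(bi, bj):
    """Hamming distance between two {-1, +1} codes of length r:
    d_H(b_i, b_j) = (r - b_i . b_j) / 2."""
    return int((len(bi) - bi @ bj) / 2)
```

Because retrieval ultimately compares the binary codes by Hamming distance, any similarity structure captured only by `relaxed` but lost in `binary` degrades retrieval, which is why discrete optimization over the codes themselves matters.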

Realtime Multi-Scale Scene Text Detection with Scale-based Region Proposal Network

Publication date: Available online 3 September 2019

Source: Pattern Recognition

Author(s): Wenhao He, Xu-Yao Zhang, Fei Yin, Zhenbo Luo, Jean-Marc Ogier, Cheng-Lin Liu

Abstract

Multi-scale approaches have been widely used to achieve high accuracy in scene text detection, but they usually slow down the whole system. In this paper, we propose a two-stage framework for realtime multi-scale scene text detection. The first stage employs a novel Scale-based Region Proposal Network (SRPN) that localizes text over a wide scale range and estimates text scale efficiently. Based on SRPN, non-text regions are filtered out and text region proposals are generated. Moreover, using SRPN's scale estimates, small or large text in region proposals is resized into a unified normal scale range. The second stage then adopts a Fully Convolutional Network based scene text detector to localize text words within the proposals from the first stage. This second-stage detector handles only a narrow scale range, but does so accurately. Since most non-text regions are eliminated efficiently by SRPN, and text in proposals is properly scaled to avoid multi-scale pyramid processing, the whole system is quite fast. We evaluate both the performance and the speed of the proposed method on the ICDAR2015, ICDAR2013, and MSRA-TD500 datasets. On ICDAR2015, our system reaches a state-of-the-art F-measure of 85.40% at 16.5 fps (frames per second), and a competitive 79.66% at 35.1 fps, either of which is more than 5 times faster than previous best methods. On ICDAR2013 and MSRA-TD500, we also achieve remarkable speedups while keeping competitive performance. Ablation experiments are provided to validate the design of our method.
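The scale-normalization step between the two stages can be sketched generically: given the first stage's estimated text height, a proposal is resized so text lands in the narrow range the second-stage detector handles (nearest-neighbour resizing and the target height of 32 are illustrative choices, not the paper's):

```python
import numpy as np

def rescale_to_normal(crop, est_height, target_height=32):
    """Resize a grayscale proposal so the estimated text height reaches a
    unified target scale, sparing the detector a multi-scale image pyramid.
    Nearest-neighbour sampling keeps the sketch dependency-free."""
    f = target_height / est_height                 # zoom factor
    h, w = crop.shape
    new_h, new_w = max(1, round(h * f)), max(1, round(w * f))
    ys = np.minimum((np.arange(new_h) / f).astype(int), h - 1)
    xs = np.minimum((np.arange(new_w) / f).astype(int), w - 1)
    return crop[np.ix_(ys, xs)]

crop = np.arange(64, dtype=float).reshape(8, 8)    # toy 8x8 proposal
big = rescale_to_normal(crop, est_height=8)        # text height 8 -> 32
```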

Fast and Robust Template Matching with Majority Neighbour Similarity and Annulus Projection Transformation

Publication date: Available online 30 August 2019

Source: Pattern Recognition

Author(s): Jinxiang Lai, Liang Lei, Kaiyuan Deng, Runming Yan, Yang Ruan, Zhou Jinyun

Abstract

In this paper, a novel fast and robust template matching method named A-MNS, based on Majority Neighbour Similarity (MNS) and the Annulus Projection Transformation (APT), is proposed. Its essence is the MNS, a rotation-invariant, low-computational-cost, and robust similarity measure. The proposed method is theoretically demonstrated and experimentally shown to estimate the rotation angle of the target object and to overcome challenges such as background clutter, occlusion, arbitrary rotation, and non-rigid deformation, while performing fast matching. Empirical results on an up-to-date benchmark show that A-MNS is 4.419 times faster than the state-of-the-art DDIS, and is also competitive in matching accuracy.
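The general idea behind annulus-based rotation invariance can be illustrated with a minimal sketch (a generic ring-mean descriptor, not the paper's exact APT): pooling pixels over concentric annuli around the patch centre yields a descriptor that is unchanged when the patch rotates, since rotation only permutes pixels within each ring:

```python
import numpy as np

def annulus_projection(patch, n_rings=4):
    """Rotation-invariant descriptor: mean pixel value over concentric
    annuli around the patch centre. Rotating the patch permutes pixels
    within each ring, leaving the ring means unchanged."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - cy, xx - cx)                 # radius of each pixel
    edges = np.linspace(0.0, r.max() + 1e-9, n_rings + 1)
    return np.array([patch[(r >= lo) & (r < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

patch = np.arange(49, dtype=float).reshape(7, 7)
desc = annulus_projection(patch)
desc_rot = annulus_projection(np.rot90(patch))     # 90-degree rotation
```

For a 90-degree rotation on a square grid the ring memberships are preserved exactly, so the two descriptors coincide; arbitrary angles are invariant only approximately, due to resampling.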

Enhanced Grassmann Discriminant Analysis with Randomized Time Warping for Motion Recognition

Publication date: Available online 30 August 2019

Source: Pattern Recognition

Author(s): Lincon Souza, Bernardo B. Gatto, Jing-Hao Xue, Kazuhiro Fukui

Abstract

This paper proposes a framework for classifying motion sequences by extending the framework of Grassmann discriminant analysis (GDA). A problem of GDA is that its discriminant space is not necessarily optimal. This limitation becomes even more prominent when utilizing the subspace representation of randomized time warping (RTW). RTW is a sequence representation that can effectively model a motion’s temporal information by a low-dimensional subspace, simplifying the problem of comparing two sequences to that of comparing two subspaces. The key idea of the proposed enhanced GDA is to project class subspaces onto a generalized difference subspace (GDS) before mapping them onto a Grassmann manifold. The GDS projection can remove overlapping components of the subspaces in the vector space, nearly orthogonalizing them. Consequently, a dictionary of orthogonalized class subspaces produces a set of more discriminant data points on the Grassmann manifold than the original set, which further enhances the discriminant ability of GDA. We demonstrate the validity of the proposed framework, RTW+eGDA, through experiments on motion recognition using the publicly available Cambridge gesture, KTH action, and UCF sports datasets.
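Comparing two sequences as subspaces rests on principal angles, the standard quantities behind Grassmann distances; a minimal sketch (generic machinery, not the eGDA projection itself) computes them as the arccosines of the singular values of the product of the two orthonormal bases:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between span(A) and span(B), where A and B have
    orthonormal columns: the singular values of A^T B are the cosines of
    the angles, so identical subspaces give all-zero angles."""
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def orthonormalize(M):
    Q, _ = np.linalg.qr(M)                         # orthonormal basis of span(M)
    return Q

rng = np.random.default_rng(0)
A = orthonormalize(rng.normal(size=(10, 3)))       # a 3-dim subspace of R^10
B = orthonormalize(rng.normal(size=(10, 3)))
angles_self = principal_angles(A, A)               # identical subspaces
angles = principal_angles(A, B)
```

In this picture, the GDS projection described above pushes class subspaces toward mutual orthogonality, i.e., toward larger principal angles between classes, before the Grassmann mapping.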

Population-Guided Large Margin Classifier for High-Dimension Low-Sample-Size Problems

Publication date: Available online 31 August 2019

Source: Pattern Recognition

Author(s): Qingbo Yin, Ehsan Adeli, Liran Shen, Dinggang Shen

Abstract

In this paper, we propose a novel linear binary classifier, denoted the population-guided large margin classifier (PGLMC), applicable to any sort of data, including high-dimensional low-sample-size (HDLSS) data. PGLMC derives its projection direction w from joint consideration of the local structural information of the hyperplane and the statistics of the training samples. Our proposed model has several advantages over widely used approaches. First, it is not sensitive to the intercept term b. Second, it operates well with imbalanced data. Third, it is relatively simple to implement via quadratic programming. Fourth, it is robust to model specification across various real applications. The theoretical properties of PGLMC are proven. We conduct a series of evaluations on simulated data and five real-world benchmark data sets, covering DNA classification, medical image analysis, and face recognition. PGLMC outperforms state-of-the-art classification methods in most cases, or obtains comparable results.
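For orientation, a generic large-margin linear classifier of the family PGLMC belongs to can be sketched as a soft-margin hinge-loss model; for brevity this sketch uses subgradient descent instead of the quadratic programming formulation the abstract mentions, and carries none of the paper's population statistics:

```python
import numpy as np

def hinge_subgradient_fit(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear classifier: minimize the hinge loss
    mean(max(0, 1 - y_i (w.x_i + b))) + lam/2 ||w||^2 by subgradient steps.
    A standard baseline sketch, not the PGLMC formulation."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margin = y * (X @ w + b)
        mask = margin < 1                          # margin-violating samples
        if mask.any():
            grad_w = lam * w - (y[mask, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        else:
            grad_w, grad_b = lam * w, 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)
w, b = hinge_subgradient_fit(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```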

In-Air Handwritten Chinese Text Recognition with Temporal Convolutional Recurrent Network

Publication date: Available online 26 August 2019

Source: Pattern Recognition

Author(s): Ji Gan, Weiqiang Wang, Ke Lu

Abstract

As a new mode of human-computer interaction, in-air handwriting allows users to write in midair with gestures. However, most existing in-air handwriting systems focus on recognizing either isolated characters/words or only a small amount of text, leaving them far from practical application. Here, we present a 3D in-air handwritten Chinese text recognition (IAHCTR) system for the first time, and construct the first public large-scale IAHCT dataset. Moreover, a novel architecture, the temporal convolutional recurrent network (TCRN), is proposed for online handwritten Chinese text recognition (HCTR). Specifically, the TCRN first applies 1-dimensional convolutions to extract local contextual features from low-level trajectories, and then utilizes a recurrent network to capture long-term dependencies in the high-level outputs. Compared with the state-of-the-art architecture, the TCRN not only avoids domain-specific knowledge for feature-image extraction, but also attains higher training efficiency with a more compact model. Empirically, the TCRN also outperforms a single recurrent network, with faster prediction and higher accuracy. Experiments on CASIA-OLHWDB2 & ICDAR-2013 demonstrate that the TCRN yields the best results compared with state-of-the-art methods for online HCTR.
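The convolution-then-recurrence pipeline can be sketched in miniature (illustrative shapes and random weights, nothing from the paper's architecture): a 1-D convolution mixes a local window of trajectory frames, and a plain recurrent pass then aggregates the whole sequence:

```python
import numpy as np

def conv1d(x, kernel):
    """Temporal (1-D) convolution over a trajectory of shape (T, d): each
    output frame is a weighted mix of a local window of input frames,
    extracting local context before the recurrent stage."""
    T, d = x.shape
    k = len(kernel)
    out = np.zeros((T - k + 1, d))
    for t in range(T - k + 1):
        out[t] = np.tensordot(kernel, x[t:t + k], axes=1)
    return out

def simple_rnn(x, W, U):
    """A plain tanh recurrence capturing longer-range dependencies."""
    h = np.zeros(U.shape[0])
    for frame in x:
        h = np.tanh(W @ frame + U @ h)
    return h

rng = np.random.default_rng(0)
traj = rng.normal(size=(20, 2))                    # 20 (x, y) pen positions
local = conv1d(traj, np.array([0.25, 0.5, 0.25]))  # smoothing kernel, width 3
h = simple_rnn(local, rng.normal(size=(4, 2)), rng.normal(size=(4, 4)))
```

Because the convolution consumes raw trajectory points directly, no hand-crafted feature images are needed, which is the efficiency argument the abstract makes.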

Decomposed Slice Sampling for Factorized Distributions

Publication date: Available online 26 August 2019

Source: Pattern Recognition

Author(s): Jiachun Wang, Shiliang Sun

Abstract

Slice sampling automatically adjusts its step size to match the characteristics of the distribution. Although this method has achieved great success in many situations, it becomes limited when the distribution is complex. Inspired by Higdon [1], in this paper we present a decomposed sampling framework based on slice sampling, called decomposed slice sampling (DSS). We suppose that the target distribution can be factorized into two factors so that the information in each can be used separately: the first factor is used in the first step of DSS to obtain horizontal slices, and the second factor is used in the second step. Simulations on four simple distributions indicate the effectiveness of our method. Compared with slice sampling and Hamiltonian Monte Carlo on Gaussian distributions of various dimensions and on ten real-world datasets, the proposed method achieves better performance.
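The baseline that DSS builds on is the standard univariate slice sampler with stepping-out and shrinkage; a minimal sketch (the classic algorithm, not DSS's two-factor decomposition) on a standard normal target:

```python
import math
import random

def slice_sample(logp, x0, n, w=1.0):
    """Univariate slice sampler: draw an auxiliary height under the density,
    step out a bracket of width w until it covers the slice, then shrink the
    bracket until a point inside the slice is drawn."""
    samples, x = [], x0
    for _ in range(n):
        # auxiliary height u ~ Uniform(0, p(x)), kept in log space
        logy = logp(x) + math.log(1.0 - random.random())
        lo = x - w * random.random()               # random initial bracket
        hi = lo + w
        while logp(lo) > logy:                     # step out to the left
            lo -= w
        while logp(hi) > logy:                     # step out to the right
            hi += w
        while True:                                # shrink until accepted
            x1 = random.uniform(lo, hi)
            if logp(x1) > logy:
                x = x1
                break
            if x1 < x:
                lo = x1
            else:
                hi = x1
        samples.append(x)
    return samples

random.seed(0)
draws = slice_sample(lambda x: -0.5 * x * x, 0.0, 2000)  # standard normal
```

The bracket adapts to the local slice width at every step, which is the automatic step-size adjustment the abstract refers to; DSS splits the target's two factors between the height-drawing and bracket-sampling steps.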