Image dehazing with uneven illumination prior by dense residual channel attention network

Existing dehazing methods based on convolutional neural networks estimate the transmission map by treating channel-wise features equally, which lacks the flexibility to handle different types of haze information and limits the representational ability of the network. In addition, scene lights are usually predicted under an even-illumination prior, which fails in many real scenes. To solve these problems, the authors propose a dense residual channel attention network (DRCAN) for estimating the transmission map and an image segmentation strategy for predicting scene lights. Specifically, DRCAN is built from the proposed dense residual block (DRB) and dense residual channel attention block (DRCAB). The DRB extracts hierarchical features with increasing receptive fields, while the DRCAB makes the network focus on features carrying heavy haze information. After the transmission map is estimated, fuzzy partition entropy combined with graph cuts is used to segment it into scene regions covered by varying scene lights. This strategy not only considers the fuzzy intensities of the low-contrast transmission map but also takes spatial correlation into account. Finally, a clear image is recovered from the transmission map and the varying scene lights. Extensive experiments demonstrate that the proposed method is comparable to most existing methods.
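As a concrete illustration of the final restoration step, the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)) can be inverted region by region once the transmission map and the per-region scene lights are available. The following minimal Python sketch assumes the segmentation labels and scene lights have already been computed; function and variable names are illustrative, not the authors' code:

    import numpy as np

    def recover_scene(hazy, t, labels, airlights, t_min=0.1):
        """Invert the atmospheric scattering model I = J*t + A*(1 - t)
        with a separate scene light A_k per segmented region.
        hazy: (H, W, 3) floats in [0, 1]; t: (H, W) transmission map;
        labels: (H, W) region indices; airlights: {label: (3,) array}."""
        J = np.zeros_like(hazy)
        tc = np.clip(t, t_min, 1.0)[..., None]   # avoid division blow-up
        for k, A in airlights.items():
            m = labels == k                       # pixels under scene light k
            J[m] = (hazy[m] - A) / tc[m] + A
        return np.clip(J, 0.0, 1.0)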

Robust segmentation of the colour image by fusing the SDD clustering results from different colour spaces

Segmentation of the colour image is challenging because colour information is lost after being projected into the three channels of a colour space. Many state-of-the-art colour image segmentation methods are based on monochrome segmentation in one channel of a colour space. However, the optimal performance of a segmentation method usually cannot be achieved in a single colour space, due to the complexity and diversity of colour images. In this study, the authors propose to segment the colour image by fusing the slope difference distribution (SDD) clustering results from different colour spaces. For simplicity, the approach is designed as two-label segmentation; it can easily be generalised to multi-label segmentation. The proposed approach is compared with state-of-the-art colour image segmentation methods both quantitatively and qualitatively, and the experimental results verify its effectiveness.
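The fusion step can be illustrated with a simple per-pixel majority vote over the two-label masks produced in each colour space. This is a hypothetical stand-in, as the paper's exact fusion rule may differ, and the SDD clustering itself is assumed to be available separately:

    import numpy as np

    def fuse_two_label_masks(masks):
        """Fuse binary (two-label) segmentations obtained in different
        colour spaces (e.g. channels of RGB, HSV and Lab) by majority vote.
        masks: list of (H, W) boolean/0-1 arrays from SDD clustering."""
        votes = np.sum([m.astype(np.uint8) for m in masks], axis=0)
        return (2 * votes > len(masks)).astype(np.uint8)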

Active contours driven by modified LoG energy term and optimised penalty term for image segmentation

An active contour model for image segmentation is proposed by combining a local binary fitting (LBF) energy function and a modified Laplacian of Gaussian (MLoG) energy function. The MLoG energy function, based on a new boundary indicator function or edge stop function (ESF), is introduced to smooth homogeneous regions and enhance the edge information of objects. The MLoG energy term is incorporated with the LBF energy term to drive the initial contour towards the object boundary. Finally, the penalty term is replaced with a new optimised potential function, which improves the corresponding speed function. Adding the optimised area energy term accelerates the contour towards the object boundary, and the MLoG term based on the new ESF makes the proposed model insensitive to the initial contour. Experiments are performed on various real images, MS-COCO 2014 training set images and Segmentation Evaluation Database images shared on the Weizmann Institute of Science website. The proposed model provides better segmentation results than other state-of-the-art models in terms of segmentation accuracy, F-score and CPU execution time. Experimental results also demonstrate the robustness of the proposed model with respect to contour initialisation, intensity inhomogeneity and noise.
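For orientation, the two ingredients can be sketched with a classical edge stop function and an ESF-weighted LoG response. The paper's modified ESF and the full level-set evolution are not reproduced here, so treat this Python fragment as a generic stand-in:

    import numpy as np
    from scipy import ndimage as ndi

    def edge_stop_function(img, sigma=1.5):
        """Classical edge indicator g = 1 / (1 + |grad(G_sigma * I)|^2);
        the paper's modified ESF differs, this is a generic stand-in."""
        sm = ndi.gaussian_filter(img.astype(float), sigma)
        gy, gx = np.gradient(sm)
        return 1.0 / (1.0 + gx ** 2 + gy ** 2)

    def mlog_term(img, g, sigma=2.0):
        """LoG response weighted by the edge indicator: smooths
        homogeneous regions while keeping the response strong near
        object boundaries (illustrative sigma)."""
        log = ndi.gaussian_laplace(img.astype(float), sigma)
        return g * log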

F2PNet: font-to-painting translation by adversarial learning

For Chinese font images, when all strokes are replaced by pattern elements such as flowers and birds, they become flower–bird character paintings, traditional Chinese art treasures whose creation demands great effort from professional painters. How can such paintings be generated automatically from font images? There is a huge gap between the font domain and the painting domain, and although many image-to-image translation frameworks have been proposed, they cannot handle this situation effectively. In this study, a novel method called the font-to-painting network (F2PNet) is proposed for font-to-painting translation. Specifically, an encoder equipped with dilated convolutions extracts features of the font image, and the features are then fed into a domain translation module that maps the font feature space to the painting feature space. The resulting features are further adjusted by a refinement module and used by the decoder to obtain the target painting. The authors apply an adversarial loss and a cycle-consistency loss to F2PNet, and further propose a loss term, called the recognisability loss, that makes the generated painting retain font-level recognisability. Experiments show that F2PNet is effective and can serve as an unsupervised image-to-image translation framework for further image translation tasks.
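The generator objective described above combines three terms. A hedged PyTorch sketch follows, in which the loss weights and the exact recognisability formulation (here, cross-entropy from a character classifier applied to the generated painting) are assumptions rather than the authors' published settings:

    import torch
    import torch.nn.functional as F

    def f2pnet_generator_loss(d_fake, fake_logits, font_labels,
                              rec_font, font, rec_paint, paint,
                              lam_cyc=10.0, lam_rec=1.0):
        """Sketch of the three loss terms named in the abstract.
        adversarial: fool the painting-domain discriminator;
        cycle: font -> painting -> font should reproduce the input
        (and likewise for the painting direction);
        recognisability: a character classifier should still recognise
        which character the generated painting depicts (assumed form)."""
        adv = F.binary_cross_entropy_with_logits(
            d_fake, torch.ones_like(d_fake))
        cyc = F.l1_loss(rec_font, font) + F.l1_loss(rec_paint, paint)
        rec = F.cross_entropy(fake_logits, font_labels)
        return adv + lam_cyc * cyc + lam_rec * rec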

Novel breast cancer classification framework based on deep learning

Breast cancer is a major cause of mortality amongst women. In this paper, two techniques based on ResNet50 and VGG-16 are utilised and re-trained to recognise two classes rather than 1000 classes, with high accuracy and low computational requirements. In addition, transfer learning and data augmentation are performed to address the lack of labelled data. To achieve better accuracy, a support vector machine (SVM) classifier is used in place of the last fully connected layer. The models' performance is verified using k-fold cross-validation. The proposed techniques are trained and evaluated on three mammographic datasets: the Mammographic Image Analysis Society database, the Digital Database for Screening Mammography (DDSM) and the Curated Breast Imaging Subset of DDSM. This paper presents end-to-end fully convolutional neural networks without any preprocessing or post-processing. The proposed technique employing ResNet50 hybridised with an SVM achieves the best performance, specifically on the DDSM dataset, producing 97.98% accuracy, 98.46% area under the curve, 97.63% sensitivity, 96.51% precision, a 95.97% F1 score and a computational time of 1.8934 s.
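The key architectural change, swapping the final fully connected layer for an SVM on the 2048-dimensional ResNet50 features, can be sketched as follows. This is a minimal illustration using torchvision and scikit-learn; dataset loading, augmentation and fine-tuning are omitted, and the variable names are hypothetical:

    import torch
    import torchvision
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()   # expose the 2048-d features
    backbone.eval()

    @torch.no_grad()
    def extract(images):                # images: (N, 3, 224, 224) tensor
        return backbone(images).numpy()

    # X, y would come from the (augmented) mammography datasets:
    # X, y = extract(images), labels
    # clf = SVC(kernel="rbf")
    # print(cross_val_score(clf, X, y, cv=5))   # k-fold verification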

Multiscale matters for part segmentation of instruments in robotic surgery

A challenging aspect of instrument segmentation in robotic surgery is distinguishing the different parts of the same instrument. Parts with similar textures are common in practical instruments and are difficult to tell apart. In this work, the authors introduce an end-to-end recurrent model that comprises a multiscale semantic segmentation network and a refinement model. Specifically, the semantic segmentation network uniformly transforms input images at multiple scales into a semantic mask, and the refinement model is a single-scale network that recurrently optimises this mask. Through extensive experiments, the authors show that models with multiscale inputs outperform both those that fuse encoded feature maps and those with spatial attention. Furthermore, they verify the effectiveness of the proposed model with state-of-the-art performance on several robotic instrument datasets derived from the MICCAI Endoscopic Vision Challenges.
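A generic multiscale-input forward pass of the kind compared in these experiments might look like the sketch below; the fusion by averaging and the scale set are assumptions, and the recurrent refinement model is omitted:

    import torch
    import torch.nn.functional as F

    def multiscale_forward(net, image, scales=(0.5, 1.0, 1.5)):
        """Run a segmentation net on several rescaled copies of the
        input and fuse the per-scale predictions at the original
        resolution (a generic scheme, not the paper's exact fusion)."""
        h, w = image.shape[-2:]
        logits = []
        for s in scales:
            x = F.interpolate(image, scale_factor=s, mode="bilinear",
                              align_corners=False)
            y = net(x)
            logits.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                        align_corners=False))
        return torch.stack(logits).mean(dim=0)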

Human activity recognition using improved dynamic image

In action recognition, the dynamic image (DI) approach was recently proposed to encode a video into a single still image. Since the DI descriptor depends strongly on the first frames, it cannot capture dynamics that occur later in the video, nor long-term dynamics. On the other hand, most video frames are not informative for action recognition, so the authors' intuition is that representing a video using all of its frames is inefficient. In this study, they therefore propose to remove the redundancy between frames and to extract a set of processed informative images, called key frames, based on information theory. The proposed method extracts a sufficient number of frames regardless of the duration of the action and the position of the informative frames within the video. Building on this method and on DI, they propose a novel key frames dynamic image (KFDI) approach. Experimental results on the popular UCF11, Olympic Sports and J-HMDB datasets show the superiority of the proposed KFDI approach over DI in capturing long-term video dynamics for action recognition; KFDI improves accuracy by 2–6% compared with DI.
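As one plausible reading of the selection step, a frame can be kept only when its difference from the last kept frame carries enough Shannon entropy, i.e. new information. This is a simplified stand-in for the paper's information-theoretic criterion, with an illustrative threshold:

    import numpy as np

    def frame_entropy(frame, bins=64):
        """Shannon entropy of a grey-level histogram."""
        hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
        p = hist / hist.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def select_key_frames(frames, thresh=0.5):
        """Keep a frame only when the entropy of its difference from
        the last kept frame is high, i.e. it adds new information."""
        keys = [frames[0]]
        for f in frames[1:]:
            diff = np.abs(f.astype(int) - keys[-1].astype(int))
            if frame_entropy(diff) > thresh:
                keys.append(f)
        return keys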

Target distance measurement method using monocular vision

Most existing machine vision-based localisation methods focus on spatial positioning schemes that use one or two cameras along with non-vision sensors. To achieve an accurate location, both schemes require processing a large amount of data. In this study, the authors propose a novel method that requires far less data to be processed for measuring target distance using monocular vision. Based on the geometric model of camera imaging, the camera parameters (such as the focal length and the equivalent focal length) and the principle of analogue-to-digital signal conversion, the authors derive the relationship among target distance, field of view, equivalent focal length and camera resolution. Experimental results show that the proposed method can measure the target distance effectively and accurately.
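The underlying geometry reduces to the pinhole relation D = f·W / (n·p), where W is the target's physical width, n the number of pixels it spans and p the pixel pitch (sensor width divided by image width in pixels). A small sketch with illustrative numbers, not the paper's setup:

    def target_distance(real_width_m, pixel_width,
                        focal_mm, sensor_width_mm, image_width_px):
        """Pinhole-camera estimate: a target of physical width W that
        spans n pixels at focal length f satisfies
        D = f * W / (n * pixel_pitch). Millimetres cancel, so the
        result is in the units of real_width_m (here metres)."""
        pixel_pitch_mm = sensor_width_mm / image_width_px
        return focal_mm * real_width_m / (pixel_width * pixel_pitch_mm)

    # e.g. a 1.8 m wide car spanning 120 px with f = 4 mm on a
    # 6.17 mm wide, 4000 px sensor:
    # target_distance(1.8, 120, 4.0, 6.17, 4000)  ->  about 38.9 m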

Multi-modal image fusion based on saliency guidance in the NSCT domain

Image fusion aims at aggregating the redundant and complementary information of multiple source images; the most challenging aspect is to design robust features and a discriminant model that enhance saliency information in the fused image. To address this issue, the authors develop a novel image fusion algorithm that preserves the invariant knowledge of multi-modal images. Specifically, they formulate a unified architecture based on the non-subsampled contourlet transform (NSCT). The method introduces quadtree decomposition and Bezier interpolation to extract crucial infrared features. Furthermore, they propose a saliency-guided phase congruency-based rule and a local Laplacian energy-based rule for fusing the low- and high-pass sub-bands, respectively. In this approach, the fused image not only combines the local and global features of the source images, avoiding smoothing the edges of the target, but also retains fine-scale details and resists the interference noise of the multi-modal images. Both objective assessments and subjective visual comparisons indicate that the proposed algorithm performs competitively in terms of objective evaluation criteria and visual quality.
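The high-pass rule can be illustrated by choosing, per pixel, the sub-band coefficient with the larger local Laplacian energy. The sketch below is a simplified stand-in; the NSCT decomposition itself and the phase congruency-based low-pass rule are assumed to be handled elsewhere:

    import numpy as np
    from scipy import ndimage as ndi

    def fuse_highpass(a, b, size=3):
        """Per pixel, keep the high-pass coefficient (from sub-bands a
        or b of the two source images) with the larger local Laplacian
        energy in a size x size window."""
        ea = ndi.uniform_filter(ndi.laplace(a.astype(float)) ** 2, size)
        eb = ndi.uniform_filter(ndi.laplace(b.astype(float)) ** 2, size)
        return np.where(ea >= eb, a, b)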

Colour image enhancement with brightness preservation and edge sharpening using a heat conduction matrix

In this study, an enhancement process is proposed that applies the heat conduction equation for solids and stagnant fluids to colour images. After colour channel stretching, the RGB image is converted to the HSI model, and the heat conduction equation is applied to each pixel of the I channel. The elements of the resulting feature matrix, called the heat conduction matrix (HCM), can be negative, positive or zero. A small negative HCM value indicates that the pixel's I level should be raised for a good image, whereas a small positive HCM value means that the I level is reduced and aligned with its neighbours. Pixels with large positive or negative values are treated as object edges, and their I levels are left unchanged to preserve the edges. In addition, whether the HCM is negative or positive, the balanced increments and decrements of the I level ensure that the mean brightness is naturally preserved. Finally, the enhanced image is obtained by converting back from the HSI to the RGB colour model. Experimental results show that this method enhances colour image details better than other methods.
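The HCM behaviour described above can be sketched with a discrete Laplacian acting on the normalised I channel: small responses nudge a pixel towards its neighbourhood mean, while large ones mark edges that are left untouched. The step size, threshold and sign convention below are illustrative assumptions, not the paper's values:

    import numpy as np
    from scipy import ndimage as ndi

    def enhance_intensity(I, step=0.1, edge_thresh=0.2):
        """Heat-conduction-style update of the I channel (floats in
        [0, 1]). A positive Laplacian means the pixel is darker than
        its neighbours, so its level is raised; the paper's HCM sign
        convention may differ."""
        hcm = ndi.laplace(I.astype(float))
        update = step * hcm
        update[np.abs(hcm) > edge_thresh] = 0   # protect edge pixels
        return np.clip(I + update, 0.0, 1.0)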