Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python

We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and scaled sparse linear regression) combined with efficient active set selection strategies. Besides, the library allows users to choose different sparsity-inducing regularizers, including the convex $\ell_1$, nonvoncex MCP and SCAD regularizers. The library is coded in \texttt{C++} and has user-friendly R and Python wrappers. Numerical experiments demonstrate that picasso can scale up to large problems efficiently.

Accelerated Alternating Projections for Robust Principal Component Analysis

We study robust PCA for the fully observed setting, which is about separating a low rank matrix $\BL$ and a sparse matrix $\BS$ from their sum $\BD=\BL+\BS$. In this paper, a new algorithm, dubbed accelerated alternating projections, is introduced for robust PCA which significantly improves the computational efficiency of the existing alternating projections proposed in (Netrapalli et al., 2014) when updating the low rank factor. The acceleration is achieved by first projecting a matrix onto some low dimensional subspace before obtaining a new estimate of the low rank matrix via truncated SVD. Exact recovery guarantee has been established which shows linear convergence of the proposed algorithm. Empirical performance evaluations establish the advantage of our algorithm over other state-of-the-art algorithms for robust PCA.

Pyro: Deep Universal Probabilistic Programming

Pyro is a probabilistic programming language built on Python as a platform for developing advanced probabilistic models in AI research. To scale to large data sets and high-dimensional models, Pyro uses stochastic variational inference algorithms and probability distributions built on top of PyTorch, a modern GPU-accelerated deep learning framework. To accommodate complex or model-specific algorithmic behavior, Pyro leverages Poutine, a library of composable building blocks for modifying the behavior of probabilistic programs.

Convergence Rate of a Simulated Annealing Algorithm with Noisy Observations

In this paper we propose a modified version of the simulated annealing algorithm for solving a stochastic global optimization problem. More precisely, we address the problem of finding a global minimizer of a function with noisy evaluations. We provide a rate of convergence and its optimized parametrization to ensure a minimal number of evaluations for a given accuracy and a confidence level close to 1. This work is completed with a set of numerical experimentations and assesses the practical performance both on benchmark test cases and on real world examples.

Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

Kernel $k$-means clustering can correctly identify and extract a far more varied collection of cluster structures than the linear $k$-means clustering algorithm. However, kernel $k$-means clustering is computationally expensive when the non-linear feature map is high-dimensional and there are many input points. Kernel approximation, e.g., the Nystrom method, has been applied in previous works to approximately solve kernel learning problems when both of the above conditions are present. This work analyzes the application of this paradigm to kernel $k$-means clustering, and shows that applying the linear $k$-means clustering algorithm to $\frac{k}{\epsilon} (1 + o(1))$ features constructed using a so-called rank-restricted Nystrom approximation results in cluster assignments that satisfy a $1 + \epsilon$ approximation ratio in terms of the kernel $k$-means cost function, relative to the guarantee provided by the same algorithm without the use of the Nystrom method. As part of the analysis, this work establishes a novel $1 + \epsilon$ relative-error trace norm guarantee for low-rank approximation using the rank-restricted Nystrom approximation. Empirical evaluations on the $8.1$ million instance MNIST8M dataset demonstrate the scalability and usefulness of kernel $k$-means clustering with Nystrom approximation. This work argues that spectral clustering using Nystrom approximation---a popular and computationally efficient, but theoretically unsound approach to non-linear clustering---should be replaced with the efficient and theoretically sound combination of kernel $k$-means clustering with Nystrom approximation. The superior performance of the latter approach is empirically verified.

Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices

Given a large matrix $A\in\mathbb{R}^{n\times d}$, we consider the problem of computing a sketch matrix $B\in\mathbb{R}^{\ell\times d}$ which is significantly smaller than but still well approximates $A$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space, and we are interested in minimizing the covariance error $\|A^TA-B^TB\|_2.$ The popular Frequent Directions algorithm of \cite{liberty2013simple} and its variants achieve optimal space-error tradeoffs. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the question by proving that the time complexity of this problem is equivalent to that of matrix multiplication up to lower order terms. Specifically, we provide new space-optimal algorithms with faster running times and also show that the running times of our algorithms can be improved if and only if the state-of-the-art running time of matrix multiplication can be improved significantly.

Deep Learning in Cardiology

The medical field is creating large amount of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient in solving complicated medical tasks or for creating insights using big data. Deep learning has emerged as a more accurate and effective technology in a wide range of medical problems such as diagnosis, prediction, and intervention. Deep learning is a representation learning method that consists of layers that transform data nonlinearly, thus, revealing hierarchical relationships and structures. In this review, we survey deep learning application papers that use structured data, and signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply in medicine in general, while proposing certain directions as the most viable for clinical use.

Hydrogels for Advanced Stem Cell Therapies: A Biomimetic Materials Approach for Enhancing Natural Tissue Function

Stem-cell-based therapy is a promising approach for the treatment of a myriad of diseases and injuries. However, the low rate of cell survival and the uncontrolled differentiation of the injected stem cells currently remain key challenges in advancing stem cell therapeutics. Hydrogels are biomaterials that are potentially highly effective candidates for scaffold systems for stem cells and other molecular encapsulation approaches to target in vivo delivery. Hydrogel-based strategies can potentially address several current challenges in stem cell therapy. We present a concise overview of the recent advances in applications of hydrogels in stem cell therapies, with a focus particularly on the recent advances in the design and approaches for application of hydrogels in tissue engineering. The capability of hydrogels to either enhance the function of the transplanted stem cells by promoting their controlled differentiation or enhance the recruitment of endogenous adult stem cells to the injury site for repair is also reviewed. Finally, the importance of impacts and the desired relationship between the scaffold system and the encapsulated stem cells are discussed.