

Abstract: Underdetermined or ill-posed inverse problems require additional information to yield sound solutions with tractable optimization algorithms. Sparsity arguments provide effective heuristics to that matter, with numerous applications in signal restoration, image inversion or machine learning. Since the l_{0} count measure is barely tractable, many statistical or learning approaches have invested in computable proxies, such as the l_{1} norm. However, the latter does not exhibit the desirable property of scale invariance for sparse data. Generalizing the SOOT Euclidean/Taxicab l_{1}/l_{2} norm ratio initially introduced for blind deconvolution, we propose SPOQ, a family of smoothed scale-invariant penalty regularizations. It consists of surrogates for l_{p}-over-l_{q} quasi-norm/norm ratios, p ∈ ]0,2[, q ≥ 2, with Lipschitz regularity. These surrogates are embedded into a novel majorize-minimize trust-region approach, generalizing the variable metric forward-backward algorithm. On naturally sparse mass-spectrometry signals, we show that SPOQ significantly outperforms l_{0}, l_{1}, Cauchy and Welsch penalties on several performance measures. Guidelines on SPOQ hyperparameter tuning are also provided, suggesting simple data-driven choices.
The law of parsimony (or Occam's razor^{1}) is an important heuristic principle, or at least a guideline, in history, social and empirical sciences. In modern terms, it can be expressed as a preference for simpler models when they possess, on observed phenomena, a power of explanation comparable to more complex ones. In statistical data processing, it can be used to limit the degrees of freedom of parametric models, to reduce a search space, to define stopping criteria, to bound filter supports, or to simplify signals or images believed to encompass meaningful structures, with the help of sparsifying transformations. For processes that inherently generate sparse observations (spiking neurons, chemical sensing), potentially corrupted by smoothing kernels and noise, sparsity may provide a quantitative target on restored data. In the context of partial observations, it can become a means to select one solution among all potential solutions that are consistent with the observations.
A natural playground for sparsity, in discrete time series analysis, is $c_{00}(\eR)$, the space of almost-zero real sequences, which is closed under finite addition and convolution. Unless stated otherwise, in the following, we consider finite sequences of $c_{00}(\eR)$, $\ex=(x_n)_{1 \leq n \leq N} \in \eR^N$, denoted by bold lower case. The cornerstone measure of parsimony is the count index^{2}, i.e. the number of nonzero terms in $\ex$, denoted by $\lzx{\ex}$. It is also called cardinality function, numerosity measure, parsimony or sparsity. For p ∈ ]0, + ∞[, we define $\lpx{p}{\ex} = \left( \sum_{n=1}^N |x_n|^p \right)^{1/p}$. It satisfies the norm axioms for p ≥ 1. For quasi-norms l_{p}, p < 1, a weaker form of the triangle inequality, $\lpx{p}{\ex+\ey} \le K \left(\lpx{p}{\ex}+\lpx{p}{\ey}\right)$ with K ∈ [1, + ∞[, is satisfied^{3}. Note that the quasi-norm (p < 1) is sometimes called p-norm-like.
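As a quick numerical illustration (a minimal sketch, not part of the original formulation; function names `l0` and `lp` are ours), the count index and the l_{p} (quasi-)norms can be computed as:

```python
import numpy as np

def l0(x, tol=0.0):
    """Count index: number of entries with magnitude above tol."""
    return int(np.sum(np.abs(x) > tol))

def lp(x, p):
    """l_p (quasi-)norm (sum |x_n|^p)^(1/p); a true norm only for p >= 1."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([0.0, 3.0, 0.0, -4.0, 0.0])
print(l0(x))     # -> 2 (two nonzero entries)
print(lp(x, 2))  # -> 5.0 (Euclidean norm)
print(lp(x, 1))  # -> 7.0 (taxicab norm)
```

For p < 1 the same formula is only a quasi-norm: the usual triangle inequality is relaxed to the weaker form with constant K recalled in footnote 3.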
The count measure l_{0} is a piecewise constant function, nonsmooth and nonconvex. It is often considered as unusable for data optimization in large linear systems^{4}, since it leads to NP-hard problems. Under certain drastic conditions, an $\lz$-problem can however be solved exactly using a convex relaxation by surrogate penalties, like the l_{1} norm. In practice, such conditions are rarely met, and the use of the l_{1} norm yields approximate solutions and becomes more heuristic. Mixed-integer programming reformulations using branch-and-bound methods are possible, albeit for relatively small-sized problems.
Norm- or quasi-norm-based penalties have subsequently played an important role in sparse data processing and parsimonious models for high-dimensional regression. The squared Euclidean norm l_{2}^{2} possesses efficient implementations but often heavily degrades data sharpness. As a data fidelity term, the l_{2}^{2} cost function alone cannot address, at the same time, residual noise properties and additional data assumptions. It can be supplemented by different forms of variational regularization, à la Tikhonov. Those act on the composition of data with a well-suited sparsifying operator, e.g. identity, gradients, higher-order derivatives, or wavelet frames. The l_{2}^{2} penalty case corresponds to ridge regression. In the basis pursuit or lasso method (least absolute shrinkage and selection operator), the one-norm or taxicab distance is preferred, as it promotes a form of sparsity on the solution. Solutions to this “l_{2}^{2} fidelity plus l_{1}” regularization problem are related to total variation regularization in image restoration. A convex combination of ridge and lasso regularizations yields the naive elastic net regularization. Other convex l_{p}-norm (p ≥ 1) regularizations have been addressed, as in bridge regression, which interpolates between lasso and ridge. Expecting sparser solutions in practice, nonconvex least-squares plus l_{p} (p < 1) problems have been addressed. Although a priori appealing for sparse data restoration, such problems retain NP-hard complexity. Another caveat of the above norm/quasi-norm penalties, as reasonable proxies for l_{0}, is their scale-variance: norms and quasi-norms satisfy the absolute homogeneity axiom ($\lpx{p}{\lambda\ex}=|\lambda|\lpx{p}{\ex}$, for $\lambda\in \eR$). Either to mimic the 0-degree homogeneity of l_{0}, or to cope with scaling ambiguity, scale-invariant contrast functions were suggested.
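The different behaviors of the ridge and lasso penalties show up clearly in their proximity operators; the following minimal sketch (illustrative only, not the paper's algorithm) contrasts the two shrinkage rules:

```python
import numpy as np

def prox_l1(x, lam):
    """Soft-thresholding, the proximity operator of lam * l1:
    entries smaller than lam are set exactly to zero (sparsity-promoting)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_ridge(x, lam):
    """Proximity operator of (lam/2) * ||.||_2^2: uniform multiplicative
    shrinkage, which never zeroes a coefficient exactly."""
    return x / (1.0 + lam)

x = np.array([3.0, 0.4, -0.2, -5.0])
print(prox_l1(x, 0.5))    # zeros out the two small entries
print(prox_ridge(x, 0.5)) # shrinks everything, keeps all entries nonzero
```

This is why the lasso yields sparse estimates while ridge regression only damps coefficients, and why a convex combination of both gives the elastic net behavior mentioned above.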
Recently, the nonconvex hyperbolic penalty has reemerged as a triad: a useful stopping criterion, a sparsity index and a sparse deconvolution penalty. The latter work (SOOT: Smoothed One-Over-Two norm ratio) also proposed an efficient optimization algorithm with theoretically guaranteed convergence. We now motivate its extension to a broader family of (quasi-)norm ratios $\lpx{p}{\ex}/\lpx{q}{\ex}$ with couples (p, q) ∈ ]0, 2[ × [2, + ∞[, based on both their counting properties and their probabilistic interpretation.
First, we have equivalence relations in finite dimension:
$$\lpx{q}{\ex} \leq \lpx{p}{\ex} \leq \lzx{\ex}^{\frac{1}{p}-\frac{1}{q}}\lpx{q}{\ex} \leq N^{\frac{1}{p}-\frac{1}{q}}\lpx{q}{\ex}
\label{eq_lplq}$$
with p ≤ q, from the standard power-mean inequality implying classical l_{p}-space embeddings and generalized Rogers-Hölder inequalities^{5}. The left-hand side is attained when $\ex$ realizes an instance of the most prototypical sparse signals of $c_{00}(\eR)$, with only one nonzero component. The right-hand side is reached by a maximally non-sparse $\ex$, where all the samples are set to a nonzero constant (the constant N can even be replaced by the cardinality of the actual support of $\ex$). Thus, quasi-norm ratios provide interesting proxies for a sparseness measure of $\ex$, quantifying how much the “action” or “energy” of a discrete signal is concentrated into only a few of its components. They are invariant under integer (circular) shift or sample shuffling in the sequence, and under nonzero scale change (0-degree homogeneity). Those ratios are sometimes termed pq-means.
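This sandwich inequality, and its two extreme cases, can be checked numerically; a small sketch (variable names and the test signal are ours):

```python
import numpy as np

def lp(x, p):
    """l_p (quasi-)norm of a vector."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p, q = 0.5, 2.0
e = 1.0 / p - 1.0 / q   # exponent 1/p - 1/q appearing in the bounds

# A 5-sparse random signal of length 100.
rng = np.random.default_rng(0)
x = np.zeros(100)
x[rng.choice(100, size=5, replace=False)] = rng.standard_normal(5)
k = np.count_nonzero(x)  # l_0 count

# l_q <= l_p <= (l_0)^(1/p-1/q) l_q <= N^(1/p-1/q) l_q
assert lp(x, q) <= lp(x, p) <= k ** e * lp(x, q) <= x.size ** e * lp(x, q)

# Left bound attained by a 1-sparse spike, right bound by a constant signal.
spike = np.zeros(100); spike[0] = 1.0
flat = np.ones(100)
assert np.isclose(lp(spike, p), lp(spike, q))
assert np.isclose(lp(flat, p), 100 ** e * lp(flat, q))
```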
For every p ∈ ]0, 2[ and q ∈ [2, + ∞[, we thus define:
$$\ell_p / \ell_q (\ex) = \left(\sum_{n=1}^N \left(\frac{|x_n|^q}{\sum_{m=1}^N |x_m|^q} \right)^{p/q} \right)^{1/p} \,.
\label{eqlplqexact}$$
It appeared in the context of Jensen's inequalities. If one interprets the inner term
$$p_n = \frac{|x_n|^q}{\sum_{m=1}^N |x_m|^q}$$
as a discrete probability distribution, then l_{p}/l_{q} rewrites as an increasing function (u → u^{1/p}) of a sum of concave functions (u → u^{p/q} when p ≤ q) of probabilities. The minimization of such an additive information cost function, a special case of Schur-concave functionals, is used for instance in best basis selection.
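The probability rewriting can be verified against the direct ratio of (quasi-)norms; a short sketch (our own check, with an arbitrary random vector):

```python
import numpy as np

def lp(x, p):
    """l_p (quasi-)norm of a vector."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

p, q = 1.0, 2.0
rng = np.random.default_rng(1)
x = rng.standard_normal(8)

# Direct ratio of norms.
ratio = lp(x, p) / lp(x, q)

# Probability rewriting: p_n = |x_n|^q / sum_m |x_m|^q,
# then an increasing function (sum p_n^(p/q))^(1/p) of the concave terms.
pn = np.abs(x) ** q / np.sum(np.abs(x) ** q)
ratio_prob = np.sum(pn ** (p / q)) ** (1.0 / p)

assert np.isclose(ratio, ratio_prob)
```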
As a result, special cases of l_{p}/l_{q} quasi-norm ratios have served as sparsity-inducing penalties in the long history of blind signal deconvolution and image deblurring, as well as stopping criteria (for instance in NMF, nonnegative matrix factorization), measures of sparsity satisfying a number of sound parsimony-prone axioms, and estimates of time-frequency spread or concentration. The l_{p} quasi-norm weighted by l_{2} is considered as a “possible sparseness criterion” when scaled with the normalizing factor $N^{\frac{1}{p}-\frac{1}{2}}$. It bears close connection with the kurtosis for centered distributions and central moments, widely used in sparse sensory coding or adaptive filtering.
The most frequent one, with (p, q) = (1, 2), is used as a surrogate to l_{0}. This ratio was recently used to enhance lasso recovery on graphs. The early history of this track includes the “minimum entropy deconvolution” approach, where the “varimax norm”, an adaptation of the kurtosis (l_{4}/l_{2})^{4}, is maximized to yield visually simpler (spikier) signals. It was inspired by simplicity measures proposed in factor analysis, and meant to improve one of the earliest mentioned l_{0} regularizations for robust modeling in seismics. The relationship with the concept of entropy was later explained asymptotically. It was generalized to the so-called “variable norm deconvolution” by maximizing (l_{q}/l_{2})^{q}. Note that these techniques are relatively rudimentary: they aim at finding some inverse filter that maximizes a given contrast, and they do not explicitly take noise statistics into account. Moreover, the deconvolved signal estimate is obtained linearly from observations. More recent works use l_{1}/l_{∞} for sparse recovery, and l_{∞}/l_{0} for cardinality-penalized clustering. Another line of work introduces the family of entropy-based sparsity measures $\left(\ell_q/\ell_1 \right)^{\frac{q}{1-q}}$ (termed q-ratio sparsity level), extending a previous work on squared l_{1}/l_{2} ratios for compressed sensing. Finally, extensions to an l_{q}/l_{2} ratio were proposed to discriminate between sharp and blurry images, and norm ratios have been used for impulsive signature enhancement in sparse filtering, still without rigorous convergence proofs.
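As a small illustration of these contrast functions (a hypothetical sketch, not code from any of the cited works), the varimax-style (l_{4}/l_{2})^{4} and the l_{1}/l_{2} ratio respond oppositely to spiky versus flat signals:

```python
import numpy as np

def lp(x, p):
    """l_p (quasi-)norm of a vector."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

spiky = np.zeros(64); spiky[10] = 1.0   # single spike
flat = np.ones(64) / 8.0                # constant signal, same l2 norm

# Varimax-style contrast (l4/l2)^4: larger for spikier signals,
# which is why minimum entropy deconvolution maximizes it.
varimax = lambda x: (lp(x, 4) / lp(x, 2)) ** 4

# l1/l2 ratio: equals 1 for a single spike and sqrt(N) for a flat
# signal, so it is *minimized* when promoting sparsity.
l1l2 = lambda x: lp(x, 1) / lp(x, 2)

assert varimax(spiky) > varimax(flat)
assert l1l2(spiky) < l1l2(flat)
```

Both quantities are invariant under nonzero rescaling of the signal, which is precisely the 0-degree homogeneity motivating the SPOQ family.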
Our main contribution resides in providing a set of smooth-enough surrogates to the l_{p}/l_{q} ratio, with sufficient Lipschitz regularity. The resulting penalties, called SPOQ, extend the SOOT approach. We also propose a novel trust-region algorithm which generalizes and improves the variable metric forward-backward algorithm. This paper is organized as follows. Section [Introduction] recalled the role of parsimony and sparsity measures. Section [ProposedFormulation] describes the observation model and the proposed SPOQ quasi-norm/norm ratio sparsity regularization. We derive our trust-region minimization algorithm and analyze its convergence guarantees in Section [MinimizationAlgorithm]. Section [Application] illustrates the performance of the proposed SPOQ regularization in the recovery of “naturally sparse” mass spectrometry signals, over l_{0}, l_{1}, Cauchy and Welsch sparsity penalties.
Named after William of Ockham, to whom it is often attributed, and stated as “Entities should not be multiplied without necessity”, or “Non sunt multiplicanda entia sine necessitate”.↩
It is neither a norm nor a quasi-norm. Its pseudo-norm moniker depends on the definition of the subhomogeneity axiom.↩
The best such K is usually called the modulus of concavity of the quasi-norm, and saturates to 1 for norms. For 0 < p ≤ 1, one obtains $\ell_p(\ex+\ey) \le 2^{\frac{1-p}{p}} \big(\ell_p(\ex)+\ell_p(\ey)\big)$.↩
Other denominations are subset selection, minimum weight solution, sparse nullspace or minimum set cover.↩
For finite positive sequences $\ex$ and $\ey$, u, v, w > 0 and 1/w ≤ 1/u + 1/v, we have:
$$\left(\sum_{n=1}^{N}(x_n y_n)^{w}\right)^{1/w} \leq \left(\sum_{n=1}^{N} x_n^{u}\right)^{1/u}\left(\sum_{n=1}^{N} y_n^{v}\right)^{1/v}\,.$$
↩