<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">algorithms</journal-id>
      <journal-title>Algorithms</journal-title>
      <abbrev-journal-title abbrev-type="publisher">Algorithms</abbrev-journal-title>
      <abbrev-journal-title abbrev-type="pubmed">algorithms</abbrev-journal-title>
      <issn pub-type="epub">1999-4893</issn>
      <publisher>
        <publisher-name>MDPI</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/a5040421</article-id>
      <article-id pub-id-type="publisher-id">algorithms-05-00421</article-id>
      <article-categories>
        <subj-group>
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Univariate <italic>L</italic><italic><sup>p</sup></italic> and <italic>l</italic><italic><sup>p</sup></italic> Averaging, 0 &lt; <italic>p</italic> &lt; 1, in Polynomial Time by Utilization of Statistical Structure</article-title>
      </title-group>
	  <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Lavery</surname>
            <given-names>John E.</given-names>
          </name>
          <xref rid="af1-algorithms-05-00421" ref-type="aff">1</xref>
          <xref rid="af2-algorithms-05-00421" ref-type="aff">2</xref>
        </contrib>
      </contrib-group>
      
      <aff id="af1-algorithms-05-00421"><label>1 </label>Mathematical Sciences Division and Computing Sciences Division, Army Research Office, Army Research Laboratory, P.O. Box 12211, Research Triangle Park, NC 27709-2211, USA; Email: <email>john.e.lavery4.civ@mail.mil</email>; Tel.: +1-919-549-4253; Fax: +1-919-549-4354</aff>
      <aff id="af2-algorithms-05-00421"><label>2 </label>Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, NC 27695-7906, USA</aff>
      <pub-date pub-type="epub">
        <day>05</day>
        <month>10</month>
        <year>2012</year>
      </pub-date>
      <pub-date pub-type="collection">
	  <month>12</month>
        <year>2012</year>
      </pub-date>
      <volume>5</volume>
      <issue>4</issue>
      <fpage>421</fpage>
      <lpage>432</lpage>
      <history>
        <date date-type="received">
          <day>28</day>
          <month>07</month>
          <year>2012</year>
        </date>
        <date date-type="rev-recd">
          <day>06</day>
          <month>09</month>
          <year>2012</year>
        </date>
        <date date-type="accepted">
          <day>17</day>
          <month>09</month>
          <year>2012</year>
        </date>
      </history>
      <permissions>
        <copyright-statement>©  2012 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
        <copyright-year>2012</copyright-year>
        <license xmlns:xlink="http://www.w3.org/1999/xlink" license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/3.0/">
          <p>This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (<uri>http://creativecommons.org/licenses/by/3.0/</uri>).</p>
        </license>
      </permissions>
      <abstract>
        <p>We present evidence that one can calculate generically combinatorially expensive <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages, 0 &lt; <italic>p</italic> &lt; 1, in polynomial time by restricting the data to come from a wide class of statistical distributions. Our approach differs from the approaches in the previous literature, which are based on <italic>a priori</italic> sparsity requirements or on accepting a local minimum as a replacement for a global minimum. The functionals by which <italic>L<sup>p</sup></italic> averages are calculated are not convex but are radially monotonic and the functionals by which <italic>l<sup>p</sup></italic> averages are calculated are nearly so, which are the keys to solvability in polynomial time. Analytical results for symmetric, radially monotonic univariate distributions are presented. An algorithm for univariate <italic>l<sup>p</sup></italic> averaging is presented. Computational results for a Gaussian distribution, a class of symmetric heavy-tailed distributions and a class of asymmetric heavy-tailed distributions are presented. Many phenomena in human-based areas are increasingly known to be represented by data that have large numbers of outliers and belong to very heavy-tailed distributions. When tails of distributions are so heavy that even medians (<italic>L</italic><sup>1</sup> and <italic>l</italic><sup>1</sup> averages) do not exist, one needs to consider using <italic>l<sup>p</sup></italic> minimization principles with 0 &lt; <italic>p</italic> &lt; 1.</p>
      </abstract>
      <kwd-group>
        <kwd>average</kwd>
        <kwd>heavy-tailed distribution</kwd>
        <kwd><italic>L<sup>p</sup></italic> average</kwd>
        <kwd><italic>l<sup>p</sup></italic> average</kwd>
        <kwd>median</kwd>
        <kwd>mode</kwd>
        <kwd>polynomial time</kwd>
        <kwd>radial monotonicity</kwd>
        <kwd>statistical structure</kwd>
        <kwd>univariate</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="intro">
      <title>1. Introduction</title>
      <p>Minimization principles based on the <italic>l</italic><sup>1</sup> and <italic>L</italic><sup>1</sup> norms have rapidly become more common in recent years owing to the discovery of their important roles in sparse representation in signal and image processing [<xref ref-type="bibr" rid="B1-algorithms-05-00421">1</xref>,<xref ref-type="bibr" rid="B2-algorithms-05-00421">2</xref>], compressive sensing [<xref ref-type="bibr" rid="B3-algorithms-05-00421">3</xref>,<xref ref-type="bibr" rid="B4-algorithms-05-00421">4</xref>], shape-preserving geometric modeling [<xref ref-type="bibr" rid="B5-algorithms-05-00421">5</xref>,<xref ref-type="bibr" rid="B6-algorithms-05-00421">6</xref>] and robust principal component analysis [<xref ref-type="bibr" rid="B7-algorithms-05-00421">7</xref>,<xref ref-type="bibr" rid="B8-algorithms-05-00421">8</xref>,<xref ref-type="bibr" rid="B9-algorithms-05-00421">9</xref>]. In compressive sensing and sparse representation, it is known that, under proper sparsity conditions (for example, the restricted isometry property [<xref ref-type="bibr" rid="B3-algorithms-05-00421">3</xref>,<xref ref-type="bibr" rid="B4-algorithms-05-00421">4</xref>]), <italic>l</italic><sup>1</sup> solutions are equivalent to “<italic>l</italic><sup>0</sup> solutions”, that is, the sparsest solutions. This result is important because it allows one to find the solution of a combinatorially expensive <italic>l</italic><sup>0</sup> maximum-sparsity minimization problem by a polynomial-time linear programming procedure for minimizing <italic>l</italic><sup>1</sup> functionals. 
When the data follow heavy-tailed statistical distributions and the tails of the distributions are “not too heavy,” various <italic>l</italic><sup>1</sup> minimization principles, in the form of calculation of medians and quantiles, are primary choices that are efficient and robust against the many outliers [<xref ref-type="bibr" rid="B10-algorithms-05-00421">10</xref>,<xref ref-type="bibr" rid="B11-algorithms-05-00421">11</xref>,<xref ref-type="bibr" rid="B12-algorithms-05-00421">12</xref>]. Such distributions correspond to the uncertainty in many human-based phenomena and activities, including the Internet [<xref ref-type="bibr" rid="B13-algorithms-05-00421">13</xref>,<xref ref-type="bibr" rid="B14-algorithms-05-00421">14</xref>], finance [<xref ref-type="bibr" rid="B15-algorithms-05-00421">15</xref>,<xref ref-type="bibr" rid="B16-algorithms-05-00421">16</xref>] and other human and physical phenomena [<xref ref-type="bibr" rid="B16-algorithms-05-00421">16</xref>]. <italic>l</italic><sup>1</sup> minimization principles are applicable also to data from light-tailed distributions such as the Gaussian, but, for such distributions, are less efficient than classical procedures (calculation of standard averages and variances).</p>
      <p>When tails of the distributions are so heavy that even <italic>l</italic><sup>1</sup> minimization principles do not exist, one needs to consider using <italic>l<sup>p</sup></italic> minimization principles with 0 &lt; <italic>p</italic> &lt; 1, a topic on which investigation has recently started [<xref ref-type="bibr" rid="B2-algorithms-05-00421">2</xref>,<xref ref-type="bibr" rid="B3-algorithms-05-00421">3</xref>,<xref ref-type="bibr" rid="B17-algorithms-05-00421">17</xref>,<xref ref-type="bibr" rid="B18-algorithms-05-00421">18</xref>,<xref ref-type="bibr" rid="B19-algorithms-05-00421">19</xref>,<xref ref-type="bibr" rid="B20-algorithms-05-00421">20</xref>]. <italic>l<sup>p</sup></italic> minimization principles, 0 &lt; <italic>p</italic> &lt; 1, are of interest because they produce solutions that are in general sparser, that is, closer to <italic>l</italic><sup>0</sup> solutions, than <italic>l</italic><sup>1</sup> minimization principles [<xref ref-type="bibr" rid="B20-algorithms-05-00421">20</xref>]. However, when 0 &lt; <italic>p</italic> &lt; 1, solving <italic>l<sup>p</sup></italic> minimization principles is generically combinatorially expensive (NP-hard) [<xref ref-type="bibr" rid="B18-algorithms-05-00421">18</xref>], because <italic>l<sup>p</sup></italic> minimization principles can have arbitrarily large numbers of local minima. 
(“Generically” means “in the absence of additional information.”) Investigations about polynomial-time <italic>l<sup>p</sup></italic> minimization, 0 &lt; <italic>p</italic> &lt; 1, have focused on (1) obtaining local rather than global solutions [<xref ref-type="bibr" rid="B2-algorithms-05-00421">2</xref>,<xref ref-type="bibr" rid="B18-algorithms-05-00421">18</xref>,<xref ref-type="bibr" rid="B20-algorithms-05-00421">20</xref>] and (2) achieving a global minimum by restricting the class of problems to those with sufficient sparsity [<xref ref-type="bibr" rid="B3-algorithms-05-00421">3</xref>,<xref ref-type="bibr" rid="B17-algorithms-05-00421">17</xref>,<xref ref-type="bibr" rid="B19-algorithms-05-00421">19</xref>] (the approach used in compressive sensing). However, local solutions often differ strongly from global solutions and sparsity restrictions are often not applicable. The fact that the <italic>l</italic><sup>0</sup> solution is, relative to other potential solutions, the sparsest solution does not imply that this solution is sparse to any specific degree. The sparsest solution may not be sparse in any absolute sense at all; it is just sparser than any other solution.</p>
      <p>The approach that we will investigate in the present paper shares with compressive sensing the strategy of restricting the nature of the problem to achieve polynomial-time performance. However, we do so not by requiring sparsity to some <italic>a priori</italic> set level but rather by restricting the data to come from a wide class of statistical distributions, an approach not previously considered in the literature. This restriction turns out to be mild, often verifiable and often realistic since the problem as posed is often meaningful only when the data come from a statistical distribution. The approach in this paper differs from the approaches in the previous literature on <italic>l<sup>p</sup></italic> minimization principles also in a second way, namely, in that it starts the investigation of <italic>l<sup>p</sup></italic> minimization principles from consideration of their continuum analogues, <italic>L<sup>p</sup></italic> minimization principles.</p>
      <p>The classes of <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> minimization principles that we will investigate in this paper are those that represent univariate continuum <italic>L<sup>p</sup></italic> averaging and discrete <italic>l<sup>p</sup></italic> averaging, defined as follows. Univariate <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages are the real numbers <italic>a</italic> at which the following functionals <italic>A</italic> and <italic>B</italic> achieve their respective global minima:
      <disp-formula id="algorithms-05-00421-i001">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i001.tif"/>
<label>(1)</label>
</disp-formula>

where <italic>ψ</italic> is a probability density function (pdf) that satisfies the conditions given below, and
      <disp-formula id="algorithms-05-00421-i002">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i002.tif"/>
<label>(2)</label>
</disp-formula>
where the <italic>x<sub>i</sub></italic> are data points from the distribution with pdf <italic>ψ</italic>. The pdf <italic>ψ</italic> is assumed to have measurable second derivative and to satisfy the following two conditions:
      <list list-type="bullet">
        <list-item>
          <p>radially strictly monotonically decreasing outwards from the mode (3a)</p>
        </list-item>
        <list-item>
          <p><italic>ψ</italic> and d<italic>ψ</italic>/d<italic>x</italic> bounded by <italic>c</italic>|<italic>x</italic>|<sup>–<italic>β</italic></sup> and <italic>c</italic>|<italic>x</italic>|<sup>–<italic>β</italic>–1</sup>, respectively, for given <italic>c</italic> and <italic>β</italic> &gt; <italic>p</italic> + 1 as <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i003.tif"/> (3b)</p>
        </list-item>
      </list></p>
      <p>Without loss of generality, we assume that the mode, that is, the <italic>x</italic> at which <italic>ψ</italic> achieves its maximum, is at the origin.</p>
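      <p>As an illustration of the discrete functional <italic>B</italic>, the following minimal Python sketch evaluates it on a small sample and locates its minimizer by exhaustive search over the data points. It assumes that functional (2) has the form <italic>B</italic>(<italic>a</italic>) = Σ<sub><italic>i</italic></sub> |<italic>x<sub>i</sub></italic> − <italic>a</italic>|<sup><italic>p</italic></sup>. The names <monospace>lp_functional</monospace> and <monospace>lp_average_bruteforce</monospace> and the five-point sample are hypothetical and chosen only for this illustration. Restricting the search to the data points is a simplification of the sketch.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def lp_functional(a, data, p):
    """Discrete l^p averaging functional, assumed form B(a) = sum_i |x_i - a|^p."""
    return sum(abs(x - a) ** p for x in data)


def lp_average_bruteforce(data, p):
    """Return the data point that minimizes B (exhaustive reference search).

    Assumption of this sketch: only the data points themselves are tested,
    which costs O(I^2) operations for I data points.
    """
    return min(data, key=lambda x: lp_functional(x, data, p))


# Hypothetical sample: four clustered points and one distant outlier.
data = [0.0, 1.0, 2.0, 3.0, 100.0]
mean = sum(data) / len(data)               # arithmetic mean, pulled toward the outlier
lp_avg = lp_average_bruteforce(data, 0.5)  # l^p average with p = 0.5
print(mean, lp_avg)
```
]]></preformat>
      <p>For this sample, the arithmetic mean is 21.2, whereas the minimizer of <italic>B</italic> with <italic>p</italic> = 0.5 is the data point 2, which lies inside the cluster.</p>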
      <p>In a departure from the traditional use of <italic>x</italic> as the independent variable of a univariate pdf, we will express univariate pdfs in radial form with <italic>r</italic> being the radius measured outward from the mode of the distribution. (This notation is chosen to allow natural generalization to higher dimensions in the future.) With the notation <italic>g</italic>(<italic>r</italic>) = <italic>ψ</italic>(–<italic>r</italic>) and <italic>f</italic>(<italic>r</italic>) = <italic>ψ</italic>(<italic>r</italic>), <italic>r</italic> ≥ 0, functional <italic>A</italic> can be rewritten in the form
      <disp-formula id="algorithms-05-00421-i004">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i004.tif"/>
<label>(4)</label>
</disp-formula>

Since functional (4) is finite only when
      <disp-formula id="algorithms-05-00421-i005">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i005.tif"/>
<label>(5)</label>
</disp-formula>

the mean (<italic>L</italic><sup>2</sup> average) does not exist for distributions with <italic>β</italic> ≤ 3 and even the median (<italic>L</italic><sup>1</sup> average) does not exist for distributions with <italic>β</italic> ≤ 2. For example, the median does not exist for the Student <italic>t</italic> distribution with one degree of freedom because <italic>β</italic> = 2 for this distribution. To create meaningful “averages” in these cases, weighted and trimmed sample means have been proposed with success [<xref ref-type="bibr" rid="B21-algorithms-05-00421">21</xref>]. However, weighted and trimmed sample means require <italic>a priori</italic> knowledge of the specific distribution and/or of various parameters, knowledge that is often not available. Minimization of the <italic>L<sup>p</sup></italic> functional (4) or of the <italic>l<sup>p</sup></italic> functional (2) is, when 0 &lt; <italic>p</italic> &lt; min{1, <italic>β</italic>−1}, an alternative for creating an “average” of a heavy-tailed distribution or of a sample thereof.</p>
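      <p>The existence conditions quoted above can be expressed as a small check. The sketch below assumes that the finiteness condition amounts to 0 &lt; <italic>p</italic> &lt; <italic>β</italic> − 1, which is consistent with the statements about the mean and the median. The function names <monospace>lp_average_exists</monospace> and <monospace>admissible_for_heavy_tails</monospace> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def lp_average_exists(p, beta):
    """True when 0 < p < beta - 1 (assumed form of the existence condition)."""
    return 0.0 < p < beta - 1.0


def admissible_for_heavy_tails(p, beta):
    """True when 0 < p < min(1, beta - 1), the range considered for heavy tails."""
    return 0.0 < p < min(1.0, beta - 1.0)


# Tail exponent beta = 2, the value quoted for Student t with one degree of freedom.
beta = 2.0
mean_exists = lp_average_exists(2.0, beta)           # mean, p = 2
median_exists = lp_average_exists(1.0, beta)         # median, p = 1
half_power_ok = admissible_for_heavy_tails(0.5, beta)  # p = 0.5
print(mean_exists, median_exists, half_power_ok)
```
]]></preformat>
      <p>With <italic>β</italic> = 2, the check rejects <italic>p</italic> = 2 and <italic>p</italic> = 1 and accepts <italic>p</italic> = 0.5, in line with the text.</p>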
      <p>In the present paper, we will investigate whether, by providing only the information that the data come from a “standard” statistical distribution that satisfies Conditions (3), the <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averaging functionals <italic>A</italic> and <italic>B</italic> can be minimized in a way that leads to polynomial-time minimization of general <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> functionals. Specifically, in the next two sections, we will investigate to what extent the <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averaging functionals are devoid of local minima other than the global minimum, a key feature in this process. For illustration of the theoretical results, we will present computational results for the following three types of distributions:</p>
      <p><italic>Distribution 1</italic>: Gaussian (light-tailed) distribution with probability density function
      <disp-formula id="algorithms-05-00421-i006">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i006.tif"/>
<label>(6)</label>
</disp-formula></p>

      <p><italic>Distribution 2</italic>: Symmetric heavy-tailed distribution with probability density function
      <disp-formula id="algorithms-05-00421-i007">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i007.tif"/>
<label>(7a)</label>
</disp-formula>

      <disp-formula id="algorithms-05-00421-i008">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i008.tif"/>
<label>(7b)</label>
</disp-formula>

where
      <disp-formula id="algorithms-05-00421-i009">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i009.tif"/>
<label>(7c)</label>
</disp-formula>

(For Distribution 2, the <italic>β</italic> of condition (3b) is <italic>α</italic>.)</p>
      <p><italic>Distribution 3</italic>: Asymmetric heavy-tailed distribution with probability density function
      <disp-formula id="algorithms-05-00421-i010">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i010.tif"/>
<label>(8a)</label>
</disp-formula>

      <disp-formula id="algorithms-05-00421-i011">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i011.tif"/>
<label>(8b)</label>
</disp-formula>

(right tail heavier than left tail), where
      <disp-formula id="algorithms-05-00421-i012">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i012.tif"/>
<label>(8c)</label>
</disp-formula>
(For Distribution 3, the <italic>β</italic> of condition (3b) is (1 + <italic>α</italic>)/2.)</p>
      <p>In Distributions 2 and 3, <italic>α</italic> is a real number &gt; 1. Gaussian Distribution 1 is used to show that the results discussed here are applicable not only to heavy-tailed distributions but also to light-tailed distributions. These results are applicable <italic>a fortiori</italic> to compact distributions with no tails at all (tails uniformly 0). (Analysis and computations were carried out with the uniform distribution and with a pyramidal distribution, two distributions with no tails, but these results will not be discussed here.) While <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages can be calculated for light-tailed and no-tailed distributions, there are more meaningful and more efficient ways, for example, arithmetic averaging, to calculate central points of light-tailed and no-tailed distributions. <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages are most meaningful for heavy-tailed distributions.</p>
    </sec>
    <sec>
      <title>2. <italic>L<sup>p</sup></italic> Averaging</title>
      <p>We present in <xref ref-type="fig" rid="algorithms-05-00421-f001">Figure 1</xref>, <xref ref-type="fig" rid="algorithms-05-00421-f002">Figure 2</xref> and <xref ref-type="fig" rid="algorithms-05-00421-f003">Figure 3</xref> the functionals <italic>A</italic>(<italic>a</italic>) for Distributions 1–3, respectively, for various <italic>p</italic>. These functionals <italic>A</italic>(<italic>a</italic>) have one global minimum at or near <italic>a</italic> = 0, have no additional minima, are convex in a neighborhood of the global minimum and are concave outside of this neighborhood. The fact that the <italic>A</italic>(<italic>a</italic>) are not globally convex is not important. Each <italic>A</italic>(<italic>a</italic>) is radially monotonically increasing outward from its minimum, which is sufficient to guarantee that there is only one global minimum and that there are no other local minima. On every finite closed interval in <xref ref-type="fig" rid="algorithms-05-00421-f001">Figure 1</xref>, <xref ref-type="fig" rid="algorithms-05-00421-f002">Figure 2</xref> and <xref ref-type="fig" rid="algorithms-05-00421-f003">Figure 3</xref> that does not include the global minimum, the derivative d<italic>A</italic>/d<italic>a</italic> is bounded away from 0. Hence, in all these cases, standard line-search methods converge to the global minimum in polynomial time. The structure of <italic>A</italic>(<italic>a</italic>) seen in <xref ref-type="fig" rid="algorithms-05-00421-f001">Figure 1</xref>, <xref ref-type="fig" rid="algorithms-05-00421-f002">Figure 2</xref> and <xref ref-type="fig" rid="algorithms-05-00421-f003">Figure 3</xref> is due to the fact that <italic>A</italic>(<italic>a</italic>) is based on a probability density function with strictly monotonically decreasing density in the radial directions outward from the mode. 
This structure does not generically occur for density functions <italic>f</italic>(<italic>r</italic>) and <italic>g</italic>(<italic>r</italic>) representing, for example, irregular scattered clusters. However, averaging in general and <italic>L<sup>p</sup></italic> averaging in particular make little sense when the data are clustered irregularly. The computational results presented in <xref ref-type="fig" rid="algorithms-05-00421-f001">Figure 1</xref>, <xref ref-type="fig" rid="algorithms-05-00421-f002">Figure 2</xref> and <xref ref-type="fig" rid="algorithms-05-00421-f003">Figure 3</xref> suggest the hypothesis that, under “normal” statistical conditions on the data, <italic>L<sup>p</sup></italic> averaging is well posed and computationally tractable. In the remainder of this section, we will investigate portions of this hypothesis.</p>
      <fig id="algorithms-05-00421-f001" position="anchor">
        <label>Figure 1</label>
        <caption>
          <p><italic>L<sup>p</sup></italic> averaging functional <italic>A</italic>(<italic>a</italic>) for Gaussian Distribution 1.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-g001.tif"/>
      </fig>
      <fig id="algorithms-05-00421-f002" position="anchor">
        <label>Figure 2</label>
        <caption>
          <p><italic>L<sup>p</sup></italic> averaging functional <italic>A</italic>(<italic>a</italic>) for symmetric heavy-tailed Distribution 2 with <italic>α</italic> = 2.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-g002.tif"/>
      </fig>
      <fig id="algorithms-05-00421-f003" position="anchor">
        <label>Figure 3</label>
        <caption>
          <p><italic>L<sup>p</sup></italic> averaging functional <italic>A</italic>(<italic>a</italic>) for asymmetric heavy-tailed Distribution 3 with <italic>α</italic> = 2.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-g003.tif"/>
      </fig>
      <p>The structure of the <italic>L<sup>p</sup></italic> averaging functional <italic>A</italic>(<italic>a</italic>) seen in <xref ref-type="fig" rid="algorithms-05-00421-f001">Figure 1</xref>, <xref ref-type="fig" rid="algorithms-05-00421-f002">Figure 2</xref> and <xref ref-type="fig" rid="algorithms-05-00421-f003">Figure 3</xref> and described in the previous paragraph occurs for all symmetric distributions, a situation that can be shown as follows. For symmetric distributions (that is, those for which <italic>g</italic>(<italic>r</italic>) = <italic>f</italic>(<italic>r</italic>)), the <italic>L<sup>p</sup></italic> averaging functional <italic>A</italic>(<italic>a</italic>) can be written as
      <disp-formula id="algorithms-05-00421-i016">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i016.tif"/>
<label>(9)</label>
</disp-formula>

      <italic>A</italic>(<italic>a</italic>) is symmetric around <italic>a</italic> = 0, so we need consider only the behavior of <italic>A</italic>(<italic>a</italic>) for <italic>a</italic> ≥ 0. For <italic>a</italic> ≥ 0,
      <disp-formula id="algorithms-05-00421-i017">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i017.tif"/>
<label>(10)</label>
</disp-formula>

      and
      <disp-formula id="algorithms-05-00421-i019">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i019.tif"/>
<label>(11)</label>
</disp-formula></p>

      <p>One computes expressions (10) and (11) by differentiating the right sides of expressions (9) and (10), respectively, with respect to <italic>a</italic>. One expresses the integral to be differentiated as the sum of an integral on (0,<italic>a</italic>) and an integral on (<italic>a</italic>,∞) and differentiates these two integrals separately. To simplify d<italic>A</italic>/d<italic>a</italic> to the form given in (10), one integrates by parts and combines the two resulting integrals. From these expressions, one obtains first that d<italic>A</italic>/d<italic>a</italic>(0) = 0 and d<sup>2</sup><italic>A</italic>/d<italic>a</italic><sup>2</sup>(0) &gt; 0, that is, there is a local minimum at <italic>a</italic> = 0 and second that, for all <italic>a</italic> &gt; 0, d<italic>A</italic>/d<italic>a</italic>(<italic>a</italic>) &gt; 0, that is, <italic>A</italic> is strictly monotonically increasing for <italic>a</italic> &gt; 0. Thus, for symmetric pdfs, <italic>A</italic>(<italic>a</italic>) has its global minimum at <italic>a</italic> = 0, that is, the <italic>L<sup>p</sup></italic> average exists and is equal to the mode of the distribution. There are no places where d<italic>A</italic>/d<italic>a</italic> = 0 other than at <italic>a</italic> = 0 and, on every finite closed interval that does not include the mode 0, d<italic>A</italic>/d<italic>a</italic> is bounded away from 0. Standard line-search methods for calculating the minimum of this <italic>A</italic>(<italic>a</italic>) are thus globally convergent.</p>
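      <p>The monotonicity of <italic>A</italic>(<italic>a</italic>) for a symmetric pdf can also be checked numerically. The following minimal sketch assumes that the continuum functional has the form <italic>A</italic>(<italic>a</italic>) = ∫ |<italic>x</italic> − <italic>a</italic>|<sup><italic>p</italic></sup> <italic>ψ</italic>(<italic>x</italic>) d<italic>x</italic> and uses a Gaussian pdf with zero mean and unit variance as the symmetric example. The midpoint rule, the truncation of the integral to [−10, 10] and the names <monospace>gaussian_pdf</monospace> and <monospace>continuum_functional</monospace> are choices made for this illustration, not part of the analysis above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def gaussian_pdf(x):
    """Standard normal density (zero mean, unit variance; an assumption of this sketch)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def continuum_functional(a, p, pdf=gaussian_pdf, half_width=10.0, n=20000):
    """Midpoint-rule approximation of A(a) = integral of |x - a|^p * pdf(x) dx.

    The integral is truncated to [-half_width, half_width], which is adequate
    for the Gaussian example but not for heavy-tailed densities.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * h   # midpoint of the k-th cell
        total += abs(x - a) ** p * pdf(x)
    return total * h


# Sample A(a) at a = 0, 0.5, ..., 3 for p = 0.5 and test for strict increase.
values = [continuum_functional(0.5 * k, 0.5) for k in range(7)]
increasing = all(u < v for u, v in zip(values, values[1:]))
print(values, increasing)
```
]]></preformat>
      <p>In this sketch the sampled values increase strictly with <italic>a</italic> ≥ 0 and the smallest value occurs at <italic>a</italic> = 0, the mode of the Gaussian.</p>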
      <p>A general analytical structure for asymmetric distributions analogous to that described above for symmetric distributions is not yet available because, for asymmetric distributions, the properties of <italic>A</italic>(<italic>a</italic>) depend on additional properties of the probability density functions <italic>f</italic>(<italic>r</italic>) and <italic>g</italic>(<italic>r</italic>) that have not yet been clarified. Most of the previous statistical research about two-tailed distributions that extend infinitely in each direction has been focused on symmetric distributions and it is the symmetric case on which we will focus in the remainder of this paper.</p>
    </sec>
    <sec>
      <title>3. <italic>l<sup>p</sup></italic> Averaging</title>
      <p>It is meaningful to calculate an <italic>l<sup>p</sup></italic> average of a discrete set of data, that is, the point at which <italic>B</italic>(<italic>a</italic>) achieves its global minimum, only for data from a distribution that satisfies Conditions (3) and for which the <italic>L<sup>p</sup></italic> average exists, that is, for which 0 &lt; <italic>p</italic> &lt; <italic>β</italic> − 1. We propose the following algorithm.</p>
      <p><italic>Algorithm 1</italic>: Algorithm for <italic>l<sup>p</sup></italic> Averaging</p>
      <p>STEP 1. Sort the data <italic>x<sub>i</sub></italic>, <italic>i</italic> = 1, 2, . . . , <italic>I</italic>, from smallest to largest. (To avoid proliferation of notation, use the same notation <italic>x<sub>i</sub></italic>, <italic>i</italic> = 1, 2, . . . , <italic>I</italic>, for the data after sorting as before.)</p>
      <p>STEP 2. Choose an integer <italic>q</italic> that represents the number of neighbors of a given point in the sorted data set in each direction (lower and higher index) that will be included in a local set of indices to be used in the “window” in Step 4. (The “window size” is thus 2<italic>q</italic> + 1.)</p>
      <p>STEP 3. Choose a point <italic>x<sub>j</sub></italic> from which to start. (The median of the data, that is, the <italic>l</italic><sup>1</sup> average, is generally a good choice for the initial <italic>x<sub>j</sub></italic>.)</p>
      <p>STEP 4. For each <italic>k</italic>, <italic>j</italic> − <italic>q</italic> ≤ <italic>k</italic> ≤ <italic>j</italic> + <italic>q</italic>, calculate <italic>B</italic>(<italic>x<sub>k</sub></italic>).</p>
      <p>STEP 5. If the <italic>x<sub>k</sub></italic> that yields the minimum of the <italic>B</italic>(<italic>x<sub>k</sub></italic>) calculated in Step 4 is <italic>x<sub>j</sub></italic>, stop. In this case, <italic>x<sub>j</sub></italic> is the computed <italic>l<sup>p</sup></italic> average of the data. Otherwise, let <italic>x<sub>k</sub></italic> be a new <italic>x<sub>j</sub></italic> and return to Step 4.</p>
      <p>STEP 6. If convergence has not occurred within a predetermined number of iterations, stop and return an error message.</p>
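      <p>Steps 1 to 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes <italic>B</italic>(<italic>a</italic>) = Σ<sub><italic>i</italic></sub> |<italic>x<sub>i</sub></italic> − <italic>a</italic>|<sup><italic>p</italic></sup>, clips the window at the two ends of the sorted data, keeps the current point when values of <italic>B</italic> tie, and uses a default iteration limit of <italic>I</italic> + 1. The names <monospace>lp_functional</monospace>, <monospace>lp_average_windowed</monospace>, <monospace>start</monospace> and <monospace>max_iter</monospace> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def lp_functional(a, data, p):
    """Assumed form of the discrete functional: B(a) = sum_i |x_i - a|^p."""
    return sum(abs(x - a) ** p for x in data)


def lp_average_windowed(data, p, q=3, start=None, max_iter=None):
    """Windowed search over the sorted data points, following Steps 1 to 6.

    start is an index into the sorted data; by default the middle index
    (near the median) is used. max_iter defaults to len(data) + 1.
    """
    xs = sorted(data)                                # Step 1: sort the data
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    j = n // 2 if start is None else start           # Step 3: starting point
    if max_iter is None:
        max_iter = n + 1                             # assumed default limit
    for _ in range(max_iter):
        lo, hi = max(0, j - q), min(n - 1, j + q)    # Step 2: window, clipped at the ends
        best, best_val = j, lp_functional(xs[j], xs, p)
        for k in range(lo, hi + 1):                  # Step 4: evaluate B in the window
            if k == j:
                continue
            val = lp_functional(xs[k], xs, p)
            if val < best_val:                       # ties keep the current point
                best, best_val = k, val
        if best == j:                                # Step 5: x_j is the window minimum
            return xs[j]
        j = best                                     # Step 5: move and repeat Step 4
    raise RuntimeError("no convergence within max_iter iterations")  # Step 6


# Hypothetical sample; start at index 4, the outlier in the right tail.
data = [0.0, 1.0, 2.0, 3.0, 100.0]
result = lp_average_windowed(data, 0.5, q=1, start=4)
print(result)
```
]]></preformat>
      <p>For this five-point sample with <italic>p</italic> = 0.5 and <italic>q</italic> = 1, the sketch moves from 100 to 3 to 2 and stops at 2. Because each move strictly decreases <italic>B</italic>, the sketch visits each data point at most once.</p>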
      <p><italic>Remark 1</italic>. Algorithm 1 considers the values of <italic>B</italic>(<italic>a</italic>) only at the data points <italic>x<sub>i</sub></italic> and not between data points. For <italic>a</italic> strictly between two consecutive data points <italic>x<sub>i</sub></italic> and <italic>x<sub>i</sub></italic><sub>+1</sub>, <italic>B</italic>(<italic>a</italic>) is concave and is above the line connecting (<italic>x<sub>i</sub></italic>,<italic>B</italic>(<italic>x<sub>i</sub></italic>)) and (<italic>x<sub>i</sub></italic><sub>+1</sub>,<italic>B</italic>(<italic>x<sub>i</sub></italic><sub>+1</sub>)), so a minimum cannot occur there. It is sufficient, therefore, to consider only the values of <italic>B</italic> at the points <italic>x<sub>i</sub></italic> when searching for a minimum. A graph of the points (<italic>x<sub>i</sub></italic>,<italic>B</italic>(<italic>x<sub>i</sub></italic>)), <italic>i</italic> = 1, 2, . . . , <italic>I</italic>, approximates the graph of the continuum <italic>L<sup>p</sup></italic> functional <italic>A</italic>(<italic>a</italic>), which, for symmetric distributions, has only one local minimum, namely, its global minimum. The graph of the points (<italic>x<sub>i</sub></italic>,<italic>B</italic>(<italic>x<sub>i</sub></italic>)) may have some relatively shallow local minima produced by the irregular spacing of the <italic>x<sub>i</sub></italic> (cf. <xref ref-type="fig" rid="algorithms-05-00421-f004">Figure 4</xref> below) and/or the asymmetry of the distribution. The window structure of Algorithm 1 is designed to allow the algorithm to “jump over” these local minima on its way to the global minimum.</p>
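      <p>The concavity statement of Remark 1 can be illustrated numerically. The sketch below again assumes <italic>B</italic>(<italic>a</italic>) = Σ<sub><italic>i</italic></sub> |<italic>x<sub>i</sub></italic> − <italic>a</italic>|<sup><italic>p</italic></sup> and measures how far <italic>B</italic> lies above the chord between two consecutive data points. The function name <monospace>chord_gap</monospace> and the sample are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def lp_functional(a, data, p):
    """Assumed form of the discrete functional: B(a) = sum_i |x_i - a|^p."""
    return sum(abs(x - a) ** p for x in data)


def chord_gap(xs, p, i, t):
    """B(a) minus the chord through (x_i, B(x_i)) and (x_{i+1}, B(x_{i+1})).

    The evaluation point is a = (1 - t) * x_i + t * x_{i+1} with 0 <= t <= 1.
    A positive value means that B lies above the chord at a.
    """
    left, right = xs[i], xs[i + 1]
    a = (1.0 - t) * left + t * right
    chord = (1.0 - t) * lp_functional(left, xs, p) + t * lp_functional(right, xs, p)
    return lp_functional(a, xs, p) - chord


# Hypothetical sorted sample with distinct points; p = 0.5.
xs = sorted([0.0, 1.0, 2.0, 3.0, 100.0])
gaps = [chord_gap(xs, 0.5, i, t)
        for i in range(len(xs) - 1)
        for t in (0.25, 0.5, 0.75)]
print(min(gaps))
```
]]></preformat>
      <p>Every gap in this example is positive, so between consecutive data points <italic>B</italic> stays above the chord and its minimum over each closed interval is attained at an endpoint.</p>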
      <p><italic>Remark 2</italic>. The cost of Algorithm 1 is polynomial, namely, the cost <italic>O</italic>(<italic>I</italic> log <italic>I</italic>) of the sorting operation of Step 1 plus the cost of the iterations of Step 4, namely, <italic>O</italic>(<italic>I</italic><sup>2</sup>) (= the number of iterations, which cannot exceed <italic>O</italic>(<italic>I</italic>), times the cost <italic>O</italic>(<italic>I</italic>) of calculating each iteration). Analogous algorithms for higher-dimensional averages are expected to retain this polynomial-time nature.</p>
      <p>In computational experiments, we used samples of size <italic>I</italic> = 2000 from the symmetric heavy-tailed Distribution 2 with various <italic>α</italic>, 1 &lt; <italic>α</italic> ≤ 3, and window sizes 2<italic>q</italic> + 1 = 7, 9, 11, . . . , 25. For comparison with <xref ref-type="fig" rid="algorithms-05-00421-f002">Figure 2</xref>, we present in <xref ref-type="fig" rid="algorithms-05-00421-f004">Figure 4</xref> the graphs of the points (<italic>x<sub>i</sub></italic>, <italic>B</italic>(<italic>x<sub>i</sub></italic>)) for the sample from Distribution 2 with <italic>α</italic> = 2 and <italic>p</italic> = 0.5 and 0.02. The starting point for Step 3 of Algorithm 1 was chosen to be <italic>x</italic><sub><italic>I</italic>−2<italic>q</italic></sub>, a point near the end of the right tail (beyond the limited domains shown in <xref ref-type="fig" rid="algorithms-05-00421-f004">Figure 4</xref>). As mentioned in Step 3 of Algorithm 1, the median of the data is a much better choice for a starting point. However, choosing a point near the right tail makes the iterations of Algorithm 1 traverse a large distance before converging to an approximation of the <italic>l<sup>p</sup></italic> average and thus provides a demanding test of the robustness of Algorithm 1. Computational results for <italic>p</italic> = 0.5, 0.1 and 0.02 and for window sizes 2<italic>q</italic> + 1 = 7, 13, 19 and 25 are presented in <xref ref-type="table" rid="algorithms-05-00421-t001">Table 1</xref>, <xref ref-type="table" rid="algorithms-05-00421-t002">Table 2</xref>, <xref ref-type="table" rid="algorithms-05-00421-t003">Table 3</xref> and <xref ref-type="table" rid="algorithms-05-00421-t004">Table 4</xref>. For reference, we note that the continuum <italic>L<sup>p</sup></italic> averages of Distribution 2, when they exist, that is, when <italic>p</italic> &lt; <italic>α</italic> − 1, are all 0. Thus, the errors of the <italic>l<sup>p</sup></italic> averages in <xref ref-type="table" rid="algorithms-05-00421-t001">Table 1</xref>, <xref ref-type="table" rid="algorithms-05-00421-t002">Table 2</xref>, <xref ref-type="table" rid="algorithms-05-00421-t003">Table 3</xref> and <xref ref-type="table" rid="algorithms-05-00421-t004">Table 4</xref> are the same as the <italic>l<sup>p</sup></italic> averages themselves.</p>
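      <p>An experiment of this kind can be set up along the following lines. The Python sketch below is a scaled-down illustration under stated assumptions, not the code used for the tables. The sample comes from a stand-in symmetric distribution with density proportional to (1 + |<italic>x</italic>|)<sup>−<italic>α</italic></sup>, which is not claimed to be Distribution 2. The functional is assumed to be <italic>B</italic>(<italic>a</italic>) = Σ<sub><italic>i</italic></sub> |<italic>x<sub>i</sub></italic> − <italic>a</italic>|<sup><italic>p</italic></sup>, and the windowed descent is a hypothetical stand-in for Algorithm 1. The sample size 500, the seed and all function names are likewise assumptions.</p>
      <preformat><![CDATA[
```python
import random


def lp_functional(a, xs, p):
    """B(a) = sum_i |x_i - a|**p (assumed form)."""
    return sum(abs(x - a) ** p for x in xs)


def heavy_tailed_sample(size, alpha, rng):
    """Symmetric stand-in with density proportional to (1 + |x|)**(-alpha), alpha > 1."""
    out = []
    for _ in range(size):
        u = 1.0 - rng.random()                       # u lies in (0, 1]
        magnitude = u ** (-1.0 / (alpha - 1.0)) - 1.0
        out.append(magnitude if rng.random() < 0.5 else -magnitude)
    return out


def windowed_search(xs, p, q, start):
    """Hypothetical windowed descent on sorted xs; returns (index, iterations)."""
    cache = {}

    def b(j):
        # Each B(x_j) is computed at most once.
        if j not in cache:
            cache[j] = lp_functional(xs[j], xs, p)
        return cache[j]

    k, iterations = start, 0
    while True:
        iterations += 1
        window = range(max(0, k - q), min(len(xs) - 1, k + q) + 1)
        j = min(window, key=b)
        if b(j) >= b(k):                             # no smaller B in the window
            return k, iterations
        k = j


rng = random.Random(0)
size, alpha, p, q = 500, 2.0, 0.5, 12                # window size 2q + 1 = 25
xs = sorted(heavy_tailed_sample(size, alpha, rng))

tail_start = size - 2 * q - 1                        # 0-based index of x_{I-2q}
k_tail, it_tail = windowed_search(xs, p, q, tail_start)
k_med, it_med = windowed_search(xs, p, q, size // 2)
print(xs[k_tail], it_tail, xs[k_med], it_med)
```
]]></preformat>
      <p>Comparing the two printed iteration counts gives a rough feel for how much farther the search travels from the right-tail start than from the median start. The printed values depend on the stand-in distribution and the seed and should not be compared with the entries of the tables.</p>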
      <p>The entries in <xref ref-type="table" rid="algorithms-05-00421-t001">Table 1</xref>, <xref ref-type="table" rid="algorithms-05-00421-t002">Table 2</xref>, <xref ref-type="table" rid="algorithms-05-00421-t003">Table 3</xref> and <xref ref-type="table" rid="algorithms-05-00421-t004">Table 4</xref> indicate that, for all cases with <italic>p</italic> &lt; <italic>α</italic> − 1, the <italic>l<sup>p</sup></italic> average computed by Algorithm 1 is an excellent approximant of the <italic>L<sup>p</sup></italic> average 0 given the large number of outliers and the huge spread of the data in Distribution 2. (For <italic>α</italic> = 3 and <italic>α</italic> = 1.02, the ranges of the data are [−16.0, 22.6] and [−6.44 × 10<sup>154</sup>, 5.02 × 10<sup>169</sup>], respectively. For <italic>α</italic> = 2, 1.5, 1.1, 1.05, 1.04 and 1.03, the ranges are between these two ranges.) The entries for <italic>p</italic> = 0.5 with <italic>α</italic> = 1.5 and for <italic>p</italic> = 0.1 with <italic>α</italic> = 1.1, 1.05, 1.04 and 1.03 in <xref ref-type="table" rid="algorithms-05-00421-t001">Table 1</xref> and <xref ref-type="table" rid="algorithms-05-00421-t002">Table 2</xref> indicate that, in a few cases when <italic>p</italic> is equal to or only slightly greater than <italic>α</italic> − 1, the <italic>l<sup>p</sup></italic> average yielded by Algorithm 1 can still be a good approximant of the center of the distribution in spite of the fact that the <italic>l<sup>p</sup></italic> average is theoretically meaningful only when <italic>p</italic> &lt; <italic>α</italic> − 1. The entries for <italic>p</italic> = 0.5 with <italic>α</italic> = 1.1, 1.05, 1.04, 1.03 and 1.02 and for <italic>p</italic> = 0.1 with <italic>α</italic> = 1.02 indicate that, in accordance with expectations, when <italic>p</italic> is significantly greater than <italic>α</italic> − 1, the <italic>l<sup>p</sup></italic> average produced by Algorithm 1 is not a meaningful approximant of the center of the distribution. Since a larger window size helps the algorithm “jump over” local minima, it is expected that the <italic>l<sup>p</sup></italic> averages should converge to the <italic>L<sup>p</sup></italic> average 0 as the window size 2<italic>q</italic> + 1 increases (and as the sample size increases). The results in <xref ref-type="table" rid="algorithms-05-00421-t001">Table 1</xref>, <xref ref-type="table" rid="algorithms-05-00421-t002">Table 2</xref>, <xref ref-type="table" rid="algorithms-05-00421-t003">Table 3</xref> and <xref ref-type="table" rid="algorithms-05-00421-t004">Table 4</xref> confirm that, for the samples used in these calculations, increasing the window size does indeed increase the accuracy of the <italic>l<sup>p</sup></italic> averages as approximations of the <italic>L<sup>p</sup></italic> average 0. In addition, the results in <xref ref-type="table" rid="algorithms-05-00421-t003">Table 3</xref> and <xref ref-type="table" rid="algorithms-05-00421-t004">Table 4</xref> for <italic>p</italic> &lt; <italic>α</italic> − 1 show that, for the samples used in these calculations, there is an optimal window size, namely, 2<italic>q</italic> + 1 = 19 (<italic>q</italic> = 9), that produces <italic>l<sup>p</sup></italic> averages just as good as those produced by the larger window size 2<italic>q</italic> + 1 = 25 (<italic>q</italic> = 12) but, because the window is smaller, requires less computational effort.</p>
	  <fig id="algorithms-05-00421-f004" position="anchor">
        <label>Figure 4</label>
        <caption>
          <p>Points (<italic>x<sub>i</sub></italic>, <italic>B</italic>(<italic>x<sub>i</sub></italic>)) for 2000-point sample from symmetric heavy-tailed Distribution 2 with <italic>α</italic> = 2.</p>
        </caption>
        <graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-g004.tif"/>
      </fig>
      
      <p>Algorithm 1 is applicable to heavy-tailed distributions in general but the rule for choosing <italic>q</italic> will certainly be dependent on the specific class of distributions under consideration. While this rule is not yet known precisely, we can provide here a description of the principles that will likely be the foundations for the rule. The choice of <italic>q</italic> is related to how wide the local minima in the discrete functional <italic>B</italic> are. The local minima of <italic>B</italic> occur at places where there are clusters of data points (due to expected statistical variation in the sample). Understanding the relationships between (1) the clustering properties of samples from the given class of distributions, (2) the widths of the local minima as functions of the clustering and (3) the <italic>p</italic>-dependent analytical properties of functional <italic>B</italic> will likely yield the rule for choosing <italic>q</italic>.</p>
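      <p>The link between the widths of the local minima and the choice of <italic>q</italic> can be explored empirically. The Python sketch below is a diagnostic under stated assumptions, not the rule sought in the text. It assumes <italic>B</italic>(<italic>a</italic>) = Σ<sub><italic>i</italic></sub> |<italic>x<sub>i</sub></italic> − <italic>a</italic>|<sup><italic>p</italic></sup> and a search that stops at a data point only when no point within <italic>q</italic> sorted positions has a smaller value of <italic>B</italic>. The names <monospace>escape_radius</monospace> and <monospace>sufficient_q</monospace> and both toy samples are hypothetical.</p>
      <preformat><![CDATA[
```python
def lp_functional(a, xs, p):
    """B(a) = sum_i |x_i - a|**p (assumed form)."""
    return sum(abs(x - a) ** p for x in xs)


def escape_radius(b, i):
    """Smallest index distance from i to a point with strictly smaller B.

    Returns None when b[i] is the global minimum of the list b.
    """
    for d in range(1, len(b)):
        left, right = i - d, i + d
        if (left >= 0 and b[left] < b[i]) or (right < len(b) and b[right] < b[i]):
            return d
    return None


def sufficient_q(data, p):
    """Largest escape radius over all data points that are not global minimizers.

    With this q, a window-based descent of the kind assumed above cannot stop
    at a non-global local minimum of the points (x_i, B(x_i)).
    """
    xs = sorted(data)
    b = [lp_functional(x, xs, p) for x in xs]
    radii = [escape_radius(b, i) for i in range(len(b))]
    return max((r for r in radii if r is not None), default=1)


spread_out = [-7.0, -1.2, -0.3, 0.1, 0.4, 2.5, 30.0]     # single valley
clustered = [0.0, 0.1, 0.2, 0.3, 5.0, 5.05, 5.1]         # two clusters
print(sufficient_q(spread_out, 0.5), sufficient_q(clustered, 0.1))
```
]]></preformat>
      <p>Since this diagnostic evaluates <italic>B</italic> at every data point, it is useful only for studying how the required <italic>q</italic> varies with the clustering of samples and with <italic>p</italic>, not as a replacement for an a priori rule.</p>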
	  <table-wrap id="algorithms-05-00421-t001" position="float">
        <object-id pub-id-type="pii">algorithms-05-00421-t001_Table 1</object-id>
        <label>Table 1</label>
        <caption>
          <p>Sample <italic>l<sup>p</sup></italic> averages calculated by Algorithm 1 with window size 2<italic>q</italic> + 1 = 7 for 2000-point data set from Distribution 2.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th align="left" valign="middle"><italic>α</italic> \ <italic>p</italic></th>
              <th align="left" valign="middle">0.5</th>
              <th align="left" valign="middle">0.1</th>
              <th align="left" valign="middle">0.02</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" valign="middle">3</td>
              <td align="left" valign="middle">0.028</td>
              <td align="left" valign="middle">0.560</td>
              <td align="left" valign="middle">0.701</td>
            </tr>
            <tr>
              <td align="left" valign="middle">2</td>
              <td align="left" valign="middle">0.038</td>
              <td align="left" valign="middle">0.779</td>
              <td align="left" valign="middle">0.779</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.5</td>
              <td align="left" valign="middle">0.057</td>
              <td align="left" valign="middle">0.575</td>
              <td align="left" valign="middle">0.575</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.1</td>
              <td align="left" valign="middle">7.58</td>
              <td align="left" valign="middle">0.244</td>
              <td align="left" valign="middle">0.244</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.05</td>
              <td align="left" valign="middle">1.49 × 10<sup>30</sup></td>
              <td align="left" valign="middle">0.281</td>
              <td align="left" valign="middle">0.476</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.04</td>
              <td align="left" valign="middle">1.14 × 10<sup>45</sup></td>
              <td align="left" valign="middle">0.349</td>
              <td align="left" valign="middle">0.598</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.03</td>
              <td align="left" valign="middle">2.83 × 10<sup>74</sup></td>
              <td align="left" valign="middle">0.466</td>
              <td align="left" valign="middle">0.466</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.02</td>
              <td align="left" valign="middle">1.52 × 10<sup>119</sup></td>
              <td align="left" valign="middle">1.38 × 10<sup>16</sup></td>
              <td align="left" valign="middle">0.516</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="algorithms-05-00421-t002" position="float">
        <object-id pub-id-type="pii">algorithms-05-00421-t002_Table 2</object-id>
        <label>Table 2</label>
        <caption>
          <p>Sample <italic>l<sup>p</sup></italic> averages calculated by Algorithm 1 with window size 2<italic>q</italic> + 1 = 13 for 2000-point data set from Distribution 2.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th align="left" valign="middle"><italic>α</italic> \ <italic>p</italic></th>
              <th align="left" valign="middle">0.5</th>
              <th align="left" valign="middle">0.1</th>
              <th align="left" valign="middle">0.02</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" valign="middle">3</td>
              <td align="left" valign="middle">0.021</td>
              <td align="left" valign="middle">0.094</td>
              <td align="left" valign="middle">0.531</td>
            </tr>
            <tr>
              <td align="left" valign="middle">2</td>
              <td align="left" valign="middle">0.027</td>
              <td align="left" valign="middle">0.126</td>
              <td align="left" valign="middle">0.126</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.5</td>
              <td align="left" valign="middle">0.041</td>
              <td align="left" valign="middle">0.189</td>
              <td align="left" valign="middle">0.189</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.1</td>
              <td align="left" valign="middle">3.76</td>
              <td align="left" valign="middle">0.108</td>
              <td align="left" valign="middle">0.108</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.05</td>
              <td align="left" valign="middle">2.56 × 10<sup>29</sup></td>
              <td align="left" valign="middle">0.207</td>
              <td align="left" valign="middle">0.207</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.04</td>
              <td align="left" valign="middle">1.14 × 10<sup>45</sup></td>
              <td align="left" valign="middle">0.257</td>
              <td align="left" valign="middle">0.257</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.03</td>
              <td align="left" valign="middle">2.83 × 10<sup>74</sup></td>
              <td align="left" valign="middle">0.341</td>
              <td align="left" valign="middle">0.341</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.02</td>
              <td align="left" valign="middle">1.52 × 10<sup>119</sup></td>
              <td align="left" valign="middle">3.24 × 10<sup>14</sup></td>
              <td align="left" valign="middle">0.516</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="algorithms-05-00421-t003" position="float">
        <object-id pub-id-type="pii">algorithms-05-00421-t003_Table 3</object-id>
        <label>Table 3</label>
        <caption>
          <p>Sample <italic>l<sup>p</sup></italic> averages calculated by Algorithm 1 with window size 2<italic>q</italic> + 1 = 19 for 2000-point data set from Distribution 2.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th align="left" valign="middle"><italic>α</italic> \ <italic>p</italic></th>
              <th align="left" valign="middle">0.5</th>
              <th align="left" valign="middle">0.1</th>
              <th align="left" valign="middle">0.02</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" valign="middle">3</td>
              <td align="left" valign="middle">0.021</td>
              <td align="left" valign="middle">0.015</td>
              <td align="left" valign="middle">0.015</td>
            </tr>
            <tr>
              <td align="left" valign="middle">2</td>
              <td align="left" valign="middle">0.021</td>
              <td align="left" valign="middle">0.020</td>
              <td align="left" valign="middle">0.020</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.5</td>
              <td align="left" valign="middle">0.031</td>
              <td align="left" valign="middle">0.029</td>
              <td align="left" valign="middle">0.029</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.1</td>
              <td align="left" valign="middle">0.902</td>
              <td align="left" valign="middle">0.108</td>
              <td align="left" valign="middle">0.108</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.05</td>
              <td align="left" valign="middle">2.56 × 10<sup>29</sup></td>
              <td align="left" valign="middle">0.207</td>
              <td align="left" valign="middle">0.207</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.04</td>
              <td align="left" valign="middle">1.14 × 10<sup>45</sup></td>
              <td align="left" valign="middle">0.257</td>
              <td align="left" valign="middle">0.257</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.03</td>
              <td align="left" valign="middle">2.83 × 10<sup>74</sup></td>
              <td align="left" valign="middle">0.341</td>
              <td align="left" valign="middle">0.341</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.02</td>
              <td align="left" valign="middle">1.52 × 10<sup>119</sup></td>
              <td align="left" valign="middle">1.78 × 10<sup>7</sup></td>
              <td align="left" valign="middle">0.516</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="algorithms-05-00421-t004" position="float">
        <object-id pub-id-type="pii">algorithms-05-00421-t004_Table 4</object-id>
        <label>Table 4</label>
        <caption>
          <p>Sample <italic>l<sup>p</sup></italic> averages calculated by Algorithm 1 with window size 2<italic>q</italic> + 1 = 25 for 2000-point data set from Distribution 2.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th align="left" valign="middle"><italic>α</italic> \ <italic>p</italic></th>
              <th align="left" valign="middle">0.5</th>
              <th align="left" valign="middle">0.1</th>
              <th align="left" valign="middle">0.02</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left" valign="middle">3</td>
              <td align="left" valign="middle">0.021</td>
              <td align="left" valign="middle">0.015</td>
              <td align="left" valign="middle">0.015</td>
            </tr>
            <tr>
              <td align="left" valign="middle">2</td>
              <td align="left" valign="middle">0.021</td>
              <td align="left" valign="middle">0.020</td>
              <td align="left" valign="middle">0.020</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.5</td>
              <td align="left" valign="middle">0.031</td>
              <td align="left" valign="middle">0.029</td>
              <td align="left" valign="middle">0.029</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.1</td>
              <td align="left" valign="middle">0.498</td>
              <td align="left" valign="middle">0.108</td>
              <td align="left" valign="middle">0.108</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.05</td>
              <td align="left" valign="middle">2.56 × 10<sup>29</sup></td>
              <td align="left" valign="middle">0.207</td>
              <td align="left" valign="middle">0.207</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.04</td>
              <td align="left" valign="middle">1.14 × 10<sup>45</sup></td>
              <td align="left" valign="middle">0.257</td>
              <td align="left" valign="middle">0.257</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.03</td>
              <td align="left" valign="middle">2.83 × 10<sup>74</sup></td>
              <td align="left" valign="middle">0.341</td>
              <td align="left" valign="middle">0.341</td>
            </tr>
            <tr>
              <td align="left" valign="middle">1.02</td>
              <td align="left" valign="middle">1.52 × 10<sup>119</sup></td>
              <td align="left" valign="middle">2.37 × 10<sup>6</sup></td>
              <td align="left" valign="middle">0.516</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      
    </sec>
    <sec sec-type="conclusions">
      <title>4. Conclusions</title>
      <p>The widespread impression that minimization of <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> functionals, 0 &lt; <italic>p</italic> &lt; 1, is combinatorially expensive is valid for general situations in which no structure of the data is known. However, the results in this paper suggest that, when the data come from an appropriate statistical distribution, <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages can be calculated in polynomial time. The approach of the paper is applicable without precise knowledge of the parameters of the distribution. One needs only generalizations of Conditions (3), an upper bound on the exponent −<italic>β</italic> of the tail density, and additional conditions for asymmetric distributions and for setting up a rule for choosing <italic>q</italic> in Algorithm 1.</p>
      <p>Topics for future research include
      <list list-type="bullet">
        <list-item>
          <p>Quantitative rules for using information about the underlying continuum distribution to choose the <italic>q</italic> of Algorithm 1 based on a user’s preferred tradeoff between maximum accuracy and minimum computational burden</p>
        </list-item>
        <list-item>
          <p>Investigation of the advantages and disadvantages of introducing smoothing in the <italic>B</italic>(<italic>x<sub>k</sub></italic>) calculated in Step 4 of Algorithm 1 to increase the robustness against shallow local minima; connection of the smoothing with properties of the underlying distributions</p>
        </list-item>
        <list-item>
          <p>Description of the class(es) of symmetric and asymmetric univariate and multivariate distributions for which radially strictly monotonic <italic>L<sup>p</sup></italic> averaging functionals and radially nearly strictly monotonic <italic>l<sup>p</sup></italic> averaging functionals can be created and thus for which <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages can be calculated in polynomial time</p>
        </list-item>
        <list-item>
          <p>Investigation of convergence of the <italic>l<sup>p</sup></italic> average to the <italic>L<sup>p</sup></italic> average and of related issues of efficiency, optimality, breakdown point, influence function, <italic>etc.</italic></p>
        </list-item>
        <list-item>
          <p>Investigation of the conditions under which <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averages converge to the mode as <italic>p</italic> → 0</p>
        </list-item>
        <list-item>
          <p>Treatment of more general univariate and multivariate <italic>l<sup>p</sup></italic> minimization problems including but not limited to <italic>l<sup>p</sup></italic> regression and matrix-constrained <italic>l<sup>p</sup></italic> minimization, for example, minimization of
          <disp-formula id="algorithms-05-00421-i021">
<inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="algorithms-05-00421-i021.tif"/>
<label>(12)</label>
</disp-formula>
(cf. [<xref ref-type="bibr" rid="B17-algorithms-05-00421">17</xref>,<xref ref-type="bibr" rid="B18-algorithms-05-00421">18</xref>]). (The <italic>l<sup>p</sup></italic> averaging process considered in the present paper can be expressed in format (12).)
          </p>
        </list-item>
      </list></p>
           <p>Many phenomena in human-based areas (sociology, cognitive science, psychology, economics, human networks, social media, <italic>etc.</italic>) are increasingly known to be represented by data that have large numbers of outliers and belong to very heavy-tailed distributions, which suggests that <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> averaging, <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> regression and more general <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> minimization tasks, 0 &lt; <italic>p</italic> &lt; 1, will be important in practice. The results of the present paper provide the first indication that one may be able to solve, in polynomial time, generically combinatorially expensive <italic>L<sup>p</sup></italic> and <italic>l<sup>p</sup></italic> minimization problems for these phenomena by requiring only “natural” statistical structure without having to impose restrictions such as sparsity and without having to accept suboptimal local solutions instead of optimal global solutions.</p>
    </sec>
    
  </body>
  <back><ack>
      <title>Acknowledgment</title>
      <p>The author expresses his gratitude to the referees, whose well-thought-out questions and insightful comments led to significant improvements in this paper.</p>
    </ack>
    <ref-list>
      <title>References</title>
      <ref id="B1-algorithms-05-00421">
        <label>1.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gribonval</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Nielsen</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Sparse Approximations in Signal and Image Processing</article-title>
          <source>EURASIP Book Ser. Signal Process. Commun.</source>
          <year>2006</year>
          <volume>86</volume>
          <fpage>415</fpage>
          <lpage>416</lpage>
        </citation>
      </ref>
      <ref id="B2-algorithms-05-00421">
        <label>2.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Lai</surname>
              <given-names>M.-J.</given-names>
            </name>
            <name>
              <surname>Wang</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>An unconstrained <italic>l</italic><sub>q</sub> minimization with 0 &lt; <italic>q</italic> ≤ 1 for sparse solution of under-determined linear systems</article-title>
          <source>SIAM J. Optim.</source>
          <year>2010</year>
          <volume>21</volume>
          <fpage>82</fpage>
          <lpage>101</lpage>
        </citation>
      </ref>
      <ref id="B3-algorithms-05-00421">
        <label>3.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Chartrand</surname>
              <given-names>R.</given-names>
            </name>
          </person-group>
          <article-title>Exact reconstruction of sparse signals via nonconvex minimization</article-title>
          <source>IEEE Signal Process. Lett.</source>
          <year>2007</year>
          <volume>14</volume>
          <fpage>707</fpage>
          <lpage>710</lpage>
          <pub-id pub-id-type="doi">10.1109/LSP.2007.898300</pub-id>
        </citation>
      </ref>
      <ref id="B4-algorithms-05-00421">
        <label>4.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Candès</surname>
              <given-names>E.J.</given-names>
            </name>
            <name>
              <surname>Wakin</surname>
              <given-names>M.B.</given-names>
            </name>
          </person-group>
          <article-title>An introduction to compressive sampling</article-title>
          <source>IEEE Signal Process. Mag.</source>
          <year>2008</year>
          <volume>25</volume>
          <fpage>21</fpage>
          <lpage>30</lpage>
          <pub-id pub-id-type="doi">10.1109/MSP.2007.914731</pub-id>
        </citation>
      </ref>
      <ref id="B5-algorithms-05-00421">
        <label>5.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Auquiert</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Gibaru</surname>
              <given-names>O.</given-names>
            </name>
            <name>
              <surname>Nyiri</surname>
              <given-names>E.</given-names>
            </name>
          </person-group>
          <article-title>Fast <italic>L</italic><sub>1</sub>-<italic>C</italic><sup>k</sup> polynomial spline interpolation algorithm with shape-preserving properties</article-title>
          <source>Comput. Aided Geom. Design</source>
          <year>2011</year>
          <volume>28</volume>
          <fpage>65</fpage>
          <lpage>74</lpage>
          <pub-id pub-id-type="doi">10.1016/j.cagd.2010.10.002</pub-id>
        </citation>
      </ref>
      <ref id="B6-algorithms-05-00421">
        <label>6.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Yu</surname>
              <given-names>L.</given-names>
            </name>
            <name>
              <surname>Jin</surname>
              <given-names>Q.</given-names>
            </name>
            <name>
              <surname>Lavery</surname>
              <given-names>J.E.</given-names>
            </name>
            <name>
              <surname>Fang</surname>
              <given-names>S.-C.</given-names>
            </name>
          </person-group>
          <article-title>Univariate cubic <italic>L</italic><sub>1</sub> interpolating splines: Spline functional, window size and analysis-based algorithm</article-title>
          <source>Algorithms</source>
          <year>2010</year>
          <volume>3</volume>
          <fpage>311</fpage>
          <lpage>328</lpage>
          <pub-id pub-id-type="doi">10.3390/a3030311</pub-id>
        </citation>
      </ref>
      <ref id="B7-algorithms-05-00421">
        <label>7.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Candès</surname>
              <given-names>E.J.</given-names>
            </name>
            <name>
              <surname>Li</surname>
              <given-names>X.</given-names>
            </name>
            <name>
              <surname>Ma</surname>
              <given-names>Y.</given-names>
            </name>
            <name>
              <surname>Wright</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <article-title>Robust principal component analysis?</article-title>
          <source>J. ACM</source>
          <year>2011</year>
          <volume>58</volume>
          <fpage>1</fpage>
          <lpage>37</lpage>
        </citation>
      </ref>
      <ref id="B8-algorithms-05-00421">
        <label>8.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Ke</surname>
              <given-names>Q.</given-names>
            </name>
            <name>
              <surname>Kanade</surname>
              <given-names>T.</given-names>
            </name>
          </person-group>
          <article-title>Robust <italic>L</italic><sub>1</sub> norm factorization in the presence of outliers and missing data by alternative convex programming</article-title>
          <source>Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005)</source>
          <person-group person-group-type="editor">
            <name>
              <surname>Schmid</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Soatto</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Tomasi</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <publisher-name>IEEE Computer Society</publisher-name>
          <publisher-loc>Los Alamitos, CA, USA</publisher-loc>
          <conf-loc>San Diego, CA, USA</conf-loc>
          <conf-date>20–25 June 2005</conf-date>
          <year>2005</year>
          <fpage>739</fpage>
          <lpage>746</lpage>
        </citation>
      </ref>
      <ref id="B9-algorithms-05-00421">
        <label>9.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Kwak</surname>
              <given-names>N.</given-names>
            </name>
          </person-group>
          <article-title>Principal component analysis based on <italic>L</italic>1-norm maximization</article-title>
          <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
          <year>2008</year>
          <volume>30</volume>
          <fpage>1672</fpage>
          <lpage>1680</lpage>
          <pub-id pub-id-type="doi">10.1109/TPAMI.2008.114</pub-id>
        </citation>
      </ref>
      <ref id="B10-algorithms-05-00421">
        <label>10.</label>
        <citation citation-type="confproc">
          <person-group person-group-type="author">
            <name>
              <surname>Dodge</surname>
              <given-names>Y.</given-names>
            </name>
          </person-group>
          <article-title>Statistical Data Analysis Based on the <italic>L</italic><sub>1</sub> Norm and Related Methods</article-title>
          <source>Proceedings of the Conference on Statistical Data Analysis Based on the <italic>L</italic><sub>1</sub> Norm and Related Methods</source>
          <publisher-name>Birkhäuser</publisher-name>
          <publisher-loc>Basel, Switzerland</publisher-loc>
          <conf-loc>Neuchâtel, Switzerland</conf-loc>
          <conf-date>4–9 August 2002</conf-date>
          <year>2002</year>
        </citation>
      </ref>
      <ref id="B11-algorithms-05-00421">
        <label>11.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Nolan</surname>
              <given-names>J.P.</given-names>
            </name>
          </person-group>
          <article-title>Multivariate stable distributions: Approximation, estimation, simulation and identification</article-title>
          <source>A Practical Guide to Heavy Tails</source>
          <person-group person-group-type="editor">
            <name>
              <surname>Adler</surname>
              <given-names>R.J.</given-names>
            </name>
            <name>
              <surname>Feldman</surname>
              <given-names>R.E.</given-names>
            </name>
            <name>
              <surname>Taqqu</surname>
              <given-names>M.S.</given-names>
            </name>
          </person-group>
          <publisher-name>Birkhäuser</publisher-name>
          <publisher-loc>Cambridge, MA, USA</publisher-loc>
          <year>1998</year>
          <fpage>509</fpage>
          <lpage>525</lpage>
        </citation>
      </ref>
      <ref id="B12-algorithms-05-00421">
        <label>12.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Resnick</surname>
              <given-names>S.I.</given-names>
            </name>
          </person-group>
          <source>Heavy-Tail Phenomena: Probabilistic and Statistical Modeling</source>
          <publisher-name>Springer-Verlag</publisher-name>
          <publisher-loc>Berlin, Germany</publisher-loc>
          <year>2007</year>
        </citation>
      </ref>
      <ref id="B13-algorithms-05-00421">
        <label>13.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Faloutsos</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Faloutsos</surname>
              <given-names>P.</given-names>
            </name>
            <name>
              <surname>Faloutsos</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <article-title>On power-law relationships of the internet topology</article-title>
          <source>Comp. Comm. Rev.</source>
          <year>1999</year>
          <volume>29</volume>
          <fpage>251</fpage>
          <lpage>262</lpage>
          <pub-id pub-id-type="doi">10.1145/316194.316229</pub-id>
        </citation>
      </ref>
      <ref id="B14-algorithms-05-00421">
        <label>14.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Willinger</surname>
              <given-names>W.</given-names>
            </name>
            <name>
              <surname>Govindan</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Jamin</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Paxson</surname>
              <given-names>V.</given-names>
            </name>
            <name>
              <surname>Shenker</surname>
              <given-names>S.</given-names>
            </name>
          </person-group>
          <article-title>Scaling phenomena in the Internet: Critically examining criticality</article-title>
          <source>Proc. Natl. Acad. Sci. USA</source>
          <year>2002</year>
          <volume>99</volume>
          <fpage>2573</fpage>
          <lpage>2580</lpage>
          <pub-id pub-id-type="doi">10.1073/pnas.012583099</pub-id>
        </citation>
      </ref>
      <ref id="B15-algorithms-05-00421">
        <label>15.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Rachev</surname>
              <given-names>S.T.</given-names>
            </name>
            <name>
              <surname>Menn</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Fabozzi</surname>
              <given-names>F.J.</given-names>
            </name>
          </person-group>
          <source>Fat-Tailed and Skewed Asset Return Distributions: Implications for Risk Management, Portfolio Selection, and Option Pricing</source>
          <publisher-name>John Wiley</publisher-name>
          <publisher-loc>Hoboken, NJ, USA</publisher-loc>
          <year>2005</year>
        </citation>
      </ref>
      <ref id="B16-algorithms-05-00421">
        <label>16.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Reed</surname>
              <given-names>W.J.</given-names>
            </name>
            <name>
              <surname>Jorgensen</surname>
              <given-names>M.A.</given-names>
            </name>
          </person-group>
          <article-title>The double Pareto-lognormal distribution—A new parametric model for size distributions</article-title>
          <source>Comm. Statist. Theory Methods</source>
          <year>2004</year>
          <volume>33</volume>
          <fpage>1733</fpage>
          <lpage>1753</lpage>
          <pub-id pub-id-type="doi">10.1081/STA-120037438</pub-id>
        </citation>
      </ref>
      <ref id="B17-algorithms-05-00421">
        <label>17.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Foucart</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Lai</surname>
              <given-names>M.-J.</given-names>
            </name>
          </person-group>
          <article-title>Sparsest solutions of underdetermined linear systems via <italic>l</italic><sub>q</sub>-minimization for 0 &lt; <italic>q</italic> ≤ 1</article-title>
          <source>Appl. Comput. Harmon. Anal.</source>
          <year>2009</year>
          <volume>26</volume>
          <fpage>395</fpage>
          <lpage>407</lpage>
          <pub-id pub-id-type="doi">10.1016/j.acha.2008.09.001</pub-id>
        </citation>
      </ref>
      <ref id="B18-algorithms-05-00421">
        <label>18.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Ge</surname>
              <given-names>D.</given-names>
            </name>
            <name>
              <surname>Jiang</surname>
              <given-names>X.</given-names>
            </name>
            <name>
              <surname>Ye</surname>
              <given-names>Y.</given-names>
            </name>
          </person-group>
          <article-title>A note on the complexity of <italic>L</italic><sub>p</sub> minimization</article-title>
          <source>Math. Program.</source>
          <year>2011</year>
          <volume>129</volume>
          <fpage>285</fpage>
          <lpage>299</lpage>
          <pub-id pub-id-type="doi">10.1007/s10107-011-0470-2</pub-id>
        </citation>
      </ref>
      <ref id="B19-algorithms-05-00421">
        <label>19.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Gribonval</surname>
              <given-names>R.</given-names>
            </name>
            <name>
              <surname>Nielsen</surname>
              <given-names>M.</given-names>
            </name>
          </person-group>
          <article-title>Highly sparse representations from dictionaries are unique and independent of the sparseness measure</article-title>
          <source>Appl. Comput. Harmon. Anal.</source>
          <year>2007</year>
          <volume>22</volume>
          <fpage>335</fpage>
          <lpage>355</lpage>
          <pub-id pub-id-type="doi">10.1016/j.acha.2006.09.003</pub-id>
        </citation>
      </ref>
      <ref id="B20-algorithms-05-00421">
        <label>20.</label>
        <citation citation-type="journal">
          <person-group person-group-type="author">
            <name>
              <surname>Wang</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Xu</surname>
              <given-names>W.</given-names>
            </name>
            <name>
              <surname>Tang</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <article-title>On the performance of sparse recovery via <italic>L</italic><sub>p</sub>-minimization (0 ≤ <italic>p</italic> ≤ 1)</article-title>
          <source>IEEE Trans. Info. Theory</source>
          <year>2011</year>
          <volume>57</volume>
          <fpage>7255</fpage>
          <lpage>7278</lpage>
          <pub-id pub-id-type="doi">10.1109/TIT.2011.2159959</pub-id>
        </citation>
      </ref>
      <ref id="B21-algorithms-05-00421">
        <label>21.</label>
        <citation citation-type="book">
          <person-group person-group-type="author">
            <name>
              <surname>Wilcox</surname>
              <given-names>R.R.</given-names>
            </name>
          </person-group>
          <source>Introduction to Robust Estimation and Hypothesis Testing</source>
          <edition>2nd</edition>
          <publisher-name>Elsevier</publisher-name>
          <publisher-loc>Burlington, MA, USA</publisher-loc>
          <year>2005</year>
        </citation>
      </ref>
    </ref-list>
  </back>
</article>
