Least absolute deviations
Part of a series on 
Regression analysis 

Models 
Estimation 
Background 

Least absolute deviations (LAD), also known as least absolute errors (LAE), least absolute value (LAV), least absolute residual (LAR), sum of absolute deviations, or the L_{1} norm condition, is a statistical optimality criterion and the statistical optimization technique that relies on it. Similar to the least squares technique, it attempts to find a function which closely approximates a set of data. In the simple case of a set of (x,y) data, the approximation function is a simple "trend line" in twodimensional Cartesian coordinates. The method minimizes the sum of absolute errors (SAE) (the sum of the absolute values of the vertical "residuals" between points generated by the function and corresponding points in the data). The least absolute deviations estimate also arises as the maximum likelihood estimate if the errors have a Laplace distribution. It was introduced in 1757 by Roger Joseph Boscovich.^{[1]}
Formulation
Suppose that the data set consists of the points (x_{i}, y_{i}) with i = 1, 2, ..., n. We want to find a function f such that [math]\displaystyle{ f(x_i)\approx y_i. }[/math]
To attain this goal, we suppose that the function f is of a particular form containing some parameters that need to be determined. For instance, the simplest form would be linear: f(x) = bx + c, where b and c are parameters whose values are not known but which we would like to estimate. Less simply, suppose that f(x) is quadratic, meaning that f(x) = ax^{2} + bx + c, where a, b and c are not yet known. (More generally, there could be not just one explanator x, but rather multiple explanators, all appearing as arguments of the function f.)
We now seek estimated values of the unknown parameters that minimize the sum of the absolute values of the residuals:
 [math]\displaystyle{ S = \sum_{i=1}^n y_i  f(x_i). }[/math]
Solution
Though the idea of least absolute deviations regression is just as straightforward as that of least squares regression, the least absolute deviations line is not as simple to compute efficiently. Unlike least squares regression, least absolute deviations regression does not have an analytical solving method. Therefore, an iterative approach is required. The following is an enumeration of some least absolute deviations solving methods.
 Simplexbased methods (such as the BarrodaleRoberts algorithm^{[2]})
 Because the problem is a linear program, any of the many linear programming techniques (including the simplex method as well as others) can be applied.
 Iteratively reweighted least squares^{[3]}
 Wesolowsky’s direct descent method^{[4]}
 LiArce’s maximum likelihood approach^{[5]}
 Recursive reduction of dimensionality approach^{[6]}
 Check all combinations of pointtopoint lines for minimum sum of errors
Simplexbased methods are the “preferred” way to solve the least absolute deviations problem.^{[7]} A Simplex method is a method for solving a problem in linear programming. The most popular algorithm is the BarrodaleRoberts modified Simplex algorithm. The algorithms for IRLS, Wesolowsky's Method, and Li's Method can be found in Appendix A of ^{[7]} among other methods. Checking all combinations of lines traversing any two (x,y) data points is another method of finding the least absolute deviations line. Since it is known that at least one least absolute deviations line traverses at least two data points, this method will find a line by comparing the SAE (Smallest Absolute Error over data points) of each line, and choosing the line with the smallest SAE. In addition, if multiple lines have the same, smallest SAE, then the lines outline the region of multiple solutions. Though simple, this final method is inefficient for large sets of data.
Using linear programming
The problem can be solved using any linear programming technique on the following problem specification. We wish to
 [math]\displaystyle{ \text{Minimize} \sum_{i=1}^n y_i  a_0  a_1x_{i1}  a_2x_{i2}  \cdots  a_kx_{ik} }[/math]
with respect to the choice of the values of the parameters [math]\displaystyle{ a_0,\ldots, a_k }[/math], where y_{i} is the value of the i^{th} observation of the dependent variable, and x_{ij} is the value of the i^{th} observation of the j^{th} independent variable (j = 1,...,k). We rewrite this problem in terms of artificial variables u_{i} as
 [math]\displaystyle{ \text{Minimize} \sum_{i=1}^n u_i }[/math]
 with respect to [math]\displaystyle{ a_0,\ldots, a_k }[/math] and [math]\displaystyle{ u_1,\ldots, u_n }[/math]
 subject to
 [math]\displaystyle{ u_i \ge y_i  a_0  a_1x_{i1}  a_2x_{i2}  \cdots  a_kx_{ik} \,\ \,\ \,\ \,\ \,\ \text{for } i=1,\ldots,n }[/math]
 [math]\displaystyle{ u_i \ge [y_i  a_0  a_1x_{i1}  a_2x_{i2}  \cdots  a_kx_{ik}] \,\ \,\ \text{ for } i=1,\ldots,n. }[/math]
These constraints have the effect of forcing each [math]\displaystyle{ u_i }[/math] to equal [math]\displaystyle{ y_i  a_0  a_1x_{i1}  a_2x_{i2}  \cdots  a_kx_{ik} }[/math] upon being minimized, so the objective function is equivalent to the original objective function. Since this version of the problem statement does not contain the absolute value operator, it is in a format that can be solved with any linear programming package.
Properties
There exist other unique properties of the least absolute deviations line. In the case of a set of (x,y) data, the least absolute deviations line will always pass through at least two of the data points, unless there are multiple solutions. If multiple solutions exist, then the region of valid least absolute deviations solutions will be bounded by at least two lines, each of which passes through at least two data points. More generally, if there are k regressors (including the constant), then at least one optimal regression surface will pass through k of the data points.^{[8]}^{:p.936}
This "latching" of the line to the data points can help to understand the "instability" property: if the line always latches to at least two points, then the line will jump between different sets of points as the data points are altered. The "latching" also helps to understand the "robustness" property: if there exists an outlier, and a least absolute deviations line must latch onto two data points, the outlier will most likely not be one of those two points because that will not minimize the sum of absolute deviations in most cases.
One known case in which multiple solutions exist is a set of points symmetric about a horizontal line, as shown in Figure A below.
To understand why there are multiple solutions in the case shown in Figure A, consider the pink line in the green region. Its sum of absolute errors is some value S. If one were to tilt the line upward slightly, while still keeping it within the green region, the sum of errors would still be S. It would not change because the distance from each point to the line grows on one side of the line, while the distance to each point on the opposite side of the line diminishes by exactly the same amount. Thus the sum of absolute errors remains the same. Also, since one can tilt the line in infinitely small increments, this also shows that if there is more than one solution, there are infinitely many solutions.
Advantages and disadvantages
The following is a table contrasting some properties of the method of least absolute deviations with those of the method of least squares (for nonsingular problems).^{[9]}^{[10]}
Ordinary least squares regression  Least absolute deviations regression  

Not very robust  Robust  
Stable solution  Unstable solution  
One solution*  Possibly multiple solutions 
*Provided that the number of data points is greater than or equal to the number of features.
The method of least absolute deviations finds applications in many areas, due to its robustness compared to the least squares method. Least absolute deviations is robust in that it is resistant to outliers in the data. LAD gives equal emphasis to all observations, in contrast to ordinary least squares (OLS) which, by squaring the residuals, gives more weight to large residuals, that is, outliers in which predicted values are far from actual observations. This may be helpful in studies where outliers do not need to be given greater weight than other observations. If it is important to give greater weight to outliers, the method of least squares is a better choice.
Variations, extensions, specializations
If in the sum of the absolute values of the residuals one generalises the absolute value function to a tilted absolute value function, which on the left halfline has slope [math]\displaystyle{ \tau1 }[/math] and on the right halfline has slope [math]\displaystyle{ \tau }[/math], where [math]\displaystyle{ 0\lt \tau\lt 1 }[/math], one obtains quantile regression. The case of [math]\displaystyle{ \tau=1/2 }[/math] gives the standard regression by least absolute deviations and is also known as median regression.
The least absolute deviation problem may be extended to include multiple explanators, constraints and regularization, e.g., a linear model with linear constraints:^{[11]}
 minimize [math]\displaystyle{ S(\mathbf{\beta}, b) = \sum_i  \mathbf{x}'_i \mathbf{\beta} + b  y_i  }[/math]
 subject to, e.g., [math]\displaystyle{ \mathbf{x}'_1 \mathbf{\beta} + b  y_1 \leq k }[/math]
where [math]\displaystyle{ \mathbf{\beta} }[/math] is a column vector of coefficients to be estimated, b is an intercept to be estimated, x_{i } is a column vector of the i^{th} observations on the various explanators, y_{i} is the i^{th} observation on the dependent variable, and k is a known constant.
Regularization with LASSO (least absolute shrinkage and selection operator) may also be combined with LAD.^{[12]}
See also
 Quantile regression
 Regression analysis
 Linear regression model
 Absolute deviation
 Average absolute deviation
 Median absolute deviation
 Ordinary least squares
 Robust regression
References
 ↑ "Least Absolute Deviation Regression". The Concise Encyclopedia of Statistics. Springer. 2008. pp. 299–302. doi:10.1007/9780387328331_225. ISBN 9780387328331. https://archive.org/details/conciseencyclope00dodg.
 ↑ I. Barrodale & F. D. K. Roberts (1973). "An improved algorithm for discrete L_{1} linear approximation". SIAM Journal on Numerical Analysis 10 (5): 839–848. doi:10.1137/0710069. Bibcode: 1973SJNA...10..839B.
 ↑ E. J. Schlossmacher (December 1973). "An Iterative Technique for Absolute Deviations Curve Fitting". Journal of the American Statistical Association 68 (344): 857–859. doi:10.2307/2284512.
 ↑ G. O. Wesolowsky (1981). "A new descent algorithm for the least absolute value regression problem". Communications in Statistics – Simulation and Computation B10 (5): 479–491. doi:10.1080/03610918108812224.
 ↑ Yinbo Li and Gonzalo R. Arce (2004). "A Maximum Likelihood Approach to Least Absolute Deviation Regression". EURASIP Journal on Applied Signal Processing 2004 (12): 1762–1769. doi:10.1155/S1110865704401139. Bibcode: 2004EJASP2004...61L.
 ↑ Ana Sovic Krzic and Damir Sersic (2018). "L1 minimization using recursive reduction of dimensionality". Signal Processing 151: 119–129. doi:10.1016/j.sigpro.2018.05.002.
 ↑ ^{7.0} ^{7.1} William A. Pfeil, Statistical Teaching Aids, Bachelor of Science thesis, Worcester Polytechnic Institute, 2006
 ↑ Branham, R. L., Jr., "Alternatives to least squares", Astronomical Journal 87, June 1982, 928–937. [1] at SAO/NASA Astrophysics Data System (ADS)
 ↑ For a set of applets that demonstrate these differences, see the following site: http://www.math.wpi.edu/Course_Materials/SAS/lablets/7.3/73_choices.html
 ↑ For a discussion of LAD versus OLS, see these academic papers and reports: http://www.econ.uiuc.edu/~roger/research/rq/QRJEP.pdf and https://www.leeds.ac.uk/educol/documents/00003759.htm
 ↑ Mingren Shi; Mark A., Lukas (March 2002). "An L_{1} estimation algorithm with degeneracy and linear constraints". Computational Statistics & Data Analysis 39 (1): 35–55. doi:10.1016/S01679473(01)000494. http://researchrepository.murdoch.edu.au/id/eprint/15195/.
 ↑ Li Wang, Michael D. Gordon & Ji Zhu (December 2006). "Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning". pp. 690–700. doi:10.1109/ICDM.2006.134.
Further reading
 Peter Bloomfield and William Steiger (1980). "Least Absolute Deviations CurveFitting". SIAM Journal on Scientific Computing 1 (2): 290–301. doi:10.1137/0901019.
 Subhash C. Narula and John F. Wellington (1982). "The Minimum Sum of Absolute Errors Regression: A State of the Art Survey". International Statistical Review 50 (3): 317–326. doi:10.2307/1402501.
 Robert F. Phillips (July 2002). "Least absolute deviations estimation via the EM algorithm". Statistics and Computing 12 (3): 281–285. doi:10.1023/A:1020759012226.
 Enno Siemsen & Kenneth A. Bollen (2007). "Least Absolute Deviation Estimation in Structural Equation Modeling". Sociological Methods & Research 36 (2): 227–265. doi:10.1177/0049124107301946.
Original source: https://en.wikipedia.org/wiki/Least absolute deviations.
Read more 