Let's say we have a bunch of numbers, represented by the tick marks towards the bottom the fancy interactive plot drawing thingy below. Next, we chose some other number, represented by the big pink bar that you can drag.
Sums of shapes
For each number in our bunch (each tick mark), we could draw a line from the number to the other number we chose (the pink bar). Then we could draw a square for each of these numbers with a side as long as the this line. (These squares are represented by the squares in the plot thingy.) We could add up the areas of all of these squares. People call that the sum of squared error or the sum of squares.
Instead of adding up the squares, we could just add up the lines. People call that the sum of absolute errors, but I like calling it the sum of lines.
Sometimes, these lines will have no length because the two numbers that form the line (the tick mark and the pink bar) are the same number. We could draw a point for each tick mark whose value is not exactly the same as the pink bar. Then we could count how many points we have and call that the sum of points.
Nota bene: The line and square at the bottom right are proportional to but not equal to the sums of lines and squares, respectively.
Values of the other number that minimize the sums of shapes
If you play around with the plot above, you'll find one location of the pink bar that yields the smallest sum of squares. (The "Sum of squares" square at the bottom-right will be smallest for this situation.) We call this location the mean.
You'll also find one spot or two adjacent spots that yield the smallest sum of lines. We call this location the median.
And you'll find at least one spot with the smallest sum of points. (This spot will have particularly few points in the "Sum of points" section at the bottom-left.) We call this spot the mode.
Extrapolating the distance measure
I see the sum of points as the zero-order distance measure, the sum of lines as the one-order distance measure and sum of squares as the two-order distance measure. A general distance measure that includes all of these is the sum of n-dimensional volumes (Is there a better word for that?) of the n-dimensional hypercubes. Said more concisely (in LaTeX),
Distance_n=\sum_i \lvert x_i - c\rvert^n
where each i corresponds to an observation (represented above by tick marks), n is the number of dimensions, and c represents that other number (represented above by the pink bar).
Sum of squares
The sum of squares is thus this.
Distance_2=\sum_i \lvert x_i - c\rvert^2
The value of c that minimizes is the mean.
Sum of lines
The sum of lines is this.
Distance_1=\sum_i \lvert x_i - c\rvert^1
The value of c that minimizes is the median.
Sum of points
To make this work with the zero-order distance, I proclaim that equals 0. The sum of points is this.
Distance_0=\sum_i \lvert x_i - c \rvert ^0
The quantity within the summation is zero if equals c and one otherwise.
The value of c that minimizes is the mode.
Higher-power distance measures emphasize more extreme values
I see the mode, median and mean as different measures of the center of a distribution. (I labeled them c for "center".)
As we increase the power of the distance measure, we use more information from the tails to produce the measure of the center of the distribution.
The mode only looks for the most common values; all the information that it conveys about the other values is that they are less common.
The median turns out to be the value of middle rank. For example, if there are 9 numbers, the fifth-highest/fifth-lowest is the median. The median doesn't distinguish between an observation that is slightly greater than most and an observation that is exceptionally greater than most.
Compared to the median, the mean takes more information from extreme values. It might not be particularly obvious why, so I present a simple example.
If we have only two observations (represented above by the black dots), the sum of lines will be the same as long as we choose a center point that is between the two points; the sum of lines will be the distance between the two points. The sum of squares, on the other hand, is smallest in the center because we'll have two smallish squares (orange) rather than one huge square (teal).
Center points for higher-power distance measures
What center points minimize these higher-power distance measures? I calculated the distance measures for dimensions up to 100 on the following skewed distribution, using many different center values for each dimension.
Then I chose the center value with the lowest distance measure and called that the n-dimensional measure of the distribution's center. (Mode is the 0-dimensional measure, median is the 1-dimensional measure, and mean is the 2-dimensional measure.)
As the number of dimensions goes up, the measure of the center moves in the direction of the long tail of the distribution.
What this is called?
It had seemed odd to me that I haven't heard of a sum of cubes. Is there standard a name for the stuff I just explained? Does anyone use higher-power distance or centrality measures for real things? I asked this stuff on the internet and received some answers. The stuff above is related to the Lp spaces.