Adonis Diaries

Mean/Average: more than one kind of means for generalization across batches of people, characters, and objects?

Posted on: June 16, 2022

There are 3 different kinds of Means used in various physical/chemical/financial fields.

You have so far the regular arithmetic mean, the geometric mean and the harmonic mean. The last two presuppose that all the data have positive values in order to be applied.

The Harmonic mean will be developed at the end of the article.

Geometric vs. Arithmetic Mean: What’s the difference?

Annie Guo

Jan 17, 2022

Part 4 of the Learn Stats with Bayes series

There are multiple ways to calculate the mean of a dataset:

▹ Arithmetic mean — this is the method that we are most familiar with and taught in schools.

Formula for arithmetic average: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where x1, x2…xn is the data sample, and n is the sample size

▹ Geometric mean — this is the topic of the following post

▹ Harmonic mean

Definition of Geometric Mean

Geometric mean, also known as geometric average, is a way to calculate average for data that represent exponential or varying growth.

It is defined as the product of the individual data points, all raised to the power of 1/n, where n is the sample size.

Formula for geometric mean: $GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$, where x1, x2…xn is the data sample, and n is the sample size
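As a rough illustration of the definition above, the following Python sketch computes the geometric mean two ways: directly as the n-th root of the product, and through logarithms. The function names and the sample values are made up for this illustration and are not part of the original post.

```python
import math


def geometric_mean_product(values):
    """n-th root of the product of the values (the textbook definition)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs a non-empty list of positive numbers")
    return math.prod(values) ** (1.0 / len(values))


def geometric_mean_logs(values):
    """Same quantity as exp(mean(log x)); less prone to overflow on long lists."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


sample = [2.0, 8.0]  # illustrative values only
print(geometric_mean_product(sample))
print(geometric_mean_logs(sample))
```

Both calls should give a value of about 4, since 2 × 8 = 16 and the square root of 16 is 4. The logarithm form is usually the safer choice when many values are multiplied.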

Application of Geometric Mean

Geometric mean is typically used for data whose values are exponential in nature, or are supposed to be multiplied together.

1. Data which are exponential in nature

You can use geometric means to calculate things like:

  • Prevalence of diseases, like Covid cases in a country, since some diseases grow exponentially, and having more cases now will likely lead to having more cases in the future
  • Growth of a population

2. Data which are supposed to be multiplied together

  • In finance, geometric mean can be used to aggregate investment returns. For example, geometric mean is used to calculate CAGR (compound annual growth rate) which represents the constant rate of return over a given time period.
  • In statistics, geometric mean is used in the denominator of Bayes’ rule between the likelihood and prior
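To make the CAGR remark concrete, here is a minimal Python sketch. The yearly returns and the starting value are hypothetical, and `cagr` is a helper name chosen for this example. It checks that the CAGR equals the geometric mean of the yearly growth factors minus 1.

```python
from statistics import geometric_mean


def cagr(start_value, end_value, periods):
    """Constant per-period growth rate that turns start_value into end_value."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Hypothetical yearly returns: +10%, -5%, +20%
yearly_returns = [0.10, -0.05, 0.20]
factors = [1.0 + r for r in yearly_returns]

start_value = 1000.0
end_value = start_value
for factor in factors:
    end_value *= factor  # compound one year at a time

rate = cagr(start_value, end_value, len(factors))
print(f"CAGR: {rate:.4f}")
print(f"Geometric mean of factors - 1: {geometric_mean(factors) - 1:.4f}")
```

The two printed rates should agree, which is why the geometric mean is the natural average for returns that compound.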

One big caveat of geometric mean is that it only works with positive numbers, so you may need to consider a different method to calculate means if your data contains negative values.

A Worked Example

Imagine that you are a researcher in public health and want to know the growth rate of COVID cases in a town.

In Month 1 the town had 100 cases, and then 180, 210 and 300 in the following months, so the month-over-month growth rates are 80%, 16.67% and 42.86% respectively. This yields an arithmetic mean of 46.51% (which is 80% + 16.67% + 42.86% divided by 3).

Example of arithmetic mean based on observed data

However, if you start Month 1 with 100 cases and let it grow by 46.51% each month, you will end up with 314 cases in Month 4, which is higher than the 300 cases that we had from the observed data. This means that the arithmetic mean is an over-estimate of the true month-on-month growth rate.

314 is bigger than 300, which shows that using the arithmetic mean over-estimates the number of cases. In other words, the arithmetic mean is higher than the true growth rate in this case.

Instead, if we calculate the geometric mean of the monthly growth factors (1.80, 1.1667 and 1.4286) and subtract 1, we get a growth rate of 44.22% per month.

Geometric mean for the COVID case growth example

Using the geometric mean, if we start with 100 cases in Month 1 and let the cases grow by 44.22% per month, then we will end up with 300 cases in Month 4. In this case, the geometric mean is a better representation of the true growth rate than the arithmetic mean.

Using geometric mean yields 300 cases for Month 4, which matches with the observed cases.
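The worked example can be reproduced with a short Python sketch. It uses the four monthly case counts from the text; the `project` helper is a name invented for this illustration.

```python
from statistics import geometric_mean, mean

cases = [100, 180, 210, 300]  # observed monthly cases from the example

# Month-over-month growth factors: 1.8, 1.1667, 1.4286
factors = [later / earlier for earlier, later in zip(cases, cases[1:])]

arith_rate = mean(factors) - 1           # arithmetic mean growth rate
geo_rate = geometric_mean(factors) - 1   # geometric mean growth rate


def project(start, rate, months):
    """Cases after compounding a constant monthly growth rate."""
    return start * (1 + rate) ** months


print(round(arith_rate * 100, 2))            # 46.51
print(round(geo_rate * 100, 2))              # 44.22
print(round(project(100, arith_rate, 3)))    # 314
print(round(project(100, geo_rate, 3)))      # 300
```

Only the geometric rate brings the projection back to the observed 300 cases in Month 4.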

Note: the projections from both means are poor for the second month (about 144 to 147 cases projected against the 180 observed). If this is the case for most data, then a better mean has to be found.

Implementation of Geometric Mean


Both the scipy and statistics libraries implement the geometric mean in Python, so using them is probably faster and more stable than trying to write it yourself.

Note that the implementation in statistics is only available starting with Python 3.8, so you might want to check your Python version first 😃

>>> from scipy.stats.mstats import gmean
>>> gmean([1.8,1.1667,1.4286])-1
0.4422729209113938
>>> from statistics import geometric_mean
>>> geometric_mean([1.8,1.1667,1.4286])-1

Excel/Google Sheets

Pretty straightforward: just use the GEOMEAN() function on the range that holds the growth factors.

That’s it! If you find this post useful, please consider following me for more posts on statistics and data science.

Note: Until statistical packages offer an alternative to the arithmetic mean, I will have doubts about the results of experimental research data.

Harmonic Means

It is one of the Pythagorean means. It is sometimes appropriate for situations when the average rate[1] is desired.
The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations.

The harmonic mean H of the positive real numbers $x_1, x_2, \ldots, x_n$ is defined to be

$$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \left(\frac{\sum_{i=1}^{n} x_i^{-1}}{n}\right)^{-1}.$$
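A minimal Python sketch of this definition follows. `harmonic_mean_manual` is a name chosen for this example and the sample values are arbitrary. The result is compared with `statistics.harmonic_mean` from the standard library.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs a non-empty list of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


sample = [1.0, 2.0, 4.0]  # arbitrary positive values
manual = harmonic_mean_manual(sample)    # 3 / (1 + 0.5 + 0.25)
library = harmonic_mean(sample)
print(manual, library)
```

Both values should be about 1.714, which is lower than the arithmetic mean of the same sample (about 2.333).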

Examples of where the harmonic mean is applied in different fields.

In Physics

Average speed
In many situations involving rates and ratios, the harmonic mean provides the correct average.

For instance, if a vehicle travels a certain distance d outbound at a speed x (e.g. 60 km/h) and returns the same distance at a speed y (e.g. 20 km/h), then its average speed is the harmonic mean of x and y (30 km/h) – not the arithmetic mean (40 km/h).

The total travel time is the same as if it had traveled the whole distance at that average speed. This can be proven as follows:
Average speed for the entire journey = total distance traveled / sum of time for each segment:

$$\frac{2d}{\frac{d}{x} + \frac{d}{y}} = \frac{2}{\frac{1}{x} + \frac{1}{y}}$$
However, if the vehicle travels for a certain amount of time at a speed x and then the same amount of time at a speed y, then its average speed is the arithmetic mean of x and y, which in the above example is 40 km/h.

The same principle applies to more than two segments: given a series of sub-trips at different speeds, if each sub-trip covers the same distance, then the average speed is the harmonic mean of all the sub-trip speeds.

And if each sub-trip takes the same amount of time, then the average speed is the arithmetic mean of all the sub-trip speeds. (If neither is the case, then a weighted harmonic mean or weighted arithmetic mean is needed. For the arithmetic mean, the speed of each portion of the trip is weighted by the duration of that portion, while for the harmonic mean, the corresponding weight is the distance. In both cases, the resulting formula reduces to dividing the total distance by the total time.)
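The rule "total distance divided by total time" can be sketched in Python as follows. The segment distances are hypothetical; only the speeds of 60 km/h and 20 km/h come from the example above, and `average_speed` is an illustrative helper name.

```python
def average_speed(segments):
    """segments: list of (distance_km, speed_kmh) pairs."""
    total_distance = sum(distance for distance, _ in segments)
    total_time = sum(distance / speed for distance, speed in segments)
    return total_distance / total_time


# Same distance each way (120 km is an assumed figure): harmonic mean of the speeds
round_trip = [(120.0, 60.0), (120.0, 20.0)]
print(average_speed(round_trip))   # 30.0

# Same time at each speed (one hour each): arithmetic mean of the speeds
equal_time = [(60.0, 60.0), (20.0, 20.0)]
print(average_speed(equal_time))   # 40.0
```

The same function covers the mixed case, where neither the distances nor the durations are equal.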

However one may avoid the use of the harmonic mean for the case of “weighting by distance”. Pose the problem as finding “slowness” of the trip where “slowness” (in hours per kilometre) is the inverse of speed.

When trip slowness is found, invert it so as to find the “true” average trip speed.

For each trip segment i, the slowness s_i = 1/speed_i. Then take the weighted arithmetic mean of the s_i’s weighted by their respective distances (optionally with the weights normalized so they sum to 1 by dividing them by trip length).

This gives the true average slowness (in time per kilometre). It turns out that this procedure, which can be done with no knowledge of the harmonic mean, amounts to the same mathematical operations as one would use in solving this problem by using the harmonic mean. Thus it illustrates why the harmonic mean works in this case.

Similarly, if one wishes to estimate the density of an alloy given the densities of its constituent elements and their mass fractions (or, equivalently, percentages by mass), then the predicted density of the alloy (exclusive of typically minor volume changes due to atom packing effects) is the weighted harmonic mean of the individual densities, weighted by mass, rather than the weighted arithmetic mean as one might at first expect. To use the weighted arithmetic mean, the densities would have to be weighted by volume. Applying dimensional analysis to the problem while labeling the mass units by element and making sure that only like element-masses cancel makes this clear.
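As a sketch of the alloy case, the snippet below computes the mass-weighted harmonic mean of two densities. The mass fractions and densities are invented round numbers, not data for any real alloy, and `alloy_density` is a hypothetical helper name.

```python
def alloy_density(components):
    """components: list of (mass_fraction, density) pairs.

    Mass-weighted harmonic mean of the densities, ignoring any
    volume change on mixing.
    """
    total_fraction = sum(fraction for fraction, _ in components)
    total_volume = sum(fraction / density for fraction, density in components)
    return total_fraction / total_volume


# Hypothetical mix: half the mass at 8.0 g/cm^3, half at 2.0 g/cm^3
mix = [(0.5, 8.0), (0.5, 2.0)]
harmonic = alloy_density(mix)
arithmetic = sum(fraction * density for fraction, density in mix)
print(round(harmonic, 3), round(arithmetic, 3))
```

One gram of this mix occupies 0.5/8 + 0.5/2 = 0.3125 cm³, so the density is 3.2 g/cm³, well below the mass-weighted arithmetic mean of 5.0 g/cm³.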

If one connects two electrical resistors in parallel, one having resistance x (e.g., 60 Ω) and one having resistance y (e.g., 40 Ω), then the effect is the same as if one had used two resistors with the same resistance, both equal to the harmonic mean of x and y (48 Ω): the equivalent resistance, in either case, is 24 Ω (one-half of the harmonic mean). This same principle applies to capacitors in series or to inductors in parallel.
However, if one connects the resistors in series, then the average resistance is the arithmetic mean of x and y (with total resistance equal to the sum of x and y). This principle applies to capacitors in parallel or to inductors in series.
As with the previous example, the same principle applies when more than two resistors, capacitors or inductors are connected, provided that all are in parallel or all are in series.
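The resistor example can be checked with a few lines of Python. `parallel_resistance` is an illustrative helper; the 60 Ω and 40 Ω values come from the text above.

```python
from statistics import harmonic_mean


def parallel_resistance(resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


resistors = [60.0, 40.0]
equivalent = parallel_resistance(resistors)
h_mean = harmonic_mean(resistors)

print(round(h_mean, 6))       # harmonic mean, about 48
print(round(equivalent, 6))   # equivalent resistance, about 24
```

For n resistors in parallel, the equivalent resistance is the harmonic mean divided by n, which is why the two-resistor case gives one-half of it.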
The “conductivity effective mass” of a semiconductor is also defined as the harmonic mean of the effective masses along the three crystallographic directions.[9]

As with other optic equations, the thin lens equation 1/f = 1/u + 1/v can be rewritten such that the focal length f is one-half of the harmonic mean of the distances of the object u and the image v from the lens.[10]

In finance
The weighted harmonic mean is the preferable method for averaging multiples, such as the price–earnings ratio (P/E). If these ratios are averaged using a weighted arithmetic mean, high data points are given greater weights than low data points. The weighted harmonic mean, on the other hand, correctly weights each data point.[11] The simple weighted arithmetic mean when applied to non-price normalized ratios such as the P/E is biased upwards and cannot be numerically justified, since it is based on equalized earnings; just as vehicle speeds cannot be arithmetically averaged for a round-trip journey (see above).[12]
For example, consider two firms, one with a market capitalization of $150 billion and earnings of $5 billion (P/E of 30) and one with a market capitalization of $1 billion and earnings of $1 million (P/E of 1000). Consider an index made of the two stocks, with 30% invested in the first and 70% invested in the second. We want to calculate the P/E ratio of this index.
Using the weighted arithmetic mean (incorrect):

$$P/E = 0.3 \times 30 + 0.7 \times 1000 = 709$$

Using the weighted harmonic mean (correct):

$$P/E = \frac{0.3 + 0.7}{0.3/30 + 0.7/1000} \approx 93.46$$

Thus, the correct P/E of 93.46 of this index can only be found using the weighted harmonic mean, while the weighted arithmetic mean will significantly overestimate it.
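The two index P/E figures can be reproduced with the short Python sketch below. The helper names are chosen for this illustration; the P/E values and portfolio weights are those of the example.

```python
def weighted_arithmetic_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_harmonic_mean(values, weights):
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


pe_ratios = [30.0, 1000.0]   # P/E of the two firms
weights = [0.3, 0.7]         # share of the index invested in each

arithmetic_pe = weighted_arithmetic_mean(pe_ratios, weights)
harmonic_pe = weighted_harmonic_mean(pe_ratios, weights)

print(f"{arithmetic_pe:.2f}")   # 709.00
print(f"{harmonic_pe:.2f}")     # 93.46
```

The harmonic version amounts to dividing the money invested by the earnings that money buys, which is what a P/E of the whole index should mean.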

In geometry
In any triangle, the radius of the incircle is one-third of the harmonic mean of the altitudes.
For any point P on the minor arc BC of the circumcircle of an equilateral triangle ABC, with distances q and t from B and C respectively, and with the intersection of PA and BC being at a distance y from point P, we have that y is half the harmonic mean of q and t.[13]
In a right triangle with legs a and b and altitude h from the hypotenuse to the right angle, h² is half the harmonic mean of a² and b².[14][15]
Let t and s (t > s) be the sides of the two inscribed squares in a right triangle with hypotenuse c. Then s² equals half the harmonic mean of c² and t².
Let a trapezoid have vertices A, B, C, and D in sequence and have parallel sides AB and CD. Let E be the intersection of the diagonals, and let F be on side DA and G be on side BC such that FEG is parallel to AB and CD. Then FG is the harmonic mean of AB and DC. (This is provable using similar triangles.)

Crossed ladders. h is half the harmonic mean of A and B
One application of this trapezoid result is in the crossed ladders problem, where two ladders lie oppositely across an alley, each with feet at the base of one sidewall, with one leaning against a wall at height A and the other leaning against the opposite wall at height B, as shown. The ladders cross at a height of h above the alley floor. Then h is half the harmonic mean of A and B. This result still holds if the walls are slanted but still parallel and the “heights” A, B, and h are measured as distances from the floor along lines parallel to the walls. This can be proved easily using the area formula of a trapezoid and area addition formula.
In an ellipse, the semi-latus rectum (the distance from a focus to the ellipse along a line parallel to the minor axis) is the harmonic mean of the maximum and minimum distances of the ellipse from a focus.
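Two of the geometric statements above (the incircle radius and the altitude of a right triangle) can be checked numerically. The sketch below uses a 3-4-5 right triangle, chosen here only because its numbers are easy to verify by hand.

```python
import math
from statistics import harmonic_mean

a, b = 3.0, 4.0              # legs of a right triangle (illustrative choice)
c = math.hypot(a, b)         # hypotenuse
area = a * b / 2

# Altitude from the right angle onto the hypotenuse
h = 2 * area / c
right_triangle_ok = math.isclose(h ** 2, harmonic_mean([a ** 2, b ** 2]) / 2)

# Incircle radius = area / semi-perimeter; altitudes = 2 * area / side
r = area / ((a + b + c) / 2)
altitudes = [2 * area / side for side in (a, b, c)]
incircle_ok = math.isclose(r, harmonic_mean(altitudes) / 3)

print(right_triangle_ok, incircle_ok)   # True True
```

Here h = 2.4, so h² = 5.76, which is half of the harmonic mean of 9 and 16 (11.52). The incircle radius is 1, one-third of the harmonic mean of the altitudes 4, 3 and 2.4.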

In other sciences
In computer science, specifically information retrieval and machine learning, the harmonic mean of the precision (true positives per predicted positive) and the recall (true positives per real positive) is often used as an aggregated performance score for the evaluation of algorithms and systems: the F-score (or F-measure). This is used in information retrieval because only the positive class is of relevance, while number of negatives, in general, is large and unknown.[16] It is thus a trade-off as to whether the correct positive predictions should be measured in relation to the number of predicted positives or the number of real positives, so it is measured versus a putative number of positives that is an arithmetic mean of the two possible denominators.
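A minimal Python sketch of the F-score follows, assuming made-up counts of true positives, false positives and false negatives. It also checks the remark about denominators: the score equals the true positives divided by the arithmetic mean of the predicted and real positive counts.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 if there are no true positives)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)   # true positives per predicted positive
    recall = tp / (tp + fn)      # true positives per real positive
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives
tp, fp, fn = 8, 2, 8
score = f1_score(tp, fp, fn)

# Same value: tp divided by the mean of the two denominators
mean_denominator = ((tp + fp) + (tp + fn)) / 2
print(round(score, 4), round(tp / mean_denominator, 4))
```

With these counts precision is 0.8, recall is 0.5, and both expressions give about 0.6154.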
A consequence arises from basic algebra in problems where people or systems work together. As an example, if a gas-powered pump can drain a pool in 4 hours and a battery-powered pump can drain the same pool in 6 hours, then it will take both pumps (6·4)/(6 + 4), which is equal to 2.4 hours, to drain the pool together. This is one-half of the harmonic mean of 6 and 4: (2·6·4)/(6 + 4) = 4.8. That is, the appropriate average for the two types of pump is the harmonic mean, and with one pair of pumps (two pumps), it takes half this harmonic mean time, while with two pairs of pumps (four pumps) it would take a quarter of this harmonic mean time.
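The pump arithmetic can be sketched in Python by adding the work rates (pools per hour) and inverting the total. `time_working_together` is a helper name invented for this example.

```python
from statistics import harmonic_mean


def time_working_together(solo_times):
    """Time to finish one job when workers with these solo times cooperate."""
    combined_rate = sum(1.0 / t for t in solo_times)   # jobs per hour
    return 1.0 / combined_rate


solo = [4.0, 6.0]   # hours each pump needs on its own
together = time_working_together(solo)

print(round(together, 2))              # 2.4
print(round(harmonic_mean(solo), 2))   # 4.8
```

Doubling the list to four pumps (two of each kind) halves the time again, to a quarter of the harmonic mean.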
In hydrology, the harmonic mean is similarly used to average hydraulic conductivity values for a flow that is perpendicular to layers (e.g., geologic or soil) – flow parallel to layers uses the arithmetic mean. This apparent difference in averaging is explained by the fact that hydrology uses conductivity, which is the inverse of resistivity.
In sabermetrics, a player’s Power–speed number is the harmonic mean of their home run and stolen base totals.
In population genetics, the harmonic mean is used when calculating the effects of fluctuations in the census population size on the effective population size. The harmonic mean takes into account the fact that events such as population bottlenecks increase the rate of genetic drift and reduce the amount of genetic variation in the population. This is a result of the fact that, following a bottleneck, very few individuals contribute to the gene pool, limiting the genetic variation present in the population for many generations to come.
When considering fuel economy in automobiles two measures are commonly used – miles per gallon (mpg), and litres per 100 km. As the dimensions of these quantities are the inverse of each other (one is distance per volume, the other volume per distance) when taking the mean value of the fuel economy of a range of cars one measure will produce the harmonic mean of the other – i.e., converting the mean value of fuel economy expressed in litres per 100 km to miles per gallon will produce the harmonic mean of the fuel economy expressed in miles per gallon. For calculating the average fuel consumption of a fleet of vehicles from the individual fuel consumptions, the harmonic mean should be used if the fleet uses miles per gallon, whereas the arithmetic mean should be used if the fleet uses litres per 100 km. In the USA the CAFE standards (the federal automobile fuel consumption standards) make use of the harmonic mean.
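The relationship between the two fuel-economy measures can be illustrated with a short Python sketch. The fleet values are hypothetical, and the conversion constant of about 235.215 assumes US gallons; the identity being checked holds for any such constant.

```python
from statistics import harmonic_mean, mean

# Approximate factor: (L/100 km) * (miles per US gallon) = 235.215 (assumed constant)
CONVERSION = 235.215


def mpg_to_l_per_100km(mpg):
    return CONVERSION / mpg


def l_per_100km_to_mpg(litres):
    return CONVERSION / litres


fleet_mpg = [20.0, 40.0]   # hypothetical two-car fleet

# Arithmetic mean of consumption in L/100 km, converted back to mpg ...
mean_consumption = mean([mpg_to_l_per_100km(m) for m in fleet_mpg])
fleet_from_litres = l_per_100km_to_mpg(mean_consumption)

# ... matches the harmonic mean taken directly in mpg
fleet_from_mpg = harmonic_mean(fleet_mpg)

print(round(fleet_from_litres, 2), round(fleet_from_mpg, 2))
```

Both figures come out at about 26.67 mpg, not the 30 mpg that an arithmetic mean of the mpg values would suggest.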
In chemistry and nuclear physics, the average mass per particle of a mixture consisting of different species (e.g., molecules or isotopes) is given by the harmonic mean of the individual species’ masses weighted by their respective mass fractions.
