## Archive for the ‘**mathematics**’ Category

## Big O() notation: a couple of sources

This post contains, just for future reference, a couple of primary sources relevant to the $O$ (“Big O”) notation:

- Some introductory words from *Asymptotic Methods in Analysis* by de Bruijn
- A letter from Donald Knuth on an approach to teaching calculus using this notation.

## Visualizing product of permutations

A simple pedagogical trick that may come in handy: represent a permutation $\sigma$ using arrows (curved lines) from $i$ to $\sigma(i)$ for each $i$. Then, the product of two permutations can be represented by just putting the two corresponding figures (sets of arrows) one below the other, and following the arrows.

The figure is from an article called *Symmetries* by Alain Connes, found via the Wikipedia article on Morley’s trisector theorem (something entirely unrelated to permutations, but the article covers both of them and more).

I’m thinking how one might write a program to actually draw these: if we decide that the “height” of the figure is some $h$, then each arrow needs to go from some point $(x_1, 0)$ to some point $(x_2, h)$ (using here the usual screen convention of the $x$ coordinate increasing from left to right, and the $y$ coordinate increasing from top to bottom). Further, each curve needs to have vertical slope at its two endpoints, so that successive curves can line up smoothly. The constraint on starting point, ending point, and directions at the endpoints almost defines a quadratic Bezier curve, except that here the two directions are parallel. So it’s somewhere between a quadratic and the (usual) cubic Bezier curve, which is given by the start point, end point, and derivatives at the start and end point. (Here we only care about the direction of the derivative; we can pick some arbitrary magnitude to fix the curve: the larger the magnitude we pick, the smoother the curve will look at the ends, at the cost of smoothness in the interior.)

Even knowing the curve, how do we generate an image?
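
One way to sidestep the rasterization question entirely is to emit SVG and let the browser draw the curves. Here is a minimal sketch (the function names, the choice of cubic Beziers, and the parameter $k$ for the derivative magnitude are all my own guesses, not a tested renderer): each arrow is a cubic Bezier whose two control points sit directly below the start and directly above the end, which forces the vertical tangents.

```python
def arrow_path(x1, x2, h, k=0.5):
    """SVG path for a cubic Bezier from (x1, 0) to (x2, h) with vertical
    tangents at both endpoints; k controls the magnitude of the end
    derivatives (larger k: smoother at the ends, wigglier in between)."""
    return 'M %g %g C %g %g, %g %g, %g %g' % (
        x1, 0, x1, k * h, x2, h - k * h, x2, h)

def permutation_svg(perm, h=100, dx=40):
    """SVG document with one arrow i -> perm[i] for each i (0-indexed)."""
    paths = '\n'.join(
        '<path d="%s" fill="none" stroke="black"/>' % arrow_path(i * dx, p * dx, h)
        for i, p in enumerate(perm))
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">\n%s\n</svg>'
            % ((len(perm) - 1) * dx + 1, h, paths))

print(permutation_svg([1, 2, 0]))   # the 3-cycle 0 -> 1 -> 2 -> 0
```

Stacking two such figures (offsetting the second by $h$) would then depict the product, since the vertical end-tangents make the arrows join smoothly.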

## The idea of logarithms, and the first appearance of e

[Incomplete post: about 10% written, about 90% left.]

The notion of the number $e$, the exponential function $e^x$, and logarithms are often conceptual stumbling blocks even to someone who has an otherwise solid understanding of middle-school mathematics.

Just *what* is the number $e$? How was it first calculated / where did it first turn up? Premature exposure to its numerical value

$$e \approx 2.71828\ldots$$

only serves to deepen the mysteriousness and to make it seem arbitrary.

Here a historical perspective helps: as is often the case, here too, the first appearance is simpler and better motivated than the accounts in dry textbooks. This is from this account by Matthew P. Wiener (originally posted on USENET somewhere, as quoted by MJD). I’m just going to quote it directly for now, and edit it later:

Napier, who invented logarithms, more or less worked out a table of logarithms to base $2$, as follows:

```
0  1  2  3  4   5   6   7    8    9    10   ...
1  2  4  8  16  32  64  128  256  512  1024 ...
```

The arithmetic progression in the first row is matched by a geometric progression in the second row. If, by any luck, you happen to wish to multiply 16 by 32, which just happen to be in the bottom row, you can look up their “logs” in the first row and add 4+5 to get 9 and then conclude 16·32=512.

For most practical purposes, this is useless. Napier realized that what one needs to multiply in general is $1 + \epsilon$ for a base—the intermediate values will be much more extensive. For example, with base 1.01, we get:

```
 0 1.00    1 1.01    2 1.02    3 1.03    4 1.04    5 1.05
 6 1.06    7 1.07    8 1.08    9 1.09   10 1.10   11 1.12
12 1.13   13 1.14   14 1.15   15 1.16   16 1.17   17 1.18
18 1.20   19 1.21   20 1.22   21 1.23   22 1.24   23 1.26
24 1.27   25 1.28   26 1.30   27 1.31   28 1.32   29 1.33
30 1.35   31 1.36   32 1.37   33 1.39   34 1.40   35 1.42
[...]
50 1.64   51 1.66   52 1.68   53 1.69   54 1.71   55 1.73
[...]
94 2.55   95 2.57   96 2.60   97 2.63   98 2.65   99 2.68
100 2.70  101 2.73  102 2.76  103 2.79  104 2.81  105 2.84
[...]
```

So if you need to multiply 1.27 by 1.33, say, just look up their logs, in this case, 24 and 29, add them, and get 53, so 1.27·1.33=1.69. For two/three digit arithmetic, the table only needs entries up to 9.99.

Note that $e$ is almost there, as the antilogarithm of 100. The natural logarithm of a number can be read off from the above table, as just [approximately] the corresponding exponent divided by 100.

What Napier actually did was work with base .9999999. He spent 20 years computing powers of .9999999 by hand, producing a grand version of the above. That’s it. No deep understanding of anything, no calculus, and $e$ pops up anyway—in Napier’s case, $1/e$ was the 10 millionth entry. (To be pedantic, Napier did not actually use decimal points, that being a newfangled notion at the time.)

Later, in his historic meeting with Briggs, two changes were made. A switch to a base greater than 1 was made, so that logarithms would scale in the same direction as the numbers, and the spacing on the logarithm side was chosen so that $\log 10 = 1$. These two changes were, in effect, just division by the constant $-10^7 \ln 10$.

In other words, $e$ made its first appearance rather implicitly.

(I had earlier read a book on Napier and come to the same information though a lot less clearly, here.)
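
The two numerical claims in the quote are easy to check (a throwaway sketch): the 100th entry of the base-1.01 table is nearly $e$, and the 10-millionth power of Napier's base .9999999 is nearly $1/e$.

```python
import math

# Base 1.01: the 100th entry of the table above is almost e.
print(1.01 ** 100)                 # about 2.7048, vs e = 2.71828...

# Napier's base .9999999: the 10-millionth entry is almost 1/e.
print(0.9999999 ** 10_000_000)     # about 0.36788
print(1 / math.e)
```

Both follow from $(1 + 1/n)^n \to e$ (and $(1 - 1/n)^n \to 1/e$) as $n \to \infty$.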

I had started writing a series of posts leading up to an understanding of the exponential function (here, here, here), but it seems to have got abandoned. Consider this one a contribution to that series.

## The functional equation f(x+y) = f(x)f(y)

Suppose $f : \mathbb{R} \to \mathbb{R}$ satisfies $f(x+y) = f(x)f(y)$. What can we say about $f$?

Putting $x = y = 0$ gives

$$f(0) = f(0)^2,$$

which can happen if either $f(0) = 0$ or $f(0) = 1$. Note that if $f(0) = 0$, then $f(x) = f(x+0) = f(x)f(0) = 0$ for all $x$: the function which is identically zero satisfies the functional equation. If $f$ is not this function, i.e., if $f(x) \neq 0$ for at least one value of $x$, then plugging that value of $x$ (say $x_0$) into the equation gives $f(0) = 1$. Also, for any $x$, the equation $f(x)f(x_0 - x) = f(x_0) \neq 0$ forces $f(x) \neq 0$ as well. Further, $f(x) = f(x/2)^2$, so $f(x) > 0$ for all $x$.

Next, putting $y = x$ gives $f(2x) = f(x)^2$, and by induction $f(nx) = f(x)^n$. Putting $x/n$ in place of $x$ in this gives $f(x) = f(x/n)^n$, which means $f(x/n) = f(x)^{1/n}$ (note we’re using $f > 0$ here, to pick the positive real root). And again, $f(mx/n) = f(x)^{m/n}$. So $f(q) = f(1)^q$ for rational $q$, which completely defines the function at rational points.

[As $f(1) > 0$, it can be written as $e^k$ for some constant $k$, which gives $f(q) = e^{kq}$ for rational $q$.]

To extend this function to irrational numbers, we need some further assumptions on $f$, such as continuity. It turns out that $f$ being continuous at any single point $x_0$ is enough (and implies the function is continuous everywhere): note that $f(x+h) - f(x) = f(x - x_0)\bigl(f(x_0+h) - f(x_0)\bigr)$. Even being Lebesgue-integrable/measurable will do.

Else, there are discontinuous functions satisfying the functional equation. (Basically, we can define the value of the function separately on each “independent” part. That is, viewing $\mathbb{R}$ as a vector space over $\mathbb{Q}$, pick a basis (a Hamel basis) using the axiom of choice, define the value of the function independently for each basis element, and this fixes the value of $f$ on all of $\mathbb{R}$. See this article for more details.)

To step back a bit: what the functional equation says is that $f$ is a homomorphism from $(\mathbb{R}, +)$, the additive group of real numbers, to $(\mathbb{R}, \times)$, the multiplicative monoid of real numbers. If $f$ is not the trivial identically-zero function, then (as we saw above) $f$ is in fact a homomorphism from $(\mathbb{R}, +)$, the additive group of real numbers, to $(\mathbb{R}^{+}, \times)$, the multiplicative group of positive real numbers. What we proved is that the exponential functions $x \mapsto e^{kx}$ are precisely all such functions that are nice (nice here meaning either measurable or continuous at at least one point). (Note that this set includes the trivial homomorphism corresponding to $k = 0$: the function identically $1$ everywhere. If $f$ is not this trivial map, then it is in fact an isomorphism.)
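
As a quick numeric sanity check of this characterization (a throwaway sketch, with an arbitrarily chosen constant $k$): any $f(x) = e^{kx}$ satisfies the functional equation, and $k$ is recovered as $\ln f(1)$.

```python
import math
import random

k = 1.7                        # arbitrary constant, chosen for illustration
f = lambda x: math.exp(k * x)

# f(x+y) = f(x) f(y), up to floating-point roundoff:
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert abs(f(x + y) - f(x) * f(y)) <= 1e-9 * f(x + y)

# ... and the constant is recovered from the value at 1:
print(math.log(f(1)))
```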

Edit [2013-10-11]: See also Overview of basic facts about Cauchy functional equation.

## Trajectory of a point moving with acceleration perpendicular to velocity

(Just some basic high-school physics stuff; to assure myself I can still do some elementary things. :P Essentially, showing that if a particle moves with acceleration perpendicular to velocity, or velocity perpendicular to position, then it traces out a circle. Stop reading here if this is obvious.)

Suppose a point moves in the plane such that its acceleration is always perpendicular to its velocity, and of the same magnitude. What is its path like?

To set up notation: let’s say the point’s position at time $t$ is $(x(t), y(t))$, its velocity is $(\dot{x}(t), \dot{y}(t))$, and its acceleration is $(\ddot{x}(t), \ddot{y}(t))$.

The result of rotating a point $(x, y)$ by 90° is $(-y, x)$. (E.g. see figure below.)

So the fact that the acceleration is at right angles to the velocity (and of the same magnitude) means that $(\ddot{x}, \ddot{y}) = (-\dot{y}, \dot{x})$, or, to write everything in terms of the velocity $(u, v) = (\dot{x}, \dot{y})$,

$$\dot{u} = -v, \qquad \dot{v} = u,$$

where we can get rid of $u$ by substituting the second equation (in the form $u = \dot{v}$) into the first:

$$\ddot{v} = \dot{u} = -v,$$

or in other words

$$\ddot{v} + v = 0.$$

By some theory about ordinary differential equations, which I don’t know (please help!) (but see the very related example you saw in high school, of simple harmonic motion), the solutions to this equation are $\cos t$ and $\sin t$ and any linear combination of those: the solution in general is

$$v = a\cos t + b\sin t = r\sin(t + \varphi)$$

where $r = \sqrt{a^2 + b^2} \ge 0$ and $\varphi$ is the angle such that $\sin\varphi = a/r$ and $\cos\varphi = b/r$. And the fact that $u = \dot{v}$ gives $u = r\cos(t + \varphi)$. So $(u, v) = (r\cos(t+\varphi),\, r\sin(t+\varphi))$. Note that the acceleration $(\dot{u}, \dot{v}) = (-r\sin(t+\varphi),\, r\cos(t+\varphi))$ is indeed perpendicular to the velocity $(u, v)$, as we wanted.

The actual trajectory can be got by integrating

$$(\dot{x}, \dot{y}) = (u, v) = (r\cos(t+\varphi),\, r\sin(t+\varphi))$$

to get $x = x_0 + r\sin(t + \varphi)$ and $y = y_0 - r\cos(t + \varphi)$. This trajectory is a point moving on a circle centered at the point $(x_0, y_0)$ and of radius $r$, with speed $r$, or unit angular speed. Note that the velocity is also perpendicular to the point’s position wrt the centre of the circle, as velocity is tangential to the circle, as it should be.

With a suitable change of coordinates (translate the origin to $(x_0, y_0)$, then rotate the axes appropriately, then scale everything so that $r = 1$), this is the familiar parametrization $(\cos t, \sin t)$ of the circle.

Note: Just as we derived $\ddot{v} + v = 0$ from assuming that the acceleration is perpendicular to the velocity, we can, by assuming that the velocity is perpendicular to the position, identically derive $\ddot{y} + y = 0$, i.e. that the point moves on a circle.
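
A quick numerical check of the derivation (a sketch; step size and tolerances chosen arbitrarily): integrate $(\dot{u}, \dot{v}) = (-v, u)$ with plain Euler steps and confirm the point stays on the expected circle. With the initial conditions below, the closed form above gives the unit circle centred at the origin.

```python
import math

x, y = 1.0, 0.0              # initial position
u, v = 0.0, 1.0              # initial velocity
dt, steps = 1e-4, 100_000    # integrate up to t = 10

for _ in range(steps):
    # position integrates the velocity; velocity integrates the
    # acceleration, which is the velocity rotated by 90°: (-v, u)
    x, y = x + u * dt, y + v * dt
    u, v = u - v * dt, v + u * dt

print(math.hypot(x, y))      # distance from the centre: stays near 1
print(abs(x * u + y * v))    # velocity ⟂ radius vector: stays near 0
```

(Explicit Euler slowly inflates the radius, but over this time span the drift is well below a percent.)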

## The power series for sin and cos

There are many ways to *derive* the power series of and using the machinery of Taylor series etc., but below is another elementary way of demonstrating that the well-known power series expansions are the right ones. The argument below is from Tristan Needham’s *Visual Complex Analysis*, which I’m reproducing without looking at the book just to convince myself that I’ve internalized it correctly.

So: let

$$s(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \qquad c(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots$$

We will take the following two for granted (both can be proved with some effort):

- Both power series are convergent.
- The power series can be differentiated term-wise.

As suggested by (2) above, the first thing we observe is that $s'(x) = c(x)$ and $c'(x) = -s(x)$.

So firstly:

$$\frac{d}{dx}\left(s(x)^2 + c(x)^2\right) = 2s(x)c(x) - 2c(x)s(x) = 0,$$

which means that $s(x)^2 + c(x)^2$ is a constant and does not vary with $x$. Putting $x = 0$ shows that $s(0) = 0$ and $c(0) = 1$, so $s(x)^2 + c(x)^2 = 1$ for all $x$.

Secondly, define the angle $\theta$ as a function of $x$, by $\tan\theta = s(x)/c(x)$. (To be precise, this defines $\theta$ up to a multiple of $\pi$, i.e. modulo $\pi$.)

Differentiating the left-hand side of this definition gives

$$\frac{d}{dx}\tan\theta = \sec^2\theta \,\frac{d\theta}{dx}$$

(where $\sec\theta$ means $1/\cos\theta$)

while differentiating the right-hand side gives

$$\frac{d}{dx}\,\frac{s(x)}{c(x)} = \frac{s'(x)c(x) - s(x)c'(x)}{c(x)^2} = \frac{c(x)^2 + s(x)^2}{c(x)^2} = \frac{1}{c(x)^2} = 1 + \tan^2\theta = \sec^2\theta.$$

The necessary equality of the two tells us that $\frac{d\theta}{dx} = 1$, which along with the initial condition that $\theta = 0$ when $x = 0$ (as $s(0) = 0$), gives $\theta = x$ (or to be precise, $\theta \equiv x \pmod{\pi}$).

In other words, we have shown that the power series $s(x)$ and $c(x)$ satisfy $s(x)/c(x) = \tan x$, and therefore $s(x) = k\sin x$ and $c(x) = k\cos x$ for some constant $k$. The observation that $c(0) = 1$ (or our earlier observation that $s(x)^2 + c(x)^2 = 1$ for all $x$) gives $k = 1$, thereby showing that $s(x) = \sin x$ and $c(x) = \cos x$.

So much for $\sin$ and $\cos$. Just as an aside, observe that if we take $i$ to be a symbol satisfying $i^2 = -1$, then

$$c(x) + i\,s(x) = 1 + ix + \frac{(ix)^2}{2!} + \frac{(ix)^3}{3!} + \cdots$$

the right hand side of which looks very much like the result of “substituting” $z = ix$ in the known (real) power series

$$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$$

(which itself can be proved using the term-wise differentiation above and the defining property $f'(x) = f(x)$, say).

So this is one heuristic justification for us to *define* $e^{ix} = \cos x + i\sin x$.

Or, if we *define* $e^{ix}$ as the result of substituting $ix$ in the real power series for $e^x$, this *proves* that $e^{ix} = \cos x + i\sin x$.
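
All of the identities above can be checked numerically with truncated series (a quick sketch; the function names and truncation lengths are mine): the partial sums of $s$, $c$ and of the exponential series behave exactly as claimed.

```python
import math

def s(x, terms=30):
    """Partial sum of x - x^3/3! + x^5/5! - ..."""
    return sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1) for n in range(terms))

def c(x, terms=30):
    """Partial sum of 1 - x^2/2! + x^4/4! - ..."""
    return sum((-1)**n * x**(2*n) / math.factorial(2*n) for n in range(terms))

def exp_series(z, terms=60):
    """Partial sum of 1 + z + z^2/2! + ... (works for complex z too)."""
    return sum(z**n / math.factorial(n) for n in range(terms))

x = 1.3
print(s(x)**2 + c(x)**2)                         # close to 1
print(s(x) - math.sin(x), c(x) - math.cos(x))    # close to 0
print(exp_series(1j * x) - (c(x) + 1j * s(x)))   # close to 0: e^{ix} = c + is
```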

## The potions puzzle by Snape in Harry Potter and the Philosopher’s Stone

Sometime this week, I reread the first Harry Potter book (after at least 10 years… wow, has it been that long?), just for contrast after reading Rowling’s adult novel *The Casual Vacancy* (on which more later). Anyway, in the penultimate chapter there is a puzzle:

He pulled open the next door, both of them hardly daring to look at what came next — but there was nothing very frightening in here, just a table with seven differently shaped bottles standing on it in a line.

“Snape’s,” said Harry. “What do we have to do?”

They stepped over the threshold, and immediately a fire sprang up behind them in the doorway. It wasn’t ordinary fire either; it was purple. At the same instant, black flames shot up in the doorway leading onward. They were trapped.

“Look!” Hermione seized a roll of paper lying next to the bottles. Harry looked over her shoulder to read it:

Danger lies before you, while safety lies behind,

Two of us will help you, whichever you would find,

One among us seven will let you move ahead,

Another will transport the drinker back instead,

Two among our number hold only nettle wine,

Three of us are killers, waiting hidden in line.

Choose, unless you wish to stay here forevermore,

To help you in your choice, we give you these clues four:

First, however slyly the poison tries to hide

You will always find some on nettle wine’s left side;

Second, different are those who stand at either end,

But if you would move onward, neither is your friend;

Third, as you see clearly, all are different size,

Neither dwarf nor giant holds death in their insides;

Fourth, the second left and the second on the right

Are twins once you taste them, though different at first sight.

I became curious about whether this is just a ditty Rowling made up, or the puzzle actually makes sense and is consistent. It turns out she has constructed it well. Let’s take a look. This investigation can be carried out by hand, but we’ll be lazy and use a computer, specifically Python. The code examples below are all to be typed in an interactive Python shell, the one that you get by typing “python” in a terminal.

So what we have are seven bottles, of which one will take you forward (F), one will let you go back (B), two are just nettle wine (N), and three are poison (P).

```
>>> bottles = ['F', 'B', 'N', 'N', 'P', 'P', 'P']
```

The actual ordering of these 7 bottles is some ordering (permutation) of them. As 7 is a very small number, we can afford to be inefficient and resort to brute-force enumeration.

```
>>> import itertools
>>> perms = [''.join(s) for s in set(itertools.permutations(bottles))]
>>> len(perms)
420
```

The `set` is needed to remove duplicates, because otherwise `itertools.permutations` will print $7! = 5040$ “permutations”. So already the number of all possible orderings is rather small (it is $7!/(2!\,3!) = 420$). We can look at a sample to check whether things look fine.

```
>>> perms[:10]
['PNFNPBP', 'NPPBNFP', 'FNNPBPP', 'PPPFNBN', 'NPPNBFP', 'PFNNBPP', 'NPBPPFN', 'NBNPPFP', 'NPPFBNP', 'BNPFNPP']
```

Now let us try to solve the puzzle. We can start with the first clue, which says that wherever a nettle-wine bottle occurs, on its left is always a poison bottle (and in particular therefore, a nettle-wine bottle cannot be in the leftmost position). So we must restrict the orderings to just those that satisfy this condition.

```
>>> def clue1(s): return all(i > 0 and s[i-1] == 'P' for i in range(len(s)) if s[i] == 'N')
...
>>> len([s for s in perms if clue1(s)])
60
```

(In the code, the 7 positions are 0 to 6, as array indices in code generally start at 0.)

Then the second clue says that the bottles at the end are different, and neither contains the potion that lets you go forward.

```
>>> def clue2(s): return s[0] != s[6] and s[0] != 'F' and s[6] != 'F'
...
>>> len([s for s in perms if clue1(s) and clue2(s)])
30
```

The third clue says that the smallest and largest bottles don’t contain poison, and this would be of help to Harry and Hermione who can see the sizes of the bottles. But as we readers are not told the sizes of the bottles, this doesn’t seem of any help to us; let us return to this later.

The fourth clue says that the second-left and second-right bottles have the same contents.

```
>>> def clue4(s): return s[1] == s[5]
...
>>> len([s for s in perms if clue1(s) and clue2(s) and clue4(s)])
8
```

There are now just 8 possibilities, finally small enough to print them all.

```
>>> [s for s in perms if clue1(s) and clue2(s) and clue4(s)]
['PPNBFPN', 'BPNPFPN', 'BPFPNPN', 'BPPNFPN', 'PNPFPNB', 'BPNFPPN', 'PPNFBPN', 'PNFPPNB']
```

Alas, without knowing which the “dwarf” and “giant” bottles are, we cannot use the third clue, and this seems as far as we can go. We seem to have exhausted all the information available…

Almost. It is reasonable to assume that the puzzle is meant to have a solution. So even without knowing where exactly the “dwarf” and “giant” bottles are, we can say that they are in some pair of locations that ensure a unique solution.

```
>>> def clue3(d, g, s): return s[d] != 'P' and s[g] != 'P'
...
>>> for d in range(7):
...     for g in range(7):
...         if d == g: continue
...         poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...         if len(poss) == 1:
...             print d, g, poss[0]
...
1 2 PNFPPNB
1 3 PNPFPNB
2 1 PNFPPNB
2 5 PNFPPNB
3 1 PNPFPNB
3 5 PNPFPNB
5 2 PNFPPNB
5 3 PNPFPNB
```

Aha! If you look at the possible orderings closely, you will see that we are down to just two possibilities for the ordering of the bottles.

Actually there is some scope for quibbling in what we did above: perhaps we cannot say that there is a unique solution determining the *entire* configuration; perhaps all we can say is that the puzzle should let us uniquely determine the positions of just the two useful bottles. Fortunately, that gives exactly the same set of possibilities, so this distinction happens to be inconsequential.

```
>>> for d in range(7):
...     for g in range(7):
...         if d == g: continue
...         poss = [(s.index('F'), s.index('B')) for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...         if len(set(poss)) == 1:
...             print d, g, [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)][0]
...
1 2 PNFPPNB
1 3 PNPFPNB
2 1 PNFPPNB
2 5 PNFPPNB
3 1 PNPFPNB
3 5 PNPFPNB
5 2 PNFPPNB
5 3 PNPFPNB
```

Good. Note that there are only two configurations above. So with *only the clues in the poem, and the assumption that the puzzle can be solved*, we can narrow down the possibilities to two configurations, and be sure of the contents of all the bottles except the third and fourth. We know that the potion that lets us go forward is in either the third or the fourth bottle.

In particular we see that the last bottle lets us go back, and indeed this is confirmed by the story later:

“Which one will get you back through the purple flames?”

Hermione pointed at a rounded bottle at the right end of the line.

[…]

She took a long drink from the round bottle at the end, and shuddered.

But we don’t know which of the two it is, as we can’t reconstruct all the relevant details of the configuration. Perhaps we can reconstruct something with the remaining piece of information from the story?

“Got it,” she said. “The smallest bottle will get us through the black fire — toward the Stone.”

Harry looked at the tiny bottle.

[…]

Harry took a deep breath and picked up the smallest bottle.

So we know that the bottle that lets one move forward is in fact in the smallest one, the “dwarf”.

```
>>> for d in range(7):
...     for g in range(7):
...         poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...         if len(poss) == 1 and poss[0][d] == 'F':
...             print d, g, poss[0]
...
2 1 PNFPPNB
2 5 PNFPPNB
3 1 PNPFPNB
3 5 PNPFPNB
```

This narrows the possible positions of the smallest and largest bottles (note that it says the largest bottle is one that contains nettle wine), but still leaves the same two possibilities for the complete configuration. So we can stop here.

**Conclusion**

What we can conclude is the following: apart from the clues mentioned in the poem, the “dwarf” (the smallest bottle) was in either position 2 (third from the left) or 3 (fourth from the left). The biggest bottle was in either position 1 (second from the left) or 5 (sixth from the left). With this information about the location of the smallest bottle (and *without* necessarily assuming the puzzle has a unique solution!), Hermione could determine the contents of all the bottles. In particular she could determine the location of the two useful bottles: namely that the bottle that lets you go back was the last one, and that the one that lets you go forward was the smallest bottle.

```
>>> for (d, g) in [(2, 1), (2, 5), (3, 1), (3, 5)]:
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...     assert len(poss) == 1
...     s = poss[0]
...     assert s.index('B') == 6
...     assert s.index('F') == d
...     print (d, g), s
...
(2, 1) PNFPPNB
(2, 5) PNFPPNB
(3, 1) PNPFPNB
(3, 5) PNPFPNB
```

It is not clear why Rowling went to the effort of creating a meaningful puzzle, only to withhold details that would let the reader solve it fully. Perhaps some details were removed during editing. As far as making up stuff for the sake of a story goes, though, this is nothing; consider for instance the language created for *Avatar*, which viewers have learned.

Added:

See also http://www.zhasea.com/logic/snape.html which does it by hand, and has a perfectly clear exposition (it doesn’t try the trick of guessing that the solution is unique before reaching for the additional information from the story).

## Are there Fibonacci numbers starting with 2012? (continued)

Almost 8 months ago I left the first part of this post unfinished planning to complete it in the morning; seems I slept too long. (Or as this guy said after a 2-year hiatus: “Sorry for the pause guys. I was in the bathroom.”)

Recall: To get a power of 2 that starts with a prefix $p$ (like $p = 2012$), we want $n$ such that the fractional part of $n \log 2$ lies between those of $\log p$ and $\log (p+1)$ (all logarithms here to base 10), and similarly to get a Fibonacci number starting with $p$, we want (with some hand-waving) $n$ such that the fractional part of $n \log \varphi$ lies between those of $\log(\sqrt{5}\,p)$ and $\log(\sqrt{5}\,(p+1))$. The more general problem is:

**Problem**: Given an irrational number $\alpha$ and an interval $(a, b) \subset (0, 1)$, find $n$ such that $\{n\alpha\}$, the fractional part of $n\alpha$, lies in the interval $(a, b)$.

Here is one method, based on Edward Burger’s book *Exploring the Number Jungle*. Let $\beta$ be the midpoint of the interval $(a, b)$. Then we are trying to find $n$ such that $n\alpha$ is close to $\beta$ modulo 1.

- Find a fraction $\frac{p}{q}$ approximating $\alpha$, such that $\left|\alpha - \frac{p}{q}\right| \le \frac{1}{q^2}$. (These are the convergents of the continued fraction of $\alpha$, but in practice it seems you can also get away with taking semi-convergents that may not satisfy this property.)
- Let $N$ be the closest integer to $q\beta$. Note that this automatically means $|N - q\beta| \le \frac{1}{2}$.
- Write $N = np - mq$ with $0 \le n < q$. This you can do quite easily with the Euclidean algorithm, as $\gcd(p, q) = 1$.
- Then for this $n$ and $m$, we have (it is a simple exercise to prove this)

  $$|n\alpha - m - \beta| \le \frac{n}{q^2} + \frac{|N - q\beta|}{q} \le \frac{1}{q} + \frac{1}{2q} = \frac{3}{2q}.$$

- This means that the distance between $n\alpha$ and $\beta$ is small, modulo 1. If this distance turns out to be still too large, start with a bigger convergent $\frac{p}{q}$.

I think I had some code to post as well (hey, what’s the actual Fibonacci number that starts with 2012?), but it needs to be cleaned up… probably this works (in Sage).
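
In the meantime, here is a brute-force answer to the parenthetical question, not the promised Sage code (the function name is mine; it simply walks up the Fibonacci sequence until the decimal expansion matches):

```python
def first_fibonacci_with_prefix(prefix):
    """Index and value of the first Fibonacci number whose decimal
    expansion starts with the given prefix (brute-force search)."""
    a, b, n = 0, 1, 0          # F_0 = 0, F_1 = 1
    while not str(a).startswith(prefix):
        a, b, n = b, a + b, n + 1
    return n, a

n, fib = first_fibonacci_with_prefix('2012')
print(n, str(fib)[:10] + '...', '(%d digits)' % len(str(fib)))
```

The density result from the first part of this post says roughly one Fibonacci number in every $1/\log_{10}(2013/2012) \approx 4600$ starts with 2012, so the search terminates quickly.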

The thing we’re doing here is called **inhomogeneous Diophantine approximation**.

## Are there Fibonacci numbers starting with 2012?

Do there exist powers of 2 whose first four digits are 2012?

Are there Fibonacci numbers whose first four digits are 2012?

If the answer is obvious to you (or you don’t care), you can stop reading.

The answer for both is:

- Yes.
- There are infinitely many such numbers.
- In fact, the fraction of (powers of 2 / Fibonacci numbers) starting with 2012 is exactly $\log_{10}\frac{2013}{2012}$.

Similarly with any other prefix *p* (of any length) in place of 2012. Proof follows.

A number *x* starts with a prefix *p* if and only if for some *k ≥ 0*,

$$p \cdot 10^k \le x < (p+1) \cdot 10^k.$$

Thus a power of 2, say 2^{n}, starts with *p* iff

$$p \cdot 10^k \le 2^n < (p+1) \cdot 10^k$$

for some $k \ge 0$.

Taking logarithms to base 10 and simplifying, this is equivalent to

$$\log p \le n \log 2 - k < \log (p+1)$$

for some $k \ge 0$.

This is saying that the fractional part of $n \log 2$ lies between the fractional parts of $\log p$ and $\log(p+1)$. For example, if $p = 2012$, this means that the fractional part of $n \log 2$ lies between $\log 2.012$ and $\log 2.013$.

Similarly, for Fibonacci numbers, as is (or should be) well-known, the *n*th Fibonacci number F_{n} is the closest integer to $\varphi^n/\sqrt{5}$, where $\varphi = \frac{1+\sqrt{5}}{2}$ is the golden ratio. So $F_n$ starts with $p$ iff, roughly,

$$p \cdot 10^k \le \frac{\varphi^n}{\sqrt{5}} < (p+1) \cdot 10^k.$$

Taking logarithms to base 10 and simplifying, while ignoring the annoying closest-integer rounding, which becomes irrelevant in the limit (this line is not rigorous), this is equivalent to

$$\log(\sqrt{5}\,p) \le n \log \varphi - k < \log(\sqrt{5}\,(p+1))$$

for some $k$, which means that the fractional part of $n \log \varphi$ lies between the fractional parts of $\log(\sqrt{5}\,p)$ and $\log(\sqrt{5}\,(p+1))$. For $p = 2012$, this means that the fractional part of $n \log \varphi$ lies between the fractional parts of $\log(\sqrt{5} \cdot 2012)$ and $\log(\sqrt{5} \cdot 2013)$.

In either case, we are trying to make the fractional part of , for some irrational number , lie in some interval. The relevant fact is this:

**Theorem 1**: For any irrational number $\alpha$, the sequence $(\{n\alpha\})_{n \ge 1}$ (where $\{x\}$ denotes the fractional part of $x$) is dense in $[0, 1)$,

or, in other words,

**Theorem 1**: For any irrational number $\alpha$, the sequence $\alpha, 2\alpha, 3\alpha, \ldots$ is dense modulo 1.

Proving this theorem is a good exercise.

This means that for any interval you want, you can always find some $n$ such that the fractional part of $n\alpha$ lies in your interval. In fact, because the sequence is dense, you can find an infinite sequence of such $n$, such that the fractional parts of $n\alpha$ converge to the midpoint (say) of the desired interval. This proves the first two facts of the answer, and for the third we need a stronger theorem:

**Theorem 2** [Equidistribution theorem]: For any irrational number $\alpha$, the numbers $\alpha, 2\alpha, 3\alpha, \ldots$ are uniformly distributed modulo 1.

This means that for any interval $I$ of size $s$ (say), the fraction of the first $N$ integers $n$ for which $\{n\alpha\}$ lies in the interval satisfies

$$\lim_{N \to \infty} \frac{\#\{\,n \le N : \{n\alpha\} \in I\,\}}{N} = s.$$

This proves the third fact. The fraction of Fibonacci numbers (or of powers of a number that is not a power of 10) that start with a prefix $p$ is exactly $\log\frac{p+1}{p}$, where the log is to base 10.
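
This density claim is easy to test empirically (a quick sketch; $N$ chosen arbitrarily): among the first couple of million powers of 2, the observed fraction starting with 2012 should be close to $\log_{10}\frac{2013}{2012} \approx 0.000216$.

```python
import math

# 2^n starts with 2012 iff {n log10(2)} lies in [log10(2.012), log10(2.013)).
N = 2_000_000
lo, hi = math.log10(2.012), math.log10(2.013)
count = sum(1 for n in range(1, N + 1) if lo <= (n * math.log10(2)) % 1 < hi)
print(count / N, math.log10(2013 / 2012))
```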

That much is standard. And non-constructive. We are assured of the existence of such numbers, but how do we actually find one?

The answer (or one answer), as it so often does in these problems, involves continued fractions. Here is one method, *[to be continued when I wake up :p]*

**Edit**: Continued here.

## (1+ix/n)^n

[Posting some images here for possible future reuse.]

A non-rigorous argument: when $n$ is large enough so that $x/n$ is small, $1 + \frac{ix}{n}$ is roughly (hand-waving) the point on the unit circle at arc length (and hence angle) $\frac{x}{n}$:

So multiplication by $1 + \frac{ix}{n}$ roughly corresponds to rotation by angle $\frac{x}{n}$. Multiplication by $\left(1 + \frac{ix}{n}\right)^n$, which is multiplication by $1 + \frac{ix}{n}$ done $n$ times, roughly corresponds to rotation by angle $x$. As $n \to \infty$, “roughly” becomes exact.

Animation for $x = \pi$:

Image generated from Python-generated SVG files; code available if anyone wants.

In particular, once one accepts the fact/definition that $e^{ix} = \lim_{n\to\infty}\left(1 + \frac{ix}{n}\right)^n$ (for instance, show that the function $f(x) = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n$ satisfies $f' = f$), it is true that $e^{i\pi}$ is a rotation by π, that is, $e^{i\pi} = -1$.
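
The limit is also easy to watch numerically (a throwaway sketch using Python's built-in complex numbers):

```python
import math

# (1 + iπ/n)^n should approach e^{iπ} = -1 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    z = (1 + 1j * math.pi / n) ** n
    print(n, z)
```

The imaginary part and the error in the real part both shrink roughly like $1/n$, matching the hand-waving above.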

## Serieshelpmate in 19

Here’s a brilliant problem.

Consider the following chess position.

Black is to make 19 consecutive moves, after which White checkmates Black in one move. Black may not move into check, and may not check White (except possibly on his last move). Black and White are cooperating to achieve the aim of checkmate. (In chess problem parlance, this problem is called a serieshelpmate in 19.) How many different solutions are there?

This problem is due to Kauko Väisänen, and appears in A. Puusa, *Queue Problems*, Finnish Chess Problem Society, Helsinki, 1992 (Problem 2).

Hint: the above is quoted from Richard Stanley’s *Enumerative Combinatorics*.

## How does Tupper’s self-referential formula work?

*[I write this post with a certain degree of embarrassment, because in the end it turns out (1) to be more simple than I anticipated, and (2) already done before, as I could have found if I had internet access when I did this. :-)]*

The so-called “Tupper’s self-referential formula” is the following, due to Jeff Tupper.

Graph the set of all points $(x, y)$ such that

$$\frac{1}{2} < \left\lfloor \mathrm{mod}\!\left(\left\lfloor \frac{y}{17} \right\rfloor 2^{-17\lfloor x \rfloor - \mathrm{mod}(\lfloor y \rfloor,\, 17)},\ 2\right) \right\rfloor$$

in the region

$$0 \le x < 106, \qquad N \le y < N + 17,$$

where N is the following 544-digit integer:

48584506361897134235820959624942020445814005879832445494830930850619

34704708809928450644769865524364849997247024915119110411605739177407

85691975432657185544205721044573588368182982375413963433822519945219

16512843483329051311931999535024137587652392648746133949068701305622

95813219481113685339535565290850023875092856892694555974281546386510

73004910672305893358605254409666435126534936364395712556569593681518

43348576052669401612512669514215505395545191537854575257565907405401

57929001765967965480064427829131488548259914721248506352686630476300

The result is the following graph:

Whoa. How does this work?

At first sight this is rather too incredible for words.

But after a few moments we can begin to guess what is going on, and see that—while clever—this is perhaps not so extraordinary after all. So let us calmly try to reverse-engineer this feat.
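
To spoil the reverse-engineering slightly: the inequality just tests whether bit number $17\lfloor x \rfloor + (\lfloor y \rfloor \bmod 17)$ of $\lfloor y/17 \rfloor$ is 1, so (assuming, as in Tupper's construction, that $N$ is a multiple of 17) the graph simply reads the image out of the binary digits of $N/17$. A minimal sketch (the helper names and the exact left/right, top/bottom orientation conventions are mine):

```python
def tupper_pixel(x, y):
    """True iff (x, y) satisfies the inequality: the power of 2 shifts
    bit number 17*x + (y mod 17) of floor(y/17) into the units place."""
    return ((y // 17) >> (17 * x + y % 17)) % 2 == 1

def plot(N):
    """Print the 106x17 bitmap for the strip N <= y < N + 17.
    Which end of each axis counts as 'up' or 'left' is a convention;
    flip the ranges to taste."""
    for r in range(16, -1, -1):
        print(''.join('#' if tupper_pixel(x, N + r) else '.'
                      for x in range(105, -1, -1)))
```

So the formula is a "universal" plotter: any 106×17 bitmap appears somewhere, at the $N$ that is 17 times the integer whose binary digits encode that bitmap.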

## “Your monkey did not jump high enough”

Yesterday, in Futility Closet there was a post:

In Longfellow’s novel *Kavanagh*, Mr. Churchill reads a word problem to his wife:

“In a lake the bud of a water-lily was observed, one span above the water, and when moved by the gentle breeze, it sunk in the water at two cubits’ distance. Required the depth of the water.”

“That is charming, but must be very difficult,” she says. “I could not answer it.”

Is it? If a span is 9 inches and a cubit is 18 inches, how deep is the water?

The problem is simple enough: if the depth of the water is x inches so that the lotus from bottom to tip is x+9 inches, then x^{2}+36^{2}=(x+9)^{2}, which means x=(36^{2}-9^{2})/18=135/2=67.5.

More interestingly, as I accidentally recognised (I don’t know how), it is from the Sanskrit mathematics text *Lilavati* (and also found in the *Bījagaṇita*) of Bhaskaracharya (Bhaskara II). That entire chapter of *Kavanagh* is essentially quoting the Lilavati (*Kavanagh* is written in a somewhat embarrassing tone that perhaps explains why it’s so obscure :p); it’s included later below the horizontal line in this post.

Bhaskaracharya, believed to have lived in the 12th century, is considered the last great Indian mathematician, outside of the Kerala school. Like most Sanskrit texts, the *Līlāvati* is written in verse, so as to be easier to memorise. Unlike many Sanskrit technical works (or for that matter technical works in any language), however, Bhāskara’s works are not written in the typical dry style, and can veer quite poetic at times. His description of the seasons in one of his astronomical works is one of the few true instances of poetry in the Sanskrit astronomical/mathematical corpus. This particular problem, it happens, is written in the beautiful *mandākrānta* metre: (If it helps: mandakranta is the metre of the Meghadūta, of “शान्ताकारं भुजगशयनं…”, of “नास्था धर्मे न वसुनिचये…”, etc., and you can listen to a recitation in the Marathi tradition by Ashwini Deo.)

चक्रक्रौञ्चाकुलितसलिले क्वापि दृष्टं तडागे
तोयादूर्ध्वं कमलकलिकाग्रं वितस्तिप्रमाणम्
मन्दं मन्दं चलितमनिलेनाऽऽहतं हस्तयुग्मे
तस्मिन्मग्नं गणक कथय क्षिप्रमम्बुप्रमाणम्

cakra-krauñcākulita-salile kvāpi dṛṣṭaṃ taḍāge

toyād ūrdhvaṃ kamala-kalikāgraṃ vitasti-pramāṇam

mandaṃ mandaṃ calitam anilenāhataṃ hasta-yugme

tasmin magnaṃ gaṇaka kathaya kṣipram ambu-pramāṇam

In a certain lake swarming with geese and cranes,

the tip of a bud of lotus was seen one span above the water.

Forced by the wind, it gradually moved, and was submerged at a distance of two cubits.

O mathematician, tell quickly the depth of the water.

Well, that’s my translation, close to Longfellow’s quoted translation by Taylor and to Colebrooke’s better translation, but I may be wrong, so details for anyone who cares to improve it:

In a certain [kvāpi] pool [taḍāge] whose water [salile] was swarming [ākulita] with ruddy geese [cakra] and curlews [krauñcā],

above the water [toyād ūrdhvaṃ] a lotus-bud-tip [kamala-kalikāgraṃ] at a distance of one span [vitasti-pramāṇam] was seen [dṛṣṭaṃ].

Slowly slowly [mandaṃ mandaṃ] by the wind [anilena] moved [calitam] and forced [āhataṃ],

at a distance of two cubits [hasta-yugme] it got submerged [magnaṃ] in the water [tasmin].

O mathematician [gaṇaka], say [kathaya] quickly [kṣipram] the depth of the water [ambu-pramāṇam].

The structure of the book may be worth remarking on: the general formula for exactly this problem is given first (in more technical terms), and then this problem is given as an example!

Glancing through Longfellow, one finds he’s also written a tiny poem called King Trisanku:

Viswamitra the Magician,
By his spells and incantations,
Up to Indra’s realms elysian
Raised Trisanku, king of nations.

Indra and the gods offended
Hurled him downward, and descending
In the air he hung suspended,
With these equal powers contending.

Thus by aspirations lifted,
By misgivings downward driven,
Human hearts are tossed and drifted
Midway between earth and heaven.

Ho hum (1845 America).

The chapter of *Kavanagh* containing this poem is below.

Read the rest of this entry »

## Not all best rational approximations are the convergents of the continued fraction!

[Eventually I may edit this post to add more background on continued fractions and why they are so wonderful at approximations (and wonderful generally). For now I just want to quickly clarify something which surprisingly many popular expositions of continued fractions mislead about, by leaving it out.]

Any real number can be written as a continued fraction. For instance, the square root of 3 is [1; 1, 2, 1, 2, 1, 2, …],

written as √3 = 1 + 1/(1 + 1/(2 + 1/(1 + 1/(2 + ⋯)))),

and π is [3; 7, 15, 1, 292, …],

i.e. π = 3 + 1/(7 + 1/(15 + 1/(1 + 1/(292 + ⋯)))) (more terms below),

and *e* is [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, …] (proved here).

Rational numbers have finite continued fractions, and irrational numbers have infinite continued fractions, of which only quadratic irrationals (like √3 above) have periodic continued fractions. Non-quadratic irrationals may still have some recognisable patterns in their representation, like *e* above, or not, like π.

The numbers you get by truncating the continued fraction — taking only the first few terms — are called its *convergents*. The continued fraction for π is [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2…] (sequence A001203 in OEIS), and the sequence of convergents is 3/1, 22/7, 333/106, 355/113, 103993/33102… (A002485/A002486):

Each of these convergents is a *best rational approximation*, in the following sense.

**Definition**: A fraction p/q is a best rational approximation of a real number x if p/q is closer to x than any other fraction with the same or smaller denominator. That is, for any other fraction p′/q′ with p′/q′ ≠ p/q and q′ ≤ q, we have |x − p/q| < |x − p′/q′|.

Thus among all fractions with denominator at most 7, the fraction 22/7 is closest to π. Among all fractions with denominator at most 113, the fraction 355/113 is closest to π. And so on.

Every convergent is a best rational approximation, **but these aren’t all of the best rational approximations!** The sequence of best approximations of π is actually **3/1**, 13/4, 16/5, 19/6, **22/7**, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99, **333/106**, **355/113**, 52163/16604… (A063674/A063673), where only the bold terms are convergents. (Aside: Notice how good 355/113 is: it is the best among all fractions with denominator at most 16603. This is because of the large term 292 in the continued fraction of π.) The others are *semi-convergents* (*intermediate fractions*), as explained below.

In general, for any real number, the convergents of its continued fraction are always best rational approximations, but they aren’t *all* of the best rational approximations. To get all of them, you also have to take the semi-convergents: fractions of the form (n·p_k + p_{k−1})/(n·q_k + q_{k−1}) for some integer n ≥ 1, where p_{k−1}/q_{k−1} and p_k/q_k are consecutive convergents. Specifically, taking n = a_{k+1} — the next term in the continued fraction expansion — gives the next convergent p_{k+1}/q_{k+1}, and the ones to take are precisely those for n from a_{k+1}/2 (if the resulting fraction is closer to x than the previous convergent p_k/q_k, else from a_{k+1}/2 + 1, rounding down in both cases) to a_{k+1}. This is also mentioned on Wikipedia.

Thus for π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2…], the best rational approximations are:

```
3/1 = [3]
13/4 = [3; 4]
16/5 = [3; 5]
19/6 = [3; 6]
22/7 = [3; 7]
179/57 = [3; 7, 8]
201/64 = [3; 7, 9]
223/71 = [3; 7, 10]
245/78 = [3; 7, 11]
267/85 = [3; 7, 12]
289/92 = [3; 7, 13]
311/99 = [3; 7, 14]
333/106 = [3; 7, 15]
355/113 = [3; 7, 15, 1]
52163/16604 = [3; 7, 15, 1, 146]
52518/16717 = [3; 7, 15, 1, 147]
… (terms from [3; 7, 15, 1, 148] to [3; 7, 15, 1, 291]) …
103993/33102 = [3; 7, 15, 1, 292]
104348/33215 = [3; 7, 15, 1, 292, 1]
...
```

The above is the natural meaning of “best rational approximation”. This is also known as “best rational approximation of the first kind”. There is a different definition of “best rational approximation” under which the convergents *are* precisely all the best rational approximations: basically, instead of considering the distance |x − p/q|, we consider |qx − p|. That is: A fraction p/q is a **best approximation of the second kind** of a real number x if for every fraction p′/q′ with p′/q′ ≠ p/q and q′ ≤ q, we have |q′x − p′| > |qx − p|.

This measure turns out to have a nicer theory, but to omit specifying that this non-obvious definition is the one being used is misleading.

## Program

For illustration, here’s a C program (why it’s written in C is not important right now) that, given a positive real number, generates its continued fraction, its convergents, and the sequence of best rational approximations. The function `find_cf` finds the continued fraction (putting the terms in a[] and the convergents in p[] and q[] — excuse the global variables), and the function `all_best` prints all the best rational approximations.

```
#include <math.h>
#include <stdio.h>
#include <assert.h>

// number of terms in continued fraction.
// 15 is the max without precision errors for M_PI
#define MAX 15
#define eps 1e-9

long p[MAX], q[MAX], a[MAX], len;

void find_cf(double x) {
  int i;
  //The first two convergents are 0/1 and 1/0
  p[0] = 0; q[0] = 1;
  p[1] = 1; q[1] = 0;
  //The rest of the convergents (and continued fraction)
  for(i=2; i<MAX; ++i) {
    a[i] = lrint(floor(x));
    p[i] = a[i]*p[i-1] + p[i-2];
    q[i] = a[i]*q[i-1] + q[i-2];
    printf("%ld: %ld/%ld\n", a[i], p[i], q[i]);
    len = i;
    if(fabs(x-a[i])<eps) return;
    x = 1.0/(x - a[i]);
  }
}

void all_best(double x) {
  find_cf(x);
  printf("\n");
  int i, n;
  long cp, cq;
  for(i=2; i<len; ++i) {
    //Test n = a[i+1]/2 separately. Enough to test when a[i+1] is even, actually.
    n = a[i+1]/2;
    cp = n*p[i]+p[i-1];
    cq = n*q[i]+q[i-1];
    if(fabs(x-(double)cp/cq) < fabs(x-(double)p[i]/q[i]))
      printf("%ld/%ld,", cp, cq);
    //And print all the rest, no need to test
    for(n = (a[i+1]+2)/2; n<=a[i+1]; ++n) {
      printf("%ld/%ld,", n*p[i]+p[i-1], n*q[i]+q[i-1]);
    }
  }
}

int main(int argc, char **argv) {
  double x;
  if(argc==1) { x = M_PI; }
  else { sscanf(argv[1], "%lf", &x); }
  assert(x>0);
  printf("%.15lf\n\n", x);
  all_best(x);
  printf("\n");
  return 0;
}
```

## Examples

Here’s the output of this program, in about 0.003 seconds (i.e., it’s truly better than looping through all possible denominators!), line-wrapped for readability:

```
% ./a.out
3.141592653589793
3: 3/1
7: 22/7
15: 333/106
1: 355/113
292: 103993/33102
1: 104348/33215
1: 208341/66317
1: 312689/99532
2: 833719/265381
1: 1146408/364913
3: 4272943/1360120
1: 5419351/1725033
14: 80143857/25510582
13/4, 16/5, 19/6, 22/7, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99,
333/106, 355/113, 52163/16604, 52518/16717, 52873/16830, 53228/16943, 53583/17056,
53938/17169, 54293/17282, 54648/17395, 55003/17508, 55358/17621, 55713/17734,
56068/17847, 56423/17960, 56778/18073, 57133/18186, 57488/18299, 57843/18412,
58198/18525, 58553/18638, 58908/18751, 59263/18864, 59618/18977, 59973/19090,
60328/19203, 60683/19316, 61038/19429, 61393/19542, 61748/19655, 62103/19768,
62458/19881, 62813/19994, 63168/20107, 63523/20220, 63878/20333, 64233/20446,
64588/20559, 64943/20672, 65298/20785, 65653/20898, 66008/21011, 66363/21124,
66718/21237, 67073/21350, 67428/21463, 67783/21576, 68138/21689, 68493/21802,
68848/21915, 69203/22028, 69558/22141, 69913/22254, 70268/22367, 70623/22480,
70978/22593, 71333/22706, 71688/22819, 72043/22932, 72398/23045, 72753/23158,
73108/23271, 73463/23384, 73818/23497, 74173/23610, 74528/23723, 74883/23836,
75238/23949, 75593/24062, 75948/24175, 76303/24288, 76658/24401, 77013/24514,
77368/24627, 77723/24740, 78078/24853, 78433/24966, 78788/25079, 79143/25192,
79498/25305, 79853/25418, 80208/25531, 80563/25644, 80918/25757, 81273/25870,
81628/25983, 81983/26096, 82338/26209, 82693/26322, 83048/26435, 83403/26548,
83758/26661, 84113/26774, 84468/26887, 84823/27000, 85178/27113, 85533/27226,
85888/27339, 86243/27452, 86598/27565, 86953/27678, 87308/27791, 87663/27904,
88018/28017, 88373/28130, 88728/28243, 89083/28356, 89438/28469, 89793/28582,
90148/28695, 90503/28808, 90858/28921, 91213/29034, 91568/29147, 91923/29260,
92278/29373, 92633/29486, 92988/29599, 93343/29712, 93698/29825, 94053/29938,
94408/30051, 94763/30164, 95118/30277, 95473/30390, 95828/30503, 96183/30616,
96538/30729, 96893/30842, 97248/30955, 97603/31068, 97958/31181, 98313/31294,
98668/31407, 99023/31520, 99378/31633, 99733/31746, 100088/31859, 100443/31972,
100798/32085, 101153/32198, 101508/32311, 101863/32424, 102218/32537, 102573/32650,
102928/32763, 103283/32876, 103638/32989, 103993/33102, 104348/33215, 208341/66317,
312689/99532, 833719/265381, 1146408/364913, 3126535/995207,
4272943/1360120, 5419351/1725033, 42208400/13435351, 47627751/15160384,
53047102/16885417, 58466453/18610450, 63885804/20335483, 69305155/22060516,
74724506/23785549, 80143857/25510582,
```

All these terms are correct, though if you increase MAX you start getting errors because of precision. I’m myself impressed with how many terms you get with only 13 convergents. (As you can see, there’s a small bug where it sometimes doesn’t print the very first “n/1” approximation, or prints it incorrectly — I leave it for you to fix!)

You can try with √2, whose continued fraction is [1; 2, 2, 2, 2…]:

```
% ./a.out 1.41421356237309504880
1.414213562373095
1: 1/1
2: 3/2
2: 7/5
2: 17/12
2: 41/29
2: 99/70
2: 239/169
2: 577/408
2: 1393/985
2: 3363/2378
2: 8119/5741
2: 19601/13860
2: 47321/33461
3/2, 4/3, 7/5, 17/12, 24/17, 41/29, 99/70, 140/99, 239/169, 577/408, 816/577, 1393/985, 3363/2378, 4756/3363, 8119/5741, 19601/13860, 47321/33461,
```

Or the golden ratio φ = (1+√5)/2 whose continued fraction is [1; 1, 1, 1, …]:

```
% ./a.out 1.61803398874989484820
1.618033988749895
1: 1/1
1: 2/1
1: 3/2
1: 5/3
1: 8/5
1: 13/8
1: 21/13
1: 34/21
1: 55/34
1: 89/55
1: 144/89
1: 233/144
1: 377/233
2/1, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, 55/34, 89/55, 144/89, 233/144, 377/233,
```

(See the Fibonacci numbers? Here the convergents are all of the best approximations.)

Or with rational numbers like 4/3 = [1; 3]:

```
% ./a.out 1.33333333333333333333
1.333333333333333
1: 1/1
3: 4/3
3/2, 4/3,
```

or 14/11 = [1; 3, 1, 2]:

```
% ./a.out 1.27272727272727272727
1.272727272727273
1: 1/1
3: 4/3
1: 5/4
2: 14/11
3/2, 4/3, 5/4, 9/7, 14/11,
```

[**2011-11-04**: Fixed the bug and the description.]

## Shor’s algorithm

I’d learnt Shor’s algorithm in a course a few years ago, but (as usual) forgot all about it.

Yesterday I reread it from the *Algorithms* textbook of Dasgupta, Papadimitriou, and Vazirani, only because someone claimed that their Chapter 10 on Quantum algorithms was incomprehensible, so here is a very vague sketch of it before I forget it again. (For a popular account, which Peter Shor called “the best job of explaining quantum computing to the man on the street that I’ve seen”, see Scott Aaronson’s post.)

#### 1. Quantum

The quantum magic in Shor’s algorithm, the part that uses something unique to quantum computation, is simply the following fact: *a quantum computer can perform the quantum Fourier transform in O(log² N) time.*

That is, given a “superposition” Σ_j α_j |j⟩, in O(log² N) steps the quantum fourier-magic-thing produces the superposition Σ_k β_k |k⟩ with β_k = (1/√N) Σ_j ω^{jk} α_j, where ω = e^{2πi/N}. Of course, it just produces this superposition and does not output it; if we try to measure this we’d get only one of the states |k⟩, specifically with probability |β_k|².

(I will not write here how this QFT works. It uses the Hadamard gate etc., and roughly, what would be two recursive calls in the usual FFT can be done with just one operation here.)

All the other ideas are classical (i.e. non-quantum).

#### 2. Fourier transform of periodic vectors

Suppose k is some factor of N. It can be proved with some straightforward calculation that if α (viewed as a vector/sequence) is periodic with period k, with the only nonzero coefficients being α_0, α_k, α_{2k}, …, then its Fourier transform β is also periodic, with period N/k — the only nonzero coefficients are β_0, β_{N/k}, β_{2N/k}, ….

The point is that if we “measure” this, then we get one of the multiples of N/k at random, so by taking a few measurements, we can figure out N/k and therefore k.

#### 3. Factoring reduces to order-finding

We can see from number theory (use the Chinese remainder theorem) that if N is composite (and odd, not a prime power, etc.), then with probability at least 1/2, a random *x* modulo N has an even order *r* (i.e. the smallest positive integer such that x^r ≡ 1 (mod N)) such that x^{r/2} is a nontrivial square root of 1 mod N. So if we can find the order r, then we have (x^{r/2} − 1)(x^{r/2} + 1) ≡ 0 (mod N) with neither factor being 0 mod N, so gcd(x^{r/2} − 1, N) gives a nontrivial factor of N. So all we need to be able to do is find the order.

Modulo N, pick any x and consider the function f(j) = x^j mod N. If r is the order of x mod N, then the function f is periodic with period r. The algorithm runs as follows:

- First, make a superposition Σ_j |j, 0⟩, normalised. (Think of the |j, 0⟩ as two quantum “registers”.) (This is made by starting with just |0, 0⟩ and taking its Fourier transform!)
- Now “compute” f (for some random x) on it. This gives the superposition Σ_j |j, f(j)⟩.
- Then measure the second register. This gives a random value f(j₀) from the second register, but more importantly, it makes the *first register* be a superposition of only those values j that are compatible with this value — therefore, periodic with period r.
- Now do the Fourier transform on the first register and measure it, so we get a random multiple of M/r (where M is the size of the first register’s state space).

Doing all this the right number of times with the right constants does it. I’ve cheated in all this, because obviously the *N* we’re trying to factor is not a power of 2, but that can be taken care of.

I think Shor’s big idea (over Simon’s algorithm for period-finding) was in coming up with this function *f* whose period would give the factors of N. But I may be wrong.

## Equivalent forms of the Riemann hypothesis

- Let H_n be the *n*th harmonic number, i.e. H_n = 1 + 1/2 + 1/3 + ⋯ + 1/n. Then, the Riemann hypothesis is true if and only if, for all n ≥ 1, Σ_{d|n} d ≤ H_n + e^{H_n} ln(H_n). The left-hand side, which is the sum of the divisors of n, is also denoted σ(n). See Jeffrey Lagarias, *An Elementary Problem Equivalent to the Riemann Hypothesis* [PDF, arXiv].
- Define the Redheffer matrix A_n to be the n×n 0-1 matrix with entries A(i, j) = 1 if j = 1 or i divides j (and 0 otherwise). Then the Riemann hypothesis is true if and only if det(A_n) = O(n^{1/2+ε}) for all ε > 0.

For more, see Equivalences to the Riemann hypothesis (by J. Brian Conrey and David W. Farmer), and Consequences of the Riemann hypothesis (MathOverflow)

For fun: claimed [dis/]proofs.

Quote found via rjlipton:

The Riemann Hypothesis is the most basic connection between addition and multiplication that there is, so I think of it in the simplest terms as something really basic that we don’t understand about the link between addition and multiplication.

—Brian Conrey

## Convex, continuous, Jensen’s

On the principle that things I’ve forgotten at least thrice are worth remembering…

A convex function (need a mnemonic?) is such that

f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y)

for any λ in [0,1].

**Midpoint convex (almost) implies convex**

A natural question is whether we need all λ in [0,1]; what about just ½? Functions which satisfy

f((x+y)/2) ≤ (f(x) + f(y))/2

are called midpoint convex.

It turns out that when things are “nice” as in most frequently encountered cases, a midpoint convex function is convex. (A counterexample is this crazy non-measurable nowhere-continuous bounded-in-no-interval function: since R is a vector space over Q, there is a basis (that includes π, say), and any real number x has a unique representation as a finite rational linear combination of basis elements. Define f(x) to be the coefficient of π in the representation of x. This function is in fact linear over Q.)

For midpoint convex to mean convex, any of the following conditions will do:

- f is continuous [Proof: prove the inequality for λ with denominators that are powers of 2, then use denseness in R.]
- f is continuous at some point (any one point will do) [Proof: see next section]
- f is (Lebesgue) measurable
- f is bounded on some measurable set with positive measure

**Convex functions are continuous**

In fact, a stronger theorem is true: If a midpoint convex function is discontinuous at a single point, then it is unbounded on every interval and (hence) discontinuous *everywhere*.

Proof: Suppose wlog that f is discontinuous at 0, that f(0) = 0, and that some sequence x_n → 0 has f(x_n) ≥ m for some positive m. Since f(0) = 0, midpoint convexity gives f(x) ≤ (f(2x) + f(0))/2, i.e. f(2x) ≥ 2f(x); hence f(2^k x_n) ≥ 2^k m, so f is unbounded near 0. Then use this to show it is unbounded everywhere.

Note that convex functions *are* bounded on every interval, so by above theorem they are continuous.

**Convex functions are differentiable almost everywhere**

In fact a convex function has left and right derivatives at all points, these are monotonically increasing, and there are only countably many points where they don’t coincide.

All the above is from pages 10 to 16 of this book I had found through googling; if anyone’s reading this, please tell me if you know better proofs.

**A proof of Jensen’s inequality**

Here, Jensen’s inequality is the statement that for a convex function f and a random variable X,

f(E[X]) ≤ E[f(X)].

An equivalent definition of a convex function is that at any point we can find a line through the point that lies entirely below the function. (Why?) In other words, for any point x₀, there is a “slope” m such that

f(x) ≥ f(x₀) + m(x − x₀)

for all x.

With this definition, Jensen’s inequality has an absurdly short proof: take x₀ to be the mean E[X], so that f(X) ≥ f(E[X]) + m(X − E[X]), then take expectations of both sides.

## Identify

What’s this?

Seeing more should help:

If that’s too easy, how about this?

Both are from *Rekhāgaṇita*, which is a c. 1720s translation by Jagannatha of Nasir al-Din al-Tusi’s 13th-century Arabic translation of Euclid’s *Elements*. It seems to be a straightforward translation of the Arabic — it even uses, to label vertices etc., letters in the Arabic order अ ब ज द ह व झ…. The text retains most of the structure and proposition numbers of Euclid, but in fact the Arabic world has considerably elaborated on Euclid. For instance, for the famous first example above, it gives sixteen additional proofs/demonstrations, which are not in the Greek texts.

The first volume (Books I to VI) is available on Google Books and the second volume (Books VII to XV) through wilbourhall.org (25 MB PDF).

Notes on the second: some technical vocabulary — a प्रथमाङ्कः is a prime number, and a रूप is a unit (one). The rest of the vocabulary, like “निःशेषं करोति” meaning “divides (without remainder)”, is more or less clear, and also the fact that as in Euclid, numbers are being conceived of as lengths (so दझ and झद mean the same).

It does sound cooler to say “इदमशुद्धम्” than “But this is a contradiction”. :-)

## Some incredibly beautiful things

First, a totally completely absolutely incredible fact about convergents of continued fractions: the Khinchin-Levy theorem, seen here: For almost all real numbers x, the denominators q_n of the continued-fraction convergents of x satisfy

lim_{n→∞} q_n^{1/n} = e^{π²/(12 ln 2)}.

Why should such a thing be true?! Why should the limit even exist, why should it be the same for almost all real numbers, and where do π, ln 2, and 12 come from?!!

Intrigued by these questions, and embarrassed by my near-complete ignorance of continued fractions (each time I had encountered them before, I had been intimidated by the notation and skipped pages), I started reading a couple of books on continued fractions (including Khinchin’s)… other things intervened and I had to give up for now, but I hope to return to their study soon. But from what I’ve already seen, continued fractions are awesome, and it is a real shame they seem to be out of fashion these days and not more widely taught and known. Neither of these books proves the above theorem actually, but an editor’s footnote in Khinchin’s book gives the reference: it was proved by Khinchin in 1935 that the limit exists and is the same for almost all real numbers, and Lévy found the constant soon after (published 1937).

I suspect (based on nothing at all) that it bears some relation to, among other things, (some generalization of) the following well-known fact, itself quite surprising (probably the most surprising fact I know a proof of):

1 + 1/4 + 1/9 + 1/16 + ⋯ = π²/6.

This fact is known as the Basel problem, was proved by (who else?) Euler, and fourteen proofs of it are collected here. But let this not distract you from reading about continued fractions!

Second, pointed out at John Baez’s This Week’s Finds in Mathematical Physics (Week 230), and from this paper by Sondow, are these (quoting Prof. Baez directly, just typeset in LaTeX):

eerily similar formulas for some of our favorite math constants. First, one for e:

Then, one for π/2:

Then one for e^γ, where γ is Euler’s constant:

He also points out Wallis’ product for π/2 and Pippenger’s for e:

What does it all mean? I haven’t a clue! Another mystery thrown down to us by the math gods, like a bone from on high… we can merely choose to chew on it or not, as we wish.

Indeed!

Third, my favourite way to prove the Möbius inversion formula. Well, it’s actually the *only* way I know, besides induction, but I think it’s *beautiful*. (Ignore if you already know what Dirichlet convolutions are.) It’s from Herbert Wilf’s generatingfunctionology, one of my favourite books.

Let us consider a kind of generating function known as a Dirichlet generating function. (“A generating function is a clothesline on which we hang up a sequence of numbers for display.”) For a sequence (a_n), the Dirichlet generating function is defined as the (formal) series (in s)

F(s) = Σ_{n≥1} a_n/n^s.

(A formal series means that we can ignore issues of convergence etc.) Now if we have the generating functions F(s) and G(s) for two sequences (a_n) and (b_n), what is the product F(s)G(s)? Well, it’s easy to see that the coefficient of 1/n^s in F(s)G(s), which comes exactly from the terms of the form (a_d/d^s)(b_e/e^s) with de = n, is Σ_{de=n} a_d b_e, or equivalently Σ_{d|n} a_d b_{n/d}. So the product F(s)G(s) is the generating function for a special kind of convolution, the sequence (c_n) defined as

c_n = Σ_{d|n} a_d b_{n/d}.

Let us call this fact “(*)” and put it aside for now; we will use it shortly. Instead, let us look at the generating function for a very special sequence, the sequence of all 1s. It is, by definition, Σ_{n≥1} 1/n^s. While we could give it a name of our own, it goes by the more familiar name ζ(s) — it is (when viewed as a function instead of a formal series) the Riemann zeta function. But we also know (this is the Euler product formula, and follows simply from the unique factorisation of integers):

ζ(s) = Π_p (1 − p^{−s})^{−1}, the product being over all primes p.

This means that

1/ζ(s) = Π_p (1 − p^{−s}).

Why did I write Π_p (1 − p^{−s}) on the right? Is this a generating function of anything? Why, yes! By expanding the product, you can see that it is the generating function for the sequence μ(n), defined as

μ(n) = (−1)^k if n is a product of k distinct primes, and 0 otherwise (i.e. when n is divisible by a square).

(This function “mu” is known as the “Möbius function”.) With this background, the Möbius inversion formula falls out entirely naturally: if F(s) is the generating function of a sequence f(n), and the function g is defined as

g(n) = Σ_{d|n} f(d),

then its generating function G(s) is, by (*), simply

G(s) = F(s) ζ(s),

which means that

F(s) = G(s) · (1/ζ(s)),

or in other words, again by (*),

f(n) = Σ_{d|n} μ(d) g(n/d),

which is precisely the Möbius inversion formula. How cool is that?

## On “trivial” in mathematics

One aspect of mathematicians’ vocabulary that non-mathematicians often find non-intuitive is the word “trivial”. Mathematicians seem to call a great many things “trivial”, most of which are anything but. Here’s a joke, pointed out to me by Mohan:

Two mathematicians are discussing a theorem. The first mathematician says that the theorem is “trivial”. In response to the other’s request for an explanation, he then proceeds with two hours of exposition. At the end of the explanation, the second mathematician agrees that the theorem is trivial.

Like many jokes, this is not far from the truth. This tendency has led others to say, for example, that

In mathematics, there are only two kinds of proofs: Trivial ones, and undiscovered ones.

Or as Feynman liked to say, “mathematicians can prove only trivial theorems, because every theorem that’s proved is trivial”. (The mathematicians to whom Feynman said this do not seem to have liked the statement.)

A little examination, however, shows that all this is not far from reality, and indeed not far from the ideal of mathematics. Consider these excerpts from Gian-Carlo Rota‘s wonderful *Indiscrete Thoughts*:

“eventually every mathematical problem is proved trivial. The quest for ultimate triviality is characteristic of the mathematical enterprise.” (p.93)

“Every mathematical theorem is eventually proved trivial. The mathematician’s ideal of truth is triviality, and the community of mathematicians will not cease its beaver-like work on a newly discovered result until it has shown to everyone’s satisfaction that all difficulties in the early proofs were spurious, and only an analytic triviality is to be found at the end of the road.” (p. 118, in *The Phenomenology of Mathematical Truth*)

Are there definitive proofs?

It is an article of faith among mathematicians that after a new theorem is discovered, other simpler proofs of it will be given until a definitive one is found. A cursory inspection of the history of mathematics seems to confirm the mathematician’s faith. The first proof of a great many theorems is needlessly complicated. “Nobody blames a mathematician if the first proof of a new theorem is clumsy”, said Paul Erdős. It takes a long time, from a few decades to centuries, before the facts that are hidden in the first proof are *understood*, as mathematicians informally say. This gradual bringing out of the significance of a new discovery takes the appearance of a succession of proofs, each one simpler than the preceding. New and simpler versions of a theorem will stop appearing when the facts are finally understood. (p. 146, in *The Phenomenology of Mathematical Proof*, here/here)

For more context, the section titled *“Truth and Triviality”* is especially worth reading, where he gives the example of proofs of the prime number theorem, starting with Hadamard and de la Vallée Poussin, through Wiener, Erdős and Selberg, and Levinson. Also p. 119, where he feels this is not unique to mathematics:

Any law of physics, when finally ensconced in a proper mathematical setting, turns into a mathematical triviality. The search for a universal law of matter, a search in which physics has been engaged throughout this century, is actually the search for a trivializing principle, for a universal “nothing but”. […] The ideal of all science, not only of mathematics, is to do away with any kind of synthetic *a posteriori* statement and to leave only analytic trivialities in its wake. Science may be defined as the transformation of synthetic facts of nature into analytic statements of reason.

So to correct the earlier quote, there are *three* kinds of proofs from the mathematician’s perspective: those that are trivial, those that have not been discovered, and those niggling ones that are not **yet** trivial.