The Lumber Room

"Consign them to dust and damp by way of preserving them"

Archive for the ‘mathematics’ Category

Colliding balls approximate pi

leave a comment »

Found via G+, a new physical experiment that approximates \pi, like Buffon’s needle problem: The Pi Machine.

Roughly, the amazing discovery of Gregory Galperin is this: when a ball of mass M collides with one of mass m, propelling it towards a wall, the number of collisions (assuming standard physics idealisms) is \lfloor\pi\sqrt{M/m}\rfloor, so by taking M/m = 10^{2n}, we can get the first n+1 digits of \pi. Note that this number of collisions is an entirely deterministic quantity; there's no probability involved!
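This can be checked with a short event-driven simulation. Here is a minimal sketch (the function name and sign conventions are my own; the wall is on the left, and negative velocities point toward it):

```python
def count_collisions(mass_ratio):
    """Count collisions when a heavy ball (mass mass_ratio) is pushed toward
    a light ball (mass 1) sitting between it and a wall.  Every collision
    (ball-ball and ball-wall) is perfectly elastic, and we count both kinds."""
    M, m = float(mass_ratio), 1.0
    V, v = -1.0, 0.0    # heavy ball moves toward the wall; light ball at rest
    count = 0
    while True:
        # 1-D elastic collision between the two balls
        V, v = ((M - m) * V + 2 * m * v) / (M + m), \
               ((m - M) * v + 2 * M * V) / (M + m)
        count += 1
        if v < 0:        # light ball now heads for the wall: bounce off it
            v = -v
            count += 1
        if 0 <= v <= V:  # both moving away, and the light ball can't catch up
            return count
```

For example, `count_collisions(100)` returns 31 and `count_collisions(10**4)` returns 314: the first digits of \pi.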

Here’s a video demonstrating the fact for M/m = 100 (the blue ball is the heavier one):

The NYT post says how this discovery came about:

Dr. Galperin’s approach was also geometric but very different (using an unfolding geodesic), building on prior related insights. Dr. Galperin, who studied under the well-known Russian mathematician Andrei Kolmogorov and had recently written extensively (with Yakov Sinai) on ball collisions, realized just before a talk in 1995 that a plot of the positions of a pair of colliding balls could be used to determine pi. (When he mentioned this insight in the talk, no one in the audience believed him.) This finding was ultimately published as “Playing Pool With Pi” in a 2003 issue of Regular and Chaotic Dynamics.

The paper, Playing Pool With π (The number π from a billiard point of view), is very readable. The post has, despite a “solution” section, essentially no explanation, but the two comments by Dave in the comments section explain it clearly. A reader also sent in a cleaned-up version of that explanation: here, by Benjamin Wearn, who teaches physics at Fieldston School.

Now someone needs to make a simulation / animation graphing the two balls in momentum phase space. :-)

I’d done something a while ago, to illustrate The Orbit of the Moon around the Sun is Convex!, here. Probably need to re-learn all that JavaScript stuff, to make one for this. Leaving this post here as a placeholder.

Or maybe someone has done it already?

Written by S

Mon, 2014-06-23 at 23:03:18 +05:30

Posted in mathematics, unfinished

Prefatory apprehension

leave a comment »

Robert Recorde’s 1557 book is noted for being the first to introduce the equals sign =, and is titled:

The Whetstone of Witte: whiche is the seconde parte of Arithmeteke: containing the extraction of rootes; the cossike practise, with the rule of equation; and the workes of Surde Nombers.

Its title page (see also the full book) contains this verse:

Original spelling

Though many ſtones doe beare greate price,
The whetſtone is for exerſice
As neadefull, and in woorke as ſtraunge:
Dulle thinges and harde it will ſo chaunge,
And make them ſharpe, to right good vſe:
All arteſmen knowe, thei can not chuſe,
But uſe his helpe: yet as men ſee,
Noe ſharpeneſſe ſemeth in it to bee.

The grounde of artes did brede this ſtone:
His vſe is greate, and moare then one.
Here if you lift your wittes to whette,
Moche ſharpeneſſe thereby ſhall you gette.
Dulle wittes hereby doe greately mende,
Sharpe wittes are fined to their fulle ende.
Now proue, and praiſe, as you doe finde,
And to your ſelf be not vnkinde.

Modern spelling

Though many stones do bear great price,
The whetstone is for exercise
As needful, and in work as strange:
Dull things and hard it will so change
And make them sharp, to right good use:
All artsmen know they cannot choose
But use his help; yet as men see,
No sharpness seemeth in it to be.

The ground of arts did breed this stone;
His use is great, and more than one.
Here if you lift your wits to whet,
Much sharpness thereby shall you get.
Dull wits hereby do greatly mend,
Sharp wits are fined to their full end.
Now prove and praise as you do find,
And to yourself be not unkind.

Apparently the full title contains a pun: “the cossike practise” in the title refers to algebra, as the Latin cosa, apparently meaning “a thing”, was used to stand for an unknown and abbreviated to cos; but the Latin word cos itself means a grindstone.

The author again reminds readers not to blame his book, at the end of his preface:

To the curiouſe ſcanner.

If you ought finde, as ſome men maie,
That you can mende, I ſhall you praie,
To take ſome paine ſo grace maie ſende,
This worke to growe to perfecte ende.

But if you mende not that you blame,
I winne the praiſe, and you the ſhame.
Therfore be wiſe, and learne before,
Sith ſlaunder hurtes it ſelf moſte ſore.

Authors are either anxious about how their book will be received, or make sure to be pointedly uncaring.

Sir Arthur Conan Doyle, in a mostly forgettable volume of poetry (Songs of the Road, 1911), begins:

If it were not for the hillocks
   You’d think little of the hills;
The rivers would seem tiny
   If it were not for the rills.
If you never saw the brushwood
   You would under-rate the trees;
And so you see the purpose
   Of such little rhymes as these.

Kālidāsa of course begins his Raghuvaṃśa with a grand disclaimer:

kva sūryaprabhavo vaṃśaḥ kva cālpaviṣayā matiḥ /
titīrṣur dustaram mohād uḍupenāsmi sāgaram // Ragh_1.2 //

mandaḥ kaviyaśaḥ prārthī gamiṣyāmy upahāsyatām /
prāṃśulabhye phale lobhād udbāhur iva vāmanaḥ // Ragh_1.3 //

atha vā kṛtavāgdvāre vaṃśe ‘smin pūrvasūribhiḥ /
maṇau vajrasamutkīrṇe sūtrasyevāsti me gatiḥ // Ragh_1.4 //

But the most nonchalant I’ve seen, thanks to Dr. Ganesh, is this gīti by Śrīkṛṣṇa Brahmatantra Yatīndra of the Parakāla Maṭha, Mysore:

nindatu vā nandatu vā
mandamanīṣā niśamya kṛtim etām
harṣaṃ vā marṣaṃ vā
sarṣapamātram api naiva vindema

Screw you guys. :-)

Written by S

Wed, 2014-05-28 at 23:56:11 +05:30

Posted in history, mathematics

Big O() notation: a couple of sources

with one comment

This post contains, just for future reference, a couple of primary sources relevant to the O (“Big O”) notation:

  1. Some introductory words from Asymptotic Methods in Analysis by de Bruijn
  2. A letter from Donald Knuth on an approach to teaching calculus using this notation.

Read the rest of this entry »

Written by S

Thu, 2014-03-13 at 16:33:20 +05:30

Visualizing product of permutations

with 4 comments

A simple pedagogical trick that may come in handy: represent a permutation \sigma using arrows (curved lines) from k to \sigma(k) for each k. Then, the product of two permutations can be represented by just putting the two corresponding figures (sets of arrows) one below the other, and following the arrows.

Representing permutations and products of permutations.


The figure is from an article called Symmetries by Alain Connes, found via the Wikipedia article on Morley’s trisector theorem (something entirely unrelated to permutations, but the article covers both of them and more).

I’m thinking how one might write a program to actually draw these: if we decide that the “height” of the figure is some h, then each arrow needs to go from some (k, 0) to (\sigma(k), h) (using here the usual screen convention of x coordinate increasing from left to right, and y coordinate increasing from top to bottom). Further, each curve needs to have vertical slope at its two endpoints, so that successive curves can line up smoothly. The constraint on starting point, ending point, and directions at the endpoints defines almost a quadratic Bezier curve, except that here the two directions are parallel. So it’s somewhere between a quadratic and the (usual) cubic Bezier curve, which is given by the start point, end point, and derivatives at the start and end point. (Here we only care about the direction of the derivative; we can pick some arbitrary magnitude to fix the curve: the larger we pick, the more smooth it will look at the ends, at the cost of smoothness in the interior.)

Even knowing the curve, how do we generate an image?
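One low-tech answer: emit SVG text directly, one cubic Bézier per arrow, with each control point placed straight below (resp. above) its endpoint, so that the tangents at both ends are vertical. A sketch, where the function name, the sizes, and the 0-indexed convention are all my own choices, not from Connes's article:

```python
def permutation_svg(sigma, step=40, h=120, stiffness=0.5):
    """Draw the permutation k -> sigma[k] (0-indexed) as curved arrows.

    Each arrow is a cubic Bezier from (k*step, 0) to (sigma[k]*step, h);
    the control points sit vertically below/above the endpoints, so the
    curve leaves and arrives vertically and stacked figures join smoothly.
    """
    c = h * stiffness  # control-point offset: larger = straighter at the ends
    paths = []
    for k, j in enumerate(sigma):
        x0, x1 = k * step, j * step
        paths.append(f'<path d="M {x0} 0 C {x0} {c}, {x1} {h - c}, {x1} {h}" '
                     'fill="none" stroke="black"/>')
    w = (len(sigma) - 1) * step
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="-5 -5 {w + 10} {h + 10}">' + ''.join(paths) + '</svg>')
```

Writing `permutation_svg([2, 0, 1])` to a `.svg` file and opening it in a browser shows the three arrows; the product of two permutations is then just two such figures concatenated vertically.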

Written by S

Thu, 2014-03-06 at 23:15:44 +05:30

The idea of logarithms, and the first appearance of e

with 2 comments

[Incomplete post: about 10% written, about 90% left.]

The notion of the number e, the exponential function e^x, and logarithms \log x are often conceptual stumbling blocks even to someone who has an otherwise solid understanding of middle-school mathematics.

Just what is the number e? How was it first calculated / where did it first turn up? Premature exposure to its numerical value

\displaystyle e \approx 2.718281828459045\dots

only serves to deepen the mysteriousness and to make it seem arbitrary.

Here a historical perspective helps: as is often the case, here too, the first appearance is simpler and better motivated than the accounts in dry textbooks. This is from this account by Matthew P. Wiener (originally posted on USENET somewhere, as quoted by MJD). I’m just going to quote it directly for now, and edit it later:

Napier, who invented logarithms, more or less worked out a table of logarithms to base \frac1e, as follows:

     0  1  2  3   4   5   6    7    8    9    10 ...
     1  2  4  8  16  32  64  128  256  512  1024 ...

The arithmetic progression in the first row is matched by a geometric progression in the second row. If, by any luck, you happen to wish to multiply 16 by 32, which just happen to be in the bottom row, you can look up their “logs” in the first row, add 4+5 to get 9, and then conclude that 16·32=512.

For most practical purposes, this is useless. Napier realized that what one needs to multiply in general is 1+\epsilon for a base—the intermediate values will be much more extensive. For example, with base 1.01, we get:

       0 1.00   1 1.01   2 1.02   3 1.03   4 1.04   5 1.05
       6 1.06   7 1.07   8 1.08   9 1.09  10 1.10  11 1.12
      12 1.13  13 1.14  14 1.15  15 1.16  16 1.17  17 1.18
      18 1.20  19 1.21  20 1.22  21 1.23  22 1.24  23 1.26
      24 1.27  25 1.28  26 1.30  27 1.31  28 1.32  29 1.33
      30 1.35  31 1.36  32 1.37  33 1.39  34 1.40  35 1.42
      50 1.64  51 1.66  52 1.68  53 1.69  54 1.71  55 1.73
      94 2.55  95 2.57  96 2.60  97 2.63  98 2.65  99 2.68
     100 2.70 101 2.73 102 2.76 103 2.79 104 2.81 105 2.84

So if you need to multiply 1.27 by 1.33, say, just look up their logs, in this case, 24 and 29, add them, and get 53, so 1.27·1.33=1.69. For two/three digit arithmetic, the table only needs entries up to 9.99.

Note that e is almost there, as the antilogarithm of 100. The natural logarithm of a number can be read off from the above table, as just [approximately] \frac1{100} the corresponding exponent.

What Napier actually did was work with base .9999999. He spent 20 years computing powers of .9999999 by hand, producing a grand version of the above. That’s it. No deep understanding of anything, no calculus, and e pops up anyway—in Napier’s case, \frac1e was the 10 millionth entry. (To be pedantic, Napier did not actually use decimal points, that being a new fangled notion at the time.)
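Both appearances of e are easy to check numerically (a quick sanity check of the above, not part of Wiener's account):

```python
import math

# Antilogarithm of 100 in the base-1.01 table: (1.01)^100 is nearly e.
print(1.01 ** 100)              # ≈ 2.7048, vs e ≈ 2.7183
# Napier's table: the 10-millionth power of .9999999 is nearly 1/e.
print(0.9999999 ** 10_000_000)  # ≈ 0.3679 ≈ 1/e
# The limits behind these: (1 + 1/n)^n -> e and (1 - 1/n)^n -> 1/e.
for n in (100, 10_000, 1_000_000):
    print(n, (1 + 1 / n) ** n)
print('e =', math.e)
```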

Later, in his historic meeting with Briggs, two changes were made. A switch to a base > 1 was made, so that logarithms would scale in the same direction as the numbers, and the spacing on the logarithm sides was chosen so that \log(10)=1. These two changes were, in effect, just division by -\log_e(10).

In other words, e made its first appearance rather implicitly.

(I had earlier read a book on Napier and come to the same information though a lot less clearly, here.)

I had started writing a series of posts leading up to an understanding of the exponential function e^x (here, here, here), but it seems to have been abandoned. Consider this one a contribution to that series.

Written by S

Wed, 2013-11-27 at 10:52:51 +05:30

Posted in mathematics

The functional equation f(x+y) = f(x)f(y)

with 2 comments

Suppose f: \mathbb{R} \to \mathbb{R} satisfies f(x+y) = f(x) f(y). What can we say about f?

Putting y = 0 gives

\displaystyle f(x) = f(x+0) = f(x)f(0),

which can happen if either f(x) = 0 or f(0) = 1. Note that the function f which is identically zero satisfies the functional equation. If f is not this function, i.e., if f(x) \neq 0 for at least one value of x, then plugging that value of x (say x^*) into the equation gives f(0) = 1. Also, for any x, the equation f(x^*) = f(x +x^* - x) = f(x)f(x^* - x) forces f(x) \neq 0 as well. Further, f(x) = f(x/2 + x/2) = f(x/2)^2 so f(x) > 0 for all x.

Next, putting y = x gives f(2x) = f(x)^2, and by induction f(nx) = f(x)^n. Putting \frac{x}{n} in place of x in this gives f(n\frac{x}{n}) = f(\frac{x}{n})^n which means f(\frac{x}{n}) = f(x)^{\frac1n} (note we’re using f(x) > 0 here). And again, f(\frac{m}{n}x) = f(x)^{m/n}. So f(\frac{m}{n}) = f(1)^{m/n}, which completely defines the function at rational points.

[As f(1) > 0, it can be written as f(1) = e^k for some constant k, which gives f(x) = e^{kx} for rational x.]
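As a quick numeric illustration of the rational case (the constant k below is an arbitrary choice of mine):

```python
import math
from fractions import Fraction

k = 0.7                              # arbitrary: fixes f(1) = e^k > 0

def f(q):
    """The value forced at a rational q by the functional equation: f(1)^q."""
    q = Fraction(q)
    return math.exp(k) ** (q.numerator / q.denominator)

x, y = Fraction(1, 3), Fraction(5, 4)
assert math.isclose(f(x + y), f(x) * f(y))       # f(x+y) = f(x) f(y)
assert math.isclose(f(Fraction(2)), f(1) ** 2)   # consistent with f(nx) = f(x)^n
```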

To extend this function to irrational numbers, we need some further assumptions on f, such as continuity. It turns out that continuity at even a single point is enough (and implies that the function is f(x) = f(1)^x everywhere): note that f(x + m/n) = f(x)f(m/n) = f(x)f(1)^{m/n}. Even being Lebesgue-integrable/measurable will do.

Else, there are discontinuous functions satisfying the functional equation. (Basically, we can define the value of the function separately on each “independent” part. That is, define the equivalence class where x and y are related if y = r_1x + r_2 for rationals r_1 and r_2, pick a representative for each class using the axiom of choice (this is something like picking a basis for \mathbb{R}/\mathbb{Q}, which corresponds to the equivalence class defined by the relation y = r_1x), define the value of the function independently for each representative, and this fixes the value of f on \mathbb{R}. See this article for more details.)

To step back a bit: what the functional equation says is that f is a homomorphism from (\mathbb{R}, +), the additive group of real numbers, to (\mathbb{R}, \times), the multiplicative monoid of real numbers. If f is not the trivial identically-zero function, then (as we saw above) f is in fact a homomorphism from (\mathbb{R}, +), the additive group of real numbers, to (\mathbb{R}_+^*, \times), the multiplicative group of positive real numbers. What we proved is that the exponential functions e^{kx} are precisely all such functions that are nice (nice here meaning either measurable or continuous at at least one point). (Note that this set includes the trivial homomorphism corresponding to k = 0: the function f(x) = 1 identically everywhere. If f is not this trivial map, then it is in fact an isomorphism.)

Edit [2013-10-11]: See also Overview of basic facts about Cauchy functional equation.

Written by S

Mon, 2013-04-08 at 11:24:08 +05:30

Posted in mathematics

Trajectory of a point moving with acceleration perpendicular to velocity

with 8 comments

(Just some basic high-school physics stuff; to assure myself I can still do some elementary things. :P Essentially, showing that if a particle moves with acceleration perpendicular to velocity, or velocity perpendicular to position, then it traces out a circle. Stop reading here if this is obvious.)

Suppose a point moves in the plane such that its acceleration is always perpendicular to its velocity, and of the same magnitude. What is its path like?

To set up notation: let’s say the point’s position at time t is (p_x(t), p_y(t)), its velocity is (v_x(t), v_y(t)) = \left(\frac{d}{dt}p_x(t), \frac{d}{dt}p_y(t)\right), and its acceleration is (a_x(t), a_y(t)) = \left(\frac{d}{dt}v_x(t), \frac{d}{dt}v_y(t)\right).

The result of rotating a point (x,y) by 90° is (-y, x). (E.g. see figure below)


So the fact that acceleration is at right angles to velocity means that (a_x(t), a_y(t)) = (-v_y(t), v_x(t)), or, to write everything in terms of the velocity,

\begin{aligned} \frac{d}{dt}v_x(t) &= -v_y(t) \\  \frac{d}{dt}v_y(t) &= v_x(t) \end{aligned}

where we can get rid of v_x(t) by substituting the second equation (in the form v_x(t) = \frac{d}{dt}v_y(t)) into the first:

v_y(t) = -\frac{d}{dt}v_x(t) = -\frac{d}{dt}\left(\frac{d}{dt}v_y(t)\right)

or in other words

v_y(t) = -\frac{d^2}{dt^2}v_y(t).

By some theory about ordinary differential equations, which I don’t know (please help!) (but see the very related example you saw in high school, of simple harmonic motion), the solutions to this equation are \sin(t) and \cos(t) and any linear combination of those: the solution in general is

\begin{aligned}  v_y(t) &= a \sin(t) + b \cos(t) \\  &= \sqrt{a^2 + b^2} \left(\frac{a}{\sqrt{a^2+b^2}}\sin(t) + \frac{b}{\sqrt{a^2+b^2}}\cos(t)\right) \\  &= R\sin (t + \alpha)  \end{aligned}

where R = \sqrt{a^2 + b^2} and \alpha is the angle such that \cos(\alpha) = \frac{a}{\sqrt{a^2+b^2}} and \sin(\alpha) = \frac{b}{\sqrt{a^2+b^2}}. And the fact that v_x(t) = \frac{d}{dt}v_y(t) gives v_x(t) = R\cos(t + \alpha). So (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)). Note that (a_x(t), a_y(t)) = \left(\frac{d}{dt}v_x(t), \frac{d}{dt}v_y(t)\right) = (-R\sin(t+\alpha), R\cos(t+\alpha)) is indeed perpendicular to (v_x(t), v_y(t)) as we wanted.

The actual trajectory (p_x(t), p_y(t)) can be got by integrating

\left(\frac{d}{dt}p_x(t), \frac{d}{dt}p_y(t)\right)  = (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha))

to get p_x(t) = R\sin(t + \alpha) + c_1 and p_y(t) = -R\cos(t + \alpha) + c_2. This trajectory is a point moving on a circle centered at the point (c_1, c_2), of radius R, with speed R (i.e. unit angular speed). Note that the velocity is also perpendicular to the point's position with respect to the centre of the circle, as velocity is tangential to the circle, as it should be.

With a suitable change of coordinates (translate the origin to (c_1, c_2), then rotate the axes by \frac{\pi}{2}+\alpha, then scale everything so that R = 1), this is the familiar parametrization (\cos(t), \sin(t)) of the circle.
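The whole derivation can also be sanity-checked by integrating the system numerically (a rough sketch; the step size and the simple Euler-type scheme are my choices):

```python
import math

# Integrate v' = (v rotated by 90 degrees), p' = v, starting on the unit
# circle with tangential unit velocity, and check the radius stays constant.
dt, steps = 1e-4, 200_000                    # total time 20, about 3 revolutions
px, py = 0.0, -1.0                           # bottom of the unit circle
vx, vy = 1.0, 0.0                            # unit speed, perpendicular to p
for _ in range(steps):
    vx, vy = vx + (-vy) * dt, vy + vx * dt   # a = v rotated by 90 degrees
    px, py = px + vx * dt, py + vy * dt      # then move with the updated v
print(math.hypot(px, py))                    # distance from centre: stays ≈ 1
```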

Note: Just as we derived (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)) from assuming that the acceleration is perpendicular to velocity, we can, by assuming that velocity is perpendicular to position, identically derive (p_x(t), p_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)), i.e. that the point moves on a circle.

Written by S

Sun, 2013-04-07 at 23:38:01 +05:30

Posted in mathematics

The power series for sin and cos

with one comment

There are many ways to derive the power series of \sin x and \cos x using the machinery of Taylor series etc., but below is another elementary way of demonstrating that the well-known power series expansions are the right ones. The argument below is from Tristan Needham’s Visual Complex Analysis, which I’m reproducing without looking at the book just to convince myself that I’ve internalized it correctly.

So: let
\displaystyle  \begin{aligned}   P(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \quad \text{ and }\\  Q(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots .  \end{aligned}

We will take the following two for granted (both can be proved with some effort):

  1. Both power series are convergent.
  2. The power series can be differentiated term-wise.

As suggested by (2) above, the first thing we observe is that \frac{d}{dx}P(x) = Q(x) and \frac{d}{dx}Q(x) = -P(x).

So firstly:
\begin{aligned}  \frac{d}{dx}(P(x)^2 + Q(x)^2)   &= 2P(x)P'(x) + 2Q(x)Q'(x) \\  &= 2P(x)Q(x) - 2Q(x)P(x) \\  &= 0  \end{aligned}
which means that P(x)^2 + Q(x)^2 is a constant and does not vary with x. Putting x = 0 shows that P(0) = 0 and Q(0) = 1, so P(x)^2 + Q(x)^2 = 1 for all x.

Secondly, define the angle \theta as a function of x, by \tan \theta(x) = P(x)/Q(x). (To be precise, this defines \theta(x) up to a multiple of \pi, i.e. modulo \pi.)
Differentiating the left-hand side of this definition gives
\displaystyle \begin{aligned}  \frac{d}{dx} \tan \theta(x)   &= (1 + \tan^2 \theta(x)) \theta'(x) \\  &= (1 + \frac{P(x)^2}{Q(x)^2}) \theta'(x) \\  &= \frac{1}{Q(x)^2} \theta'(x)   \end{aligned}
(where \theta'(x) means \frac{d}{dx} \theta(x))
while differentiating the right-hand side gives
\displaystyle \begin{aligned}  \frac{d}{dx} \frac{P(x)}{Q(x)}   &= \frac{Q(x)P'(x) - P(x)Q'(x)}{Q(x)^2} \\  &= \frac{Q(x)Q(x) + P(x)P(x)}{Q(x)^2} \\  &= \frac{1}{Q(x)^2}  \end{aligned}

The necessary equality of the two tells us that \frac{d}{dx}\theta(x) = 1, which along with the initial condition \tan \theta(0) = P(0)/Q(0) = 0 = \tan 0 that says \theta(0) \equiv 0 \mod \pi, gives \theta(x) = x (or to be precise, \theta(x) \equiv x \pmod {\pi}).

In other words, we have shown that the power series P(x) and Q(x) satisfy \frac{P(x)}{Q(x)} = \tan x = \frac{\sin x}{\cos x} and therefore P(x) = k \sin x and Q(x) = k \cos x for some k. The observation that Q(0) = 1 = \cos 0 (or our earlier observation that P(x)^2 + Q(x)^2 = 1 for all x) gives k = 1, thereby showing that P(x) = \sin x and Q(x) = \cos x.
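As a numeric check that the partial sums of P and Q really do track \sin and \cos (20 terms is far more than enough for small x):

```python
import math

def P(x, terms=20):
    """Partial sum of x - x^3/3! + x^5/5! - ..."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

def Q(x, terms=20):
    """Partial sum of 1 - x^2/2! + x^4/4! - ..."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))

for x in (0.0, 1.0, 2.5):
    assert math.isclose(P(x), math.sin(x), abs_tol=1e-12)
    assert math.isclose(Q(x), math.cos(x), abs_tol=1e-12)
    assert math.isclose(P(x) ** 2 + Q(x) ** 2, 1.0)  # the identity proved above
```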

So much for \sin x and \cos x. Just as an aside, observe that if we take i to be a symbol satisfying i^2 = -1, then
\displaystyle \begin{aligned}  \cos x + i\sin x   &= Q(x) + iP(x) \\  &= \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \right) \\  &= 1 + ix + \frac{-x^2}{2!} + \frac{-ix^3}{3!} + \frac{x^4}{4!} + \frac{ix^5}{5!} + \frac{-x^6}{6!} + \frac{-ix^7}{7!} + \dots \\  &= 1 + ix + \frac{(ix)^2}{2!} + \frac{(ix)^3}{3!} + \frac{(ix)^4}{4!} + \frac{(ix)^5}{5!} + \frac{(ix)^6}{6!} + \frac{(ix)^7}{7!} + \dots  \end{aligned}
the right hand side of which looks very much like the result of “substituting” y = ix in the known (real) power series
\displaystyle e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \dots
(which itself can be proved using the term-wise differentiation above and the defining property \frac{d}{dx} e^x = e^x, say).

So this is one heuristic justification for us to define e^{ix} = \cos x + i\sin x.
Or, if we define e^{ix} as the result of substituting ix in the real power series for e^y, this proves that e^{ix} = \cos x + i\sin x.

Written by S

Fri, 2013-03-08 at 00:33:12 +05:30

Posted in mathematics

The potions puzzle by Snape in Harry Potter and the Philosopher’s Stone

with one comment

Sometime this week, I reread the first Harry Potter book (after at least 10 years… wow, has it been that long?), just for contrast after reading Rowling’s adult novel The Casual Vacancy (on which more later). Anyway, in the penultimate chapter there is a puzzle:

He pulled open the next door, both of them hardly daring to look at what came next — but there was nothing very frightening in here, just a table with seven differently shaped bottles standing on it in a line.
“Snape’s,” said Harry. “What do we have to do?”
They stepped over the threshold, and immediately a fire sprang up behind them in the doorway. It wasn’t ordinary fire either; it was purple. At the same instant, black flames shot up in the doorway leading onward. They were trapped.
“Look!” Hermione seized a roll of paper lying next to the bottles. Harry looked over her shoulder to read it:

Danger lies before you, while safety lies behind,
Two of us will help you, whichever you would find,
One among us seven will let you move ahead,
Another will transport the drinker back instead,
Two among our number hold only nettle wine,
Three of us are killers, waiting hidden in line.
Choose, unless you wish to stay here forevermore,
To help you in your choice, we give you these clues four:
First, however slyly the poison tries to hide
You will always find some on nettle wine’s left side;
Second, different are those who stand at either end,
But if you would move onward, neither is your friend;
Third, as you see clearly, all are different size,
Neither dwarf nor giant holds death in their insides;
Fourth, the second left and the second on the right
Are twins once you taste them, though different at first sight.

I became curious about whether this is just a ditty Rowling made up, or the puzzle actually makes sense and is consistent. It turns out she has constructed it well. Let’s take a look. This investigation can be carried out by hand, but we’ll be lazy and use a computer, specifically Python. The code examples below are all to be typed in an interactive Python shell, the one that you get by typing “python” in a terminal.

So what we have are seven bottles, of which one will take you forward (F), one will let you go back (B), two are just nettle wine (N), and three are poison (P).

>>> bottles = ['F', 'B', 'N', 'N', 'P', 'P', 'P']

The actual ordering of these 7 bottles is some ordering (permutation) of them. As 7 is a very small number, we can afford to be inefficient and resort to brute-force enumeration.

>>> import itertools
>>> perms = [''.join(s) for s in set(itertools.permutations(bottles))]
>>> len(perms)
420

The set is needed to remove duplicates, because otherwise itertools.permutations will generate 7! “permutations”. So already the number of all possible orderings is rather small (it is \frac{7!}{2!3!} = 420). We can look at a sample to check whether things look fine.

>>> perms[:10]

Now let us try to solve the puzzle. We can start with the first clue, which says that wherever a nettle-wine bottle occurs, on its left is always a poison bottle (and in particular therefore, a nettle-wine bottle cannot be in the leftmost position). So we must restrict the orderings to just those that satisfy this condition.

>>> def clue1(s): return all(i > 0 and s[i-1] == 'P' for i in range(len(s)) if s[i]=='N')
>>> len([s for s in perms if clue1(s)])

(In the code, the 7 positions are 0 to 6, as array indices in code generally start at 0.)
Then the second clue says that the bottles at the end are different, and neither contains the potion that lets you go forward.

>>> def clue2(s): return s[0] != s[6] and s[0] != 'F' and s[6] != 'F'
>>> len([s for s in perms if clue1(s) and clue2(s)])

The third clue says that the smallest and largest bottles don’t contain poison, and this would be of help to Harry and Hermione who can see the sizes of the bottles. But as we readers are not told the sizes of the bottles, this doesn’t seem of any help to us; let us return to this later.

The fourth clue says that the second-left and second-right bottles have the same contents.

>>> def clue4(s): return s[1] == s[5]
>>> len([s for s in perms if clue1(s) and clue2(s) and clue4(s)])
8

There are now just 8 possibilities, finally small enough to print them all.

>>> [s for s in perms if clue1(s) and clue2(s) and clue4(s)]

Alas, without knowing which the “dwarf” and “giant” bottles are, we cannot use the third clue, and this seems as far as we can go. We seem to have exhausted all the information available…

Almost. It is reasonable to assume that the puzzle is meant to have a solution. So even without knowing where exactly the “dwarf” and “giant” bottles are, we can say that they are in some pair of locations that ensure a unique solution.

>>> def clue3(d, g, s): return s[d]!='P' and s[g]!='P'
>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(poss) == 1:
...       print d, g, poss[0]

Aha! If you look at the possible orderings closely, you will see that we are down to just two possibilities for the ordering of the bottles.
Actually there is some scope for quibbling in what we did above: perhaps we cannot say that there is a unique solution determining the entire configuration; perhaps all we can say is that the puzzle should let us uniquely determine the positions of just the two useful bottles. Fortunately, that gives exactly the same set of possibilities, so this distinction happens to be inconsequential.

>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [(s.index('F'),s.index('B')) for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(set(poss)) == 1:
...       print d, g, [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)][0]

Good. Note that there are only two configurations above. So with only the clues in the poem, and the assumption that the puzzle can be solved, we can narrow down the possibilities to two configurations, and be sure of the contents of all the bottles except the third and fourth. We know that the potion that lets us go forward is in either the third or the fourth bottle.

In particular we see that the last bottle lets us go back, and indeed this is confirmed by the story later:

“Which one will get you back through the purple flames?”
Hermione pointed at a rounded bottle at the right end of the line.
She took a long drink from the round bottle at the end, and shuddered.

But we don’t know which of the two it is, as we can’t reconstruct all the relevant details of the configuration. Perhaps we can reconstruct something with the remaining piece of information from the story?

“Got it,” she said. “The smallest bottle will get us through the black fire — toward the Stone.”
Harry looked at the tiny bottle.
Harry took a deep breath and picked up the smallest bottle.

So we know that the bottle that lets one move forward is in fact in the smallest one, the “dwarf”.

>>> for d in range(7):
...   for g in range(7):
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(poss) == 1 and poss[0][d] == 'F':
...       print d, g, poss[0]

This narrows the possible positions of the smallest and largest bottles (note that it says the largest bottle is one that contains nettle wine), but still leaves the same two possibilities for the complete configuration. So we can stop here.

What we can conclude is the following: apart from the clues mentioned in the poem, the “dwarf” (the smallest bottle) was in either position 2 (third from the left) or 3 (fourth from the left). The biggest bottle was in either position 1 (second from the left) or 5 (sixth from the left). With this information about the location of the smallest bottle (and without necessarily assuming the puzzle has a unique solution!), Hermione could determine the contents of all the bottles. In particular she could determine the location of the two useful bottles: namely that the bottle that lets you go back was the last one, and that the one that lets you go forward was the smallest bottle.

>>> for (d,g) in [(2,1), (2,5), (3,1), (3,5)]:
...   poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...   assert len(poss) == 1
...   s = poss[0]
...   assert s.index('B') == 6
...   assert s.index('F') == d
...   print (d,g), s
(2, 1) PNFPPNB
(2, 5) PNFPPNB
(3, 1) PNPFPNB
(3, 5) PNPFPNB

It is not clear why Rowling went to the effort of creating a meaningful puzzle, only to withhold details that would let the reader solve it fully. Perhaps some details were removed during editing. As far as making up stuff for the sake of a story goes, though, this is nothing; consider for instance the language created for Avatar, which viewers have learned.

See also this post, which does it by hand and has a perfectly clear exposition (it doesn’t try the trick of guessing that the solution is unique before reaching for the additional information from the story).

Written by S

Mon, 2012-10-22 at 02:00:12 +05:30

Are there Fibonacci numbers starting with 2012? (continued)

leave a comment »

Almost 8 months ago I left the first part of this post unfinished planning to complete it in the morning; seems I slept too long. (Or as this guy said after a 2-year hiatus: “Sorry for the pause guys. I was in the bathroom.”)

Recall: To get a power of 2 that starts with a prefix p (like p = 2012), we want n such that the fractional part of n\log 2 lies between those of \log p and \log(p+1) (all logarithms here to base 10), and similarly to get a Fibonacci number starting with p, we want (with some hand-waving) n such that the fractional part of n\log\phi lies between those of \log(p) + \log{\sqrt{5}} and \log(p+1) + \log{\sqrt{5}}. The more general problem is:

Problem: Given an irrational number \theta and an interval (a,b) \subset (0,1), find n such that \mathrm{frac}(n\theta) lies in the interval (a,b).

Here is one method, based on Edward Burger’s book Exploring the Number Jungle. Let \alpha be the midpoint of the interval (a,b). Then we are trying to find n such that \mathrm{frac}(n\theta) is close to \alpha.

  • Find a fraction \displaystyle \frac{p}{q} approximating \theta, such that \displaystyle |q\theta - p| < \frac1q. (These are the convergents of the continued fraction of \theta, but in practice it seems you can also get away with taking semi-convergents that may not satisfy this property.)
  • Let N be the closest integer to q\alpha. Note that this automatically means \displaystyle |q\alpha - N| \le \frac12.
  • Write \displaystyle N = yp - xq with \displaystyle |y| \le \frac{q}{2}. This you can do quite easily with the Euclidean algorithm.
  • Then for n = q + y and k = p + x, we have (it is a simple exercise to prove this)
    \displaystyle |\theta n - k - \alpha| < \frac{3}{n}
  • This means that the distance between n\theta and \alpha is small, modulo 1. If this distance turns out to be still too large, start with a bigger convergent \frac{p}{q}.
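The steps above can be sketched in Python (my own rough translation, not code from Burger’s book; it computes continued fractions in floating point, so it is only reliable up to moderately large denominators, and it needs Python 3.8+ for the modular inverse):

```python
import math

def convergents(theta, count):
    # convergents p/q of the continued fraction of theta
    # (floating-point, so only good up to moderate denominators)
    x = theta
    p_prev, q_prev, p_cur, q_cur = 1, 0, int(theta), 1
    result = [(p_cur, q_cur)]
    for _ in range(count - 1):
        frac = x - int(x)
        if frac < 1e-12:
            break
        x = 1.0 / frac
        a = int(x)
        p_prev, q_prev, p_cur, q_cur = p_cur, q_cur, a * p_cur + p_prev, a * q_cur + q_prev
        result.append((p_cur, q_cur))
    return result

def hit_interval(theta, a, b, max_conv=20):
    # find n with frac(n*theta) in (a, b), following the steps above:
    # for each convergent p/q, set N = round(q*alpha), solve N = y*p - x*q
    # with |y| <= q/2, and try n = q + y, checking membership explicitly.
    alpha = (a + b) / 2
    for p, q in convergents(theta, max_conv):
        if q < 2:
            continue
        y = (round(q * alpha) * pow(p, -1, q)) % q  # modular inverse: Python 3.8+
        if y > q // 2:
            y -= q
        n = q + y
        if n > 0 and a < (n * theta) % 1 < b:
            return n
    return None
```

For instance, hit_interval(math.log10(2), math.log10(2.012), math.log10(2.013)) should produce an n with 2^n starting with the digits 2012.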

I think I had some code to post as well (hey, what’s the actual Fibonacci number that starts with 2012?), but it needs to be cleaned up… probably this works (in Sage).
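In lieu of that, here is a quick brute-force sketch in Python (my own, not the Sage code referred to above); the helper avoids converting enormous integers to strings, and the cutoff of 100000 terms is an arbitrary safety limit:

```python
def leading_digits(m, k=4):
    # first k digits of the positive integer m, without a huge str() call
    e = int((m.bit_length() - 1) * 0.30103) - (k + 1)
    t = m // 10 ** max(e, 0)
    while t >= 10 ** k:
        t //= 10
    return t

def first_fib_with_prefix(prefix=2012, limit=100000):
    # scan Fibonacci numbers until one starts with the given digits
    a, b, n = 1, 1, 2  # F_1 = F_2 = 1
    while n < limit:
        a, b, n = b, a + b, n + 1
        if leading_digits(b) == prefix:
            return n, b
    return None

n, f = first_fib_with_prefix()
print(n)  # index of the first Fibonacci number whose decimal expansion starts 2012
```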

The thing we’re doing here is called inhomogeneous Diophantine approximation.

[Originally posted on math.stackexchange, here and here.]

Written by S

Fri, 2012-08-03 at 01:56:18 +05:30

Posted in mathematics

Are there Fibonacci numbers starting with 2012?

with one comment

Do there exist powers of 2 whose first four digits are 2012?
Are there Fibonacci numbers whose first four digits are 2012?

If the answer is obvious to you (or you don’t care), you can stop reading.

The answer for both is:

  • Yes.
  • There are infinitely many such numbers.
  • In fact, the fraction of (powers of 2 / Fibonacci numbers) starting with 2012 is exactly

    \displaystyle \frac{\log 2013 - \log 2012}{\log 10} \approx \frac1{4634} \approx 0.000216

Similarly with any other prefix p (of any length) in place of 2012. Proof follows.

A number x starts with a prefix p if and only if for some k ≥ 0,

\displaystyle p \cdot 10^k \le x < (p+1) \cdot 10^k

Thus a power of 2, say 2^n, starts with p iff

\displaystyle p \cdot 10^k \le 2^n < (p+1) \cdot 10^k for some k \ge 0

Taking logarithms to base 10 and simplifying, this is equivalent to

\displaystyle \log p < n\log2 - k < \log(p+1) for some k \ge 0

This is saying that the fractional part of n\log 2 lies between the fractional parts of \log p and \log (p+1). For example, if p = 2012, this means that the fractional part of n\log 2 lies between \log 2.012 \approx 0.303628 and \log 2.013 \approx 0.303844.

Similarly, for Fibonacci numbers, as is (or should be) well-known, the nth Fibonacci number F_n is the closest integer to \displaystyle \frac{\phi^n}{\sqrt5}, where \displaystyle \phi = \frac{1+\sqrt5}2 is the golden ratio. So F_n starts with p iff

\displaystyle p \cdot 10^k - \frac12 \quad < \quad \frac{\phi^n}{\sqrt5} \quad < \quad (p+1) \cdot 10^k - \frac12

Taking logarithms to base 10 and simplifying, while ignoring the annoying \frac12 which becomes irrelevant in the limit (this line is not rigorous), this is equivalent to

\displaystyle \log(p) + \log\sqrt5 \quad < \quad n\log\phi - k \quad < \quad \log(p+1) + \log\sqrt5

which means that the fractional part of n\log \phi lies between the fractional parts of \log p +  \log\sqrt5 and \log (p+1) + \log\sqrt5. For p = 2012, this means that the fractional part of n\log \phi lies between \log 2.012 + \log\sqrt5 \approx 0.653113 and \log 2.013 + \log\sqrt5 \approx 0.653329.

In either case, we are trying to make the fractional part of n\theta, for some irrational number \theta, lie in some interval. The relevant fact is this:
Theorem 1: for any irrational number \theta, the sequence \mathrm{frac}(n\theta) (where \mathrm{frac}(x) denotes the fractional part of x) is dense in [0, 1).
or, in other words,
Theorem 1: For any irrational number \theta, the sequence n\theta is dense modulo 1.
Proving this theorem is a good exercise.

This means that for any interval you want, you can always find some n such that the fractional part of n\theta lies in your interval. In fact, because the sequence is dense, you can find an infinite sequence of n such that the fractional parts of n\theta converge to the midpoint (say) of the desired interval. This proves the first two facts of the answer, and for the third we need a stronger theorem:

Theorem 2 [Equidistribution theorem]: For any irrational number \theta, the numbers n\theta are uniformly distributed modulo 1.

This means that for any interval I \subset [0,1] of size s (say), the fraction of integers n for which \mathrm{frac}(n\theta) lies in the interval satisfies

\displaystyle \lim_{N \to \infty} \frac{ \left|\left\{ 1 \le n \le N : \mathrm{frac}(n\theta) \in I\right\}\right|}{N} = s

This proves the third fact. The fraction of Fibonacci numbers (or of powers of a number that is not a power of 10) that start with a prefix p \ge 1 is exactly \log(p+1) - \log(p) where log is to base 10.
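As a sanity check of Theorem 2 (a quick empirical count, not a proof), the proportion of n up to a million with frac(n log 2) in the 2012-interval is indeed close to the predicted density:

```python
import math

theta = math.log10(2)
lo, hi = math.log10(2.012), math.log10(2.013)
N = 10 ** 6
# count n <= N whose power 2^n starts with 2012
count = sum(1 for n in range(1, N + 1) if lo < n * theta % 1 < hi)
print(count, count / N, hi - lo)  # the last two should be close
```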

That much is standard. And non-constructive. We are assured of the existence of such numbers, but how do we actually find one?

The answer (or one answer), as it so often does in these problems, involves continued fractions. Here is one method, [to be continued when I wake up :p]
Edit: Continued here.

Written by S

Sun, 2012-01-08 at 01:29:26 +05:30

Posted in mathematics, unfinished


leave a comment »

[Posting some images here for possible future reuse.]

\displaystyle \lim_{n\to\infty}\left(1 + \frac{ix}{n}\right)^n = \cos x + i \sin x

A non-rigorous argument: when n is large enough so that x/n is small, (1 + ix/n) is roughly (hand-waving) the point on the unit circle at arc length (and hence angle) x/n:

So multiplication by (1+ix/n) roughly corresponds to rotation by angle x/n. Multiplication by (1+ix/n)^n, which is multiplication by (1+ix/n) n times, roughly corresponds to rotation by angle n(x/n) = x. As n \to \infty, “roughly” becomes exact.

Animation for x = 1:

Image generated from Python-generated SVG files; code available if anyone wants.

In particular, once one accepts the fact/definition that \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x (for instance, show that the function f(x) = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n satisfies f(x+y) = f(x)f(y)), it is true that e^{i\pi} is a rotation by π, that is,

\displaystyle e^{i\pi} = -1
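A quick numerical check of the limit (nothing deep, just evaluating the left-hand side for one large n):

```python
import cmath

n = 10 ** 6
# (1 + i*pi/n)^n should be close to e^{i*pi} = -1 for large n
z = (1 + 1j * cmath.pi / n) ** n
print(z)  # very close to -1
```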

Written by S

Tue, 2011-06-21 at 18:04:25 +05:30

Posted in mathematics

Serieshelpmate in 19

with 2 comments

Here’s a brilliant problem.

Consider the following chess position.

Black is to make 19 consecutive moves, after which White checkmates Black in one move. Black may not move into check, and may not check White (except possibly on his last move). Black and White are cooperating to achieve the aim of checkmate. (In chess problem parlance, this problem is called a serieshelpmate in 19.) How many different solutions are there?

This problem is due to Kauko Väisänen, and appears in A. Puusa, Queue Problems, Finnish Chess Problem Society, Helsinki, 1992 (Problem 2).

Hint: the above is quoted from Richard Stanley’s Enumerative Combinatorics.

Written by S

Sun, 2011-05-29 at 15:30:25 +05:30

Posted in mathematics

How does Tupper’s self-referential formula work?

with 16 comments

[I write this post with a certain degree of embarrassment, because in the end it turns out (1) to be more simple than I anticipated, and (2) already done before, as I could have found if I had internet access when I did this. :-)]

The so-called “Tupper’s self-referential formula” is the following, due to Jeff Tupper.

Graph the set of all points {(x,y)} such that

\displaystyle  \frac12 < \left\lfloor \mathrm{mod} \left( \left\lfloor{\frac{y}{17}}\right\rfloor 2^{-17\lfloor x \rfloor - \mathrm{mod}(\lfloor y \rfloor, 17)}, 2 \right) \right\rfloor

in the region

\displaystyle  0 < x < 106

\displaystyle  N < y < N+17

where N is the following 544-digit integer:

The result is the following graph:

Figure 1: The graph of the formula, in some obscure region, is a picture of the formula itself.

Whoa. How does this work?

At first sight this is rather too incredible for words.

But after a few moments we can begin to guess what is going on, and see that—while clever—this is perhaps not so extraordinary after all. So let us calmly try to reverse-engineer this feat.
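(For anyone who wants to experiment before reading on: in integer arithmetic the inequality is easy to evaluate. The snippet below is my own restatement of it, tested on a small stand-in value since the 544-digit N is not reproduced here.)

```python
def tupper_pixel(x, y):
    # 1/2 < floor(mod(floor(y/17) * 2^(-17*floor(x) - mod(floor(y),17)), 2))
    # says, in integer terms: bit number (17*x + y mod 17) of floor(y/17) is set
    return ((y // 17) >> (17 * x + y % 17)) % 2 == 1

k = 17 * 5  # a stand-in "N": floor(y/17) = 5 = binary 101 for k <= y < k + 17
col = [tupper_pixel(0, k + dy) for dy in range(3)]
print(col)  # → [True, False, True], the bits of 5 read from bit 0 up
```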

Read the rest of this entry »

Written by S

Tue, 2011-04-12 at 13:05:20 +05:30

Posted in mathematics

Tagged with , , ,

“Your monkey did not jump high enough”

with 4 comments

Yesterday, in Futility Closet there was a post:

In Longfellow’s novel Kavanagh, Mr. Churchill reads a word problem to his wife:

“In a lake the bud of a water-lily was observed, one span above the water, and when moved by the gentle breeze, it sunk in the water at two cubits’ distance. Required the depth of the water.”

“That is charming, but must be very difficult,” she says. “I could not answer it.”

Is it? If a span is 9 inches and a cubit is 18 inches, how deep is the water?

The problem is simple enough: if the depth of the water is x inches so that the lotus from bottom to tip is x+9 inches, then x^2 + 36^2 = (x+9)^2, which means x = (36^2 - 9^2)/18 = 135/2 = 67.5.

More interestingly, as I accidentally recognised (I don’t know how), it is from the Sanskrit mathematics text Lilavati (and also found in the Bījagaṇita) of Bhaskaracharya (Bhaskara II). That entire chapter of Kavanagh is essentially quoting the Lilavati (Kavanagh is written in a somewhat embarrassing tone that perhaps explains why it’s so obscure :p); it’s included later below the horizontal line in this post.

Bhaskaracharya, believed to have lived in the 12th century, is considered the last great Indian mathematician, outside of the Kerala school. Like most Sanskrit texts, the Līlāvati is written in verse, so as to be easier to memorise. Unlike many Sanskrit technical works (or for that matter technical works in any language), however, Bhāskara’s works are not written in the typical dry style, and can veer quite poetic at times. His description of the seasons in one of his astronomical works is one of the few true instances of poetry in the Sanskrit astronomical/mathematical corpus. This particular problem, it happens, is written in the beautiful mandākrānta metre: (If it helps: mandakranta is the metre of the Meghadūta, of “शान्ताकारं भुजगशयनं…”, of “नास्था धर्मे न वसुनिचये…”, etc., and you can listen to a recitation in the Marathi tradition by Ashwini Deo.)

चक्रक्रौञ्चाकुलितसलिले क्वापि दृष्टं तडागे
तोयादूर्ध्वं कमलकलिकाग्रं वितस्तिप्रमाणम्
मन्दं मन्दं चलितमनिलेनाऽऽहतं हस्तयुग्मे
तस्मिन्मग्नं गणक कथय क्षिप्रमम्बुप्रमाणम्

cakra-krauñcākulita-salile kvāpi dṛṣṭaṃ taḍāge
toyād ūrdhvaṃ kamala-kalikāgraṃ vitasti-pramāṇam
mandaṃ mandaṃ calitam anilenāhataṃ hasta-yugme
tasmin magnaṃ gaṇaka kathaya kṣipram ambu-pramāṇam

In a certain lake swarming with geese and cranes,
the tip of a bud of lotus was seen one span above the water.
Forced by the wind, it gradually moved, and was submerged at a distance of two cubits.
O mathematician, tell quickly the depth of the water.

Well, that’s my translation, close to Longfellow’s quoted translation by Taylor and to Colebrooke’s better translation, but I may be wrong, so details for anyone who cares to improve it:

In a certain [kvāpi] pool [taḍāge] whose water [salile] was swarming [ākulita] with ruddy geese [cakra] and curlews [krauñcā],
above the water [toyād ūrdhvaṃ] a lotus-bud-tip [kamala-kalikāgraṃ] at a distance of one span [vitasti-pramāṇam] was seen [dṛṣṭaṃ].
Slowly slowly [mandaṃ mandaṃ] by the wind [anilena] moved [calitam] and forced [āhataṃ],
at a distance of two cubits [hasta-yugme] it got submerged [magnaṃ] in the water [tasmin].
O mathematician [gaṇaka], say [kathaya] quickly [kṣipram] the depth of the water [ambu-pramāṇam].

The structure of the book may be worth remarking on: the general formula for exactly this problem is given first (in more technical terms), and then this problem is given as an example!

Glancing through Longfellow, one finds he’s also written a tiny poem called King Trisanku:

Viswamitra the Magician,
By his spells and incantations,
Up to Indra’s realms elysian
Raised Trisanku, king of nations.

Indra and the gods offended
Hurled him downward, and descending
In the air he hung suspended,
With these equal powers contending.

Thus by aspirations lifted,
By misgivings downward driven,
Human hearts are tossed and drifted
Midway between earth and heaven.

Ho hum (1845 America).

The chapter of Kavanagh below this line.

Read the rest of this entry »

Written by S

Sat, 2011-01-29 at 04:06:14 +05:30

Posted in mathematics, sanskrit

Not all best rational approximations are the convergents of the continued fraction!

with 15 comments

[For more background on continued fractions and why they are so wonderful at approximations (and wonderful generally) — eventually I may edit this post to mention that. For now I just want to quickly clarify something, which surprisingly many popular expositions of continued fractions seem to mislead by leaving out.]

Any real number can be written as a continued fraction. For instance, the square root of 3 is
\sqrt{3} = 1+\cfrac{1}{1+\cfrac{1}{2+\cfrac{1}{1+\cfrac{1}{2+\dots}}}}
written as \sqrt{3} = [1; 1, 2, 1, 2, \dots],

and π is

\pi = 3+\cfrac{1}{7+\cfrac{1}{15+\cfrac{1}{1+\cfrac{1}{292+\dots}}}}

i.e. \pi = [3; 7, 15, 1, 292, \dots] (more terms below),

and e is e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1,\dots] (proved here).

Rational numbers have finite continued fractions, and irrational numbers have infinite continued fractions, of which only quadratic irrationals (like √3 above) have periodic continued fractions. Non-quadratic irrationals may still have some recognisable patterns in their representation, like e above, or not, like π.

The numbers you get by truncating the continued fraction — taking only the first few terms — are called its convergents. The continued fraction for π is [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2…] (sequence A001203 in OEIS), and the sequence of convergents is 3/1, 22/7, 333/106, 355/113, 103993/33102… (A002485/A002486):

\begin{array}{cll} p_1/q_1 &= [3;] &= 3/1 \\ p_2/q_2 &= [3; 7] &= 22/7 \\  p_3/q_3 &= [3; 7, 15] &= 333/106 \\  p_4/q_4 &= [3; 7, 15, 1] &= 355/113 \\ p_5/q_5 &= [3; 7, 15, 1, 292] &= 103993/33102\end{array}

Each of these convergents is a best rational approximation, in the following sense.

Definition: A fraction p/q is a best rational approximation of a real number x if p/q is closer to x than any other fraction with a smaller denominator. That is, for any other fraction p'/q' with q' \le q and p'/q' \ne p/q, we have \left|\frac{p}{q} -x \right| < \left| \frac{p'}{q'} -x \right|.

Thus among all fractions with denominator at most 7, the fraction 22/7 is closest to π. Among all fractions with denominator at most 113, the fraction 355/113 is closest to π. And so on.

Every convergent is a best rational approximation, but these aren’t all of the best rational approximations! The sequence of best approximations of π is actually 3/1, 13/4, 16/5, 19/6, 22/7, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99, 333/106, 355/113, 52163/16604… (A063674/A063673), where only the bold terms are convergents. (Aside: Notice how good 355/113 is: it is the best among all fractions with denominator at most 16603. This is because of the large term 292 in the continued fraction of π.) The others are semi-convergents (intermediate fractions), as explained below.

In general, for any real number, the convergents p_k/q_k of its continued fraction are always best rational approximations, but they aren’t all of the best rational approximations. To get all of them, you also have to take the semi-convergents: fractions of the form (p_k + np_{k+1})/(q_k +nq_{k+1}) for some integer n≥1. Specifically, taking n=a_{k+2} — the next term in the continued fraction expansion — gives p_{k+2}/q_{k+2}, and the ones to take are precisely n from \lceil a_{k+2}/2 \rceil (if it’s better than the previous convergent, else \lfloor a_{k+2}/2 \rfloor + 1) to a_{k+2}. This is also mentioned on Wikipedia.
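Since “best” here is defined by a plain minimization, one can also generate the sequence straight from the definition, with no continued fractions at all (a brute-force Python sketch, fine for small denominators):

```python
import math
from fractions import Fraction

def best_approximations(x, max_denom):
    # straight from the definition: for each denominator q, take the
    # closest p/q, and keep it if it beats everything with smaller denominator
    best, err = [], float("inf")
    for q in range(1, max_denom + 1):
        p = round(x * q)
        e = abs(x - p / q)
        if e < err:
            best.append(Fraction(p, q))
            err = e
    return best

print(best_approximations(math.pi, 120))
```

For π this reproduces 3/1, 13/4, 16/5, 19/6, 22/7, 179/57, …, 333/106, 355/113 as above.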

Thus for π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2…], the best rational approximations are:

3/1     = [3]

13/4    = [3; 4]
16/5    = [3; 5]
19/6    = [3; 6]
22/7    = [3; 7]

179/57  = [3; 7, 8]
201/64  = [3; 7, 9]
223/71  = [3; 7, 10]
245/78  = [3; 7, 11]
267/85  = [3; 7, 12]
289/92  = [3; 7, 13]
311/99  = [3; 7, 14]
333/106 = [3; 7, 15]

355/113 = [3; 7, 15, 1]

52163/16604  = [3; 7, 15, 1, 146]
52518/16717  = [3; 7, 15, 1, 147]
… (terms from [3; 7, 15, 1, 148] to [3; 7, 15, 1, 291])...
103993/33102 = [3; 7, 15, 1, 292]

104348/33215 = [3; 7, 15, 1, 292, 1]

The above is the natural meaning of “best rational approximation”. This is also known as “best rational approximation of the first kind”. There is a different definition of “best rational approximation” under which the convergents are precisely all the best rational approximations: basically, instead of considering the distance |x - p/q|, we consider |qx - p|. That is: A fraction \frac{a}{b} is a best approximation of the second kind of a real number x if for every fraction \frac{c}{d} with 1 \le d \le b and a/b \neq c/d, we have |dx - c| > |bx - a|.
This measure turns out to have a nicer theory, but to omit specifying that this non-obvious definition is the one being used is misleading.
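The same brute force with |qx − p| in place of |x − p/q| illustrates the difference (a sketch; for π only the convergents survive):

```python
import math
from fractions import Fraction

def best_second_kind(x, max_denom):
    # brute force again, but minimizing |q*x - p| instead of |x - p/q|
    best, err = [], float("inf")
    for q in range(1, max_denom + 1):
        p = round(x * q)
        e = abs(q * x - p)
        if e < err:
            best.append(Fraction(p, q))
            err = e
    return best

print(best_second_kind(math.pi, 10000))  # only the convergents: 3/1, 22/7, 333/106, 355/113
```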


For illustration, here’s a C program (why it’s written in C is not important right now) that given a positive real number, generates its continued fraction, its convergents, and the sequence of best rational approximations. The function find_cf finds the continued fraction (putting the terms in a[] and the convergents in p[] and q[] — excuse the global variables), and the function all_best prints all the best rational approximations.

#include <math.h>
#include <stdio.h>
#include <assert.h>

// number of terms in continued fraction.
// 15 is the max without precision errors for M_PI
#define MAX 15
#define eps 1e-9

long p[MAX], q[MAX], a[MAX], len;
void find_cf(double x) {
  int i;
  //The first two convergents are 0/1 and 1/0
  p[0] = 0; q[0] = 1;
  p[1] = 1; q[1] = 0;
  //The rest of the convergents (and continued fraction)
  for(i=2; i<MAX; ++i) {
    a[i] = lrint(floor(x));
    p[i] = a[i]*p[i-1] + p[i-2];
    q[i] = a[i]*q[i-1] + q[i-2];
    printf("%ld:  %ld/%ld\n", a[i], p[i], q[i]);
    len = i;
    if(fabs(x-a[i])<eps) return;
    x = 1.0/(x - a[i]);
  }
}

void all_best(double x) {
  find_cf(x); printf("\n");
  int i, n; long cp, cq;
  for(i=2; i<len; ++i) {
    //Test n = a[i+1]/2 separately. Enough to test when a[i+1] is even, actually.
    n = a[i+1]/2; cp = n*p[i]+p[i-1]; cq = n*q[i]+q[i-1];
    if(fabs(x-(double)cp/cq) < fabs(x-(double)p[i]/q[i])) printf("%ld/%ld,", cp, cq);
    //And print all the rest, no need to test
    for(n = (a[i+1]+2)/2; n<=a[i+1]; ++n) {
      printf("%ld/%ld,", n*p[i]+p[i-1], n*q[i]+q[i-1]);
    }
  }
}

int main(int argc, char **argv) {
  double x;
  if(argc==1) { x = M_PI; } else { sscanf(argv[1], "%lf", &x); }
  assert(x>0); printf("%.15lf\n\n", x);
  all_best(x); printf("\n");
  return 0;
}


Here’s the output of this program, in about 0.003 seconds (i.e., it’s truly better than looping through all possible denominators!), line-wrapped for readability:

% ./a.out
3:  3/1
7:  22/7
15:  333/106
1:  355/113
292:  103993/33102
1:  104348/33215
1:  208341/66317
1:  312689/99532
2:  833719/265381
1:  1146408/364913
3:  4272943/1360120
1:  5419351/1725033
14:  80143857/25510582
13/4, 16/5, 19/6, 22/7, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99,
333/106, 355/113, 52163/16604, 52518/16717, 52873/16830, 53228/16943, 53583/17056,
53938/17169, 54293/17282, 54648/17395, 55003/17508, 55358/17621, 55713/17734,
56068/17847, 56423/17960, 56778/18073, 57133/18186, 57488/18299, 57843/18412,
58198/18525, 58553/18638, 58908/18751, 59263/18864, 59618/18977, 59973/19090,
60328/19203, 60683/19316, 61038/19429, 61393/19542, 61748/19655, 62103/19768,
62458/19881, 62813/19994, 63168/20107, 63523/20220, 63878/20333, 64233/20446,
64588/20559, 64943/20672, 65298/20785, 65653/20898, 66008/21011, 66363/21124,
66718/21237, 67073/21350, 67428/21463, 67783/21576, 68138/21689, 68493/21802,
68848/21915, 69203/22028, 69558/22141, 69913/22254, 70268/22367, 70623/22480,
70978/22593, 71333/22706, 71688/22819, 72043/22932, 72398/23045, 72753/23158,
73108/23271, 73463/23384, 73818/23497, 74173/23610, 74528/23723, 74883/23836,
75238/23949, 75593/24062, 75948/24175, 76303/24288, 76658/24401, 77013/24514,
77368/24627, 77723/24740, 78078/24853, 78433/24966, 78788/25079, 79143/25192,
79498/25305, 79853/25418, 80208/25531, 80563/25644, 80918/25757, 81273/25870,
81628/25983, 81983/26096, 82338/26209, 82693/26322, 83048/26435, 83403/26548,
83758/26661, 84113/26774, 84468/26887, 84823/27000, 85178/27113, 85533/27226,
85888/27339, 86243/27452, 86598/27565, 86953/27678, 87308/27791, 87663/27904,
88018/28017, 88373/28130, 88728/28243, 89083/28356, 89438/28469, 89793/28582,
90148/28695, 90503/28808, 90858/28921, 91213/29034, 91568/29147, 91923/29260,
92278/29373, 92633/29486, 92988/29599, 93343/29712, 93698/29825, 94053/29938,
94408/30051, 94763/30164, 95118/30277, 95473/30390, 95828/30503, 96183/30616,
96538/30729, 96893/30842, 97248/30955, 97603/31068, 97958/31181, 98313/31294,
98668/31407, 99023/31520, 99378/31633, 99733/31746, 100088/31859, 100443/31972,
100798/32085, 101153/32198, 101508/32311, 101863/32424, 102218/32537, 102573/32650,
102928/32763, 103283/32876, 103638/32989, 103993/33102, 104348/33215, 208341/66317,
312689/99532, 833719/265381, 1146408/364913, 3126535/995207,
4272943/1360120, 5419351/1725033, 42208400/13435351, 47627751/15160384,
53047102/16885417, 58466453/18610450, 63885804/20335483, 69305155/22060516,
74724506/23785549, 80143857/25510582, 

All these terms are correct, though if you increase MAX you start getting errors because of precision. I’m myself impressed with how many terms you get with only 13 convergents. (As you can see, there’s a small bug where it sometimes doesn’t print the very first “n/1” approximation, or prints it incorrectly — I leave it for you to fix!)

You can try with √2, whose continued fraction is [1; 2, 2, 2, 2…]:

% ./a.out 1.41421356237309504880
1:  1/1
2:  3/2
2:  7/5
2:  17/12
2:  41/29
2:  99/70
2:  239/169
2:  577/408
2:  1393/985
2:  3363/2378
2:  8119/5741
2:  19601/13860
2:  47321/33461
3/2, 4/3, 7/5, 17/12, 24/17, 41/29, 99/70, 140/99, 239/169, 577/408, 816/577, 1393/985, 3363/2378, 4756/3363, 8119/5741, 19601/13860, 47321/33461, 

Or the golden ratio φ = (1+√5)/2 whose continued fraction is [1; 1, 1, 1, …]:

% ./a.out 1.61803398874989484820
1:  1/1
1:  2/1
1:  3/2
1:  5/3
1:  8/5
1:  13/8
1:  21/13
1:  34/21
1:  55/34
1:  89/55
1:  144/89
1:  233/144
1:  377/233
2/1, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, 55/34, 89/55, 144/89, 233/144, 377/233, 

(See the Fibonacci numbers? Here the convergents are all the approximants.)

Or with rational numbers like 4/3 = [1; 3]:

% ./a.out 1.33333333333333333333
1:  1/1
3:  4/3
3/2, 4/3, 

or 14/11 = [1; 3, 1, 2]:

% ./a.out 1.27272727272727272727
1:  1/1
3:  4/3
1:  5/4
2:  14/11
3/2, 4/3, 5/4, 9/7, 14/11, 

[2011-11-04: Fixed the bug and the description.]

Written by S

Mon, 2011-01-10 at 02:28:23 +05:30

Posted in mathematics

Shor’s algorithm

leave a comment »

I’d learnt Shor’s algorithm in a course a few years ago, but (as usual) forgot all about it.

Yesterday I reread it from the Algorithms textbook of Dasgupta, Papadimitriou, and Vazirani, only because someone claimed that their Chapter 10 on Quantum algorithms was incomprehensible, so here is a very vague sketch of it before I forget it again. (For a popular account, which Peter Shor called “the best job of explaining quantum computing to the man on the street that I’ve seen”, see Scott Aaronson’s post.)

1. Quantum

The quantum magic in Shor’s algorithm, the part that uses something unique to quantum computation, is simply the following fact: a quantum computer can perform the quantum Fourier transform in O(\log^2 N) time.

That is, given a “superposition” |\boldsymbol\alpha\rangle = \sum_{j=0}^{N-1}\alpha_j |j\rangle, in O(\log^2 N) steps the quantum fourier-magic-thing produces the superposition |\boldsymbol\beta\rangle = \sum_{j=0}^{N-1} \beta_j |j\rangle with \beta_j = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} \omega^{jl}\alpha_l, where \omega = e^{2\pi i/N} is a primitive Nth root of unity. Of course, it just produces this superposition and does not output it; if we try to measure this we’d get only one of the N states, specifically |j\rangle with probability |\beta_j|^2.

(I will not write here how this QFT works. It uses the Hadamard gate etc., and roughly, what would be two recursive calls in the usual FFT can be done with just one operation here.)

All the other ideas are classical (i.e. non-quantum).

2. Fourier transform of periodic vectors

Suppose k is some factor of N. It can be proved with some straightforward calculation that if |\boldsymbol\alpha\rangle (viewed as a vector/sequence) is periodic with period k, with the only nonzero coefficients being \alpha_l, \alpha_{l+k}, \alpha_{l+2k}, \dots, then its Fourier transform is also periodic with period N/k — the only nonzero coefficients are \beta_0, \beta_{N/k}, \beta_{2N/k}, \dots.

The point is that if we “measure” this, then we get one of the multiples of N/k at random, so by taking a few measurements, we can figure out N/k and therefore k.
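This periodicity fact is easy to see numerically (a small hand-rolled DFT, just for illustration; a real quantum computer of course never hands you the whole vector):

```python
import cmath

def dft(alpha):
    # beta_j = (1/sqrt(N)) * sum_l omega^(j*l) * alpha_l, with omega = e^(2*pi*i/N)
    N = len(alpha)
    w = cmath.exp(2j * cmath.pi / N)
    return [sum(w ** (j * l) * alpha[l] for l in range(N)) / N ** 0.5
            for j in range(N)]

# a vector of period k = 2 in dimension N = 8 (nonzeros at 0, 2, 4, 6)
alpha = [1, 0, 1, 0, 1, 0, 1, 0]
beta = dft(alpha)
support = [j for j, b in enumerate(beta) if abs(b) > 1e-9]
print(support)  # → [0, 4]: nonzero only at multiples of N/k = 4
```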

3. Factoring reduces to order-finding

We can see from number theory (use the Chinese remainder theorem) that if N is composite (etc.), then with probability at least 1/2, a random x modulo N has an even order r (i.e. the smallest positive integer such that x^r = 1 \mod N) such that x^{r/2} is a nontrivial square root of 1 mod N. So if we can find the order r, then we have y such that y^2 = 1\mod N so (y-1)(y+1) = 0\mod N with neither factor being 0, so we get a factor of N. So all we need to be able to do is find the order.
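This reduction is entirely classical and easy to play with, if we substitute a brute-force loop for the quantum order-finding (my own sketch, not from the textbook):

```python
from math import gcd

def order(x, N):
    # multiplicative order of x mod N, by brute force --
    # this is exactly the step the quantum part computes efficiently
    r, y = 1, x % N
    while y != 1:
        y = y * x % N
        r += 1
    return r

def try_factor(N, x):
    # an even order r with x^(r/2) != -1 (mod N) yields a nontrivial
    # square root of 1, hence a factor of N
    g = gcd(x, N)
    if g > 1:
        return g  # lucky: x already shares a factor with N
    r = order(x, N)
    if r % 2 == 0:
        y = pow(x, r // 2, N)
        if y != N - 1:
            return gcd(y - 1, N)
    return None  # unlucky x; this happens with probability at most 1/2

print(try_factor(15, 7))  # → 3
```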

Modulo N, pick any x and consider the function f(k) = x^k \bmod N. If r is the order of x mod N then the function is periodic with period r. The algorithm runs as follows:

  1. First, make a superposition \sum_{j=0}^{N-1} |j, 0\rangle normalised. (Think of the |\cdot,\cdot\rangle as two quantum “registers”.) (This is made by starting with just |0,0\rangle and taking its Fourier transform!)
  2. Now “compute” f (for some random x) on it. This gives the superposition \sum_{j=0}^{N-1} |j, f(j)\rangle.
  3. Then measure the second register. This gives a random value from the second register, but more importantly, it make the first register be a superposition of only those values that are compatible with this value — therefore, periodic with period r.
  4. Now do the Fourier transform on the first and measure it, so we get a random multiple of N/r.

Doing all this the right number of times with the right constants does it. I’ve cheated in all this, because obviously the N we’re trying to factor is not a power of 2, but that can be taken care of.

I think Shor’s big idea (over Simon’s algorithm for period-finding) was in coming up with this function f whose period would give the factors of N. But I may be wrong.

Written by S

Thu, 2010-07-08 at 04:22:52 +05:30

Posted in mathematics

Equivalent forms of the Riemann hypothesis

with 3 comments

  1. Let H_n be the nth harmonic number, i.e. H_n = 1 + \frac12 + \frac13 + \dots + \frac1n. Then, the Riemann hypothesis is true if and only if

    \displaystyle \sum_{d | n}{d} \le H_n + \exp(H_n)\log(H_n) \quad \text{for all } n \ge 1

    The left-hand side, which is the sum of the divisors of n, is also denoted \sigma(n).
    See Jeffrey Lagarias, An Elementary Problem Equivalent to the Riemann Hypothesis [PDF, arXiv].

  2. Define the Redheffer matrix A = A(n) to be the n \times n 0-1 matrix with entries A_{ij} = 1 \text{ if } j=1 \text{ or } i \text{ divides } j (and 0 otherwise). Then the Riemann hypothesis is true if and only if \det(A) = O(n^{1/2+\epsilon}) for all \epsilon > 0.
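Both statements are easy to probe for small n; here is a check of the first one (a brute-force sketch, and of course no feasible computation could verify RH this way):

```python
from math import exp, log

def sigma(n):
    # sum of divisors of n (brute force)
    return sum(d for d in range(1, n + 1) if n % d == 0)

H = 0.0
violations = []
for n in range(1, 1001):
    H += 1.0 / n  # H is now the harmonic number H_n
    if sigma(n) > H + exp(H) * log(H) + 1e-9:
        violations.append(n)
print(violations)  # → []: a violation here would disprove RH!
```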

For more, see Equivalences to the Riemann hypothesis (by J. Brian Conrey and David W. Farmer), and Consequences of the Riemann hypothesis (MathOverflow).

For fun: claimed [dis/]proofs.

Quote found via rjlipton:

The Riemann Hypothesis is the most basic connection between addition and multiplication that there is, so I think of it in the simplest terms as something really basic that we don’t understand about the link between addition and multiplication.
Brian Conrey

Written by S

Tue, 2010-07-06 at 22:03:54 +05:30

Posted in mathematics

Convex, continuous, Jensen’s

with 3 comments

On the principle that things I’ve forgotten at least thrice are worth remembering…

A convex function (need a mnemonic?) is f such that
\displaystyle f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y) for any λ in [0,1].

Midpoint convex (almost) implies convex
A natural question is whether we need all λ in [0,1]; what about just ½? Functions which satisfy
\displaystyle f(\frac{x+y}{2}) \le \frac{f(x)+f(y)}{2}
are called midpoint convex.

It turns out that when things are “nice” as in most frequently encountered cases, a midpoint convex function is convex. (A counterexample is this crazy non-measurable nowhere-continuous bounded-in-no-interval function: since R is a vectorspace over Q, there is a basis (that includes π, say), and any real number x has a unique representation as a finite rational sum of basis elements. Define f(x) to be the coefficient of π in the representation of x. This function is in fact linear over Q.)
For midpoint convex to mean convex, any of the following conditions will do:

  • f is continuous [Proof: prove the inequality for λ with denominators that are powers of 2, then use denseness in R.]
  • f is continuous at some point (any one point will do) [Proof: see next section]
  • f is (Lebesgue) measurable
  • f is bounded on some measurable set with positive measure

Convex functions are continuous
In fact, a stronger theorem is true: If a midpoint convex function is discontinuous at a single point, then it is unbounded on every interval and (hence) discontinuous everywhere.
Proof: Suppose wlog that f is discontinuous at 0, that f(0)=0, and that a sequence x_n \rightarrow 0 has f(x_n) \rightarrow m for some positive m. Prove that \liminf f(2^k x_n) \ge 2^k m, so f is unbounded near 0. Then use this to show unbounded everywhere.

Note that convex functions are bounded on every interval, so by above theorem they are continuous.

Convex functions are differentiable almost everywhere
In fact a convex function has left and right derivatives at all points, these are monotonically increasing, and there are only countably many points where they don’t coincide.

All the above is from pages 10 to 16 of this book I had found through googling; if anyone’s reading this, please tell me if you know better proofs.

A proof of Jensen’s inequality
Here, Jensen’s inequality is the statement that for a convex function f,
\displaystyle \mathbb{E}\left[f(X)\right] \geq f\left(\mathbb{E}\left[X\right]\right) .

An equivalent definition of a convex function is that at any point we can find a line through the point that lies entirely below the function. (Why?) In other words, for any point a, there is a “slope” m such that
\displaystyle f(x) \ge f(a) + m(x-a) for all x.
With this definition, Jensen’s inequality has an absurdly short proof: take a to be the mean \mathbb{E}\left[X\right], then take expectation.
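The inequality is also easy to see empirically (a quick sketch; for f(x) = x^2 it reduces to Var(X) >= 0, so it holds for every sample exactly):

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10000)]

def f(t):
    return t * t  # a convex function

mean = sum(xs) / len(xs)
lhs = sum(f(x) for x in xs) / len(xs)   # E[f(X)]
rhs = f(mean)                           # f(E[X])
print(lhs >= rhs)  # → True
```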

Written by S

Tue, 2010-06-29 at 02:32:05 +05:30

Posted in mathematics


leave a comment »

What’s this?


Seeing more should help:

If that’s too easy, how about this?

Statement and proof

Both are from Rekhāgaṇita, which is a c. 1720s translation by Jagannatha of Nasir al-Din al-Tusi’s 13th-century Arabic translation of Euclid’s Elements. It seems to be a straightforward translation of the Arabic — it even uses, to label vertices etc., letters in the Arabic order अ ब ज द ह व झ…. The text retains most of the structure and proposition numbers of Euclid, but in fact the Arabic world has considerably elaborated on Euclid. For instance, for the famous first example above, it gives sixteen additional proofs/demonstrations, which are not in the Greek texts.

The first volume (Books I to VI) is available on Google Books, and the second volume (Books VII to XV) through this link (25 MB PDF).

Notes on the second: some technical vocabulary — a प्रथमाङ्कः is a prime number, and a रूप is a unit (one). The rest of the vocabulary, like “निःशेषं करोति” meaning “divides (without remainder)”, is more or less clear, and also the fact that as in Euclid, numbers are being conceived of as lengths (so दझ and झद mean the same).

It does sound cooler to say “इदमशुद्धम्” than “But this is a contradiction”. :-)

Written by S

Wed, 2010-05-12 at 21:20:19 +05:30

Posted in mathematics, sanskrit

Tagged with ,

