The Lumber Room

"Consign them to dust and damp by way of preserving them"

Archive for the ‘mathematics’ Category

The functional equation f(x+y) = f(x)f(y)

with one comment

Suppose f: \mathbb{R} \to \mathbb{R} satisfies f(x+y) = f(x) f(y). What can we say about f?

Putting y = 0 gives

\displaystyle f(x) = f(x+0) = f(x)f(0),

which can happen if either f(x) = 0 or f(0) = 1. Note that the function f which is identically zero satisfies the functional equation. If f is not this function, i.e., if f(x) \neq 0 for at least one value of x, then plugging that value of x (say x^*) into the equation gives f(0) = 1. Also, for any x, the equation f(x^*) = f(x +x^* - x) = f(x)f(x^* - x) forces f(x) \neq 0 as well. Further, f(x) = f(x/2 + x/2) = f(x/2)^2 so f(x) > 0 for all x.

Next, putting y = x gives f(2x) = f(x)^2, and by induction f(nx) = f(x)^n. Putting \frac{x}{n} in place of x in this gives f(n\frac{x}{n}) = f(\frac{x}{n})^n which means f(\frac{x}{n}) = f(x)^{\frac1n} (note we’re using f(x) > 0 here). And again, f(\frac{m}{n}x) = f(x)^{m/n}. So f(\frac{m}{n}) = f(1)^{m/n}, which completely defines the function at rational points.

[As f(1) > 0, it can be written as f(1) = e^k for some constant k, which gives f(x) = e^{kx} for rational x.]

To extend this function to irrational numbers, we need some further assumptions on f, such as continuity. It turns out that being continuous at any point is enough (and implies the function is f(x) = f(1)^x everywhere): note that f(x + m/n) = f(x)f(m/n) = f(x)f(1)^{m/n}. Even being Lebesgue-integrable/measurable will do.

Else, there are discontinuous functions satisfying the functional equation. (Basically, we can define the value of the function separately on each “independent” part. That is, define the equivalence class where x and y are related if y = r_1x + r_2 for rationals r_1 and r_2, pick a representative for each class using the axiom of choice (this is something like picking a basis for \mathbb{R}/\mathbb{Q}, which corresponds to the equivalence class defined by the relation y = r_1x), define the value of the function independently for each representative, and this fixes the value of f on \mathbb{R}. See this article for more details.)

To step back a bit: what the functional equation says is that f is a homorphism from (\mathbb{R}, +), the additive group of real numbers, to (\mathbb{R}, \times), the multiplicative monoid of real numbers. If f is not the trivial identically-zero function, then (as we saw above) f is in fact a homomorphism from (\mathbb{R}, +), the additive group of real numbers, to (\mathbb{R_+^*}, \times), the multiplicative group of positive real numbers. What we proved is that the exponential functions e^{kx} are precisely all such functions that are nice (nice here meaning either measurable or continuous at least one point). (Note that this set includes the trivial homomorphism corresponding to k = 0: the function f(x) = 1 identically everywhere. If f is not this trivial map, then it is in fact an isomorphism.)

Written by S

Mon, 2013-04-08 at 11:24:08 +05:30

Posted in mathematics

Trajectory of a point moving with acceleration perpendicular to velocity

with 7 comments

(Just some basic high-school physics stuff; to assure myself I can still do some elementary things. :P Essentially, showing that if a particle moves with acceleration perpendicular to velocity, or velocity perpendicular to position, then it traces out a circle. Stop reading here if this is obvious.)

Suppose a point moves in the plane such that its acceleration is always perpendicular to its velocity, and of the same magnitude. What is its path like?

To set up notation: let’s say the point’s position at time t is (p_x(t), p_y(t)), its velocity is (v_x(t), v_y(t)) = \left(\frac{d}{dt}p_x(t), \frac{d}{dt}p_y(t)\right), and its acceleration is (a_x(t), a_y(t)) = \left(\frac{d}{dt}v_x(t), \frac{d}{dt}v_y(t)\right).

The result of rotating a point (x,y) by 90° is (-y, x). (E.g. see figure below)

rotate90

So the fact that acceleration is at right angles to velocity means that (a_x(t), a_y(t)) = (-v_y(t), v_x(t)), or, to write everything in terms of the velocity,

\begin{aligned} \frac{d}{dt}v_x(t) &= -v_y(t) \\  \frac{d}{dt}v_y(t) &= v_x(t) \end{aligned}

where we can get rid of v_x(t) by substituting the second equation (in the form v_x(t) = \frac{d}{dt}v_y(t)) into the first:

v_y(t) = -\frac{d}{dt}v_x(t) = -\frac{d}{dt}\left(\frac{d}{dt}v_y(t)\right)

or in other words

v_y(t) = -\frac{d^2}{dt^2}v_y(t).

By some theory about ordinary differential equations, which I don’t know (please help!) (but see the very related example you saw in high school, of simple harmonic motion), the solutions to this equation are \sin(t) and \cos(t) and any linear combination of those: the solution in general is

\begin{aligned}  v_y(t) &= a \sin(t) + b \cos(t) \\  &= \sqrt{a^2 + b^2} \left(\frac{a}{\sqrt{a^2+b^2}}\sin(t) + \frac{b}{\sqrt{a^2+b^2}}\cos(t)\right) \\  &= R\sin (t + \alpha)  \end{aligned}

where R = \sqrt{a^2 + b^2} and \alpha is the angle such that \cos(\alpha) = \frac{a}{\sqrt{a^2+b^2}} and \sin(\alpha) = \frac{b}{\sqrt{a^2+b^2}}. And the fact that v_x(t) = \frac{d}{dt}v_y(t) gives v_x(t) = R\cos(t + \alpha). So (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)). Note that (a_x(t), a_y(t)) = \left(\frac{d}{dt}v_x(t), \frac{d}{dt}v_y(t)\right) = (-R\sin(t+\alpha), R\cos(t+\alpha)) is indeed perpendicular to (v_x(t), v_y(t)) as we wanted.

The actual trajectory (p_x(t), p_y(t)) can be got by integrating

\left(\frac{d}{dt}p_x(t), \frac{d}{dt}p_y(t)\right)  = (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha))

to get p_x(t) = R\sin(t + \alpha) + c_1 and p_y(t) = -R\cos(t + \alpha) + c_2. This trajectory is a point moving on a circle centered at point (c_1, c_2) and of radius R, with speed R or unit angular speed. Note that velocity is also perpendicular to the point’s position wrt the centre of the circle, as velocity is tangential to the circle, as it should be.

With a suitable change of coordinates (translate the origin to (c_1, c_2), then rotate the axes by \frac{\pi}{2}+\alpha, then scale everything so that R = 1), this is the familiar paremetrization (\cos(t), \sin(t)) of the circle.


Note: Just as we derived (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)) from assuming that the acceleration is perpendicular to velocity, we can, by assuming that velocity is perpendicular to position, identically derive (p_x(t), p_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)), i.e. that the point moves on a circle.

Written by S

Sun, 2013-04-07 at 23:38:01 +05:30

Posted in mathematics

The power series for sin and cos

leave a comment »

There are many ways to derive the power series of \sin x and \cos x using the machinery of Taylor series etc., but below is another elementary way of demonstrating that the well-known power series expansions are the right ones. The argument below is from Tristan Needham’s Visual Complex Analysis, which I’m reproducing without looking at the book just to convince myself that I’ve internalized it correctly.

So: let
\displaystyle  \begin{aligned}   P(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \quad \text{ and }\\  Q(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots .  \end{aligned}

We will take the following two for granted (both can be proved with some effort):

  1. Both power series are convergent.
  2. The power series can be differentiated term-wise.

As suggested by (2) above, the first thing we observe is that \frac{d}{dx}P(x) = Q(x) and \frac{d}{dx}Q(x) = -P(x).


So firstly:
\begin{aligned}  \frac{d}{dx}(P(x)^2 + Q(x)^2)   &= 2P(x)P'(x) + 2Q(x)Q'(x) \\  &= 2P(x)Q(x) - 2Q(x)P(x) \\  &= 0  \end{aligned}
which means that P(x)^2 + Q(x)^2 is a constant and does not vary with x. Putting x = 0 shows that P(0) = 0 and Q(0) = 1, so P(x)^2 + Q(x)^2 = 1 for all x.


Secondly, define the angle \theta as a function of x, by \tan \theta(x) = P(x)/Q(x). (To be precise, this defines \theta(x) up to a multiple of \pi, i.e. modulo \pi.)
Differentiating the left-hand side of this definition gives
\displaystyle \begin{aligned}  \frac{d}{dx} \tan \theta(x)   &= (1 + \tan^2 \theta(x)) \theta'(x) \\  &= (1 + \frac{P(x)^2}{Q(x)^2}) \theta'(x) \\  &= \frac{1}{Q(x)^2} \theta'(x)   \end{aligned}
(where \theta'(x) means \frac{d}{dx} \theta(x))
while differentiating the right-hand side gives
\displaystyle \begin{aligned}  \frac{d}{dx} \frac{P(x)}{Q(x)}   &= \frac{Q(x)P'(x) - P(x)Q'(x)}{Q(x)^2} \\  &= \frac{Q(x)Q(x) + P(x)P(x)}{Q(x)^2} \\  &= \frac{1}{Q(x)^2}  \end{aligned}

The necessary equality of the two tells us that \frac{d}{dx}\theta(x) = 1, which along with the initial condition \tan \theta(0) = P(0)/Q(0) = 0 = \tan 0 that says \theta(0) \equiv 0 \mod \pi, gives \theta(x) = x (or to be precise, \theta(x) \equiv x \pmod {\pi}).


In other words, we have shown that the power series P(x) and Q(x) satisfy \frac{P(x)}{Q(x)} = \tan x = \frac{\sin x}{\cos x} and therefore P(x) = k \sin x and Q(x) = k \cos x for some k. The observation that Q(0) = 1 = \cos 0 (or our earlier observation that P(x)^2 + Q(x)^2 = 1 for all x) gives k = 1, thereby showing that P(x) = \sin x and Q(x) = \cos x.


So much for \sin x and \cos x. Just as an aside, observe that if we take i to be a symbol satisfying i^2 = -1, then
\displaystyle \begin{aligned}  \cos x + i\sin x   &= Q(x) + iP(x) \\  &= \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \right) \\  &= 1 + ix + \frac{-x^2}{2!} + \frac{-ix^3}{3!} + \frac{x^4}{4!} + \frac{ix^5}{5!} + \frac{-x^6}{6!} + \frac{-ix^7}{7!} + \dots \\  &= 1 + ix + \frac{(ix)^2}{2!} + \frac{(ix)^3}{3!} + \frac{(ix)^4}{4!} + \frac{(ix)^5}{5!} + \frac{(ix)^6}{6!} + \frac{(ix)^7}{7!} + \dots  \end{aligned}
the right hand side of which looks very much like the result of “substituting” y = ix in the known (real) power series
\displaystyle e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \dots
(which itself can be proved using the term-wise differentiation above and the defining property \frac{d}{dx} e^x = e^x, say).

So this is one heuristic justification for us to define e^{ix} = \cos x + i\sin x.
Or, if we define e^{ix} as the result of substituting ix in the real power series for e^y, this proves that e^{ix} = \cos x + i\sin x.

Written by S

Fri, 2013-03-08 at 00:33:12 +05:30

Posted in mathematics

The potions puzzle by Snape in Harry Potter and the Philosopher’s Stone

with one comment

Sometime this week, I reread the first Harry Potter book (after at least 10 years… wow, has it been that long?), just for contrast after reading Rowling’s adult novel The Casual Vacancy (on which more later). Anyway, in the penultimate chapter there is a puzzle:

He pulled open the next door, both of them hardly daring to look at what came next — but there was nothing very frightening in here, just a table with seven differently shaped bottles standing on it in a line.
“Snape’s,” said Harry. “What do we have to do?”
They stepped over the threshold, and immediately a fire sprang up behind them in the doorway. It wasn’t ordinary fire either; it was purple. At the same instant, black flames shot up in the doorway leading onward. They were trapped.
“Look!” Hermione seized a roll of paper lying next to the bottles. Harry looked over her shoulder to read it:

Danger lies before you, while safety lies behind,
Two of us will help you, whichever you would find,
One among us seven will let you move ahead,
Another will transport the drinker back instead,
Two among our number hold only nettle wine,
Three of us are killers, waiting hidden in line.
Choose, unless you wish to stay here forevermore,
To help you in your choice, we give you these clues four:
First, however slyly the poison tries to hide
You will always find some on nettle wine’s left side;
Second, different are those who stand at either end,
But if you would move onward, neither is your friend;
Third, as you see clearly, all are different size,
Neither dwarf nor giant holds death in their insides;
Fourth, the second left and the second on the right
Are twins once you taste them, though different at first sight.

I became curious about whether this is just a ditty Rowling made up, or the puzzle actually makes sense and is consistent. It turns out she has constructed it well. Let’s take a look. This investigation can be carried out by hand, but we’ll be lazy and use a computer, specifically Python. The code examples below are all to be typed in an interactive Python shell, the one that you get by typing “python” in a terminal.

So what we have are seven bottles, of which one will take you forward (F), one will let you go back (B), two are just nettle wine (N), and three are poison (P).

>>> bottles = ['F', 'B', 'N', 'N', 'P', 'P', 'P']

The actual ordering of these 7 bottles is some ordering (permutation) of them. As 7 is a very small number, we can afford to be inefficient and resort to brute-force enumeration.

>>> import itertools
>>> perms = [''.join(s) for s in set(itertools.permutations(bottles))]
>>> len(perms)
420

The set is needed to remove duplicates, because otherwise itertools.permutations will print 7! “permutations”. So already the number of all possible orderings is rather small (it is \frac{7!}{2!3!} = 420). We can look at a sample to check whether things look fine.

>>> perms[:10]
['PNFNPBP', 'NPPBNFP', 'FNNPBPP', 'PPPFNBN', 'NPPNBFP', 'PFNNBPP', 'NPBPPFN', 'NBNPPFP', 'NPPFBNP', 'BNPFNPP']

Now let us try to solve the puzzle. We can start with the first clue, which says that wherever a nettle-wine bottle occurs, on its left is always a poison bottle (and in particular therefore, a nettle-wine bottle cannot be in the leftmost position). So we must restrict the orderings to just those that satisfy this condition.

>>> def clue1(s): return all(i > 0 and s[i-1] == 'P' for i in range(len(s)) if s[i]=='N')
...
>>> len([s for s in perms if clue1(s)])
60

(In the code, the 7 positions are 0 to 6, as array indices in code generally start at 0.)
Then the second clue says that the bottles at the end are different, and neither contains the potion that lets you go forward.

>>> def clue2(s): return s[0] != s[6] and s[0] != 'F' and s[6] != 'F'
...
>>> len([s for s in perms if clue1(s) and clue2(s)])
30

The third clue says that the smallest and largest bottles don’t contain poison, and this would be of help to Harry and Hermione who can see the sizes of the bottles. But as we readers are not told the sizes of the bottles, this doesn’t seem of any help to us; let us return to this later.

The fourth clue says that the second-left and second-right bottles have the same contents.

>>> def clue4(s): return s[1] == s[5]
...
>>> len([s for s in perms if clue1(s) and clue2(s) and clue4(s)])
8

There are now just 8 possibilities, finally small enough to print them all.

>>> [s for s in perms if clue1(s) and clue2(s) and clue4(s)]
['PPNBFPN', 'BPNPFPN', 'BPFPNPN', 'BPPNFPN', 'PNPFPNB', 'BPNFPPN', 'PPNFBPN', 'PNFPPNB']

Alas, without knowing which the “dwarf” and “giant” bottles are, we cannot use the third clue, and this seems as far as we can go. We seem to have exhausted all the information available…

Almost. It is reasonable to assume that the puzzle is meant to have a solution. So even without knowing where exactly the “dwarf” and “giant” bottles are, we can say that they are in some pair of locations that ensure a unique solution.

>>> def clue3(d, g, s): return s[d]!='P' and s[g]!='P'
...
>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(poss) == 1:
...       print d, g, poss[0]
... 
1 2 PNFPPNB
1 3 PNPFPNB
2 1 PNFPPNB
2 5 PNFPPNB
3 1 PNPFPNB
3 5 PNPFPNB
5 2 PNFPPNB
5 3 PNPFPNB

Aha! If you look at the possible orderings closely, you will see that we are down to just two possibilities for the ordering of the bottles.
Actually there is some scope for quibbling in what we did above: perhaps we cannot say that there is a unique solution determining the entire configuration; perhaps all we can say is that the puzzle should let us uniquely determine the positions of just the two useful bottles. Fortunately, that gives exactly the same set of possibilities, so this distinction happens to be inconsequential.

>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [(s.index('F'),s.index('B')) for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(set(poss)) == 1:
...       print d, g, [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)][0]
... 
1 2 PNFPPNB
1 3 PNPFPNB
2 1 PNFPPNB
2 5 PNFPPNB
3 1 PNPFPNB
3 5 PNPFPNB
5 2 PNFPPNB
5 3 PNPFPNB

Good. Note that there are only two configurations above. So with only the clues in the poem, and the assumption that the puzzle can be solved, we can narrow down the possibilities to two configurations, and be sure of the contents of all the bottles except the third and fourth. We know that the potion that lets us go forward is in either the third or the fourth bottle.

In particular we see that the last bottle lets us go back, and indeed this is confirmed by the story later:

“Which one will get you back through the purple flames?”
Hermione pointed at a rounded bottle at the right end of the line.
[…]
She took a long drink from the round bottle at the end, and shuddered.

But we don’t know which of the two it is, as we can’t reconstruct all the relevant details of the configuration. Perhaps we can reconstruct something with the remaining piece of information from the story?

“Got it,” she said. “The smallest bottle will get us through the black fire — toward the Stone.”
Harry looked at the tiny bottle.
[…]
Harry took a deep breath and picked up the smallest bottle.

So we know that the bottle that lets one move forward is in fact in the smallest one, the “dwarf”.

>>> for d in range(7):
...   for g in range(7):
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(poss) == 1 and poss[0][d] == 'F':
...       print d, g, poss[0]
... 
2 1 PNFPPNB
2 5 PNFPPNB
3 1 PNPFPNB
3 5 PNPFPNB

This narrows the possible positions of the smallest and largest bottles (note that it says the largest bottle is one that contains nettle wine), but still leaves the same two possibilities for the complete configuration. So we can stop here.

Conclusion
What we can conclude is the following: apart from the clues mentioned in the poem, the “dwarf” (the smallest bottle) was in either position 2 (third from the left) or 3 (fourth from the left). The biggest bottle was in either position 1 (second from the left) or 5 (sixth from the left). With this information about the location of the smallest bottle (and without necessarily assuming the puzzle has a unique solution!), Hermione could determine the contents of all the bottles. In particular she could determine the location of the two useful bottles: namely that the bottle that lets you go back was the last one, and that the one that lets you go forward was the smallest bottle.

>>> for (d,g) in [(2,1), (2,5), (3,1), (3,5)]:
...   poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...   assert len(poss) == 1
...   s = poss[0]
...   assert s.index('B') == 6
...   assert s.index('F') == d
...   print (d,g), s
... 
(2, 1) PNFPPNB
(2, 5) PNFPPNB
(3, 1) PNPFPNB
(3, 5) PNPFPNB

It is not clear why she went to the effort to create a meaningful puzzle, then withheld details that would let the reader solve it fully. Perhaps some details were removed during editing. As far as making up stuff for the sake of a story goes, though, this is nothing; consider for instance the language created for Avatar which viewers have learned.

Added:
See also http://www.zhasea.com/logic/snape.html which does it by hand, and has a perfectly clear exposition (it doesn’t try the trick of guessing that solution is unique before reaching for the additional information from the story).

Written by S

Mon, 2012-10-22 at 02:00:12 +05:30

Are there Fibonacci numbers starting with 2012? (continued)

leave a comment »

Almost 8 months ago I left the first part of this post unfinished planning to complete it in the morning; seems I slept too long. (Or as this guy said after a 2-year hiatus: “Sorry for the pause guys. I was in the bathroom.”)

Recall: To get a power of 2 that starts with a prefix p (like p = 2012), we want n such that the fractional part of n\log 2 lies between those of \log p and \log(p+1) (all logarithms here to base 10), and similarly to get a Fibonacci number starting with p, we want (with some hand-waving) n such that the fractional part of n\log\phi lies between those of \log(p) + \log{\sqrt{5}} and \log(p+1) + \log{\sqrt{5}}. The more general problem is:

Problem: Given an irrational number \theta and an interval (a,b) \subset (0,1), find n such that \mathrm{frac}(n\theta) lies in the interval (a,b).

Here is one method, based on Edward Burger’s book Exploring the Number Jungle. Let \alpha be the midpoint of the interval (a,b). Then we are trying to find n such that \mathrm{frac}(n\theta) is close to \alpha.

  • Find a fraction \displaystyle \frac{p}{q} approximating \theta, such that \displaystyle |q\theta - p| < \frac1q. (These are the convergents of the continued fraction of \theta, but in practice it seems you can also get away with taking semi-convergents that may not satisfy this property.)
  • Let N be the closest integer to q\alpha. Note that this automatically means \displaystyle |q\alpha - N| \le \frac12
  • Write \displaystyle N = yp - xq with \displaystyle |y| \le \frac{q}{2}. This you can do quite easily with the Euclidean algorithm.
  • Then for n = q + y and k = p + x, we have (it is a simple exercise to prove this)
    \displaystyle |\theta n - k - \alpha| < \frac{3}{n}
  • This means that the distance between n\theta and \alpha is small, modulo 1. If this distance turns out to be still too large, start with a bigger convergent \frac{p}{q}.

I think I had some code to post as well (hey, what’s the actual Fibonacci number that starts with 2012?), but it needs to be cleaned up… probably this works (in Sage).

The thing we’re doing here is called inhomogeneous Diophantine approximation.

[Originally posted on math.stackexchange, here and here.]

Written by S

Fri, 2012-08-03 at 01:56:18 +05:30

Posted in mathematics

Are there Fibonacci numbers starting with 2012?

with one comment

Do there exist powers of 2 whose first four digits are 2012?
Are there Fibonacci numbers whose first four digits are 2012?

If the answer is obvious to you (or you don’t care), you can stop reading.

The answer for both is:

  • Yes.
  • There are infinitely many such numbers.
  • In fact, the fraction of (powers of 2 / Fibonacci numbers) starting with 2012 is exactly

    \displaystyle \frac{\log 2013 - \log 2012}{\log 10} \approx \frac1{4644} \approx 0.000216

Similarly with any other prefix p (of any length) in place of 2012. Proof follows.


A number x starts with a prefix p if and only if for some k ≥ 0,

\displaystyle p \cdot 10^k \le x < (p+1) \cdot 10^k

Thus a power of 2, say 2n, starts with p iff

\displaystyle p \cdot 10^k \le 2^n < (p+1) \cdot 10^k for some k \ge 0

Taking logarithms to base 10 and simplifying, this is equivalent to

\displaystyle \log p < n\log2 - k < \log(p+1) for some k \ge 0

This is saying that the fractional part of n\log 2 lies between the fractional parts of \log p and \log (p+1). For example, if p = 2012, this means that the fractional part of n\log 2 lies between \log 2.012 \approx 0.303628 and \log 2.013 \approx 0.303844.


Similarly, for Fibonacci numbers, as is (or should be) well-known, the nth Fibonacci number Fn is the closest integer to \displaystyle \frac{\phi^n}{\sqrt5}, where \displaystyle \phi = \frac{1+\sqrt5}2 is the golden ratio. So F_n starts with p iff

\displaystyle p \cdot 10^k - \frac12 \quad < \quad \frac{\phi^n}{\sqrt5} \quad < \quad (p+1) \cdot 10^k - \frac12

Taking logarithms to base 10 and simplifying, while ignoring the annoying \frac12 which becomes irrelevant in the limit (this line is not rigorous), this is equivalent to

\displaystyle \log(p) + \log\sqrt5 \quad < \quad n\log\phi - k \quad < \quad \log(p+1) + \log\sqrt5

which means that the fractional part of n\log \phi lies between the fractional parts of \log p +  \log\sqrt5 and \log (p+1) + \log\sqrt5. For p = 2012, this means that the fractional part of n\log \phi lies between \log 2.012 + \log\sqrt5 \approx 0.653113 and \log 2.013 + \log\sqrt5 \approx 0.653329.

In either case, we are trying to make the fractional part of n\theta, for some irrational number \theta, lie in some interval. The relevant fact is this:
Theorem 1: for any irrational number \theta, the sequence \mathrm{frac}(n\theta) (where \mathrm{frac}(x) denotes the fractional part of x) is dense in [0, 1).
or, in other words,
Theorem 1: For any irrational number \theta, the sequence n\theta is dense modulo 1.
Proving this theorem is a good exercise.

This means that for any interval you want, you can always find some n such that the fractional part of n\theta lies in your interval. In fact, because the sequence is dense, you can find an infinite sequence of n such that the fractional parts of n\theta converge to the midpoint (say) of the desired interval. This proves the first two facts of the answer, and for the third we need a stronger theorem:

Theorem 2 [Equidistribution theorem]: For any irrational number \theta, the numbers n\theta are uniformly distributed modulo 1.

This means that for any interval I \subset [0,1] of size s (say), the fraction of integers n for which \mathrm{frac}(n\theta) lies in the interval satisfies

\displaystyle \lim_{N \to \infty} \frac{ \left|\left\{ 1 \le n \le N : \mathrm{frac}(n\theta) \in I\right\}\right|}{N} = s

This proves the third fact. The fraction of Fibonacci numbers (or of powers of a number that is not a power of 10) that start with a prefix p \ge 1 is exactly \log(p+1) - \log(p) where log is to base 10.


That much is standard. And non-constructive. We are assured of the existence of such numbers, but how do we actually find one?

The answer (or one answer), as it so often does in these problems, involves continued fractions. Here is one method, [to be continued when I wake up :p]
Edit: Continued here.

Written by S

Sun, 2012-01-08 at 01:29:26 +05:30

Posted in mathematics, unfinished

(1+ix/n)^n

leave a comment »

[Posting some images here for possible future reuse.]

\displaystyle \lim_{n\to\infty}\left(1 + \frac{ix}{n}\right)^n = \cos x + i \sin x

A non-rigorous argument: when n is large enough so that x/n is small, (1 + ix/n) is roughly (hand-waving) the point on the unit circle at arc length (and hence angle) x/n:

So multiplication by (1+ix/n) roughly corresponds to rotation by angle x/n. Multiplication by (1+ix/n)^n, which is multiplication by (1+ix/n) n times, roughly corresponds to rotation by angle n(x/n) = x. As n \to \infty, “roughly” becomes exact.

Animation for x = 1:

Image generated from Python-generated SVG files; code available if anyone wants.

In particular, once one accepts the fact/definition that \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x (for instance, show that the function f(x) = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n satisfies f(x+y) = f(x)f(y)), it is true that e^{i\pi} is a rotation by π, that is,

\displaystyle e^{i\pi} = -1

Written by S

Tue, 2011-06-21 at 18:04:25 +05:30

Posted in mathematics

Serieshelpmate in 19

with 2 comments

Here’s a brilliant problem.

Consider the following chess position.

Black is to make 19 consecutive moves, after which White checkmates Black in one move. Black may not move into check, and may not check White (except possibly on his last move). Black and White are cooperating to achieve the aim of checkmate. (In chess problem parlance, this problem is called a serieshelpmate in 19.) How many different solutions are there?

This problem is due to Kauko Väisänen, and appears in A. Puusa, Queue Problems, Finnish Chess Problem Society, Helsinki, 1992 (Problem 2).

Hint: the above is quoted from Richard Stanley’s Enumerative Combinatorics.

Written by S

Sun, 2011-05-29 at 15:30:25 +05:30

Posted in mathematics

How does Tupper’s self-referential formula work?

with 15 comments

[I write this post with a certain degree of embarrassment, because in the end it turns out (1) to be more simple than I anticipated, and (2) already done before, as I could have found if I had internet access when I did this. :-)]

The so-called “Tupper’s self-referential formula” is the following, due to Jeff Tupper.

Graph the set of all points {(x,y)} such that

\displaystyle  \frac12 < \left\lfloor \mathrm{mod} \left( \left\lfloor{\frac{y}{17}}\right\rfloor 2^{-17\lfloor x \rfloor - \mathrm{mod}(\lfloor y \rfloor, 17)}, 2 \right) \right\rfloor

in the region

\displaystyle  0 < x < 106

\displaystyle  N < y < N+17

where N is the following 544-digit integer:
48584506361897134235820959624942020445814005879832445494830930850619
34704708809928450644769865524364849997247024915119110411605739177407
85691975432657185544205721044573588368182982375413963433822519945219
16512843483329051311931999535024137587652392648746133949068701305622
95813219481113685339535565290850023875092856892694555974281546386510
73004910672305893358605254409666435126534936364395712556569593681518
43348576052669401612512669514215505395545191537854575257565907405401
57929001765967965480064427829131488548259914721248506352686630476300

The result is the following graph:

Figure 1: The graph of the formula, in some obscure region, is a picture of the formula itself.

Whoa. How does this work?

At first sight this is rather too incredible for words.

But after a few moments we can begin to guess what is going on, and see that—while clever—this is perhaps not so extraordinary after all. So let us calmly try to reverse-engineer this feat.

Read the rest of this entry »

Written by S

Tue, 2011-04-12 at 13:05:20 +05:30

Posted in mathematics

Tagged with , , ,

“Your monkey did not jump high enough”

with 4 comments

Yesterday, in Futility Closet there was a post:

In Longfellow’s novel Kavanagh, Mr. Churchill reads a word problem to his wife:

“In a lake the bud of a water-lily was observed, one span above the water, and when moved by the gentle breeze, it sunk in the water at two cubits’ distance. Required the depth of the water.”

“That is charming, but must be very difficult,” she says. “I could not answer it.”

Is it? If a span is 9 inches and a cubit is 18 inches, how deep is the water?


The problem is simple enough: if the depth of the water is x inches so that the lotus from bottom to tip is x+9 inches, then x2+362=(x+9)2, which means x=(362-92)/18=135/2=67.5.

More interestingly, as I accidentally recognised (I don’t know how), it is from the Sanskrit mathematics text Lilavati (and also found in the Bījagaṇita) of Bhaskaracharya (Bhaskara II). That entire chapter of Kavanagh is essentially quoting the Lilavati (Kavanagh is written in a somewhat embarrassing tone that perhaps explains why it’s so obscure :p); it’s included later below the horizontal line in this post.

Bhaskaracharya, believed to have lived in the 12th century, is considered the last great Indian mathematician, outside of the Kerala school. Like most Sanskrit texts, the Līlāvati is written in verse, so as to be easier to memorise. Unlike many Sanskrit technical works (or for that matter technical works in any language), however, Bhāskara’s works are not written in the typical dry style, and can veer quite poetic at times. His description of the seasons in one of his astronomical works is one of the few true instances of poetry in the Sanskrit astronomical/mathematical corpus. This particular problem, it happens, is written in the beautiful mandākrānta metre: (If it helps: mandakranta is the metre of the Meghadūta, of “शान्ताकारं भुजगशयनं…”, of “नास्था धर्मे न वसुनिचये…”, etc., and you can listen to a recitation in the Marathi tradition by Ashwini Deo.)

चक्रक्रौञ्चाकुलितसलिले क्वापि दृष्टं तडागे
तोयादूर्ध्वं कमलकलिकाग्रं वितस्तिप्रमाणम्
मन्दं मन्दं चलितमनिलेनाऽऽहतं हस्तयुग्मे
तस्मिन्मग्नं गणक कथय क्षिप्रमम्बुप्रमाणम्

cakra-krauñcākulita-salile kvāpi dṛṣṭaṃ taḍāge
toyād ūrdhvaṃ kamala-kalikāgraṃ vitasti-pramāṇam
mandaṃ mandaṃ calitam anilenāhataṃ hasta-yugme
tasmin magnaṃ gaṇaka kathaya kṣipram ambu-pramāṇam

In a certain lake swarming with geese and cranes,
the tip of a bud of lotus was seen one span above the water.
Forced by the wind, it gradually moved, and was submerged at a distance of two cubits.
O mathematician, tell quickly the depth of the water.


Well, that’s my translation, close to Longfellow’s quoted translation by Taylor and to Colebrooke’s better translation, but I may be wrong, so details for anyone who cares to improve it:

In a certain [kvāpi] pool [taḍāge] whose water [salile] was swarming [ākulita] with ruddy geese [cakra] and curlews [krauñcā],
above the water [toyād ūrdhvaṃ] a lotus-bud-tip [kamala-kalikāgraṃ] at a distance of one span [vitasti-pramāṇam] was seen [dṛṣṭaṃ].
Slowly slowly [mandaṃ mandaṃ] by the wind [anilena] moved [calitam] and forced [āhataṃ],
at a distance of two cubits [hasta-yugme] it got submerged [magnaṃ] in the water [tasmin].
O mathematician [gaṇaka], say [kathaya] quickly [kṣipram] the depth of the water [ambu-pramāṇam].

The structure of the book may be worth remarking on: the general formula for exactly this problem is given first (in more technical terms), and then this problem is given as an example!

Glancing through Longfellow, one finds he’s also written a tiny poem called King Trisanku:

Viswamitra the Magician,
By his spells and incantations,
Up to Indra’s realms elysian
Raised Trisanku, king of nations.

Indra and the gods offended
Hurled him downward, and descending
In the air he hung suspended,
With these equal powers contending.

Thus by aspirations lifted,
By misgivings downward driven,
Human hearts are tossed and drifted
Midway between earth and heaven.

Ho hum (1845 America).

The chapter of Kavanagh below this line.


Read the rest of this entry »

Written by S

Sat, 2011-01-29 at 04:06:14 +05:30

Posted in mathematics, sanskrit

Not all best rational approximations are the convergents of the continued fraction!

with 15 comments

[For more background on continued fractions and why they are so wonderful at approximations (and wonderful generally) — eventually I may edit this post to mention that. For now I just want to quickly clarify something, which surprisingly many popular expositions of continued fractions seem to mislead by leaving out.]

Any real number can be written as a continued fraction. For instance, the square root of 3 is
\sqrt{3} = 1+\cfrac{1}{1+\cfrac{1}{2+\cfrac{1}{1+\cfrac{1}{2+\dots}}}}
written as \sqrt{3} = [1; 1, 2, 1, 2, \dots],

and π is

\pi = 3+\cfrac{1}{7+\cfrac{1}{15+\cfrac{1}{1+\cfrac{1}{292+\dots}}}}

i.e. \pi = [3; 7, 15, 1, 292, \dots] (more terms below),

and e is e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1,\dots] (proved here).

Rational numbers have finite continued fractions, and irrational numbers have infinite continued fractions, of which only quadratic irrationals (like √3 above) have periodic continued fractions. Non-quadratic irrationals may still have some recognisable patterns in their representation, like e above, or not, like π.

The numbers you get by truncating the continued fraction — taking only the first few terms — are called its convergents. The continued fraction for π is [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2…] (sequence A001203 in OEIS), and the sequence of convergents is 3/1, 22/7, 333/106, 355/113, 103993/33102… (A002485/A002486):

\begin{array}{cll} p_1/q_1 &= [3;] &= 3/1 \\ p_2/q_2 &= [3; 7] &= 22/7 \\  p_3/q_3 &= [3; 7, 15] &= 333/106 \\  p_4/q_4 &= [3; 7, 15, 1] &= 355/113 \\ p_5/q_5 &= [3; 7, 15, 1, 292] &= 103993/33102\end{array}

Each of these convergents is a best rational approximation, in the following sense.

Definition: A fraction p/q is a best rational approximation of a real number x if p/q is closer to x than any other fraction with a smaller denominator. That is, for any other fraction p'/q' with q' \le q and p'/q' \ne p/q, we have \left|\frac{p}{q} -x \right| < \left| \frac{p'}{q'} -x \right|.

Thus among all fractions with denominator at most 7, the fraction 22/7 is closest to π. Among all fractions with denominator at most 113, the fraction 355/113 is closest to π. And so on.

Every convergent is a best rational approximation, but these aren’t all of the best rational approximations! The sequence of best approximations of π is actually 3/1, 13/4, 16/5, 19/6, 22/7, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99, 333/106, 355/113, 52163/16604… (A063674/A063673), where only the bold terms are convergents. (Aside: Notice how good 355/113 is: it is the best among all fractions with denominator at most 16603. This is because of the large term 292 in the continued fraction of π.) The others are semi-convergents (intermediate fractions), as explained below.

In general, for any real number, the convergents p_k/q_k of its continued fraction are always best rational approximations, but they aren’t all of the best rational approximations. To get all of them, you also have to take the semi-convergents: fractions of the form (p_k + np_{k+1})/(q_k +nq_{k+1}) for some integer n≥1. Specifically, taking n=a_{k+2} — the next term in the continued fraction expansion — gives p_{k+2}/q_{k+2}, and the ones to take are precisely n from \lceil a_{k+2}/2 \rceil (if it’s better than the previous convergent, else \lfloor a_{k+2}/2 \rfloor + 1) to a_{k+2}. This is also mentioned on Wikipedia.

Thus for π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2…], the best rational approximations are:

3/1     = [3]

13/4    = [3; 4]
16/5    = [3; 5]
19/6    = [3; 6]
22/7    = [3; 7]

179/57  = [3; 7, 8]
201/64  = [3; 7, 9]
223/71  = [3; 7, 10]
245/78  = [3; 7, 11]
267/85  = [3; 7, 12]
289/92  = [3; 7, 13]
311/99  = [3; 7, 14]
333/106 = [3; 7, 15]

355/113 = [3; 7, 15, 1]

52163/16604  = [3; 7, 15, 1, 146]
52518/16717  = [3; 7, 15, 1, 147]
… (terms from [3; 7, 15, 1, 148] to [3; 7, 15, 1, 291])...
103993/33102 = [3; 7, 15, 1, 292]

104348/33215 = [3; 7, 15, 1, 292, 1]
...

The above is the natural meaning of “best rational approximation”. This is also known as “best rational approximation of the first kind”. There is a different definition of “best rational approximation” under which the convergents are precisely all the best rational approximations: basically, instead of considering the distance |x - p/q|, we consider |qx - p|. That is: A fraction \frac{a}{b} is a best approximation of the second kind of a real number x if for every fraction \frac{c}{d} with 1 \le d \le b and a/b \neq c/d, we have |dx - c| > |bx - a|
This measure turns out to have a nicer theory, but to omit specifying that this non-obvious definition is the one being used is misleading.

Program

For illustration, here’s a C program (why it’s written in C is not important right now) that given a positive real number, generates its continued fraction, its convergents, and the sequence of best rational approximations. The function find_cf finds the continued fraction (putting the terms in a[] and the convergents in p[] and q[] — excuse the global variables), and the function all_best prints all the best rational approximations.

#include <math.h>
#include <stdio.h>
#include <assert.h>

// number of terms in continued fraction.
// 15 is the max without precision errors for M_PI
#define MAX 15
#define eps 1e-9

long p[MAX], q[MAX], a[MAX], len;
void find_cf(double x) {
  int i;
  //The first two convergents are 0/1 and 1/0
  p[0] = 0; q[0] = 1;
  p[1] = 1; q[1] = 0;
  //The rest of the convergents (and continued fraction)
  for(i=2; i<MAX; ++i) {
    a[i] = lrint(floor(x));
    p[i] = a[i]*p[i-1] + p[i-2];
    q[i] = a[i]*q[i-1] + q[i-2];
    printf("%ld:  %ld/%ld\n", a[i], p[i], q[i]);
    len = i;
    if(fabs(x-a[i])<eps) return;
    x = 1.0/(x - a[i]);
  }
}

void all_best(double x) {
  find_cf(x); printf("\n");
  int i, n; long cp, cq;
  for(i=2; i<len; ++i) {
    //Test n = a[i+1]/2 separately. Enough to test when a[i+1] is even, actually.
    n = a[i+1]/2; cp = n*p[i]+p[i-1]; cq = n*q[i]+q[i-1];
    if(fabs(x-(double)cp/cq) < fabs(x-(double)p[i]/q[i])) printf("%ld/%ld,", cp, cq);
    //And print all the rest, no need to test
    for(n = (a[i+1]+2)/2; n<=a[i+1]; ++n) {
      printf("%ld/%ld,", n*p[i]+p[i-1], n*q[i]+q[i-1]);
    }
  }
}

int main(int argc, char **argv) {
  double x;
  if(argc==1) { x = M_PI; } else { sscanf(argv[1], "%lf", &x); }
  assert(x>0); printf("%.15lf\n\n", x);
  all_best(x); printf("\n");
  return 0;
}

Examples

Here’s the output of this program, in about 0.003 seconds (i.e., it’s truly better than looping through all possible denominators!), line-wrapped for readability:

% ./a.out
3.141592653589793
 
3:  3/1
7:  22/7
15:  333/106
1:  355/113
292:  103993/33102
1:  104348/33215
1:  208341/66317
1:  312689/99532
2:  833719/265381
1:  1146408/364913
3:  4272943/1360120
1:  5419351/1725033
14:  80143857/25510582
 
13/4, 16/5, 19/6, 22/7, 179/57, 201/64, 223/71, 245/78, 267/85, 289/92, 311/99,
333/106, 355/113, 52163/16604, 52518/16717, 52873/16830, 53228/16943, 53583/17056,
53938/17169, 54293/17282, 54648/17395, 55003/17508, 55358/17621, 55713/17734,
56068/17847, 56423/17960, 56778/18073, 57133/18186, 57488/18299, 57843/18412,
58198/18525, 58553/18638, 58908/18751, 59263/18864, 59618/18977, 59973/19090,
60328/19203, 60683/19316, 61038/19429, 61393/19542, 61748/19655, 62103/19768,
62458/19881, 62813/19994, 63168/20107, 63523/20220, 63878/20333, 64233/20446,
64588/20559, 64943/20672, 65298/20785, 65653/20898, 66008/21011, 66363/21124,
66718/21237, 67073/21350, 67428/21463, 67783/21576, 68138/21689, 68493/21802,
68848/21915, 69203/22028, 69558/22141, 69913/22254, 70268/22367, 70623/22480,
70978/22593, 71333/22706, 71688/22819, 72043/22932, 72398/23045, 72753/23158,
73108/23271, 73463/23384, 73818/23497, 74173/23610, 74528/23723, 74883/23836,
75238/23949, 75593/24062, 75948/24175, 76303/24288, 76658/24401, 77013/24514,
77368/24627, 77723/24740, 78078/24853, 78433/24966, 78788/25079, 79143/25192,
79498/25305, 79853/25418, 80208/25531, 80563/25644, 80918/25757, 81273/25870,
81628/25983, 81983/26096, 82338/26209, 82693/26322, 83048/26435, 83403/26548,
83758/26661, 84113/26774, 84468/26887, 84823/27000, 85178/27113, 85533/27226,
85888/27339, 86243/27452, 86598/27565, 86953/27678, 87308/27791, 87663/27904,
88018/28017, 88373/28130, 88728/28243, 89083/28356, 89438/28469, 89793/28582,
90148/28695, 90503/28808, 90858/28921, 91213/29034, 91568/29147, 91923/29260,
92278/29373, 92633/29486, 92988/29599, 93343/29712, 93698/29825, 94053/29938,
94408/30051, 94763/30164, 95118/30277, 95473/30390, 95828/30503, 96183/30616,
96538/30729, 96893/30842, 97248/30955, 97603/31068, 97958/31181, 98313/31294,
98668/31407, 99023/31520, 99378/31633, 99733/31746, 100088/31859, 100443/31972,
100798/32085, 101153/32198, 101508/32311, 101863/32424, 102218/32537, 102573/32650,
102928/32763, 103283/32876, 103638/32989, 103993/33102, 104348/33215, 208341/66317,
312689/99532, 833719/265381, 1146408/364913, 3126535/995207,
4272943/1360120, 5419351/1725033, 42208400/13435351, 47627751/15160384,
53047102/16885417, 58466453/18610450, 63885804/20335483, 69305155/22060516,
74724506/23785549, 80143857/25510582, 

All these terms are correct, though if you increase MAX you start getting errors because of precision. I’m myself impressed with how many terms you get with only 13 convergents. (As you can see, there’s a small bug where it sometimes doesn’t print the very first “n/1″ approximation, or prints it incorrectly — I leave it for you to fix!)

You can try with √2, whose continued fraction is [1; 2, 2, 2, 2…]:

% ./a.out 1.41421356237309504880
1.414213562373095
 
1:  1/1
2:  3/2
2:  7/5
2:  17/12
2:  41/29
2:  99/70
2:  239/169
2:  577/408
2:  1393/985
2:  3363/2378
2:  8119/5741
2:  19601/13860
2:  47321/33461
 
3/2, 4/3, 7/5, 17/12, 24/17, 41/29, 99/70, 140/99, 239/169, 577/408, 816/577, 1393/985, 3363/2378, 4756/3363, 8119/5741, 19601/13860, 47321/33461, 

Or the golden ratio φ = (1+√5)/2 whose continued fraction is [1; 1, 1, 1, …]:

% ./a.out 1.61803398874989484820
1.618033988749895
 
1:  1/1
1:  2/1
1:  3/2
1:  5/3
1:  8/5
1:  13/8
1:  21/13
1:  34/21
1:  55/34
1:  89/55
1:  144/89
1:  233/144
1:  377/233
 
2/1, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21, 55/34, 89/55, 144/89, 233/144, 377/233, 

(See the Fibonacci numbers? Here the convergents are all the approximants.)

Or with rational numbers like 4/3 = [1; 3]:

% ./a.out 1.33333333333333333333
1.333333333333333
 
1:  1/1
3:  4/3
 
3/2, 4/3, 

or 14/11 = [1; 3, 1, 2]:

% ./a.out 1.27272727272727272727
1.272727272727273
 
1:  1/1
3:  4/3
1:  5/4
2:  14/11
 
3/2, 4/3, 5/4, 9/7, 14/11, 

[2011-11-04: Fixed the bug and the description.]

Written by S

Mon, 2011-01-10 at 02:28:23 +05:30

Posted in mathematics

Shor’s algorithm

leave a comment »

I’d learnt Shor’s algorithm in a course a few years ago, but (as usual) forgot all about it.

Yesterday I reread it from the Algorithms textbook of Dasgupta, Papadimitriou, and Vazirani, only because someone claimed that their Chapter 10 on Quantum algorithms was incomprehensible, so here is a very vague sketch of it before I forget it again. (For a popular account, which Peter Shor called “the best job of explaining quantum computing to the man on the street that I’ve seen”, see Scott Aaronson’s post.)

1. Quantum

The quantum magic in Shor’s algorithm, the part that uses something unique to quantum computation, is simply the following fact: a quantum computer can perform the quantum Fourier transform in O(log2 N) time.

That is, given a “superposition” |\boldsymbol\alpha\rangle = \sum_{j=0}^{N-1}\alpha_j |j\rangle, in O(log2N) steps the quantum fourier-magic-thing produces the superposition |\boldsymbol\beta\rangle = \sum_{j=0}^{N-1} \beta_j |j\rangle with \beta_j = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} \omega^{jl}\alpha_l. Of course, it just produces this superposition and does not output it; if we try to measure this we’d get only one of the N states, specifically |j\rangle with probability |\beta_j|^2.

(I will not write here how this QFT works. It uses the Hadamard gate etc., and roughly, what would be two recursive calls in the usual FFT can be done with just one operation here.)

All the other ideas are classical (i.e. non-quantum).

2. Fourier transform of periodic vectors

Suppose k is some factor of N. It can be proved with some straightforward calculation that if |\boldsymbol\alpha\rangle (viewed as a vector/sequence) is periodic with period k, with the only nonzero coefficients being \alpha_l, \alpha_{l+k}, \alpha_{l+2k}, \dots, then its Fourier transform is also periodic with period N/k — the only nonzero coefficients are \beta_0, \beta_{N/k}, \beta_{2N/k}, \dots.

The point is that if we “measure” this, then we get one of the multiples of N/k at random, so by taking a few measurements, we can figure out N/k and therefore k.

3. Factoring reduces to order-finding

We can see from number theory (use the Chinese remainder theorem) that if N is composite (etc.), then with probability at least 1/2, a random x modulo N has an even order r (i.e. the smallest positive integer such that x^r = 1 \mod N) such that x^{r/2} is a nontrivial square root of 1 mod N. So if we can find the order r, then we have y such that y^2 = 1\mod N so (y-1)(y+1) = 0\mod N with neither factor being 0, so we get a factor of N. So all we need to be able to do is find the order.

Modulo N, pick any x and consider the function f(k) = x^k \bmod N. If r is the order of x mod N then the function is periodic with period r. The algorithm runs as follows:

  1. First, make a superposition \sum_{j=0}^{N-1} |j, 0\rangle normalised. (Think of the |\cdot,\cdot\rangle as two quantum “registers”.) (This is made by starting with just |0,0\rangle and taking its Fourier transform!)
  2. Now “compute” f (for some random x) on it. This gives the superposition \sum_{j=0}^{N-1} |j, f(j)\rangle.
  3. Then measure the second register. This gives a random value from the second register, but more importantly, it make the first register be a superposition of only those values that are compatible with this value — therefore, periodic with period r.
  4. Now do the Fourier transform on the first and measure it, so we get a random multiple of N/r.

Doing all this the right number of times with the right constants does it. I’ve cheated in all this, because obviously the N we’re trying to factor is not a power of 2, but that can be taken care of.

I think Shor’s big idea (over Simon’s algorithm for period-finding) was in coming up with this function f whose period would give the factors of N. But I may be wrong.

Written by S

Thu, 2010-07-08 at 04:22:52 +05:30

Posted in mathematics

Equivalent forms of the Riemann hypothesis

with 3 comments

  1. Let H_n be the nth harmonic number, i.e. H_n = 1 + \frac12 + \frac13 + \dots + \frac1n. Then, the Riemann hypothesis is true if and only if

    \displaystyle \sum_{d | n}{d} \le H_n + \exp(H_n)\log(H_n)

    The left-hand side, which is the sum of the divisors of n, is also denoted \sigma(n).
    See Jeffrey Lagarias, An Elementary Problem Equivalent to the Riemann Hypothesis [PDF, arXiv.

  2. Define the Redheffer matrix A = A(n) to be the n \times n 0-1 matrix with entries A_{ij} = 1 \text{ if } j=1 \text{ or } i \text{ divides } j (and 0 otherwise). Then the Riemann hypothesis is true if and only if \det(A) = O(n^{1/2+\epsilon}) for all \epsilon.

For more, see Equivalences to the Riemann hypothesis (by J. Brian Conrey and David W. Farmer), and Consequences of the Riemann hypothesis (MathOverflow)

For fun: claimed [dis/]proofs.

Quote found via rjlipton:

The Riemann Hypothesis is the most basic connection between addition and multiplication that there is, so I think of it in the simplest terms as something really basic that we don’t understand about the link between addition and multiplication.
Brian Conrey

Written by S

Tue, 2010-07-06 at 22:03:54 +05:30

Posted in mathematics

Convex, continuous, Jensen’s

with 3 comments

On the principle that things I’ve forgotten at least thrice are worth remembering…

A convex function (need a mnemonic?) is f such that
\displaystyle f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y) for any λ in [0,1].

Midpoint convex (almost) implies convex
A natural question is whether we need all λ in [0,1]; what about just ½? Functions which satisfy
\displaystyle f(\frac{x+y}{2}) \le \frac{f(x)+f(y)}{2}
are called midpoint convex.

It turns out that when things are “nice” as in most frequently encountered cases, a midpoint convex function is convex. (A counterexample is this crazy non-measurable nowhere-continuous bounded-in-no-interval function: since R is a vectorspace over Q, there is a basis (that includes π, say), and any real number x has a unique representation as a finite rational sum of basis elements. Define f(x) to be the coefficient of π in the representation of x. This function is in fact linear over Q.)
For midpoint convex to mean convex, any of the following conditions will do:

  • f is continuous [Proof: prove the inequality for λ with denominators that are powers of 2, then use denseness in R.]
  • f is continuous at some point (any one point will do) [Proof: see next section]
  • f is (Lebesgue) measurable
  • f is bounded on some measurable set with positive measure

Convex functions are continuous
In fact, a stronger theorem is true: If a midpoint convex function is discontinuous at a single point, then it is unbounded on every interval and (hence) discontinuous everywhere.
Proof: Suppose wlog that f is discontinuous at 0, that f(0)=0, and that a sequence x_n \rightarrow 0 has f(x_n) \rightarrow m for some positive m. Prove that \liminf f(2^k x_n) \ge 2^k m, so f is unbounded near 0. Then use this to show unbounded everywhere.

Note that convex functions are bounded on every interval, so by above theorem they are continuous.

Convex functions are differentiable almost everywhere
In fact a convex function has left and right derivatives at all points, these are monotonically increasing, and there are only countably many points where they don’t coincide.

All the above is from pages 10 to 16 of this book I had found through googling; if anyone’s reading this, please tell me if you know better proofs.

A proof of Jensen’s inequality
Here, Jensen’s inequality is the statement that for a convex function f,
\displaystyle \mathbb{E}\left[f(X)\right] \geq f\left(\mathbb{E}\left[X\right]\right) .

An equivalent definition of a convex function is that at any point we can find a line through the point that lies entirely below the function. (Why?) In other words, for any point a, there is a “slope” m such that
\displaystyle f(x) \ge f(a) + m(x-a) for all x.
With this definition, Jensen’s inequality has an absurdly short proof: take a to be the mean \mathbb{E}\left[X\right], then take expectation.

Written by S

Tue, 2010-06-29 at 02:32:05 +05:30

Posted in mathematics

Identify

leave a comment »

What’s this?

Statement

Seeing more should help:

If that’s too easy, how about this?

Statement and proof

Both are from Rekhāgaṇita, which is a c. 1720s translation by Jagannatha of Nasir al-Din al-Tusi’s 13th-century Arabic translation of Euclid’s Elements. It seems to be a straightforward translation of the Arabic — it even uses, to label vertices etc., letters in the Arabic order अ ब ज द ह व झ…. The text retains most of the structure and proposition numbers of Euclid, but in fact the Arabic world has considerably elaborated on Euclid. For instance, for the famous first example above, it gives sixteen additional proofs/demonstrations, which are not in the Greek texts.

The first volume (Books I to VI) is available on Google Books and the second volume (Books VII to XV) through wilbourhall.org (25 MB PDF).

Notes on the second: some technical vocabulary — a प्रथमाङ्कः is a prime number, and a रूप is a unit (one). The rest of the vocabulary, like “निःशेषं करोति” meaning “divides (without remainder)”, is more or less clear, and also the fact that as in Euclid, numbers are being conceived of as lengths (so दझ and झद mean the same).

It does sound cooler to say “इदमशुद्धम्” than “But this is a contradiction”. :-)

Written by S

Wed, 2010-05-12 at 21:20:19 +05:30

Posted in mathematics, sanskrit

Tagged with ,

Some incredibly beautiful things

with 2 comments

First, a totally completely absolutely incredible fact about convergents of continued fractions: the Khinchin-Levy theorem, seen here: For almost all real numbers \alpha, the denominators q_n of the continued-fraction convergents of \alpha satisfy

\displaystyle \lim_{n \to \infty} q_n^{1/n} = e^{\pi^2/(12 \log 2)}

Why should such a thing be a true?! Why should the limit even exist, why should it be the same for almost all real numbers, and where do e, \pi, \log 2 and 12 come from?!!

Intrigued by these questions, and embarrassed by my near-complete ignorance of continued fractions (each time I had encountered them before, I had been intimidated by the notation and skipped pages), I started reading a couple of books on continued fractions (including Khinchin’s)… other things intervened and I had to give up for now, but I hope to return to their study soon. But from what I’ve already seen, continued fractions are awesome, and it is a real shame they seem to be out of fashion these days and not more widely taught and known. Neither of these books proves the above theorem actually, but an editor’s footnote in Khinchin’s book gives the reference: it was proved by Khinchin in 1935 that the limit exists and is the same for almost all real numbers, and Lévy found the constant soon after (published 1937).

I suspect (based on nothing at all) that it bears some relation to, among other things, (some generalization of) the following well-known fact, itself quite surprising (probably the most surprising fact I know a proof of):

\displaystyle \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} = \prod_{p \text{ prime}}{\left(1-\frac1{p^2}\right)}

This fact is known as the Basel problem, was proved by (who else?) Euler, and fourteen proofs of it are collected here. But let this not distract you from reading about continued fractions!


Second, pointed out at John Baez’s This Week’s Finds in Mathematical Physics (Week 230), and from this paper by Sondow, are these (quoting Prof. Baez directly, just typeset in LaTeX):

eerily similar formulas for some of our favorite math constants. First, one for e:

\displaystyle e = \left(\frac{2}{1}\right)^{1/1} \left(\frac{2^2}{1\times3}\right)^{1/2} \left(\frac{2^3\times4}{1\times3^3}\right)^{1/3}  \left(\frac{2^4\times4^4}{1\times3^6\times5}\right)^{1/4} \dots

Then, one for π/2:

\displaystyle \frac{\pi}{2} = \left(\frac{2}{1}\right)^{1/2} \left(\frac{2^2}{1\times3}\right)^{1/4} \left(\frac{2^3\times4}{1\times3^3}\right)^{1/8} \left(\frac{2^4\times4^4}{1\times3^6\times5}\right)^{1/16} \dots

Then one for eγ, where γ is Euler’s constant:

\displaystyle e^\gamma = \left(\frac{2}{1}\right)^{1/2} \left(\frac{2^2}{1\times3}\right)^{1/3} \left(\frac{2^3\times4}{1\times3^3}\right)^{1/4} \left(\frac{2^4\times4^4}{1\times3^6\times5}\right)^{1/5} \dots

He also points out Wallis’ product for π/2 and Pippenger’s for e:

\displaystyle \frac{\pi}{2} = \left(\frac{2}{1}\right)^{1/1} \left(\frac{2\times 4}{3\times3}\right)^{1/1} \left(\frac{4\times6\times6\times8}{5\times5\times7\times7}\right)^{1/1} \dots

\displaystyle  e = \left(\frac{2}{1}\right)^{1/2} \left(\frac{2\times 4}{3\times3}\right)^{1/4} \left(\frac{4\times6\times6\times8}{5\times5\times7\times7}\right)^{1/8} \dots

What does it all mean? I haven’t a clue! Another mystery thrown down to us by the math gods, like a bone from on high… we can merely choose to chew on it or not, as we wish.

Indeed!


Third, my favourite way to prove the Möbius inversion formula. Well, it’s actually the only way I know, besides induction, but I think it’s beautiful. (Ignore if you already know what Dirichlet convolutions are.) It’s from Herbert Wilf’s generatingfunctionology, one of my favourite books.

Let us consider a kind of generating function known as a Dirichlet generating function. (“A generating function is a clothesline on which we hang up a sequence of numbers for display.”) For a sequence a_n, the Dirichlet generating function is defined as the (formal) series (in s)

\displaystyle G_a(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s} = a_1 + \frac{a_2}{2^s} + \frac{a_3}{3^s} + \frac{a_4}{4^s} + \dots.

(A formal series means that we can ignore issues of convergence etc.) Now if we have the generating functions G_a(s) and G_b(s) for two sequences a_n and b_n, what is G_a(s)G_b(s)? Well, it’s easy to see that the coefficient of 1/n^s in G_a(s)G_b(s), which comes exactly from the terms of the form i^sj^s with ij=n, is \displaystyle \sum_{ij=n} a_ib_j, or equivalently \displaystyle \sum_{d \vert n} a_db_{n/d}. So the product is the generating function for a special kind of convolution, the sequence c_n defined as

\displaystyle c_n = \sum_{d \vert n} a_db_{n/d}

Let us call this fact “(*)” and put it aside for now; we will use it shortly. Instead, let us look at the generating function for a very special sequence, the sequence of all 1s. It is, by definition, \displaystyle \sum_{n\ge 1} \frac{1}{n^s}. While we could call this G_1(s), it goes by the more familiar name \zeta(s) — it is (when viewed as a function instead of a formal series) the Riemann zeta function. But we also know (this is the Euler product formula, and follows simply from the unique factorisation of integers):

\displaystyle \zeta(s) = \sum_{n\ge1}\frac{1}{n^s} = \prod_{p\text{ prime}}\left(1+\frac1{p^s} + \frac1{p^{2s}} + \frac1{p^{3s}} + \dots\right) = \prod_{p\text{ prime}}\frac1{\left(1-\frac1{p^s}\right)}.

This means that

\displaystyle \frac1{\zeta(s)} = \prod_{p\text{ prime}}{\left(1-\frac1{p^s}\right)} = G_\mu(s).

Why did I write G_\mu(s) on the right? Is this a generating function of anything? Why, yes! By expanding the product, you can see that it is the generating function for the sequence \mu(n), defined as

\displaystyle \mu(s) = \begin{cases} 0 \quad \text{if }n\text{ is not squarefree,} \\ (-1)^k \quad \text{if }n = p_1p_2\dots p_k\text{ is the product of k primes}\end{cases}.

(This function “mu” is known as the “Möbius function”.) With this background, the Möbius inversion formula falls out entirely naturally: if G_f(s) is the generating function of a sequence f(n), and the function F(n) is defined as

\displaystyle F(n) = \sum_{d\vert n} f(d),

then its generating function G_F(s) is, by (*), simply

\displaystyle G_F(s) = G_f(s)\zeta(s),

which means that

\displaystyle G_f(s) = G_F(s)\frac{1}{\zeta(s)} = G_F(s)G_\mu(s),

or in other words, again by (*),

\displaystyle f(n) = \sum_{d\vert n} F(d)\mu(n/d),

which is precisely the Möbius inversion formula. How cool is that?

Written by S

Fri, 2010-04-30 at 01:12:02 +05:30

Posted in mathematics

On “trivial” in mathematics

with 6 comments

One aspect of mathematicians’ vocabulary that non-mathematicians often find non-intuitive is the word “trivial”. Mathematicians seem to call a great many things “trivial”, most of which are anything but. Here’s a joke, pointed out to me by Mohan:

Two mathematicians are discussing a theorem. The first mathematician says that the theorem is “trivial”. In response to the other’s request for an explanation, he then proceeds with two hours of exposition. At the end of the explanation, the second mathematician agrees that the theorem is trivial.

Like many jokes, this is not far from the truth. This tendency has led others to say, for example, that

In mathematics, there are only two kinds of proofs: Trivial ones, and undiscovered ones.

Or as Feynman liked to say, “mathematicians can prove only trivial theorems, because every theorem that’s proved is trivial”. (The mathematicians to whom Feynman said this do not seem to have liked the statement.)

A little examination, however, shows that all this is not far from reality, and indeed not far from the ideal of mathematics. Consider these excerpts from Gian-Carlo Rota‘s wonderful Indiscrete Thoughts:

“eventually every mathematical problem is proved trivial. The quest for ultimate triviality is characteristic of the mathematical enterprise.” (p.93)

“Every mathematical theorem is eventually proved trivial. The mathematician’s ideal of truth is triviality, and the community of mathematicians will not cease its beaver-like work on a newly discovered result until it has shown to everyone’s satisfaction that all difficulties in the early proofs were spurious, and only an analytic triviality is to be found at the end of the road.” (p. 118, in The Phenomenology of Mathematical Truth)

Are there definitive proofs?
It is an article of faith among mathematicians that after a new theorem is discovered, other simpler proofs of it will be given until a definitive one is found. A cursory inspection of the history of mathematics seems to confirm the mathematician’s faith. The first proof of a great many theorems is needlessly complicated. “Nobody blames a mathematician if the first proof of a new theorem is clumsy”, said Paul Erdős. It takes a long time, from a few decades to centuries, before the facts that are hidden in the first proof are understood, as mathematicians informally say. This gradual bringing out of the significance of a new discovery takes the appearance of a succession of proofs, each one simpler than the preceding. New and simpler versions of a theorem will stop appearing when the facts are finally understood. (p.146, in The Phenomenology of Mathematical Proof, here/here).

For more context, the section titled “Truth and Triviality” is especially worth reading, where he gives the example of proofs of the prime number theorem, starting with Hadamard and de la Vallée Poussin, through Wiener, Erdős and Selberg, and Levinson. Also p. 119, where he feels this is not unique to mathematics:

Any law of physics, when finally ensconced in a proper mathematical setting, turns into a mathematical triviality. The search for a universal law of matter, a search in which physics has been engaged throughout this century, is actually the search for a trivializing principle, for a universal “‘nothing but.” [...] The ideal of all science, not only of mathematics, is to do away with any kind of synthetic a posteriori statement and to leave only analytic trivialities in its wake. Science may be defined as the transformation of synthetic facts of nature into analytic statements of reason.

So to correct the earlier quote, there are three kinds of proofs from the mathematician’s perspective: those that are trivial, those that have not been discovered, and those niggling ones that are not yet trivial.

Written by S

Sat, 2010-04-03 at 14:40:30 +05:30

Posted in mathematics

Tagged with ,

Euler, pirates, and the discovery of America

leave a comment »

Is there nothing Euler wasn’t involved in?!

That rhetorical question is independent of the following two, which are exceedingly weak connections.

“Connections” to piracy: Very tenuous connections, of course, but briefly, summarising from the article:

  1. Maupertuis: President of the Berlin Academy for much of the time Euler was there. His father got a license from the French king to attack English ships, made a fortune, and retired. Maupertuis is known for formulating the Principle of Least Action (but maybe it was Euler), and best known for taking measurements showing the Earth bulges at the equator as Newton had predicted, thus “The Man Who Flattened the Earth”.
  2. Henry Watson: English privateer living in India, lost a fortune to the scheming British East India Company. Wanted to be a pirate, but wasn’t actually one. Known for: translated Euler’s Théorie complette [E426] from its original French: A complete theory of the construction and properties of vessels: with practical conclusions for the management of ships, made easy to navigators. (Yes, Euler wrote that.)
  3. Kenelm Digby: Not connected to Euler actually, just the recipient of a letter by Fermat in which a problem that was later solved by Euler was discussed. Distinguished alchemist, one of the founders of the Royal Society, did some pirating (once) and was knighted for it.
  4. Another guy, nevermind.

Moral: The fundamental interconnectedness of all things. Or, connections don’t mean a thing.

The discovery of America: Columbus never set foot on the mainland of America, and died thinking he had found a shorter route to India and China, not whole new continents that were in the way. The question remained whether these new lands were part of Asia (thus, “Very Far East”) or not. The czar of Russia (centuries later) sent Bering to determine the bounds of Russia, and the Bering Strait separating the two continents was discovered and reported back: America was not part of Russia. At about this time, there were riots in Russia, there was nobody to make the announcement, and “Making the announcement fell to Leonhard Euler, still the preeminent member of the St. Petersburg Academy, and really the only member who was still taking his responsibilities seriously.” As the man in charge of drawing the geography of Russia, Euler knew a little, and wrote a letter to Wetstein, member of the Royal Society in London. So it was only through Euler that the world knew that the America that was discovered was new. This letter [E107], with others, is about the only work of Euler in English. That Euler knew English (surprisingly!) is otherwise evident from the fact that he translated and “annotated” a book on ballistics by the Englishman Benjamin Robins. The original was 150 pages long; with Euler’s comments added, it was 720. [E77, translated back into English as New principles of gunnery.]

Most or all of the above is from Ed Sandifer’s monthly column How Euler Did It.

The works of Leonhard Euler online has pages for all 866 of his works; 132 of them are available in English, including the translations from the Latin posted by graduate student Jordan Bell on the arXiv. They are very readable.

This includes his Letters to a German Princess on various topics in physics and philosophy [E343,E344,E417], which were bestsellers when reprinted as science books for a general audience. It includes his textbook, Elements of Algebra [E387,E388]. Find others on Google Books. The translations do not seem to include (among his other books) his classic textbook Introductio in analysin infinitorum [E101,E102, "the foremost textbook of modern times"], though there are French and German translations available.

Apparently, Euler’s Latin is (relatively) not too hard to follow.

Written by S

Mon, 2010-03-01 at 23:04:23 +05:30

Keeping up with the Schwar(t)zes

leave a comment »

For future reference, reminded by a recent webcomic:

Those are the major ones, I think.

Omissions? Errors?

Written by S

Thu, 2010-02-11 at 22:35:46 +05:30

Posted in mathematics

Tagged with , ,

Is it important to “understand” first?

with 11 comments

The Oxford University Press has been publishing a book series known as “Very Short Introductions”. These slim volumes are an excellent idea, and cover over 200 topics already. The volume Mathematics: A Very Short Introduction is written by Timothy Gowers.

Gowers is one of the leading mathematicians today, and a winner of the Fields Medal (in 1998). In addition to his research work, he has also done an amazing amount of service to mathematics in other ways. He edited the 1000-page Princeton Companion to Mathematics, getting the best experts to write, and writing many articles himself. He also started the Polymath project and the Tricki, the “tricks wiki”. You can watch his talk on The Importance of Mathematics (with slides) (transcript), and read his illuminating mathematical discusssions, and his blog. His great article The Two Cultures of Mathematics is on the “theory builders and problem solvers” theme, and is a paper every mathematician should read.

Needless to say, “Mathematics: A Very Short Introduction” is a very good read. Unlike many books aimed at non-mathematicians, Gowers is quite clear that he does “presuppose some interest on the part of the reader rather than trying to drum it up myself. For this reason I have done without anecdotes, cartoons, exclamation marks, jokey chapter titles, or pictures of the Mandelbrot set. I have also avoided topics such as chaos theory and Godel’s theorem, which have a hold on the public imagination out of proportion to their impact on current mathematical research”. What follows is a great book that particularly excels at describing what it is that mathematicians do. Some parts of the book, being Gowers’s personal views on the philosophy of mathematics, might not work very well when directed at laypersons, not because they require advanced knowledge, but assume a culture of mathematics. Doron Zeilberger thinks that this book “should be recommended reading to everyone and required reading to mathematicians”.

Its last chapter, “Some frequently asked questions”, carries Gowers’s thoughts on some interesting questions. With whole-hearted apologies for inserting my own misleading “summaries” of the answers in brackets, they are the following: “1.Is it true that mathematicians are past it by the time they are 30?” (no), “2. Why are there so few women mathematicians?” (puzzling and regrettable), “3. Do mathematics and music go together?” (not really), “4. Why do so many people positively dislike mathematics?” (more on this below), “5. Do mathematicians use computers in their work?” (not yet), “6. How is research in mathematics possible?” (if you have read this book you won’t ask), “7. Are famous mathematical problems ever solved by amateurs?” (not really), “8. Why do mathematicians refer to some theorems and proofs as beautiful?” (already discussed. Also, “One difference is that [...] a mathematician is more anonymous than an artist. [...] it is, in the end, the mathematics itself that delights us”.) As I said, you should read the book itself, not my summaries.

The interesting one is (4).

4. Why do so many people positively dislike mathematics?

One does not often hear people saying that they have never liked biology, or English literature. To be sure, not everybody is excited by these subjects, but those who are not tend to understand perfectly well that others are. By contrast, mathematics, and subjects with a high mathematical content such as physics, seem to provoke not just indifference but actual antipathy. What is it that causes many people to give mathematical subjects up as soon as they possibly can and remember them with dread for the rest of their lives?

Probably it is not so much mathematics itself that people find unappealing as the experience of mathematics lessons, and this is easier to understand. Because mathematics continually builds on itself, it is important to keep up when learning it. For example, if you are not reasonably adept at multiplying two-digit numbers together,then you probably won’t have a good intuitive feel for the distributive law (discussed in Chapter 2). Without this, you are unlikely to be comfortable with multiplying out the brackets in an expression such as (x+2)(x+3), and then you will not be able to understand quadratic equations properly. And if you do not understand quadratic equations, then you will not understand why the golden ratio is \frac{1+\sqrt{5}}{2}.

There are many chains of this kind, but there is more to keeping up with mathematics than just maintaining technical fluency. Every so often, a new idea is introduced which is very important and markedly more sophisticated than those that have come before, and each one provides an opportunity to fall behind. An obvious example is the use of letters to stand for numbers, which many find confusing but which is fundamental to all mathematics above a certain level. Other examples are negative numbers, complex numbers, trigonometry, raising to powers, logarithms, and the beginnings of calculus. Those who are not ready to make the necessary conceptual leap when they meet one of these ideas will feel insecure about all the mathematics that builds on it. Gradually they will get used to only half understanding what their mathematics teachers say, and after a few more missed leaps they will find that even half is an overestimate. Meanwhile, they will see others in their class who are keeping up with no difficulty at all. It is no wonder that mathematics lessons become, for many people, something of an ordeal.

This seems to be exactly the right reason. No one would enjoy being put through drudgery that they were not competent at, and without the beauty at the end of the pursuit being apparent. (I hated my drawing classes in school, too.) See also Lockhart’s Lament, another article that everyone — even, or especially, non-mathematicians — should read.

As noted earlier, Gowers has some things to say about the philosophy of mathematics. As is evident from his talk “Does mathematics need a philosophy?” (also typeset as essay 10 of 18 Unconventional Essays on the Nature of Mathematics), he has rejected the Platonic philosophy (≈ mathematical truths exist, and we’re discovering them) in favour of a formalist one (≈ it’s all just manipulating expressions and symbols, just stuff we do). The argument is interesting and convincing, but I find myself unwilling to change my attitude. Yuri Manin says in a recent interview that “I am an emotional Platonist (not a rational one: there are no rational arguments in favor of Platonism)”, so it’s perhaps just as well.

Anyway, the anti-Platonist / formalist idea of Gowers is evident throughout the book, and of course it has its great side: “a mathematical object is what it does” is his slogan, and most of us can agree that “one should learn to think abstractly, because by doing so many philosophical difficulties disappear” , etc. The only controversial suggestion, perhaps, follows the excerpt quoted above (of “Why do so many people positively dislike mathematics?”):

Is this a necessary state of affairs? Are some people just doomed to dislike mathematics at school? Or might it be possible to teach the subject differently in such a way that far fewer people are excluded from it? I am convinced that any child who is given one-to-one tuition in mathematics from an early age by a good and enthusiastic teacher will grow up liking it. This, of course, does not immediately suggest a feasible educational policy, but it does at least indicate that there might be room for improvement in how mathematics is taught.

One recommendation follows from the ideas I have emphasized in this book. Above, I implicitly drew a contrast between being technically fluent and understanding difficult concepts, but it seems that almost everybody who is good at one is good at the other. And indeed, if understanding a mathematical object is largely a question of learning the rules it obeys rather than grasping its essence, then this is exactly what one would expect — the distinction between technical fluency and mathematical understanding is less clear-cut than one might imagine.

How should this observation influence classroom practice? I do not advocate any revolutionary change — mathematics has suffered from too many of them already — but a small change in emphasis could pay dividends. For example, suppose that a pupil makes the common mistake of thinking that xa+b = xa + xb. A teacher who has emphasized the intrinsic meaning of expressions such as xa will point out that xa+b means a+b xs all multiplied together, which is clearly the same as a of them multiplied together multiplied by b of them multiplied together. Unfortunately, many children find this argument too complicated to take in, and anyhow it ceases to be valid if a and b are not positive integers.

Such children might benefit from a more abstract approach. As I pointed out in Chapter 2, everything one needs to know about powers can be deduced from a few very simple rules, of which the most important is xa+b = xa xb. If this rule has been emphasized, then not only is the above mistake less likely in the first place, but it is also easier to correct: those who make the mistake can simply be told that they have forgotten to apply the right rule. Of course, it is important to be familiar with basic facts such as that x3 means x times x times x, but these can be presented as consequences of the rules rather than as justifications for them.

I do not wish to suggest that one should try to explain to children what the abstract approach is, but merely that teachers should be aware of its implications. The main one is that it is quite possible to learn to use mathematical concepts correctly without being able to say exactly what they mean. This might sound a bad idea, but the use is often easier to teach, and a deeper understanding of the meaning, if there is any meaning over and above the use, often follows of its own accord.

Of course, there is an instinctive reason to immediately reject such a proposal — as the MAA review by Fernando Q. Gouvêa observes, ‘I suspect, however, that there is far too much “that’s the rule” teaching, and far too little explaining of reasons in elementary mathematics teaching. Such a focus on rules can easily lead to students having to remember a huge list of unrelated rules. I fear Gowers’ suggestion here may in fact be counterproductive.’ Nevertheless, the idea that technical fluency may precede and lead to mathematical understanding is worth pondering.

(Unfortunately, even though true, it may not actually help with teaching: in practice, drilling-in “mere” technical fluency can be as unsuccessful as imparting understanding.)

Written by S

Tue, 2009-12-01 at 03:38:31 +05:30

Follow

Get every new post delivered to your Inbox.

Join 43 other followers