The Lumber Room

"Consign them to dust and damp by way of preserving them"

Archive for the ‘mathematics’ Category

A simple puzzle, with a foray into inequivalent expressions


[Needs cleanup… just dumping here for now.]

Mark Jason Dominus tweeted and later blogged about this puzzle:

From the four numbers [6, 6, 5, 2], using only the binary operations [+, -, *, /], form the number 17.

When he tweeted the first time, I thought about it a little bit (while walking from my desk to the restroom or something like that), but forgot about it pretty soon and didn’t give it much further thought. When he posted again, I gave it another serious try, failed, and so gave up and wrote a computer program.

This is what I thought this time.


Any expression is formed as a binary tree. For example, 28 = 6 + (2 * (5 + 6)) is formed as this binary tree (TODO make a proper diagram with DOT or something):

                   +
                  / \
                 6   *
                    / \
                   2   +
                      / \
                     5   6

And 8 = (2 + 6) / (6 - 5) is this binary tree:

                   /
                  / \
                 +   -
                / \ / \
               2  6 6  5

Alternatively, any expression is built up from the 4 given numbers [a, b, c, d] as follows:
Take any two of the numbers and perform any operation on them, and replace the two numbers with the result. Then repeat, until you have only one number, which is the final result.

Thus the above two expressions 28 = 6 + (2 * (5 + 6)) and 8 = (2 + 6) / (6 – 5) can be formed, respectively, as:

  1. Start with [6, 6, 5, 2]. Replace (5, 6) with 5+6=11 to get [6, 11, 2]. Replace (11, 2) with 11*2=22 to get [6, 22]. Replace (6, 22) with 6+22=28, and that’s your result.
  2. Start with [6, 6, 5, 2]. Replace (2, 6) with 2+6=8 to get [8, 6, 5]. Replace (6, 5) with 6-5=1 to get [8, 1]. Replace (8, 1) with 8/1=8 and that’s your result.

So my idea was to generate all possible such expressions out of [6, 6, 5, 2], and see if 17 was one of them. (I suspected it might be possible by doing divisions and going via non-integers, but couldn’t see how.)

(In hindsight it seems odd that my first attempt was to answer whether 17 could be generated, rather than how: I guess at this point, despite the author’s assurance that there are no underhanded tricks involved, I still wanted to test whether 17 could be generated in this usual way, if only to ensure that my understanding of the puzzle was correct.)
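Such a search is easy to sketch in code. Here is a minimal version (the function name reachable is mine, not from the original program), which repeatedly combines two numbers as described above, using Fraction for exact division:

```python
from fractions import Fraction

def reachable(nums):
    """All values obtainable from nums by repeatedly replacing two
    of the numbers with the result of a binary operation on them."""
    if len(nums) == 1:
        return {nums[0]}
    results = set()
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            rest = [nums[k] for k in range(len(nums)) if k != i and k != j]
            a, b = nums[i], nums[j]
            candidates = [a + b, a - b, a * b]
            if b != 0:
                candidates.append(a / b)  # exact rational division
            for r in candidates:
                results |= reachable(rest + [r])
    return results

values = reachable([Fraction(n) for n in (6, 6, 5, 2)])
print(Fraction(17) in values)  # → True
```

This confirms that 17 is reachable (indeed, via non-integers: 6 * (2 + 5/6) = 17).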


Written by S

Wed, 2016-07-20 at 00:38:51

Posted in mathematics, unfinished

Multiple ways of understanding


In his wonderful On Proof and Progress in Mathematics, Thurston begins his second section “How do people understand mathematics?” as follows:

This is a very hard question. Understanding is an individual and internal matter that is hard to be fully aware of, hard to understand and often hard to communicate. We can only touch on it lightly here.

People have very different ways of understanding particular pieces of mathematics. To illustrate this, it is best to take an example that practicing mathematicians understand in multiple ways, but that we see our students struggling with. The derivative of a function fits well. The derivative can be thought of as:

  1. Infinitesimal: the ratio of the infinitesimal change in the value of a function to the infinitesimal change in a function.
  2. Symbolic: the derivative of x^n is nx^{n-1}, the derivative of \sin(x) is \cos(x), the derivative of f \circ g is f' \circ g * g', etc.
  3. Logical: f'(x) = d if and only if for every \epsilon there is a \delta such that when 0 < |\Delta x| < \delta,

    \left|\frac{f(x+\Delta x) - f(x)}{\Delta x} - d\right| < \epsilon.

  4. Geometric: the derivative is the slope of a line tangent to the graph of the function, if the graph has a tangent.
  5. Rate: the instantaneous speed of f(t), when t is time.
  6. Approximation: The derivative of a function is the best linear approximation to the function near a point.
  7. Microscopic: The derivative of a function is the limit of what you get by looking at it under a microscope of higher and higher power.

This is a list of different ways of thinking about or conceiving of the derivative, rather than a list of different logical definitions. Unless great efforts are made to maintain the tone and flavor of the original human insights, the differences start to evaporate as soon as the mental concepts are translated into precise, formal and explicit definitions.

I can remember absorbing each of these concepts as something new and interesting, and spending a good deal of mental time and effort digesting and practicing with each, reconciling it with the others. I also remember coming back to revisit these different concepts later with added meaning and understanding.

The list continues; there is no reason for it ever to stop. A sample entry further down the list may help illustrate this. We may think we know all there is to say about a certain subject, but new insights are around the corner. Furthermore, one person’s clear mental image is another person’s intimidation:

  1. The derivative of a real-valued function f in a domain D is the Lagrangian section of the cotangent bundle T^{\ast}(D) that gives the connection form for the unique flat connection on the trivial \mathbf{R}-bundle D \times \mathbf{R} for which the graph of f is parallel.

These differences are not just a curiosity. Human thinking and understanding do not work on a single track, like a computer with a single central processing unit. Our brains and minds seem to be organized into a variety of separate, powerful facilities. These facilities work together loosely, “talking” to each other at high levels rather than at low levels of organization.

This has been extended on the MathOverflow question Different ways of thinking about the derivative, where you can find even more ways of thinking about the derivative. (Two of the interesting pointers are to this discussion on the n-Category Café, and to the book Calculus Unlimited by Marsden and Weinstein, which does calculus using a “method of exhaustion” that does not involve limits. Its definition of the derivative is also mentioned at the earlier link, as the notion of the derivative closest to [the idea of Eudoxus and Archimedes] of “the tangent line touches the curve, and in the space between the line and the curve, no other straight line can be interposed”, or “the line which touches the curve only once”; this counts as another important way of thinking about the derivative.)

It has also been extended by Terence Tao, who in an October 2009 blog post on Grothendieck’s definition of a group gave several ways of thinking about a group:

In his wonderful article “On proof and progress in mathematics“, Bill Thurston describes (among many other topics) how one’s understanding of given concept in mathematics (such as that of the derivative) can be vastly enriched by viewing it simultaneously from many subtly different perspectives; in the case of the derivative, he gives seven standard such perspectives (infinitesimal, symbolic, logical, geometric, rate, approximation, microscopic) and then mentions a much later perspective in the sequence (as describing a flat connection for a graph).

One can of course do something similar for many other fundamental notions in mathematics. For instance, the notion of a group {G} can be thought of in a number of (closely related) ways, such as the following:

  1. Motivating examples: A group is an abstraction of the operations of addition/subtraction or multiplication/division in arithmetic or linear algebra, or of composition/inversion of transformations.
  2. Universal algebraic: A group is a set {G} with an identity element {e}, a unary inverse operation {\cdot^{-1}: G \rightarrow G}, and a binary multiplication operation {\cdot: G \times G \rightarrow G} obeying the relations (or axioms) {e \cdot x = x \cdot e = x}, {x \cdot x^{-1} = x^{-1} \cdot x = e}, {(x \cdot y) \cdot z = x \cdot (y \cdot z)} for all {x,y,z \in G}.
  3. Symmetric: A group is all the ways in which one can transform a space {V} to itself while preserving some object or structure {O} on this space.
  4. Representation theoretic: A group is identifiable with a collection of transformations on a space {V} which is closed under composition and inverse, and contains the identity transformation.
  5. Presentation theoretic: A group can be generated by a collection of generators subject to some number of relations.
  6. Topological: A group is the fundamental group {\pi_1(X)} of a connected topological space {X}.
  7. Dynamic: A group represents the passage of time (or of some other variable(s) of motion or action) on a (reversible) dynamical system.
  8. Category theoretic: A group is a category with one object, in which all morphisms have inverses.
  9. Quantum: A group is the classical limit {q \rightarrow 0} of a quantum group.

One can view a large part of group theory (and related subjects, such as representation theory) as exploring the interconnections between various of these perspectives. As one’s understanding of the subject matures, many of these formerly distinct perspectives slowly merge into a single unified perspective.

From a recent talk by Ezra Getzler, I learned a more sophisticated perspective on a group, somewhat analogous to Thurston’s example of a sophisticated perspective on a derivative (and coincidentally, flat connections play a central role in both):

  1. Sheaf theoretic: A group is identifiable with a (set-valued) sheaf on the category of simplicial complexes such that the morphisms associated to collapses of {d}-simplices are bijective for {d < 1} (and merely surjective for {d \leq 1}).

The rest of the post elaborates on this understanding.

Again, in a Google Buzz post on Jun 9, 2010, Tao posted the following:

Bill Thurston’s “On proof and progress in mathematics” has many nice observations about the nature and practice of modern mathematics. One of them is that for any fundamental concept in mathematics, there is usually no “best” way to define or think about that concept, but instead there is often a family of interrelated and overlapping, but distinct, perspectives on that concept, each of which conveying its own useful intuition and generalisations; often, the combination of all of these perspectives is far greater than the sum of the parts. Thurston illustrates this with the concept of differentiation, to which he lists seven basic perspectives and one more advanced perspective, and hints at dozens more.

But even the most basic of mathematical concepts admit this multiplicity of interpretation and perspective. Consider for instance the operation of addition, that takes two numbers x and y and forms their sum x+y. There are many such ways to interpret this operation:

1. (Disjoint union) x+y is the “size” of the disjoint union X u Y of an object X of size x, and an object Y of size y. (Size is, of course, another concept with many different interpretations: cardinality, volume, mass, length, measure, etc.)

2. (Concatenation) x+y is the size of the object formed by concatenating an object X of size x with an object Y of size y (or by appending Y to X).

3. (Iteration) x+y is formed from x by incrementing it y times.

4. (Superposition) x+y is the “strength” of the superposition of a force (or field, intensity, etc.) of strength x with a force of strength y.

5. (Translation action) x+y is the translation of x by y.

5a. (Translation representation) x+y is the amount of translation or displacement incurred by composing a translation by x with a translation by y.

6. (Algebraic) + is a binary operation on numbers that give it the structure of an additive group (or monoid), with 0 being the additive identity and 1 being the generator of the natural numbers or integers.

7. (Logical) +, when combined with the other basic arithmetic operations, are a family of structures on numbers that obey a set of axioms such as the Peano axioms.

8. (Algorithmic) x+y is the output of the long addition algorithm that takes x and y as input.

9. etc.

These perspectives are all closely related to each other; this is why we are willing to give them all the common name of “addition”, and the common symbol of “+”. Nevertheless there are some slight differences between each perspective. For instance, addition of cardinals is based on perspective 1, while addition of ordinals is based on perspective 2. This distinction becomes apparent once one considers infinite cardinals or ordinals: for instance, in cardinal arithmetic, aleph_0 = 1+ aleph_0 = aleph_0 + 1 = aleph_0 + aleph_0, whereas in ordinal arithmetic, omega = 1+omega < omega+1 < omega + omega.

Transitioning from one perspective to another is often a necessary first conceptual step when the time comes to generalise the concept. As a child, addition of natural numbers is usually taught initially by using perspective 1 or 3, but to generalise to addition of integers, one must first switch to a perspective such as 4, 5, or 5a; similar conceptual shifts are needed when one then turns to addition of rationals, real numbers, complex numbers, residue classes, functions, matrices, elements of abstract additive groups, nonstandard number systems, etc. Eventually, one internalises all of the perspectives (and their inter-relationships) simultaneously, and then becomes comfortable with the addition concept in a very broad set of contexts; but it can be more of a struggle to do so when one has grasped only a subset of the possible ways of thinking about addition.

In many situations, the various perspectives of a concept are either completely equivalent to each other, or close enough to equivalent that one can safely “abuse notation” by identifying them together. But occasionally, one of the equivalences breaks down, and then it becomes useful to maintain a careful distinction between two perspectives that are almost, but not quite, compatible. Consider for instance the following ways of interpreting the operation of exponentiation x^y of two numbers x, y:

1. (Combinatorial) x^y is the number of ways to make y independent choices, each of which chooses from x alternatives.

2. (Set theoretic) x^y is the size of the space of functions from a set Y of size y to a set X of size x.

3. (Geometric) x^y is the volume (or measure) of a y-dimensional cube (or hypercube) whose sidelength is x.

4. (Iteration) x^y is the operation of starting at 1 and multiplying by x y times.

5. (Homomorphism) y → x^y is the continuous homomorphism from the domain of y (with the additive group structure) to the range of x^y (with the multiplicative structure) that maps 1 to x.

6. (Algebraic) ^ is the operation that obeys the laws of exponentiation in algebra.

7. (Log-exponential) x^y is exp( y log x ). (This raises the question of how to interpret exp and log, and again there are multiple perspectives for each…)

8. (Complex-analytic) Complex exponentiation is the analytic continuation of real exponentiation.

9. (Computational) x^y is whatever my calculator or computer outputs when it is asked to evaluate x^y.

10. etc.

Again, these interpretations are usually compatible with each other, but there are some key exceptions. For instance, the quantity 0^0 would be equal to zero [ed: I think this should be one —S] using some of these interpretations, but would be undefined in others. The quantity 4^{1/2} would be equal to 2 in some interpretations, be undefined in others, and be equal to the multivalued expression +-2 (or to depend on a choice of branch) in yet further interpretations. And quantities such as i^i are sufficiently problematic that it is usually best to try to avoid exponentiation of one arbitrary complex number by another arbitrary complex number unless one knows exactly what one is doing. In such situations, it is best not to think about a single, one-size-fits-all notion of a concept such as exponentiation, but instead be aware of the context one is in (e.g. is one raising a complex number to an integer power? A positive real to a complex power? A complex number to a fractional power? etc.) and to know which interpretations are most natural for that context, as this will help protect against making errors when manipulating expressions involving exponentiation.

It is also quite instructive to build one’s own list of interpretations for various basic concepts, analogously to those above (or Thurston’s example). Some good examples of concepts to try this on include “multiplication”, “integration”, “function”, “measure”, “solution”, “space”, “size”, “distance”, “curvature”, “number”, “convergence”, “probability” or “smoothness”. See also my blog post below in which the concept of a “group” is considered.

I plan to collect more such “different ways of thinking about the same (mathematical) thing” in this post, as I encounter them.

Written by S

Sat, 2016-03-26 at 10:05:09

Posted in mathematics, quotes

The same in every country


(TODO: Learn and elaborate more on their respective histories and goals.)

The formula
\frac{\pi}{4} = 1 - \frac13 + \frac15 - \frac17 + \frac19 - \frac1{11} + \dots
(reminded via this post), a special case at x=1 of
\arctan x = x - \frac{x}3 + \frac{x}5 - \frac{x}7 + \dots,

was found by Leibniz in 1673, while he was trying to find the area (“quadrature”) of a circle, and he had as prior work the ideas of Pascal on infinitesimal triangles, and that of Mercator on the area of the hyperbola y(1+x) = 1 with its infinite series for \log(1+x). This was Leibniz’s first big mathematical work, before his more general ideas on calculus.
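As a quick numerical check of the series (a sketch; convergence is notoriously slow, with the error of a partial sum bounded by the first omitted term):

```python
import math

def leibniz_pi_over_4(terms):
    """Partial sum of 1 - 1/3 + 1/5 - 1/7 + ... with the given number of terms."""
    return sum((-1) ** k / (2 * k + 1.0) for k in range(terms))

# Alternating series bound: after 10^5 terms the error in pi is below 4/(2*10^5).
print(abs(4 * leibniz_pi_over_4(100000) - math.pi) < 1e-4)  # → True
```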

Leibniz did not know that this series had already been discovered earlier in 1671 by the short-lived mathematician James Gregory in Scotland. Gregory too had encountered Mercator’s infinite series \log(1+x) = x - x^2/2 + x^3/3 + \dots, and was working on different goals: he was trying to invert logarithmic and trigonometric functions.

Neither of them knew that the series had already been found two centuries earlier by Mādhava (1340–1425) in India (as known through the quotations of Nīlakaṇṭha c.1500), working in a completely different mathematical culture whose goals and practices were very different. The logarithm function doesn’t seem to have been known, let alone an infinite series for it, though a calculus of finite differences for interpolation for trigonometric functions seems to have been ahead of Europe by centuries (starting all the way back with Āryabhaṭa in c. 500 and more clearly stated by Bhāskara II in 1150). Using a different approach (based on the arc of a circle) and geometric series and sums-of-powers, Mādhava (or the mathematicians of the Kerala tradition) arrived at the same formula.

[The above is based on The Discovery of the Series Formula for π by Leibniz, Gregory and Nilakantha by Ranjan Roy (1990).]

This startling universality of mathematics across different cultures is what David Mumford remarks on, in Why I am a Platonist:

As Littlewood said to Hardy, the Greek mathematicians spoke a language modern mathematicians can understand, they were not clever schoolboys but were “fellows of a different college”. They were working and thinking the same way as Hardy and Littlewood. There is nothing whatsoever that needs to be adjusted to compensate for their living in a different time and place, in a different culture, with a different language and education from us. We are all understanding the same abstract mathematical set of ideas and seeing the same relationships.

The same thought was also expressed by Mean Girls.

Written by S

Tue, 2016-03-15 at 13:53:32

Posted in history, mathematics

The generating function for Smirnov words (or: time until k consecutive results are the same)


1. Alphabet

Suppose we have an alphabet {\mathcal{A}} of size {m}. Its generating function (using the variable {z} to mark length) is simply {A(z) = mz}, as {\mathcal{A}} contains {m} elements of length {1} each.

2. Words

Let {\mathcal{W}} denote the class of all words over the alphabet {\mathcal{A}}. There are many ways to find the generating function {W(z)} for {\mathcal{W}}.


We have

\displaystyle \mathcal{W} = \{\epsilon\} + \mathcal{A} + \mathcal{A}\mathcal{A} + \mathcal{A}\mathcal{A}\mathcal{A} + \dots

so its generating function is

\displaystyle \begin{aligned} W(z) &= 1 + A(z) + A(z)^2 + A(z)^3 + \dots \\ &= 1 + mz + (mz)^2 + (mz)^3 + \dots \\ &= \frac{1}{1-mz} \end{aligned}


To put it differently, in the symbolic framework, we have {\mathcal{W} = \textsc{Seq}(\mathcal{A})}, so the generating function for {\mathcal{W}} is

\displaystyle W(z) = \frac{1}{1 - A(z)} = \frac{1}{1-mz}.


We could have arrived at this with direct counting: the number of words of length {n} is {W_n = m^n} as there are {m} choices for each of the {n} letters, so the generating function is

\displaystyle W(z) = \sum_{n \ge 0}W_n z^n = \sum_{n \ge 0} m^n z^n = \frac{1}{1-mz}.

3. Smirnov words

Next, let {\mathcal{S}} denote the class of Smirnov words over the alphabet {\mathcal{A}}, defined as words in which no two consecutive letters are identical. (That is, words {w_1w_2 \dots w_n} in which {w_i \in \mathcal{A}} for all {i}, and {w_i \neq w_{i-1}} for any {1 < i \le n}.) Again, we can find the generating function for {\mathcal{S}} in different ways.


For any word in {\mathcal{W}}, by “collapsing” all runs of each letter, we get a Smirnov word. To put it differently, any word in {\mathcal{W}} can be obtained from a Smirnov word {w} by “expanding” each letter {w_i} into a nonempty sequence of that letter. This observation (see Analytic Combinatorics, pp. 204–205) lets us relate the generating functions of {\mathcal{W}} and {\mathcal{S}} as

\displaystyle W(z) = S(\frac{z}{1-z})

which implicitly gives the generating function {S(z)}: we have

\displaystyle S(z) = W(\frac{z}{1+z}) = \frac{1}{1-m\frac{z}{1+z}} = \frac{1+z}{1 - (m-1)z}.


Alternatively, consider in an arbitrary word the first occurrence of a pair of repeated letters. Either this doesn’t happen at all (the word is a Smirnov word), or else, if it happens at position {i} so that {w_i = w_{i+1}}, then the part of the word up to position {i} is a nonempty Smirnov word, the letter at position {i+1} is the same as the previous letter, and everything after {i+1} is an arbitrary word. This gives

\displaystyle \mathcal{W} = \mathcal{S} + (\mathcal{S} \setminus \{ \epsilon \}) \cdot \mathcal{Z} \cdot \mathcal{W}

or in terms of generating functions

\displaystyle W(z) = S(z) + (S(z) - 1)zW(z)


\displaystyle S(z) = \frac{W(z) (1 + z)}{1 + zW(z)} = \frac{1 + z}{(1-mz)(1 + \frac{z}{1-mz})} = \frac{1+z}{1 - (m-1)z}


A minor variant is to again pick an arbitrary word and consider its first pair of repeated letters, happening (if it does) at positions {i} and {i+1}, but this time consider the prefix up to {i -1}: either it is empty, or the pair of letters is different from the last letter of the prefix, giving us the decomposition

\displaystyle \mathcal{W} = \mathcal{S} + m\mathcal{Z}^2 \cdot \mathcal{W} + (\mathcal{S}\setminus \{ \epsilon \}) \cdot (m-1)\mathcal{Z}^2 \mathcal{W}

and corresponding generating function

\displaystyle W(z) = S(z) + mz^2W(z) + (S(z) - 1)(m-1)z^2W(z)


\displaystyle S(z) = \frac{W(z)(1-z^2)}{1 + (m-1)z^2W(z)} = \frac{1-z^2}{1 - mz + (m-1)z^2} = \frac{(1-z)(1+z)}{(1-z)(1 - (m-1)z)}

which is the same as before after we cancel the {(1-z)} factors.


We could have arrived at this result with direct counting. For {n \ge 1}, for a Smirnov word of length {n}, we have {m} choices for the first letter, and for each of the other {(n-1)} letters, as they must not be the same as the previous letter, we have {(m-1)} choices. This gives the number of Smirnov words of length {n} as {m (m-1)^{n-1}} for {n \ge 1}, and so the generating function for Smirnov words is

\displaystyle S(z) = 1 + \sum_{n \ge 1} m (m-1)^{n-1} z^n = 1 + mz \sum_{n \ge 1} (m-1)^{n-1}z^{n-1} = 1 + \frac{mz}{1-(m-1)z}

again giving

\displaystyle S(z) = \frac{1 + z}{1 - (m-1)z}
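The closed form can be checked against a brute-force count (a small sketch; the helper name count_smirnov is mine). The generating function predicts m(m-1)^{n-1} Smirnov words of length n ≥ 1:

```python
from itertools import product

def count_smirnov(m, n):
    """Count length-n words over an m-letter alphabet in which
    no two consecutive letters are equal."""
    return sum(1 for w in product(range(m), repeat=n)
               if all(a != b for a, b in zip(w, w[1:])))

m, n = 3, 5
print(count_smirnov(m, n) == m * (m - 1) ** (n - 1))  # → True
```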

4. Words with bounded runs

We can now generalize. Let {\mathcal{S}_k} denote the class of words in which no letter occurs more than {k} times consecutively. ({\mathcal{S} = \mathcal{S}_1}.) We can find the generating function for {\mathcal{S}_k}.


To get a word in {\mathcal{S}_k} we can take a Smirnov word and replace each letter with a nonempty sequence of up to {k} occurrences of that letter. This gives:

\displaystyle S_k(z) = S(z + z^2 + \dots + z^k) = S(z\frac{1-z^{k}}{1-z})


\displaystyle S_k(z) = \frac{1 + z\frac{1-z^{k}}{1-z}}{1 - (m-1)z\frac{1-z^{k}}{1-z}} = \frac{1 - z^{k+1}}{1 - mz + (m-1)z^{k+1}}.


Pick any arbitrary word, and consider its first occurrence of a run of {k+1} letters. Either such a run does not exist (which means the word we picked is in {\mathcal{S}_k}), or it occurs right at the beginning ({m} possibilities, one for each letter in the alphabet), or, if it occurs starting at position {i > 1}, then the part of the word up to position {i-1} (the “prefix”) is a nonempty Smirnov word, positions {i} to {i+k} are {k+1} occurrences of any of the {m-1} letters other than the last letter of the prefix, and what follows is an arbitrary word. This gives

\displaystyle \mathcal{W} = \mathcal{S}_k + m\mathcal{Z}^{k+1} \cdot \mathcal{W} + (\mathcal{S}_k \setminus \{ \epsilon \}) \cdot (m-1)\mathcal{Z}^{k+1} \cdot \mathcal{W}

or in terms of generating functions

\displaystyle W(z) = S_k(z) + mz^{k+1}W(z) + (S_k(z) - 1)(m-1)z^{k+1}W(z)


\displaystyle W(z)(1 - z^{k+1}) = S_k(z) (1 + (m-1)z^{k+1} W(z))


\displaystyle S_k(z) = \frac{W(z)(1-z^{k+1})}{1 + (m-1)z^{k+1}W(z)} = \frac{1-z^{k+1}}{1-mz + (m-1)z^{k+1}}


Arriving at this via direct counting seems hard.
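Though a direct count is awkward, we can still verify the closed form numerically: expand S_k(z) = (1 - z^{k+1})/(1 - mz + (m-1)z^{k+1}) as a power series and compare coefficients against brute-force enumeration. This is a sketch; the helper names are mine.

```python
from itertools import groupby, product

def series_coeffs(num, den, n):
    """First n+1 Taylor coefficients of num(z)/den(z), coefficient lists
    in increasing powers of z; requires den[0] == 1."""
    num = num + [0] * (n + 1 - len(num))
    c = []
    for i in range(n + 1):
        c.append(num[i] - sum(den[j] * c[i - j]
                              for j in range(1, min(i, len(den) - 1) + 1)))
    return c

def count_bounded_runs(m, n, k):
    """Count length-n words over m letters with no run longer than k."""
    return sum(1 for w in product(range(m), repeat=n)
               if all(sum(1 for _ in g) <= k for _, g in groupby(w)))

m, k = 3, 2
num = [1] + [0] * k + [-1]               # 1 - z^(k+1)
den = [1, -m] + [0] * (k - 1) + [m - 1]  # 1 - m z + (m-1) z^(k+1)
coeffs = series_coeffs(num, den, 6)
print(coeffs == [count_bounded_runs(m, n, k) for n in range(7)])  # → True
```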

5. Words that stop at a long run

Now consider words in which we “stop” as soon we see {k} consecutive identical letters. Let the class of such words be denoted {\mathcal{U}} (not writing {\mathcal{U}_k} to keep the notation simple). As before, we can find its generating function in multiple ways.


We get any word in {\mathcal{U}} by either immediately seeing a run of length {k} and stopping, or by starting with a nonempty prefix in {\mathcal{S}_{k-1}}, and then stopping with a run of {k} identical letters different from the last letter of the prefix. Thus we have

\displaystyle \mathcal{U} = m \mathcal{Z}^k + (\mathcal{S}_{k-1} \setminus \{\epsilon\}) \cdot (m-1)\mathcal{Z}^k


\displaystyle U(z) = m z^k + (S_{k-1}(z) - 1) (m-1) z^k

which gives

\displaystyle U(z) = z^k(1 + (m-1)S_{k-1}(z)) = z^k\left(1+(m-1)\frac{1-z^k}{1-mz+(m-1)z^k}\right) = \frac{m(1-z)z^k}{1 - mz + (m-1)z^k}


Alternatively, we can decompose any word by looking for its first run of {k} identical letters. Either it doesn’t occur at all (the word we picked is in {\mathcal{S}_{k-1}}), or the part of the word until the end of the run belongs to {\mathcal{U}} and the rest is an arbitrary word, so

\displaystyle \mathcal{W} = \mathcal{S}_{k-1} + \mathcal{U} \cdot \mathcal{W}


\displaystyle W(z) = S_{k-1}(z) + U(z) W(z)


\displaystyle U(z) = 1 - \frac{S_{k-1}(z)}{W(z)} = 1 - \frac{(1-z^k)(1-mz)}{1-mz + (m-1)z^k} = \frac{m(1-z)z^k}{1 - mz + (m-1)z^k}
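As a sanity check on U(z): for m = 3, k = 2 the closed form m(1-z)z^k/(1 - mz + (m-1)z^k) factors as 3(1-z)z^2/((1-z)(1-2z)) = 3z^2/(1-2z), predicting 3·2^{n-2} such words of length n ≥ 2. A brute-force count agrees (a sketch; helper names are mine):

```python
from itertools import product

def count_stopping_words(m, n, k):
    """Count length-n words over m letters whose first run of k equal
    consecutive letters ends exactly at the last position."""
    def first_run_end(w):
        run, last = 0, None
        for i, c in enumerate(w):
            run = run + 1 if c == last else 1
            last = c
            if run == k:
                return i + 1
        return None
    return sum(1 for w in product(range(m), repeat=n)
               if first_run_end(w) == n)

print([count_stopping_words(3, n, 2) for n in range(2, 7)])  # → [3, 6, 12, 24, 48]
```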

6. Probability

Finally we arrive at the motivation: suppose we keep appending a random letter from the alphabet, until we encounter the same letter {k} times consecutively. What can we say about the length {X} of the word thus generated? As all sequences of letters are equally likely, the probability of seeing any string of length {n} is {\frac{1}{m^n}}. So in the above generating function {U(z) = \sum_{n} U_n z^n}, the probability of our word having length {n} is {U_n / m^n}, and the probability generating function {P(z)} is therefore {\sum_{n} U_n z^n / m^n}. This {P(z)} can be obtained by replacing {z} with {z/m} in the expression for {U(z)}: we have

\displaystyle P(z) = U(z/m) = \frac{(m-z)z^k}{m^k(1-z) + (m-1)z^k}

In principle, this probability generating function tells us everything about the distribution of the length of the word. For example, its expected length is

\displaystyle \mathop{E}[X] = P'(1) = \frac{m^k - 1}{m - 1}

(See this question on Quora for other powerful ways of finding this expected value directly.)

We can also find its variance, as

\displaystyle \mathop{Var}[X] = P''(1) + P'(1) - P'(1)^2 = \frac{m^{2k} - (2k-1)(m-1)m^k - m}{(m-1)^2}

This variance is really too large to be useful, so what we would really like is the shape of the distribution… to be continued.
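The mean can be sanity-checked by simulation (a sketch; sample_length is my own helper). For m = 2, k = 3 the formula gives E[X] = (2^3 - 1)/(2 - 1) = 7:

```python
import random

def sample_length(m, k, rng):
    """Append uniform random letters from an m-letter alphabet until
    the last k letters are identical; return the word's length."""
    last, run, length = None, 0, 0
    while run < k:
        c = rng.randrange(m)
        run = run + 1 if c == last else 1
        last = c
        length += 1
    return length

rng = random.Random(12345)
trials = 20000
mean = sum(sample_length(2, 3, rng) for _ in range(trials)) / float(trials)
print(mean)  # should be close to (2**3 - 1)/(2 - 1) = 7
```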

Written by S

Sun, 2016-01-03 at 03:06:23

Posted in mathematics

Some playing with Python


A long time ago, Diophantus (sort of) discussed integer solutions to the equation

\displaystyle  x^2 + y^2 = z^2

(solutions to this equation are called Pythagorean triples).

Centuries later, in 1637, Fermat made a conjecture (now called Fermat’s Last Theorem, not because he uttered it in his dying breath, but because it was the last one to be proved — in ~1995) that

\displaystyle x^n + y^n = z^n

has no positive integer solutions for n \ge 3. In other words, his conjecture was that none of the following equations has a solution:

\displaystyle x^3 + y^3 = z^3

\displaystyle x^4 + y^4 = z^4

\displaystyle x^5 + y^5 = z^5

\displaystyle x^6 + y^6 = z^6

… and so on. An nth power cannot be partitioned into two nth powers.

About a century later, Euler proved the n = 3 case of Fermat’s conjecture, but generalized it in a different direction: he conjectured in 1769 that an nth power cannot be partitioned into fewer than n nth powers, namely

\displaystyle z^n = \sum_{i = 1}^k x_i^n

has no solutions with k < n. So his conjecture was that (among others) none of the following equations has a solution:

\displaystyle z^3 = a^3 + b^3

\displaystyle z^4 = a^4 + b^4 + c^4

\displaystyle z^5 = a^5 + b^5 + c^5 + d^5

\displaystyle z^6 = a^6 + b^6 + c^6 + d^6 + e^6

… and so on.

This conjecture stood for about two centuries, until abruptly it was found to be false, by Lander and Parkin who in 1966 simply did a direct search on the fastest (super)computer at the time, and found this counterexample:

\displaystyle 27^5 + 84^5 + 110^5 + 133^5 = 144^5

(It is still one of only three examples known, according to Wikipedia.)

Now, how might you find this solution on a computer today?

In his wonderful (as always) post at bit-player, Brian Hayes showed the following code:

import itertools as it

def four_fifths(n):
    '''Return smallest positive integers ((a,b,c,d),e) such that
       a^5 + b^5 + c^5 + d^5 = e^5; if no such tuple exists
       with e < n, return the string 'Failed'.'''
    fifths = [x**5 for x in range(n)]
    combos = it.combinations_with_replacement(range(1,n), 4)
    while True:
        try:
            cc = combos.next()
            cc_sum = sum([fifths[i] for i in cc])
            if cc_sum in fifths:
                return(cc, fifths.index(cc_sum))
        except StopIteration:
            return('Failed')
to which, if you add (say) print four_fifths(150) and run it, it returns the correct answer fairly quickly: in about 47 seconds on my laptop.

The if cc_sum in fifths: line inside the loop is an O(n) cost each time it’s run, so with a simple improvement to the code (using a set instead) and rewriting it a bit, we can write the following full program:

import itertools

def find_counterexample(n):
  fifth_powers = [x**5 for x in range(n)]
  fifth_powers_set = set(fifth_powers)
  for xs in itertools.combinations_with_replacement(range(1, n), 4):
    xs_sum = sum([fifth_powers[i] for i in xs])
    if xs_sum in fifth_powers_set:
      return (xs, fifth_powers.index(xs_sum))
  return 'Failed'

print find_counterexample(150)

which finishes in about 8.5 seconds.


But there’s something unsatisfying about this solution: it assumes there’s a solution with all four numbers on the LHS less than 150. After all, changing the function invocation to find_counterexample(145) even makes it run a second faster, but how could we know to do that without already knowing the solution? Besides, we don’t have a fixed 8- or 10-second budget; what we’d really like is a program that keeps searching till it finds a solution or we abort it (or it runs out of memory or something), with no other fixed termination condition.

The above program used the given “n” as an upper bound to generate the combinations of 4 numbers; is there a way to generate all combinations when we don’t know an upper bound on them?

Yes! One of the things I learned from Knuth volume 4 is that if you simply write down each combination in descending order and order them lexicographically, the combinations you get for each upper bound are a prefix of the list of the next bigger one, i.e., for any upper bound, all the combinations form a prefix of the same infinite list, which starts as follows (line breaks for clarity):

2111, 2211, 2221, 2222,
3111, 3211, 3221, 3222, 3311, 3321, 3322, 3331, 3332, 3333,
4111, ...
     ... 9541, 9542, 9543, 9544, 9551, ... 9555, 9611, ...

There doesn’t seem to be a library function in Python to generate these though, so we can write our own. If we stare at the above list, we can figure out how to generate the next combination from a given one:

  1. Walk backwards from the end, till you reach the beginning or find an element that’s less than the previous one.
  2. Increase that element, set all the following elements to 1s, and continue.

We could write, say, the following code for it:

def all_combinations(r):
  xs = [1] * r
  while True:
    yield xs
    for i in range(r - 1, 0, -1):
      if xs[i] < xs[i - 1]:
        break
    else:
      i = 0
    xs[i] += 1
    xs[i + 1:] = [1] * (r - i - 1)

(The else block on a for loop is an interesting Python feature: it is executed if the loop wasn’t terminated with break.) We could even hard-code the r=4 case, as we’ll see later below.

For testing whether a given number is a fifth power, we can no longer simply look up in a fixed precomputed set. We can do a binary search instead:

def is_fifth_power(n):
  assert n > 0
  lo = 0
  hi = n
  # Invariant: lo^5 < n <= hi^5
  while hi - lo > 1:
    mid = lo + (hi - lo) / 2
    if mid ** 5 < n:
      lo = mid
    else:
      hi = mid
  return hi ** 5 == n

but it turns out that this is slower than one based on looking up in a growing set (as below).

Putting everything together, we can write the following (very C-like) code:

largest_known_fifth_power = (0, 0)
known_fifth_powers = set()
def is_fifth_power(n):
  global largest_known_fifth_power
  while n > largest_known_fifth_power[0]:
    m = largest_known_fifth_power[1] + 1
    m5 = m ** 5
    largest_known_fifth_power = (m5, m)
    known_fifth_powers.add(m5)
  return n in known_fifth_powers

def fournums_with_replacement():
  (x0, x1, x2, x3) = (1, 1, 1, 1)
  while True:
    yield (x0, x1, x2, x3)
    if x3 < x2:
      x3 += 1
      continue
    x3 = 1
    if x2 < x1:
      x2 += 1
      continue
    x2 = 1
    if x1 < x0:
      x1 += 1
      continue
    x1 = 1
    x0 += 1

if __name__ == '__main__':
  tried = 0
  for get in fournums_with_replacement():
    tried += 1
    if (tried % 1000000 == 0):
      print tried, 'Trying:', get
    rhs = get[0]**5 + get[1]**5 + get[2]**5 + get[3]**5
    if is_fifth_power(rhs):
      print 'Found:', get, rhs

which is both longer and slower (takes about 20 seconds) than the original program, but at least we have the satisfaction that it doesn’t depend on any externally known upper bound.

I originally started writing this post because I wanted to describe some experiments I did with profiling, but it’s late and I’m sleepy so I’ll just mention it.

python -m cProfile

will print relevant output in the terminal:

         26916504 function calls in 26.991 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   18.555   18.555   26.991   26.991
 13458164    4.145    0.000    4.145    0.000
 13458163    4.292    0.000    4.292    0.000
      175    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Another way to view the same thing is to write the profile output to a file and read it with cprofilev:

python -m cProfile -o euler_profile.out
cprofilev euler_profile.out

and visit http://localhost:4000 to view it.

Of course, simply translating this code to C++ makes it run much faster:

#include <array>
#include <iostream>
#include <map>
#include <utility>

typedef long long Int;
constexpr Int fifth_power(Int x) { return x * x * x * x * x; }

std::map<Int, int> known_fifth_powers = {{0, 0}};
bool is_fifth_power(Int n) {
  while (n > known_fifth_powers.rbegin()->first) {
    int m = known_fifth_powers.rbegin()->second + 1;
    known_fifth_powers[fifth_power(m)] = m;
  }
  return known_fifth_powers.count(n);
}

std::array<Int, 4> four_nums() {
  static std::array<Int, 4> x = {1, 1, 1, 0};
  int i = 3;
  while (i > 0 && x[i] == x[i - 1]) --i;
  x[i] += 1;
  while (++i < 4) x[i] = 1;
  return x;
}

std::ostream& operator<<(std::ostream& os, std::array<Int, 4> x) {
  os << "(" << x[0] << ", " << x[1] << ", " << x[2] << ", " << x[3] << ")";
  return os;
}

int main() {
  while (true) {
    std::array<Int, 4> get = four_nums();
    Int rhs = fifth_power(get[0]) + fifth_power(get[1]) + fifth_power(get[2]) + fifth_power(get[3]);
    if (is_fifth_power(rhs)) {
      std::cout << "Found: " << get << " " << known_fifth_powers[rhs] << std::endl;
    }
  }
}

clang++ -std=c++11 && time ./a.out

runs in 2.43s, or 0.36s if compiled with -O2.

But I don’t have a satisfactory answer to how to make our Python program which takes 20 seconds as fast as the 8.5-second known-upper-bound version.

Edit [2015-05-08]: I wrote some benchmarking code to compare all the different “combination” functions.

import itertools

# Copied from the Python documentation
def itertools_equivalent(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)

# Above function, specialized to first argument being range(1, n)
def itertools_equivalent_specialized(n, r):
  indices = [1] * r
  yield indices
  while True:
    for i in reversed(range(r)):
      if indices[i] != n - 1:
        break
    else:
      return
    indices[i:] = [indices[i] + 1] * (r - i)
    yield indices

# Function to generate all combinations of 4 elements
def all_combinations_pythonic(r):
  xs = [1] * r
  while True:
    yield xs
    for i in range(r - 1, 0, -1):
      if xs[i] < xs[i - 1]:
        break
    else:
      i = 0
    xs[i] += 1
    xs[i + 1:] = [1] * (r - i - 1)

# Above function, written in a more explicit C-like way
def all_combinations_clike(r):
  xs = [1] * r
  while True:
    yield xs
    i = r - 1
    while i > 0 and xs[i] == xs[i - 1]:
      i -= 1
    xs[i] += 1
    while i < r - 1:
      i += 1
      xs[i] = 1

# Above two functions, specialized to r = 4, using tuple over list.
def fournums():
  (x0, x1, x2, x3) = (1, 1, 1, 1)
  while True:
    yield (x0, x1, x2, x3)
    if x3 < x2:
      x3 += 1
      continue
    x3 = 1
    if x2 < x1:
      x2 += 1
      continue
    x2 = 1
    if x1 < x0:
      x1 += 1
      continue
    x1 = 1
    x0 += 1

# Benchmarks for all functions defined above (and the library function)
def benchmark_itertools(n):
  for xs in itertools.combinations_with_replacement(range(1, n), 4):
    if xs[0] >= n:
      break
def benchmark_itertools_try(n):
  combinations = itertools.combinations_with_replacement(range(1, n), 4)
  while True:
    try:
      xs = next(combinations)
      if xs[0] >= n:
        break
    except StopIteration:
      break
def benchmark_itertools_equivalent(n):
  for xs in itertools_equivalent(range(1, n), 4):
    if xs[0] >= n:
      break
def benchmark_itertools_equivalent_specialized(n):
  for xs in itertools_equivalent_specialized(n, 4):
    if xs[0] >= n:
      break
def benchmark_all_combinations_pythonic(n):
  for xs in all_combinations_pythonic(4):
    if xs[0] >= n:
      break
def benchmark_all_combinations_clike(n):
  for xs in all_combinations_clike(4):
    if xs[0] >= n:
      break
def benchmark_fournums(n):
  for xs in fournums():
    if xs[0] >= n:
      break

if __name__ == '__main__':
  n = 150
  benchmark_itertools(n)
  benchmark_itertools_try(n)
  benchmark_itertools_equivalent(n)
  benchmark_itertools_equivalent_specialized(n)
  benchmark_all_combinations_pythonic(n)
  benchmark_all_combinations_clike(n)
  benchmark_fournums(n)

As you can see, inside each benchmark I used the same statement (the test xs[0] >= n) that causes the unbounded generators to terminate; for the bounded combination functions it never triggers, so it has no effect.
When run with

python -m cProfile

the results include:



  • Calling itertools.combinations_with_replacement is by far the fastest, taking about 2.7 seconds. It turns out that it’s written in C, so this would be hard to beat. (Still, writing it in a try block is seriously bad.)
  • The “equivalent” Python code from the itertools documentation (itertools_equivalent above) is about 50x slower.
  • It gets slightly faster when specialized to plain number ranges (itertools_equivalent_specialized).
  • Simply generating all combinations without an upper bound is actually faster.
  • It can be made even faster by writing it in a more C-like way.
  • The tuples version with the loop unrolled manually is rather fast when seen in this light, less than 4x slower than the library version.

Written by S

Sun, 2015-02-08 at 00:03:38

Colliding balls approximate pi

with 2 comments

Found via G+, a new physical experiment that approximates \pi, like Buffon’s needle problem: The Pi Machine.

Roughly, the amazing discovery of Gregory Galperin is this: When a ball of mass M collides with one of mass m, propelling it towards a wall, the number of collisions (assuming standard physics idealisms) is \lfloor \pi\sqrt{M/m} \rfloor, so by taking M/m = 10^{2n}, we can get the first n+1 digits of \pi. Note that this number of collisions is an entirely deterministic quantity; there’s no probability involved!
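The claim is easy to check numerically. Here is my own throwaway sketch (the name count_collisions is mine, not from the post), which just applies the standard 1-D elastic-collision formulas and counts events:

```python
def count_collisions(mass_ratio):
    """Count collisions when a heavy ball (mass M = mass_ratio * m) strikes
    a light ball (mass m) and drives it towards a wall on the left."""
    M, m = float(mass_ratio), 1.0
    V, v = -1.0, 0.0          # velocities: heavy ball moves towards the wall
    count = 0
    while True:
        if v < 0:             # light ball is heading into the wall: it bounces
            v = -v
            count += 1
        elif v > V:           # balls are approaching: 1-D elastic collision
            V, v = (((M - m) * V + 2 * m * v) / (M + m),
                    ((m - M) * v + 2 * M * V) / (M + m))
            count += 1
        else:                 # both separating, light ball slower: done
            return count

# floor(pi * sqrt(M/m)) collisions, as claimed:
print(count_collisions(100))      # -> 31
print(count_collisions(10 ** 4))  # -> 314
```

(Positions never need to be tracked: after each ball-ball collision the relative velocity reverses, so the order of events is forced and only the velocities matter for counting.)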

Here’s a video demonstrating the fact for M/m = 100 (the blue ball is the heavier one):

The NYT post says how this discovery came about:

Dr. Galperin’s approach was also geometric but very different (using an unfolding geodesic), building on prior related insights. Dr. Galperin, who studied under the well-known Russian mathematician Andrei Kolmogorov and had recently written (with Yakov Sinai) extensively on ball collisions, realized just before a talk in 1995 that a plot of the positions of a pair of colliding balls could be used to determine pi. (When he mentioned this insight in the talk, no one in the audience believed him.) This finding was ultimately published as “Playing Pool With Pi” in a 2003 issue of Regular and Chaotic Dynamics.

The paper, Playing Pool With π (The number π from a billiard point of view), is very readable. The post has, despite a “solution” section, essentially no explanation, but the two comments by Dave in the comments section explain it clearly. And a reader sent in a cleaned-up version of that too, by Benjamin Wearn, who teaches physics at Fieldston School.

Now someone needs to make a simulation / animation graphing the two balls in phase space of momentum. :-)

I’d done something a while ago, to illustrate The Orbit of the Moon around the Sun is Convex!, here. Probably need to re-learn all that JavaScript stuff, to make one for this. Leaving this post here as a placeholder.

Or maybe someone has done it already?

Written by S

Mon, 2014-06-23 at 23:03:18

Posted in mathematics, unfinished

Prefatory apprehension

leave a comment »

Robert Recorde’s 1557 book is noted for being the first to introduce the equals sign =, and is titled:

The Whetstone of Witte: whiche is the seconde parte of Arithmeteke: containing the extraction of rootes; the cossike practise, with the rule of equation; and the workes of Surde Nombers.

Its title page contains this verse:

Original spelling

Though many ſtones doe beare greate price,
The whetſtone is for exerſice
As neadefull, and in woorke as ſtraunge:
Dulle thinges and harde it will ſo chaunge,
And make them ſharpe, to right good vſe:
All arteſmen knowe, thei can not chuſe,
But uſe his helpe: yet as men ſee,
Noe ſharpeneſſe ſemeth in it to bee.

The grounde of artes did brede this ſtone:
His vſe is greate, and moare then one.
Here if you lift your wittes to whette,
Moche ſharpeneſſe thereby ſhall you gette.
Dulle wittes hereby doe greately mende,
Sharpe wittes are fined to their fulle ende.
Now proue, and praiſe, as you doe finde,
And to your ſelf be not vnkinde.

Modern spelling

Though many stones do bear great price,
The whetstone is for exercise
As needful, and in work as strange:
Dull things and hard it will so change
And make them sharp, to right good use:
All artsmen know they cannot choose
But use his help; yet as men see,
No sharpness seemeth in it to be.

The ground of arts did breed this stone;
His use is great, and more than one.
Here if you lift your wits to whet,
Much sharpness thereby shall you get.
Dull wits hereby do greatly mend,
Sharp wits are fined to their full end.
Now prove and praise as you do find,
And to yourself be not unkind.

Apparently the full title contains a pun: “the cossike practise” in the title refers to algebra, as the Latin cosa, apparently meaning “a thing”, was used to stand for an unknown, abbreviated to cos; but the Latin word cos itself means a grindstone.

The author again reminds readers not to blame his book, at the end of his preface:

To the curiouſe ſcanner.

If you ought finde, as ſome men maie,
That you can mende, I ſhall you praie,
To take ſome paine ſo grace maie ſende,
This worke to growe to perfecte ende.

But if you mende not that you blame,
I winne the praiſe, and you the ſhame.
Therfore be wiſe, and learne before,
Sith ſlaunder hurtes it ſelf moſte ſore.

Authors are either anxious about how their book is received, or make sure to be pointedly uncaring.

Sir Arthur Conan Doyle, in a mostly forgettable volume of poetry (Songs of the Road, 1911), begins:

If it were not for the hillocks
   You’d think little of the hills;
The rivers would seem tiny
   If it were not for the rills.
If you never saw the brushwood
   You would under-rate the trees;
And so you see the purpose
   Of such little rhymes as these.

Kālidāsa of course begins his Raghuvaṃśa with a grand disclaimer:

kva sūryaprabhavo vaṃśaḥ kva cālpaviṣayā matiḥ /
titīrṣur dustaram mohād uḍupenāsmi sāgaram // Ragh_1.2 //

mandaḥ kaviyaśaḥ prārthī gamiṣyāmy upahāsyatām /
prāṃśulabhye phale lobhād udbāhur iva vāmanaḥ // Ragh_1.3 //

atha vā kṛtavāgdvāre vaṃśe ‘smin pūrvasūribhiḥ /
maṇau vajrasamutkīrṇe sūtrasyevāsti me gatiḥ // Ragh_1.4 //

But the most nonchalant I’ve seen, thanks to Dr. Ganesh, is this gīti by Śrīkṛṣṇa Brahmatantra Yatīndra of the Parakāla Maṭha, Mysore:

nindatu vā nandatu vā
mandamanīṣā niśamya kṛtim etām
harṣaṃ vā marṣaṃ vā
sarṣapamātram api naiva vindema

Screw you guys. :-)

Written by S

Wed, 2014-05-28 at 23:56:11

Posted in history, mathematics

Big O() notation: a couple of sources

with 2 comments

This post contains, just for future reference, a couple of primary sources relevant to the O (“Big O”) notation:

  1. Some introductory words from Asymptotic Methods in Analysis by de Bruijn
  2. A letter from Donald Knuth on an approach to teaching calculus using this notation.

Read the rest of this entry »

Written by S

Thu, 2014-03-13 at 16:33:20

Visualizing product of permutations

with 4 comments

A simple pedagogical trick that may come in handy: represent a permutation \sigma using arrows (curved lines) from k to \sigma(k) for each k. Then, the product of two permutations can be represented by just putting the two corresponding figures (sets of arrows) one below the other, and following the arrows.

Representing permutations and products of permutations.

Representing permutations and products of permutations.

The figure is from an article called Symmetries by Alain Connes, found via the Wikipedia article on Morley’s trisector theorem (something entirely unrelated to permutations, but the article covers both of them and more).

I’m thinking how one might write a program to actually draw these: if we decide that the “height” of the figure is some h, then each arrow needs to go from some (k, 0) to (\sigma(k), h) (using here the usual screen convention of x coordinate increasing from left to right, and y coordinate increasing from top to bottom). Further, each curve needs to have vertical slope at its two endpoints, so that successive curves can line up smoothly. The constraint on starting point, ending point, and directions at the endpoints defines almost a quadratic Bezier curve, except that here the two directions are parallel. So it’s somewhere between a quadratic and the (usual) cubic Bezier curve, which is given by the start point, end point, and derivatives at the start and end point. (Here we only care about the direction of the derivative; we can pick some arbitrary magnitude to fix the curve: the larger we pick, the more smooth it will look at the ends, at the cost of smoothness in the interior.)

Even knowing the curve, how do we generate an image?
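One way to do it (my own sketch; the name permutation_svg and the h/3 control-point offset are arbitrary choices of mine): emit SVG paths whose cubic Bézier control points sit directly below and above the endpoints, which makes the tangents vertical there, exactly as described above.

```python
def permutation_svg(sigma, spacing=60, h=100):
    """Draw the permutation sigma (a list: k goes to sigma[k]) as SVG curves.

    Each curve runs from (k, 0) to (sigma[k], h); its two control points lie
    straight below / above the endpoints, giving vertical tangents at both
    ends so that stacked diagrams join smoothly.
    """
    paths = []
    for k, sk in enumerate(sigma):
        x0, x1 = (k + 1) * spacing, (sk + 1) * spacing
        d = h / 3.0  # control-point distance: larger = smoother at the ends
        paths.append('<path d="M%g,0 C%g,%g %g,%g %g,%g" '
                     'stroke="black" fill="none"/>'
                     % (x0, x0, d, x1, h - d, x1, h))
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="%d" height="%d">'
            % ((len(sigma) + 1) * spacing, h) + ''.join(paths) + '</svg>')

# e.g. the 3-cycle sending 0 -> 1 -> 2 -> 0:
# open('perm.svg', 'w').write(permutation_svg([1, 2, 0]))
```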

Written by S

Thu, 2014-03-06 at 23:15:44

The idea of logarithms, and the first appearance of e

with 2 comments

[Incomplete post: about 10% written, about 90% left.]

The notion of the number e, the exponential function e^x, and logarithms \log x are often conceptual stumbling blocks even to someone who has an otherwise solid understanding of middle-school mathematics.

Just what is the number e? How was it first calculated / where did it first turn up? Premature exposure to its numerical value

\displaystyle e \approx 2.718281828459045\dots

only serves to deepen the mysteriousness and to make it seem arbitrary.

Here a historical perspective helps: as is often the case, here too, the first appearance is simpler and more well-motivated than the accounts in dry textbooks. This is from this account by Matthew P. Wiener (originally posted on USENET somewhere, as quoted by MJD). I’m just going to quote it directly for now, and edit it later:

Napier, who invented logarithms, more or less worked out a table of logarithms to base \frac1e, as follows:

     0  1  2  3   4   5   6    7    8    9    10 ...
     1  2  4  8  16  32  64  128  256  512  1024 ...

The arithmetic progression in the first row is matched by a geometric progression in the second row. If, by any luck, you happen to wish to multiply 16 by 32, which just happen to be in the bottom row, you can look up their “logs” in the first row, add 4+5 to get 9, and then conclude 16·32=512.

For most practical purposes, this is useless. Napier realized that what one needs to multiply in general is 1+\epsilon for a base—the intermediate values will be much more extensive. For example, with base 1.01, we get:

       0 1.00   1 1.01   2 1.02   3 1.03   4 1.04   5 1.05
       6 1.06   7 1.07   8 1.08   9 1.09  10 1.10  11 1.12
      12 1.13  13 1.14  14 1.15  15 1.16  16 1.17  17 1.18
      18 1.20  19 1.21  20 1.22  21 1.23  22 1.24  23 1.26
      24 1.27  25 1.28  26 1.30  27 1.31  28 1.32  29 1.33
      30 1.35  31 1.36  32 1.37  33 1.39  34 1.40  35 1.42
      50 1.64  51 1.66  52 1.68  53 1.69  54 1.71  55 1.73
      94 2.55  95 2.57  96 2.60  97 2.63  98 2.65  99 2.68
     100 2.70 101 2.73 102 2.76 103 2.79 104 2.81 105 2.84

So if you need to multiply 1.27 by 1.33, say, just look up their logs, in this case, 24 and 29, add them, and get 53, so 1.27·1.33=1.69. For two/three digit arithmetic, the table only needs entries up to 9.99.

Note that e is almost there, as the antilogarithm of 100. The natural logarithm of a number can be read off from the above table, as just [approximately] \frac1{100} the corresponding exponent.

What Napier actually did was work with base .9999999. He spent 20 years computing powers of .9999999 by hand, producing a grand version of the above. That’s it. No deep understanding of anything, no calculus, and e pops up anyway—in Napier’s case, \frac1e was the 10 millionth entry. (To be pedantic, Napier did not actually use decimal points, that being a new fangled notion at the time.)

Later, in his historic meeting with Briggs, two changes were made. A switch to a base > 1 was made, so that logarithms would scale in the same direction as the numbers, and the spacing on the logarithm sides was chosen so that \log(10)=1. These two changes were, in effect, just division by -\log_e(10).

In other words, e made its first appearance rather implicitly.
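Both numerical claims above are easy to verify directly (my own quick check, not part of Wiener’s account):

```python
import math

# The antilogarithm of 100 in the base-1.01 table is nearly e:
print(1.01 ** 100)            # ~ 2.7048, vs e ~ 2.71828
assert abs(1.01 ** 100 - math.e) < 0.02

# Napier's base 0.9999999: the 10-millionth entry is nearly 1/e:
print(0.9999999 ** 10000000)  # ~ 0.36788, vs 1/e ~ 0.36788
assert abs(0.9999999 ** 10000000 - 1 / math.e) < 1e-4
```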

(I had earlier read a book on Napier and come to the same information though a lot less clearly, here.)

I had started writing a series of posts leading up to an understanding of the exponential function e^x (here, here, here), but it seems to have got abandoned. Consider this one a contribution to that series.

Written by S

Wed, 2013-11-27 at 10:52:51

Posted in mathematics

The functional equation f(x+y) = f(x)f(y)

with 3 comments

Suppose f: \mathbb{R} \to \mathbb{R} satisfies f(x+y) = f(x) f(y). What can we say about f?

Putting y = 0 gives

\displaystyle f(x) = f(x+0) = f(x)f(0),

which can happen if either f(x) = 0 or f(0) = 1. Note that the function f which is identically zero satisfies the functional equation. If f is not this function, i.e., if f(x) \neq 0 for at least one value of x, then plugging that value of x (say x^*) into the equation gives f(0) = 1. Also, for any x, the equation f(x^*) = f(x +x^* - x) = f(x)f(x^* - x) forces f(x) \neq 0 as well. Further, f(x) = f(x/2 + x/2) = f(x/2)^2 so f(x) > 0 for all x.

Next, putting y = x gives f(2x) = f(x)^2, and by induction f(nx) = f(x)^n. Putting \frac{x}{n} in place of x in this gives f(n\frac{x}{n}) = f(\frac{x}{n})^n which means f(\frac{x}{n}) = f(x)^{\frac1n} (note we’re using f(x) > 0 here). And again, f(\frac{m}{n}x) = f(x)^{m/n}. So f(\frac{m}{n}) = f(1)^{m/n}, which completely defines the function at rational points.

[As f(1) > 0, it can be written as f(1) = e^k for some constant k, which gives f(x) = e^{kx} for rational x.]
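A quick numerical sanity check of the above (with the arbitrary choice k = 0.7):

```python
import math

k = 0.7          # arbitrary constant

def f(x):        # f(x) = e^(k x)
    return math.exp(k * x)

# the functional equation holds:
assert abs(f(1.3 + 2.9) - f(1.3) * f(2.9)) < 1e-9

# and on the rationals f is pinned down by f(1): f(m/n) = f(1)^(m/n)
r = 5.0 / 7.0
assert abs(f(r) - f(1) ** r) < 1e-12
```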

To extend this function to irrational numbers, we need some further assumptions on f, such as continuity. It turns out that being continuous at any point is enough (and implies the function is f(x) = f(1)^x everywhere): note that f(x + m/n) = f(x)f(m/n) = f(x)f(1)^{m/n}. Even being Lebesgue-integrable/measurable will do.

Else, there are discontinuous functions satisfying the functional equation. (Basically, we can define the value of the function separately on each “independent” part. That is, define the equivalence class where x and y are related if y = r_1x + r_2 for rationals r_1 and r_2, pick a representative for each class using the axiom of choice (this is something like picking a basis for \mathbb{R}/\mathbb{Q}, which corresponds to the equivalence class defined by the relation y = r_1x), define the value of the function independently for each representative, and this fixes the value of f on \mathbb{R}. See this article for more details.)

To step back a bit: what the functional equation says is that f is a homomorphism from (\mathbb{R}, +), the additive group of real numbers, to (\mathbb{R}, \times), the multiplicative monoid of real numbers. If f is not the trivial identically-zero function, then (as we saw above) f is in fact a homomorphism from (\mathbb{R}, +), the additive group of real numbers, to (\mathbb{R_+^*}, \times), the multiplicative group of positive real numbers. What we proved is that the exponential functions e^{kx} are precisely all such functions that are nice (nice here meaning either measurable or continuous at least one point). (Note that this set includes the trivial homomorphism corresponding to k = 0: the function f(x) = 1 identically everywhere. If f is not this trivial map, then it is in fact an isomorphism.)

Edit [2013-10-11]: See also Overview of basic facts about Cauchy functional equation.

Written by S

Mon, 2013-04-08 at 11:24:08

Posted in mathematics

Trajectory of a point moving with acceleration perpendicular to velocity

with 8 comments

(Just some basic high-school physics stuff; to assure myself I can still do some elementary things. :P Essentially, showing that if a particle moves with acceleration perpendicular to velocity, or velocity perpendicular to position, then it traces out a circle. Stop reading here if this is obvious.)

Suppose a point moves in the plane such that its acceleration is always perpendicular to its velocity, and of the same magnitude. What is its path like?

To set up notation: let’s say the point’s position at time t is (p_x(t), p_y(t)), its velocity is (v_x(t), v_y(t)) = \left(\frac{d}{dt}p_x(t), \frac{d}{dt}p_y(t)\right), and its acceleration is (a_x(t), a_y(t)) = \left(\frac{d}{dt}v_x(t), \frac{d}{dt}v_y(t)\right).

The result of rotating a point (x,y) by 90° is (-y, x). (E.g. see figure below)


So the fact that acceleration is at right angles to velocity means that (a_x(t), a_y(t)) = (-v_y(t), v_x(t)), or, to write everything in terms of the velocity,

\begin{aligned} \frac{d}{dt}v_x(t) &= -v_y(t) \\  \frac{d}{dt}v_y(t) &= v_x(t) \end{aligned}

where we can get rid of v_x(t) by substituting the second equation (in the form v_x(t) = \frac{d}{dt}v_y(t)) into the first:

v_y(t) = -\frac{d}{dt}v_x(t) = -\frac{d}{dt}\left(\frac{d}{dt}v_y(t)\right)

or in other words

v_y(t) = -\frac{d^2}{dt^2}v_y(t).

By some theory about ordinary differential equations, which I don’t know (please help!) (but see the very related example you saw in high school, of simple harmonic motion), the solutions to this equation are \sin(t) and \cos(t) and any linear combination of those: the solution in general is

\begin{aligned}  v_y(t) &= a \sin(t) + b \cos(t) \\  &= \sqrt{a^2 + b^2} \left(\frac{a}{\sqrt{a^2+b^2}}\sin(t) + \frac{b}{\sqrt{a^2+b^2}}\cos(t)\right) \\  &= R\sin (t + \alpha)  \end{aligned}

where R = \sqrt{a^2 + b^2} and \alpha is the angle such that \cos(\alpha) = \frac{a}{\sqrt{a^2+b^2}} and \sin(\alpha) = \frac{b}{\sqrt{a^2+b^2}}. And the fact that v_x(t) = \frac{d}{dt}v_y(t) gives v_x(t) = R\cos(t + \alpha). So (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)). Note that (a_x(t), a_y(t)) = \left(\frac{d}{dt}v_x(t), \frac{d}{dt}v_y(t)\right) = (-R\sin(t+\alpha), R\cos(t+\alpha)) is indeed perpendicular to (v_x(t), v_y(t)) as we wanted.

The actual trajectory (p_x(t), p_y(t)) can be got by integrating

\left(\frac{d}{dt}p_x(t), \frac{d}{dt}p_y(t)\right)  = (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha))

to get p_x(t) = R\sin(t + \alpha) + c_1 and p_y(t) = -R\cos(t + \alpha) + c_2. This trajectory is a point moving on a circle centered at point (c_1, c_2) and of radius R, with speed R or unit angular speed. Note that velocity is also perpendicular to the point’s position wrt the centre of the circle, as velocity is tangential to the circle, as it should be.

With a suitable change of coordinates (translate the origin to (c_1, c_2), then rotate the axes by \frac{\pi}{2}+\alpha, then scale everything so that R = 1), this is the familiar parametrization (\cos(t), \sin(t)) of the circle.

Note: Just as we derived (v_x(t), v_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)) from assuming that the acceleration is perpendicular to velocity, we can, by assuming that velocity is perpendicular to position, identically derive (p_x(t), p_y(t)) = (R\cos(t + \alpha), R\sin(t + \alpha)), i.e. that the point moves on a circle.
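To double-check the algebra, here is a crude numerical experiment (my own sketch, using explicit Euler steps): starting from an arbitrary position and velocity and imposing acceleration = velocity rotated by 90°, the point should stay on the circle of radius R = |v| centred at the position plus the 90°-rotated velocity.

```python
import math

def rot90(x, y):           # rotate a vector by 90 degrees
    return (-y, x)

px, py = 3.0, 4.0          # arbitrary starting position
vx, vy = 1.0, 2.0          # arbitrary starting velocity
cx, cy = px - vy, py + vx  # predicted centre: p + rot90(v)
R = math.hypot(vx, vy)     # predicted radius = speed

dt = 1e-4
for _ in range(62832):     # roughly one full period, 2*pi
    ax, ay = rot90(vx, vy)            # acceleration perpendicular to velocity
    px, py = px + vx * dt, py + vy * dt
    vx, vy = vx + ax * dt, vy + ay * dt

# the point stayed (approximately) on the predicted circle
assert abs(math.hypot(px - cx, py - cy) - R) < 1e-2
```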

Written by S

Sun, 2013-04-07 at 23:38:01

Posted in mathematics

The power series for sin and cos

with one comment

There are many ways to derive the power series of \sin x and \cos x using the machinery of Taylor series etc., but below is another elementary way of demonstrating that the well-known power series expansions are the right ones. The argument below is from Tristan Needham’s Visual Complex Analysis, which I’m reproducing without looking at the book just to convince myself that I’ve internalized it correctly.

So: let
\displaystyle  \begin{aligned}   P(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \quad \text{ and }\\  Q(x) &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots .  \end{aligned}

We will take the following two for granted (both can be proved with some effort):

  1. Both power series are convergent.
  2. The power series can be differentiated term-wise.

As suggested by (2) above, the first thing we observe is that \frac{d}{dx}P(x) = Q(x) and \frac{d}{dx}Q(x) = -P(x).

So firstly:
\begin{aligned}  \frac{d}{dx}(P(x)^2 + Q(x)^2)   &= 2P(x)P'(x) + 2Q(x)Q'(x) \\  &= 2P(x)Q(x) - 2Q(x)P(x) \\  &= 0  \end{aligned}
which means that P(x)^2 + Q(x)^2 is a constant and does not vary with x. Putting x = 0 shows that P(0) = 0 and Q(0) = 1, so P(x)^2 + Q(x)^2 = 1 for all x.

Secondly, define the angle \theta as a function of x, by \tan \theta(x) = P(x)/Q(x). (To be precise, this defines \theta(x) up to a multiple of \pi, i.e. modulo \pi.)
Differentiating the left-hand side of this definition gives
\displaystyle \begin{aligned}  \frac{d}{dx} \tan \theta(x)   &= (1 + \tan^2 \theta(x)) \theta'(x) \\  &= (1 + \frac{P(x)^2}{Q(x)^2}) \theta'(x) \\  &= \frac{1}{Q(x)^2} \theta'(x)   \end{aligned}
(where \theta'(x) means \frac{d}{dx} \theta(x))
while differentiating the right-hand side gives
\displaystyle \begin{aligned}  \frac{d}{dx} \frac{P(x)}{Q(x)}   &= \frac{Q(x)P'(x) - P(x)Q'(x)}{Q(x)^2} \\  &= \frac{Q(x)Q(x) + P(x)P(x)}{Q(x)^2} \\  &= \frac{1}{Q(x)^2}  \end{aligned}

The necessary equality of the two tells us that \frac{d}{dx}\theta(x) = 1, which along with the initial condition \tan \theta(0) = P(0)/Q(0) = 0 = \tan 0 that says \theta(0) \equiv 0 \mod \pi, gives \theta(x) = x (or to be precise, \theta(x) \equiv x \pmod {\pi}).

In other words, we have shown that the power series P(x) and Q(x) satisfy \frac{P(x)}{Q(x)} = \tan x = \frac{\sin x}{\cos x} and therefore P(x) = k \sin x and Q(x) = k \cos x for some k. The observation that Q(0) = 1 = \cos 0 (or our earlier observation that P(x)^2 + Q(x)^2 = 1 for all x) gives k = 1, thereby showing that P(x) = \sin x and Q(x) = \cos x.
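As a numerical corroboration (my own throwaway check, not from Needham’s book), partial sums of P and Q do match \sin and \cos and satisfy P^2 + Q^2 = 1:

```python
import math

def P(x, terms=25):   # partial sum of x - x^3/3! + x^5/5! - ...
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(terms))

def Q(x, terms=25):   # partial sum of 1 - x^2/2! + x^4/4! - ...
    return sum((-1) ** j * x ** (2 * j) / math.factorial(2 * j)
               for j in range(terms))

for x in (0.0, 0.5, 1.0, -2.0, 3.0):
    assert abs(P(x) ** 2 + Q(x) ** 2 - 1) < 1e-12
    assert abs(P(x) - math.sin(x)) < 1e-12
    assert abs(Q(x) - math.cos(x)) < 1e-12
```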

So much for \sin x and \cos x. Just as an aside, observe that if we take i to be a symbol satisfying i^2 = -1, then
\displaystyle \begin{aligned}  \cos x + i\sin x   &= Q(x) + iP(x) \\  &= \left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots\right) + i\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots \right) \\  &= 1 + ix + \frac{-x^2}{2!} + \frac{-ix^3}{3!} + \frac{x^4}{4!} + \frac{ix^5}{5!} + \frac{-x^6}{6!} + \frac{-ix^7}{7!} + \dots \\  &= 1 + ix + \frac{(ix)^2}{2!} + \frac{(ix)^3}{3!} + \frac{(ix)^4}{4!} + \frac{(ix)^5}{5!} + \frac{(ix)^6}{6!} + \frac{(ix)^7}{7!} + \dots  \end{aligned}
the right hand side of which looks very much like the result of “substituting” y = ix in the known (real) power series
\displaystyle e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \dots
(which itself can be proved using the term-wise differentiation above and the defining property \frac{d}{dx} e^x = e^x, say).

So this is one heuristic justification for us to define e^{ix} = \cos x + i\sin x.
Or, if we define e^{ix} as the result of substituting ix in the real power series for e^y, this proves that e^{ix} = \cos x + i\sin x.

Written by S

Fri, 2013-03-08 at 00:33:12

Posted in mathematics

The potions puzzle by Snape in Harry Potter and the Philosopher’s Stone

with 3 comments

Sometime this week, I reread the first Harry Potter book (after at least 10 years… wow, has it been that long?), just for contrast after reading Rowling’s adult novel The Casual Vacancy (on which more later). Anyway, in the penultimate chapter there is a puzzle:

He pulled open the next door, both of them hardly daring to look at what came next — but there was nothing very frightening in here, just a table with seven differently shaped bottles standing on it in a line.
“Snape’s,” said Harry. “What do we have to do?”
They stepped over the threshold, and immediately a fire sprang up behind them in the doorway. It wasn’t ordinary fire either; it was purple. At the same instant, black flames shot up in the doorway leading onward. They were trapped.
“Look!” Hermione seized a roll of paper lying next to the bottles. Harry looked over her shoulder to read it:

Danger lies before you, while safety lies behind,
Two of us will help you, whichever you would find,
One among us seven will let you move ahead,
Another will transport the drinker back instead,
Two among our number hold only nettle wine,
Three of us are killers, waiting hidden in line.
Choose, unless you wish to stay here forevermore,
To help you in your choice, we give you these clues four:
First, however slyly the poison tries to hide
You will always find some on nettle wine’s left side;
Second, different are those who stand at either end,
But if you would move onward, neither is your friend;
Third, as you see clearly, all are different size,
Neither dwarf nor giant holds death in their insides;
Fourth, the second left and the second on the right
Are twins once you taste them, though different at first sight.

I became curious about whether this is just a ditty Rowling made up, or the puzzle actually makes sense and is consistent. It turns out she has constructed it well. Let’s take a look. This investigation can be carried out by hand, but we’ll be lazy and use a computer, specifically Python. The code examples below are all to be typed in an interactive Python shell, the one that you get by typing “python” in a terminal.

So what we have are seven bottles, of which one will take you forward (F), one will let you go back (B), two are just nettle wine (N), and three are poison (P).

>>> bottles = ['F', 'B', 'N', 'N', 'P', 'P', 'P']

The actual arrangement of these 7 bottles on the table is some ordering (permutation) of them. As 7 is a very small number, we can afford to be inefficient and resort to brute-force enumeration.

>>> import itertools
>>> perms = [''.join(s) for s in set(itertools.permutations(bottles))]
>>> len(perms)
420

The set is needed to remove duplicates; without it, itertools.permutations would yield all 7! = 5040 “permutations”, treating the repeated letters as distinct. So the number of all possible orderings is rather small (it is \frac{7!}{2!3!} = 420). We can look at a sample to check whether things look fine.

>>> perms[:10]

Now let us try to solve the puzzle. We can start with the first clue, which says that wherever a nettle-wine bottle occurs, on its left is always a poison bottle (and in particular therefore, a nettle-wine bottle cannot be in the leftmost position). So we must restrict the orderings to just those that satisfy this condition.

>>> def clue1(s): return all(i > 0 and s[i-1] == 'P' for i in range(len(s)) if s[i]=='N')
>>> len([s for s in perms if clue1(s)])
60

(In the code, the 7 positions are 0 to 6, as array indices in code generally start at 0.)
Then the second clue says that the bottles at the end are different, and neither contains the potion that lets you go forward.

>>> def clue2(s): return s[0] != s[6] and s[0] != 'F' and s[6] != 'F'
>>> len([s for s in perms if clue1(s) and clue2(s)])
30

The third clue says that the smallest and largest bottles don’t contain poison, and this would be of help to Harry and Hermione who can see the sizes of the bottles. But as we readers are not told the sizes of the bottles, this doesn’t seem of any help to us; let us return to this later.

The fourth clue says that the second-left and second-right bottles have the same contents.

>>> def clue4(s): return s[1] == s[5]
>>> len([s for s in perms if clue1(s) and clue2(s) and clue4(s)])
8

There are now just 8 possibilities, finally small enough to print them all.

>>> sorted(s for s in perms if clue1(s) and clue2(s) and clue4(s))
['BPFPNPN', 'BPNFPPN', 'BPNPFPN', 'BPPNFPN', 'PNFPPNB', 'PNPFPNB', 'PPNBFPN', 'PPNFBPN']

Alas, without knowing which the “dwarf” and “giant” bottles are, we cannot use the third clue, and this seems as far as we can go. We seem to have exhausted all the information available…

Almost. It is reasonable to assume that the puzzle is meant to have a solution. So even without knowing where exactly the “dwarf” and “giant” bottles are, we can say that they are in some pair of locations that ensure a unique solution.

>>> def clue3(d, g, s): return s[d]!='P' and s[g]!='P'
>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(poss) == 1:
...       print d, g, poss[0]

Aha! If you look at the possible orderings closely, you will see that we are down to just two possibilities for the ordering of the bottles.
Actually there is some scope for quibbling in what we did above: perhaps we cannot say that there is a unique solution determining the entire configuration; perhaps all we can say is that the puzzle should let us uniquely determine the positions of just the two useful bottles. Fortunately, that gives exactly the same set of possibilities, so this distinction happens to be inconsequential.

>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [(s.index('F'),s.index('B')) for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(set(poss)) == 1:
...       print d, g, [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)][0]

Good. Note that there are only two configurations above. So with only the clues in the poem, and the assumption that the puzzle can be solved, we can narrow down the possibilities to two configurations, and be sure of the contents of all the bottles except the third and fourth. We know that the potion that lets us go forward is in either the third or the fourth bottle.

In particular we see that the last bottle lets us go back, and indeed this is confirmed by the story later:

“Which one will get you back through the purple flames?”
Hermione pointed at a rounded bottle at the right end of the line.
She took a long drink from the round bottle at the end, and shuddered.

But we don’t know which of the two it is, as we can’t reconstruct all the relevant details of the configuration. Perhaps we can reconstruct something with the remaining piece of information from the story?

“Got it,” she said. “The smallest bottle will get us through the black fire — toward the Stone.”
Harry looked at the tiny bottle.
Harry took a deep breath and picked up the smallest bottle.

So we know that the bottle that lets one move forward is in fact in the smallest one, the “dwarf”.

>>> for d in range(7):
...   for g in range(7):
...     if d == g: continue
...     poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d,g,s)]
...     if len(poss) == 1 and poss[0][d] == 'F':
...       print d, g, poss[0]

This narrows the possible positions of the smallest and largest bottles (note that it says the largest bottle is one that contains nettle wine), but still leaves the same two possibilities for the complete configuration. So we can stop here.

What we can conclude is the following: apart from the clues mentioned in the poem, the “dwarf” (the smallest bottle) was in either position 2 (third from the left) or 3 (fourth from the left). The biggest bottle was in either position 1 (second from the left) or 5 (sixth from the left). With this information about the location of the smallest bottle (and without necessarily assuming the puzzle has a unique solution!), Hermione could determine the contents of all the bottles. In particular she could determine the location of the two useful bottles: namely that the bottle that lets you go back was the last one, and that the one that lets you go forward was the smallest bottle.

>>> for (d,g) in [(2,1), (2,5), (3,1), (3,5)]:
...   poss = [s for s in perms if clue1(s) and clue2(s) and clue4(s) and clue3(d, g, s)]
...   assert len(poss) == 1
...   s = poss[0]
...   assert s.index('B') == 6
...   assert s.index('F') == d
...   print (d,g), s
(2, 1) PNFPPNB
(2, 5) PNFPPNB
(3, 1) PNPFPNB
(3, 5) PNPFPNB

It is not clear why Rowling went to the effort of creating a meaningful puzzle, only to withhold the details that would let the reader solve it fully. Perhaps some details were removed during editing. As far as making up stuff for the sake of a story goes, though, this is nothing; consider for instance the entire language created for Avatar, which some viewers have since learned.

See also this write-up which does it by hand, and has a perfectly clear exposition (it doesn’t try the trick of guessing that the solution is unique before reaching for the additional information from the story).

Written by S

Mon, 2012-10-22 at 02:00:12

Bill Thurston

with 2 comments

Somehow I am saddened to hear of the death of William Thurston, even though I never knew him nor can I even understand his work.

For years I have been pointing people at this article of his, which I find illuminating and inspiring, even in ways that have nothing to do with mathematics:
5-paragraph version:
Riffs on his “derivative” exercise:

From his user profile:
“Mathematics is a process of staring hard enough with enough perseverance at the fog of muddle and confusion to eventually break through to improved clarity. I’m happy when I can admit, at least to myself, that my thinking is muddled, and I try to overcome the embarrassment that I might reveal ignorance or confusion. Over the years, this has helped me develop clarity in some things, but I remain muddled in many others. I enjoy questions that seem honest, even when they admit or reveal confusion, in preference to questions that appear designed to project sophistication.”
“At that time, I prided myself in reading quickly. I was really amazed by my first encounters with serious mathematics textbooks. I was very interested and impressed by the quality of the reasoning, but it was quite hard to stay alert and focused. After a few experiences of reading a few pages only to discover that I really had no idea what I’d just read, I learned to drink lots of coffee, slow way down, and accept that I needed to read these books at 1/10th or 1/50th standard reading speed, pay attention to every single word and backtrack to look up all the obscure numbers of equations and theorems in order to follow the arguments.”
“When listening to a lecture, I can’t possibly attend to every word: so many words blank out my thoughts. My attention repeatedly dives inward to my own thoughts and my own mental models, asking ‘what are they really saying?’ or ‘where is this going?’. I try to shortcut through my own understanding, then emerge to see if I’m still with the lecture. It’s the only way for me, and it often works.”
“My first tenure-track job interview was at Cornell. During and after my job talk, most people were pretty quiet, but there was one guy who kept asking very penetrating and insightful questions. And it was very confusing, because I knew all the number theorists at Cornell, and I had no idea who this guy was, or how it was that he obviously understood my talk better than anybody else in the room, possibly including me.”
“When I grew up I was a voracious reader. But when I started studying serious mathematical textbooks I was surprised how slowly I had to read, at a much lower speed than reading non-mathematical books. I couldn’t just gloss over the text. The phrasing and the symbols had been carefully chosen and every sign was important. Now I read differently. I rarely spend the time and effort to follow carefully every word and every equation in a mathematics article. I have come to appreciate the importance of the explanation that lies beneath the surface. […] I have little expectation for the words to be a good representation of the real ideas. I try to tunnel beneath the surface and to find shortcuts, checking in often enough to have a reasonable hope not to miss a major point. I have decided that daydreaming is not a bug but a feature. If I can drift away far enough to gain the perspective that allows me to see the big picture, noticing the details becomes both easier and less important.
I wish I had developed the skill of reading beneath the surface much earlier. As I read, I stop and ask, What’s the author trying to say? What is the author really thinking (if I suppose it is different from what he put in the mathematical text)? What do I think of this? I talk to myself back and forth while reading somebody else’s writing. But the main thing is to give myself time, to close my eyes, to give myself space, to reflect and allow my thoughts to form on their own in order to shape my ideas.”
“A prominent mathematician once remarked to me that Thurston was the most underappreciated mathematician alive today. When I pointed out that Thurston had a Fields medal and innumerable other accolades, he replied that this was not incompatible with his thesis.”

He believed that this human understanding was what gave mathematics not only its utility but its beauty, and that mathematicians needed to improve their ability to communicate mathematical ideas rather than just the details of formal proofs.

Benson Farb, a mathematician at the University of Chicago and a student of Thurston, said in an email, “in my opinion Thurston is underrated: his influence goes far beyond the (enormous) content of his mathematics. He changed the way geometers/topologists think about mathematics. He changed our idea of what it means to ‘encounter’ and ‘interact with’ a geometric object. The geometry that came before almost looks like pure symbol pushing in comparison.”

Reactions like this are hard to explain. About the Proof & Progress article, one reader says:

“Reading Thurston’s response was one of the most stirring intellectual experiences of my life. It truly struck a chord with my own conception of mathematics. To me, it has the status that the Declaration of Independence has to many Americans, or the U.N. charter has to other global citizens.”

See these for a better summary:

Edit [2012-09-28]: He has also done work in computer science! A fundamental result in data structures (on the number of rotations needed to transform one binary tree into another) was proved in a paper by Sleator–Tarjan–Thurston (with 278 citations). According to DBLP, he has three STOC papers.

See also: (“Mathematical Education”)

Edit [2016-02-20]: A collection of reminiscences about Thurston, in the Notices of the AMS, issues December 2015 and January 2016.

Written by S

Fri, 2012-08-24 at 15:28:19

Posted in mathematics

Are there Fibonacci numbers starting with 2012? (continued)

leave a comment »

Almost 8 months ago I left the first part of this post unfinished, planning to complete it in the morning; it seems I slept too long. (Or as this guy said after a 2-year hiatus: “Sorry for the pause guys. I was in the bathroom.”)

Recall: To get a power of 2 that starts with a prefix p (like p = 2012), we want n such that the fractional part of n\log 2 lies between those of \log p and \log(p+1) (all logarithms here to base 10), and similarly to get a Fibonacci number starting with p, we want (with some hand-waving) n such that the fractional part of n\log\phi lies between those of \log(p) + \log{\sqrt{5}} and \log(p+1) + \log{\sqrt{5}}. The more general problem is:

Problem: Given an irrational number \theta and an interval (a,b) \subset (0,1), find n such that \mathrm{frac}(n\theta) lies in the interval (a,b).

Here is one method, based on Edward Burger’s book Exploring the Number Jungle. Let \alpha be the midpoint of the interval (a,b). Then we are trying to find n such that \mathrm{frac}(n\theta) is close to \alpha.

  • Find a fraction \displaystyle \frac{p}{q} approximating \theta, such that \displaystyle |q\theta - p| < \frac1q. (These are the convergents of the continued fraction of \theta, but in practice it seems you can also get away with taking semi-convergents that may not satisfy this property.)
  • Let N be the closest integer to q\alpha. Note that this automatically means \displaystyle |q\alpha - N| \le \frac12.
  • Write \displaystyle N = yp - xq with \displaystyle |y| \le \frac{q}{2}. This you can do quite easily with the Euclidean algorithm.
  • Then for n = q + y and k = p + x, we have (it is a simple exercise to prove this)
    \displaystyle |\theta n - k - \alpha| < \frac{3}{n}
  • This means that the distance between n\theta and \alpha is small, modulo 1. If this distance turns out to be still too large, start with a bigger convergent \frac{p}{q}.

I think I had some code to post as well (hey, what’s the actual Fibonacci number that starts with 2012?), but it needs to be cleaned up… probably this works (in Sage).
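Until that code gets cleaned up, here is a rough stand-in in plain Python: it finds an n by brute force over the fractional parts (rather than via the convergent machinery above), then verifies with exact integer arithmetic that the n-th Fibonacci number really does start with 2012.

```python
from math import log10, sqrt

phi = (1 + sqrt(5)) / 2
t = log10(phi)
lo = log10(2.012) + log10(sqrt(5))  # ≈ 0.653113
hi = log10(2.013) + log10(sqrt(5))  # ≈ 0.653329

# brute force: smallest n with frac(n * log10(phi)) in (lo, hi)
n = 1
while not (lo < (n * t) % 1 < hi):
    n += 1

# verify with exact integers: compute F_n (with F_0 = 0, F_1 = 1)
a, b = 0, 1
for _ in range(n):
    a, b = b, a + b  # now a = F_n
print(n, str(a)[:4])  # the first four digits should be 2012
```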

The thing we’re doing here is called inhomogeneous Diophantine approximation.

[Originally posted on math.stackexchange, here and here.]

Written by S

Fri, 2012-08-03 at 01:56:18

Posted in mathematics

Are there Fibonacci numbers starting with 2012?

with one comment

Do there exist powers of 2 whose first four digits are 2012?
Are there Fibonacci numbers whose first four digits are 2012?

If the answer is obvious to you (or you don’t care), you can stop reading.

The answer for both is:

  • Yes.
  • There are infinitely many such numbers.
  • In fact, the fraction of (powers of 2 / Fibonacci numbers) starting with 2012 is exactly

    \displaystyle \frac{\log 2013 - \log 2012}{\log 10} \approx \frac1{4634} \approx 0.000216

Similarly with any other prefix p (of any length) in place of 2012. Proof follows.

A number x starts with a prefix p if and only if for some k ≥ 0,

\displaystyle p \cdot 10^k \le x < (p+1) \cdot 10^k

Thus a power of 2, say 2^n, starts with p iff

\displaystyle p \cdot 10^k \le 2^n < (p+1) \cdot 10^k for some k \ge 0

Taking logarithms to base 10 and simplifying, this is equivalent to

\displaystyle \log p < n\log2 - k < \log(p+1) for some k \ge 0

This is saying that the fractional part of n\log 2 lies between the fractional parts of \log p and \log (p+1). For example, if p = 2012, this means that the fractional part of n\log 2 lies between \log 2.012 \approx 0.303628 and \log 2.013 \approx 0.303844.
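As a concrete check, we can find the smallest such n using this fractional-part criterion, and then confirm the answer with exact integer arithmetic:

```python
from math import log10

lo, hi = log10(2.012), log10(2.013)
t = log10(2)

# search: smallest n with frac(n * log10(2)) in (log 2.012, log 2.013)
n = 1
while not (lo < (n * t) % 1 < hi):
    n += 1

# confirm exactly: the decimal expansion of 2^n begins with 2012
print(n, str(2 ** n)[:4])
```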

Similarly, for Fibonacci numbers, as is (or should be) well-known, the n-th Fibonacci number F_n is the closest integer to \displaystyle \frac{\phi^n}{\sqrt5}, where \displaystyle \phi = \frac{1+\sqrt5}2 is the golden ratio. So F_n starts with p iff

\displaystyle p \cdot 10^k - \frac12 \quad < \quad \frac{\phi^n}{\sqrt5} \quad < \quad (p+1) \cdot 10^k - \frac12

Taking logarithms to base 10 and simplifying, while ignoring the annoying \frac12 which becomes irrelevant in the limit (this line is not rigorous), this is equivalent to

\displaystyle \log(p) + \log\sqrt5 \quad < \quad n\log\phi - k \quad < \quad \log(p+1) + \log\sqrt5

which means that the fractional part of n\log \phi lies between the fractional parts of \log p +  \log\sqrt5 and \log (p+1) + \log\sqrt5. For p = 2012, this means that the fractional part of n\log \phi lies between \log 2.012 + \log\sqrt5 \approx 0.653113 and \log 2.013 + \log\sqrt5 \approx 0.653329.

In either case, we are trying to make the fractional part of n\theta, for some irrational number \theta, lie in some interval. The relevant fact is this:
Theorem 1: for any irrational number \theta, the sequence \mathrm{frac}(n\theta) (where \mathrm{frac}(x) denotes the fractional part of x) is dense in [0, 1).
or, in other words,
Theorem 1: For any irrational number \theta, the sequence n\theta is dense modulo 1.
Proving this theorem is a good exercise.

This means that for any interval you want, you can always find some n such that the fractional part of n\theta lies in your interval. In fact, because the sequence is dense, you can find an infinite sequence of n such that the fractional parts of n\theta converge to the midpoint (say) of the desired interval. This proves the first two facts of the answer, and for the third we need a stronger theorem:

Theorem 2 [Equidistribution theorem]: For any irrational number \theta, the numbers n\theta are uniformly distributed modulo 1.

This means that for any interval I \subset [0,1] of size s (say), the fraction of integers n for which \mathrm{frac}(n\theta) lies in the interval satisfies

\displaystyle \lim_{N \to \infty} \frac{ \left|\left\{ 1 \le n \le N : \mathrm{frac}(n\theta) \in I\right\}\right|}{N} = s

This proves the third fact. The fraction of Fibonacci numbers (or of powers of any number that is not a rational power of 10) that start with a prefix p \ge 1 is exactly \log(p+1) - \log(p), where log is to base 10.
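Theorem 2 can be illustrated numerically (this is only an illustration, not a proof): for \theta = \log 2 and the interval corresponding to the prefix 2012, the observed fraction over the first million n should be close to \log(2013/2012) \approx 0.000216.

```python
from math import log10

theta = log10(2)
lo, hi = log10(2.012), log10(2.013)

N = 10 ** 6
count = sum(1 for n in range(1, N + 1) if lo < (n * theta) % 1 < hi)
print(count / N)  # close to log10(2013/2012) ≈ 0.000216
```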

That much is standard. And non-constructive. We are assured of the existence of such numbers, but how do we actually find one?

The answer (or one answer), as it so often does in these problems, involves continued fractions. Here is one method, [to be continued when I wake up :p]
Edit: Continued here.

Written by S

Sun, 2012-01-08 at 01:29:26

Posted in mathematics


leave a comment »

[Posting some images here for possible future reuse.]

\displaystyle \lim_{n\to\infty}\left(1 + \frac{ix}{n}\right)^n = \cos x + i \sin x

A non-rigorous argument: when n is large enough so that x/n is small, (1 + ix/n) is roughly (hand-waving) the point on the unit circle at arc length (and hence angle) x/n:

So multiplication by (1+ix/n) roughly corresponds to rotation by angle x/n. Multiplication by (1+ix/n)^n, which is multiplication by (1+ix/n) n times, roughly corresponds to rotation by angle n(x/n) = x. As n \to \infty, “roughly” becomes exact.

Animation for x = 1:

Image generated from Python-generated SVG files; code available if anyone wants.

In particular, once one accepts the fact/definition that \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x (for instance, show that the function f(x) = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n satisfies f(x+y) = f(x)f(y)), it follows that multiplication by e^{i\pi} is rotation by \pi, that is,

\displaystyle e^{i\pi} = -1
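This convergence is easy to watch numerically, since Python has complex numbers built in:

```python
import math

# (1 + i*pi/n)^n for growing n: the values approach -1 = e^{i*pi}
for n in [10, 100, 10_000, 1_000_000]:
    z = (1 + 1j * math.pi / n) ** n
    print(n, z)
```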

Written by S

Tue, 2011-06-21 at 18:04:25

Posted in mathematics

Serieshelpmate in 19

with 2 comments

Here’s a brilliant problem.

Consider the following chess position.

Black is to make 19 consecutive moves, after which White checkmates Black in one move. Black may not move into check, and may not check White (except possibly on his last move). Black and White are cooperating to achieve the aim of checkmate. (In chess problem parlance, this problem is called a serieshelpmate in 19.) How many different solutions are there?

This problem is due to Kauko Väisänen, and appears in A. Puusa, Queue Problems, Finnish Chess Problem Society, Helsinki, 1992 (Problem 2).

Hint: the above is quoted from Richard Stanley’s Enumerative Combinatorics.

Written by S

Sun, 2011-05-29 at 15:30:25

Posted in mathematics

How does Tupper’s self-referential formula work?

with 50 comments

[I write this post with a certain degree of embarrassment, because in the end it turns out (1) to be more simple than I anticipated, and (2) already done before, as I could have found if I had internet access when I did this. :-)]

The so-called “Tupper’s self-referential formula” is the following, due to Jeff Tupper.

Graph the set of all points {(x,y)} such that

\displaystyle  \frac12 < \left\lfloor \mathrm{mod} \left( \left\lfloor{\frac{y}{17}}\right\rfloor 2^{-17\lfloor x \rfloor - \mathrm{mod}(\lfloor y \rfloor, 17)}, 2 \right) \right\rfloor

in the region

\displaystyle  0 < x < 106

\displaystyle  N < y < N+17

where N is the following 544-digit integer:

The result is the following graph:

Figure 1: The graph of the formula, in some obscure region, is a picture of the formula itself.

Whoa. How does this work?

At first sight this is rather too incredible for words.

But after a few moments we can begin to guess what is going on, and see that—while clever—this is perhaps not so extraordinary after all. So let us calmly try to reverse-engineer this feat.
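In the meantime, here is the mechanism in miniature, assuming (as in the usual presentations) that N is chosen to be 17k for an integer k: then for y in [N, N+17) we have \lfloor y/17 \rfloor = k while \mathrm{mod}(\lfloor y \rfloor, 17) runs over 0…16, so the inequality simply tests whether bit number 17\lfloor x \rfloor + \mathrm{mod}(\lfloor y \rfloor, 17) of k is 1. In other words, k is just a 106×17 monochrome bitmap written in binary. A toy round-trip with a tiny made-up bitmap (not Tupper’s actual 544-digit N):

```python
WIDTH, HEIGHT = 2, 17  # a toy 2-column canvas; Tupper's is 106 x 17

def encode(pixels):
    # pack a set of on-pixels (x, j) into the integer k
    k = 0
    for (x, j) in pixels:
        k |= 1 << (17 * x + j)
    return k

def decode(k, width=WIDTH):
    # Tupper's test: pixel (x, j) is on iff floor(mod(k * 2^-(17x+j), 2)) = 1,
    # i.e. iff bit 17x + j of k is 1
    return {(x, j) for x in range(width) for j in range(HEIGHT)
            if (k // 2 ** (17 * x + j)) % 2 == 1}

pixels = {(0, 0), (0, 16), (1, 8)}
k = encode(pixels)
print(decode(k) == pixels)  # the graph redraws exactly the bitmap we packed
print(17 * k)               # N = 17k is the number you'd quote below the graph
```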


Written by S

Tue, 2011-04-12 at 13:05:20

Posted in mathematics
