Archive for the ‘mathematics’ Category
Some playing with Python
A long time ago, Diophantus (sort of) discussed integer solutions to the equation
(solutions to this equation are called Pythagorean triples).
Centuries later, in 1637, Fermat made a conjecture (now called Fermat’s Last Theorem, not because he uttered it in his dying breath, but because it was the last one to be proved — in ~1995) that
has no positive integer solutions for . In other words, his conjecture was that none of the following equations has a solution:
… and so on. An nth power cannot be partitioned into two nth powers.
About a century later, Euler proved the case of Fermat’s conjecture, but generalized it in a different direction: he conjectured in 1769 that an nth power cannot be partitioned into fewer than n nth powers, namely
has no solutions with . So his conjecture was that (among others) none of the following equations has a solution:
… and so on.
This conjecture stood for about two centuries, until abruptly it was found to be false, by Lander and Parkin who in 1966 simply did a direct search on the fastest (super)computer at the time, and found this counterexample:
(It is still one of only three examples known, according to Wikipedia.)
Now, how might you find this solution on a computer today?
In his wonderful (as always) post at bit-player, Brian Hayes showed the following code:
import itertools as it def four_fifths(n): '''Return smallest positive integers ((a,b,c,d),e) such that a^5 + b^5 + c^5 + d^5 = e^5; if no such tuple exists with e < n, return the string 'Failed'.''' fifths = [x**5 for x in range(n)] combos = it.combinations_with_replacement(range(1,n), 4) while True: try: cc = combos.next() cc_sum = sum([fifths[i] for i in cc]) if cc_sum in fifths: return(cc, fifths.index(cc_sum)) except StopIteration: return('Failed')
to which, if you add (say) print four_fifths(150)
and run it, it returns the correct answer fairly quickly: in about 47 seconds on my laptop.
The if cc_sum in fifths:
line inside the loop is an cost each time it’s run, so with a simple improvement to the code (using a set instead) and rewriting it a bit, we can write the following full program:
import itertools def find_counterexample(n): fifth_powers = [x**5 for x in range(n)] fifth_powers_set = set(fifth_powers) for xs in itertools.combinations_with_replacement(range(1, n), 4): xs_sum = sum([fifth_powers[i] for i in xs]) if xs_sum in fifth_powers_set: return (xs, fifth_powers.index(xs_sum)) return 'Failed' print find_counterexample(150)
which finishes in about 8.5 seconds.
Great!
But there’s something unsatisfying about this solution, which is that it assumes there’s a solution with all four numbers on the LHS less than 150. After all, changing the function invocation to find_counterexample(145)
makes it run a second faster even, but how could we know to do without already knowing the solution? Besides, we don’t have a fixed 8- or 10-second budget; what we’d really like is a program that keeps searching till it finds a solution or we abort it (or it runs out of memory or something), with no other fixed termination condition.
The above program used the given “n” as an upper bound to generate the combinations of 4 numbers; is there a way to generate all combinations when we don’t know an upper bound on them?
Yes! One of the things I learned from Knuth volume 4 is that if you simply write down each combination in descending order and order them lexicographically, the combinations you get for each upper bound are a prefix of the list of the next bigger one, i.e., for any upper bound, all the combinations form a prefix of the same infinite list, which starts as follows (line breaks for clarity):
1111, 2111, 2211, 2221, 2222, 3111, 3211, 3221, 3222, 3311, 3321, 3322, 3331, 3332, 3333, 4111, ... ... 9541, 9542, 9543, 9544, 9551, ... 9555, 9611, ...
There doesn’t seem to be a library function in Python to generate these though, so we can write our own. If we stare at the above list, we can figure out how to generate the next combination from a given one:
- Walk backwards from the end, till you reach the beginning or find an element that’s less than the previous one.
- Increase that element, set all the following elements to 1s, and continue.
We could write, say, the following code for it:
def all_combinations(r): xs = [1] * r while True: yield xs for i in range(r - 1, 0, -1): if xs[i] < xs[i - 1]: break else: i = 0 xs[i] += 1 xs[i + 1:] = [1] * (r - i - 1)
(The else
block on a for
loop is an interesting Python feature: it is executed if the loop wasn’t terminated with break
.) We could even hard-code the r=4
case, as we’ll see later below.
For testing whether a given number is a fifth power, we can no longer simply lookup in a fixed precomputed set. We can do a binary search instead:
def is_fifth_power(n): assert n > 0 lo = 0 hi = n # Invariant: lo^5 < n <= hi^5 while hi - lo > 1: mid = lo + (hi - lo) / 2 if mid ** 5 < n: lo = mid else: hi = mid return hi ** 5 == n
but it turns out that this is slower than one based on looking up in a growing set (as below).
Putting everything together, we can write the following (very C-like) code:
largest_known_fifth_power = (0, 0) known_fifth_powers = set() def is_fifth_power(n): global largest_known_fifth_power while n > largest_known_fifth_power[0]: m = largest_known_fifth_power[1] + 1 m5 = m ** 5 largest_known_fifth_power = (m5, m) known_fifth_powers.add(m5) return n in known_fifth_powers def fournums_with_replacement(): (x0, x1, x2, x3) = (1, 1, 1, 1) while True: yield (x0, x1, x2, x3) if x3 < x2: x3 += 1 continue x3 = 1 if x2 < x1: x2 += 1 continue x2 = 1 if x1 < x0: x1 += 1 continue x1 = 1 x0 += 1 continue if __name__ == '__main__': tried = 0 for get in fournums_with_replacement(): tried += 1 if (tried % 1000000 == 0): print tried, 'Trying:', get rhs = get[0]**5 + get[1]**5 + get[2]**5 + get[3]**5 if is_fifth_power(rhs): print 'Found:', get, rhs break
which is both longer and slower (takes about 20 seconds) than the original program, but at least we have the satisfaction that it doesn’t depend on any externally known upper bound.
I originally started writing this post because I wanted to describe some experiments I did with profiling, but it’s late and I’m sleepy so I’ll just mention it.
python -m cProfile euler_conjecture.py
will print relevant output in the terminal:
26916504 function calls in 26.991 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 18.555 18.555 26.991 26.991 euler_conjecture.py:1() 13458164 4.145 0.000 4.145 0.000 euler_conjecture.py:12(fournums_with_replacement) 13458163 4.292 0.000 4.292 0.000 euler_conjecture.py:3(is_fifth_power) 175 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Another way to view the same thing is to write the profile output to a file and read it with cprofilev
:
python -m cProfile -o euler_profile.out euler_conjecture.py cprofilev euler_profile.out
and visit http://localhost:4000 to view it.
Of course, simply translating this code to C++ makes it run much faster:
#include <array> #include <iostream> #include <map> #include <utility> typedef long long Int; constexpr Int fifth_power(Int x) { return x * x * x * x * x; } std::map<Int, int> known_fifth_powers = {{0, 0}}; bool is_fifth_power(Int n) { while (n > known_fifth_powers.rbegin()->first) { int m = known_fifth_powers.rbegin()->second + 1; known_fifth_powers[fifth_power(m)] = m; } return known_fifth_powers.count(n); } std::array<Int, 4> four_nums() { static std::array<Int, 4> x = {1, 1, 1, 0}; int i = 3; while (i > 0 && x[i] == x[i - 1]) --i; x[i] += 1; while (++i < 4) x[i] = 1; return x; } std::ostream& operator<<(std::ostream& os, std::array<Int, 4> x) { os << "(" << x[0] << ", " << x[1] << ", " << x[2] << ", " << x[3] << ")"; return os; } int main() { while (true) { std::array<Int, 4> get = four_nums(); Int rhs = fifth_power(get[0]) + fifth_power(get[1]) + fifth_power(get[2]) + fifth_power(get[3]); if (is_fifth_power(rhs)) { std::cout << "Found: " << get << " " << known_fifth_powers[rhs] << std::endl; break; } } }
and
clang++ -std=c++11 euler_conjecture.cc && time ./a.out
runs in 2.43s, or 0.36s if compiled with -O2.
But I don’t have a satisfactory answer to how to make our Python program which takes 20 seconds as fast as the 8.5-second known-upper-bound version.
Edit [2015-05-08]: I wrote some benchmarking code to compare all the different “combination” functions.
import itertools # Copied from the Python documentation def itertools_equivalent(iterable, r): pool = tuple(iterable) n = len(pool) if not n and r: return indices = [0] * r yield tuple(pool[i] for i in indices) while True: for i in reversed(range(r)): if indices[i] != n - 1: break else: return indices[i:] = [indices[i] + 1] * (r - i) yield tuple(pool[i] for i in indices) # Above function, specialized to first argument being range(1, n) def itertools_equivalent_specialized(n, r): indices = [1] * r yield indices while True: for i in reversed(range(r)): if indices[i] != n - 1: break else: return indices[i:] = [indices[i] + 1] * (r - i) yield indices # Function to generate all combinations of 4 elements def all_combinations_pythonic(r): xs = [1] * r while True: yield xs for i in range(r - 1, 0, -1): if xs[i] < xs[i - 1]: break else: i = 0 xs[i] += 1 xs[i + 1:] = [1] * (r - i - 1) # Above function, written in a more explicit C-like way def all_combinations_clike(r): xs = [1] * r while True: yield xs i = r - 1 while i > 0 and xs[i] == xs[i - 1]: i -= 1 xs[i] += 1 while i < r - 1: i += 1 xs[i] = 1 # Above two functions, specialized to r = 4, using tuple over list. def fournums(): (x0, x1, x2, x3) = (1, 1, 1, 1) while True: yield (x0, x1, x2, x3) if x3 < x2: x3 += 1 continue x3 = 1 if x2 < x1: x2 += 1 continue x2 = 1 if x1 < x0: x1 += 1 continue x1 = 1 x0 += 1 continue # Benchmarks for all functions defined above (and the library function) def benchmark_itertools(n): for xs in itertools.combinations_with_replacement(range(1, n), 4): if xs[0] >= n: break def benchmark_itertools_try(n): combinations = itertools.combinations_with_replacement(range(1, n), 4) while True: try: xs = combinations.next() if xs[0] >= n: break except StopIteration: return def benchmark_itertools_equivalent(n): for xs in itertools_equivalent(range(1, n), 4): if xs[0] >= n: break def benchmark_itertools_equivalent_specialized(n): for xs in itertools_equivalent_specialized(n, 4): if xs[0] >= n: break def benchmark_all_combinations_pythonic(n): for xs in all_combinations_pythonic(4): if xs[0] >= n: break def benchmark_all_combinations_clike(n): for xs in all_combinations_clike(4): if xs[0] >= n: break def benchmark_fournums(n): for xs in fournums(): if xs[0] >= n: break if __name__ == '__main__': benchmark_itertools(150) benchmark_itertools_try(150) benchmark_itertools_equivalent(150) benchmark_itertools_equivalent_specialized(150) benchmark_all_combinations_pythonic(150) benchmark_all_combinations_clike(150) benchmark_fournums(150)
As you can see, I chose inside the benchmarking function the same statement that would cause all_combinations
to terminate, and have no effect for the other combination functions.
When run with
python -m cProfile benchmark_combinations.py
the results include:
2.817 benchmark_combinations.py:80(benchmark_itertools) 8.583 benchmark_combinations.py:84(benchmark_itertools_try) 126.980 benchmark_combinations.py:93(benchmark_itertools_equivalent) 46.635 benchmark_combinations.py:97(benchmark_itertools_equivalent_specialized) 44.032 benchmark_combinations.py:101(benchmark_all_combinations_pythonic) 18.049 benchmark_combinations.py:105(benchmark_all_combinations_clike) 10.923 benchmark_combinations.py:109(benchmark_fournums)
Lessons:
- Calling
itertools.combinations_with_replacement
is by far the fastest, taking about 2.7 seconds. It turns out that it’s written in C, so this would be hard to beat. (Still, writing it in atry
block is seriously bad.) - The “equivalent” Python code from the itertools documentation (
benchmark_itertools_combinations_with_replacment
) is about 50x slower. - Gets slightly better when specialized to numbers.
- Simply generating all combinations without an upper bound is actually faster.
- It can be made even faster by writing it in a more C-like way.
- The tuples version with the loop unrolled manually is rather fast when seen in this light, less than 4x slower than the library version.
Colliding balls approximate pi
Found via G+, a new physical experiment that approximates , like Buffon’s needle problem: The Pi Machine.
Roughly, the amazing discovery of Gregory Galperin is this: When a ball of mass collides with one of ball , propelling it towards a wall, the number of collisions (assuming standard physics idealisms) is , so by taking , we can get the first digits of . Note that this number of collisions is an entirely determinstic quantity; there’s no probability is involved!
Here’s a video demonstrating the fact for (the blue ball is the heavier one):
The NYT post says how this discovery came about:
Dr. Galperin’s approach was also geometric but very different (using an unfolding geodesic), building on prior related insights. Dr. Galperin, who studied under well-known Russian mathematician Andrei Kolmogorov, had recently written (with Yakov Sinai) extensively on ball collisions, realized just before a talk in 1995 that a plot of the ball positions of a pair of colliding balls could be used to determine pi. (When he mentioned this insight in the talk, no one in the audience believed him.) This finding was ultimately published as “Playing Pool With Pi” in a 2003 issue of Regular and Chaotic Dynamics.
The paper, Playing Pool With π (The number π from a billiard point of view) is very readable. The post has, despite a “solution” section, essentially no explanation, but the two comments by Dave in the comments section explain it clearly. And a reader sent in a cleaned-up version of that too: here, by Benjamin Wearn who teaches physics at Fieldston School.
Now someone needs to make a simulation / animation graphing the two balls in phase space of momentum. :-)
I’d done something a while ago, to illustrate The Orbit of the Moon around the Sun is Convex!, here. Probably need to re-learn all that JavaScript stuff, to make one for this. Leaving this post here as a placeholder.
Or maybe someone has done it already?
Prefatory apprehension
Robert Recorde’s 1557 book is noted for being the first to introduce the equals sign =, and is titled:
The Whetstone of Witte: whiche is the seconde parte of Arithmeteke: containing the extraction of rootes; the cossike practise, with the rule of equation; and the workes of Surde Nombers.
Its title page (see http://www.maa.org/publications/periodicals/convergence/mathematical-treasures-robert-recordes-whetstone-of-witte, see also the full book at https://archive.org/stream/TheWhetstoneOfWitte#page/n0/mode/2up) contains this verse:
Original spelling
Though many ſtones doe beare greate price, The grounde of artes did brede this ſtone: |
Modern spelling
Though many stones do bear great price, The ground of arts did breed this stone; |
Apparently the full title contains a pun (see http://www.pballew.net/arithm17.html): “the cossike practise” in the title refers to algebra, as the Latin cosa apparently meaning “a thing” was used to stand for an unknown, abbreviated to cos — but the Latin word cos itself means a grindstone.
The author again reminds readers not to blame his book, at the end of his preface:
To the curiouſe ſcanner.
If you ought finde, as ſome men maie, But if you mende not that you blame, |
Authors are either anxious about how their book is received, or make sure to be pointedly uncaring.
Sir Arthur Conan Doyle, in a mostly forgettable volume of poetry (Songs of the Road, 1911), begins:
If it were not for the hillocks
You’d think little of the hills;
The rivers would seem tiny
If it were not for the rills.
If you never saw the brushwood
You would under-rate the trees;
And so you see the purpose
Of such little rhymes as these.
Kālidāsa of course begins his Raghuvaṃśa with a grand disclaimer:
kva sūryaprabhavo vaṃśaḥ kva cālpaviṣayā matiḥ /
titīrṣur dustaram mohād uḍupenāsmi sāgaram // Ragh_1.2 //mandaḥ kaviyaśaḥ prārthī gamiṣyāmy upahāsyatām /
prāṃśulabhye phale lobhād udbāhur iva vāmanaḥ // Ragh_1.3 //atha vā kṛtavāgdvāre vaṃśe ‘smin pūrvasūribhiḥ /
maṇau vajrasamutkīrṇe sūtrasyevāsti me gatiḥ // Ragh_1.4 //
But the most nonchalant I’ve seen, thanks to Dr. Ganesh, is this gīti by Śrīkṛṣṇa Brahmatantra Yatīndra of the Parakāla Maṭha, Mysore:
nindatu vā nandatu vā
mandamanīṣā niśamya kṛtim etām
harṣaṃ vā marṣaṃ vā
sarṣapamātram api naiva vindema
Screw you guys. :-)
Big O() notation: a couple of sources
This post contains, just for future reference, a couple of primary sources relevant to the (“Big O”) notation:
- Some introductory words from Asymptotic Methods in Analysis by de Bruijn
- An letter from Donald Knuth on an approach to teaching calculus using this notation.
Visualizing product of permutations
A simple pedagogical trick that may come in handy: represent a permutation using arrows (curved lines) from to for each . Then, the product of two permutations can be represented by just putting the two corresponding figures (sets of arrows) one below the other, and following the arrows.
The figure is from an article called Symmetries by Alain Connes, found via the Wikipedia article on Morley’s trisector theorem (something entirely unrelated to permutations, but the article covers both of them and more).
I’m thinking how one might write a program to actually draw these: if we decide that the “height” of the figure is some , then each arrow needs to go from some to (using here the usual screen convention of coordinate increasing from left to right, and coordinate increasing from top to bottom). Further, each curve needs to have vertical slope at its two endpoints, so that successive curves can line up smoothly. The constraint on starting point, ending point, and directions at the endpoints defines almost a quadratic Bezier curve, except that here the two directions are parallel. So it’s somewhere between a quadratic and the (usual) cubic Bezier curve, which is given by the start point, end point, and derivatives at the start and end point. (Here we only care about the direction of the derivative; we can pick some arbitrary magnitude to fix the curve: the larger we pick, the more smooth it will look at the ends, at the cost of smoothness in the interior.)
Even knowing the curve, how do we generate an image?
The idea of logarithms, and the first appearance of e
[Incomplete post: about 10% written, about 90% left.]
The notion of the number , the exponential function , and logarithms are often conceptual stumbling blocks even to someone who has an otherwise solid understanding of middle-school mathematics.
Just what is the number ? How was it first calculated / where did it first turn up? Premature exposure to its numerical value
only serves to deepen the mysteriousness and to make it seem arbitrary.
Here a historical perspective helps: as is often the case, here too, the first appearance is simpler and more well-motivated than the accounts in dry textbooks. This is from this account by Matthew P. Wiener (originally posted on USENET somewhere, as quoted by MJD). I’m just going to quote it directly for now, and edit it later:
Napier, who invented logarithms, more or less worked out a table of logarithms to base , as follows:
0 1 2 3 4 5 6 7 8 9 10 ... 1 2 4 8 16 32 64 128 256 512 1024 ...The arithmetic progression in the first row is matched by a geometric progression in the second row. If, by any luck, you happen to wish to multiply 16 by 32, that just happen to be in the bottom row, you can look up their “logs” in the first row and add 4+5 to get 9 and then conclude 16·32=512.
For most practical purposes, this is useless. Napier realized that what one needs to multiply in general is for a base—the intermediate values will be much more extensive. For example, with base 1.01, we get:
0 1.00 1 1.01 2 1.02 3 1.03 4 1.04 5 1.05 6 1.06 7 1.07 8 1.08 9 1.09 10 1.10 11 1.12 12 1.13 13 1.14 14 1.15 15 1.16 16 1.17 17 1.18 18 1.20 19 1.21 20 1.22 21 1.23 22 1.24 23 1.26 24 1.27 25 1.28 26 1.30 27 1.31 28 1.32 29 1.33 30 1.35 31 1.36 32 1.37 33 1.39 34 1.40 35 1.42 [...] 50 1.64 51 1.66 52 1.68 53 1.69 54 1.71 55 1.73 [...] 94 2.55 95 2.57 96 2.60 97 2.63 98 2.65 99 2.68 100 2.70 101 2.73 102 2.76 103 2.79 104 2.81 105 2.84 [...]So if you need to multiply 1.27 by 1.33, say, just look up their logs, in this case, 24 and 29, add them, and get 53, so 1.27·1.33=1.69. For two/three digit arithmetic, the table only needs entries up to 9.99.
Note that is almost there, as the antilogarithm of 100. The natural logarithm of a number can be read off from the above table, as just [approximately] the corresponding exponent.
What Napier actually did was work with base .9999999. He spent 20 years computing powers of .9999999 by hand, producing a grand version of the above. That’s it. No deep understanding of anything, no calculus, and pops up anyway—in Napier’s case, was the 10 millionth entry. (To be pedantic, Napier did not actually use decimal points, that being a new fangled notion at the time.)
Later, in his historic meeting with Briggs, two changes were made. A switch to a base was made, so that logarithms would scale in the same direction as the numbers, and the spacing on the logarithm sides was chosen so that . These two changes were, in effect, just division by .
In other words, made its first appearance rather implicitly.
(I had earlier read a book on Napier and come to the same information though a lot less clearly, here.)
I had started writing a series of posts leading up to an understanding of the exponential function (here, here, here), but it seems to have got abandoned. Consider this one a contribution to that series.
The functional equation f(x+y) = f(x)f(y)
Suppose satisfies . What can we say about ?
Putting gives
which can happen if either or . Note that the function which is identically zero satisfies the functional equation. If is not this function, i.e., if for at least one value of , then plugging that value of (say ) into the equation gives . Also, for any , the equation forces as well. Further, so for all .
Next, putting gives , and by induction . Putting in place of in this gives which means (note we’re using here). And again, . So , which completely defines the function at rational points.
[As , it can be written as for some constant , which gives for rational .]
To extend this function to irrational numbers, we need some further assumptions on , such as continuity. It turns out that being continuous at any point is enough (and implies the function is everywhere): note that . Even being Lebesgue-integrable/measurable will do.
Else, there are discontinuous functions satisfying the functional equation. (Basically, we can define the value of the function separately on each “independent” part. That is, define the equivalence class where and are related if for rationals and , pick a representative for each class using the axiom of choice (this is something like picking a basis for , which corresponds to the equivalence class defined by the relation ), define the value of the function independently for each representative, and this fixes the value of on . See this article for more details.)
To step back a bit: what the functional equation says is that is a homorphism from , the additive group of real numbers, to , the multiplicative monoid of real numbers. If is not the trivial identically-zero function, then (as we saw above) is in fact a homomorphism from , the additive group of real numbers, to , the multiplicative group of positive real numbers. What we proved is that the exponential functions are precisely all such functions that are nice (nice here meaning either measurable or continuous at least one point). (Note that this set includes the trivial homomorphism corresponding to : the function identically everywhere. If is not this trivial map, then it is in fact an isomorphism.)
Edit [2013-10-11]: See also Overview of basic facts about Cauchy functional equation.