Archive for the ‘mathematics’ Category
The generating function for Smirnov words (or: time until k consecutive results are the same)
1. Alphabet
Suppose we have an alphabet of size . Its generating function (using the variable to mark length) is simply , as contains elements of length each.
2. Words
Let denote the class of all words over the alphabet . There are many ways to find the generating function for .
2.1.
We have
so its generating function is
2.2.
To put it differently, in the symbolic framework, we have , so the generating function for is
2.3.
We could have arrived at this with direct counting: the number of words of length is as there are choices for each of the letters, so the generating function is
3. Smirnov words
Next, let denote the class of Smirnov words over the alphabet , defined as words in which no two consecutive letters are identical. (That is, words in which for all , and for any .) Again, we can find the generating function for in different ways.
3.1.
For any word in , by “collapsing” all runs of each letter, we get a Smirnov word. To put it differently, any word in can be obtained from a Smirnov word by “expanding” each letter into a nonempty sequence of that letter. This observation (see Analytic Combinatorics, pp. 204–205) lets us relate the generating functions of and as
which implicitly gives the generating function : we have
3.2.
Alternatively, consider in an arbitrary word the first occurrence of a pair of repeated letters. Either this doesn’t happen at all (the word is a Smirnov word), or else, if it happens at position so that , then the part of the word up to position is a nonempty Smirnov word, the letter at position is the same as the previous letter, and everything after is an arbitrary word. This gives
or in terms of generating functions
giving
3.3.
A minor variant is to again pick an arbitrary word and consider its first pair of repeated letters, happening (if it does) at positions and , but this time consider the prefix up to : either it is empty, or the pair of letters is different from the last letter of the prefix, giving us the decomposition
and corresponding generating function
so
which is the same as before after we cancel the factors.
3.4.
We could have arrived at this result with direct counting. For , for a Smirnov word of length , we have choices for the first letter, and for each of the other letters, as they must not be the same as the previous letter, we have choices. This gives the number of Smirnov words of length as for , and so the generating function for Smirnov words is
again giving
4. Words with bounded runs
We can now generalize. Let denote the class of words in which no letter occurs more than times consecutively. (.) We can find the generating function for .
4.1.
To get a word in we can take a Smirnov word and replace each letter with a nonempty sequence of up to occurrences of that letter. This gives:
so
4.2.
Pick any arbitrary word, and consider its first occurrence of a run of letters. Either such a run does not exist (which means the word we picked is in ), or it occurs right at the beginning ( possibilities, one for each letter in the alphabet), or, if it occurs starting at position , then the part of the word up to position (the “prefix”) is a nonempty Smirnov word, positions to are occurrences of any of the letters other than the last letter of the prefix, and what follows is an arbitrary word. This gives
or in terms of generating functions
so
giving
4.3.
Arriving at this via direct counting seems hard.
5. Words that stop at a long run
Now consider words in which we “stop” as soon we see consecutive identical letters. Let the class of such words be denoted (not writing to keep the notation simple). As before, we can find its generating function in multiple ways.
5.1.
We get any word in by either immediately seeing a run of length and stopping, or by starting with a nonempty prefix in , and then stopping with a run of identical letters different from the last letter of the prefix. Thus we have
and
which gives
5.2.
Alternatively, we can decompose any word by looking for its first run of identical letters. Either it doesn’t occur at all (the word we picked is in , or the part of the word until the end of the run belongs to and the rest is an arbitrary word, so
and
so
6. Probability
Finally we arrive at the motivation: suppose we keep appending a random letter from the alphabet, until we encounter the same letter times consecutively. What can we say about the length of the word thus generated? As all sequences of letters are equally likely, the probability of seeing any string of length is . So in the above generating function , the probability of our word having length is , and the probability generating function is therefore . This can be got by replacing with in the expression for : we have
In principle, this probability generating function tells us everything about the distribution of the length of the word. For example, its expected length is
(See this question on Quora for other powerful ways of finding this expected value directly.)
We can also find its variance, as
This variance is really too large to be useful, so what we would really like, is the shape of the distribution… to be continued.
Some playing with Python
A long time ago, Diophantus (sort of) discussed integer solutions to the equation
(solutions to this equation are called Pythagorean triples).
Centuries later, in 1637, Fermat made a conjecture (now called Fermat’s Last Theorem, not because he uttered it in his dying breath, but because it was the last one to be proved — in ~1995) that
has no positive integer solutions for . In other words, his conjecture was that none of the following equations has a solution:
… and so on. An nth power cannot be partitioned into two nth powers.
About a century later, Euler proved the case of Fermat’s conjecture, but generalized it in a different direction: he conjectured in 1769 that an nth power cannot be partitioned into fewer than n nth powers, namely
has no solutions with . So his conjecture was that (among others) none of the following equations has a solution:
… and so on.
This conjecture stood for about two centuries, until abruptly it was found to be false, by Lander and Parkin who in 1966 simply did a direct search on the fastest (super)computer at the time, and found this counterexample:
(It is still one of only three examples known, according to Wikipedia.)
Now, how might you find this solution on a computer today?
In his wonderful (as always) post at bit-player, Brian Hayes showed the following code:
import itertools as it def four_fifths(n): '''Return smallest positive integers ((a,b,c,d),e) such that a^5 + b^5 + c^5 + d^5 = e^5; if no such tuple exists with e < n, return the string 'Failed'.''' fifths = [x**5 for x in range(n)] combos = it.combinations_with_replacement(range(1,n), 4) while True: try: cc = combos.next() cc_sum = sum([fifths[i] for i in cc]) if cc_sum in fifths: return(cc, fifths.index(cc_sum)) except StopIteration: return('Failed')
to which, if you add (say) print four_fifths(150)
and run it, it returns the correct answer fairly quickly: in about 47 seconds on my laptop.
The if cc_sum in fifths:
line inside the loop is an cost each time it’s run, so with a simple improvement to the code (using a set instead) and rewriting it a bit, we can write the following full program:
import itertools def find_counterexample(n): fifth_powers = [x**5 for x in range(n)] fifth_powers_set = set(fifth_powers) for xs in itertools.combinations_with_replacement(range(1, n), 4): xs_sum = sum([fifth_powers[i] for i in xs]) if xs_sum in fifth_powers_set: return (xs, fifth_powers.index(xs_sum)) return 'Failed' print find_counterexample(150)
which finishes in about 8.5 seconds.
Great!
But there’s something unsatisfying about this solution, which is that it assumes there’s a solution with all four numbers on the LHS less than 150. After all, changing the function invocation to find_counterexample(145)
makes it run a second faster even, but how could we know to do without already knowing the solution? Besides, we don’t have a fixed 8- or 10-second budget; what we’d really like is a program that keeps searching till it finds a solution or we abort it (or it runs out of memory or something), with no other fixed termination condition.
The above program used the given “n” as an upper bound to generate the combinations of 4 numbers; is there a way to generate all combinations when we don’t know an upper bound on them?
Yes! One of the things I learned from Knuth volume 4 is that if you simply write down each combination in descending order and order them lexicographically, the combinations you get for each upper bound are a prefix of the list of the next bigger one, i.e., for any upper bound, all the combinations form a prefix of the same infinite list, which starts as follows (line breaks for clarity):
1111, 2111, 2211, 2221, 2222, 3111, 3211, 3221, 3222, 3311, 3321, 3322, 3331, 3332, 3333, 4111, ... ... 9541, 9542, 9543, 9544, 9551, ... 9555, 9611, ...
There doesn’t seem to be a library function in Python to generate these though, so we can write our own. If we stare at the above list, we can figure out how to generate the next combination from a given one:
- Walk backwards from the end, till you reach the beginning or find an element that’s less than the previous one.
- Increase that element, set all the following elements to 1s, and continue.
We could write, say, the following code for it:
def all_combinations(r): xs = [1] * r while True: yield xs for i in range(r - 1, 0, -1): if xs[i] < xs[i - 1]: break else: i = 0 xs[i] += 1 xs[i + 1:] = [1] * (r - i - 1)
(The else
block on a for
loop is an interesting Python feature: it is executed if the loop wasn’t terminated with break
.) We could even hard-code the r=4
case, as we’ll see later below.
For testing whether a given number is a fifth power, we can no longer simply lookup in a fixed precomputed set. We can do a binary search instead:
def is_fifth_power(n): assert n > 0 lo = 0 hi = n # Invariant: lo^5 < n <= hi^5 while hi - lo > 1: mid = lo + (hi - lo) / 2 if mid ** 5 < n: lo = mid else: hi = mid return hi ** 5 == n
but it turns out that this is slower than one based on looking up in a growing set (as below).
Putting everything together, we can write the following (very C-like) code:
largest_known_fifth_power = (0, 0) known_fifth_powers = set() def is_fifth_power(n): global largest_known_fifth_power while n > largest_known_fifth_power[0]: m = largest_known_fifth_power[1] + 1 m5 = m ** 5 largest_known_fifth_power = (m5, m) known_fifth_powers.add(m5) return n in known_fifth_powers def fournums_with_replacement(): (x0, x1, x2, x3) = (1, 1, 1, 1) while True: yield (x0, x1, x2, x3) if x3 < x2: x3 += 1 continue x3 = 1 if x2 < x1: x2 += 1 continue x2 = 1 if x1 < x0: x1 += 1 continue x1 = 1 x0 += 1 continue if __name__ == '__main__': tried = 0 for get in fournums_with_replacement(): tried += 1 if (tried % 1000000 == 0): print tried, 'Trying:', get rhs = get[0]**5 + get[1]**5 + get[2]**5 + get[3]**5 if is_fifth_power(rhs): print 'Found:', get, rhs break
which is both longer and slower (takes about 20 seconds) than the original program, but at least we have the satisfaction that it doesn’t depend on any externally known upper bound.
I originally started writing this post because I wanted to describe some experiments I did with profiling, but it’s late and I’m sleepy so I’ll just mention it.
python -m cProfile euler_conjecture.py
will print relevant output in the terminal:
26916504 function calls in 26.991 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 18.555 18.555 26.991 26.991 euler_conjecture.py:1() 13458164 4.145 0.000 4.145 0.000 euler_conjecture.py:12(fournums_with_replacement) 13458163 4.292 0.000 4.292 0.000 euler_conjecture.py:3(is_fifth_power) 175 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Another way to view the same thing is to write the profile output to a file and read it with cprofilev
:
python -m cProfile -o euler_profile.out euler_conjecture.py cprofilev euler_profile.out
and visit http://localhost:4000 to view it.
Of course, simply translating this code to C++ makes it run much faster:
#include <array> #include <iostream> #include <map> #include <utility> typedef long long Int; constexpr Int fifth_power(Int x) { return x * x * x * x * x; } std::map<Int, int> known_fifth_powers = {{0, 0}}; bool is_fifth_power(Int n) { while (n > known_fifth_powers.rbegin()->first) { int m = known_fifth_powers.rbegin()->second + 1; known_fifth_powers[fifth_power(m)] = m; } return known_fifth_powers.count(n); } std::array<Int, 4> four_nums() { static std::array<Int, 4> x = {1, 1, 1, 0}; int i = 3; while (i > 0 && x[i] == x[i - 1]) --i; x[i] += 1; while (++i < 4) x[i] = 1; return x; } std::ostream& operator<<(std::ostream& os, std::array<Int, 4> x) { os << "(" << x[0] << ", " << x[1] << ", " << x[2] << ", " << x[3] << ")"; return os; } int main() { while (true) { std::array<Int, 4> get = four_nums(); Int rhs = fifth_power(get[0]) + fifth_power(get[1]) + fifth_power(get[2]) + fifth_power(get[3]); if (is_fifth_power(rhs)) { std::cout << "Found: " << get << " " << known_fifth_powers[rhs] << std::endl; break; } } }
and
clang++ -std=c++11 euler_conjecture.cc && time ./a.out
runs in 2.43s, or 0.36s if compiled with -O2.
But I don’t have a satisfactory answer to how to make our Python program which takes 20 seconds as fast as the 8.5-second known-upper-bound version.
Edit [2015-05-08]: I wrote some benchmarking code to compare all the different “combination” functions.
import itertools # Copied from the Python documentation def itertools_equivalent(iterable, r): pool = tuple(iterable) n = len(pool) if not n and r: return indices = [0] * r yield tuple(pool[i] for i in indices) while True: for i in reversed(range(r)): if indices[i] != n - 1: break else: return indices[i:] = [indices[i] + 1] * (r - i) yield tuple(pool[i] for i in indices) # Above function, specialized to first argument being range(1, n) def itertools_equivalent_specialized(n, r): indices = [1] * r yield indices while True: for i in reversed(range(r)): if indices[i] != n - 1: break else: return indices[i:] = [indices[i] + 1] * (r - i) yield indices # Function to generate all combinations of 4 elements def all_combinations_pythonic(r): xs = [1] * r while True: yield xs for i in range(r - 1, 0, -1): if xs[i] < xs[i - 1]: break else: i = 0 xs[i] += 1 xs[i + 1:] = [1] * (r - i - 1) # Above function, written in a more explicit C-like way def all_combinations_clike(r): xs = [1] * r while True: yield xs i = r - 1 while i > 0 and xs[i] == xs[i - 1]: i -= 1 xs[i] += 1 while i < r - 1: i += 1 xs[i] = 1 # Above two functions, specialized to r = 4, using tuple over list. def fournums(): (x0, x1, x2, x3) = (1, 1, 1, 1) while True: yield (x0, x1, x2, x3) if x3 < x2: x3 += 1 continue x3 = 1 if x2 < x1: x2 += 1 continue x2 = 1 if x1 < x0: x1 += 1 continue x1 = 1 x0 += 1 continue # Benchmarks for all functions defined above (and the library function) def benchmark_itertools(n): for xs in itertools.combinations_with_replacement(range(1, n), 4): if xs[0] >= n: break def benchmark_itertools_try(n): combinations = itertools.combinations_with_replacement(range(1, n), 4) while True: try: xs = combinations.next() if xs[0] >= n: break except StopIteration: return def benchmark_itertools_equivalent(n): for xs in itertools_equivalent(range(1, n), 4): if xs[0] >= n: break def benchmark_itertools_equivalent_specialized(n): for xs in itertools_equivalent_specialized(n, 4): if xs[0] >= n: break def benchmark_all_combinations_pythonic(n): for xs in all_combinations_pythonic(4): if xs[0] >= n: break def benchmark_all_combinations_clike(n): for xs in all_combinations_clike(4): if xs[0] >= n: break def benchmark_fournums(n): for xs in fournums(): if xs[0] >= n: break if __name__ == '__main__': benchmark_itertools(150) benchmark_itertools_try(150) benchmark_itertools_equivalent(150) benchmark_itertools_equivalent_specialized(150) benchmark_all_combinations_pythonic(150) benchmark_all_combinations_clike(150) benchmark_fournums(150)
As you can see, I chose inside the benchmarking function the same statement that would cause all_combinations
to terminate, and have no effect for the other combination functions.
When run with
python -m cProfile benchmark_combinations.py
the results include:
2.817 benchmark_combinations.py:80(benchmark_itertools) 8.583 benchmark_combinations.py:84(benchmark_itertools_try) 126.980 benchmark_combinations.py:93(benchmark_itertools_equivalent) 46.635 benchmark_combinations.py:97(benchmark_itertools_equivalent_specialized) 44.032 benchmark_combinations.py:101(benchmark_all_combinations_pythonic) 18.049 benchmark_combinations.py:105(benchmark_all_combinations_clike) 10.923 benchmark_combinations.py:109(benchmark_fournums)
Lessons:
- Calling
itertools.combinations_with_replacement
is by far the fastest, taking about 2.7 seconds. It turns out that it’s written in C, so this would be hard to beat. (Still, writing it in atry
block is seriously bad.) - The “equivalent” Python code from the itertools documentation (
benchmark_itertools_combinations_with_replacment
) is about 50x slower. - Gets slightly better when specialized to numbers.
- Simply generating all combinations without an upper bound is actually faster.
- It can be made even faster by writing it in a more C-like way.
- The tuples version with the loop unrolled manually is rather fast when seen in this light, less than 4x slower than the library version.
Colliding balls approximate pi
Found via G+, a new physical experiment that approximates , like Buffon’s needle problem: The Pi Machine.
Roughly, the amazing discovery of Gregory Galperin is this: When a ball of mass collides with one of ball , propelling it towards a wall, the number of collisions (assuming standard physics idealisms) is , so by taking , we can get the first digits of . Note that this number of collisions is an entirely determinstic quantity; there’s no probability is involved!
Here’s a video demonstrating the fact for (the blue ball is the heavier one):
The NYT post says how this discovery came about:
Dr. Galperin’s approach was also geometric but very different (using an unfolding geodesic), building on prior related insights. Dr. Galperin, who studied under well-known Russian mathematician Andrei Kolmogorov, had recently written (with Yakov Sinai) extensively on ball collisions, realized just before a talk in 1995 that a plot of the ball positions of a pair of colliding balls could be used to determine pi. (When he mentioned this insight in the talk, no one in the audience believed him.) This finding was ultimately published as “Playing Pool With Pi” in a 2003 issue of Regular and Chaotic Dynamics.
The paper, Playing Pool With π (The number π from a billiard point of view) is very readable. The post has, despite a “solution” section, essentially no explanation, but the two comments by Dave in the comments section explain it clearly. And a reader sent in a cleaned-up version of that too: here, by Benjamin Wearn who teaches physics at Fieldston School.
Now someone needs to make a simulation / animation graphing the two balls in phase space of momentum. :-)
I’d done something a while ago, to illustrate The Orbit of the Moon around the Sun is Convex!, here. Probably need to re-learn all that JavaScript stuff, to make one for this. Leaving this post here as a placeholder.
Or maybe someone has done it already?
Prefatory apprehension
Robert Recorde’s 1557 book is noted for being the first to introduce the equals sign =, and is titled:
The Whetstone of Witte: whiche is the seconde parte of Arithmeteke: containing the extraction of rootes; the cossike practise, with the rule of equation; and the workes of Surde Nombers.
Its title page (see http://www.maa.org/publications/periodicals/convergence/mathematical-treasures-robert-recordes-whetstone-of-witte, see also the full book at https://archive.org/stream/TheWhetstoneOfWitte#page/n0/mode/2up) contains this verse:
Original spelling
Though many ſtones doe beare greate price, The grounde of artes did brede this ſtone: |
Modern spelling
Though many stones do bear great price, The ground of arts did breed this stone; |
Apparently the full title contains a pun (see http://www.pballew.net/arithm17.html): “the cossike practise” in the title refers to algebra, as the Latin cosa apparently meaning “a thing” was used to stand for an unknown, abbreviated to cos — but the Latin word cos itself means a grindstone.
The author again reminds readers not to blame his book, at the end of his preface:
To the curiouſe ſcanner.
If you ought finde, as ſome men maie, But if you mende not that you blame, |
Authors are either anxious about how their book is received, or make sure to be pointedly uncaring.
Sir Arthur Conan Doyle, in a mostly forgettable volume of poetry (Songs of the Road, 1911), begins:
If it were not for the hillocks
You’d think little of the hills;
The rivers would seem tiny
If it were not for the rills.
If you never saw the brushwood
You would under-rate the trees;
And so you see the purpose
Of such little rhymes as these.
Kālidāsa of course begins his Raghuvaṃśa with a grand disclaimer:
kva sūryaprabhavo vaṃśaḥ kva cālpaviṣayā matiḥ /
titīrṣur dustaram mohād uḍupenāsmi sāgaram // Ragh_1.2 //mandaḥ kaviyaśaḥ prārthī gamiṣyāmy upahāsyatām /
prāṃśulabhye phale lobhād udbāhur iva vāmanaḥ // Ragh_1.3 //atha vā kṛtavāgdvāre vaṃśe ‘smin pūrvasūribhiḥ /
maṇau vajrasamutkīrṇe sūtrasyevāsti me gatiḥ // Ragh_1.4 //
But the most nonchalant I’ve seen, thanks to Dr. Ganesh, is this gīti by Śrīkṛṣṇa Brahmatantra Yatīndra of the Parakāla Maṭha, Mysore:
nindatu vā nandatu vā
mandamanīṣā niśamya kṛtim etām
harṣaṃ vā marṣaṃ vā
sarṣapamātram api naiva vindema
Screw you guys. :-)
Big O() notation: a couple of sources
This post contains, just for future reference, a couple of primary sources relevant to the (“Big O”) notation:
- Some introductory words from Asymptotic Methods in Analysis by de Bruijn
- An letter from Donald Knuth on an approach to teaching calculus using this notation.
Visualizing product of permutations
A simple pedagogical trick that may come in handy: represent a permutation using arrows (curved lines) from to for each . Then, the product of two permutations can be represented by just putting the two corresponding figures (sets of arrows) one below the other, and following the arrows.
The figure is from an article called Symmetries by Alain Connes, found via the Wikipedia article on Morley’s trisector theorem (something entirely unrelated to permutations, but the article covers both of them and more).
I’m thinking how one might write a program to actually draw these: if we decide that the “height” of the figure is some , then each arrow needs to go from some to (using here the usual screen convention of coordinate increasing from left to right, and coordinate increasing from top to bottom). Further, each curve needs to have vertical slope at its two endpoints, so that successive curves can line up smoothly. The constraint on starting point, ending point, and directions at the endpoints defines almost a quadratic Bezier curve, except that here the two directions are parallel. So it’s somewhere between a quadratic and the (usual) cubic Bezier curve, which is given by the start point, end point, and derivatives at the start and end point. (Here we only care about the direction of the derivative; we can pick some arbitrary magnitude to fix the curve: the larger we pick, the more smooth it will look at the ends, at the cost of smoothness in the interior.)
Even knowing the curve, how do we generate an image?
The idea of logarithms, and the first appearance of e
[Incomplete post: about 10% written, about 90% left.]
The notion of the number , the exponential function , and logarithms are often conceptual stumbling blocks even to someone who has an otherwise solid understanding of middle-school mathematics.
Just what is the number ? How was it first calculated / where did it first turn up? Premature exposure to its numerical value
only serves to deepen the mysteriousness and to make it seem arbitrary.
Here a historical perspective helps: as is often the case, here too, the first appearance is simpler and more well-motivated than the accounts in dry textbooks. This is from this account by Matthew P. Wiener (originally posted on USENET somewhere, as quoted by MJD). I’m just going to quote it directly for now, and edit it later:
Napier, who invented logarithms, more or less worked out a table of logarithms to base , as follows:
0 1 2 3 4 5 6 7 8 9 10 ... 1 2 4 8 16 32 64 128 256 512 1024 ...The arithmetic progression in the first row is matched by a geometric progression in the second row. If, by any luck, you happen to wish to multiply 16 by 32, that just happen to be in the bottom row, you can look up their “logs” in the first row and add 4+5 to get 9 and then conclude 16·32=512.
For most practical purposes, this is useless. Napier realized that what one needs to multiply in general is for a base—the intermediate values will be much more extensive. For example, with base 1.01, we get:
0 1.00 1 1.01 2 1.02 3 1.03 4 1.04 5 1.05 6 1.06 7 1.07 8 1.08 9 1.09 10 1.10 11 1.12 12 1.13 13 1.14 14 1.15 15 1.16 16 1.17 17 1.18 18 1.20 19 1.21 20 1.22 21 1.23 22 1.24 23 1.26 24 1.27 25 1.28 26 1.30 27 1.31 28 1.32 29 1.33 30 1.35 31 1.36 32 1.37 33 1.39 34 1.40 35 1.42 [...] 50 1.64 51 1.66 52 1.68 53 1.69 54 1.71 55 1.73 [...] 94 2.55 95 2.57 96 2.60 97 2.63 98 2.65 99 2.68 100 2.70 101 2.73 102 2.76 103 2.79 104 2.81 105 2.84 [...]So if you need to multiply 1.27 by 1.33, say, just look up their logs, in this case, 24 and 29, add them, and get 53, so 1.27·1.33=1.69. For two/three digit arithmetic, the table only needs entries up to 9.99.
Note that is almost there, as the antilogarithm of 100. The natural logarithm of a number can be read off from the above table, as just [approximately] the corresponding exponent.
What Napier actually did was work with base .9999999. He spent 20 years computing powers of .9999999 by hand, producing a grand version of the above. That’s it. No deep understanding of anything, no calculus, and pops up anyway—in Napier’s case, was the 10 millionth entry. (To be pedantic, Napier did not actually use decimal points, that being a new fangled notion at the time.)
Later, in his historic meeting with Briggs, two changes were made. A switch to a base was made, so that logarithms would scale in the same direction as the numbers, and the spacing on the logarithm sides was chosen so that . These two changes were, in effect, just division by .
In other words, made its first appearance rather implicitly.
(I had earlier read a book on Napier and come to the same information though a lot less clearly, here.)
I had started writing a series of posts leading up to an understanding of the exponential function (here, here, here), but it seems to have got abandoned. Consider this one a contribution to that series.