The generating function for Smirnov words (or: time until k consecutive results are the same)
1. Alphabet
Suppose we have an alphabet of size
. Its generating function (using the variable
to mark length) is simply
, as
contains
elements of length
each.
2. Words
Let denote the class of all words over the alphabet
. There are many ways to find the generating function
for
.
2.1.
We have
so its generating function is
2.2.
To put it differently, in the symbolic framework, we have , so the generating function for
is
2.3.
We could have arrived at this with direct counting: the number of words of length is
as there are
choices for each of the
letters, so the generating function is
3. Smirnov words
Next, let denote the class of Smirnov words over the alphabet
, defined as words in which no two consecutive letters are identical. (That is, words
in which
for all
, and
for any
.) Again, we can find the generating function for
in different ways.
3.1.
For any word in , by “collapsing” all runs of each letter, we get a Smirnov word. To put it differently, any word in
can be obtained from a Smirnov word
by “expanding” each letter
into a nonempty sequence of that letter. This observation (see Analytic Combinatorics, pp. 204–205) lets us relate the generating functions of
and
as
which implicitly gives the generating function : we have
3.2.
Alternatively, consider in an arbitrary word the first occurrence of a pair of repeated letters. Either this doesn’t happen at all (the word is a Smirnov word), or else, if it happens at position so that
, then the part of the word up to position
is a nonempty Smirnov word, the letter at position
is the same as the previous letter, and everything after
is an arbitrary word. This gives
or in terms of generating functions
giving
3.3.
A minor variant is to again pick an arbitrary word and consider its first pair of repeated letters, happening (if it does) at positions and
, but this time consider the prefix up to
: either it is empty, or the pair of letters is different from the last letter of the prefix, giving us the decomposition
and corresponding generating function
so
which is the same as before after we cancel the factors.
3.4.
We could have arrived at this result with direct counting. For , for a Smirnov word of length
, we have
choices for the first letter, and for each of the other
letters, as they must not be the same as the previous letter, we have
choices. This gives the number of Smirnov words of length
as
for
, and so the generating function for Smirnov words is
again giving
4. Words with bounded runs
We can now generalize. Let denote the class of words in which no letter occurs more than
times consecutively. (
.) We can find the generating function for
.
4.1.
To get a word in we can take a Smirnov word and replace each letter with a nonempty sequence of up to
occurrences of that letter. This gives:
so
4.2.
Pick any arbitrary word, and consider its first occurrence of a run of letters. Either such a run does not exist (which means the word we picked is in
), or it occurs right at the beginning (
possibilities, one for each letter in the alphabet), or, if it occurs starting at position
, then the part of the word up to position
(the “prefix”) is a nonempty Smirnov word, positions
to
are
occurrences of any of the
letters other than the last letter of the prefix, and what follows is an arbitrary word. This gives
or in terms of generating functions
so
giving
4.3.
Arriving at this via direct counting seems hard.
5. Words that stop at a long run
Now consider words in which we “stop” as soon we see consecutive identical letters. Let the class of such words be denoted
(not writing
to keep the notation simple). As before, we can find its generating function in multiple ways.
5.1.
We get any word in by either immediately seeing a run of length
and stopping, or by starting with a nonempty prefix in
, and then stopping with a run of
identical letters different from the last letter of the prefix. Thus we have
and
which gives
5.2.
Alternatively, we can decompose any word by looking for its first run of identical letters. Either it doesn’t occur at all (the word we picked is in
, or the part of the word until the end of the run belongs to
and the rest is an arbitrary word, so
and
so
6. Probability
Finally we arrive at the motivation: suppose we keep appending a random letter from the alphabet, until we encounter the same letter times consecutively. What can we say about the length
of the word thus generated? As all sequences of letters are equally likely, the probability of seeing any string of length
is
. So in the above generating function
, the probability of our word having length
is
, and the probability generating function
is therefore
. This
can be got by replacing
with
in the expression for
: we have
In principle, this probability generating function tells us everything about the distribution of the length of the word. For example, its expected length is
(See this question on Quora for other powerful ways of finding this expected value directly.)
We can also find its variance, as
This variance is really too large to be useful, so what we would really like, is the shape of the distribution… to be continued.
Converting a data URL (aka data URI) to an image on the commandline (Mac OS X)
This is trivial, but was awfully hard to find via Google Search. Eventually had to give up and actually think about it. :-)
So, a data-URI looks something like the following:
data:image/png;base64,[and a stream of base64 characters here]
The part after the comma is literally the contents of the file (image or whatever), encoded in base64, so all you need to do is run base64 --decode on that part.
For example, with the whole data URL copied to the clipboard, I can do:
pbpaste | sed -e 's#data:image/png;base64,##' | base64 --decode > out.png
to get it into a png file.
Using Stellarium to make an animation / video
(I don’t have a solution yet.)
I just wanted to show what the sky looks like over the course of a week.
On a Mac with Stellarium installed, I ran the following
/Applications/Stellarium.app/Contents/MacOS/stellarium --startup-script stellarium.ssc
with the following stellarium.ssc:
// -*- mode: javascript -*-
core.clear('natural'); // "atmosphere, landscape, no lines, labels or markers"
core.wait(5);
core.setObserverLocation('Ujjain, India');
core.setDate('1986-08-15T05:30:00', 'utc');
core.wait(5);
for (var i = 0; i < 2 * 24 * 7; i += 1) {
core.setDate('+30 minutes');
core.wait(0.5);
core.screenshot('uj');
core.wait(0.5);
}
core.wait(10);
core.quitStellarium();
It took a while (some 10–15 minutes) and created those 336 images in ~/Pictures/Stellarium/uj*, occupying a total size of about 550 MB. This seems a start, but Imagemagick etc. seem to choke on creating a GIF from such large data.
Giving up for now; would like to come back in future and figure out something better, that results in a smaller GIF.
Science
Is hard.
Just dumping some links here for now:
Feynman on “cargo cult science”: http://www.lhup.edu/~DSIMANEK/cargocul.htm
Feynman on “what is science”: http://www.fotuva.org/feynman/what_is_science.html
Ioannidis, Why Most Published Research Findings Are False: http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
On how subtle it is: http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/
Reinhart, http://www.statisticsdonewrong.com
http://www.nytimes.com/2012/04/17/science/rise-in-scientific-journal-retractions-prompts-calls-for-reform.html?pagewanted=all
The state of static analysis on Python
Here is a program that fails with UnboundLocalError three out of four times:
"""Module to demonstrate an error."""
import random
def zero_or_one():
"""Returns either 0 or 1 with equal probability."""
return random.choice([0, 1])
def main():
"""Function to demonstrate an error."""
if zero_or_one():
first = 42
for _ in range(zero_or_one()):
second = 42
print first, second
if __name__ == '__main__':
main()
(Condensed from a real bug where, because of additional indentation, a variable assignment happened as the last line inside a loop, instead of the first line after it.)
I know of three tools that are popular: pychecker, pyflakes, and pylint. None of them say a single thing about this program. It is questionable whether ever (and if so, how often) code like the above is what the programmer intended.
The first of these, pychecker, is not on pip, and requires you to download a file from Sourceforge and run “python setup.py install”. Anyway, this is the output from the three programs:
/tmp% pychecker test.py
Processing module test (test.py)...
Warnings...
None
/tmp% pyflakes test.py
/tmp% pylint test.py
No config file found, using default configuration
Report
======
12 statements analysed.
Statistics by type
------------------
+---------+-------+-----------+-----------+------------+---------+
|type |number |old number |difference |%documented |%badname |
+=========+=======+===========+===========+============+=========+
|module |1 |1 |= |100.00 |0.00 |
+---------+-------+-----------+-----------+------------+---------+
|class |0 |0 |= |0 |0 |
+---------+-------+-----------+-----------+------------+---------+
|method |0 |0 |= |0 |0 |
+---------+-------+-----------+-----------+------------+---------+
|function |2 |2 |= |100.00 |0.00 |
+---------+-------+-----------+-----------+------------+---------+
Raw metrics
-----------
+----------+-------+------+---------+-----------+
|type |number |% |previous |difference |
+==========+=======+======+=========+===========+
|code |11 |73.33 |11 |= |
+----------+-------+------+---------+-----------+
|docstring |3 |20.00 |3 |= |
+----------+-------+------+---------+-----------+
|comment |0 |0.00 |0 |= |
+----------+-------+------+---------+-----------+
|empty |1 |6.67 |1 |= |
+----------+-------+------+---------+-----------+
Duplication
-----------
+-------------------------+------+---------+-----------+
| |now |previous |difference |
+=========================+======+=========+===========+
|nb duplicated lines |0 |0 |= |
+-------------------------+------+---------+-----------+
|percent duplicated lines |0.000 |0.000 |= |
+-------------------------+------+---------+-----------+
Messages by category
--------------------
+-----------+-------+---------+-----------+
|type |number |previous |difference |
+===========+=======+=========+===========+
|convention |0 |0 |= |
+-----------+-------+---------+-----------+
|refactor |0 |0 |= |
+-----------+-------+---------+-----------+
|warning |0 |0 |= |
+-----------+-------+---------+-----------+
|error |0 |0 |= |
+-----------+-------+---------+-----------+
Global evaluation
-----------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
/tmp%
Bhavabhuti on finding a reader
Bhavabhūti, the 8th-century author of the very moving play Uttara-rāma-carita, has in one of his other works these lines, any author’s consolation that even if your work receives not enough praise today, someday the right sort of reader will come along, who will derive great joy or meaning from it.
ये नाम केचिदिह नः प्रथयन्त्यवज्ञां जानन्ति ते किमपि तान्प्रति नैष यत्नः । उत्पत्स्यते तु मम कोऽपि समानधर्मा कालो ह्ययं निरवधिर्विपुला च पृथ्वी ॥ye nāma kecit iha naḥ prathayanti avajñām
jānanti te kim api tān prati na eṣa yatnaḥ |
utpatsyate tu mama ko api samāna-dharmā
kālo hi ayaṃ niravadhiḥ vipulā ca pṛthvī ||Those who deride or ignore my work —
let them know: my efforts are not for them.
There will come along someone who shares my spirit:
the world is vast, and time endless.
This verse has become a favourite of many. It appears already in the first known anthology of Sanskrit verses (subhāṣita-collection), Vidyākara’s Subhāṣita-ratna-koṣa. (It’s numbered 1731 (= 50.34) in the edition by Kosambi and Gokhale, and translated by Ingalls.) Ingalls writes and translates (1965):
Of special interest are the verses of Dharmakīrti and Bhavabhūti, two of India’s most original writers, which speak of the scorn and lack of understanding which the writings of those authors found among contemporaries. To such disappointment Dharmakīrti replies with bitterness (1726, 1729), Bhavabhūti with the unreasoning hope of a romantic (1731). If the souls of men could enjoy their posthumous fame the one would now see his works admired even far beyond India, the other would see his romantic hope fulfilled.
Those who scorn me in this world
have doubtless special wisdom,
so my writings are not made for them;
but are rather with the thought that some day will be born,
since time is endless and the world is wide,
one whose nature is the same as mine.
A translation of this verse is also included in A. N. D. Haksar’s A Treasury of Sanskrit Poetry in English Translation (2002):
The Proud Poet
Are there any around who mock my verses?
They ought to know I don’t write for them.
Someone somewhere sometime will understand.
Time has no end. The world is big.
— translated by V. N. Misra, L. Nathan and S. Vatsyayan [The Indian Poetic Tradition, 1993]
Andrew Schelling has written of it in Manoa, Volume 25, Issue 2, 2013:
Critics scoff
at my work
and declare their contempt—
no doubt they’ve got
their own little wisdom.
I write nothing for them.
But because time is
endless and our planet
vast, I write these
poems for a person
who will one day be born
with my sort of heart.“Criticism is for poets as ornithology is for the birds,” wrote John Cage. Bhavabhūti has scant doubt that future generations will honor his work. The reader who will arise, utpalsyate [sic], is somebody of the same faith, heart, or discipline, samānadharmā.
Just now also found it on the internet, here (2014) (misattributed to Bhartṛhari):
There are those who
treat my work with
studied indifference.
Maybe they know something,
but I’m not writing for them.
Someone will come around
who feels the way I do.
Time, after all, is limitless,
and fortune spreads far.
Finally, in Sadāsvada, written by my friend Mohan with some comments from me, this was included in one of our our very first posts (2012):
In his play Mālatīmādhava, he makes a point that deserves to be the leading light of anyone wishing to do something of value and
isput off by discouragement. Standing beside the words attributed to Gandhi (“First they ignore you, then they laugh at you, then they fight you, then you win.”) and Teddy Roosevelt (“It is not the critic who counts…”), Bhavabhūti’s confidence in the future stands resplendent:“They who disparage my work should know that it’s not for them that I did it. One day, there will arise someone who will truly know me: this world is vast, and time infinite.”
It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.
[Note on the text: in Vidyākara’s compilation the verse ends with “विपुला च लक्ष्मीः” (vipulā ca lakṣmīḥ) instead of “विपुला च पृथ्वी” (vipulā ca pṛthvī), but the actual source work Mālatī-mādhava has the latter, as do all quotations of this verse elsewhere (e.g. the काव्यप्रकाशः of Mammaṭa, the Sahityadarpana of Viśvanātha Kavirāja, the रसार्णवसुधाकरः of श्रीसिंहभूपाल), and that is what Ingalls uses: “For lakṣmīḥ, which utterly destroys the line, read pṛthvi with the printed texts of Māl.” Actually, most quotations have “utpatsyate ‘sti” in place of “utpatsyate tu”: “either will be born, or already exists…”.]
Printing floating-point numbers
Everyone knows that floating-point numbers, being discrete, cannot possibly exactly represent every real number. In particular, the usual binary (IEEE 754) floating point numbers cannot even exactly store all numbers exactly representable in decimal (e.g. 0.3 or 0.1, which are not dyadic rationals).
But what about printing them?
Just because the number stored internally is not 0.1 but the closest approximation to it (say as 0.100000001490116119384765625) doesn’t mean it should be printed as a long string, when “0.1” means exactly the same number.
This is a solved problem since 1990.
TODO: Write rest of this post.
Bryan O’Sullivan (of Real World Haskell fame):
http://www.serpentine.com/blog/2011/06/29/here-be-dragons-advances-in-problems-you-didnt-even-know-you-had/
Steel & White paper How to Print Floating-Point Numbers Accurately: https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf
Their retrospective: http://grouper.ieee.org/groups/754/email/pdfq3pavhBfih.pdf
Burger & Dybvig:
http://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf
Florian Loitsch:
http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
Russ Cox: http://research.swtch.com/ftoa
http://www.ryanjuckett.com/programming/printing-floating-point-numbers/
https://labix.org/python-nicefloat
Edit (Thanks Soma!): Printing Floating-Point Numbers: A Faster, Always Correct Method from POPL’16.