The Lumber Room

"Consign them to dust and damp by way of preserving them"

Archive for the ‘programming’ Category

Using Stellarium to make an animation / video

leave a comment »

(I don’t have a solution yet.)

I just wanted to show what the sky looks like over the course of a week.

On a Mac with Stellarium installed, I ran the following

/Applications/Stellarium.app/Contents/MacOS/stellarium --startup-script stellarium.ssc

with the following stellarium.ssc:

// -*- mode: javascript -*-
core.clear('natural');  // "atmosphere, landscape, no lines, labels or markers"
core.wait(5);
core.setObserverLocation('Ujjain, India');
core.setDate('1986-08-15T05:30:00', 'utc');
core.wait(5);
for (var i = 0; i < 2 * 24 * 7; i += 1) {
    core.setDate('+30 minutes');
    core.wait(0.5);
    core.screenshot('uj');
    core.wait(0.5);
}
core.wait(10);
core.quitStellarium();

It took a while (some 10–15 minutes) and created those 336 images in ~/Pictures/Stellarium/uj*, occupying a total size of about 550 MB. This seems a start, but Imagemagick etc. seem to choke on creating a GIF from such large data.

Giving up for now; would like to come back in future and figure out something better, that results in a smaller GIF.

Advertisements

Written by S

Mon, 2015-09-14 at 20:10:10

The state of static analysis on Python

leave a comment »

Here is a program that fails with UnboundLocalError three out of four times:

"""Module to demonstrate an error."""

import random

def zero_or_one():
    """Returns either 0 or 1 with equal probability."""
    return random.choice([0, 1])

def main():
    """Function to demonstrate an error."""
    if zero_or_one():
        first = 42
    for _ in range(zero_or_one()):
        second = 42
    print first, second

if __name__ == '__main__':
    main()

Note that line 15 uses the variables first and second, which are defined only if zero_or_one() returned 1 both times.

(Condensed from a real bug where, because of additional indentation, a variable assignment happened as the last line inside a loop, instead of the first line after it.)

I know of three tools that are popular: pychecker, pyflakes, and pylint. None of them say a single thing about this program. It is questionable whether ever (and if so, how often) code like the above is what the programmer intended.

The first of these, pychecker, is not on pip, and requires you to download a file from Sourceforge and run “python setup.py install”. Anyway, this is the output from the three programs:

/tmp% pychecker test.py
Processing module test (test.py)...

Warnings...

None
/tmp% pyflakes test.py
/tmp% pylint test.py
No config file found, using default configuration


Report
======
12 statements analysed.

Statistics by type
------------------

+---------+-------+-----------+-----------+------------+---------+
|type     |number |old number |difference |%documented |%badname |
+=========+=======+===========+===========+============+=========+
|module   |1      |1          |=          |100.00      |0.00     |
+---------+-------+-----------+-----------+------------+---------+
|class    |0      |0          |=          |0           |0        |
+---------+-------+-----------+-----------+------------+---------+
|method   |0      |0          |=          |0           |0        |
+---------+-------+-----------+-----------+------------+---------+
|function |2      |2          |=          |100.00      |0.00     |
+---------+-------+-----------+-----------+------------+---------+



Raw metrics
-----------

+----------+-------+------+---------+-----------+
|type      |number |%     |previous |difference |
+==========+=======+======+=========+===========+
|code      |11     |73.33 |11       |=          |
+----------+-------+------+---------+-----------+
|docstring |3      |20.00 |3        |=          |
+----------+-------+------+---------+-----------+
|comment   |0      |0.00  |0        |=          |
+----------+-------+------+---------+-----------+
|empty     |1      |6.67  |1        |=          |
+----------+-------+------+---------+-----------+



Duplication
-----------

+-------------------------+------+---------+-----------+
|                         |now   |previous |difference |
+=========================+======+=========+===========+
|nb duplicated lines      |0     |0        |=          |
+-------------------------+------+---------+-----------+
|percent duplicated lines |0.000 |0.000    |=          |
+-------------------------+------+---------+-----------+



Messages by category
--------------------

+-----------+-------+---------+-----------+
|type       |number |previous |difference |
+===========+=======+=========+===========+
|convention |0      |0        |=          |
+-----------+-------+---------+-----------+
|refactor   |0      |0        |=          |
+-----------+-------+---------+-----------+
|warning    |0      |0        |=          |
+-----------+-------+---------+-----------+
|error      |0      |0        |=          |
+-----------+-------+---------+-----------+



Global evaluation
-----------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

/tmp% 

Written by S

Tue, 2015-06-23 at 14:19:34

Posted in programming

Tagged with

Some playing with Python

with 2 comments

A long time ago, Diophantus (sort of) discussed integer solutions to the equation

\displaystyle  x^2 + y^2 = z^2

(solutions to this equation are called Pythagorean triples).

Centuries later, in 1637, Fermat made a conjecture (now called Fermat’s Last Theorem, not because he uttered it in his dying breath, but because it was the last one to be proved — in ~1995) that

\displaystyle x^n + y^n = z^n

has no positive integer solutions for n \ge 3. In other words, his conjecture was that none of the following equations has a solution:

\displaystyle x^3 + y^3 = z^3

\displaystyle x^4 + y^4 = z^4

\displaystyle x^5 + y^5 = z^5

\displaystyle x^6 + y^6 = z^6

… and so on. An nth power cannot be partitioned into two nth powers.

About a century later, Euler proved the n = 3 case of Fermat’s conjecture, but generalized it in a different direction: he conjectured in 1769 that an nth power cannot be partitioned into fewer than n nth powers, namely

\displaystyle z^n = \sum_{i = 1}^k x_i^n

has no solutions with k < n. So his conjecture was that (among others) none of the following equations has a solution:

\displaystyle z^3 = a^3 + b^3

\displaystyle z^4 = a^4 + b^4 + c^4

\displaystyle z^5 = a^5 + b^5 + c^5 + d^5

\displaystyle z^6 = a^6 + b^6 + c^6 + d^6 + e^6

… and so on.

This conjecture stood for about two centuries, until abruptly it was found to be false, by Lander and Parkin who in 1966 simply did a direct search on the fastest (super)computer at the time, and found this counterexample:

\displaystyle 27^5 + 84^5 + 110^5 + 133^5 = 144^5

(It is still one of only three examples known, according to Wikipedia.)

Now, how might you find this solution on a computer today?

In his wonderful (as always) post at bit-player, Brian Hayes showed the following code:

import itertools as it

def four_fifths(n):
    '''Return smallest positive integers ((a,b,c,d),e) such that
       a^5 + b^5 + c^5 + d^5 = e^5; if no such tuple exists
       with e < n, return the string 'Failed'.'''
    fifths = [x**5 for x in range(n)]
    combos = it.combinations_with_replacement(range(1,n), 4)
    while True:
        try:
            cc = combos.next()
            cc_sum = sum([fifths[i] for i in cc])
            if cc_sum in fifths:
                return(cc, fifths.index(cc_sum))
        except StopIteration:
            return('Failed')

to which, if you add (say) print four_fifths(150) and run it, it returns the correct answer fairly quickly: in about 47 seconds on my laptop.

The if cc_sum in fifths: line inside the loop is an O(n) cost each time it’s run, so with a simple improvement to the code (using a set instead) and rewriting it a bit, we can write the following full program:

import itertools

def find_counterexample(n):
  fifth_powers = [x**5 for x in range(n)]
  fifth_powers_set = set(fifth_powers)
  for xs in itertools.combinations_with_replacement(range(1, n), 4):
    xs_sum = sum([fifth_powers[i] for i in xs])
    if xs_sum in fifth_powers_set:
      return (xs, fifth_powers.index(xs_sum))
  return 'Failed'

print find_counterexample(150)

which finishes in about 8.5 seconds.

Great!

But there’s something unsatisfying about this solution, which is that it assumes there’s a solution with all four numbers on the LHS less than 150. After all, changing the function invocation to find_counterexample(145) makes it run a second faster even, but how could we know to do without already knowing the solution? Besides, we don’t have a fixed 8- or 10-second budget; what we’d really like is a program that keeps searching till it finds a solution or we abort it (or it runs out of memory or something), with no other fixed termination condition.

The above program used the given “n” as an upper bound to generate the combinations of 4 numbers; is there a way to generate all combinations when we don’t know an upper bound on them?

Yes! One of the things I learned from Knuth volume 4 is that if you simply write down each combination in descending order and order them lexicographically, the combinations you get for each upper bound are a prefix of the list of the next bigger one, i.e., for any upper bound, all the combinations form a prefix of the same infinite list, which starts as follows (line breaks for clarity):

1111,
2111, 2211, 2221, 2222,
3111, 3211, 3221, 3222, 3311, 3321, 3322, 3331, 3332, 3333,
4111, ...
     ... 9541, 9542, 9543, 9544, 9551, ... 9555, 9611, ...

There doesn’t seem to be a library function in Python to generate these though, so we can write our own. If we stare at the above list, we can figure out how to generate the next combination from a given one:

  1. Walk backwards from the end, till you reach the beginning or find an element that’s less than the previous one.
  2. Increase that element, set all the following elements to 1s, and continue.

We could write, say, the following code for it:

def all_combinations(r):
  xs = [1] * r
  while True:
    yield xs
    for i in range(r - 1, 0, -1):
      if xs[i] < xs[i - 1]:
        break
    else:
      i = 0
    xs[i] += 1
    xs[i + 1:] = [1] * (r - i - 1)

(The else block on a for loop is an interesting Python feature: it is executed if the loop wasn’t terminated with break.) We could even hard-code the r=4 case, as we’ll see later below.

For testing whether a given number is a fifth power, we can no longer simply lookup in a fixed precomputed set. We can do a binary search instead:

def is_fifth_power(n):
  assert n > 0
  lo = 0
  hi = n
  # Invariant: lo^5 < n <= hi^5
  while hi - lo > 1:
    mid = lo + (hi - lo) / 2
    if mid ** 5 < n:
      lo = mid
    else:
      hi = mid
  return hi ** 5 == n

but it turns out that this is slower than one based on looking up in a growing set (as below).

Putting everything together, we can write the following (very C-like) code:

largest_known_fifth_power = (0, 0)
known_fifth_powers = set()
def is_fifth_power(n):
  global largest_known_fifth_power
  while n > largest_known_fifth_power[0]:
    m = largest_known_fifth_power[1] + 1
    m5 = m ** 5
    largest_known_fifth_power = (m5, m)
    known_fifth_powers.add(m5)
  return n in known_fifth_powers

def fournums_with_replacement():
  (x0, x1, x2, x3) = (1, 1, 1, 1)
  while True:
    yield (x0, x1, x2, x3)
    if x3 < x2:
      x3 += 1
      continue
    x3 = 1
    if x2 < x1:
      x2 += 1
      continue
    x2 = 1
    if x1 < x0:
      x1 += 1
      continue
    x1 = 1
    x0 += 1
    continue

if __name__ == '__main__':
  tried = 0
  for get in fournums_with_replacement():
    tried += 1
    if (tried % 1000000 == 0):
      print tried, 'Trying:', get
    rhs = get[0]**5 + get[1]**5 + get[2]**5 + get[3]**5
    if is_fifth_power(rhs):
      print 'Found:', get, rhs
      break

which is both longer and slower (takes about 20 seconds) than the original program, but at least we have the satisfaction that it doesn’t depend on any externally known upper bound.

I originally started writing this post because I wanted to describe some experiments I did with profiling, but it’s late and I’m sleepy so I’ll just mention it.

python -m cProfile euler_conjecture.py

will print relevant output in the terminal:

         26916504 function calls in 26.991 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   18.555   18.555   26.991   26.991 euler_conjecture.py:1()
 13458164    4.145    0.000    4.145    0.000 euler_conjecture.py:12(fournums_with_replacement)
 13458163    4.292    0.000    4.292    0.000 euler_conjecture.py:3(is_fifth_power)
      175    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Another way to view the same thing is to write the profile output to a file and read it with cprofilev:

python -m cProfile -o euler_profile.out euler_conjecture.py
cprofilev euler_profile.out

and visit http://localhost:4000 to view it.

Of course, simply translating this code to C++ makes it run much faster:

#include <array>
#include <iostream>
#include <map>
#include <utility>

typedef long long Int;
constexpr Int fifth_power(Int x) { return x * x * x * x * x; }

std::map<Int, int> known_fifth_powers = {{0, 0}};
bool is_fifth_power(Int n) {
  while (n > known_fifth_powers.rbegin()->first) {
    int m = known_fifth_powers.rbegin()->second  + 1;
    known_fifth_powers[fifth_power(m)] = m;
  }
  return known_fifth_powers.count(n);
}

std::array<Int, 4> four_nums() {
  static std::array<Int, 4> x = {1, 1, 1, 0};
  int i = 3;
  while (i > 0 && x[i] == x[i - 1]) --i;
  x[i] += 1;
  while (++i < 4) x[i] = 1;
  return x;
}

std::ostream& operator<<(std::ostream& os, std::array<Int, 4> x) {
  os << "(" << x[0] << ", " << x[1] << ", " << x[2] << ", " << x[3] << ")";
  return os;
}

int main() {
  while (true) {
    std::array<Int, 4> get = four_nums();
    Int rhs = fifth_power(get[0]) + fifth_power(get[1]) + fifth_power(get[2]) + fifth_power(get[3]);
    if (is_fifth_power(rhs)) {
      std::cout << "Found: " << get << " " << known_fifth_powers[rhs] << std::endl;
      break;
    }
  }
}

and


clang++ -std=c++11 euler_conjecture.cc && time ./a.out

runs in 2.43s, or 0.36s if compiled with -O2.

But I don’t have a satisfactory answer to how to make our Python program which takes 20 seconds as fast as the 8.5-second known-upper-bound version.


Edit [2015-05-08]: I wrote some benchmarking code to compare all the different “combination” functions.

import itertools

# Copied from the Python documentation
def itertools_equivalent(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)

# Above function, specialized to first argument being range(1, n)
def itertools_equivalent_specialized(n, r):
  indices = [1] * r
  yield indices
  while True:
    for i in reversed(range(r)):
      if indices[i] != n - 1:
        break
    else:
      return
    indices[i:] = [indices[i] + 1] * (r - i)
    yield indices

# Function to generate all combinations of 4 elements
def all_combinations_pythonic(r):
  xs = [1] * r
  while True:
    yield xs
    for i in range(r - 1, 0, -1):
      if xs[i] < xs[i - 1]:
        break
    else:
      i = 0
    xs[i] += 1
    xs[i + 1:] = [1] * (r - i - 1)

# Above function, written in a more explicit C-like way
def all_combinations_clike(r):
  xs = [1] * r
  while True:
    yield xs
    i = r - 1
    while i > 0 and xs[i] == xs[i - 1]:
      i -= 1
    xs[i] += 1
    while i < r - 1:
      i += 1
      xs[i] = 1

# Above two functions, specialized to r = 4, using tuple over list.
def fournums():
  (x0, x1, x2, x3) = (1, 1, 1, 1)
  while True:
    yield (x0, x1, x2, x3)
    if x3 < x2:
      x3 += 1
      continue
    x3 = 1
    if x2 < x1:
      x2 += 1
      continue
    x2 = 1
    if x1 < x0:
      x1 += 1
      continue
    x1 = 1
    x0 += 1
    continue

# Benchmarks for all functions defined above (and the library function)
def benchmark_itertools(n):
  for xs in itertools.combinations_with_replacement(range(1, n), 4):
    if xs[0] >= n:
      break
def benchmark_itertools_try(n):
  combinations = itertools.combinations_with_replacement(range(1, n), 4)
  while True:
    try:
      xs = combinations.next()
      if xs[0] >= n:
        break
    except StopIteration:
      return
def benchmark_itertools_equivalent(n):
  for xs in itertools_equivalent(range(1, n), 4):
    if xs[0] >= n:
      break
def benchmark_itertools_equivalent_specialized(n):
  for xs in itertools_equivalent_specialized(n, 4):
    if xs[0] >= n:
      break
def benchmark_all_combinations_pythonic(n):
  for xs in all_combinations_pythonic(4):
    if xs[0] >= n:
      break
def benchmark_all_combinations_clike(n):
  for xs in all_combinations_clike(4):
    if xs[0] >= n:
      break
def benchmark_fournums(n):
  for xs in fournums():
    if xs[0] >= n:
      break

if __name__ == '__main__':
  benchmark_itertools(150)
  benchmark_itertools_try(150)
  benchmark_itertools_equivalent(150)
  benchmark_itertools_equivalent_specialized(150)
  benchmark_all_combinations_pythonic(150)
  benchmark_all_combinations_clike(150)
  benchmark_fournums(150)

As you can see, I chose inside the benchmarking function the same statement that would cause all_combinations to terminate, and have no effect for the other combination functions.
When run with

python -m cProfile benchmark_combinations.py

the results include:

    2.817 benchmark_combinations.py:80(benchmark_itertools)
    8.583 benchmark_combinations.py:84(benchmark_itertools_try)
  126.980 benchmark_combinations.py:93(benchmark_itertools_equivalent)
   46.635 benchmark_combinations.py:97(benchmark_itertools_equivalent_specialized)
   44.032 benchmark_combinations.py:101(benchmark_all_combinations_pythonic)
   18.049 benchmark_combinations.py:105(benchmark_all_combinations_clike)
   10.923 benchmark_combinations.py:109(benchmark_fournums)

Lessons:

  • Calling itertools.combinations_with_replacement is by far the fastest, taking about 2.7 seconds. It turns out that it’s written in C, so this would be hard to beat. (Still, writing it in a try block is seriously bad.)
  • The “equivalent” Python code from the itertools documentation (benchmark_itertools_combinations_with_replacment) is about 50x slower.
  • Gets slightly better when specialized to numbers.
  • Simply generating all combinations without an upper bound is actually faster.
  • It can be made even faster by writing it in a more C-like way.
  • The tuples version with the loop unrolled manually is rather fast when seen in this light, less than 4x slower than the library version.

Written by S

Sun, 2015-02-08 at 00:03:38

C++ vs Java terminology glossary

leave a comment »

E.g. there is no such thing as a “method” in C++. As far as the C++ standard is concerned, functions inside classes are always called “member functions”.

Here is a handout from this course (taught by, AFAICT, Jeffrey S. Leon) that helps me understand the Java people.

comparable-terms

Written by S

Sat, 2015-01-31 at 11:53:06

Posted in programming

Tagged with