The Lumber Room

"Consign them to dust and damp by way of preserving them"

Matplotlib tutorial

The “standard” way to plot data used to be gnuplot, but it’s time to start using matplotlib which looks better and easier to use. For one thing, it’s a Python library, and you have the full power of a programming language available when you’re plotting, and a full-featured plotting library available when you’re programming, which is very convenient: I no longer find it necessary to use a horrible combination of gnuplot, programs, and shell scripts. For another, it’s free software, unlike gnuplot which is (unrelated to GNU and) distributed under a (slightly) restrictive license. Also, the plots look great. (Plus, I’m told that matplotlib will be familiar to MATLAB users, but as an assiduous non-user of MATLAB, I can’t comment on that.)

matplotlib is simple to start using, but its documentation makes this fact far from clear. The documentation is fine if you’re already an expert and want to draw dolphins swimming in glass bubbles, but it’s barely useful to a complete beginner. So what follows is a short tutorial. After this, you (that is, I) should be able to look at the gallery and get useful information, and perhaps even the documentation will make a bit of sense. (If matplotlib isn’t already present on your system, the easiest way to install it, if you have enough bandwidth and disk space to download a couple of gigabytes and you’re associated with an academic installation, is to get the Enthought Python distribution. Otherwise, sudo easy_install matplotlib should do it.)

The most common thing you want to do is plot a bunch of (x,y) values:

import matplotlib.pyplot as plot
xs = [2, 3, 5, 7, 11]
ys = [4, 9, 5, 9, 1]
plot.plot(xs, ys)
plot.savefig("squaremod10.png")


Or, for a less arbitrary example:

import matplotlib.pyplot as plot
import math

xs = [0.01*x for x in range(1000)] #That's 0 to 10 in steps of 0.01
ys = [math.sin(x) for x in xs]
plot.plot(xs, ys)
plot.savefig("sin.png")


That is all. And you have a nice-looking sine curve:

If you want to plot two curves, you do it the natural way:

import matplotlib.pyplot as plot
import math

xs = [0.01*x for x in range(1000)]
ys = [math.sin(x) for x in xs]
zs = [math.cos(x) for x in xs]

plot.plot(xs, ys)
plot.plot(xs, zs)

plot.savefig("sin-2.png")


It automatically chooses a different colour for the second curve.

Perhaps you don’t find it so nice-looking. Maybe you want the y-axis to have the same scale as the x-axis. Maybe you want to label the x and y axes. Maybe you want to title the plot. (“The curves sin x and cos x”?) Maybe you want the sine curve to be red for some reason, and the cosine curve to be dotted. And have a legend for which curve is which. All these can be done. In that order —

import matplotlib.pyplot as plot
import math

xs = [0.01*x for x in range(1000)]
ys = [math.sin(x) for x in xs]
zs = [math.cos(x) for x in xs]

plot.axis('equal')
plot.xlabel('$x$')
plot.ylabel('$y$')
plot.title(r'The curves $\sin x$ and $\cos x$')
plot.plot(xs, ys, label=r'$\sin$', color='red')
plot.plot(xs, zs, ':', label=r'$\cos$')
plot.legend(loc='upper right')

plot.savefig("sin-cos-two.png")


Observe that we specified the plot style as “:”, for dotted. We can also use ‘o’ for circles, ‘^’ for triangles, and so on. We can also prefix a colour, e.g. ‘bo’ for blue circles, ‘rx’ for red crosses and so on (the default is ‘b-‘, as we saw above). See the documentation for plot.

Also, Matplotlib has some TeX support. :-) It parses a subset of TeX syntax (be sure to use r before the string, so that you don’t have to escape backslashes), and you can also use

 plot.rc('text', usetex=True)

to make it actually use (La)TeX to generate text.

My only major annoyance with the default settings is that the y-axis label is vertical, so one has to tilt one’s head to read it. This is easy to fix:

plot.ylabel('$y$', rotation='horizontal')

You can also place text at a certain position, or turn a grid on:

plot.text(math.pi/2, 1, 'top') plot.grid(True)

You can also annotate text with arrows.

You can have multiple figures in the same image, but it’s all rather stateful (or complicated) and so far I haven’t needed it.

Instead of saving the figure, you can use
 plot.show()
and get an interactive window in which you can zoom and so on, but I find it more convenient to just save to image.

You can customize defaults in the matplotlibrc file — fonts, line widths, colours, resolution…

Other plots: Matplotlib can also do histographs, erorr bars, scatter plots, log plots, polarplots, bar charts and yes, pie charts, but these don’t seem to be well-documented.

Animation: Don’t know yet, but it seems that if you want to make a “movie”, the recommended way is to save a bunch of png images and use mencoder/ImageMagick on them.

That takes care of the most common things you (that is, I) might want to do. The details are in the documentation. (Also see: cookbook, screenshots, gallery.)

Edit: If you have a file which is just several lines of (x,y) values in two columns (e.g. one you may have been using as input to gnuplot), this function may help:

def x_and_y(filename):
xs = []; ys = []
x, y = [int(s) for s in l.split()]
xs.append(x)
ys.append(y)
return xs, ys


Edit [2010-05-10]: To plot values indexed by dates, use the following.

    fig = plot.figure(figsize=(80,10))
plot.plot_date(dates, values, '-', marker='.', label='something')
fig.autofmt_xdate()


The first line is because I don’t know how else to set the figure size. The last is so that the dates are rotated and made sparser, so as to not overlap.

More generally, Matplotlib seems to have a model of figures inside plots and axes inside them and so on; it would be good to understand this model.

Edit [2013-03-15]: Found another good-looking tutorial here: http://www.loria.fr/~rougier/teaching/matplotlib/.

[2013-04-08: Comments closed because this post was getting a lot of spam; sorry.]

Written by S

Sun, 2010-03-07 at 23:41:37

Flashing the screen on Mac OS X

Here’s one way. There’s a C program to adjust the screen’s brightness written by Nicholas Riley, also available from this blog post by Matt (Danger) West. Get it. The rest is obvious. For instance, here’s a Python script, which should have probably been written in Perl:

import os, re, time

ob = re.findall('brightness (\d.\d+)', s)[0]

w = 0.2
for i in range(10):
os.system('./brightness 0'); time.sleep(w)
os.system('./brightness 1'); time.sleep(w)
if(i==4): os.system("say beep")

os.system('./brightness ' + ob)


Tune parameters to avoid epileptic seizures.

Written by S

Mon, 2009-08-17 at 16:07:05

Posted in compknow

Tagged with , , , ,

Firebug “console is undefined”

If you’re using Firebug and don’t want to bother removing or commenting out all the console.log debug messages for users who aren’t, put this at the top:

if(typeof(console) === "undefined" || typeof(console.log) === "undefined")
var console = { log: function() { } };


This reminds me of the trick I use for C/C++ of putting debugging statements inside a macro:

D(cerr<<"The number is "<<n<<endl;);


where the macro D is defined as

#define D(A) A


when you want the debugging code to run, and

#define D(A)


when you don’t. :-)

(The final semicolon after the D() is for Emacs to indent it reasonably.)

Update: Of course, the above are just hacky hacks to save typing. The “right” way for conditional C code is usually to use #ifdef DEBUG and the like, and the right way around the Firebug problem is to use something more general like this code at metamalcolm.

Written by S

Wed, 2009-08-05 at 14:32:15

Posted in compknow

Extreme image compression: the Twitter challenge

If a picture is worth a 1000 words, how much of a picture can you fit in 140 characters?

Mario Klingemann (Quasimondo on Flickr) had a fascinating — call it crazy if you like — idea: can you encode an image such that it can be sent as a single Twitter message (“tweet”)? Twitter allows 140 characters, which seems like nothing. It’s pretty much guaranteed that you’ll be able to get nothing meaningful out of so few bits, right?

Well, he came up with this, using a bunch of clever tricks: using the full Unicode range for “characters” (Chinese, etc.) to squeeze a few more bits’ worth, representing colours as blends of just 8 colours (3 bits!), and arriving at a Voronoi triangulation through a genetic algorithm:

“Mona Tweeta” © Quasimondo:Flickr/CC-BY-NC (210 bytes?)

The one on the right is the real Mona Lisa, and the left one is what fits in 140 characters, specifically the message: “圑嘌婂搒孵怤實恄幖戰怴搝愩娻屗奊唀唭嚟帧啜徠山峔巰喜圂嗊埯廇嗕患嚵幇墥彫壛嶂壋悟声喿墰廚埽崙嫖嘵奰恛嬂啷婕媸姴嚥娐嗪嫤圣峈嬻尤囮愰啴屽嶍屽嶰寂喿嶐唥帑尸庠啞彐啯廂喪帄嗆怠嗙开唅恰唦慼啥憛幮悐喆悠喚忐嗳惐唔戠啹媊婼捐啸抃岖嗅怲幀嗈拀唹坭嵄彠喺悠單囏庰抂唋岰媮岬夣宐彋媀恦啼彐壔姩宔嬀”

This is pretty impressive, you’d think, for 140 characters. But it gets better. Brian Campbell started a contest on Stack Overflow, and some brilliant approaches turned up.

Boojum wrote a nanocrunch.cpp, based on fractal compression, which can do this (on the left is the original, for comparsion):

by Boojum (490 bytes)

Sam Hocevar wrote img2twit, which segments the image into square cells and tries to randomly assign points and colours to them until something is close. It can do this:

img2twit by Sam Hocevar (250 bytes?)
You can watch a movie of the image evolving; it’s pretty cool!

There were also attempts at converting the image to a vector format and encoding that instead. Needless to say, it works well for vector-like images:
(almost perfect!)

but it’s hard to even convert some images to vector form:
by autotrace (before compression!)

Finally, this is how Dennis Lee’s record-holding “optimizing general-purpose losy image codec” DLI does:

Comparison: JPEG at 536 bytes, img2twit at 534 bytes, DLI at 534 bytes

Or if you want to be fair and compare at 250 bytes, here’s img2twit and DLI:

img2twit at ~250 bytes, DLI at 243 bytes

Amazing!

For silly amusement, you can read a liberal translation of the original message, or the Reddit thread with ASCII porn.

Disclaimer: I did not participate in any of this, and I know nothing about image compression, so no doubt there are errors in the above. Please point them out. All images are copyright the respective owners, and the quote in the first line is by Brian Campbell on Stack Overflow.

Written by S

Sun, 2009-05-31 at 13:37:44

mplayer: changing speed without changing pitch (avoiding the chipmunk effect)

with one comment

In mplayer, you can change the playback speed with [ or ], but that probably changes the pitch as well (naturally). Can be amusing the first time, but not after you realise that it is actually possible to do something sophisticated to avoid this. (Wikipedia calls it Audio timescale-pitch modification) Many other media players (including VLC and even Windows Media Player(?)) can do this automatically; here’s how to do it in mplayer.

Start mplayer as mplayer -af scaletempo
That’s it. The catch is that you need to get an mplayer which has the scaletempo filter, and we know how much the mplayer project loves making releases. (It’s not in Ubuntu at the time of writing.)

So, either

Get such an mplayer
e.g the deb from Sourceforge (here),

or

Start mplayer as mplayer -speed 1.5 -af ladspa=tap_pitch:tap_pitch:0:-33:-90:0 foo.avi

Seems even the latter might require installing the ladspa plugins.

For more on all this, see:

1. blog comments (with patches) at Pitch-Correct Play Speed with MPlayer
2. Change MPlayer Playback Speed
3. Mplayer FAQ: “How do i change mplayer speed but keep pitch the same?”

Written by S

Fri, 2009-05-29 at 21:54:31

Posted in compknow

Tagged with

Drag-and-drop on web pages

Many web interfaces (like the WordPress dashboard in which I type this) allow you to drag and drop sections of the page to rearrange them. I think I first saw it when the personalised Google homepage (google.com/ig) came out, but I don’t know if it was the first. Here are some libraries that seem to help you

So it seems that all JavaScript frameworks have it. It also seems that JQuery has “won”.

But it seems YUI’s “Reordering a List” example is the closest match to what we actually want, but it seems to require too many lines of code.

Some of the rest seem even worse: unless you are actually using the library for a lot of things, it is simply too much code to just drop into a bare-bones web page you’ve been writing from scratch.

Written by S

Sun, 2008-09-14 at 22:11:37

Posted in compknow, unfinished

Tagged with , ,

Reverse-engineering Gmail: Initial remarks

For the last week and a bit, I have been trying to do a particular something with Gmail. (Specifically, get at the Message-ID headers of messages.) This has been mostly a failure, but that’s not so surprising, as I had little experience with “all this web stuff”: JavaScript, AJAX, DOM, browser incompatibilities, Firebug, Greasemonkey… round up the usual buzzwords. I have learnt a bit, though, and thought it might help others starting in a similar situation. (And there’s also the hope that someone might actually find this and help me!)

The story so far
Gmail was launched in April 2004. Since then, it has been through many changes, the latest around October 2007 when there came to our inboxes a “Newer version”, also sometimes called “Gmail 2”. (Note that officially Gmail is still in Beta; it hasn’t even released a 1.0!)
When Gmail was released the set of practices that go by the name of “AJAX” was still new and unfamiliar; it has been refined and better-understood since. (And it turns out to require neither asynchrony nor JavaScript nor XML.)

Johnvey Hwang reverse-engineered much of Gmail’s original version, and even made a “Gmail API” out of it. It no longer works of course, and the site is often down too, but it’s available on the Wayback Machine and the section documenting “the Gmail engine and protocol” is still worth a read, if only for its glimpse into the labyrinthine ways in which Ajax applications can work. He turned it (in May 2005) into a SourceForge project (“Gmail API”), last updated June 2005, and the associated Google Group (” Gmail Agent API”) is also largely defunct and indicates that the API, or whatever came of it, has not been working since the changes in October 2007, at any rate.

My goal
At this point, I might as well reveal what I want to do: I want to make it easy to get the “Message-ID:” header of messages in Gmail. (I like to read email in Gmail but not to send, so one way to reply to a specific message would be to get the Message-ID and ask my other mail client to reply to the message with that message-ID.) In the current interface, the (only) way of getting it is to click on the pulldown menu next to “Reply”, and click on “Show original”. This will open up a page that contains the raw text of the message with all its headers, and “Message-ID:” is always one of them. Since I use Firefox, I’ve been trying to make this easier with a Greasemonkey script.

Trap-patching the P() function
As Greasemonkey scripts for Gmail go, much useful information comes from Mihai Parparita, who wrote many Greasemonkey scripts for Gmail. Quoting from here:

As others have documented, Gmail receives data from the server in form of JavaScript snippets. Looking at the top of any conversation list’s source, we can see that the D() function that receives data in turns calls a function P() in the frame where all the JavaScript resides. Since all data must pass through this global P() function, we can use Greasemonkey to hook into it. This is similar to the trap patching way of extending Classic Mac OS. Specifically, the Greasemonkey script gets a hold of the current P() function and replaces it with a version that first records relevant data in an internal array, and then calls the original function (so that Gmail operations are not affected).

Clever. This same information is also documented at Greasespot wiki, with a few remarks on what different parameters to P() mean. Alas, it no longer works, because Gmail changed their functions around and renamed all of them, so there is no P() function anymore, and I can’t find what the new equivalent is, or if there is one.

Changes of October 2007
Gmail made certain changes in October 2007, including introducing a “newer version”, but also changing the “older version” that is still available: so it’s not really the older version. As far as Greasemonkey scripts go, another change was in January 2008, where they made all the Javascript load in a separate iframe. So “unsafeWindow” in a Greasemonkey script now refers to this iframe (which is the first frame, frame[0], in the window, and can also be got as top.js). So any scripts written in September 2007 or earlier are certainly useless now.

A lesson from all this is that Gmail will always be a moving target, and one must consider whether it’s worth chasing it.

Gmail’s Greasemonkey “API”:
Sometime in November 2007 or so, after the latest changes, Google even released a basic Greasemonkey API for Gmail, which lets you do a few things, like adding things to the pane at the left. It is too limited for what I need, but it works very well for what is meant for, and is also very well-documented, by Mark Pilgrim with his usual “Dive Into” excellence. It is comprehensive, accurate, well-illustrated and to-the-point, and great as documentation goes; it just happens that the API doesn’t provide what I need.

Some observations
Back to what I’m trying to do. Currently, the actions in the menu next to “Reply”, namely “Reply to all”, “Forward”, “Filter messages like this”, … “Show original” etc., do not actually appear in the DOM multiple times once attached to each message. Instead each of these actions corresponds to exactly one node (each) in the DOM, like these:

<div act="27" style="padding-left: 19px;" class="SAQJzb" id=":t6">Filter messages like this</div>
<div id=":tc" class="SAQJzb" act="32" style="padding-left: 19px;">Show original</div>


etc. The IDs change, and the class name also seems to randomly change between “SAQJzb” and “R10Zdd”; the only constant between the action and the node is the “act” attribute. “Show original” is always act=32. So when you click on the down-arrow button next to Reply, this menu comes up, and when you click on something in the menu, it somehow uses the information about where this menu came up and what you clicked, to find out which message to act on.

This means that simply simulating a click on the node (initMouseEvent, etc…) does not work; we also have to somehow give it the information on what message to act on. How to do this is one thing I’m trying to find out.

The other way involves the fact that Gmail also has its own “ID” for each message. When you are looking at a thread (“conversation”) that contains a single message, it is the same as what is in the URL, e.g. if the URL is something like https://mail.google.com/mail/#inbox/11c177beaf88ffe6, Gmail’s ID of the message is 11c177beaf88ffe6. But when you’re looking at a thread containing more than one message, the ID in the URL is just that of any of the messages in the thread (usually the first one, but you can use the ID of a different message in the URL and it will show the same thread). And when you click on the “Show original” link, the URL is something like https://mail.google.com/mail/?ui=2&ik=1234567890&view=om&th=11c177beaf88ffe6 where 1234567890 is a constant (probably depending on the user) and “om” probably stands for “original message”, and the “th” parameter is the ID of the message. So if I can somehow find a way of getting the ID of messages (like the trap-patching P() method, except that it should work for the current version), then it is possible to get the Message-ID headers of messages too.

Neither has worked out yet, but I’m trying…
(And I have more to say, but will post when things actually work.)

Written by S

Sun, 2008-08-31 at 18:45:57