The Lumber Room

"Consign them to dust and damp by way of preserving them"

Reverse-engineering Gmail: Initial remarks

with 11 comments

For the last week and a bit, I have been trying to do a particular something with Gmail. (Specifically, get at the Message-ID headers of messages.) This has been mostly a failure, but that’s not so surprising, as I had little experience with “all this web stuff”: JavaScript, AJAX, DOM, browser incompatibilities, Firebug, Greasemonkey… round up the usual buzzwords. I have learnt a bit, though, and thought it might help others starting in a similar situation. (And there’s also the hope that someone might actually find this and help me!)

The story so far
Gmail was launched in April 2004. Since then, it has been through many changes, the latest around October 2007, when there came to our inboxes a “Newer version”, also sometimes called “Gmail 2”. (Note that officially Gmail is still in Beta; it hasn’t even released a 1.0!)
When Gmail was released, the set of practices that go by the name of “AJAX” was still new and unfamiliar; it has been refined and better understood since. (And it turns out to require neither asynchrony nor JavaScript nor XML.)

Johnvey Hwang reverse-engineered much of Gmail’s original version, and even made a “Gmail API” out of it. It no longer works, of course, and the site is often down too, but it’s available on the Wayback Machine, and the section documenting “the Gmail engine and protocol” is still worth a read, if only for its glimpse into the labyrinthine ways in which Ajax applications can work. In May 2005 he turned it into a SourceForge project (“Gmail API”), last updated in June 2005. The associated Google Group (“Gmail Agent API”) is also largely defunct, and indicates that the API, or whatever came of it, has not worked since the changes of October 2007 at the latest.

My goal
At this point, I might as well reveal what I want to do: I want to make it easy to get the “Message-ID:” header of messages in Gmail. (I like to read email in Gmail but not to send it, so one way to reply to a specific message would be to get the Message-ID and ask my other mail client to reply to the message with that Message-ID.) In the current interface, the (only) way of getting it is to click on the pulldown menu next to “Reply”, and click on “Show original”. This opens a page that contains the raw text of the message with all its headers, and “Message-ID:” is always one of them. Since I use Firefox, I’ve been trying to make this easier with a Greasemonkey script.
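Once the raw text from “Show original” is in hand, picking out the header is plain string handling. The following is a minimal sketch, assuming the page text follows the usual RFC 5322 layout (headers, a blank line, then the body), that header names are case-insensitive (Gmail shows both “Message-ID” and “Message-Id”), and that a long header may be folded onto continuation lines starting with whitespace. The function name extractMessageId is made up for illustration.

```javascript
// Hypothetical helper: return the Message-ID header from raw message text,
// or null if there is none. Only the header block is searched, so a
// "Message-ID:" line quoted in the body is ignored.
function extractMessageId(rawMessage) {
  // The header block ends at the first empty line.
  var headerBlock = rawMessage.split(/\r?\n\r?\n/)[0];
  // Unfold continuation lines (newline followed by space/tab) into one line.
  var unfolded = headerBlock.replace(/\r?\n[ \t]+/g, " ");
  var lines = unfolded.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    // Header names are case-insensitive.
    var match = /^message-id:\s*(.*)$/i.exec(lines[i]);
    if (match) {
      return match[1].trim();
    }
  }
  return null;
}

var sample =
  "Received: from example.org\r\n" +
  "Message-ID:\r\n" +
  " <abc123@example.org>\r\n" +
  "Subject: Hello\r\n" +
  "\r\n" +
  "Message-ID: <not-a-header@example.org>\r\n";
console.log(extractMessageId(sample)); // prints <abc123@example.org>
```

The sample includes a folded header and a decoy line in the body, the two cases a naive indexOf("Message-ID:") would get wrong.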

Trap-patching the P() function
As Greasemonkey scripts for Gmail go, much useful information comes from Mihai Parparita, who wrote many Greasemonkey scripts for Gmail. Quoting from one of his posts:

As others have documented, Gmail receives data from the server in the form of JavaScript snippets. Looking at the top of any conversation list’s source, we can see that the D() function that receives data in turn calls a function P() in the frame where all the JavaScript resides. Since all data must pass through this global P() function, we can use Greasemonkey to hook into it. This is similar to the trap patching way of extending Classic Mac OS. Specifically, the Greasemonkey script gets a hold of the current P() function and replaces it with a version that first records relevant data in an internal array, and then calls the original function (so that Gmail operations are not affected).

Clever. The same information is also documented on the Greasespot wiki, with a few remarks on what the different parameters to P() mean. Alas, it no longer works, because Gmail changed their functions around and renamed all of them, so there is no P() function anymore, and I can’t find what the new equivalent is, or if there is one.
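The trap-patching idea in the quote is independent of which function gets hooked, so it can be sketched generically. This is a minimal sketch, not Parparita’s actual script: trapPatch is a hypothetical name, and “P” is only what the old Gmail called its function; on the current version the right name to pass is unknown.

```javascript
// Replace owner[name] with a wrapper that records each call's arguments in
// `log` and then calls the original, so the page behaves as before.
// Returns false if there is no such function (e.g. it was renamed).
function trapPatch(owner, name, log) {
  var original = owner[name];
  if (typeof original !== "function") {
    return false;
  }
  owner[name] = function () {
    // Keep a copy of the arguments for later inspection.
    log.push(Array.prototype.slice.call(arguments));
    // Pass through with the same `this` and arguments.
    return original.apply(this, arguments);
  };
  return true;
}

// Usage with a stand-in object; in a Greasemonkey script the owner would be
// unsafeWindow (the frame holding Gmail's JavaScript).
var fakeFrame = { P: function (a, b) { return a + b; } };
var captured = [];
trapPatch(fakeFrame, "P", captured);
console.log(fakeFrame.P(2, 3)); // prints 5; captured now holds [[2, 3]]
```

Calling the original through apply, rather than directly, matters because Gmail’s own code may rely on `this` inside the hooked function.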

Changes of October 2007
Gmail made certain changes in October 2007: besides introducing a “newer version”, they also changed the “older version” that is still available, so it’s not really the older version. As far as Greasemonkey scripts go, another change came in January 2008, when they made all the JavaScript load in a separate iframe. So “unsafeWindow” in a Greasemonkey script now refers to this iframe (which is the first frame in the window, frames[0], and can also be reached as top.js). This means any scripts written in September 2007 or earlier are certainly useless now.
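Reaching that iframe from whatever window a script happens to be running in can be written down as a small helper. This is a sketch based only on the two access paths described above (top.js and the top window’s first frame); getGmailJsFrame is a hypothetical name, and in a real script the argument would be unsafeWindow.

```javascript
// Return the frame holding Gmail's JavaScript, trying top.js first and then
// the top window's first frame. Returns null if neither is present.
function getGmailJsFrame(win) {
  var topWin = win.top || win;
  if (topWin.js) {
    return topWin.js;
  }
  if (topWin.frames && topWin.frames.length > 0) {
    return topWin.frames[0];
  }
  return null;
}
```

Checking for null before hooking anything gives a script a cheap way to notice that Gmail has moved things around again.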

A lesson from all this is that Gmail will always be a moving target, and one must consider whether it’s worth chasing it.

Gmail’s Greasemonkey “API”
Sometime in November 2007 or so, after the latest changes, Google even released a basic Greasemonkey API for Gmail, which lets you do a few things, like adding things to the pane at the left. It is too limited for what I need, but it works very well for what it is meant for, and is also very well documented, by Mark Pilgrim with his usual “Dive Into” excellence. The documentation is comprehensive, accurate, well-illustrated and to the point; it just happens that the API doesn’t provide what I need.

Some observations
Back to what I’m trying to do. Currently, the actions in the menu next to “Reply”, namely “Reply to all”, “Forward”, “Filter messages like this”, … “Show original” etc., do not appear in the DOM multiple times, once for each message. Instead, each of these actions corresponds to exactly one node in the DOM, like these:

<div act="27" style="padding-left: 19px;" class="SAQJzb" id=":t6">Filter messages like this</div>
<div id=":t8" class="R10Zdd" act="29" style="padding-left: 19px;">Add to Contacts list</div>
<div id=":tc" class="SAQJzb" act="32" style="padding-left: 19px;">Show original</div>

etc. The IDs change, and the class name also seems to change randomly between “SAQJzb” and “R10Zdd”; the only thing that stays constant for a given action is the “act” attribute: “Show original” is always act=32. So when you click on the down-arrow button next to Reply, this menu comes up, and when you click on something in the menu, Gmail somehow uses the information about where the menu came up and what you clicked to work out which message to act on.
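Since “act” is the only stable handle, a script can look menu items up by that attribute and ignore the changing IDs and class names. This is a minimal sketch; the act values are the ones in the snippets above and may well change, and GMAIL_ACTS and findActionNode are hypothetical names. Which document to pass (the one that actually holds the menu) is also an assumption to check in Firebug.

```javascript
// act values observed in the DOM snippets above; not a documented contract.
var GMAIL_ACTS = {
  FILTER_MESSAGES_LIKE_THIS: "27",
  ADD_TO_CONTACTS: "29",
  SHOW_ORIGINAL: "32"
};

// Find the menu node for an action by its act attribute, whatever its
// current id (":tc", ":t6", ...) and class ("SAQJzb"/"R10Zdd") happen to be.
function findActionNode(doc, act) {
  return doc.querySelector('div[act="' + act + '"]');
}
```

In a script this would be called as findActionNode(document, GMAIL_ACTS.SHOW_ORIGINAL), giving the single “Show original” node shared by every message.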

This means that simply simulating a click on the node (initMouseEvent, etc…) does not work; we also have to somehow tell it which message to act on. How to do this is one thing I’m trying to find out.
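For reference, the kind of simulated click meant here looks roughly like the following, using the DOM Level 2 createEvent/initMouseEvent calls that Firefox supports. Sending mousedown, mouseup and click is a common precaution because menu widgets often listen for mousedown or mouseup rather than click; which of these Gmail actually responds to is an assumption. As noted above, this alone does not tell Gmail which message to act on. simulateClick is a hypothetical name.

```javascript
// Dispatch mousedown, mouseup and click on `node`, as a real left click would.
function simulateClick(node) {
  var doc = node.ownerDocument;
  var types = ["mousedown", "mouseup", "click"];
  for (var i = 0; i < types.length; i++) {
    var evt = doc.createEvent("MouseEvents");
    evt.initMouseEvent(types[i], true, true, doc.defaultView,
                       1, 0, 0, 0, 0,              // detail, screenX/Y, clientX/Y
                       false, false, false, false, // ctrl, alt, shift, meta keys
                       0, null);                   // left button, no relatedTarget
    node.dispatchEvent(evt);
  }
}
```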

The other way relies on the fact that Gmail also has its own “ID” for each message. When you are looking at a thread (“conversation”) that contains a single message, it is the same as the ID in the URL: if the URL is something like https://mail.google.com/mail/#inbox/11c177beaf88ffe6, Gmail’s ID of the message is 11c177beaf88ffe6. But when you’re looking at a thread containing more than one message, the ID in the URL is just that of one of the messages in the thread (usually the first one, but you can put the ID of a different message in the URL and it will show the same thread). And when you click on the “Show original” link, the URL is something like https://mail.google.com/mail/?ui=2&ik=1234567890&view=om&th=11c177beaf88ffe6, where 1234567890 is a constant (probably depending on the user), “om” probably stands for “original message”, and the “th” parameter is the ID of the message. So if I can somehow find a way of getting the IDs of messages (like the trap-patching P() method, except that it should work for the current version), then it is possible to get the Message-ID headers of messages too.
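The two URL patterns above are enough to go from a conversation URL to the corresponding “Show original” page. This is a sketch built only on the observed URLs: it assumes the ID is the last “/”-separated part of the fragment and is 16 hex digits, as in the example, and that the per-user “ik” value is already known. For a multi-message thread it yields the original of whichever message’s ID is in the URL, not of every message. Both function names are hypothetical.

```javascript
// Extract Gmail's message ID from a URL like .../mail/#inbox/11c177beaf88ffe6.
// Returns null if the fragment does not end in a 16-hex-digit ID.
function messageIdFromUrl(url) {
  var hash = url.split("#")[1] || "";
  var parts = hash.split("/");
  var last = parts[parts.length - 1];
  return /^[0-9a-f]{16}$/.test(last) ? last : null;
}

// Build the "Show original" (view=om) URL for a message ID and "ik" value.
function showOriginalUrl(ik, threadId) {
  return "https://mail.google.com/mail/?ui=2&ik=" + encodeURIComponent(ik) +
         "&view=om&th=" + encodeURIComponent(threadId);
}

var msgId = messageIdFromUrl("https://mail.google.com/mail/#inbox/11c177beaf88ffe6");
console.log(showOriginalUrl("1234567890", msgId));
// prints https://mail.google.com/mail/?ui=2&ik=1234567890&view=om&th=11c177beaf88ffe6
```

Fetching that URL (with GM_xmlhttpRequest, say) and scanning the header block of the response would then give the Message-ID.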

Neither has worked out yet, but I’m trying…
(And I have more to say, but will post when things actually work.)


Written by S

Sun, 2008-08-31 at 18:45:57 +05:30

11 Responses


  1. Beware of the trap of spending too much time to optimise your time usage :).

    (you can’t save more than 168 hours a week no matter what you do, even if some personal productivity suites promise to save you 4000 hours a week).

    Vipul

    vipulnaik

    Sat, 2008-09-27 at 21:15:09 +05:30

  2. Good point.

    The right comparison, though, is not just whether the time lost without the script is more than the time spent creating it. In this case, it is the extra time lost clicking around to find headers, and the frustration of doing so, versus the satisfaction gained at having done something, the knowledge from it, the satisfaction from optimisation itself, the fun of poking at a closed system to try and learn its secrets and so on :-)
    Some of us just like to optimise, even when it may not be worthwhile.

    Shreevatsa

    Tue, 2008-10-07 at 04:14:52 +05:30

  3. [...] of Gmail (rats). Information is surprisingly hard to find, but a good source is a blog called The Lumber Room. Basically, some pre-2008 scripts no longer work, especially when it comes to getting message [...]

  4. Were you able to solve this problem? I’m currently trying to do something similar, but still with no success.

    The only way I can see is analysis/deobfuscation of Gmail’s JS code…

    Vyacheslav Egorov

    Sun, 2009-08-02 at 07:46:02 +05:30

  5. Hi there,

    I’ve been stuck with a similar problem myself, because I don’t like the summary titles of some particular newsletter e-mails and I want to change them to reflect the content.

    I made some progress, but I’m stuck now. I have the conversation IDs, I have the DOM conversation summary, I can change it (inside the iframe) – but I can’t get the actual conversation content.

    If anyone has any ideas, or wants to know how I obtained the above, please reply here – I subscribed.


    Tiby

    Tiby

    Fri, 2010-08-27 at 17:19:18 +05:30

  6. Me too. I think we had better log in, save the session, then go to our URL links, and finally parse the message.

    In Ruby: Mechanize plus any HTML parser.

    Even in PHP it’s impossible to count and then trace them one by one.

    febru

    Sun, 2011-04-03 at 01:23:26 +05:30

    • Hi,

      I found a solution. Actually, Gmail has this really handy functionality to view the original message. Using that static URL, you can get whatever you want if you have the conversation ID.


      Tiby

      Tiby

      Sun, 2011-04-03 at 05:45:14 +05:30

      • Hi, I had mentioned this in the post above:

        …click on the pulldown menu next to “Reply”, and click on “Show original”. This will open up a page that contains the raw text of the message with all its headers, and “Message-ID:” is always one of them.

        But have you found a way of viewing the original message (activating “show original”) automatically? Or do you have to click on the menu and select it manually? (How does that help?) It seems you have a solution to your problem, so it would be great if you give a bit more detail.

        Thanks,
        S

        S

        Sun, 2011-04-03 at 08:39:01 +05:30

        • Hi there,

          Sorry I was so short on details. I think I got the idea with the original message after reading this post, so thank you for that.

          Now, I already had the conv ID I wanted, so after that I “activated” the “show original” by making an AJAX call to that URL and retrieving its content.

          Getting the conv ID was actually pretty tricky. Here’s how I did it: I opened Firebug and watched the Net tab for a POST or GET request containing the required IDs.

          I found a request to a URL like this “https://mail.google.com/mail/?ui=2&ik=5f6ed2f664&rid=mail%3Ai.882.0.0&view=tl&start=0&num=120&pcd=1&mb=0&rt=h&cat=_sL&search=cat” – now that I look at your post, I see you found something similar. I discovered that one of these params (the rid) was a cookie or something like that, and so I had to redo the Firebug watching from time to time to update it.

          With a little string parsing you can easily get the conv IDs from there, and thanks to the ‘view original’ option, you can use that ID to get a conv, headers, etc.

          Here’s my code for these 2 functions, but they’re probably outdated (I stopped using this code for quite a while now, but with a few adjustments it should work).

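          // Globals this snippet expects the rest of the script to define:
          //   domConv - the DOM conversation summary nodes, in the same order as convIds
          //   markMsg - a mousedown handler for those nodes
          //   links   - the pattern for the link replacement below; that part was
          //             mangled by the comment form (see comment 7) and must be restored
          var convIds = [];
          // "ik" is account-specific and "rid" (below) is session-specific; both were copied from Firebug.
          var IK = "5f6ed2f664";

          // Fetch the "show original" page (view=om) of conversation nr and put
          // its Title/URL/Tags fields into that conversation's summary node.
          function getConvMsg(nr){
          	GM_xmlhttpRequest({
          	  method: "GET",
          	  url: "https://mail.google.com/mail/?ui=2&ik=" + IK + "&view=om&th=" + convIds[nr],
          	  onload: function(response) {
          		var msg = response.responseText;
          		var titlePos = msg.indexOf("Title: ");
          		var tagsPos = msg.indexOf("Tags: ");
          		var urlPos = msg.indexOf("URL: ");
          		var endPos = msg.indexOf("UNSUBSCRIBE");
          		// Skip messages that don't have the expected newsletter layout.
          		if (titlePos == -1 || tagsPos == -1 || urlPos == -1 || endPos == -1) return;
          		var title = msg.substring(titlePos, tagsPos).replace("Title: ", "");
          		var url = msg.substring(urlPos, endPos).replace(/\s\n/gm, '').replace("URL: ", "").replace(links, "" + title + "");

          		var tagsBudget = msg.substring(tagsPos, urlPos).replace("Tags: ", "").replace("Budget: ", "");
          		domConv[nr].innerHTML = " " + url + " | " + tagsBudget;
          		domConv[nr].addEventListener("mousedown", markMsg, true);
          	  },
          	  onerror: function(err){
          		alert(err);
          	  }
          	});
          }

          // Fetch the thread list (view=tl) for the label _sL and extract the
          // 16-character conversation IDs found between the "tb" and "te" markers.
          function getConvIds(){
          	GM_xmlhttpRequest({
          	  method: "GET",
          	  url: "https://mail.google.com/mail/?ui=2&ik=" + IK + "&rid=mail%3Ai.882.0.0&view=tl&start=0&num=120&pcd=1&mb=0&rt=h&cat=_sL&search=cat",
          	  onload: function(response) {
          		var reply = response.responseText;
          		var tb = reply.indexOf("\"tb\"");
          		if (tb == -1) return;
          		var te = reply.lastIndexOf("\"te\"");
          		if (te < tb) te = reply.length; // no end marker: scan to the end
          		var allIds = reply.substring(tb, te);
          		var idsFound = allIds.match(/\["\w{16}/g) || []; // match() returns null when nothing is found
          		convIds = [];
          		for (var i = 0; i < idsFound.length; i++){
          			convIds.push(idsFound[i].replace("[\"", ''));
          			getConvMsg(i);
          		}
          	  }
          	});
          }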
          

          Hope it helps.

          --
          Tiby

          [Edit: Changed <code> to <pre> to avoid some mischief done by the preprocessor. —S]

          Tiby

          Sun, 2011-04-03 at 13:15:00 +05:30

  7. Another thing: the

    &cat=_sL

    is from the fact I store these messages in a label named _sL.

    Also, it seems that the form processor added “a href=..” where links were in my code. Those are not supposed to be there (the html tags).


    Tiby

    Tiby

    Sun, 2011-04-03 at 13:20:48 +05:30

    • Thanks a lot. I will study this, try it out, and reply in a while.

      Regards,
      S

      S

      Sun, 2011-04-03 at 22:28:26 +05:30

