Gmail

From AMSN

This is a draft of the Gmail protocol.

Authentication

First, we need to authenticate. This is done over an SSL (HTTPS) connection.

Send a GET request to the URL https://www.google.com/accounts/ServiceClientLogin?service=mail

We should receive a 401 Unauthorized response with a WWW-Authenticate header, which tells us which authentication method is supported as well as the realm:

 GET /accounts/ServiceClientLogin?service=mail HTTP/1.1
 User-Agent: Mozilla/5.0 (compatible; GNotify 1.0.25.0)
 Host: www.google.com
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: __utma=173272373.1877630316.1163121062.1163121062.1163987526.2; __utmz=173272373.1163121062.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
 
 HTTP/1.1 401 Unauthorized
 WWW-Authenticate: basic realm="Please log in to your Google Account"
 Cache-control: no-cache
 Pragma: no-cache
 Content-Type: text/plain
 Content-Length: 1
 Date: Thu, 23 Nov 2006 03:47:10 GMT
 Server: GFE/1.3

After that, we send the same request again with an added Authorization header whose value is the word Basic followed by the base64 encoding of $user:$password
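
As a rough illustration of that header, here is a minimal sketch in C. The function names (base64_encode, build_basic_auth) and the buffer sizes are our own choices for the example, not something taken from the notifier.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Base64-encode len bytes of src into dst (NUL-terminated).
 * dst must hold at least 4 * ((len + 2) / 3) + 1 bytes. */
static void base64_encode(const unsigned char *src, size_t len, char *dst)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;

    while (i < len) {
        size_t left = len - i;
        unsigned long v = (unsigned long)src[i] << 16;

        if (left > 1) v |= (unsigned long)src[i + 1] << 8;
        if (left > 2) v |= (unsigned long)src[i + 2];

        dst[o++] = tbl[(v >> 18) & 0x3F];
        dst[o++] = tbl[(v >> 12) & 0x3F];
        dst[o++] = left > 1 ? tbl[(v >> 6) & 0x3F] : '=';  /* pad short groups */
        dst[o++] = left > 2 ? tbl[v & 0x3F] : '=';
        i += left >= 3 ? 3 : left;
    }
    dst[o] = '\0';
}

/* Write "Authorization: Basic <base64(user:password)>" into out.
 * Returns 0 on success, -1 if the credentials or out buffer are too large. */
static int build_basic_auth(const char *user, const char *password,
                            char *out, size_t out_size)
{
    char plain[256];
    char encoded[352];  /* enough for 255 input bytes plus the NUL */
    int n = snprintf(plain, sizeof plain, "%s:%s", user, password);

    if (n < 0 || (size_t)n >= sizeof plain)
        return -1;
    base64_encode((const unsigned char *)plain, (size_t)n, encoded);
    n = snprintf(out, out_size, "Authorization: Basic %s", encoded);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Called with user@gmail.com and password, this produces the same Authorization value as the request shown in this section.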

We should get a 200 OK response with a list of Set-Cookie and X-Set-Google-Cookie headers, which we need to parse. The Set-Cookie headers will be of the form:

 Set-Cookie: <key>=<value>
 Set-Cookie: <key>=<value>;<key>=<value>;<key>=<value>

and so on. From all those Set-Cookie headers we must extract the SID key and its value. From the X-Set-Google-Cookie header we must also retrieve the 'GV' key and its value. The other information could be important, but for now we don't really care about it.

We can now close the SSL connection. The cookie we'll use later on for authentication is built from the SID and GV keys. Example:

 GET /accounts/ServiceClientLogin?service=mail HTTP/1.1
 User-Agent: Mozilla/5.0 (compatible; GNotify 1.0.25.0)
 Host: www.google.com
 Connection: Keep-Alive
 Cache-Control: no-cache
 Authorization: Basic dXNlckBnbWFpbC5jb206cGFzc3dvcmQ=
 Cookie: __utma=173272373.1877630316.1163121062.1163121062.1163987526.2; __utmz=173272373.1163121062.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
 
 
 HTTP/1.1 200 OK
 Set-Cookie: SID=<removed for security reasons>;Domain=.google.com;Path=/
 Set-Cookie: LSID=EXPIRED;Domain=.google.com;Path=/;Expires=Mon, 01-Jan-1990 00:00:00 GMT
 Set-Cookie: LSID=EXPIRED;Path=/;Expires=Mon, 01-Jan-1990 00:00:00 GMT
 Set-Cookie: LSID=EXPIRED;Domain=www.google.com;Path=/accounts;Expires=Mon, 01-Jan-1990 00:00:00 GMT
 Set-Cookie: LSID=<removed for security reasons>;Path=/accounts
 X-Set-Google-Cookie: GV=<removed for security reasons>
 Cache-control: no-cache
 Pragma: no-cache
 Content-Type: text/plain
 Content-Length: 0
 Date: Thu, 23 Nov 2006 03:47:10 GMT
 Server: GFE/1.3

So the Cookie header to use for authentication will be:

 Cookie: GV=<removed for security reasons>; SID=<removed for security reasons>
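
The parsing described above could be sketched like this in C. It is only an illustration: the helper names (header_value, cookie_value, build_auth_cookie) are hypothetical, the response headers are assumed to be already split into one string per line, and values longer than 511 bytes are treated as missing.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* If line starts with "<name>:" (case-insensitive), return a pointer to the
 * header value; otherwise return NULL. */
static const char *header_value(const char *line, const char *name)
{
    size_t i;

    for (i = 0; name[i] != '\0'; i++) {
        if (tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
            return NULL;
    }
    if (line[i] != ':')
        return NULL;
    i++;
    while (line[i] == ' ')
        i++;
    return line + i;
}

/* Copy the value of "<key>=<value>" from a ';'-separated list into out.
 * Returns 0 if the key was found, -1 otherwise. */
static int cookie_value(const char *pairs, const char *key,
                        char *out, size_t out_size)
{
    size_t klen = strlen(key);
    const char *p = pairs;

    while (*p != '\0') {
        size_t seg;

        while (*p == ' ' || *p == ';')
            p++;
        seg = strcspn(p, ";");
        /* Compare from the start of the pair, so "LSID=" never matches "SID". */
        if (seg > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = seg - klen - 1;
            if (vlen >= out_size)
                return -1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        p += seg;
    }
    return -1;
}

/* Build "Cookie: GV=<gv>; SID=<sid>" from the login response header lines.
 * Returns 0 on success, -1 if SID or GV is missing or out is too small. */
static int build_auth_cookie(const char *const *lines, size_t count,
                             char *out, size_t out_size)
{
    char sid[512] = "", gv[512] = "";
    size_t i;
    int n;

    for (i = 0; i < count; i++) {
        const char *v;

        if ((v = header_value(lines[i], "Set-Cookie")) != NULL) {
            if (sid[0] == '\0')
                cookie_value(v, "SID", sid, sizeof sid);
        } else if ((v = header_value(lines[i], "X-Set-Google-Cookie")) != NULL) {
            if (gv[0] == '\0')
                cookie_value(v, "GV", gv, sizeof gv);
        }
    }
    if (sid[0] == '\0' || gv[0] == '\0')
        return -1;
    n = snprintf(out, out_size, "Cookie: GV=%s; SID=%s", gv, sid);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```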


We can also get a "403 Forbidden" error instead of the 200 OK. This means that you are currently not allowed to authenticate, which is usually caused by attempting to authenticate many times with a wrong username/password combination. The error is described by Gmail as:

 We're sorry…
 … but your query looks similar to automated requests from a computer virus or spyware application. To protect our users, we can't process your request right now.
 We'll restore your access as quickly as possible, so try again soon. In the meantime, if you suspect that your computer or network has been infected, 
 you might want to run a virus checker or spyware remover to make sure that your systems are free of viruses and other spurious software.
 
 We apologize for the inconvenience, and hope we'll see you again on Google.

To get rid of this error once it happens, open your web browser, go to http://gmail.com and log in with your correct username and password. A visual confirmation will be requested, which you must fill in. Once you have filled in the visual confirmation correctly and are signed into your account, the authentication described above should work again without the infamous "403 Forbidden" error, until you lock it up again of course.
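
A client will probably want to tell the three status codes of the login exchange apart. Here is a small sketch of that in C; the enum and function names are made up for the example, and any other status code is simply reported as unexpected.

```c
#include <stdio.h>

enum login_result {
    LOGIN_OK,                /* 200: cookies are in the response headers   */
    LOGIN_NEED_CREDENTIALS,  /* 401: resend with the Authorization header  */
    LOGIN_LOCKED_OUT,        /* 403: too many failed attempts, see above   */
    LOGIN_UNEXPECTED         /* anything else, or an unparsable line       */
};

/* Classify an HTTP status line such as "HTTP/1.1 401 Unauthorized". */
static enum login_result classify_login_status(const char *status_line)
{
    int code = 0;

    /* %*d skips the major and minor version numbers without storing them. */
    if (sscanf(status_line, "HTTP/%*d.%*d %d", &code) != 1)
        return LOGIN_UNEXPECTED;

    switch (code) {
    case 200: return LOGIN_OK;
    case 401: return LOGIN_NEED_CREDENTIALS;
    case 403: return LOGIN_LOCKED_OUT;
    default:  return LOGIN_UNEXPECTED;
    }
}
```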

Retrieving mail information

You can now connect over plain HTTP (a standard socket on port 80) to mail.google.com and request http://mail.google.com/mail/?ui=pb

The ui=pb argument tells the Gmail server that the UI (user interface) to serve is of type 'pb' (used by the Gmail notifier). Other interfaces available are html and javascript, plus any other UI created for a specific application (gtalk may use another ui option).

Once you do your GET, you should receive a 200 OK answer containing a Set-Cookie header with the key 'S'. You need to add that key to your Cookie for the following requests. Also note that every new request generates a new 'S' cookie, so you need to update your Cookie every time.

Also note that subsequent requests have an additional parameter: the URL should be http://mail.google.com/mail/?ui=pb&tlt=1234566789abcdef where tlt is the hexadecimal value of an epoch timestamp with millisecond granularity. We have not confirmed ourselves what this timestamp is used for.

http://www.codeproject.com/csharp/gmailagent.asp says:

 The tlt= parameter is the thread list timestamp, which is treated like a checksum in determining the state of the client versus the mailbox state on the server. If the client timestamp is older than the one on the server, then a full DataPack is sent. Otherwise, Gmail sends an essentially empty DataPack.
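
To make the tlt format concrete, here is a minimal C sketch that builds the request path from a millisecond timestamp. The function name build_mail_path and the convention that 0 means "first request, no tlt yet" are assumptions of this example; where the client gets the timestamp from is not covered here.

```c
#include <stdio.h>

/* Build the request path for the 'pb' interface.
 * tlt_ms is an epoch timestamp in milliseconds; 0 means "no tlt parameter"
 * (our own convention for the very first request).
 * Returns 0 on success, -1 if out is too small. */
static int build_mail_path(unsigned long long tlt_ms, char *out, size_t out_size)
{
    int n;

    if (tlt_ms == 0)
        n = snprintf(out, out_size, "/mail/?ui=pb");
    else
        n = snprintf(out, out_size, "/mail/?ui=pb&tlt=%llx", tlt_ms);  /* lowercase hex */

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For instance, the value 0x10f1134779e (1164224788382 milliseconds, a moment on 22 November 2006 UTC) gives /mail/?ui=pb&tlt=10f1134779e.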

The data received will be of type 'pb' and will be binary data. Here's an example :

 GET /mail/?ui=pb HTTP/1.1
 User-Agent: Mozilla/5.0 (compatible; GNotify 1.0.25.0)
 Host: mail.google.com
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: GV=<removed for security reasons>; SID=<removed for security reasons>
 
 HTTP/1.1 200 OK
 Cache-control: no-cache, no-store
 Pragma: no-cache
 Content-Type: application/octet-stream; charset=utf-8
 Content-Length: 98
 Set-Cookie: S=<removed for security reasons>; Domain=.google.com; Path=/
 Server: GFE/1.3
 Date: Thu, 23 Nov 2006 03:47:10 GMT
 
 <binary data>


A second request would look like this:

 GET /mail/?ui=pb&tlt=10f1134779e HTTP/1.1
 User-Agent: Mozilla/5.0 (compatible; GNotify 1.0.25.0)
 Host: mail.google.com
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: GV=<removed for security reason>; SID=<removed for security reason>; S=<removed for security reason>
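
Putting the pieces of such a request together could look like the following C sketch. The function name build_poll_request is hypothetical, the header values are copied from the captures in this section, and the 'S' cookie is optional so that the same helper covers the first request.

```c
#include <stdio.h>

/* Format a polling request into out.
 * path is e.g. "/mail/?ui=pb" or "/mail/?ui=pb&tlt=...".
 * s may be NULL when no 'S' cookie has been received yet.
 * Returns 0 on success, -1 if out is too small. */
static int build_poll_request(const char *path, const char *gv,
                              const char *sid, const char *s,
                              char *out, size_t out_size)
{
    int n = snprintf(out, out_size,
                     "GET %s HTTP/1.1\r\n"
                     "User-Agent: Mozilla/5.0 (compatible; GNotify 1.0.25.0)\r\n"
                     "Host: mail.google.com\r\n"
                     "Connection: Keep-Alive\r\n"
                     "Cache-Control: no-cache\r\n"
                     "Cookie: GV=%s; SID=%s%s%s\r\n"
                     "\r\n",  /* empty line ends the request headers */
                     path, gv, sid,
                     s != NULL ? "; S=" : "",
                     s != NULL ? s : "");

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Since the server sends a new 'S' value with every answer, the caller would pass the most recent one each time.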

The Binary Data

The binary data format (let's call it GData :) ) has been mostly reverse engineered by now, but the spec is unfortunately still incomplete and some tags remain unidentified.

The prerequisites

First, you need to know how GData stores integers as bytes. Whenever you see a "MultiByte", it means an integer value represented by one or more bytes. To decode it, read one byte: its low 7 bits are the base value, and its high bit (0x80) tells you whether the next byte is also part of this integer. If the high bit is set, read the next byte, clear its high bit, shift the remaining 7 bits left by 7 and add them to the base value. If that byte also had its high bit set, read the next byte, clear its high bit, shift it by 14 and add it, and so on, until you reach a byte whose high bit is clear or you have decoded a full 8 bytes (64 bits) value. The algorithm should look like this:

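 /* ptr points at the first byte of the MultiByte */
 unsigned long long multibyte = 0;
 int shift = 0;
 unsigned char byte;
 do {
    byte = *ptr++;
    /* low 7 bits carry data; the least significant group comes first */
    multibyte |= (unsigned long long)(byte & 0x7F) << shift;
    shift += 7;
 } while ((byte & 0x80) && shift < 64);  /* high bit set: more bytes follow */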
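
For use on real network data, the decoding steps described above also need to check that they do not run past the end of the buffer. Here is a self-contained sketch of that as a C function; the name multibyte_decode and the "return 0 on error" convention are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one MultiByte from buf[0..len).
 * On success stores the integer in *value and returns the number of bytes
 * consumed. Returns 0 if the buffer ends before the last byte of the value
 * or if the value would need more than 64 bits. */
static size_t multibyte_decode(const unsigned char *buf, size_t len,
                               uint64_t *value)
{
    uint64_t result = 0;
    int shift = 0;
    size_t i;

    for (i = 0; i < len && shift < 64; i++) {
        result |= (uint64_t)(buf[i] & 0x7F) << shift;  /* low 7 bits are data */
        shift += 7;
        if ((buf[i] & 0x80) == 0) {                    /* high bit clear: done */
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

The returned byte count tells the caller where the next item starts, which is what a GData parser needs when walking a buffer.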

The GData format

Now, let's talk about the actual format of the GData. The format looks like this :

 [Key] [Data] [Key] [Data] [Key] [Data] ....

where [Key] is a MultiByte key allowing us to understand what the [Data] means. [Data] is specific to the key.

[Data] items are composed of either :

 [Value]

or

 [Size] [Value]

or

 [Size] [String]

or

 [Size] [Payload]

Where [Value] is a MultiByte representing the value specific to the [Key], [Size] is a MultiByte representing the size in bytes of the data that follows it, and [Payload] is a GData structure of its own, specific to the [Key]. A [String] simply represents a character string. It is not null-terminated and has a special encoding. See String encoding.

Here are the known keys and their meanings  :

Main GData

Key | Data format | Description | Comments
0x0A | [Size] [Payload] | New email | Contains information about an unread email (one item per unread email). See New email GData
0x88 | [Value] | Number of new emails | The number of unread emails in your inbox
0x90 | [Value] | Unknown ? | Unknown so far. It does not always appear and we see no pattern to when it does. Its value has only ever been seen as 1.
0x188 | [Value] | Unknown ? | Unknown so far. Its value has only ever been seen as 0.
0x190 | [Value] | Unknown ? | Unknown so far. Its value has only ever been seen as 0.

Don't forget that [Key] is a MultiByte (and so is [Size]), which means that any key of 0x80 or above takes more than one byte: the 0x88 key appears in the GData as the bytes 0x88 0x01, and 0x190 appears as 0x90 0x03.
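
To check those byte sequences by hand, it helps to have the reverse operation. Here is a small C sketch of a MultiByte encoder; the name multibyte_encode is ours, and the notifier itself only ever needs the decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value as a MultiByte into out, least significant 7 bits first.
 * out must have room for 10 bytes (enough for any 64-bit value).
 * Returns the number of bytes written. */
static size_t multibyte_encode(uint64_t value, unsigned char *out)
{
    size_t n = 0;

    do {
        unsigned char byte = (unsigned char)(value & 0x7F);

        value >>= 7;
        if (value != 0)
            byte |= 0x80;  /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);

    return n;
}
```

Encoding 0x88 writes 0x88 0x01 and encoding 0x190 writes 0x90 0x03, while a small key such as 0x0A stays a single byte.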

Note : Looking at the gnotifier.exe assembly code, it seems that three other [Key]s are possible: 0x10, 0x80 and a possible list of 0xC2 [Key]s. None of them has appeared so far during our tests.
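
Here is a sketch in C of how the Main GData could be walked. It relies on an observation rather than on anything confirmed: in every key listed on this page, the low 3 bits are 0 when the data is a plain [Value] and 2 when the data starts with a [Size]. This looks like the key layout of the Protocol Buffers wire format, but treat it as an assumption. The names read_multibyte, main_summary and parse_main_gdata are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one MultiByte; returns bytes consumed, or 0 on truncated input. */
static size_t read_multibyte(const unsigned char *buf, size_t len, uint64_t *value)
{
    uint64_t result = 0;
    int shift = 0;
    size_t i;

    for (i = 0; i < len && shift < 64; i++) {
        result |= (uint64_t)(buf[i] & 0x7F) << shift;
        shift += 7;
        if ((buf[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}

struct main_summary {
    uint64_t unread_count;   /* value of key 0x88 */
    size_t new_email_items;  /* number of 0x0A items seen */
};

/* Walk [Key] [Data] pairs of the Main GData.
 * Returns 0 on success, -1 on malformed or unsupported input. */
static int parse_main_gdata(const unsigned char *buf, size_t len,
                            struct main_summary *out)
{
    size_t pos = 0;

    out->unread_count = 0;
    out->new_email_items = 0;

    while (pos < len) {
        uint64_t key, val;
        size_t n = read_multibyte(buf + pos, len - pos, &key);

        if (n == 0)
            return -1;
        pos += n;

        /* Both data layouts start with a MultiByte: [Value] or [Size]. */
        n = read_multibyte(buf + pos, len - pos, &val);
        if (n == 0)
            return -1;
        pos += n;

        if ((key & 0x7) == 2) {          /* assumed: [Size] + that many bytes */
            if (val > len - pos)
                return -1;
            if (key == 0x0A)
                out->new_email_items++;
            pos += (size_t)val;          /* skip the payload or string */
        } else if ((key & 0x7) == 0) {   /* assumed: plain [Value] */
            if (key == 0x88)
                out->unread_count = val;
        } else {
            return -1;                   /* layout not seen in our tables */
        }
    }
    return 0;
}
```

Unknown keys such as 0x90, 0x188 and 0x190 are skipped without failing, so the walker keeps working even though their meaning is not known.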

New email GData

Key | Data format | Description | Comments
0x10 | [Value] | Message ID ? | This seems to be a MultiByte value representing the message id of the email. It is an 8-byte value (taking 9 bytes as a MultiByte) and seems to be composed of the timestamp of when Gmail received the email and a random value.
0x18 | [Value] | Timestamp | The timestamp of the email. From it we can get an Epoch timestamp in milliseconds (not seconds). It is the date/time of the Date email header.
0x82 | [Size] [String] | Tags | Contains the name of one tag of the email. See Tags
0x92 | [Size] [Payload] | Authors | Contains information about one of the users participating in the conversation. See Authors GData
0x98 | [Value] | Personal Level Indicator | This can be 0 (the email was not sent directly to this address, e.g. it came from a mailing list), 1 (sent directly to this address but not only to this address, e.g. in cc) or 2 (sent directly to this address only).
0xA2 | [Size] [String] | Subject | The Subject of the email
0xAA | [Size] [String] | Body preview | A preview of the body of the email.
0xB2 | [Size] [String] | Attachment | The filename of a file attached to the email.
0xB8 | [Value] | Thread List | The number of emails in this thread list (conversation).

Authors GData

Key | Data format | Description | Comments
0x0A | [Size] [Payload] | Author identity | Contains the email address and name of the author. See Author Identity GData
0x10 | [Value] | Has Unread | When this value is present (and set to 1), one of this author's emails in the current Thread List is unread. A conversation may contain many emails from different authors while only one of them is unread; this tells us which one really is.
0x18 | [Value] | Thread Initiator | When this value is present (and set to 1), the current author is the user who started this Thread List.

Author Identity GData

Key | Data format | Description | Comments
0x0A | [Size] [String] | Email address | The email address of the author
0x12 | [Size] [String] | Name | The name of the author
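
Since the Authors payload contains an Author identity payload, which in turn contains the strings, reading an author means descending two levels. Here is a C sketch of that. The helper names (read_multibyte, find_sized_field, author_identity) are hypothetical, and it assumes that in every key the low 3 bits are 2 for [Size]-prefixed data and 0 for a plain [Value], which holds for all keys listed on this page. The strings are copied raw, without the decoding described under String encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one MultiByte; returns bytes consumed, or 0 on truncated input. */
static size_t read_multibyte(const unsigned char *buf, size_t len, uint64_t *value)
{
    uint64_t result = 0;
    int shift = 0;
    size_t i;

    for (i = 0; i < len && shift < 64; i++) {
        result |= (uint64_t)(buf[i] & 0x7F) << shift;
        shift += 7;
        if ((buf[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}

/* Find the first [Size]-prefixed item with the wanted key.
 * On success points *data at its bytes, sets *data_len and returns 0. */
static int find_sized_field(const unsigned char *buf, size_t len, uint64_t wanted,
                            const unsigned char **data, size_t *data_len)
{
    size_t pos = 0;

    while (pos < len) {
        uint64_t key, val;
        size_t n = read_multibyte(buf + pos, len - pos, &key);

        if (n == 0)
            return -1;
        pos += n;
        n = read_multibyte(buf + pos, len - pos, &val);
        if (n == 0)
            return -1;
        pos += n;

        if ((key & 0x7) == 2) {            /* assumed: [Size] + bytes */
            if (val > len - pos)
                return -1;
            if (key == wanted) {
                *data = buf + pos;
                *data_len = (size_t)val;
                return 0;
            }
            pos += (size_t)val;
        } else if ((key & 0x7) != 0) {     /* only [Value] items are skipped */
            return -1;
        }
    }
    return -1;
}

/* Copy the email address and name out of one Authors payload.
 * Returns 0 on success, -1 if something is missing or does not fit. */
static int author_identity(const unsigned char *authors, size_t len,
                           char *email, size_t email_size,
                           char *name, size_t name_size)
{
    const unsigned char *ident, *str;
    size_t ident_len, str_len;

    if (find_sized_field(authors, len, 0x0A, &ident, &ident_len) != 0)
        return -1;
    if (find_sized_field(ident, ident_len, 0x0A, &str, &str_len) != 0 ||
        str_len >= email_size)
        return -1;
    memcpy(email, str, str_len);
    email[str_len] = '\0';                 /* strings are not NUL-terminated */
    if (find_sized_field(ident, ident_len, 0x12, &str, &str_len) != 0 ||
        str_len >= name_size)
        return -1;
    memcpy(name, str, str_len);
    name[str_len] = '\0';
    return 0;
}
```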


Tags

The tags determine what an email represents. A tag either starts with a '^' sign or it doesn't. If it does not start with '^', the tag is the name of a user-defined label. If it starts with '^', the tag is a reserved Gmail keyword. Here are a few of them with their supposed meaning:

 ^all : Appears in "All mail"
 ^i : Appears in "Inbox"
 ^u : Unread
 ^t : Starred items
 ^b,^bc : Appears in "Chats"
 ^vm : Appears in "Voice Mail"
 ^f : unknown
 ^seti : unknown
 ^tsX : unknown (where X is a number...)
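
The rule above is easy to turn into code. Here is a C sketch that classifies a tag and looks up the reserved tags whose meaning is listed here. The names tag_kind, classify_tag and reserved_tag_meaning are made up for the example, and the tag is passed with an explicit length because GData strings are not null-terminated.

```c
#include <stddef.h>
#include <string.h>

enum tag_kind { TAG_USER_LABEL, TAG_RESERVED };

/* A tag starting with '^' is a reserved Gmail keyword, anything else is a
 * user-defined label. */
static enum tag_kind classify_tag(const char *tag, size_t len)
{
    return (len > 0 && tag[0] == '^') ? TAG_RESERVED : TAG_USER_LABEL;
}

/* Return the supposed meaning of a reserved tag, or NULL when it is not one
 * of the tags with a known meaning (e.g. ^f, ^seti, ^tsX). */
static const char *reserved_tag_meaning(const char *tag, size_t len)
{
    static const struct {
        const char *tag;
        const char *meaning;
    } known[] = {
        { "^all", "All mail" },
        { "^i",   "Inbox" },
        { "^u",   "Unread" },
        { "^t",   "Starred items" },
        { "^b",   "Chats" },
        { "^bc",  "Chats" },
        { "^vm",  "Voice Mail" },
    };
    size_t k;

    for (k = 0; k < sizeof known / sizeof known[0]; k++) {
        if (strlen(known[k].tag) == len && memcmp(known[k].tag, tag, len) == 0)
            return known[k].meaning;
    }
    return NULL;
}
```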


String encoding

When a key has a String as its value, the string is not null-terminated and can contain special characters. It is encoded in a pretty standard way (XML encoding?) where the ampersand introduces a special character. You can find more info about those encodings here : Wikipedia:List of XML and HTML character entity references. The mapping should be done this way:

Encoded | Decoded
&amp; | &
&quot; | "
&apos; | '
&lt; | <
&gt; | >
&hellip; | ... (an ellipsis; it usually appears at the end of the Body preview when the preview doesn't contain the whole email, and should be replaced by the ellipsis symbol)
&#XYZ; | The character whose code is the integer XYZ (the ASCII code of the character to show)
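
Here is a C sketch of that mapping. The function name decode_string is ours. Two choices are assumptions of the example: &hellip; is replaced by the UTF-8 bytes of the ellipsis character (the response is declared as charset=utf-8), and numeric references are only decoded for codes 1 to 127. Anything that is not recognised is copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

static const struct {
    const char *name;
    const char *text;
} entities[] = {
    { "&amp;", "&" },  { "&quot;", "\"" }, { "&apos;", "'" },
    { "&lt;", "<" },   { "&gt;", ">" },
    { "&hellip;", "\xE2\x80\xA6" },  /* U+2026 encoded as UTF-8 */
};

/* Decode len bytes of src (not NUL-terminated) into dst and NUL-terminate it.
 * dst needs len + 1 bytes: every replacement is shorter than its entity.
 * Returns the decoded length. */
static size_t decode_string(const char *src, size_t len, char *dst)
{
    size_t i = 0, o = 0;

    while (i < len) {
        int matched = 0;
        size_t k;

        if (src[i] == '&') {
            for (k = 0; k < sizeof entities / sizeof entities[0]; k++) {
                size_t nlen = strlen(entities[k].name);

                if (len - i >= nlen && memcmp(src + i, entities[k].name, nlen) == 0) {
                    size_t tlen = strlen(entities[k].text);

                    memcpy(dst + o, entities[k].text, tlen);
                    o += tlen;
                    i += nlen;
                    matched = 1;
                    break;
                }
            }
            if (!matched && len - i >= 4 && src[i + 1] == '#') {
                size_t j = i + 2;
                unsigned code = 0;

                /* Read the decimal digits of "&#NNN;". */
                while (j < len && src[j] >= '0' && src[j] <= '9' && code < 1000) {
                    code = code * 10 + (unsigned)(src[j] - '0');
                    j++;
                }
                if (j > i + 2 && j < len && src[j] == ';' && code > 0 && code < 128) {
                    dst[o++] = (char)code;
                    i = j + 1;
                    matched = 1;
                }
            }
        }
        if (!matched)
            dst[o++] = src[i++];  /* ordinary byte or unknown entity */
    }
    dst[o] = '\0';
    return o;
}
```

Decoding in a single left-to-right pass matters: an input like &amp;lt; must come out as the text &lt; and not be decoded a second time.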