2008-05-12

Google App Engine tips&tricks

source

A while ago I was writing some sample applications (source) for Google App Engine. I noted the things that can be useful for other GAE programmers.

I used Google's webapp framework, so the code here assumes it.

Please take a look at the shell application, it can help you test simple code.

How to dynamically get application name and version?

This question was asked before. You can use os.getcwd() or os.environ['PATH_TRANSLATED'].
>>> os.getcwd()
'/base/data/home/apps/shell/1.21'
>>> os.getcwd().split('/')[-2]
'shell'
>>> os.getcwd().split('/')[-1]
'1.21'

>>> os.environ['PATH_TRANSLATED']
'/base/data/home/apps/shell/1.21/shell.py'
>>> os.environ['PATH_TRANSLATED'].split('/')[-3]
'shell'
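The splitting above can be wrapped in a small helper. This is a minimal sketch, assuming the working directory always ends in `<app name>/<version>` as in the shell output above; the function name `parse_app_path` is made up for illustration.

```python
def parse_app_path(path):
    """Split a path ending in '<app name>/<version>' into (name, version).

    Assumes the layout seen above: /base/data/home/apps/<name>/<version>
    """
    parts = path.rstrip('/').split('/')
    if len(parts) < 2:
        raise ValueError('unexpected path layout: %r' % path)
    return parts[-2], parts[-1]


# Example with the path printed by os.getcwd() above.
name, version = parse_app_path('/base/data/home/apps/shell/1.21')
```

In a handler you would call `parse_app_path(os.getcwd())` instead of passing a literal path.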

How to identify current host?

There's a very interesting file that should be unique for every server:
>>> open('/base/python_dist/search.config').read()
'datapath .\nsorttempdir .\ndisk /export/hdc3/borgletdata/dirs/prod-appengine.\
mpm_python_dist_v12.apphosting.77627982/bigfiledata/466024'

>>> open('/base/python_dist/search.config').read()
'datapath .\nsorttempdir .\ndisk /export/hdc3/borgletdata/dirs/prod-appengine.\
mpm_python_dist_v12.apphosting.77627739/bigfiledata/465336'
You can identify the machine on which the process is deployed by hashing this file. Something like this:
def get_server_id():
    try:
        fd = open('/base/python_dist/search.config')
        data = fd.read()
        fd.close()
    except IOError:
        return 'unknown'

    return '%s' % data.__hash__()
Google doesn't tell you how many machines your application is deployed on (this probably depends on the traffic your site generates). But you can add this server_id to your site footer. Then you can issue multiple requests to see how many unique machines serve your app.
$ for i in `seq 20`; do
curl -s http://cometchat.appspot.com|\
grep server_id; \
done |sort -n|uniq -c

20 server_id: '7341146770217830363'
It seems that my app is deployed on only one server.
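One caveat with the built-in string hash is that its value can differ between Python builds (for example 32-bit versus 64-bit), so two servers could in principle report different IDs for the same file. A possible variation, sketched here under the assumption that `hashlib` is usable in your environment, is to derive the ID from a digest instead. The function name `server_id_from_config` is hypothetical.

```python
import hashlib


def server_id_from_config(data):
    """Return a short, build-independent ID for the given config text."""
    if not isinstance(data, bytes):
        data = data.encode('utf-8')
    # The first 12 hex digits are plenty to tell a handful of servers apart.
    return hashlib.md5(data).hexdigest()[:12]


# Two config texts differing only in the trailing numbers, as in the
# search.config dumps shown earlier.
id_a = server_id_from_config('disk /export/hdc3/.../bigfiledata/466024')
id_b = server_id_from_config('disk /export/hdc3/.../bigfiledata/465336')
```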

How to identify current process?

Likewise, how many processes are running your app? This time the trick uses a global variable:
the_process_global = "something"

def get_process_id():
    return '%s' % id(the_process_global)
Now I know that my application is deployed using two processes:
$ for i in `seq 20`; do
curl -s http://cometchat.appspot.com|\
grep _id;
done |sort -n|uniq -c

13 process_id: '12457625149327067176'
7 process_id: '3996238433791648184'
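The `id()` of a global is a memory address, so two processes could in theory report the same value. An alternative sketch is to generate a random token once, when the module is first imported. It assumes the `uuid` module is available in the runtime, which I have not verified on GAE; the names below are made up for illustration.

```python
import uuid

# Evaluated once per process, when the module is first imported.
_PROCESS_TOKEN = uuid.uuid4().hex


def get_process_token():
    """Return a random token that stays constant for the life of the process."""
    return _PROCESS_TOKEN
```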

Are we on production or development server?

I use this snippet:
if os.environ.get('SERVER_SOFTWARE','').startswith('Devel'):
    HOST='local'
elif os.environ.get('SERVER_SOFTWARE','').startswith('Goog'):
    HOST='google'
else:
    # logging.error('Unknown server. Production/development?')
    HOST='unknown'
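The same check is easier to test if it takes the environment as an argument. A minimal sketch, where the function name `detect_host` and the sample SERVER_SOFTWARE values are assumptions for illustration:

```python
import os


def detect_host(environ=None):
    """Classify the server from SERVER_SOFTWARE: 'local', 'google' or 'unknown'."""
    if environ is None:
        environ = os.environ
    software = environ.get('SERVER_SOFTWARE', '')
    if software.startswith('Devel'):
        return 'local'
    if software.startswith('Goog'):
        return 'google'
    return 'unknown'


# Illustrative values only; the real strings may carry version suffixes.
local = detect_host({'SERVER_SOFTWARE': 'Development/1.0'})
prod = detect_host({'SERVER_SOFTWARE': 'Google Apphosting/1.0'})
```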

Captcha on GAE?

Joscha Feth wrote a tutorial about using reCaptcha on GAE.

Cookies?

Google suggests that request and response objects follow the WebOb interfaces. This works for getting cookies from request:
username = self.request.cookies.get('username', '')
Unfortunately you can't use the WebOb method response.set_cookie. But you can set cookies by hand:
self.response.headers.add_header(
    'Set-Cookie',
    'username=%s; expires=Thu, 31-Dec-2020 23:59:59 GMT' \
    % username.encode())
You can find some other hints on google-app-engine discussions. I don't know if cookies work from django-helper.
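Hand-built headers break when the value contains spaces or other special characters. One option is to let the standard library's cookie module do the quoting. This is a sketch, not something I have tested on GAE: the helper name `build_set_cookie` is made up, and whether request.cookies un-quotes the value on the way back in is an assumption you should check.

```python
try:
    from http.cookies import SimpleCookie  # Python 3
except ImportError:
    from Cookie import SimpleCookie  # Python 2


def build_set_cookie(name, value, expires=None, path='/'):
    """Return a Set-Cookie header value with the cookie value safely quoted."""
    cookie = SimpleCookie()
    cookie[name] = value
    if expires:
        cookie[name]['expires'] = expires
    cookie[name]['path'] = path
    return cookie[name].OutputString()


header = build_set_cookie('username', 'John Smith',
                          expires='Thu, 31-Dec-2020 23:59:59 GMT')
# In a webapp handler (assumed usage):
# self.response.headers.add_header('Set-Cookie', header)
```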

Debugging datastore access

I created a very simple datastore debugger. It appends some debugging info to the footer of the generated page. To use it, just change your classes to inherit from debug.DebugMiddleware instead of webapp.RequestHandler.

For example:
class List(debug.DebugMiddleware):
    def get(self):
        ... blabla ...
A sample footer can look like this:
**** Request took:   830ms/170ms (real time/cpu time)
**** GQLs, datastore accessed 1 times.
98ms GQL app: ":self"
kind: "Image"
Order {
property: "modified"
direction: 2
}
args: (50,) {}
This GQL log was caused by the code:
ims = Image.all().order("-modified").fetch(50)
Another example of the output footer:
**** Request took:   150ms/130ms (real time/cpu time)
**** GQLs, datastore accessed xx times.
219ms PUT ({'full':...
178ms PUT ({'full':...
6ms GET ([datastore_types.Key.from_path('Image', 350L, _app=u'srv')],) {}
2ms GET ([datastore_types.Key.from_path('Image', 349L, _app=u'srv')],) {}
2ms GET ([datastore_types.Key.from_path('Image', 348L, _app=u'srv')],) {}
This datastore debugger can be easily modified to be used as Django middleware.

Dynamic images uploading

This is the code I use. The template:
<form action="." method="post" enctype="multipart/form-data">
<label>File: </label><input name="file" type="file"><br />
<input type="submit">
</form>
Server side:
class Image(db.Model):
    name = db.StringProperty()
    content = db.BlobProperty()

class UploadImage(webapp.RequestHandler):
    def post(self):
        if 'file' not in self.request.POST:
            self.error(400)
            self.response.out.write("file not specified!")
            return

        if (self.request.POST.get('file', None) is None or
                not self.request.POST.get('file', None).filename):
            self.error(400)
            self.response.out.write("file not specified!")
            return

        file_data = self.request.POST.get('file').file.read()
        file_name = self.request.POST.get('file').filename

        im = Image()
        im.name = file_name
        im.content = file_data
        im.save()
        self.response.out.write("image %r saved." % im.name)
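The two validation branches in the handler can be folded into one helper that returns the file name and data, or None when no file was uploaded. This is a sketch using duck typing rather than an isinstance check; the helper name `extract_upload` and the `FakeField` stand-in are hypothetical and exist only to make the example runnable outside GAE.

```python
import io


def extract_upload(post, field_name='file'):
    """Return (filename, data) for an uploaded file, or None if it is missing."""
    field = post.get(field_name)
    # An uploaded file field exposes .filename and .file; a plain string
    # (an ordinary form value) or None exposes neither.
    filename = getattr(field, 'filename', None)
    fileobj = getattr(field, 'file', None)
    if not filename or fileobj is None:
        return None
    return filename, fileobj.read()


class FakeField(object):
    """Stand-in for an uploaded form field, for illustration only."""
    def __init__(self, filename, data):
        self.filename = filename
        self.file = io.BytesIO(data)


result = extract_upload({'file': FakeField('cat.png', b'\x89PNG')})
```

In the handler above you would call `extract_upload(self.request.POST)` and respond with a 400 error when it returns None.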

How to get image size and type

Tj9991 found an implementation of a function, getImageInfo, that can extract image size without any external libraries. The usage is straightforward:
content_type, width, height = getImageInfo(im.content)
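To give an idea of how such a function can work without external libraries, here is a minimal sketch that handles only PNG and GIF by reading their fixed-position header fields. It is not the getImageInfo implementation mentioned above, and the name `sniff_png_gif` is made up.

```python
import struct


def sniff_png_gif(data):
    """Return (content_type, width, height) for PNG or GIF data, else None."""
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length, 'IHDR',
    # then big-endian 32-bit width and height).
    if (len(data) >= 24 and data[:8] == b'\x89PNG\r\n\x1a\n'
            and data[12:16] == b'IHDR'):
        width, height = struct.unpack('>II', data[16:24])
        return 'image/png', width, height
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if len(data) >= 10 and data[:6] in (b'GIF87a', b'GIF89a'):
        width, height = struct.unpack('<HH', data[6:10])
        return 'image/gif', width, height
    return None


# Hand-built headers, just enough bytes for the parser.
png_header = (b'\x89PNG\r\n\x1a\n' + struct.pack('>I', 13) + b'IHDR'
              + struct.pack('>II', 640, 480))
gif_header = b'GIF89a' + struct.pack('<HH', 32, 16)
```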

Dynamic images serving

There's an article about this topic in the official docs. Here's my non-optimal code:
class ServeImage(webapp.RequestHandler):
    def get(self, key):
        im = db.get(db.Key(key))
        if not im:
            self.error(404)
            return

        content_type, width, height = getImageInfo(im.content)
        self.response.headers.add_header("Expires", "Mon, 01 Dec 2014 16:00:00 GMT")
        self.response.headers["Content-Type"] = content_type
        self.response.out.write(im.content)
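Instead of hard-coding the Expires date, the header value can be computed relative to the current time. A minimal sketch using the standard library; the helper name `http_expires` is made up, and the `now` parameter exists only to make the function testable.

```python
import time
from email.utils import formatdate


def http_expires(seconds_from_now, now=None):
    """Return an HTTP date string that lies seconds_from_now in the future."""
    if now is None:
        now = time.time()
    # usegmt=True yields the 'GMT' suffix that HTTP headers expect.
    return formatdate(now + seconds_from_now, usegmt=True)


one_year = http_expires(365 * 24 * 60 * 60)
```

The result can be passed to `self.response.headers.add_header("Expires", ...)` in place of the literal date.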


Image resizing

Google doesn't support image conversion libraries like PIL. You have to convert images using an external service: upload your data somewhere outside GAE and then fetch the resized image back. I created a service for exactly this purpose (unfortunately, it is not the most stable solution). You can try other people's methods as well.

Is comet/http-push/long polling supported by GAE?

No, but keep reading. You could try normal polling, for example by loading Ajax data every second. But GAE resources are limited: only 650k requests/day are available. Eight users polling every second for 24 hours (691,200 requests) would already exceed this limit. I created an external service that allows you to use comet techniques from GAE.
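The quota arithmetic is easy to redo for other polling intervals. A small sketch, assuming the 650k requests/day figure quoted above and users who poll around the clock; the function names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_QUOTA = 650000  # requests/day, the figure quoted above


def requests_per_day(users, poll_interval_seconds):
    """Requests generated in 24 hours by users polling at a fixed interval."""
    return users * SECONDS_PER_DAY // poll_interval_seconds


def max_constant_users(poll_interval_seconds, quota=DAILY_QUOTA):
    """Largest number of round-the-clock users that stays within the quota."""
    return quota * poll_interval_seconds // SECONDS_PER_DAY


eight_users = requests_per_day(8, 1)    # 691200, over the quota
seven_users = requests_per_day(7, 1)    # 604800, under the quota
```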



You can also take a look at some of my sample applications that use my external services (source).


14 comments:

Ian Bicking said...

BTW, there's a ticket about the lack of a WebOb Response in issue 200

Henrik Joreteg said...

Thanks Majek! Very helpful.

Do you know anything about using SSL on App Engine?

Anonymous said...

SSL on Google App Engine?
Good question!

Anonymous said...

The cookie part solves my problem. Thanks!

Plat said...

My GAE application has been up and running for about a week now. The app, ThhetaNoon, tracks the energy output of solar energy systems (P.V. and Thermo-Solar) in real time, using weather data and mathematical algorithms from the statistical and geographical domains.
I ported the app from your average LAMP configuration to GAE, mainly for the scalability offered by G. The tie-in is an issue, hope GAE doesn't just shut down one day...
I used other Google tools for the development- Google Web Toolkit (GWT), Google charts. Google maps.
The GAE part provides data and some logic services. The GWT front-end performs some logic itself.
You can view the app at:
http://thetanoon.appspot.com/

Yossi

Anonymous said...

Hello Majek,

Thank you for posting your GAE-templates.

Especially for showing how one can write cookies in GAE. It isn't very well documented.

José Almeida said...

Hey Majek's!
I'm just starting a similar GAE tips&tricks blog and added a link to this post :)
Hope you don't mind!
Check out my first posts:
http://googleappenginetips.blogspot.com

polarbear said...

For Dynamic images uploading
I check file existence in post by:
if isinstance(self.request.POST["file"], cgi.FieldStorage):

Unknown said...

If you're looking for a quick introduction to the Google App Engine check out http://www.squidoo.com/Google-App-Engine

9Yim said...

Thanks so much for your help.

Clarity said...

very helpful post. I was just wondering how to store images in places other than blob store. I think this site - giftag.com is storing images somewhere else, because I can see the filename of the image and the path looks like it is mapped to a physical location. any idea how they do that, without blobstore?

Rakesh said...

Love this app! I really helps that it reminds me.
pacquiao vs margarito fight

Manki said...

Thank you for the useful collection of tips!

SG said...

Your cookie code fails if there is a space in the value - actually only the first word is stored. God damn Google they didn't make any API for cookies!