PyCon 2009 report

This year was my first visit to PyCon in the United States. Here's the full schedule for the conference.

In this post, I will give some notes and remarks about the sessions that I followed. For each session I included a link to the session details. Most of them have videos and slides of the talks.

How to Give a Python talk

A lot of talk about presentations using slides. I don't like slides. The author didn't like live coding because you could make a lot of typos, which would frustrate an audience. Later on, somebody gave a lightning talk about player piano, a project that types automatically for you and is very useful for presentations.

» How to Give a Python talk

Introduction to CherryPy

CherryPy is a lightweight HTTP framework. It is very much unlike Django: it doesn't have an ORM or administrative interface, but just provides the perfect balance for creating web applications that are different from the standard CRUD.

You can install "tools" that can, for example, automatically encode/decode JSON. CherryPy-guy gave a nice example that I can't find, but here's the webpage.
You can use other dispatchers than the default. MethodDispatcher looks interesting, because it allows you to do REST-style systems.
There's a tool to serve static content.

» Introduction to CherryPy

Introduction to Python Profiling

Tools for how to write fast Python code.

You need to profile your code to find out where it spends its time. This is really important: you really don't know where your code is slow.
Use cProfile
KCacheGrind can help you with interpreting Python profiles.
RunSnakeRun allows you to view cProfile dumps in a GUI. This tool was created by the guy giving the talk.
Caching function references can speed up deeply nested code. But profile first.
Property lookups look just like an attribute lookup, but can be much slower.
Another interesting tool is line profiler, by Robert Kern of Enthought.

» Introduction to Python Profiling

Panel: Python VMs

A discussion about the different implementations of Python:

CPython, the "official" Python implementation. This gets confused with Python, because a few years ago there was only one Python language and implementation. Now there are many implementations of the same language.
Jython, Python on the JVM.
IronPython, Python on .NET and Mono.
PyPy, Python in Python. This is actually more useful than it sounds.
Unladen swallow, an effort by Google to make Python 10x faster.

What I found interesting in the discussion was that the "official" Python guys really welcome and support the other implementations. I had a chance to have lunch together with all the different VM guys, and they are really sincere about this.

Just as with JavaScript implementations, there is a lot of buzz and activity around making Python run faster. Python and JavaScript are very different languages however, and optimization techniques used in JavaScript cannot be used in Python. Still, everybody has their own idea of how to make Python faster (check out "PyPy status").

» Panel: Python VMs

Python in a sandbox

A talk by the makers of PyPy about the use of PyPy as a virtualization tool. PyPy has a secure sandbox for running untrusted Python code, where you can bound IO calls, CPU and RAM resources. Really interesting for running other peoples' Python code on your own server. PyPy is "only" 2-3 times slower than regular Python.

There is no official release, but here are the steps for running it yourself:

Download the sandbox.
Translate pypy with --sandbox flag (this takes a while).
Run the script using pypy interact.py

» Python in a sandbox

A better Python for the JVM

A very technical talk about the Jython compiler. A bit over my head. However the conclusions were:

Currently Jython is a bit slower than Python.
The new Jython compiler should make things a bit faster.
However, currently the new compiler isn't faster.
But it provides better opportunities for optimization.

» A better Python for the JVM

Behind the scenes of EveryBlock.com

Adrian Holovaty, maker of Django, talked about Everyblock, his project to catalog everything that's happening at the hyper-local level (in your neighborhood).

They have a lot of different data-types, and wanted to store all of them in the same table. This meant that date types had to be opaque, since the columns would be different for the various datatypes. The main table has columns named varchar1, varchar2, int1, int2, etc. A schema table defines what these various columns means for a given schema.

This really looked like a hack, which was surprising coming from Adrian, someone who has written such a clean web framework. It seems like an other database system (such as key-value stores or even a graph database, such as neo4j) would fit the job better.

One remark also struck me as odd: Adrian stated that he wrote everything from scratch, basically because "I don't trust other people's code". It was unclear whether he was entirely serious or not, but the fact remains that no external code was used in his website.

» Behind the scenes of EveryBlock.com

Jython Progress

The reason for coming to PyCon: a very interesting talk by Frank Wierzbicki, who works at Sun on Jython.

The focus for the Jython 2.5 release was on compatibility. Through our use with Jython we found that a lot of stuff that worked in Python really worked in Jython as well, which was nice.

The focus for Jython 2.6 will be on performance and integration with Java. It should show up quicker than the 2.2 > 2.5 release, which took years to complete.

At the end he also showed Field, which is a Processing/NodeBox-like environment in Java using Jython. It has only very recently been open-sourced. It only runs on Mac OS X 10.5.

Jython Progress

Pinax: a platform for rapidly developing websites

Pinax has been getting some exposure lately because it provides ready-made components for Django. I had some troubles getting it to work, and was hoping this talk would provide some answers.

There seemed to be some problems with how to distribute this, and how to version all of it. They make a lot of use of svn:externals, which isn't bad per se. It really requires a lot of commitment to get started with the framework, but once you have it installed, it provides a lot of the boilerplate functionality of sites, and more specifically, social networks: user-to-user messaging, twitter clone, tagging, photo management, interest groups, … .

After the talk, I'm still not sure if it's worth the trouble, or if I'd rather be writing a lot of that from scratch. I feel the framework is still a bit too young to be used without to much configuration hassle. Also, the remark of Adrian during his EveryBlock talk about writing everything himself rings true here.

» Pinax: a platform for rapidly developing websites

Class Decorators: Radically Simple

A nice introduction to "better metaclasses". The functionality is only available in Python 2.6, but the talk is interesting (and short) enough to watch in its entirety.

» Class Decorators: Radically Simple

PyPy status talk

After the talk about Python sandboxing using PyPy, I was interested in what the status of PyPy was as a project. PyPy was started to be able to generate Python interpreters with more flexibility than a fixed C implementation. By defining a Python Interpreter on a higher level, you can experiment with different VM features quickly.

The PyPy interpreter is slower but consumes a lot less memory: some objects are 50% the size of CPython. They don't support libraries written in C, but they do support CTypes as the official way to have bindings for PyPy.

» PyPy status talk

Drop ACID and think about data

Bob Ippolito talked about the various alternative data storage implementations that have come up recently. He covered both closed- and open-source implementations. Most of the talk was about alternative key-value stores and other non-relational models, but he didn't cover graph databases.

There are a lot of different kinds of databases out there, but none of them seemed particularly stable or usable for my purposes. Afterwards, I checked out the open space about Cassandra, which seemed like the most stable one.

» Drop ACID and think about data

Concurrency and Distributed Computing with Python Today

This talk was not so much about the multiprocessing module in Python 2.6, but more about the tools available today.

He covered Jython as a solution for the global interpreter lock (GIL), because Jython uses Java threading which does not have the GIL. Also, the usage of java.util.concurrent solves a lot of problems. Actually, this seemed the most interesting approach for my projects.

He also talked about Stackless Python, which offers lightweight threads and cooperative multitasking.

There are different approaches to multitasking:

Real threads: hard to use because concurrency requires a lot of thinking ahead. They are hampered by the GIL in CPython.
Coroutines: light-weight threads are not truly parallel, but simplify the threading problem.
Actors: isolated, self reliant components that communicate via messages. They are truly parallel, and are a good model to overcome the threading fallacies. Erlang and Scala are two programming languages that use the actors idiom for concurrency.

All of these approaches have libraries implementing them in Python. There are a lot of them, and most are alpha quality. The two good ones are Twisted and Kamaelia. The rest is a mish-mash of technologies. ("Concurrency is hard, let’s go shopping!")

» Concurrency and Distributed Computing with Python Today

Abstraction as Leverage

An abstract talk about the tower of abstractions we use every as programmers, and how to produce and consume abstraction layers.

The starting point for the talk was that abstractions "leak". This is something Joe Spolsky has written about, and an interesting read. He also pointed out a quote from Jason Fried of 37 signals: "Prefer action over abstr-action".

It's hard to sum up the talk without paraphrasing it (badly), so I suggest you go see it.

» Abstraction as Leverage

Paver: easy build and deployment automation for Python projects

Paver is sort of like Ant for Python projects, but instead of using a bizarre XML

syntax, just using Python. It is also very similar to Ruby's Rake. It is really

interesting and looks mature. It requires you to make a pavement.py file in your project

directory that gets picked up and executed by Paver.

The basic building blocks in Paver are tasks. Tasks can be given command options (using optparse -like syntax).

Paver can be used for deploying servers, generating documentation, building a distribution

(using distutils/setuptools), working with files, etc. Paver doesn't replace distutils, but

embraces and extends it in a non-evil way.

A good introduction is Getting Starting with Paver.

» Paver: easy build and deployment automation for Python projects

Making games in Python - Tools and techniques at CCP

I wanted to see this talk to see how a gaming company used Python almost everywhere in a high-profile MMORPG. They use Python for networking, the web server, content authoring tools (using wxPython) and game logic. Some performance-critical parts were they use C++. They did not talk about how they wrapped C++ code, only that they didn't use Boost.Python.

They use embedded stackless Python, which they actively maintain. It allows them to do micro-threading (every object in the world runs in its own light-weight thread).

They use their own code-reloading system called livecoding because the reload built-in was deemed unsuitable. Their custom code reloading system allows them to reload game code while server and client are running. They also integrated unit testing so that the tests run before the code is reloaded to see if it passes.

» Making games in Python - Tools and techniques at CCP

Stackless python in EVE, pt. 2

This talk went more in depth about how they used Stackless Python in EVE. It covered

StacklessIO, a unified framework for blocking operations in Stackless Python. He also talked

about how they deployed StacklessIO at CCP, using "Cowboy mode", which short- circuited normal

QA procedures. Although interesting, I could not see myself using any of the technologies or

working methods introduced.

» Stackless python in EVE, pt. 2

Designing a web framework: Django's design decisions

I love to see Jacob Kaplan-Moss give a talk, and this one was no exception. This talk was high-level overview of some of the design decisions they made in the framework, and how to keep ego out of the equation.

He talked about the difference between academic frameworks ("architecture astronauts") and the real world.

What I found interesting was the decision in Django to make a full stack framework instead of glueing together existing components. Part of the reason for this was that there were not much existing components available, but also that they could provide one consistent API

dialect that felt they same, whether you were working in the templating language, the database API or some other part of the stack.

He also pointed that out that having users pick from existing components because it would allow for full flexibility only makes sense for expert users, and not for beginners that are just starting out with your framework. Having them pick from a list of components that they have no knowledge over is silly.

He also pointed out the talk by Cal Henderson of Flickr, Why I Hate Django which is really interesting and funny.

» Designing a web framework: Django's design decisions

Open Space: Cassandra

I followed an open space talk by Jonathan Ellis of Rackspace Managed Hosting, one of the maintainers of Cassandra. [Cassandra](http://code.google.com/p/the- cassandra-project/) is a distributed database somewhere between BigTable and Dynamo, which are both closed source. The project was started by two ex-Googlers at Facebook, and has now been open-sourced. The talk went into depth about the implementation of Cassandra.

The system looks interesting but way too big for my needs. As long as we don't really need high-performance distributed databases, I'd prefer my ACID using MySQL or Postgresql (or Neo4j).

Neo4j

Turns out the maker of the new Jython compiler also works on an open-source graph database called neo4j. His name is Tobias Ivarsson. I talked to him and showed him Perception. He was convinced that neo4j would be perfect for this: super-fast, O(1) lookups, lightweight, Python bindings. It is a mature graph database that runs in production for over 5 years.

There are components available for indexing, graph algorithms and much more.

Answered questions

I came to the conference to hear talks about the progress and future of Jython. I am now convinced that Jython is an excellent choice for Python development, given that there are some very motivated people working on it, and they have the full support of the CPython guys.

Unanswered questions

One thing I've been struggling with in NodeBox is how to do proper packaging of NodeBox packages. I need a system that can version packages, do dependency management, can run several version of the same package at the same time, and do live loading/unloading of packages. I recently discovered OSGi recently, which seems to solve this problem for Java. However, Python doesn't seem to have a system like that available. There was a lot of talk about virtualenv, pip and even zc.buildout, but none of those seems to provide a complete answer. (By the way, Jacob's recent blog post about zc.buildout is very interesting.) The whole environment seems very much in flux, and I think we'll have to wait for the next PyCon to have an answer available.

Personally, I would love to see integration between the VM's native package management system and the Python implementation (for Jython, that would be OSGi). However, even OSGi seems to come under attack, as Sun is [rolling its own solution](http://www.osgi.org/blog/2007/07/can-someone-tell-sun-about- osgi.html).

Overall, package management is a hairy business, and one that needs some serious thought. As for now, I think I'm better off taking a good hard look at OSGi and then rolling my own solution.