Monday, May 11, 2009

Python Project Structure

As the Python project I've been working on gets bigger and bigger, I've been looking to set up the project's structure nicely for source code and tests. I already knew about nose, but what I mainly wanted was a definitive directory structure: "best practice" type stuff. The most helpful thing I found was a blog post, which I've replicated here.

Do:

  • name the directory something related to your project. For example, if your project is named "Twisted", name the top-level directory for its source files Twisted. When you do releases, you should include a version number suffix: Twisted-2.5.
  • create a directory Twisted/bin and put your executables there, if you have any. Don't give them a .py extension, even if they are Python source files. Don't put any code in them except an import of and call to a main function defined somewhere else in your project.
  • If your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py. If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
  • put your unit tests in a sub-package of your package (note - this means that the single Python source file option above was a trick - you always need at least one other file for your unit tests). For example, Twisted/twisted/test/. Of course, make it a package with Twisted/twisted/test/__init__.py. Place tests in files like Twisted/twisted/test/test_internet.py.
  • add Twisted/README and Twisted/setup.py to explain and install your software, respectively, if you're feeling nice.
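As a rough sketch of the "thin executable" advice above: the real logic lives in an importable module, and the file in bin/ only imports and calls it. The names used here (MyProject, myproject/cli.py, bin/hello and the greeting itself) are made up for illustration and are not taken from Twisted.

```python
# Hypothetical contents of MyProject/myproject/cli.py
import sys


def main(argv=None):
    """Entry point: read the arguments, do the work, return an exit status."""
    if argv is None:
        argv = sys.argv[1:]
    name = argv[0] if argv else "world"
    print("hello, %s" % name)
    return 0


# The executable MyProject/bin/hello (no .py extension) would then contain
# nothing but the import and the call:
#
#     #!/usr/bin/env python
#     import sys
#     from myproject.cli import main
#     sys.exit(main())
```

Because main() takes its arguments as a parameter and returns a status code, it can be called directly from a unit test without starting a separate process.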
Don't:
  • put your source in a directory called src or lib. This makes it hard to run without installing.
  • put your tests outside of your Python project. This makes it hard to run the tests against an installed version.
  • create a package that only has an __init__.py and then put all your code into __init__.py. Just make a module instead of a package; it's simpler.
  • try to come up with magical hacks to make Python able to import your module or package without having the user add the directory containing it to their import path (either via PYTHONPATH or some other mechanism). You will not correctly handle all cases and users will get angry at you when your software doesn't work in their environment.
I found the above very helpful in organising my code, along with two other important things I've learned.

  • For tests which need to see your source code, steer away from relative imports and instead put your project on the PYTHONPATH. My thinking is that anywhere the project is used it will need to be properly installed (i.e. on the PYTHONPATH), so that's how it should work normally. You can use virtualenv if you don't want to clutter your site-packages.
  • I've had to change my thinking from Java/C# and start to accept that multiple classes in one file is OK (C# will actually let you do this too, and apparently so will Java). With that in mind, I keep classes which are functionally similar in one module, and when that module starts trying to do too much I create a folder with submodules. So from the above examples, Twisted/twisted.py and Twisted/test/test_* is fine for a relatively simple twisted.py (maybe 3 or 4 classes), but once the library starts to grow I'd consider breaking it up into Twisted/twisted/thispart.py and Twisted/twisted/thatpart.py.
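To make the first point concrete, here is a minimal sketch, assuming Python 3 and the standard unittest module rather than nose. It writes a throwaway project to a temporary directory, puts that directory on the import path (the same effect as listing it in PYTHONPATH), and runs a test that uses an absolute import. The project name myproject, the shapes module and the area function are all invented for the example.

```python
import os
import sys
import tempfile
import unittest

# A throwaway project following the layout described above.
FILES = {
    "myproject/__init__.py": "",
    "myproject/shapes.py": "def area(width, height):\n    return width * height\n",
    "myproject/test/__init__.py": "",
    "myproject/test/test_shapes.py": (
        "import unittest\n"
        "from myproject.shapes import area  # absolute, not relative\n"
        "\n"
        "class AreaTest(unittest.TestCase):\n"
        "    def test_area(self):\n"
        "        self.assertEqual(area(2, 3), 6)\n"
    ),
}


def build_layout(root):
    """Write the example files underneath the directory `root`."""
    for relative_path, source in FILES.items():
        path = os.path.join(root, *relative_path.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(source)


def run_tests(root):
    """Put `root` on the import path, then load and run the test module."""
    sys.path.insert(0, root)  # stands in for PYTHONPATH=root
    try:
        suite = unittest.defaultTestLoader.loadTestsFromName(
            "myproject.test.test_shapes")
        return unittest.TextTestRunner(verbosity=0).run(suite)
    finally:
        sys.path.remove(root)


project_root = tempfile.mkdtemp()
build_layout(project_root)
result = run_tests(project_root)
```

The test module never needs to know where it sits on disk; it only relies on myproject being importable, which is exactly the situation it will be in once the project is installed.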

In all of this it was helpful to browse the Twisted source and see how that was laid out.
