Python Notes

Tuesday, November 16, 2004

A dynamic class repository

As part of my workflow project, I came across an interesting problem. Transition entities are modelled as classes derived from a base ActionDef class, because each action in the system needs its own custom code. As the system grows, I need a repository for such classes.

The easy way to implement the repository is to put everything inside a package. It's reasonably scalable; a package can easily hold a lot of classes. But at some point, the package becomes hard to maintain. There are other problems as well. The application is designed to work as a long-running server. Shutting down the server to add new classes should not be required; instead, new classes should be added or redefined dynamically.

At this point, I started to think about how to design a dynamic Python class repository in a safe and clean way. One simple way is to use a database to store the code for each class. At first, I was concerned that this would be a huge security problem. Then I realized that code would have to be retrieved from the repository no matter what I did, so a database would be no worse by itself than any other alternative. But databases have some real issues. The biggest is that code stored in a database is inconvenient to maintain. It also doesn't fit standard development practices such as version control repositories (I'm using Subversion).

After giving up on the database idea, I decided to keep it really simple and use a standard file-based class repository. A class factory function gets the name of the desired class and looks for it in the repository. It loads the source file that has the same name as the class, and returns an instance of the class for the application to use. The relevant code goes like this:

import os
from inspect import isclass

def makeObjFromName(self, name, expectedClass):
    """Creates a new instance of a class using the process library.

    Searches the pathlist for a file. If the file exists, executes
    its content. It is assumed that the source file contains a
    class with the same name as the source file (without the .py
    extension, of course). For example:

        file name = pdCreateNewuser.py -> class name = pdCreateNewuser

    If the file does not exist, or if there is no symbol with the
    correct name inside the file, it raises a NameError.

    If the file exists and the symbol is defined, it checks whether
    the symbol is bound to a subclass of expectedClass. If not, it
    raises a TypeError. If the class is correct, the object is
    automatically instantiated and returned.
    """
    for path in self.pathlist:
        filename = os.path.join(path, name + '.py')
        print("** trying " + filename)
        if os.path.isfile(filename):
            # Execute the file in a copy of this module's globals, so the
            # library source can see the base classes imported here.
            namespace = dict(globals())
            with open(filename) as f:
                exec(compile(f.read(), filename, 'exec'), namespace)
            print("** executed!")
            obj = namespace.get(name)
            if obj is None:
                raise NameError('File %s does not define %s' % (filename, name))
            if isclass(obj) and issubclass(obj, expectedClass):
                print("** found!")
                return obj()
            raise TypeError('Object %s is not from the expected class %s'
                            % (name, expectedClass.__name__))
    raise NameError('Could not find class %s in the library' % name)
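
To see the factory at work outside a server, the sketch below condenses it into a standalone function and runs it against a temporary directory. The names load_object and write_action are hypothetical and used only for illustration. The sketch also assumes that the library files subclass a stand-in ActionDef, which is injected into the execution namespace. The point is to show that editing a file on disk redefines the class without restarting the process.

```python
import os
import tempfile
from inspect import isclass


class ActionDef(object):
    """Stand-in for the workflow's base class (hypothetical)."""


def load_object(pathlist, name, expectedClass):
    """Condensed, standalone version of makeObjFromName (sketch)."""
    for path in pathlist:
        filename = os.path.join(path, name + '.py')
        if os.path.isfile(filename):
            # The file is read and executed on every call, so an edited
            # file is picked up without restarting the process.
            namespace = {'ActionDef': ActionDef}
            with open(filename) as f:
                exec(compile(f.read(), filename, 'exec'), namespace)
            cls = namespace.get(name)
            if cls is None:
                raise NameError('%s does not define %s' % (filename, name))
            if not (isclass(cls) and issubclass(cls, expectedClass)):
                raise TypeError('%s is not a subclass of %s'
                                % (name, expectedClass.__name__))
            return cls()
    raise NameError('Could not find class %s in the library' % name)


def write_action(directory, name, message):
    """Writes a tiny ActionDef subclass to <directory>/<name>.py."""
    source = (
        'class %s(ActionDef):\n'
        '    def run(self):\n'
        '        return %r\n' % (name, message)
    )
    with open(os.path.join(directory, name + '.py'), 'w') as f:
        f.write(source)


library = tempfile.mkdtemp()
write_action(library, 'adConfirm', 'version 1')
first = load_object([library], 'adConfirm', ActionDef)

# Redefine the class on disk while the "server" keeps running.
write_action(library, 'adConfirm', 'version 2')
second = load_object([library], 'adConfirm', ActionDef)
```

Objects created before the edit keep their original class, so first.run() still returns 'version 1'. Only new instances pick up the redefinition.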

The file-based repository is simple, and also more convenient to work with than the database. But it is still far from an ideal solution, and I'm still looking for improvements. One idea is to include native support for Subversion in the repository code itself, using the Python Subversion bindings. Instead of reading a file, I could check out the latest version from the repository whenever required. That may be slow, though, so it needs careful testing.

Another issue is that class references are passed as strings instead of plain Python symbols. This is needed because all references have to be resolved at runtime, by reading the source code on the fly. Also, since each class is defined in its own source file, there is no way for one class to refer directly to another. The resulting code does not read as nicely as it could. For example, the following snippet shows a task definition class that holds a reference to a process definition:

class tdFillInvoice(TaskDef):
    name = 'Fills the invoice form'
    description = '''
    Fills all the fields of the invoice form (...)
    '''
    processdef = getProcessDef('pdPurchaseItems')
    actions = [
        ('adConfirm', None, None),
        ('adCancel', None, None),
        ('adDelegate', None, None),
        ('adPrint', None, ''),
        ('adSaveDraft', None, ''),
    ]

I am looking into ways to allow one to write this reference in one of the forms below:

processdef = ProcessLibrary.pdPurchaseItems

or

from ProcessLibrary import pdPurchaseItems
processdef = pdPurchaseItems

In the first case, the idea is to use the __getattribute__ magic method to capture attribute access on the ProcessLibrary object. It's tricky, and perhaps a little dangerous. __getattribute__ intercepts every attribute access, including the object's own internal lookups, so any attribute it needs must be fetched through object.__getattribute__ to avoid infinite recursion. It makes the code look like plain Python attribute access, which improves readability. On the other hand, it hides too much magic from users, which may not be the best thing to do.
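
A minimal sketch of the first approach follows. The proxy class name LibraryProxy and the load_class function are hypothetical. Here load_class just looks names up in a dict, standing in for the file-based factory. The sketch shows the recursion pitfall mentioned above and how object.__getattribute__ sidesteps it.

```python
class LibraryProxy(object):
    """Turns attribute access into repository lookups (sketch)."""

    def __init__(self, loader):
        # __setattr__ is not intercepted, so this assignment is ordinary.
        self._loader = loader

    def __getattribute__(self, name):
        # __getattribute__ runs for *every* attribute access, including
        # the self._loader lookup, so private and special names are
        # delegated to object.__getattribute__ to avoid infinite recursion.
        if name.startswith('_'):
            return object.__getattribute__(self, name)
        loader = object.__getattribute__(self, '_loader')
        return loader(name)


# Hypothetical stand-in for the file-based repository.
class pdPurchaseItems(object):
    pass


_loaded = {'pdPurchaseItems': pdPurchaseItems}


def load_class(name):
    try:
        return _loaded[name]
    except KeyError:
        # AttributeError keeps hasattr() and getattr(obj, name, default) working.
        raise AttributeError('No class %s in the library' % name)


ProcessLibrary = LibraryProxy(load_class)
processdef = ProcessLibrary.pdPurchaseItems
```

Python also offers __getattr__, which is called only when normal lookup fails. It would not need the object.__getattribute__ escape hatch, and it is usually the safer hook for this kind of proxy.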

In the latter example, the idea is to implement an import hook to create a pseudo-package of sorts. The problem with this approach is that Python's import mechanism is not well documented, and it's not clear what the preferred way to implement an import hook is.
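
For comparison, current Python versions document import hooks in the importlib reference. A minimal sketch of the second approach follows, assuming Python 3.7 or later, which adds module-level __getattr__ (PEP 562). The finder and loader class names and the resolve function are hypothetical, and the sketch is not how the workflow library actually does it. A finder on sys.meta_path claims the name ProcessLibrary, and the resulting module resolves class names on demand.

```python
import importlib.abc
import importlib.util
import sys


class _PseudoPackageLoader(importlib.abc.Loader):
    """Creates a module whose attributes are resolved on demand."""

    def __init__(self, resolve):
        self._resolve = resolve

    def create_module(self, spec):
        return None  # let Python create a plain module object

    def exec_module(self, module):
        resolve = self._resolve

        def __getattr__(name):
            # PEP 562: called only for names missing from the module dict.
            if name.startswith('__'):
                raise AttributeError(name)
            return resolve(name)

        module.__getattr__ = __getattr__


class PseudoPackageFinder(importlib.abc.MetaPathFinder):
    """Claims a single top-level module name on sys.meta_path."""

    def __init__(self, module_name, resolve):
        self._module_name = module_name
        self._loader = _PseudoPackageLoader(resolve)

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self._module_name:
            return None  # defer to the normal import machinery
        return importlib.util.spec_from_loader(fullname, self._loader)


# Hypothetical stand-in for the file-based repository.
class pdPurchaseItems(object):
    pass


def resolve(name):
    registry = {'pdPurchaseItems': pdPurchaseItems}
    try:
        return registry[name]
    except KeyError:
        raise AttributeError('No class %s in the library' % name)


sys.meta_path.insert(0, PseudoPackageFinder('ProcessLibrary', resolve))

from ProcessLibrary import pdPurchaseItems as loaded
processdef = loaded
```

Because nothing is cached in the module, every lookup calls resolve again. A real resolver that re-reads the source file would therefore see redefined classes.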

In both cases, to avoid problems with circular references, it may be necessary to move the reference lookups from the class body to the __init__ method. Other projects suffer from the same problem (referring to classes by name strings); SQLObject and ctypes come to mind. A good, generic solution for the workflow library may also help these projects.
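
To illustrate the circular-reference point, here is a sketch in which two classes refer to each other by name. The register decorator and lookup function are hypothetical stand-ins for the repository, and TaskDef and ProcessDef are empty placeholders. The point is only that resolving the names in __init__ defers each lookup until both classes exist.

```python
# Hypothetical registry standing in for the file-based repository.
_registry = {}


def register(cls):
    """Class decorator that records a class under its own name."""
    _registry[cls.__name__] = cls
    return cls


def lookup(name):
    return _registry[name]


class TaskDef(object):
    pass


class ProcessDef(object):
    pass


@register
class tdFillInvoice(TaskDef):
    # Writing processdef = lookup('pdPurchaseItems') here, in the class
    # body, would raise KeyError: pdPurchaseItems is not registered yet.
    def __init__(self):
        self.processdef = lookup('pdPurchaseItems')


@register
class pdPurchaseItems(ProcessDef):
    def __init__(self):
        # The reverse reference is also resolved lazily, at instantiation.
        self.first_task = lookup('tdFillInvoice')


task = tdFillInvoice()
process = pdPurchaseItems()
```

The trade-off is that the lookup runs on every instantiation, and the reference is no longer visible as a class attribute.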

