Python Notes

Wednesday, October 20, 2004

Inifile: an ini-style configuration template

IniFile is an ini-style configuration module that uses a template to define the configuration file structure. The code below includes some samples. Feel free to use it as-is.


"""
inifile.py

Library for reading and writing ini-style config files. The ini file
structure is provided as a class, with nested classes acting as the
sections.

Usage:

# declare the ini file structure using a class statement
class MyIni(IniFile):
    one = TypedAttribute(1)
    two = TypedAttribute(2)
    three = TypedAttribute(3, name='triple')
    class NestedSection(IniSection):
        str1 = TypedAttribute('one', name='str_one')
        str2 = TypedAttribute('two', name='str_two')

# create a configuration instance
config = MyIni()

# read the config file (load() opens 'MyIni.ini' by default)
config.load()
or
config.read(fileobj)

# write it back
config.save()
or
config.write(fileobj)

-- sample ini file --
[server]
socketPort = 8080
threadPool = 10

[staticContent]
bitmaps = /var/local/bitmaps

[session]
storageType=ram

"""

from metatemplate import GenericTemplate, GenericAttribute, TypedAttribute, next_attribute_id
from types import StringType
from inspect import isclass
import re

debug_inifile = 0

class IniSection(GenericTemplate):

    re_section = re.compile(r'^\[(.*)\]')
    re_unquote = re.compile(r'^(?P<openquote>[\"\']?)(?P<content>.*)(?P=openquote)$')

    def log(self, line):
        """simple auxiliary log function"""
        if debug_inifile:
            # should use the logger interface!
            print line

    def unquote(self, value):
        """unquotes strings read from the config file"""
        return self.re_unquote.match(value).groupdict()['content']

    def read(self, fileobj):
        """reads the config file info from a file-like object"""

        # section headers are matched against the field names, the same
        # names that write() uses; the class-level __attr__ dict only
        # holds simple attributes, and self.__attr__ may be shadowed by
        # the per-instance value dict created by GenericAttribute
        attr_name_list = [f[0] for f in self.__fields__]
        for line in fileobj:
            line = line.strip()
            if not line: continue
            matchresult = self.re_section.match(line)
            if matchresult:
                sectionname = matchresult.group(1)
                self.log('Found section: %s' % sectionname)
                while sectionname in attr_name_list:
                    # found a known section
                    section = getattr(self, sectionname, None)
                    if isinstance(section, IniSection):
                        self.log('Section: %s' % sectionname)
                        sectionname = section.read(fileobj)
                        self.log('Section finished: %s' % sectionname)
                    else:
                        # section name does not refer to a section object
                        # ?should it raise a fatal exception/warning?
                        self.log('Section error: %s' % sectionname)
                        sectionname = None
                else:
                    # section is not a child of this node, it's a
                    # sibling; returns the section name
                    self.log('Unknown section: %s' % sectionname)
                    return sectionname
            else:
                # found an attribute
                name, value = line.split('=',1)
                name = name.strip()
                value = value.strip()
                self.log('Normal attribute: %s = %s' % (name, value))

                # retrieves the attribute using the alternative name
                cls_attr_tuple = self.__class__.__attr__.get(name, None)
                if not cls_attr_tuple:
                    # invalid attribute found in the config file!
                    # ? should it cause an exception or warning ?
                    raise KeyError('Attribute not found: %s' % (name,))

                # retrieves the name used for binding
                cls_attr_name, cls_attr = cls_attr_tuple

                # sets the instance value
                if isinstance(cls_attr, GenericAttribute):
                    setattr(self, cls_attr_name, self.unquote(value))
                else:
                    # invalid attribute found in the config file!
                    # ? should it cause an exception or warning ?
                    raise KeyError(name)

    def write(self, fileobj, _initialpos = None):
        """writes the config file info to a file-like object"""
        if not _initialpos:
            _initialpos = fileobj.tell()

        for fname, fobj in self.__fields__:
            # looks at the class to retrieve the attribute definition
            cls_attr = getattr(self.__class__, fname, None)
            # gets the attribute value stored in the instance
            ins_attr = getattr(self, fname, None)

            # is a subsection?
            if isclass(cls_attr) and issubclass(cls_attr, IniSection):
                # writes an empty line before the section (if needed)
                currentpos = fileobj.tell()
                if currentpos > _initialpos:
                    fileobj.write('\n')

                # writes section name & section contents
                fileobj.write('[%s]\n' % fname)
                ins_attr.write(fileobj, _initialpos = currentpos)

            # is an attribute?
            elif isinstance(cls_attr, GenericAttribute):
                fileobj.write('%s=%s\n' % (cls_attr.name, str(ins_attr)))

class IniFile(IniSection):
    def load(self, fname=None):
        if not fname:
            fname = self.name + '.ini'
        inifile = open(fname, 'r')
        self.read(inifile)
        inifile.close()

    def save(self, fname=None):
        if not fname:
            fname = self.name + '.ini'
        inifile = open(fname, 'w')
        self.write(inifile)
        inifile.close()
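The two regular expressions at the top of IniSection do most of the line classification in read(). The group names openquote and content used here are inferred from the (?P=openquote) backreference and the groupdict()['content'] lookup. A minimal Python 3 sketch, with a hypothetical `classify()` helper, that applies them to a few sample lines:

```python
import re

# Same patterns as IniSection.re_section and IniSection.re_unquote
re_section = re.compile(r'^\[(.*)\]')
re_unquote = re.compile(r'^(?P<openquote>[\"\']?)(?P<content>.*)(?P=openquote)$')


def classify(line):
    """Return ('section', name) or ('attr', name, value) for one ini line."""
    line = line.strip()
    match = re_section.match(line)
    if match:
        return ('section', match.group(1))
    name, value = line.split('=', 1)
    # quotes are stripped only when the same quote opens and closes the value
    content = re_unquote.match(value.strip()).group('content')
    return ('attr', name.strip(), content)


for sample in ['[server]', 'socketPort = 8080',
               "bitmaps = '/var/local/bitmaps'", 'title = "half']:
    print(classify(sample))
```

The last sample shows that an unbalanced quote is kept as part of the value, because the backreference must find the same quote character at the end of the line.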

MetaTemplate: Generic data templates in pure Python

I'm happy to announce that the metatemplate module is now mature enough for a beta-quality release. I've been using it in my own projects over the past few weeks. The latest changes cleaned up the code and added a new private member, __attr__, to the template, which maps alternative names to Python names. This feature was needed so that template attributes can be stored in external resources (such as INI files) under alternative names that may include characters that are not valid in Python identifiers.

This module can be used to declare standard data structures containing attributes and arbitrarily nested templates. Attributes are stored in the order they are declared in the source code, which allows the template to be used to build structures that reflect this ordering. Some applications include ini-style configuration handling, data entry forms, document structure templates, and generic data records with ordered members.
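The ordering trick itself is small. A minimal Python 3 sketch, using a hypothetical `Field` class and `ordered_fields()` helper rather than the module's own classes: each field takes a serial number from a shared `itertools.count()` when it is created, and the class members are sorted by that number, which mirrors what `next_attribute_id` and `getfields()` do in the module below.

```python
import itertools

_next_id = itertools.count()  # shared counter, like next_attribute_id


class Field:
    """Hypothetical stand-in for GenericAttribute: remembers creation order."""
    def __init__(self, default=None):
        self.seqno = next(_next_id)
        self.default = default


def ordered_fields(cls):
    """Return (name, field) pairs sorted by creation order, ignoring other members."""
    pairs = [(name, obj) for name, obj in vars(cls).items() if isinstance(obj, Field)]
    return sorted(pairs, key=lambda pair: pair[1].seqno)


class Server:
    threadPool = Field(10)
    socketPort = Field(8080)
    host = Field('localhost')


print([name for name, _ in ordered_fields(Server)])
# ['threadPool', 'socketPort', 'host']
```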


"""
metatemplate.py -- template metaclass that can be used to customize any
class to store user-defined attributes in the original definition
order.

"""

import sys
import itertools
from inspect import isclass, isdatadescriptor
from types import StringType, IntType, FloatType, ListType, DictType

#----------------------------------------------------------------------
# Debug constants (for testing purposes)

debug_generic_attribute = 0
debug_typed_attribute = 0
debug_auto_instantiation = 0
debug_iterator = 0

#----------------------------------------------------------------------
# AbstractAttribute is the ancestor of all classes that can be used
# in the metatemplate framework. Abstract attributes are named.

class AbstractAttribute(object):
    name = ''

#----------------------------------------------------------------------
# GenericAttribute is the ancestor of all simple elements that are
# used as attributes of templates
#
# When accessed from an instance, the __get__ method returns the value
# associated with the attribute, using the default value if needed.
# When accessed from the class, the __get__ method returns the
# attribute object itself. This is used for some internal checks.

class GenericAttribute(AbstractAttribute):
    """ Generic attributes for generic containers """
    def __init__(self, default = None, name = None):
        self._seqno = next_attribute_id()
        self.value = default
        self.name = name
    def __repr__(self):
        return "<GenericAttribute '%s'>" % (self.name,)
    def __get__(self, instance, owner):
        if debug_generic_attribute:
            print "GET self:[%s], instance:[%s], owner:[%s]" % (self, instance, owner)
        if instance is not None:
            attrdict = instance.__dict__.setdefault('__attr__', {})
            return attrdict.get(self.name, self.value)
        else:
            return self
    def __set__(self, instance, value):
        if debug_generic_attribute:
            print "SET self:[%s], instance:[%s], value:[%s]" % (self, instance, value)
        attrdict = instance.__dict__.setdefault('__attr__', {})
        attrdict[self.name] = value

class TypedAttribute(GenericAttribute):
    """ Typed attributes for generic containers """
    def __init__(self, default = None, name = None, mytype = None):
        GenericAttribute.__init__(self, default, name)
        if mytype:
            if isclass(mytype):
                self.mytype = mytype
            else:
                raise TypeError("Argument expects None "
                                "or a valid type/class")
        else:
            self.mytype = type(default)
    def __repr__(self):
        return "<TypedAttribute '%s': %s>" % (self.name, self.mytype.__name__)
    def __set__(self, instance, value):
        if debug_typed_attribute:
            print "SET self:[%s], instance:[%s], value:[%s]" % (self, instance, value)
        if not isinstance(value, self.mytype):
            if isinstance(value, StringType):
                # tries to convert a string to the correct target
                # type; needed when reading values from files.
                value = self.mytype(value)
            else:
                raise TypeError("Expected %s attribute" % self.mytype.__name__)
        attrdict = instance.__dict__.setdefault('__attr__', {})
        attrdict[self.name] = value

#----------------------------------------------------------------------
# auxiliary functions

def stripindent(docstring):
    """
    stripindent - reformats a multiline, triple-quoted string, removing
    extra leading spaces that are used for indentation purposes.
    """
    # shamelessly taken from PEP 257:
    # http://www.python.org/peps/pep-0257.html
    if not docstring:
        return ''
    # Convert tabs to spaces (following the normal Python rules)
    # and split into a list of lines:
    lines = docstring.expandtabs().splitlines()
    # Determine minimum indentation (first line doesn't count):
    indent = sys.maxint
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    # Remove indentation (first line is special):
    trimmed = [lines[0].strip()]
    if indent < sys.maxint:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Strip off trailing and leading blank lines:
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    # Return a single string:
    return '\n'.join(trimmed)

#----------------------------------------------------------------------

next_attribute_id = itertools.count().next

def getfields(dct):
    """
    takes a dictionary of class attributes and yields decorated tuples
    (sequence number, (name, object)) for all valid field instances,
    giving their relative position.

    """
    for fname, fobj in dct.items():
        if isinstance(fobj,GenericAttribute):
            yield (fobj._seqno, (fname, fobj))
        elif isclass(fobj) and issubclass(fobj,AbstractAttribute):
            yield (fobj._seqno, (fname, fobj))
        elif (fname[0] != '_'):
            # conventional attributes from basic types are just stored
            # as GenericAttributes, and put at the end of the list,
            # in alphabetical order
            basic_types = (StringType, IntType, FloatType, ListType, DictType)
            if isinstance(fobj, basic_types):
                yield (sys.maxint, (fname, GenericAttribute(fobj)))
            else:
                yield (0, (fname, fobj))
        else:
            yield (0, (fname, fobj))

def makefieldsdict(dct, bases):
    # build the field list and sort it
    fields = list(getfields(dct))
    fields.sort()
    # undecorate the list and build a dict that will be returned later
    sorted_field_list = [field[1] for field in fields]
    field_dict = dict(sorted_field_list)
    # finds all nested instances and classes that are templates
    attribute_list = [field for field in sorted_field_list
                      if (isinstance(field[1],AbstractAttribute) or
                          (isclass(field[1]) and
                           issubclass(field[1],AbstractAttribute)
                          ))]
    # check base classes for attributes inherited but not overridden
    # !!WARNING: this code does not check correctly for multiple
    # base classes if there are name clashes between overridden
    # members. This is not recommended anyway.
    inherited = []
    for baseclass in bases:
        base_field_list = getattr(baseclass, '__fields__', None)
        # looks for a valid __fields__ attribute in an ancestor
        if isinstance(base_field_list, ListType):
            fnames = [f[0] for f in attribute_list]
            for fname, fobj in base_field_list:
                # checks for overridden attributes
                if (fname in fnames):
                    # overridden - inherited list contains the new value
                    newobj = field_dict[fname]
                    inherited.append((fname, newobj))
                    # remove attribute and quick check field names list
                    attribute_list.remove((fname, field_dict[fname]))
                    fnames.remove(fname)
                else:
                    # copy the original entry into the inherited list
                    inherited.append((fname, fobj))

    #---------------------------------------------------------------
    # IMPLEMENTATION NOTE
    # Templates have two private members named __fields__ and
    # __attr__. The former stores the ordered field definitions,
    # while the latter is a dict indexed by the alternative attribute
    # name. There are situations where each structure is more
    # convenient. The ideal situation would be to have an ordered
    # dict, but this is not the case right now...

    # stores the ordered field list in the new class directory
    all_fields = inherited + attribute_list
    field_dict['__fields__'] = all_fields

    # generates a dict indexed by the alternative attribute name; for
    # each key it stores a tuple containing the field name (used for
    # binding in the template) and the object definition
    attr_dict = {}
    for field in all_fields:
        fname, fobj = field
        if isinstance(fobj, AbstractAttribute):
            if not fobj.name:
                fobj.name = fname
            attr_dict[fobj.name] = field
    field_dict['__attr__'] = attr_dict
    return field_dict

#----------------------------------------------------------------------
# MetaTemplate metaclass
#
# Most of the hard work is done outside the class by the auxiliary
# functions makefieldsdict() and getfields()

class MetaTemplate(type):
    def __new__(cls, name, bases, dct):
        # works out the attribute ordering; keeps inherited order
        newdct = makefieldsdict(dct, bases)
        # creates the class using only the processed field list
        newclass = type.__new__(cls, name, bases, newdct)
        newclass._seqno = next_attribute_id()
        newclass.name = name
        return newclass

#----------------------------------------------------------------------
# GenericTemplate superclass

class GenericTemplate(AbstractAttribute):
    __metaclass__ = MetaTemplate

    def __init__(self):
        """ instantiates all nested classes upon creation """

        # builds a copy of the field list. this is needed to allow
        # customizations of the instance not to be reflected in the
        # original class field list.
        self.__fields__ = list(self.__class__.__fields__)

        # auto instantiates nested classes and attributes
        if debug_auto_instantiation:
            print "AutoInstantiation <%s>: fieldlist = %s" % \
                  (self.name, self.__fields__)
        for fname, fobj in self.__fields__:
            if isclass(fobj) and issubclass(fobj,GenericTemplate):
                # found a nested class
                if debug_auto_instantiation:
                    print "AutoInstantiation <%s>: field[%s] is a "\
                          "Container Subclass" % (self.name, fname)
                fobj = fobj()
                setattr(self, fname, fobj)
            elif isinstance(fobj, AbstractAttribute):
                # found an attribute instance
                if debug_auto_instantiation:
                    print "AutoInstantiation <%s>: field[%s] is an "\
                          "Attribute Instance" % (self.name, fname)
                #setattr(fobj, 'name', fname)
            else:
                if debug_auto_instantiation:
                    print "AutoInstantiation <%s>: field[%s] is "\
                          "unknown" % (self.name, fname)

    def iterall(cls, preorder=True, posorder=False, interface=None, _iterlevel=0):
        """
        Generic recursive iterator for nested templates

        This iterator handles all the recursion needed to navigate deeply
        nested structures. It 'flattens' the structure, returning a
        simple sequence of attributes that can be processed
        automatically by sequential code.

        This iterator was implemented originally for testing purposes.
        It's a class method, because some information can only be
        accessed through the class (example: alternative attribute names).

        preorder is a flag. If True, nested structures will be returned
        *before* descending into their component attributes.

        posorder is a flag. If True, nested structures will be returned
        *after* descending into their component attributes.

        If both preorder and posorder are true, nested structures will
        be returned both *before* and *after* their component attributes
        are returned.

        interface is a filter. It restricts the iteration to members
        that expose a particular interface, and it doesn't descend into
        nested classes that don't expose that interface.

        _iterlevel is a simple-minded watchdog. It was included for
        debugging during development, to stop infinite recursion
        in some weird cases, and will probably be removed from released
        code.
        """
        if not interface:
            interface = GenericTemplate
        if debug_iterator:
            if _iterlevel > 5:
                print "[4] Recursion limit exceeded"
                return
        if debug_iterator:
            print "[1] entry code:", cls.name
        if preorder:
            yield cls
        for fname, fobj in cls.__fields__:
            obj = getattr(cls, fname)
            if debug_iterator:
                print "[2] yield:", cls.name, fname, obj.name
            if isclass(obj) and issubclass(obj, interface):
                if debug_iterator:
                    print "[3] found nested class: ", obj, [x[0] for x in obj.__fields__]
                for member in obj.iterall(preorder=preorder, posorder=posorder,
                                          interface=interface, _iterlevel=_iterlevel+1):
                    yield member
            elif isinstance(obj, GenericAttribute):
                yield obj
        if posorder:
            yield cls
    iterall = classmethod(iterall)

    def iterfields(self):
        """Simple iterator: returns ordered fields"""
        for fname, fobj in self.__fields__:
            yield getattr(self, fname)

    def __repr__(self):
        return "<%s '%s'>" % (self.__class__.__name__, self.name,)
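The traversal performed by iterall() is a recursive generator with pre-order and post-order flags. A stripped-down Python 3 sketch of the same idea over plain nested classes, assuming a hypothetical `Section` marker class and `iter_tree()` function, and yielding names rather than template objects to keep the output readable:

```python
class Section:
    """Hypothetical marker base class for nested sections."""


def iter_tree(cls, preorder=True, posorder=False):
    """Flatten nested Section subclasses into a single sequence of names."""
    if preorder:
        yield cls.__name__
    for name, obj in vars(cls).items():
        if name.startswith('_'):
            continue  # skip __module__, __doc__ and similar entries
        if isinstance(obj, type) and issubclass(obj, Section):
            # descend into the nested section, passing the flags along
            yield from iter_tree(obj, preorder, posorder)
        else:
            yield name
    if posorder:
        yield cls.__name__


class Config(Section):
    class server(Section):
        socketPort = 8080
        threadPool = 10
    class session(Section):
        storageType = 'ram'


print(list(iter_tree(Config)))
# ['Config', 'server', 'socketPort', 'threadPool', 'session', 'storageType']
print(list(iter_tree(Config, preorder=False, posorder=True)))
# ['socketPort', 'threadPool', 'server', 'storageType', 'session', 'Config']
```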

Thursday, October 07, 2004

Publishing classes or instances?

Python is an excellent language for web development, as can easily be seen from the number of web frameworks available for it. Most frameworks implement an object publisher: a piece of software that finds the correct object and activates it upon request. Each object is 'published', or associated with some part of the site.

All object publishers (or at least all the ones I'm aware of) work with object instances. In this scheme, all HTTP requests to some part of the site are directed to an instance that handles them. Such requests are short-lived, and in many cases all the data is valid for a single request only. Multiple threads may be running, though; session data for each call is kept separate, and web frameworks provide some way to find the correct data for the session that made the request and for the thread that is serving it.

After pondering this for a while, it seems to me that a better way to design the system would be to publish object classes. On each request, the web framework would create a new instance of the class to handle that request. This technique is interesting because it clearly dedicates one instance to each request. It incurs the additional cost of instantiation, but this should not be too high, at least for low-volume, non-critical apps, which is the type of app I'm working on for the small business market.

I'm now running some tests of the concept using the upcoming CherryPy2 framework. I've already adapted the object publisher to look for classes, instantiate them, and dispatch requests to them; it was a fairly minor change. One potential advantage is that I can handle long-running persistent sessions using clever JavaScript hacks such as the ones used by GMail. In these apps, each client connection has a long-running component that makes data requests directly to the server (using plain HTML, XML or SOAP) in the background. This component is hidden in the browser in an invisible frame, and it does not have to be reloaded every time the page is refreshed. It's an interesting technique, already used by highly interactive websites.

Tuesday, October 05, 2004

Generic template classes in Python

It's been a while since this blog was last updated, and for a good reason. I've been working on code that allows complex data structures to be expressed as Python classes. The project started as an experiment with data-entry forms and is now reaching a quite usable state. This is controversial stuff; some people argue that it's only obfuscation, and that the same results can be achieved with simpler and more traditional approaches. I disagree, but unfortunately I'm still not able to explain why; for now, I just 'feel' that this is the correct approach.

At this point, I'm using the basic templating engine for HTML pages, data entry forms, and INI-style configuration files. The following snippet shows how to declare an INI-style configuration file using the templating system:

class CherryPyIni(IniFile):
    class server(IniSection):
        socketPort = TypedAttribute(8080)
        threadPool = TypedAttribute(10)
    class staticContent(IniSection):
        bitmaps = TypedAttribute('c:/work/sidercom/bitmaps')
    class session(IniSection):
        storageType = TypedAttribute('ram')

This code reads and writes the following ini file (with the help of the IniFile and IniSection classes, of course):

[server]
socketPort = 8080
threadPool = 10

[staticContent]
bitmaps = c:/work/sidercom/bitmaps

[session]
storageType=ram

The code above (in its full form) derives its behavior from the structure of the class declaration itself. For example, the class 'knows' the order in which the declarations must appear in the generated INI file, which is helpful, and better than having the entries written in arbitrary order, as would happen if a plain dictionary were used. The TypedAttribute also infers its type from the default value passed as a parameter.
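The ordering point can be checked in modern Python. Since Python 3.6, a class body's namespace preserves definition order (PEP 520), so the following sketch, with a hypothetical `write_ini()` helper and plain class attributes instead of TypedAttribute, writes sections and keys in declaration order straight from `vars()`. This relies on that later language feature, not on the counter-based mechanism the templates use:

```python
import io


def write_ini(config_cls, out):
    """Write each nested class as an [section], keys in declaration order."""
    first = True
    for section_name, section in vars(config_cls).items():
        if not isinstance(section, type):
            continue  # only nested classes are sections
        if not first:
            out.write('\n')  # blank line between sections
        first = False
        out.write('[%s]\n' % section_name)
        for key, value in vars(section).items():
            if not key.startswith('_'):
                out.write('%s = %s\n' % (key, value))


class CherryPyIni:
    class server:
        socketPort = 8080
        threadPool = 10
    class staticContent:
        bitmaps = 'c:/work/sidercom/bitmaps'
    class session:
        storageType = 'ram'


buf = io.StringIO()
write_ini(CherryPyIni, buf)
print(buf.getvalue())
```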

After a lot of work, and some dead ends, I've come to think of this as a generic templating mechanism. Classes declared this way act as templates that parse, process, or transform native Python data representations into other kinds of representation. Then I hit another stumbling block. But first, some definitions.

Template classes vs Template instances

Template classes are the definitions of the templates themselves. They can be used to build new template instances. As with normal classes and objects, the difference is important, but a few rules must be added to make templates conform to the restrictions imposed by Python's syntax and semantics:
  • Rule 1: Template classes may contain any number of nested template classes.
  • Rule 2: Template instances do not contain any nested template classes -- only nested template instances.

The transformation from (1) to (2) is done automatically when the main template class, also known as a container, is instantiated: its __init__ code recursively instantiates all nested classes inside the template class. This is necessary to avoid side effects that would occur as template instances modify their own member attributes; if one of these members is a class, the modification would be applied to the original class itself (because classes are mutable and shared by all instances), leading to strange and undesirable side effects. Let us recall the example above to make this point clear:

class CherryPyIni(IniFile):
    class server(IniSection):
        socketPort = TypedAttribute(8080)
        threadPool = TypedAttribute(10)
    class staticContent(IniSection):
        bitmaps = TypedAttribute('c:/work/sidercom/bitmaps')
    class session(IniSection):
        storageType = TypedAttribute('ram')

myini = CherryPyIni()

Upon instantiation, the CherryPyIni class will automatically instantiate its nested class members: server, staticContent and session. If they were not instantiated, then any changes done to the myini instance would in fact affect the class declaration. For example:

>>> myini.server.socketPort
8080
>>> myini.server.socketPort = 1234
>>> otherini = CherryPyIni()
>>> otherini.server.socketPort
1234

This is clearly not wanted. New template instances must always be created from a clean sheet, and modifications in one instance are not supposed to change all the others.
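The same isolation can be sketched without the metaclass. In this minimal Python 3 version (the `Template` base class is hypothetical), `__init__` looks up every nested `Template` subclass on the class and binds a fresh instance of it to the new object; nested instances do the same recursively, so assignments touch only the instance:

```python
class Template:
    """Hypothetical base: replaces nested Template classes with instances."""
    def __init__(self):
        for name in dir(type(self)):
            attr = getattr(type(self), name)
            if isinstance(attr, type) and issubclass(attr, Template):
                # each instance gets its own nested instance (recursively)
                setattr(self, name, attr())


class CherryPyIni(Template):
    class server(Template):
        socketPort = 8080
        threadPool = 10
    class session(Template):
        storageType = 'ram'


myini = CherryPyIni()
myini.server.socketPort = 1234
otherini = CherryPyIni()
print(otherini.server.socketPort)     # 8080: the class was not modified
print(CherryPyIni.server.socketPort)  # 8080
```

Without the loop in `__init__`, `myini.server` would be the nested class itself, and the assignment would rebind `CherryPyIni.server.socketPort` for every instance, which is the 1234 result shown above.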

Attributes

Not everything inside the template class is another nested template class. In the example, each nested class has a few TypedAttributes of its own. TypedAttributes are special: they store the default value and know the datatype that can be stored. This is needed when the INI file is read, to make sure that numeric parameters are automatically converted to the correct type. Attributes do not need automatic instantiation: they're implemented as data descriptors (similar to what other popular languages call properties), which means that they implement the __get__ and __set__ methods and can therefore intercept instance-level modifications done at runtime, storing the new value in the instance rather than in the class. Besides TypedAttributes, there are also GenericAttributes, which don't do any runtime type checking, conversion, or enforcement. Both types of attributes know the order in which they appear in the class declaration. This information is useful in several applications, even if only for documentation purposes.
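A minimal Python 3 sketch of such a data descriptor, modelled loosely on TypedAttribute. The class name `Typed` and the per-instance `_values` dict are assumptions of the sketch, and it uses the `__set_name__` hook (Python 3.6+) to learn its attribute name instead of having a metaclass assign it:

```python
class Typed:
    """Data descriptor: infers its type from the default, converts strings."""
    def __init__(self, default):
        self.default = default
        self.type = type(default)
        self.name = None

    def __set_name__(self, owner, name):  # called once, at class creation
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the descriptor itself
        return instance.__dict__.setdefault('_values', {}).get(self.name, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            if isinstance(value, str):
                value = self.type(value)  # e.g. '1234' -> 1234
            else:
                raise TypeError('Expected %s attribute' % self.type.__name__)
        instance.__dict__.setdefault('_values', {})[self.name] = value


class Server:
    socketPort = Typed(8080)


s = Server()
s.socketPort = '1234'       # a string, as read from a config file
print(s.socketPort + 1)     # 1235
print(Server().socketPort)  # 8080: other instances still see the default
```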

Advantages and applications

The system was designed to be very flexible and extensible. Simple attributes are stored as GenericAttributes, or TypedAttributes, depending on the situation. To make things even simpler, basic types such as plain strings or numbers are automatically converted to GenericAttributes (although the order information is lost in this case), which means that some cruft can be removed, making the code even more readable:

class CherryPyIni(IniFile):
    class server(IniSection):
        socketPort = 8080
        threadPool = 10
    class staticContent(IniSection):
        bitmaps = 'c:/work/sidercom/bitmaps'
    class session(IniSection):
        storageType = 'ram'

The version above works fine; the difference is that it does not enforce type checking during attribute access as the TypedAttribute does; also, there is no guarantee that the attributes inside each one of the nested classes (server, for example) will be listed in the correct order when the INI file gets written.

The stumbling block

But, as said above, there is a stumbling block. Not all attributes can be represented as simple attributes. In these cases, the use of nested classes is a requirement. This can lead to code that is hard to read. For example, this is a snippet of a form declaration:

class address(Panel):
    style = 'form-section'
    class address1(EditBox):
        caption = 'Address 1'
        size = 40
    class address2(EditBox):
        caption = 'Address 2'
        size = 40
    class city(EditBox):
        caption = 'City'
        size = 40

If the number of attributes inside the nested classes is large, code like the above can become really difficult to read, too long to fit on a page. Of course, one of the advantages of the class declaration system is that inheritance is your friend, and you can always refactor the definition into a sequence of shorter ones. Even so, after some testing, I settled on the following:

class address(Panel):
    style = 'form-section'
    address1 = EditBox(caption = 'Address 1', size = 40)
    address2 = EditBox(caption = 'Address 2', size = 40)
    city = EditBox(caption = 'City', size = 40)

It's much shorter, and quite clear. However, the EditBox function is a hack, and that's what is bothering me.

Why is EditBox a hack? Well, for all the reasons explained above, a nested class attribute has to be a class, and not an instance. Reading the code above, EditBox looks like a constructor call that returns an EditBox instance, but it is not. It's a function that builds a new class definition, using the parameters provided as default values for the new class, and returns that class.
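A class factory like this can be built with the three-argument form of `type()`. In the minimal Python 3 sketch below, the `EditBox` base class and the lowercase `edit_box()` factory are hypothetical stand-ins, not the author's form library: the keyword arguments become class attributes of a freshly derived class, and the caller gets a class back, not an instance.

```python
class EditBox:
    """Hypothetical base class for an edit box field."""
    caption = ''
    size = 20


def edit_box(**defaults):
    """Build and return a new EditBox subclass using the given defaults."""
    return type('EditBox', (EditBox,), dict(defaults))


class address:
    address1 = edit_box(caption='Address 1', size=40)
    city = edit_box(caption='City', size=40)


print(isinstance(address.city, type))           # True: a class, not an instance
print(address.city.caption, address.city.size)  # City 40
print(issubclass(address.city, EditBox))        # True
```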

I'm not entirely satisfied with this solution, but I'm still not able to replace it with a better, more generic approach. Right now, for every such class (EditBox, Button, etc.) I have to provide a 'class factory' function that builds a new class.
I'm pondering some alternatives:
  • The 'EditBox' class (and all related classes) could be more intelligent. If called from within a class declaration, the constructor would return a new, derived class. If called from within 'normal' code, it would return an instance.
  • Another solution is to replace nested instances in class declarations with classes. It's the opposite of the above, in a sense. As soon as the metaclass constructor for the container class is called, it searches through the attribute list. If it finds a nested instance that really should be a class, it builds a new class out of the instance and drops the instance afterwards.
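The second alternative can be sketched with the `__init_subclass__` hook (Python 3.6+), which runs each time a container subclass is created and plays the role of the metaclass constructor here. The names `Widget`, `Container` and the `kwargs` attribute are hypothetical; each widget instance just records its constructor arguments, and the hook replaces it with a derived class built from them:

```python
class Widget:
    """Hypothetical widget; instances record their constructor arguments."""
    caption = ''
    size = 20

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class Container:
    def __init_subclass__(cls, **kw):
        super().__init_subclass__(**kw)
        # copy the items first, because setattr() changes the namespace
        for name, value in list(vars(cls).items()):
            if isinstance(value, Widget):
                # turn the declared instance into a derived class
                new_cls = type(name, (type(value),), dict(value.kwargs))
                setattr(cls, name, new_cls)


class Address(Container):
    address1 = Widget(caption='Address 1', size=40)
    city = Widget(caption='City', size=40)


print(isinstance(Address.city, type))           # True
print(Address.city.caption, Address.city.size)  # City 40
```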

More stumbling blocks

Did I say that I've found one stumbling block? Well, it seems that stumbling blocks are more common than I had realized. Another interesting one is how to use a template class with conditionals that are evaluated at instantiation time. For example, let us assume that I have a form template that will be used in different situations. The template is the same, but depending on some parameters, the instance will be generated in a slightly different way. The following code snippet illustrates what I mean:

class MyForm(Form):
    ...
    if can_delete:
        bt_delete = Button(caption = 'delete', action = 'delete')

While it seems weird, the code above works. Its only problem is that the test is evaluated only once, when the class statement is first executed. So it's not possible to do something like this:

can_delete = True
form_with_delete = MyForm()
can_delete = False
form_without_delete = MyForm()
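The behaviour is easy to reproduce: a class body runs exactly once, when the class statement executes. The Python 3 sketch below shows that, plus one possible workaround that is not from the original text: wrapping the class statement in a hypothetical `make_form()` function, so the body and its conditional run again on every call, at the cost of producing a distinct class each time.

```python
can_delete = True


class MyForm:
    buttons = ['save']
    if can_delete:  # evaluated once, when the class is created
        buttons.append('delete')


can_delete = False
print(MyForm.buttons)  # ['save', 'delete']: the later flag change is ignored


def make_form(can_delete):
    """Re-run the class statement so the conditional is evaluated per call."""
    class MyForm:
        buttons = ['save'] + (['delete'] if can_delete else [])
    return MyForm


print(make_form(True).buttons)   # ['save', 'delete']
print(make_form(False).buttons)  # ['save']
```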

One way to make sure that the class declaration is re-run is to put it into a module and reload it; another way is to put it inside an 'exec' statement. But in both cases the hack defeats the purpose of using classes for this type of declaration. This is another issue that I'm not comfortable with at all.