A simple protocol for data synchronization between objects
What follows can't yet be considered a definitive version of this idea. It's still a prototype of an idea that I have been working on in a rather fuzzy way, to solve a practical problem that arose as I started to use templates and a declarative approach to model business entities. Consider it an attempt to write it down to clarify things for myself; if you find it useful, or at least amusing, please let me know.
As a business application grows, there's an increasing need to manage internal data communication between different objects. The temptation is to share live data, in such a way that all applicable objects are automatically notified of any change. However, this approach raises a number of issues with concurrency, threading, and security in general. It's a very complex and demanding problem.
A different approach is to share static copies of data using a simple synchronization protocol. Each object stores its own copy of the data, and at certain points in the system the data is explicitly copied between objects. This approach certainly isn't as charming as having live links; however, it leads to a much simpler and more predictable design. It's one case where practicality beats purity.
While designing my new applications, I gave a lot of thought to simple templating objects. The current implementation allows nested templates to be declared, and it's useful to model a variety of entities, ranging from simple data records to data entry forms, reports, and validation procedures. These templates can be thought of as different views of the same data; each one has its own internal structure, appropriate to the problem at hand. For example, simple data records tend to be flat models, while other structures, such as forms or reports, are usually mapped to nested templates. Nesting is used in this case to replicate the structure of the entities being modelled. A similar declarative mechanism is also used by SQLObject to declare database entities.
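To make the declarative style concrete, here is a minimal sketch of how a template base class could collect the fields declared in a class body. It assumes Python 3.6+ (`__init_subclass__` and ordered class namespaces); `DeclarativeTemplate`, `_fields`, `AdminData` and `permissions` are hypothetical names used for illustration, not the actual DataTemplate implementation.

```python
import inspect


def _is_field(name, value):
    # Public attributes only; methods and other descriptors are not data.
    return not name.startswith('_') and not (
        inspect.isfunction(value)
        or isinstance(value, (staticmethod, classmethod, property)))


class DeclarativeTemplate:
    """Hypothetical base class that records the fields declared in a class body."""

    _fields = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        fields = {}
        # Start with the fields inherited from template base classes.
        for base in reversed(cls.__mro__[1:]):
            fields.update(getattr(base, '_fields', {}))
        # Then add this class body's fields in declaration order
        # (class namespaces preserve definition order since Python 3.6).
        for name, value in cls.__dict__.items():
            if _is_field(name, value):
                fields[name] = value
        cls._fields = fields


class UserData(DeclarativeTemplate):
    username = ''
    password = ''
    name = ''
    description = ''


class AdminData(UserData):
    # A nested template class is recorded as a field of its parent.
    class permissions(DeclarativeTemplate):
        level = 0


print(list(UserData._fields))   # ['username', 'password', 'name', 'description']
print(list(AdminData._fields))  # ['username', 'password', 'name', 'description', 'permissions']
```

Because nested classes are kept as fields, a nested template can later be walked recursively to reach its inner members.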
At this point, the question is: how can I synchronize the state of data between objects that share the same attributes but have different structures? For example, let's assume that I have these three templates: one for the basic data (mirroring a database entity, in the declarative style used by SQLObject), another for a data entry form, and the last one for reports.
```python
class UserData(DataTemplate):
    username = ''
    password = ''
    name = ''
    description = ''

class UserForm(DataTemplate):
    title = 'User Data'
    class basicinfo(Panel):
        username = EditBox(caption='Username', size=15)
        password = EditBox(caption='Password', size=10, password=True)
        password2 = EditBox(caption='(repeat)', size=10, password=True)
    class personalinfo(Panel):
        name = EditBox(caption='Full name', size=40)
        description = EditBox(caption='Description', size=80)

class UserReport(ReportTemplate):
    class header(ReportHeader):
        title = 'User Report'
    class body(ReportBody):
        username = Column(title='Username', width=15)
        name = Column(title='Full name', width=40)
        description = Column(title='Description', width=40)
```
These templates are really simple, and provide a good showcase for the data synchronization protocol. Each one has a different structure that reflects the actual requirements of each object. However, all templates refer to the same basic data, which is declared in the UserData template.
One solution to the problem is to provide the UserForm and UserReport templates with hooks to receive data from the UserData template. However, in a dynamic language such as Python this is not really necessary, and in fact it takes some flexibility away from the result. It's much easier to use a common protocol, so that all objects that use a similar template structure can talk to each other.
The protocol involves two elements:
- The intermediate data record, a simple dict-like object that stores a flat representation of the data. Any object that exposes a simple mapping interface can be used.
- The getdata and setdata methods, which allow templates to retrieve and set their internal data, respectively, using an intermediate data record.
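A minimal sketch of the protocol for flat templates, assuming that getdata fills a record passed in by the caller and setdata reads from one (the exact signatures are an assumption). `Synchronizable`, `FlatTemplate`, `synchronize` and `UserSummary` are hypothetical names; a plain dict serves as the intermediate record.

```python
from typing import Any, Mapping, MutableMapping, Optional, Protocol, runtime_checkable


@runtime_checkable
class Synchronizable(Protocol):
    """Anything that can copy its data to and from a flat record."""

    def getdata(self, record: MutableMapping[str, Any]) -> None: ...

    def setdata(self, record: Mapping[str, Any]) -> None: ...


class FlatTemplate:
    """Hypothetical flat template: each field is a plain attribute."""

    fields: tuple = ()  # names of the declared fields, set by each subclass

    def __init__(self, record: Optional[Mapping[str, Any]] = None):
        if record is not None:
            self.setdata(record)

    def getdata(self, record: MutableMapping[str, Any]) -> None:
        # Export every declared field into the intermediate record.
        for name in self.fields:
            record[name] = getattr(self, name)

    def setdata(self, record: Mapping[str, Any]) -> None:
        # Import only the keys this template declares; ignore the rest.
        for name in self.fields:
            if name in record:
                setattr(self, name, record[name])


def synchronize(source: Synchronizable, target: Synchronizable) -> None:
    """Explicitly copy a static snapshot of the data from source to target."""
    record: dict = {}
    source.getdata(record)
    target.setdata(record)


class UserData(FlatTemplate):
    fields = ('username', 'password', 'name', 'description')
    username = password = name = description = ''


class UserSummary(FlatTemplate):
    fields = ('username', 'name')
    username = name = ''


user = UserData({'username': 'admin', 'name': 'admin user', 'unknown': 1})
summary = UserSummary()
synchronize(user, summary)
print(summary.username, summary.name)  # admin admin user
user.name = 'changed later'
print(summary.name)  # admin user (a copy, not a live link)
```

Since `synchronize` copies a snapshot, later changes to the source only reach the target at the next explicit copy, which is the static-copy behaviour the protocol is built around.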
One application case is as follows:
```python
# create a dict with initialization data
adminuser = {'username': 'admin', 'name': 'admin user'}
# create a new 'generic' user record
user = UserData(adminuser)
# create a form initialized with the same data
form1 = UserForm(adminuser)
# UserData itself also exposes the mapping interface
form2 = UserForm(user)
# the report receives an iterable that yields objects with a compatible
# mapping interface; dbUser is a SQLObject class with appropriately named
# columns (assuming SQLObject is patched to implement the mapping interface)
report = UserReport(dbUser.select())
report.generateReport()
```
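For the report, the only requirement is that each row answer `row[name]` for the declared column names, so plain dicts can stand in for mapping-enabled SQLObject rows. This sketch assumes a hypothetical flat `SimpleReport` base with a `columns` list of (field, title, width) tuples instead of the nested header/body classes used above; only the `generateReport` name comes from the example.

```python
from typing import Any, Iterable, List, Mapping, Tuple


class SimpleReport:
    """Hypothetical flat report that renders any iterable of mapping-like rows."""

    title: str = ''
    columns: List[Tuple[str, str, int]] = []  # (field name, column title, width)

    def __init__(self, rows: Iterable[Mapping[str, Any]]):
        # Rows are consumed lazily, so a cursor or generator works too.
        self.rows = rows

    def _line(self, cells: Iterable[str]) -> str:
        # Truncate and pad each cell to its column width.
        return ' '.join(
            text[:width].ljust(width)
            for text, (_, _, width) in zip(cells, self.columns)
        ).rstrip()

    def generateReport(self) -> str:
        lines = [self.title, self._line(title for _, title, _ in self.columns)]
        for row in self.rows:
            # Only __getitem__ is needed: row[name] for each declared column.
            lines.append(self._line(str(row[name]) for name, _, _ in self.columns))
        return '\n'.join(lines)


class UserReport(SimpleReport):
    title = 'User Report'
    columns = [('username', 'Username', 15), ('name', 'Full name', 40)]


# Plain dicts stand in for database rows that implement the mapping interface.
rows = [{'username': 'admin', 'name': 'admin user', 'description': 'x'},
        {'username': 'guest', 'name': 'guest user'}]
print(UserReport(rows).generateReport())
```

Extra keys in a row (such as `description` above) are simply ignored, which is what lets the same record feed templates with different fields.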
All communication is done internally using the getdata and setdata methods. For nested templates (such as UserForm and UserReport), the representation is automatically flattened, allowing data to be assigned directly to the inner members of the template.
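A sketch of the flattening step for nested templates, assuming hypothetical `Field` and `Container` classes that stand in for EditBox/Column and Panel. setdata walks the nested members depth-first and routes each flat key to the leaf that declares it; getdata does the reverse.

```python
from typing import Any, Iterator, Mapping, MutableMapping, Optional, Tuple


class Field:
    """Hypothetical stand-in for a leaf widget such as EditBox or Column."""

    def __init__(self, caption: str = '', value: Any = ''):
        self.caption = caption
        self.value = value


class Container:
    """Hypothetical template node holding Fields and nested Container classes."""

    def __init__(self, record: Optional[Mapping[str, Any]] = None):
        # Give each instance its own copies of the declared members.
        for name, member in vars(type(self)).items():
            if isinstance(member, type) and issubclass(member, Container):
                setattr(self, name, member())
            elif isinstance(member, Field):
                setattr(self, name, Field(member.caption, member.value))
        if record is not None:
            self.setdata(record)

    def leaves(self) -> Iterator[Tuple[str, Field]]:
        # Depth-first walk yielding the flat name of every Field.
        for name, member in vars(self).items():
            if isinstance(member, Field):
                yield name, member
            elif isinstance(member, Container):
                yield from member.leaves()

    def getdata(self, record: MutableMapping[str, Any]) -> None:
        for name, field in self.leaves():
            record[name] = field.value

    def setdata(self, record: Mapping[str, Any]) -> None:
        # Flat keys are routed to whichever nested member declares them.
        for name, field in self.leaves():
            if name in record:
                field.value = record[name]


class UserForm(Container):
    class basicinfo(Container):
        username = Field('Username')
        password = Field('Password')

    class personalinfo(Container):
        name = Field('Full name')
        description = Field('Description')


form = UserForm({'username': 'admin', 'name': 'admin user'})
print(form.personalinfo.name.value)  # admin user
record = {}
form.getdata(record)
print(record)
# {'username': 'admin', 'password': '', 'name': 'admin user', 'description': ''}
```

If two leaves shared a name, getdata would keep only the last value, which is why the flattened representation needs unique names.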
The intermediate data record can optionally offer much richer functionality. It can check types, or perform limited validation of the data (possibly just a sanity check). Dynamic links between templates can be supported by using the observer pattern in the data record implementation.
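One possible shape for a richer intermediate record, assuming a hypothetical `ObservableRecord` built on `collections.abc.MutableMapping`, which supplies `update`, `get` and the other mapping methods once the five abstract ones are defined. It rejects values of the wrong declared type and notifies subscribed callbacks after each change, a simple form of the observer pattern.

```python
from collections.abc import MutableMapping
from typing import Any, Callable, Dict, Iterator, List, Optional


class ObservableRecord(MutableMapping):
    """Hypothetical intermediate record with type checks and change notification."""

    def __init__(self, types: Optional[Dict[str, type]] = None):
        self._data: Dict[str, Any] = {}
        self._types = dict(types or {})
        self._observers: List[Callable[[str, Any], None]] = []

    def subscribe(self, callback: Callable[[str, Any], None]) -> None:
        """Register a callback(key, value) invoked after every successful change."""
        self._observers.append(callback)

    def __setitem__(self, key: str, value: Any) -> None:
        # Sanity check only: reject values of the wrong type, if one is declared.
        expected = self._types.get(key)
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f'{key!r} must be {expected.__name__}, '
                            f'got {type(value).__name__}')
        self._data[key] = value
        for callback in self._observers:
            callback(key, value)

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


changes = []
record = ObservableRecord(types={'username': str})
record.subscribe(lambda key, value: changes.append((key, value)))
record.update({'username': 'admin', 'name': 'admin user'})
print(changes)  # [('username', 'admin'), ('name', 'admin user')]
try:
    record['username'] = 42
except TypeError as exc:
    print(exc)  # 'username' must be str, got int
```

Callbacks run synchronously in whichever thread performs the assignment; this keeps the sketch simple but does not address the concurrency issues raised by live data sharing.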
The limitations of this design are simple and relatively obvious. First of all, all names must be unique, which may be a problem for some structures. INI files are one such example: they are usually nested, yet they may be the primary source of data. One possible alternative is to allow the ids used for complex structures to be specified explicitly; the flattened version should still have unique names for all its members.
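A sketch of the explicit-id idea applied to INI files, using the standard library `configparser`. The hypothetical `flatten_ini` helper maps each option to its bare name unless an `(section, option)` pair is given an explicit id, and refuses to produce duplicate names.

```python
import configparser
from typing import Dict, Mapping, Optional, Tuple


def flatten_ini(text: str,
                ids: Optional[Mapping[Tuple[str, str], str]] = None) -> Dict[str, str]:
    """Flatten an INI document into a record whose names must be unique.

    ids optionally maps (section, option) pairs to explicit flat names.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    ids = ids or {}
    record: Dict[str, str] = {}
    for section in parser.sections():
        for key, value in parser.items(section, raw=True):
            flat_name = ids.get((section, key), key)
            if flat_name in record:
                raise ValueError(f'duplicate name {flat_name!r} in section [{section}]')
            record[flat_name] = value
    return record


INI = """
[database]
name = users.db

[user]
username = admin
name = admin user
"""

try:
    flatten_ini(INI)
except ValueError as exc:
    print(exc)  # duplicate name 'name' in section [user]

print(flatten_ini(INI, ids={('database', 'name'): 'dbname'}))
# {'dbname': 'users.db', 'username': 'admin', 'name': 'admin user'}
```

ConfigParser lowercases option names by default, so the option names used as keys in `ids` must be in lowercase.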
Another interesting design issue is that of nested data records. Data records are flattened to simplify the design, but in many cases there is a one-to-many relationship that needs to be mapped; for example, a form with subitems, or a report with nested subgroups. I think that it's possible to extend the design to handle this situation while still keeping it simple and clean.