Python Notes: September 2004

Sunday, September 19, 2004

Thoughts about GUI layout techniques

A remark on the Wing IDE wxPython integration page caught my attention today:

"A Caveat: Because Python lends itself so well to writing data-driven code, you may want to reconsider using a GUI builder for some tasks. In many cases, Python's introspection features make it possible to write generic GUI code that you can use to build user interfaces on the fly based on models of your data and your application. This can be much more efficient than using a GUI builder to craft individual menus and dialogs by hand. In general hand-coded GUIs also tend to be more maintainable."

There are two schools of thought regarding GUI layout: manual and automatic layout. Popular programming environments such as VB and Delphi helped to disseminate the notion that manual layout is the way to go. Automatic layout tools, on the other hand, suffered for a long time from the dull results they produced in most situations.

Enter the new generation of Web applications. The clever use of the CSS+HTML+Javascript combination is producing glitzy high performance interfaces that are at the same time efficient, cross platform, and highly usable. Compare that to the now "legacy" web applications -- the same ones that looked so good a couple of years ago; what lessons can we learn from them?

One striking lesson is the relative value of positioning. In layout, positioning is everything. This has led to a school of absolute, fine-tuned positioning, represented in tools such as Adobe Acrobat and, to a certain extent, in the aforementioned IDEs. But the reality of Web design has taught us something important. Due to the differences between web browsers, and also to the inherent limitations of CSS and HTML, it's impossible to bring high-precision absolute positioning to the web world. As a result, web designers had to resort to their creativity to achieve precise designs that are mostly based on relative positioning.

The web browser is, for the most part, an automatic layout tool. HTML and CSS have only a limited input here. True high-precision layout engines, such as Adobe Acrobat, could never be as successful, due to the very way the web is supposed to work. Relative positioning, as used in today's web design, teaches us a valuable lesson: it's possible to strike a compromise between beauty, elegance, and functionality in a web world through the conscious use of a limited set of positioning primitives. In a way, it's the rescue of the automatic layout engines.

That leaves us with the question: why not use the same set of techniques for native applications? That's what I'm studying right now -- how to design native interfaces with GUI libraries (such as wxPython) that use the newest lessons of web design to achieve a good-looking, flexible, and high-performance result. For it to work, we must go beyond the traditional approach for native frameworks -- absolute positioning, or at best very simple automatic layouts (vertical, horizontal, or grid) -- towards a vision of the native GUI renderer as a type of "document renderer" that is capable of distributing the visual content in a good layout, given a set of layout constraints based on the document reading flow. We must remember that we're targeting human users; we must design graphical interfaces that read right, and there's no better example of this than the web applications of today.
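To make the "document renderer" idea a bit more concrete, here is a minimal sketch of a flow layout in plain Python. It is not wxPython code and not how any particular toolkit does it; the function name flow_layout, the gap parameter, and the unit (think pixels) are all assumptions made up for illustration. Items are placed in reading order and wrap to a new row when they no longer fit, much as words do in a paragraph.

```python
def flow_layout(widths, max_width, gap=4):
    """Place items left to right, wrapping when the next one does not fit.

    widths: item widths in reading order.
    Returns a list of rows; each row is a list of (item index, x offset).
    """
    rows, current, x = [], [], 0
    for index, width in enumerate(widths):
        # Start a new row if this item would overflow the available width.
        # An item wider than max_width still gets a row of its own.
        if current and x + width > max_width:
            rows.append(current)
            current, x = [], 0
        current.append((index, x))
        x += width + gap
    if current:
        rows.append(current)
    return rows
```

Calling flow_layout with the same widths and a smaller max_width yields more rows with the items in the same order, which is the behavior a resizable window would need from such a renderer.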

On a related note, it's interesting how dated some popular PHP-based applications look today. Countless forums feature PHPforums-based discussions. The default design used by these forums is a tribute to the old style of Web design: cluttered, inefficient, with a bad use of white space that forces the user to scroll vertically through long pages to read all the posts. It's funny because a couple of years ago PHP-based websites were hot. This is not a problem with the language itself -- I'm sure good designs can be made with it -- but a sign of the speed at which web interface design has evolved recently.

Thursday, September 16, 2004

Languages, behavior, and frameworks

It's quite interesting when different topics fuse together so nicely. Over the last few days at c.l.py, and on this blog, I've been ranting about IDEs and frameworks for business applications. In an independent thread, I've also written a note about the seeming connection between programmer behavior and the choice of programming language. Simon Brunning posted a comment on this blog pointing me to a discussion on Charles Miller's weblog about the behavior of Java programmers. In an interesting twist, there's a comment that connects the Java mentality with an egotistical search for a revolutionary framework. The author argues that, for historical reasons, Java programmers had to come up with their own little pet frameworks in order to be able to do anything useful with the language. As a result, today everyone is an architect who loses precious time bragging about his design abilities.

I compare this situation with Python's, and see two different trends at work. My argument is that Python's inherent elegance has attracted a nice community that values respect and objective discussions. But it's clear that Python is lacking in the area of business application frameworks. Is it possible that the parallel development of a dozen different frameworks will change the character of the community? Although I agree with the argument for Java, I don't believe that the same will happen with Python. First, because there's already a strong and solid community at work. Better yet, it seems that most Python users (using the c.l.py population as a sample) have learned (maybe from previous experience with other languages and environments) the value of consensus. One way or another, I hope that the community will find a way to agree on a few principles of design for business applications; and then, with time, a recommendation, or even a PEP with stronger guidelines, or even a standard framework. Of course, this is a long road, and we know how diverse frameworks need to be given the different needs. But some common ground can be found, I'm sure, and I hope it will be found sooner than we expect.

Wednesday, September 15, 2004

Do we really need SQL?

It's been a few busy weeks since I returned to the world of business application development. This time, however, I'm having a chance to take a fresh look at it. Although I'm still too used to some tools to change my mind very quickly (IDEs, for example), there are a few things that I'm really keen to try out. One of them sounds like anathema -- do I really need a true RDBMS for my apps? (Note: I'm using the terms SQL and RDBMS rather interchangeably for all purposes of this discussion. I hope my readers will not find it offensive.)

A long time ago, I read an interesting paper criticizing the overuse of RDBMSs (I tried to Google it but I can't find it now). The author -- an old-time IBM researcher -- argues that RDBMSs are not needed after all, and that (almost) everything that is done with SQL can be done more efficiently with old-style flat files and batch processing. Well, it's hard to agree with his conclusion, but he pointed out several issues with relational databases and SQL that are worth thinking about. The main problem is that relational theory is purely a mathematical construction that overgeneralizes the problem at hand, and in the process imposes several layers of abstraction on things that can be done dirt cheap if you just do them the "ugly way". From this perspective, you can surely see that for lots of applications SQL is plain overkill, and one can live happily with a much simpler persistence model.

Then today, I read another article, about object prevalence. The argument is sound, and made me think (again) about the actual need for RDBMSs. It's now clear to me that there are few good reasons to deploy a full-fledged RDBMS for small apps, and even for relatively big apps. Storing data in memory sounds better than most people realize. Many apps never have more than a few thousand records, and storing everything in memory is a clear winner in terms of performance. Even big tables -- those with a few hundred thousand records -- can actually be stored in memory if well structured.
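As I understand it, the general idea behind object prevalence is to keep everything in memory, write each change to a log before applying it, and replay the log at startup. The following is only a minimal sketch under that assumption, not the scheme from the article: the class name PrevalentStore, the JSON-lines journal format, and the put/delete commands are all made up for illustration, and periodic snapshots (which a real system would want, to keep startup fast) are left out.

```python
import json
import os


class PrevalentStore:
    """Keep all records in a dict; journal every change so it can be replayed."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.records = {}
        # Rebuild the in-memory state by replaying the journal, if there is one.
        if os.path.exists(journal_path):
            with open(journal_path, encoding="utf-8") as journal:
                for line in journal:
                    self._apply(json.loads(line))

    def _apply(self, command):
        if command["op"] == "put":
            self.records[command["key"]] = command["value"]
        elif command["op"] == "delete":
            self.records.pop(command["key"], None)

    def execute(self, command):
        # Write the command to disk first, then change the in-memory state.
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(command) + "\n")
        self._apply(command)

    def put(self, key, value):
        self.execute({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self.execute({"op": "delete", "key": key})
```

Reads are plain dictionary lookups on store.records, which is where the performance argument comes from. Queries across records are ordinary Python loops rather than SQL.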

So why do we need to deploy RDBMSs today? There are still a few things that an RDBMS/SQL combo does well, and that are good reasons to keep the RDBMS backend.

Customers are used to it. Tell your customer that you're using a SQL database, and he'll feel fine about it. Tell him that you're using a proprietary high-performance scheme at absolutely zero extra cost and watch his reaction.
SQL makes report generation easier. Well, while this is not strictly true, it's a fact that there are countless report generators available, and that some of them will probably be able to connect to your database and retrieve data using SQL. This puts more power in the hands of the customers, and buys them peace of mind.
SQL is network enabled. It's easy to separate the SQL server from the application code. A custom memory-based persistence setup, by contrast, may need special design to take networking into account. On the other hand, if your app is not going to need to exchange raw data with anyone else, then your app server can do just fine by serving custom XML feeds to all its clients.

For now, I'm just wondering if the entire SQL stuff is really worth the price for some of my current assignments. I think that it's better to be conservative in this regard, so I'll probably keep the RDBMS backend -- for now. Future projects may be different...

The Python IDE Manifesto

I've been discussing the state of the Python IDEs at c.l.py over the past few days. Other threads have popped up on this very same subject since I resumed reading the mailing list a few weeks ago. To put things into perspective, IDEs are also a hot topic at one well-known magazine. It's hard to keep the discussion from becoming religious. I'll try my best to respectfully put my own opinion here. However, it's a manifesto. It's meant to be opinionated and passionate. Be warned.

Over the past weeks I've been looking for a good IDE to support my Python programming activities. My search has led me to literally dozens of products of varying degrees of quality. Many are broken. Some are incomplete. A few are working, but still missing some important features. After this long search, I'm still missing it somehow. I need a good, solid IDE.

What is an IDE? The acronym means "Integrated Development Environment". Each of the three words has a broad meaning of its own, and so does the resulting expression. I'm going to narrow it here, to put things as I understand them. For all purposes of this manifesto, an IDE is a well-integrated software suite that allows a developer to do all the tasks related to his programming activities in a productive environment. There's one more thing in my definition that sets an IDE apart from a code editor: a good IDE must provide a foundation for a RAD style of development, automating many of the tasks related to project management, and providing a high-level framework for application development.

Good IDEs are a productivity tool. Bad IDEs are distracting, and that's a good reason why many programmers swear by their combination of console-based editing and programming tools.

Good IDEs are simpler than the sum of their parts. They're supposed to remove complexity, not to add more of it.

Good IDEs work at all levels of abstraction, from project management to simple coding tasks. They make it easy to locate and work with all the files in a project. They make navigating the file system unnecessary for most maintenance tasks, removing a lot of clutter from view. At the other end, they support all the functionality you would expect from a code editor: syntax coloring, code completion (possible even in Python), class browsers, and regular expression support, to name a few.

Good IDEs support Rapid Application Development, and that means that they are able to automate much of the stuff that is needed to make a simple project take off. That's probably the main difference between code editors and professional IDEs -- at least, that's something all the top commercial IDEs support. It goes beyond the edit-compile-run cycle. The IDE must work closely with the application framework, automatically generating the parts of the code that are needed to integrate the pieces. This is essential for the visual part -- the form designer -- but by no means limited to it; it's also useful for writing modules, packages, or any other stuff that is going to be integrated with the rest of the project.

Up to this point, I've been talking about good IDEs. But there's something better. The perfect IDE supports more than one style of development. Some people do network programming. Some people do scientific programming. Others do business stuff, and that means database access. Each type of software imposes a different approach to the development cycle. For example, a good interactive shell is a must for scientific programming, but it's not nearly as important for business app development. The debugging needs vary considerably. Each style also imposes its own framework and supporting libraries, and the perfect IDE seamlessly supports them all with sound abstractions for each need.

Closing words

The perfect IDE is still a distant dream. But there's something that I need, and that I still haven't quite been able to find: a good IDE for business application development. There are some promising tools here -- Boa Constructor and PythonCard deserve special mention. But both fail to some degree, especially when it comes to the framework (or lack thereof), including the database support. And that's where the main weak spot of Python's library is today. Although a lot of tools are available, there's no integration, each one being designed and coded to solve some particular problem. It's a difficult problem to solve without a central vision. It's no wonder that the best IDEs available are all commercial offerings that include their own frameworks. It's not as if the open source community can't do it; it's probably because nobody with the required skills ever bothered to tackle it himself. But the open source community has already shown what it is capable of doing, and it may be only a question of time until it gets done.

Monday, September 13, 2004

Why do I like Python

I believe that there are some things that you just can't take apart from the whole -- they're an integral part of the package. Code beauty and good behavior are strangely connected.

Python is a wonderful programming language. It is, for lack of a better word, elegant. Programs written in Python tend to look good, and read nicely, even when the reader is a newbie. Much has been said about this before on the Python mailing list and in dozens of reviews. But that leads me to another reason: the wonderful community that participates on the mailing list.

For people used to other mailing lists or newsgroups, the Python list is a welcome exception. Apart from some occasional disputes, there are very few flame wars. The level of the conversation is polite. But it is not as if some political-correctness devil had hypnotized the entire community into a sleepy state of cynical behavior. On the contrary, there are discussions -- plenty of them, in fact. And newbies are well received and well regarded. Posts that on other groups would be met with a barrage of RTFM-like replies are normally answered with care and attention to detail. It's not uncommon to have short programs implemented by experts for the sole sake of helping someone who asked for it in the group. The authors certainly are busy people, but they do take some time to help newbies. And why is that so?

The answer again lies in my primary reason to like Python. It's beautiful. And the group happens to attract people who love to write beautiful code. So even a question from a newbie sparks some kind of healthy competitive mood, and you see experts discussing solutions and delivering code. At each round, someone points out a few details in someone else's implementation, and it gets refined to the point where it's nearing perfection: it's multiplatform, it can't be optimized any further, and, best of all, it's still readable. Everyone seems to enjoy it (apart from a few occasional rants).

I believe that there are some things that you just can't take apart from the whole -- they're an integral part of the package. Code beauty and good behavior are strangely connected. A language that not only allows, but encourages the writing of beautiful code will attract better programmers, because they care for their craft, and not in an egocentric way; they like to write good, and useful, code. What good is beauty, if it's not to be seen and used by other people? So they also care for other people, and it shows at c.l.py.

Friday, September 03, 2004

Python + UML design tools

Now that I'm trying to get up to speed with business app development in Python, I decided that I needed to take the full course. Up to now, I used to make a quick database prototype using MS Access to jumpstart the process. It's fine for small stuff, but not so for something that is going to be used professionally. So I'll be using UML from the start, and I'm going to select good Python-aware tools to help me with it. I'm trying to find tools that go beyond the drawing aspects. Writing class definitions in Python would be the best for me. Many of the tools available support only Java, though, so I may need to make a compromise.

I found some good references to share. Bruce Eckel wrote a UML Tool Survey (2003). A good guide on evaluating modeling tools is available at the Objects by Design website. From these starting points, I found a few free tools that I'm going to evaluate.

Dia is a generic diagramming tool written in C, with a Python scripting interface. It supports UML diagram editing. I don't have information about the quality of its UML design tools, or whether it's possible to extend it to work together with some of the tools that I'm working with. Since it can be scripted in Python, I expect to be able to do some of the stuff that I want.

<sigh> It seems that Dia on Win32 is dead, or sort of. I'm not in the mood to download and setup the entire mingw development environment just to check it out. What is most amazing is the reason given for the removal of Win32 binaries. It seems that someone decided that it was a good idea to threaten the contributor that was voluntarily compiling it on the basis that he wasn't distributing its source changes. It's just amazing.

Poseidon is a commercial UML design tool written in Java that has a free community version. From the description, it seems to be powerful enough for actual use. It's Java centered.

ArgoUML is an open source UML design tool that shares a common root with Poseidon, although the two projects have diverged since the split. It's also Java-based.

UMLStudio is a Windows-based tool that supports more languages than usual, but not Python. I've included it in this list because it was recommended elsewhere, and I want to check out its features and compare it to the others.

Jude is another Java-based UML design tool. (It seems that Java guys are the only ones that really use this stuff nowadays.) But it does have a free version to try out.

Wednesday, September 01, 2004

Python ORM tools

An object relational mapper, or ORM, is a piece of software that sits between the object model used by an application and the relational model used by a conventional RDBMS system. It's a useful piece of software, as it allows one to blend the best of both worlds. The application can be written in terms of objects, and the database can be managed using conventional tools.

A little bit of history

I have very little experience with real-world ORMs, but the concept itself is not new to me. I used to write business applications a long time ago (circa 1990), long before RDBMSs became popular in the PC world. In those times we used to rely on flat file formats and external index libraries, such as Turbo Power's, a set of Pascal libraries supporting B-Tree indexes and mergesort methods. When Turbo Pascal 5.0 (and shortly after, 5.5) arrived with the notion of objects, I wrote a small class library of my own to encapsulate records as objects. All records were descendants of TRecord, which declared a few abstract methods to load and save records to the database, and to manage the keys used in the index files. Its descendants had to declare real methods. Turbo Pascal didn't allow for the type of magic that dynamic languages such as Python do, so a lot of it was repetitive hand-crafted work. Even parts of the code that were seemingly similar in structure were hard to automate at that time.

Enter Python. As a dynamic language, Python makes it easy to write a lot of the glue code that is needed to translate between the OO and the RDBMS worlds. Python ORMs take advantage of this to allow for simpler use.

What is an ORM good for?

Well, now we know what an ORM is, and how we arrived here today. But why should one use an ORM? Well, implementing a business application using SQL can be a tedious task. Records in SQL are simple tuples of data. If you want to use this data in an object-oriented fashion, you have to convert this representation to an object, and vice-versa. SQL is also based on sets of data, which means that there is no such concept as working with individual records in SQL. Another issue is that any change to the object model used by the application implies equivalent changes both to the relational database model and to the code used in the adaptation layer. If this code is spread over the application, the process will not only be tedious, but also error prone. Due to the nature of SQL (and Python's, to a certain extent), most errors will only get caught at runtime, which is too late for most situations.
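The tedium is easy to show with a small sketch using the sqlite3 module from the standard library. The customer table, the Customer class, and the two helper functions below are made up for illustration. Note that the column list appears in three places (the class, the SELECT, and the UPDATE), and a mistake in any of them only shows up when the code runs.

```python
import sqlite3


class Customer:
    def __init__(self, id, name, city):
        self.id, self.name, self.city = id, name, city


def load_customer(conn, customer_id):
    # Convert the tuple returned by SQL into an object, by hand.
    row = conn.execute(
        "SELECT id, name, city FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return Customer(*row) if row else None


def save_customer(conn, customer):
    # Convert the object back into a tuple of parameters, by hand.
    conn.execute(
        "UPDATE customer SET name = ?, city = ? WHERE id = ?",
        (customer.name, customer.city, customer.id),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ann', 'Lisbon')")
```

Adding a single field, say a phone number, means editing the table definition, the class, and both functions, and nothing checks that the four stay in sync.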

An ORM can solve all these problems, and add a few niceties of its own. In the best case, it will allow for a true object persistence model. Good ORMs also allow changes in the object model to be automatically mapped to the relational one. The opposite is also true, albeit less useful, assuming that you will be writing object-oriented code. In the best scenario, the application code can be completely rid of any SQL reference, allowing the full use of an object-oriented paradigm for its development.

Challenges in ORM design

This short document doesn't intend to be a treatise on how ORMs are designed. However, some of the pitfalls are important and obvious enough to deserve special treatment.

The basic challenge in ORM design is how to map the OO and relational models seamlessly. Good ORM tools will run as transparently as possible, allowing the programmer to forget the intricacies of the relational model. The process may start with a representation of the object model, either graphical or textual, that can be compiled to generate the classes as needed. It's also possible to declare the object model in pure Python, counting on the ORM tool to generate and handle the glue code automagically.

To implement true object persistence, each object needs a unique identifier. Each ORM will use a different approach here. However, one issue that is common to all ORMs is how to handle multiple copies of the same object. For the sake of consistency, objects should be singletons: each database row should be represented by at most one object in memory at any time. If the ORM tool doesn't do it automatically, then it's up to the programmer to make sure that no multiple copies of the same object exist at the same time, avoiding concurrency and consistency issues.
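One common way to get this "one object per row" behavior is to keep a cache keyed by the row's identifier. The sketch below assumes that approach and is not taken from any of the tools reviewed here. The names Record and IdentityMap are hypothetical, and the loader argument stands in for whatever code fetches a row from the database.

```python
import weakref


class Record:
    def __init__(self, key, data):
        self.key = key
        self.data = data


class IdentityMap:
    """Hand out at most one live Record per key."""

    def __init__(self, loader):
        self._loader = loader  # callable: key -> dict of field values
        # Weak references let unused records be garbage collected.
        self._live = weakref.WeakValueDictionary()

    def get(self, key):
        record = self._live.get(key)
        if record is None:
            # First request for this key: load it and remember the object.
            record = Record(key, self._loader(key))
            self._live[key] = record
        return record
```

Two calls to get with the same key return the very same object while it is still referenced, so a change made through one reference is visible through the other, and the loader runs only once.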

Last, we have the age-old tradeoff: ease of use versus power. Simpler ORMs are lighter and generally easier to learn, but may fall short on the amount of automation allowed. More powerful ORMs are the opposite -- they fully support all expected functionality, but may be too complex or heavyweight for simpler applications. In any case, an ORM may impose an approach that is too strict to fit the programmer's mental model, so choosing the best one is as much a matter of personal preference as it is a matter of intrinsic quality.

Some Python ORM tools

The current state of ORM tools for Python is difficult to assess at first glance. There are many tools available, but some are old, not maintained, or not widely supported. I've collected a few links and my impressions as follows.

PDO

PDO is not a full-fledged ORM. It's rather a lightweight object-oriented layer for the DB-API. It adds a few helper methods and accommodates the slight differences between the available database drivers within a single framework, making the transition easier. However, it falls really short of a true ORM. There is no automatic mapping of the database rows to objects, and the database structure has to be created and managed manually. It would probably be better to treat PDO as part of the underlying DB-API, but that's out of the scope of this document.

SQLObject

SQLObject is a very pythonic approach to ORM. Objects are declared in Python using a very simple syntax, and a metaclass-based approach makes the translation between the objects and the database representation fully transparent. Each record in the database is represented by an object, generated on the fly according to the defined structure. Fields are automatically mapped to attributes. Joins and relationships -- including many-to-many -- are also implemented in a fairly straightforward way. Each object is uniquely associated with a row in the database, guaranteeing consistency. And better -- everything is done at run time, with no need for intermediate steps or external support tools.
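To give a feeling for what a metaclass-based declaration can look like, here is a toy sketch built on sqlite3. It is emphatically not SQLObject's API or implementation: Col, TableMeta, and Table are names made up for this example, there is no caching (so get returns a fresh object each time), and the shared in-memory connection is a shortcut.

```python
import sqlite3


class Col:
    """Column declaration; sql_type is passed straight into CREATE TABLE."""

    def __init__(self, sql_type="TEXT"):
        self.sql_type = sql_type


class TableMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Collect the Col declarations of this class, in definition order.
        cls._columns = {k: v for k, v in namespace.items() if isinstance(v, Col)}
        cls._table = name.lower()
        return cls


class Table(metaclass=TableMeta):
    conn = sqlite3.connect(":memory:")  # shared connection, for the sketch only

    @classmethod
    def create_table(cls):
        cols = ", ".join(f"{n} {c.sql_type}" for n, c in cls._columns.items())
        cls.conn.execute(f"CREATE TABLE {cls._table} (id INTEGER PRIMARY KEY, {cols})")

    def __init__(self, **values):
        # Creating an instance inserts a row; the integer key becomes self.id.
        names = list(self._columns)
        marks = ", ".join("?" for _ in names)
        cur = self.conn.execute(
            f"INSERT INTO {self._table} ({', '.join(names)}) VALUES ({marks})",
            [values[n] for n in names],
        )
        self.id = cur.lastrowid
        for n in names:
            setattr(self, n, values[n])

    @classmethod
    def get(cls, row_id):
        names = list(cls._columns)
        row = cls.conn.execute(
            f"SELECT {', '.join(names)} FROM {cls._table} WHERE id = ?", (row_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"no {cls._table} row with id {row_id}")
        obj = cls.__new__(cls)  # bypass __init__ so no new row is inserted
        obj.id = row_id
        for n, value in zip(names, row):
            setattr(obj, n, value)
        return obj


class Person(Table):
    name = Col("TEXT")
    age = Col("INTEGER")


Person.create_table()
ann = Person(name="Ann", age=30)
```

The point is the last few lines: the Person class doubles as the table definition, and no SQL appears in the application code. A real tool does far more, including the row-to-object uniqueness mentioned above, which this toy does not attempt.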

The approach used by SQLObject is very easy to understand and use in actual production. A few things are surprisingly easy; for instance, many-to-many relationships normally resort to an intermediate table to store the relations; SQLObject manages this intermediate table automatically, and the user may never need to know that it exists. Creating new tables is also a breeze. On the other hand, some of the more complex relational constructs, such as left joins, are not directly supported. The primary key is always an integer (actually, there are a few workarounds, but there are compelling reasons to use it as-is). There are also a few quirks; for example, being as pythonic as it is, one is tempted to declare inherited object types. In relational database design, the descendant is usually a table that implements only the additional fields and leaves the ancestor fields in the original table; a relationship is used to form the full record. But due to the way database entities are mapped to objects, the end result is not a true inheritance-based relational design; fields are duplicated on the descendant object, making inheritance unusable as a means of database modeling. Transaction support is also missing.

Summary. SQLObject is a fantastic tool for simple database applications. It solves a big part of the problem for small apps -- the overhead of managing the database itself through repetitive tasks. For complex ones, it still does a good job, but if you push its limits, then you have to start doing manual database coding, with little help from the framework.

Modeling

Modeling is an ambitious project that aims to write a port of Apple's Enterprise Objects Framework (abbreviated henceforth EOF) in pure Python. The author had extensive previous experience using the EOF, and decided to write a port after missing the convenience that a good ORM brings to programming. The EOF uses the Entity-Relationship Model [1], which provides the academic foundation for this type of work.

The framework can be broadly split in two parts. The design part is a relational database modeling tool that generates the object-oriented abstraction layer. The runtime code is an object-oriented framework that implements object persistence and consistency.

The design tool accepts models in either Python or XML format. The former allows the schema to be written in a conventional text editor. The latter works well with interactive tools that generate the entity description. One such tool, ZModelizationTool (for Zope), is provided as part of the package. The configuration allows the full specification of relationships and joins. Solid knowledge of relational database theory is required to take full advantage of the design tool. Besides that, default values are handled nicely, making global changes relatively easy. The following database adapters are supported, and provided as part of the package: MySQL, Oracle, PostgreSQL and SQLite.

Once processed, the schema is converted into a package of Python classes. Each entity is mapped to a class that inherits its behavior from a common CustomObject ancestor. This hierarchy allows hooks for actions such as data validation. But the best part of the framework is the EditingContext class. It creates a graph of all object instances in memory, guaranteeing consistency. Each row in the underlying dataset is uniquely mapped to an object. All relationships are also mapped into the EditingContext. One can think of it as an object-oriented database state map -- in a sense, the equivalent, in object-oriented terms, of a standard database connection.

Summary. Modeling is a well-structured tool, based on solid and well-tested concepts. The documentation of this project is well structured and professional looking, but still incomplete in parts, and the diagrams still need some polish. Although ambitious, the project is focused and well managed. It's a powerful tool, and some careful code tuning is needed to take full advantage of its power. A 1.0 release will be most welcome.

Conclusion

Choosing an ORM is an integral part of the design choices that have to be made at the start of the development cycle. In a sense, it should be easier to change the back-end database than to change ORMs -- assuming all databases support the same API (which is not actually true, but it's a good approximation for this purpose).

SQLObject is by far the easiest and most pythonic of all the approaches we evaluated so far. Its power comes from the fact that it makes routine tasks very simple -- in some cases, fully automatic and transparent. Its programming model is really easy and non-intrusive. Its shortcomings should not be evident in most applications. Modeling, on the other hand, is more like a full-fledged framework for database applications. The EditingContext concept is great, and would in fact be a more than welcome addition to the SQLObject model. For my own applications, SQLObject seems to be better suited -- it is simpler, easier to learn, and cleaner than Modeling. But I'm sure that Modeling has its place for more complex apps.
