Python Notes

Wednesday, December 15, 2004

XML is not the question

A quote from the Python Is Not Java article (great reading, btw):
XML is not the answer. It is not even the question. To paraphrase Jamie Zawinski on regular expressions, "Some people, when confronted with a problem, think “I know, I’ll use XML.” Now they have two problems."

Flow Based Programming & the new Python coding style

Since their introduction into the language, generators (and their lesser cousins, list comprehensions) are finally making it into the community mindset. In the long term, I think that this change will have a lasting effect on the way we write Python programs.

In the beginning, Python was a conventional scripting language that happened to have an exceptionally clean syntax and a solid object model. So most Python programs resembled conventional sequential scripts. As the language evolved, the applications grew more complex, but their essence was still the same. The network-enabled modules pushed the envelope early on. Two architectures to handle multiple simultaneous requests were implemented, following (again) the conventional wisdom: async modules (based on the standard POSIX select() call) and threaded execution.

Stackless Python was an interesting event in Python's history. It's still one of the most amazing pieces of Python-related software. I personally believe that the most important effect of Stackless was to break out of the paradigm box. Once Stackless became available, people could see other ways of doing things. It was a very strong driving force, even if only as a reference for what could be done once the conventional restrictions of the sequential processing model were lifted.

List comprehensions were added to the language first. The iterator protocol followed, and soon after, generators were introduced. People started using them, slowly at first. List comprehensions in particular have led to a number of calls for help from people trying to grasp their syntax. But now, more and more systems rely on these new-style constructs to implement complex programming patterns. While checking libraries, I often see list comprehensions used in places where loops would have been used just a couple of years ago. More recently, the Web SIG defined a generator-based interface in the WSGI spec that will allow async-style calls between the web server and the application engine. And new modules and projects seem to be getting to grips with the async nature of generators, and are using them in highly interesting and innovative ways.
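To make the shift concrete, here is a minimal sketch of the same small task written three ways: as an explicit loop, as a list comprehension, and as a generator. The function names and the task itself (squaring the even numbers) are made up for illustration only.

```python
def even_squares_loop(numbers):
    """Old style: build the result list step by step."""
    result = []
    for n in numbers:
        if n % 2 == 0:
            result.append(n * n)
    return result


def even_squares_listcomp(numbers):
    """Same logic as a list comprehension; the whole list is built at once."""
    return [n * n for n in numbers if n % 2 == 0]


def even_squares_gen(numbers):
    """Same logic as a generator; values are produced one at a time, on demand."""
    for n in numbers:
        if n % 2 == 0:
            yield n * n


# All three produce the same values; only the generator is lazy,
# so it has to be consumed (here with list()) to see them.
as_loop = even_squares_loop(range(6))
as_listcomp = even_squares_listcomp(range(6))
as_gen = list(even_squares_gen(range(6)))
```

The generator version is the one that matters for the rest of this discussion: because it hands out one value at a time, it can be chained with other generators without building intermediate lists.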

After following these changes over the past few years, I was surprised to be a late "rediscoverer" of Flow Based Programming. Just a quick reading on the topic shows how much can be done in terms of application modelling with Python. There are strong parallels between FBP and other paradigms, such as functional programming; a good discussion can be found on the C2 Wiki (Flow Based Programming). The basic premise is that business applications are data-driven by nature, and thus poorly suited to the strictly sequential Von Neumann model. This may sound like old news today, but it's interesting to note that this stuff was developed over 30 years ago, at a time when concepts like Object Oriented Programming were still an academic novelty. Critical applications written using FBP are still in use today, which speaks to its suitability for extremely demanding tasks.

I personally believe that right now, more and more people are "rediscovering" how to think and write data-centric code in Python. The newest features (especially generator expressions) will encourage this style of programming. This will lead to a change as the Python community incorporates this data-centric paradigm shift into new applications. The trend is already established; it's one more great application for Python, with a huge potential for success.
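As a rough illustration of what data-centric code in this style might look like, the sketch below chains a few generators into a small pipeline, with each stage consuming the stream produced by the previous one. This is only an approximation of the flow-based idea, not a real FBP implementation; the order data, the function names and the "customer,amount" line format are all assumptions made up for the example.

```python
def parse_orders(lines):
    """Stage 1: turn 'customer,amount' text lines into (customer, amount) tuples."""
    for line in lines:
        customer, amount = line.strip().split(",")
        yield customer, float(amount)


def large_orders(orders, minimum):
    """Stage 2: let through only the orders at or above a minimum amount."""
    for customer, amount in orders:
        if amount >= minimum:
            yield customer, amount


def total_by_customer(orders):
    """Final stage: consume the stream and accumulate a total per customer."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


# Hypothetical input; in a real system this could be a file or a socket.
lines = ["alice,120.0", "bob,15.5", "alice,80.0", "carol,200.0"]

# The stages are wired together like components connected by data streams.
# Nothing is processed until the final stage starts pulling records.
totals = total_by_customer(large_orders(parse_orders(lines), minimum=50.0))
```

Each stage knows only about the records flowing through it, so stages can be reordered, replaced or tested in isolation.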

Tuesday, December 14, 2004

Forget the GUI (long live the GUI!)

There was a time, not so long ago, when business programming basically meant processing data. The visual interface options were really limited, so much more time was spent on the business logic than on the UI part of the system. Now times have changed for the better, and GUIs are the norm; in fact, they are so commonplace now as to make one wonder how things were done before. For many programmers, prototyping means baking a visually decent, but barely functional sketch of the software that they hope to deliver. It's amazing how many of those so-called "prototypes" end up being used as the final application, with a lot of shortcomings.

All of this is to present a proposal that may sound radical, but really isn't. Forget the GUI. I mean: forget it while writing the prototype of the software.

This idea may seem silly, or even counter-intuitive, due to several factors:
  • Many people today equate programming with making something visible. Managers, especially.
  • Rapid development languages are focused on visual development. This makes for a great temptation, especially for novices.
  • Due to the combination of the aforementioned reasons, it's pretty easy to fake development using visual development tools. In the process the programmer usually falls into a trap of their own making, having to maintain the code that was supposed to be a prototype as the real product, and thus never having the time to finish the product.

So what's the revolutionary "forget the GUI" idea? It's basically about a change in the approach to the initial phases of development. Instead of focusing efforts on the GUI part (which is really a mistake, as any serious system analyst will note), focus the effort on the prototyping of the business objects. All the development in this phase is done using a combination of two techniques: Test Driven Development and batch processing, the latter at least 40 years old.

Test Driven Development (see also TDD on C2 Wiki for a discussion on the topic) is a relatively new buzzword. It's one of the techniques associated with Extreme Programming, and it has received a fair share of discussion, praise and criticism over the past few years. On the other hand, batch processing is one of the old-style programming techniques that has a decidedly bad reputation attached to it. But if applied correctly, it's still highly useful. In fact, all those scripts that process huge log files are nothing more than batch processes in disguise (just don't tell managers about it).

The combination works like this: in the initial phases of system development, focus on the prototyping of the business objects. Model real processes as they happen; write objects, fill them with data by hand (by assigning values directly to their properties), and implement simple routines that simulate the process. Dynamic languages like Python are wonderfully well suited to this approach to coding. Using a TDD approach, write the test cases based on this input. Determine what should happen for each case, and make a test case: fill objects with data, call a processing method, check the resulting data.
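A minimal sketch of that cycle, using the standard unittest module, could look like the following. The Invoice class, its attributes and the tax rule are hypothetical and exist only to show the "fill objects with data, call a processing method, check the resulting data" pattern.

```python
import unittest


class Invoice(object):
    """Hypothetical business object, filled by assigning to its attributes."""

    def __init__(self):
        self.lines = []      # list of (description, quantity, unit_price)
        self.tax_rate = 0.0  # e.g. 0.10 for 10%
        self.total = None    # set by process()

    def process(self):
        """Simple routine that simulates the real billing process."""
        subtotal = sum(quantity * price for _, quantity, price in self.lines)
        self.total = round(subtotal * (1 + self.tax_rate), 2)


class InvoiceTest(unittest.TestCase):
    def test_total_includes_tax(self):
        # 1. Fill the object with data by hand.
        invoice = Invoice()
        invoice.lines = [("widget", 2, 10.0), ("gadget", 1, 5.0)]
        invoice.tax_rate = 0.10
        # 2. Call the processing method.
        invoice.process()
        # 3. Check the resulting data.
        self.assertEqual(invoice.total, 27.5)

    def test_empty_invoice_totals_zero(self):
        invoice = Invoice()
        invoice.process()
        self.assertEqual(invoice.total, 0.0)
```

Saved in a file, the tests can be run with `python -m unittest` followed by the module name. No form, widget or database is involved at this stage; the test data is the only input the prototype needs.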

All the data that you have at this point will be coded in the test scripts, which act in a sense as a simple batch processing system. Every time you run the test system, you are in fact running a batch that is the prototype for the entire application. Of course, tests sometimes need to push the system to its limits, or even feed bad data to the system. But you can separate stress tests from the actual system simulation objects, so you have a valid database after you finish processing the data.

This approach has many advantages. True, it does not give the project the immediate visibility that a GUI prototype gives. But it compensates for this with a greatly simplified development environment, where the programmer can focus on the business rules. One of the most time-consuming parts of visual development is keeping the visual part of the program (data entry forms, for example) in sync with the business rules. If you change the schema, you have to change the forms; in some cases, the change propagates to several modules. To make things worse, a simple "find & replace" is not very helpful at this point. By writing a "batch style" prototype, the programmer can bypass this step, and write the GUI only after the schema is already stable and subject only to smaller changes.

Another advantage of this approach is modularization. In general, conventional GUI programs tend to be monolithic, partially due to the very development model that the RAD tool emphasizes. Writing the batch processing scripts helps to isolate the underlying system API from the visual presentation of data. The system becomes easier to test, and also easier to automate with user-level scripts.
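The separation can be sketched in a few lines. In the example below, the business rule lives in a plain function that knows nothing about presentation, and a thin formatting function stands in for whatever front end comes later. The account names, the transfer rule and the InsufficientFunds exception are all invented for the illustration.

```python
class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""


def transfer(accounts, source, target, amount):
    """Business rule: move money between accounts. Knows nothing about any UI."""
    if accounts[source] < amount:
        raise InsufficientFunds(source)
    accounts[source] -= amount
    accounts[target] += amount


def format_balances(accounts):
    """Presentation: one possible text rendering of the same data.
    A GUI form written later would call the same transfer() API."""
    return ["%s: %.2f" % (name, balance)
            for name, balance in sorted(accounts.items())]


# A user-level script (or a test, or a batch job) drives the API directly.
accounts = {"checking": 100.0, "savings": 50.0}
transfer(accounts, "checking", "savings", 30.0)
report = format_balances(accounts)
```

Because transfer() takes and modifies plain data, the same call works unchanged from a test case, a batch script or an event handler in the final GUI.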

So the conclusion is: Forget the GUI, long live the GUI. Because in the end, the application will have a GUI. But it will be built on a clean, well-modularized foundation that is testable and easily scriptable. That more than offsets any disadvantage that the lack of a preliminary visual prototype could possibly cause.