Python Notes

Thursday, November 11, 2004

Alternative database systems

There was a time when a database meant a flat-file, fixed-record repository. Indexes were added later, bringing better performance for several tasks. During the sixties, hierarchical database systems were developed, making it possible to model complex real-life structures better. Even today, old-style mainframe systems (such as IBM's IMS) are still in production, managing huge databases. SQL was only invented in the seventies, based on a mathematical formalization of high-level data manipulation (the relational model). Batch processing systems read and process data sequentially, and normally do not need such abstractions. But the new generation of interactive systems really needed them. And when PC-based client-server computing exploded in the 90's, the reign of SQL began.

For those who develop conventional business applications, it currently seems like SQL is the definitive database system. Although SQL has several strong points, its current near monopoly can probably be explained by academic indoctrination: almost every CS graduate in the past fifteen years was told that SQL is good, and that the rest is bad. Part of it may be because there was nothing better at the time; also, especially compared to its predecessors, SQL's mathematical foundations gave it a kind of scientific validity that academia loves. But even SQL pioneers agree that there are problems. Of course, the diagnoses vary a lot. Some people (such as the renowned C. J. Date) think that the basic mathematical model itself is correct, and that the current implementations are flawed. Others believe that SQL itself is dead, and advocate an XML-based model instead. One line of research that has so far failed to reach widespread adoption is the object-oriented database. There are several implementations, but somehow they lack the level of public awareness that RDBMSs and XML-based storage have today.

It seems a good situation to apply the age-old adage: when all you have is a hammer, everything looks like a nail. Relational databases are great, but can't solve all problems. XML is also interesting, but it is bloated and confusing, an overused tool stretched to do too many things beyond its original roots. Object databases are interesting, but normally suffer from being too tied to a single environment.

In the middle of this, there is an unforeseen trend toward using the file system as a storage medium. Yes, the file system. Guess what? Forget FAT, please. Current file systems are much more stable and efficient than older ones. Modern file systems are hierarchical, and can store arbitrary objects. Journaling support and better metadata management mean that the file system is now a better choice for many situations. Several web publishing engines (blogs, wikis, and even full-fledged content management systems) support file-system-based storage for text notes and documents, which were previously stored (in a hackish and haphazard way) in DB blobs. The full filename is now the primary key, and flexible relationships between entities can be expressed as hyperlinks.
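To make the last point concrete, here is a minimal sketch of a note store that uses the relative file path as the primary key and treats links inside the text as relationships between entries. It is not taken from any particular blog or wiki engine: the function names, the `[[path]]` link syntax, and the sample paths are all assumptions made for illustration, and the sketch does not guard against keys such as `../x` that escape the root directory.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical wiki-style link syntax: [[relative/path.txt]]
LINK = re.compile(r"\[\[([^\]]+)\]\]")


def save_note(root: Path, key: str, text: str) -> Path:
    """Store a note; the relative path `key` acts as its primary key."""
    path = root / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def load_note(root: Path, key: str) -> str:
    """Look a note up by key, i.e. by its path under `root`."""
    return (root / key).read_text(encoding="utf-8")


def linked_keys(root: Path, key: str) -> list[str]:
    """Return the keys that the given note links to, in order of appearance."""
    return LINK.findall(load_note(root, key))


def demo() -> tuple[list[str], list[str]]:
    """Build a tiny store in a temporary directory and follow its links."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        save_note(root, "blog/2004/databases.txt",
                  "See [[wiki/sql.txt]] and [[wiki/xml.txt]].")
        save_note(root, "wiki/sql.txt", "Notes on relational databases.")
        links = linked_keys(root, "blog/2004/databases.txt")
        # A link is only a relationship if its target actually exists.
        existing = [k for k in links if (root / k).is_file()]
        return links, existing


if __name__ == "__main__":
    print(demo())
```

Nothing here enforces referential integrity: a link to a missing file is simply a dangling link, which is the price paid for the flexibility.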

Of course, the same age-old advice applies to file-based storage: it's not suited to every task. But it's better than many academics would like to admit, and much safer than the old "I've seen a file system get corrupted" crowd fears. For many things, it's already the best bet. The best part? It's free, and it comes installed with the OS.
