Saturday, November 19, 2011

Too deep

I am generally a fan of Domain Specific Languages. Especially when they are declarative (rather than imperative) in style. However, I don't like it when people build DSLs on top of SQL.

When using a DSL I like to know what happens one level below the DSL, i.e. the transformations which takes the declarative what to the imperative how. For example, when using a regex it is useful to know about the finite automata used to do the matching.

SQL is in itself a (declarative) DSL. Hence, when using SQL it is nice to have a feeling for what will happen in the DBMS when the SQL statement is executed, what indexes will be used etc. If I have another DSL on top of SQL I have to go two levels deep to get that insight. That is one level too much.

Thursday, November 17, 2011

Does not compose (and other musings)

I first planned to write a post about how objects do not compose on a property level. But I ended up addressing a couple of other problems with ORM as well. It is really hard (for me) to stay focused on one problem at a time when writing of Object Relational Mapping, because they do such a good job of introducing complexity.

Let's take a trivial example of something you want to do all the time in even the most trivial of web apps. You want to compose data from from different tables.

Consider this tiny database of (some of) the beers in my fridge.

Table: BEER
IDNAMEFK_BREWERY
1Double Dog6
2Gonzo6
3Texas Ranger5
4Punk IPA1
5Bigfoot Ale2
6Torpedo Extra IPA2

Table: BREWERY
IDNAMECOUNTRY
1Brew DogScotland
2Sierra NevadaUSA
3Thomas CreekUSA
4Sam AdamsUSA
5MikkellerDenmark
6Flying DogUSA

In a real database these tables would of course have a lot more columns.

Now, imagine you want to display a HTML page with a list of all the beers along with the name of the brewery displayed as a link. The link would refer to another page displaying all the data for that brewer. You cannot get more standard than this.

With ORM


How would this look in a standard ORM with one class per table and one property per column (sprinkled with some magic if the column was a foreign key)?

beers = orm.getAllOfType(Beer)
for (beer in beers) {
  brewery = beer.getBrewery()
  printRow(beer.getName(), link(brewery.getId(), brewery.getName()))
}

If you get a hard-on from the absence of SQL it looks like we hava a winner! But if you have half a brain and wonder what takes place under the covers things don't look so good. For each beer.getBrewery() a database query is beeing made. This is bad on so many levels.

The extra states


The most obvious reason for why this is bad is that you have no way of knowing about the extra database requests just looking at the code. You can probably even open the definition of the Beer class and still not get a clue. You have to know that your beer object is not just a simple collection of data, nor is it anything like if you had instantiated it yourself. The object was created by the almighty ORM and is now a carrier of the database connection! WTF?!?

So the joyful result is that the objects have state you did not ask for. Typical ORM names for the states are attached and detached. But that is just scratching the surface, if the object has a (potential) reference to another object you have to know if that object has been loaded or not. That is part of the extra state as well, but typically not talked about.

The n+1 problem


This brings us to the next problem; the problem of multiple database requests. This is of course a well know problem. It is so well known it has a name: the n+1 problem, named after the number of queries made to the database. Makers of ORMs love to create problems, give them names, then create bad workarounds for the problems they just created. A sane person would maybe stop a think that you may have something fundamentally flawed in the model.

Ok, so that are the "solutions" ORMs provide for the n+1 problem? One way is to provide some mechanism to give hint to the ORM that you always should join (or fetch in ORM speak) the brewery as soon as you load a beer. Never mind that in 95% of your other queries of you don't care about that data at all. Hence, you should really think twice before using this technique.

Another "solution" provided may be a query language that sort of looks like SQL but have none of the properties a relational query language should have. Again, WTF?!? Why create a shitty language just so you can work with objects which sort of looks like your tables. What is the obsession?

Loading too much data


Another problem is that the ORM will load all the columns from the BREWERY table. The ORM has no way of knowing that you are only going to use 2 of them. This will get ugly very quick, especially if you're joining many tables.

Without ORM


How would a solution look in SQL and some low level data access layer?

resultSet = queryDatabase(
  """SELECT BEER.NAME, BREWERY.NAME, BREWERY.ID
  FROM   BEER, BREWERY
  WHERE  BEER.FK_BREWERY = BREWERY.ID""")

for (row in resultSet)
  printRow(row["BEER.NAME"], link(row["BREWERY.ID"], row["BREWERY.NAME"]))

Done! One query to the database guaranteed. Further you have a clear separation of concerns, querying the database is clearly separated from rendering the table. And the row object is simple naked data, the way god intended.

But here you may object (no pun intended)! The resultSet can be of a kind which has a open database connection, wasn't that something just complained about? This technique is used if you expect large result sets and you don't want to load all data in memory. If that is the case this is reflected in resultSet's type. You don't have to know how the resultSet was created, you just have to look at its type. Further, you will never execute a whole new query, you will effectively stream data from the database. (Streaming data from the database with an ORM could be a topic for a whole other post btw.)


The root cause


So, that is the root problem with the ORM in this case? Objects do not compose on a property level, they are indivisible containers of data. Relations (tables) on the other hand are divisible, you can cut them any way you like and still have a relation. That power is lost as soon as you go ORM.

Tuesday, November 15, 2011

Recommended reading

I thought that ORMs were all but dead, but leaving my bubble in a product company populated with peers I've found out that this is not the case. Some are even claiming the world is moving towards more ORM! The horror!

Well, I take comfort in the fact that I am not alone in my views regarding Object Relational Mapping, see for example Historical Perspective of ORM and Alternatives by Kenneth Downs. I especially like this quote from the article:

There are plenty of alternatives to ORM, but I would contend that they begin with a different world view. Good business recognizes the infinite value of the users as the generators of the Almighty Paycheck, and the database as the permanent record of a job well done.

This worldview forces us into a humble position with respect to our own application code, which is that it is little more than a waiter, carrying orders to the kitchen and food back to the patrons. When we see it this way, the goal becomes to write code that can efficiently get data back and forth.

This is a point I am planning to elaborate on myself in a future post.

Why this blog?

At a first glance this may seem like a strange blog, a blog devoted to arguing against some practice rather than for some other better practice. You can easily dismiss this entire blog with a "Haters gonna hate!" and be done with it. But I am trying to make case; the case for simplicity! And I think that today many of the so called Object Relational Mapping frameworks are the biggest threat, by a wide margin, to reliable simple code, hence this blog.

The name for this blog - "ORM, OMG!" - comes from the excellent presentation Simple Made Easy by Rich Hickey of Clojure fame. If you haven't seen that presention you should stop reading right here and go watch it!

Stay tuned for more specific rants agains the evil that is ORM.