Saturday, November 19, 2011
Too deep
When using a DSL I like to know what happens one level below the DSL, i.e. the transformations which takes the declarative what to the imperative how. For example, when using a regex it is useful to know about the finite automata used to do the matching.
SQL is in itself a (declarative) DSL. Hence, when using SQL it is nice to have a feeling for what will happen in the DBMS when the SQL statement is executed, what indexes will be used etc. If I have another DSL on top of SQL I have to go two levels deep to get that insight. That is one level too much.
Thursday, November 17, 2011
Does not compose (and other musings)
Let's take a trivial example of something you want to do all the time in even the most trivial of web apps. You want to compose data from from different tables.
Consider this tiny database of (some of) the beers in my fridge.
Table: BEER
ID | NAME | FK_BREWERY |
---|---|---|
1 | Double Dog | 6 |
2 | Gonzo | 6 |
3 | Texas Ranger | 5 |
4 | Punk IPA | 1 |
5 | Bigfoot Ale | 2 |
6 | Torpedo Extra IPA | 2 |
Table: BREWERY
ID | NAME | COUNTRY |
---|---|---|
1 | Brew Dog | Scotland |
2 | Sierra Nevada | USA |
3 | Thomas Creek | USA |
4 | Sam Adams | USA |
5 | Mikkeller | Denmark |
6 | Flying Dog | USA |
In a real database these tables would of course have a lot more columns.
Now, imagine you want to display a HTML page with a list of all the beers along with the name of the brewery displayed as a link. The link would refer to another page displaying all the data for that brewer. You cannot get more standard than this.
With ORM
How would this look in a standard ORM with one class per table and one property per column (sprinkled with some magic if the column was a foreign key)?
beers = orm.getAllOfType(Beer) for (beer in beers) { brewery = beer.getBrewery() printRow(beer.getName(), link(brewery.getId(), brewery.getName())) }
If you get a hard-on from the absence of SQL it looks like we hava a winner! But if you have half a brain and wonder what takes place under the covers things don't look so good. For each beer.getBrewery() a database query is beeing made. This is bad on so many levels.
The extra states
The most obvious reason for why this is bad is that you have no way of knowing about the extra database requests just looking at the code. You can probably even open the definition of the Beer class and still not get a clue. You have to know that your beer object is not just a simple collection of data, nor is it anything like if you had instantiated it yourself. The object was created by the almighty ORM and is now a carrier of the database connection! WTF?!?
So the joyful result is that the objects have state you did not ask for. Typical ORM names for the states are attached and detached. But that is just scratching the surface, if the object has a (potential) reference to another object you have to know if that object has been loaded or not. That is part of the extra state as well, but typically not talked about.
The n+1 problem
This brings us to the next problem; the problem of multiple database requests. This is of course a well know problem. It is so well known it has a name: the n+1 problem, named after the number of queries made to the database. Makers of ORMs love to create problems, give them names, then create bad workarounds for the problems they just created. A sane person would maybe stop a think that you may have something fundamentally flawed in the model.
Ok, so that are the "solutions" ORMs provide for the n+1 problem? One way is to provide some mechanism to give hint to the ORM that you always should join (or fetch in ORM speak) the brewery as soon as you load a beer. Never mind that in 95% of your other queries of you don't care about that data at all. Hence, you should really think twice before using this technique.
Another "solution" provided may be a query language that sort of looks like SQL but have none of the properties a relational query language should have. Again, WTF?!? Why create a shitty language just so you can work with objects which sort of looks like your tables. What is the obsession?
Loading too much data
Another problem is that the ORM will load all the columns from the BREWERY table. The ORM has no way of knowing that you are only going to use 2 of them. This will get ugly very quick, especially if you're joining many tables.
Without ORM
How would a solution look in SQL and some low level data access layer?
resultSet = queryDatabase( """SELECT BEER.NAME, BREWERY.NAME, BREWERY.ID FROM BEER, BREWERY WHERE BEER.FK_BREWERY = BREWERY.ID""") for (row in resultSet) printRow(row["BEER.NAME"], link(row["BREWERY.ID"], row["BREWERY.NAME"]))
Done! One query to the database guaranteed. Further you have a clear separation of concerns, querying the database is clearly separated from rendering the table. And the row object is simple naked data, the way god intended.
But here you may object (no pun intended)! The resultSet can be of a kind which has a open database connection, wasn't that something just complained about? This technique is used if you expect large result sets and you don't want to load all data in memory. If that is the case this is reflected in resultSet's type. You don't have to know how the resultSet was created, you just have to look at its type. Further, you will never execute a whole new query, you will effectively stream data from the database. (Streaming data from the database with an ORM could be a topic for a whole other post btw.)
The root cause
So, that is the root problem with the ORM in this case? Objects do not compose on a property level, they are indivisible containers of data. Relations (tables) on the other hand are divisible, you can cut them any way you like and still have a relation. That power is lost as soon as you go ORM.
Tuesday, November 15, 2011
Recommended reading
I thought that ORMs were all but dead, but leaving my bubble in a product company populated with peers I've found out that this is not the case. Some are even claiming the world is moving towards more ORM! The horror!
Well, I take comfort in the fact that I am not alone in my views regarding Object Relational Mapping, see for example Historical Perspective of ORM and Alternatives by Kenneth Downs. I especially like this quote from the article:
This is a point I am planning to elaborate on myself in a future post.There are plenty of alternatives to ORM, but I would contend that they begin with a different world view. Good business recognizes the infinite value of the users as the generators of the Almighty Paycheck, and the database as the permanent record of a job well done.
This worldview forces us into a humble position with respect to our own application code, which is that it is little more than a waiter, carrying orders to the kitchen and food back to the patrons. When we see it this way, the goal becomes to write code that can efficiently get data back and forth.
Why this blog?
At a first glance this may seem like a strange blog, a blog devoted to arguing against some practice rather than for some other better practice. You can easily dismiss this entire blog with a "Haters gonna hate!" and be done with it. But I am trying to make case; the case for simplicity! And I think that today many of the so called Object Relational Mapping frameworks are the biggest threat, by a wide margin, to reliable simple code, hence this blog.
The name for this blog - "ORM, OMG!" - comes from the excellent presentation Simple Made Easy by Rich Hickey of Clojure fame. If you haven't seen that presention you should stop reading right here and go watch it!
Stay tuned for more specific rants agains the evil that is ORM.