ORM, OMG!: 2011

Thursday, December 1, 2011

Users don't care about the middle

If you are reading this you are either a developer, strange or both. But you are also a user. Let's think like a user for a while. What do you care about? Me, I care about two things:

The user experience - This generally means a good GUI, whatever that is depending on the situation.
My data - I want my data to be stored in a reliable storage and accessible when I need it.

What is missing from this list? Answer, the middleware.

As a user I don't care too much about the exact kind of database, what languages were used in writing the software, etc. I definitely don't care about all intermediate forms my data have taken from the database to produce pixels on my screen. I tend to think about my data as data.

I really think this is the focus we as developers should have as well! Before deciding on you technology stack, we should think long and hard on the problems we are trying to solve.

What does this to have do with ORM?

Ok, but this is a ORM-bashing blog. What does this have to do with ORMs?

I really believe that ORM is a trap what seduces you into putting your precious brain cycle working on the wrong problem. It promise goes something like this:

"If you create a nice class diagram you will never have to think about your database again. Don't be afraid to add some inheritance as the elite OO-developer you are. And if you don't know anything about relational databases that's ok, you are so cool with you OO-skillz. Oh, and I will almost auto-generate the GUI for you. One page per class was what you wanted, right? I love you. Look at this tiny pet-store example how good it works."

And with this in mind you start hacking away not thinking to much about database constraints and the database schema sort of just happens, it is a second class citizen.

(Of course, the ORM proponents will say that you can go from schema to classes as well as the other way around. But if you do that I really think you should pause for a moment and ask yourself if you really need the added weight and complexity of an ORM?)

When you are done with your object graph you can start to think about the GUI. The problem is, there is not much to think about. Your entity classes have more or less dictated what your GUI will look like. Cut your data any other way and you are in for-loop-add-things-to-maps-and-loop-some-more-hell. Without an ORM this would be a simple declarative SQL statement.

And do you know who don't care about your fancy object graph? Your users.

The alternative

So what happens when you do the opposite and start focusing on the GUI and the database. Well, if you are in a RESTful AJAX web-application setting you will notice that your middle starts to get really really boring, which is a good thing. And if you drop the ORM it will even get simple.

Your middleware will become a simple servant of data, transforming data from whatever format it happens to have in the database to another format suitable for sending to the client. If you adhere to the state transfer part of REST (and do not use AJAX as another RPC-protocol) you will see other winnings as well, such as it doesn't matter anymore if the next request goes to the same server.

You will get more time building that awesome application for your users, you know the ones who puts food on your table. And you you will no longer complain when they ask for features what don't fit your fancy object graph. You will be able to take advantage of the features people used to use as an argument for choosing a SQL database to begin with. Before, when people used to know that a database is more than persistent storage.

Saturday, November 19, 2011

Too deep

I am generally a fan of Domain Specific Languages. Especially when they are declarative (rather than imperative) in style. However, I don't like it when people build DSLs on top of SQL.

When using a DSL I like to know what happens one level below the DSL, i.e. the transformations which takes the declarative what to the imperative how. For example, when using a regex it is useful to know about the finite automata used to do the matching.

SQL is in itself a (declarative) DSL. Hence, when using SQL it is nice to have a feeling for what will happen in the DBMS when the SQL statement is executed, what indexes will be used etc. If I have another DSL on top of SQL I have to go two levels deep to get that insight. That is one level too much.

Thursday, November 17, 2011

Does not compose (and other musings)

I first planned to write a post about how objects do not compose on a property level. But I ended up addressing a couple of other problems with ORM as well. It is really hard (for me) to stay focused on one problem at a time when writing of Object Relational Mapping, because they do such a good job of introducing complexity.

Let's take a trivial example of something you want to do all the time in even the most trivial of web apps. You want to compose data from from different tables.

Consider this tiny database of (some of) the beers in my fridge.

Table: BEER

ID	NAME	FK_BREWERY
1	Double Dog	6
2	Gonzo	6
3	Texas Ranger	5
4	Punk IPA	1
5	Bigfoot Ale	2
6	Torpedo Extra IPA	2

Table: BREWERY

ID	NAME	COUNTRY
1	Brew Dog	Scotland
2	Sierra Nevada	USA
3	Thomas Creek	USA
4	Sam Adams	USA
5	Mikkeller	Denmark
6	Flying Dog	USA

In a real database these tables would of course have a lot more columns.

Now, imagine you want to display a HTML page with a list of all the beers along with the name of the brewery displayed as a link. The link would refer to another page displaying all the data for that brewer. You cannot get more standard than this.

With ORM

How would this look in a standard ORM with one class per table and one property per column (sprinkled with some magic if the column was a foreign key)?

beers = orm.getAllOfType(Beer)
for (beer in beers) {
  brewery = beer.getBrewery()
  printRow(beer.getName(), link(brewery.getId(), brewery.getName()))
}

If you get a hard-on from the absence of SQL it looks like we hava a winner! But if you have half a brain and wonder what takes place under the covers things don't look so good. For each beer.getBrewery() a database query is beeing made. This is bad on so many levels.

The extra states

The most obvious reason for why this is bad is that you have no way of knowing about the extra database requests just looking at the code. You can probably even open the definition of the Beer class and still not get a clue. You have to know that your beer object is not just a simple collection of data, nor is it anything like if you had instantiated it yourself. The object was created by the almighty ORM and is now a carrier of the database connection! WTF?!?

So the joyful result is that the objects have state you did not ask for. Typical ORM names for the states are attached and detached. But that is just scratching the surface, if the object has a (potential) reference to another object you have to know if that object has been loaded or not. That is part of the extra state as well, but typically not talked about.

The n+1 problem

This brings us to the next problem; the problem of multiple database requests. This is of course a well know problem. It is so well known it has a name: the n+1 problem, named after the number of queries made to the database. Makers of ORMs love to create problems, give them names, then create bad workarounds for the problems they just created. A sane person would maybe stop a think that you may have something fundamentally flawed in the model.

Ok, so that are the "solutions" ORMs provide for the n+1 problem? One way is to provide some mechanism to give hint to the ORM that you always should join (or fetch in ORM speak) the brewery as soon as you load a beer. Never mind that in 95% of your other queries of you don't care about that data at all. Hence, you should really think twice before using this technique.

Another "solution" provided may be a query language that sort of looks like SQL but have none of the properties a relational query language should have. Again, WTF?!? Why create a shitty language just so you can work with objects which sort of looks like your tables. What is the obsession?

Loading too much data

Another problem is that the ORM will load all the columns from the BREWERY table. The ORM has no way of knowing that you are only going to use 2 of them. This will get ugly very quick, especially if you're joining many tables.

Without ORM

How would a solution look in SQL and some low level data access layer?

resultSet = queryDatabase(
  """SELECT BEER.NAME, BREWERY.NAME, BREWERY.ID
  FROM   BEER, BREWERY
  WHERE  BEER.FK_BREWERY = BREWERY.ID""")

for (row in resultSet)
  printRow(row["BEER.NAME"], link(row["BREWERY.ID"], row["BREWERY.NAME"]))

Done! One query to the database guaranteed. Further you have a clear separation of concerns, querying the database is clearly separated from rendering the table. And the row object is simple naked data, the way god intended.

But here you may object (no pun intended)! The resultSet can be of a kind which has a open database connection, wasn't that something just complained about? This technique is used if you expect large result sets and you don't want to load all data in memory. If that is the case this is reflected in resultSet's type. You don't have to know how the resultSet was created, you just have to look at its type. Further, you will never execute a whole new query, you will effectively stream data from the database. (Streaming data from the database with an ORM could be a topic for a whole other post btw.)

The root cause

So, that is the root problem with the ORM in this case? Objects do not compose on a property level, they are indivisible containers of data. Relations (tables) on the other hand are divisible, you can cut them any way you like and still have a relation. That power is lost as soon as you go ORM.

Tuesday, November 15, 2011

Why this blog?

At a first glance this may seem like a strange blog, a blog devoted to arguing against some practice rather than for some other better practice. You can easily dismiss this entire blog with a "Haters gonna hate!" and be done with it. But I am trying to make case; the case for simplicity! And I think that today many of the so called Object Relational Mapping frameworks are the biggest threat, by a wide margin, to reliable simple code, hence this blog.

The name for this blog - "ORM, OMG!" - comes from the excellent presentation Simple Made Easy by Rich Hickey of Clojure fame. If you haven't seen that presention you should stop reading right here and go watch it!

[Watch the presentation.]

Stay tuned for more specific rants agains the evil that is ORM.