Let's take a trivial example of something you want to do all the time in even the most trivial of web apps. You want to compose data from from different tables.
Consider this tiny database of (some of) the beers in my fridge.
Table: BEER
ID | NAME | FK_BREWERY |
---|---|---|
1 | Double Dog | 6 |
2 | Gonzo | 6 |
3 | Texas Ranger | 5 |
4 | Punk IPA | 1 |
5 | Bigfoot Ale | 2 |
6 | Torpedo Extra IPA | 2 |
Table: BREWERY
ID | NAME | COUNTRY |
---|---|---|
1 | Brew Dog | Scotland |
2 | Sierra Nevada | USA |
3 | Thomas Creek | USA |
4 | Sam Adams | USA |
5 | Mikkeller | Denmark |
6 | Flying Dog | USA |
In a real database these tables would of course have a lot more columns.
Now, imagine you want to display a HTML page with a list of all the beers along with the name of the brewery displayed as a link. The link would refer to another page displaying all the data for that brewer. You cannot get more standard than this.
With ORM
How would this look in a standard ORM with one class per table and one property per column (sprinkled with some magic if the column was a foreign key)?
beers = orm.getAllOfType(Beer) for (beer in beers) { brewery = beer.getBrewery() printRow(beer.getName(), link(brewery.getId(), brewery.getName())) }
If you get a hard-on from the absence of SQL it looks like we hava a winner! But if you have half a brain and wonder what takes place under the covers things don't look so good. For each beer.getBrewery() a database query is beeing made. This is bad on so many levels.
The extra states
The most obvious reason for why this is bad is that you have no way of knowing about the extra database requests just looking at the code. You can probably even open the definition of the Beer class and still not get a clue. You have to know that your beer object is not just a simple collection of data, nor is it anything like if you had instantiated it yourself. The object was created by the almighty ORM and is now a carrier of the database connection! WTF?!?
So the joyful result is that the objects have state you did not ask for. Typical ORM names for the states are attached and detached. But that is just scratching the surface, if the object has a (potential) reference to another object you have to know if that object has been loaded or not. That is part of the extra state as well, but typically not talked about.
The n+1 problem
This brings us to the next problem; the problem of multiple database requests. This is of course a well know problem. It is so well known it has a name: the n+1 problem, named after the number of queries made to the database. Makers of ORMs love to create problems, give them names, then create bad workarounds for the problems they just created. A sane person would maybe stop a think that you may have something fundamentally flawed in the model.
Ok, so that are the "solutions" ORMs provide for the n+1 problem? One way is to provide some mechanism to give hint to the ORM that you always should join (or fetch in ORM speak) the brewery as soon as you load a beer. Never mind that in 95% of your other queries of you don't care about that data at all. Hence, you should really think twice before using this technique.
Another "solution" provided may be a query language that sort of looks like SQL but have none of the properties a relational query language should have. Again, WTF?!? Why create a shitty language just so you can work with objects which sort of looks like your tables. What is the obsession?
Loading too much data
Another problem is that the ORM will load all the columns from the BREWERY table. The ORM has no way of knowing that you are only going to use 2 of them. This will get ugly very quick, especially if you're joining many tables.
Without ORM
How would a solution look in SQL and some low level data access layer?
resultSet = queryDatabase( """SELECT BEER.NAME, BREWERY.NAME, BREWERY.ID FROM BEER, BREWERY WHERE BEER.FK_BREWERY = BREWERY.ID""") for (row in resultSet) printRow(row["BEER.NAME"], link(row["BREWERY.ID"], row["BREWERY.NAME"]))
Done! One query to the database guaranteed. Further you have a clear separation of concerns, querying the database is clearly separated from rendering the table. And the row object is simple naked data, the way god intended.
But here you may object (no pun intended)! The resultSet can be of a kind which has a open database connection, wasn't that something just complained about? This technique is used if you expect large result sets and you don't want to load all data in memory. If that is the case this is reflected in resultSet's type. You don't have to know how the resultSet was created, you just have to look at its type. Further, you will never execute a whole new query, you will effectively stream data from the database. (Streaming data from the database with an ORM could be a topic for a whole other post btw.)
The root cause
So, that is the root problem with the ORM in this case? Objects do not compose on a property level, they are indivisible containers of data. Relations (tables) on the other hand are divisible, you can cut them any way you like and still have a relation. That power is lost as soon as you go ORM.
No comments:
Post a Comment