Triggers Considered Harmful, Considered Harmful: Continuing the story

Sunday, January 4, 2015

Continuing the story

One of my 2015 new year's resolutions, was to finish the story I started on this blog. The story being a talk that I had delivered a couple of times 3-4 years ago on various Oracle/usergroup events. The talk is about why I think triggers should *not* be considered harmful. There is one specific use-case for triggers -- implementing validation code for what's called a multi-row constraint -- that I will treat in-depth on this blog.

So here goes.

Let's first recap what we did sofar:

In november 2011 I started this blog and set the scene. I talked about DML-triggers and pointed out some of their features and restrictions. Obviously I introduced the infamous 'mutating table' error ORA-04091 that exists for row-level triggers (for each row). Everybody that starts out using triggers to implement a multi-row constraint, will hit this error.
Then in december 2011 I explained in detail what that error is all about, and why it is your friend and not your enemy. I also showed that the common workarounds for this error really aren't workarounds and should be avoided.
The month thereafter I demonstrated that using a statement trigger (instead of a row trigger) as a workaround has a serious performance disadvantage.
I then took a step back and discussed various other use-cases for DML-triggers. For these other use-cases I agree with the common consensus that triggers are to be considered harmful.
Here I specified one use-case for triggers being the implementation of multi-row constraints. And explained how the lack of SQL assertions, forces us to employ triggers to implement these kinds of constraints.
Next I discussed the broader topic of constraints, and how they can be meaningfully classified: multi-row constraints being one of the classes.
I showed how two other constraint classes (attribute and tuple constraints) ideally should be implemented using a declarative CHECK, and not via a trigger.
In the following post I discussed different implementation strategies for multi-row constraints, the vast majority of which we cannot simply declare to the DBMS. We are required to write our own procedural validation code to implement these constraints. One implementation strategy however comes very close to being a declarative one: the use of materialized views that fail to refresh in case a violation for the constraint exists.
I then discussed the materialized view strategy in some detail in this post.

And that's where the development of my story is right now. Let's continue. And we will do so by discussing the correct workaround for the mutating table error when you try to implement a multi-row constraint. Recall we have looked at the following workarounds sofar:

Workaround 1a: a row-trigger where we wrapped the query accessing the mutating table in an autonomous transaction (thereby preventing ORA-04091).
Workaround1b: a row-trigger where we queried the mutating table via a loopback database-link, which also prevents the error.
Workaround 2: a statement trigger, which was not an acceptable solution performance-wise. It had to always check (not just when a manager got inserted), and it had to check all departments (not just the one that was affected by the DML-statement).

To implement multi-row constraints correctly and efficiently we need to build a combination of a row-trigger and a statement-trigger. The task of the row-trigger is to convey the information about which rows were affected by the DML-statement, to the after-statement-trigger. This statement-trigger can then use this information and query the mutating table (which it is allowed to do) in an efficient manner.

We can employ three 'container'-types to convey this information:

Use a persistent packaged (table) variable
Use compound triggers and its state variables
Use a global temporary table (gtt)

In all three cases the row-trigger would populate the container with the relevant information to be conveyed to the statement-trigger. And the statement-trigger would inspect the container. Note that in the cases of a packaged variable (1) and a gtt (3) you are to make sure that the container is empty when a new DML-statement starts. The state variables of a compound trigger are initialized (by the DBMS) each time the compound trigger fires, even when the triggering statement is interrupted and restarted. Packaged variables do not rollback when the triggering statement is interrupted. The contents of a gtt do of course. Both packaged and state variables live in the PGA of the session and since they involve memory manipulation are likely to be 'faster' to populate and inspect, than the gtt container is.

However, I personally prefer to use a gtt. Perhaps the only reason being that it supports DML-statements that affect many, many rows. With state and package variables you might run the risk of blowing up the PGA in such a case. And with the advent of Oracle12c the overhead of using a gtt is further reduced with the new temporary-undo feature, which causes no redo to be generated for undo of temporary tables (see the temp_undo_enabled spfile parameter).

Now given our earlier attempts to code validation logic for the example constraint (All managers must have a clerk in the same department), we really only need to convey the number of the department for which we need to validate the constraint. And, we only need to convey this when a manager is indeed inserted. However we anticipate more than just (this) one multi-row constraint in which the EMP table is involved. And since we do not want to create a gtt per constraint, but rather one per (application) table, it is necessary to insert the values of all involved columns of the multi-row constraint. And to insert them always. I.e. not just when a manager is inserted, for we can have another multi-row constraint on the EMP table that is interested in other job values (not just managers). We then move the logic of when and for which department the constraint needs to be validated to the statement trigger. This also has makes the code more elegant: all validation logic for the constraint is located in one piece of code.

Here is the validation logic sofar for our constraint then.

Note that all the row-trigger does, is keep track of the relevant column values that are being inserted. The real validation logic for our constraint sits in the statement-trigger. For every department into which a manager just got inserted (we conveniently query our gtt for this) , it calls out to the (earlier created) p_check_for_clerk procedure. A few remarks on this implementation:

We query distinct deptno's from the gtt, to prevent checking a department more than once, in case an insert statement just inserted more than one manager into a department.
The query that access the EMP table inside the p_check_for_clerk procedure, will not hit the mutating table anymore, as at 'after statement' time the EMP table isn't in a mutating state anymore.
That query will also be immune for any order into which a multi-row insert statement will have inserted new EMP rows and it will see these rows: i.e. the issues of the previous two 'workarounds' (autonomous TX and loopback db-link) do not exist now.

Here's a test of this solution, using the same insert from the EMP_STAGE table we used earlier.

So, we now have a trigger based solution to enforce our constraint. But we are of course not done yet. We have only tackled the inserting of managers now. There are however other DML-statements that can potentially violate this constraint, and for which we will need to develop (more) triggers. Can you guess them?

And there is more after that... In a future post we will also discuss the need to serialize multiple transactions for correct enforcement of constraints.

Thanks for tuning in again.