Friday, March 8, 2013

The materialized view approach for implementing a table constraint

In yesterdays post I announced that I'd spent a separate post on how we can use materialized views to enforce table constraints. So here goes.

The high-level cookbook for this approach is as follows:
  1. We create a materialized view that refreshes on commit,
  2. The materialized view is defined in such a manner that it will hold no rows when the table constraint is adhered to by the current transaction trying to commit,
  3. And it is defined such that it will hold (at least) one row when the table constraint is violated by the current transaction trying to commit,
  4. We devise a construct such that on-commit refresh of the materialized view *always* fails whenever (at least) one row is materialized in the view. This can be done in two manners:
    1) we add a check constraint on the underlying table of the materialized view that always fails, or
    2) we add a before-insert row-trigger on the underlying table of the materialized view that always fails.
Here's the slide on this from my 'harmful triggers' presentation:


So let's try this with our example constraint (managers require a clerk in same department). The assertion for this constraint was:
not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )

With this assertion we can now mechanically generate a materialized view for our constraint, using the DUAL table. Note: we negate the assertion so that the materialized view ends up having characteristics 2 and 3 from our cookbook above. So the 'not exists' turns into an 'exists'.

create materialized view managers_need_clerk
refresh fast on commit
as
select 'wrong' as text
from dual
where exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
/

Note: we explicitly want this materialized view to be "fast refreshable", meaning that Oracle will use intelligence to minimize the work required to refresh this view. In order for Oracle to be able to do so, we would also need to create a materialized view log on the table involved, which is EMP (and DUAL?) in this case. Creating the MV-log is left for the reader.

And finally we add a CHECK clause to the underlying table segment of this materialized view (whose name is the same as the materialized view). This CHECK clause is such that it always evaluates to FALSE.

alter table managers_need_clerk add CHECK( 0 = 1 )
/

The way this now should work is that whenever a transaction introduces a department that has a manager, but no clerk, and tries to commit, this on-commit refresh materialized view will produce a single row to be inserted into the underlying table segment. This triggers validation of our CHECK clause, which will always fail. Which in turn causes the commit to fail, thereby preventing this transaction to successfully complete.

All seems very well, until you now hit the following error:

ORA-12052: cannot fast refresh materialized view [owner].[mat.view]

There are still various restrictions imposed upon materialized views for them to be fast refreshable. See the Oracle documentation for this. Sometimes you might be surprised though that a bit of rewriting a materialized view could end up such that the materialized view becomes fast refreshable. Rewriting them into straight joins is a strategy that might work here. For instance our materialized view above can be rewritten into this:

create materialized view managers_need_clerk
refresh fast on commit
as
select 'wrong' as text
from (select c.job
      from emp m
          ,emp c
      where m.job = 'MANAGER'
        and m.deptno = c.deptno (+)
        and 'CLERK' = c.job (+))
where job is NULL
/

I haven't tested above alternative: with the appropriate materialized view logs, it could well be fast refreshable now...

On final comment on this approach for implementing table constraints: Oracle (must and) will serialize refreshes of the materialized view among simultaneously executing (or rather, committing) transactions. Rob van Wijk has a nice blogpost on this here.

Thursday, March 7, 2013

And what about table constraints?

In a previous post we've introduced a classification scheme for constraints:
  • attribute constraints
  • tuple constraints
  • table constraints
  • database constraints
  • dynamic constraints
And talked a bit about how we could implement the first two classes here. In today's post we will make a start talking about how we can implement table constraints using triggers. But before we do that we will offer some thoughts on how the ideal world with regards to this subject would look like.

Long ago, in a galaxy far away, an ANSI/ISO SQL standardization committee came up with the ground breaking concept of a SQL ASSERTION. SQL assertions would make our lives real easy when it comes down to implementing table constraints. The example constraint "we cannot have a manager without a clerk in the same department" could be implemented as:

create assertion managers_need_clerk as
check(not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
     )
/

Presto. Done.
It would then be up to the DBMS to maintain this constraint. Of course we require the DBMS to do that in an efficient manner. This will all become clear during the course of the next couple of posts.

By the way you may think, why not just do this with a CHECK constraint?

alter table emp add constraint managers_need_clerk as
check(not exists
        (select 'a department'
         from (select distinct deptno from emp) d
         where exists 
                 (select 'a manager in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'MANAGER')
           and not exists
                 (select 'a clerk in d'
                  from emp e
                  where e.deptno = d.deptno and e.job = 'CLERK')
        )
     )
/

Well that's because CHECK constraints do not allow sub-queries. They don't allow that for the same reason as why we still do not have support for assertions. Both require the DBMS vendor to produce some seriously complex piece of software that can accept an arbitrary complex boolean SQL-expression and compute from that the most efficient way to maintain that boolean expression inside all concurrently running transactions. The research and development effort for this still needs to be done.


So, we have to resort to some other means when it comes to implementing table constraints. Here's a list of possible implementation strategies.


So we've discussed the first one already: it's highly preferred, but unfortunately only a very partial solution. The only table constraints that we can deal with declaratively are:

  • Keys (be them primary or unique), and
  • Foreign keys, in case the FK refers back to another column in the same table (in which case the foreign key is a table constraint, and not a database constraint).
The trigger approach is what the rest of this blog will be all about. Every table constraint can be implemented using triggers. Contrary to popular belief this is doable. But it *is* rather complex. The fact that this a full solution for the table constraint class, is a big pro though.

Then there is the API-approach. This is the approach where you encapsulate all DML statements inside stored procedures, and disallow any direct DML access to your tables. The only way to modify your tables is through the stored procedure API-layer. And inside this API-layer, you deal with constraint validation, just before or right after you issue the DML statements. To me this is a fundamentally flawed solution, since it will always lead to constraint enforcing code duplication. And since it is not a "once and for all" solution. Everytime you maintain your application and need to introduce new transactions, you'll have to take care of constraint enforcement again. Many people also tend to completely disregard the complexities involved. To a certain extent, they are the exact same complexities as are involved in the trigger approach: you'll have to take care of serialization and efficiency (all explained in future posts) in this approach too. And finally, in practice it is very difficult to maintain the enforcement of only allowing access to you tables via the API-layer. At some point people will find ways to go around the API-layer, and thereby be able to introduce corrupt data.


And there is the function based index trick. This is where we can use unique, function-based, indexes to implement keys across a subset of rows in a table. This too is a very partial solution for implementing table constraints. I'm assuming, since you read this blog, you are familiar with this trick, otherwise let me know in a comment and I'll provide you with an example.


Another approach is to employ materialized views in an ingenious manner. Like triggers, this is in theory a full solution, but in practice only a very partial one. It all boils down to the same research and development effort mentioned above when we discussed assertions, not having been done yet. We will discuss the materialized view approach in our next post.

Stay tuned.