HomeCalendarFAQSearchRegisterMemberlistUsergroupsLog in

Share | 

 Time and Time Again: Managing Time in Relational Databases, Part 10 - Foreign Keys in Version Tables

Go down 

Number of posts : 1850
Age : 55
Registration date : 2008-03-08

PostSubject: Time and Time Again: Managing Time in Relational Databases, Part 10 - Foreign Keys in Version Tables   Sun 13 Apr - 20:05

In trying to explain versioning to other data modelers and DBAs, we have found that many of them are worried about the cascade update implications for versioned tables. In general, the concern is that if a new version of an object is inserted into a version table, then all foreign key references to the version it replaced must be updated to point to the new version. Furthermore, if those updates take place in tables which are themselves versioned tables, then those updates create additional new versions, and all foreign keys that point to them must also be updated. And so a single versioned update can cascade out to affect many other versioned tables, ending only when a cascade chain reaches either a) a non-versioned table or b) a versioned table which does not itself contain any foreign keys.

These concerned modelers and DBAs correctly point out that this cascade effect can make versioned updates very expensive. Furthermore, if there are any circular references, the cascade will become an infinite loop. For example, consider this pair of tables: a Client Version table and our Policy Version table. The Policy Version table contains a foreign key to the Client Version table, and each row in the Client Version table contains a count of the total number of versioned updates across all policies for that client. The infinite loop happens like this:

A new version of a policy is inserted. Call the old version P12 and the new version P13.
This triggers code that updates the policy version count in the related client, resulting in a new version of that client. Call the old version C25 and the new version C26.
At this point, we must cascade update the policy version. P13's foreign key still points to client version C25, and it must be updated to point to C26. This creates a new version of that policy, P14.
But now the trigger executes once again, updating the policy version count in the related client. This loops us back to step 2, where we update that count, creating client version C27.
But once again, we must cascade update the policy version. P14's client foreign key still points to C26. This update creates a new version of that policy, whose client foreign key points to C27. This new version of the policy is P15.
And so on, and so on, to either the last syllable of recorded time, or to a frustrated machine operator who cancels the process.
Also, notice that so far, we have talked only about cascade updates. There are also, of course, cascade deletes. As we will see later on, the semantic constraints surrounding cascade deletes in versioned tables are not identical to those for cascade deletes of non-versioned tables.

What to Do About the Cascade Update Problem for Versioned Tables
There are only three possible responses to the versioned table cascade update problem (or to any problem). They are: a) ignore the problem, b) avoid it or c) fix it.

Ignore it. The first response is to ignore the problem. If we take this approach, it means that in many if not most cases, a single versioned update will trigger any number of additional versioned updates. Perhaps the attitude about multiple updates is: "DASD is cheap; CPU cycles are cheap. So just let it happen." But then what about the possibility of an infinite loop? Perhaps the attitude about infinite loops is: "Design your databases so there aren't any loops."

In some specific cases, this may be an acceptable response. But if our objective is to develop an enterprise solution (which, indeed, is our objective), then this response just won't do. We need a response that works in all possible cases, not just in special individual cases. We can't just ignore the issue of cascade updates in versioned tables.

Avoid it. The second response is to avoid the problem. This means to turn off referential integrity checking in the DBMS (which would stop the loop in the client to policy direction). If we can accept cascade updating, and are only concerned to avoid the infinite loop, an alternative is to turn off the developer-written code which updates counts (which would stop the loop in the policy to client direction).

Avoiding the problem is not as cavalier as it sounds. Standards committees are at loggerheads over how to introduce temporal management functions into relational DBMSs, and as a result, DBMS vendors are stymied. Consequently, there is no adequate support for temporal database management in current relational DBMSs. So we shouldn't be surprised that there is also no DBMS-implemented referential integrity that works for temporal (versioned) tables. Until there is such support, avoiding the problem by turning off referential integrity may be a very reasonable response.

But isn't the proper conclusion, given DBMS shortcomings, to implement referential integrity with our own code? Isn't the proper conclusion to fix it, not to avoid it?

That depends, of course, on a comparison of the costs and benefits of the two responses. Specifically:

What costs do we incur if we fix the problem? We may assume that these are all IT costs, not business costs. The answer is that we incur the development and maintenance costs that are part of any IT software artifact.
What benefits do we garner if we fix the problem? We may assume that these are all business benefits, not IT benefits. The answer is that a) we can use DBMS-supplied referential integrity, and thereby gain the assurance that RI errors will not occur; and b) by updating foreign keys whenever referenced rows are replaced by newer versions, the relationships involving versioned tables will remain current.
What costs do we incur if we avoid the problem? We may assume that these are all business costs, not IT costs. The answer is that a) since we cannot use DBMS-supplied referential integrity, we must therefore write our own code to guarantee that integrity; and b) since relationships among versioned tables are not updated if we avoid the problem, such relationships gradually become more and more "out of date" as additional versions of a foreign-key referenced object are created.
We may assume that there are no benefits to avoiding the problem (other than cost avoidance).
Fix it. The third response is to fix the problem. To fix any problem, we need to start with a correct diagnosis. We need to identify the root cause, the thing which, if fixed, will eliminate the problem. We suggest that the root cause of the problem is the instability of foreign-key references when those foreign keys point to versioned tables.

With "normal" (non-versioned) tables, cascade updates are rare because key changes are rare. If we were dealing with a Client and a Policy table - instead of with a Client Version and Policy Version table - then nothing we have considered so far would require a cascade update. But because we are dealing with foreign key-related version tables, every new version in the referenced table (Client Version) requires a cascade update in the referencing table (Policy Version). Because we have chosen to create a new version in the referenced table every time a new version is added in the referencing table (by incrementing a count), we loop back and create an infinite cascade update problem.

We can fix the infinite loop problem quite easily. In the case of the example we are working with, the infinite loop will be broken if a versioned update in the referencing Policy Version table does not cause a new version to be created in the referenced Client Version table. That means that we either don't keep a count of policy versions in the client, or else that updating that count is an overwrite type of update, not one that requires a new version to be created.

Additional discussion of the infinite loop problem, and the fix for it, might provide additional insight into the root cause of the overall cascade update problem. But we prefer to go after that additional insight by turning to the other part of the problem, the cascade update from "parent" to "child," from the referenced table to the referencing table.

A cascade update happens when a primary key changes, and there exist one or more foreign keys which reference it. So to fix a cascade update problem (not ignore it and not avoid it), we must either a) eliminate primary keys, b) eliminate foreign keys or c) stabilize the link between them.

We can rule out the first two options immediately. They amount to conceding that the relational model cannot handle versions and, more generally, cannot handle time. While it might be interesting to pursue this concession, we have neither the time nor the inclination to do so here.

But what does the third fix mean? What does it mean to "stabilize" the link between a primary and a foreign key? Let's start by looking at an example of these keys.
Back to top Go down
View user profile
Time and Time Again: Managing Time in Relational Databases, Part 10 - Foreign Keys in Version Tables
Back to top 
Page 1 of 1
 Similar topics
» After a long long time, we can have a clam sleep
» Time to invest in stock market: SC Securities
» Only 6 months more; time to discuss new SLFRS and its impacts to stocks
» Rupee hits all time low
» A Good Time to Trade in Gently Used Car -Is it Same SriLanka

Permissions in this forum:You cannot reply to topics in this forum
Job98456 :: Databases-
Jump to: