Job98456


 
HomeCalendarFAQSearchRegisterMemberlistUsergroupsLog in

Share | 
 

 Time and Time Again: Managing Time in Relational Databases, Part 13 - A Roadmap of the Rest of This Series

Go down 
AuthorMessage
raj_mmm9



Number of posts : 1850
Age : 55
Registration date : 2008-03-08

PostSubject: Time and Time Again: Managing Time in Relational Databases, Part 13 - A Roadmap of the Rest of This Series   Sun 13 Apr - 20:03

We are nearly ready to discuss the penultimate pattern, Version Pattern 6. This pattern expresses the combined semantics of all the version patterns we have discussed so far. The last versioning requirement we will examine adds the requirement to version error corrections, and it is Version Pattern 7 that will accommodate that requirement, as well as all the patterns previous to it. Version Pattern 7 is therefore the “ultimate” versioning pattern to be discussed in these articles.

It is clear that if Pattern 6 can express the semantics of patterns 4 and 5, it can express the semantics of all prior patterns. And indeed it can.

Like Pattern 4, it distinguishes business effectivity dates from database activity dates, making it possible, for example, to create new versions in advance of the date on which they will become effective.

Roadmap for the Remainder of the Series

The high-level view of the roadmap to the rest of this series is as follows. Items are listed in the sequence in which they will be discussed.

Versioning Using Surrogate Keys

Surrogate keys are primary keys whose values are system-created and free of any business content. Although Dr. E. F. Codd’s Information Principle recommends the use of business data rather than surrogate keys as primary keys, there are both theoretical and practical reasons for using surrogates as primary keys.

The distinction we have already made, between objects and their versions, is best represented, when surrogate keys are used, by a pair of such keys. One of them we will call an “object identifier” (“object id,” abbreviated as OID), and the other a “version identifier” (“version id,” abbreviated as VID). These matters will be the focus of the Roadmap Item #1 discussion.

The Physical Implementation of Version Patterns 1 – 5

In earlier articles, we noted that each of the first five version patterns might be an appropriate solution in a specific versioning situation. Those articles discussed the semantics which each pattern supports, and described each pattern in what we will call logical data model (LDM) terms. With Roadmap Item #2, we will provide the SQL DDL and DML that define the corresponding physical database schemas, and that provide access to them. We will call this the physical data model (PDM) representation of these patterns.
This clears the deck by completing our discussion of the first five patterns. It's time, then, to move on to more advanced patterns.

Version Pattern 6: Semantics and Implementation

In semantic terms, this pattern is the logical union of the semantics of all preceding patterns. After demonstrating that this is so, we go on to a PDM discussion, which demonstrates how to actually build tables that conform to this pattern. Also, with this pattern, we switch from business keys to surrogate keys. This will be the focus of the Roadmap Item #3 discussion.

Managing Recurring Objects

Recurring objects are anything that persists over time, are recognized by us, go away, and later on come back. A policy that has lapsed, but then is renewed, is a recurring object.

But the most important kind of recurring objects, for businesses, are customers – people or organizations who purchase goods or services from us. The principal reason that recognizing returning customers is so important is human psychology. Returning customers want to be greeted like old friends. They don't want the standard "We're happy to welcome you as a customer of ours" greeting. They want a "Welcome! It's great to have you back!" greeting.

Another reason it is important to recognize returning customers is so we can use the valuable knowledge we have gained about them from their previous history with us, particularly the knowledge of what kinds of products or services they tend to purchase from us. If we're a company really on its toes, we may even have captured the information about why they left us in the first place. All this information is far too valuable to lose just because the IT Department tells upper management that "the match problem is difficult"!

From the point of view of temporal database management, however, it doesn't matter whether the returning/recurring object is a person or a policy, a customer or a competitor. The point is that when a new row is inserted into a versioned table, it is inserted as the next version of an object for which we already have prior versions, even though there is a gap in time between when the object was terminated and when this new version is inserted. These gaps mean that we can recognize logical groupings of versions, one group for each continuous set of versions, with gaps distinguishing the groups. These groups are what we called "episodes" back in Part 5.

We already know how to record the reappearance of a returning object in a versioned table. But there are two related issues, one of which we will discuss, and one of which we will not. The issue we will not discuss is how to recognize a returning object as an object we have seen (and recorded in our databases) previously. In a non-temporal form, this is identical to the issue of how to recognize that data from two different input streams, very often with different and incomplete identifying data, are indeed for the same object. The issue, in either its temporal or its non-temporal form, is "match logic,” and it differs so much from business to business, and object type to object type, that nothing useful could be said of the general match logic problem in much less than a book-length treatment.

The issue we will discuss is what to do when we handle reappearance incorrectly. The more general issue is what to do when our match logic makes the wrong choice. The first wrong choice is to say that we have a recurring object (a match to an object already in our databases) when in fact we do not. The second wrong choice is to say that we do not have a recurring object when in fact we do. The third wrong choice is to say (correctly) that we have a match, but to identify the wrong object to match it with.

In the first case, once we identify our mistake, we have to split the incorrect match, separating data identified as being for one object into two sets of data, one for each of the objects we now recognize as distinct. In the second case, once we identify our mistake, we have to merge the data for the two objects into data for a single object, either letting one of the two objects "survive" the merge, or else creating a new third object to merge the first two into. In the third case, we have to do both.

This is not a versioning problem, but it is a problem that inevitably arises when we are required to (try to) correctly recognize recurring objects. When that requirement is combined with the requirement to version the objects in question, we get a situation in which error correction becomes a very complex problem. So the relevance of this issue to versioning is that if both recurring objects and error correction are requirements, the problem of correcting match logic mistakes becomes in part a versioning problem.

This complex problem is the subject matter of Roadmap Item #4. But most of the analysis of this problem is relevant whether or not the target tables being affected are versioned or not. So because the requirement to recognize recurring objects is being imposed on more and more business application systems, and because the work required to correct the inevitable match mistakes is so complex, we think it is warranted for us to branch off from our core topic of versioning, and consider the related topic of correcting object matching mistakes.

Version Pattern 7: Semantics and Implementation

With Roadmap Item #5, we present Version Pattern 7, our “ultimate” version pattern. Because of its importance, we will provide a more thorough and detailed discussion of it than we have of the prior version patterns.

In particular, we will provide over a dozen usage scenarios for this pattern, whereby “usage scenario” we mean something like a use case, except that a workflow of separate processing steps is not part of a usage scenario. Our usage scenarios focus on those events in which one or more versioned tables are either updated from source transactions, or queried with SQL.

For each usage scenario, we will discuss both the updates required to get the retrieved data into the database, and the queries required to get that data out. An LDM level discussion will be followed by the physical schemas, transformation code and SQL which constitute the PDM discussion. Specifically, we will cover the following points:

Sample tables, in their logical form, as already used with the earlier patterns. In addition, sample physical tables, and the DDL used to generate them. All sample tables will contain instance data sufficient to illustrate the scenario being presented.

Timeline diagrams for each scenario. These are the same diagrams that have been used to explain earlier version patterns.

The semantics of each update scenario. These semantics will be clearly stated. In addition, the transforms from source transaction to versioned table updates will be documented, either as code or as “logically tight English.” These transforms provide one aspect of the encapsulation of versioning shown in Figure 2 of Part 1.

The semantics of view creation. This will include SQL view creation statements, and their instance-level results. These views provide the complementary aspect of the encapsulation of versioning shown in Figure 2 of Part 1.

The semantics of temporal queries against these views. The objective here is to maximize the simplicity of the query, while keeping all the temporal semantics available for the query author.

Star Schema Versioning

Star schemas are the most common data model for fact/dimension databases, often referred to as "data marts.” But versioning doesn't apply to fact tables because fact tables are about events, and events don't change. Events happen, and then they are over with. Versioning tracks change over time, and only objects which persist over time can change over time. They change by participating in events. In the Roadmap #6 articles, we will discuss the semantics – indeed the ontology – of events and persistent objects, in order to understand this important point more clearly.

However, the dimension tables of fact/dimension databases do represent persistent objects. Indeed, they represent the very persistent objects whose properties or relationships are altered by their participation in the events recorded in their fact tables. Because dimensions represent objects that change over time, versioning is applicable to them. The Roadmap #6 articles will provide both a logical and a physical description of the use of versioning in fact/dimension databases.

Snapshots

In Part 1, we provided a complete taxonomy of ways in which historical data is maintained in business databases. In particular, we distinguished snapshots from versioning. Although this series has focused on versioning, snapshots also have their place in the management of temporal data. Roadmap Item #7 discusses snapshots and their uses.

Sandbox Versioning

Often, application systems need to accumulate updates over a period time, and across several physical units of work (UOW) that represent a single logical business transaction. Not until all updates are applied, and the results of the complete set of updates validated, are those results made visible to database end users. In other words, the updates are accumulated from several different UOWs, but are then assembled into a single logically atomic transaction, i.e., one in which the updates either all succeed or all fail, and in which the state of the database while the updates are being applied is not visible to end users.

One common way of implementing the requirement to create a single, multi-transaction atomic update out of updates from several distinct UOWs is to copy the database schema and store the results of the updates in this copy. Once all updates have been applied, and all results stored in the copy database, those results are then validated and either approved, or re-applied until the results are all correct. At that point, all the data is moved from the tables in the database copy to the corresponding tables in the live database.

This way of implementing a multiple UOW to single atomic update requirement is often costly. Redundant objects and processes may be needed to validate the data stored in the copy database. In addition, it may require significant amounts of system resources to marshal this data into and out of the copy database.

Sandbox versioning will be discussed when we reach Roadmap Item #8. We will explain how the requirement we have described can be met without the need for a copy database, and also without the need for intra-database copies of the updated tables for these pending updates. Instead, everything is done within the original tables themselves. When the results have been validated and approved, the results become visible to the database user almost instantly, regardless of how many rows across how many tables have participated in this "mass update."

Sandbox versioning functionality also provides the basis for the “check-out” and “check-in” capabilities of hierarchical versioning, which is described next.

Hierarchical Versioning

The versioning we have been considering, up to this point, we may call “linear versioning.” Except for error corrections, versions of the same object follow one after the other, with or without intervening gaps, but never overlapping. They thus constitute a linear sequence of versions, each the truth, and the only truth, about the object during the version’s effectivity period of time.

But businesses also have a need for hierarchical versioning. As the name suggests, this is versioning in which one or more versions are both a) “child” versions of the same “parent version,” and b) “valid,” in some sense, in concurrent and/or overlapping periods of time. IT professionals deal with this kind of versioning when they use software that allows them to “check-out” and “check-in” copies of a base object.
Back to top Go down
View user profile
 
Time and Time Again: Managing Time in Relational Databases, Part 13 - A Roadmap of the Rest of This Series
Back to top 
Page 1 of 1
 Similar topics
-
» After a long long time, we can have a clam sleep
» Easy way to calculate the time to double your money.
» good time to collect shares for new investor?...
» Time to invest in stock market: SC Securities
» Only 6 months more; time to discuss new SLFRS and its impacts to stocks

Permissions in this forum:You cannot reply to topics in this forum
Job98456 :: Databases-
Jump to: