Wednesday, October 22, 2008

Referential Integrity and Domain Tables

In my last posting about constraints I discussed field and table level properties. Another way to implement constraints is the use of Referential Integrity. Referential integrity (RI) enforces primary key and foreign key relationships within the database. These RI rules are most commonly used to prevent the occurrence of orphaned records, records in a child table which are not linked to a record in the master table.

Orphaned records are best explained using an example. In the database you have a customer table and an order table. Each order is linked to a particular customer using the customer’s primary key value stored in a field referred to as a foreign key. If the customer’s primary key is changed then all of the orders associated with that customer are no longer associated. These orders have become orphaned.

RI rules prevent this from happening by checking for associated records when an Update or Delete occurs. For instance if a CASCADE UPDATE rule was in place all of the records in the order table that were associated with the customer whose primary key changed would be updated to the new value. Using an RESTRICT UPDATE would not allow the user to change the primary key value if there were orders associated with the customer. There are two other rules SET DEFAULT and SET NULL which change the foreign  key to the specified default value or remove the value respectively.

Rules for deleting can also be defined using the same rules although they behave slightly different. A CASCADE DELETE will delete all records in the orders table that are associated with a customer which is deleted. I would recommend avoiding use of CASCADE DELETE rules in general. A RESTRICT DELETE is a good option because it prevents the customer from being deleted if there are related records in the order table. I like the SET DEFAULT UPDATE rule since it will set the foreign key value to the default making these records easy to locate.

Another aspect of an RI rule is that a record in the child table cannot be inserted without a valid value from the parent table in the foreign key field. This ensures that these tables remain linked, validating the input into the child table. This can be a very useful aspect of RI when used in conjunction with domain tables.

A domain table contains information about a particular category typically containing a code and a description. For example a marital status table could contain the following information:

Code Status
S Single
M Married
D Divorced
W Widowed

We can now setup an RI rule relating an employee table to the marital status domain table. This will ensure that if the status code ever changes the change will be reflected in the employee table. It will also ensure that a valid value, one of the codes defined in the marital status table, is used for the marital status. This essentially creates a constraint on the valid data for that table. The RI rule uses the primary key from the marital status table (Code) and relates it to the foreign key in the employee table (MaritalStatus).

Referential Integrity Rule Diagram

With this relation in place an error (7076) is reported if the user enters a value for marital status which is not in the marital status table. You can catch this error in your application and provide a more detailed description of the error.

Relation Error

RI Rules are a powerful mechanism for ensuring logical data integrity, however, they do have a few drawbacks. First ensure that your RI rule does not prohibit legitimate data entry from occurring. Creating too many rules or rules on the wrong tables valid user input may be discarded. Second, enforcing RI rules consumes resources on the server. This can delay record inserts, updates and deletes. In general use RI for your most critical relationships. If the data must have specific values or be related for your application to function correctly then RI is an excellent solution.

Constraints provide a database level mechanism for ensuring data integrity. This becomes more important when more than one application accesses the data. By centralizing critical constraints there is less margin for error when using the data from multiple client applications.

No comments: