When we design relational schemas for databases, there are many tasks that rely on, or at least benefit from, the closure of attributes with respect to a given set of functional dependencies (FDs). However, it could take us a lot of efforts to enumerate all the combinations of the attributes and solve each closure of them, which will grow exponentially with the number of attributes.
Consider a relation R with attributes A1, A2, …, An. The number of the combinations of no attribute, one attribute, to all attributes, are n choose 0, n choose 1, …, n choose n, respectively, which are the binomial coefficients of n. The sum of them equals 2 raised to the power of n.
A First Course in Database Systems, Third Edition by Jeffrey D. Ullman, and Jennifer Widom shows us in Example 3.13 that there are two kinds of obvious simplifications.
- Closing the empty set and the set of all attributes cannot yield a nontrivial FD.
- If we already know that the closure of some set X is all attributes, then we cannot discover any new FD’s by closing supersets of X.
Apart from these, I have not read other written ways of simplification. I supposed that it would be required to close other combinations of attributes with brute-force, and only by then could I spot useful closures and useless ones.
By saying ‘useless’, I mean closures that are: 1) deduced from trivial FDs only, e.g. {A, B}+ = {A, B}, or 2) augmented by only trivial FDs, e.g. {A, C}+ = {A, B, C} when {A}+ = {A, B}, except if this closure indicate a key with a minimal initial set, in the last example if ABC are all the attributes and AC is the minimal possible combination to get them.
The idea of brute-force worked me through quite a few exercises until I stumbled upon an exercise over a relation with 6 attributes but only 4 FDs, excerpted here:
Consider a relation Stocks(B, O, I, S, Q, D). Let the set of FDs for Stocks be S→D, I→B, IS→Q, and B→O.
A First Course in Database Systems, Third Edition. Exercise 3.5.3.
The combinations of the attributes, excluding the empty set and universal set, are:
- B, O, I, S, Q, D;
- BO, BI, BS, BQ, BD, OI, OS, OQ, OD, IS, IQ, ID, SQ, SD,QD;
- BOI, BOS, BOQ, BOD, BIS, BIQ, BID, BSQ, BSD, BQD, OIS, OIQ, OID, OSQ, OSD, OQD, ISQ, ISD, IQD, SQD;
- BOIS, BOIQ, BOID, BOSQ, BOSD, BOQD, BISQ, BISD, BIQD, BSQD, OISQ, OISD, OIQD, OSQD, ISQD;
- BOISQ, BOISD, BOIQD, BOSQD BISQD, OISQD.
With the contrast between the number of the combinations and the number of the FDs, I feel neither worthy to list all the combinations nor easy to solve all the closures correctly. Let alone relations with even more attributes! I should find some other rules to make the problem simpler.
Consider the process of closing attributes regarding a set of FDs. Attributes in the initial set (left side of the equal sign) of closures will
- contribute themselves to the resulting set (right side of the equal sign), always;
- contribute the right-hand side (RHS) of any FD, if they satisfy its left-hand side (LHS);
- contribute nothing else, if they do not satisfy the left-hand side of any FD.
Thus generally, on one hand, attributes won’t contribute anything more than themselves to closures if they don’t appear in any FD’s LHS, i.e. (1) attributes that may contribute more than themselves to closures all reside in the LHS of the given FDs. On the other hand, all the FDs won’t contribute more than their both sides, i.e. (2) attributes that contribute only themselves to closures may become indispensable for a key, sometimes.
Claim (1) makes me think that we should start with combinations of the LHS of the given FDs. Beware, though, of the trivial situations when some RHS and LHS of individual FDs overlap. For example, with AB→C and CD→E, ABD, instead of ABCD, will suffice to close to ABCDE.
This consideration leads to another claim: (3) An FD with part of its LHS in a closure’s resulting set can contribute its RHS if and only if the closure is augmented with at least the rest part of the FD’s LHS. Of course, we should try to find the ‘least’ of such part.
With these 3 claims, I’m ready to present my
Procedure of getting closures of a relation given a set of FDs:
- Start with any FD. Solve the closure of its LHS with respect to all the FDs. (Claim 1)
- Find candidates for augmentation by solving every set difference of any other FD’s LHS and this closure’s resulting set, respectively, i.e. attributes in the FD’s LHS but not in the closure’s resulting set. (Claim 3)
- Augment the closure with the set differences from the last step, one by one, generating a series of new closures.
- For any newly generated closure, repeat steps 2-3, until no new closure can be found.
- Repeat the above steps for all other FDs.
- Finally, augment every closure with all the attributes that are absent from its resulting set, to reveal keys of the relation. (Claim 2)
- Anytime, ignore a closure whose initial set is a superset of some other closure’s initial set and is a subset of that closure’s resulting set.
This procedure can reduce the amount of work when the number of FDs are much fewer than the number of combinations of attributes, which is often the case when attributes reach more than four.
Let’s try this procedure on the Stocks exercise.
With S→D, we have S+=SD at first. The candidates for SD are I, and B. Augmented closures are SI+=SIDBQO and SB+=SBDO. SI is a key. The candidates for SBDO is I. The augmented closure is SIB+ can be ignored since SIB is a superset of SI and a subset of SI+, or simply put, SI is already a key.
With I→B, we have I+=IBO at first. The candidate for IBO is S. We have solved IS+ already.
With IS→Q, we have solved IS+ already. There is no candidate for augmentation in any way, since it’s already a key.
With B→O, we have B+=BO at first. The candidates for BO are S, and I. Augmented closures are BS+ and BI+. We have solved BS+ already. BI+ can be ignored since BI is a superset of I and a subset of I+.
Until now, we have only these useful closures:
S+=SD, SI+=SIDBQO, SB+=SBDO, I+=IBO, B+=BO.
Augment S+ with IBQO, since SIBQO is a superset of SI and a subset of SI+, it can be ignored. The same applies for the rest attempts to find a key other than SI.
Thanks to the simplicity of the FDs, we can solve all the useful closures with just 10 enumerations in this exercise, much less than all the combinations of the attributes.