The Tech Law Guy: In The Data Trenches, Part 2

Spreadsheets store information in what is known as a “flat file” format – a chart X columns wide and Y rows long. In contrast, relational databases store information in “tables,” which are connected via “links” or “relationships.” These relationships allow smaller tables to be combined to provide all the information found in a larger file, but in a format that is easier to administer.

Consider this list of fields in a spreadsheet used as an address book:

Company Name

Address1

Address2

City

State

Zip

Contact Person

Contact Title

Contact Phone

Contact Cell

Contact eMail

This file is simple to build and simple to use. It can be quickly tied to a mailing program to automate correspondence with the contact person. It is also relatively easy to change if the contact has a change of title or phone number, or is replaced as the principal contact.

But this file does not provide an easy way to track a contact to any new assignment or employer. One might create a “Contacts” tab in a workbook, and keep individual information there. But that quickly creates the need to manually cross reference between the “Client Company” tab and the “Contacts” tab, which is inefficient and opens the door to errors. A relational database can perform such cross references automatically, by tying two different sets of data together. In this case, the two data sets might be laid out as follows:

CLIENTS                          CONTACTS

Company Name                     Contact Name
Address1                         Company Name
Address2                         Title
City                             Work Phone
State                            Mobile Phone
Zip                              eMail

The cross reference is the duplicate field “Company Name.” Information entered into the first table automatically becomes available in the second, typically as a drop-down selection. That is one of the great advantages of a relational database – when well designed, it requires information to be entered only once, but permits it to be used in many ways. But this strength rests on one of the greatest complications of relational databases – the need to “normalize” the tables or data sets.
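To make the "enter once, use many ways" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, the sample company and the sample people are all invented for illustration; a desktop database product would typically present the same link as a drop-down rather than as SQL.

```python
import sqlite3

# In-memory database; every name and value below is hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks links only when asked to

conn.executescript("""
CREATE TABLE clients (
    company_name TEXT PRIMARY KEY,
    address1 TEXT, address2 TEXT, city TEXT, state TEXT, zip TEXT
);
CREATE TABLE contacts (
    contact_name TEXT PRIMARY KEY,
    company_name TEXT REFERENCES clients(company_name),  -- the shared field
    title TEXT, work_phone TEXT, mobile_phone TEXT, email TEXT
);
""")

conn.execute(
    "INSERT INTO clients VALUES (?, ?, ?, ?, ?, ?)",
    ("Acme Corp", "1 Main St", "", "Springfield", "IL", "62701"),
)
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Pat Doe", "Acme Corp", "General Counsel", "555-0100", "555-0101", "pat@example.com"),
        ("Lee Roe", "Acme Corp", "Paralegal", "555-0102", "555-0103", "lee@example.com"),
    ],
)

# The company address lives in one place, so it is changed in one place.
conn.execute(
    "UPDATE clients SET address1 = ? WHERE company_name = ?",
    ("9 Oak Ave", "Acme Corp"),
)

def mailing_list(conn):
    """Combine each contact with the address of the company it belongs to."""
    return conn.execute("""
        SELECT c.contact_name, c.title, cl.address1, cl.city
        FROM contacts AS c
        JOIN clients AS cl ON cl.company_name = c.company_name
        ORDER BY c.contact_name
    """).fetchall()

rows = mailing_list(conn)
for row in rows:
    print(row)
```

Both contacts pick up the new address even though it was edited only once, in the clients table. With the foreign key check switched on, an attempt to add a contact for a company that is not in the clients table is rejected, which is roughly the job a drop-down selection does in a desktop database.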

“Normalization” is the process of arranging tables and fields to avoid duplication. In practice, it is the process of breaking a large data set into component parts, and tying those parts together with relevant links or relationships.
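As a rough sketch of what "breaking a large data set into component parts" can look like, the function below splits flat address-book rows into a company table and a contact table. The field names come from the spreadsheet list earlier in this post; the function name `normalize` and the sample rows are assumptions made up for the example.

```python
COMPANY_FIELDS = ("Address1", "Address2", "City", "State", "Zip")
CONTACT_FIELDS = ("Contact Person", "Contact Title", "Contact Phone",
                  "Contact Cell", "Contact eMail")

def normalize(flat_rows):
    """Split flat rows into (companies, contacts).

    companies maps each Company Name to its address fields, stored once.
    contacts is a list of contact records, each carrying only the
    Company Name as its link back to the company table.
    """
    companies = {}
    contacts = []
    for row in flat_rows:
        name = row["Company Name"]
        # Keep the first address seen for a company; later rows add contacts only.
        if name not in companies:
            companies[name] = {field: row[field] for field in COMPANY_FIELDS}
        contact = {field: row[field] for field in CONTACT_FIELDS}
        contact["Company Name"] = name
        contacts.append(contact)
    return companies, contacts

# Hypothetical flat file: the company address is repeated on every row.
flat_rows = [
    {"Company Name": "Acme Corp", "Address1": "1 Main St", "Address2": "",
     "City": "Springfield", "State": "IL", "Zip": "62701",
     "Contact Person": "Pat Doe", "Contact Title": "General Counsel",
     "Contact Phone": "555-0100", "Contact Cell": "555-0101",
     "Contact eMail": "pat@example.com"},
    {"Company Name": "Acme Corp", "Address1": "1 Main St", "Address2": "",
     "City": "Springfield", "State": "IL", "Zip": "62701",
     "Contact Person": "Lee Roe", "Contact Title": "Paralegal",
     "Contact Phone": "555-0102", "Contact Cell": "555-0103",
     "Contact eMail": "lee@example.com"},
]

companies, contacts = normalize(flat_rows)
print(len(companies), "company,", len(contacts), "contacts")
```

The duplicated address collapses into a single company record, and each contact keeps only the company name needed to find it again.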

Consider some relevant questions about a client:

· What is their contact information?

· How many projects are we handling for them?

· Who is the contact for each project?

· What is the budget for each project?

· What are expenditures to date for each project?

· What is the current status of each project?

· What are the next steps for each project?

· What obstacles have been encountered?

Creating one flat file to capture these disparate types of information would tax even a spreadsheet expert. A relational database, however, permits each of these questions to be addressed in turn, using tables for Clients, Contacts, Projects, Expenses and Status.

CLIENTS                          CONTACTS

Company Name                     Contact Name
Address1                         Company Name
Address2                         Title
City                             Phone
State                            Mobile
Zip                              eMail

PROJECTS                         EXPENSES

Project                          Project
Company Name                     Expense Amt
Start Date                       Expense Type
Est Completion Date              Date
Contact Name                     Notes
Budget
Notes

STATUS

Project
Current Status
Status Date
Next Steps
Waiting For
Notes

The tables Clients, Projects, Expenses and Status are in order. Information flows from Clients to Projects via the Company Name field, while Projects, Expenses and Status are all tied using the Project field. The information for Company Name and Project need be entered only once, suggesting that this portion of the database has been normalized.
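The flow from Clients to Projects to Expenses and Status can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the column names are simplified versions of the fields listed above, several fields are left out for brevity, amounts are whole dollars, and the client, projects and figures are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients  (company_name TEXT PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE projects (project TEXT PRIMARY KEY,
                       company_name TEXT REFERENCES clients(company_name),
                       budget INTEGER);
CREATE TABLE expenses (project TEXT REFERENCES projects(project),
                       expense_amt INTEGER, expense_type TEXT, expense_date TEXT);
CREATE TABLE status   (project TEXT REFERENCES projects(project),
                       current_status TEXT, status_date TEXT, next_steps TEXT);
""")

# Hypothetical sample data: each company and project name is entered once.
conn.execute("INSERT INTO clients VALUES ('Acme Corp', 'Springfield', 'IL')")
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", [
    ("Lease Review", "Acme Corp", 5000),
    ("Trademark Filing", "Acme Corp", 2000),
])
conn.executemany("INSERT INTO expenses VALUES (?, ?, ?, ?)", [
    ("Lease Review", 1200, "Research", "2014-01-06"),
    ("Lease Review", 800, "Drafting", "2014-01-10"),
    ("Trademark Filing", 350, "Filing fee", "2014-01-08"),
])
conn.executemany("INSERT INTO status VALUES (?, ?, ?, ?)", [
    ("Lease Review", "In progress", "2014-01-15", "Send redline to client"),
    ("Trademark Filing", "Filed", "2014-01-09", "Await office action"),
])

def project_report(conn):
    """Budget, expenditures to date and current status for every project."""
    return conn.execute("""
        SELECT p.company_name, p.project, p.budget,
               (SELECT COALESCE(SUM(e.expense_amt), 0)
                  FROM expenses AS e
                 WHERE e.project = p.project) AS spent,
               s.current_status
        FROM projects AS p
        LEFT JOIN status AS s ON s.project = p.project
        ORDER BY p.project
    """).fetchall()

report = project_report(conn)
for line in report:
    print(line)
```

One query answers several of the client questions at once (budget, expenditures to date and current status per project) because the Project field ties the three tables together. The sketch assumes one status row per project; keeping a status history would call for picking the most recent row.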

The Contacts table, however, may be problematic. Note that it shares information with both Clients and Projects. While these relationships do not conflict, they will require that care be taken to ensure that the correct data sources are used when reports are generated or queries are run. A better approach might be to delete the Contact Name field from Projects, and create a separate table consisting of only Contact Name and Project.
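One way the separate Contact Name and Project table might look is sketched below, again with Python's sqlite3 module. The table name `project_contacts`, the helper functions and the sample data are hypothetical, and this is offered as one possible layout rather than a settled design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (contact_name TEXT PRIMARY KEY, company_name TEXT);
CREATE TABLE projects (project TEXT PRIMARY KEY, company_name TEXT);
-- Linking table: one row per contact/project pairing and nothing else.
CREATE TABLE project_contacts (
    contact_name TEXT REFERENCES contacts(contact_name),
    project      TEXT REFERENCES projects(project),
    PRIMARY KEY (contact_name, project)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO contacts VALUES (?, ?)", [
    ("Pat Doe", "Acme Corp"),
    ("Lee Roe", "Acme Corp"),
])
conn.executemany("INSERT INTO projects VALUES (?, ?)", [
    ("Lease Review", "Acme Corp"),
    ("Trademark Filing", "Acme Corp"),
])
conn.executemany("INSERT INTO project_contacts VALUES (?, ?)", [
    ("Pat Doe", "Lease Review"),
    ("Lee Roe", "Lease Review"),
    ("Pat Doe", "Trademark Filing"),
])

def contacts_for(conn, project):
    """Names of every contact linked to the given project."""
    rows = conn.execute(
        "SELECT contact_name FROM project_contacts "
        "WHERE project = ? ORDER BY contact_name",
        (project,),
    ).fetchall()
    return [name for (name,) in rows]

def projects_for(conn, contact_name):
    """Names of every project linked to the given contact."""
    rows = conn.execute(
        "SELECT project FROM project_contacts "
        "WHERE contact_name = ? ORDER BY project",
        (contact_name,),
    ).fetchall()
    return [project for (project,) in rows]

print(contacts_for(conn, "Lease Review"))
print(projects_for(conn, "Pat Doe"))
```

With this layout a project can have more than one contact and a contact can serve more than one project, while the Projects table no longer carries contact details of its own.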

We will explore this question in our next installment, on table design.

The Tech Law Guy

Friday, January 17, 2014

In The Data Trenches, Part 2 – Normalization
