What is a data dictionary and how is it used when communicating and managing requirements?
Definition
A data dictionary is a collection of the definitions of the structure of information that is relevant to a set of requirements. That’s a lot of words for a simple concept. We need to know (and constrain) a set of information about some business element when managing our requirements. We use a data dictionary to define what that information is, and any constraints on how it must be used.
Viewing The System
When using object oriented analysis (OOA) as part of defining requirements, we represent business concepts as objects and processes. For example, an order management system might define orders as having line items and customers. We can represent that information graphically with a UML diagram like the following:
In prose, we could also capture the same information as follows:
- The system shall include a representation of customer orders.
- Each order will have a single associated customer, and each customer can have multiple orders. Note that a customer is not required to have any orders.
- Each order will have at least one, and possibly multiple line items. Each line item is uniquely associated with a single order.
- Each line item represents a single product. Note that a product is not required to be represented in a line item. A product can be represented in multiple line items (even within the same order).
While this diagram tells us about the structure at a high level, it doesn’t tell us enough information to go implement the solution. What exactly is a line item? What information does it contain? And what format must that information be in?
A Dictionary Entry
We could create a data dictionary entry for the line item object as follows:
Line Item
A line item represents a portion of a customer order that describes a product being ordered, as well as the quantity of that product being ordered. Each line item must include the following information:
- A reference to the product being ordered, using the product ID per constraint X1. [Note, the constraint is imposed by the existing product data management system, with which our software is required to integrate.]
- The quantity of the product being ordered, where the quantity is a positive integer. [Note, we would include a maximum value, if there were a constraint imposed by some other part of the system.]
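As a minimal sketch, the entry above could be written in a checkable form. Python is used here only for illustration, and the names `LineItem`, `product_id`, and `quantity` are assumptions rather than part of any real system. Constraint X1 is not spelled out in this post, so the product ID is treated as an opaque, non-empty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Illustrative line item entry: a product reference plus a quantity."""

    product_id: str  # format is governed by constraint X1, which is not modeled here
    quantity: int    # must be a positive integer; no maximum value is defined

    def __post_init__(self) -> None:
        if not isinstance(self.product_id, str) or not self.product_id:
            raise ValueError("a line item must reference a product ID")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(self.quantity, bool) or not isinstance(self.quantity, int):
            raise ValueError("quantity must be an integer")
        if self.quantity < 1:
            raise ValueError("quantity must be a positive integer")
```

With this sketch, `LineItem("P-100", 3)` is accepted, while `LineItem("P-100", 0)` raises a `ValueError`. The ID "P-100" is made up for the example.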
Note that we have not specified that a line item includes a price. It is very likely that a line item would have a price, but we would be specifying implementation details if we did. Pricing may be done per product, or may be unique for each product for any given customer. Discounts may be applied based upon quantity of products in a line item, or dollar amount for an order. Discounts may be applied based upon all products ordered by a customer over a period of time. These different possibilities are a function of the requirements of the system.
When those business requirements are defined, they will dictate the ownership of properties by business objects. With that information, we can include the data as appropriate. For example, a list price property may be defined for the product object, or a customer-price may be defined for a line-item as a function of (product, quantity, customer). We would add that data as part of the business modeling. Note that this is a description of the problem domain, not a description of the implementation.
Another Data Dictionary Example
Here’s an example of a “Customer” entry:
A customer represents the business or person for whom an order has been placed. Note that all character fields are to be represented in Unicode 4.1.0 or later per corporate policy ABC. A customer has the following information:
- Name. 50 characters representing the name of the customer.
- Shipping Address 1. 100 characters representing the first line of the address to which all customer shipments are made.
- Shipping Address 2. 100 characters representing the second line of the address to which all customer shipments are made.
- Shipping Country. 50 characters…
- Billing Address.
- Customer Contact.
- etcetera.
This list is intended to show all of the elements of information that must be present in the “customer object” to support the requirements of the system.
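One way to make an entry like this usable by tools is to record each field with its length limit and check records against it. The sketch below is illustrative only: `FieldDefinition`, `CUSTOMER_FIELDS`, and `find_violations` are hypothetical names, and only the fields whose lengths are stated above are included. It assumes that "characters" can be approximated by Python's `len()` (which counts Unicode code points) and that every listed field must be present.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    max_length: int
    description: str


# Only the fields with a stated length in the entry above are modeled.
CUSTOMER_FIELDS = (
    FieldDefinition("Name", 50, "name of the customer"),
    FieldDefinition("Shipping Address 1", 100, "first line of the shipping address"),
    FieldDefinition("Shipping Address 2", 100, "second line of the shipping address"),
    FieldDefinition("Shipping Country", 50, "description elided in the entry above"),
)


def find_violations(record: dict[str, str]) -> list[str]:
    """Return one message per field that is missing or exceeds its length limit."""
    problems = []
    for field in CUSTOMER_FIELDS:
        value = record.get(field.name)
        if value is None:
            problems.append(f"{field.name}: missing")
        elif len(value) > field.max_length:
            problems.append(
                f"{field.name}: {len(value)} characters exceeds {field.max_length}"
            )
    return problems
```

A record that satisfies every limit yields an empty list, so the same definitions can serve both as documentation and as a simple validation step during a data migration.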
Further Reading
Joe, at Seilevel, wrote a post back in March with a good explanation of data dictionary entries. As Joe points out, requirements can drive the need for specific information.
For example, my business users have told me that the number of decimal places of each weight value tracked by the system is very important for monitoring and reporting. It stands to reason that other objects and attributes might require the same level of specification. If you figure it out once, you can use it in many places.
Barbara, at B2T Training, points out the importance of understanding the details of the data for a system. She also touches on the value of having that information in a separate document.
Many BAs document data as part of the business process or part of the Use Case. Our recommendation is that you document data in a separate part of the requirements package because it is often used in multiple places.
Usage
A data dictionary should be defined as a repository of all data definitions like the examples above. Those entries should be referenced in all requirements documents that rely on the defined objects. Requirements documents should not specify the content of the objects; they should defer to the referenced dictionary entries.
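The idea of requirements deferring to dictionary entries can be sketched as a simple cross-reference check. Everything here is hypothetical: the requirement IDs, the `DICTIONARY` and `REQUIREMENTS` structures, and the function name are assumptions made for illustration, not a description of any particular requirements tool.

```python
def unresolved_references(dictionary_entries, requirements):
    """Map each requirement ID to the referenced entries the dictionary lacks."""
    known = set(dictionary_entries)
    missing = {}
    for req_id, referenced in requirements.items():
        unknown = sorted(set(referenced) - known)
        if unknown:
            missing[req_id] = unknown
    return missing


# The dictionary holds the definitions; requirements only name the entries they use.
DICTIONARY = {
    "Line Item": "A portion of a customer order: a product reference and a quantity.",
    "Customer": "The business or person for whom an order has been placed.",
}

REQUIREMENTS = {
    "REQ-1": ["Customer", "Line Item"],
    "REQ-2": ["Product"],  # no "Product" entry has been written yet
}

print(unresolved_references(DICTIONARY, REQUIREMENTS))
# prints {'REQ-2': ['Product']}
```

A check like this flags requirements that rely on an object nobody has defined yet, which is the gap a shared dictionary is meant to close.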
Some projects, especially migration projects, have many constraints tied to data formats and structure. These projects will have extensive data dictionaries, and multiple references to entries throughout the requirements document. Other projects will have far fewer constraints on data formats, but will still have explicit structural definitions for business objects (like our line item example).
– – –
Check out the index of the Foundation Series posts, which will be updated whenever new posts are added.
Scott,
The examples in your post could have been examples of items in a glossary, items in a data dictionary, or descriptions given for the UML model. In our products, we usually declare only one of them to be used as true requirements.
Could you elaborate more on the differences and when you use each of these three?
I’m used to using the data dictionary as a design artifact; do you do the same? Do you propose using native (database) data types in your data dictionary?
BTW,
A nice way to add documentation to a UML model is the use of OCL, the Object Constraint Language, which allows a VERY precise specification of all kinds of business rules.
Harry Nieboer
Thanks for the comment Harry (and as a long time reader, thanks for sticking around!).
Like you, I have used the data dictionary as an artifact, but managed it within and as part of the requirements.
The differentiator I’ve used in the past for glossary vs. dictionary is as follows: If it is an industry standard term (like “gross margin”), a definition or reference is included in a glossary of terms (which is explicitly informative, but not an artifact). If it is a company-specific term, calculation or reference (like “shipping costs are based on weight”), it is part of the data dictionary – as a definition of how to calculate shipping for this customer.
On the projects where I’ve used OOA in addition to structured requirements, the OOA diagrams have been reference documents for clarification and understanding. The diagrams can be so informative, but their lack of atomicity makes them almost impossible to manage in a requirements system.
If you or anyone knows a better or different way to manage the diagrams, please add a comment and let us know.