GenBG – How to generate an effective Business Glossary

Ralf Teschner
Jul 10, 2025

Customer Status. Supplier Type. Asset Value. Employee Date of Birth. Accounts Payable… Every large organization has tens of thousands of such data elements, even if fewer than 5% of them should be considered ‘critical’.

It is for these Critical Data Elements (CDEs) that you need detailed business term definitions, typically about a page long. Such definitions provide clarity on meaning, avoid confusion and drive consistency across departments, systems, countries and business units. Use cases include transformation programmes, cloud migrations, data monetization projects, management dashboards, data migrations, agile decision-making, and data mesh/fabric ambitions.

But developing such detailed definitions is fiendishly difficult. Why? Because you need to agree on common terminology and taxonomy for all affected systems and teams. You’ll need a deep understanding of the business and IT landscape. You’ll want to capture current data issues that require resolution. You’ll need editing skills that pull all this together concisely. And you had better have strong diplomatic skills to broker compromise wording. As a result, most organizations do not have a high-value, effective Business Glossary.

The Top 11 Myths around Business Glossaries

1. “A business term definition is just a paragraph! And you can find it on Google, or from the SAP/Oracle/Azure data model!”
If only it were that simple. A short sentence may be enough for simple concepts such as ‘Postcode’. But a single-sentence approach is insufficient for CDEs that are core to your primary applications, sit at the interface to other key systems, feed into management KPIs, often have a prescribed list of values, or suffer from data quality issues.

2. “A Business Glossary (BG) is the same as a Data Dictionary (DD).”
Not true. There’s a DD for each system, listing every single field with lots of technical details. The BG is enterprise-wide, provides the business context, focuses on CDEs only, and effectively sits on top of all DDs.
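
The relationship between one enterprise-wide term and the many system-level fields beneath it can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names, the systems, tables and columns, and the synopsis wording are all hypothetical and do not come from any particular Data Catalog product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DictionaryField:
    """One field in a system-level Data Dictionary (technical detail)."""
    system: str
    table: str
    column: str
    data_type: str


@dataclass
class GlossaryTerm:
    """One enterprise-wide business term that sits on top of many fields."""
    name: str
    synopsis: str
    mapped_fields: list[DictionaryField] = field(default_factory=list)

    def systems(self) -> list[str]:
        """Return the distinct systems that hold this term, sorted by name."""
        return sorted({f.system for f in self.mapped_fields})


# Hypothetical example data: system, table and column names are invented.
customer_status = GlossaryTerm(
    name="Customer Status",
    synopsis="Indicates whether a customer is active, dormant or closed.",
    mapped_fields=[
        DictionaryField("CRM", "accounts", "status_cd", "CHAR(1)"),
        DictionaryField("Billing", "cust_master", "cust_stat", "VARCHAR(10)"),
    ],
)

print(customer_status.systems())  # ['Billing', 'CRM']
```

The point of the shape is that the business context lives once, on the term, while each Data Dictionary keeps its own technical entry.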

3. “Only the department entering the data needs to write the definitions as they know best.”
But what about all those departments that are heavy users of this data? They are strongly affected by how the data is described, so they should have a say in its definition.

4. “A BG is just a list of acronyms.”
Many long policy documents end with a list of all the abbreviations used in the text and then call it a Glossary. But that’s insufficient for a wide variety of BG users.

5. “Excel is fine.”
An effective BG allows for Wikipedia-style links between definitions, policies, systems etc. to show their relationships. A BG tool is far more effective in governing definitions, auditing their history, sustaining their content and communicating it enterprise-wide. Excel can’t reliably do any of this. You need a tool, which typically comes as part of a Data Catalog.
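
To make the idea of Wikipedia-style links concrete, here is a minimal sketch of how cross-references between definitions could be detected. It assumes a plain list of term names and simple whole-phrase matching; the function name `find_term_links` and the sample text are hypothetical, and a real Data Catalog tool will have its own linking mechanism.

```python
import re


def find_term_links(text: str, term_names: list[str]) -> list[str]:
    """Return the glossary terms mentioned in `text`, sorted alphabetically.

    Matching is case-insensitive and on whole phrases only.
    """
    found = []
    for name in term_names:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return sorted(found)


# Hypothetical definition text referencing other glossary terms.
terms = ["Customer Status", "Supplier Type", "Asset Value"]
definition_text = (
    "Customer status determines which accounts are included "
    "in Asset Value reporting."
)

print(find_term_links(definition_text, terms))
# ['Asset Value', 'Customer Status']
```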

6. “A BG is used only by business folks.”
We find an almost even split between Business and IT stakeholders in using the content of a BG, including Data Analysts, Business Analysts, Data Architects, Data Engineers, Data Stewards and Data Scientists.

7. “A definition is needed for every single data element, so also for the 95% of data elements that are not CDEs.”
You can, if you want, insert a definition for House Number, Transaction Currency, or Credit Card Expiry Date, but there is very little value in it, as nobody would look up a definition for such obvious concepts.

8. “GenAI will take care of it.”
If only. But there are two major ways in which GenAI can indeed help build your BG. More on that later.

9. “Data Quality and data policies have nothing to do with a BG.”
Well, if you want to measure data quality, you need to know what exactly to measure it against. And your data policies, processes and procedure documents make frequent reference to core data elements, so readers of these policies need to understand exactly what these concepts mean.
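
As a rough illustration of measuring data quality against a definition, the sketch below checks recorded values against the list of values that a definition prescribes. The allowed values, the sample records and the function name `conformance_rate` are assumptions made for this example only.

```python
def conformance_rate(values, allowed_values):
    """Share of populated values that appear in the definition's list of values.

    Empty strings and None are treated as unpopulated and ignored here;
    completeness would be measured by a separate rule.
    """
    populated = [v for v in values if v not in (None, "")]
    if not populated:
        return 0.0
    valid = sum(1 for v in populated if v in allowed_values)
    return valid / len(populated)


# Hypothetical list of values taken from a 'Customer Status' definition.
allowed = {"Active", "Dormant", "Closed"}
records = ["Active", "Closed", "ACTV", "Dormant", None, "Active"]

rate = conformance_rate(records, allowed)
print(f"{rate:.0%}")  # 80%
```

Without the agreed list of values in the definition, there would be nothing to compare the records against.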

10. “A definition is a passive document.”
On the contrary, a well-composed definition usually provides several prompts for change actions, e.g. improving data quality, aligning terminology between departments, standardizing the list of values between systems, installing additional data entry controls etc.

11. “We can call it Data Glossary, Business Dictionary, Data Lexicon and data definitions.”
Try not to make up your own language. The standard industry terms are Business Glossary and business term definitions. So it’s best to stick with the norms used by tool vendors, Gartner, consultancies, partners, regulators etc.

What makes a good business term definition?

So what should a good definition contain? Obviously a high-level synopsis sentence that is good enough for the superficial user. Then sample data values, or even the full list of values if it’s a reference data standard with a drop-down menu. The definition should describe exceptions as well as synonyms and homonyms. It must show the business lineage (if this hasn’t already been created by your Data Catalog).

And then there is a whole range of critical questions to answer:

  • Why is this an important data concept? Which Business functions/stakeholders use the data, when, in which processes, and with which impact?
  • How is the data entered, cascaded, maintained, archived, deleted? By whom?
  • Which Analytics reports/forecasts, AI systems, LLMs etc. require this data element?
  • What are the data quality rules, e.g., duplication, expected data quality levels etc.?
  • Do all corporate manuals/policies/training material etc. reference this concept consistently?
  • Which parts of the enterprise are exempt and why? Which systems do not receive this data even though they should?
  • Are there data interoperability requirements with partner organizations or regulators?
  • How is this concept relevant for the company’s internal and external compliance requirements?
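
The elements and questions above can be sketched as a simple record with a completeness check, which is one way a working group might track what a draft definition still lacks. The field names and the sample draft are hypothetical and are not a prescribed template.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BusinessTermDefinition:
    """Illustrative container for the sections of one CDE definition."""
    name: str
    synopsis: str = ""
    sample_values: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)
    business_importance: str = ""
    lifecycle: str = ""  # how data is entered, maintained, archived, deleted
    consuming_reports: list[str] = field(default_factory=list)
    data_quality_rules: list[str] = field(default_factory=list)
    exemptions: str = ""
    compliance_relevance: str = ""

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical draft with only the first few sections filled in.
draft = BusinessTermDefinition(
    name="Supplier Type",
    synopsis="Classifies a supplier by the kind of goods or services provided.",
    sample_values=["Raw materials", "Logistics"],
)

print(draft.missing_sections())
```

Not every CDE needs every section, so an empty entry is a prompt for discussion rather than an error.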

So it’s no wonder that definitions for CDEs often end up being a page long, even if not all of them need to answer all of these questions. And developing them in cross-departmental working groups, even if leveraging existing authoritative definition documentation, is labour-intensive and can easily take a year or more to complete.

This is a challenge even for the most organized Chief Data Officer with the best team of Data Stewards, especially once you realize there could easily be about 1000 CDEs across all critical data domains (Customer, Product, Supplier, Employee, Asset, Material, Location, Financial, Inventory, Sustainability, etc.).

This is why ѻý has developed its Business Glossary Library with 900+ out-of-the-box business term definitions, based on real-world client work over many years. Organizations using these generic definitions would still need to tailor them to the specifics of their business structures and systems but could do this now in less than a third of the time, and to much higher levels of quality.

ѻý has partnered with Collibra and SAP in providing this content through a leading Data Catalog platform, and, where possible, mapped each definition to the relevant SAP S/4HANA data table and field name. This saves implementation teams many weeks of detective effort and increases implementation quality.

Let’s close on an optimistic note about GenAI, which has already made a difference in the world of Business Glossaries. First, it augments the development of generic definition material through sophisticated prompt engineering in tools such as ChatGPT, Gemini AI and Meta AI (each useful in different ways, and constantly improving).

Second, GenAI enables the harvesting of existing definition material that lies scattered across an organization’s data dictionaries, lexicons, policies, glossaries and data models. Better yet, most enterprises now have their own in-house GenAI platform covering their thousands of documents and effectively representing their corporate history, DNA and knowledge base.

GenAI can find and scrape relevant text passages more quickly and accurately than humans can, even if human oversight of the end product will never go away.

So, all of a sudden, the world of Business Glossaries has become easier, quicker and much more exciting. ѻý calls it GenBG. Come and talk to us.

About the author

Ralf Teschner

Global Data Trust Lead, ѻý & Data, ѻý
Innovative thought-leader with 32 years of experience in Enterprise Data Management consulting, based in London, UK. More than 15 years of building and leading Data Governance and Data Catalogue programmes at executive level at 20+ organisations across several industries. Responsible for delivering business value in collaboration with the Business and IT. Highly successful track record as a change agent with a focus on quality, customers, and growth. Core skills include: Business Glossary, Data Strategy, Data Catalogue, Data Quality, Data Protection, rigorous implementation, facilitating joint solution approaches across complex matrix organisations, team management, development of new methodologies.