2023 was a tremendous year for AI advancements. Almost every week, we were bombarded with the latest innovation in AI. Large language models (LLMs) were in particular focus, with state-of-the-art capabilities and demonstrators released at blazing speed in a race to the top of the mountain. Amid the flood of announcements, Microsoft CEO Satya Nadella said, “I hope that with our innovation, they will definitely want to come out and show that they can dance, and I want people to know that we made them dance.”
If you’re not one-up, you’re one-down.
With AI momentum picking up and new business use cases being revealed each month, organizations big and small are weaving AI strategy into the upper echelons of the CTO agenda. Rightfully so. Every organization should start investing in AI capabilities and build a progressive roadmap for enabling this technology across each business segment.
As you start to consider your goals and strategy for enabling AI in your organization, you'll find many use cases that a GPT model with hundreds of billions or even trillions of parameters can unlock, leveraging its general, pre-trained context to automate business processes or solve a specific problem. However, once you start to consider the nuances of your business models, your intellectual property, and the decades of institutional knowledge you've collected through blood, sweat, and tears, you'll want to consider something else as well: your data.

Data serves as the lifeblood of LLMs, shaping understanding, validating context, and building relationships between massive numbers of data points. Likewise, by bringing your enterprise data and knowledge to LLMs, you can achieve a higher degree of relevancy in data retrieval. Fine-tuning, prompt engineering, and retrieval-augmented generation (RAG) patterns are excellent ways to ground LLM results in the context of your business and within the parameters of your use case. By bringing your own data, you get better precision and accuracy in the outputs.
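To make the RAG idea concrete, here is a minimal sketch of the pattern: retrieve the enterprise documents most relevant to a question, then fold them into the prompt so the model answers from your data rather than its general training. This toy version uses bag-of-words cosine similarity in place of a real embedding model, and all function names and sample documents are illustrative, not any particular library's API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. A production RAG system
    # would use a trained embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank enterprise documents by similarity to the query; keep top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the model by injecting retrieved context into the prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days",
    "Standard shipping takes five business days",
    "Support hours are 9am to 5pm",
]
print(build_prompt("what is the refund policy", docs))
```

The prompt that comes out is what you would send to the LLM; the grounding happens entirely in which documents make it into the context.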
However, bringing your data can be challenging. Many organizations have knowledge silos where critical business data is isolated from complementary, value-added data sets. In such cases, a centralized data platform or data mesh could help break down those silos and provide the right data to the right use cases.
Data silos are just one of many things to consider as part of the data pillar in your journey. The data platforms your organization uses should be at the top of the list as well. You should ask critical questions: Can the platform scale to meet the needs of an intelligent application? Does the platform have the governance and security needed for proper data management? Are the data engineering tools advanced enough to work seamlessly with modern AI applications? Is there a capability to monitor and evaluate data quality?
Customers I work with routinely talk about the time spent on data preparation and cleansing. It's not uncommon for data wrangling to consume a significant share of a data scientist's time. Your data is not as clean as you think. When incorporating your enterprise data into AI models, verifying the accuracy of the data you intend to use is essential.
We’ll likely continue to see rapid evolution in AI as time progresses, but despite the latest advances in what AI can achieve, don’t forget to dance with your data as you embark on your strategy. It’s non-negotiable.
What are your thoughts? What are the steps you take in preparing your data for AI models? I would love to hear your insights.
