Data is the new currency of digital transformation. Whether it’s providing new insights, improving decision making, or driving better business outcomes, enthusiasm for unlocking the power of data has never been greater. Internally at Microsoft, our data governance practices are essential in helping ensure that data at Microsoft is optimized for any use—enabling deeper insights across our organizational and functional boundaries.
In the simplest terms, data governance is about managing data as a strategic asset. It involves ensuring that there are controls in place around data, its content, structure, use, and safety. To provide effective data governance, we need to know what data exists, whether the data is of good quality, whether the data is usable, who’s accessing it, who’s using it, what they’re using it for, and whether the use cases are secure, compliant, and governed.
As modern businesses embrace advanced analytics, artificial intelligence, and machine learning, the amount, velocity, and variety of data are increasing. With all that data comes a wealth of new possibilities, and a new set of challenges. Our ability to optimize the management and governance of ever-greater amounts of data is essential.
Different data types require different controls to ensure that systems handle, store, and use the data correctly. The traditional top-down method Microsoft Digital Employee Experience (MDEE) was using for data governance wasn’t scalable. It left us little time to do more than reactively address data issues as they occurred. We needed a scalable approach that could use automated controls, engineered into the process, to address the root causes of data issues during every stage of the data lifecycle.
Our approach to data governance
Rather than viewing data governance as a blocking function, or a gatekeeper in the enterprise, MDEE saw data governance modernization as a way to democratize data responsibly. Widely accessible, trusted, and connected enterprise data makes intelligent experiences possible, and powers the wider digital transformation at Microsoft.
We are transforming how we provide data governance by introducing scalable, automated controls for data architecture and lifecycle health, and by advancing the appropriate use of data. As illustrated below, modern data governance is the foundational pillar upon which Microsoft has built its overall Enterprise Data Strategy.
We created our overall Enterprise Data Strategy in response to an increasing demand for the right intelligence to power experiences at every touchpoint inside and outside Microsoft. At the same time, the increased demand amplified the pressure to better govern the data and manage regulatory requirements across an ever-expanding data landscape. Trying to address data issues one at a time, as they arose, was expensive and inefficient. Without a centralized, scalable, and automated way to address the root causes of these data issues, our analytics capabilities would continue to decline, as would our user satisfaction rating for Microsoft’s data-centric apps.
We developed a more modern data governance strategy with five goals in mind:
- Reduce data duplication and sprawl by building a single Enterprise Data Lake (EDL) for high-quality, secure, and trusted data.
- Connect data from disparate silos in a way that creates opportunities to use that data in ways not possible in a siloed approach.
- Power responsible data democratization across Microsoft.
- Drive efficiency gains in the processes Microsoft employs to gather, manage, access, and use data.
- Meet or exceed compliance and regulatory requirements without compromising Microsoft’s ability to create exceptional products.
Our approach to modern data governance has two key components. First, we define clear data standards and build them into our application development process. This move helps us automate and proactively manage data governance issues and data policy compliance. Second, we use the EDL platform to centralize the data and to scan and monitor it systematically.
Creating a clear set of data standards built into the engineering process
Much of our early effort focused on creating the formalized data standards that we wanted to build into the engineering process. It was natural for us to look to our core strength, engineering, when addressing business problems. We then drive every formalized data standard into our modern engineering process. Having clear data standards and providing compliance measurements against those standards is key to our change management approach for data governance.
Microsoft Azure DevOps helps auto-generate and manage the data governance backlog
After authoring data standards, we used Microsoft Azure DevOps (ADO)/Microsoft Visual Studio to automate the ways our systems generate, assign, and track data governance work. For example, when an engineering project reaches a certain milestone, we have the application owner complete a data governance assessment. That assessment results in automatically generated work items in the project’s backlog.
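The assessment-to-backlog step can be illustrated with a minimal Python sketch. It is not our actual Azure DevOps integration and makes no ADO API calls: the `WorkItem` type, the `generate_backlog` function, and the standard identifiers are all hypothetical names chosen for illustration. It assumes an assessment can be reduced to a yes/no answer per data standard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical stand-in for a backlog work item."""
    title: str
    standard_id: str
    assigned_to: str


def generate_backlog(assessment: Dict[str, bool], owner: str) -> List[WorkItem]:
    """Create one work item for every standard the assessment marks as unmet."""
    return [
        WorkItem(title=f"Meet data standard {std}", standard_id=std, assigned_to=owner)
        for std, met in sorted(assessment.items())
        if not met
    ]


# Illustrative assessment: standard ID -> whether the application meets it.
assessment = {
    "DS-001-retention": True,
    "DS-002-classification": False,
    "DS-003-lineage": False,
}
backlog = generate_backlog(assessment, owner="app-owner@example.com")
for item in backlog:
    print(item.title, "->", item.assigned_to)
```

In a real pipeline, the generated items would be pushed to the project's tracking system rather than printed; that step is deliberately left out here.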
Measuring our compliance against the data standards
To measure the progress of our data governance efforts, we are defining the metrics that matter to create Microsoft Power BI-based scorecards that explicitly show data standards alignment. For each standard, the central data governance office will actively monitor assessment exceptions, so that application owners can complete their required data governance work.
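As a rough illustration of the kind of metric such a scorecard might show, the following Python sketch computes the percentage of assessed applications that comply with each standard. The record layout, the `compliance_by_standard` function, and the sample data are assumptions made for this example; they do not describe our actual Power BI data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def compliance_by_standard(
    results: Iterable[Tuple[str, str, bool]]
) -> Dict[str, float]:
    """Return {standard_id: percent of assessed applications that comply}.

    Each result is a hypothetical (application, standard_id, compliant) record.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for _app, standard_id, compliant in results:
        totals[standard_id] += 1
        if compliant:
            passed[standard_id] += 1
    return {
        std: round(100 * passed[std] / totals[std], 1)
        for std in totals
    }


# Illustrative assessment results, not real data.
results = [
    ("app-a", "DS-001", True),
    ("app-b", "DS-001", False),
    ("app-c", "DS-001", True),
    ("app-a", "DS-002", True),
]
scorecard = compliance_by_standard(results)
print(scorecard)
```

A table shaped like `scorecard` is the sort of aggregate a reporting tool could chart per standard; exceptions would be tracked separately.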
Centralizing data in the Enterprise Data Lake
As part of Microsoft’s Enterprise Data Strategy, we have been making key investments in the modern data foundations that enable modern data governance’s role in ensuring the responsible democratization of data. Centralizing data assets is key in reducing the amount of redundant and outdated copies, understanding who has access, and understanding how they are using the assets. Data governance optimizes our infrastructure resources and uses services and automation to proactively scan data for potential issues, rather than reacting to issues as they occur.
We have begun moving data from disparate sources across Microsoft into our Enterprise Data Lake (EDL). The EDL is built on Azure Data Lake Storage and leverages Azure Data Services. The EDL not only consolidates the data, it also creates a centralized source of truth where enterprise data can be collected, shaped into trusted forms, secured, made accessible, and managed by applicable governance controls. Moving everything to a single EDL enables scalable, systematic data scanning without having to individually scan thousands of databases across the enterprise.
Scalable and automated engineering solutions help proactively manage data governance
Microsoft integrates automated and scalable services into the EDL. These services help proactively automate data management, data quality management, data security, data access management, and compliance. This integration means various teams that are onboarding to the EDL don’t have to invest in engineering solutions to benefit from the built-in services and automation—they are applied consistently across all data.
Scanning for data issues in the Enterprise Data Lake
Regular scanning in the EDL finds data issues so they can be fixed and then prevented at the systems of record and systems of engagement. We are building out proactive solutions by engineering checks and guardrails directly into our processes. These moves help prevent data governance issues by design. The EDL’s capabilities and services include built-in scanning for data security, access management, compliance, and a host of other defined data controls. Both the data foundations team and the data publishers receive notifications of compliance violations.
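The scan-and-notify pattern described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the EDL's scanning service: the `Dataset` fields, the two controls, and the mailbox address are hypothetical. The point it shows is that every violation is routed to both the central team and the dataset's publisher.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

FOUNDATIONS_TEAM = "data-foundations@example.com"  # hypothetical mailbox


@dataclass(frozen=True)
class Dataset:
    name: str
    publisher: str
    classification: Optional[str]
    has_owner: bool


# Hypothetical controls: each returns True when the dataset is compliant.
CONTROLS: Dict[str, Callable[[Dataset], bool]] = {
    "classification-present": lambda d: d.classification is not None,
    "owner-assigned": lambda d: d.has_owner,
}


def scan(datasets: List[Dataset]) -> List[Tuple[Dataset, str]]:
    """Run every control against every dataset and collect the failures."""
    return [
        (d, name)
        for d in datasets
        for name, check in CONTROLS.items()
        if not check(d)
    ]


def route_notifications(violations: List[Tuple[Dataset, str]]) -> Dict[str, List[str]]:
    """Send each violation to the central team and to the dataset's publisher."""
    inbox: Dict[str, List[str]] = {}
    for dataset, control in violations:
        message = f"{dataset.name}: failed control '{control}'"
        for recipient in (FOUNDATIONS_TEAM, dataset.publisher):
            inbox.setdefault(recipient, []).append(message)
    return inbox


datasets = [
    Dataset("sales", "sales-team@example.com", "confidential", True),
    Dataset("hr_raw", "hr-team@example.com", None, True),
]
inbox = route_notifications(scan(datasets))
for recipient, messages in inbox.items():
    print(recipient, messages)
```

Keeping the controls in one table means a new control applies to every dataset in the lake without changes by the publishing teams.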
The Enterprise Data Catalog improves discoverability
To provide effective data governance we need a full view of all data assets. We need to know where the assets exist, who is accessing them, and how users are interacting with the data. This visibility is needed for managing fragmentation, sprawl, and redundant or outdated copies of data assets that can exist across multiple platforms.
The Enterprise Data Catalog helps drive data governance. It does so by building controls into the catalog’s data-discovery process. These controls ensure that only people with the appropriate need and authority can access sensitive data stored in the EDL. This promotes compliance with government regulations through processes, patterns, and tools for data management and governance of data assets. The EDL metadata service sends metadata published to the EDL to the catalog for discovery. The service also registers broader data sources—transactional data systems, retention policies, and master data, for example—in the catalog.
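To make the "appropriate need and authority" control concrete, here is a small Python sketch of an access decision at discovery time. The `CatalogEntry` and `User` types and the rule itself are assumptions for illustration, including the simplification that non-sensitive entries are open to everyone; the catalog's real policy model is not described in this article.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    sensitive: bool


@dataclass(frozen=True)
class User:
    alias: str
    approved_datasets: FrozenSet[str]  # datasets with a documented business need
    cleared_for_sensitive: bool        # authority to handle sensitive data


def can_access(user: User, entry: CatalogEntry) -> bool:
    """Allow sensitive data only when the user has both need and authority.

    Assumption for this sketch: non-sensitive entries are open to all users.
    """
    if not entry.sensitive:
        return True
    return user.cleared_for_sensitive and entry.name in user.approved_datasets


analyst = User("analyst", frozenset({"payroll"}), cleared_for_sensitive=True)
print(can_access(analyst, CatalogEntry("payroll", sensitive=True)))
print(can_access(analyst, CatalogEntry("health_claims", sensitive=True)))
```

Requiring both conditions, rather than either one, reflects the wording above: need alone or authority alone is not enough to reach sensitive data.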
Modern governance with assessment-based models and evidence-based results
At Microsoft, we find evidence-based flagging is the most compelling way to motivate data producers and data owners to address the underlying gaps that cause data issues. Thus, “evidence at scale” is the fundamental reason we’ve modernized our data governance program around the two-pronged approach of embedded data standards coupled with a scannable EDL platform. Using this new approach, we can detect data issues before they spread, and we can drive data compliance with multiple organizations at once. We’re able to use scanners to show engineers where data compliance gaps exist before data products get published into production. And most importantly, we can sustain this model because it’s simply part of the everyday rhythm of the business.
Things to consider when planning your own data governance strategy
Though it’s early in our journey toward modern data governance, we do have a few best practices to share. Primarily, we recommend that you address your data governance strategy holistically. As illustrated below, we designed our approach so that standards embedded into the engineering process and data centralization on the modern data foundation work together to ensure end-to-end modern data governance.
- Build standards into your existing process and implement them as engineering solutions. By approaching data governance during the design phase of the larger Enterprise Data strategy, we have been able to institutionalize “governance by design” into the engineering DNA—and apply it to data at every touchpoint. We are building our data governance controls into the centralized analytics infrastructure and analytics processes.
- Consider implementing a modern data foundation with integrated toolsets. The EDL, with its built-in governance services and capabilities, does more than scale data governance efforts—it enables enterprise analytics for the whole organization. You can plan for federated analytics upfront by using a shared data catalog and data lake platform as your centralized analytics infrastructure. By centralizing data and bringing compute to the data rather than the other way around, you can reduce the amount of duplicated or fragmented data.
- People and processes are just as important as tools and infrastructure. We are embracing and promoting a data culture mindset. MDEE is encouraging business and data owners across the company to onboard their data into the EDL. It can be challenging to buy into using a new platform and new processes, particularly when business owners and data owners feel like what they have is working for them. We commonly use a variety of methods, including communication campaigns and gamification, to drive early adoption at Microsoft. Measuring and reviewing daily and monthly active usage is also helpful during mid- and late-stage adoption. MDEE has been encouraging adoption by providing evidence-based results that demonstrate that adopting our modern data governance strategy can prevent root-cause data issues.
Organizations have historically treated data governance as a set of processes, reactive measures, and guardrails that were applied to, yet separate from, the data itself. Creating data standards, engineering them into our processes, and moving data into the EDL with built-in services for data management has provided measurable benefits in scaling Microsoft’s approach to governance.
From an IT perspective, Microsoft’s Enterprise Data Strategy helps control data sprawl and reduces infrastructure cost. It does so by limiting data copies and by better managing the data estate. For data owners at Microsoft, MDEE makes data easier to connect to and consume, while increasing trust in the data and the systems that host it.
We are realizing our vision for providing world-class modern data governance and effectively improving our data compliance posture by moving away from the traditional reactionary processes. We can now engineer data compliance into every part of the process—from applying embedded standards to new projects before collecting or storing data, to proactively scanning for issues as changes occur in the EDL. We are automating compliance measurement and reporting. That automation enables MDEE to provide evidence-based results to business process owners, suppliers of data, and data owners across the company.