Many projects today start small but can grow very quickly. At the start of a project, the structure of all the sources is still clear. But as the sources get bigger and more types of sources are used, the flow of information becomes harder to follow.
People who join a large project halfway through have a lot of trouble understanding this information flow and are therefore slower to get started. Data is often managed manually by a person, which creates a risk of inconsistency and inaccuracy in the analysis of the data. In addition, the management of this data is sometimes neglected as the project gets older.
The issues above are mostly due to a lack of time. Many of these processes are extremely time-consuming, and people are looking for a system that can make them go a lot faster. A further problem is that personal and sensitive data can be overlooked because of that time constraint. This can have major consequences for privacy and compliance with laws and regulations. As a result, there is a need for a system that extracts this information from the data or identifies it as sensitive information.
Within Microsoft Azure, this brought us to Azure Purview. Azure Purview is a Data Governance tool from Microsoft released in November 2021. At the time of writing, the tool is still fairly new, and some parts of it are still in preview, so we can’t yet explore the full potential of the system. But it is clear that it offers great possibilities for the future.
Azure Purview has a feature that makes Data Discovery very easy. All assets that have been scanned by Purview can be retrieved via the Data Catalog. Here you can find data such as the lineage and the schemas of these assets. The lineage of an asset shows where its information comes from (the source) and where it will end up (the destination).
The diagrams in the Data Catalog are also very useful. From this page you can easily track down problems, for example when a data type has been changed somewhere unintentionally.
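To make the idea of lineage more concrete, here is a minimal sketch in Python. It does not use the Purview API; the asset names, the `LINEAGE` mapping and the `CUSTOMER_ID_TYPE` table are all made up for illustration. It only shows how following source-to-destination links lets you find the step where a data type changed.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets it feeds into.
LINEAGE = {
    "crm_export.csv": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.sales_dashboard"],
    "report.sales_dashboard": [],
}

# Hypothetical schema info: the type of the column "customer_id" per asset.
CUSTOMER_ID_TYPE = {
    "crm_export.csv": "int",
    "staging.customers": "int",
    "warehouse.dim_customer": "string",
    "report.sales_dashboard": "string",
}


def downstream(asset, lineage):
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for target in lineage.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


def type_changes(lineage, column_types):
    """List (source, destination, old_type, new_type) where the type differs."""
    changes = []
    for source, targets in lineage.items():
        for target in targets:
            if column_types[source] != column_types[target]:
                changes.append(
                    (source, target, column_types[source], column_types[target])
                )
    return changes


if __name__ == "__main__":
    print(downstream("crm_export.csv", LINEAGE))
    print(type_changes(LINEAGE, CUSTOMER_ID_TYPE))
```

In this made-up example the type of `customer_id` changes between the staging table and the warehouse table, which is the kind of unintended change a lineage diagram helps you spot.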
The amount of metadata that Purview retrieves from your sources is also very large. For example, Purview creates insights that clearly show what types of data you have in your sources.
To manage the large amount of data, you can assign stewards and experts to the assets. These people are then responsible for the maintenance of these assets. Colleagues immediately know who is responsible for a particular asset and who its experts are.
Stewards and experts can also be assigned to a Glossary Term. This quickly gives all assets that match the term their stewards and experts. These are a few small features with which Azure Purview saves a lot of time.
In Azure Purview we can create different collections of data sources. These collections are usually organized according to the different departments of the organization. To ensure that not everyone can view all data, we can define roles per collection and assign people to them. As a result, each person can view only the data that is useful to them, which also increases the security of the system.
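The principle of roles per collection can be sketched in a few lines of Python. This is a conceptual illustration, not the way Purview stores or evaluates permissions: the collection names, role names and users below are assumptions, and the check simply treats any role in a collection as enough to view it.

```python
# Hypothetical collections and role assignments (illustrative names only).
ROLE_ASSIGNMENTS = {
    "Finance": {"data_reader": {"alice"}, "data_curator": {"bob"}},
    "HR": {"data_reader": {"carol"}},
}


def can_view(user, collection, assignments):
    """A user can view a collection if they hold at least one role in it."""
    roles = assignments.get(collection, {})
    return any(user in members for members in roles.values())


def visible_collections(user, assignments):
    """Return the sorted names of all collections the user can view."""
    return sorted(c for c in assignments if can_view(user, c, assignments))


if __name__ == "__main__":
    for user in ("alice", "bob", "carol", "dave"):
        print(user, visible_collections(user, ROLE_ASSIGNMENTS))
```

Someone without any role assignment, such as `dave` in this sketch, sees no collections at all, which is the behaviour described above.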
As discussed earlier, the system must not allow sensitive information to pass unnoticed. To prevent this, Azure Purview works with a classification system. It has more than 200 default classifications that are checked every time a scan is performed. If this is not enough, you can also add your own classifications based on patterns you define. When something is found that might cause problems, it is marked with the corresponding label so that the responsible people can review it.
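To give an idea of what a pattern-based classification looks like, here is a small Python sketch. It is not Purview's own implementation or API: the rule names, the regular expressions and the `min_match_ratio` threshold are assumptions chosen for illustration. A column receives a label when enough of its values match the pattern for that label.

```python
import re

# Hypothetical custom rules: a label and the pattern a value must match.
CUSTOM_RULES = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "EMPLOYEE_ID": re.compile(r"EMP-\d{6}"),
}


def classify_column(values, rules, min_match_ratio=0.6):
    """Return the labels whose pattern matches enough of the non-empty values.

    `min_match_ratio` is an assumed threshold: the fraction of non-empty
    values that must fully match a pattern before the label is applied.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    labels = []
    for label, pattern in rules.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= min_match_ratio:
            labels.append(label)
    return labels


if __name__ == "__main__":
    contact_column = ["a@example.com", "b@example.org", "n/a"]
    print(classify_column(contact_column, CUSTOM_RULES))
```

Using a threshold rather than requiring every value to match keeps a few dirty values, such as `n/a`, from hiding a column that clearly contains sensitive data.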
Azure Purview is a very handy tool if you have a lot of data that comes from different sources, especially third-party sources. It gives you a quick overview of all the data that is used in the project. This can be particularly useful for people who are starting mid-project. The lineage of an asset is an easy way to see the origin and destination of the information in this asset. Data analysts and the business side of a company can learn a lot from the data insights Purview creates. Here they can quickly see the types and the number of sources they have to work with, as well as the kind of data in those sources, because Purview classifies the data based on classification rules. Furthermore, there is an easy way to see which glossary terms are dominant and how they are distributed over all the assets.
Luuk Op ‘t Hoog & Matthias Vanermen