New named entities – SITs

From a compliance perspective, sensitive information types (SITs) are very important. They are used to detect specific kinds of information, and that detection drives data loss prevention, the automatic application of sensitivity labels (see below), retention labels, and more.

The list of built-in sensitive information types has grown over the last couple of years. Passport numbers, credit card numbers, and similar information have been there from the beginning. Microsoft later added more types directly related to IT, for example IP addresses and the Azure Redis cache connection string. For the complete list, see: https://docs.microsoft.com/en-us/microsoft-365/compliance/sensitive-information-type-entity-definitions?WT.mc_id=EM-MVP-5003084

The now generally available enhancements (called named entities) include sensitive information types directly related to personally identifiable information (PII) and medical information. So, let’s take a look.

Personally identifiable information

Here in the EU, the GDPR does not use the term PII; it speaks of personal data, which it defines as:

‘Personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

https://gdpr.eu/eu-gdpr-personal-data/

Here in The Netherlands, we use the terms “Persoonsgegevens” (personal data), which is the first part of the definition above, and “Bijzondere persoonsgegevens” (special personal data), which is the second part. Either way, this information is used to identify a person. In Microsoft 365, detection of this information was largely limited to passport and identity card numbers. As a result, organizations created their own sensitive information types for things like addresses and names. That is no longer needed.

Microsoft has released new sensitive information types called named entities. According to Microsoft, these are complex dictionary and pattern-based classifiers. In all, there are 52 of them. Most are individual types (Netherlands Physical Addresses, for example), and others contain a collection of types (All Physical Addresses, for example). In Microsoft terms, these are unbundled and bundled information types, respectively. The bundled types have an easy-to-remember name: they all start with All.
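The naming rule is simple enough to sketch in a few lines of Python. This is only an illustration of the "starts with All" convention described above; the helper names `is_bundled` and `split_by_kind` are my own and are not part of any Microsoft API, and the list of type names is just a small sample.

```python
def is_bundled(sit_name: str) -> bool:
    """Apply the naming rule: bundled named entities start with the word 'All'."""
    return sit_name.split(" ", 1)[0] == "All"


def split_by_kind(sit_names):
    """Group type names into bundled and unbundled lists, keeping input order."""
    groups = {"bundled": [], "unbundled": []}
    for name in sit_names:
        groups["bundled" if is_bundled(name) else "unbundled"].append(name)
    return groups


# A small sample of type names mentioned in this post (not the full list of 52).
names = ["All Full Names", "All Physical Addresses", "Netherlands Physical Addresses"]
print(split_by_kind(names))
```

Checking the first word rather than a plain prefix avoids misclassifying a hypothetical type whose name merely begins with the letters "All", such as one starting with "Allergy".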

The types are for detecting names, addresses, and specific medical terms.

Names

You can use the All Full Names information type to detect any person’s name. But beware that this is limited to Australia, China, Japan, the U.S., and countries in the EU. When I tested this one, I noticed that a name with more than one space between the first and last name was not detected. But I guess that’s a use case that will not be too relevant.
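I don't know how Microsoft's classifier handles whitespace internally, but the behaviour is easy to reproduce with a toy pattern. The Python sketch below uses a hypothetical `NAME_PATTERN` (two capitalised words separated by exactly one space) purely as a stand-in; it is not the All Full Names classifier. It shows why a double space breaks such a pattern and how collapsing whitespace first would help in a situation where you control the text yourself, such as a test script.

```python
import re

# Hypothetical stand-in for a name pattern: two capitalised words separated
# by exactly one space. This is NOT Microsoft's classifier.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def collapse_spaces(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def find_names(text: str):
    """Return all substrings that match the toy name pattern."""
    return NAME_PATTERN.findall(text)


raw = "Please contact Jan  Jansen about the invoice."  # note the double space
print(find_names(raw))                   # the toy pattern misses the name
print(find_names(collapse_spaces(raw)))  # found after normalising whitespace
```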

Physical addresses

These information types are either bundled (All Physical Addresses) or unbundled. For example, you can select the unbundled Netherlands Physical Addresses type.

And this works great: every address I tried was detected by this information type. These were just random, non-existent addresses 🙂

Medical information

Medical information has been part of the information types for some time. These were (English) dictionaries for medical classifications. These have now been enhanced with:

  • Medical specialities (such as ‘dermatology’);
  • Lifestyles (such as ‘smoking’) – a dodgy one, IMHO;
  • Blood test (such as ‘hCG’);
  • Brand medication (such as ‘Tylenol’);
  • Generic medication (such as ‘acetaminophen’);
  • Types of medication (such as ‘insulin’).

All of these types are English only! They are very relevant information types in my opinion, although the lifestyle type might raise some eyebrows.
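To give an idea of what dictionary-based detection means, here is a minimal Python sketch. It assumes nothing about how Microsoft implements these types: the `MEDICAL_TERMS` mapping and the `find_medical_terms` function are made up for illustration, and the dictionary holds only the six example terms from the list above, where a real one would be far larger.

```python
import re

# Example terms taken from the list above; a real dictionary is far larger.
MEDICAL_TERMS = {
    "medical speciality": ["dermatology"],
    "lifestyle": ["smoking"],
    "blood test": ["hCG"],
    "brand medication": ["Tylenol"],
    "generic medication": ["acetaminophen"],
    "type of medication": ["insulin"],
}


def find_medical_terms(text: str):
    """Return (category, term) pairs for whole-word, case-insensitive hits."""
    hits = []
    for category, terms in MEDICAL_TERMS.items():
        for term in terms:
            pattern = rf"\b{re.escape(term)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.append((category, term))
    return hits


sample = "Patient takes insulin and Tylenol; referred to dermatology."
print(find_medical_terms(sample))
```

Matching on whole words keeps a term like ‘smoking’ from firing inside ‘nonsmoking’, which is exactly the kind of false positive a plain substring search would produce.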

The Netherlands

As I’m from The Netherlands, I also wanted to look at the specific NL information types that we can now use. These have been enhanced as well. Of course, there’s the physical address type, but the tax identification number and value added tax (BTW) number have also been added or can now be used.

New templates

All of the sensitive information types can be used in functions like data loss prevention and sensitivity labels. For this, Microsoft has added some new templates. For example, in the Privacy category, you can now select specific Enhanced templates. These include not only the basic information types, like passport numbers, but also the new named entities, which is really cool stuff.

Wrapping up

These new sensitive information types really add a lot to the Microsoft Compliance offerings. I especially like the names and physical address types. But the medical types will be very useful for medical organizations as well. If you want to read more on this subject, then please go to: https://docs.microsoft.com/en-us/microsoft-365/compliance/named-entities-learn?WT.mc_id=EM-MVP-5003084
