Azure Information Protection Scanner

It has been a while since I last wrote about the Azure Information Protection Scanner. I still love the functionality, although there’s always some room for improvement.

Not too long ago, Microsoft released the Unified Labeling client with support for the scanner. So now you can use the scanner to scan, classify and protect documents on your on-premises file shares, NAS and SharePoint environments using the Microsoft 365 sensitivity labels.

[Figure: unified labeling scanner architecture]

For those of us who have not seen or used the scanner, I decided to create a little video on this. I hope you enjoy 🙂 But I also want to share some more information.

Terminology

As you can see in the figure above, there are a couple of specific components used when configuring the scanner. Let’s take a look.

The scanner itself is a server (also called a Node). To configure the scanner, you will need some components. For one, you will need a Windows Server. On this server you install the Unified Labeling client. Using PowerShell, you then install (or upgrade) the scanner itself. When installing the scanner for the first time, a SQL database is created to store the policies. During the installation process, a Windows service is created, called Azure Information Protection Scanner, which runs using an Active Directory account.

Nodes

You can have multiple servers or nodes, which are all part of a Cluster. This cluster is configured to use a Content Scan Job. A content scan job contains the overall policy settings of the job (its run schedule, for example) but also the content repositories to be scanned by the cluster.

Repositories

Components required

In order to get the scanner to work, you will need several components. Most are straightforward:

  • Windows Server
  • SQL Server or SQL Server Express
  • An Active Directory account, with access to the repositories
  • An Azure Active Directory account with the Azure Information Protection license
  • An Azure Application registration with the required permissions
  • Azure Information Protection UL client

More information: https://docs.microsoft.com/en-us/azure/information-protection/deploy-aip-scanner-configure-install

Scanner service account

Some quick notes on the account. There are three ways of configuring the account which runs the scanner. All of these require the account running the scanner to have access to the repositories.

  1. The preferred method is using an Active Directory account which is synced to Azure AD and has the AIP license;
  2. The alternative (example below) is to use an Active Directory account which is not synced to Azure AD and works “on behalf of” a separate account in Azure Active Directory;
  3. The last alternative is to use a local machine account that works “on behalf of” a separate account in Azure Active Directory. But this will not work for SharePoint on-premises repositories.

On the Windows Server, the scanner service account requires the Log on locally right (for installation and configuration). This right can be removed once the scanner is working properly. The Log on as a service right is granted automatically during installation and is required throughout.

For the two alternatives, you need an Azure app registration so the scanner can work in the background (unattended). This app registration needs specific permissions for Azure RMS and Microsoft Information Protection. You will use a PowerShell cmdlet to connect this app registration, the Azure Active Directory account and the scanner service account:

Set-AIPAuthentication -AppId <ID of the registered app> -AppSecret <client secret string> -TenantId <your tenant ID> -DelegatedUser <Azure AD account> -OnBehalfOf <scanner service account credentials>

For example:

$pscreds = Get-Credential CONTOSO\AIPScanner
Set-AIPAuthentication -AppId "77c3c14e-8b2b-4652836c8c66" -AppSecret "OAkk+rnuYc/u+]ah2kNxL4" -DelegatedUser AIPScanner@contoso.com -TenantId "9c11c87a-ac8b-46a3-8d5c-f4d0b72ee29a" -OnBehalfOf $pscreds

This should return a value of:

Acquired application access token on behalf of CONTOSO\AIPScanner.

But this cmdlet can only be run when other steps have finished. So let’s take a look at these.

Steps required

These are the steps to create the scanner:

  1. Configure the scanner in the Azure portal (cluster, content scan job);
  2. Install the scanner on the Windows Server;
  3. Configure the app registration in order to get an Azure AD token (if needed);
  4. Configure the scanner (fine-tune the content scan jobs).
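
As a rough sketch, the server-side part of these steps (steps 2 and 3, plus a first scan) could look like the following. It assumes the cluster and content scan job from step 1 already exist in the portal. The server name, cluster name, account names and IDs are placeholders, and Start-AIPScan is not covered elsewhere in this post, although it ships in the same PowerShell module.

```powershell
# Placeholder values: replace these with the details of your own environment.
$sqlInstance = 'SQLSERVER1'
$clusterName = 'AzureIPScanCluster1'
$serviceCred = Get-Credential 'CONTOSO\AIPScanner'   # scanner service account

# Step 2: install the scanner and register this server as a node in the cluster.
# Newer client versions use -Cluster instead of -Profile.
Install-AIPScanner -ServiceUserCredentials $serviceCred `
    -SqlServerInstance $sqlInstance `
    -Profile $clusterName

# Step 3: acquire the Azure AD token (only needed for the "on behalf of" setups).
Set-AIPAuthentication -AppId '<ID of the registered app>' `
    -AppSecret '<client secret string>' `
    -TenantId '<your tenant ID>' `
    -DelegatedUser 'AIPScanner@contoso.com' `
    -OnBehalfOf $serviceCred

# Start a first scan cycle once the content scan job is configured.
Start-AIPScan
```

Keeping the credentials in one variable means the same service account is used for both the installation and the token request.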

Installing the scanner

When you install or update the scanner on the Windows Server, it will connect to a specific cluster – at which time it will download the content scan jobs into the SQL database. If the server isn’t connected to the internet, you can export the cluster and import this into the scanner using PowerShell.

For this installation process, you will use this PowerShell cmdlet:

Install-AIPScanner -SqlServerInstance <name> -Profile <cluster name>

For example:

Install-AIPScanner -SqlServerInstance SQLSERVER1 -Profile AzureIPScanCluster1
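
For a server without internet access, the import mentioned above might look like this. It is only a sketch: it assumes you have exported the cluster configuration from the portal and copied the file to the server, and the file path is a made-up example.

```powershell
# Hypothetical path to the configuration file exported from the portal.
$configFile = 'C:\ScannerConfig\AzureIPScanCluster1.json'

if (Test-Path -Path $configFile) {
    # Import the exported cluster and content scan job settings into the local scanner.
    Import-AIPScannerConfiguration -FileName $configFile
}
else {
    Write-Warning "Configuration file not found: $configFile"
}
```

Because an offline scanner does not pick up changes made in the portal, the export and import would need to be repeated after every change.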

By the way: all the cmdlets in this post are included in the Azure Information Protection client installation 🙂

Now the scanner should be up and running and will appear in the Azure Information Protection dashboard, under Nodes. You can check whether the Windows service Azure Information Protection Scanner is running. You can also use several PowerShell cmdlets to check this, for example:

  • Start-AIPScannerDiagnostics
  • Get-AIPScannerStatus
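
A quick health check combining these could look like the sketch below. The account name is a placeholder, and the service is looked up with a wildcard on purpose, in case the display name differs in your version.

```powershell
# Look up the scanner's Windows service by display name (wildcard match).
Get-Service -DisplayName 'Azure Information Protection*' |
    Select-Object Name, DisplayName, Status

# Show the status of the scanner node(s).
Get-AIPScannerStatus

# Run the built-in diagnostics in the context of the scanner service account.
# This needs an elevated (Run as administrator) PowerShell session.
$serviceCred = Get-Credential 'CONTOSO\AIPScanner'   # placeholder account name
Start-AIPScannerDiagnostics -OnBehalfOf $serviceCred
```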

In all – if you have on-premises repositories you want to scan and classify, and you have the relevant licenses :-), then please check out the scanner.

20 comments

  1. Hi Albert, thank you for the nice article. However, I have a question: which user should be logged in to the system to run the “Install-AIPScanner -SqlServerInstance <name> -Profile <cluster name>” command? I tried to install the AIP scanner using a local machine user and to get the token using the set-authentication cmdlet with an Azure AD user who has access to pull the policies. I received the token after a successful run of the “set-authentication” cmdlet, but I get an error, “no policy found”, on the Azure portal. Could you please point out if I am missing something here? Thank you.

    1. Hi Sunil,

      Thanks for your question. I’ve modified the post, as there are more options for configuring the account. In my initial blog I only mentioned one: an Active Directory account working on behalf of an Azure Active Directory account. And this works fine. But there are alternatives, like running using a local Windows account.

      I’ve added the information to the blog.

      As to your error: I think you mean the “Set-AIPAuthentication” cmdlet. If so, do you have an Azure Information Protection policy and labels configured? You will need to have several labels configured. If not, you will not be able to use the scanner.

      By default, you should have a “Global” policy – without any labels in there : https://portal.azure.com/#blade/Microsoft_Azure_InformationProtection/DataClassGroupEditBlade/scopedpoliciesBlade

      But you can add these.

      I hope this helps. This article also provides more information:
      https://docs.microsoft.com/en-us/azure/information-protection/deploy-aip-scanner-prereqs#deploying-the-scanner-with-alternative-configurations

  2. Hi there,

    Great content,

    I was wondering if you have applied sensitivity labels based on a CSV?

    If so what is the best way to do this ?

    Thanks in advance.

  3. Hello Albert,

    First of all, thanks! It was a great tutorial. I have several doubts regarding the offline scenario. As you have mentioned it, maybe you can help me with the configuration that I’m trying to do. The question is clear: is it possible to deploy the AIP Scanner (only for labeling and classification) on an offline server? The official Microsoft doc:

    https://docs.microsoft.com/es-es/azure/information-protection/rms-client/clientv2-admin-guide-customizations#support-for-disconnected-computers

    says that it is possible, but it talks about “computers that cannot connect to the internet for a period of time”. I’m talking about no connection at all (not even for a period of time). Totally 100% offline… Does it require at least a short window of internet connectivity in order to get an Azure AD token for the scanner?

    Best regards and thanks in advance

    1. Hi Paul,

      Thanks for your reply. Yes, it is possible to run the scanner in offline mode for a longer period of time.
      The scanner will need to receive your (IRM) policies, label configuration, etcetera. For this you can export this configuration (contentjobs) and import this on the offline scanner. See: https://go.microsoft.com/fwlink/?linkid=2044886 and https://portal.azure.com/#blade/Microsoft_Azure_InformationProtection/DataClassGroupEditBlade/scannerProfilesBlade.

      When the scanner is not connected to the internet, it will not receive any updates to the policies and you will not be able to use the label explorer, statistics and more.

      But it should work.

  4. Hello Albert, thank you for the great post.
    I have a quick question – we are looking to label a document with unified labelling based on the custom property set in the advanced properties section of the office document, is this doable?
    If yes, we have millions of documents in the repository that we need to scan, can you please advise on best possible way to achieve this in an efficient manner?

    1. Hi Jet,

      The unified labeling solution looks for information in the document itself for detecting and classifying. It will not look at the (advanced) properties section of the documents. So I’m afraid I cannot help you with that. For scanning a large number of documents, I would recommend using multiple content scan jobs and repositories. You will need to be able to handle the output of the scanner, so breaking the jobs into smaller bits is recommended. If you want more information or help, then I highly recommend looking at this Yammer group: https://www.yammer.com/askipteam/#/home It’s hosted by the Microsoft people responsible for the platform and many answers are given there 🙂

  5. Hi Albert, Thank you for the nice article.
    I performed the AIP Scanner installation properly. Can I assign “Confidential” label to only dwg extension files on File Server? Is something like this possible?

    Does auto-labeling only apply to text-based files?

    1. Hi there,

      Auto-labeling for documents at rest (SharePoint Online, Exchange Online or OneDrive for Business) can be used to classify and protect information in Office files (Word, PowerPoint, and Excel – in Open XML format) and PDF format. The latter is only supported for Exchange Online.

      All information protection auto-labeling functionality looks at the content in the files to determine the sensitivity. Unfortunately, this means that a file extension alone will not do the trick. For supported file types you could probably do this using PowerShell or the unified labeling client. For example:

      Get-ChildItem C:\Projects\*.docx -File -Recurse | Get-AIPFileStatus | where {$_.IsLabeled -eq $False} | Set-AIPFileLabel -LabelId d9f23ae3-1234-1234-1234-f515f824c57b

      But I’m afraid that dwg is not on the supported file-type list.
      There is an integration option though. Please see the link.

      https://techcommunity.microsoft.com/t5/microsoft-security-and/classifying-and-protecting-computer-aided-design-with-microsoft/ba-p/967064
      https://docs.microsoft.com/en-us/azure/information-protection/rms-client/clientv2-admin-guide-file-types
      https://docs.microsoft.com/en-us/powershell/module/azureinformationprotection/set-aipfilelabel?view=azureipps

  6. How do you configure a cluster to use more than 1 scan job as you mentioned? I don’t see that option available. Once you assign a cluster to a content scan job, it is not available to other content scan jobs.

    1. Hi Jimmy,

      You are absolutely right and I was wrong in this article. The cluster can only contain one scan-job. If you need more jobs, you will need to add more clusters. I modified the article. Thanks for notifying me!

  7. Hi, I have an installation of the AIP Scanner service that is not showing me the result of the diagnostics through PowerShell. The error that I get is the following: “Start-AIPScannerDiagnostics : Some or all identity references could not be translated”
    Have you any idea of how to solve this? Thanks

    1. Hi Roland,

      The cmdlet needs to be run with the AIP Scanner account. Can you check if this is the case? According to Microsoft:

      “This cmdlet requires you to define a specific scanner account in the -OnBehalfOf parameter. The OnBehalfOf parameter requires you to run your PowerShell session as an Administrator.”

      I’ve used this cmdlet myself several times, without this error.
      But to be honest, I haven’t run this cmdlet for quite some time.

  8. Hi,
    Why does the AIP scanner require a license? Is it a licensing requirement or a technical one?
    In my lab, I installed the AIP scanner on my server and configured it for manual classification. I didn’t add an extra license to it and it seems to work – it does provide some labeling.
    Or do just the users accessing the folder need a license?

    How many scanners can run under one service account? If I have 70 scanners on my network, do I need to have 70 service accounts, and do I need to assign a license to each account?

    1. Hi Vladimir,

      Good questions. As the scanner needs to access the AIP (or now: Microsoft Purview Information Protection) back-end, it will require a license. And because it’s a form of auto-classification, this account does require a form of E5 to function.

      If you look at user licensing – this is more tricky. Because Microsoft states that using the scanner is a form of auto-classification, any user who accesses documents scanned/classified by the scanner needs a form of E5 license. It doesn’t state it like this on Microsoft Learn though. Instead it just says this:

      “For the Microsoft Purview information protection scanner feature, Microsoft does not commit to providing file classification, labeling, or protection capabilities to users who are not licensed.”

      https://learn.microsoft.com/en-us/office365/servicedescriptions/microsoft-365-service-descriptions/microsoft-365-tenantlevel-services-licensing-guidance/microsoft-365-security-compliance-licensing-guidance#how-can-the-service-be-applied-only-to-users-in-the-tenant-who-are-licensed-for-the-service-18

      So if you have a specific folder that only specific users access, then these users will require the license. The scanners themselves can work with one account, so you won’t need several (as far as I know).

      Hope this helps somewhat.

  9. Hi Albert,
    Great content and really helpful. Some information that I’ve been looking for but couldn’t find anywhere:
    1) MS documentation says you can have multiple nodes and scan jobs under the same cluster, but I am unable to do this.
    2) When I install a second AIP scanner on another server, do I still use the same PowerShell command that I ran to install the first one? Will it create another database on SQL? Can we use the same service account or do we need a separate one? Do we need to create separate clusters and scan jobs for each additional node?

    Appreciate your time

    1. Hi there Mohammed,

      Thank you for your reply and kind words.
      Any (AIP) scanner will require one form of SQL Server (or local SQL environment) to store the configuration data. As far as I know, this will create specific database(s). These are used for all servers (nodes) – when these nodes are configured to use the same SQL environment. The information in the GUI can be found in the specific tables of these databases.

      Using the same service-account is recommended. Mostly, because this account needs to be granted access to the information and the Microsoft Purview Information Protection service.

      Clusters are mostly used when you really need to have the nodes separated. For example: when you have different timezones or regions. I would not set-up any more clusters if you don’t have this need.

      You can now configure the clusters and scan jobs from the Purview compliance portal. PowerShell is still needed to add a server (node) to a specific cluster (Install-AIPScanner -SqlServerInstance <name> -Cluster <cluster name>). And the content scan jobs are added to the same cluster. So this should work.

      Hope this helps?
