
High availability

Bogdan Gavril edited this page Feb 22, 2021 · 47 revisions

Use the latest MSAL

MSAL is used by many applications, both Microsoft internal and external. We treat scalability issues as high-priority bugs. Using the latest version of the library ensures you have the latest scalability improvements.

Use the token cache

Default behaviour: MSAL builds an internal in-memory cache of tokens when it fetches tokens from AAD.

Recommendation: Call AcquireTokenSilent, which checks the cache for a valid token before making a call to AAD. For service-to-service calls, AcquireTokenForClient uses the token cache by default.

Default behaviour: Each PublicClientApplication (PCA) or ConfidentialClientApplication (CCA) maintains a token cache in memory.

Recommendation: Do not create a new application object for each token request unless cache serialization is enabled (see "Serialize your token cache" below). Once the application instance is garbage collected, its internal cache is lost too.

Default behaviour: A singleton ConfidentialClientApplication tends to get slower as its cache grows, because each request has to search through many cache items. In testing, AcquireTokenForClient takes ~5ms with 10k cache items and ~95ms with 100k cache items. See the performance testing wiki.

Recommendation: Partition the cached data into smaller, evenly sized buckets so that each search covers fewer cache items. The partition key can be anything that divides the tokens evenly, for example tenant ID or client ID. Examples:

  • Instead of one CCA, create a collection of CCA instances partitioned by your chosen data point.
  • Add a serialization cache to the CCA instance, and use your chosen partition key as a key into the cache. This will make sure the internal CCA cache is reloaded with only the partitioned data (and not the whole set).
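The first option above can be sketched in a language-neutral way. The Python below is only an illustration of the partitioning idea, not MSAL code: `PartitionedApps`, `FakeApp` and the tenant names are hypothetical, and `FakeApp` stands in for a ConfidentialClientApplication whose cache is searched linearly.

```python
import threading
from typing import Callable, Dict, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class PartitionedApps(Generic[T]):
    """Keeps one long-lived application object per partition key."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._apps: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get(self, partition_key: str) -> T:
        # Reuse the existing object so its in-memory cache survives.
        with self._lock:
            app = self._apps.get(partition_key)
            if app is None:
                app = self._factory(partition_key)
                self._apps[partition_key] = app
            return app


class FakeApp:
    """Hypothetical stand-in for an app object with its own token cache."""

    def __init__(self, partition_key: str) -> None:
        self.partition_key = partition_key
        self.cache: List[Tuple[str, str]] = []  # (cache key, token)

    def store(self, cache_key: str, token: str) -> None:
        self.cache.append((cache_key, token))

    def find(self, cache_key: str) -> Optional[str]:
        # Linear search: cost grows with the size of this bucket only.
        for key, token in self.cache:
            if key == cache_key:
                return token
        return None


apps = PartitionedApps(FakeApp)
apps.get("tenant-a").store("api://x", "token-a")
apps.get("tenant-b").store("api://x", "token-b")
```

Each lookup then touches only the bucket for its own partition key, so the number of items searched is bounded by the largest partition rather than by the total number of cached tokens.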

Default behaviour: MSAL maintains a secondary ADAL token cache for migration scenarios between ADAL and MSAL. ADAL cache operations are very slow.

Recommendation: Disable the ADAL cache if you are not interested in migrating from ADAL. This gives a big performance improvement; see perf measurements here.

Add WithLegacyCacheCompatibility(false) when constructing your app to disable ADAL caching.

Serialize your token cache

Default behaviour: MSAL caches the tokens in memory. Each ConfidentialClientApplication instance has its own internal token cache. In-memory cache can be lost, for example, if the object instance is disposed or the whole application is stopped.

Recommendation: Use the cache read and write callbacks, exposed by MSAL, to persist the internal cache. Customers have reported good results when using Redis and other distributed cache stores. Details here.

Retry Policy

Default behaviour: MSAL will retry failed 5xx requests once, then block similar requests for 1 minute.

Recommendation:

  • Add your own retry logic around AcquireToken* methods, using a library like Polly.
  • ESTS may reply with a 429 Too Many Requests response that contains a Retry-After header. Obey this value; otherwise you will be throttled. See more details about Retry-After.
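The two recommendations above could be combined roughly as follows. This is a language-neutral Python sketch under stated assumptions, not MSAL or Polly code: `ServiceError`, `call_with_retry`, the attempt count and the backoff values are all hypothetical, and `acquire` stands in for any AcquireToken* call.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ServiceError(Exception):
    """Hypothetical error carrying the HTTP status and Retry-After seconds."""

    def __init__(self, status: int, retry_after: Optional[float] = None) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def call_with_retry(
    acquire: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry 429 and 5xx failures, waiting for Retry-After when it is given."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return acquire()
        except ServiceError as err:
            retryable = err.status == 429 or 500 <= err.status < 600
            if not retryable or attempt == max_attempts:
                raise
            if err.retry_after is not None:
                # The server told us how long to wait; obey it.
                delay = err.retry_after
            else:
                # Assumed fallback: exponential backoff (1s, 2s, 4s, ...).
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
            attempt += 1
```

The `sleep` parameter is injectable only so that the waiting behaviour can be checked without real delays; production code would leave the default.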

One Confidential Client per session

In web app and web API scenarios, it is recommended to use a new ConfidentialClientApplication on each session and to serialize in the same way - one token cache per session. This scales well and also increases security. The official samples show how to do this.

HttpClient

Default behaviour: An HttpClient is created for each PublicClientApplication / ConfidentialClientApplication. This does not scale well for web sites / web APIs, where we recommend having a ClientApplication object for each user session.

Recommendation: Provide your own scalable HttpClientFactory. On .NET Core we recommend that you inject the System.Net.Http.IHttpClientFactory. This is described in more detail here.

Pro-Active Token renewal

Goal

Increase application availability by issuing longer lived access tokens and implementing a pro-active renewal strategy.

Status quo

By default, AAD issues access tokens with a 1h expiration. MSAL considers a token expired 5 minutes before its actual expiry and then requests a new token from AAD. This is a silent operation. If an AAD outage occurs when a refresh is needed, MSAL will fail. The failure propagates to the calling application and impacts availability.

Pro-active token renewal

To overcome this, MSAL tries to ensure that an app always has fresh tokens. AAD outages rarely take more than a few hours, so if MSAL can guarantee that a token always has at least a few hours of validity left, the application will not be impacted by the AAD outage.

Conditions for use

  • Use MSAL.NET
  • Configure a token lifetime of more than 1h

Then observe the refresh_in field in the response from ESTS:

(Image: token response from ESTS showing the refresh_in field.)

Internal MSAL Algorithm

  • ESTS issues expires_in and refresh_in along with a token response. These are in seconds. They are cached by MSAL.
  • AcquireTokenSilent, AcquireTokenForClient, AcquireTokenOnBehalfOf look in the token cache for a token.
  • If a token is found that is not expired but is due for a refresh, MSAL will make a call to ESTS to get a new token.
  • If this call fails because ESTS is down (i.e. HTTP 5xx error), MSAL will return the old token, which is still valid.
  • Note that MSAL only refreshes the token when the app calls one of these AcquireToken* methods. It does not schedule background requests, as it would not be feasible for most application types.
  • AcquireTokenOnBehalfOf does not implement this logic yet - tracking issue
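The decision logic described in the list above can be sketched as follows. This is a simplified, language-neutral Python illustration and not MSAL's actual implementation: `CachedToken`, `ServiceUnavailable`, `get_token` and `fetch` are hypothetical names, times are plain seconds, and the 5-minute early-expiry margin mentioned earlier is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedToken:
    value: str
    expires_on: float  # absolute time in seconds, derived from expires_in
    refresh_on: float  # absolute time in seconds, derived from refresh_in


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 5xx reply from the token service."""


def get_token(
    cached: Optional[CachedToken],
    now: float,
    fetch: Callable[[], CachedToken],
) -> CachedToken:
    """Return a usable token, refreshing early when refresh_on has passed."""
    if cached is None or now >= cached.expires_on:
        # Nothing usable in the cache: a failure here reaches the caller.
        return fetch()
    if now >= cached.refresh_on:
        try:
            return fetch()
        except ServiceUnavailable:
            # The service is down, but the old token is still valid.
            return cached
    # Token is fresh: no network call at all.
    return cached
```

Note that the refresh only happens when the caller asks for a token, matching the point above that no background requests are scheduled.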
