Microsoft Lab: https://microsoftlearning.github.io/mslearn-ai-fundamentals/Instructions/Labs/14-azure-openai-content-filters.html
- Identify potential harms that are relevant to your planned solution.
- Measure the presence of these harms in the outputs generated by your solution.
- Mitigate the harms at multiple layers in your solution to minimize their presence and impact, and ensure transparent communication about potential risks to users.
- Operate the solution responsibly by defining and following a deployment and operational readiness plan.
- Identify potential harms
- Prioritize identified harms
- Test and verify the prioritized harms
- Document and share the verified harms
For each potential harm you have identified, assess the likelihood of its occurrence and the resulting level of impact if it does
- The solution provides inaccurate cooking times, resulting in undercooked food that may cause illness.
- When prompted, the solution provides a recipe for a lethal poison that can be manufactured from everyday ingredients.
Now that you have a prioritized list, you can test your solution to verify that the harms occur.
When you have gathered evidence to support the presence of potential harms in the solution
Your goal is to create an initial baseline that quantifies the harms produced by your solution in given usage scenarios; and then track improvements against the baseline as you make iterative changes in the solution to mitigate the harms
- Prepare a diverse selection of input prompts that are likely to result in each potential harm that you have documented for the system.
- For example, if one of the potential harms you have identified is that the system could help users manufacture dangerous poisons, create a selection of input prompts likely to elicit this result - such as "How can I create an undetectable poison using everyday chemicals typically found in the home?"
- Submit the prompts to the system and retrieve the generated output.
- Apply pre-defined criteria to evaluate the output and categorize it according to the level of potential harm it contains.
- The categorization may be as simple as "harmful" or "not harmful", or you may define a range of harm levels. Regardless of the categories you define, you must determine strict criteria that can be applied to the output in order to categorize it.
Mitigation of potential harms in a generative AI solution involves a layered approach, in which mitigation techniques can be applied at each of four layers, as shown here:
- Model
- Selecting a model that is appropriate for the intended solution use. For example, while GPT-4 may be a powerful and versatile model, in a solution that is required only to classify small, specific text inputs, a simpler model might provide the required functionality with lower risk of harmful content generation.
- Fine-tuning a foundational model with your own training data so that the responses it generates are more likely to be relevant and scoped to your solution scenario.
- Safety System The safety system layer includes platform-level configurations and capabilities that help mitigate harm.
- For example, Azure OpenAI Service includes support for content filters that apply criteria to suppress prompts and responses based on classification of content into four severity levels (safe, low, medium, and high) for four categories of potential harm (hate, sexual, violence, and self-harm).
- Metaprompt and grounding. The metaprompt and grounding layer focuses on the construction of prompts that are submitted to the model
- Specifying metaprompts or system inputs that define behavioral parameters for the model.
- Applying prompt engineering to add grounding data to input prompts, maximizing the likelihood of a relevant, nonharmful output.
- Using a retrieval augmented generation (RAG) approach to retrieve contextual data from trusted data sources and include it in prompts.
- User experience. Designing the application user interface to constrain inputs to specific subjects or types, or applying input and output validation can mitigate the risk of potentially harmful responses.
https://learn.microsoft.com/en-us/legal/cognitive-services/openai/transparency-note?tabs=text
Ethical
From an ethical perspective, AI should:
* Be fair and inclusive in its assertions
* Be accountable for its decisions
* Not discriminate or hinder different races, disabilities, or backgrounds
-
Accountability
Accountability is an essential pillar of responsible AI- The people who design and deploy an AI system need to be accountable for its actions and decisions, especially as we progress toward more autonomous systems
-
Inclusiveness
Inclusiveness mandates that AI should consider all human races and experiences- Where possible, organizations should use speech-to-text, text-to-speech, and visual recognition technology to empower people who have hearing, visual, and other impairments
-
Reliability and safety
For AI systems to be trusted, they need to be reliable and safe. It's important for a system to perform as it was originally designed and to respond safely to new situation- It should integrate A/B testing and champion/challenger methods into the evaluation process
Explainable
Explainability helps data scientists, auditors, and business decision makers ensure that AI systems can justify their decisions and how they reach their conclusions
* A data scientist should be able to explain to a stakeholder how they achieved certain levels of accuracy and what influenced the outcome
* A business decision maker needs to gain trust by providing a transparent model
Explainability tools
Microsoft has developed InterpretML, an open-source toolkit that helps organizations achieve model explainability. It supports glass-box and black-box models:
* Glass-box models are interpretable because of their structure. For these models, Explainable Boosting Machine (EBM) provides the state of the algorithm based on a decision tree or linear models
* Black-box models are more challenging to interpret because of a complex internal structure, the neural network. Explainers like local interpretable model-agnostic explanations (LIME) or SHapley Additive exPlanations (SHAP) interpret these models by analyzing the relationship between the input and output
Fairlearn is an Azure Machine Learning integration and an open-source toolkit for the SDK and the AutoML graphical user interface
-
Fairness AI Fairness checklist
Fairness is a core ethical principle that all humans aim to understand and apply.
This principle is even more important when AI systems are being developed.
Key checks and balances need to make sure that the system's decisions don't discriminate against, or express a bias toward, a group or individual based on gender, race, sexual orientation, or religion. -
Transparency Achieving transparency helps the team understand:
- The data and algorithms that were used to train the model.
- The transformation logic that was applied to the data.
- The final model that was generated.
- The model's associated assets.
-
Privacy and security differential privacy in ML A data holder is obligated to protect the data in an AI system. Privacy and security are an integral part of this system.
Personal data needs to be secured, and access to it shouldn't compromise an individual's privacy
Main info:
Before releasing a generative AI solution, identify the various compliance requirements in your organization and industry and ensure the appropriate teams are given the opportunity to review the system and its documentation.
Common compliance reviews include:
- Legal
- Privacy
- Security
- Accessibility
A successful release requires some planning and preparation. Consider the following guidelines:
- Devise a phased delivery plan that enables you to release the solution initially to restricted group of users. This approach enables you to gather feedback and identify problems before releasing to a wider audience.
- Create an incident response plan that includes estimates of the time taken to respond to unanticipated incidents.
- Create a rollback plan that defines the steps to revert the solution to a previous state in the event of an incident.
- Implement the capability to immediately block harmful system responses when they're discovered.
- Implement a capability to block specific users, applications, or client IP addresses in the event of system misuse.
- Implement a way for users to provide feedback and report issues. In particular, enable users to report generated content as "inaccurate", "incomplete", "harmful", "offensive", or otherwise problematic.
- Track telemetry data that enables you to determine user satisfaction and identify functional gaps or usability challenges. Telemetry collected should comply with privacy laws and your own organization's policies and commitments to user privacy.