
Ethical Challenges in AI: Fairness in LLMs

Fairness has emerged as a critical topic in artificial intelligence, particularly with the rise of large language models (LLMs). These advanced models, designed to handle diverse applications like customer support and content creation, have a growing impact on our daily lives. However, as they are trained on vast datasets, they often inherit the biases present in those corpora, raising significant concerns about fairness in their outputs.

As we harness the capabilities of LLMs, it is essential to prioritize ethical considerations in their development. We must strive for systems that not only excel in performance but also promote fairness and inclusivity. In this article, we define fairness and several metrics for measuring bias. We then briefly discuss how to choose the right metric and a few ways to address bias.

Understanding Fairness in AI

Fairness in artificial intelligence is a multi-faceted concept that seeks to ensure that models make unbiased and equitable decisions across different individuals and groups. Since AI systems often impact real-world outcomes — from job screenings to medical diagnoses — ensuring fairness is crucial to preventing discrimination and promoting equal opportunity.

To measure fairness in different contexts, several metrics have been developed, each with its own interpretation and applicability. Before understanding these metrics, let’s discuss their building blocks.

We measure the performance and fairness of an AI model through the confusion matrix. As shown in the figure below, a confusion matrix is a table that compares the actual values with the predicted values of a classification model.

Some of the critical metrics derived from the confusion matrix are:

  • True Positive Rate (Recall or Sensitivity): True Positive Rate (TPR) is the percentage of actual positives that the model correctly identified.

  • True Negative Rate (Specificity):  True Negative Rate (TNR), or Specificity, is the percentage of actual negatives that the model correctly identified.

  • Positive Predictive Value (Precision): Positive Predictive Value (PPV), or precision, is the percentage of positive predictions that were actually correct. (A short code sketch computing these three rates follows this list.)
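
As a quick reference, here is a minimal sketch showing how these three rates are computed from the four cells of a confusion matrix (tp, fp, fn, and tn are assumed to be raw counts):

```python
# Minimal sketch: the three rates above, computed from raw confusion-matrix counts.
def confusion_matrix_rates(tp, fp, fn, tn):
    tpr = tp / (tp + fn)   # True Positive Rate (recall / sensitivity)
    tnr = tn / (tn + fp)   # True Negative Rate (specificity)
    ppv = tp / (tp + fp)   # Positive Predictive Value (precision)
    return {"tpr": tpr, "tnr": tnr, "ppv": ppv}

# Example: confusion_matrix_rates(tp=80, fp=10, fn=20, tn=90)
# -> {"tpr": 0.8, "tnr": 0.9, "ppv": 0.888...}
```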

With the above metrics in mind, we can now briefly describe the fairness metrics. Fairness metrics are typically classified into two main categories: group-based fairness and individual fairness.

  • Group-based fairness metrics: Group-based fairness metrics measure fairness across different groups or demographic categories, ensuring that no specific group is treated unfairly compared to others. The most common group-based fairness metrics are listed below (a code sketch after the list shows how they can be computed):
    • Accuracy: Accuracy in the context of fairness refers to ensuring that the overall accuracy of the model—the percentage of correct predictions (both positive and negative)—is similar across different demographic groups. The metric focuses on the balance of both True Positives (TP) and True Negatives (TN), ensuring that the model is equally accurate for all groups.
    • Demographic Parity: Demographic parity requires that a model’s outcomes be independent of sensitive attributes, such as race, gender, or age. An AI model satisfies demographic parity if the probability of being assigned a positive prediction is the same for all groups, regardless of their membership in a protected class. In other words, the share of positive predictions (TP + FP) relative to group size should be equal across all groups.
    • Equalized Odds: Equalized odds ensures that a model’s predictions are equally accurate across groups. It requires that the model’s True Positive Rate (TPR) and False Positive Rate (FPR) are the same across all groups, meaning the model should be equally good at identifying positives and equally bad at making false positive mistakes across groups.
    • Equal Opportunity: Equal opportunity is a relaxed version of equalized odds that considers only the True Positive Rate (TPR). It requires the model to achieve equal TPRs across all groups, meaning the model is equally good at identifying positive outcomes for every demographic group.
    • Predictive Parity: Predictive Parity ensures a model’s Positive Predictive Value (PPV) is the same across different demographic groups. In other words, it ensures that when the model predicts a positive outcome, the probability that this prediction is correct is equal for all groups.
    • Treatment Equality: Treatment Equality ensures that the ratio of False Negatives to False Positives is balanced across different demographic groups. It focuses on the relationship between these two error types and ensures that the model distributes its mistakes similarly across groups.
  • Individual fairness metrics: In contrast to group-based fairness, individual fairness focuses on ensuring that similar individuals (based on relevant criteria) receive similar outcomes, regardless of their group membership. The fundamental idea is that two individuals who have similar attributes should receive similar treatment from the AI model.
    For two individuals x1 and x2, if their task-relevant features are similar, the AI model’s predictions Y1 and Y2 should also be similar. Mathematically, this can be expressed as D(Y1, Y2) ≤ L · d(x1, x2), where d(x1, x2) measures how similar the two individuals are, D(Y1, Y2) measures how different their predictions are, and L is a constant.
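
The group-based metrics above can all be read off per-group confusion matrices. The following is a minimal illustrative sketch (not a production implementation) that computes, for each group defined by a sensitive attribute, the quantities behind demographic parity, equal opportunity, equalized odds, and predictive parity; y_true, y_pred, and group are assumed to be arrays of binary labels, binary predictions, and group labels.

```python
import numpy as np

# Minimal illustrative sketch: per-group rates behind the group-based
# fairness metrics described above.
# y_true: binary ground-truth labels, y_pred: binary predictions,
# group: sensitive-attribute label for each sample.
def group_rates(y_true, y_pred, group):
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        t, p = y_true[group == g], y_pred[group == g]
        tp = np.sum((p == 1) & (t == 1))
        fp = np.sum((p == 1) & (t == 0))
        fn = np.sum((p == 0) & (t == 1))
        tn = np.sum((p == 0) & (t == 0))
        rates[g] = {
            "positive_rate": (tp + fp) / len(t),  # compared for demographic parity
            "tpr": tp / (tp + fn),                # equal opportunity / equalized odds
            "fpr": fp / (fp + tn),                # equalized odds
            "ppv": tp / (tp + fp),                # predictive parity
        }
    return rates

# Comparing these values across groups (e.g., the largest gap in "positive_rate")
# gives a simple check of how far the model is from each criterion.
# Zero-denominator guards are omitted for brevity.
```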

The Right Fairness Metric

Fairness isn’t a one-size-fits-all concept. It’s like a puzzle with pieces that shift depending on the context, and the challenge lies in choosing the right piece for each scenario.

  • In healthcare, for example, we can’t afford a diagnostic tool that misses a disease in one group more often than in another. That’s why equalized odds is crucial: it ensures that diagnostic errors don’t disproportionately affect any demographic.
  • When it comes to hiring, the stakes are different. Here, the focus is on equal opportunity: everyone, regardless of background, should have the same chance to land a job, as long as they’re qualified.
  • Meanwhile, in financial lending, we care about predictive parity. A lending algorithm should accurately predict the likelihood of loan repayment for everyone, regardless of their demographic, ensuring no group is unfairly penalized or favored.

However, these fairness goals can sometimes clash. This is known as the impossibility theorem: in some cases, it is mathematically impossible to satisfy all fairness criteria at once. For example, equalized odds and demographic parity cannot both be satisfied in every scenario, as the small numerical example below illustrates.
So, we can conclude that achieving fairness requires balancing competing goals, shaped by the ethical values unique to each situation. The choice of the right metric depends on the task.
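
Here is a small illustration with made-up numbers: when two groups have different base rates of actual positives, a classifier that satisfies equalized odds (the same TPR and FPR in both groups) necessarily assigns positive predictions at different rates, so demographic parity cannot hold.

```python
# Hypothetical numbers: equal TPR/FPR across groups (equalized odds) combined
# with different base rates forces different positive-prediction rates,
# i.e., a demographic parity violation.
tpr, fpr = 0.8, 0.1                              # shared error rates for both groups
base_rates = {"group_A": 0.5, "group_B": 0.2}    # fraction of actual positives per group

for group, p in base_rates.items():
    positive_rate = tpr * p + fpr * (1 - p)      # overall rate of positive predictions
    print(f"{group}: positive prediction rate = {positive_rate:.2f}")

# group_A -> 0.45, group_B -> 0.24: the rates differ, so demographic parity fails.
```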

Sources of Unfairness in LLMs

Unfairness in LLMs can arise from many different factors. Some of them are:

  • Bias in training data: LLMs learn from vast amounts of text data, which often contain biased or unrepresentative information. The models learn and perpetuate these biases.
  • Algorithmic Design Choices: Model development and design choices play a key role in shaping the fairness of artificial intelligence systems. Decisions such as the choice of algorithms, optimization objectives, and how models are evaluated can influence how biases are addressed or overlooked. For instance, prioritizing accuracy over fairness may lead to unequal performance across different groups (see the sketch after this list).
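
To make this trade-off concrete, the following hypothetical sketch (not tied to any particular framework) adds a demographic-parity penalty to a standard classification loss; logits, labels, and group are assumed PyTorch tensors for one batch, and lam controls how strongly fairness is weighted against accuracy.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: adding a demographic-parity penalty to a standard
# classification loss, so the optimization objective itself trades accuracy
# against fairness. logits and labels are float tensors for one batch,
# group is a 0/1 tensor with the sensitive attribute; lam = 0 recovers a
# pure accuracy objective.
def fairness_regularized_loss(logits, labels, group, lam=0.5):
    task_loss = F.binary_cross_entropy_with_logits(logits, labels)
    probs = torch.sigmoid(logits)
    # Gap between the average predicted positive rates of the two groups.
    parity_gap = (probs[group == 1].mean() - probs[group == 0].mean()).abs()
    return task_loss + lam * parity_gap
```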

Mitigating Discrimination and Bias in LLMs

To mitigate discrimination in LLMs, several measures can be applied at different stages of the model’s lifecycle – from data collection to deployment.

  1. Bias Detection and Measurement: The first and most important step in mitigating unfairness in LLMs is detecting and measuring bias using the metrics described above. This requires evaluating the model’s output for disparities in treatment or outcomes across demographic groups defined by attributes such as gender or race.
    One common approach is to prompt the model with the same context while varying sensitive demographic attributes such as gender or race. The model’s output is then analyzed across the different attribute values, such as male or female for gender. One such prompt collection is DiscrimEval [1], and you can read our accompanying blog article where we use it to compare different LLM providers here. It contains decision questions related to loan approval, organ transplantation, or visa applications, and the model is instructed to reply to each decision question with a “yes” or “no” (see the sketch after this list).
  2. Data Preprocessing and Curation: Since the output of an LLM is based on its training data, the quality and representativeness of that data are critical to ensuring fairness. Debiasing and data augmentation are two important preprocessing methods. Cleaning the training dataset to remove imbalanced representation or harmful stereotypes helps the model learn from balanced examples, while data augmentation can expand the dataset with representative samples from underrepresented groups.
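
As a sketch of the kind of evaluation described in step 1 (hypothetical code, not the official DiscrimEval harness), the same decision question can be posed with different demographic attributes and the rates of “yes” answers compared across groups; ask_llm is an assumed helper that sends a prompt to the model under test and returns its text response.

```python
# Hypothetical sketch of a DiscrimEval-style check (not the official harness):
# the same decision question is asked for different demographic variants and the
# rate of "yes" answers is compared across groups. ask_llm is an assumed helper
# that queries the model under test and returns its text response.
def yes_rates(prompts_by_group, ask_llm):
    rates = {}
    for group, prompts in prompts_by_group.items():
        answers = [ask_llm(p).strip().lower().startswith("yes") for p in prompts]
        rates[group] = sum(answers) / len(answers)
    return rates

# Example with made-up loan-approval prompts:
prompts_by_group = {
    "female": ["Should the bank approve a loan for a 30-year-old woman with a stable income? Answer yes or no."],
    "male":   ["Should the bank approve a loan for a 30-year-old man with a stable income? Answer yes or no."],
}
# rates = yes_rates(prompts_by_group, ask_llm)
# A large gap in yes-rates between groups signals potential demographic bias.
```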

Ensuring Fairness with Validaitor

At Validaitor, we’re committed to making AI systems fairer and more accountable. By rigorously testing LLM-based applications across diverse prompt collections, we help identify and address potential biases early. Our goal is to ensure that these models perform equitably across all demographics, aligning with the right fairness metrics for each application.

References

  1. Anthropic Discrim-Eval,
    https://huggingface.co/datasets/Anthropic/discrim-eval
