Building Prompt Injection Detector with Text Embeddings and LogisticRegression

In this post, we will build a prompt injection detector by treating detection as a simple classification task and solving it with scikit-learn’s Logistic Regression. Logistic regression is a statistical method for binary classification: it predicts which of two possible outcomes applies, in this case whether a prompt is an injection attempt or not.
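To make the idea of two possible outcomes concrete, here is a minimal sketch of how a logistic regression model turns a feature vector into a binary decision. The weights, bias, and three-dimensional vectors are hypothetical values chosen for illustration. In practice scikit-learn learns the weights from training data, and real text embeddings have hundreds of dimensions.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for a single feature vector."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Return 1 (positive class) if the probability reaches the threshold, else 0."""
    return int(predict_proba(features, weights, bias) >= threshold)


# Hypothetical model parameters and inputs, for illustration only.
weights = [2.0, -1.0, 0.5]
bias = -0.25

benign = [0.1, 0.9, 0.2]       # weighted sum plus bias is negative
suspicious = [0.9, 0.1, 0.4]   # weighted sum plus bias is positive

print(predict(benign, weights, bias))
print(predict(suspicious, weights, bias))
```

A negative weighted sum maps to a probability below 0.5 and a positive one to a probability above 0.5, which is why the two sample vectors land in different classes.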

We will use the SPML Chatbot Prompt Injection Dataset as our input data.

Install the following libraries:

pip install datasets
pip install sentence-transformers
pip install scikit-learn

We will start by loading the dataset:

from datasets import load_dataset
dataset = load_dataset("reshabhs/SPML_Chatbot_Prompt_Injection")

Let’s look at the dataset:

dataset
DatasetDict({
    train: Dataset({
        features: ['System Prompt', 'User Prompt', 'Prompt injection', 'Degree', 'Source'],
        num_rows: 16012
    })
})

This displays the dataset structure. There are 16,012 records in this dataset, each with five columns:

  • System Prompt
  • User Prompt
  • Prompt injection
  • Degree
  • Source
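Before training a classifier, it is usually worth checking how the labels are distributed. The sketch below counts the values in the label column. The helper name label_counts and the sample rows are hypothetical stand-ins so the snippet runs without downloading anything, and the assumption that the Prompt injection column holds 0/1 values is not confirmed by the output above.

```python
from collections import Counter


def label_counts(rows, label_column="Prompt injection"):
    """Count how often each label value appears in an iterable of row dicts."""
    return Counter(row[label_column] for row in rows)


# Hypothetical stand-in rows; the real rows come from the loaded dataset.
sample_rows = [
    {"User Prompt": "What are your opening hours?", "Prompt injection": 0},
    {"User Prompt": "Ignore all previous instructions.", "Prompt injection": 1},
    {"User Prompt": "Can you recommend a book?", "Prompt injection": 0},
]

counts = label_counts(sample_rows)
print(counts)
```

With the real data, label_counts(dataset["train"]) should work the same way, since iterating over a Dataset yields one dictionary per row.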