In this post, we will build a prompt injection detector by treating detection as a simple classification task and solving it with scikit-learn’s Logistic Regression. Logistic regression is a statistical method for binary classification: it estimates the probability that an input belongs to one of two classes, which here means deciding whether a prompt is an injection attempt or not.
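To make the "two possible outcomes" idea concrete, here is a minimal sketch of what a logistic regression model computes at prediction time. The weights, bias, and feature values are hypothetical numbers chosen only for illustration; in practice scikit-learn learns the weights from training data.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one example.

    The score is a weighted sum of the features plus a bias; the sigmoid
    turns it into a probability, and the threshold turns that into 0 or 1.
    """
    score = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(score)
    return probability, int(probability >= threshold)


# Hypothetical weights and features, for illustration only.
weights = [2.0, -1.0]
bias = 0.0
prob, label = predict([1.0, 0.5], weights, bias)
print(f"{prob:.3f}", label)  # prints: 0.818 1
```

A score of 0 maps to a probability of exactly 0.5, so the default threshold splits the two classes at the point where the model is undecided.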
We will use the SPML Chatbot Prompt Injection Dataset as our input data.
Install the following libraries:
pip install datasets
pip install sentence-transformers
pip install scikit-learn
We will start by loading the dataset:
from datasets import load_dataset
dataset = load_dataset("reshabhs/SPML_Chatbot_Prompt_Injection")
Let’s look at the dataset:
dataset
DatasetDict({
    train: Dataset({
        features: ['System Prompt', 'User Prompt', 'Prompt injection', 'Degree', 'Source'],
        num_rows: 16012
    })
})
This displays the dataset structure. There are 16,012 records in this dataset, each with five columns:
System Prompt, User Prompt, Prompt injection, Degree, Source
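Before training a classifier it helps to check how the labels are distributed. The sketch below counts the values of the Prompt injection column over any iterable of dict-shaped rows. The helper name count_labels and the placeholder rows are hypothetical and not taken from the dataset, and the 0/1 label encoding is an assumption.

```python
from collections import Counter


def count_labels(rows, label_column="Prompt injection"):
    """Count how often each label value occurs in an iterable of dict rows."""
    return Counter(row[label_column] for row in rows)


# Hypothetical placeholder rows that only mimic the five column names;
# the values are NOT real records, and the 0/1 labels are an assumption.
placeholder_rows = [
    {"System Prompt": "<system prompt>", "User Prompt": "<user prompt A>",
     "Prompt injection": 1, "Degree": None, "Source": None},
    {"System Prompt": "<system prompt>", "User Prompt": "<user prompt B>",
     "Prompt injection": 0, "Degree": None, "Source": None},
    {"System Prompt": "<system prompt>", "User Prompt": "<user prompt C>",
     "Prompt injection": 1, "Degree": None, "Source": None},
]

counts = count_labels(placeholder_rows)
print(counts[1], counts[0])  # prints: 2 1
```

Iterating over a Hugging Face Dataset yields one dict per record, so with the dataset loaded as above, count_labels(dataset["train"]) should give the class balance of the real data.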