Building a Bulletproof Prompt Injection Detector using SetFit with Just 32 Examples

In my previous post, we built a prompt injection detector by training a LogisticRegression classifier on embeddings of the SPML Chatbot Prompt Injection Dataset. Today, we will look at how to fine-tune the embedding model itself and then train a LogisticRegression classifier on top of it. I learnt this technique from Chapter 11 of the Hands-On Large Language Models book. I am enjoying this book. It is a practical take on LLMs and teaches many useful techniques that one can apply in their work.

We can fine-tune an embedding model on the complete dataset or on just a few examples. In this post, we will look at fine-tuning for few-shot classification. This technique shines when you have only a dozen or so examples in your dataset.
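To make the few-shot setup concrete, here is a minimal sketch of drawing a small, balanced training subset from a larger labeled dataset. It assumes the 32 examples in the title are split evenly as 16 per class across two classes, which this post does not state. The function name `sample_few_shot` and the toy data are hypothetical and stand in for the real SPML dataset.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, num_per_class, seed=42):
    """Pick `num_per_class` examples per label from (text, label) pairs.

    Hypothetical helper for illustration; not part of any library.
    """
    # Group the examples by their label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < num_per_class:
            raise ValueError(f"label {label!r} has only {len(pool)} examples")
        # Sample without replacement within each class.
        subset.extend(rng.sample(pool, num_per_class))

    rng.shuffle(subset)  # mix the classes together
    return subset


# Toy stand-in data: label 0 = benign prompt, label 1 = injection attempt.
toy_dataset = [(f"benign prompt {i}", 0) for i in range(100)]
toy_dataset += [(f"injection attempt {i}", 1) for i in range(100)]

# Assumption: 16 examples per class, giving 32 examples in total.
few_shot = sample_few_shot(toy_dataset, num_per_class=16)
print(len(few_shot))
```

Sampling per class rather than from the whole dataset keeps the tiny training set balanced, which matters more when every single example carries this much weight.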

I fine-tuned the model on RunPod https://www.runpod.io/. It cost me 36 cents to fine-tune and evaluate the model. I used a machine with 1 x RTX A5000 GPU, 16 vCPUs, and 62 GB of RAM.
