Finally, someone did the obvious thing! It's framed for open-weight models, but closed-source providers should take note too.
AI Security Institute · 12 Aug at 18:59
How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with @AiEleuther, we find that removing harmful data before training can be over 10x more effective at resisting adversarial fine-tuning than defences added after training 🧵
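The core idea is simple enough to sketch: instead of bolting a safety defence onto a finished model, you screen the pre-training corpus up front so the harmful capability is never learned. Here's a minimal sketch, assuming a crude keyword screen as a stand-in for whatever harmfulness classifier the paper actually uses; `BLOCKED_TERMS`, `looks_harmful`, and `filter_corpus` are hypothetical names, not the authors' pipeline.

```python
# Sketch of pre-training data filtering: drop documents flagged as harmful
# before they enter the training corpus. The keyword screen below is a
# hypothetical stand-in for the paper's (unspecified here) classifier.

from typing import Iterable, Iterator

# Hypothetical filter targets for illustration only.
BLOCKED_TERMS = [
    "synthesize nerve agent",
    "build a bioweapon",
    "enrich uranium",
]

def looks_harmful(doc: str) -> bool:
    """Crude proxy for a harmfulness classifier."""
    text = doc.lower()
    return any(term in text for term in BLOCKED_TERMS)

def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass the harmfulness screen."""
    for doc in docs:
        if not looks_harmful(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "A history of the printing press.",
        "Step-by-step guide to build a bioweapon at home.",
        "Notes on sourdough fermentation.",
    ]
    clean = list(filter_corpus(corpus))
    print(f"kept {len(clean)} of {len(corpus)} documents")
```

The point of doing this before training rather than after is that a post-hoc defence can be stripped away by adversarial fine-tuning, whereas knowledge that was filtered out of the corpus was never in the weights to begin with.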