Classify and Detoxify: Tackling Toxic Comments with RoBERTa and LLaMA
In today's digital era, toxic comments and hate speech have become increasingly common across social media, discussion forums, and online platforms. Toxic text such as insults, threats, or sexual harassment not only harms users' mental health but also creates a hostile and unhealthy online environment.

Detecting toxic comments has become a major challenge in the field of Natural Language Processing (NLP). However, simply classifying toxic content isn't enough. To build truly helpful solutions, these harmful texts also need to be detoxified: rephrased in a softer, more respectful manner without losing the original context.

In this project, I built a two-stage system:

1. Toxic comment classification using the RoBERTa model to identify the type of toxicity: insult, threat, sexual harassment, or non-toxic.
2. Toxic text detoxification using a fine-tuned LLaMA 3.1 model to transform harmful sentences into more polite and constructive versions.

With this approach, we're not just flagging problematic content; we're offering better alternatives, promoting a healthier and more inclusive digital space.
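To make the two-stage design concrete, here is a minimal sketch of how such a pipeline can be wired together with the HuggingFace `transformers` library. The checkpoint paths, the prompt template, and the label order are placeholders for illustration, not the actual artifacts from this project:

```python
# A minimal sketch of the classify-then-detoxify pipeline.
# Assumptions: a RoBERTa checkpoint fine-tuned for 4-way toxicity
# classification and a LLaMA 3.1 checkpoint fine-tuned for detoxification.
# The paths and label order below are hypothetical placeholders.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

LABELS = ["non-toxic", "insult", "threat", "sexual harassment"]  # assumed order

# Stage 1: RoBERTa classifier (placeholder checkpoint path)
clf_tokenizer = AutoTokenizer.from_pretrained("path/to/roberta-toxicity")
clf_model = AutoModelForSequenceClassification.from_pretrained("path/to/roberta-toxicity")

def classify(comment: str) -> str:
    """Predict the toxicity type of a single comment."""
    inputs = clf_tokenizer(comment, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = clf_model(**inputs).logits
    return LABELS[logits.argmax(dim=-1).item()]

# Stage 2: fine-tuned LLaMA 3.1 detoxifier (placeholder checkpoint path)
gen_tokenizer = AutoTokenizer.from_pretrained("path/to/llama-3.1-detox")
gen_model = AutoModelForCausalLM.from_pretrained("path/to/llama-3.1-detox")

def detoxify(comment: str) -> str:
    """Rewrite a toxic comment in a softer tone, preserving its meaning."""
    prompt = f"Rewrite the following comment politely, keeping its meaning:\n{comment}\nRewritten:"
    inputs = gen_tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output = gen_model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return gen_tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

def process(comment: str) -> dict:
    """Run the full pipeline: classify, then detoxify if needed."""
    label = classify(comment)
    result = {"label": label, "text": comment}
    if label != "non-toxic":
        result["detoxified"] = detoxify(comment)
    return result
```

The key design choice here is that the generator only runs when the classifier flags a comment, so the cheaper RoBERTa model acts as a gate in front of the more expensive LLaMA rewriting step.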