Computer Science / Digital
As AI systems grow more pervasive, questions of safety and trust demand urgent, practical answers.
Start
17:00
End
00:00
Description
This session presents five research efforts spanning the AI safety landscape: a classifier that identifies AI-generated content across model families (IITGnGPT); toxicity detectors for low-resource Indian languages covering 17 fine-grained harm categories across 12 languages (UnityAI-Guard 1.0 and 2.0); a demonstration of backdoor attacks on YOLO, text classifiers, generative models, and translation systems, revealing a systemic adversarial threat surface; and SangrahaTox, a benchmark dataset for auditing multimodal models for stereotypes, bias, and toxicity. Together, these projects chart a path from detecting what AI produces, to exposing how it can be manipulated, to measuring the biases it silently encodes, offering both diagnostic tools and a broader framework for building trustworthy AI.
Please note that the presentation will be held in English.
Event format
Presentation
Stops
Münchner Platz
- 3 (tram)
Helmholtzstraße
- 85 (bus)