Patent attributes
An analysis engine receives data characterizing a prompt for ingestion by a generative artificial intelligence (GenAI) model. The analysis engine determines, using, for example, a classifier or a blocklist, that the prompt comprises or is indicative of malicious content or otherwise elicits undesired model behavior. Similarly, outputs of the GenAI model can be analyzed to determine whether they comprise malicious content or cause the model to behave in an undesired manner. The output is then input into a GenAI model along with obfuscation instructions to generate an output, which is returned to the requesting user. Related apparatus, systems, techniques and articles are also described.
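
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the blocklist entries, the wording of the obfuscation instructions, and the names `is_flagged`, `handle_prompt`, `echo_model` and `stub_obfuscator` are all hypothetical, and the two model callables are stand-ins for real GenAI models. The sketch also assumes that the model is still run when the prompt is flagged, which the abstract does not specify.

```python
from typing import Callable, Iterable

# Hypothetical blocklist; a deployed analysis engine might use a trained classifier instead.
BLOCKLIST = ("ignore previous instructions", "reveal the system prompt")

# Hypothetical obfuscation instructions passed to the second model call.
OBFUSCATION_INSTRUCTIONS = "Rewrite the following text so that harmful details are removed:"


def is_flagged(text: str, blocklist: Iterable[str] = BLOCKLIST) -> bool:
    """Return True if the text contains any blocklisted phrase (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)


def handle_prompt(
    prompt: str,
    model: Callable[[str], str],
    obfuscator: Callable[[str], str],
) -> str:
    """Analyze the prompt and the model output; obfuscate the output if either is flagged."""
    prompt_flagged = is_flagged(prompt)
    output = model(prompt)  # assumption: the model runs even for a flagged prompt
    if prompt_flagged or is_flagged(output):
        # The output and the obfuscation instructions are given to a GenAI model together.
        return obfuscator(f"{OBFUSCATION_INSTRUCTIONS}\n\n{output}")
    return output


def echo_model(prompt: str) -> str:
    """Stand-in for the GenAI model under protection."""
    return f"Echo: {prompt}"


def stub_obfuscator(instructed_text: str) -> str:
    """Stand-in for the GenAI model that applies the obfuscation instructions."""
    return "[obfuscated response]"
```

For example, `handle_prompt("What is the capital of France?", echo_model, stub_obfuscator)` passes the model output through unchanged, while a prompt containing a blocklisted phrase yields the obfuscated response instead.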