AI Security Review | Netscylla Cyber Security

As organisations accelerate their adoption of Large Language Models (LLMs), AI agents, and retrieval-augmented generation (RAG) pipelines, a new and rapidly evolving attack surface emerges. Netscylla's AI Security Review service applies proven offensive security techniques to the unique risks posed by modern AI systems — helping you understand exactly where your models, integrations, and data pipelines are exposed before an adversary does.

Prompt Injection
Prompt injection attacks attempt to override or subvert a model's system instructions by embedding malicious directives into user-supplied or external content. We test both direct prompt injection (where an attacker controls input to the model) and indirect injection (where a model reads adversarial content from a document, web page, or database). Our assessors craft payloads designed to redirect model behaviour, exfiltrate context, or cause the model to take unintended actions — exposing gaps in your input sanitisation, system prompt hardening, and output filtering.
RAG Assessments
Retrieval-Augmented Generation introduces a retrieval layer that can itself be targeted. We assess the security of your vector stores, embedding pipelines, and document ingestion processes. Key risks we investigate include RAG poisoning (injecting malicious content into the knowledge base to manipulate model outputs), retrieval manipulation (crafting queries that surface sensitive or unintended documents), and data segregation failures (where retrieval crosses tenant or classification boundaries). We also review chunking strategies, metadata filtering, and access controls on your retrieval index.
Model Review
A thorough review of the AI model itself — whether a hosted API, a fine-tuned checkpoint, or an on-premise deployment. We evaluate model card claims against observed behaviour, assess the adequacy of safety fine-tuning (RLHF, constitutional AI, guardrails), review the model's tendency to hallucinate in security-sensitive contexts, and examine version control and provenance of model artefacts. For fine-tuned models we also assess the training pipeline and dataset hygiene to identify risks of training data poisoning or inadvertent memorisation of sensitive information.
Data Exfiltration
LLMs can be coerced into leaking information they should not have access to — including system prompts, tool configurations, PII processed in prior turns, or data from other users in multi-tenant deployments. We test for context window leakage, system prompt extraction, and cross-session data bleed. In agentic settings, we examine whether an AI agent can be manipulated into reading and transmitting files, API responses, or environment variables that fall outside its intended scope.

Jailbreaking
Jailbreaking encompasses a range of techniques used to cause a model to bypass its own safety constraints — producing harmful, policy-violating, or confidential output. Our team maintains an up-to-date library of jailbreak patterns including role-play exploits, character hijacking, hypothetical framing, token smuggling, and multi-turn manipulation chains. Beyond simple content policy bypasses, we test whether safety controls hold under adversarial pressure at the system level — particularly when the model is embedded in an application that layers additional context or instructions.
Application & API Testing
AI functionality is typically exposed via APIs and integrated into broader applications, introducing conventional web and API security risks alongside AI-specific ones. We conduct standard application security testing (OWASP API Top 10, authentication, authorisation, rate limiting) combined with AI-specific checks: model endpoint hardening, streaming response leakage, tool-call injection in function-calling APIs, and plugin/extension security in agentic frameworks. We also assess logging and observability to determine whether your application captures the data needed to detect and investigate AI-driven attacks.
Pivoting & Acting on Objectives
In agentic deployments, a successfully compromised AI model is not just an information leak — it is an execution primitive. We simulate full attack chains where an initial prompt injection or jailbreak enables an attacker to pivot laterally through connected tools, APIs, and services. Objectives we pursue include: issuing unauthorised commands to integrated services (email, calendar, CRM, cloud APIs), escalating privileges within multi-agent orchestration frameworks, and exfiltrating data through the model's allowed output channels. This tradecraft-focused assessment reveals the real-world blast radius of an AI compromise in your environment.