Cythera Cyber Security

What We Actually Find During Penetration Tests

What Penetration Testing Actually Reveals And Why the Most Dangerous Findings Don't Come From Scanners
Talk to an expert

What Penetration Testing Actually Reveals And Why the Most Dangerous Findings Don't Come From Scanners

When organisations invest in penetration testing, there's often an expectation that the results will centre on technical vulnerabilities unpatched systems, misconfigured services, known CVEs. The kinds of findings that scanners flag and severity matrices rank.

In practice, the findings that carry the most significant business impact are frequently something else entirely. They're business logic flaws weaknesses in how an application's workflows and processes actually behave, rather than in the underlying technology. These issues don't appear in vulnerability databases. Automated tools can't detect them. They only surface when a tester manually works through an application the way an attacker would: probing how functions interact, where trust assumptions are made, and what happens when the expected sequence of operations is deliberately broken.

This is the work that separates a penetration test that confirms known weaknesses from one that uncovers the risks your organisation didn't know it had.

A Finding That Automated Testing Would Never Catch

During a recent engagement for a platform that enables businesses to purchase radio advertising campaigns, our team identified a business logic vulnerability in the campaign creation and payment workflow. The technical classification was broken function-level authorisation, but the real impact was financial and it was significant.

The platform's intended workflow was straightforward. A business creates a campaign, sets a budget, and is redirected to a payment provider to complete the transaction. Once payment is confirmed, the campaign is submitted for approval.

What our tester identified was a gap between the moment a payment amount is generated and the moment that payment is actually processed. During that window, the draft campaign remained editable. The budget, the campaign duration, the parameters all of them could be modified after the payment link had been issued but before the user followed it.

In practical terms, this meant an adversary could create a campaign with a one-dollar budget, receive the payment redirect, then modify the draft campaign to reflect a budget of almost any amount hundreds of thousands of dollars before completing the one-dollar payment. The platform would register the campaign as successfully paid and submit it for approval at the inflated budget, despite only a single dollar having been charged.

The impact didn't stop there. The platform also supported campaign cancellations with refund processing. A campaign created through this method could be cancelled after approval, triggering a refund request against the modified budget rather than the amount actually paid. During testing, this sequence resulted in a refund request large enough that the payment provider contacted the client directly to query the transaction.

No scanner would flag this. No automated tool would construct this sequence of requests. It required a tester who understood the application's business logic, recognised the trust assumption between the payment initiation and payment processing steps, and methodically tested what would happen if that assumption was violated.

Why Business Logic Vulnerabilities Are Consistently Underestimated

The finding above is specific to one platform, but the pattern behind it is something we encounter regularly. Business logic flaws exist in the gap between what an application is designed to do and what it can actually be made to do. They emerge from assumptions that users will follow the intended workflow, that backend validation will catch what the frontend permits, that one step in a process can trust the output of a previous step without independent verification.

These assumptions are reasonable from a development perspective. Applications are built around expected behaviour. But attackers don't follow expected behaviour, and the most consequential vulnerabilities are often found in the spaces between functions that were each built correctly in isolation but weren't validated as a complete chain.

This class of finding tends to be underrepresented in pen test reports for a straightforward reason: finding them requires significant manual effort. A tester needs to understand what the application does in business terms, map the workflows and their dependencies, and then systematically test what happens when those workflows are manipulated. It's slower than running a scan and less formulaic than testing against OWASP categories. But it's where the findings that genuinely change an organisation's risk profile tend to live.

The Broader Patterns Behind Engagement Findings

Business logic flaws are the clearest illustration of why manual testing depth matters, but they're not the only class of finding that requires human judgement to uncover. Across engagements, several broader patterns consistently emerge and they share the common characteristic of being invisible to automated tools or too context-dependent for generic testing approaches to surface.

Chained weaknesses that individually appear low-risk. Attackers rarely succeed through a single critical vulnerability. More often, compromise follows a path through multiple individually modest weaknesses a misconfigured service that reveals internal structure, credentials obtained through password reuse, an overpermissioned service account that enables lateral movement, and monitoring gaps that allow the entire chain to complete undetected. Each element might rate as medium or low severity in isolation. Together, they represent a viable path to critical assets. Identifying these chains requires a tester who thinks in terms of attack paths rather than individual findings.

Detection and response failures that only appear under realistic conditions. Many organisations have invested in security monitoring, but investment and operational effectiveness aren't the same thing. We regularly find that alerts aren't triggering on the activity they're designed to catch, that detection coverage has gaps in parts of the environment that have changed since the tooling was configured, or that the process between alert and response isn't fast enough to interrupt a realistic attack sequence. These findings don't come from testing controls against a checklist they come from testing controls against actual adversary behaviour and seeing what happens.

AI integration risks that sit outside traditional testing scope. As organisations embed AI capabilities into their products and internal workflows, new categories of exposure are emerging. AI systems that connect to internal data sources create trust boundaries that most application architectures weren't designed to enforce. Prompt injection, data leakage through model outputs, and insecure API connections between AI tooling and business-critical systems are increasingly appearing in our engagement findings but only when AI systems are explicitly brought into scope and tested with the same adversarial rigour as any other application component.


What This Means for Your Testing Program

The depth of what a penetration test reveals is directly determined by the depth of the approach behind it. A test scoped narrowly, executed primarily through automated tooling, and reported against standard vulnerability categories will produce a report. It will satisfy a compliance requirement. It may even identify issues worth remediating.

But it's unlikely to find the business logic flaw in your payment workflow, the attack chain that threads through three systems no one thought to test together, or the detection gap that means your security team wouldn't see a real compromise until it was too late.

The question worth asking isn't whether your organisation conducts penetration testing. It's whether your testing program is structured to find the things that actually matter the findings that change how you think about your risk, not just the ones that confirm what you already suspected.

Assess whether your current approach is uncovering what it should.

Download the Cythera Penetration Testing Checklist to evaluate your testing program against the criteria that drive real security outcomes from scoping and methodology through to AI exposure testing and getting the most value from every engagement.

Download the Checklist


Events

Latest events

Join Cythera experts for networking events, technical briefings, and hands-on workshops hosted throughout the year.
View all events
No items found.
Cyber security news

Latest advisories

Stay ahead of emerging threats with our expert blog posts, research, and industry updates.
Silverstripe - Host Header Injection
Silverstripe CMS is affected by a Host Header Injection flaw, which can be exploited to manipulate password reset workflows, potentially redirecting or compromising user data.
FarCry Core Framework - Multiple Issues
FarCry Core contains multiple vulnerabilities that could let unauthenticated users upload arbitrary files and execute remote code on the hosting server.
Silverstripe – Cross-Site Scripting (XSS) Vulnerability
With local organisation admin credentials, an attacker can exploit the API to create, delete, or revert virtual machine snapshots in other organisations’ Virtual Data Centres (VDCs), breaching isolation boundaries.