Most security operations environments I’ve had the luck of seeing look impressive. Dashboards everywhere, alerts firing, a SIEM lit up like it’s doing something meaningful. From the outside, it reads as a mature, well-instrumented program. From the inside…if you’re really honest about it, a lot of it is just noise with good branding.
All of that, without an understanding of what it really means, isn’t security; it’s just activity, and that activity, no matter how busy it looks, doesn’t reduce risk one bit.
Most organizations have invested heavily in detection. SIEM platforms, endpoint detection, network monitoring, cloud telemetry, alerting rules that would make a seasoned analyst pause before clicking “acknowledge.” On paper, it’s everything we’re told we’re supposed to do: collect more data, increase visibility, detect faster. None of that is wrong, but somewhere along the way we started treating detection and understanding as the same thing. They’re not, and they aren’t even close.
Detection tells you something happened; understanding tells you what it means in the context of your environment. That distinction matters more than most security programs or leaders are willing to admit, because the entire chain of security operations depends on it. When an alert fires, the real question isn’t whether you saw it, it’s whether you knew what to do with it.
I’ve been in environments where alerts fire constantly, in high volumes and at high severity. Everything looks critical, and yet very little of it is actionable in the way people think. It’s not that the alerts are wrong; it’s that there’s no context to understand what’s happening.
I like to think of an alert without context as a smoke alarm going off in a building where nobody knows where the exits are, what’s burning, or whether there’s even a fire. There may be panic at first, but eventually people stop reacting. It’s not because they stopped caring, but because they can’t tell what matters. I think that’s where detection fails…and fails quietly. It’s not the tools, it’s the interpretation. Interpretation requires understanding the environment and what normal looks like, and in most cases that’s a lot harder than deploying a tool.
For example, take a suspicious login alert. That sounds important, doesn’t it? And it probably is, but what makes it suspicious? Is it the location, the time, the behavior, a known pattern? Now let’s add context to that alert. Does that user travel frequently? Is the system publicly accessible? Is there a business process that explains the anomaly? Without context, it’s just a data point, and on its own it isn’t actionable. With context, it becomes a decision point. That gap is where most programs struggle, because collecting data scales easily and understanding it doesn’t.
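If I had to sketch what that enrichment step looks like in practice, it’s something like the snippet below. The field names, lookup tables, and “review vs. escalate” logic are assumptions I’ve made for the example, not a reference to any particular SIEM or SOAR product; the point is only that the decision comes from the context, not from the detection itself.

```python
# Illustrative sketch: enriching a "suspicious login" alert with local context.
# The field names, lookup sets, and routing logic are assumptions for the
# example, not any specific product's schema.

FREQUENT_TRAVELERS = {"j.doe"}            # users whose travel explains odd geolocation
PUBLIC_SYSTEMS = {"vpn-gateway"}          # systems expected to see external logins
KNOWN_BATCH_WINDOWS = {("svc_backup", 3)} # (account, hour) pairs tied to business processes

def enrich_login_alert(alert: dict) -> dict:
    """Attach context to a raw login alert and suggest a disposition."""
    user, host, hour = alert["user"], alert["host"], alert["hour_utc"]

    context = {
        "user_travels_frequently": user in FREQUENT_TRAVELERS,
        "system_is_public_facing": host in PUBLIC_SYSTEMS,
        "matches_business_process": (user, hour) in KNOWN_BATCH_WINDOWS,
    }

    # If any piece of context explains the anomaly, route it for lightweight
    # review; otherwise escalate. The decision comes from context, not detection.
    explained = any(context.values())
    return {**alert, "context": context, "disposition": "review" if explained else "escalate"}

if __name__ == "__main__":
    raw = {"user": "j.doe", "host": "hr-portal", "hour_utc": 2, "geo": "RO"}
    print(enrich_login_alert(raw))
```

The data sources behind those lookups (HR travel records, asset inventory, change calendars) are the hard part, which is exactly the point: the code is trivial, the context isn’t.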
The challenge isn’t just that context is hard to build into our programs; it’s that we’ve built entire programs on the assumption that we don’t need it (purposely or accidentally). We design detection rules around technical triggers and assume the analyst will fill in the rest. We tune the SIEM for coverage and call it done, and we measure success by alert volume rather than decision quality. Then we act surprised when the team feels buried.
Security operations has a data problem: we’re excellent at collecting signals and genuinely inconsistent at turning them into real meaning. When meaning is missing, everything looks the same (important, urgent, critical), which is another way of saying nothing stands out. Teams feel overwhelmed, not because they lack skill, but because they lack clarity.
Alert fatigue is the visible symptom of this, but the damage runs deeper than tired or frustrated analysts. When signals don’t carry enough context to support confident decisions, hesitation sets in. Analysts stop trusting their own judgment, not because they’re wrong, but because the information they’re working with is incomplete. Over time that leads to inconsistency in our programs, where two analysts look at the same alert and reach completely different conclusions, both of them reasonable given what they know. You could argue that neither analyst is wrong, and if you’re honest with yourself, you know the process is.
Another thing to consider is the delayed-action problem. I’m not talking about “we missed it”; I mean “we saw it, we just couldn’t figure out what it meant fast enough to do anything useful.” In most of the incidents I’ve talked through with colleagues, the detection wasn’t the failure…the interpretation was. The alert fired like it was supposed to, the team acknowledged it, and then they spent the next several hours trying to reconstruct the context that should have been there from the start.
That is an expensive way to operate a security program. The longer it takes to understand an event, the smaller your window to contain it, and in security, that window tends to close faster than anyone’s comfortable admitting.
I think most detection strategies are built backwards. We start with the technology (what can the SIEM detect, what does the EDR flag, what triggers an alert in the cloud environment) and then work out from there. That’s understandable, but I also think it’s a problem.
It comes down to this: detection logic built around technical triggers without business context tends to generate alerts that are technically accurate but operationally useless. Yes, that login looks anomalous, but is it? That depends entirely on what the business was doing at the time. Yes, that process spawned an unusual child process. Should you care? That depends on whether it’s a known behavior in your environment or something that’s never happened before. Every single environment is different; something normal in mine might be unusual in yours and cause for alarm.
The missing ingredient is almost always the same: we don’t know what “normal” actually looks like for this specific environment. Not the textbook version of normal. Not the vendor’s baseline. What’s normal for this organization, with these users, these systems, and these workflows. That knowledge doesn’t come from any tool (at least none that I’ve ever seen); it comes from time, documentation, and deliberate effort to understand the environment before you try to detect threats inside it.
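A rough sketch of what “learn normal first” could look like is below. The event schema and the six-hour time buckets are arbitrary assumptions on my part; the specifics matter far less than the order of operations: observe your own environment first, then detect against what you actually observed.

```python
# Rough sketch of building a "what is normal here" baseline from your own
# telemetry rather than a vendor default. The event fields and 6-hour buckets
# are assumptions; the idea is simply: observe first, detect against that.

from collections import defaultdict

def build_baseline(events: list[dict]) -> dict:
    """Record which (country, hour-bucket) combinations each user normally logs in from."""
    baseline = defaultdict(set)
    for e in events:
        baseline[e["user"]].add((e["country"], e["hour_utc"] // 6))
    return baseline

def is_deviation(event: dict, baseline: dict) -> bool:
    """True if this login falls outside everything previously observed for the user."""
    seen = baseline.get(event["user"], set())
    return (event["country"], event["hour_utc"] // 6) not in seen

history = [
    {"user": "j.doe", "country": "US", "hour_utc": 14},
    {"user": "j.doe", "country": "US", "hour_utc": 15},
]
baseline = build_baseline(history)
print(is_deviation({"user": "j.doe", "country": "BR", "hour_utc": 3}, baseline))  # True
```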
I think this is also why threat intelligence programs often underdeliver. I talked about us having a data problem, and raw intelligence is just another data point. To stay with the theme: if you want actionable intelligence, you take that data point and add context. Knowing that a particular threat actor targets organizations in your sector is interesting (but only mildly helpful). Knowing which of your systems would be most exposed if they came knocking, and what that activity would actually look like in your telemetry, is extremely useful. The gap between those two things is where a lot of threat intelligence programs live permanently.
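To make that gap concrete, here’s a hedged sketch of the “add context” step: joining a report’s techniques against an assumed asset inventory and telemetry map to see where you’d actually be exposed and where you’d be blind. The technique IDs, inventory shape, and log-source names are illustrative, not a real feed or schema.

```python
# Sketch of contextualizing threat intel: map a report's techniques onto our
# own internet-facing assets and check whether we'd even see the activity.
# Technique IDs, asset fields, and log-source names are illustrative assumptions.

intel_report = {
    "actor": "ExampleGroup",
    "techniques": ["T1190", "T1078"],   # external exploit, valid accounts
}

asset_inventory = [
    {"host": "web-01", "internet_facing": True,  "log_sources": ["waf", "nginx"]},
    {"host": "db-01",  "internet_facing": False, "log_sources": []},
]

# Where each technique would plausibly show up, in *our* telemetry.
technique_visibility = {
    "T1190": {"waf", "nginx"},
    "T1078": {"auth"},
}

for asset in asset_inventory:
    if not asset["internet_facing"]:
        continue
    for tech in intel_report["techniques"]:
        visible = technique_visibility.get(tech, set()) & set(asset["log_sources"])
        status = f"visible via {sorted(visible)}" if visible else "BLIND SPOT"
        print(f'{asset["host"]}: {tech} -> {status}')
```

The output you’d care about isn’t the “visible” lines, it’s the blind spots, because that’s where “this actor targets your sector” turns into something you can actually act on.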
As I often talk about, I think there needs to be a shift in thinking, and leadership needs to change the question. Instead of asking how many alerts we’re detecting, ask how many we actually understand. That question tends to make people uncomfortable, which usually means it’s a good one.
Understanding requires context: knowing your environment, your users, your systems, your data flows, and what normal actually looks like for your organization specifically. That’s a different investment than buying another detection tool. It’s harder to justify in a budget conversation, and it doesn’t come with a vendor demo…but it’s the work that makes everything else function.
A mature security operations program shouldn’t be measured by how much it detects; it should be measured by how effectively it acts on what it detects. That means investing in context, not just coverage. It means detection logic that reflects how the business actually works, not just what a default framework says to watch for. It means that when an alert fires, the analyst has enough information to make a decision quickly and accurately, not just enough to check a box.
A mature security operations program also means being honest about what you’re actually measuring for success. Alert volume is easy to measure. Mean time to detect is measurable. “Mean time to understand” (the actual gap between seeing an event and knowing what it means) is something almost nobody tracks (not once have I heard it used as a metric). Which is interesting, because that’s the number that tells you whether your program actually works.
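If you wanted to start tracking it, it doesn’t have to be sophisticated. Here’s a back-of-the-envelope sketch; the timestamp fields are assumptions, but most ticketing or case-management systems can supply both the moment an alert fired and the moment an analyst reached a confident disposition.

```python
# Back-of-the-envelope "mean time to understand": the gap between when an
# alert fired and when an analyst reached a confident disposition.
# The record shape and field names are assumptions for the example.

from datetime import datetime
from statistics import mean

incidents = [
    {"detected_at": "2024-05-01T02:10:00", "understood_at": "2024-05-01T06:45:00"},
    {"detected_at": "2024-05-03T11:00:00", "understood_at": "2024-05-03T11:40:00"},
]

def hours_between(start: str, end: str) -> float:
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600

mttu = mean(hours_between(i["detected_at"], i["understood_at"]) for i in incidents)
print(f"Mean time to understand: {mttu:.1f} hours")
```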
Detection and alerting are only valuable if they lead to action, and action only happens when there’s understanding. Otherwise you’ve built a very impressive system that tells you something happened and leaves the rest to chance. Security theatre!
Next time you’re staring at a dashboard full of alerts, ask yourself (and be honest): are we detecting more, or are we understanding more? One of those reduces risk. The other just looks like it does.