Controlling Security and Observability Data at Scale
Category: News
Published: 19th February 2026
As infrastructure becomes more distributed, data volumes continue to rise. Endpoints, servers, cloud platforms, SaaS applications, and Kubernetes clusters all generate logs and telemetry. In many environments, this data is forwarded directly into a SIEM (Security Information and Event Management) platform without filtering or prioritisation.
The outcome is predictable. Ingestion volumes exceed what would typically be expected for an organisation’s size or industry, and costs increase accordingly. At the same time, analysts must sift through growing amounts of low-value data to find what matters.
For security and IT leaders, the challenge is maintaining effective visibility without unnecessary spend or operational drag. Platforms such as Cribl address this problem by introducing control between data sources and analytics tools.
The Problem with Sending Everything to the SIEM
Modern environments span on-premises systems, multiple cloud providers, and SaaS services. Each layer produces logs. A common response is to centralise all of it in a SIEM.
While comprehensive collection may appear prudent, it creates two issues. First, many SIEM platforms charge based on ingestion volume and retention. High log volumes directly translate into higher costs. Second, excessive data reduces efficiency. Analysts spend more time filtering noise, and detection engineering becomes more complex.
Storing and indexing data that is rarely queried offers limited value, particularly when it inflates budgets.
Routing Data by Purpose
Not all telemetry serves the same objective. Security investigations, compliance requirements, performance monitoring, and long-term retention have different needs.
With a data pipeline layer in place, security-relevant logs can be directed to the SIEM, while application and performance data can be sent to observability platforms. Logs required for compliance can be archived without being indexed in high-cost systems. Raw data can be retained in object storage for future reference.
This separation allows organisations to match data to the appropriate tool instead of forcing every dataset into a single platform.
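To make this concrete, the sketch below shows one way a routing layer might dispatch events by purpose. The event fields, matching rules, and destination names are invented for illustration; they do not reflect Cribl's configuration format.

```python
# Minimal sketch of purpose-based routing. Rules are evaluated in order;
# the first matching predicate determines the destination.

ROUTES = [
    (lambda e: e.get("category") == "security", "siem"),
    (lambda e: e.get("category") == "application", "observability"),
    (lambda e: e.get("retention") == "compliance", "archive"),
]
DEFAULT_DESTINATION = "object_storage"  # raw data kept for future reference

def route(event: dict) -> str:
    """Return the destination for an event; unmatched events go to object storage."""
    for predicate, destination in ROUTES:
        if predicate(event):
            return destination
    return DEFAULT_DESTINATION

if __name__ == "__main__":
    for e in [
        {"category": "security", "message": "failed login"},
        {"category": "application", "message": "request latency 250ms"},
        {"retention": "compliance", "message": "audit record"},
        {"message": "debug trace"},
    ]:
        print(route(e), "<-", e["message"])
```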
Reducing Volume While Preserving Value
Log reduction does not mean sacrificing visibility. Many events are repetitive or low value in the context of threat detection. Filtering, sampling, and transforming data before it reaches a SIEM helps ensure that indexed data supports meaningful analysis.
By focusing ingestion on high-value telemetry, teams can maintain detection capability while limiting unnecessary indexing. This approach improves both performance and cost predictability.
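A minimal sketch of this reduction step is shown below, assuming hypothetical field names, a fixed noise list, and a 10% sample rate; a real deployment would tune these rules per source.

```python
import random

NOISE_SOURCES = {"healthcheck", "heartbeat"}  # repetitive, low-value events
SAMPLE_RATE = 0.1  # keep roughly 1 in 10 of the remaining low-value events

def reduce_event(event: dict) -> dict | None:
    """Return the event to forward to the SIEM, or None to drop it."""
    if event.get("source") in NOISE_SOURCES:
        return None                        # filter out known noise entirely
    if event.get("security_relevant"):
        return event                       # always preserve detection data
    if random.random() < SAMPLE_RATE:
        return {**event, "sampled": True}  # transform: mark sampled events
    return None
```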
Supporting Complex Environments
Hybrid and multi-cloud architectures introduce operational complexity. Logs originate from virtual machines, containers, serverless workloads, and network devices across different platforms.
A centralised data pipeline creates consistent handling across these sources. Data can be normalised and routed according to defined policies, regardless of origin. This provides architectural clarity as environments evolve.
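The sketch below illustrates the normalisation step: source-specific field names are mapped into one shared schema before routing, regardless of where the log originated. The source names and field mappings are assumptions made for the example.

```python
# Map each source's field names onto a common schema; unmapped fields
# pass through unchanged, and the origin is recorded on every event.

FIELD_MAPS = {
    "vm_syslog":     {"host": "hostname", "msg": "message", "sev": "severity"},
    "k8s_container": {"pod": "hostname", "log": "message", "level": "severity"},
}

def normalise(source: str, raw: dict) -> dict:
    mapping = FIELD_MAPS.get(source, {})
    return {mapping.get(k, k): v for k, v in raw.items()} | {"source": source}

print(normalise("k8s_container", {"pod": "web-1", "log": "OOMKilled", "level": "error"}))
# -> {'hostname': 'web-1', 'message': 'OOMKilled', 'severity': 'error', 'source': 'k8s_container'}
```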
For organisations balancing security and observability needs, separating data flows reduces friction. Security teams receive the telemetry required for monitoring and response. Engineering teams maintain access to operational data without inflating security tooling costs.
When to Consider a Data Pipeline Strategy
A structured data management approach becomes relevant when SIEM costs grow faster than the business, when cloud adoption significantly increases telemetry, or when Kubernetes and containerisation multiply log streams.
In many cases, organisations discover that all available logs are being forwarded directly into the SIEM. Endpoint activity, cloud events, server logs, and container logs accumulate in a single system. Over time, ingestion volumes expand beyond what is operationally or financially sustainable.
Introducing a data pipeline creates an opportunity to reassess what truly needs real-time indexing and what can be stored or routed elsewhere.
What Cribl Provides
Cribl is a data pipeline platform designed to manage how telemetry is processed and routed. Rather than replacing existing tools, it sits between data sources and destinations. This intermediary position gives organisations control over how data flows across their environment.
Teams can determine which telemetry should be forwarded for real-time analysis, which should be transformed or enriched, and which can be archived in lower-cost storage. Instead of defaulting to full ingestion into a SIEM, data handling becomes intentional and policy-driven.
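As a simple illustration of the enrichment case, the sketch below attaches asset context to an event before it is forwarded, sparing downstream tools a separate lookup. The inventory and field names are hypothetical.

```python
# Hypothetical asset inventory used to add context at the pipeline layer.
ASSET_INVENTORY = {
    "10.0.1.5": {"owner": "payments-team", "criticality": "high"},
}

def enrich(event: dict) -> dict:
    """Merge any known asset context into the event; unknown hosts pass through."""
    context = ASSET_INVENTORY.get(event.get("src_ip"), {})
    return {**event, **context}

print(enrich({"src_ip": "10.0.1.5", "message": "port scan detected"}))
# -> event now carries owner and criticality fields
```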
Cribl also provides pre-built templates for routes and pipelines, including associated processing logic. These templates can be deployed across environments with minimal setup, reducing the operational effort required to implement structured data flows. This is particularly useful in complex environments where consistency across cloud, on-premises, and containerised workloads is important.
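In generic terms, a template amounts to a pipeline defined once as an ordered list of steps and reused wherever the same kind of source appears. The sketch below captures that idea only; it is not Cribl's template format.

```python
from collections.abc import Callable

Step = Callable[[dict], dict | None]  # a step may transform or drop an event

def run_pipeline(steps: list[Step], event: dict) -> dict | None:
    """Apply each step in order; a step returning None drops the event."""
    for step in steps:
        result = step(event)
        if result is None:
            return None
        event = result
    return event

# A template for firewall logs, reusable across cloud, on-premises,
# and containerised sources alike (steps are placeholders).
firewall_template: list[Step] = [
    lambda e: {**e, "normalised": True},
    lambda e: None if e.get("action") == "allow" else e,  # drop allow noise
]
```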
In addition to pipeline management, Cribl offers a data lake solution designed for scalable, long-term storage of IT and security telemetry. Data can be stored in open formats, enabling search and replay without being tied to a specific analytics vendor. This approach supports long-term retention and investigative flexibility while reducing reliance on high-cost indexing platforms.
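In principle, open-format storage can be as simple as the sketch below: events written as gzipped JSON Lines and partitioned by date, so any tool can search or replay them later. The directory layout is an assumption, not Cribl's lake implementation.

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path

def archive(events: list[dict], root: str = "telemetry-lake") -> Path:
    """Append events to a date-partitioned, gzip-compressed JSON Lines file."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    path = Path(root) / day
    path.mkdir(parents=True, exist_ok=True)
    out = path / "events.jsonl.gz"
    with gzip.open(out, "at", encoding="utf-8") as f:
        for e in events:
            f.write(json.dumps(e) + "\n")
    return out  # e.g. telemetry-lake/<date>/events.jsonl.gz
```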
Together, these capabilities shift the focus from collecting everything in one place to managing telemetry according to operational and financial priorities.