Real-Time Analytics: From Niche to Necessity
For most of enterprise history, real-time analytics was a luxury that only the most technically advanced organizations could afford. The cost of streaming infrastructure, the complexity of stream processing systems, and the difficulty of maintaining low-latency data pipelines at scale made real-time a project for teams with deep expertise and significant engineering budgets. In 2025, that equation has fundamentally changed. The combination of managed streaming infrastructure, dramatically improved query performance on streaming data, and the operational requirements of AI-driven applications has made real-time analytics accessible — and increasingly necessary — for every enterprise.
The Economics of Streaming Have Changed
The cost and complexity of building real-time data infrastructure have decreased by an order of magnitude over the past five years. Apache Kafka, which once required substantial operational expertise to run reliably at scale, is now most often consumed through managed services — Confluent Cloud, Amazon MSK, Azure Event Hubs — that eliminate most of the operational burden while delivering equivalent or better performance. Stream processing frameworks have matured: Apache Flink, which was notoriously difficult to operate, is now available as a fully managed service from multiple cloud providers. And the emergence of real-time OLAP databases — Apache Druid, Apache Pinot, ClickHouse — has made it practical to run interactive analytical queries against continuously updated data at latencies measured in milliseconds rather than minutes.
The cost reduction has been dramatic. Organizations that would have needed a dedicated team of streaming engineers to build and maintain a real-time analytics pipeline in 2019 can now deploy equivalent capabilities with a small fraction of the engineering effort, using managed services that handle the operational complexity automatically. This reduction in cost and complexity is the primary driver of real-time analytics' expansion from a niche capability of large, technically sophisticated organizations into a broadly accessible tool for enterprises of all sizes.
The AI Demand Signal
The second major driver of real-time analytics adoption is the operational requirements of AI-driven applications. The pattern is consistent across industries: organizations deploy ML models in production to power recommendation systems, fraud detection, personalization, or operational automation, and then discover that the models are only as good as the freshness of the data they consume. A fraud detection model that operates on features computed from data that is twelve hours old will miss the fraud patterns that are happening right now. A personalization model that cannot react to a user's current session behavior will recommend products they looked at yesterday, not what they actually want today.
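The freshness coupling described above can be made concrete with a minimal sketch. The class below maintains a sliding-window count per key — the kind of feature a fraud model might consume, such as "transactions on this card in the last 10 minutes." All names here are illustrative, not from any particular feature store, and the sketch assumes events arrive in timestamp order per key:

```python
from collections import deque

class SlidingCountFeature:
    """Counts events per key over a trailing time window (in seconds).

    A batch-computed version of this feature would miss every event that
    arrived since the last batch run; this streaming version reflects
    each event as soon as it is observed.
    """

    def __init__(self, window_seconds: int):
        self.window = window_seconds
        self.events: dict[str, deque] = {}  # key -> event timestamps

    def observe(self, key: str, ts: float) -> None:
        self.events.setdefault(key, deque()).append(ts)

    def value(self, key: str, now: float) -> int:
        q = self.events.get(key, deque())
        while q and q[0] <= now - self.window:
            q.popleft()  # evict events that have aged out of the window
        return len(q)

# Hypothetical fraud feature: card transactions in the last 10 minutes.
f = SlidingCountFeature(window_seconds=600)
f.observe("card_42", ts=0.0)
f.observe("card_42", ts=300.0)
print(f.value("card_42", now=350.0))  # both events in window -> 2
print(f.value("card_42", now=700.0))  # the ts=0 event aged out -> 1
```

The same query asked of a twelve-hour-old batch snapshot would return whatever the count happened to be at the last refresh, which is exactly the staleness problem the paragraph above describes.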
This realization — that AI performance is directly coupled to data freshness — is creating urgent demand for real-time data pipelines across every organization that has deployed production ML. The organizations that have already built streaming infrastructure find that their AI systems outperform those of competitors who are still operating on batch-refreshed data. This performance differential is measurable, material, and increasingly well-understood by enterprises evaluating their AI infrastructure investments.
The connection between AI and real-time analytics goes beyond feature freshness. Retrieval-augmented generation systems — the most common architecture for enterprise LLM applications — require the ability to query a continuously updated knowledge base at low latency. AI agents that take actions in the world need to process the results of those actions in real time and use them to inform subsequent decisions. The entire category of AI application architecture that is emerging in 2025 assumes the availability of low-latency, continuously updated data — which means that real-time analytics infrastructure is becoming a prerequisite for competitive AI deployment, not just a performance optimization.
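The RAG requirement above — that a newly ingested document be queryable immediately, with no batch re-index step — can be illustrated with a toy in-memory store. This is a deliberately simplified sketch using keyword overlap; a real system would use a vector index with incremental upserts, and all names here are hypothetical:

```python
class LiveKnowledgeBase:
    """Toy RAG-style store: documents become retrievable the moment
    they are ingested. Illustrative only -- scoring is naive keyword
    overlap, not embedding similarity."""

    def __init__(self):
        self.docs: list[tuple[str, str]] = []  # (doc_id, text)

    def ingest(self, doc_id: str, text: str) -> None:
        self.docs.append((doc_id, text))  # visible to queries immediately

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(text.lower().split())), doc_id)
            for doc_id, text in self.docs
        ]
        scored.sort(reverse=True)
        return [doc_id for score, doc_id in scored[:k] if score > 0]

kb = LiveKnowledgeBase()
kb.ingest("policy-v1", "refund policy requires manager approval")
kb.ingest("incident-9", "checkout outage resolved at 14:05")
# A query issued seconds after ingestion already sees the new document:
print(kb.retrieve("current checkout outage status"))  # ['incident-9']
```

The design point is the absence of a rebuild step between `ingest` and `retrieve` — the property that batch-refreshed knowledge bases lack and that continuously updated AI applications assume.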
The Streaming Infrastructure Landscape
The landscape for streaming infrastructure in 2025 is characterized by a set of well-established layers, each with multiple viable options, and an increasingly active innovation frontier at the intersections between layers.
At the transport layer — the messaging infrastructure that carries events from producers to consumers — the market is consolidating around Kafka-compatible APIs, either through Confluent's managed offering or through Kafka-compatible alternatives like Redpanda, which offers strong performance and a simpler operational model than self-managed Kafka. WarpStream and other new entrants are pushing the cost and operational model further, unbundling storage and compute in ways that could substantially improve the economics of high-volume event streaming.
At the processing layer — the systems that transform, aggregate, and enrich streaming data — Apache Flink remains the dominant open-source framework for stateful stream processing, with managed offerings from Confluent, Ververica, and cloud providers. RisingWave has emerged as an interesting alternative with a PostgreSQL-compatible SQL interface for stream processing, which substantially lowers the learning curve for data teams more familiar with SQL than with Flink's dataflow programming model.
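What these processing engines compute can be demystified with a small sketch. The function below reproduces, in plain Python, the semantics of a tumbling-window aggregation — the kind of query a streaming SQL engine expresses with a windowing clause over event time. The function name and event shape are invented for illustration:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, key) events into fixed, non-overlapping
    windows and count per (window_start, key) -- the result a
    streaming engine maintains incrementally as events arrive."""
    counts = defaultdict(int)
    for ts, key in events:
        # Each event belongs to exactly one fixed-size window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "a"), (4, "a"), (4, "b"), (11, "a")]
print(tumbling_window_counts(events, window_seconds=10))
# {(0, 'a'): 2, (0, 'b'): 1, (10, 'a'): 1}
```

The difference in a real stream processor is that the counts are updated continuously and emitted as windows close, rather than computed over a finite list — but the logical result is the same, which is why a SQL interface over this model lowers the learning curve so effectively.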
At the serving layer — the databases and query engines that make processed streaming data available for analytical queries — the real-time OLAP database market has become increasingly competitive. ClickHouse has established itself as the benchmark for analytical query performance on large, continuously ingested datasets. StarRocks and Apache Doris offer similar capabilities with different architectural tradeoffs. And the incumbent data warehouse vendors — Snowflake, BigQuery, Databricks — are adding streaming ingestion capabilities that blur the line between their batch-oriented architectures and purpose-built real-time OLAP systems.
The Unified Architecture Dream
The most technically ambitious goal in streaming data infrastructure is the unified architecture: a single system that handles both streaming and batch workloads, eliminates the need to maintain separate streaming and batch pipelines, and provides consistent semantics for queries regardless of whether the underlying data is being continuously ingested or processed in bulk. Apache Iceberg, combined with streaming ingestion capabilities from Flink or similar frameworks, has brought this dream closer to reality. The "streaming lakehouse" architecture — where Apache Iceberg tables serve as the shared storage layer for both batch and streaming workloads — is moving from theoretical to practical in 2025.
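The dual-codebase problem the unified architecture eliminates is easy to show in miniature. Below, one piece of business logic serves both a batch path (over a full historical table, such as an Iceberg snapshot) and a streaming path (applied event by event), and the two paths agree. The function and record shapes are hypothetical:

```python
def enrich(order):
    """Shared business logic: identical in batch and streaming paths."""
    return {**order, "total": order["qty"] * order["price"]}

# Batch path: apply the logic to a full historical table at once.
historical = [{"qty": 2, "price": 5.0}, {"qty": 1, "price": 3.0}]
batch_result = [enrich(o) for o in historical]

# Streaming path: apply the same logic to each event as it arrives.
def stream_enrich(event_stream):
    for order in event_stream:
        yield enrich(order)

live_result = list(stream_enrich(iter(historical)))

# One codebase, consistent semantics across both execution modes.
assert batch_result == live_result
print(batch_result[0]["total"])  # 10.0
```

Today, organizations often maintain two divergent implementations of `enrich` — one in a batch framework, one in a streaming framework — and keeping them semantically identical is a constant source of bugs; the streaming lakehouse collapses them into one.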
The implications for data infrastructure investment are significant. If the unified streaming and batch architecture succeeds, it will eliminate an entire category of data engineering complexity — the need to maintain dual codebases for batch and streaming versions of the same business logic — and create a much simpler operational model for data teams. This simplification will accelerate adoption, expand the market, and potentially create new platform opportunities for the companies that can deliver the unified architecture most effectively.
Investment Opportunities in Real-Time Infrastructure
DataHive AI Capital sees several specific investment opportunities emerging from the real-time analytics transition. The first is in the developer tooling layer for streaming: the testing frameworks, debugging tools, and observability systems that make building and maintaining stream processing pipelines significantly easier. Stream processing is notoriously difficult to test and debug — stateful operations, exactly-once semantics, and time-window logic are all sources of subtle bugs that can be very difficult to reproduce in a development environment. The companies building developer tooling that brings the maturity of software testing practices to stream processing pipelines are addressing a universal pain point.
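The testability pain point above comes from exactly the features sketched below: state, event-time windows, and out-of-order arrival. One way tooling tames it is to make the pipeline a pure function of a fixed event list, so window closing is driven by a watermark rather than wall-clock time and every run is deterministic. This is a simplified illustration of the technique, not any vendor's API:

```python
def run_windowed_count(events, window, watermark_lag):
    """Event-time tumbling-window count with a simple watermark.

    `events` is a list of (event_time, key). A window [w, w + window)
    is emitted once the watermark (max event time seen, minus
    `watermark_lag`) passes its end, so late-but-within-lag events are
    still counted. Feeding a fixed list makes runs deterministic and
    therefore unit-testable.
    """
    open_windows = {}      # window_start -> {key: count}
    emitted = []
    max_ts = float("-inf")
    for ts, key in events:
        max_ts = max(max_ts, ts)
        w = (ts // window) * window
        open_windows.setdefault(w, {}).setdefault(key, 0)
        open_windows[w][key] += 1
        watermark = max_ts - watermark_lag
        for w_start in sorted(open_windows):
            if w_start + window <= watermark:  # window is complete
                emitted.append((w_start, open_windows.pop(w_start)))
    # Flush remaining windows at the end of the (finite) input.
    for w_start in sorted(open_windows):
        emitted.append((w_start, open_windows.pop(w_start)))
    return emitted

# Out-of-order input: the ts=3 event arrives after ts=12 but still
# lands in window [0, 10) because the watermark lags 5 behind the
# maximum event time seen so far.
events = [(1, "a"), (12, "a"), (3, "a"), (16, "b")]
print(run_windowed_count(events, window=10, watermark_lag=5))
# [(0, {'a': 2}), (10, {'a': 1, 'b': 1})]
```

Reordering or delaying events in the input list reproduces exactly the late-data edge cases that are so hard to trigger against a live cluster — which is the property the developer tooling described above is trying to generalize.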
The second opportunity is in the operational analytics market: the use of real-time analytics to power operational applications — customer-facing dashboards, fraud alerts, dynamic pricing, inventory management — rather than just analytical exploration. Operational analytics is a larger and more commercially attractive market than analytical BI, but it requires a different technical architecture (lower latency, higher concurrency, tighter integration with transactional systems) that most traditional BI tools cannot support. The companies building purpose-built operational analytics infrastructure are positioned to capture a market that is significantly larger than the traditional BI market.
Key Takeaways
- The cost and complexity of real-time analytics infrastructure have decreased by an order of magnitude, making it accessible to enterprises of all sizes.
- AI performance is directly coupled to data freshness — organizations deploying production ML need real-time data infrastructure to remain competitive.
- The streaming infrastructure landscape has matured across transport, processing, and serving layers with multiple viable managed options at each level.
- The unified streaming-and-batch architecture, enabled by Apache Iceberg and Flink, is becoming practical in 2025 — eliminating dual-codebase complexity.
- Investment opportunities concentrate in streaming developer tooling and operational analytics applications that require real-time data.
Conclusion
Real-time analytics has crossed the threshold from niche capability to mainstream enterprise requirement, driven by AI operational demands and dramatically improved streaming infrastructure economics. The companies being founded today to build the next generation of streaming infrastructure, operational analytics platforms, and developer tooling for stream processing are addressing a market that will be measured in the hundreds of billions within a decade. DataHive AI Capital is committed to being an early investor in the best of them.
Read more about our data infrastructure investment thesis in our thesis article, or visit our portfolio page.