S2 the Stream Store

Streams as a cloud storage primitive

Multi-cloud APIs for data in motion, with reasonable pricing as usage scales.

Unlimited streams

Model your domain naturally – say goodbye to low caps and fixed costs.

Elastic throughput

Reactive scaling of each stream to 100s of MiBps writes and GiBps reads – and down to 0.

Latency flexibility

Express < 50 ms, Standard < 500 ms – end-to-end, p99

Enabling the next generation of data systems

“When data is moving reliably, there is a durable stream involved. Streams deserve to be a cloud storage primitive.” S2 is designed to deliver on the promise of serverless.

FAQs

How are streams namespaced?

An S2 Basin is a configurable namespace for streams, just like a bucket in object storage. Streams in a basin can be listed with prefix filtering, which is helpful for representing a hierarchy.
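
For illustration, here is a minimal sketch of prefix-filtered listing over a hypothetical REST-style endpoint – the URL, route, query parameter, and response fields are assumptions for this example, not the documented S2 API:

```python
# Hypothetical sketch only: endpoint, route, and field names are assumptions.
import requests

BASIN_URL = "https://my-basin.example.invalid"  # placeholder basin endpoint

def list_streams(prefix: str) -> list[str]:
    # Prefix filtering lets a flat namespace read like a hierarchy,
    # e.g. "orders/" matches "orders/us-east", "orders/eu-west", ...
    resp = requests.get(f"{BASIN_URL}/streams", params={"prefix": prefix})
    resp.raise_for_status()
    return [s["name"] for s in resp.json()["streams"]]
```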

What are the semantics of a stream?

An S2 Stream is a durable, unbounded sequence of records that can be appended to, pulled from, trimmed, and fenced. Records carry headers, a data payload, and a 64-bit sequence number.
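
To make those semantics concrete, here is a minimal in-memory model – a sketch of the behavior described above, not the S2 implementation or its client API:

```python
# In-memory model of the semantics above (illustrative, not the S2 implementation).
from dataclasses import dataclass, field

@dataclass
class Record:
    seq_num: int                 # 64-bit sequence number assigned on append
    headers: dict[str, bytes]
    body: bytes

@dataclass
class Stream:
    records: list[Record] = field(default_factory=list)
    next_seq: int = 0
    fencing_token: bytes | None = None

    def fence(self, token: bytes) -> None:
        # Once fenced, only appends carrying the matching token succeed,
        # which lets a new writer exclude stale ones after failover.
        self.fencing_token = token

    def append(self, headers: dict[str, bytes], body: bytes,
               token: bytes | None = None) -> int:
        if self.fencing_token is not None and token != self.fencing_token:
            raise PermissionError("fenced: stale writer")
        record = Record(self.next_seq, headers, body)
        self.records.append(record)
        self.next_seq += 1
        return record.seq_num

    def pull(self, start_seq: int, limit: int = 100) -> list[Record]:
        # Read records at or after a sequence number.
        return [r for r in self.records if r.seq_num >= start_seq][:limit]

    def trim(self, before_seq: int) -> None:
        # Drop records below a sequence number; numbering never restarts.
        self.records = [r for r in self.records if r.seq_num >= before_seq]
```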

How much throughput can a stream push?

Each stream is elastic up to hundreds of MiBps of writes with no prior provisioning. Real-time readers are guaranteed a multiple of recent write throughput, and catch-up reads can draw multiple GiBps.

What are the latencies like?

A storage class can be selected as a basin-level default and even per stream, based on your end-to-end tail latency requirements. The inaugural storage classes are Standard at sub-500-millisecond and Express at sub-50-millisecond – in AWS these are backed by S3 storage classes of the same name. A faster Native storage class for sub-10-millisecond requirements is planned as a follow-up.
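
As a sketch of how that selection might look in configuration – the shape below is an assumption for illustration; only the class names and latency targets come from the description above:

```python
# Illustrative configuration shape – not the actual S2 API.
basin_config = {
    "default_storage_class": "standard",   # < 500 ms end-to-end, p99
}

stream_config = {
    "name": "payments/ledger",
    "storage_class": "express",            # < 50 ms end-to-end, p99
}

# A stream falls back to the basin default when it sets no class of its own.
effective_class = stream_config.get("storage_class",
                                    basin_config["default_storage_class"])
```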

How is data made durable?

S2 is a regional service; writes will always be on disk in multiple availability zones before being acknowledged. Rigorous testing and rock-solid dependencies help guard against bugs.

How long can records be retained?

As long as you need them – storage is bottomless, at an object storage price point. Streams can be trimmed explicitly, or automatically based on record age. Key-based compaction, as in Kafka, is also planned.
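
A sketch of the two trimming modes, with illustrative field and method names (assumptions for this example, not the actual S2 API):

```python
# Illustrative only – field and method names are assumptions.
stream_config = {
    "name": "clickstream/web",
    "retention_age_seconds": 7 * 24 * 3600,  # auto-trim records older than 7 days
}

def trim_to_checkpoint(client, stream_name: str, checkpoint_seq: int) -> None:
    # Explicit trim: drop everything below a sequence number that all
    # downstream consumers have already checkpointed past.
    client.trim(stream_name, before_seq=checkpoint_seq)  # hypothetical client call
```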

What is the pricing model?

Purely usage-based: a small per-request cost, a per-GiB-hour charge for retained data, and a per-GiB charge for data transfer. Data transfer costs depend on the storage class for ingress and on client origin for egress.
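
In rough terms, a bill is the sum of those three meters; the function below is just that arithmetic, with unit prices left as parameters – they are placeholders, not published rates:

```python
# Placeholder arithmetic only – unit prices are parameters, not published rates.
def monthly_cost(requests: int, gib_hours_stored: float,
                 gib_ingress: float, gib_egress: float,
                 price_per_request: float, price_per_gib_hour: float,
                 price_per_gib_ingress: float, price_per_gib_egress: float) -> float:
    # Ingress pricing varies by storage class; egress pricing by client origin.
    return (requests * price_per_request
            + gib_hours_stored * price_per_gib_hour
            + gib_ingress * price_per_gib_ingress
            + gib_egress * price_per_gib_egress)
```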

Will it support Kafka?

Yes! Our starting point is to make the heart of Kafka – the ordered stream – truly serverless. Comprehensive support for the Kafka wire protocol and features will be available as an open source layer.

Something else?