Streams as a cloud storage primitive
Multi-cloud APIs for data in motion with reasonable pricing on scalable usage.
Unlimited streams
Elastic throughput
Latency flexibility
Enabling the next generation of data systems
FAQs
How are streams namespaced?
An S2 Basin is a configurable namespace for streams, just like a bucket in object storage. Streams in a basin can be listed with prefix filtering, which is helpful for representing a hierarchy.
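Prefix filtering over slash-delimited stream names is enough to model a directory-like hierarchy. A minimal sketch in Python — the `list_streams` helper and the stream names are illustrative, not the S2 API:

```python
# Hypothetical sketch of prefix-filtered listing over stream names in a basin.
def list_streams(streams, prefix=""):
    """Return the stream names starting with `prefix`, in sorted order."""
    return sorted(name for name in streams if name.startswith(prefix))

basin = [
    "orders/eu/2024",
    "orders/us/2024",
    "metrics/cpu",
    "metrics/mem",
]

# Slash-delimited names act like folders under a common prefix.
print(list_streams(basin, prefix="orders/"))
# → ['orders/eu/2024', 'orders/us/2024']
```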
What are the semantics of a stream?
An S2 Stream is a durable, unbounded sequence of records that can be appended to, pulled from, trimmed, and fenced. Records carry headers, a data payload, and a 64-bit sequence number.
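The semantics above can be modeled with a small in-memory sketch — append assigns a monotonically increasing sequence number, trimming drops old records without reusing numbers, and fencing rejects appends that lack the current token. The class and method names are hypothetical, not the S2 API:

```python
# In-memory model of the described stream semantics; illustrative only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    seq: int          # 64-bit sequence number assigned on append
    headers: dict     # record headers
    body: bytes       # data payload

@dataclass
class Stream:
    records: list = field(default_factory=list)
    next_seq: int = 0
    fencing_token: Optional[bytes] = None

    def append(self, headers, body, token=None):
        """Append a record; fails if the stream is fenced with another token."""
        if self.fencing_token is not None and token != self.fencing_token:
            raise PermissionError("stream is fenced")
        rec = Record(self.next_seq, headers, body)
        self.records.append(rec)
        self.next_seq += 1
        return rec.seq

    def read_from(self, seq):
        """Pull all records at or after the given sequence number."""
        return [r for r in self.records if r.seq >= seq]

    def trim(self, before_seq):
        """Drop records below before_seq; sequence numbers never reset."""
        self.records = [r for r in self.records if r.seq >= before_seq]

    def fence(self, token):
        """Require this token on future appends (e.g. for leader changes)."""
        self.fencing_token = token
```

Note that sequence numbers keep advancing across trims, so a reader's position remains meaningful even after old records are dropped.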
How much throughput can a stream push?
Each stream elastically scales to hundreds of MiBps of writes with no prior provisioning. Real-time readers are guaranteed a multiple of the recent write throughput, and catch-up reads can pull multiple GiBps.
What are the latencies like?
A storage class can be selected as a basin-level default, and even per stream, based on your end-to-end tail latency requirements. The inaugural storage classes are Standard at sub-500-millisecond and Express at sub-50-millisecond – in AWS these are backed by S3 storage classes of the same name. A faster Native storage class for sub-10-millisecond requirements is planned as a follow-up.
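Choosing a class is then a matter of matching the stated latency targets against your requirement. A sketch — the thresholds come from the text above, but the helper function and the assumption that the slower class is the cheaper default are illustrative:

```python
# Hypothetical helper mapping the stated tail-latency targets to classes.
# Targets are from the text; "Native" (sub-10 ms) is planned, so omitted.
TARGETS_MS = {"Express": 50, "Standard": 500}

def pick_storage_class(required_p99_ms):
    """Pick the slowest (assumed cheapest) class meeting the requirement."""
    # Try classes from the loosest target to the tightest.
    for name, target in sorted(TARGETS_MS.items(), key=lambda kv: -kv[1]):
        if target <= required_p99_ms:
            return name
    raise ValueError("no available storage class meets this requirement yet")

print(pick_storage_class(500))  # → Standard
print(pick_storage_class(100))  # → Express
```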
How is data made durable?
S2 is a regional service; writes are always on disk in multiple availability zones before being acknowledged. Rigorous testing and rock-solid dependencies help guard against bugs.
How long can records be retained?
As long as you need them – storage is bottomless, at an object storage price point. Streams can be trimmed explicitly, or automatically based on record age. Key-based compaction, as in Kafka, is also planned.
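Age-based trimming amounts to dropping everything older than a retention window. A minimal sketch, assuming records carry an append timestamp — the record shape and function are hypothetical, not the S2 API:

```python
# Illustrative age-based retention: keep records within the window.
import time

def trim_older_than(records, max_age_seconds, now=None):
    """Return only the records appended within the retention window."""
    now = now if now is not None else time.time()
    cutoff = now - max_age_seconds
    return [r for r in records if r["appended_at"] >= cutoff]

# With a 30-second window at t=120, a record from t=50 is dropped.
recs = [{"appended_at": 100}, {"appended_at": 50}]
print(trim_older_than(recs, 30, now=120))
# → [{'appended_at': 100}]
```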
What is the pricing model?
Purely usage-based: a small per-request cost, a per-GiB-hour rate for retained data, and a per-GiB rate for data transfer. Data transfer costs depend on the storage class for ingress, and on client origin for egress.
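A back-of-envelope estimate is then a sum over those three dimensions. The rates below are placeholders for illustration only — they are NOT published S2 prices:

```python
# Hypothetical cost model over the three usage dimensions listed above.
# All RATES are made-up placeholders, NOT actual S2 pricing.
RATES = {
    "per_million_requests": 0.10,    # placeholder $/1M requests
    "per_gib_hour_storage": 0.0001,  # placeholder $/GiB-hour retained
    "per_gib_transfer": 0.05,        # placeholder $/GiB transferred
}

def estimate_cost(requests, gib_hours_stored, gib_transferred, rates=RATES):
    """Sum request, storage, and transfer charges for a billing period."""
    return (requests / 1e6 * rates["per_million_requests"]
            + gib_hours_stored * rates["per_gib_hour_storage"]
            + gib_transferred * rates["per_gib_transfer"])
```

In practice the transfer term would further split by storage class (ingress) and client origin (egress), as noted above.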
Will it support Kafka?
Yes! Our starting point is to make the heart of Kafka – the ordered stream – truly serverless. Comprehensive support for the Kafka wire protocol and features will be available as an open source layer.
Something else?
Leave a comment or email us.