Amazon Web Services on Saturday celebrated the 20th birthday of its Simple Storage Service (S3) and revealed a few little secrets about the service.
S3 launched on March 14th, 2006, and according to Amazon’s birthday party post by principal developer advocate Sébastien Stormacq, initially offered “approximately one petabyte of total storage capacity across about 400 storage nodes in 15 racks spanning three data centers, with 15 Gbps of total bandwidth.”
Today, the post states, S3 “stores more than 500 trillion objects and serves more than 200 million requests per second globally across hundreds of exabytes of data in 123 Availability Zones in 39 AWS Regions.”
AWS described S3’s scale with the following odd metric: “If you stacked all of the tens of millions [of] S3 hard drives on top of each other, they would reach the International Space Station and almost back.”
Most 3.5-inch hard disk drives are 26mm tall and the International Space Station orbits around 400km above Earth. We know AWS buys custom hardware, but assuming the company sticks to the standard form factor, we therefore believe S3 uses about 28 million hard drives, which squares with AWS’s own “tens of millions.”*
In the post, Stormacq ranked S3’s consistency as the service’s most remarkable achievement.
“The code you wrote for S3 in 2006 still works today, unchanged,” he wrote. “Your data went through 20 years of innovation and technical advances. We migrated the infrastructure through multiple generations of disks and storage systems. All the code to handle a request has been rewritten. But the data you stored 20 years ago is still available today, and we’ve maintained complete API backward compatibility.”
The post also notes that the S3 API “has been adopted and used as a reference point across the storage industry” and that “Multiple vendors now offer S3 compatible storage tools and systems, implementing the same API patterns and conventions.”
That’s also a very significant achievement. Your correspondent edited an Australian storage-centric publication at the time AWS launched S3, and within months startup backup vendors began using the service as a new storage tier, one that let them handle data at a scale that would have been prohibitively expensive to store or protect with the on-prem tech of the day. S3 therefore created new possibilities for data protection.
The availability of cloud storage also has a cultural footprint: Netflix and Spotify are known S3 users and used the service to scale at speed. Both streaming giants set examples that others in the video and music industries followed.
S3 has also caused plenty of problems, especially with the peculiar decision to initially make all resources on the service open to public access unless users restricted access. Some S3 users thought obscurity would confer security, but once criminals went looking for open S3 buckets, they found thousands of insecure cloudy storage setups.
S3 has also suffered outages, most infamously in 2017 when trouble at Amazon’s problematic US-EAST-1 region crippled some major websites for hours.
Amazon’s birthday post accentuates the positives, especially S3’s 11 nines (99.999999999 percent) durability and lossless operations.
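For a sense of what eleven nines means in practice, here’s a back-of-the-envelope sketch in Python. The assumptions are ours, not AWS’s: we treat the durability figure as the probability that a single object survives one year, with objects independent, and pick a hypothetical fleet of 10 billion objects.

```python
# Back-of-the-envelope illustration of 11 nines durability.
# Assumptions (ours, not AWS's published methodology): durability is the
# annual per-object survival probability, and object losses are independent.
durability = 0.99999999999          # 11 nines
annual_loss_rate = 1 - durability   # ~1e-11 chance a given object is lost in a year
objects_stored = 10_000_000_000     # hypothetical fleet of 10 billion objects

expected_losses = objects_stored * annual_loss_rate
print(f"Expected objects lost per year: {expected_losses:.1f}")  # ~0.1
```

In other words, under those assumptions a customer storing 10 billion objects should expect to lose roughly one object per decade — which is why auditors that continuously re-check and repair data matter more than the headline number.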
The missive reveals a little about how AWS delivers that reliability.
“At the heart of S3 durability is a system of microservices that continuously inspect every single byte across the entire fleet,” Stormacq wrote. “These auditor services examine data and automatically trigger repair systems the moment they detect signs of degradation.”
He then revealed that in the last eight years, “AWS has been progressively rewriting performance-critical code in the S3 request path in Rust. Blob movement and disk storage have been rewritten, and work is actively ongoing across other components.”
The post concludes with Stormacq sketching AWS’s future plan for S3 to extend “beyond being a storage service to becoming the universal foundation for all data and AI workloads.”
“Our vision is simple: you store any type of data one time in S3, and you work with it directly, without moving data between specialized systems. This approach reduces costs, eliminates complexity, and removes the need for multiple copies of the same data.”
And perhaps also removes the need to consider clouds other than AWS, then makes it difficult to leave, as any vendor hopes is the result when you become a customer. ®
* 400km is 400,000,000mm; 400,000,000mm / 26mm = 15,384,615. Assuming “almost” back is 80 percent of the return leg, 1.8 x 15,384,615 = 27,692,307
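The footnote’s arithmetic can be sketched in a few lines of Python. The 26mm drive height, 400km ISS altitude, and the 80 percent reading of “almost back” are this article’s assumptions, not AWS figures:

```python
# Rough estimate of S3's drive count from AWS's stacking claim.
# Assumptions from the article: 26 mm per 3.5-inch drive, ISS at 400 km,
# and "almost back" read as 80 percent of the return leg.
ISS_ALTITUDE_MM = 400 * 1_000 * 1_000   # 400 km expressed in millimetres
DRIVE_HEIGHT_MM = 26                    # typical 3.5-inch drive height
ALMOST_BACK = 0.8                       # fraction of the return trip covered

total_stack_mm = ISS_ALTITUDE_MM * (1 + ALMOST_BACK)
drive_count = total_stack_mm / DRIVE_HEIGHT_MM
print(f"Estimated drives: {drive_count:,.0f}")  # ~27.7 million
```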
Source: The Register