S3
Introduction
Simple Storage Service (S3) is object-based storage (not block storage for an OS or applications to run on). A single object can be 0 bytes to 5 TB; there is no limit on total storage.
S3 is a global service, but you choose a region when creating a bucket.
Files are stored in buckets. Bucket names must be globally unique.
A bucket can be addressed with a path-style URL: https://s3-${region}.amazonaws.com/${bucket_name}
Works with standard HTTP verbs such as PUT and DELETE; a successful action returns status code 200. Multipart transfer is supported.
Feature
Reads are available immediately after a new upload, but updates and deletes can take time to propagate (eventual consistency).
Support Cross-origin resource sharing (CORS) for servers in different domains.
Composition:
Key (name)
S3 stores data by key in alphabetical order, so many keys sharing one prefix can concentrate load and hurt performance. Adding random letters or numbers to names (for folders and files) helps spread objects evenly across S3.
Value (file)
Version id
Metadata
Sub-resources
Access Control Lists (for privilege)
Torrent
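The key-naming tip above (random prefixes to spread load) can be sketched in Python. This is a minimal illustration; prefixed_key and the 16-shard count are made-up choices, not an AWS API:

```python
import hashlib

def prefixed_key(key: str, shards: int = 16) -> str:
    """Spread keys across hex prefixes by hashing the original name.

    The prefix is derived from the key itself, so it is deterministic
    and can be recomputed when reading the object back.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02x}/{key}"

# Two similarly named files will usually land under different prefixes.
print(prefixed_key("logs/2024-01-01.gz"))
print(prefixed_key("logs/2024-01-02.gz"))
```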
Storage Class:
Performance:
Baseline Performance (by prefixes in a bucket)
scales automatically, latency: 100 ~ 200 ms
Can achieve 5,500 req/s for GET/HEAD and 3,500 req/s for PUT/COPY/POST/DELETE, per prefix in a bucket.
No limits to the number of prefixes in a bucket.
Multi-part Upload
Recommended for files > 100 MB; mandatory for files > 5 GB.
Byte-range Fetches (for downloads)
Parallelize GETs by requesting specific byte ranges
Transfer Acceleration (can be combined with multi-part upload)
Uploads go to the nearest CloudFront edge location, then travel over the AWS backbone to the S3 target region.
Over a fully-utilized 1 Gbps line, Transfer Acceleration can move up to 75 TB in roughly the turnaround time of a Snowball. Rule of thumb: if a transfer would take more than a week over the Internet, or there are recurring transfer jobs with more than 25 Mbps of available bandwidth, Transfer Acceleration is a good option.
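The splitting that both multi-part uploads and byte-range fetches rely on can be sketched as follows; the 8 MB chunk size is an arbitrary example, and each pair would become a header like `Range: bytes=0-8388607`:

```python
def byte_ranges(size: int, chunk: int = 8 * 1024 * 1024):
    """Split an object of `size` bytes into inclusive (start, end) pairs,
    one per parallel GET (or upload part)."""
    return [(start, min(start + chunk, size) - 1)
            for start in range(0, size, chunk)]

# A 20 MB object split into 8 MB chunks -> three ranges.
print(byte_ranges(20 * 1024 * 1024))
```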
Glacier retrieval options:
Expedited (1 to 5 minutes)
Standard (3 to 5 hours)
Bulk (5 to 12 hours)
Glacier Deep Archive retrieval options:
Standard (12 hours)
Bulk (48 hours)
S3 Select / Glacier Select: retrieve subsets of objects using SQL, with filtering performed server-side.
Can filter by rows & columns (simple SQL statements)
Faster & cheaper
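A sketch of what an S3 Select request looks like, built as a plain dict so no AWS call is made (the shape follows boto3's select_object_content parameters; the bucket, key, and query are made up):

```python
import json

# Illustrative S3 Select parameters: filter a CSV's rows and columns
# server-side and get JSON records back.
select_params = {
    "Bucket": "example-bucket",           # made-up bucket name
    "Key": "reports/2024.csv",            # made-up object key
    "ExpressionType": "SQL",
    "Expression": "SELECT s.name, s.total FROM S3Object s WHERE s.total > '100'",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"JSON": {}},
}
print(json.dumps(select_params, indent=2))
```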
Charges:
Storage
Requests
Storage management pricing (file tagging)
Data transfer pricing
Transfer acceleration
Monitoring cost (only for S3 Intelligent-Tiering: $0.0025 per 1,000 objects monitored)
Cost Saving tips:
S3 Select / Glacier Select
S3 Lifecycle
Compress objects to save space
S3 Requester Pays:
Bucket owner pays for S3 storage
Requester pays for the cost of request and data download
Grant access with a bucket policy, not by having requesters assume an IAM role in your account (otherwise the owner still pays).
Lifecycle management
Transition from Standard to Standard-IA (objects must be at least 128 KB and 30 days old)
Archive to Glacier (30 days after IA if relevant)
Permanent deletion
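The three lifecycle steps above can be expressed as a lifecycle configuration document (the shape matches what put-bucket-lifecycle-configuration accepts; the prefix and day counts are example values):

```python
import json

# Example lifecycle: Standard -> Standard-IA after 30 days,
# -> Glacier after 60 days, permanently deleted after 365 days.
lifecycle = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Filter": {"Prefix": "logs/"},     # made-up prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 60, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
    }]
}
print(json.dumps(lifecycle, indent=2))
```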
Versioning
Stores all versions of an object (every write, even if the object is deleted)
Once enabled, it can't be disabled, only suspended.
Integrate with lifecycle rule
MFA Delete can be enabled to require MFA for:
Changing the bucket's versioning state
Permanently deleting an object version
To receive a notification for every write event, versioning must be enabled. Without it, multiple simultaneous writes to the same key may produce only one notification.
S3 Object Lock & Glacier Vault Lock
S3 Object Lock
Adopt a WORM (Write once, read many) model.
Block an object version deletion for a specified amount of time.
Glacier Vault Lock
Adopt a WORM (Write once, read many) model.
Lock the policy for future edits (can no longer be changed).
Helpful for compliance and data retention.
Encryption:
In transit (SSL/TLS)
At rest
Server side encryption options
S3 managed keys: SSE-S3
AWS Key Management Service, Managed Keys: SSE-KMS (similar to SSE-S3 but with some additional benefits)
Customer-provided keys: SSE-C
Client side encryption
Control accessing
Using bucket policies (can restrict by public IP or Elastic IP, but not private IP) and bucket ACLs
By default, a bucket and its contents are private
Pre-signed URLs
Can generate pre-signed urls with SDK / CLI
For downloads (can use CLI)
For uploads (must use SDK)
Valid for 3600 seconds by default; change the timeout with the --expires-in argument.
Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT.
Scenarios
Allow only logged-in users to download a file.
Allow a user to upload a file temporarily
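How a pre-signed URL works can be illustrated with a simplified HMAC scheme. This is a conceptual sketch only: real S3 pre-signed URLs use AWS Signature Version 4, and the secret, bucket, and query parameter names below are made up:

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"server-side-secret"  # stands in for the signer's credentials

def presign(key: str, expires_in: int = 3600) -> str:
    """Sign (key, expiry) so the link can later be verified without the
    requester holding any credentials."""
    expires = int(time.time()) + expires_in
    msg = f"{key}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": sig})
    return f"https://example-bucket.s3.amazonaws.com/{key}?{query}"

def verify(key: str, expires: int, signature: str) -> bool:
    """Recompute the signature and check the link has not expired."""
    if time.time() > expires:
        return False
    msg = f"{key}:{expires}".encode()
    good = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, signature)

print(presign("report.pdf"))
```

The expiry is baked into the signed message, so tampering with either the key or the timestamp invalidates the URL.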
Logging
CloudTrail:
By default, bucket-level access is recorded.
Object level logging can be enabled.
Replication
Cross Region Replication
For buckets in different regions
Versioning must be enabled on both buckets.
Once CRR is on, subsequently uploaded or updated files are replicated automatically.
A first delete (which adds a delete marker) is replicated, so the replicated copies get the delete marker too. A second delete, which removes a specific version, applies only to the bucket where it is issued; the other copies keep their delete markers. Likewise, a recovery only restores the item in that bucket; the other buckets must be handled individually.
To avoid regional failure of S3
Enable CRR to a bucket with a different name in a backup region.
Have applications read the bucket name from SSM Parameter Store, and swap the value for DR.
Same Region Replication
Requests through public / private subnets
Through public subnets: an Internet Gateway is used to reach S3. Set up a bucket policy with aws:SourceIp for the public IP.
Through private subnets: a VPC Endpoint Gateway is used to reach S3. To restrict access to specific VPC endpoints, set up a bucket policy with either:
aws:SourceVpce for one or a few endpoints
aws:SourceVpc to encompass all possible VPC endpoints
For a VPC to restrict access to specific buckets
Set up an endpoint policy that explicitly allows access to the required buckets.
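An endpoint restriction like the one above can be written as a bucket policy denying any request that does not arrive through a given VPC endpoint; the bucket name and endpoint ID below are made up:

```python
import json

# Deny all S3 actions on the bucket unless the request comes in
# through the named VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowOnlyViaVpce",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": ["arn:aws:s3:::example-bucket",
                     "arn:aws:s3:::example-bucket/*"],
        "Condition": {"StringNotEquals": {"aws:SourceVpce": "vpce-1a2b3c4d"}},
    }]
}
print(json.dumps(policy, indent=2))
```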
Provide static website hosting
Serverless, cheap, auto-scaling, but does not support HTTPS directly.
Works with Route 53; the bucket name must be identical to the domain name.
Virtual-hosted–style URL: https://${bucket_name}.s3-${region}.amazonaws.com
The index.html is mandatory, but an error page is optional.
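The hosting setup above corresponds to a website configuration document (the shape matches what put-bucket-website accepts; the file names are the conventional defaults):

```python
import json

# Static website hosting configuration: index document is required,
# the error document is optional.
website = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},  # optional
}
print(json.dumps(website, indent=2))
```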
Anti patterns
Lots of small files
POSIX file system (use EFS instead), file locks.
Search features, queries, rapidly changing data.
can index objects in DynamoDB (S3 event --> Lambda to insert data)
Website with dynamic content
Service for transferring large amount of data with physical storage (bypassing internet)
AWS Import/Export
For importing/exporting up to 16 TB of data to S3 or EBS.
Snowball (Import/Export Disk to/from S3)
Types:
Snowball
On-board storage of 50 TB or 80 TB
Bypasses internet entirely
Snowball Edge
Durable local storage
Local compute with AWS Lambda
Local compute instances
Use in a cluster of devices
Use with AWS Greengrass (IoT)
Transfer files through NFS with a GUI
Snowmobile
Exabyte-scale data (coming with a truck)
Scenarios
Syncing data from on-premises
Use the S3 CLI sync command (can be run several times to shorten the final sync window; especially useful for migration to AWS).
Check personally identifiable information (PII) with Macie