S3

Introduction

  • Simple Storage Service (S3) is object-based storage (not block storage for an OS or applications to run on). A single object can range from 0 bytes to 5 TB, and there is no limit on total usage.

  • S3 has a global namespace, but you choose a region when creating a bucket.

  • Files are stored in buckets. Bucket names are globally unique.

  • A bucket can be addressed with a path-style URL that embeds the region and bucket name: https://s3-${region}.amazonaws.com/${bucket_name}

  • Objects are accessed over HTTP with verbs such as PUT, GET, and DELETE; a successful action returns status code 200. Multipart transfers are supported.

Features

  • New objects can be read immediately after upload, but updates and deletes take more time to propagate (eventual consistency)

  • Supports cross-origin resource sharing (CORS) for clients served from other domains.
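
  A minimal sketch of enabling CORS on a bucket with boto3 (the bucket name and allowed origin are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      # Allow a single web-app origin to GET and PUT objects in this bucket
      s3.put_bucket_cors(
          Bucket='my-bucket',
          CORSConfiguration={
              'CORSRules': [{
                  'AllowedOrigins': ['https://app.example.com'],
                  'AllowedMethods': ['GET', 'PUT'],
                  'AllowedHeaders': ['*'],
                  'MaxAgeSeconds': 3000,
              }]
          },
      )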

  • Composition:

    • Key (name)

      • S3 stores data by key in alphabetical order, so sequential key names can concentrate load on one partition and hurt performance. Adding random letters or numbers to names (for folders or files) helps spread objects evenly across S3 partitions (see the sketch after this list).

    • Value (file)

    • Version id

    • Metadata

    • Sub-resources

      • Access Control Lists (for privilege)

      • Torrent
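
  A minimal sketch of the randomized-prefix idea above (the 4-character hex prefix is just one possible scheme):

      import uuid

      def randomized_key(filename):
          # Prepend a short random hex prefix so keys spread evenly
          # across S3 partitions instead of sorting together
          return f"{uuid.uuid4().hex[:4]}/{filename}"

      # e.g. 'a3f9/report-2020-01-01.csv' instead of 'report-2020-01-01.csv'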

  • Storage Classes:

    • Standard

    • Intelligent-Tiering

    • Standard-IA (Infrequent Access)

    • One Zone-IA

    • Glacier

    • Glacier Deep Archive

  • Performance:

    • Baseline Performance (by prefixes in a bucket)

      • scales automatically; latency is 100 to 200 ms

      • can achieve 5,500 req/s for GET/HEAD and 3,500 req/s for PUT/COPY/POST/DELETE per prefix in a bucket.

      • No limits to the number of prefixes in a bucket.

    • Multi-part Upload

      • Recommended for files > 100 MB; mandatory for files > 5 GB (see the sketch after this list).

    • Byte-range Fetches (for downloads)

      • Parallelize GETs by requesting specific byte ranges (see the sketch after this list)

    • Transfer Acceleration (can be combined with multi-part upload)

      • Files are uploaded to a CloudFront edge location, then forwarded to the S3 bucket in the target region.

      • Over a fully-utilized 1 Gbps line, Transfer Acceleration can move up to 75 TB in roughly a Snowball's turnaround time. If a transfer would take more than a week over the Internet, or there are recurring transfer jobs with more than 25 Mbps of available bandwidth, Transfer Acceleration is a good option.
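
  A sketch of the performance techniques above with boto3 (file, bucket, and key names are hypothetical):

      import boto3
      from boto3.s3.transfer import TransferConfig

      s3 = boto3.client('s3')

      # Multi-part upload: split files over 100 MB into 25 MB parts,
      # uploaded by up to 8 threads in parallel
      config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                              multipart_chunksize=25 * 1024 * 1024,
                              max_concurrency=8)
      s3.upload_file('backup.tar', 'my-bucket', 'backup.tar', Config=config)

      # Byte-range fetch: download only the first 1 MB of the object
      resp = s3.get_object(Bucket='my-bucket', Key='backup.tar',
                           Range='bytes=0-1048575')
      first_mb = resp['Body'].read()

      # Transfer Acceleration must be enabled on the bucket first
      s3.put_bucket_accelerate_configuration(
          Bucket='my-bucket',
          AccelerateConfiguration={'Status': 'Enabled'})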

  • Glacier retrieval options:

    • Expedited (1 to 5 minutes)

    • Standard (3 to 5 hours)

    • Bulk (5 to 12 hours)

  • Glacier Deep Archive retrieval options:

    • Standard (12 hours)

    • Bulk (48 hours)
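
  A sketch of initiating a Glacier restore with boto3 and choosing a retrieval tier (bucket and key are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      # Restore an archived object for 7 days; Tier may be
      # 'Expedited', 'Standard', or 'Bulk'
      s3.restore_object(
          Bucket='archive-bucket',
          Key='logs/2019.tar.gz',
          RestoreRequest={
              'Days': 7,
              'GlacierJobParameters': {'Tier': 'Expedited'},
          },
      )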

  • S3 Select and Glacier Select

    • Retrieve subsets of an object's data using SQL, with filtering performed server-side.

    • Can filter by rows & columns (simple SQL statements)

    • Faster & cheaper
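
  A sketch of S3 Select with boto3, filtering rows of a CSV object server-side (bucket, key, and column names are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      resp = s3.select_object_content(
          Bucket='my-bucket',
          Key='users.csv',
          ExpressionType='SQL',
          Expression="SELECT s.name FROM S3Object s WHERE CAST(s.age AS INT) > 30",
          InputSerialization={'CSV': {'FileHeaderInfo': 'USE'}},
          OutputSerialization={'CSV': {}},
      )

      # The response is an event stream; collect the record payloads
      for event in resp['Payload']:
          if 'Records' in event:
              print(event['Records']['Payload'].decode())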

  • Charges:

    • Storage

    • Requests

    • Storage management pricing (file tagging)

    • Data transfer pricing

    • Transfer acceleration

    • Monitoring cost (only for S3 Intelligent-Tiering, $0.0025 per 1,000 objects)

  • Cost Saving tips:

    • S3 Select / Glacier Select

    • S3 Lifecycle

    • Compress object to save space

    • S3 Requester Pays:

      • Bucket owner pays for S3 storage

      • Requester pays for the cost of request and data download

      • Enable it via a bucket policy; do not grant access through an IAM role in your account (otherwise you, the owner, still pay).
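
  A sketch of Requester Pays with boto3: the owner enables it on the bucket, and requesters must acknowledge the charges on every request (names are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      # Bucket owner: turn on Requester Pays
      s3.put_bucket_request_payment(
          Bucket='shared-datasets',
          RequestPaymentConfiguration={'Payer': 'Requester'},
      )

      # Requester: must pass RequestPayer to accept the charges
      resp = s3.get_object(Bucket='shared-datasets', Key='data.csv',
                           RequestPayer='requester')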

  • Lifecycle management

    • Transition from Standard to IA (objects must be at least 128 KB and 30 days past creation)

    • Archive to Glacier (30 days after IA if relevant)

    • Permanent deletion
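
  A sketch of the lifecycle above with boto3: transition to IA at 30 days, to Glacier at 60, then permanent deletion (bucket name, prefix, and day counts are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      s3.put_bucket_lifecycle_configuration(
          Bucket='my-bucket',
          LifecycleConfiguration={
              'Rules': [{
                  'ID': 'archive-then-delete',
                  'Filter': {'Prefix': 'logs/'},
                  'Status': 'Enabled',
                  'Transitions': [
                      {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                      {'Days': 60, 'StorageClass': 'GLACIER'},
                  ],
                  'Expiration': {'Days': 365},   # permanent deletion
              }]
          },
      )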

  • Versioning

  • S3 Object Lock & Glacier Vault Lock

    • S3 Object Lock

      • Adopt a WORM (Write once, read many) model.

      • Block an object version deletion for a specified amount of time.

    • Glacier Vault Lock

      • Adopt a WORM (Write once, read many) model.

      • Lock the policy for future edits (can no longer be changed).

      • Helpful for compliance and data retention.
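
  A sketch of placing a retention period on an object version with boto3 (Object Lock must already be enabled on the bucket; names and dates are hypothetical):

      import boto3
      from datetime import datetime, timezone

      s3 = boto3.client('s3')

      # Block deletion of this object version until the retain-until date
      s3.put_object_retention(
          Bucket='locked-bucket',
          Key='audit/2020.log',
          Retention={
              'Mode': 'COMPLIANCE',   # or 'GOVERNANCE'
              'RetainUntilDate': datetime(2026, 1, 1, tzinfo=timezone.utc),
          },
      )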

  • Encryption:

    • In transit (SSL/TLS)

    • At rest

      • Server side encryption options

        • S3 managed keys: SSE-S3

        • AWS Key Management Service, Managed Keys: SSE-KMS (similar to SSE-S3 but with some additional benefits)

        • Customer provided (managed) keys: SSE-C

      • Client side encryption
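
  A sketch of the server-side options with boto3 (bucket, keys, and KMS alias are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      # SSE-S3: S3-managed keys
      s3.put_object(Bucket='my-bucket', Key='a.txt', Body=b'secret',
                    ServerSideEncryption='AES256')

      # SSE-KMS: keys managed in AWS KMS
      s3.put_object(Bucket='my-bucket', Key='b.txt', Body=b'secret',
                    ServerSideEncryption='aws:kms',
                    SSEKMSKeyId='alias/my-app-key')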

  • Access control

    • Use bucket policies (can constrain by public IP or Elastic IP, but not private IP) and bucket ACLs

    • By default, a bucket and its contents are private

    • Pre-signed URLs

      • Can generate pre-signed urls with SDK / CLI

        • For downloads (can use CLI)

        • For uploads (must use SDK)

      • Valid for 3600 seconds by default; change the timeout with the --expires-in argument

      • Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT

      • Scenarios

        • Allow only logged-in users to download a file.

        • Allow a user to upload a file temporarily
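
  A sketch of generating pre-signed URLs with boto3 (the CLI equivalent for downloads is aws s3 presign; bucket and keys are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      # Download URL, valid for the default 3600 seconds
      get_url = s3.generate_presigned_url(
          'get_object',
          Params={'Bucket': 'my-bucket', 'Key': 'report.pdf'},
          ExpiresIn=3600,
      )

      # Temporary upload URL, valid for 5 minutes
      put_url = s3.generate_presigned_url(
          'put_object',
          Params={'Bucket': 'my-bucket', 'Key': 'uploads/photo.jpg'},
          ExpiresIn=300,
      )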

  • Logging

    • CloudTrail:

      • By default, bucket-level access is recorded.

      • Object level logging can be enabled.

  • Replication

    • Cross Region Replication

      • For buckets in different regions

      • Versioning must be enabled in both buckets.

      • Once CRR is on, subsequently uploaded or updated files are replicated automatically (existing objects are not replicated retroactively).

      • When an object is first deleted, a deletion marker is added and replicated to the other buckets. If a second delete (of a specific version) is then made in one bucket, only that bucket's copy is removed; replicas in the other buckets keep only their deletion markers. Likewise, recovering an object restores it only in that bucket; the other buckets must be handled individually.

      • To avoid regional failure of S3

        • Enable CRR to a bucket with a different name in a backup region.

        • Have applications read the bucket name from SSM Parameter Store, and swap the value for DR.

    • Same Region Replication
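
  A sketch of enabling replication with boto3 (versioning must already be on in both buckets; the role ARN and bucket names are hypothetical):

      import boto3

      s3 = boto3.client('s3')

      s3.put_bucket_replication(
          Bucket='source-bucket',
          ReplicationConfiguration={
              'Role': 'arn:aws:iam::123456789012:role/s3-replication',
              'Rules': [{
                  'Status': 'Enabled',
                  'Prefix': '',   # replicate everything
                  'Destination': {'Bucket': 'arn:aws:s3:::backup-bucket'},
              }],
          },
      )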

  • Request through public / private subnets

    • Through public subnets: an Internet Gateway is used to reach S3. Set up the bucket policy with aws:SourceIp for the public IP.

    • Through private subnets: a VPC Endpoint Gateway is used to reach S3. To restrict access to specific VPC endpoints, set up the bucket policy with either:

      • aws:SourceVpce for one or a few endpoints

      • aws:SourceVpc to encompass all possible VPC endpoints in a VPC
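
  A sketch of restricting a bucket to one VPC endpoint via a bucket policy (bucket name and endpoint ID are hypothetical):

      import json
      import boto3

      s3 = boto3.client('s3')

      # Deny all S3 actions unless the request comes through this endpoint
      policy = {
          'Version': '2012-10-17',
          'Statement': [{
              'Sid': 'DenyOutsideVpce',
              'Effect': 'Deny',
              'Principal': '*',
              'Action': 's3:*',
              'Resource': ['arn:aws:s3:::my-bucket',
                           'arn:aws:s3:::my-bucket/*'],
              'Condition': {
                  'StringNotEquals': {'aws:SourceVpce': 'vpce-0123456789abcdef0'}
              },
          }],
      }
      s3.put_bucket_policy(Bucket='my-bucket', Policy=json.dumps(policy))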

  • For a VPC to restrict access to specific buckets

    • Set up an endpoint policy that explicitly allows access to the required buckets.

  • Provide static website hosting

    • Serverless, cheap, auto-scaling, but does not support HTTPS.

    • Works with Route 53; the bucket name must be identical to the domain name

    • An index document (index.html) is mandatory; an error page is optional.
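
  A sketch of enabling static website hosting with boto3 (the bucket name matches a hypothetical domain):

      import boto3

      s3 = boto3.client('s3')

      s3.put_bucket_website(
          Bucket='www.example.com',
          WebsiteConfiguration={
              'IndexDocument': {'Suffix': 'index.html'},   # mandatory
              'ErrorDocument': {'Key': 'error.html'},      # optional
          },
      )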

  • Anti patterns

    • Lots of small files

    • POSIX file system (use EFS instead), file locks.

    • Search features, queries, rapidly changing data.

      • objects can be indexed in DynamoDB (an S3 event triggers a Lambda that inserts the data; see the sketch after this list)

    • Website with dynamic content
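
  A sketch of the DynamoDB indexing pattern above: a Lambda handler triggered by S3 event notifications (the table name and item attributes are hypothetical):

      import boto3

      table = boto3.resource('dynamodb').Table('s3-object-index')

      def handler(event, context):
          # One S3 event notification can carry several records
          for record in event['Records']:
              table.put_item(Item={
                  'key': record['s3']['object']['key'],
                  'bucket': record['s3']['bucket']['name'],
                  'size': record['s3']['object'].get('size', 0),
                  'event_time': record['eventTime'],
              })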

Service for transferring large amount of data with physical storage (bypassing internet)

  • AWS Import/Export

    • Imports/exports data sets of up to 16 TB to S3 or EBS (you ship your own storage devices).

  • Snowball (Import/Export Disk to/from S3)

    • Types:

      • Snowball

        • On-board storage of 50 TB or 80 TB

        • Bypasses internet entirely

      • Snowball Edge

        • Durable local storage

        • Local compute with AWS Lambda

        • Local compute instances

        • Use in a cluster of devices

        • Use with AWS Greengrass (IoT)

        • Transfer files through NFS with a GUI

      • Snowmobile

        • Exabyte-scale data (coming with a truck)

Scenarios

  • Syncing data from on-premise

    • Use the S3 CLI sync command (it can be run repeatedly so the final sync is short; especially useful for migrations to AWS).

  • Check for personally identifiable information (PII) with Macie
