S3


Last updated 4 years ago

Introduction

  • Simple Storage Service (S3) is object-based storage, not block storage, so it is not suitable for hosting an OS or running applications. A single object can be 0 bytes to 5 TB. There is no limit on total usage.

  • The S3 namespace is global, but a region must be chosen when creating a bucket.

  • Files are stored in buckets. Bucket names are globally unique.

  • A bucket can be addressed with either a path-style URL or a virtual-hosted-style URL.

  • Works over HTTP with REST verbs such as PUT and DELETE. A successful PUT returns status code 200. Multipart upload is supported.

Features

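  • New objects can be read immediately after upload. Updates and deletes used to take longer to propagate (eventual consistency); since December 2020 S3 provides strong read-after-write consistency for PUTs and DELETEs as well.

  • Supports cross-origin resource sharing (CORS), so web clients served from a different domain can access the bucket.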

  • Composition:

    • Key (name)

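      • S3 partitions data by key name in alphabetical order, so many requests against similar key names could hurt performance. Adding random letters or numbers to the name (folder or file) helped spread objects evenly across S3. Since the 2018 per-prefix performance increase this is no longer required, but spreading requests over several prefixes still raises total throughput.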

    • Value (file)

    • Version id

    • Metadata

    • Sub-resources

      • Access Control Lists (for privilege)

      • Torrent
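The key-naming idea under Key (name) can be sketched as follows. This is a minimal illustration only: the helper name `hashed_key` and the choice of a SHA-256 derived prefix are assumptions, not something the notes prescribe.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix so keys spread over many prefixes.

    Hypothetical helper: the prefix is the first `prefix_len` hex characters
    of the SHA-256 digest of the original key, so it is stable for a given key.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


if __name__ == "__main__":
    # Keys that would otherwise share the "logs/" prefix get different prefixes.
    for name in ("logs/2020-01-01.csv", "logs/2020-01-02.csv"):
        print(hashed_key(name))
```

The trade-off is that listing objects by their original folder is no longer a simple prefix listing, so this only fits workloads that look objects up by a known key.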

  • Storage Class: see S3 Storage Class.

  • Performance:

    • Baseline Performance (by prefixes in a bucket)

      • Scales automatically; latency is 100 to 200 ms.

      • Can achieve 5,500 requests/s for GET/HEAD and 3,500 requests/s for PUT/COPY/POST/DELETE, per prefix in a bucket.

      • No limits to the number of prefixes in a bucket.

    • Multi-part Upload

      • Recommended: file > 100 MB, mandatory: file > 5 GB.

    • Byte-range Fetches (for downloads)

      • Parallelize GETs by requesting specific byte ranges

    • Transfer Acceleration (can be combined with multi-part upload)

      • Uploads go to a CloudFront edge location, then travel to the target S3 region.

      • Rule of thumb: Transfer Acceleration is a good option if the transfer would take more than a week over the Internet, or if there are recurring transfer jobs and more than 25 Mbps of available bandwidth. Over a fully utilized 1 Gbps line it can move up to 75 TB in the time of a Snowball turnaround.
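The size thresholds for multi-part upload and the idea of parallel byte-range GETs can be sketched in plain Python. The function names below are hypothetical helpers; only the 100 MB / 5 GB thresholds come from the notes, and the `bytes=start-end` form is the standard HTTP Range header.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def upload_strategy(size_bytes: int) -> str:
    """Classify an upload using the thresholds from the notes."""
    if size_bytes > 5 * GB:
        return "multipart-required"
    if size_bytes > 100 * MB:
        return "multipart-recommended"
    return "single-put"


def byte_ranges(size_bytes: int, chunk_bytes: int) -> list:
    """Split an object into inclusive (start, end) byte ranges."""
    return [
        (start, min(start + chunk_bytes, size_bytes) - 1)
        for start in range(0, size_bytes, chunk_bytes)
    ]


def range_header(start: int, end: int) -> str:
    """Format one range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    print(upload_strategy(6 * GB))
    for start, end in byte_ranges(10, 4):
        print(range_header(start, end))
```

Each range could then be fetched by a separate worker and the pieces reassembled in order; the same split also describes the parts of a multi-part upload.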

  • Glacier retrieval options:

    • Expedited (1 to 5 minutes)

    • Standard (3 to 5 hours)

    • Bulk (5 to 12 hours)

  • Glacier Deep Archive retrieval options:

    • Standard (within 12 hours)

    • Bulk (within 48 hours)

  • S3 Select and Glacier Select:

    • Retrieve subsets of objects using SQL by performing server-side filtering.

    • Can filter by rows & columns (simple SQL statements)

    • Faster & cheaper, because less data is transferred
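As a rough sketch of server-side filtering by rows and columns, the helper below builds the keyword arguments that boto3's `select_object_content` call expects for a CSV object with a header row. The helper name, bucket, key, and column names are made up for illustration, and the request is only built here, not sent.

```python
def build_select_request(bucket, key, columns, where=None):
    """Build kwargs shaped for boto3's s3.select_object_content (CSV in, CSV out)."""
    cols = ", ".join(f"s.{c}" for c in columns)      # column filter
    expression = f"SELECT {cols} FROM S3Object s"
    if where:
        expression += f" WHERE {where}"              # row filter
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # FileHeaderInfo=USE lets the SQL refer to columns by header name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


if __name__ == "__main__":
    request = build_select_request(
        "example-bucket", "users.csv", ["name", "city"], "s.city = 'Taipei'"
    )
    print(request["Expression"])
```

With a real client the dictionary would be passed as `client.select_object_content(**request)`.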

  • Charges:

    • Storage

    • Requests

    • Storage management pricing (file tagging)

    • Data transfer pricing

    • Transfer acceleration

    • Monitoring cost (only for S3 Intelligent-Tiering, $0.0025 per 1,000 objects)

  • Cost Saving tips:

    • S3 Select / Glacier Select

    • S3 Lifecycle

    • Compress objects to save space

    • S3 Requester Pays:

      • Bucket owner pays for S3 storage

      • Requester pays for the cost of request and data download

      • Grant access with a bucket policy, not with an IAM role in your account. If the requester assumes a role in your account, you still pay.

  • Lifecycle management

    • Transition from Standard to IA (object must be at least 128 KB and 30 days past creation)

    • Archive to Glacier (30 days after IA if relevant)

    • Permanent deletion
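The lifecycle steps above (Standard to IA, then Glacier, then deletion) can be written as a rule dictionary in the shape boto3's `put_bucket_lifecycle_configuration` accepts. This is a sketch under assumptions: the helper name, rule ID, prefix, and the 365-day expiration are illustrative, and the day checks simply mirror the 30-day figures in the notes.

```python
def lifecycle_rule(rule_id, prefix, ia_days=30, glacier_days=60, expire_days=365):
    """Build one lifecycle rule: Standard -> IA -> Glacier -> delete."""
    if ia_days < 30:
        raise ValueError("transition to IA needs at least 30 days after creation")
    if glacier_days < ia_days + 30:
        raise ValueError("archive to Glacier at least 30 days after the IA transition")
    if expire_days <= glacier_days:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


if __name__ == "__main__":
    rule = lifecycle_rule("archive-logs", "logs/")
    # With a real client this would be sent as:
    # client.put_bucket_lifecycle_configuration(
    #     Bucket="example-bucket", LifecycleConfiguration={"Rules": [rule]})
    print(rule)
```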

  • Versioning

    • Stores all versions of an object (including all writes, even if the object is deleted)

    • Once enabled, it can't be disabled, only suspended.

    • Integrates with lifecycle rules

    • MFA Delete can be enabled to require MFA for:

      • Changing the versioning state

      • Permanently deleting an object version

    • To receive a notification for every write event, versioning must be enabled. Without versioning, simultaneous writes to the same object may produce only one notification.
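The versioning behaviour described above can be illustrated with a toy in-memory model. It is not the S3 API: the class, its method names, and the `v1`, `v2` style IDs are invented for the sketch (real version IDs are opaque strings).

```python
import itertools


class VersionedBucket:
    """Toy model of a versioning-enabled bucket."""

    def __init__(self):
        # key -> list of (version_id, body); body None represents a delete marker
        self._versions = {}
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Every write adds a new version instead of overwriting."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        """Without a version ID, add a delete marker; with one, remove that version."""
        if version_id is None:
            return self.put(key, None)
        kept = [v for v in self._versions.get(key, []) if v[0] != version_id]
        self._versions[key] = kept
        return version_id

    def get(self, key, version_id=None):
        """Return the latest body, or a specific version; None if deleted or missing."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            return next((body for vid, body in versions if vid == version_id), None)
        return versions[-1][1] if versions else None


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("report.txt", "draft")
    bucket.put("report.txt", "final")
    marker = bucket.delete("report.txt")      # object now looks deleted
    print(bucket.get("report.txt"))           # None
    print(bucket.get("report.txt", first))    # draft
    bucket.delete("report.txt", marker)       # removing the marker restores it
    print(bucket.get("report.txt"))           # final
```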

  • S3 Object Lock & Glacier Vault Lock

    • S3 Object Lock

      • Adopt a WORM (Write once, read many) model.

      • Block an object version deletion for a specified amount of time.

    • Glacier Vault Lock

      • Adopt a WORM (Write once, read many) model.

      • Locks the vault policy against future edits (it can no longer be changed).

      • Helpful for compliance and data retention.

  • Encryption:

    • In transit (SSL/TLS)

    • At rest

      • Server side encryption options

        • S3 managed keys: SSE-S3

        • AWS KMS managed keys: SSE-KMS (similar to SSE-S3, with the added benefits of an audit trail of key usage and separate permissions on the key)

        • Customer provided (managed) keys: SSE-C

      • Client side encryption
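A small sketch of how the three server-side options map onto upload parameters, assuming the parameter names boto3's `put_object` uses (`ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm`, `SSECustomerKey`). The helper name and the key values are hypothetical.

```python
def sse_put_args(mode, kms_key_id=None, customer_key=None):
    """Return extra put_object kwargs for the chosen server-side encryption mode."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Without an explicit key ID, the AWS managed key for S3 is used.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # The caller supplies and keeps the 256-bit key; S3 does not store it.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(sse_put_args("SSE-S3"))
    print(sse_put_args("SSE-KMS", kms_key_id="alias/example-key"))
```

With a real client these would be merged into the call, for example `client.put_object(Bucket=..., Key=..., Body=..., **sse_put_args("SSE-S3"))`.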

  • Access control

    • Use bucket policies (can restrict by public IP or Elastic IP, but not by private IP) and bucket ACLs

    • By default, a bucket and its contents are private

    • Pre-signed URLs

      • Can generate pre-signed urls with SDK / CLI

        • For downloads (can use CLI)

        • For uploads (must use SDK)

      • Valid for 3600 seconds by default; the timeout can be changed with the --expires-in argument

      • Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT

      • Scenarios

        • Allow only logged-in users to download a file.

        • Allow a user to upload a file temporarily
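Generating pre-signed URLs with the SDK might look like the sketch below. It assumes a boto3 S3 client is passed in (boto3 itself is not imported here) and uses its `generate_presigned_url` method; the wrapper function names are hypothetical.

```python
def presign_download(s3_client, bucket, key, expires_in=3600):
    """Pre-signed GET URL; the holder downloads with the signer's permissions."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


def presign_upload(s3_client, bucket, key, expires_in=3600):
    """Pre-signed PUT URL; lets a user upload to one key for a limited time."""
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

A caller would create the client with `boto3.client("s3")` and hand the resulting URL to the logged-in user. The 3600-second default mirrors the default in the notes.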

  • Logging

    • CloudTrail:

      • By default, bucket-level operations are recorded.

      • Object level logging can be enabled.

  • Replication

    • Cross Region Replication

      • For buckets in different regions

      • Versioning must be enabled in both buckets.

      • Once CRR is on, objects written or updated afterwards are replicated automatically; existing objects are not.

      • A first delete adds a delete marker, and the replicas get a delete marker as well. A second delete made in one bucket removes only the item in that bucket; the replicas keep their delete markers. Recovery is likewise per bucket: restoring the item in one bucket does not restore it in the others, so each bucket must be handled individually.

      • To avoid regional failure of S3

        • Enable CRR and use a different bucket name in the backup region.

        • Applications read the bucket name from SSM Parameter Store; change the parameter for DR.

    • Same Region Replication

  • Request through public / private subnets

    • Through public subnets: an Internet Gateway is used to reach S3. The bucket policy must use aws:SourceIp with the public IP.

    • Through a VPC endpoint, the bucket policy can use either:

      • aws:SourceVpce for one or a few endpoints

      • aws:SourceVpc to encompass all possible VPC endpoints.

    • To limit a VPC to specific buckets, set up an endpoint policy that explicitly allows access to the required buckets.
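The bucket-policy conditions mentioned above can be sketched as policy documents built in Python. The helper names, bucket name, endpoint IDs, and CIDR ranges are placeholders; the deny-unless pattern shown is one common way to express the restriction, not necessarily the one the author used.

```python
import json


def _deny_unless(bucket, sid, condition):
    """Deny all S3 actions on the bucket unless the condition is met."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": sid,
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [arn, f"{arn}/*"],
            "Condition": condition,
        }],
    }


def vpce_only_policy(bucket, vpce_ids):
    """Private-subnet case: allow requests only through the given VPC endpoints."""
    return _deny_unless(
        bucket, "DenyUnlessFromVpce",
        {"StringNotEquals": {"aws:SourceVpce": list(vpce_ids)}},
    )


def source_ip_policy(bucket, cidrs):
    """Public-subnet case: allow requests only from the given public IP ranges."""
    return _deny_unless(
        bucket, "DenyUnlessFromIp",
        {"NotIpAddress": {"aws:SourceIp": list(cidrs)}},
    )


if __name__ == "__main__":
    print(json.dumps(vpce_only_policy("example-bucket", ["vpce-1a2b3c4d"]), indent=2))
```

Swapping `aws:SourceVpce` for `aws:SourceVpc` with a VPC ID would cover every endpoint in that VPC.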

  • Provide static website hosting

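    • Serverless, cheap, auto-scaling, but the website endpoint does not support HTTPS (put CloudFront in front of the bucket for HTTPS).

    • Works with Route53: the bucket name must be identical to the domain name

    • The index document (index.html) is mandatory, but the error page is optional.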

  • Anti patterns

    • Lots of small files

    • POSIX file system (use EFS instead), file locks.

    • Search features, queries, rapidly changing data.

      • can index objects in DynamoDB (S3 event --> Lambda to insert data)

    • Website with dynamic content
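The "S3 event --> Lambda to insert data" idea from the anti-patterns list could look roughly like this. The handler reads the standard S3 event notification fields; the `put_item` callable stands in for a DynamoDB write and is an assumption, as are the item field names.

```python
from urllib.parse import unquote_plus


def index_items(event):
    """Extract one index entry per S3 event record."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        items.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in event notifications.
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        })
    return items


def handler(event, context, put_item=print):
    """Lambda-style entry point; put_item is a placeholder for the DynamoDB insert."""
    items = index_items(event)
    for item in items:
        put_item(item)
    return len(items)


if __name__ == "__main__":
    sample = {"Records": [{"s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "photos/my+cat.jpg", "size": 1024},
    }}]}
    handler(sample, None)
```

In a real function `put_item` would wrap something like a DynamoDB table's `put_item(Item=...)` call, so that searches hit the table instead of listing the bucket.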

Services for transferring large amounts of data on physical storage (bypassing the internet)

  • AWS Import/Export

    • Imports or exports data of up to 16 TB to or from S3 or EBS.

  • Snowball (Import/Export Disk to/from S3)

    • Types:

      • Snowball

        • On-board storage of 50 or 80 TB

        • Bypasses internet entirely

      • Snowball Edge

        • Durable local storage

        • Local compute with AWS Lambda

        • Local compute instances

        • Use in a cluster of devices

        • Use with AWS Greengrass (IoT)

        • Transfer files through NFS with a GUI

      • Snowmobile

        • Exabyte-scale data (delivered by truck)
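When choosing between a physical device and the network, the rule of thumb from the Transfer Acceleration notes (1 Gbps, about a week, 75 TB) can be checked with simple arithmetic. The function below is a hypothetical helper that assumes decimal units (1 TB = 10^12 bytes) and a steady link.

```python
def transfer_days(size_tb, link_mbps, utilization=1.0):
    """Estimate days needed to move size_tb terabytes over a link_mbps line."""
    bits = size_tb * 1e12 * 8                          # decimal TB -> bits
    seconds = bits / (link_mbps * 1e6 * utilization)   # Mbps -> bits per second
    return seconds / 86400


if __name__ == "__main__":
    # 75 TB over a fully utilized 1 Gbps line: a little under 7 days.
    print(round(transfer_days(75, 1000), 1))
    # The same data over a 25 Mbps line takes far longer than a device round trip.
    print(round(transfer_days(75, 25)))
```

So 75 TB at 1 Gbps is roughly one week, which matches the Snowball turnaround comparison in the notes.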

Scenarios

  • Syncing data from on-premises

    • Use the AWS CLI command aws s3 sync. It can be run several times in advance to shorten the final sync, which is especially useful for migration to AWS.

Related notes

  • Path-style URL: https://s3-${region}.amazonaws.com/${bucket_name}

  • Virtual-hosted-style URL: https://${bucket_name}.s3-${region}.amazonaws.com

  • Through private subnets: a VPC Gateway Endpoint is used to reach S3. The bucket policy must use either aws:SourceVpce or aws:SourceVpc (see Restricting Access to specific VPC Endpoints).

  • For a VPC to restrict access to specific buckets, use an endpoint policy.

  • Check personally identifiable information (PII) with Macie.

  • Version Id is null if Versioning is not enabled. Once it's enabled, each object creation or modification is given a unique Version id (an opaque string, not a sequential number).

  • See also: S3 Select and Glacier Select, Server side encryption options, S3 Storage Class.
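The two URL templates in these notes can be filled in with a couple of small helpers. The function names and the example bucket and region are illustrative. The templates are copied from the notes as written; newer AWS documentation tends to write the regional endpoint with a dot (s3.${region}) rather than a dash.

```python
def path_style_url(bucket_name: str, region: str) -> str:
    """Path-style: the bucket name is part of the URL path."""
    return f"https://s3-{region}.amazonaws.com/{bucket_name}"


def virtual_hosted_url(bucket_name: str, region: str) -> str:
    """Virtual-hosted-style: the bucket name is part of the host name."""
    return f"https://{bucket_name}.s3-{region}.amazonaws.com"


if __name__ == "__main__":
    print(path_style_url("example-bucket", "us-west-2"))
    print(virtual_hosted_url("example-bucket", "us-west-2"))
```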