Django가 고장날때까지


AWS Wrong-Answer Notes

Django가 고장날때까지 2025. 1. 2. 22:41

  • Analyze the intent behind each question.

 

cost

  • a cost-effective solution with the least engineering effort?
  • the most cost-effective method
  • with minimal cost implications.
  • the need for a cost-effective and reliable solution, which of the following systems should the website employ?
  • most cost-effective way

 

efficient

  • most efficient method - achieve this goal effectively
  • Which approach effectively utilizes Amazon API Gateway in this scenario?
  • recommend to implement the database requirements and integrate user authentication
  • to achieve this goal with minimal maintenance and without impacting existing read and write operations?
  • ensure your Lambda function can properly perform its actions?
  • Which of the following solutions would meet this requirement? (this one wasn't there when I took the exam)
  • Which TWO of the following options would you recommend to perform analytical queries without impacting the primary database's performance?

 

  • What would be the simplest solution to address this issue?
  • Which of the following approaches would you recommend?

  • 1) Configure Amazon CloudWatch Logs to directly stream logs to Kinesis Data Streams, relying on its native capabilities for handling retries and batching, and use a separate custom solution for clickstream data.

    2) Utilize the Amazon Kinesis Producer Library (KPL) for Java applications to ensure efficient batching, asynchronous data transmission, and built-in retry mechanisms for both clickstream data and server logs.

    3) Implement a custom Java solution using the AWS SDK for Kinesis, manually handling retries, batching, and asynchronous sending for both clickstream data and server logs.

    4) Deploy an AWS Lambda function to act as an intermediary, processing and forwarding clickstream data and server logs from your applications to Kinesis with custom logic for batching and retries.
  • There are several possible approaches; how do you judge which one counts as "efficient"?
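One way to see why the KPL option usually wins here: batching, asynchronous sends, and retries come built in, whereas a hand-rolled SDK producer (option 3) has to implement all of that itself. A minimal, illustrative Python sketch of that manual work follows — the `put_records` transport is injected as a plain function (not a real AWS call), and the 500-record limit reflects the Kinesis `PutRecords` per-call maximum:

```python
def chunk_records(records, max_batch=500):
    """Split records into batches no larger than the PutRecords 500-record limit."""
    return [records[i:i + max_batch] for i in range(0, len(records), max_batch)]


def send_with_retries(batches, put_records, max_attempts=3):
    """Send each batch via put_records, retrying a failed batch up to max_attempts.

    put_records is injected (e.g. a wrapper around a boto3 kinesis client)
    so the retry logic can be shown without touching AWS. It should return
    True on success.
    """
    failed = []
    for batch in batches:
        for _ in range(max_attempts):
            if put_records(batch):
                break
        else:
            failed.append(batch)  # retries exhausted for this batch
    return failed


# Usage with a fake transport that fails once, then succeeds.
calls = {"n": 0}

def flaky_put(batch):
    calls["n"] += 1
    return calls["n"] > 1  # first call fails, later calls succeed

batches = chunk_records(list(range(1200)))
print(len(batches))                           # 3
print(send_with_retries(batches, flaky_put))  # [] -> everything was sent
```

The KPL (option 2) handles exactly this chunking, async dispatch, and retrying internally, which is why it is the lower-effort, more "efficient" choice for both clickstream data and server logs.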

 

  • What action should the team take to troubleshoot and address the issue related to resource allocation and improve the AWS Glue ETL job performance?
  • Which AWS service should the team use to automate and manage this ETL process with the required level of control and flexibility?
  • Which TWO of the following strategies would effectively comply with the encryption and key management stipulations?

 

 

AWS Database Migration Service (DMS)

 

IAM policies to restrict

IAM Roles for Service Accounts (IRSA) feature

IAM role that grants permissions to create EMR clusters

IAM role in the S3 account with permissions to access the bucket and assume this role from Redshift Spectrum

 

AWS Cognito

 

secure and efficient access to Amazon DynamoDB without embedding AWS credentials within the containers.

 

Amazon AppFlow

 

AWS Data Pipeline

 

AWS Secrets Manager 

 

EC2 - compute, memory  # Two of the options ask what the workload focuses on; for a Spark job where CPU usage is low and memory usage is high

# Similar to a question type I saw on the exam

A data engineering team is developing a pipeline to process and prepare large datasets for training machine learning models. This pipeline involves intensive data manipulation tasks, such as feature engineering, normalization, and encoding, all of which require substantial compute resources. The team aims to optimize for both performance and cost. Given these requirements, which Amazon EC2 instance type should they choose to efficiently handle these compute-intensive data processing tasks?

  • 1) m5.large instances, which are powered by Intel Xeon Platinum 8175M processors, known for their balance of compute, memory, and networking resources for general-purpose tasks.
  • 2) c6g.2xlarge instances, which are powered by AWS Graviton2 processors, offering a significant price-performance advantage for compute-intensive applications.
  • 3) r5a.large instances, which utilize AMD EPYC 7000 series processors, providing a cost-effective option for memory-intensive applications.
  • 4) p3.2xlarge instances, which feature NVIDIA V100 Tensor Core GPUs, optimized for machine learning and high-performance computing workloads.

 

 

Composite key

 

S3

versioning on the S3 bucket

S3 Glacier

Glacier Deep Archive

S3 Select

Glacier Select operation

Glacier using Bulk retrieval

Glacier using Expedited retrieval

AWS S3 event notifications

S3 Access Points

S3 Object Ownership 

S3 Object Lock

S3 Bucket Policies to enforce server-side encryption with customer-provided keys (SSE-C)

S3 server-side encryption (SSE) with AWS KMS-managed keys (SSE-KMS)

S3 Cross-Region Replication

S3 VPC Endpoint

S3 Transfer Acceleration

S3 Multipart upload feature with MD5 checksum

S3 Lifecycle policy

S3 Intelligent-Tiering 
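The lifecycle and tiering keywords above typically show up as "move aging data to cheaper storage" questions. As a hedged sketch (the rule ID and prefix are illustrative choices, not defaults), this is the shape of the lifecycle configuration that boto3's `put_bucket_lifecycle_configuration(LifecycleConfiguration=...)` expects — transition to Glacier after 90 days, expire after a year:

```python
# Hypothetical lifecycle configuration: objects under logs/ move to
# Glacier after 90 days and are deleted after 365 days.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",  # illustrative rule name
            "Filter": {"Prefix": "logs/"},     # illustrative key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

rule = lifecycle_configuration["Rules"][0]
print(rule["Transitions"][0]["StorageClass"])  # GLACIER
```

When the question says access patterns are unknown or unpredictable, S3 Intelligent-Tiering (which moves objects between tiers automatically) tends to beat a fixed lifecycle schedule like this one.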

 

Spark

Spark job

 

Redshift

Redshift to enforce RLS

Redshift COPY  # I think this appeared on the exam

Redshift Spectrum

workload management (WLM) queries  # comes up often

You need to ensure a real-time replica of your Amazon Redshift data warehouse, which utilizes RA3 nodes, is maintained across multiple availability zones.

Redshift cross-AZ data replication

Redshift's native snapshot

 

 

 

# Almost identical to a question that appeared on the exam

A data engineer needs to optimize query performance on an Amazon Redshift cluster. They want to analyze query execution times to identify long-running queries and assess the impact of different optimization strategies. Which Amazon Redshift system table or view should the data engineer query to obtain detailed information about query execution, including start and end times, to facilitate this analysis?

  • STV_TBL_PERM
  • STV_BLOCKLIST
  • STL_QUERY
  • STL_WLM_QUERY
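STL_QUERY is the answer here: it logs one row per completed query including `starttime` and `endtime`, which is exactly what execution-time analysis needs (STL_WLM_QUERY covers WLM queue and execution timing, STV_TBL_PERM and STV_BLOCKLIST describe table and disk-block storage). A sketch of the kind of SQL you would run against it — the 60-second threshold is an illustrative choice:

```python
# SQL against the STL_QUERY system table to surface long-running queries.
# starttime/endtime and querytxt are documented STL_QUERY columns; the
# duration threshold below is illustrative.
LONG_RUNNING_SQL = """
SELECT query,
       TRIM(querytxt) AS sql_text,
       starttime,
       endtime,
       DATEDIFF(seconds, starttime, endtime) AS duration_s
FROM stl_query
WHERE DATEDIFF(seconds, starttime, endtime) > 60
ORDER BY duration_s DESC
LIMIT 20;
"""
```

Running this before and after an optimization (sort keys, distribution style, WLM changes) gives a direct before/after comparison of query durations.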

 

 

Athena

Athena to monitor all COPY and UNLOAD

Athena to run standard SQL 

 

AWS Lake Formation

 

AWS DataSync

 

AWS Glue Workflows

AWS Glue ETL (jobs)

AWS Glue job

AWS Glue Data Catalog

AWS Glue DataBrew  # This came up often - Split column, Format phone number, Find and replace, Extract pattern

Data Processing Units (DPUs) allocated to the AWS Glue ETL job

 

EMR

Automating the creation of an EMR cluster, which subsequently performs transformations in S3 through EMRFS (EMR File System).

 

Amazon Aurora with Auto Scaling

 

Amazon EventBridge  # seems to come up often

 

Amazon CloudWatch Events

Amazon CloudWatch Logs

 

Lambda

 

AWS Step Functions

  • Choice
  • Task
  • Map
  • Parallel
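These four state types compose into an Amazon States Language (ASL) definition. A minimal sketch — all state names and the Lambda ARN are hypothetical — showing a Choice that routes small inputs to a single Task and large inputs to a Map that fans out per item:

```python
# Hypothetical ASL state machine: Choice -> Task or Map (of Tasks).
# (A Parallel state would look like Map but with fixed "Branches"
# instead of an item-driven "Iterator".)
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process"  # placeholder

definition = {
    "StartAt": "RouteBySize",
    "States": {
        "RouteBySize": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.itemCount", "NumericGreaterThan": 100, "Next": "FanOut"}
            ],
            "Default": "ProcessOnce",
        },
        "ProcessOnce": {
            "Type": "Task",
            "Resource": LAMBDA_ARN,
            "End": True,
        },
        "FanOut": {
            "Type": "Map",
            "ItemsPath": "$.items",   # run the iterator once per element of $.items
            "Iterator": {
                "StartAt": "ProcessItem",
                "States": {
                    "ProcessItem": {"Type": "Task", "Resource": LAMBDA_ARN, "End": True}
                },
            },
            "End": True,
        },
    },
}

print(sorted(definition["States"]))  # ['FanOut', 'ProcessOnce', 'RouteBySize']
```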

 

Amazon Kinesis Data Streams

Reduce the batch size in the Kinesis Data Streams Lambda trigger

Amazon Kinesis Data Firehose (in buffered mode)

Amazon S3 generated by a Kinesis Firehose pipeline.

Kinesis Producer Library (KPL)

Kinesis shards

 

Amazon Simple Queue Service (SQS) queue

SQS dead-letter queue (DLQ): Implementing an SQS dead-letter queue (DLQ) is the optimal solution for capturing messages that fail to be processed after a specified number of attempts.
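The DLQ is wired to the source queue through a RedrivePolicy attribute. A sketch of what that attribute looks like — the queue ARN and the receive count of 5 are illustrative:

```python
import json

# RedrivePolicy ties a source queue to its dead-letter queue: after
# maxReceiveCount failed receives, SQS moves the message to the DLQ
# instead of redelivering it. The ARN below is a placeholder.
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"

redrive_policy = json.dumps({
    "deadLetterTargetArn": dlq_arn,
    "maxReceiveCount": 5,  # give up after 5 processing attempts
})

# This JSON string is what you would pass as the queue attribute, e.g.
# sqs.set_queue_attributes(QueueUrl=..., Attributes={"RedrivePolicy": redrive_policy})
print(json.loads(redrive_policy)["maxReceiveCount"])  # 5
```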

 

VPC flow logs 

VPC access

VPC peering

VPC Endpoints for DynamoDB

 

VPN connection

 

AWS Direct Connect

 

OpenSearch

 

AWS CloudTrail

 

AWS Global Accelerator

 

AWS Config Rules

 

AWS WAF to monitor and filter traffic to and from the Amazon Redshift cluster

 

AWS Key Management Service (KMS) customer master key (CMK)

AWS-managed keys

 

Amazon DynamoDB

Amazon DynamoDB streams

DynamoDB Time-to-Live (TTL)

DynamoDB On-Demand

DynamoDB Auto Scaling

DynamoDB read/write capacities.

DynamoDB Accelerator (DAX)
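On the TTL item above: DynamoDB TTL deletes an item once the epoch-seconds value in its designated TTL attribute is in the past. A small sketch of stamping that attribute — the attribute name `expires_at` is a choice you configure on the table, not a DynamoDB default:

```python
import time

def with_ttl(item, ttl_seconds, now=None):
    """Return a copy of item with an 'expires_at' epoch-seconds TTL attribute.

    The attribute name is hypothetical; it must match the attribute
    enabled for TTL on the table.
    """
    now = int(time.time()) if now is None else now
    return {**item, "expires_at": now + ttl_seconds}

# Stamp a session item with a 7-day expiry (fixed clock for a predictable demo).
item = with_ttl({"pk": "session#123"}, ttl_seconds=7 * 24 * 3600, now=1_700_000_000)
print(item["expires_at"] - 1_700_000_000)  # 604800 (7 days in seconds)
```

TTL deletion is a background process, so expired items can linger briefly; exam answers usually pair TTL with "free, automatic cleanup of stale data" rather than exact-time deletion.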

 

AWS Data Pipeline

 

 

Restricting data access for different teams  # comes up often

Amazon QuickSight

Amazon ElastiCache

 

AWS CloudFront

 

personally identifiable information (PII)

 

Amazon GuardDuty  # did not appear on the exam

AWS Shield  # did not appear on the exam

Amazon Macie  # did not appear on the exam

 

an on-premises PostgreSQL database as the main OLTP engine

AWS SDK

 

 

troubleshoot and address the issue related to resource allocation and improve the AWS Glue ETL job performance?

  • Migrate the ETL workload to a larger instance type within AWS Glue to ensure better performance.
  • Increase the number of Data Processing Units (DPUs) allocated to the AWS Glue ETL jobs to provide additional processing power.
  • Review and optimize the ETL script to reduce complexity and improve execution efficiency.
  • Implement data compression on the source files in Amazon S3 to decrease the volume of data processed by AWS Glue.

AWS CLI

Amazon API Gateway

NAT Gateway

AWS Snowball

 

# Did not appear on the exam

  • Stratified Sampling
  • Random Sampling
  • Cluster Sampling
  • Systematic Sampling
  • SageMaker
  • SageMaker Lineage
  • SageMaker Experiments
  • SageMaker Debugger
  • SageMaker Model Monitor
  • MSK

 

 

A data engineering team is tasked with setting up a reliable and scalable streaming data pipeline using Amazon Managed Streaming for Apache Kafka (MSK) to process real-time data from various sources. The goal is to ensure that the data can be ingested efficiently, processed, and then stored in Amazon S3 for further analysis. Considering best practices for high availability, scalability, and cost-effectiveness, which of the following configurations should the team implement?

  • 1) Implement an MSK cluster with three broker nodes across multiple AZs, but manage scaling manually based on anticipated load increases. Use Apache NiFi for data ingestion, Kafka Streams for real-time data processing, and manually manage Kafka Connect with the S3 sink connector for storing data in S3.
  • 2) Deploy an MSK cluster with at least three broker nodes spread across multiple AZs and enable auto-scaling based on CPU utilization. Utilize Amazon Kinesis Data Firehose for data ingestion, Apache Kafka Connect with the S3 sink connector for moving processed data to S3, and manage stream processing using Apache Spark.
  • 3) Configure an MSK cluster with a single broker node across multiple Availability Zones (AZs) and enable auto-scaling based on CPU utilization. Use a custom Java producer application for data ingestion and Apache Flink for stream processing, with the results written directly to S3.
  • 4) Set up an MSK cluster with three broker nodes in a single AZ and disable auto-scaling to control costs. Employ AWS Lambda for data ingestion, directly processing streams with AWS Glue, and use the Kafka REST Proxy for writing the results to S3.

 
