Showing posts with the label hadoop

Managed ETL using AWS Glue and Spark

ETL (Extract, Transform, Load) workloads have become increasingly popular, and a growing number of companies are looking for ways to solve their ETL problems. Moving data from one datastore to another can become really expensive if the right tools are not chosen. AWS Glue provides easy-to-use tools for getting ETL workloads done. It runs your ETL jobs in a serverless Apache Spark environment, so you do not manage any Spark clusters yourself. To demonstrate Glue's basic functionality, we will use it with MongoDB as a data source, moving data from MongoDB collections to S3 for analytics. Later on, we can query the data in S3 using Athena, interactive query ser...