All About AWS Analytics and Media Services

Anish Antony
Dec 7, 2021

Here we discuss different AWS analytics and media services, such as Amazon EMR, Amazon Kinesis, Amazon Athena, AWS Glue, and Amazon WorkSpaces.


#1. Amazon EMR

Amazon EMR is a web service that enables businesses, researchers, data analysts, and developers to process vast amounts of data easily and cost-effectively. It provides a managed Hadoop framework that runs on Amazon EC2 instances and works with data stored in Amazon S3. Besides Hadoop, Amazon EMR also supports Apache Spark, HBase, Presto, and Flink. EMR is most commonly used for log analysis, financial analysis, and extract, transform, and load (ETL) workloads.
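Hadoop's MapReduce model, which EMR runs for jobs such as log analysis, splits work into three phases. A map phase emits key/value pairs, a shuffle groups the values by key, and a reduce phase combines each group. The following is a minimal single-process sketch of that model in plain Python, counting HTTP status codes in access-log lines. The log format and the function names are hypothetical. On a real EMR cluster the phases would run in parallel across many nodes (for example as a Hadoop or Spark job) rather than in one loop.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (status_code, 1) per log line; hypothetical format: 'ip method path status'."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:  # skip malformed lines instead of failing the job
            yield fields[3], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group every value emitted for the same key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine the grouped values for each key; here, a simple sum."""
    return {key: sum(values) for key, values in groups.items()}


logs = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.3 POST /api 200",
    "bad line",
]
counts = reduce_phase(shuffle_phase(map_phase(logs)))
print(counts)  # {'200': 2, '404': 1}
```

Because each map call only looks at its own line and each reduce call only looks at its own key, both phases can be split across machines. That independence is what lets a cluster scale the same logic to very large log volumes.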

EMR uses Apache Hadoop as its distributed data processing engine. Hadoop is an open-source Java software framework that supports data-intensive distributed applications running on large clusters of commodity hardware. EMR is also a good place to deploy Apache Spark, an open-source distributed processing system for big data workloads that uses in-memory caching and optimized query execution. You can also launch Presto clusters. Presto is an open-source distributed SQL query engine designed for fast analytic queries against large datasets. EMR launches all nodes of a given cluster in the same Amazon EC2 Availability Zone. You can access Amazon EMR through the AWS Management Console, the command-line tools, the SDKs, or the EMR API. EMR also gives you access to the underlying operating system, so you can SSH into the cluster nodes.
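To illustrate SDK access, here is a minimal sketch that builds the request for EMR's RunJobFlow API, as called through boto3's `run_job_flow`. The request describes a cluster with Hadoop, Spark, and Presto, pinned to a single Availability Zone. The following values are assumptions chosen for illustration:

* the cluster name
* the release label
* the instance types
* the Region and Availability Zone

`EMR_DefaultRole` and `EMR_EC2_DefaultRole` are the default role names, and they must already exist in your account. The request is only sent when `SUBMIT` is set to `True`. Doing so requires boto3 and AWS credentials, and it incurs charges.

```python
import json

# Keep False unless boto3 and AWS credentials are set up; launching a cluster costs money.
SUBMIT = False


def build_cluster_request(name: str, availability_zone: str, core_nodes: int = 2) -> dict:
    """Keyword arguments for EMR's RunJobFlow API, i.e. emr_client.run_job_flow(**request)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.10.0",  # assumed release label; choose a current one
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Presto"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # All nodes of the cluster are launched in this one Availability Zone.
            "Placement": {"AvailabilityZone": availability_zone},
            # Keep the cluster running after its steps finish (you terminate it yourself).
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EC2 instance-profile role
        "ServiceRole": "EMR_DefaultRole",      # default EMR service role
    }


request = build_cluster_request("log-analysis-demo", "us-east-1a")
print(json.dumps(request, indent=2))

if SUBMIT:
    import boto3  # third-party AWS SDK for Python

    emr = boto3.client("emr", region_name="us-east-1")
    response = emr.run_job_flow(**request)
    print("Started cluster", response["JobFlowId"])
```

Keeping the request as plain data makes it easy to review or unit-test before anything is launched.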

#2. Amazon Kinesis

Amazon Kinesis makes it easy to collect, process, and analyze real-time streaming data, so you can get timely insights and react quickly to new information. It is a collection of services for processing different kinds of data streams. In Kinesis Data Streams, data is processed in “shards”, and each shard can ingest up to 1,000 records (or 1 MB of data) per second. There is a default limit of 500 shards per account in some Regions (200 in others), but you can request an increase, and there is no upper limit on the number of shards. A record consists of a partition key, a sequence number, and a data blob of up to 1 MB. Kinesis is a transient data store: records are retained for 24 hours by default, and retention can be extended to up to 7 days (or, with long-term retention, up to 365 days).
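The shard limits above translate directly into capacity planning. Kinesis Data Streams also routes each record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash-key range contains it. The sketch below models both ideas in plain Python; it is not the AWS API. It uses the per-shard write limits quoted above and treats 1 MB as 10**6 bytes. It also assumes the hash-key space is split evenly across shards, as for a freshly created stream. Function names are hypothetical.

```python
import hashlib

MAX_RECORDS_PER_SHARD = 1_000    # write limit: records per second per shard
MAX_BYTES_PER_SHARD = 1_000_000  # write limit: ~1 MB per second per shard (10**6 assumed)
HASH_SPACE = 2 ** 128            # MD5 produces a 128-bit hash key


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def shards_needed(records_per_sec: int, avg_record_bytes: int) -> int:
    """Smallest shard count that satisfies both the record-rate and the byte-rate write limits."""
    by_records = _ceil_div(records_per_sec, MAX_RECORDS_PER_SHARD)
    by_bytes = _ceil_div(records_per_sec * avg_record_bytes, MAX_BYTES_PER_SHARD)
    return max(1, by_records, by_bytes)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (evenly split) hash-key range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")     # 0 <= hash_key < 2**128
    return hash_key * shard_count // HASH_SPACE  # maps into 0 .. shard_count - 1


print(shards_needed(5_000, 500))    # 5: the record-rate limit dominates
print(shards_needed(2_000, 2_000))  # 4: the byte-rate limit dominates
for key in ["user-1", "user-2", "user-1"]:
    print(key, "->", shard_for_key(key, 4))
```

Because the mapping depends only on the partition key, all records with the same key land on the same shard and keep their relative order. A poorly distributed key, such as one constant value, sends all traffic to one shard regardless of how many shards the stream has.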

There are four types of Kinesis service and these are detailed below.

#1. Kinesis Video Streams
