AWS Kinesis Firehose and Teradata Vantage


Many Teradata customers are interested in integrating Teradata Vantage with AWS first-party services. This Getting Started Guide will help you connect Teradata Vantage with the AWS Kinesis service.

Although this approach has been implemented and tested internally, it is offered on an as-is basis. Neither AWS nor Teradata provides validation of Teradata Vantage with AWS services.

We encourage your feedback. We want to understand what you found useful and how we can improve this guide.  

Please send your feedback to shamira.joshua@teradata.com and wenjie.tehan@teradata.com

Disclaimer: This guide includes content from both AWS and Teradata product documentation. 

Overview 

AWS Kinesis is a streaming service that makes it easy to collect, process, and analyze real-time, streaming data. 

The Kinesis streaming data platform offers Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics, and Kinesis Video Streams. Kinesis Data Streams is manually managed and retains data in the stream for 24 hours by default (retention can be extended), during which the data can be transformed. Kinesis Data Firehose is fully managed; it collects data and delivers it to destinations such as Amazon S3, Redshift, Splunk, and Elasticsearch. Kinesis Video Streams is used to stream live video, and Kinesis Data Analytics can process and analyze streaming data using standard SQL.

Teradata Vantage Native Object Store (NOS) makes it easy for users to explore data in external object stores like Amazon S3 using standard SQL and application interfaces like ODBC, JDBC, .NET, Python, and R native drivers. No special object storage-side compute infrastructure is required to use NOS. You can explore data located in an Amazon S3 bucket by simply creating a NOS table definition that points to a bucket you are authorized to access.

This guide describes how to stream data from a source to Amazon S3 via AWS Kinesis Firehose, transform it to JSON format with an AWS Glue ETL job, and then use Teradata NOS to access the data in Amazon S3. Lambda functions and a CloudWatch event rule are also created to automate the whole process.

Prerequisites

You are expected to be familiar with AWS Kinesis, Lambda, CloudWatch services, and Teradata Vantage.
You will need the following accounts and systems:

•    An AWS account
•    A Teradata Vantage instance with SQLE 17.0+
•    An Amazon S3 bucket to store streaming data
•    An Amazon S3 bucket to store JSON files
•    IAM roles for the Glue crawler, Glue ETL job, and Lambda services
•    An AWS access key pair (AccessKeyId and SecretAccessKey)

Getting Started

Create Amazon S3 buckets
Amazon S3 buckets can be created by following the instructions in the Amazon S3 documentation. Two buckets are needed in this example: one to store streaming data (e.g., ptctstoutput), and another to store JSON files (e.g., awspilbucket) after transformation.
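For readers who prefer scripting to the console, the bucket creation step might look like the sketch below. It assumes the boto3 library and configured AWS credentials, neither of which this guide requires. The helper names `bucket_request` and `create_buckets` are hypothetical, and because S3 bucket names are globally unique, your names will differ from the ones used here.

```python
def bucket_request(name, region="us-east-1"):
    """Build the keyword arguments for the S3 CreateBucket call."""
    params = {"Bucket": name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def create_buckets(names, region="us-east-1"):
    """Create each bucket (assumes boto3 is installed and credentials exist)."""
    import boto3  # imported lazily so bucket_request works without boto3

    s3 = boto3.client("s3", region_name=region)
    for name in names:
        s3.create_bucket(**bucket_request(name, region))


if __name__ == "__main__":
    # Example bucket names from this guide; the region is an assumption.
    for bucket in ("ptctstoutput", "awspilbucket"):
        print(bucket_request(bucket, region="us-west-2"))
```

Calling `create_buckets(["ptctstoutput", "awspilbucket"], region="us-west-2")` would then issue the two requests.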

Create IAM role
AWS services require you to use roles to allow a service to access resources in other services on your behalf. In this example, three roles are needed: a role for Kinesis Firehose, a role for Glue, and a role for Lambda.
The Kinesis Firehose role is created on the fly. The roles for the Glue and Lambda services must be created separately.
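As a rough illustration, the Glue and Lambda roles could also be created with boto3 rather than the console. This is a sketch under assumptions: the role names are hypothetical, and the two AWS managed policies shown cover only basic Glue and Lambda execution. Access to the two S3 buckets, and permission for Lambda to call Glue, would still need to be granted separately.

```python
import json

# AWS managed policies for basic service execution.
GLUE_POLICY = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
LAMBDA_POLICY = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"


def trust_policy(service):
    """Return a trust policy (JSON string) letting `service` assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }],
    })


def create_service_role(role_name, service, policy_arns, iam=None):
    """Create a role for `service`, attach the policies, and return its ARN."""
    if iam is None:
        import boto3  # assumption: boto3 installed, credentials configured
        iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=trust_policy(service),
    )
    for arn in policy_arns:
        iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return role["Role"]["Arn"]


# Hypothetical role names, shown for illustration only:
# create_service_role("GlueStreamingRole", "glue.amazonaws.com", [GLUE_POLICY])
# create_service_role("LambdaGlueRole", "lambda.amazonaws.com", [LAMBDA_POLICY])
```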

Create Firehose Delivery Stream
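A Firehose delivery stream that writes to the streaming-data bucket could be defined programmatically along these lines. This is a minimal sketch, not the exact configuration used in this guide: the stream name and role ARN are hypothetical, the buffering values are the service defaults, and unlike the console, the API expects an existing role ARN.

```python
def delivery_stream_request(stream_name, role_arn, bucket, prefix=""):
    """Build arguments for a DirectPut delivery stream that lands data in S3."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # producers call PutRecord directly
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket}",
            "Prefix": prefix,
            # Flush to S3 after 5 MB or 300 seconds, whichever comes first.
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        },
    }


def create_delivery_stream(request, firehose=None):
    """Create the stream and return its ARN."""
    if firehose is None:
        import boto3  # assumption: boto3 installed, credentials configured
        firehose = boto3.client("firehose")
    return firehose.create_delivery_stream(**request)["DeliveryStreamARN"]


if __name__ == "__main__":
    # Hypothetical stream name and role ARN; the bucket name is from this guide.
    print(delivery_stream_request(
        "vantage-demo-stream",
        "arn:aws:iam::123456789012:role/FirehoseDeliveryRole",
        "ptctstoutput",
    ))
```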


Create Glue ETL Transformation Job
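The Glue crawler and ETL job could be registered through boto3 roughly as sketched below. Every name here is an assumption for illustration: the crawler, database, and job names, the script location, the `--output_path` job argument, and the Glue version are not taken from this guide, and the ETL script itself is not shown.

```python
def crawler_request(name, role_arn, database, source_bucket):
    """Arguments for glue.create_crawler: catalog the raw streaming data."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{source_bucket}/"}]},
    }


def job_request(name, role_arn, script_location, output_bucket, glue_version="3.0"):
    """Arguments for glue.create_job: a Spark ETL job that writes JSON output."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Custom argument the (hypothetical) ETL script would read.
        "DefaultArguments": {"--output_path": f"s3://{output_bucket}/"},
        "GlueVersion": glue_version,  # assumption: adjust to your account
    }


def register(crawler, job, glue=None):
    """Create the crawler and the job in AWS Glue."""
    if glue is None:
        import boto3  # assumption: boto3 installed, credentials configured
        glue = boto3.client("glue")
    glue.create_crawler(**crawler)
    glue.create_job(**job)
```

With the bucket names from this guide, the crawler would target `s3://ptctstoutput/` and the job would write to `s3://awspilbucket/`.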

Accessing Streaming Data Using NOS
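As a sketch of what the NOS side can look like, the snippet below builds an authorization object and a foreign table definition over the JSON bucket, then runs them through the `teradatasql` Python driver. The object names are hypothetical, the credentials are placeholders, and the exact clauses accepted (for example the `EXTERNAL SECURITY` form) vary by Vantage release, so check the SQL against the documentation for your version.

```python
def nos_location(bucket, prefix=""):
    """Build a NOS LOCATION string for an Amazon S3 bucket."""
    return f"/s3/{bucket}.s3.amazonaws.com/{prefix}"


def authorization_sql(auth_name, access_key_id, secret_access_key):
    """Authorization object holding the AccessKeyId and SecretAccessKey."""
    return (
        f"CREATE AUTHORIZATION {auth_name} AS DEFINER TRUSTED "
        f"USER '{access_key_id}' PASSWORD '{secret_access_key}'"
    )


def foreign_table_sql(table, auth_name, location):
    """Foreign table definition that points at the external location."""
    return (
        f"CREATE FOREIGN TABLE {table}, "
        f"EXTERNAL SECURITY DEFINER TRUSTED {auth_name} "
        f"USING (LOCATION('{location}'))"
    )


def run_statements(statements, host, user, password):
    """Execute statements in order (assumes the teradatasql driver is installed)."""
    import teradatasql

    with teradatasql.connect(host=host, user=user, password=password) as con:
        with con.cursor() as cur:
            for sql in statements:
                cur.execute(sql)


if __name__ == "__main__":
    # Hypothetical object names; the values are interpolated without escaping,
    # so only use trusted input here.
    location = nos_location("awspilbucket")
    print(authorization_sql("aws_auth", "<AccessKeyId>", "<SecretAccessKey>"))
    print(foreign_table_sql("streaming_json", "aws_auth", location))
    print("SELECT TOP 10 * FROM streaming_json")
```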

Create Lambda functions, Trigger, and CloudWatch event
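One common way to wire this automation, offered here as an assumption rather than as the exact setup used in this guide, is an S3-triggered Lambda function that starts the Glue crawler, plus a CloudWatch event rule that fires when the crawler succeeds and invokes a second Lambda function to start the ETL job. The crawler and job names below are hypothetical.

```python
CRAWLER_NAME = "streaming-data-crawler"  # hypothetical name
JOB_NAME = "streaming-to-json"           # hypothetical name


def crawler_succeeded_pattern(crawler_name):
    """Event pattern for a rule matching a successful run of the crawler."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Crawler State Change"],
        "detail": {"crawlerName": [crawler_name], "state": ["Succeeded"]},
    }


def _glue_client(client=None):
    """Return the given client, or create one (boto3 ships with Lambda)."""
    if client is not None:
        return client
    import boto3
    return boto3.client("glue")


def start_crawler_handler(event, context, glue=None):
    """Lambda handler for the S3 trigger: crawl newly delivered objects."""
    _glue_client(glue).start_crawler(Name=CRAWLER_NAME)
    return {"started": CRAWLER_NAME}


def start_job_handler(event, context, glue=None):
    """Lambda handler for the event rule: run the ETL job."""
    run = _glue_client(glue).start_job_run(JobName=JOB_NAME)
    return {"jobRunId": run["JobRunId"]}
```

The optional `glue` parameter exists only so the handlers can be exercised locally with a stand-in client; Lambda itself calls them with `event` and `context`.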

Run
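To exercise the pipeline end to end, test records can be pushed into the delivery stream. The sketch below assumes the source data is newline-delimited text (for example CSV rows), which this guide does not specify, and uses a hypothetical stream name. It sends records in batches because a single PutRecordBatch call accepts at most 500 records.

```python
MAX_BATCH = 500  # PutRecordBatch limit on records per call


def to_record(line):
    """Wrap one text line as a Firehose record, newline-terminated."""
    return {"Data": (line.rstrip("\n") + "\n").encode("utf-8")}


def batches(records, size=MAX_BATCH):
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_lines(lines, stream_name, firehose=None):
    """Send all lines to the stream; return how many records were rejected."""
    if firehose is None:
        import boto3  # assumption: boto3 installed, credentials configured
        firehose = boto3.client("firehose")
    failed = 0
    records = [to_record(line) for line in lines]
    for batch in batches(records):
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed


# Hypothetical usage:
# send_lines(["1,widget,9.99", "2,gadget,4.50"], "vantage-demo-stream")
```

A nonzero return value means some records were rejected and should be retried. Once Firehose flushes its buffer, the objects appear in the streaming-data bucket and the rest of the pipeline can pick them up.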


(Author):
Wenjie Tehan

Wenjie is a Technical Consulting Manager, currently working with the Teradata Global Alliances team. 
 
With over 20 years in the IT industry, Wenjie has worked as a developer, tester, business analyst, solution designer, and project manager. This breadth of roles suits her current position, which requires understanding how the business needs data and how that data can be managed to meet those needs.
 
Wenjie has a BS in computer science from the University of California at San Diego and an ME in computer engineering from Cornell University. Wenjie is also certified on both Teradata and AWS.
