© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Source: d1.awsstatic.com

Aug 28, 2018

Page 1

Page 2

Page 3

Page 4

Page 5

Page 6

❖ DWH/DB administrators

Amazon S3 / AWS Glue / Amazon Redshift

Page 7

Page 8

Page 9

Page 10

Figure: data sources, including web and mobile data, logs, social media data, streaming data, IoT data, and spreadsheets, ranging from structured data to unstructured and semi-structured data.

Page 11

Figure: data volume from 1990 to 2020, comparing data generated with data available for analysis.

Page 12

Figure: analysts, business users, data scientists, and applications, with the requirements agile, real time, flexible, and scale.

Page 13

Figure: OLTP, ERP, CRM, and LOB systems feed a data warehouse, which serves business intelligence.

Page 14

Figure: the data warehouse flow (OLTP, ERP, CRM, LOB → Data Warehouse → Business Intelligence) alongside a data lake fed by devices, web, sensors, and social sources. The data lake has a data catalog and supports machine learning, DW queries, big data processing, and interactive and real-time analytics.

Page 15

Figure: services on AWS: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Amazon S3, AWS Glue, Redshift, EMR, Athena, Kinesis, Elasticsearch Service.

Page 16

Page 17

Page 18

Page 19

→ AWS Glue

Page 20

AWS Glue

Page 21

AWS Glue

❏ Data Catalog: compatible with the Apache Hive metastore; integrated with AWS services
❏ Job Authoring: ETL code in PySpark or Scala
❏ Job Execution

Page 22

AWS Glue Data Catalog: a metadata repository for data in Amazon RDS, Amazon Redshift, Amazon S3, and other sources

❏ Usable from Amazon Athena and Amazon Redshift Spectrum
❏ Compatible with the Apache Hive metastore, so Amazon EMR can use it

Page 23

Figure: data flow on Amazon S3 with AWS Glue. The sources are on-premises data, web app data, Amazon RDS, other databases, and streaming data. The components are your data, AWS Glue ETL, and Amazon QuickSight.

Page 24

Glue Data Catalog

❏ Hive DDL
❏ Glue API

Page 25

Glue Data Catalog

Page 26

Crawler: crawls data sources and registers metadata in the Data Catalog

❏ Infers the data format and schema and registers them in the Data Catalog
❖ Built-in classifiers
❖ Custom classifiers written with Grok
❏ Detects partitions
❖ Hive-style partitions in Amazon S3

Page 27

Glue Data Catalog

Figure: a Glue Data Catalog table schema with nested fields.

Page 28

Classifiers

❏ When a Crawler runs, classifiers read the data to recognize its format and generate a schema, which the Crawler registers in the Data Catalog
❏ Each classifier returns a certainty between 0.0 and 1.0, and the Crawler uses the result with the highest certainty
❏ Custom classifiers added to a Crawler are invoked first, in the order listed
❏ Glue also provides built-in classifiers
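The certainty-based selection above can be sketched in Python. This is a simplified model and not Glue's implementation. Each hypothetical classifier is a function that returns a (classification, certainty) pair. Custom classifiers run first, in order, and a certainty of 1.0 is accepted immediately. Otherwise the highest certainty wins. If nothing scores above 0.0, the result is UNKNOWN, which is the default the AWS Glue documentation describes.

```python
from typing import Callable, List, Tuple

# A classifier inspects a data sample and returns (classification, certainty in [0.0, 1.0]).
Classifier = Callable[[str], Tuple[str, float]]

def classify(sample: str, custom: List[Classifier], built_in: List[Classifier]) -> str:
    """Pick a classification by certainty (simplified model of the Crawler's choice)."""
    best_name, best_certainty = "UNKNOWN", 0.0
    # Custom classifiers run first, in the order given, then the built-in ones.
    for classifier in custom + built_in:
        name, certainty = classifier(sample)
        if certainty >= 1.0:
            return name  # fully certain: use this result immediately
        if certainty > best_certainty:
            best_name, best_certainty = name, certainty
    return best_name  # highest certainty seen, or UNKNOWN if none scored above 0.0

# Hypothetical classifiers, for illustration only.
def json_classifier(sample: str) -> Tuple[str, float]:
    return ("json", 1.0) if sample.lstrip().startswith("{") else ("json", 0.0)

def csv_classifier(sample: str) -> Tuple[str, float]:
    return ("csv", 0.6) if "," in sample else ("csv", 0.0)

def app_log_classifier(sample: str) -> Tuple[str, float]:
    return ("app_log", 0.8) if sample.startswith("20") else ("app_log", 0.0)

print(classify("2018-08-28 INFO a,b", [app_log_classifier], [json_classifier, csv_classifier]))
```

The custom log classifier wins over the CSV classifier here because its certainty is higher, even though both recognize the line.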

Page 29

Figure: a Glue Crawler uses an IAM role. It reaches data lakes (Amazon S3) through an object connection, and it reaches data warehouses (Amazon Redshift) and databases (Amazon RDS) through JDBC connections.

Built-In Classifiers:
❖ Databases: MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server, Amazon Aurora, Amazon Redshift
❖ Formats: Avro, Parquet, ORC, XML, JSON & BSON
❖ Logs: Apache (Grok), Linux (Grok), MS (Grok), Ruby, Redis, and many others
❖ Delimited: comma, pipe, tab, semicolon
❖ Compressions: ZIP, BZIP, GZIP, LZ4, Snappy

Custom classifiers can also be defined.
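A delimited-file classifier has to decide which of the listed delimiters (comma, pipe, tab, semicolon) a file uses. The sketch below shows that detection step using Python's standard `csv.Sniffer`. It is an analogy only, not the logic inside the Glue built-in classifier, and the sample rows are hypothetical.

```python
import csv

# The delimiters listed for the built-in delimited classifier on this slide.
CANDIDATE_DELIMITERS = ",|\t;"

def detect_delimiter(sample: str) -> str:
    """Guess which candidate delimiter a text sample uses (analogy, not Glue's code)."""
    dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATE_DELIMITERS)
    return dialect.delimiter

rows = [["id", "name", "city"], ["1", "alice", "tokyo"], ["2", "bob", "osaka"]]
for d in CANDIDATE_DELIMITERS:
    sample = "\n".join(d.join(r) for r in rows) + "\n"
    print(repr(d), "->", repr(detect_delimiter(sample)))
```

Restricting the candidate set mirrors the classifier's fixed list. Without it, the sniffer may choose any character that appears the same number of times on every line.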

Page 30

Custom classifiers

❏ A custom classifier can be defined with a Grok pattern that describes the format of text data
❏ A Grok pattern is built from named regular expressions, each referenced as %{PATTERN:field}
❏ Example:

%{TIMESTAMP_ISO8601:timestamp} \[%{MESSAGEPREFIX:message_prefix}\] %{CRAWLERLOGLEVEL:loglevel} :%{GREEDYDATA:message}
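A Grok pattern is a regular expression assembled from named sub-patterns. The sketch below expands the example pattern into a Python regular expression with named groups and parses one log line. The sub-pattern definitions are simplified assumptions:

- TIMESTAMP_ISO8601 and GREEDYDATA approximate the standard Grok patterns.
- MESSAGEPREFIX and CRAWLERLOGLEVEL are custom patterns that the example needs but the slide does not define. The definitions below are assumed.

The log line is hypothetical.

```python
import re

# Simplified sub-pattern definitions (assumptions, not the official Grok library).
PATTERNS = {
    "TIMESTAMP_ISO8601": r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?",
    "MESSAGEPREFIX": r".*-.*-.*-.*-.*",
    "CRAWLERLOGLEVEL": r"(?:BENCHMARK|ERROR|WARN|INFO|TRACE)",
    "GREEDYDATA": r".*",
}

GROK_REF = re.compile(r"%\{(\w+):(\w+)\}")

def grok_to_regex(grok: str) -> re.Pattern:
    """Replace each %{PATTERN:field} with a named group (?P<field>...)."""
    def expand(m: re.Match) -> str:
        return f"(?P<{m.group(2)}>{PATTERNS[m.group(1)]})"
    return re.compile(GROK_REF.sub(expand, grok))

# The example pattern from the slide.
EXAMPLE = (r"%{TIMESTAMP_ISO8601:timestamp} \[%{MESSAGEPREFIX:message_prefix}\] "
           r"%{CRAWLERLOGLEVEL:loglevel} :%{GREEDYDATA:message}")

line = "2018-08-28T10:15:00Z [a1b2-c3d4-e5f6-g7h8-i9j0] INFO :Crawler finished"
match = grok_to_regex(EXAMPLE).match(line)
print(match.groupdict())
```

Each field name in the pattern becomes a column in the schema that the classifier produces.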

Page 31

Custom classifiers

1. Create a custom classifier.
2. Add the custom classifier to a Crawler.

Page 32

Figure (Crawlers): the Crawler enumerates S3 objects (file 1 … file N). It identifies each file's type and parses the file, using custom classifiers (Grok-based parser) or built-in classifiers (JSON, CSV, and Parquet parsers). Each file yields a semi-structured per-file schema of struct, array, char, int, and bool fields. These schemas are then combined into a semi-structured unified schema.

Page 33

Schema A: root { name: str, id: num, addr { street: str, city: str, zip: num } }

Schema B: root { name: str, id: num, addr: str }

Schema similarity heuristic
§ Matching field name: +1 point
§ Matching type: +1 point
§ If sim > 0.7, the schemas are merged

sim = intersection / min(A, B) = 7 / 8 = 0.875 → merged
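The sketch below shows one reading of the heuristic that reproduces the slide's score of 7/8 = 0.875. For each pair of fields the two schemas share, it adds +1 for the name and +1 if the types match, and it recurses into nested structs. It then divides by the maximum score the smaller schema could reach, which is 2 points per node, counting the root. This is an interpretation of the slide, not the Crawler's documented algorithm. Schemas are modelled as nested dicts, and a nested dict has the type "struct".

```python
from typing import Dict, Union

# A schema node is either a leaf type name ("str", "num", ...) or a struct: field name -> node.
Schema = Union[str, Dict[str, "Schema"]]

def type_of(node: Schema) -> str:
    return "struct" if isinstance(node, dict) else node

def node_count(node: Schema) -> int:
    """Count the nodes in a schema tree, including the node itself."""
    if isinstance(node, dict):
        return 1 + sum(node_count(child) for child in node.values())
    return 1

def match_points(a: Schema, b: Schema) -> int:
    """Score two nodes already paired by name: +1 for the name, +1 if the types agree.
    When both are structs, recurse into the fields present in both."""
    points = 1
    if type_of(a) == type_of(b):
        points += 1
        if isinstance(a, dict) and isinstance(b, dict):
            for name, child in b.items():
                if name in a:
                    points += match_points(a[name], child)
    return points

def similarity(a: Schema, b: Schema) -> float:
    """intersection / min(A, B), with min(A, B) = 2 points per node of the smaller schema."""
    return match_points(a, b) / (2 * min(node_count(a), node_count(b)))

schema_a = {"name": "str", "id": "num", "addr": {"street": "str", "city": "str", "zip": "num"}}
schema_b = {"name": "str", "id": "num", "addr": "str"}
sim = similarity(schema_a, schema_b)
print(sim, "merge" if sim > 0.7 else "keep separate")
```

Schema B has 4 nodes (root, name, id, addr), so the maximum is 8 points. Four names match, but only root, name, and id also match on type, which gives 7 points.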

Page 34

Page 35

Page 36

Page 37

Import/Export

Figure: AWS Glue ETL jobs import metadata from an external Apache Hive Metastore into the AWS Glue Data Catalog. They also export it from the Data Catalog to an external Apache Hive Metastore.

Page 38

Page 39

Amazon Athena

Page 40

Page 41

Amazon Athena

❖ Query Instantly: query data in Amazon S3 directly
❖ Pay per query: compressing data or converting it to a columnar format can cut per-query cost by 30 to 90%
❖ Open: ANSI SQL, JDBC/ODBC drivers, and joins
❖ Easy: works with Amazon QuickSight (BI)

Page 42

Amazon EMR

❖ Latest versions
❖ Low cost: EC2 pricing options can reduce costs by 50 to 80%
❖ Use S3 storage: work on data in Amazon S3 through EMRFS
❖ Easy: Apache Hadoop & Apache Spark clusters
❏ Apache Spark, Apache Hive, and Presto can use the AWS Glue Data Catalog

Page 43

Amazon Redshift Spectrum

Figure: the Amazon Redshift Spectrum query engine spans the S3 data lake and Amazon Redshift data.

❏ Query data in S3 directly with Amazon Redshift SQL
❏ Supports open file formats such as Parquet, ORC, Grok, Avro, and CSV

Page 44

→ Amazon Redshift Spectrum

Page 45

Amazon Redshift – Data Warehousing

❖ Fast at any scale: SSD storage
❖ Inexpensive: 1,000 USD per TB per year, about 1/10 the cost of a traditional data warehouse
❖ Open file formats: query data in Amazon S3
❖ Secure

Page 46

Amazon Redshift Spectrum (repeated from page 43)

Page 47

Amazon Redshift Spectrum architecture

Figure: SQL clients and BI tools connect over JDBC/ODBC to the leader node, which coordinates compute nodes 1 … N. The compute nodes handle load, unload, backup, and restore, and they reach Amazon S3 through Amazon Redshift Spectrum.

❏ Leader Node
❖ SQL endpoint
❖ Stores metadata
❖ Coordinates parallel query processing
❏ Compute Node
❖ Local columnar storage
❖ Executes queries in parallel
❖ load / unload / backup / restore
❏ Amazon Redshift Spectrum Node
❖ Executes queries directly against Amazon S3

Page 48

Amazon Redshift Spectrum query flow

Figure: Amazon Redshift (JDBC/ODBC, compute nodes 1 … N), Amazon S3 (exabyte-scale object storage), and the Data Catalog (Apache Hive Metastore).

1. The query is submitted: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
2. ❖ The query is optimized and compiled on the leader node
   ❖ The leader node determines what runs locally and what goes to Amazon Redshift Spectrum
3. The query plan is sent to all compute nodes
4. Compute nodes obtain partition information from the Data Catalog (dynamically prune partitions)
5. Each compute node issues multiple requests to the Amazon Redshift Spectrum layer
6. Amazon Redshift Spectrum nodes scan the data in S3
7. Amazon Redshift Spectrum filters and aggregates the data
8. Final aggregations, and joins with local Amazon Redshift tables, are done in the cluster
9. The result is returned to the client
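Steps 4 to 8 divide the work into stages:

- Partitions are pruned using catalog information.
- The Spectrum layer scans S3 files and returns partial aggregates.
- The cluster combines those partial results.

The sketch below is a toy in-memory model of that split for a COUNT(*) … GROUP BY query. The partition layout and column name are hypothetical, and no Redshift or S3 APIs are involved.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical external table: partition value -> list of S3 "files", each a list of rows.
Table = Dict[str, List[List[dict]]]

ext_table: Table = {
    "2018-07": [[{"city": "tokyo"}, {"city": "osaka"}], [{"city": "tokyo"}]],
    "2018-08": [[{"city": "tokyo"}, {"city": "nagoya"}], [{"city": "osaka"}, {"city": "osaka"}]],
}

def prune_partitions(table: Table, wanted: set) -> List[str]:
    """Step 4: keep only the partitions the query's filter can match."""
    return [p for p in table if p in wanted]

def spectrum_scan(file_rows: List[dict], key: str) -> Counter:
    """Steps 6-7: one Spectrum request scans a file and returns partial counts per group."""
    return Counter(row[key] for row in file_rows)

def run_query(table: Table, wanted: set, key: str) -> Dict[str, int]:
    # Step 5: many requests, one per file in the surviving partitions.
    partials = [spectrum_scan(f, key)
                for p in prune_partitions(table, wanted)
                for f in table[p]]
    final = Counter()
    for partial in partials:  # step 8: final aggregation in the cluster
        final.update(partial)
    return dict(final)

print(run_query(ext_table, {"2018-08"}, "city"))
```

Only small per-group counts travel back to the cluster, not the scanned rows. This is the point of doing the partial aggregation in the Spectrum layer.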

Page 49

External schema (Schema on Read)

Create an external schema in Amazon Redshift that references a Data Catalog database:

CREATE EXTERNAL SCHEMA archived_trips
FROM DATA CATALOG DATABASE 'sampledb'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
REGION 'us-east-2';

List external schemas:

SELECT * FROM svv_external_schemas;

List external tables:

SELECT * FROM svv_external_tables;

Page 50

Permissions

❏ Amazon Redshift needs permission to access the AWS Glue Data Catalog and the data in Amazon S3
❏ These permissions are granted through an AWS Identity and Access Management (IAM) role
❏ Associate the role with the Amazon Redshift cluster and specify its ARN (Amazon Resource Name) when creating the external schema

Page 51

Figure: repeated from page 23 (on Amazon S3 with AWS Glue).

Page 52

Page 53

Amazon Redshift Spectrum best practices - 1 / 5

Page 54

Best practices - 2 / 5

Columnar formats – Apache Parquet

❏ Apache Parquet stores data by column, so a query reads only the columns it needs
❏ Use the SVL_S3QUERY_SUMMARY view to compare how much data is scanned from S3 for a Parquet table and for the equivalent CSV table
❏ The s3_scanned_rows and s3query_returned_rows columns show how many rows were scanned in S3 and how many were returned from Redshift Spectrum to Redshift
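A columnar format reduces the data scanned because a query that needs one column reads only that column's values. A row-oriented file such as CSV must be read in full. The toy byte count below rests on stated assumptions: an in-memory table with hypothetical columns, UTF-8 text, and no compression or Parquet metadata. The numbers are therefore illustrative only.

```python
from typing import Dict, List

# Hypothetical table with a small column we query and a wide column we do not.
rows: List[Dict[str, str]] = [
    {"trip_id": str(i), "city": "tokyo" if i % 2 else "osaka", "note": "x" * 50}
    for i in range(1000)
]

def csv_bytes_scanned(rows: List[Dict[str, str]]) -> int:
    """Row format: every field of every row is read, whatever columns the query needs."""
    return sum(len(",".join(r.values()).encode()) + 1 for r in rows)  # +1 for the newline

def columnar_bytes_scanned(rows: List[Dict[str, str]], columns: List[str]) -> int:
    """Columnar format: only the requested columns' values are read."""
    return sum(len(r[c].encode()) for r in rows for c in columns)

full = csv_bytes_scanned(rows)
city_only = columnar_bytes_scanned(rows, ["city"])
print(full, city_only, f"{city_only / full:.1%}")
```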

Page 55

Best practices - 3 / 5

Partitioning – partitioned Parquet files

❏ Partition the data by columns that are frequently used in SQL filter conditions
❏ Check how many partitions a query actually used, out of the total:

SELECT query, segment,
       MAX(assigned_partitions) AS total_partitions,
       MAX(qualified_partitions) AS qualified_partitions
FROM svl_s3partition
WHERE query = <Query-ID>
GROUP BY 1, 2;
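Partition pruning relies on Hive-style key=value segments in S3 object keys. The sketch below parses hypothetical object keys into partition values and keeps only the partitions that match an equality filter. It reports total and qualified partitions, the same two numbers the svl_s3partition query above returns. In practice, Redshift Spectrum does the pruning using the Data Catalog; this model only illustrates the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical S3 keys for a table partitioned Hive-style by year and month.
KEYS = [
    "trips/year=2018/month=07/part-0000.parquet",
    "trips/year=2018/month=07/part-0001.parquet",
    "trips/year=2018/month=08/part-0000.parquet",
    "trips/year=2017/month=08/part-0000.parquet",
]

def partition_of(key: str) -> Tuple[Tuple[str, str], ...]:
    """Extract the key=value path segments, e.g. (("year", "2018"), ("month", "07"))."""
    return tuple(tuple(seg.split("=", 1)) for seg in key.split("/") if "=" in seg)

def prune(keys: List[str], filters: Dict[str, str]) -> Tuple[int, int, List[str]]:
    """Return (total partitions, qualified partitions, keys to scan) for equality filters."""
    partitions = {partition_of(k) for k in keys}
    qualified = {p for p in partitions
                 if all(dict(p).get(col) == val for col, val in filters.items())}
    to_scan = [k for k in keys if partition_of(k) in qualified]
    return len(partitions), len(qualified), to_scan

total, qualified, to_scan = prune(KEYS, {"year": "2018", "month": "08"})
print(total, qualified, to_scan)
```

A large gap between total and qualified partitions means the filter columns match the partition layout well.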

Page 56

Best practices - 4 / 5

Page 57

Best practices - 5 / 5

Use predicate pushdown

❏ Amazon Redshift Spectrum can push the following SQL operations down to the Redshift Spectrum layer:
❖ GROUP BY clauses
❖ Comparison and pattern-matching conditions such as LIKE
❖ Aggregate functions such as COUNT, SUM, AVG, MIN, and MAX
❖ String functions such as regex_replace

Page 58

Best practices - 5 / 5 (Cont.)

Use predicate pushdown

❏ SQL such as DISTINCT and ORDER BY cannot be pushed down to Amazon Redshift Spectrum; it is processed in Amazon Redshift
❖ For example, rewrite DISTINCT as GROUP BY
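The rewrite is safe because SELECT DISTINCT over a set of columns returns the same rows as GROUP BY over those columns. Python's built-in sqlite3 module can confirm that the two forms agree on a hypothetical table. Here it is used only as a convenient local SQL engine, and it says nothing about how Redshift plans either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, vendor TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("tokyo", "a"), ("tokyo", "a"), ("osaka", "b"), ("tokyo", "b"), ("osaka", "b")],
)

# Form that, per the slide, Redshift Spectrum cannot push down.
distinct_rows = sorted(conn.execute("SELECT DISTINCT city, vendor FROM trips").fetchall())

# Equivalent GROUP BY form, which the slide lists as pushable to the Spectrum layer.
group_by_rows = sorted(
    conn.execute("SELECT city, vendor FROM trips GROUP BY city, vendor").fetchall())

# Sorting happens in Python, so the comparison does not depend on row order.
print(distinct_rows == group_by_rows, distinct_rows)
```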

Page 59

10 best practices for Amazon Redshift Spectrum: https://aws.amazon.com/jp/blogs/news/10-best-practices-for-amazon-redshift-spectrum/

Page 60

Page 61

Summary

1. → AWS Glue
2. → Amazon Redshift Spectrum

Page 62

Whitepaper: https://d1.awsstatic.com/whitepapers/Storage/data-lake-on-aws.pdf

Figure: Snowball, Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Amazon S3, AWS Glue, Redshift, EMR, Athena, Kinesis, Elasticsearch Service.

Page 63

References

❏ AWS Glue
❖ https://aws.amazon.com/jp/glue/
❏ AWS Glue details
❖ https://aws.amazon.com/jp/glue/details/
❏ AWS Glue developer resources
❖ https://aws.amazon.com/jp/glue/developer-resources/
❏ Amazon Redshift
❖ https://aws.amazon.com/jp/redshift/
❏ Amazon Redshift developer resources
❖ https://aws.amazon.com/jp/redshift/developer-resources/

Page 64

Page 65

Page 66

Page 67

Page 68

Page 69

Page 70

Page 71

Page 72