© 2019 Snowflake Inc. All Rights Reserved SNOWFLAKE BEST PRACTICES LOUIS LEE SALES ENGINEER CLIVE ASTBURY REGIONAL SALES ENGINEERING MANAGER

SNOWFLAKE BEST PRACTICES · 2019-12-01 · SUSPEND/RESUME Auto Suspend/Resume • On-demand, end-user workloads • Suspend idle time setting should take into account data caching

May 24, 2020

SNOWFLAKE BEST PRACTICES

LOUIS LEE, SALES ENGINEER

CLIVE ASTBURY, REGIONAL SALES ENGINEERING MANAGER

AGENDA

Virtual Warehouse Management

Cost Management

Network Security Policies

User Authentication

Role Management

Snowflake Community

VIRTUAL WAREHOUSE MANAGEMENT

Considerations
• Key SLAs and challenges with meeting SLAs
• Data load and transformation workloads
• Reporting, ad hoc analysis, and data science workloads
• Cost management

Topics
• Sizes and approach to right-sizing
• Scaling up vs. scaling out
• Automating suspend/resume, sizing, and multi-cluster scale-out

WAREHOUSE SIZES

Size      Servers / Cluster   Credits / Hour   Notes
X-Small   1                   1                Default size when created using CREATE WAREHOUSE.
Small     2                   2
Medium    4                   4
Large     8                   8
X-Large   16                  16               Default size for warehouses created in the web UI.
2X-Large  32                  32
3X-Large  64                  64
4X-Large  128                 128

SCALE UP - LOADING 1BN RECORDS

Doubling the number of servers halves the run-time...

… but you pay per-server, per-second of compute...

… so you can get your answer 8x faster for the same cost.
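
The arithmetic behind these three statements can be sketched in Python. This is only an illustration: it assumes perfectly linear scaling (as the slide states) and a made-up 8-hour baseline on an X-Small warehouse. The credit rates come from the warehouse sizes table.

```python
# Credits per hour by warehouse size, taken from the sizes table.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8}

# Assumption for illustration only: the load takes 8 hours on X-Small.
BASELINE_HOURS = 8.0


def run_time_hours(size: str) -> float:
    """Run-time under the slide's assumption that doubling servers halves it."""
    return BASELINE_HOURS / CREDITS_PER_HOUR[size]


def cost_credits(size: str) -> float:
    """Credits consumed: hourly rate multiplied by hours of compute used."""
    return CREDITS_PER_HOUR[size] * run_time_hours(size)


for size in CREDITS_PER_HOUR:
    print(f"{size:8} {run_time_hours(size):5.2f} h  {cost_credits(size):4.1f} credits")
```

Under these assumptions every size consumes the same 8 credits, while the Large warehouse finishes in 1 hour instead of 8. Real workloads rarely scale perfectly linearly, so treat this as the idealized case.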

SCALE OUT - MULTI-CLUSTER WAREHOUSES

Chart annotations:
• 4x increase in servers
• 4x increase in servers (at peak load)
• Both are 16 servers, in different configurations
• Multi-cluster is also half the cost of the X-Large single cluster
• Multi-cluster gives better results

ALL TOGETHER - SCALE, ELASTICITY, COST

[Figure: three examples labeled S, M, and MM, plotted against time]

All three examples contain the same amount of work.

Using scale up and scale out, total run-time is significantly reduced.

You pay per-server, per-second so they all cost the same.

AUTOMATING SUSPEND/RESUME

Auto Suspend/Resume
• On-demand, end-user workloads
• The idle-time setting for auto-suspend should take data caching into account (a suspended warehouse loses its local cache)

Programmatic Suspend/Resume
• Scheduled jobs where process orchestration is controlled
• Programmatically resume at the start of processing and suspend at the end of processing to avoid idle time costs
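
The programmatic pattern can be sketched as a small Python context manager. This is a minimal sketch, not a definitive implementation: `execute` is a hypothetical callable that sends one SQL statement to Snowflake, and the warehouse name `LOAD_WH` is made up. The two `ALTER WAREHOUSE` statements are standard Snowflake SQL.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def running_warehouse(execute: Callable[[str], None], name: str) -> Iterator[None]:
    """Resume a warehouse for the duration of a job, then suspend it.

    `execute` is a hypothetical callable that sends one SQL statement to
    Snowflake (for example, a thin wrapper around a database cursor).
    """
    execute(f"ALTER WAREHOUSE {name} RESUME IF SUSPENDED")
    try:
        yield
    finally:
        # Suspend even if the job fails, so the warehouse does not sit idle.
        execute(f"ALTER WAREHOUSE {name} SUSPEND")


# Stand-in executor that records statements instead of running them.
sent = []

with running_warehouse(sent.append, "LOAD_WH"):
    sent.append("-- scheduled job statements run here")

print("\n".join(sent))
```

Wrapping the job this way keeps the resume and suspend calls next to each other in the orchestration code, which makes it harder to forget the suspend step on an error path.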

COST MANAGEMENT

Considerations
• Compute Costs
• Storage Costs
• Service Costs
• Data Transfer (Egress) Costs
• Monitoring & Alerting

Topics
● Resources Incurring Costs
● Compute
  ○ Viewing Usage
  ○ Resource Monitors
● Storage
  ○ Time Travel & Fail-Safe
  ○ Viewing Usage
● Services
  ○ Non-warehouse compute

RESOURCES INCURRING COSTS

[Diagram: account resources and the type of cost each incurs]

Resources shown: Account, Virtual Warehouses, Databases, Schemas, Tables (Permanent, Temp/Transient), Materialized Views, Stages (Internal), Pipes, Automatic Clustering Service, Cross-Region, Extract Egress

Legend: Compute Costs, Storage Costs, Service Costs, Pass-through Costs

RESOURCE MONITOR
• Align with team-by-team warehouse separation for granular cost governance
• Set at account level if specific virtual warehouse quotas are not needed
• Leverage tiered triggers with escalating actions (e.g., Notify > Notify > Suspend)
• Enable notifications using ACCOUNTADMIN role and set e-mail address
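
The tiered-trigger idea can be modeled in a few lines of Python. This is a simplified sketch of the escalation logic, not how Snowflake implements resource monitors. The 50/75/100 percent thresholds and the 200-credit quota are assumptions chosen for illustration; only the Notify > Notify > Suspend pattern comes from the slide.

```python
# Hypothetical tiers: percent of the credit quota -> action.
# The slide's pattern is Notify > Notify > Suspend; the thresholds are assumed.
TRIGGERS = [(50, "NOTIFY"), (75, "NOTIFY"), (100, "SUSPEND")]


def fired_actions(credits_used: float, credit_quota: float) -> list:
    """Return the (threshold, action) pairs reached at the given usage."""
    percent_used = 100.0 * credits_used / credit_quota
    return [(pct, action) for pct, action in TRIGGERS if percent_used >= pct]


# Example with an assumed monthly quota of 200 credits.
for used in (90, 160, 200):
    print(used, fired_actions(used, 200))
```

Spacing the notify tiers well below the suspend tier gives a team time to react before its warehouses are stopped.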

STORAGE FUNDAMENTALS

TABLE TYPES

Temporary
Tied to an individual session and persists only for the duration of the session. Used for storing non-permanent, transitory data (e.g., ETL data, session-specific data).

Transient
Specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary tables), but does not need the same level of data protection and recovery provided by permanent tables.

Permanent
Designed for data that requires the highest level of data protection and recovery with both a Time-Travel and Fail-Safe period, and is the default for creating tables.

              Temporary   Transient   Permanent
Time-Travel   ✓           ✓           ✓
Fail-Safe     x           x           ✓

TIME TRAVEL STORAGE

• High churn detected with a ratio such as TIME_TRAVEL_BYTES / ACTIVE_BYTES from the TABLE_STORAGE_METRICS view
• For Enterprise (or higher), retention period can be up to 90 days; verify retention period on all large or high-churn tables
• Reduce retention period if data can be regenerated/reloaded and time/effort to do so is within acceptable boundaries/SLAs
• Use periodic zero-copy cloning (snapshots) instead of time travel to provide longer retention period at discrete points in time (daily, weekly, etc.)

Areas Of Focus
• Dimensional Tables
• Persistent Staging Areas
• Materialized Relationships, Derivations, Other Business Rules
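
The churn ratio mentioned for time travel storage can be computed as follows. This is a minimal Python sketch over made-up rows shaped like the TABLE_NAME, TIME_TRAVEL_BYTES, and ACTIVE_BYTES columns of TABLE_STORAGE_METRICS. The table names, byte counts, and the 1.0 cut-off are all assumptions for illustration.

```python
def churn_ratio(time_travel_bytes: int, active_bytes: int) -> float:
    """TIME_TRAVEL_BYTES / ACTIVE_BYTES, as suggested on the slide."""
    if active_bytes == 0:
        # Avoid dividing by zero for empty tables; any retained bytes
        # then count as unbounded churn.
        return float("inf") if time_travel_bytes else 0.0
    return time_travel_bytes / active_bytes


# Made-up rows shaped like (TABLE_NAME, TIME_TRAVEL_BYTES, ACTIVE_BYTES).
rows = [
    ("DIM_CUSTOMER", 40_000_000_000, 10_000_000_000),
    ("FACT_SALES", 5_000_000_000, 500_000_000_000),
    ("STG_ORDERS", 900_000_000, 0),
]

# Assumed cut-off for "high churn"; tune it to your own workload.
HIGH_CHURN_THRESHOLD = 1.0

high_churn = [name for name, tt, active in rows
              if churn_ratio(tt, active) > HIGH_CHURN_THRESHOLD]
print(high_churn)
```

Tables flagged this way are candidates for a shorter retention period or for conversion to transient tables, subject to the recovery requirements discussed on the slide.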

FAIL-SAFE STORAGE

• Permanent tables follow full CDP lifecycle; temp/transient tables NEVER use fail-safe
• Utilize temp tables for session-specific intermediate results in complex data processing workflow
• Temporary tables are dropped (and storage released) as soon as session ends
• Utilize transient tables for staging where frequent truncate/reload operations occur
• Consider designating databases/schemas as transient to simplify table creation

Areas Of Focus
• Staging Tables
• Intermediate Result Tables
• Work Areas for Developers, Analysts & Data Scientists
• Reporting Tool Materialized Results

NETWORK SECURITY

LAYERED SECURITY

1. Network (Authenticate Connection)
To restrict access to specified IP address/range. Optionally: to restrict via Secure Private Network.

2. Account (Authenticate User)
To authenticate users using a Password, Multi-Factor Authentication or Single Sign-On.

3. Object (Authorization)
To restrict access to specific Databases, Schemas, Tables, Views, etc., using Roles and Privileges.

4. Data (Encryption)
To protect customer data using AES 256-bit encryption and periodic re-keying.

NETWORK SECURITY

Considerations
• IP Whitelisting & Blacklisting
• Public Internet Exposure Considerations

Topics
• Network Security Policies
• AWS/Azure PrivateLink

NETWORK SECURITY POLICIES

• Managed by ACCOUNTADMIN or SECURITYADMIN roles
• Only one network policy object can be active at any one time
• Supports IPv4 addresses & CIDR notation
• Maintain consistency with other enterprise application network security policies
• Connectivity test plan should include all networks (e.g., internal, VPN, etc.)
• Utilize IP ranges versus IP lists whenever possible (e.g., 192.168.1.0/24)
• Blocked IPs are enforced first and require careful consideration when overlapping an allowed IP range (e.g., blocking 0.0.0.0/0 blocks all IPs)
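
The "blocked first" rule for network policies can be illustrated with Python's standard `ipaddress` module. This is a simplified model of the evaluation order described on the slide, not Snowflake's actual implementation, and the addresses are examples only.

```python
import ipaddress


def is_allowed(client_ip: str, allowed: list, blocked: list) -> bool:
    """Simplified model: the blocked list is checked first, then the allowed list."""
    ip = ipaddress.ip_address(client_ip)
    if any(ip in ipaddress.ip_network(cidr) for cidr in blocked):
        return False
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed)


allowed = ["192.168.1.0/24"]

# One blocked address inside the allowed range is rejected first.
print(is_allowed("192.168.1.99", allowed, blocked=["192.168.1.99"]))   # False
print(is_allowed("192.168.1.10", allowed, blocked=["192.168.1.99"]))   # True

# Blocking 0.0.0.0/0 shuts out every address, allowed or not.
print(is_allowed("192.168.1.10", allowed, blocked=["0.0.0.0/0"]))      # False
```

Running a check like this against the planned lists before activating a policy is one way to cover the connectivity test plan for each network.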

AWS/AZURE PRIVATELINK

[Diagram: PrivateLink architecture; surviving label: AWS QuickSight]

USER AUTHENTICATION

Considerations
• Multi-Factor Authentication
• Federated Authentication
• User Group Scenarios
• Service Account Scenarios

Topics
• Multi-Factor Authentication
• Federated Authentication & SSO
• OAuth

MULTI-FACTOR AUTHENTICATION

• Provides increased login security for users connecting to Snowflake
• Powered by Duo Security, which is managed by Snowflake
• Users can self-enroll
• Strongly recommend requiring MFA for all users with ACCOUNTADMIN role
• Duo-generated passcode can be used when connecting through Python, SnowSQL, JDBC or ODBC

FEDERATED AUTH & SSO

• Enables user SSO (single sign-on) through federated authentication
• Browser-based support for most SAML 2.0-compliant identity providers (Google, Azure, OneLogin, PingOne)
• Native support for Okta and Microsoft ADFS
• Browser-based SSO can be used in combination with MFA

OAUTH

• Open-standard protocol (OAuth 2.0) that allows supported clients authorized access to Snowflake without sharing or storing user login credentials
• Supports Tableau Desktop/Server/Online and custom clients configured by your organization
• Supports OAuth with AWS PrivateLink
• ACCOUNTADMIN and SECURITYADMIN are blocked roles by default, but can be enabled by Snowflake Support
• Currently, only the user's default role is authorized (or PUBLIC if no default role is set)

ROLE MANAGEMENT

Considerations
• Administrators
• Developers & DevOps Flow
• End-Users
• Service Accounts

Risks
• Inappropriate or Overly Restrictive Access
• Lack of Extensibility & Control
• Burdensome Maintenance
• Future Rework & Reconfiguration

SYSTEM-DEFINED ROLES

Users & Roles Objects

ACCOUNTADMIN: Owns the Snowflake account and can operate on all objects in the account, view and manage Snowflake billing and credit data, and stop any running SQL statements

SECURITYADMIN: Primary role for managing users, custom roles and object access (grants)

SYSADMIN: Primary role for creating and managing objects (e.g., warehouses, databases, tables) and administering object access through custom roles

PUBLIC: Pseudo-role that is automatically granted to every user and every role in your account

EXAMPLE

Functional Roles
● Analyst Team Lead
● Junior Analyst

Analyst Team Lead
● Has all (CRUD) access to a working schema
● Read access to the main schema

Junior Analyst
● Limited to read access to the main schema

Both roles share access to a Virtual Warehouse
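
The two functional roles can be written down as data and checked with a small Python helper. This is only an illustrative model of the access design, not Snowflake's grant system. The identifiers (`ANALYST_TEAM_LEAD`, `WORKING_AREA`, `ANALYST_WH`, and so on) are hypothetical, and mapping "CRUD" to SELECT, INSERT, UPDATE, and DELETE is an assumption.

```python
# Privileges per role and schema, following the example above.
# All identifiers here are hypothetical.
CRUD = {"SELECT", "INSERT", "UPDATE", "DELETE"}

ROLE_GRANTS = {
    "ANALYST_TEAM_LEAD": {"WORKING_AREA": CRUD, "MAIN": {"SELECT"}},
    "JUNIOR_ANALYST": {"MAIN": {"SELECT"}},
}

# Both roles may use the same virtual warehouse.
WAREHOUSE_USAGE = {"ANALYST_WH": {"ANALYST_TEAM_LEAD", "JUNIOR_ANALYST"}}


def can(role: str, privilege: str, schema: str) -> bool:
    """True if the role holds the privilege on tables in the schema."""
    return privilege in ROLE_GRANTS.get(role, {}).get(schema, set())


print(can("ANALYST_TEAM_LEAD", "INSERT", "WORKING_AREA"))  # True
print(can("JUNIOR_ANALYST", "SELECT", "MAIN"))             # True
print(can("JUNIOR_ANALYST", "INSERT", "MAIN"))             # False
```

Writing the intended access down in this form before issuing any grants makes it easier to review the design against the risks listed earlier, such as inappropriate or overly restrictive access.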

READ ONLY PATTERN: SOLUTION

[Diagram. Elements shown: users OBRIAN and WSMITH; roles Analyst Team Lead and Junior Analyst; Database: DWH with Schema: Working Area and Schema: Main, each containing a Table; a Virtual Warehouse. Grant labels: Usage and Select.]

OTHER CONSIDERATIONS

Naming Convention
• Establish and use a consistent naming convention across entire account

Future Grants
• Allows defining a role with an initial set of privileges on new objects of a certain type (e.g., tables or views) within a schema or database (pr-preview)

Viewing Granted Roles & Privileges
• SHOW GRANTS TO USER <user>;
• SHOW GRANTS TO ROLE <role>;
• SHOW GRANTS OF ROLE <role>;
• Query INFORMATION_SCHEMA

Managed Access Schema
• Centralizes grant management to the schema owner or role with MANAGE GRANTS

SNOWFLAKE COMMUNITY

Snowflake Community
• We are moving our forum to Stack Overflow
• Use existing forum for Snowflake account-related questions
• Everything else will remain the same with Snowflake Community

Stack Overflow
• Technical Q&A
• Use the “[snowflake]” tag
• Include relevant information like error messages

Questions?

Thank You