Inexpensive Datamasking for MySQL with ProxySQL René Cannaò

Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Jan 21, 2018


Ontico
Transcript
Page 1: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Inexpensive Datamasking for MySQL with ProxySQL

René Cannaò

Page 2: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Who we are

René Cannaò

Founder of ProxySQL

MySQL SRE at Dropbox

thanks to:

Frédéric Descamps

MySQL Community Manager

Page 3: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Other Sessions

273. ProxySQL, MaxScale, MySQL Router and other database traffic managers / Petr Zaitsev (Percona)

155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)

Page 4: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Agenda

● Database overview
● What is ProxySQL
● Features overview
● Data masking
● Rules
● Masking rules
● Obfuscation with mysqldump
● Examples

Page 5: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Overview of ProxySQL

Page 6: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Application and Database layers

APPLICATIONS

DATABASES

Page 7: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Main motivations

empower the DBAs

Improves manageability

understand and improve performance

High performance and High Availability

create a proxy layer to shield the database

Page 8: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Database as a Service (layered)

APPLICATIONS

DATABASES + MANAGER(s)

DAAS – REVERSE PROXY

Page 9: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

What is ProxySQL?

The MySQL data stargate

Page 10: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

How to deploy

Page 11: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

How to deploy

Page 12: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

ProxySQL Features (short list)

● High Availability and Scalability
● seamless failover
● firewall
● query throttling
● query timeout
● query mirroring
● runtime reconfiguration
● Scheduler
● Support for Galera/PXC and Group Replication
● on-the-fly rewrite of queries
● caching reads outside the database
● connection pooling and multiplexing
● complex query routing and r/w split
● load balancing
● real time statistics
● monitoring
● Data masking
● Multiple instances on same ports
● Native Clustering

Page 13: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Support for ClickHouse

Page 14: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Data Masking

Data masking or data obfuscation is the process of hiding original data with random characters or data.

The main reason for applying masking to a data field is to protect data that is classified as personally identifiable, personally sensitive, or commercially sensitive. However, the data must remain usable for running valid test cycles.

Page 15: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Why use ProxySQL as a data masking solution?

Open source and free as in beer

Other solutions are expensive or do not work

Not worse than the other solutions, as currently none is perfect

The best solution would be to have this feature implemented in the server just after the handler API

Page 16: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Query Rules

instructions to "program" ProxySQL behavior

● matching criteria
● actions
● flow control and chains
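The three ingredients above (matching criteria, actions, flow control) can be sketched as a tiny rule engine. This is only an illustration in Python, not ProxySQL's implementation: the `Rule` class, the `process` function and the two sample rules are hypothetical, Python's `re` stands in for ProxySQL's regex engine, and blocking is simplified to return as soon as a rule with an error message matches.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    # Field names loosely mirror the query-rule attributes used in this deck.
    rule_id: int
    username: str                          # matching criterion: which user
    match_pattern: str                     # matching criterion: which queries
    replace_pattern: Optional[str] = None  # action: rewrite the query
    error_msg: Optional[str] = None        # action: reject the query
    apply: bool = False                    # flow control: stop after this rule


def process(rules, username, query):
    """Return ("error", message) or ("ok", possibly rewritten query)."""
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if rule.username != username:
            continue  # rule is scoped to another user
        if not re.search(rule.match_pattern, query, re.IGNORECASE):
            continue  # query does not match this rule
        if rule.error_msg is not None:
            return ("error", rule.error_msg)
        if rule.replace_pattern is not None:
            query = re.sub(rule.match_pattern, rule.replace_pattern,
                           query, flags=re.IGNORECASE)
        if rule.apply:
            break  # last rule evaluated for this query
    return ("ok", query)


# Hypothetical sample rules: hide a column named "secret", block SELECT *.
rules = [
    Rule(1, "devel", r"\bsecret\b", replace_pattern="'hidden' AS secret"),
    Rule(2, "devel", r"^SELECT\s+\*", error_msg="Query not allowed"),
]

print(process(rules, "devel", "SELECT id, secret FROM t"))
print(process(rules, "devel", "SELECT * FROM t"))
print(process(rules, "admin", "SELECT * FROM t"))
```

Because rules without `apply` let evaluation continue, a later rule sees the query as rewritten by earlier ones, which is what makes chains of rules possible.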

Page 17: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Query Rewrite

Dynamically rewrite queries sent by the application/client

● without the client being aware
● on the fly
● using ProxySQL query rules
● rules defined using regular expressions, s/match/replace/

Page 18: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

The concept

We use regular expressions to modify the client's SQL statements, replacing the column(s) we want to hide with masking characters or with generated fake data.

We split the solution into two use cases:
● Provide developers with access to the database
● Generate a dump to populate a database that can be shared

Only statements from the designated users (a developer account in our example) are modified.

Page 19: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

The concept (2)

We will also distinguish two categories:

• data masking

• data obfuscating

Page 20: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Data Masking

Here we simply mask the full value of the column, or part of it, with a generic character:
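As a minimal sketch of what masking does to a single value, the two Python helpers below mirror the SQL expressions that appear in the rules of this deck: `CONCAT(LEFT(col,2),REPEAT('X',10))` and `CONCAT(LEFT(col,2),REPEAT('x',LENGTH(col)-2))`. The function names and default arguments are hypothetical and only serve the illustration.

```python
def mask_fixed(value: str, keep: int = 2, fill: str = "X", width: int = 10) -> str:
    """Keep the first `keep` characters and append a fixed run of fill characters."""
    return value[:keep] + fill * width


def mask_same_length(value: str, keep: int = 2, fill: str = "x") -> str:
    """Keep the first `keep` characters and pad back to the original length."""
    return value[:keep] + fill * max(len(value) - keep, 0)


print(mask_fixed("Georgi"))        # GeXXXXXXXXXX
print(mask_same_length("Georgi"))  # Gexxxx
```

The fixed-width variant hides the length of the original value, while the same-length variant keeps it, which may matter for tests that depend on column widths.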

Page 21: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Data Obfuscation

Here we replace the value of the column with random characters of the same type, creating fake data.
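One possible reading of "random characters of the same type" is sketched below in Python: digits are replaced by random digits, letters by random letters of the same case, and everything else (separators, spaces) is kept. The `obfuscate` helper is hypothetical and is not how ProxySQL does it; in the deck the fake data is produced by SQL functions such as `RAND()` inside the rewritten query.

```python
import random
import string


def obfuscate(value: str, rng: random.Random) -> str:
    """Replace each character with a random one of the same class."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '-' or ' '
    return "".join(out)


rng = random.Random()  # pass random.Random(seed) for reproducible fake data
print(obfuscate("1953-09-02", rng))
print(obfuscate("Georgi Facello", rng))
```

The result keeps the shape of the input (length, separators, letter case), so it stays plausible for testing while carrying no original information beyond that shape.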

Page 22: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Access

Create a user for masking:

INSERT INTO mysql_users(username, password, active, default_hostgroup) VALUES ('devel','devel',1,1);

Create a user for backups:

INSERT INTO mysql_users(username, password, active, default_hostgroup) VALUES ('backup','dumpme',1,1);

Page 23: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rules

Avoid SELECT *

for the developer, we need to create some rules to block any SELECT * variant on the table

if the column is part of many tables, we need to do so for each of them

Page 24: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rules (2)

Mask or obfuscate the field

when the field is selected in the columns we need:
● to replace the column by showing the first 2 characters and a certain number of Xs, or generate a random string
● keep the column name
● for mysqldump we need to allow SELECT * but mask and/or obfuscate sensitive values

Page 25: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rules overview

Rule #1

rule_id: 1
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: `*first_name*`
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: first_name
apply: 0

Page 26: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #2

rule_id: 2
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4
apply: 0
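To see what Rule #2 does to a query, the sketch below applies its match_pattern and replace_pattern with Python's `re` module. Python's engine is only a stand-in for the RE2/PCRE engines used by ProxySQL, so edge cases may behave differently, and `apply_rule_2` is a hypothetical helper name.

```python
import re

# match_pattern and replace_pattern copied from Rule #2.
MATCH = r"(\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])"
REPLACE = r"\1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4"


def apply_rule_2(query: str) -> str:
    """Rewrite every first_name column reference, case-insensitively."""
    return re.sub(MATCH, REPLACE, query, flags=re.IGNORECASE)


print(apply_rule_2("SELECT emp_no, first_name, last_name FROM employees"))
# SELECT emp_no, CONCAT(LEFT(first_name,2),REPEAT('X',10)) first_name, last_name FROM employees

print(apply_rule_2("select t1.first_name from employees.employees as t1"))
# select CONCAT(LEFT(t1.first_name,2),REPEAT('X',10)) first_name from employees.employees as t1
```

The optional second group captures a table qualifier such as `t1.` and moves it inside `LEFT(...)`, while the trailing ` first_name` keeps the original column name as the alias in the result set.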

Page 27: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #2 - obfuscating

Let's imagine we want to provide fake numbers for the `salaries`.`salary` column. Instead of the previous rule, we could use this one:

rule_id: 158
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n])
negate_match_pattern: 0
re_modifiers: CASELESS,GLOBAL
flagOUT: NULL
replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4

Page 28: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #3

rule_id: 3
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(\w),
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2,
apply: 1

Page 29: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #4

rule_id: 4
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(.*)\s+from
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2 from
apply: 1
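Rules #3 and #4 clean up after Rule #2 when the client already supplied its own alias: Rule #2 always appends ` first_name`, which would otherwise be followed by a second alias. The sketch below chains Rule #2 and Rule #4 in Python. It assumes that a later rule is matched against the query as rewritten by an earlier one, uses Python's `re` as a stand-in for ProxySQL's regex engine, and leaves Rule #3 out for brevity; `rewrite` is a hypothetical helper.

```python
import re

# (match_pattern, replace_pattern) pairs copied from Rule #2 and Rule #4.
RULE_2 = (r"(\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])",
          r"\1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4")
RULE_4 = (r"\)(\)?) first_name\s+(.*)\s+from",
          r")\1 \2 from")


def rewrite(query: str) -> str:
    """Apply Rule #2, then Rule #4 to the already rewritten query."""
    for pattern, replacement in (RULE_2, RULE_4):
        query = re.sub(pattern, replacement, query, flags=re.IGNORECASE)
    return query


print(rewrite("select emp_no, first_name as fred from employees"))
# select emp_no, CONCAT(LEFT(first_name,2),REPEAT('X',10)) as fred from employees

print(rewrite("select emp_no, first_name from employees"))
# select emp_no, CONCAT(LEFT(first_name,2),REPEAT('X',10)) first_name from employees
```

In the first query Rule #4 drops the alias that Rule #2 injected and keeps the client's `as fred`; in the second query Rule #4 does not match, so the injected alias stays.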

Page 30: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #5

rule_id: 5
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact [email protected]
apply: 0

Page 31: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #6

rule_id: 6
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact [email protected]
apply: 0

Page 32: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rule #7

rule_id: 7
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive information, please contact [email protected]
apply: 0
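The three blocking patterns (rule_id 5, 6 and 7) can be checked against sample queries with a few lines of Python. This is a sketch only: `is_blocked` is a hypothetical helper, Python's `re` stands in for ProxySQL's regex engine, and the real rules additionally apply only to the `devel` user on the `employees` schema.

```python
import re

# match_pattern values copied from the rules with rule_id 5, 6 and 7.
BLOCK_PATTERNS = [
    r"^SELECT\s+\*.*FROM.*employees",
    r"^SELECT\s+employees\.\*.*FROM.*employees",
    r"^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)",
]


def is_blocked(query: str) -> bool:
    """True if any of the SELECT * variants matches, case-insensitively."""
    return any(re.search(p, query, re.IGNORECASE) for p in BLOCK_PATTERNS)


for q in [
    "SELECT * FROM employees;",
    "select employees.* from employees",
    "select a.* from employees.employees a;",
    "SELECT emp_no, last_name, first_name FROM employees;",
    "select CUSTOMERS.* from myapp.CUSTOMERS;",
]:
    print(is_blocked(q), q)
```

The first three queries are rejected and the last two pass. The third pattern uses the backreference `\1` to catch a table alias (`a.*` together with `employees a`), which the first two patterns cannot see.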

Page 33: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Rules for mysqldump

To provide a dump that might be used by developers, Q/A or support, we need to:

● generate valid data
● obfuscate sensitive information
● rewrite SQL statements issued by mysqldump
● only for tables and columns with sensitive data

Page 34: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

mysqldump rules

Rule #8

rule_id: 8
active: 1
user: backup
schema: employees
flagIN: 0
match: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`
replace: SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries
flagOUT: NULL
apply: 1

Page 35: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

mysqldump rules

Rule #9

rule_id: 9
active: 1
user: backup
schema: employees
flagIN: 0
match: \* FROM `employees`
replace: emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees
flagOUT: NULL
apply: 1
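The effect of Rule #9 on a dump can be sketched by applying its pattern to the kind of statement mysqldump sends. The sample statement below is an assumption inferred from the match pattern of Rule #8 (a `SELECT` with the `/*!40001 SQL_NO_CACHE */` hint); `rewrite_dump_query` is a hypothetical helper, and Python's `re` stands in for ProxySQL's regex engine.

```python
import re

# match and replace copied from Rule #9.
MATCH = r"\* FROM `employees`"
REPLACE = (
    "emp_no, "
    "CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, "
    "CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, "
    "CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, "
    "gender, hire_date FROM employees"
)


def rewrite_dump_query(query: str) -> str:
    # A function replacement inserts REPLACE literally, with no escape processing.
    return re.sub(MATCH, lambda m: REPLACE, query)


# Assumed shape of the statement issued by mysqldump for the employees table.
dump_query = "SELECT /*!40001 SQL_NO_CACHE */ * FROM `employees`"
print(rewrite_dump_query(dump_query))
```

The `*` is expanded into an explicit column list, so the dump keeps the same columns in the same order while `birth_date` gets a random two-digit year suffix and the names keep only their first characters.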

Page 36: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Limitations

● better support in ProxySQL >= 1.4.x
○ RE2 and PCRE regexes

● all fields with the same name will be masked, whatever the table, within the same schema

● the regexes may never be sufficient to cover every case
● block any query not matching whitelisted SQL statements

● the dump via ProxySQL solution seems to be the best

Page 37: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Make it easy

This is not really easy, is it? You can use this small bash script (https://github.com/lefred/maskit) to generate the rules:

# ./maskit.sh -c first_name -t employees -d employees
column: first_name
table: employees
schema: employees

let's add the rules...

Page 38: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Examples

Easy ones:

SELECT * FROM employees;

SELECT emp_no, last_name, first_name FROM employees;

Page 39: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Examples (2)

More difficult:

select emp_no, concat(first_name), last_name from employees;

select emp_no, first_name, first_name from employees.employees;

select emp_no, `first_name` from employees;

select emp_no, first_name
    -> from employees; (*)

Page 40: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Examples (3)

More difficult:

select t1.first_name from employees.employees as t1;

select emp_no, first_name as fred from employees;

select emp_no, first_name rene from employees;

select emp_no, first_name `as` from employees;

select first_name as `as`, last_name from employees;

select `t1`.`first_name` from employees.employees as t1;

Page 41: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Examples (4)

More difficult:

select first_name fred, last_name from employees;

select emp_no, first_name /* first_name */ from employees.employees;

/* */ select last_name, first_name from employees;

select CUSTOMERS.* from myapp.CUSTOMERS;

select a.* from employees.employees a;

Page 42: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

We need you!

Page 43: Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Thank you!

Questions?

E: [email protected]