Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

Post on 21-Jan-2018

146 Views

Category:

Engineering

9 Downloads

Preview:

Click to see full reader

Transcript

Inexpensive Datamasking for MySQL with ProxySQLRené Cannaò

Who we are

René Cannaò

Founder of ProxySQL

MySQL SRE at Dropbox

thanks to:

Frédéric Descamps

MySQL Community Manager

Other Sessions

273. ProxySQL, MaxScale, MySQL Router and other database traffic managers / Petr Zaitsev (Percona)

155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)

Agenda

● Database overview● What is ProxySQL● Features overview● Data masking● Rules● Masking rules● Obfuscation with mysqldump● Examples

Overview of ProxySQL

Application and Database layers

APPLICATIONS

DATABASES

Main motivations

empower the DBAs

Improves manageability

understand and improve performance

High performance and High Availabilitycreate a proxy layer to shield the database

Database as a Service (layered)

APPLICATIONS

DATABASES + MANAGER(s)

DAAS – REVERSE PROXY

What is ProxySQL?

The MySQL data stargate

How to deploy

How to deploy

ProxySQL Features (short list)

High Availability and Scalabilityseamless failoverfirewallquery throttlingquery timeoutquery mirroringruntime reconfigurationSchedulerSupport for Galera/PXC and Group Replication

on-the-fly rewrite of queriescaching reads outside the databaseconnection pooling and multiplexingcomplex query routing and r/w splitload balancingreal time statisticsmonitoringData maskingMultiple instances on same portsNative Clustering

Support for ClickHouse

Data MaskingData masking or data obfuscation is the process of hiding original data with random characters or data.

The main reason for applying masking to a data field is to protect data that is classified as personal identifiable data, personal sensitive data or commercially sensitive data, however the data must remain usable for the purposes of undertaking valid test cycles

Why using ProxySQL as data masking solution?Open Source & Free like in beer

Other solutions are expensive or not working

Not worse than the other solutions as currently none is perfect

The best solution would be to have this feature implemented in the server just after the handler API

Query Rules

instructions to "program" ProxySQL behavior

matching criteriaactionsflow control and chains

Query Rewrite

Dynamically rewrite queries sent by the application/client

without the client being awareon the flyusing ProxySQL query rulesrules defined using regular expressions, s/match/replace/

The conceptWe use Regular Expressions to modify the clients’ SQL statement and replace the column(s) we want to hide by some characters or generate fake data.

We will split our solution in two different solutions:● Provide access to the database to developers● Generate dump to populate a database to share

Only the defined users, in our example we use a developer, will have his statements modified.

The concept (2)We will also create two categories :

• data masking

• data obfuscating

Data MaskingHere we will just mask with a generic character the full value of the column or part of it:

Data ObfuscationHere we will just replace the value of the column with random characters of the same type, we create fake data

Access

INSERT INTO mysql_users(username, password, active, default_hostgroup)VALUES ('devel','devel',1,1);

INSERT INTO mysql_users(username, password, active, default_hostgroup)VALUES ('backup','dumpme',1,1);

Create a user for masking:

Create a user for backups:

RulesAvoid SELECT *

for the developer, we need to create some rules to block any SELECT * variant on the table

if the column is part of many tables, we need to do so for each of them

Rules (2)Mask or obfuscate the field

when the field is selected in the columns we need:● to replace the column by showing the first 2 characters and a

certain amount of X s or generate a random string● keep the column name● for mysqldump we need to allow SELECT * but mask and/or

obfuscate sensible values

Rules overview

rule_id: 1 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: `*first_name*` re_modifiers: caseless,global flagOUT: NULL replace_pattern: first_name apply: 0

Rule #1

rule_id: 2 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n]) re_modifiers: caseless,global flagOUT: NULL replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4 apply: 0

Rule #2

rule_id: 158 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n]) negate_match_pattern: 0 re_modifiers: CASELESS,GLOBAL flagOUT: NULL replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4

Rule #2 - obfuscating

Let's imagine we want to provide fake number for `salaries`.`salary` column.We could instead of the previous rule use this one

rule_id: 3 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: \)(\)?) first_name\s+(\w), re_modifiers: caseless,global flagOUT: NULL replace_pattern: )\1 \2, apply: 1

Rule #3

rule_id: 4 active: 1 username: devel schemaname: employees flagIN: 0 match_pattern: \)(\)?) first_name\s+(.*)\s+from re_modifiers: caseless,global flagOUT: NULLreplace_pattern: )\1 \2 from apply: 1

Rule #4

rule_id: 5 active: 1 username: devel schemaname: employeesmatch_pattern: ^SELECT\s+\*.*FROM.*employees re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact dba@acme.com apply: 0

Rule #5

rule_id: 6 active: 1 username: devel schemaname: employeesmatch_pattern: ^SELECT\s+employees\.\*.*FROM.*employees re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact dba@acme.com apply: 0

Rule #6

rule_id: 7 active: 1 username: devel schemaname: employeesmatch_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1) re_modifiers: caseless,global error_msg: Query not allowed due to sensitive information, please contact dba@acme.com apply: 0

Rule #6

Rules for mysqldumpTo provide a dump that might be used by developers, Q/A or support, we need to:

● generate valid data● obfuscate sensitive information● rewrite SQL statements issued by mysqldump● only for tables and columns with sensitive data

mysqldump rules

rule_id: 8 active: 1 user: backup schema: employees flagIN: 0 match: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries` replace: SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries flagOUT: NULL apply: 1

Rule #8

mysqldump rules

rule_id: 9 active: 1 user: backup schema: employees flagIN: 0 match: \* FROM `employees` replace: emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees flagOUT: NULL apply: 1

Rule #9

Limitions

● better support in proxySQL >= 1.4.x○ RE2 an PCRE regexes

● all fields with the same name will be masked whatever the name of the table is in the same schema

● the regexps can always be not sufficient● block any query not matching whitelisted SQL statements

● the dump via ProxySQL solution seems to be the best

Make it easyThis is not really easy isn´t it ?You can use this small bash script (https://github.com/lefred/maskit) to generate them:

# ./maskit.sh -c first_name -t employees -d employeescolumn: first_nametable: employeesschema: employees

let's add the rules...

ExamplesEasy ones:

SELECT * FROM employees;

SELECT emp_no, last_name, first_name FROM employees;

Examples (2)More difficult:

select emp_no, concat(first_name), last_name from employees;

select emp_no, first_name, first_name from employees.employees

select emp_no, `first_name` from employees;

select emp_no, first_name -> from employees; (*)

Examples (3)More difficult:select t1.first_name from employees.employees as t1;

select emp_no, first_name as fred from employees;

select emp_no, first_name rene from employees;

select emp_no, first_name `as` from employees;

select first_name as `as`, last_name from employees;

select `t1`.`first_name` from employees.employees as t1;

Examples (4)More difficult:select first_name fred, last_name from employees;

select emp_no, first_name /* first_name */ from employees.employees;

/* */ select last_name, first_name from employees;

select CUSTOMERS.* from myapp.CUSTOMERS;

select a.* from employees.employees a;`

We need you!

Thank you!

Questions?

E: rene@proxysql.com

top related