Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6

Post on 13-Feb-2017

Steven Lembark Workhorse Computing lembark@wrkhors.com

There was Spaghetti Code.

And it was bad. So we invented Objects.

Now we have Spaghetti Objects.

Functional programming: based on Lambda Calculus.

Few basic ideas:

Transparency.

Consistency.

Constant data.

Transparent transforms.

Functions require input.

Output determined fully by inputs.

Avoid internal state & side effects.

time() random() readline() fetchrow_array()

Result: State matters!

Fix: Apply reality.
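The contrast between pure and stateful functions can be sketched in a few lines of Perl. The sub names (add_tax, stamp, stamp_at) are hypothetical and chosen only for illustration. Confining the stateful call to the caller is one common way to limit the damage, not something the slides spell out.

```perl
use strict;
use warnings;

# Pure: the result depends only on the arguments.
sub add_tax
{
    my ( $price, $rate ) = @_;

    $price * ( 1 + $rate )
}

# Impure: time() is a hidden input, so two identical calls
# can return different results.
sub stamp
{
    my ( $message ) = @_;

    join ' ' => time, $message
}

# Assumption: keep the stateful call at the edge and pass its
# result into a function that is otherwise pure.
sub stamp_at
{
    my ( $epoch, $message ) = @_;

    join ' ' => $epoch, $message
}

my $now = time;

print stamp_at( $now, 'upload started' ), "\n";
```

Here stamp_at can be tested with a fixed epoch, while stamp cannot be tested without controlling the clock.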

Used with AWS “Glacier” service.

$0.01/GiB/Month.

Large, cold data (discounts for EiB, PiB).

Uploads require lots of sha256 values.

Uploads chunked in multiples of 1MB. Digest for each chunk & entire upload.

Result: tree-hash.

Image from Amazon Developer Guide (API Version 2012-06-01) http://docs.aws.amazon.com/amazonglacier/latest/dev/checksum-calculations.html

sub calc_tree {
    my ($self) = @_;

    my $prev_level = 0;
    while (scalar @{ $self->{tree}->[$prev_level] } > 1) {
        my $curr_level = $prev_level+1;
        $self->{tree}->[$curr_level] = [];

        my $prev_tree = $self->{tree}->[$prev_level];
        my $curr_tree = $self->{tree}->[$curr_level];
        my $len = scalar @$prev_tree;
        for (my $i = 0; $i < $len; $i += 2) {
            if ($len - $i > 1) {
                my $a = $prev_tree->[$i];
                my $b = $prev_tree->[$i+1];
                push @$curr_tree, {
                    hash   => sha256( $a->{hash}.$b->{hash} ),
                    start  => $a->{start},
                    finish => $b->{finish},
                    joined => 0
                };
            } else {
                push @$curr_tree, $prev_tree->[$i];
            }
        }
        $prev_level = $curr_level;
    }
}

Trees are naturally recursive. Two-step generation:

Split the buffer.

Reduce the hashes.

Reduce pairs.

Until one value remains.

sub reduce_hash
{
    # undef for empty list
    @_ > 1 or return $_[0];

    my $count = @_ / 2 + @_ % 2;

    reduce_hash
    map
    {
        @_ > 1
        ? sha256 splice @_, 0, 2
        : shift
    }
    ( 1 .. $count )
}
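Putting the two steps together, a minimal runnable sketch might look like the following. It assumes Digest::SHA for sha256. The tree_hash wrapper, its unpack-based split, and the default 1 MiB chunk size are illustrative choices, not taken from the slides. The recursive call uses parentheses so that it parses regardless of whether the sub name has been seen yet.

```perl
use strict;
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    # undef for empty list
    @_ > 1 or return $_[0];

    my $count = int( @_ / 2 ) + @_ % 2;

    # hash adjacent pairs; an unpaired last value is promoted as-is.
    reduce_hash
    (
        map
        {
            @_ > 1
            ? sha256( splice @_, 0, 2 )
            : shift
        }
        ( 1 .. $count )
    )
}

# Hypothetical wrapper: split the buffer, digest each chunk, reduce.
sub tree_hash
{
    my ( $buffer, $size ) = @_;

    $size ||= 2 ** 20;

    reduce_hash( map { sha256( $_ ) } unpack "(a$size)*", $buffer )
}

# tiny chunk size so that the tree has more than one level.
print unpack( 'H*', tree_hash( 'abc', 1 ) ), "\n";
```

With three one-byte chunks the first pass hashes the digests of "a" and "b" together and carries the digest of "c" forward, and the second pass hashes the remaining pair.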

Catch: Eats Stack

Tail recursion is common.

“Tail call elimination” recycles stack.

“Fold” is a feature of FP languages.

Reduces the stack to a scalar.

Reset the stack.

Restart the sub.

my $foo
= sub
{
    @_ > 1 or return $_[0];

    @_ = … ;

    # new in v5.16
    goto __SUB__
};

Voila!

Stack shrinks.

sub reduce_hash
{
    @_ > 1 or return $_[0];

    my $count = @_ / 2 + @_ % 2;

    @_
    = map
    {
        @_ > 1
        ? sha256 splice @_, 0, 2
        : @_
    }
    ( 1 .. $count );

    goto __SUB__
}
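A self-contained version of the same idea is sketched below, assuming Digest::SHA for sha256 and "use v5.16" to enable __SUB__ (the current_sub feature). The five-leaf input is made up for the demonstration.

```perl
use v5.16;      # enables __SUB__ and strict
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    @_ > 1 or return $_[0];

    my $count = int( @_ / 2 ) + @_ % 2;

    # build the next level of the tree, then replace the arguments.
    @_
    = map
    {
        @_ > 1
        ? sha256( splice @_, 0, 2 )
        : shift
    }
    ( 1 .. $count );

    # re-enter the same sub with the new @_ instead of nesting a call.
    goto __SUB__
}

# five leaves: the odd one out is promoted until it finds a partner.
my @leaves = map { sha256( $_ ) } 1 .. 5;

print unpack( 'H*', reduce_hash( @leaves ) ), "\n";
```

Each pass roughly halves @_, so the sub is re-entered about log2(N) times for N leaves, and the goto keeps the call depth constant across those passes.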

"@_ =" and "goto" scare people.

See the Keyword::Declare (K::D) POD for {{{…}}} to avoid "\@_".

use Keyword::Declare;

keyword tree_fold ( Ident $name, Block $new_list )
{
    qq
    # this is source code, not a subref!
    {
        sub $name
        {
            \@_ > 1 or return \$_[0];

            \@_ = do $new_list;

            goto __SUB__
        }
    }
}

User supplies the generator, a.k.a. $new_list:

tree_fold reduce_hash
{
    my $count = @_ / 2 + @_ % 2;

    map
    {
        @_ > 1
        ? sha256 splice @_, 0, 2
        : @_
    }
    ( 1 .. $count )
}

NQFP: Hacks the stack.

Replace splice with offsets.

tree_fold reduce_hash
{
    my $last = @_ / 2 + @_ % 2 - 1;

    map
    {
        $_[ $_ + 1 ]
        ? sha256 @_[ $_, $_ + 1 ]
        : $_[ $_ ]
    }
    map
    {
        2 * $_
    }
    ( 0 .. $last )
}
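For readers without Keyword::Declare installed, the offset-based generator can be tried as an ordinary sub. This is a sketch that writes out the "@_ = ...; goto __SUB__" wrapper by hand. It assumes Digest::SHA and "use v5.16", and it swaps the truthiness test on the next element for an explicit bounds check, which is a small deviation from the slide.

```perl
use v5.16;      # enables __SUB__ and strict
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    @_ > 1 or return $_[0];

    # number of pairs on this level (counting an unpaired tail), minus one.
    my $last = int( @_ / 2 ) + @_ % 2 - 1;

    @_
    = map
    {
        # $_ is the offset of the left-hand value of a pair.
        $_ + 1 < @_
        ? sha256( @_[ $_, $_ + 1 ] )
        : $_[ $_ ]
    }
    map
    {
        2 * $_
    }
    ( 0 .. $last );

    goto __SUB__
}

my @leaves = map { sha256( $_ ) } 1 .. 5;

print unpack( 'H*', reduce_hash( @leaves ) ), "\n";
```

Because nothing is spliced, @_ stays intact while the map runs, so the offsets 0, 2, 4, ... always refer to the original positions on the current level.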

Still messy: @_, stacked map.

Declare reduce_hash with parameters.

Caller uses lexical vars.

keyword tree_fold
(
    Ident $name,
    List  $argz,
    Block $stack_op
)
{
    ...
}

Extract lexical variables.

See also: PPI::Token

my @varz    # ( '$foo', '$bar' )
= map
{
    $_->isa( 'PPI::Token::Symbol' )
    ? $_->{ content }
    : ()
}
map
{
    $_->isa( 'PPI::Statement::Expression' )
    ? @{ $_->{ children } }
    : ()
}
@{ $argz->{ children } };

Count & offset used to extract stack.

my $lexical = join ',' => @varz;
my $count   = @varz;
my $offset  = $count - 1;

sub $name
{
    \@_ > 1 or return \$_[0];

    my \$last
    = \@_ % $count
    ? int( \@_ / $count )
    : int( \@_ / $count ) - 1
    ;

    ...

Interpolate lexicals, count, offset, stack op.

    \@_
    = map
    {
        my ( $lexical ) = \@_[ \$_ .. \$_ + $offset ];

        do $stack_op
    }
    map
    {
        \$_ * $count
    }
    ( 0 .. \$last );

    goto __SUB__

Not much body left:

tree_fold reduce_hash($left, $rite)
{
    $rite
    ? sha256 $left, $rite
    : $left
}
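To see what the keyword buys, it may help to look at what that declaration would plausibly turn into. The code below is expanded by hand, not captured from Keyword::Declare. It assumes the template shown earlier with $lexical = '$left,$rite', $count = 2 and $offset = 1, plus Digest::SHA and "use v5.16".

```perl
use v5.16;      # enables __SUB__ and strict
use warnings;
use Digest::SHA qw( sha256 );

sub reduce_hash
{
    @_ > 1 or return $_[0];

    # offset of the last group: $count is 2 here.
    my $last
    = @_ % 2
    ? int( @_ / 2 )
    : int( @_ / 2 ) - 1
    ;

    @_
    = map
    {
        # the caller's lexicals, filled from one group of the stack;
        # $rite is undef when the group runs past the end.
        my ( $left, $rite ) = @_[ $_ .. $_ + 1 ];

        # the caller's block, interpolated as-is.
        do
        {
            $rite
            ? sha256( $left, $rite )
            : $left
        }
    }
    map
    {
        $_ * 2
    }
    ( 0 .. $last );

    goto __SUB__
}

my @leaves = map { sha256( $_ ) } 1 .. 5;

print unpack( 'H*', reduce_hash( @leaves ) ), "\n";
```

Everything except the do block is boilerplate that the keyword supplies, which is why so little is left for the caller to write.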

Explicit map, keyword with and without lexicals.

4-32MiB are good chunk sizes.

 MiB   Explicit   Implicit   Keyword
   1     0.02       0.01       0.02
   2     0.03       0.03       0.04
   4     0.07       0.07       0.07
   8     0.14       0.13       0.10
  16     0.19       0.18       0.17
  32     0.31       0.30       0.26
  64     0.50       0.51       0.49
 128     1.00       1.02       1.01
 256     2.03       2.03       2.03
 512     4.05       4.10       4.06
1024     8.10       8.10       8.11

Don’t need Haskell or Scala. Efficient and elegant functional code.

In Perl 5.

In Perl 6? Doubt if even Damian could do it better.

use v6;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data
}

multi sub reduce_hash ( @nodes) { reduce_hash redigest @nodes }
multi sub reduce_hash ([$node]) { $node }

sub redigest (@list) {
    map -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @list;
}

use v6;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data
}

multi sub reduce_hash ( @nodes) { samewith redigest @nodes }
multi sub reduce_hash ([$node]) { $node }

sub redigest (@list) {
    map -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @list;
}

use v6;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data
}

multi sub reduce_hash (@nodes) {
    samewith map -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @nodes
}
multi sub reduce_hash ([$node]) { $node }

use v6;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data
}

sub reduce_hash (@nodes) {
    treefold -> $a, $b? { $b ?? sha256 $a~$b !! $a }, @nodes
}

sub treefold (&block, *@data) {
    @data > 1 ?? samewith &block, map &block, @data
              !! @data[0]
}

use v6;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data
}

sub reduce_hash (@nodes) {
    treefold { sha256 $^a~$^b }, @nodes
}

multi treefold (&block, @data) { |@data }

multi treefold (&block, @data where * >= &block.arity) {
    given @data - @data % &block.arity -> $last {
        samewith &block, [|map(&block, @data[^$last]), |@data[$last..*]]
    }
}

use v6;
use Treefold;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    reduce_hash map &sha256, comb / . ** {1..$chunk_size} /, $data
}

sub reduce_hash (@nodes) {
    treefold { sha256 $^a~$^b }, @nodes
}

use v6;
use Treefold;

sub tree_hash (Str $data, Int :$chunk_size = 1024²) {
    treefold { sha256 $^a~$^b },
        map &sha256, comb / . ** {1..$chunk_size} /, $data
}

Don’t need Haskell or Scala. Efficient and elegant functional code.

In Perl 5 or Perl 6.

Easy to write (once you get the knack).

Easy to optimize (with some syntactic sugar). Surprisingly efficient.

Give it a try.
