Facebook 2010 (confidential)
HipHop Compiler for PHP
Transforming PHP into C++
HipHop Compiler Team
Facebook, Inc.
May 2010
PHP is easy to read
<?php
function tally($count) {
  $sum = 0;
  for ($i = 0; $i < $count; ++$i) {
    $sum += $i;
  }
  return $sum;
}
print tally(10) . "\n";
PHP syntax is similar to C++/Java
<?php
class Tool extends Object {
  public $name;
  public function use($target) {}
}
$tool = new Tool();
$tool->name = 'hammer';
$tool->use($nail);
PHP Statements and Expressions
ExpressionList, AssignmentExpression, SimpleVariable, DynamicVariable, StaticMemberExpression, ArrayElementExpression, DynamicFunctionCall, SimpleFunctionCall, ScalarExpression, ObjectPropertyExpression, ObjectMethodExpression, ListAssignment, NewObjectExpression, UnaryOpExpression, IncludeExpression, BinaryOpExpression, QOpExpression, ArrayPairExpression, ClassConstantExpression, ParameterExpression, ModifierExpression, ConstantExpression, EncapsListExpression,
FunctionStatement, ClassStatement, InterfaceStatement, ClassVariable, ClassConstant, MethodStatement, StatementList, BlockStatement, IfBranchStatement, IfStatement, WhileStatement, DoStatement, ForStatement, SwitchStatement, CaseStatement, BreakStatement, ContinueStatement, ReturnStatement, GlobalStatement, StaticStatement, EchoStatement, UnsetStatement, ExpStatement, ForEachStatement, CatchStatement, TryStatement, ThrowStatement,
PHP is weakly typed
<?php
$a = 12345;
$a = "hello";
$a = array(12345, "hello", array());
$a = new Object();
$c = $a + $b; // integer or array
$c = $a . $b; // implicit casting to strings
Core PHP library is small
- Most are in functional style
- ~200 to 500 basic functions
<?php
$len = strlen("hello"); // C library
$ret = curl_exec($curl); // open source
PHP is easy to debug
<?php
function tally($count) {
  $sum = 0;
  for ($i = 0; $i < $count; ++$i) {
    $sum += $i;
    var_dump($sum);
  }
  return $sum;
}
PHP is easy to learn
- easy to read
- easy to write
- easy to debug
Hello, World!
PHP is slow
[Chart: CPU benchmark comparison of C++, Java, C#, Erlang, Python, Perl, and PHP; axis ticks from 0 to 40]
Source: http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all
Why is Zend Engine slow?
- Byte-code interpreter
- Dynamic symbol lookups
  - functions, variables, constants
  - class methods, properties, constants
- Weak typing
  - zval
  - array()
Transforming PHP into C++
g++ is a native code compiler
- static binding
  - functions, variables, constants
  - class methods, properties, constants
- type inference
  - integers, strings, arrays, objects, variants
  - struct, vector, map, array
Static Binding – Function Calls
<?php
$ret = foo($a);
// C++
Variant v_ret;
Variant v_a;
v_ret = f_foo(v_a);
Dynamic Function Calls
<?php
$func = 'foo';
$ret = $func($a);
// C++
Variant v_ret;
Variant v_a;
String v_func;
v_func = "foo";
v_ret = invoke(v_func, CREATE_VECTOR1(v_a));
Function Invoke Table
Variant invoke(CStrRef func, CArrRef params) {
  int64 hash = hash_string(func);
  switch (hash) {
    case 1234:
      // the hash only narrows the candidates; the string compare confirms the match
      if (func == "foo") return f_foo(params[0]);
  }
  throw FatalError("function not found");
}
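The hash-then-compare dispatch above can be tried out in standalone C++. This is a minimal sketch, not HipHop's generated code: the FNV-1a `hash_string`, the `int64_t` parameters (standing in for `Variant`), and the bodies of `f_foo` and `f_bar` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins for compiled PHP functions (hypothetical bodies).
int64_t f_foo(int64_t a) { return a + 1; }
int64_t f_bar(int64_t a) { return a * 2; }

// FNV-1a hash, constexpr so the same function can label switch cases
// at compile time and hash the requested name at run time.
constexpr uint64_t hash_string(const char* s) {
  uint64_t h = 14695981039346656037ULL;
  while (*s) {
    h ^= static_cast<unsigned char>(*s++);
    h *= 1099511628211ULL;
  }
  return h;
}

// Dynamic call by name: switch on the hash, then confirm with a string
// compare because two different names may share one hash value.
int64_t invoke(const std::string& func, const std::vector<int64_t>& params) {
  switch (hash_string(func.c_str())) {
    case hash_string("foo"):
      if (func == "foo") return f_foo(params.at(0));
      break;
    case hash_string("bar"):
      if (func == "bar") return f_bar(params.at(0));
      break;
  }
  throw std::runtime_error("function not found: " + func);
}
```

Calling `invoke("foo", {41})` goes through one hash computation, one jump, and one string compare before reaching `f_foo`; an unknown name falls out of the switch and throws.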
Re-declared Functions
<?php
if ($condition) {
  function foo($a) { return $a + 1; }
} else {
  function foo($a) { return $a + 2; }
}
$ret = foo($a);
// C++
if (v_condition) {
  g->i_foo = i_foo$$0;
} else {
  g->i_foo = i_foo$$1;
}
g->i_foo(v_a);
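A standalone sketch of the same idea: each declaration of the re-declared function is compiled to its own body, and a function pointer in the globals object is bound when a declaration executes. The names `i_foo_0`, `i_foo_1`, `Globals`, and `run` are hypothetical, and `int64_t` stands in for `Variant`.

```cpp
#include <cassert>
#include <cstdint>

// Two compiled bodies for the same PHP name; the numeric suffixes play
// the role of the $$0 / $$1 suffixes in the generated code.
int64_t i_foo_0(int64_t a) { return a + 1; }
int64_t i_foo_1(int64_t a) { return a + 2; }

struct Globals {
  // Bound at run time, when one of the declarations is executed.
  int64_t (*i_foo)(int64_t) = nullptr;
};

int64_t run(Globals* g, bool condition, int64_t a) {
  if (condition) {
    g->i_foo = i_foo_0;
  } else {
    g->i_foo = i_foo_1;
  }
  assert(g->i_foo != nullptr);  // calling before any declaration ran would be a fatal error
  return g->i_foo(a);           // one indirect call, no name lookup
}
```

The cost of re-declaration in this sketch is a single indirect call; functions declared only once can still be called directly.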
Volatile Functions
<?php
if (!function_exists('foo')) {
  bar($a);
} else {
  foo($a);
}
function foo($a) {}
// C++
if (!f_function_exists("foo")) {
  f_bar(v_a);
} else {
  f_foo(v_a);
}
g->declareFunction("foo");
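One way to picture the bookkeeping behind `declareFunction` is a per-request set of declared names that `function_exists` consults. The sketch below assumes that design; the `Globals` layout, `functionExists`, and `call_foo_or_bar` are illustrative names, not the HipHop runtime API.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

struct Globals {
  std::unordered_set<std::string> declared;  // names declared so far in this request

  void declareFunction(const std::string& name) { declared.insert(name); }
  bool functionExists(const std::string& name) const {
    return declared.count(name) != 0;
  }
};

// Hypothetical compiled bodies that just report which one ran.
std::string f_foo() { return "foo"; }
std::string f_bar() { return "bar"; }

// The calls themselves stay statically bound; only the existence test
// is answered at run time.
std::string call_foo_or_bar(const Globals& g) {
  if (!g.functionExists("foo")) return f_bar();
  return f_foo();
}
```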
Static Binding – Variables
<?php
$foo = 'hello';
function foo($a) {
  global $foo;
  $bar = $foo . $a;
  return $bar;
}
// C++
String f_foo(CStrRef v_a) {
  Variant &gv_foo = g->GV(foo);
  String v_bar;
  v_bar = concat(toString(gv_foo), v_a);
  return v_bar;
}
GlobalVariables Class
class GlobalVariables : public SystemGlobals {
public:
  // Direct Global Variables
  Variant gv_foo;
  // Indirect Global Variables for large compilation
  enum _gv_enums {
    gv_foo,
  };
  Variant gv[1];
};
Dynamic Variables
<?php
function foo() {
  $b = 10;
  $a = 'b';
  echo($$a);
}
// C++
void f_foo() {
  class VariableTable : public RVariableTable {
  public:
    int64 &v_b;
    String &v_a;
    VariableTable(int64 &r_b, String &r_a) : v_b(r_b), v_a(r_a) {}
    virtual Variant getImpl(const char *s) {
      // hash – switch – strcmp
    }
  } variableTable(v_b, v_a);
  echo(variableTable.get("b"));
}
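The variable table can be sketched in standalone C++ as a small struct holding references to the typed locals. This is an assumption-laden illustration: it returns `std::string` where the original returns `Variant`, uses a plain `strcmp` chain instead of hash, switch, strcmp, and makes `f_foo` return the value rather than echo it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Locals remain ordinary typed C++ variables; the table only holds
// references to them, so statically bound accesses stay fast.
struct VariableTable {
  int64_t& v_b;
  std::string& v_a;

  VariableTable(int64_t& r_b, std::string& r_a) : v_b(r_b), v_a(r_a) {}

  // Looks a variable up by name and renders it as a string.
  std::string get(const char* name) const {
    if (std::strcmp(name, "b") == 0) return std::to_string(v_b);
    if (std::strcmp(name, "a") == 0) return v_a;
    return "";  // undefined variable
  }
};

std::string f_foo() {
  int64_t v_b = 10;
  std::string v_a = "b";
  VariableTable variableTable(v_b, v_a);
  return variableTable.get(v_a.c_str());  // $$a: look up the variable named by $a
}
```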
Static Binding – Constants
<?php
define('FOO', 'hello');
echo FOO;
// C++
echo("hello" /* FOO */);
Dynamic Constants
<?php
if ($condition) {
  define('FOO', 'hello');
} else {
  define('FOO', 'world');
}
echo FOO;
// C++
if (v_condition) {
  g->declareConstant("FOO", g->k_FOO, "hello");
} else {
  g->declareConstant("FOO", g->k_FOO, "world");
}
echo(toString(g->k_FOO));
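A dynamic constant needs a storage slot in the globals object, filled in by whichever `define()` runs first. The sketch below assumes a simplified `declareConstant` signature and an explicit `k_FOO_defined` flag; both are illustrative and not the HipHop runtime's actual interface.

```cpp
#include <cassert>
#include <string>

struct Globals {
  std::string k_FOO;           // storage for the dynamic constant FOO
  bool k_FOO_defined = false;  // hypothetical flag: has FOO been defined yet?

  // Mirrors PHP's define(): the first definition wins, and a later
  // attempt to redefine the constant is rejected.
  bool declareConstant(std::string& slot, bool& defined, const std::string& value) {
    if (defined) return false;
    slot = value;
    defined = true;
    return true;
  }
};

std::string run(Globals* g, bool condition) {
  if (condition) {
    g->declareConstant(g->k_FOO, g->k_FOO_defined, "hello");
  } else {
    g->declareConstant(g->k_FOO, g->k_FOO_defined, "world");
  }
  return g->k_FOO;  // reads go straight to the slot, with no name lookup
}
```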
Static Binding with Classes
Class methods
Class properties
Class constants
Re-declared classes
Deriving from re-declared classes
Volatile classes
Summary - Dynamic Symbol Lookup Problem is nicely solved
Rule of 90-10
Dynamic binding is a general form of static binding
Generated code is a super-set of static binding and dynamic binding
Problem 2. Weak Typing
Type Inference
Runtime Type Info (RTTI)-Guided Optimization
Type Hints
Strongly Typed Collection Classes
Type Coercions
Type Inference Example
<?php
$a = 10;
$a = 'string';
// C++
Variant v_a;
Why is strong typing faster?
$a = $b + $c;
if (is_integer($b) && is_integer($c)) {
  $a = (int)$b + (int)$c;
} else if (is_array($b) && is_array($c)) {
  $a = (array)$b + (array)$c; // array union
} else {
  …
}
int64 v_a = v_b + v_c;
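The difference between the two forms can be made concrete with `std::variant` as a stand-in for HipHop's `Variant`. This is a rough sketch: `add_variant` and `add_int` are hypothetical names, and only the integer and double cases are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>

using Variant = std::variant<int64_t, double, std::string>;

// Untyped path: every addition inspects both runtime tags before it
// can do any arithmetic.
Variant add_variant(const Variant& b, const Variant& c) {
  if (std::holds_alternative<int64_t>(b) && std::holds_alternative<int64_t>(c)) {
    return std::get<int64_t>(b) + std::get<int64_t>(c);
  }
  if (std::holds_alternative<double>(b) && std::holds_alternative<double>(c)) {
    return std::get<double>(b) + std::get<double>(c);
  }
  throw std::invalid_argument("operand combination not covered by this sketch");
}

// Typed path: once inference proves both operands are integers, the
// same PHP expression compiles down to a single machine add.
inline int64_t add_int(int64_t b, int64_t c) { return b + c; }
```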
Type Inference Blockers
<?php
function foo() {
  if ($success) return 10; // integer
  return false; // d'oh
}
$arr[$a] = 10; // d'oh
++$a; // $a can be a string actually!
$a = $a + 1; // $a can become a double, ouch!
RTTI-Guided Optimization
<?php
function foo($x) {
  ...
}
foo(10);
foo('test');
// C++
void foo(Variant x) {
  ...
}
Type Specialization Method 1
template<typename T>
void foo(T x) {
  // generate code with generic T (tough!)
}
- Pros: smaller generated code
- Cons: no type propagation
Type Specialization Method 2
void foo(int64 x) {
  // generate code assuming x is integer
}
void foo(Variant x) {
  // generate code assuming x is variant
}
- Pros: type propagation
- Cons: variant case is not optimized
Type Specialization Method 3
void foo(int64 x) {
  // generate code assuming x is integer
}
void foo(Variant x) {
  if (is_integer(x)) {
    foo(x.toInt64());
    return;
  }
  // generate code assuming x is variant
}
- Pros: optimized for integer case
- Cons: large code size
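Method 3 can be sketched as a pair of overloads where the generic one forwards to the specialized one after a runtime type check. The sketch uses `std::variant` in place of HipHop's `Variant` and returns a string only so the chosen path is observable; both choices are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

using Variant = std::variant<int64_t, std::string>;

// Specialized body: generated assuming x is an integer.
std::string foo(int64_t x) { return "int:" + std::to_string(x * 2); }

// Generic body: checks the runtime type first and forwards integers to
// the specialized overload, so callers holding a Variant still reach
// the fast path.
std::string foo(const Variant& x) {
  if (const int64_t* i = std::get_if<int64_t>(&x)) {
    return foo(*i);
  }
  return "variant:" + std::get<std::string>(x);
}
```

Both bodies exist in the binary, which is where the code-size cost comes from.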
Type Hints
<?php
function foo(int $a) {
  string $b;
}
class bar {
  public array $c;
}
bar $d;
Strongly Typed Collection Classes
- That omnipotent "array" in PHP
- Swapping out underlying implementation:
  - Array escalation
- PHP classes:
  - Vector
  - Set
  - Map: un-ordered
  - Then Array: ordered map
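Array escalation can be illustrated with a container that starts as a plain vector and switches to an insertion-ordered key/value form only when a string key shows up. Everything here is a hypothetical sketch: the `Array` class, its method names, and the linear-scan lookup are assumptions, and string keys are assumed to be non-numeric.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

class Array {
 public:
  // $arr[] = value; stays in the compact vector form when possible.
  void append(int64_t value) {
    if (escalated_) {
      entries_.emplace_back(std::to_string(next_index_), value);
    } else {
      values_.push_back(value);
    }
    ++next_index_;
  }

  // $arr['key'] = value; forces escalation to the ordered-map form.
  void set(const std::string& key, int64_t value) {
    escalate();
    for (auto& e : entries_) {  // linear scan keeps the sketch short
      if (e.first == key) { e.second = value; return; }
    }
    entries_.emplace_back(key, value);
  }

  bool isVector() const { return !escalated_; }

  // Keys in insertion order, as PHP iteration would visit them.
  std::vector<std::string> keys() const {
    std::vector<std::string> out;
    if (escalated_) {
      for (const auto& e : entries_) out.push_back(e.first);
    } else {
      for (std::size_t i = 0; i < values_.size(); ++i) out.push_back(std::to_string(i));
    }
    return out;
  }

 private:
  // Copies the vector contents into the ordered-map form exactly once.
  void escalate() {
    if (escalated_) return;
    for (std::size_t i = 0; i < values_.size(); ++i) {
      entries_.emplace_back(std::to_string(i), values_[i]);
    }
    values_.clear();
    escalated_ = true;
  }

  bool escalated_ = false;
  int64_t next_index_ = 0;
  std::vector<int64_t> values_;                            // vector form
  std::vector<std::pair<std::string, int64_t>> entries_;   // ordered-map form
};
```

Code that only ever appends keeps the cheap vector representation; the cost of the general ordered map is paid only by arrays that actually use string keys.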
Compiler Friendly Scripting Language
If all problems described here are considered when designing a new scripting language, will it run faster than Java?