What You Missed in Computer Science

Post on 02-Jul-2015

Category:

Software

DESCRIPTION

This presentation explains what Computer Science actually entails. It covers ways to describe code performance using Big-Oh notation, compares different post meta and taxonomy queries, and discusses concurrency as it applies to WordPress, specifically data races and how they can occur while counting post views.

Transcript

Computer Science in

WordPress

Taylor Lovett

My name is Taylor Lovett

- Senior Strategic Web Engineer at 10up

- Core Contributor

- Plugin Author (Safe Redirect Manager)

- Plugin Contributor

- BS in Computer Science from the University

of Maryland, College Park

What is Computer Science?

- It can mean a lot of things. It is really the

study of computational theory, computer

software, and hardware.

Theory of Computation

- General Mathematics (Calculus, linear

algebra, general computational theory,

statistics)

- Algorithms (a method to solve a problem)

- Data structures (which data structure will

allow us to access our data the quickest?)

- Graph theory

Computer Software

- Programming techniques and design patterns

(e.g., a singleton class)

- Concurrent design patterns (data races)

- Mobile software development

- Operating system software

- Web development

- Databases

- Networking

- Benchmarking

Computer Hardware

- Motherboards

- Memory types (solid state, RAM, etc.)

- Benchmarking (processor execution time)

- Pipelining

- Processors

Big-Oh Notation

- "Big O notation is used to classify algorithms by how

they respond (e.g., in their processing time or working

space requirements) to changes in input size." --

Wikipedia

- Very useful to describe how performant your code

may or may not be

- Big-Oh usually describes the upper bound of a

function (worst-case)

Big-Oh Notation (cont.)

- Big-Oh notation is concerned with measuring the rate

of growth of the amount of processing that your code

might do on an unknown input size

- In Big-Oh we are only concerned with how our

code performs as the input size approaches infinity.

Mathematically speaking, this means we only care

about the highest order term:

e.g. O(3n^2 + 5n) = O(n^2), since as n approaches infinity

the only thing that matters is the n^2 term
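One way to check that the lower-order term really drops out is to divide the whole expression by the leading term and let n grow. This is a short worked check, not part of the original slides:

```latex
\[
\lim_{n \to \infty} \frac{3n^2 + 5n}{n^2}
  = \lim_{n \to \infty} \left( 3 + \frac{5}{n} \right)
  = 3
\]
```

Because the ratio settles on a constant, 3n^2 + 5n grows no faster than a constant multiple of n^2, which is what O(n^2) expresses.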

Let's look at some

examples!

// $fruits contains a non-empty array of strings
function contains_orange( $fruits = array() ) {
    for ( $i = 0; $i < count( $fruits ); $i++ ) {
        if ( 'orange' == $fruits[$i] ) {
            return true;
        }
    }
    return false;
}

Best Case Scenario: Loop executes once,

orange is found, and it returns.

Worst Case Scenario: Loop executes n times

(where n is the number of elements in $fruits)

Performance: contains_orange() is in O(n)

Remember!

- With Big-Oh we are only concerned with what

happens in the worst case. Sometimes knowing

what happens in the best case is useful, but we

are mostly worried about the performance hit

our code could take in the worst possible

situation.

// $fruits contains a non-empty array of strings. For educational
// purposes, $fruits is guaranteed to have at least one duplicate.
function contains_duplicate_fruit( $fruits = array() ) {
    for ( $i = 0; $i < count( $fruits ); $i++ ) {
        for ( $z = 0; $z < count( $fruits ); $z++ ) {
            if ( $i != $z && $fruits[$z] == $fruits[$i] ) {
                return true;
            }
        }
    }
    return false;
}

What does everyone think?

Best Case Scenario: Outer loop executes

once, inner loop executes twice, duplicate is

found, function returns

Worst Case Scenario: Outer loop executes n -

1 times (where n is the size of $fruits), inner

loop executes n times for each outer loop

execution... n * (n - 1) = n^2 - n

Performance: contains_duplicate_fruit() is in

O(n^2 - n) = O(n^2)

An important reminder

- We dropped the (-n) from our final Big-Oh

evaluation because, as n approaches infinity,

n^2 dominates and (-n) becomes insignificant.
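The n * (n - 1) count can be checked empirically. The sketch below is not from the slides: count_inner_iterations() and worst_case_fruits() are hypothetical helpers that reuse the loops of contains_duplicate_fruit() and count how often the inner loop body runs when the only duplicate pair sits at the end of the array.

```php
<?php
// Hypothetical helper: same loops as contains_duplicate_fruit(), but it
// returns how many times the inner loop body ran before returning.
function count_inner_iterations( array $fruits ) {
    $iterations = 0;
    $n          = count( $fruits );
    for ( $i = 0; $i < $n; $i++ ) {
        for ( $z = 0; $z < $n; $z++ ) {
            $iterations++;
            if ( $i !== $z && $fruits[ $z ] === $fruits[ $i ] ) {
                return $iterations;
            }
        }
    }
    return $iterations;
}

// Hypothetical worst-case input for $n >= 2: every element is unique
// except the last one, which repeats the element just before it.
function worst_case_fruits( $n ) {
    $fruits = array();
    for ( $i = 0; $i < $n - 1; $i++ ) {
        $fruits[] = 'fruit-' . $i;
    }
    $fruits[] = 'fruit-' . ( $n - 2 );
    return $fruits;
}

foreach ( array( 4, 10, 100 ) as $n ) {
    printf( "n = %d: %d inner iterations\n", $n, count_inner_iterations( worst_case_fruits( $n ) ) );
}
```

For n = 4 this reports 12 iterations, i.e. 4 * 3. Multiplying n by ten multiplies the work by roughly a hundred, which is the quadratic growth that O(n^2) describes.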

But seriously... How is

this useful?

Big-Oh Notation and Databases

- Big-Oh notation is used a lot in conjunction

with SQL operations.

- We've all heard that indexing a column in

MySQL makes searches on that column faster.

- But why? What does that actually mean?

MySQL Indexes

- An index is a data structure that speeds up

search time for information.

- Without an index, searching for a specific

column value is O(n) because in the worst case

scenario every single row in the table must be

examined.

MySQL Indexes (cont.)

- When a column is indexed, MySQL takes the data

across all of the rows in that column and stores

references to that data in a B-tree (this structure is

used for the majority of index types).

- A B-tree is just what it sounds like: a tree of data that

speeds up search time. In the worst case, the number of

items examined in a B-tree lookup is O(log n). A logarithm

grows far more slowly than n, so for large n:

n^2 > n > log n

http://en.wikipedia.org/wiki/B-tree
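To get a feel for the gap between n and log n, the sketch below counts comparisons for a linear scan and for a binary search over the same sorted array. Binary search over a sorted array is only a stand-in here: it is not a B-tree and not how MySQL is implemented, but it shows the same logarithmic growth. Both function names are hypothetical.

```php
<?php
// Hypothetical helper: scan from the start, counting rows examined.
function linear_search_steps( array $sorted, $target ) {
    $steps = 0;
    foreach ( $sorted as $value ) {
        $steps++;
        if ( $value === $target ) {
            break;
        }
    }
    return $steps;
}

// Hypothetical helper: halve the search range on every comparison.
function binary_search_steps( array $sorted, $target ) {
    $steps = 0;
    $low   = 0;
    $high  = count( $sorted ) - 1;
    while ( $low <= $high ) {
        $steps++;
        $mid = intdiv( $low + $high, 2 );
        if ( $sorted[ $mid ] === $target ) {
            break;
        }
        if ( $sorted[ $mid ] < $target ) {
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }
    return $steps;
}

$rows = range( 1, 1024 ); // 1024 sorted "rows"
printf( "linear: %d steps\n", linear_search_steps( $rows, 1024 ) );
printf( "binary: %d steps\n", binary_search_steps( $rows, 1024 ) );
```

Looking up the last of 1024 values takes 1024 steps with the scan and 11 with the halving search. Doubling the table adds one step to the second approach and doubles the first.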

Post Meta Queries

- The full Big-Oh analysis of a post meta query is

pretty complex because of the join operation and

therefore is outside the scope of this talk.

- For our purposes, searching for posts based on a

meta key is O(n) where n is the number of posts that

have that key.

- Let's frame this in terms of featured posts. Featured

posts refers to the situation where a website needs to

mark select posts as featured and query for them.

Featured Posts Solution #1

On post update:

if ( isset( $_POST['meta_box_feature'] ) ) {
    update_post_meta( $post_id, 'featured', 1 );
} else {
    update_post_meta( $post_id, 'featured', 0 );
}

Query:

$args = array(
    'meta_key'   => 'featured',
    'meta_value' => 1,
);
$featured_posts = new WP_Query( $args );

Solution #1 Analysis

- Using this code, every time a post is saved, it will have

post meta attached to it such that 'featured' = 1 or 0. This

will create a ton of unnecessary post meta rows.

- Remember searching for posts based on a meta key is

O(n) where n is the number of posts that have that key.

Therefore saving meta when a post is not featured is not

only unnecessary but will really slow us down. This would

result in O(m) performance where m is the number of

posts!

Featured Posts Solution #2

On post update:

if ( isset( $_POST['meta_box_feature'] ) ) {
    update_post_meta( $post_id, 'featured', 1 );
} else {
    delete_post_meta( $post_id, 'featured' );
}

Query:

$args = array(
    'meta_key'   => 'featured',
    'meta_value' => 1,
);
$featured_posts = new WP_Query( $args );

Solution #2 Analysis

- This solution is a major improvement over our first

one. This will result in O(n) search time where n is the

number of featured posts.

- However, we can still do better.
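The difference between the two meta-based solutions can be illustrated without WordPress. The sketch below is an assumption-laden model, not the real postmeta table: build_featured_rows() and find_featured() are hypothetical helpers, the rows live in a plain PHP array, and the lookup is assumed to examine every row that carries the 'featured' key.

```php
<?php
// Hypothetical in-memory stand-in for the 'featured' rows in postmeta.
// $store_zero = true mimics Solution #1 (a row for every post);
// $store_zero = false mimics Solution #2 (rows only for featured posts).
function build_featured_rows( $total_posts, array $featured_ids, $store_zero ) {
    $rows = array();
    for ( $post_id = 1; $post_id <= $total_posts; $post_id++ ) {
        if ( in_array( $post_id, $featured_ids, true ) ) {
            $rows[] = array( 'post_id' => $post_id, 'meta_value' => 1 );
        } elseif ( $store_zero ) {
            $rows[] = array( 'post_id' => $post_id, 'meta_value' => 0 );
        }
    }
    return $rows;
}

// Examine every 'featured' row and keep the ones whose value is 1.
function find_featured( array $rows ) {
    $examined = 0;
    $found    = array();
    foreach ( $rows as $row ) {
        $examined++;
        if ( 1 === $row['meta_value'] ) {
            $found[] = $row['post_id'];
        }
    }
    return array( 'found' => $found, 'examined' => $examined );
}

$featured   = array( 3, 42, 500 );
$solution_1 = find_featured( build_featured_rows( 1000, $featured, true ) );
$solution_2 = find_featured( build_featured_rows( 1000, $featured, false ) );

printf( "Solution #1 examined %d rows\n", $solution_1['examined'] );
printf( "Solution #2 examined %d rows\n", $solution_2['examined'] );
```

Both runs find the same three posts, but the first examines 1000 rows and the second only 3, matching the O(m) versus O(n) distinction in the analysis above.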

Featured Posts Solution #3

Let's create a tag called 'featured' and attach it to all our featured posts:

On init:

$args = array( ... );
register_taxonomy( 'featured', 'post', $args );

Query:

$args = array(
    'tag' => 'featured', // WP_Query's 'tag' parameter takes a tag slug
);
$featured_posts = new WP_Query( $args );

Solution #3 Analysis

- For our purposes, searching for posts based on a tag

is O(log n) since there is an index on the tag id

column.

The full Big-Oh analysis of our tag solution is pretty

complex due to SQL join operations and therefore is

beyond the scope of this talk.

Concurrency

- In Computer Science, concurrency is the

property of a system in which multiple

computations execute simultaneously,

sometimes interacting with each other.

Concurrency (cont.)

- With concurrent programming we can, among

other things, force each core in a computer to

process a piece of a larger problem or handle

separate tasks. This is extremely powerful.

- When not properly accounted for, concurrency

can sometimes result in unexpected bugs that

are difficult to reproduce.

Concurrency in WordPress

- Concurrency takes a slightly different form in

WordPress. We don't solve problems by

starting new threads/processes. However,

since behind the scenes servers can run

multiple processes at the same time and thus

multiple users can execute the same code

simultaneously, issues surrounding

concurrency can arise.

Tracking Postviews in WordPress

- A common request in WordPress is to display the

number of views for each post on the frontend.

- There are many different ways to approach this

problem; the most common is to increment an

integer stored in post meta each time a post is

viewed, then to display this number for each post.

- This implementation can lead to data races.

Here is the code that executes on each post request

$views = get_post_meta( $id, 'views', true );
$views++;
update_post_meta( $id, 'views', $views );

Data Races

- A data race is the situation where two or more

threads access a shared memory location, at

least one of those accesses is a write, and the

order of the accesses is unknown (meaning

there are no explicit locking mechanisms used).

- Think of each page request as a thread on the

server. If two users request a post at the same

time, a data race for pageviews occurs since

both accesses are writing to the postmeta

table.

A Possible Ordering of Events

Each line is annotated with the user whose request executes it

$views = get_post_meta( $id, 'views', true ); // User A: $views = 0
$views++;                                     // User A: $views = 1
update_post_meta( $id, 'views', $views );     // User A: stored views = 1
$views = get_post_meta( $id, 'views', true ); // User B: $views = 1
$views++;                                     // User B: $views = 2
update_post_meta( $id, 'views', $views );     // User B: stored views = 2

In this ordering of events, the stored view count ends up as 2,

which is what we want. However, these events could occur

in any order...

Another Ordering of Events

$views = get_post_meta( $id, 'views', true ); // User A: $views = 0
$views = get_post_meta( $id, 'views', true ); // User B: $views = 0
$views++;                                     // User A: $views = 1
$views++;                                     // User B: $views = 1
update_post_meta( $id, 'views', $views );     // User A: stored views = 1
update_post_meta( $id, 'views', $views );     // User B: stored views = 1

In this ordering of events, the stored view count ends up as 1 even

though two users viewed the post, which is NOT what we want.
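The two orderings can be replayed deterministically. The sketch below is a simulation, not WordPress code: run_schedule() is a hypothetical function, $store stands in for the stored 'views' meta value, and each request is modeled as three steps (read, increment, write) executed in whatever order the schedule dictates.

```php
<?php
// Hypothetical simulation of two requests sharing one stored counter.
// $schedule lists which user runs their next step, e.g. array( 'A', 'B', ... ).
function run_schedule( array $schedule ) {
    $store = 0;                            // shared value (the stored 'views' meta)
    $local = array( 'A' => 0, 'B' => 0 );  // each request's private $views
    $step  = array( 'A' => 0, 'B' => 0 );  // next step for each request

    foreach ( $schedule as $user ) {
        switch ( $step[ $user ] ) {
            case 0: // $views = get_post_meta( ... )
                $local[ $user ] = $store;
                break;
            case 1: // $views++
                $local[ $user ]++;
                break;
            case 2: // update_post_meta( ... )
                $store = $local[ $user ];
                break;
        }
        $step[ $user ]++;
    }

    return $store;
}

$sequential  = run_schedule( array( 'A', 'A', 'A', 'B', 'B', 'B' ) );
$interleaved = run_schedule( array( 'A', 'B', 'A', 'B', 'A', 'B' ) );

printf( "sequential:  %d\n", $sequential );
printf( "interleaved: %d\n", $interleaved );
```

The sequential schedule stores 2, while the interleaved one stores 1: both requests read 0 before either writes, so one view is lost.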

Conclusion:

This algorithm won't work!

Solution to Pageview Problem?

Solution 1: Jetpack plugin. We can install

Jetpack and leverage its stats API to query

information on specific posts.

Solution 2: Google Analytics. Using a website's

Google Analytics account, we can set custom

variables on a post-by-post basis and query the

API based on those variables.

Questions?
