Accelerating OpenERP accounting: Precalculated period sums

Accelerating OpenERP accounting:Precalculated period sums

Borja López Soilánhttp://www.kami.es

http://www.kami.es/

Index

Current approach (sum of entries)● Current approach explained.● Performance analysis.Proposal: “Precalculated period sums”● Alternative 1: Accumulated values using triggers

– Proposed by Ferdinand Gassauer (Chricar)● Alternative 2: Period totals using the ORM

– Proposed by Borja L.S. (@NeoPolus)● Current approach vs Precalculated period sums

http://www.chricar.at/ChriCar/index.html

http://twitter.com/NeoPolus

Current approach: Sum of entries

Currently each time you read the credit/debit/balance of one account OpenERP has

to recalculate it from the account entries (move lines).

The magic is done by the “_query_get()” method of account.move.line, that selects the lines to consider, and the “__compute()” method of

account.account that does the sums.

Inside the current approach

_query_get() filters: builds the “WHERE” part of the SQL query that selects all the account move lines involving a set of accounts.● Allows to do complex filters, but usually look like

“include non-draft entries from these periods for these accounts”.

__compute() sums: uses the filter to query for the sums of debit/credit/balance for the current account and its children.● Does just one SQL query for all the accounts. (nice!)● Has to aggregate the children values on python.

Sample query done by __computeSELECT l.account_id as id, COALESCE(SUM(l.debit), 0) as debit, COALESCE(SUM(l.credit), 0) as credit, COALESCE(SUM(l.debit),0) - COALESCE(SUM(l.credit), 0) as balance FROM account_move_line l WHERE l.account_id IN (2, 3, 4, 5, 6, ..., 1648, 1649, 1650, 1651) AND l.state <> 'draft' AND l.period_id IN (SELECT id FROM account_period WHERE fiscalyear_id IN (1)) AND l.move_id IN (SELECT id FROM account_move WHERE account_move.state = 'posted') GROUP BY l.account_id

Account + children = lot of ids!

Sample query plan QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

HashAggregate (cost=57.83..57.85 rows=1 width=18)

-> Nested Loop Semi Join (cost=45.00..57.82 rows=1 width=18)

Join Filter: (l.period_id = account_period.id)

-> Nested Loop (cost=45.00..57.52 rows=1 width=22)

-> HashAggregate (cost=45.00..45.01 rows=1 width=4)

-> Seq Scan on account_move (cost=0.00..45.00 rows=1 width=4)

Filter: ((state)::text = 'posted'::text)

-> Index Scan using account_move_line_move_id_index on account_move_line l (cost=0.00..12.49 rows=1 width=26)

Index Cond: (l.move_id = account_move.id)

Filter: (((l.state)::text <> 'draft'::text) AND (l.account_id = ANY ('{2,3,4,5, ..., 1649,1650,1651}'::integer[])))

-> Index Scan using account_period_fiscalyear_id_index on account_period (cost=0.00..0.29 rows=1 width=4)

Index Cond: (account_period.fiscalyear_id = 1)

Ugh!, sequential scan on a table with (potentially)

lots of records... :(

Performance Analysis Current approach big O 1/2

“Selects all the account move lines” The query complexity depends on l, the number of move lines for that account and (recursive) children:

O(query) = O(f(l))“Has to aggregate the children values”The complexity depends on c, the number of children.

O(aggregate) = O(g(c))

Current approach big O 2/2

O(__compute) = O(query) + O(aggregate)

O(__compute) = O(f(l)) + O(g(c))

What kind of functions are f and g?

Let's do some empiric testing (funnier than maths, isn't it?)...

Let's test this chart... 1/2

The official Spanish chart of accounts, when empty:

Has about 1600 accounts.Has 5 levels.

(to test this chart of accounts install the l10n_es module)

Let's test this chart... 2/2

How many accounts below each level?

Account code Number of children (recursive)

Level 5 – 430000(leaf account)

0

Level 4 - 4300 1Level 3 - 430 6Level 2 - 43 43Level 1 - 4 192Level 0 – 0 (root account)

1678

To get the balance of account “4” we need to sum the balance of 192 accounts!

Ok, looks like the number of children c has a lot of influence, and the number of moves l has little or zero influence, g(c) >> f(l)Lets split them...

Now it is clear that g(c) is linear! (note: the nº of children grows exponentially)O(g(c)) = O(c)

So, the influence was little, but linear too!O(f(l)) = O(l)

Big O - Conclusion

O(__compute) = O(l) + O(c)

c has an unexpectedly big influence on the results => Bad performance on complex charts of accounts!c does not grow with time, but l does... => OpenERP accounting becomes slower and slower with time! (though it's not that bad as expected)

Proposal: Precalculated sums

OpenERP recalculates the debit/credit/balance from move lines each time.Most accounting programs store the totals per period (or the cumulative values) for each account. Why?● Reading the debit/credit/balance becomes much

faster.● ...and reading is much more data intensive than

writing: – Accounting reports read lots of times lots of accounts.– Accountants only update a few accounts at a time.

It's really faster?

Precalculated sums per period means:● O(p)query (get the debit/credit/balance of each

period for that account) instead of O(l)query, with p being the number of periods, p << l. Using opening entries, or cumulative totals, p becomes constant => O(1)

● If aggregated sums (with children values) are also precalculated, we don't have to do one O(c)aggregation per read.

It's O(1) for reading!!(but creating/editing entries is a bit slower)

Alternative 1: Accumulated values using triggers (I)

Proposed by Ferdinand Gassauer.How does it work?● New object to store the accumulated

debit/credit/balance per account and period (let's call it account.period.sum).

● Triggers on Postgres (PL/pgSQL) update the account_period_sum table each time an account move line is created/updated/deleted.

Opening 1st 2nd 3rd 4thMove line values in period

400 +200, +50

+25 -400 +25, +200

Value in table 400 650 675 275 500

Alternative 1: Accumulated values using triggers (II)

How does it work?(cont.)● The data is calculated accumulating the values from

previous periods. (Ferdinand prototype requires an special naming of periods for this).

● Creates SQL views based on the account account_period_sum table.

● For reports that show data aggregated by period:– New reports can be created that either directly use the

SQL views, or use the account.period.sum object.● The account.account.__compute() method could be

extended to optimize queries (modified to make use of the account_period_sum when possible) in the future.

Alternative 1: Accumulated values using triggers (III)

Good pointsTriggers guarantee that the data is always in sync.(even if somebody writes directly to the database!)

Triggers are fast.Prototype available and working! - “used this method already in very big installations - some 100 accountants some millions moves without any problems” (Ferdinand)

Bad pointsDatabase dependent triggers.Triggers are harder to maintain than Python code.Makes some assumptions on period names. (as OpenERP currently does not flag opening periods apart from closing ones)

Alternative 2: Period totals using the ORM (I)

Proposed by Borja L.S. (@NeoPolus).How does it work?● New object to store the debit/credit/balance sums

per account and period (and state):

● Extends the account.move.line open object to update the account.sum objects each time a line is created/updated/deleted.

Opening 1st 2nd 3rd 4thMove line values in period

400 +200, +50

+25 -400 +25, +200

Value in table 400 250 25 -400 225

http://twitter.com/NeoPolus

Alternative 2: Period totals using the ORM (II)

How does it work?(cont.)● Extends account.account.__compute() method to

optimize queries:– If the query filters only by period/fiscal year/state, the

data is retrieved from the account.sum object.– If the query filters by dates, and one ore more fiscal

periods are fully included on that range, the data is retrieved from for the account.sum objects (for the range covered by the periods) plus the account.move.lines (the range not covered by periods).

– Filtering by every other field (for example partner_id) causes a fallback into the normal __compute method.

Alternative 2: Period totals using the ORM (III)

Good pointsDatabase independent.Optimizes all the accounting.Flexible.No PL/pgSQL triggers required, just Python => Easier to maintain.

Bad pointsDoes not guarantee that the sums are in sync with the move lines. (but nobody should directly alter the database in first place...)

Python is slower than using triggers.No prototype yet! :)(But take a look at Tryton stock quantity computation)

http://codereview.appspot.com/4276068/

Current approach VS Period sums Current approachPros

● No redundant data.● Simpler queries.

Cons● Slow.

– Reports and dashboard charts/tables are performance hungry.

● Becomes even slower with time.

Precalculated sumsPros

● Fast, always.● Drill-down navigation.

Cons● Need to keep sums in

sync with move lines.● More complex

(__compute) or specific queries to make use of the precalculated sums.

Precalculated sums – Drill down navigation (Chricar prototype) 1/3



And one last remark...

...all this is applicable to the stock quantities computation too!

Accelerating OpenERP accounting: Precalculated period sums

Technology

account openerp

balance of account

balancefrom account

movewhere account

account andrecursive

compute method of account

account entriesmove

period cost