Transaction Validation and Analysis
January 14, 2020
A Major Qualifying Project Report Submitted to the Faculty of Worcester Polytechnic Institute in partial
fulfilment of the requirements for the Degree of Bachelor of Science.
Project ID: 14985
Project Team
Manasi Danke CS
Ethan Merrill MGE
Joseph Yuen CS
Project Advisors:
Michael Ginzberg, Business Department
Robert Sarnie, Business Department
Wilson Wong, Computer Science Department
Sponsored by:
Hedge Fund Company
Acknowledgements
First, we would like to thank our sponsor for the amazing opportunity to learn about financial
technology and to assimilate into the company culture.
We would also like to thank our WPI advisors Professor Michael Ginzberg, Robert Sarnie, and Wilson
Wong for their availability and support. Our regular meetings with them encouraged us and taught us
how to be agile in the financial industry.
Lastly, we would like to thank the open source community for their extensive documentation and
tutorials. We were able to learn a wide array of new technologies due to these valuable resources.
Thank you,
Manasi Danke
Ethan Merrill
Joseph Yuen
Table of Contents
Table of Figures
Abstract
Executive Summary
1.1 Problem
Thread 1 (Winners and Losers Report Update)
2.1 Finance Industry
5.2 User Stories
6.4 Use Case Diagrams
6.5 User Interface Structure Diagram
6.6 User Experience
6.6.1 Home
6.6.4 History
User Stories:
User Stories:
8.2 User Feedback
9. Future Work
Works Cited
Remaining Market Value (remMV)
Total Cost
Total Sales
Total Terminal Value
Return Period
APPENDIX F: Site Map
APPENDIX G: Site Structure Diagram
Human-readable commentary was generated to provide a more understandable narration of
changes in gross profit. This commentary concatenates the Deal Name, Strategy, and most
recent month over month difference in gross profit into an intelligible sentence. The user also
had the ability to drill through the performance commentary and view transactions which
contributed to notable changes in gross profit. The performance commentary was presented on
the Strategy level.
4. Integrate data lake to Power BI (Integration and Automation)
This theme involved the steps required to create a connection between the backend (Datalake)
and the frontend (Power BI). This connection caches all the data in Power BI via refresh in the
Power BI interface.
5. Design User Experience (User Experience and Design)
A large portion of this project focused on how to best display the data on the front end, so that
the user was informed but not overwhelmed. To do this, we created User Stories focused on
what individuals wanted to see and how they wanted to see the displays refined for future
releases.
6. Write Documentation (Documentation)
A goal of our project was to write code which could be maintained in the future. To do this, we
produced documentation to ensure that users and developers were well informed of the
capabilities and design of the dashboard system.
5.2 User Stories
Since Epic 1 was a continuation of a previous MQP project, most User Stories for Epic 1 consisted of
setting up our development environments, analyzing the code, running tests on the system, and
producing a tutorial video.
For Epic 2, we broke down our themes based on the different sections of the dashboard. We then made
User Stories for each theme. Each theme required User Stories that took place in both Databricks and
Power BI. Some User Stories focused on research, as we were not as familiar with some of the
technologies such as Power BI and Pandas. All User Stories are listed in Appendix A and throughout the
paper.
6. Design
In order to execute the two threads, we utilized Azure Datalake to store and maintain relevant data. We
also used Databricks with Spark to perform calculations and manipulate the data. Lastly, we worked with
Power BI to visualize our data and insights. We utilized specific design patterns to produce modular and
well documented code and developed multiple iterations of Databricks notebooks. Additionally, the
Power BI dashboard went through a series of top down design changes as the capabilities and
limitations of the programs involved became better understood by the team. To explain our design
choices and how the program is structured, we created a series of diagrams and descriptions. Then, we
explained how the user interface looks and functions.
6.1 System Architecture
To query the data lake from Databricks, we had to understand the firm’s cloud infrastructure. As seen in
Figure 6.0, we learned that a script fetches raw transaction data from the Geneva Accounting System via
the Active Batch Scheduler and then prepares it to be stored in the Azure Data Lake. Then, additional
scripts convert the data into Delta tables which can be manipulated in Databricks. Once the tables are loaded
into the data lake, Power BI can import them and display them as visualizations.
Figure 6.0 System Architecture Diagram
6.2 Data Flow Diagram (DFD)
The following figures describe how data is processed and flows throughout the Azure Validation
Dashboard system. Three levels of detail are provided. The Context Diagram presents the process from a
high level with the entire system represented by one process which views financial performance. The
next diagram, Level 0, goes into more detail on how the data moves between different systems and
external entities in the process. The diagram breaks out the front-end viewing processes, the back-end
data analysis processes, and summarizes the data which flows between them.
Figure 6.1 Context and Level 0 Diagram
Finally, the most detailed diagram is the level 1 diagram which introduces data stores. In this diagram,
one can see how data flows for the main processes and views in the dashboard. Most viewing processes
simply access locally cached data from the Power BI Datastore (D2). When all the data is refreshed
(process 1.0) the backend Databricks processes are triggered to run and update the data in Power BI.
The updates from the Geneva Accounting System (external entity) are currently scheduled by the firm.
Figure 6.2 Data Flow Diagram Level 1
6.3 Entity Relationship Diagram (ERD)
To create the back-end table for the Power BI dashboard, the team needed to understand which tables
to access in the data lake. Figure 6.3 displays three tables that were accessed. The tables are not
connected on shared keys, but they are uploaded to the data lake using prebuilt scripts.
Figure 6.3 As Is Data Lake Entity Relationship Diagram
The figure below shows the additional tables that were generated from sections of the former tables.
Even though the generated tables default.results_and_flows and default.irr_timeseries contain data
from the other tables, they are not joined in SQL. Instead, we created them as Pandas Data Frames,
converted them to Spark Data Frames, and then uploaded them to the data lake. We created two tables
because results_and_flows analyzes the selected PeriodEndDate and the previous month, while
irr_timeseries examines the data from inception to the selected PeriodEndDate. By having these two
time ranges, we could set Power BI’s drillthrough functionality to exclusively show which transactions
contribute to validation checks pertaining to month over month changes.
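The two time ranges can be sketched in Pandas as follows. This is a minimal illustration rather than the project's notebook code: the helper names and the sample values are hypothetical, and only the PeriodEndDate and GrossProfit column names come from the report. The commented Spark call assumes a Databricks notebook in which a `spark` session is already defined.

```python
import pandas as pd

def two_month_window(df, period_end):
    """Rows for the selected PeriodEndDate and the month end before it."""
    period_end = pd.Timestamp(period_end)
    previous = period_end - pd.offsets.MonthEnd(1)
    return df[df["PeriodEndDate"].isin([previous, period_end])]

def inception_to_date(df, period_end):
    """Rows from the first available month through the selected PeriodEndDate."""
    return df[df["PeriodEndDate"] <= pd.Timestamp(period_end)]

# Illustrative data only; the real tables hold many more columns.
sample = pd.DataFrame({
    "PeriodEndDate": pd.to_datetime(
        ["2019-08-31", "2019-09-30", "2019-10-31", "2019-11-30"]
    ),
    "GrossProfit": [1.0, 1.5, 1.2, 2.0],
})

results_and_flows = two_month_window(sample, "2019-11-30")
irr_timeseries = inception_to_date(sample, "2019-11-30")

# Assumption: in Databricks each frame could then be converted and saved, e.g.
#   spark.createDataFrame(results_and_flows) \
#        .write.mode("overwrite").saveAsTable("default.results_and_flows")
```

Keeping the two-month table separate from the inception-to-date table means a drillthrough on an alert only ever reaches rows from the two months that the alert compares.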
Figure 6.4 New Data Lake Entity Relationship Diagram
Since the above figure does not show the relationships between tables converted into Pandas Data
Frames, we created an entity relationship diagram on the Pandas Data Frame level. For the
results_and_flows table, we merged reporting.irr_results and reporting.irr_mod_cashflows to show
summary values such as IRR, MOIC, and GrossProfit as well as the transactions that contributed to them
over the PeriodEndDate selected and the previous month. Many of the additional tables merged into
the central table are alert tables created from the irr_results and irr_mod_cashflows Data Frame. The
team had to merge all alerts into the central table because of the limitations of Power BI. Although
Power BI offers powerful visualizations and useful functionality such as drillthrough and drilldown,
Power BI can only join tables on one attribute. Thus, to use certain features such as drillthrough for
alerts, we had to merge all alerts into one table.
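A minimal sketch of this merge, under assumptions: the key column name StratCode, the alert column name, and all sample values are hypothetical, and the real tables carry many more fields. Each alert table is reduced to its keys and joined onto the central table as a boolean column, so that every alert lives in the same table as the transactions it refers to.

```python
import pandas as pd

KEYS = ["StratCode", "PeriodEndDate"]  # assumed join keys

irr_results = pd.DataFrame({
    "StratCode": ["S1", "S2"],
    "PeriodEndDate": ["2019-11-30", "2019-11-30"],
    "IRR": [0.12, 0.03],
    "MOIC": [1.4, 0.9],
})
irr_mod_cashflows = pd.DataFrame({
    "StratCode": ["S1", "S1", "S2"],
    "PeriodEndDate": ["2019-11-30"] * 3,
    "TransactionType": ["Buy", "Sell", "Buy"],
    "Amount": [-100.0, 40.0, -50.0],
})

def add_alert_flag(central, alert_rows, name):
    """Left-join an alert table onto the central table as a boolean column."""
    flagged = alert_rows[KEYS].drop_duplicates().assign(**{name: True})
    out = central.merge(flagged, on=KEYS, how="left")
    out[name] = out[name].eq(True)  # unmatched rows become False
    return out

# Summary values joined to the transactions that contributed to them.
central = irr_results.merge(irr_mod_cashflows, on=KEYS, how="left")

# One alert table derived from the summary values (MOIC < 1 with positive IRR).
moic_alert = irr_results[(irr_results["MOIC"] < 1) & (irr_results["IRR"] > 0)]
central = add_alert_flag(central, moic_alert, "alert_moic_lt_1_irr_positive")
```

With the flags stored as columns of one table, a single Power BI relationship is enough for drillthrough from any alert to its transactions.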
Figure 6.5 results_and_flows Data Frame Entity Relationship Diagram
Instead of providing alerts for the selected PeriodEndDate and the previous month, as in Figure 6.5,
the Data Frame in Figure 6.6 was used to calculate historical estimates based on data from inception to the selected
PeriodEndDate. These values include the mean, standard deviation, month over month change, and
linear regression estimate for the next PeriodEndDate.
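The four values can be sketched for a single metric as follows. This is an illustration under assumptions: the function name is hypothetical, the observations are taken to be evenly spaced monthly values in date order, and the regression is a simple least-squares line fitted with NumPy, which may differ from the fitting method used in the project.

```python
import numpy as np
import pandas as pd

def summarize_history(values):
    """Summary statistics for one metric of one Strategy, oldest value first."""
    s = pd.Series(values, dtype="float64")
    x = np.arange(len(s))                      # 0, 1, 2, ... one step per month
    slope, intercept = np.polyfit(x, s.to_numpy(), 1)
    return {
        "mean": s.mean(),
        "std": s.std(),                        # sample standard deviation
        "mom_change": s.iloc[-1] - s.iloc[-2], # latest month minus previous month
        "next_estimate": slope * len(s) + intercept,
    }

summary = summarize_history([1.0, 2.0, 3.0, 4.0])
```

For the perfectly linear series above, the estimate for the next period continues the line, and the month over month change equals the slope.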
Figure 6.6 irr_timeseries Data Frame Entity Relationship Diagram
6.4 Use Case Diagrams
The following figure describes the use cases for the Azure Validation Dashboard. The three main use
cases are: Analyze Historical Performance, View Alerts, and View Commentary. These uses are featured
prominently in the user interface. Also illustrated is the drillthrough use case. Drillthrough functionality
is included in the View Alerts and Commentary use cases. This drillthrough routes the user to the Raw
Data page. This page is represented by the View Raw Data 2 Month use case.
Figure 6.7 Use Case Diagram
6.5 User Interface Structure Diagram
We constructed a user interface structure diagram to show how each view is connected. Users start at
the home view which acts as a launch pad to see different analyses. When a user goes to view Alerts or
Commentary, the user can drill through on an entry in the tables and navigate to Raw Data 2 Month for
supporting data. When a user goes to the History screen, the user can select further analyses derived
from Raw Data ITD. Explanations for each screen can be found in the User Experience section. Another
version of this diagram that includes every alert page can be found in Appendix F.
Figure 6.8 User Interface Structure Diagram
6.6 User Experience
6.6.1 Home
The home display of the Azure Validation Dashboard was designed to give the user a general overview
of what the program is capable of and what is within the three main uses of the program. The top of the
page displays the date for which the report was generated. In all instances, this date is automatically set
to the latest available date in the alerts table (results_and_flows). As seen in the figure below, the firm’s
logo is to the left of the date and to the right is the page name. The scroller lies below the date display
and provides a preview of the data in the commentary section. The three main functions are denoted by
large clickable panes which lead the user to their respective landing pages. These panes are titled
Commentary, Alerts, and History.
Figure 6.9 Power BI Home
6.6.2 Commentary
The Commentary Page was designed to have all the capability of the IRR Analytic report. The IRR
Analytic is a report manually created by the accounting department each month which describes the
biggest positive and negative changes in gross profit across different regions and time periods on a per
deal (Strategy) basis. These gains and losses are described in easy to read phrases using the following
syntax: [Deal Name] [Strategy] [gross profit] [gain or loss]. For example: ‘Apple Computer (BRKT:0005)
1.3mm gain.’ The wording in the original report is ‘contributors’ and ‘detractors’ for the biggest gainers
and losers, respectively for a given region or time period. Our report has a table on the left for the
biggest contributors and a table on the right for the biggest detractors. These tables can then be filtered
on region and return period.
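The phrase syntax and the two tables can be sketched as follows. The function name, the GrossProfitChange column, and the sample rows are hypothetical; the sketch assumes the change in gross profit is held in currency units and is shown in millions with one decimal place, as in the example phrase above.

```python
import pandas as pd

def format_commentary(deal_name, strategy, gross_profit_change):
    """Build '[Deal Name] ([Strategy]) [gross profit]mm [gain or loss]'."""
    direction = "gain" if gross_profit_change >= 0 else "loss"
    millions = abs(gross_profit_change) / 1_000_000
    return f"{deal_name} ({strategy}) {millions:.1f}mm {direction}"

# Illustrative month over month changes in gross profit.
changes = pd.DataFrame({
    "DealName": ["Apple Computer", "Deal B", "Deal C"],
    "Strategy": ["BRKT:0005", "BRKT:0006", "BRKT:0007"],
    "GrossProfitChange": [1_300_000, -2_500_000, 400_000],
})

contributors = changes.nlargest(2, "GrossProfitChange")   # left-hand table
detractors = changes.nsmallest(2, "GrossProfitChange")    # right-hand table

phrases = [
    format_commentary(row.DealName, row.Strategy, row.GrossProfitChange)
    for row in contributors.itertuples()
]
```

Filtering on region or return period would be applied to the input rows before the largest and smallest changes are selected.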
Figure 6.10 Power BI Commentary
6.6.3 Alerts
The Alerts Page contains all the validation checks performed on the data. Each button leads to a
different alert type as described on that button label. On the right of the page, the number of Business
Units, Portfolios, Strategies and Sycodes are displayed. These numbers are shown in order to give the
user an understanding of what data was analyzed. A user of this program at the firm is
expected to know how many business units and portfolios exist. Therefore, if the numbers displayed on
this page are radically different from what is known, it is a sign that the program may have
malfunctioned.
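The four counts amount to a distinct count per identifier column. A minimal sketch, assuming hypothetical column names (BusinessUnit, Portfolio, StratCode, Sycode) and sample rows:

```python
import pandas as pd

def coverage_counts(df, columns=("BusinessUnit", "Portfolio", "StratCode", "Sycode")):
    """Number of distinct values analyzed for each identifier column."""
    return {column: int(df[column].nunique()) for column in columns}

rows = pd.DataFrame({
    "BusinessUnit": ["BU1", "BU1", "BU2"],
    "Portfolio": ["P1", "P2", "P3"],
    "StratCode": ["S1", "S2", "S3"],
    "Sycode": ["X", "X", "Y"],
})

counts = coverage_counts(rows)
```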
Each alert page has a table with relevant identifiable information for that alert. All entries which were
flagged with that alert for the given month will appear in the alert table. Each alert entry has
drillthrough capability. This means that users can right click and select the drillthrough option to be
directed to the raw data page where they can view all the transaction level information for that flagged
row.
There are 17 alerts which are broken up into 6 categories. The categories, alerts, and descriptions are
listed below:
• Transactions
o RemMV Change, but no transaction: Outputs Strategy codes that have a remMV change
over the previous month and do not contain any significant transactions (any
TransactionType that contains the string 'buy', 'sel', or 'AccountingRelated').
o RemMV same, but transaction exists: Outputs Strategy codes that have no change in
remMV over the previous month and contain significant transactions (any
TransactionType that contains the string 'buy', 'sel', or 'AccountingRelated').
o Monetized Strategy code with Transactions: Identifies Strategy codes that have been
monetized, have a quantity of zero when transaction type is total terminal value, and
have any other transactions for a given month.
o Gross Profit Changed, but No Transaction: Identifies Strategy codes that have a
GrossProfit change over the previous month and do not contain any significant
transactions (any TransactionType that contains the string 'buy', 'sel', or 'AccountingRelated').
o Gross Profit Same, but Transaction Exists: Outputs Strategy codes that have no change
in GrossProfit over the previous month and contain significant transactions (any
TransactionType that contains the string 'buy', 'sel', or 'AccountingRelated').
• IRR, MOIC Breaks
o Negative IRR Change, Positive MOIC Change: Identifies Strategy codes that have a
positive MOIC change and a negative IRR change over the previous month.
o Negative MOIC Change, Positive IRR Change: Identifies Strategy codes that have a
negative MOIC change and a positive IRR change over the previous month.
o MOIC < 1, IRR Positive: Identifies Strategy codes that have a MOIC less than 1 and an IRR
that is positive.
• Missing Data
o Missing Begin Date: Sycode level analysis which determines if the begin date field is null
• Sycode
o SyCode Price Inconsistencies Across Portfolios: Identifies SyCode Price inconsistencies
across portfolios for a given month.
o Sycode: One-to-Many Strategies: Lists Sycodes that belong to multiple Strategies.
o Sycode Price Change Month Over Month: Lists SyCode month over month (MoM)
changes over the previous month (in any Sycode-StratCode pair), if a current and
previous month exist.
• Monetized
o Ongoing, but Listed End Date: Strategy Codes that are ongoing and have an end date
that is not the current period EndDate.
o Not Monetized and RemMV is 0: Identifies Strategy codes that have a
Total_Terminal_Value/remMV of 0 and are not monetized.
o RemMV Not 0, but Listed as Monetized: Finds Strategy codes that are monetized and
contain a non-zero remMV.
o Monetized, No Listed End Date: Identifies Strategy codes that are monetized and do not
have an end date.
• Strategy
o New Strategy Codes: Identifies Strategy codes that exist in the given PeriodEndDate, but
do not exist in the previous month.
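Three of the checks listed above can be sketched in Pandas as follows. This is an illustration under assumptions, not the project's implementation: the table layout (one row per Strategy code with current and `_prev` columns), the StratCode column name, the function names, and the case-insensitive matching of the marker strings are all hypothetical.

```python
import pandas as pd

SIGNIFICANT = "buy|sel|AccountingRelated"  # marker strings from the alert list

def traded_strategies(cashflows):
    """Strategy codes with at least one significant transaction."""
    hit = cashflows["TransactionType"].str.contains(SIGNIFICANT, case=False, regex=True)
    return set(cashflows.loc[hit, "StratCode"])

def remmv_change_no_transaction(summary, cashflows):
    """remMV moved month over month, yet no significant transaction exists."""
    traded = summary["StratCode"].isin(traded_strategies(cashflows))
    changed = summary["remMV"] != summary["remMV_prev"]
    return summary.loc[changed & ~traded, "StratCode"].tolist()

def irr_moic_breaks(summary):
    """IRR and MOIC moved in opposite directions, or MOIC < 1 with positive IRR."""
    d_irr = summary["IRR"] - summary["IRR_prev"]
    d_moic = summary["MOIC"] - summary["MOIC_prev"]
    opposite = ((d_irr < 0) & (d_moic > 0)) | ((d_irr > 0) & (d_moic < 0))
    low_moic = (summary["MOIC"] < 1) & (summary["IRR"] > 0)
    return summary.loc[opposite | low_moic, "StratCode"].tolist()

def new_strategy_codes(current_codes, previous_codes):
    """Codes present in the selected PeriodEndDate but not in the previous month."""
    return sorted(set(current_codes) - set(previous_codes))

# Illustrative data only.
summary = pd.DataFrame({
    "StratCode": ["A", "B", "C"],
    "remMV": [100.0, 200.0, 300.0],
    "remMV_prev": [100.0, 150.0, 300.0],
    "IRR": [0.10, 0.05, 0.02],
    "IRR_prev": [0.12, 0.04, 0.02],
    "MOIC": [1.3, 1.1, 0.9],
    "MOIC_prev": [1.2, 1.0, 0.9],
})
cashflows = pd.DataFrame({
    "StratCode": ["A", "C"],
    "TransactionType": ["Buy", "Dividend"],
})

remmv_alerts = remmv_change_no_transaction(summary, cashflows)
break_alerts = irr_moic_breaks(summary)
new_codes = new_strategy_codes(["A", "B", "C"], ["A", "B"])
```

In the sample data, Strategy B changes remMV without a significant transaction, Strategy A has IRR and MOIC moving in opposite directions, and Strategy C has a MOIC below 1 with a positive IRR.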
Figure 6.11 Power BI Alerts
6.6.4 History
All panes in the history page use the timeseries data table as their source. Additionally, all historical
analysis occurs on three metrics: gross profit, Internal Rate of Return (IRR), and Multiple of Invested
Capital (MOIC).
The timeseries view is designed to display 1-5 deals at a time. To use this view, the Deal Name, return
period, and business unit are selected on the left. After this, the month over month changes in the three
metrics will be displayed in the table in the center of the page. Additionally, on the left the historical
values for gross profit, IRR, and MOIC will be plotted in separate charts. Below these charts the
minimum, maximum, average and projected values for each deal selected will be shown. See Figure
6.12.
Back on the history page, three other views aside from timeseries can be selected. Each of these views
shows a table which compares the selected metric to its historical average on a per Strategy basis. The
table is sorted by the absolute value of the difference between the mean historical value and the most
recent value of the metric. Each table holds additional identifiable information such as return period,
portfolio, business unit, and Sycode to assist the user in understanding where this data can be located
and what might explain the deviation between the most recent and average value.
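The sorting used in these comparison tables can be sketched as follows. The function name, the StratCode column, and the sample values are hypothetical, and the sketch assumes the historical mean includes the most recent value, which the dashboard may or may not do.

```python
import pandas as pd

def deviation_from_history(timeseries, metric):
    """Per-Strategy historical mean versus latest value, largest gap first."""
    ordered = timeseries.sort_values("PeriodEndDate")
    grouped = ordered.groupby("StratCode")[metric]
    table = pd.DataFrame({
        "historical_mean": grouped.mean(),
        "latest": grouped.last(),   # last row per Strategy after date ordering
    })
    table["abs_diff"] = (table["latest"] - table["historical_mean"]).abs()
    return table.sort_values("abs_diff", ascending=False)

history = pd.DataFrame({
    "StratCode": ["A", "A", "A", "B", "B", "B"],
    "PeriodEndDate": pd.to_datetime(
        ["2019-09-30", "2019-10-31", "2019-11-30"] * 2
    ),
    "MOIC": [1.0, 1.0, 1.0, 1.0, 2.0, 6.0],
})

moic_table = deviation_from_history(history, "MOIC")
```

Strategy B appears first because its latest MOIC is furthest from its own historical mean.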
Figure 6.12 Power BI History Timeseries
6.7 Design Patterns
Python code needs to be organized so that it is easily readable, understandable, and well documented
for developers. To accomplish the tasks required for Thread 1, we implemented the strategy pattern (Boyanov, 2016).
6.7.1 Strategy Pattern
The strategy pattern enables an algorithm or class behavior to be changed at run time. Strategy objects
are created for different strategies and the behavior of the context object depends on the strategy
object, which changes the algorithm that is run for the context object.
We also implemented the Strategy pattern during Thread 1 when we demonstrated how to add a
column to the output sheet. There are multiple levels of processing, but our final level of processing
determined how the final number or string should be displayed; the data in the output sheet was
processed and generated using a Strategy_mapping hashmap with keys such as in_millions and
monetized. The value associated with the key referred to a class that defined the logic for how that
Strategy is implemented. In order to add new columns and populate them with data in the correct
format, developers can reference the defined Strategy or create a new Strategy and reference that, as
we did with in_abs_millions.
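A minimal sketch of such a mapping is shown here. Only the key names in_millions, monetized, and in_abs_millions come from the description above; the class names, the method name, and the formatting rules inside each class are hypothetical stand-ins for the logic in the actual codebase.

```python
class InMillions:
    """Show a currency amount in millions, keeping its sign."""
    def format(self, value):
        return f"{value / 1_000_000:.1f}"

class InAbsMillions:
    """Show the magnitude of a currency amount in millions."""
    def format(self, value):
        return f"{abs(value) / 1_000_000:.1f}"

class Monetized:
    """Show a boolean monetized flag as text."""
    def format(self, value):
        return "Monetized" if value else "Ongoing"

# Each key names a display strategy; each value is the object implementing it.
STRATEGY_MAPPING = {
    "in_millions": InMillions(),
    "in_abs_millions": InAbsMillions(),
    "monetized": Monetized(),
}

def render_cell(strategy_name, value):
    """Look up the strategy at run time and let it format the value."""
    return STRATEGY_MAPPING[strategy_name].format(value)

cell = render_cell("in_abs_millions", -2_500_000)
```

Adding a column with a new display format then means writing one small class and registering it under a new key, without touching the code that calls `render_cell`.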
7. Implementation
Pre-Qualifying Project Work
Before starting work on site at the firm’s office in New York City, we prepared by performing background
research as well as speaking with the project sponsor on a weekly basis. Regular communication with
the sponsor helped us develop a preliminary understanding of the project. Also, these early
conversations helped us develop a project plan and identify basic project requirements. However, it was
difficult to conduct extensive research without access to the software environment.
Sprint 1
User Stories Completed:
As a firm analyst, I want to add columns in the Excel template, so that I don't have to manually edit the report.
As a firm analyst, I want to delete columns in the Excel template, so that I don't have to manually edit the report.
As a firm analyst, I want to modify columns in the Excel template, so that I don't have to manually edit the report.
As a firm analyst, I want to populate the modified template with data corresponding to the column names, so that I don't have to manually input data into the report.
As a firm employee, I want to learn how to use the report system, so that accounting can manually produce reports.
Sprint Review
This week was spent entirely on our initial setup at the firm and Thread 1. After gaining access to the
codebase, we learned that adding or modifying columns in the report required only minimal
modifications to the code. The prior MQP designed the program with this functionality in mind. After
presenting this to our sponsor, he recommended that we revise the documentation to better describe
this functionality and modify the template by adding a column. We created a video tutorial and made
significant revisions and updates to the documentation to better describe this process. These changes
and the video were pushed to the Git repository by the end of the week.
No User Stories were rolled over or left incomplete for this Sprint.
Story Points Completed: 104
Hours Worked: 116.5
Velocity: 89.27%
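The report does not state how velocity is calculated. The figure above is consistent with story points completed divided by hours worked, expressed as a percentage, so the sketch below uses that inferred formula; the function name is hypothetical.

```python
def velocity(story_points, hours_worked):
    """Story points per hour worked, as a percentage rounded to two decimals."""
    return round(100 * story_points / hours_worked, 2)

sprint_1_velocity = velocity(104, 116.5)  # Sprint 1 figures from the text above
```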
Sprint Retrospective Meeting
What Worked Well
• Our sponsor was happy to meet with us every morning which kept our communication clear and easy.
• The onboarding process was faster and smoother than expected.
• The team was able to adapt to fast changes in scope and direction as they unfolded for Thread 1.
• Subject matter experts on Python and Azure seem to be somewhat available to assist us with this project.
• Coming into work early meant we had to spend less time commuting.
What Could be Improved
• No items were rolled over in the backlog. As we learned more about the scope of Thread 2, we
should have built a larger backlog.
• The development environment took some time to set up, which slightly slowed our progress.
This was expected and likely will not be an issue after this week.
• We need to improve how information is shared on the team regarding code changes and how
we can all collaborate on code.
• We need to be aware of when parallel work is needed, so everyone has the same base of
knowledge.
• We need to work on unifying syntax and procedures in code.
Sprint 2
User Stories Completed:
As a firm analyst I want to know if a Strategy switched from being a gain to a loss or vice versa so I can
recognize performance changes which affect the overall fund.
As a firm accountant I want to know if any terminal values changed when there was no buying or selling
activity because this is indicative of an incorrect data copy.
As a firm analyst I want to know if any terminal values went to 0 over the last month so I am aware of
any new closed positions.
As a firm accountant I want to know if remMV values changed when there was no trading activity so
that I can check if the data is correct.
As a firm analyst, I want to see the biggest month over month change in IRR at the Strategy code level
over the last N months, so I can make an informed decision about investing.
As a firm analyst, I want to see the biggest month over month change in MOIC at the Strategy code level
over the last N months, so I can make an informed decision about investing.
As a firm analyst, I want to see the biggest month over month change in GrossProfit at the Strategy code
level over the last N months, so I can make an informed decision about investing.
As a firm analyst, I want to know the "Buy and Sell" transactions over the last month, so I can make an
informed decision about investing.
As a firm trainee, I want to find the difference in IRR over 1 month, so that I can learn how to use
DataBricks and interact with the DataLake.
Project Risks Sprint 2
Sprint Review
Early in the week we had some difficulty with a rapidly changing scope for Thread 2. On Monday, we initially
interpreted this project as the need to create a report from scratch. We spent Monday on
Sprint planning and on planning the overall structure of how the report would be built using our existing
understanding of the firm’s database systems. On Tuesday, we learned that we would be providing
validation checks on data as it is loaded into the database. These procedures would be run daily from
the start of each month to determine what data is in the Datalake and what still needs to be added for
the report at the end of the month. This week our sponsor gave us many tasks involving insights he
would find useful to extract from the database. Although slightly challenging without business context
or an understanding of the database structure, we were able to accomplish most of these tasks. We are
learning a lot about the firm’s development environment. Additionally, because there were so many
tasks, the parallel work issue has been resolved.
Tasks added to the backlog include finding strategies that have a total terminal value of 0 and have not
been monetized. This is because we were unable to deliver the User Story to the exact specifications of our
sponsor. We are close to completion on this and it will be easy to finish quickly next week.
Story Points Completed: 152
Hours Worked: 230
Velocity: 66.09%
Sprint Retrospective Meeting
What Worked Well
• Weekly updates to advisors seem to be appreciated and will continue in order to ensure all
parties understand our progress and status on the project
• Databricks/Azure lake are easy to use because Python is user friendly and Python notebooks
make code very easy to debug.
• Communication with firm staff continues to go well as we speak to other members of the
firm
• Less parallel work because of the wider range of tasks.
• More tasks given by sponsor which left our team more room to plan the project
What Could Be Improved
• Be more agile -- less time planning, more time doing.
• Find different places to work. Others have arrived in the space we are working in and they do
not appreciate our chatter.
Sprint 3
User Stories Completed:
As a firm accountant, I want to be able to see the largest difference in IRR between two months so that I
do not have to manually find it.
As a firm accountant, I want to be able to see the largest difference in Gross Profit between two months
so that I do not have to manually calculate it.
As a firm accountant, I want to know which Strat codes changed from ongoing to monetized from one
month to the next to understand which stratcodes affect overall fund performance.
As a firm dev I want to know if the total terminal value is zero because if it is it should be monetized.
As an accountant I want to be able to drill down in the raw files so that I can see where the data may be
incorrect.
As a firm analyst I want to filter down to specific funds so that I can perform more accurate validation
checks.
As a developer I want to figure out how to connect Data Frames from Databricks to Power BI
and create tables out of Data Frames so that I don't have to manually create a report.
As a firm accountant I want to see data points that are outside a number of standard deviations from
what are normal so that I can identify extraneous data.
As a firm accountant, I want to see missing data (IRR, MOIC, GrossProfit, Total_Cost, Total_Sales,
Total_Terminal_Value) in irr_results and irr_mod_cashflows for a certain month, so that I can fix them.
Sprint Review
This week our scope and objective narrowed and remained consistent. We now know that we are
building a Power BI dashboard which will be used by accountants to check the validity of the firm’s
monthly financial data. Today we sent our project sponsor our initial prototype of this dashboard.
Although still in the early stage, we are now confident enough in the definition of our project to focus
our time towards one deliverable, which contrasts with the smaller missions of last week. Some of this
can be attributed to our experience in communicating with our sponsor. On the technical side, there is
still a lot for us to learn about the development environment. Understanding how to efficiently work
with large datasets in Pandas has been particularly challenging. We have had the assistance of Robert
Dreeke and Oren Efrati in helping us to understand how to create tables in the Databricks Database and
efficiently manipulate data in Pandas, respectively. It was mentioned that next week we would be able
to meet with an accountant. In preparation we have developed a set of questions to better determine
what an accountant would like to see in a data validation dashboard. We are at the halfway point of
usable project time, and the project at this stage appears entirely achievable in the next three weeks.
For next week we rolled over User Stories relating to biggest month over month changes in total cost
and total sales. These items have been added to the backlog for next week and will be started in Sprint
4.
Story Points Completed: 162
Hours Worked: 143.5
Velocity: 112.89%
Sprint Retrospective Meeting
What Worked Well
• Good communication with individuals in the firm who are not the Project Sponsor.
• Less time was spent planning, and more time was spent on pursuing User Stories. This greatly
improved our velocity.
• Our team has started to understand how to use the Pandas package effectively in Python.
What Could Be Improved
• Show more work to our project sponsor in context. This helped our sponsor's understanding of
our progress and the value of our work.
Project Risks
Project Risks Sprint 3
Sprint 4
User Stories Completed:
As a firm accountant, I want to know the average change over any time period for MOIC, Gross Profit,
Total Cost, Total Sales, after specifying a Strategy so that I can understand changes in Strategies over
time.
As a firm accountant, I want to see the biggest month over month change in TotalCost at the Strategy
code level, so I can make an informed decision about investing.
As a firm accountant, I want to see the biggest month over month change in TotalSales at the Strategy
code level, so I can make an informed decision about investing.
As an accountant I want to be able to have a report that updates automatically, so I always have the
most up to date information.
As a firm accountant, I want to see if Strategies in funds with end dates are monetized, so that I can
determine why.
As a firm accountant, I want to know if a Strategy in a fund is monetized and whether it has no quantity
and no market value, so that I can determine why there is a notable change in the data.
As an accountant I want to see the biggest sycode move for any strat code so I can further analyze that
strat code.
As an accountant I want to see when MOIC and IRR are moving in opposite directions so I can further
analyze the story associated with it.
As an accountant I want to know when Gross Profit does not change and there are many transactions
As an accountant I want to know when RemMV changes and there are many transactions
As an accountant I want to know the month to month price changes for a sycode, so that I can see the
biggest moves in sycode price.
As a firm accountant, I want to check if there are begin dates for strategies, so that I can see why there
might be none.
As an accountant I want to see if a monetized portfolio has a terminal value OR RemMV which changes
from 0 to any number.
As a firm accountant, I want to see if strategies in funds with a terminal value of 0 are monetized, so that
I can determine why.
As a firm accountant, I want to see which strategies are new, so that I can determine which strategies do
not have previous data.
As a firm accountant, I want to see if a sycode belongs to multiple strategies, so that I can determine
how to override the data.
As a firm accountant, I want to see whether prices for sycodes change across funds, so that I can see if
there were inconsistencies in the data.
Sprint Review
As a result of meeting with accountants Firm Accountant 1 and Doug Mackenzie on Tuesday, we were
able to further refine the needs of our future users. During the meeting we discussed which checks are
performed on the data, the order the checks are performed, and developed an understanding of the
priority of these checks. After the meeting, Firm Accountant 1 sent the Excel files currently used to
perform these checks. Using these sheets and the recording of our meeting we created an outline of all
the validation checks to be performed on the data.
Wednesday and Thursday, we developed functions to execute these checks. Thursday afternoon and
Friday were spent integrating and re-validating these checks. Due to difficulties with integration, this
took longer than expected, and as a result we were unable to ship a revised dashboard Friday. We will work
to complete this Monday and will review it with the project sponsor. After this review we will have
another meeting with Evan and possibly other members of the accounting department to receive
feedback on the dashboard.
Multiple user stories were incomplete which prevented the dashboard from coming together at the end
of the week. Some of the stories we rolled over to next week related to the human readable
commentary and drillthrough capability. Drillthrough was confused with drilldown which resulted in a
false completion of a User Story.
Story Points Completed: 168
Hours Worked: 126.5
Velocity: 132.81%
Sprint Retrospective Meeting
What Worked Well
• Improved morale
• Realistic goals were set
What Could Be Improved
• Better planning for integration, it took far longer than expected because of poor planning of
code.
• Re-use others' work; there is no need to re-invent the wheel.
• Narrow scope to allow a finished product by the end of the week.
Project Risks
Project Risks Sprint 4
Sprint 5
User Stories:
As an accountant I want to see human readable commentary on which Strategy codes influenced the
portfolio, and moved the most
As a firm accountant, I only want to see BKRT, so that I can make decisions on a more meaningful
dataset
As a firm accountant, I want to stratify the data by Region, so I can gain insights about the progress of
each region.
As an MQP student, I want to structure the final paper, so it accurately describes our work at the firm.
As a firm accountant, I want to drill through alerts, so that I can prove that an alert is valid.
As a firm accountant I want to be able to use filters on every page for common fields such as portfolio,
Business Unit, Strategy, Region, Sycode, ALERT ATTRIBUTE so that I can universally filter displayed data
As a firm accountant, I want to see projected values based on historical data like standard deviation and
linear regression, so that I can determine if my numbers are in a reasonable range.
As a firm accountant I want to see the Alert Description without the extra linked column visible so that
the view is less cluttered
As an MQP team member, I want to refresh and update our paper: omit previous technologies used and
write about the new technologies used.
As a firm accountant I want to be able to see the DealName as well as StrategyCode because I know
Deal Names better than Stratcodes
As a firm accountant when viewing the GP No-Change Transactions Rules I want to be able to drill down
to transactions
As a firm accountant I want extreme IRR values to be filtered out (perhaps greater than 1000) before
the standard deviation is calculated so that I only see useful data
As a firm trainee I want to meet with users of the dashboard to better understand their needs
Sprint Review
This Sprint we showed our progress to our sponsor twice and had another meeting with two members
of the accounting team. These sessions were brief but helped us design the Power BI interface. We used
these meetings to hear direct feedback on the state of our dashboards. As a result of a meeting early in
the week, the data structure of our project had to be consolidated to enable the 'Drillthrough' feature in
Power BI. Also, as a result of this meeting we were informed of additional tables in the database which
specify deal-name and region. The accountants use these fields very often, so joining them to our main
table will make the data far easier to understand and manipulate.
During our last stand up with our sponsor this week we received feedback focused mostly on the
presentation of the data in Power BI. Usability is a key sponsor concern. Also of note in this meeting was a
feature request to use portfolio as a filter in one of our pages.
Our team has concerns on the feasibility of implementing this capability because of the complexity it
would introduce in the back-end data processing. We will communicate our concerns at the start of
Sprint 6. Scheduling would be affected, and it has become a priority to minimize changes to the backend
for two reasons: first, we need to focus time on improving the existing Power BI interface, and second,
changes to our processing and organizing of the data on the backend have the potential to
break our interface. We also started the outline and planning of our final paper.
Many of the items added to the backlog this week were larger formatting tasks which will not be
confirmed to be done until the project is nearly complete. For instance, when a new table is added in
PowerBI, grand totals are added by default. Until no more tables are added, we cannot be sure that
there are no extraneous or irrelevant grand total fields. This is also true of larger uniformity in formatting,
such as making all the headers the same.
Story Points Completed: 151
Hours Worked: 151.15
Velocity: 99.99%
Sprint Retrospective Meeting
What Worked Well
• Using the pair programming technique allowed for more effective collaboration and better
communication across the team.
• Splitting tasks and User Stories amongst the team has become easier and more natural.
• Working more in the front end is gratifying.
What Could Be Improved
• Estimating the time a task needs to be completed and conveying that number to others;
essentially, performing better real-time story point allocation and communicating when tasks
are in progress and perhaps running long.
• Communicating current objectives casually could be improved so that all members of the team
have a sense of direction.
• Planning Power BI usage for versioning and collaboration purposes is important because only
one person can edit it at the same time.
Project Risks
Project Risks Sprint 5
Sprint 6
User Stories:
As a firm accountant I want to disable grand totals on non-applicable fields.
As a firm accountant I want the ToolTip on Diffs to show the two values used to calculate the DIFF.
As a Project Sponsor, I want to see when a MOIC and IRR are different in my own terms, so that I can be
alerted when it happens.
As a firm accountant I want the Closed_fund_transactions field to be renamed as the
monetized_stratcode_with_transactions and to only check for non-null values in the RESID Column.
As an accountant, I want to see DealName and StrategyRegionofRisk as columns and as filters so I can
effectively analyze the data and utilize the PowerBI Dashboard.
As an accountant, I want to be able to filter on portfolio (including ALL), so that I can assess strategies on
a general level.
As an accountant I want to see the absolute value of all Diffs so that I can sort them.
As an accountant I want to see relevant usable filters on each report page so that I can filter the data
appropriately.
As an accountant, I want to just see values where the alert is true, so I see data respective to that alert.
As an accountant I want a Top Level Summary page that contains the data and a well organized way to
access alerts.
As a developer, I want to learn how to properly use the slicer to arrange data to the accountant's
satisfaction.
As an accountant, I want to see a flat list of transactions: raw data.
As a user of the PowerBI dashboard, I want the column names to be easier to understand, so I can
better understand how the data is represented.
As an accountant, I want to see a description of each page, so I understand how to use the data
provided and further understand the alert and its check.
As a firm developer I want to see commented Code so that I can maintain the software.
As a firm developer I want to add "changes" to IRR, MOIC, Buttons so that the RAW explorer is more
useable.
As a firm accountant I want to filter the entire Report by Investment type so that I do not see irrelevant
cash transactions.
As a project sponsor I want the headers of each page to be the same on every page so that there is
consistency in design.
As an accountant I want to see commentary for all return periods 1, 3, 5 year to date.
As a project sponsor for each dealname I want to see the Average Min Max LR for IRR, MOIC.
As a project Sponsor I want to see a graph of time series data with IRR GP MOIC all in one visual.
As a project sponsor I want to see a descending sort of a diff between current value and average Value.
Sprint Review
This Sprint was focused on revisions to the user interface. After Wednesday no more changes were
made to the backend code, and the feature we were concerned about implementing last week was
added within our time constraints. This week we went from demoing once or twice per week to nearly
every day with our project sponsor. This compressed feedback loop let us make the many small changes
needed to improve the user interface much faster. These changes focused on formatting and the overall
flow of the user through the interface. A key challenge was providing enough information to summarize
performance, while not overwhelming the user, all while also giving the user transparency into how the
values were calculated. Towards the end of the week we received some informal feature requests over
email; these features were implemented by review time Friday. We plan to not develop any further
features after this week to stay on track. Our sponsor understands this and will be working with us to
assist in refining our presentation next week.
Story Points Completed: 185
Hours Worked: 146
Velocity: 126%
Sprint Retrospective Meeting
What Worked Well
• Our goals and expectations were achievable and realistic for the time we have left.
• Scheduling of essay allowed for early professor feedback
• Advisor feedback is positive, which is a good indication of project status.
What Could Be Improved
• We should try to avoid pursuing low clarity instructions without asking for more information,
because it is unlikely we will be able to meet expectations.
• We need to better communicate technical limitations of Power BI.
Project Risks
Project Risks Sprint 6
Weekly Burndown
8. Testing
8.1 Quality Assurance Procedure
For Thread 1, the team used the intermediary Excel file to determine if the numbers were correctly
displayed in the Winners and Losers Report. We found key values for TotalSales and their associated
Deal Names to do a quick check on the validity of the data. In addition, the sponsor verified that the
newly produced column in the report was correct.
In Thread 2, testing was more complex than quickly determining if numbers had been copied over. Since
accountants were one of our primary users, we attempted to use their accounting procedure and
former Excel Sheets to check if our validation alerts had produced similar information. When we tried to
compare our numbers, however, we realized that the accountants’ files had a series of overrides that
were futile to replicate. Our sponsor later told us to not use their numbers as we would waste time
implementing overrides. As a result, we had separate notebooks where we would redo different alert
entries using SQL queries instead of using Pandas. For example, to prove that a certain Strategy had a
GrossProfit change with no significant transactions in the given PeriodEndDate, we queried
reporting.irr_results to prove the GrossProfit change for the given Strategy. Then, we queried
reporting.irr_mod_cashflows to see that the Strategy contained no significant transactions with a
TradeDate within the time of the PeriodEndDate. In addition, we produced sanity check columns for
Power BI, so that we could see the inputs of certain calculations such as month over month changes.
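The independent re-check described above can be sketched as follows. This is a minimal illustration rather than the team's actual notebook: it uses an in-memory SQLite database as a stand-in for the Databricks tables (so the `reporting.` schema prefix is dropped), and the `Amount` column, the sample rows, and the `SIGNIFICANT_AMOUNT` threshold are assumptions made purely for the example.

```python
import sqlite3

# In-memory stand-ins for reporting.irr_results and reporting.irr_mod_cashflows.
# The Amount column and all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE irr_results (Strategy TEXT, PeriodEndDate TEXT, GrossProfit REAL);
    CREATE TABLE irr_mod_cashflows (Strategy TEXT, TradeDate TEXT, Amount REAL);
    INSERT INTO irr_results VALUES
        ('STRAT_A', '2019-07-31', 100.0),
        ('STRAT_A', '2019-08-31', 150.0);
    INSERT INTO irr_mod_cashflows VALUES
        ('STRAT_A', '2019-06-15', 5000.0),
        ('STRAT_A', '2019-08-10', 0.5);
""")

SIGNIFICANT_AMOUNT = 1.0  # hypothetical cutoff for a "significant" transaction


def gross_profit_change(strategy, prev_period, curr_period):
    """Step 1: confirm the GrossProfit change between two PeriodEndDates."""
    rows = dict(conn.execute(
        "SELECT PeriodEndDate, GrossProfit FROM irr_results "
        "WHERE Strategy = ? AND PeriodEndDate IN (?, ?)",
        (strategy, prev_period, curr_period)))
    return rows[curr_period] - rows[prev_period]


def significant_transactions(strategy, prev_period, curr_period):
    """Step 2: count significant transactions traded within the period."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM irr_mod_cashflows "
        "WHERE Strategy = ? AND TradeDate > ? AND TradeDate <= ? "
        "AND ABS(Amount) >= ?",
        (strategy, prev_period, curr_period, SIGNIFICANT_AMOUNT)).fetchone()
    return count
```

An alert entry is confirmed when the first query shows a non-zero change and the second returns zero. Because the dates are stored as ISO-formatted strings, plain string comparison orders them correctly.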
8.2 User Feedback
The team scheduled regular demos with the sponsor to receive user feedback on the accuracy and
usability of the dashboard. These regular meetings allowed us to quickly see initial reactions from the
sponsor and write new User Stories. The stories would then be used to fine tune the user interface and
back-end accordingly.
The firm accountants demoed early versions of the dashboard during interviews, which can be found in
Appendix C and D. We had two major meetings with the accountants that revealed what they value and
how they validate their data.
In the first meeting, we learned about what suspicious activity should be flagged such as IRR and MOIC
movements in opposite directions. We also learned in this meeting how the accountants use a series of
interconnected Excel spreadsheets to flag alerts and generate the commentary for the IRR Analytic
Report. In addition, we got a glimpse of their workflow and what accountants prioritize when validating
data.
In the second meeting, the accountants interacted with a prototype of the dashboard and relayed their
first impressions. They initially did not like the validation section but were interested in the section when
we explained the drillthrough functionality. In addition, they liked the drilldown functionality in Power BI
because it was an intuitive way to navigate the large tables.
9. Future Work
9.1 Thread 1
9.1.1 Modularize Strategies Further
Although the current code base modularizes strategies for report customization, the strategies could be
further modularized in column_Strategy.py. The main concern is that the user can only call one Strategy
per header. These strategies are highly specific to headers, which means that creating one requires
intermediate coding knowledge. As a result, developing multiple, modular strategies that can be mixed
and matched (e.g. absolute value, in millions, in billions, in percentage, etc.) will likely generate a simpler
user experience for generating custom columns. Keep in mind that some strategies may be so unique to
the data set that they cannot be easily modularized.
9.1.2 Modularize Pre-Processing Functions Further
While implementing the “All Other Positions” portion of the “Invested Capital Column” in
preprocessing_factory.py, we noticed that the file is organized similarly to column_Strategy.py.
However, the code that generates the “All Other Positions” is less modular because it is a helper
function called by the class ConcatLowerBPSProcessing(). As a result, creating multiple, more modular
functions that can be mixed and matched may allow for users to easily customize the template. Some
pre-processing functions may be so unique to the template that they cannot be easily modularized.
9.1.3 Determine User Base
The current generator relies heavily on both the initial Excel Template and Python code. As a result,
potential users who are not familiar with Python and software development may have issues editing the
code base to suit their needs. Since the firm may ask non-technical employees to perform report
generation in the future, it is key to determine who will be using the software before further
development. This determination will dictate how to develop the software in a manner that is easy and
appropriate for the user base. Depending on the user base, a solely Excel or Python implementation may
be needed.
9.2 Thread 2
9.2.1 Add More Timeseries Data to Datalake
As of December 2019, the data lake only contained PeriodEndDates from mid 2018 to late 2019. To
develop the dashboard, we examined 8/2019 and 7/2019, because 9/2019 and 10/2019 did not have as
much data. As a result, adding more data would allow for a recent analysis of the latest PeriodEndDate
and accurate historical analysis. In addition, more data could be used to train a machine learning model
and perform further analysis.
9.2.2 Schedule Script
We designed the dashboard to support accountants in their validation of the latest PeriodEndDate. To
put the Azure Validation Dashboard into production, we recommend that the firm run the script for the
latest PeriodEndDate using the PeriodEndDate selector widgets in Databricks.
9.2.3 Add More Alerts and Analysis
As of December 2019, we built 17 alerts into the Power BI dashboard. We also laid the foundation for
others. For example, we calculated the linear regression prediction of GrossProfit, IRR, and MOIC for the
latest PeriodEndDate. An additional alert could be designed to find the difference between the actual
value versus the predicted value for the latest PeriodEndDate.
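One possible shape for such an alert is sketched below. It is only an illustration of the idea: the function names, the use of evenly spaced period indices as the regression input, and the `tolerance` parameter are all assumptions, not part of the delivered dashboard.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Periods are assumed to be evenly spaced.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two historical values")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def prediction_gap(history, actual):
    """Actual value for the latest period minus the regression prediction."""
    slope, intercept = linear_fit(history)
    predicted = slope * len(history) + intercept
    return actual - predicted


def regression_alert(history, actual, tolerance):
    """Flag the latest period when it strays too far from the trend."""
    return abs(prediction_gap(history, actual)) > tolerance
```

For example, with a hypothetical GrossProfit history of 10, 12, 14, 16 the fitted trend predicts 18 for the next period, so an actual value of 25 yields a gap of 7 and would be flagged at any tolerance below that.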
9.2.4 Add More Fields to Data Frame
After working in the data lake, we realized we only used a small portion of the many fields in the tables.
The fields in the current dashboard were required by our sponsor, but we imagine that even more
analysis could be performed if more fields were introduced. In addition, we recommend adding all the
ReturnPeriods in the irr_timeseries script as we only included ITD and YTD at our sponsor’s request.
9.2.5 Create Summary Page
Although we created a wide range of alert reporting pages, these functions are not prioritized or readily
accessible when opening the dashboard. In its current state, it would take at least 35 clicks to view every
possible alert. A streamlining of the user interface is necessary to improve the workflow and reduce the
amount of time required to view alerts. A future redesign could reduce the number of interactions
required to see the most important alerts. This is a considerable challenge due to the rigidity of
designing in Power BI and because assigning priority to each alert will require a deeper understanding of
their importance.
10. Learning Assessment
10.1 Challenges
1. Identifying Requirements
Before we started our project in New York, we had a general idea of what we had to do. Due to security
reasons, we were not able to see how the data was structured until we got to New York. We understood
that we needed to edit the previous MQP’s code and develop new ways to log and analyze data, but
many of the details were not clear. When we arrived in New York, we realized that some of our
requirements had changed. In Thread 1, we planned to upgrade the report generator by implementing
XML, but we quickly realized after looking at the code, that the previous MQP team had already
implemented a Python package utilizing XML. In Thread 2, we were tasked with completing one out of
the three sub threads that we had planned. Additionally, some of the requirements, such as using Power
BI as the primary front-end tool, were not clearly established until mid-way through the project.
In response, we attempted to clarify requirements with the sponsor and engaged in conversations on
what we had to do. Although the conversations gave the team new insights, these insights would
occasionally conflict with other requirements. Eventually, by having regular product demonstrations
with the sponsor along with an agile mindset, the team was able to determine requirements, gain
actionable feedback, and move forward.
2. Planning VS Execution
During the second Sprint, the team planned the project after getting the initial overview of Thread 2. We
created diagrams and wrote User Stories for four hours. As we began to execute our plan, our
requirements rapidly changed mid-sprint, and much of our planning was not applicable to the project.
On the flip side, the team began to develop the back-end tables without considering the limitations of Power
BI. Overall, the team was challenged to find the balance between planning and executing.
After experiencing both extremes, the team realized that shorter planning and execution cycles with
daily feedback were most effective. By receiving our sponsor’s reactions on smaller chunks of our user
interface and back-end code, we were able to align ourselves more with our sponsor’s needs.
3. Domain Knowledge
Although the team had some financial literacy and a rough idea of the firm’s asset organization, we
struggled to understand the entirety of the system. Different in-house column headers frequently
confused us as we worked on the back-end structure. Even though our sponsor clarified many terms for
us, we did not interact with most of the columns in the datasets. Although many of the columns were
not relevant to the project, we frequently wondered if we were missing information. Towards the end of
the project, we added Deal Name to our Data Frames since our sponsor requested it. While the task was
easy to complete, the field was stored in an obscure table that we would not have found on our own.
4. Optimization
Despite having some experience with the Pandas Python library, the team had to research how to use
the library correctly. Initially, the team used for-loops to analyze the data. However, we quickly learned
that Pandas was created for vectorization. When trying to optimize our functions, we attempted to learn
best Pandas practices, but we did not fully understand the program. We eventually asked for help from
a software developer at the firm, and he showed us the groupby function and the apply function. By
using these functions, we were able to analyze large chunks of data in a shorter amount of time.
10.2 Learnings
10.2.1 Computer Science
Technologies
Throughout the project, the team learned how to adapt and use the firm’s technologies. These
technologies included the Azure Data Lake, Databricks, Pandas, and Power BI. While the team had
familiarity with Python and SQL, we had never programmed or interfaced with the data lake and Power BI.
By speaking with employees of the firm, we were able to ask about the company’s best coding practices,
development setup, and advice on how to write in Pandas and Databricks. The team, however, did not
have as much support when working in Power BI. We relied on YouTube tutorials, Microsoft
documentation, and experimentation to develop the final deliverable.
Optimization
Vectorization and GroupBy
During development of various Data Frames, we understood our commands had to run relatively quickly
and utilize the matrix feature of Pandas. At first, we used for-loops and Pandas’ version of for-loops to
iterate through the large matrices. Although our for-loop code produced accurate results, the
commands were relatively slow for large datasets. A software developer at the firm suggested using
different groupby techniques to apply functions on an entire column or group as opposed to rows.
When these techniques were implemented, our run times dropped substantially. As a result,
the team coded with vectorization in mind and structured the Data Frames with temporary columns to
allow for quick calculations and analysis.
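The contrast between the two approaches can be illustrated with a small sketch. The data, the `GP_Change` column name, and both function names are hypothetical. The example computes a month over month GrossProfit change per Strategy, first row by row and then with `groupby` and `diff`.

```python
import pandas as pd

# Hypothetical sample data: one row per Strategy per PeriodEndDate.
df = pd.DataFrame({
    "Strategy": ["A", "A", "A", "B", "B"],
    "PeriodEndDate": pd.to_datetime(
        ["2019-06-30", "2019-07-31", "2019-08-31", "2019-07-31", "2019-08-31"]),
    "GrossProfit": [100.0, 120.0, 90.0, 50.0, 80.0],
})


def mom_change_loop(frame):
    """Row-by-row version: correct, but slow on large Data Frames."""
    frame = frame.sort_values(["Strategy", "PeriodEndDate"]).reset_index(drop=True)
    changes = []
    previous = {}  # last GrossProfit seen for each Strategy
    for _, row in frame.iterrows():
        last = previous.get(row["Strategy"])
        changes.append(float("nan") if last is None else row["GrossProfit"] - last)
        previous[row["Strategy"]] = row["GrossProfit"]
    frame["GP_Change"] = changes
    return frame


def mom_change_vectorized(frame):
    """Vectorized version: one grouped diff over the whole column."""
    frame = frame.sort_values(["Strategy", "PeriodEndDate"]).reset_index(drop=True)
    frame["GP_Change"] = frame.groupby("Strategy")["GrossProfit"].diff()
    return frame
```

Both functions return the same values; the first period of each Strategy has no prior month and is left as NaN. The vectorized version avoids the Python-level loop entirely, which is where the speed-up on large tables comes from.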
Query Optimization
Each team member worked individually on different alerts, so each person wrote their own SQL queries.
When merging the team’s code together into one notebook, we realized the commands took a
considerable amount of time. After running some tests on the code base, we learned that some SQL
queries took minutes to complete, while Pandas commands executed in a tenth of a second. As a result,
the team extracted their SQL commands and made four relatively large Data Frames at the beginning of
the program to be shared amongst the different alerts. By completing this task, we significantly cut
down our run times.
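A minimal sketch of this "query once, share everywhere" structure follows. It assumes an in-memory SQLite table in place of the Databricks source, and the function names and sample rows are invented for illustration; the point is only that each alert receives the pre-loaded Data Frames instead of issuing its own query.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the data lake table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE irr_results (Strategy TEXT, PeriodEndDate TEXT, IRR REAL, MOIC REAL);
    INSERT INTO irr_results VALUES
        ('A', '2019-07-31', 0.10, 1.2), ('A', '2019-08-31', 0.12, 1.1),
        ('B', '2019-07-31', 0.05, 1.0), ('B', '2019-08-31', 0.07, 1.3);
""")


def load_shared_frames(connection):
    """Run each expensive query once, at the start of the program."""
    return {
        "irr_results": pd.read_sql_query("SELECT * FROM irr_results", connection),
    }


def alert_opposite_moves(frames):
    """Rows where IRR and MOIC moved in opposite directions month over month."""
    df = frames["irr_results"].sort_values(["Strategy", "PeriodEndDate"])
    by_strategy = df.groupby("Strategy")
    irr_delta = by_strategy["IRR"].diff()
    moic_delta = by_strategy["MOIC"].diff()
    return df[(irr_delta * moic_delta) < 0]


def alert_missing_values(frames):
    """Rows where IRR or MOIC is missing."""
    df = frames["irr_results"]
    return df[df[["IRR", "MOIC"]].isna().any(axis=1)]


# Every alert reuses the same in-memory Data Frames; no alert queries the source.
frames = load_shared_frames(conn)
```

With this layout, adding another alert costs only in-memory Pandas work, and the number of SQL round trips stays fixed regardless of how many alerts exist.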
Integration
Midway through the project, our goal was to link our alerts table with the raw data table in Power BI.
We discovered that Power BI could only join tables on one field with a 1-to-1 relationship and would
allow for drillthrough only on that one key. As various alerts needed to be joined on different sets of
keys, we realized we had to re-design our entire Data Frame. At first, we tried creating a unique
identifying key for each row, but we realized that this system would not be able to provide enough
context for drillthroughs. In a similar manner, we then tried to create a column for each alert type in the
raw data table. We also entertained the idea of creating a customized raw data table for each alert type.
While the idea might have worked in Power BI, we quickly dismissed the idea because of the lack of
extensibility. Eventually, we realized we had to merge our alerts table into our raw data table. We
refactored our commands to allow for the merge, and by doing so, we were able to avoid the Power BI
join process and allow for immediate drillthrough. After many failed attempts, the team learned to be
agile and to stay aware of the limitations of integrating with another program.
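The final design, folding the alerts into the raw data table so that Power BI needs no join, might look roughly like the sketch below. The column names, alert names, and the `attach_alert` helper are hypothetical; the example only shows how alerts keyed on different sets of columns can each become a boolean column on one table.

```python
import pandas as pd

# Hypothetical raw data table exported to Power BI.
raw = pd.DataFrame({
    "Strategy": ["A", "A", "B"],
    "Sycode": ["S1", "S2", "S3"],
    "PeriodEndDate": ["2019-08-31"] * 3,
    "GrossProfit": [100.0, 40.0, 75.0],
})

# Two alert results keyed on different sets of columns.
gp_alert = pd.DataFrame({"Strategy": ["A"], "PeriodEndDate": ["2019-08-31"]})
price_alert = pd.DataFrame({"Sycode": ["S3"], "PeriodEndDate": ["2019-08-31"]})


def attach_alert(raw_frame, alert_frame, alert_name):
    """Left-merge an alert onto the raw table as a True/False column.

    The join keys are whatever columns the alert frame carries, so each
    alert can use its own key set without changing the raw table's rows.
    """
    keys = list(alert_frame.columns)
    merged = raw_frame.merge(
        alert_frame.drop_duplicates(), on=keys, how="left", indicator=True)
    merged[alert_name] = merged["_merge"] == "both"
    return merged.drop(columns="_merge")


combined = attach_alert(raw, gp_alert, "GP_No_Change_Alert")
combined = attach_alert(combined, price_alert, "Price_Change_Alert")
```

Because every raw row carries its own alert flags, a drillthrough from an alert page can filter the same table directly, and a new alert type only adds one column.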
10.2.2 Project Management
Working with clients
Throughout this project, we learned how to work with clients. The process of translating abstract ideas
into concrete business and functional requirements in the real world is very different than any
experience in a classroom. Using a variety of techniques, our team refined our ability to ask the right
questions and determine what the client and end users were really interested in. As we became more
familiar with the software environment, data structures, and financial terms, it became easier to identify
the needs of the client.
Additionally, we learned that taking good notes even during small interactions with our project sponsor
helped us keep a good record of feedback. This allowed us to triangulate a solution from all feedback
with a bias towards what feedback was most recent. If we only pursued what was mentioned at the
most recent meeting, as we often did early in the project, we would start many items, complete few,
and overall set difficult-to-achieve goals.
We also learned to speak in the terms of the user. As we became acclimated to the business
environment of the firm, we picked up on many of the terms used in the industry to communicate key
information. By learning the definitions of these terms, we began to have far more productive
conversations when gathering requirements for future versions.
Finally, we found that stating our interpretation of a sponsor’s directive and asking if we were correct was
a productive way to determine if we understood what was communicated. This technique allowed us to
catch any misunderstandings. Getting confirmation early and often was a constant theme throughout
the project.
Iterative development with feedback
Throughout our project experience, it was apparent how important it was to receive feedback on
prototypes quickly and consistently. A large portion of the early project work was spent focusing on
back-end development with minimal feedback from users. Once we developed an initial prototype and the
user had the product in their hands, we were able to make greater improvements to our overall product.
Although this experience confirms the value of user input, a main tenet of the agile
methodology, our project was developed with less initial user input because of other factors. The main
factor was the connection between the user interface and the back-end: due to the way Power BI
connects to Databricks, the table we exported from Databricks had to keep the same name and header
names, or every visual would need to be rebuilt. We spent a considerable amount of time understanding
this connection.
10.3 What we would do differently
1. Determine needs of client – priorities and whether it is a want or a need
To begin, we could have improved the methods by which we gathered requirements for the Power BI
dashboard. We were able to meet with the sponsor and the accountants on separate occasions to get
their feedback related to our product; however, we spent hours creating and fine-tuning features for the
dashboard that were later discarded. While meeting with our client, we tried to avoid this issue by
prioritizing features based on accountant feedback. Instead, we could gather requirements
by asking the accountants which features were “wants” versus “needs.”
2. Establish capability of tools with client
The team learned that it is essential to communicate technical limitations when developing a project for a
client. Although we were not experts in any of the technologies used, we gained experience, and it
became clear that some functionality desired by the sponsor would be either impossible or very time-
consuming to develop. As a result, we would convey the limitations of the tools to the sponsor early in
the design process and thus close the expectations gap between the team and the project sponsor.
3. Testing
Midway through the project, we received a set of validation files from the firm’s accountants; however,
we were advised by our sponsor to not use their numbers for tests. The team learned that the
accountants applied many complex and nuanced overrides that would have taken too much time to
replicate. As a result, we did not have a set of ground truths to test our code with. Instead, the team
tested alert calculations by running independent SQL queries. If we were to do the project again, we
would put more priority on asking for usable tests. The lack of an official ground truth created some
confusion for the team and thus slowed down development.
4. Team Communication
During the last few weeks of the project, the team had to focus on developing a testable Power BI
dashboard, which involved various tasks and required longer working hours. During this time, there was
a general concern about how long we would stay at the office. Although we agreed to work on certain
items until they were finished, we knew that we needed to set time expectations with one another. In
retrospect, we would have established clearer expectations regarding how long to stay at the office and
proactively established the priority of certain tasks.
5. Technical Mentors
Throughout the project, we met with several firm employees who gave us coding tips, set up tutorials,
and provided feedback on our dashboard. Each time we met with them, we learned how to approach
problems in new ways and gathered clear project requirements. As a result, we feel that having more
conversations with firm team members would have benefited the team greatly and may have increased
our productivity.
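The cross-checking approach described under item 3 (Testing) can be illustrated with a small sketch: an
alert is computed once in application code and once with an independent SQL query, and the two results
are compared. The table name, column names, sample rows, and the particular alert (a terminal value
dropping to zero between two months) are hypothetical and chosen only for illustration; the in-memory
SQLite database merely stands in for the project's actual data store.

```python
import sqlite3

# Hypothetical sample data: (strategy_code, month, terminal_value).
rows = [
    ("A1", "2019-10", 120.0), ("A1", "2019-11", 0.0),
    ("B2", "2019-10", 80.0),  ("B2", "2019-11", 95.0),
    ("C3", "2019-10", 0.0),   ("C3", "2019-11", 0.0),
]


def alert_in_code(rows, prev_month, curr_month):
    """Strategies whose terminal value was nonzero last month and is zero now."""
    prev = {s: v for s, m, v in rows if m == prev_month}
    curr = {s: v for s, m, v in rows if m == curr_month}
    return sorted(s for s in curr if curr[s] == 0 and prev.get(s, 0) != 0)


def alert_in_sql(rows, prev_month, curr_month):
    """The same alert, computed independently with a SQL self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (strategy_code TEXT, month TEXT, terminal_value REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    query = """
        SELECT c.strategy_code
        FROM results AS c
        JOIN results AS p ON p.strategy_code = c.strategy_code
        WHERE p.month = ? AND c.month = ?
          AND p.terminal_value <> 0 AND c.terminal_value = 0
        ORDER BY c.strategy_code
    """
    found = [r[0] for r in conn.execute(query, (prev_month, curr_month))]
    conn.close()
    return found


code_result = alert_in_code(rows, "2019-10", "2019-11")
sql_result = alert_in_sql(rows, "2019-10", "2019-11")
# Without an official ground truth, agreement between the two paths is the check.
assert code_result == sql_result, "alert logic and SQL cross-check disagree"
```

Agreement between two independently written computations does not prove correctness, but a
disagreement reliably points to a bug in one of them.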
11. Conclusion
While at the firm, the team improved the Winners and Losers report generator and developed an Azure
Validation Dashboard. By adding documentation to the Winners and Losers report generator, we were
able to help future firm employees maintain the code base. By building the Power BI dashboard, we
provided the firm’s analysts with robust and transparent calculations in a cloud-independent
environment.
Although we faced many challenges such as identifying requirements, planning appropriately, learning
domain knowledge, and optimizing our code base, we were able to overcome them by planning with the
end user in mind, iteratively developing with regular feedback, and learning powerful new tools.
At the end of the project, we were able to present our deliverables to our sponsors and exceed their
expectations.
As a firm analyst, I want to add columns in the Excel template, so that I don't have to manually edit the report.
1 1 16 1
As a firm analyst, I want to populate the modified template with data corresponding to the column names, so that I don't have to manually input data into the report.
1 1 24 1
As a firm analyst, I want to delete columns in the Excel template, so that I don't have to manually edit the report.
1 1 16 1
As a firm analyst, I want to modify columns in the Excel template, so that I don't have to manually edit the report.
1 1 24 1
As a firm employee, I want to learn how to use the report system, so that accounting can manually produce reports.
1 1 24 1
As a firm employee, I want to be guided through the use of the win-loss reporting system so that I can change the output of the report.
1 1 12 1
As a firm developer I want a Video Tutorial to Guide me through adding a column to the Win-Loss Report so that I can use the report generator more effectively.
1 1 3 2
As a firm analyst I want to be able to choose variable months for my diff report so that I can validate any month pair in the database.
1 2 24 2
As a firm trainee, I want to find the difference in IRR over 1 month, so that I can learn how to use DataBricks and interact with the DataLake.
1 2 12 2
As a firm analyst I want to know if a Strategy switched from being a gain to a loss or vice versa so I can recognize performance changes which affect the overall fund.
3 2 12 2
As a trainee I want to be able to get basic information from monthly data so that I can decide what is valuable to include in the report.
3 2 12 2
As a firm analyst, I want to see the biggest month over month change in IRR at the strategy code level over the last N months, so I can make an informed decision about investing.
1 2 4 2
As a firm accountant I want to know if remMV values changed when there was no trading activity so that I can check if the data is correct.
1 2 12 2
As a firm analyst I want to know if any terminal values went to 0 over the last month so I am aware of any closed positions in a fund.
1 2 12 2
As a firm analyst, I want to see the biggest month over month change in MOIC at the strategy code level over the last N months, so I can make an informed decision about investing.
1 2 4 2
As a firm analyst, I want to see the biggest month over month change in GrossProfit at the strategy code level over the last N months, so I can make an informed decision about investing.
1 2 4 2
As a firm analyst, I want to know the "Buy and Sell" transactions over the last month, so I can make an informed decision about investing.
1 2 8 2
As a firm accountant I want to know if any terminal values changed when there was no Buying or Selling activity because this is indicative of incorrect Data copy.
1 2 36 2
As a firm accountant, I want to be able to see the largest difference in IRR between two months so that I do not have to manually find it.
3 2 12 3
As a firm accountant, I want to be able to see the largest difference in Gross Profit between two months so that I do not have to manually calculate it.
3 2 12 3
As a firm accountant, I want to know which Strat codes changed from ongoing to monetized from one month to the next to understand which stratcodes affect overall fund performance.
3 2 8 3
As a firm dev I want to know if the total terminal value is zero because if it is it should be monetized.
1 2 10 3
As an accountant I want to be able to drill down in the raw files so that I can see where the data may be incorrect.
2 2 18 3
As a firm analyst I want to filter down to specific funds so that I can perform more accurate validation checks.
2 2 6 3
As a developer I want to figure out how to connect dataframes from Databricks to PowerBI and create tables out of dataframes so that I don't have to manually create a report.
4 2 48 3
As a firm accountant I want to see data points that are outside a number of standard deviations from what are normal so that I can identify extraneous data.
1 2 24 3
As a firm accountant, I want to see missing data (IRR, MOIC, GrossProfit, Total_Cost, Total_Sales, Total_Terminal_Value) in irr_results and irr_mod_cashflows for a certain month, so that I can fix them.
1 2 24 3
As a firm trainee I want to understand which analyses are performed on which tables.
1 2 4 4
As an accountant I want this report to be easy to use so that I can accurately check company reporting.
3 2 12 4
As an accountant I want to take the sum of inflow values in cashflow, so I can use that to further analyze cashflow data.
3 2 4 4
As an accountant I want to take the sum of outflow values in cashflow, so I can use that to further analyze cashflow data.
3 2 4 4
As an accountant I want to take the sum of total terminal values in cashflow, so I can use that to further analyze cashflow data.
3 2 4 4
As an accountant I want to be able to drill down from an alert to the specific information in the results or cashflows table.
4 2 12 4
As a firm accountant, I want to know if a strategy in a fund is ongoing and if it has quantity or accrued interest, so that I can determine why there is a notable change in the data.
1 2 12 4
As a firm accountant, I want to know when there is a break in the time series for strategies, so that I can determine why there is a break.
1 2 12 4
As a firm accountant, I want to see a line graph showing the changes in price for a given SyCode, so that I can predict where it might go next.
1 2 4 4
As a firm accountant, I want to know the average change over any time period for MOIC, Gross Profit, Total Cost, Total Sales, after specifying a strategy so that I can understand changes in Strategies over time.
3 2 48 4
As a firm accountant, I want to see the biggest month over month change in TotalCost at the strategy code level, so I can make an informed decision about investing.
3 2 12 4
As a firm accountant, I want to see the biggest month over month change in TotalSales at the strategy code level, so I can make an informed decision about investing.
3 2 12 4
As an accountant I want to be able to have a report that updates automatically so I always have the most up to date information.
4 2 8 4
As a firm accountant, I want to see if strategies in funds with end dates are monetized, so that I can determine why.
1 2 4 4
As a firm accountant, I want to know if a strategy in a fund is monetized and whether it has no quantity and no market value, so that I can determine why there is a notable change in the data.
1 2 8 4
As an accountant I want to see the biggest sycode move for any strat code so I can further analyze that strat code.
3 2 4 4
As an accountant I want to see when MOIC and IRR are moving in opposite directions so I can further analyze the story associated with it.
1 2 12 4
As an accountant I want to know when GP does not change and there are many transactions.
1 2 8 4
As an accountant I want to know when RemMV changes and there are many transactions.
1 2 8 4
As an accountant I want to know the month to month price changes for a sycode, so that I can see the biggest moves in sycode price.
1 2 8 4
As a firm accountant, I want to check if there are begin dates for strategies, so that I can see why there might be none.
1 2 6 4
As an accountant I want to see if a monetized portfolio has a terminal value OR RemMV which changes from 0 to any number.
1 2 12 4
As a firm accountant, I want to see if strategies in funds with a terminal value of 0 are monetized, so that I can determine why.
1 2 6 4
As a firm accountant, I want to see which strategies are new, so that I can determine which strategies do not have previous data.
1 2 4 4
As a firm accountant, I want to see if a sycode belongs to multiple strategies, so that I can determine how to override the data.
1 2 2 4
As a firm accountant, I want to see whether prices for sycodes change across funds, so that I can see if there were inconsistencies in the data.
1 2 6 4
As an accountant I want to see human readable commentary on which strategy codes influenced the portfolio, moved the most.
3 2 12 5
As a firm accountant, I only want to see BKRT, so that I can make decisions on a more meaningful dataset.
3 2 2 5
As a firm accountant, I want to stratify the data by Region, so I can gain insights about the progress of each region.
1 2 18 5
As an MQP student, I want to structure the final paper so it accurately describes our work at the firm, and so it is not based solely on our proposal.
6 2 6 5
As a firm accountant, I want to drill through alerts, so that I can prove that an alert is valid.
1 2 60 5
As a firm accountant I want to be able to use filters on every page for common fields such as portfolio, Business Unit, Strategy, Region, Sycode, ALERT ATTRIBUTE so that I can universally filter displayed data.
1 2 2 5
As a firm accountant, I want to see projected values based on historical data like standard deviation and linear regression, so that I can determine if my numbers are in a reasonable range.
3 2 20 5
As a firm accountant I want to see the Alert Description without the Extra linked column Visible.
1 2 4 5
As an MQP team member, I want to refresh and update our paper, omitting the previous technologies and writing about the new technologies used.
6 2 4 5
As a firm accountant I want to be able to see The DealName as well as StrategyCode Because I know deal names better than Stratcode.
5 2 2 5
As a firm accountant when Viewing the GP No-Change Transactions Rules I want to be able to drill down to transactions.
1 2 12 5
As a firm accountant I want Extreme IRR Values to be Filtered Out (Perhaps greater than 1000) Before the Standard Deviation Is calculated So that I only see useful Data.
1 2 3 5
As a firm trainee I want to meet with Users of the dashboard to better understand their needs.
5 2 6 5
As a firm accountant I want to Disable Grand totals on non-Applicable Fields.
5 2 5 6
As a firm accountant I want the ToolTip on Diffs to show the two values used to calculate the DIFF.
5 2 8 6
As a project sponsor, I want to see when a MOIC and IRR are different in my own terms, so that I can be alerted when it happens.
1 2 4 6
As a firm accountant I want the Closed_fund_transactions field to be renamed as the monetized_stratcode_with_transactions and to only check for non-null values in the RESID Column.
1 2 4 6
As an accountant, I want to see DealName and StrategyRegionofRisk as columns and as filters so I can effectively analyze the data and utilize the PowerBI Dashboard.
2 2 4 6
As an accountant, I want to be able to filter on portfolio (including ALL), so that I can assess strategies on a general level.
1 2 24 6
As an accountant I want to see the Absolute value of all Diffs so that I can sort them.
5 2 12 6
As an accountant I want to see relevant usable filters on each report page so that I can filter the data appropriately.
2 2 4 6
As an accountant, I want to just see values where the alert is true, so I see data respective to that alert.
5 2 4 6
As an accountant I want a Top Level Summary page that contains the data and a well organized way to access alerts.
5 2 24 6
As a developer, I want to learn how to properly use the slicer to arrange data to the accountant's satisfaction.
5 2 3 6
As an accountant, I want to see a flat list of transitions: raw data.
2 2 3 6
As a user of the PowerBI dashboard, I want the column names to be easier to understand, so I can better understand how the data is represented.
5 2 20 6
As an accountant, I want to see a description of each page, so I understand how to use the data provided and further understand the alert and its check.
2 2 12 6
As a firm developer I want to see commented Code so that I can maintain the software.
6 2 3 6
As a firm developer I want to add "changes" to IRR, MOIC, Buttons so that the RAW explorer is more useable.
2 2 1 6
As a firm accountant I want to filter the entire Report by Investment type so that I do not see irrelevant cash transactions.
1 2 2 6
As a project sponsor I want the headers of each page to be the same on every page so that there is consistency in design.
1 2 3 6
As an accountant I want to see commentary for all return periods 1, 3, 5 year to date.
1 2 12 6
As a project sponsor for each dealname I want to see the Average Min Max LR for IRR, MOIC.
1 2 24 6
As a project Sponsor I want to see a graph of time series data with IRR GP MOIC all in one visual.
2 2 1 6
As a project sponsor I want to see a Descending sort of a diff between current value and average Value.
2 2 8 6
As an accountant, I want the commentary to be split up into sections based on ReturnPeriod, so that I can easily digest the commentary section.
5 2 24 6
As a firm accountant I do not want to see total terminal value transaction types on gross profit same but transaction exists page because these types are not relevant.