Published
- 13 min read
- by Alp
Financial Statement Analysis Software | Part 1
The thumbnail for this post was generated in part using DALL-E.
This is Part 1 of a multi-part series.
I: Metagoals
Welcome back, dear reader! Let’s get some top-level information out of the way before we start.
I’ve decided that my first post of real substance will be about a little project that I am working on at the moment. The goals of this post and its upcoming successors are to discuss the following:
- How to approach a software project in its different phases.
- How to tackle development, especially when working alone or with a very small team.
- Some simple principles of software, applied.
- What I implemented, why, how, etc.
Notice how the implementation itself is of least importance? That is the goal. I believe there is no shortage of “how to” guides on the internet. Plus, if you’re here reading a blog on software, I’m guessing you either A) develop software for a living, or B) are curious about software processes and trends, or C) are interested in me. If it’s that last case: hii (DM me) ;) Notice how the demographics described by A and B would not benefit from a step-by-step guide to get “heavily inspired” from. It’s also probably easier to ✨proompt✨ your way into something like that these days anyway.
TL;DR I: This is Part 1 of many. Its goal is not to show you how one builds a financial analytics tool. For the actual goals, see ordered list above.
II: Obligatory life story
My “serious” capacity for software was really built in university. Before that, the best I could do was for loops, and maybe recursion? I can’t remember, to be honest.
My first ever non-school project was developing a simple analytics pipeline that would read some data coming from a “clocking-in machine” in a factory and determine the degree to which every worker was clocked in during their scheduled hours. Let’s discuss that project and its ethical and socio-political implications some other time. What is relevant to this here post about that project is that I—starry-eyed second year that I was—botched it pretty good.
The finished thing worked in the end, and was much more performant than both my client and I had expected. The thing is, by being amazingly performant and whatnot, I had saved the client a couple of minutes out of an hours-long, monthly process. The flip side was that I built the whole thing in Java + Scala, so the interface was properly eye-bleed-inducing and some poor personal assistant had to figure out how to install and set up the JVM just to run it. Needless to say, this person, whose computer usage was presumably limited to tools they got corporate training on, never figured it out¹. And yes, I did send them the Java downloads page. I didn’t realise this at the time, but the whole thing was probably a novelty to them and not worth the effort.
Still, it was fun to have this company’s IT lead eat his words for saying it would take a team months to build this tool. Needless to say, it took me less; I wouldn’t be flexing if it had not. Oh, I also lost all the money I made on this project to a trip to Italy that I had to burn, because it was planned for the week that a certain globe-trotting virus first jumped to Europe, by way of Italy.
Why did I tell you this story? Though the homage to recipe blogs was also my intention, that was secondary. There are learnings to be learned from this learning opportunity. I had built this tool in a needlessly complicated way out of misguided notions of what is right and good. My main takeaways from that fiasco can be summarised as the following:
- Don’t overcomplicate things. I built a tool that could probably handle terabytes of data in a distributed manner. It only ever processed a single file at a time, once a month, on a laptop. A single file was at most one gigabyte in size.
- Build what the owner/stakeholder/client wants and needs, even if they don’t know that themselves. The client took a look at the ugly UI, took another look at how “it wouldn’t run” and decided it was amateur work. In some cases it was, but not in the ways that they thought about it. The amateur-ness lies in the ways in which I allowed the client to think that. If only they could appreciate the distributed, functional analytics pipeline I built, right? They almost never will, unless it leads directly to functionality that they want. And them’s the facts.
- Research before you start. I ended up reinventing ETLs (and as all accidental reinventions of the wheel go, what I built was worse). I also forgot that an Excel sheet can simply be exported as a CSV, and spent about one third of the project trying to get Apache POI to work with me.
TL;DR II: My first ever project was an analytics tool. Outside of fulfilling the main requirement, it failed in almost all ways that a software project can fail. There are takeaways from that project that are relevant to this one. See unordered list above.
III: Getting the project
A childhood friend of mine sent me a message asking to have a chat about a thing he’s working on. One of many dimensions of our relationship is that I’ve become his software friend. A cultural hypothesis of mine is that everyone born after 1990 either has a software friend or is one.
We had a very nice catch-up session (reach out to your friends, now!), followed by an explanation of what he was working on. It was a collection of metrics, based on financial reports, for choosing companies to invest in for the long term. He’d built a spreadsheet with three main sections: one for the financial data points themselves, a second for the metrics derived from the data points, and a third for applying some rule to the metric values to assign a kind of OK/NOK per metric. He had to fill the financial data points in by hand, which took a lot of time and was prone to (human) error.
He was essentially asking if we could software magic our way into an automated version of this. So I got my requirements analysis hat on and we got talking. I should mention that this is not a paid project, and the requirements are very loose. It still pays to establish some where we can.
Intermezzo I: Agreeing on requirements
This project is more of a side-project, so this very important section exists as an intermezzo. If you’re building software for money, check your requirements with every (real²) stakeholder. Look them in the eyes as you have them confirm your requirements. Then get written confirmation. Written communication can count if the action space you’re in allows for using it as proof.
If there’s the faintest shadow of a doubt, go back and repeat all steps again and again, until absolutely no doubt remains.
This advice might make you uneasy, especially if you’re not used to being forceful in this context. You don’t have to be as much of a zealot as I described. But to be clear: if you aren’t doing this, you’re doing yourself a disservice.
Intermezzo I over.
Let’s get back to our little project. The requirements, laid down with no particular structure, method or much effort, are:
- Get financial report data. Yahoo Finance is preferred at first.
- Read the data points in the spreadsheet from the financial report and add them to the output.
- Calculate the metrics in the spreadsheet from the data points and add them to the output.
- Calculate the OK/NOK based on the metrics.
And here are some that I assumpulated:
- Have it be easy to run without much setup. Or provide a guide or setup script.
- Have it be easy to add or change data points, metrics and OK/NOK rules.
- Print out in a way that is as close as possible to the sample spreadsheet.
- Make it possible to take in a list of companies and execute the analysis on all of them.
With these in mind, let’s get to work.
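To make the three layers from the requirements a bit more concrete (data points, metrics derived from them, and an OK/NOK rule per metric), here is a rough, dependency-free Python sketch. It is not the implementation from this project; that comes in Part 2. Everything in it is an assumption made for illustration: the `Metric` class, the `analyse` and `analyse_many` functions, the two example metrics, their thresholds and the sample numbers are all made up, and the data is hard-coded rather than fetched from Yahoo Finance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Raw figures for one company, keyed by a data point name.
DataPoints = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """One spreadsheet column: how to compute it and how to judge it."""
    name: str
    compute: Callable[[DataPoints], float]  # derives the value from data points
    is_ok: Callable[[float], bool]          # rule that maps the value to OK/NOK


# Hypothetical metrics and thresholds; the real ones live in the spreadsheet.
METRICS: List[Metric] = [
    Metric("net_margin",
           lambda d: d["net_income"] / d["revenue"],
           lambda v: v >= 0.10),
    Metric("debt_to_equity",
           lambda d: d["total_debt"] / d["equity"],
           lambda v: v <= 1.0),
]


def analyse(data_points: DataPoints) -> Dict[str, dict]:
    """Compute every metric for one company and attach its OK/NOK verdict."""
    report = {}
    for metric in METRICS:
        value = metric.compute(data_points)
        verdict = "OK" if metric.is_ok(value) else "NOK"
        report[metric.name] = {"value": value, "verdict": verdict}
    return report


def analyse_many(companies: Dict[str, DataPoints]) -> Dict[str, Dict[str, dict]]:
    """Run the same analysis over a list of companies, keyed by ticker."""
    return {ticker: analyse(points) for ticker, points in companies.items()}


if __name__ == "__main__":
    # Made-up tickers and figures, purely for illustration.
    sample = {
        "AAA": {"revenue": 200.0, "net_income": 30.0,
                "total_debt": 50.0, "equity": 100.0},
        "BBB": {"revenue": 200.0, "net_income": 10.0,
                "total_debt": 150.0, "equity": 100.0},
    }
    for ticker, report in analyse_many(sample).items():
        for name, row in report.items():
            print(f"{ticker:4} {name:15} {row['value']:.2f} {row['verdict']}")
```

Keeping the metrics and their rules together in one list is just one way to meet the “easy to add or change” requirement: adding a metric means adding one entry, and nothing else has to move.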
TL;DR III: A friend called and asked me to software-ise his financial statement analysis. I got requirements. Requirements rock.
IV: Scouting the field
Before starting, a couple quick checks were in order. First I checked whether the Yahoo Finance API was accessible in any way. Turns out yes, and turns out there’s even a Python package that does that for you. Yay for less work³!
At this stage I go ham on Google and a number of LLMs. The idea is to collect a wide range of introductory information in as short a time as possible. We will dive into deeper details (and READ THE DOCS!) later. Turns out we can build a POC for this idea using just yfinance and pandas. Neat. Let’s begin there.
Intermezzo II: Reveal the sculpture within
You might be familiar with that Michelangelo (not the turtle) quote that goes something like “I don’t create sculptures, they already exist within the marble. All I do is reveal the sculpture within.”, but like in 15th century Florentine or something (yes, I had to look up the century; shame).
Why am I waxing (or marbling, hehe) poetic here? Let’s build a metaphor around methods of manufacture.
On one hand, we have those crafts that produce an end result by way of addition. On the other we have those that produce by way of subtraction from a source. In either approach, one can add or subtract in equal, decreasing or increasing steps.
In my experience, and therefore also in my metaphor, there are no (efficient) approaches that make use of an increasing increment. Those approaches or crafts where increments are equal in size are forced into this by their tools or materials. Examples of equal-size increments in manufacture include knitting, weaving and (most) 3D printing.
The remaining approach is that of either adding or subtracting increments of decreasing size. An additive example can be laying pipes, where one starts with wider and/or longer pipes that feed a whole system (such as your house), then moves on to narrower and/or shorter ones as the finer details are reached (such as your shower head). Subtractive examples are also plentiful, such as sculpting and machining⁴.
Think sanding. Think simulated annealing. Think binary space partitioning. Think Michelangelo.
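For the numerically inclined, here is a toy Python sketch of the difference between equal and decreasing increments. It is only an illustration of the metaphor, under made-up assumptions: both function names, the target of 100, the first step of 64 and the tolerance of 1 are invented for this example. Each function counts how many additions it takes to get from zero to (roughly) the target.

```python
def fixed_steps(target: float, step: float) -> int:
    """Count equal-sized additions until less than one step remains."""
    position, count = 0.0, 0
    while target - position >= step:
        position += step
        count += 1
    return count


def shrinking_steps(target: float, first_step: float, tolerance: float) -> int:
    """Add the current step while it still fits, otherwise halve it.

    Stops once the remaining gap is within the (positive) tolerance.
    Only actual additions are counted, not the halvings.
    """
    position, step, count = 0.0, first_step, 0
    while target - position > tolerance:
        if position + step <= target:
            position += step
            count += 1
        else:
            step /= 2  # too coarse for what is left, switch to a finer tool
    return count


if __name__ == "__main__":
    print("equal increments:     ", fixed_steps(100, 1))
    print("decreasing increments:", shrinking_steps(100, 64, 1))
```

With these particular numbers the equal-increment version needs 100 additions, while the decreasing one gets there in 3 (64, then 32, then 4).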
Now, one can think of building software as either a subtractive or an additive process of manufacture. In either case, one can progress much faster by starting with larger chunks of addition or subtraction and moving to smaller and smaller chunks for finer details. This is especially valuable when faster results are desirable at the cost of absolute technical perfection, which is to say almost always. In the case of this project, it is important for me to report some early results. This way, my friend can validate that what I envisioned when we had our first talk was close to what he had in mind.
If not, we iterate. This great “invention” of Scrum is something we often overlook when leveling our weapons at each other in the great war between the Holy Army of our Scrum Master and the anti-Scrumarchy insurgency. It’s fine to not build the end product on the first go. That’s actually almost the whole point, because it’s basically impossible to do that.
Intermezzo II over.
And so, as discussed in Intermezzo II, we start with adding as large chunks as we can manage. Just two packages and a little elbow grease will do for our first “deliverable”.
V: Much ado about nothing
Yes dear reader, we didn’t even get to start implementing the “Financial Statement Analysis Software” yet. I think this post has gotten large enough, both in size and in meaningful content. This structure also maps remarkably nicely to how any real project is done; even in a project as small as this one, the first stage of discovery, requirements analysis, research and—dare I say—design should take a substantial amount of time. “Measure twice, code once,” as they say. As a bit of foreshadowing to Part 2, this whole stage (including the relevant parts of the call with my friend-client) took 2-3 hours, while the whole implementation of the first deliverable took about 4.
We’ll dive more into the implementation, some pontifications about the implementation and takeaways from what I built on this first round of iteration in Part 2.
Until then, take care!
VI: Footnotes
1: Tangent about computer usage
Did you, dear reader, recoil a little at my presumptions about an imaginary personal assistant’s ineptitude with computers? To be honest with you, I thought to myself “That might be a little bit too edgy”, but then I remembered this little story.
A friend of mine was working at an international non-software company as a data science intern; let’s call this company ACorp from now on. This was at the global HQ of ACorp too, by the way. ACorp did not directly sell software, but had a lot of internal software tools and projects. Most of these were being outsourced for exorbitant sums (compared to what got delivered, that is), and some were being handed to interns. I should also mention that ACorp only hired interns that were honors students and more or less expected them to deliver work similar in quality to the six to seven figure projects they outsourced. While working part-time alongside their studies. While being paid less than 60% of the minimum hourly wage of the country (yes, that’s legal there, because they’re students). Absolutely mad.
The main story is actually quite a small event. ACorp hires a Business Intelligence expert or something. The hire before this was an ex flight attendant with no education or past experience in math, software, engineering or data science by the way. It goes without saying that my friend—the intern—ended up mentoring this guy who was getting paid at least five times as much. So my friend—the intern—is thrilled about this new, expert hire. Finally someone who can show them the ropes, some mentorship. Right? Nnnnope!
After a week on the job, the expert comes to my friend—the intern—and confesses that they could not download the BI tool that this company uses. My friend—the intern—looks into it and sees that the expert does not have python installed on their machine; chalks it up to a little mistake since it’s a new computer. My friend—the intern—tells the expert to just install python and you should be fine. In full view of my friend—the intern—this guy proceeds to type in “python” into the Google search bar, click the top image of the python logo, click save, turn to my friend—the intern—and ask “Are we done then?“.
I believe this story doesn’t need a wrap up or conclusion. ⤾
2: Real Stakeholders
Sharon, the associate project manager of your sister product team, does not count. Neither does the new junior engineer who makes themselves a part of every discussion. Only people who have a real say in the matter are real stakeholders. I’ll write more about these dynamics some other time. Can’t wait. ⤾
3: Using the Yahoo Finance API
I’m actually interested in doing this, and might do it in one of the following parts, though it’s not at the top of my priorities. Almost all API integrations come with nice decisions to make, but if we can abstract this away for our proof of concept, I’m more than happy to do that. See Intermezzo II about that. ⤾
4: Reductive Metaphor is Reductive
Yes, all of these crafts make use of additive and subtractive methods, that’s why the metaphor is reductive. Let’s keep moving. ⤾