Week 3 & 4: Can 2 guys with a laptop beat an army of analysts?

It might be an audacious question, but it is the one we are ultimately trying to answer. Can two guys write an algorithm, using XBRL, that does what huge firms like Bloomberg, Reuters and Morningstar pay hundreds, if not thousands, of analysts to do?

Last week’s article never showed up because we were too busy working on one of the cornerstones of our business: obtaining data.

At uuptick we are developing a stock analysis platform which allows retail investors to have a centralised suite of tools to help them make the best decisions for their portfolio.

We believe we are approaching the problem from a different angle than our competitors, trying to solve retail-investor-specific pain points by involving a group of 25 investors in the creation of the platform.

But what our testers see is just the tip of the iceberg of what goes on behind the scenes on Robert’s laptop. As you can imagine, any company whose core business is data visualisation needs data.

As far as stock analysis products go, competitors have usually gone down two routes:
• Hire analysts to create your database of financial statements. This is what Reuters, Bloomberg, Morningstar and FactSet do, among others. While this is effective, it is not efficient: the cost of paying analysts to normalise financial statements is invariably high.
• The other option is to buy the data from one of these companies. It won’t be cheap, but it is an extremely easy way to leverage another company’s work. This is the case for companies like Stockopedia, Old School Value and so on. It is an interesting option since you get easy access to data, but you are invariably exposed to any human errors that made their way in. Your business is also highly dependent on your supplier.

Neither of those options seemed like a satisfactory choice for us when we were developing uuptick. During lunch with Brian Pinchuk, a portfolio manager at Lorne Steinberg Wealth Management, he mentioned XBRL filings. I had never heard of them, and the four-consonant acronym was already giving me a headache.

I went back home and looked it up. It turned out XBRL was short for eXtensible Business Reporting Language, a variant of XML geared towards business reporting. It covers all kinds of reports, and the technology is used differently depending on the geography. For example, one of our testers, Luc de Vos, has recently invested in Finactum, a Belgian startup which uses XBRL filings for Belgian companies.

The reports we were interested in were the annual and quarterly reports containing the financial statements of publicly listed companies. I then found out that since 2009, any public company that reports to the SEC must file in XBRL as well as the traditional human-readable format. These documents are in the public domain and can be used to build our database of financial statements.

I called Robert, extremely excited. “This is it,” I said, “just download these, slap them in a database and everything will be Gucci.”

Little did I know that for the next 12 months we would be scratching, if not banging, our heads, figuring out how to use them effectively.

There were several tasks we needed to do:
• Strip the files to keep only the data we need.
• Display a company’s statements.
• Display historical statements side by side.
• Make calculations on the statements (ratios, growth rates, etc.).
• Compare these calculations among companies, to discriminate within a screener.

You can’t even imagine the amount of work required to do this effectively. We encountered roadblocks at pretty much every step.

The point is that we decided to take on the ambitious challenge of writing an algorithm which can map all the data between companies and throughout history.

To do so, we must create hundreds of instances of test data to map the data effectively. We will then feed these rules into a machine, applying an element of machine learning to the data. What we want is for our algorithm to recognise a problem, understand what treatment must be applied, and apply it every time we add a company to our database, as well as every time a company files a new statement.
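
To make that concrete, here is a minimal sketch in Python of how such treatments could be stored and replayed at ingest time. The rule format and all of the names (Treatment, RULEBOOK, ingest) are hypothetical illustrations for this post, not our actual code.

    # A sketch (all names hypothetical) of storing normalisation rules and
    # replaying them every time a new filing is ingested.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Treatment:
        """A named fix-up applied to the raw facts of a filing."""
        name: str
        applies_to: Callable[[dict], bool]   # does this filing exhibit the problem?
        apply: Callable[[dict], dict]        # return the corrected facts

    # Example rule: a company switched fact names between filings.
    rename_cost_of_revenue = Treatment(
        name="rename CostOfGoodsSold -> CostOfRevenue",
        applies_to=lambda facts: "CostOfGoodsSold" in facts
                                 and "CostOfRevenue" not in facts,
        apply=lambda facts: {("CostOfRevenue" if k == "CostOfGoodsSold" else k): v
                             for k, v in facts.items()},
    )

    RULEBOOK = [rename_cost_of_revenue]      # in practice, hundreds of rules

    def ingest(facts: dict) -> dict:
        """Run every applicable treatment over a freshly downloaded filing."""
        for rule in RULEBOOK:
            if rule.applies_to(facts):
                facts = rule.apply(facts)
        return facts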

I’m now going to let Robert discuss the technical challenges of what we are trying to accomplish.

Robert:

According to the SEC:

All companies, foreign and domestic, are required to file registration statements, periodic reports, and other forms electronically through EDGAR. Anyone can access and download this information for free.

Since 2009, companies have been providing quarterly (10-Q) and annual (10-K) filings to the SEC in a machine-readable format. This format is called XBRL (eXtensible Business Reporting Language). The SEC provides a number of standard definitions, called taxonomies, based upon the US GAAP taxonomy, which companies are supposed to adopt when preparing their filings.
A taxonomy defines a structure for the financial statements. Below is an extract of the structure of the current assets section of a balance sheet.

[Image: extract of the US GAAP taxonomy structure for the current assets of a balance sheet]

As you can see, there is a lot of detail available for companies to complete. We set ourselves the objective of retrieving, analysing and presenting the financial statements using the original SEC XBRL data as filed by companies.
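
As an aside for readers who want to experiment themselves, the sketch below pulls a company’s XBRL facts from the SEC’s public “company facts” JSON endpoint. The choice of endpoint, the example CIK and the fact name are illustrative assumptions, not a description of our production pipeline.

    # A sketch of downloading XBRL facts from EDGAR (illustrative only).
    import requests

    SEC_FACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:0>10}.json"

    def company_facts(cik: str) -> dict:
        # The SEC asks automated clients to identify themselves via User-Agent.
        headers = {"User-Agent": "research script example@example.com"}
        resp = requests.get(SEC_FACTS_URL.format(cik=cik), headers=headers, timeout=30)
        resp.raise_for_status()
        return resp.json()

    facts = company_facts("66740")   # 66740 is 3M's CIK; the URL template zero-pads it
    inventory = facts["facts"]["us-gaap"]["InventoryNet"]["units"]["USD"]
    for point in inventory[-4:]:     # the four most recently reported values
        print(point["end"], point["val"], point["form"])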

Working from the original data has the advantage of making available all the detail which has been supplied by the companies. Some of the well-known financial information websites take shortcuts by presenting data in a standardised format for all companies.

Here we can see an extract of the SEC filing for 3M for the most recent quarter, compared with what is presented by a well-known competitor. Our competitor on the right has clearly not included the breakdown of the inventories which is included in the SEC filing on the left. This might not be vital when analysing the fundamentals of a company, but I prefer to have the full picture and summarise it myself.

[Image: 3M’s SEC filing (left) compared with a well-known competitor’s presentation (right)]
The decision to use the SEC XBRL data does, however, come at a price. We have had to develop special algorithms which merge the filings for the different periods of a company into a single table. This is not a trivial task, and it is not helped by the fact that companies have a lot of freedom when it comes to preparing their filings. The taxonomy defines names for the different data items, or “facts”:

  • Companies have changed the fact used for the same information between filings. This does not help when we want to put data from different filings side by side.
  • Companies can extend the standard taxonomy, creating their own fact names. Whilst this does not make it harder to display the data side by side, it does not help when we try to automate the analysis.

To give you an idea of the choices available, Cost Of Revenue alone can be presented in a filing under any of the following names (a normalisation sketch follows the list):

– Cost Of Revenue
– Cost Of Goods And Services Sold
– Cost Of Services
– Cost Of Goods Sold
– Cost Of Goods Sold Excluding Depreciation Depletion And Amortization
– Cost Of Goods Sold Electric
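
Below is a minimal sketch of how those aliases can be collapsed onto one canonical fact. The mapping and the helper function are our own illustration; note, as a commenter points out at the end of this post, that some companies file both Cost of Goods Sold and Cost of Services, in which case the true Cost of Revenue is their sum.

    # Collapse the Cost of Revenue aliases above onto one canonical value.
    from typing import Optional

    COST_OF_REVENUE_ALIASES = [
        "CostOfRevenue",
        "CostOfGoodsAndServicesSold",
        "CostOfServices",
        "CostOfGoodsSold",
        "CostOfGoodsSoldExcludingDepreciationDepletionAndAmortization",
        "CostOfGoodsSoldElectric",
    ]

    def cost_of_revenue(facts: dict) -> Optional[float]:
        # Some companies file both goods and services costs; the total is their sum.
        if "CostOfGoodsSold" in facts and "CostOfServices" in facts:
            return facts["CostOfGoodsSold"] + facts["CostOfServices"]
        for name in COST_OF_REVENUE_ALIASES:   # otherwise take the first alias present
            if name in facts:
                return facts[name]
        return None                            # fact missing: flag for manual review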

In addition to improving the data import process to handle the vagaries of the different company filings, we are developing new features such as automatic updates and notifications based on new filings.

When a company submits a filing and it is accepted by the SEC, its data automatically becomes available on the SEC site. We receive a notification of this, retrieve the data and import it into our database. This means that within five minutes of receiving the notification, the data is available for analysis in the uuptick web app. Not many financial sites can boast that. In addition, we aim to notify by email all users who have added the company to their watchlists or portfolios.
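
For the curious, here is roughly what such a watcher could look like. The Atom feed is EDGAR’s public per-company feed; the polling loop, the interval and the rest are simplifying assumptions, a polling approximation of the notification-driven update described above rather than our actual pipeline.

    # A polling approximation of the filing watcher described above.
    import time
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM_URL = ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany"
                "&CIK={cik}&type=10-Q&output=atom")
    NS = {"atom": "http://www.w3.org/2005/Atom"}

    def latest_accession(cik: str) -> str:
        req = urllib.request.Request(
            ATOM_URL.format(cik=cik),
            headers={"User-Agent": "research script example@example.com"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            feed = ET.parse(resp)
        entry = feed.find("atom:entry", NS)    # feed entries are newest first
        return entry.find("atom:id", NS).text  # a filing's unique accession id

    seen = latest_accession("0000066740")      # again using 3M as the example
    while True:
        time.sleep(300)                        # poll every five minutes
        current = latest_accession("0000066740")
        if current != seen:
            seen = current
            print("New filing:", current)      # here: import it, then email watchers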

To conclude this brief exposé, the SEC data that we are presenting within our uuptick web application is a work in progress as we continue to improve the algorithms which process the data that we retrieve. I’ll let Sam wrap up this blog post.

Sam:

The roadmap from where we are currently to where we need to be is clear internally. Unfortunately, I can’t go into it in too much detail right now. While I try to be as transparent as I can in this blog, I cannot give our secret sauce away.

I believe these little secrets contribute to a more enjoyable world anyway. I have tried to imitate the Big Mac sauce (no, it is not just Thousand Island dressing), as well as KFC’s seasoning, yet every time I get the real thing, I’m glad they keep the recipe to themselves.


That is all for this week. If you enjoyed this article, please enter your email below to receive our newsletter.

[mailerlite_form form_id=1]

If you want to take part in our next round of testing, you can apply by filling in this form.

[mailerlite_form form_id=2]

Finally, if you want to learn more about investing in the stock market, you can download our free ebook for dividend growth investors here.

3 thoughts on “Week 3 & 4: Can 2 guys with a laptop beat an army of analysts?”

  • What % of your algorithm relies on Machine Learning vs. manually specifying rules (such as Cost of Revenue = [Cost of Goods Sold] || [Cost of Services] || […])?

    • Narayana,
      On the data you are currently testing, it is only manual rules. It is a little more complicated than what you suggested because sometimes companies have a combination of both cost of goods sold and cost of services such as
      Cost of Revenue = Cost of Goods Sold + Cost of Services.

      We are currently prepping the data for it to be adapted to each security individually instead of generic rules like you mentioned.
      From there on, we will be able to implement some automation.

      Does that help?

      • Yes it helps. Coming from a background where I’ve done substantial data cleaning, it would be very interesting to know how you will use machine learning to identify these rules automatically!