Episode #391: Vinesh Jha, ExtractAlpha – Alternative Data & Crowdsourcing Financial Intelligence

February 16, 2022

Guest: Vinesh Jha founded ExtractAlpha in 2013 in Hong Kong with the mission of bringing analytical rigor to the analysis and marketing of new data sets for the capital markets. Most recently he was Executive Director at PDT Partners, a spinoff of Morgan Stanley’s premier quant prop trading group.

Date Recorded: 1/26/2022 | Run-Time: 1:04:54

Summary: In today’s episode, we’re talking all things quant finance and alternative data. Vinesh walks through his background at StarMine, which built a Morningstar-esque company for equity research, and then dives into what he’s doing today at ExtractAlpha. He shares all the different ways he analyzes alternative data, whether it’s looking at sentiment and ticker searches or using natural language processing to analyze transcripts from earnings calls. Then he shares whether or not he thinks alternative data can help investors focused on ESG.

As we wind down, we touch on ExtractAlpha’s merger with Estimize and the ability to crowd source financial intelligence.

Comments or suggestions? Email us Feedback@TheMebFaberShow.com or call us to leave a voicemail at 323 834 9159

Interested in sponsoring an episode? Email Justin at jb@cambriainvestments.com

Links from the Episode:

0:40 – Sponsor: The Idea Farm
1:10 – Intro
2:00 – Welcome to our guest, Vinesh Jha
3:16 – Episode #351: Leigh Drogen
4:45 – Vinesh’s years spent on the sell side, StarMine & Morgan Stanley
9:01 – The origin story of ExtractAlpha
12:50 – The sourcing, cleaning, and challenges of working with data
14:52 – The Limits of Arbitrage (Shleifer; Vishny)
19:23 – Episode #362: Seth Stephens-Davidowitz; going from research to robust data
26:41 – Turning data into actionable insights
31:45 – Trade Like A Hedge Fund (Altucher)
32:44 – How hedge funds incorporate ExtractAlpha’s insights into their investment process
38:33 – Example of a strange data set Vinesh has come across
40:46 – Any interest in data outside of public equities?
43:06 – The intersection between ESG and alternative data
48:26 – ExtractAlpha DTCC Investor Kinetics Data
52:14 – The ExtractAlpha and Estimize merger
58:39 – A data set he wished he had; his most memorable investment
1:01:08 – Learn more about Vinesh; extractalpha.com; LinkedIn; estimize.com

Transcript of Episode 391:

Welcome Message: Welcome to “The Meb Faber Show,” where the focus is on helping you grow and preserve your wealth. Join us as we discuss the craft of investing and uncover new and profitable ideas, all to help you grow wealthier and wiser. Better investing starts here.

Disclaimer: Meb Faber is the co-founder and chief investment officer at Cambria Investment Management. Due to industry regulations, he will not discuss any of Cambria’s funds on this podcast. All opinions expressed by podcast participants are solely their own opinions and do not reflect the opinion of Cambria Investment Management or its affiliates. For more information, visit cambriainvestments.com.

Sponsor Message: Today’s podcast is sponsored by The Idea Farm. Do you want the same investing edge as the pros? The Idea Farm gives you access to some of this same research usually reserved for only the world’s largest institutions, funds, and money managers. These are reports from some of the most respected research shops in investing. Many of them cost thousands and are only available to institutions or investment pros, but now they can be yours with the subscription to The Idea Farm. Are you ready for an edge? Visit theideafarm.com to learn more.

Meb: What’s up, friends? We got a fun show today all the way from Hong Kong. Our guest is the founder and CEO of ExtractAlpha, an independent research firm dedicated to providing unique, actionable alpha signals to institutional investors.

In today’s show, we’re talking all things quant finance and alternative data. Our guest walks through his background at StarMine, which built a Morningstar-esque company for equity research, and then dives into what he’s doing today at ExtractAlpha. He shares all the ways he analyzes alternative data, whether it’s looking at sentiment and ticker searches, or using natural language processing to analyze transcripts from earnings calls. Then he shares whether or not he thinks alternative data can help investors focused on ESG.

As we wind down, we touch on ExtractAlpha’s merger with Estimize and the ability to crowd source financial intelligence. Please enjoy this episode with ExtractAlpha’s Vinesh Jha.

Meb: Vinesh, welcome to the show.

Vinesh: Thanks, man. Glad to be here.

Meb: Where do we find you? Where’s here? It’s early in the morning for you, almost happy hour for me.

Vinesh: Exactly. I’m here in Hong Kong at the office, actually going into the office these days, in a place called Cyberport, which has got this fabulously ’90s sounding name. It’s a government-funded, coworking space.

Meb: Cool. You know what I saw the other day that I haven’t seen in forever is computer cafes, were like a huge thing. Like every start-up college kid have…internet cafe is like their idea. But I actually saw a gaming VR one the other day, that was the nicest game room I’ve ever seen in my life in LA. So, who knows, coming full circle? Why are you in Hong Kong? What’s the origin story there? How long have you been there?

Vinesh: I’ve been here since 2013, so about 8 years, eight and a half years now. I came out here largely for personal reasons. My wife is from Hong Kong, and her family’s out here. I was kind of between things. I resigned from a job at a hedge fund in New York, that was a spin off from Morgan Stanley called PDT Partners, and didn’t really have a plan, just wanted to do something entrepreneurial. So I was flexible as to where I could go. My wife doesn’t like New York, too cold for her, so ended up out here.

Meb: Your company currently, ExtractAlpha, famously merged with another podcast alum Estimize’s Leigh Drogen. However, we’ll get to that in a second. I have to rewind a little bit because you and I both were out in San Francisco at the time of the last great big internet bubble, the Big Daddy. When did you make it out there? Were you in time for the upswing too or just the decimation afterwards?

Vinesh: I got there right in time. I got there in November ’99.

Meb: So the champagne was still flowing, it was still good times, right?

Vinesh: Yeah. All my friends and I worked in these nice spaces with pool tables and ping pong tables. We’d all go to the Starbucks there on Brannan, I think it was. And it was funny when we got there, lines out the door at the Starbucks. This is my Starbucks indicator. Four months later, you know, March, April 2000, I was the only one there. They knew my name. They got my coffee before I got in the door. It was a boom and bust and kind of echoes of today, it seems like.

Meb: You are more thoughtful than I was. I didn’t get there until ’01, ’02. So I used to visit and be like, “Oh man, this is the land of milk and honey, free happy hours.” I go to the Google parties in Tahoe before they went public. But then, I showed up and I moved there with the perception that that’s what it was going to be like forever. And it was just the internet winter, just desolation.

That’s where my coffee addiction began. I didn’t really drink coffee and I lived in North Beach. And they were just littered with a bunch of amazing coffee shops, Syd’s Bagels. I don’t know if they still exist.

Anyway, StarMine was a big name in the fund world, particularly in San Francisco at that time, because data, at that time, there’s a lot of what you guys were doing. So I want to hear about your role. You were there for a handful of years and just kind of what you did. I imagine it was the foundation and genesis for some of the ideas and things that you’re doing now, over two decades later.

Vinesh: So I got my start a couple years before that, actually on the sell side. So I was at Salomon Smith Barney, if anyone remembers that name, eventually it was part of the Citigroup and Travelers merger. I was in sell-side equity research doing some global asset allocation. So it’s really quant-driven global asset allocation group. I was there right out of school, really just wrangling Excel spreadsheets and getting data on CDs and stuff, and putting it all together into a model that predicts returns on countries.

As a result of the merger, that group got dissolved. But during that time, I met this guy, Joe Gatto, out in San Francisco. And Joe was running a small company called StarMine out of a garage. So his garage at Fifth and Bryant, underneath that big Coca-Cola sign South of Market. And it was just a handful of people.

He had this idea. He is a former management consultant, really bright guy, but he was looking to invest some of the money he made. And he was looking at Dell, which at the time was a publicly traded company, had 10 or 15 analysts covering it, putting out earnings estimates.

And he’s like, “These guys are all over the place. Some of them have an estimate of $1. Some of them are 50 cents. I don’t know who to listen to. If you take an average, that doesn’t seem right, 75 cents. Maybe that’s the right number, maybe it’s not. Let me see if I can figure out who’s actually good. And then, if I figure out who’s actually good, maybe I’ll have an edge. Maybe I’ll really know what Dell’s earnings are going to be.”

He interviewed me. And we had many beers at a bar and figured out something about how we might proceed in figuring out how to weight these different estimates, how to determine who’s good and who’s not, and, generally, a path forward to really create something like a Morningstar for equity research. That’s where the name actually came from, a riff on Morningstar. It was StarMine, star ratings on analysts in terms of data mining for stars.
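To make the weighting idea concrete, here is a minimal Python sketch of an accuracy-weighted consensus, using the $1 and 50-cent estimates from the Dell example. The inverse-error weighting, the function name, and the historical error figures are illustrative assumptions; the conversation does not describe StarMine’s actual methodology.

```python
def accuracy_weighted_estimate(estimates, past_abs_errors, eps=1e-6):
    """Weight each analyst's estimate by the inverse of their historical
    mean absolute error (an assumed, illustrative weighting scheme)."""
    weights = {a: 1.0 / (past_abs_errors[a] + eps) for a in estimates}
    total = sum(weights.values())
    return sum(estimates[a] * weights[a] for a in estimates) / total


# Hypothetical inputs: two analysts with very different track records.
estimates = {"analyst_a": 1.00, "analyst_b": 0.50}
past_abs_errors = {"analyst_a": 0.05, "analyst_b": 0.20}

simple_average = sum(estimates.values()) / len(estimates)
weighted = accuracy_weighted_estimate(estimates, past_abs_errors)
print(simple_average, round(weighted, 2))
```

With these made-up track records the weighted number sits much closer to the historically more accurate analyst than the plain 75-cent average does.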

This is before Joe really noticed that data mining has a negative connotation in quant finance, but that’s fine. So yeah, we started building metrics of how accurate these analysts were, how good their buy-sell recommendations were. And then it grew from there. And we built out a suite of analytics on stocks or anything from earnings quality to estimate revisions.

We did some work with Fidelity on independent research recommendations that still seem to exist within the Fidelity broker site today. A lot of really interesting work just applying rigor to what, at that time, was I guess what you would call alternative data, because you’re really getting into the details of the estimates as opposed to looking at the consensus level. But that’s really all you had to work with. Back then, there wasn’t this sort of plethora of data. It was like price data, fundamental data, earnings estimates, and we really focused quite a lot on the earnings estimates side of things at the time.

Meb: The company eventually sold to Reuters. And then you do a little hedge fund prop trading world applying, I assume, some of these ideas that you’ve been working on. That takes us to what? Post-financial crisis at this point?

Vinesh: Yeah, it does. So I left StarMine in 2005. They later got acquired by Reuters, you’re right, right before the Thomson and Reuters merger. I went to work for one of our clients, which was a prop trading group at Merrill Lynch, who all of a sudden wanted to do some interesting stuff with their internal capital. So I was building strategies from partly based on earnings estimates, but other things too, sort of medium to long horizon strategies.

I was there for about 18 months, then moved over to Morgan Stanley at a desk called Process Driven Trading, PDT. It’s run by a guy named Pete Muller. And Pete has been around for a long time. PDT was founded in ’93. It was still a small group, 20 to 25 people, but really successful, at times been a significant portion of Morgan’s revenues at various quarters, and really just a largely stat arb-type of shop, running faster type of strategy, several day horizon type strategies. And I came in to sort of build out their medium to longer-term strategies and really improve those.

So I started in March 2007. And then four months later, we had the quant crisis in August 2007. So that was fun. And then through the financial crisis, and then I was there through early 2013.

Meb: And then you said, “You know what? I want to do this crazy, terrible entrepreneurship idea.” And ExtractAlpha was born. Tell me the origin story.

Vinesh: I think the origin story really goes back to that quant crisis in 2007. So a little bit of backstory on that. We experienced a few days in the early days of August 2007, where a lot of quant managers suddenly had large losses, our group included, unprecedented 20-sigma-type events, things that you would never model, couldn’t figure out why. And then, the models then bounced strongly back the next day. So there’s something exogenous going on that we wouldn’t expect from the models.

And it turns out what we were trading and what other people were trading, what other hedge funds were trading, were largely similar, similar types of strategies. Why were they similar? Well, we looked at what we were basing the stuff on, it’s the same datasets. It was price data, fundamental data, earnings estimates, similar types of models, similar types of data. So even if you get the smartest guys in the room, you give them the same datasets, they’re going to come out with things that are pretty correlated.

And that’s really what happened is you had someone out there liquidating their portfolio, and it causes a domino effect, because we’re all holding the same positions, all holding the things based on these similar types of models. So I was like, “That’s a problem. Let’s solve this problem at the source. Let’s start looking for data that will give us different insights.” So that was sort of the spark for me.

And then many years later, when I left PDT, I realized I wanted to get back into the data world and start-up world, focusing on these unique sources of intelligence, unique sources of data, wanting to do something entrepreneurial, for sure. I loved my time at StarMine. I wanted to sort of replicate that but with more alternative, more interesting datasets.

And the origin story was really meeting people, like Leigh, for example, who had these really cool datasets. They weren’t quite sure yet. It was early days. They weren’t quite sure what to do with the datasets, how to monetize them. They weren’t sure if these datasets had value. They weren’t sure if they had the capabilities to go in and do a bunch of quant research and say, “Okay, this is a proof statement. This thing really works. This thing can predict something we might care about. Stock price is the thing we ultimately care about, but maybe earnings or something else.”

So, essentially, I built it initially up as a consulting company, where I had a few clients. Estimize is probably the first one, TipRanks, AlphaSense, TIM Group, a bunch of interesting companies that especially had interesting sources of sort of crowdsourced or alternative information, alternatives to the sell side. So that was part of what I was looking at, but really anyone with interesting data.

And I really worked with them to find that value or help them find that value, monetize it. I did that for a couple of years. The issue with that is it’s a consulting business, and consulting businesses don’t scale. So okay, we’ve got these interesting datasets we now know about. Let’s turn this into a product company.

So we did that, and pivoted around 2015, 2016, brought on a technology group, brought on other researchers, brought on a sales team, and became essentially a hybrid between a quantitative research shop and an alternative data provider. So what we’re doing is looking for interesting datasets, doing a lot of quant research on them, finding where they had value. Most of the time, they didn’t. But when they did, “Okay, this is interesting, let’s become a seller of this data.” And it didn’t matter whether the origin of the data was some other company or something we scraped ourselves, or maybe we bought some data and then built some intelligence on top of it, and then sold it.

We did and we do all of those things. And it really is all about trying to help fund managers find value in these things. Because they’re confronted with these huge lists of datasets, hundreds of them at this point. They don’t know where to start. They don’t know which ones are going to be helpful. They don’t know which ones will slot into their process nicely. Ultimately, it’s up to them to decide. But if we can do anything to get them closer to that goal and make it more plug and play, that’s really our value prop.

Meb: There’s a couple interesting points. The first being this realization early, as you went through this for the early years of the 2000s, which was really in many ways probably a golden era for hedge funds, and then some have done well since, some are a graveyard, but this realization that some data is a commodity. Like you mentioned, some of the hedge fund hotel names were…

I remember way back when looking at some of these multi-factor models that are pretty basic, not much more complicated than the Fama-French stuff. And you pull up a name that scores well. And it would be all 10 quant shops as the 10 largest holders. And that may or may not be a bad thing, but it’s certainly something you want to be aware of. And you could do this for just stock after stock after stock.

Talk to me a little bit about the evolution of data, if this is the best way to begin. How do you guys even think about sourcing the right data, challenges of cleaning it? Just on and on, just have at it, the mic is yours, let’s dig in.

Vinesh: Going back to the early days, you’re right, the simple factors are value or momentum, think about those. We’re looking right now at a time when value had a stretch of 10 years where it wasn’t doing much, and momentum had increasingly frequent crashes. So if these are your main drivers of your portfolio, maybe you want to diversify that.

And they’re also crowded as you say. Now crowding is an interesting thing to think about. And that’s one of the drivers for what we’re doing. My view is that, yes, when you get to the stage of something like value or momentum, earnings revisions, or price reversals, these are crowded, truly crowded trades.

But it takes a while for something to get to that crowded stage. At that point, they’re basically risk premia in some sense. And a new factor doesn’t get arb’d right away. It takes some time. So one of the rationales for this, there’s a great paper called “The Limits of Arbitrage” by Shleifer and Vishny, as I recall. And this is all about, even if you have a pretty close to a pure arbitrage, if it is not a perfect arbitrage, no one’s going to put their whole portfolio into it, especially if you’re playing with someone else’s money.

So for that reason, these are risk bets. You’re going to want to spread your risk bets. And instead of spreading them for… A fundamental manager spreads their bets across assets or stocks, quant managers spread their bets across strategies. Really, what you want to do as a quant manager is diversify your strategies.

So in the early days, it was, “Okay. We went from just value and momentum to we added quality somewhere along the way in the ’90s, early 2000s.” But all that’s based on the available data. And getting clean data was hard and cumbersome at that time. So I mentioned like getting data on CDs.

There was even a guy, he was a customer of Compustat, getting fundamental data from them on CDs. Compustat had not actually saved their backup data. So he was able to collect all the historical CDs and sell it back to them as a point-in-time database. Pretty clever.

So you didn’t have clean point-in-time data all the time. So it used to be pretty tough to get this stuff. It got easier over time. And then the fundamental stuff and, obviously, the market data got quite commoditized.

But if you start looking for more exotic things, it is sometimes tricky to source. Sometimes you got to be creative. Sometimes it is very messy. We work on some datasets, quite a few of them that are not tagged to securities.

So you’ve got a dataset where there’s like a company name in it. And this can be common in some filings data, if you go beyond EDGAR filings, beyond SEC filings, and start looking at interesting government filing data. You’re not going to have like a ticker symbol, or a CIK or CUSIP or ISIN, or any other common identifier. You’re going to have International Business Machines. You got to figure out that’s IBM.
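The name-to-identifier problem described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `REFERENCE` table, the suffix list, and the 0.85 similarity cutoff are all hypothetical, and a production mapping would need a full security master plus manual review of near misses.

```python
import difflib
import re

# Hypothetical reference table: normalized company name -> ticker.
REFERENCE = {
    "international business machines": "IBM",
    "home depot": "HD",
    "lowes companies": "LOW",
}

# Common words that carry no identifying information (assumed list).
NOISE_WORDS = {"inc", "corp", "corporation", "co", "ltd", "llc", "the"}


def normalize(name):
    """Lowercase, strip punctuation, and drop corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in NOISE_WORDS)


def map_to_ticker(raw_name, cutoff=0.85):
    """Try an exact match on the normalized name, then a fuzzy match.
    Returns None when nothing is close enough."""
    key = normalize(raw_name)
    if key in REFERENCE:
        return REFERENCE[key]
    close = difflib.get_close_matches(key, list(REFERENCE), n=1, cutoff=cutoff)
    return REFERENCE[close[0]] if close else None


print(map_to_ticker("International Business Machine"))
print(map_to_ticker("The Home Depot, Inc."))
print(map_to_ticker("Acme Widgets"))
```

Returning `None` rather than the nearest guess is a deliberate choice in this sketch: a wrong mapping silently pollutes a backtest, while an unmapped row can be reviewed by hand.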

There’s cleaning stuff involved. Just to continue with the example of government filings data, a lot of that is some person writing down a form that gets scanned, and then that turns into structured data. And there are going to be errors all over the place there. There’s going to be dirty, messy stuff. You got to work through that.

There’s a lot of cleaning that has to go on. You have to, again, to the point-in-time issue, you have to make sure everything is as close to point in time as possible, if you want to have a clean back test. So you want to reconstruct, “Okay, sitting at 10 years ago, what did I really know at this time?” You don’t always have that information. You don’t always have a timestamp or a date when the data was cut. So you have to sometimes make some conservative assumptions about that. You have to make sure that the data is free of survivorship bias.
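One way to picture the point-in-time reconstruction is a filter that only returns what could have been known on a given date. The sketch below is an assumption-laden illustration: the record layout, the function name, and the 45-day fallback lag for records with no publication date are all hypothetical, chosen only to show what a conservative assumption looks like in code.

```python
from datetime import date, timedelta


def known_as_of(records, as_of, assumed_lag_days=45):
    """Return the records a researcher could have seen on `as_of`.

    If a record has no publication date, assume it only became available
    `assumed_lag_days` after its period end (a conservative guess).
    """
    visible = []
    for rec in records:
        available = rec.get("published") or (
            rec["period_end"] + timedelta(days=assumed_lag_days)
        )
        if available <= as_of:
            visible.append(rec)
    return visible


# Hypothetical filings: the second one has no timestamp at all.
records = [
    {"id": "q1", "period_end": date(2012, 3, 31), "published": date(2012, 4, 20)},
    {"id": "q2", "period_end": date(2012, 6, 30), "published": None},
]

snapshot = known_as_of(records, as_of=date(2012, 7, 15))
print([r["id"] for r in snapshot])
```

Erring on the side of a longer lag costs some timeliness in the backtest, but it avoids the worse error of trading on data that had not actually been released yet.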

So a lot of people who are collecting interesting datasets, they might not realize that once, for example, an entity goes bust, they should keep the data on the busted entity. Otherwise, you’ve got a polluted dataset that’s missing dead companies.

So a lot of these issues, we have to struggle through with some of these more exotic datasets, which are not really pre-canned or prepared for a quant research use case. So we spent a ton of time cleaning data, mapping identifiers, and making sure everything is as organized as possible. And that’s the 80% of work before you even start on the fun stuff, which is, “Hey, is this predictive? Is it useful?”

By the time we reach that stage, you know, some proportion of the datasets we look at have fallen off. They’re too dirty. And then, that’s without even knowing that we’ve got something that could be useful. And then, as I say, the fun stuff starts, you start.

What we do is largely kind of old school, I guess, but it’s hypothesis testing. Do we think that there’s some feature in this dataset that could be predictive of something we care about? And we have to think about what it is we care about, or what this dataset might tell us about.

And the simple thing, but perhaps the most dangerous thing to look at, is stock prices. And it’s dangerous because stock prices are incredibly noisy. And you could have some spurious correlations. And sometimes we find it much better, much cleaner to look for something in the dataset that might tell us about a company’s revenues, or a company’s earnings.

And for a lot of datasets, that can make sense because you’re talking about evidence of how well the company is doing through…I’ll give you an example…through how many people are searching for the company’s brands and products online. We look at a lot of this type of data. That’s direct evidence that people are interested in potentially buying the company’s product, and therefore, there’s a clean story why that should predict something about the company’s revenues.

So that’s actually a much more robust way we find to model things. We don’t always do it. But for some datasets, it’s very appropriate to predict fundamentals rather than predicting stock prices. That’s one of the things that can help when you have maybe a messier dataset or a dataset with a shorter history, which is very common with these alternative or exotic datasets.

Meb: Anytime anyone talks about alternative data, the press or people, there’s like three or four, they always come back to, they always talk about and they’re like, “Oh, hedge funds with satellite data.” Or everyone always wants to do Twitter sentiment, which seemed to be like table stakes that are probably been picked over many times.

We did a fun podcast with the guy that wrote Everybody Lies, Seth Stephens-Davidowitz, and he is talking about all the interesting things people search and what it reveals from behavioral psych. It’s just a really fun episode. But maybe walk us through, to the extent you can – and it doesn’t have to be a current dataset, but it could just be a dataset that you don’t use anymore, either way, I don’t care – of one that you use and how you approach it, and the whole start-to-finish research process that doesn’t just result in some data mining and to test just the UF or quant and on and on.

Vinesh: I’m happy to talk about everything we’re doing. Unlike a fund, we have to be somewhat transparent about our work. So you can even go to our website and see these are the datasets that are our current products, and they’re just listed there. So we got a factsheet. You can really understand what we’re talking about.

So going to your examples, I’ll start with your examples, because you’re right. People name the same few things – credit card data, satellite data, Twitter sentiment. Those come up a lot. Read a Wall Street Journal article, they’ll always be mentioned. We’ve looked at some of these things. Not all of them, some of them, there’s too many players, we don’t feel like we’d add any value.

But just going through them, we are really focused on finding the things that are really likely to be robust going forward. And that means we want some degree of history. We want some degree of breadth. These are the things that are going to move the needle for quant managers, who are our core clients. And we think if quant managers find them valuable, then that’s sort of a real strong proof statement.

So things that quant managers care about, need to have some sort of capacity. They need to have some sort of breadth. And so the breadth thing is a bit missing with the satellite data. There’s some really cool things you can do with it.

The examples are always, you can count the number of cars in a parking lot for a big box retailer. So you look at Lowe’s, Home Depot, and so on, or even food beverage. You can look at Starbucks outside of urban areas. You can see how many cars there are. You can adjust for weather and lighting conditions and all this. And you can get some sort of a robust forecast of maybe revenues for these companies. But it’s a relatively narrow number of companies. So it may not move the needle for a quant manager who’s got hundreds of positions.

Twitter stuff, you’re on Twitter, you know how much noise there is.

Meb: Right, I tweeted the other day, and this tweet got zero traction. So I’m assuming that Twitter blocked it because it was one of the quant research shops that said 2021 set a record for curse words in transcripts. So I was like, “What the F is up with that?” I was like, “What’s number one? What do you guys’ guess?” And I’d said BS was probably the number one. I got no engagement because I think Twitter put it in some sort of bad behavior box or something. But I thought that was a humorous one.

Vinesh: So, you’re at the mercy of the algo. I’ll check that for you. We do NLP on earnings call transcripts.

Meb: See, I’ve uncovered a new database that if someone’s cursing in the transcripts, that means things are probably going bad rather than good. No one’s getting on the conference call and being like, “We’re doing fucking amazing.”

Vinesh: Quick aside, we’ve looked also at news sentiment in China, actually. We actually work with a lot of Chinese providers. Being out here in Hong Kong, we feel like we’re a good conduit between hedge funds in the U.S., UK, and data providers here in Asia. And we looked at some news sentiment stuff.

Interestingly, the reaction to it is much slower in China. And the rationale is that it’s largely an individual, retail-driven market. So people respond to news a lot slower than machines do, essentially, is the story there. But if you got a machine, maybe you could be faster.

News and Twitter stuff is fairly fast moving. It’s a little bit noisy. But we started to go beyond that, looking for really more exotic things. I can give you a couple examples.

So one is to look at something that’s intuitive and scalable and makes a lot of sense and is done really well. Recently, we started trying to figure out how to quantify a company’s innovation based on interesting filings data. So this is something that people have talked a lot about, why is value dead? Well, maybe traditional measures of value don’t capture intangibles, so you’re looking at price-to-book ratio. It doesn’t tell you anything about IP, really.

So we started looking for how we could figure out which companies are investing in innovation. So the traditional way you do this is, in some cases, there’s an R&D line item in the financial statements, but not every company has that. And it’s noisy.

So what else can you do? You can look at a company’s IP activity. So you can look at, are they applying for patents, have they been granted patents? You could look at trademarks. That’s something we’re starting to look at now.

And interestingly, we had this idea that you could figure out whether companies are hiring knowledge workers. So if you look at the data on H-1B visas that a company has applied for. The company has to say what the job title is they’ve got a job opening for. And if you look at the 10 words that have had the most growth in the job descriptions or job titles, it’s machine and learning, and data and scientist, and analytics and all these words. So when companies hire for foreign workers, they’re usually hiring for knowledge workers. People they can’t necessarily hire as easily in the U.S. And maybe it’s grad students and so on.
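The word-growth observation about job titles can be illustrated with a simple count comparison between two periods. This is a toy Python sketch with made-up titles; the function names and the choice to measure growth as a raw difference in counts are assumptions, not a description of ExtractAlpha’s process.

```python
from collections import Counter


def word_counts(titles):
    """Count how often each word appears across a list of job titles."""
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return counts


def fastest_growing_words(old_titles, new_titles, top_n=3):
    """Rank words by the increase in their count between two periods.
    Ties are broken alphabetically so the output is deterministic."""
    old, new = word_counts(old_titles), word_counts(new_titles)
    growth = {word: new[word] - old[word] for word in new}
    return sorted(growth, key=lambda w: (-growth[w], w))[:top_n]


# Hypothetical job titles from an earlier and a later filing period.
old_titles = ["software engineer", "financial analyst"]
new_titles = [
    "machine learning engineer",
    "data scientist",
    "data analyst",
    "software engineer",
]

print(fastest_growing_words(old_titles, new_titles))
```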

So this hiring activity, we think, is a measure of innovation. So we put together something that is, okay, we get the data. This comes from the Department of Labor in the case of the hiring data, and that is a quarterly Excel spreadsheet. That’s an absolute disaster because it’s put together by the Department of Labor. There’s no surprise there. It’s again, like I mentioned, by company name, the formats change all the time. The data is a mess. It’s a disaster. We tried to reconstruct it point in time as much as we could. The patent data is quite a bit cleaner; that comes in a nice XML format. That’s from the USPTO, U.S. Patent and Trademark Office.

But we put these things together, organize them. It’s a fairly simple idea that companies which have the most activity, according to these metrics, relative to their size, because of course a large company is going to have more hiring and more patents than a small one, these companies tend to outperform.
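The “relative to their size” step is the key adjustment, and a short Python sketch shows why. The activity counts, the use of market cap as the size measure, and the names below are hypothetical; the conversation does not specify how size is measured or how the metrics are combined.

```python
def size_adjusted_activity(activity, market_cap):
    """Divide each company's raw activity count by its size so that large
    companies do not dominate simply because they are large."""
    return {name: activity[name] / market_cap[name] for name in activity}


# Hypothetical inputs: combined patent and visa application counts,
# and market caps in billions of dollars.
activity = {"big_co": 500, "small_co": 40}
market_cap = {"big_co": 1000.0, "small_co": 20.0}

scores = size_adjusted_activity(activity, market_cap)
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

On raw counts the large company looks far more active, but per unit of size the small company ranks first, which is the comparison the passage describes.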

And what’s really interesting is that we’ve got this data going back quite a ways. We started tracking it really 10, 15 years ago. And it really starts to pick up around sort of 2013, 2014. And then you see this massive upswing and it’s exactly in March 2020, where the most innovative companies, the ones that work from home and are ahead on digitization, these are the companies that massively outperform in that period. So there’s this huge rotation into these companies.

And it’s not just individual companies, it’s the industries as well. So we find that this is an interesting effect where the most innovative companies outperform, and the most innovative industries also outperform. And that might be a little bit static because you’re always going to have biotech and software, the most innovative maybe according to our measures, and real estate, utilities, the least. But there are some rotations among those over time. And there are differences among the companies within those industries as well.

So that’s an interesting way of collecting data from a very messy source and turning it into something sort of intuitive. And by the way, it’s also a nice slow-moving, high-capacity type of strategy. So it’s a good example of how you can kind of be creative about data that’s been sitting around out there for a long time, and no one’s really paid attention to it in the investing world.

Meb: We did a fun podcast with Vanguard, their economist, a couple years ago, that was talking about a similar thing, which was linked academic paper references. Same genre as what you’re talking about with patent applications or things like this. But they were looking at broad sector concepts.

How does this flow through down to actionable ideas? And you mentioned, maybe all these immigrant or job postings are just for tech companies. And all you’re really getting is tech. How do you guys tease that out statistics-wise? I know you do a lot of long-short portfolios. But how do you run these studies so that you’re not just biasing it to something that may just be an industry bet or something else? Do you just end up with a portfolio of IBM every year?

Vinesh: We definitely try to tease these things apart. You have to. No one’s going to pay us for a set of ideas that’s just tech. And the way we deliver these things is largely as datasets and signals that people can ingest into their systems. And when they ingest them, they’re going to also strip out these bets, if they’re doing it the right way.

So we need to identify something that’s got incremental value over and above an industry bet; a value or momentum type of bet is another example. So we need to know that these types of things that we’re identifying are unique. They’re uncorrelated.

So we do a lot of risk controls. We have an internally built risk model we use. It’s nothing too exotic, but it looks at standard factors, you know, industry classifications, value, momentum, volatility, growth, dividend yield, sort of classic Barra-style risk factors. And the signals that we produce have to survive those. In other words, they have to be orthogonal to those. They have to be additive to those. They have to be additive to the other factors we also have in sort of a factor suite.

And they also have to, for example, survive or ideally survive transaction costs. So if you have something that’s very fast moving, it can be useful and incremental, if you’re already trading very quickly. But that’ll only be interesting to sort of the high-frequency funds and the stat arb funds. And anyone else, they’ll say, “That’s too fast,” relative to the other signals that they’re already trading.

So we have a series of hurdles that something has to beat. And we use some fairly traditional statistical techniques, residualization and so on, to handle that.
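One common way to make a signal orthogonal to a set of risk factors is to regress it cross-sectionally on those factors and keep only the residual. The sketch below does that in plain Python with a Gram-Schmidt pass. It is an illustration under assumptions, not ExtractAlpha’s risk model: the factor names and all numbers are made up, and a production version would also include industry dummies and weighting.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def residualize(signal, factors):
    """Return the part of `signal` orthogonal to an intercept and every factor.

    `signal` is one value per stock; `factors` is a list of columns, each with
    one exposure per stock. Equivalent to the residual of an OLS regression.
    """
    n = len(signal)
    basis = []
    for column in [[1.0] * n] + factors:
        v = list(column)
        for b in basis:  # strip what earlier columns already explain
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip columns that are collinear with earlier ones
            basis.append(v)
    resid = list(signal)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return resid


# Hypothetical cross-section of six stocks.
signal = [0.5, 1.2, -0.3, 0.8, -1.0, 0.1]
value = [0.2, -0.1, 0.4, 0.0, -0.5, 0.3]
momentum = [1.0, 0.5, -0.2, 0.3, -0.8, -0.4]
resid = residualize(signal, [value, momentum])
```

If the residual still predicts returns, the signal carries information beyond the factors it was stripped of, which is the sense of “additive” used above.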

Meb: So you mentioned you have both shorter-term and longer-term signals. What’s the longest-term signal? What sort of time horizon does your stuff operate on?

Vinesh: Everything from a day to a year, I would say, is the range. We don’t do a lot in the high frequency space. A lot of the data that comes in intraday is largely going to be technical data and things like that.

So we do a lot of daily data. So things that update every day. And in some cases, you have to trade on those relatively quickly to take advantage of the alpha. Maybe it decays fairly quickly. Something that’s based on, for example, analyst estimates, that’s data that’s disseminated quite broadly. And if you don’t jump on it, it’s going to be less valuable. And then we have some things like the innovation one that I mentioned that can be much, much longer and really realized over many quarters, several quarters at least.

Meb: How often do you guys deal with the reality? As we were talking about earlier in the show of, have you had some of these killer ideas, clearly, they work. You start to disseminate them to either the public or your clients. And they start to erode or just because of the natural arbitrage mechanism of, if you’ve got some of these big dudes trading on this that it actually may make these more efficient. How do you monitor that? And also, do you specifically look for ones that are maybe less arbitragable, is that a word? Or how do you think about that sort of consistent process?

Vinesh: We think about it in a few different ways. So our clients are not all big. We’ve got big funds. We’ve got small funds. It’s a real mix. The bigger funds tend to come to us for perhaps more raw data that they can manipulate into something that’s more customizable. The smaller funds might take something that’s more off the shelf.

But either way, first of all, we’re tracking performance of these things on a real-time basis. We’ve built a tool to do that, which our clients can use as well. It’s called AlphaClub. That’s something that we’ll be opening up more broadly soon. It’s basically a way to track any of these signals, whether it’s our signal or someone else’s, for that matter: how it’s doing for large caps, mid-caps, small caps, different sectors, what the capacity is, how fast the turnover is, what the risk exposures are, and track that on an ongoing basis.

So we do monitor these things. What we don’t normally see, outside of things that are more like technical signals, is a curve which just flattens, just a secular decline in the efficacy of a signal. If you look back at a reversal strategy, so the simplest, dumbest quant strategy, but a relatively fast one, an easy one to compute, it’s, “Tomorrow, let’s go long the stocks that went down the most. We’re going to go short the stocks that went up the most.” No more nuanced than that.

That actually used to work great in the ’90s and early 2000s. And then sometime around 2003 or 2004, when there was a lot more electronic trading, people trading more automatically, there’s a sudden kink in the cumulative return chart for that, just like that. And then now, it’s pretty much flattened out. There’s no intelligence whatsoever in that strategy and anyone can do it.
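The reversal rule described here is simple enough to write down in a few lines. The sketch below is a minimal version under assumptions: equal weights, a fixed number of names on each side, and made-up tickers and returns. It ignores costs, liquidity, and risk controls entirely.

```python
def reversal_weights(last_returns, n):
    """Equal-weight long the n biggest losers and short the n biggest winners.

    `last_returns` maps ticker -> most recent one-day return. The weights are
    meant to be held over the next day.
    """
    if n < 1 or 2 * n > len(last_returns):
        raise ValueError("need n >= 1 and at least 2 * n tickers")
    ranked = sorted(last_returns, key=last_returns.get)  # worst return first
    weights = {ticker: 0.0 for ticker in last_returns}
    for ticker in ranked[:n]:
        weights[ticker] = 1.0 / n   # long the losers
    for ticker in ranked[-n:]:
        weights[ticker] = -1.0 / n  # short the winners
    return weights


# Hypothetical one-day returns.
sample_returns = {"AAA": -0.05, "BBB": 0.01, "CCC": 0.04, "DDD": -0.02}
weights = reversal_weights(sample_returns, n=1)
```

Because the long and short sides have equal size, the portfolio is dollar neutral by construction.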

Meb: That was one of the systems in James Altucher’s original book, Trade Like a Hedge Fund. I remember, I went and tested them, and maybe it’s Larry Connors. I think it’s Altucher. Anyway, they had some of these shorter-term stat arb ideas. And that one was anything that was down over 10%, you put in an order and exit in the day.

Vinesh: It’s just too easy to do. You can get more clever with it. But still, that’s going to get arb’d away. But something that’s a little more sophisticated, or a little more exotic, you’re going to have fewer people using it. It’s not as if we’ve got thousands of hedge funds trading stuff we’re using.

So we don’t see those clear arb situations. And also, you can see sometimes a factor that flattens out and then suddenly spikes up. These things are a lot less predictable than the simple story of, “Oh, it’s arb’d away. It’s gone. It’s commoditized.” So I think these things can be cyclical. And sometimes, if they stop working, people get out of them, and they can work again. That’s another aspect of this. There are cycles in the quant space like that as well.

Meb: How much of a role does the short side play? Is that something that you just post as, “Hey, this is cool. You’d see that they underperform. So just avoid these stocks.”? Or is it actually something that people are actually trading on the short side? The dedicated short funds, at least until about a year ago are almost extinct. It feels like they’re just…there’s not many left. But even the long-short ones, how do they incorporate this knowledge?

Vinesh: It’s a really brutal game, or has been, to be a short fund recently. Even if you have great ideas on a relative basis, unless you’re significantly hedging your shorts, then you’re going to get blown up or you can get blown up.

So most of the folks that we work with, they don’t always tell us exactly what they’re doing, but our understanding, our inference, is it’s mostly equity market neutral stuff where you’re not looking for shorts to go down, you’re looking for shorts that underperform and longs that outperform. And you’re attempting to hedge.

In a market like the U.S., you can do that. You’ve got a liquid enough short market, securities lending market. And you can construct a market-neutral portfolio in these things. Or in a long-only sense, you can just underweight stuff that looks bad and overweight stuff that looks good.

You go to some other markets, and it’s much harder. I mean, shorting in China is extremely difficult. Just one example: China A shares, the domestic mainland Chinese market. The securities lending market is not mature there. Hedging with futures is very expensive. So in other markets, it can be much more complex. And the natural thing to do is just build a long-only portfolio and try to outperform.

Meb: And what’s the business model? Is it like a subscription fee? Is it basis points? Is it per head? And you hinted at some sort of new product coming out. I want to hear more about it.

Vinesh: Historically, our model has been the same as any data provider. You come to us. You test something out on a trial basis. We give you historical data. You examine it. You decide if you like it. And then, if you like it, you pay us a fee. And it’s just a flat annual fee per operating group. So there’s a pod at a multi-pod fund or maybe there’s a smaller hedge fund, they pay us just a flat fee per year, pegged to inflation. And that’s been the traditional business model for data feeds.

For more interface-type products, and we do have some interfaces as well, those are more on a per-seat basis. So the fee is $1,000 a year and one person gets a login to a website. So that’s sort of the traditional method.

Now there’s other methods as well, because we think… I come from a trading background. I really believe in these things. I want to put my money where the models are. And I’m happy to be paid if they work and not paid if they don’t work.

And I think this is going to be a paradigm shift with a lot of these data providers. It’ll take a long time because many of them come from an IT and technology background where the mentality is, “I built this. You should pay me for it, whether it helps you or not.” And really, this is alpha generation, so you shouldn’t get paid if there’s no alpha.

We’re doing a couple things to make that happen. One is this new platform I mentioned, called AlphaClub. And currently, it’s a platform for the exploration of signals. And really, that’s more sort of visual and exploratory. But what it does is it tracks performance over time.

So since we’re tracking performance, we can even set up something where we get paid based on the performance of these things. So maybe instead of you paying us X thousands of dollars per year, there’s some band where you pay a minimum amount just to get the data, but that goes up if it performs well. And that might be a function of whether you used it or not. It might just be based on its performance, because it’s up to you whether you use it or not as the end user. So that’s one method of variable payments that we’re exploring.
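To make the idea of a fee band concrete, here is a small sketch of a payment that starts at a minimum and rises with tracked performance up to a ceiling. Every parameter is hypothetical: the transcript describes the concept of a band but gives no actual fee levels, rates, or caps, and the ceiling itself is an assumption about what “band” implies.

```python
def annual_fee(minimum_fee, signal_return_pct, fee_per_point, maximum_fee):
    """Fee that starts at a floor and grows with the signal's tracked return.

    signal_return_pct: the signal's tracked return for the year, in percentage points.
    fee_per_point:     extra fee charged per percentage point of positive return.
    All four inputs are illustrative placeholders, not real pricing.
    """
    performance_part = max(0.0, signal_return_pct) * fee_per_point
    return min(minimum_fee + performance_part, maximum_fee)


# Hypothetical band: 10,000 floor, 2,000 per point of return, 30,000 ceiling.
fee_bad_year = annual_fee(10_000, -3.0, 2_000, 30_000)   # floor applies
fee_ok_year = annual_fee(10_000, 5.0, 2_000, 30_000)     # inside the band
fee_great_year = annual_fee(10_000, 20.0, 2_000, 30_000) # ceiling applies
```

Note that the fee here depends only on how the signal performed, not on whether the client traded it, which matches one of the two options mentioned above.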

Another method of that is really to become not just a signal provider, but a portfolio provider. So right now, we give people data signals. They incorporate them. They construct portfolios. They trade those. And if they do well, they do well, that’s great. But we don’t get as involved, currently, in the portfolio construction process.

But we’ve had some funds come to us and say, “Maybe we want to launch a dedicated product based on one of these things.” Or, “Maybe we want to run a stat arb portfolio, which incorporates your data, but we don’t want to do all the work to put it together. Can you do that? And we’ll pay you based on how it does.” “Great.”

So we’re starting to build out those capabilities. Some of that may require licensing, which we’re exploring as well. Some of those activities could be licensed activities, depending on the jurisdiction. So we’re exploring all of that.

So this is really getting into more of the alpha capture trade ideas, portfolio construction, multi-manager type of worlds, where we’re still not the ones collecting the assets. But we are getting closer to the alpha side of things, and not just the data side of things. I think that’s a natural evolution that a lot of data providers will probably go through at some stage in their process.

Meb: Yeah, I mean, I imagine this has happened, not just currently, but in the previous iterations where you’ve been where you get a big company or fund that just sits down, gets you in a boardroom and says, “Vinesh, here’s our process. We own these 100 stocks. Can you help me out?”

I imagine you get that conversation a lot, where people are just like, “Dude, just tell me what to do.” Because that’s what I would say. I’d say, “Hey, man, let’s launch an ETF. We get the ticker JJ, probably available. Let’s see.”

But how often are the funds coming back to you and saying, “You know what? What do you guys think about this idea? Can we do like a private project?” Where you’re like an extension of their quant group. I assume you guys do those too.

Vinesh: We do. Yeah, we have a handful of projects like that. It’s not a ton of them. But we’ve had some of the larger firms come to us and say, “Hey, we’re doing this project. We want bespoke research that only we get, an exclusive thing.” I can’t go into details on exactly what they’re asking for. But they’re looking for something very specific. And they think that we can help them build that. And they might go to multiple people for this. They might have multiple partners in these projects.

So we do bespoke projects, for sure. That stuff ends up being quite different from the stuff that we provide to everybody. It kind of has to be by its nature. But that is something that happens more often with someone who’s already got the quant group that exists, but they want to scale it externally, in a sense. They’re almost using us, as you say, as an outsourced quant research group. That does happen.

Meb: Tell me a story about either a weird, and it can be worked out or not, dataset that you’ve examined. What are some of the ones you’re like, “Huh, I never thought about that. That’s an odd one. But maybe it’ll work? I don’t know.”? Are there any that come to mind?

Because, I mean, you must every day, be wandering around Hong Kong having a tea or coffee or having a beer and wake up one night and be like, “I wonder if anybody’s ever tried this.” How often is that a part of the process? And what are some of the weird alleys you’ve gone down?

Vinesh: That happens. And then even more often than that, because I can’t claim to be the spark of insight for all of our products, we have someone coming to us and saying, “Hey, I’ve been collecting this data for a long time. Can you tell me if it’s worth anything?” And a lot of those we’ve got NDAs, and I can’t talk too much about them. But there are definitely some weird ones.

We’ve had some where it’s like a website where people are complaining about their jobs. We needed to figure out if it’s indicative of anything. We didn’t end up going down that route. But that’s an interesting dataset.

There’s an interesting one, which looks at internet quality, for example. So this company can identify whether the quality of internet in Afghanistan suddenly dropped ahead of the U.S. troops pulling out or something like that. So is infrastructure crumbling as a result of a natural disaster or some geopolitical risk or something like that. So really cool, clever ideas that are out there.

Those are ones that are not part of our products. We like them. We think they’re interesting. They’re not the sort of things that our clients typically look for. But I think they’re really slick and creative.

And then there are others that may sound a little more conventional, but that we have done something with and we’re interested in, so things like app usage data. So we work with a company in Israel that has access to the app usage data. Your installs, for example, of 1.3 billion people or devices, a huge panel. So for all these large apps, whether it’s the Citibank app, or Uber, or whatever, we know how many people are looking at these things. And we know it more frequently than the company will disclose in their quarterly filings.

So app usage is something people talk about a lot. But you can really get a nice handle on corporate earnings from some of these things just by thinking creatively. This company never thought really about, “Hey, we should sell data to funds.” But we had a discussion with them. And they’re like, “Yeah, that sounds great. Let’s explore it.”

Meb: Do you guys ever do anything outside of equities?

Vinesh: Not as much. We are interested in that. And personally, I should say, do we do anything outside of public equities? So people are starting to look at exotic datasets for private equities. And app usage is actually a great example of that. You could have a private company where VCs and private equity investors want to know what’s under the hood a little bit. So you can look at things like that, evidence of the popularity.

Meb: Well, that’s a huge one, in the sense that in the private world, there’s no such thing as insider trading. Now the problem is you have to get the company to agree that you can invest, or at least find secondary liquidity. And I say this carefully, but given this concept of insider trading, where there’s certain data that may not be permissible to trade upon, private equity and VC seems like a huge area where this could be informative.

Vinesh: And it does seem to be growing there. And I’ll say also, in the fixed income space, we’ve got datasets that really tell us something about a company’s, essentially, you can think of it as credit quality, to the extent that we can predict that a company will have an earnings shortfall. That’s going to matter for credit. So we’ve had some conversations with funds about that approach as well.

And the work we’re doing in ESG, which we’ll get to in a sec, might tie into that as well. And then other asset classes, we personally don’t do a lot in the commodities and FX space. But there are folks looking at interesting datasets there. There’s a company in the UK called Cuemacro, which looks at a lot of similar things to what we do, but their focus is in the macro space.

And then just outside of U.S. equities, I mean, we’re doing a lot trying to identify these datasets in global markets. We have an advantage, as I mentioned, in sitting here in Asia, but having a lot of U.S. clients, but also a lot of these datasets that, I don’t know if we take for granted, but seem kind of well known for the U.S. are not well known or not well used outside of the U.S. And that can be due to you need someone on the ground to identify these things and find them.

There are language issues. If they’re based on natural language processing, you’ve got to recreate your NLP for Chinese, Korean, whatever it is. Governments have different levels of disclosure in different countries. So the amount of public filing information will vary widely. Common law countries like U.S., UK, Australia tend to have a lot of these sort of public filings, other countries a lot fewer. You got to really dig to find even stuff that we commonly look at in the U.S.

Meb: You mentioned ESG, talk to me about what you’re talking about there.

Vinesh: This intersection between ESG and alternative data is a natural fit for alternative data because ESG, by its nature, no one knows what it means. That’s the first thing. What is ESG? There’s no benchmark for it. It’s not like value, where you know, you’re going to build a value factor out of some combination of financial statement data and market data. So it’s kind of the ratio between those two things.

There’s no accepted framework for ESG. And there are literally dozens of these frameworks for the way people look at things. So there are a lot of companies out there, they’re taking very creative and cool approaches to ESG.

The easy thing to do is you go to MSCI, and you get their ratings and you’re done. So you divest low-rated companies, or you divest, like, coal or whatever industry you don’t like. That’s a simple way to do it. And that’s fine, if that fulfills your mandate.

But we take a slightly different view on this. We think this should be done more systematically. Thinking about it as a risk manager, we think these are risk factors. And they’re going to increasingly be risk factors because they’re going to increasingly drive the prices of assets. And part of that is purely from a flow perspective: you see what Larry Fink is saying about ESG. And that is going to drive the companies they allocate to.

So almost by definition, ESG becomes a risk factor, risk premium, I don’t know, but a risk factor for sure. So you start thinking about it in that sense. And you have to look at what are the exposures of companies positive and negative to various ESG issues?

So we’ve started building a tool called FolioImpact that really looks at these things in exactly that framework where it’s a risk model. But the risk factors, instead of value and growth and momentum and industries, are positive economic impact, positive social impact, climate impact, things like these, and both positive and negative. So really taking your portfolio and thinking about it like, “Okay. Well, how do I determine whether the portfolio as a whole and its constituents, its holdings, have these exposures? How do you do that?”

Well, you can do that in two different ways. You can look at the economic activities of the company, so the industry it’s in and looking at segmentation data. And knowing that if a company is using a lot of lithium batteries, Tesla, you’re looking at battery usage, then that’s going to have negative environmental impact on soil, for example. So that’s a good example.

Apple may be the same for battery issues. But Apple has positive impacts, too. Apple is a company that promotes, in some sense, the free flow of information. Google, the same. So you’re looking at companies that have both good and bad impacts.

And you have to think of it in both sides. And so the first way, as I said, is based on their economic activities. And then aggregating that up to the portfolio level to see where you could potentially tilt your portfolio away from or towards different issues that you care about.
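Aggregating company-level exposures up to the portfolio works like any factor risk model: weight each holding’s exposure by its portfolio weight and sum. The sketch below shows that arithmetic. The tickers, issue names, and exposure scores are invented for illustration and are not FolioImpact output; how exposures are scored from economic activities is not specified in the conversation.

```python
def portfolio_exposures(weights, exposures):
    """Weighted sum of each holding's exposure to each issue.

    weights:   ticker -> portfolio weight
    exposures: ticker -> {issue name -> signed exposure score}
    A holding with no entry for an issue contributes zero to it.
    """
    totals = {}
    for ticker, weight in weights.items():
        for issue, score in exposures.get(ticker, {}).items():
            totals[issue] = totals.get(issue, 0.0) + weight * score
    return totals


# Hypothetical two-stock portfolio with positive and negative impacts.
weights = {"AAA": 0.6, "BBB": 0.4}
exposures = {
    "AAA": {"climate": -1.0, "information_access": 0.5},
    "BBB": {"climate": 0.5},
}
totals = portfolio_exposures(weights, exposures)
```

A negative total on an issue you care about suggests where to tilt away, and a positive one where the portfolio already leans in.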

And the framework we’ve been using for this is the United Nations’ Sustainable Development Goals, so SDGs. There’s 17 of them that are gender equality, life underwater, climate, soil, all these 17 different things that the UN has decided are the key goals for… It provides a really nice framework for us.

The other way we can look at this is actually what the company is saying. So we can look at company disclosures. And this goes back to, in addition to finding all the swear words in the transcripts, we can also find what topics they’re talking about. So we can look at mapping what the companies themselves talk about in their quarterly calls with all these topics. And we can see some really interesting things.

Back to my example of Apple, so Apple talks more than most companies about gender equality, and increasingly so, and you can track that over time using our tools. You can also track the degree to which they discuss climate issues. And that’s actually really low and has not increased. So unlike other companies, which are starting to discuss climate issues a lot in their disclosures and, in particular, their earnings calls, Apple doesn’t focus on that at all.

And I’m not saying that necessarily matters to their stock price. But if it matters to you as an investor, then you might want to pay attention to that. The entire goal is to really enable you as the investor to tweak your portfolio to exactly the issues that you happen to care about or that your investors care about.

Meb: U.S., China, is it a global coverage? What are some areas that you guys cover?

Vinesh: For ESG, if you’re looking at things in the sense of economic activities and what industries companies are in, that’s global. You can do it for any asset, as long as you can have a mapping to the various economic activities. That can be very broad, tens of thousands of companies globally, could include China.

When you’re looking at it from the NLP perspective, this sort of has the issues that I discussed earlier. So if you’ve got documents from a company in English, then it’s fairly easy to do this. So we’ve got a methodology for taking an earnings call, or potentially a 10-K or a Q, or a news data feed, or broker report. Anything that’s like a text block in English about a company, we can map it to the SDGs. We can tell which issues are important to a company.

When you get outside of the U.S., it’s as difficult as any other work on text filings for those companies. So try to identify transcripts, or news, or what have you in these other languages, it’ll have the same issues. That’s something that we will tackle in the future. English is a lot easier. And that includes U.S., UK, Australia, Hong Kong, Singapore, and countries like that, Canada.

Meb: It seems like one of those trade-offs, where you’re talking about the efficiency of a certain market versus the potential ability to even trade it. So if you’re going down to lower market cap levels, it’s just harder. But potentially, less efficient when you find some of these things.

One of the insights that I thought was fun was the reflexive process where the funds become the signal themselves. Was this a public paper? I think a lot of your papers are public. So we can just delete this, if not. But the hedge fund volume indicator signals, that’s something we can talk about?

Vinesh: Yeah, sure. So this is a really interesting dataset that comes from a company called DTCC, the Depository Trust & Clearing Corporation. And they are the largest clearing house in the U.S. And they’re basically tracking which types of investors are buying and selling individual stocks globally. This is sort of something where, if you had the data for this, if you knew what hedge funds are buying and selling, you could effectively create a hedge fund-mimicking portfolio.

So, you can say, “Okay, well, I know what they bought.” This data is delayed. It’s T+3 data. So it’s delayed, but you can see what they were buying or selling a few days ago. And if you track that, well, a lot of these hedge funds will get into positions over multiple days. So especially if they’re larger funds, if they were buying something three days ago, they might still be buying it today. That’s essentially what we think is driving this effect.

So you can sort of capture the tail end of their trades, and it’s sort of a mechanical thing where if you can ride those, then you can certainly benefit from it. Now, there’s certainly a risk here that you’re almost by definition getting into crowded trades by doing this. So there’s a little bit of a chicken and egg here, I guess. Do you want to take advantage of this alpha? And is it going to get crowded almost by definition? But we think it’s a really rich, interesting dataset. We’re starting to look at that.
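The mechanics of following delayed flow data can be sketched as follows: on any given day you may only use flows that are at least three days old, and you bet that recent visible buying continues. This is an illustration under assumptions. The data layout, the two-day summing window, and the numbers are invented; it does not describe the actual DTCC feed format or the signal ExtractAlpha builds from it.

```python
def visible_flow_signal(daily_net_buying, lag_days=3, window=2):
    """Sum the most recent flows that are already published.

    daily_net_buying: net hedge-fund buying per day for one stock, oldest
                      first, with today's (not yet visible) flow last.
    lag_days:         publication delay; 3 mirrors the T+3 delay mentioned above.
    window:           how many visible days to sum (assumed, must be >= 1).
    """
    cutoff = len(daily_net_buying) - lag_days
    visible = daily_net_buying[:max(cutoff, 0)]  # drop days still inside the lag
    return sum(visible[-window:]) if visible else 0.0


# Hypothetical flows; the last three entries are too recent to be seen.
flows_by_ticker = {
    "AAA": [10, 20, -5, 30, 40, 50, 60],
    "BBB": [5, -15, -10, -20, 90, 90, 90],
}
signals = {t: visible_flow_signal(f) for t, f in flows_by_ticker.items()}
# Rank so the stock with the strongest visible buying comes first.
ranked = sorted(signals, key=signals.get, reverse=True)
```

Slicing off the lagged days before computing anything is the part that keeps a backtest from accidentally peeking at data that would not have been available yet.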

And the flip side of that, which has become really interesting in the last two years, is not what these sophisticated hedge funds are doing, but what the retail investors are doing. Both of these things are interesting and relevant in different ways and for different segments of the market, potentially.

Meb: How the whole meme stock…? You’ve seen the quant quake, you saw the financial crisis, all of a sudden you had some weirdness going on last couple years, is that something you guys just have a bunch of anonymous accounts on Reddit that just incite some of these theories? Have you thought about that in the past year or two? Or is that just something that’s always been a part of markets?

Vinesh: No, it’s always been a part of markets. But in the U.S. market, it’s been a smaller part, until recently, post-COVID. Obviously, this is common knowledge at this point. But trading stocks became the new gambling, and everyone was staying at home and trading on Robinhood and so on.

And we have a lot of funds coming to us… By the way, it’s rare for funds to come to us and say, “Do you have something on X?” Because most of the time, they don’t want to tell us what they’re interested in, what they’re looking at. That’s proprietary.

But in this case, it’s so common, and it’s so well known that we had a lot of funds coming to us and saying, “What do you have that can help us understand what’s going on with meme stocks? Because meme stocks are risky, they’re moving based on things that are not captured by our models.”

So we have been looking for things that will capture that sort of information. Some of those are still in the works, but we have one really interesting one that looks at, not WallStreetBets specifically, but generally financial websites. So we can measure through this dataset the number of visits to the ticker page on various well-known financial websites. I can’t name the sites themselves.

But any of the common sites where you’d punch in a ticker, to pull up price data or fundamentals or earnings estimates, whatever it is, if you have clickstream data from those websites, and, you know, clickstream data at the ticker level, you can see which companies are being paid the most attention to.

And we clearly saw that the companies with the most attention were just spiking. And we can’t necessarily identify who’s looking at these sites, but it’s a lot of retail traffic. There are certainly institutional investors who look at the sites, but they’re a minority of it.
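A simple way to turn ticker-page visit counts into an attention measure is to compare today’s count with the ticker’s own recent history. The sketch below uses a z-score for that. It is only an illustration: the choice of a z-score, the length of the history, and the visit counts are assumptions, since the actual dataset and methodology are not described here.

```python
from statistics import mean, stdev


def attention_zscore(history, today):
    """How unusual today's page views are relative to the trailing history.

    history: past daily visit counts for one ticker page (at least two values).
    today:   today's visit count.
    Returns 0.0 when the history has no variation to compare against.
    """
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (today - mean(history)) / sigma


# Hypothetical counts: a quiet ticker that suddenly gets five times its usual traffic.
history = [100, 110, 90, 105, 95]
spike = attention_zscore(history, today=500)
normal = attention_zscore(history, today=100)
```

Scaling by each ticker’s own history matters because a large, widely followed stock always has more raw traffic than a small one.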

Meb: I remember seeing Google Trends does their like year-end review reports, and top 10 business searches on Google, 3 or 4 of them were meme-stock related, which to me, it seems astonishing. But, whatever, 2021 was super weird.

Tell me a little bit about your decision to make sweet love and merge with Estimize. What was the idea there? And then what’s the result now? How many folks you all got? Where is everybody and all that good stuff?

Vinesh: I’ve known Leigh since his early years. So I think I got an unsolicited email from him when I was in PDT. And I was like, “Oh, this is cool.” Forwarded around to a bunch of ex-StarMine friends. And we’re like, “This is really interesting.”

So I decided to go meet him for a beer and met up somewhere in the village. And he just described to me what he’s doing. And I thought this is really cool.

So just to recap, Estimize, it’s a crowdsourced earnings estimates platform. It’s been around since 2011. You and I or anyone else can go in and say, “This is what I think Apple or Tesla or Netflix is going to do in terms of earnings and revenues for the next quarter.”

Hundreds of thousands of people contributed to this platform, so it’s very broad. Its contributors are buy-side, students, individual traders, maybe people who work in a particular industry and care about companies in the industry. So it’s a very diverse set of contributors. They’re contributing mostly on earnings estimates and revenue estimates, but also company KPIs, like how many iPhones Apple sells, macroeconomic forecasts, your nonfarm payrolls, for example.

And there’s been a ton of academic research that’s been done on this in the last 10 years that shows that these estimates are more accurate than the stuff that the sell sides are pumping out. And that you can use this data to really predict not only what earnings are going to be, but how the stock is going to move after earnings are reported.

Because we’re really measuring what the market expects. And if we have a better metric of market expectations, then we know whether a beat is really a beat or a miss is really a miss.
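The point about a beat being “really a beat” comes down to which expectation you measure the reported number against. The sketch below computes an earnings surprise against two different benchmarks. The EPS figures are invented for illustration and do not come from Estimize or any sell-side consensus.

```python
def surprise(actual, expected):
    """Relative earnings surprise: positive is a beat, negative is a miss."""
    if expected == 0:
        raise ValueError("expected value must be non-zero")
    return (actual - expected) / abs(expected)


def label(actual, expected):
    """Classify the reported number against a given expectation."""
    s = surprise(actual, expected)
    if s > 0:
        return "beat"
    if s < 0:
        return "miss"
    return "in line"


# Hypothetical quarter: the crowd expected more than the sell side did.
actual_eps = 1.05
sell_side_consensus = 1.00
crowd_consensus = 1.10

versus_sell_side = label(actual_eps, sell_side_consensus)
versus_crowd = label(actual_eps, crowd_consensus)
```

In this made-up case the same reported number is a beat against one benchmark and a miss against the other, so a stock falling after an apparent beat is less surprising if the crowd number was the better gauge of expectations.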

So Leigh explained all this to me back in 2013 or something. I came on as an advisor, had equity in the company for a long time, followed his progress, and helped out where I could in terms of…we wrote a white paper together. Leigh and I introduced the data to a lot of funds over the years.

And then late 2020, early 2021, we started talking about joining forces. So the idea there was we built up a really nice suite of data products. We had a sales team that was going out and getting into the market with these things. We also have a research team that is able to extract insights from datasets, including the Estimize data. And Estimize has this amazing platform with tons of contributors and really rich data, so it just made sense to bring that data in house.

So we worked through that merger, completed in May of 2021, a little bit before you talked to Leigh last year. And it’s going great. There’s a ton of interest in the data and we have people who are saying, “Okay, can you give me all the stuff you know about earnings?” We say, “Okay. Well, we know what the crowd is saying, we know what the best analysts are saying. We have a view on earnings from the perspective of web activity like the Google Trends type of data you were talking about.”

We might have folks come to us saying, “Give me everything you’ve got for short-term sentiment,” and that could be a post-earnings announcement drift strategy for Estimize, and it could be some of these other things that we’ve talked about as well that are sentiment-related, like the transcript sentiment.

So we’re able to provide suites of datasets to funds who are looking for things. And then, on the Estimize side, we’re going to work on continuing to grow that community, getting more involved in a lot of the platforms on things like Reddit and Discord servers, and so on. That data is also available, actually, interestingly, within a Discord bot called ClosingBell.

So if you’re an admin of one of those groups, you can install the ClosingBell app, and then you can grab a ticker and see what the Estimize crowd is saying. So we’re embedding that more into the way people work today, and the way the crowd interacts with itself today, as opposed to just keeping that within the Estimize platform. Because we know that workflows have changed in the last two years.

Meb: What’s the future look like for you guys? Here we are 2022, how many folks do you guys have?

Vinesh: We’re 10. And we’re distributed globally. So we’ve got our headquarters here in Hong Kong. And it’s been great starting a company here. It’s low corporate taxes. It’s a very business-friendly climate. There are other issues going on in Hong Kong, obviously, from a political perspective and COVID perspective, that are probably not worth getting too much into. But it’s a great place to have a company base. And we’ve got an R&D team based out here.

But with the Estimize merger, we brought on a few folks in New York, and Leigh continues to advise from Montana. And then, we’ve got a global sales team. So we’ve got salespeople in the U.S., UK, and here in Hong Kong, who are talking to all the funds and potential clients. So it’s very distributed. And we were ahead of that curve. Although we always had a small office in Hong Kong, we’ve always been kind of global in that sense.

Meb: So what’s the future look like for you, guys? What’s the plans? Is it more just kind of blocking and tackling and keeping on? Are you Inspector Gadget on the hunt for new datasets and partners? What’s next?

Vinesh: Anyone out there, if you’ve got a cool dataset and you want to find out what it’s worth, talk to us, reach out. We’re always on the hunt. We’re looking for datasets ourselves as well. We’re looking for new ways to monetize datasets, whether that’s through investment vehicles, or new markets to tackle, whether that’s geographically or by asset class.

And we’re looking for interesting new ways that people are thinking about data itself, whether that’s the workflows of data, like I mentioned, through Slack, and so on. Or also looking at ESG, which is just such a huge topic that we’re just dipping our toes, to be honest. This is new. That’s going to be a whole new world.

So these are a lot of the directions we’re taking, but also just getting these interesting datasets in front of more traditional investors. So our core business has been the hedge funds. The hedge funds are always ahead of the curve on this stuff. They’re the early adopters. The traditional asset managers and asset owners have been slower on it.

Even those that have large internal research teams with direct investments have been more reluctant to adopt some of these things, and are maybe just less technologically inclined, or maybe just more cautious, in general. And also, because a lot of these things are potentially lower capacity, and as larger long-only funds, they’re obviously looking for larger-capacity things.

And we’re starting to find some of those things. But many of the early ones that you talked about, like Twitter sentiment, are not going to be useful to a giant pension fund. It’s too fast-moving to have any capacity in it.

We’re starting to build tools for all of those types of investors also to take advantage of these types of alternate datasets. And then going beyond traditional managers, out to the retail and wealth management space and looking for the right partners there. The Estimize data is available on E*TRADE. If you’ve got an E*TRADE account, you can see it there. It’s on Interactive Brokers as well.

But there are ways to get this data into the hands of the everyday investor, whether that’s through an investment vehicle like an ETF, or whether it’s through the actual data on those platforms. Those are things that we’re actively pursuing.

Meb: You’re going to answer this question in two different ways, or both. It’s your choice. Looking back over the past two decades, in financial datasets and markets, we usually ask people what’s been their most memorable investment. So you can choose to answer that question, yes or no. You could also choose to answer what’s been your most memorable dataset. So that’s a unique one to you, if there’s anything that pops into your mind, crazy, good, bad, in between, or answer both.

Vinesh: So there’s a dataset I wish I had, which was back in the late ’90s, when we talked about the internet bust. I talked about a similar website earlier, but there was a website that collected people’s opinions on the dotcom companies they worked for. And the platform was called fuckedcompany.com. It was great.

Basically, everyone would be sitting in their offices, South of Market, and like looking up their competitors on this platform and seeing, “Oh, we just had to lay off 30 people,” whatever it is. If that were data, if I could go back in time, grab that, scrape it, and do some NLP, it would have been great for knowing which internet companies to short at the time. It’s a dataset that never was, a dataset that should have been. And it was very memorable.

Meb: Glassdoor reminds me a little bit of that. I wonder. It’s always challenging just between like, you have the company, you have the stock. You just have people who feel maligned and want to vent. It’s noisy, I think, but interesting. Go ahead and answer, then I got another question for you too.

Vinesh: I just think, if you’re looking at the corporate level, at what we’ve done at ExtractAlpha, the most memorable equity position was just in Estimize, honestly, because that got us together. And really, that was our engagement many years before the marriage. So obviously, I have to give credit to Leigh and the platform he built over that time.

Meb: I was rapping with someone on Twitter today, and maybe you can answer because I don’t remember at this point, and talking about datasets, and someone was like they have all these active mutual funds that are high fee traditionally, and someone was actually referring specifically to Ark and the new fund that came out that’s an Inverse Ark fund.

And they said, “How come people don’t replicate mutual funds?” And then I said, “There used to be a company that did this back in the ’90s, with the active mutual funds.” But I can’t remember if it was a fund or a company. It’s not 13Fs, but it would just use the funds. Does this ring a bell? Was it Parametric or something?

Vinesh: 13Fs are one way to go for this. And we do have a partner company that looks at 13F data and finds a really interesting value in finding the highest conviction picks of the best managers. But what you’re particularly talking about doesn’t ring a bell for me.

Meb: My man, it was fun. It’s your morning, my evening, time for a brewski, you can have a tea or coffee. Where do people go if they want to subscribe to your services? So I’m going to forewarn you, guys, don’t waste Vinesh’s time if you just want to squeeze all the best signals out of him. But if they’re seriously interested in your services, where do they get a hot dataset that’s just been unearthed that no one knows about? Where do they go?

Vinesh: Our website, extractalpha.com. We’ve got an Info page there, a Contact Us page. You can write to info@extractalpha.com. We’re on LinkedIn as well, of course. And then for Estimize, if you’re interested in that platform, obviously estimize.com. It’s free to contribute estimates and free to dig around that platform, so I encourage people to look at that as well.

Meb: Awesome, Vinesh. Thanks so much for joining us today.

Vinesh: Thanks, Meb. I appreciate it.

Meb: Podcast listeners, we’ll post show notes to today’s conversation at mebfaber.com/podcast. If you love the show, if you hate it, shoot us feedback at mebshow.com. We love to read the reviews. Please review us on iTunes and subscribe to the show anywhere good podcasts are found. Thanks for listening, friends, and good investing.
