Quantifying Self Attribution Bias with Meng Wang
Excess Returns | October 26, 2023
00:44:17 | 40.56 MB


When fund managers outperform, they tend to attribute it to their skill. When they underperform, they tend to blame external factors. While that pattern has been known for some time, it wasn't something researchers were able to quantify. But the advent of ChatGPT and large language models has changed that. In this episode, we are joined by Meng Wang, a PhD student at Georgia State University. He used this new technology to analyze and quantify self-attribution bias among fund managers and recently published a paper, "Heads I Win, Tails It's Chance: Mutual Fund Performance Self-Attribution," where he highlighted his findings. We discuss his research process, what he learned, and the most important conclusions for investors.

We hope you enjoy the discussion.

SEE LATEST EPISODES: https://www.validea.com/excess-returns-podcast

FIND OUT MORE ABOUT VALIDEA: https://www.validea.com

FIND OUT MORE ABOUT VALIDEA CAPITAL: https://www.valideacapital.com

FOLLOW JACK Twitter: https://twitter.com/practicalquant LinkedIn: https://www.linkedin.com/in/jack-forehand-8015094

FOLLOW JUSTIN Twitter: https://twitter.com/jjcarbonneau LinkedIn: https://www.linkedin.com/in/jcarbonneau

[00:00:00] Welcome to Excess Returns, where we focus on what works over the long term in the markets. Join us as we talk about the strategies and tactics that can help you become a better long-term investor.

[00:00:30] We work through Meng's research on mutual fund manager self-attribution bias.

[00:00:35] Ever wonder how these managers credit their successes or failures? Is it due to their own skill or external factors?

[00:00:40] Meng employs ChatGPT to decode this and look at the communications from funds, assigning each manager a unique self-attribution score

[00:00:47] and looking at the performance of those who score high and low based on self-attribution bias.

[00:00:52] And that's not all: we also venture into Meng's research on utilizing AI to decode images in company presentations.

[00:00:58] Also, we are joined by our good friend and co-host of the Education of a Financial Planner Podcast, Matt Ziegler, who asks some questions of Meng as well.

[00:01:06] Thanks so much for listening. Please enjoy this discussion with Meng Wang.

[00:01:11] Hi, Meng. Thank you very much for joining us today.

[00:01:14] Hi. Thanks for having me. It is a great pleasure to talk with you guys and share my research findings.

[00:01:22] Very, very cool. Yeah, I think we first stumbled on your research on Twitter, which can be kind of a train wreck sometimes, but it's good for finding and uncovering interesting pieces of research.

[00:01:36] And I think what got our attention is you spent some time, considerable time and energy, on some studies using ChatGPT, large language models and AI,

[00:01:50] specifically in the investment space. So we'll sort of talk about what that research is and what those studies are in a few minutes.

[00:01:58] But maybe, like, starting at a high level for you: it's always interesting when these new technologies come on the scene, how the academic world sort of embraces them and how researchers sort of engage with technologies like this.

[00:02:16] So, you know, I just want to start with you and sort of ask you, I guess out of the gate: what got you into working with ChatGPT and this type of technology?

[00:02:27] Yeah, so actually, you know, I was a big fan of machine learning before I entered the PhD program.

[00:02:34] So, you know, when I started doing research in finance as a PhD student, it was just, you know, very natural that I developed research interests in the application of AI in finance.

[00:02:49] And so with regard to the specific research paper, so basically it's quite interesting because initially I didn't intentionally choose GPT as the base model.

[00:03:01] But it just happened to be the case that, you know, when I ran a horse race of different model candidates, GPT just, you know, slightly outperformed other comparable transformer-based models.

[00:03:12] So I decided to use GPT as the base model of my paper.

[00:03:18] That's interesting. So you kind of started by looking at all the large language models out there and then asked: which one is most efficient?

[00:03:25] Which one can help you accomplish the type of research that you're looking to do most efficiently?

[00:03:30] Just as a side note, something caught my attention earlier today: I saw that OpenAI was testing a new model.

[00:03:38] I don't know the name of it, but then they scrapped that because it wasn't as efficient as they thought it was going to be.

[00:03:46] And so they kind of moved away from that. So in your process, I guess you kind of were running the horses alongside each other, and then you chose ChatGPT as the best one.

[00:03:55] Yeah, exactly. Yeah. So basically I ran, you know, a horse race of all the different models, and GPT turned out to outperform the other models.

[00:04:03] So that's why I chose GPT as the base model throughout the paper.

[00:04:07] I'm just curious, what are the major models you look at when you run a horse race like that? Like what are the major players out there today?

[00:04:13] Yeah. So like RoBERTa, and also, like, XLNet, because I'm trying to do some specialized tasks.

[00:04:21] I'm trying to do some sentence classification, sentence classification tasks.

[00:04:25] So the models I chose are basically all those popular models people use to do those sentence classification jobs.

[00:04:34] I'm just curious, before we get into your research: when you think about this from the perspective of investors in general, what do you think the biggest benefits of these new technologies will be for investors?

[00:04:46] You know, if we look 10 years forward, what do you think the biggest benefits will be?

[00:04:50] Yeah. So I think the biggest benefit is that nowadays you can easily use those AI techniques to do a better job.

[00:05:02] So, for example, with regard to textual analysis: compared to 10 years ago, if you wanted to adopt some natural language processing techniques, you probably needed to write maybe hundreds of lines

[00:05:20] of code in Python to do those tasks. And also the performance was not really that good. But nowadays, you don't have to do that with large language models.

[00:05:32] You can just write a simple prompt, and as long as you know how to write a good prompt, you can do a lot of tasks with, you know, prompt engineering.

[00:05:41] And also the performance is really amazing. So I think, you know, the bar to using AI is getting lower and lower, while in the meanwhile the performance is getting better and better.

[00:05:51] So yeah.

[00:05:53] Can you just comment on that? Like, the "simple and good" part of the prompt. Because I think a lot of, whether they're individual investors or maybe advisors who have been in the business a long time,

[00:06:04] they haven't really played around with this yet. Could you maybe just speak for a second on what makes a prompt simple and good, and why is that barrier coming down?

[00:06:15] Yeah, so I think the biggest advantage of using prompts is that it's already a pretrained model. Because, for example, ChatGPT is pretrained on a lot of data sources,

[00:06:33] you don't have to really construct a very complicated training set to train the model to do those specialized tasks. Instead, you can just use few-shot learning: you can just give a few examples.

[00:06:46] For example, you can tell your model, "this is the thing I want to do," and the model will easily understand those instructions. So I think that's, you know, the biggest advantage of large language models compared to other pretrained models we had in the past.
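The few-shot setup Meng describes can be sketched as a prompt template. This is a minimal illustration, not the paper's actual prompt; the example sentences and label names are invented, and the model call itself is omitted:

```python
# Sketch of a few-shot prompt: a handful of labeled examples followed by
# the sentence to classify. Examples and labels are hypothetical.
EXAMPLES = [
    ("Our stock selection in industrials added to returns.",
     "internal contributor"),
    ("Value stocks underperformed growth stocks this period.",
     "external detractor"),
    ("Our overweight to energy detracted from performance.",
     "internal detractor"),
]

def build_prompt(sentence):
    """Assemble a few-shot prompt: labeled examples, then the new sentence."""
    parts = ["Classify each sentence's performance attribution."]
    for text, label in EXAMPLES:
        parts.append('Sentence: "{}"\nLabel: {}'.format(text, label))
    parts.append('Sentence: "{}"\nLabel:'.format(sentence))
    return "\n\n".join(parts)

prompt = build_prompt("The fund benefited from a rally in technology shares.")
```

In practice the assembled prompt would be sent to the model; only the prompt construction is shown here.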

[00:07:04] So let's get into your paper, because your paper was really great. And, you know, what brought us to you is two of the people we respect the most in investing, Larry Swedroe and Daniel Crosby. I don't know if you're familiar with both of them; I know they wrote about you, so you probably are. But they both highlighted your research, and whenever those two people highlight someone's research, we know it's really, really good.

[00:07:22] So that's kind of what brought us to you in the first place. And your paper is called "Heads I Win, Tails It's Chance: Mutual Fund Performance Self-Attribution."

[00:07:29] So can you just talk a little bit about the setup, like what you were looking for and what got you into that area?

[00:07:36] Yeah, so in my paper, basically, I studied how mutual funds attribute their past performance. So I look at mutual fund shareholder reports, where, you know, the SEC requires mutual funds to disclose their performance information in the shareholder reports.

[00:07:52] So they're going to talk about what contributed to their past performance and what detracted from their past performance, right? So starting from there, we can do analysis; we can study the dynamics and also the implications of that attribution information.

[00:08:09] But, you know, to accurately identify that information is very challenging using traditional approaches, such as the bag-of-words approach. So in the past, people tended to use those dictionary, or bag-of-words, approaches.

[00:08:25] So basically you want to construct a dictionary, right? And in the dictionary, you have a lot of keywords. And you can calculate the frequency of these keywords in any textual content to proxy for any information you want.

[00:08:41] But the challenge is that those dictionary approaches don't consider the meaning of words in different textual contexts, and they also don't consider the semantic features at the sentence level.

[00:08:53] So for example, if I give you two sentences, and the first sentence is, "value funds outperformed growth funds," and the second sentence is, "growth funds underperformed value funds,"

[00:09:10] then semantically these two sentences have almost exactly the same implication, right? Because they're both saying that value funds did better than growth funds.

[00:09:21] But if you use the keyword approach, the dictionary approach, then because "outperform" and "underperform" have exactly opposite meanings, you'll probably end up with a very wrong conclusion.

[00:09:39] You'd end up with the conclusion that, you know, these two sentences have opposite meanings. So that's where you want to try to use a natural language processing model.
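The failure mode Meng describes is easy to demonstrate. A minimal sketch, assuming a toy two-word dictionary; both sentences below mean "value beat growth," yet the keyword scores come out opposite:

```python
# Minimal dictionary (bag-of-words) scorer: it counts sentiment keywords
# but ignores which fund is the subject of the sentence.
POSITIVE = {"outperformed"}
NEGATIVE = {"underperformed"}

def dictionary_score(sentence):
    """+1 for each positive keyword, -1 for each negative keyword."""
    score = 0
    for word in sentence.lower().split():
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

s1 = "value funds outperformed growth funds"
s2 = "growth funds underperformed value funds"
# Same meaning, opposite keyword scores: the sentence-level semantics
# (who outperformed whom) are invisible to the dictionary.
```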

[00:09:53] So with the development of natural language processing techniques, nowadays we can actually train a model to help us precisely capture that information and capture all those, you know, sentence-level semantic features.

[00:10:06] We don't have to rely on human rules. We don't have to design rules ourselves. We can use a natural language processing model to do this in a very fast way and also in a very accurate way.

[00:10:18] So that's the motivation: I wanted to adopt natural language processing techniques in my paper.

[00:10:25] So I understand the outperform/underperform case, and I just want to make sure I understand this, because this is, like,

[00:10:31] the dream of being able to use this stuff. So outperform versus underperform is obvious. What about, like, "outperformed" versus "ought to outperform"? Are you able to drill into that layer of understanding in the language?

[00:10:49] Yeah, exactly. Yeah. So I think the bottom line is, as long as a human being can tell the meaning of the sentence, then ideally the model should be able to do exactly the same thing as a human being, right?

[00:11:07] So it will consider, like, every part of the semantic features at the sentence level.

[00:11:12] But, you know, to be able to train a model to do something like that, you have to construct a very high-quality training set.

[00:11:21] So in other words, you have to give some examples, right? And you have to let your model know, "these are the examples of good performance and these are the examples of bad performance."

[00:11:33] So you want to give some very complicated sentences in that scenario so the model can learn from them.

[00:11:39] Can you talk a little bit about the volume of data here? I mean, it would seem like without these new technologies this wouldn't have been possible. Like, how much data did you have to analyze to be able to do this?

[00:11:48] Yeah, so actually, you know, the performance is really amazing. You don't have to give a lot of sentences.

[00:11:55] So in my case, you know, the training set is just 2,000 sentences.

[00:12:01] And the performance stabilizes after training on, like, around 500 sentences.

[00:12:08] So it's very amazing: you don't have to construct a very large, you know, training set to do this task.

[00:12:17] You mentioned the training set. Can you just talk about what that is, and how that kind of leads to what you do next?

[00:12:22] Yes, so basically you want to give, you know, as many categories of sentences as possible to the model, right?

[00:12:31] So you want to tell your model, these are the examples of internal, you know, factors, and these are the examples of external factors.

[00:12:40] So for example, you want to give some sentences related to stock selection. You want to give some sentences related to, like, benchmark deviation, right?

[00:12:48] So you want to give as many types of categories as possible to the model.

[00:12:53] So basically, for each type, I remember that in my paper I basically do something called stratified sampling.

[00:13:01] So I just use keywords to identify all those related different categories, and for each, you know, for each category,

[00:13:09] you want to extract maybe 20 to 30 sentences, and you just give all those sentences to the model, and in total you have, like, 2,000 sentences, right?

[00:13:20] You have different subcategories, and then the model will learn, you know, the meaning of each subcategory.
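The keyword-based stratified sampling step can be sketched roughly as follows; the categories, keywords, and sentences here are hypothetical illustrations, not the paper's actual lists:

```python
import random

# Bucket candidate sentences by category keywords, then draw a fixed
# number from each bucket for manual labeling.
CATEGORY_KEYWORDS = {
    "stock_selection": ["selection", "picks"],
    "benchmark_deviation": ["overweight", "underweight"],
    "market_factor": ["market", "rates"],
}

def stratified_sample(sentences, per_category=2, seed=0):
    """Return up to per_category keyword-matched sentences per category."""
    rng = random.Random(seed)
    sample = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        matched = [s for s in sentences
                   if any(k in s.lower() for k in keywords)]
        rng.shuffle(matched)
        sample[category] = matched[:per_category]
    return sample

sentences = [
    "Stock selection in healthcare drove returns.",
    "Our overweight to banks hurt performance.",
    "Rising rates weighed on the broad market.",
]
sample = stratified_sample(sentences)
```

Drawing a capped number per bucket is what keeps rare categories represented in a small (roughly 2,000-sentence) training set.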

[00:13:27] I'm just curious, how long did this take? Like, if you think about when you first started thinking about doing this paper to when you had a published paper, how long is that process?

[00:13:35] Are you talking about training the model or about writing the paper? Yeah, I'm just thinking about, like, starting training the model. Like, when you think, "this is an idea,

[00:13:43] this is something I want to test," all the way to when you have a final paper. I assume it's a pretty long process, but I was just curious how long it was.

[00:13:48] Yeah, so when I started the project, I expected to spend a really, you know, long time on the training process, but actually it turned out that the training process was really quick.

[00:14:02] I just spent maybe, like, one month to finish the entire training process.

[00:14:08] I think the most time-consuming part is to construct the training sample, to construct the training set. And once you have a decent training set, it's going to be, you know, very fast to train the model to understand those semantic meanings. But with regard to the paper,

[00:14:30] it actually took me much, much longer to finish the paper, because, you know, you have to do a lot of tests using those measures. Yeah.

[00:14:41] One more thing here as we go: you talked about the paper using a two-layer natural language processing model. Can you talk about what that is?

[00:14:48] Yeah, sure. So basically it's a GPT-3 model fine-tuned using, you know, a specialized training sample. So the idea is that if you give any sentence to the model, then the first layer of the model will be able to identify whether there is any attribution information in the sentence or not.

[00:15:14] To give you an example: if you have a sentence that goes like, "we did very well in the industrial sector," if that's the sentence, the model will say that

[00:15:28] you do not have any, you know, attribution part in the sentence. But if you have a sentence like, "we did well in the industrial sector because of the effects of individual stock selection,"

[00:15:42] then the first layer of the model will be able to detect that, you know, stock selection is the attribution part of the sentence, and pass that attribution information to the second layer.

[00:15:53] And the second layer of the model will be able to, you know, classify that information along two dimensions. The first dimension is, you know, whether the sentence is talking about something that contributed to the performance or something that detracted from the performance.

[00:16:10] And the second dimension is whether it's about something, you know, internal, or fund-specific, or something external, not fund-specific.

[00:16:21] So essentially, you know, for every sentence in the shareholder report, my model will be able to detect whether it's a performance-related, performance-attribution sentence, and be able to classify those sentences into different categories, right? So at the disclosure level, I'm able to aggregate all that information

[00:16:39] to get a score. Basically, I can say what the proportions are of your internal contributions, your external contributions, your internal detractions, and your external detractions.

[00:16:53] And I'm able to, you know, calculate the discrepancy in the attribution of your performance contributors and your performance detractors.
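The two-layer flow can be sketched structurally. In the paper both layers are fine-tuned GPT-3 classifiers; the keyword rules below are crude stand-ins used only to show how the layers chain together:

```python
# Layer 1 filters for attribution sentences; layer 2 classifies along
# (contributor|detractor) x (internal|external). Rules are illustrative.
def layer1_has_attribution(sentence):
    """Layer 1: does the sentence explain why performance happened?"""
    return "because" in sentence.lower()

def layer2_classify(sentence):
    """Layer 2: (contributor|detractor, internal|external)."""
    s = sentence.lower()
    direction = "detractor" if ("detracted" in s or "poorly" in s) else "contributor"
    source = "internal" if ("selection" in s or "overweight" in s) else "external"
    return (direction, source)

def classify(sentence):
    """Chain the layers: layer 2 only runs if layer 1 finds attribution."""
    if not layer1_has_attribution(sentence):
        return None  # no attribution information in the sentence
    return layer2_classify(sentence)
```

On the transcript's own examples, "we did very well in the industrial sector" stops at layer 1, while the "because of individual stock selection" version reaches layer 2 and is tagged as an internal contributor.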

[00:17:03] So just to define the term before we get into the findings a little bit more: we're talking really about self-attribution bias here, which is something that's obviously very common among money managers, including myself. Can you define what that is?

[00:17:14] Yeah, so self-attribution bias is a very, very classical concept in psychology studies. It refers to a scenario where, you know, individuals tend to attribute successes to themselves and attribute failures to some external factors.

[00:17:31] So in behavioral economics, basically the theory says that, you know, if a trader has self-attribution bias, then he or she, you know, cannot objectively update his or her beliefs based on past

[00:17:47] performance. So basically, when you observe your past performance, you're going to try to learn your skill based on that observation. And if you are successful

[00:17:57] and you have the attribution bias, you tend to believe that it's because of your skill. But if you are unsuccessful, you know, and you have the attribution bias, you tend to believe that it's just some kind of noise, right? So because of that, you're going to develop something called overconfidence over time.

[00:18:17] So you're going to believe that you are actually better than you are. So you're going to, you know, trade more aggressively, you're going to do some excess trading, you may take excess risk, which, you know, negatively affects your future performance. That's the idea of self-attribution bias.

[00:18:34] So the idea is, if I outperform, I'm a genius; if I underperform, I'm going to try to find some factor that led me to underperform that has nothing to do with me. Yeah, pretty much like that, yeah. Well, it's just a cognitive bias. I think it's not a good thing.

[00:18:47] You know, it's a very common bias in different groups of people. You talked about this a little bit already, but can you give some more examples of the type of statement that would be on both sides of that? Like, obviously, you know, if I underperform, I might try to blame a sector or something like that, or what's going on in the overall market. Could you explain some of the types of statements you looked for in terms of determining whether the person was taking, you know, responsibility for it or whether they were blaming something else?

[00:19:14] Yeah, so it's actually very interesting. So, you know, when a fund outperforms, or has, you know, something contribute to the performance, the fund manager typically tries to say that

[00:19:27] it's because "we are different," so it's, like, some type of benchmark deviation. So they try to convince you that, for example, "we did well in the sector because we have some benchmark deviation, or because we have some stock selection in the sector."

[00:19:44] However, you know, when they lose money, managers always try to say the opposite: it's because of non-fund-specific factors. So for example, if you are a value fund, right, you can say, "we did, you know, relatively poorly in the past

[00:20:04] because, you know, value stocks underperformed growth stocks in the past reporting period," right? So they try to convince the audience that it's not just them; every fund, if it's a value fund, pretty much suffered in the same way.

[00:20:19] So that's the idea of internal, you know, attribution versus external attribution.

[00:20:25] I'm just curious, before I hand it back to Matt, just one other thing: do you look at these personally to try to evaluate it? So the model looks at all the statements in one of these

[00:20:34] and it makes a determination as to, you know, whether the person is taking responsibility or whether they're blaming other factors. As you're testing it, do you read some of these yourself and say, do I think it came up with the right results? Is that part of the process? Yes, we have a test set, basically.

[00:20:49] You know, we were able to construct a test set, right? So basically we manually labeled those sentences into categories, and we used the test set to validate the performance of the model.

[00:21:05] And on average my model is able to achieve an accuracy of 89%. So yes, it's very, you know, amazing. Actually, before I did this project, I didn't expect the model to be able to achieve such high accuracy, but, you know, that's the power of large language models.

[00:21:25] Does that make you susceptible to a self-attribution bias in your confidence in models?

[00:21:30] Well, no, you know, it's not because of me; it's because of the people who developed the large language model. I'm just the one who, you know, kind of fine-tuned the model.

[00:21:41] Okay, we'll read your disclosures, so let's see. Fair. So one thing that really caught me in this, that really jumped out, because when we're doing due diligence and things like this on managers, right, we're always looking for things to anchor on, another bias in another way.

[00:22:00] And in the paper you gave us this self-attribution score. Could you explain the score and how you think it's useful?

[00:22:08] Right, so basically my self-attribution score is the discrepancy in, you know, how you attribute your performance contributors and how you attribute your performance detractors. So, for example,

[00:22:20] in your disclosure, when you talk about what contributed to your performance, maybe, like, 80% of the discussion is internal factors and 20% is external factors. But in contrast, when you talk about what detracted from the fund performance, then maybe 40% of the discussion is internal and the remaining 60% is external. So in that case, basically, you see that there is a discrepancy

[00:22:50] in how you attribute performance contributors and how you attribute performance detractors, and the self-attribution score is based on that discrepancy. So ultimately, if you have a higher self-attribution score, that implies that you are more likely to suffer from self-attribution bias, because you're more likely to talk about yourself when you talk about performance contributors, and you are more likely to talk about, you know, external factors when you talk about performance detractors.
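Using the illustrative numbers above (contributors 80% internal, detractors 40% internal), the discrepancy idea can be sketched as follows; the paper's exact formula may differ, and this simply takes the gap between the two internal shares:

```python
# Discrepancy sketch: internal share of contributors minus internal
# share of detractors. A larger positive gap means contributors skew
# internal and detractors skew external, i.e. stronger self-attribution.
def self_attribution_score(internal_contrib_share, internal_detract_share):
    """Gap between the internal shares of contributors and detractors."""
    return internal_contrib_share - internal_detract_share

score = self_attribution_score(0.80, 0.40)
```

A manager who attributes contributors and detractors internally at the same rate would score zero under this sketch.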

[00:23:20] The follow-up question to that, as a skeptic of these things too (not necessarily of you, well, of all things): would managers hack this score? Like, is this one of those things that, if people were aware of it, they would start to game? Like, it would be a bad thing if people got better at this?

[00:23:39] Yeah, that's a very good question. So actually, you know, another hypothesis I made in the paper is that maybe, you know, mutual fund managers just try to strategically do these things, because they try to convince their investors they are good.

[00:23:54] So it's not necessarily that they truly have the self-attribution bias, but rather that they try to convince their investors they are good.

[00:24:02] But, you know, the findings of the paper suggest that the attribution bias in the shareholder reports is more likely to stem from a cognitive bias rather than strategic signaling, because I found that the variation in the attribution, you know, information is correlated with managers' future behavior in a way that is predicted by

[00:24:31] behavioral economics theories. So if it were just a simple strategic signal, then it would be very hard to explain such relationships between manager behavior and attribution information.

[00:24:48] Yeah, so that's pretty much, yeah, that's basically why I claim it is a cognitive bias and not strategic signaling.

[00:24:59] Which I think is a huge part, and that was one of the insights of the paper, because of that differential between the cognitive bias and the signaling value and how strategic it is.

[00:25:11] It was really interesting that, correct me if I'm wrong, the paper basically parses out, like, no, there's actually something going on here; it's not just that everybody's a brilliant marketer in this space.

[00:25:23] Yeah, because actually, you know, before I did that analysis, my initial conjecture was that, you know, they just try to strategically do this in the shareholder report, right? Right, the skeptic in me, that's where my brain goes first, right? But, you know, it turned out that, at least, my findings suggest that it's more likely to be a cognitive bias and not strategic signaling behavior.

[00:25:49] Yeah so that's a very interesting thing I think yeah.

[00:25:54] Any other ways that you see this score being useful? So specifically the self-attribution score: like, is this something I should be bringing up to the director of research at my RIA and saying we should be applying this when we parse manager reports?

[00:26:10] Yeah, so I think it's useful to many people, right? So, for example, as a mutual fund investor, you should carefully look at how your manager explains their past performance.

[00:26:21] So you can sort of use this self-attribution score to see if your manager has a strong self-attribution bias, because one of the findings of my paper is that when a manager exhibits stronger self-attribution bias, he is more likely to

[00:26:39] have worse performance in the subsequent period, right? So as a mutual fund investor, when you find your manager has a very strong self-attribution bias, maybe you should, you know, just pull your money from this manager.

[00:26:54] And it's also important to mutual fund families, right? As, you know, the governors of the mutual funds, they should also oversee the mutual fund managers to see if they have any attribution bias. And if the bias negatively affects their future performance, then this is a very important issue.

[00:27:15] And maybe, instead of a point-in-time question, maybe this is a point-over-time question, and relative to that, the predictive value gets really interesting.

[00:27:24] Did you track at all, in the paper or elsewhere, like, streaks? Like, somebody's on a streak, they're getting really, really lucky, a quarter, a year, a multi-year period in a row. How does

[00:27:38] the self-attribution score, like, evolve over a period of time, and is it impacted by streaks?

[00:27:44] Yeah, this is actually a very good point. So, well, basically, you know, the theories say that when you have a series of successful investing outcomes, then you're more likely to be biased, because those successful investing outcomes can, you know, reinforce your belief that you actually have skill.

[00:28:07] So in my paper I basically find something very similar. I find that when a manager has, you know, very strong past investment outcomes, the manager is more likely to have a stronger self-attribution bias. So yeah, that's something that's consistent with your argument.

[00:28:28] And if Michael Lewis shows up at the end, you should be really concerned. Is that the other tail here?

[00:28:34] Well yes I don't know we can maybe.

[00:28:41] Let's get to major conclusions. So you write this paper, it's out there in the world, or it's on its way out into the world in full. What do you think the biggest takeaways are for either regular investors or professional allocators?

[00:28:58] Right, so I think the biggest takeaway is actually, you know, I analyze the textual content, right, which is something that people

[00:29:07] have rarely analyzed in the past in those mutual fund disclosures. So I think, with the development of large language models, in the future we can actually, you know, do more analysis based on

[00:29:22] that textual content from the fund managers. We can really know, you know, what kind of people these guys are, right? Because eventually, in finance, you know,

[00:29:34] we give money to the mutual fund manager. We try to get a sense of how good your manager is, right? But not just based on his skills; you also try to know

[00:29:48] what kind of person, what kind of people, those managers are, right? So using those large language models and that textual content, you can actually do some analysis to get some insights from it,

[00:30:03] and those can really help investors to better understand their fund managers.

[00:30:10] You know, it's another good example. Investors pay a lot of attention to performance, and that's what, you know, most people are looking at. But I think, you know, research like this, studies like this, can kind of go way beyond

[00:30:23] the performance of a fund and get at, to your point, you know, who these people are. Are they being honest with their shareholders? Have their communication methods been consistent over time?

[00:30:33] And how that kind of plays into, you know, the way that they

[00:30:37] view managing money for people and trying to accomplish what they're accomplishing.

[00:30:43] You know, congrats on a very interesting paper. I think, to your point, there are probably many ways that this could maybe even be commercially

[00:30:53] utilized, possibly. So if that happens, we'll be happy that we had you on as a guest when,

[00:31:01] you know, maybe this turns into a business for you at some point, someday, somewhere. Yeah.

[00:31:06] So just kind of pivoting to a couple other pieces of research that you've worked on you had a paper before this one where you used AI in the title that paper was visual information in the age of AI evidence from.

[00:31:23] Corporate Executive Presentations." So can you maybe talk a little bit about what you were studying in that paper?

[00:31:32] Yeah, so in that paper we study corporate presentation slides. Basically, we use deep learning, we use AI, to extract the graphic information from those presentation slides, and we find that certain types of graphic information

[00:31:52] can actually help investors better understand the business operations. We find that certain types of graphic information can help predict

[00:32:01] short-term announcement returns and also longer-run cash flows.

[00:32:06] So when we started doing that project, we thought the idea was very interesting, because if you look at those presentation slides, you will see that, first, instead of having a lot of textual content in the slides,

[00:32:21] very often a slide could be just a picture and then a couple of keywords. So there is a lot of visual information in the slides. But when we started the project, text analysis was really hot, while on visual analysis there were just a few studies on how to do graphic analysis.

[00:32:42] So I thought it could be really cool if we could do some visual analysis, or graphic analysis, on those data.

[00:32:50] And second, more importantly, the information in those graphics, in those images, is actually quite unique, because, for example, corporate executives will sometimes include a picture of a future product. It could be a product design, or it could be a blueprint.

[00:33:12] Before the presentation, you can rarely see that information anywhere else in the market. You cannot see it in other disclosures, in the 10-K, or in other data sources.

[00:33:26] So the information there could be very unique, and it could be new to the market. We conjectured that the market might respond to that information, so we wanted to dig deeper into those images and see if we could get any interesting findings.

[00:33:44] As for the main conclusions of the paper, we basically find that if you have a picture containing forward-looking operational information, then that type of picture can help predict returns.

[00:34:01] For example, if you show me a picture of your future product, a picture of the product design, or a picture of something to be finished in the future, then we find that in the short window around the presentation, the market responds very strongly to that information.

[00:34:21] And we also find that, in the long run, the cash flow is also associated with the variation in that information, which we interpret as the channel, or the mechanism, for why that graphic information can predict stock returns. So those are basically the two main findings of the paper.

[00:34:41] Yeah, I'm thinking of something like, I don't know, Facebook and their AR glasses. I don't know how big of a business line that's going to be for them, or how impactful it will be on their financials, but that might be an example where, in their executive presentations, their annual reports, or whatever

[00:35:04] they post online, a system like this could go in, pull all those images, and then correlate them with

[00:35:13] the type of new product development and initiatives that they're going to have, and then relate that to future profitability and, I guess, cash flows. Is that kind of a good example of what you were looking at there? Yeah, exactly, that was our conjecture here. Before we did the analysis,

[00:35:34] that's exactly what we thought: sometimes, if you have visual information about a future product, the market might respond to that type of information.

[00:35:44] And we have another very interesting finding: if you just show a picture of an existing product, a picture of backward-looking information, or a picture of something irrelevant, then it seems like the market doesn't respond to that type of information.

[00:36:03] Yeah, I think with all these things, it's about using AI to try to find interesting places where an investor might be able to get a little bit of an edge, and trying to uncover different things. So yeah, very, very interesting stuff. What other pieces of research are you working on? Anything

[00:36:32] New and exciting.

[00:36:35] that you'd like to share?

[00:36:37] Yeah, so I'm actually very interested in how we can use large language models to explore possibilities in finance content. I have another ongoing project where we use a large language model to try to differentiate managers' intent

[00:37:01] when they talk about ESG in their earnings calls. Sometimes firms talk about ESG because they want to do some so-called greenwashing, just to make themselves look better, but sometimes, when firms talk about ESG in their earnings calls, they're going to actually do some real ESG investment in the future.

[00:37:29] So basically, using a large language model, we try to differentiate between these two scenarios. The intuition here is that if you really want to do some ESG investment, then instead of just generally talking about ESG, you're going to have some specific details in your discussion, and we can rely on those details to infer that you're going to do some real ESG investment.
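[Editor's note: as a rough illustration of the "specificity" intuition Meng describes, here is a toy Python sketch. The actual study uses a large language model; this regex-based detail counter, with a hypothetical list of detail patterns, is only meant to make the idea concrete, not to reproduce the paper's method.]

```python
import re

# Toy proxy for the intuition above: ESG talk backed by real investment
# plans tends to contain concrete details (dollar amounts, target years,
# percentage goals), while greenwashing talk stays generic. The pattern
# list below is a hypothetical illustration, not the study's classifier.
DETAIL_PATTERNS = [
    r"\$\d",               # dollar amounts, e.g. "$50 million"
    r"\b\d{4}\b",          # target years, e.g. "by 2030"
    r"\b\d+(\.\d+)?%",     # percentage targets, e.g. "30%"
]

def esg_specificity(text: str) -> int:
    """Count concrete-detail markers in an ESG discussion."""
    return sum(len(re.findall(p, text)) for p in DETAIL_PATTERNS)

generic = "We are deeply committed to sustainability and our communities."
specific = "We will invest $50 million to cut emissions 30% by 2030."
print(esg_specificity(generic), esg_specificity(specific))  # → 0 3
```

An LLM-based pipeline would replace the regex count with a prompt asking the model whether the discussion contains actionable specifics, but the labeling target is the same.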

[00:37:58] We find some preliminary results showing that our model can actually differentiate firms into those that want to do real ESG investment and those that just want to do some greenwashing, and we find that the firms labeled by our model as doing real ESG investment actually invest more and spend more effort on ESG in the future. So that's our preliminary finding, and of course we are doing some

[00:38:28] robustness checks now; we are trying to see if our findings are driven by other factors or some other noise. Related to that: one of my business partners, Michael Pompian, has written a lot on behavioral finance, behavioral psychology, and investor behavior over the last five, ten, twenty years, and when he read your paper, he said

[00:38:55] this changes the whole landscape of what's available. A lot of the behaviorally focused work that was done previously couldn't really be tested against data sets like mutual fund filings, the way you did, or now SEC filings and whatever else you're testing for this ESG research, which is amazing to be able to suss out.

[00:39:21] Do you see it this way: take a bunch of the, I won't say old problems, but biases and things like that that were already defined, like self-attribution bias? Is the big opportunity to take those, use LLM and GPT processing, and

[00:39:39] re-vet them with greater context and more robustness in the data source?

[00:39:44] Yes, I think that's the beauty of large language models: we can use them to explore unlimited possibilities. There is definitely huge potential in applying large language models to do a lot of this, and self-attribution bias is just one example.

[00:40:08] Given any textual data, we can do a lot of analysis with it, so self-attribution is just one example; we can definitely do more analysis with large language models. In the past, when people tried to identify self-attribution bias, they tended to use, for example, fund characteristics or manager characteristics to identify it.

[00:40:36] For example, people used gender, they used portfolio concentration, and they used idiosyncratic risk to proxy for self-attribution bias. But those measures are quite noisy. In a mutual fund setting, we know that gender, for example, is not something we can really test; it cannot be an outcome variable. And idiosyncratic risk can be driven by a lot of effects.

[00:41:05] So we cannot really use idiosyncratic risk to accurately proxy for self-attribution bias. But in this setting, when we look at how mutual fund managers attribute their past performance, we can use textual analysis to directly measure it. So that's the beauty of this textual part.
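[Editor's note: to make the classification task concrete, here is a toy Python sketch of labeling attribution statements as internal (skill) or external (environment). The study itself uses a large language model on fund shareholder reports; this keyword heuristic, with hypothetical cue lists, only illustrates the labeling scheme it directly measures.]

```python
# Toy sketch of the attribution-labeling task: does a sentence credit
# the manager's own skill, or point to outside factors? The cue lists
# are hypothetical; the actual study prompts an LLM instead.
SKILL_CUES = ["our stock selection", "our process", "our analysis", "our positioning"]
EXTERNAL_CUES = ["market volatility", "macro headwinds", "rising rates", "sector rotation"]

def classify_attribution(sentence: str) -> str:
    """Label a performance-attribution sentence as 'internal' (skill),
    'external' (luck/environment), or 'neutral'."""
    s = sentence.lower()
    if any(cue in s for cue in SKILL_CUES):
        return "internal"
    if any(cue in s for cue in EXTERNAL_CUES):
        return "external"
    return "neutral"

def self_attribution_score(sentences: list[str], outperformed: bool) -> int:
    """Count statements consistent with self-attribution bias: claiming
    credit when outperforming, blaming outside factors when not."""
    labels = [classify_attribution(s) for s in sentences]
    return labels.count("internal") if outperformed else labels.count("external")

report = [
    "Our stock selection in technology drove returns this quarter.",
    "Market volatility weighed on results across the portfolio.",
]
print(self_attribution_score(report, outperformed=True))  # → 1
```

The advantage of an LLM over a cue list like this is exactly what Meng notes: it can read arbitrary phrasing and measure the attribution directly rather than through noisy proxies.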

[00:41:26] Yeah, highlight that, rewind and replay that last bit there. I think that's so big and so important: we can test these things in ways that we just couldn't 20 years ago. That's why your work is so cool.

[00:41:41] Absolutely. I have another follow-up question for you on this.

[00:41:48] Is English your first language?

[00:41:50] Not really.

[00:41:51] Yeah, not really.

[00:41:54] Okay, so, no.

[00:41:56] I couldn't help but think about this, just talking to you and reading this too. Ocean Vuong is one of my favorite fiction writers right now, fiction and poetry, of Vietnamese descent.

[00:42:08] English is not his first language, but his command of the English language is more profound, more poetic, than most native English speakers'.

[00:42:15] I'm really curious, with this work in large language models, these systems: is English not being your first language an advantage to you in being able to do this work?

[00:42:27] Like, in what kind of sense? I feel like your awareness of pattern matching might be a little bit different across languages.

[00:42:40] Do you think that that's an advantage at all? Would you say that? Yeah, maybe, because when I read those shareholder reports, it's very natural to me that I want to understand the meaning of those things. So maybe, as a non-native English speaker,

[00:42:56] I try to spend more time and effort trying to understand that information, and that's why I came up with this idea to better

[00:43:06] classify that information from the textual content.

[00:43:11] I think that's a really powerful point, a really cool detail. I don't think everybody would notice what's going on grammatically the way you've noticed it in these studies.

[00:43:21] Okay, yes, that's very interesting.

[00:43:24] Well, Meng, thank you very much for joining us. We're going to put links to both of these research papers in the description, and we're looking forward to

[00:43:33] continuing to follow you and your interesting research. So thank you so much, and best of luck.

[00:43:41] Thank you. It's my pleasure. Thank you.

[00:43:45] This is Justin again. Thanks so much for tuning into this episode of Excess Returns.

[00:43:49] You can follow Jack on Twitter at @practicalquant and follow me on Twitter at @jjcarbonneau.

[00:43:55] If you found this discussion interesting and valuable, please subscribe on either iTunes or YouTube, or leave a review or a comment. We appreciate it.

[00:44:04] Justin Carbonneau and Jack Forehand are principals of Validea Capital Management.

[00:44:07] The opinions expressed in this podcast do not necessarily reflect the opinions of Validea Capital.

[00:44:11] No information on this podcast should be construed as investment advice.

[00:44:14] Securities discussed in the podcast may be holdings of clients of Validea Capital.