In The Age of Prediction: Algorithms, AI, and the Shifting Shadows of Risk, Igor Tulchinsky, founder of the quantitative hedge fund WorldQuant, and Christopher Mason, a geneticist and computational biologist, ask how artificial intelligence, data, and prediction will evolve in finance, medicine, crime fighting, elections, and other disparate areas, and where complex solutions and problems overlap. In an excerpt from Chapter 3, aptly titled “The Quantasaurus,” Tulchinsky and Mason, director of the WorldQuant Initiative for Quantitative Prediction, write that exponential growth in data is required to keep prediction improving, in the markets as in the immune system. But finance and medicine, to take just two sectors, have very different data sets, limitations, and stakes.
Excerpted from Chapter 3, The Quantasaurus
In 1995, Igor took a job as a portfolio manager at a New York City hedge fund firm. At the time, mathematically sophisticated, academically trained professionals were popping up on Wall Street, and “quant finance” was emerging. Though markets often appeared to be chaotic, research was turning up persistent regularities. Some were stronger and more persistent than others. Igor built his first algorithm and over the next decade evolved his own set of ideas about the most effective way to use his “alphas” to try to predict prices. Much of this work involved scaling up the number of alphas into what he began to call an “alpha factory.” In 2007, he formed his own firm, WorldQuant.
Igor recognized that market regularities behave much like the evolving microbes our immune system keeps learning about: no regularity lasts forever, so there is a constant need for new alphas that identify new market signals derived from new data sets and new insights. Over time, he developed what he calls “tenets of prediction,” which lay out a kind of mathematical circularity among models, data, and predictive accuracy. He found that with his strategy the accuracy of models tended to rise with the number of algorithmic models, but with diminishing returns: 100 times more models produce, in theory, 10 times better predictions.
WorldQuant is in the business of trying to predict stock prices and market trends. Every model or formula attempts to predict returns from a very specific piece of information, or from a very specific way of looking at that information. For example, WorldQuant has several hundred thousand alphas built on relatively simple, publicly available price and volume data. One alpha may look at five-day returns, another at one-day returns, yet another at returns relative to similar stocks. Each way of looking at the data yields a slightly different prediction, which in turn interacts with other predictions. Broadly speaking, Igor concluded that when combined predictions are uncorrelated, their power increases with the square root of the number of predictions. When they are correlated, the increase is less steep, typically only logarithmic in the number of alphas. (Logarithms are an essential shorthand for dealing with exponential growth.) Put another way, predictive power grows linearly with the logarithm of the number of alphas, so an exponential increase in alphas produces a steady, linear improvement. That is why Igor believes it is an advantage to have millions of alphas. Combined, they can reveal a signal that is much stronger than any of its components and that cannot be described by a simple formula.
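A toy calculation makes the square-root claim concrete. The sketch below assumes a deliberately simple model that is ours, not WorldQuant’s: every alpha carries the same small true signal plus noise of equal size, and every pair of alphas has the same noise correlation rho. Averaging N such alphas then gives a signal-to-noise ratio of snr_1 * sqrt(N / (1 + (N - 1) * rho)), which grows like sqrt(N) when rho is 0 and levels off near snr_1 / sqrt(rho) when rho is positive. The single-alpha ratio of 0.05 is a hypothetical figure.

```python
import math


def combined_snr(snr_single: float, n_alphas: int, rho: float) -> float:
    """Signal-to-noise ratio of the equal-weight average of n_alphas alphas.

    Toy model (an assumption, not WorldQuant's): every alpha carries the same
    true signal plus noise of equal variance, and every pair of noise terms has
    correlation rho.
    """
    if n_alphas < 1:
        raise ValueError("need at least one alpha")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Variance of the averaged noise relative to one alpha's noise variance:
    # (1 + (N - 1) * rho) / N, which is 1/N when the alphas are uncorrelated.
    noise_var_ratio = (1.0 + (n_alphas - 1) * rho) / n_alphas
    return snr_single / math.sqrt(noise_var_ratio)


if __name__ == "__main__":
    weak = 0.05  # hypothetical signal-to-noise ratio of a single weak alpha
    for n in (1, 100, 10_000):
        uncorrelated = combined_snr(weak, n, rho=0.0)
        correlated = combined_snr(weak, n, rho=0.1)
        print(f"{n:>6} alphas: rho=0 -> {uncorrelated:.3f}, rho=0.1 -> {correlated:.3f}")
```

With rho = 0, going from 1 to 100 alphas multiplies the ratio by 10, the “100 times more models, 10 times better” example. With rho = 0.1 the gain stalls near 0.05 / sqrt(0.1), roughly 0.158. This toy model shows why correlation flattens the gain; it does not reproduce the logarithmic regime the authors describe.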
This is a complex endeavor in practice. To make predictions from different data sets, you can mix the data first and then predict, or you can build predictions from alphas derived from each data set separately. The latter is much simpler, especially at the scale of thousands of data sets and millions of predictions. In a strategy like WorldQuant’s, it is not the amount of data but the number of quality models built from it that ultimately drives prediction. The raw number of models, however, is proportional to the number of useful fields in a given data set. Prediction strength, in turn, is proportional to the logarithm of the volume of quality data, which explains why data must grow exponentially to keep prediction improving linearly. Quantity is necessary to win the prediction war in the markets, in the immune system, and in any predictive system.
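The arithmetic behind “exponential data for linear improvement” fits in a few lines. The constants k and v0 below are hypothetical scale factors, and the base-10 logarithm is an arbitrary choice (another base only rescales k); the point is that, under this assumed model, each additional unit of strength costs ten times more data.

```python
import math


def prediction_strength(volume: float, k: float = 1.0, v0: float = 1.0) -> float:
    """Toy model (an assumption): strength is k times log10 of data volume over a baseline v0."""
    return k * math.log10(volume / v0)


def volume_needed(strength: float, k: float = 1.0, v0: float = 1.0) -> float:
    """Invert the toy model: how much quality data is needed to reach a target strength."""
    return v0 * 10 ** (strength / k)


if __name__ == "__main__":
    # Each extra unit of strength (a linear gain) needs ten times more data.
    for target in (1, 2, 3, 4):
        print(f"strength {target}: needs {volume_needed(target):,.0f} units of data")
```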
But this raises the question: why do signals come and go? First, markets are fluid and prone to continual, often random change. That fluidity isn’t just a metaphor. Quants borrow a term from physics used to describe turbulent, nonlinear flow: stochastic, meaning the flow is somewhat random and can therefore only be analyzed statistically. Markets are clearly nonlinear, and a prediction is almost never a black-and-white binary call but rather a probability. Second, profitable strategies attract imitators, who may flatten that probability and the associated profits, a remorseless process in financial markets known as arbitrage. This is why Igor uses the metaphor of a “prediction war” and why acquiring data is effectively an arms race.
That arbitrage often comes from rivals. Few really good ideas go unnoticed, and if enough professional traders, ceaselessly hunting for mispricings, show up at the same place at the same time, a strong signal gets smothered and everyone leaves disappointed, like diners who all descend on the same good restaurant at the same hour. More subtly, an organization can find itself inadvertently moving in sync, like sheep blithely following the flock over a cliff. Algorithms may contain biases in approach that their developers never recognize. Such a bias can create inadvertent correlations among seemingly diverse strategies, spawning a crowded trade; if the unconscious herding is large enough, it can cause serious losses that spill across the market and into the larger economy. Specific strategies can suddenly stop generating profits, or entire markets can suddenly decline. These biases are psychologically difficult to recognize or eliminate, and they may be deepened by the overconfident belief that you are beyond bias.
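The crowding mechanism can be illustrated with a toy simulation of our own devising, not a model from the book. Each hypothetical strategy’s daily return is a small loading on one hidden factor, standing in for a bias its developer never noticed, plus independent noise. Two strategies that each load 0.5 on the factor should show a correlation near 0.5 × 0.5 / (0.5² + 1) = 0.2, and in a book of many such strategies the hidden factor comes to dominate total risk.

```python
import math
import random


def simulate_returns(loading: float, factor: list[float], rng: random.Random) -> list[float]:
    """Daily returns of one hypothetical strategy: loading * hidden factor + its own noise."""
    return [loading * f + rng.gauss(0.0, 1.0) for f in factor]


def pearson(x: list[float], y: list[float]) -> float:
    """Sample Pearson correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def factor_share_of_variance(n_strategies: int, loading: float, idio_sd: float = 1.0) -> float:
    """Share of an equal-weight book's variance that comes from the shared hidden factor.

    Assumes a unit-variance factor and independent idiosyncratic noise per strategy.
    """
    factor_var = (n_strategies * loading) ** 2
    idio_var = n_strategies * idio_sd ** 2
    return factor_var / (factor_var + idio_var)


if __name__ == "__main__":
    rng = random.Random(7)
    factor = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # the unnoticed common bias
    a = simulate_returns(0.5, factor, rng)
    b = simulate_returns(0.5, factor, rng)
    print(f"correlation of two 'independent' strategies: {pearson(a, b):.3f}")
    print(f"factor share of a 50-strategy book: {factor_share_of_variance(50, 0.5):.3f}")
```

With 50 strategies each loading 0.5, about 93 percent of the book’s variance comes from the one shared factor, even though each strategy on its own gets only 20 percent of its variance from it.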
Idea Arbitrage
Let’s continue looking at one of the big ideas that shaped modern portfolio theory (MPT): arbitrage. University of Chicago economists had long argued that the ceaseless activity of self-interested professional investors made the market essentially efficient and rational. These investors are skilled consumers of financial information, in contrast to what the economist and proto-quant Fischer Black, one of the creators of the Black–Scholes options pricing model, called “noise” traders, who act irrationally or emotionally. They continually search for market irregularities: mispriced securities, such as two securities with the same underlying asset trading at different prices on different exchanges. These traders engage in arbitrage, for example by buying, or going long, an underpriced security and selling, or going short, an overpriced one they bet will fall. Arbitrage is a fundamental aspect of trading, but devotees of the Chicago school viewed it even more broadly: as the engine driving the market toward accurate prices, with price discovery itself a form of prediction. In their view, arbitrage renders financial markets efficient (quick to react to changes in information) and rational (driving a security’s price to its intrinsic value, or true equilibrium price).
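A cross-exchange arbitrage of the kind described above reduces to simple arithmetic. The sketch assumes both legs fill instantly at the quoted prices and uses a hypothetical flat per-share fee; prices are in integer cents to avoid floating-point rounding. Real arbitrage also faces latency, slippage, and capital costs that it ignores.

```python
def arbitrage_profit_cents(price_a: int, price_b: int, shares: int, fee_per_share: int = 0) -> int:
    """Net profit, in cents, from buying on the cheaper venue and selling on the dearer one.

    Assumes both legs fill instantly at the quoted prices (a simplification)
    and that each leg pays the same flat per-share fee. Returns 0 when the
    price gap does not cover the fees, i.e. when there is nothing to arbitrage.
    """
    gap = abs(price_a - price_b)
    net_per_share = gap - 2 * fee_per_share  # one fee on the buy, one on the sell
    return max(net_per_share, 0) * shares


def cheaper_venue(price_a: int, price_b: int) -> str:
    """Which venue to buy on (the other is where you sell)."""
    return "A" if price_a <= price_b else "B"


if __name__ == "__main__":
    print(cheaper_venue(10_000, 10_010), arbitrage_profit_cents(10_000, 10_010, 1_000, fee_per_share=1))
```

Buying 1,000 shares at $100.00 on one exchange and selling at $100.10 on another, with a 1-cent fee per share on each leg, nets (10 - 2) × 1,000 = 8,000 cents, or $80. A 1-cent gap would not cover the fees, so the function returns 0. As arbitrageurs repeat such trades, the two prices converge, which is the price discovery the Chicago school had in mind.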
The power and centrality of arbitrage have always provoked debate. But the essence of arbitrage is simply the power of competition to produce a winner. Darwinian evolution is a competition as well, and its survivors represent not greater perfection but expedient adjustment to current conditions; markets, likewise, are always about current or future conditions. Gather enough agents competing freely and you may well generate better results than a lone trader could, even if that competition isn’t always efficient, rational, or accurate. Igor’s alpha factory was designed to be flexible and expedient and to aggregate many different signals.
In 2018, Igor started the company WorldQuant Predictive to apply the same ideas beyond finance and to commercialize prediction in other industries. The company used the same basic approach of iterative testing and competition, “idea arbitrage,” as WorldQuant. To predict more effectively, you need algorithms that approach the problem from many different directions. To do that, you need model builders who think creatively and data sets that offer as many different perspectives on a problem as possible. Scale is essential. Through iterative backtesting, the ideas embodied in algorithms compete with one another. Arbitrage occurs, and the best-performing ideas emerge. All this was baked into a formula: ideate (develop ideas), arbitrate (the verb form of arbitrage), and predict. The same process works for aggregating cancer mutation data, drug combinations, and patient outcomes, or even for tracking a novel infection; the models continually need to be improved and adjusted based on the latest data.
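The three verbs can be sketched as a toy backtest-and-select loop. Everything here is hypothetical and ours, not WorldQuant Predictive’s process: the “ideas” are one-feature threshold rules, the synthetic data are built so that only feature 0 carries signal, arbitration ranks rules by hit rate on past data, and prediction is a majority vote of the survivors, checked on a held-out later sample.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical 'idea': predict 1 (up) when direction * (x[feature] - threshold) > 0."""
    feature: int
    threshold: float
    direction: int  # +1 or -1

    def predict(self, x: tuple) -> int:
        return 1 if self.direction * (x[self.feature] - self.threshold) > 0 else 0


def ideate(n_features: int, thresholds=(-0.5, 0.0, 0.5)) -> list:
    """Ideate: enumerate many simple candidate rules."""
    return [Rule(j, t, d) for j in range(n_features) for t in thresholds for d in (1, -1)]


def hit_rate(rule: Rule, data: list) -> float:
    """Fraction of (x, y) pairs the rule gets right."""
    return sum(rule.predict(x) == y for x, y in data) / len(data)


def arbitrate(rules: list, history: list, keep: int) -> list:
    """Arbitrate: backtest every rule on past data and keep the best performers."""
    return sorted(rules, key=lambda r: hit_rate(r, history), reverse=True)[:keep]


def predict(survivors: list, x: tuple) -> int:
    """Predict: majority vote of the surviving rules."""
    votes = sum(r.predict(x) for r in survivors)
    return 1 if 2 * votes > len(survivors) else 0


def make_data(n: int, rng: random.Random) -> list:
    """Synthetic (x, y) pairs; by construction only feature 0 carries signal."""
    rows = []
    for _ in range(n):
        x = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        y = 1 if x[0] + rng.gauss(0.0, 0.2) > 0 else 0
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    history, future = make_data(500, rng), make_data(500, rng)
    survivors = arbitrate(ideate(n_features=3), history, keep=3)
    accuracy = sum(predict(survivors, x) == y for x, y in future) / len(future)
    print("best rule:", survivors[0])
    print(f"held-out accuracy of the vote: {accuracy:.3f}")
```

On this synthetic data the top-ranked rule is the one keyed to feature 0, and the survivors’ vote holds up on the held-out sample. Ranking many ideas on the same history invites overfitting, which is why the separate out-of-sample check matters.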
Again, Igor’s strategy of amassing large numbers of alphas is just one of many processes that have emerged in the race to make more accurate predictions. The operations of any of these prediction businesses combine human talent, powerful machines, and fire hoses of data. Moving beyond finance to a more diverse set of industries, which is what WorldQuant Predictive and a number of other companies are doing, is not necessarily easy. Health care, transportation, insurance, telecommunications, and consumer products are very different from finance, with its rich historical and market-pricing data, high transaction volumes, fixed numbers of instruments, and many, if weaker, signals. Each of these industries poses different problems, operates with different dynamics, requires different data sets, and has different limitations. And it remains an open question how many alphas or signals are needed to build a strong-enough library of predictive algorithms for each one. We learn by doing.
An algorithm, for instance, may be built around the predictive notion that a tech stock that goes up three days in a row is likely to go up on the fourth. That signal may be relatively weak, though: it may be true some of the time, but you wouldn’t necessarily want to bet big on it. But if you get hundreds or thousands of such signals operating together, you might be able to build a model that is right 55 percent of the time, which may be all you need as an investor.
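How do many weak signals add up? One classic, idealized answer is a majority vote of independent signals, which can be computed exactly from the binomial distribution. The 51 percent per-signal accuracy below is a hypothetical figure, and independence is a strong assumption: as the discussion of crowding and shared bias suggests, real signals are correlated, and their combined accuracy climbs far more slowly.

```python
import math


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a majority vote of n independent signals is right,
    when each signal is right with probability p.

    Independence is an idealizing assumption. n must be odd so there are no ties.
    Terms are computed in log space so large n does not overflow.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    need = n // 2 + 1  # smallest winning number of correct signals
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(need, n + 1):
        # log of C(n, k) * p**k * (1 - p)**(n - k), the binomial probability of k correct
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for n in (1, 101, 1_001, 10_001):
        print(f"{n:>6} signals at 51% each -> majority right {majority_accuracy(0.51, n):.3f}")
```

Accuracy starts at 51 percent with a single signal and rises as the number of independent signals grows; at exactly 50 percent per signal, the vote stays a coin flip no matter how many signals you add.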
Of course, 55 percent will not work particularly well if you’re making predictions about health outcomes or even trends in, say, grocery shopping. Take a hypothetical insight that could emerge from survey and supermarket data: “A mother with two children buys apple juice only on Wednesdays.” You probably wouldn’t want to bet much money on that. But you can use machine learning across many such predictions and “ensemble” them so that they are merged into a production model. (The term comes from statistics, where an ensemble is a probability distribution of the state of a system combined from several models.) The outputs of signals, which are themselves models, become inputs to even stronger models with better outputs. For example, to predict which species (out of the possible millions) is causing an infection, ensemble methods in genomics can combine the best features of various models to give the best prediction, as a 2017 paper from Chris’s lab, “Ensemble Approaches for Metagenomic Classifiers,” showed.
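A weighted vote is one of the simplest ways to ensemble classifiers and gives a feel for the metagenomics example. In this sketch, which is not the method from the 2017 paper, each hypothetical classifier names a species with a confidence, each has a trust weight, and the ensemble reports the species with the most weighted support, or “unclassified” if that support falls below a chosen share (min_share, a hypothetical parameter) of the total weight.

```python
from collections import defaultdict


def ensemble_call(
    calls: dict[str, tuple[str, float]],
    weights: dict[str, float],
    min_share: float = 0.5,
) -> str:
    """Weighted-vote ensemble over classifier calls.

    calls maps a (hypothetical) classifier name to (species, confidence in [0, 1]);
    weights maps a classifier name to how much we trust it (default 1.0).
    Returns the species with the most weighted support, or "unclassified" if
    that support is below min_share of the total weight.
    """
    if not calls:
        return "unclassified"
    support: dict[str, float] = defaultdict(float)
    for name, (species, confidence) in calls.items():
        support[species] += weights.get(name, 1.0) * confidence
    total_weight = sum(weights.get(name, 1.0) for name in calls)
    best = max(support, key=support.get)
    return best if support[best] / total_weight >= min_share else "unclassified"


if __name__ == "__main__":
    calls = {
        "classifier_a": ("Escherichia coli", 0.9),
        "classifier_b": ("Escherichia coli", 0.7),
        "classifier_c": ("Shigella flexneri", 0.8),
    }
    print(ensemble_call(calls, weights={}))                    # equal trust
    print(ensemble_call(calls, weights={"classifier_c": 3.0}))  # trust the dissenter more
```

With equal weights, two classifiers calling Escherichia coli outvote one calling Shigella flexneri. Tripling the weight of the dissenting classifier leaves no species with half the support, so the ensemble declines to make a call.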
An important benefit of the “ideate, arbitrate, predict” process and the ensemble method is that you can integrate signals from very disparate data sets. For instance, you might not think to combine weather data with credit card data, but you can build ensemble models from both to predict which products people might buy when the weather is nice rather than, say, cold and rainy. In influenza and now COVID-19 models, researchers in Chris’s lab and elsewhere integrate epidemiological data with weather data, transportation data, and data from proxies for social distancing, such as mobile phones. Predictions built this way can help discern whether an area is at high or low risk for transmission of a virus such as SARS-CoV-2, and the same models can also be applied to other infectious diseases.
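Integrating disparate data starts with aligning them on shared keys, here a (region, date) pair, and then scoring the joined record. The sketch below uses three hypothetical sources (case growth, temperature, and phone-derived mobility) and a logistic score with made-up coefficients; a real model would fit the coefficients to historical outcomes, and an ensemble of per-source models would be another reasonable design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionDay:
    """One region on one day, joined from three hypothetical data sources."""
    case_growth: float     # epidemiological data: week-over-week growth in reported cases
    temperature_c: float   # weather data: mean temperature, degrees Celsius
    mobility_index: float  # phone-derived mobility; 1.0 = pre-pandemic baseline


# Made-up coefficients for illustration only; a real model would fit them to outcomes.
WEIGHTS = {"case_growth": 3.0, "temperature_c": -0.05, "mobility_index": 2.0}
BIAS = -2.0


def join_sources(cases: dict, weather: dict, mobility: dict) -> dict:
    """Keep only (region, date) keys present in every source, and merge their values."""
    keys = cases.keys() & weather.keys() & mobility.keys()
    return {key: RegionDay(cases[key], weather[key], mobility[key]) for key in sorted(keys)}


def risk_probability(day: RegionDay) -> float:
    """Logistic score combining the three joined features."""
    z = BIAS + sum(weight * getattr(day, name) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_label(day: RegionDay, threshold: float = 0.5) -> str:
    return "high" if risk_probability(day) >= threshold else "low"


if __name__ == "__main__":
    cases = {("A", "2020-11-01"): 0.30, ("B", "2020-11-01"): -0.10}
    weather = {("A", "2020-11-01"): 5.0, ("B", "2020-11-01"): 25.0}
    mobility = {("A", "2020-11-01"): 0.9}  # region B has no mobility record that day
    for key, day in join_sources(cases, weather, mobility).items():
        print(key, f"{risk_probability(day):.2f}", risk_label(day))
```

Region A, with growing cases, cool weather, and near-baseline mobility, scores above the threshold and is labeled high risk. Region B drops out of the join because it has no mobility record, a reminder that integration is only as complete as the sparsest source.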
Excerpted with permission of MIT Press. The Age of Prediction: Algorithms, AI, and the Shifting Shadows of Risk, by Igor Tulchinsky & Christopher E. Mason. Copyright © 2023 by MIT Press.
Igor Tulchinsky is founder, chairman, and CEO of WorldQuant.
Chris Mason is a geneticist and computational biologist who is also director of the WorldQuant Initiative for Quantitative Prediction.