Beyond AI+Bio — Emergent Biotechs
Over the last 30 years, the scale of biological discovery has been a major driver of biotech success. We are now entering a new era in which leveraging the right data at the right scale with the right computational tools can produce emergent capabilities that qualitatively change the biotech discovery paradigm.
The huge increase in the scale of biological discovery has allowed scientists to generate and then test many hypotheses in parallel. High-throughput screening originated at Pfizer in the mid-1980s and has since revolutionized discovery approaches for a wide range of applications. Sequencing costs have plummeted by orders of magnitude over the last 20 years, dramatically reducing the marginal cost of experiments. More recently, tools such as pooled CRISPR screens have given researchers a way to collect causal genetic information genome-wide.
While these approaches have been transformative, they still operate in the linear regime, in which more wells, more conditions, or more targets mean proportionally more discovery potential. Today, most high-throughput discovery methods are primarily filters: the bigger your upstream funnel, the better your chances of finding something interesting.
We’re now seeing a new paradigm in which large-scale data generation is used to train models, and those models capture, in a highly compressed form, how a given biological system operates. As a result, we can now generate new biological or therapeutic designs, not just filter them.
At the heart of this paradigm is nonlinear scaling, driven by the following dynamic: more data improves your ability to model a behavior; that model lets you collect more informative data, which yields an even better model, and so on. You can think of this as compound interest: in the short term, the compounding doesn’t seem to do much, but in the long term, it has an astronomical effect. In biological discovery, this dynamic means that those executing on this paradigm may look no different from the status quo for a while. However, once they accumulate advantages, those advantages keep growing.
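To make the compound-interest analogy concrete, here is a deliberately simple toy model, not a description of any real platform. It assumes that expected hits per round equal candidates screened times hit rate, and it uses the hit rate as a stand-in for model quality. In the "linear" case the hit rate never changes; in the "compounding" case each round's data is assumed to raise the next round's hit rate by a fixed fraction. The function names and every parameter value (`screens_per_round`, `hit_rate`, `gain`) are hypothetical and chosen only for illustration.

```python
# Toy model only: every parameter below is a hypothetical illustration,
# not a measured value from any real discovery platform.


def linear_discovery(rounds: int, screens_per_round: int = 1000,
                     hit_rate: float = 0.001) -> list[float]:
    """Cumulative expected hits when each round screens the same number of
    candidates at a fixed hit rate (the 'filter' regime)."""
    cumulative, total = [], 0.0
    for _ in range(rounds):
        total += screens_per_round * hit_rate
        cumulative.append(total)
    return cumulative


def compounding_discovery(rounds: int, screens_per_round: int = 1000,
                          hit_rate: float = 0.001,
                          gain: float = 0.3) -> list[float]:
    """Cumulative expected hits when each round's data improves the model,
    raising the next round's hit rate by a fixed fraction `gain`
    (an assumed stand-in for 'more informative data -> better model')."""
    cumulative, total, rate = [], 0.0, hit_rate
    for _ in range(rounds):
        total += screens_per_round * rate
        cumulative.append(total)
        rate = min(1.0, rate * (1.0 + gain))  # a hit rate cannot exceed 100%
    return cumulative


if __name__ == "__main__":
    lin = linear_discovery(20)
    comp = compounding_discovery(20)
    for r in (1, 5, 10, 20):
        print(f"round {r:2d}: linear {lin[r - 1]:7.1f}  "
              f"compounding {comp[r - 1]:7.1f}")
```

With these made-up numbers, the two curves start together and separate only modestly over the first few rounds, but by round 20 the compounding curve is more than an order of magnitude higher. A fixed multiplicative gain is just the simplest way to encode compounding; real learning curves would saturate and be noisier.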
Beyond AI+Bio: This trend goes well beyond just applying AI and machine learning (ML) to biology. Most early applications of these methods to discovery have been for data analysis: they are well suited to interpreting signals from images, large genomic datasets, and clinical data that would otherwise be unwieldy. By contrast, when datasets are custom-generated to train AI and ML models, the trained models can capture design rules. If those design rules are then used to generate new designs, you’ve just eliminated a massive amount of experimental overhead.
Why these emergent biotechs change the game
- The scale of discovery becomes what you can model, not just what you can directly screen.
- New discovery opportunities open up that were previously off-limits due to insufficient experimental scale.
- Positive feedback loops between data and models will create incredibly strong first-mover advantages.
- Because data compounds and models reinforce one another, the marginal cost of initiating new discovery programs decreases over time, potentially opening up new business models.
Evaluating emergent biotechs: There is no standard playbook for this new flavor of biotech company, so realizing a company’s full potential requires both the company and its investors to understand the underlying dynamics and communicate effectively. Companies must recognize that, because of the nonlinearity of this paradigm and the time required to generate sufficient training data, in the early days they are asking investors and employees to use their imaginations and buy into a vision of the future. That said, companies can stand out if the data generation itself is already differentiated or commercially useful and can be used to demonstrate traction.
Meanwhile, investors must get comfortable with the fact that, due to the nonlinear scaling of these platforms, very strong first-mover advantages will emerge through sudden tipping points. Investors must understand these dynamics if they want to engage with companies while valuations are still low, before those tipping points arrive.
At Modulus, we’re thrilled to be at the center of this discovery revolution. Our major engineering goal is to understand the vast combinatorial design space of cell therapies. Because that search space is so large, it’s critical that we employ techniques that learn design rules rather than relying on brute-force search.
We are still in the early days of generating game-changing datasets, but we are well on our way to hitting the first of many tipping points on our nonlinear trajectory.