The inflationary model of the early universe describes a period during which the volume of space expanded exponentially before the expansion slowed. Similarly, AI-augmented HPC (AHPC) is expanding the computational universe, opening regions of problem space that were previously inaccessible to traditional HPC numerical methods. This shift is changing scientific computing by making previously intractable problems solvable.
Predicting the Future in Numeric Computation
In numeric computation, predictions about the future usually rely on past data: trend lines fitted to the historical performance of supercomputers on HPC benchmarks are extended forward to forecast future capabilities. These projections help set realistic expectations and show where computational efficiencies and bottlenecks lie. However, this steady, predictable progression of HPC is entering an inflationary period driven by generative AI, specifically large language models (LLMs).
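A minimal sketch of this kind of trend-line extrapolation, assuming performance grows exponentially so that log10(performance) is linear in the year. The data points are synthetic (generated from an exact exponential), not real benchmark results, and the function names are hypothetical.

```python
import math


def fit_log_linear(years: list[float], perf: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of log10(perf) against year.

    Returns (mean_year, mean_log_perf, slope). Centering on the mean year
    avoids the precision loss of a huge intercept at year 0.
    """
    logs = [math.log10(p) for p in perf]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs)) / sum(
        (x - mean_x) ** 2 for x in years
    )
    return mean_x, mean_y, slope


def extrapolate(fit: tuple[float, float, float], year: float) -> float:
    """Extend the fitted trend line to another (e.g. future) year."""
    mean_x, mean_y, slope = fit
    return 10 ** (mean_y + slope * (year - mean_x))


# Synthetic history: performance grows tenfold every four years (arbitrary units).
years = [2010, 2012, 2014, 2016, 2018]
perf = [10 ** (0.25 * (y - 2010)) for y in years]

fit = fit_log_linear(years, perf)
doubling_time = math.log10(2) / fit[2]  # years for performance to double
print(f"projected 2024 performance: {extrapolate(fit, 2024):.1f}")
print(f"doubling time: {doubling_time:.2f} years")
```

Fitting in log space is what turns exponential growth into a straight trend line; the forecast holds only as long as the past growth rate continues.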
The Role of LLMs in HPC
Well-trained LLMs can identify relationships or features in data that scientists and engineers have not yet recognized. For instance, an LLM can recognize the concept of “speed” across contexts as different as automobiles, animals, and computers, and draw meaningful analogies among them. LLMs are adept at uncovering “dark features” in data: relationships or characteristics that are not immediately apparent.
Surrogate Models and Computational Space Expansion
AI-augmented HPC exploits these dark features through surrogate models: inexpensive learned approximations that stand in for costly simulations and provide shortcuts to potential solutions. By narrowing the field of viable candidates, surrogates can significantly reduce the computational effort required. Instead of running full simulations along 10,000 potential pathways, LLMs can rank them and suggest the most promising few, turning previously intractable problems into solvable ones.
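The screening idea can be sketched in a few lines. This is a toy illustration under stated assumptions: `expensive_simulation` is a hypothetical stand-in for a costly HPC run, and `surrogate` is a hypothetical cheap model that tracks the true objective with a small systematic error. The surrogate ranks all 10,000 candidates, and only the top `k` are passed to the expensive simulation.

```python
import math
import random


def expensive_simulation(x: float) -> float:
    """Hypothetical stand-in for a costly HPC run (higher is better)."""
    return -(x - 0.7) ** 2


def surrogate(x: float) -> float:
    """Hypothetical cheap model: follows the true trend with a small error."""
    return -(x - 0.7) ** 2 + 0.01 * math.sin(20 * x)


def screen(candidates: list[float], k: int) -> tuple[float, int]:
    """Rank every candidate with the surrogate, then simulate only the top k."""
    shortlist = sorted(candidates, key=surrogate, reverse=True)[:k]
    best = max(shortlist, key=expensive_simulation)
    return best, len(shortlist)  # second value = number of expensive runs


rng = random.Random(42)
candidates = [rng.random() for _ in range(10_000)]
best, runs = screen(candidates, k=10)
print(f"best={best:.3f} after {runs} expensive runs instead of {len(candidates)}")
```

The costly function runs 10 times instead of 10,000. The trade-off is that a poor surrogate can discard the true optimum before it is ever simulated.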
Foundation Models and NP-Hard Problems
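Training foundation models is computationally expensive, but their outputs can be checked relatively quickly. This asymmetry echoes NP-complete problems, a subset of NP-hard problems, where finding a solution may require searching an exponentially large space but verifying a proposed solution takes only polynomial time. AI-augmented HPC uses these models to assist traditional HPC domains, pruning solution spaces and reducing computation time. This approach is leading to remarkable breakthroughs in several scientific fields.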
Programmable Biology: EvolutionaryScale ESM3
The holy grail of biological science is understanding and manipulating DNA sequences, protein structures, and cellular functions. Combining these elements could revolutionize programmable biology, leading to new medicines and treatments. EvolutionaryScale’s ESM3 model represents a significant step towards this goal.
Advancements in Protein Engineering
ESM3, trained on 2.8 billion protein sequences, offers substantial improvements over previous models. As a proof of concept, EvolutionaryScale generated a new green fluorescent protein (GFP), esmGFP, whose sequence is only 58% identical to the closest known fluorescent protein, yet whose brightness is comparable to natural GFPs. EvolutionaryScale estimates that this degree of divergence corresponds to more than 500 million years of natural evolution.
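Sequence-identity figures such as the 58% above come from comparing aligned sequences position by position. A minimal sketch, assuming the two sequences have already been aligned to equal length; real tools compute the alignment first, and conventions differ on how gap columns are counted (here they count toward the alignment length but never as matches). The sequences are toy fragments, not esmGFP.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences share a residue.

    Assumes the inputs are already aligned; '-' marks a gap. Gap columns
    count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy 10-residue fragments that differ at one position.
print(percent_identity("MSKGEELFTG", "MSKGEALFTG"))
```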
Ethical Considerations and Open Development
While ESM3 holds immense potential, it also raises safety concerns: the ability to create new biological entities must be handled responsibly to prevent misuse. EvolutionaryScale has committed to open development, sharing its models and code on GitHub and collaborating with independent research efforts.
Weather and Climate Prediction: Microsoft ClimaX
Microsoft’s ClimaX is described as the first foundation model for weather and climate science, and it is available as open-source software. Traditional numerical weather models solve systems of differential equations on HPC systems; ClimaX instead offers a machine-learning alternative that learns to make predictions from data.
Performance and Applications
ClimaX is built on a modified Vision Transformer (ViT) and can be fine-tuned for a range of prediction tasks. It outperforms state-of-the-art systems on several weather and climate benchmarks, demonstrating its usefulness for weather and climate modeling.
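A ViT does not consume a gridded field directly; it first cuts the grid into fixed-size patches and flattens each one into a token. The sketch below shows only that generic step on a toy 4 x 6 latitude-longitude grid with placeholder values. According to its authors, ClimaX adds its own modifications, such as tokenizing each input variable separately and then aggregating across variables, which are not shown here; the function name is hypothetical.

```python
def patchify(grid: list[list[float]], patch: int) -> list[list[float]]:
    """Split an H x W grid into non-overlapping patch x patch blocks.

    Each block is flattened row-major into one token; tokens are returned
    in row-major order over the grid.
    """
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("grid dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append(
                [grid[r][c] for r in range(top, top + patch) for c in range(left, left + patch)]
            )
    return tokens


# Toy 4 x 6 field (e.g. temperature on a coarse lat-lon grid); values are placeholders.
grid = [[float(6 * row + col) for col in range(6)] for row in range(4)]
tokens = patchify(grid, patch=2)
print(len(tokens), tokens[0])
```

In a full ViT, each flattened patch is then linearly projected into an embedding vector and combined with a positional embedding before entering the transformer layers.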
COVID-19 Variant Search at Argonne
Researchers at Argonne National Laboratory developed an LLM to track SARS-CoV-2 variants. Viruses evolve through mutations, and some variants become more dangerous or transmissible. Predicting these variants is challenging due to the vast number of possible mutations.
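A back-of-the-envelope count shows why the space is so large. Assuming the 29,903-nucleotide SARS-CoV-2 reference genome and counting only point substitutions (each position can change to one of three other bases; insertions and deletions are ignored), the number of distinct variants with exactly k substitutions is C(L, k) × 3^k, where L is the genome length.

```python
from math import comb

GENOME_LENGTH = 29_903  # nucleotides in the SARS-CoV-2 reference genome
ALT_BASES = 3           # each position can mutate to one of three other bases


def substitution_variants(k: int, length: int = GENOME_LENGTH) -> int:
    """Number of distinct genomes differing from the reference at exactly k positions."""
    return comb(length, k) * ALT_BASES ** k


for k in (1, 2, 3):
    print(f"{k} substitution(s): {substitution_variants(k):,}")
```

Two substitutions alone already yield about four billion candidates, which is why exhaustive screening is impractical and a model that prioritizes likely variants is attractive.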
Genome-Scale Language Model
Argonne’s GenSLM (genome-scale language model) analyzes SARS-CoV-2 genome sequences to rapidly identify variants of concern (VOCs). Trained on a year’s worth of SARS-CoV-2 genome data, GenSLM can distinguish between viral strains and flag potential VOCs, streamlining a process that traditionally required extensive labor and time.
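Before a genome language model can read a sequence, the sequence must be split into tokens. GenSLM is described as working at the codon level (three nucleotides per token); the sketch below assumes that scheme with a simple 64-entry codon vocabulary. It is an illustration, not GenSLM’s actual tokenizer, and the names `CODON_VOCAB`, `codon_tokens`, and `encode` are hypothetical.

```python
from itertools import product

# Hypothetical vocabulary: all 64 codons in lexicographic order over "ACGT".
CODON_VOCAB = {"".join(c): i for i, c in enumerate(product("ACGT", repeat=3))}


def codon_tokens(seq: str) -> list[str]:
    """Split a nucleotide sequence into consecutive, non-overlapping codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def encode(seq: str) -> list[int]:
    """Map each codon to its integer ID; raises KeyError on ambiguous bases such as 'N'."""
    return [CODON_VOCAB[codon] for codon in codon_tokens(seq)]


print(codon_tokens("ATGTTTGTTTTT"))
print(encode("ATGTTTGTTTTT"))
```

A production tokenizer would also need special tokens (for padding or unknown bases, for example) and a way to handle reading frames; those details are omitted.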
Expanding the Scientific Universe
The examples of ESM3, ClimaX, and GenSLM illustrate how AI-augmented HPC is expanding the computational space of several scientific domains. Building and running foundation models, including LLMs, is becoming more accessible, allowing practitioners to explore new frontiers. Driven by the capabilities of AI-augmented HPC, the universe of science and technology is poised for exponential growth.