Three technologies to watch in 2024
Deep learning for protein design
Two decades ago, David Baker at the University of Washington in Seattle and his colleagues achieved a landmark feat: they used computational tools to design an entirely new protein from scratch. 'Top7' folded as predicted, but it was inert: it performed no meaningful biological functions. Today, de novo protein design has matured into a practical tool for generating made-to-order enzymes and other proteins. "It's hugely empowering," says Neil King, a biochemist at the University of Washington who collaborates with Baker's team to design protein-based vaccines and vehicles for drug delivery. "Things that were impossible a year and a half ago, now you just do it."
Much of that progress comes down to increasingly massive data sets that link protein sequence to structure. But sophisticated methods of deep learning, a form of artificial intelligence (AI), have also been essential.
'Sequence-based' strategies use the large language models (LLMs) that power tools such as the chatbot ChatGPT (see 'ChatGPT? Maybe next year'). By treating protein sequences like documents comprising polypeptide 'words', these algorithms can discern the patterns that underlie the architectural playbook of real-world proteins. "They really learn the hidden grammar," says Noelia Ferruz, a protein biochemist at the Molecular Biology Institute of Barcelona, Spain. In 2022, her team developed an algorithm called ProtGPT2 that consistently comes up with synthetic proteins that fold stably when produced in the laboratory. Another tool co-developed by Ferruz, called ZymCTRL, draws on sequence and functional data to design members of naturally occurring enzyme families.
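As a rough illustration of what treating sequences as text can mean, the sketch below fits a bigram model (each residue predicted from the one before it) to a few made-up amino-acid strings and scores new strings by their average log-probability. This is a deliberately simple stand-in and not the transformer language models behind ProtGPT2 or ZymCTRL; the training sequences and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def fit_bigram_model(sequences):
    """Count residue-to-residue transitions, treating each sequence as text."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def mean_log_prob(counts, seq, pseudocount=1.0):
    """Average log-probability per transition, with additive smoothing."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts[prev]
        denom = sum(row.values()) + pseudocount * len(AMINO_ACIDS)
        total += math.log((row[nxt] + pseudocount) / denom)
    return total / (len(seq) - 1)


# Hypothetical toy "family" of related sequences (not real proteins).
toy_family = ["MKVLAAGIVALLA", "MKVLSAGIVGLLA", "MKILAAGLVALLA"]
model = fit_bigram_model(toy_family)

family_like = mean_log_prob(model, "MKVLAAGIVALLA")
unlike_family = mean_log_prob(model, "WWWWWWWWWWWWW")
print(family_like > unlike_family)
```

A sequence that follows the family's recurring patterns scores higher than one that does not, which is the intuition behind the 'hidden grammar' in a much reduced form.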
Sequence-based approaches can build on and adapt existing protein features to form new frameworks, but they're less effective for the bespoke design of structural elements or features, such as the ability to bind specific targets in a predictable fashion.
'Structure-based' approaches are better for this, and 2023 saw notable progress in this type of protein-design algorithm, too. Some of the most sophisticated of these use 'diffusion' models, which also underlie image-generating tools such as DALL-E. These algorithms are initially trained to remove computer-generated noise from large numbers of real structures; by learning to discriminate realistic structural elements from noise, they gain the ability to form biologically plausible, user-defined structures.
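The noising half of that training procedure can be sketched in a few lines. The example below applies the standard forward-diffusion formula to a made-up set of 3D points standing in for a protein backbone, blending the original coordinates with Gaussian noise in proportions set by a noise schedule. The linear schedule, the coordinates and the function names are illustrative assumptions and are not taken from RFdiffusion or Chroma; the neural network that learns to reverse the noise is omitted.

```python
import math
import random


def make_alpha_bars(num_steps=100, beta_start=1e-4, beta_end=0.05):
    """Fraction of original signal left after each step (assumed linear schedule)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(coords, step, alpha_bars, rng):
    """Blend each coordinate with Gaussian noise according to the schedule."""
    a = alpha_bars[step]
    signal, spread = math.sqrt(a), math.sqrt(1.0 - a)
    return [
        tuple(signal * c + spread * rng.gauss(0.0, 1.0) for c in point)
        for point in coords
    ]


def rms_deviation(a, b):
    """Root-mean-square distance between matching points."""
    sq = [sum((x - y) ** 2 for x, y in zip(p, q)) for p, q in zip(a, b)]
    return math.sqrt(sum(sq) / len(sq))


rng = random.Random(0)
# Hypothetical "backbone": 50 points along a helix-like curve.
backbone = [(math.cos(0.25 * i), math.sin(0.25 * i), 0.08 * i) for i in range(50)]

alpha_bars = make_alpha_bars()
for step in (0, 49, 99):
    noised = add_noise(backbone, step, alpha_bars, rng)
    print(f"step {step:3d}: deviation = {rms_deviation(noised, backbone):.2f}")
```

At early steps the structure is barely perturbed; by the last step it is mostly noise. A diffusion model is trained to predict and remove that noise, and at design time it runs the process in reverse, starting from noise.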
RFdiffusion software developed by Baker's lab and the Chroma tool by Generate Biomedicines in Somerville, Massachusetts, exploit this strategy to remarkable effect. For example, Baker's team is using RFdiffusion to engineer novel proteins that can form snug interfaces with targets of interest, yielding designs that "just conform perfectly to the surface," Baker says. A newer 'all atom' iteration of RFdiffusion allows designers to computationally shape proteins around non-protein targets such as DNA, small molecules and even metal ions. The resulting versatility opens new horizons for engineered enzymes, transcriptional regulators, functional biomaterials and more.
Deepfake detection
The explosion of publicly available generative AI algorithms has made it simple to synthesize convincing but entirely artificial images, audio and video. The results can offer amusing distractions, but with multiple ongoing geopolitical conflicts and a US presidential election on the horizon, opportunities for weaponized media manipulation are rife.
Siwei Lyu, a computer scientist at the University at Buffalo in New York, says he's seen numerous AI-generated 'deepfake' images and audio related to the Israel-Hamas conflict, for instance. This is just the latest round in a high-stakes game of cat-and-mouse in which AI users produce deceptive content and Lyu and other media-forensics specialists work to detect and intercept it.
One solution is for generative-AI developers to embed hidden signals in the models' output, producing watermarks of AI-generated content. Other strategies focus on the content itself. Some manipulated videos, for instance, replace the facial features of one public figure with those of another, and new algorithms can recognize artefacts at the boundaries of the substituted features, says Lyu. The distinctive folds of a person's outer ear can also reveal mismatches between a face and a head, whereas irregularities in the teeth can reveal edited lip-sync videos in which a person's mouth was digitally manipulated to say something that the subject didn't say. AI-generated photos also present a thorny challenge, and a moving target. In 2019, Luisa Verdoliva, a media-forensics specialist at University Federico II of Naples, Italy, helped to develop FaceForensics++, a tool for spotting faces manipulated by several widely used software packages. But image-forensic methods are subject- and software-specific, and generalization is a challenge. "You cannot have one single universal detector; it's very difficult," she says.
And then there's the challenge of implementation. The US Defense Advanced Research Projects Agency's Semantic Forensics (SemaFor) programme has developed a useful toolbox for deepfake analysis, but, as reported in Nature (see Nature 621, 676-679; 2023), major social-media sites are not routinely employing it.
Broadening access to such tools could help to fuel uptake, and to this end Lyu's team has developed the DeepFake-O-Meter, a centralized public repository of algorithms that can analyse video content from different angles to sniff out deepfake content. Such resources will be helpful, but it is likely that the battle against AI-generated misinformation will persist for years to come.
Large-fragment DNA insertion
In late 2023, US and UK regulators approved the first-ever CRISPR-based gene-editing therapy for sickle-cell disease and transfusion-dependent β-thalassaemia, a major win for genome editing as a clinical tool.
CRISPR and its derivatives use a short programmable RNA to direct a DNA-cutting enzyme such as Cas9 to a specific genomic site. They are routinely used in the lab to disable defective genes and introduce small sequence changes. The precise and programmable insertion of larger DNA sequences spanning thousands of nucleotides is difficult, but emerging solutions could allow scientists to replace crucial segments of defective genes or insert fully functional gene sequences. Le Cong, a molecular geneticist at Stanford University in California, and his colleagues are exploring single-stranded annealing proteins (SSAPs), virus-derived molecules that mediate DNA recombination. When combined with a CRISPR-Cas system in which the DNA-slicing function of Cas9 has been disabled, these SSAPs allow precisely targeted insertion of up to 2 kilobases of DNA into the human genome.
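The programmable targeting on which these methods rely can be illustrated with a simple sequence search. The widely used Cas9 from Streptococcus pyogenes acts only where the roughly 20-nucleotide stretch matched by the guide RNA sits immediately upstream of a short 'NGG' motif, in which N is any base. The sketch below lists such candidate sites on one strand of a DNA string. The function name and the example sequence are made up for illustration, and real guide-design tools also check the opposite strand and screen for off-target matches.

```python
def find_target_sites(dna, guide_len=20):
    """List (position, target, motif) for forward-strand sites followed by NGG."""
    dna = dna.upper()
    sites = []
    # Stop early enough that the target plus its 3-base motif fits in the sequence.
    for i in range(len(dna) - guide_len - 2):
        motif = dna[i + guide_len : i + guide_len + 3]
        if motif[1:] == "GG":
            sites.append((i, dna[i : i + guide_len], motif))
    return sites


# Hypothetical sequence, not taken from any real genome.
example = "ttgacctagcatcgatcgtacgatcgatcaggctagctagcatcgatgcatcgatcgtagg"
for position, target, motif in find_target_sites(example):
    print(position, target, motif)
```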
Other methods exploit a CRISPR-based method called prime editing to introduce short 'landing pad' sequences that selectively recruit enzymes that in turn can precisely splice large DNA fragments into the genome. In 2022, for instance, genome engineers Omar Abudayyeh and Jonathan Gootenberg at the Massachusetts Institute of Technology in Cambridge and their colleagues first described programmable addition through site-specific targeting elements (PASTE), a method that can precisely insert up to 36 kilobases of DNA. PASTE is especially promising for ex vivo modification of cultured, patient-derived cells, says Cong, and the underlying prime-editing technology is already on track for clinical studies. But for in vivo modification of human cells, SSAP might offer a more compact solution: the bulkier PASTE machinery requires three separate viral vectors for delivery, which could undermine editing efficiency relative to the two-component SSAP system. That said, even relatively inefficient gene-replacement strategies could be sufficient to mitigate the effects of many genetic diseases.
And such methods are not just relevant to human health. Researchers led by Caixia Gao at the Chinese Academy of Sciences in Beijing developed PrimeRoot, a method that uses prime editing to introduce specific target sites that enzymes can use to insert up to 20 kilobases of DNA in both rice and maize. Gao thinks that the technique could be broadly useful for endowing crops with disease and pathogen resistance, continuing a wave of innovation in CRISPR-based plant genome engineering. "I believe that this technology can be applied in any plant species," she says.