I think it is fair to say that Artificial Intelligence (AI) is no longer the next big thing – rather, it is the current big thing. Keeping tabs on ever-improving large language models (LLMs), protein structure prediction, foundational models for data analysis, graph neural networks and everything in between is overwhelming. I have an uneasy, yet strangely exhilarating feeling that with the recent evolution of AI we are holding a tiger by the tail, living in a new wild west. How much is hype? How much is real? How useful is it in the context of biological research? In my role as a computational biologist within the Centre, these questions cannot be just idle curiosities. It is imperative that I stay on the pulse of such game-changing technologies. Here I’ll highlight some of the ways that I am using AI for biological research.
The simplest way to apply AI in research is by asking questions of publicly available LLMs such as OpenAI’s GPT, Google’s Gemini or Anthropic’s Claude. LLMs are generative models, which means they don’t simply ‘look up’ an answer from a database. Rather, they generate a realistic-sounding answer based on having seen vast tracts of relevant text and data during their training. Which means that they can completely make stuff up. Fantastic for creatives, but not so desirable in the world of scientific research! Nevertheless, I have found LLMs are effective at introducing new subjects, explaining complex concepts at whatever level you need, helping you with a tricky piece of coding, or leading you in the right direction to gain expertise. If prompted cleverly (it is worth reading some articles on prompt engineering), then LLMs are like having a personalised tutor on call 24/7.
Recently, for example, I needed to modify the ID of every protein sequence in a large file (FASTA format) in a very specific manner. I knew from experience that a good tool for this task was ‘sed’ in Linux, but the sed syntax here was going to get very ugly. In the past I would have hacked around with the help of Stack Overflow, eventually solving the problem. Instead, I told Gemini that it was an experienced bioinformatician, carefully explained the challenge to it, and completed the task in a few minutes. It’s worth noting that I already knew the bioinformatics terminology to best describe things to the LLM, because therein lies the rub – in my opinion, LLMs enhance productivity most in areas you already know well. Just maintain a healthy dose of scepticism. You wouldn’t trust everything the post-doc in the next office says, and neither should you trust the LLM!
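The exact ID transformation I needed isn’t important here, but to make the task concrete, this is the general shape of it. A minimal sketch in Python (the specific rewrite rule below – keeping only the accession and prepending a species tag – is a made-up example, not the one from my project):

```python
def rewrite_fasta_ids(fasta_text, tag="Atha"):
    """Rewrite each FASTA header line; leave sequence lines untouched."""
    out_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            seq_id = line[1:].split()[0]          # keep only the accession
            out_lines.append(f">{tag}|{seq_id}")  # prepend a species tag
        else:
            out_lines.append(line)
    return "\n".join(out_lines)

fasta = ">AT1G01010.1 NAC domain protein\nMEDQVGFGFRPNDEELVGHYLRNK"
print(rewrite_fasta_ids(fasta))
```

A well-prompted LLM will happily produce the equivalent sed one-liner, which is where the syntax gets ugly fast.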
A more complex use case is the laborious task of annotating scientific observations with tags or ontology terms. Annotation requires strong domain knowledge so that the correct ontology terms are assigned to an observation. For example, last year I downloaded thousands of descriptions of phenotypes observed when specific genes were mutated in various plant species. The descriptions, coming from thousands of studies, had no consistency of style or terminology. If I could assign Plant Trait Ontology (TO) terms to them then I could unify the information for downstream analyses, but how to do that efficiently? Available tools that rely on text-mining proved to be very poor at this task. What I needed was a tool with reasoning capabilities. To cut a long story short, I carefully scripted a Retrieval-Augmented Generation (RAG) pipeline using the OpenAI API to ask GPT-4o to select the best TO terms for each phenotype. By using its language reasoning and concept recognition skills, GPT proved highly adept at auto-annotation in this way. It is not quite good enough to replace a domain expert, but it removes a huge chunk of the drudge work. Check out my newly published paper here.
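To give a feel for the “retrieval” half of such a pipeline, here is a toy sketch. Everything below is illustrative: the term list is a tiny sample, and the crude word-overlap scoring stands in for real retrieval, which would typically embed the ontology terms and phenotype text and rank by vector similarity before sending a shortlist to the LLM (the API call itself is not shown):

```python
def shortlist_terms(phenotype, ontology_terms, k=3):
    """Rank ontology terms by crude word overlap with the phenotype text.

    A stand-in for embedding-based retrieval in a real RAG pipeline.
    """
    pheno_words = set(phenotype.lower().split())
    scored = sorted(
        ontology_terms,
        key=lambda t: -len(pheno_words & set(t["label"].lower().split())),
    )
    return scored[:k]

def build_prompt(phenotype, candidates):
    """Assemble the prompt that would be sent to the LLM."""
    lines = [f"- {t['id']}: {t['label']}" for t in candidates]
    return (
        "Select the best Plant Trait Ontology term for this phenotype.\n"
        f"Phenotype: {phenotype}\nCandidates:\n" + "\n".join(lines)
    )

terms = [
    {"id": "TO:0000207", "label": "plant height"},
    {"id": "TO:0000396", "label": "leaf shape"},
    {"id": "TO:0000560", "label": "seed weight"},
]
top = shortlist_terms("mutant plants show reduced plant height", terms)
print(build_prompt("mutant plants show reduced plant height", top))
```

The point of retrieval is to hand the model a short, relevant candidate list rather than the whole ontology, which keeps the prompt small and the model’s “reasoning” focused on a tractable choice.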
Finally, what about reading, writing and reviewing? As I demonstrated recently at the Centre’s writing workshop, increasingly sophisticated AI tools are now available to automatically find relevant papers, summarise them, explain their findings, take notes, and even synthesise the knowledge into a review. Google’s NotebookLM product goes a step further and can generate highly engaging podcasts discussing the papers of interest. This is a fantastic way to keep on top of your endless pile of unread papers. I use it as a filter to find papers that I should read in more detail, because nothing quite beats reading a key paper yourself! I predict that soon we will see a general shift away from incumbent literature management software like Zotero and EndNote to AI-driven tools like NotebookLM and SciSpace. Here, once again, AI is a great tool for assisting you as a scientist – but it does not replace you.
On that note, I asked GPT, Gemini and Copilot to give me their 10 commandments of AI for biological research. All three produced the same for #1 (with some wording differences): “thou shalt not abdicate thy scientific responsibility to the machine”. AI is a tool. It should assist you, not replace you. Think back to the introduction of the electronic calculator. No doubt there were concerns that math skills would decline (“no one will do long division anymore! The horror!”), but instead it let scientists focus on answering the bigger questions rather than getting bogged down in the weeds calculating 3.64² and log(212). What a massive force-multiplier calculators and computers have been. Now it is the turn of AI.
David Kainer
Centre-wide Senior Research Fellow, The University of Queensland