Novel Artificial Intelligence approaches to distinguish allergenic from non-allergenic proteins

NIAID - National Institute of Allergy and Infectious Diseases

open

About This Grant

ABSTRACT Allergic diseases have increased dramatically to become the most common human disorders in developed countries. Despite improved clinical diagnosis and management, they are the leading cause of work and school absences. The prevalence of food allergy is rising for unclear reasons, with estimates in developed countries around 10%. Peanut allergy affects 1–2% of the population in Westernized nations and is a leading cause of food-induced anaphylaxis. Specifying and maintaining elimination diets places a significant burden on patients, families and healthcare systems. The possibility of anaphylactic responses in sensitized individuals complicates food manufacturing, as products should be free of unmarked allergy triggers, including peanuts, tree nuts, wheat, soy, fish, shellfish, eggs and milk. Thus, there is an unmet need to identify allergen features and to find, among the vast number of proteins now catalogued in proteome databases, those that could cause cross-reactions in sensitive individuals. A first version of a novel machine learning (ML) tool, AllergenAI, which used only the amino acid sequences in three allergen protein databases (SDAP 2.0, COMPARE and AlgPred), achieved robust results. To further improve this ML model, we will use AlphaFold 2 to model all proteins in these three databases and incorporate the 3D structural information into AllergenAI (Aim 1). The predictive abilities of AllergenAI will be experimentally assessed in Aim 2 by analyzing the feature scores for well-studied vicilin allergens, which are among the most common allergens (with over 20 entries in the Structural Database of Allergenic Proteins (SDAP), including the major peanut allergen Ara h 1 and its homologues in tree nuts, legumes and cotton). The program's ability to distinguish homologues in the very broad vicilin protein family that could cause cross-reactions with IgE in allergic patient sera will be tested.
The weight parameters of AllergenAI will be fine-tuned using a large language model (LLM), ESM-2, in Aim 3. The optimized AllergenAI will be benchmarked against other prediction methods for allergenic proteins that are based on sequence alone. We hypothesize that artificial intelligence (AI) technologies, which have improved dramatically in recent years, will clarify the problem of “what makes a protein allergenic”. The application of AI technology to allergen research is novel. Our combined experimental and computational approach will yield a powerful new method to identify potential allergenic characteristics in new proteins and help design better immunotherapies for allergic diseases. The source code for the optimized AllergenAI model, documentation for its use and example input files will be made available from our SDAP 2.0 website, which is used extensively by researchers and clinicians throughout the world.

Focus Areas

health research

Eligibility

university, nonprofit, healthcare org

How to Apply

Funding Range

Up to $429K

Deadline

2031-01-31

Complexity
high
