Celiac Disease (CD) Novel Protein Risk Assessment Tool

The Food Allergy Research & Resource Program (FARRP) in the Department of Food Science & Technology, University of Nebraska, has added a new bioinformatics tool to identify Exact Peptide matches between the amino acid sequence of a query protein and the 1,041 naturally occurring, mutated or deamidated (Gln = Q, converted to Glu = E, by tissue transglutaminase) peptides from wheat and wheat relatives (barley, rye and two proteins from oats) that have been demonstrated to elicit celiac disease or activate MHC Class II restricted T cells of subjects with celiac disease. Specificity arises because these peptides are presented by genetically inherited Major Histocompatibility Complex (MHC) class II receptors, HLA DQ2.5 or DQ8 or variants (DQ2.2, DQ8.5), which activate T cells in affected individuals. Proteins derived from the wheat subfamily (Pooideae) of the grass family (Poaceae) that are considered for use as novel food ingredients, or that are introduced into other species of food crops through genetic engineering, may pose a risk for those with celiac disease if they contain celiac-active peptides. The database provides a simple screening tool to identify proteins that might pose a risk of eliciting celiac disease, or that are sufficiently similar to CD-eliciting proteins or peptides that further testing would be reasonable to demonstrate safety for consumption by affected individuals.
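The idea of an exact peptide match can be illustrated with a minimal Python sketch. This is not the FARRP implementation; the function name `find_exact_peptide_matches`, the identifiers `demo-1` and `demo-2`, and the short peptide and query strings are hypothetical and chosen for illustration only. The sketch simply reports every position at which a listed peptide occurs verbatim in the query sequence.

```python
def find_exact_peptide_matches(query, peptides):
    """Return (peptide_id, peptide, start) for each verbatim occurrence.

    `peptides` maps an identifier to a peptide string (hypothetical demo
    data here, not the database contents). `start` is 1-based.
    """
    query = query.upper()
    hits = []
    for pep_id, pep in peptides.items():
        start = query.find(pep)
        while start != -1:
            hits.append((pep_id, pep, start + 1))
            # Continue searching one residue further to catch overlapping repeats.
            start = query.find(pep, start + 1)
    # Report hits in order of their position in the query.
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical demo peptides and query sequence (illustration only).
DEMO_PEPTIDES = {"demo-1": "PFPQPELPY", "demo-2": "PQPELPYPQ"}
DEMO_QUERY = "MKTPFPQPELPYPQAA"

for pep_id, pep, start in find_exact_peptide_matches(DEMO_QUERY, DEMO_PEPTIDES):
    print(f"{pep_id}: {pep} found at position {start}")
```

A query with no hits returns an empty list, which corresponds to the "no exact peptide match" outcome described above.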

In addition to the Exact Peptide match, the linked Celiac Disease database also includes a FASTA algorithm to compare the query protein against 76 celiac-inducing proteins that are the sources of the peptides, and a list of 69 published references supporting the inclusion of peptides and proteins in the database. Proteins lacking any identity match to the peptides in the database are not likely to trigger celiac disease; however, it is possible that not all peptides that can trigger CD are known. Thus the FASTA comparison to the 76 proteins adds a level of safety. The FASTA comparison has not (yet) been validated sufficiently to set absolute thresholds of concern for celiac disease. However, preliminary searches with proteins from rice, sorghum, maize and other food sources that are considered safe for those with celiac disease allowed us to establish reasonably conservative guidelines. Matches with less than 45 percent identity over at least one-half of the FASTA-aligned CD protein, and those with an E score larger than 1 x 10^-16 using this database, are unlikely to present a risk of inducing celiac disease. These criteria have been re-reviewed by our committee and seemed relevant as of April 2022.

Note: The first version of this database was available for public use on 14 February 2012. The peptide entries were updated on 10 August 2017 to include 1030 peptides. Further review completed on 21 September 2017 demonstrated that the 8 AA peptides are not predictive, and they were removed. The 21 September 2017 update, released as the 2018 version, has 1013 peptides. In April 2022 we updated the database again, increasing the number of peptides to 1041 and also updating the number of proteins. In addition, BLAST of all 9 AA peptides found that some have 100% identity matches to proteins from other species unrelated to wheat (outside Pooideae), but we did not repeat the full validation processes used in 2012 and 2018. Words of CAUTION were added to the data table for those 9 AA sequences that match proteins from sources outside of Pooideae. Additional tests of the FASTA criteria, to ensure differentiation of proteins that are unlikely to elicit CD from those that are likely sources of CD peptides, will come in the future, after April 2022.

What is CD? Celiac Disease, also known as gluten-sensitive enteropathy or celiac sprue, is a genetically linked inflammatory immune disease with varying severity in an estimated 0.5% to 1.5% of the population in various geographies. (A, B) Affected individuals experience symptoms after the consumption of food containing proteins from wheat, barley, rye and possibly oats and other grass family grains closely related to wheat.(C) The primary target organ is the upper small intestine and symptoms are usually associated with the digestive tract with chronic diarrhea, abdominal pain, cramping, bloating or irritable bowel syndrome.(D) However, general nutritional deficiency, failure to thrive, mouth ulcers, and fatigue are experienced by many subjects. Continuing exposure to glutens leads to increased immune response, increased expression of tissue transglutaminase and inflammation that leads to flattening of the villi in the small intestine, erosion of the mucosal epithelium and loss of absorptive capacity.(E) Vitamin deficiency is common. Loss of calcium density in bones is associated with the disease and there is an increased risk of developing adenocarcinoma of the small intestine and T-cell lymphoma.

The specificity of the disease is determined by T lymphocytes that bind to specific native or deamidated peptides of certain wheat-family glutens (glutenins and gliadins) that are presented in the antigen-presenting groove of MHC class II, HLA DQ2.5 or DQ8, leading to activation of a CD4 T cell response that drives inflammation involving macrophages, NK cells and other inflammatory cells that cause tissue destruction.(F) Refer to the reference list for publications on peptides and MHC restriction. Interestingly, while nearly 20% of individuals in North America and Europe express HLA DQ2, and about 95% of those with celiac disease have HLA DQ2.5 (and others express DQ2.2, DQ8, DQ8.5 and possibly DQ9), only about 1% of the population has been diagnosed with celiac disease. Thus, other unknown factors are also very important determinants leading to celiac disease or tolerance. Many speculate that a much higher percentage of the population simply has subtle symptoms or is undiagnosed, but there is little hard evidence to support a prevalence of disease much greater than 1% of the global population.

Avoiding the proteins that stimulate the CD immune response is the only effective treatment for those with celiac disease. Dietary avoidance is complex because these cereal grains are commonly used as major carbohydrate and protein food sources in breads and pasta, and processed wheat and wheat relatives are also used as functional food ingredients in many restaurant and processed foods. They may also be used as ingredients in processed nutritional products, including as fermentation sources for probiotics and nutritional supplements. Protection of those with celiac disease requires separation of commodities that are intended for "gluten-free" foods, from the source of the commodity, through processing and packaging. Gluten-free foods are defined as those containing less than 20 parts per million gluten, but that is hard to prove uniformly by testing because different grains have different ratios of gluten proteins. Gluten-free foods must also be labeled clearly and accurately in order to protect the most sensitive affected consumers. Food companies that produce gluten-free foods work hard to source commodities from suppliers with minimal (no) contamination. Interestingly, so far, there is no evidence that proteins from high-quality oats affect most celiac patients. One issue is contamination of commodity oat lots with barley, rye or wheat. Some wild grasses also have gluten-related proteins. Oats are often produced on farms that also grow wheat, barley or rye, or these grains are grown on neighboring farms, producing a potential source of contamination. Further, farming equipment and shipping containers (trucks, trains and ships) often carry wheat, barley or rye and can serve as sources of contamination. In addition, commodity processing and food manufacturing facilities are often used for products that contain wheat, so accurate segregation is difficult.
Consumers with celiac disease must trust food producers to accurately represent foods as being free from gluten and there are stringent standards for claiming "gluten-free" in many industrialized countries.

Evaluating Genetically Modified Organisms and Novel Food Ingredients

In order to help ensure that those with gluten sensitivity would not be at greater risk of exposure, regulatory guidelines for genetically modified crops recommend that the proteins encoded by genes transferred from wheat and wheat relatives (members of the Pooideae subfamily) into different food sources (e.g. rice, maize, potato) should be evaluated regarding their capacity to elicit celiac disease.(G) FARRP believes that the current Celiac Disease Peptide and Protein searchable database provides an efficient screening tool to determine whether additional tests (e.g. laboratory T cell activation tests using samples from individuals with CD, tissue biopsy challenge, or clinical challenge of volunteers with CD) would need to be undertaken to demonstrate safety of a new protein. Proteins isolated from wheat and wheat relatives for use as novel food ingredients could also be assessed using this computer comparison.

Proteins that do not contain an exact peptide match to those identified in this database are unlikely to induce symptoms in those with celiac disease. The FASTA search routine is provided as an added safety measure in case not all CD-active peptides are known. In the event that a candidate food protein from a member of the Pooideae subfamily matches one of the 75 proteins in the CD database with an identity of at least 45% over at least one-half of the length of the CD-inducing protein, and with an E score smaller than 1 x 10^-15, it would be prudent to perform additional tests to rule out any risk that the protein might cause CD.
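The FASTA guideline in the paragraph above can be written as a small decision function. The following Python sketch is an illustration under stated assumptions, not FARRP's code: the `FastaHit` fields and the function `needs_followup` are hypothetical names, the hit values are invented for the demonstration, and the thresholds are exposed as parameters because the guideline is described as preliminary rather than validated.

```python
from dataclasses import dataclass


@dataclass
class FastaHit:
    """Summary of one FASTA alignment to a CD protein (hypothetical fields)."""
    identity_pct: float      # percent identity within the alignment
    aligned_length: int      # residues of the CD protein covered by the alignment
    cd_protein_length: int   # full length of the CD protein
    e_value: float           # FASTA E score


def needs_followup(hit, min_identity=45.0, min_coverage=0.5, max_e_value=1e-15):
    """Return True when all three guideline criteria are met.

    Criteria: identity of at least 45%, alignment covering at least one-half
    of the CD protein, and an E score smaller than the threshold.
    """
    coverage = hit.aligned_length / hit.cd_protein_length
    return (
        hit.identity_pct >= min_identity
        and coverage >= min_coverage
        and hit.e_value < max_e_value
    )


# Invented example hits (illustration only).
strong_hit = FastaHit(identity_pct=62.0, aligned_length=180,
                      cd_protein_length=300, e_value=1e-40)
weak_hit = FastaHit(identity_pct=30.0, aligned_length=180,
                    cd_protein_length=300, e_value=1e-40)

print(needs_followup(strong_hit))  # all three criteria met
print(needs_followup(weak_hit))    # identity below 45%
```

Keeping the thresholds as arguments makes it easy to rerun the same screen if the guideline values are revised in a later database version.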

Compilation and Review of FARRP Celiac Disease Peptide and Protein database

Plaimein Amnuaycheewa, PhD, compiled the original set of probable CD-active peptides from his review of approximately 100 publications describing proteins and peptides (1,016) as part of his dissertation research. The update was provided by re-review of the original publications and of newer publications up until 2017, ending with a total of 1013 peptides after removing peptides smaller than 9 AA. The publications provide data showing that these peptides have been tested for T cell activation potential or induction of celiac enteropathy. The list of 72 celiac-associated wheat-related proteins was compiled as representing proteins containing one or more of the peptides. John Wise compiled the data for the 2012 database, structured the database and search routines, and updated newer versions. The MHC-II binding peptides listed by EFSA (2017) were identified and renamed by Sollid et al. in 2012. The dataset was reviewed by Afua O. Tetteh, PhD, Plaimein Amnuaycheewa, PhD, Barbara Bohle, PhD, Fatima Ferreira, PhD, Frits Koning, PhD and Richard Goodman, PhD. The effort for the 2012 database was funded primarily by FARRP and partly by the six biotechnology companies that fund the FARRP AllergenOnline.org database. Funding in 2017 was provided by FARRP, two major biotech companies (BASF and Pioneer), JR Simplot, NuSeed and Unilever. Funding in 2018 was primarily from FARRP, with help from NuSeed and Unilever. The April 2022 version 3 was funded by FARRP with some help from Unilever and NuSeed.


  • A. Biagi F, Klersy C, Balduzzi D, Corazza GR. 2010. Are we not over-estimating the prevalence of coeliac disease in the general population? Annals of Medicine 42:557-561. PMID:20883139
  • B. Abadie V, Sollid LM, Barreiro LB, Jabri B. 2011. Integration of genetic and immunological insights into a model of celiac disease pathogenesis. Annual Review of Immunology 29:493-525. PMID:21219178
  • C. Tye-Din JA, Stewart JA, Dromey JA, Beissbarth T, van Heel DA, Tatham A, Henderson K, Mannering SI, Gianfrani C, Jewell DP, Hill AVS, McCluskey J, Rossjohn J, Anderson RP. 2010. Comprehensive, quantitative mapping of T cell epitopes in gluten in celiac disease. Science Translational Medicine 2(41):41ra51. PMID:20650871
  • D. Scanlon SA, Murray JA. 2011. Update on celiac disease-etiology, differential diagnosis, drug targets, and management advances. Clinical and Experimental Gastroenterology. PMID:22235174
  • E. Sollid LM, Jabri B. 2011. Celiac disease and transglutaminase 2: a model for posttranslational modification of antigens in HLA association in the pathogenesis of autoimmune disorders. Current Opinion in Immunology 23:732-738. PMID:21917438
  • F. Kagnoff MF. 2007. Celiac disease: pathogenesis of a model immunogenetic disease. Journal of Clinical Investigation. 117:41-49. PMID:17200705
  • G. Codex (2003). Codex Alimentarius Guidelines. Alinorm 03/34, Joint FAO/WHO Food Standards Programme, Twenty-Fifth Session (FA), Rome, Italy
  • H. Sollid LM, Qiao S-W, Anderson RP, Gianfrani C, Koning F. (2012). Nomenclature and listing of celiac disease relevant gluten T-cell epitopes restricted by HLA-DQ molecules. Immunogenetics. 64:455-460.
  • EFSA. EFSA GMO Panel (EFSA Panel on Genetically Modified Organisms) (2017). Naegeli H, Birch AN, Casacuberta J et al. EFSA Journal 2017;15(5):4862, 49 pp. https://doi.org/10.2903/j.efsa.2017.4862