
SNP haplotypes of the BADH1 gene and their association with aroma in rice (Oryza sativa L.)

Anuradha Singh • Pradeep K. Singh • Rakesh Singh • Awadhesh Pandit • Ajay K. Mahato • Deepak K. Gupta • Kuldeep Tyagi • Ashok K. Singh • Nagendra K. Singh • Tilak R. Sharma

Received: 5 July 2009 / Accepted: 2 March 2010 / Published online: 21 March 2010

© Springer Science+Business Media B.V. 2010

Abstract Betaine aldehyde dehydrogenase (BADH) is a key enzyme involved in the synthesis of glycinebetaine, a powerful osmoprotectant against salt and drought stress in a large number of species. Rice is not known to accumulate glycinebetaine but it has two functional genes coding for the BADH enzyme. A non-functional allele of the BADH2 gene located on chromosome 8 is a major factor associated with rice aroma. However, similar information is not available regarding the BADH1 gene located on chromosome 4 despite the similar biochemical function of the two genes. Here we report on the discovery and validation of SNPs in the BADH1 gene by re-sequencing of diverse rice varieties differing in aroma and salt tolerance. There were 17 SNPs in introns with an average density of one per 171 bp, but only three SNPs in exons at a density of one per 505 bp. Each of the three exonic SNPs led to changes in amino acids with functional significance. Multiplex SNP assays were used for genotyping of 127 diverse rice varieties and landraces. In total 15 SNP haplotypes were identified but only four of these, corresponding to two protein haplotypes, were common, representing more than 85% of the cultivars. Determination of population structure using 54 random SNPs classified the varieties into two groups broadly corresponding to indica and japonica cultivar groups, with aromatic varieties clustering with the japonica group. There was no association between salt tolerance and the common BADH1 haplotypes, but aromatic varieties showed specific association with a BADH1 protein haplotype (PH2) having lysine 144 to asparagine 144 and lysine 345 to glutamine 345 substitutions. Protein modeling and ligand docking studies show that these two substitutions lead to reduction in the substrate binding capacity of the BADH1 enzyme towards gamma-aminobutyraldehyde (GABald), which is a precursor of the major aroma compound 2-acetyl-1-pyrroline (2-AP). This association requires further validation in segregating populations for potential utilization in rice breeding programs.

Keywords BADH1 · Oryza sativa · SNP haplotypes · Aroma

Electronic supplementary material The online version of this article (doi:10.1007/s11032-010-9425-1) contains supplementary material, which is available to authorized users.

A. Singh · P. K. Singh · R. Singh · A. Pandit · A. K. Mahato · D. K. Gupta · K. Tyagi · N. K. Singh (✉) · T. R. Sharma
Rice Genome Laboratory, National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi 110012, India
e-mail: [email protected]

Present Address: R. Singh
National Bureau of Plant Genetic Resources, New Delhi 110012, India

A. K. Singh
Division of Genetics, Indian Agricultural Research Institute, New Delhi 110012, India


Mol Breeding (2010) 26:325–338

DOI 10.1007/s11032-010-9425-1


Introduction

Glycinebetaine (GB) is a compatible organic solute synthesized in response to salt, drought and temperature stress in a large number of species including microbes, animals and plants. The enzyme betaine aldehyde dehydrogenase (BADH, E.C. No. 1.2.1.8) is involved in the synthesis of GB from its precursor betaine aldehyde. Many flowering plants, e.g. mangrove, spinach, amaranth, barley and sorghum, are proven betaine accumulators and tolerate salt and drought stress partly through this mechanism, but other species like tobacco, tomato and rice are considered non-accumulators of GB (Ishitani et al. 1993; Rathinasabapathi et al. 1993; Shirasawa et al. 2006). Transformation of the Badh gene from bacterial and plant sources into betaine-deficient plant species has resulted in accumulation of GB in their system and consequent acquisition of tolerance to salt and drought stress (Liang et al. 1997; Mohanty et al. 2002). BADH synthesis is up-regulated several-fold in response to salt and drought stress in spinach, barley and sorghum leaves (Weretilnyk and Hanson 1990; Ishitani et al. 1995; Wood et al. 1996).

Rice (Oryza sativa L.) is thought to be a non-accumulator of GB but it does express BADH at low levels (Fitzgerald et al. 2008). This evokes an interest in the phylogenic evolution of this enzyme in rice and a search for variation in the BADH gene sequence in the rice germplasm and its association with important traits. Rice has two functional genes coding for the BADH enzyme: the BADH1 gene located on chromosome 4 and the BADH2 gene on chromosome 8. Both genes have 15 exons that show high sequence homology to their orthologs in other species. Rice BADH1 is induced by salt and water stresses whereas the BADH2 gene is expressed constitutively at low levels. Expression of both genes also appears to be regulated by post-transcriptional processing directed by paired short direct repeats in response to stress (Niu et al. 2007). Addition of GB or choline to the culture media increased accumulation of GB in rice seedlings, which was strongly associated with water use efficiency and maximum quantum yield of PSII (Yang et al. 2005).

BADH has also been implicated in the development of aroma in basmati and jasmine rice. The concentration of the most important rice aroma compound 2-acetyl-1-pyrroline (2-AP) is controlled by a recessive gene for fragrance (fgr) mapped on rice chromosome 8. The accumulation of 2-AP in aromatic rice is explained by loss-of-function mutations in the BADH2 gene (Bradbury et al. 2008; Chen et al. 2008). At least ten non-functional alleles of the BADH2 gene have now been identified (Shi et al. 2008; Sakthivel et al. 2009; Kovach et al. 2009). It is anticipated that the BADH1 gene could also be involved in the development of rice aroma in a way similar to the BADH2 gene because, except for the stability and regulation of the enzyme, the biochemical function and substrate specificity of the BADH enzymes coded by the two loci are quite similar (Bradbury et al. 2008). A minor quantitative trait locus (QTL) for aroma has been mapped on rice chromosome 4 that is co-localized with the BADH1 gene, although no studies are available on the allelic variation of the BADH1 gene in the rice germplasm and its possible relationship with aroma (Lorieux et al. 1996; Amarawathi et al. 2008).

Analysis of single nucleotide polymorphism (SNP) and small insertion/deletion (InDel), which are the basis of most differences between alleles, has been simplified by recent developments in sequencing technology. Due to the ability of SNPs to generate numerous markers within a target region and the availability of high-throughput SNP assays, SNP genotyping is becoming a valuable tool for gene mapping, map-based cloning, and marker-assisted selection in crops (Rafalski 2002; Belo et al. 2008). In comparison to individual SNPs, haplotype analysis of a group of linked SNPs is more informative in determining association with the phenotypes.

The objectives of the present study were to: (1) identify SNP variation in the BADH1 gene from 16 diverse rice varieties with different levels of salt tolerance and aroma and (2) develop high-throughput SNP genotyping assays and apply them to a large set of rice varieties to study possible association of the BADH1 haplotypes with salt tolerance and aroma.

Materials and methods

Plant material

Sixteen rice varieties used for the study of polymorphism in the BADH1 gene were collected from the Indian Agricultural Research Institute and the National Bureau of Plant Genetic Resources (NBPGR), New Delhi, the Central Soil Salinity Research Institute, Karnal and GB Pant University of Agriculture and Technology, Pantnagar (Table 1). The varieties included four salt-tolerant lines (Pokkali, CSR 10, CSR 27, CSR 36), two short-grain aromatic salt-tolerant varieties (Kalanamak 3119, Kalanamak 3131), two new plant type lines (Pusa 1266, Pusa 1342), a japonica variety Taipei 309, an indica variety with pigmented endosperm (Red Triveni), four modern high-yielding varieties (Jaya, Ratna, Jyoti, Pusa 44), a drought-tolerant line MI48, and a basmati variety Pusa 1121 (Table 1). A larger set of 127 diverse rice varieties and landraces was used for the validation and genotyping of SNPs for association analysis (Table S1). One accession each of O. nivara (No. 283160) and O. rufipogon (No. 381932) was obtained from the NBPGR, New Delhi.

PCR amplification of the BADH1 gene fragments

Genomic DNA was extracted from young seedlings using the standard CTAB method (Murray and Thompson 1980). The reference sequence of the BADH1 gene from Oryza sativa japonica cv. Nipponbare was downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov). It was excised from the sequence of chromosome 4 BAC clone OSJNBa00614 (IRGSP 2005). Since the BADH1 coding sequence is on the negative strand of this BAC, the excised fragment between nucleotide positions 41764 and 36437 was reverse-complemented for correct orientation of the gene sequence. The excised sequence represented a region between 700 bp upstream of the ATG translation start codon and 200 bp downstream of the TAG translation stop codon. Forward and reverse primers were designed from different overlapping segments of the BADH1 gene using Primer 3 software and re-checked by BLAST search to ensure that they matched uniquely with the expected positions in the rice genome (Table 2). Gradient PCR was done for all the primer pairs to find the optimum annealing temperature for amplification of single DNA fragments.
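The excision and reverse-complement step described above can be sketched in Python. This is only an illustration, assuming 1-based inclusive coordinates on the BAC plus strand; the function names and the toy sequence are hypothetical and are not part of the study's own pipeline.

```python
# Complement table for unambiguous DNA bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def excise_minus_strand(bac_seq: str, high: int, low: int) -> str:
    """Excise positions low..high (1-based, inclusive) from the plus strand
    and return the fragment in minus-strand (gene) orientation."""
    if not 1 <= low <= high <= len(bac_seq):
        raise ValueError("coordinates outside the sequence")
    return reverse_complement(bac_seq[low - 1:high])


# Toy example with an invented 18-bp sequence (not the real BAC clone).
toy_bac = "AAACTACCCGGGCATTTT"
gene = excise_minus_strand(toy_bac, high=15, low=4)
print(gene)  # the fragment now reads from ATG to TAG

# Length of the fragment excised in the text (positions 41764 down to 36437).
fragment_length = 41764 - 36437 + 1
print(fragment_length)
```

With the coordinates given in the text, the same inclusive arithmetic gives a fragment of 5328 bp.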

Table 1 List of 16 rice varieties of diverse origin, aroma and salt tolerance used for discovery and validation of SNPs in the BADH1 gene on rice chromosome 4

S. no.  Variety          Origin                    Salt tolerance  Aroma
1.      CSR 10           M40-431-24-114/Jaya       Tolerant        Absent
2.      CSR 27           Nona Bokra/IR5657-33-2    Tolerant        Absent
3.      CSR 36           indica                    Tolerant        Absent
4.      Jaya             T(N)1/T 141               Sensitive       Absent
5.      Jyoti            PTB10/IR8                 Sensitive       Absent
6.      Kalanamak 3119   Aromatic landrace         Tolerant        Strong
7.      Kalanamak 3131   Aromatic landrace         Tolerant        Strong
8.      MI 48            Pelita1-1//H4/H501        Sensitive       Absent
9.      Pokkali          indica, landrace          Tolerant        Absent
10.     Pusa 1121        P 614-1-2/P 614-2-4-3     Sensitive       Strong
11.     Pusa 1266        indica, NPT line          Sensitive       Absent
12.     Pusa 1342        indica, NPT line          Sensitive       Absent
13.     Pusa 44          IARI 5901-2/IR8           Sensitive       Absent
14.     Ratna            TKM 6/IR8                 Sensitive       Absent
15.     Red Triveni      indica                    Sensitive       Absent
16.     Taipei 309       japonica                  Sensitive       Absent

Sequencing of the PCR products and SNP discovery

PCR products were sequenced on MegaBACE 4000 automated capillary sequencers (GE Healthcare). The sequence trace files from each variety were assembled into contigs using the combined Phred/Phrap/Consed software (Ewing and Green 1988). The sequence reads generated from PCR products of the BADH1 gene from 16 rice varieties were pooled into a single assembly to find SNP differences among the varieties. Polymorphism tags were generated automatically by Polyphred software integrated with Consed. High-quality SNPs were then identified manually and screen shots of the SNP trace files for the two alleles were taken. The SNPs were analyzed from the transcribed region of the gene only because primers designed for the promoter region did not work well. The positions of exons and introns were identified using the gene prediction software FGENESH (www.softberry.com) and verified manually by checking against the full-length cDNA sequence of the BADH1 gene. The sequence reads of each variety were also assembled separately to obtain a high-quality consensus sequence of the BADH1 gene of the individual varieties for comparison with the Nipponbare reference sequence using pair-wise nucleotide BLAST.
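The idea behind pooling sequences from all varieties and looking for single-base differences can be illustrated with a toy example. The sketch below is not Polyphred and ignores trace quality; it assumes the per-variety consensus sequences have already been aligned to equal length, and the labels and sequences are invented for illustration.

```python
from typing import Dict, List, Tuple


def find_snps(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, List[str]]]]:
    """Return (1-based position, {allele: [sequence names]}) for every
    alignment column that carries more than one unambiguous base."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for i in range(lengths.pop()):
        column: Dict[str, List[str]] = {}
        for name, seq in aligned.items():
            base = seq[i].upper()
            if base in "ACGT":  # skip gaps and ambiguous calls
                column.setdefault(base, []).append(name)
        if len(column) > 1:
            snps.append((i + 1, column))
    return snps


# Hypothetical aligned consensus sequences.
aligned = {
    "ref": "ATGGCTAAT",
    "var1": "ATGGCTAAA",
    "var2": "ATGACTAAT",
}
snps = find_snps(aligned)
for position, alleles in snps:
    print(position, alleles)
```

In practice the manual check of trace files described above remains necessary, because a column difference in a consensus can also come from a base-calling error.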

BADH1 SNP assay design and genotyping

The Sequenom MassARRAY® system uses a matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometer for accurate detection of SNPs in a high-throughput manner (www.sequenom.com). MassARRAY Assay Design 3.1 software was used for the design of multiplex iPLEX assays for 20 SNPs of the BADH1 gene in two wells. One of these wells also included an SNP from exon 7 of the BADH2 gene for genotyping a loss-of-function allele (Bradbury et al. 2005; Amarawathi et al. 2008). The 30-mer pre-amplification primers and variable-length genotyping primers generated by the Assay Design 3.1 software were procured and used for the validation of SNPs according to the Sequenom user manual. The MassARRAY Typer 3.4 software was used for the visualization of SNPs and allele calling.

Table 2 PCR primers used for amplification of different segments of the BADH1 gene from 16 rice varieties for SNP discovery

Primer ID   Forward primer             Reverse primer            Start position (a)  End position (a)  Product size (bp)
Bad1-Pa     TGTTAAAATGACCAGATTACCCCTA  AGCCCGTGATACCTTTTTGA      -684                -158              527
Bad1-1      GCATTTGGTTTGCTCCATC        TTATGTGCACCCCCTTCCTA      159                 666               508
Bad1-2      AGGACGCTCTTAGTGCCGTA       AATCTGCAGCCAGTTCACAA      515                 1114              600
Bad1-3      GTGCCCACCCTGTCATTAGT       TCAAACATGAACCAACAAAAGC    1035                1635              601
Bad1-4      ACGTCCAATTTCCCTCGTCT       ACAGGGAAGCAAGCTCAGAT      1533                2142              610
Bad1-5      GCTGATGGCTACTTGGAAGG       AGGTTTCTTGCTCCGACAGA      2060                2616              557
Bad1-6      TGATCATGCCCTGAAGAGAA       CAGCCTGCAACCTTCTTCTA      2549                3152              605
Bad1-7      AATTGCAAAGCGATTCTTGG       AAGAACCCCCTTTTGAGGTG      3069                3671              603
Bad1-8      TGCCCGACCACAAGTATGTA       TTGGCCACAGTTTGTGACAT      3555                4278              724
Bad1-9      GGGAGCTAGGACAGTGGTGA       GTGACTTGCTTCACGCTCAA      4169                4373              205
Bad1-IN-5   GGTACTCCGTCCCTTTGCTT       TGAGAAACCCATTGTTCAAAGA    15                  869               755
Bad1-3.1    GCTGATGGCTACTTGGAAGG       AGCACTGCAGACTTGACCAG      2060                2972              913
Bad1-3.2    CCCACGTCAACTATGCTCCT       AGAATTGTGGCACCTTCACA      2775                3545              770
Bad1-3a     CCCAAGGCTGAAATTTTTGT       TGAAATTTCCAATTGGTCTTCTG   967                 1661              694
Bad1-3b     TAAATGGAAAGCCCCAAGG        TTGGATGATCACGTACAAAAGG    955                 1670              815
Bad1-END    GTCTAGCTGGCGCTGTGATT       CCGTATGGTTCATCTGAGCA      3885                4400              516

(a) With reference to the ATG translation start codon

BADH1 sequence divergence and association with rice aroma

The phylogenetic tree of the BADH1 gene sequence obtained by resequencing of 16 rice varieties and the Nipponbare reference gene sequence (NCBI gi:2244603) was constructed using MEGA version 4 software with O. rufipogon (NCBI gi:165874486) as an outgroup for rooting the tree (Tamura et al. 2007). The reliability of the neighbor-joining phylogeny was estimated using bootstrap analysis with 1000 permutations and one input order per replicate. In addition, the BADH1 sequence variation among 127 rice varieties was analyzed using the Sequenom MassARRAY assays, based on the scores of 15 validated SNPs identified by resequencing of the BADH1 gene from 16 varieties and Nipponbare. The frequencies of the two common BADH1 protein haplotypes (corresponding to four BADH1 SNP haplotypes) were analyzed in all 127 rice varieties and also separately in the aromatic and salt-tolerant subgroups of varieties, and the significance of deviation in allele frequencies from the population means was tested using the chi-squared test.
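The comparison of subgroup haplotype frequencies with the population means can be sketched as a Pearson chi-squared goodness-of-fit calculation. The exact test set-up used in the study is not described, so the function below is an assumption, and the counts are hypothetical values chosen only to show the arithmetic.

```python
def chi_squared_gof(observed, expected_freqs):
    """Pearson chi-squared statistic for observed counts against
    expected frequencies (the frequencies must sum to 1)."""
    if len(observed) != len(expected_freqs):
        raise ValueError("observed and expected must have the same length")
    if abs(sum(expected_freqs) - 1.0) > 1e-9:
        raise ValueError("expected frequencies must sum to 1")
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_freqs))


# Critical value of the chi-squared distribution, 1 d.f., alpha = 0.05.
CRITICAL_0_05_DF1 = 3.841

# Hypothetical numbers, for illustration only (not data from the study).
population_freqs = [0.6, 0.4]   # PH1, PH2 across the whole panel
aromatic_counts = [5, 15]       # PH1, PH2 among aromatic varieties

stat = chi_squared_gof(aromatic_counts, population_freqs)
print(round(stat, 3), stat > CRITICAL_0_05_DF1)
```

With two haplotype classes there is one degree of freedom, so a statistic above 3.841 indicates a deviation from the population frequencies at the 5% level.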

Population structure and cluster analysis using genome-wide SNPs

Sequenom MassARRAY multiplex assays were designed for 72 SNPs (two wells of iPLEX Gold chemistry), representing conserved single-copy rice genes (Singh et al. 2007), taking six genes from each of the twelve rice chromosomes. Two 36-plex assays were designed and validated by Sequenom Corporation (San Diego), but only 54 SNPs giving more than 95% success rates were used for the population structure analysis (Table S2). STRUCTURE 2.3.1 software was used to infer population structure of the 127 rice varieties using a burn-in of 100,000, run length of 100,000 and a Bayesian model allowing for admixture and correlated allele frequencies (Pritchard et al. 2000). The software was applied for ten independent runs with an assumption of ‘independent allele frequencies’ using a value of K ranging from 1 to 10. The final K value (number of subpopulations) was selected such that the value of α was less than 0.2 and did not change with subsequent runs using higher K values. Graphical outputs from STRUCTURE were produced to visualize the optimum number of clusters. The 54 genome-wide SNP scores of 127 rice accessions and one accession each of two wild rice species (O. rufipogon and O. nivara) were also analyzed using Free Tree software to construct a phylogenetic tree (Pavlicek et al. 1999). The Dice coefficient was used for generating the distance matrix and clusters were developed using the neighbour-joining method. Bootstrap analysis was done to test the reliability of clustering. The phylogenetic tree of the varieties was constructed using the tree option with O. nivara at the root of the tree.
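The Dice coefficient used for the distance matrix can be written out explicitly. The sketch below is an assumption about how SNP scores might be encoded (each called allele at each SNP treated as a "present" character); it does not reproduce Free Tree itself, and the genotype strings are invented.

```python
def encode(genotype: str) -> set:
    """Encode per-SNP allele calls as a set of (SNP index, allele) pairs.
    Missing calls, written here as 'N', are skipped."""
    return {(i, allele) for i, allele in enumerate(genotype) if allele != "N"}


def dice_similarity(a: set, b: set) -> float:
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def dice_distance(a: set, b: set) -> float:
    """Distance derived from the Dice coefficient."""
    return 1.0 - dice_similarity(a, b)


# Hypothetical four-SNP genotypes for three varieties.
v1, v2, v3 = encode("ACGT"), encode("ACGA"), encode("TCGA")
d12 = dice_distance(v1, v2)
d13 = dice_distance(v1, v3)
print(d12, d13)
```

A matrix of such pairwise distances is the input a neighbour-joining routine would then cluster.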

Protein modeling and ligand docking

Two haplotypes of the BADH1 protein, PH1 and PH2, were modeled using Accelrys Discovery Studio software. Protein sequences of PH1 and PH2 were searched against the PDB database using the BLASTP program of NCBI. The crystal structure of BADH from Staphylococcus aureus (PDB ID: 3ED6) showed the highest similarity (44%) with the PH1 and PH2 protein haplotypes, hence it was selected as the template to model the BADH1 proteins. The protein modeling of PH1 and PH2 was done by loop refinement and energy minimization using the CHARMm 27 force field. A three-dimensional structure model of the gamma-aminobutyraldehyde (GABald) substrate for the BADH enzyme was created using CHEMSKETCH software. Docking of GABald was carried out with the modeled PH1 and PH2 using the Ligandfit module of the Discovery Studio software. BADH1-PH1 and BADH1-PH2 proteins were treated as receptor molecules and GABald as the ligand. The program was run on a machine with a CPU speed of 2.8 GHz, Intel Xeon processors and 12 GB RAM, with a modeling time of 1 h 30 min, energy minimization of 30 min and docking time of 10 min for each binding site.

Results and discussion

Discovery of SNPs in the BADH1 gene of rice

Considering the implication of the BADH enzyme in rice aroma development and its potential role in drought and salt stress tolerance, we attempted to find SNP variation in the BADH1 gene by resequencing this gene from 16 diverse rice genotypes differing in aroma and salt tolerance (Table 1). Initially, a single pair of PCR primers flanking the BADH1 gene was designed for the amplification of the full-length BADH1 gene, but this effort failed despite using different conditions and high-fidelity long-read DNA polymerases. Therefore, multiple pairs of primers were designed to amplify smaller overlapping segments of the BADH1 gene (Table 2), so that the full sequence could be generated by in silico assembly of the sequence reads. These primers gave good amplification after optimization of the annealing temperature using gradient PCR in the range 45–65 °C. On the basis of gradient PCR results, the optimum annealing temperature (Ta) for all the primer pairs was taken as 65 °C, which gave a clean single PCR product with minimum background. The primers designed for amplification of the promoter region of the BADH1 gene (BAD1-Pa, Table 2) did not amplify well even in the gradient PCR experiment except for two varieties, namely Taipei 309, which is a japonica type, and Pusa 1266, which has a japonica background. This may be due to high GC content in this region and, furthermore, there could be significant sequence difference between the indica and japonica rice varieties in this region, as the PCR primers were designed based on the sequence of the reference japonica variety Nipponbare. Among the coding region primers, BAD1-3 showed amplification in only seven varieties, and therefore a new set of primers, BAD1-3a, BAD1-3b, BAD1-3.1 and BAD1-3.2, was designed (Table 2), which solved the problem and led to complete assembly of the sequence for this region of the gene. Similarly, another primer pair, BAD1-END, was redesigned to amplify the stop codon region of the BADH1 gene and amplified well in all the varieties. All the primer pairs showed amplification of products of the expected size indicated in Table 2.
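Whether the designed amplicons really tile the gene with overlaps can be checked from the start and end positions listed in Table 2. The sketch below uses the coordinates of primer pairs Bad1-1 to Bad1-9 as given there; the function name is hypothetical, and positions are treated as inclusive.

```python
# (primer ID, start, end) relative to the ATG start codon, from Table 2.
AMPLICONS = [
    ("Bad1-1", 159, 666),
    ("Bad1-2", 515, 1114),
    ("Bad1-3", 1035, 1635),
    ("Bad1-4", 1533, 2142),
    ("Bad1-5", 2060, 2616),
    ("Bad1-6", 2549, 3152),
    ("Bad1-7", 3069, 3671),
    ("Bad1-8", 3555, 4278),
    ("Bad1-9", 4169, 4373),
]


def consecutive_overlaps(amplicons):
    """Return (left ID, right ID, overlap in bp) for each consecutive pair
    after sorting by start position. A value of zero or less means a gap."""
    ordered = sorted(amplicons, key=lambda a: a[1])
    return [
        (left[0], right[0], left[2] - right[1] + 1)
        for left, right in zip(ordered, ordered[1:])
    ]


overlaps = consecutive_overlaps(AMPLICONS)
for left_id, right_id, bp in overlaps:
    print(f"{left_id} / {right_id}: {bp} bp overlap")
```

Every consecutive pair in this series overlaps, which is what allows the reads to be joined by in silico assembly when all fragments amplify.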

PCR products were sequenced from both ends using the same forward and reverse primers that were employed for their amplification, generating high Phred quality sequence data, and several hundred high-quality sequence reads were obtained from the 32 sequencing primers. The promoter region primer BAD1-Pa gave amplification only with Taipei 309 and Pusa 1266, and therefore was not considered for the SNP analysis in this study. For sequence assembly each sequence read was named in full, giving project name, variety name, primer name and primer direction (forward or reverse), which helped in the interpretation of data in the sequence assemblies. The first kind of sequence assembly involved assembly of all the reads coming from a single variety without the sequence of the reference variety Nipponbare. There were sixteen separate contig assembly projects, one for each test variety. Twelve of the 16 varieties produced single assembled contigs for the coding region while the remaining four varieties (CSR36, Jyothi, Kalanamak 3131 and Ratna) were assembled into two to three contigs. Some of these contigs were just touching each other when compared with the reference sequence of Nipponbare, but did not merge into a single contig in the Phrap assembly. The consensus sequences of the contigs for each variety, already submitted to the NCBI, were used for construction of the phylogenetic tree of the BADH1 gene (Fig. S1).
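For readers unfamiliar with Phred quality values, the standard relationship between a quality score and the estimated base-calling error probability is Q = -10 log10(P). The short sketch below shows the conversion; the study does not state which quality threshold was applied, so the example values are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


# Illustrative scores: Q20 is 1 error in 100 calls, Q30 is 1 in 1000.
for q in (20, 30, 40):
    print(q, phred_to_error_prob(q))
```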

The second kind of assembly pooled sequence reads from all 16 test varieties and the reference variety Nipponbare into a single project, and the entire assembly was integrated with Polyphred, which automatically tagged the SNPs with their quality scores. The consensus sequence was based on the reference sequence and SNPs were identified and tagged in the CONSED alignment window. Twenty SNPs were identified in this manner and the high quality of the SNPs was evident from the fact that all the flanking sequences were clearly identical except for the single nucleotide differences. The nucleotide positions of all the 20 SNPs in the BADH1 gene of the reference variety Nipponbare are shown in Fig. 1.

The BADH1 gene has 15 exons and 14 introns, and out of the 20 SNPs identified here 17 were present in the introns and only three were in the exons, suggesting high sequence conservation in the coding region of this gene (Fig. 1). The abundance of SNPs in the introns (one per 171 bp) was almost three times higher than in the exons (one per 505 bp). In fact, 50% of the 20 SNPs were found in just two introns (numbers 2 and 4). The three exonic SNPs were (1) S6_1371 in exon 4 with a T/A polymorphism resulting in asparagine to lysine substitution at amino acid position 144; (2) S18_3493 in exon 11 with a C/A polymorphism resulting in glutamine to lysine substitution at amino acid position 345; and (3) S19_3500 also in exon 11 with a T/C polymorphism resulting in isoleucine to threonine substitution at amino acid position 347. The S6_1371 and S18_3493 SNPs are significant because they result in incorporation of functionally active lysine residues. Furthermore, they are base transversions, which are less frequent than base transition mutations. The 3D crystal structure of the BADH enzyme has been solved in rice and its active sites and functional domains have been identified; these amino acid mutation positions are near the enzyme active site and substrate recognition site (Chen et al. 2008). Certain SNPs were specific to only one of the 16 varieties, e.g. Red Triveni has the unique allele A for the SNP S2_474 at nucleotide position 474. Similarly, SNPs S14_2772 and S19_3500 were unique to the short-grain aromatic salt-tolerant varieties Kalanamak 3119 and Kalanamak 3131. The SNPs S19_3500 and S18_3493 are both near the substrate recognition site of the BADH enzyme. This suggests evolution of these mutations under positive selection pressure.
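The two quantitative points in this paragraph, the intron-to-exon difference in SNP density and the transition or transversion status of the exonic SNPs, can be checked with a few lines of Python. The dictionary keys below are hypothetical labels built from the exon and amino acid position given in the text; the allele pairs are those reported above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(a: str, b: str) -> str:
    """Classify a biallelic SNP as a transition (purine to purine or
    pyrimidine to pyrimidine) or a transversion (purine to pyrimidine)."""
    a, b = a.upper(), b.upper()
    if a == b or not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_family = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


# Allele pairs of the three exonic SNPs as reported in the text.
EXONIC_SNPS = {
    "exon4_aa144": ("T", "A"),
    "exon11_aa345": ("C", "A"),
    "exon11_aa347": ("T", "C"),
}
classes = {name: substitution_class(*alleles)
           for name, alleles in EXONIC_SNPS.items()}
print(classes)

# Sequence lengths implied by the reported densities, and their ratio.
intron_bp = 17 * 171   # 17 intronic SNPs at one per 171 bp
exon_bp = 3 * 505      # 3 exonic SNPs at one per 505 bp
density_ratio = 505 / 171
print(intron_bp, exon_bp, round(density_ratio, 2))
```

The classification agrees with the statement that the two lysine-introducing SNPs are transversions, while the isoleucine to threonine SNP is a transition.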

Fig. 1 Sequence of the BADH1 gene from Oryza sativa japonica cv. ‘Nipponbare’ showing positions of 20 SNPs (highlighted yellow) in exons (capital letters) and introns (lower case). PCR primers used for amplification of overlapping gene fragments are shown in red font; reverse primers are underlined

Phylogenetic analysis of the BADH1 gene of rice

Earlier studies in plants have shown the divergence of BADH1 and BADH2 genes from a common ancestor well before the divergence of cereals, with each gene having similar enzyme activities but different regulatory control (Bradbury et al. 2005). However, no study is available on the BADH1 sequence variation in different rice varieties to provide an insight into the origin of functional mutations in the BADH1 gene of rice. We constructed a phylogenetic tree on the basis of the high-quality sequence contigs assembled for the coding region of the BADH1 gene of the 16 rice varieties along with the reference sequence of japonica cultivar Nipponbare for comparison and a partial genomic sequence of the progenitor species O. rufipogon for rooting of the phylogenetic tree (Fig. S1). The tree rooted with the progenitor species O. rufipogon clearly grouped the varieties according to their sequence differences with very high bootstrap values. Two indica type varieties, Pusa 44 and Jyothi, were more closely related to the origin than other varieties. The next most closely related variety was Pokkali, a salt-tolerant landrace from Kerala, India. The tree clearly shows the evolution of the Indian aromatic and japonica group of cultivars from the progenitor O. rufipogon in a step-wise manner (Fig. S1). It was interesting to note that the phylogram based on a single gene was able to group these 16 varieties into clusters, which corresponded closely to their geo-evolutionary origin. For example, the japonica type varieties Nipponbare and Taipei 309 and the japonica × indica crossbred varieties Pusa1342 and Pusa1266 were grouped together, along with Red Triveni. Salt-tolerant varieties CSR36 and CSR10 formed a separate group while short-grain aromatic rice varieties Kalanamak 3119 and Kalanamak 3131 were grouped in a separate cluster. Basmati quality variety Pusa 1121 formed a separate group with CSR27, which has Nona Bokra in its pedigree (Table S1), and three high-yielding semi-dwarf varieties, Jaya, Ratna and MI48, were more closely related to each other. These groupings are consistent with the known pedigree and origin of these varieties (Table S1). Interestingly, the six salt-tolerant rice varieties did not form a single group, thus highlighting the possible role of different mechanisms of salt tolerance in these varieties and possibly indicating the lack of association between salt tolerance and the BADH1 polymorphism in rice.

Multiplex SNP genotyping and association of BADH1 haplotypes with rice aroma

Sequenom MassARRAY multiplex assays were designed for high-throughput analysis of the 20 SNPs identified during this study, and an additional SNP linked with the 8-bp deletion in exon 7 of the BADH2 gene (Table 3). However, after optimization and validation of the assays it was found that only 15 BADH1 SNPs and one BADH2 SNP could be scored reliably with a high degree of reproducibility. The remaining five BADH1 SNPs (S7, S10, S12, S13 and S20) did not work consistently, most likely due to the overlapping target sites for the primers in the same PCR reaction. However, it may not be necessary to assay all the SNPs because they tend to be inherited as haplotype blocks, and therefore the 15 reproducible SNPs in the BADH1 gene can provide full information on its common alleles. While the individual SNPs were scored successfully in more than 95% of the samples, all 15 SNPs were scored accurately in 92 of the 127 varieties analyzed. Based on the data of 15 SNPs, the 92 varieties could be grouped into 15 different haplotypes (Fig. 2, Table S3). Ten of these SNP haplotypes were rare, represented by one variety each, viz. Taipei 309, Jyothi, Pusa44, SKR126, CSR10, IR64, Pusa1266, Kasturi, Pusa1121 and Pant Dhan 4. Another unique SNP haplotype was present in the two different accessions of Kalanamak (3119 and 3131). Four haplotypes were common, representing 86.9% of the varieties with the full 15-SNP complement. Analysis based on the three exonic SNPs showed only five haplotypes, two of which (PH1 and PH2) were common and the remaining three were rare, including haplotype PH3 in the aromatic salt-tolerant landrace accessions Kalanamak 3119 and 3131. Varieties CSR10 and Pant Dhan 4 also have unique protein haplotypes, PH4 and PH5, respectively (Fig. 2, Table S4).
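The grouping logic described above can be illustrated with a minimal Python sketch. It uses seven genotype rows transcribed from Fig. 2; the helper names (`exonic_key`, `differing_snps`) and the dictionary layout are illustrative assumptions, not part of the original analysis pipeline. SNP haplotypes are taken as the full 15-allele string, and protein haplotypes as the alleles at the three exonic SNPs (S6, S18 and S19).

```python
# Column order of the 15 reliably scored SNPs, as listed in Fig. 2.
SNPS = ["S1", "S2", "S3", "S4", "S5", "S6", "S8", "S9",
        "S11", "S14", "S15", "S16", "S17", "S18", "S19"]
EXONIC = ["S6", "S18", "S19"]  # the three exonic SNPs

# A subset of genotype rows transcribed from Fig. 2 (15 alleles per variety).
GENOTYPES = {
    "Jaya":            "GCGTTAACCTTTTAT",  # SH1
    "ADT43":           "ACGTTAAGCTTCTAT",  # SH2
    "Basmati 370":     "GCAACTGGTTCCCCT",  # SH3
    "Taraori Basmati": "GAAATTGGTTCCCCT",  # SH4
    "Kalanamak 3119":  "GCATTAAGCATCCAC",  # SH5
    "CSR 10":          "GCGTTTGGTTTCTAT",  # SH10
    "Pant Dhan 4":     "GCGTTAACCTTTTCT",  # SH15
}


def exonic_key(genotype):
    """Return the alleles at the three exonic SNPs as a short string."""
    return "".join(genotype[SNPS.index(snp)] for snp in EXONIC)


def differing_snps(a, b):
    """List the SNP names at which two genotype strings differ."""
    return [snp for snp, x, y in zip(SNPS, a, b) if x != y]


# Group varieties by their exonic alleles (hypothetical stand-in for
# the protein haplotype assignment PH1..PH5).
protein_groups = {}
for variety, genotype in GENOTYPES.items():
    protein_groups.setdefault(exonic_key(genotype), []).append(variety)

for key, members in protein_groups.items():
    print(key, members)

print(differing_snps(GENOTYPES["Jaya"], GENOTYPES["ADT43"]))
print(differing_snps(GENOTYPES["Basmati 370"], GENOTYPES["Taraori Basmati"]))
```

On this subset the seven SNP haplotypes collapse into five distinct exonic allele combinations, matching the five protein haplotypes reported in the text.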

Meaningful trait association analysis in the present study is possible only with the common alleles present in eighty varieties, i.e. SH1-SH4 (PH1-PH2). The association of rare SNPs with the traits concerned can be better analyzed using bi-parental mapping populations (Rafalski 2002, Belo et al. 2008). The number of varieties with the four common SNP haplotypes was 38 for SH1, 19 for SH2, 17 for


Table 3 Sequence of the pre-amplification PCR primers (PCRP) and single nucleotide extension primers (UEP) designed for multiplex genotyping of 20 SNPs in the BADH1 gene and one SNP in the exon 7 of the BADH2 gene, designed for the iPLEX assays of Sequenom MassARRAY system

SNP_ID     Well  2nd-PCRP                        1st-PCRP                        UEP
BADH1_S2   1     ACGTTGGATGTTTACGGCACTAAGAGCGTC  ACGTTGGATGTCTCCTATGCTGCTAACCTG  TTCAGTCACACAGCAAG
BADH1_S5   1     ACGTTGGATGTTGATTCTGGGAAGCCTCTG  ACGTTGGATGCGGAAAATCTTGTGTCATGC  GCTGGGGACATGGTATG
BADH1_S6   1     ACGTTGGATGTTAGATGGGAAACAACGGGC  ACGTTGGATGACCCCAATGGGTTCTTTGAG  TCTCTCTACCCATGGAAAA
BADH1_S17  1     ACGTTGGATGCACTAGAAGAAGGTTGCAGG  ACGTTGGATGTTGCAACTGAACTTCAGGAG  GGGCAGGTAATGTAAATAG
BADH1_S4   1     ACGTTGGATGAACAGCAAGGGCAAGTCAAC  ACGTTGGATGTGGCTAGTTTAGTCTAGCGG  TAGTGGAGAATCTCATCTGC
BADH1_S7   1     ACGTTGGATGGGACGTGTCAAAGTATTTCG  ACGTTGGATGGCCTTGAAAAATTGATACTG  ATTTCGATACACTACAACATC
BADH2_S1   1     ACGTTGGATGGTTAGGTTGCATTTACTGGG  ACGTTGGATGCCTTAACCATAGGAGAGCTG  CTGGGAGTTATGAAACTGGTAT
BADH1_S9   1     ACGTTGGATGCCAATTGGTCTTCTGTTATC  ACGTTGGATGGTTTTGTTCACACCGGAAGC  ATCAAACATGAACCAACAAAAG
BADH1_S19  1     ACGTTGGATGTGTGGCACCTTCACATCTTG  ACGTTGGATGACTCGGACGACTTAAGAACC  CTTGCTGTTGAGATGAACTTCATT
BADH1_S14  1     ACGTTGGATGCGCATAATTGAATTAGGAGC  ACGTTGGATGGGTATTAGGTATCTTCCACT  CTTAGGAGCATAGTTGACGTGGGAA
BADH1_S8   2     ACGTTGGATGCGAAATACTTTGACACGTCC  ACGTTGGATGGAACAATTGCTTCCGGTGTG  CACGTCCAATTTCCCTC
BADH1_S20  2     ACGTTGGATGCCAGCTAGACCATAGCTGAG  ACGTTGGATGTGCTTTATGTGGAACTGTGG  AGACAAGAACAGGCTTA
BADH1_S16  2     ACGTTGGATGAGTCTGCAGTGCTACTTCTC  ACGTTGGATGCGAGGACATGACTTAAGCTG  TCGTCTACTTTTGCATGTA
BADH1_S15  2     ACGTTGGATGTTCCCACGTCAACTATGCTC  ACGTTGGATGCTACAGAATTGCAGGTAATG  AATTATGCGATGTTGTCAAA
BADH1_S3   2     ACGTTGGATGTTTACGGCACTAAGAGCGTC  ACGTTGGATGTCTCCTATGCTGCTAACCTG  CCTTTAATCAATCCCCGAAAG
BADH1_S11  2     ACGTTGGATGAGCCTAAAGCGCATTTACAC  ACGTTGGATGCATGAGTACAGACATGCACC  ACACAAAGCAATGAAACACGGA
BADH1_S1   2     ACGTTGGATGAGAGCTCTGCTCAACAACTC  ACGTTGGATGAAGGTGCTCAGCTTCTATGC  CTGCTCAACAACTCATCTAGCTC
BADH1_S12  2     ACGTTGGATGGTAAAAGGCTAATCGTTTGG  ACGTTGGATGTTCCAAGTAGCCATCAGCAG  CTAATCGTTTGGATTATTTTCTTA
BADH1_S13  2     ACGTTGGATGGCTCTAAACATGTCCTGGTT  ACGTTGGATGTCCCTGTAAGGATCTAAGGA  agCATATTTTTAGTCATGGGGTACA
BADH1_S18  2     ACGTTGGATGACTCGGACGACTTAAGAACC  ACGTTGGATGTGTGGCACCTTCACATCTTG  tcAGAACCTTTTTCTCATTTCAGTAT
BADH1_S10  2     ACGTTGGATGTGGGAAGATCAACTTCAAAG  ACGTTGGATGCCTGTCAGAGGAAAGACTAT  ccACTTCAAAGTTATTGGATGATCAC


SH3 and 6 for SH4 (Fig. 2). The four common SNP haplotypes represented two BADH1 protein haplotypes resulting from two exonic SNPs, S6_1371 and S18_3493. S19_3500 was a rare SNP unique to Kalanamak 3119 and Kalanamak 3131, but S6_1371 and S18_3493 were polymorphic in the four common SNP haplotypes. The SNP haplotypes SH1 and SH2 have lysine residues at amino acid positions 144 and 345, whereas in the SNP haplotypes SH3 and SH4 the two lysine residues are substituted by asparagine and glutamine residues, respectively. The difference between SNP haplotypes SH1 and SH2 was due to three intronic SNPs (S1, S9 and S16), whereas the difference between haplotypes SH3 and SH4 was due to two intronic SNPs (S2 and S5). Thus, at the protein level there are only two common alleles: (i) protein haplotype 1 (PH1), represented by SNP haplotypes SH1 and SH2, and (ii) protein haplotype 2 (PH2), represented by SNP haplotypes SH3 and SH4. The protein haplotype PH1 was present in most high-yielding indica rice varieties, whereas the protein haplotype PH2 was common to the aromatic and japonica group of varieties. The BADH1 protein modeling and GABald ligand docking study revealed 20 ligand binding sites in the PH1 and 18 ligand binding sites in the PH2 protein haplotype. Monte Carlo simulation was used to dock the ligand with the receptor in 1000 iterations for each binding site. In the PH1 haplotype, 15 out of the 20 binding sites were predicted as probable active sites, as the GABald ligand bound tightly at those sites. In the PH2 haplotype only 8 out of 18 binding sites were predicted as probable active sites for the binding of GABald, suggesting a drastic reduction in the affinity of the BADH1-PH2 haplotype for GABald, which is a precursor of the aroma compound 2-AP (Fig. 3). Thus PH2 could be a loss-of-function allele of the BADH1 gene with implications for rice aroma similar to the loss-of-function alleles of the BADH2 gene (Kovach et al. 2009).
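As a brief illustration of how single exonic SNPs can produce the two substitutions described above, the following Python sketch enumerates, under the standard genetic code, every single-base change that converts a lysine codon into an asparagine or a glutamine codon. The function name `single_base_routes` is hypothetical, and the actual codons at positions 144 and 345 are not given in the text. The output is consistent with the A/T alleles at S6 and the A/C alleles at S18 shown in Fig. 2 only if those alleles are reported on the coding strand, which is an assumption here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def single_base_routes(src_aa, dst_aa):
    """Return (source codon, mutant codon, codon position) for every
    single-nucleotide change that turns src_aa into dst_aa."""
    routes = []
    for codon, aa in CODON_TABLE.items():
        if aa != src_aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == dst_aa:
                    routes.append((codon, mutant, pos + 1))
    return routes


# Lysine (K) to asparagine (N), as at residue 144
for route in single_base_routes("K", "N"):
    print("K->N", route)

# Lysine (K) to glutamine (Q), as at residue 345
for route in single_base_routes("K", "Q"):
    print("K->Q", route)
```

Every lysine-to-asparagine route changes the third codon position, and every lysine-to-glutamine route changes the first, so each substitution can be explained by one SNP.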

Variety          S-1  S-2  S-3  S-4  S-5  S-6  S-8  S-9  S-11 S-14 S-15 S-16 S-17 S-18 S-19 SNP haplotype  Protein haplotype  Frequency
Jaya             G    C    G    T    T    A    A    C    C    T    T    T    T    A    T    SH1            PH1                38
ADT43            A    C    G    T    T    A    A    G    C    T    T    C    T    A    T    SH2            PH1                19
Basmati 370      G    C    A    A    C    T    G    G    T    T    C    C    C    C    T    SH3            PH2                17
Taraori Basmati  G    A    A    A    T    T    G    G    T    T    C    C    C    C    T    SH4            PH2                6
Kalanamak 3119   G    C    A    T    T    A    A    G    C    A    T    C    C    A    C    SH5            PH3                2
Taipei 309       G    C    A    A    T    T    G    G    T    T    C    C    C    C    T    SH6            PH2                1
Jyothi           G    C    G    T    C    A    G    C    C    T    T    T    T    A    T    SH7            PH1                1
Pusa 44          G    C    G    T    T    A    A    G    C    T    T    T    T    A    T    SH8            PH1                1
SKR 126          G    C    G    T    T    A    A    G    C    T    T    C    T    A    T    SH9            PH1                1
CSR 10           G    C    G    T    T    T    G    G    T    T    T    C    T    A    T    SH10           PH4                1
IR 64            G    C    G    T    T    A    A    C    C    T    T    T    C    A    T    SH11           PH1                1
Pusa 1266        G    C    A    A    C    T    G    C    T    T    C    C    C    C    T    SH12           PH2                1
Kasturi          G    C    A    T    C    T    G    G    T    T    C    C    C    C    T    SH13           PH2                1
Pusa 1121        A    C    G    T    T    A    A    C    C    T    T    C    T    A    T    SH14           PH1                1
Pant Dhan 4      G    C    G    T    T    A    A    C    C    T    T    T    T    C    T    SH15           PH5                1

Fig. 2 Haplotypes of the BADH1 gene in 92 diverse rice varieties based on 15 SNPs with no missing data genotyped using Sequenom MassARRAY system. Protein haplotypes are based on three exonic SNPs (S6, S18 and S19)

Fig. 3 3D modeling of the

two common BADH1

protein haplotypes (PH1

and PH2) with ligand

docking (green color)

showing reduced number of

binding sites in the PH2 for

GABald, a precursor of the

rice aroma compound 2-AP


An important observation in the present investigation was the association of the BADH1-PH2 protein haplotype with the aromatic rice varieties. To complement such analysis it was important to analyze the 127 rice varieties and landraces for their population structure using the STRUCTURE 2 software (Pritchard et al. 2000). We used 54 validated genome-wide SNP markers with four to six loci per rice chromosome for genotyping using Sequenom MassARRAY in two multiplex assays (Table S2). Genome-wide synonymous SNPs were identified by resequencing of the intron-spanning regions of conserved single copy rice genes (our unpublished data). With the K value fixed at two and 1000 bootstrap permutations, the 127 rice varieties were classified into two population groups (Fig. 4). The list of varieties in each group with their BADH1 haplotype score, aroma, salt tolerance and BADH2-exon 7 SNP score is shown in Table S5. Most of the traditional aromatic rice varieties were present in population group 2 along with the japonica type varieties, broadly agreeing with the phylogenetic grouping based on the BADH1 gene sequence (Fig. S1). However, the population structure grouping presented here is based on 54 genome-wide SNPs and therefore reflects the pedigree and selection strategy used in the development of these varieties (Fig. 4a). Rooting of the phylogenetic tree with O. nivara showed that the indica cultivar group was closer to this wild progenitor, but O. rufipogon was closer to the japonica/aromatic population group (Fig. 4b). In each of the two population groups there were varieties having a significant proportion of genes from the other group due to their crossbreeding pedigrees; those with more than 10% genes from the other group are marked in Table S5.

The association between the BADH1 haplotypes and salt tolerance or aroma trait of the rice varieties was assessed manually using the chi-squared test of significance. The two common protein haplotypes of the BADH1 gene were present in both the population groups. Similarly, salt-tolerant and aromatic varieties were also present across the population groups, but there was a predominance of aromatic varieties in population group 2. Frequencies of varieties with different salt tolerance and aroma scores against the two common BADH1 protein haplotypes present in 80 rice varieties are shown in Table 4. The frequencies of the two protein haplotypes were analyzed in each of the four categories of aromatic, non-aromatic, salt-tolerant and salt-susceptible varieties against the observed overall distribution of 71.2% for PH1 and 28.8% for PH2 in the whole population. There was a significant association between aroma score and the BADH1 protein haplotype PH2, where the lysine144 and lysine345 residues of the common haplotype PH1 are substituted by asparagine144 and glutamine345, respectively (χ² = 6.985, P = 0.008 at df = 1). It is known from several independent genetic studies and validation through genetic transformation that the BADH2 gene is a major locus responsible for rice aroma, where loss-of-function mutations in the gene lead to accumulation of gamma-aminobutyraldehyde (GABald), a precursor of the aroma compound 2-AP in the rice grains (Bradbury et al. 2008). At least two earlier studies have shown co-location of a QTL for aroma and the BADH1 gene on rice chromosome 4 (Lorieux et al. 1996, Amarawathi et al. 2008). The association between BADH1 haplotype and aroma identified in this study may explain the functional basis of the aroma QTL on chromosome 4.
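The comparison of each category against the overall haplotype distribution can be sketched as a chi-squared goodness-of-fit test. The Python example below uses the counts from Table 4 and only the standard library; the function names are illustrative. Because the text states only that the test was carried out manually, the exact procedure is an assumption here, and the statistics produced by this sketch are close to, but not identical with, the values reported in Table 4.

```python
import math

# Counts of (PH1, PH2) varieties per category, taken from Table 4.
OVERALL = (57, 23)
CATEGORIES = {
    "aromatic":         (5, 8),
    "non-aromatic":     (52, 15),
    "salt-tolerant":    (5, 1),
    "salt-susceptible": (52, 22),
}


def chi2_goodness_of_fit(observed, reference):
    """Chi-squared statistic of observed counts against the proportions
    implied by the reference counts (assumed test design)."""
    n_obs = sum(observed)
    n_ref = sum(reference)
    expected = [n_obs * r / n_ref for r in reference]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of
    freedom: P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


for name, counts in CATEGORIES.items():
    stat = chi2_goodness_of_fit(counts, OVERALL)
    print(f"{name:17s} chi2 = {stat:.3f}  P = {p_value_df1(stat):.3f}")

# The P value reported in the text can be checked from its own statistic.
print(f"P for chi2 = 6.985: {p_value_df1(6.985):.3f}")
```

Under this assumed design only the aromatic category departs significantly from the overall PH1:PH2 proportions at df = 1, in line with the conclusion drawn in the text.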

Due to their similar biochemical function it is anticipated that loss-of-function mutations in the BADH1 gene could also control rice aroma similar to the BADH2 gene, particularly in salt and water stress conditions (Bradbury et al. 2008). In this study we provide evidence that the BADH1 protein haplotype PH2 (SNP haplotypes SH3 and SH4) is associated with the aromatic rice varieties. It is important to note here that, because the BADH2 gene is constitutively expressed, a loss-of-function mutation in the BADH2 gene is a primary requirement for aroma development (Bradbury et al. 2005). However, the loss of function of the BADH2 gene is not enough by itself; it may need to be complemented with the BADH1 protein haplotype PH2 (SNP haplotypes SH3 and SH4) for full aroma expression. For example, the popular crossbred basmati variety Pusa Basmati 1 has the badh2-exon 7 deletion mutation but has BADH1 haplotype PH1/SH1, which could be the reason for its mild aroma, whereas another popular crossbred basmati variety, Pusa 1121, has a rare allele of the BADH1 gene (haplotype PH1/SH14) that might lead to better aroma development than in Pusa Basmati 1. Thus, a combination of a loss-of-function mutation in the BADH2 gene and a reduction in the substrate binding capacity of the BADH1 enzyme for the aroma precursor compound GABald could be important for full aroma development in rice.


Re-sequencing of the target gene from different genotypes of a species is one of the most reliable techniques for SNP discovery; it was applied here to identify, for the first time, 20 new SNPs in the BADH1 gene. However, for routine application of SNPs for allele mining and marker-assisted breeding, high-throughput methods of SNP genotyping are required. The Sequenom MassARRAY system allows handling of a large number of samples using small to medium numbers of well characterized SNPs in

Fig. 4 a Population structure (K = 2) and b NJ phylogenetic tree (using Nei's similarity matrix) of 127 rice varieties based on 54 synonymous SNPs present in conserved single-copy rice genes evenly distributed on the 12 rice chromosomes. The phylogenetic tree was rooted using Oryza nivara as an outgroup


multiplex reactions. We designed two multiplex genotyping assays for the 20 newly discovered SNPs. Fifteen of these SNPs were validated successfully and used for the genotyping of a large set of 127 rice genotypes of diverse origin and agronomic trait variation. This led to the discovery of BADH1 haplotypes showing significant association with the aromatic rice varieties that may complement the role of known loss-of-function alleles of the BADH2 locus for rice aroma. However, this association needs further validation in a segregating population.

Sequence submissions

The BADH1 sequences from sixteen rice varieties have been submitted to the NCBI GenBank with accession numbers: CSR10 (EU566870), CSR27 (bankit1114275), CSR36 (bankit1114294), Jaya (EU566862), Jyothi (bankit1114323), Kalanamak 3119 (EU566863), Kalanamak 3131 (bankit1114319), MI48 (bankit1114313), Pokkali (EU566865), Pusa 44 (EU566866), Pusa 1121 (EU566867), Pusa 1266 (EU566864), Pusa 1342 (EU566868), Ratna (bankit1114305), Red Triveni (bankit1114308), Taipei 309 (EU566869).

Acknowledgments This work was supported by the NPTC

project of the Indian Council of Agricultural Research and is

part of the M.Sc. thesis of the senior author.

References

Amarawathi Y, Singh R, Singh AK, Singh VP, Mohapatra T, Sharma TR et al (2008) Mapping of quantitative trait loci for basmati quality traits in rice (Oryza sativa L.). Mol Breed 21:49–65

Belo A, Zheng P, Luck S, Shen B, Meyer DJ, Li B, Tingey S, Rafalski A (2008) Whole genome scan detects an allelic variant of fad2 associated with increased oleic acid levels in maize. Mol Genet Genomics 279:1–10

Bradbury LMT, Fitzgerald TL, Henry RJ, Jin QS, Waters DLE (2005) The gene for fragrance in rice. Plant Biotechnol J 3:363–370

Bradbury LMT, Gillies SA, Brushett DJ, Waters DLE, Henry RJ (2008) Inactivation of an aminoaldehyde dehydrogenase is responsible for fragrance in rice. Plant Mol Biol 68:439–449

Chen S, Yang Y, Shi W, Ji Q, He F, Zhang Z, Cheng Z, Liu X, Xu M (2008) BADH2, encoding betaine aldehyde dehydrogenase, inhibits the biosynthesis of 2-acetyl-1-pyrroline, a major component in rice fragrance. Plant Cell 20:1850–1861

Ewing B, Green P (1988) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 8:175–185

Fitzgerald TL, Waters DLE, Henry RJ (2008) The effect of salt on betaine aldehyde dehydrogenase transcript levels and 2-acetyl-1-pyrroline concentration in fragrant and non-fragrant rice (Oryza sativa). Plant Sci. doi:10.1016/j.plantsci.2008.06.005

IRGSP (2005) The map-based sequence of the rice genome. Nature 436:793–800

Ishitani M, Arakawa K, Mizuno K, Kishitani S, Takabe T (1993) Betaine aldehyde dehydrogenase in the Gramineae. Levels in leaves of both betaine accumulating and non-accumulating cereal plants. Plant Cell Physiol 34:493–495

Ishitani M, Nakamura T, Han SY, Takabe T (1995) Expression of the betaine aldehyde dehydrogenase gene in barley in response to osmotic stress and abscisic acid. Plant Mol Biol 27:307–315

Kovach MJ, Calingacion MN, Fitzgerald MA, McCouch SR (2009) The origin and evolution of fragrance in rice (Oryza sativa L.). Proc Natl Acad Sci USA 106:14444–14449

Liang Z, Ma D, Tang L, Hong Y, Luo A, Zhou J, Dai X (1997) Expression of the spinach betaine aldehyde dehydrogenase (BADH) gene in transgenic tobacco plants. Chin J Biotechnol 13:153–159

Lorieux M, Petrov M, Huang N, Guiderdoni E, Ghesquiere A (1996) Aroma in rice: genetic analysis of a quantitative trait. Theor Appl Genet 93:1145–1151

Mohanty A, Kathuria H, Ferjani A, Sakamoto A, Mohanty P, Murata N et al (2002) Transgenics of an elite indica rice variety Pusa Basmati 1 harbouring the codA gene are highly tolerant to salt stress. Theor Appl Genet 106:51–57

Table 4 Frequency distribution of two major protein haplotypes of the BADH1 gene (based on the exonic SNPs) in a diverse set of 80 rice varieties and landraces representing different pedigrees and eco-geographical origins

BADH1 protein haplotype  Aromatic  Non-aromatic  Salt-tolerant  Salt-susceptible  Total (%)
PH1                      5         52            5              52                57 (71.2)
PH2                      8         15            1              22                23 (28.8)
Total                    13        67            6              74                80 (100)
χ²                       6.985     1.225         0.448          0.050
P (df = 1)               0.008     0.268         0.503          0.823


Murray MG, Thompson WF (1980) Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res 8:4321–4325

Niu XL, Zheng WJ, Lu BR, Ren GJ, Huang WZ, Wang SH et al (2007) An unusual post-transcriptional processing in two betaine aldehyde dehydrogenase loci of cereal crops directed by short, direct repeats in response to stress conditions. Plant Physiol 143:1929–1942

Pavlicek A, Hrda S, Flegr J (1999) FreeTree: Freeware program for construction of phylogenetic trees on the basis of distance data and bootstrap/jackknife analysis of the tree robustness. Application in the RAPD analysis of the genus Frenkelia. Folia Biol (Praha) 45:97–99

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959

Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol 5:94–100

Rathinasabapathi B, Gage DA, Mackill DJ, Hanson AD (1993) Cultivated and wild rices do not accumulate glycine betaine due to deficiencies in two biosynthetic steps. Crop Sci 33:534–538

Sakthivel K, Sundaram RM, Shobha Rani N, Balachandran SM, Neeraja CN (2009) Genetic and molecular basis of fragrance in rice. Biotechnol Adv 27:468–473

Shi W, Yang Y, Chen S, Xu M (2008) Discovery of a new fragrance allele and the development of functional markers for the breeding of fragrant rice varieties. Mol Breed 22:185–192

Shirasawa K, Takabe T, Takabe T, Kishitani S (2006) Accumulation of glycinebetaine in rice plants that overexpress choline monooxygenase from spinach and evaluation of their tolerance to abiotic stress. Ann Bot 98:565–571

Singh NK, Dalal V, Batra K, Singh BK, Chitra G et al (2007) Single-copy genes define a conserved order between rice and wheat for understanding differences caused by duplication, deletion, and transposition of genes. Funct Integr Genomics 7:17–35

Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596–1599

Weretilnyk EA, Hanson AD (1990) Molecular cloning of a plant betaine-aldehyde dehydrogenase, an enzyme implicated in adaptation to salinity and drought. Proc Natl Acad Sci USA 87:2745–2749

Wood AJ, Saneoka H, Rhodes D, Joly RJ, Goldsbrough PB (1996) Betaine aldehyde dehydrogenase in sorghum. Plant Physiol 110:1301–1308

Yang X, Liang Z, Lu C (2005) Genetic engineering of the biosynthesis of glycinebetaine enhances photosynthesis against high temperature stress in transgenic tobacco plants. Plant Physiol 138:2299–2309
