Constituting Core collections of Germplasm using morphological descriptors


R. Balakrishnan

Sugarcane Breeding Institute

Coimbatore 641 007

Major issues involved in the management of a large gene bank (germplasm collection) are:

• Year-to-year maintenance of large germplasm collections requires an enormous amount of land, time, labour and other resources

• The use of the collection is limited by a lack of knowledge of how genetic diversity is distributed in the collection

• The users are not fully aware of the variation in the collection that could benefit their breeding programmes or enrich their research projects

• It is difficult to decide whether gaps exist or whether new material has to be added to the collection

How to overcome the problems in utilizing large germplasm collections?

• By short-listing the field-evaluated germplasm and earmarking a set of accessions holding promise for one or more traits, called a working collection (Harlan, 1972)

• By adopting the concept of a Core Collection (Frankel, 1984) to identify such limited sets for effective use of the collection

What’s a Core Collection?

A Core collection or a Core subset is a sub-sample of the base collection (about 5–20% of the size of the base collection).

It is sampled so as to represent the available genetic variability in the base collection to the maximum possible extent with minimum duplication (or redundancy).

Scientific basis for setting up a core collection (Brown, 1989)

• The first reason is based on statistical sampling considerations, which essentially assume that breeders, through crossing and selection, could recover desirable alleles from the core collection when required. Hence, in principle, they need access to only one copy of such alleles

• The second reason relates to the genetic structure of plant populations in general and germplasm collections in particular

• The third reason relates to easier management and better access and exploitation of the germplasm collections

Advantages of core collections

• For breeders, a core collection represents a logical first step in screening the collection for desirable alleles

• Setting up a core collection is important in understanding the quality of the base collection itself, as it helps to elucidate the contents, diversity and duplication in the base collection

• It helps in deciding the quantity of seed stocks to be conserved: smaller seed stocks for the reserve collection (or non-core set) and larger seed stocks for the core collection.

• The time and resources needed to evaluate a new trait in the collection are reduced, which allows more characters to be evaluated and more sophisticated techniques, such as molecular markers, to be used

The reports on the Global Survey of core collections by the International Plant Genetic Resources Institute (IPGRI) indicate that at least 63 core collections covering 51 crop species have been formed across the world

Core collection Scenario

Procedures for constituting a core collection

• Use compiled passport data and evaluation data on qualitative and quantitative traits, from germplasm catalogues

• Constitute appropriate groups wherever possible

• Use a suitable sampling procedure to select the entries for the core collection

• Verify and validate the selected core collection

• In some instances, the assembly of core collections has been based on a combination of morphological data, biochemical and molecular markers

Statistical / Sampling methods for constituting a core collection

• Simple random sampling (no evaluation data on morphological descriptors needed)

• Stratified random sampling, i.e. first we discover some structure in the base collection by forming groups through:

– Stratification on the basis of the geographical origin of accessions in the base collection (passport data needed)

– Stratification on the basis of multivariate cluster analysis (evaluation data on morphological descriptors required)

– Stratification on the basis of a combination of geographical origin and cluster analysis, or other schemes applicable to the crop species (both passport and evaluation data required)

• Purposive or directed sampling aimed at maximizing the diversity in the core, using either Principal Component Scores (Noirot et al., 1996) or an Information Measure (Balakrishnan, 2002).

How to decide the optimum number of entries in the core collection?

• By studying the relative efficiencies of different stratified sampling procedures through simulation, i.e. by estimating the sampling variance of a diversity measure for varying sample sizes.

• The sampling variance of the pooled Shannon Diversity Index (SDI) of the descriptors is normally a useful criterion.

• Since the sampling variance of the SDI cannot be estimated through formulas, we resort to simulation or bootstrap procedures to estimate it, and choose as the core collection size the one beyond which there is no appreciable reduction in the sampling variance.

• The best stratification method is the one that, even for a smaller sample size, gives a high value of diversity with the minimum sampling variance of the diversity measure.
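A minimal Python sketch of the simulation idea, under stated assumptions: each accession is a tuple of qualitative descriptor states, the pooled SDI is taken to be the sum of the per-descriptor Shannon indices (an assumption, since the pooling rule is not spelled out here), and samples are drawn by simple random sampling without replacement. The collection and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def shannon(states):
    """Shannon diversity index H = -sum(p * ln p) for one descriptor."""
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in Counter(states).values())


def pooled_sdi(rows):
    """Assumed pooling rule: sum of the per-descriptor SDI values."""
    return sum(shannon([row[d] for row in rows]) for d in range(len(rows[0])))


def sampling_variance(base, size, reps=200, seed=1):
    """Mean and variance of the pooled SDI over `reps` random samples of `size`."""
    rng = random.Random(seed)
    values = [pooled_sdi(rng.sample(base, size)) for _ in range(reps)]
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return mean, var


def hypothetical_base(n=690, seed=0):
    """Synthetic collection: 5 descriptors with 2-6 states and skewed frequencies."""
    rng = random.Random(seed)
    return [tuple(min(rng.randrange(k), rng.randrange(k)) for k in (2, 3, 4, 5, 6))
            for _ in range(n)]


if __name__ == "__main__":
    base = hypothetical_base()
    for size in (70, 100, 140, 170, 210):
        mean, var = sampling_variance(base, size)
        print(f"n={size:3d}  mean pooled SDI={mean:.3f}  variance={var:.5f}")
```

The core size would then be read off where the variance column stops falling appreciably; replacing `rng.sample` with a per-group draw gives the stratified variants.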

Constituting a core subset by the Stratified Random Sampling method

• Having decided the size of the core collection, say 10% of the base collection size, allocate accessions randomly from each group to the core subset by one of the following:

– Proportional to Frequency method (P strategy): when group diversity is proportional to group size.

– Proportional to Logarithm of Frequency method (L strategy): when group sizes differ widely.

– Constant Frequency method (C strategy): when diversity is concentrated in smaller groups.

– Proportional to Diversity method: when sampling depends on some measure of the diversity of each group.
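The allocation rules can be sketched in Python as below. This is an illustration, not the author's program: the rounding rule (largest remainder), the capping of groups that cannot supply their share, and the square-root option S (which appears in the results tables of this document rather than in the list above) are assumptions. The example group sizes are those of the geographical grouping of S. officinarum.

```python
import math
import random


def largest_remainder(quotas, total):
    """Round non-negative quotas that sum to `total` into integers with the same sum."""
    floors = [math.floor(q) for q in quotas]
    order = sorted(range(len(quotas)), key=lambda h: quotas[h] - floors[h], reverse=True)
    for h in order[: total - sum(floors)]:
        floors[h] += 1
    return floors


def allocate(sizes, n, strategy, diversity=None):
    """Number of accessions to draw from each group for a core subset of size n.

    P: proportional to group size      L: proportional to ln(group size)
    S: proportional to sqrt(size)      C: the same for every group
    D: proportional to a per-group diversity score (e.g. pooled SDI)
    A group that cannot supply its share gives all its accessions, and the
    shortfall is redistributed among the remaining groups.
    """
    weights = {
        "P": lambda: [float(s) for s in sizes],
        "L": lambda: [math.log(s) for s in sizes],
        "S": lambda: [math.sqrt(s) for s in sizes],
        "C": lambda: [1.0] * len(sizes),
        "D": lambda: [float(d) for d in diversity],
    }[strategy]()
    alloc = [0] * len(sizes)
    active = [h for h in range(len(sizes)) if weights[h] > 0]
    remaining = n
    while remaining > 0 and active:
        w_total = sum(weights[h] for h in active)
        extra = largest_remainder([remaining * weights[h] / w_total for h in active], remaining)
        remaining, still_open = 0, []
        for h, e in zip(active, extra):
            take = min(e, sizes[h] - alloc[h])
            alloc[h] += take
            remaining += e - take
            if alloc[h] < sizes[h]:
                still_open.append(h)
        active = still_open
    return alloc


def stratified_core(groups, n, strategy, diversity=None, seed=1):
    """Draw the allocated number of accessions at random from each group."""
    rng = random.Random(seed)
    alloc = allocate([len(g) for g in groups], n, strategy, diversity)
    return [acc for g, k in zip(groups, alloc) for acc in rng.sample(g, k)]


if __name__ == "__main__":
    # Group sizes of the geographical grouping of S. officinarum (690 accessions).
    sizes = [364, 68, 36, 18, 22, 9, 173]
    for s in "PLSC":
        print(s, allocate(sizes, 140, s))
```

Under the C strategy the two smallest groups (18 and 9 accessions) are taken whole and their unused share is spread over the other groups.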

Grouping of S. officinarum accessions for stratified sampling of the core set

Method of grouping: Geographical origin

Group ID   Accessions from        No. of accessions   Group pooled SDI
1          New Guinea             364                 31.01
2          Indonesia              68                  29.61
3          New Caledonia          36                  27.82
4          Fiji                   18                  26.17
5          India                  22                  27.57
6 & 7      Hawaii & Mauritius     9                   23.65
8          Other regions          173                 31.19

Between-group : within-group diversity component = 58 : 42

Grouping on the basis of cluster analysis + major sources within clusters

Group ID   Accessions from              No. of accessions   Group pooled SDI
1          New Guinea in cluster 1      165                 30.50
2          Indonesia in cluster 1       32                  31.65
3          New Caledonia in cluster 1   22                  32.13
4          Other regions in cluster 1   61                  33.18
5          Cluster 2                    73                  30.95
6          Cluster 3                    38                  31.31
7          New Guinea in cluster 4      169                 30.09
8          Indonesia in cluster 4       25                  29.79
9          Others in cluster 4          83                  29.79
10         Cluster 5                    22                  34.87

Between-group : within-group diversity component = 68 : 32

Diversity groups on the basis of the LEAV index of the accessions

Group ID   LEAV range     No. of accessions   Group pooled SDI   Group mean LEAV
1          22.0 – 26.0    39                  20.04              24.90
2          26.0 – 27.5    63                  23.98              26.70
3          27.5 – 28.5    66                  26.25              28.00
4          28.5 – 29.5    58                  27.61              29.00
5          29.5 – 30.5    79                  28.80              30.00
6          30.5 – 31.5    66                  29.78              30.90
7          31.5 – 32.5    58                  30.66              31.90
8          32.5 – 33.5    54                  31.81              32.90
9          33.5 – 35.0    64                  32.93              34.30
10         35.0 – 36.5    42                  33.55              35.80
11         36.5 – 39.5    52                  34.96              37.90
12         39.5 – 47.5    49                  36.68              41.90

Between-group : within-group diversity component = 71 : 29

Purposive sampling methods

• Principal Component Analysis method of Noirot et al. (1996)

– In this method, a Principal Component Analysis is carried out using quantitative trait data of the base collection.

– The contribution of the i-th accession to the total variance of the system is computed as:

$$P_i = \sum_{j=1}^{t} y_{ij}^2$$

where y_ij is the component score of the i-th accession on the j-th principal component and t is the number of principal components extracted (equal to the number of traits when all components are retained).

– Then, for each accession in the base collection, its relative contribution to the total GSS (Generalized Sum of Squares) is computed as:

$$CR_i = \frac{100\,P_i}{p \times t}$$

where P_i is the sum of squares of the component scores of accession i, p is the number of accessions and t is the number of traits; (p × t) is called the Generalized Sum of Squares, or GSS in short.
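A stdlib-only Python sketch of the contribution scores CR_i. Assumptions: traits are standardized with divisor p, components come from the correlation matrix (via a small Jacobi eigen-solver included only to keep the example self-contained), and scores are unscaled projections. With all t components retained the CR_i sum to 100, because the squared scores of an accession add up to its squared standardized distance from the mean. The data are hypothetical.

```python
import math


def standardize(X):
    """Centre each trait (column) and scale it to unit variance, divisor p."""
    p, t = len(X), len(X[0])
    Z = [[0.0] * t for _ in range(p)]
    for j in range(t):
        mean = sum(row[j] for row in X) / p
        sd = math.sqrt(sum((row[j] - mean) ** 2 for row in X) / p)
        for i in range(p):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z


def jacobi_eigen(A, tol=1e-14, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric matrix."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: update columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A: update rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J: accumulate eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def contributions(X, n_comp=None):
    """CR_i = 100 * P_i / (p * t), where P_i sums the squared PC scores of accession i."""
    Z = standardize(X)
    p, t = len(Z), len(Z[0])
    R = [[sum(z[a] * z[b] for z in Z) / p for b in range(t)] for a in range(t)]
    eigvals, V = jacobi_eigen(R)
    keep = sorted(range(t), key=lambda k: eigvals[k], reverse=True)[: n_comp or t]
    scores = [[sum(z[a] * V[a][k] for a in range(t)) for k in keep] for z in Z]
    return [100.0 * sum(y * y for y in row) / (p * t) for row in scores]


if __name__ == "__main__":
    # Hypothetical data: 8 accessions x 3 quantitative traits.
    X = [[5.1, 2.3, 10.0], [4.8, 2.9, 11.5], [6.0, 2.1, 9.8], [5.5, 3.2, 12.1],
         [4.2, 2.6, 10.9], [6.3, 3.0, 9.1], [5.0, 2.2, 11.7], [4.9, 3.4, 10.4]]
    cr = contributions(X)
    print("accessions by decreasing CR_i:", sorted(range(len(cr)), key=lambda i: cr[i], reverse=True))
```

Passing `n_comp` smaller than the number of traits keeps only the leading components, in which case the CR_i no longer sum to 100.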

PCA Method (contd.)

• The accessions in the base collection are then arranged in descending order of their contribution to the GSS, and the cumulative contribution of successive accessions to the GSS is computed.

• A logistic regression model of the form

$$\ln\left(\frac{y}{A - y}\right) = a + b\,n$$

is fitted to the cumulative values, where y is the cumulative contribution of the top n accessions and A is the asymptotic value.

• The rate of progress for this model is

$$\frac{dy}{dn} = \frac{b}{A}\,y\,(A - y)$$

• Either a fixed percentage (say 5–10%) of the top accessions is selected to form the core set, or the top accessions are included in the core set up to the point at which the rate of increase in their contribution to the GSS starts declining (see the figure that follows).

• This method is useful for reducing the redundancy in the core set.
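The curve-fitting step can be sketched as an ordinary least-squares fit on the logit scale. Assumptions: A is fixed at 100 (the cumulative contribution expressed as a percentage of the GSS) rather than estimated, and "the point at which the rate starts declining" is read as the peak of the fitted rate, which for this model lies at y = A/2, i.e. n = −a/b. Because accessions are sorted by decreasing contribution the raw cumulative curve may itself be concave, so this is one possible reading of the rule, not the author's procedure.

```python
import math


def fit_logistic(cumulative, A=100.0):
    """Least-squares fit of ln[y / (A - y)] = a + b*n.

    cumulative[n - 1] is the cumulative contribution (% of GSS) of the top n
    accessions. Points with y <= 0 or y >= A are skipped (logit undefined).
    """
    pts = [(n, math.log(y / (A - y)))
           for n, y in enumerate(cumulative, start=1) if 0 < y < A]
    m = len(pts)
    mean_n = sum(n for n, _ in pts) / m
    mean_z = sum(z for _, z in pts) / m
    sxx = sum((n - mean_n) ** 2 for n, _ in pts)
    sxz = sum((n - mean_n) * (z - mean_z) for n, z in pts)
    b = sxz / sxx
    return mean_z - b * mean_n, b


def rate_of_progress(n, a, b, A=100.0):
    """dy/dn = (b / A) * y * (A - y) on the fitted curve."""
    y = A / (1.0 + math.exp(-(a + b * n)))
    return b * y * (A - y) / A


def peak_rate_size(a, b):
    """Number of accessions at which the fitted rate peaks (y = A/2, n = -a/b)."""
    return max(1, round(-a / b))


if __name__ == "__main__":
    # Hypothetical cumulative GSS% values lying exactly on a logistic curve.
    cum = [100.0 / (1.0 + math.exp(-(-5.0 + 0.1 * n))) for n in range(1, 101)]
    a, b = fit_logistic(cum)
    print(f"a = {a:.3f}, b = {b:.3f}, core size at peak rate = {peak_rate_size(a, b)}")
    # a = -5.000, b = 0.100, core size at peak rate = 50
```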

PCS Method (contd.)

[Figure: Rate of progress of cumulative contribution to the variance by the accessions. x-axis: No. of accessions; y-axis: Rate of increase in GSS (%)]

Purposive sampling methods (contd.)

• The second method is similar to the PCS method, but here each accession is ranked on an Information Measure (called the Length of Encoded Attribute Value, LEAV) and the top-ranked accessions are included in the core set (Balakrishnan, 2002).

• LEAV is evaluated using concepts from Information Theory (Shannon, 1948; Wallace and Boulton, 1968).

• Each entry in the base collection is assigned a score by combining the evaluation data on a number of characters that are either qualitative or quantitative in nature.

• LEAV can be treated as a diversity measure that indicates how far each individual lies from the centroid of all individuals.

• It can be used to group the accessions in a way similar to cluster analysis (but using a divisive algorithm for clustering the accessions).

• A typical example of computing LEAV is illustrated below.

Descriptor            States            Freq   -ln(p)
Weather marks         Present *         146    0.2367
                      Absent            39     1.5422
Ivory marks           Present           153    0.1942
                      Absent *          32     1.7546
Bud germpore          Apical            119    0.4490
                      Sub-Apical *      46     1.3917
                      Median            20     2.1919
Geographical origin   New Guinea        62     1.1196
                      Indonesia *       26     1.9622
                      New Caledonia     19     2.2670
                      India             15     2.4901
                      Fiji              13     2.6236
                      Hawaii            6      3.3168
                      Mauritius         3      3.8764
                      Unknown origin    41     1.5250

* State of the accession being scored.

LEAV = 0.2367 + 1.7546 + 1.3917 + 1.9622 = 5.3452
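A Python sketch of the LEAV calculation, assuming the simple form −ln(f/N) for the code length of each state. The starred entries in the table above agree with this form (for example, −ln(146/185) ≈ 0.2367), while some unstarred entries differ slightly, so the original computation may include further terms; treat this as an approximation. The descriptor keys are hypothetical, and the example rows reproduce only the marginal counts of the table.

```python
import math
from collections import Counter


def code_lengths(rows):
    """{descriptor: {state: -ln(f / N)}} from a list of {descriptor: state} dicts."""
    n = len(rows)
    table = {}
    for d in rows[0]:
        counts = Counter(row[d] for row in rows)
        table[d] = {state: -math.log(c / n) for state, c in counts.items()}
    return table


def leav(row, table):
    """LEAV index of one accession: sum of the code lengths of its states."""
    return sum(table[d][state] for d, state in row.items())


def column(spec):
    """Expand [(state, count), ...] into a list of states."""
    return [state for state, count in spec for _ in range(count)]


def example_rows():
    """185 hypothetical accessions matching the marginal counts of the table."""
    cols = {
        "weather": column([("Present", 146), ("Absent", 39)]),
        "ivory": column([("Present", 153), ("Absent", 32)]),
        "germpore": column([("Apical", 119), ("Sub-Apical", 46), ("Median", 20)]),
        "origin": column([("New Guinea", 62), ("Indonesia", 26), ("New Caledonia", 19),
                          ("India", 15), ("Fiji", 13), ("Hawaii", 6), ("Mauritius", 3),
                          ("Unknown", 41)]),
    }
    return [dict(zip(cols, states)) for states in zip(*cols.values())]


if __name__ == "__main__":
    table = code_lengths(example_rows())
    scored = {"weather": "Present", "ivory": "Absent",
              "germpore": "Sub-Apical", "origin": "Indonesia"}
    print(f"LEAV = {leav(scored, table):.4f}")
```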

Grouping of the accessions on the basis of the LEAV index

1. The computed LEAV index values for the entries can be arranged as a frequency distribution and the entries divided into L strata, with stratum boundaries x_1, x_2, …, x_{L−1}.

2. An optimum stratification can be arrived at such that the pooled variance of the LEAV index within the strata is minimum. The stratum boundaries are fixed by using the Dalenius formula (Jarque, 1981) through an interactive computer program:

$$x_h = \frac{1}{2}\left[\frac{\int_{x_{h-1}}^{x_h} x\,f(x)\,dx}{\int_{x_{h-1}}^{x_h} f(x)\,dx} + \frac{\int_{x_h}^{x_{h+1}} x\,f(x)\,dx}{\int_{x_h}^{x_{h+1}} f(x)\,dx}\right]$$

i.e. each boundary lies midway between the means of the two strata it separates.
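One way to satisfy the Dalenius condition on observed LEAV values is to iterate the midpoint rule until the boundaries stop changing. This is a swapped-in numerical approach (the same fixed-point iteration as one-dimensional k-means), not necessarily the program used in the original; it starts from equal-frequency strata and stops early if a stratum becomes empty. The LEAV values are hypothetical.

```python
def dalenius_boundaries(values, L, max_iter=100):
    """Boundaries x_1..x_{L-1}, each midway between the means of adjacent strata."""
    xs = sorted(values)
    n = len(xs)
    bounds = [xs[h * n // L] for h in range(1, L)]  # start from equal-frequency strata
    for _ in range(max_iter):
        strata = [[] for _ in range(L)]
        for x in xs:
            strata[sum(x >= b for b in bounds)].append(x)  # stratum index of x
        if any(not s for s in strata):
            break  # stop rather than take the mean of an empty stratum
        means = [sum(s) / len(s) for s in strata]
        new = [(means[h] + means[h + 1]) / 2 for h in range(L - 1)]
        if new == bounds:
            break
        bounds = new
    return bounds


if __name__ == "__main__":
    leav_values = [22.5, 24.0, 25.1, 26.8, 27.3, 28.0, 29.2, 30.1, 31.5, 33.0, 36.2, 41.0]
    print(dalenius_boundaries(leav_values, L=3))
```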

Advantages of using LEAV for clustering

• Multivariate cluster analysis using quantitative traits through hierarchical methods becomes complicated and unwieldy when the number of accessions is large.

• In general, a small proportion of accessions (say about 100 entries) is selected at random from the main set and clustered, and the remaining entries are then assigned to the already formed clusters (k-means clustering).

• Cluster constitution may differ depending upon the initial selection.

• In most cases, qualitative traits tend to be left out of these methods, though there are procedures for rescaling qualitative attributes to quantitative values

Advantages of LEAV for clustering (contd.)

• LEAV is very easy to compute and we can include all evaluation data (including passport data) that are qualitative or quantitative.

• Class intervals of quantitative data can be treated as attributes and hence used in the same way as qualitative attribute values.

• All accessions (even thousands) can be included in one step and a divisive algorithm can be used to form the accessions into diversity groups and these groups can be used for stratified random sampling to constitute the core collection.

• See the references cited in the lecture notes for further details.
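Treating class intervals of a quantitative trait as attribute states can be sketched with Python's `bisect` module. The class limits follow the leaf-width classes used later in this document; the half-open convention [lower, upper) and the measurements are assumptions.

```python
import bisect
import math
from collections import Counter

# Leaf-width class limits (cm); a value equal to a limit goes to the upper class,
# and values below 3.50 all fall in the first class.
EDGES = [3.5, 4.5, 5.5, 6.5, 7.5]
LABELS = ["2.50-3.50", "3.50-4.50", "4.50-5.50", "5.50-6.50", "6.50-7.50", ">= 7.50"]


def to_class(x, edges=EDGES, labels=LABELS):
    """Return the class-interval label (an attribute state) for measurement x."""
    return labels[bisect.bisect_right(edges, x)]


def class_code_lengths(values):
    """-ln(relative frequency) of each class, usable like a qualitative state."""
    classes = Counter(to_class(x) for x in values)
    n = sum(classes.values())
    return {label: -math.log(c / n) for label, c in classes.items()}


if __name__ == "__main__":
    widths = [3.0, 4.0, 4.2, 5.0, 5.1, 5.3, 6.0, 7.8]  # hypothetical measurements
    print(class_code_lengths(widths))
```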

• Verification of the core subsets constituted through various methods:

– by evaluating the retention level of diversity in the core subsets

– by evaluating the retention of association among closely related traits in the core subsets, through correlation and Factor Analysis

– by evaluating the redundancy levels in the core subsets, using the empirical distribution of a Similarity Index

Measures of Diversity

• Quantitative traits

– Range

– Standard Deviation (SD)

– Coefficient of Variation (CV)

• Qualitative traits

– Shannon-Weaver Diversity Index

Descriptor state frequencies of INTERNODE SHAPE

State              Absolute frequency   Relative frequency (%)
_________________________________________________
CYLINDRICAL        306                  44.35
TUMESCENT          45                   6.52
BOBBIN             93                   13.48
CONOIDAL           193                  27.97
OBCONOIDAL         13                   1.88
CONCAVE CONVEX     40                   5.80
_________________________________________________
Total              690                  100.00

Shannon-Weaver Diversity Index (SDI)  :: 1.4050
Std. Err. of Shannon Diversity Index  :: 0.0314
Standardized value of SDI             :: 0.7842

Diversity as measured by Shannon Diversity Index (SDI) for qualitative descriptors in the whole collection of S. officinarum

Descriptor $                        SDI     Standardized SDI #
1. Ivory marks (2)                  0.245   0.353
2. Weather marks (2)                0.188   0.271
3. Internode shape (6)              1.405   0.784
4. Internode alignment (2)          0.482   0.695
5. Internode wax (5)                1.263   0.785
6. Growth cracks (2)                0.672   0.970
7. Stripes on cane (2)              0.427   0.615
8. Bud shape (11)                   1.696   0.707
9. Bud germpore (3)                 0.512   0.466
10. Bud groove (3)                  0.926   0.843
11. Growth ring swelling (3)        0.748   0.681
12. Leaf upper surface (2)          0.192   0.277
13. Leaf carriage (3)               0.859   0.782
14. Sheath prickles (5)             1.403   0.872
15. Sheath clasping (2)             0.627   0.905
16. Ligule shape (12)               2.180   0.877
17. Ligular process symmetry (2)    0.492   0.710

$: Figures in parentheses are the corresponding numbers of descriptor states.
#: Standardized SDI = SDI / ln(number of descriptor states); its value ranges from 0 to 1.
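The SDI and its standardized value can be checked with a few lines of Python, using the internode-shape frequencies given earlier. The number of descriptor states is taken as the length of the count list, so states with a zero count should be included; the standard error is not reproduced here.

```python
import math


def sdi(counts):
    """Shannon-Weaver diversity index from absolute state frequencies."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def standardized_sdi(counts):
    """SDI / ln(number of descriptor states); lies between 0 and 1."""
    return sdi(counts) / math.log(len(counts))


if __name__ == "__main__":
    # Internode shape: cylindrical, tumescent, bobbin, conoidal, obconoidal, concave-convex
    internode = [306, 45, 93, 193, 13, 40]
    print(f"SDI = {sdi(internode):.4f}, standardized = {standardized_sdi(internode):.4f}")
    # SDI = 1.4050, standardized = 0.7842
```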

Mean diversity and its sampling variance for the core subsets drawn from the whole collection of S. officinarum through simple random sampling and stratified random sampling using different stratification procedures

Grouping A: 10 groups based on geographical distribution within major clusters
Grouping B: 8 groups based on geographical origin only
Grouping C: 12 groups based on the LEAV index
Mean = mean pooled SDI *; Variance = variance (pooled SDI)

Sample size   A: Mean   A: Variance   B: Mean   B: Variance   C: Mean   C: Variance

1. Simple random sampling (common to all three methods of grouping)
70            30.90     0.3132
100           31.25     0.1285
140           31.39     0.1364
170           31.54     0.1009
210           31.57     0.0680

2. Frequency proportion method
70            30.92     0.1849        30.77     0.2023        30.89     0.0181
100           31.29     0.1177        31.11     0.1432        31.22     0.0100
140           31.38     0.0999        31.42     0.1117        31.33     0.0050
170           31.50     0.0787        31.49     0.0724        31.48     0.0037
210           31.53     0.0574        31.56     0.0798        31.50     0.0028

3. Square root proportion method
70            31.27     0.2673        30.87     0.2454        31.06     0.0166
100           31.65     0.1361        31.14     0.1757        31.35     0.0094
140           31.84     0.0761        31.40     0.1083        31.52     0.0063
170           31.94     0.0662        31.45     0.0891        31.62     0.0039
210           31.96     0.0463        31.57     0.0636        31.68     0.0025

4. Log frequency method
70            31.48     0.2318        30.79     0.2486        31.21     0.0187
100           31.83     0.1037        31.10     0.1163        31.42     0.0096
140           31.95     0.0717        31.36     0.1275        31.70     0.0067
170           32.13     0.0680        31.37     0.0720        31.73     0.0033
210           32.16     0.0359        31.60     0.0540        31.86     0.0025

5. Diversity proportional method
70            31.06     0.2566        30.87     0.2085        31.50     0.0236
100           31.31     0.1495        31.21     0.1840        31.87     0.0115
140           31.58     0.1103        31.39     0.1280        31.97     0.0057
170           31.58     0.0764        31.50     0.1030        32.14     0.0037
210           31.68     0.0495        31.57     0.0660        32.14     0.0030

6. Equal frequency method (not considered for grouping B)
70            31.71     0.2287        –         –             31.19     0.0185
100           32.02     0.1354        –         –             31.43     0.0078
140           32.26     0.0746        –         –             31.65     0.0055
170           32.29     0.0522        –         –             31.73     0.0052
210           32.42     0.0453        –         –             31.80     0.0017

Evaluation of retention of diversity in a core subset

1. Retention (%) of a diversity measure, over m characters:

$$\text{Retention}\,(\%) = \frac{100}{m}\sum_{k=1}^{m}\frac{\text{Core diversity}_k}{\text{Base diversity}_k}$$

2. Retention of variance, or GSS, is the sum of the contributions CR_i of the individual accessions in the core subset to the total variance:

$$\text{Retention of GSS} = \sum_{i \in \text{core}} CR_i$$
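A Python sketch of the two retention measures. Assumptions: each character is given as a list of values, the CV uses the population standard deviation, and the CR_i values (per cent of GSS) come from a PCA step like the one described for the PCS method. Ratios above 1 are possible, which is why CV and SDI retention can exceed 100% in the table that follows. The numbers are hypothetical.

```python
import statistics


def value_range(xs):
    return max(xs) - min(xs)


def cv_percent(xs):
    """Coefficient of variation (%), using the population standard deviation."""
    return 100.0 * statistics.pstdev(xs) / statistics.fmean(xs)


def retention(core_cols, base_cols, measure):
    """(100 / m) * sum over the m characters of measure(core) / measure(base)."""
    ratios = [measure(c) / measure(b) for c, b in zip(core_cols, base_cols)]
    return 100.0 * sum(ratios) / len(ratios)


def gss_retention(cr, core_ids):
    """Sum of CR_i (per cent of GSS) over the accessions in the core subset."""
    return sum(cr[i] for i in core_ids)


if __name__ == "__main__":
    base_cols = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]  # hypothetical: 2 characters
    core_cols = [[1, 5], [20, 40]]
    print(retention(core_cols, base_cols, value_range))  # 75.0
    print(round(retention(core_cols, base_cols, cv_percent), 2))
```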

Verification of core subsets selected through purposive sampling in S. officinarum

Method of     Grouping    Allocation   Retention %   Retention %   Retention %   Retention %   Total retention
sampling      criterion   strategy     of Range      of CV         of GSS        of SDI        of diversity
PCS-method    NIL         NIL          99.7          137.3         43.33         104.47        73.90
PCS-method    Cluster     P            99.8          135.0         42.01         102.24        72.12
PCS-method    Cluster     L            98.9          137.6         41.86         105.49        73.67
PCS-method    Cluster     C            98.8          136.0         41.04         107.61        74.32
PCS-method    Origin      P            99.1          136.8         42.50         104.55        73.53
PCS-method    Origin      L            98.1          132.3         39.22         108.89        74.05
LEAV Index    NIL         NIL          97.7          134.8         34.69         124.36        79.53
LEAV Index    Cluster     P            95.0          126.9         33.50         126.12        79.81
LEAV Index    Cluster     L            96.1          129.7         33.54         127.14        80.34
LEAV Index    Cluster     C            98.9          128.9         33.46         126.54        80.00
LEAV Index    Origin      P            97.7          132.8         34.79         129.62        82.21
LEAV Index    Origin      L            96.6          126.6         31.79         126.65        79.22

Core subset size = 20% (140 accessions). Total retention of diversity equals the mean of the GSS and SDI retention values.

General considerations

• Methods to determine the size of the core collection need to be optimized

• Cluster analysis using a large base collection is cumbersome

• There should be scope for users to pre-determine the extent of diversity or variation they would like to have in the core collection for various traits

• The inclusion of accessions with missing data for one or more traits needs consideration

• The gene bank curator’s personal knowledge of the collection is also essential in selecting accessions for the core subset.

• Through Factor Analysis, the major factors can be identified, and the associated factor loadings on the individual traits in the base collection and the core subset can be evaluated.

• A comparison of the factor loadings can then be used to infer whether the association among the traits in the base collection is also retained in the core subset.

Evaluation of retention of association among quantitative traits

Retention of association among groups of quantitative traits in core subsets obtained through random and purposive sampling in S. officinarum

                           Sample size = 15% (100)              Sample size = 20% (140)
Variable     Whole         SRS      PCS-based   LEAV-index-   SRS      PCS-based   LEAV-index-
             collection                         based                              based
Factor-1
Sucrose      0.965         0.945    0.974       0.981         0.957    0.968       0.976
Brix-300     0.905         0.905    0.909       0.927         0.905    0.884       0.921
Purity       0.854         0.827    0.862       0.899         0.852    0.842       0.885
Brix-200     0.604         0.718    0.633       0.791         0.687    0.587       0.771
% variance   32.0          36.1     30.1        33.8          33.9     30.7        33.6
Factor-2
Stk. Girth   0.853         0.747    0.816       0.887         0.746    0.8538      0.886
Stk. Wt      0.832         …        0.819       0.870         …        0.8898      0.867
Leaf Wid     0.705         0.735    …           0.810         0.735    0.6566      0.807
Leaf Lng     0.497         0.584    0.582       0.644         0.657    …           0.622
NMC          -0.570        -0.726   -0.654      -0.636        -0.639   …           -0.629
% variance   23.5          21.5     22.6        30.6          23.8     24.6        29.1

SRS = simple random sampling.
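The loadings above are compared by inspection. One common numerical summary, swapped in here and not used in the original, is Tucker's congruence coefficient between the base-collection and core-subset loadings on a factor (values near 1 indicate that the pattern of association is retained). Only Factor-1 is used because its rows have no missing entries.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


if __name__ == "__main__":
    # Factor-1 loadings (Sucrose, Brix-300, Purity, Brix-200) from the table above.
    whole = [0.965, 0.905, 0.854, 0.604]
    cores = {
        "SRS 15%": [0.945, 0.905, 0.827, 0.718],
        "PCS 15%": [0.974, 0.909, 0.862, 0.633],
        "LEAV 15%": [0.981, 0.927, 0.899, 0.791],
        "SRS 20%": [0.957, 0.905, 0.852, 0.687],
        "PCS 20%": [0.968, 0.884, 0.842, 0.587],
        "LEAV 20%": [0.976, 0.921, 0.885, 0.771],
    }
    for name, loadings in cores.items():
        print(f"{name}: {congruence(whole, loadings):.4f}")
```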

Evaluation of redundancy in the core

1. A measure of similarity between any two accessions in the core collection is computed using the available information on quantitative and qualitative traits. Likewise, the similarity index can be computed for all possible pairs of accessions (there would be N(N − 1)/2 such coefficients for N accessions).

2. The empirical distribution of the similarity coefficient is then tabulated.

3. The relative frequencies of accession pairs falling in ranges of the similarity coefficient, e.g. 0–0.5 (least similar), 0.5–0.7 (moderately similar) and above 0.7 (highly similar), can then be analyzed.

Computation of Similarity Coefficient

• For any pair of accessions i and j in the core subset, the similarity coefficient can be computed using the formula (Gower Metric):

$$v_{ij} = 1 - \frac{1}{m}\sum_{p=1}^{m}\frac{|x_{ip} - x_{jp}|}{R_p}$$

where m is the number of descriptors (including the qualitative, binary and quantitative types considered for the analysis); R_p is the range in the case of a quantitative descriptor, or 1 otherwise; and x_ip and x_jp are the values of the p-th descriptor for the i-th and j-th accessions, respectively.
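A Python sketch of the Gower similarity and the resulting similarity profile. Assumptions: each accession is a tuple of descriptor values with a parallel tuple of kinds ("quant" or anything else), R_p is the observed range of a quantitative descriptor, qualitative descriptors contribute 0 for a match and 1 for a mismatch (the usual Gower convention, consistent with R_p = 1), and the class limits 0.5 and 0.7 are those suggested above. The core subset in the example is hypothetical.

```python
from itertools import combinations


def gower_similarity(a, b, kinds, ranges):
    """v_ij = 1 - (1/m) * sum_p |x_ip - x_jp| / R_p over the m descriptors.

    Quantitative descriptors are scaled by their range R_p; for other descriptors
    the difference is 0 when the states match and 1 otherwise (R_p = 1).
    """
    total = 0.0
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "quant":
            total += abs(x - y) / r if r else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return 1.0 - total / len(kinds)


def similarity_profile(rows, kinds, cuts=(0.5, 0.7)):
    """Per cent of the N(N-1)/2 pairs that are least, moderately and highly similar."""
    columns = list(zip(*rows))
    ranges = [max(col) - min(col) if kind == "quant" else 1.0
              for col, kind in zip(columns, kinds)]
    counts = [0, 0, 0]
    pairs = list(combinations(rows, 2))
    for a, b in pairs:
        v = gower_similarity(a, b, kinds, ranges)
        counts[sum(v >= c for c in cuts)] += 1  # 0: < 0.5, 1: 0.5 to 0.7, 2: >= 0.7
    return [100.0 * k / len(pairs) for k in counts]


if __name__ == "__main__":
    kinds = ("quant", "qual", "qual")  # hypothetical: leaf width, bud shape, ivory marks
    core = [(5.0, "ovate", "present"), (7.0, "ovate", "absent"),
            (9.0, "round", "absent"), (6.1, "round", "present")]
    print(similarity_profile(core, kinds))
```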

Distribution of similarity coefficient among all possible pairs of accessions in S. officinarum (as relative % of total number of accession pairs)

Range of       Core subset of 15% size based on     Core subset of 20% size based on
similarity     Random $   PCS-      LEAV            Random $   PCS-      LEAV
coefficient    sampling   method    Index           sampling   method    Index
0.0 – 0.1      0.00       0.00      0.00            0.00       0.00      0.00
0.1 – 0.2      0.00       0.00      0.00            0.00       0.00      0.00
0.2 – 0.3      0.00       0.00      0.00            0.00       0.00      0.00
0.3 – 0.4      0.19       0.40      1.29            0.11       0.26      0.79
0.4 – 0.5      4.47       8.42      19.54           3.77       6.75      15.72
0.5 – 0.6      27.52      35.69     49.15           25.86      33.09     48.44
0.6 – 0.7      45.77      41.27     25.72           47.10      43.54     30.06
0.7 – 0.8      20.13      12.83     4.06            21.07      15.10     4.68
0.8 – 0.9      1.90       1.37      0.22            2.05       1.24      0.29
0.9 – 1.0      0.02       0.02      0.02            0.04       0.02      0.02

Constituting core subsets with pre-assigned frequency distributions for several traits simultaneously

Core subsets constituted through simple or stratified random sampling in general represent the diversity in the base collection for a reasonable core size, but do not satisfy the specific needs of users.

Core subsets constituted through purposive sampling using the PCS method or the LEAV index method, on the other hand, are expected to have a higher level of diversity than those from random sampling, but the pattern of variation for the individual traits in the core collection cannot be predicted.

So, can the user of a genetic resource decide the pattern of diversity to be represented in the core collection?

As an example, an attribute (say, spiny leaves) may be present in 10% and absent in 90% of the accessions in the base collection.

This corresponds to a standardized SDI of 0.47 (on a 0–1 scale) for this descriptor.

Can the user of the genetic resource obtain a core subset in which, say, the attribute is present in 70% and absent in 30% of the accessions (a standardized SDI of 0.88)?

This implies an entirely different distribution pattern of the descriptor states in the core subset, with a more balanced spread of the relative frequencies of the descriptor states.
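The two standardized SDI values quoted in this example follow from Standardized SDI = SDI / ln(number of descriptor states), with two states:

```latex
\begin{aligned}
H_{10:90} &= -(0.1\ln 0.1 + 0.9\ln 0.9) \approx 0.325, &\qquad H_{10:90}/\ln 2 &\approx 0.47\\
H_{70:30} &= -(0.7\ln 0.7 + 0.3\ln 0.3) \approx 0.611, &\qquad H_{70:30}/\ln 2 &\approx 0.88
\end{aligned}
```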

If the relative frequencies of the attribute states of only one descriptor are pre-determined, it is quite simple to draw such a core set.

However, when the frequency densities of several descriptors (qualitative or quantitative) are to be pre-assigned simultaneously for a core subset of a given size, the conventional sampling strategies are not adequate.

Hence a new technique for constituting such a user-defined core subset, with pre-assigned core size and frequency distributions for several traits, has been proposed by Balakrishnan (2002).

Basic strategies in delineating a core subset with pre-determined density distributions with respect to several traits

• Decide a suitable core collection size (and hence the non-core group size).

• Decide the pattern of variation to be realized in the core subset (and hence for the non-core group also) for each of the selected traits (qualitative & quantitative traits)

• If stratification of the accessions in the base collection is available on some criterion (say, on the basis of geographical origin), the user may also decide the allocation pattern of accessions from the diversity groups to the core subset.

• Compute a pair of diversity indices for each accession in the base collection, one corresponding to the core subset and the other to the non-core group based on the joint density distributions.

• Each accession is allocated to the core subset if its diversity index for the core group is lower than (or equal to) that for the non-core group, and to the non-core group otherwise. The process is repeated until all accessions are screened.

Pre-assigning the frequency density of qualitative traits

• To illustrate the method, two types of frequency transformation can be used in pre-assigning the frequency density of the qualitative attributes, viz. the square-root proportion and log-frequency methods. These transformations result in a higher variation of the frequency proportions of the attribute states of a qualitative trait (and hence higher SDI values) in the core subset.

An example of fixing the descriptor state frequencies of a core subset as per the square-root proportion method

Descriptor &        Frequency in         Relative frequency    Relative frequency      Frequency in     Relative frequency
descriptor states   whole collection @   in whole collection   fixed for core subset   non-core group   in non-core group
                                         = p_i                 = q_i
Bud germpore
  Apical            581                  0.8420                0.6341                  462              0.9149
  Sub-Apical        89                   0.1290                0.2482                  43               0.0851
  Median            20                   0.0290                0.1177                  0                0.0000
Standardized SDI                         0.4657                0.7992                                   0.2782

where

$$q_i = \frac{\sqrt{p_i}}{\sum_{j=1}^{s}\sqrt{p_j}}$$

is the square-root proportion frequency transformation (s = number of descriptor states).

@: Size of the whole collection = 690; size of core collection = 185; size of non-core group = 505

An example of fixing the descriptor state frequencies of the core subset as per the log-frequency method

Descriptor &                 Freq. in base      Relative frequency   Relative frequency      Freq. in         Relative frequency
descriptor states            collection = N_i   in base collection   fixed for core = q_i    non-core group   in non-core group
Location of spines on OIB
  None                       133                0.0409               0.2101                  45               0.0158
  Tip only                   62                 0.0191               0.1476                  0                0.0000
  Tip & few basal            88                 0.0271               0.1923                  7                0.0026
  Tip & few apical           6                  0.0018               0.0143                  0                0.0000
  Tip & all along the margin 2961               0.9111               0.4357                  2778             0.9816
Standardized SDI                                0.2489               0.8437                                   0.0676

Size of the main collection = 3250; size of core = 420; size of non-core group = 2830

$$q_i = \frac{\log N_i}{\sum_{j=1}^{s}\log N_j}$$

is the log-frequency transformation (s = number of descriptor states).
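Both transformations can be reproduced in a few lines of Python. `sqrt_proportion` reproduces the q_i column of the square-root example. For the log-frequency example, the raw transform reproduces the entries for "None" (0.2101) and "Tip & few basal" (0.1923) but not the others; those appear to reflect the limited number of accessions available in some states (all 62 "Tip only" and all 6 "Tip & few apical" accessions are placed in the core), so the fixed values there look like realized rather than raw target proportions. That reading is an inference from the table.

```python
import math


def sqrt_proportion(p):
    """q_i = sqrt(p_i) / sum_j sqrt(p_j) for relative frequencies p."""
    roots = [math.sqrt(x) for x in p]
    total = sum(roots)
    return [r / total for r in roots]


def log_frequency(N):
    """q_i = ln(N_i) / sum_j ln(N_j) for absolute frequencies N (each N_i > 1)."""
    logs = [math.log(x) for x in N]
    total = sum(logs)
    return [v / total for v in logs]


if __name__ == "__main__":
    print([round(q, 4) for q in sqrt_proportion([0.8420, 0.1290, 0.0290])])
    # [0.6341, 0.2482, 0.1177]
    # Location of spines on OIB: None, Tip only, Tip & few basal, Tip & few apical, Tip & all
    print([round(q, 4) for q in log_frequency([133, 62, 88, 6, 2961])])
```

The ratio form makes the base of the logarithm irrelevant, so natural logs are used here.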

Example of pre-assigned frequency profile of a quantitative descriptor in the core subset and the non-core group

Leaf width (cm)   Whole collection          Core subset*              Non-core group
                  Abs. freq.  Rel. freq.    Abs. freq.  Rel. freq.    Abs. freq.  Rel. freq.
2.50 – 3.50       27          0.039         17          0.092         10          0.020
3.50 – 4.50       179         0.259         39          0.211         140         0.277
4.50 – 5.50       312         0.452         67          0.362         245         0.485
5.50 – 6.50       137         0.199         34          0.184         103         0.204
6.50 – 7.50       31          0.045         24          0.129         7           0.014
> 7.50            4           0.006         4           0.022         0           0.000
Total size        690         1.000         185         1.000         505         1.000
Mean              5.03                      5.16                      4.98
Std. deviation    0.872                     1.173                     0.724
CV (%)            17.33                     22.75                     14.55
Range             2.50 – 9.60               2.50 – 9.60               3.00 – 7.30
Std. SDI          0.740                     0.879                     0.726

Allocating an accession with a set of measurement and attribute values to the core subset with pre-assigned frequency densities

• For each accession k in the base collection, compute the LEAV index based on the pre-assigned frequency densities of the traits under consideration fixed for the proposed core subset. Add to this the length of the information code, ln(S/n1), that corresponds to the core size, where S is the base collection size and n1 is the core size. This is denoted by F(k,1).

• Compute the corresponding LEAV index for each accession based on the frequency densities of the non-core group. Add to this the length of the information code, ln(S/n2), that corresponds to the non-core group size, where S is the base collection size and n2 is the non-core group size. This is denoted by F(k,2).

• If F(k,1) ≤ F(k,2), the accession is allocated to the core subset; otherwise it is allocated to the non-core group.

• Continue until all accessions are screened.

Descriptor            States           Freq. in       Freq. in           c[m,d,t] for   c[m,d,t] for
                                       core group *   non-core group *   core group     non-core group
Weather marks         Present          146            498                0.2407         0.0159
                      Absent           39             7                  1.5422         4.1491
Ivory marks           Present          153            505                0.1942         0.0020
                      Absent           32             0                  1.7346         6.2285
Bud germpore          Apical           119            462                0.4490         0.0928
                      Sub-Apical       46             43                 1.3863         2.4463
                      Median           20             0                  2.1919         6.2305
Geographical origin   New Guinea       62             302                1.1196         0.5265
                      Indonesia        26             42                 1.9669         2.4791
                      New Caledonia    19             17                 2.2670         3.3499
                      India            15             7                  2.4901         4.1608
                      Fiji             13             5                  2.6236         4.4485
                      Hawaii           6              0                  3.3168         6.2403
                      Mauritius        3              0                  3.8764         6.2403
                      Unknown origin   41             132                1.5250         1.3499

F[k,1] = ln(690/185) + 0.2407 + 1.7346 + 1.3863 = 4.6779; F[k,2] = ln(690/505) + 0.0159 + 6.2285 + 2.4463 = 9.0028

Since F[k,1] < F[k,2], this accession is allocated to the core subset.
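The allocation rule can be sketched in Python as below. The document does not say how the code lengths c[m,d,t] are derived from the pre-assigned densities, so the sketch takes them as inputs; the values are copied from the table above for three descriptors. The sketch applies the rule in a single pass and does not itself enforce the pre-assigned core size.

```python
import math


def allocation_scores(row, core_codes, noncore_codes, S, n1):
    """Return (F(k,1), F(k,2)) for one accession.

    row:           {descriptor: state} for accession k
    core_codes:    {descriptor: {state: code length}} for the core subset
    noncore_codes: the same for the non-core group
    S, n1:         base collection size and core size (n2 = S - n1)
    """
    n2 = S - n1
    f1 = math.log(S / n1) + sum(core_codes[d][s] for d, s in row.items())
    f2 = math.log(S / n2) + sum(noncore_codes[d][s] for d, s in row.items())
    return f1, f2


def split_core(rows, core_codes, noncore_codes, S, n1):
    """Single pass: core if F(k,1) <= F(k,2), otherwise non-core."""
    core, noncore = [], []
    for k, row in enumerate(rows):
        f1, f2 = allocation_scores(row, core_codes, noncore_codes, S, n1)
        (core if f1 <= f2 else noncore).append(k)
    return core, noncore


# c[m,d,t] values for three descriptors, copied from the table above.
CORE = {"weather": {"Present": 0.2407, "Absent": 1.5422},
        "ivory": {"Present": 0.1942, "Absent": 1.7346},
        "germpore": {"Apical": 0.4490, "Sub-Apical": 1.3863, "Median": 2.1919}}
NONCORE = {"weather": {"Present": 0.0159, "Absent": 4.1491},
           "ivory": {"Present": 0.0020, "Absent": 6.2285},
           "germpore": {"Apical": 0.0928, "Sub-Apical": 2.4463, "Median": 6.2305}}

if __name__ == "__main__":
    acc = {"weather": "Present", "ivory": "Absent", "germpore": "Sub-Apical"}
    f1, f2 = allocation_scores(acc, CORE, NONCORE, S=690, n1=185)
    print(f"F(k,1) = {f1:.4f}, F(k,2) = {f2:.4f}")  # F(k,1) = 4.6779, F(k,2) = 9.0028
```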
