Constituting Core collections of Germplasm using morphological descriptors
R. Balakrishnan
Sugarcane Breeding Institute, Coimbatore - 641 007
Major issues involved in the management of a large gene bank (germplasm collection) are:
• Year-to-year maintenance of large germplasm collections requires an enormous amount of land, time, labour and other resources
• The use of the collection is limited by lack of knowledge of the way in which genetic diversity is distributed in the collection
• The users are not fully aware of the variation in the collection that could benefit their breeding programmes or enrich their research projects
• It is difficult to decide whether gaps exist or whether new material has to be added to the collection
How to overcome the problems in utilizing large germplasm collections?
• By short-listing the field evaluated germplasm & by earmarking a set of accessions holding promise for one or more traits – called a working collection (Harlan, 1972)
• By adopting the concept of Core Collection (Frankel, 1984) in identifying such limited sets for effective use of the collection
What’s a Core Collection?
A core collection or core subset is a sub-sample of the base collection (about 5 – 20% of the size of the base collection).
It is sampled in such a manner as to represent the available genetic variability in the base collection to the maximum possible extent with minimum duplication (or redundancy).
Scientific basis for setting up a core collection (Brown 1989)
• The first reason is based on statistical sampling considerations, which essentially assume that breeders, through crossing and selection, could recover desirable alleles from the core collection when required. Hence, in principle, they need access to only one copy of each such allele
• The second reason relates to the genetic structure of plant populations in general and germplasm collections in particular
• The third reason relates to easier management and better access and exploitation of the germplasm collections
Advantages of core collections
• For breeders a core collection represents a logical first step in screening desirable alleles in the collection
• Setting up a core collection is important in understanding the quality of the base collection itself as it helps in elucidating the contents, diversity and duplication in the base collection
• It helps in deciding the quantity of conserved seed stocks that need to be preserved: smaller seed stocks for the reserve collection (or non-core set) and larger seed stocks for the core collection.
• The time and resources needed to evaluate a new trait in the collection are reduced, which allows evaluation of a larger number of characters and the use of more sophisticated techniques such as molecular markers
Core collection Scenario
The reports on the Global Survey on core collections by the International Plant Genetic Resources Institute (IPGRI) indicate that at least 63 core collections covering 51 crop species have been formed across the world
Procedures of constituting a core collection
• Use compiled data on Passport & evaluation of qualitative and quantitative traits – from germplasm catalogues
• Constitute appropriate groups wherever possible
• Use a suitable sampling procedure to select the entries for the core collection
• Verification and validation of the selected core collection
• In some instances, the assembly of core collections has been based on a combination of morphological data, biochemical and molecular markers
Statistical / Sampling methods for constituting a core collection
• Simple Random sampling (no need for evaluation data on morphological descriptors)
• Stratified Random Sampling, i.e. first we discover some structure in the base collection by forming groups through
– Stratification on the basis of Geographical Origin of accessions in the base collection (Passport data needed)
– Stratification on the basis of Multivariate Cluster Analysis (evaluation data on morphological descriptors required)
– Stratification on the basis of a combination of Geographical origin & cluster analysis, or other schemes that are applicable to the crop species (both passport and evaluation data required)
• Purposive or directed sampling using the Principal Component Scores (Noirot et al. 1996) aimed at maximizing the diversity in the core or Information Measure (Balakrishnan, 2002).
How to decide the optimum number of entries in the core collection?
• By studying the relative efficiencies of different stratified sampling procedures through simulation - by estimating the sampling variance of a diversity measure for varying sample sizes.
• Normally the sampling variance of a pooled Shannon Diversity Index (SDI) of the descriptors is a useful criterion.
• Since the sampling variance of the pooled SDI cannot readily be estimated through formulas, we resort to simulation or bootstrap procedures to estimate it, and decide the core collection size as the one beyond which there is no appreciable reduction in the sampling variance.
• The best stratification method is decided as the one for which even for a smaller sample size, there is a high value of diversity with minimum sampling variance of the diversity measure.
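The simulation idea in the bullets above can be sketched as follows. This is a minimal illustration, not the author's program: the base collection is made-up data, the function names are hypothetical, the pooled SDI is assumed here to be the sum of the per-descriptor SDI values, and simple random sampling stands in for whichever stratified scheme is being studied.

```python
import math
import random
from collections import Counter

def shannon_index(values):
    """Shannon diversity index, -sum(p * ln p), for one qualitative descriptor."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def pooled_sdi(accessions):
    """ASSUMPTION: 'pooled' SDI is taken as the sum of per-descriptor SDI values."""
    n_desc = len(accessions[0])
    return sum(shannon_index([acc[d] for acc in accessions]) for d in range(n_desc))

def sdi_mean_and_variance(base, sample_size, n_rep=200, seed=0):
    """Draw n_rep random subsets and return the mean and variance of pooled SDI."""
    rng = random.Random(seed)
    vals = [pooled_sdi(rng.sample(base, sample_size)) for _ in range(n_rep)]
    mean = sum(vals) / n_rep
    var = sum((v - mean) ** 2 for v in vals) / (n_rep - 1)
    return mean, var

# Hypothetical base collection: 690 accessions x 3 qualitative descriptors.
rng = random.Random(1)
base = [(rng.choice("AAAB"), rng.choice("XYZ"), rng.choice("PQQQQ")) for _ in range(690)]

results = {n: sdi_mean_and_variance(base, n) for n in (70, 140, 210)}
for n, (mean, var) in results.items():
    print(n, round(mean, 4), round(var, 6))
```

Plotting the variance against sample size and looking for the point where the curve flattens is one way to read off a candidate core size.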
Constituting a core subset by Stratified Random Sampling method
• Having decided the size of the core collection, say 10% of the base collection size, allocate accessions at random from each group to the core subset using one of the following:
– Proportional to Frequency method (P strategy), when group diversity is proportional to group size.
– Proportional to Logarithm of Frequency method (L strategy), when group sizes differ widely.
– Constant Frequency method (C strategy), when diversity is concentrated in smaller groups.
– Proportional to diversity method, when sampling depends on some measure of diversity of each group.
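The allocation strategies above can be sketched as below. The function name `allocate`, the strategy code "D" for the diversity-proportional method, and the rule of capping each allocation at the group size are assumptions made for illustration. The group sizes are the seven geographical-origin groups of S. officinarum listed in these slides (690 accessions in all).

```python
import math

def allocate(group_sizes, core_size, strategy="P", diversity=None):
    """Number of accessions to draw from each group (hypothetical helper)."""
    if strategy == "P":        # proportional to group frequency
        weights = [float(n) for n in group_sizes]
    elif strategy == "L":      # proportional to log of group frequency
        weights = [math.log(n) for n in group_sizes]
    elif strategy == "C":      # constant number per group
        weights = [1.0] * len(group_sizes)
    elif strategy == "D":      # proportional to a diversity measure per group
        weights = list(diversity)
    else:
        raise ValueError("unknown strategy: " + strategy)
    total = sum(weights)
    # ASSUMPTION: a group cannot contribute more accessions than it holds.
    return [min(n, round(core_size * w / total)) for n, w in zip(group_sizes, weights)]

sizes = [364, 68, 36, 18, 22, 9, 173]   # groups by geographical origin
core = 69                                # 10% of 690

p_alloc = allocate(sizes, core, "P")
l_alloc = allocate(sizes, core, "L")
c_alloc = allocate(sizes, core, "C")
print(p_alloc, l_alloc, c_alloc)
```

Because of rounding and capping, the allocations need not add up exactly to the target core size; a real implementation would redistribute the difference.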
Grouping of S. officinarum accessions for stratified sampling of the core set
Method of grouping: Geographical origin

| Group ID | Accessions from | No. of accessions | Group pooled SDI |
|---|---|---|---|
| 1 | New Guinea | 364 | 31.01 |
| 2 | Indonesia | 68 | 29.61 |
| 3 | New Caledonia | 36 | 27.82 |
| 4 | Fiji | 18 | 26.17 |
| 5 | India | 22 | 27.57 |
| 6 & 7 | Hawaii & Mauritius | 9 | 23.65 |
| 8 | Other regions | 173 | 31.19 |

Between group : Within group Diversity Component = 58 : 42
Method of grouping: Cluster Analysis + major sources within clusters

| Group ID | Accessions from | No. of accessions | Group pooled SDI |
|---|---|---|---|
| 1 | New Guinea in cluster I | 165 | 30.50 |
| 2 | Indonesia in cluster I | 32 | 31.65 |
| 3 | New Caledonia in cluster I | 22 | 32.13 |
| 4 | Other regions in cluster I | 61 | 33.18 |
| 5 | Cluster 2 | 73 | 30.95 |
| 6 | Cluster 3 | 38 | 31.31 |
| 7 | New Guinea in cluster 4 | 169 | 30.09 |
| 8 | Indonesia in cluster 4 | 25 | 29.79 |
| 9 | Others in cluster 4 | 83 | 29.79 |
| 10 | Cluster 5 | 22 | 34.87 |
• Principal Component Analysis Method of Noirot et al. (1996)
– In this method a Principal Component Analysis is carried out using quantitative traits data of the base collection
– The contribution of the i-th accession to the total variance of the system is computed as:
$$P_i = \sum_{j=1}^{t} y_{ij}^2$$
– where $y_{ij}$ is the component score of the i-th accession on the j-th principal component and t is the number of principal components extracted
– Then, for each accession in the base collection, its relative contribution to the total GSS (Generalized Sum of Squares) is computed as follows:
$$Cr_i = \frac{P_i \times 100}{p \times t}$$
where p = no. of accessions and t = no. of traits; (p x t) is called the Generalized Sum of Squares, or GSS in short
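A minimal sketch of the relative contribution is given below. It assumes that the traits are standardized and that all t components are retained, which is the case in which the total sum of squares equals p x t. Under that assumption the principal component rotation preserves lengths, so the sum of squared component scores of an accession equals the sum of its squared standardized trait values, and the sketch uses this shortcut instead of running the PCA itself. The data and the function name are hypothetical.

```python
import math

def relative_contributions(rows):
    """Cr_i = 100 * P_i / (p * t) for each accession (row) of quantitative traits.

    ASSUMPTION: traits are standardized with the population SD and all
    components are kept, so P_i (sum of squared component scores) equals the
    sum of squared standardized trait values of accession i.
    """
    p, t = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / p for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / p) for c, m in zip(cols, means)]
    contrib = []
    for row in rows:
        p_i = sum(((v - m) / s) ** 2 for v, m, s in zip(row, means, sds))
        contrib.append(100.0 * p_i / (p * t))
    return contrib

# Hypothetical data: 5 accessions x 2 traits; the last accession is an outlier.
rows = [(5.0, 20.0), (5.2, 21.0), (4.8, 19.0), (5.1, 20.5), (9.6, 35.0)]
cr = relative_contributions(rows)
print([round(c, 2) for c in cr])
```

The contributions add up to 100, and accessions far from the centroid receive the largest share. If only the leading components were retained, an explicit PCA would be needed instead of this shortcut.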
PCA Method – contd.
• The accessions in the base collection are then arranged in the descending order of magnitude of their contribution to the GSS; and the cumulative contribution of successive accessions to the GSS is also computed.
• A logistic regression model of the form $\ln\left[\frac{y}{A-y}\right] = a + b\,n$, i.e. $\frac{y}{A-y} = \exp(a + b\,n)$, is fitted to the cumulative values, where y is the cumulative contribution after n accessions and A is its upper limit.
• The rate of progress for this model is $dy/dn = (b/A)\,y\,(A-y)$, i.e. it is proportional to $y(A-y)$.
• Either a fixed percent (say 5-10%) of the top accessions is selected to form the core set, or the top accessions are included in the core set until the point at which the rate of increase in the contribution of the accessions to the GSS starts declining (see the fig. in next slide).
• This method is useful for reducing the redundancy in the core set.
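The ranking step and the fixed-percent rule can be sketched as follows. The function name `select_top` and the example contributions are hypothetical, and the logistic fit is left out; the sketch only orders the accessions, accumulates their contributions and keeps a fixed fraction.

```python
def select_top(contributions, fraction=0.10):
    """Rank accessions by contribution to GSS and keep the top fraction.

    Returns the indices of the selected accessions and the cumulative
    contribution curve over the ranked accessions.
    """
    order = sorted(range(len(contributions)),
                   key=lambda i: contributions[i], reverse=True)
    n_core = max(1, round(fraction * len(contributions)))
    cumulative, running = [], 0.0
    for i in order:
        running += contributions[i]
        cumulative.append(running)
    return order[:n_core], cumulative

# Hypothetical relative contributions (percent of GSS) for 5 accessions.
core_ids, curve = select_top([5, 40, 10, 25, 20], fraction=0.4)
print(core_ids, curve)
```

The value of the cumulative curve at the core size is the share of the GSS retained by the selected accessions.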
PCS Method – contd.
[Figure: Rate of progress of cumulative contribution to the variance by the accessions]
• The second method is similar to PCS method, but here each accession is ranked based on an Information Measure (called the Length of Encoded Attribute Value-LEAV) and the top ranked accessions are included in the core set (Balakrishnan, 2002).
• LEAV is evaluated based on the concepts of Information Theory (Shannon, 1948, Wallace and Boulton, 1968)
• Each entry in the base collection is assigned a score by combining the evaluation data on a number of characters that are either qualitative or quantitative in nature.
• LEAV can be treated as a diversity measure that tells how far each individual is distributed away from the centroid of all individuals.
• It can be used to group the accessions in a way similar to cluster analysis (but we use a divisive algorithm for clustering of the accessions)
• A typical example of computing LEAV is illustrated in the next slide
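| Descriptor | Descriptor state | Freq | -ln(p) |
|---|---|---|---|
| Weather marks | Present * | 146 | 0.2367 |
| Weather marks | Absent | 39 | 1.5422 |
| Geographical origin | New Guinea | 62 | 1.1196 |
| Geographical origin | Indonesia * | 26 | 1.9622 |
| Geographical origin | New Caledonia | 19 | 2.2670 |
| Geographical origin | India | 15 | 2.4901 |
| Geographical origin | Fiji | 13 | 2.6236 |
| Geographical origin | Hawaii | 6 | 3.3168 |
| Geographical origin | Mauritius | 3 | 3.8764 |
| Geographical origin | Unknown origin | 41 | 1.5250 |

(* marks the descriptor state of the example accession; only two of its four descriptors are shown in the table.)

LEAV = 0.2367 + 1.7546 + 1.3917 + 1.9622 = 5.3452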
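The two starred terms of the example can be reproduced with a few lines of code. This is a minimal sketch in which each term is taken as the plain estimate -ln(f/N), with f the frequency of the state and N = 185 the number of entries; the function names are hypothetical. The tabulated values for the non-starred states differ slightly from this plain estimate, so the source evidently applies an adjustment that the slide does not spell out.

```python
import math

# Frequencies from the example table (185 entries in total).
weather = {"Present": 146, "Absent": 39}
origin = {"New Guinea": 62, "Indonesia": 26, "New Caledonia": 19, "India": 15,
          "Fiji": 13, "Hawaii": 6, "Mauritius": 3, "Unknown origin": 41}

def code_length(freqs, state):
    """-ln(p) for one descriptor state, with p estimated as f / N."""
    total = sum(freqs.values())
    return -math.log(freqs[state] / total)

def leav(descriptor_freqs, states):
    """Sum of code lengths over the descriptors recorded for one accession."""
    return sum(code_length(f, s) for f, s in zip(descriptor_freqs, states))

# Only two of the four descriptors of the example accession are tabulated,
# so this is a partial sum, not the full LEAV of 5.3452.
partial = leav([weather, origin], ["Present", "Indonesia"])
print(round(code_length(weather, "Present"), 4),
      round(code_length(origin, "Indonesia"), 4),
      round(partial, 4))
```

Rare states give long codes, so accessions carrying unusual combinations of states obtain a high LEAV.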
Grouping of the accessions on the basis of LEAV Index
1. The computed LEAV index for the entries can be arranged in the form of a frequency distribution and the entries divided into L strata, with stratum boundaries x1, x2,…..x(L-1)
2. An optimum stratification strategy can be arrived at such that the pooled variance of LEAV index evaluated through the stratification is minimum. The stratum boundaries are fixed by using the Dalenius formula (Jarque, 1981) through an interactive computer program
Advantages of using LEAV for clustering
• Multivariate cluster analysis of quantitative traits through hierarchical methods becomes complicated and unwieldy when the number of accessions is large.
• In general, a small proportion of accessions (say about 100 entries) is selected at random from the main set and clustered, and the remaining entries are then assigned to the clusters already formed (k-means clustering).
• Cluster constitution may differ depending upon the initial selection.
• In most cases, we tend to leave out qualitative traits in these methods, though there are procedures for rescaling qualitative attributes to quantitative values
Advantages of LEAV for clustering (cont)
• LEAV is very easy to compute and we can include all evaluation data (including passport data) that are qualitative or quantitative.
• Class-intervals of Quantitative data can be treated as attributes and hence can be used in a similar way to that of qualitative attribute values.
• All accessions (even thousands) can be included in one step and a divisive algorithm can be used to form the accessions into diversity groups and these groups can be used for stratified random sampling to constitute the core collection.
• See the references cited in the lecture notes for further details.
• Verification of the core subsets constituted through various methods
– by evaluating the retention level of diversity in the core subsets
– by evaluating the retention of association among closely related traits in the core subsets, through correlation and Factor Analysis
– by evaluating the redundancy levels in the core subsets, using the empirical distribution of a Similarity Index
Measures of Diversity
• Quantitative traits
– Range
– Standard Deviation (SD)
– Coefficient of Variation (CV)
• Qualitative traits
– Shannon-Weaver Diversity Index
Shannon-Weaver Diversity Index (SDI): 1.4050; Std. Err. of Shannon Diversity Index: 0.0314; Standardized value of SDI: 0.7842

Diversity as measured by Shannon Diversity Index (SDI) for qualitative descriptors in the whole collection of S. officinarum

| Descriptor $ | SDI | Standardized SDI # |
|---|---|---|
| 1. Ivory marks (2) | 0.245 | 0.353 |
| 2. Weather marks (2) | 0.188 | 0.271 |
| 3. Internode shape (6) | 1.405 | 0.784 |
| 4. Internode alignment (2) | 0.482 | 0.695 |
| 5. Internode wax (5) | 1.263 | 0.785 |
| 6. Growth cracks (2) | 0.672 | 0.970 |
| 7. Stripes on cane (2) | 0.427 | 0.615 |
| 8. Bud shape (11) | 1.696 | 0.707 |
| 9. Bud germpore (3) | 0.512 | 0.466 |
| 10. Bud groove (3) | 0.926 | 0.843 |
| 11. Growth ring swelling (3) | 0.748 | 0.681 |
| 12. Leaf upper surface (2) | 0.192 | 0.277 |
| 13. Leaf carriage (3) | 0.859 | 0.782 |
| 14. Sheath prickles (5) | 1.403 | 0.872 |
| 15. Sheath clasping (2) | 0.627 | 0.905 |
| 16. Ligule shape (12) | 2.180 | 0.877 |
| 17. Ligular process symmetry (2) | 0.492 | 0.710 |

$: Figures in parentheses are the corresponding number of descriptor states
#: Standardized SDI = SDI / Loge(No. of descriptor states); its value ranges from 0 to 1
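The SDI and its standardized form can be checked with a short sketch. The function names are hypothetical; the frequencies 581, 89 and 20 are the bud germpore counts for the whole collection of 690 accessions reported elsewhere in these slides, and the result agrees with row 9 of the table.

```python
import math

def shannon_index(freqs):
    """SDI = -sum(p_i * ln p_i) over the descriptor states with non-zero frequency."""
    n = sum(freqs)
    return -sum((f / n) * math.log(f / n) for f in freqs if f > 0)

def standardized_sdi(freqs):
    """SDI divided by ln(number of descriptor states); lies between 0 and 1."""
    return shannon_index(freqs) / math.log(len(freqs))

bud_germpore = [581, 89, 20]   # Apical, Sub-Apical, Median
sdi = shannon_index(bud_germpore)
std_sdi = standardized_sdi(bud_germpore)
print(round(sdi, 3), round(std_sdi, 3))
```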
Mean diversity and its sampling variance for the core subsets drawn from the whole collection of S. officinarum through simple random sampling and stratified random sampling using different stratification procedures

Grouping A: 10 groups based on geographical distribution within major clusters
Grouping B: 8 groups based on geographical origin only
Grouping C: 12 groups based on the LEAV index

1. Simple random sampling (common to all the 3 methods of grouping)

| Sample size | Mean pooled SDI * | Variance (pooled SDI) |
|---|---|---|
| 70 | 30.90 | 0.3132 |
| 100 | 31.25 | 0.1285 |
| 140 | 31.39 | 0.1364 |
| 170 | 31.54 | 0.1009 |
| 210 | 31.57 | 0.0680 |

Stratified random sampling

| Allocation method | Sample size | A: Mean pooled SDI * | A: Variance (pooled SDI) | B: Mean pooled SDI | B: Variance (pooled SDI) | C: Mean pooled SDI | C: Variance (pooled SDI) |
|---|---|---|---|---|---|---|---|
| 2. Frequency proportion method | 70 | 30.92 | 0.1849 | 30.77 | 0.2023 | 30.89 | 0.0181 |
| | 100 | 31.29 | 0.1177 | 31.11 | 0.1432 | 31.22 | 0.0100 |
| | 140 | 31.38 | 0.0999 | 31.42 | 0.1117 | 31.33 | 0.0050 |
| | 170 | 31.50 | 0.0787 | 31.49 | 0.0724 | 31.48 | 0.0037 |
| | 210 | 31.53 | 0.0574 | 31.56 | 0.0798 | 31.50 | 0.0028 |
| 3. Square root proportion method | 70 | 31.27 | 0.2673 | 30.87 | 0.2454 | 31.06 | 0.0166 |
| | 100 | 31.65 | 0.1361 | 31.14 | 0.1757 | 31.35 | 0.0094 |
| | 140 | 31.84 | 0.0761 | 31.40 | 0.1083 | 31.52 | 0.0063 |
| | 170 | 31.94 | 0.0662 | 31.45 | 0.0891 | 31.62 | 0.0039 |
| | 210 | 31.96 | 0.0463 | 31.57 | 0.0636 | 31.68 | 0.0025 |
| 4. Log frequency method | 70 | 31.48 | 0.2318 | 30.79 | 0.2486 | 31.21 | 0.0187 |
| | 100 | 31.83 | 0.1037 | 31.10 | 0.1163 | 31.42 | 0.0096 |
| | 140 | 31.95 | 0.0717 | 31.36 | 0.1275 | 31.70 | 0.0067 |
| | 170 | 32.13 | 0.0680 | 31.37 | 0.0720 | 31.73 | 0.0033 |
| | 210 | 32.16 | 0.0359 | 31.60 | 0.0540 | 31.86 | 0.0025 |
| 5. Diversity proportional method | 70 | 31.06 | 0.2566 | 30.87 | 0.2085 | 31.50 | 0.0236 |
| | 100 | 31.31 | 0.1495 | 31.21 | 0.1840 | 31.87 | 0.0115 |
| | 140 | 31.58 | 0.1103 | 31.39 | 0.1280 | 31.97 | 0.0057 |
| | 170 | 31.58 | 0.0764 | 31.50 | 0.1030 | 32.14 | 0.0037 |
| | 210 | 31.68 | 0.0495 | 31.57 | 0.0660 | 32.14 | 0.0030 |
| 6. Equal frequency method | 70 | 31.71 | 0.2287 | Not considered | Not considered | 31.19 | 0.0185 |
| | 100 | 32.02 | 0.1354 | Not considered | Not considered | 31.43 | 0.0078 |
| | 140 | 32.26 | 0.0746 | Not considered | Not considered | 31.65 | 0.0055 |
| | 170 | 32.29 | 0.0522 | Not considered | Not considered | 31.73 | 0.0052 |
| | 210 | 32.42 | 0.0453 | Not considered | Not considered | 31.80 | 0.0017 |
Evaluation of retention of diversity in a core subset
1. Retention (%) of a diversity measure = $\frac{1}{m}\sum_{k=1}^{m}\left[\frac{\text{Core diversity}_k}{\text{Base diversity}_k}\right] \times 100$, summed over the m characters
2. Retention of variance or GSS = $\sum_i Cr_i$, the sum of the contributions of the individual accessions in the core subset to the total variance.
Verification of Core subsets selected through Purposive Sampling in S. officinarum
Core subset size = 20% (140 accessions)

| Method of sampling | Grouping criterion | Allocation strategy | Retention % of Range | Retention % of CV | Retention % of GSS | Retention % of SDI | Total Retention of Diversity |
|---|---|---|---|---|---|---|---|
| PCS-method | NIL | NIL | 99.7 | 137.3 | 43.33 | 104.47 | 73.90 |
| PCS-method | Cluster | P | 99.8 | 135.0 | 42.01 | 102.24 | 72.12 |
| PCS-method | Cluster | L | 98.9 | 137.6 | 41.86 | 105.49 | 73.67 |
| PCS-method | Cluster | C | 98.8 | 136.0 | 41.04 | 107.61 | 74.32 |
| PCS-method | Origin | P | 99.1 | 136.8 | 42.50 | 104.55 | 73.53 |
| PCS-method | Origin | L | 98.1 | 132.3 | 39.22 | 108.89 | 74.05 |
| LEAV Index | NIL | NIL | 97.7 | 134.8 | 34.69 | 124.36 | 79.53 |
| LEAV Index | Cluster | P | 95.0 | 126.9 | 33.50 | 126.12 | 79.81 |
| LEAV Index | Cluster | L | 96.1 | 129.7 | 33.54 | 127.14 | 80.34 |
| LEAV Index | Cluster | C | 98.9 | 128.9 | 33.46 | 126.54 | 80.00 |
| LEAV Index | Origin | P | 97.7 | 132.8 | 34.79 | 129.62 | 82.21 |
| LEAV Index | Origin | L | 96.6 | 126.6 | 31.79 | 126.65 | 79.22 |
General considerations
• Need to optimize methods to determine the size of the core collection
• Cluster Analysis using a large base collection is cumbersome
• There should be scope for the user to pre-determine the extent of diversity or variation that he would like to have in the core collection for various traits
• Including accessions with missing data for one or more traits needs consideration
• The personal knowledge about the collection by the gene bank curator is also essential in selecting the accessions to the core subset.
Evaluation of retention of association among quantitative traits
• Through Factor Analysis, the major factors can be identified, and the associated factor loadings on the individual traits can be evaluated in both the base collection and the core subset.
• A comparison of the factor loadings can then be used to infer whether the association among the traits in the base collection is also retained in the core subset.
[Table: Retention of association among groups of quantitative traits in core subsets obtained through random and purposive sampling in S. officinarum. Columns: Variable; Whole collection; Sample size = 15% (100): SRS, PCS-based, LEAV Index-based; Sample size = 20% (140): SRS, PCS-based, LEAV Index-based]
1. A measure of similarity between any two accessions in the core collection is computed by making use of the available information on quantitative and qualitative traits. Likewise, the similarity index can be computed for all possible pairs of accessions (there would be N(N-1)/2 such coefficients for N accessions)
2. The empirical distribution of the similarity coefficient is then tabulated
3. The relative frequencies of accession-pairs falling within ranges of the similarity coefficient, such as 0-0.5 (least similar), 0.5-0.7 (moderately similar) and more than 0.7 (highly similar), can then be analyzed
Computation of Similarity Coefficient
• For any pair of accessions i and j in the core subset, the similarity coefficient can be computed using the formula (Gower Metric):
$$v_{ij} = 1 - \frac{1}{m}\sum_{p=1}^{m}\frac{1}{R_p}\,\left|x_{ip} - x_{jp}\right|$$
where m is the number of descriptors (which includes the qualitative, binary and quantitative types considered for the analysis); $R_p$ is the range in the case of a quantitative descriptor, or 1 otherwise; $x_{ip}$ and $x_{jp}$ are respectively the values of the p-th descriptor for the i-th and the j-th accessions.
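A minimal sketch of the similarity coefficient and of its empirical distribution follows. The three accessions are made-up data and the function names are hypothetical. For qualitative and binary descriptors the difference is taken as 0 when the two states match and 1 otherwise, which is the usual Gower convention and is an assumption here, since the slide does not state it. The class boundaries 0.5 and 0.7 are those mentioned in the redundancy procedure above.

```python
from itertools import combinations

def gower_similarity(a, b, ranges):
    """v_ij = 1 - (1/m) * sum(|x_ip - x_jp| / R_p).

    ranges[p] is the range R_p of a quantitative descriptor, or None for a
    qualitative/binary descriptor (ASSUMPTION: mismatch counts as 1, match as 0).
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0
        else:
            total += abs(x - y) / r
    return 1.0 - total / len(ranges)

def similarity_distribution(accessions, ranges, bounds=(0.5, 0.7)):
    """Count accession pairs in the classes <0.5, 0.5-0.7 and >=0.7."""
    counts = [0] * (len(bounds) + 1)
    for a, b in combinations(accessions, 2):
        v = gower_similarity(a, b, ranges)
        counts[sum(v >= bound for bound in bounds)] += 1
    return counts

# Hypothetical accessions: (leaf width in cm, bud germpore, weather marks 0/1)
accessions = [(5.0, "Apical", 1), (6.0, "Apical", 0), (9.0, "Median", 0)]
ranges = (4.0, None, None)          # leaf width spans 5.0-9.0 in this toy set

v01 = gower_similarity(accessions[0], accessions[1], ranges)
counts = similarity_distribution(accessions, ranges)
print(round(v01, 4), counts)
```

A core subset with few pairs in the highest similarity class carries little redundancy.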
Distribution of similarity coefficient among all possible pairs of accessions in S. officinarum (as relative % of total number of accession pairs)

| Range of similarity coefficient | 15% core: Random sampling $ | 15% core: PCS-method | 15% core: LEAV Index | 20% core: Random sampling $ | 20% core: PCS-method | 20% core: LEAV Index |
|---|---|---|---|---|---|---|
| 0.0 – 0.1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 0.1 – 0.2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 0.2 – 0.3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 0.3 – 0.4 | 0.19 | 0.40 | 1.29 | 0.11 | 0.26 | 0.79 |
| 0.4 – 0.5 | 4.47 | 8.42 | 19.54 | 3.77 | 6.75 | 15.72 |
| 0.5 – 0.6 | 27.52 | 35.69 | 49.15 | 25.86 | 33.09 | 48.44 |
| 0.6 – 0.7 | 45.77 | 41.27 | 25.72 | 47.10 | 43.54 | 30.06 |
| 0.7 – 0.8 | 20.13 | 12.83 | 4.06 | 21.07 | 15.10 | 4.68 |
| 0.8 – 0.9 | 1.90 | 1.37 | 0.22 | 2.05 | 1.24 | 0.29 |
| 0.9 – 1.0 | 0.02 | 0.02 | 0.02 | 0.04 | 0.02 | 0.02 |
Constituting core subsets with pre-assigned frequency distributions for several traits simultaneously
Core subsets constituted through simple or stratified random sampling in general represent the diversity in the base collection for a reasonable core size, but they do not satisfy the needs of a specific user.
Core subsets constituted through purposive sampling using the PCS-method or the LEAV index method, on the other hand, are expected to result in a higher level of diversity than random sampling, but it is not possible to predict the pattern of variation of the individual traits in the core collection.
So, can the user of a genetic resource decide the pattern of diversity to be represented in the core collection?
As an example, an attribute (say, spiny leaves) may be present in 10% of the accessions and absent in 90% of the accessions in the base collection.
This indicates a standardized SDI of 0.47 (in the scale 0-1) for this descriptor.
Can the user of the genetic resource obtain a core subset with say, the attribute being present in 70% of the accessions and absent in 30% of the accessions in the core (with a standardized SDI of 0.88)?
This implies an entirely different distribution pattern of the descriptor states in the core subset and a better variation in the relative frequencies of the descriptor states.
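The two standardized SDI values quoted in this example follow from the definition (SDI divided by the natural log of the number of descriptor states, here 2). As a worked check, with $H'$ used here only as shorthand for the standardized SDI:

$$H'_{10:90} = \frac{-(0.1\ln 0.1 + 0.9\ln 0.9)}{\ln 2} = \frac{0.3251}{0.6931} \approx 0.47$$

$$H'_{70:30} = \frac{-(0.7\ln 0.7 + 0.3\ln 0.3)}{\ln 2} = \frac{0.6109}{0.6931} \approx 0.88$$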
If the variation in the relative frequency of attribute states of only one descriptor is pre-determined, then it is quite simple to draw such a core set.
However, when the frequency densities of several descriptors (qualitative or quantitative) are to be pre-assigned simultaneously for the core subset of a given size, the conventional sampling strategies are not adequate.
Hence a new technique for constituting such a user-defined core subset with pre-assigned core size and frequency distributions with respect to several traits has been proposed by Balakrishnan (2002)
Basic strategies in delineating a core subset with pre-determined density distributions with respect to several traits
• Decide a suitable core collection size (and hence the non-core group size).
• Decide the pattern of variation to be realized in the core subset (and hence for the non-core group also) for each of the selected traits (qualitative & quantitative traits)
• If stratification of the accessions in the base collection is available on some criterion (say, on the basis of geographical origin), the user may also decide the allocation pattern of accessions from the diversity groups to the core subset.
• Compute a pair of diversity indices for each accession in the base collection, one corresponding to the core subset and the other to the non-core group based on the joint density distributions.
• Any accession that has the least value of the diversity index on the core group is allocated to the core subset. The process is repeated until all accessions are screened.
Pre-assigning the frequency density of qualitative traits
• For illustrating the method, two types of frequency transformation can be used in pre-assigning the frequency density of the qualitative attributes, viz. Square-root- proportion and Log-frequency methods. These two frequency transformations result in a higher variation of the frequency proportions of the attribute states of a qualitative trait (and hence result in higher SDI values) in the core subset.
An example of fixing the descriptor state frequencies of a core subset as per the square-root proportion method

Descriptor: Bud germpore

| Descriptor state | Frequency in whole collection @ | Relative frequency in the whole collection (= pi) | Relative frequency fixed for the core subset (= qi) | Frequency in non-core group | Relative frequency in non-core group |
|---|---|---|---|---|---|
| Apical | 581 | 0.8420 | 0.6341 | 462 | 0.9149 |
| Sub-Apical | 89 | 0.1290 | 0.2482 | 43 | 0.0851 |
| Median | 20 | 0.0290 | 0.1177 | 0 | 0.0000 |
| Standardized SDI | | 0.4657 | 0.7992 | | 0.2782 |

Where $q_i = \sqrt{p_i} \,/\, \sum_{j=1}^{s}\sqrt{p_j}$ is the square-root proportion frequency transformation (s = number of descriptor states)
@ : Size of the whole collection = 690; Size of core collection = 185; Size of non-core group = 505
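The square-root proportion transformation can be checked against the bud germpore example with a few lines of code. The function name is hypothetical; the frequencies are those of the table.

```python
import math

def sqrt_proportion(freqs):
    """q_i = sqrt(p_i) / sum_j sqrt(p_j), with p_i the relative frequencies."""
    n = sum(freqs)
    roots = [math.sqrt(f / n) for f in freqs]
    total = sum(roots)
    return [r / total for r in roots]

# Bud germpore: Apical, Sub-Apical, Median in the whole collection (690 accessions)
q = sqrt_proportion([581, 89, 20])
print([round(v, 4) for v in q])
```

The transformation pulls the dominant state down and lifts the rare states, which is why the standardized SDI fixed for the core subset is higher than that of the whole collection.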
An example of fixing the descriptor state frequencies of the core subset as per the Log-Frequency method

Descriptor: Location of spines on OIB

| Descriptor state | Freq. in base collection (= Ni) | Relative frequency in the base collection (= pi) | Relative frequency fixed for the core subset (= qi) | Freq. in non-core group | Relative frequency in non-core subset |
|---|---|---|---|---|---|
| None | 133 | 0.0409 | 0.2101 | 45 | 0.0158 |
| Tip only | 62 | 0.0191 | 0.1476 | 0 | 0.0000 |
| Tip & few basal | 88 | 0.0271 | 0.1923 | 7 | 0.0026 |
| Tip & few apical | 6 | 0.0018 | 0.0143 | 0 | 0.0000 |
| Tip & all along the margin | 2961 | 0.9111 | 0.4357 | 2778 | 0.9816 |
| Standardized SDI | | 0.2489 | 0.8437 | | 0.0676 |

Where $q_i = \log(N_i) \,/\, \sum_{j=1}^{s}\log(N_j)$ is the Log-frequency transformation
Size of the main collection = 3250; Size of core = 420; Size of non-core group = 2830
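The plain Log-frequency transformation can be sketched as below; the function name is hypothetical. It reproduces the tabulated values for "None" (0.2101) and "Tip & few basal" (0.1923). The tabulated values for "Tip only" (0.1476 = 62/420) and "Tip & few apical" (0.0143 = 6/420) equal the whole stock of those states divided by the core size, which suggests that the pre-assigned share was capped where the base collection holds too few accessions. That reading is an inference from the numbers and is not stated on the slide, so the sketch does not attempt to reproduce it.

```python
import math

def log_frequency(freqs):
    """q_i = log(N_i) / sum_j log(N_j), with N_i the state frequencies."""
    logs = [math.log(f) for f in freqs]
    total = sum(logs)
    return [v / total for v in logs]

# Location of spines on OIB: frequencies in the base collection (3250 accessions)
spines = [133, 62, 88, 6, 2961]
q = log_frequency(spines)
print([round(v, 4) for v in q])
```

The base of the logarithm does not matter here because it cancels in the ratio.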
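Example of pre-assigned frequency profile of a quantitative descriptor in the core subset and the non-core group

Descriptor: Leaf width (cm)

| Class interval | Whole collection: Absolute freq. | Whole collection: Relative freq. | Core subset *: Absolute freq. | Core subset *: Relative freq. | Non-core group: Absolute freq. | Non-core group: Relative freq. |
|---|---|---|---|---|---|---|
| 2.50 - 3.50 | 27 | 0.039 | 17 | 0.092 | 10 | 0.020 |
| 3.50 - 4.50 | 179 | 0.259 | 39 | 0.211 | 140 | 0.277 |
| 4.50 - 5.50 | 312 | 0.452 | 67 | 0.362 | 245 | 0.485 |
| 5.50 - 6.50 | 137 | 0.199 | 34 | 0.184 | 103 | 0.204 |
| 6.50 - 7.50 | 31 | 0.045 | 24 | 0.129 | 7 | 0.014 |
| > 7.50 | 4 | 0.006 | 4 | 0.022 | 0 | 0.000 |
| Total size | 690 | 1.000 | 185 | 1.000 | 505 | 1.000 |

| Statistic | Whole collection | Core subset * | Non-core group |
|---|---|---|---|
| Mean | 5.03 | 5.16 | 4.98 |
| Std. Deviation | 0.872 | 1.173 | 0.724 |
| CV% | 17.33 | 22.75 | 14.55 |
| Range | 2.50-9.60 | 2.50-9.60 | 3.00-7.30 |
| Std. SDI | 0.740 | 0.879 | 0.726 |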
Allocating an accession with a set of measurement & attribute values to the core subset with pre-assigned frequency densities
• For each accession k in the base collection, compute the LEAV index based on the pre-assigned frequency densities fixed for the proposed core subset for the traits under consideration. Add to this the length of the information code, $\log_e(S/n_1)$, that corresponds to the core size, where S is the base collection size and $n_1$ is the core size. This is denoted by F(k,1).
• Compute the corresponding LEAV index for each accession based on the frequency densities of the non-core group. Add to this the length of the information code, $\log_e(S/n_2)$, that corresponds to the non-core group size, where S is the base collection size and $n_2$ is the non-core group size. This is denoted by F(k,2).
• If F(k,1) <= F(k,2), then the accession is allocated to the core subset, else it is allocated to the non-core group.
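The allocation rule can be illustrated with a minimal single-trait sketch. It uses the bud germpore densities of the square-root proportion example (S = 690, n1 = 185, n2 = 505); the function names are hypothetical, and a state with zero density in a group is given an infinite code length, which is an assumption made so that the comparison stays defined. In practice the code lengths of several traits would be summed for each accession.

```python
import math

def code_length(p):
    """-ln(p); ASSUMPTION: a state with zero density gets an infinite code length."""
    return math.inf if p == 0 else -math.log(p)

def allocate(states, core_density, noncore_density, S, n1, n2):
    """Return the group for one accession together with F(k,1) and F(k,2)."""
    f1 = sum(code_length(d[s]) for d, s in zip(core_density, states)) + math.log(S / n1)
    f2 = sum(code_length(d[s]) for d, s in zip(noncore_density, states)) + math.log(S / n2)
    return ("core" if f1 <= f2 else "non-core"), f1, f2

# Pre-assigned densities for bud germpore (one dict per trait).
core = [{"Apical": 0.6341, "Sub-Apical": 0.2482, "Median": 0.1177}]
noncore = [{"Apical": 0.9149, "Sub-Apical": 0.0851, "Median": 0.0}]

decisions = {}
for state in ("Apical", "Sub-Apical", "Median"):
    group, f1, f2 = allocate([state], core, noncore, S=690, n1=185, n2=505)
    decisions[state] = group
    print(state, group, round(f1, 4), round(f2, 4))
```

With a single trait every accession carrying the same state receives the same decision; it is the combination of several traits that separates the accessions.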