CCP4 Chicago 2011: Substructure Solution - GWDGshelx.uni-ac.gwdg.de/.../pdfs/ccp4-2011_tgruene_substructure.pdf · Tim Grüne Macromolecular Crystallography (in brief) Data Integration

CCP4 Workshop Chicago 2011
Phasing & Substructure Solution

Tim Grüne
Dept. of Structural Chemistry, University of Göttingen

June 2011
http://shelx.uni-ac.gwdg.de

Macromolecular Crystallography (in brief)

1. Data Integration: mosflm, xds, HKL2000, ...
2. (Anomalous) Differences: xprep, shelxc, solve, ...
3. Substructure Solution: shelxd, SnB, HySS, ...
4. Phasing & Density Modification: shelxe, resolve, pirate, ...
5. Building & Refinement: arp/warp, coot, refmac5, phenix, YOU!

Alternative route to phases: Molecular Replacement

Time scales given in the diagram: < 1h, < 1h, days–weeks

Motivation: The Phase Problem

An electron density map is required to create a crystallographic model.

The electron density is calculated from the complex structure factor F(hkl):

$$\rho(x, y, z) = \frac{1}{V_\text{cell}} \sum_{h,k,l} |F(h,k,l)|\, e^{i\phi(h,k,l)}\, e^{-2\pi i(hx+ky+lz)} \qquad (1)$$

Experiment measures: I(hkl), i.e. real numbers
Model building requires: F(hkl), i.e. complex numbers:

$$F(hkl) = |F(hkl)|\, e^{i\phi(hkl)}$$

Connection: $|F(hkl)| = \sqrt{I(hkl)}$

$\phi(hkl) = {?}$
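To see why the missing phases matter, the following minimal Python sketch evaluates a one-dimensional analogue of Eq. (1) twice: once with the true phases and once with all phases set to zero. The two-atom toy structure, the grid and all function names are assumptions made for illustration only.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """1D structure factors F(h) = sum_j f_j exp(+2*pi*i*h*x_j)."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(sf, x, v_cell=1.0):
    """1D analogue of Eq. (1): rho(x) = 1/V * sum_h F(h) exp(-2*pi*i*h*x)."""
    total = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in sf.items())
    return total.real / v_cell


# Hypothetical toy structure: (scattering factor, fractional coordinate)
atoms = [(6.0, 0.20), (8.0, 0.55)]
sf = structure_factors(atoms, h_max=10)
grid = [i / 100 for i in range(100)]

# Map from amplitudes AND phases: peaks at the atom positions
true_map = [density(sf, x) for x in grid]

# Map from amplitudes only (all phases set to zero)
amplitudes_only = {h: abs(f) for h, f in sf.items()}
zero_phase_map = [density(amplitudes_only, x) for x in grid]

peak_true = grid[true_map.index(max(true_map))]
peak_zero_phase = grid[zero_phase_map.index(max(zero_phase_map))]
```

With the true phases the highest peak sits on the heavier atom; with zero phases the highest peak moves to the origin and the atom positions are no longer recovered.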


Motivation: The Phase Problem

Solving the phase problem is one of the critical steps in macromolecular crystallography:

A diffraction experiment only delivers the amplitude |F (hkl)|, but not the phase φ(hkl) of the structure factor.

Without phases, a map cannot be calculated and no model can be built.


Solutions to the Phase Problem

The main methods to overcome the phase problem in macromolecular crystallography (i.e. to obtain initial phases):

• (multi/single) wavelength anomalous dispersion (MAD, SAD)
• (multiple/single) isomorphous replacement (MIR, SIR)
• molecular replacement (MR)

and combinations thereof (SIRAS, MR-SAD, ...).

The first two methods are so-called experimental phasing methods.

They require the solution of a substructure.


What’s a Substructure?

[Figure: unit cell with axes a, b, c containing the substructure atoms]

The substructure of a (crystal) structure is the set of coordinates of a subset of atoms within the same unit cell.

It can be any part of the actual structure. In the usual sense, substructure refers to the marker atoms that are used for phasing.

A real crystal with the properties of the substructure (atom distribution vs. unit cell dimensions) cannot exist: the atoms are too far apart for a stable crystal.


Small Molecule Crystallography

The substructure consists of a small number of atoms in a large unit cell.

In small molecule crystallography, the phase problem is usually easily solved:

A structure with not too many atoms (< 2000 non-hydrogen atoms) can be solved by means of ab initio methods from a single data set, provided the resolution is better than 1.2 Å (this is called “Sheldrick’s rule” [2]).


The Terms “Ab initio” and “Direct” Methods

Ab initio methods: phase determination directly from amplitudes, without prior knowledge of any atomic positions. Includes direct methods and the Patterson method. The most widely used of all ab initio methods are:

Direct methods: phase determination using probabilistic phase relations, usually the tangent formula (Nobel prize in Chemistry for H. A. Hauptman and J. Karle, 1985).


The Substructure

Central to phasing based on both anomalous dispersion and isomorphous replacement is the determination of the substructure.

The steps involve

1. Collect data set(s)
2. Create an artificial substructure data set
3. Determine the substructure coordinates with direct methods
4. Calculate phases (estimates) for the actual data
5. Improve phases (density modification)


What about Sheldrick’s Rule?

Sheldrick’s Rule says that direct methods only work when the data resolution is 1.2 Å or better.

Macromolecules seldom diffract to this resolution.

So why can we still use direct methods to determine the substructure?


Sheldrick’s Rule

Sheldrick’s Rule refers to real structures, and the 1.2 Å limit is about the distance at which single atoms can be resolved in the data.

The substructure is an artificial crystal, and atom distances within the substructure are well above the usual diffraction limits of macromolecules, even for 4 Å data.


Solving a Macromolecular Phase Problem with a Substructure

The substructure represents only a very small fraction of all atoms in the unit cell (one substructure atom per a few thousand atoms). Yet, knowing the coordinates of the substructure allows one to solve the phase problem for the full structure.


The Independent Atom Model (IAM)

“Ordinary” crystallography (other than ultra-high resolution charge density studies) is based on the “Independent Atom Model”: the total structure factor F(hkl) of each reflection (hkl) is the sum of the independent contributions from each atom in the unit cell.

[Figure: atom at (x, y, z) between the Miller planes (hkl) = (430); its phase φ = 2π(hx + ky + lz) is the relative distance to the Miller plane. Argand diagram (Re(F), Im(F)) with the independent contribution of each atom.]

The (vectorial) sum of the atom contributions yields the total structure factor F(hkl).

The experiment delivers only its modulus $|F(hkl)| = \sqrt{I(hkl)}$, but not its phase.


Isomorphous Replacement (1/2)

The Harker construction represents one way to determine the phase of F(hkl) from an isomorphous replacement experiment. It requires two data sets:

• Native, $\sqrt{I_\text{nat}(hkl)}$: lacking one or several atoms (the substructure)
• Derivative, $\sqrt{I_\text{der}(hkl)}$: same unit cell, same atoms plus substructure atoms


Isomorphous Replacement (2/2)

The substructure coordinates can be derived from the difference data set ($\sqrt{I_\text{der}(hkl)} - \sqrt{I_\text{nat}(hkl)}$).

The substructure phases φ(hkl) can be calculated from the substructure coordinates.

The phases of the native (or derivative) data set can be derived from

• measured native intensities $I_\text{nat}(hkl)$
• measured derivative intensities $I_\text{der}(hkl)$
• calculated substructure structure factor F(hkl)


Harker Construction (1/4)

[Figure: Argand diagram with axes Re(F(hkl)) and Im(F(hkl))]

Draw the structure factor F(hkl) of the substructure.


$F_\text{der}(hkl)$ points somewhere on the circle around the tip of F(hkl) with radius $\sqrt{I_\text{nat}(hkl)}$, because

$$F_\text{der} = F + F_\text{nat}$$

where F is the structure factor of the substructure and $F_\text{nat}$ is the sum of the contributions of all other atoms.

At the same time, $F_\text{der}(hkl)$ also lies somewhere on the circle around the origin with radius $\sqrt{I_\text{der}(hkl)}$, because

$$|F + F_\text{nat}| = |F_\text{der}| = \sqrt{I_\text{der}(hkl)}$$


The two intersections of the two circles are the two possible structure factors $F_\text{der}(hkl)$.

How to select the correct one:

MIR: a second Harker construction with $|F_\text{der-2}(hkl)|$ has exactly one of the two possibilities in common with the first Harker construction.

SIR: the mean phase of the two possibilities serves as starting point for density modification, but if the substructure coordinates are centrosymmetric the two-fold ambiguity cannot always be resolved (this problem is specific to SIR).
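The Harker construction is a circle-circle intersection in the complex plane. The following Python sketch computes the two candidates for one reflection and then uses a second derivative to pick the common one, as in MIR. The function names and the toy numbers are assumptions for illustration; error-free amplitudes are assumed, which real data never are.

```python
import math


def harker_candidates(f_sub, amp_nat, amp_der):
    """Return the possible derivative structure factors F_der.

    f_sub   : complex structure factor of the substructure (calculated)
    amp_nat : measured native amplitude, sqrt(I_nat)
    amp_der : measured derivative amplitude, sqrt(I_der)
    """
    d = abs(f_sub)
    # No intersection: substructure contribution absent, or the
    # amplitudes are inconsistent (e.g. through measurement errors).
    if d == 0 or d > amp_nat + amp_der or d < abs(amp_nat - amp_der):
        return []
    # Circle 1: centre at the origin, radius amp_der.
    # Circle 2: centre at the tip of f_sub, radius amp_nat.
    a = (amp_der ** 2 - amp_nat ** 2 + d ** 2) / (2 * d)
    h = math.sqrt(max(amp_der ** 2 - a ** 2, 0.0))
    unit = f_sub / d  # unit vector along f_sub
    return [unit * complex(a, h), unit * complex(a, -h)]


def native_candidates(f_sub, amp_nat, amp_der):
    """Possible native structure factors F_nat = F_der - F_sub."""
    return [f_der - f_sub
            for f_der in harker_candidates(f_sub, amp_nat, amp_der)]


# Toy reflection: F_nat = 1+2i, two derivatives with different substructures
f_nat = 1 + 2j
first = native_candidates(2 + 0j, abs(f_nat), abs(f_nat + 2))
second = native_candidates(0 + 1j, abs(f_nat), abs(f_nat + 1j))

# MIR: the candidate common to both constructions is the native structure factor
common = [p for p in first if any(abs(p - q) < 1e-9 for q in second)]
```

In the toy case each construction alone leaves two candidates, and only 1+2i appears in both lists.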


Substructure Contribution to Data

The electron density ρ(x, y, z) can be calculated from the (complex) structure factors F(hkl). Inversely, the structure factors can be calculated once the atom content of the unit cell is known:

$$F(hkl) = \sum_{\substack{\text{atoms } j \\ \text{in unit cell}}} f_j(\theta_{hkl})\, e^{-B_j \sin^2\theta_{hkl}/\lambda^2}\, e^{2\pi i(hx_j+ky_j+lz_j)} \qquad (2)$$

• $f_j$ atomic scattering factor: depends on atom type and scattering angle θ. Values tabulated in [3], Tab. 6.1.1.1
• $B_j$ atom displacement parameter, ADP: reflects vibrational motion of atoms
• $2\pi(hx_j + ky_j + lz_j)$ phase shift: relative distance from origin
• $(x_j, y_j, z_j)$: fractional coordinates of the jth atom
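Eq. (2) translates almost literally into code. The sketch below is a minimal Python version; it assumes that the caller supplies each $f_j$ already evaluated at the scattering angle of the reflection and passes $(\sin\theta/\lambda)^2$ as the hypothetical parameter `stol_sq`, since deriving θ from the cell dimensions is outside the scope of the slide.

```python
import cmath
import math


def structure_factor(hkl, atoms, stol_sq=0.0):
    """Evaluate Eq. (2) for one reflection.

    hkl     : Miller indices (h, k, l)
    atoms   : list of (f, B, (x, y, z)) with fractional coordinates
    stol_sq : (sin(theta) / lambda)**2 for this reflection
    """
    h, k, l = hkl
    total = 0j
    for f, b_iso, (x, y, z) in atoms:
        damping = math.exp(-b_iso * stol_sq)          # ADP term
        phase = 2 * math.pi * (h * x + k * y + l * z)  # phase shift
        total += f * damping * cmath.exp(1j * phase)
    return total


# Two equal atoms half a cell apart along x cancel for h = 1 ...
pair = [(6.0, 0.0, (0.0, 0.0, 0.0)), (6.0, 0.0, (0.5, 0.0, 0.0))]
f_100 = structure_factor((1, 0, 0), pair)
# ... and add up for h = 2
f_200 = structure_factor((2, 0, 0), pair)
```

The two-atom example shows the vector character of the sum: the same atoms give no intensity for (100) and full intensity for (200).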


Extraction of Substructure Contribution

Experimental determination (i.e. other than by molecular replacement) of the contribution of the substructure to the measured intensities is achieved by

• Anomalous dispersion: deviation from Friedel’s Law and comparison of |F(hkl)| with |F(h̄k̄l̄)|

• Isomorphous replacement: changing the intensities without changing the unit cell and comparison of $|F_\text{nat}|$ with $|F_\text{der}|$.
  The alteration is usually achieved by the introduction of some heavy atom type into the crystal, either by soaking or by co-crystallisation.


Anomalous Scattering

The atomic scattering factors $f_j(\theta)$ describe the reaction of an atom’s electrons to X-rays.

They are different for each atom type (C, N, P, ...), but normally they are real numbers and do not vary significantly when changing the wavelength λ.

If, for one atom type, the X-ray energy at wavelength λ matches the transition energy of one of the electron shells, anomalous scattering occurs.

This behaviour can be described by splitting the scattering factor into three parts:

$$f_j^\text{anom} = f_j(\theta) + f'_j(\lambda) + i f''_j(\lambda)$$

Near the transition energy, the latter two, f′ and f″, vary strongly with the wavelength.


Since the equation for F(h, k, l) (Eq. 2) is a “simple” sum, one can group it into sub-sums. In the case of SAD and MAD the following “grouping” has turned out to be useful, shown together with its phase diagram (ADP term $e^{-8\pi^2 U_j \sin^2\theta_{hkl}/\lambda^2}$ omitted for clarity):

$$F(hkl) = \underbrace{\sum_{\mu\,\in\,\text{non-substructure}} f_\mu\, e^{2\pi i\,\mathbf{h}\cdot\mathbf{r}_\mu}}_{F_P} + \underbrace{\sum_{\nu\,\in\,\text{substructure}} f_\nu\, e^{2\pi i\,\mathbf{h}\cdot\mathbf{r}_\nu}}_{F_A\ \text{(normal)}} + \underbrace{\sum_{\nu\,\in\,\text{substructure}} f'_\nu\, e^{2\pi i\,\mathbf{h}\cdot\mathbf{r}_\nu}}_{F'\ \text{(anomalous)}} + \underbrace{i \sum_{\nu\,\in\,\text{substructure}} f''_\nu\, e^{2\pi i\,\mathbf{h}\cdot\mathbf{r}_\nu}}_{iF''\ \text{(anomalous)}}$$

[Figure: Argand diagram (Re(F), Im(F)) showing F as the vector sum of F_P, F_A, F′ and iF″]

Normal presentation (second version of the diagram), because f′ is usually negative.


Breakdown of Friedel’s Law

Now compare F(hkl) with F(h̄k̄l̄) in the presence of anomalous scattering:

$$F(\bar h\bar k\bar l) = \underbrace{\sum_{\mu\,\in\,\text{non-substructure}} f_\mu\, e^{2\pi i(-\mathbf{h})\cdot\mathbf{r}_\mu}}_{F_P^-} + \underbrace{\sum_{\nu\,\in\,\text{substructure}} f_\nu\, e^{2\pi i(-\mathbf{h})\cdot\mathbf{r}_\nu}}_{F_A^-} + \underbrace{\sum_{\nu\,\in\,\text{substructure}} f'_\nu\, e^{2\pi i(-\mathbf{h})\cdot\mathbf{r}_\nu}}_{F'^-} + \underbrace{i \sum_{\nu\,\in\,\text{substructure}} f''_\nu\, e^{2\pi i(-\mathbf{h})\cdot\mathbf{r}_\nu}}_{iF''^-}$$

[Figure: Argand diagram (Re(F), Im(F)) with F_P^+, F_A^+, F′^+ and iF″^+ adding up to F^+, and F_P^-, F_A^-, F′^- and iF″^- adding up to F^-]

Friedel’s Law is valid for the $f_\mu$, $f_\nu$, and $f'_\nu$ parts: $F_P^-$, $F_A^-$ and $F'^-$ are the mirror images of $F_P^+$, $F_A^+$ and $F'^+$ about the real axis.

The imaginary contribution of $i f''_\nu$ violates Friedel’s Law: $iF''^-$ is not the mirror image of $iF''^+$, so that

$$|F^+| \neq |F^-|$$
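The breakdown of Friedel’s Law can be reproduced numerically. The sketch below (Python; the three-atom toy cell, the scattering-factor values and the function names are all assumptions for illustration) computes the amplitudes of a Bijvoet pair once with a real scattering factor for the heavy atom and once with an added imaginary part f″.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """Sum of f * exp(2*pi*i*(hx + ky + lz)); f may be complex."""
    h, k, l = hkl
    return sum(
        f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        for f, (x, y, z) in atoms
    )


# Hypothetical non-centrosymmetric toy cell: two light atoms, one heavy atom
LIGHT = [(6.0, (0.10, 0.20, 0.30)), (7.0, (0.40, 0.15, 0.70))]
HEAVY_XYZ = (0.25, 0.60, 0.05)


def bijvoet_amplitudes(hkl, f_heavy):
    """Return (|F+|, |F-|) for a given (possibly complex) heavy-atom f."""
    atoms = LIGHT + [(f_heavy, HEAVY_XYZ)]
    h, k, l = hkl
    f_plus = structure_factor((h, k, l), atoms)
    f_minus = structure_factor((-h, -k, -l), atoms)
    return abs(f_plus), abs(f_minus)


# f + f' only (real): Friedel's Law holds
normal = bijvoet_amplitudes((1, 2, 3), 34.0 - 8.0)
# f + f' + i f'': Friedel's Law is violated
anomalous = bijvoet_amplitudes((1, 2, 3), complex(34.0 - 8.0, 4.0))
```

Note that the light atoms are essential in this toy case: a cell containing only one type of anomalous scatterer would still give |F+| = |F−|, because the complex factor could be pulled out of the sum.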


How all this helps

Karle (1980) and Hendrickson, Smith, Sheriff (1985) published the following formula∗:

$$|F^+|^2 = |F_T|^2 + a|F_A|^2 + b|F_A||F_T|\cos\alpha + c|F_A||F_T|\sin\alpha$$

$$|F^-|^2 = |F_T|^2 + a|F_A|^2 + b|F_A||F_T|\cos\alpha - c|F_A||F_T|\sin\alpha$$

[Figure: Argand diagram (Re(F), Im(F)) with F_T^+, F_A^+, F_P^+, F^+ and the angle α]

with $F_T = F_P + F_A$ the non-anomalous contribution of structure and substructure, and α the phase difference between $F_T$ and $F_A$.

NB: α(hkl) = α(h̄k̄l̄), $|F_A(hkl)| = |F_A(\bar h\bar k\bar l)|$, and $|F_T(hkl)| = |F_T(\bar h\bar k\bar l)|$

∗ $a = \frac{f''^2 + f'^2}{f^2}$, $b = \frac{2f'}{f}$, $c = \frac{2f''}{f}$
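The relation can be checked against the explicit vector sums. In its standard form the b term carries a factor cos α, with α taken here as the phase of F_T minus the phase of F_A. The Python sketch below assumes a single type of anomalous scatterer, so that F′ = (f′/f)·F_A and F″ = (f″/f)·F_A; the function names and the numbers are made up for illustration.

```python
import cmath
import math


def bijvoet_intensities(f_p, f_a, f, f1, f2):
    """|F+|^2 and |F-|^2 from the explicit vector sums.

    f_p, f_a  : complex F_P and F_A of the reflection (hkl)
    f, f1, f2 : f, f' and f'' of the anomalous scatterer
    """
    ratio = complex(f1, f2) / f  # (f' + i f'') / f
    f_plus = f_p + f_a + ratio * f_a
    # Friedel mate: every phase factor is conjugated, f' + i f'' is not
    f_minus = f_p.conjugate() + f_a.conjugate() + ratio * f_a.conjugate()
    return abs(f_plus) ** 2, abs(f_minus) ** 2


def karle_hendrickson(f_p, f_a, f, f1, f2):
    """|F+|^2 and |F-|^2 from the formula with coefficients a, b, c."""
    f_t = f_p + f_a
    alpha = cmath.phase(f_t) - cmath.phase(f_a)
    a = (f2 ** 2 + f1 ** 2) / f ** 2
    b = 2 * f1 / f
    c = 2 * f2 / f
    even = (abs(f_t) ** 2 + a * abs(f_a) ** 2
            + b * abs(f_a) * abs(f_t) * math.cos(alpha))
    odd = c * abs(f_a) * abs(f_t) * math.sin(alpha)
    return even + odd, even - odd


f_p = cmath.rect(30.0, 0.7)  # amplitude 30, phase 0.7 rad
f_a = cmath.rect(8.0, 2.1)
direct = bijvoet_intensities(f_p, f_a, f=34.0, f1=-8.0, f2=4.0)
formula = karle_hendrickson(f_p, f_a, f=34.0, f1=-8.0, f2=4.0)
```

Both routes give the same pair of intensities to rounding error, and only the sin α term changes sign between F+ and F−.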


Combining Theory and Experiment

The diffraction experiment measures the Bijvoet pairs |F+| and |F−| for many reflections.

In order to “simulate” a small-molecule experiment for the substructure, we must know |FA|.

With the approximation $\frac{1}{2}(|F^+| + |F^-|) \approx |F_T|$ the difference between the two equations above yields

$$|F^+| - |F^-| \approx c\,|F_A| \sin\alpha$$
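Spelled out, the step from the two equations for $|F^\pm|^2$ to this difference reads as follows (a sketch of the algebra, using only the approximation stated above):

```latex
\begin{align*}
|F^+|^2 - |F^-|^2 &= 2c\,|F_A||F_T|\sin\alpha \\
\left(|F^+| - |F^-|\right)\left(|F^+| + |F^-|\right) &= 2c\,|F_A||F_T|\sin\alpha \\
\left(|F^+| - |F^-|\right)\cdot 2|F_T| &\approx 2c\,|F_A||F_T|\sin\alpha \\
|F^+| - |F^-| &\approx c\,|F_A|\sin\alpha
\end{align*}
```

All terms that are identical in the two equations cancel in the first line, so only the $\sin\alpha$ term survives.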


Status Quo – A Summary

• Our experiment measures $|F^+| = \sqrt{I(hkl)}$ and $|F^-| = \sqrt{I(\bar h\bar k\bar l)}$.

• We are looking for (an estimate of) |FA| for all measured reflections.
  – These |FA| mimic a small-molecule data set from the substructure alone.
  – The small-molecule data set can be solved with direct methods, even at moderate resolution.

• With the help of Karle and Hendrickson, Smith, Sheriff, we already derived

$$|F^+| - |F^-| \approx c\,|F_A| \sin\alpha$$

We are nearly there!


The Factor c

Before we know |FA|, we must “get rid of” c and sin α.

$c = 2f''(\lambda)/f(\theta(hkl))$ can be calculated for every reflection provided f″(λ) is known.

During a MAD experiment, f″(λ) is usually measured by a fluorescence scan.

To avoid the dependency on f″(λ), the program shelxd [1] uses normalised structure factor amplitudes |E(hkl)| instead of |F(hkl)|, which are very common for small-molecule programs.


The Angle α

In the case of MAD, multi-wavelength anomalous dispersion, we can calculate sin α because there is one equation for each wavelength.

In the case of SAD, the program shelxd approximates |FA| sin(α) ≈ |FA|.

Why is this justified?

Bijvoet pairs with a strong anomalous difference ($\big||F^+| - |F^-|\big|$) have greater impact in direct methods. The difference is large, however, only when α is close to 90° or 270°, i.e. when sin(α) ≈ ±1. This coarse approximation has proven good enough to solve hundreds or thousands of structures with shelxd.


Fluorescence Scan

In order to find the wavelength with the strongest response of the anomalous scatterer, the values of f′ and f″ are determined from a fluorescence scan.

[Figure: f′ and f″ for Br (curves “Br f″” and “Br f′”); x-axis: Energy [eV], 13460 to 13490; y-axis: Signal [e], −10 to 4; λ = 1 Å · 12398 eV / E]

In order to get the strongest contrast in the different data sets, MAD experiments collect data at

1. maximum for f″ (peak wavelength)
2. minimum for f′ (inflection point)
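The conversion between the energy axis of the scan and the wavelength, λ = 1 Å · 12398 eV / E, is a one-liner. A minimal Python sketch (the function names are assumptions; the constant is the rounded value from the figure):

```python
HC_EV_ANGSTROM = 12398.0  # h*c in eV*Angstrom, rounded as in the figure


def energy_to_wavelength(energy_ev):
    """Photon energy in eV -> wavelength in Angstrom."""
    return HC_EV_ANGSTROM / energy_ev


def wavelength_to_energy(wavelength_angstrom):
    """Wavelength in Angstrom -> photon energy in eV."""
    return HC_EV_ANGSTROM / wavelength_angstrom


# An energy inside the range of the Br scan shown above
wavelength = energy_to_wavelength(13474.0)
```

For 13474 eV this gives a wavelength of about 0.92 Å.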


SIR/MIR

Single isomorphous replacement requires two data sets:

1. Macromolecule in absence of heavy metal (Au, Pt, Hg, U, ...), called the native data set. Measures $|F_P|$.

2. Macromolecule in presence of heavy metal, called the derivative data set. Measures |F|.

$|F_A|$ can then be estimated from the difference

$$|F| - |F_P|$$

[Figure: Argand diagram (Re(F(hkl)), Im(F(hkl))) with F as the vector sum of F_P and F_A]


SAD and MAD vs. SIR and MIR

SAD/MAD                                        SIR/MIR
• |FA| from |F+| vs. |F−| (one data set)       • |FA| from native vs. derivative (two data sets)
• signal improved by wavelength selection      • signal improved by heavy atom (large atomic number Z)
• requires tunable wavelength (synchrotron)    • independent of wavelength (in-house data)
• “multi”: one crystal, different wavelengths  • “multi”: several derivatives = several crystals


SIRAS (1/2)

Isomorphous replacement is hardly used for phasing anymore:

• Incorporation of heavy atom into crystal often alters the unit cell ⇒ non-isomorphism between native and derivative ruins usability of data sets
• Heavy atom in structure ⇒ go to synchrotron and try MAD instead of SIR/MIR


SIRAS (2/2)

However, one often has a high-resolution data set from a native crystal and one (lower resolution) data set with anomalous signal.

They can be combined into a SIRAS experiment:

• Anomalous data set: SAD
• Anomalous data set as derivative vs. native: SIR

These are two independent sources of phase information and usually work better than either alone.



|FA|: Substructure Solution with Direct Methods


Direct Methods

Having figured out the values |FA| from our measured data, we are effectively pretending to have collected a data set from a crystal with exactly the same (large) unit cell as our actual macromolecule but with only very few atoms inside.

We artificially created a small molecule data set.


Normalised Structure Factors

Experience shows that direct methods produce better results if, instead of the normal structure factor F(hkl), the normalised structure factor is used.

The normalised structure factor is calculated as

$$E(hkl)^2 = \frac{F(hkl)^2/\varepsilon}{\left\langle F(hkl)^2/\varepsilon \right\rangle}$$

ε is a statistical factor used for the proper treatment of centric and acentric reflections; the denominator $\langle F(hkl)^2/\varepsilon \rangle$ is averaged per resolution shell (≈ 20 shells over the whole resolution range).

This eliminates the strong fall-off with scattering angle θ.
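The normalisation can be sketched in a few lines of Python. The version below sorts the reflections by $(\sin\theta/\lambda)^2$, cuts them into shells of (nearly) equal size and divides by the shell average. The data layout, the function name and the equal-count shells are assumptions for illustration, not a description of how shelxc or shelxd do it.

```python
import math


def normalise(reflections, n_shells=20):
    """Return |E| for each reflection, in input order.

    reflections : list of (stol_sq, f_amp, epsilon), where stol_sq is
                  (sin(theta)/lambda)**2 and f_amp is |F|
    """
    if not reflections:
        return []
    # Indices sorted by resolution, then cut into equally populated shells
    order = sorted(range(len(reflections)), key=lambda i: reflections[i][0])
    n_shells = max(1, min(n_shells, len(order)))
    e_values = [0.0] * len(reflections)
    for shell in range(n_shells):
        start = shell * len(order) // n_shells
        stop = (shell + 1) * len(order) // n_shells
        members = order[start:stop]
        # Shell average <F^2 / epsilon>; assumed to be non-zero
        mean = sum(reflections[i][1] ** 2 / reflections[i][2]
                   for i in members) / len(members)
        for i in members:
            _, f_amp, eps = reflections[i]
            e_values[i] = math.sqrt(f_amp ** 2 / eps / mean)
    return e_values


# Synthetic amplitudes with a strong fall-off towards high resolution
data = [(0.001 * i, 100.0 * math.exp(-0.02 * i) * (1 + 0.5 * math.sin(i)), 1.0)
        for i in range(1, 201)]
e_amps = normalise(data, n_shells=20)
mean_e_sq = sum(e * e for e in e_amps) / len(e_amps)
```

By construction the average of E² is 1 in every shell, so the fall-off of the raw amplitudes no longer appears in the |E| values.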


Starting Point for Direct Methods: The Sayre Equation

In 1952, Sayre published what has now become known as the Sayre equation

$$F(\mathbf{h}) = q(\sin(\theta)/\lambda) \sum_{\mathbf{h}'} F_{\mathbf{h}'} F_{\mathbf{h}-\mathbf{h}'}$$

This equation is exact for an “equal-atom-structure” (like the substructure generally is).

It requires, however, complete data including F(000), which is hidden by the beam stop, so per se the Sayre equation is not very useful.


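Tangent Formula (Karle & Hauptman, 1956)

The Sayre equation serves to derive the tangent formula

$$\tan(\phi_{\mathbf{h}}) \approx \frac{\sum_{\mathbf{h}'} |E_{\mathbf{h}'} E_{\mathbf{h}-\mathbf{h}'}| \sin(\phi_{\mathbf{h}'} + \phi_{\mathbf{h}-\mathbf{h}'})}{\sum_{\mathbf{h}'} |E_{\mathbf{h}'} E_{\mathbf{h}-\mathbf{h}'}| \cos(\phi_{\mathbf{h}'} + \phi_{\mathbf{h}-\mathbf{h}'})}$$

which relates three reflections h, h′, and h − h′.

J. Karle & H. A. Hauptman were awarded the Nobel prize in Chemistry in 1985 for their work on the tangent formula (and many other contributions).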
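A single application of the tangent formula can be sketched as follows (Python, one-dimensional indices for brevity; the dictionary layout and the function name are assumptions). The sketch uses `atan2` on numerator and denominator, which also fixes the quadrant that the tangent alone leaves open.

```python
import math


def tangent_phase(h, e_amp, phases):
    """Estimate the phase of reflection h from all pairs (h', h - h').

    e_amp  : dict index -> |E|
    phases : dict index -> current phase in radians
    """
    numerator = 0.0
    denominator = 0.0
    for h1, e1 in e_amp.items():
        h2 = h - h1
        if h2 not in e_amp:
            continue  # partner reflection not available
        weight = abs(e1 * e_amp[h2])
        angle = phases[h1] + phases[h2]
        numerator += weight * math.sin(angle)
        denominator += weight * math.cos(angle)
    return math.atan2(numerator, denominator)


# Toy case: one atom at x = 0.1, so every phase is exactly 2*pi*h*x
x = 0.1
indices = [h for h in range(-5, 6) if h != 0]
e_amp = {h: 1.0 for h in indices}
phases = {h: 2 * math.pi * h * x for h in indices}
estimate = tangent_phase(3, e_amp, phases)
```

For the one-atom toy structure the relation is exact and the estimate equals 2π · 3 · 0.1; for real data it is only a probabilistic estimate that improves with large |E| values.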


Solving the Substructure

Direct methods (and in particular shelxd) do the following:

1. Assign a random phase to each reflection. They will not fulfil the tangent formula.
2. Refine the phases using |FA| (or rather |EA|) to improve their fit to the tangent formula.
3. Calculate a map and pick the strong peaks (as putative substructure).
4. Calculate phases from the peak coordinates and return to step 2.

Steps 1-4 are repeated many times, each time with a new set of random phases.

Every such attempt is evaluated and the best attempt is kept as solution for the substructure.

The substructure is thus found by chance. It works even if the data quality is only medium, but may then require more trials.

The cycling between steps 2-3, i.e. phase refinement in reciprocal space vs. peak picking in direct space, is called dual space recycling [4, section 16.1].


Dual-Space Recycling

[Figure: flow chart of dual space recycling. Start from random atoms or (better) atoms consistent with the Patterson. Reciprocal space: refine phases. FFT and peak search: from phases to map. Real space: select atoms; phases are recalculated from the atoms. Repeat a few 100 to 1000 times. Selection criterion: correlation coefficient between E_obs and E_calc. Best CC? Yes: keep solution. No: continue.]
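The selection criterion compares observed and calculated normalised amplitudes. A plain Pearson correlation coefficient is sketched below in Python; this is an assumption for illustration, since the slide does not state which weighting or scaling shelxd applies, and the function name is hypothetical.

```python
import math


def correlation_coefficient(e_obs, e_calc):
    """Pearson correlation between two equally long lists of |E| values."""
    n = len(e_obs)
    if n != len(e_calc) or n < 2:
        raise ValueError("need two lists of equal length with >= 2 values")
    mean_obs = sum(e_obs) / n
    mean_calc = sum(e_calc) / n
    cov = sum((o - mean_obs) * (c - mean_calc)
              for o, c in zip(e_obs, e_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in e_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in e_calc)
    if var_obs == 0 or var_calc == 0:
        return 0.0  # a constant list carries no correlation information
    return cov / math.sqrt(var_obs * var_calc)


# A trial whose calculated amplitudes track the observed ones scores high
cc_good = correlation_coefficient([0.5, 1.0, 1.5, 2.0], [0.6, 0.9, 1.6, 1.9])
cc_bad = correlation_coefficient([0.5, 1.0, 1.5, 2.0], [1.8, 0.4, 1.9, 0.3])
```

Because the coefficient is insensitive to an overall scale factor, observed and calculated amplitudes need not be on the same scale for the comparison between trials.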


Phasing the Rest

Once the coordinates of the substructure are found, they serve to calculate the phases for the whole structure, e.g. with the Harker construction. With those phases, an initial electron density map for the new structure can be calculated:

[Figure: flow from I(hkl) via |FA|, the substructure coordinates (x, y, z)_A and the substructure phases φ(hkl)_A to the phases φ(hkl) and the electron density ρ(x, y, z)]



Phasing the Rest

. . . this is the field of density modification and not part of this presentation. . .


References

1. G. M. Sheldrick, A short history of SHELX, Acta Crystallogr. (2008), A64, pp. 112–133
2. R. J. Morris & G. Bricogne, Sheldrick’s 1.2 Å rule and beyond, Acta Crystallogr. (2003), D59, pp. 615–617
3. E. Prince (ed.), International Tables for Crystallography, Vol. C, International Union of Crystallography
4. M. G. Rossmann & E. Arnold (eds.), International Tables for Crystallography, Vol. F, International Union of Crystallography

