Engineering Applications of Computational Fluid Mechanics Vol. 8, No. 1, pp. 158–177 (2014)
DEVELOPMENT OF PARALLEL PSEUDO-SPECTRAL SOLVER
USING INFLUENCE MATRIX METHOD AND APPLICATION TO
BOUNDARY LAYER TRANSITION
S. Bhushan+* and D. K. Walters#
+Center for Advanced Vehicular Systems, Mississippi State University, Starkville, MS 39762, USA
#Department of Mechanical Engineering, Mississippi State University, Starkville, MS 39762, USA
*E-Mail: [email protected] (Corresponding Author)
ABSTRACT: The performance of a pseudo-spectral solver, which uses FFT in the streamwise and spanwise
directions and Chebyshev polynomials in the wall-normal direction, is improved by implementing MPI domain
decomposition for the Chebyshev collocation method using the influence matrix method. The parallel code is
validated for LES of plane channel flow at Re_τ = 590 and 2003, and applied to temporally developing flat plate
transition simulations. The domain decomposition using the influence matrix method is implemented to achieve C^1
continuity of the velocity across the domain interface. The method provides superlinear scalability of solver
computational performance. The predictions show kinks in the turbulent statistics on coarser grids, but agree very
well with DNS data for finer grids. It is found that accurate turbulent predictions require a minimum of 48
collocation points per subdomain for Re ~ 10^5, with higher numbers required for Re ~ 10^6. The flat plate transitional
flow simulation is performed for a freestream turbulence intensity of 1.8%, which predicts the beginning of bypass
transition at Re_x = 9.2×10^5, turbulent overshoot in the wall shear stress profile for Re_x = 1.42-1.72×10^6, and a fully
developed turbulent region for higher Re_x. Analysis of the vortical structures shows that the transition occurs due to the
formation of turbulent spots, which are in turn formed by the merger of near-wall, low-strength elongated
structures. The growth of the turbulent spot is sustained by ejection-sweep events from the wall. The results
demonstrate that a temporally developing simulation is a viable, inexpensive alternative to the commonly used
spatially developing approach for boundary layer transition prediction.
Keywords: parallel solver, pseudo-spectral solver, influence matrix method, channel flow, boundary layer
transition
1. INTRODUCTION
Bypass transition flows are common in
engineering systems, for which Reynolds
averaged Navier-Stokes (RANS) solvers are
typically used for design applications. RANS
transition modeling methods range from
correlation-based to physics-based models
(Walters and Cokljat, 2008; Yakinthos, 2013).
However, the available models are not mature
enough to be used with confidence for a wide
range of applications, and a parametric study
is often performed to identify the most accurate
combination of numerical schemes and models,
e.g., the Baxevanou and Fidaros (2008) study for
wind turbine applications. Walters and Cokljat
(2008) suggest that one key reason for the
uncertainty in transition modeling is the limited
availability of detailed, large-scale validation
datasets from either experiments or state-of-the-
art numerical methods such as Large Eddy
Simulation (LES) or Direct Numerical Simulation
(DNS). The motivation of this study is to develop
a scalable pseudo-spectral solver for DNS/LES of
flat plate boundary layer transition flows, and use
it to generate a large-scale parametric dataset for
use by model developers. This study is a first step
towards this goal and focuses on parallelization of
a pseudo-spectral solver using a hybrid
MPI/OpenMP approach, and demonstration of
temporally developing simulations as a viable
approach for flat plate boundary layer
simulations. The following provides a review of
the LES/DNS studies for flat plate boundary layer
transition available in the literature.
1.1 Literature review
To date, high fidelity numerical studies of flat
plate boundary layers have helped to highlight the
underlying transitional flow physics. Studies
agree that freestream disturbances enter the
boundary layer and induce low-frequency stream-
wise vortices or streaks in the pre-transitional
region (referred to as Klebanoff modes), which
lift from the wall causing ejection events.
Transition occurs due to the formation of
turbulent spots which are associated with multiple
Received: 5 Nov. 2012; Revised: 31 Oct. 2013; Accepted: 25 Nov. 2013
head hairpin-type vortices with U- or Λ-shaped
structures underneath them. However, studies
have not reached a consensus regarding the
turbulent spot formation and turbulence
production mechanisms. Singer and Joslin (1994)
concluded that the hairpin vortices are formed
due to the initial disturbances, and are not related
to the near-wall ejection process. Wu and Moin
(2009) arrived at a similar conclusion regarding
turbulent spot generation, and proposed that the
self-sustained auto-regeneration of the hairpin
structures is responsible for turbulence
production. They also suggested that the hairpin
structures leave behind elongated streak-like
structures as they are ejected from the wall, thus
streaks are merely a kinematic feature (symptom)
not the cause of breakdown. The auto-
regeneration of hairpin vortices as a key
turbulence production mechanism has also been
supported by Kim et al. (2008) for turbulent
channel flow at relatively low Reynolds number
(Re = 395) and by Wallace et al. (2010) in
transitional flow studies. Zaki and Durbin (2005)
and Jacobs and Durbin (2010) proposed that the
spots exhibit Kelvin-Helmholtz type instability
and are formed due to the interaction of lifting
streaks and the high-frequency freestream
disturbances at the edge of boundary layer.
Schlatter et al. (2008) proposed that the spots are
formed due to streak secondary instability,
wherein the streaks interact to first form Λ-shaped
structures, which evolve into hairpin structures.
This process is sustained by the advection of
turbulence via ejection events. Jeong et al. (1997)
also identified ejection/sweep events as the
primary cause of turbulence production for
turbulent channel flow simulations at Re_τ = 180.
But they did not observe hairpin vortices,
probably because the study focused only on the
sub- and buffer layer regions.
DNS and LES of canonical flows such as flat
plate boundary layers are well suited for
understanding transition flow physics and
obtaining detailed datasets for model validation.
However, very limited datasets are available in
the open literature, and these are typically
performed for moderate Reynolds numbers (Re_x ~
10^5) and high freestream turbulence intensities
(FSTI 5%) (Wu and Moin, 2009; Jacobs and
Durbin, 2001; Schlatter et al., 2008). Real
applications involving bypass transition may
include a broad range of turbulent intensities from
1% to 10% or more, and transition is expected to
occur at much higher Reynolds number (Re_x ~
2×10^6) for low FSTI values. Further, prior
simulations have often been performed by
imposing manufactured/controlled instabilities
(Schlatter et al., 2008) or bursts of isotropic
fluctuations (Wu and Moin, 2009), and have
mostly focused on the analysis of boundary layer
breakdown rather than the dissemination of
parameterized statistical data for model
development. The study most relevant to this
work is that of Brandt et al. (2004) wherein DNS
was performed for FSTI of 1.5% to 4.7% with
integral length scales of 2.5 to 7.5 times the
displacement thickness. The lack of a
comprehensive dataset is likely due to the high
computational cost associated with DNS/LES,
which require: (a) highly accurate numerical
schemes for accurate prediction of the small-scale
turbulent structures; and (b) large numbers of grid
points and small time steps, which increase
significantly for higher Re.
Pseudo-spectral methods provide an accurate
scheme for DNS/LES of canonical
incompressible flows. The most commonly used
solvers are based on Fourier modes to represent
the flow field along the homogeneous streamwise and
spanwise directions, and Chebyshev polynomials
along the non-homogeneous wall-normal direction
(Kim et al., 1987; Moser et al., 1999). One
drawback of these solvers is that they are not well
suited for parallel domain decomposition. This is
because spectral methods require all of the data
along a given grid direction to obtain derivatives,
thus data transfer requirements are large.
Schlatter et al. (2008) performed DNS of
transitional flows for relatively low Re = 4.1×10^4
using a pseudo-spectral solver parallelized only
along one of the homogeneous directions.
Schlatter et al. (2009) extended the solver to
include domain decomposition along both
homogeneous directions and applied it for
turbulent boundary layer simulations at Re =
1.2×10^6. Some recent studies have used finite
difference solvers, probably due to the scalability
limitations of pseudo-spectral solvers. Hoyas and
Jimenez (2006) performed DNS of plane channel
flow at Re = 4.86×10^4 using a pseudo-spectral
solver which used a seven-point compact finite
difference scheme in the wall-normal direction.
Wu and Moin (2009) performed DNS of a
transitional boundary layer at Re = 3.85×10^5
using a second-order finite-difference solver. Borell et
al. (2011) recently developed a solver for
turbulent boundary layer simulations that uses a
fourth-order compact finite difference scheme in
the streamwise and wall-normal directions and
FFT along the spanwise direction.
Multi-domain techniques for Chebyshev
collocation methods for parallel computations
have been documented by Peyret (2001), who
discusses influence matrix and iterative methods. In
both approaches, the governing equations are
decomposed on different domains along with
interface conditions to satisfy continuity of both
the variable and its derivatives at the domain
interface. In the influence matrix method, the
variable is decomposed into two components: one
with predefined null boundary conditions and a
second to satisfy the boundary and interface
conditions. The solution of the second component
requires evaluation of influence coefficients,
which change with the interface conditions. In the
iterative approach, the equations on one side of
the domain interface are solved using Dirichlet
type boundary condition, and on the other side
using Neumann type conditions. The equations
are solved iteratively until the interface conditions
are satisfied. As expected, the iterative method is
significantly more expensive compared to the
influence matrix method, thus the latter has been
used in several studies. Raspo et al. (1996)
applied the technique for laminar flow simulation
within a cavity type configuration with a
singularity. They reported that the single-domain
simulation predicts large Gibbs oscillations due to
the singularity, whereas the multi-domain
simulations minimize the oscillations. Gibbs
oscillations have also been reported in several
other studies for problems which exhibit rapid
spatial variation in the solution (refer to Yang and
Shizgal, 1994). The multi-domain approach helps
to overcome this limitation, as it limits the
influence of large gradients within each sub-
domain thereby enhancing convergence. Sabbah
and Pasquetti (1998) developed a 3D solver using
multi-domain Chebyshev polynomials in two
directions and FFT along the third direction. The
method was validated for Rayleigh-Bénard
convection in a cavity of large aspect ratio at
Rayleigh number Ra = 3000, which involved
smooth laminar flow. The solver has also been
successfully applied for LES of turbulent flow
past a square cylinder and an Ahmed body by
Minguez et al. (2008 and 2011).
High computational costs associated with
transitional flow studies in the literature are also
due to the use of a spatially developing approach,
which requires a relatively large flow domain in
order to resolve the entire region of the boundary
layer that contains the relevant flow physics. In
addition, such simulations with a pseudo-spectral
solver require a fringe region in the streamwise
direction so that the periodic flow can be mapped
to appropriate inlet boundary conditions (Spalart,
1988; Khujadze and Oberlack, 2004; Schlatter
et al., 2008). Such mapping requires
decomposition and transformation of the
governing equations, which does not accurately
represent turbulent straining as pointed out by Wu
and Moin (2009).
Temporally developing simulations represent an
alternative to the spatially developing approach.
In this approach, the numerical domain moves
along with the developing flow in the streamwise
direction due to the use of periodic boundary
conditions, and it has been applied successfully
for DNS/LES of mixing layer and jet flows
(Ansari, 1997; Akhavan et al., 2000; Bhushan et
al., 2006). Malik and Hussaini (1990) used this
approach for transitional and turbulent boundary
layer simulations, including study of the
interaction of Görtler vortices and Tollmien-
Schlichting waves in a boundary layer developing
on a concave surface. Overall, the temporally
developing simulations are less numerically
expensive than their spatially developing
counterparts, as they require an order of
magnitude smaller domain along streamwise
direction. Further, they do not involve a debatable
fringe region for purposes of recycling the
boundary conditions. A smaller number of grid
points along the streamwise direction also means
less data communication for FFT calculations,
when domain decomposition is used along FFT
directions, and better parallel performance. In
addition, input/output (I/O) file sizes and time are
reduced, and averaged quantities at any instant
can be obtained from the solutions along
homogeneous directions. The relative
disadvantages of the temporally developing
approach include a larger required number of
time steps and more frequent I/O than the
spatially developing simulations.
1.2 Objectives and approach
The objectives of this study are to: (1) implement
and validate a hybrid MPI/OpenMP
parallelization of a pseudo-spectral solver, and (2)
demonstrate that a temporally developing
approach is a viable method for transitional
boundary layer simulations. The pseudo-spectral
solver used herein was previously developed and
validated for channel and free-shear flows
(Bhushan and Warsi, 2005; Bhushan et al., 2007).
In this study, the computational performance of
the solver is improved by using OpenMP thread
parallelization for FFT, and MPI parallelization
along the wall-normal direction using the
influence matrix method for the Chebyshev
collocation method, as discussed in the following
section. In section 3, the parallel solver is
validated for plane channel flow LES predictions
at Re_τ = 590 and 2003 using DNS datasets. In
section 4, the scalability of the solver is studied
and performance bottlenecks are identified. The
solver is applied for LES of a temporally
developing transitional flat plate boundary layer
in section 5, and results validated for prediction
of integral quantities. Conclusions of the study
and future work are summarized in section 6.
2. NUMERICAL METHOD
LES requires solution of the filtered Navier–
Stokes equations:
(1a)
(1b)
The term on the right hand side is the subgrid
stress (SGS) tensor, which in the present study is
obtained using the Dynamic Smagorinsky model
(DSM) (Lilly, 1992). The equations are
discretized using the pseudo-spectral method
employing Fourier series in the streamwise (x1)
and spanwise (x3) directions and Chebyshev
polynomials in the wall-normal direction (x2). The
equations are solved using a fractional step
method, where the equations at every time step
are solved in three steps. The procedure is
summarized here; readers are referred to Bhushan
et al. (2007) for details.
The first step marches the convective term using
the second-order Adams-Bashforth method,
(2)
where the superscript (1) (or (2), as used below)
denotes the level of the fractional step, (N) the
time iteration level, and Δt the time step size.
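As a rough illustration of the Adams-Bashforth idea used in this step, the sketch below marches a generic equation du/dt = H(u) in Python. The helper names `ab2_march` and `decay_error`, the forward-Euler start-up step, and the scalar decay test problem are assumptions made for illustration only; in the solver, H would be the three-dimensional convective term rather than a scalar function.

```python
import math

def ab2_march(rhs, u0, dt, n_steps):
    """March du/dt = rhs(u) with second-order Adams-Bashforth (hypothetical helper)."""
    h_old = rhs(u0)
    u = u0 + dt * h_old                      # start-up step: forward Euler (assumption)
    for _ in range(n_steps - 1):
        h_new = rhs(u)
        u = u + dt * (1.5 * h_new - 0.5 * h_old)   # AB2 update using levels N and N-1
        h_old = h_new
    return u

def decay_error(n_steps):
    """Error at t = 1 for du/dt = -u, u(0) = 1, whose exact solution is exp(-t)."""
    approx = ab2_march(lambda u: -u, 1.0, 1.0 / n_steps, n_steps)
    return abs(approx - math.exp(-1.0))
```

Halving the time step should reduce `decay_error` by roughly a factor of four, which is the expected signature of a second-order scheme.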
The second step is the pressure correction step
which imposes incompressibility. The equations
solved in this step are:
(3a)
(3b)
Employing the Fourier transform along x1 and x3,
where the transform is defined as below for a
variable :
(4)
Eqs. (3a) and (3b) reduce in wave number space
to:
(5a)
(5b)
(5c)
(5d)
The third step incorporates the viscous and
subgrid stresses, which in wave number space is:
(6)
where l = 1,2,3 and are computed from .
The calculation of the 2nd and 3rd terms on the
right-hand side of Eq. (2) and the turbulent
stresses in Eq. (6) involves convolution in
wave number space, which is complicated to
perform. Thus, they are computed in physical
space using the 3/2 dealiasing rule, and hence the
solver is called “pseudo-spectral.” This procedure
requires inverse FFT computations of 3 velocity
and 6 derivative components to move data from
wave number to physical space. After the
convective (or stress) terms are computed, they
are transferred to the wave number space; this
requires 3 FFT computations for the convective
step and 6 for the viscous step. The FFTs are
performed using FFTW subroutines version 3.3
(Frigo and Johnson, 2011).
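The 3/2 dealiasing rule mentioned above can be sketched in one dimension as follows. This is a minimal illustration, assuming NumPy's `numpy.fft` as a stand-in for FFTW and a single periodic direction; the function name `dealiased_product` is hypothetical. The spectra are zero-padded to 3N/2 points, the product is formed in physical space, and the result is truncated back to N modes.

```python
import numpy as np

def dealiased_product(u_hat, v_hat):
    """Spectrum of u*v from the spectra of u and v, using 3/2-rule zero padding.

    Inputs follow numpy.fft ordering with an even number of points n.
    """
    n = u_hat.size
    m = 3 * n // 2                      # padded grid size

    def pad(a_hat):
        b = np.zeros(m, dtype=complex)
        b[: n // 2] = a_hat[: n // 2]        # wavenumbers 0 .. n/2-1
        b[-(n // 2):] = a_hat[n // 2:]       # wavenumbers -n/2 .. -1
        return b

    # Physical-space fields on the fine grid (rescaled for numpy's 1/m in ifft)
    u = np.fft.ifft(pad(u_hat)) * (m / n)
    v = np.fft.ifft(pad(v_hat)) * (m / n)
    w_hat_pad = np.fft.fft(u * v) * (n / m)

    # Truncate back to the original n modes
    w_hat = np.empty(n, dtype=complex)
    w_hat[: n // 2] = w_hat_pad[: n // 2]
    w_hat[n // 2:] = w_hat_pad[-(n // 2):]
    return w_hat
```

For example, squaring cos(6x) on a 16-point grid with a plain pointwise product places spurious energy at wavenumber 4 (the alias of wavenumber 12), whereas the padded product leaves that mode empty.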
The solver is parallelized for high performance
computing using shared-memory OpenMP thread
parallelization for the FFT calculations via the
FFTW multi-thread library (Frigo and Johnson,
2011). Message Passing Interface (MPI)
parallelization along the x2-direction using the
influence matrix method is discussed in section
2.3.
2.1 Solution of 1-D Helmholtz equation using
Chebyshev collocation method
Equations (5a) and (6) are 1-D Helmholtz
equations for each point, which are solved
using the Chebyshev collocation method. The
following summarizes the solution procedure for
a general equation with general Robin-type
boundary conditions (refer to Peyret, 2001):
(7)
The numerical domain is algebraically mapped
to a solution domain as discussed in
Section 2.2. The simulation domain is discretized
into N grid points, which are the Gauss-Lobatto
points :
(8)
The mapping of Eq. (7) results in:
(9a)
(9b)
(9c)
where, , , and are coefficients
associated with boundary conditions. The
derivatives are calculated using a collocation
method wherein,
(10a)
(10b)
(11a)
(11b)
Thus Eq. (7) is discretized as:
(12a)
(12b)
Equations (12a) and (12b) can be combined as (N+1)
algebraic equations:
(13)
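For readers unfamiliar with collocation derivatives, the sketch below builds Gauss-Lobatto points and the corresponding first-derivative matrix in Python. It follows the widely used construction from Trefethen's "Spectral Methods in MATLAB" and is not taken from the solver; the function name `cheb` and the point ordering x_j = cos(πj/N), running from +1 to -1, are assumptions that may differ from Eq. (8).

```python
import numpy as np

def cheb(n):
    """Chebyshev Gauss-Lobatto points and first-derivative collocation matrix.

    Returns (D, x) with x[j] = cos(pi*j/n), j = 0..n, so that D @ f(x)
    approximates f'(x) at the collocation points.
    """
    x = np.cos(np.pi * np.arange(n + 1) / n)
    c = np.ones(n + 1)
    c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(n + 1)
    dx = x[:, None] - x[None, :]                  # dx[i, j] = x_i - x_j
    D = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    D -= np.diag(D.sum(axis=1))                   # diagonal from zero row sums
    return D, x

# The second-derivative matrix is obtained by applying D twice.
D, x = cheb(8)
D2 = D @ D
```

The derivative is exact (to round-off) for polynomials of degree up to N, which is what gives the collocation method its spectral accuracy for smooth solutions.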
Equation (13) is solved by a matrix
diagonalization method. In this approach, first the
matrix is decomposed as:
(14)
where is the right eigenvector matrix and is
the diagonal matrix consisting of eigenvalues .
Since is a function of grid only, its
eigenvalues and vectors are computed at the
beginning of the simulation using LAPACK subroutines (Anderson et al., 1995) and stored.
Equation (14) is solved in three simple matrix multiplication steps:
(a)
(b) (15)
(c).
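The matrix diagonalization approach can be illustrated with a simplified model problem. The sketch below assumes the equation u'' - σu = f on [-1, 1] with homogeneous Dirichlet conditions, a special case of the Robin conditions treated above, and uses `numpy.linalg.eig` where the solver uses LAPACK subroutines. The function names are hypothetical. The eigen-decomposition is computed once, after which each solve consists of the three multiplication steps.

```python
import numpy as np

def cheb(n):
    """Gauss-Lobatto points x_j = cos(pi*j/n) and first-derivative matrix."""
    x = np.cos(np.pi * np.arange(n + 1) / n)
    c = np.ones(n + 1)
    c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(n + 1)
    dx = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    return D - np.diag(D.sum(axis=1)), x

def helmholtz_diag_setup(n):
    """One-time setup: eigen-decompose the interior second-derivative matrix."""
    D, x = cheb(n)
    A = (D @ D)[1:-1, 1:-1]          # depends on the grid only
    lam, P = np.linalg.eig(A)
    return x, lam, P, np.linalg.inv(P)

def helmholtz_diag_solve(setup, sigma, f):
    """Solve u'' - sigma*u = f with u(-1) = u(1) = 0 using the stored decomposition."""
    x, lam, P, P_inv = setup
    f_hat = P_inv @ f(x[1:-1])       # (a) project the right-hand side
    u_hat = f_hat / (lam - sigma)    # (b) divide by the shifted eigenvalues
    u = np.zeros_like(x)
    u[1:-1] = (P @ u_hat).real       # (c) transform back to physical space
    return x, u
```

Because the decomposition depends only on the grid, the same `setup` can be reused for every wave number pair and every time step, with only σ and f changing.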
2.2 Mapping of numerical domain to
Chebyshev solution domain
The numerical domain ( ) is mapped to the
Chebyshev solution domain ( ) via the following
algebraic functions:
Channel flow:
; and (16a)
Flat plate boundary layer:
; ;
(16b)
where, and .
2.3 Multi-domain solution of 1-D Helmholtz
equation using influence matrix method
Multi-domain solution of Eqs. (9a-c) using the
influence matrix method is summarized below;
readers are referred to Peyret (2001) for further
details. First assume M sub-domain partitions
each with
points, where a0 = -1 and aM = 1. The
points in each sub-domain are mapped to Gauss-
Lobatto points :
(17)
which are then mapped to the domain by a
linear mapping:
(18a)
(18b)
(18c)
The 1-D Helmholtz equation for each sub-domain
is expressed in algebraic form, similar to those
discussed in Section 2.1.
(19a)
(19b)
where
and is solved with interface conditions:
(19c)
(19d)
and boundary conditions:
(19e)
(19f)
(19f)
Solutions of the above equations are sought in the
form:
(20)
The part of the solution is obtained by
solving the equation:
(21a)
(21b)
The part of the solution is obtained as:
(22)
where are the influence matrix coefficients.
The solutions and are obtained by
solving:
(23a)
(23b)
(23c)
(23d)
It must be noted that both and
solutions do not depend on the instantaneous
solution, and can be solved at the beginning of the
iterations and saved. Substituting , and
in Eqs. 19(c-f), the equations for the
influence matrix are obtained:
(24a)
(24b)
(24c)
Eqs. (24a-c) constitute an (M+1) algebraic system
of equations:
[ (25)
where [ is a tridiagonal matrix, which is a
function of and , and is calculated
before the numerical iteration. is a vector
array of unknowns , and is a
vector function of . The influence matrix is
computed as:
(26)
where the inverse of the matrix is obtained using
the LAPACK subroutines. The final solution is
obtained by using the λi values in Eq. (22) and
from superposition of solutions in Eq. (20).
Summary. The steps involved in the
implementation of the multi-domain Helmholtz
solver using Chebyshev collocation method are:
1. Partition solution domain into M sub-
domains . Distribute
points in each sub-domain on Gauss-Lobatto
points . Obtain an analytic mapping
function to map to . Obtain the
matrix in Eqs. (19) and compute associated
eigenvalues and eigenvectors.
2. Solve Eqs. (23) following the methodology
presented in Eqs. (15) and save
and .
3. Evaluate matrix [ from Eqs. (24), and
compute [ to be used in Eq. (26). Steps 2
and 3 do not change with the time
step, so they are performed once before the
numerical iteration and the results are saved.
4. Evaluate , which is the RHS term in
Eqs. 5(a) and (6), and solve Eqs. (21)
following Eq. (15) to obtain .
5. Compute the derivative of and use the
values at the domain interfaces along with
boundary condition coefficients , ,
and to obtain in Eq. (25). Then, solve
Eq. (26) to obtain influence matrix
coefficients .
6. Use and saved variables and
to obtain from Eq. (22).
7. Obtain updated solution using Eq. (20) from
solutions obtained from steps 4 and 6. Note
that Eq. (20) also includes the velocities at the
interface, so Eqs. (19c and d) are satisfied.
8. For solution at next time iteration repeat steps
4-7.
Note that step 5 is solved by the root processor
only, which broadcasts λi values to other
processors. Thus, MPI parallelization requires a
total of 2×x1×x3×M data transfers, i.e., each
processor sends values at the sub-domain
interfaces to the root processor.
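The steps summarized above can be sketched for a simplified setting. The following Python code is a minimal illustration, assuming the model problem u'' - σu = f with homogeneous Dirichlet conditions at the two outer boundaries, the same polynomial order in every sub-domain, and dense `numpy.linalg.solve` calls in place of the stored eigen-decompositions and LAPACK inverse used in the solver. The names `cheb` and `solve_multidomain` are hypothetical. With the outer values fixed, only the M-1 interface values remain unknown, so the coupling system here is (M-1)×(M-1) rather than (M+1)×(M+1).

```python
import numpy as np

def cheb(n):
    """Gauss-Lobatto points x_j = cos(pi*j/n) and first-derivative matrix."""
    x = np.cos(np.pi * np.arange(n + 1) / n)
    c = np.ones(n + 1)
    c[0] = c[-1] = 2.0
    c *= (-1.0) ** np.arange(n + 1)
    dx = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    return D - np.diag(D.sum(axis=1)), x

def solve_multidomain(f, sigma, edges, n):
    """Solve u'' - sigma*u = f, u = 0 at edges[0] and edges[-1], C1 at interfaces."""
    D, xi = cheb(n)
    M = len(edges) - 1
    parts = []
    for a, b in zip(edges[:-1], edges[1:]):
        h = 0.5 * (b - a)
        x = 0.5 * (a + b) + h * xi            # linear map; x[0] = b, x[-1] = a
        Dm = D / h
        A = Dm @ Dm - sigma * np.eye(n + 1)
        Ai = A[1:-1, 1:-1]
        ut = np.zeros(n + 1)                  # part with null end values
        ut[1:-1] = np.linalg.solve(Ai, f(x[1:-1]))
        ur = np.zeros(n + 1); ur[0] = 1.0     # unit value at the right end
        ur[1:-1] = np.linalg.solve(Ai, -A[1:-1, 0])
        ul = np.zeros(n + 1); ul[-1] = 1.0    # unit value at the left end
        ul[1:-1] = np.linalg.solve(Ai, -A[1:-1, -1])
        parts.append((x, Dm, ut, ul, ur))
    # Tridiagonal system enforcing derivative continuity at each interface
    C = np.zeros((M - 1, M - 1))
    rhs = np.zeros(M - 1)
    for k in range(1, M):
        _, Dl, utl, ull, url = parts[k - 1]   # sub-domain left of interface k
        _, Dr, utr, ulr, urr = parts[k]       # sub-domain right of interface k
        row = k - 1
        C[row, row] = (Dl @ url)[0] - (Dr @ ulr)[-1]
        if k > 1:
            C[row, row - 1] = (Dl @ ull)[0]
        if k < M - 1:
            C[row, row + 1] = -(Dr @ urr)[-1]
        rhs[row] = (Dr @ utr)[-1] - (Dl @ utl)[0]
    lam = np.concatenate(([0.0], np.linalg.solve(C, rhs), [0.0]))
    # Superpose the local solutions using the interface values
    return [(x, ut + lam[m] * ul + lam[m + 1] * ur)
            for m, (x, _, ut, ul, ur) in enumerate(parts)]
```

In this sketch each sub-domain solves only local problems, and the sub-domains are coupled solely through the small system `C`, which is why only interface data need to be exchanged between processors.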
2.4 Validation of multi-domain solver for 1-D
Helmholtz equation
The multi-domain solver was first implemented
and tested for a model equation:
(27)
where, and the boundary
conditions are . The
analytic solution of the above equation is:
(28a)
where
and
(28b)
Test simulations are performed using 193 points
partitioned into 1 to 16 sub-domains with as few
as 13 points per sub-domain.
The numerical predictions compare very well
with the analytic solution for all sub-domain counts,
as shown in Fig. 1. In Fig. 1a, the L2 norm of the
error (difference between the analytic and
numerical predictions) decreases approximately
two orders of magnitude as the number of sub-
domains is increased from 1 to 16. Consistent
with previous studies, improved convergence is
expected in the multi-domain simulations, since
the predictions depend only on the gradients
within each sub-domain (Yang and Shizgal,
1994). Figs. 1b-1e show the different solution
components, the influence matrix coefficients,
and comparison of the numerical results with the
analytic solution for the 8 sub-domain
simulations. It is evident that the influence matrix
coefficients play a significant role in the solution
process, wherein very simple curves are
combined to obtain complex solutions.
3. TURBULENT CHANNEL FLOW
SIMULATIONS
The first LES test case is turbulent channel flow
at Re_τ = 590 (Re = 1.26×10^4). Results are
compared to available DNS data (Moser et al.,
1999). The study seeks to evaluate the effect of
MPI domain decomposition on the solution and
determine appropriate grid requirements for the
turbulent flow simulations. The solver is then
applied for a higher Re_τ = 2003 (Re = 4.86×10^4)
case. The predictions are validated using DNS
data (Hoyas and Jimenez, 2006) and compared
with the Re_τ = 590 predictions. The vortical and
turbulent structures of the flow are analyzed and
compared with DNS predictions for Re_τ = 180
(Jeong et al., 1997). Both channel flow
simulations were performed using a domain
extent of 0 ≤ x1 ≤ 2π, 0 ≤ x3 ≤ π and -1 ≤ x2 ≤ 1. A
representative illustration showing the flow
domain, grid distribution and boundary conditions
is shown in Fig. 2(a). Grid points are uniformly
distributed in the streamwise and spanwise
directions. The grid in the wall normal direction
is refined near the wall and at domain interfaces
for appropriate near wall resolution and accurate
calculation of influence matrix coefficients,
respectively. The simulations use periodic
boundary conditions in x1 and x3 directions, and
no-slip boundary conditions at x2 = -1 and 1. The
grids used for this study are summarized in
Table 1.
3.1 Re = 590 results
The Re_τ = 590 case was performed on a
64×97×64 grid using 1, 2 and 4 processors along
the x2 direction, which are referred to as cases
Ch1-G4, Ch1-G3 and Ch1-G2, respectively. The
Ch1-G4, Ch1-G3 and Ch1-G2 cases consist of 97,
49 and 25 points in the x2 direction for each sub-
domain, respectively. Two additional finer grids
along the x2 direction, namely Ch1-G1 and Ch1-
G0, are also used. The former consists of 2 sub-
domains with 65 points in x2 and the latter has 3
sub-domains with 49 points in x2. The normalized
grid resolution for the simulations is Δx1+ = 58,
Δx3+ = 29, Δx2,min+ = 0.6 and Δx2,max+ = 23, where
the normalization is performed using friction
velocity u_τ and kinematic viscosity ν. The grids
are about 150 times coarser than the DNS grid
(Moser et al., 1999) and are considered adequate
for LES.
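The wall-unit spacings quoted for the uniform directions can be checked with a short calculation. The sketch below assumes that lengths are normalized by the channel half-height, so that Δx+ = (Δx/δ)·Re_τ, and that the streamwise and spanwise extents are 2π and π half-heights, which is consistent with the values listed in Table 1. The function name `wall_spacing` is hypothetical.

```python
import math

def wall_spacing(length, n_points, re_tau):
    """Uniform grid spacing in wall units for a periodic direction.

    `length` is the domain extent in channel half-heights, so the spacing
    in wall units is (length / n_points) * re_tau.
    """
    return length / n_points * re_tau

# Re_tau = 590 grid with 64 points in the streamwise and spanwise directions
dx1_plus = wall_spacing(2 * math.pi, 64, 590)   # about 57.9
dx3_plus = wall_spacing(math.pi, 64, 590)       # about 29.0
```

The same relation applied to the finest Re_τ = 2003 grid (128 points in each periodic direction) gives spacings of about 98.3 and 49.2.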
Fig. 1 Test of multi-domain 1-D Helmholtz solver
using model equation: (a) L2 norm of the error
(difference between numerical and analytic
profiles) versus number of sub-domains;
(b-d) Component solutions for 8 multi-domain
case, and (e) Comparison of 8 multi-domain
predictions with analytic results. The inset in (e)
shows the influence matrix coefficients.
Fig. 2 Representative domain, subdomain
decomposition, grid distribution and boundary
conditions for (a) channel flow LES cases and
(b) flat-plate transitional boundary layer
simulation.
Fig. 3 Mean velocity profile (u+ versus y+) predictions for Re_τ = 590
on grids Ch1-G0 to Ch1-G4, compared with DNS (Moser et al., 1999).
As shown in Fig. 3, the Ch1-G2, G3 and G4 mean
velocity predictions agree well overall with DNS
in the sub- and log-layers, but are underpredicted
in the buffer and lower log-layers.
Ch1-G3 and G4 show errors of about 7%,
whereas Ch1-G2 shows an error of about 15%. As
shown in Fig. 4, the turbulence intensity and
shear stress profiles predicted for the Ch1-G3 and
Ch1-G4 cases compare well with the DNS, except
towards the centerline. In contrast, the Ch1-G2
predictions are up to 20% lower. For the higher
order statistics (figure not shown), both Ch1-G3
and Ch1-G4 predictions agree well with DNS,
whereas Ch1-G2 predictions show a kink at the
domain interface. Multi-domain simulations on
finer grids, Ch1-G1 and Ch1-G0, show
improvements in the mean and turbulent statistics
predictions compared to the coarse grids. In
particular, Ch1-G0 predictions compare very well
with the DNS for all statistical quantities. Ch1-G1
predictions are comparable to Ch1-G0 for mean
velocity, urms and shear stress, but overpredict
vrms and wrms by 5%.
Overall, the results suggest that for this case, Re ~
10^5, a minimum of 48 points per sub-domain is
required to accurately resolve the sharp gradients
in the flow. Higher Re simulations are expected to
involve even more small-scale fluctuations and
higher gradients and may require even finer grids.
Therefore, a minimum of 48 points per sub-
domain is used for the remainder of the test cases.
3.2 Re = 2003 results
The Re_τ = 2003 channel flow test case was
performed on 7 different grids with different
resolutions as summarized in Table 1. The finest
grid consisted of 128×289×128 points with Δx1+ =
98, Δx3+ = 49, Δx2,min+ = 0.7 and Δx2,max+ = 21. The
simulation was performed using 12 processors (6
MPI × 2 OpenMP) and took 3000 CPU hours to
complete. Selected results, primarily on the finer
grids, are shown in Figs. 5 and 6.
The mean velocity predictions in Fig. 5 improve
with increasing grid refinement, and results on the
two finest grids compare well with the DNS data.
As shown in Fig. 6, the streamwise turbulence
intensity is predicted well using the finest two
grids, but the wall normal and spanwise
intensities are overpredicted near the centerline.
The peak streamwise, wall normal and spanwise
turbulence intensities occur at y+ = 12, 70 and 40,
respectively, which are similar to the locations for the
Re_τ = 590 case. However, the magnitudes of the
turbulence intensities are approximately 15%
higher throughout the channel compared to the
Re_τ = 590 case. The shear stress profile (figure
not shown) shows a peak at y+ = 50 and an almost
linear decrease towards the center. The peak
location is similar to that predicted for the Re_τ =
590 case, and the linear variation is expected in
the log-layer.
The streamwise skewness profile in Fig. 7a shows
large values around 1.2 in the sub-layer, which
decreases rapidly and becomes almost constant at
around -0.4 in the log-layer. This behavior is
expected due to the interaction of high-speed flow
away from the wall and low-speed flow near the
wall. The wall-normal velocity skewness in Fig.
7b shows a region of negative value in the buffer
layer region y+ = 11 – 70, an almost constant
value around 0.25 for most of the log-layer, and a
decrease towards the center. The spanwise
velocity skewness profile fluctuates around zero
as shown in Fig. 7c. This suggests that the
spanwise fluctuations are mostly isotropic, which
is expected due to flow symmetry along the
spanwise direction. The flatness profiles for all
velocity components in Fig. 8 show large values
near the wall, which is expected as the turbulence
is highly intermittent near the wall. The values
are almost steady around 3 away from the wall,
which corresponds to a Gaussian distribution. The
skewness and flatness profile predictions are
similar to those of the Re_τ = 590 case.
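The skewness and flatness discussed above are the normalized third and fourth central moments of a velocity component. A minimal sketch of their evaluation is given below, assuming the samples are collected in a one-dimensional NumPy array; the function name `skewness_flatness` is hypothetical and the averaging used in the solver (over homogeneous planes and time) is not reproduced here.

```python
import numpy as np

def skewness_flatness(u):
    """Return (skewness, flatness) of the samples in u.

    Skewness is <u'^3>/<u'^2>^(3/2) and flatness is <u'^4>/<u'^2>^2,
    where u' is the fluctuation about the sample mean.
    """
    fluct = u - np.mean(u)
    var = np.mean(fluct**2)
    return np.mean(fluct**3) / var**1.5, np.mean(fluct**4) / var**2
```

A Gaussian signal gives a skewness near 0 and a flatness near 3, which is the reference value approached away from the wall, whereas a pure sine wave gives a flatness of 1.5.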
Jeong et al. (1997) studied the near wall coherent
structures for plane channel flow at Re_τ = 180.
The structures were detected using the conditionally
averaged second largest eigenvalue λ2 of the
tensor S_ik S_kj + Ω_ik Ω_kj, where S and Ω are the
strain-rate and rotation-rate tensors. The results showed that
the isosurfaces of λ2 correlated well with the
isosurfaces of the instantaneous
streamwise vorticity ω_x. Therefore, in this paper,
instantaneous ω_x isosurfaces and contours are
shown in Fig. 9 to highlight the vortical and
turbulent structures in the flow.
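For reference, the λ2 quantity can be evaluated at a point from the velocity gradient tensor. The sketch below assumes the standard form of the criterion, in which λ2 is the middle eigenvalue of S² + Ω², with S and Ω the symmetric and antisymmetric parts of the velocity gradient; the function name `lambda2` is hypothetical and no conditional averaging is included.

```python
import numpy as np

def lambda2(grad_u):
    """Second largest eigenvalue of S^2 + Omega^2 for a 3x3 velocity gradient.

    grad_u[i, j] holds d(u_i)/d(x_j). Negative values indicate a vortex core.
    """
    S = 0.5 * (grad_u + grad_u.T)        # strain-rate tensor
    W = 0.5 * (grad_u - grad_u.T)        # rotation-rate tensor
    eigenvalues = np.linalg.eigvalsh(S @ S + W @ W)   # ascending order
    return eigenvalues[1]

# Solid-body rotation about the x1 axis with angular velocity 2
rotation = np.array([[0.0, 0.0, 0.0],
                     [0.0, 0.0, -2.0],
                     [0.0, 2.0, 0.0]])
# Pure shear, u1 = x2
shear = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
```

Solid-body rotation yields a negative λ2, while pure shear yields zero even though its vorticity is non-zero, which illustrates why the criterion is preferred over vorticity magnitude for identifying vortex cores.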
The isosurface of ω_x shows elongated quasi-
streamwise structures of length x1+ = 200 to 400
with opposite circulation in the sub- and buffer
layers, 0 < x2+ < 60. The vortices with positive ω_x
are inclined at around +14º, whereas the vortices
with negative ω_x are inclined at -14º in the
streamwise-spanwise (x1-x3) plane. These vortices
show an ejection from the sub-layer inclined at 3º.
The isosurfaces of wall normal vorticity help
evaluate the interaction of the streamwise vortices
in the x1-x3 plane. Results show Λ-type structures
in the sub- and buffer layer, which merge to form
hairpin-like structures in the lower log-law region
around x2+ = 200.
As shown in Fig. 10, the instantaneous turbulent
shear stress values are mostly positive
with somewhat lower values in the vortex core.
This suggests that the streamwise vortices are
associated with the sweep (positive ) and
ejection (negative ) events on different sides.
Table 1 Summary of simulations performed in this study.
| Test Case | Re | Grid # | Grid Size | Nx | Ny | Nz | Δx1+ | Δx2,max+ | Δx2,min+ | Δx3+ | NProc | Objective |
| 1D Helmholtz Equation | - | He-G0 | 193 | - | 193 | - | - | - | - | - | 1 to 16 | Validate influence matrix method implementation |
| Channel Flow | Reτ = 590, Re = 1.26×10^4 | Ch1-G4 | 40K | 64 | 97 | 64 | 58 | 23 | 0.7 | 29 | 1 | Evaluate effect of domain decomposition on solution accuracy; validate solution against DNS (Moser et al., 1999) |
| | | Ch1-G3 | 40K | 64 | 97 | 64 | 58 | 23 | 0.7 | 29 | 2 | |
| | | Ch1-G2 | 40K | 64 | 97 | 64 | 58 | 23 | 0.7 | 29 | 4 | |
| | | Ch1-G1 | 53K | 64 | 129 | 64 | 58 | 21 | 0.6 | 29 | 2 | |
| | | Ch1-G0 | 59K | 64 | 145 | 64 | 58 | 28 | 0.7 | 29 | 3 | |
| Channel Flow | Reτ = 2003, Re = 4.86×10^4 | Ch2-G6 | 10K | 32 | 96 | 32 | 393.3 | 66.6 | 1.07 | 196.6 | 1 | Validate solution against DNS (Hoyas and Jimenez, 2006); analyze vortical and turbulent structures, compare with Re = 590 predictions and Jeong et al. (1997) DNS results for Re = 180 |
| | | Ch2-G5 | 53K | 64 | 129 | 64 | 196.6 | 49.2 | 1.20 | 98.3 | 2 | |
| | | Ch2-G4 | 79K | 64 | 193 | 64 | 196.6 | 32.8 | 0.84 | 98.3 | 3 | |
| | | Ch2-G3 | 1.05M | 64 | 257 | 64 | 196.6 | 23.3 | 0.56 | 98.3 | 4 | |
| | | Ch2-G2 | 1.78M | 96 | 193 | 96 | 131.2 | 31.9 | 1.07 | 65.5 | 3 | |
| | | Ch2-G1 | 2.66M | 96 | 289 | 96 | 131.2 | 20.9 | 0.71 | 65.5 | 6 | |
| | | Ch2-G0 | 4.73M | 128 | 289 | 128 | 98.3 | 20.9 | 0.71 | 49.1 | 6×(2)* | |
| Channel Flow | Reτ = 2003, Re = 4.86×10^4 | Ch3-G2 | 2.66M | 96 | 289 | 96 | 131.2 | 20.9 | 0.71 | 65.5 | up to 12×(8)* | Scalability study; evaluate solver performance |
| | | Ch3-G1 | 11.1M | 196 | 289 | 196 | 65.6 | 20.9 | 0.71 | 32.8 | up to 12×(8)* | |
| | | Ch3-G0 | 18.9M | 256 | 289 | 256 | 49.2 | 20.9 | 0.71 | 24.6 | up to 12×(8)* | |
| Flat plate | Rex up to 2.2×10^6 | FP-G0 | 1.18M | 64 | 289 | 64 | 136 | - | 0.3 | 68 | 6×(2)* | Demonstrate feasibility of temporally developing flat plate simulation approach |
* #Threads
Fig. 4 Velocity fluctuation RMS and shear stress
profiles for Re = 590 compared with DNS
(Moser et al., 1999): (a) urms+, (b) vrms+,
(c) wrms+ and (d) shear stress, plotted against
y+ for grids Ch1-G0 to Ch1-G4.
Fig. 5 Mean velocity profile (u+ vs. y+) for Re = 2003
compared with DNS (Hoyas and Jimenez, 2006) for
grids Ch2-G4, Ch2-G3, Ch2-G1 and Ch2-G0.
Fig. 6 RMS of (a) streamwise, (b) wall normal and (c)
spanwise velocity fluctuations for Re = 2003
compared with DNS (Hoyas and Jimenez,
2006), plotted against y+ for grids Ch2-G4,
Ch2-G3, Ch2-G1 and Ch2-G0.
Fig. 7 Skewness predictions on different grids (Ch2-G4,
Ch2-G3, Ch2-G1 and Ch2-G0) for Re = 2003:
(a) S(u'), (b) S(v') and (c) S(w') plotted
against y+.
The instantaneous stress component shows
a clover leaf pattern with positive and negative
stresses along the diagonal of the vortex core. The
contour shows predominantly positive
values at positive ωx vortex cores, and negative
values for negative ωx cores. For both of these
stresses the positive and negative regions are
almost uniformly distributed, thus the spatially
(and temporally) averaged stresses are negligible.
Overall, the vortical and turbulent structure
predictions for Re = 2003 are qualitatively
similar to those predicted by Jeong et al. (1997),
and the interactions of quasi-streamwise vortices
in the buffer layer are identified to be the cause of
turbulence production. However, the inclination
of the elongated quasi-streamwise vortices is
larger, i.e., ±14º compared to ±4º, and the ejection
angle is lower, i.e., 3º compared to 9º. The results
also show the presence of hairpin-like structures
in the lower log-law region which were not
predicted by Jeong et al. (1997).
Fig. 8 Flatness predictions on different grids (Ch2-G4,
Ch2-G3, Ch2-G1 and Ch2-G0) for Re = 2003:
(a) F(u'), (b) F(v') and (c) F(w') plotted
against y+.
4. COMPUTATIONAL PERFORMANCE
The solver was profiled using the OpenMP
omp_get_wtime function to obtain CPU times
required by different subroutines during the
channel flow simulations, and to perform a
scalability study. Three different grids were used
for this purpose: coarse with 96×289×96 points;
medium with 196×289×196 points; and fine with
256×289×256 points along x1, x2 and x3,
respectively. The study was performed on the
Mississippi State University HPC2 machine
Talon. Talon is an IBM P6 machine with 12
CPUs/node and 2GB RAM per CPU. It has a total
of 256 nodes or 3072 processors. The scalability
study was performed on up to 8 nodes (MPI) × 12
OpenMP threads/node = 96 cores.
Fig. 11a shows the % of time spent in different
routines of the solver for fine, medium and coarse
grids. As shown, approximately 17%, 39%, 4.3%,
6.3% and 26.5% of time is spent for convection,
SGS, pressure solver, implicit viscous solver and
post-processing routines, respectively. The
convection and SGS steps are the most expensive
since they require several FFT and inverse FFT
operations. The latter also involves secondary
filtering which adds to the computational cost. I/O
is the second most expensive step, but it is only
used every 1000th iteration during a simulation,
hence it is not regarded as a bottleneck.
As shown in Fig. 11b, the solver w/o I/O shows
only 12× speed-up for 50 cores, and the
performance decreases after that. The MPI
domain decomposition using the influence matrix
method shows a super linear scalability, i.e., the
speedup is higher than ideal, as shown in Fig.
11c. On the other hand, OpenMP shows very
limited scalability, as shown in Fig. 11d. OpenMP
scalability improves slightly as the grid size
increases, and the best performance improvement
of 30% is obtained for 6 threads. The performance
obtained herein is much lower than that reported
by Frigo and Johnson (2011), wherein almost
linear scalability was reported on an 8000^3 grid.
The poor performance for the current grids is
expected, as the OpenMP overhead dominates for
smaller grid sizes and larger numbers of threads. A
memory bandwidth study was performed by
comparing the performance improvement of the
solver as the memory available for a single thread
is increased from 2GB to 24GB. The study shows
that the increased memory improves performance
by 12%. Thus, the OpenMP speed-up may not be
entirely due to the threads, but also partly due to
the increase in memory bandwidth.
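The scalability figures quoted above can be put on a common footing by converting them to parallel efficiency, i.e. the achieved speed-up divided by the number of cores or threads. The sketch below is only arithmetic on the numbers reported in this section; the function name is hypothetical, and subtracting the 12% memory effect from the 30% OpenMP gain assumes the two contributions are additive, which the text does not state.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speed-up achieved on the given number of workers."""
    return speedup / workers

# Hybrid MPI/OpenMP solver w/o I/O: 12x speed-up on 50 cores.
hybrid_eff = parallel_efficiency(12.0, 50)

# OpenMP FFT threading: 30% improvement on 6 threads, i.e. a 1.30x speed-up.
openmp_eff = parallel_efficiency(1.30, 6)

# Assumption: the 12% memory-bandwidth gain and the thread gain simply add up.
thread_only_gain_pct = 30.0 - 12.0

print(f"hybrid efficiency : {hybrid_eff:.2f}")
print(f"OpenMP efficiency : {openmp_eff:.2f}")
print(f"thread-only gain  : {thread_only_gain_pct:.0f}% (additive assumption)")
```

Super linear scaling of the MPI decomposition corresponds to an efficiency above 1 in this measure.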
5. FLAT PLATE BOUNDARY LAYER
TRANSITION
Temporally developing LES was performed for
bypass transition flow over a flat plate at zero
pressure gradient using a coarse 64×289×64 grid.
The multi-domain solver was applied on 12 cores
(6 MPI × 2 OpenMP). The initial conditions were
obtained by imposing random phase velocity
fluctuations over a laminar Blasius profile for Re
= 30. The random phases were generated
following Rogallo (1981) using an energy
spectrum with l/λ = 43, where l is the integral length
scale and λ is Taylor's length scale (Kang et al.,
2003). The velocity fluctuations were supplied
with 5% freestream turbulence intensity (FSTI),
but as the simulation was started the turbulence
energy dissipated rapidly in the entire domain due
to phase adjustment. The flow eventually settled
down, i.e., the frictional resistance started
following the laminar wall shear stress (Cf)
profile, and at this stage the FSTI was around
1.8%.
Fig. 9 Isosurfaces of streamwise vorticity, ωx = -15 and 15. Inset 1 shows the contours of ωx along streamwise and
spanwise planes. Inset 2 shows the ωx isosurface from the top. Inset 3 shows the contour of streamwise velocity.
Inset 4 shows the isosurface of ωy = 5 colored using x2+. The results are shown for the fine grid Re = 2003
simulation.
Fig. 10 Instantaneous shear stress contours in the x2-x3
plane for the fine grid Re = 2003 simulation,
panels (a) to (c). The arrows show the
streamwise vortex circulation direction.
For the present simulation, the Reynolds number
based on plate length L = 1 was ReL = 3.85×10^5,
where L can be interpreted as the distance that the
simulation window moves downstream with the
mean flow during 1 non-dimensional second.
Periodic boundary conditions were applied in the
streamwise and spanwise directions, no slip was
applied at the wall, and a zero gradient condition
was applied at the freestream boundary, as
illustrated in Fig. 3(b). The domain size was 0 ≤
x1 ≤ 2π, 0 ≤ x3 ≤ π and 0 ≤ x2 ≤ 1, and the
dimensionless time step size was Δt = 10^-4. During the
simulation, the numerical domain moves in the
streamwise direction at a rate ΔRex = ReL×Δt/x1 =
6.12 each time step. The simulation was
performed for 360K iterations, which provided
solutions up to Rex = 2.2×10^6. The grid resolution
in the fully developed turbulent region was Δx1+ =
136, Δx3+ = 68 and Δx2,min+ = 0.3. The simulation
took 400 wall clock hours (17 days) or 5000 CPU
hours.
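The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that x1 in the rate expression is the streamwise domain length 2π (this value reproduces the quoted 6.12 per time step) and that the CPU hours are the wall clock hours multiplied by the 12 cores used; the variable names are illustrative only.

```python
import math

re_l = 3.85e5          # Reynolds number based on plate length L = 1
dt = 1.0e-4            # dimensionless time step
lx1 = 2.0 * math.pi    # assumed streamwise domain length
n_iter = 360_000       # number of iterations performed

# Rex increment of the moving simulation window per time step.
d_rex = re_l * dt / lx1

# Rex reached at the end of the run.
rex_final = d_rex * n_iter

# CPU hours, assuming all 12 cores (6 MPI x 2 OpenMP) ran for 400 hours.
cpu_hours = 400 * 12

print(f"dRex per step = {d_rex:.2f}")
print(f"final Rex     = {rex_final:.3e}")
print(f"CPU hours     = {cpu_hours}")
```

The computed increment and final Rex agree with the quoted 6.12 and 2.2×10^6 to within rounding, and 4800 core hours is consistent with the quoted figure of about 5000 CPU hours.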
Fig. 11 (a) CPU time required by different components
of the solver (convection, SGS, pressure, viscous
and post-processing) for the coarse, medium and
fine grids. Scalability study for CPU time
w/o I/O for (b) hybrid MPI/OpenMP
parallelization, (c) MPI domain decomposition
in normal direction and (d) OpenMP
parallelization for FFT.
Fig. 12 Skin friction coefficient distribution showing
flat plate boundary layer transition for FSTI =
1.8%. The laminar and turbulent analytic
profiles are from Warsi (2006).
Fig. 12 compares Cf predictions with analytic
laminar and turbulent profiles, experimental data
at Rex = 2.06×10^6 (DeGraff and Eaton, 2000), and
DNS results at Rex = 1.84×10^6 and 2.14×10^6
(Schlatter and Orlu, 2010). The LES predicts a
decrease in Cf up to Rex = 9.2×10^5 in close
agreement with the analytic laminar profile, an
increase thereafter with overshoot from the
analytic turbulent profile for Rex = 1.42×10^6 to
1.72×10^6, and a fully developed turbulent region
for higher Rex. The Cf slope is slightly higher than
that of the analytic turbulent profile and the
magnitude is 7% lower than the experimental data
and DNS. The ERCOFTAC experimental data for FSTI =
0.874% (Coupland, 1990) show that the
transition begins at Rex = 1.7×10^6. In the present
case, transition occurs earlier, which is expected
since the FSTI is higher. The overshoot behavior is
also consistent with the ERCOFTAC data. This is
in contrast to the result of Wu and Moin (2009),
who did not observe a similar overshoot.
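To give a sense of the magnitudes involved in the Cf comparison, the sketch below evaluates two common textbook flat plate correlations: the Blasius laminar result Cf = 0.664/Rex^(1/2) and the one-fifth power law turbulent estimate Cf = 0.0592/Rex^(1/5). These are assumptions chosen for illustration; they are not necessarily the analytic profiles taken from Warsi (2006) in Fig. 12.

```python
import math

def cf_laminar(rex):
    """Blasius laminar flat plate skin friction coefficient."""
    return 0.664 / math.sqrt(rex)

def cf_turbulent(rex):
    """One-fifth power law turbulent estimate (textbook correlation, assumed here)."""
    return 0.0592 * rex ** (-0.2)

# Rex values quoted in the text: transition onset, overshoot range, end of run.
for rex in (9.2e5, 1.42e6, 1.72e6, 2.2e6):
    print(f"Rex = {rex:.2e}: laminar Cf = {cf_laminar(rex):.2e}, "
          f"turbulent Cf = {cf_turbulent(rex):.2e}")
```

At the transition onset the turbulent estimate is several times larger than the laminar one, which is the jump in Cf that the transition region has to bridge.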
The contours of streamwise velocity at the center
plane in Fig. 13a show a stable boundary layer in
the laminar region, and turbulent fluctuations
starting to grow in the transition region. The
thickness of the low velocity region near the wall
grows both in the laminar and fully developed
turbulent regions, due to flow deceleration and
growth of boundary layer thickness as shown in
the inset figure. The low velocity thickness
decreases in the transition region due to a thinning
of the viscous sublayer, as shown in Fig. 13b. The
integral boundary layer parameters including
hydrodynamic thickness δ, displacement
thickness δ* and momentum thickness θ in Fig. 14
follow the analytic profiles (Warsi, 2006) in both
the laminar and turbulent regions, and show a
steady growth in the transition region. The
predictions also agree well with the available
experimental data and DNS in the turbulent
region. The shape factor H is around 2.6 in the
laminar region and 1.4 in the turbulent region.
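The laminar shape factor quoted above can be reproduced from the Blasius similarity solution. The sketch below is an independent illustration, not the post-processing used in this study: it integrates f''' + 0.5 f f'' = 0 with a fixed-step RK4 scheme using the known wall value f''(0) = 0.332057, then evaluates the displacement and momentum thickness integrals with the trapezoidal rule. A one-seventh power law profile is included as a rough turbulent reference; it is a textbook idealization and gives a lower H than the value of about 1.4 predicted here.

```python
import numpy as np

def trapezoid(f, x):
    """Trapezoidal rule for samples f on grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def blasius_profile(eta_max=10.0, n=2001, fpp0=0.332057):
    """Integrate f''' + 0.5 f f'' = 0 and return (eta, u/Ue = f')."""
    eta = np.linspace(0.0, eta_max, n)
    h = eta[1] - eta[0]
    rhs = lambda s: np.array([s[1], s[2], -0.5 * s[0] * s[2]])
    s = np.array([0.0, 0.0, fpp0])   # state vector (f, f', f'')
    u = np.zeros(n)
    for i in range(1, n):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s = s + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        u[i] = s[1]
    return eta, u

def integral_parameters(y, u):
    """Displacement thickness, momentum thickness and shape factor H."""
    disp = trapezoid(1.0 - u, y)
    mom = trapezoid(u * (1.0 - u), y)
    return disp, mom, disp / mom

eta, u_lam = blasius_profile()
disp_lam, mom_lam, h_lam = integral_parameters(eta, u_lam)

# One-seventh power law profile on 0 <= y/delta <= 1 (idealized turbulent profile).
y = np.linspace(0.0, 1.0, 2001)
disp_turb, mom_turb, h_turb = integral_parameters(y, y ** (1.0 / 7.0))

print(f"Blasius:   H = {h_lam:.3f}")
print(f"1/7 power: H = {h_turb:.3f}")
```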
Fig. 13 (a) Contours of streamwise velocity (< 99% Ufreestream) showing growth of laminar, transitional and turbulent
boundary layer regions. The contours are shown for the center plane using instantaneous data @ 2000 iterations
(or Rex = 25,000). The inset shows the mean streamwise velocity profile. (b) U/Ue vs y/δ at several streamwise
locations, compared to DNS in turbulent region.
Fig. 14 Prediction of integral boundary layer parameters (a) hydrodynamic thickness δ, (b) displacement thickness δ*, (c)
momentum thickness θ and (d) shape factor H of the flat plate boundary layer, plotted against Rex and compared
with experiment (DeGraff and Eaton, 2000) and DNS (Schlatter et al., 2010). Refer to Warsi (2006) for details of
the laminar and turbulent analytic profiles.
Fig. 15 Isosurfaces of ωx are shown in the laminar, transitional and turbulent boundary layer regions. Surfaces in (a) are shown for ωx = -3.5 and 3.5, whereas (b) and (c) are for ωx = -30 and 30.
Isosurfaces of streamwise vorticity in Fig. 15
show low strength elongated structures in the
laminar region, which merge to form Λ-shaped
structures around Rex = 7.8×10^5. The Λ-shaped
structures grow into hairpin like structures and
eventually into a turbulent spot, which causes the
boundary layer breakdown. The turbulent region
shows dense hairpin-like structures, and the
structures are even denser in the overshoot region.
Both in the transition and turbulent regions
overlapping quasi-streamwise vortices with
alternating tilt and signs are predicted in the
buffer layer. These vortices develop in the
transition region and show lifting similar to the
Re = 2003 channel flow case. Thus, it is expected
that the passage of these overlapping structures
with opposite circulation leads to the
ejection/sweep events that feed the hairpin vortex
generation. Analysis of the development of the
turbulent fluctuations shows that the streamwise
(urms) and spanwise (wrms) turbulence peak in
the transition region around Rex = 1.41×10^6,
where the skin friction overshoots the turbulent
profile. The peak values are 1.5 to 2 times larger
than those in the fully developed turbulent region.
On the other hand, the wall normal velocity
fluctuation (vrms) and resolved shear stress (uv)
grow steadily. The growth of the vrms and uv profiles
agrees well with the Wu and Moin (2009) DNS
predictions.
The LES predictions in the fully developed
turbulent region at Rex = 1.84×10^6 and 2.14×10^6
were compared with the DNS data of Schlatter
and Orlu (2010) (figure not shown). The results
show that LES over predicts the log-layer velocity
profile and the streamwise and spanwise
turbulence intensities, and under predicts the wall
normal turbulence intensity. The coarse grid Re =
2003 channel flow simulation also showed similar
over prediction of the turbulence intensities, thus
the poor predictions are possibly due to coarse
grid resolution. In addition, the higher order
statistics show kinks similar to channel flow
simulations with a low number of collocation points
for each subdomain. Finer grids with larger
numbers of collocation points in each subdomain
are likely required for improved predictions.
6. CONCLUSIONS, DISCUSSIONS AND
FUTURE WORK
6.1 Summary and conclusions
Computational performance of a pseudo-spectral
solver has been improved by implementing
OpenMP thread parallelization for FFT
calculations and MPI domain decomposition for
the Chebyshev collocation method using the
influence matrix method. The solver is validated
for LES of turbulent plane channel flows using
the dynamic Smagorinsky model at Re = 590 and
2003 by comparison to DNS data, and applied for
flat plate boundary layer transition flow for FSTI
= 1.8% and Rex up to 2.2×10^6. The viability of the
temporally developing approach, as opposed to the
commonly used spatially developing approach, is
investigated to reduce the computational cost of
the transition simulations.
The turbulent channel flow study shows that an
increase in the number of MPI subdomains for a
fixed grid (which results in a decrease of
collocation points in each subdomain) results in
loss of solution accuracy, in particular showing
kinks in higher order turbulent statistics at the
domain interfaces. One should expect similar
behavior for (pressure, shear or shock) waves
present in the flow, i.e., higher order features of
the waves would show kinks on coarse grids. The
channel flow results on sufficiently fine grids
compare very well with the DNS data. This
suggests that accurate solutions for turbulent
statistics can be obtained if the number of
collocation points in each subdomain is
sufficiently high. Similar conclusions were drawn
by Sabbah and Pasquetti (1998) for high Re
non-stationary flows. It is estimated that a minimum
of 48 collocation points per subdomain is
required for Re ~ 10^5, and even larger numbers for
Re ~ 10^6. For sufficiently fine grids, the accuracy
of the multi-domain Chebyshev collocation
method is expected to improve with an increase in
the number of subdomains, as the range of
gradients to be resolved within each domain
decreases. Overall, the study validates the
accuracy of the MPI domain decomposition for
Chebyshev polynomial method using the
influence matrix method for turbulent flows.
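The trade-off between the number of subdomains and the resolution per subdomain can be illustrated with the wall-normal grid sizes and process counts listed in Table 1. The sketch below assumes that the Ny wall-normal points are split evenly and that neighbouring subdomains share their interface point, so each subdomain has (Ny - 1)/NProc intervals; this layout is an assumption made for illustration and is not stated explicitly in the text.

```python
def points_per_subdomain(ny, n_sub):
    """Collocation points in each subdomain when ny points are split evenly
    over n_sub subdomains that share their interface points (assumed layout)."""
    intervals, remainder = divmod(ny - 1, n_sub)
    if remainder:
        raise ValueError("ny - 1 must be divisible by the number of subdomains")
    return intervals + 1

# (Ny, NProc) pairs taken from Table 1.
cases = {
    "Ch1-G4": (97, 1),
    "Ch1-G3": (97, 2),
    "Ch1-G2": (97, 4),
    "Ch1-G1": (129, 2),
    "Ch1-G0": (145, 3),
    "Ch2-G3": (257, 4),
    "Ch2-G1": (289, 6),
}

for name, (ny, n_sub) in cases.items():
    print(f"{name}: {points_per_subdomain(ny, n_sub)} points per subdomain")
```

Under this assumption the Ch1-G2 grid has about half the per-subdomain resolution of Ch1-G3, while the finer grids keep 48 or more intervals per subdomain, in line with the estimate given above.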
The MPI domain decomposition using the
influence matrix method shows a super linear
scalability, and the speedup of the Poisson solvers
is almost exponential. In contrast, OpenMP thread
parallelization of FFT shows performance
improvement of only 30% on 6 threads, and
almost 12% of that is due to increased memory
bandwidth. Overall, the solver shows 12× speed-
up on 50 cores.
The flat plate simulation shows that the transition
begins at Rex = 9.2×10^5, overshoots the turbulent
skin friction profile for Rex = 1.42×10^6 to
1.72×10^6, and achieves a fully developed
turbulent region for higher Rex. The transition
trends are consistent with ERCOFTAC
experimental data. The integral boundary layer
parameters are in good agreement with the
analytic profiles, experimental data and DNS in
the laminar and turbulent regions, and show a
steady growth in the transition region. However,
the predictions do not compare very well with the
DNS in the fully developed turbulent region, due to
coarse grid resolution. Overall, the results suggest
that the temporally developing approach may be a
viable alternative for transitional flow
simulations.
6.2 Discussions and future work
In this study, the Chebyshev approximate
equations are solved using the influence matrix
method to attain C1 continuity of the velocity at
domain interfaces. Ideally, C2 continuity is
required for Navier-Stokes solutions, and the
influence matrix method can be extended to
include multiple point overlap to allow such
continuity (Peyret, 2001). However, specification
of higher order continuity is expected to be
susceptible to numerical instabilities (Sabbah and
Pasquetti, 1998), and requires additional numerical
expense. Sabbah and Pasquetti (1998) identified
enforcing mass conservation at the interface using
specification of C0 continuity of pressure and C1
continuity of velocity to be sufficient for turbulent
flows. In spectral element methods described by
Deville et al. (2002) the governing equations are
expressed in variational form and only C0
continuity of velocity is satisfied. Studies using
such codes have shown reasonable results on
sufficiently fine grids. Considering the results
presented in this study and those in the literature,
C1 continuity of velocity is identified to be a
sufficient boundary condition for the influence
matrix method.
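The idea of attaining C1 continuity through an influence matrix can be sketched on a 1D Helmholtz problem, u'' - σu = f, split into two Chebyshev collocation subdomains. The code below is a minimal illustration under stated assumptions, not the solver's implementation: the function names, the interface location m = 0.2, the value σ = 4 and the manufactured solution u = exp(y) are all chosen for the example. Each subdomain is solved twice with Dirichlet data (a particular solution with zero interface value and a homogeneous solution with unit interface value), and the interface value is then chosen so that the jump in the first derivative vanishes. With a single interface the influence matrix reduces to one number.

```python
import numpy as np

def cheb(n):
    """Chebyshev-Gauss-Lobatto points x (from 1 down to -1) and the
    collocation derivative matrix d, so that d @ u approximates du/dx."""
    x = np.cos(np.pi * np.arange(n + 1) / n)
    c = np.hstack([2.0, np.ones(n - 1), 2.0]) * (-1.0) ** np.arange(n + 1)
    dx = x[:, None] - x[None, :]
    d = np.outer(c, 1.0 / c) / (dx + np.eye(n + 1))
    return x, d - np.diag(d.sum(axis=1))

def subdomain(a, b, n, sigma):
    """Nodes, derivative matrix and operator d2/dy2 - sigma on [a, b]."""
    x, d = cheb(n)
    y = 0.5 * (a + b) + 0.5 * (b - a) * x    # y[0] = b, y[-1] = a
    d = d * 2.0 / (b - a)
    return y, d, d @ d - sigma * np.eye(n + 1)

def dirichlet_solve(op, rhs, u_hi, u_lo):
    """Collocation solve of op u = rhs with u[0] = u_hi and u[-1] = u_lo."""
    u = np.zeros(op.shape[0])
    u[0], u[-1] = u_hi, u_lo
    r = rhs[1:-1] - op[1:-1][:, [0, -1]] @ u[[0, -1]]
    u[1:-1] = np.linalg.solve(op[1:-1, 1:-1], r)
    return u

def solve_two_subdomains(f, u_a, u_b, a=-1.0, m=0.2, b=1.0, n=16, sigma=4.0):
    yl, dl, opl = subdomain(a, m, n, sigma)   # left subdomain, interface at yl[0]
    yr, dr, opr = subdomain(m, b, n, sigma)   # right subdomain, interface at yr[-1]
    zero = np.zeros(n + 1)
    # Particular solutions: true forcing and outer BCs, interface value 0.
    pl = dirichlet_solve(opl, f(yl), 0.0, u_a)
    pr = dirichlet_solve(opr, f(yr), u_b, 0.0)
    # Homogeneous solutions: no forcing, zero outer BCs, interface value 1.
    hl = dirichlet_solve(opl, zero, 1.0, 0.0)
    hr = dirichlet_solve(opr, zero, 0.0, 1.0)

    def jump(ul, ur):
        """Jump of the first derivative across the interface."""
        return (dr @ ur)[-1] - (dl @ ul)[0]

    influence = jump(hl, hr)                  # 1x1 influence "matrix"
    lam = -jump(pl, pr) / influence           # interface value giving zero jump
    ul, ur = pl + lam * hl, pr + lam * hr
    return (yl, ul), (yr, ur), jump(ul, ur)

sigma = 4.0
exact = np.exp                                # manufactured solution u = exp(y)
forcing = lambda y: (1.0 - sigma) * np.exp(y)
(yl, ul), (yr, ur), residual_jump = solve_two_subdomains(
    forcing, exact(-1.0), exact(1.0), sigma=sigma)
err = max(np.abs(ul - exact(yl)).max(), np.abs(ur - exact(yr)).max())
print(f"max error = {err:.2e}, derivative jump = {residual_jump:.2e}")
```

For several interfaces the same construction gives a small dense matrix that couples the interface unknowns, and it can be factored once and reused at every time step because it depends only on the operator and the grid.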
Analysis of the vortical structures for the flat plate
case shows that the turbulent spots are formed due
to the merger of near wall low strength elongated
structures into Λ-shaped structures, which grow into
hairpin like structures. Further, both the transition
and turbulent regions show overlapping quasi-streamwise
vortices with alternating tilt and signs
in the buffer layer, and the passage of these
structures leads to the ejection/sweep events that
feed the hairpin vortex generation, similar to
those in the turbulent channel flow. The transition
mechanism predicted in this study agrees most
closely with the conceptual model of Schlatter et al.
(2008).
Future solver performance improvement effort
will focus on MPI domain decomposition along
FFT directions and OpenMP parallelization of the
entire solver. The former would involve data
transposition and transfer for FFT calculations for
the convection and viscous solvers (Xu, 2007).
OpenMP thread parallelization can be used to
improve performance of convection, pressure
projection and viscous subroutines, wherein
different wall normal planes can be solved
simultaneously on threads.
Future work for transition simulations includes:
(a) specification of more appropriate initial
freestream turbulence conditions, and (b)
performance of simulations on finer grids
matching the ERCOFTAC dataset. The
freestream condition generated in this study used
random phase perturbations, which lead to a sharp
decay in turbulence at the beginning. This is
expected, as such methods do not account for the
flow inhomogeneity due to the presence of the
wall (Brandt et al. 2004). Future study will use
freestream turbulence generated using
superposition of modes of the continuous
spectrum of the linearized Orr–Sommerfeld and
Squire operators, which allows accurate decay of
freestream turbulence (Jacobs and Durbin, 2001;
Brandt et al., 2004). In addition, in temporal
simulations spatial decay is based on the velocity
of the reference frame. Since the boundary layer
and the free-stream have different velocities, it is
expected that their decay rates will be out of
phase. Further investigation is required to clarify
the need for transformation/external forcing to
achieve a consistent turbulence decay or growth
of boundary layer (Spalart, 1988).
Future work for the analysis of the transition flow
results will focus on: (a) implementation of better
vortex identification techniques, such as the second
largest eigenvalue λ2 of the tensor
SikSkj + ΩikΩkj, and (b) turbulence kinetic
energy and stress budgets to identify dominant
terms and study the intercomponent energy
transfer mechanism (Kim et al., 2008).
ACKNOWLEDGEMENT
This research was funded by NSF under Grant
No. EPS-0903787 and by NASA under Grant No.
NNX10AN06A.
REFERENCES
1. Akhavan R, Ansari A, Kang S, Mangiavacchi
N (2000). Subgrid-scale interactions in a
numerically simulated planar turbulent jet and
implications for modeling. J. Fluid Mech.
408: 83–120.
2. Anderson E, Bai Z, Bischof C, Demmel J,
Dongarra J, Croz JD, Greenbaum A,
Hammarling S, McKenney A, Ostrouchov S,
Sorensen D (1995). LAPACK Users' Guide. Second Edition, SIAM, Philadelphia, PA.
3. Ansari A (1997). Self-similarity and mixing
characteristics of turbulent mixing layers
starting from laminar initial conditions. Phys.
Fluids 9(6): 1714–28.
4. Baxevanou CA and Fidaros DK. (2008).
Validation of numerical schemes and
turbulence models combinations for transient
flow around airfoil. Engineering Applications
of Computational Fluid Mechanics 2:208-221.
5. Bhushan S, and Warsi, ZUA (2005). Large
eddy simulation of turbulent channel flow
using an algebraic model. International
Journal for Numerical Methods in Fluids, 49:
489-519.
6. Bhushan S, Warsi ZUA, Walters DK (2006).
Estimating backscatter in subgrid scale
turbulence through algebraic modeling. AIAA
Journal 44(4): 837-847.
7. Bhushan S, Warsi ZUA (2007). Large eddy
simulation of free-shear flows using an
algebraic model. Computers and Fluids
36(8): 1384-1397.
8. Borell G, Sillero JA, Jimenez J (2011). Direct
numerical simulation of turbulent boundary
layers at high Reynolds numbers. 23rd
International Conference on Parallel
Computational Fluid Dynamics, May 16-20,
Barcelona, Spain.
9. Brandt L, Schlatter P, Henningson, DS
(2004). Transition in boundary layers subject
to free-stream turbulence. J. Fluid Mech. 517:
167-198.
10. Coupland J (1990). ERCOFTAC Special
Interest Group on Laminar to Turbulent
Transition and Retransition: T3A and T3B
Test Cases. ERCOFTAC.
11. DeGraff DB, Eaton JK (2000). Reynolds-
number scaling of the flat-plate turbulent
boundary layer. J. Fluid Mech. 422: 319-346.
12. Deville MO, Fischer PF and Mund EH
(2002). High-Order Methods for
Incompressible Fluid Flow. Cambridge
University Press.
13. Frigo M, Johnson SG (2011). FFTW for
version 3.3. http://www.fftw.org/fftw3.pdf.
14. Hoyas S, Jimenez J (2006). Scaling of the
velocity fluctuations in turbulent channels up
to Reτ=2003. Phys. Fluids 18: 011702.
15. Jacobs RG, Durbin PA (2001). Simulation of
bypass transition. J. Fluid Mech. 428: 185–
212.
16. Jeong J, Hussain F, Schoppa W, Kim J
(1997). Coherent structures near the wall in a
turbulent channel flow, J. Fluid Mech. 332:
185-214.
17. Kang H, Chester S, Meneveau C. (2003).
Decaying turbulence in an active grid
generated flow and comparisons with large-
eddy simulation. Journal of Fluid Mechanics
480:129–160.
18. Khurajadze G, Oberlack M (2004). DNS and
scaling laws from new symmetry groups of
ZPG turbulent boundary layer flow. Theor.
Comp. Fluid Dyn. 18: 391–441.
19. Kim J, Moin P, Moser R (1987). Turbulence
statistics in fully developed channel flow at
low Reynolds number. J. Fluid Mech. 117:
133-143.
20. Kim K, Sung HJ, Adrian RJ (2008). Effects
of background noise on generating coherent
packets of hairpin vortices. Phys. Fluids 20:
105107.
21. Lilly DK (1992). A proposed modification of
the Germano subgrid-scale closure method,
Phys Fluids 4 (3): 633–634.
22. Malik MR, Hussaini MY (1990). Numerical
simulation of Gortler/Tollmein-Schlichting
wave-interaction. J. Fluid Mech. 210: 183–
199.
23. Minguez M. Pasquetti, R, Serre E. (2008)
High-order LES of flow over the Ahmed
reference body, Phys. Fluids. 20(9): 095101-
095101-17.
24. Minguez M., Brun C, Pasquetti, R, Serre E.
(2011). Experimental and high-order analysis
of the flow in near-wall region of a square
cylinder. Int. J. Heat and Fluid Flow. 32(3):
558-566.
25. Moser RD, Kim J, Mansour NN (1999).
Direct numerical simulation of turbulent
channel flow up to Reτ=590. Physics of Fluids
11: 943-945.
26. Peyret R (2001). Spectral methods for
incompressible viscous flow. Appl Math
Sci.148: 39-98.
27. Raspo I, Ouazzani J, Peyret R (1996). A
spectral multidomain technique for the
computation of the czochralski melt
configuration. Int. J. Num. Meth. Heat Fluid
Flow 6: 31 - 42.
28. Rogallo RS. (1981). Numerical Experiments
in Homogeneous Turbulence. NASA TM-
81315.
29. Sabbah C, Pasquetti R (1998). A divergence-
free multi-domain spectral solver of the
Navier–Stokes equations in geometries of
high aspect ratio. J. Comput. Phys. 139: 359–
379.
30. Schlatter P, Brandt L, deLange HC,
Henningson DS (2008). On streak breakdown
in bypass transition. Phys. Fluids 20: 101505.
31. Schlatter P, Orlu R, Li Q, Brethouwer G,
Fransson J, Johansson AV, Alfredsson PH,
Henningson DS (2009). Turbulent boundary
layers up to Reθ = 2500 studied through
simulation and experiment. Phys. Fluids 21:
051702.
32. Schlatter P, Orlu R (2010). Assessment of
direct numerical simulation data of turbulent
boundary layers. J. Fluid Mech. 659: 116-
126.
33. Singer BA, Joslin RD (1994). Metamorphosis
of a hairpin vortex into a young turbulent
spot. Phys. Fluids 6: 3724–3736.
34. Spalart PR (1988) Direct simulation of a
turbulent boundary layer up to Re = 1410. J.
Fluid Mech. 187: 61–98.
35. Wallace JM, Park GI, Wu X, Moin P (2010).
Boundary layer turbulence in transitional and
developed states. Center for Turbulence
Research Proceedings of the Summer
Program.
36. Walters DK, Cokljat D (2008). A three-
equation eddy-viscosity model for Reynolds -
averaged Navier - Stokes simulations of
transitional flow. ASME J. Fluids Eng. 130:
121401.
37. Warsi ZUA (2006). Fluid Dynamics, Theoretical and Computational Approach
(3rd edn). CRC Press: Boca Raton, FL.
38. Wu X, Moin P (2009). Direct numerical
simulation of turbulence in a nominally zero-
pressure-gradient flat-plate boundary layer. J.
Fluid Mech. 630: 5–41.
39. Xu J (2007). Benchmarks on tera-scalable
models for DNS of turbulent channel flow.
Parallel Comput. 33(12): 780–794.
40. Yakinthos K. (2013). Application of non-
linear eddy-viscosity model involving A2
stress-invariant transport equation to
transitional flows. Engineering Applications
of Computational Fluid Mechanics 7:393-
405.
41. Yang HH, Shizgal B (1994). Chebyshev
pseudo-spectral multi-domain technique for
viscous flow calculation. Computer Methods
in Applied Mechanics and Engineering 118:
47-61.
42. Zaki TA, Durbin PA (2005). Mode
interaction and the bypass route to transition.
J. Fluid Mech. 531: 85–111.