Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
23 commits
Select commit Hold shift + click to select a range
45f2251
Added sample test to check algorithmic speedup of using real inputs
katgonzalez17 Apr 18, 2021
7ff1906
Added toy code as proof of concept of speedup - need to set array to …
katgonzalez17 Apr 19, 2021
b8dcf61
Accidentally had two forward fourier transforms
katgonzalez17 Apr 21, 2021
bb0e5f3
Changed printout format, averaged 5 runs per datapoint to get more co…
katgonzalez17 Apr 21, 2021
ac653f4
Added toy file on top of benchmarking, computes physical to fourier a…
katgonzalez17 May 2, 2021
bd5f48d
First pass at moving to using one-sided fft, seems to work in paravie…
katgonzalez17 May 2, 2021
555aef0
Modified input.data to be consistent with branch benchmarking and tes…
katgonzalez17 May 2, 2021
f61cd36
Using larger grid, finer time step to check correctness
katgonzalez17 May 4, 2021
e2b0215
Modulating input data file to see at what grid size one and two-sided…
katgonzalez17 May 5, 2021
6c75d9a
Found matrix size at which nusselt numbers start to diverge, now writ…
katgonzalez17 May 5, 2021
3740bc2
Moved debug prints to after calls to calc_explicit in imex_rk
katgonzalez17 May 5, 2021
864d422
Added printout of subset of temperature values before calc_explicit(1…
katgonzalez17 May 5, 2021
22ccaf1
Added another line to dump out temperature values prior to any dft ex…
katgonzalez17 May 5, 2021
cfaaf48
Allocating full size arrays for complex, setting var_comp(Nx-i+1) equ…
katgonzalez17 May 6, 2021
c36565b
Switched to using half-size complex arrays, reducing length of for lo…
katgonzalez17 May 6, 2021
598f30c
Added extra subroutine to calculate Nu-like quantity on Ti to examine…
katgonzalez17 May 6, 2021
95e1841
Added intermediate Nu_ti print in calc_explicit to look for source of…
katgonzalez17 May 7, 2021
10dc54a
Removed extraneous writing to debug file, focusing on using nusselt n…
katgonzalez17 May 7, 2021
fdefbed
Added function to calculate nu of arbitrary array, filling in K[X]hat…
katgonzalez17 May 7, 2021
1f43a5c
Moved to using Nx/2+1 for the kx loops in imex_rk, will look again ti…
katgonzalez17 May 7, 2021
378dcbd
Added more printouts
katgonzalez17 May 8, 2021
d4a5dd8
Cleaned up code
katgonzalez17 May 9, 2021
1f6c23e
Added comments
katgonzalez17 Jun 29, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions Makefile
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
FC=gfortran
FFLAGS= -c -O3

LIBFLAGS2 = -L/usr/local/fftw/lib
LDFLAGS = -I/usr/local/fftw/include
LIBFLAGS2 = -L/usr/local/Cellar/fftw/3.3.9/lib
LDFLAGS = -I/usr/local/Cellar/fftw/3.3.9/include

MAIN = Ra_loop
# MAIN = Ra_loop
#MAIN = Ra_loop_no_opt
#MAIN = time_loop
MAIN = time_loop
# MAIN = real_to_comp_set

OBJECTS = fftw.o global.o allocate_vars.o precmod.o stringmod.o write_pack.o interpolation_pack.o mesh_pack.o imod.o bc_setup.o statistics.o time_integrators.o jacobians.o gmres_pack.o nonlinear_solvers.o $(MAIN).o
PROGRAMS = $(MAIN).exe
Expand Down
8 changes: 8 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,14 @@ steady solution for a given $Ra$ and $Pr$. At each $(Ra, Pr)$ pair, steady solut
$L$. The size of the domain (box) in $x$ is given by $L=2\pi/\alpha$ where $\alpha$ is the wavenumber that sets the scale of
the solution in $x$. The solution in box size $L$ that maximizes the Nusselt number is referred to as the optimal solution.

### Branch Overview

This branch was created to leverage the real nature of the physical fields being used. Because these fields (including temperature, velocity) are real, their Fourier transforms can be computed more effeciently through Hermitian conjugate symmetry. This allows for a theoretical 2x speedup of each transform calculation.

Compared to the mainline time integrators code, this branch has a set of real variables and complex variables (ex. there is a real version of the temperature field (physical space) and a complex version (Fourier space)). The real variables have n entries, while the complex ones have n/2 + 1 entries (http://www.fftw.org/fftw3_doc/One_002dDimensional-DFTs-of-Real-Data.html#One_002dDimensional-DFTs-of-Real-Data). The plans created in time_loop.f90 are either explicitly from real to complex or from complex to real. These plans are then used in time_integrators when doing the Fourier transforms.

Note for debugging: In the current iteration of the code, the Nusselt number is consistently the same between the one- and two-sided code versions during the first time step. After, there is a consistent divergence that appears to scale with grid size rather than number of time steps. To aid in debugging the steps between temperature field updates, another method called nusselt_var() has been added to statistics.f90. This takes in an array and computes a nusselt-like number. It omits using h1, h2, and h3 to be solely dependent on the input array.

### Input File Structure
| Line # | Column 1 | Column 2 | Column 3 |
| :------: | :--------: | :--------: | :--------: |
Expand Down
56 changes: 35 additions & 21 deletions allocate_vars.f90
Original file line number Diff line number Diff line change
Expand Up @@ -27,20 +27,27 @@ subroutine global_allocations
allocate(dyv1_B(Nx), dyv1_T(Nx), stat=alloc_err)
allocate(dyv2_B(Nx), dyv2_T(Nx), stat=alloc_err)

allocate(tT (Nx), stat=alloc_err)
allocate(tux(Nx), stat=alloc_err)
allocate(tuy(Nx), stat=alloc_err)
allocate(tnlT(Nx), stat=alloc_err)
allocate(tnlphi(Nx), stat=alloc_err)
allocate(tphi(Nx), stat=alloc_err)
allocate(tT_real(Nx), stat=alloc_err)
allocate(tux_real(Nx), stat=alloc_err)
allocate(tuy_real(Nx), stat=alloc_err)
allocate(tnlT_real(Nx), stat=alloc_err)
allocate(tnlphi_real(Nx), stat=alloc_err)
allocate(tphi_real(Nx), stat=alloc_err)

allocate(tT_comp(Nx/2+1), stat=alloc_err)
allocate(tux_comp(Nx/2+1), stat=alloc_err)
allocate(tuy_comp(Nx/2+1), stat=alloc_err)
allocate(tnlT_comp(Nx/2+1), stat=alloc_err)
allocate(tnlphi_comp(Nx/2+1), stat=alloc_err)
allocate(tphi_comp(Nx/2+1), stat=alloc_err)

allocate(xp(Nx), stat=alloc_err)
allocate(yp(Ny), stat=alloc_err)
allocate(zp(Nz), stat=alloc_err)

allocate(kx(Nx), stat=alloc_err)
allocate(kx(Nx/2+1), stat=alloc_err)
allocate(kz(Nz), stat=alloc_err)
allocate(kx_modes(Nx), stat=alloc_err)
allocate(kx_modes(Nx/2+1), stat=alloc_err)

allocate(dynu(Ny-1), stat=alloc_err)
allocate(g1(Ny), g2(Ny), g3(Ny), stat=alloc_err)
Expand All @@ -51,13 +58,13 @@ subroutine global_allocations
allocate(tmp_K_phi(Ny), tmp_K_T(Ny), stat=alloc_err)
allocate(phii(Ny,Nx), Ti(Ny,Nx), stat=alloc_err)
allocate(uyi(Ny,Nx), uxi(Ny,Nx), stat=alloc_err)
allocate(K1_phi(Ny,Nx), K1_T(Ny,Nx), stat=alloc_err)
allocate(K2_phi(Ny,Nx), K2_T(Ny,Nx), stat=alloc_err)
allocate(K3_phi(Ny,Nx), K3_T(Ny,Nx), stat=alloc_err)
allocate(K1hat_phi(Ny,Nx), K1hat_T(Ny,Nx), stat=alloc_err)
allocate(K2hat_phi(Ny,Nx), K2hat_T(Ny,Nx), stat=alloc_err)
allocate(K3hat_phi(Ny,Nx), K3hat_T(Ny,Nx), stat=alloc_err)
allocate(K4hat_phi(Ny,Nx), K4hat_T(Ny,Nx), stat=alloc_err)
allocate(K1_phi(Ny,Nx/2+1), K1_T(Ny,Nx/2+1), stat=alloc_err)
allocate(K2_phi(Ny,Nx/2+1), K2_T(Ny,Nx/2+1), stat=alloc_err)
allocate(K3_phi(Ny,Nx/2+1), K3_T(Ny,Nx/2+1), stat=alloc_err)
allocate(K1hat_phi(Ny,Nx/2+1), K1hat_T(Ny,Nx/2+1), stat=alloc_err)
allocate(K2hat_phi(Ny,Nx/2+1), K2hat_T(Ny,Nx/2+1), stat=alloc_err)
allocate(K3hat_phi(Ny,Nx/2+1), K3hat_T(Ny,Nx/2+1), stat=alloc_err)
allocate(K4hat_phi(Ny,Nx/2+1), K4hat_T(Ny,Nx/2+1), stat=alloc_err)

if (alloc_err /= 0) then
write(*,*) "ERROR: Global allocations failed."
Expand All @@ -72,12 +79,19 @@ subroutine global_allocations
nlT = (0.0_dp, 0.0_dp)
nlphi = (0.0_dp, 0.0_dp)

tT = (0.0_dp, 0.0_dp)
tux = (0.0_dp, 0.0_dp)
tuy = (0.0_dp, 0.0_dp)
tnlT = (0.0_dp, 0.0_dp)
tnlphi = (0.0_dp, 0.0_dp)
tphi = (0.0_dp, 0.0_dp)
tT_real = 0.0_dp
tux_real = 0.0_dp
tuy_real = 0.0_dp
tnlT_real = 0.0_dp
tnlphi_real = 0.0_dp
tphi_real = 0.0_dp

tT_comp = (0.0_dp, 0.0_dp)
tux_comp = (0.0_dp, 0.0_dp)
tuy_comp = (0.0_dp, 0.0_dp)
tnlT_comp = (0.0_dp, 0.0_dp)
tnlphi_comp = (0.0_dp, 0.0_dp)
tphi_comp = (0.0_dp, 0.0_dp)

phi1 = 0.0_dp
phi2 = 0.0_dp
Expand Down
6 changes: 4 additions & 2 deletions global.f90
Original file line number Diff line number Diff line change
Expand Up @@ -63,8 +63,10 @@ module global
! Allocatable variables
complex(C_DOUBLE_COMPLEX), allocatable :: T(:,:), uy(:,:), phi(:,:), ux(:,:)
complex(C_DOUBLE_COMPLEX), allocatable :: nlT(:,:), nlphi(:,:)
complex(C_DOUBLE_COMPLEX), allocatable :: tT(:), tuy(:), tux(:), tnlT(:), tnlphi(:)
complex(C_DOUBLE_COMPLEX), allocatable :: tphi(:)
complex(C_DOUBLE_COMPLEX), allocatable :: tT_comp(:), tuy_comp(:), tux_comp(:), tnlT_comp(:), tnlphi_comp(:)
complex(C_DOUBLE_COMPLEX), allocatable :: tphi_comp(:)
real(dp), allocatable :: tT_real(:), tuy_real(:), tux_real(:), tnlT_real(:), tnlphi_real(:)
real(dp), allocatable :: tphi_real(:)
complex(C_DOUBLE_COMPLEX), allocatable, dimension(:,:) :: Tptrb

real(dp), allocatable, dimension(:) :: x0, GT
Expand Down
6 changes: 3 additions & 3 deletions input.data
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
7.0, 1.5585, 0.1
5.0e+03, 1, 1.1
5.0, 0.1
0.1, 0.01
-1.0, 1.0
64, 51, 1
1280, 1020, 1
0, 0, 0
1, 0, 1
0, 0, 0
6 changes: 3 additions & 3 deletions nonlinear_solvers.f90
Original file line number Diff line number Diff line change
Expand Up @@ -119,10 +119,10 @@ subroutine flow_map(Nu, iter_nl, iter_gmres_ave)
call unpackx(x0)

! Calculate phi and ux from uy
do jj = 1,Nx
do jj = 1,Nx/2+1
if (kx(jj) /= 0.0_dp) then
tuy = uy(:,jj)
ux(:,jj) = -CI*d1y(tuy)/kx(jj)
tuy_comp = uy(:,jj)
ux(:,jj) = -CI*d1y(tuy_comp)/kx(jj)
else if (kx(jj) == 0.0_dp) then
ux(:,jj) = cmplx(0.0_dp, 0.0_dp, kind=C_DOUBLE_COMPLEX) ! Zero mean flow!
end if
Expand Down
80 changes: 80 additions & 0 deletions re_to_comp_test.f90
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
program re_to_comp_test

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

use global
use strings
use write_pack
use interpolation_pack
use allocate_vars
use statistics
use mesh_pack
use time_integrators

implicit none

! http://www.fftw.org/fftw3_doc/Fortran-Examples.html

integer :: N_points
! go from real -> complex -> real
real(C_DOUBLE), allocatable :: tu_in_real(:), tu_out_real(:)
complex(C_DOUBLE_COMPLEX), allocatable :: tu_comp(:)
real :: start, finish
type(C_PTR) :: planre, planc
integer :: N_point_exp
integer :: iterations
! go from complex->complex->complex
complex(C_DOUBLE_COMPLEX), allocatable :: tu_in_comp(:), tu_out_comp(:)
type(C_PTR) :: planc_1, planc_2

do N_point_exp=20,25
N_points = 2**N_point_exp
print '("N_points_exp = ",i10)',N_point_exp
do iterations=1,5
planre = fftw_plan_dft_r2c_1d(N_points, tu_in_real, tu_comp, FFTW_ESTIMATE);
planc = fftw_plan_dft_c2r_1d(N_points, tu_comp, tu_out_real, FFTW_ESTIMATE);

! Allocate space for each array
allocate(tu_in_real(N_points), stat=alloc_err)
allocate(tu_out_real(N_points), stat=alloc_err)
allocate(tu_comp(N_points/2 + 1), stat=alloc_err)

tu_in_real = (0.0_dp, 0.0_dp)

call cpu_time(start)
call fftw_execute_dft_r2c(planre, tu_in_real, tu_comp)
call fftw_execute_dft_c2r(planc, tu_comp, tu_out_real)
call cpu_time(finish)
print '("1 sided FFTW time = ",f6.3," seconds.")',finish-start

call fftw_destroy_plan(planre)
call fftw_destroy_plan(planc)

deallocate(tu_in_real, tu_out_real)
deallocate(tu_comp)

! Now try the same but with two-sided FFTW

planc_1 = fftw_plan_dft_1d(N_points, tu_in_comp, tu_out_comp, FFTW_FORWARD, FFTW_ESTIMATE);
planc_2 = fftw_plan_dft_1d(N_points, tu_out_comp, tu_in_comp, FFTW_BACKWARD, FFTW_ESTIMATE);

allocate(tu_in_comp(N_points), stat=alloc_err)
allocate(tu_out_comp(N_points), stat=alloc_err)

tu_in_comp = (0.0_dp, 0.0_dp)

call cpu_time(start)
call fftw_execute_dft(planc_1, tu_in_comp, tu_out_comp)
call fftw_execute_dft(planc_2, tu_out_comp, tu_in_comp)
call cpu_time(finish)
print '("2 sided FFTW time = ",f6.3," seconds.")',finish-start

call fftw_destroy_plan(planc_1)
call fftw_destroy_plan(planc_2)

deallocate(tu_in_comp, tu_out_comp)
end do
end do

end program re_to_comp_test
95 changes: 95 additions & 0 deletions real_to_comp_set.f90
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
program re_to_comp_set

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

use global
use strings
use write_pack
use interpolation_pack
use allocate_vars
use statistics
use mesh_pack
use time_integrators

implicit none

! http://www.fftw.org/fftw3_doc/Fortran-Examples.html

integer :: N_points
! go from real -> complex -> real
real(C_DOUBLE), allocatable :: tu_in_real(:), tu_out_real(:)
complex(C_DOUBLE_COMPLEX), allocatable :: tu_comp(:)
real :: start, finish
type(C_PTR) :: planre, planc
integer :: N_point_exp
integer :: set_iter
real(C_DOUBLE) :: x_test
! go from complex->complex->complex
complex(C_DOUBLE_COMPLEX), allocatable :: tu_in_comp(:), tu_out_comp(:)
type(C_PTR) :: planc_1, planc_2

N_point_exp=3
N_points = 2**N_point_exp
print '("N_points_exp = ",i10)',N_point_exp

! Allocate space for each array
allocate(tu_in_real(N_points), stat=alloc_err)
allocate(tu_out_real(N_points), stat=alloc_err)
allocate(tu_comp(N_points/2+1), stat=alloc_err)

! set the values in the array
x_test = 0.1_dp
do set_iter=0,N_points
tu_in_real(set_iter) = sin(x_test)
x_test = x_test + 0.2_dp
end do

planre = fftw_plan_dft_r2c_1d(N_points, tu_in_real, tu_comp, FFTW_ESTIMATE);
planc = fftw_plan_dft_c2r_1d(N_points, tu_comp, tu_out_real, FFTW_ESTIMATE);

call cpu_time(start)
call fftw_execute_dft_r2c(planre, tu_in_real, tu_comp)
call fftw_execute_dft_c2r(planc, tu_comp, tu_out_real)
call cpu_time(finish)
print '("1 sided FFTW time = ",f6.3," seconds.")',finish-start

! Print the real_in, complex, real_out
print '("tu_in_real array: ")'
do set_iter=0,N_points
print *, tu_in_real(set_iter)
end do

print '("tu_real_out array: ")'
do set_iter=0,N_points
print *, tu_out_real(set_iter)/N_points
end do

call fftw_destroy_plan(planre)
call fftw_destroy_plan(planc)

deallocate(tu_in_real, tu_out_real)
deallocate(tu_comp)

! Now try the same but with two-sided FFTW

planc_1 = fftw_plan_dft_1d(N_points, tu_in_comp, tu_out_comp, FFTW_FORWARD, FFTW_ESTIMATE);
planc_2 = fftw_plan_dft_1d(N_points, tu_out_comp, tu_in_comp, FFTW_BACKWARD, FFTW_ESTIMATE);

allocate(tu_in_comp(N_points), stat=alloc_err)
allocate(tu_out_comp(N_points), stat=alloc_err)

tu_in_comp = (0.0_dp, 0.0_dp)

call cpu_time(start)
call fftw_execute_dft(planc_1, tu_in_comp, tu_out_comp)
call fftw_execute_dft(planc_2, tu_out_comp, tu_in_comp)
call cpu_time(finish)
print '("2 sided FFTW time = ",f6.3," seconds.")',finish-start

call fftw_destroy_plan(planc_1)
call fftw_destroy_plan(planc_2)

deallocate(tu_in_comp, tu_out_comp)

end program re_to_comp_set
35 changes: 35 additions & 0 deletions statistics.f90
Original file line number Diff line number Diff line change
Expand Up @@ -127,4 +127,39 @@ subroutine nusselt(Nuave, in_fourier)

end subroutine nusselt

! Subroutine added to calculate "nusselt-like" number of any given array

subroutine nusselt_var(Nuave, comp_array, in_fourier)

use global
use write_pack

implicit none

real(dp), intent(out) :: Nuave
complex(dp), intent(in) :: comp_array(:,:)
logical, intent(in) :: in_fourier
real(dp) :: dyT_wall
integer :: ii

write(*,*) "Computing Nu of input array..."


! Compute Nu at walls!
if (in_fourier) then
dyT_wall = real(comp_array(1,1)) + real(comp_array(2,1)) + real(comp_array(3,1))
Nuave = -dyT_wall
else
Nuave = 0.0_dp
do ii = 1,Nx
! Compute temperature gradients at the walls
dyT_wall = real(comp_array(1,ii)) + real(comp_array(2,ii)) + real(comp_array(3,ii))
Nuave = Nuave - dyT_wall ! Nu = -dy(T)
end do
Nuave = Nuave / Nx
end if
write(*,*) "Nu of input = ", Nuave

end subroutine nusselt_var

end module statistics
Loading