\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{relsize}
\usepackage{float}
\usepackage{longtable}
\usepackage[dvipsnames]{xcolor}
\usepackage{graphicx} %for pictures
\usepackage{eufrak} %for fancy neighborhood n
\usepackage{indentfirst} %to indent the first paragraph after a title
\usepackage{enumitem} %itemize spacing options
\usepackage{textcomp} %for tildesssssss?????!!!
\usepackage{nameref} %for referencing unnumbered sections by name
\usepackage[section]{placeins} %float barriers
\setlist[itemize]{nosep, leftmargin=.5cm, topsep=0pt, partopsep=0pt}
\setlist[enumerate]{leftmargin=.5cm}

\title{So You Want to do Materials Research:\\[0.02em]\smaller{}a
guide to aBuild and the skills you need to use it}
\author{Lydia Harris and Eli Harris}

\begin{document}
\maketitle
\begin{abstract}
 The Python module aBuild is meant to automate the process of
 building an MTP model for a particular system. The documents presented
 in this folder/book are meant to be a reference guide to bash, git,
 aBuild, etc. Disclaimer: we wrote this to help ourselves remember
 these things, and as a favor to Brother Nelson. We reserve the right
 for some of it to be incomplete or confusing at times. If you have
 questions, don't be afraid to ask us or Brother Nelson.
\end{abstract}

\section{Getting Started}
To do computational research you will need to learn what the command
line is and some bash commands (see \nameref{sec:bash}). Go
learn your commands; it will make your life easier!

Now that you know the basics of the command line environment, it is
time to use those skills to log into the supercomputer. If you have
not yet made an account, go to marylou.byu.edu and click on the link
to request an account. Your faculty mentor (most likely Brother Nelson)
will need to approve your account. The whole process may take a couple
of days. Once you have an account, if you are using Mac or Linux, simply
open the terminal and use the \verb|ssh| command (see
\nameref{sec:bash}). If you have a Windows machine you have several
options: you can use the Ubuntu app, Windows PowerShell, Windows
Terminal, etc.

\subsection{Make directories}
Once logged into the supercomputer you will need to build the
following directories in your home directory, \verb|~| (see
\nameref{sec:bash}):
\begin{itemize}
  \item{\verb|~/codes/aBuild|}
  \item{\verb|~/bin|}
  \item{\verb|~/environments|}
  \item{\verb|~/system-species| (e.g. \verb|~/AgPt|)}
\end{itemize}

If you don't know what system you're gonna study, talk with Brother
Nelson before you make your system-species folder. He'll help you find
one to study.

\subsection{Download aBuild}
Now it is time to download aBuild. The Python module aBuild is meant
to automate the process of building an MTP model for a particular
system. The module is continuously undergoing improvements, so we use
git for version control. If you are unfamiliar with git, go read
\nameref{sec:github}. Git is an extremely helpful tool that is used
across many different organizations. It's a skill you should add to
your resume once you are familiar with it. Once you've learned how to,
fork/download aBuild from https://github.com/lancejnelson/aBuild (see
Table~\ref{git} for a quick refresher).

\subsection{Virtual Environment/bash\_profile}
A convenient way to install and run software is in a virtual
environment. Because aBuild is a software package, it needs to be
installed for you to be able to run it. We will install it in a python
virtual environment. This is how you will create a virtual environment
and install aBuild:

\begin{center}
  \begin{longtable}{||p{7cm}|p{4cm}||} %longtable so it wraps %
    % pages
    \caption{Making a virtual environment and installing aBuild}
    \label{bashcommands}
    \\ \hline
    \textbf{Command} & \textbf{Description}\\ \hline \hline
    \endhead
    % make the previous page footer
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %make the last footer
    \hline
    \endlastfoot
    %contents of the table
    \verb|cd ~/environments| & go into that directory \\
    \verb|python -m venv name_of_env| & make the environment\\
    \verb|emacs ~/.bash_profile| & edit your .bash\_profile \\
    \verb|function workon {| & add these lines to it: \\
    \verb|source ~/environments/$1/bin/activate| & \$1 refers to
                                                   the name of \\
    \verb|}| & your virtual environment\\
    ctrl$+$x ctrl$+$c y & save and quit the editor\\
    \verb|source ~/.bash_profile| & reload your .bash\_profile \\
    \verb|workon name_of_env| & enters your environment \\
    \verb|pip install -e ~/codes/aBuild| & install
      aBuild in your virtual environment \\
    \verb|deactivate| & exits your environment \\
  \end{longtable}
\end{center}

Since we used the .bash\_profile to make the virtual environment work,
I'll talk a little more about that now. Your .bash\_profile holds the
settings that are applied each time you log in to the
supercomputer. You can use it to load software modules, change
settings, and a lot of other things. Some helpful additions to your
.bash\_profile are given here (actually I basically just copied and
pasted my entire .bash\_profile). These were the current software
modules on Marylou in Dec 2018, so some of the versions are probably
out of date by now.

\vspace{5mm}
%\noindent
\begin{itemize}
  \item{\verb|export PATH=$PATH:~/bin|}
  \item{\verb|PS1="\u:\w\$ "|}
  \item{\verb|module load libfabric|}
  \item{\verb|module purge|}
  \item{\verb|module load compiler_intel/13.0.1|}
  \item{\verb|module load gdb/7.9.1|}
  \item{\verb|module load gcc/6.4|}
  \item{\verb|module load python/3.6|}
  \item{\verb|export MAKESTRX=~/bin/makestr.x|}
  \item{\verb|export GETKPTS=~/bin/getKPoints|}
  \item{\verb|export ENUMX=~/bin/enum.x|}
  \item{\verb|export F90=ifort|}
  \item{\verb|export HISTSIZE=100000|}
  \item{\verb|set completion-ignore-case on|}
  \item{\verb|function workon {|\\
      \verb|  source ~/environments/$1/bin/activate|\\
      \verb|}| }
  \item{\verb+alias hh='history | grep '+}
\end{itemize}

\subsection{Copy other software packages}
You'll need enum.x (.x means it's an executable file), makestr.x, and
getKPoints. There may be copies of these in the group folder that you
can copy to your \verb|~/bin|. If there are not, ask Brother Nelson for
help.

\section{Run aBuild: Training Process}
Now that you have your account all set up, you are ready to start
training the model to your specific system. This section will guide
you through the different commands needed to train the model.

Your folder will need builder.py and a yaml file in it to run any of
these commands. Copy these from
\verb|~/codes/aBuild/aBuild/templates/master.yml| and
\verb|~/codes/aBuild/aBuild/scripts/builder.py|. Name your yaml
something intuitive (e.g. if you're studying the silver-gold system,
maybe name it AgAu.yml). Now you'll need to edit the yaml to
make sure everything in there matches your system (such as the title,
species, root, potcar directory, potcar versions, potcar setups,
mindistance, concs, nconfigs, sizes, etc.). If you need help
deciphering the yaml, go see the example.yml file in the
aBuildReferenceGuide repository on GitHub. It has a bunch of comments to
help you understand what's going on, although you don't really need to
understand what's going on to get started.

Now you can start building your model. You can see the steps in
Table~\ref{algorithm} below. aBuild commands start with
``\verb|python builder.py **YML**|'' and then some tag(s). For example,
if your yaml is AgAu.yml, the first step is
\verb|python builder.py AgAu -enum|. This prefix is only used for the
steps that begin with a `-' in the table.

\begin{center}
  \begin{longtable}{||p{4cm}|p{7cm}||} %longtable so it wraps %
    % pages
    \caption{Algorithm steps and their descriptions.}
    \label{algorithm}
    \\ \hline
    \textbf{Step} & \textbf{Description}\\ \hline \hline
    \endhead
    % make the previous page footer
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %make the last footer
    \hline
    \endlastfoot
    %contents of the table
    -enum & enumerates the crystalline structures up to sizes specified in your
            yaml \textcolor{red}{--run interactively}\\
    -setup\_relax & Builds to-relax.cfg and relax.ini; runs calc-grade
                    \textcolor{red}{--run interactively}\\
    qsub* jobscript\_relax.sh & mlp relax: needs to-relax.cfg, pot.mtp, and
                                relax.ini; generates: relaxed.cfg,
                                unrelaxed.cfg, and candidates.cfg \textcolor{red}{--job
                    submission, parallel, 10-30 cores, 6-30 hrs}\\
    -setup\_select\_add & Concatenates all of the candidate.cfg\_\#,
                          selection.log\_\#, relaxed.cfg\_\# and
                          unrelaxed.cfg\_\# into one file
                          each. relaxed.cfg file should get bigger and
                          bigger with each iteration. Also builds a
                          submission script. \textcolor{red}{--run interactively}\\
    qsub* jobscript\_select.sh & mlp select-add: generates:
                              new\_training.cfg; needs: train.cfg,
                              candidate.cfg \textcolor{red}{--job
                              submission, single core, 1-4 hrs}\\
    -add & builds A folders in training set and creates jobscript \textcolor{red}{--run interactively}\\
    qsub* jobscript\_vasp.sh & runs VASP calculations for
                                           the selected configurations
                                            \textcolor{red}{--array
                                            job, 6-30 hrs}\\
    -setup\_train & Pulls data from VASP folders, builds
                    train.cfg and pot.mtp \textcolor{red}{--run interactively} \\
    qsub* jobscript\_train.sh & mlp train: needs train.cfg, pot.mtp;
                                generates: Potential.mtp \textcolor{red}{--job
                    submission, parallel, 10-20 cores, 6-12 hrs}\\
    %go back to step -setup\_train & Repeat until model is fully
    %                                trained, i.e. all structures relax.\\
    go back to step -setup\_relax & Repeat until model is fully
                                    trained, i.e. all structures relax.\\
  \end{longtable}
\end{center}
\noindent
*For Marylou, use \verb|sbatch| instead of \verb|qsub|.\\
\verb|**YML**| is the name of the yaml file without the .yml extension.

The training process will begin with an empty training set. This will
cause the relaxation to terminate for each structure on the first
iteration. It will take several iterations before the model is able to
relax all the configurations.

The mlp relax step tries to relax the structures in the to-relax
set. If the potential has to extrapolate too much for a structure, it
stops relaxing that structure and adds it to a preselected set.

The mlp select-add step chooses, from the structures in candidate.cfg,
the ones that best fill the ``missing'' parts of configuration space.

The mlp train step tries to fit a ``line'' to the training data it is
given. I say ``line'' and not line because it is a nonlinear problem,
but if it helps you to think of it like a line, do that.

\subsection{Other helpful MTP/aBuild commands}
These might come in handy at some point.
\begin{center}
  \begin{longtable}{||p{4cm}|p{7cm}||} %longtable so it wraps %
    % pages
    \caption{aBuild commands and description}
    \label{bashcommands}
    \\ \hline
    \textbf{Command} & \textbf{Description}\\ \hline \hline
    \endhead
    % make the previous page footer
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %make the last footer
    \hline
    \endlastfoot
    % contents of the table
    -status & prints a status report for VASP calculations \\
    -report & creates data report file from completed VASP
                           calculations \\
    -report -file path/to/file & creates data report file from the
                           configurations contained in the file specified \\
    -chull -file path/to/file & creates a convex hull from the data
                           report file specified \\
    mlp mindist file.cfg & prints the global minimum distance between
                           atoms in a .cfg file. Also
                           adds the mindist attribute to structures in
                           the .cfg file\\
    mlp calc-grade pot.mtp train.cfg train.cfg temp.cfg & creates
                           state.mvs, a file needed for relaxation \\

  \end{longtable}
\end{center}

\section{Theory}
In this section we discuss what the algorithms are doing or the
theory behind the computations. You don't really need to know any of
this to get started working on this project, but if you're curious,
(as you should be at some point, you're a scientist, after all) here's
some info for you.

The main idea is to create an MTP model from VASP calculations. The
VASP calculations are ab initio (first-principles) calculations that
use DFT and/or DFT+U depending on the specific system. The MTP is then
trained by optimization, fitting the coefficients of the basis
functions (see Section~\ref{sec:MTP} for more details). After the
model has been trained, it attempts to relax the atoms to their happy
place. If it cannot relax the atoms and still accurately predict the
energy, forces, and stresses, the model then selects more
configurations to add to the training set. These configurations are
then evaluated using VASP, and the cycle continues until the model is
able to relax all of the configurations.

The motivation for training an MTP model comes from the bottleneck
caused by VASP calculations. In searching for configurations to create the
convex hull, the configuration space for varying concentrations is so vast
that using DFT calculations for all of it becomes impractical because
of the amount of time the calculations take. By using a trained MTP
model we are able to explore much more of the configuration space in
significantly less time. This allows us to potentially discover new
configurations of a particular system.

DISCLAIMER: Some of this gets hard to understand. We hope that this
document will be a nice introduction without too much scary stuff, but
with that being said, don't go scaring yourself off by jumping into all
of this too early. Each section has a statement that says ``stop
reading here if you're not ready.'' Listen to these statements.


\subsection{MTP} \label{sec:MTP}
MTP is an acronym for moment tensor potential. It is a basis
expansion, and it involves a bunch of tensors. For the purpose of
doing this research, you don't really need to know a whole lot about
it, except for two things. First, it's a way to represent crystalline
structures that is systematically improvable (meaning you can add more
basis functions to get a better and better representation of the
crystal), unlike classical potentials. Second, it is an off-lattice
model (it isn't confined to some parent structure; the atoms can be
located anywhere with respect to each other), unlike cluster
expansion. Basically it's a brand spanking new (published just last
year), way better basis for crystal configurations than the previous
methods (classical potentials and cluster expansion, mentioned
earlier). Figure~\ref{fig:2d} is a good cartoon visualization of what
we're trying to accomplish with this model.

\begin{figure}[h]
  \centering
  \includegraphics[scale = .4]{configVsEnergy}
  \caption{ Simple 2-dimensional visualization of configuration
    space vs. energy with a best fit ``line''. Each configuration is on
    the x-axis with its corresponding energy on the y-axis. In
    reality this graph would be $(N+1)$-dimensional, where $N$ is the
    number of basis functions, $B_{\alpha} \left(\mathfrak{n} \right)
    $.}
  \label{fig:2d}
\end{figure}

This is your quitting point, but in case the previous paragraph isn't
enough for you, here's a quick summary of the MTP basis, with a little
math, but not very much explanation of the math. Sorry about that. In
this github repository we will also include Bro. Nelson's report that
has a really good explanation of the MTP basis, including a sample
problem that can help you understand it if you want to.

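The moment tensor potential (MTP) writes the energy contribution $V$
of an atomic neighborhood $\mathfrak{n}$ as a linear combination of
basis functions $B_{\alpha}$ with coefficients $\xi_{\alpha}$: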
\begin{equation} \label{eq:basis}
  V \left(\mathfrak{n} \right) = \sum\limits_{\alpha}
  \xi_{\alpha}B_{\alpha} \left(\mathfrak{n} \right)
\end{equation}

The MTP basis can be used to represent crystalline configuration
space. A crystalline structure can be evaluated on this basis to an
arbitrary precision. A simple analogy to this kind of evaluation is
the Fourier transform, where higher and higher frequencies can be
added to make a better fit to any function.

The basis functions $B_{\alpha} \left(\mathfrak{n} \right) $ of the
MTP basis depend on the set of moment tensor descriptors:
\begin{equation} \label{eq:descriptors}
  M_{\mu,\nu}\left(\mathfrak{n}_i\right) = \sum\limits_{j} f_{\mu}
  \left( \left| \mathbf{r}_{ij} \right| ,z_i,z_j \right)
  \underbrace{\mathbf{r}_{ij}\otimes\cdots\otimes\mathbf{r}_{ij}}_{\nu
    \ \mathrm{times}}
\end{equation}
These descriptors depend on the immediate neighborhood,
$\mathfrak{n}_i$, of the $i$th atom, which contains every atom within
some cutoff radius $R_{cut}$, as shown
in Figure~\ref{fig:neighborhood}.  Each atom in the neighborhood of
the $i$th atom introduces four degrees of freedom to the energy
contribution, $V_i$, and the total energy, $E$, is the sum of the
contributions $V_i$. The four degrees of freedom are the three
Euclidean components of the separation vector between atoms $i$ and
$j$, $\mathbf{r}_{ij}$, and a discrete variable, $z_j$, that represents the
species of the neighboring atom. %These moment tensor descriptors are
%tensors of rank $J$, where $J$ is the number of atoms in the
%neighborhood.
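
For small $\nu$ the tensor product in Equation \ref{eq:descriptors} can
be written out explicitly, which may make the notation less
intimidating. This is just the definition unpacked, with nothing new
assumed:
\begin{equation}
  M_{\mu,0} = \sum\limits_{j} f_{\mu}, \qquad
  M_{\mu,1} = \sum\limits_{j} f_{\mu}\, \mathbf{r}_{ij}, \qquad
  M_{\mu,2} = \sum\limits_{j} f_{\mu}\, \mathbf{r}_{ij} \otimes
  \mathbf{r}_{ij}
\end{equation}
Here $f_{\mu}$ is shorthand for $f_{\mu}\left( \left| \mathbf{r}_{ij}
\right| ,z_i,z_j \right)$. So $M_{\mu,0}$ is a single number,
$M_{\mu,1}$ is a vector with three components, and $M_{\mu,2}$ is a
$3 \times 3$ matrix whose $(a,b)$ entry is $\sum_{j} f_{\mu}\,
r_{ij,a}\, r_{ij,b}$. In general $M_{\mu,\nu}$ has $3^{\nu}$ components.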

\begin{figure}[h]
  \centering
  \includegraphics[scale=1.2]{neighborhood.jpg}
  \caption{The $i$th atom's neighborhood is made up of each atom
    within some $R_{cut}$ of itself. The total energy $E$ is made up of
    the contributions from individual neighborhoods. The energy
    contribution, $V_i$, of neighborhood $\mathfrak{n}_i$ depends on
    the separation between atoms $i$ and $j$, $\mathbf{r}_{ij}$, and a discrete
    variable, $z_j$, that represents the species of the atom in the
    neighborhood (I or II in this illustration)
    \cite{gubaev2019accelerating}. }
  \label{fig:neighborhood}
\end{figure}

The $f_{\mu}\left( \left| \mathbf{r}_{ij} \right| ,z_i,z_j
\right)$ term in Equation \ref{eq:descriptors} is a radial function of
the interatomic distance $\rho = \left| \mathbf{r}_{ij} \right|$, given by:

\begin{equation} \label{eq:someequation}
  f_{\mu} \left( \rho ,z_i,z_j \right) =
  \sum\limits_{k}c_{\mu,z_i,z_j}^{\left( k \right)}Q^{\left( k \right)} \left(
    \rho
  \right)
\end{equation}
where
\begin{equation}  \label{eq:cheby}
  Q^{\left(k\right)} \left( \rho \right) =
  T_k\left( \rho \right)\left( R_{cut} - \rho
  \right)^2
\end{equation}
In Equation \ref{eq:cheby}, $T_k\left( \rho \right)$ are the
Chebyshev polynomials on the interval $\left[ R_{min},R_{cut}\right]$.
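
In case it helps, here are the first few Chebyshev polynomials written
out, with $\rho = \left| \mathbf{r}_{ij} \right|$. The recurrence is
standard; the change of variable that maps
$\left[ R_{min},R_{cut}\right]$ onto $\left[ -1,1 \right]$ is our
assumption about how ``on the interval'' is meant, so treat it as a
sketch:
\begin{equation}
  T_0 ( x ) = 1, \qquad T_1 ( x ) = x, \qquad
  T_{k+1} ( x ) = 2 x\, T_k ( x ) - T_{k-1} ( x )
\end{equation}
\begin{equation}
  x = \frac{2 \rho - \left( R_{min} + R_{cut} \right)}{R_{cut} - R_{min}}
\end{equation}
For example, $T_2 ( x ) = 2x^2 - 1$ and $T_3 ( x ) = 4x^3 - 3x$. The
factor $\left( R_{cut} - \rho \right)^2$ in Equation \ref{eq:cheby} makes
both $Q^{\left( k \right)}$ and its first derivative go to zero at
$\rho = R_{cut}$, so an atom crossing the edge of the neighborhood does
not cause a jump in the energy or the forces.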

The $\mathbf{r}_{ij}\otimes\cdots\otimes\mathbf{r}_{ij}$ terms in
Equation \ref{eq:descriptors} contain angular information about the
neighborhood $\mathfrak{n}_i$ and are tensors of rank $\nu$.
The basis functions $B_{\alpha} \left(\mathfrak{n} \right) $ are made
up of all possible contractions of any number of
$M_{\mu,\nu}\left(\mathfrak{n}_i\right)$ that result in a scalar. The
maximum level of these contractions is chosen in advance, with more
levels providing more accuracy. This attribute makes the basis a
systematically improvable functional form, similar to including more
frequencies in a Fourier transformation.
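
To make ``contractions that result in a scalar'' a little more
concrete, here are a few of the simplest ones, using the fact that
$M_{\mu,0}$ is a scalar, $M_{\mu,1}$ is a vector, and $M_{\mu,2}$ is a
matrix. This is only an illustrative handful, not the full list the
code generates:
\begin{equation}
  M_{\mu,0}, \qquad
  M_{\mu_1,1} \cdot M_{\mu_2,1}, \qquad
  M_{\mu_1,2} : M_{\mu_2,2}, \qquad
  M_{\mu_1,1}^T \, M_{\mu_2,2} \, M_{\mu_3,1}
\end{equation}
The dot is the ordinary dot product of two vectors, the colon is the
sum over all entries of the element-wise product of two matrices, and
the last one is a vector-matrix-vector product. Each one uses up every
tensor index, so the result is a plain number that does not change
when the whole neighborhood is rotated.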

The $\xi_{\alpha}$ and $c_{\mu,z_i,z_j}^{\left( k \right)}$ terms in
Equations \ref{eq:basis} and \ref{eq:someequation} are parameters that must
be optimized. A quasi-Newton optimization technique is used to fit
these to the data provided by the training set.  The configuration
space created by these basis functions is $N$ dimensional, where $N$
is the number of basis functions the crystal was evaluated on. The
optimization can be visualized in 2 dimensions, ``configuration'' vs
energy, shown in Figure \ref{fig:2d}, where the appropriate terms are
chosen to minimize the error of a best fit ``line'' to the training
data.
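
As a rough sketch of what ``fit these to the data'' means, suppose the
training set has $S$ structures and we only fit energies. The
symbols $S$, $E_s^{\mathrm{MTP}}$, and $E_s^{\mathrm{DFT}}$ are our own
labels for this illustration, and the fit can also include forces and
stresses with their own weights, which we leave out here:
\begin{equation}
  E_s^{\mathrm{MTP}} = \sum\limits_{i} V \left( \mathfrak{n}_i \right),
  \qquad
  \min_{\xi, c} \; \sum\limits_{s=1}^{S} \left( E_s^{\mathrm{MTP}} -
    E_s^{\mathrm{DFT}} \right)^2
\end{equation}
The sum over $i$ runs over the atoms in structure $s$. Notice that
$V$ is linear in the $\xi_{\alpha}$ (Equation \ref{eq:basis}), but the
$c_{\mu,z_i,z_j}^{\left( k \right)}$ sit inside the $B_{\alpha}$, which
can multiply several $M_{\mu,\nu}$ together. That is why the problem as
a whole is nonlinear and needs an iterative optimizer.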


If you would like to know more, see ref
\cite{gubaev2019accelerating}. There are several other less
understandable (for undergraduates) papers that talk about the basis,
in refs \cite{podryabinkin2017active,gubaev2018machine,
  shapeev2016moment}.


\subsection{DFT}
In quantum mechanics, the Schr\"{o}dinger equation must be solved to
find the energy of a system. Because of the size of many-body
problems, it is practically impossible to solve the Schr\"{o}dinger
equation exactly for the system. This leads scientists to an
approximation called density functional theory (DFT). It is an ab initio
calculation, or a first-principles calculation, if you've ever heard
those terms before. Density functional theory is based on the
assertion that the ground state energy of a system is a unique
functional (function of a function) of the electron density (which is
a function) \cite{PhysRev.136.B864}.

This is your quitting point. If you'd like to know more, go ahead and
keep on reading!

In the energy functional mentioned earlier, written in the form used
by the Kohn-Sham equation \cite{kohn1965self}, every term can be known
exactly except the exchange-correlation (XC) functional.
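
For reference, the standard way this functional is split up is written
below. We have not introduced these symbols elsewhere in this guide, so
take them as labels for these equations only: $n ( \mathbf{r} )$ is the
electron density, $T_s$ is the kinetic energy of non-interacting
electrons, $v_{ext}$ is the potential from the nuclei, $E_H$ is the
classical electron-electron (Hartree) repulsion, and $E_{xc}$ is the
XC functional:
\begin{equation}
  E \left[ n \right] = T_s \left[ n \right]
  + \int v_{ext} ( \mathbf{r} )\, n ( \mathbf{r} )\, d\mathbf{r}
  + E_H \left[ n \right] + E_{xc} \left[ n \right]
\end{equation}
\begin{equation}
  E_H \left[ n \right] = \frac{1}{2} \int \!\! \int
  \frac{n ( \mathbf{r} )\, n ( \mathbf{r}' )}{\left| \mathbf{r} -
      \mathbf{r}' \right|}\, d\mathbf{r}\, d\mathbf{r}'
\end{equation}
The Hartree term is written in atomic units. Everything that is hard
about the interacting electrons gets swept into $E_{xc}$, which is why
the choice of approximation for it matters.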

The XC functional can be approximated in many different ways. One is
the Local Density Approximation (LDA), which assumes the
electronic density behaves locally like a homogeneous electron
gas. Another is the Generalized Gradient Approximation (GGA),
which is similar to LDA but also includes the local gradient of the
density. There are also other methods derived from these two basic
ones.
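
Written as formulas, the two approximations look like this. These are
the textbook forms, with $n ( \mathbf{r} )$ the electron density and
$\epsilon_{xc}$ the XC energy per electron of a homogeneous electron
gas; the function $f$ stands for whichever particular GGA is chosen:
\begin{equation}
  E_{xc}^{\mathrm{LDA}} \left[ n \right] = \int n ( \mathbf{r} )\,
  \epsilon_{xc} \left( n ( \mathbf{r} ) \right) d\mathbf{r}, \qquad
  E_{xc}^{\mathrm{GGA}} \left[ n \right] = \int f \left( n ( \mathbf{r} ),
    \nabla n ( \mathbf{r} ) \right) d\mathbf{r}
\end{equation}
In LDA the integrand at a point only knows the density at that point.
In GGA it also knows how fast the density is changing there.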

Because the Kohn-Sham equation depends on the electronic density and
is also used to find the density, it is solved iteratively until the
density is self-consistent, which then gives the energy of the
system.  DFT calculations are done in k-space, and sample points in
the first Brillouin zone called k-points are chosen and appropriately
weighted to replace the integrals in the Kohn-Sham equation. These are
specified in the KPOINTS files in your VASP folders.
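
In symbols, the replacement of an integral by a weighted sum looks
like the following, where $g ( \mathbf{k} )$ is a stand-in for
whatever quantity is being integrated, $\Omega_{BZ}$ is the volume of
the first Brillouin zone, and $w_{\mathbf{k}}$ are the weights (these
symbols are ours, for illustration only):
\begin{equation}
  \frac{1}{\Omega_{BZ}} \int_{BZ} g ( \mathbf{k} )\, d\mathbf{k}
  \approx \sum\limits_{\mathbf{k}} w_{\mathbf{k}}\, g ( \mathbf{k} ),
  \qquad \sum\limits_{\mathbf{k}} w_{\mathbf{k}} = 1
\end{equation}
More k-points give a better approximation of the integral but cost
more computer time.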

\begin{figure}
  \includegraphics[scale = .5]{PAW}
  \caption{Comparison of atomic wave functions of Mn using the PAW
    method (solid line) with the exact result (bullets) for a given
    energy and angular momentum. Shown also are their differences
    magnified by a factor of 10 (dash-dotted line), and their pseudo
    wave functions (dashed line) \cite{PhysRevB.50.17953}. }
  \label{fig:MnPAW}
\end{figure}

The method of pseudopotentials is often used to reduce the
computational load of DFT calculations. This method approximates the
inner electrons as a ``frozen core'' that places the electrons
surrounding it in an effective potential. Two common methods of
this approximation are the projector augmented-wave (PAW) method and
the ultrasoft pseudopotential (USPP) method. The PAW method allows
all-electron orbitals to be reconstructed from the pseudo-orbitals,
as shown in Figure \ref{fig:MnPAW}.

VASP stands for the Vienna Ab initio Simulation Package
\cite{kresse1996software}. This is the package we use to do DFT. VASP
requires the user to choose a method for approximating the XC
functional. VASP also prefers the use of the PAW method, but a
specific PAW potential must be chosen. These are the POTCARs you
specified in the yaml.

This is pretty much all I understand about DFT. If you'd like to know
more, there's a pretty good lecture series on YouTube
(https://youtu.be/vJkNv095Aj8) and a good book you could read
\cite{kitchin2008modeling}.


\subsubsection{DFT+U}

When modeling large atoms (such as uranium), there can be some
difficulties with getting the DFT calculations to converge to a
``correct'' total energy. You probably won't use the method I'm about
to talk about, so you probably don't need to read this section unless
you've talked to Brother Nelson about modeling large atoms, or you're
morbidly curious. Read on, if you want.

This effect happens because the traditional treatment of electrons in
DFT calculations allows the Coulomb repulsion to scatter the
electrons, when in large atoms, the $d$ and $f$ electrons are strongly
correlated and localized. Because the energy of the system is
dependent upon the electron density, an incorrect density will often
predict an energy that is too high.

To remedy the traditional treatment of electrons in large atoms, DFT+U
should be used. DFT+U is also known as LDA+U, which is where the LDAU
tags in the INCAR get their name. Without the addition of
the U parameter to DFT calculations, the calculation may converge,
but it will likely converge to a non-physical solution.

The recommended method to ensure DFT+U converges to the
``correct'' total energy is to follow a ramping scheme discussed in ref
\cite{meredig2010method} that begins with a U parameter of 0 and ramps
up to 4.5 (in steps of 1, e.g. $0\Rightarrow 1\Rightarrow 2\Rightarrow
3\Rightarrow 4\Rightarrow 4.5$), using the charge density calculated in
the previous iteration. This helps ensure convergence to a ``true''
total energy.

These are the settings in the INCAR that must be used to employ DFT+U:
\begin{itemize}
  \item{ENCUT = 550 }
  \item{LWAVE = False }
  \item{ LDAU = True }
  \item{ LDAUTYPE = 1 }
  \item{ LDAUL = 3 -1 -1 }
  \item{ LDAUU = \# 0 0 (\# changes with ramping scheme)}
  \item{ ICHARG = 1 (after the first iteration of ramping scheme)}
  \item{ LDAUJ = 0.51 0 0 }
  \item{ ISMEAR = -5 }
  \item{ LMAXMIX = 6 }
  \item{ ISIF = 2 }
  \item{ NSW = 0 }
\end{itemize}

\subsection{Optimization Routines}
Optimization has to do with finding the best choice of some possible
options. That's a really broad concept, but with relation to what we
are doing, we want to minimize the error of a best fit line. We create
a best fit ``line'' of energy vs. structure (remember figure
\ref{fig:2d}?) This is a really simplified explanation of what the
algorithm is actually doing, because a structure is decomposed into a
set of $N$ basis functions, and each basis function has a rank $n$ tensor
(where $n$ is the order of the system, e.g. binary, ternary) that also
has to be optimized with it, so it's not a linear system.

The optimization scheme used to train the model is called BFGS. It
stands for Broyden-Fletcher-Goldfarb-Shanno. And now we're at your
quitting point. Feel free to keep reading if you're very curious (good
for you):

The BFGS algorithm is called a quasi-Newton method, and is
based on the Newton method of optimization. The Newton method uses the
Taylor expansion of a function out to second order, takes the partial
derivative with respect to $ \Delta x $, and iteratively searches for
the value of $x$ that makes the resulting expression equal 0. This is
the same as finding where the derivative of the function equals 0. For
the Newton method, we need the second derivative to build this
equation. For a multivariable function, the second derivative is the
Hessian matrix. This matrix can be expensive to calculate, and thus
quasi-Newton methods were developed, which approximate the Hessian in
one way or another. The BFGS algorithm requires an initial guess for
the Hessian (usually an identity matrix of appropriate dimension) and
iteratively approximates a new one as it solves for the $x$ that
minimizes the function. The update for the approximate Hessian $B_k$
that the BFGS algorithm uses is as follows:

\begin{equation}
  B_{k+1}= B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T
    B_k^T}{s_k^T B_k s_k}
\end{equation}

\noindent
with $ y_k = \nabla f ( x_{k+1} ) - \nabla f ( x_k ) $ and $ s_k =
x_{k+1}-x_k $.
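
To see where this comes from, here is the Newton step written out,
followed by a one-dimensional sanity check of the BFGS update. The
symbol $H$ for the exact Hessian is our own label:
\begin{equation}
  f ( x + \Delta x ) \approx f ( x ) + \nabla f ( x )^T \Delta x +
  \frac{1}{2} \Delta x^T H \, \Delta x
\end{equation}
Setting the derivative of the right-hand side with respect to
$\Delta x$ equal to zero gives
\begin{equation}
  \nabla f ( x ) + H \, \Delta x = 0
  \quad \Rightarrow \quad
  \Delta x = - H^{-1} \nabla f ( x )
\end{equation}
A quasi-Newton method takes the same step with $B_k$ in place of
$H$. In one dimension $B_k$, $y_k$, and $s_k$ are just numbers, and
the BFGS update collapses to
\begin{equation}
  B_{k+1} = B_k + \frac{y_k^2}{y_k s_k} - \frac{B_k^2 s_k^2}{s_k^2 B_k}
  = \frac{y_k}{s_k}
  = \frac{f' ( x_{k+1} ) - f' ( x_k )}{x_{k+1} - x_k}
\end{equation}
which is the finite-difference estimate of the second derivative. So
BFGS builds up curvature information from gradients it has already
computed instead of calculating second derivatives directly.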

The algorithm essentially finds a step direction $p_k$ (the solution of
$B_k p_k = -\nabla f(x_k)$, which is the direction of steepest descent
while $B_k$ is still the identity), finds an acceptable step size
(backtracking line-search algorithm), takes the step, and then updates
the approximate Hessian matrix. It repeats this until the new $x$
value and the old one are close enough to each other, according to
some epsilon value.

The backtracking line-search algorithm starts with a maximum step size
$\alpha$ and makes it smaller by some factor $\tau \in (0,1)$ until
it satisfies what is called the Armijo-Goldstein condition, as follows:

\begin{equation}
  f(x + \alpha p) \leq f(x) + \alpha c m
\end{equation}

\noindent
where $p$ is the step direction, $m = p^T \nabla f(x)$, and
$c \in (0,1)$ is some control parameter.
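
As a quick sanity check of the backtracking idea, here is a worked
example with made-up numbers (the function, starting point, and
parameter values are all assumptions chosen to keep the arithmetic
easy). The acceptance test is written out in the sufficient-decrease
form $f(x + \alpha p) \leq f(x) + \alpha c m$. Take $f(x) = x^2$,
start at $x = 1$, and step along $p = -\nabla f(1) = -2$, so that
$m = p^T \nabla f(x) = -4$. With $c = 0.5$, $\tau = 0.5$, and a
maximum step size of $\alpha = 1$:

\begin{equation}
  \alpha = 1: \quad f(1 - 2) = 1 \not\leq 1 + (1)(0.5)(-4) = -1
\end{equation}

\begin{equation}
  \alpha = 0.5: \quad f(1 - 1) = 0 \leq 1 + (0.5)(0.5)(-4) = 0
\end{equation}

\noindent
The first trial overshoots and is rejected, the step size is scaled by
$\tau$, and the second trial is accepted, landing on $x = 0$.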

Are you happy you read to the end of this section?

\subsection{Convex Hull}
This is actually an important section. Go ahead and read the whole
thing if you get this far.

A convex hull is a cool mathematical tool, but it also has physical
importance. See figure \ref{fig:AgAuAFLOW} for a visualization. The
convex hull has some hints to what it is in its name (for once a
useful name for something). It is constructed from the lowest energy
structures of a system that can be connected with a line that is
convex. It looks like the hull of a boat. The break points of the
convex hull represent the ground state structures of a system. (If you
didn't know, you can't actually make a crystalline structure with just
any concentration of materials. For example, there is such a thing as
$\mathrm{UO}_2$ and $\mathrm{U}_3\mathrm{O}_8$, but there is no such
thing as $\mathrm{U}_3\mathrm{O}_2$.)
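
To make ``on the hull'' concrete for a binary system, here is a small
worked example. The symbols and numbers are made up for illustration
and are not taken from figure \ref{fig:AgAuAFLOW}. Suppose two
neighboring ground states sit at concentrations $x_1$ and $x_2$ with
energies $E_1$ and $E_2$. The hull between them is the straight tie
line

\begin{equation}
  E_{\mathrm{hull}}(x) = E_1 + \frac{x - x_1}{x_2 - x_1} (E_2 - E_1),
  \qquad x_1 \leq x \leq x_2 .
\end{equation}

\noindent
With $(x_1, E_1) = (0, 0)$ and $(x_2, E_2) = (0.5, -0.10)$ eV/atom, a
candidate structure at $x = 0.25$ with $E = -0.03$ eV/atom compares
against $E_{\mathrm{hull}}(0.25) = -0.05$ eV/atom. It sits $0.02$
eV/atom above the tie line, so it is not a ground state. A candidate
at the same concentration with $E = -0.07$ eV/atom would fall below
the line and become a new break point of the hull.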

\begin{figure}[t!]
  \centering
  \includegraphics[scale=.3]{AgAuAFLOW.png}
  \caption{Convex hull of Ag-Au system as determined by 294
    high-throughput ab initio calculations
    \cite{curtarolo2012aflowlib}. }
  \label{fig:AgAuAFLOW}
\end{figure}

I lied to you. This is your quitting point. Don't bother reading this
unless you're doing a ternary system (which you very well may be
doing...). A ternary system's convex hull is the same thing
mathematically, but is maybe a bit harder to visualize because to see
all of it you need 3 dimensions. But you can represent it in 2
dimensions, like in figure \ref{fig:3dchull}.

\begin{figure}[h]
  \centering
  \includegraphics[scale=1]{3dchull.jpg}
  \caption{Convex hull of the Co-Nb-V system
    constructed by MTP in the Co-rich region
    \cite{gubaev2019accelerating}.}
  \label{fig:3dchull}
\end{figure}

% \subsection{}

\FloatBarrier

\section*{Appendix A: VASP files and what they do} \label{sec:vaspinput} %%%TODO
Pretty self-explanatory name. If you want to learn a bit more about
VASP, this is a good section to read.

In order to run any VASP calculation here, you need a POSCAR, POTCAR,
INCAR, PRECALC, and KPOINTS (the PRECALC is only there to generate the
KPOINTS). The rest of these files are output files. There are a couple
other output files I haven't mentioned here.
%I'll have to come back and add some stuff in here later.... This might
%be better if it wasn't an appendix and just a section in the regular
%ref guide.

\subsection*{POSCAR}
This is the ``POSition'' car. Not really sure who came up with the
naming convention, but whatever \verb|\_(-.-)_/|.
The first line of this file is a title. Next is the lattice parameter,
which is what you multiply all the lattice vectors by to get their
Cartesian coordinates. The lattice vectors themselves are the three
lines that follow. Then there are the atom counts for each species (in
reverse alphabetical order), and then the coordinate system. This can
be ``D'' for direct, meaning that you multiply the first number of the
basis vector by the first lattice vector, the second by the second,
and the third by the third, then add up the x-coordinates, the
y-coordinates, and the z-coordinates to find the Cartesian coordinate
of each atom within the unit cell. It can also be ``C'' for Cartesian
coordinates, where each number is just an x, y, or z component of the
basis atom's position. Next come the basis vectors, in either direct
or Cartesian coordinates.
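
In equation form, the direct-to-Cartesian conversion described above
looks like this (the symbols $a$, $\mathbf{a}_i$, $d_i$, and
$\mathbf{r}$ are notation introduced here, and the numbers are a
made-up example):

\begin{equation}
  \mathbf{r} = a \, ( d_1 \mathbf{a}_1 + d_2 \mathbf{a}_2
  + d_3 \mathbf{a}_3 ),
\end{equation}

\noindent
where $a$ is the lattice parameter, $\mathbf{a}_i$ are the lattice
vectors as written in the file, and $(d_1, d_2, d_3)$ is one basis
vector in direct coordinates. For example, with $a = 4.0$, lattice
vectors $(0, 0.5, 0.5)$, $(0.5, 0, 0.5)$, and $(0.5, 0.5, 0)$, and a
direct-coordinate basis vector $(0.5, 0.5, 0.5)$:

\begin{equation}
  \mathbf{r} = 4.0 \, [ 0.5 (0, 0.5, 0.5) + 0.5 (0.5, 0, 0.5)
  + 0.5 (0.5, 0.5, 0) ] = (2.0, 2.0, 2.0) .
\end{equation}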

If you didn't already know, the lattice vectors define the repeating
unit of a crystalline structure. If you slide the origin to the end of
one lattice vector, you will end up on an equivalent position (as in,
the same position but in a neighboring unit cell). The basis vectors
are the positions of all the atoms in the unit cell. Usually there is
one basis vector with a value of (0,0,0), meaning that the origin of
the lattice vectors lies in the middle of one of the atoms.

\subsection*{INCAR}
This is the ``INput'' car. This file has all the settings for the VASP calculations. In
\nameref{sec:VASPsettings} I talk about some of the settings we've
used in past works. They're probably safe to use, but you might want
to consult Brother Nelson about this.

\subsection*{PRECALC}
This is the input file for k point generation. The only thing you need
to touch in here is mindistance, unless you find yourself in some very
dire circumstances. See \nameref{sec:VASPsettings} for an explanation
of such dire circumstances...

\subsection*{POTCAR}
This is the ``POTential'' car. It has the pseudopotentials in it. You
can \verb|grep TITEL POTCAR| to make sure that the correct atom types
are in here.

\subsection*{KPOINTS}
This is generated by the getKPoints script. You probably don't ever
really have to worry about it unless you forget to generate it.

\subsection*{OUTCAR}
This is the ``OUTput'' car. It has some of the output in it. You can
\verb|grep TOTEN OUTCAR|  to see the total energy. If the energy has
converged, you will see ``\verb|free  energy|'' with two spaces.

\subsection*{CONTCAR}
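This is an output file. If you let the atoms relax, this is the new POSCAR. Kinda. It is
written in the same format as the POSCAR and specifies the new
positions of the atoms. The ``CONT'' is for continuation: you can copy
it over the POSCAR to continue a calculation from where it left off.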

\subsection*{CHGCAR}
This is the ``CHarGe'' car. It holds the charge density from the
calculation. Ask Brother Nelson if you need to do anything with it.

\subsection*{OSZICAR}
This logs the iterations VASP has done, one line per electronic step
with the energy at that step, so it is a quick way to watch a
calculation converge. I'm not sure what the ``OSZI'' is.


\section*{Appendix B: Bash Commands} \label{sec:bash}

To do any of this work, you need to learn to navigate your command
line. Macs and Linux have a built-in command line (terminal). On
Windows, there are several options:
\begin{enumerate}
  \item{Windows PowerShell. See:
      https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/powershell
      for instructions on how to configure your PC to use it.}
  \item{The Ubuntu app--simply search for it in your Windows store.}
  \item{Windows recently came out with Windows Terminal as a
      central place for all the command line interfaces.}
  \item{If you know any other apps, feel free to use them.}
\end{enumerate}
Your command line allows you to communicate with your machine by
typing. The language you use is called bash, and if you want to make a
script to execute bash commands, you call it a shell script. See Table
\ref{bashcommands} for a list of helpful commands.

\begin{center}
  \begin{longtable}{||p{5.5cm}|p{5.5cm}||} %longtable so it wraps pages
    \caption{Bash commands and what they mean}
    \label{bashcommands}
    %make the main header
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endfirsthead
    %make the next page header
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued from previous page}}
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endhead
    %make the previous page footer
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %make the last footer
    \hline
    \endlastfoot
    %contents of the table
    \verb|ls| & list contents of current directory
    \\ \hline
    \verb|ls -a| & show hidden files too \\ \hline
    \verb|ls -altr| & list all files with details, oldest first, so the
    most recently changed files are at the bottom \\ \hline
    \verb|mkdir directory| & make a new directory
    \\ \hline
    \verb|cd directory| & change directory \\ \hline
    \verb|cd .. |& go back a directory \\ \hline
    \verb|cd ../..| & go back two directories \\ \hline
    \verb|cd ~ or cd |& go to your home directory \\ \hline
    \verb|pwd| & print working directory \\ \hline
    \verb|~/| & means your home directory \\ \hline
    \verb|.| & means the current directory \\ \hline
    \verb|cp file/to/copy where/newName| & copy
    a file \\ \hline
    \verb|cp file/to/copy .| & copy a file to current
    directory without changing the name \\ \hline
    \verb|cp directory/* . |& copy all the files in a
    directory to the current directory \\ \hline
    \verb|cp -r directory new/directory| & copy a
    directory recursively \\ \hline
    \verb|rm file/to/remove| & remove a file \\ \hline
    \verb|rmdir directory| & remove an empty directory \\ \hline
    \verb|rm -rf directory| & blow away a directory
    permanently \\ \hline
    \verb|mv file/to/move where/newName| & moves
    or renames a file \\ \hline
    \verb|man command| & show the manual for a
    command \\ \hline
    \verb|cat file/one file/two| \textgreater \verb| new_file|
    & concatenate two or more
    files into a new file \\ \hline
    \verb|history| & shows a history of your
    commands \\ \hline
    \verb|less file/to/see| & shows a file one page at a
    time (space turns the page, q quits) \\ \hline
    \verb|head file/to/see| & see the start of a file (the first 10
    lines by default) \\ \hline
    \verb|head -n 8 file/to/see| & see the first 8 lines
    of a file \\ \hline
    \verb|tail file/to/see| & see the end of a file (the last 10
    lines by default) \\ \hline
    \verb|tail -n 10 file/to/see| & see the last 10 lines
    of a file \\ \hline
    \verb|grep keyword file/to/search| & search a file
    for a keyword and print all the lines with that keyword
    to the screen \\ \hline
    \verb|history |\textbar\verb| grep keyword| & search your history
    for a keyword \\ \hline
    \verb|grep keyword file/to/search|& count
    the lines that contain a \\
    \textbar \verb| wc -l| & keyword\\ \hline
    \verb|command |\textbar \verb| less| & pipe the output of a command
    to less \\ \hline
    \verb|command |\textgreater\textgreater\verb| file|
    & append the output of a
    command to a file \\ \hline
    \verb|command |\textgreater \verb| file| & writes the output of the
    command to a file \\ \hline
    \verb|!command| & executes the most recent command
    that starts with the letters you typed \\ \hline
    \verb|echo something| & print something to the screen \\\hline
    \verb|sed -i 's/to replace/new| & find and replace a phrase in a
                                      file \\
    \verb|phrase/' file/to/search| & \\ \hline
    \verb|grep -Rl keyword| & recursively search for a keyword and
                              print the file it was found in \\\hline
    \verb|awk '!a[$0]++' file/to/search| & get rid of duplicate lines
    \\\hline
    \verb|echo "phrase" >> file/to/append| & append a phrase to a file \\\hline
  \end{longtable}
\end{center}

\subsection*{Bash Loops}

Table \ref{loops} lists basic bash loops and logic, and Table
\ref{loopexample} gives a basic example that relates to what we do.

\begin{center}
  \begin{longtable}{||p{4.5cm}|p{6.5cm}||}
    \caption{Loops in bash}
    \label{loops}
    %make the main header
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endfirsthead
    %make the next page header
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued from previous page}}
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endhead
    % make the previous page footer
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %make the last footer
    \hline
    \endlastfoot
    %contents of the table
    \verb|for i in {1..100}| & for 100 iterations \\
    \verb|do|& \\
    \verb|command $i| & do this thing (\verb|$i| references the index) \\
    \verb|done| & \\
    \hline
    \verb|for i in `ls -d */`| & for every directory in this
                                   directory \\
    \verb|do| & \\
    \verb|cd $i| & cd into every directory \\
    \verb|...| & \\
    \verb|done| & \\
    \hline
    \verb|if [ condition ]| & check the condition \\
    \verb|then| & if it's true \\
    \verb|command| & do this \\
    \verb|else| & if it's not \\
    \verb|command| & do this \\
    \verb|fi| & \\
    \hline
    \verb|if [ -e file ] |& check if a file exists \\
  \end{longtable}
\end{center}

\begin{center}
  \begin{longtable}{||p{4.5cm}|p{6.5cm}||}
    \caption{Example of a Bash loop}
    \label{loopexample}
    %first header
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endfirsthead
    %next page header
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued from previous page}}
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endhead
    %footer
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %last page footer
    \hline
    \endlastfoot
    \verb|for i in {1..100}| & for 100 times \\
    \verb|do| & \\
    \verb|cd E.$i| & enter the directory named
    \verb|E.#|\\
    \verb|if [ ! -e KPOINTS ]| & if KPOINTS doesn't exist
    \\
    \verb|then| & \\
    \verb|echo $i| & print the directory number \\
    \verb|getKPoints| & run the getKPoints script \\
    \verb|fi| & \\
    \verb|cd ..| & go back one directory \\
    \verb|done|& \\
  \end{longtable}
\end{center}


\section*{Appendix C: Emacs} \label{sec:emacs}

Emacs is a text editor that has a bunch of cool shortcuts you can
learn to make editing documents super easy. On a Mac you can download
Aquamacs, which uses the same commands as Emacs but you can click with
your mouse. Table \ref{emacs} has a chart of basic Emacs
commands.

\begin{center}
  \begin{longtable}{||p{4.5cm}|p{6.5cm}||}%longtable so it wraps pages
    \caption{Emacs commands and what they mean}
    \label{emacs}
    %first header
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endfirsthead
    %next page header
    \hline
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued from previous page}}
    \\ \hline
    \textbf{Command} & \textbf{What it does}\\ \hline \hline
    \endhead
    %footer
    \multicolumn{2}{||c||}
    {\tablename\ \thetable\ -- \textit{Continued on next
        page}} \\ \hline
    \endfoot
    %last footer
    \hline
    \endlastfoot
    %contents of table
    \verb|emacs path/to/file| & enter emacs editor for
    existing file or creates new file with that name \\ \hline
    ctrl+x ctrl+c y & save and quit a file \\ \hline
    ctrl+x ctrl+c n & quit without saving \\ \hline
    ctrl+w & cut the selected region \\ \hline
    ctrl+y & paste what you last cut or killed \\ \hline
    ctrl+k & kills from the cursor to the end of the line \\ \hline
    ctrl+k ctrl+k & kills a whole line \\ \hline
    ctrl+shift+- & undo \\ \hline
    ctrl+u \verb|3 command| & executes the command 3 times \\ \hline
    ctrl+x ctrl+f & find and open a file (at the bottom of the screen)