Tuesday, 10 June 2014

5. Performance considerations when using nwchem

nwchem supports parallel computing among all the available CPUs in a multicore computer, depending on the compilation flags. The Ubuntu distribution is linked against openmpi library, but compilation flags also determine whether they benefit automatically from the available CPUs or whether you need to explicitly indicate how many CPUs to use. In order to check how your installation works, first check whether it depends on openmpi:
linux$ ldd /usr/bin/nwchem | grep libmpi
If you get a match, then you can use mpi parallelization. Next step is to check how many CPUs your computer has with lscpu command. and whether nwchem distributes the process among all available available ones. Run the command again and in another shell run top. If the number of CPUs is bigger than 1, but there is only one nwchem process, then you must explicitly run nwchem in multiple cores. You should run instead:
linux$ mpirun -n <number of cpus> nwchem l-proline
Not the entire optimization process is completely parallelizable, so parallelization does not meant that if you run the process in 4 cpus, it will take 4 times less time, but for sure will improve it. Improvement also depends on the kind of calculation you are performing. Some tasks are more paralellizable than others.

Once you achieved the optimized geometry, the output files are what you need for performing other calculations with it. But if you want as output a new xyz file for visualization or other purposes (even using this optimized geometry in a geometry block of another nwchem file), you can print it, by including

driver
  xyz l-proline_opt
end
 This command will generate a new xyz file for each optimization step performed. So, lets modify our nwchem input file:
basis
  * library 6-31g
end
driver
  xyz l-proline_opt
end
task scf optimize
(we don't need the geometry block here, as we will work with the generated vector files, but if you do include it, it will be ignored)

As we are working with an already converged solution, nwchem will perform some few more steps (and then discover that convergence was already achieved), so you will get some few l-proline_opt* files, consecutively numbered. The last one contains the optimized geometry. You can then do:
linux$ babel l-proline_opt-002.xyz l-proline.mol
and open the file with jmol for visualization.

You can also use the optimized geometry for a further task instead of the nwchem output vector files. You can generate the last exactly in the same way we generated them the first time, but with the optimized xyz file instead in the geometry block. As geometry is already optimized, nwchem will achieve convergence very fast, in a pair of steps. Even faster, you can use task scf energy instead of task scf optimize. This will generate all the vector files without performing any optimization.

A typical strategy for optimizing molecule geometry with a higher order (more accurate and more CPU demanding) basis set, is to pre optimize molecule geometry with a fast, less accurate one, so the optimization with the higher one will take really much less time. For example, after we got an optimized state with 6-31g, we can run a further one with cc-pvdz:
basis
  * library cc-pvdz
end
task scf optimize 

4. Energy optimization of molecule geometry with nwchem

In this lesson I will introduce a great opensource suite for classical and quantic chemical computation, called nwchem. If you are not familiarized with this application, I recommend to check the Getting Started section of the documentation before continue reading this post. Here i will mention only the most basic concepts.

nwchem reads all the parameters needed for the computation from an input file with extension nw. In order to feed nwchem with the molecule geometry, we can write the coordinates in the input file, in simple xyz format, or load it from another file. As in out first post, Visualizing a 3D structure of a molecule with jmol, we will start geometry specification from a smiles file. Lets suppose we want to minimize the configuration of L-proline (smiles: [O-]C(=O)C1CCC[NH2+]1), so we save the smiles string into a file, and convert to xyz using babel:
linux$ babel l-proline.smi l-proline.xyz --gen3d
And the following is the nwchem input file for optimizing configuration:
geometry
  load format xyz l-proline.xyz
end
basis
  * library 6-31g
end
task scf optimize
Inside the geometry directive we instruct nwchem to load molecular geometry from our xyz file.

Inside the basis directive we specify the basis set to use for modeling the electronic orbitals of each atom class in the molecule. There are many different basis sets. You have yo choose in terms of compromise between speed and precision. The set cc-pvdz is an acronym for the correlation-consistent Double-zeta, and it is the minimal of the correlation-consistent group of basis sets, so it is the faster and less accurate of them, but still good quality for most serious applications that involves atoms up to the 3rd period. However, in many systems still can demand lot of time, specially if the input xyz file is very far from optimized configuration. So for this exercise we will choose a quite fast (but less accurate) model: 6-31g. We apply this model to all the atoms, using the wildcard *.

With the task directive we invoke the Hartree-Fock (SCF) method for optimizing the molecular geometry by energy minimization.

Save the file as l-proline.nw and then run
linux$ nwchem l-proline
The computations are very CPU intensive. They imply quantum mechanical numeric methods with several levels of iterations, and increases geometrically with the number of atoms involved, so they will take a time, which depends mainly on the CPU speed on your computer and the geometry of the input file.

If you get an error saying that optimization failed because computation didn't reach convergence criteria, don't worry. Just run the command again, until you achieve the default set convergence threshold. nwchem stores iteration vectors state so when you run the command again, you will not start from beginning, but from the point the previous run finished. Even more, the saved vector files will allow you to perform further calculations (see following lessons).

Saturday, 23 February 2013

7. Computing and visualization of Electrostatic Potential Map over molecular isodensity surface

In the previous post i explained how to compute and visualize an EPM over a Van der Waals surface using nwchem and jmol. However, actually it was found more realistic, theoretically and experimentally, to define the surface of a molecule as an isodensity surface, that is, a surface of equal electronic density, derived from the molecular orbitals influence.

A typically useful isodensity surface for estimating molecular shape and surface, is the one defined by a cutoff of 0.001 electron/Bohr3, that is, the one that encloses the space where the electron density is above such threshold. Such a cutoff encloses more than the 95% of the molecular electronic density.

Taking the case of l-proline again, the nwchem file for generating an electron density grid from the vector files generated in the previous steps would be something like:
dplot
  limitxyz
    -5.0 5.0 100
    -5.0 5.0 100
    -5.0 5.0 100
  gaussian
  output l-proline.den.cube
end
task dplot 
The limitxyz block here is used to define the limits and number of points for each axis. In order to estimate good limits for the grid, you can open the model in jmol and type on the console
jmol$ show boundbox
and substract two unities to the resuling first corner coordinates, and add two unities to the resulting second corner coordinate, and round it.

Once got the density grid output, file, in jmol console you can do:
jmol$ isosurface cutoff 0.001 "l-proline.den.cube" map "l-proline.elp.cube" jmol$ write isosurface "l-proline.den.jvxl"
Compare the result (first figure) with the one obtained previously with VDW surface (second figure):

Thursday, 14 February 2013

6. Simple computing and visualization of molecular electron potential map

The most used kind of electrostatic potential map (EPM) is a mapping of the electrostatic potential generated by the charge distribution of the molecule, over an electron isodensity surface, but there is a simple and reasonable approximation to EPM using as mapping isosurface the Van der Waals surface of the atoms, and jmol is able to generate such isosurface.

The second requirement for building an EPM is to provide a volumetric grid of electrostatic potential (ESP) around the molecule. jmol does not generate it, but nwchem does it (see section 4). Using the generated data in section 4, we modify our input file l-proline.nw:
property
  esp
  grid
end
task scf property
save it and run nwchem:
linux$ nwchem l-proline
After process finished, some new files have been generated. One of them, a file named l-proline.elp.cube, contains the ESP volume grid and the optimized molecular geometry in cube format. Worth to mention that, although nwchem was always able to generate ESP grid data, dumping it in cube format is only available since version 6.2, so be sure this, or a recent one, is the version installed in your system1

So next step is to generate the mol file from the optimized geometry contained in the cube file, and load with jmol:
linux$ babel l-proline.elp.cube l-proline.mol
linux$ jmol l-proline.mol

The final step is to generate the molecular surface coloured with the ESP projected on it. So, from jmol console:
jmol$ isosurface ID myepmsurface molecular map "l-proline.elp.cube"
The molecular keyword is just a minor fix that jmol makes over the pure Van der Waals (vdw) surface, in order to slightly fill the spaces between the Van der Waals spheres, thus generating a surface more similar to the one defined in terms of electronic isodensity.

Because cube files are very large even for small molecules and contains much more data than needed in order to display an isosurface, they are not the best suited for loading into a jmol applet from the web. Instead, you can dump the generated isosurfaces into a file, in jvxl file format (a jmol specific file for storing jmol generated data in more convenient way):
jmol$ write isosurface "l-proline.jvxl"
and load them from the applet, using its name as reference. Below we can see the result in a jmol applet initialized with the script
load http://dl.dropbox.com/u/xxxxxxxx/jmol/l-proline2.mol; isosurface color translucent 0.3 "http://dl.dropbox.com/u/xxxxxxxx/jmol/l-proline.jvxl"; spin on; 
The load command is not needed if you only want to see the isosurface. But you may want to make it slightly translucent in order to view the molecule inside it, as actually this script does:

Notes

[1] Check this document if you need to compile nwchem by yourself.

Friday, 11 January 2013

3. Embedding a jmol applet in your blog

Jmol has a beautiful feature: it can be embedded in a web page so you can present an interactive jmol view in yours, including blog posts.

You will need a place in the web to store the jmol jar file. If you don't have your own server, you can just use your Dropbox (or similar service) public folder, and this is the method i will describe here as it is readily available for anyone.

First step is to download the latest version (i had problems testing the procedure with version 12. My tests were done with version 13.0.10) of jmol full source package and uncompress it:
http://sourceforge.net/projects/jmol/files/Jmol/

Then create a folder jmol inside your Dropbox Public folder, and copy there the file JmolAppletSigned.jar and any of the models you want to show in your web page or blog. In Jmol version 14, JmolAppletSigned.jar comes inside a zip file jsmol.zip, included in the jmol distribution.

Finally, assuming the public Dropbox link to the folder is http://dl.dropbox.com/u/xxxxxxxx/jmol/, you would include something like this in the proper place in your post:

<applet name='jmol' code="JmolApplet" archive="http://dl.dropbox.com/u/xxxxxxxx/jmol/JmolAppletSigned.jar" width="350" height="350">
<param name="progressbar" value='true' />
<param name="script" value="load http://dl.dropbox.com/u/xxxxxxxx/jmol/L-proline.mol; spin on;" />
</applet>

And this is the working example (give a time to load):


Observe that you can interact with the applet using your mouse1. Right button opens an options menu. You can change the size of the applet area at taste using the tag attributes width and height. This is the most basic usage. About everything else, it is just a matter of adding the needed jmol script code in the value attribute of the script parameter.

Notes

[1] Check my previous post Visualizing a 3D structure of a molecule with jmol.

Thursday, 10 January 2013

2. Creating a molecule scheme with bkchem

The best application i found for creating a molecule scheme is bkchem. It has an excellent user interface for scheme edition and reads smiles1 files.

In comparison with other applications, i found bkchem has very practical and friendly tools for performing edition operations, where the others becomes very annoying and even impossible.

Let's introduce the user interface of bkchem by creating the scheme of D-glucose molecule (the figure is what we want to obtain as final result)
  1. Go to main menu -> Chemistry -> Read SMILES and enter the smiles string for D-glucose O=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C(O).
  2. Select the "transformation mode" set of tools (from the first line of icons menu), and using its different tools unfold and align the obtained structure.
  3. In order to fix the group symbols order (in our example, OH instead of HO, or viceversa) click right button over the group, and select option Symbol positioning. Select center last or center first, according to your needs.
  4. Finally lets fix the bond draws in order to reflect the correct stereo isomer configuration. Use the "draw" set of tools and fix bonds with the wedge and hatch tools.
  5. bkchem saves the image directly in SVG (Scalable Vector Graphics) format, another big advantage i found with this program. SVG graphic format is very suitable for edition of molecule schemes, because is based on vector primitives instead of pixels. And most important browsers and document editing applications supports it. Before saving the file, just ensure to crop the image to the area of the scheme (main menu -> File -> File properties) with a nice margin.
This has been an introduction using just a small subset --but the most used-- of tools among all the possibilities this application provides, so i leave for you the task for exploring the rest.

Notes

[1] On smiles files, see previous post, Visualizing a 3D structure of a molecule with jmol.

Wednesday, 9 January 2013

1. Visualizing a 3D structure of a molecule with jmol

jmol is well suited for creating 3D molecule structures in a graphical way, and my first steps were in this direction. However, soon i found that writing the molecule specification in a smiles formatted file, import them to mol format and finally open with jmol is much easier and much much faster, once you learn the smiles format specifications (very easy indeed).

The best place i found for learning smiles format specs is
http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
A smiles file consist just in a text file with a single line which specifies chemical composition and structure. In the table below some examples are shown.

molecule namesmiles specification
methanolCO
ammonium[NH4+]
D-glucoseO=C[C@H](O)[C@@H](O)[C@H](O)[C@H](O)C(O)
L-proline[O-]C(=O)C1CCC[NH2+]1
psilocybinO(P(O)(=O)[O-])c1cccc2c1c(CC[NH+](C)C)cn2
iron(III) oxideO=[Fe]O[Fe]=O
Sodium chloride[Na+].[Cl-]

Another great advantage of using this approach is that most chemical compounds found in wikipedia comes with the smiles specification string.

Once you have the smiles file with the molecule spec, you can easily convert to jmol format file (.mol extension), using babel:
linux$ babel myfile.smi myfile.mol --gen3d
The --gen3d option is needed in order to generate 3D coordinates in the destination file. babel will also perform energy minimization calculations in order to create a valid molecule configuration.

Finally, you open it with jmol:
linux$ jmol myfile.mol
The image is an example on how a molecule looks in the jmol visualizer area (jmol can export a view as an image in different formats), according to ball-and-stick model:



Once in the jmol visualizer, you can move and rotate the molecule easily with the mouse:

  • rotate on X axis: press left button and move the mouse pointer up/down without releasing the button.
  • rotate on  Y axis: press left button and move the mouse pointer left/right without releasing the button.
  • rotate on Z axis: press SHIFT key, then right button, and move the mouse pointer left/right without releasing the key nor the button.
  • move away/closer: use the mouse wheel.
  • translate: double click left button and move mouse pointer without releasing button.
Rotation operations center is the first atom of the file.

Other interesting functions are accesible by clicking right mouse on the visualization area. For example, with menu View you can change the perspective from the six sides of the molecule bounding box. Also you can active spin on and off in order to view the molecule from a rotating camera around the object.

Jmol commands are also accesible from a console (to open: Menu File -> Console) and in fact advanced control and computations can only be done from the console.

For instance, spin can be set on/off from the console using the jmol command spin. Examples:
jmol$ spin on
jmol$ spin off
In order to reset the molecule to its initial position:
jmol$ reset
When you place the mouse over an atom, a label will be displayed (by default, the element symbol plus the number id of the atom, but can be controlled with the command hover). You can use this label with the command center in order to center the molecule at the given atom. Rotation operations then will be centered on the selected atom. Examples:
jmol$ center C3
jmol$ center N11

A complete reference of jmol commands and functions can be found here:
http://chemapps.stolaf.edu/jmol/docs/