Parallelized EMPFT and EM3DR
Introduction :
To run my calculations as fast as possible, I migrated Dr. Tim Baker's cryo-EM programs to IBM machines and parallelized them. This page only describes how to use my migrated and parallelized versions. For how to use the programs themselves, such as how to set the input variables, please refer to Dr. Baker's program pages.
How to use my version :
Requirement : You need accounts on the Purdue University Computer Cluster and on the SP2 parallel machine to use my versions of the programs. The 3-D computer cluster is reserved for the Biological Science Department and has three IBM RS6000/580 machines (dart.cc, dash.cc, deft.cc) and one RS6000/591 (POWER2) machine (deep.cc). The SP2 computer cluster is open to the whole university. The SP consists of a front-end machine named cloud.cc.purdue.edu and 80 computational nodes. The front-end machine is an IBM RS/6000 Model F-50 with four 332 MHz PowerPC processors and 4 GB of memory. The nodes are housed in five large frames, four of which contain the new WinterHawk-II nodes added during the July 2000 upgrade, and one of which contains the older WinterHawk nodes. All WinterHawk nodes are also called POWER3 machines. To obtain an account, please talk to Dwight D. Mckay, the structure group's computer system administrator.
Program list :
Description :
I have optimized the programs for different models of IBM machine, and I use a suffix on the program name to indicate the machine model. A program without a suffix can be run on all models of IBM machine with the AIX operating system, except the SP2 parallel machine. Below is the list of MACHINE_MODEL suffixes.
- None: for all models of IBM machine except the SP2 parallel machine;
- pwr2: for POWER2 machines such as deep.cc in the 3-D cluster. You need to log in to deep.cc and use "qsub -q rs_huge_pwr2" or "qsub -q rs_memory_pwr2" to submit your job;
- pwr3: for POWER3 machines such as cycle.cc and check.cc in the CC3 cluster. You need to use "-l arch=pwr3" as one of the parameters when you submit a job on cloud.cc.purdue.edu, which is a job server of the CC3 cluster;
- para: for all nodes of the SP2 parallel machine;
- para.pwr2: for the 16 old POWER2 nodes of the SP2 parallel machine. You need to use "-l arch=pwr2" as one of the parameters when you submit your jobs;
- para.pwr3: for the 16 new POWER3 nodes of the SP2 parallel machine. You need to use "-l arch=pwr3" as one of the parameters when you submit your jobs.
For how to submit jobs on the SP2 parallel machine, please refer to the RCD page. I also suggest taking the short courses that PUCC offers every semester on how to use the computational resources.
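The suffix scheme above can be written down as a small lookup table. The following Python sketch is only an illustration: the dictionary layout and the helper name `program_name` are my own assumptions, and the submit hints are copied from the list above rather than checked against the queueing system.

```python
# Suffix table transcribed from the list above. The dictionary layout and the
# helper name `program_name` are illustrative, not part of the programs.
SUFFIXES = {
    "": ("all IBM AIX machines except the SP2", None),
    "pwr2": ("POWER2 machines such as deep.cc",
             "qsub -q rs_huge_pwr2 or qsub -q rs_memory_pwr2"),
    "pwr3": ("POWER3 machines such as cycle.cc", "-l arch=pwr3"),
    "para": ("all nodes of the SP2", None),
    "para.pwr2": ("POWER2 nodes of the SP2", "-l arch=pwr2"),
    "para.pwr3": ("POWER3 nodes of the SP2", "-l arch=pwr3"),
}


def program_name(base, suffix=""):
    """Return the executable name for a base program and a machine suffix."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown machine suffix: {suffix!r}")
    return f"{base}.{suffix}" if suffix else base


if __name__ == "__main__":
    # Print one line per suffix: program name, target machines, submit hint.
    for suffix, (target, hint) in SUFFIXES.items():
        print(program_name("pft", suffix), "|", target, "|",
              hint or "no extra option listed")
```

For example, `program_name("pft", "para.pwr3")` gives the name of the program to run on the POWER3 nodes of the SP2.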
SPEED Comparison of EMPFT :
A real data set with 814 particles of CAV21 with ICAM-1 was processed on different models of IBM machine and on a VMS machine, laevo.bio.purdue.edu, in Professor Tim Baker's lab. The table below compares the results, with the time on Laevo.bio included as a reference. The table shows that POWER2 is slightly slower than Laevo.bio (speed-up factor 0.82), whereas POWER3 is about twice as fast (1.93). The parallel machine reduces the calculation time much further, by a factor of about 11 on the 16 POWER2 nodes and about 36 on the 16 POWER3 nodes. The input parameters of the calculation job are listed after the table.
| System | Digital VMS | IBM AIX | IBM AIX | IBM AIX | IBM AIX | IBM AIX |
|---|---|---|---|---|---|---|
| Machine | Laevo.bio | Dash.cc | Deep.cc | Cycle.cc | Sp101sp - Sp116sp.cc | Sp201sp - Sp216sp.cc |
| Machine model | ALPHA station500 (400MHz) | IBM RS6000/580 | IBM RS6000/591 (POWER2) | IBM RS6000/43P-260 (POWER3) | IBM RS6000/59H (POWER2) | IBM RS6000/43P-260 (POWER3) |
| CPU number | 1 | 1 | 1 | 1 | 16 | 16 |
| Program | pft.exe | pft | pft.pwr2 | pft.pwr3 | pft.para | pft.para.pwr3 |
| Real Time | 7hr32min42sec | 15hr53min5sec | 9hr1min7sec | 3hr51min43sec | 40min53sec | 12min27sec |
| CPU Time | 7hr27min30.35sec | 15hr31min55.17sec | 8hr33min16.54sec | 3hr23min7.24sec | 39min15.01sec - 40min7.20sec* | 11min36.65sec - 12min18.45sec* |
| Speed up factor | 1 | 0.47 | 0.82 | 1.93 | 10.95 | 35.96 |
| Page Faults | 21177 | 28 | 91 | 596 | 252 - 328* | 256 - 448* |
Table 1. Comparison between different versions of the program EMPFT with a real data set
* Each of the 16 nodes reports a different value, so only the minimum and maximum values are listed.
Input parameters of the calculation job:
- MODE=1,BIN=1,SYM=532,DANG=1.0,CTFMODE=1,ILIST=0
- PFTRAD_LO=1.0,PFTRAD_HI=111.0,PFTRAD_STEP=1.0,ANN_LO=0,ANN_HI=111
- RES_LO=100.0,RES_HI=30.0,JCUT=1,SIG=0.0
- MAG_CEN=1.0,MAG_STEP=0.005,MAG_NUM=5,MAG_NORM=1
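The relation between the times and the speed-up factors in Table 1 can be checked with a short calculation. The sketch below parses the "Real Time" entries and divides the Laevo.bio time by each of the others. The function name `to_seconds` and the choice of real time as the basis are my assumptions. The ratios come out close to, but not exactly equal to, the factors in the table, whose exact derivation is not stated on this page.

```python
import re

# Matches strings such as "7hr32min42sec", "40min53sec" or "654.52sec".
_TIME_RE = re.compile(
    r"(?:(\d+)\s*hr)?\s*(?:(\d+)\s*min)?\s*(?:(\d+(?:\.\d+)?)\s*sec)?"
)


def to_seconds(text):
    """Convert an 'XhrYminZsec' string from the table into seconds."""
    match = _TIME_RE.fullmatch(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"cannot parse time: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


# "Real Time" row of Table 1, keyed by program name.
REAL_TIME = {
    "pft.exe": "7hr32min42sec",
    "pft": "15hr53min5sec",
    "pft.pwr2": "9hr1min7sec",
    "pft.pwr3": "3hr51min43sec",
    "pft.para": "40min53sec",
    "pft.para.pwr3": "12min27sec",
}

reference = to_seconds(REAL_TIME["pft.exe"])  # Laevo.bio is the reference
speedups = {name: reference / to_seconds(t) for name, t in REAL_TIME.items()}

if __name__ == "__main__":
    for name, factor in speedups.items():
        print(f"{name:14s} {factor:6.2f}")
```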
Known problem of parallelized PFT :
The parallelized programs do not support ILIST=1 or ILIST=2, which write the output files EMPFT.RADS, EMPFT.RES1, and EMPFT.RES2. With these settings, different nodes would write to the same file, and the resulting EMPFT.RADS would not be useful. To use ILIST, run the non-parallelized version instead.
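A job script could guard against this restriction before choosing which program to run. The following Python sketch is a hypothetical helper, not part of EMPFT: the function names and the assumption that a missing ILIST means 0 are mine. The parameter string uses the same KEY=VALUE format as the input parameters listed above.

```python
def parse_params(text):
    """Parse a 'KEY=VALUE,KEY=VALUE' string into a dictionary of strings."""
    params = {}
    for item in text.split(","):
        if not item.strip():
            continue  # ignore empty entries such as a trailing comma
        key, _, value = item.partition("=")
        params[key.strip().upper()] = value.strip()
    return params


def choose_program(params, parallel_name="pft.para", serial_name="pft"):
    """Pick the serial program when ILIST requests the extra output files."""
    # Assumption: an absent ILIST is treated as 0 (no extra output files).
    ilist = int(params.get("ILIST", "0"))
    if ilist in (1, 2):
        return serial_name  # parallel nodes would all write to the same file
    return parallel_name


if __name__ == "__main__":
    job = parse_params("MODE=1,BIN=1,SYM=532,DANG=1.0,CTFMODE=1,ILIST=0")
    print(choose_program(job))
```

With ILIST=0 the helper returns the parallel program name. With ILIST=1 or ILIST=2 it falls back to the serial one.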
SPEED Comparison of EM3DR :
A real data set with 114 particles was used to compare the serial and parallelized versions of EM3DR.
| System | Digital VMS | IBM AIX |
|---|---|---|
| Machine | Laevo.bio | SP2 |
| Machine model | ALPHA station500 (400MHz) | IBM WHII375M (POWER3) |
| CPU number | 1 | 32 |
| Program | em3dr.exe | em3dr.para.pwr3 |
| Calculation Time | 654.52sec | 57.33sec |
| Speed up factor | 1 | 11.4 |
Table 2. Comparison between serial version of EM3DR and parallelized EM3DR using a real data set of 114 particles
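The speed-up factor in Table 2 is the serial calculation time divided by the parallel calculation time. Dividing that factor by the number of CPUs gives a rough per-CPU efficiency. The sketch below shows the arithmetic. The function names are my own, and because the serial run was on a different machine (Laevo.bio) than the parallel run (SP2), the efficiency figure mixes hardware differences with parallelization and is only a rough indicator.

```python
def speedup(reference_seconds, measured_seconds):
    """Speed-up factor: reference time divided by measured time."""
    return reference_seconds / measured_seconds


def efficiency(speedup_factor, cpu_count):
    """Speed-up per CPU; 1.0 would mean perfect scaling."""
    return speedup_factor / cpu_count


# Values taken from Table 2.
em3dr_speedup = speedup(654.52, 57.33)
em3dr_efficiency = efficiency(em3dr_speedup, 32)

if __name__ == "__main__":
    # prints: speed-up 11.4, efficiency 0.36
    print(f"speed-up {em3dr_speedup:.1f}, efficiency {em3dr_efficiency:.2f}")
```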
BUG REPORT :
If you encounter any bugs or problems with the programs, please email xc@purdue.edu.