Parallelized EMPFT and EM3DR

Introduction :

To run my calculation jobs as fast as possible, I migrated Dr. Tim Baker's cryo-EM programs to IBM machines and parallelized them. This page only describes how to use my migrated and parallelized versions of the programs. For how to use the programs themselves, such as how to set the input variables, please refer to Dr. Baker's program pages.

How to use my version :

Requirement : You need accounts on the Purdue University Computer Cluster and the SP2 parallel machine to use my versions of the programs. The 3-D computer cluster is reserved for the Biological Science Department and has three IBM RS6000/580 machines (dart.cc, dash.cc, deft.cc) and one RS6000/591 (POWER2) machine (deep.cc). The SP2 computer cluster is open to the whole university. The SP consists of a front-end machine named cloud.cc.purdue.edu and 80 computational nodes. The front-end machine is an IBM RS/6000 Model F-50 with four 332 MHz PowerPC processors and 4 GB of memory. The nodes are housed in five large frames, four of which contain the new WinterHawk-II nodes added during the July 2000 upgrade, and one of which contains the older WinterHawk nodes. All WinterHawk nodes are also called POWER3 machines. To obtain an account, please talk to Dwight D. Mckay, the structure group's computer system administrator.

Program list :

  1. EMPFT : pft.MACHINE_MODEL
  2. EM3DR : em3dr.MACHINE_MODEL

Description :

I have optimized the programs for different models of IBM machine, and I use a suffix to indicate the machine model. A program name without a suffix means that the program can run on all models of IBM machine with the AIX operating system, except the SP2 parallel machine. Below is the list of MACHINE_MODEL suffixes.

For how to submit jobs on the SP2 parallel machine, please refer to the RCD page. I also suggest taking the short courses that PUCC offers every semester on how to use the computational resources.

SPEED Comparison of EMPFT :

A real data set of 814 particles from CAV21 with ICAM-1 was processed on different models of IBM machines and on a VMS machine, laevo.bio.purdue.edu, in Professor Tim Baker's lab. The table below compares the results; the time on Laevo.bio is included as a reference. The table shows that POWER2 is slightly slower than Laevo.bio (speed-up factor 0.82), whereas POWER3 is about twice as fast (1.93). The parallel machine reduces the calculation time remarkably (speed-up factors of 10.95 and 35.96 with 16 CPUs). Input parameters of the calculation job are listed after the table.

System            Digital VMS         IBM AIX             IBM AIX             IBM AIX             IBM AIX             IBM AIX
Machine           Laevo.bio           Dash.cc             Deep.cc             Cycle.cc            Sp101sp-Sp116sp.cc  Sp201sp-Sp216sp.cc
Machine model     ALPHA station500    IBM RS6000/580      IBM RS6000/591      IBM RS6000/43P-260  IBM RS6000/59H      IBM RS6000/43P-260
                  (400MHz)                                (POWER2)            (POWER3)            (POWER2)            (POWER3)
CPU number        1                   1                   1                   1                   16                  16
Program           pft.exe             pft                 pft.pwr2            pft.pwr3            pft.para            pft.para.pwr3
Real Time         7hr32min42sec       15hr53min5sec       9hr1min7sec         3hr51min43sec       40min53sec          12min27sec
CPU Time          7hr27min30.35sec    15hr31min55.17sec   8hr33min16.54sec    3hr23min7.24sec     39min15.01sec -     11min36.65sec -
                                                                                                  40min7.20sec*       12min18.45sec*
Speed up factor   1                   0.47                0.82                1.93                10.95               35.96
Page Faults       21177               28                  91                  596                 252-328*            256-448*

Table 1. Comparison between different versions of the program EMPFT with a real data set

* Because there are 16 nodes and different nodes give different values, only the minimum and maximum values are listed here.
** input parameters:
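
The speed-up factors in Table 1 can be checked with a few lines of arithmetic. The sketch below is in Python, which is not part of the original programs; the helper to_seconds and the dictionary layout are made up for illustration. It assumes that each factor is the Laevo.bio CPU time divided by the real time on the other machine. That is an inference from the numbers (it reproduces the listed factors to within about 0.02), not something stated on this page.

```python
import re

def to_seconds(text):
    """Convert a time such as '7hr27min30.35sec' or '12min27sec' to seconds."""
    match = re.fullmatch(r"(?:(\d+)hr)?(?:(\d+)min)?(?:([\d.]+)sec)?", text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized time: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

# Assumed reference: CPU time of pft.exe on Laevo.bio (Table 1).
REFERENCE = to_seconds("7hr27min30.35sec")

# Real times copied from Table 1, keyed by program name.
REAL_TIME = {
    "pft": "15hr53min5sec",
    "pft.pwr2": "9hr1min7sec",
    "pft.pwr3": "3hr51min43sec",
    "pft.para": "40min53sec",
    "pft.para.pwr3": "12min27sec",
}

def speed_up(program):
    """Reference time divided by the real time of the given program."""
    return REFERENCE / to_seconds(REAL_TIME[program])

for program in REAL_TIME:
    print(f"{program:15s} {speed_up(program):6.2f}")
```

A factor below 1 means the run was slower than the Laevo.bio reference; a factor above 1 means it was faster.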

Known problem of parallelized PFT :

The parallelized programs do not support ILIST=1 or ILIST=2, which write the output files EMPFT.RADS, EMPFT.RES1, and EMPFT.RES2. With these settings, different nodes would write to the same file, and you would not obtain a useful EMPFT.RADS. To use ILIST, you therefore need the non-parallelized version.

SPEED Comparison of EM3DR :

A real data set of 114 particles was used to compare the serial version of EM3DR with the parallelized version.

System            Digital VMS         IBM AIX
Machine           Laevo.bio           SP2
Machine model     ALPHA station500    IBM WHII375M(POWER3)
                  (400MHz)
CPU number        1                   32
Program           em3dr.exe           em3dr.para.pwr3
Calculation Time  654.52sec           57.33sec
Speed up factor   1                   11.4

Table 2. Comparison between serial version of EM3DR and parallelized EM3DR using a real data set of 114 particles
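
The speed-up factor in Table 2 is the serial calculation time divided by the parallel calculation time. The Python sketch below is illustrative only, and its function and variable names are assumptions. It also divides the factor by the CPU number to give a rough per-CPU figure. Because the serial run was on the Alpha station and the parallel run on POWER3 nodes, that figure mixes machine speed with parallel scaling and should not be read as a true parallel efficiency.

```python
def speed_up(serial_seconds, parallel_seconds):
    """Serial calculation time divided by parallel calculation time."""
    return serial_seconds / parallel_seconds

# Values copied from Table 2.
serial_seconds = 654.52    # em3dr.exe on Laevo.bio, 1 CPU
parallel_seconds = 57.33   # em3dr.para.pwr3 on SP2, 32 CPUs
cpu_number = 32

factor = speed_up(serial_seconds, parallel_seconds)
per_cpu = factor / cpu_number  # rough figure only; see the caveat above

print(f"speed-up factor {factor:.1f}, per CPU {per_cpu:.2f}")
```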

BUG REPORT :

If you find any bugs or have problems with the programs, please email xc@purdue.edu.

contact webmaster - Last Update : 10/23/2007

Copyright © 2007, Rossmann's lab, all rights reserved.