wiki:Processing

Version 22 (modified by rhenders, 16 years ago)


Introduction

This page outlines the procedures and responsibilities for the person currently acting as 'IPP Processing Czar'. In a nutshell, these include:

  • monitoring the various pantasks servers running on the production cluster using pantasks_client
  • alerting the IPP group to any notable errors or failures
  • keeping an eye on production cluster load using Ganglia
  • adding and removing labels based on the current set of processing priorities, outlined here
  • keeping an eye on available disk space using the neb-ls command (on any production machine)

Setup

You will need ipp user access on the production cluster. The easiest way to get it is to ask someone who already has access (anyone on the IPP team) to add your ssh public key to ~ipp/.ssh/authorized_keys.
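For whoever adds the key, here is a minimal sketch of that step. It assumes the new user's public key has already been copied to the machine as a file (the path in the example is only illustrative) and is run as the ipp user. It appends the key only if the identical line is not already present, so running it twice is harmless.

```shell
# Append a public key to an authorized_keys file, skipping it if the
# identical line is already there. Run as the ipp user; the default
# target is ~/.ssh/authorized_keys (i.e. ~ipp/.ssh/authorized_keys).
add_authorized_key() {
  key_file=$1
  auth_file=${2:-$HOME/.ssh/authorized_keys}
  key=$(cat "$key_file") || return 1
  # -x: whole-line match, -F: fixed string (keys contain '+' and '/')
  if ! grep -qxF -e "$key" "$auth_file" 2>/dev/null; then
    printf '%s\n' "$key" >> "$auth_file"
  fi
}

# Example (the key file location is hypothetical):
#   add_authorized_key /tmp/newuser_id_rsa.pub
```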

Resources

Most of the time you will be logged into a production cluster machine, using the pantasks_client program to monitor operations; however, there are other useful resources.

Getting started and checking processing status

Log in as the ipp user on any production cluster machine and run

./check_system.sh

This lists each pantasks server on the cluster, whether or not it is running, and the host it runs on, eg

pantasks server addstar is running (host: ipp004)
pantasks server cleanup is running (host: ippc07)
pantasks server detrend is NOT running (host: ippc06)
pantasks server distribution is running (host: ippc15)
pantasks server pstamp is running (host: ippdb02)
pantasks server publishing is running (host: ippc08)
pantasks server registration is running (host: ippc02)
pantasks server replication is running (host: ippdb00)
pantasks server stdscience is running (host: ippc16)
pantasks server summitcopy is running (host: ippc01)
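When most servers are up, it can help to filter this listing down to the ones that are not. A minimal sketch, assuming the line format shown above (the function name is hypothetical):

```shell
# Print "<server> <host>" for every pantasks server that check_system.sh
# reports as NOT running. Assumes output lines of the form
#   pantasks server <name> is [NOT ]running (host: <host>)
not_running_servers() {
  awk '$1 == "pantasks" && $2 == "server" && $5 == "NOT" {
         host = $NF
         sub(/\)$/, "", host)   # strip the trailing ")"
         print $3, host
       }'
}

# Usage:
#   ./check_system.sh | not_running_servers
```

On the listing above this prints `detrend ippc06`.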

Assuming some or all of the servers are running, move to the directory corresponding to the server of interest, eg ~ipp/stdscience/, then run

pantasks_client

To check the current labels being processed:

pantasks: show.labels

Within pantasks, to check processing status, do

pantasks: status

This will return something like

 Task Status
  AV Name                     Nrun   Njobs   Ngood Nfail Ntime Command               
  +- extra.labels.on             0       3       3     0     0 echo                  
  +- extra.labels.off            0       3       3     0     0 echo                  
  +- ns.initday.load             0       3       3     0     0 echo                  
  ++ ns.registration.load        0    1331    1331     0     0 automate_stacks.pl    
  ++ ns.chips.load               0      66      66     0     0 automate_stacks.pl    
  ++ ns.chips.run                0       4       4     0     0 automate_stacks.pl    
  ++ ns.stacks.load              0    5825    5825     0     0 automate_stacks.pl    
  ++ ns.stacks.run               0       6       6     0     0 automate_stacks.pl    
  ++ ns.burntool.load            0       8       8     0     0 automate_stacks.pl    
  ++ ns.burntool.run             0     360     360     0     0 ipp_apply_burntool.pl 
  ++ chip.imfile.load            1   48039   48038     0     0 chiptool              
  ++ chip.imfile.run             0   23524   17755  5769     0 chip_imfile.pl        
  ++ chip.advanceexp             0    7514    7514     0     0 chiptool    
  etc...       

The key thing to monitor here is the Nfail column. How many failures are acceptable, as a proportion of Njobs, depends on the task.
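Because the acceptable ratio varies by task, a quick filter can help spot outliers in a long table. This sketch assumes you have saved the status output as plain text in the column layout shown above (status.txt is a hypothetical name). The 5% default threshold is an arbitrary placeholder, not an IPP policy.

```shell
# Print tasks whose Nfail/Njobs ratio exceeds a threshold.
# Reads the status table (AV Name Nrun Njobs Ngood Nfail Ntime Command) on
# stdin; header and other non-numeric lines are skipped. The default of
# 0.05 is a placeholder: acceptable ratios differ from task to task.
flag_failures() {
  awk -v max="${1:-0.05}" '
    NF >= 8 && $4 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && $4 > 0 {
      ratio = $6 / $4
      if (ratio > max + 0)
        printf "%s %d/%d (%.1f%%)\n", $2, $6, $4, 100 * ratio
    }'
}

# Usage:
#   flag_failures 0.1 < status.txt
```

On the example table above, `flag_failures` reports only `chip.imfile.run 5769/23524 (24.5%)`.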

Stopping and starting the servers

It is occasionally necessary to stop and restart the pantasks_server instances. For example, when it is necessary to update and rebuild the code, or if pantasks itself becomes unresponsive or shows negative values in some columns of the status display (above).
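Negative values are easy to miss in a long status table. A small filter, again assuming the status output has been saved as text in the layout shown above (the function name is hypothetical):

```shell
# Print the name of every task whose counters (Nrun, Njobs, Ngood, Nfail,
# Ntime: columns 3-7 of the status table) include a negative value, one
# of the symptoms that pantasks needs restarting.
negative_counts() {
  awk 'NF >= 8 {
         for (i = 3; i <= 7; i++)
           if ($i ~ /^-[0-9]+$/) { print $2; break }
       }'
}

# Usage:
#   negative_counts < status.txt
```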

Stopping

To shut down all pantasks_server instances, use

./check_system.sh stop
./check_system.sh shutdown

Starting

Each pantasks_server uses the input file located in the directory in which it is started. It also uses the local ptolemy.rc file, which specifies the machine on which the server is to run.

To restart all the pantasks_server instances, ssh to each relevant machine (check_system.sh lists the host for each server). For each server, do the following:

ssh ipp@ippXXX
cd <serverName>
pantasks_server &
pantasks_client
pantasks: server input input
pantasks: setup

For example, for stdscience:

ssh ipp@ippc16
cd ~ipp/stdscience
pantasks_server &
pantasks_client
pantasks: server input input
pantasks: setup
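With many servers to restart, it can help to turn the check_system.sh listing into a checklist. This sketch only prints the commands (pantasks_client is interactive, so nothing is run for you), and it assumes every server directory is ~ipp/<serverName>, as for stdscience. The function name is hypothetical.

```shell
# Turn check_system.sh output into a per-server restart checklist.
# Assumes each server directory is ~ipp/<serverName>.
# Commands are printed, not executed.
restart_checklist() {
  awk '$1 == "pantasks" && $2 == "server" {
         host = $NF
         sub(/\)$/, "", host)   # strip the trailing ")"
         printf "# %s\nssh ipp@%s\ncd ~ipp/%s\npantasks_server &\npantasks_client\n", $3, host, $3
         print "#   then in pantasks: server input input, setup"
         print ""
       }'
}

# Usage:
#   ./check_system.sh | restart_checklist
```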

After this, the setup steps differ from server to server.

stdscience

Add surveys

pantasks: add.surveys

This adds the surveys defined in the 'input' file. Now show labels with

pantasks: show.labels

Working from this list, add and remove labels with add.label and del.label, eg

pantasks: del.label M31.nightlyscience
pantasks: add.label ThreePi.DM.20100401
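When the label list changes a lot, the add/remove commands can be worked out mechanically. A minimal sketch, assuming you have written the current labels (from show.labels) and the desired labels into two files with one label per line; the file names and the function name are hypothetical.

```shell
# Print the del.label / add.label commands that turn the current label
# set into the desired one. Each argument is a file with one label per line.
label_changes() {
  cur=$(mktemp) && want=$(mktemp) || return 1
  sort -u "$1" > "$cur"
  sort -u "$2" > "$want"
  comm -23 "$cur" "$want" | sed 's/^/del.label /'   # only in current
  comm -13 "$cur" "$want" | sed 's/^/add.label /'   # only in desired
  rm -f "$cur" "$want"
}

# Usage (paste the output into pantasks_client):
#   label_changes current_labels.txt wanted_labels.txt
```

Using comm on sorted lists keeps the logic to a set difference in each direction, so labels present in both files are left alone.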

Now add some hosts. Since stdscience is the most intensive server, it requires more hosts than the others. The configuration below is a good guide.

pantasks: hosts add wave1
pantasks: hosts add wave2
pantasks: hosts add wave2
pantasks: hosts add wave3
pantasks: hosts add wave3
pantasks: hosts add compute
pantasks: hosts add compute

However, 1 x wave1, 3 x wave2, 4 x wave3 and 4 x compute are probably needed for full-scale operations.
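Typing a dozen `hosts add` lines by hand is error-prone. This sketch prints them from group=count pairs so they can be pasted into pantasks_client; the function name is hypothetical, and it only generates text.

```shell
# Print one "hosts add <group>" line per requested host.
# Arguments are group=count pairs, e.g. wave2=3.
hosts_commands() {
  for pair in "$@"; do
    group=${pair%%=*}
    count=${pair#*=}
    i=0
    while [ "$i" -lt "$count" ]; do
      echo "hosts add $group"
      i=$((i + 1))
    done
  done
}

# Full-scale suggestion (1 x wave1, 3 x wave2, 4 x wave3, 4 x compute):
#   hosts_commands wave1=1 wave2=3 wave3=4 compute=4
# The smaller configuration listed first:
#   hosts_commands wave1=1 wave2=2 wave3=2 compute=2
```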

Now we are ready to run the server

pantasks: run

summitcopy, registration, replication

These are the easy ones; just run

pantasks: run

publishing

This server is specifically for publishing data to MOPS.

add labels? TODO

pstamp

The postage stamp server.

pantasks: add.hosts
pantasks: run

distribution

In terms of labels, distribution roughly mirrors stdscience

pantasks: add.labels

same labels as stdscience? TODO

Add hosts

pantasks: hosts add wave1
pantasks: hosts add wave2
pantasks: hosts add wave3
pantasks: hosts add compute

Check that processing is running smoothly in stdscience using pantasks: status. If all is okay, then

pantasks: run

cleanup

pantasks: add.labels
pantasks: hosts add wave2
pantasks: hosts add wave3
pantasks: hosts add compute
pantasks: run

detrend, addstar

TODO

Rebuilding the IPP code

The IPP build currently in use is located at

~ipp/ipp-20100211

If the code needs an update and rebuild, then:

  • stop pantasks (as above)
  • cd ~ipp/ipp-20100211
  • svn update
  • psbuild -dev -optimize
  • restart pantasks (as above)
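The middle three steps can be wrapped in a function. This is a sketch under these assumptions: you are logged in as ipp (so $HOME is ~ipp), pantasks has already been stopped, and you restart it yourself afterwards. DRY_RUN=1 prints the commands instead of running them. The function name is hypothetical.

```shell
# Update and rebuild the IPP source tree. The body runs in a subshell
# ( ... ) so the cd does not change the caller's directory.
# Stop pantasks before calling this and restart it afterwards.
rebuild_ipp() (
  src=${1:-$HOME/ipp-20100211}
  run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "$*"          # dry run: show the command only
    else
      "$@"
    fi
  }
  run cd "$src" || exit 1
  run svn update || exit 1
  run psbuild -dev -optimize
)

# Usage:
#   DRY_RUN=1 rebuild_ipp   # check what would run
#   rebuild_ipp
```

Stopping at the first failing step avoids building against a half-updated tree.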

Common problems

stdscience

Chip failures, for example

  AV Name                     Nrun   Njobs   Ngood Nfail Ntime Command               
  ++ chip.imfile.run             0   23536   17755  5781     0 chip_imfile.pl        

To investigate the failures, go to

ippmonitor->Science steps->Chip Failed Imfiles

where you can view the logs by clicking within the 'State' column.

Who to contact

Any problems or concerns should be reported to the ipp development mailing list:

ps-ipp-dev@ifa.hawaii.edu

Different members of the IPP team are responsible for different parts of the code, and the relevant person will hopefully address the issue.
