This page outlines the procedures and responsibilities for the person currently acting as 'IPP Processing Czar'. In a nutshell, these include:
- monitoring the various pantasks servers running on the production cluster using pantasks_client
- alerting the IPP group to any notable errors or failures
- keeping an eye on production cluster load using 'ganglia'
- adding and removing labels based on the current set of processing priorities, outlined here
Setup
You will need to have ipp user access on the production cluster. For convenience, ask someone who already has access (anyone on the IPP team) to add your ssh public key to ~ipp/.ssh/authorized_keys.
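For whoever is adding the key, a minimal sketch of one way to append it without creating duplicates. The function name add_key and the key file name in the usage line are hypothetical, not part of the IPP tools.

```shell
# Append a public key to an authorized_keys file unless it is already there.
# Usage: add_key <public-key-file> <authorized_keys-file>
# (add_key is a hypothetical helper, not an IPP command.)
add_key() {
    key_file=$1
    auth_file=$2
    # -q quiet, -x whole-line match, -F fixed string, -f read pattern from file
    if grep -qxFf "$key_file" "$auth_file" 2>/dev/null; then
        echo "key already present"
    else
        cat "$key_file" >> "$auth_file"
        echo "key added"
    fi
}
```

Run as the ipp user, this would look like: add_key newczar.pub ~ipp/.ssh/authorized_keys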
Resources
Mostly, you will be logged into a production cluster machine, using the pantasks_client program to monitor operations; however, there are other useful resources.
- Ganglia - For monitoring load on production cluster machines
- ippmonitor - a window onto the gpc1 database, particularly the NightSummary page
- processing priorities - the current list of priorities, for use when setting up labels in stdscience
Getting started and checking processing status
Log in as ipp user on any production cluster machine and run
./check_system.sh
This lists the various pantasks servers currently running on the cluster, eg
pantasks server addstar is running (host: ipp004)
pantasks server cleanup is running (host: ippc07)
pantasks server detrend is NOT running (host: ippc06)
pantasks server distribution is running (host: ippc15)
pantasks server pstamp is running (host: ippdb02)
pantasks server publishing is running (host: ippc08)
pantasks server registration is running (host: ippc02)
pantasks server replication is running (host: ippdb00)
pantasks server stdscience is running (host: ippc16)
pantasks server summitcopy is running (host: ippc01)
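To pick out servers that are down, the listing can be filtered. This is a sketch that assumes check_system.sh prints one server per line in exactly the format shown above; the function name list_down_servers is hypothetical.

```shell
# Print "<server> <host>" for every server reported as NOT running.
# Reads check_system.sh-style output on stdin, one server per line (assumed).
list_down_servers() {
    awk '/^pantasks server .* is NOT running/ {
        host = $NF               # last field looks like "ippc06)"
        gsub(/[()]/, "", host)   # strip the parentheses
        print $3, host           # third field is the server name
    }'
}
```

Usage: ./check_system.sh | list_down_servers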
Move to the directory corresponding to the server of interest in the list above, eg stdscience/, then run
pantasks_client
Within pantasks, to check processing status, do
pantasks: status
To check the current labels being processed:
pantasks: show.labels
Stopping and starting the servers
Stopping
To shut down all pantasks_server instances, use
check_system.sh stop
check_system.sh shutdown
Starting
Each pantasks_server uses the input file located in the directory where it is instantiated. It also uses the local ptolemy.rc file (this file details the machine where the server is to run).
To restart all the pantasks_server instances, you need to ssh to each relevant machine; the hosts are listed by check_system.sh. For each server, do the following:
ssh ipp@ippXXX
cd <serverName>
pantasks_server &
pantasks_client
pantasks: server input input
pantasks: setup
For example, for stdscience:
ssh ippc16
cd ~stdscience
pantasks_server &
pantasks_client
pantasks: server input input
pantasks: setup
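When restarting everything, it can help to turn the check_system.sh listing into a checklist of where to log in. A sketch that only prints text and runs nothing; it assumes one server per line in the format shown earlier and that each server directory is named after the server. The function name print_restart_plan is hypothetical.

```shell
# For each server in check_system.sh-style output on stdin, print the host
# to ssh to and the directory to start pantasks_server from.
# Nothing is executed; this is only a checklist.
print_restart_plan() {
    awk '/^pantasks server / {
        host = $NF
        gsub(/[()]/, "", host)   # "ippc16)" -> "ippc16"
        printf "%s: ssh ipp@%s, then cd %s\n", $3, host, $3
    }'
}
```

Usage: ./check_system.sh | print_restart_plan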
Each server then needs to be handled differently for setup.
stdscience
Add surveys
pantasks: add.surveys
This adds the surveys defined in the 'input' file. Now show labels with
pantasks: show.labels
Working from this list, add and remove labels with del.label and add.label, eg
pantasks: del.label M31.nightlyscience
pantasks: add.label ThreePi.DM.20100401
Now add some hosts
pantasks: hosts add wave1
pantasks: hosts add wave2
pantasks: hosts add wave2
pantasks: hosts add wave3
pantasks: hosts add wave3
pantasks: hosts add compute
Now we are ready to run the server
pantasks: run
summitcopy, registration, replication
These are the easy ones; just
pantasks: run
publishing
This is for MOPS
add labels? TODO
pstamp
The postage stamp server.
pantasks: add.hosts
pantasks: run
distribution
distribution roughly mirrors stdscience
pantasks: add.labels
same labels as stdscience? TODO
Add hosts
pantasks: hosts add wave1
pantasks: hosts add wave2
pantasks: hosts add wave3
pantasks: hosts add compute
Check that processing is running smoothly in stdscience by running status at its pantasks: prompt. If all is okay, then
pantasks: run
cleanup
pantasks: add.labels
pantasks: hosts add wave2
pantasks: hosts add wave3
pantasks: hosts add compute
pantasks: run
detrend, addstar
TODO
Rebuilding the IPP code
The IPP code currently in use is in
~ipp/ipp-20100211
If the code needs an update and rebuild, then:
- stop pantasks (as above)
- cd ~ipp/ipp-20100211
- svn update
- psbuild -optimize
- restart pantasks (as above)
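The update and build commands can be chained so that a failed svn update stops the build from running. A minimal sketch; the helper names run_step and rebuild_ipp and the DRY_RUN variable are hypothetical, and stopping and restarting pantasks still has to be done by hand as described above.

```shell
# Run one command, or just print it when DRY_RUN=1 (hypothetical switch).
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Update and rebuild the tree given as $1, stopping at the first failure.
rebuild_ipp() {
    # Subshell so the caller's working directory is left unchanged.
    (
        run_step cd "$1" &&
        run_step svn update &&
        run_step psbuild -optimize
    )
}
```

Usage: rebuild_ipp ~ipp/ipp-20100211 (set DRY_RUN=1 first to see the commands without running them).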
Who to contact
Any problems should be reported to the ipp development mailing list: ps-ipp-dev@…
