What is pantasks?
pantasks is the IPP parallel process manager for distributed computing across multiple nodes. See also the ippTools FAQ for information on some of the commands that pantasks launches.
How do I start pantasks?
Start up pantasks from a terminal window:
pantasks
Welcome to pantasks - parallel task scheduler
Load some pantasks commands:
pantasks: module pantasks.pro
or, if you have a modified pantasks.pro file:
pantasks: input /home/username/pantasks/pantasks.pro
After loading the pantasks.pro file, you can add a database easily:
pantasks: add.database mydatabase
If you don't add a database, pantasks will use the one declared in your .ipprc file.
How do I configure pantasks?
Through the .pantasksrc and .ptolemyrc files.
What are the primary pantasks commands?
Connect to a controller host:
pantasks: controller host add myhost
Check the controller host status (NOTE: This will sometimes return no info, even when there are active hosts):
pantasks: controller status
Check the processing status:
pantasks: status
For additional timing details:
pantasks: status -taskstats
Start processing:
pantasks: run
Stop adding new processes, but finish out the queue:
pantasks: stop
Stop processes right now:
pantasks: halt
Exit pantasks:
pantasks: exit
How do I get more verbose output from pantasks?
pantasks: $VERBOSE = 1
Raise the number above 1 for more and more verbosity.
Why does my process fail in pantasks but succeed on the command line?
"I copied the command directly from the pantasks error stream (or from the verbose command output) and pasted it into a separate terminal. It succeeds on the terminal, but it failed in pantasks."
You may have a config error in your home directory. When pantasks executes a command, it does so from the user's home directory on whichever remote host has been assigned to that process. If that home directory contains out-of-date config directories, they may be loaded before the system-level config directories.
Here is an example. In your .ipprc file, you may have defined the path to your configuration directory with something like this:
PATH STR /path/to/my/system/level/ippconfig
Then, in your system.config file, you can define the location of your GPC1 camera.config file with these lines:
CAMERAS METADATA
GPC1 STR gpc1/camera.config
END
When you run any script that needs to reference the GPC1 camera, it will look:
- first in the current directory for ./gpc1/camera.config
- next in the directory defined by your .ipprc PATH variable (in this case /path/to/my/system/level/ippconfig/gpc1/camera.config)
Thus, if an old gpc1/camera.config is lying around in your home directory, pantasks will find it first and hit a config error. The same command run by hand from any other directory skips the stale file and succeeds.
Solution: move any old config files out of your home directory, or update them to remove the config issue.
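The lookup order described above can be sketched as a small shell function. This is only an illustration of the search order, not IPP code: the function name resolve_config and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of IPP: mimics the lookup order described
# above (current directory first, then the .ipprc PATH directory).
# Usage: resolve_config WORKDIR CONFIG_PATH RELATIVE_FILE
# Prints the first candidate that exists; returns 1 if neither exists.
resolve_config() {
    local workdir="$1"
    local config_path="$2"
    local relative="$3"
    local candidate

    for candidate in "$workdir/$relative" "$config_path/$relative"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, running resolve_config "$HOME" /path/to/my/system/level/ippconfig gpc1/camera.config and getting a path under your home directory would suggest that a stale copy is shadowing the system-level file.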
Why are my nodes "down" or "resp"?
- First, check that you can ssh from the machine on which you are running pantasks to the node without being prompted for a password and without errors reading your shell startup file (.bashrc, .cshrc, .profile, .login and the like):
- Try "ssh myhost"
- If you're prompted for a password, then you need to set up ssh keys, and/or check your ssh configuration.
- Second, check that you can start up 'pclient' over an ssh connection. For that to work, you need to run 'psconfig ipp-2.6.1' (or whatever the IPP version is called on your system) in your startup file.
- Try "ssh myhost pclient". If it works, you can exit pclient with "exit" or "quit".
- If it doesn't work, check your shell configuration (.cshrc or .bashrc) to ensure the IPP environment is being loaded via psconfig. In .cshrc, just make sure that 'psconfig ipp-2.6.1' is executed. In .bashrc, there's a catch: bash doesn't let you use an alias in the same file that defines it, so you need to expand the alias by hand and call 'source' directly, as follows (this fix will be included in the INSTALL instructions in releases after 2.6.1):
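if [ -f /IPP/psconfig.bash ]; then
    # Define the alias for interactive use, but call 'source' directly here,
    # because the alias cannot be used in the file that defines it.
    alias psconfig='source /IPP/psconfig.bash'
    source /IPP/psconfig.bash ipp-2.6.1
else
    alias psconfig='echo psconfig not available'
fi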
- If there is a long delay between executing the "ssh" command and the shell appearing, there may be a timeout problem.
- Make sure your shell configuration is not too complex.
- One bash user had the IPP environment being set up in both .bashrc and .cshrc, so that psconfig was being called first when bash started, and then again on starting psconfig (which is a csh script). This produced a long delay, causing the command to time out in pantasks.
- Finally, this can also be triggered by the readline bug. The fix is described below under "detrend_resid_imfile.pl triggers the error message 'Unknown option: --erbose'".
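The manual checks above could be scripted along the following lines. This is a rough sketch, not an IPP tool: check_node is a hypothetical name, and it assumes that a non-interactive remote shell finding pclient on its PATH is a fair proxy for "psconfig ran in the startup file". BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of IPP: automates the manual checks above.
# Usage: check_node HOST
# Returns 0 if HOST passes, 1 if the ssh login fails, 2 if pclient is not found.
check_node() {
    local host="$1"
    local start end

    start=$(date +%s)
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=10 gives up on an unreachable host after 10 seconds.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true; then
        echo "$host: ssh login failed (check ssh keys and ssh configuration)"
        return 1
    fi
    end=$(date +%s)

    # Assumption: if a non-interactive remote shell can find pclient on its
    # PATH, the startup file has loaded the IPP environment via psconfig.
    if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" \
            "sh -c 'command -v pclient'" >/dev/null; then
        echo "$host: pclient not found (is psconfig run in your startup file?)"
        return 2
    fi

    echo "$host: ok, login took $((end - start)) s"
}
```

Calling check_node myhost for each node prints one line per host. Any errors from your shell startup file appear on stderr, and a large login time points to the timeout problem described above.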
On startup I get "can't find config file. some functions will be unavailable." Which config file is missing?
You're missing the .ptolemyrc file. Copy dvo.site from your site-level config directory into your home dir and rename it as .ptolemyrc.
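As a minimal sketch, the copy can be done with a small shell function that refuses to overwrite an existing file. The function name install_ptolemyrc is hypothetical, and the location of your site-level config directory is something you must supply yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of IPP: copies dvo.site to ~/.ptolemyrc.
# Usage: install_ptolemyrc SITE_CONFIG_DIR [HOME_DIR]
# Returns 1 if dvo.site is missing, 2 if .ptolemyrc already exists.
install_ptolemyrc() {
    local site_dir="$1"
    local home_dir="${2:-$HOME}"
    local src="$site_dir/dvo.site"
    local dest="$home_dir/.ptolemyrc"

    if [ ! -f "$src" ]; then
        echo "no dvo.site found in $site_dir" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "$dest already exists, leaving it untouched" >&2
        return 2
    fi
    cp "$src" "$dest"
}
```

Call it with the path to your site-level config directory as the only argument. The optional second argument exists mainly so the function can be tried out against a scratch directory.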
How do I re-run files that have failed at some stage after fixing the bug that caused the failure?
Most of the ippTools binaries (which provide the database interaction) have some version of a -revert command. For example, if a camera stage failed, try:
camtool -revertprocessedexp -cam_id 12345
You can also revert based on an error code using the -code argument.
Process X succeeded without fault, but it is not moving on to process Y. How do I force it to proceed?
First, verify that none of the sub-stages failed. For example, if a chipRun has state 'new' and you think it should be 'full' and moving on to camRun, check that all of the contributing chipProcessedImfile rows in your database are completed with fault=0. Then check that the next process in line was not already initiated. For example, do you have a camRun with the chip_id that you expect?
If the process really just stopped without raising a fault and without initiating the next stage, then you can try manually setting its state to 'full' and using the appropriate ippTool to initiate the next process in the sequence. For a stalled chipRun with chip_id 2591, this would be done like this:
chiptool -dbname myDatabase -updaterun -label 'myLabel' -chip_id 2591 -state full
camtool -dbname myDatabase -definebyquery -chip_id 2591 -set_label 'mylabel'
detrend_resid_imfile.pl triggers the error message 'Unknown option: --erbose'
The dropped character in a long line is a classic symptom of a bad 'readline' library. To fix it, run:
% pschecklibs -build -force libreadline
There should be no need to rebuild the IPP, since it uses dynamic libraries.
