This article describes psched, the Pan-STARRS IPP task scheduler.
The purpose of psched is to manage the automatic construction and execution of inter-related (often repetitive) operations. Psched uses a set of rules to define UNIX commands, and their corresponding command-line arguments, to be performed on some regular, repeated basis. The utility of psched is that it can easily define an analysis system which is completely state-based, as opposed to an event-driven system.
Consider, for example, a telescope which obtains a collection of images over the course of a night. Every minute or two, it takes an image and writes the image to some disk. An event-driven analysis system would involve having the telescope initiate a process at the end of the exposure. This process would perform an analysis, write some output, then trigger another process. This type of operation works very well for a simple setup with reliable hardware. Such a system becomes more difficult to maintain when hardware failures occur or when multiple systems need to interact with each other. When failures occur, the triggering information (the events) is easily lost, so some mechanism is needed to detect these failures and either re-send the trigger or send an alternative failure-mode trigger. Or, if two systems need to interact, one system must block while waiting for results from the other. Stopping and restarting such an analysis system is very delicate since the appropriate triggers must be set up somehow, e.g. by noticing which images have not succeeded and restarting them at the appropriate stage. All of these methods of handling complexity and failures are essentially state-based rules. Psched allows the easy definition of a totally state-based analysis system.
In a state-based system, some mechanism examines the state of the system and decides which actions to perform based on the current state. In the illustration above, the mechanism could examine the images available (either by examining the disk or by examining the state of a data table) and decide to perform an operation based on what images are available. This makes it very easy to handle complexity and errors. If an analysis fails, the state either is not successfully updated or the error state is recorded, both situations being easy to detect and easy to handle. Restarting the system simply involves starting the state-monitoring mechanism. Combining results from multiple input sources simply involves watching for the multiple inputs to be available. Psched provides a mechanism to define state monitors, and to define the actions which are performed when those states occur. Psched actions consist of initiating UNIX commands, where the arguments of those commands may depend on the results of the state tests.
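The state-based idea described above can be illustrated with a small Python sketch. This is not psched code; the function name plan_actions and the three input collections are hypothetical, chosen only to show that the decision depends on the current state alone and not on any remembered trigger.

```python
def plan_actions(available, processed, failed):
    """Decide what to do from the current state alone (hypothetical helper).

    available: image names currently on disk (or in a data table)
    processed: image names whose analysis finished successfully
    failed:    image names whose analysis recorded an error state
    """
    available, processed, failed = set(available), set(processed), set(failed)
    # Images never attempted: present, but neither done nor marked failed.
    new = sorted(available - processed - failed)
    # Images with a recorded error state are easy to detect and retry.
    retry = sorted(available & failed)
    return {"process": new, "retry": retry}


if __name__ == "__main__":
    print(plan_actions(["a.fits", "b.fits", "c.fits"], ["a.fits"], ["b.fits"]))
```

Because the function holds no memory between calls, restarting the monitor after a crash simply means calling it again: the same state yields the same plan.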
The primary function of psched is to repeatedly perform tasks, and execute jobs on the basis of those tasks. A task consists of a set of rules which describe system state tests to perform on a regular time scale. Based on the results of those state tests, the task will then choose whether or not to construct a job. The task also defines actions to perform upon the completion of a job, based upon the output and exit status of the job. A task also defines the repeat period for these tests. It may optionally define valid or invalid time ranges (e.g., Mon-Fri or 10:00-17:00). The task may also specify that the job be run locally (i.e., in the background on the same computer as psched) or remotely by the parallel process controller (pcontrol). A job may even be restricted to a specific computer managed by pcontrol. An example of a simple task is given below.
task datalist
  command ls /data/foo
  periods -exec 5.0
  periods -timeout 50.0
  periods -poll 1.0
  task.exit 0
    queueprint stdout
    queuedelete stdout
  end
  task.exit 1
    queuepush failure "task failed"
  end
end
This task does not perform any system state tests; it simply constructs a new job every 5.0 seconds. The job in this case is always the same: ls /data/foo. When the job finishes, if the job exit status is 0 (normal UNIX success status), the resulting output is printed to the screen. If the job returns an exit status of 1 (a failure), the failure queue receives a single entry. Although they are not defined in this case, it is also possible to specify the action to be taken if the job crashes (does not exit normally) or if it times out (runs beyond the specified timeout period). A slightly more complex task which performs a state test and constructs a command based on that test is shown below.
task datalist
  periods -exec 5.0
  periods -timeout 50.0
  periods -poll 1.0
  task.exec
    $file = `next.file`
    if ($file == "none")
      break
    end
    command cp /data/foo/$file /data/bar
  end
  task.exit 0
    queueprint stdout
    queuedelete stdout
    queuepush copied $file
  end
  task.exit 1
    queuepush failure $file
  end
end
The task.exec macro is executed by psched every 5.0
seconds. This macro executes a (hypothetical user-defined) UNIX
command (next.file) which examines the system state, returning
either a filename or the word "none". If the result of this test is
"none", the task does nothing: no job is constructed. Otherwise, a
job is constructed using the name of the file returned by the state
test. Successful jobs have the filename added to the 'copied'
queue, while failed jobs add the filename to the 'failure' queue.
It is possible to interact directly with the parallel processor to examine the current status, halt the parallel processor, etc. Commands to the parallel processor are defined under the controller command. The following controller commands are available:
The time range may be given as a range of absolute dates as follows:
trange YYYY/MM/DD,HH:MM:SS YYYY/MM/DD,HH:MM:SS
where the two dates specify the start and end of the time range. In either of these date representations, the least-significant elements of the date and time may be dropped, defaulting to 00 (in the case of hours, minutes, and seconds) or 01 (in the case of days and months). Rather than specifying an end date, it is also valid to specify a time interval from the starting date. The time interval is specified as a number followed by a unit indicated by a single letter: d (days), h (hours), m (minutes), s (seconds).
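The defaulting rules for absolute dates and the interval form can be made concrete with a short Python sketch. It is an illustration only and not the psched parser: the function names parse_date, parse_interval, and parse_range are hypothetical, and the rule used to tell an interval from an end date (a trailing unit letter) is an assumption.

```python
from datetime import datetime, timedelta

# Single-letter interval units named in the text, mapped to timedelta keywords.
UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_date(text):
    """Parse YYYY/MM/DD,HH:MM:SS, filling dropped trailing elements."""
    date_part, _, time_part = text.partition(",")
    ymd = [int(x) for x in date_part.split("/")]
    hms = [int(x) for x in time_part.split(":")] if time_part else []
    ymd += [1] * (3 - len(ymd))   # missing month or day defaults to 01
    hms += [0] * (3 - len(hms))   # missing hour, minute, second default to 00
    return datetime(*ymd, *hms)


def parse_interval(text):
    """Parse a number followed by one unit letter, e.g. '6h' or '2.5d'."""
    return timedelta(**{UNITS[text[-1]]: float(text[:-1])})


def parse_range(start_text, end_text):
    """Return (start, end); end_text is an end date or an interval.

    Assumption: a trailing unit letter marks an interval from the start.
    """
    start = parse_date(start_text)
    if end_text[-1] in UNITS:
        return start, start + parse_interval(end_text)
    return start, parse_date(end_text)


if __name__ == "__main__":
    print(parse_range("2005/01/01", "2005/12/31"))
    print(parse_range("2005/01/01,18", "6h"))
```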
The time range may also be specified as a repeated period of time, either as a time of day or a day and time of week. In the first case, the time range is specified as follows:
trange HH:MM:SS HH:MM:SS
where again the least-significant elements may be dropped and default to 00. This type of restriction defines a time range which is valid every day. The alternative is to specify a time range within the week, in the following form:
trange DAY@HH:MM:SS DAY@HH:MM:SS
where the value of DAY may take on any of the three-letter day-of-week names (Sun, Mon, Tue, etc). This restriction specifies a start and end time within a week which is evaluated for each week.
Below are several examples of valid time range restrictions:
trange 2005/01/01 2005/12/31 (only run during 2005!)
trange 18:00 00:00 (only run from 6pm until midnight)
trange 00:00 06:00 (only run from midnight until 6am)
trange Mon@08:00 Fri@17:00 (only run between Mon morning and Fri afternoon)
trange -exclude 12:00 13:00 (skip 1 hour from noon)
Note that the current definition of trange does not include time zone information. This means that all times are relative to UT. This should be addressed by adding a timezone environment variable to psched and by allowing the trange to define a timezone offset.
It is also possible to restrict the total number of jobs which are spawned for a given task. This is done with the nmax command, which is given as part of the task definition. Once a task has constructed nmax jobs, it stops task evaluation. It is possible to redefine the value of nmax at any time by redefining the task. Any time the task is redefined, the new values for any task concept will override the existing values for the task concept.
It is always possible for the interprocess communication to be performed externally: all jobs may simply write results to an external data source which is queried as part of the task evaluation. Psched may interact with UNIX programs using Opihi system interaction functions. These interaction methods include backticks for setting Opihi variables:
$variable = `UNIX Command`
The exec command (which executes a UNIX command) and the backticks both receive the UNIX command exit status, setting the variable $STATUS. It is also possible to set a variable list to the output of a UNIX command:
list var -x "UNIX Command"
In this last case, the values $var:0 - $var:N-1 are set to the value of the stdout lines from the UNIX command, and the value $var:n is set to the number of output lines.
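The naming scheme for list variables can be shown with a few lines of Python. This is only a model of the description above, not Opihi code; the function list_from_output is hypothetical and takes the captured stdout text directly rather than running a command.

```python
def list_from_output(name, stdout_text):
    """Model the variables set by a list command (hypothetical helper).

    Each stdout line becomes name:0 .. name:N-1, and name:n holds the
    number of output lines.
    """
    lines = stdout_text.splitlines()
    values = {f"{name}:{i}": line for i, line in enumerate(lines)}
    values[f"{name}:n"] = len(lines)
    return values


if __name__ == "__main__":
    print(list_from_output("var", "a.fits\nb.fits\n"))
```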
Fine-grained control over the job exit status is available with the task.exit macro command. This allows a task to define an exit macro which is performed for different exit status conditions. The argument to the task.exit command is the exit status value which triggers the macro. This may consist of any valid numeric exit status value (0-255). It may also have the value crash, in which case the macro is executed if the program exited as a result of a signal (e.g., a segmentation fault). Finally, it may have the value default, in which case the macro is run if no other macro describes the exit status.
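The selection among exit macros can be sketched as a simple lookup. The Python below is an illustration, not psched internals: the function select_exit_macro and the dictionary layout are hypothetical, and it assumes that the default macro also covers a crash when no crash macro is defined, which the text does not state explicitly.

```python
def select_exit_macro(macros, exit_status=None, crashed=False):
    """Pick the macro for a finished job (hypothetical helper).

    macros maps a numeric exit status (0-255), "crash", or "default"
    to a macro. Returns the matching macro, or None if nothing applies.
    """
    # A job killed by a signal has no ordinary exit status to match.
    key = "crash" if crashed else exit_status
    if key in macros:
        return macros[key]
    # Assumption: "default" handles any condition with no specific macro.
    return macros.get("default")


if __name__ == "__main__":
    macros = {0: "print stdout", 1: "push failure", "default": "log unknown"}
    print(select_exit_macro(macros, exit_status=2))
```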
Jobs may transmit their results back to psched for further evaluation through the standard output and standard error streams. Whenever a job exits, the complete stdout and stderr streams from the job are pushed onto the psched queues stdout and stderr. The job exit macros may then parse these queues, moving the results into other psched / Opihi data containers (queues, variables, vectors, whatever is appropriate). Note that currently, the output data is simply pushed onto these output queues. It is currently the responsibility of the psched programmer to use or dispose of the data in these queues. This may change in the future: the queues may be flushed for each job completion.
It is also possible to kill or delete individual jobs by hand with the commands kill (jobID) or delete (jobID).
controller -- controller commands
task       -- define a schedulable task
host       -- define host machine for a task
nmax       -- define maximum number of jobs for a task
trange     -- define valid/invalid time periods for a task
task.exit  -- define exit macros for a task
task.exec  -- define pre-exec macro for a task
command    -- define executed command for a task
periods    -- define time scales for a task
run        -- run the scheduler
stop       -- stop the scheduler
pulse      -- set the scheduler update period
status     -- get system status
kill       -- kill job
delete     -- delete job
verbose    -- set/toggle verbose mode