
Version 3 (modified by welling, 16 years ago)

Quick-Start Guide for update_mirror.py

This page describes 'update_mirror.py', an alternative datastore download mechanism (in contrast to the official IPP PanTasks-based mechanism). It covers only the quick start; the full instructions are on the wiki page "An Alternate Data Distribution Client".

To get started using update_mirror.py, take the following steps:

Save update_mirror.py and parse_config.py in a directory on your PATH, and make sure update_mirror.py is executable (chmod +x update_mirror.py). I keep my copies in ~/bin, which I added to my PATH long ago.

Edit update_mirror.py to set two things:

Change mirrorRootDir to be the path to the directory where you want to keep the downloaded tree of data files.
Change desiredProductList to be a list of the data distributions you want. For example, use ps1-md for medium deep image files, or ps1-md-cat if you only want medium deep catalogs.
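For reference, the two edited settings might look something like this. This is a sketch that assumes both are plain module-level Python assignments in update_mirror.py; the directory path is a made-up example, and the product names are the two mentioned above.

```python
# Sketch of the two edits in update_mirror.py. Assumption: both settings
# are ordinary module-level assignments. The path is a hypothetical example.

# Directory under which the downloaded tree of data files is kept.
mirrorRootDir = "/data/ps1_mirror"

# Data distributions to mirror: ps1-md for medium deep image files,
# ps1-md-cat for medium deep catalogs.
desiredProductList = ["ps1-md", "ps1-md-cat"]
```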

Run it. A good format for the command is:

nice nohup update_mirror.py >& update_mirror_stdout.log &

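That runs update_mirror.py at reduced priority (nice) and immune to hangups (nohup), so you can log out and it will keep going. The >& redirection, which sends both standard output and standard error to the file, is csh/tcsh syntax that bash also accepts; in a plain POSIX sh, use '> update_mirror_stdout.log 2>&1' instead. If you like, you can watch its progress with: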

tail -f update_mirror.log

Note that you are watching update_mirror.log, not update_mirror_stdout.log; the latter is only there to catch unexpected error messages.

The first time you run update_mirror.py with this command, it will download all of the distributions you specified. If you run it again the next day, it will download only the files added to those distributions since the previous run.
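One simple way to get this incremental behaviour is to compare the remote file listing against what already exists under mirrorRootDir. The sketch below is hypothetical and only illustrates the idea; update_mirror.py may track what it has already fetched quite differently. The function names and example paths are made up.

```python
import os


def local_paths(mirror_dir):
    """Collect the paths (relative to mirror_dir) of every file already mirrored."""
    found = set()
    # os.walk yields nothing for a directory that does not exist yet,
    # so a first run simply starts from an empty set.
    for dirpath, _dirnames, filenames in os.walk(mirror_dir):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), mirror_dir))
    return found


def files_to_fetch(remote_paths, have):
    """Return the remote paths not present locally, preserving the remote order."""
    return [path for path in remote_paths if path not in have]


# Hypothetical listing: only the first and last files still need downloading.
print(files_to_fetch(["MD04/a.tgz", "MD04/a.txt", "MD06/b.tgz"], {"MD04/a.txt"}))
# ['MD04/a.tgz', 'MD06/b.tgz']
```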

You may want to be more selective about what you download, however. update_mirror.py has command-line options that let you do that. Some examples are below; to see all of the options, run:

update_mirror.py --help

To select datasets, you need to understand the keys that specify them. The easiest way to see these is to open the datastore directory in a web browser:

http://datastore.ipp.ifa.hawaii.edu/ps1-md/

You'll see a table. Each row in the table is a FileSet, and the column titles are the keywords you can use to select filesets. The update_mirror.py command-line options select rows using regular expressions. Here are some example commands. Selecting a particular skycell is a special case; see below.
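The examples below append .* to partial names such as MD04. That is consistent with each pattern having to match the whole table cell rather than just part of it. The sketch below illustrates that behaviour with Python's re.fullmatch; it is an assumption about how update_mirror.py applies the patterns, not a copy of its code, and the cell values are made up.

```python
import re


def cell_matches(pattern, cell):
    """True only if the regular expression matches the entire table cell."""
    return re.fullmatch(pattern, cell) is not None


# Hypothetical datagroup cells from the fileset table.
print(cell_matches("MD04", "MD04.20100115"))    # False: 'MD04' covers only part of the cell
print(cell_matches("MD04.*", "MD04.20100115"))  # True
print(cell_matches("MD04.*", "MD06.20100115"))  # False
```

Keep in mind that an unescaped . matches any character, so a pattern like g.00000 is slightly looser than it looks; that is harmless for the examples here.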

- To get only stacks, do:

update_mirror.py --select_stage stack

- To select only MD04, do:

update_mirror.py --select_datagroup 'MD04.*'

- You can combine terms. For example, to select only stack-stack diffs done with the g.00000 filter for MD06, use:

update_mirror.py --select_datagroup 'MD06.*' --select_filter g.00000 --select_stage SSdiff
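When several options are given, a row has to satisfy all of them, which is why the combined command narrows the selection. The sketch below shows that logical AND over made-up table rows. It assumes each --select_X option corresponds to a column named X and that patterns must match the whole cell; the rows and the select_rows helper are hypothetical.

```python
import re

# Hypothetical rows from the ps1-md fileset table, keyed by column title.
ROWS = [
    {"stage": "SSdiff", "datagroup": "MD06.20100115", "filter": "g.00000"},
    {"stage": "SSdiff", "datagroup": "MD06.20100115", "filter": "r.00000"},
    {"stage": "stack", "datagroup": "MD06.20100115", "filter": "g.00000"},
    {"stage": "SSdiff", "datagroup": "MD04.20100115", "filter": "g.00000"},
]


def select_rows(rows, **patterns):
    """Keep rows where every named column fully matches its pattern (logical AND)."""
    return [
        row for row in rows
        if all(re.fullmatch(pat, row[col]) for col, pat in patterns.items())
    ]


# Mirrors: --select_datagroup 'MD06.*' --select_filter g.00000 --select_stage SSdiff
picked = select_rows(ROWS, datagroup="MD06.*", filter="g.00000", stage="SSdiff")
print(len(picked))  # 1
```

With no options at all, every row is selected, which matches the default behaviour of downloading whole distributions.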

Caution: the conventions for what the words in each column of the datastore table mean seem to change occasionally, so it's a good idea to check the table in a browser to make sure you're selecting what you think you're selecting.

Individual skycells generally can't be identified at the level of this table. To see how to select them, click through one of the fileset IDs for the stack or warp stage. You'll see a list of the actual files that get downloaded for that fileset. At least one of them will be a .tgz file, with a name like:

MD04.skycell.068.stk.52993.skycell.068.tgz

The --select_tarball option to update_mirror.py lets you specify a regular expression for the tarball name, so that option can be used to select specific skycells. Everything in the fileset that is *not* a tarball still gets downloaded, but those files are small compared to the .tgz file.

For example, to select only warps from MD04 for skycell 040, do:

update_mirror.py --select_stage warp --select_datagroup 'MD04.*' --select_tarball '.*skycell\.040\.tgz'
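The per-file rule described above (non-tarballs always downloaded, tarballs only when their name matches) can be sketched as follows. This is a hypothetical illustration, not update_mirror.py's code: it assumes tarballs are recognised by the .tgz suffix and that the pattern must match the whole file name. The .tgz names follow the example name shown earlier; index.txt is made up.

```python
import re


def files_to_download(fileset_files, tarball_pattern):
    """Keep every non-tarball; keep a .tgz only if its name fully matches the pattern."""
    keep = []
    for name in fileset_files:
        if not name.endswith(".tgz") or re.fullmatch(tarball_pattern, name):
            keep.append(name)
    return keep


# Hypothetical file names modelled on the example tarball name above.
files = [
    "MD04.skycell.040.stk.52993.skycell.040.tgz",
    "MD04.skycell.068.stk.52993.skycell.068.tgz",
    "index.txt",
]
print(files_to_download(files, r".*skycell\.040\.tgz"))
# ['MD04.skycell.040.stk.52993.skycell.040.tgz', 'index.txt']
```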

This can get tedious if you want several skycells, so here is a useful regular-expression trick. If you want skycells 038, 053, 061 and 074 from that stage and group, try:

update_mirror.py --select_stage warp --select_datagroup 'MD04.*' --select_tarball '.*skycell\.((038)|(053)|(061)|(074))\.tgz'
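If you have a long list of skycells, you can generate the alternation instead of typing it. The helper below is a hypothetical convenience, not part of update_mirror.py; it builds exactly the pattern used in the command above from a list of skycell numbers.

```python
import re


def skycell_tarball_regex(skycells):
    """Build a --select_tarball pattern matching any of the given skycell numbers."""
    # re.escape is a safeguard; plain digit strings pass through unchanged.
    alternatives = "|".join("(" + re.escape(cell) + ")" for cell in skycells)
    return r".*skycell\.(" + alternatives + r")\.tgz"


pattern = skycell_tarball_regex(["038", "053", "061", "074"])
print(pattern)  # .*skycell\.((038)|(053)|(061)|(074))\.tgz
```

Paste the printed pattern inside single quotes as the --select_tarball argument, so the shell leaves the backslashes and parentheses alone.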
