To get started using update-mirror, take the following steps:

-Save update_mirror.py and parse_config.py somewhere in your PATH. I keep
 my copy in ~/bin, since I added my ~/bin directory to my PATH long ago.

-Edit update_mirror.py to set two things:

    1) Change mirrorRootDir to be the path to the directory where
       you want to keep the downloaded tree of data files.
    2) Change desiredProductList to be a list of the data distributions
       you want. For example, if you want medium deep image files you
       want ps1-md; if you only want medium deep catalogs you want ps1-md-cat.
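
As a rough sketch, the two edited settings might end up looking like the
lines below. This assumes both are plain Python assignments in
update_mirror.py and that desiredProductList is a list of strings; the
directory path is a made-up placeholder, so check the script itself for
the exact form.

```python
# Hypothetical values. The exact form of these assignments in
# update_mirror.py is an assumption; only the two variable names and the
# product names come from the instructions above.
mirrorRootDir = "/data/ps1_mirror"  # placeholder path for the downloaded tree
desiredProductList = ["ps1-md", "ps1-md-cat"]  # medium deep images and catalogs
```

If you only want the catalogs, the list would hold just "ps1-md-cat".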

-Run it. A good format for the command is:

    nice nohup update_mirror.py >& update_mirror_stdout.log &

 That will run update_mirror.py in such a way that you can log out and
 it will keep going. If you like, you can watch its progress with:

    tail -f update_mirror.log

 Note that you are not watching update_mirror_stdout.log; that file is
 only there to catch unexpected error messages.

The first time you run update_mirror.py with this command, it will
download all of the distributions you specified. If you run it again
the next day it will download only those files added to the
distributions since the last time it was run.

You may want to be more selective about what you download,
however. update_mirror.py has command line options that let you
do that. Some examples are below, but you can list all the options with:

    update_mirror.py --help

To understand how to select datasets you need to understand the keys
that specify datasets. The easiest way to do that is to look at the
datastore directory with a web browser. Open

    http://datastore.ipp.ifa.hawaii.edu/ps1-md/

and you'll see a table. Each row in the table is a
fileset, and the column titles provide keywords with which you can
select filesets. update_mirror's command options let you select rows
using regular expressions. Here are some example commands.
Selecting a particular skycell is a special case; see below.

-To get only stacks, do:

    update_mirror.py --select_stage stack

-To select only MD04, do:

    update_mirror.py --select_datagroup 'MD04.*'

-You can combine terms. For example, to select only stack-stack diffs
 done with the g.00000 filter for MD06, use:

    update_mirror.py --select_datagroup 'MD06.*' --select_filter g.00000 \
        --select_stage SSdiff

Caution: the rules specifying what words mean what in the different
columns of the datastore table seem to change occasionally, so it's
useful to look at the table with a browser to make sure you're
selecting what you think you are selecting.
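
If you want to try out a set of patterns before starting a long download,
a small Python sketch like the one below can help. It is not taken from
update_mirror.py: the rows are made up, the select_rows helper is
hypothetical, and it assumes a pattern has to match the whole cell value
(re.fullmatch). The real tool may anchor its patterns differently.

```python
import re

# Made-up rows standing in for the datastore table. The column names mirror
# the --select_* options; the datagroup values are illustrative only.
rows = [
    {"datagroup": "MD04.example", "stage": "stack", "filter": "g.00000"},
    {"datagroup": "MD06.example", "stage": "SSdiff", "filter": "g.00000"},
    {"datagroup": "MD06.example", "stage": "SSdiff", "filter": "r.00000"},
    {"datagroup": "MD06.example", "stage": "warp", "filter": "g.00000"},
]


def select_rows(rows, **patterns):
    """Keep rows whose named columns each match their regular expression.

    Assumption: a pattern must match the whole cell value (re.fullmatch).
    """
    return [
        row
        for row in rows
        if all(re.fullmatch(pat, row[col]) for col, pat in patterns.items())
    ]


# Same terms as the combined command line example above.
selected = select_rows(rows, datagroup="MD06.*", filter="g.00000", stage="SSdiff")
for row in selected:
    print(row)
```

Every term has to match for a row to be kept, which is how the combined
example above narrows the selection to a single kind of fileset.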

Individual skycells can't generally be identified from the level of
this table. To understand how to select them, click down through one
of the filesetIDs for the stack or warp stage. You'll see a list of
the actual files that get downloaded for that fileset. At least one
of them will be a .tgz file, with a name like:

    MD04.skycell.068.stk.52993.skycell.068.tgz

The --select_tarball option to update_mirror lets you specify a
regular expression for the tarball name, so that option can be used to
select specific skycells. Everything in the fileset which is *not* a
tarball is downloaded regardless of this option, but that data is small
compared to the .tgz file.
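
The tarball rule can be sketched in a few lines of Python. This is only an
illustration of the behaviour described above, not code from
update_mirror.py: the files_to_download helper is hypothetical, tarballs
are assumed to be recognisable by their .tgz extension, the pattern is
assumed to match the whole file name, and every file name except the first
is made up.

```python
import re


def files_to_download(filenames, tarball_pattern):
    """Apply the tarball rule described above to one fileset's file list."""
    chosen = []
    for name in filenames:
        if not name.endswith(".tgz"):
            chosen.append(name)  # non-tarballs are always kept
        elif re.fullmatch(tarball_pattern, name):
            chosen.append(name)  # tarballs are kept only if the pattern matches
    return chosen


fileset = [
    "MD04.skycell.068.stk.52993.skycell.068.tgz",  # name taken from the text
    "MD04.skycell.040.stk.52993.skycell.040.tgz",  # hypothetical second tarball
    "index.txt",                                   # hypothetical small file
]

print(files_to_download(fileset, r".*skycell\.068\.tgz"))
```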

For example, to select only warps from MD04 for skycell 040, do:

    update_mirror.py --select_stage warp --select_datagroup 'MD04.*' \
        --select_tarball '.*skycell\.040\.tgz'

This can get tedious if you actually want several skycells, so here is
a useful trick with regular expressions. If you want skycells 038,
053, 061 and 074 from that stage and group, try:

    update_mirror.py --select_stage warp --select_datagroup 'MD04.*' \
        --select_tarball '.*skycell\.((038)|(053)|(061)|(074))\.tgz'
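
For a longer list of skycells, typing the alternation by hand is easy to
get wrong. A small helper can build the same kind of pattern for you. The
function name is hypothetical, and it only produces the string to pass to
--select_tarball; it is not part of update_mirror.py.

```python
import re


def skycell_tarball_pattern(skycells):
    """Build a --select_tarball style regex for a list of skycell numbers."""
    # re.escape leaves plain digits unchanged but guards against stray
    # regex metacharacters in the input.
    alternatives = "|".join(f"({re.escape(cell)})" for cell in skycells)
    return rf".*skycell\.({alternatives})\.tgz"


pattern = skycell_tarball_pattern(["038", "053", "061", "074"])
print(pattern)
```

Put single quotes around the printed pattern on the command line, as in
the examples above, so the shell does not interpret the parentheses and
the | characters.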