User Guide

Installation

Pilot requires Python 3.7 or above. Install it with pip:

pip install globus-pilot

The Pilot Client

The Pilot Client installs a new command called pilot. Enter pilot to list all of the available commands:

(pilot-env) $ pilot
Usage: pilot [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  delete    Delete file and search record
  describe  Output info about a dataset
  download  Download a file to your local directory.
  index     Set or display index information
  list      List known records in Globus Search
  login     Login with Globus
  logout    Revoke local tokens
  mkdir     The new path to create
  profile   Output Globus Identity used to login
  project   Set or display project information
  status    Check status of transfers
  upload    Upload dataframe to location on Globus and categorize it in...
  version   Show version and exit

All commands support the --help argument for more information. Some commands, such as status, can be run without arguments. Other commands, such as project, support additional subcommands. Each subcommand also supports --help, so these are all valid commands:

  • pilot --help

  • pilot login --help

  • pilot index set --help

Listing Version

List the current version with:

(pilot-env) $ pilot version

Logging In

Log in with the following command:

(pilot-env) $ pilot login
You have been logged in.
Your personal info has been saved as:
Name:          Jean-Luc Picard
Organization:  Star Fleet


You can update these with "pilot profile -i"

The Pilot Client expects you to log in from a secure location, and it keeps your session active indefinitely. If you would like additional security, or you are logging in from a public location, you can use the following:

(pilot-env) $ pilot login --no-refresh-tokens

These credentials will expire in 48 hours.

Logging Out

Use the logout command to revoke your Globus Tokens. This is imperative on public systems.

(pilot-env) $ pilot logout
You have been logged out.

This keeps all other settings and profile information for the next time you log in. To clear those as well, use the --purge option:

(pilot-env) $ pilot logout --purge
You have been logged out.
All local user info and logs have been deleted.

List Your Information

List your information with the following command:

(pilot-env) $ pilot profile
You have been logged in.
Your personal info has been saved as:
Name:          Jean-Luc Picard
Organization:  Star Fleet


You can update these with "pilot profile -i"

Configuring Your Profile

The command pilot profile -i walks you through the settings for your profile. Your profile supplies default information for the datasets you create or update. In this example, the user updates their organization from “USS Enterprise” to “Star Fleet Academy”:

(pilot-env) $ pilot profile -i
Projects have updated. Use "pilot project update" to get the newest changes.
No project set, use "pilot project set <myproject>" to set your project
Name (Wesley Crusher)>
Organization (USS Enterprise)> Star Fleet Academy
Your information has been updated

Setting Your Local Endpoint

If you are connected over SSH to a remote system, you may want to use a Globus Connect Server (GCS) endpoint instead of a Globus Connect Personal (GCP) client. You can set this with the --local-endpoint option:

(pilot-env) $ pilot profile --local-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec
Your local endpoint has been set!
Your Profile:
Name:           Jean-Luc Picard
Organization:   Star Fleet
Local Endpoint: My GCS Endpoint
Local Path:     None

The local path on the endpoint defaults to the endpoint's own settings, but you can also state it explicitly by appending a colon and the path to the endpoint ID:

(pilot-env) $ pilot profile --local-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec:~/my-subfolder

Please note: You should only use this if your session is local to the endpoint. If the endpoint is remote from where you are actually working, the upload and download commands may place files in unexpected locations.
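
The endpoint:path form can be pictured as a split on the first colon. The sketch below only illustrates the format and is not Pilot's own parsing code; the helper name split_local_endpoint is hypothetical.

```python
from typing import Optional, Tuple


def split_local_endpoint(value: str) -> Tuple[str, Optional[str]]:
    """Split '<endpoint_uuid>[:<path>]' into (endpoint, path).

    Hypothetical helper for illustration only. The path is None when
    no colon (or nothing after the colon) is given.
    """
    endpoint, sep, path = value.partition(":")
    return endpoint, (path if sep and path else None)


# Values taken from the examples in this guide.
print(split_local_endpoint("ddb59af0-6d04-11e5-ba46-22000b92c6ec:~/my-subfolder"))
print(split_local_endpoint("ddb59af0-6d04-11e5-ba46-22000b92c6ec"))
```

With no path given, the second element is None, which matches the "Local Path: None" line in the profile output above.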

Search Indices and Projects

Search Indices

Use pilot index to list the Search indices you have previously used with pilot. Indices are shown by display name, and only indices you have already used appear here. See pilot index set for setting new pilot search indices.

(pilot-env) $ pilot index
Set index with "pilot index set <index_uuid>|<index_name>"
* captains-log
  search-index-1
  search-index-2

Use pilot index set to set a new search index. If your Search index does not show up in the list printed by pilot index, you need to use its UUID:

(pilot-env) $ pilot index set be69a351-f893-4268-8647-70bcb06fcd00

For information on any of your search indices, you can also use the pilot index info <index> command.

List & Update Projects

Use pilot project to list available projects. An asterisk (*) marks your currently selected project. Other commands, such as pilot list, will automatically use the project you select.

(pilot-env) $ pilot project
Set project with "pilot project set <myproject>"
  project1
  project2
  * project3
  pilot-tutorial

Projects may be updated at any time. The Pilot CLI will check for updates every 24 hours, but you can check any time with the following:

(pilot-env) $ pilot project update
Added:
   > new-project

Fetch Info on a Project

Use the info subcommand for more detailed information about your current project:

(pilot-env) $ pilot project info
Project 3
Endpoint                 petrel#ncipilot
Group                    Project 3 Group
Base Path                /projects/project3

This is an example project.

You can also query other projects:

(pilot-env) $ pilot project info pilot-tutorial
Pilot Tutorial
Endpoint                 petrel#ncipilot
Group                    Public
Base Path                /projects/pilot-tutorial

Guide to using the pilot CLI for managing and accessing data.

Setting Your Current Project

Change your project with the project set subcommand:

(pilot-env) $ pilot project set pilot-tutorial
Current project set to pilot-tutorial
(pilot-env) $ pilot project
Set project with "pilot project set <myproject>"
  project1
  project2
  project3
  * pilot-tutorial

Working with Datasets

Each Dataset represents a file on Petrel and a corresponding search entry in Globus Search. You can discover datasets with the list and describe commands, and fetch data using the download command.

Each of these commands will only act on datasets within your selected project.

Listing Datasets

Use the list command to see all of the datasets for this project:

(pilot-env) $ pilot list
Title                Data       Dataframe Rows   Column Size   Path
Raw tabular data for Meteorolog List      61     6      2 k    tabular/chicago_skewt.csv
Raw tabular data for Meteorolog List      61     6      2 k    tabular/chicago_skewt.tsv
Image plot of air ab Meteorolog                         511 k  chicago_skewt.png
Practical Meteorolog Meteorolog                         1 M    practical_meteorology.pdf

This lists high-level information about the datasets in this project, along with a path we can use to refer to a specific dataset. In this example, we would refer to the dataset “chicago_skewt.csv” above as tabular/chicago_skewt.csv.

Describing Datasets

Use pilot describe <dataset> to get detailed info about a dataset.

In the pilot list example above, we saw a record with the path “tabular/chicago_skewt.csv”. Describing that dataset produces the following output:

(pilot-env) $ pilot describe tabular/chicago_skewt.csv
Title                Raw tabular data for skewt plot of air above Chicago
Authors              NOAA
Publisher            NOAA
Subjects             skewt
                     chicago
Dates                Created:  Thursday, Jul 12, 2018
Data                 Meteorology
Dataframe            List
Rows                 61
Columns              6
Formats              text/csv
Version              1
Size                 2 k
Description          This is tabular skewt data showing air above Chicago on July 12th, from ground level to 100,000 feet.


Column Name          Type    Count  Freq Top         Unique Min    Max    Mean   Std    25-PCTL 50-PCTL 75-PCTL
altitude_ft          float64 61                             725.0  99150. 34291. 26538. 10328.0 31644.0 53031.0
pressure_mb          float64 61                             12.0   989.0  406.55 333.61 108.0   300.0   702.0
t/td                 string  61     2    -64/-72.5   60
wind_dir             float64 61                             45.0   350.0  259.11 87.789 272.0   287.0   313.0
wind_spd_kts         float64 61                             0.0    37.0   18.704 11.314 7.0     20.0    28.0
time                 float64 61                             1900.0 1900.0 1900.0 0.0    1900.0  1900.0  1900.0

Other Data
Subject              globus://ebf55996-33bf-11e9-9fa4-0a06afd4a22e/projects/pilot-tutorial/tabular/chicago_skewt.csv
Portal               https://petreldata.net/nci-pilot1/detail/globus%253A%252F%252Febf55996-33bf-11e9-9fa4-0a06afd4a22e%252Fprojects%252Fpilot-tutorial%252Ftabular%252Fchicago_skewt.csv
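
The Subject and Portal lines in this sample output appear to be related: the subject is a globus:// URL whose host part is an endpoint ID, and the portal link ends with the same subject percent-encoded twice. The sketch below takes that apart using only the Python standard library. This is an observation from the sample output above, not a documented guarantee, and the helper names are hypothetical.

```python
from urllib.parse import unquote, urlsplit

# Both values are copied from the sample "pilot describe" output above.
SUBJECT = ("globus://ebf55996-33bf-11e9-9fa4-0a06afd4a22e"
           "/projects/pilot-tutorial/tabular/chicago_skewt.csv")
PORTAL = ("https://petreldata.net/nci-pilot1/detail/"
          "globus%253A%252F%252Febf55996-33bf-11e9-9fa4-0a06afd4a22e"
          "%252Fprojects%252Fpilot-tutorial%252Ftabular%252Fchicago_skewt.csv")


def split_subject(subject: str):
    """Return (endpoint_id, path) from a globus:// subject URL."""
    parts = urlsplit(subject)
    return parts.netloc, parts.path


def subject_from_portal(portal_url: str) -> str:
    """Recover the subject from the last segment of a portal link.

    Assumes the segment is percent-encoded twice, as in the sample.
    """
    encoded = portal_url.rstrip("/").rsplit("/", 1)[-1]
    return unquote(unquote(encoded))


endpoint_id, path = split_subject(SUBJECT)
print(endpoint_id)
print(path)
print(subject_from_portal(PORTAL) == SUBJECT)
```

In this sample, the path portion is the project's base path (/projects/pilot-tutorial) followed by the dataset path shown by pilot list.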

Downloading Datasets

Use pilot download <dataset> to download a dataset. Using the example above, where “tabular/chicago_skewt.csv” is a dataset we discovered from the pilot list command:

(pilot-env) $ pilot download tabular/chicago_skewt.csv
Saved chicago_skewt.csv
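
To fetch several datasets in one go, the download command can be called from a script. The sketch below shells out to pilot download once per path. It assumes the pilot executable is on your PATH and that you have already run pilot login. The helper name download_datasets and its run parameter are hypothetical, introduced only to make the example easy to try without touching the network.

```python
import subprocess
from typing import Callable, Iterable, List


def download_datasets(paths: Iterable[str],
                      run: Callable = subprocess.run) -> List[List[str]]:
    """Run 'pilot download <path>' once per dataset path.

    Returns the list of commands that were issued. With the default
    runner, check=True raises CalledProcessError if pilot exits with
    a non-zero status, so the loop stops at the first failure.
    """
    commands = []
    for path in paths:
        command = ["pilot", "download", path]
        run(command, check=True)
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    download_datasets(
        ["tabular/chicago_skewt.csv", "chicago_skewt.png"],
        run=lambda command, check: print(" ".join(command)),
    )
```

Dropping the run argument makes the function invoke the real CLI. The dataset paths are the ones shown by pilot list for the current project.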

Checking Status of Transfers

If you have transferred data using Globus, you can check the status of the transfer with the pilot status command.

(pilot-env) $ pilot status
ID  Dataframe           Status     Start Time        Task ID
0   /chicago_skewt.csv  SUCCEEDED  2019-07-01 09:04  da1ffbdc-9c19-11e9-8219-02b7a92d8e58

Scripting with the SDK

In addition to the CLI, Pilot1 Tools also provide an SDK you can use in Python scripts:

from pilot.client import PilotClient
pc = PilotClient()
# Show the built-in documentation for all methods
help(pc)

The SDK relies on the same credentials as the CLI. As long as you have authenticated by running pilot login, methods in the SDK will work without any additional parameters.