Accessing Datasets under an Access Control List (ACL)

NASA Earthdata API Client 🌍

Note: Before we can use earthaccess, we need a NASA Earthdata Login (EDL) account.

In [1]:

from earthaccess import Auth, DataCollections, DataGranules, Store

auth = Auth()

Auth()

earthaccess's Auth class provides three strategies for authenticating with NASA EDL.

netrc: If we have a .netrc file with our EDL credentials, earthaccess can use it. If we don't have one, earthaccess lets us type our credentials and persist them to a new .netrc file.
environment: If our EDL credentials are set as environment variables:
- EARTHDATA_USERNAME
- EARTHDATA_PASSWORD
interactive: We are prompted for our EDL credentials, with optional persistence to .netrc.

To persist our credentials to a .netrc file we have to do the following:

auth.login(strategy="interactive", persist=True)

In this notebook we'll try the environment strategy first and fall back to the netrc strategy. You can of course use the interactive strategy if you don't have a .netrc file.
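The fallback order described above can be checked before calling earthaccess at all. The following is a minimal sketch using only the standard library: the helper name `available_strategy` is hypothetical, and the hostname `urs.earthdata.nasa.gov` is assumed to be the machine entry that EDL credentials are stored under in .netrc. It only reports which strategy looks usable; it does not log in.

```python
import netrc
import os
from pathlib import Path

# Assumption: EDL credentials in .netrc are stored under this machine name.
EDL_HOST = "urs.earthdata.nasa.gov"


def available_strategy(env=None, netrc_path=None):
    """Return 'environment', 'netrc', or 'interactive' (hypothetical helper)."""
    env = os.environ if env is None else env

    # 1. Both environment variables must be set and non-empty.
    if env.get("EARTHDATA_USERNAME") and env.get("EARTHDATA_PASSWORD"):
        return "environment"

    # 2. Otherwise look for an EDL entry in the .netrc file.
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if path.is_file():
        try:
            if netrc.netrc(str(path)).authenticators(EDL_HOST):
                return "netrc"
        except (netrc.NetrcParseError, OSError):
            pass  # unreadable or malformed file: fall through

    # 3. Nothing usable was found, so we would have to type credentials.
    return "interactive"


print(available_strategy())
```

The returned name could then be passed as the `strategy` argument of `auth.login()`.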

In [2]:

auth.login(strategy="environment")
# are we authenticated?
if not auth.authenticated:
    auth.login(strategy="netrc")

Querying for restricted datasets

The DataCollections class can query CMR for any collection (dataset) using CMR's query parameters, and it has built-in methods for extracting useful information from the response.

auth.refresh_tokens()

If we belong to an early adopter group within NASA we can pass the Auth object to the other classes when we instantiate them.

# An anonymous query to CMR
Query = DataCollections().keyword('elevation')
# An authenticated query to CMR
Query = DataCollections(auth).keyword('elevation')

The same applies to DataGranules:

# An anonymous query to CMR
Query = DataGranules().keyword('elevation')
# An authenticated query to CMR
Query = DataGranules(auth).keyword('elevation')

Note: Some collections under an access control list are flagged by CMR and are not counted in the number returned by hits().
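To make "CMR query parameters" more concrete, here is a small sketch of what a plain CMR collection search URL looks like for a short name and version. It uses only the standard library and makes no network request. The helper `build_search_url` is hypothetical, and the idea that the chained earthaccess methods correspond one-to-one to the `short_name` and `version` parameters is an assumption for illustration; earthaccess builds and sends the real request for us.

```python
from urllib.parse import urlencode

# Public CMR collection search endpoint (JSON response format).
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_search_url(base_url, **params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


# Assumed mapping: one CMR parameter per chained query method.
url = build_search_url(CMR_COLLECTIONS_URL, short_name="ATL06", version="006")
print(url)
```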

In [3]:

# The first step is to create a DataCollections query
Query = DataCollections()

# Chain methods to customize our query
Query.short_name("ATL06").version("006")

print(f"Collections found: {Query.hits()}")

# Filter which UMM fields to show; to see the full record, omit the fields() filter.
# meta is always included
collections = Query.fields(["ShortName", "Version"]).get(5)
# Inspect the results, showing just ShortName and Version
collections

In [4]:

if not auth.refresh_tokens():
    print("Something went wrong, we may need to regenerate our tokens manually")

In [5]:

Query = DataCollections(auth)

# Chain methods to customize our query
Query.short_name("ATL06").version("006")

# This will say 1, even though we get 2 back.
print(f"Collections found: {Query.hits()}")

collections = Query.fields(["ShortName", "Version"]).get()
# Inspect the results, showing just ShortName and Version
collections

Oh no! What? Only 1 collection found, even though we got 2 results back?!

Interpreting the results

The hits() method above reports the number of query hits, but only for publicly available datasets. Because cloud-hosted ICESat-2 data are not yet publicly available, CMR returns 1 hit here, and if we filtered DataCollections by the provider NSIDC_CPRD we would get 0 hits. Until cloud-hosted ICESat-2 data become publicly available, we need another way to see how many cloud datasets are available at NSIDC: we can create a collections object (we're going to want one soon anyhow) and print its len() to see the true number of results.

Note: Since we cannot rely on hits(), we need to be aware that get() may return more metadata records than we expect, depending on the dataset and how broad our query is.
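One way to guard against both problems (an undercounting hits() and an unexpectedly large get()) is to fetch with an explicit limit and compare the two numbers. The sketch below is illustrative only: `FakeQuery` is a hypothetical stand-in so the example runs without credentials, and `count_results` is a hypothetical helper that relies only on the `hits()` and `get(limit)` calls used elsewhere in this notebook.

```python
class FakeQuery:
    """Hypothetical stand-in for a query object, used only for this demo."""

    def __init__(self, public_hits, records):
        self._public_hits = public_hits
        self._records = records

    def hits(self):
        # Mimics the behavior described above: only public records are counted.
        return self._public_hits

    def get(self, limit=None):
        return self._records if limit is None else self._records[:limit]


def count_results(query, limit=2000):
    """Return (reported hits, records fetched, whether the limit was reached)."""
    reported = query.hits()
    records = query.get(limit)
    # If we fetched exactly `limit` records, there may be more we did not see.
    return reported, len(records), len(records) >= limit


demo = FakeQuery(public_hits=1, records=["public ATL06", "restricted ATL06"])
reported, actual, capped = count_results(demo, limit=10)
print(f"hits() reported {reported}, get() returned {actual}, limit reached: {capped}")
```

With a real query, a `True` value in the last position suggests narrowing the search (for example with a tighter bounding box or date range) before downloading anything.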

In [6]:

Query = (
    DataGranules(auth)
    .concept_id("C2153572614-NSIDC_CPRD")
    .bounding_box(-134.7, 58.9, -133.9, 59.2)
    .temporal("2020-03-01", "2020-03-30")
)

# Unfortunately, hits() behaves the same way for granule queries
print(f"Granules found with hits(): {Query.hits()}")

cloud_granules = Query.get()

print(f"Actual number found: {len(cloud_granules)}")

In [7]:

store = Store(auth)
files = store.get(cloud_granules, "./data/C2153572614-NSIDC_CPRD/")
