Streaming data from NASA's Earth Surface Mineral Dust Source Investigation (EMIT)
This proof-of-concept notebook demonstrates how earthaccess can facilitate the use of cloud-hosted NASA data with xarray and holoviews. For a formal tutorial on EMIT, please visit the official repository, where things are explained in detail: EMIT Science Tutorial.
Prerequisites
- NASA EDL credentials
- Openscapes Conda environment installed
- For direct access this notebook should run in AWS
IMPORTANT: This notebook can run outside AWS, but doing so is not recommended because streaming HDF5 data out of region is slow.
from pprint import pprint
import earthaccess
import xarray as xr
print(f"using earthaccess version {earthaccess.__version__}")
auth = earthaccess.login()
Searching for the dataset with .search_datasets()
Note: The API documentation can be found on the earthaccess documentation site.
results = earthaccess.search_datasets(short_name="EMITL2ARFL", cloud_hosted=True)
# Let's print our datasets
for dataset in results:
    pprint(dataset.summary())
Searching for the data with .search_data() over Ecuador
# Approximate bounding box over Ecuador (west, south, east, north)
granules = earthaccess.search_data(
    short_name="EMITL2ARFL",
    bounding_box=(-82.05, -3.17, -76.94, -0.52),
    count=10,
)
print(len(granules))
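The bounding_box tuple is ordered lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude. As a minimal sketch, a small helper like the one below can catch swapped values before a search is sent. The name check_bbox is hypothetical and used only for illustration; it is not part of earthaccess.

```python
def check_bbox(bbox):
    """Validate a (west, south, east, north) bounding box in decimal degrees.

    Hypothetical helper for illustration; not part of earthaccess.
    """
    if len(bbox) != 4:
        raise ValueError("bounding box needs exactly four values")
    west, south, east, north = (float(v) for v in bbox)
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if south > north:
        raise ValueError("south latitude must not exceed north latitude")
    # west > east is deliberately not rejected: a box may cross the antimeridian
    return (west, south, east, north)


# The box used in this notebook passes the check unchanged
bbox = check_bbox((-82.05, -3.17, -76.94, -0.52))
print(bbox)
```

Swapping the two latitudes, for example (-82.05, -0.52, -76.94, -3.17), raises a ValueError instead of silently producing an unexpected search area.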
earthaccess can print a preview of the data using the metadata from CMR
Note: there is a bug in earthaccess where the reported granule size is always 0; a fix is coming next week.
granules[7]
Streaming data from S3 with fsspec
Opening the data with earthaccess.open() and accessing the NetCDF file as if it were local
If we run this code in AWS (us-west-2), earthaccess can use direct S3 links. If we run it outside AWS, earthaccess can only use HTTPS links. Direct S3 access to NASA data is only allowed in region.
# open() accepts a list of results or a list of links
file_handlers = earthaccess.open(granules)
file_handlers
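The in-region rule can be pictured as a choice between two URL schemes. earthaccess makes this choice for you; the sketch below is not its implementation, only an illustration of the idea. The helper pick_links and the URLs are hypothetical placeholders.

```python
def pick_links(links, in_region):
    """Return only the links usable from the current location.

    Hypothetical helper: s3:// links are kept when running in region,
    otherwise the https:// links are returned.
    """
    scheme = "s3://" if in_region else "https://"
    return [link for link in links if link.startswith(scheme)]


# Placeholder URLs, not real granule locations
links = [
    "s3://example-bucket/granule_001.nc",
    "https://example.org/granule_001.nc",
]

print(pick_links(links, in_region=True))   # direct S3 access inside us-west-2
print(pick_links(links, in_region=False))  # HTTPS fallback everywhere else
```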
%%time
# We can use any file handle from the list
file_p = file_handlers[4]
# Open the root group plus the band-parameter and location groups
refl = xr.open_dataset(file_p)
wvl = xr.open_dataset(file_p, group="sensor_band_parameters")
loc = xr.open_dataset(file_p, group="location")
ds = xr.merge([refl, loc])
# Attach index coordinates and the per-band parameters (e.g. wavelengths) as coordinates
ds = ds.assign_coords(
    {
        "downtrack": (["downtrack"], refl.downtrack.data),
        "crosstrack": (["crosstrack"], refl.crosstrack.data),
        **wvl.variables,
    }
)
ds
Plotting non-orthorectified data
Use the following code to display the Panel widget when running this notebook in AWS us-west-2.
import holoviews as hv
import hvplot.xarray
import numpy as np
import panel as pn
pn.extension()
# Find band nearest to value of 850 nm (NIR)
b850 = np.nanargmin(abs(ds["wavelengths"].values - 850))
ref_unc = ds["reflectance_uncertainty"]
image = ref_unc.sel(bands=b850).hvplot("crosstrack", "downtrack", cmap="viridis")
stream = hv.streams.Tap(source=image, x=255, y=484)
def wavelengths_histogram(x, y):
    histo = ref_unc.sel(crosstrack=x, downtrack=y, method="nearest").hvplot(
        x="wavelengths", color="green"
    )
    return histo
tap_dmap = hv.DynamicMap(wavelengths_histogram, streams=[stream])
pn.Column(image, tap_dmap)
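The band lookup in the plotting code relies on np.nanargmin, which returns the position of the smallest value while skipping NaN entries. A minimal sketch of that step on its own, using a synthetic wavelength array (the values are made up for illustration and are not real EMIT band centers):

```python
import numpy as np

# Synthetic wavelength grid in nm, with one missing value
wavelengths = np.array([700.0, 775.0, np.nan, 845.0, 860.0, 920.0])
target_nm = 850.0

# nanargmin ignores the NaN entry when locating the smallest absolute difference
band_index = int(np.nanargmin(np.abs(wavelengths - target_nm)))
nearest_nm = float(wavelengths[band_index])

print(band_index, nearest_nm)  # 3 845.0
```

The result is a positional index along the band axis, not a wavelength, so it selects a band by position.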