Satellite cubes using rstac and gdalcube

This R Markdown demonstrates how to work with satellite imagery collections using the rstac and gdalcube libraries in R. The workflow will cover how to access collections from Microsoft’s Planetary Computer and how to filter relevant datasets (e.g., Sentinel, Landsat, MODIS). We will also work with spatial data to extract bounding boxes for our regions of interest.

Load required libraries

library(sf)         # For working with spatial data and shapefiles
library(gdalcubes)  # For working with satellite data cubes
library(rstac)      # For accessing STAC API from Microsoft Planetary Computer
library(dplyr)      # For data manipulation
library(terra)      # For raster data operations
library(leaflet)  # For interactive maps
library(leafem)  # For additional leaflet functionalities
library(htmlwidgets)  # For additional leaflet functionalities

###Accessing STAC API Collections

Here, we’ll connect to the Microsoft Planetary Computer’s STAC API and retrieve available collections. Then, we filter out some collections and identify their IDs based on Sentinel, Landsat, or MODIS data.

# Set planetary computer STAC API
planetary_computer <- rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1")

collections_ <- rstac::stac("https://planetarycomputer.microsoft.com/api/stac/v1",force_version=T) %>%
  collections() %>%
  get_request()
# Extract and print only the IDs of the collections
collection_ids <- sapply(collections_$collections, function(x) x$id)
print(head(collection_ids))

## [1] "daymet-annual-pr" "daymet-daily-hi"  "3dep-seamless"    "3dep-lidar-dsm"  
## [5] "fia"              "sentinel-1-rtc"

The above code queries the STAC API and lists all the available collections, such as Sentinel, Landsat, and MODIS datasets. These datasets are valuable for monitoring various aspects of the Earth’s surface. Filtering Relevant Collections

# Filter collections for Sentinel, Landsat, or MODIS
filtered_collections <- collections_$collections[sapply(collections_$collections, function(x) {
  grepl("sentinel|landsat|modis", x$id, ignore.case = TRUE)
})]

# Extract and print the IDs of the filtered collections
filtered_collection_ids <- sapply(filtered_collections, function(x) x$id)
print(head(filtered_collection_ids))

## [1] "sentinel-1-rtc"    "modis-64A1-061"    "landsat-c2-l2"    
## [4] "modis-17A2H-061"   "modis-11A2-061"    "modis-17A2HGF-061"

Available Collection Descriptions

# Define the selected collection IDs
coleccion_names <- c('sentinel-2-l2a','landsat-c2-l2','modis-17A2HGF-061',
  'modis-09A1-061','modis-09Q1-061','modis-11A2-061',
  'modis-15A2H-061','modis-15A3H-061')

For the selected collections, here’s a brief description of what each dataset contains:

Landsat Collection 2 Level-2: Atmospherically corrected surface reflectance data.
Sentinel-2 Level-2A: Global imagery in 13 spectral bands at 10m-60m resolution, with bottom-of-atmosphere correction using Sen2Cor.
MODIS Gross Primary Productivity 8-Day Gap-Filled (modis-17A2HGF-061): Cumulative 8-day composites of Gross Primary Productivity at 500-meter resolution.
MODIS Surface Reflectance 8-Day (500m) (modis-09A1-061): Surface spectral reflectance of MODIS bands 1-7, corrected for atmospheric conditions.
MODIS Surface Reflectance 8-Day (250m) (modis-09Q1-061): Surface spectral reflectance of MODIS bands 1 and 2 at 250-meter resolution, corrected for atmospheric conditions.
MODIS Land Surface Temperature/Emissivity 8-Day (modis-11A2-061): Average 8-day per-pixel land surface temperature and emissivity at 1km resolution.
MODIS Leaf Area Index/FPAR 8-Day (modis-15A2H-061): Combined Fraction of Photosynthetically Active Radiation (FPAR) and Leaf Area Index (LAI) data, at 500-meter resolution.
MODIS Leaf Area Index/FPAR 4-Day (modis-15A3H-061): Similar to modis-15A2H-061 but with a 4-day composite.

Loading a Spatial Scenario

To interact with our selected datasets, we’ll first load a spatial scenario from a shapefile. This scenario defines the region of interest for which we will extract satellite data.

# Example scenario data (replace this with your real scenario data)
scenario <- sf::st_read("spatialdata/Area_SR2_PRISMA_v2.shp")

## Reading layer `Area_SR2_PRISMA_v2' from data source 
##   `/Users/carloscamino/Library/Mobile Documents/com~apple~CloudDocs/Workspace/0-toolsrtm-simulator/Vignettes/Stac_server/spatialdata/Area_SR2_PRISMA_v2.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 602317.8 ymin: 4841295 xmax: 606062.1 ymax: 4843639
## Projected CRS: WGS 84 / UTM zone 32N

# Reproject the bounding box to EPSG:4326 (WGS 84) for STAC API compatibility
scenario_4326 <- sf::st_as_sfc(scenario) |>
  sf::st_transform(4326) |>
  sf::st_bbox()


# Set GDAL cubes options
gdalcubes::gdalcubes_options(parallel = 8)

# Reproject the scenario to EPSG:4326 (WGS 84) for STAC API compatibility
scenario_4326 <- sf::st_transform(scenario, 4326)

# Calculate the centroid of the scenario to use it for centering the map
centroid_coords <- sf::st_coordinates(sf::st_centroid(sf::st_geometry(scenario_4326)))

# Plot the scenario using leaflet
leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addPolygons(data = scenario_4326,  # Use the reprojected scenario for polygons
              color = "blue", 
              weight = 2, 
              fillColor = "lightblue", 
              fillOpacity = 0.5) %>%
  addScaleBar(position = "bottomleft") %>%
  addMiniMap(tiles = providers$Esri.WorldStreetMap, toggleDisplay = TRUE) %>%
  setView(lng = centroid_coords[1], 
          lat = centroid_coords[2], 
          zoom = 14)

Defining a Bounding Box around a Scenario centroid

In this section, we will define a function to create a bounding box around a given scenario’s centroid, applying a buffer to ensure we encompass a specified area around that centroid. This bounding box will help us narrow down our satellite data search.

Step 1: Function Definition

We begin by defining a function called get_bounding_box that takes two parameters: scenario (our spatial feature) and buffer_size (the size of the buffer in meters).

# Function to define the bounding box around a scenario centroid with buffer
get_bounding_box <- function(scenario, buffer_size) {
  # Calculate the centroid of the scenario geometry
  centroid <- sf::st_centroid(scenario)

  # Apply a buffer around the centroid based on the specified buffer size
  buffer_area <- sf::st_buffer(centroid, buffer_size)

  # Get the bounding box of the buffered area
  bb <- sf::st_bbox(buffer_area)

  # Reproject the bounding box to EPSG:4326 (WGS 84) for STAC API compatibility
  bb_4326 <- sf::st_as_sfc(bb) |>
    sf::st_transform(4326) |>
    sf::st_bbox()

  return(bb_4326)  # Return the reprojected bounding box
}

Step 2: Apply Buffer

Next, we apply a buffer around the centroid based on the provided buffer_size.

# Input parameters (replace with actual values as needed)
buffer_size <- 300  # Buffer size around the centroid in meters
# Calculate the bounding box using the buffer size around the scenario centroid
bb <- get_bounding_box(scenario, buffer_size)
# Print bounding box to verify the result
print(bb)

##     xmin     ymin     xmax     ymax 
## 10.28986 43.72517 10.29743 43.73066

Function to Retrieve Satellite Collections

The get.satellite_collection function retrieves satellite imagery from specified collections using the Planetary Computer STAC API, based on a given scenario, date range, cloud cover threshold, and an optional buffer size. Initially, it calculates a bounding box around the scenario’s centroid, applying the specified buffer size to ensure a broader area is included. The function then searches for available satellite items within the defined bounding box and date range, returning a maximum of 500 results. If no images are found, it raises an error; otherwise, it displays the number of images retrieved. The function extracts asset names from the first item in the results and defines the relevant assets for the selected collection, such as Landsat or MODIS datasets. It then checks if cloud filtering is applicable for the selected collection, creating an image collection either with or without cloud cover filtering based on the specified threshold. Finally, the function returns the filtered image collection, enabling further analysis or visualization of the satellite data.

# Function to retrieve satellite collections
get.satellite_collection <- function(scenario, collection, date_range, cloud_threshold, buffer_size = 300) {
  # Get bounding box based on the scenario centroid with a buffer of buffer_size meters
  bbox <- get_bounding_box(scenario, buffer_size)

  # Search for items using the planetary computer STAC API
  items <- planetary_computer |>
    stac_search(collections = collection,
                bbox = bbox[1:4],  # Use the reduced bounding box
                datetime = paste0(date_range[[1]], "/", date_range[[2]]),
                limit = 500) |>
    post_request()

  # Sign the STAC items for authentication
  items <- items_sign(
    items,
    sign_planetary_computer()
  )

  if (length(items$features) == 0) {
    stop("No images found in the specified collection for the given bounding box and date range.")
  } else {
    cat("Images found:", length(items$features), "\n")
  }

  # Get the assets for the first item
  first_item_assets <- items$features[[1]]$assets
  asset_names <- names(first_item_assets)
  print("Asset names:")  # Debug: Check the available asset names
  print(asset_names)


  # Define asset names based on the collection
  assets <- switch(collection,
                   "landsat-c2-l2" = asset_names[c(3:4,12:16)], # for landsat level-2
                   "sentinel-2-l2a" = asset_names[c(3:9,13,11:12,14)], # Example for Sentinel-2 Level-2A
                   "modis-17A2HGF-061" = c("Gpp_500m",'PsnNet_500m'), # Example for MODIS Gross Primary Productivity
                   "modis-11A2-061" = c("LST_Day_1km"), # or MODIS Land Surface Temperature
                   "modis-09A1-061" = asset_names[c(3:9)], #for MODIS Surface Reflectance 8-Day (500m)
                   "modis-09Q1-061" = c("sur_refl_b01","sur_refl_b02","sur_refl_qc_250m","sur_refl_state_250m"), # Example for MODIS Surface Reflectance 8-Day (250m)
                   "modis-15A2H-061" = c("Lai_500m","Fpar_500m","LaiStdDev_500m",'FparStdDev_500m','FparLai_QC'), # Example for MODIS Leaf Area Index/FPAR 8-Day
                   "modis-15A3H-061" = c("Lai_500m","Fpar_500m","LaiStdDev_500m",'FparStdDev_500m','FparLai_QC')  # Example for MODIS Leaf Area Index/FPAR 4-Day
  )

  # Determine if cloud filtering is applicable
  if (collection == "modis-09A1-061" || collection == "modis-17A2HGF-061" || collection == "modis-11A2-061"
      || collection == 'modis-09Q1-061'|| collection == 'modis-09A1-061'
      || collection == 'modis-15A2H-061'|| collection == 'modis-15A3H-061') {
    # Create image collection without cloud cover filtering
    clt <- items$features |>
      stac_image_collection(asset_names = assets)
  } else {
    # Create image collection with cloud cover filtering for other collections
    clt <- items$features |>
      stac_image_collection(asset_names = assets,
                            property_filter = function(x) {
                              x[["eo:cloud_cover"]] < cloud_threshold
                            })
  }

  return(clt)  # Return the filtered image collection
}

Example Usage

In this section, we define the parameters for retrieving satellite data. We specify a date range from May 1, 2022, to July 31, 2022, during which we want to collect satellite imagery. We also set a cloud cover threshold of 5%, ensuring that the retrieved images have minimal cloud interference.

Next, we create a vector, coleccion_names, which contains the names of the satellite collections we are interested in, including Sentinel-2 Level-2A, Landsat Collection 2 Level-2, and various MODIS datasets. This vector is then printed to the console to confirm the available collections.

Finally, we call the get.satellite_collection function, passing in the scenario, the first collection from our coleccion_names vector, the specified date range, the cloud threshold, and a buffer size of 10 meters. The results, stored in the satellite_collection variable, are displayed using head() to show the first few entries of the retrieved satellite data.

## Example usage
date_range <- as.Date(c("2022-05-01", "2022-07-31"))
cloud_threshold <- 5

coleccion_names <- c('sentinel-2-l2a','landsat-c2-l2','modis-17A2HGF-061','modis-09A1-061','modis-09Q1-061','modis-11A2-061','modis-15A2H-061','modis-15A3H-061')
print(coleccion_names)

## [1] "sentinel-2-l2a"    "landsat-c2-l2"     "modis-17A2HGF-061"
## [4] "modis-09A1-061"    "modis-09Q1-061"    "modis-11A2-061"   
## [7] "modis-15A2H-061"   "modis-15A3H-061"

satellite_collection <- get.satellite_collection(scenario, coleccion_names[1],
                                                 date_range, cloud_threshold, buffer_size = 10)

## Images found: 76 
## [1] "Asset names:"
##  [1] "AOT"                "B01"                "B02"               
##  [4] "B03"                "B04"                "B05"               
##  [7] "B06"                "B07"                "B08"               
## [10] "B09"                "B11"                "B12"               
## [13] "B8A"                "SCL"                "WVP"               
## [16] "visual"             "preview"            "safe-manifest"     
## [19] "granule-metadata"   "inspire-metadata"   "product-metadata"  
## [22] "datastrip-metadata" "tilejson"           "rendered_preview"

Getting Sentinel-2 images

The get.sentinel2_cube function is designed to create and process a data cube specifically for Sentinel-2 satellite collections. It takes a satellite collection object, a spatial shape (bounding box), a date range for analysis, an aggregation method (either “mean” or “median”), and an optional parameter to return a dataset along with the raster cube. The function begins by checking the get.dataset parameter to determine whether to process a raster or just reflectance values. It then defines a projected coordinate reference system (CRS) and transforms the provided shape into this CRS. A cube view is created, defining the spatial extent and resolution of the raster cube, with cloud masking applied using the Scene Classification Layer (SCL) to focus on vegetation pixels. The function checks available bands and selects the relevant bands (e.g., B02, B03, B04, B08) based on the specified aggregation method, calculating either the mean or median across the selected bands over the specified time range. Finally, it converts the result into a raster format suitable for visualization and optionally returns a dataset along with the raster cube.

# Function to create and process the data cube for Sentinel-2 collections
get.sentinel2_cube <- function(s_collection, shape, date_range, aggregation_method = "mean", get.dataset=T) {

  if (missing(get.dataset) || is.null(get.dataset)) {
    get.dataset = F  # Set default to F if empty or missing
    print('A raster is processing ---')
  } else {
    print('The reflectance are processing ---')
  }

  crs_cube <- "EPSG:3035"  # Use a projected CRS for raster operations

  # Convert bounding box to sf object and transform to the target CRS
  shape_cube <- shape |>
    sf::st_as_sfc() |>
    sf::st_transform(crs_cube) |>
    sf::st_bbox(crs = crs_cube)



  # Define the cube view (spatial extent and resolution)
  view <- cube_view(
    srs = crs_cube,
    extent = list(
      t0 = as.character(date_range[[1]]),
      t1 = as.character(date_range[[2]]),
      left = shape_cube["xmin"],
      right = shape_cube["xmax"],
      top = shape_cube["ymax"],
      bottom = shape_cube["ymin"]
    ),
    dx = 10, dy = 10, dt = "P1D",  # 10m resolution, time unit = 1 day
    aggregation = aggregation_method,#"mean",  # Set default aggregation to "mean"
    resampling = "near"    # Resampling method
  )


  # Mask to remove clouds and shadows using the SCL band
  mask <- image_mask("SCL", values = c(0,1,2,3, 6,7,8, 9,10,11,12))  # Clouds,snow,water bad pixels and shadows
  # Create a raster cube for the given collection and view
  cube <- raster_cube(s_collection, view, mask = mask)
  # Check available bands in the cube
  available_bands <- names(cube)  # Adjust based on how you access bands
  print(available_bands)
  # Select bands based on the aggregation method
  #selected_bands <- available_bands  # Example RGB bands
  # Filter selected bands based on available bands
  selected_bands <- intersect(c("B02", "B03", "B04", "B08"), available_bands)
  print(selected_bands)
  # Calculate average or median across time for selected bands
  if (aggregation_method == "mean") {
    result_cube <- cube |>
      select_bands(selected_bands) |>
      reduce_time(paste0("mean(", selected_bands, ")"))


  } else if (aggregation_method == "median") {
    result_cube <- cube |>
      select_bands(selected_bands) |>
      reduce_time(paste0("median(", selected_bands, ")"))
  } else {
    stop("Invalid aggregation method specified.")
  }

  # Convert to stars or terra format for visualization
  result_raster <- st_as_stars.cube(result_cube) |> rast()

  # Generate names based on bands and time range
  band_names <- paste0(selected_bands, "_",
                       format(as.Date(date_range[[1]]), "%Y%m%d"), "_",
                       format(as.Date(date_range[[2]]), "%Y%m%d"))

  # Assign names to the raster layers
  names(result_raster) <- selected_bands

  if (get.dataset == T){
    data.collection <- cube |> as.data.frame(complete_only = TRUE)
    data.list <-list(data.collection=data.collection, raster=result_raster)

    return(data.list)

  } else {
    return(result_raster)
  }


}

Example Usage

In this example, we first define a date range for analysis. We then call the get.sentinel2_cube function twice: first to retrieve the average raster cube and second to retrieve the median raster cube for the specified date range and bounding box (bb). Each raster cube is plotted for visual inspection, and summaries are printed to provide insights into the contents of the cubes. Finally, the resulting raster cubes are converted to brick format, which is useful for further analysis or manipulation within R. If theget.dataset parameter is set toTRUE, the function returns both the raster result and the underlying dataset as a dataframe. The function is structured to print relevant messages during execution, indicating the processing status.

# Get average raster cube
avg_raster_cube <- get.sentinel2_cube(satellite_collection, shape=bb,
                                      date_range, aggregation_method = "mean",get.dataset = F)

## [1] "The reflectance are processing ---"
##  [1] "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B11" "B12" "B8A" "SCL"
## [1] "B02" "B03" "B04" "B08"

plot(avg_raster_cube)

# Get median raster cube
median_raster_cube <- get.sentinel2_cube(satellite_collection, shape=bb,
                                         date_range, aggregation_method = "median",get.dataset = F)

## [1] "The reflectance are processing ---"
##  [1] "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B11" "B12" "B8A" "SCL"
## [1] "B02" "B03" "B04" "B08"

# Plot median raster cube
plot(median_raster_cube)

#### Leaflet Map Creation

In this section, we compute the Normalized Difference Vegetation Index (NDVI) using the average and median raster cubes derived from Sentinel-2 satellite imagery. NDVI serves as a vital indicator of vegetation health and density, derived from the near-infrared (NIR) and red bands of the satellite data.

Below, we provide R code to generate an interactive Leaflet map that displays both the average and median RGB layers alongside the NDVI layers. Users can easily toggle between different layers to visualize variations in vegetation health and land cover. The map features controls for layer visibility, a scale bar for measuring distances, and dynamic coordinates for mouse positioning, enhancing the overall user experience and interactivity.

# Load the raster data
avg_raster <- raster::brick(avg_raster_cube)
median_raster <- raster::brick(median_raster_cube)

# Calculate NDVI for average and median raster cubes
ndvi_avg <- (avg_raster[[4]] - avg_raster[[3]]) / (avg_raster[[4]] + avg_raster[[3]])
ndvi_median <- (median_raster[[4]] - median_raster[[3]]) / (median_raster[[4]] + median_raster[[3]])

# Assign names to NDVI rasters
names(ndvi_avg) <- "NDVI_Average"
names(ndvi_median) <- "NDVI_Median"

# Create a Leaflet map
m <- leaflet() |>
  addTiles(group = "OSM") |>
  addTiles(urlTemplate = "https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}",
           group = "Esri Tile Layer") |>
  
  # Add average RGB raster layer
  leafem::addRasterRGB(avg_raster, r = 1, g = 2, b = 3,
               quantile = c(0.2, 0.98),
               group = "RGB (Average)") |>
  
  # Add median RGB raster layer
  leafem::addRasterRGB(median_raster, r = 4, g = 3, b = 2,
               quantile = c(0.2, 0.98),
               group = "RGB (Median)") |>
  
  # Add NDVI layers
  leaflet::addRasterImage(ndvi_avg, group = "NDVI (Average)", colors = terrain.colors(100)) |>
  leaflet::addRasterImage(ndvi_median, group = "NDVI (Median)", colors = terrain.colors(100)) |>
  
  # Set view to the centroid of the bounding box
  setView(lng = st_coordinates(st_centroid(st_as_sf(st_as_sfc(bb))))[1],
          lat = st_coordinates(st_centroid(st_as_sf(st_as_sfc(bb))))[2],
          zoom = 16) |>
  
  # Add layer controls to switch between raster layers and base maps
  addLayersControl(
    baseGroups = c("OSM", "Esri Tile Layer"),
    overlayGroups = c("RGB (Average)", "RGB (Median)", "NDVI (Average)", "NDVI (Median)"),
    options = layersControlOptions(collapsed = FALSE)
  ) |>
  
  addScaleBar(position = "bottomright") |>
  addMeasure(primaryLengthUnit = "meters", primaryAreaUnit = "sqmeters") |>
  
  # Add mouse coordinates
  leafem::addMouseCoordinates()

# Add click event to extract pixel values
m <- m |>
  htmlwidgets::onRender("
    function(el, x) {
      this.on('click', function(e) {
        var lat = e.latlng.lat;
        var lng = e.latlng.lng;

        // Display coordinates in a popup
        L.popup()
          .setLatLng(e.latlng)
          .setContent('Clicked coordinates: <br>Latitude: ' + lat + '<br>Longitude: ' + lng)
          .openOn(this);
      });
    }"
  )

# Print the map
m

Satellite cubes using rstac and gdalcube

C. Camino and C. Boylin

2024-10-03

Load required libraries

Available Collection Descriptions

Loading a Spatial Scenario

Defining a Bounding Box around a Scenario centroid

Step 1: Function Definition

Step 2: Apply Buffer

Function to Retrieve Satellite Collections

Example Usage

Getting Sentinel-2 images

Example Usage