flowchart TD
    A[LAS tiles] --> B[Merge + Normalize]
    B --> C[DEM, CHM, DSM]
    C --> D[Topographic indices]
    B --> E[Voxelize + LAD via Beer–Lambert]
    D --> F[Add species + topography to LAD]
    E --> F
    F --> G[PCA + NbClust]
    G --> H[KMeans clustering]
    H --> I[Aggregate profiles]
    I --> J[Compute traits]
    J --> K[Export .pld XML]
    J --> L[Export GPKG positions]
ENVI-met 3DPLANTs from ALS Data
Introduction
This tutorial guides you through the complete workflow to derive vegetation structure information from Airborne Laser Scanning (ALS) data and export it into a format compatible with ENVI-met’s 3DPLANT system.
We cover:
- Merging and preprocessing ALS point clouds
- Generating digital terrain and surface models
- Voxelizing ALS data and calculating Leaf Area Density (LAD)
- Deriving ecological and structural metrics
- Clustering tree profiles into generalized types
- Exporting synthetic 3D tree definitions (.pld XML) and their locations (.gpkg)
Workflow Overview
Setup and Configuration
We begin by loading all required libraries and defining key input/output paths and parameters.
# Load necessary R packages
library(lidR) # for LiDAR handling
library(terra) # for raster operations
library(dplyr) # data wrangling
library(tidyr) # reshaping
library(sf) # spatial data (simple features)
library(here) # file path handling
library(XML) # export to ENVI-met XML
library(clusternomics) # clustering
...
We also define which species classes are considered valid and load a species raster map for later assignment.
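For reference, the species configuration used later in the main pipeline looks like this (paths are relative to the project root):
ts <- data.frame(
  ID = 1:12,
  value = c("agriculture", "alder", "ash", "beech", "douglas_fir", "larch",
            "oak", "pastures", "roads", "settlements", "spruce", "water")
)
valid_species <- c("alder", "ash", "beech", "douglas_fir", "larch", "oak", "spruce")
valid_ids <- ts$ID[ts$value %in% valid_species]                 # raster IDs of the valid species
species_raster <- rast("data/aerial/treespecies_cleaned.tif")   # terra SpatRaster of species classes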
ALS Preprocessing and Normalization
First, the LAS tiles are merged and their heights normalized using a DEM derived from the ground classification.
las_fn <- merge_las_tiles(...)               # Merges all LAS tiles into one
las <- readLAS(las_fn)
las <- classify_ground(las, csf(...))        # Adaptive Cloth Simulation Filter
...
The point cloud is then normalized to height above ground using normalize_height().
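A minimal sketch of this step (the interpolation algorithm is illustrative; recommend_dem_interpolation() in the source code below picks one adaptively):
dem <- rasterize_terrain(las, res = 1, algorithm = tin())   # DEM from the ground-classified points
las_norm <- normalize_height(las, algorithm = tin())        # Z becomes height above ground
summary(las_norm@data$Z)                                    # sanity check: minimum should be close to 0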
Terrain and Canopy Metrics
We generate various raster layers:
- DEM – Digital Elevation Model
- DSM – Digital Surface Model (top of canopy)
- CHM – Canopy Height Model (DSM – DEM)
- Slope, Aspect, and TPI – Topographic descriptors
These are later used to enrich each voxel with its spatial context.
slope <- terrain(dem, "slope")
CHM <- DSM - DEM
...
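The remaining descriptors can be derived with terra in the same way (a sketch; the stack and layer names are illustrative):
aspect <- terrain(dem, "aspect")
tpi    <- terrain(dem, "TPI")                      # Topographic Position Index
topo   <- c(dem, slope, aspect, tpi, CHM)          # raster stack used for voxel enrichment
names(topo) <- c("elev", "slope", "aspect", "tpi", "chm")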
Voxelization and LAD Calculation
The normalized ALS cloud is converted to voxels and LAD profiles are derived using a Beer–Lambert approach:
voxels <- preprocess_voxels(las_norm, res_xy, res_z)
lad_df <- convert_to_LAD_beer(voxels, grainsize = res_z, k = 0.3)
Explanation: The Beer–Lambert law describes the exponential attenuation of light through vegetation; inverting it lets us estimate LAD from relative voxel pulse counts. Each voxel contributes to the vertical profile according to its pulse return density and the assumed extinction coefficient.
Advanced Implementation Details
This pipeline uses a highly optimized voxel engine that pre-aggregates point returns into vertical bins based on the user-defined voxel height (res_z).
The convert_to_LAD_beer() function applies:
- Normalization by the maximum point count per column (to estimate occlusion probability)
- Beer–Lambert transformation:
\[\text{LAD}_i = -\frac{\ln(1 - \frac{N_i}{N_{\max}})}{k \cdot \Delta z}\]
Where:
- \(N_i\) = point returns in voxel \(i\)
- \(N_{\max}\) = max returns in the vertical column
- \(k\) = extinction coefficient (typically 0.3–0.5)
- \(\Delta z\) = voxel height (e.g. 2 m)
- Clipping to physical LAD limits: optional thresholds (e.g., LAD_min = 0.05, LAD_max = 3.0)
- Optional scaling: a correction factor (scale_factor) for the vegetation type
This normalization makes the approach comparatively robust across canopy densities and reduces bias from heterogeneous ALS sampling.
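As a quick numeric check of the formula (illustrative values only, not taken from the dataset):
N_i <- 40; N_max <- 100; k <- 0.3; dz <- 2   # 40 returns in a slice, densest slice holds 100
p   <- N_i / N_max                           # relative pulse fraction = 0.4
lad <- -log(1 - p) / (k * dz)                # ≈ 0.85 m²/m³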
Spatial Enrichment
Each voxel column is enriched with values from the terrain metrics (elevation, slope, CHM, etc.) and the species raster via buffered spatial extraction using the exactextractr package.
lad_df$elev <- exact_extract(dem, st_buffer(...))
...
Note: We use small buffers to ensure overlap with raster cells, especially for coarse resolutions or near-boundary voxels.
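A sketch of the buffered extraction (buffer distance and summary statistic are illustrative; lad_df is assumed to carry X/Y coordinates in EPSG:25832):
library(sf)
library(exactextractr)
pts  <- st_as_sf(lad_df, coords = c("X", "Y"), crs = 25832, remove = FALSE)
bufs <- st_buffer(pts, dist = 2)                  # small buffer around each voxel column
lad_df$elev  <- exact_extract(dem, bufs, "mean")  # mean raster value within each buffer
lad_df$slope <- exact_extract(slope, bufs, "mean")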
Structural and Ecological Metrics
From the LAD profiles, we derive additional vegetation structure descriptors:
- LAD_mean / LAD_max – Central tendency
- Skewness / Kurtosis – Shape of vertical distribution
- Entropy – Evenness / complexity of LAD
- Canopy Cover, Gap Fraction – Occupancy of upper and lower canopy
- Vertical Evenness – Shannon diversity of vertical distribution
lad_df$LAD_mean <- matrixStats::rowMeans2(lad_matrix, na.rm = TRUE)
...
Interpretation: These metrics help characterize different plant types and are key inputs to the clustering process.
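A sketch of how further metrics can be derived from a wide LAD matrix (lad_matrix, one row per voxel column, is an assumed intermediate; the thresholds are illustrative):
library(matrixStats)
library(entropy)
lad_df$LAD_max <- rowMaxs(lad_matrix, na.rm = TRUE)
lad_df$Entropy <- apply(lad_matrix, 1, function(x) {
  p <- x[!is.na(x) & x > 0]
  if (length(p) == 0) return(NA_real_)
  entropy::entropy(p / sum(p))                    # Shannon entropy of the vertical profile
})
lad_df$Canopy_Cover <- rowMeans(lad_matrix > 0.05, na.rm = TRUE)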
Clustering and Profile Aggregation
We sample the LAD space, reduce dimensionality via Principal Component Analysis (PCA), and use NbClust to find the optimal number of clusters. These clusters represent generalized tree types for ENVI-met.
pca_res <- prcomp(sample_data, scale. = TRUE)
nb <- NbClust(...)
km <- KMeans_arma(...)
PCA Concept and Purpose
PCA transforms the original high-dimensional LAD profile into a set of orthogonal axes (principal components) that capture the greatest variance in the data. This has several advantages:
- Noise reduction: Removes minor variance and unstable features
- Interpretability: Principal axes often reflect dominant structural traits
- Efficiency: Downsamples high-dimensional LAD vectors to fewer axes (e.g., 5–10 PCs)
- Clustering stability: Improves separation of groups in Euclidean space
We determine the number of components by examining explained variance and selecting a cutoff (e.g., 80%). NbClust is then applied to the PCA-reduced data to identify the optimal number of vegetation types (clusters).
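A minimal sketch of this stage (sample_data is a numeric matrix of the sampled, scaled metrics; base kmeans() stands in for ClusterR::KMeans_arma(); the NbClust settings are illustrative):
library(NbClust)
pca_res <- prcomp(sample_data, scale. = TRUE)
# Keep the smallest number of PCs explaining at least 80 % of the variance
cum_var <- cumsum(pca_res$sdev^2 / sum(pca_res$sdev^2))
n_pcs   <- which(cum_var >= 0.8)[1]
scores  <- pca_res$x[, 1:n_pcs, drop = FALSE]
# Let NbClust suggest the number of clusters, then cluster the PCA scores
nb     <- NbClust(scores, min.nc = 2, max.nc = 10, method = "kmeans")
k_best <- length(unique(nb$Best.partition))
km     <- kmeans(scores, centers = k_best, nstart = 25)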
Trait Assignment and XML Export
For each cluster, we assign:
- Species name via majority vote
- LeafThickness via lookup table
- LAI, Crown height, Max LAD, Roughness length via structural stats
Then export to .pld (ENVI-met XML format):
export_lad_to_envimet_p3d(lad_df, file_out = ..., trait_df = ...)
Note: Ensure that ENVIMET_ID is uniquely assigned and used for mapping in both XML and spatial files.
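A sketch of the majority-vote species assignment per cluster (column names follow the conventions used above):
library(dplyr)
species_by_cluster <- lad_df %>%
  filter(!is.na(species_class)) %>%
  count(cluster, species_class) %>%
  group_by(cluster) %>%
  slice_max(n, with_ties = FALSE) %>%   # the most frequent species class wins
  ungroup()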
Export Tree Locations
The x/y locations of all LAD columns are exported to a GeoPackage with their cluster ID (ENVIMET_ID) for domain placement in ENVI-met.
st_write(sf_points, output_gpkg)
Tip: These point geometries can be directly imported into ENVI-met Spaces or Generator.
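A sketch of the export (CRS and output path follow the pipeline configuration; one point per LAD column):
library(dplyr)
library(sf)
sf_points <- lad_df %>%
  distinct(X, Y, ENVIMET_ID) %>%
  st_as_sf(coords = c("X", "Y"), crs = 25832)
st_write(sf_points, output_gpkg, delete_dsn = TRUE)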
Weak Points
Component | Issue |
---|---|
preprocess_voxels() | Assumes normalized input; no fallback or validation |
convert_to_LAD_beer() | Max normalization sensitive to outliers, bias in sparse canopies |
Trait mapping | LeafThickness hard-coded; no fallback if species unknown |
Memory | lad_matrix, NbClust, full CHM are RAM-intensive steps |
XML export | .pld profiles rely on rigid class–species mapping, not trait-driven |
Conclusion
This pipeline generates realistic, location-aware synthetic vegetation objects based on ALS-derived LAD profiles, i.e. generalized structural types. The resulting .pld and .gpkg files can be used directly with the QGIS ENVI-met plugin.
ToDos
- Replacing max-based LAD normalization with transmittance calibration
- Incorporating TLS-derived QSM traits
- Using functional trait databases for LeafThickness and roughness
- Validating .pld export by reverse simulation
Source Code
Functions from new_utils.R
#' Suggest Optimal Number of Principal Components
#'
#' This function analyzes a PCA object to determine the recommended number of
#' principal components (PCs) to retain, using three common criteria:
#' - Cumulative explained variance (threshold-based)
#' - Kaiser criterion (eigenvalue > 1)
#' - Elbow method (first minimum of successive variance drops)
#'
#' Optionally, a diagnostic plot (scree and cumulative variance) is shown.
#'
#' @param pca_obj A PCA object returned by [prcomp()] or similar. Must contain `$sdev`.
#' @param variance_cutoff Numeric; cumulative variance threshold for selecting PCs
#' (default is 0.8 = 80% explained variance).
#' @param plot Logical; if `TRUE`, a scree plot and cumulative variance plot are displayed.
#'
#' @return A list containing:
#' \describe{
#' \item{n_pcs}{Number of PCs recommended based on the variance threshold.}
#' \item{explained_variance}{Number of PCs needed to reach the variance cutoff.}
#' \item{kaiser}{Number of PCs with eigenvalue > 1 (Kaiser criterion).}
#' \item{elbow}{Position of elbow point (first minimal drop in explained variance).}
#' \item{info_table}{A summary table showing the values for each criterion.}
#' }
#'
#' @details
#' - **Cumulative Variance Threshold**: Retain the smallest number of components such that
#' the cumulative proportion of explained variance meets or exceeds `variance_cutoff`.
#'
#' - **Kaiser Criterion**: Retain all components with eigenvalues greater than 1. Assumes
#' data has been standardized (mean-centered and scaled). See Kaiser (1960).
#'
#' - **Elbow Method**: Finds the index where the decrease in explained variance is smallest,
#' i.e., where the "knee" or "elbow" appears in the scree plot.
#'
#' @references
#' - Jolliffe, I. T. (2002). *Principal Component Analysis*. Springer Series in Statistics.
#' - Kaiser, H. F. (1960). The application of electronic computers to factor analysis.
#' *Educational and Psychological Measurement*, 20(1), 141–151.
#' - Cattell, R. B. (1966). The scree test for the number of factors. *Multivariate Behavioral Research*, 1(2), 245–276.
#'
#' @examples
#' pca <- prcomp(USArrests, scale. = TRUE)
#' suggest_n_pcs(pca, variance_cutoff = 0.9)
#'
#' @export
suggest_n_pcs <- function(pca_obj, variance_cutoff = 0.8, plot = TRUE) {
  # Extract standard deviations of the principal components
  std_dev <- pca_obj$sdev

  # Compute proportion of variance explained by each PC
  var_explained <- std_dev^2 / sum(std_dev^2)

  # Compute cumulative explained variance
  cum_var_explained <- cumsum(var_explained)

  # Criterion 1: Number of components needed to reach variance_cutoff
  n_var <- which(cum_var_explained >= variance_cutoff)[1]

  # Criterion 2: Kaiser criterion – eigenvalue > 1
  eigenvalues <- std_dev^2
  n_kaiser <- sum(eigenvalues > 1)

  # Criterion 3: Elbow method – where decrease in explained variance flattens
  diffs <- diff(var_explained)
  elbow <- which.min(diffs)[1]

  # Assemble criterion comparison table
  info_table <- data.frame(
    Criterion = c("Variance Cutoff", "Kaiser Criterion", "Elbow Method"),
    Num_Components = c(n_var, n_kaiser, elbow),
    Cumulative_Explained_Variance = c(
      round(cum_var_explained[n_var], 3),
      round(cum_var_explained[n_kaiser], 3),
      round(cum_var_explained[elbow], 3)
    )
  )

  # Final recommendation is based on variance_cutoff only (can be modified as needed)
  n_final <- n_var

  # Print summary
  cat("📊 Summary of PCA Component Selection Criteria:\n")
  print(info_table, row.names = FALSE)
  cat("\n✅ Recommended number of PCs (based on variance_cutoff =", variance_cutoff, "):", n_final, "\n")

  # Optional plots
  if (plot) {
    par(mfrow = c(1, 2))

    # Scree plot: individual variance explained
    plot(var_explained, type = "b", pch = 19, col = "steelblue",
         xlab = "Component", ylab = "Explained Variance",
         main = "Scree Plot")
    abline(h = 1, col = "red", lty = 2)           # Kaiser line
    abline(v = elbow, col = "darkgreen", lty = 3) # Elbow marker
    legend("topright", legend = c("Kaiser (λ > 1)", "Elbow"),
           col = c("red", "darkgreen"), lty = c(2, 3), bty = "n")

    # Cumulative variance plot
    plot(cum_var_explained, type = "b", pch = 19, col = "darkorange",
         xlab = "Component", ylab = "Cumulative Variance",
         main = "Cumulative Explained Variance")
    abline(h = variance_cutoff, col = "red", lty = 2)
    abline(v = n_var, col = "blue", lty = 3)
    legend("bottomright", legend = c("Cutoff", "Selected Components"),
           col = c("red", "blue"), lty = c(2, 3), bty = "n")

    par(mfrow = c(1, 1)) # reset plotting layout
  }

  # Return results silently for use in pipelines
  invisible(list(
    n_pcs = n_final,
    explained_variance = n_var,
    kaiser = n_kaiser,
    elbow = elbow,
    info_table = info_table
  ))
}
#' Split Z Coordinates into Vertical Slices and Count Points per Slice
#'
#' This function takes a vector of Z-coordinates (heights) and bins them into
#' 1-meter horizontal slices. It returns the count of points in each slice, ensuring that
#' all slices from 0 to `maxZ` are represented, even if some slices have zero points.
#'
#' @param Z A numeric vector of Z coordinates (e.g., heights of LiDAR points in meters).
#' @param maxZ Integer; the maximum height to consider (defines the highest slice boundary).
#'
#' @return A named list containing point counts per 1-meter height slice. The names are
#' formatted for clarity (e.g., `"ground_0_1m"`, `"pulses_1_2m"`, …).
#'
#' @details
#' - This is a foundational step in computing vertical vegetation structure such as
#' Leaf Area Density (LAD) profiles.
#' - The slicing assumes a 1-meter vertical resolution and bins by floor(Z).
#' - Empty slices (no points) are included with count 0 to preserve structure for later matrix assembly.
#'
#' @examples
#' z_vals <- runif(1000, 0, 20)
#' pointsByZSlice(z_vals, maxZ = 20)
#'
#' @export
pointsByZSlice <- function(Z, maxZ) {
  # Floor Z-values to get integer bin index (0-based)
  heightSlices <- as.integer(Z)

  # Create data.table for potential grouping (not used further here)
  zSlice <- data.table::data.table(Z = Z, heightSlices = heightSlices)

  # Count number of points per height slice using base aggregate
  sliceCount <- stats::aggregate(list(V1 = Z), list(heightSlices = heightSlices), length)
  sliceCount$V1 <- as.numeric(sliceCount$V1) # Ensure numeric (not integer or factor)

  # Ensure all expected slice bins [0, maxZ] exist (fill with 0 if missing)
  colRange <- 0:maxZ
  missing <- colRange[!colRange %in% sliceCount$heightSlices]
  if (length(missing) > 0) {
    fill <- data.frame(heightSlices = missing, V1 = as.numeric(0))
    sliceCount <- rbind(sliceCount, fill)
  }

  # Order slices from bottom to top
  sliceCount <- sliceCount[order(sliceCount$heightSlices), ]

  # Create readable column names for each slice
  colNames <- as.character(sliceCount$heightSlices)
  colNames[1] <- "ground_0_1m" # Name for the lowest bin
  colNames[-1] <- paste0("pulses_", sliceCount$heightSlices[-1], "_", sliceCount$heightSlices[-1] + 1, "m")

  # Create named list of metrics
  metrics <- list()
  metrics[colNames] <- sliceCount$V1

  return(metrics)
}
#' Recommend DEM Interpolation Method Based on Ground Point Quality
#'
#' This function analyzes a LAS object or LAS file and recommends an appropriate interpolation
#' method (`tin()`, `knnidw()`, or `kriging()`) for `lidR::rasterize_terrain()` based on
#' ground point density, ratio, and nearest-neighbor distance.
#'
#' @param las A LAS object or character path to a .las/.laz file.
#' @param res Numeric. Raster resolution (in meters) for ground point density estimation. Default is 1.
#' @param verbose Logical. If TRUE, prints diagnostic information. Default is TRUE.
#'
#' @return A character string with the recommended interpolation function (e.g., `"tin()"`)
#'
#' @details
#' This function implements a rule-based scoring system to select an appropriate terrain
#' interpolation algorithm for `lidR::rasterize_terrain()`. The recommendation is based on:
#'
#' \itemize{
#' \item Ground point ratio (percentage of points classified as ground)
#' \item Mean ground point density (pts/m²)
#' \item Mean nearest-neighbor distance between ground points (meters)
#' }
#'
#' Depending on these indicators, one of the following interpolation algorithms is suggested:
#' \describe{
#' \item{\code{"tin()"}}{Recommended when ground point distribution is dense and regular.}
#' \item{\code{"knnidw(k = 6, p = 2)"}}{Used under intermediate conditions with moderate density.}
#' \item{\code{"kriging(k = 10)"}}{Recommended for sparse or clustered ground points.}
#' }
#'
#' This approach follows best practices from airborne LiDAR filtering literature, including:
#' \itemize{
#' \item Zhang et al. (2016): Cloth Simulation Filtering (CSF) – \doi{10.3390/rs8060501}
#' \item Ma et al. (2025): Partitioned Cloth Simulation Filtering (PCSF) – \doi{10.3390/forests16071179}
#' \item Chen et al. (2024): Adaptive DEM filtering with roughness-based interpolation – \url{https://www.sciencedirect.com/science/article/pii/S0924271624002636}
#' }
#'
#' These studies suggest that interpolation performance depends strongly on the spatial characteristics
#' of ground point clouds, especially in forested terrain. The chosen metrics are commonly used to
#' quantify LiDAR completeness and ground visibility.
#'
#' @seealso \code{\link[lidR]{rasterize_terrain}}, \code{\link[lidR]{filter_ground}}, \code{\link[lidR]{grid_density}}
#'
#' @examples
#' \dontrun{
#' las <- readLAS("data/las/forest_tile.las")
#' method <- recommend_dem_interpolation(las, res = 1)
#' dem <- rasterize_terrain(las, res = 1, algorithm = eval(parse(text = method)))
#' }
#'
#' @importFrom lidR readLAS filter_ground grid_density
#' @importFrom RANN nn2
#' @export
recommend_dem_interpolation <- function(las, res = 1, verbose = TRUE) {
  if (inherits(las, "character")) las <- readLAS(las)
  if (is.empty(las)) stop("LAS file is empty or invalid")

  ground <- filter_ground(las)

  cls_tab <- table(las@data$Classification)
  n_total <- sum(cls_tab)
  n_ground <- if ("2" %in% names(cls_tab)) cls_tab["2"] else 0
  ground_pct <- 100 * as.numeric(n_ground) / n_total

  density_map <- grid_density(ground, res = res)
  density_vals <- values(density_map)
  density_vals <- density_vals[!is.na(density_vals)]
  mean_density <- if (length(density_vals) > 0) mean(density_vals) else 0

  if (nrow(ground@data) >= 2) {
    xy <- ground@data[, c("X", "Y")]
    nn_dist <- RANN::nn2(xy, k = 2)$nn.dists[, 2]
    mean_nn <- mean(nn_dist)
  } else {
    mean_nn <- Inf
  }

  score <- 0
  if (ground_pct > 30) score <- score + 1 else if (ground_pct < 10) score <- score - 1
  if (mean_density > 1) score <- score + 1 else if (mean_density < 0.3) score <- score - 1
  if (mean_nn < 1.5) score <- score + 1 else if (mean_nn > 3) score <- score - 1

  method <- if (score >= 2) {
    "tin()"
  } else if (score <= -1) {
    "kriging(k = 10)"
  } else {
    "knnidw(k = 6, p = 2)"
  }

  if (verbose) {
    message(sprintf("📊 Ground point ratio: %.1f%%", ground_pct))
    message(sprintf("📊 Mean ground density: %.2f pts/m²", mean_density))
    message(sprintf("📊 Mean NN distance: %.2f m", mean_nn))
    message(sprintf("✅ Recommended method: %s", method))
  }

  return(method)
}
#' Merge LAS/LAZ tiles into a single file using `lidR::catalog_retile`
#'
#' This function merges LAS/LAZ tiles from a directory into a single file.
#' Internally, it loads the directory as a `LAScatalog`, sets chunking to cover the entire extent,
#' and writes a single merged `.laz` or `.las` file. Uses parallel processing if desired.
#'
#' @param tile_dir Path to directory containing LAS/LAZ tiles (or a single LAS/LAZ file).
#' @param output_file Full path to the merged output file (default: `"merged.laz"`).
#' @param chunk_size Optional internal chunking for processing (default: `10000` m).
#' @param workers Number of parallel workers (default: `4`).
#'
#' @return Character string path to the created merged `.laz` file.
#'
#' @examples
#' \dontrun{
#' merge_las_tiles("tiles/", "merged.laz", workers = 6)
#' }
#'
#' @export
merge_las_tiles <- function(tile_dir,
                            output_file = "merged.laz",
                            chunk_size = 10000,
                            workers = 4) {
  if (!dir.exists(tile_dir) && !file.exists(tile_dir)) {
    stop("Input tile directory or file does not exist.")
  }

  library(lidR)
  library(future)

  set_lidr_threads(workers)
  future::plan(multisession, workers = workers)

  ctg <- readLAScatalog(tile_dir)
  opt_chunk_size(ctg) <- chunk_size
  opt_chunk_buffer(ctg) <- 0
  opt_output_files(ctg) <- sub("\\.la[sz]$", "", output_file)

  message("Merging tiles into: ", output_file)
  catalog_retile(ctg) # This writes the file

  # Return final path with correct extension
  merged_path <- paste0(sub("\\.la[sz]$", "", output_file), ".las")
  return(merged_path)
}
#' Retile a LAS/LAZ file into regular tiles
#'
#' This function splits a large LAS/LAZ file into regular square tiles using `lidR::catalog_retile`.
#' It supports parallel processing and optional compression.
#'
#' @param input_file Path to the input LAS/LAZ file.
#' @param out_dir Directory to write the resulting tiles (default: `"tiles/"`).
#' @param chunk_size Numeric. Tile size in meters (default: `250`).
#' @param output_ext File extension of output tiles: either `"laz"` or `"las"` (default: `"laz"`).
#' @param buffer Buffer size between tiles, in meters (default: `0`, i.e. no overlap).
#' @param workers Number of parallel workers for processing (default: `4`).
#'
#' @return A `LAScatalog` object referencing the tiled files.
#'
#' @examples
#' \dontrun{
#' retile_las("data/input.las", out_dir = "tiles/", chunk_size = 200)
#' }
#'
#' @export
retile_las <- function(input_file,
                       out_dir = "tiles/",
                       chunk_size = 250,
                       output_ext = "laz",
                       buffer = 0,
                       workers = 4) {
  if (!file.exists(input_file)) stop("Input file not found.")
  if (!output_ext %in% c("laz", "las")) stop("Invalid extension: use 'laz' or 'las'.")

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  library(lidR)
  library(future)

  # Enable parallel processing
  set_lidr_threads(workers)
  future::plan(multisession, workers = workers)

  # Read LAS file as a catalog
  ctg <- readLAScatalog(input_file)
  opt_laz_compression(ctg) <- (output_ext == "laz")
  opt_chunk_size(ctg) <- chunk_size
  opt_chunk_buffer(ctg) <- buffer
  opt_output_files(ctg) <- file.path(out_dir, paste0("tile_{XLEFT}_{YBOTTOM}.", output_ext))

  message("Tiling ", input_file, " into ", out_dir, " with chunk size ", chunk_size, " m")
  tiled_ctg <- catalog_retile(ctg)

  return(tiled_ctg)
}
#' Preprocess ALS/TLS Point Cloud into Voxel Slice Counts
#'
#' This function filters a normalized LAS point cloud to a maximum height (`zmax`),
#' splits the points into horizontal Z-slices (via `pointsByZSlice()`),
#' and computes slice-wise point counts per XY pixel using `lidR::pixel_metrics()`.
#' The result is a flat data frame combining X/Y coordinates and vertical slice counts.
#'
#' @param normlas A normalized LAS object (e.g., output from `normalize_height()`) containing ground-aligned Z values.
#' @param res_xy Numeric. The horizontal resolution (in meters) of the XY voxel grid. Default is 2 m.
#' @param res_z Numeric. The vertical resolution (in meters) used for binning points into Z slices. Default is 2 m.
#' This is passed indirectly to `pointsByZSlice()` via `zmax`.
#' @param zmax Numeric. The maximum height (in meters) to include for voxelization. Points above this value are excluded.
#'
#' @return A data frame where each row corresponds to an XY voxel column and contains:
#' \describe{
#' \item{X, Y}{The center coordinates of the pixel (voxel column base).}
#' \item{ground_0_1m, pulses_1_2m, ...}{Point counts per vertical slice from `pointsByZSlice()`.}
#' }
#'
#' @details
#' This function prepares voxel-based vertical profiles from a normalized LAS point cloud,
#' which is a common preprocessing step in vegetation structure analysis, such as for:
#' - Estimating Leaf Area Density (LAD)
#' - Building 3D vegetation models (e.g., for ENVI-met)
#' - Computing light extinction or aerodynamic roughness from LiDAR data
#'
#' The function performs the following steps:
#'
#' 1. **Z-Filtering**: Points are restricted to the height interval `[0, zmax]` to exclude
#' noise (e.g., below ground level) and irrelevant outliers (e.g., birds, clouds).
#'
#' 2. **Safety Check for Empty Point Cloud**: If filtering removes all points, the function
#' returns `NULL` to avoid errors in later processing stages.
#'
#' 3. **Dynamic Vertical Binning Limit**: The actual maximum height (`maxZ`) is computed as
#' the minimum between the highest Z-value and the user-defined `zmax`. This ensures the
#' binning range reflects both the data and physical modeling limits.
#'
#' 4. **Per-Pixel Vertical Slice Metrics**: Using `lidR::pixel_metrics()`, the function applies
#' a custom-defined metric — `pointsByZSlice(Z, maxZ)` — to each XY cell. This splits the
#' vertical column above each pixel into 1-meter height bins (Z-slices) and counts the
#' number of points in each slice. Empty bins are filled with 0 to ensure uniform output.
#'
#' 5. **Raster Geometry to XY Coordinates**: The function extracts the centroid (X, Y) of each
#' pixel cell using `terra::xyFromCell()` so that slice metrics can be mapped spatially.
#'
#' 6. **Output Formatting**: The final result is a flat data frame where each row represents
#' a voxel column. It includes the X/Y coordinate and point counts for each Z-slice,
#' formatted with descriptive column names like `"ground_0_1m"`, `"pulses_2_3m"`, etc.
#'
#' This regularized output is designed to be compatible with downstream modeling frameworks
#' (e.g., Beer–Lambert LAD computation, ENVI-met's 3DPLANT input, or machine learning models).
#' It bridges the gap between unstructured 3D point clouds and gridded model inputs.
#'
#' @note The function assumes the input LAS object is already normalized to ground level
#' (i.e., Z = 0 corresponds to terrain surface). Use `normalize_height()` beforehand if needed.
#'
#' @seealso
#' [lidR::pixel_metrics()],
#' [lidR::voxel_metrics()],
#' [pointsByZSlice()],
#' [normalize_height()],
#' [terra::xyFromCell()]
#'
#' @examples
#' \dontrun{
#' las <- lidR::readLAS("path/to/normalized.laz")
#' voxel_df <- preprocess_voxels(las, res_xy = 2, res_z = 2, zmax = 40)
#' head(voxel_df)
#' }
#'
#' @export
preprocess_voxels <- function(normlas, res_xy = 2, res_z = 2, zmax = 40) {
  # Assign LAS to working variable for clarity
  las <- normlas

  # Step 1: Filter points to a valid height range [0, zmax]
  las <- filter_poi(las, Z >= 0 & Z <= zmax)

  # Step 2: Check if the LAS object is empty after filtering
  if (lidR::is.empty(las)) return(NULL)

  # Step 3: Define maximum vertical bin based on actual max Z (floored), capped at zmax
  maxZ <- min(floor(max(las@data$Z)), zmax)

  # Step 4: Define metric function for pixel-wise Z-slice counting
  func <- formula(paste0("~pointsByZSlice(Z, ", maxZ, ")"))

  # Step 5: Compute per-pixel vertical structure using custom Z-slice function
  voxels <- pixel_metrics(las, func, res = res_xy)

  # Step 6: Extract the X/Y centroid coordinates of each pixel cell
  xy <- terra::xyFromCell(voxels, seq_len(ncell(voxels)))

  # Step 7: Extract Z-slice point counts from each pixel cell
  vals <- terra::values(voxels)

  # Step 8: Combine XY coordinates with slice values into a data frame
  df <- cbind(xy, vals)
  colnames(df)[1:2] <- c("X", "Y") # Rename coordinate columns

  return(df)
}
#' Convert Vertical Pulse Counts to LAD Using the Beer–Lambert Law
#'
#' This function applies the Beer–Lambert law to vertical LiDAR pulse count profiles
#' (from voxelized ALS/TLS data) to estimate Leaf Area Density (LAD) per vertical slice.
#' It normalizes the pulse density, corrects edge cases, applies the extinction formula,
#' and scales or clips LAD values to stay within biologically plausible ranges.
#'
#' @param df A data frame containing columns with pulse counts per Z-slice. These columns must be named with the `"pulses_"` prefix, as produced by `pointsByZSlice()` and `preprocess_voxels()`.
#' @param grainsize Numeric. Vertical resolution of each slice (in meters). Default is 2 m.
#' @param k Numeric. Extinction coefficient (typically between 0.3 and 0.5); default is 0.3.
#' @param scale_factor Numeric. Scaling factor applied after Beer–Lambert transformation to adjust LAD magnitude (default: 1.2).
#' @param lad_max Numeric or `NULL`. Maximum LAD value allowed (default: 2.5). Use `NULL` to disable.
#' @param lad_min Numeric or `NULL`. Minimum LAD value allowed (default: 0.05). Use `NULL` to disable.
#' @param keep_pulses Logical. If `TRUE`, the original `"pulses_"` columns are retained in the output. Default is `FALSE`.
#'
#' @return A data frame with the same structure as input, but with new `"lad_"` columns
#' for each original `"pulses_"` column, containing the estimated LAD values.
#' Original pulse columns are optionally removed.
#'
#' @details
#' The Beer–Lambert law models **light attenuation** through a medium such as foliage. In the context of LiDAR,
#' this relationship is inverted to estimate **leaf area density** from the relative decrease in returned pulses:
#'
#' \deqn{
#' LAD = -\frac{\log(1 - p)}{k \cdot \Delta z}
#' }
#'
#' where:
#' - \( p \) is the normalized proportion of pulses in a given voxel column and slice
#' - \( k \) is the extinction coefficient (typically 0.3–0.5)
#' - \( \Delta z \) is the vertical resolution of the slice (grainsize)
#'
#' To avoid mathematical issues:
#' - Values of \( p \geq 1 \) are clipped to 0.9999
#' - Values of \( p \leq 0 \) are clipped to 1e-5
#'
#' A **scaling factor** can be applied to tune the LAD magnitude (empirical correction).
#' LAD values are optionally clipped to a maximum and minimum for ecological realism or
#' to avoid over-saturation artifacts in further modeling (e.g., radiative transfer, ENVI-met input).
#'
#' This approach assumes:
#' - Pulse counts per slice are proportional to occlusion probability
#' - Normalization by column maximum represents local beam extinction well enough
#' - The LiDAR pulse distribution is representative of foliage density (most valid in leaf-on conditions)
#'
#' @note
#' - For ALS data, this method provides an **approximate LAD profile** suitable for modeling but
#' not physically exact. For more accurate LAD estimation, full waveform or TLS data is preferred.
#' - Normalization by max pulse count assumes that the densest slice corresponds to near-complete attenuation.
#' This introduces uncertainty if canopy gaps dominate the scene.
#'
#' @seealso
#' [preprocess_voxels()], [pointsByZSlice()], [lidR::voxel_metrics()], [terra::rast()]
#'
#' @examples
#' \dontrun{
#' df_voxels <- preprocess_voxels(las)
#' df_lad <- convert_to_LAD_beer(df_voxels, grainsize = 2, k = 0.3)
#' head(df_lad)
#' }
#'
#' @export
convert_to_LAD_beer <- function(df, grainsize = 2, k = 0.3, scale_factor = 1.2,
                                lad_max = 2.5, lad_min = 0.000, keep_pulses = FALSE) {
  df_lad <- as.data.frame(df)

  # Find all columns containing vertical pulse count data
  pulse_cols <- grep("^pulses_", names(df_lad), value = TRUE)

  # Iterate over each pulse column (i.e., per Z-slice)
  for (col in pulse_cols) {
    # Construct name for LAD output column
    lad_col <- paste0("lad_", col)

    # Normalize pulse density per column (relative to max)
    p_rel <- df_lad[[col]] / max(df_lad[[col]], na.rm = TRUE)

    # # Avoid values of 0 or 1 that would break log(1 - p)
    # p_rel[p_rel >= 1] <- 0.9999
    # p_rel[p_rel <= 0] <- 1e-5

    # Apply Beer–Lambert transformation to estimate LAD
    # lad_vals <- -log1p(1 - p_rel) / (k * grainsize)
    lad_vals <- -log1p(-p_rel) / (k * grainsize)

    # Apply empirical scale factor to adjust LAD magnitude
    lad_vals <- lad_vals * scale_factor

    # Enforce maximum and minimum LAD values (if set)
    if (!is.null(lad_max)) {
      lad_vals <- pmin(lad_vals, lad_max)
    }
    if (!is.null(lad_min)) {
      lad_vals <- pmax(lad_vals, lad_min)
    }

    # Store LAD values in new column
    df_lad[[lad_col]] <- lad_vals

    # Optionally remove original pulse column
    if (!keep_pulses) {
      df_lad[[col]] <- NULL
    }
  }

  return(df_lad)
}
#' Compute Canopy Traits from Long-Format LAD Profiles
#'
#' This function calculates key vegetation structure metrics from long-format
#' Leaf Area Density (LAD) profiles, grouped by `ENVIMET_ID`. These traits
#' include Leaf Area Index (LAI), maximum LAD, crown height, total height,
#' leaf thickness, and aerodynamic roughness length, all of which are relevant
#' for canopy modeling (e.g., in ENVI-met).
#'
#' @param lad_df A long-format data frame containing LAD values per voxel slice.
#' Must include at least `ENVIMET_ID`, `lad`, `z`, and optionally `Height_CHM`, `LeafThickness`, and `cluster`.
#' @param res_z Vertical voxel resolution (in meters). Default is 2 m.
#' @param LAD_cutoff Minimum LAD threshold for crown detection and trait computation (default: 0.05).
#' @param leaf_thickness_df Optional data frame mapping `cluster` to `LeafThickness` values.
#' @param roughness_fun A function to derive aerodynamic roughness length from total height.
#' Defaults to `height * 0.1`, but can be replaced with empirical models.
#' @param plantclass_prefix Character string used to label the plant class, typically `"Tree"` or `"Plant"`.
#'
#' @return A data frame with one row per `ENVIMET_ID` and the following columns:
#' \describe{
#' \item{ENVIMET_ID}{Unique identifier for the plant object (voxel column or point).}
#' \item{LAI}{Leaf Area Index (sum of LAD across vertical profile, clipped at 95th percentile).}
#' \item{MaxLAD}{Maximum LAD value (95th percentile capped at 3).}
#' \item{CrownHeight}{Height (in res_z units) of topmost voxel with LAD above cutoff.}
#' \item{Height}{Estimated total plant height, based on CHM or max Z.}
#' \item{LeafThickness}{Estimated or provided leaf thickness (from lookup table or column).}
#' \item{LADcutoff}{The LAD threshold used for crown height filtering.}
#' \item{RoughnessLength}{Estimated aerodynamic roughness length (via `roughness_fun`).}
#' \item{plantclass}{String ID used for matching plant profile in simulations.}
#' }
#'
#' @details
#' This function summarizes vertical LAD profiles (from ALS or TLS data) into trait values
#' that are essential for microclimate or ecological modeling. The key components:
#'
#' - **LAI**: Computed as the sum of LAD values per profile, excluding extreme values above
#' the 95th percentile to avoid waveform or outlier artifacts.
#'
#' - **MaxLAD**: The 95th percentile LAD value, capped at 3 to avoid unphysical spikes.
#'
#' - **CrownHeight**: The highest voxel (Z/res_z) with LAD above the cutoff threshold, interpreted
#' as the vertical extent of the crown (not the total plant height).
#'
#' - **Height**: Either taken from an external `Height_CHM` column (if provided), or estimated
#' as the maximum Z value in the profile × `res_z`.
#'
#' - **LeafThickness**: Optionally joined from a lookup table (`leaf_thickness_df`) using `cluster`.
#'
#' - **RoughnessLength**: Estimated from height using a customizable function (default: 10% of height).
#'
#' This function ensures compatibility with ENVI-met’s 3DPLANT system, where LAI, MaxLAD, and
#' CrownHeight directly map to physical vegetation parameters in `.pld` files.
#'
#' @note
#' - CrownHeight is returned in voxel units (Z index × res_z); it is not a biologically precise
#' crown base height but an upper limit used for model placement.
#' - The input `lad_df` must be long-format, with rows representing individual vertical slices per plant.
#'
#' @seealso
#' [convert_to_LAD_beer()], [export_lad_to_envimet_p3d()], [normalize_height()], [rasterize_canopy()]
#'
#' @examples
#' \dontrun{
#' traits_df <- compute_traits_from_lad(lad_long_df, res_z = 2)
#' head(traits_df)
#' }
#'
#' @export
compute_traits_from_lad <- function(lad_df, res_z = 2, LAD_cutoff = 0.05,
                                    leaf_thickness_df = NULL,
                                    roughness_fun = function(height) height * 0.1,
                                    plantclass_prefix = "Tree") {
  # Optional: merge leaf thickness values if available
  if (!is.null(leaf_thickness_df)) {
    lad_df <- lad_df %>%
      left_join(leaf_thickness_df %>% select(cluster, LeafThickness), by = "cluster")
  }

  # Split by ENVIMET_ID (each plant or tree column)
  groups <- split(lad_df, lad_df$ENVIMET_ID)

  # Trait calculation for each plant object
  result_list <- lapply(groups, function(group_df) {

    # LAD filtering: keep values under 95th percentile and above cutoff
    lad_95 <- quantile(group_df$lad, 0.95, na.rm = TRUE)
    lad_filtered <- group_df$lad[group_df$lad <= lad_95 & group_df$lad > LAD_cutoff]

    # LAI: sum of LAD (unit is m²/m² if LAD is in m²/m³ × height slice)
    lai <- sum(lad_filtered, na.rm = TRUE)

    # MaxLAD: 95th percentile capped at a max of 3
    max_lad <- min(lad_95, 3)

    # Crown height: highest voxel slice above LAD threshold
    if (any(group_df$lad > LAD_cutoff, na.rm = TRUE)) {
      crown_z <- max(group_df$z[group_df$lad > LAD_cutoff] / res_z, na.rm = TRUE)
      crown_height <- crown_z
    } else {
      crown_height <- NA
    }

    # Total plant height: prefer CHM column, fall back to max Z
    height_value <- if ("Height_CHM" %in% names(group_df) && !all(is.na(group_df$Height_CHM))) {
      unique(group_df$Height_CHM)[1]
    } else {
      max(group_df$z, na.rm = TRUE) * res_z
    }

    # Leaf thickness: taken from joined lookup or left NA
    lt <- if ("LeafThickness" %in% names(group_df)) {
      unique(group_df$LeafThickness)[1]
    } else {
      NA
    }

    # Return one row of traits
    data.frame(
      ENVIMET_ID = unique(group_df$ENVIMET_ID),
      LAI = lai,
      MaxLAD = max_lad,
      CrownHeight = crown_height,
      Height = height_value,
      LeafThickness = lt,
      LADcutoff = LAD_cutoff,
      RoughnessLength = roughness_fun(height_value),
      plantclass = paste0(plantclass_prefix, "_", unique(group_df$ENVIMET_ID))
    )
  })

  # Combine all into one data frame
  traits_df <- do.call(rbind, result_list)

  # Remove rows with missing LAI (empty plant objects)
  traits_df <- traits_df[!is.na(traits_df$LAI), ]

  return(traits_df)
}
#' Export Clustered LAD Profiles to ENVI-met PLANT3D (.pld) XML Format
#'
#' This function exports vertical LAD profiles (e.g., from TLS or ALS clustering) to
#' ENVI-met-compatible 3DPLANT XML (.pld) files. Each profile is mapped to a species definition,
#' associated with a seasonal LAI and blossom cycle, and written in sparse matrix format.
#'
#' @param lad_df A long-format data frame with LAD values per voxel slice and per plant.
#' Must contain columns `ENVIMET_ID`, `z`, and `lad`. Optionally includes `species_class`.
#' @param file_out Output file path for the `.pld` XML file. Default is `"tls_envimet_tree.pld"`.
#' @param res_z Vertical resolution of LAD slices (in meters). Default is 1 m.
#' @param trait_df Optional data frame of additional plant traits (unused here but compatible for future use).
#'
#' @return Writes an XML file to disk in ENVI-met PLANT3D format and prints confirmation to console.
#'
#' @details
#' This function is used to convert voxelized and clustered LAD profiles derived from TLS/ALS into
#' XML `.pld` files for use in ENVI-met's 3DPLANT vegetation modeling system.
#'
#' For each unique `ENVIMET_ID` (typically corresponding to a tree column), a separate `<PLANT3D>` block
#' is generated. Each plant object is assigned:
#' - A default or class-based species name (e.g., `"Fagus sylvatica"`)
#' - A height based on the number of vertical LAD layers
#' - LAD values in sparse 3D format
#' - A monthly season and blossom profile (12 months)
#'
#' The function uses a predefined lookup table (`custom_profiles`) for assigning seasonal LAI and blossom
#' curves to known species. Unmapped species default to generic "broadleaf" or "conifer" profiles.
#'
#' Species-specific traits such as `Albedo`, `Transmittance`, `RootDiameter`, and `LeafType` are hard-coded
#' per class ID but can be extended as needed.
#'
#' @note
#' - `species_class` must be numeric and match predefined mappings. Unknown classes fall back to a default.
#' - LAD values are averaged across horizontal slices (X/Y = 1) and rounded to 5 decimals.
#' - The LAD format is `"sparematrix-3D"` with only z-direction layers, suitable for columnar tree shapes.
#'
#' @seealso
#' [compute_traits_from_lad()], [convert_to_LAD_beer()], [ENVI-met documentation on PLANT3D profiles]
#'
#' @examples
#' \dontrun{
#' export_lad_to_envimet_p3d(lad_df = my_lad_data, file_out = "trees.pld", res_z = 1)
#' }
#'
#' @export
export_lad_to_envimet_p3d <- function(lad_df, file_out = "tls_envimet_tree.pld", res_z = 1, trait_df = NULL) {
  # --- Define seasonal LAI/blossom profiles per species ---
  season_profiles <- list(
    broadleaf = list(
      Season = c(0.3, 0.3, 0.3, 0.4, 0.7, 1, 1, 1, 0.8, 0.6, 0.3, 0.3),
      Blossom = c(0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0, 0)
    ),
    conifer = list(
      Season = rep(0.8, 12),
      Blossom = rep(0, 12)
    )
  )

  # --- Assign species-specific profiles if known ---
  custom_profiles <- list(
    "Fagus sylvatica" = season_profiles$broadleaf,
    "Alnus glutinosa" = season_profiles$broadleaf,
    "Fraxinus excelsior" = season_profiles$broadleaf,
    "Pseudotsuga menziesii" = season_profiles$conifer,
    "Larix decidua" = season_profiles$broadleaf,
    "Quercus robur" = season_profiles$broadleaf,
    "Picea abies" = list(Season = rep(0.75, 12), Blossom = rep(0, 12))
  )

  # --- Clean input and encode Z-layer index ---
  lad_df <- lad_df[!is.na(lad_df$lad), ]
  z_map <- setNames(seq_along(sort(unique(lad_df$z))), sort(unique(lad_df$z)))
  lad_df$k <- z_map[as.character(lad_df$z)]
  lad_df$lad_value <- round(lad_df$lad, 5)

  # --- Initialize XML document ---
  tree_ids <- unique(lad_df$ENVIMET_ID)
  now <- format(Sys.time(), "%Y-%m-%dT%H:%M:%S")
  root <- newXMLNode("ENVI-MET_Datafile")
  header_node <- newXMLNode("Header")
  addChildren(header_node, newXMLNode("filetype", "DATA"))
  addChildren(header_node, newXMLNode("version", "1"))
  addChildren(header_node, newXMLNode("revisiondate", now))
  addChildren(header_node, newXMLNode("remark", "Clustered TLS-based trees"))
  addChildren(header_node, newXMLNode("fileInfo", "Clustered LAD Trees"))
  addChildren(header_node, newXMLNode("checksum", "32767"))
  addChildren(header_node, newXMLNode("encryptionlevel", "1699612"))
  addChildren(root, header_node)

  # --- Iterate over all unique tree profiles ---
  for (id in tree_ids) {
    tree_df <- lad_df[lad_df$ENVIMET_ID == id, ]

    # Average LAD across XY slices per height layer
    profile <- tree_df %>%
      group_by(z = k) %>%
      summarise(lad_value = mean(lad_value, na.rm = TRUE), .groups = "drop")

    # Define voxel profile parameters
    zlayers <- max(profile$z)
    dataI <- 1
    dataJ <- 1
    Height <- zlayers * res_z

    # --- Default plant parameters ---
    name <- "Fagus sylvatica"; albedo <- 0.18; trans <- 0.30; root_d <- 4.5; leaf_type <- 1

    # --- Override species traits if species_class is available ---
    if (!all(is.na(tree_df$species_class))) {
      class_val <- na.omit(unique(tree_df$species_class))[1]
      if (class_val == 2) { name <- "Alnus glutinosa"; albedo <- 0.18; trans <- 0.35; root_d <- 3.5; leaf_type <- 1 }
      else if (class_val == 3) { name <- "Fraxinus excelsior"; albedo <- 0.19; trans <- 0.38; root_d <- 4.0; leaf_type <- 1 }
      else if (class_val == 4) { name <- "Fagus sylvatica"; albedo <- 0.18; trans <- 0.30; root_d <- 4.5; leaf_type <- 1 }
      else if (class_val == 5) { name <- "Pseudotsuga menziesii"; albedo <- 0.20; trans <- 0.18; root_d <- 4.2; leaf_type <- 2 }
      else if (class_val == 6) { name <- "Larix decidua"; albedo <- 0.23; trans <- 0.25; root_d <- 4.0; leaf_type <- 2 }
      else if (class_val == 7) { name <- "Quercus robur"; albedo <- 0.20; trans <- 0.35; root_d <- 5.0; leaf_type <- 1 }
      else if (class_val == 11) { name <- "Picea abies"; albedo <- 0.22; trans <- 0.15; root_d <- 3.0; leaf_type <- 2 }
    }

    # --- Assign seasonal profile ---
    profile_key <- name
    season_profile <- custom_profiles[[profile_key]]
    if (is.null(season_profile)) {
      season_profile <- season_profiles[[ifelse(leaf_type == 1, "broadleaf", "conifer")]]
    }

    # --- Build PLANT3D block ---
    plant_node <- newXMLNode("PLANT3D")
    addChildren(plant_node, newXMLNode("ID", id))
    addChildren(plant_node, newXMLNode("Description", "Clustered TLS Tree"))
    addChildren(plant_node, newXMLNode("AlternativeName", name))
    addChildren(plant_node, newXMLNode("Planttype", "0"))
    addChildren(plant_node, newXMLNode("Leaftype", as.character(leaf_type)))
    addChildren(plant_node, newXMLNode("Albedo", sprintf("%.5f", albedo)))
    addChildren(plant_node, newXMLNode("Eps", "0.96000"))
    addChildren(plant_node, newXMLNode("Transmittance", sprintf("%.5f", trans)))
    addChildren(plant_node, newXMLNode("Height", sprintf("%.5f", Height)))
    addChildren(plant_node, newXMLNode("Width", "1.00000"))
    addChildren(plant_node, newXMLNode("Depth", "1.00000"))
    addChildren(plant_node, newXMLNode("RootDiameter", sprintf("%.5f", root_d)))
    addChildren(plant_node, newXMLNode("cellsize", sprintf("%.5f", res_z)))
    addChildren(plant_node, newXMLNode("xy_cells", dataI))
    addChildren(plant_node, newXMLNode("z_cells", zlayers))
    addChildren(plant_node, newXMLNode("scalefactor", "1.00000"))

    # --- LAD Profile block (sparse 3D format) ---
    lad_lines <- apply(profile, 1, function(r) {
      sprintf("%d,%d,%d,%.5f", 1, 1, r[1], r[2])
    })
    lad_node <- newXMLNode("LAD-Profile",
                           attrs = c(type = "sparematrix-3D",
                                     dataI = dataI,
                                     dataJ = dataJ,
                                     zlayers = zlayers,
                                     defaultValue = "0.00000"),
                           .children = paste(lad_lines, collapse = "\n"))
    addChildren(plant_node, lad_node)

    # --- Add seasonality and flowering profiles ---
    addChildren(plant_node, newXMLNode("Season-Profile",
                                       paste(sprintf("%.5f", season_profile$Season), collapse = ",")))
    addChildren(plant_node, newXMLNode("Blossom-Profile",
                                       paste(sprintf("%.5f", season_profile$Blossom), collapse = ",")))

    # --- Disable L-System generation ---
    addChildren(plant_node, newXMLNode("L-SystemBased", "0"))

    addChildren(root, plant_node)
  }

  # --- Write XML to file ---
  saveXML(root, file = file_out, indent = TRUE, encoding = "UTF-8")
  message("✔ ENVI-met PLANT3D (.pld) written to: ", normalizePath(file_out))
}
#' Generate Compact ENVI-met-Compatible Base36 String IDs
#'
#' Converts an integer index to a left-padded base-36 string (digits + uppercase A–Z),
#' prefixed with `"S"`, suitable for use as compact `ENVIMET_ID`s (e.g., `"S0001A"`).
#' Useful when assigning unique but readable identifiers for synthetic plant elements
#' in 3D simulation domains.
#'
#' @param n An integer (scalar) to convert to base-36.
#' @param width Integer width of the resulting code (default: 5). Strings are zero-padded on the left.
#'
#' @return A character string in base-36 representation, prefixed with `"S"` and left-padded
#' to match the desired `width`. Returns e.g. `"S0000A"`, `"S0001Z"`, `"S00010"`, etc.
#'
#' @details
#' Base-36 encoding uses the characters `0–9` and `A–Z` to represent numbers in a compact alphanumeric form.
#' This is commonly used in modeling frameworks like ENVI-met where short string IDs are needed to:
#' - Uniquely label plant objects (`ENVIMET_ID`)
#' - Avoid numeric-only names (which may conflict with XML or database formats)
#' - Allow for high capacity in short formats (36⁵ = ~60 million unique IDs for `width = 5`)
#'
#' The function:
#' 1. Converts the number `n` to base-36 using digit/modulo arithmetic
#' 2. Left-pads the result to fixed width with zeros (`"0"`)
#' 3. Adds a prefix `"S"` to ensure the string starts with a non-numeric character
#'
#' @note
#' This function assumes positive integers (`n > 0`). No validation is done for negative or non-integer input.
#' It is up to the user to avoid duplicate IDs.
#'
#' @seealso
#' [export_lad_to_envimet_p3d()], [sprintf()], [strtoi()] for inverse conversion
#'
#' @examples
#' int_to_base36(1) # "S00001"
#' int_to_base36(35) # "S0000Z"
#' int_to_base36(36) # "S00010"
#' int_to_base36(12345) # "S009IX"
#'
#' @export
int_to_base36 <- function(n, width = 5) {
  # Base-36 character set: 0–9, A–Z
  chars <- c(0:9, LETTERS)
  base <- length(chars)

  # Convert to base-36 via division/remainder
  result <- character()
  while (n > 0) {
    result <- c(chars[(n %% base) + 1], result)
    n <- n %/% base
  }

  # Collapse to single string
  result <- paste(result, collapse = "")

  # Pad with zeros on the left to match width
  padded <- sprintf(paste0("%0", width, "s"), result)

  # Replace spaces with "0" and add "S" prefix
  paste0("S", substr(gsub(" ", "0", padded), 1, width))
}
#' Suggest Optimal Number of Principal Components
#'
#' This function evaluates a PCA object and recommends the number of components to retain
#' based on three common criteria: cumulative explained variance, Kaiser criterion, and the
#' elbow method. Optionally, a diagnostic scree plot and cumulative variance plot are shown.
#'
#' @param pca_obj A PCA object as returned by [prcomp()], which must contain the `sdev` vector.
#' @param variance_cutoff The cumulative explained variance threshold to use for the primary selection (default: 0.8).
#' @param plot Logical. If `TRUE`, produces a scree plot and a cumulative variance plot for visual inspection.
#'
#' @return A list with the following elements:
#' \describe{
#' \item{n_pcs}{Recommended number of principal components based on variance cutoff.}
#' \item{explained_variance}{Number of components needed to reach the cutoff.}
#' \item{kaiser}{Number of components with eigenvalue > 1 (Kaiser criterion).}
#' \item{elbow}{Index of the "elbow" point in the scree plot.}
#' \item{info_table}{A summary data frame showing results for all three criteria.}
#' }
#'
#' @details
#' The number of principal components can be selected using multiple heuristics:
#'
#' - **Cumulative Explained Variance**: Retain the smallest number of components such that the
#' cumulative variance exceeds `variance_cutoff`. This is often considered the primary criterion.
#'
#' - **Kaiser Criterion**: Retain all components with eigenvalue > 1. Assumes input data has been scaled.
#'
#' - **Elbow Method**: Identifies the point where the marginal gain in explained variance drops off,
#' i.e., the inflection point of the scree plot. Here approximated by the first minimum in the
#' differences of explained variance.
#'
#' The function outputs all three estimates but uses only the **variance cutoff** for final selection.
#'
#' @note
#' The scree plot assumes components are sorted by variance (as returned by [prcomp()]).
#' The elbow method used here is a simplified heuristic and may not capture complex knees.
#'
#' @seealso
#' [prcomp()], [factoextra::fviz_eig()], [psych::principal()]
#'
#' @examples
#' pca <- prcomp(USArrests, scale. = TRUE)
#' suggest_n_pcs(pca, variance_cutoff = 0.9)
#'
#' @export
suggest_n_pcs <- function(pca_obj, variance_cutoff = 0.8, plot = TRUE) {
  # Extract standard deviations and calculate explained variance
  std_dev <- pca_obj$sdev
  var_explained <- std_dev^2 / sum(std_dev^2)
  cum_var_explained <- cumsum(var_explained)

  # Criterion 1: variance cutoff
  n_var <- which(cum_var_explained >= variance_cutoff)[1]

  # Criterion 2: Kaiser criterion (eigenvalue > 1)
  eigenvalues <- std_dev^2
  n_kaiser <- sum(eigenvalues > 1)

  # Criterion 3: elbow (minimum change in explained variance)
  diffs <- diff(var_explained)
  elbow <- which.min(diffs)[1]

  # Summarize selection criteria
  info_table <- data.frame(
    Criterion = c("Variance Cutoff", "Kaiser Criterion", "Elbow Method"),
    Components = c(n_var, n_kaiser, elbow),
    Cumulative_Variance = c(
      round(cum_var_explained[n_var], 3),
      round(cum_var_explained[n_kaiser], 3),
      round(cum_var_explained[elbow], 3)
    )
  )

  # Final decision based on variance criterion only
  n_final <- n_var

  # Output summary
  cat("📊 PCA Component Selection Summary:\n")
  print(info_table, row.names = FALSE)
  cat("\n✅ Recommended number of PCs (variance_cutoff =", variance_cutoff, "):", n_final, "\n")

  # Optional visualization
  if (plot) {
    par(mfrow = c(1, 2))

    # Scree plot
    plot(var_explained, type = "b", pch = 19, col = "steelblue",
         xlab = "Component", ylab = "Explained Variance",
         main = "Scree Plot")
    abline(h = 1, col = "red", lty = 2)
    abline(v = elbow, col = "darkgreen", lty = 3)
    legend("topright", legend = c("Kaiser (λ > 1)", "Elbow"),
           col = c("red", "darkgreen"), lty = c(2, 3), bty = "n")

    # Cumulative variance plot
    plot(cum_var_explained, type = "b", pch = 19, col = "darkorange",
         xlab = "Component", ylab = "Cumulative Variance",
         main = "Cumulative Explained Variance")
    abline(h = variance_cutoff, col = "red", lty = 2)
    abline(v = n_var, col = "blue", lty = 3)
    legend("bottomright", legend = c("Cutoff", "Selected Components"),
           col = c("red", "blue"), lty = c(2, 3), bty = "n")

    par(mfrow = c(1, 1))
  }

  # Return results invisibly
  invisible(list(
    n_pcs = n_final,
    explained_variance = n_var,
    kaiser = n_kaiser,
    elbow = elbow,
    info_table = info_table
  ))
}
#' Export ENVI-met 3DPLANT XML from enriched LAD profile dataframe
#'
#' This function prepares and exports an ENVI-met compatible 3DPLANT (.pld) XML file
#' using a fully enriched `lad_df_cleaned` dataframe, containing LAD profiles, traits,
#' and metadata for each ENVIMET_ID.
#'
#' @param lad_df_cleaned A data.frame containing voxelized LAD data and associated plant traits.
#' Must contain: `ENVIMET_ID`, `cluster`, `z`, `lad`, `species_class`, `species_name`,
#' `LeafThickness`, `Height_CHM`, `Vertical_Evenness`, `Entropy`, `RoughnessLength`,
#' `LAI`, `MaxLAD`, `CrownHeight`, `Height`, `LADcutoff`, `plantclass`.
#' @param file_out Path to output `.pld` file.
#' @param res_z Vertical voxel resolution in meters (e.g. 2).
#'
#' @return Invisible `TRUE` if export succeeds, otherwise stops with an error.
#' @export
#'
#' @examples
#' export_envimet_pld_from_lad_clean(
#' lad_df_cleaned = lad_profiles_long,
#' file_out = "data/envimet/envimet_pseudo3Dtree.pld",
#' res_z = 2
#' )
export_envimet_pld_from_lad_clean <- function(lad_df_cleaned, file_out, res_z) {

  # --- Validation of required columns ---
  required_cols <- c(
    "ENVIMET_ID", "cluster", "z", "lad",
    "species_class", "species_name", "LeafThickness",
    "Height_CHM", "Vertical_Evenness", "Entropy",
    "RoughnessLength", "LAI", "MaxLAD",
    "CrownHeight", "Height", "LADcutoff", "plantclass"
  )
  missing <- setdiff(required_cols, colnames(lad_df_cleaned))
  if (length(missing) > 0) {
    stop("Missing columns in lad_df_cleaned: ", paste(missing, collapse = ", "))
  }

  # --- Ensure required columns are complete ---
  stopifnot(!any(is.na(lad_df_cleaned$ENVIMET_ID)))
  stopifnot(!any(is.na(lad_df_cleaned$lad)))
  stopifnot(!any(is.na(lad_df_cleaned$species_name)))
  stopifnot(!any(is.na(lad_df_cleaned$LeafThickness)))

  # --- Extract trait table (one row per plant ID) ---
  trait_df <- lad_df_cleaned %>%
    group_by(ENVIMET_ID) %>%
    summarise(
      LAI = mean(LAI, na.rm = TRUE),
      MaxLAD = max(MaxLAD, na.rm = TRUE),
      CrownHeight = mean(CrownHeight, na.rm = TRUE),
      Height = mean(Height, na.rm = TRUE),
      RoughnessLength = mean(RoughnessLength, na.rm = TRUE),
      LeafThickness = mean(LeafThickness, na.rm = TRUE),
      LADcutoff = mean(LADcutoff, na.rm = TRUE),
      plantclass = first(plantclass),
      species_class = first(species_class),
      species_name = first(species_name),
      Entropy = mean(Entropy, na.rm = TRUE),
      Vertical_Evenness = mean(Vertical_Evenness, na.rm = TRUE),
      Height_CHM = mean(Height_CHM, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(!is.na(LAI), !is.na(MaxLAD))

  # --- Filter valid LAD entries ---
  valid_ids <- trait_df$ENVIMET_ID
  lad_profiles_filtered <- lad_df_cleaned %>%
    filter(ENVIMET_ID %in% valid_ids)

  # --- Export via custom XML function ---
  export_lad_to_envimet_p3d(
    lad_df = lad_profiles_filtered,
    file_out = file_out,
    res_z = res_z,
    trait_df = trait_df
  )

  invisible(TRUE)
}
#' Export ENVI-met and microclimf LAD Profiles from Clustered Data
#'
#' These functions aggregate a cleaned LAD dataset (lad_df_clean) by cluster,
#' compute synthetic profiles, and export them for use in ENVI-met and microclimf.
#'
#' @param lad_df_clean Data frame with full LAD profile, coordinates, traits, and cluster ID.
#' @param res_z Vertical voxel resolution (numeric).
#' @param out_dir Output directory where .gpkg and .pld files will be written.
#' @export
<- function(lad_df_clean, res_z, out_dir = "data/envimet") {
export_envimet_profiles dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
# --- Mittelung der LAD-Profile nach Cluster ---
<- lad_df_clean %>%
cluster_profiles group_by(cluster) %>%
summarise(across(starts_with("lad_pulses_"), ~mean(.x, na.rm = TRUE)), .groups = "drop") %>%
arrange(cluster)
# --- Long-Format ---
<- cluster_profiles %>%
lad_profiles_long pivot_longer(cols = starts_with("lad_pulses_"),
names_to = "layer", values_to = "lad") %>%
mutate(z = as.integer(gsub("lad_pulses_|_.*", "", layer)) * res_z,
ENVIMET_ID = paste0("S", formatC(cluster, width = 5, flag = "0"))) %>%
select(ENVIMET_ID, cluster, z, lad)
# --- Art und Traits ---
<- lad_df_clean %>%
cluster_traits group_by(cluster, species_class) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(cluster) %>%
slice_max(n, with_ties = FALSE) %>%
ungroup()
<- tibble::tibble(
species_mapping species_class = c(1, 2, 3, 4, 5, 6, 7),
species_name = c("Acer pseudoplatanus", "Betula pendula", "Fagus sylvatica",
"Picea abies", "Pinus sylvestris", "Quercus robur", "Tilia cordata")
)
<- tibble::tibble(
leaf_thickness_lookup species_name = c("Fagus sylvatica", "Quercus robur", "Acer pseudoplatanus",
"Pinus sylvestris", "Picea abies", "Betula pendula", "Tilia cordata"),
LeafThickness = c(0.0025, 0.0025, 0.0022, 0.0015, 0.0016, 0.0021, 0.0023)
)
<- lad_df_clean %>%
cluster_heights group_by(cluster) %>%
summarise(Height = mean(CHM, na.rm = TRUE), .groups = "drop")
# --- LAD-Profile anreichern ---
<- lad_profiles_long %>%
lad_profiles_long left_join(cluster_traits, by = "cluster") %>%
left_join(species_mapping, by = "species_class") %>%
left_join(leaf_thickness_lookup, by = "species_name") %>%
left_join(cluster_heights, by = "cluster")
# --- Traits berechnen ---
<- compute_traits_from_lad(lad_profiles_long, res_z)
trait_df
  # --- Spatial centroid for each cluster ID ---
  cluster_points <- lad_df_clean %>%
    group_by(cluster) %>%
    summarise(across(c(X, Y), mean), .groups = "drop") %>%
    mutate(ENVIMET_ID = paste0("S", formatC(cluster, width = 5, flag = "0"))) %>%
    st_as_sf(coords = c("X", "Y"), crs = 25832)
  # --- Export .gpkg and .pld ---
  st_write(cluster_points, file.path(out_dir, "synthetic_envimet_profiles.gpkg"), delete_dsn = TRUE)
  export_lad_to_envimet_p3d(lad_profiles_long, file.path(out_dir, "synthetic_envimet_profiles.pld"),
                            res_z = res_z, trait_df = trait_df)
}
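# Example usage (a sketch; assumes `lad_df_clean` already carries a `cluster`
# column, X/Y coordinates in EPSG:25832, a CHM column, species_class and the
# lad_pulses_* profile columns produced earlier in the pipeline):
# export_envimet_profiles(lad_df_clean, res_z = 2, out_dir = "data/envimet")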
#' Export microclimf-compatible profiles per cluster
#'
#' @param lad_df_clean Cleaned LAD data with coordinates and cluster ID
#' @param out_file Output GPKG path
#' @export
export_microclimf_profiles <- function(lad_df_clean, out_file = "data/microclimf/synthetic_microclimf_profiles.gpkg") {
  dir.create(dirname(out_file), showWarnings = FALSE, recursive = TRUE)
  # Average all non-coordinate columns per cluster
  mean_traits <- lad_df_clean %>%
    group_by(cluster) %>%
    summarise(across(-c(X, Y), ~mean(.x, na.rm = TRUE)), .groups = "drop")
  # Cluster centroids
  cluster_points <- lad_df_clean %>%
    group_by(cluster) %>%
    summarise(across(c(X, Y), mean), .groups = "drop")
  # Join centroids with mean traits and convert to sf
  out_df <- left_join(cluster_points, mean_traits, by = "cluster") %>%
    st_as_sf(coords = c("X", "Y"), crs = 25832)
  st_write(out_df, out_file, delete_dsn = TRUE)
}
#' Export all original LAD point profiles to GeoPackage with full spatial attribution
#'
#' @param lad_df_clean DataFrame with cleaned LAD columns and traits
#' @param out_gpkg Path to output .gpkg file
#' @param layer_name Layer name inside the GPKG
#' @return Writes .gpkg file to disk
export_full_lad_profiles_gpkg <- function(lad_df_clean, out_gpkg = "data/lad_profiles_full.gpkg", layer_name = "lad_profiles") {
  stopifnot(all(c("X", "Y") %in% colnames(lad_df_clean)))

  # Create sf object
  sf_points <- sf::st_as_sf(lad_df_clean, coords = c("X", "Y"), crs = 25832) # EPSG 25832 = UTM Zone 32N

  # Write to GPKG
  sf::st_write(sf_points, out_gpkg, layer = layer_name, delete_layer = TRUE)
}
Main pipeline from microclimate_ALS_tc_v4.R
The remainder of this section reproduces the driver script end to end, from LAS merging and normalization through clustering to the final ENVI-met and microclimf exports.
#' --- ENVI-met 3DPLANT column generator from voxelized ALS data ---
#' Full pipeline from ALS voxelization to XML export for ENVI-met 3DPLANT.
#' Includes normalization, topographic metrics, LAD calculation, clustering, and XML writing.
# === Load Libraries ===
library(lidR)
library(terra)
library(dplyr)
library(tidyr)
library(sf)
library(here)
library(XML)
library(stats)
library(tibble)
library(rprojroot)
library(tools)
library(RANN)
library(clusternomics)
library(e1071)
library(entropy)
library(NbClust)
library(matrixStats)
# === Configuration ===
ts <- data.frame(
  ID = 1:12,
  value = c("agriculture", "alder", "ash", "beech", "douglas_fir", "larch",
            "oak", "pastures", "roads", "settlements", "spruce", "water")
)
valid_species <- c("alder", "ash", "beech", "douglas_fir", "larch", "oak", "spruce")
valid_ids <- ts$ID[ts$value %in% valid_species]
plant_prefix <- "SYN"
visualize <- FALSE
las_file <- here("data/ALS/tiles/")
res_xy <- 2        # horizontal voxel resolution (m)
res_z <- 2         # vertical voxel resolution (m)
k <- 0.3           # Beer–Lambert extinction coefficient
scale_factor <- 1.2
crs_code <- 25832
output_gpkg <- "data/envimet/envimet_p3dtree_points.gpkg"
xml_output_file <- "data/envimet/als_envimet_trees.pld"
species_raster <- rast("data/aerial/treespecies_cleaned.tif")
dir.create("data/output", showWarnings = FALSE, recursive = TRUE)
source("src/new_utils.R")
# === ⬛ STAGE: LAS Merging and Normalization ===
las_fn <- merge_las_tiles(las_file, "data/ALS/merged_output.laz", chunk_size = 10000, workers = 6)
las <- readLAS(las_fn)
crs(las) <- "EPSG:25832"
# === ⬛ STAGE: Ground Classification and DEM/CHM Generation ===
# ⚠ MEMORY: large point cloud + rasters
# Canopy rugosity (focal SD of the preliminary CHM) steers the CSF parameters:
# rougher canopies get a finer cloth and more iterations.
chm_pre <- rasterize_canopy(las, res = res_xy, algorithm = pitfree(c(0, 1, 3, 6, 9, 12, 16)))
rugosity <- terra::focal(chm_pre, w = matrix(1, 3, 3), fun = sd, na.rm = TRUE)
mean_rug <- global(rugosity, fun = "mean", na.rm = TRUE)[[1]]

csf_params <- if (mean_rug > 1) {
  message("Detected complex/dense canopy – using fine CSF settings")
  csf(cloth_resolution = 0.5, rigidness = 2, class_threshold = 0.4, iterations = 800)
} else {
  message("Detected open canopy – using coarse CSF settings")
  csf(cloth_resolution = 1.5, rigidness = 4, class_threshold = 0.6, iterations = 300)
}

las <- classify_ground(las, csf_params)
dem_algo <- eval(parse(text = recommend_dem_interpolation(las, res_xy)))
dem <- rasterize_terrain(las, res = res_xy, algorithm = dem_algo)
dsm <- rasterize_canopy(las, res = res_xy, algorithm = p2r())

las_norm <- normalize_height(las, algorithm = knnidw(k = 6L, p = 2))
# ✅ Free original LAS
#las <- NULL; gc()
# === ⬛ STAGE: CHM + DSM + Topographic Metrics ===
pit_algo <- pitfree(c(0, 1, 3, 6, 9, 12, 16))
chm <- rasterize_canopy(las_norm, res = res_xy, algorithm = pit_algo)
slope <- terrain(dem, "slope", unit = "radians")
aspect <- terrain(dem, "aspect", unit = "degrees")
TPI <- terra::focal(terrain(dsm, "TPI"), w = matrix(1, 3, 3), fun = mean)
TPI[TPI < 0] <- -1; TPI[TPI > 0] <- 1

topo <- c(dem, dsm, chm, slope, aspect, TPI)
names(topo) <- c("dem", "dsm", "chm", "slope", "aspect", "TPI")
writeRaster(topo, "data/ALS/topo_stack.tif", overwrite = TRUE)
plot(topo)
# === ⬛ STAGE: Voxelization + LAD ===
# ⚠ MEMORY: voxel and LAD matrices can be huge
voxels <- preprocess_voxels(las_norm, res_xy, res_z)
lad_df <- convert_to_LAD_beer(voxels, grainsize = res_z, k = k, scale_factor = scale_factor)
saveRDS(voxels, "data/voxels.rds")
# ✅ Free voxelized LAS
#las_norm <- NULL; voxels <- NULL; gc()
# === ⬛ STAGE: Compute structural indices ===
# Only keep numeric LAD columns
<- select(lad_df, starts_with("lad_")) |> as.matrix()
lad_matrix rownames(lad_matrix) <- paste(lad_df$X, lad_df$Y, sep = "_")
# More efficient summary statistics
$LAD_mean <- matrixStats::rowMeans2(lad_matrix, na.rm = TRUE)
lad_df$LAD_max <- matrixStats::rowMaxs(lad_matrix, na.rm = TRUE)
lad_df$LAD_height_max <- max.col(lad_matrix, ties.method = "first") * res_z
lad_df
# Skewness, kurtosis, entropy — still using apply (no native rowSkewness etc.)
$LAD_skewness <- apply(lad_matrix, 1, skewness)
lad_dfgc()
$LAD_kurtosis <- apply(lad_matrix, 1, kurtosis)
lad_dfgc()
$LAD_entropy <- apply(lad_matrix, 1, entropy)
lad_dfgc()
# === ⬛ STAGE: Ecological indices ===
n_layers <- ncol(lad_matrix)
top_third <- seq(ceiling(2 * n_layers / 3), n_layers)
lad_df$Gap_Fraction <- rowMeans(lad_matrix == 0, na.rm = TRUE)
lad_df$Canopy_Cover_Top <- rowMeans(lad_matrix[, top_third] > 0, na.rm = TRUE)
lad_df$LAD_CV <- apply(lad_matrix, 1, function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE))
# Vertical evenness
lad_sums <- rowSums(lad_matrix, na.rm = TRUE)
lad_df$Vertical_Evenness <- sapply(seq_len(nrow(lad_matrix)), function(i) {
  p <- lad_matrix[i, ] / lad_sums[i]
  p <- p[p > 0]
  -sum(p * log(p)) / log(length(p))
})
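# Optional cross-check (a sketch, assuming the lad_ columns contain no NAs):
# the same Shannon-based evenness computed with matrix operations instead of
# the per-row sapply() call above.
# p_mat <- lad_matrix / lad_sums
# shannon <- -rowSums(ifelse(p_mat > 0, p_mat * log(p_mat), 0), na.rm = TRUE)
# evenness_check <- shannon / log(rowSums(lad_matrix > 0, na.rm = TRUE))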
# ✅ Remove LAD matrix if done
gc()
# === ⬛ STAGE: Add topographic & species data ===
<- rast("data/ALS/topo_stack.tif")
topo_stack
# Erzeuge ein sf-Objekt aus lad_df
<- st_as_sf(as.data.frame(lad_df), coords = c("X", "Y"), crs = crs_code)
sf_lad <- st_geometry(sf_lad)
geom_only $elev <- exactextractr::exact_extract( topo_stack[["dem"]], st_buffer(geom_only, dist = 0.1), "mean")
lad_df$slope <- exactextractr::exact_extract( topo_stack[["slope"]], st_buffer(geom_only, dist = 0.1), "mean")
lad_df$aspect <- exactextractr::exact_extract( topo_stack[["aspect"]], st_buffer(geom_only, dist = 0.1), "mean")
lad_df$TPI <- exactextractr::exact_extract( topo_stack[["TPI"]], st_buffer(geom_only, dist = 0.1), "mean")
lad_df$CHM <- exactextractr::exact_extract( topo_stack[["chm"]], st_buffer(geom_only, dist = 0.1), "mean")
lad_df$species_class <- exactextractr::exact_extract( species_raster, st_buffer(geom_only, dist = 0.1), "mean")
lad_df
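# Note: "mean" extraction can return fractional species_class values where a
# buffer straddles two stands. If exact class IDs are needed, rounding (or a
# majority/"mode" summary in exact_extract) is one option; a minimal sketch:
# lad_df$species_class <- round(lad_df$species_class)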
saveRDS(lad_df,"data/lad_df.rds")
# ✅ Remove sf and stack
#sf_lad <- NULL; topo_stack <- NULL;geom_only=NULL; gc()
# === ⬛ STAGE: Combine data for clustering ===
lad_matrix <- lad_df %>%
  select(starts_with("lad_pulses_"), starts_with("LAD_"),
         LAD_skewness, LAD_kurtosis, LAD_CV, LAD_entropy, Vertical_Evenness) %>%
  as.matrix()
rownames(lad_matrix) <- paste(lad_df$X, lad_df$Y, sep = "_")
# === ⬛ STAGE: Filter for valid species and complete rows ===
valid_idx <- lad_df$species_class %in% valid_ids
lad_df <- lad_df[valid_idx, ]
lad_matrix <- lad_matrix[valid_idx, ]

pulse_cols <- grep("^lad_pulses_", colnames(lad_matrix), value = TRUE)
lad_pulses <- lad_matrix[, pulse_cols]

# Filter valid rows for clustering
valid_rows <- apply(lad_pulses, 1, function(x) all(is.finite(x) & !is.na(x)))
lad_pulses <- lad_pulses[valid_rows, ]
lad_df <- lad_df[valid_rows, ]
# === ⬛ STAGE: PCA sampling + NbClust ===
set.seed(42)
sample_idx <- sample(seq_len(nrow(lad_pulses)), floor(nrow(lad_pulses) * 0.01))
sample_data <- lad_pulses[sample_idx, ]

cat("Sample size before NA filtering:", nrow(sample_data), "\n")
keep_cols <- which(colMeans(is.na(sample_data)) <= 0.2)
sample_data <- sample_data[, keep_cols]
sample_data <- sample_data[, apply(sample_data, 2, var, na.rm = TRUE) > 1e-10]
sample_data <- na.omit(sample_data)
cat("Sample size after cleaning:", nrow(sample_data), "\n")
# ⚠ MEMORY: PCA and NbClust can explode RAM use
pca_res <- prcomp(sample_data, scale. = TRUE)
pc_info <- suggest_n_pcs(pca_res, variance_cutoff = 0.9)
sample_data_pca <- pca_res$x[, 1:pc_info$n_pcs]

nb <- NbClust(sample_data_pca, distance = "euclidean", min.nc = 2, max.nc = 30, method = "kmeans")
optimal_k <- as.integer(names(which.max(table(nb$Best.nc[1, ]))))
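# nb$Best.nc[1, ] holds the cluster count proposed by each NbClust validity
# index; the modal value is taken as optimal_k above. Optional fallback
# (a sketch, not part of the original pipeline) if no clear majority emerges:
# inspect a simple within-cluster sum-of-squares "elbow" on the PCA scores.
# wss <- sapply(2:15, function(kk) kmeans(sample_data_pca, centers = kk, nstart = 10)$tot.withinss)
# plot(2:15, wss, type = "b", xlab = "k", ylab = "Total within-cluster SS")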
# Best number of clusters
cat("🔢 Best number of clusters (Best.nc):", optimal_k, "\n\n")
# --- Remove invalid rows before clustering ---
valid_rows <- apply(lad_pulses, 1, function(x) all(is.finite(x) & !is.na(x)))
lad_pulses <- lad_pulses[valid_rows, ]
valid_rows <- apply(lad_df, 1, function(x) all(is.finite(x) & !is.na(x)))
lad_df_clean <- lad_df[valid_rows, ]

# 1. Select the clustering columns
lad_dftocluster <- lad_df_clean %>%
  select(starts_with("lad_pulses_"),
         starts_with("LAD_"),
         LAD_skewness, LAD_kurtosis, LAD_CV, LAD_entropy, Vertical_Evenness) %>%
  mutate(across(everything(), ~ ifelse(. <= 2e-5, 0, .)))
#lad_df <- lad_df[valid_rows, ]
lad_features <- lad_df_clean %>%
  select(starts_with("lad_pulses_"), starts_with("LAD_"), LAD_skewness, LAD_kurtosis, LAD_CV, LAD_entropy, Vertical_Evenness) %>%
  mutate(across(everything(), ~ ifelse(. <= 2e-5, 0, .)))
# Remove near-empty rows (fewer than four non-zero features)
lad_dftocluster <- lad_dftocluster[rowSums(lad_dftocluster > 0) > 3, ]

lad_dftocluster_scaled <- scale(lad_dftocluster)
# --- Clustering (KMeans_arma from ClusterR) ---
km_arma <- ClusterR::KMeans_arma(
  data = lad_dftocluster,
  clusters = optimal_k,
  n_iter = 100,
  seed_mode = "random_subset"
)
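# Optional (a sketch): the z-scored matrix computed above (lad_dftocluster_scaled)
# could be clustered instead, giving every feature equal weight regardless of units;
# the prediction step below would then need features scaled with the same
# centring/scaling values.
# km_arma <- ClusterR::KMeans_arma(data = lad_dftocluster_scaled, clusters = optimal_k,
#                                  n_iter = 100, seed_mode = "random_subset")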
# --- Assign cluster labels ---
# Predict on lad_features (one row per row of lad_df_clean) so that the label
# vector matches nrow(lad_df_clean); lad_dftocluster was row-filtered above.
lad_df_clean$cluster <- as.integer(
  ClusterR::predict_KMeans(
    data = lad_features,
    km_arma
  )
)
library(dplyr)
# Step 1: Build synthetic mean profiles (average all LAD columns per cluster)
cluster_profiles <- lad_df_clean %>%
  group_by(cluster) %>%
  summarise(across(starts_with("lad_"), ~mean(.x, na.rm = TRUE)), .groups = "drop")
# Step 2: Create a unique ENVIMET_ID per cluster
cluster_profiles <- cluster_profiles %>%
  mutate(ENVIMET_ID = paste0(plant_prefix, formatC(cluster, width = 3, flag = "0")))
# Step 3: Attach the ID back to every original point
lad_df_clean <- lad_df_clean %>%
  left_join(cluster_profiles %>% select(cluster, ENVIMET_ID), by = "cluster")
# Export to ENVI-met
export_envimet_profiles(
  lad_df_clean = lad_df_clean,
  res_z = res_z  # vertical voxel resolution defined in the configuration (here 2 m)
)
# Export microclimf input
export_microclimf_profiles(
  lad_df_clean = lad_df_clean
)
export_full_lad_profiles_gpkg(
lad_df_clean = lad_df_clean,
out_gpkg = "data/lad_profiles_full.gpkg",
layer_name = "lad_profiles"
)

# --- Plot spatial distribution of clusters (optional diagnostics) ---
if (visualize) {
  library(ggplot2)
  ggplot(lad_df_clean, aes(x = X, y = Y, color = as.factor(cluster))) +
    geom_point(size = 0.8) +
    coord_equal() +
    scale_color_viridis_d(name = "Cluster") +
    labs(title = "Spatial distribution of LAD clusters") +
    theme_minimal()
}
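# Optional (a sketch, assuming X/Y lie on the regular res_xy grid used for
# voxelization): a quick raster view of the cluster labels.
# cluster_r <- terra::rast(as.data.frame(lad_df_clean)[, c("X", "Y", "cluster")],
#                          type = "xyz", crs = "EPSG:25832")
# plot(cluster_r)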