Comparison of linear regression, k-nearest neighbor, and random forest methods in airborne laser scanning-based prediction of growing stock
Self archived version: final draft
Citation: Cosenza, Diogo N.; Korhonen, Lauri; Maltamo, Matti; Packalen, Petteri; Strunk, Jacob L.; Næsset, Erik; Gobakken, Terje; Soares, Paula; Tomé, Margarida. (2020). Comparison of linear regression, k-nearest neighbor, and random forest methods in airborne laser scanning-based prediction of growing stock. Forestry, [Epub ahead of print 03 Oct 2020], cpaa034. 10.1093/forestry/cpaa034.
In this study, we examined how model type and variable selection approach affect the performance of forest yield models in the area-based approach (ABA) at five sites around the world. We compared ordinary least squares regression (OLS), k-nearest neighbours (kNN) and random forest (RF). Our objective was to test whether there are systematic differences in accuracy between OLS, kNN and RF in ABA predictions of growing stock volume. The analyses are based on 5-fold cross-validation at five study sites: a eucalyptus plantation, a temperate forest and three different boreal forests. Two completely independent validation datasets were also available for two of the boreal sites. For kNN, we evaluated multiple distance measures, including Euclidean, Mahalanobis, most similar neighbour (MSN) and an RF-based distance metric. The variable selection approaches we examined included a heuristic approach (for OLS, kNN and RF), exhaustive search among all combinations (OLS only) and all variables together (RF only). Performance varied by model type and variable selection approach among sites. OLS and RF had similar accuracies and were more efficient than any of the kNN variants. Variable selection did not affect RF performance. Heuristic and exhaustive variable selection performed similarly for OLS. kNN fared the poorest among model types, and kNN with RF distance was prone to overfitting when compared with a validation dataset. Additional caution is therefore required when building kNN models for volume prediction through ABA; it is preferable to opt instead for models based on OLS with some variable selection, or RF with all variables together.
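The 5-fold cross-validation comparison described above can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes synthetic data, a single hypothetical predictor (`height`, standing in for an ALS metric), pooled RMSE as the accuracy measure, and plain Euclidean distance for kNN. RF is omitted because it would need an external library. All function names are illustrative.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(x, y):
    """Closed-form simple linear regression y = a + b*x (one predictor)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda x_new: a + b * x_new


def fit_knn(x, y, k=3):
    """kNN regression: mean response of the k nearest training plots."""
    pairs = list(zip(x, y))

    def predict(x_new):
        # In one dimension, Euclidean distance reduces to |x - x_new|.
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x_new))[:k]
        return sum(p[1] for p in nearest) / len(nearest)

    return predict


def cv_rmse(x, y, fit, k=5, seed=0):
    """Pooled RMSE over all held-out predictions from k-fold cross-validation."""
    sq_err = 0.0
    for held_out in k_fold_indices(len(x), k, seed):
        test = set(held_out)
        train = [i for i in range(len(x)) if i not in test]
        model = fit([x[i] for i in train], [y[i] for i in train])
        sq_err += sum((model(x[i]) - y[i]) ** 2 for i in held_out)
    return math.sqrt(sq_err / len(x))


# Synthetic plots (assumption): volume grows linearly with height plus noise.
rng = random.Random(42)
height = [rng.uniform(5.0, 30.0) for _ in range(100)]
volume = [12.0 * h + rng.gauss(0.0, 20.0) for h in height]
mean_volume = sum(volume) / len(volume)

for name, fit in [("OLS", fit_ols), ("kNN", fit_knn)]:
    rmse = cv_rmse(height, volume, fit)
    print(f"{name}: RMSE = {rmse:.1f}, relative RMSE = {100 * rmse / mean_volume:.1f}%")
```

Both models are scored on the same folds because `cv_rmse` reuses one seed, so any difference in RMSE reflects the model type and not the data split. The numbers it prints come from synthetic data and say nothing about the study's results.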