Publications

Uncertainty-Aware Deep Neural Network Training for Imbalanced Geochemical Data Distributions

The growing interest in raw material extraction, particularly in trace elements, highlights the need for innovative geochemical modeling techniques that predict element concentrations accurately. This paper explores the capability of a deep neural network (DNN) to estimate the concentrations of 20 trace elements from 11 major elements and pH values. Using data from the BrineMine project, we applied DNNs to a challenging dataset characterized by a small sample size and imbalanced distributions. Rather than relying on a single model, we generated 1000 independent DNN models to assess both prediction accuracy and uncertainty. Two preprocessing methods, the synthetic minority over-sampling technique for regression with Gaussian noise (SMOGN) and a statistical transformation, were applied to further improve accuracy and reduce uncertainty. Despite low initial correlations between input features and target variables, imbalanced data distributions, and extremely low concentrations, the DNN models provided reliable and robust results, except for Cu and V. For 13 trace elements, the DNN models achieved acceptable reliability with R² > 0.8. Analyzing the weight distribution of the DNN revealed that input features with high cross-correlation are prone to sharing the same information. While input features such as Fe, pH, and Mg are highly correlated with several target variables, accumulated local effects (ALE) scores indicate that Li has the highest influence, as it is the only input feature with a high correlation coefficient to some of the target variables.
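The idea of using many independent models instead of one can be illustrated with a minimal sketch. It is not the authors' implementation: a simple least-squares line stands in for each DNN, bootstrap resampling is an assumed way of making the models independent (the abstract does not say how this was done), and the data are synthetic. The names `fit_line` and `ensemble_predict` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (stand-in for one DNN)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ensemble_predict(xs, ys, x_new, n_models=1000, seed=0):
    """Fit n_models models on bootstrap resamples (an assumption) and return
    the mean and standard deviation of their predictions at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # skip degenerate resamples with a single x value
            continue
        slope, intercept = fit_line(bx, by)
        predictions.append(slope * x_new + intercept)
    return statistics.fmean(predictions), statistics.stdev(predictions)


# Synthetic data (not from the BrineMine project): y = 2x + 1 plus noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.1) for x in xs]

mean_pred, spread = ensemble_predict(xs, ys, x_new=1.0)
print(f"prediction: {mean_pred:.3f} +/- {spread:.3f}")
```

The ensemble mean serves as the point prediction, and the spread across models gives a rough measure of uncertainty that a single model cannot provide.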

AnnRG – An artificial neural network solute geothermometer

Lars H. Ystroem, Mark Vollmer, Thomas Kohl, Fabian Nitschke

Abstract
Solute artificial neural network (ANN) geothermometers offer a way to overcome the complexity arising from the solute-mineral composition. Herein, we present a new concept, trained on high-quality hydrochemical data and verified by in-situ temperature measurements. In total, 208 data pairs of geochemical input parameters (Na⁺, K⁺, Ca²⁺, Mg²⁺, Cl⁻, SiO₂, and pH) and reservoir temperature measurements were compiled. The data comprise nine geothermal sites with a broad variety of geochemical characteristics and enthalpies. Five sites with 163 samples (Upper Rhine Graben, Pannonian Basin, German Molasse Basin, Paris Basin, and Iceland) are used to develop the ANN geothermometer, while a further four sites with 45 samples (Azores, El Tatio, Miravalles, and Rotorua) are used to test the established network in practice on unseen data. The setup of the application, as well as the optimisation of the network architecture and its hyperparameters, is introduced step by step. As a result, the solute ANN geothermometer, AnnRG (Artificial neural network Regression Geothermometer), provides precise reservoir temperature predictions (RMSE of 10.442 K) with a high prediction accuracy of R² = 0.978. In conclusion, the implementation and verification of the first adequate ANN geothermometer is an advancement in solute geothermometry. Our approach also provides a basis for further broadening and refining applications in geochemistry.
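For readers unfamiliar with the two reported metrics, the following minimal sketch shows how RMSE and R² can be computed from measured and predicted reservoir temperatures. It assumes that R² denotes the coefficient of determination, which the abstract does not state explicitly, and the temperature values are made up for illustration rather than taken from the study. The function names `rmse` and `r_squared` are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


# Illustrative temperatures in degrees Celsius (hypothetical, not study data).
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 205.0, 245.0]

print(f"RMSE = {rmse(measured, predicted):.3f} K")
print(f"R2   = {r_squared(measured, predicted):.3f}")
```

Because RMSE is built from temperature differences, its value is the same whether the inputs are given in kelvin or degrees Celsius.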