Seminar by Roberto Cilli (GeoRessources)

When

6 February 2026
14:00 - 15:00

CRAN - FST - 4th floor
Campus Sciences, Boulevard des Aiguillettes, Vandoeuvre-lès-Nancy, 54506

Event type

Speaker: Roberto Cilli, GeoRessources (Nancy)
Title: Transformer-Based Geo-technical Classification of Borehole Logs with Benchmarking of Uncertainty Quantification Methods
Location: SiMul meeting room, Faculté des Sciences et Technologies, Henri Poincaré Building, 4th floor
Abstract:
Geological borehole descriptions are a fundamental resource for subsurface modeling, yet they are often stored as unstructured text, limiting their usability for automated analysis. While a growing number of studies have applied natural language processing (NLP) to geological texts, most approaches disregard the semantic continuity between adjacent lithological descriptions and overlook the structured nature of stratigraphic successions within borehole logs. In this study, we introduce a context-aware sequence labeling framework that classifies lithological units from borehole log descriptions modeled as structured sequences. It combines a pre-trained Sentence-BERT model for semantic encoding with a single-layer Transformer encoder that uses positional encoding to capture contextual and positional relationships (Reimers and Gurevych, 2019; Vaswani et al., 2017). We evaluate our approach on a dataset of manually labeled boreholes from the Pianello hillslope, located in Southern Italy, focusing on five lithological classes relevant to slope stability analysis. The proposed method achieves an accuracy gain of approximately 15% over a baseline random forest classifier fed with Sentence-BERT embeddings, showing that the context of a lithological description can significantly improve the classification performance of NLP algorithms. Furthermore, we show that the proposed architecture, with fewer than 350k learnable parameters, is lightweight and scalable, enabling rapid processing and supporting its practical applicability in real-world scenarios, especially in low-resource computing environments (Asus ROG Flow X13: AMD Ryzen 9, 16 GB RAM, NVIDIA RTX 3050 with 4 GB VRAM).
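
As a rough illustration of the parameter budget mentioned in the abstract, the sketch below counts the learnable parameters of one standard Transformer encoder layer placed on top of frozen sentence embeddings. Only the five output classes come from the abstract. The embedding size (384), the model width (128), the feed-forward width (256), and the presence of an input projection are assumptions chosen for illustration, not the configuration presented in the talk.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a dense layer."""
    return n_in * n_out + n_out


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Learnable parameters of one standard Transformer encoder layer."""
    # Query, key, value and output projections; the number of heads
    # does not change this count in standard multi-head attention.
    attention = 4 * linear_params(d_model, d_model)
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms


def tagger_params(d_embed: int, d_model: int, d_ff: int, n_classes: int) -> int:
    """Hypothetical tagger: projection -> one encoder layer -> per-step classifier."""
    projection = linear_params(d_embed, d_model)  # assumed down-projection
    head = linear_params(d_model, n_classes)
    # Sinusoidal positional encoding adds no learnable parameters.
    return projection + encoder_layer_params(d_model, d_ff) + head


# Assumed sizes; only n_classes=5 is taken from the abstract.
total = tagger_params(d_embed=384, d_model=128, d_ff=256, n_classes=5)
print(total)  # 182405
```

Under these assumed sizes the count stays well below 350k, which is consistent with the claim that the sequence model itself is small compared with the frozen sentence encoder.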

Moreover, we compare benchmark uncertainty quantification (UQ) algorithms, including Bayes by Backprop (Blundell et al., 2015), Deep Ensembles (Lakshminarayanan et al., 2017), MC Dropout (Gal and Ghahramani, 2016), and a custom Bayesian framework inspired by Kendall and Gal (2016). Results indicate that, among the methods tested, the framework inspired by Kendall and Gal (2016) is the only one capable of disentangling the epistemic and aleatoric components of uncertainty. However, the uncertainty estimates still need further validation: we observed only a weak negative correlation between epistemic uncertainty and classification accuracy on real unseen samples, and undesired behaviours when the network is fed synthetic, meaningless inputs.
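
To make the epistemic/aleatoric distinction concrete, here is a minimal sketch of a common entropy-based decomposition computed from several stochastic predictions, such as MC Dropout passes or ensemble members. It is not the formulation of Kendall and Gal, nor necessarily the one used in the talk. The function names and the toy two-class inputs are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def decompose(samples):
    """Split predictive uncertainty for one input.

    samples: list of class-probability vectors, one per stochastic
    forward pass or ensemble member.
    Returns (total, aleatoric, epistemic), where
      total     = entropy of the averaged prediction,
      aleatoric = average entropy of the individual predictions,
      epistemic = total - aleatoric (mutual information).
    """
    n = len(samples)
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / n for c in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / n
    return total, aleatoric, total - aleatoric


# Members agree that the input is ambiguous: uncertainty is aleatoric.
agree = decompose([[0.5, 0.5], [0.5, 0.5]])

# Members are each confident but contradict one another: epistemic.
disagree = decompose([[1.0, 0.0], [0.0, 1.0]])
```

In the first case the epistemic term is zero and the aleatoric term equals ln 2. In the second the roles are swapped, which is the behaviour one would hope a UQ method exposes on out-of-distribution inputs.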

Kendall and Gal, 2016. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
Lakshminarayanan et al., 2017. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles.
Gal and Ghahramani, 2016. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning.
Kirkwood et al., 2022. Bayesian Deep Learning for Spatial Interpolation in the Presence of Auxiliary Information.
Reimers and Gurevych, 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.
Vaswani et al., 2017. Attention Is All You Need.