publications
These are my publications.
2026
- Hypersolid: Emergent Vision Representations via Short-Range Repulsion. Esteban Rodríguez-Betancourt and Edgar Casasola-Murillo. 2026
A recurring challenge in self-supervised learning is preventing representation collapse. Existing solutions typically rely on global regularization, such as maximizing distances, decorrelating dimensions or enforcing certain distributions. We instead reinterpret representation learning as a discrete packing problem, where preserving information simplifies to maintaining injectivity. We operationalize this in Hypersolid, a method using short-range hard-ball repulsion to prevent local collisions. This constraint results in a high-separation geometric regime that preserves augmentation diversity, excelling on fine-grained and low-resolution classification tasks.
@misc{rodríguezbetancourt2026hypersolidemergentvisionrepresentations, title = {Hypersolid: Emergent Vision Representations via Short-Range Repulsion}, author = {Rodríguez-Betancourt, Esteban and Casasola-Murillo, Edgar}, year = {2026}, archiveprefix = {arXiv}, primaryclass = {cs.CV}, url = {https://arxiv.org/abs/2601.21255}, }
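The short-range hard-ball repulsion mentioned in the Hypersolid abstract can be pictured with a toy penalty. The sketch below is an illustrative assumption, not the loss used in the paper: the function name `hard_ball_penalty`, the `radius` parameter, and the squared-overlap form are all hypothetical. It penalizes only pairs of points closer than `radius` and ignores pairs that are farther apart, which is what makes the repulsion local rather than global.

```python
import math
from itertools import combinations


def hard_ball_penalty(points, radius):
    """Mean squared overlap between pairs of points closer than `radius`.

    Hypothetical illustration only: pairs at distance >= radius contribute
    zero, so only local collisions are penalized.
    """
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        dist = math.dist(a, b)              # Euclidean distance
        overlap = max(0.0, radius - dist)   # zero outside the ball
        total += overlap ** 2
    return total / len(pairs)


# Toy 2-D "embeddings": the first two collide, the third is far away.
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
penalty = hard_ball_penalty(embeddings, radius=0.5)
print(penalty)
```

Only the first pair contributes here; moving the second point farther than `radius` from the first drives the penalty to zero without touching the distant third point.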
2024
- From cart to truck: meaning shift through words in English in the last two centuries. Esteban Rodríguez-Betancourt and Edgar Casasola-Murillo. 2024
This onomasiological study uses diachronic word embeddings to explore how different words represented the same concepts over time, using historical word data from 1800 to 2000. We identify shifts in the energy, transport, entertainment, and computing domains, revealing connections between language and societal changes. Our approach trains diachronic word embeddings with word2vec (skip-gram) and aligns them using orthogonal Procrustes. We discuss possible difficulties linked to the relationships the method identifies. Moreover, we look at the ethical aspects of interpreting results, highlighting the need for expert insights to understand the method’s significance.
@misc{betancourt2024carttruckmeaningshift, title = {From cart to truck: meaning shift through words in English in the last two centuries}, author = {Rodríguez-Betancourt, Esteban and Casasola-Murillo, Edgar}, year = {2024}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, url = {https://arxiv.org/abs/2408.16209}, }
- Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams. Esteban Rodríguez-Betancourt and Edgar Casasola-Murillo. JAIIO, Jornadas Argentinas de Informática, Sep 2024
With the growing use of vector embeddings in areas like natural language processing and recommendation systems, the need for effective storage and retrieval methods is increasingly important. However, deploying specialized databases for vector indexing can be challenging due to resource limitations or operational constraints. This paper introduces a novel approach that utilizes existing trigram indexes within SQL databases to efficiently manage vector embeddings. By adapting traditional relational databases to handle high-dimensional data, organizations can use their existing infrastructure without the need to invest in new database systems. This method reduces management complexity and costs associated with maintaining separate systems for vector data. We outline the process of converting vector embeddings for trigram indexing and evaluate the performance and recall through empirical analysis. This paper aims to offer a practical solution for researchers and practitioners seeking to integrate advanced vector-based queries into their current database systems, thereby enhancing the functionality and accessibility of vector embeddings in mainstream applications.
@article{Rodríguez-Betancourt_Casasola-Murillo_2024, title = {Teaching SQL New Tricks: Efficient Vector Indexing with Trigrams}, volume = {10}, url = {https://revistas.unlp.edu.ar/JAIIO/article/view/17913}, number = {1}, journal = {JAIIO, Jornadas Argentinas de Informática}, author = {Rodríguez-Betancourt, Esteban and Casasola-Murillo, Edgar}, year = {2024}, month = sep, pages = {150--157}, }
2023
- Exploring the Limits of Large Language Models for Word Definition Generation: A Comparative Analysis. Esteban Rodríguez-Betancourt and Edgar Casasola-Murillo. In 2023 XLIX Latin American Computer Conference (CLEI), 2023
In this paper, we explore the ability of large language models (LLMs) to generate word definitions for newly invented words in Spanish through the task of Unknown Definition Modeling. The main goal of our study is to determine the extent to which LLMs can abstract meaning from context and compare the performance of different models for this task. To conduct our analysis, we created a dataset of 20 made-up words, usage examples, and their definitions in Spanish. We then evaluated several LLMs, including OpenAI GPT-3.5-turbo, OpenAI GPT-3, and Google Flan-T5, using automatic evaluation based on cosine similarity of sentence embeddings and qualitative human evaluation on a 4-point Likert scale. Our findings indicate that larger models tend to generate better definitions than smaller models, with the performance of the models generally aligning with their size. This study contributes to our understanding of LLMs’ strengths and weaknesses in generating definitions for unknown words, and offers valuable insights for future research and applications in natural language processing.
@inproceedings{10346136, author = {Rodríguez-Betancourt, Esteban and Casasola-Murillo, Edgar}, booktitle = {2023 XLIX Latin American Computer Conference (CLEI)}, title = {Exploring the Limits of Large Language Models for Word Definition Generation: A Comparative Analysis}, year = {2023}, volume = {}, number = {}, pages = {1--7}, keywords = {Training;Analytical models;Waste materials;Computational modeling;Hate speech;Natural language processing;Internet;Natural languages;Linguistics;Natural language processing;Terminology;Dictionaries}, doi = {10.1109/CLEI60451.2023.10346136} }
- Analysis of the Semantic Shift in Diachronic Word Embeddings for Spanish Before and After COVID-19. Esteban Rodríguez-Betancourt and Edgar Casasola-Murillo. CLEI Electronic Journal, Sep 2023
Words can shift their meaning over time. This study presents an exploratory analysis of semantic shift in Spanish vocabulary using diachronic word embeddings. The diachronic data consist of a 2018 Spanish corpus, from before the COVID-19 outbreak, and a second corpus with documents from 2021. This paper addresses the construction of the diachronic Spanish word embedding model, as well as the results of an analysis using an unsupervised vector-distance technique. The results allowed us to identify the topics with the greatest semantic shift between those periods.
@article{betancourt2023cleiej, title = {Analysis of the Semantic Shift in Diachronic Word Embeddings for Spanish Before and After COVID-19}, author = {Rodríguez-Betancourt, Esteban and Casasola-Murillo, Edgar}, journal = {CLEI Electronic Journal}, volume = {26}, number = {2}, year = {2023}, month = sep, doi = {10.19153/cleiej.26.2.4}, url = {https://doi.org/10.19153/cleiej.26.2.4}, keywords = {Linguistics, Natural Language Processing, Natural Languages, Pragmatics} }
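The abstract above describes ranking semantic shift with an unsupervised vector distance. A minimal sketch of that idea follows, under two assumptions the abstract does not confirm: that the 2018 and 2021 embedding spaces are already aligned, and that cosine distance is the measure. The toy vectors, the words, and the function name `cosine_distance` are made up for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical, already-aligned toy embeddings for two periods.
emb_2018 = {"mask": [1.0, 0.0], "table": [0.0, 1.0]}
emb_2021 = {"mask": [0.0, 1.0], "table": [0.0, 2.0]}

# Distance between a word's two vectors serves as its shift score.
shift = {w: cosine_distance(emb_2018[w], emb_2021[w]) for w in emb_2018}

# Words sorted from most to least shifted.
ranked = sorted(shift, key=shift.get, reverse=True)
print(ranked)
```

Cosine distance ignores vector length, so "table" scores zero even though its 2021 vector is twice as long; only a change in direction counts as shift.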
2022
- Analysis of Semantic Shift Before and After COVID-19 in Spanish Diachronic Word Embeddings. Esteban Rodríguez-Betancourt and Edgar Casasola-Murillo. In 2022 XLVIII Latin American Computer Conference (CLEI), 2022
Words can shift their meaning over time. This case study presents an exploratory analysis of semantic shift in Spanish vocabulary using diachronic word embeddings. The diachronic data consist of a 2018 Spanish corpus, from before the COVID-19 outbreak, and a second corpus with documents from 2021. We focused on the semantic shift of three topics: COVID-19, masks, and vaccines. This paper addresses the construction of the diachronic Spanish word embedding model, as well as the results of an analysis using an unsupervised vector-distance technique. The results allowed us to identify shifts related to the increase in COVID-19 content.
@inproceedings{9959896, author = {Rodríguez-Betancourt, Esteban and Casasola-Murillo, Edgar}, booktitle = {2022 XLVIII Latin American Computer Conference (CLEI)}, title = {Analysis of Semantic Shift Before and After COVID-19 in Spanish Diachronic Word Embeddings}, year = {2022}, volume = {}, number = {}, pages = {1--9}, keywords = {COVID-19;Vocabulary;Analytical models;Computational modeling;Semantics;Vaccines;Linguistics;natural language processing;natural languages;pragmatics}, doi = {10.1109/CLEI56649.2022.9959896} }
2019
- Deep Neural Network Comparison for Spanish Tweets Polarity Classification. Esteban Rodríguez-Betancourt, Pablo Sauma-Chacón, and Edgar Casasola-Murillo. In 2019 XLV Latin American Computing Conference (CLEI), 2019
Two deep neural network models were compared on the task of polarity classification of Spanish text retrieved from social networks. For each model, accuracy, precision, recall, and F1 were calculated over a particular corpus. The effect of adding Gaussian noise to the inputs on the classifier results was also evaluated.
@inproceedings{9073947, author = {Rodríguez-Betancourt, Esteban and Sauma-Chacón, Pablo and Casasola-Murillo, Edgar}, booktitle = {2019 XLV Latin American Computing Conference (CLEI)}, title = {Deep Neural Network Comparison for Spanish Tweets Polarity Classification}, year = {2019}, volume = {}, number = {}, pages = {1--6}, keywords = {Computational modeling;Biological neural networks;Machine learning;Google;Convolutional neural networks;Task analysis;Social network services;Sentiment Analysis;Polarity;Neural Networks}, doi = {10.1109/CLEI47609.2019.235083} }
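The noise experiment mentioned in the abstract above can be illustrated with a small input-perturbation helper. This is a sketch under assumptions, not the setup from the paper: it assumes zero-mean noise added independently to each input feature, and the names `add_gaussian_noise` and `sigma` are hypothetical.

```python
import random


def add_gaussian_noise(vector, sigma, rng):
    """Return a copy of `vector` with zero-mean Gaussian noise added.

    `sigma` is the standard deviation of the noise; `rng` is a
    random.Random instance so that runs are reproducible.
    """
    return [x + rng.gauss(0.0, sigma) for x in vector]


rng = random.Random(0)      # fixed seed for repeatability
clean = [0.2, -0.5, 1.0]    # toy input features
noisy = add_gaussian_noise(clean, sigma=0.1, rng=rng)
print(noisy)
```

Evaluating a trained classifier on `noisy` inputs for increasing values of `sigma` is one way to trace how its metrics degrade as the inputs are perturbed.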