Research Repository

Outputs (36)

Cluster-based oversampling with area extraction from representative points for class imbalance learning (2024)
Journal Article
Farou, Z., Wang, Y., & Horváth, T. (2024). Cluster-based oversampling with area extraction from representative points for class imbalance learning. Intelligent Systems with Applications, 22, Article 200357. https://doi.org/10.1016/j.iswa.2024.200357

Class imbalance learning is challenging in various domains where training datasets exhibit disproportionate samples in a specific class. Resampling methods have been used to adjust the class distribution, but they often have limitations for small dis...

Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms (2024)
Journal Article
Mantovani, R. G., Horváth, T., Rossi, A. L. D., Cerri, R., Barbon Junior, S., Vanschoren, J., & de Carvalho, A. C. P. L. F. (in press). Better trees: an empirical study on hyperparameter tuning of classification decision tree induction algorithms. Data Mining and Knowledge Discovery, https://doi.org/10.1007/s10618-024-01002-5

Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations and their complex i...

A Comparative Study of Assessment Metrics for Imbalanced Learning (2023)
Conference Proceeding
Farou, Z., Aharrat, M., & Horváth, T. (2023). A Comparative Study of Assessment Metrics for Imbalanced Learning. In New Trends in Database and Information Systems: ADBIS 2023 Short Papers, Doctoral Consortium and Workshops: AIDMA, DOING, K-Gals, MADEISD, PeRS, Barcelona, Spain, September 4–7, 2023, Proceedings (119-129). https://doi.org/10.1007/978-3-031-42941-5_11

There are several machine learning algorithms addressing the class imbalance problem, requiring standardized metrics for adequate performance evaluation. This paper reviews several metrics for imbalanced learning in binary and multi-class problems. We em...

Squared Symmetric Formal Contexts and Their Connections with Correlation Matrices (2023)
Conference Proceeding
Antoni, L., Eliaš, P., Horváth, T., Krajči, S., Krídlo, O., & Török, C. (2023). Squared Symmetric Formal Contexts and Their Connections with Correlation Matrices. In Graph-Based Representation and Reasoning: 28th International Conference on Conceptual Structures, ICCS 2023, Berlin, Germany, September 11–13, 2023, Proceedings (19-27). https://doi.org/10.1007/978-3-031-40960-8_2

Formal Concept Analysis identifies hidden patterns in data that can be presented to the user or the data analyst. We propose a method for analyzing correlation matrices based on Formal Concept Analysis. In particular, we define a notion of square...

NCC: Neural concept compression for multilingual document recommendation (2023)
Journal Article
Tashu, T. M., Lenz, M., & Horváth, T. (2023). NCC: Neural concept compression for multilingual document recommendation. Applied Soft Computing, 142, Article 110348. https://doi.org/10.1016/j.asoc.2023.110348

In this work, we propose a novel method for generating inter-lingual document representations using neural network concept compression. The presented approach is intended to improve the quality of content-based multilingual document recommendation an...

Hyper-parameter initialization of classification algorithms using dynamic time warping: A perspective on PCA meta-features (2022)
Journal Article
Horváth, T., Mantovani, R. G., & de Carvalho, A. C. (2023). Hyper-parameter initialization of classification algorithms using dynamic time warping: A perspective on PCA meta-features. Applied Soft Computing, 134, Article 109969. https://doi.org/10.1016/j.asoc.2022.109969

Meta-learning, a concept from the area of automated machine learning, aims at providing decision support for data scientists by recommending a suitable setting (a machine learning algorithm or its hyper-parameters) to be used for a given dataset. Suc...

Solving Multi-class Imbalance Problems Using Improved Tabular GANs (2022)
Conference Proceeding
Farou, Z., Kopeikina, L., & Horváth, T. (2022). Solving Multi-class Imbalance Problems Using Improved Tabular GANs. In H. Yin, D. Camacho, & P. Tino (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2022: 23rd International Conference, IDEAL 2022, Manchester, UK, November 24–26, 2022, Proceedings (527-539). https://doi.org/10.1007/978-3-031-21753-1_51

Multi-class imbalance problems are non-standard derivative data science problems. These problems are associated with skewness in the underlying data distribution, which, in turn, raises numerous issues for conventional machine learning techniques...

Synonym-Based Essay Generation and Augmentation for Robust Automatic Essay Scoring (2022)
Conference Proceeding
Tashu, T. M., & Horváth, T. (2022). Synonym-Based Essay Generation and Augmentation for Robust Automatic Essay Scoring. In H. Yin, D. Camacho, & P. Tino (Eds.), Intelligent Data Engineering and Automated Learning – IDEAL 2022: 23rd International Conference, IDEAL 2022, Manchester, UK, November 24–26, 2022, Proceedings (12-21). https://doi.org/10.1007/978-3-031-21753-1_2

Automatic essay scoring (AES) models based on neural networks (NN) have had a lot of success. However, research has shown that NN-based AES models have robustness issues, such that the output of a model changes easily with small changes in the input....

Object Detection Using Sim2Real Domain Randomization for Robotic Applications (2022)
Journal Article
Horváth, D., Erdős, G., Istenes, Z., Horváth, T., & Földi, S. (2023). Object Detection Using Sim2Real Domain Randomization for Robotic Applications. IEEE Transactions on Robotics, 39(2), 1225-1243. https://doi.org/10.1109/tro.2022.3207619

Robots working in unstructured environments must be capable of sensing and interpreting their surroundings. One of the main obstacles of deep-learning-based models in the field of robotics is the lack of domain-specific labeled data for different ind...

Dynamic noise filtering for multi-class classification of beehive audio data (2022)
Journal Article
Várkonyi, D. T., Seixas Junior, J. L., & Horváth, T. (2023). Dynamic noise filtering for multi-class classification of beehive audio data. Expert Systems with Applications, 213(Part A), Article 118850. https://doi.org/10.1016/j.eswa.2022.118850

Honeybees are the most specialized insect pollinators and are critical not only for honey production but also for keeping the environmental balance by pollinating the flowers of a wide variety of crops. Recording and analyzing bee sounds became...