Abstract
The spread of epidemics and diseases in many countries has substantially increased news consumption from online sources. News stories written in Arabic often cover more than one topic, which makes them a natural fit for multi-label classification. However, recent studies on multi-label Arabic text classification deal with news articles, which are rather long texts. We therefore put forward Hana-MC, a large dataset of concise Arabic news focused mainly on the coronavirus and built from various news portals. We conducted a comparative study of several multi-label classification approaches, including algorithm adaptation, problem transformation, and ensemble methods. Experimental results showed that the ensemble method RAKELD with a Random Forest base classifier obtained the best accuracy score.
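As a rough illustration of the idea behind RAKELD (the disjoint variant of random k-labelsets), the sketch below splits the label set into disjoint random subsets and trains one label-powerset classifier per subset, treating each distinct label combination within a subset as a single class. The class name `RakelDSketch`, the helper `partition_labels`, the toy data, and the 1-nearest-neighbour base learner are all assumptions made for illustration. The study reports a Random Forest base classifier, which is swapped out here to keep the example dependency-free, and this is not the authors' code.

```python
import random


def partition_labels(n_labels, k, seed=0):
    """Split label indices into disjoint random subsets of size at most k."""
    rng = random.Random(seed)
    labels = list(range(n_labels))
    rng.shuffle(labels)
    return [sorted(labels[i:i + k]) for i in range(0, n_labels, k)]


class NearestNeighbour:
    """Hypothetical stand-in base classifier: 1-NN on squared Euclidean distance."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def closest(x):
            dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
            return self.y[dists.index(min(dists))]

        return [closest(x) for x in X]


class RakelDSketch:
    """Disjoint random k-labelsets with a label-powerset model per subset."""

    def __init__(self, k=2, base=NearestNeighbour, seed=0):
        self.k, self.base, self.seed = k, base, seed

    def fit(self, X, Y):
        self.n_labels = len(Y[0])
        self.subsets = partition_labels(self.n_labels, self.k, self.seed)
        self.models = []
        for subset in self.subsets:
            # Label powerset: each combination of this subset's labels is one class.
            classes = [tuple(row[j] for j in subset) for row in Y]
            self.models.append(self.base().fit(X, classes))
        return self

    def predict(self, X):
        out = [[0] * self.n_labels for _ in X]
        for subset, model in zip(self.subsets, self.models):
            for i, combo in enumerate(model.predict(X)):
                # Write the predicted combination back to its label positions.
                for j, value in zip(subset, combo):
                    out[i][j] = value
        return out


# Toy data (made up): 4 short texts as 3-dimensional count vectors, 4 topic labels.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
Y = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]

model = RakelDSketch(k=2).fit(X, Y)
predictions = model.predict(X)
```

Because the label subsets are disjoint, every label is predicted by exactly one model, so no voting step is needed. Any classifier exposing `fit` and `predict` could be passed as `base` in place of the stand-in.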
Original language | English |
---|---|
Journal | Procedia Computer Science |
Volume | 246 |
Pages (from-to) | 3556-3565 |
Number of pages | 10 |
ISSN | 1877-0509 |
DOI | |
Status | Published - Nov 2024 |
Event | 28th International Conference on Knowledge Based and Intelligent information and Engineering Systems, KES 2024 - Seville, Spain Duration: 11 Nov 2022 → 12 Nov 2022 |
Conference
Conference | 28th International Conference on Knowledge Based and Intelligent information and Engineering Systems, KES 2024 |
---|---|
Country/Territory | Spain |
City | Seville |
Period | 11/11/2022 → 12/11/2022 |
Bibliographical note
Publisher Copyright: © 2024 The Authors.