A mixed neural network and support vector machine model for tender creation in the European union TED database

Publication type: Contribution to book/anthology/report/conference proceeding; Article in proceedings; Research; Peer-reviewed

Abstract

This research article proposes a new method for automated text generation and the subsequent classification of European Union (EU) Tenders Electronic Daily (TED) text documents into the predefined technological categories of the dataset. The TED dataset provides information about the respective tenders and includes features such as Name of project, Title, Description, Types of contract, Common Procurement Vocabulary (CPV) code, and Additional CPV codes. The dataset was obtained from the website of SIMAP, the information system for European public procurement, and consists of tenders described in XML files. It was preprocessed using tokenization, removal of stop words, removal of punctuation marks, and similar steps. We implemented a neural machine learning model based on Long Short-Term Memory (LSTM) nodes for text generation and subsequent code classification. Here, text generation means that, given a single line or just two or three words of a title, the model generates a whole sentence. After generating the title, the model predicts the main applicable CPV code for that title. The LSTM model reaches an accuracy of 97% for text generation, and code classification using a Support Vector Machine (SVM) reaches an accuracy of 95%. This experiment is a first step towards a system that, based on TED data, can auto-generate tender documents and classify them by CPV code, easing the process of creating and disseminating tender information to TED and ultimately to relevant vendors. Developing and automating this system will provide a forward-looking view of, and insight into, the ongoing projects and deliveries of an organisation, based on the tenders it publishes through SIMAP.
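The preprocessing and CPV-classification steps described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the stop-word list, the bag-of-words features, training the linear SVM with the Pegasos sub-gradient method, the one-vs-rest scheme, and the toy tender titles are all hypothetical choices made here. The two labels are real CPV division codes (45000000, construction work; 72000000, IT services), but pairing them with these invented titles is purely illustrative.

```python
import random
import re
from collections import defaultdict

# Hypothetical stop-word list; the paper does not publish its own.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lower-case, replace punctuation with spaces, tokenize, drop stop words."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def vectorize(text):
    """Sparse bag-of-words vector (token -> count) plus a constant bias feature."""
    vec = defaultdict(float)
    for tok in preprocess(text):
        vec[tok] += 1.0
    vec["__bias__"] = 1.0
    return vec


def dot(w, x):
    """Dot product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(xs, ys, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (labels +1/-1) trained with Pegasos sub-gradient steps."""
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Hinge loss is active if the margin (computed before the update) is < 1.
            violated = ys[i] * dot(w, xs[i]) < 1.0
            for k in w:  # L2 regularization: shrink every weight
                w[k] *= 1.0 - eta * lam
            if violated:  # sub-gradient step towards the misclassified side
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


class OneVsRestSVM:
    """One binary SVM per CPV code; predict the code with the highest score."""

    def fit(self, texts, codes):
        xs = [vectorize(t) for t in texts]
        self.codes = sorted(set(codes))
        self.weights = {
            c: train_pegasos(xs, [1 if y == c else -1 for y in codes])
            for c in self.codes
        }
        return self

    def predict(self, text):
        x = vectorize(text)
        return max(self.codes, key=lambda c: dot(self.weights[c], x))


# Hypothetical training titles paired with real CPV division codes.
TRAIN = [
    ("Construction of a new school building", "45000000"),
    ("Road resurfacing and bridge repair works", "45000000"),
    ("Renovation of the town hall roof", "45000000"),
    ("Development of a web portal for citizens", "72000000"),
    ("Software maintenance and technical support services", "72000000"),
    ("Hosting and cloud services for the data platform", "72000000"),
]

model = OneVsRestSVM().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict("Construction works for a bridge"))
print(model.predict("Technical support for the web portal"))
```

In a pipeline of the kind the abstract describes, the title passed to `predict` would be the one produced by the LSTM text generator. Real TED data would also need a fuller stop-word list, richer features, and the complete set of CPV codes; the toy example separates easily only because its two vocabularies do not overlap.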

Original language: English
Title of host publication: Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019)
Editors: Jorge Bernardino, Ana Salgado, Joaquim Filipe
Number of pages: 7
Volume: 3
Publisher: SCITEPRESS Digital Library
Publication date: 2019
Pages: 139-145
ISBN (Electronic): 9789897583827
DOI: 10.5220/0008362701390145
Status: Published - 2019
Event: 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2019 - Vienna, Austria
Duration: 17 Sep 2019 to 19 Sep 2019

Conference

Conference: 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, IC3K 2019
Country: Austria
City: Vienna
Period: 17/09/2019 to 19/09/2019
Sponsor: Institute for Systems and Technologies of Information, Control and Communication (INSTICC)

Fingerprint

Support vector machines
Neural networks
Information systems
XML
Learning systems
Websites
Automation
European Union
Experiments
Long short-term memory

Cite this

Kayte, S., & Schneider-Kamp, P. (2019). A mixed neural network and support vector machine model for tender creation in the European union TED database. In J. Bernardino, A. Salgado, & J. Filipe (Eds.), Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) (Vol. 3, pp. 139-145). SCITEPRESS Digital Library. https://doi.org/10.5220/0008362701390145
@inproceedings{86c0d4938d904a458333da684406a307,
title = "A mixed neural network and support vector machine model for tender creation in the European union TED database",
keywords = "Common Procurement Vocabulary, European Union, Logistic Regression, Named-entity Recognition, Natural Language Processing, Natural Language Understanding, Tender Electronic Daily",
author = "Sangramsing Kayte and Peter Schneider-Kamp",
year = "2019",
doi = "10.5220/0008362701390145",
language = "English",
volume = "3",
pages = "139--145",
editor = "Jorge Bernardino and Ana Salgado and Joaquim Filipe",
booktitle = "Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019)",
publisher = "SCITEPRESS Digital Library",

}
