Welcome to DBDM 2020

8th International Conference on Database and Data Mining (DBDM 2020)

December 12~13, 2020, Dubai, UAE

Accepted Papers

A Study into Math Document Classification using Deep Learning

Fatimah Alshamari and Abdou Youssef, Department of Computer Science, The George Washington University, Washington, D.C., USA

ABSTRACT

Document classification is a fundamental task for many applications, including document annotation, document understanding, and knowledge discovery. This is especially true in STEM fields, where the growth rate of scientific publications is exponential and where document processing and understanding are essential to technological advancement. Classifying a new publication into a specific domain based on the content of the document is an expensive process in terms of cost and time. Therefore, there is a high demand for a reliable document classification system. In this paper, we focus on the classification of mathematics documents, which consist of English text and mathematical formulas and symbols. The paper addresses two key questions. The first question is whether math-document classification performance is impacted by math expressions and symbols, either alone or in conjunction with the text contents of documents. Our investigations show that Text-Only embedding produces better classification results. The second question we address is the optimization of a deep learning (DL) model, the LSTM combined with a one-dimensional CNN, for math document classification. We examine the model with several input representations, key design parameters, and decision choices, and consider the choice of the best input representation for math document classification.

KEYWORDS

Math, document, classification, deep learning, LSTM


Proposed Model for Enhancing Retrieving Process in Big Data Management

Ayman E. Khedr1, Mohamed Attia Mohamed2, Abdulwahab Ali Almazroi3, 1University of Jeddah, College of Computing and Information Technology at Khulais, Department of Information Systems, Jeddah, Saudi Arabia, 2Future University in Egypt, Egypt, 3University of Jeddah, College of Computing and Information Technology at Khulais, Department of Information Technology, Jeddah, Saudi Arabia

ABSTRACT

Nowadays, Internet operations are growing significantly, and the volume of data increases every second. Most organizations and individuals were unaware of this data explosion because the quantity of data is continuously increasing. Consequently, tools and methodologies for managing and controlling big data have become a critical aspect. One of the major issues that needs to be tackled when working with big data is how to manage data effectively. To address this issue, there are two main research directions. The first is to use big data frameworks such as Hive and Pig Latin, while the other is to employ NoSQL data models such as key-value, graph, column, and document stores. In addition, unprecedented data volume and the complexity of managing data across complex multi-infrastructure only further exacerbate the problems. This paper reviews representative techniques that deal with big data management challenges and, finally, proposes a model for handling such issues.

KEYWORDS

Big data, NoSQL, Machine learning, JackHare, Hive


Experiments on NL2SQL Using SQLOVA, TaBERT and Lookahead Optimizer

Shubham V Chaudhari and Kameshwar Rao JV, HCL Technologies LTD, India

ABSTRACT

With the advancement of deep learning in NLP, there has been keen interest across academia and industry in converting natural language to SQL. Various models have been developed to address this problem, employing techniques such as reinforcement learning, seq-to-seq, and seq-to-set. We present an approach in which TaBERT [1] and SQLOVA [2] are combined. TaBERT, trained on structured text, improves over traditional BERT [3], thus better enhancing the features of the input query and headers. The NL2SQL layer of SQLOVA is connected on top of TaBERT and further encodes the query and headers, further enhancing the features. The choice of optimizer plays a key role in improving the model’s results. The proposed architecture with the lookahead optimizer [4] improves the accuracy of where-num, where-col, and where-cond by 0.2%, 0.5%, and 0.4%, respectively.

KEYWORDS

nl2sql, deep neural networks, NLP
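
The lookahead optimizer mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' training code: it wraps plain gradient descent on a toy one-dimensional quadratic, and the function names, learning rate, `k`, and `alpha` values are assumptions chosen only for illustration. The idea shown is that "fast" weights take `k` inner steps, after which the "slow" weights move a fraction `alpha` of the way toward them.

```python
def lookahead_sgd(grad, theta0, lr=0.1, k=5, alpha=0.5, outer_steps=20):
    """Minimise a scalar function with gradient descent wrapped in Lookahead.

    grad        : function returning the gradient at a point (hypothetical helper)
    k           : number of inner (fast) steps per outer step
    alpha       : interpolation factor for the slow weights
    """
    slow = float(theta0)
    for _ in range(outer_steps):
        fast = slow                        # fast weights restart from the slow weights
        for _ in range(k):
            fast -= lr * grad(fast)        # k inner gradient-descent steps
        slow += alpha * (fast - slow)      # pull slow weights toward the fast weights
    return slow


def quadratic_grad(x):
    """Gradient of the toy objective (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


result = lookahead_sgd(quadratic_grad, theta0=0.0)
```

With `k=1` and `alpha=1.0` the wrapper reduces to ordinary gradient descent, which is a convenient sanity check when experimenting with the two hyperparameters.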


Blockchain-based Ticketing Solution for Collegiate Athletics

Zaki Zahed1, Matt Fitzgerald2, Ronald Sayles3, 1IT Engineering Department, Saudi Aramco, Dhahran, Saudi Arabia, 2TCP program, University of Colorado at Boulder, Boulder, Colorado, USA, 3TCP program, University of Colorado at Boulder, Boulder, Colorado, USA

ABSTRACT

This paper proposes an ecosystem for a Blockchain-Based Ticketing Solution for Collegiate Athletics. Utilizing technologies such as digital ledgers paired with cryptography, this paper constructs a theoretical implementation of secure digital ticketing. Four components essential to operation are identified: issuer, user, verifier, and DID (Decentralized Identifiers). The proposed solution begins with an authenticated University user. This user must grant the ticketing website access to the user's assigned University identifier through a QR code or login. This initial handshake is signed with the private keys of both the University and the user, and is confirmed by the ticketing website. A digital ticket to the event, signed with the website's private key, is then released to the user via a smart contract. The smart contract is then stored on the blockchain by the ticketing website. Upon arrival at the event, the user presents the digital ticket (QR code) signed with the private keys of the website and the user. Doing so confirms proof of identity through the authenticated University identifier while simultaneously executing the aforementioned smart contract. Once the ticket and respective signatures are verified through the University's QR code scanner, the user is granted access to the event and the ticket can no longer be reused or resold.

KEYWORDS

Blockchain, Digital Identity, Digital Ticketing, Collegiate Athletics


Genetic Algorithm for Exam Timetabling Problem: A Specific Case for Japanese University Final Presentation Timetabling

Jiawei LI and Tad Gonsalves, Department of Information & Communication Sciences, Faculty of Science and Technology, Sophia University, Tokyo, Japan

ABSTRACT

This paper presents a Genetic Algorithm approach to solve a specific examination timetabling problem that is common in Japanese universities. The model is programmed in Excel VBA and can be run directly in Microsoft Excel worksheets. The model uses direct chromosome representation. To satisfy hard and soft constraints, a constraint-based initialization operation, a constraint-based crossover operation, and a penalty point system are implemented. To further improve the result quality of the algorithm, this paper introduces an improvement called initial population pre-training. The proposed model was tested on real data from Sophia University, Tokyo, Japan. The model shows acceptable results, and the comparison results show that the initial population pre-training approach can improve the result quality.

KEYWORDS

Examination timetabling problem, Excel VBA, Direct chromosome representation, Genetic Algorithm Improvement
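
To make the penalty point idea in the abstract above concrete, here is a minimal sketch of a penalty-based fitness function over a direct chromosome representation. It is written in Python rather than the authors' Excel VBA, and the specific constraints, the penalty weights, and all names (`penalty`, `HARD_PENALTY`, `SOFT_PENALTY`) are assumptions for illustration; the abstract does not list the actual constraints used.

```python
from itertools import combinations

HARD_PENALTY = 100  # hypothetical weight for a hard-constraint violation
SOFT_PENALTY = 1    # hypothetical weight for a soft-constraint violation


def penalty(chromosome, examiners):
    """Return the total penalty points of a timetable (lower is better).

    chromosome[i] is the (slot, room) assigned to presentation i (direct representation).
    examiners[i] is the set of examiners attending presentation i.
    """
    total = 0
    # Hard constraints: check every pair of presentations sharing a time slot.
    for i, j in combinations(range(len(chromosome)), 2):
        slot_i, room_i = chromosome[i]
        slot_j, room_j = chromosome[j]
        if slot_i != slot_j:
            continue
        if room_i == room_j:
            total += HARD_PENALTY      # two presentations in one room at once
        if examiners[i] & examiners[j]:
            total += HARD_PENALTY      # an examiner is double-booked
    # Assumed soft constraint: each examiner would rather stay in a single room.
    rooms_by_examiner = {}
    for (_slot, room), names in zip(chromosome, examiners):
        for name in names:
            rooms_by_examiner.setdefault(name, set()).add(room)
    for rooms in rooms_by_examiner.values():
        total += SOFT_PENALTY * (len(rooms) - 1)
    return total


examiners = [{"A", "B"}, {"A", "C"}, {"D"}]
clashing = [(0, 0), (0, 1), (0, 1)]   # examiner A double-booked, room 1 shared
feasible = [(0, 0), (1, 0), (0, 1)]   # no hard or soft violations
clashing_score = penalty(clashing, examiners)
feasible_score = penalty(feasible, examiners)
```

A genetic algorithm would use such a score to rank chromosomes during selection. Weighting hard violations far above soft ones keeps infeasible timetables from outranking feasible ones.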


Real-time Emotion based Virtual Assistant

M.M.A Safnaj, E.A.S. Ahamed, B.K.S Geethmi, M.G.A.U Jayasooriya and Samantha Rajapaksha, Department of Information Technology, Sri Lanka Institute of Information Technology, New Kandy Road, Malabe, Sri Lanka

ABSTRACT

The objective of this project is to automate the interaction between users and a virtual assistant application on smartphones by reading the user’s emotion in real time. Human emotions play a major role in people’s day-to-day life. Therefore, understanding the emotional state of the user enables efficient human-computer interaction and leads to emotion-aware applications. Existing virtual assistant applications such as Apple’s Siri, Google Assistant, and Amazon’s Alexa are capable of performing some tasks based on users’ verbal inputs. Today’s advances in technologies such as Machine Learning and Artificial Intelligence have made it possible to make ordinary applications smarter. The application analyzes live captured images and detects emotions using a Machine Learning model built with a Convolutional Neural Network, which helps achieve a high accuracy rate, and provides suggestions to the user based on the user’s current emotional state. The solution also focuses on content-based recommendation and user behavior analysis to provide more appropriate suggestions and tasks and to further enhance the user experience. While this approach increases efficiency and user experience in the field of virtual assistants, our solution differs from other platforms in having a privacy-based emotion recognition API that does not need to store the user’s pictures for prediction.

KEYWORDS

Real-Time Emotion Detection, Content-Based Recommendations, User Behaviour Analysis, Virtual Assistant.


Geothermal Energy for Refrigeration and Air Conditioning, Sustainable Development, and the Environment

A.M. Omer, Energy Research Institute (ERI), Nottingham NG7 4EU, United Kingdom

ABSTRACT

Geothermal heat pumps (GSHPs), or direct expansion (DX) ground source heat pumps, are a highly efficient renewable energy technology that uses the earth, groundwater, or surface water as a heat source when operating in heating mode or as a heat sink when operating in cooling mode. The technology is receiving increasing interest because of its potential to decrease primary energy consumption and thus reduce emissions of greenhouse gases (GHGs). The main concept of this technology is that it uses the lower temperature of the ground (approximately less than 32°C), which remains relatively stable throughout the year, to provide space heating, cooling, and domestic hot water inside the building area. The main goal of this study was to stimulate the uptake of GSHPs. Recent attempts to stimulate alternative energy sources for heating and cooling of buildings have emphasised the utilisation of ambient energy from the ground and other renewable energy sources. The purpose of this study, however, was to examine the means of reducing energy consumption in buildings, identify GSHPs as an environmentally friendly technology able to provide efficient utilisation of energy in the buildings sector, promote the use of GSHP applications as an optimum means of heating and cooling, and present typical applications and recent advances of DX GSHPs. The study highlighted the potential energy savings that could be achieved through the use of ground energy sources. It also focused on the optimisation and improvement of the operating conditions of the heat cycle and the performance of the DX GSHP. It is concluded that the direct expansion GSHP, combined with a ground heat exchanger in foundation piles and seasonal thermal energy storage from solar thermal collectors, is extendable to more comprehensive applications.

KEYWORDS

Geothermal heat pumps, direct expansion, ground heat exchanger, heating and cooling


How to Engage Followers: Classifying Fashion Brands According to their Instagram Profiles, Posts and Comments

Stefanie Scholz1 and Christian Winkler2, 1Department of Social Economy, Wilhelm Loehe University of Applied Sciences, Fuerth, Germany, 2datanizing GmbH, Schwarzenbruck, Germany

ABSTRACT

In this article, we show how fashion brands communicate with their followers on Instagram. We use a continuously updated dataset of 68 brands, more than 300,000 posts, and more than 40,000,000 comments. Starting with descriptive statistics, we uncover the different behavior and success of the various brands. It turns out that there are patterns specific to luxury, mass-market, and sportswear brands. Posting volume is extremely brand dependent, as are the number of comments and the engagement of the community. Having understood the statistics, we turn to machine learning techniques to measure the response of the community via comments. Topic models help us understand the structure of the brands' respective communities and uncover insights regarding the response to campaigns. Having up-to-date content is essential for this kind of analysis, as the market is highly volatile. Furthermore, automatic data analysis is crucial to measure the success of campaigns and adjust them accordingly for maximum effect.

KEYWORDS

Instagram, Fashion Brands, Data Extraction, Marketing, Analysis, Artificial Intelligence, Netnography, Descriptive Statistics, Visualization, Community Engagement, Unsupervised Learning, Topic Modelling.


A Research on Client-side-based Web Attack Response using Ensemble Model

Hyeongmin Kim1, Suhyeon Oh1, Yerin Im1, Hyeonseong Jeong1, Jiwon Hong1, Jaehyeon Cho1, Hyeonmin Kim2, Kyounggon Kim3, 1Best of the Best, Korea Information Technology Research Institute, 2Financial Security Institute, 3Naif Arab University for Security Sciences

ABSTRACT

The pattern detection method, which is the existing web attack response method, is highly susceptible to pattern bypassing and is less likely to respond to new attacks when JavaScript and other technologies are updated. To improve on this, this paper presents how to respond to client-side-based web attacks by utilizing ensemble techniques with Random Forest, Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models. By analyzing features that frequently appear in exploit kits, we were able to extract features and obtain models with 99.33% (Random Forest), 99.64% (DNN), and 99.88% (CNN) accuracy through learning and testing. We propose several ways to apply the ensemble of these models to provide users with a safe browsing environment.

KEYWORDS

AI, Machine Learning, Deep Learning, Client-Side, Ensemble technique.
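
One common way to combine several classifiers, as the abstract above describes, is soft voting: averaging the maliciousness probability reported by each model. The sketch below assumes that scheme. The abstract does not say how the three models are combined, so the function name, the equal default weights, and the 0.5 threshold are illustrative assumptions, and the probabilities are made-up inputs rather than results from the paper.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine per-model probabilities that a page is malicious.

    probabilities : one probability per model (e.g. Random Forest, DNN, CNN)
    weights       : optional per-model weights; equal weighting when omitted
    Returns the weighted average score and whether it reaches the threshold.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    score = sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
    return score, score >= threshold


# Hypothetical outputs from the three models for one web page.
score, is_malicious = soft_vote([0.40, 0.70, 0.90])
```

Weights could, for example, be set in proportion to each model's validation accuracy, so that a more reliable model has more influence on the final decision.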


Machine Learning Algorithm for NLOS Millimeter Wave in 5G V2X Communication

Deepika Mohan1, Peter Han Joo Chong1 and G.G. Md. Nawaz2, 1Department of Electrical and Electronics Engineering, Auckland University of Technology, Auckland, New Zealand, 2Department of Applied Computer Science, University of Charleston, WV 25304, USA

ABSTRACT

5G vehicle-to-everything (V2X) communication for autonomous and semi-autonomous driving relies on wireless technology, and Millimeter Wave (mmWave) bands are widely implemented in this kind of vehicular network application. The main purpose of this paper is to broadcast messages from the mmWave Base Station (mmBS) to vehicles at LOS (Line-of-sight) and NLOS (Non-LOS). A Relay using Machine Learning (RML) algorithm is formulated to train the mmBS to identify blockages within its coverage area and to broadcast messages to vehicles at NLOS using LOS nodes as relays. The transmission of information is faster, with higher throughput, and covers a wider bandwidth that is reused; therefore, when machine learning is performed within the coverage area of the mmBS, most of the vehicles at NLOS can benefit. A unique relay mechanism combined with machine learning is proposed to communicate with mobile nodes at NLOS.

KEYWORDS

5G, Millimeter Wave, Machine Learning, Relay, V2X communication.