Recent publications in Information Technology
Chaturvedi, Iti, Cambria, Erik, Chen, Qian, and McConnell, Desmond (2021) Landmark calibration for facial expressions and fish classification. Signal, Image and Video Processing. (In Press)
This paper considers the automatic labeling of emotions in face images found on social media. Facial landmarks are commonly used to classify the emotions from a face image. However, it is difficult to accurately segment landmarks for some faces and for subtle emotions. Previous authors used a Gaussian prior for the refinement of landmarks, but their model often gets stuck in a local minimum. Instead, the calibration of the landmarks with respect to the known emotion class label using principal component analysis is proposed in this paper. Next, the face image is generated from the landmarks using an image translation model. The proposed model is evaluated on the classification of facial expressions and also for fish identification underwater and outperforms baselines in accuracy by over 20%.
Miao, Yuantian, Minhui, Xue, Chen, Chao, Pan, Lei, Zhang, Jun, Zhao, Benjamin Zi Hao, Kaafar, Dali, and Xiang, Yang (2021) The audio auditor: user-level membership inference in Internet of Things voice services. Proceedings on Privacy Enhancing Technologies, 2021 (1). pp. 209-228.
With the rapid development of deep learning techniques, the popularity of voice services implemented on various Internet of Things (IoT) devices is ever increasing. In this paper, we examine user-level membership inference in the problem space of voice services, by designing an audio auditor to verify whether a specific user had unwillingly contributed audio used to train an automatic speech recognition (ASR) model under strict black-box access. With user representation of the input audio data and their corresponding translated text, our trained auditor is effective in user-level audit. We also observe that the auditor trained on specific data can be generalized well regardless of the ASR model architecture. We validate the auditor on ASR models trained with LSTM, RNNs, and GRU algorithms on two state-of-the-art pipelines, the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80%. We hope the methodology developed in this paper and findings can inform privacy advocates to overhaul IoT privacy.
Chaturvedi, Iti, Chit, Lin Su, and Welsch, Roy E. (2021) Fuzzy aggregated topology evolution for cognitive multi-tasks. Cognitive Computation, 13. pp. 96-107.
Evolutionary optimization aims to tune the hyper-parameters during learning in a computationally fast manner. For optimization of multi-task problems, evolution is done by creating a unified search space with a dimensionality that can include all the tasks. Multi-task evolution is achieved via selective imitation where two individuals with the same type of skill are encouraged to crossover. Due to the relatedness of the tasks, the resulting offspring may have a skill for a different task. In this way, we can simultaneously evolve a population where different individuals excel in different tasks. In this paper, we consider a type of evolution called Genetic Programming (GP) where the population of genes has a tree-like structure and can be of different lengths and hence can naturally represent multiple tasks. Methods: We apply the model to multi-task neuroevolution that aims to determine the optimal hyper-parameters of a neural network such as number of nodes, learning rate and number of training epochs using evolution. Here each gene is encoded with the hyper-parameters for a single neural network. Previously, optimization was done by enabling or disabling individual connections between neurons during evolution. This method is extremely slow and does not generalize well to new neural architectures such as Seq2Seq. To overcome this limitation, we follow a modular approach where each sub-tree in a GP can be a sub-neural architecture that is preserved during crossover across multiple tasks. Lastly, in order to leverage the inter-task covariance for faster evolutionary search, we project the features from both tasks to a common space using fuzzy membership functions. Conclusions: The proposed model is used to determine the optimal topology of a feed-forward neural network for classification of emotions in physiological heart signals and also a Seq2Seq chatbot that can converse with kindergarten children. We can outperform baselines by over 10% in accuracy.
Suwanwiwat, Hemmaphan, Das, Abhijit, Saqib, Muhammad, and Pal, Umapada (2021) Benchmarked multi-script Thai scene text dataset and its multi-class detection solution. Multimedia Tools and Applications. (In Press)
Detecting text in scene images is a prevalent research topic. Text detection is considered challenging and non-interoperable since there could be multiple scripts in a scene image. Each of these scripts can have different properties; therefore, it is crucial to research scene text detection based on geographical location, owing to the different scripts. As no work on large-scale multi-script Thai scene text detection is found in the literature, the work conducted in this study focuses on multi-script text that includes Thai, English (Roman), Chinese or Chinese-like script, and Arabic. These scripts can generally be seen around Thailand. Thai script contains more consonants and vowels, and has numerals, when compared to the Roman/English script. Furthermore, the placement of letters, intonation marks, as well as vowels, is different from English or Chinese-like script. Hence, it could be considered challenging to detect and recognise Thai text. This study proposed a multi-script dataset which includes the aforementioned scripts and numerals, along with a benchmark employing Single Shot Multi-Box Detector (SSD) and Faster Regions with Convolutional Neural Networks (F-RCNN). The proposed dataset contains scene images which were recorded in Thailand. The dataset consists of 600 images, together with their manual detection annotation. This study also proposed a detection technique that treats the multi-script scene text detection problem as a multi-class detection problem, which was found to work more effectively than legacy approaches. The experimental results from employing the proposed technique with the dataset achieved encouraging precision and recall rates when compared with such methods. The proposed dataset is available upon email request to the corresponding authors.
Belson, Bruce, Xiang, Wei, Holdsworth, Jason, and Philippa, Bronson (2021) C++20 coroutines on microcontrollers - what we learned. IEEE Embedded Systems Letters, 13 (1). pp. 9-12.
Coroutines will be added to C++ as part of the C++20 standard. Coroutines provide native language support for asynchronous operations. This study evaluates the C++ coroutine specification from the perspective of embedded systems developers. We find that the proposed language features are generally beneficial but that memory management of the coroutine state needs to be improved. Our experiments on an ARM Cortex-M4 microcontroller evaluate the time and memory costs of coroutines in comparison with alternatives, and we show that context switching with coroutines is significantly faster than with thread-based real-time operating systems. Furthermore, we analysed the impact of these language features on prototypical IoT sensor software. We find that the proposed language enhancements potentially bring significant benefits to programming in C++ for embedded computers, but that the implementation imposes constraints that may prevent its widespread acceptance among the embedded development community.
Hussain, Emtiaz, Hasan, Mahmudul, Rahman, Md Anisur, Lee, Ickjai, Tamanna, Tasmi, and Parvez, Mohammed Zavid (2021) CoroDet: a deep learning based classification for COVID-19 detection using chest X-ray images. Chaos Solitons and Fractals, 142. 110495.
Background and Objective: The Coronavirus 2019, or shortly COVID-19, is a viral disease that causes serious pneumonia and impacts our different body parts from mild to severe depending on the patient’s immune system. This infection was first reported in Wuhan city of China in December 2019, and afterward, it became a global pandemic spreading rapidly around the world. As the virus spreads through human-to-human contact, it has affected our lives in a devastating way, including the vigorous pressure on the public health system, the world economy, education sector, workplaces, and shopping malls. Preventing viral spreading requires early detection of positive cases and treatment of infected patients as quickly as possible. The need for COVID-19 testing kits has increased, and many of the developing countries in the world are facing a shortage of testing kits as new cases are increasing day by day. In this situation, the recent research using radiology imaging (such as X-ray and CT scan) techniques can be proven helpful to detect COVID-19 as X-ray and CT scan images provide important information about the disease caused by COVID-19 virus. The latest data mining and machine learning techniques such as Convolutional Neural Network (CNN) can be applied along with X-ray and CT scan images of the lungs for the accurate and rapid detection of the disease, assisting in mitigating the problem of scarcity of testing kits. Methods: Hence, a novel CNN model called CoroDet for automatic detection of COVID-19 by using raw chest X-ray and CT scan images has been proposed in this study. CoroDet is developed to serve as an accurate diagnostic for 2 class classification (COVID and Normal), 3 class classification (COVID, Normal, and non-COVID pneumonia), and 4 class classification (COVID, Normal, non-COVID viral pneumonia, and non-COVID bacterial pneumonia). Results: The performance of our proposed model was compared with ten existing techniques for COVID detection in terms of accuracy.
A classification accuracy of 99.1% for 2 class classification, 94.2% for 3 class classification, and 91.2% for 4 class classification was produced by our proposed model, which is obviously better than the state-of-the-art methods used for COVID-19 detection to the best of our knowledge. Moreover, the dataset with X-ray images that we prepared for the evaluation of our method is the largest dataset for COVID detection as far as our knowledge goes. Conclusion: The experimental results of our proposed method CoroDet indicate the superiority of CoroDet over the existing state-of-the-art methods. CoroDet may assist clinicians in making appropriate decisions for COVID-19 detection and may also mitigate the problem of scarcity of testing kits.
Abbasi, Umer F., Haider, Noman, Awang, Azlan, and Khan, Komal S. (2021) Cross-layer MAC/routing protocol for reliable communication in Internet of Health Things. IEEE Open Journal of the Communications Society, 2. pp. 199-216.
Internet of Health Things (IoHT) involves intelligent, low-powered, and miniaturized sensor nodes that measure physiological signals and report them to sink nodes over wireless links. IoHTs have a myriad of applications in e-health and personal health monitoring. Because of the sensitivity of the data measured by the nodes and the power constraints of the sensor nodes, reliability and energy-efficiency play a critical role in communication in IoHT. Reliability is degraded by the increase in packet loss due to inefficient MAC, routing protocols, environmental interference, and body shadowing. Simultaneously, inefficient node selection for routing may cause the depletion of critical nodes’ energy resources. Recent advancements in cross-layer protocol optimizations have proven their efficiency for packet-based Internet. In this article, we propose a MAC/Routing-based Cross-layer protocol for reliable communication while preserving the sensor nodes’ energy resource in IoHT. The proposed mechanism employs a timer-based strategy for relay node selection. The timer-based approach incorporates the metrics for residual energy and received signal strength indicator to preserve the vital underlying resources of critical sensors in IoHT. The proposed approach is also extended for multiple sensor networks, where sensors in the vicinity coordinate and cooperate for data forwarding. The performance of the proposed technique is evaluated for metrics like Packet Loss Probability, End-To-End delay, and energy used per data packet. Extensive simulation results show that the proposed technique improves the reliability and energy-efficiency compared to the Simple Opportunistic Routing protocol.
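The timer-based relay selection mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the timer formula, so the weighting, the RSSI normalisation range, and every name used here (`Candidate`, `backoff_timer`, `select_relay`, `alpha`, `max_wait`) are assumptions for illustration only: each candidate starts a timer that is shorter when its residual energy and RSSI are higher, and the candidate whose timer expires first forwards the packet.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    node_id: str
    residual_energy: float  # normalised to [0, 1]; 1.0 means a full battery
    rssi_dbm: float         # received signal strength in dBm


def normalise_rssi(rssi_dbm: float, lo: float = -100.0, hi: float = -40.0) -> float:
    """Map RSSI onto [0, 1]; the -100..-40 dBm range is an assumed example."""
    clipped = min(max(rssi_dbm, lo), hi)
    return (clipped - lo) / (hi - lo)


def backoff_timer(c: Candidate, alpha: float = 0.5, max_wait: float = 0.010) -> float:
    """Hypothetical timer in seconds: nodes with more energy and a
    stronger signal wait less before offering to relay."""
    score = alpha * c.residual_energy + (1.0 - alpha) * normalise_rssi(c.rssi_dbm)
    return max_wait * (1.0 - score)


def select_relay(candidates: Sequence[Candidate]) -> Candidate:
    """The candidate whose timer would expire first becomes the relay."""
    return min(candidates, key=backoff_timer)
```

In a real network each node would run its own timer and cancel it on overhearing another node's transmission; taking the minimum centrally, as above, is only a convenient way to show which node would win.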
Abkenar, Forough Shirin, Khan, Komal S., and Jamalipour, Abbas (2021) Smart-cluster-based distributed caching for fog-IoT networks. IEEE Internet of Things Journal, 8 (5). pp. 3875-3884.
The idea of co-operative caching in a cache-enabled wireless network has gained much interest due to its services in terms of short service delay and improved transmission rate at the user end. In this article, we consider a co-operative caching mechanism for a fog-enabled Internet of Things (IoT) network. We propose a delay-minimizing policy for fog nodes (FNs), where the goal is to reduce the service delay for the IoT nodes, also known as terminal nodes (TNs). To this end, a novel smart clustering mechanism is proposed, aiming to efficiently assign FNs to the TNs while improving the network benefit by finding a tradeoff between the delay and the network’s energy consumption. We perform mathematical analysis and extensive simulations to highlight the potential gain of the proposed policy.
Khan, Komal S., Haider, Noman, and Jamalipour, Abbas (2021) Content caching and allocation in spatially correlated small cells. In: Proceedings of 2020 IEEE Global Communications Conference. From: GLOBECOM 2020: 2020 IEEE Global Communications Conference, 7-11 December 2020, Taipei, Taiwan.
Optimal content caching has been an important topic in dense small cell networks. Due to spatial and temporal variation in the popularity of data, most content requests cannot be directly served by the lower tiers of the network, increasing the chances of congestion at the core network. This raises the issues of what to cache and where to cache, especially for content with different popularity patterns in a given region. In this work, we focus on the issue of redundant caching of popular files in a cluster when designing a content allocation scheme. We formulate the considered problem as a stable matching theory problem, where the preferences of each cache entity are sent to the Macro Base Station (MBS) for stable matching. The caches share their request lists with the MBS, which subsequently uses Irving OneSided matching algorithm to generate a unique preference list for each caching entity such that every preference list is a representative of the popular data in that region. The algorithm achieves the desired goal of efficient caching with few but smartly planned repetitions of the popular files. Results show that our proposed scheme provides better performance in terms of cache hit ratio with increasing number of requests as compared to a popularity based scheme.
Zhu, Randy, Hardy, Dianna, and Myers, Trina (2021) Community led co-design of a social networking platform with adolescents with Autism Spectrum Disorder. Journal of Autism and Developmental Disorders. (In Press)
Adolescents with ASD face challenges in forming positive friendships due to their ASD condition. This study developed a social networking platform based on the needs of a small group of ASD adolescents and their parents/carers and examined what potential benefits such a system could provide. We conducted seven co-design workshops with six adolescents with ASD over eight months. The team exchanged ideas and communicated through group discussions and drawings. The findings suggest that: (1) participants demonstrated self-advocacy skills through an iterative co-design process; (2) a safe and familiar environment encourages active participation from adolescents with ASD as co-designers; and (3) parents, community group and fellow participants play a pivotal role in engaging adolescents with ASD on a social-network.
Possemiers, Aidan, and Lee, Ickjai (2021) Evaluating deep learned voice compression for use in video games. Expert Systems with Applications, 181. 115180.
In recent years video games have become one of the most popular entertainment mediums. This can partly be attributed to advances in computer graphics, and the availability, affordability and performance of hardware which have made modern video games the most realistic and immersive they have ever been. These games have a rich story with large open worlds, and a diverse cast of fully voice acted characters which also means that they take up large amounts of disk space. While a large percentage of this audio is sound effects and music, modern, character-driven, open world games contain multiple hours and many gigabytes of spoken voice audio. This paper examines how audio compression in video games poses distinctly different challenges than in telecommunications or archiving, the primary motivating factor that inspired audio compression systems currently used in video games. By evaluating new, deep learning based, methods of voice compression with video games in mind, we determine the criteria needed to be met for a new method to succeed current methods in measures of compression factor and quality at an acceptable level of algorithmic performance and in what directions new research is needed to meet these criteria.
Liu, Sisi, and Lee, Ickjai (2021) Sequence encoding incorporated CNN model for Email document sentiment classification. Applied Soft Computing, 102. 107104.
Document sentiment classification is an area of study that has been developed for decades. However, sentiment classification of Email data is rather a specialized field that has not yet been thoroughly studied. Compared to typical social media and review data, Email data has characteristics of length variance, duplication caused by reply and forward messages, and implicitness in sentiment indicators. Due to these characteristics, existing techniques are incapable of fully capturing the complex syntactic and relational structure among words and phrases in Email documents. In this study, we introduce a dependency graph-based position encoding technique enhanced with weighted sentiment features, and incorporate it into the feature representation process. We combine encoded sentiment sequence features with traditional word embedding features as input for a revised deep CNN model for Email sentiment classification. Experiments are conducted on three sets of real Email data with adequate label conversion processes. Empirical results indicate that our proposed SSE-CNN model obtained the highest accuracy rate of 88.6%, 74.3% and 82.1% for three experimental Email datasets over other comparative state-of-the-art algorithms. Furthermore, our performance evaluations on the preprocessing and sentiment sequence encoding justify the effectiveness of Email preprocessing and sentiment sequence encoding with dependency-graph based position and SWN features on the improvement of Email document sentiment classification.
Chen, Qian, Chaturvedi, Iti, Ji, Shaoxiong, and Cambria, Erik (2021) Sequential fusion of facial appearance and dynamics for depression recognition. Pattern Recognition Letters, 150. pp. 115-121.
In mental health assessment, it is validated that nonverbal cues like facial expressions can be indicative of depressive disorders. Recently, the multimodal fusion of facial appearance and dynamics based on convolutional neural networks has demonstrated encouraging performance in depression analysis. However, correlation and complementarity between different visual modalities have not been well studied in prior methods. In this paper, we propose a sequential fusion method for facial depression recognition. For mining the correlated and complementary depression patterns in multimodal learning, a chained-fusion mechanism is introduced to jointly learn facial appearance and dynamics in a unified framework. We show that such sequential fusion can provide a probabilistic perspective of the model correlation and complementarity between two different data modalities for improved depression recognition. Results on a benchmark dataset show the superiority of our method against several state-of-the-art alternatives.
Sinclair, Jacob, Suwanwiwat, Hemmaphan, and Lee, Ickjai (2021) A hybrid data gathering and agent based cognitive architecture for realistic crowd simulations. Journal of Simulation. (In Press)
This paper proposes a realistic agent-based framework for crowd simulations that can encompass the input phase, the simulation process phase, and the output evaluation phase. In order to achieve this gathering, the three types of real-world data (physical, mental and visual) need to be considered. However, existing research has not used all the three data types to develop an agent-based framework since current data gathering methods are unable to collect all the three types. This paper introduces a new hybrid data gathering approach using a combination of virtual reality and questionnaires to gather all three data types. The data collected are incorporated into the simulation model to provide realism and flexibility. The performance of the framework is evaluated and benchmarked to prove the robustness and effectiveness of our framework. Various types of settings (self-set parameters and random parameters) are simulated to demonstrate that the framework can produce real-world-like simulation.
Chaturvedi, Iti, Thapa, Kishor, Cavallari, Sandro, Cambria, Erik, and Welsch, Roy E. (2021) Predicting video engagement using heterogeneous DeepWalk. Neurocomputing, 465. pp. 228-237.
Video engagement is important in online advertisements where there is no physical interaction with the consumer. Engagement can be directly measured as the number of seconds after which a consumer skips an advertisement. In this paper, we propose a model to predict video engagement of an advertisement using only a few samples. This allows for early identification of poor quality videos. This can also help identify advertisement frauds where a robot runs fake videos behind the name of well-known brands. We leverage on the fact that videos with high engagement have similar viewing patterns over time. Hence, we can create a similarity network of videos and use a graph-embedding model called DeepWalk to cluster videos into significant communities. The learned embedding is able to identify viewing patterns of fraud and popular videos. In order to assess the impact of a video, we also consider how the view counts increase or decrease over time. This results in a heterogeneous graph where an edge indicates similar video engagement or history of view counts between two videos. Since it is difficult to find labelled samples for ‘fraud’ video, we leverage on a one-class model that can determine ‘fraud’ videos with outlier or abnormal behavior. The proposed model outperforms baselines in F-measure by over 20%.
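The graph step in this abstract (build a similarity network of videos, then run DeepWalk over it) can be sketched as follows. DeepWalk generates truncated random walks over a graph and then trains a skip-gram model on them; only the walk generation is shown here. The cosine-similarity threshold used to create edges, the toy viewing curves, and the function names (`build_similarity_graph`, `random_walks`) are assumptions for illustration, not details taken from the paper.

```python
import random


def build_similarity_graph(series, threshold=0.9):
    """Connect two videos when the cosine similarity of their viewing
    curves exceeds `threshold` (an assumed, illustrative rule)."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = sum(a * a for a in u) ** 0.5
        nv = sum(b * b for b in v) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0

    names = list(series)
    graph = {name: [] for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(series[a], series[b]) > threshold:
                graph[a].append(b)
                graph[b].append(a)
    return graph


def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate truncated uniform random walks, the input that DeepWalk
    feeds to a skip-gram model to learn node embeddings."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            # Stop early if the current node has no neighbours.
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks
```

Videos with an unusual viewing curve end up isolated or weakly connected in such a graph, which is one intuition for why an outlier-style (one-class) model can then be applied to the learned embeddings.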
Amos, Andrew James, Lee, Kyungmi, Sen Gupta, Tarun, and Malau-Aduli, Bunmi S. (2021) Systematic review of specialist selection methods with implications for diversity in the medical workforce. BMC Medical Education, 21. 448.
Purpose: There is growing concern that inequities in methods of selection into medical specialties reduce specialist cohort diversity, particularly where measures designed for another purpose are adapted for specialist selection, prioritising reliability over validity. This review examined how empirical measures affect the diversity of specialist selection. The goals were to summarise the groups for which evidence is available, evaluate evidence that measures prioritising reliability over validity contribute to under-representation, and identify novel measures or processes that address under-representation, in order to make recommendations on selection into medical specialties and research required to support diversity. Method: In 2020–1, the authors implemented a comprehensive search strategy across 4 electronic databases (Medline, PsychINFO, Scopus, ERIC) covering years 2000–2020, supplemented with hand-search of key journals and reference lists from identified studies. Articles were screened using explicit inclusion and exclusion criteria designed to focus on empirical measures used in medical specialty selection decisions. Results: Thirty-five articles were included from 1344 retrieved from databases and hand-searches. In order of prevalence these papers addressed the under-representation of women (21/35), international medical graduates (10/35), and race/ethnicity (9/35). Apart from well-powered studies of selection into general practice training in the UK, the literature was exploratory, retrospective, and relied upon convenience samples with limited follow-up. There was preliminary evidence that bias in the measures used for selection into training might contribute to under-representation of some groups. Conclusions: The review did not find convincing evidence that measures prioritising reliability drive under-representation of some groups in medical specialties, although this may be due to limited power analyses. 
In addition, the review did not identify novel specialist selection methods likely to improve diversity. Nevertheless, significant and divergent efforts are being made to promote the evolution of selection processes that draw on all the diverse qualities required for specialist practice serving diverse populations. More rigorous prospective research across different national frameworks will be needed to clarify whether eliminating or reducing the weighting of reliable pre-selection academic results in selection decisions will increase or decrease diversity, and whether drawing on a broader range of assessments can achieve both reliable and socially desirable outcomes.
Zhang, Jun, Pan, Lei, Han, Qing-Long, Chen, Chao, Wen, Sheng, and Xiang, Yang (2021) Deep learning based attack detection for cyber-physical system cybersecurity: a survey. IEEE - CAA Journal of Automatica Sinica. (In Press)
With the booming of cyber attacks and cyber criminals against cyber-physical systems (CPSs), detecting these attacks remains challenging. It might be the worst of times, but it might be the best of times because of opportunities brought by machine learning (ML), in particular deep learning (DL). In general, DL delivers superior performance to ML because of its layered setting and its effective algorithm for extracting useful information from training data. DL models are adopted quickly to cyber attacks against CPS systems. In this survey, a holistic view of recently proposed DL solutions is provided to cyber attack detection in the CPS context. A six-step DL driven methodology is provided to summarize and analyze the surveyed literature for applying DL methods to detect cyber attacks against CPS systems. The methodology includes CPS scenario analysis, cyber attack identification, ML problem formulation, DL model customization, data acquisition for training, and performance evaluation. The reviewed works indicate great potential to detect cyber attacks against CPS through DL modules. Moreover, excellent performance is achieved partly because of several high-quality datasets that are readily available for public use. Furthermore, challenges, opportunities, and research trends are pointed out for future research.
Josi, Dario, Heg, Dik, Takeyama, Tomohiro, Bonfils, Danielle, Konovalov, Dmitry A., Frommen, Joachim G., Kohda, Masanori, and Taborsky, Michael (2021) Age‐ and sex‐dependent variation in relatedness corresponds to reproductive skew, territory inheritance and workload in cooperatively breeding cichlids. Evolution. (In Press)
Kin selection plays a major role in the evolution of cooperative systems. However, many social species exhibit complex within-group relatedness structures, where kin selection alone cannot explain the occurrence of cooperative behavior. Understanding such social structures is crucial to elucidate the evolution and maintenance of multi-layered cooperative societies. In lamprologine cichlids, intragroup relatedness seems to correlate positively with reproductive skew, suggesting that in this clade dominants tend to provide reproductive concessions to unrelated subordinates to secure their participation in brood care. We investigate how patterns of within-group relatedness covary with direct and indirect fitness benefits of cooperation in a highly social vertebrate, the cooperatively breeding, polygynous lamprologine cichlid Neolamprologus savoryi. Behavioral and genetic data from 43 groups containing 578 individuals show that groups are socially and genetically structured into subgroups. About 17% of group members were unrelated immigrants, and average relatedness between breeders and brood care helpers declined with helper age due to group membership dynamics. Hence the relative importance of direct and indirect fitness benefits of cooperation depends on helper age. Our findings highlight how both direct and indirect fitness benefits of cooperation and group membership can select for cooperative behavior in societies comprising complex social and relatedness structures.
Liu, Hongbin, and Lee, Ickjai (2020) Bridging the gap between training and inference for spatio-temporal forecasting. In: Frontiers in Artificial Intelligence and Applications (325) pp. 1316-1323. From: ECAI 2020: 24th European Conference on Artificial Intelligence, 29 August - 8 September 2020, Santiago, Spain.
Spatio-temporal sequence forecasting is one of the fundamental tasks in spatio-temporal data mining. It facilitates many real-world applications such as precipitation nowcasting, citywide crowd flow prediction and air pollution forecasting. Recently, a few Seq2Seq-based approaches have been proposed, but one of the drawbacks of Seq2Seq models is that small errors can accumulate quickly along the generated sequence at the inference stage due to the different distributions of the training and inference phases. That is because Seq2Seq models minimise single-step errors only during training, whereas the entire sequence has to be generated during the inference phase, which creates a discrepancy between training and inference. In this work, we propose a novel curriculum-learning-based strategy named Temporal Progressive Growing Sampling to effectively bridge the gap between training and inference for spatio-temporal sequence forecasting, by transforming the training process from a fully-supervised manner which utilises all available previous ground-truth values to a less-supervised manner which replaces some of the ground-truth context with generated predictions. To do that, we sample the target sequence from midway outputs from intermediate models trained with bigger timescales through a carefully designed decaying strategy. Experimental results demonstrate that our proposed method better models long-term dependencies and outperforms baseline approaches on two competitive datasets.
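The general idea of moving from fully-supervised to less-supervised training, by replacing part of the ground-truth context with generated predictions under a decaying schedule, can be sketched generically. This is not the authors' Temporal Progressive Growing Sampling, which draws its substitutes from intermediate models trained at bigger timescales; it is a plain scheduled-sampling-style illustration, and the linear schedule and the names `teacher_forcing_prob` and `mix_context` are assumptions.

```python
import random


def teacher_forcing_prob(step, total_steps, floor=0.0):
    """Linearly decay the probability of feeding ground truth
    (an assumed schedule; other decay shapes are equally possible)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(floor, 1.0 - frac)


def mix_context(ground_truth, predictions, p_truth, rng):
    """Per time step, keep the ground-truth value with probability
    p_truth; otherwise substitute the generated prediction."""
    return [g if rng.random() < p_truth else p
            for g, p in zip(ground_truth, predictions)]


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [1.0, 2.0, 3.0, 4.0]
    preds = [1.1, 1.9, 3.2, 3.7]
    for step in (0, 50, 100):
        p = teacher_forcing_prob(step, total_steps=100)
        print(step, p, mix_context(truth, preds, p, rng))
```

Early in training the model sees only ground-truth context; by the end it sees only generated values, which is closer to the conditions it faces at inference time.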