Recent publications in Information Technology
Miao, Yuantian, Xue, Minhui, Chen, Chao, Pan, Lei, Zhang, Jun, Zhao, Benjamin Zi Hao, Kaafar, Dali, and Xiang, Yang (2021) The audio auditor: user-level membership inference in Internet of Things voice services. Proceedings on Privacy Enhancing Technologies, 2021 (1). pp. 209-228.
With the rapid development of deep learning techniques, the popularity of voice services implemented on various Internet of Things (IoT) devices is ever increasing. In this paper, we examine user-level membership inference in the problem space of voice services, by designing an audio auditor to verify whether a specific user had unwillingly contributed audio used to train an automatic speech recognition (ASR) model under strict black-box access. With user representation of the input audio data and their corresponding translated text, our trained auditor is effective in user-level audit. We also observe that the auditor trained on specific data can be generalized well regardless of the ASR model architecture. We validate the auditor on ASR models trained with LSTM, RNNs, and GRU algorithms on two state-of-the-art pipelines, the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80%. We hope the methodology developed in this paper and findings can inform privacy advocates to overhaul IoT privacy.
Chaturvedi, Iti, Chit, Lin Su, and Welsch, Roy E. (2021) Fuzzy aggregated topology evolution for cognitive multi-tasks. Cognitive Computation, 13. pp. 96-107.
Evolutionary optimization aims to tune the hyper-parameters during learning in a computationally fast manner. For optimization of multi-task problems, evolution is done by creating a unified search space with a dimensionality that can include all the tasks. Multi-task evolution is achieved via selective imitation where two individuals with the same type of skill are encouraged to crossover. Due to the relatedness of the tasks, the resulting offspring may have a skill for a different task. In this way, we can simultaneously evolve a population where different individuals excel in different tasks. In this paper, we consider a type of evolution called Genetic Programming (GP) where the population of genes has a tree-like structure and can be of different lengths and hence can naturally represent multiple tasks. Methods: We apply the model to multi-task neuroevolution that aims to determine the optimal hyper-parameters of a neural network, such as number of nodes, learning rate and number of training epochs, using evolution. Here each gene is encoded with the hyper-parameters for a single neural network. Previously, optimization was done by enabling or disabling individual connections between neurons during evolution. This method is extremely slow and does not generalize well to new neural architectures such as Seq2Seq. To overcome this limitation, we follow a modular approach where each sub-tree in a GP can be a sub-neural architecture that is preserved during crossover across multiple tasks. Lastly, in order to leverage the inter-task covariance for faster evolutionary search, we project the features from both tasks to a common space using fuzzy membership functions. Conclusions: The proposed model is used to determine the optimal topology of a feed-forward neural network for classification of emotions in physiological heart signals and also a Seq2Seq chatbot that can converse with kindergarten children. We can outperform baselines by over 10% in accuracy.
Suwanwiwat, Hemmaphan, Das, Abhijit, Saqib, Muhammad, and Pal, Umapada (2021) Benchmarked multi-script Thai scene text dataset and its multi-class detection solution. Multimedia Tools and Applications. (In Press)
Detecting text in scene images is one of the prevalent research topics. Text detection is considered challenging and non-interoperable since there could be multiple scripts in a scene image. Each of these scripts can have different properties; therefore, it is crucial to research scene text detection based on geographical location owing to the different scripts. As no work on large-scale multi-script Thai scene text detection is found in the literature, the work conducted in this study focuses on multi-script text that includes Thai, English (Roman), Chinese or Chinese-like script, and Arabic. These scripts can generally be seen around Thailand. Thai script contains more consonants and vowels than the Roman/English script, and has its own numerals. Furthermore, the placement of letters, intonation marks, and vowels is different from English or Chinese-like script. Hence, it could be considered challenging to detect and recognise Thai text. This study proposed a multi-script dataset which includes the aforementioned scripts and numerals, along with a benchmark employing Single Shot Multi-Box Detector (SSD) and Faster Regions with Convolutional Neural Networks (F-RCNN). The proposed dataset contains scene images which were recorded in Thailand. The dataset consists of 600 images, together with their manual detection annotation. This study also proposed a detection technique that treats multi-script scene text detection as a multi-class detection problem, which was found to work more effectively than legacy approaches. The experimental results from employing the proposed technique with the dataset achieved encouraging precision and recall rates when compared with such methods. The proposed dataset is available upon email request to the corresponding authors.
Hussain, Emtiaz, Hasan, Mahmudul, Rahman, Md Anisur, Lee, Ickjai, Tamanna, Tasmi, and Parvez, Mohammed Zavid (2021) CoroDet: a deep learning based classification for COVID-19 detection using chest X-ray images. Chaos Solitons and Fractals, 142. 110495.
Background and Objective: Coronavirus disease 2019, or COVID-19 for short, is a viral disease that causes serious pneumonia and affects different parts of the body, from mild to severe, depending on the patient’s immune system. This infection was first reported in Wuhan city of China in December 2019, and afterward, it became a global pandemic spreading rapidly around the world. As the virus spreads through human-to-human contact, it has affected our lives in a devastating way, including heavy pressure on the public health system, the world economy, the education sector, workplaces, and shopping malls. Preventing viral spread requires early detection of positive cases and treatment of infected patients as quickly as possible. The need for COVID-19 testing kits has increased, and many developing countries are facing a shortage of testing kits as new cases are increasing day by day. In this situation, recent research using radiology imaging techniques (such as X-ray and CT scan) can prove helpful for detecting COVID-19, as X-ray and CT scan images provide important information about the disease caused by the COVID-19 virus. The latest data mining and machine learning techniques such as Convolutional Neural Network (CNN) can be applied along with X-ray and CT scan images of the lungs for the accurate and rapid detection of the disease, assisting in mitigating the problem of scarcity of testing kits. Methods: Hence, a novel CNN model called CoroDet for automatic detection of COVID-19 by using raw chest X-ray and CT scan images has been proposed in this study. CoroDet is developed to serve as an accurate diagnostic for 2 class classification (COVID and Normal), 3 class classification (COVID, Normal, and non-COVID pneumonia), and 4 class classification (COVID, Normal, non-COVID viral pneumonia, and non-COVID bacterial pneumonia). Results: The performance of our proposed model was compared with ten existing techniques for COVID detection in terms of accuracy. A classification accuracy of 99.1% for 2 class classification, 94.2% for 3 class classification, and 91.2% for 4 class classification was produced by our proposed model, which, to the best of our knowledge, is better than the state-of-the-art methods used for COVID-19 detection. Moreover, the dataset of X-ray images that we prepared for the evaluation of our method is, to our knowledge, the largest dataset for COVID detection. Conclusion: The experimental results of our proposed method CoroDet indicate the superiority of CoroDet over the existing state-of-the-art methods. CoroDet may assist clinicians in making appropriate decisions for COVID-19 detection and may also mitigate the problem of scarcity of testing kits.
Abbasi, Umer F., Haider, Noman, Awang, Azlan, and Khan, Komal S. (2021) Cross-layer MAC/routing protocol for reliable communication in Internet of Health Things. IEEE Open Journal of the Communications Society, 2. pp. 199-216.
Internet of Health Things (IoHT) involves intelligent, low-powered, and miniaturized sensor nodes that measure physiological signals and report them to sink nodes over wireless links. IoHTs have a myriad of applications in e-health and personal health monitoring. Because of the sensitivity of the data measured by the nodes and the power constraints of the sensor nodes, reliability and energy-efficiency play a critical role in communication in IoHT. Reliability is degraded by the increase in packet loss due to inefficient MAC and routing protocols, environmental interference, and body shadowing. Simultaneously, inefficient node selection for routing may cause the depletion of critical nodes’ energy resources. Recent advancements in cross-layer protocol optimizations have proven their efficiency for packet-based Internet. In this article, we propose a MAC/Routing-based Cross-layer protocol for reliable communication while preserving the sensor nodes’ energy resource in IoHT. The proposed mechanism employs a timer-based strategy for relay node selection. The timer-based approach incorporates the metrics for residual energy and received signal strength indicator to preserve the vital underlying resources of critical sensors in IoHT. The proposed approach is also extended to multiple sensor networks, where sensors in the vicinity coordinate and cooperate for data forwarding. The performance of the proposed technique is evaluated for metrics like Packet Loss Probability, End-To-End delay, and energy used per data packet. Extensive simulation results show that the proposed technique improves the reliability and energy-efficiency compared to the Simple Opportunistic Routing protocol.
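To make the timer-based relay selection idea concrete, the following is a minimal sketch rather than the authors' protocol. It assumes each candidate relay sets a backoff timer that shrinks as its residual energy and received signal strength improve, so the best candidate's timer expires first. The names `Candidate`, `backoff_timer`, and `select_relay`, the equal weighting, the RSSI range, and the 10 ms maximum wait are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: str
    residual_energy: float  # assumed to be normalised to [0, 1]
    rssi_dbm: float         # received signal strength in dBm


def normalise_rssi(rssi_dbm, floor=-100.0, ceiling=-40.0):
    """Map RSSI in dBm onto [0, 1], clamping values outside the assumed range."""
    clamped = max(floor, min(ceiling, rssi_dbm))
    return (clamped - floor) / (ceiling - floor)


def backoff_timer(candidate, max_wait_ms=10.0, energy_weight=0.5):
    """Return a wait time in ms; better candidates get shorter timers."""
    score = (energy_weight * candidate.residual_energy
             + (1.0 - energy_weight) * normalise_rssi(candidate.rssi_dbm))
    return max_wait_ms * (1.0 - score)


def select_relay(candidates):
    """The candidate whose timer would expire first forwards the packet."""
    return min(candidates, key=backoff_timer)


candidates = [
    Candidate("n1", residual_energy=0.9, rssi_dbm=-85.0),  # strong battery, weak link
    Candidate("n2", residual_energy=0.6, rssi_dbm=-50.0),  # balanced
    Candidate("n3", residual_energy=0.2, rssi_dbm=-45.0),  # strong link, weak battery
]
relay = select_relay(candidates)
print(relay.node_id)
```

In this toy setting the balanced node wins, which illustrates why combining both metrics can protect nodes that are nearly depleted even when their link quality is excellent.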
Liu, Hongbin, and Lee, Ickjai (2020) Bridging the gap between training and inference for spatio-temporal forecasting. In: Frontiers in Artificial Intelligence and Applications (325) pp. 1316-1323. From: ECAI 2020: 24th European Conference on Artificial Intelligence, 29 August - 8 September 2020, Santiago, Spain.
Spatio-temporal sequence forecasting is one of the fundamental tasks in spatio-temporal data mining. It facilitates many real-world applications such as precipitation nowcasting, citywide crowd flow prediction and air pollution forecasting. Recently, a few Seq2Seq based approaches have been proposed, but one of the drawbacks of Seq2Seq models is that small errors can accumulate quickly along the generated sequence at the inference stage due to the different distributions of the training and inference phases. That is because Seq2Seq models minimise single step errors only during training, whereas the entire sequence has to be generated during the inference phase, which creates a discrepancy between training and inference. In this work, we propose a novel curriculum learning based strategy named Temporal Progressive Growing Sampling to effectively bridge the gap between training and inference for spatio-temporal sequence forecasting, by transforming the training process from a fully-supervised manner which utilises all available previous ground-truth values to a less-supervised manner which replaces some of the ground-truth context with generated predictions. To do that, we sample the target sequence from midway outputs from intermediate models trained with bigger timescales through a carefully designed decaying strategy. Experimental results demonstrate that our proposed method better models long term dependencies and outperforms baseline approaches on two competitive datasets.
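The general idea of replacing ground-truth context with model predictions under a decaying schedule can be sketched as follows. This is a generic scheduled-sampling style illustration, not Temporal Progressive Growing Sampling itself, which samples from intermediate models trained at bigger timescales. The linear decay and the function names `teacher_forcing_prob` and `build_context` are assumptions made for the example.

```python
import random


def teacher_forcing_prob(step, total_steps, floor=0.0):
    """Linearly decay the probability of feeding ground truth from 1.0 down to `floor`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(floor, 1.0 - frac)


def build_context(ground_truth, predictions, p_truth, rng):
    """For each time step, keep the ground-truth value with probability p_truth,
    otherwise substitute the model's own prediction."""
    return [gt if rng.random() < p_truth else pred
            for gt, pred in zip(ground_truth, predictions)]


rng = random.Random(0)
truth = [1.0, 2.0, 3.0, 4.0]
preds = [1.1, 1.9, 3.2, 3.7]

# Start of training: fully supervised, the context is all ground truth.
early = build_context(truth, preds, teacher_forcing_prob(0, 100), rng)
# End of training: the context consists of generated predictions only,
# which matches what the model sees at inference time.
late = build_context(truth, preds, teacher_forcing_prob(100, 100), rng)
```

Between these two extremes the context is a mixture, so the model is gradually exposed to its own errors instead of meeting them for the first time at inference.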
Chaturvedi, Iti, Cambria, Erik, Cavallari, Sandro, and Welsch, Roy E. (2020) Genetic programming for domain adaptation in product reviews. In: Proceedings of the IEEE Congress on Evolutionary Computation. From: CEC 2020: IEEE Congress on Evolutionary Computation, 19-24 July 2020, Glasgow, UK.
There is a large variety of products sold online and the websites are in several languages. Hence, it is desirable to train a model that can predict sentiments in different domains simultaneously. Previous authors have used deep learning to extract features from multiple domains. Here, each word is represented by a vector that is determined using co-occurrence data. Such a model requires that all sentences have the same length resulting in low accuracy. To overcome this challenge, we model the features in each sentence using a variable length tree called a Genetic Program. The polarity of clauses can be represented using mathematical operators such as '+' or '-' at internal nodes in the tree. The proposed model is evaluated on Amazon product reviews for different products and in different languages. We are able to outperform the accuracy of baseline multi-domain models in the range of 5-20%.
McNabb, Tim, Wicking, Kristin, Myers, Trina, and Lei, Lei (2020) Optimizing clinical spatial resources with IoT. In: Proceedings of the Australasian Computer Science Week Multiconference. 30. From: ACSW 2020: Australasian Computer Science Week Multiconference, 3-7 February 2020, Melbourne, VIC, Australia.
The cost of healthcare is significant within Australia where $185.4 billion was spent in the 2017-2018 financial year. This expenditure represents 10% of Australian GDP and grew by a ten-year annual average of 3.9% to the 2015-16 financial year. There is limited ability to demonstrate the efficient use of existing healthcare spaces while Capital Works expenditure continues to grow. Executive decision-makers and front-line managers currently lack tools to optimize space utilization, as current techniques are either burdensome, costly or challenging to implement at scale. There is related literature that demonstrates the feasibility of using Internet of Things (IoT) to understand the utilization of non-clinical healthcare spaces. However, these technologies have not previously been validated as effective in an operational clinical setting. This paper presents findings from the introduction of an IoT-based space management system applied full-time to a multi-disciplinary outpatient clinic in an operational public hospital across a six-month time period. Preliminary data validate that IoT technology is appropriate for operational healthcare environments and is superior when compared to manual data collection methods.
Bermingham, Luke, and Lee, Ickjai (2020) Mining distinct and contiguous sequential patterns from large vehicle trajectories. Knowledge Based Systems, 189. 105076.
We focus on the problem of using contiguous sequential pattern mining (SPM) to extract succinct, redundancy controlled patterns from large vehicle trajectories. Although there exist several techniques to reduce the contiguous sequential pattern output such as closed and max SPM, they still produce massive redundant pattern outputs when the input sequence database is sufficiently large and homogeneous, as is often the case for vehicle trajectories. Therefore, in this work we propose DC-SPAN: a distinct contiguous SPM algorithm. DC-SPAN mines a set of sequential patterns where the maximum redundancy of the pattern output is controlled by a user-specified parameter. Through various experiments using real world trajectory datasets we show DC-SPAN effectively controls the redundancy of the pattern output with trade-offs in pattern distinctness. Additionally, our experiments also indicate that DC-SPAN efficiently computes these patterns, incurring only a marginal running time cost over existing state-of-the-art contiguous SPM approaches. Lastly, due to the less redundant and more succinct pattern output we also briefly explore visualisation as a useful technique to interpret the discovered vehicle routes.
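A brute-force illustration of contiguous sequential pattern mining shows why the output becomes redundant. This sketch is not DC-SPAN and makes no attempt at efficiency or redundancy control; the function name `contiguous_patterns` and the toy road-segment sequences are hypothetical.

```python
from collections import defaultdict


def contiguous_patterns(sequences, min_support, max_len=None):
    """Return {pattern: support} for every contiguous subsequence that occurs
    in at least `min_support` of the input sequences."""
    support = defaultdict(int)
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        n = len(seq)
        for i in range(n):
            upper = n if max_len is None else min(n, i + max_len)
            for j in range(i + 1, upper + 1):
                seen.add(tuple(seq[i:j]))
        for pattern in seen:
            support[pattern] += 1
    return {p: s for p, s in support.items() if s >= min_support}


# Each trip is a sequence of road-segment identifiers.
trips = [
    ["A", "B", "C", "D"],
    ["A", "B", "C"],
    ["B", "C", "D"],
]
frequent = contiguous_patterns(trips, min_support=2)
for pattern, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(pattern, count)
```

Even on three short trips, nine patterns pass the support threshold, and most of them are fragments of ("A", "B", "C") or ("B", "C", "D"). On large, homogeneous trajectory databases this overlap grows quickly, which is the redundancy the abstract refers to.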
Konovalov, Dmitry A., Swinhoe, Natalie, Efremova, Dina B., Birtles, R. Alastair, Kusetic, Martha, Hillcoat, Suzanne, Curnock, Matthew I., Williams, Genevieve, and Sheaves, Marcus (2020) Automatic sorting of Dwarf Minke Whale underwater images. Information, 11 (4). 200.
A predictable aggregation of dwarf minke whales (Balaenoptera acutorostrata subspecies) occurs annually in the Australian waters of the northern Great Barrier Reef in June–July, which has been the subject of a long-term photo-identification study. Researchers from the Minke Whale Project (MWP) at James Cook University collect large volumes of underwater digital imagery each season (e.g., 1.8 TB in 2018), much of which is contributed by citizen scientists. Manual processing and analysis of this quantity of data had become infeasible, and Convolutional Neural Networks (CNNs) offered a potential solution. Our study sought to design and train a CNN that could detect whales from video footage in complex near-surface underwater surroundings and differentiate the whales from people, boats and recreational gear. We modified known classification CNNs to localise whales in video frames and digital still images. The required high classification accuracy was achieved by discovering an effective negative-labelling training technique. This resulted in a less than 1% false-positive classification rate and below 0.1% false-negative rate. The final operation-version CNN-pipeline processed all videos (with the interval of 10 frames) in approximately four days (running on two GPUs) delivering 1.95 million sorted images.
Liu, Sisi, Lee, Kyungmi, and Lee, Ickjai (2020) Document-level multi-topic sentiment classification of email data with BiLSTM and data augmentation. Knowledge Based Systems, 197. 105918.
Email data has unique characteristics, involving multiple topics, lengthy replies, formal language, high variance in length, high duplication, anomalies, and indirect relationships that distinguish it from other social media data. In order to better model Email documents and to capture complex sentiment structures in the content, we develop a framework for document-level multi-topic sentiment classification of Email data. Note that, a large volume of labeled Email data is rarely publicly available. We introduce an optional data augmentation process to increase the size of datasets with synthetically labeled data to reduce the probability of overfitting and underfitting during the training process. To generate segments with topic embeddings and topic weighting vectors as inputs for our proposed model, we apply both latent Dirichlet allocation topic modeling and semantic text segmentation to post-process Email documents. Empirical results obtained with multiple sets of experiments, including performance comparison against various state-of-the-art algorithms with and without data augmentation and diverse parameter settings, are analyzed to demonstrate the effectiveness of our proposed framework.
Madanayake, Adikarige, Sankupellay, Mangalam, and Lee, Ickjai (2020) Profiling the natural environment using acoustics: long-term environment monitoring through cluster structure. In: Proceedings of the 3rd International Conference on Software Engineering and Information Management. pp. 74-78. From: ICSIM'20: 3rd International Conference on Software Engineering and Information Management, 12-15 January 2020, Sydney, NSW, Australia.
Eco-acoustic recordings of the natural environment are becoming an increasingly important technique for ecologists to monitor and interpret long-term terrestrial ecosystems. Visualisation has been a popular approach to analyse short-term eco-acoustic recordings, but it is practically not feasible for long-term monitoring. Unsupervised machine learning could be a solid candidate to find clustering structures within this long-term eco-acoustic data, and this paper investigates if unsupervised machine learning is able to find any clustering structural difference around an important environmental event, in particular with k-means clustering. Experimental results reveal that there are clear clustering structural changes in general geophony and biophony sounds before and after a bushfire in our study region which indicates that clustering approaches could be used to identify important environmental events.
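As a rough illustration of how k-means can expose a structural change around an event, the sketch below clusters two-dimensional feature vectors with Lloyd's algorithm. The feature values are invented stand-ins for acoustic indices, not data from the study, and the function name `kmeans` and the choice of initial centroids are assumptions.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical (biophony index, geophony index) pairs per recording.
before_event = [(0.80, 0.30), (0.82, 0.28), (0.78, 0.33)]
after_event = [(0.20, 0.70), (0.22, 0.68), (0.18, 0.73)]

labels, centroids = kmeans(before_event + after_event,
                           centroids=[before_event[0], after_event[0]])
print(labels)
```

If recordings from before and after the event fall into different clusters, as they do in this toy data, the cluster assignment over time can act as a simple indicator that the soundscape has changed.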
Schoenhoff, Kurt, Holdsworth, Jason, and Lee, Ickjai (2020) Efficient semantic segmentation through dense upscaling convolutions. In: Proceedings of the 3rd International Conference on Software Engineering and Information Management. pp. 244-248. From: ICSIM'20: 3rd International Conference on Software Engineering and Information Management, 12-15 January 2020, Sydney, NSW, Australia.
Semantic segmentation is the classification of each pixel in an image to an object; the resultant pixel map has significant usage in many fields. Fields where this technology is being actively researched include medicine, agriculture and robotics. For uses where resources or power are restricted, such as robotics, or where large numbers of images must be processed, efficiency can be key to the feasibility of a technique. Other applications that require real-time processing need fast and efficient methods, especially where collision avoidance or safety may be involved. We take a combination of existing semantic segmentation methods and improve upon the efficiency by replacing the decoder network in ERFNet with a method based upon Dense Upscaling Convolutions. We then add a novel layer that allows the fine tuning of the decoder channel depth and therefore the efficiency of the network. Our proposed modification achieves a 20-30% improvement in efficiency on moderate hardware (Nvidia GTX 960) over the original ERFNet and an additional 10% efficiency over the original Dense Upscaling Convolution. We perform a series of experiments to determine viable hyperparameters for the modification and measure the efficiency and accuracy over a range of image sizes, proving the viability of our approach.
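The rearrangement step at the heart of dense upscaling can be sketched in plain Python. A convolution produces `C * r * r` low-resolution channels, which are then reshuffled into `C` channels at `r` times the spatial resolution, so no separate deconvolution is needed. The sketch covers only this reshuffle, not the convolution or the paper's added layer, and the channel ordering used here is one common convention that may differ from the authors' implementation.

```python
def depth_to_space(feature_map, r):
    """Rearrange a nested list of shape (C*r*r, H, W) into shape (C, H*r, W*r).

    Input channel c*r*r + dy*r + dx supplies the output pixels at row offset dy
    and column offset dx inside each r-by-r block of output channel c.
    """
    channels_in = len(feature_map)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for dy in range(r):
            for dx in range(r):
                src = feature_map[c * r * r + dy * r + dx]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + dy][x * r + dx] = src[y][x]
    return out


# Four channels of size 1x2 become one channel of size 2x4 when r = 2.
low_res = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
high_res = depth_to_space(low_res, r=2)
print(high_res)
```

Because the upscaling is a pure memory rearrangement, its cost is dominated by the preceding convolution, which is one reason decoders of this kind can be tuned for efficiency by adjusting channel depth.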
Yang, Hui, Ji, Shaobo, Chaturvedi, Iti, Xia, Huarong, Wang, Ting, Chen, Geng, Pan, Liang, Wan, Changjin, Qi, Dianpeng, Ong, Yew Soon, and Chen, Xiaodong (2020) Adhesive biocomposite electrodes on sweaty skin for long-term continuous electrophysiological monitoring. ACS Materials Letters, 2 (5). pp. 478-484.
Noninvasive on-skin electrodes record the electrical potential changes from human skin, which reflect body condition and are applied for healthcare, sports management, and modern lifestyle. However, current on-skin electrodes have poor conformal properties under sweaty condition in real-life because of decreased electrode-skin adhesion with sweat film at the interface. Here, we fabricated biocomposite electrodes based on silk fibroin (SF) through interfacial polymerization, which is applicable on sweaty skin. Interfacial polymerized conductive polypyrrole (PPy) and SF are structurally interlocked and endow the whole electrode with uniform stretchability. Existence of water results in similar Young's modulus of SF to the skin and enhanced interfacial adhesion. It keeps the electrodes conformal to skin under sweaty condition and allows reliable collection of ambulatory electrophysiological signals during sports and sweating. Wearable devices with these electrodes were used to acquire continuous and stable real-time electrocardiography (ECG) signals during running for 2 h. The collected signals can provide information for sports management and are also analyzed by artificial intelligence to show their potential for intelligent human emotion monitoring. Our strategy provides opportunities to record long-term continuous electrophysiological signals in real-life conditions for various smart monitoring systems.
Valdivia, Ana, Martínez-Cámara, Eugenio, Chaturvedi, Iti, Luzón, M. Victoria, Cambria, Erik, Ong, Yew Soon, and Herrera, Francisco (2020) What do people think about this monument? Understanding negative reviews via deep learning, clustering and descriptive rules. Journal of Ambient Intelligence and Humanized Computing, 11. pp. 39-52.
Aspect-based sentiment analysis enables the extraction of fine-grained information, as it connects specific aspects that appear in reviews with a polarity. Although we detect that the information from these algorithms is very accurate at local level, it does not contribute to obtain an overall understanding of reviews. To fill this gap, we propose a methodology to portray opinions through the most relevant associations between aspects and polarities. Our methodology combines three off-the-shelf algorithms: (1) deep learning for extracting aspects, (2) clustering for joining together similar aspects, and (3) subgroup discovery for obtaining descriptive rules that summarize the polarity information of set of reviews. Concretely, we aim at depicting negative opinions from three cultural monuments in order to detect those features that need to be improved. Experimental results show that our approach clearly gives an overview of negative aspects, therefore it will be able to attain a better comprehension of opinions.
The Digital Gaming Handbook covers the state-of-the-art in video and digital game research and development, from traditional to emerging elements of gaming across multiple disciplines. Chapters are presented with applicability across all gaming platforms over a broad range of topics, from game content creation through gameplay at a level accessible for the professional game developer while being deep enough to provide a valuable reference of the state-of-the-art research in this field.
Liu, Hong-Bin, and Lee, Ickjai (2020) Towards realistic meteorological predictive learning using conditional GAN. IEEE Access, 8. pp. 93179-93186.
Meteorological imagery prediction is an important and challenging problem for weather forecasting. It can also be seen as a video frame prediction problem that estimates future frames based on observed meteorological imagery. Although it is a widely investigated problem, it is still far from being solved. Current state-of-the-art deep learning based approaches mainly optimise the mean square error loss, resulting in blurry predictions. We address this problem by introducing a Meteorological Predictive Learning GAN model (in short MPL-GAN) that utilises the conditional GAN along with the predictive learning module in order to handle the uncertainty in future frame prediction. Experiments on a real-world dataset demonstrate the superior performance of our proposed model. Our proposed model is able to map the blurry predictions produced by traditional mean square error loss based predictive learning methods back to their original data distributions, hence it is able to improve and sharpen the prediction. In particular, our MPL-GAN achieves an average sharpness of 52.82, which is 14% better than the baseline method. Furthermore, our model correctly detects the meteorological movement patterns that traditional unconditional GANs fail to detect.
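The link between mean square error and blurry predictions follows from a standard result, stated here as background rather than as part of the paper: for a future frame Y and observed context X = x, the prediction that minimises expected squared error is the conditional mean.

```latex
\hat{y}^{*}(x)
  = \arg\min_{\hat{y}} \; \mathbb{E}\!\left[ \lVert Y - \hat{y} \rVert^{2} \,\middle|\, X = x \right]
  = \mathbb{E}\!\left[ Y \mid X = x \right]
```

When several distinct futures are plausible for the same context, their pixel-wise average is returned, and averaging sharp but differently placed structures produces a blurred frame. An adversarial loss instead rewards outputs that look like samples from the data distribution.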
Wang, Ye, Lee, Kyungmi, and Lee, Ickjai (2020) Visual analytical tools for multivariate higher order information for emergency management. Journal of Visualization, 23. pp. 721-743.
Higher order information of moving objects is of great importance for processing in-case scenarios in emergency management. Multivariate higher order information management is key to the success of emergency management since emergency management involves developing plans with a given set of multiple resources. Past studies focus on univariate higher order information, limiting the scope of applicability and usability. This paper proposes a set of visual analytical approaches supporting multivariate higher order information for dynamically moving disasters. We introduce a robust Voronoi based data structure supporting multivariate datasets and dynamic disasters, and propose visual analytical approaches for effective emergency management. The proposed visual analytical suite facilitates interactivity and enables users to explore in-case scenarios with multivariate datasets and dynamic disasters. A case study with real datasets is given to explain the applicability, usability and practicability of the proposed system.
Saleh, Alzayat, Laradji, Issam H., Konovalov, Dmitry A., Bradley, Michael, Vazquez, David, and Sheaves, Marcus (2020) A realistic fish-habitat dataset to evaluate algorithms for underwater visual analysis. Scientific Reports, 10. 14671.
Visual analysis of complex fish habitats is an important step towards sustainable fisheries for human consumption and environmental protection. Deep Learning methods have shown great promise for scene analysis when trained on large-scale datasets. However, current datasets for fish analysis tend to focus on the classification task within constrained, plain environments which do not capture the complexity of underwater fish habitats. To address this limitation, we present DeepFish as a benchmark suite with a large-scale dataset to train and test methods for several computer vision tasks. The dataset consists of approximately 40 thousand images collected underwater from 20 habitats in the marine-environments of tropical Australia. The dataset originally contained only classification labels. Thus, we collected point-level and segmentation labels to have a more comprehensive fish analysis benchmark. These labels enable models to learn to automatically monitor fish count, identify their locations, and estimate their sizes. Our experiments provide an in-depth analysis of the dataset characteristics, and the performance evaluation of several state-of-the-art approaches based on our benchmark. Although models pre-trained on ImageNet have successfully performed on this benchmark, there is still room for improvement. Therefore, this benchmark serves as a testbed to motivate further development in this challenging domain of underwater computer vision.
Sheaves, Marcus, Bradley, Michael, Herrera, Cesar, Mattone, Carlo, Lennard, Caitlin, Sheaves, Janine, and Konovalov, Dmitry A. (2020) Optimizing video sampling for juvenile fish surveys: using deep learning and evaluation of assumptions to produce critical fisheries parameters. Fish and Fisheries. (In Press)
The limitations imposed by traditional sampling methods have restricted the acquisition of data on key fisheries parameters. This is particularly the case for juveniles because most traditional gear explicitly avoids the capture of juveniles, and the juveniles of many species use habitats in which traditional gear is ineffective. The increasing availability and sophistication of Remote Underwater Video Techniques (RUVs) such as Baited Remote Underwater Video, Unbaited Remote Underwater Video and Remotely Operated Underwater Vehicles offer the opportunity of overcoming some of the key limitations of more traditional approaches. However, RUV techniques come with their own set of limitations that need to be addressed before they can fully realize their potential to shed new light on the early life history of fish. We evaluate key strengths and limitations of RUV techniques, and how these can be overcome, in particular by employing bespoke computer vision Artificial Intelligence approaches, such as Deep Learning in its Convolutional Neural Networks instantiation. In addition, we investigate residual issues that remain to be solved despite the advances made possible by new technology, and the role of explicitly identifying and evaluating key residual assumptions.