
Hi, I am Zabir, a PhD fellow at the University of California, Riverside, where my research centers on large language model reasoning, information retrieval, and health informatics. I’m passionate about developing intelligent systems that meaningfully connect cutting-edge AI capabilities with real-world impact.
Before transitioning into academia, I spent over six years in industry as an applied researcher, contributing to projects at organizations such as Amazon.
Large Language Model Reasoning · Information Retrieval · Medical AI · Multi-modal Reasoning
The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable ability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare applications and elucidates the trajectory of their development, starting with traditional Pretrained Language Models (PLMs) and then moving to the present state of LLMs in the healthcare sector. First, we explore the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications, particularly focusing on clinical language understanding tasks. These tasks encompass a wide spectrum, ranging from named entity recognition and relation extraction to natural language inference, multimodal medical applications, document classification, and question-answering. Additionally, we conduct an extensive comparison of the most recent state-of-the-art LLMs in the healthcare domain, while also assessing the utilization of various open-source LLMs and highlighting their significance in healthcare applications. Furthermore, we present the essential performance metrics employed to evaluate LLMs in the biomedical domain, shedding light on their effectiveness and limitations. Finally, we summarize the prominent challenges and constraints faced by large language models in the healthcare sector by offering a holistic perspective on their potential benefits and shortcomings. This review provides a comprehensive exploration of the current landscape of LLMs in healthcare, addressing their role in transforming medical applications and the areas that warrant further research and development.
Motor imagery EEG classification is a crucial task in a Brain Computer Interface (BCI) system. In this paper, we propose a Motor Imagery EEG signal classification framework based on a Convolutional Neural Network (CNN) to enhance classification accuracy. For the classification of two-class motor imagery signals, we first apply the Short Time Fourier Transform (STFT) to the EEG time-series signals to transform them into 2D images. Next, we train our proposed multi-input convolutional neural network with feature concatenation to achieve robust classification from the images. Batch normalization is added to regularize the network, and data augmentation is used to increase the number of samples and as a secondary regularizer. A three-input CNN was designed to accept the three-channel EEG signals. In our work, EEG signals from BCI Competition IV dataset 2b and BCI Competition II dataset III were used. Experimental results show that the average classification accuracy achieved was 89.19% on dataset 2b, while our model achieved its best performance of 97.7% accuracy for subject 7 on dataset III. We also extended our approach and explored a transfer learning-based scheme with a pre-trained ResNet-50 model, which showed promising results. Overall, our approach showed competitive performance when compared with other methods.
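The core preprocessing step, turning a 1-D EEG channel into a 2-D time-frequency image via the STFT, can be sketched in pure Python. This is a minimal, illustrative implementation: the paper presumably uses an optimized library routine, and the window length and hop size here are assumptions, not the study's actual settings.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Naive short-time Fourier transform: slide a window over the
    signal and take the DFT magnitude of each frame, yielding a 2-D
    time-frequency image (rows = time frames, cols = frequency bins)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window reduces spectral leakage at the frame edges
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        bins = []
        for k in range(frame_len // 2 + 1):  # keep non-negative frequencies
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(windowed))
            bins.append(abs(acc))
        frames.append(bins)
    return frames

# Example: a sinusoid at exactly 8 cycles per 64-sample frame
# concentrates its energy in frequency bin 8 of every frame.
sig = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
image = stft_magnitude(sig)
```

The resulting 2-D magnitude arrays are what a CNN then consumes as images.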
As the global deployment of Large Language Models (LLMs) increases, the demand for multilingual capabilities becomes more crucial. While many LLMs excel in real-time applications for high-resource languages, few are tailored specifically for low-resource languages. The limited availability of text corpora for low-resource languages, coupled with their minimal utilization during LLM training, hampers the models' ability to perform effectively in real-time applications. Additionally, evaluations of LLMs are significantly less extensive for low-resource languages. This study offers a comprehensive evaluation of both open-source and closed-source multilingual LLMs on a low-resource language, Bengali, which remains notably underrepresented in computational linguistics. Despite the limited number of models pre-trained exclusively on Bengali, we assess the performance of six prominent LLMs, i.e., three closed-source (GPT-3.5, GPT-4o, Gemini) and three open-source (Aya 101, BLOOM, LLaMA), across key natural language processing (NLP) tasks, including text classification, sentiment analysis, summarization, and question answering. These tasks were evaluated using three prompting techniques: Zero-Shot, Few-Shot, and Chain-of-Thought (CoT). This study found that the default hyperparameters of these pre-trained models, such as temperature, maximum token limit, and the number of few-shot examples, did not yield optimal outcomes and led to hallucination issues in many instances. To address these challenges, ablation studies were conducted on key hyperparameters, particularly temperature and the number of shots, to optimize Few-Shot learning and enhance model performance. The focus of this research is on understanding how these LLMs adapt to low-resource downstream tasks, emphasizing their linguistic flexibility and contextual understanding.
Experimental results demonstrated that the closed-source GPT-4o model, utilizing Few-Shot learning and Chain-of-Thought prompting, achieved the highest performance across multiple tasks: an F1 score of 84.54% for text classification, 99.00% for sentiment analysis, a BERTScore F1 of 72.87% for summarization, and 58.22% for question answering. For transparency and reproducibility, all methodologies and code from this study are available on our GitHub repository: https://github.com/zabir-nabil/bangla-multilingual-llm-eval.
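The Few-Shot setup, one of the study's ablated hyperparameters, can be illustrated with a minimal prompt builder. The template and field names below are assumptions for illustration only; the study's actual prompt format may differ.

```python
def build_few_shot_prompt(task_instruction, examples, query, n_shots=3):
    """Assemble a Few-Shot prompt: the task instruction, then up to
    n_shots labelled examples, then the unlabelled query. Varying
    n_shots is exactly the 'number of shots' ablation axis."""
    lines = [task_instruction, ""]
    for text, label in examples[:n_shots]:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Text: {query}")
    lines.append("Label:")  # the model completes from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Classify the sentiment of the Bengali text as pos or neg.",
    [("darun!", "pos"), ("baje", "neg"), ("chomotkar", "pos")],
    "osadharon",
    n_shots=2,
)
```

The prompt string is then sent to the model with the temperature under study; only the scaffold is shown here.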
Classification of ECG signals is of great importance for the detection of cardiac dysfunction. The Recurrent Neural Network (RNN) family has been highly successful on time-series problems. In this paper, we compare different RNN variants and propose a dot Residual LSTM network for ECG classification. We feed the network features extracted from both the time and frequency domains to improve classification performance. A data generation scheme was developed with a Conditional Variational Autoencoder (CVAE) and an LSTM to increase the number of training samples. A comparative analysis was carried out to assess the performance of the model. The proposed dot Res LSTM achieved a maximum accuracy of 80.00% and an F1 score of 0.85. Furthermore, the model achieved a maximum F1 score of 0.87 with augmented data. The study is expected to be useful in automatic cardiac diagnosis research.
Idiopathic pulmonary fibrosis (IPF) is a restrictive interstitial lung disease in which scarring of lung tissue causes lung function decline. Although lung function decline is assessed by the forced vital capacity (FVC), accurately determining the progression of IPF remains a challenge. To address this challenge, we proposed Fibro-CoSANet, a novel end-to-end multi-modal learning-based approach to predict FVC decline. Fibro-CoSANet utilizes CT images and demographic information in a convolutional neural network framework with a stacked attention layer. Extensive experiments on the OSIC Pulmonary Fibrosis Progression Dataset demonstrated the superiority of our proposed Fibro-CoSANet, which achieved a new state-of-the-art modified Laplace Log-Likelihood score of -6.68. This network may benefit research areas concerned with designing networks to improve the prognostic accuracy of IPF. The source code for Fibro-CoSANet is available at: \url{https://github.com/zabir-nabil/Fibro-CoSANet}.
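The evaluation metric mentioned above, the modified Laplace Log-Likelihood from the OSIC challenge, clips the predicted confidence from below and the absolute FVC error from above so a single wild prediction cannot dominate the score. A minimal sketch, assuming the standard OSIC formulation (sigma floor of 70 ml, error cap of 1000 ml):

```python
import math

def laplace_log_likelihood(fvc_true, fvc_pred, sigma):
    """Modified Laplace Log-Likelihood: higher (less negative) is
    better. sigma is the model's own confidence for the prediction."""
    sigma_clipped = max(sigma, 70.0)                   # confidence floor
    delta = min(abs(fvc_true - fvc_pred), 1000.0)      # error cap
    return (-math.sqrt(2.0) * delta / sigma_clipped
            - math.log(math.sqrt(2.0) * sigma_clipped))
```

A perfect prediction at the minimum sigma scores -ln(sqrt(2) * 70), roughly -4.6; reported scores like -6.68 average this quantity over all test predictions.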
This work presents a Bangla handwritten digit classification scheme based on an ensemble of Xception networks. Bangla handwritten digits are challenging to recognize due to strong feature similarities between different classes. In this study, heavy augmentation has been used on the training set, along with dropout in the model, to avoid overfitting. Competitive performance has been achieved with an optimized number of model parameters. An ensemble of three Xception networks was evaluated on a hidden test set, where it showed promising performance with 96.69% accuracy and an F1 score of 97.14%.
Information collection from remote locations is very important for tasks such as temperature monitoring, air quality investigation, and wartime surveillance, and a wireless sensor network is the first choice for such tasks. An information prediction scheme is an important feature of any sensor node: with a suitable prediction scheme, the efficiency of the sensor network can be improved to a large extent. Previous efforts to address this problem lose accuracy as the prediction threshold is reduced to a small value. To overcome this drawback, we propose an Adams-Bashforth-Moulton algorithm and compare it with the Milne-Simpson scheme. The proposed algorithm is simulated on distributed sensor nodes using data gathered from the Intel Berkeley Research Laboratory. To maximize the power savings in the wireless sensor network, our adopted method achieves accuracies of 60.28 and 59.2238 at a prediction threshold of 0.01 for the Milne-Simpson and Adams-Bashforth-Moulton algorithms, respectively.
When one drug interacts with another, it is known as a drug-drug interaction (DDI). This can change how one or both drugs work in the body, or induce unforeseen adverse effects. Mixing different medications can effectively degrade or improve a combination's performance, and in some cases can adversely affect the patient's health, so classifying drug interactions is both important and time-critical. Drug-drug interaction extraction is among the most challenging and far-reaching applications of natural language processing. In this work, we classify drug-drug interactions using the BERT (Bidirectional Encoder Representations from Transformers) model. The previous state-of-the-art performance on the DDI Extraction 2013 corpus was achieved with different variations of convolutional neural networks and LSTMs. To validate our proposed model, the well-known benchmark dataset is used, and our BERT-based classifier achieves a much higher score than previous methods: 90.69% accuracy and an 81.97% F1 score.
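A common preprocessing step in DDI classification is marking the two candidate drug mentions in the sentence before feeding it to the encoder, so the classifier focuses on the relation rather than the specific drug names. A minimal sketch (the placeholder tokens are illustrative assumptions; the paper's actual input format is not specified here):

```python
def mask_drug_pair(sentence, drug1, drug2):
    """Replace the two candidate drug mentions with placeholder
    tokens; the masked sentence is what gets tokenized for BERT."""
    return (sentence.replace(drug1, "@DRUG-A$")
                    .replace(drug2, "@DRUG-B$"))

masked = mask_drug_pair(
    "Aspirin increases the effect of warfarin.",
    "Aspirin", "warfarin")
```

Each masked sentence is then paired with its interaction label (e.g. mechanism, effect, advice, int, or none in the DDI 2013 schema) for fine-tuning.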
A wireless sensor network (WSN) is used to collect physical information, such as temperature, humidity, and air pressure, from the environment in real time. Today, huge numbers of wireless sensors are distributed in physical environments, so a proper power management scheme is necessary for a WSN. Using prediction algorithms from the literature, we can predict future data and compare the prediction with the actual measurement: if the absolute error is within a threshold, the sensor saves power by not sending the measurement to the base station, since the base station runs the same prediction algorithm. Previous work on this problem used the Simpson 3/8 method and the Kalman filter, but these are not very efficient when the threshold value is small. To maximize power savings for smart sensors, we propose a Milne-Simpson algorithm for prediction and estimation of the transmitted signals. With this method, the prediction accuracy is higher than that of existing methods, resulting in lower power consumption in wireless networks.
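The dual-prediction scheme described above can be sketched as follows. Both sensor and base station run the same predictor; a reading is transmitted only when the prediction error exceeds the threshold. The linear extrapolator below is a deliberately simplified stand-in for the Milne-Simpson predictor step, used only to make the control flow concrete.

```python
def run_dual_prediction(readings, threshold, predictor):
    """Simulate the power-saving loop: count how many readings the
    sensor actually transmits, and return the series the base
    station reconstructs (true values where transmitted, predicted
    values where transmission was skipped)."""
    history = list(readings[:2])      # bootstrap with the first two samples
    transmitted = 2
    reconstructed = list(history)
    for actual in readings[2:]:
        predicted = predictor(history)
        if abs(actual - predicted) > threshold:
            transmitted += 1          # sensor sends the true value
            history.append(actual)
            reconstructed.append(actual)
        else:                         # both sides keep the prediction
            history.append(predicted)
            reconstructed.append(predicted)
    return transmitted, reconstructed

def linear_extrapolate(history):
    # Simplified stand-in for the Milne-Simpson predictor step:
    # extrapolate linearly from the last two values.
    return 2 * history[-1] - history[-2]

sent, rec = run_dual_prediction([0, 1, 2, 3, 4, 10], 0.5, linear_extrapolate)
```

With a smooth series, only the final outlier triggers a transmission; a smaller threshold forces more transmissions, which is exactly the regime where predictor accuracy matters.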
Classification of electroencephalography (EEG) signals for brain-computer interfaces has great impact for people with various kinds of physical disabilities. Classifying motor imagery EEG signals of hand and leg movement can help people whose limbs have been replaced by prosthetics. In this paper, a random subspace ensemble network with variable-length feature sampling is proposed to improve the prediction accuracy of motor imagery EEG signal classification. The method has been tested on eight different subjects and on a hybrid dataset combining the data of two subjects. A discrete wavelet transform-based de-noising scheme has been adopted to remove artifacts from the EEG signal, and the dual-tree complex wavelet transform has been employed for sub-band selection. Mutual information scoring has been used for univariate feature selection from the feature space. In a comparative analysis, the random subspace ensemble network outperformed the other classification models, achieving a maximum accuracy of 90.00%. Furthermore, the model showed strong performance on the hybrid dataset with an average accuracy of 86.00%. The findings of this study are expected to be useful for artificial limb movement through brain-computer interfacing for the rehabilitation of people with such physical disabilities.
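The random-subspace idea with variable-length feature sampling can be sketched as follows: each base learner sees a random subset of features of random size and votes, and the majority label wins. The nearest-centroid base learner below is a simplified stand-in for the classifiers actually used in the paper.

```python
import random

def random_subspace_predict(train_X, train_y, x, n_models=5, min_feats=2, seed=0):
    """Majority vote over base learners, each trained on a random
    feature subset of random size (variable-length sampling)."""
    rng = random.Random(seed)
    n_feats = len(train_X[0])
    votes = []
    for _ in range(n_models):
        k = rng.randint(min_feats, n_feats)       # variable-length sampling
        feats = rng.sample(range(n_feats), k)
        # Nearest-centroid base learner on the projected features
        sums, counts = {}, {}
        for xv, yv in zip(train_X, train_y):
            proj = [xv[f] for f in feats]
            if yv not in sums:
                sums[yv], counts[yv] = [0.0] * k, 0
            sums[yv] = [s + p for s, p in zip(sums[yv], proj)]
            counts[yv] += 1
        centroids = {yv: [s / counts[yv] for s in sums[yv]] for yv in sums}
        xp = [x[f] for f in feats]
        votes.append(min(centroids,
                         key=lambda yv: sum((a - b) ** 2
                                            for a, b in zip(xp, centroids[yv]))))
    return max(set(votes), key=votes.count)
```

In the paper's pipeline, the features entering this ensemble are the mutual-information-selected wavelet sub-band features.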
Representing around 80% of breast cancers, Invasive Ductal Carcinoma (IDC) is the most common type of breast cancer. In this work, we propose a self-attention GRU model to detect Invasive Ductal Carcinoma. Self-attention lets the architecture attend to different locations of the sequence generated from an image, effectively mapping regions of the image. The model was trained on breast cancer specimens to discriminate cancerous from non-cancerous samples, and the self-attention mechanism improved its discriminative representations. We achieved a best average accuracy of 86% and a mean F1 score of 86% with our proposed model (using a 1:1 train-test split). We also experimented with a baseline CNN, ResNets (ResNet-18, ResNet-34, ResNet-50), and RNN variants (LSTM, LSTM + Attention). Our simple recurrent architectures with the attention mechanism outperformed convolutional networks, the traditional choice for image classification tasks. By studying these RNN and CNN variations for breast cancer detection, we also demonstrate how the scale of the data can play a big role in model selection. This result is expected to be helpful in the early detection of breast cancer.
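The attention-over-sequence idea can be sketched in a few lines: score each hidden state (e.g. a GRU output per image region), softmax the scores into weights, and return the weighted sum. The single learned score vector here is an illustrative simplification of the model's actual attention parameterization.

```python
import math

def attention_pool(states, score_w):
    """Additive attention pooling over a sequence of hidden states:
    returns the attention-weighted sum and the weights themselves,
    which indicate where in the sequence the model 'looked'."""
    scores = [sum(w * h for w, h in zip(score_w, state)) for state in states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]       # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, states))
              for d in range(dim)]
    return pooled, weights

# With a score vector favoring the first dimension, the first state dominates.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
```

The pooled vector replaces the usual last-hidden-state summary before the final classifier, and the weights double as a region-importance map.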
In spite of the immense success of deep neural networks on classification tasks, extending their use to industrial applications remains challenging. The difficulty stems from variation in data sources and in the distribution of the data. For image classification and detection schemes, it is important to design models that are less prone to transformation shifts. In this work, we propose LadonNet, a CNN that trains on multiple spatial transformations of each input instance. The model is extended with a residual CNN trained on the original and generated augmented samples. Compared with other residual architectures, the model showed competitive performance. In the future, the model can be extended with attention to better visualize the strongest features.
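The multiple-spatial-transformations idea can be made concrete with a small view generator. The specific set of transforms below (three rotations plus a horizontal flip) is an assumption for illustration; the paper's exact transform set may differ.

```python
def spatial_transforms(image):
    """Generate rotated/flipped views of a 2-D image (list of rows)
    that a transformation-robust model could train on jointly."""
    def rot90(img):
        # Rotate 90 degrees counter-clockwise-equivalent via transpose of reversal
        return [list(row) for row in zip(*img[::-1])]
    views = [image]
    cur = image
    for _ in range(3):                 # 90, 180, 270 degree rotations
        cur = rot90(cur)
        views.append(cur)
    views.append([row[::-1] for row in image])   # horizontal flip
    return views

views = spatial_transforms([[1, 2], [3, 4]])
```

Feeding all views of each instance, rather than sampling one at random, is what distinguishes this setup from ordinary augmentation.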
Speech synthesis is one of the most challenging tasks to automate with deep learning, and since Bangla is a low-resource language, there have been very few attempts at Bangla speech synthesis. Most existing systems cannot handle anything beyond simple Bangla character scripts and very short sentences. This work addresses these problems by introducing Byakta, the first open-source deep learning-based bilingual (Bangla and English) text-to-speech synthesis system. A speech recognition model-based automated scoring metric is also proposed to evaluate the performance of a TTS model, along with a benchmark test dataset for evaluating the speech quality of Bangla speech synthesis models. The TTS is available at https://github.com/zabir-nabil/bangla-tts
Effective Question Answering (QA) on large biomedical document collections requires accurate document retrieval, which remains a challenging task due to the domain-specific vocabulary and semantic ambiguity in user queries. We propose BMQExpander, a novel ontology-aware query expansion pipeline that combines medical knowledge (definitions and relationships) from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We implemented several state-of-the-art baselines, including sparse and dense retrievers, query expansion methods, and biomedical-specific solutions. We show that BMQExpander has superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks (NFCorpus, TREC-COVID, and SciFact), with improvements of up to 22.1% in NDCG@10 over sparse baselines and up to 6.5% over the strongest baseline. Further, BMQExpander generalizes robustly under query perturbation settings, in contrast to supervised baselines, achieving up to 15.7% improvement over the strongest baseline. As a side contribution, we publish our paraphrased benchmarks. Finally, our qualitative analysis shows that BMQExpander has fewer hallucinations compared to other LLM-based query expansion baselines.
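The expansion step can be sketched as follows. This is a hypothetical simplification: it looks up query terms in an ontology mapping (standing in for UMLS synonyms and definition keywords) and appends unseen expansion terms, while the real BMQExpander pipeline additionally uses an LLM to generate and filter candidates, a step omitted here.

```python
def expand_query(query, ontology, max_terms=5):
    """Append up to max_terms unseen expansion terms drawn from an
    ontology mapping (term -> related terms) to the original query."""
    seen = set(query.lower().split())
    extras = []
    for term in query.lower().split():
        for candidate in ontology.get(term, []):
            c = candidate.lower()
            if c not in seen and len(extras) < max_terms:
                seen.add(c)
                extras.append(c)
    # Keep the original query intact; expansion only adds terms
    return query + " " + " ".join(extras) if extras else query

ont = {"mi": ["myocardial", "infarction"]}
expanded = expand_query("mi treatment", ont)
```

Keeping the original query verbatim and only appending terms is one reason expansion pipelines tend to degrade gracefully under query perturbation: the unexpanded signal is never lost.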