Language Proficiency as a Matter of Law: Judicial Reasoning on Miranda Waivers by Speakers with Limited English Proficiency (LEP)

Abstract

Judges wield enormous power in modern society, and it is not surprising that scholars have long been interested in how judges think. The purpose of this article is to examine how US judges reason about language issues. To understand how courts decide on comprehension of constitutional rights by speakers with Limited English Proficiency (LEP), I analyzed 460 judicial opinions on appeals from LEP speakers issued between 2000 and 2020. Two findings merit particular attention. First, the analysis revealed that in 36% of the interrogations, LEP speakers were advised of their rights only in English. This means that two decades after Executive Order 13166 (2000), Improving Access to Services for Persons with Limited English Proficiency, law enforcement still lacks adequate resources to advise LEP speakers of their constitutional rights in their primary languages. Second, the analysis revealed that some courts treat second language proficiency as an all-or-none phenomenon. This approach results in linguistic discrimination against LEP speakers who cannot comprehend legal language but are denied the services of an interpreter because they can answer basic questions in English. I end the discussion with recommendations for best practices in the delivery of constitutional rights.

Supervised feature selection using principal component analysis

Abstract

Principal component analysis (PCA) is widely used in branches of computational science such as computer science, pattern recognition, and machine learning, as it can effectively reduce the dimensionality of high-dimensional data. In particular, it is a popular transformation method for feature extraction. In this study, we explore PCA's potential for feature selection in regression applications. We introduce a new PCA-based approach, called Targeted PCA, to analyze a multivariate dataset that includes the dependent variable: it identifies the principal component with a high representation of the dependent variable and then examines that component to capture and rank the contributions of the independent variables. The study also compares the selected features with those resulting from a Least Absolute Shrinkage and Selection Operator (LASSO) regression. Finally, the selected features were tested in two regression models: multiple linear regression (MLR) and an artificial neural network (ANN). Results are presented for three datasets: socioeconomic, environmental, and computer image processing. We found that for two of the three datasets, the features selected by PCA and by LASSO regression overlapped by more than 50%. In the regression predictions, the PCA-selected features differed little from the LASSO-selected features in terms of MLR prediction accuracy; with the ANN, however, they demonstrated faster convergence and a greater reduction of error.
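To make the procedure concrete, here is a minimal Python sketch of the Targeted PCA idea, assuming scikit-learn; the function and variable names (targeted_pca_ranking, X, y, feature_names) are illustrative, and details such as scaling may differ from the authors' implementation.

```python
# A minimal sketch of the Targeted PCA idea described above, using
# scikit-learn. Names are illustrative; the authors' exact procedure
# may differ in details such as scaling or component selection.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def targeted_pca_ranking(X, y, feature_names):
    """Rank features by their loading on the principal component
    that best represents the dependent variable y."""
    # Append y as an extra column so PCA sees the joint distribution.
    Z = np.column_stack([X, y])
    Z = StandardScaler().fit_transform(Z)

    pca = PCA().fit(Z)
    loadings = pca.components_          # shape: (n_components, n_variables)

    # Component whose loading on y (the last column) is largest in magnitude.
    target_pc = np.argmax(np.abs(loadings[:, -1]))

    # Rank the independent variables by their |loading| on that component.
    contrib = np.abs(loadings[target_pc, :-1])
    order = np.argsort(contrib)[::-1]
    return [(feature_names[i], contrib[i]) for i in order]
```

The ranked list can then be truncated at a desired number of features and compared against a LASSO selection, as the study does.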

Recent developments in geographic information systems across different application domains: a review

Abstract

Recent advancements in geospatial technologies, together with the availability of geographic information system (GIS) software and tools with enhanced capabilities, have shifted the use of this technology from an early adoption phase to a more mature phase. GIS technology has found new avenues for solving complex socio-economic and engineering problems, from disaster management and mitigation planning to environmental modelling for the sustainable development of cities. To reveal recent developments in the use of this technology across scientific disciplines, a new methodology is developed that uses semantics to formalize the engineering knowledge in GIS applications. This formalized knowledge is then used to reveal emerging macrotrends, challenges, and directions for future research along two semantic aspects: the "application" aspect and the "domain" aspect. The review of the "application" aspect shows a significant change in the applications of GIS: a shift towards more complex analyses for engineering and towards building systems for management and decision-making. Further, a systematic keyword analysis of the "domain" aspect is used to develop sub-domain ontologies, and a review of the literature on each of these sub-domains details the core developments in GIS and some emerging domains of application. Semantic inferences are drawn from domain-specific requirements, problems and challenges, and outcomes to improve the domain knowledge. This GIS domain knowledge can in turn be used to develop computational methods using ontologies, formalize logic, and engineer future GIS systems. Interestingly, the review finds a growing trend of using GIS as a Decision Support System (DSS) to model domain-specific contexts, semantically integrate data, and develop expert intelligence.

FishRNFuseNET: development of heuristic-derived recurrent neural network with feature fusion strategy for fish species classification

Abstract

The classification of fish species has become an essential task for marine ecologists and biologists, both for estimating the large numbers of fish variants in their natural environment and for monitoring population changes. Conventional classification methods are expensive, time-consuming, and laborious. Scattering and absorption of light in the deep-sea environment yield very low-resolution images, making the recognition and classification of fish variants highly challenging, and the performance of existing computer vision methods degrades underwater because of indistinct features and the background clutter of marine species. These classification issues can be addressed with deep structured models, which are highly recommended for improving performance in fish species classification. However, only a limited number of fish datasets are available, which complicates the task, since such models need enormous amounts of data for training. It is therefore essential to develop an automated, optimized system to detect, categorize, and track fish species while minimizing manual interference. This paper proposes a new fish species classification model based on an optimized recurrent neural network (RNN) and feature fusion. Initially, underwater images are acquired from a standard database. The gathered images are then pre-processed to clean and enhance their quality using contrast limited adaptive histogram equalization (CLAHE) and histogram equalization. Deep features are extracted using DenseNet, MobileNet, ResNet, and VGG16, and the gathered features are passed to an optimal feature selection phase performed with a new heuristic algorithm, the "modified mating probability-based water strider algorithm (MMP-WSA)", which attains the optimal features. The optimally selected features are then fed to the feature fusion process, where fusion is carried out using an adaptive fusion concept whose weights are tuned by the designed MMP-WSA. Finally, the fused features are sent to the classification phase, where classification is performed by the developed FishRNFuseNET, in which the RNN parameters are tuned by the MMP-WSA to obtain accurate classification outcomes. The proposed method is an effective substitute for time-consuming and strenuous manual identification by professionals and can benefit the monitoring of fish biodiversity in their habitats.
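As a rough illustration of the first two stages of this pipeline (CLAHE enhancement and multi-backbone deep feature extraction), the following Python sketch uses OpenCV and torchvision; the MMP-WSA selection, adaptive fusion, and tuned RNN classifier are not reproduced, and all names and sizes are illustrative.

```python
# A minimal sketch of the pre-processing and feature-extraction stages
# described above: CLAHE on the luminance channel, then deep features
# from pretrained CNN backbones used as fixed extractors. Only two of
# the four backbones are shown; everything downstream (MMP-WSA, fusion,
# RNN) is omitted.
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

def clahe_enhance(bgr_image):
    """Apply CLAHE to the luminance channel of an underwater image."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge([clahe.apply(l), a, b])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Pretrained backbones used as fixed feature extractors.
backbones = {
    "densenet": models.densenet121(weights="DEFAULT"),
    "vgg16": models.vgg16(weights="DEFAULT"),
}

def extract_features(bgr_image):
    """Return concatenated deep features for one enhanced image."""
    rgb = cv2.cvtColor(clahe_enhance(bgr_image), cv2.COLOR_BGR2RGB)
    x = preprocess(rgb).unsqueeze(0)            # add batch dimension
    feats = []
    with torch.no_grad():
        for net in backbones.values():
            net.eval()
            feats.append(net.features(x).flatten(1))  # conv features
    return torch.cat(feats, dim=1)              # multi-backbone feature vector
```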

Alice and the Caterpillar: A more descriptive null model for assessing data mining results

Abstract

We introduce novel null models for assessing, via statistical hypothesis testing, the results obtained from observed binary transactional and sequence datasets. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number of caterpillars, i.e., paths of length three, is preserved, in addition to the properties considered by other models. We describe Alice, a suite of Markov chain Monte Carlo algorithms for sampling datasets from our null models, based on a carefully defined set of states and efficient operations to move between them. Our experimental evaluation shows that Alice mixes fast and scales well, and that our null model finds different significant results than those previously considered in the literature.
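For contrast with the null models preserved by Alice, the following Python sketch implements the classic checkerboard-swap Markov chain, which preserves only the row and column sums of a binary transactional matrix; unlike Alice's chains, it does not preserve the Bipartite Joint Degree Matrix or the number of caterpillars.

```python
# The classic swap-based Markov chain over binary matrices, shown here
# as the weaker baseline null model. This is NOT Alice's algorithm: it
# preserves only row/column sums (i.e., item and transaction degrees).
import numpy as np

def swap_chain(matrix, steps, rng=None):
    """Sample a dataset with the same row/column sums via checkerboard
    swaps: pick 1-cells (r1,c1) and (r2,c2) with (r1,c2)=(r2,c1)=0,
    then flip all four cells."""
    rng = rng or np.random.default_rng()
    M = matrix.copy()
    rows, cols = np.nonzero(M)          # list of 1-entries ("edges")
    for _ in range(steps):
        i, j = rng.integers(len(rows), size=2)
        r1, c1, r2, c2 = rows[i], cols[i], rows[j], cols[j]
        if r1 != r2 and c1 != c2 and M[r1, c2] == 0 and M[r2, c1] == 0:
            M[r1, c1] = M[r2, c2] = 0
            M[r1, c2] = M[r2, c1] = 1
            rows[i], cols[i] = r1, c2   # keep the edge list in sync
            rows[j], cols[j] = r2, c1
    return M
```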

Evidence-based adaptive oversampling algorithm for imbalanced classification

Abstract

Classification tasks are complicated by several factors, including skewed class proportions and unclear decision regions caused by noise, class overlap, and small disjuncts arising from large within-class variation. These issues make data classification difficult, reduce overall performance, and make it challenging to draw meaningful insights. In this research, an evidence-based adaptive oversampling algorithm (EVA-oversampling) based on the Dempster–Shafer theory of evidence is developed for imbalanced classification. The technique assigns each instance a probability of class membership to represent the uncertainty that each data point may carry. Synthetic data points are generated to make up for the under-representation of minority instances in regions of high confidence, thereby strengthening the minority class region. The experiments revealed that the proposed method works effectively even in situations where imbalanced counts and data complexity would normally pose significant obstacles. The approach outperforms the SMOTE, Borderline-SMOTE, ADASYN, MWMOTE, KMeansSMOTE, LoRAS, and SyMProD algorithms in terms of \(F_1\)-measure and G-mean on highly imbalanced data while maintaining overall performance.
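A simplified sketch of the confidence-guided idea follows, in Python with scikit-learn. Here the per-instance "evidence" is approximated by the fraction of minority neighbours among the k nearest neighbours, standing in for the Dempster–Shafer mass assignment of the actual algorithm; all names and thresholds are illustrative.

```python
# A simplified stand-in for EVA-oversampling: estimate a confidence
# score per minority instance from its neighbourhood, then synthesize
# new points only between high-confidence ("core") minority instances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def confidence_oversample(X, y, minority_label, n_new, k=5,
                          threshold=0.6, rng=None):
    """X: feature matrix; y: numpy array of labels."""
    rng = rng or np.random.default_rng()
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    X_min = X[y == minority_label]

    # Confidence: fraction of minority points among each minority
    # instance's k nearest neighbours (column 0 is the point itself).
    _, idx = nn.kneighbors(X_min)
    conf = (y[idx[:, 1:]] == minority_label).mean(axis=1)

    # Generate synthetic points only in the high-confidence region.
    core = X_min[conf >= threshold]
    synth = []
    for _ in range(n_new):
        a, b = core[rng.integers(len(core), size=2)]
        synth.append(a + rng.random() * (b - a))  # interpolate between cores
    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.full(n_new, minority_label)])
    return X_new, y_new
```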

SimGCL: graph contrastive learning by finding homophily in heterophily

Abstract

Graph contrastive learning (GCL) has been widely studied in unsupervised graph representation learning. Most existing GCL methods focus on modeling the invariances of identical instances across different augmented views of a graph and use a graph neural network (GNN) as the underlying encoder to generate node representations. GNNs generally learn node representations by aggregating information from their neighbors, so homophily and heterophily in the graph can strongly affect GNN performance. Existing GCL methods neglect the effect of homophily/heterophily in graphs, resulting in sub-optimal learned representations of graphs with more complex patterns, especially in the case of high heterophily. We propose a novel Similarity-based Graph Contrastive Learning model (SimGCL), which generates augmented views with a higher homophily ratio at the topology level by adding or removing edges. We treat dimension-wise features as weak labels and introduce a new similarity metric, based on features and their dimension-wise distribution patterns, as a guide for improving homophily in an unsupervised manner. To preserve node diversity in the augmented views, we retain feature dimensions with higher heterophily to amplify the differences between nodes at the feature level. We also use the proposed similarity in the negative sampling process to eliminate possible false negative samples. We conduct extensive experiments comparing our model with ten baseline methods on seven benchmark datasets. Experimental results show that SimGCL significantly outperforms state-of-the-art GCL methods on both homophilic and heterophilic graphs, with more than 10% improvement on heterophilic graphs.
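The following Python sketch illustrates similarity-guided topology augmentation in the spirit of SimGCL, using plain cosine similarity of node features as a stand-in for the paper's dimension-wise distribution-based metric; quantile thresholds and names are illustrative.

```python
# A minimal sketch of similarity-guided topology augmentation: drop
# edges between dissimilar endpoints and add edges between highly
# similar non-adjacent pairs, raising the homophily ratio of the view.
import numpy as np

def augment_by_similarity(adj, features, drop_q=0.1, add_q=0.999):
    """adj: dense symmetric 0/1 adjacency matrix; features: node features."""
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T                      # pairwise cosine similarity

    aug = adj.copy()
    edge_sims = sim[adj > 0]
    aug[(adj > 0) & (sim < np.quantile(edge_sims, drop_q))] = 0  # drop edges

    non_edges = (adj == 0) & ~np.eye(len(adj), dtype=bool)
    cutoff = np.quantile(sim[non_edges], add_q)
    aug[non_edges & (sim >= cutoff)] = 1     # add high-similarity edges
    return np.maximum(aug, aug.T)            # keep the view undirected
```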

Predicting document novelty: an unsupervised learning approach

Abstract

In the age of information deluge, it is pivotal to have access to information or knowledge that is not just relevant but also novel. Knowledge workers often have to skim through tens or even hundreds of abstracts or articles to identify the novel documents that can enhance their knowledge. While there are personalized recommenders that can provide relevant documents to these knowledge workers, there are few systems that can identify 'novel' documents among the relevant ones. Critical roadblocks to the discovery of novel documents are the need for big, labelled datasets and for complex, expensive model-training infrastructure. This work attempts to overcome these roadblocks by proposing an unsupervised classifier based on word associations that predicts the novelty of a document from its content. Evaluation on the benchmark dataset TAP-DLND 1.0 revealed that the performance of this classifier is comparable to that of many state-of-the-art supervised learning techniques, including some deep learning models. These results could change the way NLP researchers approach novelty detection in documents by removing the reliance on large, labelled training datasets, which are scarce.
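As a simple unsupervised baseline in the same spirit (not the paper's word-association method), a document can be scored as novel when its content is far from everything seen so far; the following Python sketch uses TF-IDF cosine similarity, with an illustrative threshold.

```python
# A minimal unsupervised novelty scorer: a document is novel if it is
# dissimilar to every previously seen document. TF-IDF similarity is a
# simpler stand-in for the paper's word-association statistics.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def novelty_score(seen_docs, new_doc):
    """Return 1 - max cosine similarity to any previously seen document."""
    vec = TfidfVectorizer(stop_words="english")
    seen = vec.fit_transform(seen_docs)
    new = vec.transform([new_doc])
    return 1.0 - cosine_similarity(new, seen).max()

def is_novel(seen_docs, new_doc, threshold=0.7):
    return novelty_score(seen_docs, new_doc) >= threshold
```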

Dynamic time-aware collaborative sequential recommendation with attention-based network

Abstract

A natural way of modeling users in sequential recommendation is to capture long-term and short-term preferences separately from user historical behaviors and then fuse them. Most existing approaches built on attention-based networks focus only on exploring item–item relations within each user sequence and ignore collaborative relations among different user sequences, which limits recommendation quality, especially on sparse datasets. Moreover, the construction and utilization of collaborative signals, including their integration with the original information, greatly affect the recommendation results. In this paper, we propose a novel method named dynamic time-aware collaborative sequential recommendation with attention-based network (DTCoSR) to address these issues. Specifically, we first design a time-aware collaborative item module, consisting of neighborhood selection and neighborhood information aggregation, to obtain collaborative item representations for both long- and short-term interests. We then use two independent self-attention networks to extract two levels of short-term interest, one from the item representations and one from the collaborative item representations, and adaptively merge them into the final short-term interest. Long-term interest is obtained via the correlation between the user embedding and its collaborative item embedding. Finally, DTCoSR fuses long- and short-term interests adaptively. Extensive experiments on three real-world datasets show that DTCoSR outperforms state-of-the-art methods.
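To make two of these building blocks concrete, here is a minimal PyTorch sketch of self-attention over recent item embeddings for short-term interest, together with a learned gate that adaptively fuses long- and short-term interests; the collaborative neighbourhood modules of DTCoSR are not reproduced, and all sizes and names are illustrative.

```python
# A minimal sketch of two components described above: self-attention
# over an item sequence for short-term interest, and a gated adaptive
# fusion of long- and short-term interest vectors.
import torch
import torch.nn as nn

class ShortTermInterest(nn.Module):
    def __init__(self, dim, heads=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, item_seq):                 # (batch, seq_len, dim)
        out, _ = self.attn(item_seq, item_seq, item_seq)
        return out.mean(dim=1)                   # pooled short-term interest

class AdaptiveFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, long_term, short_term):    # each (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([long_term, short_term], -1)))
        return g * long_term + (1 - g) * short_term
```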