Details coming soon.
Published in the NeurIPS FMDM Workshop.
Published in PNAS.
Studied inference optimizations in the attention module of transformers, such as KV caching and grouped-query attention, including their impact on inference speed and energy usage.
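A minimal NumPy sketch of the KV-caching idea, assuming single-head attention and using the token's hidden state as a stand-in for the projected queries, keys, and values:

```python
import numpy as np

def attention(q, K, V):
    """Single-query attention: q is (d,), K and V are (t, d)."""
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 8
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

# Autoregressive decoding: each step appends one new key/value row to the
# cache instead of recomputing K and V for the entire prefix.
for step in range(4):
    x = rng.normal(size=d)        # hidden state for the newly generated token
    k_new, v_new, q = x, x, x     # stand-ins for W_k x, W_v x, W_q x
    K_cache = np.vstack([K_cache, k_new])
    V_cache = np.vstack([V_cache, v_new])
    out = attention(q, K_cache, V_cache)

print(K_cache.shape)  # cache grows by one row per token → (4, 8)
```

Grouped-query attention reduces this cache further by letting several query heads share one key/value head.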
Theoretical and biological arguments suggest that sparsity may be important for deep neural networks to perform well. Here, we examine the attention blocks of large transformer models to identify sparse features in their weights and/or activations. One interesting finding is that the weight matrices used in attention have very low stable rank, especially the matrix product \(W_qW_k^T\).
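The stable rank here is the standard definition, \(\|A\|_F^2 / \|A\|_2^2\), i.e. the sum of squared singular values over the largest squared singular value. A short sketch with random stand-in weights (the dimensions are illustrative, not taken from any particular model):

```python
import numpy as np

def stable_rank(A):
    """Stable rank: squared Frobenius norm over squared spectral norm,
    i.e. sum(sigma_i^2) / sigma_max^2. Always <= rank(A)."""
    s = np.linalg.svd(A, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

rng = np.random.default_rng(0)
d_model, d_head = 64, 16
Wq = rng.normal(size=(d_model, d_head))  # stand-in query projection
Wk = rng.normal(size=(d_model, d_head))  # stand-in key projection

# The product W_q W_k^T has rank at most d_head; its stable rank can be
# much smaller still when the spectrum is dominated by a few directions.
print(stable_rank(Wq), stable_rank(Wq @ Wk.T))
```

For an identity matrix the stable rank equals the true rank; it drops below the true rank as the singular-value spectrum becomes more concentrated.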
Historical data used for training machine learning models can be biased, and models are susceptible to inheriting this bias. In this work, we use the masked-token prediction capabilities of BERT models to show that they contain gender and racial bias. We create a dataset and use a novel loss function to reduce bias via finetuning. Preliminary analysis shows that this finetuning succeeds at reducing bias, though further evaluation is needed.
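A toy illustration of the masked-token probing idea, with hypothetical logits in place of real BERT output (the sentence, vocabulary, and numbers are all made up; a real probe would read these from the model's [MASK] position):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical: a masked LM scores fillers for the [MASK] position in
# "The doctor said [MASK] would be late." over a tiny vocabulary.
vocab = ["he", "she", "they", "it"]
logits = np.array([3.1, 1.4, 2.0, 0.2])  # toy numbers, not real model output

probs = softmax(logits)
# One simple bias score: the log-probability gap between gendered fillers.
bias = np.log(probs[vocab.index("he")]) - np.log(probs[vocab.index("she")])
print(round(bias, 2))  # → 1.7; positive means the model favors "he" here
```

Averaging such gaps over many templated sentences gives a dataset-level bias measure that finetuning can then try to drive toward zero.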
Humans have more general and robust object recognition capabilities than machines. In this work, we investigated whether constraining convolutional neural networks to be more human-like, via the use of Gabor filters, improves their performance and robustness against adversarial attacks.
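A minimal sketch of generating a Gabor filter bank, the kind of oriented edge detectors that could constrain a CNN's first layer (the kernel size and parameter values are illustrative choices, not the ones used in the project):

```python
import numpy as np

def gabor_kernel(size=7, sigma=2.0, theta=0.0, lam=4.0, psi=0.0):
    """Real-valued Gabor filter: a sinusoid at orientation theta,
    windowed by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    envelope = np.exp(-(xr**2 + (-x * np.sin(theta) + y * np.cos(theta))**2)
                      / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

# A bank of four orientations; a Gabor-constrained CNN would fix (or
# initialize) its first-layer kernels to filters like these.
bank = np.stack([gabor_kernel(theta=t)
                 for t in np.linspace(0, np.pi, 4, endpoint=False)])
print(bank.shape)  # (4, 7, 7)
```

In PyTorch, such a bank could be loaded into a `Conv2d` layer's weights, optionally with gradients disabled so the filters stay fixed during training.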
I competed in the 2023 MIT Pokerbots competition and placed in the top 10%, resulting in a cash prize. The variant played in this competition was River of Blood Hold'em.
Implemented basic PyTorch functionality from scratch using only NumPy arrays. Networks built with it converge and perform well on non-trivial problems.
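A sketch of the core idea, assuming a hypothetical `Linear` layer with a hand-written backward pass (the actual project's API may differ):

```python
import numpy as np

class Linear:
    """A minimal dense layer with a manually derived backward pass."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                       # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out, lr=0.1):
        grad_in = grad_out @ self.W.T    # gradient w.r.t. the layer input
        self.W -= lr * self.x.T @ grad_out
        self.b -= lr * grad_out.sum(axis=0)
        return grad_in

# Fit y = 2x with plain SGD on mean-squared error.
rng = np.random.default_rng(0)
layer = Linear(1, 1, rng)
x = rng.uniform(-1, 1, size=(64, 1))
for _ in range(200):
    pred = layer.forward(x)
    grad = 2 * (pred - 2 * x) / len(x)   # d(MSE)/d(pred)
    layer.backward(grad)
print(round(layer.W[0, 0], 2))  # → 2.0
```

Stacking such layers, with each `backward` passing `grad_in` to the layer before it, reproduces backpropagation without any autograd machinery.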