Currently under review.
Published in the NeurIPS FMDM Workshop.
Published in PNAS.
We extend recent work (Scaling Laws for Precision) showing that quantized models degrade in performance more significantly past a certain number of training steps. We confirm their findings on larger models (OLMo-1B & 7B) and on downstream tasks. Further, we find no abnormally large or growing activation statistics across training steps, suggesting that large activations are not the cause of this degradation. Lastly, we trace the quantization error for each weight matrix of the model and find that, across training steps, the attention projection (\(W_{QKV}\)) weights consistently have growing quantization error, while other modules have roughly constant quantization error across steps. This suggests that quantization-induced degradation may be due to growing quantization error in the queries, keys, and values across training steps.
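A minimal sketch of the kind of per-module measurement this involves, assuming simple symmetric round-to-nearest quantization (the checkpoint paths and bit width are illustrative, not the exact setup used):

```python
import torch

def quantize_rtn(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric round-to-nearest quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = (w.abs().max() / qmax).clamp(min=1e-8)
    return (w / scale).round().clamp(-qmax, qmax) * scale

def per_module_quant_error(state_dict, bits: int = 4):
    """Relative Frobenius-norm error introduced by quantizing each 2-D weight matrix."""
    errors = {}
    for name, w in state_dict.items():
        if w.ndim != 2:
            continue
        w = w.float()
        errors[name] = ((w - quantize_rtn(w, bits)).norm() / w.norm()).item()
    return errors

# Comparing checkpoints across training steps (paths are illustrative):
# for step in (100_000, 200_000, 400_000):
#     print(step, per_module_quant_error(torch.load(f"olmo_step{step}.pt")))
```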
Studied mixed-precision and activation-aware quantization in order to extend lossless quantization of weights and activations past 8 bits. Our investigation finds limitations when extending quantization past 6 bits, suggesting that lossless 4-bit quantization, which could be leveraged for practical benefits, will be challenging to achieve.
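To illustrate the activation-aware idea, here is an AWQ-style sketch (not the exact method studied): weight columns tied to high-magnitude activations are scaled up before round-to-nearest quantization, so salient channels lose less precision, and the scale is folded back out afterwards.

```python
import torch

def activation_aware_quantize(w: torch.Tensor, act_scale: torch.Tensor, bits: int = 4):
    """Sketch: protect salient input channels by scaling before quantization."""
    qmax = 2 ** (bits - 1) - 1
    s = act_scale.clamp(min=1e-5)            # per-input-channel activation magnitude
    w_scaled = w * s                          # emphasize salient channels
    scale = (w_scaled.abs().max() / qmax).clamp(min=1e-8)
    w_q = (w_scaled / scale).round().clamp(-qmax, qmax) * scale
    return w_q / s                            # undo the scaling
```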
Studied inference optimizations, like KV-Caching and Grouped Query Attention, in the attention module of transformers, including their impact on inference speed and energy usage.
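A minimal sketch of what a KV cache does during autoregressive decoding (single head, no batching; shapes and the cache layout are illustrative):

```python
import torch
import torch.nn.functional as F

def decode_step(q, new_k, new_v, cache):
    """One decoding step: append this token's key/value, attend over the cache.
    q, new_k, new_v: (1, d); cache holds growing (t, d) tensors."""
    cache["k"] = torch.cat([cache["k"], new_k], dim=0)
    cache["v"] = torch.cat([cache["v"], new_v], dim=0)
    attn = F.softmax(q @ cache["k"].T / cache["k"].shape[-1] ** 0.5, dim=-1)
    return attn @ cache["v"]   # past keys/values are reused, not recomputed

# d = 64
# cache = {"k": torch.empty(0, d), "v": torch.empty(0, d)}
```

Grouped Query Attention shrinks this cache further by sharing each key/value head across a group of query heads.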
Theoretical and biological reasons suggest that sparsity may be important for deep neural networks to perform well. Here, we examine the attention blocks of large transformer models to identify sparse features in their weights and/or activations. One interesting finding is that the weight matrices used in attention have very low stable rank, especially the matrix product \(W_qW_k^T\).
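Stable rank here is the usual \(\|W\|_F^2 / \|W\|_2^2\), i.e. the sum of squared singular values over the largest squared singular value; a small sketch of how it can be computed:

```python
import torch

def stable_rank(w: torch.Tensor) -> float:
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    sigma = torch.linalg.svdvals(w.float())   # singular values, descending
    return ((sigma ** 2).sum() / sigma[0] ** 2).item()

# e.g. stable_rank(W_q @ W_k.T) for one attention head's query/key projections
```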
Historical data used to train machine learning models can be biased, and models are susceptible to inheriting this bias. In this work, we use the masked token prediction capabilities of BERT models to show that they contain gender and racial bias. We create a dataset and a novel loss function to reduce bias via finetuning. Preliminary analysis shows that this finetuning is successful at reducing bias, but it needs to be examined further.
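A small sketch of the masked-token probing idea, using the Hugging Face fill-mask pipeline (the templates and professions are illustrative, not the dataset used in this work):

```python
from transformers import pipeline

# Compare the probability BERT assigns to gendered pronouns in a profession template.
fill = pipeline("fill-mask", model="bert-base-uncased")
for profession in ["doctor", "nurse", "engineer"]:
    preds = fill(f"[MASK] worked as a {profession}.", targets=["he", "she"])
    print(profession, {p["token_str"]: round(p["score"], 4) for p in preds})
```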
Humans have more general and robust object recognition capabilities in comparison to machines. In this work, we investigated whether constraining convolutional neural networks to be more human-like, via the use of Gabor filters, improves their performance and their robustness against adversarial attacks.
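One way to impose such a constraint, shown as a sketch (filter size, orientations, and layer shape are illustrative), is to initialize or freeze the first convolutional layer with Gabor filters at varied orientations:

```python
import numpy as np
import torch
import torch.nn as nn

def gabor_kernel(size=7, theta=0.0, sigma=2.0, lam=4.0, psi=0.0):
    """Real-valued Gabor filter: a sinusoid modulated by a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(xr**2 + y**2 * 0 + (x * -np.sin(theta) + y * np.cos(theta))**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xr / lam + psi)

# First conv layer whose 8 filters are Gabor kernels at 8 orientations.
conv = nn.Conv2d(1, 8, kernel_size=7, padding=3, bias=False)
with torch.no_grad():
    for i in range(8):
        conv.weight[i, 0] = torch.tensor(gabor_kernel(theta=i * np.pi / 8), dtype=torch.float32)
```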
I competed in the 2023 MIT Pokerbots competition and placed in the top 10%, resulting in a cash prize. The variant played in this competition was River of Blood Hold'em.
Implemented basic PyTorch functionality from scratch using only NumPy arrays. Networks built with it converge and perform well on non-trivial problems.
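A minimal sketch of the style of component involved (a fully connected layer with a hand-written backward pass; names and the inline SGD update are illustrative):

```python
import numpy as np

class Linear:
    """Fully connected layer with manual forward/backward passes."""
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * np.sqrt(2.0 / in_dim)
        self.b = np.zeros(out_dim)

    def forward(self, x):
        self.x = x                       # cache the input for the backward pass
        return x @ self.w + self.b

    def backward(self, grad_out, lr=1e-2):
        grad_in = grad_out @ self.w.T    # gradient w.r.t. the layer input
        self.w -= lr * self.x.T @ grad_out
        self.b -= lr * grad_out.sum(axis=0)
        return grad_in
```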
Created a functioning 8-bit computer by hand-wiring logic chips on breadboards. Wrote microinstructions directly in binary and built hardware primitives out of those microinstructions. Used the 16 bytes of memory to write a multiplication program from these primitives.
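For reference, the high-level logic of that multiplication program, sketched in Python (the real version is, of course, binary microinstructions operating on an 8-bit accumulator):

```python
def multiply(a: int, b: int) -> int:
    """8-bit multiplication by repeated addition (conceptual sketch)."""
    result = 0
    while b > 0:
        result = (result + a) & 0xFF   # 8-bit accumulator wraps around
        b -= 1
    return result
```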