Hello, I'm Reece! I want to create safe AI that aims to solve our most important and pressing problems. Feel free to contact me at rshuttle@mit.edu.

Currently, I am an MEng student at MIT, working to understand how and why transformers and other deep neural networks work, in the hope of building better models. I do this work primarily in Jacob Andreas's lab. I also did my undergrad at MIT in 2.5 years, studying computer science and cognitive science.

This past summer, I interned at Numenta in the Bay Area, working on parameter-efficient fine-tuning methods for sparse transformer models to reduce the hardware requirements of fine-tuning. Previously, I was in the Learning and Intelligent Systems group at CSAIL, working on solving planning problems with AI. Separately, I have also used large language models to solve math and science problems.

I also run for MIT.

Theoretical and biological arguments suggest that sparsity may be
important for deep neural networks to perform well. Here, we examine
the attention blocks of large transformer models to identify sparse
features in their weights and/or activations. One interesting finding
is that the weight matrices used in attention have
*very low stable rank*, especially the matrix product
\(W_qW_k^T\).
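To illustrate the quantity involved: the stable rank of a matrix is its squared Frobenius norm divided by its squared spectral norm, a smooth lower bound on the rank. The sketch below (a minimal NumPy illustration with randomly initialized matrices, not the actual model weights studied) computes it for a product \(W_qW_k^T\):

```python
import numpy as np

def stable_rank(W):
    """Stable rank: squared Frobenius norm over squared spectral norm."""
    sv = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return float(np.sum(sv**2) / sv[0]**2)

rng = np.random.default_rng(0)
d_model, d_head = 512, 64  # illustrative dimensions, not from a specific model
W_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
W_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)

# The product W_q W_k^T has rank at most d_head; in trained transformers
# its stable rank is often far lower still.
print(stable_rank(W_q @ W_k.T))
```

For random matrices the stable rank is close to the head dimension; the finding above is that trained attention weights fall well below this baseline.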

Historical data used to train machine learning models can be biased, and models are susceptible to inheriting this bias. In this work, we use the masked-token prediction capabilities of BERT models to show that they contain gender and racial bias. We create a dataset and use a novel loss function to reduce bias via fine-tuning. Preliminary analysis shows that this fine-tuning successfully reduces bias, but it needs to be examined further.
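The probing idea can be sketched with a simple log-ratio score: for a template like "[MASK] is a nurse.", compare the probabilities a masked language model assigns to paired attribute words. The numbers below are made-up illustrations, not real BERT outputs, and the scoring function is a generic example rather than the loss used in the project:

```python
import math

def bias_score(p_group_a, p_group_b):
    """Log-ratio of masked-token probabilities for paired attribute words.
    A score of 0 means the model shows no preference between the groups."""
    return math.log(p_group_a / p_group_b)

# Hypothetical fill-mask probabilities for "[MASK] is a nurse.":
# P("she") = 0.31, P("he") = 0.09  (illustrative values only)
print(bias_score(0.31, 0.09))  # positive: model favors "she" for this template
```

Averaging such scores over many templates and occupation words gives an aggregate bias measure that fine-tuning can then try to drive toward zero.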

Humans have more general and robust object recognition capabilities than machines. In this work, we investigated whether constraining convolutional neural networks to be more human-like, via the use of Gabor filters, improves their performance and their robustness to adversarial attacks.
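For context, a Gabor filter is a Gaussian-windowed sinusoid, similar to receptive fields in early visual cortex; a bank of them at different orientations can constrain or initialize a network's first convolutional layer. A minimal NumPy sketch (parameter values are illustrative, not those used in the project):

```python
import numpy as np

def gabor_kernel(size=7, sigma=2.0, theta=0.0, lam=4.0, psi=0.0):
    """Real part of a 2-D Gabor filter: a Gaussian envelope times a
    sinusoidal carrier oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + y_t**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier

# A small bank of orientations, e.g. to fix the first conv layer's kernels.
bank = np.stack([gabor_kernel(theta=t)
                 for t in np.linspace(0, np.pi, 4, endpoint=False)])
print(bank.shape)  # (4, 7, 7)
```

Freezing such a bank as the first layer's weights is one way to make the network's earliest features "human-like" by construction.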

I competed in the 2023 MIT Pokerbots competition and placed in the top 10%, resulting in a cash prize. The variant played in this competition was River of Blood Hold'em.

I implemented basic PyTorch functionality from scratch using only NumPy arrays. The resulting neural networks converge and perform well on non-trivial problems.
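A minimal sketch of the kind of functionality involved: a linear layer with hand-written forward and backward passes, trained by gradient descent on a toy regression problem. The class and method names here are illustrative, not the project's actual API:

```python
import numpy as np

class Linear:
    """A fully connected layer with manual backprop, PyTorch-style."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                        # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out, lr=0.1):
        grad_x = grad_out @ self.W.T      # gradient w.r.t. the layer input
        self.W -= lr * self.x.T @ grad_out
        self.b -= lr * grad_out.sum(axis=0)
        return grad_x

rng = np.random.default_rng(0)
layer = Linear(2, 1, rng)
X = rng.standard_normal((64, 2))
y = X @ np.array([[2.0], [-3.0]]) + 1.0   # target: a known linear map

for _ in range(200):
    pred = layer.forward(X)
    grad = 2 * (pred - y) / len(X)        # d(MSE)/d(pred)
    layer.backward(grad)

print(np.mean((layer.forward(X) - y) ** 2))  # mean squared error after training
```

Caching the forward inputs and returning the input gradient from `backward` is the same contract `torch.autograd.Function` uses, which is what makes layers composable into deeper networks.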

__Published in the NeurIPS FMDM Workshop.__

__Paper Abstract__: We study few-shot prompting of pretrained large
language models (LLMs) towards solving PDDL planning problems. We are
interested in two questions: (1) To what extent can LLMs solve PDDL
planning problems on their own? (2) How and to what extent can LLMs be
used to guide AI planners? Recent work by Valmeekam et al. (2022)
presents negative evidence for (1) in the classic blocks world domain.
We confirm this finding, but expand the inquiry to 18 domains and find
more mixed results with a few clear successes. For (2), we propose a
simple mechanism for using good-but-imperfect LLM outputs to aid a
heuristic-search planner. We also find that the LLM performance is due
not only to syntactic pattern matching, but also to its commonsense
understanding of English terms that appear in the PDDL.

__Published in PNAS.__

__Paper Abstract__: We demonstrate that a neural network pretrained
on text and fine-tuned on code solves mathematics course problems,
explains solutions, and generates questions at a human level. We
automatically synthesize programs using few-shot learning and OpenAI's
Codex transformer and execute them to solve course problems at 81%
automatic accuracy. We curate a dataset of questions from
Massachusetts Institute of Technology (MIT)'s largest mathematics
courses (Single Variable and Multivariable Calculus, Differential
Equations, Introduction to Probability and Statistics, Linear Algebra,
and Mathematics for Computer Science) and Columbia University's
Computational Linear Algebra. We solve questions from a MATH dataset
(on Prealgebra, Algebra, Counting and Probability, Intermediate
Algebra, Number Theory, and Precalculus), the latest benchmark of
advanced mathematics problems designed to assess mathematical
reasoning. We randomly sample questions and generate solutions with
multiple modalities, including numbers, equations, and plots. The
latest GPT-3 language model pretrained on text automatically solves
only 18.8% of these university questions using zero-shot learning and
30.8% using few-shot learning and the most recent chain of thought
prompting. In contrast, program synthesis with few-shot learning using
Codex fine-tuned on code generates programs that automatically solve
81% of these questions. Our approach improves the previous
state-of-the-art automatic solution accuracy on the benchmark topics
from 8.8 to 81.1%. We perform a survey to evaluate the quality and
difficulty of generated questions. This work automatically solves
university-level mathematics course questions at a human level and
explains and generates university-level mathematics course questions
at scale, a milestone for higher education.

Under Construction.