Running GPT-J (EleutherAI/gpt-j-6B) on an M1 MacBook

Running GPT-J locally on an M1 MacBook: actual runtime on my 64GB RAM Apple M1 Max MacBook was 15 minutes. The first answer I got back from this prompt was more concise, but the answer in this video is a better, more thorough one.

It is amazing how helpful GitHub Copilot / ChatGPT can be sometimes. I have no previous experience with machine learning, at least not past a surface level.

I wanted to learn more, and my real goal is to create something like DocsGPT, which will analyze your codebase's code and documentation and provide in-depth answers specific to your codebase. I tried a simple test setup on a GPU instance on AWS, but that was going to be very expensive, and I would of course prefer to get this all running locally on my M1 MacBook.

After looking around, I found that PyTorch just recently added support for M1 MacBooks.

Below is the documentation I used to get it working.

NOTE: Most of this is coming from (copied from) this blog post. I will be keeping a more up-to-date version of the same information in the repository. I just had to do a few different things, documented toward the end, to get this working on my machine. Most importantly, it was failing to run due to a lack of CUDA support; to fix this I needed to install pytorch-nightly (which can be done a few ways). And because PyTorch's M1 "mps" backend doesn't support every op yet, I also needed to set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1. All of this is documented below.

https://lazycoder.ro/posts/using-gpt-neo-on-m1-mac/
https://github.com/huggingface/transformers/blob/ba0e370dc1713b0ddd9b1be0ac31ef1fdc7bdf76/docs/source/en/model_doc/gptj.mdx?plain=1#L62
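As a quick sanity check, here is a minimal sketch (assuming a pytorch-nightly build that exposes the MPS backend) of setting the fallback variable from Python before importing torch and confirming the device is visible:

import os
# must be set before torch is imported so unsupported MPS ops fall back to the CPU
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch
print(torch.backends.mps.is_available())  # True on an mps-enabled nightly build
print(torch.backends.mps.is_built())      # True if this PyTorch build was compiled with MPS support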

Prerequisites:

  • Install xcode and brew
  • MiniForge3 (don’t use Anaconda) -> download the arm64 sh from GitHub - https://github.com/conda-forge/miniforge#download
  • Make sure you have the latest rust installed via rustup
  • You will need llvm, cmake, and pkgconfig installed via brew as well (see the brew commands below)
# set up conda for your shell (zsh in this case)
export PATH="$HOME/miniforge3/bin/:$PATH"
conda init zsh

# create and use a new env
conda create --name tf python=3.9
conda activate tf

# install tensorflow deps
conda install -c apple tensorflow-deps
# base tensorflow + metal plugin
python -m pip install tensorflow-macos
python -m pip install tensorflow-metal

# install jupyter, pandas and whatnot
conda install -c conda-forge -y pandas jupyter

mkdir -p Projects/lab/tfsetup && cd Projects/lab/tfsetup
rm -rf tokenizers
git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python
# compile tokenizers - should be pretty fast on your m1
pip install setuptools_rust
# install tokenizers
python setup.py install

# install transformers using pip
pip install git+https://github.com/huggingface/transformers
pip install numpy --upgrade --ignore-installed

arch -arm64 brew install llvm
arch -arm64 brew install cmake
arch -arm64 brew install pkgconfig


cd ../../../../../../ # Back to root dir

# run the test code
export PYTORCH_ENABLE_MPS_FALLBACK=1
python gptjex.py


If you run without the fallback variable set, you will see errors about MPS not supporting various ops, for example:

RuntimeError: MPS does not support cumsum op with int64 input
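For reference, a tiny sketch of the same class of failure (assuming an mps-enabled nightly build; torch.arange produces an int64 tensor by default):

import torch

x = torch.arange(4, device="mps")   # int64 tensor on the mps device
print(torch.cumsum(x, dim=0))       # raises the RuntimeError above unless the CPU fallback is enabled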

To get around this, set the fallback variable before running:

export PYTORCH_ENABLE_MPS_FALLBACK=1

You can also verify that TensorFlow sees the M1 GPU through the Metal plugin:

import tensorflow as tf

print("GPUs: ", len(tf.config.experimental.list_physical_devices('GPU')))

Here is the gptjex.py test script:

#!/usr/bin/env python3
# export PYTORCH_ENABLE_MPS_FALLBACK=1

from transformers import GPTJForCausalLM, AutoTokenizer
import torch
import os
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
print("Created model")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
print("Created tokenizer")
prompt = (
    "How do I create a thread in GoLang?"
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
print("Got input_ids from tokenizer")
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=2048,
    pad_token_id=tokenizer.eos_token_id
)
print("Generated tokens")
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print("Got gen_text from tokenizer.batch_decode\n\n\n")
print(gen_text)
os.system("say done")

If you get an error about the device number not being valid from the model.generate() call, try setting the device to the new PyTorch "mps" device:

# Edit the code for happytransformer
code /opt/homebrew/anaconda3/envs/tf2/lib/python3.9/site-packages/happytransformer/happy_generation.py

On line 58 or so there is:

device_number = detect_cuda_device_number()

I changed this to:

device_number = torch.device("mps")

I also had to make sure I had:

export PYTORCH_ENABLE_MPS_FALLBACK=1
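As an alternative to patching happytransformer, you can put the model and the inputs on the mps device directly in gptjex.py. A sketch of that approach (I did not end up benchmarking it; the same caveats about unsupported ops and the CPU fallback apply):

import torch
from transformers import GPTJForCausalLM, AutoTokenizer

device = torch.device("mps")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").to(device)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# tokenize the prompt and move the input ids to the same device as the model
input_ids = tokenizer("How do I create a thread in GoLang?", return_tensors="pt").input_ids.to(device)
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=256,  # shorter max_length just to keep the sketch quick
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(gen_tokens)[0])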

Install the PyTorch nightly build. I believe this is what finally worked for me:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch-nightly

Although what the documentation suggests (I believe), and what I tried before the command above, was:

pip install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
python -c 'import torch; print(torch.cuda.is_available())'
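Note that on Apple Silicon torch.cuda.is_available() will always be False, since there is no CUDA on macOS ARM; what you actually want to check is the MPS backend (assuming a nightly build recent enough to expose it):

python -c 'import torch; print(torch.backends.mps.is_available())'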