James Williams

Notes from Stephen Wolfram's ChatGPT primer

Source Material: Stephen Wolfram, 2023-02-14

ChatGPT is a large-scale transformer-based language model that is designed to predict the next word in a sentence given the context of what has been said. It is a neural network with 175 billion parameters that has been trained on a vast corpus of text, enabling it to form and apply a semantic structure to human language.

The GPT in ChatGPT stands for Generative Pre-trained Transformer. Generative means that the model is capable of generating new text rather than just recognizing patterns in existing text. Pre-trained means that the model has been trained on a large corpus of text before being fine-tuned for a specific task. Transformer refers to the specific type of neural network architecture, which is designed to better handle long-range dependencies between words in a sentence.

To accomplish its task, ChatGPT uses a technique known as unsupervised learning (often called self-supervised learning in this setting, because the text supplies its own answers), which allows it to learn patterns in the data without hand-labeled examples. Instead of being trained on explicit pairs of inputs and their associated outputs, as in supervised learning, the model is given a large corpus of text and is trained to predict the next word by masking the latter part of a passage and having the model predict what should come next. The prediction is then compared with the masked text, and the model’s parameters are iteratively adjusted to minimize the error.
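To make the masking idea concrete for myself, here is a minimal sketch of how a single sentence can be turned into training examples. The function name next_word_pairs is my own invention, and splitting on whitespace is a simplification: as far as I understand, real models work on sub-word tokens rather than whole words.

```python
def next_word_pairs(text):
    """Pair every prefix of the text with the word that follows it.

    Everything after the prefix is treated as "masked"; the first hidden
    word is the answer the model is asked to predict.
    """
    words = text.split()  # simplification: real models use sub-word tokens
    return [(words[:i], words[i]) for i in range(1, len(words))]


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(" ".join(context), "->", target)
```

No human labeling is involved anywhere: the text itself supplies both the question and the answer.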

To evaluate how well the model performs on each iteration, a loss function is used. The loss function calculates how far away the model’s predictions are from the desired outcome, and the neural net weights are adjusted in a way that minimizes the result of the loss function.
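Here is a toy sketch of that loop, using cross-entropy, a common loss for next-word prediction (I am assuming it here; these notes don't name a specific loss). To keep it tiny, the three scores below stand in for the network's weights and are adjusted directly, which is a big simplification of what happens in a real neural net. All names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Loss is small when the correct word gets a high probability."""
    return -math.log(softmax(logits)[target])


def gradient_step(logits, target, learning_rate=0.5):
    """Nudge each score in the direction that reduces the loss."""
    probs = softmax(logits)
    # For cross-entropy, the gradient with respect to each score is
    # (predicted probability - 1) for the correct word, and just the
    # predicted probability for every other word.
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [x - learning_rate * g for x, g in zip(logits, grads)]


logits = [0.0, 0.0, 0.0]  # the toy model starts with no preference
target = 2                # index of the word that actually came next
losses = []
for _ in range(5):
    losses.append(cross_entropy(logits, target))
    logits = gradient_step(logits, target)

print([round(loss, 3) for loss in losses])
```

Each pass through the loop measures the error and then adjusts the scores, so the printed losses shrink from one iteration to the next.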

Training the model both optimizes the neural net weights and produces embeddings, which are a way of representing the meaning of words as arrays of numbers (in the vague, undefinable sense of ‘meaning’). Words with similar meanings are represented by arrays that sit close to each other. ChatGPT takes this concept further by generating embeddings not just for individual words, but for entire sequences of words.
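A small sketch of what "close to each other" can mean in practice. The three-number vectors below are made up by me for illustration, not learned by any model, and cosine similarity is just one common way to measure closeness.

```python
import math


def cosine_similarity(a, b):
    """Return 1.0 for vectors pointing the same way, lower for less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones have hundreds or thousands of numbers.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def nearest(word):
    """Find the other word whose embedding is most similar."""
    others = (w for w in embeddings if w != word)
    return max(
        others,
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )


print(nearest("cat"))
```

With these invented numbers, "cat" lands nearest to "dog" rather than "car", which is the kind of neighborhood structure a trained embedding is meant to capture.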

These embeddings are then used to predict the probabilities of different words that might come next in a sentence. This is accomplished using a transformer architecture, which is designed to better handle dependencies between tokens even when they are far from each other in the sequence. One of the defining features of the transformer is its attention mechanism, which lets the model weight the relevant parts of the sequence more heavily than the rest. This allows ChatGPT to take into account the context of the conversation so far when generating the next token, even when that context is not adjacent to the token being generated. This is the main reason ChatGPT comes across as a coherent entity.
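To see how attention can reach past adjacent tokens, here is a minimal sketch of scaled dot-product attention with a causal mask (each position only looks at itself and earlier positions). The vectors are hypothetical, and reusing the same vectors as queries, keys, and values is my simplification: a real transformer derives each of them from learned projections and runs many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i only sees 0..i."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for i, query in enumerate(queries):
        # Score the current token against itself and every earlier token.
        scores = [
            sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is a weighted blend of the visible value vectors.
        mixed = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical 3-token sequence: tokens 0 and 2 are alike, token 1 differs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = causal_attention(tokens, tokens, tokens)
print(weights[2])
```

The last token puts more weight on the first token, two positions back, than on its immediate neighbor, because similarity rather than distance decides where the attention goes.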

Finally, ChatGPT uses a temperature setting to introduce a degree of randomness into its predictions, which can make the output more diverse and interesting.
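A rough sketch of how a temperature setting is commonly applied: the model's raw scores are divided by the temperature before being turned into probabilities, and the next word is then drawn at random from those probabilities. The words and scores below are invented for illustration, and the function names are my own.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert scores to probabilities; temperature must be above zero."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words, logits, temperature, rng):
    """Draw one word at random, weighted by its adjusted probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical candidates for the word after "the cat sat on the".
words = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # low temperature
warm = softmax_with_temperature(logits, 1.5)  # high temperature
print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])

rng = random.Random(0)  # seeded so repeated runs give the same draw
print(sample_next_word(words, logits, 0.8, rng))
```

A low temperature concentrates nearly all the probability on the top-scoring word, while a high temperature flattens the distribution so less likely words get picked more often.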


What strikes me as the most profound point Stephen makes is that the success of ChatGPT is itself a scientific discovery: it suggests that there may be simple rules, which we ourselves don’t yet understand, describing how the semantics of human language can be arranged. Studying the pathways and structures ChatGPT uses could help deepen our own understanding of human language.

LLM Round-up

When I asked ChatGPT what type of AI it is, this is what it said:

ChatGPT is a type of language model, specifically a variant of a transformer model called the Generative Pretrained Transformer (GPT). It is an example of artificial general intelligence (AGI), which means that it is capable of performing a wide range of natural language processing tasks, such as text generation, summarization, translation, and question answering, among others.

I know some of these words 🤦‍♂️.

I’ll never be an AI researcher, but I do want to have a layman’s understanding of what the tool is capable of and how it works. I’ve been searching for educational resources that lie somewhere in the sweet spot of accessibility and depth. I want to understand the basic terminology, like what a transformer is, and also understand a large language model’s shortcomings. And I want to understand where a language model fits into the broader landscape of AI.

To that end, I’m compiling a few resources that seem to be well regarded. I’ll treat this post as a digital bookshelf, and will update it with my thoughts and notes as I work through the material.

We’ll start with 35,000 words from Stephen Wolfram as a primer, then move into Andrej Karpathy’s Zero to Hero lecture series.

Three managerial principles I wrote down in 2015

In my mid-20s—okay, mid- to late-20s—I worked as the Assistant Manager of a restaurant. It was a formative time for me, though I didn’t fully realize it then.

In 2015 I jotted down three managerial principles on Medium that had been percolating in my mind. (Medium at the time was quite in vogue.) I had forgotten all about the account, and when I stumbled upon it again, the post struck me as uncharacteristically good advice from my otherwise uninformed young self.

The following is dated May 10, 2015.

I keep three things in mind on a day to day basis that help me decide on a particular course of action, how best to conduct myself in a given situation, or where to focus my energy where demand outstrips supply.

  1. As a leader, your most critical function is teaching and supporting your people. You are not your business — your employees are, and your job is to develop them into strong ambassadors of the brand. The quality of your product, the happiness of your customers, the volume of your sales all follow from the attitude and competence of your staff. Interpersonal skills and the ability to make improvements and criticisms without being hurtful or discouraging are perhaps the most critical attribute of any manager. Positive feedback is important.
  2. Organization and prioritization. Simply find a way to ensure that no details are missed and nothing slips through the cracks. Do what you can to minimize surprises; be flexible and adaptable when they do occur.
  3. Managing your own attitude. You set the tone and energy. I work in a fast-paced, high-stress business where this is particularly important. Letting small incidents send you into an outwardly visible tailspin is a quick and sure way of crashing a complex or time-sensitive operation. Your staff feed off your energy and positivity, and being mindful of that is critically important. A friend once told me, “Head to work each day as if you were going to war.” Expect and anticipate problems in a manner such that they won’t ruin your day when they occur. Keep it light and save the overreactions for incidents that warrant them.

Along with hard work and volition, these three points seem to work well for me. I’m still young and stupid, so take with a dash of salt.

Will update as I make more mistakes and learn from them.


Bonus: I also wrote this, titled reviews_cometh.txt:

The customers grow restless in the evenings. They arrive in numbers by moonlight and street lamp, hungry for bread and wine. It is in these hours, as dusk comes ever sooner with each revolution, they wait.

At the door, they wait.

For drinks, they wait.

For their meat and their honey…

They wait.

In these desperate moments — moments of hope then anger, denial then acceptance, they reach upon their mobile devices and tap.

For the negative reviews cometh.

How to convert a subdirectory to its own git repository/submodule

# Make a clone of the repository
git clone <your_project> <new_submodule>

# CD into the new repository
cd <new_submodule>

# Use filter-branch to isolate the subdirectory
git filter-branch --subdirectory-filter 'path/to/subdirectory' --prune-empty -- --all

# Remove the old remote
git remote rm <remote_name>

git filter-branch lets you rewrite Git revision history and apply custom filters on each revision. It should be used with caution! From the Git Manual (emphasis mine):

git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite (and can leave you with little time to investigate such problems since it has such abysmal performance). These safety and performance issues cannot be backward compatibly fixed and as such, its use is not recommended.

Nevertheless, since we live dangerously around these parts, here’s what’s happening with the filter-branch command should you choose to use it:

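  • --subdirectory-filter is the main option. It takes a path to a subdirectory and rewrites history so that only commits touching that subdirectory remain, with the subdirectory becoming the new project root.
  • --prune-empty removes commits that are left empty once the filter has been applied.
  • -- --all: the bare -- separates the filter-branch options from the arguments passed to the internal git rev-list command. In this case, --all tells git to rewrite every ref (all branches and tags), not just the current branch.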

Once the operation is done, you’ll have a new repository with only the subdirectory you specified. You can then push it to a new remote and add it as a submodule to your original repository.

git filter-branch is a destructive operation. While I’ve used the above command with success on my own repositories, your mileage may vary and I probably can’t help you if something goes wrong.

GPG Field Guide

Jürgen Gmach on his personal blog:

While I need to use GPG pretty regularly, I always have to look up the commands - they just don’t stick :-)

Over the last couple of months I collected every command I had to use. Enjoy!

🔖 One for the bookmarks: GPG - All I Need to Know