Browser-only character model

MicroMedGPT

A tiny GPT model for generating new medication names.

About

MicroMedGPT is a (very) small language model trained to invent new medication-like names, one character at a time. The question behind it: can a model this small learn the spelling rhythms of drug names well enough to produce plausible new ones?

The model has a tiny vocabulary of 27 tokens: the letters a to z, plus one end-of-name token. That means every name is treated as a sequence of characters rather than as words or subwords.
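As a rough illustration of that character-level vocabulary, here is a minimal sketch in Python. The token ordering (letters first, end-of-name token last) and the names `encode`, `decode`, and `END_ID` are assumptions made for this example, not details taken from the project.

```python
import string

# Assumed layout: ids 0-25 are the letters a-z, id 26 is the end-of-name token.
# The real project may order its tokens differently.
LETTERS = string.ascii_lowercase
END_ID = len(LETTERS)           # 26
VOCAB_SIZE = len(LETTERS) + 1   # 27 tokens in total


def encode(name: str) -> list[int]:
    """Map a lowercase name to token ids and append the end-of-name token."""
    return [LETTERS.index(ch) for ch in name] + [END_ID]


def decode(ids: list[int]) -> str:
    """Map token ids back to text, stopping at the first end-of-name token."""
    chars = []
    for token_id in ids:
        if token_id == END_ID:
            break
        chars.append(LETTERS[token_id])
    return "".join(chars)
```

With this layout, `encode("abilify")` yields eight ids: seven letters followed by the end-of-name token, which is what tells the sampler when a generated name is finished.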

The project was inspired by Andrej Karpathy's microgpt.py, a compact, dependency-free GPT implementation written in Python. Karpathy's version demonstrates the idea of a GPT end to end (a transformer, an autograd engine, training, and sampling) in a single Python file.

The training corpus is a list of drug names compiled from the US FDA National Drug Code Directory. It includes both proprietary and non-proprietary names. Each name is lowercased and reduced to alphabetic characters only, so punctuation and numbers are removed. Words that describe the dosage form rather than the drug, such as tablet, capsule, injection, solution, cream, and spray, are stripped out.
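The cleaning steps described above could look something like the sketch below. It is only an approximation: the stop-word list contains just the six words named in the text, the function names are hypothetical, and joining the remaining words of a multi-word name into one string is an assumption (the project may handle those differently).

```python
import re

# Dosage-form words named in the text; the project's real list may be longer.
STOP_WORDS = {"tablet", "capsule", "injection", "solution", "cream", "spray"}


def clean_name(raw: str) -> str:
    """Lowercase a raw name, keep letters only, and drop dosage-form words."""
    lowered = raw.lower()
    # Turn every run of non-letters into a space so words stay separated.
    letters_only = re.sub(r"[^a-z]+", " ", lowered)
    kept = [word for word in letters_only.split() if word not in STOP_WORDS]
    # Assumption: remaining words are joined, since the vocabulary has no space.
    return "".join(kept)


def build_corpus(raw_names: list[str]) -> list[str]:
    """Clean every name, drop empties and duplicates, and sort the result."""
    cleaned = {clean_name(raw) for raw in raw_names}
    cleaned.discard("")
    return sorted(cleaned)
```

For example, `build_corpus(["Abacavir", "ABACAVIR Injection", "Cream"])` collapses to the single entry `abacavir`, because the two spellings clean to the same string and the third cleans to nothing.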

After cleaning, the corpus contains 6,091 drug names, stored as a plain list: abacavir, abatacept, abemaciclib, abilify, abiraterone, and so on. This is enough for the model to notice some of the characteristic endings and internal shapes of medicine names, while still being small enough to train in about a minute on a laptop.

Does it work? With the seed "a", for example, the model generates new drug names such as adlarilo, atrzamasidene, atetapine, akunotrab, acadsel...

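The offline training run uses 1,000 steps. At the end of that run the final training loss is 2.262, which corresponds to a perplexity of about 9.61. Perplexity is the exponential of the loss, and it is a rough measure of how uncertain the model is about the next character: a perplexity of 9.61 means the model is, on average, about as unsure as if it were choosing evenly among nine or ten characters. Lower is better, and a single-digit score is (to me) surprisingly good for such a small model.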
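The relationship between loss and perplexity can be checked in a couple of lines. This sketch assumes the reported loss is a mean cross-entropy measured in nats (natural logarithm), which is the usual convention but is not stated on this page.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_loss_nats)


# The final training loss quoted above.
final = perplexity(2.262)

# Baseline for comparison: guessing uniformly over the 27 tokens
# gives a loss of ln(27), so its perplexity is 27.
uniform = perplexity(math.log(27))

print(f"trained model: {final:.2f}, uniform guessing: {uniform:.2f}")
```

The uniform baseline is the useful reference point here: a model that had learned nothing would sit at 27, so a value under 10 means the model has ruled out most characters at a typical position.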

Once trained, the Python script exports the learned weights as a JSON file. This webpage loads that model file and runs the same transformer calculation in JavaScript. No server call is made when you press generate: the sampling happens locally, in the browser.
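For a sense of what such an export might involve, here is a minimal sketch. The JSON layout (a `shape` plus a flat `data` array per matrix), the parameter name `wte`, and the functions `to_payload` and `export_weights` are all hypothetical; the actual file format used by this page is not described here.

```python
import json


def to_payload(state: dict[str, list[list[float]]]) -> dict:
    """Flatten each weight matrix into a shape plus a row-major data list."""
    return {
        name: {
            "shape": [len(matrix), len(matrix[0])],
            "data": [value for row in matrix for value in row],
        }
        for name, matrix in state.items()
    }


def export_weights(state: dict[str, list[list[float]]], path: str) -> None:
    """Write the flattened weights to a JSON file a browser can fetch."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(to_payload(state), handle)


# Hypothetical example: a 3x2 embedding table named "wte".
example = {"wte": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}
payload = to_payload(example)
```

Storing a flat array alongside its shape is one common choice because the JavaScript side can copy it straight into a typed array and index it by row and column without parsing nested lists.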