I'm mostly done with the Natural Language Processing specialization, and tonight I was playing around with a fun proof of concept written by Andrej Karpathy.
The proof of concept is a tiny transformer architecture, "GPT mini". Unlike its big GPT-3 and GPT-4 brothers, this tiny model is character-based, not word-based. The text I used consists of three short stories by the author Nescio -- which, in fact, constitute his complete works: around 200k characters. I trained the model for 20,000 iterations.
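To give a feel for what "character-based" means in practice, here is a minimal sketch of character-level tokenization: the vocabulary is simply the set of distinct characters in the training text, so it has tens of symbols rather than tens of thousands of words. This is an illustration, not Karpathy's exact code, and the filename is hypothetical.

```python
# Minimal sketch of character-level tokenization (illustrative only).
# "nescio.txt" is a hypothetical file holding the ~200k-character corpus.
text = open("nescio.txt", encoding="utf-8").read()

# The vocabulary is every distinct character that occurs in the text:
# letters, digits, punctuation, whitespace.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> char

def encode(s):
    """Turn a string into a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Turn a list of integer token ids back into a string."""
    return "".join(itos[i] for i in ids)

print(len(text))   # corpus size in characters (~200k here)
print(len(chars))  # vocabulary size: typically well under 100 symbols

# Round-tripping a snippet of the corpus should be lossless.
assert decode(encode(text[:50])) == text[:50]
```

The model then predicts the next character id given the previous ones, so it has to learn spelling, punctuation, and word boundaries from scratch rather than getting them for free from a word- or subword-level tokenizer.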