They did give us OpenAI Gym (now Gymnasium) and PPO. It’s sad that they completely pivoted away from this line of work, though.
We’ve seen similar effects in the context of reinforcement learning (see the “primacy bias” work of Evgenii Nikishin). It makes sense that it would also apply to LLMs, and to any other ML model.
Men in Black (1997)
He’s the worm guy.