GPT-5.4 OSWorld: 47% → 75%
GPT-5.4 jumped from 47% to 75% on the OSWorld computer-use benchmark in a single release. That 28-percentage-point gain puts it above the human baseline. The leap is attributed to combining a coding model with a general model.