OpenAI’s GPT-4 is Here. It’s Passing More Exams
- The new model of OpenAI’s ChatGPT can pass more exams with higher scores, accurately process images, and adopt different personalities.
- The model still makes simple factual and logical reasoning errors.
- The model is available to some developers and researchers. Users must sign up for the waitlist for access.
OpenAI’s ChatGPT-3.5 artificial intelligence has proved to be a passable student for law, medical, and college-level exams. But OpenAI’s new model, GPT-4, looks like it wants to be the top student in the class.
On March 14, OpenAI released a technical report of GPT-4, the newest iteration of the ChatGPT artificial intelligence showcasing the model’s capabilities and limitations. College students, professors and administrators take note: This version of the AI chatbot improves academic performance, tunes AI personalities, and even shows the ability to assess images.
However, the AI can still make simple logical and factual mistakes.
Here’s a first look at how GPT-4 performed on college- and graduate-level exams and other benchmarks.
What Exams Can GPT-4 Pass?
One of GPT-4’s biggest accomplishments is becoming a licensed practitioner of law.
GPT-4 shot beyond GPT-3.5’s performance in the Uniform Bar Exam with a 298/400, landing in the 90th percentile of students. GPT-3.5’s test score was 213/400, in the 10th percentile of students.
Image Processing
One of the biggest differences between GPT-3.5 and GPT-4 is the AI’s ability to accurately see and assess images. Previously, a study testing GPT-3.5 on the United States Medical Exam removed all questions containing visual assets due to the model’s inability to determine what was in an image.
OpenAI submitted a combination of text and images to ask the AI, “What’s funny about this image? Describe it panel by panel.”
Steerability
Developers, and later users, can change the AI’s “character” to be different from the usual style of ChatGPT. For example, students can now change GPT-4 into a Socratic tutor that will never give students the answer but guide them through problem-solving.
Or they can turn the AI into a Shakespearean pirate.
Limitations
GPT-4, like its predecessors, can still “hallucinate” facts and make reasoning errors. The base model is slightly better than GPT-3.5. The gap widens after Reinforcement Learning from Human Feedback (RLHF) training.
Like GPT-3.5, GPT-4’s brain is stuck in the past. It generally lacks knowledge of any event after September 2021.
GPT-4 is only available to some developers and researchers, but you can join OpenAI’s waitlist. Text-only requests are currently available, and pricing is $.03 per 1k prompt tokens and $.06 per 1k completion tokens.
“We look forward to GPT-4 becoming a valuable tool in improving people’s lives by powering many applications,” OpenAI said. “There’s still a lot of work to do, and we look forward to improving this model through the collective efforts of the community building on top of, exploring, and contributing to the model.”