Welcome to the summer of AI

April saw the release of Meta’s Llama 3 generative large language model, notable for being open-source

By Dr Ayesha Razzaque

May 20, 2024

This representational picture shows a metallic figure against a computer. — AFP/File

For decades, research in artificial intelligence (AI) has advanced in fits and starts, alternating between periods of euphoric optimism and, when the promises of those periods do not come to pass, ruts known as AI winters.

But this season is shaping up to be the summer of AI. April saw the release of Meta’s Llama 3 generative large language model, notable for being open-source. A few days ago, Google showed off its Project Astra and Gemini 1.5 generative AI (GenAI) models at its Google I/O developer conference, and expectations are high for Apple’s 2024 Worldwide Developers Conference in June, where it usually unveils updates and new features for its mobile and desktop operating systems alongside other product announcements. While tech companies large and small have been falling over each other to release one AI product after another, Apple – known to be a last-mover – has been very quiet so far.

But I would probably not be wrong in saying that the showstopper this season has so far been OpenAI’s release of its GPT-4o GenAI model, which it timed to front-run Google I/O. The ‘o’ stands for omni (short for omni-modal), meaning that it can take in and produce not only text, like earlier models, but also images and audio. In the demos, multiple users interact with it by speaking to it and by sharing a camera video feed and on-screen activity.

For me, the most fun part was the wide variety of intonations in which GPT-4o can ‘speak’ to users: it can express excitement, patience, encouragement, mystery and drama, and it can even sing. It promises to make interactions much more natural and, dare I say, fun. In fact, several commentators have criticized GPT-4o for sounding too flirtatious in its default mode.

The preview of OpenAI’s GPT-4o featured several demos, including one on solving a simple linear equation in one variable. It was accompanied by the online release of a series of short demo videos. One featured Sal(man) Khan, of Khan Academy fame, and his son, who was tutored by GPT-4o and answered its questions about right triangles in a conversational style while sharing the screen of an iPad and writing on it with a stylus. GPT-4o was able to respond to the student’s on-screen annotations and activities and guide him, step by step, towards the answers, as a good tutor would.

These demos look very promising but are still too few in number to support any definitive claim about their usefulness to students and the education sector in general. However, given that the two demonstrated use cases were education-related, this is clearly an application area at the forefront of developers’ minds. The pace of improvement in capabilities and the addition of guardrails has been rapid. Even so, I was not expecting a multi-modal GenAI model of this kind this soon, but here we are!

From an educationist’s perspective, what stands out most to me in this generation of models is their ability to not simply spit out an answer. Instead, they are able to gradually nudge and guide learners through multi-step problems while encouraging them to pick up the thread, as a human tutor would.

Duolingo, a free app, is the world’s largest language-learning platform. Following the release of GPT-4o, in an interview on a business channel, Duolingo’s CEO Luis von Ahn shared how the company plans to replace its person-to-person chat feature with chat powered by GPT-4o. Learners are hesitant to use the existing person-to-person chat feature, possibly due to social anxiety, fear of embarrassment, and similar factors. Knowing that they are talking to a (good) chatbot might address that.

However, not all education service providers will come out as winners. Chegg describes itself as “24/7 homework help”. In academia, investigations often find that it played a role in cases of plagiarism. Chegg’s fortunes rose during the pandemic. Its stock price hit an all-time high of $113+ in February 2021 but has been on a steady decline since then. The release of multiple GenAI models and growing public access to them are probably contributing factors in that decline. Two days ago, its stock closed at $4.38, an 11-year and all-time low. Why (blindly) copy an answer when it might actually be easier to show the problem to a personalized tutor that is available round the clock and can walk you through the solution without any personal judgement?

The distinction between winners and losers among education service providers will extend to learners as well. In terms of resources, using a GenAI model requires at least a smartphone (or computer or tablet), Internet access, and (depending on the specific model) a subscription which, in turn, will require a credit card.

Beyond that, although several GenAI models presently support a few dozen languages, the primary language for which they are developed is English, the language of the Internet. That is fine for everyone who knows English or any other well-supported language, but what about everybody else? In terms of numbers, various reports rank Urdu as the 10th or 11th most widely spoken language worldwide, but do not expect support for it to arrive any time soon.

Such decisions are driven by the prioritization of more lucrative markets and by customers’ ability to pay, which in turn goes back to the state of economies. In our context, we can expect a further widening of the gap between the haves, with the resources and the requisite command of the English language, and the have-nots, who lack both. Even if a GPT-4o-like model became available in Urdu tomorrow, that would still exclude millions of learners who are fluent in their local language but not in Urdu.

As we navigate this summer of AI, the transformative potential of multi-modal GenAI models like GPT-4o, Project Astra, Gemini and others in education is undeniably exciting. The ability to offer personalized, interactive tutoring can revolutionize learning, making education more engaging and accessible for many. However, the disparity in access to these advanced tools highlights a critical challenge. In Pakistan, where economic constraints and language barriers are prevalent, the divide between those who can benefit from these technologies and those who cannot may widen further.

It is imperative for policymakers, educators, and technology developers to address these gaps. A few groups in the private sector are working on some of the necessary pieces, but taking on this challenge requires significant investments, some of which I talked about in an earlier op-ed (‘The AI frontier at WGS’, The News International, March 1, 2024). Investing in infrastructure to improve Internet access and affordability, fostering digital literacy, and advocating for the development of AI tools in Urdu and local languages are essential steps. Moreover, public-private partnerships can play a crucial role in ensuring that the benefits of AI are equitably distributed.

The potential of AI in education is vast and is slowly coming into focus, but its promise must be inclusive. By actively working to bridge the digital divide, we can ensure that all students, regardless of their socio-economic background, have the opportunity to benefit from the educational advancements AI offers. Only then can we truly harness the power of this technological revolution to create a brighter, more equitable future for all learners in Pakistan.

The writer (she/her) has a PhD in Education.