Sunday, December 22, 2024

Google Gemini roadtest: Does this sound like someone you know?

Google Gemini is the latest — and arguably most ambitious — generative AI system created to date.

While established players in the AI business have used limited datasets to create their large language models (LLMs), Google has asked OpenAI to hold its beer and trained Gemini on a dataset more than 15 times the size of the one used for GPT’s most recent model.

With parent company Alphabet’s tentacles probably the most embedded in the online world, Gemini has access to vast swathes of information inaccessible to most other models.

Traditional competitors, including Apple and Opera, have been so impressed by Gemini that they are currently in talks with Google about using Google’s AI to power various elements of their own systems.

The problem is that Gemini currently has no way of distinguishing between what is useful information and what is utter madness pulled from a popular Reddit thread.

This is why Gemini has been offering up wild answers to simple questions, evidently unable to tell the difference between an academic and an internet troll. Critics have accused Google of making a “woke AI” that produces images of racially diverse Nazis and Native American senators from the 1800s.

It will also tell you the benefits of nuclear war and human sacrifice if you ask it just right.

But there are surprises hidden in Gemini as well. Various benchmarking tests have shown it outperforms ChatGPT and Microsoft Copilot at most high-level tasks, like coding and “comprehension” of academic literature.

Testing Gemini

Gemini was fed the prompt “Can you please pretend to be an Australian public servant from the *insert department*, and tell me about your job?” for each federal department.

All of the responses used the basic version of Gemini.

Here are the results:

Key takeaways

I ran Gemini’s answers through textual analysis tools to examine the readability and searchability of the text.

The Flesch reading ease score, one of the Flesch-Kincaid readability tests, is a standardised measure of text readability that will be familiar to many copywriters. It runs from 0 to 100, with lower scores indicating harder text. Gemini’s answers sat between 30 and 45, which is generally considered to be a difficult read. However, in this case, that isn’t as bad a result as it sounds.

Once Australianisms like “crikey” and “cobber” are removed, the Flesch-Kincaid score rises to a much more reasonable 40-50, or late high school level.
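
For readers who want to try this kind of check themselves, the following is a minimal Python sketch of the standard Flesch reading ease formula. It is not the tool used for this article: the function names and the vowel-group syllable counter are illustrative assumptions, and dedicated readability tools count syllables more carefully, so their scores will differ.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores mean easier text.

    score = 206.835 - 1.015 * (words / sentences)
                    - 84.6  * (syllables / words)
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(round(flesch_reading_ease(sample), 1))
```

Short sentences built from short words push the score up, while long, polysyllabic sentences push it down, which is why jargon-heavy text tends to land in the "difficult" range.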

It’s worth noting that Gemini has controls that allow you to adjust the complexity and formality of language, so the answers could likely have been made more realistically APS-toned and easier to read.

The search engine optimisation (SEO) is exceptionally good, although that shouldn’t be surprising, given that Google’s own tool knows what kinds of answers its search engine is looking for.

The language is almost strangely casual compared to Gemini’s answers for public servants from other countries. Gemini seems to have picked up on a number of stereotypes about Australian culture to produce its Australian tone.

All of the text appears to be trying to “sell” the department, which probably reflects the fact that so much of the content Google promotes is commercial in nature.

Perhaps most importantly for public servants, the text is mostly accurate — not something that can be taken for granted with an LLM known for telling people to eat rocks and kill themselves.

While the government works on how AI will be used by the public service, the advice remains that any outputs from generative AI must be subject to human review.

Even so, tools like Gemini could prove a time saver for writing emails, briefs and reports that are then fact-checked and edited by a human public servant.

Oh, and there is at least one department that “might just be the place for you!”

Can it do anything useful?

We asked Gemini to write a variety of policy briefs and speeches, and it generally performed well. Most surprisingly, the information is referenced when you explicitly use the prompt “policy brief” or “literature review”, making it easy to fact-check.

There was usually one mistake, but rarely more than one, in each policy brief.

The speeches were almost startlingly good at emulating the style of various politicians, but less so that of senior public servants.

Response to prompt "Write a speech about trains in the style of Scott Morrison"
