All these images are generated by Google’s latest text-to-image AI

There is a new trend in AI: text-to-image generators. Give these programs any text you want and they will generate remarkably accurate images that fit that description. They can match a range of styles, from oil paintings to CGI renderings and even photographs, and – although it sounds cliché – in many ways the only limit is your imagination.

Until now, the leader in the field has been DALL-E, a program created by the commercial AI lab OpenAI (and updated as recently as April). Yesterday, however, Google announced its own take on the genre, Imagen, and it just dethroned DALL-E in the quality of its output.

The best way to understand the amazing capabilities of these models is simply to look at some of the images they can generate. There are some generated by Imagen above, and several more below (you can see more examples on Google's dedicated landing page).

In each case, the text at the bottom of the image was the prompt entered into the program, and the image above it was the output. Just to emphasize: that’s all it takes. You type what you want to see and the program generates it. Pretty fantastic, right?

But while these pictures are undeniably impressive in their consistency and accuracy, they should also be taken with a grain of salt. When research teams like Google Brain release a new AI model, they tend to cherry-pick the best results. So while these pictures all look perfectly polished, they may not represent the average output of the Imagen system.

Images generated by text-to-image models often look unfinished, smeared, or blurry, issues we’ve seen with images generated by OpenAI’s DALL-E program. (For more information on the problem spots for text-to-image systems, check out this interesting Twitter thread that dives into the trouble with DALL-E. Among other things, it highlights the system’s tendency to misunderstand prompts and to struggle with both text and faces.)

However, Google claims that Imagen consistently produces better images than DALL-E 2, based on a new benchmark it created for this project called DrawBench.

DrawBench is not a particularly complex metric: it is essentially a list of about 200 text prompts that Google’s team fed into Imagen and other text-to-image generators, with the output of each program then being judged by human evaluators. As can be seen from the charts below, Google found that people generally preferred Imagen’s output over that of its rivals.

Google’s DrawBench benchmark compares Imagen’s output to competing text-to-image systems like OpenAI’s DALL-E 2.
Image: Google
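
To make the idea of a human-preference benchmark concrete, here is a minimal Python sketch of how such votes could be tallied. It is not Google's evaluation code: the function name `preference_rates`, the vote format, the "no preference" option, and the sample prompts and votes are all made up for illustration.

```python
from collections import Counter


def preference_rates(votes):
    """Return the share of votes each option received.

    `votes` is a list of (prompt, preferred_system) pairs, one per
    human judgment. This format is a hypothetical simplification.
    """
    counts = Counter(choice for _, choice in votes)
    total = sum(counts.values())
    return {system: n / total for system, n in counts.items()}


# Invented example votes, not real DrawBench data.
votes = [
    ("a raccoon wearing a crown", "imagen"),
    ("a cactus wearing sunglasses", "imagen"),
    ("a raccoon wearing a crown", "dall-e 2"),
    ("a cactus wearing sunglasses", "no preference"),
]

rates = preference_rates(votes)
for system, share in sorted(rates.items()):
    print(f"{system}: {share:.0%}")
```

A tally like this only says which system raters picked more often across the prompt list; it says nothing about why, or about how either system behaves on prompts outside the list.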

However, it will be difficult to judge this for ourselves, as Google is not making the Imagen model available to the public. There’s a good reason for that, too. While text-to-image models certainly have fantastic creative potential, they also have a range of troubling uses. Imagine a system that generates just about any image you want being used for fake news, hoaxes, or intimidation, for example. As Google points out, these systems also encode social biases, and their output is often racist, sexist, or toxic in some other inventive way.

Much of this is due to the way these systems are programmed. Essentially, they are trained on huge amounts of data (in this case, lots of pairs of images and captions), which they study for patterns and learn to replicate. But these models require an awful lot of data, and most researchers, even those who work for well-funded tech giants like Google, have decided it’s too onerous to filter these inputs completely. So they scrape massive amounts of data from the web, and as a result, their models absorb (and learn to replicate) all the hateful bile you’d expect to find online.

As Google’s researchers summarize this problem in their paper: “[T]he large-scale data requirements of text-to-image models […] have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets […] Dataset audits have shown that these datasets tend to reflect social stereotypes, oppressive viewpoints, and derogatory or otherwise harmful associations with marginalized identity groups.”

In other words, the age-old adage of computer scientists still applies in the fast-paced world of AI: garbage in, garbage out.

Google doesn’t go into much detail about the disturbing content generated by Imagen, but notes that the model “encodes several social biases and stereotypes, including a general preference for generating images of people with lighter skin tones and a propensity for images aligning different professions with Western gender stereotypes.”

This is something researchers have also found while evaluating DALL-E. Ask DALL-E to generate images of, for example, a ‘stewardess’, and almost all subjects will be women. Ask for photos of a ‘CEO’ and, surprise, surprise, you get a bunch of white men.

For this reason, OpenAI has also decided not to release DALL-E publicly, but the company will give access to select beta testers. It also filters certain text input in an attempt to prevent the model from being used to generate racist, violent or pornographic images. These measures somewhat limit potentially harmful uses of this technology, but the history of AI tells us that such text-to-image models will almost certainly become public in the future, with all the troubling implications that wider access entails.

Google’s own conclusion is that Imagen “isn’t fit for public use right now,” and the company says it plans to develop a new way to benchmark “social and cultural biases in future work” and test future iterations. For now, though, we’ll have to be content with the company’s cheerful selection of images: raccoon kings and cacti wearing sunglasses. That, however, is just the tip of the iceberg: the iceberg made of the unintended consequences of technological research, if Imagen wants to have a go at generating that.

Shreya Christina
