Current State of AI-Generated Imaging
Artificial intelligence models have emerged that can recognize and process both text and images, much as humans do. In January 2021, OpenAI, an artificial intelligence research laboratory in the United States, demonstrated a model that draws a picture to match a text input. In August 2021, OpenAI also unveiled Codex, a software development tool that uses artificial intelligence to write code automatically when people give commands in everyday language.
Built on GPT-3, the natural language processing model developed by OpenAI, these tools show that computers have moved beyond carrying out explicit instructions to grasping the content and context of language much as humans do. Until now, the ability to understand the meaning contained in pictures and text and to express it in other forms has been regarded as a high-level cognitive ability that only humans possess, but that is a challenge artificial intelligence has begun to take on.
In addition to solving long-standing problems such as telling dogs from cats, AI has been put to practical use, for example in face-recognition phone unlocking, demonstrating ‘sight’ that can exceed human ability. Smartphone photo management tools now automatically categorize and tag large photo libraries by content, such as people and backgrounds.
In 2016, Google released on GitHub a feature, built with its machine learning framework TensorFlow, that uses artificial intelligence to write photo descriptions automatically, removing the need for people to look at each picture and type in a description.
In 2019, Google launched ‘Live Caption’, a feature that automatically adds subtitles to videos on smartphones. Microsoft announced in October 2020 that it had doubled the accuracy of its automatic photo descriptions, reaching human-level quality. The feature is built into its app for the visually impaired and will be applied to Microsoft Office tools such as Word and PowerPoint.
In January 2021, OpenAI unveiled ‘DALL-E’, an artificial intelligence model that automatically draws an image when given a sentence. Combining a natural language processing model with image recognition technology, it can render scenes it has never seen before from an input sentence alone. It is trained on a vast dataset of text-image pairs to generate images from sentences.
The name ‘DALL-E’ combines the surrealist painter Salvador Dalí with Pixar's animated robot WALL-E. When text is entered in English on the DALL-E website, it is illustrated with a variety of pictures. Entering ‘avocado-shaped chair’ or ‘baby penguin emoji wearing a green shirt and yellow pants, a blue hat and red gloves’ yields multiple image samples. Sentences describing unreal scenes, such as ‘a baby radish in a ballet suit walking a dog’ and ‘a turtle that looks like a giraffe’, are also rendered in various forms.
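The pairing of sentences with images described above can be illustrated with a toy sketch. Real systems such as CLIP, the companion model OpenAI released alongside DALL-E to rank its outputs, learn text and image embeddings from hundreds of millions of captioned images; in the sketch below the embeddings, prompts, and file names are hand-made stand-ins, not learned values.

```python
# Toy illustration of the shared text-image embedding idea behind
# models like DALL-E and CLIP. Real embeddings are learned from vast
# text-image datasets; these 3-d vectors are hypothetical examples.
from math import sqrt

# Hand-made "embeddings" in a shared space (values are illustrative only).
TEXT_EMBEDDINGS = {
    "avocado-shaped chair": [0.1, 0.9, 0.8],
    "baby penguin in a green shirt": [0.9, 0.0, 0.5],
}
IMAGE_EMBEDDINGS = {
    "img_chair.png": [0.0, 1.0, 0.7],
    "img_penguin.png": [1.0, 0.1, 0.4],
}

def cosine(a, b):
    """Cosine similarity: how closely two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def best_image(prompt):
    """Return the image whose embedding best matches the prompt's embedding."""
    t = TEXT_EMBEDDINGS[prompt]
    return max(IMAGE_EMBEDDINGS, key=lambda name: cosine(t, IMAGE_EMBEDDINGS[name]))

print(best_image("avocado-shaped chair"))           # img_chair.png
print(best_image("baby penguin in a green shirt"))  # img_penguin.png
```

A generative model like DALL-E goes further than this matching step, producing new pixels rather than retrieving existing images, but scoring text against images in a shared space is how candidate outputs can be ranked.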
According to OpenAI, DALL-E could have important and broad social impacts. The company plans to study, over the long term, issues such as the tool's economic effect on particular jobs and occupations, the potential for bias in its outputs, and related ethical questions.
The fact that artificial intelligence can recognize images and sentences in an integrated way, as humans do, and freely translate between them shows that it has come one step closer to human cognition. Computers long ago surpassed humans in sophisticated, fast computation, but they could not “understand” as humans do. We must now recognize that artificial intelligence exhibiting a kind of contextual understanding through the integrated processing of text and pictures will bring us a great deal of convenience, but it also foreshadows new social problems.