Learn AI With Kesse | Newest Trends in Artificial Intelligence

We answer questions about artificial intelligence and bring you what's new in the AI World.

Visual AI Models: More Hype Than Vision

The latest AI models are promoted as advanced systems capable of understanding images, audio, and text seamlessly. However, a closer look reveals something startling: they don’t truly ‘see’ in the way humans do. This discrepancy between marketed capabilities and actual performance raises important questions about the effectiveness of these AI systems.

These models, marketed with terms like ‘vision capabilities’ and ‘visual understanding,’ aim to convince users they can handle visual data as effectively as textual data. They claim to solve various tasks, such as analyzing sports or helping with homework, using supposed visual prowess. However, in reality, these models merely match input patterns with their training data, leading to significant limitations in their true visual abilities.

Introduction to Visual AI Models

The latest AI models like GPT-4o and Gemini 1.5 Pro are praised for understanding images, audio, and text. However, a recent study reveals a stark truth: they don’t really ‘see’ like humans do. Although nobody has outright claimed that these models see like people, the marketing is misleading. Terms like ‘vision capabilities’ and ‘visual understanding’ suggest otherwise.

These AI models are presented as able to analyze images and videos just as adeptly as they handle text. Their touted abilities range from solving homework problems to watching sports. While these claims are carefully worded, it is evident that companies want to convey an impression of visual prowess. In reality, these models merely match patterns in the input data with patterns in their training data.

Exposing the Limitations

A group of researchers from Auburn University and the University of Alberta conducted a systematic study of AI models’ visual understanding. They gave the largest multimodal models a series of very simple visual tasks, such as checking whether two shapes overlap, counting pentagons, or identifying a circled letter in a word.

Shockingly, these tasks proved extremely difficult for the AI models. According to co-author Anh Nguyen, they are simple enough for even a first-grader. Yet the AI models struggled immensely, with error rates far too high for such straightforward tasks. Nguyen emphasized that if the best models are failing at these tasks, there’s a fundamental flaw.

Case Study: Overlapping Shapes

The overlapping shapes test showcased one of the simplest visual reasoning tasks possible. When presented with two circles either slightly overlapping, just touching, or with some distance between them, the models struggled. GPT-4o managed to get it right over 95% of the time when the circles were far apart. However, at zero or small distances, its accuracy plummeted to 18%.

Although Gemini 1.5 Pro performed better, it still got only 7 out of 10 correct at close distances. The study thus pointed out the inconsistency in the models’ performance across different conditions. These inconsistencies are troubling, highlighting that what the models are doing doesn’t align with our notion of ‘seeing.’
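To make the task concrete, here is a minimal Python sketch of how the correct answer for a two-circle image could be computed from geometry alone. It is not the researchers' code; the function name `circle_relation` and the tolerance parameter are assumptions made for illustration.

```python
import math


def circle_relation(c1, r1, c2, r2, tol=1e-9):
    """Classify two circles as 'overlapping', 'touching', or 'separate'.

    c1, c2 are (x, y) centres; r1, r2 are radii.
    Hypothetical helper for illustration, not taken from the study.
    """
    # Gap between the two outlines along the line joining the centres.
    gap = math.dist(c1, c2) - (r1 + r2)
    if abs(gap) <= tol:
        return "touching"
    return "overlapping" if gap < 0 else "separate"


# The three conditions described in the test.
print(circle_relation((0, 0), 1, (1.5, 0), 1))  # slightly overlapping
print(circle_relation((0, 0), 1, (2, 0), 1))    # just touching
print(circle_relation((0, 0), 1, (3, 0), 1))    # some distance apart
```

The check is a single comparison between the centre distance and the sum of the radii, which is why the task counts as one of the simplest visual judgments possible.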

Counting Circles and the Olympic Rings

Another test involved counting interlocking circles in an image. The AI models performed well when there were five rings, likely due to the Olympic Rings being prominently featured in their training data. However, they faltered miserably when an additional ring was added.

For instance, Gemini couldn’t get it right even once. Claude 3.5 Sonnet got it right only a third of the time, while GPT-4o did slightly better but still failed over half the time. Adding more rings further confused the models. This demonstrates that these models don’t ‘see’ as we do. Their perception is heavily influenced by the data they have been trained on.

This gap between their understanding and actual visual perception is underscored when the models do well on five-ring images due to their association with the Olympic Rings, yet fail on six- or seven-ring images. They haven’t learned to visually understand images beyond what’s in their training set.
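For readers who want to try a similar probe themselves, the following Python sketch builds a test image with any number of interlocking rings as an SVG string. It is an illustration only: the single-row layout, the function name `rings_svg`, and all default sizes are assumptions, and the study's actual images may have been drawn differently.

```python
def rings_svg(n, radius=40, stroke=6, spacing=1.6):
    """Return an SVG string with n rings in one row whose outlines interlock.

    Hypothetical generator for illustration; layout and sizes are assumed.
    """
    # Centre-to-centre distance is below 2 * radius, so neighbouring rings cross.
    step = radius * spacing
    margin = radius + stroke
    width = 2 * margin + step * (n - 1)
    height = 2 * margin
    circles = [
        f'<circle cx="{margin + i * step}" cy="{margin}" r="{radius}" '
        f'fill="none" stroke="black" stroke-width="{stroke}"/>'
        for i in range(n)
    ]
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(circles)
        + "</svg>"
    )


# Write five-, six-, and seven-ring images to compare a model's answers.
for count in (5, 6, 7):
    with open(f"rings_{count}.svg", "w") as handle:
        handle.write(rings_svg(count))
```

Because the ring count is a parameter, the correct answer is known in advance for every image, which makes it easy to see whether a model's accuracy drops once the picture no longer resembles the Olympic Rings.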

The Concept of Blindness in AI Models

The term ‘blindness’ is apt when describing AI models’ inability to ‘see’ in a human sense. The AI models might extract approximate, abstract information from an image, like identifying a circle on the left side. However, they lack the ability to make nuanced visual judgments.

Co-author Anh Nguyen notes, “There is no existing technology that can visualize exactly what a model is seeing.” The models’ behavior results from the input text prompt, the image data, and the model weights combining in ways that are hard to predict.

In one example, the model was asked about two overlapping circles and the resulting cyan-shaded area. A sighted person would easily identify this, but the AI model’s response was akin to an informed guess with eyes closed. This underscores the AI’s reliance on trained data over actual visual insight.

Misleading Marketing and Real Capabilities

Despite these limitations, these ‘visual’ AI models aren’t entirely useless. They excel in specific contexts, such as identifying human actions, expressions, and common objects in photos. Their intended purpose is to interpret such data accurately.

The marketing for these AI models, however, paints a misleading picture. It suggests they possess human-like visual abilities, which they clearly do not. Research is crucial to demystify these claims and showcase the models’ true capabilities.

This research sheds light on how these models operate. They are adept at recognizing familiar patterns but falter when asked to analyze unfamiliar visuals. Therefore, while they can tell if someone is sitting or walking, it’s not through ‘seeing’ as humans understand it.

Future of Visual AI Research

Looking ahead, continued research is essential to improving the visual understanding of AI models. This will involve not only refining their training data but also developing new methods to gauge their visual reasoning abilities.

As our reliance on AI grows, it’s important to have a clear understanding of these models’ limitations and strengths. While they are powerful tools in many respects, expecting them to ‘see’ like humans is currently unrealistic.

Final Thoughts

This examination of ‘visual’ AI models reveals significant shortcomings in their ability to process visual information. They serve specific functions well but cannot replicate human vision. Future research will hopefully bridge this gap, improving their overall utility.


This deep dive into ‘visual’ AI models has revealed substantial limitations in their ability to handle visual data akin to human perception. While these models excel in specific applications, such as recognizing common objects or actions, they fall short in more nuanced visual tasks. Therefore, expecting these AI systems to ‘see’ like humans is currently unrealistic.

The findings underscore the need for continued research to improve these models’ visual understanding. The hope is to bridge the existing gap between their marketed capabilities and their actual performance. This will help in developing AI tools that are more reliable and effective in various contexts.
