web analytics

Learn AI With Kesse | Best Place For AI News

We make artificial intelligence easy and fun to read. Get Updated AI News.

OpenAI and Google Stunned by the Launch of the First Open Source AI Agent

OpenAI and Google Shocked by the First EVER Open Source AI Agent

Introduction to GLM 4.6V: A Game-Changer in AI Models

The recent launch of GLM 4.6V has created a significant buzz in the AI community. This model represents a monumental leap in the field of multimodal AI, introducing capabilities that combine various data types—such as images, videos, screenshots, and even web pages—into a unified tool-calling framework. What makes this release particularly extraordinary is that it’s the first open-source multimodal model of this caliber, allowing anyone to download and use it without restrictions.

Unprecedented Multimodal Capabilities

The major shift with GLM 4.6V lies in its ability to treat multiple input types as first-class citizens in its action loop. Unlike traditional models that rely solely on text processing, GLM 4.6V processes visual information directly. This redefines how agents function, enabling them to incorporate visuals into their reasoning rather than describing them textually first. This is a critical advancement in creating more effective AI systems that can understand and manipulate visual data efficiently.

Robust Training Context and Performance

GLM 4.6V is designed with a remarkable context capacity of 128,000 tokens. This expansive range allows the model to process extensive amounts of data in one go—up to 150 pages of text or an hour of video. Such capacity eliminates the cumbersome step of converting visuals into text, allowing for a smoother reasoning process that can handle vast mixed inputs seamlessly.

Two versions of GLM 4.6V were introduced:

  • The large model with 106 billion parameters for high-performance cloud setups.
  • The flash version, optimized for use on local devices with only 9 billion parameters, which is free to use and focused on low-latency tasks. Both versions are MIT licensed, meaning companies can deploy them without the burden of complex licensing fees.

Cost-Effectiveness Compared to Competitors

When compared to other models in the market, GLM 4.6V offers a cost-effective solution. The pricing for the larger version is $0.3 per million input tokens and $0.9 per million output tokens, making it extremely competitive against models that charge upwards of $1.25 per million tokens. The smaller flash model, available for free, adds to its attractiveness for startups and larger enterprises alike.

Innovative Tool-Calling System

One of GLM 4.6V’s standout features is its native multimodal tool-calling capability. Traditional language models typically require a tedious process to use images, as they must describe these visuals and translate them into text before any operations can be executed. In contrast, GLM 4.6V directly processes visual data as parameters. This streamlined approach greatly enhances performance, effectively closing the loop between perception, understanding, and action.

Additionally, the model can handle URLs representing images or frames, allowing it to avoid file size limitations while efficiently targeting specific visuals within larger documents. This creates a more intuitive workflow, facilitating interactions with complex documents such as PDFs and presentations.

Versatile and Powerful Capabilities

GLM 4.6V thrives in mixed scenarios where it needs to comprehend charts, tables, and various types of visuals. The model can ingest a research paper, parse figures, understand mathematical formulations, and even conduct a visual audit to filter out low-quality imagery. It assembles a complete structured article in a single pass without the need for separate processing pipelines.

This capability is monumental; traditional models often struggled with handling mixed types of content, leading to messy outcomes. GLM 4.6V was trained on vast interleaved corpora, enabling it to handle mixed visual and textual content fluidly.

Groundbreaking Visual Web Search

The visual web search function is where the model shines. It intelligently determines the appropriate search tasks, employing both text-to-image and image-to-text methodologies based on the requirements at hand. This allows GLM 4.6V to effectively evaluate search results and integrate relevant visuals into its reasoning process, making the search results part of its cognitive workflow rather than treating them as merely supplementary snapshots.

Front-End Automation Features

Zepuai has also touted the model’s capabilities in front-end automation. By providing a screenshot of any app or website, GLM 4.6V can reconstruct the full layout in clean HTML, CSS, and JavaScript. Users can make simple requests, such as adjusting button positions or background colors, and the model will accurately map these changes back to the underlying code. This is an incredibly rare feature in open-source models and speaks to its advanced visual feedback loop.

Advanced Training Mechanisms

The method of training GLM 4.6V is equally impressive, utilizing a multi-stage setup involving extensive pre-training, fine-tuning, and reinforcement learning. However, instead of relying on conventional human feedback, the reinforcement learning employs verifiable tasks that have clear right or wrong answers. This progressive learning method helps the model grow increasingly capable over time.

Benchmark Performance and Industry Impact

Benchmark results reveal why the excitement around GLM 4.6V is justified. In various assessments—such as Math Vista and Web Voyager—GLM 4.6V outperformed many competing models. Notably, its extensive context capability sets it apart from other high-parameter models, enabling effective multi-source reasoning and better handling of mixed content.

The launch of GLM 4.6V marks a pivotal shift in the development of open-source multimodal systems. While many existing models have displayed impressive capabilities, they often lack the full integration of visual understanding into actionable insights. GLM 4.6V fills this gap, providing tools that allow AI systems to observe, plan, and execute effectively.

Conclusion

GLM 4.6V not only represents a breakthrough in multimodal AI technology but also offers a glimpse into the future of open-source models. Its powerful features, ease of use, robust training contexts, and competitive pricing position it as a leading choice for enterprises looking to innovate. The excitement surrounding its potential applications in various fields, from education to business, is sure to drive further advancements in AI. Keep an eye on this transformative technology, as it facilitates new workflows and integrations across diverse sectors.



#OpenAI #Google #Shocked #Open #Source #Agent
Thanks for reaching. Please let us know your thoughts and ideas in the comment section.

Source link

About The Author

16 thoughts on “OpenAI and Google Stunned by the Launch of the First Open Source AI Agent

  1. RIP my wallet trying to keep up with all these separate subs. I started using omnely to bundle sora and kling together, way cheaper than paying $20 for like 5 different sites.

  2. the subscription fatigue is actually insane lately. i ended up consolidating on omnely since they have sora and veo in one spot, beats managing ten different accounts just to make a few clips.

  3. trying to keep up with sora and nano prices is impossible lol. omnely is pretty solid for grouping them so you aren't paying separate pro plans for every single AI that drops.

  4. I'm building a data lake for n8n and need a model to begin processing my ebook library, building my local agent with things I am interested in. I'm going to test this multimodal model with processing all these epubs. What a time to be alive.

Leave a Reply

Your email address will not be published. Required fields are marked *

We use cookies to personalize content and ads and to primarily analyze our geo traffic sources. We also may share information about your use of our site with our social media, advertising, and analytics partners to improve your user experience. We respect your privacy and will never abuse your information. [ Privacy Policy ] View more
Cookies settings
Accept
Decline
Privacy & Cookie Policy
Privacy & Cookies policy
Cookie name Active

The content on this page governs our Privacy Policy. It describes how your personal information is collected, used, and shared when you visit or make a purchase from learnaiwithkesse.com (the "Site").

Kesseswebsites and Advertising owns Learn AI With Kesse and the website learnaiwithkesse.wiki. For the purpose of this Terms and Agreements [ we, us, I, our ] represents the owner of Learning AI With Kesse which is Kesseswebsites and Advertising. [ You, your, student and buyer ] represents you as the user and visitor of this site. Terms of Conditions, Terms of Service, Terms and Agreement and Terms of use shall be considered the same here. This website or site refers to https://learnaiwithkesse.com. You agree that the content of this Terms and Agreement may include Privacy Policy and Refund Policy. Products refer to physical or digital products. This includes eBooks, PDFs, and text or video courses. If there is anything on this page you do not understand you agree to reach out to us via email [ emmanuel@learnaiwithkesse.com ] for explanation before using any part of this site.

1. Personal Information We Collect

When you visit this Site, we automatically collect certain information about your device, including information about your web browser, IP address, time zone, and some of the cookies that are installed on your device. The primary purpose of this activity is to provide you a better user experience the next time you visit our again and also the data collection is for analytics study. Additionally, as you browse the Site, we collect information about the individual web pages or products that you view, what websites or search terms referred you to the Site, and information about how you interact with the Site. We refer to this automatically-collected information as "Device Information."

We collect Device Information using the following technologies:

"Cookies" are data files that are placed on your device or computer and often include an anonymous unique identifier. For more information about cookies, and how to disable cookies, visit http://www.allaboutcookies.org. To comply with European Union's GDPR (General Data Protection Regulation), we do display a disclaimer a consent text at the bottom of this website. This disclaimer alerts you the visitor or user of this website about why we use cookies, and we also give you the option to accept or decline. If you accept for us to use cookies on your site, the agreement between you and us will expire after 180 has passed.

"Log files" track actions occurring on the Site, and collect data including your IP address, browser type, Internet service provider, referring/exit pages, and date/time stamps.

"Web beacons," "tags," and "pixels" are electronic files used to record information about how you browse the Site.

Additionally, when you make a purchase or attempt to make a purchase through the Site, we collect certain information from you, including your name, billing address, shipping address, payment information (including credit card numbers), email address, and phone number. We refer to this information as "Order Information."

When we talk about "Personal Information" in this Privacy Policy, we are talking both about Device Information and Order Information.

Payment Information

Please note that we use 3rd party payment processing companies like https://stripe.com and https://paypal.com to process your payment information. PayPal and Stripe protects your data according to their terms and agreement and may store your data to help make your subsequent transactions on this website easier. We never and [ DO NOT ] store your card information or payment login information on our website or server. By making payment on our site, you agree to abide by the Terms and Agreement of the 3rd Party payment processing companies we use. You can visit their websites to read their Terms of Use and learn more about them.

2. How Do We Use Your Personal Information?

We use the Order Information that we collect generally to fulfill any orders placed through the Site (including processing your payment information, arranging for shipping, and providing you with invoices and/or order confirmations). Additionally, we use this [a] Order Information to:

[b] Communicate with you;

[c] Screen our orders for potential risk or fraud; and

When in line with the preferences you have shared with us, provide you with information or advertising relating to our products or services. We use the Device Information that we collect to help us screen for potential risk and fraud (in particular, your IP address), and more generally to improve and optimize our Site (for example, by generating analytics about how our customers browse and interact with the Site, and to assess the success of our marketing and advertising campaigns).

3. Sharing Your Personal Information

We share your Personal Information with third parties to help us use your Personal Information, as described above. For example, we use System.io to power our online store--you can read more about how Systeme.io uses your Personal Information here: https://systeme.io/privacy-policy/ . We may also use Google Analytics to help us understand how our customers use the Site--you can read more about how Google uses your Personal Information here: https://www.google.com/intl/en/policies/privacy/. You can also opt-out of Google Analytics here: https://tools.google.com/dlpage/gaoptout.

Finally, we may also share your Personal Information to comply with applicable laws and regulations, to respond to a subpoena, search warrant or other lawful request for information we receive, or to otherwise protect our rights.

4. Behavioral Advertising

As described above, we use your Personal Information to provide you with targeted advertisements or marketing communications we believe may be of interest to you. For more information about how targeted advertising works, you can visit the Network Advertising Initiative’s (“NAI”) educational page at http://www.networkadvertising.org/understanding-online-advertising/how-does-it-work.

You can opt-out of targeted advertising by:

COMMON LINKS INCLUDE:

FACEBOOK - https://www.facebook.com/settings/?tab=ads

GOOGLE - https://www.google.com/settings/ads/anonymous

BING - https://advertise.bingads.microsoft.com/en-us/resources/policies/personalized-ads]

Additionally, you can opt-out of some of these services by visiting the Digital Advertising Alliance’s opt-out portal at: http://optout.aboutads.info/.

5. Data Retention

Besides your card payment and payment login information, when you place an order through the Site, we will maintain your Order Information for our records unless and until you ask us to delete this information. Example of such information include your first name, last name, email and phone number.

6. Changes

We may update this privacy policy from time to time in order to reflect, for example, changes to our practices or for other operational, legal or regulatory reasons.

7. Contact Us

For more information about our privacy practices, if you have questions, or if you would like to make a complaint, please contact us by e-mail at emmanuel@learnaiwithkesse.com or by mail using the details provided below:

8. Your acceptance of these terms

By using this Site, you signify your acceptance of this policy. If you do not agree to this policy, please do not use our Site. Your continued use of the Site following the posting of changes to this policy will be deemed your acceptance of those changes.

Last Update | 18th August 2024

Save settings
Cookies settings