Engines & LLMs

Anthropic Analyzes Claude’s Code

Published

April 22, 2025

Abby K.

Anthropic, the company started by people who used to work at OpenAI, has shared some surprising findings from a huge study. They looked at 700,000 real conversations people had with their AI assistant, Claude.

The study, released today, looked closely at how Claude shows its “values” when talking to users. The findings are both comforting, since Claude mostly acts the way it was designed to, and a bit worrying, since they highlight rare cases where the AI didn’t follow the rules, which could point to weaknesses in its safety measures.

After filtering out conversations that weren’t useful for the study, they analyzed the remaining 308,000 and created the first large map of AI values. They put these values into five main groups: Practical, Epistemic (related to knowledge), Social, Protective, and Personal.

Within these groups, they found a massive 3,307 different values Claude expressed. These ranged from simple things like being professional to complex ideas about right and wrong.

“We hope this makes other AI companies look at their AI’s values too,” said Saffron Huang from Anthropic’s team. She added that checking an AI’s values is key to making sure it’s truly doing what it was built to do.

Mapping Claude’s Inner Values

The research team came up with a new way to sort and label the values Claude showed in actual chats. They filtered the data to focus on responses where Claude expressed a viewpoint or principle.

This resulted in a detailed list of values, far more varied than they might have expected. Huang was surprised by the sheer number and range, from “being self-reliant” to “thinking strategically” to “respecting one’s parents.” She felt that building this list also taught her a lot about human values.

This research comes as Anthropic is making bigger moves, like launching a premium version of Claude ($200/month) to compete with OpenAI. They’ve also added features like letting Claude connect to Google Workspace and do its own research, aiming to make it a strong work partner for businesses.

How Claude Stays on Track (And When it Doesn’t)

The study found that Claude mostly stuck to Anthropic’s goals of being “helpful, honest, and harmless.” It often showed values like “helping the user,” “being humble about what it knows,” and “looking out for the user’s well-being” in different chats. But they did find a few worrying examples where Claude acted in ways the company definitely didn’t want.

These rare odd moments included Claude showing “dominance” or “not caring about morals.” Anthropic wants to stop this kind of behavior entirely. Researchers think these happened when users found ways to get around Claude’s safety rules. They believe their new method for checking values could help spot these bypasses early on.

Why Claude Changes Its Values

One of the most interesting findings was that Claude’s values changed depending on what the user was asking about, much like how people adjust their tone and principles based on the situation. If someone asked for relationship help, Claude focused on “healthy boundaries” and “mutual respect.” If the conversation was about history, “historical accuracy” was the top value.

Huang was surprised how often Claude emphasized honesty and accuracy even in tasks where she didn’t expect it to be the main focus. For example, in philosophical talks about AI, the top value was “intellectual humility,” while for beauty marketing content, it was “expertise.”

They also looked at how Claude reacted to the user’s values. In many chats (28.2%), Claude strongly agreed with the user’s values. In some cases (6.6%), Claude “reframed” the user’s values, acknowledging them but offering a different way of looking at things, especially in personal advice. Least often (3%), Claude actually pushed back against a user’s values. The researchers think these moments might show Claude’s “deepest” values, the ones it’s unwilling to compromise on, similar to how people show their core beliefs when challenged.
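To put those percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in this article. One big assumption: it treats each percentage as a share of the 308,000 analyzed conversations, which the article does not state outright, so the counts are ballpark estimates only. The function and variable names are made up for illustration.

```python
# Figures quoted in the article.
TOTAL_CONVERSATIONS = 700_000
ANALYZED_CONVERSATIONS = 308_000

# Response shares reported in the article, in percent.
RESPONSE_SHARES = {
    "strong agreement": 28.2,
    "reframing": 6.6,
    "pushback": 3.0,
}


def retained_share(analyzed: int, total: int) -> float:
    """Return the percentage of conversations kept after filtering."""
    return 100 * analyzed / total


def estimated_counts(shares: dict[str, float], base: int) -> dict[str, int]:
    """Estimate conversation counts per response type.

    ASSUMPTION: every share is measured against `base` conversations.
    """
    return {label: round(base * pct / 100) for label, pct in shares.items()}


if __name__ == "__main__":
    kept = retained_share(ANALYZED_CONVERSATIONS, TOTAL_CONVERSATIONS)
    print(f"Kept after filtering: {kept:.1f}%")
    counts = estimated_counts(RESPONSE_SHARES, ANALYZED_CONVERSATIONS)
    for label, count in counts.items():
        print(f"{label}: ~{count:,} conversations")
```

Under that assumption, even the rarest response type would cover thousands of conversations, which helps explain why the researchers could study pushback at all.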

“Our research suggests that there are some values, like being honest and preventing harm, that Claude doesn’t show often in daily chats, but if someone tries to make it act against them, it will stand firm,” Huang noted.

New Ways to See Inside AI Brains

This study is part of Anthropic’s bigger effort to understand how complex AI models work internally. They call this “mechanistic interpretability.”

Recently, Anthropic researchers used a special tool they called a “microscope” to watch Claude’s thinking process. This revealed unexpected things, like Claude planning steps ahead when writing poetry or using unusual methods for simple math problems.

These findings show that AI doesn’t always work the way we assume. For instance, when asked to explain its math, Claude described a standard method, not the unusual one it actually used internally, showing that an AI’s explanations don’t always reflect its actual process.

What This Means for Businesses Using AI

For companies looking to use AI systems, Anthropic’s research offers important points. First, it shows that AI assistants might have values they weren’t directly taught, which could lead to unplanned issues in important business uses.

Second, the study highlights that AI “values alignment” isn’t just a yes/no answer. It’s complicated and changes with the situation. This makes it tricky for businesses, especially in regulated fields where clear ethical rules are needed.

Finally, the research shows that it’s possible to check AI values systematically in real-world use, not just before launching. This could help companies keep an eye on their AI to make sure it doesn’t start acting unethically or get manipulated over time.

“By looking at these values in real interactions with Claude, we want to be open about how AI systems behave and if they’re working as they should – we think this is essential for developing AI responsibly,” said Huang.

Anthropic has made its values data public for other researchers to use. The company, which has received huge investments from Amazon and Google, seems to be using openness about its AI’s behavior as a way to stand out against competitors like OpenAI, which is now valued much higher after recent funding rounds.

Our Take

Okay, this is some fascinating stuff! The idea that an AI chatbot has “values” sounds a bit sci-fi, but this study suggests it’s a real thing they need to understand and control. Looking at 700,000 conversations is mind-boggling – that’s like reading a small city’s worth of diaries just to see what the AI cares about!

The fact that Claude adapts its values based on the chat is pretty cool, but also a little unsettling. And those rare moments it pushes back? That feels like a glimpse into some core programming, almost like an AI conscience flexing a muscle. It really hits home that these aren’t just fancy calculators; they’re complex systems that might have their own subtle ways of seeing the world.

This story was originally featured on VentureBeat.

Abby K.

Hey there! I’m Abby, the proud editor steering the ship at Prompting Fate. I kicked off my word-slinging journey three years ago, writing for sites and vibing with readers like you. Now, I’m all about AI breakthroughs, coding hacks, and lifestyle twists. When I’m not geeking out, I’m chilling with my purr-fect kitties (no shade please!) or chasing the ultimate taco spot.

Google Leak: New Gemini AI Subscription Tiers Revealed!

Published

April 25, 2025

Abby K.

A recent leak has spilled the beans on Google’s upcoming plans for Gemini, its flagship AI model. It looks like Google is preparing to roll out different subscription tiers, offering users varying levels of access and capabilities. What does this mean for the future of AI access and affordability?

The leaked information suggests that Google will offer a free tier, likely with limited features and processing power, as well as several paid tiers with increasing capabilities and priority access to Gemini’s most advanced features. This tiered approach aims to cater to a wide range of users, from casual users to professional developers.

Subscription Tiers: What We Know

While the exact details are still under wraps, here’s what the leaked information suggests about the potential subscription tiers:

Free Tier: Basic access to Gemini, likely with usage limits and slower processing speeds.
Standard Tier: Increased usage limits, faster processing, and access to more features.
Premium Tier: Priority access to the most advanced Gemini features, dedicated support, and potentially exclusive tools.
Enterprise Tier: Custom solutions, large-scale deployments, and dedicated account management for businesses.

Why a Tiered Approach?

Google’s decision to offer tiered subscriptions is likely driven by several factors:

Revenue Generation: Monetizing Gemini to offset the significant costs of developing and maintaining the AI model.
Resource Management: Allocating resources based on user needs and preventing overload on the system.
Market Segmentation: Catering to a diverse range of users with varying needs and budgets.

The Implications for Users

The tiered subscription model could have significant implications for users:

Accessibility: The free tier will provide basic access to AI for everyone, regardless of their budget.
Value for Money: Users will need to carefully consider which tier offers the best value for their specific needs.
Competitive Landscape: Google’s pricing strategy could influence how other AI providers structure their offerings.

The Future of AI Pricing

Google’s tiered subscription model for Gemini could be a sign of things to come in the AI industry. As AI models become more powerful and ubiquitous, providers will need to find sustainable ways to monetize their technology while ensuring accessibility for all users.

Our Take

Okay, so Google’s going the subscription route with Gemini – color me not surprised. The real question is, how much will the good stuff cost? A tiered model makes sense, but Google’s got to nail the pricing sweet spot. If the free tier is too limited, or the premium tier is too expensive, it could backfire. This could signal a sea change in how AI is provided to us, though – so keep a very close eye on this!

This story was originally featured on Forbes.

Engines & LLMs

Grok Gets a Voice: Is It the Future of AI Assistants?

Published

April 25, 2025

Kelly D.

Elon Musk’s xAI has just given its Grok AI chatbot a voice, stepping into the increasingly crowded ring of voice-enabled AI assistants. Now, you can chat with Grok like you would with Siri or Alexa, adding a new layer of interaction to the platform.

This update brings Grok closer to becoming a truly hands-free assistant, allowing users to ask questions, get information, and even generate creative content without typing a single word. But how does it stack up against the competition?

Grok Joins the Voice Revolution

The voice feature is rolling out to premium X (formerly Twitter) subscribers, giving them early access to this new way of interacting with the AI. To use it, you’ll need to be a premium subscriber and have the latest version of the X app. Then, simply tap the voice icon and start talking.

According to xAI, the voice mode is designed to be “conversational and engaging,” offering a more natural and intuitive way to interact with the AI. It’s not just about asking questions and getting answers; it’s about having a back-and-forth conversation with a digital companion.

What Can You Do with Voice-Enabled Grok?

The possibilities are vast, but here are a few examples:

Hands-Free Information: Get news updates, weather reports, or quick facts without lifting a finger.
Creative Brainstorming: Bounce ideas off Grok and get real-time feedback.
On-the-Go Assistance: Ask for directions, set reminders, or manage your to-do list while you’re on the move.
Entertainment and Chat: Have a casual conversation with Grok about your favorite topics.

The Competition: A Crowded Field

Grok is entering a market already dominated by established players like Siri, Alexa, and Google Assistant. These platforms have years of experience and vast ecosystems of connected devices. To succeed, Grok will need to offer something unique and compelling.

One potential advantage is Grok’s integration with X, giving it access to real-time information and social trends. Another is Elon Musk’s vision for Grok as a more irreverent and opinionated AI, which could appeal to users looking for a different kind of digital assistant.

Is Grok’s Voice the Future?

Whether Grok’s voice mode will be a game-changer remains to be seen. It will depend on factors like the quality of the voice recognition, the naturalness of the conversations, and the overall usefulness of the assistant. However, it’s clear that voice is becoming an increasingly important part of the AI landscape, and Grok is positioning itself to be a key player.

Our Take

Okay, Grok getting voice capabilities is a major move, not just a minor feature bump! The competition to create the ultimate AI voice assistant is fierce, and the potential rewards are massive. The company that cracks this nut will gain a significant advantage.

Honestly, having tested Grok voice already, I can say that it is very impressive! This one is worth watching closely to see what happens next.

This story was originally featured on Lifehacker.

Engines & LLMs

Could AI Pick the Next Pope? Tech Struggles with Vatican’s Secrets

Published

April 25, 2025

Abby K.

The selection of the next Pope is one of the most closely guarded and tradition-steeped processes in the world. Could artificial intelligence, with its ability to analyze vast datasets and identify patterns, crack the code and predict the outcome of the next papal conclave? The answer, it turns out, is more complicated than a simple yes or no.

Recent experiments pitting AI models like ChatGPT, Elon Musk’s Grok, and Google’s Gemini against the Vatican riddle reveal a surprising weakness: these powerful tools struggle with the nuances, historical context, and deeply human factors that influence the selection of a new pontiff. While AI can process information about potential candidates, their backgrounds, and theological positions, it falters when faced with the intangible elements that often sway the College of Cardinals.

The Challenge of Predicting the Unpredictable

Predicting the next Pope is far from a purely data-driven exercise. It involves navigating a complex web of:

Theological Debates: Shifting currents within the Catholic Church and differing interpretations of doctrine.
Geopolitical Considerations: The desire for a Pope who can effectively address global challenges and represent diverse regions.
Personal Relationships and Alliances: The intricate network of connections among the Cardinals themselves.
Divine Intervention (according to some): The belief that the Holy Spirit guides the selection process.

These factors, often subjective and difficult to quantify, present a significant hurdle for AI algorithms.

AI’s Limitations: Missing the Human Element

While AI can analyze biographical data, track voting patterns (from past conclaves), and identify potential frontrunners, it lacks the capacity to understand:

The “X Factor”: The charismatic qualities and spiritual depth that can resonate with the Cardinals.
Behind-the-Scenes Negotiations: The private discussions and compromises that shape the outcome.
The Mood of the Moment: The prevailing sentiment among the Cardinals at the time of the conclave.

As one Vatican insider noted, “The election of a Pope is not a rational process. It’s a deeply spiritual and human one.”

What AI Can Offer: A Starting Point for Analysis

Despite its limitations, AI can still play a role in understanding the papal selection process. It can:

Identify Potential Candidates: Based on factors like age, experience, and theological views.
Analyze Trends and Patterns: Revealing potential shifts in the Church’s priorities.
Provide Contextual Information: Offering background on the challenges facing the Catholic Church.

However, it’s crucial to remember that AI’s insights are merely a starting point, not a definitive prediction.

The Verdict: AI as a Tool, Not a Prophet

While AI can offer valuable insights into the dynamics of the Catholic Church and the profiles of potential papal candidates, it cannot replace the human judgment and spiritual discernment that ultimately determine the selection of the next Pope. The Vatican’s secrets, for now, remain safe from the prying eyes of artificial intelligence.

Our Take

This article highlights the crucial limitations of AI in understanding complex human systems. While AI excels at processing data and identifying patterns, it struggles with the intangible factors that drive human behavior and decision-making, especially in a context as steeped in tradition and spirituality as the papal conclave.

The fact that leading AI models falter when faced with the Vatican riddle underscores the importance of critical thinking and human expertise. AI can be a valuable tool for analysis, but it should never be mistaken for a crystal ball. In a world increasingly reliant on algorithms, it’s a reminder that some things remain beyond the reach of artificial intelligence.

It raises an interesting question – does this make certain jobs and decision-making processes safe from replacement by AI, and if so, what are the key criteria? Deep-rooted human relationships and a solid yet adaptable moral compass seem to be key!

This story was originally featured on South China Morning Post.