After two incidents in one week, looking back at how Anthropic's seven co-founders discussed "safety" a year ago

Original video title: Building Anthropic | A conversation with our co-founders
Original video source: Anthropic
Original text compiled by: Shencha TechFlow

Key takeaways

Over the past week, Anthropic had two consecutive incidents:

First, nearly 3,000 internal documents were accidentally made publicly accessible because of a CMS configuration error. Then Claude Code v2.1.88 was published to npm with a 59.8MB source map included, exposing 510,000 lines of source code.
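
For context on the second slip: a source map only ends up on npm if it is part of the published file set, so a pre-publish guard can catch it before release. The sketch below is purely illustrative—the dist/ directory and the idea of wiring the script into a prepublish hook are assumptions, not a description of Anthropic's actual tooling:

```python
#!/usr/bin/env python3
"""Hypothetical pre-publish guard: abort the release if source maps would ship.

The dist/ build directory and the .map suffix check are illustrative
assumptions, not a description of any real project's packaging setup."""
import pathlib
import sys

PACKAGE_DIR = pathlib.Path("dist")   # assumed build output that gets published
FORBIDDEN_SUFFIXES = {".map"}        # source maps reveal the original sources


def find_leaks(root: pathlib.Path) -> list[pathlib.Path]:
    """Return every file under root whose suffix marks it as source material."""
    if not root.exists():
        return []
    return [p for p in root.rglob("*") if p.is_file() and p.suffix in FORBIDDEN_SUFFIXES]


if __name__ == "__main__":
    leaks = find_leaks(PACKAGE_DIR)
    if leaks:
        print(f"Refusing to publish: {len(leaks)} source map(s) found, e.g. {leaks[0]}")
        sys.exit(1)  # e.g. run this from a prepublish hook so the release aborts
    print("No source maps detected in the package contents.")
```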

A company that wrote “safety” into its DNA kept stumbling over its own operational security—an irony that’s hard to top.

But before rushing to mock them, it’s worth going back and listening to an internal conversation among Anthropic’s seven co-founders from a year and a half ago. The podcast was recorded in December 2024. The seven discussed how the company was formed, how the RSP (Responsible Scaling Policy) was refined, why the word “safety” can’t be used casually, and the line from CEO Dario that has been quoted again and again:

“If a building sets off a fire alarm every week, it’s actually a very unsafe building.”

Hearing this line again now feels different.

The seven co-founders, at a glance

Dario Amodei|CEO, former OpenAI research VP, trained in neuroscience. The final decision-maker behind Anthropic’s strategy and safety roadmap. He talked the most in this conversation.

Daniela Amodei|President, Dario’s sister. She worked at Stripe for five and a half years, leading trust and safety, and earlier worked in the nonprofit and international development space. Anthropic’s organizational build-out and external communications are largely led by her.

Jared Kaplan|A physics professor turned AI researcher, one of the core authors of scaling laws. He often offers judgment from the outsider’s perspective and says he started doing AI because he “got bored with physics.”

Chris Olah|A leading figure in interpretability research. He entered the Bay Area AI scene at 19 and later worked at Google Brain and OpenAI. The most pronounced technical idealist at Anthropic.

Tom Brown|First author of the GPT-3 paper; now manages Anthropic’s compute resources. His perspective leans toward engineering and infrastructure. In the podcast, he talked a lot about how he went from doubting AI would progress this fast to changing his mind.

Jack Clark|Former Bloomberg tech reporter, head of policy and public affairs at Anthropic. He served as host of this conversation, handling transitions and follow-up questions.

Sam McCandlish|Research co-founder. He spoke the least in the whole conversation, but often pinpoints the crux in a single line—the group’s closer.

Highlights and key viewpoints

Why work on AI: from boredom with physics to “I’ll believe it once I’ve seen enough”

Jared Kaplan: “I spent a long time doing physics. It got a bit boring, and I wanted to work with more friends, so I did AI.”

Dario Amodei: “I don’t think I ever gave you a clear ‘persuasion.’ I just kept showing you AI model outputs. At some point I showed you enough, and then you said, ‘Yeah, that looks right.’”

Going against the consensus: most consensus is herd behavior disguised as maturity

Jared Kaplan: “A lot of AI researchers were badly scarred by the AI winter—it felt like ambition wasn’t allowed.”

Dario Amodei: “The deepest lesson I learned over the past decade is this: many ‘everyone knows’ consensuses are actually herd behavior disguised as maturity. You’ve seen consensus get flipped overnight a few times, and then people say, ‘No, we’re betting on this.’ Even if you’re only 50% correct, you’ll still contribute a lot of things that others didn’t contribute.”

Safety and scaling are intertwined

Dario Amodei: “One of the motivations for scaling up the model at the time was that the model had to be smart enough first for RLHF to work. That’s still what we believe: safety and scaling are intertwined.”

RSP—the Responsible Scaling Policy—is Anthropic’s ‘constitution’

Tom Brown: “For Anthropic, the RSP is like our constitution. It’s the core guiding document, so we’re willing to invest a huge amount of time and effort in refining it.”

Dario Amodei: “RSP prevents plans that don’t meet safety standards from moving forward. We’re not just talking slogans—we actually embed safety into every single step.”

If the fire alarm rings too often, nobody runs when there’s a real fire

Daniela Amodei: “We can’t just use the word ‘safety’ to steer progress. Our real goal is to make sure everyone understands what we mean by safety.”

Dario Amodei: “What often damages safety is those frequent ‘safety drills.’ If there’s a building where the fire alarm goes off every week, that’s actually a very unsafe building.”

A “noble failure” is a trap

Chris Olah: “There’s a saying that the most moral action is to sacrifice other goals for safety, to demonstrate how pure your commitment to the cause is. But in practice, that approach is self-defeating. Because it puts decision power in the hands of people who don’t care about safety.”

The co-founders pledge to donate 80% of their equity

Tom Brown: “We jointly committed to donating 80% of our equity to causes that drive social progress. It’s something everyone supported without hesitation.”

Nobody wants to start a company, but it feels like we have to

Sam McCandlish: “Actually, none of us had the initial desire to start a company. We just felt it was our responsibility—because it’s the only way to ensure AI development moves in the right direction. That’s why we made that commitment.”

Daniela Amodei: “Our mission is both clear and pure. That kind of thing is not common in the tech industry.”

Interpretability: “artificial biology” hidden inside neural networks

Chris Olah: “Neural networks are wonderful. There’s a lot of beauty we still haven’t seen. Sometimes I imagine that ten years from now I walk into a bookstore and buy a textbook about neural network biology. And the book is full of all sorts of amazing things.”

AI to strengthen democracy, not become a tool for dictatorship

Dario Amodei: “We worry that if AI is developed the wrong way, it could become a tool for authoritarianism. How do we make AI a tool that promotes freedom and self-determination? The importance of this area isn’t any less than biology and interpretability.”

From White House meetings to Nobel Prizes: AI’s impact has long gone beyond the tech circle

Jared Kaplan: “Back in 2018, you wouldn’t think a president would call you to the White House to say they’re paying attention to language models.”

Dario Amodei: “We’ve seen a Nobel Prize in Chemistry awarded for AlphaFold. We should work to develop tools that help us create hundreds of AlphaFolds.”

Why research AI?

Jack Clark: Why did we start doing AI in the first place? Jared, why did you do AI?

Jared Kaplan: “I spent a long time doing physics. It got a bit boring, and I wanted to work with more friends, so I did AI.”

Tom Brown: “I thought Dario persuaded you.”

Dario Amodei: “I don’t think I ever gave you a clear ‘persuasion.’ I just kept showing you AI model results—trying to show that they’re general, not only applicable to one specific problem. At some point I’d shown you enough, and you said, ‘Yeah, that looks right.’”

Jack Clark: Chris, you were doing interpretability research back then—did you already know everyone at Google?

Chris Olah: “No. Actually, when I was 19, I first came to the Bay Area and I got to know a lot of the people in that group. Back then, I met Dario and Jared; they were postdocs, and I thought that was incredibly cool. Later, at Google Brain, after Dario joined, we sat side by side for a while. I also worked with Tom, and then later at OpenAI I ended up working with all of you.”

Jack Clark: “I remember that in 2015 I saw you, Dario, at a conference and wanted to interview you. Google PR even said I should read all your papers first.”

Dario Amodei: “At the time I was writing ‘Concrete Problems in AI Safety’ at Google.”

Sam McCandlish: “When I started working with you, you invited me to your office to chat and basically gave me an overview of AI as a whole. I remember thinking after that conversation, ‘Oh, this is way more serious than I realized.’ You talked about the ‘Big Blob of Compute,’ parameter counts, and the scale of neurons in the human brain.”

The scaling breakthrough

Jack Clark: I remember, when we were doing scaling laws at OpenAI, making the model bigger really started to work—truly, continuously, and in a somewhat uncanny way, across many projects. From GPT-2 to scaling laws to GPT-3, we just kept closing in on it, step by step.

Dario Amodei: “We were just the group of people who get things done.”

Jared Kaplan: “We were also excited about safety. Back then there was an idea: AI would be very capable, but maybe it wouldn’t understand human values, and might not even be able to communicate with us. Language models, to some extent, guarantee that the model absorbs a lot of implicit human knowledge.”

Dario Amodei: “And on top of language models, RLHF. One of the motivations for scaling up the model at the time was that the model had to be smart enough first for RLHF to work. That’s still what we believe: safety and scaling are intertwined.”

Chris Olah: “Yes—back then, the scaling work was actually part of the safety team too. Because we thought, if you want people to take safety seriously, first you have to be able to predict AI trends.”

Jack Clark: I remember I was at an airport in the UK, sampling from GPT-2 to write fake news, then sending it to Dario on Slack saying, “This really works—could have huge policy implications.” I remember Dario replying, “Yes.”

After that, we also did a lot of work around the release itself, and that was crazy.

Daniela Amodei: “I remember the release work—that was the first time we really started collaborating, when GPT-2 came out.”

Jack Clark: “I think it helped us a lot. We started by doing a ‘slightly weird but safety-oriented’ thing together. Later we did Anthropic—another larger-scale, still slightly weird but safety-oriented thing.”

AI’s early days

Tom Brown: “Let’s go back to the ‘Concrete Problems’ paper. I joined OpenAI in 2016. At that time you and I were both among the earliest folks. I remember it felt like the first mainstream AI safety paper. Where did it come from?”

Dario Amodei: “Chris would know—he was involved. At Google at the time, I’ve forgotten what my main project even was; this one felt like something I produced while procrastinating on everything else.”

We wanted to write down the open problems in AI safety. Back then, AI safety was usually discussed in a very abstract way; we wanted to ground it in the real ML of the time. We’ve now been working along this line for six or seven years, but back then it was still a weird idea.

Chris Olah: “I feel like, in some sense, it’s almost a political project. Back then, many people didn’t take safety seriously. We wanted to assemble a list of questions that everyone agreed were reasonable. Many of those questions were already in the literature. Then we found people with credibility across institutions and got them to co-sign.”

I remember it took me a long time, talking with more than twenty researchers at Brain, to secure support for publishing it. Looking back today, if you judge it only by the questions themselves, not all of them hold up—maybe they weren’t the best questions. But if you view it as consensus-building—proving, ‘These are real problems, and they’re worth taking seriously’—then it was an important moment.

Jack Clark: “Eventually you end up somewhere that sounds like science fiction. I remember early on Anthropic talked about Constitutional AI. Jared said, ‘We write a constitution for a language model, and then it behaves that way.’ It sounded insane back then. Why did you think it was feasible?”

Jared Kaplan: “I discussed it at length with Dario, and I think in AI, simple methods often work extremely well. The earliest version was pretty complex; later we kept stripping it down. In the end, it became: use the fact that the model is good at multiple-choice questions—give it a clear prompt telling it what it should look for—and that’s enough. Then we can write the principles directly.”

Dario Amodei: “This comes back to the ‘Big Blob of Compute,’ ‘The Bitter Lesson,’ and the ‘Scaling Hypothesis.’ As long as you can give the AI a clear objective and data, it will learn. A set of instructions, a set of principles—the language model can read them, and it can compare them against its own behavior. The training objective is there. So Jared’s view and mine was: it can be done, as long as you iterate on the details repeatedly.”
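
To make the mechanism Jared and Dario describe a bit more concrete, here is a minimal sketch of the constitutional-feedback loop: generate candidate responses, then ask the model a multiple-choice question framed by a principle. The single principle, the generate/choose placeholders, and the prompt format are illustrative assumptions, not Anthropic's actual training code:

```python
"""Minimal sketch of the Constitutional AI feedback loop described above.
`generate` and `choose` stand in for calls to a language model; the single
principle and the prompt wording are illustrative, not an actual constitution."""
import random

CONSTITUTION = [
    "Choose the response that is more helpful, honest, and harmless.",
]


def generate(prompt: str, n: int = 2) -> list[str]:
    """Placeholder: sample n candidate responses from a language model."""
    return [f"candidate response {i} to: {prompt}" for i in range(n)]


def choose(question: str, options: list[str]) -> int:
    """Placeholder: ask the model a multiple-choice question and return the
    index of the option it prefers. A real system parses the model's answer."""
    return random.randrange(len(options))


def constitutional_preference(prompt: str) -> tuple[str, str]:
    """Produce a (preferred, rejected) pair by letting the model judge its own
    candidates against a written principle."""
    candidates = generate(prompt, n=2)
    principle = random.choice(CONSTITUTION)
    question = (
        f"Prompt: {prompt}\n"
        f"Response A: {candidates[0]}\n"
        f"Response B: {candidates[1]}\n"
        f"{principle} Answer A or B."
    )
    best = choose(question, candidates)
    return candidates[best], candidates[1 - best]


if __name__ == "__main__":
    preferred, rejected = constitutional_preference("Explain how vaccines work.")
    print("Preferred:", preferred)
    print("Rejected:", rejected)
```

The preferred/rejected pairs such a loop produces are the raw material for preference training, which is how written principles end up shaping the model's behavior rather than remaining a document.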

Jared Kaplan: “For me it was weird at first. I came from physics, and now that everyone’s excited about AI, it’s easy to forget the atmosphere back then. I talked with Dario about these things, and it felt like a lot of AI researchers had been badly scarred by the AI winter—as if ‘having ambition’ wasn’t allowed. Talking about safety required you to believe AI could be very powerful and very useful, and at the time there was a kind of taboo against that sort of ambition. One advantage physicists have is ‘arrogance’: they’re used to taking on very ambitious projects and talking about big-picture visions.”

Dario Amodei: “I think that’s true. In 2014, a lot of things simply couldn’t be said. It’s part of a broader problem in academia: outside a few fields, institutions have grown increasingly averse to risk. Industrial AI inherited the same mindset. I don’t think it started to shake that off until around 2022.”

Chris Olah: “And there were two forms of ‘conservatism’: one is taking the risks seriously; the other is a kind of arrogance—believing the ideas really could succeed and taking them seriously on those terms. Back then, we were dominated by the latter. Historically, the nuclear physics debates of 1939 were similar: Fermi resisted, while Szilard and Teller took the risks more seriously.”

Dario Amodei: “The deepest lesson I learned over the past decade is this: many ‘everyone knows’ consensuses are actually herd behavior disguised as maturity. Once you’ve seen consensus flipped overnight a few times, you start saying, ‘No, we’re betting on this.’ It’s not guaranteed to be right, but you place the bet and ignore the noise. Even if you’re only 50% right, you’ll still contribute a lot of things others didn’t.”

Changing public attitudes toward artificial intelligence

Jared Kaplan: “It’s the same in some safety debates today. The external consensus holds that many safety issues don’t emerge naturally from the technology itself, but in our research at Anthropic, we’ve seen that they do.”

Daniela Amodei: “But over the past 18 months, this has been changing, and the world’s emotions toward AI are clearly shifting too. When we do user research, we hear more often from ordinary users who worry about AI’s overall impact on the world.”

Sometimes it’s about jobs, bias, or toxicity. Sometimes it’s, “Will it upend the world and change how humans cooperate?”—and honestly, I hadn’t fully anticipated all of that.

Sam McCandlish: “For some reason, the ML research community is often more skeptical than the general public that AI will become very powerful.”

Jared Kaplan: “In 2023, Dario and I went to the White House. The message from Harris and Raimondo in that meeting was basically: we’re watching you, AI is a big deal, and we’re paying serious attention. But in 2018, you wouldn’t have thought ‘a president would call you to the White House to say they’re paying attention to language models.’”

Tom Brown: “What’s interesting is that many of us jumped in while the outcome still seemed uncertain—like Fermi doubting the atomic bomb. There was some evidence that the bomb could be built, but also plenty of evidence that it couldn’t. In the end he decided to try, because if it turned out to be true, the impact was so huge that it was worth doing.”

From 2015 to 2017, there was some evidence—and increasingly more—suggesting AI could be a big deal. In 2016, I talked with my mentor: I’d done startups, I wanted to work on AI safety, but my math wasn’t strong enough and I didn’t know what to do. Back then, some people said you needed to master decision theory first; others said nothing crazy was going to happen with AI at all, and very few people genuinely believed otherwise.

Jack Clark: “I was considered crazy back in 2014 when I reported on the ImageNet trend. In 2015, I wanted to write about NVIDIA because their GPUs kept showing up in the papers, and I was still told I was crazy. In 2016, I left journalism for AI, and I even got emails saying, ‘You’ve made the biggest mistake of your life.’ In many ways, seriously betting that scaling would pan out really did look like insanity.”

Jared Kaplan: “How did you decide? Were you conflicted?”

Jack Clark: “I made a reverse bet: I asked them to make me a full-time AI reporter and double my salary, knowing they wouldn’t agree. Then I went to sleep, and when I woke up, I quit. Because every day I was reading arXiv papers and kept feeling like something crazily big was happening—at some point, you have to make a high-conviction bet.”

Tom Brown: “I wasn’t that decisive. I wavered for six months.”

Daniela Amodei: “And at the time, the idea that engineers could meaningfully move AI forward wasn’t mainstream. Back then the assumption was ‘only researchers can do AI,’ so your hesitation isn’t surprising.”

Tom Brown: “Later, OpenAI said, ‘You can help with AI safety through engineering.’ That’s what got me to join. Daniela, you were my manager at OpenAI. Why did you join at the time?”

Daniela Amodei: “I was at Stripe for five and a half years, and Greg used to be my boss. I also introduced Greg and Dario to each other. At the time he was founding OpenAI, and I told him: ‘The smartest person I know is Dario. If you can get him onto your team, you’d be lucky to have him.’ Later, Dario joined OpenAI.”

Maybe like you, I was also thinking about what I could do after leaving Stripe. The reason I joined Stripe was because earlier, when I worked in the nonprofit and international development space, I felt like I needed more skills. At the time I actually thought I’d eventually come back to that space.

Before joining Stripe, I felt like I didn’t have enough ability to help people who had it worse than I did. So I was looking at other tech companies, hoping to find a new way to have a bigger impact. And at the time, OpenAI made it feel like a very good choice. It was a nonprofit organization dedicated to achieving a very important goal with far-reaching significance.

I’ve always believed in AI’s potential. I knew something about Dario, and they really did need someone to help manage things, so I thought this job was a strong match with my background. I remember thinking: “It’s a nonprofit. A group of very talented people with a beautiful vision are here—but their operations seem a bit chaotic.” And it was exactly that challenge that excited me, because I could join in and help.

At the time, I felt like an all-rounder. I managed team members, led some technical teams, and was responsible for scaling the organization. I worked with the language team, later took on other tasks, and also got involved in some policy-related work in collaboration with Chris. The company had so many excellent people, and that made me especially want to join—to help it become more efficient and more organized.

Jack Clark: “I remember after you finished GPT-3 you said, ‘Have you heard of trust and safety?’”

Daniela Amodei: “I used to lead a trust and safety team at Stripe. For a technology like this, you need to think about trust and safety. It’s really a bridge between AI safety research and the more practical, day-to-day work—how to make models genuinely safe in practice.”

It’s important to make the case that this technology will have a major impact in the future. At the same time, we also need to do the more practical, day-to-day work that lays the groundwork for handling higher-risk scenarios later.

Responsible scaling policy: ensuring the safe development of AI

Jack Clark: “That’s a perfect segue into the Responsible Scaling Policy (RSP)—how it was proposed, why we thought of it, and how we apply it now, especially given the work we’re doing today on trust and safety for models. So who first came up with the RSP?”

Dario Amodei: “It was originally proposed by Paul Christiano and me, around the end of 2022. The initial idea was: should we temporarily limit model scaling—hold it below some specific scale—until we find ways to solve certain safety issues?”

But later we felt that simply freezing scaling at one point and then opening it up again was a bit strange. So we decided to set a series of thresholds: every time the model reached a threshold, we would run a series of tests to assess whether it had the corresponding capabilities.

Every time a threshold was crossed, we would need to adopt stricter safety and assurance measures. But from the beginning we had one idea: it might be better if a third party drove it. In other words, the policy shouldn’t be the responsibility of a single company alone—otherwise, other companies might not be willing to adopt it. That’s why Paul led the design of the policy himself. Of course, over time many details changed, and meanwhile our team has kept researching how to make the policy work better.

After Paul had organized the concept into a stable form and was close to announcing it, our team published our own version within a month or two. Many people on our team were deeply involved in the process. I remember writing at least one of the early drafts, but the whole document went through many rounds of revision.

Tom Brown: “For Anthropic, the RSP is like our ‘constitution.’ It’s the core guiding document. That’s why we’re willing to invest a lot of time and effort in refining it repeatedly—to make sure it’s accurate and complete.”

Daniela Amodei: “I think it’s really interesting how RSP has evolved as Anthropic has grown. It’s gone through multiple stages, and it also requires a variety of different skills to implement it. For example, some of the big-picture ideas are handled mainly by Dario, Paul, Sam, Jared, and others. They were thinking: ‘What are our core principles? What message do we want to convey? How do we know we’re heading in the right direction?’”

But besides that, there’s also very practical operational work. As we iterate, we evaluate and adjust some details. For example, we might have originally expected to achieve certain goals at a particular safety level, but if we didn’t, we would reassess and make sure we’re accountable for our work outcomes.

In addition, there were many organizational-structure adjustments. For example, we decided to reorganize around the RSP so that responsibilities could be separated more clearly. I really like the constitution analogy for how important this document is. Just as the United States created a whole set of institutions to make sure its constitution is upheld—the courts, the Supreme Court, the president, the two houses of Congress—those institutions take on other responsibilities too, but their existence largely serves to uphold the constitution. Anthropic’s RSP has been going through a similar process.

Sam McCandlish: “I think this reflects a core belief we have about safety: safety issues can be solved. This is a very complex and difficult task that requires lots of time and effort.”

Just as in automotive safety, the relevant systems and institutions were built up over many years. But the problem we face now is: do we have enough time to finish all this work? So we have to identify the key institutions AI safety needs as quickly as possible, build them here first, and at the same time make sure they can be borrowed and adopted elsewhere.

Dario Amodei: “This also helps unify collaboration internally. If any part of the organization behaves in a way that doesn’t match our safety values, the RSP will, in one way or another, expose the problem, right? The RSP stops plans that don’t meet safety standards from moving forward. So it also becomes a constant reminder for everyone—safety is a baseline requirement in product development and planning. We’re not just repeating slogans; we embed safety into every step. If someone joins the team and can’t align with these principles, they’ll find they can’t fit in: either they adapt to this direction, or they’ll find it hard to continue.”

Jack Clark: “Over time, the RSP has become more and more important—we’ve put thousands of hours into it. When I explained the RSP to senators, I said: ‘We created measures to ensure our technology isn’t easily misused and to guarantee safety.’ Their reaction was usually: ‘That sounds normal. Don’t all companies do this?’ I didn’t know whether to laugh or cry—because no, not all companies do this.”

Daniela Amodei: “Also, I think beyond aligning the team’s values, the RSP increases the company’s transparency, because it clearly records what our goals are. Everyone inside the company can understand them, and people outside can also see clearly what our goals and direction are on safety. It isn’t perfect yet, but we’ve been continuously refining and improving it.”

I think the key is to explicitly point out, “What are the core problems we’re focused on?” We can’t just use the word “safety” to steer progress, like “because safety issues mean we can’t do X,” or “because safety issues mean we have to do X.” Our real goal is to make sure everyone understands what we mean by safety.

Dario Amodei: “In the long run, what really damages safety is those frequent ‘safety drills.’ I once said: ‘If there’s a building where the fire alarm goes off every week, it’s actually a very unsafe building.’ Because when a real fire happens, maybe nobody will pay attention. We have to be very focused on the accuracy and calibration of alarms.”

Chris Olah: “Looking at it from another angle, I think RSP creates healthy incentive mechanisms in many layers. For example, internally, RSP aligns each team’s incentives with safety goals. This means if we don’t make enough progress on safety, the related work gets paused.”

And externally too, RSP is better at creating healthy incentive mechanisms than other approaches. For example, one day if we have to take some major actions, like admitting, ‘Our models have progressed to a certain stage, but we can’t ensure their safety yet,’ then RSP provides a clear framework and evidence to support that decision. This framework already exists in advance, and it’s clear and easy to understand. When we discussed early versions of RSP, I didn’t fully realize its potential, but now I think it’s more effective than other methods I could have imagined.”

Jared Kaplan: “I agree with these points, but I think it might underestimate the challenges we face in crafting the right policies, setting evaluation criteria, and defining boundaries. We’ve been iterating a lot on these areas, and we’re still optimizing. One difficult problem is that with some emerging technologies, it’s sometimes hard to clearly tell whether they’re dangerous or safe. Most of the time, we run into a huge gray zone. These challenges excited me early on when developing RSP, and they still do—but at the same time, I’ve also realized that implementing this strategy clearly and making it truly work is more complicated and challenging than I initially imagined.”

Sam McCandlish: “You can’t fully predict the gray zones because they’re everywhere. You can only find where the problems are once you really start implementing. That’s why our goal is to implement everything as early as possible so we can discover potential problems as quickly as possible.”

Dario Amodei: “You need three to four iterations to truly get it perfect. Iteration is a powerful tool, and it’s almost impossible to get everything right the first time. So if the risks are continuously increasing, you need to complete those iterations early—not wait until the end.”

Jack Clark: “At the same time, you also need to build internal institutions and processes. Even though specific details might change over time, training the team’s execution capability is the most important part.”

Tom Brown: “I manage Anthropic’s compute resources, so a big part of my job is communicating with external stakeholders, and different external parties have very different views on how fast the technology will develop. At first I also thought it wouldn’t progress this fast, and later my view changed, so I understand that position very well. I find the RSP especially useful here, particularly when talking with people who believe the technology will develop slowly. We can tell them: ‘We don’t need extreme safety measures until the technology becomes genuinely urgent.’ If they say, ‘I don’t think it will become urgent for a long time,’ I can respond: ‘Okay, then we don’t need extreme safety measures for now.’ That makes communicating with the outside world much smoother.”

Jack Clark: “So, in what other ways has RSP affected everyone?”

Sam McCandlish: “Everything revolves around evaluation. Each team is doing evaluation. For example, your training team has been doing evaluation work all along—we’re trying to determine whether the model has become strong enough that it could pose danger.”

Daniela Amodei: “This actually means we need to measure the model’s performance using RSP standards, including checking whether there are signs that could raise concerns.”

Sam McCandlish: “Evaluating a model’s minimum capability is relatively easy. Evaluating its maximum capability is extremely difficult. So we put a lot of research effort into questions like: ‘Can this model execute certain dangerous tasks? Are there elicitation methods we haven’t considered yet—particular prompting strategies, sampling techniques, or tools—that could enable the model to carry out very dangerous behavior?’”
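
As a rough illustration of the elicitation problem Sam describes—where a model can look weaker than it is unless you try multiple strategies—here is a minimal evaluation-harness sketch. The tasks, strategies, attempt counts, and threshold are all assumed placeholders, not Anthropic's actual RSP evaluations:

```python
"""Minimal sketch of a dangerous-capability evaluation: try several elicitation
strategies per task and count a task as passed if any attempt succeeds, since
under-elicitation makes a model look safer than it really is."""
import random

TASKS = ["task-1", "task-2", "task-3"]                  # stand-ins for real eval tasks
STRATEGIES = ["direct", "step-by-step", "with-tools"]   # elicitation variants to try
ATTEMPTS_PER_STRATEGY = 4                               # best-of-N style sampling
ALERT_THRESHOLD = 0.5                                   # assumed policy threshold


def attempt(task: str, strategy: str) -> bool:
    """Placeholder for running the model on a task with one elicitation setup
    and grading the transcript. Here it just flips a biased coin."""
    return random.random() < 0.2


def task_passed(task: str) -> bool:
    """A task counts as passed if any strategy/attempt combination succeeds."""
    return any(
        attempt(task, strategy)
        for strategy in STRATEGIES
        for _ in range(ATTEMPTS_PER_STRATEGY)
    )


if __name__ == "__main__":
    pass_rate = sum(task_passed(t) for t in TASKS) / len(TASKS)
    print(f"Pass rate: {pass_rate:.0%}")
    if pass_rate >= ALERT_THRESHOLD:
        print("Capability threshold crossed: escalate per the (assumed) policy.")
```

The multi-strategy, best-of-N loop is deliberately conservative in the other direction: if any elicitation path succeeds, the capability is treated as present.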

Jack Clark: “These evaluation tools are very helpful in policy making, because ‘safety’ is such an abstract concept. When I can say, ‘We have an evaluation that determines whether we can deploy this model,’ we can work with policymakers, national security experts, and CBRN (chemical, biological, radiological, and nuclear) domain experts to jointly set precise evaluation criteria. Without these concrete tools, those collaborations might not be possible at all. Once there are clear standards, people are far more willing to participate and help ensure accuracy. So in this respect, the RSP’s impact is significant.”

Daniela Amodei: “RSP is also very important to me, and it often influences my work. What’s interesting is that I think about RSP in a slightly different way—more from its ‘tone,’ its mode of expression. Recently we significantly adjusted RSP’s tone, because the earlier tone was too technical and even felt a bit confrontational. I spent a lot of time thinking about how to build a system that people would want to participate in.”

If RSP were a document everyone in the company could easily understand, it would be much better. Like the OKRs we use today (Objectives and Key Results). For example, what’s RSP’s main objective? How do we know whether we achieved it? What’s the current AI Safety Level (ASL)—is it ASL-2 or ASL-3? If everyone knows what to focus on, it becomes easier to spot potential issues. On the other hand, if RSP is too technical and only a small number of people can understand it, its real usefulness is greatly reduced.

It makes me happy to see the RSP moving toward being easier to understand. Now, I believe most people in the company—possibly everyone, regardless of their role—can read the document and think, “This makes sense. I hope we develop AI guided by these principles, and I understand why we need to focus on these issues. If I run into a problem at work, I roughly know what to look out for.” We want the RSP to be simple enough that someone on a factory floor can easily judge: “The seat belt is supposed to attach here, but it isn’t attached properly yet.” That way, problems can be caught in time.

The key is to build a healthy feedback mechanism so there can be smooth communication between leadership, the board, other departments within the company, and the teams actually doing R&D. I think most problems arise because communication isn’t smooth or information is transmitted incorrectly. If problems only happen for these reasons, that would be a real shame, right? In the end, what we need to do is translate these ideas into real practice—and make sure they’re simple and clear so everyone can understand them.

Anthropic’s founding story

Sam McCandlish: “Actually, none of us had any initial intention to start a company. We just felt it was our responsibility—we had to take action—because it’s the only way to ensure AI development moves in the right direction. That’s why we made that commitment.”

Dario Amodei: “My original idea was very simple: I just wanted to invent and explore new things in a beneficial way. That idea pulled me into the AI field. AI research requires a lot of engineering support, and eventually it also requires a lot of funding.”

However, I found that without a clear goal—without deliberately setting up the company and shaping its environment—many things could still get done, but they would repeat the same mistakes that had left me feeling alienated from the tech industry. Those mistakes tend to come from the same people, the same attitudes, and the same ways of thinking. So at some point, I realized we had to do this in a completely new way—it was almost inevitable.

Jared Kaplan: “Remember when we were in graduate school? You had a full plan to explore how scientific research could serve the public interest. I think it’s very similar to how we think now. I remember you had a project called ‘Project Vannevar,’ and the goal was to make that happen. At the time I was a professor. I watched what was going on and I was deeply convinced that AI’s influence was growing at an extremely fast rate.”

But because AI research demands a lot of funding—and because I was a physics professor—I realized I couldn’t drive this progress through academic research alone. I hoped to build an institution, with people we could trust, to make sure AI development moved in the right direction. Honestly, though, I would never recommend that anyone start a company, and I never had that desire myself. To me, it’s just a means to an end. Generally speaking, I think the key to success is truly caring about a goal that matters to the world, and then finding the best way to achieve it.

How to build a culture of trust

Daniela Amodei: “I often think about our team’s strategic advantages. One factor that may sound a bit unexpected, but is extremely important, is our high level of trust. It’s very hard to get a large group of people to rally around a shared mission, but at Anthropic, we’ve managed to transmit that sense of mission to more and more people. In this team, including leadership and every member, we come together because of our shared mission. Our mission is both clear and pure, and that’s not common in the tech industry.”

I feel like the goal we’re working toward has a certain kind of pure meaning. None of us started because we wanted to start a company. We just felt we had to do it. We couldn’t keep moving forward in our previous place—we had to do it ourselves.

Jack Clark: “At the time, with the arrival of GPT-3 and the trends of projects all of us had interacted with or contributed to—like scaling laws—we were already clearly seeing the direction AI was heading in by 2020. We realized that if we didn’t act soon enough, we might quickly reach an irreversible threshold. We had to take action to influence the environment.”

Tom Brown: “I want to build on Daniela’s point. I really do believe there’s a high level of trust inside the team. Each of us clearly understands that we joined this team because we wanted to contribute to the world. We also jointly committed to donating 80% of our equity to causes that drive social progress, and it’s something everyone supported without hesitation: ‘Yes, of course we’ll do that.’ That kind of trust is very special and very rare.”

Daniela Amodei: “I think Anthropic is a company with very little internal politics. Of course, our view of ourselves might differ from an outsider’s, and I remind myself of that all the time. But I believe our hiring process and the traits of our team members create a culture that naturally rejects office politics.”

Dario Amodei: “And the cohesion of the team—team cohesion is critical. Whether it’s the product team, the research team, the trust and safety team, the marketing team, or the policy team, everyone is working toward the same company-level goal. When different departments each pursue completely different goals, it often leads to chaos. Here, it would be completely abnormal for one department to feel that another was undermining its work.”

I think one of our most important achievements is that we managed to keep the company internally consistent. Mechanisms like RSP play a key role in that. This mechanism ensures it’s not some departments creating problems and other departments trying to fix them. Instead, all departments fulfill their roles and collaborate under a unified theory of change.

Chris Olah: “I originally joined OpenAI because it was a nonprofit and I could focus on AI safety research there. But over time, I gradually found that the model wasn’t a perfect fit for me, and that forced some hard decisions. Throughout that process I trusted Dario and Daniela’s judgment a lot, but I was reluctant to leave, because I didn’t think adding yet another AI lab would necessarily be good for the world. That made me very hesitant.”

When we ultimately decided to leave, I still had reservations about starting a company. I used to argue that we should set up a nonprofit focused on safety research. But ultimately, a more pragmatic attitude and a candid acknowledgment of real-world constraints made us realize that founding Anthropic was the best way to achieve our goals.

Dario Amodei: “One important lesson we learned early on was: promise less, deliver more. Stay realistic and face the trade-offs directly, because trust and credibility matter more than any particular policy.”

Daniela Amodei: “One unique thing about Anthropic is the high trust and unity within the team. For example, when I see Mike Krieger insisting on not releasing certain products for safety reasons, while Vinay is discussing how to balance business needs and push projects through to completion, I feel there’s something very special about that. Likewise, engineers on the technical safety team and on the inference team talk about how to make products both safe and useful. That kind of unified goal and pragmatic attitude is one of the most attractive parts of working at Anthropic.”

Dario Amodei: “A healthy organizational culture is one where everyone can understand and accept the trade-offs we face together. The world isn’t perfect; every decision requires balancing different interests, and that balance is rarely perfectly satisfying. But as long as the whole team can face those trade-offs together under a unified goal, each contributing to the overall objective from their own role, that’s what a healthy ecosystem looks like.”

Sam McCandlish: “In a sense, it’s a ‘race to the top.’ Yes, it really is a race to the top. It’s not a zero-risk choice—things could go wrong—but we all agreed on one thing: ‘This is the choice we made.’”

A race to the top in AI

Jack Clark: “But the market is fundamentally pragmatic. So the more successful Anthropic is as a company, the more motivated others will be to imitate the approaches that helped us succeed. And when our success is closely tied to our actual safety work, that success creates a kind of gravitational pull on the industry—drawing other companies into the same competition. It’s like seat belts: once one company develops them, others can copy them. That’s a healthy ecosystem.”

Dario Amodei: “But if you just say, ‘We won’t develop this technology,’ that doesn’t work—you’re doing no better than anyone else, because you haven’t demonstrated that the path from today to the future is feasible. What the world needs—whether from the industry or from a single company—is a way to move society from ‘the technology doesn’t exist’ to ‘the technology exists in a powerful form and society manages it effectively.’ I think the only way to get there is for individual companies—and ultimately the whole industry—to face these trade-offs directly.”

You need to find a way that stays competitive, even leads the industry in some areas, while also ensuring the technology is safe. If you can do that, your pull on the industry will be extremely strong. From the regulatory environment, to talented people hoping to join different companies, to customers’ perceptions—everything pushes the industry in the same direction. And if you can prove that you can achieve safety without sacrificing competitiveness—that is, find win-win solutions—then other companies will be incentivized to follow suit.

Jared Kaplan: “I think that’s why mechanisms like the RSP are so important. It lets us see clearly where the technology is heading and recognize that we need to be highly alert to certain issues, while avoiding crying wolf—we’re not simply saying, ‘Innovation should stop here.’ We need a way to let AI deliver useful, innovative, and enjoyable experiences to customers while clearly stating the constraints we must adhere to. Those constraints ensure the system’s safety, and they also help other companies believe they can succeed while taking safety seriously and still compete with us.”

Dario Amodei: “A few months after we rolled out RSP, three of the most well-known AI companies released similar mechanisms too. Interpretability research is another area where we achieved a breakthrough. In addition, we’ve collaborated with AI safety research institutions, and this overall emphasis on safety is having a deep impact.”

Jack Clark: “Yes—Frontier Red Team was basically copied by other companies almost immediately. That’s a good thing. We want every lab to test for potential high-risk safety issues.”

Daniela Amodei: “Jack mentioned earlier that customers also care a lot about safety. They don’t want models that generate false information, and they don’t want models whose safety restrictions are easy to bypass. They want models that are useful and harmless. We often hear customers tell us: ‘We chose Claude because we know it’s safer.’ I think that has a huge impact on the market. Being able to provide trustworthy, reliable models also creates significant market pressure on competitors.”

Chris Olah: “Maybe we can expand on the point Dario just made. There’s a saying that the most moral action is ‘noble failure.’ In other words, you sacrifice other goals for safety—maybe even in an impractical way—to demonstrate purity in your commitment to the cause. But I think that approach is actually self-defeating.”

First, this approach puts decision power in the hands of people who don’t value safety and don’t prioritize it. By contrast, if you work hard to align incentives—making the hard calls where they’re backed by the strongest support and the best evidence—then you can trigger the “race to the top” Dario described. In that race, the people who care about safety don’t get marginalized; instead, everyone else is forced to keep pace and join in.

Looking ahead to the future of AI

Jack Clark: “So, what are you all excited about for what we do next?”

Chris Olah: “I think there are many reasons to be excited about interpretability. One obvious reason is for safety. But there’s another reason too—emotionally, it excites me or feels especially meaningful. That’s because I think neural networks are wonderful, and there’s a lot of beauty we still haven’t seen. We always treat neural networks as a black box and aren’t particularly interested in their internal structures. But when you start studying them more deeply, you find that their internals are full of astonishing structures.”

It’s a bit like how people view biology. Some people might think, “Evolution is boring. It’s just a simple process that runs for a long time and then creates animals.” But in reality, every animal created by evolution is filled with incredible complexity and structure. And I think evolution is an optimization process—kind of like training a neural network. Inside neural networks there’s also all sorts of complex structures that resemble something like ‘artificial biology.’ If you’re willing to dig into them, you’ll find many astonishing things.

I feel like we’ve only just started peeling back the layers—there are so many things inside waiting to be discovered; we’ve barely opened the door. I think the discoveries to come will be extremely exciting and wonderful. Sometimes I imagine walking into a bookstore ten years from now and buying a textbook on neural network interpretability, or a book that genuinely describes the “biology” of neural networks, full of amazing content. I believe that over the next decade, and even in the next few years, we’ll start to genuinely discover these things. It’ll be a crazy and incredible journey.

Jack Clark: “A few years ago, if someone said, ‘The government will set up new institutions to test and evaluate AI systems, and these institutions will be professional and actually work,’ you probably wouldn’t believe it. But it’s already happened. You could say the government has already created a ‘new embassy’ to handle this new category of technology. I’m looking forward to seeing where this goes. I think it actually means countries have the capability to handle this kind of societal transition—not just rely on companies. I’m excited to be part of it.”

Daniela Amodei: “I’m excited about what we’re doing now, but it’s also hard not to be thrilled just imagining what future AI could do for humanity. Even today there are signs that Claude can help with vaccine development, cancer research, and biology—that’s already incredible. Seeing what it can do right now is astonishing, and when I look ahead three to five years and imagine Claude genuinely solving many of humanity’s fundamental problems—especially in health—it excites me a lot. Thinking back to my days in international development, it would be amazing if Claude could take over the inefficient work I did back then.”

Tom Brown: “From a personal perspective, I really like using Claude at work, and recently I’ve also been using it at home to talk things through. The biggest recent change has been coding. Six months ago I wasn’t using Claude for any programming work, and our team barely used it to write code either. That has changed dramatically. Last week I gave a talk at a Y Combinator event, and at the start I asked, ‘How many of you are using Claude to code?’ About 95% of the room raised their hands—completely different from four months ago.”

Dario Amodei: “When I think about things that excite me, one is the kind of consensus I mentioned earlier that seems to have already been reached but is actually about to be broken. One of those is interpretability. I think interpretability isn’t just key for guiding and ensuring AI system safety. It also contains deep insights into intelligence optimization problems and how the human brain works. I once said that Chris Olah will get a Nobel Prize in medicine one day.”

I used to be a neuroscientist, and there are many mental disorders we still haven’t solved—schizophrenia, mood disorders—that I suspect are related to some higher-level, system-wide problem. But because of the brain’s complexity and how hard it is to study directly, these problems are difficult to fully understand. Neural networks aren’t a perfect analogy, but they’re far easier to probe and interact with than the human brain, and over time they’ll become a better analogical tool.

Another related area is applying AI to biology. Biology is an extremely complex problem, and people have been skeptical about it for many reasons. But I think that skeptical consensus is starting to break down. We’ve already seen a Nobel Prize in Chemistry awarded for AlphaFold, which is an incredible achievement. We should work to develop tools that help us create hundreds of “AlphaFolds.”

Finally, it’s using AI to strengthen democracy. We worry that if AI is developed incorrectly, it could become a tool for authoritarianism. So how do we make AI a tool that promotes freedom and self-determination? I think development in this area might start earlier than the first two areas, but its importance is just as great.

Jared Kaplan: “I want to echo at least two points you made earlier. One is that I think many people join Anthropic because they have a tremendous curiosity about AI science. As AI technology advances, they gradually come to agree that we don’t just need to push technological progress—we need to understand it more deeply and ensure its safety. I think it’s exciting to work with more and more people who share the same vision about AI development and responsibility. And I think many of the technical advances that happened over the past year have truly helped form that consensus.”

The other side, coming back to practical matters: I think we’ve already done a lot in AI safety, and with recent developments we’re beginning to get an initial picture of the risks that very advanced systems could pose. That means we can investigate these risks directly through interpretability research and other safety mechanisms.

Through this, we can understand more clearly the risks that advanced AI systems might bring. That will allow us to advance our mission in a more scientific and evidence-based way. So I’m very excited about the next six months. We’ll use our understanding of potential issues with advanced systems to further research and find ways to avoid these traps.

Original video link

