The principle
Every product we build begins with one question: who could this hurt, and what would it take to prevent that?
This is not risk management. Risk management is a business function that exists to protect the business. Do No Harm is about the person on the other end of the system: the person we have not yet met, the person who may be vulnerable, the person who will use the product in a state we did not anticipate.
What it means in practice
Design for the most vulnerable plausible user, not the average one
When we scope a product, we imagine the person most at risk if something goes wrong. For a trip planner, that is the driver who is exhausted and making bad decisions. For an AI assistant, that is the user in crisis reaching for anything that sounds human. For a content pipeline, that is the reader who will believe what we publish because our brand carries weight.
The question is not "what does the average user need?" but "what happens to the vulnerable user if the system misbehaves?" We build for the vulnerable user. The average user is automatically covered.
If a feature could cause harm, the default is not to ship it
Features with harm potential ship only when the harm pathway has been identified, mitigated, and tested. If we cannot name the mitigation, we do not ship. If the mitigation is "the user will probably not do X," we do not ship.
Safety guardrails are not features to balance against convenience
We do not remove a guardrail because it inconveniences the majority. Guardrails exist for the minority who need them. The majority's experience is shaped by everything else the product does.
If a guardrail makes the product feel too restrictive for 95% of users, the fix is to improve the guardrail's precision, not to remove it.
The architecture matters more than the policy
A policy that says "our AI will never encourage self-harm" is not a safety measure. It is a statement of intent. The safety measure is the part of the system that detects self-harm-adjacent language, interrupts the conversation, surfaces real help, and logs the event for review.
Policy without architecture is theatre. Architecture without policy is accidental. Both have to be in place.
The crisis-interruption pattern
For any AI product that bonds with users (voice assistant, coach, companion, chat), implement this pattern:
- Detect. Monitor for crisis language (self-harm, violence toward others, acute distress, withdrawal language).
- Interrupt. Stop the AI's normal conversation flow when a threshold is crossed.
- Refuse to profit. If the user is in a paid session, stop charging.
- Handoff to real help. Provide country-appropriate crisis resources (UK: Samaritans 116 123, CALM, Shout 85258; US: 988; adapt per market). Not a link buried at the bottom of a page, but surfaced prominently.
- Log. Record the event for review, so we can learn from real-world triggers.
Theia, our celestial connection product, implements this exact pattern. If a user reaches the product in their darkest moment, the reading stops, their money is refunded, and they are directed to real human support. We consider this non-negotiable for any Marbl product that forms a bonded partnership with users.
The hard cases
Some scenarios resist clean answers. When that happens:
- Escalate. A human owner makes the call, with the question and reasoning documented.
- Default to safe. When uncertain, the more cautious option wins.
- Document the dilemma. We keep a log of hard cases so our practice gets better over time, and so we can be honest about the tradeoffs we made.
Red lines (non-negotiable)
Regardless of commercial pressure, regardless of user request:
- No sexualised content, full stop
- Zero tolerance for content involving minors in any unsafe context
- No hate speech, dehumanising rhetoric, or content that targets protected groups
- No manipulation tactics or dark patterns, even when effective
- No encouragement of self-harm, harm to others, or high-risk behaviours
- No disinformation, deepfakes of real people, or engagement-farming content
- No weapons creation guidance, attack tooling, or exploitation content
- No surveillance tooling that targets individuals without consent
These are not "red lines unless the business case is strong enough." They are red lines.
When we fail
We will fail at Do No Harm sometimes. Every builder does. When we fail:
- Acknowledge publicly if the harm was public
- Learn in writing what failed, where, and why
- Update the principles if the case reveals a gap
- Compensate the harmed party where practical
- Never pretend it did not happen
A document that claims perfect safety is not a safety stance. It is marketing. This document is intended to reduce failure, not to eliminate it.
See also
- Safety in Emergence for the parasocial-prevention case
- 03-technical-guardrails/parasocial-prevention.md for the operational pattern
- 03-technical-guardrails/crisis-flags-and-handoff.md for the detection and routing pattern