
Bluesky & AT Protocol

Interactive architecture map of Bluesky and the AT Protocol (Authenticated Transfer Protocol) — a federated social networking platform built on decentralized identity, portable data repositories, open algorithms, and composable moderation. The protocol that puts users in control of their social graph.

Open Source (MIT / Apache 2.0) | Since 2021 | TypeScript / Go / Rust | ~30M Users | Federated Protocol
01

System Overview

Bluesky is a decentralized social network built on the AT Protocol (atproto). Unlike traditional platforms, the architecture separates identity, data hosting, indexing, feed curation, and moderation into independent, replaceable services. Users own their data and identity, and can migrate between providers without losing their social graph.

~30M Registered Users
5 Core Layers
~50K Events/sec Firehose
2 DID Methods (did:plc, did:web)
[Interactive architecture diagram. Legend: Identity / DID, Data Storage, Network / Relay, Feed / Indexing, Moderation, Protocol / Schema, Client Apps, Infrastructure]
02

Identity — DIDs & Handles

AT Protocol decouples identity from any single server. Users have a persistent DID (Decentralized Identifier) that survives server migrations, and a human-readable handle (domain name) for discovery. This two-layer identity system is the foundation of data portability.

did:plc

The primary DID method for Bluesky. A "placeholder" method designed for the transition to full decentralization. DIDs are short (e.g., did:plc:abc123), managed by a central PLC directory server that logs signed rotation operations. Users can rotate their signing keys and update their PDS endpoint without changing their DID.

Identity

did:web

An alternative DID method that uses DNS and HTTPS. The DID document is hosted at a well-known URL on the user's domain. Useful for organizations and self-hosters who want full control of their identity without relying on the PLC directory. Less resilient to domain loss.

Identity

Handle Resolution

Handles are domain names (e.g., alice.bsky.social or alice.com). They resolve to DIDs via a DNS TXT record at _atproto.<handle> or an HTTPS well-known endpoint on the handle's domain. Handles are mutable, human-friendly labels; the DID is the true persistent identity. Using a custom domain as a handle also demonstrates control of that domain.

Identity
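
The two lookup paths can be sketched as pure functions. This is a minimal illustration, not a resolver: it assumes the TXT record lives at _atproto.<handle> with a value of the form did=<DID>, and that the HTTPS fallback is served at /.well-known/atproto-did. The function names are hypothetical, and no DNS or HTTP request is made.

```python
from typing import Optional


def dns_txt_name(handle: str) -> str:
    """DNS name whose TXT record is expected to carry the DID."""
    return f"_atproto.{handle.lower()}"


def well_known_url(handle: str) -> str:
    """HTTPS fallback location that should return the DID as plain text."""
    return f"https://{handle.lower()}/.well-known/atproto-did"


def did_from_txt(value: str) -> Optional[str]:
    """Extract the DID from a TXT value of the form 'did=did:plc:...'."""
    value = value.strip().strip('"')
    if value.startswith("did=did:"):
        return value[len("did="):]
    return None  # unrelated TXT records (SPF, etc.) are ignored


print(dns_txt_name("alice.com"))
print(well_known_url("alice.com"))
print(did_from_txt("did=did:plc:abc123"))
```

A real resolver would typically also fetch the DID document and confirm that it lists the same handle, so that the handle and the DID point at each other.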

PLC Directory

A centralized audit log for did:plc operations. Stores the history of all DID document updates (key rotations, PDS migrations, handle changes). Designed to eventually be replaced by or mirrored to a more decentralized system. Critical infrastructure for the network.

Identity
Why DIDs Matter

Traditional social platforms tie your identity to your account on their server. If you leave, you lose your identity, followers, and content. With AT Protocol, your DID is independent of any server. You can migrate your entire account — posts, follows, identity — to a different PDS, and your followers never need to update anything. The DID stays the same; only the PDS endpoint in the DID document changes.
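
As a rough sketch of why migration leaves the DID untouched, the snippet below locates a DID document and reads the PDS endpoint out of it. It assumes the public PLC directory is reachable at plc.directory, that did:web documents live at /.well-known/did.json, and that the PDS is listed as a service entry with the id #atproto_pds. The sample documents and hostnames are illustrative, and no network request is made.

```python
from typing import Optional


def did_document_url(did: str) -> str:
    """Where a resolver would fetch the DID document (no request is made here)."""
    if did.startswith("did:plc:"):
        return f"https://plc.directory/{did}"
    if did.startswith("did:web:"):
        host = did[len("did:web:"):]
        return f"https://{host}/.well-known/did.json"
    raise ValueError(f"unsupported DID method: {did}")


def pds_endpoint(did_doc: dict) -> Optional[str]:
    """Return the serviceEndpoint of the '#atproto_pds' service entry, if any."""
    for service in did_doc.get("service", []):
        if service.get("id", "").endswith("#atproto_pds"):
            return service.get("serviceEndpoint")
    return None


# Illustrative documents: only the PDS endpoint differs after a migration.
before = {
    "id": "did:plc:abc123",
    "alsoKnownAs": ["at://alice.com"],
    "service": [{
        "id": "#atproto_pds",
        "type": "AtprotoPersonalDataServer",
        "serviceEndpoint": "https://old-pds.example",
    }],
}
after = {**before, "service": [{**before["service"][0],
                                "serviceEndpoint": "https://new-pds.example"}]}

print(did_document_url(before["id"]))
print(pds_endpoint(before), "->", pds_endpoint(after))
```

Followers reference the "id" value, which is identical in both documents, so nothing on their side needs to change.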

03

Personal Data Server (PDS)

The PDS is a user's home server — it hosts their data repository, manages authentication, and serves as the origin point for all their content. Users can self-host a PDS or use a hosting provider. Bluesky Social PBC operates the largest PDS cluster (bsky.social), but the protocol is designed for any number of independent PDS instances.

Data Repository Host

Each PDS stores one or more user repositories. A repository is a signed, content-addressed Merkle Search Tree (MST) containing all of a user's records (posts, likes, follows, profile). The PDS serves the repo over XRPC and syncs changes to the network via the firehose.

Data

Authentication

PDS handles user auth via OAuth 2.0 (DPoP-bound tokens). Supports the AT Protocol OAuth profile with PKCE, pushed authorization requests, and DID-based client identification. Session tokens are short-lived JWTs; refresh tokens enable persistent sessions.

Data

Event Stream (Firehose)

Every PDS emits a real-time event stream (WebSocket) of repository commits. Each event contains the repo DID, the commit operation (create/update/delete), the affected records, and a signed commit object. Relays subscribe to these streams to aggregate the network.

Network

Blob Storage

Media files (images, videos) are stored as blobs alongside the repository. Blobs are content-addressed by CID (Content Identifier). The PDS serves blobs via HTTP and tracks which records reference which blobs. Blob lifecycle is tied to the records that reference them.

Data

XRPC Server

The PDS exposes AT Protocol APIs via XRPC (Cross-Server RPC) — essentially HTTP endpoints defined by Lexicon schemas. Handles both authenticated user operations (creating posts, following) and unauthenticated reads (repo sync, public profile). Endpoints follow the com.atproto.* and app.bsky.* namespaces.

Protocol

Account Migration

Users can migrate between PDS instances without losing data or followers. The migration process: export the signed repo from the old PDS, import to the new PDS, update the DID document to point to the new PDS. Since the repo is signed by the user's key (not the server), any PDS can verify and host it.

Data
Self-Hosting

The official PDS distribution is a TypeScript/Node.js application with SQLite for metadata and local disk for blob storage. It is designed to run on minimal hardware — a single VPS can comfortably host hundreds of accounts. Self-hosters point their domain at the PDS and configure DID resolution, then connect to the network by having a Relay subscribe to their firehose.

04

Data Repositories & MST

Every user on AT Protocol has a data repository — a signed, content-addressed data structure containing all their records. The repo uses a Merkle Search Tree (MST) for efficient verification and sync, and CBOR/CID-based encoding for compact, deterministic serialization.

Merkle Search Tree (MST)

The MST is a B-tree-like structure where each node's position is determined by the leading zeros of the SHA-256 hash of its key. This creates a deterministic, balanced tree that enables efficient diff-based sync: two repos can exchange only the tree nodes that differ, rather than the entire dataset. Each commit object contains the root CID of the MST, signed by the user's key.
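
The placement rule can be illustrated with a few lines of Python. This sketch assumes the layer is the number of leading zero bits of SHA-256(key) counted in 2-bit chunks, which is how the atproto repository specification is generally described; treat the chunk size as an assumption and check the specification before relying on it. The function names are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count zero bits at the start of a byte string."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # 8 - bit_length gives the zeros in front of the first set bit.
        count += 8 - byte.bit_length()
        break
    return count


def mst_layer(key: str) -> int:
    """Layer of a key: leading zero bits of SHA-256(key), in 2-bit chunks."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return leading_zero_bits(digest) // 2


for key in ["app.bsky.feed.post/3jui7k", "app.bsky.actor.profile/self"]:
    print(key, "-> layer", mst_layer(key))
```

Because the layer depends only on the key, two implementations holding the same records build the same tree, which is what makes node-level diffing possible.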

Record Types

Records are typed by Lexicon NSID (e.g., app.bsky.feed.post, app.bsky.graph.follow). Each record is a CBOR-encoded object stored at a path like collection/rkey. The collection is the Lexicon ID; the rkey is a TID (timestamp-based ID) or self-describing key.

Protocol

Signed Commits

Every change to a repo produces a new commit object containing: the repo DID, a sequence number, the new MST root CID, the previous commit CID, and a signature from the user's signing key. This creates an auditable, tamper-evident chain of all changes.

Data
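
The tamper-evident property of the commit chain can be sketched without any protocol libraries. The example below is a simplification under stated assumptions: it hashes canonical JSON with SHA-256 as a stand-in for DAG-CBOR and CIDs, uses a plain integer for the revision, and omits the signature entirely (real commits are signed with the user's key). All names are hypothetical.

```python
import hashlib
import json
from typing import Optional


def block_id(obj: dict) -> str:
    """Stand-in for a CID: SHA-256 over canonical JSON (not DAG-CBOR)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(data).hexdigest()


def make_commit(did: str, rev: int, data_root: str, prev: Optional[str]) -> dict:
    """Build an unsigned commit; a real commit also carries a signature."""
    return {"did": did, "rev": rev, "data": data_root, "prev": prev}


def verify_chain(commits: list) -> bool:
    """Check that each commit points at the hash of the one before it."""
    prev_id = None
    for commit in commits:
        if commit["prev"] != prev_id:
            return False
        prev_id = block_id(commit)
    return True


c1 = make_commit("did:plc:abc123", 1, "root-a", None)
c2 = make_commit("did:plc:abc123", 2, "root-b", block_id(c1))
print(verify_chain([c1, c2]))

c1["data"] = "forged-root"  # tampering changes c1's hash, breaking c2's link
print(verify_chain([c1, c2]))
```

The first check passes and the second fails, because editing an earlier commit changes its hash and invalidates every later link.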

CBOR + CID Encoding

Records use DAG-CBOR (Concise Binary Object Representation) for deterministic serialization. Every object is addressable by its CID (Content Identifier) — a hash of its CBOR bytes. This content-addressing enables deduplication, integrity verification, and efficient sync across the network.

Data

CAR File Export

Repositories can be exported as CAR (Content ARchive) files — a standard IPLD format that bundles all blocks (MST nodes, records, commit objects) into a single binary file. This is the format used for account migration and full repo backup. Any conforming implementation can read and verify a CAR export.

Data
Repository Structure
User Signs with Key → Commit Object → MST Root (CID) → Collection Nodes → Records (CBOR)
05

Relay & Big Graph Service (BGS)

The Relay (also called Big Graph Service or BGS) is the network aggregation layer. It subscribes to the firehose of every known PDS, validates and merges the event streams, and re-broadcasts a unified firehose that downstream consumers (App Views, Feed Generators, Labelers) can subscribe to. It is the backbone of AT Protocol's federated data flow.

Firehose Aggregation

The Relay maintains WebSocket connections to every known PDS in the network. It receives commit events, validates signatures and repo integrity, deduplicates, and merges them into a single ordered firehose. At scale, this stream carries tens of thousands of events per second.

Network

PDS Crawling & Discovery

The Relay discovers new PDS instances through multiple channels: manual registration, DID document resolution, and crawling references in the data. When a new PDS is found, the Relay subscribes to its firehose and begins backfilling historical data by syncing full repos.

Network

Repo Verification

The Relay validates that incoming commits are properly signed by the repo owner's key (resolved via the DID document). It verifies MST integrity, checks sequence numbers for gaps, and rejects malformed or unauthorized data. This prevents PDS operators from forging content on behalf of their users.

Network

Cursor-Based Consumption

Downstream consumers connect to the Relay's firehose with a cursor (sequence number). If disconnected, they can resume from their last cursor without missing events. The Relay buffers a window of recent events to support this catch-up mechanism, enabling reliable at-least-once delivery.

Network
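
Cursor-based resumption can be sketched as follows. The subscription path com.atproto.sync.subscribeRepos is the firehose method; everything else here (the relay hostname, the FirehoseConsumer class, and the in-memory event dicts) is a hypothetical stand-in, and no WebSocket connection is opened.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def subscribe_url(relay_host: str, cursor: Optional[int] = None) -> str:
    """WebSocket URL for the firehose, optionally resuming from a cursor."""
    base = f"wss://{relay_host}/xrpc/com.atproto.sync.subscribeRepos"
    return base if cursor is None else f"{base}?{urlencode({'cursor': cursor})}"


class FirehoseConsumer:
    """Tracks the last processed sequence number and skips replayed events."""

    def __init__(self) -> None:
        self.cursor: Optional[int] = None
        self.processed: list = []

    def handle(self, events: Iterable[dict]) -> None:
        for event in events:
            seq = event["seq"]
            if self.cursor is not None and seq <= self.cursor:
                continue  # duplicate from an at-least-once replay
            self.processed.append(event)
            self.cursor = seq


consumer = FirehoseConsumer()
consumer.handle([{"seq": 1}, {"seq": 2}, {"seq": 3}])
# After a reconnect the relay may replay events around the cursor.
consumer.handle([{"seq": 3}, {"seq": 4}])
print(subscribe_url("relay.example", consumer.cursor))
print([e["seq"] for e in consumer.processed])
```

At-least-once delivery means a consumer can see the same event twice, so skipping sequence numbers it has already handled keeps processing idempotent.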
Federation Model

AT Protocol uses a "big world" federation model, unlike ActivityPub's server-to-server approach. Instead of every server talking to every other server, PDS instances push data to Relays, and consumers pull from Relays. This hub-and-spoke pattern reduces the N-squared connection problem and enables global-scale indexing. Multiple independent Relays can coexist, each aggregating the full network or a subset.

06

App View

The App View is the indexing and API layer that transforms raw repository data into the rich, queryable APIs that client applications consume. It subscribes to the Relay firehose, builds materialized views (timelines, thread trees, notification lists, search indices), and serves the app.bsky.* Lexicon endpoints.

Firehose Consumer

The App View subscribes to the Relay's unified firehose and processes every event in real time. It parses records by Lexicon type, resolves references (reply parents, quote posts, embeds), hydrates user profiles, and updates its materialized views. Handles the full app.bsky.* record namespace.

Feed / Indexing

Timeline Construction

Builds per-user timelines by indexing follow relationships and chronologically ordering posts from followed accounts. The "Following" feed is computed server-side by the App View. Custom algorithmic feeds are delegated to external Feed Generators via the Feed Generator protocol.

Feed / Indexing

Thread Resolution

Resolves reply trees into threaded conversations. Posts reference their parent and root via AT URIs (at://did/collection/rkey). The App View materializes these references into navigable thread structures, handling deleted posts, blocked users, and deeply nested replies.

Feed / Indexing
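
A minimal sketch of thread materialization, assuming each indexed post is reduced to its own AT URI plus the URI of its parent. The dict layout, the "deleted" placeholder flag, and the function name are hypothetical; a real App View also handles blocks, depth limits, and reference cycles.

```python
from collections import defaultdict


def build_thread(posts: list, root_uri: str) -> dict:
    """Nest replies under their parent, starting from root_uri."""
    children = defaultdict(list)
    for post in posts:
        if post.get("parent"):
            children[post["parent"]].append(post["uri"])
    known = {post["uri"] for post in posts}

    def node(uri: str) -> dict:
        return {
            "uri": uri,
            # A post missing from the index is kept as a placeholder.
            "deleted": uri not in known,
            "replies": [node(child) for child in sorted(children[uri])],
        }

    return node(root_uri)


posts = [
    {"uri": "at://did:plc:abc/app.bsky.feed.post/1", "parent": None},
    {"uri": "at://did:plc:xyz/app.bsky.feed.post/2",
     "parent": "at://did:plc:abc/app.bsky.feed.post/1"},
    {"uri": "at://did:plc:abc/app.bsky.feed.post/3",
     "parent": "at://did:plc:xyz/app.bsky.feed.post/2"},
]
thread = build_thread(posts, posts[0]["uri"])
print(thread["replies"][0]["replies"][0]["uri"])
```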

Search & Discovery

Full-text search for posts and user profiles, trending topic detection, and suggested follows. The search infrastructure indexes record content, user metadata, and social signals. Powers the Explore/Discover features in client apps.

Feed / Indexing

Notification System

Generates notifications by indexing events that reference a user: likes on their posts, replies, follows, mentions, quote posts, and reposts. Notifications are computed from the firehose events and served via the app.bsky.notification.* endpoints.

Feed / Indexing

Label Application

The App View integrates labels from Labeler services into API responses. When serving content, it attaches relevant labels (content warnings, flags, categories) so clients can implement their own moderation display logic based on user preferences.

Moderation
07

Feed Generators

Feed Generators are one of AT Protocol's most distinctive features: algorithmic feeds are external services that anyone can build and publish. Instead of a single company controlling what you see, users choose from an open marketplace of feed algorithms — from simple chronological filters to sophisticated ML-powered recommendation engines.

Feed Generator Protocol

A Feed Generator is an XRPC service that implements the app.bsky.feed.getFeedSkeleton endpoint. It receives a request naming the feed, a page limit, and a cursor, with the requesting user's DID carried in the signed auth token, and returns an ordered list of post AT URIs (the "skeleton"). The App View then hydrates these URIs into full post objects with profiles, embeds, and labels.

Feed
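
The skeleton response is small enough to sketch directly. The response shape (a "feed" list of objects with a "post" URI, plus an optional "cursor") follows the getFeedSkeleton endpoint; the pre-ranked list, the offset-based cursor, and the function signature are assumptions made for illustration, since a real generator is free to choose its own cursor encoding.

```python
from typing import Optional


def get_feed_skeleton(ranked_uris: list, limit: int = 50,
                      cursor: Optional[str] = None) -> dict:
    """Return one page of post URIs plus a cursor for the next page."""
    start = int(cursor) if cursor else 0
    page = ranked_uris[start:start + limit]
    response = {"feed": [{"post": uri} for uri in page]}
    if start + limit < len(ranked_uris):
        response["cursor"] = str(start + limit)  # opaque to the caller
    return response


# Hypothetical index, already ranked by the generator's algorithm.
index = [
    "at://did:plc:abc/app.bsky.feed.post/3",
    "at://did:plc:xyz/app.bsky.feed.post/2",
    "at://did:plc:abc/app.bsky.feed.post/1",
]
first = get_feed_skeleton(index, limit=2)
second = get_feed_skeleton(index, limit=2, cursor=first["cursor"])
print(first)
print(second)
```

The generator returns only URIs; profiles, embeds, and labels are added later during hydration.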

Firehose Indexing

Feed Generators typically subscribe to the Relay firehose to build their own index of posts. They filter and score content based on their algorithm (topic, language, engagement, ML signals) and store a ranked index. When a skeleton is requested, they query this index and return matching post URIs.

Feed

Feed Registration

Feed Generators publish a generator record (app.bsky.feed.generator) in their creator's repository with metadata: display name, description, avatar, and the service endpoint URL. Users "pin" feeds to their sidebar by saving references to these generator records. The App View resolves the endpoint and proxies skeleton requests.

Feed

Open Marketplace

Anyone can build and host a Feed Generator. Popular examples include topic feeds (science, art, sports), language-specific feeds, "mutuals only" feeds, and community-curated feeds. This separation of content ranking from content hosting is a core architectural principle of AT Protocol.

Feed
Feed Skeleton Flow
Client (Requests Feed) → App View (Proxies) → Feed Generator (Returns Skeleton) → App View (Hydrates Posts) → Client (Renders Feed)
08

Moderation Architecture

AT Protocol's moderation system is composable and multi-layered. Rather than a single moderation authority, the protocol defines a labeling system where independent Labeler services can flag content, and users choose which labelers to subscribe to. This enables community-driven, transparent, and customizable moderation.

Labeler Services

Labelers are independent services that subscribe to the firehose, analyze content, and emit labels (e.g., "nudity", "spam", "misleading"). Labels are signed by the labeler's DID and published via the com.atproto.label.* endpoints. Users subscribe to labelers they trust, and clients apply labels according to user preferences.

Moderation

Ozone Moderation Tool

Ozone is the open-source moderation dashboard built by Bluesky. It provides a web interface for reviewing reports, applying labels, managing appeals, and coordinating moderation teams. Any labeler operator can run an Ozone instance. It connects to the labeler backend and the firehose for real-time review queues.

Moderation

Composable Stacking

Users can subscribe to multiple labelers simultaneously. Labels from different sources stack: Bluesky's official labeler, community-run topical labelers, and personal block/mute lists all combine. Clients resolve conflicts using priority rules and user preferences (hide, warn, or show).

Moderation
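
One plausible way for a client to combine stacked labels is "most restrictive wins". The sketch below assumes that rule, which is an illustrative choice rather than the documented client behavior; the label fields src (labeler DID) and val (label value) mirror the label format, while the preference map and function name are hypothetical.

```python
SEVERITY = {"show": 0, "warn": 1, "hide": 2}


def resolve_visibility(labels: list, subscribed: set, prefs: dict) -> str:
    """Pick the most restrictive action among labels from subscribed labelers."""
    decision = "show"
    for label in labels:
        if label["src"] not in subscribed:
            continue  # labels from labelers the user has not chosen are ignored
        action = prefs.get(label["val"], "show")
        if SEVERITY[action] > SEVERITY[decision]:
            decision = action
    return decision


labels = [
    {"src": "did:plc:labeler1", "val": "spam"},
    {"src": "did:plc:labeler2", "val": "nudity"},
    {"src": "did:plc:unknown", "val": "misleading"},
]
subscribed = {"did:plc:labeler1", "did:plc:labeler2"}
prefs = {"spam": "hide", "nudity": "warn", "misleading": "hide"}
print(resolve_visibility(labels, subscribed, prefs))
```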

Takedown Layers

Content removal operates at multiple levels: PDS operators can remove content from their hosting, the App View can filter content from API responses, and labelers can flag content for client-side hiding. This layered approach means no single entity has absolute power, but each layer can act independently based on its policies.

Moderation
Moderation Philosophy

AT Protocol explicitly rejects the idea that moderation must be all-or-nothing. By separating content hosting (PDS), content indexing (App View), content ranking (Feed Generators), and content labeling (Labelers), the protocol creates checks and balances. A PDS cannot prevent you from being seen on other PDS instances. A labeler cannot delete your data. An App View cannot prevent other App Views from indexing you. Users choose their own trust boundaries.

09

Protocol Layer — Lexicon, XRPC & Data Model

The AT Protocol defines the wire format, schema system, and RPC conventions that enable all services to interoperate. Lexicon provides type-safe API schemas, XRPC maps them to HTTP, and the data model ensures records are portable and verifiable across the network.

Lexicon Schema System

Lexicon is AT Protocol's schema definition language. Each API endpoint and record type is defined by a Lexicon document (JSON) with a reverse-DNS NSID (e.g., app.bsky.feed.post). Lexicons define field types, constraints, and references. They enable code generation, validation, and forward-compatible schema evolution.

Protocol

XRPC (Cross-Server RPC)

XRPC maps Lexicon methods to HTTP endpoints. Queries map to GET requests, procedures to POST. The endpoint path is derived from the NSID (e.g., /xrpc/app.bsky.feed.getTimeline). XRPC also defines subscription methods for WebSocket streams (used for the firehose). Bodies are application/json for most queries and procedures; repo sync endpoints return binary data such as CAR files (application/vnd.ipld.car), and firehose frames are encoded as DAG-CBOR.

Protocol
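
The NSID-to-HTTP mapping can be sketched in a few lines. The /xrpc/ path prefix and the GET/POST split come from the description above; the helper name, the "kind" argument, and the hostname are hypothetical, and no request is sent.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def xrpc_request(host: str, nsid: str, kind: str,
                 params: Optional[dict] = None) -> Tuple[str, str]:
    """Return (HTTP method, URL) for a Lexicon query or procedure."""
    if kind not in ("query", "procedure"):
        raise ValueError("kind must be 'query' or 'procedure'")
    method = "GET" if kind == "query" else "POST"
    url = f"https://{host}/xrpc/{nsid}"
    if params:
        url += "?" + urlencode(params)
    return method, url


print(xrpc_request("pds.example", "app.bsky.feed.getTimeline", "query", {"limit": 30}))
print(xrpc_request("pds.example", "com.atproto.repo.createRecord", "procedure"))
```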

AT URI Scheme

Records are referenced by AT URIs: at://did:plc:abc/app.bsky.feed.post/3jui7k. The scheme encodes the authority (DID), collection (Lexicon NSID), and record key (rkey). AT URIs are the universal pointer format for cross-referencing records (replies, quotes, embeds) across the network.

Protocol
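
Splitting an AT URI into its three parts is straightforward. The parser below is a simplified sketch with a hypothetical name: it assumes the authority/collection/rkey layout described above and does not validate the DID, NSID, or record key syntax.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str
    collection: Optional[str]
    rkey: Optional[str]


def parse_at_uri(uri: str) -> AtUri:
    """Split 'at://authority/collection/rkey' into its components."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an AT URI")
    parts = uri[len(prefix):].split("/")
    if not parts[0] or len(parts) > 3:
        raise ValueError("malformed AT URI")
    authority = parts[0]
    collection = parts[1] if len(parts) > 1 else None
    rkey = parts[2] if len(parts) > 2 else None
    return AtUri(authority, collection, rkey)


print(parse_at_uri("at://did:plc:abc/app.bsky.feed.post/3jui7k"))
```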

Namespaced IDs (NSIDs)

NSIDs use reverse-DNS notation to namespace all Lexicon types. The app.bsky.* namespace is Bluesky's social application; com.atproto.* is the core protocol; third parties can define their own (e.g., blue.zio.*, community.lexicon.*). This enables extensibility without coordination.

Protocol
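
The reverse-DNS structure means the controlling domain can be recovered from an NSID. This sketch assumes that every segment except the last forms the authority, which is read in reverse to get a domain name; the function name is hypothetical and no syntax validation beyond a segment count is performed.

```python
from typing import Tuple


def split_nsid(nsid: str) -> Tuple[str, str]:
    """Split an NSID into its controlling domain and its name segment."""
    segments = nsid.split(".")
    if len(segments) < 3:
        raise ValueError("an NSID needs at least three segments")
    *authority, name = segments
    domain = ".".join(reversed(authority))
    return domain, name


print(split_nsid("app.bsky.feed.post"))
print(split_nsid("com.atproto.repo.createRecord"))
```

Because the authority maps back to a domain, whoever controls that domain controls the namespace, which is what lets third parties define schemas without central coordination.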

Record Key (rkey)

Records within a collection are identified by their rkey. Most use TIDs (Timestamp IDs): base32-encoded microsecond timestamps that sort chronologically and make collisions unlikely. Some collections use fixed keys instead (e.g., "self" for profile records).

Protocol
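
A TID can be sketched as a 64-bit integer rendered in a sortable base32 alphabet. The layout assumed here (a 53-bit microsecond timestamp followed by a 10-bit clock identifier, encoded as 13 characters from the alphabet 234567abcdefghijklmnopqrstuvwxyz) reflects the TID format as commonly documented; treat the details as assumptions and the function names as hypothetical.

```python
ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"  # sortable base32


def encode_tid(timestamp_us: int, clock_id: int = 0) -> str:
    """Pack a microsecond timestamp and a 10-bit clock ID into 13 characters."""
    if not 0 <= timestamp_us < 2**53 or not 0 <= clock_id < 2**10:
        raise ValueError("value out of range")
    value = (timestamp_us << 10) | clock_id
    chars = []
    for _ in range(13):
        chars.append(ALPHABET[value & 31])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def decode_timestamp_us(tid: str) -> int:
    """Recover the microsecond timestamp from a TID string."""
    value = 0
    for ch in tid:
        value = (value << 5) | ALPHABET.index(ch)
    return value >> 10


earlier = encode_tid(1_700_000_000_000_000)
later = encode_tid(1_700_000_000_000_001)
print(earlier, later, earlier < later)
```

Since the alphabet is in ASCII order and every TID has the same length, plain string comparison orders records by creation time.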

Inter-Service Auth

Services authenticate to each other using signed JWTs with the service's DID as the issuer. The App View authenticates to PDS instances when proxying user requests. Feed Generators verify the App View's identity. This chain of signed tokens ensures each hop in the request path is authenticated and authorized.

Protocol
Record Type | Lexicon NSID | Description
Post | app.bsky.feed.post | A text post with optional embeds (images, links, quote posts), facets (mentions, links, tags), and reply references
Like | app.bsky.feed.like | A like on a post, referencing the subject post's AT URI and CID
Repost | app.bsky.feed.repost | A repost/boost of another user's post
Follow | app.bsky.graph.follow | A follow relationship, stored in the follower's repo with the followed DID as the subject
Profile | app.bsky.actor.profile | User display name, bio, avatar, and banner image (rkey is always "self")
Block | app.bsky.graph.block | A block record that prevents mutual interaction (bidirectional enforcement)
List | app.bsky.graph.list | A curated list of users (mute list, moderation list, or curation list)
Feed Generator | app.bsky.feed.generator | Declaration record for a custom feed algorithm with service endpoint
Labeler | app.bsky.labeler.service | Declaration record for a labeler service with supported label definitions
