I’ve been wanting to play with Cloudflare’s Agents and Durable Objects for a while now. Stateless serverless functions are great for simple API endpoints, but coordinating long-running async tasks usually forces you back to a traditional backend with database polling or message queues.

What usually works for me when learning something new is to get my hands dirty and build something real. Reading documentation wasn’t going to cut it.

So, I built IsSafe.store, a free utility that checks online stores before you buy. You paste a store URL, and it runs a series of background checks (RDAP, threat lists, review aggregation, and AI synthesis) to generate a public-signal risk report.

This post describes how I built it and what I learned about stateful edge orchestration along the way.

The problem with long edge checks

Gathering reputation signals for an online store isn’t instant. The application needs to query Google Web Risk, check RDAP servers, check URLhaus, run searches via Tavily, fetch the site’s SSL certificate, and feed everything to Cloudflare Workers AI.

This process takes 5 to 25 seconds. In a traditional stateless serverless environment, that causes a few problems:

  • If the client disconnects or the worker is evicted mid-run, you lose all progress.
  • If you block an HTTP request for 20 seconds, you can hit timeouts and give the user a terrible experience.
  • If you use database polling, you have to track yourself which steps are already done.

Durable Objects and the Agents SDK solve this nicely. Instead of a database-driven queue, each safety check gets its own stateful actor living at the edge. The client starts a check, receives a report ID, and immediately starts polling the agent’s current state while the agent runs the heavy lifting in the background.

Durable Objects with the Agents SDK

The Agents SDK does a lot of heavy lifting. Instead of writing raw Durable Object code, you extend the base Agent class and let the SDK handle the RPC routing.

Here is the core structure of my StoreSafetyAgent:

// src/agents/store-safety-agent.ts
import { Agent, type FiberRecoveryContext } from 'agents'
import { runStoreResearch } from '../lib/research'
import { saveReport } from '../lib/reports-db'
import type { StoreSafetyReport, StoreSafetyRequest, StoreSafetyState } from '../types/report'

type StoreSafetyAgentState = StoreSafetyState & {
  checkpoint: StoreResearchCheckpoint | null
}

export class StoreSafetyAgent extends Agent<Env, StoreSafetyAgentState> {
  initialState: StoreSafetyAgentState = {
    report: null,
    updatedAt: null,
    checkpoint: null,
  }

  async startCheck(request: StoreSafetyRequest): Promise<StoreSafetyReport> {
    const existing = this.state.report

    // Reuse the existing report if it matches this request, has not failed, and has not expired
    if (
      existing?.id === request.id &&
      existing.status !== 'failed' &&
      new Date(existing.expiresAt).getTime() > Date.now()
    ) {
      return existing
    }

    const queuedReport = createBaseReport(request, 'queued', 'Store safety check queued.', this.env, 0)

    this.setState({
      report: queuedReport,
      updatedAt: new Date().toISOString(),
      checkpoint: null,
    })

    // Enqueue the background work to run asynchronously
    await this.queue('processCheck', request)

    return queuedReport
  }

  async getCurrentReport(): Promise<StoreSafetyReport | null> {
    return this.state.report
  }
}

When a user submits a URL, startStoreCheck creates a deterministic ID based on the normalized domain name. It resolves the agent via getAgentByName using this ID, and calls startCheck.

Because Durable Objects guarantee that only a single instance of a given ID runs globally, you don’t have to worry about race conditions where two users checking the same domain at the exact same second launch duplicate research pipelines.
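As a rough sketch of the deterministic ID step, assuming the normalization simply lowercases the hostname and strips a leading www. prefix. The real startStoreCheck may normalize differently, and normalizeStoreDomain and toReportId are hypothetical names used only for illustration:

```typescript
// Hypothetical helpers; the names and normalization rules are assumptions.

/** Reduce a user-supplied store URL to a canonical hostname. */
export function normalizeStoreDomain(input: string): string {
  const trimmed = input.trim()
  // Accept bare domains such as "example.com" by assuming https.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed)
  const withScheme = hasScheme ? trimmed : `https://${trimmed}`
  const hostname = new URL(withScheme).hostname.toLowerCase()
  return hostname.replace(/^www\./, '')
}

/** Same domain in, same ID out, so every request for a store maps to one agent. */
export function toReportId(input: string): string {
  return `store:${normalizeStoreDomain(input)}`
}
```

An ID built this way would then be handed to getAgentByName, so a user typing Example.com and another pasting https://www.example.com/shop end up on the same Durable Object.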

Named Fibers and Checkpoints

One of the most interesting things I learned while building this was resilience through checkpoints. Cloudflare Workers are ephemeral: isolates can be restarted at any point, especially during a 20-second background process.

To handle this, I split my research pipeline into sequential steps:

  1. Site Evidence (HTML, policy links, SSL)
  2. RDAP (Domain Age)
  3. Threat Lists (URLhaus, OpenPhish, Google Web Risk)
  4. Reputation Searches (Tavily scoped query)
  5. Numeric Scoring
  6. AI Synthesis

After each step completes, I stash a checkpoint in the agent’s fiber context. If the isolate restarts, onFiberRecovered fires, reads the stashed checkpoint, restores the progressive UI status, and re-queues the background processing right where it left off instead of restarting from scratch:

// src/agents/store-safety-agent.ts (continued)
  override async onFiberRecovered(ctx: FiberRecoveryContext): Promise<void> {
    if (!ctx.name.startsWith('store-check:')) {
      return
    }

    const checkpoint = toCheckpoint(ctx.snapshot)
    if (!checkpoint) return

    const progress = getCheckpointProgress(checkpoint)
    
    // Restore the progressive loading status for the polling UI
    this.setState({
      report: createProgressReport(
        this.state.report,
        checkpoint.request,
        progress.status,
        progress.summary,
        this.env,
        progress.progressStep,
      ),
      updatedAt: new Date().toISOString(),
      checkpoint,
    })
    
    await this.queue('processCheck', checkpoint.request)
  }

The background fiber itself handles stashing these checkpoints in real time as the research runs:

// src/agents/store-safety-agent.ts (continued)
  private async runCheckFiber(
    request: StoreSafetyRequest,
    checkpoint: StoreResearchCheckpoint | null,
  ) {
    return this.runFiber(`store-check:${request.id}`, async (ctx) => {
      const report = await runStoreResearch(
        request,
        this.env,
        (status, summary, progressStep) => {
          // Updates UI state for polling clients
          this.setState({
            report: createProgressReport(this.state.report, request, status, summary, this.env, progressStep),
            updatedAt: new Date().toISOString(),
            checkpoint: this.state.checkpoint,
          })
        },
        {
          checkpoint,
          onCheckpoint: (nextCheckpoint) => {
            // Save the checkpoint to the fiber context and to the agent state
            ctx.stash(nextCheckpoint)
            this.setState({
              report: this.state.report,
              updatedAt: new Date().toISOString(),
              checkpoint: nextCheckpoint,
            })
          },
        },
      )

      // Finally, persist report to D1 cache
      await this.retry(
        () => saveReport(this.env.DB, report),
        { maxAttempts: 3, baseDelayMs: 250 },
      )

      return report
    })
  }
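runStoreResearch itself is not shown here. A minimal sketch of how a checkpoint-aware runner could skip finished steps, assuming a checkpoint that simply records completed step names and their outputs. The Checkpoint shape, the Step type, and runSteps are hypothetical and are not the actual StoreResearchCheckpoint:

```typescript
// Hypothetical shapes for illustration; the real checkpoint type may differ.
type Checkpoint = { completed: Record<string, unknown> }

type Step = {
  name: string
  run: (previous: Record<string, unknown>) => Promise<unknown>
}

export async function runSteps(
  steps: Step[],
  checkpoint: Checkpoint | null,
  onCheckpoint: (next: Checkpoint) => void,
): Promise<Record<string, unknown>> {
  // Start from whatever a previous run already finished.
  let completed: Record<string, unknown> = { ...(checkpoint?.completed ?? {}) }

  for (const step of steps) {
    // Skip steps whose result survived the restart.
    if (step.name in completed) continue

    const result = await step.run(completed)
    completed = { ...completed, [step.name]: result }

    // Emit a checkpoint after every step so a restart loses at most one step.
    onCheckpoint({ completed: { ...completed } })
  }

  return completed
}
```

With this shape, the onCheckpoint callback is where something like ctx.stash would be called, and a recovered run passes the stashed value back in as the checkpoint argument.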

Writing this resilient loop forced me to think about my state structures in a much more granular way. It was a nice change from the usual try/catch approach.

Designing a deterministic score

I did not want an AI model to invent the risk score. To make the tool reliable, the score engine is plain, deterministic TypeScript.

The pipeline collects the raw evidence (like “domain is less than 3 months old”, “SSL is issued by Let’s Encrypt”, or “domain found in URLhaus threat list”). It then runs a deterministic factor model that weighs these inputs, with weights chosen so that ordinary review noise doesn’t trigger a critical alert.
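To make the idea concrete, here is a minimal sketch of a deterministic factor model. The factor names, weights, and thresholds are invented for illustration and are not the values IsSafe.store uses:

```typescript
// Illustrative only: factor names, weights, and thresholds are assumptions.
type Evidence = {
  domainAgeDays: number | null
  onThreatList: boolean
  negativeReviewRatio: number // expected range 0..1
}

type RiskLevel = 'low' | 'medium' | 'high' | 'critical'

export function scoreStore(e: Evidence): { score: number; level: RiskLevel } {
  let score = 0

  // A threat-list hit is strong evidence, so it dominates the score.
  if (e.onThreatList) score += 70

  // Very young domains are riskier; an unknown age counts as a mild signal.
  if (e.domainAgeDays === null) score += 10
  else if (e.domainAgeDays < 90) score += 25

  // Reviews are noisy, so their contribution is capped well below "critical".
  const ratio = Math.min(Math.max(e.negativeReviewRatio, 0), 1)
  score += Math.round(ratio * 20)

  score = Math.min(score, 100)
  const level: RiskLevel =
    score >= 70 ? 'critical' : score >= 45 ? 'high' : score >= 20 ? 'medium' : 'low'
  return { score, level }
}
```

Because the same evidence always produces the same score, the result can be unit-tested and explained factor by factor, which a model-generated number cannot.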

Workers AI is used precisely for what it is best at:

  • Structuring raw strings: Classifying contact details or interpreting unstructured feedback.
  • Synthesis: Taking the deterministic score, the confidence rating, and the parsed evidence list to output a friendly, readable summary.

One interesting bit: I wanted to run entirely on Cloudflare’s free tier, so if my Workers AI daily quota is exhausted, the app degrades gracefully. The report is still generated with the correct score and the listed evidence, but it skips the AI summary.
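A sketch of that fallback, assuming the AI call is wrapped in a function that throws when the quota is exhausted. The names summarize, ScoredReport, FinalReport, and withOptionalSummary are hypothetical:

```typescript
// Hypothetical types and names; only the fallback pattern matters here.
type ScoredReport = { score: number; evidence: string[] }
type FinalReport = ScoredReport & { aiSummary: string | null }

export async function withOptionalSummary(
  report: ScoredReport,
  summarize: (report: ScoredReport) => Promise<string>,
): Promise<FinalReport> {
  try {
    return { ...report, aiSummary: await summarize(report) }
  } catch (error) {
    // Quota errors, timeouts, and model failures all end up here.
    // The deterministic score and evidence are left untouched.
    console.warn('AI summary skipped:', error)
    return { ...report, aiSummary: null }
  }
}
```

The UI then only needs to check whether aiSummary is null to decide if the summary section should be rendered.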

TanStack Start

For the frontend, I chose TanStack Start. The main reason behind this choice was its native compatibility with Cloudflare Workers. Instead of dealing with an architectural mismatch between Next.js and Cloudflare, I can compile my entire React app, file-based API routes, and colocated server functions into a single Worker script.

The integration flow is very simple:

  • The client submits the URL check via a server function at /api/check.
  • The user is redirected to the dynamic /report/:id route.
  • A simple useEffect poller queries /api/check/:id to fetch the latest progress state right from the stateful Durable Object.
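The polling loop can be sketched without any framework as follows. It assumes /api/check/:id returns JSON with a status field and that complete and failed are the terminal statuses. Those details, the pollReport name, and its options are assumptions rather than the app’s actual code:

```typescript
// Assumed response shape; the real report has more fields.
type ReportSnapshot = { status: string; summary?: string }

export async function pollReport(
  id: string,
  onUpdate: (report: ReportSnapshot) => void,
  options: {
    intervalMs?: number
    maxAttempts?: number
    // Injectable for tests; defaults to the global fetch.
    fetchReport?: (id: string) => Promise<ReportSnapshot>
  } = {},
): Promise<ReportSnapshot> {
  const { intervalMs = 1000, maxAttempts = 60 } = options
  const fetchReport =
    options.fetchReport ??
    (async (reportId: string): Promise<ReportSnapshot> => {
      const res = await fetch(`/api/check/${encodeURIComponent(reportId)}`)
      if (!res.ok) throw new Error(`Polling failed with HTTP ${res.status}`)
      return (await res.json()) as ReportSnapshot
    })

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const report = await fetchReport(id)
    onUpdate(report)
    // Stop once the agent reports a terminal status (assumed names).
    if (report.status === 'complete' || report.status === 'failed') return report
    await new Promise((resolve) => setTimeout(resolve, intervalMs))
  }
  throw new Error(`Report ${id} did not finish after ${maxAttempts} polls`)
}
```

In a React component this would be started from useEffect, with a cancellation flag so that polling stops when the component unmounts.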

Results and conclusions

If you’re looking for a project to learn these tools, I highly recommend building a utility that coordinates several third-party APIs. There is no better way to understand state, eviction, and edge orchestration than letting your code handle real-world network failures in real time.

If you have any questions, let me know.