Syncs
HubSpot never shipped a first-class sync primitive, so every integration team rebuilds the same machinery: a scheduler, cursor storage, batching, backoff, and a pile of property-creation scripts. An HS-X sync collapses that to a declaration. You describe the source and the destination; the runtime owns the plumbing, in your own Cloudflare account.
TL;DR: Declare a source with defineSource (pull: a fetch that returns { key, data } rows and a cursor) or defineSource.push (push: a receive behind an HMAC-verified webhook). Attach it with worker.sync: a schedule, a destination object, a typed schema, and a manageSchema mode. The runtime schedules runs, persists the cursor, batches writes into the destination object, and shares the portal's rate budget. The worked pull example is Email Guard's suppression-list sync; hs-x init --type sync-source scaffolds a push source to start from.
A sync is a declaration, not a service
The integration you have probably written before looks like this: a cron job somewhere, a database row holding "last synced at," a loop that pages an API, a batch upsert against HubSpot, and retry code you wrote at 2am after the first rate-limit incident. None of that logic was your product. It was the cost of moving rows.
HS-X splits a sync into the two parts you actually care about and absorbs the rest:
- A source produces rows. Pull sources fetch pages from an external API on a schedule. Push sources receive rows when the external system calls a webhook.
- A sync binds that source to a HubSpot destination: which object the rows land in, what the typed schema is, and whether HS-X manages that schema in the portal.
Scheduling, cursor persistence, batching, retry, and rate-limit fairness belong to the runtime, which runs in your own Cloudflare account. One sync misbehaving cannot starve your workflow actions; the limiter shares the portal budget across capabilities.
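A minimal sketch of how a shared budget can keep one capability from starving another, assuming a two-level token bucket: a portal-wide bucket bounds total calls, each capability also draws from its own smaller bucket, and a request goes through only when both levels have a token. TokenBucket, PortalLimiter, and the numbers used below are hypothetical; this guide does not document HS-X's limiter internals.

```typescript
// Hypothetical two-level limiter; not HS-X's implementation.
// The portal bucket bounds total calls; a per-capability bucket caps how much
// of that budget a single capability (a sync, a workflow action) can use.

class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastMs = nowMs;
  }

  // Add tokens for the time elapsed since the last check, up to capacity.
  private refill(nowMs: number): void {
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastMs = nowMs;
  }

  hasToken(nowMs: number): boolean {
    this.refill(nowMs);
    return this.tokens >= 1;
  }

  take(): void {
    this.tokens -= 1;
  }
}

class PortalLimiter {
  private readonly portal: TokenBucket;
  private readonly perCapability = new Map<string, TokenBucket>();
  private readonly capabilityPerSec: number;

  constructor(portalPerSec: number, capabilityPerSec: number, nowMs: number) {
    this.portal = new TokenBucket(portalPerSec, portalPerSec, nowMs);
    this.capabilityPerSec = capabilityPerSec;
  }

  // Spends one token at both levels and returns true, or spends nothing and returns false.
  tryAcquire(capability: string, nowMs: number): boolean {
    let bucket = this.perCapability.get(capability);
    if (bucket === undefined) {
      bucket = new TokenBucket(this.capabilityPerSec, this.capabilityPerSec, nowMs);
      this.perCapability.set(capability, bucket);
    }
    if (!bucket.hasToken(nowMs) || !this.portal.hasToken(nowMs)) return false;
    bucket.take();
    this.portal.take();
    return true;
  }
}
```

With a portal budget of 10 calls per second and a per-capability cap of 6, a sync that tries to take everything in one instant gets 6 calls, and a workflow action arriving in that same instant still gets the remaining 4.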
Syncs are one-way into HubSpot in the current version. Your handlers can always read HubSpot through ctx.hubspot, but pushing changes back out to the external system is not what this primitive does today.
Push sources: rows arrive on a webhook
This is the scaffold hs-x init --type sync-source generates, trimmed to its shape:
import { defineSource, defineWorker } from "@hs-x/sdk";
const incomingCustomers = defineSource.push({
  name: "incoming-customers",
  auth: { type: "hmac", secret: process.env.INCOMING_CUSTOMERS_WEBHOOK_SECRET },
  async receive({ event }) {
    const customer = event as { external_id: string; email: string };
    return { rows: [{ key: customer.external_id, data: customer }] };
  },
});
const worker = defineWorker("sync");
worker.sync(incomingCustomers, {
  schedule: "event",
  into: "p_customer",
  schema: {
    email: "string",
    external_id: "string",
  },
  manageSchema: "full",
});
Reading it top to bottom: the source declares an HMAC secret, so the runtime verifies the webhook signature before your receive ever runs. receive turns one event into rows, each a { key, data } pair where key is the row's stable identity. The sync binds those rows to the p_customer object with a two-field typed schema. schedule: "event" means runs happen when rows arrive rather than on a clock, and manageSchema: "full" tells HS-X to manage the destination object and its properties in the portal from that schema.
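The signature check the runtime performs before receive can be pictured with Web Crypto, which Cloudflare Workers provide. This is a sketch under stated assumptions: the sender signs the raw request body with HMAC-SHA256 and sends the digest as hex. Which header carries the signature, and the exact encoding HS-X expects, are not covered in this guide, so signBody and verifyBody are hypothetical helpers.

```typescript
// Hypothetical sketch of HMAC-SHA256 webhook verification with Web Crypto.
// Assumes the signature is the hex digest of the raw request body; HS-X's real
// header name and encoding are not documented in this guide.

const encoder = new TextEncoder();

function bytesToHex(buf: ArrayBuffer): string {
  return Array.from(new Uint8Array(buf), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Decode hex into bytes; returns null for odd-length or non-hex input.
function hexToBytes(hex: string) {
  if (hex.length % 2 !== 0 || !/^[0-9a-f]*$/i.test(hex)) return null;
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

function importKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

// What the external system does before calling the webhook.
async function signBody(secret: string, rawBody: string): Promise<string> {
  const key = await importKey(secret);
  return bytesToHex(await crypto.subtle.sign("HMAC", key, encoder.encode(rawBody)));
}

// What has to pass before any receive handler runs. subtle.verify recomputes
// the MAC and compares it itself, so there is no hand-written string comparison.
async function verifyBody(secret: string, rawBody: string, signatureHex: string): Promise<boolean> {
  const signature = hexToBytes(signatureHex);
  if (signature === null) return false;
  const key = await importKey(secret);
  return crypto.subtle.verify("HMAC", key, signature, encoder.encode(rawBody));
}
```

Verify the raw body before parsing it: re-serializing parsed JSON can change the bytes the signature was computed over.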
Run hs-x dev and the dev server prints the webhook URL to point your external system at. Run hs-x check and the schema declaration is validated before anything deploys.
Pull sources: fetch pages on a schedule
A pull source implements one method, fetch, which receives the current cursor and returns a page: { key, data } rows plus the cursor to pass to the next fetch. This is Email Guard's suppression-list source from the Getting started guide. It pulls the verification provider's list of addresses that hard-bounced or drew complaints into HubSpot contacts every five minutes:
import { defineSource } from "@hs-x/sdk";
// One page from the provider's suppressions endpoint; next is absent on the last page.
type SuppressionPage = {
  next?: string;
  entries: Array<{ email: string; reason: string; suppressed_at: string }>;
};
const suppressionList = defineSource({
  name: "suppression-list",
  auth: { type: "bearer", token: process.env.EMAILCHECK_API_KEY },
  async fetch({ cursor, http }) {
    const res = await http.get("https://api.emailcheck.example/v1/suppressions", {
      query: { pageSize: 100, after: cursor },
    });
    const page = res.body as SuppressionPage;
    return {
      cursor: page.next, // undefined on the last page, which ends the run
      rows: page.entries.map((entry) => ({
        key: entry.email, // a re-delivered entry updates the same contact
        data: {
          email: entry.email,
          email_suppressed: true,
          email_suppression_reason: entry.reason,
          email_suppressed_at: entry.suppressed_at,
        },
      })),
    };
  },
});
// worker is a defineWorker(...) instance, created as in the push scaffold.
worker.sync(suppressionList, {
  into: "contacts",
  schedule: "5m",
  manageSchema: "properties",
  schema: {
    email: "string",
    email_health_status: { type: "enumeration", options: ["deliverable", "risky", "undeliverable"] },
    email_health_score: "number",
    email_suppressed: "bool",
    email_suppression_reason: { type: "enumeration", options: ["bounce", "complaint", "manual"] },
    email_suppressed_at: "datetime",
  },
});
Two things differ from the push shape. The source declares auth: { type: "bearer", ... } and gets an injected http client, so retries, backoff, and rate-limit handling on the provider call belong to the runtime. And the cursor drives pagination: it is whatever type your source needs (a string id, a timestamp, an opaque token like the provider's next here), persisted in your own Cloudflare storage between runs and handed back on the next fetch. Returning undefined as the page's cursor ends the run; the next scheduled run resumes from the last persisted cursor. Inside any sync handler you can also read and set it directly with ctx.sync.cursor() and ctx.sync.setCursor(...), which is how you run backfills and resets deliberately instead of deleting state.
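The run loop that paragraph describes can be sketched in a few lines: fetch with the stored cursor, write the page's rows, persist the returned cursor, and stop when it comes back undefined. CursorStore, runOnce, and the write callback are hypothetical stand-ins; the real runtime keeps the cursor in your Cloudflare storage and batches and paces the writes, which this sketch omits.

```typescript
// Hypothetical sketch of one scheduled run of a pull sync; not HS-X's runtime.
type Row = { key: string; data: Record<string, unknown> };
type Page<C> = { cursor?: C; rows: Row[] };
type FetchPage<C> = (args: { cursor?: C }) => Promise<Page<C>>;

// Stand-in for durable cursor storage.
interface CursorStore<C> {
  get(): C | undefined;
  set(cursor: C): void;
}

// Pages through the source until it returns an undefined cursor.
// The cursor is persisted only after a page's rows are written, so a crash
// between the two re-fetches that page instead of skipping it.
async function runOnce<C>(
  fetchPage: FetchPage<C>,
  store: CursorStore<C>,
  write: (rows: Row[]) => Promise<void>,
): Promise<number> {
  let cursor = store.get();
  let written = 0;
  for (;;) {
    const page = await fetchPage({ cursor });
    await write(page.rows);
    written += page.rows.length;
    if (page.cursor === undefined) return written; // end of this run
    cursor = page.cursor;
    store.set(cursor);
  }
}
```

In this sketch the stored cursor is the one that produced the final page, so the next run re-fetches that page and picks up anything appended since; because rows upsert by key, the repeat is harmless.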
The schema block is the superset contract for every contact property Email Guard owns, including email_health_status and email_health_score, which this sync never writes (the validate-email action does). Rows that carry a subset of the schema upsert into it cleanly, and key: entry.email is what makes a re-delivered suppression update the same contact instead of duplicating it.
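A toy model makes both properties visible: keyed upsert merges a row into the existing record for its key, and properties the row does not carry are left alone. The Map-based store and upsertRows are hypothetical and model the behavior described above, not HubSpot's batch API.

```typescript
// Hypothetical in-memory model of keyed upsert; not HubSpot's batch API.
type Row = { key: string; data: Record<string, unknown> };
type Store = Map<string, Record<string, unknown>>;

// Creates a record for a new key or merges into the existing one.
// Properties absent from row.data are left untouched.
function upsertRows(store: Store, rows: Row[]): { created: number; updated: number } {
  let created = 0;
  let updated = 0;
  for (const row of rows) {
    const existing = store.get(row.key);
    if (existing === undefined) {
      store.set(row.key, { ...row.data });
      created++;
    } else {
      Object.assign(existing, row.data);
      updated++;
    }
  }
  return { created, updated };
}
```

Running the same suppression rows through it twice leaves one record per email, and a health score written earlier by the validate-email action survives a suppression row that omits it.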
schedule takes an interval shorthand like "5m" or a cron expression for pull sources, or "event" for push sources. manageSchema has three settings:
- "full": HS-X manages the object and its properties.
- "properties": HS-X manages properties on an object that already exists. This is the right mode for Email Guard, since contacts ships with every portal.
- false: HS-X touches no portal schema; you own it.
The default is false: leaving the field off means HS-X validates rows locally but never reads or mutates the portal. The reconciliation itself runs at deploy time and is explicit: hs-x deploy --portal-schema-live diffs your declaration against the portal and prints a WILL CREATE / WILL ALTER plan line per difference, and adding --apply-schema applies that plan.
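A hedged sketch of what the plan step has to decide under each manageSchema mode. planSchema, the PortalObject shape, and the exact wording of the plan lines are assumptions; real hs-x deploy --portal-schema-live output may be formatted differently, and enumeration options are flattened to a type name to keep the sketch short. The portal is passed as a reader function so the false mode can skip reading it entirely.

```typescript
// Hypothetical sketch of schema reconciliation planning; not the hs-x CLI's code.
type ManageSchema = "full" | "properties" | false;
// Property name -> type name, e.g. { email: "string", email_suppressed: "bool" }.
type Properties = Record<string, string>;
type PortalObject = { exists: boolean; properties: Properties };

function planSchema(
  objectName: string,
  declared: Properties,
  readPortal: () => PortalObject,
  mode: ManageSchema,
): string[] {
  // false: HS-X owns nothing in the portal, so it never reads it either.
  if (mode === false) return [];
  const portal = readPortal();
  const plan: string[] = [];
  if (!portal.exists) {
    // Assumption: "properties" mode refuses to create the object itself.
    if (mode === "properties") {
      throw new Error(`${objectName} must already exist when manageSchema is "properties"`);
    }
    plan.push(`WILL CREATE object ${objectName}`);
  }
  for (const [name, type] of Object.entries(declared)) {
    const current = portal.properties[name];
    if (current === undefined) {
      plan.push(`WILL CREATE property ${objectName}.${name} (${type})`);
    } else if (current !== type) {
      plan.push(`WILL ALTER property ${objectName}.${name} (${current} -> ${type})`);
    }
  }
  return plan;
}
```

Keeping the plan separate from applying it mirrors the two flags: --portal-schema-live to see the diff, --apply-schema to act on it.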
What the runtime does with your rows
Between your source returning rows and records appearing in HubSpot, the runtime in your Cloudflare account does the unglamorous work: batching writes into HubSpot's batch endpoints, pacing them through the same hierarchical rate limiter every other capability shares, and retrying transient failures with backoff. The rate limits guide covers how that budget is split when a sync and a workflow action want the same portal at the same time.
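The batching and retry part of that work can be sketched as follows. Every policy choice here is an assumption, not HS-X's documented behavior: the batch size is a parameter, 429 and 5xx responses count as transient, and delays use capped exponential backoff with full jitter. send is a hypothetical stand-in for one call to a HubSpot batch endpoint that resolves to an HTTP status.

```typescript
// Hypothetical sketch: split rows into batches and retry transient failures.
// Not HS-X's runtime; `send` resolves to the HTTP status of one batch request.

function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// 429 (rate limited) and 5xx are retried; anything else fails immediately.
const isTransient = (status: number) => status === 429 || status >= 500;

// Capped exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt: number, baseMs = 250, capMs = 10_000, random = Math.random): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

type RetryOptions = {
  maxAttempts?: number;
  sleep?: (ms: number) => Promise<void>;
};

// Sends one batch, retrying transient statuses; resolves to the number of attempts used.
async function sendWithRetry<T>(
  batch: T[],
  send: (batch: T[]) => Promise<number>,
  {
    maxAttempts = 5,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  }: RetryOptions = {},
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    const status = await send(batch);
    if (status >= 200 && status < 300) return attempt;
    if (!isTransient(status) || attempt >= maxAttempts) {
      throw new Error(`batch failed with status ${status} after ${attempt} attempt(s)`);
    }
    await sleep(backoffMs(attempt - 1));
  }
}
```

Full jitter spreads out retries from concurrent batches, so they do not all hit the portal again at the same moment.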
Because all of this runs in your account, a sync keeps running on its schedule even if HS-X disappears. The cursor state, the destination data, and the Worker are yours.
Test it before it touches a portal
hs-x init customer-sync --type sync-source
cd customer-sync && bun install
hs-x check # validates the worker + schema declaration
hs-x dev # local runs, webhook URL, live logs
hs-x dev is where syncs earn trust: trigger a run, watch the rows land, inspect the cursor between runs, and only then run hs-x deploy. The local dev guide covers that loop in depth.
