Rate limits in production: surviving HubSpot's per-app-install budget at scale.
A Worker that never throws a 429 at your user, with retry, backoff, and jitter tuned to HubSpot's published budget for your app's tier, fair sharing between syncs and on-demand actions, and a dashboard that tells you how close to the cap you are before HubSpot does. The reference page at /docs/rate-limits has the numbers; this guide has the patterns.
Before you begin
The single most important thing to internalize is this. For a marketplace OAuth app, HubSpot rate-limits each install to 110 requests per 10 seconds, applied per (app, installed account). Each app you ship gets its own bucket on every portal it's installed in — your app does not share that bucket with the Salesforce sync, the Zapier connector, or any other OAuth app the customer has installed. Private apps run on a different budget entirely (100 req/10s on Free/Starter, 190 req/10s on Pro/Enterprise). Confirm the exact tier numbers for your situation against the usage guidelines before tuning.
The 110/10s ceiling is a rolling window, not a fixed bucket that resets on the second: short spikes within the window are fine as long as the 10-second total stays under cap. The other fact that shapes every decision in this guide: batch endpoints cap at 100 records per request and count as 1 request against the limit, which is the single largest lever you have for surviving high write volume. The full per-tier table lives at /docs/rate-limits; the rest of this page covers what to do with those numbers.
How HS-X tracks the budget
Every HubSpot response carries X-HubSpot-RateLimit-Daily, X-HubSpot-RateLimit-Daily-Remaining, X-HubSpot-RateLimit-Interval-Milliseconds, X-HubSpot-RateLimit-Max, and X-HubSpot-RateLimit-Remaining — the -Max/-Remaining pair describes the current rolling-window budget for your app on that portal. (HubSpot also still emits X-HubSpot-RateLimit-Secondly and -Secondly-Remaining, but those have been marked deprecated since 2018; don't build new code against them.) Those headers are the only honest source of remaining budget — counting calls in your own code drifts the moment another caller on the same app install makes a request. The runtime's HTTP layer reads them on every response and feeds them into a Durable Object token bucket keyed by (appId, portalId), which every sync, action, and tool on the Worker shares.
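As a rough illustration of the header-reading step, the sketch below turns the three rolling-window headers into a single snapshot. The names readRateLimit, HeaderReader, and RateLimitSnapshot are hypothetical and not part of @hs-x/sdk; the sketch assumes the headers arrive as plain decimal strings.

```typescript
// Minimal header accessor so the sketch works with a Fetch Headers object
// or any stand-in that exposes get(). (Hypothetical type, not an SDK export.)
type HeaderReader = { get(name: string): string | null };

interface RateLimitSnapshot {
  max: number;        // X-HubSpot-RateLimit-Max
  remaining: number;  // X-HubSpot-RateLimit-Remaining
  intervalMs: number; // X-HubSpot-RateLimit-Interval-Milliseconds
  headroom: number;   // remaining / max
}

// Returns null when a header is missing or malformed, so callers can tell
// "no data" apart from "zero headroom".
function readRateLimit(headers: HeaderReader): RateLimitSnapshot | null {
  const num = (name: string): number => {
    const raw = headers.get(name);
    return raw === null || raw.trim() === '' ? NaN : Number(raw);
  };
  const max = num('x-hubspot-ratelimit-max');
  const remaining = num('x-hubspot-ratelimit-remaining');
  const intervalMs = num('x-hubspot-ratelimit-interval-milliseconds');
  if (![max, remaining, intervalMs].every(Number.isFinite) || max <= 0) return null;
  return { max, remaining, intervalMs, headroom: remaining / max };
}
```

In a hand-rolled wrapper you would call readRateLimit(res.headers) on every response, successful or not, and hand the snapshot to whatever limiter or metrics sink you use.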
One more thing worth knowing up front: the Search API has its own separate 5-req/s cap that does not count against the 110/10s pool. It also caps page size at 200 records and won't paginate past a 10,000-record total per query. Hot paths through Search are the single most common cause of "we got 429s but our dashboard said we had headroom" tickets. We cover what to do about it in step 4.
Wrap HubSpot calls with retry + backoff
The http.fetch retry helper, the http.batch.upsert wrapper, the shared (appId, portalId) token bucket, worker.metrics.enable(), worker.action(..., { concurrency: 1 }), worker.sync(..., { batch }), and the hs-x doctor --ratelimit command described in this guide are documented design surface for the runtime — they're how we plan for the patterns to land. Today, your handler receives context.fetch (a standard Web fetch bound to the installed account's OAuth token), and you wire retries, jitter, batching, and metrics yourself. The snippets below show the design-preview API; the real example after them is what you can implement against today's SDK.
You almost never want to call raw fetch against HubSpot without a retry layer, because there are five things you'd otherwise reimplement badly: reading the rate-limit headers on every response, serializing through a shared token bucket, retrying 429 and 5xx with exponential backoff plus jitter, surfacing the request to the dev-mode log with timing, and normalizing the hubspot-api-nodejs "throws on 404" misbehaviour into a regular response with a status field.
// Design preview: not yet wired in @hs-x/sdk.
import { defineWorker } from '@hs-x/sdk';
export default defineWorker(({ worker }) => {
  worker.action('enrich-contact', async ({ input, http }) => {
    const res = await http.fetch(`/crm/v3/objects/contacts/${input.id}`);
    if (res.status === 404) return { found: false };
    return { found: true, contact: res.body };
  });
});
The interesting bits are the parts that touch the rate limit:
- Header parsing. Every response updates the shared bucket with the freshest X-HubSpot-RateLimit-Remaining value HubSpot reported. Even a request that succeeds feeds the next request's decision about whether to wait.
- Pre-flight wait. Before the call goes out, the layer asks the bucket for a token. If headroom is below 5% (configurable), the call sleeps until the rolling window opens or yields to a higher-priority caller. You see this in the dev log as a RATELIMIT WAIT 412ms line.
- Retry on 429. If HubSpot returns 429 anyway, the layer reads the Retry-After header, waits, and retries up to the configured max times with exponential backoff and decorrelated jitter. The retry budget is per-call, not global, so a slow loop doesn't starve fast ones.
- Idempotency. For GET, PUT, and DELETE the retry is unconditional. For POST it retries only when the response is a 429 or a 5xx that arrived before the body was acknowledged. That avoids the classic "we created the contact twice because the timeout happened mid-write" bug.
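The idempotency rule in the last bullet can be approximated by hand with a small decision function. This is a sketch under assumptions, not the planned layer's logic: shouldRetry is a hypothetical name, PATCH is treated like POST (the text above does not mention it), and because a hand-rolled wrapper cannot easily tell whether a 5xx arrived before the body was acknowledged, it takes the stricter route of retrying POST on 5xx only when the caller says the write is safe to repeat.

```typescript
type Method = 'GET' | 'PUT' | 'DELETE' | 'POST' | 'PATCH';

// Conservative retry rule (an assumption of this sketch):
// - GET, PUT, DELETE retry on 429 and on any 5xx.
// - POST and PATCH retry on 429 only, unless safeToRepeat is set,
//   for example for an upsert keyed on a unique property.
function shouldRetry(method: Method, status: number, safeToRepeat = false): boolean {
  const retryable = status === 429 || (status >= 500 && status <= 599);
  if (!retryable) return false;
  if (method === 'GET' || method === 'PUT' || method === 'DELETE') return true;
  return status === 429 || safeToRepeat;
}
```

A wrapper would consult shouldRetry(method, res.status) before sleeping, and return the response untouched when it says no.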
What to do today: fetch-with-retry against context.fetch
The same shape, hand-rolled against the runtime you have right now:
async function hubspotFetch(
  fetch: typeof globalThis.fetch,
  url: string,
  init: RequestInit = {},
  retry = { max: 5, baseMs: 200, capMs: 10_000 },
): Promise<Response> {
  // Note: this retries every method on 429 and 5xx, POST included.
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    // Anything other than 429 or 5xx goes straight back to the caller.
    if (res.status !== 429 && res.status < 500) return res;
    // Out of retries: return the failing response so the caller can inspect it.
    if (attempt >= retry.max) return res;
    // Prefer the server's Retry-After (seconds). A missing header becomes 0 and a
    // non-numeric one becomes NaN; both fall through to the jittered backoff.
    const retryAfter = Number(res.headers.get('retry-after')) * 1000;
    const backoff = Math.min(retry.capMs, retry.baseMs * 2 ** attempt);
    const wait = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter
      : Math.random() * backoff; // full jitter: uniform in [0, backoff)
    await new Promise((r) => setTimeout(r, wait));
  }
}
Calling raw fetch from your handler still counts against the portal limit, but nothing else in the Worker sees that the budget moved. If you go this route, funnel every HubSpot call through the helper above so backoff state is at least consistent within a single invocation.
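The helper above reads Retry-After as a number of seconds. HTTP (RFC 9110) also allows the header to carry an HTTP-date, so a slightly more defensive parser might look like the sketch below. parseRetryAfterMs is a hypothetical name, and which form HubSpot actually sends is not something this guide states, so treat the date branch as a precaution.

```typescript
// Returns the wait in milliseconds, or null when the header is absent or unparseable.
// nowMs is injectable so the date branch can be tested deterministically.
function parseRetryAfterMs(value: string | null, nowMs: number = Date.now()): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  // delta-seconds form, e.g. "3"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means "retry now".
  return Math.max(0, dateMs - nowMs);
}
```

Inside the retry loop, a null result would fall back to the jittered backoff, the same way the missing-header case does today.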
Tune the retry policy per call site
The planned defaults are tuned for background work: { max: 5, baseMs: 200, capMs: 10_000 }. That's correct for a sync because the alternative — failing fast — means you lose the record. It's wrong for an interactive UI extension or a workflow action where the user is staring at a spinner. Override per call. (The http.fetch(url, { retry }) option is still design-preview — for now, pass a per-call retry object into the hubspotFetch helper above.)
// Design preview API. Interactive: fail fast, surface the error, let the UI offer a retry button.
const res = await http.fetch(`/crm/v3/objects/deals/${id}`, {
  retry: { max: 1, baseMs: 100, capMs: 500 },
});

// Sync engine: patient, retries hard, takes its time.
const res = await http.fetch('/crm/v3/objects/contacts/batch/upsert', {
  method: 'POST',
  body: chunk,
  retry: { max: 8, baseMs: 500, capMs: 30_000 },
});

// Webhook handler: in between. The portal will retry the webhook itself
// if you 500, so don't sit on the connection forever.
const res = await http.fetch(url, {
  retry: { max: 3, baseMs: 200, capMs: 4_000 },
});
Picking max, baseMs, and capMs
A short tour of the three knobs and how they interact, because picking values without a model in your head usually produces something that's wrong in the worst way.
- max is the number of retries, not attempts. max: 5 means up to 6 total calls. Set it low when there's a human waiting (1 or 2), high when there isn't (5 to 8). Beyond 8 you're rarely getting more reliability, just longer tail latency on a portal that's already saturated.
- baseMs is the first backoff. The actual wait on attempt N is roughly random(0, min(capMs, baseMs * 2^N)). That is exponential backoff with full jitter (the whole wait is randomized), which avoids the synchronized-retry-storm failure mode where every caller backs off the same amount and slams the API at the same instant. Start at 200ms for interactive, 500ms for background.
- capMs is the ceiling on a single backoff. Important when max is high, because with baseMs at 200, baseMs * 2^8 = 51,200ms is almost certainly longer than you want to wait. Set it to your user-perceptible budget for interactive calls (say 500ms to 1s) and to your sync's tolerance for tail latency for background work (10s to 30s is normal).
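To make the interaction between the three knobs concrete, here is a minimal sketch of the wait formula quoted above. RetryPolicy, backoffCeilingMs, and backoffWaitMs are hypothetical names for illustration; the random source is injectable only so the behaviour can be checked deterministically.

```typescript
interface RetryPolicy { max: number; baseMs: number; capMs: number }

// Upper bound of the wait before retry number `attempt` (0-based):
// min(capMs, baseMs * 2^attempt).
function backoffCeilingMs(attempt: number, p: RetryPolicy): number {
  return Math.min(p.capMs, p.baseMs * 2 ** attempt);
}

// Jittered wait: uniform between 0 and the ceiling for this attempt.
function backoffWaitMs(
  attempt: number,
  p: RetryPolicy,
  rand: () => number = Math.random,
): number {
  return rand() * backoffCeilingMs(attempt, p);
}
```

With the background defaults { max: 5, baseMs: 200, capMs: 10_000 }, the ceilings for the five retries are 200, 400, 800, 1,600, and 3,200 ms, so capMs never comes into play; it only starts to matter once baseMs * 2^attempt exceeds it.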
When the defaults are wrong on purpose
Two cases worth calling out. First, if your call is inside a workflow action that HubSpot itself will retry on failure, you want max: 1 to fail fast and let HubSpot's own retry handle it; otherwise you've built two retry loops stacked on each other and your timing math is wrong. Second, if you're in a scheduled (cron) job that runs every minute, keep the worst-case total wait below 60_000 ms, not just capMs. capMs bounds a single backoff, and the loop can sleep up to max times, so size max and capMs together so a retry storm can't make the next invocation overlap the previous one.
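For the scheduled-job case, a quick way to sanity-check a policy is to add up the backoff ceilings across all retries. The sketch below does that; worstCaseSleepMs and fitsInterval are hypothetical helpers, and the total deliberately ignores request time and any longer wait a Retry-After header might impose, so leave yourself margin.

```typescript
interface RetryPolicy { max: number; baseMs: number; capMs: number }

// Longest total sleep the retry loop can accumulate when jitter always lands
// on the ceiling. Excludes time spent in the requests themselves.
function worstCaseSleepMs(p: RetryPolicy): number {
  let total = 0;
  for (let attempt = 0; attempt < p.max; attempt++) {
    total += Math.min(p.capMs, p.baseMs * 2 ** attempt);
  }
  return total;
}

// True when the worst-case sleep plus a request-time allowance fits in the interval.
function fitsInterval(p: RetryPolicy, intervalMs: number, requestBudgetMs = 0): boolean {
  return worstCaseSleepMs(p) + requestBudgetMs < intervalMs;
}
```

For example, the background defaults { max: 5, baseMs: 200, capMs: 10_000 } can sleep at most 6,200 ms in total, which fits a one-minute schedule comfortably. The patient sync policy { max: 8, baseMs: 500, capMs: 30_000 } can sleep up to 91,500 ms and does not.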
Batch correctly with /batch/upsert
Single-record writes are the most common reason a healthy-looking Worker hits the rate limit. A sync that writes 1,000 contacts one at a time burns 1,000 requests; the same sync using /crm/v3/objects/contacts/batch/upsert burns 10. Every write path in production should be batched unless you have a specific reason it can't be.
// Design preview: the worker.sync `batch` option is documented here for
// the API we're building toward. Today, drive the batch endpoint manually
// using the hubspotFetch helper from step 1.
import { defineSource, defineWorker, env } from '@hs-x/sdk';
// stripeCustomers is a source declared elsewhere; its definition is omitted here.
export default defineWorker(({ worker }) => {
  worker.sync(stripeCustomers, {
    into: 'contacts',
    schedule: '15m',
    schema: {
      email: 'email',
      stripe_customer_id: 'string',
      mrr_cents: 'number',
      plan: { type: 'enum', values: ['free', 'starter', 'pro', 'enterprise'] },
    },
    // The two knobs are the chunk size (default 100, which is the HubSpot
    // max) and the idProperty used for upsert.
    batch: { size: 100, idProperty: 'stripe_customer_id' },
  });
});
If you need to write outside a sync (say, in a workflow action that processes a list of records), the design-preview http.batch.upsert helper will chunk the array into 100-record requests, run them with the configured concurrency, and return the combined result with per-record success and failure:
// Design preview: wrap a real POST to /crm/v3/objects/contacts/batch/upsert today.
const result = await http.batch.upsert('contacts', records, {
  idProperty: 'stripe_customer_id',
  concurrency: 2,
});
// result.created, result.updated, result.failed are all typed arrays.
The idProperty rules that bite people
HubSpot's /batch/upsert endpoint is upsert by a specific property, and the rules around that property are not symmetric with single-record upsert. Three things to know before you ship a batch write:
- The idProperty must be unique on the object. That's email and hs_object_id for contacts out of the box, plus any custom property you've marked unique in property settings. If you try to upsert by a non-unique property, the call 400s on every record in the batch.
- email as an idProperty does not support partial upsert on contacts. HubSpot's object APIs require a custom unique property when you want partial upserts on contacts; passing idProperty=email will reject the partial-upsert path. The fix is to declare a custom unique property (for example stripe_customer_id) and upsert by that; reserve email for the create path.
- Properties not in the payload are not cleared. This is the upsert-vs-replace distinction. Sending { email, mrr_cents: 0 } sets mrr_cents to 0 and leaves every other property alone. If you want to clear a property explicitly, send it as null; omitting it does nothing.
Watching the chunk boundary
The 100-record cap is hard. The design-preview http.batch.upsert will chunk for you, but if you're rolling your own (say, because you need custom error handling per chunk), the call shape is POST /crm/v3/objects/{object}/batch/upsert with { inputs: [...100 records] }. Counted against the limit, this is one request, which is the whole point. A 1,000-record sync at 100 per chunk is 10 requests; running them serially fits inside a single 10-second window and leaves 100 of the 110 requests for everything else your app does on that portal.
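If you are rolling your own chunking, the sketch below shows one way to turn a flat record list into request bodies of at most 100 inputs each. The helper names are hypothetical. The per-input shape (idProperty, id, properties) is an assumption about the batch upsert payload that you should confirm against HubSpot's API reference, and property values are typed as strings purely to keep the example short.

```typescript
interface UpsertInput {
  idProperty: string;
  id: string;
  properties: Record<string, string>;
}

// Splits an array into consecutive slices of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Builds one request body per chunk. Each record must carry a value for
// the unique property used as the upsert key.
function buildUpsertBodies(
  records: ReadonlyArray<Record<string, string>>,
  idProperty: string,
  size = 100,
): { inputs: UpsertInput[] }[] {
  const inputs = records.map((properties): UpsertInput => {
    const id = properties[idProperty];
    if (!id) throw new Error(`record is missing ${idProperty}`);
    return { idProperty, id, properties };
  });
  return chunk(inputs, size).map((c) => ({ inputs: c }));
}
```

Each body would then go out as one POST through the hubspotFetch helper, so a 250-record list costs three requests against the window.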
Avoid the Search API in hot paths
The Search API (/crm/v3/objects/{object}/search) has its own rate limit, separate from the main pool, and it is tight: five requests per second regardless of your subscription tier, with page size capped at 200 records and a hard ceiling of 10,000 records per query. A UI extension that calls Search to filter a list of deals will hit the cap with a handful of people refreshing the same record in the same second. If your dashboard says you have plenty of headroom and you're still seeing 429s, this is almost always why.
| Pattern | API | Limit | Use when |
|---|---|---|---|
| Get one record by id | /crm/v3/objects/{type}/{id} | 110/10s app-install pool | You have the object id |
| Get by unique property | /crm/v3/objects/{type}/{id}?idProperty=email | 110/10s app-install pool | You have email or another unique prop |
| List paginated | /crm/v3/objects/{type} | 110/10s app-install pool | You need every record |
| Filter by 1 property | Stored list or pre-indexed property | n/a | You filter on this property often |
| Filter by 2+ properties | /crm/v3/objects/{type}/search | 5 req/s separate cap, 200/page, 10k total | Genuinely ad-hoc, low-frequency |
When Search is a smell
A Search call in a UI extension's hot path usually means a property you should have stored isn't stored. If you find yourself writing search where stripe_customer_id = X, the fix is to write stripe_customer_id as a unique HubSpot property on the contact (declare it in your schema, add unique: true), and then use the much faster objects/{id}?idProperty=stripe_customer_id endpoint. That's the shared pool, no 5-req/s cap, and a single round trip.
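As a small illustration of the swap, the sketch below builds the by-property lookup path that replaces the Search call. lookupByUniqueProperty is a hypothetical helper; the only thing it adds over string concatenation is URL-encoding the value, which matters for ids such as email addresses.

```typescript
// Builds the read-by-unique-property path described above.
// All three parts are URL-encoded so values like "a+b@example.com" stay intact.
function lookupByUniqueProperty(objectType: string, idProperty: string, value: string): string {
  const path = `/crm/v3/objects/${encodeURIComponent(objectType)}/${encodeURIComponent(value)}`;
  return `${path}?idProperty=${encodeURIComponent(idProperty)}`;
}
```

lookupByUniqueProperty('contacts', 'stripe_customer_id', 'cus_123') yields /crm/v3/objects/contacts/cus_123?idProperty=stripe_customer_id, which you can pass straight to the hubspotFetch helper.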
The other common Search anti-pattern is the dashboard query — "show me deals modified in the last hour, owned by user X, in stage Y." The right fix there is to run that query as part of a 15-minute sync into a small materialized object you control (a custom object or a KV namespace on the Worker), and have the extension read from your cache. Search stays for genuinely ad-hoc, low-frequency reporting.
When you actually need Search
Sometimes you do need the live query — a workflow action that takes a free-text filter input from the user, a one-off report, a debug tool. For those, two things help. First, set retry: { max: 8, baseMs: 1_000, capMs: 15_000 } on the call, because 429s on Search are common and the backoff needs to be on the order of the 1-second window, not the 10-second one. Second, gate access — in the planned worker.action(..., { concurrency: 1 }) option, a single action is serialized across the whole Worker so a Search-heavy endpoint slows down instead of 429-ing. Today, you can approximate this with an in-memory mutex around the call inside your handler.
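The in-memory mutex mentioned above can be as small as a promise chain. The sketch below is one possible shape, with Mutex as a hypothetical class name. It only serializes calls inside one running instance of the Worker, so it narrows the burst rather than guaranteeing a global limit.

```typescript
// Serializes async tasks: each task starts only after the previous one settles.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    // Swallow the outcome on the chain so one rejected task does not block later ones.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// One shared instance guards every Search call made by this Worker instance.
const searchMutex = new Mutex();
```

A handler would wrap its Search request as searchMutex.run(() => hubspotFetch(fetch, url, init)), so concurrent invocations queue up instead of firing in the same second.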
Alert on headroom, not on 429 count
A 429 count alert is the wrong alert. It fires after the failure, it fires at a fixed threshold that bears no relationship to your portal's actual budget, and it tells you nothing about how close you came to failing on the requests that succeeded. The right metric is headroom — the minimum value of X-HubSpot-RateLimit-Remaining / X-HubSpot-RateLimit-Max the Worker has seen in the last N minutes.
The planned worker.metrics.enable() helper and the suggested metric names (hubspot_ratelimit_headroom_pct, hubspot_ratelimit_429_total, etc.) are documented design surface — they're not yet emitted by the runtime. Today, log the X-HubSpot-RateLimit-Remaining / -Max headers from your retry wrapper and forward them to your observability stack (Cloudflare Analytics Engine, Datadog, Grafana) yourself.
Alert when the 5-minute minimum drops below 20%, page when it drops below 5%. By the time you're seeing 429s, headroom has been at zero for at least one 10-second window — the headroom alert gives you minutes of lead time on a problem the 429 alert tells you about after it's already user-visible.
// Design preview: not yet wired in @hs-x/sdk.
import { defineWorker } from '@hs-x/sdk';
export default defineWorker(({ worker }) => {
  // Will emit headroom, 429 total, and request-duration metrics by default.
  // Configure scrape targets and alerts in your observability stack of choice.
  worker.metrics.enable();
});
The four signals worth a dashboard tile
- Headroom (gauge). Minimum remaining / max over the last 5 minutes, per (app, portal). The leading indicator. Anything under 20% sustained is "you have a runaway loop, or a sync just gained an order of magnitude of work."
- 429 rate (per minute). Not the count but the rate, normalized by total request volume. A spike from 0.1% to 5% means something changed; a steady 0.2% means HubSpot has occasional flakiness and your retry logic is working as designed.
- Retry depth p99. The 99th-percentile number of retries before a call succeeds. If this climbs from 0 to 3 over a day, you're operating closer to the cap than you think: every call is paying a backoff tax even when it succeeds.
- Batch chunk count. How many /batch/upsert chunks the Worker has sent in the window. A sync that suddenly goes from 10 chunks to 200 means your source's row count grew an order of magnitude and your next problem is going to be the daily limit, not the 10-second one.
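Until the runtime emits these metrics, the headroom gauge can be approximated in your own wrapper. The sketch below keeps a sliding window of remaining / max samples and maps the minimum onto the 20% alert and 5% page thresholds from this step. HeadroomTracker and Level are hypothetical names, and timestamps are passed in explicitly so the logic does not depend on a clock.

```typescript
type Level = 'ok' | 'alert' | 'page';

class HeadroomTracker {
  private samples: { atMs: number; headroom: number }[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number = 5 * 60_000) {
    this.windowMs = windowMs;
  }

  // Record one response's X-HubSpot-RateLimit-Remaining and -Max values.
  record(remaining: number, max: number, atMs: number): void {
    if (max > 0) this.samples.push({ atMs, headroom: remaining / max });
  }

  // Minimum headroom inside the window ending at nowMs; null when no samples are left.
  min(nowMs: number): number | null {
    this.samples = this.samples.filter((s) => nowMs - s.atMs <= this.windowMs);
    let lowest: number | null = null;
    for (const s of this.samples) {
      if (lowest === null || s.headroom < lowest) lowest = s.headroom;
    }
    return lowest;
  }

  // Below 5% pages, below 20% alerts, anything else (including no data) is ok.
  level(nowMs: number): Level {
    const m = this.min(nowMs);
    if (m === null || m >= 0.2) return 'ok';
    return m < 0.05 ? 'page' : 'alert';
  }
}
```

Treating "no data" as ok is a simplification; in production you may prefer a separate staleness alert so a silent Worker does not look healthy.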
Cross-link: /docs/guides/monitoring covers the full metrics surface — how to scrape, what to graph, and which alert thresholds correlate with real incidents in production.
Common rate-limit failures
The patterns below cover most of the rate-limit support tickets we see. If you're hitting 429s and the cause isn't on this list, instrument your retry wrapper to log the X-HubSpot-RateLimit-Remaining value seen on every response, broken down by caller and endpoint.
"We get 429s in bursts even when our average is under cap."
The 110/10s ceiling is a rolling window, not a per-second cap with reset. A sync that opportunistically blasts 200 requests in a single second will exceed the rolling 110/10s instantly, even if the next nine seconds are quiet. There is no published "burst credit" multiplier and no documented recovery cooldown beyond what the rolling window naturally enforces. The fix is to either batch (step 3, you go from 5,000 calls to 50) or to clamp concurrency so you can't push more than the window allows in any one-second slice.
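One way to clamp a caller is a local sliding-window limiter that refuses to send once the last 10 seconds already hold 110 requests. The sketch below is a minimal in-memory version, with SlidingWindowLimiter as a hypothetical name. It only sees requests made through the same instance, so treat it as a guard against self-inflicted bursts and keep the response headers as the source of truth.

```typescript
class SlidingWindowLimiter {
  private sent: number[] = []; // timestamps (ms) of requests still inside the window
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number = 110, windowMs: number = 10_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns 0 and records the request when there is room. Otherwise returns the
  // number of milliseconds until the oldest recorded request leaves the window.
  tryAcquire(nowMs: number): number {
    this.sent = this.sent.filter((t) => nowMs - t < this.windowMs);
    if (this.sent.length < this.limit) {
      this.sent.push(nowMs);
      return 0;
    }
    const oldest = Math.min(...this.sent);
    return oldest + this.windowMs - nowMs;
  }
}
```

A caller would loop: ask tryAcquire(Date.now()), sleep for the returned number of milliseconds if it is non-zero, then ask again before sending. In practice you would set the limit somewhat below 110 to leave room for callers this instance cannot see.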
"Sync engine and a workflow action are fighting for budget."
This is the case the shared token bucket exists to handle, but only if both callers go through the same HTTP layer. If one of them is calling raw fetch or using hubspot-api-nodejs directly without a shared limiter, it bypasses the bucket and starves the other. The fix is to route both through the same retry/limit wrapper, then bias the interactive caller — in the design-preview API that's http.fetch(url, { priority: 'high' }). Today, you can approximate this by giving the interactive path its own short-max retry config and the sync a long-max one, so the sync naturally backs off further when contention shows up.
"Search API limit confused with REST limit."
The single most common one. Your dashboard shows the main pool at 30% utilized, and your Search-heavy extension is still 429-ing. That's because the 5-req/s Search cap is a completely separate bucket. Track Search headroom independently (count Search 200/429 responses per second, alert on the ratio). The fix is to either move the query off Search (see step 4) or to serialize Search calls behind a mutex in your handler.
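Tracking Search separately can start as a pair of counters fed only by Search responses. The sketch below is a minimal version, with SearchOutcomeCounter as a hypothetical name; how often you read and reset it, and what ratio you alert on, are choices for your own observability setup.

```typescript
// Counts outcomes of Search API calls only; keep main-pool traffic out of it.
class SearchOutcomeCounter {
  private ok = 0;
  private limited = 0;

  record(status: number): void {
    if (status === 429) this.limited++;
    else if (status >= 200 && status < 300) this.ok++;
  }

  // Share of recorded Search calls that were rate limited; 0 when nothing was recorded.
  limitedRatio(): number {
    const total = this.ok + this.limited;
    return total === 0 ? 0 : this.limited / total;
  }

  // Call after each reporting interval so the ratio reflects recent traffic.
  reset(): void {
    this.ok = 0;
    this.limited = 0;
  }
}
```

Feed it from the same wrapper that issues Search requests, emit limitedRatio() on a timer, then reset.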
Where next
- Reference · Rate limits — the canonical per-tier table, every published header, and the exact rolling-window semantics.
- How to · Monitoring and observability — the full metrics surface, how to scrape it, and the alert thresholds we use internally on production portals.
- How to · Marketplace listing — what HubSpot's app review team looks for in rate-limit handling before approving a public listing, and how to demonstrate compliance.