[{"data":1,"prerenderedAt":1656},["ShallowReactive",2],{"post-\u002Fblog\u002Fwhy-i-built-singletonjob":3},{"id":4,"title":5,"body":6,"book":1645,"date":1646,"description":1647,"extension":1648,"meta":1649,"navigation":289,"path":1650,"seo":1651,"stem":1652,"tags":1653,"__hash__":1655},"blog\u002Fblog\u002Fwhy-i-built-singletonjob.md","Why I built SingletonJob",{"type":7,"value":8,"toc":1627},"minimark",[9,14,18,21,24,27,35,39,42,45,67,71,74,77,91,94,98,101,129,132,136,139,528,533,538,543,547,550,558,565,571,585,588,620,623,626,652,656,659,662,668,683,686,692,721,737,743,747,760,763,769,773,781,795,802,805,815,819,825,974,991,994,998,1015,1022,1224,1231,1262,1269,1280,1282,1285,1368,1374,1377,1439,1442,1446,1449,1469,1472,1476,1492,1505,1516,1526,1539,1543,1554,1557,1572,1581,1587,1598,1602,1620,1623],[10,11,13],"h2",{"id":12},"background","Background",[15,16,17],"p",{},"I work on a trading system. Without going into the specifics of what we trade, two things matter for the rest of this post.",[15,19,20],{},"First, prices tick. Every second. Sometimes faster than that.",[15,22,23],{},"Second, the prediction pipeline needs fresh data, and that data has to be pulled from a lot of different sources every 500 ms or so by background threads. If you zoom out far enough, the whole thing is basically \"is the data ready, and how stale is it?\" on a loop.",[15,25,26],{},"This is the story of a library I've wanted to write for years and finally did.",[15,28,29],{},[30,31,32],"a",{"href":32,"rel":33},"https:\u002F\u002Fgithub.com\u002Fhaiilong\u002FSingletonJob",[34],"nofollow",[10,36,38],{"id":37},"what-i-wanted","What I wanted",[15,40,41],{},"A way to run a periodic job across a few pods in Kubernetes, where exactly one pod runs it at any given moment. That's basically it.",[15,43,44],{},"But the requirements piled up:",[46,47,48,52,55,58,61,64],"ul",{},[49,50,51],"li",{},"Sub-second frequency. Some jobs need to fire every 500 ms.",[49,53,54],{},"Drop-on-overlap. 
If the previous tick is still running, skip the next one. Don't queue it. Don't run two at once. Just drop.",[49,56,57],{},"No persistence overhead. I don't need a job history, a dashboard, retry policies, or a database table per job. Just \"exactly one pod is the leader, and that pod runs the loop\".",[49,59,60],{},"Failover in seconds, not minutes.",[49,62,63],{},"Cheap. Hundreds of bytes per job in Redis, not hundreds of megabytes of in-memory state per pod.",[49,65,66],{},"AOT compatible, because eventually I want everything trimmed and AOT'd anyway.",[10,68,70],{"id":69},"why-not-hangfire","Why not Hangfire",[15,72,73],{},"Hangfire is great for what it was built for: durable, retryable, observable background jobs with a dashboard, like email queues and nightly reports. These are jobs where you come back tomorrow and see what happened.",[15,75,76],{},"But it isn't the right shape for what I needed:",[46,78,79,82,85,88],{},[49,80,81],{},"Cron has a one second minimum. That alone disqualifies it for tick-driven work.",[49,83,84],{},"Overlapping runs queue. If the previous job runs long, the next one doesn't get skipped, it stacks. For price ticks, that's exactly backwards. You want the overlapping tick dropped, not queued behind the run that's still going.",[49,86,87],{},"Memory and CPU spikes on startup. For a worker pod that already holds models in memory and runs hot loops, a Hangfire startup spike is not free.",[49,89,90],{},"The storage backend is structural overhead I don't need. A SQL Server schema with histories, retries, states, hash tables. For \"one pod runs this every second\", that is far too much machinery.",[15,92,93],{},"I'm not picking a fight with Hangfire. 
I just needed a different shape of tool, and the right shape happened to be small enough that nobody had bothered to publish it.",[10,95,97],{"id":96},"why-redis","Why Redis",[15,99,100],{},"When I floated the idea, the first question I got was usually \"couldn't you do this with a SQL Server row, or etcd, or ZooKeeper?\" Yes, you can. All of those work. Here is why I went with Redis anyway.",[46,102,103,106,117,120,126],{},[49,104,105],{},"Almost every .NET microservice I have worked on already had Redis somewhere: caching, pub\u002Fsub, rate limiting, locks for other things. Adding a 50 byte lock key per job is basically free.",[49,107,108,112,113,116],{},[109,110,111],"code",{},"SET NX PX"," is one command. The SQL Server equivalent is a transaction with ",[109,114,115],{},"WITH (UPDLOCK, HOLDLOCK)"," wrapped in a stored procedure. It works, but it's a lot more moving parts for the same outcome.",[49,118,119],{},"Lua scripts. The renewal and release patterns below are five lines each. The SQL equivalents are not.",[49,121,122,125],{},[109,123,124],{},"StackExchange.Redis"," is mature and well behaved under load. I have never once had to debug the client itself, which is more than I can say for some SQL drivers.",[49,127,128],{},"A lock key is around 50 bytes. Three replicas × five jobs, each heartbeating every three seconds → 15\u002F3 = 5 ops\u002Fsec. Cost isn't something I have to think about.",[15,130,131],{},"If your stack already has etcd or Consul, those work fine too. But for a typical .NET shop with Redis already in the picture, this is about as cheap as it gets.",[10,133,135],{"id":134},"the-shape-of-the-thing","The shape of the thing",[15,137,138],{},"Three job types ended up covering pretty much every periodic workload I've had at work.",[140,141,146],"pre",{"className":142,"code":143,"language":144,"meta":145,"style":145},"language-csharp shiki shiki-themes one-light one-dark-pro","\u002F\u002F 1) Run, wait, run. 
\"At least N seconds between runs.\"\npublic sealed class HeartbeatJob(...) : SingletonIntervalJob(...)\n{\n    public override string JobName => \"heartbeat\";\n    protected override TimeSpan GetJobInterval() => TimeSpan.FromSeconds(1);\n    protected override Task ExecuteJobAsync(CancellationToken ct) { ... }\n}\n\n\u002F\u002F 2) Fire on a fixed rate. Drop the tick if the previous run is still in flight.\npublic sealed class PriceTickJob(...) : SingletonFixedRateJob(...)\n{\n    public override string JobName => \"price-tick\";\n    protected override TimeSpan GetJobInterval() => TimeSpan.FromMilliseconds(500);\n    protected override Task ExecuteJobAsync(CancellationToken ct) { ... }\n}\n\n\u002F\u002F 3) Cron schedule.\npublic sealed class DailyReportJob(...) : SingletonCronJob(...)\n{\n    private static readonly CronExpression Expr = CronExpression.Parse(\"0 3 * * *\"); \u002F\u002F from the Cronos library\n    public override string JobName => \"daily-report\";\n    protected override CronExpression GetCronExpression() => Expr;\n    protected override Task ExecuteJobAsync(CancellationToken ct) { ... }\n}\n","csharp","",[109,147,148,157,184,190,216,254,278,284,291,297,316,321,339,366,385,390,395,401,420,425,466,484,504,523],{"__ignoreMap":145},[149,150,153],"span",{"class":151,"line":152},"line",1,[149,154,156],{"class":155},"sW2Sy","\u002F\u002F 1) Run, wait, run. \"At least N seconds between runs.\"\n",[149,158,160,164,167,170,174,178,181],{"class":151,"line":159},2,[149,161,163],{"class":162},"sLKXg","public",[149,165,166],{"class":162}," sealed",[149,168,169],{"class":162}," class",[149,171,173],{"class":172},"sC09Y"," HeartbeatJob",[149,175,177],{"class":176},"s5ixo","(...) 
: ",[149,179,180],{"class":172},"SingletonIntervalJob",[149,182,183],{"class":176},"(...)\n",[149,185,187],{"class":151,"line":186},3,[149,188,189],{"class":176},"{\n",[149,191,193,196,199,202,206,209,213],{"class":151,"line":192},4,[149,194,195],{"class":162},"    public",[149,197,198],{"class":162}," override",[149,200,201],{"class":162}," string",[149,203,205],{"class":204},"siaei"," JobName",[149,207,208],{"class":176}," => ",[149,210,212],{"class":211},"sDhpE","\"heartbeat\"",[149,214,215],{"class":176},";\n",[149,217,219,222,224,227,231,234,238,241,244,247,251],{"class":151,"line":218},5,[149,220,221],{"class":162},"    protected",[149,223,198],{"class":162},[149,225,226],{"class":172}," TimeSpan",[149,228,230],{"class":229},"sAdtL"," GetJobInterval",[149,232,233],{"class":176},"() => ",[149,235,237],{"class":236},"s7GmK","TimeSpan",[149,239,240],{"class":176},".",[149,242,243],{"class":229},"FromSeconds",[149,245,246],{"class":176},"(",[149,248,250],{"class":249},"sAGMh","1",[149,252,253],{"class":176},");\n",[149,255,257,259,261,264,267,269,272,275],{"class":151,"line":256},6,[149,258,221],{"class":162},[149,260,198],{"class":162},[149,262,263],{"class":172}," Task",[149,265,266],{"class":229}," ExecuteJobAsync",[149,268,246],{"class":176},[149,270,271],{"class":172},"CancellationToken",[149,273,274],{"class":236}," ct",[149,276,277],{"class":176},") { ... }\n",[149,279,281],{"class":151,"line":280},7,[149,282,283],{"class":176},"}\n",[149,285,287],{"class":151,"line":286},8,[149,288,290],{"emptyLinePlaceholder":289},true,"\n",[149,292,294],{"class":151,"line":293},9,[149,295,296],{"class":155},"\u002F\u002F 2) Fire on a fixed rate. 
Drop the tick if the previous run is still in flight.\n",[149,298,300,302,304,306,309,311,314],{"class":151,"line":299},10,[149,301,163],{"class":162},[149,303,166],{"class":162},[149,305,169],{"class":162},[149,307,308],{"class":172}," PriceTickJob",[149,310,177],{"class":176},[149,312,313],{"class":172},"SingletonFixedRateJob",[149,315,183],{"class":176},[149,317,319],{"class":151,"line":318},11,[149,320,189],{"class":176},[149,322,324,326,328,330,332,334,337],{"class":151,"line":323},12,[149,325,195],{"class":162},[149,327,198],{"class":162},[149,329,201],{"class":162},[149,331,205],{"class":204},[149,333,208],{"class":176},[149,335,336],{"class":211},"\"price-tick\"",[149,338,215],{"class":176},[149,340,342,344,346,348,350,352,354,356,359,361,364],{"class":151,"line":341},13,[149,343,221],{"class":162},[149,345,198],{"class":162},[149,347,226],{"class":172},[149,349,230],{"class":229},[149,351,233],{"class":176},[149,353,237],{"class":236},[149,355,240],{"class":176},[149,357,358],{"class":229},"FromMilliseconds",[149,360,246],{"class":176},[149,362,363],{"class":249},"500",[149,365,253],{"class":176},[149,367,369,371,373,375,377,379,381,383],{"class":151,"line":368},14,[149,370,221],{"class":162},[149,372,198],{"class":162},[149,374,263],{"class":172},[149,376,266],{"class":229},[149,378,246],{"class":176},[149,380,271],{"class":172},[149,382,274],{"class":236},[149,384,277],{"class":176},[149,386,388],{"class":151,"line":387},15,[149,389,283],{"class":176},[149,391,393],{"class":151,"line":392},16,[149,394,290],{"emptyLinePlaceholder":289},[149,396,398],{"class":151,"line":397},17,[149,399,400],{"class":155},"\u002F\u002F 3) Cron schedule.\n",[149,402,404,406,408,410,413,415,418],{"class":151,"line":403},18,[149,405,163],{"class":162},[149,407,166],{"class":162},[149,409,169],{"class":162},[149,411,412],{"class":172}," 
DailyReportJob",[149,414,177],{"class":176},[149,416,417],{"class":172},"SingletonCronJob",[149,419,183],{"class":176},[149,421,423],{"class":151,"line":422},19,[149,424,189],{"class":176},[149,426,428,431,434,437,440,444,448,450,452,455,457,460,463],{"class":151,"line":427},20,[149,429,430],{"class":162},"    private",[149,432,433],{"class":162}," static",[149,435,436],{"class":162}," readonly",[149,438,439],{"class":172}," CronExpression",[149,441,443],{"class":442},"sJa8x"," Expr",[149,445,447],{"class":446},"sknuh"," =",[149,449,439],{"class":236},[149,451,240],{"class":176},[149,453,454],{"class":229},"Parse",[149,456,246],{"class":176},[149,458,459],{"class":211},"\"0 3 * * *\"",[149,461,462],{"class":176},"); ",[149,464,465],{"class":155},"\u002F\u002F from the Cronos library\n",[149,467,469,471,473,475,477,479,482],{"class":151,"line":468},21,[149,470,195],{"class":162},[149,472,198],{"class":162},[149,474,201],{"class":162},[149,476,205],{"class":204},[149,478,208],{"class":176},[149,480,481],{"class":211},"\"daily-report\"",[149,483,215],{"class":176},[149,485,487,489,491,493,496,498,502],{"class":151,"line":486},22,[149,488,221],{"class":162},[149,490,198],{"class":162},[149,492,439],{"class":172},[149,494,495],{"class":229}," GetCronExpression",[149,497,233],{"class":176},[149,499,501],{"class":500},"sz0mV","Expr",[149,503,215],{"class":176},[149,505,507,509,511,513,515,517,519,521],{"class":151,"line":506},23,[149,508,221],{"class":162},[149,510,198],{"class":162},[149,512,263],{"class":172},[149,514,266],{"class":229},[149,516,246],{"class":176},[149,518,271],{"class":172},[149,520,274],{"class":236},[149,522,277],{"class":176},[149,524,526],{"class":151,"line":525},24,[149,527,283],{"class":176},[15,529,530,532],{},[109,531,180],{}," is the simple one. Run, wait N seconds, run again. The time between iterations is bounded below, not above. 
If a job takes longer than the interval, the next start just gets pushed out.",[15,534,535,537],{},[109,536,313],{}," is the one I actually wrote this library for. Ticks come at fixed wall-clock offsets. If the previous tick is still running when the next one fires, that next tick gets dropped on the floor. No queue, no overlap, no surprise stacking later when the load picks back up.",[15,539,540,542],{},[109,541,417],{}," is for the boring stuff. Nightly reports, hourly cleanups, anything where the time of day matters. Cron expression in, callback out.",[10,544,546],{"id":545},"how-leader-election-actually-works","How leader election actually works",[15,548,549],{},"Leader election comes down to a single Redis key per job.",[140,551,556],{"className":552,"code":554,"language":555},[553],"language-text","{ProjectName}:{JobName}:lock\n","text",[109,557,554],{"__ignoreMap":145},[15,559,560,561,564],{},"Every replica, every ",[109,562,563],{},"HeartbeatInterval"," (3 seconds by default), runs:",[140,566,569],{"className":567,"code":568,"language":555},[553],"SET {lockKey} {nodeId} NX PX {LockExpiry}\n",[109,570,568],{"__ignoreMap":145},[15,572,573,576,577,580,581,584],{},[109,574,575],{},"NX"," means \"only set if absent\". ",[109,578,579],{},"PX"," is a TTL in milliseconds. The first pod to land that SET becomes the leader. 
Everyone else gets ",[109,582,583],"null"," back and stays a follower.",[15,586,587],{},"Renewal is a tiny Lua script the leader runs on every heartbeat:",[140,589,593],{"className":590,"code":591,"language":592,"meta":145,"style":145},"language-lua shiki shiki-themes one-light one-dark-pro","if redis.call('GET', KEYS[1]) == ARGV[1] then\n    return redis.call('PEXPIRE', KEYS[1], ARGV[2])\nelse\n    return 0\nend\n","lua",[109,594,595,600,605,610,615],{"__ignoreMap":145},[149,596,597],{"class":151,"line":152},[149,598,599],{},"if redis.call('GET', KEYS[1]) == ARGV[1] then\n",[149,601,602],{"class":151,"line":159},[149,603,604],{},"    return redis.call('PEXPIRE', KEYS[1], ARGV[2])\n",[149,606,607],{"class":151,"line":186},[149,608,609],{},"else\n",[149,611,612],{"class":151,"line":192},[149,613,614],{},"    return 0\n",[149,616,617],{"class":151,"line":218},[149,618,619],{},"end\n",[15,621,622],{},"Only the holder can extend the TTL. If the script returns 0, we lost leadership (probably because too many heartbeats failed in a row and the key expired in between), and the loop drops back to follower mode.",[15,624,625],{},"On graceful shutdown, there's a second Lua script:",[140,627,629],{"className":590,"code":628,"language":592,"meta":145,"style":145},"if redis.call('GET', KEYS[1]) == ARGV[1] then\n    return redis.call('DEL', KEYS[1])\nelse\n    return 0\nend\n",[109,630,631,635,640,644,648],{"__ignoreMap":145},[149,632,633],{"class":151,"line":152},[149,634,599],{},[149,636,637],{"class":151,"line":159},[149,638,639],{},"    return redis.call('DEL', KEYS[1])\n",[149,641,642],{"class":151,"line":186},[149,643,609],{},[149,645,646],{"class":151,"line":192},[149,647,614],{},[149,649,650],{"class":151,"line":218},[149,651,619],{},[10,653,655],{"id":654},"why-all-three-of-these-are-one-redis-command","Why all three of these are one Redis command",[15,657,658],{},"You might notice that acquire, renew, and release are each a single Redis operation. That's deliberate. 
Anything that does \"check, then act\" against shared state across multiple round trips is a race waiting to happen.",[15,660,661],{},"Take acquire. The naive version would be:",[140,663,666],{"className":664,"code":665,"language":555},[553],"EXISTS lockKey      # returns 0, nobody owns it\nSET lockKey nodeId  # OK, I'll set it\n",[109,667,665],{"__ignoreMap":145},[15,669,670,671,674,675,678,679,682],{},"Between those two commands, another pod can also see ",[109,672,673],{},"EXISTS"," return 0 and also issue its own ",[109,676,677],{},"SET",". Now both pods think they're the leader, and you spend the next incident figuring out why two workers fought over the same tick. ",[109,680,681],{},"SET ... NX"," solves this by collapsing the check and the write into one operation that Redis runs as a single atomic step. There is no window for anyone to slip in.",[15,684,685],{},"Renewal has the same problem. The naive version is:",[140,687,690],{"className":688,"code":689,"language":555},[553],"GET lockKey            # is it still me?\nPEXPIRE lockKey 10000  # yes, extend the TTL\n",[109,691,689],{"__ignoreMap":145},[15,693,694,695,698,699,702,703,706,707,709,710,714,715,717,718,720],{},"Between the ",[109,696,697],{},"GET"," and the ",[109,700,701],{},"PEXPIRE",", the lock can expire on its own (a network blip, a few missed heartbeats), and another pod can ",[109,704,705],{},"SET NX"," and become the new leader. If we then run our ",[109,708,701],{},", we just extended ",[711,712,713],"em",{},"their"," lock without realizing it. The new leader now holds a key with twice the TTL it should have, and we don't even know we lost leadership. The Lua script wraps ",[109,716,697],{}," and ",[109,719,701],{}," into one call. Redis runs the whole script atomically from every other client's perspective, so nothing can sneak in between the two steps.",[15,722,723,724,726,727,730,731,733,734,736],{},"Release is the same shape. 
",[109,725,697]," then ",[109,728,729],"DEL"," is two commands and a race: if the lock expires and another pod acquires it between the two, our ",[109,732,729]," deletes ",[711,735,713]," lock. The Lua version checks ownership and deletes in one step.",[15,738,739,740,742],{},"So the rule is: any operation whose correctness depends on the current state of the lock has to run on the Redis server in one shot. ",[109,741,705]," handles acquire. Lua handles the other two.",[10,744,746],{"id":745},"why-explicit-release-matters","Why explicit release matters",[15,748,749,750,753,754,756,757,759],{},"Without an explicit release, peers have to wait up to ",[109,751,752],"LockExpiry"," (10 seconds by default) before a fresh ",[109,755,705]," can win. With release, the next pod takes over within one ",[109,758,563],", which is 3 seconds by default.",[15,761,762],{},"On a rolling deploy, that's the difference between \"10 seconds of nobody running the job\" and \"3 seconds of nobody running the job\". For a tick-driven loop firing every 500 ms, that's the difference between roughly twenty missed ticks and roughly six.",[15,764,765,766,768],{},"On a hard kill (SIGKILL, OOM, the node dropping off the network), nothing graceful runs. The lock just expires after ",[109,767,752],{},". That's still fine. It's the worst case, and the worst case is bounded by your config.",[10,770,772],{"id":771},"sizing-heartbeatinterval-and-lockexpiry","Sizing HeartbeatInterval and LockExpiry",[15,774,775,776,717,778,780],{},"There are really only two knobs to tune: ",[109,777,563],{},[109,779,752],{},". The relationship between them is what you actually care about.",[15,782,783,785,786,788,789,791,792,794],{},[109,784,563],{}," is how often the leader tries to renew. ",[109,787,752],{}," is the TTL on the key. 
Once ",[109,790,752],{}," passes without a successful renewal, the key vanishes from Redis and whichever replica wins the next ",[109,793,705],{}," is the new leader.",[15,796,797,798,801],{},"Set them too close together (say 3s and 4s), and one slow round trip costs you leadership. Set them too far apart (3s and 60s), and a hard kill takes a full minute to fail over. The rule I land on is ",[109,799,800],{},"LockExpiry >= 3 * HeartbeatInterval",". Three missed renewals before we lose the lock. The defaults (3s and 10s) fit that rule.",[15,803,804],{},"For very fast jobs (every 500 ms, every 100 ms), the job loop tightens up, but the heartbeat doesn't have to match the job tick. The job loop and the election loop run in parallel inside the same hosted service, so you can run the job every 500 ms and still heartbeat at 3 s without anything fighting.",[15,806,807,808,811,812,814],{},"The library also logs a warning if a single iteration of ",[109,809,810],{},"ExecuteJobAsync"," runs longer than 80% of ",[109,813,752],{},". That's the canary for \"your job is so slow it's about to time out the lock and another pod will take it from you\". If you see that warning regularly, the sizing is wrong, not the job.",[10,816,818],{"id":817},"the-drop-on-overlap-bit","The drop-on-overlap bit",[15,820,821,822,824],{},"This is what ",[109,823,313],{}," does. 
The iteration loop, simplified, is:",[140,826,828],{"className":142,"code":827,"language":144,"meta":145,"style":145},"while (!ct.IsCancellationRequested)\n{\n    await _timer.WaitForNextTickAsync(ct);\n\n    if (!IsLeader) continue;\n    if (_isJobRunning) { \u002F* drop this tick *\u002F continue; }\n\n    _isJobRunning = true;\n    try { await ExecuteJobAsync(ct); }\n    finally { _isJobRunning = false; }\n}\n",[109,829,830,852,856,875,879,899,920,924,936,953,970],{"__ignoreMap":145},[149,831,832,835,838,841,844,846,849],{"class":151,"line":152},[149,833,834],{"class":162},"while",[149,836,837],{"class":176}," (",[149,839,840],{"class":446},"!",[149,842,843],{"class":236},"ct",[149,845,240],{"class":176},[149,847,848],{"class":236},"IsCancellationRequested",[149,850,851],{"class":176},")\n",[149,853,854],{"class":151,"line":159},[149,855,189],{"class":176},[149,857,858,861,864,866,869,871,873],{"class":151,"line":186},[149,859,860],{"class":176},"    await ",[149,862,863],{"class":236},"_timer",[149,865,240],{"class":176},[149,867,868],{"class":229},"WaitForNextTickAsync",[149,870,246],{"class":176},[149,872,843],{"class":500},[149,874,253],{"class":176},[149,876,877],{"class":151,"line":192},[149,878,290],{"emptyLinePlaceholder":289},[149,880,881,884,886,888,891,894,897],{"class":151,"line":218},[149,882,883],{"class":162},"    if",[149,885,837],{"class":176},[149,887,840],{"class":446},[149,889,890],{"class":500},"IsLeader",[149,892,893],{"class":176},") ",[149,895,896],{"class":162},"continue",[149,898,215],{"class":176},[149,900,901,903,905,908,911,914,917],{"class":151,"line":256},[149,902,883],{"class":162},[149,904,837],{"class":176},[149,906,907],{"class":500},"_isJobRunning",[149,909,910],{"class":176},") { ",[149,912,913],{"class":155},"\u002F* drop this tick *\u002F",[149,915,916],{"class":162}," continue",[149,918,919],{"class":176},"; 
}\n",[149,921,922],{"class":151,"line":280},[149,923,290],{"emptyLinePlaceholder":289},[149,925,926,929,931,934],{"class":151,"line":286},[149,927,928],{"class":500},"    _isJobRunning",[149,930,447],{"class":446},[149,932,933],{"class":249}," true",[149,935,215],{"class":176},[149,937,938,941,944,946,948,950],{"class":151,"line":293},[149,939,940],{"class":162},"    try",[149,942,943],{"class":176}," { await ",[149,945,810],{"class":229},[149,947,246],{"class":176},[149,949,843],{"class":500},[149,951,952],{"class":176},"); }\n",[149,954,955,958,961,963,965,968],{"class":151,"line":299},[149,956,957],{"class":162},"    finally",[149,959,960],{"class":176}," { ",[149,962,907],{"class":500},[149,964,447],{"class":446},[149,966,967],{"class":249}," false",[149,969,919],{"class":176},[149,971,972],{"class":151,"line":318},[149,973,283],{"class":176},[15,975,976,979,980,983,984,986,987,990],{},[109,977,978],{},"PeriodicTimer.WaitForNextTickAsync"," gives you ticks at fixed wall-clock instants instead of drifting like ",[109,981,982],{},"Task.Delay"," would. The ",[109,985,907],{}," flag is just a ",[109,988,989],{},"volatile bool",". If a tick arrives while a previous run is still going, we drop it on the floor.",[15,992,993],{},"This is the semantic Hangfire's recurring job runner doesn't give you. Hangfire queues overlapping runs. Mine drops them. For \"run prediction every 500 ms\" workloads, drop is the correct default. A stale prediction is worse than a missed one.",[10,995,997],{"id":996},"aot-and-the-source-generator","AOT and the source generator",[15,999,1000,1001,717,1004,1007,1008,717,1011,1014],{},"The library targets ",[109,1002,1003],{},"net8.0",[109,1005,1006],{},"net10.0",", and it's marked ",[109,1009,1010],{},"IsAotCompatible=true",[109,1012,1013],{},"IsTrimmable=true",". 
Those flags only mean something if you actually avoid reflection at startup, so I shipped a Roslyn source generator inside the package.",[15,1016,1017,1018,1021],{},"The generator scans your compilation, finds every non-abstract subclass of ",[109,1019,1020],{},"SingletonBackgroundJob",", and emits an extension method directly into your assembly:",[140,1023,1025],{"className":142,"code":1024,"language":144,"meta":145,"style":145},"internal static class SingletonJobGeneratedRegistration\n{\n    internal static IServiceCollection AddSingletonJobs(this IServiceCollection services, IConfiguration? configuration = null)\n    {\n        services.ConfigureSingletonJobOptions(configuration);\n        services.TryAddEnumerable(ServiceDescriptor.Singleton\u003CIHostedService, MyApp.DailyReportJob>());\n        services.TryAddEnumerable(ServiceDescriptor.Singleton\u003CIHostedService, MyApp.HeartbeatJob>());\n        services.TryAddEnumerable(ServiceDescriptor.Singleton\u003CIHostedService, MyApp.PriceTickJob>());\n        return services;\n    }\n}\n",[109,1026,1027,1039,1043,1085,1090,1106,1144,1175,1206,1215,1220],{"__ignoreMap":145},[149,1028,1029,1032,1034,1036],{"class":151,"line":152},[149,1030,1031],{"class":162},"internal",[149,1033,433],{"class":162},[149,1035,169],{"class":162},[149,1037,1038],{"class":172}," SingletonJobGeneratedRegistration\n",[149,1040,1041],{"class":151,"line":159},[149,1042,189],{"class":176},[149,1044,1045,1048,1050,1053,1056,1058,1061,1063,1066,1069,1072,1075,1078,1080,1083],{"class":151,"line":186},[149,1046,1047],{"class":162},"    internal",[149,1049,433],{"class":162},[149,1051,1052],{"class":172}," IServiceCollection",[149,1054,1055],{"class":229}," AddSingletonJobs",[149,1057,246],{"class":176},[149,1059,1060],{"class":162},"this",[149,1062,1052],{"class":172},[149,1064,1065],{"class":236}," services",[149,1067,1068],{"class":176},", ",[149,1070,1071],{"class":172},"IConfiguration",[149,1073,1074],{"class":176},"? 
",[149,1076,1077],{"class":236},"configuration",[149,1079,447],{"class":446},[149,1081,1082],{"class":249}," null",[149,1084,851],{"class":176},[149,1086,1087],{"class":151,"line":192},[149,1088,1089],{"class":176},"    {\n",[149,1091,1092,1095,1097,1100,1102,1104],{"class":151,"line":218},[149,1093,1094],{"class":236},"        services",[149,1096,240],{"class":176},[149,1098,1099],{"class":229},"ConfigureSingletonJobOptions",[149,1101,246],{"class":176},[149,1103,1077],{"class":500},[149,1105,253],{"class":176},[149,1107,1108,1110,1112,1115,1117,1120,1122,1125,1128,1131,1133,1136,1138,1141],{"class":151,"line":256},[149,1109,1094],{"class":236},[149,1111,240],{"class":176},[149,1113,1114],{"class":229},"TryAddEnumerable",[149,1116,246],{"class":176},[149,1118,1119],{"class":236},"ServiceDescriptor",[149,1121,240],{"class":176},[149,1123,1124],{"class":229},"Singleton",[149,1126,1127],{"class":176},"\u003C",[149,1129,1130],{"class":172},"IHostedService",[149,1132,1068],{"class":176},[149,1134,1135],{"class":172},"MyApp",[149,1137,240],{"class":176},[149,1139,1140],{"class":172},"DailyReportJob",[149,1142,1143],{"class":176},">());\n",[149,1145,1146,1148,1150,1152,1154,1156,1158,1160,1162,1164,1166,1168,1170,1173],{"class":151,"line":280},[149,1147,1094],{"class":236},[149,1149,240],{"class":176},[149,1151,1114],{"class":229},[149,1153,246],{"class":176},[149,1155,1119],{"class":236},[149,1157,240],{"class":176},[149,1159,1124],{"class":229},[149,1161,1127],{"class":176},[149,1163,1130],{"class":172},[149,1165,1068],{"class":176},[149,1167,1135],{"class":172},[149,1169,240],{"class":176},[149,1171,1172],{"class":172},"HeartbeatJob",[149,1174,1143],{"class":176},[149,1176,1177,1179,1181,1183,1185,1187,1189,1191,1193,1195,1197,1199,1201,1204],{"class":151,"line":286},[149,1178,1094],{"class":236},[149,1180,240],{"class":176},[149,1182,1114],{"class":229},[149,1184,246],{"class":176},[149,1186,1119],{"class":236},[149,1188,240],{"class":176},[149,1190,1124],{"class":229
},[149,1192,1127],{"class":176},[149,1194,1130],{"class":172},[149,1196,1068],{"class":176},[149,1198,1135],{"class":172},[149,1200,240],{"class":176},[149,1202,1203],{"class":172},"PriceTickJob",[149,1205,1143],{"class":176},[149,1207,1208,1211,1213],{"class":151,"line":293},[149,1209,1210],{"class":162},"        return",[149,1212,1065],{"class":500},[149,1214,215],{"class":176},[149,1216,1217],{"class":151,"line":299},[149,1218,1219],{"class":176},"    }\n",[149,1221,1222],{"class":151,"line":318},[149,1223,283],{"class":176},[15,1225,1226,1227,1230],{},"So in ",[109,1228,1229],{},"Program.cs"," you write:",[140,1232,1234],{"className":142,"code":1233,"language":144,"meta":145,"style":145},"builder.Services.AddSingletonJobs(builder.Configuration);\n",[109,1235,1236],{"__ignoreMap":145},[149,1237,1238,1241,1243,1246,1248,1251,1253,1255,1257,1260],{"class":151,"line":152},[149,1239,1240],{"class":236},"builder",[149,1242,240],{"class":176},[149,1244,1245],{"class":236},"Services",[149,1247,240],{"class":176},[149,1249,1250],{"class":229},"AddSingletonJobs",[149,1252,246],{"class":176},[149,1254,1240],{"class":236},[149,1256,240],{"class":176},[149,1258,1259],{"class":236},"Configuration",[149,1261,253],{"class":176},[15,1263,1264,1265,1268],{},"That call expands at compile time into the class above. No ",[109,1266,1267],{},"Assembly.GetTypes()",", no reflection, no trim warnings at publish. I’ve built several source generators before, so this was familiar territory.",[15,1270,1271,1272,1275,1276,1279],{},"One small catch: the generator only runs as part of a build. On a fresh checkout your IDE will scream ",[109,1273,1274],{},"CS1061: 'IServiceCollection' does not contain a definition for 'AddSingletonJobs'"," until you ",[109,1277,1278],{},"dotnet build"," once. 
After that it resolves and stays resolved.",[10,1281,1259],{"id":1077},[15,1283,1284],{},"Defaults look like:",[140,1286,1290],{"className":1287,"code":1288,"language":1289,"meta":145,"style":145},"language-json shiki shiki-themes one-light one-dark-pro","{\n  \"ConnectionStrings\": { \"Redis\": \"localhost:6379\" },\n  \"SingletonJob\": {\n    \"ProjectName\": \"myapp\",\n    \"HeartbeatInterval\": \"00:00:03\",\n    \"LockExpiry\": \"00:00:10\"\n  }\n}\n","json",[109,1291,1292,1296,1316,1324,1337,1349,1359,1364],{"__ignoreMap":145},[149,1293,1294],{"class":151,"line":152},[149,1295,189],{"class":176},[149,1297,1298,1301,1304,1307,1310,1313],{"class":151,"line":159},[149,1299,1300],{"class":442},"  \"ConnectionStrings\"",[149,1302,1303],{"class":176},": { ",[149,1305,1306],{"class":442},"\"Redis\"",[149,1308,1309],{"class":176},": ",[149,1311,1312],{"class":211},"\"localhost:6379\"",[149,1314,1315],{"class":176}," },\n",[149,1317,1318,1321],{"class":151,"line":186},[149,1319,1320],{"class":442},"  \"SingletonJob\"",[149,1322,1323],{"class":176},": {\n",[149,1325,1326,1329,1331,1334],{"class":151,"line":192},[149,1327,1328],{"class":442},"    \"ProjectName\"",[149,1330,1309],{"class":176},[149,1332,1333],{"class":211},"\"myapp\"",[149,1335,1336],{"class":176},",\n",[149,1338,1339,1342,1344,1347],{"class":151,"line":218},[149,1340,1341],{"class":442},"    \"HeartbeatInterval\"",[149,1343,1309],{"class":176},[149,1345,1346],{"class":211},"\"00:00:03\"",[149,1348,1336],{"class":176},[149,1350,1351,1354,1356],{"class":151,"line":256},[149,1352,1353],{"class":442},"    \"LockExpiry\"",[149,1355,1309],{"class":176},[149,1357,1358],{"class":211},"\"00:00:10\"\n",[149,1360,1361],{"class":151,"line":280},[149,1362,1363],{"class":176},"  }\n",[149,1365,1366],{"class":151,"line":286},[149,1367,283],{"class":176},[15,1369,1370,1371,1373],{},"The relationship that actually matters is ",[109,1372,800],{},". A single dropped network call shouldn't cost you leadership. 
Three in a row, sure.",[15,1375,1376],{},"Per-job override if you have one heavy job that needs a longer lock:",[140,1378,1380],{"className":142,"code":1379,"language":144,"meta":145,"style":145},"services.PostConfigureSingletonJob(\"heavy-job\", o =>\n{\n    o.LockExpiry = TimeSpan.FromMinutes(5);\n});\n",[109,1381,1382,1405,1409,1434],{"__ignoreMap":145},[149,1383,1384,1387,1389,1392,1394,1397,1399,1402],{"class":151,"line":152},[149,1385,1386],{"class":236},"services",[149,1388,240],{"class":176},[149,1390,1391],{"class":229},"PostConfigureSingletonJob",[149,1393,246],{"class":176},[149,1395,1396],{"class":211},"\"heavy-job\"",[149,1398,1068],{"class":176},[149,1400,1401],{"class":236},"o",[149,1403,1404],{"class":176}," =>\n",[149,1406,1407],{"class":151,"line":159},[149,1408,189],{"class":176},[149,1410,1411,1414,1416,1418,1420,1422,1424,1427,1429,1432],{"class":151,"line":186},[149,1412,1413],{"class":236},"    o",[149,1415,240],{"class":176},[149,1417,752],{"class":236},[149,1419,447],{"class":446},[149,1421,226],{"class":236},[149,1423,240],{"class":176},[149,1425,1426],{"class":229},"FromMinutes",[149,1428,246],{"class":176},[149,1430,1431],{"class":249},"5",[149,1433,253],{"class":176},[149,1435,1436],{"class":151,"line":192},[149,1437,1438],{"class":176},"});\n",[15,1440,1441],{},"Per-job options are frozen at startup. If you need to change them, redeploy. I went back and forth on whether to support hot reload and eventually convinced myself that hot reloading leader election config is a great way to invent a heisenbug.",[10,1443,1445],{"id":1444},"what-it-does-not-do","What it does not do",[15,1447,1448],{},"Libraries that try to do too much are how you end up rebuilding Hangfire, so the non-goals matter here.",[46,1450,1451,1457,1460,1463,1466],{},[49,1452,1453,1454,1456],{},"No retries. If ",[109,1455,810],{}," throws, it's logged and the next tick runs. Want retries? Write them in your handler.",[49,1458,1459],{},"No history. 
Ticks aren't persisted anywhere. Want a record? Log it yourself.",[49,1461,1462],{},"No dashboard. There is no UI. There never will be.",[49,1464,1465],{},"No cross-pod work distribution. Exactly one pod runs the job, the others sit idle. If you want round-robin or sharded execution, that's a different problem and a different library.",[49,1467,1468],{},"No durability. Jobs are in-memory loops. A pod restart means the loop restarts. That is on purpose.",[15,1470,1471],{},"What's left is the one thing the library actually does: make sure exactly one pod across N replicas runs a given periodic loop, with fast and bounded failover.",[10,1473,1475],{"id":1474},"a-few-things-i-learned-along-the-way","A few things I learned along the way",[15,1477,1478,1487,1488,1491],{},[1479,1480,1481,1484,1485,240],"strong",{},[109,1482,1483],{},"volatile"," is the right primitive for ",[109,1486,890],{}," Single writer (the election loop), many readers (the job loop, the release path). Eventually consistent publication is fine here, because losing leadership only delays a single iteration check by one tick at worst. Reaching for ",[109,1489,1490],{},"Interlocked"," or a lock would be cargo-cult programming, more complexity than the problem needs.",[15,1493,1494,1500,1501,1504],{},[1479,1495,1496,1499],{},[109,1497,1498],{},"PeriodicTimer"," is the right primitive for fixed rate ticks."," It produces ticks at fixed wall-clock instants. ",[109,1502,1503],{},"await Task.Delay(interval)"," does not. The drift adds up over a few hours, and you only notice when you check the timestamps in the logs and realize you've quietly lost a beat.",[15,1506,1507,1510,1511,726,1513,1515],{},[1479,1508,1509],{},"Lua scripts make Redis atomic for free."," ",[109,1512,697],{},[109,1514,701],{}," is two round trips and a race. The Lua version is one round trip and atomic. 
After writing the first script, the rest were easy: renew, release, and a no-op ownership check.",[15,1517,1518,1521,1522,1525],{},[1479,1519,1520],{},"Backoff with jitter."," When Redis comes back after an outage, you don't want N replicas to all retry at the exact same moment and dogpile the server. The formula is ",[109,1523,1524],{},"delay = min(HeartbeatInterval * 2^failures, MaxBackoffDelay)"," plus or minus 20% jitter. Four lines of code that save you an entire class of follow-on incidents.",[15,1527,1528,1531,1532,1535,1536,1538],{},[1479,1529,1530],{},"Cron without a time zone is UTC."," Cronos is great, but the default for \"cron with no time zone\" is UTC. We're in Singapore (UTC+8), so a daily 3 AM job actually fires at 11 AM local. After running into this in a test deployment, I added the optional ",[109,1533,1534],{},"TimeZone"," override on ",[109,1537,417],{},". If you only ever run in UTC, you can ignore it. If not, set it explicitly, most likely to your server's own local time.",[10,1540,1542],{"id":1541},"try-it","Try it",[140,1544,1548],{"className":1545,"code":1546,"language":1547,"meta":145,"style":145},"language-sh shiki shiki-themes one-light one-dark-pro","dotnet add package SingletonJob\n","sh",[109,1549,1550],{"__ignoreMap":145},[149,1551,1552],{"class":151,"line":152},[149,1553,1546],{},[15,1555,1556],{},"Or clone the repo and spin up three workers locally:",[140,1558,1560],{"className":1545,"code":1559,"language":1547,"meta":145,"style":145},"cd samples\ndocker compose up --build --scale worker=3\n",[109,1561,1562,1567],{"__ignoreMap":145},[149,1563,1564],{"class":151,"line":152},[149,1565,1566],{},"cd samples\n",[149,1568,1569],{"class":151,"line":159},[149,1570,1571],{},"docker compose up --build --scale worker=3\n",[15,1573,1574,1575,1578,1579,240],{},"Exactly one of them prints ",[109,1576,1577],{},"became LEADER",". 
Kill it, and another takes over within ",[109,1580,563],{},[15,1582,1583,1584],{},"Repo: ",[30,1585,32],{"href":32,"rel":1586},[34],[15,1588,1589,1590,1593,1594,1597],{},"If you're interested, have a look at the ",[109,1591,1592],{},"README.md"," and the technical documents under ",[109,1595,1596],{},".\u002Fdoc",". I'm happy to hear any feedback.",[10,1599,1601],{"id":1600},"closing","Closing",[15,1603,1604,1605,1608,1609,1611,1612,1615,1616,1619],{},"I've wanted a library like this to exist for years. Every team I've been on has eventually written some version of it: a half-broken ",[109,1606,1607],{},"try\u002Ffinally"," around a Redis ",[109,1610,705],{},", a hand-rolled scheduler that quietly queues runs it should have dropped, and worst of all, a ",[109,1613,1614],{},"Quartz","\u002F",[109,1617,1618],{},"BackgroundService"," that either runs in every pod or is configured to \"only run on pod 0\" (extremely brittle). None of them were ever good enough to pull out into a package.",[15,1621,1622],{},"This one I think actually is. The code is short enough to read in one sitting. The surface area is small enough that I keep failing to find new things to add to it. And the design has held up across enough rewrites at work that I'm not nervous about it anymore. 
If you have a tick driven workload in .NET and you've been fighting Hangfire about it, give this a try.",[1624,1625,1626],"style",{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .sW2Sy, html code.shiki .sW2Sy{--shiki-default:#A0A1A7;--shiki-default-font-style:italic;--shiki-dark:#7F848E;--shiki-dark-font-style:italic}html pre.shiki code .sLKXg, html code.shiki .sLKXg{--shiki-default:#A626A4;--shiki-dark:#C678DD}html pre.shiki code .sC09Y, html code.shiki .sC09Y{--shiki-default:#C18401;--shiki-dark:#E5C07B}html pre.shiki code .s5ixo, html code.shiki .s5ixo{--shiki-default:#383A42;--shiki-dark:#ABB2BF}html pre.shiki code .siaei, html code.shiki .siaei{--shiki-default:#4078F2;--shiki-dark:#ABB2BF}html pre.shiki code .sDhpE, html code.shiki .sDhpE{--shiki-default:#50A14F;--shiki-dark:#98C379}html pre.shiki code .sAdtL, html code.shiki .sAdtL{--shiki-default:#4078F2;--shiki-dark:#61AFEF}html pre.shiki code .s7GmK, html code.shiki .s7GmK{--shiki-default:#383A42;--shiki-dark:#E5C07B}html pre.shiki code .sAGMh, html code.shiki .sAGMh{--shiki-default:#986801;--shiki-dark:#D19A66}html pre.shiki code .sJa8x, html 
code.shiki .sJa8x{--shiki-default:#E45649;--shiki-dark:#E06C75}html pre.shiki code .sknuh, html code.shiki .sknuh{--shiki-default:#383A42;--shiki-dark:#56B6C2}html pre.shiki code .sz0mV, html code.shiki .sz0mV{--shiki-default:#383A42;--shiki-dark:#E06C75}",{"title":145,"searchDepth":159,"depth":159,"links":1628},[1629,1630,1631,1632,1633,1634,1635,1636,1637,1638,1639,1640,1641,1642,1643,1644],{"id":12,"depth":159,"text":13},{"id":37,"depth":159,"text":38},{"id":69,"depth":159,"text":70},{"id":96,"depth":159,"text":97},{"id":134,"depth":159,"text":135},{"id":545,"depth":159,"text":546},{"id":654,"depth":159,"text":655},{"id":745,"depth":159,"text":746},{"id":771,"depth":159,"text":772},{"id":817,"depth":159,"text":818},{"id":996,"depth":159,"text":997},{"id":1077,"depth":159,"text":1259},{"id":1444,"depth":159,"text":1445},{"id":1474,"depth":159,"text":1475},{"id":1541,"depth":159,"text":1542},{"id":1600,"depth":159,"text":1601},null,"2026-05-11","A Redis-backed singleton background job library for high-frequency .NET workloads.","md",{},"\u002Fblog\u002Fwhy-i-built-singletonjob",{"title":5,"description":1647},"blog\u002Fwhy-i-built-singletonjob",[1654],"tech","NHt5gVis_1dYUi9Qx7uyTrBCed34BWQPMg5sCG3U0Pc",1778998257277]