Ramblings to follow on from the GitHub conversation:
I'm not sure a datestamped subfolder would work well without some kind of garbage collection added because it will cause the cache to grow daily, or else introduce a new requirement for people to run cache clear regularly.
I sort of like the idea where the subfolder name is the cacheCode. In the routes to get to cache clear that I can think of (drush cc, the button on the admin page, cv flush), the cacheCode has changed before it gets to the asset clear function in System::flushCache. So if clear() only deletes folders that don't match the current cacheCode, you'd be unlikely (although it's still possible) to hit a race condition. I suppose clear() itself could reset the code for good measure, and then it doesn't matter what route it takes to get there.
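To illustrate (a sketch only, assuming the subfolder name is the cacheCode; `getCacheDir()` is a made-up helper here, and `CRM_Utils_File::cleanDir()` is assumed to do the recursive delete):

```php
// Sketch: clear() resets the code first (whatever route got us here), then
// removes any dyn subfolder that doesn't match the new current cacheCode.
$res = Civi::resources();
$res->resetCacheCode();
$current = $res->getCacheCode();
foreach ((array) glob($this->getCacheDir() . '/*', GLOB_ONLYDIR) as $dir) {
  if (basename($dir) !== $current) {
    CRM_Utils_File::cleanDir($dir, TRUE, FALSE);  // assumed recursive delete of the stale folder
  }
}
```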
A variation that's less change but might run slower is for the clear function to only delete files whose filename hash/digest doesn't match the current cacheCode. Then it doesn't require revising how the files get created, but it needs to check every filename before deleting. Still a possible race condition though.
If the cache itself were TTL-based (currently TTL=infinity) instead of cacheCode, then the retrieval function could itself be responsible for clearing stale files (on a per file basis at retrieval time), and then cache clear wouldn't actually do any clearing itself; it would just set everything to expired so the next retrieval call would do a cleanup/recreate. The current url parameter could still be available as a way to request a fresh copy, but it would do the same as cache clear - set everything to expired.
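For example (a sketch only; `getCachedPath()`, `getCacheDir()`, `rebuild()`, and the `asset_cache_expired_before` setting are invented names for illustration):

```php
// Sketch: retrieval rebuilds a file when it's stale; "cache clear" just moves
// an "expired before" threshold forward instead of deleting anything itself.
function getCachedPath($name, $ttl = 86400) {
  $path = $this->getCacheDir() . '/' . $name;
  $expiredBefore = (int) Civi::settings()->get('asset_cache_expired_before');  // bumped by cache clear
  $stat = @stat($path);
  $stale = ($stat === FALSE)
    || ($stat['mtime'] < $expiredBefore)
    || ((time() - $stat['mtime']) > $ttl);
  if ($stale) {
    $this->rebuild($name, $path);  // regenerate the asset in place
  }
  return $path;
}
```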
> I'm not sure a datestamped subfolder would work well without some kind of garbage collection added because it will cause the cache to grow daily, or else introduce a new requirement for people to run cache clear regularly.
Ooh, really good point. It sort of raises the stakes on pruning regularly. I mean... we could add a scheduled job to do that.
> I sort of like the idea where the subfolder name is the cacheCode.
Yeah, this is really appealing - it feels more precise (producing churn only when needed -- whereas the datestamp approach leads to churn every day).
Some oddball configurations can allow for more than one resCacheCode to be valid at the same time. On a closer look, the only variable that I can find is domain_id. To wit: in multi-domain configurations, the civicrm_settings are partitioned by domain, so each domain can have a different resCacheCode. The domain ID is a hidden variable... but perhaps we can just make it visible:
Cleanup process: delete all `{IMAGE_UPLOAD_DIR}/dyn/{DOMAIN_ID}_{RES_CACHE_CODE}` where `{DOMAIN_ID}` matches and `{RES_CACHE_CODE}` does not match.
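i.e. roughly (sketch only; `$imageUploadDir` stands in for the configured image upload path):

```php
// Make the hidden domain ID an explicit part of the dyn subfolder name:
// {IMAGE_UPLOAD_DIR}/dyn/{DOMAIN_ID}_{RES_CACHE_CODE}
$domainId = CRM_Core_Config::domainID();
$cacheCode = Civi::resources()->getCacheCode();
$subDir = "{$imageUploadDir}/dyn/{$domainId}_{$cacheCode}";
```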
> A variation that's less change but might run slower is for the clear function to only delete files whose filename hash/digest doesn't match the current cacheCode. Then it doesn't require revising how the files get created, but it needs to check every filename before deleting.
If there's anything that needs to know/match the resCacheCode, then I think we have to store it explicitly (in the file-name or a metadata file). There's a lot of info packed into the digest, so the slow way would probably be measured in CPU-years.
> Still a possible race condition though.
Right. The race basically arises if System.flush calls resetCacheCode() at the same time that it deletes the dyn files. System.flush cannot know if (e.g.) some web browser has an active page-view with a scheduled/pending/concurrent-ish request for some older assets. The resetCacheCode() and delete-files operations should be separated by some delay (like minutes or hours).
The prior datestamp suggestion would create a delay by skipping back further (i.e. only pruning very old datestamped folders). With the resCacheCode folders, you could get a more precise skip-back... if you changed resCacheCode to allow strict ordering:
```php
// Old `resetCacheCode()`; not strictly ordered
$this->setCacheCode(CRM_Utils_String::createRandom(5, CRM_Utils_String::ALPHANUMERIC));

// New `resetCacheCode()`; strictly ordered
$packedTime = sprintf("%07s", base_convert(time(), 10, 36));
$rand = CRM_Utils_String::createRandom(5, CRM_Utils_String::ALPHANUMERIC);
$this->setCacheCode($packedTime . '-' . $rand);
```
and then pruning would be:
```php
// Retain the two newest cache subdirs for this domain. Delete all others.
$folders = (array) glob("{$assetBuilderCacheDir}/{$domainId}_*");
sort($folders);
array_pop($folders);
array_pop($folders);
remove($folders);
```
Another way to create the needed delay between resetCacheCode() and delete-files: if we had a default queue available, the System.flush operation could enqueue a task to prune after (say) 60 min. (I think there's a trick for delaying queue-items by some timeframe... CiviRules uses that trick?)
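Sketching that out (heavily hedged: the queue name, the `pruneStaleAssetDirs` callback, and especially the `release_time` option for delayed items are assumptions; I haven't verified the exact option name):

```php
// Sketch: System.flush resets the code immediately, then enqueues a prune task
// that shouldn't be released for ~60 minutes.
$queue = Civi::queue('asset_prune', ['type' => 'Sql', 'runner' => 'task', 'error' => 'delete']);
$task = new CRM_Queue_Task(
  ['CRM_Core_Resources', 'pruneStaleAssetDirs'],  // hypothetical prune callback
  [],
  'Prune stale dyn folders'
);
$queue->createItem($task, ['release_time' => time() + 60 * 60]);  // delayed-release option name assumed
```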
> If the cache itself were TTL-based (currently TTL=infinity) instead of cacheCode, then the retrieval function could itself be responsible for clearing stale files (on a per file basis at retrieval time), and then cache clear wouldn't actually do any clearing itself
I was initially concerned that TTL checks would add more I/O to every pageview - but now I think that part's OK. Just patch AssetBuilder::build() to replace file_exists() with stat(). stat() will report existence as well as timestamps. (I'm pretty sure it's the same syscall - just more info coming back.)
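i.e. something along these lines inside the lookup (variable names invented):

```php
// One stat() call answers both "does it exist?" and "how old is it?".
$stat = @stat($cachedFilePath);
$isFresh = ($stat !== FALSE) && (time() - $stat['mtime']) <= $ttl;
if (!$isFresh) {
  // ...(re)build the asset and write it to $cachedFilePath...
}
```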
My gut sense is that a TTL could be valid on its own merit, but there may be some unintuitiveness WRT sync'ing up timeframes in PHP/HTTPD/reverse-proxies. I think TTL will also require a job to prune expired files. (The list of cached-files and digest-codes is an open-set which varies based on version/environment/use-cases, so some kind of pruning is needed for old stragglers.)
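A prune job for stragglers could stay pretty dumb, e.g. (directory and TTL are placeholders):

```php
// Sketch: scheduled job deletes cached asset files whose mtime is past the TTL.
foreach ((array) glob("{$assetCacheDir}/*") as $file) {
  if (is_file($file) && (time() - filemtime($file)) > $ttl) {
    unlink($file);
  }
}
```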