Requirement #7911 (new)
Opened 12 years ago
Cache support in backend
Reported by: | jamoore | Owned by: | jamoore |
---|---|---|---|
Priority: | major | Milestone: | Unscheduled |
Component: | Services | Keywords: | n.a. |
Cc: | cxallan, jburel, ckm@…, oleg@… | Business Value: | n.a. |
Total Story Points: | n.a. | Roif: | n.a. |
Mandatory Story Points: | n.a. |
Description
Goals
A central caching component should be provided by the OMERO server so that arbitrary elements (perhaps represented as Ice objects) can be stored and later purged according to time- or size-based policies.
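The sketch below shows one way to combine the two purge policies named in the goal. It assumes a single in-process Python cache rather than ehcache or a shared server component. The hypothetical `PolicyCache` expires entries after a TTL and evicts the least recently used entry once a count limit is exceeded. The clock is injectable so that expiry can be shown deterministically.

```python
import time
from collections import OrderedDict


class PolicyCache:
    """Hypothetical cache purging entries by age (TTL) and by count (LRU)."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # key -> (stored_at, value); order runs from least to most recently used
        self._data = OrderedDict()

    def put(self, key, value):
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        # Size policy: evict least recently used entries beyond the limit.
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        stored_at, value = entry
        # Time policy: an entry older than the TTL is purged when read.
        if self.clock() - stored_at > self.ttl_seconds:
            del self._data[key]
            return default
        self._data.move_to_end(key)  # mark as recently used
        return value

    def __len__(self):
        return len(self._data)


# Example with an injected clock so that expiry is deterministic.
now = [0.0]
cache = PolicyCache(max_entries=2, ttl_seconds=60, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # touch "a"; "b" is now least recently used
cache.put("c", 3)   # size policy evicts "b"
now[0] = 120.0      # advance past the TTL
```

In this sketch, expired entries are removed only when they are read or pushed out by the size limit. A real component would probably also sweep expired entries periodically and bound size in bytes rather than entry count.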
Current status
The server uses ehcache for some minimal forms of caching. Insight uses ehcache more extensively. Web also has a Django-based caching system that sysadmins can enable.
Components to use caching
- OmeroSearch: The indexer could store its results keyed by a hash of each file's content so that it does not need to re-parse unchanged files. For example, given file:1 with hash abcdef0123456789 and image:2, the indexer could store a map of string sets under /<indexername>/file:abcdef0123456789/image:2/
- OmeroScripts: Scripts could request blocks of image data for processing. If another execution of the script requests the same block, it could be served from the cache. (Data locality is key here. If it cannot be guaranteed, script execution may require something more than just a cache; cf. Hadoop.)
- ...
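The OmeroSearch idea can be sketched as a map of string sets keyed by content hash, following the /<indexername>/file:<hash>/image:<id>/ layout from the ticket. This is a minimal illustration, not the indexer's actual code. `IndexerCache`, `terms_for`, and the `parse` callback are hypothetical names, and SHA-1 is assumed purely as an example of a stable content hash.

```python
import hashlib


class IndexerCache:
    """Hypothetical map of string sets keyed by file content hash."""

    def __init__(self, indexer_name):
        self.indexer_name = indexer_name
        self._segments = {}  # segment path -> set of extracted terms

    @staticmethod
    def content_hash(data: bytes) -> str:
        # SHA-1 is an assumption; any stable content hash would work.
        return hashlib.sha1(data).hexdigest()

    def segment(self, data: bytes, image_id: int) -> str:
        return f"/{self.indexer_name}/file:{self.content_hash(data)}/image:{image_id}/"

    def terms_for(self, data: bytes, image_id: int, parse) -> set:
        """Return cached terms, calling the expensive parse(data) only on a miss."""
        key = self.segment(data, image_id)
        if key not in self._segments:
            self._segments[key] = set(parse(data))
        return self._segments[key]


calls = []


def parse(data):
    calls.append(data)  # record how often the expensive parser runs
    return data.decode().split()


cache = IndexerCache("fulltext")
first = cache.terms_for(b"mitosis time lapse", 2, parse)
second = cache.terms_for(b"mitosis time lapse", 2, parse)  # same content: hit
```

Keying on content instead of file ID means that a re-imported copy of an identical file also hits the cache. The trade-off is that the file must still be hashed, which is assumed to be cheaper than re-parsing it.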
Design considerations
- Deployment: Like #7902, deployment and configuration of the caching should be trivial if not completely transparent.
- Monitoring: Metrics on the size of the cache, along with hits and misses, would be useful.
- Configuration: If possible, the size, time, and location settings of the cache should be configurable, even remotely. For example, if the name of a cache segment changes between versions, it may be necessary to keep the old cache location but let it strictly shrink while both versions of the code are in use.
- Client usage: Where possible, client caching requirements should be offloaded to the server. Because of network overhead, this will likely not cover most needs, so transparent uses may be preferred. For example, the documentation of a method like IContainer.loadContainerHierarchies may stipulate that if the group_id, owner_id, and arguments are available under the cache segment /loadContainerHierarchy/group_id:X/owner_id:Y/...etc, the cached value will be returned and no query performed. The documentation should also stipulate what invalidates the cache. (Note: especially for caching Hibernate queries, it is likely more efficient to use Hibernate's second-level cache.)
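The transparent-caching and monitoring points above can be illustrated with a minimal sketch. It builds keys in the /loadContainerHierarchy/group_id:X/owner_id:Y/... layout, invalidates by segment prefix, and counts hits and misses. `SegmentCache`, `hierarchy_key`, and `load_hierarchy` are hypothetical names. The loader merely stands in for the query the cache would avoid and is not the OMERO API.

```python
class SegmentCache:
    """Hypothetical path-keyed cache with prefix invalidation and hit/miss counters."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = loader()  # e.g. the database query the cache should avoid
        self._entries[key] = value
        return value

    def invalidate_prefix(self, prefix):
        """Drop every entry under a segment, e.g. all hierarchies of one group."""
        stale = [k for k in self._entries if k.startswith(prefix)]
        for k in stale:
            del self._entries[k]
        return len(stale)


def hierarchy_key(group_id, owner_id, *args):
    # Mirrors the ticket's example segment; the trailing "/" keeps
    # group_id:1 from matching group_id:10 during prefix invalidation.
    parts = [f"/loadContainerHierarchy/group_id:{group_id}/owner_id:{owner_id}"]
    parts.extend(str(a) for a in args)
    return "/".join(parts) + "/"


queries = []


def load_hierarchy():
    queries.append(1)  # stands in for the real query
    return ["Project:1", "Dataset:2"]


cache = SegmentCache()
key = hierarchy_key(1, 5, "Project")
cache.get_or_load(key, load_hierarchy)
cache.get_or_load(key, load_hierarchy)  # served from the cache, no query
removed = cache.invalidate_prefix("/loadContainerHierarchy/group_id:1/")
cache.get_or_load(key, load_hierarchy)  # reloaded after invalidation
```

The hit and miss counters are the kind of metrics the monitoring point asks for. Which events should call `invalidate_prefix` is exactly what the method documentation would need to stipulate.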
Related material
The cache may make use of the MQ (#7902) for some of its maintenance, invalidations, etc.
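The sketch below shows how MQ-driven invalidation might work. It uses an in-process `queue.Queue` per cache replica as a stand-in for an MQ subscription, and it is not based on any design from #7902. `InvalidationBus`, `CacheNode`, and the `/thumbs/` segment are hypothetical names. Each published message is a segment prefix that every subscribed replica drops from its local entries.

```python
import queue


class CacheNode:
    """Hypothetical cache replica applying invalidation messages from its inbox."""

    def __init__(self):
        self.entries = {}
        self.inbox = queue.Queue()  # stands in for an MQ subscription

    def drain(self):
        """Apply all pending invalidations (segment prefixes) to local entries."""
        while True:
            try:
                prefix = self.inbox.get_nowait()
            except queue.Empty:
                return
            for key in [k for k in self.entries if k.startswith(prefix)]:
                del self.entries[key]


class InvalidationBus:
    """In-process stand-in for a message queue topic."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, node):
        self.subscribers.append(node.inbox)

    def publish(self, prefix):
        for inbox in self.subscribers:
            inbox.put(prefix)


bus = InvalidationBus()
a, b = CacheNode(), CacheNode()
bus.subscribe(a)
bus.subscribe(b)
a.entries["/thumbs/image:2/"] = "png-bytes"
b.entries["/thumbs/image:2/"] = "png-bytes"
b.entries["/thumbs/image:3/"] = "png-bytes"
bus.publish("/thumbs/image:2/")  # e.g. after image 2 is modified
a.drain()
b.drain()
```

Because replicas apply messages only when they drain their inbox, a real MQ-backed design would need to decide how much staleness is acceptable between publish and delivery.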