CQ DataStore GC
Datastore Garbage Collection
The data store is optionally used to store large binary values. Normally all node and property data is stored in a persistence manager, but for large binaries such as files, special treatment can improve performance and reduce disk usage.
The main features of the data store are:
- Space saving: only one copy per unique object is kept
- Fast copy: only the identifier is copied
- Storing and reading does not block others
- Multiple repositories can use the same data store
- Objects in the data store are immutable
- Garbage collection is used to purge unused objects
- Hot backup is supported
- Clustering: all cluster nodes use the same data store
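The space-saving and immutability points above can be illustrated with a toy content-addressed store. This is only a sketch and not how CRX implements its data store: the function name ds_put, the flat directory layout, and the use of a SHA-1 content hash as the identifier are all assumptions made for illustration.

```shell
# Toy content-addressed store (hypothetical, for illustration only).
# A file is saved under the SHA-1 of its content, so storing the same
# bytes twice yields one object and the same identifier.
ds_put() {
    store_dir=$1
    src_file=$2
    id=$(sha1sum "$src_file" | cut -d ' ' -f 1)
    mkdir -p "$store_dir"
    # Objects are immutable: an existing object is never overwritten.
    if [ ! -f "$store_dir/$id" ]; then
        cp "$src_file" "$store_dir/$id"
    fi
    # Callers keep only this identifier, which is what makes copies cheap.
    printf '%s\n' "$id"
}
```

Calling ds_put twice with two files that hold the same bytes prints the same identifier both times and leaves a single object in the store directory.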
Like Tar, the Datastore uses an append-only architecture, so its size grows over time. Datastore GC is the process of cleaning unused data out of the datastore. More information about the datastore can be found here. In CQ, the datastore is located at /crx-quickstart/repository/repository/datastore
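To see how much the datastore has grown, and to compare before and after a GC run, something like the following can help. It is a minimal sketch: datastore_usage is a hypothetical helper name, and CQ_HOME in the example comment is an assumed variable for the directory that contains crx-quickstart.

```shell
# Report total size (in KB) and number of files under a data store directory.
# datastore_usage is a hypothetical helper, not part of CQ.
datastore_usage() {
    ds_dir=$1
    if [ ! -d "$ds_dir" ]; then
        echo "not a directory: $ds_dir" >&2
        return 1
    fi
    du -sk "$ds_dir" | awk '{print "size_kb=" $1}'
    find "$ds_dir" -type f | wc -l | awk '{print "files=" $1}'
}

# Example (CQ_HOME is an assumed variable):
# datastore_usage "$CQ_HOME/crx-quickstart/repository/repository/datastore"
```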
Starting Datastore GC from UI:
- Go to HOST:PORT/crx/explorer/config/index.jsp
- Click on Datastore Garbage Collection
- Change options if required
- Start Datastore GC
Starting Datastore GC from curl:
curl -u <UID>:<PASSWORD> -X POST http://<HOST>:<PORT>/system/console/jmx/com.adobe.granite:type=Repository/op/runDataStoreGarbageCollection/java.lang.Boolean
To delete unused data with a delay of 2:
curl -u <UID>:<PASSWORD> -X POST --data "delete=true&delay=2" http://<HOST>:<PORT>/system/console/jmx/com.adobe.granite%3Atype%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean
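If you run this often, the command above can be wrapped in a small helper. This is only a sketch: gc_url and run_datastore_gc are hypothetical function names, CQ_USER and CQ_PASSWORD are assumed environment variables, and the delete and delay fields are the ones from the command above.

```shell
# Build the JMX operation URL used in the curl commands above.
gc_url() {
    printf 'http://%s:%s/system/console/jmx/com.adobe.granite%%3Atype%%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean\n' "$1" "$2"
}

# POST to the operation with the delete and delay form fields.
# CQ_USER and CQ_PASSWORD are assumed to be set in the environment.
run_datastore_gc() {
    host=$1
    port=$2
    delete=${3:-true}
    delay=${4:-2}
    # --fail makes curl return a non-zero status on HTTP errors.
    curl -u "$CQ_USER:$CQ_PASSWORD" -X POST --fail --silent --show-error \
        --data "delete=${delete}&delay=${delay}" \
        "$(gc_url "$host" "$port")"
}
```

For example, run_datastore_gc localhost 4502 true 2, where localhost and 4502 are example values for HOST and PORT.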
CRX log message to look for:
*INFO* [127.0.0.1  POST /system/console/jmx/com.adobe.granite%3Atype%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean HTTP/1.1] com.day.crx.sling.server.impl.jmx.GarbageCollection Scanning /libs/wcm/core/content/siteadmin/actions/create/menu/createPage
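To pull these lines out of the log, a fixed-string grep on the class name from the message above is usually enough. A minimal sketch, assuming you pass the path to your log file yourself (its location varies by installation); gc_log_lines is a hypothetical helper name.

```shell
# Print log lines written by the GarbageCollection class named in the
# message above. The log file path is supplied by the caller.
gc_log_lines() {
    grep -F 'com.day.crx.sling.server.impl.jmx.GarbageCollection' "$1"
}
```

Piping the result through tail -n 5 shows the most recent entries.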
Q: Similar to Tar optimization, does Datastore GC also run at some default time OOTB?
Q: Is there any performance impact from running Datastore GC?
A: Yes. It depends on the delay you set: the smaller the delay, the greater the performance impact.
Q: How about running Datastore GC in a cluster?
A: You have to run it on each node in the cluster. If the datastore is shared, run it on only one node.
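The cluster rule above can be written down as a small selection helper. This is a sketch under assumptions: gc_targets is a hypothetical name, the caller states whether the datastore is shared, and the node names are whatever HOST:PORT values you use for your own cluster.

```shell
# Print the nodes on which Datastore GC should be started.
# First argument: "shared" if all nodes use one datastore, anything else
# otherwise. Remaining arguments: the cluster nodes.
gc_targets() {
    mode=$1
    shift
    if [ "$mode" = "shared" ]; then
        # Shared datastore: running on a single node is enough.
        if [ $# -gt 0 ]; then
            printf '%s\n' "$1"
        fi
    else
        # Separate datastores: every node runs its own collection.
        for node in "$@"; do
            printf '%s\n' "$node"
        done
    fi
}
```

The printed list can then be fed to whichever command you use to start the collection on each node.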