CQ DataStore GC

CQ Maintenance

Datastore Garbage Collection

Overview:

The data store is optionally used to store large binary values. Normally all node and property data is stored in a persistence manager, but for large binaries such as files, special treatment can improve performance and reduce disk usage.

The main features of the data store are:

  • Space saving: only one copy per unique object is kept
  • Fast copy: only the identifier is copied
  • Storing and reading does not block others
  • Multiple repositories can use the same data store
  • Objects in the data store are immutable
  • Garbage collection is used to purge unused objects
  • Hot backup is supported
  • Clustering: all cluster nodes use the same data store
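
The "space saving" and "fast copy" features above follow from addressing objects by their content. The sketch below illustrates the idea only; it is not how CQ implements its data store. The function name store_blob, the flat directory layout, and the choice of SHA-1 as the content hash are assumptions made for illustration, and the script assumes sha1sum from GNU coreutils is available.

```shell
#!/bin/sh
# Illustrative sketch of content-addressed storage (hypothetical helper,
# not the CQ/CRX implementation): the stored file is named after the hash
# of its content, so identical content is kept only once.

store_blob() {
    store_dir=$1
    src_file=$2
    # The identifier is the hex SHA-1 digest of the file content (assumption).
    id=$(sha1sum "$src_file" | cut -d' ' -f1)
    mkdir -p "$store_dir"
    # Objects are immutable: if the identifier already exists, write nothing.
    if [ ! -f "$store_dir/$id" ]; then
        cp "$src_file" "$store_dir/$id"
    fi
    # Callers keep only the identifier, which is what makes a copy cheap.
    printf '%s\n' "$id"
}
```

Storing two files with identical content through store_blob returns the same identifier twice and leaves a single file in the store directory.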

Like the Tar persistence manager, the data store uses an append-only architecture, so its size grows over time. Datastore GC is the process of removing unused data from the data store. More information about the data store can be found here. In CQ, the data store is located at /crx-quickstart/repository/repository/datastore
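
Because the data store only grows between GC runs, it can be useful to record its size before and after a run. A minimal sketch follows; the function name datastore_size_kb is hypothetical, and the directory is passed as an argument because the installation prefix in front of the path above depends on your setup.

```shell
#!/bin/sh
# Print the disk usage of a data store directory in kilobytes.
# datastore_size_kb is a hypothetical helper name for illustration.
datastore_size_kb() {
    # du -sk prints "<size-in-KB><TAB><path>"; keep only the size field.
    du -sk "$1" | cut -f1
}
```

Calling datastore_size_kb with the data store path before and after a GC run gives a rough measure of how much space was reclaimed.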

Starting Datastore GC from UI:

  • Go to http://<HOST>:<PORT>/crx/explorer/config/index.jsp
  • Click on Datastore Garbage Collection
  • Change options if required
  • Start Datastore GC

Starting Datastore GC from curl:

curl -u <UID>:<PASSWORD> -X POST http://<HOST>:<PORT>/system/console/jmx/com.adobe.granite:type=Repository/op/runDataStoreGarbageCollection/java.lang.Boolean

To delete unused data and set the delay to 2:

curl -u <UID>:<PASSWORD> -X POST --data "delete=true&delay=2" http://<HOST>:<PORT>/system/console/jmx/com.adobe.granite%3Atype%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean
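
For repeated use, the curl calls above can be wrapped in a small script. This is a sketch under stated assumptions: the function names gc_url, gc_data and run_datastore_gc are hypothetical, and the only request parameters used are delete and delay as shown above. The URL and the POST body are built by separate functions so they can be inspected without contacting a server.

```shell
#!/bin/sh
# Hypothetical wrapper around the curl command shown above.

# Build the JMX operation URL for a given host and port.
gc_url() {
    # %% in the printf format yields the literal % of the URL encoding.
    printf 'http://%s:%s/system/console/jmx/com.adobe.granite%%3Atype%%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean\n' "$1" "$2"
}

# Build the POST body from the delete flag and the delay value.
gc_data() {
    printf 'delete=%s&delay=%s\n' "$1" "$2"
}

# Start datastore GC. Arguments: host port user password delete delay
run_datastore_gc() {
    # -f makes curl return a non-zero exit code on HTTP error responses.
    curl -f -u "$3:$4" -X POST --data "$(gc_data "$5" "$6")" "$(gc_url "$1" "$2")"
}
```

A call would look like run_datastore_gc <HOST> <PORT> <UID> <PASSWORD> true 2, which sends the same request as the second curl command above.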

CRX log message to look for:

*INFO* [127.0.0.1 [1332343886706] POST /system/console/jmx/com.adobe.granite%3Atype%3DRepository/op/runDataStoreGarbageCollection/java.lang.Boolean HTTP/1.1] com.day.crx.sling.server.impl.jmx.GarbageCollection Scanning /libs/wcm/core/content/siteadmin/actions/create/menu/createPage
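
To follow the progress of a run, the log can be filtered for messages of this form. The sketch below assumes the "GarbageCollection Scanning <path>" format of the sample line above; the function names are hypothetical, and the log file path is passed as an argument because its location depends on the installation.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting GC progress in a CRX log file.
# Both assume log lines shaped like the sample message above.

# Count how many "Scanning" messages the GC has written so far.
gc_scan_count() {
    grep -c 'jmx\.GarbageCollection Scanning' "$1"
}

# Print the path from the most recent "Scanning" message.
gc_last_scanned() {
    grep 'jmx\.GarbageCollection Scanning' "$1" \
        | tail -n 1 \
        | sed 's/.*GarbageCollection Scanning //'
}
```

Running gc_last_scanned against the log a few times during a GC run shows which part of the repository is currently being scanned.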

Some Q&A:

Q: Like Tar optimization, does Datastore GC run at a default time out of the box (OOTB)?

A: No

Q: Is there any performance impact from running Datastore GC?

A: Yes. The impact depends on the delay you set: the lower the delay, the greater the performance impact.

Q: How should Datastore GC be run in a cluster?

A: You have to run it on each node in the cluster. If the data store is shared, run it on only one node.