Understanding Alfresco document life cycle for backup strategy

Before implementing your Alfresco backup strategy, it is highly recommended that you first fully understand the Alfresco document life cycle (i.e. what happens once a file is deleted).

The following diagram summarizes the complete life cycle of a document and the operations that occur during the deletion process:

[Diagram: Alfresco document life cycle and deletion process (image and legend not available)]

At the end of the life cycle, the document binary is still not deleted from the file system; it is moved to the designated ‘deletedContentStore’ (usually ‘./alf_data/contentstore.deleted’). On the storage file system, the binary can then be removed by a script or cron job once an appropriate backup has been performed.
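
A cleanup script of that kind could look roughly like the following. This is a minimal Python sketch, not an official Alfresco tool: the function name `purge_deleted_store`, the use of the file modification time as a proxy for age, and the 30-day threshold in the usage note are illustrative assumptions. It defaults to a dry run so nothing is removed by accident.

```python
import time
from pathlib import Path


def purge_deleted_store(store_dir, min_age_days, dry_run=True, now=None):
    """Remove files under store_dir whose mtime is older than min_age_days.

    Returns the paths that were removed (or would be removed in dry-run mode).
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # age threshold, in seconds since epoch
    removed = []
    for path in sorted(Path(store_dir).rglob("*")):
        # Only regular files are considered; directories are left in place.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

Calling `purge_deleted_store("./alf_data/contentstore.deleted", 30)` only lists the candidates; pass `dry_run=False` (for example from a cron job that runs after the backup) to actually delete them.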

For more information about the modules involved in the document life cycle, you can refer to the following resources:

Alfresco Trashcan Cleaner Module:
The purpose of this module is to automate the cleanup of the trashcan. Without it, the user trashcan is not cleaned up automatically, so a deleted document can stay there forever.
The “retention period” can be configured through the protectedDays parameter (trashcan-cleaner-context.xml).

For more details:
trashcancleaner readme
trashcancleaner code

Content Store Cleaner Module:
Once a document has been removed from the trashcan (when the trashcan has been manually or automatically cleaned up), it becomes an orphan. The purpose of the Content Store Cleaner Module is to move the orphan document to the designated deletedContentStore, usually contentstore.deleted.

The orphan document retention period is configurable through the protectDays parameter.

For more details:
Content Store Cleaner Module
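
To make the two retention periods concrete, here is a rough Python sketch of the earliest date on which a deleted document's binary could reach contentstore.deleted. It assumes the trashcan is emptied by the Trashcan Cleaner (not manually), that the two periods simply add up, and it ignores the schedule of the cleaner jobs themselves, so real dates will be somewhat later. The function name and the 14-day values are illustrative, not Alfresco defaults.

```python
from datetime import date, timedelta


def earliest_move_to_deleted_store(deleted_on, trashcan_protected_days, orphan_protect_days):
    """Earliest date the binary can be moved to the deleted content store."""
    # Step 1: the document leaves the trashcan once protectedDays have elapsed.
    orphaned_on = deleted_on + timedelta(days=trashcan_protected_days)
    # Step 2: the orphan is moved once protectDays have elapsed.
    return orphaned_on + timedelta(days=orphan_protect_days)


print(earliest_move_to_deleted_store(date(2024, 1, 1), 14, 14))  # 2024-01-29
```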

————

In conclusion, you can configure the Alfresco system so that a file binary is deleted only once it has been backed up properly. Otherwise, you can choose never to delete files (and move them to a low-cost storage solution). So you have several options available…
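
One way to enforce "deleted only once it has been backed up" is to remove a binary from the deleted content store only after an identical copy has been found in the backup. The Python sketch below illustrates that check under the assumption that the backup mirrors the store's relative paths; the function names and directory layout are hypothetical and not part of Alfresco.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_if_backed_up(store_dir, backup_dir, dry_run=True):
    """Delete files in store_dir only when backup_dir holds an identical copy.

    Returns (deleted, kept): lists of paths relative to store_dir.
    """
    store, backup = Path(store_dir), Path(backup_dir)
    deleted, kept = [], []
    for path in sorted(store.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(store)
        copy = backup / rel
        # A file is safe to delete only if the backup copy has the same content.
        if copy.is_file() and sha256_of(copy) == sha256_of(path):
            if not dry_run:
                path.unlink()
            deleted.append(rel)
        else:
            kept.append(rel)
    return deleted, kept
```

Files listed in `kept` are missing from the backup or differ from it, so they are left untouched until the next backup run.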

I will detail the backup strategy in another dedicated post (because it is relatively complex), but I hope this one helps!

4 Responses to Understanding Alfresco document life cycle for backup strategy

  1. Abhi says:

    Thanks for the lovely post and the insight into the delete function and its actual impact on the document binary.

    One thing I am not very clear on is “a file binary is deleted only if it has been backed up properly” .. if there is no metadata for the binary file, then why would anyone want to back it up? I am unable to come up with a business scenario for that.

    Or the backup is required in case Alfresco fails to start after deletion?

    Also, what if someone manually deletes the files from the “contentstore.deleted” store, will the indexes fail?

    Thanks,
    -Abhi

    • Enguerrand SPINDLER says:

      One thing I am not very clear on is “a file binary is deleted only if it has been backed up properly” .. if there is no metadata for the binary file, then why would anyone want to back it up? I am unable to come up with a business scenario for that.
      – It is just in case someone asks for the binary after deletion… sometimes people remember it after a long time.

      Or the backup is required in case Alfresco fails to start after deletion?
      – no impact

      Also, what if someone manually deletes the files from the “contentstore.deleted” store, will the indexes fail?
      – no, you can remove them without any impact.

  2. i5513 says:

    So, it is safe to do a:

    rm -rf /opt/alfresco/alf_data/contentstore.deleted/200[8-9] ?

    My Alfresco is not removing this content, although the bean id=”contentStoreCleaner” appears to be configured. How can I see the messages produced by the cron task configured at 0 4 * * *, which is responsible for this?

    Thank you

    PS: I will post this question in the forums too

  3. Enguerrand SPINDLER says:

    Yes, you can delete the content of contentstore.deleted if you need more disk space, and if you are sure nobody will ever ask you to restore or find such old documents…

    Documents moved to contentstore.deleted are not removed by any Alfresco process (so contentStoreCleaner is not involved here). They can stay there forever, unless an admin manually deletes them…
