3 ways to find and restore a (single) “lost” document stored in Alfresco

This week, someone on my team told me he had “lost” a document while working in Alfresco.
He had done some cut and copy operations using CIFS (Windows Explorer) and experienced some unexpected system behaviour… In the end, he could no longer find the document in the Alfresco space…
I looked in the trashcan (Manage Deleted Items) => nothing.
I ran a Lucene search on the full repo => nothing.

So yes, the document was not available from the Alfresco interface. But as it had not been deleted, I was quite sure it could be in an “orphan” state (i.e. its binary still exists on the storage file system, but no metadata is associated with it anymore). A “transaction failure” may have caused this situation…

Fortunately, the user gave me the following information about the doc:
– Filename: team_metodology
– Document type: MS PowerPoint (.ppt)
– Topic: “methodology”
– Last time he saved it in Alfresco: “2009/12/1” at 10 AM
– Space path: “Company Home/Space1/Space2/Space3”

In this case, you have several options. None of them is perfect, but these are the only solutions available with the current 3.2 version.
They are ordered by complexity (from most to least complex):

———-

1/ Restore a full backup of day D-1. This approach requires:
– A full backup of the database and content store for D-1 (or any backup that contains the doc).
– An Alfresco server dedicated to restore operations (meaning it needs as much disk space as your current production server).

Obviously, with a large repository, this is a very “heavy” operation just to restore a single file.

———-

2/ Restore only a database backup of day D-1:
– With this approach, you restore only the DB, not the ContentStore (you might not have enough disk space to restore all documents),
– Basically, the approach is to:
   – Find the filesystem path of the doc in the DB (like “alfresco/alf_data/contentstore/2009/12/1/hh/mm/”),
   – Try to find the doc in the ContentStore backup based on that path,
– I assume here you have a database server dedicated to restore operations (with enough disk space).
– Of course, you will not be able to use the Alfresco application here => you will have to start only the DB and run some SQL queries to find the corresponding path of the document.
– I have never tested this approach, so I cannot give you the exact SQL queries. Basically, you would first use the filename (team_metodology.ppt) to find the corresponding entry in the DB, then retrieve its path on the filesystem (like “2009/12/1/hh/mm/”).

Once again, this has never been tested, so it is a purely theoretical approach…
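If your SQL queries do return the content URL of the document, turning it into a path inside the ContentStore backup is mechanical. The function below is a minimal, untested sketch, assuming the URL has the `store://yyyy/M/d/H/m/<uuid>.bin` form (consistent with the “2009/12/1/hh/mm/” layout above), possibly wrapped in a longer property string such as `contentUrl=store://<...>|mimetype=<...>`. The function name `content_url_to_path` and the backup location are hypothetical.

```shell
# Hypothetical helper: map an Alfresco content URL recovered from the database
# backup to the matching .bin file inside a restored ContentStore.
# ASSUMPTION: URLs look like store://yyyy/M/d/H/m/<uuid>.bin, possibly wrapped
# in a property string such as contentUrl=store://<...>|mimetype=<...>
content_url_to_path() {
  backup_root=$1   # e.g. /restore/alf_data/contentstore
  url=$2
  url=${url#*contentUrl=}   # drop a leading "contentUrl=" wrapper, if any
  url=${url%%"|"*}          # drop trailing "|mimetype=..." fields, if any
  case $url in
    store://*)
      # store://2009/12/1/10/5/x.bin -> <backup_root>/2009/12/1/10/5/x.bin
      printf '%s/%s\n' "${backup_root%/}" "${url#store://}"
      ;;
    *)
      echo "unsupported content URL: $url" >&2
      return 1
      ;;
  esac
}
```

For example, `content_url_to_path /restore/alf_data/contentstore "store://2009/12/1/10/10/8b555d79-ddda-11de-a614-ad54d765801e.bin"` prints `/restore/alf_data/contentstore/2009/12/1/10/10/8b555d79-ddda-11de-a614-ad54d765801e.bin`, which you can then look for in the ContentStore backup.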

———-

3/ Search for the doc binary in the Alfresco storage (filesystem):
– In our case, we are using a Linux filesystem as Alfresco storage, so the .bin object we are looking for is somewhere under the “alfresco/alf_data/contentstore/” directory.
– We know the user last saved the doc in Alfresco on “2009/12/1” at 10 AM.
– We also know the doc is about “methodology” (so I assume the string “methodology” can be found in the doc content),
– Note that Alfresco does not keep the document filename and extension on the filesystem (e.g. “team_metodology.ppt” becomes something like “f8001f06-ddda-11de-a614-ad54d765801e.bin”).
– So the approach is to:
   – go to the directory “alfresco/alf_data/contentstore/2009/12/1/10” (year/month/day/hour),
   – run a grep on “methodology” (grep -ir "methodology" *); you might get several results:
 
Binary file 3/cc1eaf57-dddc-11de-ba7b-ad54d765801e.bin matches
Binary file 19/f18bad09-dddc-11de-ba7b-ad54d765801e.bin matches
Binary file 10/8b555d79-ddda-11de-a614-ad54d765801e.bin matches
Binary file 11/844c0916-dddd-11de-83bf-ad54d765801e.bin matches
Binary file 5/f8001f06-ddda-11de-a614-ad54d765801e.bin matches

   – Based on the last modification time, these 2 files match:

   Binary file 10/8b555d79-ddda-11de-a614-ad54d765801e.bin matches
   Binary file 11/844c0916-dddd-11de-83bf-ad54d765801e.bin matches

   – Download the files to your computer, then change their extension to .ppt:

   8b555d79-ddda-11de-a614-ad54d765801e.ppt
   844c0916-dddd-11de-83bf-ad54d765801e.ppt

   – Try to open these files (through Windows Explorer, not through the WinSCP client, as this might not work).

   – One of them should be the doc you are looking for!
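The manual steps above can be scripted. The function below is a sketch, not the exact commands I ran: it greps the hour directory for the keyword, keeps only matches that start with the OLE2 signature shared by legacy Office files (.ppt, .doc, .xls), which filters out unrelated text content, prints each candidate with `ls -l` so you can compare modification times, and copies it with a .ppt extension into a local directory. It assumes GNU grep (for `--include`); `find_candidates` and its arguments are hypothetical names.

```shell
# Sketch of approach 3: find .bin files under one hour directory of the
# ContentStore that contain a keyword and look like legacy Office files,
# then copy them with a .ppt extension so they can be opened locally.
# ASSUMPTIONS: GNU grep (for --include) and the yyyy/M/d/H/m store layout.
find_candidates() {
  search_dir=$1   # e.g. alfresco/alf_data/contentstore/2009/12/1/10
  keyword=$2      # e.g. methodology
  out_dir=$3      # local directory that receives the renamed copies
  mkdir -p "$out_dir" || return 1
  # -r: recurse into minute directories, -i: ignore case, -l: names only
  grep -ril --include='*.bin' -- "$keyword" "$search_dir" |
  while IFS= read -r f; do
    # Legacy Office files (.ppt/.doc/.xls) start with the OLE2 signature
    # d0 cf 11 e0 a1 b1 1a e1; skip anything else (e.g. plain text files).
    magic=$(head -c 8 "$f" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "d0cf11e0a1b11ae1" ] || continue
    ls -l "$f"   # show size and last modification time for comparison
    cp "$f" "$out_dir/$(basename "$f" .bin).ppt"
  done
}
```

For example, `find_candidates alfresco/alf_data/contentstore/2009/12/1/10 methodology /tmp/restore_candidates`, then copy `/tmp/restore_candidates` to your workstation. The OLE2 check cannot tell a .ppt from a .doc or .xls, so you still need to open each candidate.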

Approach 3 has been tested, so I can “guarantee” it works.
But obviously, this is a “workaround” solution… Also, you really need to know when the document was last updated; otherwise your “grep” search might take far too long.
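If you only know the save time approximately, you can still keep the grep small by limiting it to a few hour directories around that time. A minimal sketch, assuming GNU `date` (for `-d`, `@epoch` input and the unpadded `%-m`, `%-d`, `%-H` fields) and the unpadded year/month/day/hour layout seen above; `hour_dirs` is a hypothetical helper:

```shell
# Hypothetical helper: print the ContentStore hour directories covering
# [when - span hours, when + span hours], e.g. to narrow a grep.
# ASSUMPTIONS: GNU date (-d, @epoch, %-m style unpadded fields) and the
# unpadded yyyy/M/d/H directory layout seen in the example above.
hour_dirs() {
  root=$1   # e.g. alfresco/alf_data/contentstore
  when=$2   # e.g. "2009-12-01 10:00" (server local time)
  span=$3   # number of hours to include on each side of "when"
  base=$(date -d "$when" +%s) || return 1
  i=$((-span))
  while [ "$i" -le "$span" ]; do
    # Epoch arithmetic lets date handle day/month/year boundaries for us
    date -d "@$((base + i * 3600))" +"$root/%Y/%-m/%-d/%-H"
    i=$((i + 1))
  done
}
```

For example, `grep -ril "methodology" $(hour_dirs alfresco/alf_data/contentstore "2009-12-01 10:00" 1) 2>/dev/null` only searches the 9, 10 and 11 o'clock directories of that day. Going through epoch seconds, rather than editing the path by hand, keeps windows that cross midnight correct.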

———-

Last week I talked with someone from Alfresco services, and he agreed that being able to quickly find and restore an “orphan” document would be a very useful enhancement.

Our employees are used to working with “Windows Server” facilities, and Microsoft does provide such a feature to easily find and restore a single file from a backup.

He told me a basic custom development could help here: each time a file is created or saved, Alfresco could write an entry to a log file mapping the doc path (like “Company Home/Space1/Space2/Space3/team_metodology.ppt”) to the corresponding path on the filesystem (like “alfresco/alf_data/contentstore/2009/12/1/hh/mm/”). This would greatly simplify searching for the document in the ContentStore backup (or in the live ContentStore itself).
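With such a log, finding the binary becomes a simple lookup. The log format here is entirely hypothetical (one tab-separated line per save, in chronological order: timestamp, repository path, content URL), as is the function name `latest_content_url`; the sketch returns the most recent content URL recorded for a given path:

```shell
# Hypothetical lookup in the proposed mapping log. ASSUMED format: one
# tab-separated line per save: <timestamp> <repository path> <content URL>,
# appended in chronological order.
# Prints the most recent URL logged for the given path; fails if none.
latest_content_url() {
  log=$1        # path to the (hypothetical) mapping log file
  doc_path=$2   # e.g. Company Home/Space1/Space2/Space3/team_metodology.ppt
  awk -F '\t' -v p="$doc_path" '
    $2 == p { url = $3 }   # later lines overwrite earlier ones
    END { if (url == "") exit 1; print url }
  ' "$log"
}
```

The returned URL then points to the .bin file to pick up, either in the live ContentStore or in its backup.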

Of course, a more industrialized solution provided by the vendor would be even better…
I think I will open a JIRA for this enhancement…
