Alfresco high availability infrastructure in real life (Part 1):

November 29, 2009

In our company, as the Alfresco DM service becomes more and more popular every day (+2 GB of documents added each day), we are currently studying the implementation of a high-availability (HA) infrastructure (clustering, failover, etc.).

So I am starting this series of posts to share my comments, questions and recommendations with you, because implementing such an HA infrastructure is a somewhat complex project. I should say that this complexity is not due to the Alfresco software itself, but to the common underlying infrastructure concerns you face when installing any “big scale” architecture, with a huge volume of documents to back up and hundreds or thousands of users.

I think most companies that deployed Alfresco a few years ago started the project infrastructure with a pragmatic approach, i.e. using a more or less “basic” standalone server, as we did. So after a few years in production, I’m sure they are in the same situation as we are: due to the success of the project, you now need to build a more industrialized architecture to better manage DRP, failover, load-balancing, backup, etc.

As I said, we have almost completed the design of our HA infrastructure. I will give you more details about it once the final design has been validated (see next post), as well as why we have made these decisions, but basically here are the main concepts selected so far:

  • Failover (YES): 2 distinct Alfresco instances located in 2 distinct, geographically separated data centers.
  • Clustering (YES/NO): yes, because we will have 2 cluster nodes. But this will be a hot-standby system (only one of the 2 Alfresco instances is exposed to end users). The standby instance is mainly used for backup purposes (and of course failover).
  • Load-balancing (NO): after long discussions we have decided not to address load-balancing in the first release of the project. The main constraint is that load-balancing requires the 2 cluster nodes to be “in sync” in real time, which can be complex to achieve (see next post). Also, managing CIFS together with load-balancing is something we did not have time to study for the first release.

Please note that these choices do NOT describe the perfect infrastructure matching every company’s needs; rather, this is a pragmatic approach which addresses our functional constraints (RTO, RPO), leverages existing infrastructure components (data centers, backup robot), and takes our cost restrictions into account.

At this stage of the project study, I can suggest the following recommendations:

1 – Read the Alfresco wiki and the vendor’s technical documentation to get an overview of what is and is not possible with the latest version of the product.

2 – Work with the Alfresco services team from the very beginning of the study.

3 – Define and validate the SLA with your “functional” team. Common indicators are RTO (Recovery Time Objective: how quickly the service must be restored) and RPO (Recovery Point Objective: how much data you can afford to lose).

4 – Estimate the number of concurrent users you plan to have.

5 – Estimate your document data volume and how it will scale up in the future; this is a key point that will drive your backup strategy (for example, at our rate of 2 GB per day, the repository grows by roughly 700 GB per year).

6 – Carefully study the backup and restore strategy (DRP). Also define whether you want to be able to restore a single file.

7 – Involve your internal infrastructure team (and Alfresco service team) to finalize the design of your infrastructure.

I will detail each of these items in the next blog post.

But our conclusion, at this stage of the HA design study, is clearly that there is no “perfect” architecture: the final infrastructure design is simply the best compromise between all your internal company constraints (functional and technical). Once again, I can say that most of these compromises are not due to limitations of the Alfresco software itself, but rather to the huge volume of documents that we have to manage.

So go to the next blog post if you want to know the next episode of the story…


Coding best practice : Lucene search query : resultSet.close() : part2

November 27, 2009

To continue on the same topic (see Part1 of this post):

I have opened a JIRA issue, because I think we have identified a defect in the Alfresco code itself (see ALFCOM-3683 for more details).

Basically the “symptom” is that our Lucene index seems to grow continuously in production.

Running the command:
find . -type d | wc -l
in:
<alfresco>/alf_data/lucene-indexes/workspace/SpacesStore/

returns more than 21,000 directories!

As per our discussion with the Alfresco services team, we should not have more than 100 directories in “<alfresco>/alf_data/lucene-indexes/workspace/SpacesStore/”.

This is a known issue in v2.1.1, which occurs when Lucene ResultSets are not closed properly.

We have checked the code of alfresco-enterprise-sdk-2.1.1 and it seems that the ResultSet is not closed properly in:
org/alfresco/web/bean/wizard/InviteUsersWizard.java

Also, as per our discussion with Alfresco services:
– in release 2.1.6 and higher, the issue is fixed (i.e. not closing the ResultSet no longer leads to a growing Lucene index).

– if we want to “merge” the current Lucene indexes to reduce the number of files, we must upgrade to release 2.2.3 (E ?). When the server starts, it will automatically merge the Lucene indexes.

Please note that this has to be confirmed by support, so check JIRA issue ALFCOM-3683 status for latest update on this.
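To keep an eye on this symptom in production, the directory count check above can be scripted. Here is a minimal sketch, assuming a default install path and using the threshold of 100 mentioned by Alfresco services (adapt both to your environment):

```shell
#!/bin/sh
# Count the Lucene index segment directories and warn when the count
# suggests unclosed ResultSets. ALF_HOME and THRESHOLD are assumptions
# to adapt to your own installation.
ALF_HOME="${ALF_HOME:-/opt/alfresco}"
INDEX_DIR="$ALF_HOME/alf_data/lucene-indexes/workspace/SpacesStore"
THRESHOLD=100

COUNT=$(find "$INDEX_DIR" -type d | wc -l)
echo "Index segment directories: $COUNT"
if [ "$COUNT" -gt "$THRESHOLD" ]; then
    echo "WARNING: more than $THRESHOLD directories - possible ResultSet leak"
fi
```

Scheduled from cron, a check like this would flag the growth long before reaching 21,000 directories.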


Youtube for documents (Docstoc)

November 25, 2009

Do you know the Docstoc service?

Basically this service can be described as “the YouTube for documents”. Docstoc is a web site where you can publish your documents online and find interesting, helpful and free docs across many categories. The platform targets individual users as well as businesses.

I did some brief testing… and I’m a fan!

There is no relation with Alfresco here, but it is a very good example of DM delivered as SaaS.

It seems that other competitors are Scribd and issuu. I think I will continue to test these services to understand what they propose in terms of Document Management features…

If some of you are already using such services, I would be interested in sharing your experience. Thanks for your comments.


Some resources to follow Cloud news

November 23, 2009

If you want to follow the latest news about Cloud technologies, I recommend reading the following resources:

The Wisdom of Clouds : concepts, principles and evolutions of Cloud technologies…

GIGAOM Cloud computing : about Cloud Market, cloud software vendors, cloud hosting providers…

(Alfresco) Luis Sala Blog : Cloud technologies and Alfresco…

Enjoy !


Coding best practice : Lucene search query : resultSet.close()

November 19, 2009

I just want to share a very important coding best practice if you are using Lucene search queries in Alfresco: do not forget to close the ResultSet object (org.alfresco.service.cmr.search.ResultSet) at the end of your processing.

This is very important to avoid disk space issues (and probably memory leaks as well). Indeed, if the ResultSet object is not closed properly, then it seems that Lucene has to “keep it on disk” as a “piece of Lucene index” (serialization here?), and this block will not be removed by the “index cleaner process”. So the space used by the Lucene index will continue to grow indefinitely…

There are a lot of quotation marks here, but I think you understand that this could be a major issue.

Code sample:

// Declare the ResultSet before the try block so it is visible in finally
ResultSet resultSet = null;
try {
    resultSet = searchService.query(
            Repository.getStoreRef(), SearchService.LANGUAGE_LUCENE, query.toString());
    List<NodeRef> nodes = resultSet.getNodeRefs();
    for (int index = 0; index < nodes.size(); index++) {
        // Do whatever you want here
    }
}
catch (Throwable err) {
    // (…)
}
finally {
    // Always close the ResultSet, even if the query or the processing failed,
    // otherwise the index snapshot stays on disk
    if (resultSet != null) {
        resultSet.close();
    }
}