Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 2)

May 27, 2010

Previous post : Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 1)

Here are some other tuning recommendations that might help you reduce the time of the Alfresco upgrade path from 2.x to 3.2:

Please note that we have only tested some of these tuning options, and that some of them, or a combination of them, could have a negative impact on the duration of the upgrade… So I recommend that you first test these options carefully, one by one, and that you select only those which are relevant in your case:

1/ If you experience OOM:

– Simply give more memory to the heap (Xmx), if possible,
– Use the -XX:+HeapDumpOnOutOfMemoryError option so that the cause of the OOM can be analyzed afterwards with YourKit or other tools that understand hprof dumps (this causes no overhead, as the heap dump file is created only if an OOM actually occurs).
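As a sketch, the corresponding startup options could look like this (the dump path below is an assumption; point it at a filesystem with enough free space):

```shell
# Larger heap, plus a post-mortem heap dump if an OOM ever occurs.
# The .hprof file is only written when an OutOfMemoryError is thrown,
# so the flag itself adds no runtime overhead.
JAVA_OPTS="-Xms4G -Xmx4G"
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# Optional: control where the dump is written (example path)
JAVA_OPTS="$JAVA_OPTS -XX:HeapDumpPath=/var/tmp/alfresco-upgrade.hprof"
export JAVA_OPTS
echo "$JAVA_OPTS"
```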


2/ Increase transactional caches:

During the upgrade, you might see log traces such as:

18:14:01,264 WARN [org.alfresco.repo.cache.TransactionalCache.org.alfresco.storeAndNodeIdTransactionalCache] Transactional update cache 'org.alfresco.storeAndNodeIdTransactionalCache' is full (10000).

In our case, the support recommendation was to increase the cache sizes as follows:

In cache-context.xml, increase maxCacheSize for the following:

– authorityTransactionalCache: from 100 to 5000
– org.alfresco.personTransactionalCache: from 1000 to 10000
– org.alfresco.storeAndNodeIdTransactionalCache: from 10000 to 50000

But do not set the ehcache caches too high, or you will accumulate uncollectable cached objects in the JVM old generation, leading to more and more full GCs and leaving less room for everything else. Even if there is enough memory, you will spend a large amount of time processing weak references. It is a memory / performance tradeoff (see also the next recommendation below).


3/ Increasing transactional caches can have a negative impact on JVM memory:

Increasing the transactional caches can help in some cases, but only on the condition that it does not fill up your old generation, which you cannot know unless you have proper JVM monitoring and/or heap dumps…

However, in some other cases it might make sense, instead of increasing the caches, to disable Hibernate's second-level (L2) cache for the duration of the upgrade…
This can be done by setting hibernate.cache.use_second_level_cache=false in alfresco-global.properties.

So clearly, this recommendation is not consistent with the previous one… But this is not surprising, since there is no "one true set" of settings written in stone. So test this option carefully, and use it only if you can see an improvement in your context.

4/ Another option to try is to make the Hibernate session size resource interceptors more aggressive:

You can try to make the Hibernate session size resource interceptors more aggressive (see the sessionSizeResourceInterceptor and sessionSizeResourceManager beans): as a result, the caches will be flushed more frequently (more frequent commits) and the old generation will fill up more slowly (assuming that the stateful persistence contexts associated with the threads are an issue in your context):

By default, you should have:

<bean id="sessionSizeResourceInterceptor">
    <property name="methodResourceManagers">
        <ref bean="sessionSizeResourceManager" />
    </property>
    <property name="elapsedTimeBeforeActivationMillis">
        <value>…</value>
    </property>
    <property name="resourceManagerCallFrequencyMillis">
        <value>…</value>
    </property>
</bean>

<bean id="sessionSizeResourceManager">
    <property name="sessionFactory">
        <ref bean="sessionFactory" />
    </property>
    <property name="writeThreshold">
        <value>…</value>
    </property>
    <property name="readThreshold">
        <value>…</value>
    </property>
    <property name="retentionFactor">
        <value>…</value>
    </property>
</bean>

You can try to set values as follows:

<bean id="sessionSizeResourceInterceptor">
    <property name="methodResourceManagers">
        <ref bean="sessionSizeResourceManager" />
    </property>
    <property name="elapsedTimeBeforeActivationMillis">
        <value>…</value>
    </property>
    <property name="resourceManagerCallFrequencyMillis">
        <value>…</value>
    </property>
</bean>

<bean id="sessionSizeResourceManager">
    <property name="sessionFactory">
        <ref bean="sessionFactory" />
    </property>
    <property name="writeThreshold">
        <value>…</value>
    </property>
    <property name="readThreshold">
        <value>…</value>
    </property>
    <property name="retentionFactor">
        <value>…</value>
    </property>
</bean>

This configuration will allow the JVM to free more space in the old generation at each collection.

Don’t forget to use this setting only for the upgrade, not for production / runtime.

It may or may not help, so once again, try it and use it only if it leads to upgrade time improvements…

5/ Tuning the Linux swap threshold:

During the test of the upgrade path you should also check that the system is not swapping.

On Linux, this can be controlled with the vm.swappiness kernel parameter, which can safely be set to 10 (meaning the kernel will strongly prefer keeping application pages in RAM and will only swap under real memory pressure).
By default on most Linux distributions, this parameter is set to 60, which makes the kernel relatively eager to swap application memory out in favour of the page cache.
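A quick way to check and change it (standard sysctl usage; lowering the value requires root):

```shell
# Read the current swap tendency (0-100; higher = kernel swaps more eagerly)
cat /proc/sys/vm/swappiness
# Lower it for the duration of the upgrade (root required, immediate effect):
#   sysctl -w vm.swappiness=10
# To persist the change across reboots, add this line to /etc/sysctl.conf:
#   vm.swappiness = 10
```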

Note: we have not tested this tuning option during our upgrade path test.


Hope this series of posts will help you to complete a successful (and quicker) data migration!

Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 1)

May 27, 2010

I would like to share with you some information about the upgrade from 2.1.1 to 3.2 (Enterprise version) that we are currently executing, and especially the time you should plan for the migration if you have a big repository like we do…

1/ Description of the context:

So we have executed the upgrade path from 2.1.1 to 2.1.7 and from 2.1.7 to 3.2 (please note that it is not mandatory to go through 3.1.1, as validated by support).

Upgrade path from 2.1.1 to 2.1.7 is quite quick (as there is no schema upgrade here).

However, the upgrade from 2.1.7 to 3.2 takes approx. 40 hours in our case…!
So the first thing you have to take care of is planning a downtime period (or at least a read-only period) for your source v2.x production system… The only way to assess the duration is to run a "real" migration with a copy of your production data.

For your information, our hardware settings and data volumes are described at the end of this post.
Basically, we have about 500 GB of documents in our repository, but this is actually not the most important indicator.

What you especially have to measure is the number of records you have in the ALF_NODE table (and maybe also in the ALF_TRANSACTION table). In our case:
SELECT COUNT(*) from ALF_NODE; => 962984

Indeed, during the upgrade, the documents themselves are not touched at all, but there is a lot of processing on the database to upgrade the schema and update the metadata. We have clearly seen that most of the time/CPU is spent at the database level (Oracle in our case).

In particular, we could see that most of the time is spent on these two operations:


b/ Applying patch 'patch.updateDmPermissions' (Update ACLs on all DM node objects to the new 3.0 permission model).

2/ Some tuning and recommendations:

2.1/ The first piece of advice (mandatory!) is to do a "cold backup" of your source production database. That means that when you do the database export to get a copy of your data, you have to stop the Alfresco application. Otherwise, you will probably run into several unique constraint violation errors…


2.2/ If you experience some SQL errors during the data migration, it might be "normal":

For instance, in our case we had some errors like this:
17:09:48,034 ERROR [org.alfresco.repo.domain.schema.SchemaBootstrap] Statement execution failed:
SQL: INSERT INTO t_alf_node
        (id, version, store_id, uuid, transaction_id, node_deleted, type_qname_id, acl_id,
         audit_creator, audit_created, audit_modifier, audit_modified)
     SELECT
        n.id, 1, s.id, n.uuid, nstat.transaction_id, 0, q.qname_id, n.acl_id,
        null, null, null, null
     FROM
        alf_node n
        JOIN t_qnames q ON (q.qname = n.type_qname)
        JOIN alf_node_status nstat ON (nstat.node_id = n.id)
        JOIN t_alf_store s ON (s.protocol = nstat.protocol AND s.identifier = nstat.identifier)

This might not necessarily be a big problem. Some of these errors are well known by the support team and are due to bugs in the v2.x Alfresco code. If you have such a problem, contact support directly, and they will probably be able to provide an SQL script to clean up the source v2.x data. Of course, you will then have to restart the upgrade path from scratch and re-import the initial v2.x database copy (the cleanup script should be applied to the data copy, not directly in production :-).


2.3/ Database tuning:

The best way to speed up this processing (if you want to avoid too long a downtime during the migration) is to do some database tuning.

We asked support what the best practices for Oracle are, and here is the feedback:

"From previous occurrences we have analyzed the Oracle performance using certain statements.

We would recommend that you recalculate the schema stats on the database, which may well speed things up a little:

dbms_stats.gather_schema_stats(ownname => user, options => 'GATHER AUTO', estimate_percent => dbms_stats.auto_sample_size);"

We tested this on Oracle 10g (running dbms_stats before the upgrade), but it had no significant effect…

If you are using MySQL, then I think there are a lot of tuning recommendations that support could share with you, depending on your configuration… Sorry, I have no more details for you on that, but support will probably be able to help.


2.4/ JVM tuning
Another tuning we applied was to increase -Xmx to 4G (we have a 64-bit server). This clearly helps for the migration; at least it prevented us from hitting OOM during the migration.


2.5/ If you experience an OutOfMemory error during the data migration, then in some cases you can simply increase the heap size and relaunch the upgrade. This is possible only if all the schema update database scripts have been successfully applied and the failure occurred during the "patches" execution.

So basically, if you see such lines in your log files…

10:37:06,193 INFO [org.alfresco.repo.domain.schema.SchemaBootstrap] Executing database script /home/alfresco/alfresco211/tomcat/temp/Alfresco/AlfrescoSchemaUpdate-org.hibernate.dialect.Oracle9Dialect-52799.sql (Copied from classpath:alfresco/dbscripts/create/1.4/org.hibernate.dialect.Oracle9Dialect/post-create-indexes-02.sql).
10:37:06,226 INFO [org.alfresco.repo.domain.schema.SchemaBootstrap] All executed statements written to file /home/alfresco/alfresco211/tomcat/temp/Alfresco/AlfrescoSchemaUpdate-All_Statements-52800.sql.
10:37:47,789 INFO [org.alfresco.repo.admin.patch.PatchExecuter] Checking for patches to apply …
18:13:41,080 INFO [org.alfresco.repo.admin.patch.PatchExecuter] Applying patch 'patch.updateDmPermissions' (Update ACLs on all DM node objects to the new 3.0 permission model).
18:14:01,264 WARN [org.alfresco.repo.cache.TransactionalCache.org.alfresco.storeAndNodeIdTransactionalCache] Transactional update cache 'org.alfresco.storeAndNodeIdTransactionalCache' is full (10000).
18:52:15,387 ERROR [org.alfresco.repo.admin.patch.PatchExecuter] (…)

…that means you can restart the server, and the PatchExecuter will probably be able to resume the upgrade from where it left off.


2.6/ See next post : Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 2)

3/ Our hardware settings and volume of data:

FYI, our stack is:
– Linux server with 4 CPUs and 8 GB of RAM (64-bit)
– Red Hat Enterprise Linux Server release 5.x
– Oracle 10g (installed on the same server as Alfresco application)
– Tomcat 6.0.18
– JDK 6 u16 x64
– JVM settings:
export JAVA_OPTS="-Xms4G -Xmx4G -XX:NewRatio=2 -XX:PermSize=160m -XX:MaxPermSize=160m -server"
export JAVA_OPTS="${JAVA_OPTS} -Dalfresco.home=${ALF_HOME} -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+HeapDumpOnOutOfMemoryError"

Our volume of data:

Our contentstore contains ~1,400,000 items (files and directories) and ~464 GB of documents.

SELECT MAX(ID) from ALF_NODE; => 103353578
SELECT COUNT(*) from ALF_NODE; => 962984


Alfresco is making a strong investment in workflow and BPM: the Apache-licensed Activiti open source project

May 17, 2010


Interesting news from Alfresco today: the company has launched the "Activiti" BPM (Business Process Management) project and hired two gurus from Red Hat's jBPM team (Tom Baeyens, founder and architect of the JBoss jBPM project, and fellow architect Joram Barrez) to create it.

Activiti will be an independently-run and branded open source project (Apache-licensed), and will work independently of the Alfresco open source ECM system.

This Activiti BPM suite is very good news for Alfresco (customers were waiting for strong workflow capabilities!) but for all its competitors as well (it's open source)!

A lot of blog posts have been published on this announcement today. I will try to summarize some of them hereafter, and look at the impact and perspectives from the Alfresco point of view:


What is the purpose of this new project, and what are Alfresco's motivations?

Activiti emerged from Alfresco's desire to have an Apache-licensed BPM engine. Although the previous jBPM engine (embedded in Alfresco) was working well, it seems that its LGPL license was preventing Alfresco from being OEMed into larger software companies' products…

So basically, Alfresco’s motivation is to have a more liberally-licensed default process engine, although they will continue to support jBPM.

The Activiti project is led by Alfresco and includes SpringSource, Signavio and Camunda. But it is an independently-run and branded open source project and it will work independently of the Alfresco open source ECM system.

Activiti will be liberally licensed under the Apache License 2.0 to encourage widespread usage and adoption of the Activiti BPM engine and of BPMN 2.0, which is being finalized as a standard by the OMG.

Design and Packaging of Activiti:

Activiti will be built from the ground up to be a lightweight, embeddable BPM engine, but it is also designed to operate in scalable cloud environments.

The Activiti Engine, the underlying process execution engine, is packaged as a small JAR file that can be embedded within other applications, as is done in Alfresco for content management workflows. It can also be easily deployed in the cloud, allowing for cross-enterprise processes.

In doing so, Activiti addresses the requirements of BPM for new applications (not only Alfresco). Applications that would never have considered a large-scale, standalone workflow server because of cost and complexity will now be able to freely embed a business process engine.

Integration with Alfresco ?

Activiti will become Alfresco's default business process engine (the current jBPM engine will still be supported, but it is clearly not the target anymore).

Alfresco will build a business around Activiti only for content-centric applications by tightly integrating it with their ECM, leaving other applications of BPM to other companies.

It will be very interesting to see the extent of the content-process integration in Alfresco, and if it includes triggering of process events based on document state changes as well as links from processes into the content repository.

Alfresco roadmap and support:

Alfresco will now make major investments in Activiti (and the BPMN 2.0 standard) but will continue to support jBPM (as well as the other business process engines currently integrated with its ECM software).

Alfresco will also offer support, maintenance and indemnity for the Activiti suite alongside the Alfresco Enterprise Edition.

So support will be provided for Activiti when it is used in conjunction with the Alfresco engine.


Current status of the project:

The Alpha 1 release was announced today, with GA planned for November (or the end of the year).

(Alpha is available now at the Activiti web site, www.activiti.org).

Alfresco is looking to incorporate Activiti into its suite in a release at the end of 2010; the target release is 3.4 E (project "swift").
Some early features will be included in 3.3 E (through the Share interface) in May 2010.


What are the features provided by Activiti ?

Activiti is not only a BPM engine: it is a complete suite which includes a modeler, a process engine, an end-user application for participating in processes, and an administration console.

The first alpha release includes:
– Activiti Engine: A simple JAR file containing the Process Virtual Machine and BPMN process language implementation;
– Activiti Probe: A system administration console to control and operate the Activiti Engine;
– Activiti Explorer: A simple end-user application for managing task lists and executing process tasks;
– Activiti Modeler: A browser-based and Ajax-enabled BPMN 2.0 process modeling tool designed for business analysts.

For more information:
– Some screenshots
– Alfresco announcement
– John Newton's blog
– The 451 Group analysis
– Open Source BPM with Alfresco's Activiti (blog)

How to find (and restore) a single document from your Alfresco backup ?

May 14, 2010

If you want to find/restore a copy of a single document from a previous “alf_data/contentstore” backup, you can use the following steps:

First, I assume you know the filename of the document you are looking for…
In the example below, it is: 'Portal services offering.ppt'

Also, I assume you will be able to restore the backup copy of the database on a test database server. (With the proposed approach, there is no need to restore the contentstore on a real server filesystem; I also assume that your document backup utility gives you the ability to search for a file inside the backup archive without restoring the full contentstore to a filesystem…)

Run the following query on your database copy (restored from the backup):
select * from ALF_NODE_PROPERTIES where STRING_VALUE='Portal services offering.ppt' and QNAME='{http://www.alfresco.org/model/content/1.0}name';

If you get one result (or more), you can then use the NODE_ID value(s) to identify the specific version you are looking for
(if you get several values, that probably means there are several versions or copies of the same document).

select * from ALF_NODE_PROPERTIES where NODE_ID=14066558;

In the search results, you can check the version number of the document (if any) by looking at the value of the related attribute (see the QNAME column).

For each NODE_ID in your list, the QNAME='{http://www.alfresco.org/model/content/1.0}content' property should contain a value like:

contentUrl=store://2009/10/28/16/9/e594bfcc-c3d3-11de-a2cf-79b22a9754d5.bin|mimetype=…|size=…

select * from ALF_NODE_PROPERTIES where NODE_ID=14066558;

select * from ALF_NODE_PROPERTIES where NODE_ID=93913845;

The path '2009/10/28/16/9/' in the content URL corresponds to the directory of the same name under the alf_data contentstore (document storage).

So you should be able to use this info to find the corresponding directory in your contentstore backup archive.

And the file "e594bfcc-c3d3-11de-a2cf-79b22a9754d5.bin" corresponds to a copy of your document 'Portal services offering.ppt', which was renamed automatically by Alfresco during the save operation.
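Putting it together, here is a minimal sketch of how the content URL maps to a file inside the restored backup (the /backup mount point is an assumption; adjust it to wherever your contentstore backup lives):

```shell
# Map an Alfresco content URL to the matching file in a contentstore backup.
CONTENTSTORE="/backup/alf_data/contentstore"   # assumed restore location
CONTENT_URL="store://2009/10/28/16/9/e594bfcc-c3d3-11de-a2cf-79b22a9754d5.bin"
# Strip the "store://" scheme to get the path relative to the contentstore root
REL_PATH="${CONTENT_URL#store://}"
echo "$CONTENTSTORE/$REL_PATH"
# -> /backup/alf_data/contentstore/2009/10/28/16/9/e594bfcc-c3d3-11de-a2cf-79b22a9754d5.bin
```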

Hope this will help,

related post: 3 ways to find and restore a (single) “lost” document stored in Alfresco

Alfresco WCM roadmap 2010

May 9, 2010

As you may know, Alfresco is planning major investments in WCM this year.

I got some news about that at the French meetup last week.

Technically, there is a big merge under way between the Document Management (DM) core and the WCM core (AVM). Initially, these two components did not share the same API and runtime… so the target is to merge them into a more consistent (single repository) and richer product.

As far as I understand, the merge should be 100% complete by Q2 2011… finally, they will be shipped in a single package.

I asked John Newton what the target pricing of this "new" offer could be… John's answer is that the pricing model is still not clearly defined. However, it is likely that if customers use both products together in a kind of publisher/subscriber pair (WCM used for content authoring, with HTML content published to DM), then the price of WCM should be only a fraction of the standalone WCM price…
Alfresco will also make major investments in the "Transfer service" (for AVM-to-DM deployment or DM-to-DM replication).
Technically, you can consider this service as a kind of "syndication" (as in IBM WCM), or something like IBM Notes replication.

Now the roadmap is for the WCM tool to become more of a framework allowing developers to build rich "content applications", rather than Drupal-like software (with out-of-the-box basic web site capabilities). In the past, Alfresco staff knew they might have been too ambitious about WCM, or maybe not well oriented. Now the focus is to provide a very open "content-centric" platform with strong persistence capabilities and powerful authoring, rendition and data transfer services.

The roadmap is clear, so let's start building smart data-centric applications!