Alfresco permission management: how to know which spaces a Group has access to?

July 30, 2010

In Alfresco, you can create Groups (i.e. groups of users) and use them to manage permissions on spaces. Assigning permissions through Groups rather than individual Users is obviously a best practice.

If you have an LDAP directory, you can also easily synchronize your Groups and Users from LDAP to Alfresco, using the standard synchronization job provided by Alfresco.

But once the Group has been added to the ACLs of several spaces (as Coordinator, Contributor, etc.), there is currently no way to list those spaces. So what if you want to check which spaces/documents the Group has access to, and which rights have been granted to it? That is the purpose of the SQL query provided hereafter.

Actually we had this need because we had to rename a Group in Alfresco to upper case (due to an upper/lower-case conflict between the Group name in our LDAP and in Alfresco).
Basically, our LDAP is case-insensitive (meaning we cannot create 2 Groups with the same name but distinct case, like "My_Group" and "MY_GROUP"), whereas Alfresco is case-sensitive (so you can create 2 Groups with the same name but distinct case, like "My_Group" and "MY_GROUP").

Also, what you should know is that there is no way to rename a Group in Alfresco programmatically (validated with the support).

This query returns the list of spaces on which the Group "My_Group" has been granted access explicitly (not through inheritance), along with the level of access granted (Coordinator, Contributor, etc.):

SELECT ap.node_id, ap.string_value, acl.inherits, aperm.name
FROM alf_node an, alf_node_properties ap, alf_acl_member acl_m, alf_access_control_entry ace, alf_authority alfauth, alf_access_control_list acl, alf_permission aperm
WHERE an.id = ap.node_id AND ap.qname_id = 90
AND an.acl_id = acl_m.acl_id AND alfauth.authority LIKE '%My_Group'
AND acl_m.ace_id = ace.id AND alfauth.id = ace.authority_id
AND acl.id = acl_m.acl_id AND aperm.id = ace.permission_id
AND acl_m.pos = 0;

Some explanation about the columns returned:
– ap.node_id (space id)
– ap.string_value (space name)
– acl.inherits ("Inherit Parent Permission" check box: 1 = true, 0 = false)
– aperm.name (permission role: Coordinator, Collaborator, Consumer, Editor, etc.)

Then, with the node id retrieved by the above query, you can get the exact path of the space with this other (Oracle hierarchical) query:

SELECT parent_node_id, child_node_id, child_node_name
FROM alf_child_assoc
START WITH child_node_id = 66469
CONNECT BY PRIOR parent_node_id = child_node_id
ORDER BY parent_node_id;

If you also want to get the corresponding inherited permissions (i.e. sub-spaces and documents which inherit the parent space ACL), just remove the filtering condition "AND acl_m.pos=0;".

Note 1: these queries have been tested with version 3.2 Enterprise.

Note 2: if the 3.2 Alfresco instance has been installed from scratch (empty), you can use the value "ap.qname_id = 90" directly.
Otherwise, if the instance has been upgraded (e.g. from v2.x to 3.2), you might have to look up the proper qname_id value in the database yourself.
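A lookup along these lines should do the trick (a sketch only, untested; it assumes the standard alf_qname / alf_namespace tables of the 3.x schema):

```sql
-- Sketch: look up the qname_id of the cm:name property
SELECT q.id
FROM alf_qname q
JOIN alf_namespace ns ON ns.id = q.ns_id
WHERE q.local_name = 'name'
AND ns.uri = 'http://www.alfresco.org/model/content/1.0';
```

Use the returned id in place of the hard-coded "ap.qname_id = 90".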

Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 2)

May 27, 2010

Previous post : Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 1)

Here are some other tuning recommendations that might help you to reduce the time of the Alfresco upgrade path, from 2.x to 3.2:

Please note that we have tested only some of these "tunings", and that some of them, or a combination of them, could actually have a negative impact on upgrade time. So I recommend that you test these options carefully, one by one, and select only those which are relevant in your case:

1/ If you experience OOM:

– Simply give more memory to the heap (-Xmx), if possible,
– Use the -XX:+HeapDumpOnOutOfMemoryError option so that the cause of the OOM can be analyzed afterwards with YourKit or any other tool that understands hprof dumps (this causes no overhead, as the heap dump file is created only if an OOM occurs).
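For instance, the two points above can translate into JAVA_OPTS like these (the heap size and dump path are just examples to adapt to your server):

```shell
# Bigger heap + automatic heap dump on OOM (the dump is only written if an OOM occurs)
export JAVA_OPTS="-Xms4G -Xmx4G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp"
echo "$JAVA_OPTS"
```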


2/ Increase transactional caches:

During the upgrade path, you might see log traces like this:

18:14:01,264 WARN [] Transactional update cache 'org.alfresco.storeAndNodeIdTransactionalCache' is full (10000).

In our case, the support recommendation was to increase Cache Size as follows:

In cache-context.xml, increase maxCacheSize for the following caches:

– authorityTransactionalCache: from 100 to 5000
– org.alfresco.personTransactionalCache: from 1000 to 10000
– org.alfresco.storeAndNodeIdTransactionalCache: from 10000 to 50000
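One of these cache definitions looks roughly like this in cache-context.xml (a sketch; the bean class name is quoted from memory and may differ in your version, and the other properties of the bean are omitted):

```xml
<!-- Sketch of a transactional cache bean; only maxCacheSize needs to change -->
<bean name="org.alfresco.storeAndNodeIdTransactionalCache"
      class="org.alfresco.repo.cache.TransactionalCache">
    <!-- …other properties of the bean… -->
    <property name="maxCacheSize">
        <value>50000</value> <!-- was 10000 -->
    </property>
</bean>
```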

But do not set the ehcache caches too high, or you'll accumulate uncollectable cached objects in the JVM old generation, leading to more and more full GCs and leaving less room for everything else. Even if there is enough memory, you'll spend a large amount of time processing weak references. It's a memory/performance trade-off (see also the next recommendation below).


3/ Increasing transactional caches can have a negative impact on JVM memory:

Increasing the transactional caches can help in some cases, but only on the condition that it does not fill up your old generation, which you cannot guess without proper JVM monitoring and/or heap dumps…

However, in some other cases it might make sense, instead of increasing the caches, to disable Hibernate's L2 cache for the upgrade-path scenario.
This can be done by setting hibernate.cache.use_second_level_cache=false in the Hibernate configuration.

So clearly, this recommendation is not consistent with the previous one… But this is not surprising, since there is no "one true set" of settings written in stone. So test this option carefully, and use it only if you can see an improvement in your context.

4/ Another option to try is to make the Hibernate session size resource interceptors more aggressive:

You can try to make the Hibernate session size resource interceptors more aggressive (see the beans sessionSizeResourceInterceptor and sessionSizeResourceManager): as a result, the session caches will be flushed more frequently (more frequent commits) and the old generation will fill up more slowly (assuming that the stateful persistence contexts associated with the threads are an issue in your context):

By default, you should have two beans similar to these (the "…" stand for the values of your own installation):

<bean id="sessionSizeResourceInterceptor" class="…">
    <property name="methodResourceManagers">
        <ref bean="sessionSizeResourceManager" />
    </property>
    <property name="elapsedTimeBeforeActivationMillis">
        <value>…</value>
    </property>
    <property name="resourceManagerCallFrequencyMillis">
        <value>…</value>
    </property>
</bean>

<bean id="sessionSizeResourceManager" class="…">
    <property name="sessionFactory">
        <ref bean="sessionFactory" />
    </property>
    <property name="writeThreshold">
        <value>…</value>
    </property>
    <property name="readThreshold">
        <value>…</value>
    </property>
    <property name="retentionFactor">
        <value>…</value>
    </property>
</bean>

To make them more aggressive, lower the writeThreshold / readThreshold values and the timing parameters (elapsedTimeBeforeActivationMillis, resourceManagerCallFrequencyMillis), so that the Hibernate session is checked and flushed more often.

This configuration will free more space in the old generation at each collection.

Don’t forget to use this setting only for the upgrade, not for production / runtime.

It may or may not help, so once again, try it and keep it only if it leads to an upgrade-time improvement…

5/ Tuning the Linux swappiness:

During the tests of the upgrade path, you should also check that the system is not swapping.

If it is Linux, this can be controlled with the "vm.swappiness" kernel parameter, which can safely be set to 10 (meaning the kernel will avoid swapping application memory out until it really has to).
By default on Linux, this parameter is set to 60, which makes the system start swapping processes out rather early.

Note: we have not tested this tuning option during our upgrade path test.
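For reference, the parameter can be inspected and changed like this (changing it requires root; add the line "vm.swappiness = 10" to /etc/sysctl.conf to make it permanent):

```shell
# Show the current swappiness (0-100); lower values make the kernel
# less inclined to swap application memory out.
cat /proc/sys/vm/swappiness
# To lower it for the current boot (root required):
#   sysctl -w vm.swappiness=10
```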


Hope this series of posts will help you to complete a successful (and quicker) data migration!

Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 1)

May 27, 2010

I would like to share with you some information about the upgrade from 2.1.1 to 3.2 (Enterprise version) that we are currently executing, and especially about the time you should plan for the migration if you have a big repository, as we do…

1/ Description of the context:

So we have executed the upgrade path from 2.1.1 to 2.1.7, and then from 2.1.7 to 3.2 (please note it is not mandatory to go through 3.1.1, as validated by the support).

Upgrade path from 2.1.1 to 2.1.7 is quite quick (as there is no schema upgrade here).

However, the upgrade from 2.1.7 to 3.2 took approx. 40 hours in our case…!
So the first thing you have to take care of is to plan a downtime period (or at least a read-only period) for your source v2.x production system… The only way to assess the duration is to run a "real" migration on a copy of your production data.

For your information, our hardware settings and data volume are described at the end of this post.
Basically, we have about 500 GB of documents in our repository, but this is actually not the most important indicator.

What you especially have to measure is the number of records you have in the ALF_NODE table (and maybe also in the ALF_TRANSACTION table). In our case:
SELECT COUNT(*) from ALF_NODE; => 962984

Indeed, during the upgrade, the documents themselves are not updated at all, but there are lots of treatments on the database to upgrade the schema and update the metadata. We have clearly seen that most of the time/CPU is spent at the database level (Oracle for us).

Especially, we can see that most of the time is spent on these 2 treatments:


b/ Applying patch 'patch.updateDmPermissions' (Update ACLs on all DM node objects to the new 3.0 permission model).

2/ Some tuning and recommendations:

2.1/ The first piece of advice (mandatory!) is to do a "cold backup" of your source production database. That means that when you do the database export to get a copy of your data, you have to stop the Alfresco application. Otherwise, you will probably run into several unique-constraint-violation errors…


2.2/ If you experience some SQL errors during data migration, it might be “normal”:

For instance, in our case we had some errors like this:
17:09:48,034 ERROR [org.alfresco.repo.domain.schema.SchemaBootstrap] Statement execution failed:
SQL: INSERT INTO t_alf_node
    (id, version, store_id, uuid, transaction_id, node_deleted, type_qname_id, acl_id,
     audit_creator, audit_created, audit_modifier, audit_modified)
SELECT n.id, 1, s.id, n.uuid, nstat.transaction_id, 0, q.qname_id, n.acl_id,
    null, null, null, null
FROM alf_node n
    JOIN t_qnames q ON (q.qname = n.type_qname)
    JOIN alf_node_status nstat ON (nstat.node_id = n.id)
    JOIN t_alf_store s ON (s.protocol = nstat.protocol AND s.identifier = nstat.identifier)

This is not necessarily a big problem. Some of these errors are well known by the support and are due to bugs in the v2.x Alfresco code. If you hit such a problem, contact the support directly; they will probably be able to provide an SQL script to clean up the source v2.x data. Of course, you will then have to restart the upgrade path from scratch and re-import the initial v2.x database copy (the cleanup script should be applied to the data copy, not directly in production :-).


2.3/ Database tuning:

The best way to speed up these treatments (if you want to avoid too long a downtime during the migration) is to do some database tuning.

We asked the support what the best practices for Oracle are, and here is the feedback:

” From previous occurrences we have analyzed the Oracle performances using certain statements.

We would recommend that you recalculate the schema stats on the database which may well speed things up a little:

dbms_stats.gather_schema_stats(ownname => user, options => 'GATHER AUTO', estimate_percent => dbms_stats.auto_sample_size); ".

We tested that with Oracle 10g (running dbms_stats before launching the upgrade), but it had no significant effect…

If you are using MySQL, then I think there are lots of tuning recommendations that the support could share with you, depending on your configuration… Sorry, I have no more details for you here, but the support will probably be able to help.


2.4/ JVM tuning
Another tuning we applied was to increase -Xmx to 4 GB (we have a 64-bit server). This clearly helped for the migration; at least it prevented us from hitting OOM errors during the migration.


2.5/ If you experience an OutOfMemoryError during data migration, then in some cases you can simply increase the heap size and relaunch the upgrade. This is possible only if all the schema update database scripts have already been successfully applied, and the failure occurred during the "patches" execution.

So basically, if you see such lines in your log files…

10:37:06,193 INFO [org.alfresco.repo.domain.schema.SchemaBootstrap] Executing database script /home/alfresco/alfresco211/tomcat/temp/Alfresco/AlfrescoSchemaUpdate-org.hibernate.dialect.Oracle9Dialect-52799.sql (Copied from classpath:alfresco/dbscripts/create/1.4/org.hibernate.dialect.Oracle9Dialect/post-create-indexes-02.sql).
10:37:06,226 INFO [org.alfresco.repo.domain.schema.SchemaBootstrap] All executed statements written to file /home/alfresco/alfresco211/tomcat/temp/Alfresco/AlfrescoSchemaUpdate-All_Statements-52800.sql.
10:37:47,789 INFO [org.alfresco.repo.admin.patch.PatchExecuter] Checking for patches to apply …
18:13:41,080 INFO [org.alfresco.repo.admin.patch.PatchExecuter] Applying patch ‘patch.updateDmPermissions’ (Update ACLs on all DM node objects to the new 3.0 permission model).
18:14:01,264 WARN [] Transactional update cache 'org.alfresco.storeAndNodeIdTransactionalCache' is full (10000).
18:52:15,387 ERROR [org.alfresco.repo.admin.patch.PatchExecuter] (…)

…that means you can restart the server, and the PatchExecuter will probably be able to resume the upgrade from the latest point.


2.6/ See next post : Upgrade path from 2.x to 3.2 : some tuning and recommendations (Part 2)

3/ Our hardware settings and volume of data:

FYI, our stack is:
– Linux server with 4CPU and 8GB of RAM (64 bits)
– Red Hat Enterprise Linux Server release 5.x
– Oracle 10g (installed on the same server as Alfresco application)
– Tomcat 6.0.18
– JDK 6 u16 x64
– JVM settings:
export JAVA_OPTS="-Xms4G -Xmx4G -XX:NewRatio=2 -XX:PermSize=160m -XX:MaxPermSize=160m -server"
export JAVA_OPTS=" ${JAVA_OPTS} -Dalfresco.home=${ALF_HOME} -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -XX:+HeapDumpOnOutOfMemoryError"

Our volume of data:

Our contentstore contains ~1,400,000 items (files and directories) and ~464 GB of documents.

SELECT MAX(ID) from ALF_NODE; => 103353578
SELECT COUNT(*) from ALF_NODE; => 962984


Google Docs integration with Alfresco

April 20, 2010

For those who were considering using Google Docs, but feel that it does not offer (at least not for the moment) any advanced document management or ECM features, it seems that Google and Alfresco have some plans to work together:

See the technology preview here:  Google Docs integration with Alfresco

As far as I know, Alfresco is working with Google to integrate Alfresco into Google Docs for those organizations and teams who want ECM-style functionality such as version control, workflow, etc.

It is not clear whether Google Docs will simply be used as a remote repository (this is what the demo shows) or if the Alfresco solution could be embedded as an OEM backend into Google Docs…

It seems we will have more info in the next few days…

Count the total number of documents in Alfresco (using SQL)

January 20, 2010

For reporting purposes, we had to count the exact number of documents stored in Alfresco.
(This is mainly because we are using an external search engine, and we wanted to make sure this tool is able to index the full document repository.)

My constraints were:
– We have about 1 TB of documents (so a JavaScript would probably not be appropriate: it would take too long to execute),
– We are using Alfresco v2.1.1 (so this info is not exposed in the JMX console, as it is in 3.1 or 3.2).

So I had to deal with SQL requests…

Here are some sample requests I wrote. I hope they can help you if you have the same need:

Count the total number of documents in Alfresco (note: this includes all the live "cm:content" objects visible under "Company Home", so also the .js or .ftl files located in the "Data Dictionary"):

SELECT count(a.id) FROM alfresco.alf_node as a
where a.protocol="workspace" AND a.identifier="SpacesStore"
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}content";

Count total number of Spaces in Alfresco:

SELECT count(a.id) FROM alfresco.alf_node as a
where a.protocol="workspace" AND a.identifier="SpacesStore"
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}folder";

Count the number of documents in Alfresco, excluding all files with a ".log" extension:

SELECT count(a.id) FROM alfresco.alf_node as a, alfresco.alf_node_properties as b
where a.id = b.node_id
AND a.protocol="workspace" AND a.identifier="SpacesStore"
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}content"
AND b.qname="{http://www.alfresco.org/model/content/1.0}name"
AND b.string_value NOT like "%.log%";

Count only the number of “.ftl” documents in Alfresco:

SELECT count(a.id) FROM alfresco.alf_node as a, alfresco.alf_node_properties as b
where a.id = b.node_id
AND a.protocol="workspace" AND a.identifier="SpacesStore"
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}content"
AND b.qname="{http://www.alfresco.org/model/content/1.0}name"
AND b.string_value like "%.ftl%";

Note: I'm not a SQL expert, but I think this request is equivalent and better optimized:

SELECT count(*) FROM alfresco.alf_node a
inner join alfresco.alf_node_properties b on a.id = b.node_id AND b.qname='{http://www.alfresco.org/model/content/1.0}name'
where a.protocol='workspace' AND a.identifier='SpacesStore'
AND a.type_qname='{http://www.alfresco.org/model/content/1.0}content'
AND b.string_value NOT like '%.log%';



Another need was to exclude some spaces from the total document count.
I found a way, but only a partial one:

This request counts the document objects located in a specific space (at the root of the space only; please note that this request is not recursive… sorry):

The nodeID (uuid) of the space is passed as input: a.uuid="19097e2d-0502-11df-80c7-0bb47275447a" (you can get that value from the web UI):

SELECT * FROM alfresco.alf_node a where a.id IN (SELECT b.child_node_id FROM alfresco.alf_node a, alfresco.alf_child_assoc as b
where a.uuid="19097e2d-0502-11df-80c7-0bb47275447a"
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}folder"
AND b.parent_node_id = a.id)
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}content";

Some more details about how the previous request is built:

First we perform this query to get all the child object IDs of the specific space (this returns folders and documents):

(SELECT b.child_node_id FROM alfresco.alf_node a, alfresco.alf_child_assoc as b
where a.uuid="19097e2d-0502-11df-80c7-0bb47275447a"
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}folder"
AND b.parent_node_id = a.id)

Then we filter the object list to keep only the documents (not the folders):

SELECT * FROM alfresco.alf_node a where a.id IN (…)
AND a.type_qname="{http://www.alfresco.org/model/content/1.0}content";

I tested these requests on a dedicated Alfresco test instance (with MySQL), so I was able to check that each document added was reflected in the new count value. There are some tests I did not do (like with versioning enabled…). So of course, should you have the same need for legal or contractual reasons, first ask the support to validate that the counting method is correct.

3 ways to find and restore a (single) “lost” document stored in Alfresco

December 2, 2009

This week, someone on my team told me he had "lost" a document while working in Alfresco.
He told me he did some cut and copy actions using CIFS (Windows explorer), and that he experienced some unexpected system behaviour… In the end, he was not able to find the document in the Alfresco space anymore.
I looked into the trashcan (manage deleted items) => nothing.
I ran a Lucene search on the full repository => nothing.

So yes, the document was not available from the Alfresco interface. But as the document had not been deleted, I was quite sure that it could be in an "orphan" state (i.e. no metadata associated with the binaries on the storage file system). Maybe a "transaction failure" caused this situation…

Fortunately, the user gave me the following information about the doc:
– Filename: team_metodology
– Document type: MS .ppt document,
– The document topic (it is about "methodology"),
– The last time he saved it in Alfresco ("2009/12/1" at 10 AM),
– The space path: "Company Home/Space1/Space2/Space3".

In this case, you have several solutions. Please note that none of them is a perfect approach, but these are the only options you have with the current 3.2 version.
They are ordered by complexity (from most complex to least complex):


1/ Restore a full backup of day D-1. This approach requires that you have:
– a full backup of the database and content store for D-1 (or any backup which contains the doc),
– an Alfresco server dedicated to the restore operation (that means it should have the same disk space available as your current production server).

Obviously, if you have a large repository, this is a very "heavy" operation just to restore a single file.


2/ Restore only a database backup of day D-1:
– With this approach, you restore only the DB, not the ContentStore (because you might not have enough disk space to restore all the documents),
– Basically, the approach is to:
   – find the filesystem path of the doc in the DB (like "alfresco/alf_data/contentstore/2009/12/1/hh/mm/"),
   – try to find the doc in the ContentStore backup based on that path,
– I assume here you have a database server dedicated to the restore operation (with enough disk space),
– Of course, you will not be able to use the Alfresco application here => you will have to start the DB only and run some SQL queries to find the corresponding path of the document,
– I never tested this approach, so I cannot give you the corresponding SQL queries. But basically, you should first use the filename (team_metodology.ppt) to find the corresponding entry in the DB, and then try to find its path on the filesystem (like "2009/12/1/hh/mm/").

Once again, this has never been tested, so this is a purely theoretical approach…
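That said, a query along these lines could be a starting point (a sketch, untested like the approach itself; it assumes the 2.x schema, where the content URL is stored as a string property of the node):

```sql
-- Sketch: locate the node named 'team_metodology.ppt', then read its
-- cm:content property, whose string value should embed the contentstore
-- path (something like "contentUrl=store://2009/12/1/hh/mm/<uuid>.bin|mimetype=…").
SELECT p.string_value
FROM alfresco.alf_node_properties p
WHERE p.qname = '{http://www.alfresco.org/model/content/1.0}content'
AND p.node_id IN (
    SELECT p2.node_id
    FROM alfresco.alf_node_properties p2
    WHERE p2.qname = '{http://www.alfresco.org/model/content/1.0}name'
    AND p2.string_value = 'team_metodology.ppt'
);
```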


3/ Search for the doc binary on the Alfresco storage (filesystem):
– In our case, we are using a Linux filesystem as Alfresco storage, so the corresponding .bin object we are looking for is somewhere in the "alfresco/alf_data/contentstore/" directory.
– We know that the last time the user saved the doc in Alfresco was "2009/12/1" at 10 AM.
– We also know that the doc is about "methodology" (so I assume here that the string "methodology" can be found in the doc content),
– You should know that the document filename and extension are modified by Alfresco when stored on the filesystem (e.g. "team_metodology.ppt" becomes something similar to "f8001f06-ddda-11de-a614-ad54d765801e.bin").
– So the approach is to:
   – go into the directory "alfresco/alf_data/contentstore/2009/12/1/10",
   – do a grep on "methodology" (grep -ir "methodology" *); you might get several results:
Binary file 3/cc1eaf57-dddc-11de-ba7b-ad54d765801e.bin matches
Binary file 19/f18bad09-dddc-11de-ba7b-ad54d765801e.bin matches
Binary file 10/8b555d79-ddda-11de-a614-ad54d765801e.bin matches
Binary file 11/844c0916-dddd-11de-83bf-ad54d765801e.bin matches
Binary file 5/f8001f06-ddda-11de-a614-ad54d765801e.bin matches

   – Based on the last modification hour, these 2 files match:

   Binary file 10/8b555d79-ddda-11de-a614-ad54d765801e.bin matches
   Binary file 11/844c0916-dddd-11de-83bf-ad54d765801e.bin matches

   – Download the files to your computer, and then change the extension to .ppt.

   – Try to open these files (through Windows explorer, not through the "WinSCP" client, as this might not work).

   – One of them should be the doc you are looking for !

This last approach has been tested, so I can "guarantee" that it works.
But obviously, this is a "workaround" solution… Also, you really need to know the last time the document was updated, otherwise your grep search might take too much time.
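The grep step above can be rehearsed on a throw-away directory tree that mimics the contentstore's year/month/day/hour layout (the names and contents below are made up for the demo; grep -l prints only the matching filenames):

```shell
# Build a fake contentstore hour-directory and search it
mkdir -p /tmp/contentstore_demo/2009/12/1/10/3
printf 'slides about our methodology' > /tmp/contentstore_demo/2009/12/1/10/3/cc1eaf57.bin
printf 'unrelated content'            > /tmp/contentstore_demo/2009/12/1/10/3/other.bin
cd /tmp/contentstore_demo/2009/12/1/10
grep -lir "methodology" .
```

Only ./3/cc1eaf57.bin should come back.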


I talked with an Alfresco services guy last week, and he agreed that the ability to quickly find and restore an "orphan" document would be a very useful evolution.

Our company's employees are used to working with Windows server facilities, and MS does provide such a feature to easily find and restore a single file from a backup.

He told me that a basic custom development could help here: each time a file is created/saved, Alfresco could add an entry to a log file recording the mapping between a doc path (like "Company Home/Space1/Space2/Space3/team_metodology.ppt") and the corresponding path on the filesystem (like "alfresco/alf_data/contentstore/2009/12/1/hh/mm/"). This would greatly simplify searching for a document in the ContentStore backup (or within the live ContentStore itself).

Of course, a more industrialized solution provided by the editor would be even more appropriate…
I think I will open a JIRA for this evolution…