Count the total number of documents in Alfresco (using SQL)

January 20, 2010

For reporting purpose, we had to count the exact number of documents stored in Alfresco.
(This is mainly because we are using an external search engine, and we wanted to make sure this tool is able to index the full document repository).

My constraints was that:
– We have about 1TB of document (so a javascript is not probably not appropriate, should takes too long to execute…),
– We are using Alfresco v2.1.1 (so this info is not exposed in JMX console, as it should be the case in 3.1 or 3.2).

So I had to deal with SQL request….

Here is some sample requests I did. Hope this could help you if you have the same need:

========================================================
Count total number of documents in Alfresco (note: this includes all the live “cm:content” object visible under “Company Home”, so also the .js or .ftl files located in “Data Dictionary”):

SELECT count(a.id) FROM alfresco.alf_node as a
where a.protocol=”workspace” AND a.identifier=”SpacesStore”
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}content”;

Count total number of Spaces in Alfresco:

SELECT count(a.id) FROM alfresco.alf_node as a
where a.protocol=”workspace” AND a.identifier=”SpacesStore”
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}folder”;

Count number of documents in Alfresco, but exclude all “.log” files extension:

SELECT count(a.id) FROM alfresco.alf_node as a, alfresco.alf_node_properties as b
where a.id=b.node_id
AND a.protocol=”workspace” AND a.identifier=”SpacesStore”
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}content”
AND b.qname=”{http://www.alfresco.org/model/content/1.0}name”
AND b.string_value NOT like “%.log%”;

Count only the number of “.ftl” documents in Alfresco:

SELECT count(a.id) FROM alfresco.alf_node as a, alfresco.alf_node_properties as b
where a.id=b.node_id
AND a.protocol=”workspace” AND a.identifier=”SpacesStore”
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}content”
AND b.qname=”{http://www.alfresco.org/model/content/1.0}name”
AND b.string_value like “%.ftl%”;

********************
Note: I’m not a SQL expert, but I think this request is equivalent but more optimized:

SELECT count(*) FROM alfresco.alf_node a
inner join alfresco.alf_node_properties b on a.id=b.node_id AND b.qname='{http://www.alfresco.org/model/content/1.0}name’
where a.protocol=’workspace’ AND a.identifier=’SpacesStore’
AND a.type_qname='{http://www.alfresco.org/model/content/1.0}content’
AND b.string_value NOT like ‘%.log%’;

********************

========================================================

Another need was to be able to not take into account some spaces in the total count of documents.
I find a way, but partially only:

This request count the number of document object located in a specific space (at the root of the space only, please note that this request is not recursive….sorry):

The nodeID (uuid) of the space is passed as input: a.uuid=”19097e2d-0502-11df-80c7-0bb47275447a” (you can get that value from the web UI):

SELECT * FROM alfresco.alf_node a where a.id IN (SELECT b.child_node_id FROM alfresco.alf_node a, alfresco.alf_child_assoc as b
where a.uuid=”19097e2d-0502-11df-80c7-0bb47275447a”
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}folder”
AND a.id=b.parent_node_id)
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}content”;

Some more details about how the previous request is built:

First we perform this query to get all the children object ID of the specific space (this returns folder and documents):
(SELECT b.child_node_id FROM alfresco.alf_node a, alfresco.alf_child_assoc as b
where a.uuid=”19097e2d-0502-11df-80c7-0bb47275447a”
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}folder”
AND a.id=b.parent_node_id)

Then we filter the object list to retrieve only the documents (not the folders):
SELECT * FROM alfresco.alf_node a where a.id IN (…)
AND a.type_qname=”{http://www.alfresco.org/model/content/1.0}content”;

I did test these requests on a dedicated testing instance of Alfresco (with MySQL). So I was able to check that each document added was reflected in the new count value. There are some tests I did not do (like with versionning enabled….). So of course, should you have the same need for legal or contract reason, ask first the support to validate that the count method is correct.

Advertisements