The queryanalysis part of the toolkit analyses the characteristics of a set of queries stored in a directory, one query per file.

Using the queryanalysis tool

To use it, run:

$ java -cp lib/containmenttester.jar fr.inrialpes.tyrexmo.queryanalysis.Analysis querylog 
where querylog is the name of the directory containing the queries.
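
When several query directories have to be processed, the command above can be wrapped in a small script. The following is only a sketch: the helper names are hypothetical, and it assumes that the tool prints its statistics on standard output and that it is started from the toolkit's root directory so that the relative jar path resolves.

```python
import subprocess

# Values copied from the command line shown above.
JAR = "lib/containmenttester.jar"
MAIN_CLASS = "fr.inrialpes.tyrexmo.queryanalysis.Analysis"

def analysis_command(querylog):
    """Build the argument list for analysing one query directory."""
    return ["java", "-cp", JAR, MAIN_CLASS, str(querylog)]

def run_analysis(querylog):
    """Run the tool and return what it prints (assumed: on standard output)."""
    result = subprocess.run(
        analysis_command(querylog),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if java exits with an error
    )
    return result.stdout
```

For example, `run_analysis("querylog")` would start the same process as the command above and return its output as a string.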

Below are statistics for several available query logs. The script directory features a Perl script (splitlog.pl) for extracting queries from log files.
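
To illustrate what such an extraction step involves, here is a minimal Python sketch. It is not splitlog.pl and does not claim to reproduce its behaviour. It assumes Apache-style log lines whose request field carries the SPARQL query as a URL-encoded `query` parameter; the function names and the `query-NNNNNN.rq` file naming are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import parse_qs, urlsplit

# Assumed log shape: a request field such as
#   "GET /sparql?query=SELECT+... HTTP/1.1"
REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')

def extract_queries(lines):
    """Return the decoded SPARQL query of every log line that carries one."""
    queries = []
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue  # not a recognisable request line
        params = parse_qs(urlsplit(match.group(1)).query)
        if "query" in params:
            queries.append(params["query"][0])
    return queries

def write_queries(queries, directory):
    """Write one query per file into directory, as the analysis expects."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for number, query in enumerate(queries, start=1):
        (out / f"query-{number:06d}.rq").write_text(query, encoding="utf-8")
```

The resulting directory has the one-query-per-file layout described above and could then be passed to the analysis as querylog.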

Other datasets

We plan to process the USEWOD 2013 dataset soon.

DBPedia 3.5.1 dataset

The DBPedia 3.5.1 dataset contains 3 210 368 queries.

305333 errors over 3210368 queries (remaining: 2905035)

Number of queries with cycles using only non-distinguished variables: 0

		proj (326356)		noproj (2578679)
		tree	dag	cycle	tree	dag	cycle
none		175220	562	1	1534150	1761	1748
union		9	26625	547	24	29629	1166
opt		2052	685	0	311608	722	1
filter		7912	711	6	264821	340	1
un-opt		0	306	0	0	12659	1
opt-filt	7991	779	0	4933	52401	0
filt-un		2	183	0	23802	12286	0
un-opt-filt	0	102765	0	0	302657	23969


The same breakdown as percentages of the 2905035 analysed queries (truncated to integers):

		proj (11%)		noproj (88%)
		tree	dag	cycle	tree	dag	cycle
none		6	0	0	52	0	0
union		0	0	0	0	1	0
opt		0	0	0	10	0	0
filter		0	0	0	9	0	0
un-opt		0	0	0	0	0	0
opt-filt	0	0	0	0	1	0
filt-un		0	0	0	0	0	0
un-opt-filt	0	3	0	0	10	0
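
The second table restates the first as whole-number percentages of the 2905035 queries left after discarding the errors, with fractions dropped rather than rounded (1534150 queries are 52.8%, shown as 52). The arithmetic can be checked with a short sketch; the function name is only illustrative.

```python
# Queries left after removing the 305333 errors from the 3210368 queries.
TOTAL = 3210368 - 305333  # 2905035

def percent(count, total=TOTAL):
    """Share of `count` in `total` as an integer percentage, truncated."""
    return count * 100 // total

# A few cells of the first table, keyed by (row, projection, shape).
cells = {
    ("none", "proj", "tree"): 175220,
    ("none", "noproj", "tree"): 1534150,
    ("opt", "noproj", "tree"): 311608,
    ("un-opt-filt", "proj", "dag"): 102765,
    ("un-opt-filt", "noproj", "dag"): 302657,
}

for key, count in cells.items():
    print(key, percent(count))
```

Because every cell is truncated, the two column totals (11 and 88) add up to 99 rather than 100.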

DBPedia 3.5.1 2010-07-13 (initial analysis)

59717 errors (15.8%) over 378530 queries (remaining: 318813)
Number of queries with cycles using only non-distinguished variables: 0
Number of cyclic queries: (87%)

		tree	dag	cycle
other		50141	95522	22591 (other and union together)
union		100001	55355

Beyond the cyclic and acyclic tests, we checked how many of the queries have projection, i.e., queries in which not all variables of the graph pattern are distinguished. We found that 63% of the queries have projection and 37% have none. Further, all of the cyclic queries have projection; of the acyclic ones, 65% have projection and the rest do not. 88964 queries use OPTIONAL, only 40448 of which are conjunctive queries.
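
The projection test described above can be illustrated with a rough Python sketch. This is not the toolkit's implementation: it works on the query text with regular expressions rather than on a parsed query, handles only SELECT queries, and the function name is hypothetical. It would be misled by variables-like text inside string literals or IRIs and by expressions such as COUNT(*).

```python
import re

# SPARQL variables are written ?name or $name.
VARIABLE = re.compile(r"[?$](\w+)")

def has_projection(query):
    """True if some variable of the graph pattern is not distinguished,
    i.e. not returned by the SELECT clause."""
    # Everything before the first "{" is treated as the query head.
    head, _, body = query.partition("{")
    select = re.search(r"\bSELECT\b(.*)", head, re.IGNORECASE | re.DOTALL)
    if select is None:
        raise ValueError("only SELECT queries are handled in this sketch")
    if "*" in select.group(1):
        return False  # SELECT * distinguishes every variable
    distinguished = set(VARIABLE.findall(select.group(1)))
    pattern_variables = set(VARIABLE.findall(body))
    return not pattern_variables <= distinguished
```

For instance, `SELECT ?s WHERE { ?s ?p ?o }` has projection because ?p and ?o are not returned, whereas `SELECT ?s ?p ?o WHERE { ?s ?p ?o }` does not.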