The queryanalysis part of the toolkit analyses the characteristics of a set of queries stored in a directory, one query per file.
To use it, run:
$ java -cp lib/containmenttester.jar fr.inrialpes.tyrexmo.queryanalysis.Analysis querylog
where querylog is the name of the directory containing the queries.
Below we present statistics on several available query logs. The script directory contains a Perl script (splitlog.pl) for extracting queries from log files.
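To illustrate the "one query per file" layout the analysis expects, here is a minimal Python sketch of splitting a log into query files. It is not the splitlog.pl script and makes no claim about how that script works. It assumes an Apache-style access log in which each request carries the SPARQL query as a URL-encoded `query` parameter; the function names and the `queryNNNNNNN.rq` file naming are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import parse_qs, urlsplit

# Assumption: each log line contains a quoted request such as
# "GET /sparql?query=... HTTP/1.1"; other line formats are skipped.
REQUEST = re.compile(r'"(?:GET|POST) (\S+)')


def extract_queries(lines):
    """Yield the URL-decoded value of every `query` parameter found."""
    for line in lines:
        match = REQUEST.search(line)
        if match is None:
            continue  # not a request line in the assumed format
        params = parse_qs(urlsplit(match.group(1)).query)
        for query in params.get("query", []):
            yield query


def write_queries(lines, out_dir):
    """Write one query per file into out_dir and return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for count, query in enumerate(extract_queries(lines), start=1):
        # Hypothetical naming scheme: query0000001.rq, query0000002.rq, ...
        (out / f"query{count:07d}.rq").write_text(query, encoding="utf-8")
    return count
```

The resulting directory could then be passed as the querylog argument of the command above. Real logs may need a different pattern or extra filtering.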
We plan to process the USEWOD 2013 dataset soon.
The DBpedia 3.5.1 dataset contains 3 210 368 queries.
305333 errors over 3210368 queries (residu: 2905035)
Number of queries with cycles using only ndvariables: 0

                     proj (326356)             noproj (2578679)
                 tree      dag    cycle      tree      dag    cycle
none           175220      562        1   1534150     1761     1748
union               9    26625      547        24    29629     1166
opt              2052      685        0    311608      722        1
filter           7912      711        6    264821      340        1
un-opt              0      306        0         0    12659        1
opt-filt         7991      779        0      4933    52401        0
filt-un             2      183        0     23802    12286        0
un-opt-filt         0   102765        0         0   302657    23969

                     proj (11)                 noproj (88)
                 tree      dag    cycle      tree      dag    cycle
none                6        0        0        52        0        0
union               0        0        0         0        1        0
opt                 0        0        0        10        0        0
filter              0        0        0         9        0        0
un-opt              0        0        0         0        0        0
opt-filt            0        0        0         0        1        0
filt-un             0        0        0         0        0        0
un-opt-filt         0        3        0         0       10        0
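As a sanity check on the DBpedia 3.5.1 figures above, the short Python sketch below recomputes the residue (total queries minus errors) and verifies that the per-row counts add up to the reported proj and noproj totals. All numbers are copied from the figures above; the variable names are only illustrative.

```python
# Figures reported for the DBpedia 3.5.1 log.
total_queries = 3210368
errors = 305333
residue = total_queries - errors  # queries that were analysed

# Rows: none, union, opt, filter, un-opt, opt-filt, filt-un, un-opt-filt.
# Columns: (tree, dag, cycle).
proj = [
    (175220, 562, 1), (9, 26625, 547), (2052, 685, 0), (7912, 711, 6),
    (0, 306, 0), (7991, 779, 0), (2, 183, 0), (0, 102765, 0),
]
noproj = [
    (1534150, 1761, 1748), (24, 29629, 1166), (311608, 722, 1),
    (264821, 340, 1), (0, 12659, 1), (4933, 52401, 0),
    (23802, 12286, 0), (0, 302657, 23969),
]

proj_total = sum(sum(row) for row in proj)
noproj_total = sum(sum(row) for row in noproj)

# The two halves together should account for every analysed query.
assert proj_total + noproj_total == residue

print(f"residue: {residue}")
print(f"proj: {proj_total}, noproj: {noproj_total}")
print(f"share with projection: {proj_total / residue:.1%}")
```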
59717 errors (15.8%) over 378,530 queries (residu: 318813)
Number of queries with cycles using only ndvariables: 0
Number of cyclic queries: (87%)

             tree      dag    cycle
other       50141    95522    22591
union      100001    55355
Beyond the cyclic and acyclic tests, we also checked how many of the queries have projection, i.e., queries in which not all variables of the graph pattern are distinguished. We found that 63% of the queries have projection and 37% have none. Further, all of the cyclic queries have projection; of the acyclic ones, 65% have projection and the rest do not. 88964 queries use OPTIONAL, of which only 40448 are conjunctive queries.
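The projection criterion can be illustrated with a small Python sketch. This is not the toolkit's implementation: it assumes a simple query of the form SELECT ... WHERE { ... }, finds variables with a regular expression instead of a SPARQL parser, and treats `SELECT *` as selecting every variable. The function name `has_projection` is hypothetical.

```python
import re

# A SPARQL variable is written ?name or $name.
VAR = re.compile(r"[?$](\w+)")
# Assumption: one SELECT clause followed by WHERE { ... }; no subqueries.
SELECT_WHERE = re.compile(
    r"\bSELECT\b(.*?)\bWHERE\b\s*\{(.*)\}", re.IGNORECASE | re.DOTALL
)


def has_projection(query: str) -> bool:
    """Return True if some graph-pattern variable is not distinguished."""
    match = SELECT_WHERE.search(query)
    if match is None:
        raise ValueError("only simple SELECT ... WHERE { ... } queries are handled")
    head, body = match.group(1), match.group(2)
    if "*" in head:
        return False  # SELECT * distinguishes every variable
    distinguished = set(VAR.findall(head))
    pattern_vars = set(VAR.findall(body))
    # Projection: at least one pattern variable is missing from the SELECT list.
    return not pattern_vars <= distinguished
```

For example, `SELECT ?x WHERE { ?x foaf:knows ?y }` has projection because `?y` is not distinguished, whereas `SELECT ?x ?y WHERE { ?x foaf:knows ?y }` does not.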