Class: SPARQL::Algebra::Operator::Dataset

Inherits:

Binary

Object
SPARQL::Algebra::Operator
Binary
SPARQL::Algebra::Operator::Dataset


Includes: Query

Defined in: lib/sparql/algebra/operator/dataset.rb

Overview

The SPARQL GraphPattern dataset operator.

Instantiated with two operands. The first is an array of data source URIs; each entry is either a bare URI, indicating a default data source, or an array [:named, <uri>], indicating a named data source. The second is the query to run against the resulting dataset.

This operator loads each data source into the repository, unless a graph named by the data source URI already exists there.
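The classification and load-once behaviour can be sketched in plain Ruby. This is only an illustration under stated assumptions: `FakeRepo` and `prepare_sources` are hypothetical names, URIs are plain strings, and the stand-in repository implements nothing beyond `has_graph?` and a one-argument `load`.

```ruby
# Hypothetical stand-in for a repository; it only tracks graph names.
class FakeRepo
  attr_reader :loaded

  def initialize(existing = [])
    @graphs = existing.dup
    @loaded = []
  end

  def has_graph?(name)
    @graphs.include?(name)
  end

  def load(name)
    @loaded << name
    @graphs << name
  end
end

# Split the data sources into default and named ones, loading a source
# only when the repository has no graph of that name yet.
def prepare_sources(repo, sources)
  defaults = []
  named = []
  sources.each do |src|
    if src.is_a?(Array) # [:named, uri]: only the URI part is needed
      uri = src.last
      named << uri
    else                # bare URI: a default data source
      uri = src
      defaults << uri
    end
    repo.load(uri) unless repo.has_graph?(uri)
  end
  [defaults, named]
end

# data-g2.ttl is already present, so only data-g1.ttl gets loaded.
repo = FakeRepo.new(["data-g2.ttl"])
defaults, named = prepare_sources(repo, ["data-g1.ttl", [:named, "data-g2.ttl"]])
```

Afterwards `repo.loaded` lists only the sources that were actually fetched, which is the point of the existence check.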

The contained BGP queries are then performed against the specified default and named graphs. Rather than using the actual default graph of the dataset, queries against the default dataset are run against named graphs matching a non-distinguished variable, and the results are filtered against those URIs included in the default dataset.

Specifically, each BGP which is not part of a graph pattern is replaced with a union of graph patterns with that BGP repeated for each graph URI in the default dataset. This requires recursively updating the operator.

Each graph pattern containing a variable graph name is replaced by a filter on that variable such that the variable must match only those named datasets specified.

If no default graphs or no named graphs are specified, the corresponding queries are eliminated.

With multiple default graphs, the results of a graph query on each default data source are unioned.

Multiple named graphs place a filter on all variables used to identify those named graphs so that they are restricted to come only from the specified set. Note that this requires descending through expressions to find graph patterns using variables and placing a filter on each identified variable.
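The rewriting rules above can be approximated over nested Ruby arrays standing in for SSE forms. This is a rough sketch, not the operator's implementation: `rewrite` and `variable?` are hypothetical helpers, terms are plain strings, variables are strings starting with `?`, and graph patterns with a constant name are assumed to be left untouched.

```ruby
# A term counts as a variable when it is a string starting with "?".
def variable?(term)
  term.is_a?(String) && term.start_with?("?")
end

# Rewrite a nested-array query for the given default and named graph URIs.
def rewrite(node, defaults, named)
  return node unless node.is_a?(Array)

  case node.first
  when :bgp
    # A BGP outside any graph pattern: repeat it once per default graph
    # and union the branches; with no default graphs it is eliminated.
    branches = defaults.map { |uri| [:graph, uri, node] }
    branches.empty? ? [:bgp] : branches.reduce { |a, b| [:union, a, b] }
  when :graph
    name = node[1]
    return node unless variable?(name) # assumption: constant names are kept
    return [:bgp] if named.empty?      # no named graphs: eliminated
    # Restrict the graph variable to the named graphs of the dataset.
    tests = named.map { |uri| [:"=", name, uri] }
    [:filter, tests.reduce { |a, b| [:"||", a, b] }, node]
  else
    # Any other operator: descend into its operands.
    [node.first, *node.drop(1).map { |child| rewrite(child, defaults, named) }]
  end
end

BGP = [:bgp, [:triple, "?s", "?p", "?o"]].freeze
QUERY = [:union, BGP, [:graph, "?g", BGP]].freeze

one_each    = rewrite(QUERY, ["data-g1.ttl"], ["data-g2.ttl"])
no_named    = rewrite(QUERY, ["data-g1.ttl"], [])
two_default = rewrite(BGP, ["data-g1.ttl", "data-g2.ttl"], [])
two_named   = rewrite([:graph, "?g", BGP], [], ["data-g1.ttl", "data-g2.ttl"])
```

The sketch does not descend into a graph pattern, so a BGP nested there is never wrapped a second time.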

Examples:

Dataset with one default and one named data source


(prefix ((: <http://example/>))
  (dataset (<data-g1.ttl> (named <data-g2.ttl>))
    (union
      (bgp (triple ?s ?p ?o))
      (graph ?g (bgp (triple ?s ?p ?o))))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (union
    (graph <data-g1.ttl> (bgp (triple ?s ?p ?o)))
    (filter (= ?g <data-g2.ttl>)
      (graph ?g (bgp (triple ?s ?p ?o))))))

Dataset with one default and no named data sources


(prefix ((: <http://example/>))
  (dataset (<data-g1.ttl>)
    (union
      (bgp (triple ?s ?p ?o))
      (graph ?g (bgp (triple ?s ?p ?o))))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (union
    (graph <data-g1.ttl> (bgp (triple ?s ?p ?o)))
    (bgp)))

Dataset with two default data sources


(prefix ((: <http://example/>))
  (dataset (<data-g1.ttl> <data-g2.ttl>)
    (bgp (triple ?s ?p ?o))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (union
    (graph <data-g1.ttl> (bgp (triple ?s ?p ?o)))
    (graph <data-g2.ttl> (bgp (triple ?s ?p ?o)))))

Dataset with two named data sources


(prefix ((: <http://example/>))
  (dataset ((named <data-g1.ttl>) (named <data-g2.ttl>))
    (graph ?g (bgp (triple ?s ?p ?o)))))

is effectively re-written to the following:

(prefix ((: <http://example/>))
  (filter (|| (= ?g <data-g1.ttl>) (= ?g <data-g2.ttl>))
    (graph ?g (bgp (triple ?s ?p ?o)))))

SPARQL Grammar

BASE     <http://example.org/>
PREFIX : <http://example.com/>

SELECT * 
FROM <data-g1.ttl>
{ ?s ?p ?o }

SSE

(base <http://example.org/>
 (prefix ((: <http://example.com/>))
  (dataset (<data-g1.ttl>)
   (bgp (triple ?s ?p ?o)))))

Constant Summary

NAME = :dataset

Instance Attribute Summary

Attributes included from Query

#solutions

Attributes inherited from SPARQL::Algebra::Operator

#operands

Instance Method Summary

#execute(queryable, **options) {|solution| ... } ⇒ RDF::Query::Solutions

Executes this query on the given queryable graph or repository.

#to_sparql(**options) ⇒ String

Returns a partial SPARQL grammar for this operator.

Methods included from Query

#each_solution, #empty?, #failed?, #graph_name=, #matched?, #query_yields_boolean?, #query_yields_solutions?, #query_yields_statements?, #unshift, #variables

Methods inherited from Binary

#initialize

Methods inherited from SPARQL::Algebra::Operator

#aggregate?, arity, #base_uri, base_uri, base_uri=, #bind, #boolean, #constant?, #deep_dup, #each_descendant, #eql?, #evaluatable?, evaluate, #executable?, #first_ancestor, for, #initialize, #inspect, #mergable?, #ndvars, #node?, #operand, #optimize, #optimize!, #parent, #parent=, #prefixes, prefixes, prefixes=, #rewrite, #to_binary, to_sparql, #to_sxp, #to_sxp_bin, #validate!, #variable?, #variables, #vars

Methods included from Expression

cast, #constant?, #evaluate, extension, extension?, extensions, for, #invalid?, new, #node?, open, #optimize, #optimize!, parse, register_extension, #to_sxp_bin, #valid?, #validate!, #variable?

Constructor Details

This class inherits a constructor from SPARQL::Algebra::Operator::Binary

Instance Method Details

#execute(queryable, **options) {|solution| ... } ⇒ `RDF::Query::Solutions`

Executes this query on the given queryable graph or repository. Reads the specified data sources into queryable. Named data sources are added using the data source URI as the graph name.

Datasets are specified in operand(0), which is an array of default or named graph URIs.

If options contains either of the following SPARQL Protocol attributes, the dataset has already been constructed on creation, and the data source operations described here are skipped:

default-graph-uri
named-graph-uri
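A small sketch of that check, mirroring the `%i(...)` test in the method source. The helper name `dataset_from_protocol?` and the example URI are hypothetical. Note that the keys are hyphenated symbols, so they need the quoted symbol syntax, and string keys are not matched.

```ruby
# The two protocol attributes, as hyphenated symbols.
PROTOCOL_KEYS = %i(default-graph-uri named-graph-uri).freeze

# True when the caller already fixed the dataset through protocol options.
def dataset_from_protocol?(options)
  PROTOCOL_KEYS.any? { |key| options.key?(key) }
end

no_protocol  = dataset_from_protocol?({})
symbol_key   = dataset_from_protocol?({ :"default-graph-uri" => ["http://example/g"] })
string_key   = dataset_from_protocol?({ "default-graph-uri" => ["http://example/g"] })
```

Only `symbol_key` is true here; the string-keyed hash falls through to the normal loading path.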

Parameters:

queryable (RDF::Queryable): the graph or repository to query
options (Hash{Symbol => Object}): any additional keyword options

Yields:

(solution): each matching solution

Yield Parameters:

solution (RDF::Query::Solution)

Yield Returns:

(void): ignored

Returns:

(RDF::Query::Solutions): the resulting solution sequence

See Also:

https://www.w3.org/TR/sparql11-query/#sparqlAlgebra

# File 'lib/sparql/algebra/operator/dataset.rb', line 148

def execute(queryable, **options, &base)
  debug(options) {"Dataset"}
  if %i(default-graph-uri named-graph-uri).any? {|k| options.key?(k)} 
    debug("=> Skip constructing merge repo due to options", options)
    return queryable.query(operands.last, **options.merge(depth: options[:depth].to_i + 1), &base)
  end
 
  default_datasets = []
  named_datasets = []
  operand(0).each do |uri|
    case uri
    when Array
      # Format is (named <uri>), only need the URI part
      uri = uri.last
      debug(options) {"=> named data source #{uri}"}
      named_datasets << uri
    else
      debug(options) {"=> default data source #{uri}"}
      default_datasets << uri
    end
    load_opts = {logger: options.fetch(:logger, false), graph_name: uri, base_uri: uri}
    unless queryable.has_graph?(uri)
      debug(options) {"=> load #{uri}"}
      queryable.load(uri.to_s, **load_opts)
    end
  end
  debug(options) {
    require 'rdf/nquads'
    queryable.dump(:nquads)
  }

  # Create an aggregate based on queryable having just the bits we want
  aggregate = RDF::AggregateRepo.new(queryable)
  named_datasets.each {|name| aggregate.named(name) if queryable.has_graph?(name)}
  aggregate.default(*default_datasets.select {|name| queryable.has_graph?(name)})
  aggregate.query(operands.last, **options.merge(depth: options[:depth].to_i + 1), &base)
end

#to_sparql(**options) ⇒ `String`

Returns a partial SPARQL grammar for this operator.

Extracts datasets