Class: SPARQL::Algebra::Operator::Regex

Inherits:
SPARQL::Algebra::Operator
Includes:
Evaluatable
Defined in:
lib/sparql/algebra/operator/regex.rb

Overview

The SPARQL regex operator.

[122] RegexExpression ::= 'REGEX' '(' Expression ',' Expression ( ',' Expression )? ')'

Examples:

SPARQL Grammar

PREFIX  ex: <http://example.com/#>
PREFIX  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?val
WHERE {
  ex:foo rdf:value ?val .
  FILTER regex(?val, "GHI")
}

SSE

(prefix ((ex: <http://example.com/#>)
         (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>))
 (project (?val)
  (filter (regex ?val "GHI")
   (bgp (triple ex:foo rdf:value ?val)))))

Constant Summary

NAME =
:regex

Constants inherited from SPARQL::Algebra::Operator

ARITY, IsURI, URI

Constants included from Expression

Expression::PATTERN_PARENTS

Instance Attribute Summary

Attributes inherited from SPARQL::Algebra::Operator

#operands

Instance Method Summary

Methods included from Evaluatable

#evaluate, #memoize, #replace_aggregate!, #replace_vars!

Methods inherited from SPARQL::Algebra::Operator

#aggregate?, arity, #base_uri, base_uri, base_uri=, #bind, #boolean, #constant?, #deep_dup, #each_descendant, #eql?, #evaluatable?, evaluate, #executable?, #first_ancestor, for, #initialize, #inspect, #mergable?, #ndvars, #node?, #operand, #optimize, #optimize!, #parent, #parent=, #prefixes, prefixes, prefixes=, #rewrite, #to_binary, to_sparql, #to_sxp, #to_sxp_bin, #validate!, #variable?, #variables, #vars

Methods included from Expression

cast, #constant?, #evaluate, extension, extension?, extensions, for, #invalid?, new, #node?, open, #optimize, #optimize!, parse, register_extension, #to_sxp_bin, #valid?, #validate!, #variable?

Constructor Details

This class inherits a constructor from SPARQL::Algebra::Operator

Instance Method Details

#apply(text, pattern, flags = RDF::Literal(''), **options) ⇒ RDF::Literal::Boolean

Matches text against a regular expression pattern.

Parameters:

  • text (RDF::Literal)

    a simple literal

  • pattern (RDF::Literal)

    a simple literal

  • flags (RDF::Literal) (defaults to: RDF::Literal(''))

    a simple literal (defaults to an empty string)

Returns:

  • (RDF::Literal::Boolean)

    true or false

Raises:

  • (TypeError)

    if any operand is unbound

  • (TypeError)

    if any operand is not a simple literal
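
The flags operand uses the XPath flag letters. As a rough illustration of what the `i`, `s` and `m` letters mean in Ruby terms, here is a minimal pure-Ruby sketch. The helper name `xpath_like_match?` is hypothetical and not part of the sparql gem; it works on plain strings rather than RDF::Literal values and skips all operand validation.

```ruby
# Hypothetical helper (not part of the sparql gem): shows how the XPath-style
# flags "i", "s" and "m" could be translated into a Ruby Regexp.
def xpath_like_match?(text, pattern, flags = '')
  # Without "m", ^ and $ anchor the whole string, so rewrite a leading ^
  # to \A and a trailing $ to \z. Ruby's own ^ and $ always match at line
  # boundaries, which corresponds to XPath's "m" mode.
  unless flags.include?('m')
    pattern = '\A' + pattern[1..-1] if pattern.start_with?('^')
    pattern = pattern[0..-2] + '\z' if pattern.end_with?('$')
  end

  options = 0
  options |= Regexp::MULTILINE  if flags.include?('s') # dot also matches newline
  options |= Regexp::IGNORECASE if flags.include?('i')
  Regexp.new(pattern, options).match?(text)
end

puts xpath_like_match?("abc\nGHI", '^GHI')      # false: ^ means start of string
puts xpath_like_match?("abc\nGHI", '^GHI', 'm') # true: ^ means start of line
puts xpath_like_match?("abcghi", 'GHI', 'i')    # true: case-insensitive
puts xpath_like_match?("a\nb", 'a.b', 's')      # true: dot matches newline
```

Note the naming mismatch this sketch makes visible: XPath's `s` flag corresponds to Ruby's `Regexp::MULTILINE`, while XPath's `m` flag corresponds to Ruby's default anchor behavior.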


# File 'lib/sparql/algebra/operator/regex.rb', line 43

def apply(text, pattern, flags = RDF::Literal(''), **options)
  # @see https://www.w3.org/TR/xpath-functions/#regex-syntax
  raise TypeError, "expected a plain RDF::Literal, but got #{text.inspect}" unless text.is_a?(RDF::Literal) && text.plain?
  text = text.to_s
  # TODO: validate text syntax

  # @see https://www.w3.org/TR/xpath-functions/#regex-syntax
  raise TypeError, "expected a plain RDF::Literal, but got #{pattern.inspect}" unless pattern.is_a?(RDF::Literal) && pattern.plain?
  pattern = pattern.to_s
  # TODO: validate pattern syntax

  # @see https://www.w3.org/TR/xpath-functions/#flags
  raise TypeError, "expected a plain RDF::Literal, but got #{flags.inspect}" unless flags.is_a?(RDF::Literal) && flags.plain?
  flags = flags.to_s
  # TODO: validate flag syntax

  # 's' mode in XPath is like ruby MULTILINE
  # 'm' mode in XPath is like ruby /^$/ vs /\A\z/
  unless flags.include?(?m)
    pattern = '\A' + pattern[1..-1] if pattern.start_with?('^')
    pattern = pattern[0..-2] + '\z' if pattern.end_with?('$')
  end

  options = 0
  if flags.include?('x')
    flags = flags.sub('x', '')
    # If present, whitespace characters (#x9, #xA, #xD and #x20) in the regular expression are removed prior to matching with one exception: whitespace characters within character class expressions (charClassExpr) are not removed. This flag can be used, for example, to break up long regular expressions into readable lines.
    # Scan pattern entering a state when scanning `[` that does not remove whitespace and exit that state when scanning `]`.
    in_charclass = false
    pattern = pattern.chars.map do |c|
      case c
      when '['
        in_charclass = true
        c
      when ']'
        in_charclass = false
        c
      else
        c.match?(/\s/) && !in_charclass ? '' : c
      end
    end.join('')
  end

  if flags.include?('q')
    flags = flags.sub('q', '')
    # if present, all characters in the regular expression are treated as representing themselves, not as metacharacters. In effect, every character that would normally have a special meaning in a regular expression is implicitly escaped by preceding it with a backslash.
    # Simply replace every character with an escaped version of that character
    pattern = pattern.chars.map do |c|
      case c
      when '.', '?', '*', '^', '$', '+', '(', ')', '[', ']', '{', '}'
        "\\#{c}"
      else
        c
      end
    end.join("")
  end

  options |= Regexp::MULTILINE  if flags.include?(?s) # dot-all mode
  options |= Regexp::IGNORECASE if flags.include?(?i)
  RDF::Literal(Regexp.new(pattern, options) === text)
end
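
The pattern preprocessing done for the `x` and `q` flags can be tried in isolation. The following is a sketch under stated assumptions: both helper names are hypothetical, the inputs are plain strings, and the `q` variant swaps in Ruby's built-in `Regexp.escape` rather than a hand-written list of metacharacters, so it also escapes characters such as `\` and `|`.

```ruby
# Hypothetical helpers (not part of the sparql gem).

# "x" flag: remove whitespace from the pattern, except inside a
# character class such as [ a-z].
def strip_free_spacing(pattern)
  in_charclass = false
  pattern.each_char.map do |c|
    case c
    when '['
      in_charclass = true
      c
    when ']'
      in_charclass = false
      c
    else
      c.match?(/\s/) && !in_charclass ? '' : c
    end
  end.join
end

# "q" flag: treat every character of the pattern literally.
# Assumption: Regexp.escape stands in for a manual escape list.
def quote_literal(pattern)
  Regexp.escape(pattern)
end

puts strip_free_spacing("a b [ c] d")                # ab[ c]d
puts Regexp.new(quote_literal("a.b")).match?("a.b")  # true
puts Regexp.new(quote_literal("a.b")).match?("axb")  # false
```

Like the state machine it mirrors, `strip_free_spacing` does not track escaped brackets (`\[`) or nested classes, so treat it as an approximation of the XPath rule.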

#to_sparql(**options) ⇒ String

Returns a partial SPARQL grammar for this operator.

Returns:

  • (String)

# File 'lib/sparql/algebra/operator/regex.rb', line 110

def to_sparql(**options)
  ops = operands.last.to_s.empty? ? operands[0..-2] : operands
  "regex(" + ops.to_sparql(delimiter: ', ', **options) + ")"
end
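
To see the effect of dropping an empty trailing operand, here is a minimal stand-in that uses plain strings in place of the real operands. The function name `regex_to_sparql` is hypothetical, and `Array#join` replaces the gem's `to_sparql(delimiter: ', ')` call, which is only available with the sparql gem loaded.

```ruby
# Hypothetical stand-in (not part of the sparql gem): operands are plain
# strings that are already serialized, except an empty flags value is ''.
def regex_to_sparql(operands)
  # Omit the last operand when it is empty, i.e. when no flags were given.
  ops = operands.last.to_s.empty? ? operands[0..-2] : operands
  "regex(" + ops.join(', ') + ")"
end

puts regex_to_sparql(['?val', '"GHI"', ''])    # regex(?val, "GHI")
puts regex_to_sparql(['?val', '"GHI"', '"i"']) # regex(?val, "GHI", "i")
```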