Class: RDF::NTriples::Reader
- Includes:
- Util::Logger
- Defined in:
- lib/rdf/ntriples/reader.rb
Overview
N-Triples parser.
RDF-star
Supports statements as resources using <<(s p o)>>.
Direct Known Subclasses
Constant Summary
- ESCAPE_CHARS =
["\b", "\f", "\t", "\n", "\r", "\"", "'", "\\"].freeze
- UCHAR4 =
/(?<!\\)\\(?!\\)u([0-9A-Fa-f]{4,4})/.freeze
- UCHAR8 =
/(?<!\\)\\(?!\\)U([0-9A-Fa-f]{8,8})/.freeze
- UCHAR =
Regexp.union(UCHAR4, UCHAR8).freeze
- U_CHARS1 =
Terminals from rdf-turtle.
Unicode regular expressions.
Regexp.compile(<<-EOS.gsub(/\s+/, ''))
  [\\u00C0-\\u00D6]|[\\u00D8-\\u00F6]|[\\u00F8-\\u02FF]|
  [\\u0370-\\u037D]|[\\u037F-\\u1FFF]|[\\u200C-\\u200D]|
  [\\u2070-\\u218F]|[\\u2C00-\\u2FEF]|[\\u3001-\\uD7FF]|
  [\\uF900-\\uFDCF]|[\\uFDF0-\\uFFFD]|[\\u{10000}-\\u{EFFFF}]
EOS
- U_CHARS2 =
Regexp.compile("\\u00B7|[\\u0300-\\u036F]|[\\u203F-\\u2040]").freeze
- IRI_RANGE =
Regexp.compile("[[^<>\"{}\|\^`\\\\]&&[^\\x00-\\x20]]").freeze
- PN_CHARS_BASE =
/[A-Z]|[a-z]|#{U_CHARS1}/.freeze
- PN_CHARS_U =
/_|#{PN_CHARS_BASE}/.freeze
- PN_CHARS =
/-|[0-9]|#{PN_CHARS_U}|#{U_CHARS2}/.freeze
- ECHAR =
/\\[tbnrf"'\\]/.freeze
- IRIREF =
/<((?:#{IRI_RANGE}|#{UCHAR})*)>/.freeze
- BLANK_NODE_LABEL =
/_:((?:[0-9]|#{PN_CHARS_U})(?:(?:#{PN_CHARS}|\.)*#{PN_CHARS})?)/.freeze
- LANG_DIR =
/@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*(?:--[a-zA-Z]+)?)/.freeze
- STRING_LITERAL_QUOTE =
/"((?:[^\"\\\n\r]|#{ECHAR}|#{UCHAR})*)"/.freeze
- TT_START =
/^<<\(/.freeze
- TT_END =
/^\s*\)>>/.freeze
- QT_START =
DEPRECATED
/^<</.freeze
- QT_END =
DEPRECATED
/^\s*>>/.freeze
- COMMENT =
/^#\s*(.*)$/.freeze
- NODEID =
/^#{BLANK_NODE_LABEL}/.freeze
- URIREF =
/^#{IRIREF}/.freeze
- LITERAL_PLAIN =
/^#{STRING_LITERAL_QUOTE}/.freeze
- LITERAL_WITH_LANGUAGE =
/^#{STRING_LITERAL_QUOTE}#{LANG_DIR}/.freeze
- LITERAL_WITH_DATATYPE =
/^#{STRING_LITERAL_QUOTE}\^\^#{IRIREF}/.freeze
- DATATYPE_URI =
/^\^\^#{IRIREF}/.freeze
- LITERAL =
Regexp.union(LITERAL_WITH_LANGUAGE, LITERAL_WITH_DATATYPE, LITERAL_PLAIN).freeze
- SUBJECT =
Regexp.union(URIREF, NODEID).freeze
- PREDICATE =
Regexp.union(URIREF).freeze
- OBJECT =
Regexp.union(URIREF, NODEID, LITERAL).freeze
- END_OF_STATEMENT =
/^\s*\.\s*(?:#.*)?$/.freeze
- LANGTAG =
Deprecated alias for LANG_DIR.
LANG_DIR
- RDF_VERSION =
/VERSION/.freeze
- ESCAPE_CHARS_ESCAPED =
Cached constants that let self.unescape replace escaped characters efficiently.
{ "\\b" => "\b", "\\f" => "\f", "\\t" => "\t", "\\n" => "\n", "\\r" => "\r", "\\\"" => "\"", "\\'" => "'", "\\\\" => "\\" }.freeze
- ESCAPE_CHARS_ESCAPED_REGEXP =
Regexp.union( ESCAPE_CHARS_ESCAPED.keys ).freeze
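The UCHAR patterns above use a negative lookbehind and a negative lookahead so that an escaped backslash followed by `u` is not mistaken for a Unicode escape. A minimal, self-contained sketch: the three patterns are copied from the constants above so it runs without the rdf gem, and `decode_uchars` is a hypothetical helper, not part of the reader's API.

```ruby
# Copies of the UCHAR constants listed above.
UCHAR4 = /(?<!\\)\\(?!\\)u([0-9A-Fa-f]{4,4})/.freeze
UCHAR8 = /(?<!\\)\\(?!\\)U([0-9A-Fa-f]{8,8})/.freeze
UCHAR  = Regexp.union(UCHAR4, UCHAR8).freeze

# Hypothetical helper: replace each \uXXXX / \UXXXXXXXX escape with its character.
# Regexp.union keeps both capture groups, so $1 holds 4-digit hex and $2 8-digit hex.
def decode_uchars(str)
  str.gsub(UCHAR) { [($1 || $2).hex].pack('U*') }
end

puts decode_uchars('caf\u00E9')    # prints café
puts decode_uchars('\U0001F600')   # prints 😀
# Left unchanged: the backslash in front of "u" is itself escaped.
puts decode_uchars('a\\\\u0041')
```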
Constants included from Util::Logger
Instance Attribute Summary
Attributes inherited from Reader
Class Method Summary
- .parse_literal(input, **options) ⇒ RDF::Term, RDF::Literal
Reconstructs an RDF value from its serialized N-Triples representation.
- .parse_node(input, **options) ⇒ RDF::Term, RDF::Node
Reconstructs an RDF value from its serialized N-Triples representation.
- .parse_object(input, **options) ⇒ RDF::Term
Reconstructs an RDF value from its serialized N-Triples representation.
- .parse_predicate(input, **options) ⇒ RDF::Term, RDF::URI
Reconstructs an RDF value from its serialized N-Triples representation.
- .parse_subject(input, **options) ⇒ RDF::Term, RDF::Resource
Reconstructs an RDF value from its serialized N-Triples representation.
- .parse_uri(input, intern: false, **options) ⇒ RDF::Term, RDF::URI
Reconstructs an RDF value from its serialized N-Triples representation.
- .unescape(string) ⇒ String
Replaces UCHAR escapes (\uXXXX, \UXXXXXXXX) and ECHAR escapes (such as \t, \n and \\) with the characters they denote.
- .unserialize(input, **options) ⇒ RDF::Term
Reconstructs an RDF value from its serialized N-Triples representation.
Instance Method Summary
- #read_comment ⇒ Boolean
- #read_eos ⇒ Boolean
- #read_literal ⇒ RDF::Literal
- #read_node ⇒ RDF::Node
- #read_triple ⇒ Array
- #read_tripleTerm ⇒ RDF::Statement
- #read_uriref(intern: false, **options) ⇒ RDF::URI
- #read_value ⇒ RDF::Term
- #read_version ⇒ String
Methods included from Util::Logger
#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger
Methods inherited from Reader
#base_uri, #canonicalize?, #close, each, #each_pg_statement, #each_statement, #each_triple, #encoding, #fail_object, #fail_predicate, #fail_subject, for, format, #initialize, #intern?, #lineno, open, options, #prefix, #prefixes, #prefixes=, #read_statement, #rewind, to_sym, #to_sym, #valid?, #validate?, #version
Methods included from Util::Aliasing::LateBound
Methods included from Enumerable
#canonicalize, #canonicalize!, #dump, #each_graph, #each_object, #each_predicate, #each_quad, #each_statement, #each_subject, #each_term, #each_triple, #enum_graph, #enum_object, #enum_predicate, #enum_quad, #enum_statement, #enum_subject, #enum_term, #enum_triple, #graph?, #graph_names, #invalid?, #method_missing, #object?, #objects, #predicate?, #predicates, #project_graph, #quad?, #quads, #respond_to_missing?, #statement?, #statements, #subject?, #subjects, #supports?, #term?, #terms, #to_a, #to_h, #to_set, #triple?, #triples, #valid?, #validate!
Methods included from Countable
Methods included from Readable
Constructor Details
This class inherits a constructor from RDF::Reader
Dynamic Method Handling
This class handles dynamic methods through the method_missing method in the class RDF::Enumerable.
Class Method Details
.parse_literal(input, **options) ⇒ RDF::Term, RDF::Literal
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 155
def self.parse_literal(input, **)
  case input
  when LITERAL_WITH_LANGUAGE
    language, direction = $4.split('--')
    RDF::Literal.new(unescape($1), language: language, direction: direction)
  when LITERAL_WITH_DATATYPE
    RDF::Literal.new(unescape($1), datatype: $4)
  when LITERAL_PLAIN
    RDF::Literal.new(unescape($1))
  end
end
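parse_literal reads the language tag from `$4` rather than `$2` because STRING_LITERAL_QUOTE contributes three capture groups: the string body plus the two hex-digit groups nested inside UCHAR. The self-contained check below copies the relevant constants from the Constant Summary (assumed identical to the gem's) and shows the numbering and the `--` split into language and base direction.

```ruby
# Copies of the constants from the Constant Summary.
UCHAR4 = /(?<!\\)\\(?!\\)u([0-9A-Fa-f]{4,4})/
UCHAR8 = /(?<!\\)\\(?!\\)U([0-9A-Fa-f]{8,8})/
UCHAR  = Regexp.union(UCHAR4, UCHAR8)
ECHAR  = /\\[tbnrf"'\\]/
STRING_LITERAL_QUOTE  = /"((?:[^\"\\\n\r]|#{ECHAR}|#{UCHAR})*)"/
LANG_DIR              = /@([a-zA-Z]+(?:-[a-zA-Z0-9]+)*(?:--[a-zA-Z]+)?)/
LITERAL_WITH_LANGUAGE = /^#{STRING_LITERAL_QUOTE}#{LANG_DIR}/

m = LITERAL_WITH_LANGUAGE.match('"hello"@en-US--ltr')
puts m.captures.length   # 4: body, UCHAR4 hex, UCHAR8 hex, LANG_DIR
puts m[1]                # hello
language, direction = m[4].split('--')
puts language            # en-US
puts direction           # ltr
```

Interpolating a Regexp with `#{}` keeps its groups, so any change to UCHAR's group count would shift the `$4` reference.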
.parse_node(input, **options) ⇒ RDF::Term, RDF::Node
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 136
def self.parse_node(input, **)
  if input =~ NODEID
    RDF::Node.new($1)
  end
end
.parse_object(input, **options) ⇒ RDF::Term
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 129
def self.parse_object(input, **)
  parse_uri(input, **) || parse_node(input, **) || parse_literal(input, **)
end
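parse_subject and parse_object work by fall-through: each parse_* method returns nil when its pattern does not match, so `||` tries the next alternative. The sketch below mirrors that structure with simplified, ASCII-only stand-ins for URIREF, NODEID and LITERAL_PLAIN (an assumption: the real constants also accept non-ASCII characters and escapes), and returns hypothetical tagged pairs instead of RDF terms.

```ruby
# Simplified stand-ins for URIREF, NODEID and LITERAL_PLAIN (ASCII only).
URI_PAT  = /^<([^<>"{}|^`\\\x00-\x20]*)>/
NODE_PAT = /^_:([A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)/
LIT_PAT  = /^"((?:[^"\\\n\r]|\\.)*)"/

# Each parser returns a tagged pair on a match and nil otherwise.
def parse_uri(input)
  (m = URI_PAT.match(input)) && [:uri, m[1]]
end

def parse_node(input)
  (m = NODE_PAT.match(input)) && [:node, m[1]]
end

def parse_literal(input)
  (m = LIT_PAT.match(input)) && [:literal, m[1]]
end

# Subjects may not be literals; objects may be any of the three.
def parse_subject(input)
  parse_uri(input) || parse_node(input)
end

def parse_object(input)
  parse_uri(input) || parse_node(input) || parse_literal(input)
end

p parse_object('<http://example.org/a>')  # => [:uri, "http://example.org/a"]
p parse_subject('"hi"')                   # => nil
```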
.parse_predicate(input, **options) ⇒ RDF::Term, RDF::URI
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 123
def self.parse_predicate(input, **)
  parse_uri(input, intern: true)
end
.parse_subject(input, **options) ⇒ RDF::Term, RDF::Resource
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 116
def self.parse_subject(input, **)
  parse_uri(input, **) || parse_node(input, **)
end
.parse_uri(input, intern: false, **options) ⇒ RDF::Term, RDF::URI
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 146
def self.parse_uri(input, intern: false, **)
  if input =~ URIREF
    RDF::URI.send(intern ? :intern : :new, unescape($1))
  end
end
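parse_predicate passes `intern: true`, so parse_uri calls RDF::URI.intern instead of RDF::URI.new. The idea, roughly, is that predicates repeat heavily in N-Triples data, so one cached object per IRI string saves allocations. The class below is a hypothetical stand-in that illustrates only that caching idea; it does not reproduce RDF::URI or its cache.

```ruby
# Hypothetical interned-IRI type (not RDF::URI).
class InternedIRI
  attr_reader :value

  @cache = {} # class-level table: one shared instance per distinct IRI string

  def self.intern(str)
    @cache[str] ||= new(str).freeze
  end

  def initialize(value)
    @value = value.dup.freeze
  end
end

a = InternedIRI.intern("http://example.org/p")
b = InternedIRI.intern("http://example.org/p")
c = InternedIRI.new("http://example.org/p")

puts a.equal?(b) # true: the same cached object
puts a.equal?(c) # false: .new always allocates
```

Interned objects are shared, which is why this sketch freezes them; mutating one would affect every caller.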
.unescape(string) ⇒ String
Replaces UCHAR escapes (\uXXXX, \UXXXXXXXX) and ECHAR escapes (such as \t, \n and \\) with the characters they denote.
# File 'lib/rdf/ntriples/reader.rb', line 188
def self.unescape(string)
  # Note: avoiding copying the input string when no escaping is needed
  # greatly reduces the number of allocations and the processing time.
  string = string.dup.force_encoding(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8

  string
    .gsub(UCHAR) do
      [($1 || $2).hex].pack('U*')
    end
    .gsub(ESCAPE_CHARS_ESCAPED_REGEXP, ESCAPE_CHARS_ESCAPED)
end
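The second pass of unescape relies on String#gsub accepting a Hash: each match is replaced by the hash value for the matched text. Because gsub scans left to right and consumes each match, an escaped backslash (two backslash characters) is consumed before the next character can be read as part of another escape. A standalone sketch using copies of ESCAPE_CHARS_ESCAPED and ESCAPE_CHARS_ESCAPED_REGEXP; `unescape_echars` is a hypothetical helper name.

```ruby
# Copies of the two constants from the Constant Summary.
ESCAPE_CHARS_ESCAPED = {
  "\\b" => "\b", "\\f" => "\f", "\\t" => "\t", "\\n" => "\n",
  "\\r" => "\r", "\\\"" => "\"", "\\'" => "'", "\\\\" => "\\"
}.freeze
ESCAPE_CHARS_ESCAPED_REGEXP = Regexp.union(ESCAPE_CHARS_ESCAPED.keys).freeze

def unescape_echars(str)
  # gsub(pattern, hash) substitutes hash[matched_text] for every match.
  str.gsub(ESCAPE_CHARS_ESCAPED_REGEXP, ESCAPE_CHARS_ESCAPED)
end

p unescape_echars('a\tb')     # => "a\tb" (a real tab)
p unescape_echars('a\\\\nb')  # => "a\\nb" (a backslash followed by n, not a newline)
```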
.unserialize(input, **options) ⇒ RDF::Term
Reconstructs an RDF value from its serialized N-Triples representation.
# File 'lib/rdf/ntriples/reader.rb', line 106
def self.unserialize(input, **)
  case input
  when nil then nil
  else self.new(input, logger: [], **).read_value
  end
end
Instance Method Details
#read_comment ⇒ Boolean
# File 'lib/rdf/ntriples/reader.rb', line 264
def read_comment
  match(COMMENT)
end
#read_eos ⇒ Boolean
# File 'lib/rdf/ntriples/reader.rb', line 335
def read_eos
  match(END_OF_STATEMENT)
end
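read_comment and read_eos both delegate to a private `match` helper that is not shown on this page. Judging from how callers use its return value (a captured string for NODEID and LANG_DIR, a truthy value for COMMENT and END_OF_STATEMENT), a plausible contract is sketched below: on success the matched prefix is consumed and capture 1 (or true) is returned; on failure nothing is consumed. LineCursor and its #match are hypothetical names, and this is an inference, not the gem's code.

```ruby
# Hypothetical line cursor illustrating the assumed contract of the reader's match helper.
class LineCursor
  COMMENT          = /^#\s*(.*)$/.freeze
  END_OF_STATEMENT = /^\s*\.\s*(?:#.*)?$/.freeze

  attr_reader :rest

  def initialize(line)
    @rest = line
  end

  # Consume the matched prefix and return capture 1 (or true); nil on no match.
  def match(pattern)
    m = pattern.match(@rest)
    return nil unless m
    @rest = m.post_match.lstrip
    m[1] || true
  end
end

c = LineCursor.new("# generated file")
p c.match(LineCursor::COMMENT)            # => "generated file"

eos = LineCursor.new(" . # trailing comment")
p eos.match(LineCursor::END_OF_STATEMENT) # => true
```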
#read_literal ⇒ RDF::Literal
# File 'lib/rdf/ntriples/reader.rb', line 296
def read_literal
  if literal_str = match(LITERAL_PLAIN)
    literal_str = self.class.unescape(literal_str)
    literal = case
    when lang_dir = match(LANG_DIR)
      language, direction = lang_dir.split('--')
      raise ArgumentError if direction && !@options[:rdfstar]
      log_warn("Literal base direction used with version #{version}") if version && version == "1.1"
      RDF::Literal.new(literal_str, language: language, direction: direction)
    when datatype = match(/^(\^\^)/) # FIXME
      RDF::Literal.new(literal_str, datatype: read_uriref || fail_object)
    else
      RDF::Literal.new(literal_str) # plain string literal
    end
    literal.validate! if validate?
    literal.canonicalize! if canonicalize?
    literal
  end
rescue ArgumentError
  v = literal_str
  v += "@#{lang_dir}" if lang_dir
  log_error("Invalid Literal (found: \"#{v}\")", lineno: lineno, token: v, exception: RDF::ReaderError)
end
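In read_literal, a language tag of the form `lang--dir` is split into language and base direction, and a direction is rejected with ArgumentError unless the reader was created with the `rdfstar` option; the method-level rescue then reports it through log_error. The hypothetical helper below mirrors only that split-and-gate step, with the option passed as a keyword argument.

```ruby
# Hypothetical helper mirroring the language/direction branch of #read_literal.
def split_lang_dir(lang_dir, rdfstar: false)
  language, direction = lang_dir.split('--')
  # A base direction is only accepted when rdfstar is enabled.
  raise ArgumentError, "base direction requires rdfstar" if direction && !rdfstar
  [language, direction]
end

p split_lang_dir("en")                      # => ["en", nil]
p split_lang_dir("ar--rtl", rdfstar: true)  # => ["ar", "rtl"]

begin
  split_lang_dir("ar--rtl")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```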
#read_node ⇒ RDF::Node
# File 'lib/rdf/ntriples/reader.rb', line 286
def read_node
  if node_id = match(NODEID)
    @nodes ||= {}
    @nodes[node_id] ||= RDF::Node.new(node_id)
  end
end
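read_node memoizes nodes in `@nodes`, so every occurrence of a label such as `_:b0` within one reader yields the same node object, while a fresh reader starts with an empty table. A minimal sketch of that per-reader table, with BNode and NodeTable as hypothetical stand-ins for RDF::Node and the reader:

```ruby
# Hypothetical stand-in for RDF::Node.
BNode = Struct.new(:id)

# Hypothetical stand-in for the reader's @nodes table.
class NodeTable
  def initialize
    @nodes = {}
  end

  # Same label, same object, for the lifetime of this table.
  def fetch(label)
    @nodes[label] ||= BNode.new(label)
  end
end

t = NodeTable.new
puts t.fetch("b0").equal?(t.fetch("b0"))              # true
puts NodeTable.new.fetch("b0").equal?(t.fetch("b0"))  # false: separate table
```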
#read_triple ⇒ Array
# File 'lib/rdf/ntriples/reader.rb', line 215
def read_triple
  loop do
    readline.strip! # EOFError thrown on end of input
    line = @line # for backtracking input in case of parse error

    begin
      if blank? || read_comment
        # No-op
      elsif version = read_version
        @options[:version] = version
      else
        subject = read_uriref || read_node || fail_subject
        predicate = read_uriref(intern: true) || fail_predicate
        object = read_uriref || read_node || read_literal || read_tripleTerm || fail_object

        if validate? && !read_eos
          log_error("Expected end of statement (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
        end
        spo = [subject, predicate, object]
        # Only return valid triples if validating
        return spo if !validate? || spo.all?(&:valid?)
      end
    rescue RDF::ReaderError => e
      @line = line # this allows #read_value to work
      raise e
    end
  end
end
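The control flow of read_triple can be sketched over an array of lines: blank and comment lines are no-ops, a VERSION directive is remembered, and any other line must yield three terms followed by an end-of-statement dot. This is a simplified sketch, not the reader: it assumes terms are separated by whitespace and contain none themselves, and `read_triples` is a hypothetical name.

```ruby
# Simplified sketch of the #read_triple loop; returns [version, triples].
def read_triples(lines)
  version = nil
  triples = lines.each_with_object([]) do |raw, out|
    line = raw.strip
    next if line.empty? || line.start_with?('#')   # blank or comment: no-op
    if (m = line.match(/\AVERSION\s+"([^"]*)"/))
      version = m[1]                               # remember the directive
      next
    end
    s, pr, o, eos = line.split(/\s+/, 4)
    raise ArgumentError, "expected end of statement: #{raw.inspect}" unless eos&.start_with?('.')
    out << [s, pr, o]
  end
  [version, triples]
end

v, t = read_triples(['VERSION "1.2"', '# comment', '', '<a> <b> <c> .'])
p v  # => "1.2"
p t  # => [["<a>", "<b>", "<c>"]]
```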
#read_tripleTerm ⇒ RDF::Statement
# File 'lib/rdf/ntriples/reader.rb', line 246
def read_tripleTerm
  if @options[:rdfstar] && match(TT_START)
    if version && version != "1.2"
      log_warn("Triple term used with version #{version}")
    end
    subject = read_uriref || read_node || fail_subject
    predicate = read_uriref(intern: true) || fail_predicate
    object = read_uriref || read_node || read_literal || read_tripleTerm || fail_object
    if !match(TT_END)
      log_error("Expected end of statement (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
    end
    RDF::Statement.new(subject, predicate, object, tripleTerm: true)
  end
end
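read_tripleTerm is recursive: after TT_START (`<<(`) it reads a subject, predicate and object, and the object may itself be another triple term, until TT_END (`)>>`). The sketch below shows only that recursion using Ruby's StringScanner. Assumptions: every term is an IRI written as `<...>` without spaces, the rdfstar and version checks are omitted, and nested arrays stand in for RDF::Statement. All method names are hypothetical.

```ruby
require 'strscan'

# Read an IRI such as <s>; nil (and no input consumed) if none is present.
def read_iri(ss)
  ss.scan(/<([^<>\s]*)>\s*/) && ss[1]
end

# The object position may hold another triple term, so this recurses.
def read_object(ss)
  read_iri(ss) || read_triple_term(ss)
end

def read_triple_term(ss)
  return nil unless ss.scan(/<<\(\s*/)                          # TT_START
  subj = read_iri(ss) or raise ArgumentError, "expected subject"
  pred = read_iri(ss) or raise ArgumentError, "expected predicate"
  obj = read_object(ss) or raise ArgumentError, "expected object"
  ss.scan(/\s*\)>>\s*/) or raise ArgumentError, "expected )>>"  # TT_END
  [subj, pred, obj]
end

ss = StringScanner.new('<<( <s> <p> <<( <a> <b> <c> )>> )>>')
p read_triple_term(ss)  # => ["s", "p", ["a", "b", "c"]]
```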
#read_uriref(intern: false, **options) ⇒ RDF::URI
# File 'lib/rdf/ntriples/reader.rb', line 272
def read_uriref(intern: false, **)
  if uri_str = match(URIREF)
    uri_str = self.class.unescape(uri_str)
    uri = RDF::URI.send(intern? && intern ? :intern : :new, uri_str, canonicalize: canonicalize?)
    uri.validate! if validate?
    uri
  end
rescue ArgumentError
  log_error("Invalid URI (found: \"<#{uri_str}>\")", lineno: lineno, token: "<#{uri_str}>", exception: RDF::ReaderError)
end
#read_value ⇒ RDF::Term
# File 'lib/rdf/ntriples/reader.rb', line 202
def read_value
  begin
    read_statement
  rescue RDF::ReaderError
    value = read_uriref || read_node || read_literal || read_tripleTerm
    log_recover
    value
  end
end
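read_value first tries to read a whole statement and, if that raises RDF::ReaderError, retries on the same input as a single term; read_triple restores `@line` before re-raising so the retry sees the original text. This is what lets unserialize accept a lone term. The sketch below mirrors only the try-then-fall-back shape; ParseError, parse_statement and parse_single_term are hypothetical, and because it works on an immutable string no explicit backtracking is needed.

```ruby
ParseError = Class.new(StandardError)

# Stand-in for read_statement: accepts exactly "term term term ." (simplified).
def parse_statement(src)
  parts = src.strip.split(/\s+/)
  raise ParseError, "not a statement" unless parts.size == 4 && parts.last == '.'
  parts.first(3)
end

# Stand-in for reading one IRI or blank-node term.
def parse_single_term(src)
  src.strip[/\A(?:<[^<>\s]*>|_:\S+)\z/]
end

def read_value(src)
  parse_statement(src)
rescue ParseError
  parse_single_term(src)
end

p read_value("<a> <b> <c> .")           # => ["<a>", "<b>", "<c>"]
p read_value("<http://example.org/x>")  # => "<http://example.org/x>"
```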
#read_version ⇒ String
# File 'lib/rdf/ntriples/reader.rb', line 322
def read_version
  if match(RDF_VERSION)
    ver_tok = match(LITERAL_PLAIN)
    unless RDF::Format::VERSIONS.include?(ver_tok)
      log_warn("Expected version to be one of #{RDF::Format::VERSIONS.join(', ')}, was #{ver_tok}")
    end
    ver_tok
  end
end
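read_version recognizes a `VERSION "..."` directive, returns the quoted version string, and only warns (rather than failing) when the version is not a known one. A hypothetical sketch of that behavior: KNOWN_VERSIONS is an assumed list that may differ from RDF::Format::VERSIONS, warnings are collected in an array instead of being logged, and the keyword is anchored at the start of the line in this sketch.

```ruby
# Assumed list of known versions; may differ from RDF::Format::VERSIONS.
KNOWN_VERSIONS = %w[1.1 1.2 1.2-basic].freeze

# Return the version string from a VERSION directive, or nil if the line is not one.
def read_version(line, warnings)
  m = line.match(/\AVERSION\s+"([^"\\\n\r]*)"/)
  return nil unless m
  ver = m[1]
  unless KNOWN_VERSIONS.include?(ver)
    warnings << "Expected version to be one of #{KNOWN_VERSIONS.join(', ')}, was #{ver}"
  end
  ver
end

warnings = []
p read_version('VERSION "1.2"', warnings)  # => "1.2"
p read_version('VERSION "9.9"', warnings)  # => "9.9" (plus one warning)
p warnings.size                            # => 1
```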