Class: RDF::Reader Abstract

Inherits:
Object
Extended by:
Enumerable, Util::Aliasing::LateBound
Includes:
Enumerable, Readable, Util::Logger
Defined in:
lib/rdf/reader.rb

Overview

This class is abstract.

The base class for RDF parsers.

Examples:

Loading an RDF reader implementation

require 'rdf/ntriples'

Iterating over known RDF reader classes

RDF::Reader.each { |klass| puts klass.name }

Obtaining an RDF reader class

RDF::Reader.for(:ntriples)     #=> RDF::NTriples::Reader
RDF::Reader.for("etc/doap.nt")
RDF::Reader.for(file_name:      "etc/doap.nt")
RDF::Reader.for(file_extension: "nt")
RDF::Reader.for(content_type:   "application/n-triples")

Instantiating an RDF reader class

RDF::Reader.for(:ntriples).new($stdin) { |reader| ... }

Parsing RDF statements from a file

RDF::Reader.open("etc/doap.nt") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Parsing RDF statements from a string

data = StringIO.new(File.read("etc/doap.nt"))
RDF::Reader.for(:ntriples).new(data) do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Direct Known Subclasses

NTriples::Reader

Constant Summary

Constants included from Util::Logger

Util::Logger::IOWrapper

Methods included from Util::Aliasing::LateBound

alias_method

Methods included from Enumerable

#canonicalize, #canonicalize!, #dump, #each_graph, #each_object, #each_predicate, #each_quad, #each_subject, #each_term, #enum_graph, #enum_object, #enum_predicate, #enum_quad, #enum_statement, #enum_subject, #enum_term, #enum_triple, #graph?, #graph_names, #invalid?, #method_missing, #object?, #objects, #predicate?, #predicates, #project_graph, #quad?, #quads, #respond_to_missing?, #statement?, #statements, #subject?, #subjects, #supports?, #term?, #terms, #to_a, #to_h, #to_set, #triple?, #triples, #validate!

Methods included from Countable

#count, #empty?

Methods included from Readable

#readable?

Methods included from Util::Logger

#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger

Constructor Details

#initialize(input = $stdin, base_uri: nil, canonicalize: false, encoding: Encoding::UTF_8, intern: true, prefixes: Hash.new, rdfstar: false, validate: false, **options) {|reader| ... } ⇒ Reader

Initializes the reader.

Parameters:

  • input (IO, File, String) (defaults to: $stdin)

    the input stream to read

  • base_uri (#to_s) (defaults to: nil)

    (nil) the base URI to use when resolving relative URIs (not supported by all readers)

  • canonicalize (Boolean) (defaults to: false)

    (false) whether to canonicalize parsed URIs and Literals.

  • encoding (Encoding) (defaults to: Encoding::UTF_8)

    (Encoding::UTF_8) the encoding of the input stream

  • intern (Boolean) (defaults to: true)

    (true) whether to intern all parsed URIs

  • rdfstar (Boolean) (defaults to: false)

    (false) Preliminary support for RDF 1.2.

  • prefixes (Hash) (defaults to: Hash.new)

    (Hash.new) the prefix mappings to use (not supported by all readers)

  • options (Hash{Symbol => Object})

    any additional options

  • validate (Boolean) (defaults to: false)

    (false) whether to validate the parsed statements and values

Options Hash (**options):

  • :version (String)

    Parse a specific version of RDF ("1.1", "1.2", or "1.2-basic").

Yields:

  • (reader)

    self

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored



# File 'lib/rdf/reader.rb', line 298

def initialize(input = $stdin,
               base_uri:      nil,
               canonicalize:  false,
               encoding:      Encoding::UTF_8,
               intern:        true,
               prefixes:      Hash.new,
               rdfstar:       false,
               validate:      false,
               **options,
               &block)

  base_uri     ||= input.base_uri if input.respond_to?(:base_uri)
  @options = options.merge({
    base_uri:       base_uri,
    canonicalize:   canonicalize,
    encoding:       encoding,
    intern:         intern,
    prefixes:       prefixes,
    rdfstar:        rdfstar,
    validate:       validate
  })

  # The rdfstar option implies version 1.2, but can be overridden
  @options[:version] ||= "1.2" if @options[:rdfstar]

  unless self.version.nil? || RDF::Format::VERSIONS.include?(self.version)
    log_error("Expected version to be one of #{RDF::Format::VERSIONS.join(', ')}, was #{self.version}")
  end

  @input = case input
    when String then StringIO.new(input)
    else input
  end

  if block_given?
    case block.arity
      when 0 then instance_eval(&block)
      else block.call(self)
    end
  end
end
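The block handling at the end of initialize accepts two block styles. The plain-Ruby sketch below reproduces only that arity dispatch so it runs without RDF.rb; MiniReader and its greeting method are hypothetical stand-ins, not part of the library.

```ruby
# Hypothetical stand-in that mimics only the block handling at the end of
# RDF::Reader#initialize; it does not parse anything.
class MiniReader
  def initialize(input = "", &block)
    @input = input
    return unless block_given?

    case block.arity
    when 0 then instance_eval(&block) # self inside the block is the reader
    else block.call(self)             # the reader is passed as an argument
    end
  end

  def greeting
    "reading #{@input}"
  end
end

# Zero-arity block: bare method calls are sent to the reader.
implicit = nil
MiniReader.new("a.nt") { implicit = greeting }

# One-parameter block: the reader is yielded explicitly.
explicit = nil
MiniReader.new("b.nt") { |r| explicit = r.greeting }
```

A zero-arity block is evaluated with instance_eval, so bare method calls resolve against the reader; a block that declares a parameter receives the reader explicitly and keeps the caller's self intact.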

Dynamic Method Handling

This class handles dynamic methods through the method_missing method in the class RDF::Enumerable.

Instance Attribute Details

#options ⇒ Hash (readonly)

Any additional options for this reader.

Returns:

  • (Hash)

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 345

def options
  @options
end

Class Method Details

.each {|klass| ... } ⇒ Enumerator

Enumerates known RDF reader classes.

Yields:

  • (klass)

Yield Parameters:

  • klass (Class)

Returns:

  • (Enumerator)



# File 'lib/rdf/reader.rb', line 53

def self.each(&block)
  RDF::Format.map(&:reader).reject(&:nil?).each(&block)
end

.for(format) ⇒ Class .for(filename) ⇒ Class .for(options = {}) ⇒ Class

Finds an RDF reader class based on the given criteria.

If the receiver is a concrete reader class with a defined format, that format's reader is returned and the arguments are not consulted.

Overloads:

  • .for(format) ⇒ Class

    Finds an RDF reader class based on a symbolic name.

    Parameters:

    • format (Symbol)

    Returns:

    • (Class)
  • .for(filename) ⇒ Class

    Finds an RDF reader class based on a file name.

    Parameters:

    • filename (String)

    Returns:

    • (Class)
  • .for(options = {}) ⇒ Class

    Finds an RDF reader class based on various options.

    Parameters:

    • options (Hash{Symbol => Object}) (defaults to: {})

    Options Hash (options):

    • :file_name (String, #to_s) — default: nil
    • :file_extension (Symbol, #to_sym) — default: nil
    • :content_type (String, #to_s) — default: nil
    • :sample (String) — default: nil

      A sample of input used for format detection. If the other criteria match no formats, or more than one, and a sample is available, format detection is run on the sample and the first matching format is used.

    Yield Returns:

    • (String)

      another way to provide a sample; the block is called only when a sample is needed, so the sample can be retrieved lazily.

    Returns:

    • (Class)

Returns:

  • (Class)


# File 'lib/rdf/reader.rb', line 91

def self.for(*arg, &block)
  case arg.length
  when 0 then arg = nil
  when 1 then arg = arg.first
  else
    raise ArgumentError, "Format.for accepts zero or one argument, got #{arg.length}."
  end
  arg = arg.merge(has_reader: true) if arg.is_a?(Hash)
  if format = self.format || Format.for(arg, &block)
    format.reader
  end
end
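RDF::Reader.for delegates matching to RDF::Format.for after merging has_reader: true into option hashes. The sketch below is a simplified, self-contained model of the three overloads and the sample fallback, assuming a hypothetical two-entry REGISTRY whose readers are plain strings; the real registry and detection logic in RDF::Format are more involved.

```ruby
# Hypothetical, simplified format table; the real RDF::Format registry and
# RDF::Format.for handle more cases than this sketch.
FormatEntry = Struct.new(:name, :extensions, :content_types, :reader, :detector)

REGISTRY = [
  FormatEntry.new(:ntriples, %w[nt], %w[application/n-triples], "NTriplesReader",
                  ->(sample) { sample.match?(/\A\s*<[^>]*>\s+<[^>]*>/) }),
  FormatEntry.new(:turtle, %w[ttl], %w[text/turtle], "TurtleReader",
                  ->(sample) { sample.include?("@prefix") })
].freeze

def ext_of(name)
  File.extname(name.to_s).delete(".")
end

# Returns the reader (a string here) for a format symbol, a file name, or an
# options hash; falls back to sniffing a sample when the match is ambiguous.
def reader_for(arg = nil, &sample_block)
  candidates =
    case arg
    when Symbol then REGISTRY.select { |f| f.name == arg }
    when String then REGISTRY.select { |f| f.extensions.include?(ext_of(arg)) }
    when Hash
      REGISTRY.select do |f|
        (arg[:file_name].nil? || f.extensions.include?(ext_of(arg[:file_name]))) &&
          (arg[:file_extension].nil? || f.extensions.include?(arg[:file_extension].to_s)) &&
          (arg[:content_type].nil? || f.content_types.include?(arg[:content_type].to_s))
      end
    else REGISTRY
    end

  if candidates.length != 1
    # The block is only called here, so the sample is produced lazily.
    sample = (arg[:sample] if arg.is_a?(Hash)) || sample_block&.call
    if sample
      # Sketch's own choice: with no candidates, sniff against every entry.
      pool = candidates.empty? ? REGISTRY : candidates
      candidates = pool.select { |f| f.detector.call(sample) }
    end
  end
  candidates.first&.reader
end
```

Passing the sample as a block rather than a value lets callers such as .open avoid reading the input at all when the file name or content type already identifies a single format.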

.format(klass = nil) ⇒ Class Also known as: format_class

Retrieves the RDF serialization format class for this reader class.

Returns:

  • (Class)


# File 'lib/rdf/reader.rb', line 108

def self.format(klass = nil)
  if klass.nil?
    Format.each do |format|
      if format.reader == self
        return format
      end
    end
    nil # not found
  end
end

.open(filename, format: nil, **options) {|reader| ... } ⇒ Object

Note:

A reader returned by this method may not be readable, depending on the processing model of the specific reader, because the file is open only within the scope of open. The reader is intended to be accessed through a block.

Parses input from the given file name or URL.

Examples:

Parsing RDF statements from a file

RDF::Reader.open("etc/doap.nt") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

Parameters:

  • filename (String, #to_s)
  • format (Symbol) (defaults to: nil)
  • options (Hash{Symbol => Object})

Yields:

  • (reader)

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored

Raises:

  • (RDF::FormatError)

    if no reader can be found for the given file name or format



# File 'lib/rdf/reader.rb', line 215

def self.open(filename, format: nil, **options, &block)
  # If we're the abstract reader, and we can figure out a concrete reader from format, use that.
  if self == RDF::Reader && format && reader = self.for(format)
    return reader.open(filename, format: format, **options, &block)
  end

  # If we are a concrete reader class or format is not nil, set accept header from our content_types.
  unless self == RDF::Reader
    headers = (options[:headers] ||= {})
    headers['Accept'] ||= (self.format.accept_type + %w(*/*;q=0.1)).join(", ")
  end

  Util::File.open_file(filename, **options) do |file|
    format_options = options.dup
    format_options[:content_type] ||= file.content_type if
      file.respond_to?(:content_type) &&
      !file.content_type.to_s.include?('text/plain')
    format_options[:file_name] ||= filename
    reader = if self == RDF::Reader
      # We are the abstract reader class, find an appropriate reader
      self.for(format || format_options) do
        # Return a sample from the input file
        sample = file.read(1000)
        file.rewind
        sample
      end
    else
      # We are a concrete reader class
      self
    end

    options[:encoding] ||= file.encoding if file.respond_to?(:encoding)
    options[:filename] ||= filename

    if reader
      reader.new(file, **options, &block)
    else
      raise FormatError, "unknown RDF format: #{format_options.inspect}#{"\nThis may be resolved with a require of the 'linkeddata' gem." unless Object.const_defined?(:LinkedData)}"
    end
  end
end
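.open does two small things before handing off to a reader: for a concrete reader class it builds an Accept header from the format's accept types plus a low-priority */* wildcard, and for RDF::Reader itself it supplies a sample block that reads the first 1000 bytes and rewinds. The standalone sketch below isolates both steps; accept_header and peek_sample are hypothetical helper names, and the accept-type list is illustrative.

```ruby
require "stringio"

# Builds an Accept header the way RDF::Reader.open does for a concrete
# reader class: the format's accept types followed by a low-priority
# wildcard. The type list passed in below is illustrative only.
def accept_header(accept_types)
  (accept_types + %w(*/*;q=0.1)).join(", ")
end

# Reads up to `size` bytes for format sniffing, then rewinds so the reader
# that is eventually chosen still sees the whole input.
def peek_sample(io, size = 1000)
  sample = io.read(size)
  io.rewind
  sample
end

header = accept_header(%w[application/n-triples])
io = StringIO.new("<http://example.org/s> <http://example.org/p> \"o\" .\n")
sample = peek_sample(io, 10)
```

The rewind is what makes sniffing safe: without it the chosen reader would start parsing 1000 bytes into the file.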

.options ⇒ Array<RDF::CLI::Option>

Options suitable for automatic Reader provisioning.

Returns:

  • (Array<RDF::CLI::Option>)



# File 'lib/rdf/reader.rb', line 122

def self.options
  [
    RDF::CLI::Option.new(
      symbol: :base_uri,
      control: :url,
      datatype: RDF::URI,
      on: ["--uri URI"],
      description: "Base URI of input file, defaults to the filename.") {|arg| RDF::URI(arg)},
    RDF::CLI::Option.new(
      symbol: :canonicalize,
      datatype: TrueClass,
      on: ["--canonicalize"],
      control: :checkbox,
      default: false,
      description: "Canonicalize URI/literal forms") {true},
    RDF::CLI::Option.new(
      symbol: :encoding,
      datatype: Encoding,
      control: :text,
      on: ["--encoding ENCODING"],
      description: "The encoding of the input stream.") {|arg| Encoding.find arg},
    RDF::CLI::Option.new(
      symbol: :intern,
      datatype: TrueClass,
      control: :none,
      on: ["--intern"],
      description: "Intern all parsed URIs."),
    RDF::CLI::Option.new(
      symbol: :prefixes,
      datatype: Hash,
      control: :none,
      multiple: true,
      on: ["--prefixes PREFIX:URI,PREFIX:URI"],
      description: "A comma-separated list of prefix:uri pairs.") do |arg|
        arg.split(',').inject({}) do |memo, pfxuri|
          pfx,uri = pfxuri.split(':', 2)
          memo.merge(pfx.to_sym => RDF::URI(uri))
        end
    end,
    RDF::CLI::Option.new(
      symbol: :rdfstar,
      datatype: TrueClass,
      control: :checkbox,
      on: ["--rdfstar"],
      description: "Parse RDF-star for preliminary RDF 1.2 support."),
    RDF::CLI::Option.new(
      symbol: :validate,
      datatype: TrueClass,
      control: :checkbox,
      on: ["--[no-]validate"],
      description: "Validate on input and output."),
    RDF::CLI::Option.new(
      symbol: :verifySSL,
      datatype: TrueClass,
      default: true,
      control: :checkbox,
      on: ["--[no-]verifySSL"],
      description: "Verify SSL results on HTTP GET"),
    RDF::CLI::Option.new(
      symbol: :version,
      control: :select,
      datatype: RDF::Format::VERSIONS, # 1.1, 1.2, or 1.2-basic
      on: ["--version VERSION"],
      description: "RDF Version."),
  ]
end
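The --prefixes option converts a comma-separated PREFIX:URI list into a hash. A standalone version of that block, with plain strings standing in for RDF::URI, shows why the split uses a limit of 2: only the first colon separates the prefix, so the colon in http: stays part of the URI. The method name parse_prefixes is hypothetical.

```ruby
# Standalone sketch of the --prefixes argument parsing in RDF::Reader.options;
# plain strings stand in for RDF::URI values.
def parse_prefixes(arg)
  arg.split(",").inject({}) do |memo, pfxuri|
    # Split on the first colon only, so the colon in "http:" survives.
    pfx, uri = pfxuri.split(":", 2)
    memo.merge(pfx.to_sym => uri)
  end
end

prefixes = parse_prefixes("dc:http://purl.org/dc/terms/,foaf:http://xmlns.com/foaf/0.1/")
```

One consequence of splitting on commas first is that a URI containing a comma cannot be passed through this option.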

.to_sym ⇒ Symbol

Returns a symbol appropriate to use with RDF::Reader.for().

Returns:

  • (Symbol)


# File 'lib/rdf/reader.rb', line 260

def self.to_sym
  self.format.to_sym
end

Instance Method Details

#base_uri ⇒ RDF::URI

Returns the base URI determined by this reader.

Examples:

reader.base_uri  #=> RDF::URI('http://example.com/')

Returns:

  • (RDF::URI)

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 355

def base_uri
  RDF::URI(@options[:base_uri]) if @options[:base_uri]
end

#canonicalize? ⇒ Boolean

Note:

This is for term canonicalization; for graph/dataset canonicalization, use RDF::Normalize.

Returns true if parsed values should be in canonical form.

Returns:

  • (Boolean)

    true or false

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 644

def canonicalize?
  @options[:canonicalize]
end

#close Also known as: close!

This method returns an undefined value.

Closes the input stream, after which an IOError will be raised for further read attempts.

If the input stream is already closed, does nothing.



# File 'lib/rdf/reader.rb', line 513

def close
  @input.close unless @input.closed?
end

#each_pg_statement(statement, &block) ⇒ Object (protected)

Recursively emits embedded statements in Property Graph mode.

Parameters:

  • statement (RDF::Statement)



# File 'lib/rdf/reader.rb', line 600

def each_pg_statement(statement, &block)
  if statement.subject.is_a?(Statement)
    block.call(statement.subject)
    each_pg_statement(statement.subject, &block)
  end

  if statement.object.is_a?(Statement)
    block.call(statement.object)
    each_pg_statement(statement.object, &block)
  end
end
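each_pg_statement walks embedded (quoted) statements depth first, yielding the subject side before the object side and never yielding the outer statement itself. The sketch below models that traversal with a hypothetical Stmt struct and each_embedded method instead of RDF::Statement and the protected reader method.

```ruby
# Hypothetical minimal statement type; in RDF.rb this would be RDF::Statement,
# whose subject or object may itself be a statement.
Stmt = Struct.new(:subject, :predicate, :object)

# Mirrors each_pg_statement: yield every embedded statement, depth first,
# subject side before object side. The outer statement itself is not yielded.
def each_embedded(statement, &block)
  if statement.subject.is_a?(Stmt)
    block.call(statement.subject)
    each_embedded(statement.subject, &block)
  end
  if statement.object.is_a?(Stmt)
    block.call(statement.object)
    each_embedded(statement.object, &block)
  end
end

inner  = Stmt.new(":alice", ":knows", ":bob")
middle = Stmt.new(inner, ":certainty", "0.9")
outer  = Stmt.new(":carol", ":says", middle)

seen = []
each_embedded(outer) { |st| seen << st }
```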

#each_statement {|statement| ... } #each_statement ⇒ Enumerator Also known as: each

This method returns an undefined value.

Iterates the given block for each RDF statement.

If no block was given, returns an enumerator.

Statements are yielded in the order that they are read from the input stream.

Overloads:

  • #each_statement {|statement| ... }

    This method returns an undefined value.

    Yields:

    • (statement)

      each statement

    Yield Parameters:

    • statement (RDF::Statement)

    Yield Returns:

    • (void)

      ignored

  • #each_statement ⇒ Enumerator

    Returns:

    • (Enumerator)

Raises:




# File 'lib/rdf/reader.rb', line 442

def each_statement(&block)
  if block_given?
    begin
      loop do
        st = read_statement
        block.call(st) unless st.nil?
      end
    rescue EOFError
      rewind rescue nil
    end
  end
  enum_for(:each_statement)
end
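each_statement relies on read_statement raising EOFError to end the loop, skips nil results, rewinds afterwards, and always returns an enumerator. The sketch below reproduces that contract in a hypothetical LineReader that treats each non-blank line as a whitespace-separated statement; it illustrates the pattern and is not an RDF.rb reader.

```ruby
require "stringio"

# Hypothetical line-based reader that mimics the iteration contract of
# RDF::Reader#each_statement: read until EOFError, skip nils, rewind at the
# end, and return an Enumerator.
class LineReader
  def initialize(io)
    @input = io
  end

  def each_statement(&block)
    if block_given?
      begin
        loop do
          st = read_statement
          block.call(st) unless st.nil?
        end
      rescue EOFError
        rewind rescue nil
      end
    end
    enum_for(:each_statement)
  end

  def rewind
    @input.rewind
  end

  private

  # Blank lines yield nil and are skipped; readline raises EOFError at the end.
  def read_statement
    line = @input.readline.strip
    line.empty? ? nil : line.split(/\s+/, 3)
  end
end

reader = LineReader.new(StringIO.new("s1 p1 o1\n\ns2 p2 o2\n"))
collected = []
reader.each_statement { |st| collected << st }
lazy = reader.each_statement.to_a
```

Because the input is rewound after EOFError, calling to_a on the returned enumerator re-reads the stream from the start; this only works for rewindable inputs such as StringIO or files.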

#each_triple {|subject, predicate, object| ... } #each_triple ⇒ Enumerator

This method returns an undefined value.

Iterates the given block for each RDF triple.

If no block was given, returns an enumerator.

Triples are yielded in the order that they are read from the input stream.

Overloads:

  • #each_triple {|subject, predicate, object| ... }

    This method returns an undefined value.

    Yields:

    • (subject, predicate, object)

      each triple

    Yield Parameters:

    • subject (RDF::Resource)
    • predicate (RDF::URI)
    • object (RDF::Term)

    Yield Returns:

    • (void)

      ignored

  • #each_triple ⇒ Enumerator

    Returns:

    • (Enumerator)




# File 'lib/rdf/reader.rb', line 479

def each_triple(&block)
  if block_given?
    begin
      loop do
        triple = read_triple
        block.call(*triple) unless triple.nil?
      end
    rescue EOFError
      rewind rescue nil
    end
  end
  enum_for(:each_triple)
end

#encoding ⇒ Encoding

Returns the encoding of the input stream.

Returns:

  • (Encoding)


# File 'lib/rdf/reader.rb', line 617

def encoding
  case @options[:encoding]
  when String, Symbol
    Encoding.find(@options[:encoding].to_s)
  when Encoding
    @options[:encoding]
  else
    @options[:encoding] ||= Encoding.find(self.class.format.content_encoding.to_s)
  end
end
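encoding resolves the :encoding option in three steps: a String or Symbol is looked up with Encoding.find, an Encoding object is used as-is, and anything else falls back to the format's content encoding. The sketch below mirrors that order as a standalone function; resolve_encoding and default_name are hypothetical, default_name standing in for self.class.format.content_encoding, and unlike the real method the sketch does not memoize the fallback into the options hash.

```ruby
# Standalone sketch of the resolution order in RDF::Reader#encoding.
# `default_name` is a hypothetical stand-in for the format's content encoding.
def resolve_encoding(option, default_name = "utf-8")
  case option
  when String, Symbol then Encoding.find(option.to_s) # names and aliases
  when Encoding       then option                    # already resolved
  else                     Encoding.find(default_name.to_s)
  end
end
```

Encoding.find is case-insensitive and accepts aliases such as "ascii", and raises ArgumentError for unknown names.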

#fail_object (protected)

This method returns an undefined value.

Raises an “expected object” parsing error on the current line.

Raises:

  • (RDF::ReaderError)



# File 'lib/rdf/reader.rb', line 592

def fail_object
  log_error("Expected object (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
end

#fail_predicate (protected)

This method returns an undefined value.

Raises an “expected predicate” parsing error on the current line.

Raises:

  • (RDF::ReaderError)



# File 'lib/rdf/reader.rb', line 583

def fail_predicate
  log_error("Expected predicate (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
end

#fail_subject (protected)

This method returns an undefined value.

Raises an “expected subject” parsing error on the current line.

Raises:

  • (RDF::ReaderError)



# File 'lib/rdf/reader.rb', line 574

def fail_subject
  log_error("Expected subject (found: #{current_line.inspect})", lineno: lineno, exception: RDF::ReaderError)
end

#intern? ⇒ Boolean

Returns true if parsed URIs should be interned.

Returns:

  • (Boolean)

    true or false

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 653

def intern?
  @options[:intern]
end

#lineno ⇒ Integer

Current line number being processed. For formats that can associate a generated statement with a particular line number in the input, this value reflects that line number.

Returns:

  • (Integer)


# File 'lib/rdf/reader.rb', line 521

def lineno
  @input.lineno
end

#prefix(name, uri) ⇒ RDF::URI #prefix(name) ⇒ RDF::URI Also known as: prefix!

Defines the given named URI prefix for this reader.

Examples:

Defining a URI prefix

reader.prefix :dc, RDF::URI('http://purl.org/dc/terms/')

Returning a URI prefix

reader.prefix(:dc)    #=> RDF::URI('http://purl.org/dc/terms/')

Overloads:

  • #prefix(name, uri) ⇒ RDF::URI

    Parameters:

    • name (Symbol, #to_s)
    • uri (RDF::URI, #to_s)
  • #prefix(name) ⇒ RDF::URI

    Parameters:

    • name (Symbol, #to_s)

Returns:

  • (RDF::URI)



# File 'lib/rdf/reader.rb', line 403

def prefix(name, uri = nil)
  name = name.to_s.empty? ? nil : (name.respond_to?(:to_sym) ? name.to_sym : name.to_s.to_sym)
  uri.nil? ? prefixes[name] : prefixes[name] = uri
end
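prefix normalizes the name before using it as a key: nil, an empty string, or an empty symbol all map to the nil key, and any other name becomes a Symbol. The hypothetical PrefixTable below copies that logic so the behaviour can be checked without RDF.rb; plain strings stand in for RDF::URI values.

```ruby
# Hypothetical holder that reproduces the name normalization in
# RDF::Reader#prefix; plain strings stand in for RDF::URI values.
class PrefixTable
  def prefixes
    @prefixes ||= {}
  end

  def prefix(name, uri = nil)
    # nil, "" and :"" all map to the nil key (the default prefix).
    name = name.to_s.empty? ? nil : (name.respond_to?(:to_sym) ? name.to_sym : name.to_s.to_sym)
    uri.nil? ? prefixes[name] : prefixes[name] = uri
  end
end

table = PrefixTable.new
table.prefix("dc", "http://purl.org/dc/terms/")
table.prefix(nil, "http://example.org/")
```

Because "dc" and :dc normalize to the same key, callers can mix strings and symbols when defining and looking up prefixes.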

#prefixes ⇒ Hash{Symbol => RDF::URI}

Returns the URI prefixes currently defined for this reader.

Examples:

reader.prefixes[:dc]  #=> RDF::URI('http://purl.org/dc/terms/')

Returns:

  • (Hash{Symbol => RDF::URI})

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 367

def prefixes
  @options[:prefixes] ||= {}
end

#prefixes=(prefixes) ⇒ Hash{Symbol => RDF::URI}

Defines the given URI prefixes for this reader.

Examples:

reader.prefixes = {
  dc: RDF::URI('http://purl.org/dc/terms/'),
}

Parameters:

  • prefixes (Hash{Symbol => RDF::URI})

Returns:

  • (Hash{Symbol => RDF::URI})

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 382

def prefixes=(prefixes)
  @options[:prefixes] = prefixes
end

#read_statement ⇒ RDF::Statement (protected)

This method is abstract.

Reads a statement from the input stream.

Returns:

  • (RDF::Statement)

Raises:

  • (NotImplementedError)

    unless implemented in subclass



# File 'lib/rdf/reader.rb', line 555

def read_statement
  Statement.new(*read_triple)
end

#read_triple ⇒ Array(RDF::Term) (protected)

This method is abstract.

Reads a triple from the input stream.

Returns:

  • (Array(RDF::Term))

Raises:

  • (NotImplementedError)

    unless implemented in subclass



# File 'lib/rdf/reader.rb', line 565

def read_triple
  raise NotImplementedError, "#{self.class}#read_triple" # override in subclasses
end
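Concrete readers normally override read_triple (or read_statement); the base implementation raises NotImplementedError. The sketch below shows the shape of such a subclass with hypothetical SketchReader and WordTripleReader classes. It returns plain arrays where RDF.rb would build RDF::Statement objects, adds a hypothetical public next_statement wrapper for testing, and signals end of input with EOFError from readline, which is what each_statement and each_triple rescue.

```ruby
require "stringio"

# Hypothetical base class mirroring RDF::Reader's contract: read_statement is
# built on read_triple, which subclasses must override.
class SketchReader
  def initialize(input)
    @input = input.is_a?(String) ? StringIO.new(input) : input
  end

  # Hypothetical public wrapper so callers can reach the protected reader.
  def next_statement
    read_statement
  end

  protected

  def read_statement
    read_triple # RDF.rb wraps this as Statement.new(*read_triple)
  end

  def read_triple
    raise NotImplementedError, "#{self.class}#read_triple" # override in subclasses
  end
end

# Hypothetical subclass: one whitespace-separated triple per line.
# readline raises EOFError at the end of input.
class WordTripleReader < SketchReader
  protected

  def read_triple
    @input.readline.strip.split(/\s+/, 3)
  end
end
```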

#rewind Also known as: rewind!

This method returns an undefined value.

Rewinds the input stream to the beginning of input.



# File 'lib/rdf/reader.rb', line 499

def rewind
  @input.rewind
end

#to_sym ⇒ Symbol

Returns a symbol appropriate to use with RDF::Reader.for().

Returns:

  • (Symbol)


# File 'lib/rdf/reader.rb', line 267

def to_sym
  self.class.to_sym
end

#valid? ⇒ Boolean

Note:

This parses the full input and is valid only within the reader block. Use Reader.new(input, validate: true) if you intend to capture the result.

Examples:

Checking whether input is valid

RDF::NTriples::Reader.new("!!invalid input??") do |reader|
  reader.valid? # => false
end

Returns:

  • (Boolean)




# File 'lib/rdf/reader.rb', line 540

def valid?
  super && !log_statistics[:error]
rescue ArgumentError, RDF::ReaderError => e
  log_error(e.message + " at #{e.backtrace.first}")
  false
end

#validate? ⇒ Boolean

Returns true if parsed statements and values should be validated.

Returns:

  • (Boolean)

    true or false

Since:

  • 0.3.0



# File 'lib/rdf/reader.rb', line 633

def validate?
  @options[:validate]
end

#version ⇒ String

Returns the RDF version determined by this reader.

Examples:

reader.version  #=> "1.2"

Returns:

  • (String)

Since:

  • 3.3.4



# File 'lib/rdf/reader.rb', line 417

def version
  @options[:version]
end