Class: RDF::Microdata::Reader

Inherits:
Reader
  • Object
Includes:
Expansion, Util::Logger
Defined in:
lib/rdf/microdata/reader.rb,
lib/rdf/microdata/reader/nokogiri.rb

Overview

A Microdata parser in Ruby.

Based on processing rules, amended with the following:

Defined Under Namespace

Modules: Nokogiri

Constant Summary

URL_PROPERTY_ELEMENTS =
%w(a area audio embed iframe img link object source track video)
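
The constant lists the elements whose property value is taken from a URL-bearing attribute rather than from text content. A minimal sketch of how such a list might be consulted, assuming the attribute mapping from the HTML microdata rules (href for a, area and link; data for object; src for the rest). The helper url_attribute_for is hypothetical and is not part of this class:

```ruby
URL_PROPERTY_ELEMENTS = %w(a area audio embed iframe img link object source track video)

# Hypothetical helper, not part of RDF::Microdata::Reader.
# Returns the name of the attribute that holds the URL value for the
# given element name, or nil when the element is not URL-valued.
def url_attribute_for(element_name)
  return nil unless URL_PROPERTY_ELEMENTS.include?(element_name)

  case element_name
  when "a", "area", "link" then "href"
  when "object"            then "data"
  else                          "src"
  end
end

url_attribute_for("img")   # => "src"
url_attribute_for("link")  # => "href"
url_attribute_for("span")  # => nil
```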

Methods included from Expansion

#expand, #rule

Constructor Details

#initialize(input = $stdin, **options) {|reader| ... } ⇒ reader

Initializes the Microdata reader instance.

Parameters:

  • input (Nokogiri::HTML::Document, Nokogiri::XML::Document, IO, File, String) (defaults to: $stdin)

    the input stream to read

  • options (Hash{Symbol => Object})

    any additional options

Options Hash (**options):

  • :encoding (Encoding) — default: Encoding::UTF_8

    the encoding of the input stream (Ruby 1.9+)

  • :validate (Boolean) — default: false

    whether to validate the parsed statements and values

  • :canonicalize (Boolean) — default: false

    whether to canonicalize parsed literals

  • :intern (Boolean) — default: true

    whether to intern all parsed URIs

  • :base_uri (#to_s) — default: nil

    the base URI to use when resolving relative URIs

  • :registry (#to_s) — default: RDF::Microdata::DEFAULT_REGISTRY

    the URI of the registry to load

Yields:

  • (reader)

    self

Yield Parameters:

  • reader (RDF::Reader)

Yield Returns:

  • (void)

    ignored

Raises:

  • (Error)

    Raises RDF::ReaderError when validating



# File 'lib/rdf/microdata/reader.rb', line 97

def initialize(input = $stdin, **options, &block)
  super do
    @library = :nokogiri

    require "rdf/microdata/reader/#{@library}"
    @implementation = Nokogiri
    self.extend(@implementation)

    input.rewind if input.respond_to?(:rewind)
    initialize_html(input, **options) rescue log_fatal($!.message, exception: RDF::ReaderError)

    log_error("Empty document") if root.nil?
    log_error(doc_errors.map(&:message).uniq.join("\n")) if !doc_errors.empty?

    log_debug('', "library = #{@library}")

    # Load registry
    begin
      registry_uri = options[:registry] || RDF::Microdata::DEFAULT_REGISTRY
      log_debug('', "registry = #{registry_uri.inspect}")
      Registry.load_registry(registry_uri)
    rescue JSON::ParserError => e
      log_fatal("Failed to parse registry: #{e.message}", exception: RDF::ReaderError) if (root.nil? && validate?)
    end
    
    if block_given?
      case block.arity
        when 0 then instance_eval(&block)
        else block.call(self)
      end
    end
  end
end
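
The constructor dispatches on the arity of the block it is given: a block with no parameters is evaluated in the context of the reader, while a block with parameters receives the reader as its argument. The following sketch isolates that pattern. ArityReader is a hypothetical stand-in class, not the library's reader, and it skips all parsing setup:

```ruby
# Hypothetical stand-in that reproduces only the block-dispatch logic.
class ArityReader
  attr_reader :library

  def initialize(&block)
    @library = :nokogiri

    if block_given?
      case block.arity
      when 0 then instance_eval(&block) # self inside the block is the reader
      else block.call(self)             # the reader is passed as an argument
      end
    end
  end
end

# Both forms reach the same object.
ArityReader.new { |reader| puts reader.library }
ArityReader.new { puts library }
```

The zero-arity form is convenient for short scripts, but it changes self inside the block, so instance variables and methods of the calling scope are not visible there.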

Instance Attribute Details

#implementation ⇒ Module (readonly)

Returns the HTML implementation module for this reader instance.

Returns:

  • (Module)

    Returns the HTML implementation module for this reader instance.



# File 'lib/rdf/microdata/reader.rb', line 23

def implementation
  @implementation
end

#memory ⇒ Hash{Object => RDF::Resource} (readonly)

Maps RDF elements (items) to resources.

Returns:

  • (Hash{Object => RDF::Resource})

    maps RDF elements (items) to resources



# File 'lib/rdf/microdata/reader.rb', line 26

def memory
  @memory
end

Class Method Details

.options ⇒ Object

Reader options



# File 'lib/rdf/microdata/reader.rb', line 43

def self.options
  super + [
    RDF::CLI::Option.new(
      symbol: :rdfa,
      datatype: TrueClass,
      on: ["--rdfa"],
      description: "Transform and parse as RDFa.") {true},
  ]
end
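
The method extends the option list inherited from the parent reader by appending to the result of super. A minimal sketch of that pattern, using hypothetical stand-ins (SketchOption, BaseSketchReader, MicrodataSketchReader) in place of RDF::CLI::Option and the real reader classes:

```ruby
# Hypothetical stand-in for an option descriptor.
SketchOption = Struct.new(:symbol, :on, :description, keyword_init: true)

class BaseSketchReader
  # Options shared by every reader in this sketch.
  def self.options
    [SketchOption.new(symbol: :validate, on: ["--validate"], description: "Validate input.")]
  end
end

class MicrodataSketchReader < BaseSketchReader
  # Append a format-specific option to whatever the parent declares.
  def self.options
    super + [
      SketchOption.new(symbol: :rdfa, on: ["--rdfa"], description: "Transform and parse as RDFa."),
    ]
  end
end

puts MicrodataSketchReader.options.map(&:symbol).inspect  # prints [:validate, :rdfa]
```

Because Array#+ returns a new array, the parent's list is left untouched and each subclass sees its own combined list.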

Instance Method Details

#base_uri ⇒ #to_s

Returns the base URI determined by this reader.

Examples:

reader.base_uri  # the :base_uri option given to the reader, or nil

Returns:

  • (#to_s)

    the value of the :base_uri option, or nil if none was given

Since:

  • 0.3.0



# File 'lib/rdf/microdata/reader.rb', line 36

def base_uri
  @options[:base_uri]
end

#each_statement {|statement| ... }

This method returns an undefined value.

Iterates the given block for each RDF statement in the input.

Reads to graph and performs expansion if required.

Yields:

  • (statement)

Yield Parameters:

  • statement (RDF::Statement)


# File 'lib/rdf/microdata/reader.rb', line 139

def each_statement(&block)
  if block_given?
    @callback = block

    # parse
    parse_whole_document(@doc, base_uri)

    if validate? && log_statistics[:error]
      raise RDF::ReaderError, "Errors found during processing"
    end
  end
  enum_for(:each_statement)
end
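
The method combines two behaviours: with a block it processes the whole input and, when validating, raises once processing has finished if any error was recorded; in every case it returns an enumerator. The sketch below imitates that flow with hypothetical stand-ins (StrictReader, SketchReaderError). A nil item plays the role of a parse error, which is an assumption made purely for illustration:

```ruby
# Hypothetical error class standing in for RDF::ReaderError.
class SketchReaderError < StandardError; end

# Hypothetical stand-in reader; nil items simulate logged errors.
class StrictReader
  def initialize(items, validate: false)
    @items = items
    @validate = validate
    @error_count = 0
  end

  def each_statement(&block)
    if block_given?
      @items.each do |item|
        if item.nil?
          @error_count += 1 # record the problem and keep going
        else
          block.call(item)
        end
      end

      # Fail only after the whole input has been processed.
      if @validate && @error_count > 0
        raise SketchReaderError, "Errors found during processing"
      end
    end
    enum_for(:each_statement)
  end
end

# Without a block the enumerator can be consumed lazily or collected.
StrictReader.new([:a, :b]).each_statement.to_a  # => [:a, :b]
```

Deferring the raise until the end means a non-validating caller still receives every statement that could be produced.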

#each_triple {|subject, predicate, object| ... }

This method returns an undefined value.

Iterates the given block for each RDF triple in the input.

Yields:

  • (subject, predicate, object)

Yield Parameters:

  • subject (RDF::Resource)
  • predicate (RDF::URI)
  • object (RDF::Value)


# File 'lib/rdf/microdata/reader.rb', line 161

def each_triple(&block)
  if block_given?
    each_statement do |statement|
      block.call(*statement.to_triple)
    end
  end
  enum_for(:each_triple)
end
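
As the source shows, each_triple is a thin wrapper that splats each statement's to_triple result into the caller's block. A self-contained sketch of that delegation, using hypothetical stand-ins (SketchStatement, TripleReader) and plain strings in place of RDF terms:

```ruby
# Hypothetical stand-in for a statement; only to_triple matters here.
SketchStatement = Struct.new(:subject, :predicate, :object) do
  def to_triple
    [subject, predicate, object]
  end
end

# Hypothetical stand-in reader over a fixed list of statements.
class TripleReader
  def initialize(statements)
    @statements = statements
  end

  def each_statement(&block)
    @statements.each(&block) if block_given?
    enum_for(:each_statement)
  end

  # Delegate to each_statement and unpack each statement into three values.
  def each_triple(&block)
    if block_given?
      each_statement do |statement|
        block.call(*statement.to_triple)
      end
    end
    enum_for(:each_triple)
  end
end

reader = TripleReader.new([SketchStatement.new("s", "p", "o")])
reader.each_triple { |subject, predicate, object| puts "#{subject} #{predicate} #{object}" }
```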