Module: RDF::Microdata::Reader::Nokogiri
- Defined in:
- lib/rdf/microdata/reader/nokogiri.rb
Overview
Nokogiri implementation of an HTML parser.
Defined Under Namespace
Classes: NodeProxy, NodeSetProxy
Class Method Summary
- .library ⇒ Symbol
  Returns the name of the underlying XML library.
Instance Method Summary
- #doc_base(base) ⇒ String
  Find value of document base.
- #doc_errors ⇒ Object
  Document errors.
- #find_element_by_id(id) ⇒ Object
  Look up an element in the document by id.
- #getItems ⇒ Object
  Based on Microdata element.getItems.
- #initialize_html(input, **options)
  Initializes the underlying XML library.
- #root ⇒ Object
  Return proxy for document root.
Class Method Details
.library ⇒ Symbol
Returns the name of the underlying XML library.
# File 'lib/rdf/microdata/reader/nokogiri.rb', line 12

def self.library
  :nokogiri
end
Instance Method Details
#doc_base(base) ⇒ String
Find value of document base
# File 'lib/rdf/microdata/reader/nokogiri.rb', line 224

def doc_base(base)
  # find if the document has a base element
  base_el = @doc.at_css("html>head>base")
  base = base_el.attribute("href").to_s.split("#").first if base_el
  base
end
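The fragment handling in doc_base can be illustrated without a parsed document. The following is a minimal sketch, not the library's code: `resolve_base` is a hypothetical helper, and its `href` argument stands in for the string value of the base element's href attribute (nil meaning the document has no base element).

```ruby
# Hypothetical helper mirroring the logic of doc_base.
# `href` stands in for base_el.attribute("href").to_s; nil means "no <base>".
def resolve_base(base, href = nil)
  # Keep only the part of the href before the first "#".
  base = href.to_s.split("#").first unless href.nil?
  base
end

# A <base href> with a fragment replaces the supplied base, minus the fragment.
with_fragment = resolve_base("http://example.org/", "http://example.com/docs/#top")

# Without a <base> element the supplied base is returned unchanged.
no_base_el = resolve_base("http://example.org/")

# An empty href splits into an empty array, so the result is nil.
empty_href = resolve_base("http://example.org/", "")
```

Under these assumptions, a base element whose href is empty would leave the base as nil rather than falling back to the supplied value, so callers may want to guard against that case.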
#doc_errors ⇒ Object
Document errors
# File 'lib/rdf/microdata/reader/nokogiri.rb', line 213

def doc_errors
  @doc.errors.reject do |e|
    e.to_s =~ %r{(The doctype must be the first token in the document)|(Expected a doctype token)|(Unexpected '\?' where start tag name is expected)}
  end
end
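To show what the filter in doc_errors keeps and discards, here is a small sketch that applies the same pattern to plain strings. The messages are invented for illustration and only stand in for the entries of `@doc.errors`; the exact wording Nokogiri produces may differ.

```ruby
# Same pattern as in doc_errors.
IGNORED_ERRORS = %r{(The doctype must be the first token in the document)|(Expected a doctype token)|(Unexpected '\?' where start tag name is expected)}

# Hypothetical messages standing in for parser error objects.
messages = [
  "1:1: ERROR: Expected a doctype token",
  "3:5: ERROR: Some other parse problem",
  "1:2: ERROR: Unexpected '?' where start tag name is expected"
]

# Doctype and processing-instruction complaints are dropped; the rest is kept.
reported = messages.reject { |e| e.to_s =~ IGNORED_ERRORS }
```

The effect is that documents without a doctype, or with a leading `<?xml ...?>` declaration, do not surface those particular messages as errors.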
#find_element_by_id(id) ⇒ Object
Look up an element in the document by id
# File 'lib/rdf/microdata/reader/nokogiri.rb', line 241

def find_element_by_id(id)
  (e = @doc.at_css("##{id}")) && NodeProxy.new(e)
end
#getItems ⇒ Object
Based on Microdata element.getItems
# File 'lib/rdf/microdata/reader/nokogiri.rb', line 235

def getItems
  @doc.css('[itemscope]').select {|el| !el.has_attribute?('itemprop')}.map {|n| NodeProxy.new(n)}
end
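The selection rule in getItems (elements that have itemscope but no itemprop, i.e. top-level items) can be sketched with stand-in objects. `Element` below is a hypothetical Struct, not a Nokogiri class; it only provides the `has_attribute?` method the rule relies on.

```ruby
# Hypothetical stand-in for a parsed element: a name plus an attribute hash.
Element = Struct.new(:name, :attributes) do
  def has_attribute?(attr)
    attributes.key?(attr)
  end
end

elements = [
  Element.new("div",  { "itemscope" => "", "itemtype" => "http://schema.org/Person" }),
  Element.new("span", { "itemprop" => "name" }),
  Element.new("div",  { "itemscope" => "", "itemprop" => "address" }),
  Element.new("div",  { "itemscope" => "" })
]

# Step 1: the equivalent of the '[itemscope]' CSS selector.
with_scope = elements.select { |el| el.has_attribute?("itemscope") }

# Step 2: drop items that are themselves property values of another item.
top_level = with_scope.reject { |el| el.has_attribute?("itemprop") }
```

In this sketch the nested item carrying `itemprop="address"` is excluded, since it would be reached through its parent item rather than as an item in its own right.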
#initialize_html(input, **options)
This method returns an undefined value.
Initializes the underlying XML library.
# File 'lib/rdf/microdata/reader/nokogiri.rb', line 181

def initialize_html(input, **options)
  require 'nokogiri' unless defined?(::Nokogiri)
  @doc = case input
  when ::Nokogiri::XML::Document
    input
  else
    # Try to detect charset from input
    options[:encoding] ||= input.charset if input.respond_to?(:charset)

    # Otherwise, default is utf-8
    options[:encoding] ||= 'utf-8'
    options[:encoding] = options[:encoding].to_s if options[:encoding]

    begin
      input = input.read if input.respond_to?(:read)
      ::Nokogiri::HTML5(input.force_encoding(options[:encoding]), max_parse_errors: 1000)
    rescue LoadError, NoMethodError
      ::Nokogiri::HTML.parse(input, base_uri.to_s, options[:encoding])
    end
  end
end
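The order in which initialize_html settles on an encoding (explicit option, then the input's charset, then utf-8) can be sketched in isolation. `resolve_encoding` and `FakeResponse` are hypothetical names introduced for this example; they are not part of the library.

```ruby
# Hypothetical helper reproducing only the encoding-selection steps.
def resolve_encoding(input, **options)
  # An explicit :encoding option wins; otherwise ask the input for a charset.
  options[:encoding] ||= input.charset if input.respond_to?(:charset)
  # Fall back to utf-8 when neither is available.
  options[:encoding] ||= 'utf-8'
  # Normalize Encoding objects and symbols to a String name.
  options[:encoding].to_s
end

# Hypothetical stand-in for an input object that reports a charset.
FakeResponse = Struct.new(:charset)

default_enc  = resolve_encoding("<html></html>")
explicit_enc = resolve_encoding("<html></html>", encoding: Encoding::ISO_8859_1)
charset_enc  = resolve_encoding(FakeResponse.new("iso-8859-15"))
override_enc = resolve_encoding(FakeResponse.new("iso-8859-15"), encoding: :"utf-8")
```

Because the option is checked first, a caller-supplied `:encoding` takes precedence over whatever charset the input object reports.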