Class: RDF::NTriples::Writer
- Includes:
- Util::Logger
- Defined in:
- lib/rdf/ntriples/writer.rb
Overview
N-Triples serializer.
Output is serialized as UTF-8 by default. To serialize as ASCII with Unicode escapes, pass encoding: Encoding::ASCII as an option to #initialize.
Direct Known Subclasses
Constant Summary
- ESCAPE_PLAIN =
/\A[\x20-\x21\x23-\x26\x28#{Regexp.escape '['}#{Regexp.escape ']'}-\x7E]*\z/m.freeze
- ESCAPE_PLAIN_U =
/\A(?:#{Reader::IRI_RANGE}|#{Reader::UCHAR})*\z/.freeze
Constants included from Util::Logger
Instance Attribute Summary
Attributes inherited from Writer
Class Method Summary
- .escape(string, encoding = nil) ⇒ String
  Escape Literal and URI content.
- .escape_ascii(u, encoding) ⇒ String
  Standard ASCII escape sequences.
- .escape_uchar(u) ⇒ String
- .escape_unicode(u, encoding) ⇒ String
  Escape ASCII and Unicode characters.
- .escape_utf16(u) ⇒ String
  Deprecated. Use escape_uchar; this name is non-intuitive.
- .escape_utf32(u) ⇒ String
  Deprecated. Use escape_uchar; this name is non-intuitive.
- .serialize(value) ⇒ String
  Returns the serialized N-Triples representation of the given RDF value.
Instance Method Summary
- #format_literal(literal, **options) ⇒ String
  Returns the N-Triples representation of a literal.
- #format_node(node, unique_bnodes: false, **options) ⇒ String
  Returns the N-Triples representation of a blank node.
- #format_statement(statement, **options) ⇒ String
  Returns the N-Triples representation of a statement.
- #format_triple(subject, predicate, object, **options) ⇒ String
  Returns the N-Triples representation of a triple.
- #format_tripleTerm(statement, **options) ⇒ String
  Returns the N-Triples representation of an RDF 1.2 triple term.
- #format_uri(uri, **options) ⇒ String
  Returns the N-Triples representation of a URI reference using write encoding.
- #initialize(output = $stdout, validate: true, **options) {|writer| ... } ⇒ Writer (constructor)
  Initializes the writer.
- #write_comment(text)
  Outputs an N-Triples comment line.
- #write_prologue ⇒ self (abstract)
  Output VERSION directive, if specified and not canonicalizing.
- #write_triple(subject, predicate, object)
  Outputs the N-Triples representation of a triple.
Methods included from Util::Logger
#log_debug, #log_depth, #log_error, #log_fatal, #log_info, #log_recover, #log_recovering?, #log_statistics, #log_warn, #logger
Methods inherited from Writer
accept?, #base_uri, buffer, #canonicalize?, dump, each, #encoding, #flush, for, format, #format_list, #format_term, #node_id, open, options, #prefix, #prefixes, #prefixes=, #puts, #quoted, to_sym, #to_sym, #uri_for, #validate?, #version, #write_epilogue, #write_statement, #write_triples
Methods included from Util::Aliasing::LateBound
Methods included from Writable
#<<, #insert, #insert_graph, #insert_reader, #insert_statement, #insert_statements, #writable?
Methods included from Util::Coercions
Constructor Details
#initialize(output = $stdout, validate: true, **options) {|writer| ... } ⇒ Writer
Initializes the writer.
    # File 'lib/rdf/ntriples/writer.rb', line 211
    def initialize(output = $stdout, validate: true, **, &block)
      super
    end
Class Method Details
.escape(string, encoding = nil) ⇒ String
Escape Literal and URI content. If encoding is ASCII, all Unicode characters are escaped; otherwise, only the ASCII characters that require escaping are escaped.
    # File 'lib/rdf/ntriples/writer.rb', line 57
    def self.escape(string, encoding = nil)
      ret = case
        when string.match?(ESCAPE_PLAIN) # a shortcut for the simple case
          string
        when string.ascii_only?
          StringIO.open do |buffer|
            buffer.set_encoding(Encoding::ASCII)
            string.each_byte { |u| buffer << escape_ascii(u, encoding) }
            buffer.string
          end
        when encoding && encoding != Encoding::ASCII
          # Not encoding UTF-8 characters
          StringIO.open do |buffer|
            buffer.set_encoding(encoding)
            string.each_char do |u|
              buffer << case u.ord
                when (0x00..0x7F)
                  escape_ascii(u, encoding)
                when (0xFFFE..0xFFFF)
                  # NOT A CHARACTER
                  # @see https://corp.unicode.org/~asmus/proposed_faq/private_use.html#history1
                  escape_uchar(u)
                else
                  u
              end
            end
            buffer.string
          end
        else
          # Encode ASCII && UTF-8 characters
          StringIO.open do |buffer|
            buffer.set_encoding(Encoding::ASCII)
            string.each_codepoint { |u| buffer << escape_unicode(u, encoding) }
            buffer.string
          end
      end
      encoding ? ret.encode(encoding) : ret
    end
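The four branches of the listing above can be hard to follow in one pass. The following is a minimal standalone sketch, not the library's code, that only reports which branch a given input would take. The method name `sketch_escape_branch`, the returned symbols, and the inline regular expression (an assumed reading of ESCAPE_PLAIN as printable ASCII without `"`, `'` and `\`) are all illustrative assumptions.

```ruby
# Hypothetical helper: names the branch of .escape that an input would reach.
def sketch_escape_branch(string, encoding = nil)
  # Assumption: approximates ESCAPE_PLAIN (printable ASCII except " ' and \).
  plain = /\A[\x20-\x21\x23-\x26\x28-\x5B\x5D-\x7E]*\z/

  if string.match?(plain)
    :unchanged            # nothing needs escaping
  elsif string.ascii_only?
    :escape_ascii_bytes   # ASCII input containing characters that need ECHAR/UCHAR
  elsif encoding && encoding != Encoding::ASCII
    :keep_unicode         # non-ASCII characters are written through as-is
  else
    :escape_all           # non-ASCII characters become UCHAR escapes
  end
end

p sketch_escape_branch("hello")
p sketch_escape_branch("line\nbreak")
p sketch_escape_branch("café", Encoding::UTF_8)
p sketch_escape_branch("café", Encoding::ASCII)
```

As the listing is written, calling the method without an encoding argument appears to take the same final branch as Encoding::ASCII, since the third condition requires a non-nil encoding.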
.escape_ascii(u, encoding) ⇒ String
Standard ASCII escape sequences. The N-Triples recommendation adds the \b and \f escape sequences to those of the earlier Test-Cases syntax. In the listing below, the encoding argument does not affect which sequence is emitted.
Within STRING_LITERAL_QUOTE, only the characters U+0022, U+005C, U+000A, U+000D are encoded using ECHAR. ECHAR must not be used for characters that are allowed directly in STRING_LITERAL_QUOTE.
    # File 'lib/rdf/ntriples/writer.rb', line 128
    def self.escape_ascii(u, encoding)
      case (u = u.ord)
        when (0x08) then "\\b"
        when (0x09) then "\\t"
        when (0x0A) then "\\n"
        when (0x0C) then "\\f"
        when (0x0D) then "\\r"
        when (0x22) then "\\\""
        when (0x5C) then "\\\\"
        when (0x00..0x1F) then escape_uchar(u)
        when (0x7F) then escape_uchar(u) # DEL
        when (0x20..0x7E) then u.chr
        else
          raise ArgumentError.new("expected an ASCII character in (0x00..0x7F), but got 0x#{u.to_s(16)}")
      end
    end
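To see the escape table in isolation, here is a standalone sketch that mirrors the listing above without depending on the library. The names `sketch_uchar` and `sketch_escape_ascii` are hypothetical, and the unused encoding parameter is dropped for brevity.

```ruby
# Hypothetical stand-in for the UCHAR fallback (4-digit form only, enough for ASCII).
def sketch_uchar(code)
  format("\\u%04X", code)
end

# Accepts a one-character String or an Integer code point in 0x00..0x7F.
def sketch_escape_ascii(u)
  code = u.ord
  case code
  when 0x08 then "\\b"
  when 0x09 then "\\t"
  when 0x0A then "\\n"
  when 0x0C then "\\f"
  when 0x0D then "\\r"
  when 0x22 then "\\\""
  when 0x5C then "\\\\"
  when 0x00..0x1F, 0x7F then sketch_uchar(code) # remaining controls and DEL
  when 0x20..0x7E then code.chr                 # printable ASCII passes through
  else
    raise ArgumentError, format("expected an ASCII character, got 0x%X", code)
  end
end

puts sketch_escape_ascii("\t")  # prints \t
puts sketch_escape_ascii(0x01)  # prints \u0001
puts sketch_escape_ascii("A")   # prints A
```

The named escapes are listed before the 0x00..0x1F range so that they take precedence over the generic UCHAR fallback, as in the original listing.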
.escape_uchar(u) ⇒ String
    # File 'lib/rdf/ntriples/writer.rb', line 150
    def self.escape_uchar(u)
      #require 'byebug'; byebug
      case u.ord
      when (0x00..0xFFFF)
        sprintf("\\u%04X", u.ord)
      else
        sprintf("\\U%08X", u.ord)
      end
    end
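The choice between the two UCHAR forms depends only on whether the code point fits in four hexadecimal digits. A standalone sketch of that rule follows; the name `sketch_escape_uchar` is hypothetical and the code does not call the library.

```ruby
# Accepts a one-character String or an Integer code point.
def sketch_escape_uchar(u)
  code = u.ord
  if code <= 0xFFFF
    format("\\u%04X", code) # Basic Multilingual Plane: \uXXXX
  else
    format("\\U%08X", code) # supplementary planes: \UXXXXXXXX
  end
end

puts sketch_escape_uchar("é")      # prints \u00E9
puts sketch_escape_uchar(0x1F600)  # prints \U0001F600
```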
.escape_unicode(u, encoding) ⇒ String
Escape ASCII and Unicode characters. If encoding is UTF-8, only ASCII characters are escaped.
    # File 'lib/rdf/ntriples/writer.rb', line 105
    def self.escape_unicode(u, encoding)
      case (u = u.ord)
        when (0x00..0x7F)        # ECHAR
          escape_ascii(u, encoding)
        when (0x80...0x10FFFF)   # UCHAR
          escape_uchar(u)
        else
          raise ArgumentError.new("expected a Unicode codepoint in (0x00..0x10FFFF), but got 0x#{u.to_s(16)}")
      end
    end
.escape_utf16(u) ⇒ String
Deprecated. Use escape_uchar; this name is non-intuitive.

    # File 'lib/rdf/ntriples/writer.rb', line 165
    def self.escape_utf16(u)
      sprintf("\\u%04X", u.ord)
    end
.escape_utf32(u) ⇒ String
Deprecated. Use escape_uchar; this name is non-intuitive.

    # File 'lib/rdf/ntriples/writer.rb', line 174
    def self.escape_utf32(u)
      sprintf("\\U%08X", u.ord)
    end
.serialize(value) ⇒ String
Returns the serialized N-Triples representation of the given RDF value.
    # File 'lib/rdf/ntriples/writer.rb', line 185
    def self.serialize(value)
      writer = (@serialize_writer_memo ||= self.new)
      case value
        when nil then nil
        when FalseClass then value.to_s
        when RDF::Statement
          writer.format_statement(value) + "\n"
        when RDF::Term
          writer.format_term(value)
        else
          raise ArgumentError, "expected an RDF::Statement or RDF::Term, but got #{value.inspect}"
      end
    end
Instance Method Details
#format_literal(literal, **options) ⇒ String
Returns the N-Triples representation of a literal.
    # File 'lib/rdf/ntriples/writer.rb', line 338
    def format_literal(literal, **)
      case literal
        when RDF::Literal
          # Note, escaping here is more robust than in Term
          text = quoted(escaped(literal.value))
          text << "@#{literal.language}" if literal.language?
          text << "--#{literal.direction}" if literal.direction?
          text << "^^<#{uri_for(literal.datatype)}>" if literal.datatype?
          text
        else
          quoted(escaped(literal.to_s))
      end
    end
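The order in which the suffixes are appended determines the shape of the output: quoted lexical form, then language tag, then base direction, then datatype. The sketch below reproduces only that assembly step with plain strings. The method name `sketch_format_literal` and its keyword arguments are hypothetical, the lexical form is assumed to be already escaped, and a nil argument stands in for the library's `language?`, `direction?` and `datatype?` predicates.

```ruby
# Hypothetical helper: assembles a literal from pre-escaped parts.
def sketch_format_literal(lexical, language: nil, direction: nil, datatype: nil)
  text = +"\"#{lexical}\""                   # unary plus yields a mutable string
  text << "@#{language}" if language         # language tag
  text << "--#{direction}" if direction      # base direction
  text << "^^<#{datatype}>" if datatype      # datatype IRI
  text
end

puts sketch_format_literal("chat", language: "fr")
# prints "chat"@fr
puts sketch_format_literal("1", datatype: "http://www.w3.org/2001/XMLSchema#integer")
# prints "1"^^<http://www.w3.org/2001/XMLSchema#integer>
```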
#format_node(node, unique_bnodes: false, **options) ⇒ String
Returns the N-Triples representation of a blank node.
    # File 'lib/rdf/ntriples/writer.rb', line 285
    def format_node(node, unique_bnodes: false, **)
      unique_bnodes ? node.to_unique_base : node.to_s
    end
#format_statement(statement, **options) ⇒ String
Returns the N-Triples representation of a statement.
    # File 'lib/rdf/ntriples/writer.rb', line 251
    def format_statement(statement, **)
      format_triple(*statement.to_triple, **)
    end
#format_triple(subject, predicate, object, **options) ⇒ String
Returns the N-Triples representation of a triple.
    # File 'lib/rdf/ntriples/writer.rb', line 273
    def format_triple(subject, predicate, object, **)
      "%s %s %s ." % [subject, predicate, object].map { |value| format_term(value, **) }
    end
#format_tripleTerm(statement, **options) ⇒ String
Returns the N-Triples representation of an RDF 1.2 triple term.
    # File 'lib/rdf/ntriples/writer.rb', line 261
    def format_tripleTerm(statement, **)
      "<<( %s %s %s )>>" % statement.to_a.map { |value| format_term(value, **) }
    end
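Both format_triple and format_tripleTerm reduce to a format-string template applied to three already formatted terms. A standalone sketch of the two templates follows, assuming the terms arrive as pre-formatted strings. The method names are hypothetical and no term formatting is performed.

```ruby
# Hypothetical helper: joins three pre-formatted terms into a statement line.
def sketch_format_triple(subject, predicate, object)
  "%s %s %s ." % [subject, predicate, object]
end

# Hypothetical helper: wraps three pre-formatted terms as an RDF 1.2 triple term.
def sketch_format_triple_term(subject, predicate, object)
  "<<( %s %s %s )>>" % [subject, predicate, object]
end

s = "<http://example.org/s>"
p_ = "<http://example.org/p>"
o = "\"o\""

puts sketch_format_triple(s, p_, o)
# prints <http://example.org/s> <http://example.org/p> "o" .
puts sketch_format_triple_term(s, p_, o)
# prints <<( <http://example.org/s> <http://example.org/p> "o" )>>
```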
#format_uri(uri, **options) ⇒ String
Returns the N-Triples representation of a URI reference using write encoding.
    # File 'lib/rdf/ntriples/writer.rb', line 295
    def format_uri(uri, **)
      string = uri.to_s
      iriref = case
        when string.match?(ESCAPE_PLAIN_U) # a shortcut for the simple case
          string
        when string.ascii_only? || (encoding && encoding != Encoding::ASCII)
          StringIO.open do |buffer|
            buffer.set_encoding(encoding)
            string.each_char do |u|
              buffer << case u.ord
                when (0x00..0x20) then self.class.escape_uchar(u)
                when 0x22, 0x3c, 0x3e, 0x5c, 0x5e, 0x60, 0x7b, 0x7c, 0x7d # "<>\^`{|}
                  self.class.escape_uchar(u)
                else u
              end
            end
            buffer.string
          end
        else
          # Encode ASCII && UTF-8/16 characters
          StringIO.open do |buffer|
            buffer.set_encoding(Encoding::ASCII)
            string.each_byte do |u|
              buffer << case u
                when (0x00..0x20) then self.class.escape_uchar(u)
                when 0x22, 0x3c, 0x3e, 0x5c, 0x5e, 0x60, 0x7b, 0x7c, 0x7d # "<>\^`{|}
                  self.class.escape_uchar(u)
                when (0x80..0x10FFFF) then self.class.escape_uchar(u)
                else u
              end
            end
            buffer.string
          end
      end
      encoding ? "<#{iriref}>".encode(encoding) : "<#{iriref}>"
    end
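The per-character rule in the first escaping branch of the listing can be sketched on its own: code points up to U+0020 and the nine characters `"<>\^` + "`" + `{|}` become \uXXXX escapes, and everything else is copied through. This is a simplified illustration of that branch only, with a hypothetical method name; it does not cover the ASCII-output branch or the final re-encoding step.

```ruby
# Hypothetical helper: escapes characters that may not appear raw inside <...>.
def sketch_format_iri(string)
  forbidden = [0x22, 0x3C, 0x3E, 0x5C, 0x5E, 0x60, 0x7B, 0x7C, 0x7D]
  body = string.each_char.map do |ch|
    code = ch.ord
    if code <= 0x20 || forbidden.include?(code)
      format("\\u%04X", code) # control characters, space, and forbidden punctuation
    else
      ch                      # all other characters, including non-ASCII, pass through
    end
  end.join
  "<#{body}>"
end

puts sketch_format_iri("http://example.org/a b")
# prints <http://example.org/a\u0020b>
puts sketch_format_iri("http://example.org/{x}")
# prints <http://example.org/\u007Bx\u007D>
```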
#write_comment(text)
This method returns an undefined value.
Outputs an N-Triples comment line.
    # File 'lib/rdf/ntriples/writer.rb', line 230
    def write_comment(text)
      puts "# #{text.chomp}" # TODO: correctly output multi-line comments
    end
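The TODO in the listing notes that multi-line comments are not yet handled. One possible approach, offered purely as an illustration and not as the library's behaviour, is to prefix every line of the text with the comment marker. The helper name `sketch_comment_lines` is hypothetical; it returns the text rather than writing it.

```ruby
# Hypothetical helper: turns arbitrary text into one "# ..." line per input line.
def sketch_comment_lines(text)
  text.chomp.each_line.map { |line| "# #{line.chomp}" }.join("\n")
end

puts sketch_comment_lines("first line\nsecond line\n")
# prints:
# # first line
# # second line
```

Without such handling, a newline inside the text would end the comment early and leave the remaining text on an uncommented line.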
#write_prologue ⇒ self
Output VERSION directive, if specified and not canonicalizing.

    # File 'lib/rdf/ntriples/writer.rb', line 219
    def write_prologue
      puts %(VERSION #{version.inspect}) if version && !canonicalize?
      @logged_errors_at_prolog = log_statistics[:error].to_i
      super
    end
#write_triple(subject, predicate, object)
This method returns an undefined value.
Outputs the N-Triples representation of a triple.
    # File 'lib/rdf/ntriples/writer.rb', line 241
    def write_triple(subject, predicate, object)
      puts format_triple(subject, predicate, object, **@options)
    end