• Home
  • Features
  • Pricing
  • Docs
  • Announcements
  • Sign In

ruby-rdf / rdf-isomorphic / 7451942624

08 Jan 2024 06:59PM UTC coverage: 88.764%. Remained the same
7451942624

push

github

gkellogg
* CI on Ruby 3.3.
* Reference release notes on GitHub.

79 of 89 relevant lines covered (88.76%)

1120.43 hits per line

Source File
Press 'n' to go to next uncovered line, 'b' for previous

88.76
/lib/rdf/isomorphic.rb
1
require 'digest/sha1'
2✔
2
require 'rdf'
2✔
3
require 'rdf/ntriples'
2✔
4

5

6
module RDF
2✔
7
  ##
8
  # Isomorphism for rdf.rb Enumerables
9
  #
10
  # RDF::Isomorphic provides the functions isomorphic_with and bijection_to for RDF::Enumerable.
11
  #
12
  # @see https://ruby-rdf.github.io/rdf/
13
  # @see https://www.hpl.hp.com/techreports/2001/HPL-2001-293.pdf
14
  module Isomorphic
2✔
15
    autoload :VERSION,        'rdf/isomorphic/version'
2✔
16

17
    # Returns `true` if this RDF::Enumerable is isomorphic with another.
18
    #
19
    # @param canonicalize [Boolean] (false)
20
    #   If `true`, RDF::Literals will be canonicalized while producing a bijection.  This results in broader matches for isomorphism in the case of equivalent literals with different representations.
21
    # @param opts [Hash<Symbol => Any>] other options ignored
22
    # @param other [RDF::Enumerable]
23
    # @return [Boolean]
24
    # @example
25
    #     repository_a.isomorphic_with repository_b #=> true
26
    def isomorphic_with?(other, canonicalize: false, **opts)
2✔
27
      !(bijection_to(other, canonicalize: false, **opts).nil?)
48✔
28
    end
29

30
    alias_method :isomorphic?, :isomorphic_with?
2✔
31

32

33
    # Returns a hash of RDF:Nodes: RDF::Nodes representing an isomorphic
34
    # bijection of this RDF::Enumerable's to another RDF::Enumerable's blank
35
    # nodes, or nil if a bijection cannot be found.
36
    #
37
    # Takes a canonicalize: true argument.  If true, RDF::Literals will be
38
    # canonicalized while producing a bijection.  This results in broader
39
    # matches for isomorphism in the case of equivalent literals with different
40
    # representations.
41
    #
42
    # @example
43
    #     repository_a.bijection_to repository_b
44
    # @param other [RDF::Enumerable]
45
    # @param canonicalize [Boolean] (false)
46
    #   If true, RDF::Literals will be canonicalized while producing a bijection.  This results in broader matches for isomorphism in the case of equivalent literals with different representations.
47
    # @param opts [Hash<Symbol => Any>] other options ignored
48
    # @return [Hash, nil]
49
    def bijection_to(other, canonicalize: false, **opts)
2✔
50

51
      grounded_stmts_match = (count == other.count)
48✔
52

53
      grounded_stmts_match &&= each_statement.all? do | stmt |
48✔
54
        stmt.node? || other.has_statement?(stmt)
374✔
55
      end
56

57
      if grounded_stmts_match
48✔
58
        # blank_stmts and other_blank_stmts are just a performance
59
        # consideration--we could just as well pass in self and other.  But we
60
        # will be iterating over this list quite a bit during the algorithm, so
61
        # we break it down to the parts we're interested in.
62
        blank_stmts = find_all { |statement| statement.node? }
416✔
63
        other_blank_stmts = other.find_all { |statement| statement.node? }
416✔
64

65
        nodes = RDF::Isomorphic.blank_nodes_in(blank_stmts)
44✔
66
        other_nodes = RDF::Isomorphic.blank_nodes_in(other_blank_stmts)
44✔
67
        build_bijection_to blank_stmts, nodes, other_blank_stmts, other_nodes,
44✔
68
          these_grounded_hashes: {},
69
          other_grounded_hashes: {},
70
          canonicalize: false
71
      else
72
        nil
73
      end
74

75
    end
76

77
    private
2✔
78

79
    # The main recursive bijection algorithm.
80
    #
81
    # This algorithm is very similar to the one explained by Jeremy Carroll in
82
    # https://www.hpl.hp.com/techreports/2001/HPL-2001-293.pdf. Page 12 has the
83
    # relevant pseudocode.
84
    #
85
    # Many more comments are in the method itself.
86
    #
87
    # @param [RDF::Enumerable]  anon_stmts
88
    # @param [Array]            nodes
89
    # @param [RDF::Enumerable]  other_anon_stmts
90
    # @param [Array]            other_nodes
91
    # @param [Hash]             these_grounded_hashes
92
    # @param [Hash]             other_grounded_hashes
93
    # @param canonicalize [Boolean] (false)
94
    #   If true, RDF::Literals will be canonicalized while producing a bijection.  This results in broader matches for isomorphism in the case of equivalent literals with different representations.
95
    # @return [nil,Hash]
96
    # @private
97
    def build_bijection_to(anon_stmts, nodes, other_anon_stmts, other_nodes,
2✔
98
                           these_grounded_hashes: {},
99
                           other_grounded_hashes: {},
100
                           canonicalize: false)
101

102
      # Create a hash signature of every node, based on the signature of
103
      # statements it exists in.  
104
      # We also save hashes of nodes that cannot be reliably known; we will use
105
      # that information to eliminate possible recursion combinations.
106
      # 
107
      # Any mappings given in the method parameters are considered grounded.
108
      these_hashes, these_ungrounded_hashes = RDF::Isomorphic.hash_nodes(anon_stmts, nodes, these_grounded_hashes, canonicalize: canonicalize)
44✔
109
      other_hashes, other_ungrounded_hashes = RDF::Isomorphic.hash_nodes(other_anon_stmts, other_nodes, other_grounded_hashes, canonicalize: canonicalize)
44✔
110

111
      # Grounded hashes are built at the same rate between the two graphs (if
112
      # they are isomorphic).  If there exists a grounded node in one that is
113
      # not in the other, we can just return.  Ungrounded nodes might still
114
      # conflict, so we don't check them.  This is a little bit messy in the
115
      # middle of the method, and probably slows down isomorphic checks,  but
116
      # prevents almost-isomorphic cases from getting nutty.
117
      return nil if these_hashes.values.any? { |hash| !(other_hashes.values.member?(hash)) }
142✔
118
      return nil if other_hashes.values.any? { |hash| !(these_hashes.values.member?(hash)) }
88✔
119

120
      # Using the created hashes, map nodes to other_nodes
121
      # Ungrounded hashes will also be equal, but we keep the distinction
122
      # around for when we recurse later (we only recurse on ungrounded nodes)
123
      bijection = {}
26✔
124
      nodes.each do | node |
26✔
125
        other_node, _ = other_ungrounded_hashes.find do | other_node, other_hash |
62✔
126
          # we need to use eql?, as coincedentally-named bnode identifiers are == in rdf.rb
127
          these_ungrounded_hashes[node].eql? other_hash
162✔
128
        end
129
        next unless other_node
62✔
130
        bijection[node] = other_node
62✔
131

132
        # Deletion is required to keep counts even; two nodes with identical
133
        # signatures can biject to each other at random.
134
        other_ungrounded_hashes.delete other_node
62✔
135
      end
136

137
      # bijection is now a mapping of nodes to other_nodes.  If all are
138
      # accounted for on both sides, we have a bijection.
139
      #
140
      # If not, we will speculatively mark pairs with matching ungrounded
141
      # hashes as bijected and recurse.
142
      unless (bijection.keys.sort == nodes.sort) && (bijection.values.sort == other_nodes.sort)
26✔
143
        bijection = nil
×
144
        nodes.any? do | node |
×
145

146
          # We don't replace grounded nodes' hashes
147
          next if these_hashes.member? node
×
148
          other_nodes.any? do | other_node |
×
149

150
            # We don't replace grounded other_nodes' hashes
151
            next if other_hashes.member? other_node
×
152

153
            # The ungrounded signature must match for this to potentially work
154
            next unless these_ungrounded_hashes[node] == other_ungrounded_hashes[other_node]
×
155

156
            hash = Digest::SHA1.hexdigest(node.to_s)
×
157
            bijection = build_bijection_to(anon_stmts, nodes,
×
158
                         other_anon_stmts, other_nodes,
159
                         these_grounded_hashes: these_hashes.merge( node => hash),
160
                         other_grounded_hashes: other_hashes.merge(other_node => hash),
161
                         canonicalize: canonicalize)
162
          end
163
          bijection
×
164
        end
165
      end
166

167
      bijection
26✔
168
    end
169

170
    # Blank nodes appearing in given list of statements
171
    # @private
172
    # @param [Array<RDF::Statement>] blank_stmt_list
173
    # @return [Array<RDF::Node>]
174
    def self.blank_nodes_in(blank_stmt_list)
2✔
175
      blank_stmt_list.map {|statement | statement.terms.select(&:node?)}.flatten.uniq
828✔
176
    end
177

178
    # Given a set of statements, create a mapping of node => SHA1 for a given
179
    # set of blank nodes.  grounded_hashes is a mapping of node => SHA1 pairs
180
    # that we will take as a given, and use those to make more specific
181
    # signatures of other nodes.  
182
    #
183
    # Returns a tuple of hashes:  one of grounded hashes, and one of all
184
    # hashes.  grounded hashes are based on non-blank nodes and grounded blank
185
    # nodes, and can be used to determine if a node's signature matches
186
    # another.
187
    #
188
    # @param [Array] statements 
189
    # @param [Array] nodes
190
    # @param [Hash] grounded_hashes
191
    # @private
192
    # @return [Hash, Hash]
193
    def self.hash_nodes(statements, nodes, grounded_hashes, canonicalize: false)
2✔
194
      hashes = grounded_hashes.dup
88✔
195
      ungrounded_hashes = {}
88✔
196
      hash_needed = true
88✔
197

198
      # We may have to go over the list multiple times.  If a node is marked as
199
      # grounded, other nodes can then use it to decide their own state of
200
      # grounded.
201
      while hash_needed
88✔
202
        starting_grounded_nodes = hashes.size
178✔
203
        nodes.each do | node |
178✔
204
          unless hashes.member? node
1,074✔
205
            grounded, hash = node_hash_for(node, statements, hashes, canonicalize: canonicalize)
762✔
206
            if grounded
762✔
207
              hashes[node] = hash
100✔
208
            end
209
            ungrounded_hashes[node] = hash
762✔
210
          end
211
        end
212
        # after going over the list, any nodes with a unique hash can be marked
213
        # as grounded, even if we have not tied them back to a root yet.
214
        uniques = {}
178✔
215
        ungrounded_hashes.each do |node, hash|
178✔
216
          uniques[hash] = uniques.has_key?(hash) ? false : node
1,074✔
217
        end
218
        uniques.each do |hash, node|
178✔
219
          hashes[node] = hash if node
602✔
220
        end
221
        hash_needed = starting_grounded_nodes != hashes.size
178✔
222
      end
223
      [hashes,ungrounded_hashes]
88✔
224
    end
225

226
    # Generate a hash for a node based on the signature of the statements it
227
    # appears in.  Signatures consist of grounded elements in statements
228
    # associated with a node, that is, anything but an ungrounded anonymous
229
    # node.  Creating the hash is simply hashing a sorted list of each
230
    # statement's signature, which is itself a concatenation of the string form
231
    # of all grounded elements.
232
    #
233
    # Nodes other than the given node are considered grounded if they are a
234
    # member in the given hash.
235
    #
236
    # Returns a tuple consisting of grounded being true or false and the String
237
    # for the hash
238
    # @private
239
    # @param [RDF::Node] node
240
    # @param [Array<RDF::Statement>] statements
241
    # @param [Hash] hashes
242
    # @param [Boolean] canonicalize
243
    # @return [Boolean, String]
244
    def self.node_hash_for(node, statements, hashes, canonicalize:)
2✔
245
      statement_signatures = []
762✔
246
      grounded = true
762✔
247
      statements.each do | statement |
762✔
248
        if statement.terms.include?(node)
50,490✔
249
          statement_signatures << hash_string_for(statement, hashes, node, canonicalize: canonicalize)
1,978✔
250
          statement.terms.each do | resource |
1,978✔
251
            grounded = false unless grounded?(resource, hashes) || resource == node
6,074✔
252
          end
253
        end
254
      end
255
      # Note that we sort the signatures--without a canonical ordering, 
256
      # we might get different hashes for equivalent nodes.
257
      [grounded,Digest::SHA1.hexdigest(statement_signatures.sort.to_s)]
762✔
258
    end
259

260
    # Provide a string signature for the given statement, collecting
261
    # string signatures for grounded node elements.
262
    # return [String]
263
    # @private
264
    def self.hash_string_for(statement, hashes, node, canonicalize:)
2✔
265
      statement.terms.map {|r| string_for_node(r, hashes, node, canonicalize: canonicalize)}.join("")
8,052✔
266
    end
267

268
    # Returns true if a given node is grounded
269
    # A node is groundd if it is not a blank node or it is included
270
    # in the given mapping of grounded nodes.
271
    # @return [Boolean]
272
    # @private
273
    def self.grounded?(node, hashes)
2✔
274
      (!(node.node?)) || (hashes.member? node)
6,074✔
275
    end
276

277
    # Provides a string for the given node for use in a string signature
278
    # Non-anonymous nodes will return their string form.  Grounded anonymous
279
    # nodes will return their hashed form.
280
    # @return [String]
281
    # @private
282
    def self.string_for_node(node, hashes,target, canonicalize:)
2✔
283
      case
284
        when node.nil?
6,074✔
285
          ""
×
286
        when node == target
287
          "itself"
2,008✔
288
        when node.node? && hashes.member?(node)
289
          hashes[node]
264✔
290
        when node.node?
291
          "a blank node"
940✔
292
        # RDF.rb auto-boxing magic makes some literals the same when they
293
        # should not be; the ntriples serializer will take care of us
294
        when node.literal?
295
          node.class.name + RDF::NTriples.serialize(canonicalize ? node.canonicalize : node)
616✔
296
        else
297
          node.to_s
2,246✔
298
      end
299
    end
300
  end
301

302

303
  # Extend RDF::Enumerables with these functions.
304
  module Enumerable
2✔
305
    include RDF::Isomorphic
2✔
306
  end
307
  class Enumerable::Enumerator
2✔
308
    include RDF::Isomorphic
2✔
309
  end
310
end
311

STATUS · Troubleshooting · Open an Issue · Sales · Support · CAREERS · ENTERPRISE · START FREE · SCHEDULE DEMO
ANNOUNCEMENTS · TWITTER · TOS & SLA · Supported CI Services · What's a CI service? · Automated Testing

© 2025 Coveralls, Inc