Class: HexaPDF::Revisions
- Inherits: Object
- Includes: Enumerable
- Defined in: lib/hexapdf/revisions.rb
Overview
Manages the revisions of a PDF document.
A PDF document has one revision when it is created. Later, new revisions are added when changes are made. This allows for adding information/content to a PDF file without changing the original content.
The order of the revisions is important. In HexaPDF the oldest revision always has index 0 and the newest revision the highest index. This is also the order in which the revisions get written.
Important: It is possible to manipulate the individual revisions and their objects oneself but this should only be done if one is familiar with the inner workings of HexaPDF. Otherwise it is best to use the convenience methods of this class to create, access or delete indirect objects.
See: PDF1.7 s7.5.6, HexaPDF::Revision
Instance Attribute Summary
- #parser ⇒ Object (readonly)
  The Parser instance used for reading the initial revisions.
Class Method Summary
- .from_io(document, io) ⇒ Object
  Loads all revisions for the document from the given IO and returns the created Revisions object.
Instance Method Summary
- #add ⇒ Object
  Adds a new empty revision to the document and returns it.
- #add_object(obj) ⇒ Object
  Adds the given HexaPDF::Object to the current revision and returns it.
- #all ⇒ Object
  Returns a list of all revisions.
- #current ⇒ Object
  Returns the current revision.
- #delete_object(ref) ⇒ Object
  Deletes the indirect object specified by an exact reference or by an object number.
- #each(&block) ⇒ Object
  Iterates over all revisions from the oldest to the current one.
- #each_object(only_current: true, only_loaded: false, &block) ⇒ Object
  Yields every object and optionally the revision it is in.
- #initialize(document, initial_revisions: nil, parser: nil) ⇒ Revisions (constructor)
  Creates a new revisions object for the given PDF document.
- #merge(range = 0..-1) ⇒ Object
  Merges the revisions specified by the given range into one.
- #next_oid ⇒ Object
  Returns the next object identifier that should be used when adding a new object.
- #object(ref) ⇒ Object
  Returns the current version of the indirect object for the given exact reference or object number.
- #object?(ref) ⇒ Boolean
  Returns true if one of the revisions contains an indirect object for the given exact reference or object number.
Constructor Details
#initialize(document, initial_revisions: nil, parser: nil) ⇒ Revisions
Creates a new revisions object for the given PDF document.
Options:
- initial_revisions
  An array of revisions that should initially be used. If this option is not specified, a single empty revision is added.
- parser
  The parser with which the initial revisions were read. If this option is not specified even though the document was read from an IO stream, some features, such as incremental writing, may not work.
    # File 'lib/hexapdf/revisions.rb', line 127
    def initialize(document, initial_revisions: nil, parser: nil)
      @document = document
      @parser = parser

      @revisions = []
      if initial_revisions
        @revisions += initial_revisions
      else
        add
      end
    end
Instance Attribute Details
#parser ⇒ Object (readonly)
The Parser instance used for reading the initial revisions.
    # File 'lib/hexapdf/revisions.rb', line 113
    def parser
      @parser
    end
Class Method Details
.from_io(document, io) ⇒ Object
Loads all revisions for the document from the given IO and returns the created Revisions object.
If the io object is nil, an empty Revisions object is returned.
    # File 'lib/hexapdf/revisions.rb', line 67
    def from_io(document, io)
      return new(document) if io.nil?

      parser = Parser.new(io, document)
      object_loader = lambda {|xref_entry| parser.load_object(xref_entry) }

      revisions = []
      begin
        offset = parser.startxref_offset
        seen_xref_offsets = {}

        while offset && !seen_xref_offsets.key?(offset)
          # PDF1.7 s7.5.5 states that :Prev needs to be indirect, Adobe's reference 3.4.4 says it
          # should be direct. Adobe's POV is followed here. Same with :XRefStm.
          xref_section, trailer = parser.load_revision(offset)
          seen_xref_offsets[offset] = true

          stm = trailer[:XRefStm]
          if stm && !seen_xref_offsets.key?(stm)
            stm_xref_section, = parser.load_revision(stm)
            stm_xref_section.merge!(xref_section)
            xref_section = stm_xref_section
            seen_xref_offsets[stm] = true
          end

          revisions.unshift(Revision.new(document.wrap(trailer, type: :XXTrailer),
                                         xref_section: xref_section, loader: object_loader))

          offset = trailer[:Prev]
        end
      rescue HexaPDF::MalformedPDFError
        reconstructed_revision = parser.reconstructed_revision
        unless revisions.empty?
          reconstructed_revision.trailer.data.value = revisions.last.trailer.data.value
        end
        revisions << reconstructed_revision
      end

      document.version = parser.file_header_version rescue '1.0'
      new(document, initial_revisions: revisions, parser: parser)
    end
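The loop in from_io follows the :Prev entries from the newest cross-reference section back to the oldest and stops when an offset repeats. The following is a minimal sketch of that traversal in plain Ruby. It does not use HexaPDF: the TOY_TRAILERS table, its offsets, the :name entries and the collect_revisions helper are all hypothetical and exist only for illustration.

```ruby
# Toy model (not HexaPDF): each xref offset maps to a trailer hash that may
# point to an older revision via :Prev. Offsets and names are made up.
TOY_TRAILERS = {
  900 => {Prev: 500, name: "rev3"},
  500 => {Prev: 100, name: "rev2"},
  100 => {name: "rev1"},
}.freeze

# Walks the :Prev chain starting at the newest offset and returns the
# trailers ordered from oldest to newest. An offset that was already seen
# ends the walk, which guards against cyclic :Prev entries.
def collect_revisions(trailers, start_offset)
  revisions = []
  seen = {}
  offset = start_offset
  while offset && !seen.key?(offset)
    trailer = trailers.fetch(offset)
    seen[offset] = true
    revisions.unshift(trailer) # older revisions end up at the front
    offset = trailer[:Prev]
  end
  revisions
end

p collect_revisions(TOY_TRAILERS, 900).map { |t| t[:name] }

# A cyclic chain terminates as well instead of looping forever.
cyclic = {10 => {Prev: 20, name: "a"}, 20 => {Prev: 10, name: "b"}}
p collect_revisions(cyclic, 10).map { |t| t[:name] }
```

Because every trailer is prepended with unshift, the oldest revision ends up at index 0, which matches the ordering described in the overview.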
Instance Method Details
#add ⇒ Object
Adds a new empty revision to the document and returns it.
Note: This method should only be used if one is familiar with the inner workings of HexaPDF and the PDF specification.
    # File 'lib/hexapdf/revisions.rb', line 287
    def add
      if @revisions.empty?
        trailer = {}
      else
        trailer = current.trailer.value.dup
        trailer.delete(:Prev)
        trailer.delete(:XRefStm)
      end

      rev = Revision.new(@document.wrap(trailer, type: :XXTrailer))
      @revisions.push(rev)
      rev
    end
#add_object(obj) ⇒ Object
:call-seq:
revisions.add_object(object) -> object
Adds the given HexaPDF::Object to the current revision and returns it.
If object is a direct object, an object number is automatically assigned.
    # File 'lib/hexapdf/revisions.rb', line 185
    def add_object(obj)
      if obj.indirect? && (rev_obj = current.object(obj.oid))
        if rev_obj.data == obj.data
          return obj
        else
          raise HexaPDF::Error, "Can't add object because there is already " \
            "an object with object number #{obj.oid}"
        end
      end

      obj.oid = next_oid unless obj.indirect?
      current.add(obj)
    end
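The three outcomes of add_object (return an already stored object, raise on a conflicting object number, or number a direct object and store it) can be sketched with a toy model. This is not HexaPDF code: ToyObject, add_toy_object and the convention that object number 0 marks a direct object are assumptions made for the example, and ArgumentError stands in for HexaPDF::Error.

```ruby
# Toy stand-in for a PDF object. Assumption for this sketch: an object
# number of 0 marks a direct object.
ToyObject = Struct.new(:oid, :data) do
  def indirect?
    oid != 0
  end
end

# current is a Hash mapping oid => ToyObject and stands in for the newest
# revision; next_oid is the number to assign to a direct object.
def add_toy_object(current, obj, next_oid)
  if obj.indirect? && (existing = current[obj.oid])
    # Same number and same data: nothing to do, hand the object back.
    return obj if existing.data == obj.data
    raise ArgumentError, "Can't add object because there is already " \
                         "an object with object number #{obj.oid}"
  end

  obj.oid = next_oid unless obj.indirect? # direct objects get a number
  current[obj.oid] = obj
end

current = {}
added = add_toy_object(current, ToyObject.new(0, "x"), 5)
p added.oid
```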
#all ⇒ Object
Returns a list of all revisions.
Note: This method should only be used if one is familiar with the inner workings of HexaPDF and the PDF specification.
    # File 'lib/hexapdf/revisions.rb', line 265
    def all
      @revisions
    end
#current ⇒ Object
Returns the current revision.
Note: This method should only be used if one is familiar with the inner workings of HexaPDF and the PDF specification.
    # File 'lib/hexapdf/revisions.rb', line 257
    def current
      @revisions.last
    end
#delete_object(ref) ⇒ Object
:call-seq:
revisions.delete_object(ref)
revisions.delete_object(oid)
Deletes the indirect object specified by an exact reference or by an object number.
    # File 'lib/hexapdf/revisions.rb', line 204
    def delete_object(ref)
      @revisions.reverse_each do |rev|
        if rev.object?(ref)
          rev.delete(ref)
          break
        end
      end
    end
#each(&block) ⇒ Object
:call-seq:
revisions.each {|rev| block } -> revisions
revisions.each -> Enumerator
Iterates over all revisions from the oldest to the current one.
Note: This method should only be used if one is familiar with the inner workings of HexaPDF and the PDF specification.
    # File 'lib/hexapdf/revisions.rb', line 277
    def each(&block)
      return to_enum(__method__) unless block_given?
      @revisions.each(&block)
      self
    end
#each_object(only_current: true, only_loaded: false, &block) ⇒ Object
:call-seq:
revisions.each_object(only_current: true, only_loaded: false) {|obj| block } -> revisions
revisions.each_object(only_current: true, only_loaded: false) {|obj, rev| block } -> revisions
revisions.each_object(only_current: true, only_loaded: false) -> Enumerator
Yields every object and optionally the revision it is in.
If only_loaded is true, only the already loaded objects of the PDF document are yielded. This only matters when the document instance was created from an existing PDF document.
By default, only the current version of each object is yielded, which implies that each object number is yielded exactly once. If the only_current option is false, all stored objects are yielded from newest to oldest, not only the current version of each object.
The only_current option can make a difference because the document can contain multiple revisions:
- Multiple revisions may contain objects with the same object and generation numbers, e.g. two (different) objects with oid/gen [3,0].
- Additionally, there may also be objects with the same object number but different generation numbers in different revisions, e.g. one object with oid/gen [3,0] and one with oid/gen [3,1].
    # File 'lib/hexapdf/revisions.rb', line 236
    def each_object(only_current: true, only_loaded: false, &block)
      unless block_given?
        return to_enum(__method__, only_current: only_current, only_loaded: only_loaded)
      end

      yield_rev = (block.arity == 2)
      oids = {}
      @revisions.reverse_each do |rev|
        rev.each(only_loaded: only_loaded) do |obj|
          next if only_current && oids.include?(obj.oid)
          yield_rev ? yield(obj, rev) : yield(obj)
          oids[obj.oid] = true
        end
      end
      self
    end
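The effect of only_current is easiest to see on a small example. The sketch below models revisions as plain hashes keyed by [oid, gen] and mirrors the newest-to-oldest walk with a set of seen object numbers. It is a toy model, not HexaPDF: TOY_REVISIONS, its contents and the toy_objects helper are hypothetical.

```ruby
# Toy model (not HexaPDF): each revision maps [oid, gen] to a value.
# The array is ordered from oldest to newest, as in HexaPDF.
TOY_REVISIONS = [
  {[1, 0] => "catalog v1", [3, 0] => "page v1"},
  {[3, 0] => "page v2"},
  {[3, 1] => "page v3", [4, 0] => "font"},
].freeze

# Returns [oid, gen, value] triples, walking from the newest revision to the
# oldest. With only_current, an object number is reported only the first time
# it is seen, i.e. in its newest version.
def toy_objects(revisions, only_current: true)
  seen_oids = {}
  result = []
  revisions.reverse_each do |rev|
    rev.each do |(oid, gen), value|
      next if only_current && seen_oids.key?(oid)
      result << [oid, gen, value]
      seen_oids[oid] = true
    end
  end
  result
end

p toy_objects(TOY_REVISIONS)
p toy_objects(TOY_REVISIONS, only_current: false).size
```

In this data, object number 3 exists three times (twice as [3,0], once as [3,1]). With only_current only "page v3" is reported; without it all five stored entries are.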
#merge(range = 0..-1) ⇒ Object
:call-seq:
revisions.merge(range = 0..-1) -> revisions
Merges the revisions specified by the given range into one. Objects from newer revisions overwrite those from older ones.
    # File 'lib/hexapdf/revisions.rb', line 306
    def merge(range = 0..-1)
      @revisions[range].reverse.each_cons(2) do |rev, prev_rev|
        prev_rev.trailer.value.replace(rev.trailer.value)
        rev.each do |obj|
          if obj.data != prev_rev.object(obj)&.data
            prev_rev.delete(obj.oid, mark_as_free: false)
            prev_rev.add(obj)
          end
        end
      end
      _first, *other = *@revisions[range]
      other.each {|rev| @revisions.delete(rev) }
      self
    end
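The cascading merge can be illustrated with plain hashes: each newer revision is folded into its predecessor until everything in the range sits in the oldest selected revision. This is a simplified sketch, not HexaPDF code. The merge_toy_revisions helper is hypothetical, revisions are modeled as hashes from object number to value, and trailers are ignored.

```ruby
# Toy model (not HexaPDF): revisions are hashes mapping oid => value,
# ordered from oldest to newest. Trailers are not modeled.
def merge_toy_revisions(revisions, range = 0..-1)
  selected = revisions[range]
  # Walk pairs from newest to oldest and fold each newer revision into the
  # one before it; entries from the newer revision overwrite older ones.
  selected.reverse.each_cons(2) do |rev, prev_rev|
    prev_rev.merge!(rev)
  end
  # Keep only the first revision of the range; compare by identity so that
  # an unrelated revision with equal contents is not removed by accident.
  _first, *others = selected
  others.each { |rev| revisions.delete_if { |r| r.equal?(rev) } }
  revisions
end

revisions = [{1 => "a", 2 => "b"}, {2 => "b2"}, {3 => "c"}]
p merge_toy_revisions(revisions)
```

Passing a narrower range such as 1..-1 leaves the oldest revision untouched and merges only the later ones.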
#next_oid ⇒ Object
Returns the next object identifier that should be used when adding a new object.
    # File 'lib/hexapdf/revisions.rb', line 140
    def next_oid
      @revisions.map(&:next_free_oid).max
    end
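Taking the maximum over all revisions matters because the highest object number in use may live in an older revision, not in the current one. A small sketch of this, using a hypothetical ToyRevision struct whose next_free_oid is assumed to be one more than its largest stored object number (how HexaPDF::Revision actually computes this value is not shown here):

```ruby
# Toy stand-in for a revision that only tracks its object numbers.
# Assumption: next_free_oid is the largest stored number plus one.
ToyRevision = Struct.new(:oids) do
  def next_free_oid
    (oids.max || 0) + 1
  end
end

# Mirrors the one-liner above: the next number must be free in all revisions.
def toy_next_oid(revisions)
  revisions.map(&:next_free_oid).max
end

revisions = [ToyRevision.new([1, 2, 7]), ToyRevision.new([3]), ToyRevision.new([])]
p toy_next_oid(revisions)
```

Here the current (last) revision is empty, yet the result is 8 because the oldest revision already uses object number 7.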
#object(ref) ⇒ Object
:call-seq:
revisions.object(ref) -> obj or nil
revisions.object(oid) -> obj or nil
Returns the current version of the indirect object for the given exact reference or for the given object number.
For references to unknown objects, nil is returned but free objects are represented by a PDF Null object, not by nil!
See: PDF1.7 s7.3.9
    # File 'lib/hexapdf/revisions.rb', line 155
    def object(ref)
      i = @revisions.size - 1
      while i >= 0
        if (result = @revisions[i].object(ref))
          return result
        end
        i -= 1
      end
      nil
    end
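The lookup searches the revisions from newest to oldest and returns the first hit, so a newer version shadows older ones. The sketch below models this with plain hashes. It is not HexaPDF code: toy_lookup is a hypothetical helper, and the symbol :null is merely a stand-in for the PDF Null object that represents a free object.

```ruby
# Toy model (not HexaPDF): revisions are hashes mapping oid => value,
# ordered from oldest to newest. :null stands in for a PDF Null object.
def toy_lookup(revisions, oid)
  revisions.reverse_each do |rev|
    return rev[oid] if rev.key?(oid) # newest revision containing oid wins
  end
  nil # unknown object number
end

revisions = [{1 => "catalog", 2 => "page"}, {2 => :null}]
p toy_lookup(revisions, 1) # found in the older revision
p toy_lookup(revisions, 2) # freed in the newer revision, not nil
p toy_lookup(revisions, 9) # unknown, so nil
```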
#object?(ref) ⇒ Boolean
:call-seq:
revisions.object?(ref) -> true or false
revisions.object?(oid) -> true or false
Returns true if one of the revisions contains an indirect object for the given exact reference or for the given object number.
Even though this method might return true for some references, #object may return nil because this method takes all revisions into account.
    # File 'lib/hexapdf/revisions.rb', line 175
    def object?(ref)
      @revisions.any? {|rev| rev.object?(ref) }
    end