Class: HexaPDF::Document::Metadata

Inherits: Object
Defined in: lib/hexapdf/document/metadata.rb

Overview

This class provides methods for reading and writing the document-level metadata.

When an instance is created (usually through HexaPDF::Document#metadata), the metadata is read from the document’s information dictionary (see HexaPDF::Type::Info) and made available through the various methods.

By default, the metadata is written both to the information dictionary and to the document's metadata stream (see HexaPDF::Type::Metadata) when the document is written. This can be controlled via the #write_info_dict and #write_metadata_stream methods.

While HexaPDF can write an XMP packet (in a limited form) to the document's metadata stream, it provides no way to read XMP metadata. If reading functionality or extended writing functionality is needed, make sure this class does not write the metadata stream (see #write_metadata_stream) and read or create the metadata stream yourself.

Caveats

  • Disabling writing to the information dictionary only prevents some entries from being written. The #producer is always written to the information dictionary as per the AGPL license terms. The #modification_date may be written depending on the arguments to HexaPDF::Document#write.

  • If writing the metadata stream is enabled, any existing metadata stream is completely overwritten. The existing stream is not updated with the changed information, so any XMP data in it that this class does not manage is lost.
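The first caveat can be illustrated with a small, self-contained sketch. It maps the XMP names listed for the accessor methods below to the standard information dictionary keys and shows that /Producer is kept even when writing the information dictionary is disabled. The `INFO_KEYS` table and the `info_dict_entries` helper are hypothetical; they only model the documented behaviour and are not part of HexaPDF's API, whose real implementation may differ (for example in how values are converted).

```ruby
# Hypothetical mapping from [namespace prefix, XMP property] to the
# corresponding information dictionary key (key names as in the PDF spec).
INFO_KEYS = {
  ['dc', 'title'] => :Title,
  ['dc', 'creator'] => :Author,
  ['dc', 'description'] => :Subject,
  ['pdf', 'Keywords'] => :Keywords,
  ['xmp', 'CreatorTool'] => :Creator,
  ['pdf', 'Producer'] => :Producer,
  ['xmp', 'CreateDate'] => :CreationDate,
  ['xmp', 'ModifyDate'] => :ModDate,
  ['pdf', 'Trapped'] => :Trapped,
}.freeze

# Returns the entries that would end up in the information dictionary.
# metadata is a hash like {['dc', 'title'] => 'Report', ...}.
def info_dict_entries(metadata, write_info_dict:)
  entries = metadata.each_with_object({}) do |(name, value), result|
    key = INFO_KEYS[name]
    result[key] = value if key
  end
  # The producer is always written, regardless of write_info_dict.
  write_info_dict ? entries : entries.slice(:Producer)
end

meta = { ['dc', 'title'] => 'Report', ['pdf', 'Producer'] => 'HexaPDF' }
info_dict_entries(meta, write_info_dict: false) # only the :Producer entry remains
```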

Adding custom metadata properties

All the properties specified for the information dictionary are supported.

Furthermore, HexaPDF supports writing custom properties to the metadata stream. For this to work, the XMP namespaces used need to be registered using #register_namespace. Additionally, the types of all XMP properties used need to be registered using #register_property_type.

The following types for XMP properties are supported:

String

Maps to the XMP simple string value. Values need to be of type String.

Date

Maps to the XMP simple string value, correctly formatted. Values need to be of type Time, Date, or DateTime.

URI

Maps to the XMP simple value variant of URI. Values need to be of type String or URI.

Boolean

Maps to the XMP simple string value, correctly formatted. Values need to be either true or false.

OrderedArray

Maps to the XMP ordered array. Values need to be of type Array and items must be XMP simple values.

UnorderedArray

Maps to the XMP unordered array. Values need to be of type Array and items must be XMP simple values.

LanguageArray

Maps to the XMP language alternatives array. Values need to be of type Array and items must either be strings (which are associated with the #default_language) or LocalizedString instances.
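To make the type list above more concrete, the following sketch renders a single XMP property element for each supported type. It is a self-contained approximation under stated assumptions, not HexaPDF's serializer: the helper names `xml_escape` and `xmp_property` are hypothetical, the `LocalizedString` class is a stand-in assumed to be a String carrying a `language` attribute, and HexaPDF's actual output may differ in namespace declarations, whitespace and details.

```ruby
require 'date'
require 'time'

# Stand-in for Metadata::LocalizedString: assumed here to be a String that
# additionally carries a language tag.
class LocalizedString < String
  attr_accessor :language
end

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escapes the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: renders one XMP property element (e.g. "pdf:Producer")
# for a value of the given registered type.
def xmp_property(name, type, value, default_language = 'en')
  case type
  when 'String'
    "<#{name}>#{xml_escape(value)}</#{name}>"
  when 'Date'
    time = value.is_a?(Time) ? value : value.to_time # Date and DateTime
    "<#{name}>#{time.iso8601}</#{name}>"
  when 'URI'
    "<#{name} rdf:resource=\"#{xml_escape(value)}\"/>"
  when 'Boolean'
    "<#{name}>#{value ? 'True' : 'False'}</#{name}>"
  when 'OrderedArray', 'UnorderedArray'
    container = type == 'OrderedArray' ? 'rdf:Seq' : 'rdf:Bag'
    items = value.map {|item| "<rdf:li>#{xml_escape(item)}</rdf:li>" }.join
    "<#{name}><#{container}>#{items}</#{container}></#{name}>"
  when 'LanguageArray'
    items = value.map do |item|
      # Plain strings fall back to the default language.
      lang = item.respond_to?(:language) && item.language ? item.language : default_language
      "<rdf:li xml:lang=\"#{lang}\">#{xml_escape(item)}</rdf:li>"
    end.join
    "<#{name}><rdf:Alt>#{items}</rdf:Alt></#{name}>"
  else
    raise ArgumentError, "Unsupported XMP property type: #{type}"
  end
end
```

The sketch shows why the type must be registered up front: the same Ruby value (for example an Array) is serialized quite differently depending on whether it is declared as an ordered, unordered or language array.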

See: PDF2.0 s14.3, www.adobe.com/products/xmp.html

Defined Under Namespace

Classes: LocalizedString

Constant Summary

PREDEFINED_NAMESPACES =

Contains a mapping of predefined prefixes for XMP namespaces for metadata.

{
  "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "xmp" => "http://ns.adobe.com/xap/1.0/",
  "pdf" => "http://ns.adobe.com/pdf/1.3/",
  "dc"  => "http://purl.org/dc/elements/1.1/",
  "x"   => "adobe:ns:meta/",
}.freeze

PREDEFINED_PROPERTIES =

Contains a mapping of predefined XMP properties to their types, i.e. a nested mapping from namespace URI to property name to type.

{
  "http://ns.adobe.com/xap/1.0/" => {
    'CreatorTool' => 'String',
    'CreateDate' => 'Date',
    'ModifyDate' => 'Date',
  }.freeze,
  "http://ns.adobe.com/pdf/1.3/" => {
    'Keywords' => 'String',
    'Producer' => 'String',
    'Trapped' => 'Boolean',
  }.freeze,
  "http://purl.org/dc/elements/1.1/" => {
    'creator' => 'OrderedArray',
    'description' => 'LanguageArray',
    'title' => 'LanguageArray',
  }.freeze,
}.freeze

Constructor Details

#initialize(document) ⇒ Metadata

Creates a new Metadata object for the given PDF document.



# File 'lib/hexapdf/document/metadata.rb', line 148

def initialize(document)
  @document = document
  @namespaces = PREDEFINED_NAMESPACES.dup
  @properties = PREDEFINED_PROPERTIES.transform_values {|value| value.dup}
  @default_language = document.catalog[:Lang] || 'en'
  @metadata = Hash.new {|h, k| h[k] = {} }
  write_info_dict(true)
  write_metadata_stream(true)
  @document.register_listener(:complete_objects, &method(:write_metadata))
  
end
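The constructor registers #write_metadata as a listener for the :complete_objects message, which is why (as the overview describes) nothing is written until the document itself is written, and why the write flags can still be changed after construction. The sketch below models that pattern with hypothetical stand-ins: `FakeDocument`, its `trigger` method and `SketchMetadata` are not HexaPDF classes, and it assumes the real document sends :complete_objects while writing.

```ruby
# Minimal stand-in for a document that lets objects hook into the write process.
class FakeDocument
  def initialize
    @listeners = Hash.new {|h, k| h[k] = [] }
  end

  # Stores a callable to be run when the named message is triggered.
  def register_listener(name, &block)
    @listeners[name] << block
  end

  # Runs all callables registered for the named message.
  def trigger(name)
    @listeners[name].each(&:call)
  end
end

# Hypothetical, stripped-down model of the write-flag handling.
class SketchMetadata
  attr_reader :written

  def initialize(document)
    @written = []
    write_info_dict(true)
    write_metadata_stream(true)
    # Same pattern as the constructor above: a bound method used as the block.
    document.register_listener(:complete_objects, &method(:write_metadata))
  end

  def write_info_dict(value)
    @write_info_dict = value
  end

  def write_metadata_stream(value)
    @write_metadata_stream = value
  end

  private

  # Runs only when the document triggers :complete_objects; records what
  # would be written according to the current flags.
  def write_metadata
    @written << :info_dict if @write_info_dict
    @written << :metadata_stream if @write_metadata_stream
  end
end

doc = FakeDocument.new
meta = SketchMetadata.new(doc)
meta.write_metadata_stream(false) # still effective: nothing has been written yet
doc.trigger(:complete_objects)    # only the info dict is recorded
```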

Instance Method Details

#author(value = :UNSET) ⇒ Object

:call-seq:

metadata.author           -> author or nil
metadata.author(value)    -> value

Returns the name of the person who created the document (author) if no argument is given. Otherwise sets the author to the given value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:creator.



# File 'lib/hexapdf/document/metadata.rb', line 269

def author(value = :UNSET)
  property('dc', 'creator', value)
end

#creation_date(value = :UNSET) ⇒ Object

:call-seq:

metadata.creation_date           -> creation_date or nil
metadata.creation_date(value)    -> value

Returns the date and time (a Time object) the document was created if no argument is given. Otherwise sets the creation date to the given value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:CreateDate.



# File 'lib/hexapdf/document/metadata.rb', line 347

def creation_date(value = :UNSET)
  property('xmp', 'CreateDate', value)
end

#creator(value = :UNSET) ⇒ Object

:call-seq:

metadata.creator           -> creator or nil
metadata.creator(value)    -> value

Returns the name of the PDF processor that created the original document from which this PDF was converted if no argument is given. Otherwise sets the name of the creator tool to the given value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:CreatorTool.



# File 'lib/hexapdf/document/metadata.rb', line 317

def creator(value = :UNSET)
  property('xmp', 'CreatorTool', value)
end

#default_language(value = :UNSET) ⇒ Object

:call-seq:

metadata.default_language          -> language
metadata.default_language(value)   -> value

Returns the default language in RFC3066 format used for unlocalized strings if no argument is given. Otherwise sets the default language to the given language.

The initial default language is taken from the document catalog's /Lang entry. If that is not set, the default language is assumed to be English ('en').



# File 'lib/hexapdf/document/metadata.rb', line 169

def default_language(value = :UNSET)
  if value == :UNSET
    @default_language
  else
    @default_language = value
  end
end

#keywords(value = :UNSET) ⇒ Object

:call-seq:

metadata.keywords           -> keywords or nil
metadata.keywords(value)    -> value

Returns the keywords associated with the document if no argument is given. Otherwise sets keywords to the given value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Keywords.



# File 'lib/hexapdf/document/metadata.rb', line 301

def keywords(value = :UNSET)
  property('pdf', 'Keywords', value)
end

#modification_date(value = :UNSET) ⇒ Object

:call-seq:

metadata.modification_date           -> modification_date or nil
metadata.modification_date(value)    -> value

Returns the date and time (a Time object) the document was most recently modified if no argument is given. Otherwise sets the modification date to the given value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:ModifyDate.



# File 'lib/hexapdf/document/metadata.rb', line 362

def modification_date(value = :UNSET)
  property('xmp', 'ModifyDate', value)
end

#namespace(ns) ⇒ Object

Returns the namespace URI associated with the given prefix. Raises HexaPDF::Error if the prefix has not been registered.



# File 'lib/hexapdf/document/metadata.rb', line 207

def namespace(ns)
  @namespaces.fetch(ns) do
    raise HexaPDF::Error, "Namespace prefix '#{ns}' not registered"
  end
end
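The lookup in #namespace relies on Hash#fetch with a block, which turns a missing key into a descriptive error instead of returning nil. The following self-contained sketch reproduces that strategy; `MetadataError` stands in for HexaPDF::Error, and `NAMESPACES` and `lookup_namespace` are hypothetical names used only for illustration.

```ruby
# Stand-in for HexaPDF::Error.
class MetadataError < StandardError; end

NAMESPACES = {
  'dc' => 'http://purl.org/dc/elements/1.1/',
  'pdf' => 'http://ns.adobe.com/pdf/1.3/',
}.freeze

# Same lookup strategy as #namespace: the block only runs for unknown prefixes.
def lookup_namespace(namespaces, prefix)
  namespaces.fetch(prefix) do
    raise MetadataError, "Namespace prefix '#{prefix}' not registered"
  end
end

lookup_namespace(NAMESPACES, 'dc') # the Dublin Core URI
```

Because #property resolves its prefix through #namespace, a mistyped prefix fails loudly instead of silently creating a new, unused entry in the metadata hash.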

#producer(value = :UNSET) ⇒ Object

:call-seq:

metadata.producer           -> producer or nil
metadata.producer(value)    -> value

Returns the name of the PDF processor that converted the original document to PDF if no argument is given. Otherwise sets the name of the producer to the given value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Producer.



# File 'lib/hexapdf/document/metadata.rb', line 332

def producer(value = :UNSET)
  property('pdf', 'Producer', value)
end

#property(ns, property, value = :UNSET) ⇒ Object

:call-seq:

metadata.property(ns_prefix, name)           -> property_value
metadata.property(ns_prefix, name, value)    -> value

Returns the value for the property specified via the namespace prefix ns_prefix and name if the value argument is not provided. Otherwise sets the property to value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.



# File 'lib/hexapdf/document/metadata.rb', line 230

def property(ns, property, value = :UNSET)
  ns = @metadata[namespace(ns)]
  if value == :UNSET
    ns[property]
  elsif value.nil?
    ns.delete(property)
  else
    ns[property] = value
  end
end
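The :UNSET default is what lets one method act as getter, setter and deleter: nil cannot serve as the "no argument" marker because passing nil is meaningful (it deletes the property). The accessors such as #title and #author simply delegate to #property. The sketch below models this with a hypothetical `PropertyStore` class; it mirrors the logic shown above but is not part of HexaPDF.

```ruby
# Hypothetical store modelling #property: one hash per namespace URI.
class PropertyStore
  UNSET = :UNSET

  def initialize(namespaces)
    @namespaces = namespaces
    @metadata = Hash.new {|h, k| h[k] = {} }
  end

  def property(prefix, name, value = UNSET)
    ns = @metadata[@namespaces.fetch(prefix)]
    if value == UNSET
      ns[name]          # getter: nil when the property is not set
    elsif value.nil?
      ns.delete(name)   # passing nil deletes the property
    else
      ns[name] = value  # setter
    end
  end
end

store = PropertyStore.new({ 'dc' => 'http://purl.org/dc/elements/1.1/' })
store.property('dc', 'title', 'Report') # set
store.property('dc', 'title')           # get
store.property('dc', 'title', nil)      # delete
```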

#register_namespace(prefix, uri) ⇒ Object

Registers the prefix for the given namespace uri.



# File 'lib/hexapdf/document/metadata.rb', line 202

def register_namespace(prefix, uri)
  @namespaces[prefix] = uri
end

#register_property_type(prefix, property, type) ⇒ Object

Registers the property for the namespace specified via prefix as the given type.

The argument type has to be one of the following: ‘String’, ‘Date’, ‘URI’, ‘Boolean’, ‘OrderedArray’, ‘UnorderedArray’, or ‘LanguageArray’.



# File 'lib/hexapdf/document/metadata.rb', line 217

def register_property_type(prefix, property, type)
  (@properties[namespace(prefix)] ||= {})[property] = type
end
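#register_property_type updates the property table in place and creates the per-namespace hash on demand via `||=`. Since the inner hashes of PREDEFINED_PROPERTIES are frozen, the constructor's `transform_values {|value| value.dup}` is what allows new types to be added to a predefined namespace. The sketch below demonstrates this with a trimmed-down copy of the constant; `PREDEFINED`, `register_type` and `frozen_registration?` are hypothetical names. Note that, as in the code above, the type string itself is not validated.

```ruby
PREDEFINED = {
  'http://ns.adobe.com/pdf/1.3/' => { 'Producer' => 'String' }.freeze,
}.freeze

# Same update expression as #register_property_type.
def register_type(properties, uri, property, type)
  (properties[uri] ||= {})[property] = type
end

# Returns true if registering fails because the inner hash is frozen.
def frozen_registration?(properties, uri)
  register_type(properties, uri, 'Custom', 'String')
  false
rescue FrozenError
  true
end

shallow = PREDEFINED.dup # outer hash copied, inner hashes still frozen
deep = PREDEFINED.transform_values {|value| value.dup } # as in the constructor

frozen_registration?(shallow, 'http://ns.adobe.com/pdf/1.3/') # true
frozen_registration?(deep, 'http://ns.adobe.com/pdf/1.3/')    # false
register_type(deep, 'http://example.com/ns/', 'score', 'String') # new namespace entry
```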

#subject(value = :UNSET) ⇒ Object

:call-seq:

metadata.subject           -> subject or nil
metadata.subject(value)    -> value

Returns the subject of the document if no argument is given. Otherwise sets the subject to the given value.

The language for the subject is specified via #default_language.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:description.



# File 'lib/hexapdf/document/metadata.rb', line 286

def subject(value = :UNSET)
  property('dc', 'description', value)
end

#title(value = :UNSET) ⇒ Object

:call-seq:

metadata.title          -> title or nil
metadata.title(value)    -> value

Returns the document’s title if no argument is given. Otherwise sets the document’s title to the given value.

The language for the title is specified via #default_language.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:title.



# File 'lib/hexapdf/document/metadata.rb', line 254

def title(value = :UNSET)
  property('dc', 'title', value)
end

#trapped(value = :UNSET) ⇒ Object

:call-seq:

metadata.trapped           -> trapped or nil
metadata.trapped(value)    -> value

If no argument is given, returns true if the document has been modified to include trapping information. Otherwise sets the trapped status to the given boolean value.

The value nil is returned if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Trapped.



# File 'lib/hexapdf/document/metadata.rb', line 377

def trapped(value = :UNSET)
  property('pdf', 'Trapped', value)
end
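In an information dictionary, /Trapped is stored as a PDF name (/True, /False or /Unknown) rather than as a boolean. How HexaPDF converts between the two is not shown on this page, so the following sketch only assumes a straightforward mapping, with PDF names represented as Ruby symbols. The helpers `trapped_to_info` and `trapped_from_info` are hypothetical, and treating /Unknown as "not set" is an assumption of this sketch.

```ruby
# Hypothetical conversion from the boolean used by #trapped to a /Trapped name.
def trapped_to_info(value)
  case value
  when true then :True
  when false then :False
  when nil then nil # property not set: omit the entry
  else raise ArgumentError, "trapped must be true, false or nil, got #{value.inspect}"
  end
end

# Hypothetical conversion back; /Unknown (or a missing entry) has no boolean form.
def trapped_from_info(name)
  case name
  when :True then true
  when :False then false
  else nil
  end
end
```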

#write_info_dict(value) ⇒ Object

Makes HexaPDF write the information dictionary if value is true.

See the class documentation for caveats.



# File 'lib/hexapdf/document/metadata.rb', line 185

def write_info_dict(value)
  @write_info_dict = value
end

#write_info_dict? ⇒ Boolean

Returns true if the information dictionary should be written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/document/metadata.rb', line 178

def write_info_dict?
  @write_info_dict
end

#write_metadata_stream(value) ⇒ Object

Makes HexaPDF write the metadata stream if value is true.

See the class documentation for caveats.



# File 'lib/hexapdf/document/metadata.rb', line 197

def write_metadata_stream(value)
  @write_metadata_stream = value
end

#write_metadata_stream? ⇒ Boolean

Returns true if the metadata stream should be written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/document/metadata.rb', line 190

def write_metadata_stream?
  @write_metadata_stream
end