Module documentation

Main module

For most uses, the main module datapackage.datapackage, which contains the DataPackage class, is all users need to know about. The class can also be imported directly from datapackage:

from datapackage import DataPackage
class datapackage.datapackage.DataPackage(*args, **kwargs)

Class for loading and managing a data package as defined by the Data Package specification: http://www.dataprotocols.org/en/latest/data-packages.html

add_license(license_type, url=None)

Adds a license to the list of licenses for the datapackage.

Parameters:
  • license_type (string) – The name of the license, which should be an Open Definition license ID if an ID exists for the license and otherwise may be the general license name or identifier.
  • url (string) – The URL corresponding to the license. If license_type is a standard Open Definition license ID, the URL is inferred automatically where possible.
add_source(name, web=None, email=None)

Adds a source to the list of sources for this datapackage.

Parameters:
  • name (string) – The human-readable name of the source.
  • web (string) – A URL pointing to the source.
  • email (string) – An email address for the contact of the source.
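The following is a minimal sketch of what add_license and add_source do to the underlying descriptor, assuming the descriptor is a plain dictionary holding licenses and sources lists. The functions and the LICENSE_URLS lookup table are hypothetical illustrations, not the library's implementation, and the single table entry is an example only.

```python
# Hypothetical lookup table; the real library loads its own (see load_licenses).
LICENSE_URLS = {
    "odc-pddl": "http://opendatacommons.org/licenses/pddl/",  # illustrative entry
}


def add_license(descriptor, license_type, url=None):
    """Append a {type, url} hash to descriptor["licenses"], inferring url if possible."""
    if url is None:
        url = LICENSE_URLS.get(license_type)  # stays None for unknown license types
    entry = {"type": license_type}
    if url is not None:
        entry["url"] = url
    descriptor.setdefault("licenses", []).append(entry)


def add_source(descriptor, name, web=None, email=None):
    """Append a source hash to descriptor["sources"], skipping fields left as None."""
    source = {"name": name}
    if web is not None:
        source["web"] = web
    if email is not None:
        source["email"] = email
    descriptor.setdefault("sources", []).append(source)


descriptor = {}
add_license(descriptor, "odc-pddl")  # url inferred from the table
add_source(descriptor, "Example Agency", web="https://example.org")
print(descriptor)
```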
as_dict()

Returns the data package as a dictionary. This overrides the base-class implementation in order to handle resources.

bump_major_version(keep_metadata=False)

Increases the major version by one, e.g. from 1.0.0 to 2.0.0.

Note that this sets the minor and patch versions to zero, and erases the prerelease and metadata information (unless keep_metadata is True, in which case the metadata will be preserved).

bump_minor_version(keep_metadata=False)

Increases the minor version by one, e.g. from 1.0.0 to 1.1.0.

Note that this sets the patch version to zero, and erases the prerelease and metadata information (unless keep_metadata is True, in which case the metadata will be preserved).

bump_patch_version(keep_metadata=False)

Increases the patch version by one, e.g. from 1.0.0 to 1.0.1.

Note that this erases the prerelease and metadata information (unless keep_metadata is True, in which case the metadata will be preserved).
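A minimal sketch of the three bumping rules, using a hypothetical stand-in for datapackage.util.SemanticVersion and a hypothetical bump() helper; it is not the library's code. Prerelease information is always dropped, and metadata is kept only when keep_metadata is True.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field order as datapackage.util.SemanticVersion.
SemanticVersion = namedtuple(
    "SemanticVersion", ["major", "minor", "patch", "prerelease", "metadata"])


def bump(version, part, keep_metadata=False):
    """Increase `part` ('major', 'minor' or 'patch') by one and reset the lower parts."""
    major, minor, patch = version.major, version.minor, version.patch
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError("part must be 'major', 'minor' or 'patch'")
    # Prerelease information is always erased; metadata survives only on request.
    metadata = version.metadata if keep_metadata else None
    return SemanticVersion(major, minor, patch, None, metadata)


v = SemanticVersion(1, 2, 3, "beta", "build.5")
major = bump(v, "major")                      # 2.0.0, prerelease and metadata dropped
minor = bump(v, "minor", keep_metadata=True)  # 1.3.0, metadata "build.5" kept
patch = bump(v, "patch")                      # 1.2.4, prerelease and metadata dropped
```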

contributors

List of contributors, each represented as a Person object.

From the specification: an array of hashes, each containing the details of a contributor. Each hash MUST contain a “name” property and MAY contain “email” and “web” properties. By convention, the first contributor is the original author of the package.

data

An iterator that yields a dictionary representation of each row in all resources.
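Conceptually, data chains together the rows of every resource. The sketch below assumes CSV resources held in memory as strings (the resources dictionary and its contents are hypothetical) and uses csv.DictReader and itertools.chain from the standard library; it does not reproduce how the library actually reads resources.

```python
import csv
import io
import itertools

# Hypothetical in-memory resources standing in for the package's CSV files.
resources = {
    "first": "id,value\n1,a\n2,b\n",
    "second": "id,value\n3,c\n",
}


def iter_rows(text):
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(io.StringIO(text))


def all_rows(resources):
    """Chain the rows of every resource into a single iterator."""
    return itertools.chain.from_iterable(iter_rows(t) for t in resources.values())


for row in all_rows(resources):
    print(row)
```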

datapackage_version

The version of the data package specification this datapackage.json conforms to. It should follow the Semantic Versioning requirements (http://semver.org/).

description

The description of the dataset as described by its descriptor.

get_data(resource)

Generator that yields the data for a given resource.

get_descriptor()

Gets the descriptor for the data package (as defined by the standard) as a dictionary. The descriptor location is built by joining the URI given to the constructor with the descriptor URN, following the join rules of urlparse.urljoin. For URLs, this means that if the URI does not end with a slash, its last path segment is replaced by the descriptor URN.
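The join rule is easiest to see with urllib.parse.urljoin, the Python 3 location of urlparse.urljoin. The descriptor name datapackage.json and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin

DESCRIPTOR = "datapackage.json"  # assumed descriptor file name

# With a trailing slash, the descriptor is appended to the path.
with_slash = urljoin("http://example.com/data/", DESCRIPTOR)
print(with_slash)  # http://example.com/data/datapackage.json

# Without one, the last path segment ("data") is replaced.
without_slash = urljoin("http://example.com/data", DESCRIPTOR)
print(without_slash)  # http://example.com/datapackage.json
```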

get_resources()

Gets the data package’s resources as a dictionary. The key for each resource is the value of its name attribute; if no name is provided, the key is an empty string. This means that a resource can be overwritten by another resource with the same name (or with no name).
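A short illustration of the overwriting behaviour, assuming resources are plain dictionaries with an optional name key. The resource hashes are hypothetical.

```python
# Hypothetical resource hashes, as they might appear in a descriptor.
resource_list = [
    {"name": "cities", "path": "cities.csv"},
    {"path": "unnamed-1.csv"},
    {"name": "cities", "path": "cities-2.csv"},
    {"path": "unnamed-2.csv"},
]

# Key each resource by its name, using "" when the name is missing.
by_name = {}
for resource in resource_list:
    by_name[resource.get("name", "")] = resource

print(sorted(by_name))            # ['', 'cities']
print(by_name["cities"]["path"])  # cities-2.csv (the later entry wins)
print(by_name[""]["path"])        # unnamed-2.csv
```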

homepage

URL string for the data package’s website.

image

A link to an image to use for this data package.

keywords

An array of string keywords to assist users searching for the package in catalogs.

license

MUST be a string and its value SHOULD be an Open Definition license ID (preferably one that is Open Definition approved).

licenses

MUST be an array. Each entry MUST be a hash with a type and a url property linking to the actual text. The type SHOULD be an Open Definition license ID if an ID exists for the license and otherwise may be the general license name or identifier.

maintainers

List of maintainers, each represented as a Person object.

From the specification: Array of maintainers of the package. Each maintainer is a hash which must have a “name” property and may optionally provide “email” and “web” properties.

name

The name of the dataset as described by its descriptor. This is a required property, described by the datapackage protocol as follows:

short url-usable (and preferably human-readable) name of the package. This MUST be lower-case and contain only alphanumeric characters along with “.”, “_” or “-” characters. It will function as a unique identifier and therefore SHOULD be unique in relation to any registry in which this package will be deposited (and preferably globally unique).

The name SHOULD be invariant, meaning that it SHOULD NOT change when a data package is updated, unless the new package version should be considered a distinct package, e.g. due to significant changes in structure or interpretation. Version distinction SHOULD be left to the version field. As a corollary, the name also SHOULD NOT include an indication of time range covered.
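A sketch of a check for the character rule quoted above. The is_valid_name helper is hypothetical (it is not part of datapackage), and it interprets "alphanumeric" as ASCII letters and digits only.

```python
import re

# Lower-case ASCII alphanumerics plus ".", "_" and "-", per the quoted rule.
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def is_valid_name(name):
    """Return True if `name` satisfies the character rule for package names."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("gdp-by-country"))  # True
print(is_valid_name("GDP by country"))  # False (upper case and spaces)
```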

publisher

List of publishers, each represented as a Person object. This behaves just like contributors.

resources

List of Resource instances representing the contents of the package.

From the specification: [A] JSON array of hashes that describe the contents of the package.

sources

An array of source hashes. Each source hash may have name, web and email fields.

Defaults to an empty list.

title

The title of the dataset as described by its descriptor.

version

A version string identifying the version of the package. It should conform to the Semantic Versioning requirements (http://semver.org/).

Defaults to 0.0.1 if not specified.

Helper classes

To help with creating data packages, datapackage includes a few helper classes that aid in constructing data package hashes. These come in five modules: datapackage.resource (which includes the Resource class; like DataPackage, it can also be imported directly from datapackage), datapackage.schema, datapackage.licenses, datapackage.sources and datapackage.persons.

Each of these modules contains a varying number of helper classes. datapackage.schema contains the most, since constructing a schema requires quite a lot of hashes.

class datapackage.schema.Constraints(*args, **kwargs)

Constraints object which can be added to a field in a resource schema in order to represent the constraints put on that particular field.

class datapackage.schema.Field(*args, **kwargs)

Field object for adding fields to a resource schema.

Currently this is built around the Tabular Data Package.

class datapackage.schema.ForeignKey(*args, **kwargs)

ForeignKey object which can be added to a resource schema object to represent a foreign key in another data package.

class datapackage.schema.Reference(*args, **kwargs)

Reference object which can be added to a ForeignKey object to represent the reference to the other data package.

class datapackage.schema.Schema(*args, **kwargs)

Schema object which holds the representation of the schema for a Tabular Data Package (using the JSON Table Schema protocol). The schema can be used just like a dictionary, which means it is ready for JSON serialization and for export as part of a data package descriptor (once added to a resource).
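For orientation, the hash a Schema object represents looks roughly like the plain dictionary below, with Field, Constraints, ForeignKey and Reference objects corresponding to the nested hashes. The key names follow JSON Table Schema conventions as assumed here; the field names, URL and resource names are hypothetical.

```python
import json

# Plain-dict sketch of a schema hash; all concrete names and values are hypothetical.
schema = {
    "fields": [
        {   # a Field with attached Constraints
            "name": "country_code",
            "type": "string",
            "constraints": {"required": True},
        },
        {"name": "gdp", "type": "number"},
    ],
    "foreignKeys": [
        {   # a ForeignKey; its "reference" hash plays the role of a Reference object
            "fields": "country_code",
            "reference": {
                "datapackage": "http://example.com/countries/",
                "resource": "countries",
                "fields": "code",
            },
        }
    ],
}

# Because the schema is dictionary-like, it serializes directly to JSON.
print(json.dumps(schema, indent=2))
```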

add_field(field)

Adds a field to the resource schema.

Parameters:
  • field (Field) – A Field instance containing the field to be appended to the schema.
add_fields(fields)

Adds fields to the resource schema.

Parameters:
  • fields (list) – A list of Field instances to be appended to the resource schema’s fields (the field list is extended).
add_foreign_key(foreign_key)

Adds a foreign key to the resource schema.

Parameters:
  • foreign_key (ForeignKey) – A ForeignKey object which keeps track of a foreign key relationship to another data package.
add_foreign_keys(foreign_keys)

Adds foreign keys to the resource schema.

Parameters:
  • foreign_keys (list) – A list of ForeignKey instances to be appended to the resource schema’s foreign keys (the list is extended). The foreignKeys attribute is created if it doesn’t exist.
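The difference between the single and plural methods can be sketched with a hypothetical dict subclass. This is not the library's Schema class; it assumes fields are kept in a fields list and that add_foreign_key, like add_foreign_keys, creates foreignKeys on first use.

```python
class SchemaSketch(dict):
    """Hypothetical dict-backed schema illustrating the add_* semantics."""

    def __init__(self):
        super().__init__(fields=[])

    def add_field(self, field):
        self["fields"].append(field)   # append a single field

    def add_fields(self, fields):
        self["fields"].extend(fields)  # extend with a list of fields

    def add_foreign_key(self, foreign_key):
        self.add_foreign_keys([foreign_key])

    def add_foreign_keys(self, foreign_keys):
        # Create "foreignKeys" on first use, then extend it.
        self.setdefault("foreignKeys", []).extend(foreign_keys)


schema = SchemaSketch()
schema.add_field({"name": "id", "type": "integer"})
schema.add_fields([{"name": "a", "type": "string"}, {"name": "b", "type": "string"}])
print(len(schema["fields"]), "foreignKeys" in schema)  # 3 False
schema.add_foreign_key({"fields": "a", "reference": {"resource": "other", "fields": "id"}})
print(len(schema["foreignKeys"]))  # 1
```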
class datapackage.licenses.License(*args, **kwargs)

License object which can be added to a DataPackage’s license or licenses property, or to a Resource’s licenses array.

From the specification: “[E]ntry MUST be a hash with a type. The type SHOULD be an Open Definition license ID if an ID exists for the license. If another license name or identifier is used as type then the url property MUST link to the actual license text. The url property MAY be specified when used in combination with an Open Definition license ID.”

type

The type should be an Open Definition license ID, but it can be any string; in that case it has to be combined with a url linking to the license text.

url

Link to the license text. This must be provided if the type is not an Open Definition license ID and should then link to the actual license text.
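A sketch of the rule quoted from the specification: the url may be omitted only when type is an Open Definition license ID. OPEN_DEFINITION_IDS and make_license are hypothetical, and the set contains example IDs only.

```python
# Hypothetical set of Open Definition IDs; the library's real list is loaded
# from a JSON file (see load_licenses).
OPEN_DEFINITION_IDS = {"odc-pddl", "odc-by", "odc-odbl"}


def make_license(type_, url=None):
    """Build a license hash, enforcing that non-Open-Definition types have a url."""
    if type_ not in OPEN_DEFINITION_IDS and url is None:
        raise ValueError("url is required when type is not an Open Definition ID")
    license_hash = {"type": type_}
    if url is not None:
        license_hash["url"] = url  # MAY also be given alongside an Open Definition ID
    return license_hash


print(make_license("odc-by"))
print(make_license("my-custom-license", url="https://example.org/license.txt"))
```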

class datapackage.sources.Source(*args, **kwargs)

Source object which can be added to a DataPackage object or a Resource object to represent where the data comes from.

From the specification: Each source hash may have name, web and email fields.

email

Email address of the source of the data (person, organisation, etc.).

web

Link to the source of the data on the web.

class datapackage.persons.Person(*args, **kwargs)

Person object which can be added to a DataPackage object, e.g. as a maintainer, contributor or publisher. In theory a Person could also be an organisation, but the class is still called Person.

From the specification: [A] hash which must have a “name” property and may optionally provide “email” and “web” properties.

email

Email address of the person or organisation.

name

Name of the person or organisation.

web

Link to the person’s or organisation’s website.
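Person and Source hashes share the name, email and web keys but differ in which are required: a person must have a name, while every source field is optional. The two helper functions below are hypothetical sketches of that difference, not the library's classes.

```python
def make_person(name, email=None, web=None):
    """Person hash: "name" is required, "email" and "web" are optional."""
    if not name:
        raise ValueError("a person must have a name")
    person = {"name": name}
    if email is not None:
        person["email"] = email
    if web is not None:
        person["web"] = web
    return person


def make_source(name=None, web=None, email=None):
    """Source hash: every field is optional, so only the given ones are kept."""
    fields = {"name": name, "web": web, "email": email}
    return {key: value for key, value in fields.items() if value is not None}


maintainers = [make_person("Jane Doe", email="jane@example.org")]
sources = [make_source(web="https://example.org/data")]
print(maintainers, sources)
```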

Helper module

datapackage includes a utility module, datapackage.util, which provides a few helper functions to make working with data packages (and life in general) easier.

class datapackage.util.SemanticVersion

A named tuple: SemanticVersion(major, minor, patch, prerelease, metadata)

major

Alias for field number 0

metadata

Alias for field number 4

minor

Alias for field number 1

patch

Alias for field number 2

prerelease

Alias for field number 3

datapackage.util.format_version(version)

Formats a semantic version given by a tuple with:

(major, minor, patch, prerelease, metadata)

where prerelease and metadata may be None.
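A minimal sketch of SemanticVersion and format_version, assuming prerelease and metadata are strings and following the semver.org layout MAJOR.MINOR.PATCH-prerelease+metadata. Both definitions are stand-ins written for illustration, not the library's source.

```python
from collections import namedtuple

# Stand-in with the same field order as datapackage.util.SemanticVersion.
SemanticVersion = namedtuple(
    "SemanticVersion", ["major", "minor", "patch", "prerelease", "metadata"])


def format_version(version):
    """Render (major, minor, patch, prerelease, metadata) as a semver string."""
    major, minor, patch, prerelease, metadata = version
    text = "{}.{}.{}".format(major, minor, patch)
    if prerelease is not None:
        text += "-" + prerelease
    if metadata is not None:
        text += "+" + metadata
    return text


v = SemanticVersion(1, 2, 3, "beta.1", None)
print(v[0] == v.major)                             # True: field number 0 is major
print(format_version(v))                           # 1.2.3-beta.1
print(format_version((2, 0, 0, None, "build.7")))  # 2.0.0+build.7
```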

datapackage.util.is_email(val)

Checks to see whether a string is a valid email address. Email addresses can actually be complicated, so this just performs the minimal check that there is <something>@<something>.<something>
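A sketch of the minimal <something>@<something>.<something> check as a regular expression. The pattern is an assumption about what counts as "something" (here: no whitespace and no additional @), not the library's actual pattern.

```python
import re

# Minimal <something>@<something>.<something> check; the pattern is an assumption.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_email(val):
    return EMAIL_PATTERN.fullmatch(val) is not None


print(is_email("jane@example.org"))  # True
print(is_email("jane@example"))      # False: no dot after the @
```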

datapackage.util.is_local(path)

Checks whether a path is a local path or a remote URL. This simple check only looks for a scheme or netloc in the path (so it will return False when the path uses the file: scheme).
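A sketch of the scheme/netloc test using urllib.parse.urlparse. It illustrates the described behaviour and is not taken from the library's source.

```python
from urllib.parse import urlparse


def is_local(path):
    """True when the path has neither a scheme nor a network location."""
    parsed = urlparse(path)
    return not parsed.scheme and not parsed.netloc


print(is_local("data/datapackage.json"))                # True
print(is_local("http://example.com/datapackage.json"))  # False
print(is_local("file:///tmp/datapackage.json"))         # False: file: is a scheme
```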

datapackage.util.is_mimetype(val)

Checks to see whether a string is a valid mimetype. This is a very basic check that just looks for <something>/<something>.
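A sketch of the <something>/<something> check. The regular expression is an assumption (it rejects whitespace and extra slashes), not the library's actual pattern.

```python
import re

# Very basic <something>/<something> check; the pattern is an assumption.
MIMETYPE_PATTERN = re.compile(r"[^/\s]+/[^/\s]+")


def is_mimetype(val):
    return MIMETYPE_PATTERN.fullmatch(val) is not None


print(is_mimetype("text/csv"))  # True
print(is_mimetype("csv"))       # False
```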

datapackage.util.is_url(path)

Checks whether a path is a valid http or https URL. This simple check only looks at whether the scheme is http or https.
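A sketch of the http/https test using urllib.parse.urlparse, which lower-cases the scheme, so upper-case HTTP and HTTPS are accepted as well. This is an illustration, not the library's source.

```python
from urllib.parse import urlparse


def is_url(path):
    """True when the scheme is http or https (urlparse lower-cases it)."""
    return urlparse(path).scheme in ("http", "https")


print(is_url("https://example.com/data.csv"))  # True
print(is_url("ftp://example.com/data.csv"))    # False
```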

datapackage.util.load_licenses()

Reads a dictionary of licenses, and their corresponding URLs, out of a JSON file.
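A sketch of reading such a file with the standard json module. Unlike the library function, this hypothetical version takes the file path as an argument, and it assumes the file holds a single JSON object mapping license IDs to URLs; the example entry is illustrative.

```python
import json
import os
import tempfile


def load_licenses(path):
    """Read a {license_id: url} mapping from a JSON file (hypothetical signature)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Demonstrate with a throwaway file containing one illustrative entry.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "licenses.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"odc-pddl": "http://opendatacommons.org/licenses/pddl/"}, handle)
    licenses = load_licenses(path)

print(licenses["odc-pddl"])
```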

datapackage.util.parse_version(version)

Parse a version string according to semantic versioning.

datapackage.util.verify_version(version)

Verifies that a version string follows semantic versioning. If it passes, this will just return the version string; if it fails, an exception will be raised.
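A sketch of parse_version and verify_version using a simplified semantic-versioning regular expression. The pattern is looser than the full grammar on semver.org (for example, it does not reject leading zeros in numeric prerelease identifiers), and raising ValueError is an assumption; the library only documents that an exception is raised.

```python
import re
from collections import namedtuple

# Stand-in with the same field order as datapackage.util.SemanticVersion.
SemanticVersion = namedtuple(
    "SemanticVersion", ["major", "minor", "patch", "prerelease", "metadata"])

# Simplified pattern: MAJOR.MINOR.PATCH[-prerelease][+metadata], no leading zeros.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z.-]+))?"
    r"(?:\+([0-9A-Za-z.-]+))?")


def parse_version(version):
    """Split a version string into a SemanticVersion, or raise ValueError."""
    match = SEMVER.fullmatch(version)
    if match is None:
        raise ValueError("not a semantic version: {!r}".format(version))
    major, minor, patch, prerelease, metadata = match.groups()
    return SemanticVersion(int(major), int(minor), int(patch), prerelease, metadata)


def verify_version(version):
    """Return the string unchanged if it is a valid semantic version."""
    parse_version(version)  # raises ValueError if the string is invalid
    return version


print(parse_version("1.2.3-beta.1+build.5"))
print(verify_version("0.0.1"))
```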