utils Package

Various utilities for the distro-tracker project.

class distro_tracker.core.utils.PrettyPrintList(the_list=None, delimiter=' ')[source]

Bases: object

A class which wraps the built-in list object so that when it is converted to a string, its contents are printed using the given delimiter.

The default delimiter is a space.

>>> a = PrettyPrintList([1, 2, 3])
>>> print(a)
1 2 3
>>> print(PrettyPrintList([u'one', u'2', u'3']))
one 2 3
>>> print(PrettyPrintList([1, 2, 3], delimiter=', '))
1, 2, 3
>>> # Still acts as a list
>>> a == [1, 2, 3]
True
>>> a == ['1', '2', '3']
False

class distro_tracker.core.utils.SpaceDelimitedTextField(verbose_name=None, name=None, primary_key=False, max_length=None, unique=False, blank=False, null=False, db_index=False, rel=None, default=<class 'django.db.models.fields.NOT_PROVIDED'>, editable=True, serialize=True, unique_for_date=None, unique_for_month=None, unique_for_year=None, choices=None, help_text='', db_column=None, db_tablespace=None, auto_created=False, validators=(), error_messages=None)[source]

Bases: django.db.models.fields.TextField

A custom Django model field which stores a list of strings.

It stores the list in a TextField as a space delimited list. It is marshalled back to a PrettyPrintList in the Python domain.

description = 'Stores a space delimited list of strings'

from_db_value(value, expression, connection)[source]

get_db_prep_value(value, **kwargs)[source]

Return field’s value prepared for interacting with the database backend.

Used by the default implementations of get_db_prep_save().

get_prep_value(value, **kwargs)[source]

Perform preliminary non-db specific value checks and conversions.


Convert the input value into the expected Python data type, raising django.core.exceptions.ValidationError if the data can’t be converted. Return the converted value. Subclasses should override this.


Return a string value of this field from the passed obj. This is used by the serialization framework.

distro_tracker.core.utils.VCS_SHORTHAND_TO_NAME = {'bzr': 'Bazaar', 'cvs': 'CVS', 'darcs': 'Darcs', 'git': 'Git', 'hg': 'Mercurial', 'mtn': 'Monotone', 'svn': 'Subversion'}

A map of currently available VCS systems’ shorthands to their names.

distro_tracker.core.utils.add_developer_extras(general, url_only=False)[source]

Receives a general dict with package data and adds to it more data regarding that package’s developers.

distro_tracker.core.utils.distro_tracker_render_to_string(template_name, context=None)[source]

A custom function to render a template to a string which injects extra distro-tracker specific information to the context, such as the name of the derivative.

This function is necessary since Django’s TEMPLATE_CONTEXT_PROCESSORS only take effect when a template is rendered with a RequestContext, whereas this function can be called independently from any HTTP request.


Returns the URL of a developer’s information page, based on the developer’s email address, by calling a vendor-specific function.

distro_tracker.core.utils.get_or_none(model, **kwargs)[source]

Gets a Django Model object from the database or returns None if it does not exist.


Returns a full name for the VCS given its shorthand.

If the given shorthand is unknown an empty string is returned.

Parameters:shorthand – The shorthand of a VCS for which a name is required.
Return type:string
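
The lookup described above amounts to a dictionary access with an empty-string fallback. The following is a minimal sketch, not the project's code: the function name vcs_name is hypothetical, and the mapping is a local copy of the VCS_SHORTHAND_TO_NAME value documented earlier.

```python
# Local copy of the mapping documented as VCS_SHORTHAND_TO_NAME.
VCS_SHORTHAND_TO_NAME = {
    'bzr': 'Bazaar', 'cvs': 'CVS', 'darcs': 'Darcs', 'git': 'Git',
    'hg': 'Mercurial', 'mtn': 'Monotone', 'svn': 'Subversion',
}


def vcs_name(shorthand):
    """Return the full VCS name, or '' when the shorthand is unknown.

    Hypothetical helper name, used only for illustration.
    """
    return VCS_SHORTHAND_TO_NAME.get(shorthand, '')
```

Returning an empty string rather than raising keeps callers such as templates free of error handling for unknown shorthands.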

Returns the current timestamp in the requested timezone (UTC by default) and can be easily mocked out for tests.


Helper function creating an HttpResponse by serializing the given response object to a JSON string.

The resulting HTTP response has Content-Type set to application/json.

Parameters:response – The object to be serialized in the response. It must be serializable by the json module.
Return type:HttpResponse

The function extracts any possible signature information found in the given content.

Uses the DISTRO_TRACKER_KEYRING_DIRECTORY setting to access the keyring. If this setting does not exist, no signatures can be validated.

Returns:Information about the signers of the content as a list or None if there is no (valid) signature.
Return type:list of (name, email) pairs or None

datastructures Module

email_messages Module

Module including some utility functions and classes for manipulating email.

class distro_tracker.core.utils.email_messages.CustomEmailMessage(msg=None, *args, **kwargs)[source]

Bases: django.core.mail.message.EmailMessage

A subclass of django.core.mail.EmailMessage which can be fed an email.message.Message instance to define the body of the message.

If msg is set, the body attribute is ignored.

If the user wants to attach additional parts to the message, the attach() method can be used but the user must ensure that the given msg instance is a multipart message before doing so.

Effectively, this is also a wrapper which allows sending instances of email.message.Message via Django email backends.


Returns the underlying email.message.Message object. In case the user did not set a msg attribute for this instance the parent EmailMessage.message method is used.

distro_tracker.core.utils.email_messages.decode_header(header, default_encoding='utf-8')[source]

Decodes an email message header and returns it as a Unicode string.

This is necessary since it is possible that a header is made of multiple differently encoded parts which makes email.header.decode_header() insufficient.
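
To illustrate why an extra step is needed, the sketch below joins the (value, charset) pairs that the standard library's email.header.decode_header() returns into one string. It is only an approximation under stated assumptions: the helper name decode_header_parts is hypothetical, and falling back to the default encoding with replacement characters is a choice made for this example, not necessarily what distro-tracker does.

```python
from email.header import decode_header as stdlib_decode_header


def decode_header_parts(header, default_encoding='utf-8'):
    """Join all decoded parts of a header into a single str.

    Hypothetical helper; the error handling is an assumption.
    """
    chunks = []
    for value, charset in stdlib_decode_header(header):
        if isinstance(value, bytes):
            try:
                value = value.decode(charset or default_encoding)
            except (LookupError, UnicodeDecodeError):
                # Unknown or wrong charset: fall back instead of failing.
                value = value.decode(default_encoding, errors='replace')
        chunks.append(value)
    return ''.join(chunks)
```

Each part carries its own charset, so every part has to be decoded separately before the pieces are concatenated.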


Extracts the email address from the From email header.

>>> str(extract_email_address_from_header('Real Name <foo@domain.com>'))
'foo@domain.com'
>>> str(extract_email_address_from_header('foo@domain.com'))
'foo@domain.com'

distro_tracker.core.utils.email_messages.get_decoded_message_payload(message, default_charset='utf-8')[source]

Extracts the payload of the given email.message.Message and returns it decoded based on the Content-Transfer-Encoding and charset.
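
The two decoding steps mentioned above (transfer encoding first, then charset) can be sketched with the standard library alone. This is an illustrative approximation, not the project's implementation: the name decoded_text_payload, the handling of multipart messages, and the fallback on decoding errors are all assumptions.

```python
import email


def decoded_text_payload(message, default_charset='utf-8'):
    """Return the payload of a non-multipart message as str.

    Hypothetical helper; returns None for multipart messages.
    """
    if message.is_multipart():
        return None
    # decode=True undoes the Content-Transfer-Encoding (base64, QP, ...).
    raw = message.get_payload(decode=True)
    charset = message.get_content_charset(default_charset)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return raw.decode(default_charset, errors='replace')


raw_message = (
    'Content-Type: text/plain; charset="iso-8859-1"\n'
    'Content-Transfer-Encoding: quoted-printable\n'
    '\n'
    'caf=E9\n'
)
text = decoded_text_payload(email.message_from_string(raw_message))
```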


Returns a live-patched email.Message object from the given bytes.

The changes ensure that parsing the message’s bytes with this method and then returning them by using the returned object’s as_string method is an idempotent operation.

An as_bytes method is also created since Django’s SMTP backend relies on this method (which is usually brought by its own django.core.mail.SafeMIMEText object but that we don’t use in our CustomEmailMessage).


Takes an address in almost-RFC822 format and turns it into a dict {'name': real_name, 'email': email_address}.

The difference with email.utils.parseaddr and rfc822.parseaddr is that this routine allows unquoted commas to appear in the real name (in violation of RFC822).


Takes a string with addresses in RFC822 format and returns a list of dicts {'name': real_name, 'email': email_address}. It tries to be forgiving about unquoted commas in addresses.


Live-patches the email.message.Message object passed as parameter so that:

  • the as_string() method returns the same set of bytes the message was parsed from (to preserve as much of the original message as possible)
  • an as_bytes() method is added too (this method is expected by Django’s SMTP backend)

Unfolding is the process of removing the line wrapping added by mail agents. A header is a single logical line and its value is not allowed to span multiple lines.

We need to unfold their values in particular when we want to reuse the values to compose a reply message as Python’s email API chokes on those newline characters.

If header is None, the return value is None as well.

Parameters:header (str) – The header value to unfold.
Returns:The unfolded version of the header.
Return type:str
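
Unfolding as described above can be sketched with a single regular expression that drops each line break followed by whitespace. The name unfold is hypothetical and the exact pattern is an assumption; the project's own function may treat edge cases differently.

```python
import re


def unfold(header):
    """Remove folding line breaks from a header value.

    Hypothetical helper: a line break counts as folding only when the
    next character is a space or a tab, which is kept in the result.
    """
    if header is None:
        return None
    return re.sub(r'\r?\n(?=[ \t])', '', header)
```

Keeping the whitespace that follows the line break preserves the word separation of the original single-line value.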

http Module

Utilities for handling HTTP resource access.

class distro_tracker.core.utils.http.HttpCache(cache_directory_path)[source]

Bases: object

A class providing an interface to a cache of HTTP responses.

get_content(url, compression='auto')[source]

Returns the content of the cached response for the given URL.

If the file is compressed, it is decompressed first; otherwise it is treated as a plain file.

Parameters:compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it.
Return type:bytes

Returns the HTTP headers of the cached response for the given URL.

Return type:dict

If the cached response for the given URL is expired based on Cache-Control or Expires headers, returns True.


Removes the cached response for the given URL.

update(url, force=False)[source]

Performs an update of the cached resource. This means that it validates that its most current version is found in the cache by doing a conditional GET request.

Parameters:force – To force the method to perform a full GET request, set the parameter to True
Returns:The original HTTP response and a Boolean indicating whether the cached value was updated.
Return type:two-tuple of (requests.Response, Boolean)
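
A conditional GET request sends the validators stored with the cached response, so that the server can answer 304 Not Modified when the cached copy is still current. The sketch below only builds such request headers. The function name conditional_headers is hypothetical, and which validators HttpCache actually sends is an assumption; ETag and Last-Modified are simply the standard HTTP ones.

```python
def conditional_headers(cached_headers, force=False):
    """Build request headers for a conditional GET.

    Hypothetical helper. With force=True no validators are sent, so the
    server has to return the full resource.
    """
    if force:
        return {}
    lowered = {key.lower(): value for key, value in cached_headers.items()}
    headers = {}
    if 'etag' in lowered:
        headers['If-None-Match'] = lowered['etag']
    if 'last-modified' in lowered:
        headers['If-Modified-Since'] = lowered['last-modified']
    return headers
```

The resulting dict could be passed as the headers argument of requests.get(); a 304 status then means the cache was already up to date.
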
distro_tracker.core.utils.http.get_resource_content(url, cache=None, compression='auto', only_if_updated=False, force_update=False, ignore_network_failures=False, ignore_http_error=None)[source]

A helper function which returns the content of the resource found at the given URL.

If the resource is already cached in the cache object and the cached content has not expired, the function will not do any HTTP requests and will return the cached content.

If the resource is stale or not cached at all, it is retrieved from the Web.

If the HTTP request returned an error code, the requests module will raise a requests.exceptions.HTTPError.

In case of network failures, an IOError exception is raised unless ignore_network_failures is set to True.

Parameters:
  • url (str) – The URL of the resource to be retrieved.
  • cache (HttpCache or an object with an equivalent interface) – A cache object which should be used to look up and store the cached resource. If it is not provided, an instance of HttpCache with a DISTRO_TRACKER_CACHE_DIRECTORY cache directory is used.
  • compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it. If auto, the method is guessed from the file extension of the URL.
  • only_if_updated (bool) – If set to True, returns None when no update is done. Otherwise, returns the content in any case.
  • force_update (bool) – If set to True, does a new HTTP request even if there is non-expired data in the cache.
  • ignore_network_failures (bool) – If set to True, the function returns None in case of network failures and does not raise any exception.
  • ignore_http_error (int) – If the request results in an HTTP error with the given status code, the error is ignored, no exception is raised, and None is returned.

Returns:The bytes representation of the resource found at the given url.
Return type:bytes
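
Guessing the compression from the URL, as the auto setting is described to do, could look like the sketch below. Everything in it is an assumption made for illustration: the helper name guess_compression, the extension table, and the returned method names are not taken from distro-tracker.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping; the values accepted by the real
# compression parameter may be named differently.
EXTENSION_TO_COMPRESSION = {'.gz': 'gzip', '.bz2': 'bzip2', '.xz': 'xz'}


def guess_compression(url):
    """Return a compression name based on the URL path, or None."""
    path = urlparse(url).path  # ignores any query string or fragment
    _, extension = os.path.splitext(path)
    return EXTENSION_TO_COMPRESSION.get(extension.lower())
```

Parsing the URL first keeps a query string such as ?x=1 from hiding the file extension.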


distro_tracker.core.utils.http.get_resource_text(*args, **kwargs)[source]

Clone of get_resource_content() which transparently decodes the downloaded content into text. It supports the same parameters and adds the encoding parameter.

Parameters:encoding (str) – Specifies an encoding to decode the resource content.
Returns:The textual representation of the resource found at the given url.
Return type:str

Parses the given Cache-Control header’s values.

Returns:The key-value pairs found in the header. If some key did not have an associated value in the header, None is used instead.
Return type:dict
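
The parsing described above can be sketched in a few lines. This is a simplified illustration: the name parse_cache_control is hypothetical, and lower-casing the keys and stripping quotes are choices made for the example only.

```python
def parse_cache_control(header):
    """Split a Cache-Control header value into a dict.

    Hypothetical helper. Directives without a value map to None.
    """
    result = {}
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        key, separator, value = part.partition('=')
        result[key.strip().lower()] = (
            value.strip().strip('"') if separator else None
        )
    return result
```

For example, the header value max-age=3600, no-cache yields one key with the string 3600 and one key mapped to None.
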
distro_tracker.core.utils.http.safe_redirect(to, fallback, allowed_hosts=None)[source]

Redirects to the URL given in to, provided that it is safe; otherwise, redirects to fallback. allowed_hosts describes the list of valid hosts for the call to django.utils.http.is_safe_url().

Parameters:
  • to (str or None) – The URL to which the user should be redirected.
  • fallback (str) – A safe URL to fall back on if to isn’t safe. WARNING! This URL is NOT checked! Pass only a URL that is known to be safe.
  • allowed_hosts (list of str) – A list of “safe” hosts. If None, relies on the default behaviour of django.utils.http.is_safe_url().

Returns:A ResponseRedirect instance containing the appropriate information for the redirection.

Return type:


packages Module

Utilities for processing Debian package information.

class distro_tracker.core.utils.packages.AptCache[source]

Bases: object

A class for handling cached package information.

class AcquireProgress(*args, **kwargs)[source]

Bases: apt.progress.base.AcquireProgress

Instances of this class can be passed to apt.cache.Cache.update() calls. It provides a way to track which files were changed and which were not by an update operation.


Invoked when an item is successfully and completely fetched.


Invoked when an item is confirmed to be up-to-date, for instance when an HTTP download is informed that the file on the server was not modified.


Periodically invoked while the Acquire process is underway.

This method gets invoked while the Acquire progress given by the parameter ‘owner’ is underway. It should display information about the current state.

This function returns a boolean value indicating whether the acquisition should be continued (True) or cancelled (False).

DEFAULT_MAX_SIZE = 1073741824
QUILT_FORMAT = '3.0 (quilt)'

Removes all cache information. This causes the next update to retrieve fresh repository files.


Clears all cached package source files.


Configures the cache based on the most current repository information.


Returns cached files, optionally filtered by the given filter_function

Parameters:filter_function (callable) – Takes a file name as the only parameter and returns a bool indicating whether it should be included in the result.
Returns:A list of cached file names
Return type:list

Returns the total space taken by the given directory in bytes.

Parameters:directory_path (string) – The path to the directory
Return type:int
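
One common way to compute such a total is to walk the directory tree and add up the file sizes. The sketch below assumes a hypothetical name directory_size and skips symbolic links; whether the project's method follows links or counts directory entries is not stated in this documentation.

```python
import os


def directory_size(directory_path):
    """Return the combined size in bytes of all regular files below
    directory_path. Hypothetical helper; symbolic links are skipped."""
    total = 0
    for root, _dirs, files in os.walk(directory_path):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```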

Returns the path to the directory where a particular source package is cached.

Parameters:package_name (string) – The name of the source package
Return type:string

Returns all Packages files which are cached for the given repository.

For instance, Packages files for different suites are cached separately.

Parameters:repository (Repository) – The repository for which to return all cached Packages files
Return type:iterable of strings
get_source_version_cache_directory(package_name, version)[source]

Returns the path to the directory where the files of a particular source package version are extracted.

Parameters:
  • package_name (string) – The name of the source package
  • version (string) – The version of the source package
Return type:string



Returns all Sources files which are cached for the given repository.

For instance, Sources files for different suites are cached separately.

Parameters:repository (Repository) – The repository for which to return all cached Sources files
Return type:iterable of strings
retrieve_source(source_name, version, debian_directory_only=False)[source]

Retrieve the source package files for the given source package version.

Parameters:
  • source_name (string) – The name of the source package
  • version (string) – The version of the source package
  • debian_directory_only (Boolean) – Flag indicating if the method should try to retrieve only the debian directory of the source package. This is usually only possible when the package format is 3.0 (quilt).

Returns:The path to the directory containing the extracted source package files.
Return type:string


source_cache_directory = None

The directory where source package files are cached


Updates the apt.conf file which gives general settings for the apt.cache.Cache.

In particular, this updates the list of all architectures which should be considered in package updates based on architectures that the repositories support.


Initiates a cache update.

Parameters:force_download – If set to True causes the cache to be cleared before starting the update, thus making sure all index files are downloaded again.
Returns:A two-tuple (updated_sources, updated_packages). Each member of the tuple is a list of (Repository, component, file_name) tuples giving the repository which was updated, the component, and the file which contains the fresh information. The file is a Sources file in the first list and a Packages file in the second.

Updates the sources.list file used to list repositories for which package information should be cached.

exception distro_tracker.core.utils.packages.SourcePackageRetrieveError[source]

Bases: Exception


Extracts the name of the .dsc file from a package’s Sources entry.

Parameters:stanza (dict) – The Sources entry from which to extract the .dsc file name. Maps Sources key names to values.

Extracts information from a Packages file entry and returns it in the form of a dictionary.

Parameters:stanza (Case-insensitive dict) – The raw entry’s key-value pairs.

Extracts information from a Sources file entry and returns it in the form of a dictionary.

Parameters:stanza (Case-insensitive dict) – The raw entry’s key-value pairs.

Extracts the VCS information from a package’s Sources entry.

Parameters:stanza (dict) – The Sources entry from which to extract the VCS info. Maps Sources key names to values.
Returns:VCS information regarding the package. Contains the following keys: type[, browser, url, branch]
Return type:dict
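
As an illustration of the shape of the returned dict, the sketch below reads the Vcs-* fields of a Sources stanza. It is an approximation under stated assumptions: the name extract_vcs_info is hypothetical, and treating " -b " as a branch marker in the field value is an assumption modelled on the Debian Vcs-Git convention.

```python
def extract_vcs_info(stanza):
    """Collect VCS details from the Vcs-* fields of a Sources stanza.

    Hypothetical helper; returns a dict with the keys type, url and,
    when present, browser and branch.
    """
    vcs = {}
    for key, value in stanza.items():
        key = key.lower()
        if key == 'vcs-browser':
            vcs['browser'] = value
        elif key.startswith('vcs-'):
            vcs['type'] = key[len('vcs-'):]
            # Assumed syntax: "<url> -b <branch>"
            url, _, branch = value.partition(' -b ')
            vcs['url'] = url.strip()
            if branch:
                vcs['branch'] = branch.strip()
    return vcs
```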

Returns an HTML-formatted list of packages.


Returns the name of the hash directory used to avoid having too many entries in a single directory. It’s usually the first letter of the package except for lib* packages where it’s the first 4 letters.

Parameters:package_name (str) – The package name.
Returns:Name of the hash directory.
Return type:str
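
The rule described above is short enough to sketch directly. The function name package_hash is an assumption made for this example.

```python
def package_hash(package_name):
    """Return the hash directory name for a package.

    Hypothetical helper: the first four letters for lib* packages,
    the first letter otherwise.
    """
    if package_name.startswith('lib'):
        return package_name[:4]
    return package_name[:1]
```

This mirrors the layout of Debian archive pools, where for instance dpkg is stored under d and libc6 under libc.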

Returns the URL of the page dedicated to this package name.

Parameters:package_name (str or PackageName model) – The package name.
Returns:The URL of the page dedicated to the package.
Return type:str

plugins Module

Classes to build a plugin mechanism.

class distro_tracker.core.utils.plugins.PluginRegistry(name, bases, attrs)[source]

Bases: type

A metaclass which any class that wants to behave as a registry can use.

When a class derived from a class which uses this metaclass is defined (that is, when the class object itself is created, not when instances of it are created), it is added to the list plugins. The concrete classes using this metaclass are free to decide how to use this list.

This metaclass also adds an unregister_plugin() classmethod to all concrete classes which removes the class from the list of plugins.
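
The registry mechanism can be illustrated with a small, self-contained metaclass. This is a sketch of the general technique rather than the project's code: the names Registry, BaseTask, TaskA and TaskB are hypothetical.

```python
class Registry(type):
    """Metaclass which records every subclass of the class using it."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, 'plugins'):
            # First class using the metaclass: it hosts the shared list.
            cls.plugins = []
        else:
            # A derived class: register it at class-definition time.
            cls.plugins.append(cls)

        def unregister_plugin(klass):
            klass.plugins.remove(klass)

        cls.unregister_plugin = classmethod(unregister_plugin)


class BaseTask(metaclass=Registry):
    pass


class TaskA(BaseTask):
    pass


class TaskB(BaseTask):
    pass


TaskB.unregister_plugin()  # BaseTask.plugins now holds only TaskA
```

Registration happens in the metaclass's __init__, so merely defining a subclass is enough; no instance ever needs to be created.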

verp Module

Module for encoding and decoding Variable Envelope Return Path addresses.

It is implemented following the recommendations laid out in VERP and https://www.courier-mta.org/draft-varshavchik-verp-smtpext.txt

>>> from distro_tracker.core.utils import verp
>>> str(verp.encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> list(map(str, verp.decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']

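
The encoding scheme visible in these examples can be sketched as follows. This is an independent illustration, not the module's source: the names verp_encode and verp_decode are hypothetical, and the set of escaped characters is inferred from the examples on this page.

```python
import re

# Characters of the recipient's local part which are replaced by
# '+' followed by their two-digit hexadecimal code (assumed set).
_SPECIAL = '@:%!-[]+'


def verp_encode(sender_address, recipient_address, separator='-'):
    sender_local, sender_domain = sender_address.rsplit('@', 1)
    recipient_local, recipient_domain = recipient_address.rsplit('@', 1)
    encoded = ''.join(
        '+{:02X}'.format(ord(char)) if char in _SPECIAL else char
        for char in recipient_local
    )
    return '{}{}{}={}@{}'.format(
        sender_local, separator, encoded, recipient_domain, sender_domain)


def verp_decode(verp_address, separator='-'):
    local, sender_domain = verp_address.rsplit('@', 1)
    local, recipient_domain = local.rsplit('=', 1)
    # The separator is escaped inside the recipient part, so its last
    # occurrence marks the end of the sender's local part.
    sender_local, encoded = local.rsplit(separator, 1)
    recipient_local = re.sub(
        r'\+([0-9A-Fa-f]{2})',
        lambda match: chr(int(match.group(1), 16)),
        encoded,
    )
    return (sender_local + '@' + sender_domain,
            recipient_local + '@' + recipient_domain)
```

Because the default separator is itself escaped in the recipient's local part, decoding can split unambiguously even when the sender's local part contains a hyphen.
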
distro_tracker.core.utils.verp.encode(sender_address, recipient_address, separator='-')[source]

Encodes sender_address, recipient_address to a VERP compliant address to be used as the envelope-from (return-path) address.

Parameters:
  • sender_address (string) – The email address of the sender
  • recipient_address (string) – The email address of the recipient
  • separator – The separator to be used between the sender’s local part and the encoded recipient’s local part in the resulting VERP address.
Return type:string

>>> str(encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'tom@old.example.com'))
'itny-out-tom=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'dave+priority@new.example.com'))
'itny-out-dave+2Bpriority=new.example.com@domain.com'
>>> str(encode('bounce@dom.com', 'user+!%-:@[]+@other.com'))
'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'

distro_tracker.core.utils.verp.decode(verp_address, separator='-')[source]

Decodes the given VERP encoded from address and returns the original sender address and recipient address, returning them as a tuple.

Parameters:
  • verp_address – The return path address
  • separator – The separator to be expected between the sender’s local part and the encoded recipient’s local part in the given verp_address

>>> from_email, to_email = 'bounce@domain.com', 'user@other.com'
>>> decode(encode(from_email, to_email)) == (from_email, to_email)
True
>>> list(map(str, decode('itny-out-dave+2Bpriority=new.example.com@domain.com')))
['itny-out@domain.com', 'dave+priority@new.example.com']
>>> list(map(str, decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
>>> list(map(str, decode('bounce-addr+2B40=dom.com@asdf.com')))
['bounce@asdf.com', 'addr+40@dom.com']
>>> s = 'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
>>> str(decode(s)[1])
'user+!%-:@[]+@other.com'