distro_tracker.core.utils¶
Various utilities for the distro-tracker project.
-
distro_tracker.core.utils.
get_or_none
(model, **kwargs)[source]¶ Gets a Django Model object from the database or returns
None
if it does not exist.
-
distro_tracker.core.utils.
distro_tracker_render_to_string
(template_name, context=None)[source]¶ A custom function to render a template to a string which injects extra distro-tracker specific information to the context, such as the name of the derivative.
This function is necessary since Django’s
TEMPLATE_CONTEXT_PROCESSORS
only apply when rendering with a RequestContext, whereas this function can be called independently from any HTTP request.
-
distro_tracker.core.utils.
render_to_json_response
(response)[source]¶ Helper function creating an
HttpResponse
by serializing the given response
object to a JSON string.
The resulting HTTP response has Content-Type set to application/json.
- Parameters
response – The object to be serialized in the response. It must be serializable by the
json
module.
- Return type
HttpResponse
-
class
distro_tracker.core.utils.
PrettyPrintList
(the_list=None, delimiter=' ')[source]¶ Bases:
object
A class which wraps the built-in
list
object so that when it is converted to a string, its contents are printed using the given delimiter.
The default delimiter is a space.
>>> a = PrettyPrintList([1, 2, 3])
>>> print(a)
1 2 3
>>> print(PrettyPrintList([u'one', u'2', u'3']))
one 2 3
>>> print(PrettyPrintList([1, 2, 3], delimiter=', '))
1, 2, 3
>>> # Still acts as a list
>>> a == [1, 2, 3]
True
>>> a == ['1', '2', '3']
False
-
class
distro_tracker.core.utils.
SpaceDelimitedTextField
(*args, db_collation=None, **kwargs)[source]¶ Bases:
django.db.models.fields.TextField
A custom Django model field which stores a list of strings.
It stores the list in a
TextField
as a space delimited list. It is marshalled back to a PrettyPrintList
in the Python domain.
-
description
= 'Stores a space delimited list of strings'¶
-
to_python
(value)[source]¶ Convert the input value into the expected Python data type, raising django.core.exceptions.ValidationError if the data can’t be converted. Return the converted value. Subclasses should override this.
-
get_prep_value
(value, **kwargs)[source]¶ Perform preliminary non-db specific value checks and conversions.
-
-
distro_tracker.core.utils.
VCS_SHORTHAND_TO_NAME
= {'bzr': 'Bazaar', 'cvs': 'CVS', 'darcs': 'Darcs', 'git': 'Git', 'hg': 'Mercurial', 'mtn': 'Monotone', 'svn': 'Subversion'}¶ A map of currently available VCS systems’ shorthands to their names.
-
distro_tracker.core.utils.
get_vcs_name
(shorthand)[source]¶ Returns a full name for the VCS given its shorthand.
If the given shorthand is unknown an empty string is returned.
- Parameters
shorthand – The shorthand of a VCS for which a name is required.
- Return type
string
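The lookup described above can be pictured as a dictionary access with an empty-string default. This is a minimal sketch and not the project's code: get_vcs_name_sketch is a hypothetical name, and the mapping is copied from the VCS_SHORTHAND_TO_NAME value documented above.

```python
# Mapping copied from the documented VCS_SHORTHAND_TO_NAME value.
VCS_SHORTHAND_TO_NAME = {
    'bzr': 'Bazaar', 'cvs': 'CVS', 'darcs': 'Darcs', 'git': 'Git',
    'hg': 'Mercurial', 'mtn': 'Monotone', 'svn': 'Subversion',
}


def get_vcs_name_sketch(shorthand):
    """Hypothetical stand-in: return the full VCS name, or '' if unknown."""
    return VCS_SHORTHAND_TO_NAME.get(shorthand, '')


print(get_vcs_name_sketch('hg'))
print(repr(get_vcs_name_sketch('fossil')))
```

Returning an empty string rather than raising lets callers use the result directly in templates without a guard.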
-
distro_tracker.core.utils.
verify_signature
(content)[source]¶ The function extracts any possible signature information found in the given content.
Uses the
DISTRO_TRACKER_KEYRING_DIRECTORY
setting to access the keyring. If this setting does not exist, no signatures can be validated.
- Returns
Information about the signers of the content as a list or
None
if there is no (valid) signature.
- Return type
list of
(name, email)
pairs or None
-
distro_tracker.core.utils.
now
(tz=datetime.timezone.utc)[source]¶ Returns the current timestamp in the requested timezone (UTC by default) and can be easily mocked out for tests.
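One way to read "can be easily mocked out for tests" is that callers go through a single clock function that a test can replace. The sketch below assumes that reading. now_sketch and age_in_seconds are hypothetical names used only for illustration.

```python
import datetime
from unittest import mock


def now_sketch(tz=datetime.timezone.utc):
    """Hypothetical stand-in: current timestamp in the requested timezone."""
    return datetime.datetime.now(tz)


def age_in_seconds(timestamp, clock=now_sketch):
    """Example caller that reads the time only through the clock function."""
    return (clock() - timestamp).total_seconds()


# In a test, substitute a fixed clock so the result is deterministic.
fixed = datetime.datetime(2020, 1, 1, tzinfo=datetime.timezone.utc)
fake_clock = mock.Mock(return_value=fixed)
print(age_in_seconds(fixed - datetime.timedelta(seconds=30), clock=fake_clock))
```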
distro_tracker.core.utils.compression¶
Utilities for handling compression
-
distro_tracker.core.utils.compression.
guess_compression_method
(filepath)[source]¶ Given filepath, tries to determine the compression of the file.
-
distro_tracker.core.utils.compression.
get_uncompressed_stream
(input_stream, compression='auto', text=False, encoding='utf-8')[source]¶ Returns a file-like object (aka stream) providing an uncompressed version of the content read on the input stream provided.
- Parameters
input_stream – The file-like object providing compressed data.
compression (str) – The compression type. Specify “auto” to let the function guess it out of the associated filename (the input_stream needs to have a name attribute, otherwise a ValueError is raised).
text (boolean) – If True, open the stream as a text stream.
encoding (str) – Encoding to use to decode the text.
-
distro_tracker.core.utils.compression.
get_compressor_factory
(compression)[source]¶ Returns a function that can create a file-like object used to compress data. The returned function has the same API as gzip.open, lzma.open and bz2.open. You have to pass mode='wb' or mode='wt' to the returned function to use it in write mode.
compressor_factory = get_compressor_factory("xz")
compressor = compressor_factory(path, mode="wb")
compressor.write(b"Test")
compressor.close()
- Parameters
compression (str) – The compression method to use.
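Because the returned function shares its API with gzip.open, lzma.open and bz2.open, a factory of this kind can be pictured as a lookup table over those standard-library functions. The following is a sketch under that assumption, not the module's implementation. The names OPENERS, compressor_factory_sketch and round_trip are hypothetical, and only "xz" is a method name confirmed by the example above.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical mapping from a method name to a stdlib open() function.
# Only "xz" appears in the documentation; the other keys are assumptions.
OPENERS = {'gzip': gzip.open, 'bzip2': bz2.open, 'xz': lzma.open}


def compressor_factory_sketch(compression):
    """Return the open() function for the given compression method."""
    try:
        return OPENERS[compression]
    except KeyError:
        raise NotImplementedError('unknown compression: %r' % compression)


def round_trip(compression, payload):
    """Compress payload to a temporary file, then read it back."""
    opener = compressor_factory_sketch(compression)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'data')
        with opener(path, mode='wb') as compressor:
            compressor.write(payload)
        with opener(path, mode='rb') as stream:
            return stream.read()


print(round_trip('xz', b'Test'))
```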
distro_tracker.core.utils.email_messages¶
Module including some utility functions and classes for manipulating email.
-
distro_tracker.core.utils.email_messages.
extract_email_address_from_header
(header)[source]¶ Extracts the email address from the From email header.
>>> str(extract_email_address_from_header('Real Name <foo@domain.com>'))
'foo@domain.com'
>>> str(extract_email_address_from_header('foo@domain.com'))
'foo@domain.com'
-
distro_tracker.core.utils.email_messages.
name_and_address_from_string
(content)[source]¶ Takes an address in almost-RFC822 format and turns it into a dict {'name': real_name, 'email': email_address}.
The difference with email.utils.parseaddr and rfc822.parseaddr is that this routine allows unquoted commas to appear in the real name (in violation of RFC822).
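To illustrate what tolerating unquoted commas means, here is a small regex-based sketch. It is not the project's parser. name_and_address_sketch is a hypothetical helper that treats everything before the final angle-bracketed part as the real name.

```python
import re

# Everything before the trailing <...> is taken as the real name, commas included.
_ADDRESS_RE = re.compile(r'^\s*(?P<name>.*?)\s*<(?P<email>[^<>]+)>\s*$')


def name_and_address_sketch(content):
    """Hypothetical helper returning {'name': ..., 'email': ...}."""
    match = _ADDRESS_RE.match(content)
    if match:
        return {
            'name': match.group('name').strip('"'),
            'email': match.group('email').strip(),
        }
    # No angle brackets: assume the whole string is a bare address.
    return {'name': '', 'email': content.strip()}


print(name_and_address_sketch('Doe, John <john@example.com>'))
print(name_and_address_sketch('foo@domain.com'))
```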
-
distro_tracker.core.utils.email_messages.
names_and_addresses_from_string
(content)[source]¶ Takes a string with addresses in RFC822 format and returns a list of dicts {'name': real_name, 'email': email_address}. It tries to be forgiving about unquoted commas in addresses.
-
distro_tracker.core.utils.email_messages.
get_decoded_message_payload
(message, default_charset='utf-8')[source]¶ Extracts the payload of the given
email.message.Message
and returns it decoded based on the Content-Transfer-Encoding and charset.
-
distro_tracker.core.utils.email_messages.
patch_message_for_django_compat
(message)[source]¶ Live patch the
email.message.Message
object passed as parameter so that:
- the as_string() method returns the same set of bytes it has been parsed from (to preserve the original message as much as possible)
- the as_bytes() method is added too (this method is expected by Django’s SMTP backend)
-
distro_tracker.core.utils.email_messages.
message_from_bytes
(message_bytes)[source]¶ Returns a live-patched
email.Message
object from the given bytes.
The changes ensure that parsing the message’s bytes with this method and then returning them by using the returned object’s as_string method is an idempotent operation.
An as_bytes method is also created since Django’s SMTP backend relies on this method (which is usually brought by its own
django.core.mail.SafeMIMEText
object but that we don’t use in our CustomEmailMessage).
-
distro_tracker.core.utils.email_messages.
get_message_body
(msg)[source]¶ Returns the message body, joining together all parts into one string.
- Parameters
msg (
email.message.Message
) – The original received package message
-
class
distro_tracker.core.utils.email_messages.
CustomEmailMessage
(msg=None, *args, **kwargs)[source]¶ Bases:
django.core.mail.message.EmailMessage
A subclass of
django.core.mail.EmailMessage
which can be fed an email.message.Message
instance to define the body of the message.
If
msg
is set, the body
attribute is ignored.
If the user wants to attach additional parts to the message, the
attach()
method can be used but the user must ensure that the given msg
instance is a multipart message before doing so.
Effectively, this is also a wrapper which allows sending instances of
email.message.Message
via Django email backends.
-
message
()[source]¶ Returns the underlying
email.message.Message
object. In case the user did not set a msg
attribute for this instance, the parent EmailMessage.message
method is used.
-
-
distro_tracker.core.utils.email_messages.
decode_header
(header, default_encoding='utf-8')[source]¶ Decodes an email message header and returns it coded as a unicode string.
This is necessary since it is possible that a header is made of multiple differently encoded parts which makes
email.header.decode_header()
insufficient.
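The standard-library function returns a list of (value, charset) pairs rather than one string, so some joining step is needed when a header mixes encodings. A minimal sketch of that step is shown below. decode_header_sketch is a hypothetical name, and replacing undecodable bytes is an assumption made for the example.

```python
from email.header import decode_header as stdlib_decode_header


def decode_header_sketch(header, default_encoding='utf-8'):
    """Hypothetical helper: join all decoded parts into one string."""
    parts = []
    for value, charset in stdlib_decode_header(header):
        # Parts come back as bytes, except for a header with no encoded words.
        if isinstance(value, bytes):
            value = value.decode(charset or default_encoding, errors='replace')
        parts.append(value)
    return ''.join(parts)


# Two encoded words using different charsets in the same header.
print(decode_header_sketch('=?utf-8?q?Caf=C3=A9?= =?iso-8859-1?q?_na=EFve?='))
```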
-
distro_tracker.core.utils.email_messages.
unfold_header
(header)[source]¶ Unfolding is the process of removing the line wrapping added by mail agents. Each header is a single logical line and is not allowed to hold a multi-line value.
We need to unfold their values in particular when we want to reuse the values to compose a reply message as Python’s email API chokes on those newline characters.
If header is None, the return value is None as well.
- Parameters
header – the header value to unfold
- Returns
the unfolded version of the header.
- Return type
str or None
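Unfolding can be sketched as removing each line break that is followed by whitespace, which is how folded continuation lines are marked. This is an illustrative sketch rather than the module's code, and unfold_header_sketch is a hypothetical name.

```python
import re


def unfold_header_sketch(header):
    """Hypothetical helper: remove the line wrapping from a header value."""
    if header is None:
        return None
    # Drop each CRLF or LF that precedes a space or tab, keeping the whitespace.
    return re.sub(r'\r?\n(?=[ \t])', '', header)


print(unfold_header_sketch('A very long subject\r\n that was wrapped'))
```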
distro_tracker.core.utils.http¶
Utilities for handling HTTP resource access.
-
distro_tracker.core.utils.http.
parse_cache_control_header
(header)[source]¶ Parses the given Cache-Control header’s values.
- Returns
The key-value pairs found in the header. If some key did not have an associated value in the header,
None
is used instead.
- Return type
dict
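A parser with the behaviour described above might look like the sketch below. parse_cache_control_sketch is a hypothetical name. Lower-casing the keys and keeping values as strings are assumptions, and quoted values that contain commas are not handled.

```python
def parse_cache_control_sketch(header):
    """Hypothetical helper: map each directive to its value, or to None."""
    directives = {}
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        key, has_value, value = part.partition('=')
        # Directives without "=value" are stored with None.
        directives[key.strip().lower()] = (
            value.strip().strip('"') if has_value else None)
    return directives


print(parse_cache_control_sketch('max-age=3600, no-cache'))
```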
-
class
distro_tracker.core.utils.http.
HttpCache
(cache_directory_path, url_to_cache_path=None)[source]¶ Bases:
object
A class providing an interface to a cache of HTTP responses.
-
is_expired
(url)[source]¶ If the cached response for the given URL is expired based on Cache-Control or Expires headers, returns True.
-
get_content_stream
(url, compression='auto', text=False)[source]¶ Returns a file-like object that reads the cached copy of the given URL.
If the file is compressed, the file-like object will read the decompressed stream.
-
get_content
(url, compression='auto')[source]¶ Returns the content of the cached response for the given URL.
If the file is compressed, then uncompress it, else, consider it as plain file.
-
get_headers
(url)[source]¶ Returns the HTTP headers of the cached response for the given URL.
- Return type
-
update
(url, force=False, invalidate_cache=True)[source]¶ Performs an update of the cached resource. This means that it validates that its most current version is found in the cache by doing a conditional GET request.
- Parameters
force – To force the method to perform a full GET request, set the parameter to
True
- Returns
The original HTTP response and a Boolean indicating whether the cached value was updated.
- Return type
two-tuple of (
requests.Response
,Boolean
)
-
url_to_cache_path
(url)[source]¶ Transforms an arbitrary URL into a relative path within the cache directory. Can be overridden by the user by supplying their own implementation in the
url_to_cache_path
attribute of the __init__()
method.
- Parameters
url (str) – The URL to be cached.
- Returns
A relative path within the cache directory, used to store a copy of the resource.
-
-
distro_tracker.core.utils.http.
get_resource_content
(url, cache=None, compression='auto', only_if_updated=False, force_update=False, ignore_network_failures=False, ignore_http_error=None)[source]¶ A helper function which returns the content of the resource found at the given URL.
If the resource is already cached in the
cache
object and the cached content has not expired, the function will not do any HTTP requests and will return the cached content.
If the resource is stale or not cached at all, it is retrieved from the Web.
If the HTTP request returned an error code, the requests module will raise a
requests.exceptions.HTTPError.
In case of network failures, an IOError exception will be raised unless ignore_network_failures is set to True.
- Parameters
url (str) – The URL of the resource to be retrieved
cache (
HttpCache
or an object with an equivalent interface) – A cache object which should be used to look up and store the cached resource. If it is not provided, an instance of HttpCache
with a DISTRO_TRACKER_CACHE_DIRECTORY
cache directory is used.
compression (str) – Specifies the compression method used to generate the resource, and thus the compression method one should use to decompress it. If auto, then guess it from the URL’s file extension.
only_if_updated (bool) – if set to True, returns None when no update is done. Otherwise, returns the content in any case.
force_update (bool) – if set to True, do a new HTTP request even if we have non-expired data in the cache.
ignore_network_failures (bool) – if set to True, then the function will return None in case of network failures and not raise any exception.
ignore_http_error (int) – if the request results in an HTTP error with the given status code, then the error is ignored, no exception is raised, and None is returned.
- Returns
The bytes representation of the resource found at the given url
- Return type
bytes
-
distro_tracker.core.utils.http.
get_resource_text
(*args, **kwargs)[source]¶ Clone of
get_resource_content()
which transparently decodes the downloaded content into text. It supports the same parameters and adds the encoding parameter.
-
distro_tracker.core.utils.http.
safe_redirect
(to, fallback, allowed_hosts=None)[source]¶ Redirects to the URL given in to if it is safe; otherwise, redirects to fallback. allowed_hosts describes the list of valid hosts for the call to
django.utils.http.url_has_allowed_host_and_scheme()
.
- Parameters
fallback (str) – A safe URL to fall back on if to isn’t safe. WARNING! This URL is NOT checked! The developer is advised to supply only a URL known to be safe!
allowed_hosts (list of str) – A list of “safe” hosts. If None, relies on the default behaviour of
django.utils.http.url_has_allowed_host_and_scheme()
.
- Returns
A ResponseRedirect instance containing the appropriate information for the redirection.
- Return type
django.http.HttpResponseRedirectBase
distro_tracker.core.utils.linkify¶
Module including some utility functions to inject links in plain text.
-
class
distro_tracker.core.utils.linkify.
Linkify
[source]¶ Bases:
object
A base class representing ways to inject useful links in plain text data
If you want to recognize a new syntax where links could provide value to a view of the content, just create a subclass and implement the linkify method.
-
static
linkify
(text)[source]¶
- Parameters
text – the text where we should inject HTML links
- Returns
the text formatted with HTML links
- Return type
-
plugins
= [<class 'distro_tracker.core.utils.linkify.LinkifyHttpLinks'>, <class 'distro_tracker.core.utils.linkify.LinkifyDebianBugLinks'>, <class 'distro_tracker.core.utils.linkify.LinkifyUbuntuBugLinks'>, <class 'distro_tracker.core.utils.linkify.LinkifyCVELinks'>]¶
-
classmethod
unregister_plugin
()¶
-
class
distro_tracker.core.utils.linkify.
LinkifyHttpLinks
[source]¶ Bases:
distro_tracker.core.utils.linkify.Linkify
Detect http:// and https:// URLs and transform them in true HTML links.
-
static
linkify
(text)[source]¶
- Parameters
text – the text where we should inject HTML links
- Returns
the text formatted with HTML links
- Return type
-
classmethod
unregister_plugin
()¶
-
class
distro_tracker.core.utils.linkify.
LinkifyDebianBugLinks
[source]¶ Bases:
distro_tracker.core.utils.linkify.Linkify
Detect “Closes: #123, 234” syntax used in Debian changelogs to close bugs and inject HTML links to the corresponding bug tracker entry. Also handles the “Closes: 123 456” fields of .changes files.
-
close_prefix
= 'Closes:'¶
-
close_field
= 'Closes:'¶
-
bug_url
= 'https://bugs.debian.org/'¶
-
classmethod
linkify
(text)[source]¶
- Parameters
text – the text where we should inject HTML links
- Returns
the text formatted with HTML links
- Return type
-
classmethod
unregister_plugin
()¶
-
-
class
distro_tracker.core.utils.linkify.
LinkifyUbuntuBugLinks
[source]¶ Bases:
distro_tracker.core.utils.linkify.LinkifyDebianBugLinks
Detect “LP: #123, 234” syntax used in Ubuntu changelogs to close bugs and inject HTML links to the corresponding bug tracker entry.
-
close_prefix
= 'LP:'¶
-
close_field
= 'Launchpad-Bugs-Fixed:'¶
-
bug_url
= 'https://bugs.launchpad.net/bugs/'¶
-
classmethod
unregister_plugin
()¶
-
-
class
distro_tracker.core.utils.linkify.
LinkifyCVELinks
[source]¶ Bases:
distro_tracker.core.utils.linkify.Linkify
Detect “CVE-2014-1234” words and transform them into links to the CVE tracker at cve.mitre.org. The exact URL can be overridden with a
DISTRO_TRACKER_CVE_URL
configuration setting to redirect the URL to a custom tracker.
-
static
linkify
(text)[source]¶
- Parameters
text – the text where we should inject HTML links
- Returns
the text formatted with HTML links
- Return type
-
classmethod
unregister_plugin
()¶
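The CVE rewriting described for LinkifyCVELinks can be sketched with a single regular expression substitution. This is an illustration only. linkify_cve_sketch is a hypothetical function, the default base URL is an assumption (the text above only says the links point at cve.mitre.org), and the surrounding text is assumed to be safe to embed in HTML already.

```python
import html
import re

# Assumed default base URL, used here for illustration only.
CVE_BASE_URL = 'https://cve.mitre.org/cgi-bin/cvename.cgi?name='

_CVE_RE = re.compile(r'\bCVE-\d{4}-\d{4,}\b')


def linkify_cve_sketch(text, base_url=CVE_BASE_URL):
    """Hypothetical helper: wrap each CVE identifier in an HTML link."""
    def to_link(match):
        cve = match.group(0)
        return '<a href="{}{}">{}</a>'.format(html.escape(base_url), cve, cve)
    return _CVE_RE.sub(to_link, text)


print(linkify_cve_sketch('Fixes CVE-2014-1234.'))
```

Passing a different base_url plays the role that the DISTRO_TRACKER_CVE_URL setting plays in the documented class.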
distro_tracker.core.utils.misc¶
Miscellaneous utilities that don’t require their own python module.
-
distro_tracker.core.utils.misc.
get_data_checksum
(data)[source]¶ Checksums a dict, without its prospective ‘checksum’ key/value.
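A checksum that ignores an existing 'checksum' entry can be sketched as follows. The serialization (JSON with sorted keys) and the hash algorithm (SHA-256) are assumptions chosen for the example and may differ from what the module uses. data_checksum_sketch is a hypothetical name.

```python
import hashlib
import json


def data_checksum_sketch(data):
    """Hypothetical helper: checksum a dict while ignoring its 'checksum' key."""
    payload = {key: value for key, value in data.items() if key != 'checksum'}
    # Sorting keys makes the result independent of insertion order.
    serialized = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(serialized.encode('utf-8')).hexdigest()


record = {'name': 'dpkg', 'version': '1.21'}
record['checksum'] = data_checksum_sketch(record)
# Recomputing over the stored record yields the same value.
print(data_checksum_sketch(record) == record['checksum'])
```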
distro_tracker.core.utils.packages¶
Utilities for processing Debian package information.
-
distro_tracker.core.utils.packages.
package_hashdir
(package_name)[source]¶ Returns the name of the hash directory used to avoid having too many entries in a single directory. It’s usually the first letter of the package except for lib* packages where it’s the first 4 letters.
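The rule stated above fits in a few lines. This sketch uses a hypothetical name, package_hashdir_sketch, and does not cover inputs the description leaves unspecified, such as None.

```python
def package_hashdir_sketch(package_name):
    """Hypothetical helper: first 4 letters for lib* packages, else the first."""
    if package_name.startswith('lib'):
        return package_name[:4]
    return package_name[:1]


print(package_hashdir_sketch('libreoffice'))
print(package_hashdir_sketch('dpkg'))
```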
-
distro_tracker.core.utils.packages.
package_url
(package_name)[source]¶ Returns the URL of the page dedicated to this package name.
-
distro_tracker.core.utils.packages.
extract_vcs_information
(stanza)[source]¶ Extracts the VCS information from a package’s Sources entry.
-
distro_tracker.core.utils.packages.
extract_dsc_file_name
(stanza)[source]¶ Extracts the name of the .dsc file from a package’s Sources entry.
- Parameters
stanza (dict) – The
Sources
entry from which to extract the .dsc file name. Maps Sources
key names to values.
-
distro_tracker.core.utils.packages.
extract_information_from_sources_entry
(stanza)[source]¶ Extracts information from a
Sources
file entry and returns it in the form of a dictionary.
- Parameters
stanza (Case-insensitive dict) – The raw entry’s key-value pairs.
-
distro_tracker.core.utils.packages.
extract_information_from_packages_entry
(stanza)[source]¶ Extracts information from a
Packages
file entry and returns it in the form of a dictionary.
- Parameters
stanza (Case-insensitive dict) – The raw entry’s key-value pairs.
-
class
distro_tracker.core.utils.packages.
AptCache
[source]¶ Bases:
object
A class for handling cached package information.
-
DEFAULT_MAX_SIZE
= 1073741824¶
-
QUILT_FORMAT
= '3.0 (quilt)'¶
-
class
AcquireProgress
(*args, **kwargs)[source]¶ Bases:
apt.progress.base.AcquireProgress
Instances of this class can be passed to
apt.cache.Cache.update()
calls. It provides a way to track which files were changed and which were not by an update operation.
-
ims_hit
(item)[source]¶ Invoked when an item is confirmed to be up-to-date.
This happens, for instance, when an HTTP download is informed that the file on the server was not modified.
-
pulse
(owner)[source]¶ Periodically invoked while the Acquire process is underway.
This method gets invoked while the Acquire progress given by the parameter ‘owner’ is underway. It should display information about the current state.
This function returns a boolean value indicating whether the acquisition should be continued (True) or cancelled (False).
-
-
source_cache_directory
¶ The directory where source package files are cached
-
property
cache_size
¶
-
get_directory_size
(directory_path)[source]¶ Returns the total space taken by the given directory in bytes.
- Parameters
directory_path (string) – The path to the directory
- Return type
-
clear_cache
()[source]¶ Removes all cache information. This causes the next update to retrieve fresh repository files.
-
update_sources_list
()[source]¶ Updates the
sources.list
file used to list repositories for which package information should be cached.
-
update_apt_conf
()[source]¶ Updates the
apt.conf
file which gives general settings for the apt.cache.Cache.
In particular, this updates the list of all architectures which should be considered in package updates based on architectures that the repositories support.
-
get_cached_files
(filter_function=None)[source]¶ Returns cached files, optionally filtered by the given
filter_function
-
get_sources_files_for_repository
(repository)[source]¶ Returns all
Sources
files which are cached for the given repository.
For instance,
Sources
files for different suites are cached separately.
- Parameters
repository (
Repository
) – The repository for which to return all cached Sources
files
- Return type
iterable
of strings
-
get_packages_files_for_repository
(repository)[source]¶ Returns all
Packages
files which are cached for the given repository.
For instance,
Packages
files for different suites are cached separately.
- Parameters
repository (
Repository
) – The repository for which to return all cached Packages
files
- Return type
iterable
of strings
-
update_repositories
(force_download=False)[source]¶ Initiates a cache update.
- Parameters
force_download – If set to
True
causes the cache to be cleared before starting the update, thus making sure all index files are downloaded again.
- Returns
A two-tuple (updated_sources, updated_packages). Each of the tuple’s members is a list of (Repository, component, file_name) tuples representing the repository which was updated, the component, and the file which contains the fresh information. The file is either a Sources
or a Packages
file respectively.
-
get_package_source_cache_directory
(package_name)[source]¶ Returns the path to the directory where a particular source package is cached.
- Parameters
package_name (string) – The name of the source package
- Return type
string
-
get_source_version_cache_directory
(package_name, version)[source]¶ Returns the path to the directory where a particular source package version files are extracted.
- Parameters
package_name (string) – The name of the source package
version (string) – The version of the source package
- Return type
string
-
retrieve_source
(source_name, version, debian_directory_only=False)[source]¶ Retrieve the source package files for the given source package version.
- Parameters
source_name (string) – The name of the source package
version (string) – The version of the source package
debian_directory_only (Boolean) – Flag indicating if the method should try to retrieve only the debian directory of the source package. This is usually only possible when the package format is 3.0 (quilt).
- Returns
The path to the directory containing the extracted source package files.
- Return type
string
-
distro_tracker.core.utils.plugins¶
Classes to build a plugin mechanism.
-
class
distro_tracker.core.utils.plugins.
PluginRegistry
(name, bases, attrs)[source]¶ Bases:
type
A metaclass which any class that wants to behave as a registry can use.
When classes derived from classes which use this metaclass are defined, they are added to the list plugins. The concrete classes using this metaclass are free to decide how to use this list.
This metaclass also adds an
unregister_plugin()
classmethod to all concrete classes which removes the class from the list of plugins.
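A registry metaclass of this kind can be sketched in plain Python as follows. This is an illustration of the pattern, not the project's implementation. PluginRegistrySketch, Formatter, UpperFormatter and LowerFormatter are hypothetical names, and leaving the base class out of its own list is an assumption consistent with the Linkify.plugins value shown earlier.

```python
class PluginRegistrySketch(type):
    """Hypothetical metaclass collecting subclasses in a shared list."""

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, 'plugins'):
            # First class using the metaclass: it only hosts the registry.
            cls.plugins = []
        else:
            # Every later subclass is registered when it is defined.
            cls.plugins.append(cls)

        def unregister_plugin(klass):
            if klass in klass.plugins:
                klass.plugins.remove(klass)
        cls.unregister_plugin = classmethod(unregister_plugin)


class Formatter(metaclass=PluginRegistrySketch):
    """Hypothetical registry base class."""


class UpperFormatter(Formatter):
    pass


class LowerFormatter(Formatter):
    pass


print([plugin.__name__ for plugin in Formatter.plugins])
LowerFormatter.unregister_plugin()
print([plugin.__name__ for plugin in Formatter.plugins])
```

Registration happens at class definition time, so importing a module that defines a subclass is enough to make the plugin available.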
distro_tracker.core.utils.urls¶
Utilities for generating URLs of various kinds
-
distro_tracker.core.utils.urls.
RepologyUrl
(target_page, repo, package)[source]¶ Build a repology.org URL
distro_tracker.core.utils.verp¶
Module for encoding and decoding Variable Envelope Return Path addresses.
It is implemented following the recommendations laid out in VERP and https://www.courier-mta.org/draft-varshavchik-verp-smtpext.txt
>>> from distro_tracker.core.utils import verp
>>> str(verp.encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> list(map(str, verp.decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
-
distro_tracker.core.utils.verp.
encode
(sender_address, recipient_address, separator='-')[source]¶ Encodes sender_address and recipient_address into a VERP compliant address to be used as the envelope-from (return-path) address.
- Parameters
sender_address (string) – The email address of the sender
recipient_address (string) – The email address of the recipient
separator – The separator to be used between the sender’s local part and the encoded recipient’s local part in the resulting VERP address.
- Return type
string
>>> str(encode('itny-out@domain.com', 'node42!ann@old.example.com'))
'itny-out-node42+21ann=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'tom@old.example.com'))
'itny-out-tom=old.example.com@domain.com'
>>> str(encode('itny-out@domain.com', 'dave+priority@new.example.com'))
'itny-out-dave+2Bpriority=new.example.com@domain.com'
>>> str(encode('bounce@dom.com', 'user+!%-:@[]+@other.com'))
'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
-
distro_tracker.core.utils.verp.
decode
(verp_address, separator='-')[source]¶ Decodes the given VERP-encoded from address and returns the original sender address and recipient address as a tuple.
- Parameters
verp_address – The return path address
separator – The separator to be expected between the sender’s local part and the encoded recipient’s local part in the given
verp_address
>>> from_email, to_email = 'bounce@domain.com', 'user@other.com'
>>> decode(encode(from_email, to_email)) == (from_email, to_email)
True
>>> list(map(str, decode('itny-out-dave+2Bpriority=new.example.com@domain.com')))
['itny-out@domain.com', 'dave+priority@new.example.com']
>>> list(map(str, decode('itny-out-node42+21ann=old.example.com@domain.com')))
['itny-out@domain.com', 'node42!ann@old.example.com']
>>> list(map(str, decode('bounce-addr+2B40=dom.com@asdf.com')))
['bounce@asdf.com', 'addr+40@dom.com']
>>> s = 'bounce-user+2B+21+25+2D+3A+40+5B+5D+2B=other.com@dom.com'
>>> str(decode(s)[1])
'user+!%-:@[]+@other.com'
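The examples above are enough to sketch the scheme in a self-contained way. The code below is an approximation written for illustration, not the module's source. The set of escaped characters is inferred from the example outputs, and verp_encode_sketch and verp_decode_sketch are hypothetical names.

```python
import re

# Characters escaped as "+XX" (uppercase hex), inferred from the examples above.
_SPECIAL = re.compile(r'[+!%\-:@\[\]]')
_ESCAPED = re.compile(r'\+([0-9A-Fa-f]{2})')


def verp_encode_sketch(sender_address, recipient_address, separator='-'):
    """Build sender-local + separator + escaped recipient + '@' sender domain."""
    sender_local, sender_domain = sender_address.rsplit('@', 1)
    recipient_local, recipient_domain = recipient_address.rsplit('@', 1)
    escaped_local = _SPECIAL.sub(
        lambda match: '+%02X' % ord(match.group(0)), recipient_local)
    return '{}{}{}={}@{}'.format(
        sender_local, separator, escaped_local, recipient_domain, sender_domain)


def verp_decode_sketch(verp_address, separator='-'):
    """Return the (sender_address, recipient_address) pair."""
    local, sender_domain = verp_address.rsplit('@', 1)
    # Split off the recipient domain first, since it may contain the separator.
    remainder, recipient_domain = local.rsplit('=', 1)
    sender_local, escaped_local = remainder.rsplit(separator, 1)
    # A single pass keeps "+2B40" from being unescaped twice.
    recipient_local = _ESCAPED.sub(
        lambda match: chr(int(match.group(1), 16)), escaped_local)
    return ('{}@{}'.format(sender_local, sender_domain),
            '{}@{}'.format(recipient_local, recipient_domain))


address = verp_encode_sketch('itny-out@domain.com', 'node42!ann@old.example.com')
print(address)
print(verp_decode_sketch(address))
```

Escaping the separator character inside the recipient's local part is what allows the decoder to split on the last separator even when the sender's local part contains one, as in itny-out.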