Collection utilities

This module implements helpers for working with collections. In some cases, the iterable is restricted to a particular type, such as a list or set.

For historical reasons, many of the function names mention specific data structures, such as “list” or “dict”. In most cases, these functions work with any instance of the more general type (such as Iterable or Mapping). See the documentation of each function for details.

Iterable helpers

apply_no_return(items, func, *args, …) Apply func to each item in items
flatten_lists(list_of_lists) Flatten a list of iterables into a single list
is_iterator_exhausted(iterator, return_element) Check if the iterator is exhausted
list_insert_list(l, to_insert, index) Insert to_insert into a shallow copy of l at position index.
list_remove_list(l, to_remove) Remove items in to_remove from l
list_to_dict(l, f) Convert the list to a dictionary in which keys and values are adjacent in the list.
remove_nones(l, return_np_array) Remove Nones from l
replace_none_with_empty_iter(i) Return an empty iterator if i is None.
wrap_in_list(maybe_sequence) If maybe_sequence is not a sequence, then wrap it in a list
wrap_string_in_list(maybe_string) If maybe_string is a string, then wrap it in a list.

Set helpers

wrap_in_set(maybe_set, wrap_string) If maybe_set is not a set, then wrap it in a set.
get_set_pairwise_intersections(dict_of_sets, …) Find the pairwise intersections among sets in dict_of_sets
merge_sets(*set_args) Given any number of sets, merge them into a single set

Mapping helpers

reverse_dict(d) Create a new dictionary in which the keys and values of d are switched
sort_dict_keys_by_value(d) Sort the keys in d by their value and return as a list

Definitions

pyllars.collection_utils.apply_no_return(items: Iterable, func: Callable, *args, progress_bar: bool = False, total_items: Optional[int] = None, **kwargs) → None

Apply func to each item in items

Unlike map(), this function does not return anything.

Parameters:
  • items (typing.Iterable) – An iterable
  • func (typing.Callable) – The function to apply to each item
  • args – Positional arguments for func.
  • kwargs – Keyword arguments to pass to func
  • progress_bar (bool) – Whether to show a progress bar when waiting for results.
  • total_items (int or None) – The number of items in items. If not given, len is used. Provide this when items does not support len, for example when it is a generator.
Returns:

None – If a return value is expected, use a list comprehension instead.

Return type:

None
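
A minimal sketch of the behavior described above, not the pyllars implementation. It assumes each item is passed to func as the first positional argument, and it omits the progress bar. The name apply_no_return_sketch is hypothetical.

```python
from typing import Callable, Iterable


def apply_no_return_sketch(items: Iterable, func: Callable, *args, **kwargs) -> None:
    """Call func(item, *args, **kwargs) for each item and discard the results."""
    # Assumption: the item is the first positional argument of func.
    for item in items:
        func(item, *args, **kwargs)


seen = []
# The extra positional argument (10) is forwarded to the function as "offset".
apply_no_return_sketch([1, 2, 3], lambda x, offset: seen.append(x + offset), 10)
print(seen)  # [11, 12, 13]
```

The function is only useful for its side effects, which is why the example collects results in an outer list.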

pyllars.collection_utils.flatten_lists(list_of_lists: Iterable) → List

Flatten a list of iterables into a single list

This function does not further flatten inner iterables.

Parameters: list_of_lists (typing.Iterable) – The iterable to flatten
Returns: flattened_list – The flattened list
Return type: typing.List
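
A minimal sketch of one-level flattening, assuming itertools.chain is an acceptable stand-in for the actual implementation. The name flatten_lists_sketch is hypothetical.

```python
import itertools


def flatten_lists_sketch(list_of_lists):
    """Flatten exactly one level of nesting into a new list."""
    return list(itertools.chain.from_iterable(list_of_lists))


nested = [[1, 2], (3,), [[4, 5]]]
flat = flatten_lists_sketch(nested)
print(flat)  # [1, 2, 3, [4, 5]]
```

The inner list [4, 5] survives intact, which illustrates that inner iterables are not flattened further.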

pyllars.collection_utils.get_set_pairwise_intersections(dict_of_sets: Mapping[str, Set], return_intersections: bool = True) → pandas.core.frame.DataFrame

Find the pairwise intersections among sets in dict_of_sets

Parameters:
  • dict_of_sets (typing.Mapping[str,typing.Set]) – A mapping in which the keys are the “names” of the sets and the values are the actual sets
  • return_intersections (bool) – Whether to include the actual set intersections in the return. If False, then only the intersection size will be included.
Returns:

df_pairswise_intersections – A dataframe with the following columns:

  • set1 : the name of one set in the pair
  • set2 : the name of the second set in the pair
  • len(set1) : the size of set1
  • len(set2) : the size of set2
  • len(intersection) : the size of the intersection
  • coverage_small : the fraction of the smaller of set1 or set2 in the intersection
  • coverage_large : the fraction of the larger of set1 or set2 in the intersection
  • intersection : the intersection set. Only included if return_intersections is True.

Return type:

pandas.DataFrame
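
The columns listed above can be sketched with plain dictionaries instead of a data frame. This is an illustration only: it assumes each unordered pair appears once, that coverage is the intersection size divided by the set size, and that an empty set yields a coverage of 0.0. The name pairwise_intersections_sketch is hypothetical.

```python
import itertools


def pairwise_intersections_sketch(dict_of_sets):
    """Return one row (a dict) per unordered pair of named sets."""
    rows = []
    for (name1, s1), (name2, s2) in itertools.combinations(dict_of_sets.items(), 2):
        common = s1 & s2
        small, large = sorted((len(s1), len(s2)))
        rows.append({
            "set1": name1,
            "set2": name2,
            "len(set1)": len(s1),
            "len(set2)": len(s2),
            "len(intersection)": len(common),
            # Assumption: coverage is |intersection| / |set|, 0.0 for empty sets.
            "coverage_small": len(common) / small if small else 0.0,
            "coverage_large": len(common) / large if large else 0.0,
            "intersection": common,
        })
    return rows


rows = pairwise_intersections_sketch({"a": {1, 2, 3, 4}, "b": {3, 4}})
print(rows[0]["coverage_small"], rows[0]["coverage_large"])  # 1.0 0.5
```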

pyllars.collection_utils.is_iterator_exhausted(iterator: Iterable, return_element: bool = False) → Tuple[bool, object]

Check if the iterator is exhausted

N.B. THIS CONSUMES THE NEXT ELEMENT OF THE ITERATOR! The return_element parameter can change this behavior.

This method is adapted from this SO question: https://stackoverflow.com/questions/661603

Parameters:
  • iterator (typing.Iterable) – The iterator
  • return_element (bool) – Whether to return the next element of the iterator
Returns:

  • is_exhausted (bool) – Whether the iterator was exhausted, that is, whether there was no next element
  • [optional] next_element (object) – If return_element is True, then the consumed element is also returned.
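
A minimal sketch of the check, assuming the function returns a bare bool when return_element is False and a (bool, element) tuple otherwise. The name is_iterator_exhausted_sketch is hypothetical.

```python
def is_iterator_exhausted_sketch(iterator, return_element=False):
    """Try to pull one element; report whether the iterator had none left."""
    try:
        next_element = next(iterator)  # this consumes the element
        is_exhausted = False
    except StopIteration:
        next_element = None
        is_exhausted = True
    if return_element:
        return is_exhausted, next_element
    return is_exhausted


it = iter([1, 2])
exhausted, first = is_iterator_exhausted_sketch(it, return_element=True)
remaining = list(it)
print(exhausted, first, remaining)  # False 1 [2]
```

Because the first element has been consumed, callers that still need it should request it with return_element=True.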

pyllars.collection_utils.list_insert_list(l: Sequence, to_insert: Sequence, index: int) → List

Insert to_insert into a shallow copy of l at position index.

This function is adapted from: http://stackoverflow.com/questions/7376019/

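Parameters:
  • l (typing.Sequence) – The sequence into which to_insert is inserted
  • to_insert (typing.Sequence) – The items to insert
  • index (int) – The position in l at which to_insert is inserted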
Returns:

updated_l – A list with to_insert inserted into l at position index

Return type:

typing.List
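
A minimal sketch using slice assignment on a shallow copy, which is one common way to get the described behavior. The name list_insert_list_sketch is hypothetical.

```python
def list_insert_list_sketch(l, to_insert, index):
    """Return a new list with to_insert spliced into a copy of l at index."""
    updated_l = list(l)  # shallow copy, so the input is left untouched
    updated_l[index:index] = to_insert
    return updated_l


original = [1, 2, 5]
updated = list_insert_list_sketch(original, [3, 4], 2)
print(updated)   # [1, 2, 3, 4, 5]
print(original)  # [1, 2, 5]
```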

pyllars.collection_utils.list_remove_list(l: Iterable, to_remove: Container) → List

Remove items in to_remove from l

Note that “not in” is used to match items in to_remove. Additionally, the return is not lazy.

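Parameters:
  • l (typing.Iterable) – The items to filter
  • to_remove (typing.Container) – The items to remove from l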
Returns:

copy_of_l – A shallow copy of l without the items in to_remove.

Return type:

typing.List
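
A minimal sketch of the “not in” filter described above. The name list_remove_list_sketch is hypothetical.

```python
def list_remove_list_sketch(l, to_remove):
    """Return a new list with every item that appears in to_remove dropped."""
    return [item for item in l if item not in to_remove]


kept = list_remove_list_sketch([1, 2, 3, 2, 4], {2, 4})
print(kept)  # [1, 3]
```

Passing a set for to_remove keeps each membership test cheap. All occurrences of a matching item are removed, not only the first.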

pyllars.collection_utils.list_to_dict(l: Sequence, f: Optional[Callable] = None) → Dict

Convert the list to a dictionary in which keys and values are adjacent in the list. Optionally, a function f can be passed to apply to each value before adding it to the dictionary.

Parameters:
  • l (typing.Sequence) – The list of items
  • f (typing.Callable) – A function to apply to each value before inserting it into the dictionary. For example, float could be passed to convert each value to a float.
Returns:

d – The dictionary, defined as described above

Return type:

typing.Dict

Examples

l = ["key1", "value1", "key2", "value2"]
list_to_dict(l, f) == {"key1": f("value1"), "key2": f("value2")}
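
The pairing in the example above can be sketched with slicing. This illustration assumes that a trailing key without a value is silently dropped, which the documentation does not specify. The name list_to_dict_sketch is hypothetical.

```python
def list_to_dict_sketch(l, f=None):
    """Pair items at even positions (keys) with their successors (values)."""
    keys = l[0::2]
    values = l[1::2]
    if f is not None:
        values = [f(v) for v in values]
    # Assumption: zip drops a trailing key that has no value.
    return dict(zip(keys, values))


d = list_to_dict_sketch(["a", "1.5", "b", "2"], float)
print(d)  # {'a': 1.5, 'b': 2.0}
```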

pyllars.collection_utils.merge_sets(*set_args) → Set

Given any number of sets, merge them into a single set

N.B. This function only performs a “shallow” merge. It does not handle nested containers within the “outer” sets.

Parameters: set_args (typing.Iterable[typing.Container]) – The sets to merge
Returns: merged_set – A single set containing unique elements from each of the input sets
Return type: typing.Set
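
A minimal sketch of a shallow merge using set.union. The name merge_sets_sketch is hypothetical.

```python
def merge_sets_sketch(*set_args):
    """Union all arguments into one new set without descending into elements."""
    return set().union(*set_args)


merged = merge_sets_sketch({1, 2}, {2, 3}, {(4, 5)})
print(merged == {1, 2, 3, (4, 5)})  # True
```

The tuple (4, 5) stays a single element of the result, which is what “shallow” means here.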

pyllars.collection_utils.remove_nones(l: Iterable, return_np_array: bool = False) → List

Remove Nones from l

Compared to other single-function tests, this uses “is” and avoids strange behavior with data frames, lists of bools, etc.

This function returns a shallow copy and is not lazy.

N.B. This does not test nested lists. So, for example, a list of lists of None values would be unchanged by this function.

Parameters:
  • l (typing.Iterable) – The iterable
  • return_np_array (bool) – If true, the filtered list will be wrapped in an np.array.
Returns:

l_no_nones – A list or np.array with the Nones removed from l

Return type:

typing.List
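
A minimal sketch of the identity-based filter. The return_np_array option is left out to keep the example free of NumPy, and the name remove_nones_sketch is hypothetical.

```python
def remove_nones_sketch(l):
    """Keep every item that is not the None object itself."""
    return [item for item in l if item is not None]


cleaned = remove_nones_sketch([0, None, False, "", [None], None])
print(cleaned)  # [0, False, '', [None]]
```

Falsy values such as 0, False and the empty string are kept because the test is “is not None” rather than truthiness, and the nested [None] is untouched.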

pyllars.collection_utils.replace_none_with_empty_iter(i: Optional[Iterable]) → Iterable

Return an empty iterator if i is None. Otherwise, return i.

The purpose of this function is to make iterating over results from functions which return either an iterator or None cleaner. This function does not verify that i is actually an iterator.

Parameters: i (None or typing.Iterable) – The possibly-empty iterator
Returns: i – An empty list if i is None, or the original i otherwise
Return type: typing.Iterable
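
A minimal sketch of the intended use, assuming an empty list stands in for None. Both replace_none_sketch and find_matches are hypothetical names used only for illustration.

```python
def replace_none_sketch(i):
    """Return an empty list for None, and the argument unchanged otherwise."""
    return [] if i is None else i


def find_matches(found):
    """Hypothetical function that returns a list of results, or None."""
    return ["a", "b"] if found else None


collected = []
# Without the wrapper, iterating over None would raise a TypeError.
for match in replace_none_sketch(find_matches(False)):
    collected.append(match)
print(collected)  # []
```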

pyllars.collection_utils.reverse_dict(d: Mapping) → Dict

Create a new dictionary in which the keys and values of d are switched

If several keys of d share the same value, it is arbitrary which of those keys is retained.

Parameters: d (typing.Mapping) – The mapping
Returns: reversed_d – A dictionary in which the values of d now map to the keys
Return type: typing.Dict
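
A minimal sketch using a dictionary comprehension. The name reverse_dict_sketch is hypothetical.

```python
def reverse_dict_sketch(d):
    """Swap keys and values; values of d must be hashable."""
    return {value: key for key, value in d.items()}


reversed_d = reverse_dict_sketch({"a": 1, "b": 2})
print(reversed_d)  # {1: 'a', 2: 'b'}

# With duplicate values only one key survives; callers should not rely on which.
collided = reverse_dict_sketch({"a": 1, "b": 1})
print(len(collided))  # 1
```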

pyllars.collection_utils.sort_dict_keys_by_value(d: Mapping) → List

Sort the keys in d by their value and return as a list

This function uses sorted, so the values must be comparable with one another by that builtin function.

Parameters: d (typing.Mapping) – The dictionary
Returns: sorted_keys – The keys sorted by the associated values
Return type: typing.List
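
A minimal sketch built on sorted with a key function, assuming ascending order. The name sort_dict_keys_by_value_sketch is hypothetical.

```python
def sort_dict_keys_by_value_sketch(d):
    """Return the keys of d ordered by their associated values."""
    # Assumption: ascending order, as produced by sorted() by default.
    return sorted(d, key=lambda k: d[k])


sorted_keys = sort_dict_keys_by_value_sketch({"b": 3, "a": 1, "c": 2})
print(sorted_keys)  # ['a', 'c', 'b']
```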

pyllars.collection_utils.wrap_in_list(maybe_sequence: Any) → Sequence

If maybe_sequence is not a sequence, then wrap it in a list

See pyllars.validation_utils.is_sequence() for more details about what counts as a sequence.

Parameters: maybe_sequence (typing.Any) – An object which may be a sequence
Returns: list – Either the original object, or maybe_sequence wrapped in a list, if it was not already a sequence
Return type: typing.Sequence

pyllars.collection_utils.wrap_in_set(maybe_set: Optional[Any], wrap_string: bool = True) → Set

If maybe_set is not a set, then wrap it in a set.

Parameters:
  • maybe_set (typing.Optional[typing.Any]) – An object which may be a set
  • wrap_string (bool) – Whether to wrap maybe_set as a singleton if it is a string. Otherwise, the string will be converted into a set of individual characters.
Returns:

s – Either the original object, or maybe_set wrapped in a set, if it was not already a set. If maybe_set was None, then an empty set is returned.

Return type:

typing.Set
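
A minimal sketch of the cases described above. How non-set, non-string objects are handled is an assumption here (they are wrapped as a singleton, so they must be hashable). The name wrap_in_set_sketch is hypothetical.

```python
def wrap_in_set_sketch(maybe_set, wrap_string=True):
    """Return a set for None, sets, strings and other single objects."""
    if maybe_set is None:
        return set()
    if isinstance(maybe_set, set):
        return maybe_set  # already a set: returned unchanged
    if isinstance(maybe_set, str) and not wrap_string:
        return set(maybe_set)  # set of individual characters
    # Assumption: anything else becomes a one-element set.
    return {maybe_set}


print(wrap_in_set_sketch("abc"))                              # {'abc'}
print(wrap_in_set_sketch("abc", wrap_string=False) == {"a", "b", "c"})  # True
print(wrap_in_set_sketch(None))                               # set()
```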

pyllars.collection_utils.wrap_string_in_list(maybe_string: Any) → Sequence

If maybe_string is a string, then wrap it in a list.

The motivation for this function is that some functions return either a single string or multiple strings as a list. The return value of this function can be iterated over safely.

This function will fail if maybe_string is neither a string nor a sequence.

Parameters: maybe_string (typing.Any) – An object which may be a string
Returns: l – Either the original object, or maybe_string wrapped in a list, if it was a string
Return type: typing.Sequence
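
A minimal sketch of the wrapping step. It does not validate non-string inputs, so it will not reproduce the failure mentioned above. The name wrap_string_in_list_sketch is hypothetical.

```python
def wrap_string_in_list_sketch(maybe_string):
    """Wrap a lone string in a list; pass anything else through unchanged."""
    if isinstance(maybe_string, str):
        return [maybe_string]
    return maybe_string


# Iterating over a bare string would yield characters; wrapping avoids that.
single = [len(name) for name in wrap_string_in_list_sketch("alice")]
several = [len(name) for name in wrap_string_in_list_sketch(["alice", "bob"])]
print(single, several)  # [5] [5, 3]
```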