Collection utilities

This module implements helpers for working with collections. In some cases, the iterable is restricted to a particular type, such as a list or set.

For historical reasons, many of the function names mention specific data structures, such as lists or dicts. In most cases, these functions work with any instance of the more general type (such as Iterable or Mapping). See the documentation of each function for details.

Iterable helpers

`apply_no_return`(items, func, *args, …)	Apply func to each item in items
`flatten_lists`(list_of_lists)	Flatten a list of iterables into a single list
`is_iterator_exhausted`(iterator, return_element)	Check if the iterator is exhausted
`list_insert_list`(l, to_insert, index)	Insert to_insert into a shallow copy of l at position index.
`list_remove_list`(l, to_remove)	Remove items in to_remove from l
`list_to_dict`(l, f)	Convert the list to a dictionary in which keys and values are adjacent in the list.
`remove_nones`(l, return_np_array)	Remove Nones from l
`replace_none_with_empty_iter`(i)	Return an empty iterator if i is None.
`wrap_in_list`(maybe_sequence)	If maybe_sequence is not a sequence, then wrap it in a list
`wrap_string_in_list`(maybe_string)	If maybe_string is a string, then wrap it in a list.

Set helpers

`wrap_in_set`(maybe_set, wrap_string)	If maybe_set is not a set, then wrap it in a set.
`get_set_pairwise_intersections`(dict_of_sets, …)	Find the pairwise intersections among sets in dict_of_sets
`merge_sets`(*set_args)	Given any number of sets, merge them into a single set

Mapping helpers

`reverse_dict`(d)	Create a new dictionary in which the keys and values of d are switched
`sort_dict_keys_by_value`(d)	Sort the keys in d by their value and return as a list

Definitions

pyllars.collection_utils.apply_no_return(items: Iterable, func: Callable, *args, progress_bar: bool = False, total_items: Optional[int] = None, **kwargs) → None

Apply func to each item in items

Unlike map(), this function does not return anything.

Parameters:

items (typing.Iterable) – An iterable
func (typing.Callable) – The function to apply to each item
args – Positional arguments to pass to func
kwargs – Keyword arguments to pass to func
progress_bar (bool) – Whether to show a progress bar when waiting for results
total_items (int or None) – The number of items in items. If not given, len is used. This is typically needed when items is a generator, for which len does not work.

Returns:

None – If a return value is expected, use a list comprehension instead.

Return type:

None
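The behavior described above amounts to a side-effect-only loop. The following minimal sketch uses the hypothetical name apply_no_return_sketch, omits the progress_bar and total_items handling, and assumes that each item is passed to func as its first positional argument, followed by args and kwargs; it is not the pyllars implementation.

```python
from typing import Any, Callable, Iterable


def apply_no_return_sketch(items: Iterable, func: Callable, *args: Any, **kwargs: Any) -> None:
    """Call func(item, *args, **kwargs) for each item and discard the results.

    Hypothetical sketch: progress-bar support is omitted, and the item is
    assumed to be the first positional argument passed to func.
    """
    for item in items:
        func(item, *args, **kwargs)


# Example: the only effect of interest is the side effect on `seen`
seen = []
apply_no_return_sketch(["a", "b"], lambda x, suffix: seen.append(x + suffix), "!")
print(seen)  # ['a!', 'b!']
```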

pyllars.collection_utils.flatten_lists(list_of_lists: Iterable) → List

Flatten a list of iterables into a single list

This function flattens only one level; iterables nested inside the inner iterables are left as-is.

Parameters:	list_of_lists (typing.Iterable) – The iterable to flatten
Returns:	flattened_list – The flattened list
Return type:	typing.List
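One-level flattening of this kind can be sketched with itertools.chain.from_iterable from the standard library. The name flatten_lists_sketch is hypothetical; the example shows that a list nested one level deeper survives intact.

```python
import itertools
from typing import Iterable, List


def flatten_lists_sketch(list_of_lists: Iterable) -> List:
    """Concatenate the inner iterables into one list (one level only)."""
    return list(itertools.chain.from_iterable(list_of_lists))


print(flatten_lists_sketch([[1, 2], (3,), [[4, 5]]]))  # [1, 2, 3, [4, 5]]
```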

pyllars.collection_utils.get_set_pairwise_intersections(dict_of_sets: Mapping[str, Set], return_intersections: bool = True) → pandas.core.frame.DataFrame

Find the pairwise intersections among sets in dict_of_sets

Parameters:

dict_of_sets (typing.Mapping[str,typing.Set]) – A mapping in which the keys are the “names” of the sets and the values are the actual sets
return_intersections (bool) – Whether to include the actual set intersections in the return. If False, then only the intersection size will be included.

Returns:

df_pairwise_intersections – A dataframe with the following columns:

set1 : the name of one set in the pair
set2 : the name of the second set in the pair
len(set1) : the size of set1
len(set2) : the size of set2
len(intersection) : the size of the intersection
coverage_small : the fraction of the smaller of set1 and set2 that lies in the intersection
coverage_large : the fraction of the larger of set1 and set2 that lies in the intersection
intersection : the intersection set. Only included if return_intersections is True.

Return type:

pandas.DataFrame
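To make the column definitions concrete, the following sketch builds one row per unordered pair of named sets using itertools.combinations. It is a hypothetical illustration (pairwise_intersection_rows is not a pyllars function): it returns plain dictionaries rather than a DataFrame, so passing the result to pandas.DataFrame would be needed to get the documented return type, and it assumes every set is non-empty, since the coverage ratios would otherwise divide by zero.

```python
import itertools
from typing import Dict, List, Mapping, Set


def pairwise_intersection_rows(dict_of_sets: Mapping[str, Set],
                               return_intersections: bool = True) -> List[Dict]:
    """Build one row per unordered pair of named sets.

    Hypothetical sketch; assumes every set is non-empty.
    """
    rows = []
    for name1, name2 in itertools.combinations(dict_of_sets, 2):
        set1, set2 = dict_of_sets[name1], dict_of_sets[name2]
        intersection = set1 & set2
        small, large = sorted((len(set1), len(set2)))
        row = {
            "set1": name1,
            "set2": name2,
            "len(set1)": len(set1),
            "len(set2)": len(set2),
            "len(intersection)": len(intersection),
            # fraction of the smaller / larger set that lies in the intersection
            "coverage_small": len(intersection) / small,
            "coverage_large": len(intersection) / large,
        }
        if return_intersections:
            row["intersection"] = intersection
        rows.append(row)
    return rows


rows = pairwise_intersection_rows({"a": {1, 2, 3, 4}, "b": {3, 4}, "c": {4, 5}})
print(rows[0]["coverage_small"], rows[0]["coverage_large"])  # 1.0 0.5
```

Note that coverage_small is always at least coverage_large, because the same intersection size is divided by the smaller set size.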

pyllars.collection_utils.is_iterator_exhausted(iterator: Iterable, return_element: bool = False) → Tuple[bool, object]

Check if the iterator is exhausted

N.B. THIS CONSUMES THE NEXT ELEMENT OF THE ITERATOR! Setting return_element to True returns the consumed element, so that it is not lost.

This function is adapted from this Stack Overflow question: https://stackoverflow.com/questions/661603

Parameters:

iterator (typing.Iterable) – The iterator
return_element (bool) – Whether to also return the consumed element of the iterator

Returns:

is_exhausted (bool) – Whether the iterator was exhausted, that is, whether there was no next element
[optional] next_element (object) – If return_element is True, then the consumed element is also returned.
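The check can be sketched with the builtin next and a private sentinel default, which is equivalent to catching StopIteration. The names peek_exhausted and _SENTINEL are hypothetical. The example also shows the standard itertools.chain idiom for putting the consumed element back in front of the remaining items.

```python
import itertools
from typing import Any, Iterator, Tuple

_SENTINEL = object()  # unique marker that cannot collide with a real element


def peek_exhausted(iterator: Iterator) -> Tuple[bool, Any]:
    """Return (is_exhausted, next_element); next_element is None if exhausted."""
    element = next(iterator, _SENTINEL)
    if element is _SENTINEL:
        return True, None
    return False, element


it = iter([10, 20])
exhausted, first = peek_exhausted(it)
# The element was consumed; chain it back on to restore the original sequence.
it = itertools.chain([first], it)
print(exhausted, list(it))  # False [10, 20]
```

Because the flag is returned separately, an iterator whose next element is itself None is still reported as not exhausted.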

pyllars.collection_utils.list_insert_list(l: Sequence, to_insert: Sequence, index: int) → List

Insert to_insert into a shallow copy of l at position index.

This function is adapted from: http://stackoverflow.com/questions/7376019/

Parameters:

l (typing.Sequence) – The sequence into which the items are inserted
to_insert (typing.Sequence) – The items to insert
index (int) – The position at which to begin the insertion

Returns:

updated_l – A list with to_insert inserted into l at position index

Return type:

typing.List
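A shallow-copy insertion can be sketched with empty-slice assignment, a standard Python list idiom. The name list_insert_list_sketch is hypothetical; the example confirms that the original list is left unchanged.

```python
from typing import List, Sequence


def list_insert_list_sketch(l: Sequence, to_insert: Sequence, index: int) -> List:
    """Return a new list with to_insert spliced in before position index."""
    updated_l = list(l)                       # shallow copy; l itself is untouched
    updated_l[index:index] = list(to_insert)  # assigning to an empty slice inserts
    return updated_l


original = [1, 2, 5]
print(list_insert_list_sketch(original, [3, 4], 2))  # [1, 2, 3, 4, 5]
print(original)                                      # [1, 2, 5]
```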

pyllars.collection_utils.list_remove_list(l: Iterable, to_remove: Container) → List

Remove items in to_remove from l

Items are matched using the not in operator against to_remove. The result is built eagerly as a list rather than returned lazily.

Parameters:

l (typing.Iterable) – An iterable of items
to_remove (typing.Container) – The items to remove from l

Returns:

copy_of_l – A shallow copy of l without the items in to_remove

Return type:

typing.List
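The filtering can be sketched as a list comprehension with a not in test; the name list_remove_list_sketch is hypothetical. Passing a set as to_remove keeps each membership test constant time on average, while any container that supports in (such as a list) also works.

```python
from typing import Container, Iterable, List


def list_remove_list_sketch(l: Iterable, to_remove: Container) -> List:
    """Eagerly build a list of the items of l that are not in to_remove."""
    return [item for item in l if item not in to_remove]


print(list_remove_list_sketch(["a", "b", "c", "b"], {"b"}))  # ['a', 'c']
```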

pyllars.collection_utils.list_to_dict(l: Sequence, f: Optional[Callable] = None) → Dict

Convert the list to a dictionary in which keys and values are adjacent in the list. Optionally, a function f can be passed to apply to each value before adding it to the dictionary.

Parameters:

l (typing.Sequence) – The list of alternating keys and values
f (typing.Callable) – A function to apply to each value before inserting it into the dictionary. For example, float could be passed to convert each value to a float.

Returns:

d – The dictionary, defined as described above

Return type:

typing.Dict

Examples

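from pyllars.collection_utils import list_to_dict
l = ["key1", "value1", "key2", "value2"]
f = str.upper  # any single-argument callable
list_to_dict(l, f) == {"key1": f("value1"), "key2": f("value2")}  # True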
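The pairing of adjacent items can be sketched by slicing out the even-indexed keys and odd-indexed values and zipping them together. The name list_to_dict_sketch is hypothetical, and its handling of an odd-length list (zip silently drops the trailing key) is an assumption of this sketch; the documentation above does not say what pyllars does in that case.

```python
from typing import Callable, Dict, Optional, Sequence


def list_to_dict_sketch(l: Sequence, f: Optional[Callable] = None) -> Dict:
    """Pair even-indexed keys with the odd-indexed values that follow them."""
    keys, values = l[0::2], l[1::2]
    if f is not None:
        values = [f(v) for v in values]  # transform values only, never keys
    return dict(zip(keys, values))       # an unpaired trailing key is dropped


print(list_to_dict_sketch(["a", "1", "b", "2"], float))  # {'a': 1.0, 'b': 2.0}
```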

pyllars.collection_utils.merge_sets(*set_args) → Set

Given any number of sets, merge them into a single set

N.B. This function only performs a “shallow” merge. It does not handle nested containers within the “outer” sets.

Parameters:	set_args (typing.Iterable[typing.Container]) – The sets to merge
Returns:	merged_set – A single set containing unique elements from each of the input sets
Return type:	typing.Set
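A shallow merge corresponds to a set union. The sketch below (hypothetical name merge_sets_sketch) relies on set.union, which accepts any number of iterables; nested containers such as tuples are added as single elements rather than unpacked.

```python
from typing import Iterable, Set


def merge_sets_sketch(*set_args: Iterable) -> Set:
    """Shallow union of all arguments; nested containers are not unpacked."""
    return set().union(*set_args)


print(merge_sets_sketch({1, 2}, {2, 3}, [3, 4]) == {1, 2, 3, 4})  # True
```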

pyllars.collection_utils.remove_nones(l: Iterable, return_np_array: bool = False) → List

Remove Nones from l

Unlike one-line alternatives such as filter(None, l), which drops every falsy item, this function tests each item with is None. This avoids surprising behavior with data frames, lists of bools, etc.

This function returns a shallow copy and is not lazy.

N.B. Nested lists are not inspected. For example, a list of lists of None values is returned unchanged.

Parameters:

l (typing.Iterable) – The iterable
return_np_array (bool) – If True, the filtered list is wrapped in an np.array

Returns:

l_no_nones – A list or np.array with the Nones removed from l

Return type:

typing.List
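The difference between an identity test and a truthiness test is easiest to see side by side. The sketch below (hypothetical name remove_nones_sketch, without the return_np_array option) keeps falsy values such as 0, False and the empty string, which filter(None, ...) would discard.

```python
from typing import Iterable, List


def remove_nones_sketch(l: Iterable) -> List:
    """Drop only items that are None (identity test); keep False, 0, '', etc."""
    return [item for item in l if item is not None]


values = [0, None, False, "", None, 3]
print(remove_nones_sketch(values))  # [0, False, '', 3]
print(list(filter(None, values)))   # [3]  (truthiness also drops 0, False and '')
```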

pyllars.collection_utils.replace_none_with_empty_iter(i: Optional[Iterable]) → Iterable

Return an empty iterator if i is None. Otherwise, return i.

This function makes it cleaner to iterate over the results of functions that return either an iterator or None. It does not verify that i is actually an iterator.

Parameters:	i (None or typing.Iterable) – The possibly-empty iterator
Returns:	i – An empty list if i is None, or the original iterator otherwise
Return type:	typing.Iterable
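The intended usage pattern can be sketched as follows. Both replace_none_with_empty_iter_sketch and maybe_results are hypothetical names: the second stands in for any function that returns either results or None, and the point is that the calling code needs no separate None check.

```python
from typing import Iterable, List, Optional


def replace_none_with_empty_iter_sketch(i: Optional[Iterable]) -> Iterable:
    """Return [] for None, otherwise return i unchanged."""
    return [] if i is None else i


def maybe_results(found: bool) -> Optional[List[int]]:
    # hypothetical function that returns either a list of results or None
    return [1, 2] if found else None


print(sum(replace_none_with_empty_iter_sketch(maybe_results(True))))   # 3
print(sum(replace_none_with_empty_iter_sketch(maybe_results(False))))  # 0
```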

pyllars.collection_utils.reverse_dict(d: Mapping) → Dict

Create a new dictionary in which the keys and values of d are switched

In the case of duplicate values, it is arbitrary which will be retained.

Parameters:	d (typing.Mapping) – The mapping
Returns:	reversed_d – A dictionary in which the values of d now map to the keys
Return type:	typing.Dict
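Swapping keys and values can be sketched with a dictionary comprehension; the name reverse_dict_sketch is hypothetical, and the values of d must be hashable. In this sketch, when two keys share a value, the key encountered last wins because later entries overwrite earlier ones; the documentation above deliberately makes no such promise for pyllars.

```python
from typing import Dict, Mapping


def reverse_dict_sketch(d: Mapping) -> Dict:
    """Swap keys and values; values of d must be hashable."""
    return {value: key for key, value in d.items()}


print(reverse_dict_sketch({"a": 1, "b": 2}))  # {1: 'a', 2: 'b'}
print(reverse_dict_sketch({"a": 1, "b": 1}))  # {1: 'b'}
```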

pyllars.collection_utils.sort_dict_keys_by_value(d: Mapping) → List

Sort the keys in d by their value and return as a list

This function uses the builtin sorted, so the values must be mutually comparable (for example, all numbers or all strings).

Parameters:	d (typing.Mapping) – The dictionary
Returns:	sorted_keys – The keys sorted by the associated values
Return type:	typing.List
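Sorting keys by their values can be sketched by passing the mapping's own lookup as the key function to sorted. The name sort_dict_keys_by_value_sketch is hypothetical, and ascending order (the default for sorted) is an assumption of this sketch, since the documentation above does not state the direction.

```python
from typing import List, Mapping


def sort_dict_keys_by_value_sketch(d: Mapping) -> List:
    """Return the keys of d ordered by ascending value (via the builtin sorted)."""
    return sorted(d, key=d.__getitem__)


print(sort_dict_keys_by_value_sketch({"b": 2, "a": 3, "c": 1}))  # ['c', 'b', 'a']
```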

pyllars.collection_utils.wrap_in_list(maybe_sequence: Any) → Sequence

If maybe_sequence is not a sequence, then wrap it in a list

See pyllars.validation_utils.is_sequence() for more details about what counts as a sequence.

Parameters:	maybe_sequence (typing.Any) – An object which may be a sequence
Returns:	list – Either the original object, or maybe_sequence wrapped in a list, if it was not already a sequence
Return type:	typing.Sequence

pyllars.collection_utils.wrap_in_set(maybe_set: Optional[Any], wrap_string: bool = True) → Set

If maybe_set is not a set, then wrap it in a set.

Parameters:

maybe_set (typing.Optional[typing.Any]) – An object which may be a set
wrap_string (bool) – Whether to wrap maybe_set as a singleton if it is a string. Otherwise, the string is converted into a set of its individual characters.

Returns:

s – Either the original object, or maybe_set wrapped in a set, if it was not already a set. If maybe_set is None, then an empty set is returned.

Return type:

typing.Set

pyllars.collection_utils.wrap_string_in_list(maybe_string: Any) → Sequence

If maybe_string is a string, then wrap it in a list.

The motivation for this function is that some functions return either a single string or multiple strings as a list. The return value of this function can be iterated over safely.

This function will fail if maybe_string is neither a string nor a sequence.

Parameters:	maybe_string (typing.Any) – An object which may be a string
Returns:	l – Either the original object, or maybe_string wrapped in a list, if it was a string
Return type:	typing.Sequence