Collection utilities
This module implements helpers for working with collections. In some cases, the iterable is restricted to a particular type, such as a list or set.
For historical reasons, many of the function names mention specific data structures, such as “list” or “dict”. In most cases, these functions work with any instance of the more general type (such as Iterable or Mapping). See the documentation of each function for details.
Iterable helpers
apply_no_return(items, func, *args, …) | Apply func to each item in items
flatten_lists(list_of_lists) | Flatten a list of iterables into a single list
is_iterator_exhausted(iterator, return_element) | Check if the iterator is exhausted
list_insert_list(l, to_insert, index) | Insert to_insert into a shallow copy of l at position index.
list_remove_list(l, to_remove) | Remove items in to_remove from l
list_to_dict(l, f) | Convert the list to a dictionary in which keys and values are adjacent in the list.
remove_nones(l, return_np_array) | Remove Nones from l
replace_none_with_empty_iter(i) | Return an empty iterator if i is None.
wrap_in_list(maybe_sequence) | If maybe_sequence is not a sequence, then wrap it in a list
wrap_string_in_list(maybe_string) | If maybe_string is a string, then wrap it in a list.
Set helpers
wrap_in_set(maybe_set, wrap_string) | If maybe_set is not a set, then wrap it in a set.
get_set_pairwise_intersections(dict_of_sets, …) | Find the pairwise intersections among sets in dict_of_sets
merge_sets(*set_args) | Given any number of sets, merge them into a single set
Mapping helpers
reverse_dict(d) | Create a new dictionary in which the keys and values of d are switched
sort_dict_keys_by_value(d) | Sort the keys in d by their value and return as a list
Definitions
-
pyllars.collection_utils.apply_no_return(items: Iterable, func: Callable, *args, progress_bar: bool = False, total_items: Optional[int] = None, **kwargs) → None
Apply func to each item in items
Unlike map(), this function does not return anything.
Parameters: - items (typing.Iterable) – An iterable
- func (typing.Callable) – The function to apply to each item
- args – Positional arguments for func.
- kwargs – Keyword arguments to pass to func
- progress_bar (bool) – Whether to show a progress bar when waiting for results.
- total_items (int or None) – The number of items in items. If not given, len(items) is used. Presumably, this is intended for cases where items is a generator and len does not work.
Returns: None – If a return value is expected, use a list comprehension instead.
Return type: None
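The behaviour described above can be illustrated with a minimal sketch. This is not the pyllars implementation: the name apply_no_return_sketch is hypothetical, the progress bar is omitted, and passing each item as the first positional argument of func is an assumption.

```python
from typing import Any, Callable, Iterable


def apply_no_return_sketch(items: Iterable, func: Callable, *args: Any, **kwargs: Any) -> None:
    """Call func(item, *args, **kwargs) for each item and discard the results."""
    for item in items:
        # Assumption: the item is passed first, followed by the extra arguments
        func(item, *args, **kwargs)


seen = []
result = apply_no_return_sketch([1, 2, 3], lambda x, offset: seen.append(x + offset), 10)
# result is None; the work happens through the side effect on `seen`
```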
-
pyllars.collection_utils.flatten_lists(list_of_lists: Iterable) → List
Flatten a list of iterables into a single list
This function does not further flatten inner iterables.
Parameters: list_of_lists (typing.Iterable) – The iterable to flatten
Returns: flattened_list – The flattened list
Return type: typing.List
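To show what flattening a single level means, here is a small sketch using only the standard library. It uses itertools.chain rather than flatten_lists itself, so it illustrates the documented behaviour and is not the library code.

```python
from itertools import chain

nested = [[1, 2], (3, [4, 5]), "ab"]

# Only the outer level is flattened: the inner list [4, 5] is kept as is,
# and the string "ab" is iterated character by character.
flattened = list(chain.from_iterable(nested))
```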
-
pyllars.collection_utils.get_set_pairwise_intersections(dict_of_sets: Mapping[str, Set], return_intersections: bool = True) → pandas.core.frame.DataFrame
Find the pairwise intersections among sets in dict_of_sets
Parameters: - dict_of_sets (typing.Mapping[str,typing.Set]) – A mapping in which the keys are the “names” of the sets and the values are the actual sets
- return_intersections (bool) – Whether to include the actual set intersections in the return. If False, then only the intersection size will be included.
Returns: df_pairswise_intersections – A dataframe with the following columns:
- set1 : the name of one set in the pair
- set2 : the name of the second set in the pair
- len(set1) : the size of set1
- len(set2) : the size of set2
- len(intersection) : the size of the intersection
- coverage_small : the fraction of the smaller of set1 or set2 in the intersection
- coverage_large : the fraction of the larger of set1 or set2 in the intersection
- intersection : the intersection set. Only included if return_intersections is True.
Return type: pandas.core.frame.DataFrame
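The columns listed above can be computed by hand as in the following sketch. It builds plain dictionaries instead of a data frame, and it assumes that each unordered pair of sets appears exactly once; the library may enumerate pairs differently. Empty sets are not handled here, since the coverage fractions would divide by zero.

```python
from itertools import combinations

dict_of_sets = {"a": {1, 2, 3, 4}, "b": {3, 4}, "c": {4, 5, 6}}

rows = []
for (name1, s1), (name2, s2) in combinations(dict_of_sets.items(), 2):
    common = s1 & s2
    rows.append({
        "set1": name1,
        "set2": name2,
        "len(set1)": len(s1),
        "len(set2)": len(s2),
        "len(intersection)": len(common),
        # fraction of the smaller (larger) set that lies in the intersection
        "coverage_small": len(common) / min(len(s1), len(s2)),
        "coverage_large": len(common) / max(len(s1), len(s2)),
        "intersection": common,
    })
```

A list of rows like this can be passed to pandas.DataFrame to obtain a table with one row per pair.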
-
pyllars.collection_utils.is_iterator_exhausted(iterator: Iterable, return_element: bool = False) → Tuple[bool, object]
Check if the iterator is exhausted
N.B. THIS CONSUMES THE NEXT ELEMENT OF THE ITERATOR! Set return_element to True to have the consumed element returned so that it is not lost.
This method is adapted from this SO question: https://stackoverflow.com/questions/661603
Parameters: - iterator (typing.Iterable) – The iterator
- return_element (bool) – Whether to return the next element of the iterator
Returns: - is_exhausted (bool) – Whether the iterator is exhausted, that is, whether it had no next element
- [optional] next_element (object) – If return_element is True, then the consumed element is also returned.
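The check-by-consuming pattern can be sketched as follows. The helper name peek_exhausted is hypothetical, and the sketch assumes that the first returned value is True when nothing is left; it always returns the consumed element, which corresponds to return_element=True.

```python
from typing import Any, Iterator, Tuple


def peek_exhausted(iterator: Iterator) -> Tuple[bool, Any]:
    """Return (True, None) if exhausted, otherwise (False, consumed_element)."""
    try:
        return False, next(iterator)
    except StopIteration:
        return True, None


it = iter([1, 2])
first = peek_exhausted(it)   # consumes 1
second = peek_exhausted(it)  # consumes 2
third = peek_exhausted(it)   # nothing left to consume
```

Because each call consumes an element, a caller that still needs the element has to keep the returned value.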
-
pyllars.collection_utils.list_insert_list(l: Sequence, to_insert: Sequence, index: int) → List
Insert to_insert into a shallow copy of l at position index.
This function is adapted from: http://stackoverflow.com/questions/7376019/
Parameters: - l (typing.Sequence) – An iterable
- to_insert (typing.Sequence) – The items to insert
- index (int) – The location to begin the insertion
Returns: updated_l – A list with to_insert inserted into l at position index
Return type: typing.List
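One common way to get this effect in plain Python is slice assignment on a copy, sketched below. Whether pyllars uses exactly this approach is an assumption.

```python
l = [1, 2, 5]
to_insert = [3, 4]
index = 2

updated_l = list(l)                 # shallow copy, so l itself is not modified
updated_l[index:index] = to_insert  # assigning to an empty slice inserts the items
```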
-
pyllars.collection_utils.list_remove_list(l: Iterable, to_remove: Container) → List
Remove items in to_remove from l
Note that “not in” is used to match items in to_remove. Additionally, the return is not lazy.
Parameters: - l (typing.Iterable) – An iterable of items
- to_remove (typing.Container) – The set of items to remove from l
Returns: copy_of_l – A shallow copy of l without the items in to_remove.
Return type: typing.List
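The “not in” filtering described above amounts to a list comprehension like the one in this sketch, which is an illustration and not the library code.

```python
l = [1, 2, 3, 2, 4]
to_remove = {2, 4}  # any container works; a set keeps membership tests cheap

# Every occurrence of a matching item is dropped, and the result is built eagerly
copy_of_l = [item for item in l if item not in to_remove]
```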
-
pyllars.collection_utils.list_to_dict(l: Sequence, f: Optional[Callable] = None) → Dict
Convert the list to a dictionary in which keys and values are adjacent in the list. Optionally, a function f can be passed to apply to each value before adding it to the dictionary.
Parameters: - l (typing.Sequence) – The list of items
- f (typing.Callable) – A function to apply to each value before inserting it into the dictionary. For example, float could be passed to convert each value to a float.
Returns: d – The dictionary, defined as described above
Return type: typing.Dict
Examples
l = ["key1", "value1", "key2", "value2"]
list_to_dict(l, f) == {"key1": f("value1"), "key2": f("value2")}
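The same pairing can be reproduced with slicing and zip, as in this sketch. It uses float in the role of f and assumes the list has an even number of entries; the keys and values are made up for illustration.

```python
l = ["width", "1.5", "height", "2"]

keys = l[0::2]    # items at even positions are the keys
values = l[1::2]  # items at odd positions are the values

d = {key: float(value) for key, value in zip(keys, values)}
```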
-
pyllars.collection_utils.merge_sets(*set_args) → Set
Given any number of sets, merge them into a single set
N.B. This function only performs a “shallow” merge. It does not handle nested containers within the “outer” sets.
Parameters: set_args (typing.Iterable[typing.Container]) – The sets to merge
Returns: merged_set – A single set containing unique elements from each of the input sets
Return type: typing.Set
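A shallow merge of this kind corresponds to a set union, sketched here with the built-in set.union. The second example shows what “shallow” means: a nested frozenset is kept as a single element and is not unpacked.

```python
merged_set = set().union({1, 2}, {2, 3}, {3, 4})

# A nested container is treated as one element of the result
shallow = set().union({frozenset({1, 2})}, {3})
```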
-
pyllars.collection_utils.remove_nones(l: Iterable, return_np_array: bool = False) → List
Remove Nones from l
Compared to other single-function tests, this uses “is” and avoids strange behavior with data frames, lists of bools, etc.
This function returns a shallow copy and is not lazy.
N.B. This does not test nested lists. So, for example, a list of lists of None values would be unchanged by this function.
Parameters: - l (typing.Iterable) – The iterable
- return_np_array (bool) – If true, the filtered list will be wrapped in an np.array.
Returns: l_no_nones – A list or np.array with the Nones removed from l
Return type: typing.List
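The difference between an identity test and a truthiness test is easiest to see on a list with falsy values. The following sketch filters with “is not None” in plain Python; it does not call remove_nones itself.

```python
l = [0, None, False, "", [None], 3]

# Only the None object itself is removed. Falsy values such as 0, False
# and "" are kept, and the nested list [None] is left untouched.
l_no_nones = [item for item in l if item is not None]
```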
-
pyllars.collection_utils.replace_none_with_empty_iter(i: Optional[Iterable]) → Iterable
Return an empty iterator if i is None. Otherwise, return i.
The purpose of this function is to make iterating over results from functions which return either an iterator or None cleaner. This function does not verify that i is actually an iterator.
Parameters: i (None or typing.Iterable) – The iterator, or None
Returns: i – An empty list if i is None, or the original iterator otherwise
Return type: typing.Iterable
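The use case can be sketched as follows. The function maybe_results is a hypothetical stand-in for any function that returns either an iterable or None, and the inline conditional plays the role of replace_none_with_empty_iter.

```python
def maybe_results(found: bool):
    """Hypothetical helper: returns a list of results, or None if nothing was found."""
    return ["a", "b"] if found else None


collected = []
for found in (True, False):
    results = maybe_results(found)
    # Substituting an empty list for None lets the loop run without a separate check
    for r in (results if results is not None else []):
        collected.append(r)
```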
-
pyllars.collection_utils.reverse_dict(d: Mapping) → Dict
Create a new dictionary in which the keys and values of d are switched
In the case of duplicate values, it is arbitrary which will be retained.
Parameters: d (typing.Mapping) – The mapping
Returns: reversed_d – A dictionary in which the values of d now map to the keys
Return type: typing.Dict
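Switching keys and values can be written as a dictionary comprehension, sketched below. The values of d must be hashable, and the example includes a duplicate value to show that only one of the competing keys survives.

```python
d = {"a": 1, "b": 2, "c": 1}

# Both "a" and "c" map to 1, so only one of them can be kept as the value for key 1
reversed_d = {value: key for key, value in d.items()}
```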
-
pyllars.collection_utils.sort_dict_keys_by_value(d: Mapping) → List
Sort the keys in d by their value and return as a list
This function uses sorted, so the values should be able to be sorted appropriately by that builtin function.
Parameters: d (typing.Mapping) – The dictionary
Returns: sorted_keys – The keys sorted by the associated values
Return type: typing.List
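With the builtin sorted, ordering keys by their values is a one-liner. This sketch shows the idea on a small dictionary; ascending order is the default of sorted, and whether the library exposes other orderings is not stated above.

```python
d = {"a": 3, "b": 1, "c": 2}

# Iterating over d yields its keys; key=d.get sorts them by the associated value
sorted_keys = sorted(d, key=d.get)
```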
-
pyllars.collection_utils.wrap_in_list(maybe_sequence: Any) → Sequence
If maybe_sequence is not a sequence, then wrap it in a list
See pyllars.validation_utils.is_sequence() for more details about what counts as a sequence.
Parameters: maybe_sequence (typing.Any) – An object which may be a sequence
Returns: list – Either the original object, or maybe_sequence wrapped in a list, if it was not already a sequence
Return type: typing.Sequence
-
pyllars.collection_utils.wrap_in_set(maybe_set: Optional[Any], wrap_string: bool = True) → Set
If maybe_set is not a set, then wrap it in a set.
Parameters: - maybe_set (typing.Optional[typing.Any]) – An object which may be a set
- wrap_string (bool) – Whether to wrap maybe_set as a singleton if it is a string. Otherwise, the string will be converted into a set of individual characters.
Returns: s – Either the original object, or maybe_set wrapped in a set, if it was not already a set. If maybe_set was None, then an empty set is returned.
Return type: typing.Set
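The documented cases (None, strings, and existing sets) can be sketched as below. The name wrap_in_set_sketch is hypothetical and this is not the pyllars implementation. How the library treats other non-set values is not stated above; wrapping them as singletons, which requires them to be hashable, is an assumption of this sketch.

```python
from typing import Any, Optional, Set


def wrap_in_set_sketch(maybe_set: Optional[Any], wrap_string: bool = True) -> Set:
    if maybe_set is None:
        return set()           # None becomes an empty set
    if isinstance(maybe_set, set):
        return maybe_set       # sets are returned unchanged
    if isinstance(maybe_set, str) and not wrap_string:
        return set(maybe_set)  # the string is split into individual characters
    return {maybe_set}         # assumption: everything else becomes a singleton


existing = {1, 2}
wrapped_string = wrap_in_set_sketch("ab")
split_string = wrap_in_set_sketch("ab", wrap_string=False)
```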
-
pyllars.collection_utils.wrap_string_in_list(maybe_string: Any) → Sequence
If maybe_string is a string, then wrap it in a list.
The motivation for this function is that some functions return either a single string or multiple strings as a list. The return value of this function can be iterated over safely.
This function will fail if maybe_string is neither a string nor a sequence.
Parameters: maybe_string (typing.Any) – An object which may be a string
Returns: l – Either the original object, or maybe_string wrapped in a list, if it was a string
Return type: typing.Sequence
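The motivation above can be illustrated with a short sketch. The helper wrap_string_in_list_sketch is hypothetical and, unlike the documented function, it does not check that non-string input is a sequence.

```python
def wrap_string_in_list_sketch(maybe_string):
    """Wrap a single string in a list; return anything else unchanged."""
    return [maybe_string] if isinstance(maybe_string, str) else maybe_string


names = []
for value in ("alice", ["bob", "carol"]):
    # Without the wrapping, iterating over "alice" would yield single characters
    for name in wrap_string_in_list_sketch(value):
        names.append(name)
```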