ablkit.utils

class ablkit.utils.ABLLogger(name: str, logger_name='abl', log_file: str | None = None, log_level: int | str = 'INFO', file_mode: str = 'w')[source]

Bases: Logger, ManagerMixin

Formatted logger used to record messages with different log levels and features.

ABLLogger provides a formatted logger that can log messages with different log levels. It allows the creation of logger instances in a similar manner to ManagerMixin. The logger has features like distributed log storage and colored terminal output for different log levels.

Parameters:

name (str) – Global instance name.
logger_name (str, optional) – name attribute of logging.Logger instance. Defaults to ‘abl’.
log_file (str, optional) – The log filename. If specified, a FileHandler will be added to the logger. Defaults to None.
log_level (Union[int, str], optional) – The log level of the handler. Defaults to ‘INFO’. If log level is ‘DEBUG’, distributed logs will be saved during distributed training.
file_mode (str, optional) – The file mode used to open log file. Defaults to ‘w’.

Notes

The name of the logger and the instance_name of ABLLogger could be different. ABLLogger instances are retrieved using ABLLogger.get_instance, not logging.getLogger. This ensures ABLLogger is not influenced by third-party logging configurations.
Unlike logging.Logger, which falls back to a last-resort handler, ABLLogger does not emit warning or error messages when no handler is attached.

Examples

>>> logger = ABLLogger.get_instance(name='ABLLogger', logger_name='Logger')
>>> # Although logger has a name attribute like ``logging.Logger``
>>> # We cannot get logger instance by ``logging.getLogger``.
>>> assert logger.name == 'Logger'
>>> assert logger.instance_name == 'ABLLogger'
>>> assert id(logger) != id(logging.getLogger('Logger'))
>>> # Get a logger that does not store logs.
>>> logger1 = ABLLogger.get_instance('logger1')
>>> # Get a logger that saves only rank 0 logs.
>>> logger2 = ABLLogger.get_instance('logger2', log_file='out.log')
>>> # Get a logger that saves logs from multiple ranks.
>>> logger3 = ABLLogger.get_instance('logger3', log_file='out.log', distributed=True)
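
The note on instance retrieval can be illustrated with the standard library alone. The following is a minimal sketch, not ABLLogger's implementation: SketchLogger and its _instances registry are hypothetical names standing in for the ManagerMixin-style lookup.

```python
import logging


class SketchLogger(logging.Logger):
    """Hypothetical logger kept in its own registry, outside logging's manager."""

    _instances = {}  # instance name -> logger (assumed registry layout)

    def __init__(self, name, logger_name="abl"):
        super().__init__(logger_name)
        self.instance_name = name

    @classmethod
    def get_instance(cls, name, **kwargs):
        # Create the logger on first request, then return the same object.
        if name not in cls._instances:
            cls._instances[name] = cls(name, **kwargs)
        return cls._instances[name]


logger = SketchLogger.get_instance("ABLLogger", logger_name="Logger")
same = SketchLogger.get_instance("ABLLogger")
# logging.getLogger consults logging's own manager, which never saw `logger`.
std = logging.getLogger("Logger")
```

Because the object is constructed directly rather than through logging.getLogger, configuration applied to the standard logger named 'Logger' does not reach it.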

callHandlers(record: LogRecord) → None[source]

Pass a record to all relevant handlers.

Override the callHandlers method in logging.Logger to avoid multiple warning messages in DDP mode. This method loops through all handlers of the logger instance and its parents in the logger hierarchy.

Parameters: record (LogRecord) – A LogRecord instance containing the logged message.

classmethod get_current_instance() → ABLLogger[source]

Get the latest created ABLLogger instance.

Returns: The latest created ABLLogger instance. If no instance has been created, returns a logger with the instance name “abl”.
Return type: ABLLogger

property log_dir

Get the directory where the log is stored.

Returns: Directory where the log is stored.
Return type: str

property log_file

Get the file path of the log.

Returns: Path of the log.
Return type: str

setLevel(level)[source]

Set the logging level of this logger.

Override the setLevel method to clear caches of all ABLLogger instances managed by ManagerMixin. The level must be an int or a str.

Parameters: level (Union[int, str]) – The logging level to set.

class ablkit.utils.Cache(func: Callable[[K], T])[source]

Bases: Generic[K, T]

A generic caching mechanism that stores the results of a function call and retrieves them to avoid repeated calculations.

This class implements a dictionary-based cache with a circular doubly linked list to manage the cache entries efficiently. It is designed to be generic, allowing for caching of any callable function.

Parameters: func (Callable[[K], T]) – The function to be cached. This function takes an argument of type K and returns a value of type T.
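
As an illustration of the dictionary-based part of this design, here is a minimal sketch. It is not the Cache implementation: DictCacheSketch, get, and clear are hypothetical names, and the circular doubly linked list used to manage entries is omitted, so this version never evicts anything.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
T = TypeVar("T")


class DictCacheSketch(Generic[K, T]):
    """Hypothetical unbounded cache around a single-argument function."""

    def __init__(self, func: Callable[[K], T]) -> None:
        self.func = func
        self.store: Dict[K, T] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> T:
        # Return the stored result when the key has been seen before.
        if key in self.store:
            self.hits += 1
            return self.store[key]
        # Otherwise compute the result once and remember it.
        self.misses += 1
        value = self.func(key)
        self.store[key] = value
        return value

    def clear(self) -> None:
        # Drop every stored result.
        self.store.clear()


square_cache = DictCacheSketch(lambda n: n * n)
first = square_cache.get(12)   # computed by the wrapped function
second = square_cache.get(12)  # served from the dictionary
```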

clear_cache()[source]

Invalidate the entire cache.

get_from_dict(obj, *args) → T[source]

Retrieve a value from the cache or compute it using self.func.

Parameters:

obj (Any) – The object to which the cached method/function belongs.
*args (Any) – Arguments used in key generation for cache retrieval or function computation.

Returns:

The value from the cache or computed by the function.

Return type:

T

init_cache(obj)[source]

Initialize the cache settings.

Parameters: obj (Any) – The object containing settings for cache initialization.

ablkit.utils.abl_cache()[source]

Decorator to enable caching for a function.

Returns: The wrapped function with caching capability.
Return type: Callable

ablkit.utils.avg_confidence_dist(pred_prob: ndarray, candidates_idxs: List[List[Any]]) → ndarray[source]

Compute the average confidence distance between prediction probabilities and candidates, where the confidence distance is defined as 1 - the average of prediction probabilities.

Parameters:

pred_prob (np.ndarray) – Prediction probability distributions, each element is an array representing the probability distribution of a particular prediction.
candidates_idxs (List[List[Any]]) – Multiple possible candidates’ indices.

Returns:

Confidence distances computed for each candidate.

Return type:

np.ndarray
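
The definition can be written out in plain Python. This is a sketch under assumptions, not the library code: avg_confidence_dist_sketch is a hypothetical name, it returns a list rather than an np.ndarray, and it assumes candidate[pos] is the class index chosen for the symbol at position pos.

```python
from typing import List, Sequence


def avg_confidence_dist_sketch(
    pred_prob: Sequence[Sequence[float]],
    candidates_idxs: Sequence[Sequence[int]],
) -> List[float]:
    """Return 1 - mean(selected probabilities) for each candidate."""
    costs = []
    for candidate in candidates_idxs:
        # Probability the model gives to the class this candidate picks
        # at each symbol position.
        picked = [pred_prob[pos][label] for pos, label in enumerate(candidate)]
        costs.append(1.0 - sum(picked) / len(picked))
    return costs


pred_prob = [[0.9, 0.1], [0.2, 0.8]]
candidates = [[0, 1], [1, 0]]
avg_costs = avg_confidence_dist_sketch(pred_prob, candidates)
```

In this example the first candidate agrees with the most probable class at both positions, so it receives the smaller distance.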

ablkit.utils.confidence_dist(pred_prob: ndarray, candidates_idxs: List[List[Any]]) → ndarray[source]

Compute the confidence distance between prediction probabilities and candidates, where the confidence distance is defined as 1 - the product of prediction probabilities.

Parameters:

pred_prob (np.ndarray) – Prediction probability distributions, each element is an array representing the probability distribution of a particular prediction.
candidates_idxs (List[List[Any]]) – Multiple possible candidates’ indices.

Returns:

Confidence distances computed for each candidate.

Return type:

np.ndarray
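
A plain-Python sketch of the product form follows. It rests on the same kind of assumptions as any illustration here: confidence_dist_sketch is a hypothetical name, the result is a list rather than an np.ndarray, and candidate[pos] is taken to be the class index chosen for the symbol at position pos.

```python
import math
from typing import List, Sequence


def confidence_dist_sketch(
    pred_prob: Sequence[Sequence[float]],
    candidates_idxs: Sequence[Sequence[int]],
) -> List[float]:
    """Return 1 - prod(selected probabilities) for each candidate."""
    costs = []
    for candidate in candidates_idxs:
        picked = [pred_prob[pos][label] for pos, label in enumerate(candidate)]
        # A single low probability pulls the whole product towards zero.
        costs.append(1.0 - math.prod(picked))
    return costs


pred_prob = [[0.9, 0.1], [0.2, 0.8]]
candidates = [[0, 1], [1, 0]]
prod_costs = confidence_dist_sketch(pred_prob, candidates)
```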

ablkit.utils.flatten(nested_list: List[Any | List[Any] | Tuple[Any, ...]]) → List[Any][source]

Flattens a nested list at the first level.

Parameters: nested_list (List[Union[Any, List[Any], Tuple[Any, ...]]]) – A list which might contain sublists or tuples at the first level.
Returns: A flattened version of the input list, where only the first level of sublists and tuples are reduced.
Return type: List[Any]
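
To make "first level only" concrete, here is a minimal sketch of the described behavior. flatten_sketch is a hypothetical stand-in, not the library function.

```python
from typing import Any, List


def flatten_sketch(nested_list: List[Any]) -> List[Any]:
    """Unpack sublists and tuples one level deep; keep other items as-is."""
    flattened = []
    for item in nested_list:
        if isinstance(item, (list, tuple)):
            # Only this level is unpacked; deeper nesting is preserved.
            flattened.extend(item)
        else:
            flattened.append(item)
    return flattened


flat = flatten_sketch([1, [2, 3], (4, [5])])
```

The inner list [5] survives because it sits at the second level of nesting.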

ablkit.utils.hamming_dist(pred_pseudo_label: List[Any], candidates: List[List[Any]]) → ndarray[source]

Compute the Hamming distance between the predicted pseudo-labels and each candidate.

Parameters:

pred_pseudo_label (List[Any]) – Pseudo-labels of an example.
candidates (List[List[Any]]) – Multiple possible candidates.

Returns:

Hamming distances computed for each candidate.

Return type:

np.ndarray
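
The Hamming distance here is the number of positions at which a candidate differs from the predicted pseudo-labels. A minimal plain-Python sketch follows; hamming_dist_sketch is a hypothetical name and it returns a list rather than an np.ndarray.

```python
from typing import Any, List, Sequence


def hamming_dist_sketch(
    pred_pseudo_label: Sequence[Any],
    candidates: Sequence[Sequence[Any]],
) -> List[int]:
    """Count mismatching positions between the prediction and each candidate."""
    return [
        sum(pred != cand for pred, cand in zip(pred_pseudo_label, candidate))
        for candidate in candidates
    ]


distances = hamming_dist_sketch([1, 2, 3], [[1, 2, 3], [1, 0, 3], [3, 2, 1]])
```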

ablkit.utils.print_log(msg, logger: Logger | str | None = None, level: int | None = 20) → None[source]

Print a log message using the specified logger or a default method.

This function logs a message with a given logger, if provided, or prints it using the standard print function. It supports special logger types such as ‘silent’ and ‘current’.

Parameters:

msg (str) – The message to be logged.
logger (Union[Logger, str], optional) – The logger to use for logging the message. It can be a logging.Logger instance, a string specifying the logger name, ‘silent’, ‘current’, or None.
- ‘silent’: No message will be printed.
- ‘current’: Use the latest created logger to log the message.
- other str: The instance name of the logger. A ValueError is raised if the logger has not been created.
- None: The print() method is used for logging.
level (int, optional) – The logging level. This is only applicable when logger is a Logger object, ‘current’, or a named logger instance. The default is logging.INFO.
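
The dispatch on the logger argument can be sketched with the standard library. This is an illustration only: print_log_sketch and ListHandler are hypothetical names, the ‘current’ and named-instance branches are left out because they depend on the ABLLogger registry, and raising TypeError for unsupported values is a choice made for this sketch.

```python
import logging


def print_log_sketch(msg, logger=None, level=logging.INFO):
    """Route a message to print(), a Logger, or nowhere ('silent')."""
    if logger is None:
        print(msg)
    elif isinstance(logger, logging.Logger):
        logger.log(level, msg)
    elif logger == "silent":
        pass  # deliberately drop the message
    else:
        raise TypeError(f"unsupported logger argument: {logger!r}")


class ListHandler(logging.Handler):
    """Collect (level, message) pairs so the routing can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelno, record.getMessage()))


demo_logger = logging.getLogger("print_log_sketch_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
handler = ListHandler()
demo_logger.addHandler(handler)

print_log_sketch("epoch finished", logger=demo_logger)
print_log_sketch("careful", logger=demo_logger, level=logging.WARNING)
print_log_sketch("never shown", logger="silent")
```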

ablkit.utils.reform_list(flattened_list: List[Any], structured_list: List[Any | List[Any] | Tuple[Any, ...]]) → List[List[Any]][source]

Reform the list based on the structure of structured_list.

Parameters:

flattened_list (List[Any]) – A flattened list of elements.
structured_list (List[Union[Any, List[Any], Tuple[Any, ...]]]) – A list that reflects the desired structure, which may contain sublists or tuples.

Returns:

A reformed list that mimics the structure of structured_list.

Return type:

List[List[Any]]
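
One way to read this is that the flat list is cut into consecutive slices whose lengths match the groups in structured_list. The sketch below assumes exactly that; reform_list_sketch is a hypothetical name, and returning the flat list unchanged when the structure is not nested is an assumption of this sketch.

```python
from typing import Any, List


def reform_list_sketch(flattened_list: List[Any], structured_list: List[Any]) -> List[Any]:
    """Regroup flattened_list into slices shaped like structured_list."""
    # Assumption: a structure without sublists or tuples needs no regrouping.
    if not structured_list or not isinstance(structured_list[0], (list, tuple)):
        return flattened_list
    reformed = []
    start = 0
    for group in structured_list:
        end = start + len(group)
        reformed.append(flattened_list[start:end])
        start = end
    return reformed


reformed = reform_list_sketch(["a", "b", "c"], [[1, 2], [3]])
```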

ablkit.utils.rejection_dist(pred_prob: ndarray, candidates_idxs: List[List[Any]], alpha: float = 0.5) → ndarray[source]

Compute a rejection-aware cost that combines model confidence with candidate complexity. Each candidate’s cost is a convex combination of the standard confidence distance and a normalized length term, so longer (more complex) candidates are penalized.

Parameters:

pred_prob (np.ndarray) – Prediction probability distributions for the symbols in a single data example.
candidates_idxs (List[List[Any]]) – Candidate label assignments.
alpha (float, optional) – Weight in [0, 1] for the complexity term. Defaults to 0.5.

Returns:

Cost for each candidate.

Return type:

np.ndarray
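
The convex combination can be sketched as follows. Several details are assumptions, because the description above does not fix them: the confidence term is taken to be 1 minus the product of the selected probabilities, the length term is the candidate length divided by the longest candidate length, and the two are mixed as (1 - alpha) * confidence + alpha * length. rejection_dist_sketch is a hypothetical name and returns a list rather than an np.ndarray.

```python
import math
from typing import List, Sequence


def rejection_dist_sketch(
    pred_prob: Sequence[Sequence[float]],
    candidates_idxs: Sequence[Sequence[int]],
    alpha: float = 0.5,
) -> List[float]:
    """Mix a confidence distance with an (assumed) normalized length penalty."""
    # Assumed normalization: divide by the longest candidate.
    max_len = max(len(candidate) for candidate in candidates_idxs)
    costs = []
    for candidate in candidates_idxs:
        picked = [pred_prob[pos][label] for pos, label in enumerate(candidate)]
        confidence_term = 1.0 - math.prod(picked)
        length_term = len(candidate) / max_len
        costs.append((1.0 - alpha) * confidence_term + alpha * length_term)
    return costs


pred_prob = [[0.9, 0.1], [0.2, 0.8]]
rejection_costs = rejection_dist_sketch(pred_prob, [[0, 1], [0]], alpha=0.5)
```

With alpha = 0 the length term disappears and only the confidence distance remains. With alpha = 1 only the length penalty counts.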

ablkit.utils.similarity_dist(pred_embeddings: ndarray, candidates_idxs: List[List[Any]]) → ndarray[source]

Compute a similarity-based cost for each candidate label assignment.

For each candidate, the cost is the average cosine similarity between symbol pairs assigned different labels minus the average between pairs assigned the same label. Lower values mean the candidate’s labeling is more consistent with the embedding geometry.

Parameters:

pred_embeddings (np.ndarray) – Embedding matrix for the symbols in a single data example, with shape (num_symbols, embedding_dim).
candidates_idxs (List[List[Any]]) – Candidate label assignments, each of length num_symbols.

Returns:

Cost for each candidate.

Return type:

np.ndarray
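
The cost described above can be written out in plain Python. This is a sketch under assumptions: similarity_dist_sketch and cosine are hypothetical names, the result is a list rather than an np.ndarray, and an average over an empty set of pairs is treated as 0.0, which the description does not specify.

```python
import math
from itertools import combinations
from typing import Any, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_dist_sketch(
    pred_embeddings: Sequence[Sequence[float]],
    candidates_idxs: Sequence[Sequence[Any]],
) -> List[float]:
    """Mean similarity of differently labeled pairs minus that of same-label pairs."""
    pairs = list(combinations(range(len(pred_embeddings)), 2))
    sims = {(i, j): cosine(pred_embeddings[i], pred_embeddings[j]) for i, j in pairs}
    costs = []
    for candidate in candidates_idxs:
        same = [sims[(i, j)] for i, j in pairs if candidate[i] == candidate[j]]
        diff = [sims[(i, j)] for i, j in pairs if candidate[i] != candidate[j]]
        # Assumption: an empty group contributes an average of 0.0.
        mean_same = sum(same) / len(same) if same else 0.0
        mean_diff = sum(diff) / len(diff) if diff else 0.0
        costs.append(mean_diff - mean_same)
    return costs


embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sim_costs = similarity_dist_sketch(embeddings, [[0, 0, 1], [0, 1, 0]])
```

The first candidate groups the two identical embeddings under one label and so gets the lower cost.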

ablkit.utils.tab_data_to_tuple(X: List[Any] | Any, y: List[Any] | Any, reasoning_result: Any | None = 0) → Tuple[List[List[Any]], List[List[Any]], List[Any]][source]

Convert tabular data to a tuple by adding a dimension to each element of X and y. The tuple contains three elements: data, label, and reasoning result. If X is None, return None.

Parameters:

X (Union[List[Any], Any]) – The data.
y (Union[List[Any], Any]) – The label.
reasoning_result (Any, optional) – The reasoning result. Defaults to 0.

Returns:

A tuple of (data, label, reasoning_result).

Return type:

Tuple[List[List[Any]], List[List[Any]], List[Any]]
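
The added dimension can be shown with a short sketch. tab_data_to_tuple_sketch is a hypothetical name, and repeating reasoning_result once per example is an assumption drawn from the List[Any] return type rather than something stated above.

```python
from typing import Any, List, Optional, Tuple


def tab_data_to_tuple_sketch(
    X: Optional[List[Any]], y: List[Any], reasoning_result: Any = 0
) -> Optional[Tuple[List[List[Any]], List[List[Any]], List[Any]]]:
    """Wrap each row and each label in its own single-element list."""
    if X is None:
        return None
    data = [[row] for row in X]
    label = [[item] for item in y]
    # Assumption: one copy of the reasoning result per example.
    results = [reasoning_result] * len(X)
    return data, label, results


converted = tab_data_to_tuple_sketch([[0.1, 0.2], [0.3, 0.4]], [0, 1])
```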

ablkit.utils.to_hashable(x: List[Any] | Any) → Tuple[Any, ...] | Any[source]

Convert a nested list to a nested tuple so it is hashable.

Parameters: x (Union[List[Any], Any]) – A potentially nested list to convert to a tuple.
Returns: The input converted to a tuple if it was a list, otherwise the original input.
Return type: Union[Tuple[Any, ...], Any]
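
A recursive conversion of this kind can be sketched as follows. to_hashable_sketch is a hypothetical stand-in, not the library function.

```python
from typing import Any


def to_hashable_sketch(x: Any) -> Any:
    """Turn lists into tuples at every nesting level; leave other values alone."""
    if isinstance(x, list):
        return tuple(to_hashable_sketch(item) for item in x)
    return x


key = to_hashable_sketch([1, [2, [3]]])
# The converted value can now serve as a dictionary key.
lookup = {key: "cached result"}
```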