ablkit.utils
- class ablkit.utils.ABLLogger(name: str, logger_name='abl', log_file: str | None = None, log_level: int | str = 'INFO', file_mode: str = 'w')
Bases: Logger, ManagerMixin
Formatted logger used to record messages with different log levels and features.
ABLLogger provides a formatted logger that can log messages with different log levels. Instances are created in a similar manner to ManagerMixin. The logger supports distributed log storage and colored terminal output for different log levels.
- Parameters:
name (str) – Global instance name.
logger_name (str, optional) – The name attribute of the logging.Logger instance. Defaults to ‘abl’.
log_file (str, optional) – The log filename. If specified, a FileHandler will be added to the logger. Defaults to None.
log_level (Union[int, str], optional) – The log level of the handler. Defaults to ‘INFO’. If the log level is ‘DEBUG’, distributed logs will be saved during distributed training.
file_mode (str, optional) – The file mode used to open log file. Defaults to ‘w’.
Notes
The name of the logger and the instance_name of ABLLogger can differ. ABLLogger instances are retrieved using ABLLogger.get_instance, not logging.getLogger. This ensures ABLLogger is not influenced by third-party logging configurations.
Unlike logging.Logger, ABLLogger will not log warning or error messages without a Handler.
Examples
>>> import logging
>>> from ablkit.utils import ABLLogger
>>> logger = ABLLogger.get_instance(name='ABLLogger', logger_name='Logger')
>>> # Although the logger has a name attribute like ``logging.Logger``,
>>> # the instance cannot be retrieved with ``logging.getLogger``.
>>> assert logger.name == 'Logger'
>>> assert logger.instance_name == 'ABLLogger'
>>> assert id(logger) != id(logging.getLogger('Logger'))
>>> # Get a logger that does not store logs.
>>> logger1 = ABLLogger.get_instance('logger1')
>>> # Get a logger that only saves rank0 logs.
>>> logger2 = ABLLogger.get_instance('logger2', log_file='out.log')
>>> # Get a logger that also saves logs from multiple ranks
>>> # (the 'DEBUG' level enables distributed logs, see log_level above).
>>> logger3 = ABLLogger.get_instance('logger3', log_file='out.log', log_level='DEBUG')
- callHandlers(record: LogRecord) -> None
Pass a record to all relevant handlers.
Override the callHandlers method in logging.Logger to avoid multiple warning messages in DDP mode. This method loops through all handlers of the logger instance and its parents in the logger hierarchy.
- Parameters:
record (LogRecord) – A LogRecord instance containing the logged message.
- classmethod get_current_instance() -> ABLLogger
Get the most recently created ABLLogger instance.
- Returns:
The most recently created ABLLogger instance. If no instance has been created, returns a logger with the instance name “abl”.
- Return type:
ABLLogger
- property log_dir
Get the directory where the log is stored.
- Returns:
Directory where the log is stored.
- Return type:
str
- property log_file
Get the file path of the log.
- Returns:
Path of the log.
- Return type:
str
- class ablkit.utils.Cache(func: Callable[[K], T])
Bases: Generic[K, T]
A generic caching mechanism that stores the results of a function call and retrieves them to avoid repeated calculations.
This class implements a dictionary-based cache with a circular doubly linked list to manage the cache entries efficiently. It is designed to be generic, allowing for caching of any callable function.
- Parameters:
func (Callable[[K], T]) – The function to be cached. This function takes an argument of type K and returns a value of type T.
- get_from_dict(obj, *args) -> T
Retrieve a value from the cache or compute it using self.func.
- Parameters:
obj (Any) – The object to which the cached method/function belongs.
*args (Any) – Arguments used in key generation for cache retrieval or function computation.
- Returns:
The value from the cache or computed by the function.
- Return type:
T
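The caching idea described above can be illustrated with a minimal, dependency-free sketch. It is not ablkit's Cache: it swaps the circular doubly linked list for collections.OrderedDict, and the names LRUCacheSketch, max_size, get, and slow_square are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
T = TypeVar("T")


class LRUCacheSketch(Generic[K, T]):
    """Hypothetical stand-in: dict-backed cache with least-recently-used eviction."""

    def __init__(self, func: Callable[[K], T], max_size: int = 2) -> None:
        self.func = func
        self.max_size = max_size
        self.store: "OrderedDict[K, T]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> T:
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        value = self.func(key)  # compute only on a cache miss
        self.store[key] = value
        if len(self.store) > self.max_size:
            self.store.popitem(last=False)  # evict the least recently used entry
        return value


calls = []  # records every real invocation of the wrapped function


def slow_square(x: int) -> int:
    calls.append(x)
    return x * x


cache = LRUCacheSketch(slow_square, max_size=2)
results = [cache.get(3), cache.get(3), cache.get(4), cache.get(5), cache.get(3)]
```

The second lookup of 3 is served from the cache. The last one is recomputed because inserting 5 evicted 3 from the two-entry store.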
- ablkit.utils.abl_cache()
Decorator to enable caching for a function.
- Returns:
The wrapped function with caching capability.
- Return type:
Callable
- ablkit.utils.avg_confidence_dist(pred_prob: ndarray, candidates_idxs: List[List[Any]]) -> ndarray
Compute the average confidence distance between prediction probabilities and candidates, where the confidence distance is defined as 1 - the average of prediction probabilities.
- Parameters:
pred_prob (np.ndarray) – Prediction probability distributions, each element is an array representing the probability distribution of a particular prediction.
candidates_idxs (List[List[Any]]) – Multiple possible candidates’ indices.
- Returns:
Confidence distances computed for each candidate.
- Return type:
np.ndarray
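As a rough illustration of the definition above, the following sketch computes 1 minus the mean of the probabilities that each candidate selects. It is not the library's implementation: plain Python lists stand in for NumPy arrays, and the name avg_confidence_dist_sketch is hypothetical.

```python
from typing import List, Sequence


def avg_confidence_dist_sketch(
    pred_prob: Sequence[Sequence[float]], candidates_idxs: List[List[int]]
) -> List[float]:
    """Return 1 - mean(selected probabilities) for every candidate."""
    costs = []
    for cand in candidates_idxs:
        # Probability the model assigns to the candidate's label at each position.
        probs = [pred_prob[pos][idx] for pos, idx in enumerate(cand)]
        costs.append(1.0 - sum(probs) / len(probs))
    return costs


pred_prob = [[0.5, 0.5], [0.25, 0.75]]  # two symbols, two classes each
candidates = [[0, 1], [1, 0]]
avg_costs = avg_confidence_dist_sketch(pred_prob, candidates)
```

The candidate that agrees with the more confident predictions receives the smaller distance.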
- ablkit.utils.confidence_dist(pred_prob: ndarray, candidates_idxs: List[List[Any]]) -> ndarray
Compute the confidence distance between prediction probabilities and candidates, where the confidence distance is defined as 1 - the product of prediction probabilities.
- Parameters:
pred_prob (np.ndarray) – Prediction probability distributions, each element is an array representing the probability distribution of a particular prediction.
candidates_idxs (List[List[Any]]) – Multiple possible candidates’ indices.
- Returns:
Confidence distances computed for each candidate.
- Return type:
np.ndarray
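The product-based definition above can be sketched in the same way. This is an illustration under stated assumptions rather than ablkit's code: plain lists replace NumPy arrays, and confidence_dist_sketch is a hypothetical name.

```python
import math
from typing import List, Sequence


def confidence_dist_sketch(
    pred_prob: Sequence[Sequence[float]], candidates_idxs: List[List[int]]
) -> List[float]:
    """Return 1 - product(selected probabilities) for every candidate."""
    costs = []
    for cand in candidates_idxs:
        # Multiply the probabilities of the candidate's label at each position.
        joint = math.prod(pred_prob[pos][idx] for pos, idx in enumerate(cand))
        costs.append(1.0 - joint)
    return costs


pred_prob = [[0.5, 0.5], [0.25, 0.75]]
candidates = [[0, 1], [1, 0]]
conf_costs = confidence_dist_sketch(pred_prob, candidates)
```

Because probabilities are multiplied, a single low-confidence position raises the distance more sharply than it does in the averaged variant.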
- ablkit.utils.flatten(nested_list: List[Any | List[Any] | Tuple[Any, ...]]) -> List[Any]
Flattens a nested list at the first level.
- Parameters:
nested_list (List[Union[Any, List[Any], Tuple[Any, ...]]]) – A list which might contain sublists or tuples at the first level.
- Returns:
A flattened version of the input list, where only the first level of sublists and tuples are reduced.
- Return type:
List[Any]
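A minimal sketch of first-level flattening, written from the description above rather than taken from the library. The name flatten_sketch is hypothetical.

```python
from typing import Any, List


def flatten_sketch(nested_list: List[Any]) -> List[Any]:
    """Expand sublists and tuples found at the first level only."""
    flat: List[Any] = []
    for item in nested_list:
        if isinstance(item, (list, tuple)):
            flat.extend(item)  # unpack one level; deeper nesting is kept as-is
        else:
            flat.append(item)
    return flat


flattened = flatten_sketch([1, [2, 3], (4, [5, 6])])
```

The inner list [5, 6] survives because only the first level is reduced.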
- ablkit.utils.hamming_dist(pred_pseudo_label: List[Any], candidates: List[List[Any]]) -> ndarray
Compute the Hamming distance between the predicted pseudo-labels of an example and each candidate.
- Parameters:
pred_pseudo_label (List[Any]) – Pseudo-labels of an example.
candidates (List[List[Any]]) – Multiple possible candidates.
- Returns:
Hamming distances computed for each candidate.
- Return type:
np.ndarray
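The Hamming distance here counts the positions at which a candidate disagrees with the predicted pseudo-labels. The sketch below assumes every candidate has the same length as the prediction and returns a plain list instead of an array. The name hamming_dist_sketch is hypothetical.

```python
from typing import Any, List


def hamming_dist_sketch(
    pred_pseudo_label: List[Any], candidates: List[List[Any]]
) -> List[int]:
    """Count mismatching positions between the prediction and each candidate."""
    return [
        sum(1 for pred, cand_label in zip(pred_pseudo_label, cand) if pred != cand_label)
        for cand in candidates
    ]


pred = ["a", "b", "c"]
hamming_costs = hamming_dist_sketch(
    pred, [["a", "b", "c"], ["a", "x", "c"], ["x", "y", "z"]]
)
```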
- ablkit.utils.print_log(msg, logger: Logger | str | None = None, level: int | None = 20) -> None
Print a log message using the specified logger or a default method.
This function logs a message with the given logger, if provided, or prints it using the standard print function. It supports the special logger values ‘silent’ and ‘current’.
- Parameters:
msg (str) – The message to be logged.
logger (Union[Logger, str], optional) – The logger to use for logging the message. It can be a logging.Logger instance, a string specifying the logger name, ‘silent’, ‘current’, or None:
  - ‘silent’: No message will be printed.
  - ‘current’: Use the latest created logger to log the message.
  - other str: The instance name of the logger. A ValueError is raised if the logger has not been created.
  - None: The print() function is used.
level (int, optional) – The logging level. This is only applicable when logger is a Logger object, ‘current’, or a named logger instance. The default is logging.INFO.
- ablkit.utils.reform_list(flattened_list: List[Any], structured_list: List[Any | List[Any] | Tuple[Any, ...]]) -> List[List[Any]]
Reform the list based on the structure of structured_list.
- Parameters:
flattened_list (List[Any]) – A flattened list of elements.
structured_list (List[Union[Any, List[Any], Tuple[Any, ...]]]) – A list that reflects the desired structure, which may contain sublists or tuples.
- Returns:
A reformed list that mimics the structure of structured_list.
- Return type:
List[List[Any]]
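Reforming can be read as the inverse of first-level flattening: consecutive slices of the flat list are cut to the lengths of the groups in structured_list. The sketch below assumes every element of structured_list is a list or tuple, which the source does not guarantee. The name reform_list_sketch is hypothetical.

```python
from typing import Any, List, Sequence


def reform_list_sketch(
    flattened_list: List[Any], structured_list: Sequence[Sequence[Any]]
) -> List[List[Any]]:
    """Regroup a flat list so the group lengths match structured_list."""
    reformed: List[List[Any]] = []
    start = 0
    for group in structured_list:
        end = start + len(group)
        reformed.append(flattened_list[start:end])  # slice out one group
        start = end
    return reformed


template = [["x", "x"], ["x"], ("x", "x")]  # only the group lengths matter
reformed = reform_list_sketch([1, 2, 3, 4, 5], template)
```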
- ablkit.utils.rejection_dist(pred_prob: ndarray, candidates_idxs: List[List[Any]], alpha: float = 0.5) -> ndarray
Compute a rejection-aware cost that combines model confidence with candidate complexity. Each candidate’s cost is a convex combination of the standard confidence distance and a normalized length term, so longer (more complex) candidates are penalized.
- Parameters:
pred_prob (np.ndarray) – Prediction probability distributions for the symbols in a single data example.
candidates_idxs (List[List[Any]]) – Candidate label assignments.
alpha (float, optional) – Weight in [0, 1] for the complexity term. Defaults to 0.5.
- Returns:
Cost for each candidate.
- Return type:
np.ndarray
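The description above does not say how the length term is normalized, so the following is only one possible reading. It assumes cost = (1 - alpha) * confidence distance + alpha * len(candidate) / (length of the longest candidate). Both that normalization and the name rejection_dist_sketch are assumptions, not the library's behavior.

```python
import math
from typing import List, Sequence


def rejection_dist_sketch(
    pred_prob: Sequence[Sequence[float]],
    candidates_idxs: List[List[int]],
    alpha: float = 0.5,
) -> List[float]:
    """Convex combination of confidence distance and an assumed length term."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumed normalizer: the longest candidate gets a length term of 1.
    max_len = max(len(cand) for cand in candidates_idxs)
    costs = []
    for cand in candidates_idxs:
        joint = math.prod(pred_prob[pos][idx] for pos, idx in enumerate(cand))
        confidence_term = 1.0 - joint
        length_term = len(cand) / max_len
        costs.append((1.0 - alpha) * confidence_term + alpha * length_term)
    return costs


pred_prob = [[0.5, 0.5], [0.25, 0.75]]
rejection_costs = rejection_dist_sketch(pred_prob, [[0, 1], [0]], alpha=0.5)
```

With alpha set to 0 the sketch reduces to the plain confidence distance. With alpha set to 1 only the length term remains.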
- ablkit.utils.similarity_dist(pred_embeddings: ndarray, candidates_idxs: List[List[Any]]) -> ndarray
Compute a similarity-based cost for each candidate label assignment.
For each candidate, the cost is the average cosine similarity between symbol pairs assigned different labels minus the average between pairs assigned the same label. Lower values mean the candidate’s labeling is more consistent with the embedding geometry.
- Parameters:
pred_embeddings (np.ndarray) – Embedding matrix for the symbols in a single data example, with shape (num_symbols, embedding_dim).
candidates_idxs (List[List[Any]]) – Candidate label assignments, each of length num_symbols.
- Returns:
Cost for each candidate.
- Return type:
np.ndarray
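The cost described above can be sketched with pairwise cosine similarities. This is an illustration, not ablkit's implementation: it uses plain lists, assumes non-zero embeddings, and treats the average over an empty set of pairs as 0, a detail the source leaves open. The names cosine and similarity_dist_sketch are hypothetical.

```python
import math
from itertools import combinations
from typing import Any, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_dist_sketch(
    pred_embeddings: Sequence[Sequence[float]], candidates_idxs: List[List[Any]]
) -> List[float]:
    """avg similarity of differently labeled pairs minus avg of same-labeled pairs."""
    pairs = list(combinations(range(len(pred_embeddings)), 2))
    sims = {(i, j): cosine(pred_embeddings[i], pred_embeddings[j]) for i, j in pairs}
    costs = []
    for cand in candidates_idxs:
        same = [sims[(i, j)] for i, j in pairs if cand[i] == cand[j]]
        diff = [sims[(i, j)] for i, j in pairs if cand[i] != cand[j]]
        avg_same = sum(same) / len(same) if same else 0.0  # assumed fallback
        avg_diff = sum(diff) / len(diff) if diff else 0.0  # assumed fallback
        costs.append(avg_diff - avg_same)
    return costs


embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # symbols 0 and 1 are identical
similarity_costs = similarity_dist_sketch(embeddings, [[0, 0, 1], [0, 1, 1]])
```

The first candidate groups the two identical embeddings under one label and gets the lower cost.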
- ablkit.utils.tab_data_to_tuple(X: List[Any] | Any, y: List[Any] | Any, reasoning_result: Any | None = 0) -> Tuple[List[List[Any]], List[List[Any]], List[Any]]
Convert tabular data to a tuple by adding a dimension to each element of X and y. The tuple contains three elements: data, label, and reasoning result. If X is None, return None.
- Parameters:
X (Union[List[Any], Any]) – The data.
y (Union[List[Any], Any]) – The label.
reasoning_result (Any, optional) – The reasoning result. Defaults to 0.
- Returns:
A tuple of (data, label, reasoning_result).
- Return type:
Tuple[List[List[Any]], List[List[Any]], List[Any]]
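A minimal sketch of the conversion described above: each row and each label is wrapped in a one-element list, and the reasoning result is repeated once per row. The length check is an added assumption, and tab_data_to_tuple_sketch is a hypothetical name.

```python
from typing import Any, List, Optional, Tuple


def tab_data_to_tuple_sketch(
    X: Optional[List[Any]], y: List[Any], reasoning_result: Any = 0
) -> Optional[Tuple[List[List[Any]], List[List[Any]], List[Any]]]:
    """Wrap every element of X and y in its own list."""
    if X is None:
        return None
    if len(X) != len(y):  # assumed sanity check, not stated in the source
        raise ValueError("X and y must have the same length")
    data = [[row] for row in X]
    labels = [[label] for label in y]
    return data, labels, [reasoning_result] * len(X)


converted = tab_data_to_tuple_sketch([[1.0, 2.0], [3.0, 4.0]], [0, 1])
```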
- ablkit.utils.to_hashable(x: List[Any] | Any) -> Tuple[Any, ...] | Any
Convert a nested list to a nested tuple so it is hashable.
- Parameters:
x (Union[List[Any], Any]) – A potentially nested list to convert to a tuple.
- Returns:
The input converted to a tuple if it was a list, otherwise the original input.
- Return type:
Union[Tuple[Any, ...], Any]
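The conversion above matters because lists cannot be used as dictionary keys, while tuples can. The following recursive sketch is written from the description. The name to_hashable_sketch is hypothetical.

```python
from typing import Any


def to_hashable_sketch(x: Any) -> Any:
    """Recursively turn lists into tuples; leave everything else untouched."""
    if isinstance(x, list):
        return tuple(to_hashable_sketch(item) for item in x)
    return x


key = to_hashable_sketch([1, [2, [3, 4]], "a"])
lookup = {key: "cached result"}  # works because the key is now a nested tuple
```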