ablkit.reasoning

class ablkit.reasoning.KBBase(pseudo_label_list: List[Any], max_err: float = 1e-10, use_cache: bool = True, key_func: Callable = <function to_hashable>, cache_size: int = 4096)[source]

Bases: ABC

Base class for knowledge base.

Parameters:

pseudo_label_list (List[Any]) – List of possible pseudo-labels. It’s recommended to arrange the pseudo-labels in this list so that each aligns with its corresponding index in the base model: the first with the 0th index, the second with the 1st, and so forth.
max_err (float, optional) – The upper tolerance limit when comparing the similarity between the reasoning result of pseudo-labels and the ground truth. This is only applicable when the reasoning result is of a numerical type. This is particularly relevant for regression problems where exact matches might not be feasible. Defaults to 1e-10.
use_cache (bool, optional) – Whether to use abl_cache for previously abduced candidates to speed up subsequent operations. Defaults to True.
key_func (Callable, optional) – A function employed for hashing in abl_cache. This is only operational when use_cache is set to True. Defaults to to_hashable.
cache_size (int, optional) – The cache size in abl_cache. This is only operational when use_cache is set to True. Defaults to 4096.
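
To make max_err and key_func more concrete, the following is a minimal standalone sketch. The helpers results_match and to_hashable_sketch are hypothetical names and do not reproduce ablkit's internals; they only illustrate a tolerance-based comparison for numerical results and the kind of list-to-tuple conversion a cache key requires.

```python
from numbers import Number


def results_match(result, y, max_err=1e-10):
    """Compare a reasoning result with the ground truth (hypothetical helper)."""
    if isinstance(result, Number) and isinstance(y, Number):
        # Numerical results are compared within the tolerance max_err.
        return abs(result - y) <= max_err
    # Non-numerical results must match exactly.
    return result == y


def to_hashable_sketch(obj):
    """Recursively turn lists into tuples so they can serve as cache keys."""
    if isinstance(obj, list):
        return tuple(to_hashable_sketch(item) for item in obj)
    return obj


print(results_match(0.1 + 0.2, 0.3))       # floating-point noise is tolerated
print(results_match("cat", "dog"))         # exact comparison for non-numbers
print(to_hashable_sketch([1, [2, 3], "a"]))
```

A larger max_err loosens the match, which is why it matters mainly for regression-style reasoning results.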

Notes

Users should derive from this base class to build their own knowledge base. For a user-built KB (a derived subclass), the user only needs to provide the pseudo_label_list and override the logic_forward function (specifying how to perform logical reasoning). Other operations (e.g., how to perform abductive reasoning) are then set up automatically.
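
The subclassing pattern described above can be sketched as follows. To keep the example runnable without the package installed, SketchKBBase is a hypothetical stand-in, not ablkit's KBBase; the toy rule (the reasoning result is the sum of the pseudo-labels) is likewise an assumption chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Optional


class SketchKBBase(ABC):
    """Stand-in for a knowledge-base base class (not ablkit's KBBase)."""

    def __init__(self, pseudo_label_list: List[Any]):
        self.pseudo_label_list = pseudo_label_list

    @abstractmethod
    def logic_forward(self, pseudo_label: List[Any], x: Optional[List[Any]] = None) -> Any:
        """Map an example's pseudo-labels to its reasoning result."""


class AddKB(SketchKBBase):
    """Knowledge base whose reasoning result is the sum of the pseudo-labels."""

    def logic_forward(self, pseudo_label, x=None):
        # The example x is not needed for this rule, so it is ignored.
        return sum(pseudo_label)


kb = AddKB(pseudo_label_list=list(range(10)))
print(kb.logic_forward([3, 4]))  # 7
```

With the library itself, the same subclass would presumably derive from ablkit.reasoning.KBBase instead of the stand-in.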

abduce_candidates(pseudo_label: List[Any], y: Any, x: List[Any], max_revision_num: int, require_more_revision: int) → List[List[Any]][source]

Perform abductive reasoning to get candidates compatible with the knowledge base.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example (to be revised by abductive reasoning).
y (Any) – Ground truth of the reasoning result for the example.
x (List[Any]) – The example. If the information from the example is not required in the reasoning process, then this parameter will not have any effect.
max_revision_num (int) – The upper limit on the number of revised labels for each example.
require_more_revision (int) – Specifies additional number of revisions permitted beyond the minimum required.

Returns:

A tuple of two elements. The first element is a list of candidate revisions, i.e., revised pseudo-labels of the example that are compatible with the knowledge base. The second element is a list of reasoning results corresponding to each candidate, i.e., the outcome of the logic_forward function.

Return type:

Tuple[List[List[Any]], List[Any]]
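
One way to read the interplay of max_revision_num and require_more_revision is sketched below as a brute-force search. This is an illustrative assumption about the semantics, not ablkit's implementation; the names abduce_sketch and revise_at and the toy sum rule are hypothetical.

```python
from itertools import combinations, product


def logic_forward(pseudo_label):
    # Toy rule (assumption): the reasoning result is the sum of the labels.
    return sum(pseudo_label)


def revise_at(pseudo_label, y, labels, idxs):
    """All substitutions at positions idxs whose reasoning result equals y."""
    found = []
    for values in product(labels, repeat=len(idxs)):
        candidate = list(pseudo_label)
        for i, v in zip(idxs, values):
            candidate[i] = v
        if logic_forward(candidate) == y:
            found.append(candidate)
    return found


def abduce_sketch(pseudo_label, y, labels, max_revision_num, require_more_revision):
    candidates, min_num = [], None
    for num in range(min(max_revision_num, len(pseudo_label)) + 1):
        # Stop once we are past the minimum revision count plus the allowance.
        if min_num is not None and num > min_num + require_more_revision:
            break
        for idxs in combinations(range(len(pseudo_label)), num):
            for cand in revise_at(pseudo_label, y, labels, idxs):
                if cand not in candidates:
                    candidates.append(cand)
        if candidates and min_num is None:
            min_num = num  # smallest number of revisions that works
    return candidates, [logic_forward(c) for c in candidates]


print(abduce_sketch([1, 1], 3, list(range(10)), 2, 0))
```

In this sketch, raising require_more_revision from 0 to 1 also admits candidates that change both positions, such as [0, 3].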

abstract logic_forward(pseudo_label: List[Any], x: List[Any] | None = None) → Any[source]

How to perform (deductive) logical reasoning, i.e. matching an example’s pseudo-labels to its reasoning result. Users are required to provide this.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example.
x (List[Any], optional) – The example. If deductive logical reasoning does not require any information from the example, the overridden function provided by the user can omit this parameter.

Returns:

The reasoning result.

Return type:

Any

revise_at_idx(pseudo_label: List[Any], y: Any, x: List[Any], revision_idx: List[int]) → List[List[Any]][source]

Revise the pseudo-labels at specified index positions.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example (to be revised).
y (Any) – Ground truth of the reasoning result for the example.
x (List[Any]) – The example. If the information from the example is not required in the reasoning process, then this parameter will not have any effect.
revision_idx (List[int]) – A list specifying indices of where revisions should be made to the pseudo-labels.

Returns:

A tuple of two elements. The first element is a list of candidate revisions, i.e., revised pseudo-labels of the example that are compatible with the knowledge base. The second element is a list of reasoning results corresponding to each candidate, i.e., the outcome of the logic_forward function.

Return type:

Tuple[List[List[Any]], List[Any]]

class ablkit.reasoning.GroundKB(pseudo_label_list: List[Any], GKB_len_list: List[int], max_err: float = 1e-10)[source]

Bases: KBBase

Knowledge base with a ground KB (GKB). Ground KB is a knowledge base prebuilt upon class initialization, storing all potential candidates along with their respective reasoning result. Ground KB can accelerate abductive reasoning in abduce_candidates.

Parameters:

pseudo_label_list (List[Any]) – Refer to class KBBase.
GKB_len_list (List[int]) – List of possible lengths for pseudo-labels of an example.
max_err (float, optional) – Refer to class KBBase.

Notes

Users can also inherit from this class to build their own knowledge base. Similar to KBBase, users are only required to provide the pseudo_label_list and override the logic_forward function. Additionally, users should provide the GKB_len_list. After that, other operations (e.g. auto-construction of GKB, and how to perform abductive reasoning) will be automatically set up.
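
The idea of a prebuilt ground KB can be sketched as a dictionary from reasoning result to every label sequence that produces it. The function build_ground_kb is a hypothetical name, and the construction shown is an assumption about the general technique rather than ablkit's actual code.

```python
from collections import defaultdict
from itertools import product


def build_ground_kb(pseudo_label_list, gkb_len_list, logic_forward):
    """Precompute reasoning result -> all label sequences producing it."""
    ground = defaultdict(list)
    for length in gkb_len_list:
        for candidate in product(pseudo_label_list, repeat=length):
            ground[logic_forward(list(candidate))].append(list(candidate))
    return ground


# Labels 0..2, sequences of length 2, toy rule: result = sum of labels.
ground = build_ground_kb(range(3), [2], sum)
print(ground[2])  # [[0, 2], [1, 1], [2, 0]]
```

Abduction then reduces to a dictionary lookup on y, at the cost of enumerating all sequences once up front, so this only pays off when the label set and the lengths in GKB_len_list are small.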

abduce_candidates(pseudo_label: List[Any], y: Any, x: List[Any], max_revision_num: int, require_more_revision: int) → List[List[Any]][source]

Perform abductive reasoning by directly retrieving compatible candidates from the prebuilt GKB. In this way, the time-consuming exhaustive search can be avoided.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example (to be revised by abductive reasoning).
y (Any) – Ground truth of the reasoning result for the example.
x (List[Any]) – The example (unused in GroundKB).
max_revision_num (int) – The upper limit on the number of revised labels for each example.
require_more_revision (int) – Specifies additional number of revisions permitted beyond the minimum required.

Returns:

A tuple of two elements. The first element is a list of candidate revisions, i.e., revised pseudo-labels of the example that are compatible with the knowledge base. The second element is a list of reasoning results corresponding to each candidate, i.e., the outcome of the logic_forward function.

Return type:

Tuple[List[List[Any]], List[Any]]

class ablkit.reasoning.PrologKB(pseudo_label_list: List[Any], pl_file: str)[source]

Bases: KBBase

Knowledge base provided by a Prolog (.pl) file.

Parameters:

pseudo_label_list (List[Any]) – Refer to class KBBase.
pl_file (str) – Prolog file containing the KB.

Notes

Users can instantiate this class to build their own knowledge base. During the instantiation, users are only required to provide the pseudo_label_list and pl_file. To use the default logic forward and abductive reasoning methods in this class, in the Prolog (.pl) file, there needs to be a rule which is strictly formatted as logic_forward(Pseudo_labels, Res)., e.g., logic_forward([A,B], C) :- C is A+B. For specifics, refer to the logic_forward and get_query_string functions in this class. Users are also welcome to override related functions for more flexible support.

get_query_string(pseudo_label: List[Any], y: Any, x: List[Any], revision_idx: List[int]) → str[source]

Get the query to be used for consulting Prolog. The default implementation, intended as a demonstration, returns the query logic_forward([kept_labels, Revise_labels], Res). and can be overridden to suit the user's own Prolog file.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example (to be revised by abductive reasoning).
y (Any) – Ground truth of the reasoning result for the example.
x (List[Any]) – The corresponding input example. If the information from the input is not required in the reasoning process, then this parameter will not have any effect.
revision_idx (List[int]) – A list specifying indices of where revisions should be made to the pseudo-labels.

Returns:

A string of the query.

Return type:

str
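
A minimal sketch of how such a query string might be assembled is shown below. The function name query_string_sketch and the variable naming scheme (P followed by the position index) are assumptions for illustration; the library's actual formatting may differ.

```python
def query_string_sketch(pseudo_label, revision_idx):
    """Build a Prolog query, replacing revised positions with variables."""
    terms = [
        # Positions to revise become Prolog variables; the rest stay as constants.
        f"P{i}" if i in revision_idx else str(label)
        for i, label in enumerate(pseudo_label)
    ]
    return f"logic_forward([{','.join(terms)}], Res)."


print(query_string_sketch([1, 2, 3], [1]))  # logic_forward([1,P1,3], Res).
```

Prolog would then bind P1 and Res to every combination consistent with the rules in the .pl file.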

logic_forward(pseudo_label: List[Any], x: List[Any] | None = None) → Any[source]

Consult Prolog with the query logic_forward(pseudo_labels, Res). and use the returned Res as the reasoning result. To use this default function, the .pl file must contain a logic_forward rule that performs the reasoning; otherwise, users should override this function.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example.
x (List[Any]) – The corresponding input example. If the information from the input is not required in the reasoning process, then this parameter will not have any effect.

revise_at_idx(pseudo_label: List[Any], y: Any, x: List[Any], revision_idx: List[int]) → List[List[Any]][source]

Revise the pseudo-labels at specified index positions by querying Prolog.

Parameters:

pseudo_label (List[Any]) – Pseudo-labels of an example (to be revised).
y (Any) – Ground truth of the reasoning result for the example.
x (List[Any]) – The corresponding input example. If the information from the input is not required in the reasoning process, then this parameter will not have any effect.
revision_idx (List[int]) – A list specifying indices of where revisions should be made to the pseudo-labels.

Returns:

A tuple of two elements. The first element is a list of candidate revisions, i.e., revised pseudo-labels of the example that are compatible with the knowledge base. The second element is a list of reasoning results corresponding to each candidate, i.e., the outcome of the logic_forward function.

Return type:

Tuple[List[List[Any]], List[Any]]

class ablkit.reasoning.Reasoner(kb: KBBase, dist_func: str | Callable = 'confidence', idx_to_label: dict | None = None, max_revision: int | float = -1, require_more_revision: int = 0, use_zoopt: bool = False)[source]

Bases: object

Reasoner for minimizing the inconsistency between the knowledge base and learning models.

Parameters:

kb (class KBBase) – The knowledge base to be used for reasoning.
dist_func (Union[str, Callable], optional) – The distance function used to determine the cost list between each candidate and the given prediction. The cost is also referred to as a consistency measure, wherein the candidate with the lowest cost is selected as the final abduced label. It can be either a string naming a predefined distance function or a callable. The available predefined distance functions are ‘hamming’ | ‘confidence’ | ‘avg_confidence’ | ‘similarity’ | ‘rejection’. ‘hamming’ directly calculates the Hamming distance between the predicted pseudo-label in the data example and each candidate. ‘confidence’ and ‘avg_confidence’ calculate the confidence distance between the predicted probabilities and each candidate, defined as 1 - product and 1 - average of the candidate’s per-symbol probabilities, respectively. ‘similarity’ compares candidates against the geometry of the model’s embeddings (requires the base model to expose extract_features; ABLModel then stores the result on data_example.embeddings). ‘rejection’ combines confidence distance with a candidate-complexity penalty, favoring shorter candidates when scores are close. Alternatively, a callable should have the signature dist_func(data_example, candidates, candidate_idxs, reasoning_results) and must return a cost list. Each element of this cost list should be a numerical value representing the cost of one candidate, and the list should have the same length as candidates. Defaults to ‘confidence’.
idx_to_label (dict, optional) – A mapping from index in the base model to label. If not provided, a default order-based index to label mapping is created. Defaults to None.
max_revision (Union[int, float], optional) – The upper limit on the number of revisions for each data example when performing abductive reasoning. If float, denotes the fraction of the total length that can be revised. A value of -1 implies no restriction on the number of revisions. Defaults to -1.
require_more_revision (int, optional) – Specifies additional number of revisions permitted beyond the minimum required when performing abductive reasoning. Defaults to 0.
use_zoopt (bool, optional) – Whether to use ZOOpt library during abductive reasoning. Defaults to False.
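
The ‘hamming’, ‘confidence’, and ‘avg_confidence’ costs described above can be sketched in a few lines. This is a standalone illustration built from the definitions in the dist_func entry; the function names are hypothetical, and plain nested lists stand in for the probability arrays the library would use.

```python
from math import prod


def hamming_cost(pred_label, candidates):
    """Number of positions where each candidate differs from the prediction."""
    return [sum(p != c for p, c in zip(pred_label, cand)) for cand in candidates]


def confidence_cost(pred_prob, candidate_idxs):
    """1 - product of the probabilities of the candidate's symbols."""
    return [
        1 - prod(pred_prob[pos][cls] for pos, cls in enumerate(idxs))
        for idxs in candidate_idxs
    ]


def avg_confidence_cost(pred_prob, candidate_idxs):
    """1 - average of the probabilities of the candidate's symbols."""
    return [
        1 - sum(pred_prob[pos][cls] for pos, cls in enumerate(idxs)) / len(idxs)
        for idxs in candidate_idxs
    ]


pred_prob = [[0.1, 0.6, 0.3], [0.2, 0.7, 0.1]]  # two symbols, three classes
candidate_idxs = [[2, 1], [1, 2]]               # both candidates sum to 3
costs = confidence_cost(pred_prob, candidate_idxs)
best = candidate_idxs[costs.index(min(costs))]  # lowest cost wins
print(best)  # [2, 1]
```

Here the Hamming cost cannot separate the two candidates (each differs from the prediction [1, 1] in one position), whereas the confidence cost prefers [2, 1] because the model assigns it a higher joint probability.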

abduce(data_example: ListData) → List[Any][source]

Perform abductive reasoning on the given data example.

Parameters:

data_example (ListData) – Data example.

Returns:

Revised pseudo-labels of the example, obtained through abductive reasoning, that are compatible with the knowledge base.

Return type:

List[Any]
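
Before abduction, the max_revision setting has to be turned into a concrete number of revisions for the example at hand. The sketch below shows one plausible conversion; resolve_max_revision is a hypothetical helper, and the use of rounding for float values is an assumption, not a documented behavior.

```python
def resolve_max_revision(max_revision, symbol_num):
    """Turn the max_revision setting into a concrete number of revisions."""
    if max_revision == -1:
        return symbol_num  # -1 means no restriction
    if isinstance(max_revision, float):
        if not 0.0 <= max_revision <= 1.0:
            raise ValueError("a float max_revision must lie in [0, 1]")
        # Fraction of the total length; rounding is an assumption.
        return round(symbol_num * max_revision)
    if max_revision < 0:
        raise ValueError("an int max_revision must be -1 or non-negative")
    return min(max_revision, symbol_num)


print(resolve_max_revision(-1, 4))   # 4
print(resolve_max_revision(0.5, 4))  # 2
print(resolve_max_revision(1, 4))    # 1
```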

batch_abduce(data_examples: ListData) → List[List[Any]][source]

Perform abductive reasoning on the given prediction data examples. For detailed information, refer to abduce.

batch_supervised_abduce(data_examples: ListData) → List[List[Any]][source]

Perform abductive reasoning on the given prediction data examples, using supervised data when gt_pseudo_label is given.

zoopt_budget(symbol_num: int) → int[source]

Set the budget for ZOOpt optimization. The budget can be dynamic, depending on the number of symbols considered (as in the default implementation), or a fixed value, such as 100.

Parameters:

symbol_num (int) – The number of symbols to be considered in the ZOOpt optimization process.

Returns:

The budget for ZOOpt optimization.

Return type:

int
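
The two budgeting styles mentioned above, dynamic and fixed, might look like the following when written as plain functions. Both names and the scaling factor of 10 are purely illustrative assumptions; in practice the chosen rule would be supplied by overriding zoopt_budget in a Reasoner subclass.

```python
def dynamic_budget(symbol_num: int) -> int:
    # Budget grows with the number of symbols; the factor 10 is illustrative.
    return 10 * symbol_num


def fixed_budget(symbol_num: int) -> int:
    # Budget ignores the number of symbols.
    return 100


print(dynamic_budget(5), fixed_budget(5))  # 50 100
```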

zoopt_score(symbol_num: int, data_example: ListData, sol: Solution) → int[source]

Set the score for a solution. A lower score suggests that ZOOpt library has a higher preference for this solution.

Parameters:

symbol_num (int) – Number of total symbols.
data_example (ListData) – Data example.
sol (Solution) – The solution for ZOOpt library.

Returns:

The score for the solution.

Return type:

int

class ablkit.reasoning.A3BLReasoner(kb, dist_func='confidence', idx_to_label=None, max_revision: int | float = -1, require_more_revision: int = 0, use_zoopt: bool = False, topK: int = 16, temperature: float = 0.2, multi_label: bool = False)[source]

Bases: Reasoner

Reasoner for minimizing the inconsistency between the knowledge base and learning models.

Parameters:

kb (class KBBase) – The knowledge base to be used for reasoning.
dist_func (Union[str, Callable], optional) – The distance function used to determine the cost list between each candidate and the given prediction. The cost is also referred to as a consistency measure, wherein the candidate with the lowest cost is selected as the final abduced label. It can be either a string representing a predefined distance function or a callable function. The available predefined distance functions: ‘hamming’ | ‘confidence’ | ‘avg_confidence’ | ‘similarity’ | ‘rejection’. See Reasoner for the full description of each option. Defaults to ‘confidence’.
idx_to_label (dict, optional) – A mapping from index in the base model to label. If not provided, a default order-based index to label mapping is created. Defaults to None.
max_revision (Union[int, float], optional) – The upper limit on the number of revisions for each data example when performing abductive reasoning. If float, denotes the fraction of the total length that can be revised. A value of -1 implies no restriction on the number of revisions. Defaults to -1.
require_more_revision (int, optional) – Specifies additional number of revisions permitted beyond the minimum required when performing abductive reasoning. Defaults to 0.
use_zoopt (bool, optional) – Whether to use ZOOpt library during abductive reasoning. Defaults to False.
topK (int, optional) – Number of top-ranked candidates to keep when forming the soft label. -1 keeps all candidates. Defaults to 16.
temperature (float, optional) – Softmax temperature used when aggregating candidate probabilities into a soft label. Lower values produce sharper distributions. Defaults to 0.2.
multi_label (bool, optional) – Whether the underlying task is multi-label (each symbol is a binary vector rather than a single class index). Defaults to False.

abduce(data_example: ListData) → Tuple[List[Any], List[Any]][source]

Perform abduction and get a soft label distribution aggregated from all valid candidates that satisfy the underlying rules.

Parameters:

data_example (ListData) – Data example.

Returns:

soft_label (List[Any]) – Soft label aggregated from the top-k valid candidates.
pseudo_label (List[Any]) – Hard pseudo-label revision (the top-1 candidate) that is consistent with the knowledge base.
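
The roles of topK and temperature can be illustrated with a small sketch that turns candidate probabilities into a per-symbol soft label. The weighting scheme shown (a temperature-scaled softmax over candidate probabilities) is an assumption about how such aggregation could work, not a reproduction of A3BLReasoner; soft_label_sketch is a hypothetical name, and a non-empty candidate list is assumed.

```python
from math import exp


def soft_label_sketch(candidates, candidate_probs, num_classes, top_k=16, temperature=0.2):
    """Return (soft_label, hard_label) from candidates and their probabilities."""
    ranked = sorted(zip(candidate_probs, candidates), key=lambda pc: pc[0], reverse=True)
    if top_k != -1:
        ranked = ranked[:top_k]  # keep only the top-ranked candidates
    # Temperature-scaled softmax; subtracting the maximum keeps exp() stable.
    top = ranked[0][0]
    weights = [exp((p - top) / temperature) for p, _ in ranked]
    total = sum(weights)
    # One distribution over classes per symbol position.
    soft = [[0.0] * num_classes for _ in ranked[0][1]]
    for w, (_, cand) in zip(weights, ranked):
        for pos, cls in enumerate(cand):
            soft[pos][cls] += w / total
    return soft, ranked[0][1]


soft, hard = soft_label_sketch([[2, 1], [1, 2]], [0.21, 0.06], num_classes=3)
print(hard)
print([[round(v, 3) for v in row] for row in soft])
```

Lowering temperature concentrates the weight on the top candidate, so the soft label approaches the one-hot encoding of the hard pseudo-label.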

aggregate(candidates: List[List[int]], candidate_probs: List[float])[source]

batch_abduce(data_examples: ListData) → List[List[Any]][source]

Perform abductive reasoning on the given prediction data examples. For detailed information, refer to abduce.

multi_label_aggregate(candidates: List[List[int]], candidate_probs: List[float])[source]

A multi-label version of A3BL.

class ablkit.reasoning.VerificationReasoner(kb: KBBase, top_k: int = 1, max_iter: int = 10000, idx_to_label: dict | None = None)[source]

Bases: object

Reasoner used by VerificationBridge. Rather than picking a single best candidate via a distance function, it enumerates the top_k most probable label assignments that satisfy the knowledge base, ordered by joint probability. The bridge then trains the model on each of those candidates.

Parameters:

kb (KBBase) – The knowledge base used to verify candidates. kb.logic_forward must return the reasoning result so it can be compared with each data example’s Y.
top_k (int, optional) – Number of satisfying candidates to enumerate per example. Defaults to 1.
max_iter (int, optional) – Maximum number of enumeration steps per example before giving up and returning the fallback. Defaults to 10000.
idx_to_label (dict, optional) – A mapping from base-model index to pseudo-label. If omitted, a default order-based mapping is built from kb.pseudo_label_list.

batch_top_k(data_examples) → List[List[List[Any]]][source]

Run top_k_candidates() on every example in data_examples. Stores the result on data_examples.top_k_candidates and data_examples.top_k_probs. Returns the list of per-example candidate lists.

top_k_candidates(pred_prob: ndarray, y: Any) → Tuple[List[List[Any]], List[float]][source]

Return up to top_k label assignments for one data example whose kb.logic_forward matches y.

ablkit.reasoning.reasoner.enumerate_label_assignments(pred_prob: ndarray, max_iter: int = 10000) → Iterator[Tuple[List[int], float, List[float]]][source]

Yield label-index assignments for a single data example in descending joint-probability order. The walk is a Lawler-style best-first search: each state is the tuple of per-symbol rank indices, and successors are generated by advancing any one symbol to its next-best class.

Parameters:

pred_prob (np.ndarray) – Per-symbol probability matrix with shape (num_symbols, num_classes).
max_iter (int, optional) – Hard cap on the number of yields. Defaults to 10000.

Yields:

labels (List[int]) – Class indices for each symbol.
joint_prob (float) – Product of the chosen per-symbol probabilities.
per_symbol_probs (List[float]) – The chosen probability for each symbol.
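
The best-first walk described above can be sketched with a heap keyed on negative joint probability. This is an independent illustration of the technique, using nested lists in place of an ndarray; enumerate_sketch is a hypothetical name and the details may differ from the library's implementation.

```python
import heapq
from math import prod


def enumerate_sketch(pred_prob, max_iter=10000):
    """Yield (labels, joint_prob, per_symbol_probs) in descending joint probability."""
    # For each symbol, class indices sorted from most to least probable.
    order = [sorted(range(len(row)), key=lambda c: -row[c]) for row in pred_prob]

    def joint(ranks):
        return prod(pred_prob[s][order[s][r]] for s, r in enumerate(ranks))

    start = (0,) * len(pred_prob)  # every symbol at its best class
    heap = [(-joint(start), start)]
    seen = {start}
    yielded = 0
    while heap and yielded < max_iter:
        neg_prob, ranks = heapq.heappop(heap)
        labels = [order[s][r] for s, r in enumerate(ranks)]
        yield labels, -neg_prob, [pred_prob[s][c] for s, c in enumerate(labels)]
        yielded += 1
        # Successors: advance exactly one symbol to its next-best class.
        for s in range(len(ranks)):
            if ranks[s] + 1 < len(pred_prob[s]):
                nxt = ranks[:s] + (ranks[s] + 1,) + ranks[s + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-joint(nxt), nxt))


for labels, joint_prob, _ in enumerate_sketch([[0.1, 0.6, 0.3], [0.2, 0.7, 0.1]], max_iter=3):
    print(labels, round(joint_prob, 2))
```

Because advancing a symbol's rank can only lower the joint probability, each popped state is the most probable assignment not yet yielded, so the full product space never needs to be materialized.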

ablkit.reasoning.reasoner.top_k_satisfying(pred_prob: ndarray, predicate: Callable[[List[Any]], bool], top_k: int = 1, max_iter: int = 10000, idx_to_label: dict | None = None) → Tuple[List[List[Any]], List[float]][source]

Walk label assignments in descending joint-probability order and return the first top_k that satisfy predicate. If none is found within max_iter iterations the single highest-probability assignment is returned as a fallback so callers always receive a usable label.

Parameters:

pred_prob (np.ndarray) – Per-symbol probability matrix with shape (num_symbols, num_classes).
predicate (Callable[[List[Any]], bool]) – Function called on each candidate label sequence; truthy means the candidate is consistent with the knowledge base.
top_k (int, optional) – Maximum number of satisfying candidates to return. Defaults to 1.
max_iter (int, optional) – Hard cap on enumeration steps. Defaults to 10000.
idx_to_label (dict, optional) – Optional mapping from class index to pseudo-label. When omitted, the raw class indices are returned.

Returns:

candidates (List[List[Any]]) – Label assignments that satisfy predicate (or the fallback).
probs (List[float]) – Joint probability of each returned candidate.
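
The filtering and fallback behavior described above can be sketched as follows. For brevity this version swaps the best-first walk for a brute-force enumeration that scores and sorts every assignment, so it is only suitable for tiny inputs; top_k_satisfying_sketch is a hypothetical name and the code is not the library's implementation.

```python
from itertools import islice, product
from math import prod


def top_k_satisfying_sketch(pred_prob, predicate, top_k=1, max_iter=10000, idx_to_label=None):
    # Brute force: score every assignment, then sort by joint probability.
    scored = sorted(
        (
            (prod(row[c] for row, c in zip(pred_prob, idxs)), list(idxs))
            for idxs in product(*(range(len(row)) for row in pred_prob))
        ),
        key=lambda item: -item[0],
    )

    def to_labels(idxs):
        return [idx_to_label[i] for i in idxs] if idx_to_label else idxs

    candidates, probs = [], []
    for joint_prob, idxs in islice(scored, max_iter):
        labels = to_labels(idxs)
        if predicate(labels):
            candidates.append(labels)
            probs.append(joint_prob)
            if len(candidates) == top_k:
                break
    if not candidates and scored:
        # Fallback: the single most probable assignment.
        candidates, probs = [to_labels(scored[0][1])], [scored[0][0]]
    return candidates, probs


pred_prob = [[0.1, 0.6, 0.3], [0.2, 0.7, 0.1]]
print(top_k_satisfying_sketch(pred_prob, lambda labels: sum(labels) == 3, top_k=2))
```

In the usage line, the predicate plays the role of a knowledge-base check: it accepts only assignments whose labels sum to 3.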