ablkit.bridge

class ablkit.bridge.A3BLBridge(model: ABLModel, reasoner: A3BLReasoner, metric_list: List[BaseMetric])[source]

Bases: SimpleBridge

An ambiguity-aware implementation for bridging machine learning and reasoning parts.

Reference: https://github.com/Hao-Yuan-He/A3BL

Involves the following five steps:
  • Predict class probabilities and indices for the given data examples.

  • Map indices into pseudo-labels.

  • Enumerate all valid pseudo-labels.

  • Revise pseudo-labels into a label distribution based on the class probabilities.

  • Train the model.

Parameters:
  • model (ABLModel) – The machine learning model wrapped in ABLModel, used for prediction and training. The wrapped base model should expose extract_features so embeddings are available for the soft-label aggregation.

  • reasoner (A3BLReasoner) – The reasoning part wrapped in A3BLReasoner, used for pseudo-label enumeration and soft-label aggregation.

  • metric_list (List[BaseMetric]) – A list of metrics used for evaluating the model’s performance.

abduce_soft_label(data_examples: ListData) List[List[Any]][source]

Revise the predicted pseudo-labels of the given data examples into soft labels using abduction.

Parameters:

data_examples (ListData) – Data examples containing predicted pseudo-labels.

Returns:

A list of abduced soft labels for the given data examples.

Return type:

List[List[Any]]
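The aggregation of enumerated candidates into a soft label can be sketched as follows. This is a minimal illustration, not the A3BL implementation: the toy rule (labels must sum to a target), the helper names enumerate_candidates and soft_labels, and the choice to weight each candidate by the product of its predicted class probabilities are all assumptions made for the example.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_candidates(num_symbols: int, num_classes: int, target: int) -> List[Tuple[int, ...]]:
    """Enumerate every label tuple consistent with a toy rule: labels sum to `target`."""
    return [c for c in product(range(num_classes), repeat=num_symbols) if sum(c) == target]


def soft_labels(probs: Sequence[Sequence[float]], candidates: List[Tuple[int, ...]]) -> List[List[float]]:
    """Aggregate candidates into one label distribution per symbol.

    Assumed weighting: each candidate counts in proportion to the product of
    the model's probabilities for its labels. Raises ZeroDivisionError if
    every candidate has zero probability.
    """
    num_classes = len(probs[0])
    weights = []
    for cand in candidates:
        weight = 1.0
        for pos, label in enumerate(cand):
            weight *= probs[pos][label]
        weights.append(weight)
    total = sum(weights)
    dist = [[0.0] * num_classes for _ in probs]
    for cand, weight in zip(candidates, weights):
        for pos, label in enumerate(cand):
            dist[pos][label] += weight / total
    return dist


# Two symbols, three classes, and a reasoning result of 2.
probs = [[0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]
candidates = enumerate_candidates(num_symbols=2, num_classes=3, target=2)
dist = soft_labels(probs, candidates)
```

Each row of dist sums to 1 and places no mass on labels that appear in no consistent candidate, which is what lets the model train on a distribution rather than on a single revised label.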

train(train_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]], val_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any] | None] | None = None, loops: int = 50, segment_size: int | float = 1.0, eval_interval: int = 1, save_interval: int | None = None, save_dir: str | None = None)[source]

A typical training pipeline of Abductive Learning.

Parameters:
  • train_data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], List[Any]]]) – Training data should be in the form of (X, gt_pseudo_label, Y) or a ListData object with X, gt_pseudo_label and Y attributes.
      - X is a list of sublists representing the input data.
      - gt_pseudo_label is only used to evaluate the performance of the ABLModel but not to train. gt_pseudo_label can be None.
      - Y is a list representing the ground truth reasoning result for each sublist in X.

  • label_data (Union[ListData, Tuple[List[List[Any]], List[List[Any]], List[Any]]], optional) – Labeled data should be in the same format as train_data. The only difference is that the gt_pseudo_label in label_data should not be None and will be utilized to train the model. Defaults to None.

  • val_data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], Optional[List[Any]]]], optional) – Validation data should be in the same format as train_data. gt_pseudo_label and Y may each be None, depending on the evaluation metrics in self.metric_list. If val_data is None, train_data will be used to validate the model during training. Defaults to None.

  • loops (int) – Learning part and Reasoning part will be iteratively optimized for loops times. Defaults to 50.

  • segment_size (Union[int, float]) – Data will be split into segments of this size and data in each segment will be used together to train the model. Defaults to 1.0.

  • eval_interval (int) – The model will be evaluated every eval_interval loops during training. Defaults to 1.

  • save_interval (int, optional) – The model will be saved every save_interval loops during training. Defaults to None.

  • save_dir (str, optional) – Directory to save the model. Defaults to None.

train_data_iter(train_data, val_data=None, segment_size=1.0)[source]

class ablkit.bridge.BaseBridge(model: ABLModel, reasoner: Reasoner)[source]

Bases: object

A base class for bridging learning and reasoning parts.

This class provides necessary methods that need to be overridden in subclasses to construct a typical pipeline of Abductive Learning (corresponding to train), which involves the following four methods:

  • predict: Predict class indices on the given data examples.

  • idx_to_pseudo_label: Map indices into pseudo-labels.

  • abduce_pseudo_label: Revise pseudo-labels based on abductive reasoning.

  • pseudo_label_to_idx: Map revised pseudo-labels back into indices.

Parameters:
  • model (ABLModel) – The machine learning model wrapped in ABLModel, which is mainly used for prediction and model training.

  • reasoner (Reasoner) – The reasoning part wrapped in Reasoner, which is used for pseudo-label revision.
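How the four methods chain together can be sketched with a self-contained toy. Nothing here imports ablkit: ToyBaseBridge and AdditionBridge are hypothetical stand-ins, data examples are plain dictionaries rather than ListData, and the abduction rule (pick the consistent candidate closest to the prediction in Hamming distance) is an assumption chosen for brevity.

```python
from abc import ABC, abstractmethod
from itertools import product
from typing import Any, Dict, List


class ToyBaseBridge(ABC):
    """Hypothetical stand-in that mirrors the four abstract steps."""

    @abstractmethod
    def predict(self, examples: Dict[str, Any]) -> List[List[int]]: ...

    @abstractmethod
    def idx_to_pseudo_label(self, examples: Dict[str, Any]) -> List[List[str]]: ...

    @abstractmethod
    def abduce_pseudo_label(self, examples: Dict[str, Any]) -> List[List[str]]: ...

    @abstractmethod
    def pseudo_label_to_idx(self, examples: Dict[str, Any]) -> List[List[int]]: ...

    def step(self, examples: Dict[str, Any]) -> Dict[str, Any]:
        # Each stage stores its output so the next stage can read it.
        examples["pred_idx"] = self.predict(examples)
        examples["pred_pseudo_label"] = self.idx_to_pseudo_label(examples)
        examples["abduced_pseudo_label"] = self.abduce_pseudo_label(examples)
        examples["abduced_idx"] = self.pseudo_label_to_idx(examples)
        return examples


class AdditionBridge(ToyBaseBridge):
    symbols = ["zero", "one", "two"]

    def predict(self, examples):
        # A real bridge would call the model; predictions are fixed here.
        return [[1, 2], [0, 0]]

    def idx_to_pseudo_label(self, examples):
        return [[self.symbols[i] for i in seq] for seq in examples["pred_idx"]]

    def abduce_pseudo_label(self, examples):
        revised = []
        for labels, target in zip(examples["pred_pseudo_label"], examples["Y"]):
            idx = [self.symbols.index(s) for s in labels]
            # Toy knowledge base: the digits must sum to the reasoning result.
            consistent = [c for c in product(range(3), repeat=len(idx)) if sum(c) == target]
            best = min(consistent, key=lambda c: sum(a != b for a, b in zip(c, idx)))
            revised.append([self.symbols[i] for i in best])
        return revised

    def pseudo_label_to_idx(self, examples):
        return [[self.symbols.index(s) for s in seq] for seq in examples["abduced_pseudo_label"]]


result = AdditionBridge().step({"X": [["img1", "img2"], ["img3", "img4"]], "Y": [3, 1]})
```

The first example is already consistent with its reasoning result and is left unchanged; the second is revised, and the revised indices are what a training step would consume.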

abstract abduce_pseudo_label(data_examples: ListData) List[List[Any]][source]

Placeholder for revising pseudo-labels based on abductive reasoning.

filter_pseudo_label(data_examples: ListData) List[List[Any]][source]

Default filter function for pseudo-label.

abstract idx_to_pseudo_label(data_examples: ListData) List[List[Any]][source]

Placeholder for mapping indices to pseudo-labels.

abstract predict(data_examples: ListData) Tuple[List[List[Any]], List[List[Any]]][source]

Placeholder for predicting class indices from input.

abstract pseudo_label_to_idx(data_examples: ListData) List[List[Any]][source]

Placeholder for mapping pseudo-labels to indices.

abstract test(test_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]]) None[source]

Placeholder for model testing.

abstract train(train_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]])[source]

Placeholder for the training loop of Abductive Learning.

abstract valid(val_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]]) None[source]

Placeholder for model validation.

class ablkit.bridge.SimpleBridge(model: ABLModel, reasoner: Reasoner, metric_list: List[BaseMetric])[source]

Bases: BaseBridge

A basic implementation for bridging machine learning and reasoning parts.

This class implements the typical pipeline of Abductive Learning, which involves the following five steps:

  • Predict class probabilities and indices for the given data examples.

  • Map indices into pseudo-labels.

  • Revise pseudo-labels based on abductive reasoning.

  • Map the revised pseudo-labels to indices.

  • Train the model.

Parameters:
  • model (ABLModel) – The machine learning model wrapped in ABLModel, which is mainly used for prediction and model training.

  • reasoner (Reasoner) – The reasoning part wrapped in Reasoner, which is used for pseudo-label revision.

  • metric_list (List[BaseMetric]) – A list of metrics used for evaluating the model’s performance.

abduce_pseudo_label(data_examples: ListData) List[List[Any]][source]

Revise predicted pseudo-labels of the given data examples using abduction.

Parameters:

data_examples (ListData) – Data examples containing predicted pseudo-labels.

Returns:

A list of abduced pseudo-labels for the given data examples.

Return type:

List[List[Any]]

concat_data_examples(unlabel_data_examples: ListData, label_data_examples: ListData | None) ListData[source]

Concatenate unlabeled and labeled data examples. abduced_pseudo_label of unlabeled data examples and gt_pseudo_label of labeled data examples will be used to train the model.

Parameters:
  • unlabel_data_examples (ListData) – Unlabeled data examples to concatenate.

  • label_data_examples (ListData, optional) – Labeled data examples to concatenate, if available.

Returns:

Concatenated data examples.

Return type:

ListData
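The concatenation rule described above can be sketched with plain dictionaries standing in for ListData. The function name concat_examples and the dictionary layout are assumptions for illustration only; the point is which label field each group contributes to the training targets.

```python
from typing import Dict, List, Optional

Examples = Dict[str, List]


def concat_examples(unlabeled: Examples, labeled: Optional[Examples]) -> Examples:
    """Merge unlabeled and labeled examples into one training set."""
    if labeled is None:
        # Nothing to merge: train on the abduced labels alone.
        return unlabeled
    return {
        "X": unlabeled["X"] + labeled["X"],
        # Unlabeled data contributes abduced labels, labeled data its ground truth.
        "abduced_pseudo_label": unlabeled["abduced_pseudo_label"] + labeled["gt_pseudo_label"],
    }


unlabeled = {"X": [["a", "b"]], "abduced_pseudo_label": [[1, 2]]}
labeled = {"X": [["c", "d"]], "gt_pseudo_label": [[0, 1]]}
merged = concat_examples(unlabeled, labeled)
```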

data_preprocess(prefix: str, data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]]) ListData[source]

Transform data in the form of (X, gt_pseudo_label, Y) into ListData.

Parameters:
  • prefix (str) – A prefix indicating the type of data processing (e.g., ‘train’, ‘test’).

  • data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], List[Any]]]) – Data to be preprocessed. Can be ListData or a tuple of lists.

Returns:

The preprocessed ListData object.

Return type:

ListData

idx_to_pseudo_label(data_examples: ListData) List[List[Any]][source]

Map indices of data examples into pseudo-labels.

Parameters:

data_examples (ListData) – Data examples containing the indices.

Returns:

A list of pseudo-labels converted from indices.

Return type:

List[List[Any]]

predict(data_examples: ListData) Tuple[List[ndarray], List[ndarray]][source]

Predict class indices and probabilities (if predict_proba is implemented in self.model.base_model) on the given data examples.

Parameters:

data_examples (ListData) – Data examples on which predictions are to be made.

Returns:

A tuple containing lists of predicted indices and probabilities.

Return type:

Tuple[List[ndarray], List[ndarray]]

pseudo_label_to_idx(data_examples: ListData) List[List[Any]][source]

Map pseudo-labels of data examples into indices.

Parameters:

data_examples (ListData) – Data examples containing pseudo-labels.

Returns:

A list of indices converted from pseudo-labels.

Return type:

List[List[Any]]

supervised_abduce_pseudo_label(data_examples: ListData) List[List[Any]][source]

Revise predicted pseudo-labels of the given data examples using ground truth.

Parameters:

data_examples (ListData) – Data examples containing predicted pseudo-labels.

Returns:

A list of ground truth/abduced pseudo-labels for the given data examples.

Return type:

List[List[Any]]

test(test_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any] | None]) None[source]

Test the model with the given test data.

Parameters:

test_data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], Optional[List[Any]]]]) – Test data should be in the form of (X, gt_pseudo_label, Y) or a ListData object with X, gt_pseudo_label and Y attributes. gt_pseudo_label and Y may each be None, depending on the evaluation metrics in self.metric_list.

train(train_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]], label_data: ListData | Tuple[List[List[Any]], List[List[Any]], List[Any]] | None = None, val_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any] | None] | None = None, loops: int = 50, segment_size: int | float = 1.0, use_supervised_data: bool = False, eval_interval: int = 1, save_interval: int | None = None, save_dir: str | None = None)[source]

A typical training pipeline of Abductive Learning.

Parameters:
  • train_data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], List[Any]]]) – Training data should be in the form of (X, gt_pseudo_label, Y) or a ListData object with X, gt_pseudo_label and Y attributes.
      - X is a list of sublists representing the input data.
      - gt_pseudo_label is only used to evaluate the performance of the ABLModel but not to train. gt_pseudo_label can be None.
      - Y is a list representing the ground truth reasoning result for each sublist in X.

  • label_data (Union[ListData, Tuple[List[List[Any]], List[List[Any]], List[Any]]], optional) – Labeled data should be in the same format as train_data. The only difference is that the gt_pseudo_label in label_data should not be None and will be utilized to train the model. Defaults to None.

  • val_data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], Optional[List[Any]]]], optional) – Validation data should be in the same format as train_data. gt_pseudo_label and Y may each be None, depending on the evaluation metrics in self.metric_list. If val_data is None, train_data will be used to validate the model during training. Defaults to None.

  • loops (int) – Learning part and Reasoning part will be iteratively optimized for loops times. Defaults to 50.

  • segment_size (Union[int, float]) – Data will be split into segments of this size and data in each segment will be used together to train the model. Defaults to 1.0.

  • eval_interval (int) – The model will be evaluated every eval_interval loops during training. Defaults to 1.

  • save_interval (int, optional) – The model will be saved every save_interval loops during training. Defaults to None.

  • save_dir (str, optional) – Directory to save the model. Defaults to None.
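The interaction of loops, segment_size, eval_interval and save_interval can be sketched as a schedule. This is an illustration under stated assumptions, not the library's code: resolve_segment_size and schedule are hypothetical helpers, and the reading of segment_size (an int is an absolute number of examples, a float in (0, 1] is a fraction of the training set) is an assumption that the parameter description above does not spell out.

```python
from typing import List, Optional, Tuple, Union


def resolve_segment_size(segment_size: Union[int, float], num_examples: int) -> int:
    """Assumed rule: int means a count, float in (0, 1] means a fraction."""
    if isinstance(segment_size, int):
        if segment_size <= 0:
            raise ValueError("segment_size should be positive")
        return segment_size
    if 0 < segment_size <= 1:
        return max(1, int(segment_size * num_examples))
    raise ValueError("a float segment_size should be in (0, 1]")


def schedule(
    num_examples: int,
    loops: int,
    segment_size: Union[int, float] = 1.0,
    eval_interval: int = 1,
    save_interval: Optional[int] = None,
) -> Tuple[List[Tuple[int, int, int]], List[int], List[int]]:
    """Return (loop, start, end) segments and the loops that evaluate or save."""
    size = resolve_segment_size(segment_size, num_examples)
    segments, evals, saves = [], [], []
    for loop in range(1, loops + 1):
        # Every segment of every loop is one predict/abduce/train round.
        for start in range(0, num_examples, size):
            segments.append((loop, start, min(start + size, num_examples)))
        if loop % eval_interval == 0:
            evals.append(loop)
        if save_interval is not None and loop % save_interval == 0:
            saves.append(loop)
    return segments, evals, saves


segments, evals, saves = schedule(
    num_examples=10, loops=4, segment_size=0.5, eval_interval=2, save_interval=3
)
```

With ten examples and segment_size=0.5, each loop processes two segments of five examples; evaluation happens after loops 2 and 4, and a checkpoint is written after loop 3.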

valid(val_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any] | None]) None[source]

Validate the model with the given validation data.

Parameters:

val_data (Union[ListData, Tuple[List[List[Any]], Optional[List[List[Any]]], Optional[List[Any]]]]) – Validation data should be in the form of (X, gt_pseudo_label, Y) or a ListData object with X, gt_pseudo_label and Y attributes. gt_pseudo_label and Y may each be None, depending on the evaluation metrics in self.metric_list.

class ablkit.bridge.VerificationBridge(model: ABLModel, reasoner: VerificationReasoner, metric_list: List[BaseMetric])[source]

Bases: SimpleBridge

Bridge implementing the Verification Learning training loop.

Parameters:
  • model (ABLModel) – Wrapped learning model.

  • reasoner (VerificationReasoner) – Top-K reasoner. The bridge reads reasoner.top_k to decide how many training passes to run per segment.

  • metric_list (List[BaseMetric]) – Evaluation metrics, identical to SimpleBridge.

train(train_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any]], val_data: ListData | Tuple[List[List[Any]], List[List[Any]] | None, List[Any] | None] | None = None, loops: int = 50, segment_size: int | float = 1.0, eval_interval: int = 1, save_interval: int | None = None, save_dir: str | None = None) None[source]

Verification Learning training loop. For each segment we predict once, enumerate the top-K consistent candidates, then run a model.train pass per candidate.
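The per-segment procedure described above can be sketched with a toy model. The names top_k_consistent, RecordingModel and train_segment are hypothetical, candidates are ranked by the product of predicted probabilities as an assumed scoring rule, and brute-force enumeration is used only for clarity; the actual VerificationReasoner may search for the top-K candidates differently.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple


def top_k_consistent(
    probs: Sequence[Sequence[float]],
    is_consistent: Callable[[Tuple[int, ...]], bool],
    k: int,
) -> List[Tuple[int, ...]]:
    """Return the k consistent label tuples with the highest joint probability."""
    num_classes = len(probs[0])
    scored = []
    for cand in product(range(num_classes), repeat=len(probs)):
        if is_consistent(cand):
            score = 1.0
            for pos, label in enumerate(cand):
                score *= probs[pos][label]
            scored.append((score, cand))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [cand for _, cand in scored[:k]]


class RecordingModel:
    """Toy model that returns fixed probabilities and records each training pass."""

    def __init__(self) -> None:
        self.train_calls: List[List[int]] = []

    def predict_proba(self, xs: Sequence[str]) -> List[List[float]]:
        return [[0.1, 0.6, 0.3], [0.2, 0.3, 0.5]]

    def train(self, xs: Sequence[str], labels: Sequence[int]) -> None:
        self.train_calls.append(list(labels))


def train_segment(model: RecordingModel, xs: Sequence[str], target: int, k: int) -> List[Tuple[int, ...]]:
    probs = model.predict_proba(xs)  # predict once per segment
    candidates = top_k_consistent(probs, lambda c: sum(c) == target, k)
    for cand in candidates:  # one training pass per candidate
        model.train(xs, cand)
    return candidates


model = RecordingModel()
chosen = train_segment(model, ["img1", "img2"], target=2, k=2)
```

Here three label tuples satisfy the toy rule, and with k=2 the model receives two training passes, one for each of the two most probable consistent candidates.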