ablkit.learning

class ablkit.learning.ABLModel(base_model: Any)

Bases: object

Serialize data and provide a unified interface for different machine learning models.

Parameters:

base_model (Machine Learning Model) – The machine learning base model used for training and prediction. This model should implement the fit and predict methods. It’s recommended, but not required, for the model to also implement predict_proba (used to populate pred_prob) and extract_features (used to populate data_example.embeddings for distance functions such as similarity).
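The `base_model` contract above is duck-typed. The sketch below uses pure Python with no ABLkit import. The hypothetical `MajorityClassModel` implements the required `fit` and `predict` methods plus the optional `predict_proba`. The `supports_proba` helper shows one way a wrapper might probe for the optional method. That helper is an illustrative assumption, not ABLkit's confirmed code.

```python
from collections import Counter
from typing import Any, List


class MajorityClassModel:
    """Hypothetical toy base model: always predicts the most frequent training label."""

    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes
        self.majority = 0

    def fit(self, X: List[Any], y: List[int]) -> "MajorityClassModel":
        # Remember the most common label seen during training.
        self.majority = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: List[Any]) -> List[int]:
        return [self.majority for _ in X]

    def predict_proba(self, X: List[Any]) -> List[List[float]]:
        # Put all probability mass on the majority class.
        row = [1.0 if c == self.majority else 0.0 for c in range(self.num_classes)]
        return [list(row) for _ in X]


def supports_proba(model: Any) -> bool:
    # Capability check for the recommended (but optional) method.
    return callable(getattr(model, "predict_proba", None))


model = MajorityClassModel(num_classes=3).fit(["a", "b", "c"], [2, 2, 1])
print(model.predict(["x", "y"]))  # [2, 2]
print(supports_proba(model))  # True
```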

load(*args, **kwargs) → None

Load the model from a file.

This method delegates to the load method of self.base_model. The arguments passed to this method should match those expected by the load method of self.base_model.

predict(data_examples: ListData) → Dict[str, Any]

Predict the labels and probabilities for the given data.

Parameters:

data_examples (ListData) – A batch of data to predict on.

Returns:

A dictionary containing the predicted labels and probabilities.

Return type:

dict
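One plausible way to produce both labels and probabilities is to use `predict_proba` when the base model has it, take the highest-probability class as the label, and otherwise fall back to `predict`. The helper below is a hypothetical sketch of that idea. The dictionary keys `"label"` and `"prob"` are placeholders chosen for illustration and are not taken from the description above.

```python
from typing import Any, Dict, List, Optional


def predict_labels_and_probs(model: Any, X: List[Any]) -> Dict[str, Optional[list]]:
    """Hypothetical helper: derive labels from probabilities when available."""
    if callable(getattr(model, "predict_proba", None)):
        prob = model.predict_proba(X)
        # Label = index of the highest-probability class for each sample (argmax).
        label = [max(range(len(row)), key=row.__getitem__) for row in prob]
    else:
        prob = None
        label = list(model.predict(X))
    return {"label": label, "prob": prob}


class ToyProbModel:
    def predict_proba(self, X):
        return [[0.1, 0.7, 0.2] for _ in X]


class ToyLabelModel:
    def predict(self, X):
        return [0 for _ in X]


print(predict_labels_and_probs(ToyProbModel(), ["a"]))
# {'label': [1], 'prob': [[0.1, 0.7, 0.2]]}
print(predict_labels_and_probs(ToyLabelModel(), ["a", "b"]))
# {'label': [0, 0], 'prob': None}
```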

save(*args, **kwargs) → None

Save the model to a file.

This method delegates to the save method of self.base_model. The arguments passed to this method should match those expected by the save method of self.base_model.
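Because `load` and `save` forward their arguments unchanged, the accepted arguments depend entirely on the base model. With a BasicNN base model, for instance, they follow `BasicNN.save` and `BasicNN.load`. The sketch below illustrates the forwarding pattern with a hypothetical `JsonBaseModel` that stores its weights as JSON. `Wrapper` is a stand-in for the delegation described here, not ABLkit's code.

```python
import json
import os
import tempfile
from typing import Any


class JsonBaseModel:
    """Hypothetical base model whose save/load take a file path."""

    def __init__(self) -> None:
        self.weights = {"bias": 0.0}

    def save(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(self.weights, f)

    def load(self, path: str) -> None:
        with open(path) as f:
            self.weights = json.load(f)


class Wrapper:
    """Sketch of the delegation pattern: forward all arguments unchanged."""

    def __init__(self, base_model: Any) -> None:
        self.base_model = base_model

    def save(self, *args, **kwargs) -> None:
        self.base_model.save(*args, **kwargs)

    def load(self, *args, **kwargs) -> None:
        self.base_model.load(*args, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    src = Wrapper(JsonBaseModel())
    src.base_model.weights["bias"] = 0.5
    src.save(path)  # forwarded to JsonBaseModel.save(path)
    dst = Wrapper(JsonBaseModel())
    dst.load(path=path)  # keyword arguments are forwarded too
    print(dst.base_model.weights)  # {'bias': 0.5}
```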

train(data_examples: ListData) → float

Train the model on the given data.

Parameters:

data_examples (ListData) – A batch of data to train on, which typically contains the data, X, and the corresponding labels, abduced_idx.

Returns:

The loss value of the trained model.

Return type:

float

valid(data_examples: ListData) → float

Validate the model on the given data.

Parameters:

data_examples (ListData) – A batch of data to validate on, which typically contains the data, X, and the corresponding labels, abduced_idx.

Returns:

The accuracy of the trained model.

Return type:

float
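Both `train` and `valid` work on a batch that carries X and abduced_idx. The sketch below rests on an assumption not stated above: each example groups several instances, such as the symbols of one equation, and these are flattened into one instance-level list before reaching the base model. The hypothetical `ToyModel` is then fitted and scored, mirroring how `valid` reports accuracy.

```python
from typing import Any, Dict, List


def flatten(nested: List[List[Any]]) -> List[Any]:
    # Concatenate per-example lists into one flat list of instances.
    return [item for example in nested for item in example]


class ToyModel:
    """Hypothetical base model: memorizes one label per input symbol."""

    def __init__(self) -> None:
        self.table: Dict[Any, int] = {}

    def fit(self, X: List[Any], y: List[int]) -> "ToyModel":
        self.table = dict(zip(X, y))
        return self

    def score(self, X: List[Any], y: List[int]) -> float:
        # Accuracy: fraction of instances whose stored label matches y.
        correct = sum(self.table.get(x) == t for x, t in zip(X, y))
        return correct / len(y)


# Each example is a sequence of instances (an assumption for this sketch).
X = [["img1", "img2"], ["img3"]]
abduced_idx = [[1, 0], [1]]

model = ToyModel().fit(flatten(X), flatten(abduced_idx))
print(model.score(flatten(X), flatten(abduced_idx)))  # 1.0
```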

class ablkit.learning.BasicNN(model: Module, loss_fn: Module, optimizer: Optimizer, scheduler: Callable[..., Any] | None = None, device: device | str = device(type='cpu'), batch_size: int = 32, num_epochs: int = 1, stop_loss: float | None = 0.0001, num_workers: int = 0, save_interval: int | None = None, save_dir: str | None = None, train_transform: Callable[..., Any] | None = None, test_transform: Callable[..., Any] | None = None, collate_fn: Callable[[List[Any]], Any] | None = None)

Bases: object

Wrap a PyTorch model in an sklearn-style estimator interface (fit, predict, predict_proba and score).

Parameters:
  • model (torch.nn.Module) – The PyTorch model to be trained or used for prediction.

  • loss_fn (torch.nn.Module) – The loss function used for training.

  • optimizer (torch.optim.Optimizer) – The optimizer used for training.

  • scheduler (Callable[..., Any], optional) – The learning rate scheduler used for training, which will be called at the end of each run of the fit method. It should implement the step method. Defaults to None.

  • device (Union[torch.device, str], optional) – The device on which the model will be trained or used for prediction. Defaults to torch.device("cpu").

  • batch_size (int, optional) – The batch size used for training. Defaults to 32.

  • num_epochs (int, optional) – The number of epochs used for training. Defaults to 1.

  • stop_loss (float, optional) – Training stops early once the average loss over an epoch falls below this value. Defaults to 0.0001.

  • num_workers (int, optional) – The number of worker processes used by the DataLoader to load data. Defaults to 0.

  • save_interval (int, optional) – The model will be saved every save_interval epochs during training. Defaults to None.

  • save_dir (str, optional) – The directory in which to save the model during training. Defaults to None.

  • train_transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version; applied to the input data in the fit and train_epoch methods. Defaults to None.

  • test_transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version; applied to the input data in the predict, predict_proba and score methods. Defaults to None.

  • collate_fn (Callable[[List[Any]], Any], optional) – The function used by the DataLoader to merge a list of samples into a batch. Defaults to None.

extract_features(data_loader: DataLoader | None = None, X: List[Any] | None = None) → ndarray

Compute feature embeddings for X (or a prebuilt data_loader). When both are provided, data_loader takes precedence.

The wrapped PyTorch model must implement extract_features(x) returning the embedding tensor (typically penultimate-layer activations) used by downstream consumers such as dist_func='similarity'.

Parameters:
  • data_loader (DataLoader, optional) – DataLoader to use directly. Defaults to None.

  • X (List[Any], optional) – Raw input list; converted to a PredictionDataset when used. Defaults to None.

Returns:

Feature embeddings of shape (num_samples, embedding_dim).

Return type:

numpy.ndarray
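Each row of the returned array is one sample's embedding. Downstream consumers can compare these rows. This section does not say which measure `dist_func='similarity'` uses, so the sketch below substitutes cosine similarity purely as an illustration. It is written with plain lists standing in for the `(num_samples, embedding_dim)` array.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


# Rows stand in for a (num_samples, embedding_dim) = (3, 2) feature matrix.
embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
print(cosine_similarity(embeddings[0], embeddings[2]))  # 1.0 (same direction)
print(cosine_similarity(embeddings[0], embeddings[1]))  # 0.0 (orthogonal)
```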

fit(data_loader: DataLoader | None = None, X: List[Any] | None = None, y: List[int] | None = None) → BasicNN

Train the model for self.num_epochs epochs, or until the average loss over an epoch falls below self.stop_loss. It supports training with either a DataLoader object (data_loader) or a pair of input data (X) and target labels (y). If both data_loader and (X, y) are provided, data_loader takes precedence.

Parameters:
  • data_loader (DataLoader, optional) – The data loader used for training. Defaults to None.

  • X (List[Any], optional) – The input data. Defaults to None.

  • y (List[int], optional) – The target data. Defaults to None.

Returns:

The model itself after training.

Return type:

BasicNN
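The stopping rule in `fit` interacts with `num_epochs`, `save_interval` and `scheduler`. The loop below is a minimal sketch of that interaction, with a fake `train_epoch` standing in for real training. Three details are assumptions rather than documented behavior: epochs are counted from 1 when deciding whether to save, a checkpoint is written before the stop check, and the scheduler steps once after the loop.

```python
from typing import Callable, List, Optional


def fit_loop(
    train_epoch: Callable[[], float],
    num_epochs: int = 1,
    stop_loss: Optional[float] = 0.0001,
    save_interval: Optional[int] = None,
    save: Callable[[int], None] = lambda epoch_id: None,
    scheduler_step: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Sketch of the documented stopping rule; returns the per-epoch average losses."""
    losses: List[float] = []
    for epoch_id in range(1, num_epochs + 1):
        loss = train_epoch()  # average loss over this epoch
        losses.append(loss)
        if save_interval is not None and epoch_id % save_interval == 0:
            save(epoch_id)  # checkpoint every save_interval epochs
        if stop_loss is not None and loss < stop_loss:
            break  # early stop once the epoch loss is below stop_loss
    if scheduler_step is not None:
        scheduler_step()  # called once at the end of each fit run
    return losses


fake_losses = iter([0.5, 0.1, 0.00005, 0.00001])
saved: List[int] = []
steps: List[int] = []
history = fit_loop(
    lambda: next(fake_losses),
    num_epochs=10,
    save_interval=2,
    save=saved.append,
    scheduler_step=lambda: steps.append(1),
)
print(history)  # [0.5, 0.1, 5e-05]
print(saved)  # [2]
```

Training ends at epoch 3, not epoch 10, because that epoch's loss is the first to drop below `stop_loss`.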

load(load_path: str) → None

Load the model and the optimizer.

Parameters:

load_path (str) – The path of the checkpoint file from which to load the model and the optimizer.

predict(data_loader: DataLoader | None = None, X: List[Any] | None = None) → ndarray

Predict the class of the input data. This method supports prediction with either a DataLoader object (data_loader) or a list of input data (X). If both data_loader and X are provided, data_loader takes precedence and X is ignored.

Parameters:
  • data_loader (DataLoader, optional) – The data loader used for prediction. Defaults to None.

  • X (List[Any], optional) – The input data. Defaults to None.

Returns:

The predicted class of the input data.

Return type:

numpy.ndarray
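`predict`, `predict_proba`, `score`, `fit` and `extract_features` all share the same input-precedence rule. The hypothetical helper below sketches that rule. Raising `ValueError` when neither input is given is an assumption, since this section does not say what happens in that case.

```python
from typing import Any, Iterable, List, Optional


def resolve_inputs(data_loader: Optional[Iterable] = None, X: Optional[List[Any]] = None) -> Iterable:
    """Hypothetical helper mirroring the documented precedence rule."""
    if data_loader is not None:
        return data_loader  # data_loader wins when both are given
    if X is not None:
        return X  # BasicNN would presumably wrap X in a dataset/loader here
    raise ValueError("Provide either data_loader or X.")


print(resolve_inputs(data_loader=[["batch0"]], X=["ignored"]))  # [['batch0']]
print(resolve_inputs(X=["x0", "x1"]))  # ['x0', 'x1']
```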

predict_proba(data_loader: DataLoader | None = None, X: List[Any] | None = None) → ndarray

Predict the probability of each class for the input data. This method supports prediction with either a DataLoader object (data_loader) or a list of input data (X). If both data_loader and X are provided, data_loader takes precedence and X is ignored.

Parameters:
  • data_loader (DataLoader, optional) – The data loader used for prediction. Defaults to None.

  • X (List[Any], optional) – The input data. Defaults to None.

Warning

This method calculates the probability by applying a softmax function to the output of the neural network. If your neural network already includes a softmax function as its final activation, applying softmax again here will lead to incorrect probabilities.

Returns:

The predicted probability of each class for the input data.

Return type:

numpy.ndarray
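The warning is easiest to see numerically. Softmax maps scores z to exp(z_i) / sum_j exp(z_j). The stdlib sketch below applies it once, as described above, and then a second time, which is the mistake the warning describes. The ranking of classes survives, so the argmax is unchanged, but the second pass flattens the distribution and distorts the probabilities.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
probs = softmax(logits)      # correct: softmax applied once to raw outputs
double = softmax(probs)      # incorrect: softmax applied to probabilities

print([round(p, 3) for p in probs])  # [0.665, 0.245, 0.09]
print(max(double) < max(probs))      # True: the distribution got flatter
```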

save(epoch_id: int = 0, save_path: str | None = None) → None

Save the model and the optimizer. The caller can either provide a save_path or specify the epoch_id at which the model and optimizer are saved. If both save_path and epoch_id are provided, save_path is used. If only epoch_id is specified, the model and optimizer are saved to the path f"model_checkpoint_epoch_{epoch_id}.pth" under self.save_dir. save_path and epoch_id cannot both be None.

Parameters:
  • epoch_id (int, optional) – The epoch number used to name the checkpoint file when save_path is not given. Defaults to 0.

  • save_path (str, optional) – The path to save the model. Defaults to None.
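The path-selection rule for `save` can be written as a small pure function. The sketch below follows the documented precedence, with `save_dir` passed explicitly instead of read from `self`. Falling back to the current directory when `save_dir` is None is an assumption made only for illustration.

```python
import os
from typing import Optional


def resolve_save_path(
    epoch_id: Optional[int] = 0,
    save_path: Optional[str] = None,
    save_dir: Optional[str] = None,
) -> str:
    """Sketch of the documented rule for choosing the checkpoint path."""
    if save_path is not None:
        return save_path  # an explicit path always wins
    if epoch_id is None:
        raise ValueError("save_path and epoch_id cannot both be None.")
    # save_dir=None falls back to the current directory here (an assumption).
    return os.path.join(save_dir or "", f"model_checkpoint_epoch_{epoch_id}.pth")


print(resolve_save_path(epoch_id=5, save_dir="ckpts"))  # ckpts/model_checkpoint_epoch_5.pth on POSIX
print(resolve_save_path(epoch_id=5, save_path="best.pth"))  # best.pth
```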

score(data_loader: DataLoader | None = None, X: List[Any] | None = None, y: List[int] | None = None) → float

Evaluate the model's accuracy. It supports evaluation with either a DataLoader object (data_loader) or a pair of input data (X) and ground-truth labels (y). If both data_loader and (X, y) are provided, data_loader takes precedence.

Parameters:
  • data_loader (DataLoader, optional) – The data loader used for scoring. Defaults to None.

  • X (List[Any], optional) – The input data. Defaults to None.

  • y (List[int], optional) – The target data. Defaults to None.

Returns:

The accuracy of the model.

Return type:

float
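The accuracy returned by `score` is the fraction of samples whose predicted class equals the ground-truth label. A minimal stdlib sketch of that computation follows. The `accuracy` helper is illustrative and is not BasicNN's own code.

```python
from typing import Sequence


def accuracy(pred: Sequence[int], y: Sequence[int]) -> float:
    # Fraction of samples whose predicted class equals the ground-truth label.
    if len(pred) != len(y):
        raise ValueError("pred and y must have the same length")
    return sum(p == t for p, t in zip(pred, y)) / len(y)


print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75 (3 of 4 correct)
```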

train_epoch(data_loader: DataLoader) → float

Train the model with an instance of DataLoader (data_loader) for one epoch.

Parameters:

data_loader (DataLoader) – The data loader used for training.

Returns:

The average loss on one epoch.

Return type:

float
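There is more than one reasonable way to compute an "average loss on one epoch" when the last batch is smaller than `batch_size`. The sketch below shows one common convention, weighting each batch's mean loss by its size. Whether `train_epoch` weights batches this way is an assumption, not something stated above.

```python
from typing import List, Tuple


def epoch_average_loss(batches: List[Tuple[float, int]]) -> float:
    """Each item is (mean loss of the batch, number of samples in the batch)."""
    total_loss = sum(loss * size for loss, size in batches)
    total_num = sum(size for _, size in batches)
    return total_loss / total_num  # per-sample average over the whole epoch


# Two full batches of 32 and a final partial batch of 4 samples.
batches = [(0.50, 32), (0.30, 32), (0.10, 4)]
print(round(epoch_average_loss(batches), 4))  # 0.3824
```

A plain mean of the three batch means would give 0.3 instead, over-weighting the 4-sample batch.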

torch_dataset

class ablkit.learning.torch_dataset.ClassificationDataset(X: List[Any], Y: List[int], transform: Callable[..., Any] | None = None)

Bases: Dataset

Dataset used for classification tasks.

Parameters:
  • X (List[Any]) – The input data.

  • Y (List[int]) – The target data.

  • transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version. Defaults to None.
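A map-style PyTorch Dataset exposes `__len__` and `__getitem__`. The stdlib stand-in below sketches that protocol for a classification dataset so it runs without torch. The hypothetical `ToyClassificationDataset` applies `transform` to the input only and leaves the label untouched. That behavior is an assumption, not confirmed by the parameter description above.

```python
from typing import Any, Callable, List, Optional, Tuple


class ToyClassificationDataset:
    """Stdlib stand-in for a torch Dataset: implements __len__ and __getitem__."""

    def __init__(self, X: List[Any], Y: List[int], transform: Optional[Callable[[Any], Any]] = None) -> None:
        if len(X) != len(Y):
            raise ValueError("X and Y must have the same length")
        self.X, self.Y, self.transform = X, Y, transform

    def __len__(self) -> int:
        return len(self.X)

    def __getitem__(self, index: int) -> Tuple[Any, int]:
        x = self.X[index]
        if self.transform is not None:
            x = self.transform(x)  # only the input is transformed, not the label
        return x, self.Y[index]


ds = ToyClassificationDataset([[1, 2], [3, 4]], [0, 1], transform=lambda v: [2 * t for t in v])
print(len(ds), ds[1])  # 2 ([6, 8], 1)
```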

class ablkit.learning.torch_dataset.MultiLabelClassificationDataset(X: List[Any], Y: List[Any], transform: Callable[..., Any] | None = None)

Bases: ClassificationDataset

Dataset used for multi-label classification, where each target Y[i] is a binary indicator vector (one entry per label) rather than a single class index. Y is stored as a float32 tensor so it can be fed directly into BCEWithLogitsLoss and similar losses.

Parameters:
  • X (List[Any]) – The input data.

  • Y (List[Any]) – The per-sample label vectors. Each entry is converted via numpy.stack and stored as a FloatTensor.

  • transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version. Defaults to None.
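Each multi-label target is a binary indicator vector with one entry per label. The hypothetical helper below builds such vectors from per-sample label sets. Stacking the resulting rows gives the `(num_samples, num_labels)` float matrix that the description says is stored as a FloatTensor.

```python
from typing import Iterable, List


def to_indicator(labels: Iterable[int], num_labels: int) -> List[float]:
    # One float entry per label: 1.0 if present, 0.0 otherwise (float suits BCE-style losses).
    vec = [0.0] * num_labels
    for label in labels:
        vec[label] = 1.0
    return vec


# Samples tagged with labels {0, 2}, {3}, and no labels at all.
Y = [to_indicator(s, num_labels=4) for s in ([0, 2], [3], [])]
print(Y)  # [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
```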

class ablkit.learning.torch_dataset.PredictionDataset(X: List[Any], transform: Callable[..., Any] | None = None)

Bases: Dataset

Dataset used for prediction.

Parameters:
  • X (List[Any]) – The input data.

  • transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version. Defaults to None.

class ablkit.learning.torch_dataset.RegressionDataset(X: List[Any], Y: List[Any])

Bases: Dataset

Dataset used for regression tasks.

Parameters:
  • X (List[Any]) – A list of objects representing the input data.

  • Y (List[Any]) – A list of objects representing the output data.