ablkit.learning
- class ablkit.learning.ABLModel(base_model: Any)[source]
Bases: object
Serialize data and provide a unified interface for different machine learning models.
- Parameters:
base_model (Machine Learning Model) – The machine learning base model used for training and prediction. This model should implement the fit and predict methods. It is recommended, but not required, that the model also implement predict_proba (used to populate pred_prob) and extract_features (used to populate data_example.embeddings for distance functions such as similarity).
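As a rough illustration of the interface described above, the following sketch defines a toy base model exposing fit, predict and predict_proba. The class name MajorityClassModel and its behaviour are hypothetical and not part of ablkit; they only show the method shapes. A real base model would more likely be a scikit-learn estimator or a BasicNN instance.

```python
from collections import Counter
from typing import Any, List


class MajorityClassModel:
    """Hypothetical toy model: always predicts the most frequent training label."""

    def fit(self, X: List[Any], y: List[int]) -> "MajorityClassModel":
        counts = Counter(y)
        self.classes_ = sorted(counts)
        total = len(y)
        # Empirical class frequencies, reused as "probabilities" below.
        self.priors_ = [counts[c] / total for c in self.classes_]
        self.majority_ = counts.most_common(1)[0][0]
        return self

    def predict(self, X: List[Any]) -> List[int]:
        # Required method: one label per input.
        return [self.majority_ for _ in X]

    def predict_proba(self, X: List[Any]) -> List[List[float]]:
        # Optional method: one row of class probabilities per input.
        return [list(self.priors_) for _ in X]


model = MajorityClassModel().fit(X=["a", "b", "c", "d"], y=[0, 1, 1, 1])
labels = model.predict(["e", "f"])
probs = model.predict_proba(["e", "f"])
```

An object of this shape could be passed as base_model, assuming ABLModel relies only on the methods listed above.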
- load(*args, **kwargs) None[source]
Load the model from a file.
This method delegates to the load method of self.base_model. The arguments passed to this method should match those expected by the load method of self.base_model.
- predict(data_examples: ListData) Dict[str, Any][source]
Predict the labels and probabilities for the given data.
- Parameters:
data_examples (ListData) – A batch of data to predict on.
- Returns:
A dictionary containing the predicted labels and probabilities.
- Return type:
dict
- save(*args, **kwargs) None[source]
Save the model to a file.
This method delegates to the save method of self.base_model. The arguments passed to this method should match those expected by the save method of self.base_model.
- class ablkit.learning.BasicNN(model: Module, loss_fn: Module, optimizer: Optimizer, scheduler: Callable[[...], Any] | None = None, device: device | str = device(type='cpu'), batch_size: int = 32, num_epochs: int = 1, stop_loss: float | None = 0.0001, num_workers: int = 0, save_interval: int | None = None, save_dir: str | None = None, train_transform: Callable[[...], Any] | None = None, test_transform: Callable[[...], Any] | None = None, collate_fn: Callable[[List[Any]], Any] | None = None)[source]
Bases: object
Wrap NN models into the form of an sklearn estimator.
- Parameters:
model (torch.nn.Module) – The PyTorch model to be trained or used for prediction.
loss_fn (torch.nn.Module) – The loss function used for training.
optimizer (torch.optim.Optimizer) – The optimizer used for training.
scheduler (Callable[..., Any], optional) – The learning rate scheduler used for training, which will be called at the end of each run of the fit method. It should implement the step method. Defaults to None.
device (Union[torch.device, str]) – The device on which the model will be trained or used for prediction. Defaults to torch.device("cpu").
batch_size (int, optional) – The batch size used for training. Defaults to 32.
num_epochs (int, optional) – The number of epochs used for training. Defaults to 1.
stop_loss (float, optional) – The loss value at which to stop training. Defaults to 0.0001.
num_workers (int) – The number of workers used for loading data. Defaults to 0.
save_interval (int, optional) – The model will be saved every save_interval epochs during training. Defaults to None.
save_dir (str, optional) – The directory in which to save the model during training. Defaults to None.
train_transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version used in the fit and train_epoch methods. Defaults to None.
test_transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version used in the predict, predict_proba and score methods. Defaults to None.
collate_fn (Callable[[List[T]], Any], optional) – The function used to collate data. Defaults to None.
- extract_features(data_loader: DataLoader | None = None, X: List[Any] | None = None) ndarray[source]
Compute feature embeddings for X (or a prebuilt data_loader). When both are provided, data_loader takes precedence.
The wrapped PyTorch model must implement extract_features(x) returning the embedding tensor (typically penultimate-layer activations) used by downstream consumers such as dist_func='similarity'.
- Parameters:
data_loader (DataLoader, optional) – DataLoader to use directly. Defaults to None.
X (List[Any], optional) – Raw input list; converted to a PredictionDataset when used. Defaults to None.
- Returns:
Feature embeddings of shape (num_samples, embedding_dim).
- Return type:
numpy.ndarray
- fit(data_loader: DataLoader | None = None, X: List[Any] | None = None, y: List[int] | None = None) BasicNN[source]
Train the model for self.num_epochs epochs or until the average loss over one epoch falls below self.stop_loss. Training is supported with either a DataLoader object (data_loader) or a pair of input data (X) and target labels (y). If both data_loader and (X, y) are provided, data_loader takes precedence.
- Parameters:
data_loader (DataLoader, optional) – The data loader used for training. Defaults to None.
X (List[Any], optional) – The input data. Defaults to None.
y (List[int], optional) – The target data. Defaults to None.
- Returns:
The model itself after training.
- Return type:
BasicNN
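The stopping rule described for fit can be sketched without any training code. The helper epochs_run below is hypothetical and takes a precomputed sequence of per-epoch average losses. It assumes the threshold is checked once at the end of each epoch, which the description above suggests but does not state explicitly.

```python
from typing import Iterable, Optional


def epochs_run(
    epoch_losses: Iterable[float],
    num_epochs: int,
    stop_loss: Optional[float] = 0.0001,
) -> int:
    """Hypothetical helper: count the epochs executed before training would stop."""
    completed = 0
    for loss in epoch_losses:
        if completed >= num_epochs:
            # The epoch budget is exhausted.
            break
        completed += 1
        # Early exit once the epoch's average loss drops below the threshold.
        if stop_loss is not None and loss < stop_loss:
            break
    return completed


early = epochs_run([0.5, 0.01, 0.00005, 0.00001], num_epochs=10)
capped = epochs_run([0.5, 0.4, 0.3], num_epochs=2)
```

Here early stops after the third epoch because its loss is below the default stop_loss, while capped is limited by num_epochs.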
- load(load_path: str) None[source]
Load the model and the optimizer.
- Parameters:
load_path (str) – The path from which to load the model and the optimizer.
- predict(data_loader: DataLoader | None = None, X: List[Any] | None = None) ndarray[source]
Predict the class of the input data. This method supports prediction with either a DataLoader object (data_loader) or a list of input data (X). If both data_loader and X are provided, the method will predict the input data in data_loader instead of X.
- Parameters:
data_loader (DataLoader, optional) – The data loader used for prediction. Defaults to None.
X (List[Any], optional) – The input data. Defaults to None.
- Returns:
The predicted class of the input data.
- Return type:
numpy.ndarray
- predict_proba(data_loader: DataLoader | None = None, X: List[Any] | None = None) ndarray[source]
Predict the probability of each class for the input data. This method supports prediction with either a DataLoader object (data_loader) or a list of input data (X). If both data_loader and X are provided, the method will predict the input data in data_loader instead of X.
- Parameters:
data_loader (DataLoader, optional) – The data loader used for prediction. Defaults to None.
X (List[Any], optional) – The input data. Defaults to None.
Warning
This method calculates the probability by applying a softmax function to the output of the neural network. If your neural network already includes a softmax function as its final activation, applying softmax again here will lead to incorrect probabilities.
- Returns:
The predicted probability of each class for the input data.
- Return type:
numpy.ndarray
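The effect described in the warning can be seen with a small standalone calculation. This is only a sketch using the standard library; the logits are made-up values and the softmax helper is defined here for illustration, not taken from ablkit.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a single vector."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.0, -2.0]   # hypothetical raw network outputs
once = softmax(logits)      # the intended probabilities
twice = softmax(once)       # the result if the network already ends in softmax

print([round(p, 3) for p in once])
print([round(p, 3) for p in twice])
```

Applying softmax a second time keeps the ranking of the classes but pulls the values toward a uniform distribution, so the reported confidence no longer reflects the network's output.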
- save(epoch_id: int = 0, save_path: str | None = None) None[source]
Save the model and the optimizer. The user can either provide a save_path or specify the epoch_id at which the model and optimizer are saved. If both save_path and epoch_id are provided, save_path will be used. If only epoch_id is specified, the model and optimizer will be saved to the path f"model_checkpoint_epoch_{epoch_id}.pth" under self.save_dir. save_path and epoch_id cannot both be None.
- Parameters:
epoch_id (int) – The epoch id.
save_path (str, optional) – The path to save the model. Defaults to None.
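The precedence between save_path and epoch_id can be written out as a small path-resolution function. The helper resolve_checkpoint_path is hypothetical and only mirrors the rule described above. The error raised when save_dir is missing is an assumption, since the behaviour in that case is not documented here.

```python
import os
from typing import Optional


def resolve_checkpoint_path(
    save_dir: Optional[str],
    epoch_id: Optional[int] = 0,
    save_path: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the documented precedence rule."""
    if save_path is not None:
        # An explicit save_path always wins over epoch_id.
        return save_path
    if epoch_id is None:
        raise ValueError("save_path and epoch_id cannot both be None.")
    if save_dir is None:
        # Assumption: a directory is needed to build the default file name.
        raise ValueError("save_dir is required when only epoch_id is given.")
    return os.path.join(save_dir, f"model_checkpoint_epoch_{epoch_id}.pth")


by_epoch = resolve_checkpoint_path("checkpoints", epoch_id=3)
explicit = resolve_checkpoint_path("checkpoints", epoch_id=3, save_path="best.pth")
```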
- score(data_loader: DataLoader | None = None, X: List[Any] | None = None, y: List[int] | None = None) float[source]
Validate the model. It supports validation with either a DataLoader object (data_loader) or a pair of input data (X) and ground truth labels (y). If both data_loader and (X, y) are provided, the method will prioritize using the data_loader.
- Parameters:
data_loader (DataLoader, optional) – The data loader used for scoring. Defaults to None.
X (List[Any], optional) – The input data. Defaults to None.
y (List[int], optional) – The target data. Defaults to None.
- Returns:
The accuracy of the model.
- Return type:
float
torch_dataset
- class ablkit.learning.torch_dataset.ClassificationDataset(X: List[Any], Y: List[int], transform: Callable[[...], Any] | None = None)[source]
Bases: Dataset
Dataset used for classification tasks.
- Parameters:
X (List[Any]) – The input data.
Y (List[int]) – The target data.
transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version. Defaults to None.
- class ablkit.learning.torch_dataset.MultiLabelClassificationDataset(X: List[Any], Y: List[Any], transform: Callable[[...], Any] | None = None)[source]
Bases: ClassificationDataset
Dataset used for multi-label classification, where each target Y[i] is a binary indicator vector (one entry per label) rather than a single class index. Y is stored as a float32 tensor so it can be fed directly into BCEWithLogitsLoss and similar losses.
- Parameters:
X (List[Any]) – The input data.
Y (List[Any]) – The per-sample label vectors. Each entry is converted via numpy.stack and stored as a FloatTensor.
transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version. Defaults to None.
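To make the binary indicator format concrete, the sketch below builds such vectors from lists of active label indices. The helper to_indicator is hypothetical and uses plain Python lists. It only illustrates the shape of Y and says nothing about how the dataset converts it internally.

```python
from typing import Iterable, List


def to_indicator(active_labels: Iterable[int], num_labels: int) -> List[float]:
    """Hypothetical helper: turn a set of label indices into a 0/1 float vector."""
    vector = [0.0] * num_labels
    for index in active_labels:
        if not 0 <= index < num_labels:
            raise ValueError(f"label index {index} is out of range")
        vector[index] = 1.0
    return vector


# Three samples over four possible labels; a sample may carry several labels or none.
Y = [to_indicator(labels, num_labels=4) for labels in ([0, 2], [3], [])]
```

Each row of Y has one entry per label, in contrast to the single class index per sample expected by ClassificationDataset.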
- class ablkit.learning.torch_dataset.PredictionDataset(X: List[Any], transform: Callable[[...], Any] | None = None)[source]
Bases: Dataset
Dataset used for prediction.
- Parameters:
X (List[Any]) – The input data.
transform (Callable[..., Any], optional) – A function/transform that takes an object and returns a transformed version. Defaults to None.