nillanet module

class nillanet.model.NN(features, architecture, activation, derivative1, resolver, derivative2, loss, derivative3, learning_rate, scheduler=None, dtype=cupy.float32, backup='/tmp/nn.pkl', initializer=None)[source]

Bases: object

Minimal feedforward neural network using CuPy.

This class implements batched SGD with configurable activation, resolver, and loss functions. Inputs/targets are kept on device to avoid host↔device copies.

Parameters:
  • features (int) – Number of input features (columns of the input matrix).

  • architecture (list[int]) – Units per layer, including the output layer.

  • activation (Callable[[cupy.ndarray], cupy.ndarray]) – Hidden-layer activation function.

  • derivative1 (Callable[[cupy.ndarray], cupy.ndarray]) – Derivative of activation evaluated at pre-activations.

  • resolver (Callable[[cupy.ndarray], cupy.ndarray]) – Output-layer transfer function (e.g., identity, sigmoid, softmax).

  • derivative2 (Callable[[cupy.ndarray], cupy.ndarray]) – Derivative of resolver evaluated at pre-activations.

  • loss (Callable[..., cupy.ndarray]) – Loss function that accepts named arguments (e.g., yhat, y) and returns per-sample losses or their average.

  • derivative3 (Callable[..., cupy.ndarray]) – Derivative of loss with respect to predictions (same signature as loss).

  • learning_rate (float) – SGD step size.

  • scheduler (Scheduler, optional) – Learning rate scheduler. Defaults to None.

  • dtype (cupy.dtype, optional) – Floating point dtype for parameters and data. Defaults to cupy.float32.

  • backup (str) – Path for saving the best-performing model during training. Defaults to '/tmp/nn.pkl'.

  • initializer (Initializer, optional) – Function for initializing weights. Defaults to None.
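
A minimal construction sketch, wiring the Activations and Loss helpers documented below as the parameter descriptions suggest; the layer sizes and learning rate are illustrative only:

    import cupy as cp
    from nillanet.model import NN
    from nillanet.activations import Activations
    from nillanet.loss import Loss

    act = Activations()
    loss = Loss()

    # Two input features, one hidden layer of 8 units, one sigmoid output.
    model = NN(
        features=2,
        architecture=[8, 1],
        activation=act.relu,
        derivative1=act.relu_derivative,
        resolver=act.sigmoid,
        derivative2=act.sigmoid_derivative,
        loss=loss.binary_crossentropy,
        derivative3=loss.binary_crossentropy_derivative,
        learning_rate=0.1,
    )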

W

Layer weight matrices; W[i] has shape (in_features_i, out_features_i).

Type:

list[cupy.ndarray]

batch(x, y)[source]

Run a single forward/backward/update step.

Parameters:
  • x (cupy.ndarray) – Batch inputs of shape (n_samples, n_features).

  • y (cupy.ndarray) – Batch targets of shape (n_samples, n_outputs).

Returns:

The predictions as a tensor.

Return type:

cupy.ndarray
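
Continuing the construction sketch above, a single hypothetical step; the shapes follow the train() documentation:

    # x: (n_samples, n_features), y: (n_samples, n_outputs), both on device
    x = cp.asarray([[0.0, 1.0], [1.0, 1.0]], dtype=cp.float32)
    y = cp.asarray([[1.0], [0.0]], dtype=cp.float32)
    yhat = model.batch(x, y)  # one forward/backward/update step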

predict(input)[source]

Run a forward pass to produce predictions.

Parameters:

input (cupy.ndarray | numpy.ndarray) – Inputs of shape (n_samples, n_features). If NumPy, it will be moved to device.

Returns:

Model outputs of shape (n_samples, n_outputs).

Return type:

cupy.ndarray
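
For example, reusing x from the batch() sketch above; as documented, NumPy inputs are moved to device automatically:

    preds = model.predict(x)  # cupy.ndarray of shape (2, 1)

    import numpy as np
    preds = model.predict(np.zeros((3, 2), dtype=np.float32))  # NumPy accepted too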

summary()[source]

Print layer shapes and total parameter count.

train(input, output, epoch=1, epochs=1, batch=0, verbose=False, step=1000, autosave=False, minloss=[999999999])[source]

Train the model for one epoch using simple SGD.

Each epoch is a full pass over the training data. An external loop that calls this method once per epoch is expected.

Parameters:
  • input (cupy.ndarray | numpy.ndarray) – Training inputs of shape (n_samples, n_features). If NumPy, it will be moved to device.

  • output (cupy.ndarray | numpy.ndarray) – Training targets of shape (n_samples, n_outputs). If NumPy, it will be moved to device.

  • epoch (int) – The current epoch number.

  • epochs (int) – The total number of epochs that will be run.

  • batch (int) – Batch-size selector. One of:

    - 1: sample a single example per step (pure SGD)

    - 0: use all samples per step (full batch)

    - >1 and < len(Y): use that mini-batch size per step

  • verbose (bool) – Print progress to stdout.

  • step (int) – Print progress every step epochs.

  • autosave (bool) – Save the model with the lowest loss seen so far to the backup path.

  • minloss (list[float]) – Internal use only.

Raises:

SystemExit – If batch is invalid.
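
A sketch of the expected external loop, reusing the device arrays from the batch() example; the epoch count and reporting interval are illustrative:

    epochs = 1000
    for epoch in range(1, epochs + 1):
        model.train(x, y, epoch=epoch, epochs=epochs,
                    batch=0, verbose=True, step=100)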

class nillanet.activations.Activations[source]

Bases: object

Nonlinearities for the NN class. When multiple forms are available, the activation is paired with the most compatible derivative.

linear(x)[source]

Computes the identity function for the given input.

Parameters:

x (tensor) – The values for which the identity function needs to be computed.

Returns:

The values corresponding to the input x.

Return type:

tensor

linear_derivative(x)[source]

Computes the derivative of the identity function for the given input.

Parameters:

x (tensor) – The values for which the derivative of the identity function needs to be computed.

Returns:

The derivative values corresponding to the input x.

Return type:

tensor

relu(x)[source]

Computes the rectified linear unit (ReLU) function for the given input.

Parameters:

x (tensor) – The values for which the ReLU function needs to be computed.

Returns:

The ReLU values corresponding to the input x.

Return type:

tensor

relu_derivative(x)[source]

Computes the derivative of the rectified linear unit (ReLU) function for the given input.

Parameters:

x (tensor) – The values for which the derivative of the ReLU function needs to be computed.

Returns:

The ReLU derivative values corresponding to the input x.

Return type:

tensor

sigmoid(x)[source]

Computes the sigmoid function for the given input.

Parameters:

x (tensor) – The values for which the sigmoid function needs to be computed.

Returns:

The computed sigmoid values corresponding to the input x.

Return type:

tensor

sigmoid_derivative(x)[source]

Computes the derivative of the sigmoid function for the given input.

Parameters:

x (tensor) – The values for which the derivative of the sigmoid function needs to be computed.

Returns:

The computed derivative values corresponding to the input x.

Return type:

tensor

softmax(x)[source]

Computes the softmax function for the given input.

Parameters:

x (tensor) – The input values for which the softmax function needs to be computed.

Returns:

The computed softmax values corresponding to the input x.

Return type:

tensor
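
For reference, the standard softmax that this method is documented to compute can be sketched independently; this is an illustrative reimplementation, not nillanet's source:

    import cupy as cp

    def softmax_reference(x):
        # Subtract the row max for numerical stability, then normalize.
        shifted = x - x.max(axis=-1, keepdims=True)
        e = cp.exp(shifted)
        return e / e.sum(axis=-1, keepdims=True)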

softmax_derivative(x)[source]

Computes the derivative of the sigmoid function as a proxy for the softmax derivative.

Parameters:

x (tensor) – The input values for which the derivative of the softmax function needs to be computed.

Returns:

The computed sigmoid derivative values corresponding to the input x.

Return type:

tensor

tanh(x)[source]

Computes the hyperbolic tangent function for the given input.

Parameters:

x (tensor) – The values for which the hyperbolic tangent function needs to be computed.

Returns:

The computed hyperbolic tangent values corresponding to the input x.

Return type:

tensor

tanh_derivative(x)[source]

Computes the derivative of the hyperbolic tangent function for the given input.

Parameters:

x (tensor) – The values for which the derivative of the hyperbolic tangent function needs to be computed.

Returns:

The computed derivative values corresponding to the input x.

Return type:

tensor

class nillanet.loss.Loss[source]

Bases: object

Loss functions for the NN class.

binary_crossentropy(y, yhat, epsilon=1e-15)[source]

Computes the logarithmic loss (binary cross-entropy) between the true labels and the predicted probabilities.

Parameters

y: tensor

The true labels.

yhat: tensor

The predicted probabilities.

epsilon: float

Small constant that keeps the logarithm's argument away from zero. Defaults to 1e-15.

Returns

float:

The logarithmic loss between the true labels and the predicted probabilities.
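
The standard logarithmic loss is sketched below for reference; the clipping mirrors the documented epsilon parameter, though nillanet's exact reduction (sum vs. mean) is an assumption:

    import cupy as cp

    def bce_reference(y, yhat, epsilon=1e-15):
        # Clamp predictions away from 0 and 1 so log() stays finite.
        yhat = cp.clip(yhat, epsilon, 1 - epsilon)
        return -(y * cp.log(yhat) + (1 - y) * cp.log(1 - yhat)).mean()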

binary_crossentropy_derivative(y, yhat, epsilon=1e-15)[source]

Derivative of the binary cross-entropy loss function.

Parameters

y: tensor

The true labels.

yhat: tensor

The predicted probabilities.

epsilon: float

Small constant that guards against division by zero. Defaults to 1e-15.

Returns

tensor:

The derivative of the binary cross-entropy loss function with respect to the inputs.

Raises

RuntimeWarning: divide by zero or invalid value encountered in divide may be emitted; the affected values are fixed up afterward.

mae(y, yhat)[source]

Mean Absolute Error (MAE) between predicted values and actual values.

Parameters

y: tensor

The actual values.

yhat: tensor

The predicted values.

Returns

float:

The Mean Absolute Error between the predicted and actual values.

mae_derivative(y, yhat, epsilon=1e-15)[source]

Derivative of the Mean Absolute Error (MAE) loss function.

Parameters

y: tensor

The actual values.

yhat: tensor

The predicted values.

epsilon: float

Small constant that guards against division by zero. Defaults to 1e-15.

Returns

tensor

The derivative of the MAE loss function with respect to the inputs.

mse(y, yhat)[source]

Mean Squared Error (MSE) between predicted values and actual values.

Parameters

y: tensor

The actual values.

yhat: tensor

The predicted values.

Returns

float:

The Mean Squared Error between the predicted and actual values.

mse_derivative(y, yhat)[source]

Derivative of the Mean Squared Error (MSE) loss function.

Parameters

y: tensor

The actual values.

yhat: tensor

The predicted values.

Returns

tensor

The derivative of the MSE loss function with respect to the inputs.
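
For reference, a common form of this gradient; the exact scaling used by nillanet (the factor 2 and the averaging) is an assumption here:

    def mse_derivative_reference(y, yhat):
        # Elementwise gradient of mean((yhat - y)**2) with respect to yhat.
        return 2.0 * (yhat - y) / y.shape[0]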

class nillanet.io.IO[source]

Bases: object

Helper functions for the NN class.

load(filename)[source]

Read a serialized model from disk using pickle.

Parameters:

filename – Path to the model pickle file.

Returns:

The deserialized model.

save(model, filename)[source]

Serialize the model to disk using pickle.

Parameters:
  • model – The model instance to serialize.

  • filename – Path to the output pickle file.
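
A round-trip sketch, reusing the model from the earlier example; it assumes load returns the deserialized model, and the path is illustrative:

    from nillanet.io import IO

    io = IO()
    io.save(model, "/tmp/nn.pkl")
    restored = io.load("/tmp/nn.pkl")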

class nillanet.distributions.Distributions[source]

Bases: object

Random training distributions for test modules.

arithmetic_distribution(depth, mode)[source]

Predict an arithmetic result from distributions of two input values.

Parameters:
  • depth (int) – The number of rows for the generated matrix of floating point numbers.

  • mode (str) – The mode of operation. Accepts either “add”, “subtract”, “multiply”, “divide”, or “zero” (always predict 0).

Returns:

tuple of (generated matrix, expected output)

Raises:

SystemExit – If the provided mode is not one of “add”, “subtract”, “multiply”, “divide”, or “zero”.

linear_distribution(depth)[source]

Linear regression data that predicts y from x for x-values on a random line with slope and intercept.

Parameters:

depth (int) – The number of x-values to generate.

Returns:

tuple of (generated vector of x-values, vector of expected y-values)

logical_distribution(depth, mode)[source]

Boolean logic.

Parameters:
  • depth (int) – The number of rows for the generated two-column binary matrix.

  • mode (str) – Accepts “and”, “or”, “xor”, or “xnor”.

Returns:

tuple of (generated binary matrix, expected output)

sort(rows, cols)[source]

Numerical sort.

Parameters:
  • rows (int) – the number of rows for the generated matrix

  • cols (int) – the number of columns for the generated matrix

Returns:

tuple of (generated matrix, sorted matrix)

summation(rows, cols, mode='one_hot')[source]

Distributions of binary vectors for testing binary cross-entropy (one-hot mode only).

Parameters:
  • rows (int) – The number of rows for the generated binary matrix.

  • cols (int) – The number of columns for the generated binary matrix.

  • mode (str) – The mode of operation. Accepts either “summation” or “one_hot”. Defaults to “one_hot”.

    - “summation”: Produces a scalar count of the number of ones in each x vector.

    - “one_hot”: Produces a one-hot encoded representation of the count of ones in each x vector.

Returns:

tuple of (generated binary matrix, expected output)

Raises:

SystemExit – If the provided mode is not “summation” or “one_hot”.
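
A sketch of generating toy datasets with these helpers; the row counts and modes are illustrative:

    from nillanet.distributions import Distributions

    d = Distributions()
    x_xor, y_xor = d.logical_distribution(256, "xor")   # binary matrix, XOR labels
    x_sum, y_sum = d.summation(128, 4, mode="one_hot")  # one-hot counts of ones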

class nillanet.initializer.Initializer(distribution=None, low=0.0, high=1.0, mean=0.0, std=1.0)[source]

Bases: object

Weight distributions for custom initializations.

__init__(distribution=None, low=0.0, high=1.0, mean=0.0, std=1.0)[source]

Models a configurable statistical distribution for initializing weights.

Parameters:
  • distribution (function) – A function representing the desired distribution. If None, a default normal distribution will be used.

  • low (float) – For the normal distribution, the lower boundary in standard deviations away from the mean; otherwise, the lower boundary of the abscissae of the distribution. Defaults to 0.0.

  • high (float) – For the normal distribution, the upper boundary in standard deviations away from the mean; otherwise, the upper boundary of the abscissae of the distribution. Defaults to 1.0.

  • mean (float) – The mean value for the distribution. Defaults to 0.0.

  • std (float) – The standard deviation for the distribution. Defaults to 1.0.
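
A construction sketch drawing a weight matrix directly; the bounds are illustrative:

    from nillanet.initializer import Initializer

    init = Initializer(low=-0.1, high=0.1)
    w = init.uniform((4, 8))  # 4x8 samples from the plateau [-0.1, 0.1]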

he(shape)[source]

Generates random numbers with variance equal to 2/n, where n is the number of inputs to the layer.

He et al. demonstrated its usefulness for the ReLU activation function. The low and high parameters are interpreted as standard deviations away from the mean.

Parameters:

shape (tuple) – The desired shape of the output array containing samples.

Returns:

numpy.ndarray

An array of random samples drawn from the truncated normal distribution.

normal(shape)[source]

Generates random numbers from a bell-shaped distribution within the defined range.

Note that the low and high parameters are interpreted as standard deviations away from the mean.

Parameters:

shape (tuple) – The desired shape of the output array containing samples.

Returns:

numpy.ndarray

An array of random samples drawn from the truncated normal distribution.

uniform(shape)[source]

Generates random numbers from a plateau-shaped distribution within the defined range.

The low and high parameters are interpreted as the lower and upper bounds of the plateau.

Parameters:

shape (tuple) – The desired shape of the output array containing samples.

Returns:

numpy.ndarray

An array of random samples drawn from the uniform distribution.

xavier(shape)[source]

Generates random numbers with variance equal to 1/n, where n is the number of inputs to the layer.

Xavier demonstrated its usefulness for tanh or sigmoid activation functions. The low and high parameters are interpreted as standard deviations away from the mean.

Parameters:

shape (tuple) – The desired shape of the output array containing samples.

Returns:

numpy.ndarray

An array of random samples drawn from the truncated normal distribution.

class nillanet.scheduler.Scheduler(mode, lr, lowbound=1e-08, scaler=0, warmup=0, interval=1, maxsteps=0, custom=None)[source]

Bases: object

Learning rate scheduler for the NN class.

__init__(mode, lr, lowbound=1e-08, scaler=0, warmup=0, interval=1, maxsteps=0, custom=None)[source]

Read parameters for initializing the scheduler.

Parameters:
  • mode (str) – The mode of learning rate decay. Required.

  • lr (float) – The initial learning rate. Required.

  • lowbound (float) – The lower bound for the learning rate. Default: 1e-8.

  • scaler (float) – The scaling factor, used by the constant mode only. Range: { x | 0 < x < 1 }. Optional; set zero to skip.

  • warmup (int) – The number of epochs for an optional warmup period. Optional, set zero to skip.

  • interval (int) – The interval at which a step is applied. Default: 1.

  • maxsteps (int) – The maximum number of updates applied to the learning rate. Optional, set zero to skip.

  • custom (function) – A custom function for updating the learning rate. Optional, set None to skip.

sigma

the current learning rate

Type:

float

steps

the number of updates applied so far

Type:

int

constant(epoch, epochs)[source]

Varies sigma by the constant factor scaler, which ranges from 0 to 1.

cosine(epoch, epochs)[source]

Varies sigma along a cosine curve from lr down to lowbound.

inverse(epoch, epochs)[source]

Varies sigma as the inverse square of the number of steps taken.

linear(epoch, epochs)[source]

Varies sigma linearly with the number of remaining epochs.

step(epoch, epochs)[source]

Update the learning rate based on the current epoch and mode.

Parameters:
  • epoch (int) – the current epoch

  • epochs (int) – the total number of epochs

Returns:

the updated learning rate

Return type:

float
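
A usage sketch stepping a cosine schedule once per epoch, assuming mode strings match the method names above; the hyperparameters are illustrative:

    from nillanet.scheduler import Scheduler

    sched = Scheduler(mode="cosine", lr=0.1, lowbound=1e-4)
    for epoch in range(1, 101):
        lr = sched.step(epoch, 100)  # decays from 0.1 toward 1e-4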