Getting started

How do I create an account?

Click 'Get Started' or 'Try it for Free' on novaml.com and fill in a short form; we will send a confirmation to your e-mail address shortly. You can then log into your account via the 'Sign In' link on novaml.com with the credentials used at signup.

How do I build my first predictive model?

After logging in, you will be redirected to the Models (Projects) list. If this is your first login, the only available option will be 'Create a model'. Click this link and upload your training dataset. You will be redirected to the 'Training' tab, where the only mandatory field is 'Which column would you like to predict?' Select the desired column from the drop-down menu. If necessary, change the other training parameters: the model validation metric, columns to drop, or an uploaded validation dataset. Click 'START'. After a few moments you will see log messages informing you of the training progress, and the metrics will appear on the interactive plot.

After training is finished, you will be directed to the 'Training Completed' page. Use one of the three available tabs:
- Evaluate - review all the relevant model validation metrics
- Interpret - check out the Feature Importance matrices for your model
- Predict - upload new data (a test dataset) to predict the target variable using the built model
After the prediction is finalized, you will be able to download a CSV file with your predictions.
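The train-then-predict workflow described above can be sketched locally with scikit-learn. This is only an analogy to illustrate the steps (target column, training, predicting on new data, exporting a CSV); it is not novaml's actual code, and the column names are made up.

```python
# Minimal local sketch of the train -> predict workflow described above,
# using scikit-learn as a stand-in (not novaml's internals).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Training data: the column you "would like to predict" is the target.
train = pd.DataFrame({
    "age":    [25, 32, 47, 51, 38, 29],
    "income": [30, 45, 80, 95, 60, 40],
    "churn":  [0, 0, 1, 1, 0, 0],   # target column
})
X, y = train.drop(columns=["churn"]), train["churn"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# New data (the 'Predict' tab): same columns, minus the target.
test = pd.DataFrame({"age": [60, 22], "income": [100, 28]})
predictions = model.predict(test)

# The downloadable CSV with predictions:
pd.DataFrame({"churn": predictions}).to_csv("predictions.csv", index=False)
```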

What is the 'Impute missing values' service and how do I use it?

This option does not supply a predictive model.

'Impute missing values' is a service that finds all the missing values in each column of your dataset, transforms all the variables into a machine-learning-ready format, and deploys multiple advanced machine learning algorithms that learn from the rest of your data to predict the missing values.

The number of models built equals the number of columns with missing values.

After all the missing values have been imputed, the data is transformed back to the original format (as uploaded) and returned without missing values.

To impute missing values in your dataset, select the 'Impute Missing Values' option in the 'Machine Learning Type' drop-down menu and click START. After the missing values are imputed, you will be prompted to download your dataset.
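Model-based imputation of the kind described above can be sketched with scikit-learn's IterativeImputer, which fits a regression model per column with missing values, trained on the rest of the data. This is an illustration under that assumption; novaml's own algorithms may differ.

```python
# Model-based imputation sketch: each column with missing values gets a
# model trained on the observed data (not novaml's actual implementation).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

data = np.array([
    [1.0, 2.0,    np.nan],
    [3.0, np.nan, 6.0],
    [5.0, 6.0,    9.0],
    [7.0, 8.0,    12.0],
])

imputer = IterativeImputer(random_state=0)
completed = imputer.fit_transform(data)

# The result has the same shape as the upload, with no missing values left.
assert not np.isnan(completed).any()
```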

Restrictions

What kind of data can be used for building a machine learning model?

We currently support tabular data in CSV and Excel files with 'UTF-8' or 'ISO-8859-1' encoding. The uploaded datasets have to comply with a few restrictions:
- more than 1 column
- more than 100 rows
- the training data and the data used for predictions must have the same format
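The size restrictions above are easy to verify locally before uploading. The helper below is hypothetical (not part of novaml), shown only to make the rules concrete.

```python
# Hypothetical pre-upload sanity check for the restrictions listed above.
import pandas as pd

def check_dataset(df: pd.DataFrame) -> list:
    """Return a list of restriction violations (empty list = OK to upload)."""
    problems = []
    if df.shape[1] <= 1:
        problems.append("dataset must have more than 1 column")
    if df.shape[0] <= 100:
        problems.append("dataset must have more than 100 rows")
    return problems

ok = pd.DataFrame({"x": range(150), "y": range(150)})
assert check_dataset(ok) == []          # passes both restrictions
assert check_dataset(ok[["x"]]) != []   # single column -> rejected
```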

Cooperation

Can the novaml automated machine learning capabilities complement third-party analytics services?

We are currently working on an API that will give third-party vendors access to the full capabilities of the platform. This means that other software will be able to send data and settings to our platform; we will automatically deploy the necessary infrastructure, process your data, build a model, and send the analytics/prediction results back to the vendor. This way the platform backend can be tied to a third-party front end.
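Since this API is still in development, nothing below is a published interface. As a purely hypothetical sketch, a vendor's software might package data and settings into a JSON body like this; the endpoint, field names, and settings keys are all assumptions for illustration.

```python
# Hypothetical request body a third-party tool could build for such an API.
# Endpoint and field names are illustrative assumptions, not a novaml spec.
import json

def build_training_request(csv_text: str, target_column: str) -> bytes:
    """Package data and settings as a JSON body for a (hypothetical) API."""
    payload = {
        "data": csv_text,
        "settings": {"target": target_column, "metric": "auto"},
    }
    return json.dumps(payload).encode("utf-8")

body = build_training_request("age,churn\n25,0\n47,1\n", "churn")
# The vendor's front end would then POST `body` to the platform and render
# the returned analytics/prediction results in its own UI.
```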

Confidentiality

Who has access to the uploaded data?

In short: any data uploaded to any third-party server can be accessed by the service provider.

We will not try to convince you that we cannot access your data. Instead, we have put strict access regulations in place, granting wide access rights to only a handful of our developers. Our confidentiality policy does not allow employees to open or download any of a user's data. The only information that can be retrieved is the training logs, which are used to debug and fix errors and keep our service stable. This is reflected in our confidentiality agreement.

P.S. We are currently working on an on-premises version: software that a user will be able to install on a local machine to take full advantage of the Nova ML automated platform without sending any data to the cloud.

What happens under the hood?

What kinds of data formats can be processed by the novaml algorithm?

We support essentially all tabular data formats: textual and numeric, as well as date-time. Your data will be analyzed and transformed accordingly. Irrelevant information will be excluded from the machine learning pipeline or transformed in a meaningful way.

Do I have to clean my data before uploading?

Short answer: no! There is nothing you have to do with your data; just drag & drop and we will do the rest.

What if the data contains a datetime column?

Great! Datetime usually carries meaningful information. We will extract various features from datetime columns, such as year, month, day, season, holiday, hour, minute, second, etc. We also have some extra features for multiple datetime columns.
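The kind of feature extraction described above can be sketched with pandas' `.dt` accessor. The "season" rule below is a crude stand-in of our own (novaml's actual feature set and definitions may differ):

```python
# Illustrative datetime feature extraction with pandas (not novaml's code).
import pandas as pd

df = pd.DataFrame({"ordered_at": pd.to_datetime([
    "2023-12-24 18:30:00",
    "2024-07-01 09:15:00",
])})

dt = df["ordered_at"].dt
features = pd.DataFrame({
    "year":   dt.year,
    "month":  dt.month,
    "day":    dt.day,
    "hour":   dt.hour,
    "minute": dt.minute,
    "second": dt.second,
    # Crude northern-hemisphere season: 0=winter, 1=spring, 2=summer, 3=autumn
    "season": dt.month % 12 // 3,
})

# With two datetime columns, their difference is another useful feature, e.g.:
# (df["shipped_at"] - df["ordered_at"]).dt.days
```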

Is any feature engineering involved?

Feature engineering is an essential part of a good machine learning project. It is usually a lengthy and complicated process. The good news is that with novaml it is completely automated

What kind of machine learning models are used?

We fully agree with the 'no free lunch' principle! Thus we test many different kinds of models and their variations to see which one fits best. To name a few: decision trees, random forests, XGBoost, linear/logistic regression, SVM, etc. Our ensembling method will even combine several models and create some really powerful features.
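Searching across model families, as described above, looks roughly like the scikit-learn sketch below: fit each candidate, score it with cross-validation, keep the best. This is a simplified stand-in for illustration, not the platform's pipeline.

```python
# Comparing several model families on the same data, in the spirit of the
# 'no free lunch' search described above (a sketch, not novaml's pipeline).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "decision_tree":       DecisionTreeClassifier(random_state=0),
    "random_forest":       RandomForestClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm":                 SVC(),
}

# Score every candidate with 5-fold cross-validation and keep the winner.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in candidates.items()}
best = max(scores, key=scores.get)
```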

How is the model quality evaluated?

Our validation strategy comes in different depths, from a single holdout split to 10-fold cross-validation. During this process the model is trained on one portion of the training dataset and used to predict on the other portion. Because the ground truth (target variable) is available there, we measure the error between the predicted and known values; this error is then converted into the metric value.
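The procedure above can be made concrete with a 10-fold cross-validation sketch: train on each fold's training portion, predict on the held-out portion, compare to the ground truth, and average the per-fold errors into one metric value. The model and metric below are our own illustrative choices.

```python
# 10-fold cross-validation as described above: train on one portion,
# predict on the held-out portion, compare to ground truth, average.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

errors = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds = model.predict(X[val_idx])      # predict on the held-out portion
    errors.append(mean_absolute_error(y[val_idx], preds))  # vs. ground truth

cv_metric = float(np.mean(errors))  # the reported metric value
```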

Can I use custom metrics?

Currently we support only conventional metrics. Our list of metrics should cover 95% of your needs, but we are working on a custom validation metric interface that will allow the user to define the loss formula precisely.
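To give a sense of what "defining the loss formula" could mean, here is a standard metric (RMSLE) written as a plain Python function. The eventual interface is still in development, so how such a function would be plugged in is an open assumption; only the formula itself is standard.

```python
# A user-defined metric as a plain function: root mean squared logarithmic
# error (RMSLE). How novaml's future interface accepts it is not yet defined.
import numpy as np

def rmsle(y_true, y_pred) -> float:
    """Root mean squared logarithmic error between predictions and truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))
```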