Load data.
Keep input data for the next step.
Branch processing for categorical and numerical columns.
Check if there are any date or time columns in the data.
Calculate the minimum for the column.
If there are date or time columns, handle them inside.
Calculate the minimum for dates / times.
Check if there are any nominal columns in the data.
Calculate the mode of the column.
If there are nominal columns, handle them inside.
Calculate the mode for nominal columns.
Check if there are any numerical columns in the data.
If there are numerical columns, handle them inside.
Calculate the average for numerical columns.
Join the numerical and nominal results.
Join this result with the date results.
Remove the join key.
Replace column names with their original names from before the aggregation.
Remember the single-row result.
Creates a single-row version of the data which can be used in deployments.
Transform all nominal columns to text to make sure that all of them will have the polynominal type after the next transformation.
Transform all text columns into polynominal columns.
Turn all numerical columns (though not integers) into real columns.
After these three operators, all columns (attributes) will be either nominal or real. Date or time columns will keep their original type.
Unify all value types before anything else.
Change the role to 'regular' for all columns.
Define the target column for the predictive model.
Should a target column be defined?
Discretize by binning (same range per bin).
Discretize by frequency (same count per bin).
Should the numerical target column be discretized?
Map some nominal target values to new values.
Should nominal values be mapped?
Make sure that the target is binary for positive class mapping.
Potentially define which value should be the positive class.
Should a positive class be defined?
Potentially remove columns.
Should columns be removed?
No date processing is desired here, so simply remove the date columns completely.
Check if there actually are any date columns in the data.
Adds an additional column with today's date.
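The single-row deployment version described above (average for numerical columns, mode for nominal columns, minimum for dates) can be sketched in plain Python. The function and column names here are illustrative only and are not part of the process:

```python
from collections import Counter
from datetime import date
from statistics import mean

def single_row_summary(rows, numeric_cols, nominal_cols, date_cols):
    """Collapse a table into one representative row: mean for numeric
    columns, mode for nominal columns, minimum for date columns."""
    summary = {}
    for col in numeric_cols:
        summary[col] = mean(r[col] for r in rows)
    for col in nominal_cols:
        summary[col] = Counter(r[col] for r in rows).most_common(1)[0][0]
    for col in date_cols:
        summary[col] = min(r[col] for r in rows)
    return summary

rows = [
    {"age": 30, "city": "Boston", "joined": date(2020, 1, 5)},
    {"age": 40, "city": "Boston", "joined": date(2019, 6, 1)},
    {"age": 50, "city": "Austin", "joined": date(2021, 3, 9)},
]
print(single_row_summary(rows, ["age"], ["city"], ["joined"]))
# {'age': 40, 'city': 'Boston', 'joined': datetime.date(2019, 6, 1)}
```

Such a row gives a deployment a sensible default input for every column.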
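The two discretization options, same range per bin versus same count per bin, differ exactly as in the minimal sketch below (an illustration, not the operators' actual implementation):

```python
def bin_equal_width(values, n_bins):
    """Assign each value a bin index; every bin spans the same value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # avoid division by zero for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def bin_equal_frequency(values, n_bins):
    """Assign bin indices so that every bin holds (roughly) the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    per_bin = len(values) / n_bins
    for rank, i in enumerate(order):
        bins[i] = min(int(rank / per_bin), n_bins - 1)
    return bins

vals = [1, 2, 3, 4, 100]
print(bin_equal_width(vals, 2))      # [0, 0, 0, 0, 1] - the outlier sits alone
print(bin_equal_frequency(vals, 2))  # [0, 0, 0, 1, 1] - counts stay balanced
```

Equal-width bins are sensitive to outliers; equal-frequency bins keep class sizes balanced, which is why both options are offered for the target column.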
This can be useful for calculating ages etc.
Select the reversed combination here and store in a macro whether that column already exists.
Store whether the reversed combination exists.
Generate the difference between the two date columns in milliseconds.
Both date columns are the same, or the reversed combination has already been created - do nothing here!
Only calculate the difference between the two date columns if the columns are not equal and the reversed combination has not been calculated yet.
Loop over all combinations of date attributes and calculate their differences (which includes the new today column generated previously).
Remove the generated today column again.
<- Extract Day of Month
<- Extract Month of Year
<- Extract Year
<- Extract Quarter of Year
<- Extract Half of the Year
<- Extract Day of Week
<- Extract Month of Quarter
<- Extract Time Features
<- Remove original date column
<- Remove all constant columns (e.g. because all dates fall into the same month)
<- Transform all remaining date / time features into binary numerical ones
<- Rename all generated date attributes with a colon instead of an underscore.
Loop over the date columns. We needed the check and Branch operator before, since otherwise this loop would fail.
Do nothing if there are no date columns in the data table.
If there are any date columns in the data, work on them inside of this Branch operator.
Should dates be handled?
Remove all unused values so that they are not shown by models and do not change calculations based on the number of values in the data.
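The date handling above (a generated today column, pairwise differences in milliseconds, and calendar feature extraction) can be illustrated with Python's datetime module. The feature names below mirror the annotations and are not the operators' actual output names:

```python
from datetime import date
from itertools import combinations

def date_features(d):
    """Expand one date into the calendar features named above."""
    return {
        "day_of_month": d.day,
        "month_of_year": d.month,
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "half_of_year": 1 if d.month <= 6 else 2,
        "day_of_week": d.isoweekday(),        # 1 = Monday ... 7 = Sunday
        "month_of_quarter": (d.month - 1) % 3 + 1,
    }

def pairwise_differences_ms(row, date_cols):
    """Difference in milliseconds for every pair of date columns; using
    combinations() skips pairs whose reverse was already computed."""
    diffs = {}
    for a, b in combinations(date_cols, 2):
        delta = row[b] - row[a]
        diffs[f"{b}-{a}"] = delta.days * 86_400_000
    return diffs

row = {"joined": date(2020, 1, 1), "today": date(2020, 1, 3)}
print(date_features(date(2020, 5, 17)))
print(pairwise_differences_ms(row, ["joined", "today"]))
# {'today-joined': 172800000}
```

Including the generated today column in the pairwise differences is what yields age-like features.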
Also order the value mappings alphabetically.
Transform all nominal columns to text to make sure that all of them will have the polynominal type after the next transformation.
Transform all text columns into polynominal columns.
Turn all numerical columns (though not integers) into real columns.
Define the value type of all columns which have been identified as text.
Unify all value types.
All general preprocessing steps (which do not have an effect on validation) happen inside this operator - double-click on it to see the details.
Model on cases with a label value; apply the model on cases with a missing value for the target column.
Creates a sample of the data if necessary to guarantee that this process will finish with reasonable runtimes and without memory issues.
Remember the unlabeled data.
Remember the labeled data.
Recall the labeled data to split it into training and validation sets.
Split off a validation set.
Remember the raw validation data, i.e. without text processing or feature engineering.
Remember the raw training data, i.e. without text processing or feature engineering.
Recall the training data.
Remember all categorical values in a preprocessing model so that we can later ensure that the scoring data does not contain any unknown values.
Remember all known categorical values.
Replace all missing and infinite values with a keyword for nominal columns, with the average for numerical columns, and with the first date for date columns.
Remember the missing value handling for later application on the validation and scoring data.
Remember the processing for nominal values, i.e. removing columns with too many values and / or one-hot encoding.
Remember the training data.
Recall the raw training data.
Should text columns be handled?
Remembers the text processing model.
Remembers the transformed training data.
Recalls the raw training data.
Sample down to fewer examples in case there are too many.
Perform a light-weight parameter optimization before we start with the automatic feature engineering to make sure the model is somewhat appropriate.
Create the calibration data set.
Train the model.
Calibrate the confidences.
Apply confidence scaling to optimize accuracy for a given cost matrix.
Apply the model.
Calculate the performance.
Validating the model on the feature set (ensure the same splits for all evaluations).
Automatic Feature Engineering.
Remember the optimization log.
Remember all optimal tradeoffs between low complexity and low error rates.
Copy the feature set.
Apply the resulting feature set on the complete training data.
Remembers the fully transformed training data.
Remembers the optimal feature set.
Recall the training data.
Copy the training data for preparing the cost-based scoring.
Create the calibration data.
Train the model.
Calibrate the confidences.
Create the calibration data.
Train the model.
Calibrate the confidences.
Either optimize the hyperparameters of this model or use the desired parameters.
Apply confidence scaling to optimize accuracy for a given cost matrix.
Remember the model.
Recall the known values.
Recall the validation data.
Replace all unknown categorical values with missing values.
Recall the missing value processing.
Apply the missing value processing on the validation data.
Recall the nominal value handling and encoding processing.
Remove nominal columns with too many values and perform one-hot encoding if desired.
Recall the text processing.
Recall the optimal feature set.
Apply text vectorization (if applicable) on the validation set.
Apply the resulting feature set on the validation data.
Remember the validation data.
Recall the known values.
Recall the scoring data (no target value known).
Replace all unknown categorical values with missing values.
Recall the missing value processing.
Apply the missing value processing on the scoring data.
Recall the nominal value handling and encoding processing.
Remove nominal columns with too many values and perform one-hot encoding if desired.
Recall the text processing model.
Recall the optimal feature set.
Apply text vectorization (if applicable) on the scoring set.
Apply the resulting feature set on the scoring data.
Remember the transformed scoring data.
Recall the scoring data.
Recall the model.
Recall the training data.
Recall the validation data.
Append the validation and scoring data (if any).
Create predictions for cases without a target value and add explanations for the predictions.
Remember the model-specific weights based on the explained predictions.
Remember the explained predictions.
Recall the model.
Recall the validation data.
Create an index for multiple hold-out sets.
Remember the size of the scoring set.
Only keep examples for the current hold-out index.
Apply the model.
Log the time needed for the scoring.
Calculate the error on the current hold-out set.
Try to add additional binominal performance measurements.
Calculate the performance for each hold-out set.
Calculate the average performance over the multiple hold-out sets.
Remember the performance.
Recall the model.
Recall the validation data.
Create a lift chart.
Remembers the lift chart.
This subprocess will try to create a lift chart, which will only succeed if this is a binary classification problem.
Recall the model.
Recall the training data.
Create the model simulator.
Remembers the model simulator.
Recall the training data.
Recall the validation data.
Append the training and the validation data.
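The missing value handling and unknown-value replacement applied identically to training, validation, and scoring data follow a fit / apply pattern, which can be sketched as below. The function names and the 'MISSING' keyword are assumptions for illustration:

```python
from statistics import mean

def fit_missing_handler(train_rows, numeric_cols, nominal_cols):
    """Learn replacements on training data: column average for numeric
    columns, the keyword 'MISSING' for nominal columns. Also remember
    every categorical value seen during training."""
    replacements = {c: mean(r[c] for r in train_rows if r[c] is not None)
                    for c in numeric_cols}
    replacements.update({c: "MISSING" for c in nominal_cols})
    known_values = {c: {r[c] for r in train_rows if r[c] is not None}
                    for c in nominal_cols}
    return replacements, known_values

def apply_missing_handler(rows, replacements, known_values):
    """Map unknown categorical values to missing first, then impute
    every missing value with the stored replacement."""
    out = []
    for r in rows:
        fixed = dict(r)
        for c, seen in known_values.items():
            if fixed[c] not in seen:
                fixed[c] = None          # unknown category -> missing
        for c, repl in replacements.items():
            if fixed[c] is None:
                fixed[c] = repl
        out.append(fixed)
    return out

train = [{"x": 1.0, "city": "Boston"}, {"x": 3.0, "city": "Austin"}]
repl, known = fit_missing_handler(train, ["x"], ["city"])
score = [{"x": None, "city": "Paris"}]
print(apply_missing_handler(score, repl, known))
# [{'x': 2.0, 'city': 'MISSING'}]
```

Fitting only on training data and replaying the stored handler on validation and scoring data is what prevents information leaking from the hold-out sets.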
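A minimal sketch of the multiple hold-out set validation, assuming disjoint subsets and accuracy as the performance measure (the real process logs additional measures and scoring times):

```python
from statistics import mean

def multi_holdout_accuracy(examples, predict, n_sets=3):
    """Split the validation data into disjoint hold-out sets, score
    each one, and average: a cheaper stand-in for cross validation."""
    scores = []
    for k in range(n_sets):
        subset = [e for i, e in enumerate(examples) if i % n_sets == k]
        correct = sum(predict(x) == y for x, y in subset)
        scores.append(correct / len(subset))
    return mean(scores), scores

examples = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (5, 1)]
predict = lambda x: x % 2          # toy 'model': odd numbers are class 1
avg, per_set = multi_holdout_accuracy(examples, predict)
print(avg, per_set)   # 1.0 [1.0, 1.0, 1.0]
```

Because the model is trained once and only applied per hold-out set, this runs much faster than a cross validation while the spread across sets still indicates estimation robustness.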
Make sure that the data sets are compatible (same features and nominal values).
Remember the data used to train the final production model.
Remember the size of the final production data set.
Generate the statistics for the fully transformed training data.
Recall the optimal parameters.
Use the optimal parameters for the production model.
Create the calibration data.
Train the model.
Calibrate the confidences.
Create the calibration data.
Train the model.
Calibrate the confidences.
Build the final model.
Apply confidence scaling to optimize accuracy for a given cost matrix.
Remember the production model.
Remembers the fully transformed training statistics.
Log the time since the Retrieve operator has started.
Convert to a data set.
Extract the total time into a macro.
Log the training time.
Convert to a data set.
Extract the total time into a macro.
Divide by the number of training examples and multiply by 1000.
Retrieve the logged times for the Apply Model operator in the loop.
Sum up all scoring times.
Extract the summed-up time into a macro.
Divide by the number of scored examples and multiply by 1000.
Log the number of evaluated feature sets and generated features.
Convert to a data set.
Extract the number of evaluated feature sets as a macro.
Extract the number of generated features as a macro.
Log all the times a model has been created in the process.
Convert to a data set.
Derive the model counts from the number of validations.
Transpose the model application data.
Sum up all model applications.
Extract the total number of model applications into a macro.
Combine all logged times and provide them as a data set.
Clear the individual log table.
Clear the individual log table.
Clear the individual log table.
Clear the individual log table.
Clear the individual log table.
Clear all log tables so that they do not clutter up the results.
Collect all runtimes and provide them as data.
Deliver all logged performances.
Collect the runtimes and add annotations to the results.
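Confidence calibration here always follows the same pattern: hold out calibration data, train, then remap raw confidences. The histogram-style remapping below is only one simple way to do this and is not necessarily the method the process uses:

```python
def fit_histogram_calibration(confidences, labels, n_bins=5):
    """For each confidence bin, store the observed fraction of positives;
    the calibrated confidence is that empirical fraction."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        bins[min(int(c * n_bins), n_bins - 1)].append(y)
    return [sum(b) / len(b) if b else None for b in bins]

def calibrate(confidence, table, n_bins=5):
    """Look up the calibrated value; fall back to the raw confidence
    for bins that received no calibration data."""
    mapped = table[min(int(confidence * n_bins), n_bins - 1)]
    return mapped if mapped is not None else confidence

# Raw scores of 0.9 that are right only half the time get pulled down:
table = fit_histogram_calibration([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
print(calibrate(0.95, table))   # 0.5
```

Calibrated confidences matter whenever a cost matrix or a confidence threshold is applied afterwards, as both assume confidences behave like probabilities.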
Finally, deliver all results to the result ports.
<b>(1) - BASIC PREPROCESSING</b>
Creates a training and a validation set - the validation set will be used in a robust multiple hold-out performance calculation.
Loads the data set and performs some basic preprocessing tasks. Delivers all labeled data points as well as unlabeled ones to which the model should be applied later on.
Performs some basic feature engineering and preprocessing such as missing value handling or one-hot encoding. Text columns will be handled later.
<b>(2) - FEATURE ENGINEERING & MODELING</b>
Handles text columns if desired and stores the text processing model.
Performs automatic feature engineering if desired. This happens in addition to the basic feature engineering done before (text processing, date handling, one-hot encoding, etc.).
Performs the actual model training and automatic hyperparameter tuning (parameter optimization) if desired.
<b>(3) - TRANSFORM VALIDATION & SCORING DATA</b>
Transforms the validation data (known target value) using the same preprocessing and features.
Transforms the scoring data (no known target value) using the same preprocessing and features.
<b>(4) - SCORING, VALIDATION, EXPLANATIONS, WEIGHTS & SIMULATOR</b>
Creates the model simulator.
Applies the model on the validation and the scoring data sets for scoring. Also explains the predictions and calculates model-specific weights.
Performs a multiple hold-out set validation with robust estimation, which provides a similar quality of performance estimation to a cross validation, with smaller runtimes.
<b>(5) - PRODUCTION MODEL</b>
Creates a final production model by training a model with the same parameters on the combined training and validation data sets.
<b>(6) - PROCESS RESULTS</b>