Class SimpleCart

All Implemented Interfaces:
Serializable, Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

Class implementing minimal cost-complexity pruning.
Note: missing values are handled with the "fractional instances" method rather than the surrogate split method.

For more information, see:

Leo Breiman, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, California.

BibTeX:

 @book{Breiman1984,
    address = {Belmont, California},
    author = {Leo Breiman and Jerome H. Friedman and Richard A. Olshen and Charles J. Stone},
    publisher = {Wadsworth International Group},
    title = {Classification and Regression Trees},
    year = {1984}
 }
 

Valid options are:

 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -M <min no>
  The minimal number of instances at the terminal nodes.
  (default 2)
 -N <num folds>
  The number of folds used in the minimal cost-complexity pruning.
  (default 5)
 -U
  Don't use the minimal cost-complexity pruning.
  (default: pruning is used).
 -H
  Don't use the heuristic method for binary split.
  (default: the heuristic is used).
 -A
  Use 1 SE rule to make pruning decision.
  (default no).
 -C
  Percentage of training data size (0-1].
  (default 1).
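The flags above can be read as a small settings record. The sketch below is a hypothetical stand-alone parser (CartOptionsSketch is not part of Weka, and this is not how Weka parses options). It assumes that pruning and the split heuristic are on unless -U or -H is given, and that -C takes a numeric value, as its (0-1] range suggests.

```java
// Hypothetical stand-alone settings holder; it is NOT Weka's option parser.
final class CartOptionsSketch {
    int seed = 1;               // -S <num>
    boolean debug = false;      // -D
    double minNumObj = 2;       // -M <min no>
    int numFoldsPruning = 5;    // -N <num folds>
    boolean usePrune = true;    // -U switches pruning off (assumed default: on)
    boolean heuristic = true;   // -H switches the heuristic off (assumed default: on)
    boolean useOneSE = false;   // -A switches the 1 SE rule on
    double sizePer = 1.0;       // -C, assumed to take a value in (0, 1]

    // Walks the argument array once; a flag that needs a value consumes the next element.
    static CartOptionsSketch parse(String[] args) {
        CartOptionsSketch o = new CartOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-S": o.seed = Integer.parseInt(args[++i]); break;
                case "-D": o.debug = true; break;
                case "-M": o.minNumObj = Double.parseDouble(args[++i]); break;
                case "-N": o.numFoldsPruning = Integer.parseInt(args[++i]); break;
                case "-U": o.usePrune = false; break;
                case "-H": o.heuristic = false; break;
                case "-A": o.useOneSE = true; break;
                case "-C":
                    o.sizePer = Double.parseDouble(args[++i]);
                    if (o.sizePer <= 0 || o.sizePer > 1) {
                        throw new IllegalArgumentException("-C must lie in (0, 1]");
                    }
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return o;
    }
}
```

Note that -U and -H are negative flags: passing them disables a feature, whereas -A enables one.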
Version:
$Revision: 10491 $
Author:
Haijian Shi (hs69@cs.waikato.ac.nz)
  • Constructor Details

    • SimpleCart

      public SimpleCart()
  • Method Details

    • globalInfo

      public String globalInfo()
      Return a description suitable for displaying in the explorer/experimenter.
      Returns:
      a description suitable for displaying in the explorer/experimenter
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Classifier
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      Build the classifier.
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      data - the training instances
      Throws:
      Exception - if something goes wrong
    • prune

      public void prune(double alpha) throws Exception
      Prunes the original tree using the CART pruning scheme, given a cost-complexity parameter alpha.
      Parameters:
      alpha - the cost-complexity parameter
      Throws:
      Exception - if something goes wrong
    • prune

      public int prune(double[] alphas, double[] errors, Instances test) throws Exception
      Method for performing one fold in the cross-validation of minimal cost-complexity pruning. Generates a sequence of alpha-values with error estimates for the corresponding (partially pruned) trees, given the test set of that fold.
      Parameters:
      alphas - array to hold the generated alpha-values
      errors - array to hold the corresponding error estimates
      test - test set of that fold (to obtain error estimates)
      Returns:
      the iteration of the pruning
      Throws:
      Exception - if something goes wrong
    • modelErrors

      public void modelErrors() throws Exception
      Updates the numIncorrectModel field for all nodes, as if the subtree rooted at each node had been pruned. This is needed for calculating the alpha-values.
      Throws:
      Exception - if something goes wrong
    • treeErrors

      public void treeErrors() throws Exception
      Updates the numIncorrectTree field for all nodes. This is needed for calculating the alpha-values.
      Throws:
      Exception - if something goes wrong
    • calculateAlphas

      public void calculateAlphas() throws Exception
      Updates the alpha field for all nodes.
      Throws:
      Exception - if something goes wrong
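The relationship between modelErrors, treeErrors and calculateAlphas can be sketched with the textbook weakest-link formula from Breiman et al.: alpha = (R(t) - R(T_t)) / (leaves(T_t) - 1), where R(t) is the error of node t as a leaf and R(T_t) is the error of the subtree rooted at t. The Node type below is hypothetical and only borrows the field names from the descriptions above. Dividing the errors by the total number of training instances is an assumption of this sketch, not a statement about SimpleCart's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree type for illustration; not Weka's internal representation.
final class AlphaSketch {
    static final class Node {
        final double numIncorrectModel;          // training errors if this node were a leaf
        final List<Node> children = new ArrayList<>();
        double numIncorrectTree;                 // training errors of the subtree rooted here
        double alpha;                            // cost-complexity value of this node

        Node(double errorsAsLeaf, Node... kids) {
            this.numIncorrectModel = errorsAsLeaf;
            for (Node k : kids) {
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }

        int numLeaves() {
            if (isLeaf()) {
                return 1;
            }
            int n = 0;
            for (Node c : children) {
                n += c.numLeaves();
            }
            return n;
        }
    }

    // Fills numIncorrectTree bottom-up: a subtree's error is the sum of its leaves' errors.
    static double treeErrors(Node node) {
        if (node.isLeaf()) {
            node.numIncorrectTree = node.numIncorrectModel;
        } else {
            double sum = 0;
            for (Node c : node.children) {
                sum += treeErrors(c);
            }
            node.numIncorrectTree = sum;
        }
        return node.numIncorrectTree;
    }

    // Applies the weakest-link formula to every inner node; leaves can never be pruned further.
    static void calculateAlphas(Node node, double totalInstances) {
        if (node.isLeaf()) {
            node.alpha = Double.MAX_VALUE;
            return;
        }
        double errorDiff = (node.numIncorrectModel - node.numIncorrectTree) / totalInstances;
        node.alpha = errorDiff / (node.numLeaves() - 1);
        for (Node c : node.children) {
            calculateAlphas(c, totalInstances);
        }
    }
}
```

The inner node with the smallest alpha is the weakest link: pruning it costs the least extra training error per leaf removed.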
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Computes class probabilities for instance using the decision tree.
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      instance - the instance for which class probabilities are to be computed
      Returns:
      the class probabilities for the given instance
      Throws:
      Exception - if something goes wrong
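The "fractional instances" treatment of missing values mentioned in the class description can be illustrated as follows. This is a minimal sketch under stated assumptions, not SimpleCart's code: the Node type is hypothetical, the tree is binary with numeric splits only, a missing value is encoded as NaN, and each node remembers the share of training weight that went to its left branch.

```java
// Hypothetical binary tree for illustration; not Weka's internal representation.
final class FractionalSketch {
    static final class Node {
        int attribute;        // index of the numeric attribute tested at this node
        double threshold;     // instances with value < threshold go left
        double leftFraction;  // share of training weight that went to the left branch
        Node left;
        Node right;
        double[] classProbs;  // non-null on leaves only

        static Node leaf(double... probs) {
            Node n = new Node();
            n.classProbs = probs;
            return n;
        }

        static Node split(int attribute, double threshold, double leftFraction,
                          Node left, Node right) {
            Node n = new Node();
            n.attribute = attribute;
            n.threshold = threshold;
            n.leftFraction = leftFraction;
            n.left = left;
            n.right = right;
            return n;
        }
    }

    // Returns class probabilities; a missing split value sends a fraction of the
    // instance down each branch and mixes the results by those fractions.
    static double[] distribution(Node node, double[] instance) {
        if (node.classProbs != null) {
            return node.classProbs.clone();
        }
        double value = instance[node.attribute];
        if (Double.isNaN(value)) {
            double[] l = distribution(node.left, instance);
            double[] r = distribution(node.right, instance);
            double[] mixed = new double[l.length];
            for (int i = 0; i < mixed.length; i++) {
                mixed[i] = node.leftFraction * l[i] + (1 - node.leftFraction) * r[i];
            }
            return mixed;
        }
        return distribution(value < node.threshold ? node.left : node.right, instance);
    }
}
```

Unlike surrogate splits, this approach needs no backup attribute at each node; it only needs the branch proportions observed during training.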
    • toString

      public String toString()
      Returns a textual description of the decision tree, built with the class's protected toString method.
      Overrides:
      toString in class Object
      Returns:
      a textual description of the classifier
    • numNodes

      public int numNodes()
      Compute size of the tree.
      Returns:
      size of the tree
    • numInnerNodes

      public int numInnerNodes()
      Method to count the number of inner nodes in the tree.
      Returns:
      the number of inner nodes
    • numLeaves

      public int numLeaves()
      Compute number of leaf nodes.
      Returns:
      number of leaf nodes
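The three counts above are related: every node is either an inner node or a leaf, so the tree size is the sum of the other two. The sketch below shows this on a hypothetical Node type; it illustrates the counting only and is not taken from SimpleCart.

```java
// Hypothetical tree type for illustration; not Weka's internal representation.
final class TreeSizeSketch {
    static final class Node {
        final Node[] children;

        Node(Node... children) {
            this.children = children;
        }

        boolean isLeaf() {
            return children.length == 0;
        }
    }

    // Counts nodes without children.
    static int numLeaves(Node node) {
        if (node.isLeaf()) {
            return 1;
        }
        int count = 0;
        for (Node c : node.children) {
            count += numLeaves(c);
        }
        return count;
    }

    // Counts nodes that have at least one child.
    static int numInnerNodes(Node node) {
        if (node.isLeaf()) {
            return 0;
        }
        int count = 1;
        for (Node c : node.children) {
            count += numInnerNodes(c);
        }
        return count;
    }

    // Tree size: inner nodes plus leaves.
    static int numNodes(Node node) {
        return numInnerNodes(node) + numLeaves(node);
    }
}
```

For a strictly binary tree this gives numNodes = 2 * numLeaves - 1.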
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableClassifier
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -S <num>
        Random number seed.
        (default 1)
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
       -M <min no>
        The minimal number of instances at the terminal nodes.
        (default 2)
       -N <num folds>
        The number of folds used in the minimal cost-complexity pruning.
        (default 5)
       -U
        Don't use the minimal cost-complexity pruning.
        (default: pruning is used).
       -H
        Don't use the heuristic method for binary split.
        (default: the heuristic is used).
       -A
        Use 1 SE rule to make pruning decision.
        (default no).
       -C
        Percentage of training data size (0-1].
        (default 1).
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableClassifier
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableClassifier
      Returns:
      the current setting of the classifier
    • enumerateMeasures

      public Enumeration enumerateMeasures()
      Return an enumeration of the measure names.
      Specified by:
      enumerateMeasures in interface AdditionalMeasureProducer
      Returns:
      an enumeration of the measure names
    • measureTreeSize

      public double measureTreeSize()
      Returns the size of the tree.
      Returns:
      the size of the tree
    • getMeasure

      public double getMeasure(String additionalMeasureName)
      Returns the value of the named measure.
      Specified by:
      getMeasure in interface AdditionalMeasureProducer
      Parameters:
      additionalMeasureName - the name of the measure to query for its value
      Returns:
      the value of the named measure
      Throws:
      IllegalArgumentException - if the named measure is not supported
    • minNumObjTipText

      public String minNumObjTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMinNumObj

      public void setMinNumObj(double value)
      Set minimal number of instances at the terminal nodes.
      Parameters:
      value - minimal number of instances at the terminal nodes
    • getMinNumObj

      public double getMinNumObj()
      Get minimal number of instances at the terminal nodes.
      Returns:
      minimal number of instances at the terminal nodes
    • numFoldsPruningTipText

      public String numFoldsPruningTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumFoldsPruning

      public void setNumFoldsPruning(int value)
      Set number of folds in internal cross-validation.
      Parameters:
      value - number of folds in internal cross-validation.
    • getNumFoldsPruning

      public int getNumFoldsPruning()
      Get number of folds in internal cross-validation.
      Returns:
      number of folds in internal cross-validation.
    • usePruneTipText

      public String usePruneTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui.
    • setUsePrune

      public void setUsePrune(boolean value)
      Set whether minimal cost-complexity pruning is used.
      Parameters:
      value - true to use minimal cost-complexity pruning
    • getUsePrune

      public boolean getUsePrune()
      Get whether minimal cost-complexity pruning is used.
      Returns:
      true if minimal cost-complexity pruning is used
    • heuristicTipText

      public String heuristicTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui.
    • setHeuristic

      public void setHeuristic(boolean value)
      Set whether heuristic search is used for nominal attributes in multi-class problems.
      Parameters:
      value - true to use heuristic search for nominal attributes in multi-class problems
    • getHeuristic

      public boolean getHeuristic()
      Get whether heuristic search is used for nominal attributes in multi-class problems.
      Returns:
      true if heuristic search is used for nominal attributes in multi-class problems
    • useOneSETipText

      public String useOneSETipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui.
    • setUseOneSE

      public void setUseOneSE(boolean value)
      Set whether the 1SE rule is used to choose the final model.
      Parameters:
      value - true to use the 1SE rule to choose the final model
    • getUseOneSE

      public boolean getUseOneSE()
      Get whether the 1SE rule is used to choose the final model.
      Returns:
      true if the 1SE rule is used to choose the final model
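The 1SE rule picks the most heavily pruned tree whose cross-validated error is within one standard error of the lowest error, instead of the tree with the lowest error itself. The sketch below illustrates that selection under stated assumptions: OneSeSketch and its parameters are hypothetical, errors holds error rates ordered from the largest tree to the smallest, and the standard error is taken from the binomial formula sqrt(e * (1 - e) / n). SimpleCart's own computation may differ in detail.

```java
// Hypothetical helper for illustration; not part of Weka.
final class OneSeSketch {
    // errors[i]: cross-validated error rate of the i-th tree in the pruning
    //            sequence, ordered from the largest tree to the smallest.
    // n:         number of instances the error rates were estimated on.
    // Returns the index of the selected tree.
    static int select(double[] errors, int n, boolean useOneSE) {
        int best = 0;
        for (int i = 1; i < errors.length; i++) {
            // "<=" lets ties go to the smaller (more pruned) tree
            if (errors[i] <= errors[best]) {
                best = i;
            }
        }
        if (!useOneSE) {
            return best;
        }
        double standardError = Math.sqrt(errors[best] * (1 - errors[best]) / n);
        double limit = errors[best] + standardError;
        int chosen = best;
        // Move towards smaller trees as long as they stay within the limit.
        for (int i = best + 1; i < errors.length; i++) {
            if (errors[i] <= limit) {
                chosen = i;
            }
        }
        return chosen;
    }
}
```

With error rates 0.30, 0.20, 0.22, 0.35 on 100 instances, the minimum is 0.20 and the standard error is 0.04, so the third tree (error 0.22) is chosen when the rule is on and the second when it is off.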
    • sizePerTipText

      public String sizePerTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui.
    • setSizePer

      public void setSizePer(double value)
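      Set the training set size as a proportion of the training data, in the range (0, 1].
      Parameters:
      value - proportion of the training data to use
    • getSizePer

      public double getSizePer()
      Get the training set size as a proportion of the training data.
      Returns:
      proportion of the training data used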
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method.
      Parameters:
      args - the options for the classifier