Package weka.classifiers.trees
Class BFTree
java.lang.Object
weka.classifiers.Classifier
weka.classifiers.RandomizableClassifier
weka.classifiers.trees.BFTree
- All Implemented Interfaces:
Serializable, Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler
public class BFTree
extends RandomizableClassifier
implements AdditionalMeasureProducer, TechnicalInformationHandler
Class for building a best-first decision tree classifier. This class uses binary split for both nominal and numeric attributes. For missing values, the method of 'fractional' instances is used.
For more information, see:
Haijian Shi (2007). Best-first decision tree learning. Hamilton, NZ.
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics. 28(2):337-407.
BibTeX:
@mastersthesis{Shi2007,
  address = {Hamilton, NZ},
  author = {Haijian Shi},
  note = {COMP594},
  school = {University of Waikato},
  title = {Best-first decision tree learning},
  year = {2007}
}
@article{Friedman2000,
  author = {Jerome Friedman and Trevor Hastie and Robert Tibshirani},
  journal = {Annals of Statistics},
  number = {2},
  pages = {337-407},
  title = {Additive logistic regression: A statistical view of boosting},
  volume = {28},
  year = {2000},
  ISSN = {0090-5364}
}
Valid options are:
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-P <UNPRUNED|POSTPRUNED|PREPRUNED> The pruning strategy. (default: POSTPRUNED)
-M <min no> The minimal number of instances at the terminal nodes. (default 2)
-N <num folds> The number of folds used in the pruning. (default 5)
-H Don't use heuristic search for nominal attributes in multi-class problems (default yes).
-G Don't use Gini index for splitting (default yes); if not, information is used.
-R Don't use error rate in internal cross-validation (default yes), but root mean squared error.
-A Use the 1 SE rule to make the pruning decision. (default no).
-C Percentage of training data size (0-1] (default 1).
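A minimal usage sketch, assuming a WEKA 3.x classpath and a hypothetical ARFF file named weather.arff: load the data, configure the tree via the setters documented below, build it, and print the resulting model.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.BFTree;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BFTreeDemo {
      public static void main(String[] args) throws Exception {
        // Load a dataset and use the last attribute as the class.
        Instances data = DataSource.read("weather.arff");  // hypothetical path
        data.setClassIndex(data.numAttributes() - 1);

        // Configure the best-first tree (equivalent to -M 2 -N 5).
        BFTree tree = new BFTree();
        tree.setMinNumObj(2);
        tree.setNumFoldsPruning(5);

        // Build the model and print the tree and its size.
        tree.buildClassifier(data);
        System.out.println(tree);
        System.out.println("leaves: " + tree.numLeaves() + ", nodes: " + tree.numNodes());

        // Optional: 10-fold cross-validation estimate of performance.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new BFTree(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
      }
    }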
- Version:
- $Revision: 6947 $
- Author:
- Haijian Shi (hs69@cs.waikato.ac.nz)
-
Field Summary
Fields
Modifier and Type    Field    Description
static final int    PRUNING_POSTPRUNING    pruning strategy: post-pruning
static final int    PRUNING_PREPRUNING    pruning strategy: pre-pruning
static final int    PRUNING_UNPRUNED    pruning strategy: un-pruned
static final Tag[]    TAGS_PRUNING    pruning strategy
-
Constructor Summary
Constructors
BFTree()
-
Method Summary
Modifier and Type    Method    Description
void    buildClassifier(Instances data)    Method for building a BestFirst decision tree classifier.
double[]    distributionForInstance(Instance instance)    Computes class probabilities for instance using the decision tree.
Enumeration    enumerateMeasures()    Return an enumeration of the measure names.
Capabilities    getCapabilities()    Returns default capabilities of the classifier.
boolean    getHeuristic()    Get if use heuristic search for nominal attributes in multi-class problems.
double    getMeasure(String additionalMeasureName)    Returns the value of the named measure.
int    getMinNumObj()    Get minimal number of instances at the terminal nodes.
int    getNumFoldsPruning()    Get number of folds in internal cross-validation.
String[]    getOptions()    Gets the current settings of the Classifier.
SelectedTag    getPruningStrategy()    Gets the pruning strategy.
String    getRevision()    Returns the revision string.
double    getSizePer()    Get training set size.
TechnicalInformation    getTechnicalInformation()    Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
boolean    getUseErrorRate()    Get if use error rate in internal cross-validation.
boolean    getUseGini()    Get if use Gini index as splitting criterion.
boolean    getUseOneSE()    Get if use the 1SE rule to choose final model.
String    globalInfo()    Returns a string describing the classifier.
String    heuristicTipText()    Returns the tip text for this property.
Enumeration    listOptions()    Returns an enumeration describing the available options.
static void    main(String[] args)    Main method.
double    measureTreeSize()    Returns the tree size.
String    minNumObjTipText()    Returns the tip text for this property.
String    numFoldsPruningTipText()    Returns the tip text for this property.
int    numLeaves()    Compute number of leaf nodes.
int    numNodes()    Compute size of the tree.
String    pruningStrategyTipText()    Returns the tip text for this property.
void    setHeuristic(boolean value)    Set if use heuristic search for nominal attributes in multi-class problems.
void    setMinNumObj(int value)    Set minimal number of instances at the terminal nodes.
void    setNumFoldsPruning(int value)    Set number of folds in internal cross-validation.
void    setOptions(String[] options)    Parses the options for this object.
void    setPruningStrategy(SelectedTag value)    Sets the pruning strategy.
void    setSizePer(double value)    Set training set size.
void    setUseErrorRate(boolean value)    Set if use error rate in internal cross-validation.
void    setUseGini(boolean value)    Set if use Gini index as splitting criterion.
void    setUseOneSE(boolean value)    Set if use the 1SE rule to choose final model.
String    sizePerTipText()    Returns the tip text for this property.
String    toString()    Prints the decision tree using the protected toString method from below.
String    useErrorRateTipText()    Returns the tip text for this property.
String    useGiniTipText()    Returns the tip text for this property.
String    useOneSETipText()    Returns the tip text for this property.
Methods inherited from class weka.classifiers.RandomizableClassifier
getSeed, seedTipText, setSeed
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
-
Field Details
-
PRUNING_UNPRUNED
public static final int PRUNING_UNPRUNED
pruning strategy: un-pruned
-
PRUNING_POSTPRUNING
public static final int PRUNING_POSTPRUNING
pruning strategy: post-pruning
-
PRUNING_PREPRUNING
public static final int PRUNING_PREPRUNING
pruning strategy: pre-pruning
-
TAGS_PRUNING
public static final Tag[] TAGS_PRUNING
pruning strategy
-
-
Constructor Details
-
BFTree
public BFTree()
-
-
Method Details
-
globalInfo
Returns a string describing the classifier.
- Returns:
- a description suitable for displaying in the explorer/experimenter gui
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
getCapabilities
Returns default capabilities of the classifier.
- Specified by:
getCapabilities in interface CapabilitiesHandler
- Overrides:
getCapabilities in class Classifier
- Returns:
- the capabilities of this classifier
-
buildClassifier
Method for building a BestFirst decision tree classifier.
- Specified by:
buildClassifier in class Classifier
- Parameters:
data - set of instances serving as training data
- Throws:
Exception
- if decision tree cannot be built successfully
-
distributionForInstance
Computes class probabilities for instance using the decision tree.
- Overrides:
distributionForInstance in class Classifier
- Parameters:
instance - the instance for which the class probabilities are to be computed
- Returns:
- the class probabilities for the given instance
- Throws:
Exception
- if something goes wrong
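A brief sketch of obtaining per-class probabilities for a single instance, assuming tree and data were prepared as in the build example above (instance 0 is chosen purely for illustration):

    // Class distribution for the first instance in the dataset.
    double[] dist = tree.distributionForInstance(data.instance(0));
    for (int i = 0; i < dist.length; i++) {
      System.out.println(data.classAttribute().value(i) + ": " + dist[i]);
    }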
-
toString
Prints the decision tree using the protected toString method from below.
-
numNodes
public int numNodes()
Compute size of the tree.
- Returns:
- size of the tree
-
numLeaves
public int numLeaves()
Compute number of leaf nodes.
- Returns:
- number of leaf nodes
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions in interface OptionHandler
- Overrides:
listOptions in class RandomizableClassifier
- Returns:
- an enumeration describing the available options.
-
setOptions
Parses the options for this object. Valid options are:
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-P <UNPRUNED|POSTPRUNED|PREPRUNED> The pruning strategy. (default: POSTPRUNED)
-M <min no> The minimal number of instances at the terminal nodes. (default 2)
-N <num folds> The number of folds used in the pruning. (default 5)
-H Don't use heuristic search for nominal attributes in multi-class problems (default yes).
-G Don't use Gini index for splitting (default yes); if not, information is used.
-R Don't use error rate in internal cross-validation (default yes), but root mean squared error.
-A Use the 1 SE rule to make the pruning decision. (default no).
-C Percentage of training data size (0-1] (default 1).
- Specified by:
setOptions in interface OptionHandler
- Overrides:
setOptions in class RandomizableClassifier
- Parameters:
options - the options to use
- Throws:
Exception
- if setting of options fails
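The documented flags can also be supplied programmatically; a small sketch using weka.core.Utils.splitOptions to turn an option string into the expected array (the chosen values are illustrative, and calls that throw Exception are assumed to run inside a method that declares it):

    import weka.core.Utils;

    BFTree tree = new BFTree();
    // -P pruning strategy, -M minimum instances per leaf, -N pruning folds, -S seed
    tree.setOptions(Utils.splitOptions("-P POSTPRUNED -M 2 -N 5 -S 1"));
    System.out.println(Utils.joinOptions(tree.getOptions()));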
-
getOptions
Gets the current settings of the Classifier.
- Specified by:
getOptions in interface OptionHandler
- Overrides:
getOptions in class RandomizableClassifier
- Returns:
- the current settings of the Classifier
-
enumerateMeasures
Return an enumeration of the measure names.
- Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
- Returns:
- an enumeration of the measure names
-
measureTreeSize
public double measureTreeSize()
Returns the tree size.
- Returns:
- the tree size
-
getMeasure
Returns the value of the named measure.
- Specified by:
getMeasure in interface AdditionalMeasureProducer
- Parameters:
additionalMeasureName - the name of the measure to query for its value
- Returns:
- the value of the named measure
- Throws:
IllegalArgumentException
- if the named measure is not supported
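A small sketch that queries the additional measures generically, iterating enumerateMeasures so that no measure name needs to be hard-coded (tree is an already built BFTree):

    import java.util.Enumeration;

    Enumeration names = tree.enumerateMeasures();
    while (names.hasMoreElements()) {
      String name = (String) names.nextElement();
      System.out.println(name + " = " + tree.getMeasure(name));
    }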
-
pruningStrategyTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setPruningStrategy
Sets the pruning strategy.
- Parameters:
value
- the strategy
-
getPruningStrategy
Gets the pruning strategy.
- Returns:
- the current strategy.
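A brief sketch of selecting a strategy via a SelectedTag built from the class constants (tree is an existing BFTree instance):

    import weka.core.SelectedTag;

    // Switch from the default post-pruning to pre-pruning.
    tree.setPruningStrategy(new SelectedTag(BFTree.PRUNING_PREPRUNING, BFTree.TAGS_PRUNING));
    // getSelectedTag().getID() returns the chosen constant (PRUNING_PREPRUNING here).
    int strategy = tree.getPruningStrategy().getSelectedTag().getID();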
-
minNumObjTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMinNumObj
public void setMinNumObj(int value)
Set minimal number of instances at the terminal nodes.
- Parameters:
value
- minimal number of instances at the terminal nodes
-
getMinNumObj
public int getMinNumObj()
Get minimal number of instances at the terminal nodes.
- Returns:
- minimal number of instances at the terminal nodes
-
numFoldsPruningTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setNumFoldsPruning
public void setNumFoldsPruning(int value)
Set number of folds in internal cross-validation.
- Parameters:
value
- the number of folds
-
getNumFoldsPruning
public int getNumFoldsPruning()
Get number of folds in internal cross-validation.
- Returns:
- number of folds in internal cross-validation
-
heuristicTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui.
-
setHeuristic
public void setHeuristic(boolean value)
Set if use heuristic search for nominal attributes in multi-class problems.
- Parameters:
value
- if use heuristic search for nominal attributes in multi-class problems
-
getHeuristic
public boolean getHeuristic()
Get if use heuristic search for nominal attributes in multi-class problems.
- Returns:
- if use heuristic search for nominal attributes in multi-class problems
-
useGiniTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui.
-
setUseGini
public void setUseGini(boolean value)
Set if use Gini index as splitting criterion.
- Parameters:
value
- if use Gini index splitting criterion
-
getUseGini
public boolean getUseGini()
Get if use Gini index as splitting criterion.
- Returns:
- if use Gini index as splitting criterion
-
useErrorRateTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui.
-
setUseErrorRate
public void setUseErrorRate(boolean value)
Set if use error rate in internal cross-validation.
- Parameters:
value
- if use error rate in internal cross-validation
-
getUseErrorRate
public boolean getUseErrorRate()
Get if use error rate in internal cross-validation.
- Returns:
- if use error rate in internal cross-validation.
-
useOneSETipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui.
-
setUseOneSE
public void setUseOneSE(boolean value)
Set if use the 1SE rule to choose final model.
- Parameters:
value
- if use the 1SE rule to choose final model
-
getUseOneSE
public boolean getUseOneSE()
Get if use the 1SE rule to choose final model.
- Returns:
- if use the 1SE rule to choose final model
-
sizePerTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui.
-
setSizePer
public void setSizePer(double value)
Set training set size.
- Parameters:
value
- training set size
-
getSizePer
public double getSizePer()
Get training set size.
- Returns:
- training set size
-
getRevision
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Overrides:
getRevision in class Classifier
- Returns:
- the revision
-
main
Main method.
- Parameters:
args
- the options for the classifier
-
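main typically runs the classifier through WEKA's standard command-line evaluation options, so the usual -t flag for the training file applies; a brief sketch with a hypothetical dataset path:

    // Equivalent to: java weka.classifiers.trees.BFTree -t weather.arff -P PREPRUNED
    BFTree.main(new String[]{"-t", "weather.arff", "-P", "PREPRUNED"});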