Package weka.classifiers.meta
Class ThresholdSelector
java.lang.Object
weka.classifiers.Classifier
weka.classifiers.SingleClassifierEnhancer
weka.classifiers.RandomizableSingleClassifierEnhancer
weka.classifiers.meta.ThresholdSelector
- All Implemented Interfaces:
Serializable, Cloneable, CapabilitiesHandler, Drawable, OptionHandler, Randomizable, RevisionHandler
public class ThresholdSelector
extends RandomizableSingleClassifierEnhancer
implements OptionHandler, Drawable
A metaclassifier that selects a mid-point threshold on the probability output by a Classifier. The mid-point threshold is set so that a given performance measure is optimized; by default this is the F-measure, and other measures can be chosen with -M. Performance is measured on the training data, on a hold-out set, or using cross-validation. In addition, the probabilities returned by the base learner can have their range expanded so that the output probabilities lie between 0 and 1 (this is useful if the scheme normally produces probabilities in a very narrow range).
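The selection step can be pictured as a sweep over candidate thresholds. The sketch below is a simplified illustration, not ThresholdSelector's source: it assumes the evaluation predictions are already available as designated-class probabilities paired with true labels, treats every observed probability as a candidate threshold, and keeps the one with the highest F-measure, 2TP / (2TP + FP + FN). The class ThresholdSweep is hypothetical.

```java
import java.util.Arrays;

/**
 * Simplified illustration (hypothetical class, not Weka source): pick the
 * probability threshold for the designated class that maximizes the F-measure
 * over a set of evaluation predictions.
 */
final class ThresholdSweep {

    /**
     * @param probs  predicted probability of the designated class, one per instance
     * @param actual true if the instance really belongs to the designated class
     * @return the candidate threshold with the highest F-measure
     */
    static double bestThreshold(double[] probs, boolean[] actual) {
        if (probs.length == 0 || probs.length != actual.length) {
            throw new IllegalArgumentException("need non-empty arrays of equal length");
        }
        // Every observed probability is a candidate; ascending order means that
        // ties in F-measure resolve to the lowest threshold.
        double[] candidates = probs.clone();
        Arrays.sort(candidates);

        double best = candidates[0];
        double bestF = -1.0;
        for (double t : candidates) {
            int tp = 0, fp = 0, fn = 0;
            for (int i = 0; i < probs.length; i++) {
                boolean predictedPositive = probs[i] >= t; // at or above the cut-off
                if (predictedPositive && actual[i]) {
                    tp++;
                } else if (predictedPositive) {
                    fp++;
                } else if (actual[i]) {
                    fn++;
                }
            }
            // F-measure = 2TP / (2TP + FP + FN), taken as 0 when undefined.
            int denom = 2 * tp + fp + fn;
            double f = denom == 0 ? 0.0 : (2.0 * tp) / denom;
            if (f > bestF) {
                bestF = f;
                best = t;
            }
        }
        return best;
    }
}
```

For example, with probabilities {0.1, 0.3, 0.35, 0.6, 0.8} and labels {false, false, true, true, true}, bestThreshold returns 0.35, the only cut that separates the two classes perfectly.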
Valid options are:
-C <integer> The class for which the threshold is determined. Valid values are: 1, 2 (for the first and second classes, respectively), 3 (for whichever class is least frequent), 4 (for whichever class value is most frequent), and 5 (for the first class named any of "yes", "pos(itive)", or "1", or method 3 if there are no matches). (default 5)
-X <number of folds> Number of folds used for cross-validation. If just a hold-out set is used, this determines the size of the hold-out set (default 3).
-R <integer> Sets whether confidence range correction is applied. This can be used to ensure the confidences range from 0 to 1. Use 0 for no range correction, 1 for correction based on the min/max values seen during threshold selection (default 0).
-E <integer> Sets the evaluation mode. Use 0 for evaluation using cross-validation, 1 for evaluation using a hold-out set, and 2 for evaluation on the training data (default 1).
-M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL] Measure used for evaluation (default is FMEASURE).
-manual <real> Set a manual threshold to use. This option overrides automatic selection, and options pertaining to automatic selection will be ignored. (default -1, i.e. do not use a manual threshold)
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console.
-W Full name of base classifier. (default: weka.classifiers.functions.Logistic)
Options specific to classifier weka.classifiers.functions.Logistic:
-D Turn on debugging output.
-R <ridge> Set the ridge in the log-likelihood.
-M <number> Set the maximum number of iterations (default -1, until convergence).
Options after -- are passed to the designated sub-classifier.
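Because -D, -R and -M mean different things to ThresholdSelector and to Logistic, the position of -- decides which classifier receives them. The hypothetical helper below only illustrates that convention (everything before -- stays with the metaclassifier, everything after it goes to the base classifier); it is not Weka's own option-parsing code.

```java
import java.util.Arrays;

/**
 * Hypothetical helper illustrating the "--" convention: tokens before "--"
 * belong to the metaclassifier, tokens after it to the base classifier.
 */
final class OptionSplitter {

    /** Returns {metaclassifierOptions, baseClassifierOptions}. */
    static String[][] split(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--")) {
                String[] own = Arrays.copyOfRange(args, 0, i);
                String[] base = Arrays.copyOfRange(args, i + 1, args.length);
                return new String[][] {own, base};
            }
        }
        // No "--" present: every token is meant for the metaclassifier.
        return new String[][] {args.clone(), new String[0]};
    }
}
```

For {"-E", "0", "-X", "5", "-W", "weka.classifiers.functions.Logistic", "--", "-R", "1.0E-8"}, split keeps the first six tokens for the metaclassifier and hands {"-R", "1.0E-8"} (a ridge value) to Logistic.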
- Version:
- $Revision: 1.43 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz)
-
Field Summary
Fields
Modifier and Type | Field | Description
static final int | ACCURACY | accuracy
static final int | EVAL_CROSS_VALIDATION | n-fold cross-validation
static final int | EVAL_TRAINING_SET | entire training set
static final int | EVAL_TUNED_SPLIT | single tuned fold
static final int | FMEASURE | F-measure
static final int | OPTIMIZE_0 | first class value
static final int | OPTIMIZE_1 | second class value
static final int | OPTIMIZE_LFREQ | least frequent class value
static final int | OPTIMIZE_MFREQ | most frequent class value
static final int | OPTIMIZE_POS_NAME | class value name, either 'yes' or 'pos(itive)'
static final int | PRECISION | precision
static final int | RANGE_BOUNDS | Correct based on min/max observed
static final int | RANGE_NONE | no range correction
static final int | RECALL | recall
static final Tag[] | TAGS_EVAL | The evaluation modes
static final Tag[] | TAGS_MEASURE | the measure to use
static final Tag[] | TAGS_OPTIMIZE | How to determine which class value to optimize for
static final Tag[] | TAGS_RANGE | Type of correction applied to threshold range
static final int | TP_RATE | true-positive rate
static final int | TRUE_NEG | true-negative
static final int | TRUE_POS | true-positive
Fields inherited from interface weka.core.Drawable
BayesNet, Newick, NOT_DRAWABLE, TREE
-
Constructor Summary
Constructors
Constructor | Description
ThresholdSelector() | Constructor.
-
Method Summary
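Modifier and Type | Method | Description
void | buildClassifier(Instances instances) | Generates the classifier.
String | designatedClassTipText() | Returns the tip text for this property.
double[] | distributionForInstance(Instance instance) | Calculates the class membership probabilities for the given test instance.
String | evaluationModeTipText() | Returns the tip text for this property.
Capabilities | getCapabilities() | Returns default capabilities of the classifier.
SelectedTag | getDesignatedClass() | Gets the method used to determine which class value to optimize.
SelectedTag | getEvaluationMode() | Gets the evaluation mode used.
double | getManualThresholdValue() | Returns the value of the manual threshold.
SelectedTag | getMeasure() | Gets the measure used for determining the threshold.
int | getNumXValFolds() | Gets the number of folds used for cross-validation.
String[] | getOptions() | Gets the current settings of the Classifier.
SelectedTag | getRangeCorrection() | Gets the confidence range correction mode used.
String | getRevision() | Returns the revision string.
String | globalInfo() | Returns a description of the classifier.
String | graph() | Returns a graph describing the classifier (if possible).
int | graphType() | Returns the type of graph this classifier represents.
Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(String[] argv) | Main method for testing this class.
String | manualThresholdValueTipText() | Returns the tip text for this property.
String | measureTipText() | Tooltip for this property.
String | numXValFoldsTipText() | Returns the tip text for this property.
String | rangeCorrectionTipText() | Returns the tip text for this property.
void | setDesignatedClass(SelectedTag newMethod) | Sets the method used to determine which class value to optimize.
void | setEvaluationMode(SelectedTag newMethod) | Sets the evaluation mode used.
void | setManualThresholdValue(double threshold) | Sets the value for a manual threshold.
void | setMeasure(SelectedTag newMeasure) | Sets the measure used for determining the threshold.
void | setNumXValFolds(int newNumFolds) | Sets the number of folds used for cross-validation.
void | setOptions(String[] options) | Parses a given list of options.
void | setRangeCorrection(SelectedTag newMethod) | Sets the confidence range correction mode used.
String | toString() | Returns a description of the cross-validated classifier.
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer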
getSeed, seedTipText, setSeed
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, setClassifier
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
-
Field Details
-
RANGE_NONE
public static final int RANGE_NONE
no range correction
-
RANGE_BOUNDS
public static final int RANGE_BOUNDS
Correct based on min/max observed
-
TAGS_RANGE
public static final Tag[] TAGS_RANGE
Type of correction applied to threshold range
-
EVAL_TRAINING_SET
public static final int EVAL_TRAINING_SET
entire training set
-
EVAL_TUNED_SPLIT
public static final int EVAL_TUNED_SPLIT
single tuned fold
-
EVAL_CROSS_VALIDATION
public static final int EVAL_CROSS_VALIDATION
n-fold cross-validation
-
TAGS_EVAL
public static final Tag[] TAGS_EVAL
The evaluation modes
-
OPTIMIZE_0
public static final int OPTIMIZE_0
first class value
-
OPTIMIZE_1
public static final int OPTIMIZE_1
second class value
-
OPTIMIZE_LFREQ
public static final int OPTIMIZE_LFREQ
least frequent class value
-
OPTIMIZE_MFREQ
public static final int OPTIMIZE_MFREQ
most frequent class value
-
OPTIMIZE_POS_NAME
public static final int OPTIMIZE_POS_NAME
class value name, either 'yes' or 'pos(itive)'
-
TAGS_OPTIMIZE
public static final Tag[] TAGS_OPTIMIZE
How to determine which class value to optimize for
-
FMEASURE
public static final int FMEASURE
F-measure
-
ACCURACY
public static final int ACCURACY
accuracy
-
TRUE_POS
public static final int TRUE_POS
true-positive
-
TRUE_NEG
public static final int TRUE_NEG
true-negative
-
TP_RATE
public static final int TP_RATE
true-positive rate
-
PRECISION
public static final int PRECISION
precision
-
RECALL
public static final int RECALL
recall
-
TAGS_MEASURE
public static final Tag[] TAGS_MEASURE
the measure to use
-
-
Constructor Details
-
ThresholdSelector
public ThresholdSelector()
Constructor.
-
-
Method Details
-
measureTipText
Tooltip for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setMeasure
Sets the measure used for determining the threshold.
- Parameters:
newMeasure
- Tag representing measure to be used
-
getMeasure
Gets the measure used for determining the threshold.
- Returns:
- Tag representing measure used
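The measures selectable with setMeasure (and -M) can be written in terms of confusion-matrix counts for the designated class. The enum below uses the textbook definitions and assumes that TRUE_POS and TRUE_NEG denote raw counts; the name ThresholdMeasure is hypothetical, and Weka's internal computation may differ in detail. Under these definitions TP_RATE and RECALL coincide.

```java
/**
 * Illustrative sketch (hypothetical enum): standard definitions of the
 * measures named by -M, computed from the confusion-matrix counts of the
 * designated (positive) class.
 */
enum ThresholdMeasure {
    FMEASURE, ACCURACY, TRUE_POS, TRUE_NEG, TP_RATE, PRECISION, RECALL;

    /** Evaluates this measure; a ratio with a zero denominator is reported as 0. */
    double compute(int tp, int fp, int tn, int fn) {
        switch (this) {
            case FMEASURE:  return ratio(2.0 * tp, 2.0 * tp + fp + fn);
            case ACCURACY:  return ratio(tp + tn, tp + fp + tn + fn);
            case TRUE_POS:  return tp; // assumed to be a raw count
            case TRUE_NEG:  return tn; // assumed to be a raw count
            case TP_RATE:   // same formula as RECALL under the textbook definitions
            case RECALL:    return ratio(tp, tp + fn);
            case PRECISION: return ratio(tp, tp + fp);
            default:        throw new AssertionError(this);
        }
    }

    private static double ratio(double num, double den) {
        return den == 0 ? 0.0 : num / den;
    }
}
```

In this sketch, ThresholdMeasure.valueOf("PRECISION") turns an -M argument into a constant; with 8 true positives, 2 false positives, 85 true negatives and 5 false negatives, PRECISION gives 0.8 and ACCURACY gives 0.93.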
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
listOptions
in interface OptionHandler
- Overrides:
listOptions
in class RandomizableSingleClassifierEnhancer
- Returns:
- an enumeration of all the available options.
-
setOptions
Parses a given list of options. Valid options are:
-C <integer> The class for which the threshold is determined. Valid values are: 1, 2 (for the first and second classes, respectively), 3 (for whichever class is least frequent), 4 (for whichever class value is most frequent), and 5 (for the first class named any of "yes", "pos(itive)", or "1", or method 3 if there are no matches). (default 5)
-X <number of folds> Number of folds used for cross-validation. If just a hold-out set is used, this determines the size of the hold-out set (default 3).
-R <integer> Sets whether confidence range correction is applied. This can be used to ensure the confidences range from 0 to 1. Use 0 for no range correction, 1 for correction based on the min/max values seen during threshold selection (default 0).
-E <integer> Sets the evaluation mode. Use 0 for evaluation using cross-validation, 1 for evaluation using a hold-out set, and 2 for evaluation on the training data (default 1).
-M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL] Measure used for evaluation (default is FMEASURE).
-manual <real> Set a manual threshold to use. This option overrides automatic selection, and options pertaining to automatic selection will be ignored. (default -1, i.e. do not use a manual threshold)
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console.
-W Full name of base classifier. (default: weka.classifiers.functions.Logistic)
Options specific to classifier weka.classifiers.functions.Logistic:
-D Turn on debugging output.
-R <ridge> Set the ridge in the log-likelihood.
-M <number> Set the maximum number of iterations (default -1, until convergence).
Options after -- are passed to the designated sub-classifier.
- Specified by:
setOptions
in interface OptionHandler
- Overrides:
setOptions
in class RandomizableSingleClassifierEnhancer
- Parameters:
options
- the list of options as an array of strings
- Throws:
Exception
- if an option is not supported
-
getOptions
Gets the current settings of the Classifier.
- Specified by:
getOptions
in interface OptionHandler
- Overrides:
getOptions
in class RandomizableSingleClassifierEnhancer
- Returns:
- an array of strings suitable for passing to setOptions
-
getCapabilities
Returns default capabilities of the classifier.
- Specified by:
getCapabilities
in interface CapabilitiesHandler
- Overrides:
getCapabilities
in class SingleClassifierEnhancer
- Returns:
- the capabilities of this classifier
-
buildClassifier
Generates the classifier.
- Specified by:
buildClassifier
in class Classifier
- Parameters:
instances
- set of instances serving as training data
- Throws:
Exception
- if the classifier has not been generated successfully
-
distributionForInstance
Calculates the class membership probabilities for the given test instance.
- Overrides:
distributionForInstance
in class Classifier
- Parameters:
instance
- the instance to be classified
- Returns:
- predicted class probability distribution
- Throws:
Exception
- if instance could not be classified successfully
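One way to read "mid-point threshold" is that the selected threshold is mapped to 0.5, so that the usual highest-probability decision reproduces the tuned cut-off. The sketch below assumes exactly that: a two-class distribution, no range correction, and a piecewise-linear rescaling that sends [0, threshold] to [0, 0.5] and (threshold, 1] to (0.5, 1]. It illustrates the idea under those assumptions and is not the library's distributionForInstance; MidpointWarp is a hypothetical name.

```java
/**
 * Illustrative sketch (hypothetical class, not Weka source): rescale the
 * designated class probability piecewise-linearly so that the selected
 * threshold maps to 0.5, then rebuild a two-class distribution.
 */
final class MidpointWarp {

    /**
     * @param dist       base classifier distribution over exactly two classes
     * @param designated index (0 or 1) of the designated class
     * @param threshold  selected threshold, assumed strictly between 0 and 1
     * @return the warped two-class distribution
     */
    static double[] warp(double[] dist, int designated, double threshold) {
        if (dist.length != 2 || designated < 0 || designated > 1
                || threshold <= 0.0 || threshold >= 1.0) {
            throw new IllegalArgumentException("two classes and 0 < threshold < 1 expected");
        }
        double p = dist[designated];
        double q;
        if (p > threshold) {
            // (threshold, 1] is stretched onto (0.5, 1]
            q = 0.5 + (p - threshold) / (2.0 * (1.0 - threshold));
        } else {
            // [0, threshold] is stretched onto [0, 0.5]
            q = p / (2.0 * threshold);
        }
        q = Math.max(0.0, Math.min(1.0, q)); // guard against inputs outside [0, 1]
        double[] out = new double[2];
        out[designated] = q;
        out[1 - designated] = 1.0 - q;
        return out;
    }
}
```

With a threshold of 0.2, a designated-class probability of 0.3 becomes 0.5625, so the designated class now has the larger probability even though the base classifier gave it only 0.3.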
-
globalInfo
- Returns:
- a description of the classifier suitable for displaying in the explorer/experimenter gui
-
designatedClassTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getDesignatedClass
Gets the method used to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, or OPTIMIZE_POS_NAME.
- Returns:
- the class selection mode.
-
setDesignatedClass
Sets the method used to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, or OPTIMIZE_POS_NAME.
- Parameters:
newMethod
- the new class selection mode.
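The designated-class modes (-C 1 to 5) can be sketched as a small resolver. Several details here are assumptions rather than documented behavior: ties between equal class counts go to the first class, name matching is case-insensitive, and "pos(itive)" is taken to mean either "pos" or "positive". DesignatedClass and its Mode values are hypothetical stand-ins for the OPTIMIZE_* constants.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of how the OPTIMIZE_* modes could pick a class index. */
final class DesignatedClass {

    /** Stand-ins for OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME. */
    enum Mode { FIRST, SECOND, LEAST_FREQUENT, MOST_FREQUENT, POSITIVE_NAME }

    /**
     * @param classNames names of the two class values
     * @param counts     training-set frequency of each class value
     * @return index of the class whose threshold is optimized
     */
    static int resolve(Mode mode, List<String> classNames, int[] counts) {
        switch (mode) {
            case FIRST:          return 0;
            case SECOND:         return 1;
            case LEAST_FREQUENT: return counts[0] <= counts[1] ? 0 : 1; // tie goes to first class
            case MOST_FREQUENT:  return counts[0] >= counts[1] ? 0 : 1; // tie goes to first class
            case POSITIVE_NAME:
                for (int i = 0; i < classNames.size(); i++) {
                    String n = classNames.get(i).toLowerCase(Locale.ROOT);
                    if (n.equals("yes") || n.equals("pos") || n.equals("positive") || n.equals("1")) {
                        return i;
                    }
                }
                // No positive-looking name: fall back to the least frequent class (method 3).
                return resolve(Mode.LEAST_FREQUENT, classNames, counts);
            default:
                throw new AssertionError(mode);
        }
    }
}
```

For class names {"no", "yes"} the POSITIVE_NAME mode picks index 1; for names without a match, such as {"a", "b"} with counts {30, 70}, it falls back to the less frequent class at index 0.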
-
evaluationModeTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setEvaluationMode
Sets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
- Parameters:
newMethod
- the new evaluation mode.
-
getEvaluationMode
Gets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
- Returns:
- the evaluation mode.
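For EVAL_TUNED_SPLIT, the -X value sets the size of the hold-out set. The sketch below assumes the hold-out set is one fold's worth of data (numInstances / numFolds instances, rounded down) drawn after a seeded shuffle, with the seed standing in for -S; whether the library also stratifies the split is not stated on this page, so this version does not. HoldOutSplit is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a seeded hold-out split sized by the number of folds. */
final class HoldOutSplit {

    /** Returns {trainIndices, holdOutIndices}; the hold-out part gets numInstances / numFolds indices. */
    static List<List<Integer>> split(int numInstances, int numFolds, long seed) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException("need 2 <= numFolds <= numInstances");
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, new Random(seed)); // the seed plays the role of -S
        int holdOutSize = numInstances / numFolds;
        List<List<Integer>> result = new ArrayList<>();
        result.add(new ArrayList<>(idx.subList(holdOutSize, numInstances))); // training part
        result.add(new ArrayList<>(idx.subList(0, holdOutSize)));           // hold-out part
        return result;
    }
}
```

With 30 instances and the default of 3 folds, 10 instances are held out for threshold selection and 20 are used for training.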
-
rangeCorrectionTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setRangeCorrection
Sets the confidence range correction mode used. Will be either RANGE_NONE or RANGE_BOUNDS.
- Parameters:
newMethod
- the new correction mode.
-
getRangeCorrection
Gets the confidence range correction mode used. Will be either RANGE_NONE or RANGE_BOUNDS.
- Returns:
- the confidence correction mode.
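Range correction (-R 1, RANGE_BOUNDS) stretches a narrow band of confidences onto the full 0 to 1 interval using the minimum and maximum values seen during threshold selection. A minimal min-max rescaling sketch follows; clipping values that fall outside the observed range is an assumption, and the class and its local constants are hypothetical (their values mirror the -R option values and are not taken from the library).

```java
/** Hypothetical sketch of min/max confidence range correction. */
final class RangeCorrection {
    static final int RANGE_NONE = 0;   // mirrors -R 0
    static final int RANGE_BOUNDS = 1; // mirrors -R 1

    /**
     * Rescales p from the observed [min, max] interval onto [0, 1] when mode is
     * RANGE_BOUNDS; any other mode, or a degenerate range, leaves p unchanged.
     */
    static double correct(double p, double min, double max, int mode) {
        if (mode != RANGE_BOUNDS || max <= min) {
            return p;
        }
        double q = (p - min) / (max - min);
        return Math.max(0.0, Math.min(1.0, q)); // clip values outside the observed range
    }
}
```

With observed bounds 0.4 and 0.6, a confidence of 0.45 becomes 0.25 under RANGE_BOUNDS, while RANGE_NONE leaves it at 0.45.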
-
numXValFoldsTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
getNumXValFolds
public int getNumXValFolds()
Gets the number of folds used for cross-validation.
- Returns:
- the number of folds used for cross-validation.
-
setNumXValFolds
public void setNumXValFolds(int newNumFolds)
Sets the number of folds used for cross-validation.
- Parameters:
newNumFolds
- the number of folds used for cross-validation.
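For EVAL_CROSS_VALIDATION, numXValFolds controls how many folds the data is split into. A minimal, unstratified fold assignment is sketched below: indices are shuffled with a seed (standing in for -S) and dealt round-robin, so every instance falls in exactly one fold and fold sizes differ by at most one. FoldAssignment is a hypothetical name, and the library's own fold construction may differ, for example by stratifying on the class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of seeded, unstratified k-fold assignment. */
final class FoldAssignment {

    /** Returns numFolds lists of instance indices whose sizes differ by at most one. */
    static List<List<Integer>> folds(int numInstances, int numFolds, long seed) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException("need 2 <= numFolds <= numInstances");
        }
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            idx.add(i);
        }
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin so every instance lands in exactly one fold.
        for (int i = 0; i < idx.size(); i++) {
            folds.get(i % numFolds).add(idx.get(i));
        }
        return folds;
    }
}
```

For 10 instances and 3 folds the fold sizes are 4, 3 and 3.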
-
graphType
public int graphType()
Returns the type of graph this classifier represents.
-
graph
Returns a graph describing the classifier (if possible).
-
manualThresholdValueTipText
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter gui
-
setManualThresholdValue
Sets the value for a manual threshold. If this option is set (a non-negative value between 0 and 1), options pertaining to automatic threshold selection are ignored.
- Parameters:
threshold
- the manual threshold to use
- Throws:
Exception
-
getManualThresholdValue
public double getManualThresholdValue()
Returns the value of the manual threshold (a negative value indicates that no manual threshold is being used).
- Returns:
- the value of the manual threshold.
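The manual-threshold contract described above (a value in [0, 1] overrides automatic selection, a negative value such as the default -1 disables it) can be modeled as below. Rejecting values above 1 with an exception, and predicting the designated class when its probability is at least the threshold, are assumptions of this sketch; ManualThreshold is hypothetical and not part of Weka.

```java
/**
 * Hypothetical model of the manual-threshold contract: a value in [0, 1]
 * overrides automatic selection, a negative value (default -1) disables it.
 */
final class ManualThreshold {
    private double value = -1.0; // default: no manual threshold

    /** Rejecting values above 1 is an assumption made for this sketch. */
    void set(double threshold) {
        if (threshold > 1.0) {
            throw new IllegalArgumentException("manual threshold must lie in [0, 1]");
        }
        value = threshold;
    }

    double get() {
        return value;
    }

    boolean isManual() {
        return value >= 0.0;
    }

    /**
     * Decides for the designated class, using the manual threshold when one is
     * set and the automatically selected one otherwise (">=" is an assumption).
     */
    boolean predictsDesignated(double probability, double automaticThreshold) {
        double t = isManual() ? value : automaticThreshold;
        return probability >= t;
    }
}
```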
-
toString
Returns a description of the cross-validated classifier.
-
getRevision
Returns the revision string.
- Specified by:
getRevision
in interface RevisionHandler
- Overrides:
getRevision
in class Classifier
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
argv
- the options
-