Class SubsetSizeForwardSelection

java.lang.Object
weka.attributeSelection.ASSearch
weka.attributeSelection.SubsetSizeForwardSelection
All Implemented Interfaces:
Serializable, OptionHandler, RevisionHandler

public class SubsetSizeForwardSelection extends ASSearch implements OptionHandler
SubsetSizeForwardSelection:

Extension of LinearForwardSelection. The search performs an internal cross-validation (the seed and the number of folds can be specified). A LinearForwardSelection is performed on each fold to determine the optimal subset size (using the given SubsetSizeEvaluator). Finally, a LinearForwardSelection up to the optimal subset size is performed on the whole data.
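The size-determination step can be pictured with a small, self-contained sketch. It assumes that each fold yields one merit value per subset size and that the size with the highest mean merit across folds is chosen. The class name `SubsetSizeSketch`, the method `bestSubsetSize`, and the averaging and tie-breaking rules are illustrative assumptions, not Weka's actual code.

```java
class SubsetSizeSketch {

    /**
     * Picks a subset size from per-fold merits (hypothetical helper).
     * meritsPerFold[f][s] is the merit observed in fold f for a subset of size s + 1.
     * Assumption: the size with the highest mean merit wins; ties go to the smaller size.
     */
    static int bestSubsetSize(double[][] meritsPerFold) {
        int numSizes = meritsPerFold[0].length;
        int bestSize = 1;
        double bestMean = Double.NEGATIVE_INFINITY;
        for (int s = 0; s < numSizes; s++) {
            double sum = 0.0;
            for (double[] fold : meritsPerFold) {
                sum += fold[s];
            }
            double mean = sum / meritsPerFold.length;
            if (mean > bestMean) { // strict '>' keeps the smaller size on ties
                bestMean = mean;
                bestSize = s + 1;
            }
        }
        return bestSize;
    }

    public static void main(String[] args) {
        // Made-up merits for 3 folds and subset sizes 1..4.
        double[][] merits = {
            {0.60, 0.72, 0.75, 0.74},
            {0.58, 0.70, 0.71, 0.73},
            {0.61, 0.74, 0.76, 0.72}
        };
        // prints "Chosen subset size: 3"
        System.out.println("Chosen subset size: " + bestSubsetSize(merits));
    }
}
```

In this toy input, size 3 has the highest mean merit (0.74), so the final selection on the whole data would stop after three attributes.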

For more information see:

Martin Guetlein (2006). Large Scale Attribute Selection Using Wrappers. Freiburg, Germany.

Valid options are:

 -I
  Perform initial ranking to select the
  top-ranked attributes.
 
 -K <num>
  Number of top-ranked attributes that are 
  taken into account by the search.
 
 -T <0 = fixed-set | 1 = fixed-width>
  Type of Linear Forward Selection (default = 0).
 
 -S <num>
  Size of lookup cache for evaluated subsets.
  Expressed as a multiple of the number of
  attributes in the data set. (default = 1)
 
 -E <subset evaluator>
  Subset-evaluator used for subset-size determination.
  Place any evaluator options LAST on the command line
  following a "--".
 
 -F <num>
  Number of cross validation folds
  for subset size determination (default = 5).
 
 -R <num>
  Seed for cross validation
  subset size determination. (default = 1)
 
 -Z
  verbose on/off
 
 Options specific to evaluator weka.attributeSelection.ClassifierSubsetEval:
 
 -B <classifier>
  class name of the classifier to use for accuracy estimation.
  Place any classifier options LAST on the command line
  following a "--". eg.:
   -B weka.classifiers.bayes.NaiveBayes ... -- -K
  (default: weka.classifiers.rules.ZeroR)
 
 -T
  Use the training data to estimate accuracy.
 
 -H <filename>
  Name of the hold out/test set to 
  estimate accuracy on.
 
 Options specific to scheme weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 
Version:
$Revision: 11198 $
Author:
Martin Guetlein (martin.guetlein@gmail.com)
  • Field Details

    • TAGS_TYPE

      public static final Tag[] TAGS_TYPE
  • Constructor Details

    • SubsetSizeForwardSelection

      public SubsetSizeForwardSelection()
      Constructor
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this search method
      Returns:
      a description of the search method suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options. Valid options are:

      -I
      Perform initial ranking to select top-ranked attributes.

      -K
      Number of top-ranked attributes that are taken into account.

      -T <0 = fixed-set | 1 = fixed-width>
      Type of Linear Forward Selection (default = 0).

      -S
      Size of lookup cache for evaluated subsets. Expressed as a multiple of the number of attributes in the data set. (default = 1).

      -E
      Class name of the subset evaluator to use for subset size determination (default = null, in which case the same subset evaluator as for ranking and final forward selection is used). Place any evaluator options LAST on the command line following a "--", e.g. -E weka.attributeSelection.ClassifierSubsetEval ... -- -M

      -F
      Number of cross validation folds for subset size determination (default = 5).

      -R
      Seed for cross validation subset size determination. (default = 1)

      -Z
      verbose on/off.

      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
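As an illustration of the option layout described above, the following sketch assembles an argument array in plain Java. The class `SubsetSizeOptionsSketch` and its `build` method are hypothetical helpers, and the values used in `main` (50 attributes, NaiveBayes as the classifier) are example choices only. The resulting array has the shape that `setOptions(String[])` is documented to accept, with the evaluator's own options placed last after `--`.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetSizeOptionsSketch {

    /** Builds an option array (hypothetical helper); evaluatorOptions go last, after "--". */
    static String[] build(int numUsedAttributes, int folds, int seed,
                          String evaluatorClass, String... evaluatorOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-I");                                   // perform initial ranking
        opts.add("-K"); opts.add(String.valueOf(numUsedAttributes));
        opts.add("-T"); opts.add("0");                    // 0 = fixed-set
        opts.add("-F"); opts.add(String.valueOf(folds));  // cross validation folds
        opts.add("-R"); opts.add(String.valueOf(seed));   // cross validation seed
        if (evaluatorClass != null) {
            opts.add("-E"); opts.add(evaluatorClass);
            if (evaluatorOptions.length > 0) {
                opts.add("--");                           // evaluator options follow
                for (String o : evaluatorOptions) {
                    opts.add(o);
                }
            }
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = build(50, 5, 1,
            "weka.attributeSelection.ClassifierSubsetEval",
            "-B", "weka.classifiers.bayes.NaiveBayes");
        System.out.println(String.join(" ", options));
    }
}
```

Leaving out the evaluator class corresponds to the documented default, where the evaluator used for ranking and final selection also determines the subset size.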
    • setLookupCacheSize

      public void setLookupCacheSize(int size)
      Set the maximum size of the evaluated subset cache (hashtable). This is expressed as a multiplier for the number of attributes in the data set. (default = 1).
      Parameters:
      size - the maximum size of the hashtable
    • getLookupCacheSize

      public int getLookupCacheSize()
      Return the maximum size of the evaluated subset cache (expressed as a multiplier for the number of attributes in a data set).
      Returns:
      the maximum size of the hashtable.
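The multiplier semantics can be made concrete with a small sketch: the entry limit is the multiplier times the number of attributes. The class `SubsetCacheSketch`, the use of `BitSet` keys, and the oldest-first eviction are assumptions made for illustration and are not taken from Weka's source.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class SubsetCacheSketch {
    private final int maxEntries;
    private final Map<BitSet, Double> merits;

    SubsetCacheSketch(int cacheSizeMultiplier, int numAttributes) {
        // Limit is expressed relative to the number of attributes.
        this.maxEntries = cacheSizeMultiplier * numAttributes;
        // Insertion-ordered map that drops its oldest entry once the limit is exceeded
        // (the eviction policy is an assumption of this sketch).
        this.merits = new LinkedHashMap<BitSet, Double>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<BitSet, Double> eldest) {
                return size() > maxEntries;
            }
        };
    }

    void put(BitSet subset, double merit) {
        // Clone so that later changes to the caller's BitSet cannot corrupt the key.
        merits.put((BitSet) subset.clone(), merit);
    }

    Double get(BitSet subset) {
        return merits.get(subset); // null if the subset was never cached or was evicted
    }

    int size() {
        return merits.size();
    }

    int maxEntries() {
        return maxEntries;
    }
}
```

With a multiplier of 1 and a data set of 3 attributes, such a cache would hold at most 3 evaluated subsets.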
    • lookupCacheSizeTipText

      public String lookupCacheSizeTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • performRankingTipText

      public String performRankingTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setPerformRanking

      public void setPerformRanking(boolean b)
      Perform initial ranking to select top-ranked attributes.
      Parameters:
      b - true if initial ranking should be performed
    • getPerformRanking

      public boolean getPerformRanking()
      Get whether initial ranking should be performed to select the top-ranked attributes.
      Returns:
      true if initial ranking should be performed
    • numUsedAttributesTipText

      public String numUsedAttributesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumUsedAttributes

      public void setNumUsedAttributes(int k) throws Exception
      Set the number of top-ranked attributes that are taken into account by the search process.
      Parameters:
      k - the number of attributes
      Throws:
      Exception - if k is less than 2
    • getNumUsedAttributes

      public int getNumUsedAttributes()
      Get the number of top-ranked attributes that are taken into account by the search process.
      Returns:
      the number of top-ranked attributes that are taken into account
    • typeTipText

      public String typeTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setType

      public void setType(SelectedTag t)
      Set the type of Linear Forward Selection.
      Parameters:
      t - the Linear Forward Selection type
    • getType

      public SelectedTag getType()
      Get the type of Linear Forward Selection.
      Returns:
      the Linear Forward Selection type
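The text above names the two types (0 = fixed-set, 1 = fixed-width) without defining them. The sketch below assumes the usual reading of Linear Forward Selection: with fixed-set, only the initial top-k ranked attributes are ever candidates; with fixed-width, the candidate pool is refilled from the ranking so that up to k candidates remain at each step. This reading, the class `LfsTypeSketch`, and the method `candidates` are assumptions for illustration, not Weka's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LfsTypeSketch {
    static final int TYPE_FIXED_SET = 0;
    static final int TYPE_FIXED_WIDTH = 1;

    /**
     * Returns the attributes that may be added in the next step (hypothetical helper).
     * ranking: attribute indexes, best first; selected: attributes already chosen;
     * k: number of top-ranked attributes taken into account.
     */
    static List<Integer> candidates(List<Integer> ranking, Set<Integer> selected,
                                    int k, int type) {
        List<Integer> result = new ArrayList<>();
        if (type == TYPE_FIXED_SET) {
            // Fixed-set: only the initial top-k attributes are ever considered.
            int limit = Math.min(k, ranking.size());
            for (int i = 0; i < limit; i++) {
                if (!selected.contains(ranking.get(i))) {
                    result.add(ranking.get(i));
                }
            }
        } else {
            // Fixed-width: walk down the ranking until k unselected attributes are found.
            for (int attr : ranking) {
                if (result.size() == k) {
                    break;
                }
                if (!selected.contains(attr)) {
                    result.add(attr);
                }
            }
        }
        return result;
    }
}
```

For a ranking of 7, 3, 9, 1, 4 with k = 3 and attribute 3 already selected, the fixed-set pool shrinks to 7 and 9, while the fixed-width pool is topped up to 7, 9 and 1.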
    • subsetSizeEvaluatorTipText

      public String subsetSizeEvaluatorTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSubsetSizeEvaluator

      public void setSubsetSizeEvaluator(ASEvaluation eval) throws Exception
      Set the subset evaluator to use for subset size determination.
      Parameters:
      eval - the subset evaluator to use for subset size determination.
      Throws:
      Exception
    • getSubsetSizeEvaluator

      public ASEvaluation getSubsetSizeEvaluator()
      Get the subset evaluator used for subset size determination.
      Returns:
      the evaluator used for subset size determination.
    • numSubsetSizeCVFoldsTipText

      public String numSubsetSizeCVFoldsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNumSubsetSizeCVFolds

      public void setNumSubsetSizeCVFolds(int f)
      Set the number of cross validation folds for subset size determination (default = 5).
      Parameters:
      f - number of folds
    • getNumSubsetSizeCVFolds

      public int getNumSubsetSizeCVFolds()
      Get the number of cross validation folds for subset size determination (default = 5).
      Returns:
      number of folds
    • seedTipText

      public String seedTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSeed

      public void setSeed(int s)
      Set the seed for cross validation subset size determination (default = 1).
      Parameters:
      s - seed
    • getSeed

      public int getSeed()
      Get the seed for cross validation subset size determination (default = 1).
      Returns:
      seed
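To show how the fold count and the seed interact, the sketch below shuffles instance indexes with a seeded random generator and deals them into folds. The class `SeededFoldsSketch`, the method `folds`, and the round-robin assignment are assumptions for illustration; Weka's own fold construction may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeededFoldsSketch {

    /** Shuffles instance indexes with the seed, then deals them into folds round-robin. */
    static List<List<Integer>> folds(int numInstances, int numFolds, int seed) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indexes.add(i);
        }
        // The same seed always produces the same shuffle, so the folds are reproducible.
        Collections.shuffle(indexes, new Random(seed));

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < indexes.size(); i++) {
            result.get(i % numFolds).add(indexes.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults from the documentation: 5 folds, seed 1; 12 instances is an example value.
        System.out.println(folds(12, 5, 1));
    }
}
```

Repeating the call with the same seed gives identical folds, which is why fixing the seed makes the chosen subset size reproducible on the same data.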
    • verboseTipText

      public String verboseTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setVerbose

      public void setVerbose(boolean b)
      Set whether verbose output should be generated.
      Parameters:
      b - true if output is to be verbose.
    • getVerbose

      public boolean getVerbose()
      Get whether output is to be verbose
      Returns:
      true if output will be verbose
    • getOptions

      public String[] getOptions()
      Gets the current settings of SubsetSizeForwardSelection.
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      an array of strings suitable for passing to setOptions()
    • toString

      public String toString()
      Returns a description of the search as a String.
      Overrides:
      toString in class Object
      Returns:
      a description of the search
    • search

      public int[] search(ASEvaluation ASEval, Instances data) throws Exception
      Searches the attribute subset space by subset size forward selection
      Specified by:
      search in class ASSearch
      Parameters:
      ASEval - the attribute evaluator to guide the search
      data - the training instances.
      Returns:
      an array (not necessarily ordered) of selected attribute indexes
      Throws:
      Exception - if the search can't be completed
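The core idea of a forward selection that stops at a given subset size can be sketched without any Weka classes. The class `ForwardSelectionSketch`, the method `select`, and the merit function passed as a `ToDoubleFunction` are hypothetical stand-ins for the evaluator; the sketch omits the initial ranking, the lookup cache, and the cross-validated size determination described for this class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

class ForwardSelectionSketch {

    /**
     * Greedily adds the attribute that yields the highest merit until
     * maxSize attributes are chosen (or no attributes are left).
     */
    static int[] select(int numAttributes, int maxSize,
                        ToDoubleFunction<List<Integer>> merit) {
        List<Integer> selected = new ArrayList<>();
        int target = Math.min(maxSize, numAttributes);
        while (selected.size() < target) {
            int bestAttr = -1;
            double bestMerit = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) {
                    continue;
                }
                selected.add(a);                       // tentatively extend the subset
                double m = merit.applyAsDouble(selected);
                selected.remove(selected.size() - 1);  // undo the tentative extension
                if (m > bestMerit) {
                    bestMerit = m;
                    bestAttr = a;
                }
            }
            if (bestAttr < 0) {
                break; // no attribute produced a comparable merit (e.g. NaN)
            }
            selected.add(bestAttr);
        }
        int[] result = new int[selected.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = selected.get(i);
        }
        return result;
    }
}
```

The returned indexes appear in the order in which they were added, not in ascending order, which matches the "not necessarily ordered" wording of the return value.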
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class ASSearch
      Returns:
      the revision