Class TextDirectoryLoader

java.lang.Object
weka.core.converters.AbstractLoader
weka.core.converters.TextDirectoryLoader
All Implemented Interfaces:
Serializable, BatchConverter, Loader, OptionHandler, RevisionHandler

public class TextDirectoryLoader extends AbstractLoader implements BatchConverter, OptionHandler
Loads all text files from the subdirectories of a directory and uses the subdirectory names as class labels. The content of each text file is stored in a String attribute; optionally, the filename can be stored as well.
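The expected layout (for example root/pos/a.txt and root/neg/b.txt, giving the labels "pos" and "neg") can be modelled with the standard library alone. This is a hypothetical sketch, not Weka's implementation. LayoutSketch and Row are illustrative names. We also make three assumptions that are not documented behaviour: files are decoded as UTF-8, only regular files directly inside each immediate subdirectory are read, and labels are sorted so that their order is stable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical, stdlib-only model of the layout TextDirectoryLoader reads.
// NOT Weka's code: subdirectory name = class label, file content = text,
// file name kept as an optional extra column.
final class LayoutSketch {

    /** One document: its class label, its file name and its raw text. */
    static final class Row {
        final String label;
        final String filename;
        final String text;

        Row(String label, String filename, String text) {
            this.label = label;
            this.filename = filename;
            this.text = text;
        }
    }

    /** Immediate subdirectories of root, sorted; their names act as class labels. */
    static List<String> labels(Path root) {
        try (Stream<Path> entries = Files.list(root)) {
            return entries.filter(Files::isDirectory)          // plain files in root are ignored
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads every regular file in each label directory into a Row (assumed UTF-8). */
    static List<Row> scan(Path root) {
        List<Row> rows = new ArrayList<>();
        for (String label : labels(root)) {
            try (Stream<Path> files = Files.list(root.resolve(label))) {
                List<Path> sorted = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
                for (Path f : sorted) {
                    String text = new String(Files.readAllBytes(f), StandardCharsets.UTF_8);
                    rows.add(new Row(label, f.getFileName().toString(), text));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return rows;
    }
}
```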

Valid options are:

 -D
  Enables debug output.
  (default: off)
 -F
  Stores the filename in an additional attribute.
  (default: off)
 -dir <directory>
  The directory to work on.
  (default: current directory)
Based on code from the TextDirectoryToArff tool:
Version:
$Revision: 11199 $
Author:
Ashraf M. Kibriya (amk14 at cs.waikato.ac.nz), Richard Kirkby (rkirkby at cs.waikato.ac.nz), fracpete (fracpete at waikato dot ac dot nz)
See Also:
  • Constructor Details

    • TextDirectoryLoader

      public TextDirectoryLoader()
      Default constructor.
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this loader.
      Returns:
      a description of the loader suitable for displaying in the Explorer/Experimenter GUI
    • listOptions

      public Enumeration listOptions()
      Lists the available options
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -D
        Enables debug output.
        (default: off)
       -F
        Stores the filename in an additional attribute.
        (default: off)
       -dir <directory>
        The directory to work on.
        (default: current directory)
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the options
      Throws:
      Exception - if options cannot be set
    • getOptions

      public String[] getOptions()
      Gets the current settings of the loader.
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      the current settings as an array of command-line options
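Below is a minimal sketch of configuring the loader through setOptions and reading the settings back with getOptions. It assumes weka.jar is on the classpath, and OptionsExample is a hypothetical name. Only the documented flags -F and -dir are used. The directory passed with -dir is expected to exist, because setDirectory reports problems with the source directory as an IOException.

```java
import java.io.File;
import weka.core.Utils;
import weka.core.converters.TextDirectoryLoader;

// Hypothetical helper (not part of Weka): sets options, then reads them back.
final class OptionsExample {

    static String[] roundTrip(File dir) throws Exception {
        TextDirectoryLoader loader = new TextDirectoryLoader();
        // -F: also store the filename; -dir: directory whose subdirectories hold the texts
        loader.setOptions(new String[] {"-F", "-dir", dir.getAbsolutePath()});
        return loader.getOptions();
    }

    /** The same settings as one command-line string, via weka.core.Utils.joinOptions. */
    static String roundTripAsString(File dir) throws Exception {
        return Utils.joinOptions(roundTrip(dir));
    }
}
```

Because debug output (-D) was not requested, it should not appear in the returned options.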
    • charSetTipText

      public String charSetTipText()
      Returns the tip text for the charSet property, for display in the GUI.
      Returns:
      the tip text
    • setCharSet

      public void setCharSet(String charSet)
      Set the character set to use when reading text files (an empty string indicates that the default character set will be used).
      Parameters:
      charSet - the character set to use.
    • getCharSet

      public String getCharSet()
      Get the character set to use when reading text files. An empty string indicates that the default character set will be used.
      Returns:
      the character set name to use (or empty string to indicate that the default character set will be used).
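The empty-string convention can be illustrated with the standard library alone. This is a hypothetical sketch, not Weka's code. CharSetSketch is an illustrative name. We also assume that "the default character set" means the JVM default returned by Charset.defaultCharset().

```java
import java.nio.charset.Charset;

// Hypothetical helper illustrating the setCharSet/getCharSet convention:
// an empty name means "use the default", anything else is looked up by name.
final class CharSetSketch {

    static Charset resolve(String charSetName) {
        if (charSetName == null || charSetName.isEmpty()) {
            return Charset.defaultCharset();   // assumed meaning of "default character set"
        }
        return Charset.forName(charSetName);   // throws for unknown or unsupported names
    }

    /** Decodes raw file bytes with the resolved character set. */
    static String decode(byte[] raw, String charSetName) {
        return new String(raw, resolve(charSetName));
    }
}
```

Choosing the wrong character set does not fail; it silently produces different text. In the test below, the UTF-8 bytes of "café" decode to five characters under ISO-8859-1.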
    • setDebug

      public void setDebug(boolean value)
      Sets whether to print some debug information.
      Parameters:
      value - if true additional debug information will be printed.
    • getDebug

      public boolean getDebug()
      Gets whether additional debug information is printed.
      Returns:
      true if additional debug information is printed
    • debugTipText

      public String debugTipText()
      Returns the tip text for the debug property, for display in the GUI.
      Returns:
      the tip text
    • setOutputFilename

      public void setOutputFilename(boolean value)
      Sets whether the filename will be stored as an extra attribute.
      Parameters:
      value - if true the filename will be stored in an extra attribute
    • getOutputFilename

      public boolean getOutputFilename()
      Gets whether the filename will be stored as an extra attribute.
      Returns:
      true if the filename is stored in an extra attribute
    • outputFilenameTipText

      public String outputFilenameTipText()
      the tip text for this property
      Returns:
      the tip text
    • getFileDescription

      public String getFileDescription()
      Returns a description of the file type; for this loader the source is a directory rather than a file.
      Returns:
      a short file description
    • getDirectory

      public File getDirectory()
      Gets the directory specified as the source.
      Returns:
      the source directory
    • setDirectory

      public void setDirectory(File dir) throws IOException
      Sets the source directory.
      Parameters:
      dir - the source directory
      Throws:
      IOException - if an error occurs
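Below is a minimal sketch of checking a path before loading, assuming weka.jar is on the classpath. DirectoryCheck is a hypothetical name. The sketch also assumes that a missing path, or a regular file instead of a directory, is one of the "errors" that makes setDirectory throw its documented IOException.

```java
import java.io.File;
import java.io.IOException;
import weka.core.converters.TextDirectoryLoader;

// Hypothetical helper (not part of Weka): does the loader accept this path as its source?
final class DirectoryCheck {

    static boolean accepts(File candidate) {
        TextDirectoryLoader loader = new TextDirectoryLoader();
        try {
            loader.setDirectory(candidate); // documented to throw IOException if an error occurs
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```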
    • reset

      public void reset()
      Resets the loader so that it is ready to read a new data set.
      Specified by:
      reset in interface Loader
      Overrides:
      reset in class AbstractLoader
    • setSource

      public void setSource(File dir) throws IOException
      Resets the Loader object and sets the source of the data set to be the supplied File object.
      Specified by:
      setSource in interface Loader
      Overrides:
      setSource in class AbstractLoader
      Parameters:
      dir - the source directory.
      Throws:
      IOException - if an error occurs
    • getStructure

      public Instances getStructure() throws IOException
      Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
      Specified by:
      getStructure in interface Loader
      Specified by:
      getStructure in class AbstractLoader
      Returns:
      the structure of the data set as an empty set of Instances
      Throws:
      IOException - if an error occurs
    • getDataSet

      public Instances getDataSet() throws IOException
      Returns the full data set. If the structure has not yet been determined by a call to getStructure, this method does so before processing the rest of the data set.
      Specified by:
      getDataSet in interface Loader
      Specified by:
      getDataSet in class AbstractLoader
      Returns:
      the full data set as a set of Instances
      Throws:
      IOException - if there is no source or parsing fails
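Below is a minimal batch-loading sketch. It assumes weka.jar is on the classpath and a layout such as root/pos/*.txt and root/neg/*.txt. LoadExample is a hypothetical name. The fallback that sets the class index to the last attribute is our own defensive assumption. It only applies if the returned data set has no class index set.

```java
import java.io.File;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;

// Hypothetical helper (not part of Weka): loads a whole text directory in one call.
final class LoadExample {

    static Instances load(File root, boolean storeFilename) throws Exception {
        TextDirectoryLoader loader = new TextDirectoryLoader();
        loader.setOutputFilename(storeFilename); // adds an extra filename attribute when true
        loader.setDirectory(root);               // throws IOException for an unusable directory
        Instances data = loader.getDataSet();    // determines the structure first if needed
        // Assumption: the class (subdirectory name) is the last attribute.
        if (data.classIndex() < 0) {
            data.setClassIndex(data.numAttributes() - 1);
        }
        return data;
    }
}
```

With storeFilename set to true, the data set should have exactly one attribute more than without it.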
    • getNextInstance

      public Instance getNextInstance(Instances structure) throws IOException
      TextDirectoryLoader is unable to process a data set incrementally.
      Specified by:
      getNextInstance in interface Loader
      Specified by:
      getNextInstance in class AbstractLoader
      Parameters:
      structure - ignored
      Returns:
      never returns without throwing an exception
      Throws:
      IOException - always. TextDirectoryLoader is unable to process a data set incrementally.
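Generic code that handles several Weka loaders sometimes probes for incremental support before reading. With this loader the probe is expected to fail, so callers should use getDataSet instead. Below is a minimal sketch, assuming weka.jar is on the classpath. IncrementalProbe is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import weka.core.Instances;
import weka.core.converters.TextDirectoryLoader;

// Hypothetical helper (not part of Weka): checks whether getNextInstance can be used.
final class IncrementalProbe {

    static boolean supportsIncremental(File root) throws Exception {
        TextDirectoryLoader loader = new TextDirectoryLoader();
        loader.setDirectory(root);
        Instances header = loader.getStructure(); // header only, no instances
        try {
            loader.getNextInstance(header);
            return true;
        } catch (IOException e) {
            return false; // documented to happen always for TextDirectoryLoader
        }
    }
}
```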
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method.
      Parameters:
      args - the command-line options (see setOptions), for example -dir <directory>.