Package weka.clusterers
Class XMeans
java.lang.Object
weka.clusterers.AbstractClusterer
weka.clusterers.RandomizableClusterer
weka.clusterers.XMeans
- All Implemented Interfaces:
Serializable, Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler
Cluster data using the X-means algorithm.
X-Means is K-Means extended by an Improve-Structure part. In this part of the algorithm, each center is tentatively split within its region. The decision between keeping a center and replacing it with its children is made by comparing the BIC values of the two structures.
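The BIC comparison described above can be illustrated with a small stand-alone sketch. It is not Weka's code: the class `BicSketch`, the method `bic`, the pooled per-dimension variance estimate, and the parameter count are assumptions chosen for illustration, using a hard-assignment spherical-Gaussian model in the spirit of the cited paper. The exact terms used inside XMeans may differ.

```java
/** Hypothetical helper, not part of Weka: BIC of a hard-assignment, spherical-Gaussian clustering. */
public class BicSketch {

    /**
     * @param clusters one array of points per cluster; every point has the same dimensionality
     * @return the BIC score (with this sign convention, higher is better)
     */
    public static double bic(double[][][] clusters) {
        int k = clusters.length;
        int dims = clusters[0][0].length;
        int total = 0;
        double sumSq = 0.0;
        for (double[][] cluster : clusters) {
            total += cluster.length;
            // centroid of this cluster
            double[] mean = new double[dims];
            for (double[] p : cluster) {
                for (int d = 0; d < dims; d++) {
                    mean[d] += p[d] / cluster.length;
                }
            }
            // squared distances of the points to their centroid
            for (double[] p : cluster) {
                for (int d = 0; d < dims; d++) {
                    sumSq += (p[d] - mean[d]) * (p[d] - mean[d]);
                }
            }
        }
        // Assumption: one pooled per-dimension variance; requires total > k and sumSq > 0.
        double variance = sumSq / (dims * (double) (total - k));
        double logLik = 0.0;
        for (double[][] cluster : clusters) {
            // mixing-weight term: R_n * log(R_n / R)
            logLik += cluster.length * Math.log((double) cluster.length / total);
        }
        logLik -= 0.5 * total * dims * Math.log(2 * Math.PI * variance);
        logLik -= sumSq / (2 * variance);
        // Assumption: (k - 1) mixing weights, k * dims centroid coordinates, 1 variance.
        int freeParams = (k - 1) + k * dims + 1;
        return logLik - 0.5 * freeParams * Math.log(total);
    }

    public static void main(String[] args) {
        double[][] left = {{0.0}, {0.1}, {-0.1}};
        double[][] right = {{10.0}, {10.1}, {9.9}};
        double[][] all = {{0.0}, {0.1}, {-0.1}, {10.0}, {10.1}, {9.9}};
        double parent = bic(new double[][][] {all});
        double children = bic(new double[][][] {left, right});
        // The two groups are well separated, so the two-center structure scores higher.
        System.out.println(children > parent ? "split" : "keep"); // prints "split"
    }
}
```

In this sketch a split is accepted when the two-center structure has the higher score. How ties and degenerate regions (fewer points than centers, zero variance) are treated is left open here.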
For more information see:
Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000. BibTeX:
@inproceedings{Pelleg2000,
  author = {Dan Pelleg and Andrew W. Moore},
  booktitle = {Seventeenth International Conference on Machine Learning},
  pages = {727-734},
  publisher = {Morgan Kaufmann},
  title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
  year = {2000}
}
Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
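As a rough illustration of how the flags above fit together, the following sketch assembles an option array in plain Java. The class `XMeansOptionsSketch` and its `buildOptions` method are hypothetical helpers, not part of Weka, and the range check is an assumption added for the example. Only the flags `-L`, `-H`, `-S` and `-use-kdtree` from the list above are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka) that assembles an option array for XMeans. */
public class XMeansOptionsSketch {

    public static String[] buildOptions(int minClusters, int maxClusters, int seed, boolean useKdTree) {
        // Assumption for this sketch: the cluster bounds must form a sensible range.
        if (minClusters < 1 || maxClusters < minClusters) {
            throw new IllegalArgumentException("need 1 <= minClusters <= maxClusters");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-L");
        opts.add(Integer.toString(minClusters));
        opts.add("-H");
        opts.add(Integer.toString(maxClusters));
        opts.add("-S");
        opts.add(Integer.toString(seed));
        if (useKdTree) {
            opts.add("-use-kdtree"); // flag without a value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(2, 8, 42, false);
        System.out.println(String.join(" ", options)); // prints "-L 2 -H 8 -S 42"
        // With Weka on the classpath, the array could then be handed to the clusterer:
        //   XMeans xmeans = new XMeans();
        //   xmeans.setOptions(options);
    }
}
```

Each flag and its value occupy separate array elements, which is the form setOptions(String[] options) takes according to this page.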
- Version:
- $Revision: 9986 $
- Author:
- Gabi Schmidberger (gabi@cs.waikato.ac.nz), Mark Hall (mhall@cs.waikato.ac.nz), Malcolm Ware (mfw4@cs.waikato.ac.nz)
-
Field Summary
Fields
Modifier and Type  Field            Description
static int         D_CONVCHCLOSER   have a closer look at converge children.
static int         D_CURR           for current debug.
static int         D_FOLLOWSPLIT    follows the splitting of the centers.
static int         D_GENERAL        general debugging.
static int         D_ITERCOUNT      follow iterations.
static int         D_KDTREE         check on kdtree.
static int         D_METH_MISUSE    functions were maybe misused.
static int         D_PRINTCENTERS   print the centers.
static int         D_RANDOMVECTOR   check on random vectors.
boolean            m_CurrDebugFlag  Flag: I'm debugging.
static int         R_HIGH           Index in ranges for HIGH.
static int         R_LOW            Index in ranges for LOW.
static int         R_WIDTH          Index in ranges for WIDTH.
-
Constructor Summary
Constructors
Constructor  Description
XMeans()     The default constructor.
-
Method Summary
Modifier and Type     Method                                        Description
String                binValueTipText()                             Returns the tip text for this property.
void                  buildClusterer(Instances data)                Generates the X-Means clusterer.
boolean               checkForNominalAttributes(Instances data)     Checks for nominal attributes in the dataset.
int                   clusterInstance(Instance instance)            Assigns a given instance to a cluster.
String                cutOffFactorTipText()                         Returns the tip text for this property.
String                debugLevelTipText()                           Returns the tip text for this property.
String                debugVectorsFileTipText()                     Returns the tip text for this property.
String                distanceFTipText()                            Returns the tip text for this property.
double                getBinValue()                                 Gets the value that represents true in a new numeric attribute.
Capabilities          getCapabilities()                             Returns default capabilities of the clusterer.
Instances             getClusterCenters()                           Returns the centers of the clusters as an Instances object.
double                getCutOffFactor()                             Gets the cutoff factor.
int                   getDebugLevel()                               Gets the debug level.
File                  getDebugVectorsFile()                         Gets the file that has the random vectors stored.
DistanceFunction      getDistanceF()                                Gets the distance function.
File                  getInputCenterFile()                          Gets the file to read the list of centers from.
KDTree                getKDTree()                                   Gets the KDTree class.
int                   getMaxIterations()                            Gets the maximum number of iterations.
int                   getMaxKMeans()                                Gets the maximum number of iterations in KMeans.
int                   getMaxKMeansForChildren()                     Gets the maximum number of KMeans iterations performed on the child centers.
int                   getMaxNumClusters()                           Gets the maximum number of clusters to generate.
int                   getMinNumClusters()                           Gets the minimum number of clusters to generate.
Instance              getNextDebugVectorsInstance(Instances model)  Reads an instance from the debug vectors file.
String[]              getOptions()                                  Gets the current settings of XMeans.
File                  getOutputCenterFile()                         Gets the file to write the list of centers to.
String                getRevision()                                 Returns the revision string.
TechnicalInformation  getTechnicalInformation()                     Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
boolean               getUseKDTree()                                Gets whether the KDTree is used or not.
String                globalInfo()                                  Returns a string describing this clusterer.
void                  initDebugVectorsInput()                       Initialises the debug vector input.
String                inputCenterFileTipText()                      Returns the tip text for this property.
String                KDTreeTipText()                               Returns the tip text for this property.
Enumeration           listOptions()                                 Returns an enumeration describing the available options.
static void           main(String[] argv)                           Main method for testing this class.
String                maxIterationsTipText()                        Returns the tip text for this property.
String                maxKMeansForChildrenTipText()                 Returns the tip text for this property.
String                maxKMeansTipText()                            Returns the tip text for this property.
String                maxNumClustersTipText()                       Returns the tip text for this property.
String                minNumClustersTipText()                       Returns the tip text for this property.
int                   numberOfClusters()                            Returns the number of clusters.
String                outputCenterFileTipText()                     Returns the tip text for this property.
void                  setBinValue(double value)                     Sets the distance value between true and false of binary attributes.
void                  setCutOffFactor(double i)                     Sets a new cutoff factor.
void                  setDebugLevel(int d)                          Sets the debug level.
void                  setDebugVectorsFile(File value)               Sets the file that has the random vectors stored.
void                  setDistanceF(DistanceFunction distanceF)      Sets the distance function.
void                  setInputCenterFile(File value)                Sets the file to read the list of centers from.
void                  setKDTree(KDTree k)                           Sets the KDTree class.
void                  setMaxIterations(int i)                       Sets the maximum number of iterations to perform.
void                  setMaxKMeans(int i)                           Sets the maximum number of iterations to perform in KMeans.
void                  setMaxKMeansForChildren(int i)                Sets the maximum number of KMeans iterations performed on the child centers.
void                  setMaxNumClusters(int n)                      Sets the maximum number of clusters to generate.
void                  setMinNumClusters(int n)                      Sets the minimum number of clusters to generate.
void                  setOptions(String[] options)                  Parses a given list of options.
void                  setOutputCenterFile(File value)               Sets the file to write the list of centers to.
void                  setUseKDTree(boolean value)                   Sets whether to use the KDTree or not.
String                toString()                                    Returns a string describing this clusterer.
String                useKDTreeTipText()                            Returns the tip text for this property.
Methods inherited from class weka.clusterers.RandomizableClusterer
getSeed, seedTipText, setSeed
Methods inherited from class weka.clusterers.AbstractClusterer
distributionForInstance, forName, makeCopies, makeCopy
-
Field Details
-
R_LOW
public static int R_LOW
Index in ranges for LOW.
-
R_HIGH
public static int R_HIGH
Index in ranges for HIGH.
-
R_WIDTH
public static int R_WIDTH
Index in ranges for WIDTH.
-
D_PRINTCENTERS
public static int D_PRINTCENTERS
print the centers.
-
D_FOLLOWSPLIT
public static int D_FOLLOWSPLIT
follows the splitting of the centers.
-
D_CONVCHCLOSER
public static int D_CONVCHCLOSER
have a closer look at converge children.
-
D_RANDOMVECTOR
public static int D_RANDOMVECTOR
check on random vectors.
-
D_KDTREE
public static int D_KDTREE
check on kdtree.
-
D_ITERCOUNT
public static int D_ITERCOUNT
follow iterations.
-
D_METH_MISUSE
public static int D_METH_MISUSE
functions were maybe misused.
-
D_CURR
public static int D_CURR
for current debug.
-
D_GENERAL
public static int D_GENERAL
general debugging.
-
m_CurrDebugFlag
public boolean m_CurrDebugFlag
Flag: I'm debugging.
-
-
Constructor Details
-
XMeans
public XMeans()
The default constructor.
-
-
Method Details
-
globalInfo
Returns a string describing this clusterer.
- Returns:
- a description of the clusterer suitable for displaying in the explorer/experimenter GUI
-
getTechnicalInformation
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
- Specified by:
- getTechnicalInformation in interface TechnicalInformationHandler
- Returns:
- the technical information about this class
-
getCapabilities
Returns default capabilities of the clusterer.
- Specified by:
- getCapabilities in interface CapabilitiesHandler
- Specified by:
- getCapabilities in interface Clusterer
- Overrides:
- getCapabilities in class AbstractClusterer
- Returns:
- the capabilities of this clusterer
-
buildClusterer
Generates the X-Means clusterer.
- Specified by:
- buildClusterer in interface Clusterer
- Specified by:
- buildClusterer in class AbstractClusterer
- Parameters:
- data - set of instances serving as training data
- Throws:
- Exception - if the clusterer has not been generated successfully
-
checkForNominalAttributes
Checks for nominal attributes in the dataset. The class attribute is ignored.
- Parameters:
- data - the data to check
- Returns:
- false if no nominal attributes are present
-
clusterInstance
Assigns a given instance to a cluster.
- Specified by:
- clusterInstance in interface Clusterer
- Overrides:
- clusterInstance in class AbstractClusterer
- Parameters:
- instance - the instance to be assigned to a cluster
- Returns:
- the number of the assigned cluster as an integer
- Throws:
- Exception - if the instance could not be clustered successfully
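Conceptually, assigning an instance to a cluster amounts to a nearest-center lookup. The sketch below shows that idea in plain Java with an unnormalised squared Euclidean distance. The class `NearestCenterSketch` and its method are hypothetical and are not Weka's implementation. The configured distance function (option -D) may treat attributes differently, for example by normalising their ranges.

```java
/** Hypothetical helper, not part of Weka: nearest-center assignment with squared Euclidean distance. */
public class NearestCenterSketch {

    /** Returns the index of the center closest to the point, or -1 if there are no centers. */
    public static int nearestCenter(double[] point, double[][] centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff; // squared distance keeps the same ordering as the distance itself
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(nearestCenter(new double[] {9.0, 8.0}, centers)); // prints 1
    }
}
```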
-
numberOfClusters
public int numberOfClusters()
Returns the number of clusters.
- Specified by:
- numberOfClusters in interface Clusterer
- Specified by:
- numberOfClusters in class AbstractClusterer
- Returns:
- the number of clusters generated for a training dataset
-
listOptions
Returns an enumeration describing the available options.
- Specified by:
- listOptions in interface OptionHandler
- Overrides:
- listOptions in class RandomizableClusterer
- Returns:
- an enumeration of all the available options
-
minNumClustersTipText
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMinNumClusters
public void setMinNumClusters(int n)
Sets the minimum number of clusters to generate.
- Parameters:
- n - the minimum number of clusters to generate
-
getMinNumClusters
public int getMinNumClusters()
Gets the minimum number of clusters to generate.
- Returns:
- the minimum number of clusters to generate
-
maxNumClustersTipText
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxNumClusters
public void setMaxNumClusters(int n)
Sets the maximum number of clusters to generate.
- Parameters:
- n - the maximum number of clusters to generate
-
getMaxNumClusters
public int getMaxNumClusters()
Gets the maximum number of clusters to generate.
- Returns:
- the maximum number of clusters to generate
-
maxIterationsTipText
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxIterations
Sets the maximum number of iterations to perform.
- Parameters:
- i - the number of iterations
- Throws:
- Exception - if i is less than 1
-
getMaxIterations
public int getMaxIterations()
Gets the maximum number of iterations.
- Returns:
- the number of iterations
-
maxKMeansTipText
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxKMeans
public void setMaxKMeans(int i)
Sets the maximum number of iterations to perform in KMeans.
- Parameters:
- i - the number of iterations
-
getMaxKMeans
public int getMaxKMeans()
Gets the maximum number of iterations in KMeans.
- Returns:
- the number of iterations
-
maxKMeansForChildrenTipText
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setMaxKMeansForChildren
public void setMaxKMeansForChildren(int i)
Sets the maximum number of KMeans iterations performed on the child centers.
- Parameters:
- i - the number of iterations
-
getMaxKMeansForChildren
public int getMaxKMeansForChildren()
Gets the maximum number of KMeans iterations performed on the child centers.
- Returns:
- the number of iterations
-
cutOffFactorTipText
Returns the tip text for this property.
- Returns:
- tip text for this property
-
setCutOffFactor
public void setCutOffFactor(double i)
Sets a new cutoff factor.
- Parameters:
- i - the new cutoff factor
-
getCutOffFactor
public double getCutOffFactor()
Gets the cutoff factor.
- Returns:
- the cutoff factor
-
binValueTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
getBinValue
public double getBinValue()
Gets the value that represents true in a new numeric attribute. (False is always represented by 0.0.)
- Returns:
- the value that represents true in a new numeric attribute
-
setBinValue
public void setBinValue(double value)
Sets the distance value between true and false of binary attributes, and between "same" and "different" of nominal attributes.
- Parameters:
- value - the distance
-
distanceFTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDistanceF
Sets the distance function.
- Parameters:
- distanceF - the distance function with all options set
-
getDistanceF
Gets the distance function.
- Returns:
- the distance function
-
debugVectorsFileTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDebugVectorsFile
Sets the file that has the random vectors stored. Only used for debugging purposes.
- Parameters:
- value - the file to read the random vectors from
-
getDebugVectorsFile
Gets the file that has the random vectors stored. Only used for debugging purposes.
- Returns:
- the file to read the vectors from
-
initDebugVectorsInput
Initialises the debug vector input.
- Throws:
- Exception - if there is an error opening the debug input file
-
getNextDebugVectorsInstance
Reads an instance from the debug vectors file.
- Parameters:
- model - the data model for the instance
- Returns:
- the next debug vector
- Throws:
- Exception - if there are no debug vectors in m_DebugVectors
-
inputCenterFileTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setInputCenterFile
Sets the file to read the list of centers from.
- Parameters:
- value - the file to read centers from
-
getInputCenterFile
Gets the file to read the list of centers from.
- Returns:
- the file to read the centers from
-
outputCenterFileTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setOutputCenterFile
Sets the file to write the list of centers to.
- Parameters:
- value - the file to write centers to
-
getOutputCenterFile
Gets the file to write the list of centers to.
- Returns:
- the file to write centers to
-
KDTreeTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setKDTree
Sets the KDTree class.
- Parameters:
- k - a KDTree object with all options set
-
getKDTree
Gets the KDTree class.
- Returns:
- the configured KDTree
-
useKDTreeTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setUseKDTree
public void setUseKDTree(boolean value)
Sets whether to use the KDTree or not.
- Parameters:
- value - if true the KDTree is used
-
getUseKDTree
public boolean getUseKDTree()
Gets whether the KDTree is used or not.
- Returns:
- true if KDTrees are used
-
debugLevelTipText
Returns the tip text for this property.
- Returns:
- tip text for this property suitable for displaying in the explorer/experimenter GUI
-
setDebugLevel
public void setDebugLevel(int d)
Sets the debug level. A debug level of 0 means no output.
- Parameters:
- d - the debug level
-
getDebugLevel
public int getDebugLevel()
Gets the debug level.
- Returns:
- the debug level
-
setOptions
Parses a given list of options. Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
- Specified by:
- setOptions in interface OptionHandler
- Overrides:
- setOptions in class RandomizableClusterer
- Parameters:
- options - the list of options as an array of strings
- Throws:
- Exception - if an option is not supported
-
getOptions
Gets the current settings of XMeans.
- Specified by:
- getOptions in interface OptionHandler
- Overrides:
- getOptions in class RandomizableClusterer
- Returns:
- an array of strings suitable for passing to setOptions
-
toString
Returns a string describing this clusterer.
-
getClusterCenters
Returns the centers of the clusters as an Instances object.
- Returns:
- the cluster centers
-
getRevision
Returns the revision string.
- Specified by:
- getRevision in interface RevisionHandler
- Overrides:
- getRevision in class AbstractClusterer
- Returns:
- the revision
-
main
Main method for testing this class.
- Parameters:
- argv - should contain options
-