Asking about the general approach to researching classification algorithms
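Many papers include plots comparing experimental results, for example comparisons between algorithms, or results with unlabeled samples versus without them. At first I assumed everyone built their own experimental system in a language such as C++ or Java and drew the figures themselves. Later, after learning about MATLAB, I figured the experiments were probably coded with that tool.
I am a complete novice at both classification and MATLAB, so I am confused about how to run algorithm experiments in MATLAB. The more confused I get, the more anxious I become, and the less I know what to do. I hope you can advise me on how to go about running experiments.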
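Right now I want to study graph-based semi-supervised learning algorithms to help with classification. My current plan is to implement the algorithm in MATLAB first, then port it to C++ and build a separate experimental system.
That said, my understanding of graph-based algorithms is not deep either, so I would appreciate some pointers.
I can hardly describe my problem clearly anymore, but I think a single sentence from one of you might point me to a clear path. Thank you all!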
[Last edited by yaosoong on 2007-12-25 11:11]
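For readers who are equally new to graph-based semi-supervised learning, here is a minimal sketch of one common method from that family, iterative label propagation with clamped labels. It is written in plain Python rather than MATLAB or C++ only to keep it short and dependency-free. The function names, the Gaussian similarity, the `sigma` and `n_iter` parameters, and the toy data are all assumptions made for illustration, not something taken from this thread or from any particular paper's experimental setup.

```python
import math


def rbf_weights(points, sigma=1.0):
    """Build a dense similarity graph: W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def label_propagation(points, labels, n_classes, sigma=1.0, n_iter=200):
    """labels[i] is a class index for labeled points and None for unlabeled ones.

    Returns a predicted class index for every point.
    """
    n = len(points)
    w = rbf_weights(points, sigma)

    # Row-normalise the weights so each row is a probability distribution.
    p = []
    for row in w:
        s = sum(row)
        p.append([v / s for v in row] if s > 0 else row)

    # Score matrix F: one-hot rows for labeled points, zeros for unlabeled ones.
    f = [[0.0] * n_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            f[i][y] = 1.0

    for _ in range(n_iter):
        # Each point takes the weighted average of its neighbours' scores.
        new_f = [
            [sum(p[i][j] * f[j][c] for j in range(n)) for c in range(n_classes)]
            for i in range(n)
        ]
        # Clamp: labeled points keep their known labels.
        for i, y in enumerate(labels):
            if y is not None:
                new_f[i] = [1.0 if c == y else 0.0 for c in range(n_classes)]
        f = new_f

    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Toy data: two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (5.5, 5.5)]
labels = [0, None, None, None, 1, None, None, None]
predicted = label_propagation(points, labels, n_classes=2)
print(predicted)
```

For a comparison like the "with versus without unlabeled samples" plots in papers, one possible approach is to run a sketch like this on a dataset with known labels, hide most of them, and record accuracy as the number of labeled points varies. A dense graph like this one only suits small datasets; larger experiments usually need a sparse k-nearest-neighbour graph.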
din
Same question here. Could any expert passing by offer a few pointers?
Take a look at this book first: http://stirf.rack111.com/viewthread.php?tid=179&extra=page%3D1
Survey of Text Mining: Clustering, Classification, and Retrieval (2004)
Table of contents:
I Clustering and Classification 1
1 Cluster-Preserving Dimension Reduction Methods for Efficient Classification of Text Data 3
Peg Howland and Haesun Park
1.1 Introduction 3
1.2 Dimension Reduction in the Vector Space Model 4
1.3 A Method Based on an Orthogonal Basis of Centroids 5
1.3.1 Relationship to a Method from Factor Analysis 7
1.4 Discriminant Analysis and Its Extension for Text Data 8
1.4.1 Generalized Singular Value Decomposition 10
1.4.2 Extension of Discriminant Analysis 11
1.4.3 Equivalence for Various and 14
1.5 Trace Optimization Using an Orthogonal Basis of Centroids 16
1.6 Document Classification Experiments 17
1.7 Conclusion 19
References 22
2 Automatic Discovery of Similar Words 25
Pierre P. Senellart and Vincent D. Blondel
2.1 Introduction 25
2.2 Discovery of Similar Words from a Large Corpus 26
2.2.1 A Document Vector Space Model 27
2.2.2 A Thesaurus of Infrequent Words 28
2.2.3 The SEXTANT System 29
2.2.4 How to Deal with the Web 32
2.3 Discovery of Similar Words in a Dictionary 33
2.3.1 Introduction 33
2.3.2 A Generalization of Kleinberg's Method 33
2.3.3 Other Methods 35
2.3.4 Dictionary Graph 36
2.3.5 Results 37
2.3.6 Future Perspectives 41
2.4 Conclusion 41
References 42
3 Simultaneous Clustering and Dynamic Keyword Weighting for Text Documents 45
Hichem Frigui and Olfa Nasraoui
3.1 Introduction 45
3.2 Simultaneous Clustering and Term Weighting of Text Documents 47
3.3 Simultaneous Soft Clustering and Term Weighting of Text Documents 52
3.4 Robustness in the Presence of Noise Documents 56
3.5 Experimental Results 57
3.5.1 Simulation Results on Four-Class Web Text Data 57
3.5.2 Simulation Results on 20 Newsgroups Data 59
3.6 Conclusion 69
References 70
4 Feature Selection and Document Clustering 73
Inderjit Dhillon, Jacob Kogan, and Charles Nicholas
4.1 Introduction 73
4.2 Clustering Algorithms 74
4.2.1 k-Means Clustering Algorithm 74
4.2.2 Principal Direction Divisive Partitioning 78
4.3 Data and Term Quality 80
4.4 Term Variance Quality 81
4.5 Same Context Terms 86
4.5.1 Term Profiles 87
4.5.2 Term Profile Quality 87
4.6 Spherical Principal Directions Divisive Partitioning 90
4.6.1 Two-Cluster Partition of Vectors on the Unit Circle 90
4.6.2 Clustering with sPDDP 96
4.7 Future Research 98
References 99
II Information Extraction and Retrieval 101
5 Vector Space Models for Search and Cluster Mining 103
Mei Kobayashi and Masaki Aono
5.1 Introduction 103
5.2 Vector Space Modeling (VSM) 105
5.2.1 The Basic VSM Model for IR 105
5.2.2 Latent Semantic Indexing (LSI) 107
5.2.3 Covariance Matrix Analysis (COV) 108
5.2.4 Comparison of LSI and COV 109
5.3 VSM for Major and Minor Cluster Discovery 111
5.3.1 Clustering 111
5.3.2 Rescaling: Ando's Algorithm 111
5.3.3 Dynamic Rescaling of LSI 113
5.3.4 Dynamic Rescaling of COV 114
5.4 Implementation Studies 115
5.4.1 Implementations with Artificially Generated Datasets 115
5.4.2 Implementations with L.A. Times News Articles 118
5.5 Conclusions and Future Work 120
References 120
6 HotMiner: Discovering Hot Topics from Dirty Text 123
Malu Castellanos
6.1 Introduction 124
6.2 Related Work 128
6.3 Technical Description 130
6.3.1 Preprocessing 130
6.3.2 Clustering 132
6.3.3 Postfiltering 133
6.3.4 Labeling 136
6.4 Experimental Results 137
6.5 Technical Description 143
6.5.1 Thesaurus Assistant 145
6.5.2 Sentence Identifier 147
6.5.3 Sentence Extractor 149
6.6 Experimental Results 151
6.7 Mining Case Excerpts for Hot Topics 153
6.8 Conclusions 154
References 155
7 Combining Families of Information Retrieval Algorithms Using Metalearning 159
Michael Cornelson, Ed Greengrass, Robert L. Grossman, Ron Karidi, and Daniel Shnidman
7.1 Introduction 159
7.2 Related Work 161
7.3 Information Retrieval 162
7.4 Metalearning 164
7.5 Implementation 166
7.6 Experimental Results 166
7.7 Further Work 167
7.8 Summary and Conclusion 168
References 168
III Trend Detection 171
8 Trend and Behavior Detection from Web Queries 173
Peiling Wang, Jennifer Bownas, and Michael W. Berry
8.1 Introduction 173
8.2 Query Data and Analysis 174
8.2.1 Descriptive Statistics of Web Queries 175
8.2.2 Trend Analysis of Web Searching 176
8.3 Zipf's Law 178
8.3.1 Natural Logarithm Transformations 178
8.3.2 Piecewise Trendlines 179
8.4 Vocabulary Growth 179
8.5 Conclusions and Further Studies 181
References 182
9 A Survey of Emerging Trend Detection in Textual Data Mining 185
April Kontostathis, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps
9.1 Introduction 186
9.2 ETD Systems 187
9.2.1 Technology Opportunities Analysis (TOA) 189
9.2.2 CIMEL: Constructive, Collaborative Inquiry-Based Multimedia E-Learning 191
9.2.3 TimeMines 195
9.2.4 New Event Detection 199
9.2.5 ThemeRiver™ 201
9.2.6 PatentMiner 204
9.2.7 HDDI™ 207
9.2.8 Other Related Work 211
9.3 Commercial Software Overview 212
9.3.1 Autonomy 212
9.3.2 SPSS LexiQuest 212
9.3.3 ClearForest 213
9.4 Conclusions and Future Work 214
9.5 Industrial Counterpoint: Is ETD Useful? Dr. Daniel J. Phelps, Leader, Information Mining Group, Eastman Kodak 215
References 219