Mahout version: 0.7, Hadoop version: 1.0.4, JDK: 1.7.0_25 64-bit.
Continuing from the previous post: with the three jobs analyzed, only two functions remain:
List<Map.Entry<MatrixSlice, EigenStatus>> prunedEigenMeta = pruneEigens(eigenMetaData);
saveCleanEigens(new Configuration(), prunedEigenMeta);
Here is the pruneEigens function:
private List<Map.Entry<MatrixSlice, EigenStatus>> pruneEigens(Map<MatrixSlice, EigenStatus> eigenMetaData) {
  List<Map.Entry<MatrixSlice, EigenStatus>> prunedEigenMeta = Lists.newArrayList();
  for (Map.Entry<MatrixSlice, EigenStatus> entry : eigenMetaData.entrySet()) {
    // keep an eigenvector only if its cosine error is small enough and its eigenvalue is large enough
    if (Math.abs(1 - entry.getValue().getCosAngle()) < maxError && entry.getValue().getEigenValue() > minEigenValue) {
      prunedEigenMeta.add(entry);
    }
  }
  // ... (remainder of the method not shown in this excerpt)
  return prunedEigenMeta;
}
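This is where the filtering happens. The three jobs produced three EigenStatus objects, each carrying a cosAngle and an eigenValue, and these two values decide whether an eigenvector is kept: it survives only if |1 - cosAngle| < maxError and eigenValue > minEigenValue. The three results are summarized below:
First: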
resultantVector:
[-285.43017035605783, -61.30237570857193, -68.94124551381431, -520.2302762811703, -3232.201254912267, -32.31785150049481, -37.63572264009423, -12.025276244275622, -28.58260635344015, -6.8801603142200065, -28.491567864130573, -68.13521243410383, 4382.173720122737]
vector:
[0.01671441233225078, 0.0935655369363106, 0.09132650234523473, -0.0680324702834075, -0.9461123439509093, 0.10210271255992123, 0.10042714365337412, 0.11137954332150339, 0.10331974823993555, 0.10621406378767596, 0.10586960137353602, 0.09262650242313884, 0.09059904726143547]
eigenValue=newNorm/oldNorm=5479.061620543984/1=5479.061620543984;
cosAngle=resultantVector.dot(vector) / newNorm * oldNorm=0.6300724679092792
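To make the two formulas concrete, here is a minimal standalone sketch that recomputes eigenValue and cosAngle for the first eigenvector from the two arrays printed above. It uses plain double arrays instead of Mahout's Vector classes, and the class and method names (VerifyFirstEigen, norm2, dot, verify) are hypothetical. It does not compute resultantVector itself, since that requires the original input matrix; it only applies the two expressions shown above to the printed vectors.

```java
public class VerifyFirstEigen {

    // resultantVector and vector for the first eigenvector, copied from the post
    static final double[] RESULTANT = {
        -285.43017035605783, -61.30237570857193, -68.94124551381431, -520.2302762811703,
        -3232.201254912267, -32.31785150049481, -37.63572264009423, -12.025276244275622,
        -28.58260635344015, -6.8801603142200065, -28.491567864130573, -68.13521243410383,
        4382.173720122737
    };
    static final double[] VECTOR = {
        0.01671441233225078, 0.0935655369363106, 0.09132650234523473, -0.0680324702834075,
        -0.9461123439509093, 0.10210271255992123, 0.10042714365337412, 0.11137954332150339,
        0.10331974823993555, 0.10621406378767596, 0.10586960137353602, 0.09262650242313884,
        0.09059904726143547
    };

    // Euclidean (L2) norm of a vector
    static double norm2(double[] v) {
        double sum = 0.0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Dot product of two vectors of equal length
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns {eigenValue, cosAngle} using the same expressions as in the post
    static double[] verify(double[] resultantVector, double[] vector) {
        double newNorm = norm2(resultantVector);
        double oldNorm = norm2(vector);
        double eigenValue = newNorm / oldNorm;
        // Evaluated left to right as (dot / newNorm) * oldNorm; this equals the cosine when oldNorm == 1
        double cosAngle = dot(resultantVector, vector) / newNorm * oldNorm;
        return new double[] {eigenValue, cosAngle};
    }

    public static void main(String[] args) {
        double[] r = verify(RESULTANT, VECTOR);
        System.out.println("eigenValue=" + r[0] + ", cosAngle=" + r[1]);
    }
}
```

Because the printed vector has unit length, oldNorm is 1 and the two values should match the ones listed above up to floating-point rounding.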
Second:
resultantVector:
vector:
[0.01180448947054423, 0.001703710024210367, 0.002100735590662567, 0.014221147454610283, 0.09654151173375553, 0.0025666815984826535, 0.0026147055494762234, 1.753144283209579E-4, 0.0017595900141802873, 0.0049406361794682024, 7.881250692924197E-4, 0.002873479530226361, 0.9951286321096425]
eigenValue:6433335.386819993
cosAngle=0.9999998030863401
Third:
vector:
[-0.2883450858059115, -0.29170231535763447, -0.29157035465385267, -0.28754185317979386, -0.26018076078737895, -0.2914154866344813, -0.2913995247546756, -0.2922103132689348, -0.2916837423401091, -0.29062644748002026, -0.2920066313645422, -0.2913135151887795, 0.03848561950058266]
eigenValue=1442.6143913921014
cosAngle=0.3671147029085018
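Only the second one passes the filter: its |1 - cosAngle| is about 2e-7, while the first and third give about 0.37 and 0.63. As a result, prunedEigenMeta contains just that one entry.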
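The filtering step can be reproduced on the three (eigenValue, cosAngle) pairs listed above with a small standalone sketch. Status is a hypothetical stand-in for Mahout's EigenStatus that keeps only the two fields the filter reads. The thresholds maxError = 0.05 and minEigenValue = 0.0 are assumed values for illustration, not taken from the post; in practice they come from the job's arguments.

```java
import java.util.ArrayList;
import java.util.List;

public class PruneSketch {

    // Hypothetical stand-in for EigenStatus: only the two fields the filter reads
    static final class Status {
        final double eigenValue;
        final double cosAngle;

        Status(double eigenValue, double cosAngle) {
            this.eigenValue = eigenValue;
            this.cosAngle = cosAngle;
        }
    }

    // Same condition as pruneEigens: |1 - cosAngle| < maxError && eigenValue > minEigenValue.
    // Returns the positions (0-based) of the statuses that pass.
    static List<Integer> prune(List<Status> statuses, double maxError, double minEigenValue) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < statuses.size(); i++) {
            Status s = statuses.get(i);
            if (Math.abs(1 - s.cosAngle) < maxError && s.eigenValue > minEigenValue) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Status> statuses = new ArrayList<>();
        statuses.add(new Status(5479.061620543984, 0.6300724679092792));
        statuses.add(new Status(6433335.386819993, 0.9999998030863401));
        statuses.add(new Status(1442.6143913921014, 0.3671147029085018));
        // Assumed thresholds, for illustration only
        System.out.println(prune(statuses, 0.05, 0.0)); // prints [1]
    }
}
```

Any maxError between roughly 2e-7 and 0.37 gives the same result, so the conclusion does not hinge on the exact threshold chosen here.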
Next, the saveCleanEigens function:
private void saveCleanEigens(Configuration conf, Collection<Map.Entry<MatrixSlice, EigenStatus>> prunedEigenMeta)
    throws IOException {
  Path path = new Path(outPath, CLEAN_EIGENVECTORS);
  FileSystem fs = FileSystem.get(path.toUri(), conf);
  SequenceFile.Writer seqWriter = new SequenceFile.Writer(fs, conf, path, IntWritable.class, VectorWritable.class);
  try {
    IntWritable iw = new IntWritable();
    int numEigensWritten = 0;
    for (Map.Entry<MatrixSlice, EigenStatus> pruneSlice : prunedEigenMeta) {
      MatrixSlice s = pruneSlice.getKey();
      EigenStatus meta = pruneSlice.getValue();
      EigenVector ev = new EigenVector(s.vector(),
                                       meta.getEigenValue(),
                                       Math.abs(1 - meta.getCosAngle()),
                                       s.index());
      //log.info("appending {} to {}", ev, path);
      Writable vw = new VectorWritable(ev);
      iw.set(s.index());
      seqWriter.append(iw, vw);
      // increment the number of eigenvectors written and see if we've
      // reached our specified limit, or if we wish to write all eigenvectors
      // (latter is built-in, since numEigensWritten will always be > 0
      numEigensWritten++;
      if (numEigensWritten == maxEigensToKeep) {
        log.info("{} of the {} total eigens have been written", maxEigensToKeep, prunedEigenMeta.size());
        break;
      }
    }
  } finally {
    Closeables.closeQuietly(seqWriter);
  }
  cleanedEigensPath = path;
}
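The comment about the write limit is easy to misread, so here is a minimal sketch of just the loop's counting logic, with a list standing in for the SequenceFile writer. WriteLimitSketch and writeUpTo are hypothetical names. Because the counter is already 1 after the first write, a maxEigensToKeep of 0 (or any negative value, or any value larger than the number of entries) never triggers the break, so every entry is written.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteLimitSketch {

    // Mirrors the loop in saveCleanEigens: write, increment the counter,
    // and stop once the counter equals maxEigensToKeep.
    static List<String> writeUpTo(List<String> items, int maxEigensToKeep) {
        List<String> out = new ArrayList<>();
        int numEigensWritten = 0;
        for (String item : items) {
            out.add(item); // stands in for seqWriter.append(iw, vw)
            numEigensWritten++;
            if (numEigensWritten == maxEigensToKeep) {
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("a");
        items.add("b");
        items.add("c");
        System.out.println(writeUpTo(items, 2)); // prints [a, b]
        System.out.println(writeUpTo(items, 0)); // prints [a, b, c]
    }
}
```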
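What does the saved ev contain? It is simply the vector that survived the filter, stored together with its eigenvalue and slice index; the error written alongside it is |1 - cosAngle|.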
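As a rough illustration of what each saved record holds, the sketch below uses a hypothetical CleanEigen class that mirrors the four arguments passed to new EigenVector(...) in saveCleanEigens. It is not Mahout's EigenVector class. The index value 1 is an assumption (the 0-based position of the second vector), not something read from the job output.

```java
public class CleanEigenSketch {

    // Hypothetical holder for the four values passed to new EigenVector(...):
    // the vector, its eigenvalue, the error |1 - cosAngle|, and the slice index.
    static final class CleanEigen {
        final double[] vector;
        final double eigenValue;
        final double error;
        final int index;

        CleanEigen(double[] vector, double eigenValue, double cosAngle, int index) {
            this.vector = vector;
            this.eigenValue = eigenValue;
            this.error = Math.abs(1 - cosAngle); // same expression as in saveCleanEigens
            this.index = index;
        }
    }

    public static void main(String[] args) {
        // The second eigenvector from the post
        double[] v = {
            0.01180448947054423, 0.001703710024210367, 0.002100735590662567, 0.014221147454610283,
            0.09654151173375553, 0.0025666815984826535, 0.0026147055494762234, 1.753144283209579E-4,
            0.0017595900141802873, 0.0049406361794682024, 7.881250692924197E-4, 0.002873479530226361,
            0.9951286321096425
        };
        // Index 1 is a hypothetical value used only for illustration
        CleanEigen e = new CleanEigen(v, 6433335.386819993, 0.9999998030863401, 1);
        System.out.println("eigenValue=" + e.eigenValue + ", error=" + e.error + ", index=" + e.index);
    }
}
```

For the second eigenvector the stored error comes out at roughly 1.97e-7.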
Share, grow, be happy.
When reposting, please credit the blog: http://blog.csdn.net/fansy1990