Spam classification with the naive Bayes algorithm.
Q: Give a small AI example written in Java, with the code for each part.
A:
Below is a small AI example in Java that uses the naive Bayes algorithm to classify spam email.
The code is as follows:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SpamClassifier {
    // Per-class word frequencies collected during training.
    private final Map<String, Integer> spamWordsCount = new HashMap<>();
    private final Map<String, Integer> hamWordsCount = new HashMap<>();
    // Every distinct word seen in either class; its size is used for add-one smoothing.
    private final Set<String> vocabulary = new HashSet<>();
    // Total number of word occurrences per class.
    private int totalSpamCount = 0;
    private int totalHamCount = 0;

    // Reads a text file and adds its word counts to the class named by label
    // ("spam" for spam; any other value is treated as ham).
    public void train(String filePath, String label) throws IOException {
        boolean isSpam = label.equals("spam");
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (word.isEmpty()) {
                        continue; // blank line
                    }
                    vocabulary.add(word);
                    if (isSpam) {
                        spamWordsCount.put(word, spamWordsCount.getOrDefault(word, 0) + 1);
                        totalSpamCount++;
                    } else {
                        hamWordsCount.put(word, hamWordsCount.getOrDefault(word, 0) + 1);
                        totalHamCount++;
                    }
                }
            }
        }
    }

    // Returns log P(message | spam) - log P(message | ham), using add-one smoothing.
    // A positive value means spam is the more likely class. Class priors are not modeled.
    public double classify(String message) {
        double spamProbability = 0.0;
        double hamProbability = 0.0;
        int vocabularySize = vocabulary.size();
        for (String word : message.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            int spamWordCount = spamWordsCount.getOrDefault(word, 0);
            int hamWordCount = hamWordsCount.getOrDefault(word, 0);
            spamProbability += Math.log((spamWordCount + 1.0) / (totalSpamCount + vocabularySize));
            hamProbability += Math.log((hamWordCount + 1.0) / (totalHamCount + vocabularySize));
        }
        return spamProbability - hamProbability;
    }

    public static void main(String[] args) throws IOException {
        SpamClassifier classifier = new SpamClassifier();
        // Train on the spam samples
        classifier.train("spam_samples.txt", "spam");
        // Train on the non-spam (ham) samples
        classifier.train("ham_samples.txt", "ham");
        // Test message
        String testMessage = "Get a free gift! Limited time offer!";
        double probability = classifier.classify(testMessage);
        if (probability > 0) {
            System.out.println("The message is classified as spam.");
        } else {
            System.out.println("The message is classified as ham.");
        }
    }
}

In the example above, the train method trains on the spam and ham samples: it reads the text in a file and counts how often each word occurs in spam and in ham. The classify method classifies a given message: for each class it sums the add-one-smoothed log-probabilities of the message's words, then returns the spam score minus the ham score, so a positive value means the message is more likely spam and any other value means ham.
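To make the per-word term inside classify concrete, here is a minimal sketch of the add-one (Laplace) smoothed log-probability in isolation. The class name SmoothingDemo, the helper logLikelihood, and the counts in main are hypothetical and chosen only for illustration; the vocabulary size is passed in explicitly as an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class SmoothingDemo {
    // Add-one smoothed log-probability of a word given one class:
    // log((count(word) + 1) / (totalWords + vocabularySize)).
    static double logLikelihood(Map<String, Integer> counts, int totalWords,
                                int vocabularySize, String word) {
        int count = counts.getOrDefault(word, 0);
        return Math.log((count + 1.0) / (totalWords + vocabularySize));
    }

    public static void main(String[] args) {
        // Hypothetical spam counts: 4 word occurrences in total.
        Map<String, Integer> spamCounts = new HashMap<>();
        spamCounts.put("free", 3);
        spamCounts.put("offer", 1);
        int totalSpamWords = 4;
        int vocabularySize = 5; // assumed number of distinct words across both classes

        // A seen word: (3 + 1) / (4 + 5)
        System.out.println(Math.exp(logLikelihood(spamCounts, totalSpamWords, vocabularySize, "free")));
        // An unseen word still gets a non-zero probability: (0 + 1) / (4 + 5)
        System.out.println(Math.exp(logLikelihood(spamCounts, totalSpamWords, vocabularySize, "lottery")));
    }
}
```

Without the +1 in the numerator, an unseen word would give log(0), which is negative infinity, and a single unknown word would decide the whole classification.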
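In the main method, we first create a SpamClassifier object and then call train once on the spam samples and once on the ham samples. Next, we call classify on a test message and print the label that corresponds to the returned value.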
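The files spam_samples.txt and ham_samples.txt are not provided with the example, so the same flow can be tried with a self-contained sketch that trains from in-memory strings instead. The class InMemorySpamDemo, its method names, and the four toy messages are hypothetical. The sketch scores a message as the spam log-likelihood minus the ham log-likelihood, so a positive score leans toward spam.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class InMemorySpamDemo {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalSpam = 0;
    private int totalHam = 0;

    // Deliberately simple tokenizer: lower-case, then split on whitespace.
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    // Adds the words of one message to the counts of the given class.
    void train(String message, boolean isSpam) {
        for (String word : tokenize(message)) {
            if (word.isEmpty()) {
                continue;
            }
            vocabulary.add(word);
            if (isSpam) {
                spamCounts.merge(word, 1, Integer::sum);
                totalSpam++;
            } else {
                hamCounts.merge(word, 1, Integer::sum);
                totalHam++;
            }
        }
    }

    // Returns log P(words | spam) - log P(words | ham) with add-one smoothing.
    double score(String message) {
        double score = 0.0;
        int vocabularySize = vocabulary.size();
        for (String word : tokenize(message)) {
            if (word.isEmpty()) {
                continue;
            }
            score += Math.log((spamCounts.getOrDefault(word, 0) + 1.0) / (totalSpam + vocabularySize));
            score -= Math.log((hamCounts.getOrDefault(word, 0) + 1.0) / (totalHam + vocabularySize));
        }
        return score;
    }

    // Builds a classifier from four made-up messages (illustrative data only).
    static InMemorySpamDemo trainedOnToyData() {
        InMemorySpamDemo demo = new InMemorySpamDemo();
        demo.train("free gift offer", true);
        demo.train("limited time offer free", true);
        demo.train("meeting at noon", false);
        demo.train("see you at lunch", false);
        return demo;
    }

    public static void main(String[] args) {
        InMemorySpamDemo demo = trainedOnToyData();
        for (String message : new String[] {"free offer", "see you at noon"}) {
            String label = demo.score(message) > 0 ? "spam" : "ham";
            System.out.println(message + " -> " + label);
        }
    }
}
```

With this toy data, "free offer" gets a positive score and "see you at noon" a negative one. Four messages are far too few to say anything about real accuracy; the point is only to show the train-then-classify flow end to end.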
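Note that the code above is only a simple spam classifier intended for demonstration. A real spam classifier would likely need more sophisticated feature extraction and model training, as well as a much larger dataset, to reach good classification accuracy.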
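As one small step toward better feature extraction, the sketch below normalizes case and strips punctuation before counting words, so that "gift!" and "Gift" map to the same token. The class SimpleTokenizer and its tokenize method are hypothetical and are not part of the example above; this is only one of many possible preprocessing choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SimpleTokenizer {
    // Lower-cases the text and splits on any run of characters that are
    // neither letters nor decimal digits, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [get, a, free, gift, limited, time, offer]
        System.out.println(tokenize("Get a FREE gift! Limited-time offer!!"));
    }
}
```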