How to Find the Most Repeated Word in a Text File Using Python?

In this article, we will show you how to find the most repeated word in a given text file using Python.

Suppose we have a text file named ExampleTextFile.txt containing some random text. We will return the most repeated word in that file.

ExampleTextFile.txt

Good Morning TutorialsPoint
This is TutorialsPoint sample File
Consisting of Specific source codes in Python, Seaborn, Scala
Summary and Explanation
Welcome TutorialsPoint
Learn with a joy
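To follow along, the sample file can be created from Python. This is only a minimal sketch: the helper name write_sample_file is hypothetical, and the lines mirror the sample content of ExampleTextFile.txt in English.

```python
from pathlib import Path

# Sample content, one entry per line of the file
SAMPLE_LINES = [
    "Good Morning TutorialsPoint",
    "This is TutorialsPoint sample File",
    "Consisting of Specific source codes in Python, Seaborn, Scala",
    "Summary and Explanation",
    "Welcome TutorialsPoint",
    "Learn with a joy",
]


def write_sample_file(path="ExampleTextFile.txt"):
    """Hypothetical helper: write the sample text to `path` and return it as a Path."""
    target = Path(path)
    target.write_text("\n".join(SAMPLE_LINES) + "\n", encoding="utf-8")
    return target
```

Calling write_sample_file() with no arguments writes ExampleTextFile.txt to the current working directory.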

Algorithm (Steps)

The following are the algorithm/steps to perform the desired task -

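Import the Counter class from the collections module. (The collections module provides specialized container data types as alternatives to Python's general-purpose built-in containers such as dict, list, and tuple. Counter is a dict subclass for counting hashable objects: given an iterable, it builds a mapping from each element to the number of times it occurs.)
Create a variable to store the path of the text file.
Create a list to store all the words.
Use the open() function (which opens a file and returns a file object) to open the text file in read-only mode by passing the file name and the mode as arguments (here "r" stands for read-only mode).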

with open(inputFile, 'r') as filedata:

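Use a for loop to traverse each line of the file.
Use the split() function (which splits a string into a list; a separator can be specified, and the default separator is any whitespace) to split each line into a list of words and store it in a variable.
Use a for loop to traverse the list of words.
Use the append() function (which adds an element to the end of a list) to add each word to the list.
Use Counter() (which gives the frequency of each word as key-value pairs) to compute the frequency of all the words, that is, the number of times each word occurs.
Create a variable to store the maximum frequency.
Use a for loop with the in keyword to iterate over the word-frequency dictionary above.
Use an if conditional statement to check whether the frequency of the word is greater than the maximum frequency.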

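The in keyword works in two ways:
The in keyword is used to determine whether a value is present in a sequence (list, range, string, etc.).
It is also used to iterate over a sequence in a for loop.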

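If the frequency of the word is greater than the maximum frequency, update the maximum frequency to that value.
Store the word in a variable that holds the most repeated word in the text file.
Print the most repeated word in the text file.
The file does not need to be closed with close(): the with statement closes it automatically when its block ends.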
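The counting and comparison steps above can be sketched on a small in-memory list instead of a file. The list words and its contents are made up for illustration; the snippet also shows both uses of the in keyword.

```python
from collections import Counter

# Hypothetical word list standing in for the words read from a file
words = ["apple", "pear", "apple", "fig", "pear", "apple"]

# Counter maps each distinct word to the number of times it occurs
frequency = Counter(words)

# `in` as a membership test: is this key present in the Counter?
has_fig = "fig" in frequency

# `in` as part of a for loop: iterate over the keys to track the maximum
max_frequency = 0
most_repeated = None
for word in frequency:
    if frequency[word] > max_frequency:
        max_frequency = frequency[word]
        most_repeated = word

print(has_fig, most_repeated, max_frequency)  # True apple 3
```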

Example

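The following program traverses the lines of a text file and uses Counter from the collections module to find and print the most repeated word in the file -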

# importing the Counter class
from collections import Counter

# input text file
inputFile = "ExampleTextFile.txt"

# Storing all the words
newWordsList = []

# Opening the given file in read-only mode
with open(inputFile, 'r') as filedata:

   # Traverse in each line of the file
   for textline in filedata:

      # Splitting the text file content into list of words
      wordsList = textline.split()

      # Traverse in the above list of words
      for word in wordsList:

         # Appending each word to the new list
         newWordsList.append(word)

# Using the Counter() function, calculate the frequency of all the words
wordsFrequency = Counter(newWordsList)

# Taking variables to store the maximum frequency and the corresponding word
maxFrequency = 0
mostRepeatedWord = None

# Loop in the above words frequency dictionary
for textword in wordsFrequency:

   # Checking whether the frequency of the word is greater than the maximum frequency
   if wordsFrequency[textword] > maxFrequency:
 
      # If it is true then set maximum frequency to the corresponding frequency value of the word
      maxFrequency = wordsFrequency[textword]

      # As this is the word with maximum frequency store this word in a variable
      mostRepeatedWord = textword

# Printing the most repeated word in a text file
print("{",mostRepeatedWord,"} is the most repeated word in a text file")

# No explicit close() is needed: the with statement has already closed the file

Output

On execution, the above program generates the following output -

{ TutorialsPoint } is the most repeated word in a text file

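In this program, we read some random text from a text file. We read the whole file line by line, split each line into words, and add every word to a list. We then pass the list to Counter(), which returns a dictionary-like object whose keys are the words and whose values are their frequencies. Next, we iterate over the words in that dictionary and check whether each word's frequency is greater than the maximum frequency seen so far. If it is, that word is the most frequent one so far, so we save it in a variable and update the maximum frequency to its frequency. Finally, we print the most frequent word.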
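As a more compact alternative to the manual loop, Counter.most_common(1) returns the most frequent entry as a one-element list of (word, count) pairs. The following is a minimal sketch, assuming whitespace-separated words and UTF-8 text; the function name most_repeated_word is hypothetical and not part of the program above.

```python
from collections import Counter


def most_repeated_word(path, encoding="utf-8"):
    """Return (word, count) for the most frequent word, or None if the file has no words."""
    counts = Counter()
    with open(path, "r", encoding=encoding) as filedata:
        for textline in filedata:
            # update() adds one count for each word on this line
            counts.update(textline.split())
    top = counts.most_common(1)  # an empty list when nothing was counted
    return top[0] if top else None
```

When several words share the highest count, most_common() returns the one that was encountered first, which matches the behavior of the strict > comparison in the loop-based program.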

Conclusion

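This article showed how to read a file, traverse it line by line, and retrieve all the words in each line. Once we have the words, we can reverse them, change their case, check for vowels, get their lengths, and so on. We also learned how to use Counter to compute the frequencies of the words in a list. Counter can likewise count the elements of strings, lists, tuples, and other iterables of hashable items.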
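As a brief illustration of those last points, the sketch below applies Counter to a string and a tuple, and lower-cases words before counting so that differently cased spellings are treated as one word. All of the input values are made up for the example.

```python
from collections import Counter

# Counting the characters of a string
letters = Counter("banana")

# Counting the items of a tuple
numbers = Counter((1, 2, 2, 3, 3, 3))

# Lower-casing words first so that "Python" and "python" count as one word
words = Counter(w.lower() for w in "Python python PYTHON scala".split())

print(letters["a"], numbers[3], words["python"])  # 3 3 3
```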
