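Problem 1.4: Counting word occurrences in a file
The problem comes from the Python 练习册 (Python exercise book). Today's problem is number four: given any plain-text file in English, count how many times each word occurs in it.
Groundwork
This installment needs quite a bit of groundwork, so I wrote it up as a separate post; see "Python正则表达式" (Python Regular Expressions).
Main text
Problem statement
Given any plain-text file in English, count how many times each word occurs in it.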
Reference English text:
If you are looking for someone you can pour out your love to, let me suggest the empowered woman. The empowered woman knows what she wants, knows how to get it, knows how to live fully, and she knows how to love you back without needing anyone’s approval or recognition. An empowered woman is unarguably one of the most magnificent beings you will ever come in contact with. Read on and find 10 reason why you should absolutely love and embrace the empowered women in your life! .
1. She knows how to love you in return
It is difficult to give what you don’t have. It is impossible to love someone and feel fulfilled when they can’t love you in return because they don’t love themselves. This will never happen to you when you love an empowered woman. She loves herself (not in a narcissistic manner). In turn, she appreciates who you are and loves you in return. She will love you just like you deserve to be loved.
2. She will inspire you
When life puts you down and you are at the end of your rope, the empowered woman will be there to see you through. Her drive, enthusiasm and (at times) hopeless optimism will inspire you to carry on despite the obstacles you face.
3. She is not afraid of failure
While many out there are thoroughly terrified of failure, the empowered woman understands that failures are simply stepping stones in life. How can you not love someone that is thoroughly unafraid to try, fail, and give it a shot all over again?!
4. She is all about the legacy
While most people are focused on the car, the house, the job, and corner office; the empowered woman is focused on leaving a legacy that will inspire others and change the world. The empowered woman is focused on empowering others to maximize their potential and fulfill their purpose. She is all about inspiring others to look beyond themselves and live a life of service to others.
5. She can laugh at her mistakes…
…and learn from them as well! She understands mistakes are part of the journey. The empowered woman can laugh and learn from her mistakes to ensure they never happen again.
6. She can be vulnerable
The empowered woman understands there is no debt in relationships without vulnerability. Although she is emotionally strong, she is willing to laugh and cry with you because all of these emotions are an essential part of life.
7. She can speak her mind
While everyone else is too concerned with what others may think or say, the empowered woman is not afraid to speak her mind. She understands that her value comes from within, not from what others say or think about her.
8. She knows when to remain quiet
She lives by Abe Lincoln’s words, “Better to remain silent and be thought a fool, than to speak out and remove all doubt.”
9. She knows how to have fun
Whether it is at the symphony or at a ball game, the empowered woman understands life is made up of experiences with people – not the places you go. She is able to live in the moment and enjoy it fully without being concerned for the future. After all, who’s got a guaranteed future?
10. She is not afraid of change
While most people rather continue on living unfulfilled lives as long as their comfort zone remains intact, the empowered woman is all about embracing change. She understands growth cannot happen without change. She understands that change is the gift life offers you to choose your destiny. Therefore, she is not afraid of change because it is her stepping stone towards success.
Download link
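Download the file into Python's working directory. If you don't know where the working directory is, run:
import os
# Get the current working directory
os.getcwd()
# Change the current working directory (the backslash must be escaped inside the string)
os.chdir('d:\\')
os.getcwd()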
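As an alternative to changing the working directory, a script can build the file path explicitly and check that the file is there before opening it. Below is a minimal sketch using the standard pathlib module. The helper name locate_in_cwd is hypothetical, and the file name is simply the default used by the code later in this post.

```python
from pathlib import Path


def locate_in_cwd(name):
    """Return the absolute path of `name` inside the current working directory."""
    # Path.cwd() is the pathlib counterpart of os.getcwd()
    return Path.cwd() / name


if __name__ == '__main__':
    path = locate_in_cwd('203305485.txt')
    print(path, 'exists' if path.is_file() else 'is missing')
```

If the file is reported as missing, either move it into the printed directory or pass the full path to open().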
Reference code
I have tried to attach an explanation to every step:
# coding=utf-8
from collections import defaultdict
import re


# Replace every non-word character and digit with a space, except the n't contraction
def replace(s):
    if s.group(1) == 'n\'t':
        return s.group(1)
    return ' '


def cal(filename='203305485.txt'):
    # A lambda defines the simple default factory
    dic = defaultdict(lambda: 0)  # dic = defaultdict(int) works too
    with open(filename, 'r') as f:
        data = f.read()
    # Convert everything to lowercase
    data = data.lower()
    # Replace every non-word character and digit with a space, except the n't contraction
    data = re.sub(r'(n[\']t)|([\W\d])', replace, data)
    datalist = re.split(r'[\s\n]+', data)
    for item in datalist:
        dic[item] += 1
    # Drop the empty-string entry that re.split can leave at the edges;
    # pop with a default avoids a KeyError when there is no such entry
    dic.pop('', None)
    return dic


if __name__ == '__main__':
    dic = cal()
    for key, val in dic.items():
        print('%15s ----> %3s' % (key, val))
The output is as follows:
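To see what the substitution and split steps do in isolation, here is a small sketch that applies the same regular expression to a single sentence. The function names keep_contraction and tokenize and the sample sentence are made up for illustration; the pattern is the one used in the reference code.

```python
import re


def keep_contraction(match):
    # group(1) is set only when the "n't" alternative matched
    if match.group(1) == "n't":
        return match.group(1)
    return ' '


def tokenize(text):
    """Lowercase `text`, blank out punctuation and digits, split on whitespace."""
    text = text.lower()
    text = re.sub(r"(n't)|([\W\d])", keep_contraction, text)
    # Drop the empty strings that re.split leaves at the edges
    return [word for word in re.split(r'\s+', text) if word]


if __name__ == '__main__':
    print(tokenize("She doesn't give up: 10 reasons!"))
    # ['she', "doesn't", 'give', 'up', 'reasons']
```

Note that the pattern only recognizes the straight apostrophe. A typographic apostrophe, as in don’t in the reference text, is treated as punctuation, so that word is split into don and t.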
Adding a sort function
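The code draws on 《利用python进行数据分析》 (Python for Data Analysis):
def top_counts(dic, n=10):
    # Build (count, word) pairs so that sorting orders them by count
    value_key_pairs = [(count, tz) for tz, count in dic.items()]
    value_key_pairs.sort()
    # The last n pairs are the n most frequent words, in ascending order
    return value_key_pairs[-n:]


top_counts(dic)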
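The output is as follows:
As you can see, the most frequently used word is the definite article the, followed by the preposition to, and so on.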
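The same top-n selection can also be written with sorted and a key function, which avoids building the intermediate (count, word) pairs. This is a sketch that assumes dic maps words to counts. The function name top_counts_sorted and the sample dictionary are made up for illustration.

```python
def top_counts_sorted(dic, n=10):
    """Return the n (word, count) pairs with the highest counts, largest first."""
    # Sort by count descending; ties are broken alphabetically by word
    ranked = sorted(dic.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


if __name__ == '__main__':
    sample = {'the': 5, 'to': 3, 'she': 3, 'woman': 2}
    print(top_counts_sorted(sample, n=3))
    # [('the', 5), ('she', 3), ('to', 3)]
```

Unlike top_counts, this variant returns (word, count) pairs with the most frequent word first, so the result can be printed directly as a ranking.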
Addendum
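I recently discovered the Counter class in the collections module. The import statement is: from collections import Counter. Given a list, it counts how many times each element occurs. With it the code becomes much shorter, and the results can come back already sorted:
# coding=utf-8
import re
from collections import Counter


def cal(filename='203305485.txt'):
    with open(filename, 'r') as f:
        data = f.read()
    data = data.lower()
    # Replace every non-word character and digit with a space, except the n't contraction
    data = re.sub(r"(n't)|[\W\d]", lambda m: m.group(1) or ' ', data)
    datalist = re.split(r'[\s\n]+', data)
    # Skip the empty strings that re.split can leave at the edges
    return Counter(word for word in datalist if word).most_common()


if __name__ == '__main__':
    dic = cal()
    # most_common() returns (word, count) pairs, most frequent first
    for word, count in dic:
        print('%15s ----> %3s' % (word, count))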
The code now flows smoothly and reads much more comfortably. The conclusion is the same, of course: the most used words are still the and you.
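To make the Counter behavior described above concrete, here is a tiny standalone sketch on a hand-written list. The word list is made up for illustration; Counter and most_common are the standard collections API.

```python
from collections import Counter

words = ['the', 'empowered', 'woman', 'the', 'woman', 'the']
counts = Counter(words)

# A Counter behaves like a dict from element to count
print(counts['the'])          # 3
# Unseen elements count as 0 instead of raising KeyError
print(counts['missing'])      # 0
# most_common(n) returns the n highest counts, largest first
print(counts.most_common(2))  # [('the', 3), ('woman', 2)]
```

Calling most_common() without an argument, as cal does, returns every element in descending order of count.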
That's all.