正则表达式提取《飞鸟集》
目录
需求
从网上down了一个《飞鸟集》的txt文件,其内容是这样的
Stray Birds
飞鸟集
1
Stray birds of summer come to my window to sing and fly away. And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.
夏天的飞鸟,飞到我窗前唱歌,又飞去了。
秋天的黄叶,它们没有什么可唱,只叹息一声,飞落在那里。
2
O troupe of little vagrants of the world, leave your footprints in my words.
世界上的一队小小的漂泊者呀,请留下你们的足印在我的文字里。
3
The world puts off its mask of vastness to its lover. It becomes small as one song, as one kiss of the eternal.
世界对着它的爱人,把它浩瀚的面具揭下了。
它变小了,小如一首歌,小如一回永恒的接吻。
4
It is the tears of the earth that keep her smiles in bloom.
是“地”的泪点,使她的微笑保持着青春不谢。
5
The mighty desert is burning for the love of a blade of grass who shakes her head and laughs and flies away.
广漠无垠的沙漠热烈地追求着一叶绿草的爱,但她摇摇头,笑起来,飞了开去。
6
If you shed tears when you miss the sun, you also miss the stars.
如果错过了太阳时你流了泪,那么你也要错过群星了。
7
The sands in your way beg for your song and your movement, dancing water. Will you carry the burden of their lameness?
跳舞着的流水呀,在你途中的泥沙,要求你的歌声,你的流动呢。你肯挟跛足的泥沙而俱下么?
8
Her wistful face haunts my dreams like the rain at night.
她的热切的脸,如夜雨似的,搅扰着我的梦魂。
9
Once we dreamt that we were strangers. We wake up to find that we were dear to each other.
有一次,我们梦见大家都是不相识的。
我们醒了,却知道我们原是相亲爱的。
总共325段(上面只截取了9段),每天要给女朋友发一段,用邮件的形式。因为是重复性操作,所以诉诸于代码实现。
所以做一个数据处理,用正则表达式把每一段提取出来,写成json文件,格式如下:
{
"1": "Stray birds of summer come to my window to sing and fly away. And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.\n\n\n\n夏天的飞鸟,飞到我窗前唱歌,又飞去了。\n\n秋天的黄叶,它们没有什么可唱,只叹息一声,飞落在那里。",
"2": "O troupe of little vagrants of the world, leave your footprints in my words.\n\n\n\n世界上的一队小小的漂泊者呀,请留下你们的足印在我的文字里。",
"3": "The world puts off its mask of vastness to its lover. It becomes small as one song, as one kiss of the eternal.\n\n\n\n世界对着它的爱人,把它浩瀚的面具揭下了。\n\n它变小了,小如一首歌,小如一回永恒的接吻。",
"4": "It is the tears of the earth that keep her smiles in bloom.\n\n\n\n是“地”的泪点,使她的微笑保持着青春不谢。",
"5": "The mighty desert is burning for the love of a blade of grass who shakes her head and laughs and flies away.\n\n\n\n广漠无垠的沙漠热烈地追求着一叶绿草的爱,但她摇摇头,笑起来,飞了开去。",
"6": "If you shed tears when you miss the sun, you also miss the stars.\n\n\n\n如果错过了太阳时你流了泪,那么你也要错过群星了。",
"7": "The sands in your way beg for your song and your movement, dancing water. Will you carry the burden of their lameness?\n\n\n\n跳舞着的流水呀,在你途中的泥沙,要求你的歌声,你的流动呢。你肯挟跛足的泥沙而俱下么?",
"8": "Her wistful face haunts my dreams like the rain at night.\n\n\n\n她的热切的脸,如夜雨似的,搅扰着我的梦魂。",
"9": "Once we dreamt that we were strangers. We wake up to find that we were dear to each other.\n\n\n\n有一次,我们梦见大家都是不相识的。\n\n我们醒了,却知道我们原是相亲爱的。"
}
实现
regex101.com,这个网站真是太好用了。
因为我用的是python,进去之后记得选择python,把文件复制进去,就可以编辑正则表达式了。
代码可以点Code Generator生成,不过不太好用,可以作为参照。
代码:
因为正则表达式中的特殊符号会被博客无情转译,所以放图片不放文本了。
顺便记录一下两日期相差天数的代码:
import time
past = time.mktime(time.strptime("2022-06-29 00:00:00","%Y-%m-%d %H:%M:%S"))
now = time.time()
delta_day = int((now-past)/(24*60*60))
print(delta_day)
保存为json文件,中文不乱码
import json
with open("Birds.json",'w',encoding = 'utf-8') as file_obj:
json.dump(res,file_obj,ensure_ascii=False,indent = 4)