目录

正则表达式提取《飞鸟集》

目录

需求

从网上down了一个《飞鸟集》的txt文件,其内容是这样的

Stray Birds

飞鸟集





1



Stray birds of summer come to my window to sing and fly away. And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.



夏天的飞鸟,飞到我窗前唱歌,又飞去了。

秋天的黄叶,它们没有什么可唱,只叹息一声,飞落在那里。





2



O troupe of little vagrants of the world, leave your footprints in my words.



世界上的一队小小的漂泊者呀,请留下你们的足印在我的文字里。





3



The world puts off its mask of vastness to its lover. It becomes small as one song, as one kiss of the eternal.



世界对着它的爱人,把它浩瀚的面具揭下了。

它变小了,小如一首歌,小如一回永恒的接吻。





4



It is the tears of the earth that keep her smiles in bloom.



是“地”的泪点,使她的微笑保持着青春不谢。





5



The mighty desert is burning for the love of a blade of grass who shakes her head and laughs and flies away.



广漠无垠的沙漠热烈地追求着一叶绿草的爱,但她摇摇头,笑起来,飞了开去。





6



If you shed tears when you miss the sun, you also miss the stars.



如果错过了太阳时你流了泪,那么你也要错过群星了。





7



The sands in your way beg for your song and your movement, dancing water. Will you carry the burden of their lameness?



跳舞着的流水呀,在你途中的泥沙,要求你的歌声,你的流动呢。你肯挟跛足的泥沙而俱下么?





8



Her wistful face haunts my dreams like the rain at night.



她的热切的脸,如夜雨似的,搅扰着我的梦魂。





9



Once we dreamt that we were strangers. We wake up to find that we were dear to each other.



有一次,我们梦见大家都是不相识的。

我们醒了,却知道我们原是相亲爱的。






总共325段(上面只截取了9段),每天要给女朋友发一段,用邮件的形式。因为是重复性操作,所以诉诸于代码实现。

所以做一个数据处理,用正则表达式把每一段提取出来,写成json文件,格式如下:

{
    "1": "Stray birds of summer come to my window to sing and fly away. And yellow leaves of autumn, which have no songs, flutter and fall there with a sigh.\n\n\n\n夏天的飞鸟,飞到我窗前唱歌,又飞去了。\n\n秋天的黄叶,它们没有什么可唱,只叹息一声,飞落在那里。",
    "2": "O troupe of little vagrants of the world, leave your footprints in my words.\n\n\n\n世界上的一队小小的漂泊者呀,请留下你们的足印在我的文字里。",
    "3": "The world puts off its mask of vastness to its lover. It becomes small as one song, as one kiss of the eternal.\n\n\n\n世界对着它的爱人,把它浩瀚的面具揭下了。\n\n它变小了,小如一首歌,小如一回永恒的接吻。",
    "4": "It is the tears of the earth that keep her smiles in bloom.\n\n\n\n是“地”的泪点,使她的微笑保持着青春不谢。",
    "5": "The mighty desert is burning for the love of a blade of grass who shakes her head and laughs and flies away.\n\n\n\n广漠无垠的沙漠热烈地追求着一叶绿草的爱,但她摇摇头,笑起来,飞了开去。",
    "6": "If you shed tears when you miss the sun, you also miss the stars.\n\n\n\n如果错过了太阳时你流了泪,那么你也要错过群星了。",
    "7": "The sands in your way beg for your song and your movement, dancing water. Will you carry the burden of their lameness?\n\n\n\n跳舞着的流水呀,在你途中的泥沙,要求你的歌声,你的流动呢。你肯挟跛足的泥沙而俱下么?",
    "8": "Her wistful face haunts my dreams like the rain at night.\n\n\n\n她的热切的脸,如夜雨似的,搅扰着我的梦魂。",
    "9": "Once we dreamt that we were strangers. We wake up to find that we were dear to each other.\n\n\n\n有一次,我们梦见大家都是不相识的。\n\n我们醒了,却知道我们原是相亲爱的。"
}

实现

regex101.com,这个网站真是太好用了。

因为我用的是python,进去之后记得选择python,把文件复制进去,就可以编辑正则表达式了。

https://myblog-1257298572.cos.ap-shanghai.myqcloud.com/img/image-20220630112512950.png

代码可以点Code Generator生成,不过不太好用,可以作为参照。

代码:

https://myblog-1257298572.cos.ap-shanghai.myqcloud.com/img/image-20220630112823725.png

因为正则表达式中的特殊符号会被博客无情转译,所以放图片不放文本了。

顺便记录一下两日期相差天数的代码:

import time
past = time.mktime(time.strptime("2022-06-29 00:00:00","%Y-%m-%d %H:%M:%S"))
now = time.time()
delta_day = int((now-past)/(24*60*60))
print(delta_day)

保存为json文件,中文不乱码

import json
with open("Birds.json",'w',encoding = 'utf-8') as file_obj:
    json.dump(res,file_obj,ensure_ascii=False,indent = 4)