Notes on Python Libraries

Posted 2021-04-21 · Updated 2021-05-14 · Category: python


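The requests library

GET requests

import requests

r = requests.get("https://unsplash.com")

This sends a GET request to the site, and the site returns a response; r is that response. You can check type(r) at run time to confirm that it is a requests.Response object.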

GET requests with parameters

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get("http://httpbin.org/get", params=payload)

The request sent to the server carries two parameters, key1 and key2, together with their values. In effect, requests builds the following URL:
http://httpbin.org/get?key1=value1&key2=value2
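To see how the params dictionary turns into a query string, a minimal sketch is to prepare the same request without sending it, which also works offline. The variable name prepared is only illustrative.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# Build the request object but do not send it over the network.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=payload).prepare()

# The params dict has been encoded into the URL's query string.
print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2
```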

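POST requests

POST request without parameters:

r = requests.post("http://httpbin.org/post")

POST request with parameters:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("http://httpbin.org/post", data=payload)

POST requests are mostly used to submit form data: the user fills in a set of input fields and then submits them.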
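The form-submission behaviour can be inspected without a network call. The sketch below prepares the POST request and looks at what would be sent; prepared is an illustrative name, and nothing here contacts httpbin.org.

```python
import requests

payload = {'key1': 'value1', 'key2': 'value2'}

# Prepare (but do not send) a POST request carrying form data.
prepared = requests.Request('POST', 'http://httpbin.org/post', data=payload).prepare()

# With data=..., the values travel in the request body, not in the URL.
print(prepared.url)                      # http://httpbin.org/post
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # key1=value1&key2=value2
```

Unlike params=, which changes the URL, data= leaves the URL untouched and form-encodes the dictionary into the body.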

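Other requests

These are used less often; see the official documentation for details. Reading the official docs is an essential skill!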

r = requests.put("http://httpbin.org/put")
r = requests.delete("http://httpbin.org/delete")
r = requests.head("http://httpbin.org/get")
r = requests.options("http://httpbin.org/get")

The BeautifulSoup library

You can additionally install lxml and have Beautiful Soup use it to parse documents. lxml is faster than Python's built-in parser.
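Because lxml is an optional extra, one possible approach is to prefer it when it is installed and fall back to the built-in parser otherwise. make_soup is a hypothetical helper name used only for this sketch.

```python
from bs4 import BeautifulSoup


def make_soup(markup):
    """Parse markup with lxml when available, else with html.parser."""
    try:
        import lxml  # noqa: F401  (only checking that it is installed)
        parser = 'lxml'
    except ImportError:
        parser = 'html.parser'
    return BeautifulSoup(markup, parser)


soup = make_soup('<p>hello</p>')
print(soup.p.string)  # hello
```

The two parsers can build slightly different trees for incomplete markup (lxml wraps fragments in html and body tags), so it is worth testing with the parser you deploy.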

BeautifulSoup object types

Beautiful Soup converts a complex HTML document into a tree structure in which every node is a Python object. All of these objects fall into four types: Tag, NavigableString, BeautifulSoup, and Comment.

graph LR
A(BeautifulSoup )-->|"find = soup.find('p')"|B[Tag]
B--"find.name"--> C("class 'str'")
B--"find['class']"-->D(class 'list')
B--"find.string"-->E(class 'bs4.element.NavigableString')
B--"find.string"-->F(class 'bs4.element.Comment')
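The relationships in the diagram can be checked directly. This is a small sketch on made-up markup, using html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Illustrative markup: one ordinary tag and one tag holding only a comment.
soup = BeautifulSoup('<p class="title">text</p><b><!--note--></b>', 'html.parser')
tag = soup.find('p')

print(type(soup).__name__)           # BeautifulSoup
print(type(tag).__name__)            # Tag
print(type(tag.name).__name__)       # str
print(type(tag['class']).__name__)   # list
print(type(tag.string).__name__)     # NavigableString
print(type(soup.b.string).__name__)  # Comment
```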

The BeautifulSoup object

A BeautifulSoup object represents the entire content of a document. It supports navigating and searching the document tree.

When html_doc is a string

from bs4 import BeautifulSoup

html_doc = """
string content
"""
soup = BeautifulSoup(html_doc, 'lxml')  # parse with lxml

html_doc = """
string content
"""
soup = BeautifulSoup(html_doc, 'html.parser')  # parse with Python's built-in parser

When htmlfile is an opened file

with open('result.html', 'r', encoding='utf-8') as htmlfile:
    soup = BeautifulSoup(htmlfile, 'html.parser')

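Tag objects

find() returns a tag

A Tag corresponds to a tag in HTML or XML (yes, Beautiful Soup can parse XML too). This is the type that find() returns; find_all() returns a collection of such objects that you can iterate over with a for loop. Once you have a tag, you can extract information from it.

Find the first p tag with find(): <class 'bs4.element.Tag'>

find = soup.find('p')

Get the tag's name: <class 'str'>

find.name

Get one of the tag's attributes: <class 'list'> for multi-valued attributes such as class, <class 'str'> for most others

find['attribute']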

NavigableString objects

Get the tag's content (without the tag itself): <class 'bs4.element.NavigableString'> or <class 'bs4.element.Comment'>. If the result is <class 'bs4.element.Comment'>, the content is a comment.

find.string

Example:

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'lxml')  # create the BeautifulSoup object
find = soup.find('p')  # find the first p tag with find()
print("find's return type is ", type(find))  # print the type of the return value
print("find's content is", find)  # print the tag that find() returned
print("find's Tag Name is ", find.name)  # print the tag's name
print("find's Attribute(class) is ", find['class'])  # print the value of the tag's class attribute
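One detail about .string that the example above does not exercise: it only yields a value when the tag has a single child. The sketch below, on made-up markup, shows that case next to a tag with mixed content, where get_text() is the usual alternative.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the first p has one child, the second has two.
markup = ('<p class="title"><b>One child</b></p>'
          '<p class="story">Text and <a href="#">a link</a></p>')
soup = BeautifulSoup(markup, 'html.parser')

first, second = soup.find_all('p')

print(first.string)       # One child  (passes through the single <b> child)
print(second.string)      # None       (more than one child, so no single string)
print(second.get_text())  # Text and a link
```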

Comment objects

This object is simply a comment in HTML or XML.

markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, 'lxml')
comment = soup.b.string
type(comment)
# <class 'bs4.element.Comment'>

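Sometimes you don't want the comment text from the HTML, so you can use this type to check whether a string is a comment.

from bs4.element import Comment

# SomeString is a placeholder for a string taken from the tree, e.g. tag.string
if isinstance(SomeString, Comment):
    print('This string is a comment')
else:
    print('This string is not a comment')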
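Applied to a whole document, the same check can filter comments out while collecting text. This is one possible sketch on made-up markup; visible_strings is an illustrative name.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

markup = '<div>visible<!-- hidden note --><span>also visible</span></div>'
soup = BeautifulSoup(markup, 'html.parser')

# Comment is a subclass of NavigableString, so it has to be excluded explicitly.
visible_strings = [
    str(node)
    for node in soup.descendants
    if isinstance(node, NavigableString) and not isinstance(node, Comment)
]

print(visible_strings)  # ['visible', 'also visible']
```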

Other syntax

soup.head  # first head tag, equivalent to soup.find('head')
soup.p  # first p tag, equivalent to soup.find('p')
soup.b  # first b tag, equivalent to soup.find('b')

Traversing the tree with BeautifulSoup

Nodes and tag names

You can traverse the tree through child nodes, parent nodes, and tag names:

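# title_tag, link and sibling_soup below are placeholders for objects created elsewhere
soup.head  # first head tag
soup.p  # first p tag

# loop over a tag's direct children
for child in title_tag.children:
    print(child)

soup.parent  # parent node (None for the BeautifulSoup object itself; use tag.parent on a tag)

# all ancestors
for parent in link.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)

# siblings
sibling_soup.b.next_sibling  # the sibling that follows
sibling_soup.c.previous_sibling  # the sibling that precedes

# all siblings
for sibling in soup.a.next_siblings:
    print(repr(sibling))

for sibling in soup.find(id="link3").previous_siblings:
    print(repr(sibling))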
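The snippets above rely on objects defined elsewhere, so here is a self-contained sketch of the same navigation attributes on a tiny, made-up document.

```python
from bs4 import BeautifulSoup

# Illustrative markup: <a> has two children, <b> and <c>.
soup = BeautifulSoup('<a><b>text1</b><c>text2</c></a>', 'html.parser')

children = [child.name for child in soup.a.children]
parents = [parent.name for parent in soup.b.parents]

print(children)                  # ['b', 'c']
print(parents)                   # ['a', '[document]']
print(soup.b.next_sibling)       # <c>text2</c>
print(soup.c.previous_sibling)   # <b>text1</b>
print(soup.b.previous_sibling)   # None (b is the first child)
print(soup.parent)               # None (the BeautifulSoup object is the root)
```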

Searching the tree

The main methods are find() and find_all(), but there are others, such as find_parent() and find_parents(), find_next_sibling() and find_next_siblings(), find_all_next() and find_next(), and find_all_previous() and find_previous().

find_all()
Searches all descendants of the current tag and checks whether each one matches the filter. The return type is bs4.element.ResultSet.
Full syntax:
find_all( name , attrs , recursive , string , limit , **kwargs )
Example:

soup.find_all("title")
# [<title>The Dormouse's story</title>]
#
soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]
# 
soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
#
soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
#
import re
soup.find(string=re.compile("sisters"))
# u'Once upon a time there were three little sisters; and their names were\n'

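name: finds every tag whose name is name.
attrs: matches against the attributes of a tag.
string: searches the text strings in the document.
recursive: by default, calling find_all() on a tag examines all of that tag's descendants. To search only the tag's direct children, pass recursive=False.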
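A short sketch of these parameters in action, on made-up markup. It assumes a Beautiful Soup version that accepts the string argument (4.4.0 or later).

```python
import re

from bs4 import BeautifulSoup

markup = ('<html><head><title>Demo</title></head>'
          '<body><p>one</p><p>two</p></body></html>')
soup = BeautifulSoup(markup, 'html.parser')

# Default: every descendant of <html> is examined.
all_titles = soup.html.find_all('title')
# recursive=False: only direct children of <html>, and <title> sits inside <head>.
direct_titles = soup.html.find_all('title', recursive=False)
# limit stops the search after the given number of matches.
first_p = soup.find_all('p', limit=1)
# string searches text nodes instead of tags.
matches = soup.find_all(string=re.compile('tw'))

print(len(all_titles), direct_titles, first_p, matches)
```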

find()
Similar to find_all(), except that it returns only the first match. The return type is bs4.element.Tag, or None when nothing matches.
Full syntax:
find( name , attrs , recursive , string , **kwargs )
Example:

soup.find('title')
# <title>The Dormouse's story</title>
#
soup.find("head").find("title")
# <title>The Dormouse's story</title>
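Since find() gives back None when nothing matches, chained calls such as soup.find("head").find("title") fail with AttributeError on documents that lack the first tag. A minimal sketch of guarding against that, on made-up markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<html><head><title>Demo</title></head></html>', 'html.parser')

title = soup.find('title')
missing = soup.find('table')  # no <table> in this document

# Check for None before touching attributes of the result.
title_text = title.string if title is not None else ''
table_text = missing.string if missing is not None else ''

print(repr(title_text))  # 'Demo'
print(repr(table_text))  # ''
```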

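Calling find() and find_all() with arguments

find() and find_all():

find() returns only the first matching object, while find_all() returns every match. The syntax is the same; find() behaves like find_all() with limit=1, except that it returns the object itself rather than a list.

find(name, attrs, recursive, text, **kwargs)

find_all(name, attrs, recursive, text, limit, **kwargs)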

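Parameters:

These parameters act as filters. Different filters cover the following cases:

Find tags, using the name parameter
Find text, using the text parameter (named string since Beautiful Soup 4.4.0), which also accepts a regular expression
Find tag attributes, using the attrs parameter
Find with a function
The parameters below can be combined; a result must then satisfy all of them (a logical AND).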

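Parameter	Purpose
name	filter on tag name
text	filter on text; a string or a regular expression
attrs	filter on tag attributes
recursive	True or False; whether to search all descendants or only direct children
limit	maximum number of results find_all() returns
function	must return True for elements that match and False otherwise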
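Two of the rows above are easier to follow with a concrete case: a function used as a filter, and several filters combined. The markup is made up for illustration, and has_class_but_no_id is just an example predicate.

```python
from bs4 import BeautifulSoup

markup = ('<a class="sister" id="link1">Elsie</a>'
          '<a class="sister" id="link2">Lacie</a>'
          '<a class="brother">Tom</a>')
soup = BeautifulSoup(markup, 'html.parser')


def has_class_but_no_id(tag):
    """Filter function: receives a tag and returns True when it should match."""
    return tag.has_attr('class') and not tag.has_attr('id')


by_function = soup.find_all(has_class_but_no_id)
# Combined filters: the tag must be an <a> AND have class sister AND id link2.
combined = soup.find_all('a', class_='sister', id='link2')

print([tag.string for tag in by_function])  # ['Tom']
print([tag.string for tag in combined])     # ['Lacie']
```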

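Calling conventions (class_ is a special keyword that stands in for the class attribute, because class is a reserved word in Python)

# pass the conditions positionally
li_quick = soup.find_all('li', {'class': 'item-1'})
# name the parameters explicitly
li_quick = soup.find_all(name='li', attrs={'class': 'item-1'})
# the two styles can be mixed
li_quick = soup.find_all('li', attrs={'class': 'item-1'})
# search by the class attribute
beautifulsoup_class_ = soup.find_all(class_='item-1')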
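One behaviour worth knowing when searching by class: a tag with several CSS classes matches if any one of them equals the search value. A small sketch on made-up markup:

```python
from bs4 import BeautifulSoup

markup = ('<ul><li class="item-1 active">first</li>'
          '<li class="item-2">second</li></ul>')
soup = BeautifulSoup(markup, 'html.parser')

by_keyword = soup.find_all('li', class_='item-1')
by_attrs = soup.find_all('li', attrs={'class': 'item-1'})

# Both spellings find the first <li>, even though it also carries "active".
print(by_keyword[0]['class'])  # ['item-1', 'active']
print(len(by_keyword), len(by_attrs))  # 1 1
```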

The configparser library

The configparser module is mainly used to read configuration files. Import it with: import configparser

Initializing configparser

import configparser
# create the ConfigParser object
config = configparser.ConfigParser()
# read the config file
filename = 'config.ini'
config.read(filename, encoding='utf-8')
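A point that is easy to miss: read() does not raise when the file is missing. It skips files it cannot open and returns the list of files it actually parsed, so checking that return value is one way to detect a wrong path. The file name below is a deliberately nonexistent placeholder.

```python
import configparser

config = configparser.ConfigParser()

# read() returns the list of files that were successfully parsed.
loaded = config.read('no_such_file_for_this_demo.ini', encoding='utf-8')

if not loaded:
    print('config file not found, nothing was loaded')

print(config.sections())  # []
```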

Basic operations

Config file config.ini

[user]
user_name = Mr,X
password = 222

[connect]
ip = 127.0.0.1
port = 4723

1. Get the sections

ConfigParserObject.sections()
Returns a list of all section names in the configparser object

# get all sections
all_sections = config.sections()
print('sections: ', all_sections)   # output: sections:  ['user', 'connect']

2. Get all settings in a given section

ConfigParserObject.items(section)
Returns a list of (option, value) pairs for the given section

# get the settings of a given section
items = config.items('user')
print(items)            # output: [('user_name', 'Mr,X'), ('password', '222')]

3. Get the options of a given section

ConfigParserObject.options(section)
Returns a list of all keys in the given section

# get the options of a given section
options = config.options('user')
print(options)          # output: ['user_name', 'password']
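To try steps 1 to 3 without creating a file on disk, the same configuration can be loaded from a string with read_string(). This is a sketch; INI_TEXT is an illustrative name holding the config.ini content shown earlier.

```python
import configparser

INI_TEXT = """
[user]
user_name = Mr,X
password = 222

[connect]
ip = 127.0.0.1
port = 4723
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

print(config.sections())          # ['user', 'connect']
print(config.items('user'))       # [('user_name', 'Mr,X'), ('password', '222')]
print(config.options('connect'))  # ['ip', 'port']
# Sections also support dictionary-style access.
print(config['connect']['ip'])    # 127.0.0.1
```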

4. Get the value of a given option in a given section

ConfigParserObject.get(section, option)
Returns the value as a string
ConfigParserObject.getint(section, option)
Returns the value as an int
ConfigParserObject.getboolean(section, option)
Returns the value as a bool
ConfigParserObject.getfloat(section, option)
Returns the value as a float

# get the value of a given option in a given section
name = config.get('user', 'user_name')
print(name, type(name))            # output: Mr,X <class 'str'>
port = config.getint('connect', 'port')
print(port, type(port))           # output: 4723 <class 'int'>
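The typed getters also accept a fallback keyword for options that may be absent, and getboolean() understands values such as yes/no and on/off. A small sketch; the keep_alive, timeout and host options are made up for illustration.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[connect]
port = 4723
keep_alive = yes
""")

port = config.getint('connect', 'port')
keep_alive = config.getboolean('connect', 'keep_alive')
# timeout is not in the file, so the fallback value is returned.
timeout = config.getfloat('connect', 'timeout', fallback=30.0)

# Without a fallback, a missing option raises NoOptionError.
try:
    config.get('connect', 'host')
    missing_raised = False
except configparser.NoOptionError:
    missing_raised = True

print(port, keep_alive, timeout, missing_raised)  # 4723 True 30.0 True
```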

5. Check whether a section or option exists

ConfigParserObject.has_section(section)
ConfigParserObject.has_option(section, option)
Both return a bool: True if it exists, False if it does not

# check whether a section exists
result = config.has_section('user')
print(result)   # output: True
result = config.has_section('user1')
print(result)   # output: False

# check whether an option exists
result = config.has_option('user', 'user_name')
print(result)   # output: True
result = config.has_option('user', 'user_name1')
print(result)   # output: False
result = config.has_option('user1', 'user_name')
print(result)   # output: False

6. Add a section

ConfigParserObject.add_section(section)
If the section does not exist, it is added;
if the section already exists, calling add_section again raises configparser.DuplicateSectionError: Section XX already exists

# add a section
if not config.has_section('remark'):
    config.add_section('remark')
config.set('remark', 'info', 'ok')
with open(filename, 'w', encoding='utf-8') as f:
    config.write(f)
remark = config.items('remark')
print(remark)    # output: [('info', 'ok')]

7. Modify or add the value of a given option in a given section

ConfigParserObject.set(section, option, value)
If the option exists, its value is replaced with value;
if the option does not exist, it is created and set to value

# change the value of a given option
config.set('user', 'user_name', 'Mr L')
config.set('user', 'isRemember', 'True')
with open(filename, 'w', encoding='utf-8') as f:
    config.write(f)
# look at the section again after the change
items = config.items('user')
print(items)    # output: [('user_name', 'Mr L'), ('password', '222'), ('isremember', 'True')]
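Two things in the output above deserve a note: the key isRemember came back as isremember, and the value 'True' is a string. The sketch below shows that ConfigParser.set() rejects non-string values, and that assigning optionxform = str is one way to keep option names case-sensitive. The option name retries is made up for illustration.

```python
import configparser

config = configparser.ConfigParser()
config.add_section('user')

# Values must be strings; anything else raises TypeError.
try:
    config.set('user', 'retries', 3)
    rejected = False
except TypeError:
    rejected = True

config.set('user', 'retries', str(3))
config.set('user', 'isRemember', 'True')
print(rejected, config.options('user'))  # True ['retries', 'isremember']

# Option names are lowercased by default; override optionxform to keep case.
case_sensitive = configparser.ConfigParser()
case_sensitive.optionxform = str
case_sensitive.add_section('user')
case_sensitive.set('user', 'isRemember', 'True')
print(case_sensitive.options('user'))  # ['isRemember']
```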

8. Delete a section or option

ConfigParserObject.remove_section(section)
If the section exists, it is removed;
if the section does not exist, nothing happens
ConfigParserObject.remove_option(section, option)
If the option exists, it is removed;
if the option does not exist, nothing happens;
if the section does not exist, configparser.NoSectionError: No section: XXX is raised

# delete a section
config.remove_section('remark')         # section exists
config.remove_section('no_section')     # section does not exist

# delete an option
config.remove_option('user', 'isremember')  # option exists
config.remove_option('user', 'no_option')   # option does not exist
with open(filename, 'w', encoding='utf-8') as f:
    config.write(f)

all_sections = config.sections()
print(all_sections)     # output: ['user', 'connect']
options = config.options('user')
print(options)      # output: ['user_name', 'password']
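Both removal methods also report what happened through their return value, which can replace a separate has_section() or has_option() check. A minimal sketch using an in-memory configuration:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("[user]\nuser_name = Mr,X\n")

removed_option = config.remove_option('user', 'user_name')        # option existed
removed_option_again = config.remove_option('user', 'user_name')  # already gone
removed_section = config.remove_section('user')                   # section existed
removed_section_again = config.remove_section('user')             # already gone

# remove_option on a missing section raises instead of returning False.
try:
    config.remove_option('user', 'user_name')
    section_error = False
except configparser.NoSectionError:
    section_error = True

print(removed_option, removed_option_again)    # True False
print(removed_section, removed_section_again)  # True False
print(section_error)                           # True
```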

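9. Write changes

ConfigParserObject.write(fileobject)
Changes made to the configparser object exist only in memory; they reach the file only after being written back

with open(filename, 'w', encoding='utf-8') as f:
    config.write(f)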
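As a quick check that written changes really persist, the sketch below writes a configuration to a temporary file and reads it back with a fresh parser. The temporary directory and the names path and reloaded are only for this illustration.

```python
import configparser
import os
import tempfile

config = configparser.ConfigParser()
config['connect'] = {'ip': '127.0.0.1', 'port': '4723'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'config.ini')

    # The with-statement closes the file, so the data is flushed to disk.
    with open(path, 'w', encoding='utf-8') as f:
        config.write(f)

    # A fresh parser sees exactly what was written.
    reloaded = configparser.ConfigParser()
    reloaded.read(path, encoding='utf-8')
    port = reloaded.getint('connect', 'port')
    sections = reloaded.sections()

print(sections, port)  # ['connect'] 4723
```

Passing open(filename, 'w') directly to write() leaves closing the file to the garbage collector; the with-statement makes the flush explicit.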

SQLAlchemy

PyExcelerate 0.10.0

PyExcelerate · PyPI

A relatively fast package for writing Excel files.

Which of Python's three common Excel-reading modules is actually the fastest?

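Feature	xlrd&xlwt&xlutils	pandas	OpenPyXL
Read	supported	supported	supported
Write	supported	supported	supported
Modify	supported	supported	supported
xls	supported	supported	not supported
xlsx	supported in newer versions	supported	supported
Large files	not supported	supported	supported
Speed	fast	fast	fast
Functionality	limited	powerful	moderate
Traversal time	0.35 s	2.60 s	0.47 s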