Question


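```
import requests
from bs4 import BeautifulSoup

url = "https://www.shanghairanking.cn/rankings/bcur/2024"

# Fetch the HTML page by sending a request for the resource to the server
r = requests.____①____(____②____)
r.raise_for_status()
r.encoding = r.apparent_encoding

# The HTTP response body as a string, i.e. the content of the page at url
html = r.____③____

# Parse the tag tree
soup = ____④____(html, 'html.parser')

# Display the HTML content in a more "friendly" form
print(soup.____⑤____)
```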

Solution

Answer

The blanks in the code are filled in as follows:

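For `requests.①(②)`:

This line should call `requests.get(url)` to fetch the HTML page by sending a request for the resource to the server. So blank ① is `get` and blank ② is `url`.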
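You can see what `requests.get(url)` sends without touching the network by building and preparing the same GET request by hand. This is only an illustration. `requests.get` creates, prepares and sends the request in one call through a session, and that session also adds default headers such as `User-Agent`. The `Request(...).prepare()` route below stops before sending.

```python
import requests

url = "https://www.shanghairanking.cn/rankings/bcur/2024"

# Build the same GET request that requests.get(url) would send,
# but only prepare it: no network access happens here.
prepared = requests.Request("GET", url).prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://www.shanghairanking.cn/rankings/bcur/2024
```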

For `r.③`:

This should return the HTTP response body as a string, which is `r.text`. So blank ③ is `text`.
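`r.text` is produced by decoding the raw response bytes (`r.content`) with `r.encoding`. That is why the code sets `r.encoding = r.apparent_encoding` before reading it. The standard-library-only sketch below imitates that decoding step on a hypothetical byte string. If decoded with ISO-8859-1 (the fallback requests uses for `text/*` responses that declare no charset), the Chinese text is garbled. If decoded as UTF-8, it comes out intact.

```python
# Hypothetical raw page bytes, standing in for r.content
raw = "中国大学排名".encode("utf-8")

# Decoding with the wrong codec, as when r.encoding stays ISO-8859-1
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were actually written in
correct = raw.decode("utf-8")

print(repr(garbled))  # unreadable mojibake, one character per byte
print(correct)        # 中国大学排名
```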

For `soup = ④(html, 'html.parser')`:

This should be `BeautifulSoup(html, 'html.parser')`, so blank ④ is `BeautifulSoup`.
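Once `BeautifulSoup(html, 'html.parser')` has built the tag tree, you can reach tags by name and search for them. Here is a minimal sketch on a small hypothetical HTML string. It is not the real structure of the ranking page; it only shows what the parsed object offers.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for r.text
html = """
<html><head><title>Ranking</title></head>
<body><table>
<tr><td>1</td><td>University A</td></tr>
<tr><td>2</td><td>University B</td></tr>
</table></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Attribute-style access returns the first matching tag
print(soup.title.string)  # Ranking

# find_all returns every matching tag, in document order
rows = soup.find_all("tr")
for row in rows:
    print([td.get_text() for td in row.find_all("td")])
# ['1', 'University A']
# ['2', 'University B']
```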

For `print(soup.⑤)`:

Calling `soup.prettify()` displays the HTML content in a more "friendly" form, so blank ⑤ is `prettify()`.
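`prettify()` returns a string rather than printing anything, so you can also inspect the result or save it. On a tiny hypothetical snippet, each tag and each piece of text ends up on its own line. With the default formatter, lines are indented one space per nesting level.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>A</li><li>B</li></ul>", "html.parser")

pretty = soup.prettify()
print(pretty)
# <ul>
#  <li>
#   A
#  </li>
#  <li>
#   B
#  </li>
# </ul>
```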

The complete code is as follows:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.shanghairanking.cn/rankings/bcur/2024"

# Fetch the HTML page by sending a request for the resource to the server
r = requests.get(url)
r.raise_for_status()
r.encoding = r.apparent_encoding

# The HTTP response body as a string, i.e. the content of the page at url
html = r.text

# Parse the tag tree
soup = BeautifulSoup(html, 'html.parser')

# Display the HTML content in a more "friendly" form
print(soup.prettify())
```

Explanation

Step 1: Fetch the page
`requests.get(url)` sends an HTTP GET request to the server and returns a `Response` object `r` holding the HTML page. Here `url` is the address of the page being requested.
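Step 2: Check the request status
`r.raise_for_status()` raises `requests.HTTPError` if the server answered with a 4xx or 5xx status code. The script therefore stops there instead of parsing an error page. For a successful response it does nothing.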
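Step 3: Set the encoding
`r.encoding` normally comes from the `Content-Type` response header. For `text/*` responses that declare no charset, requests falls back to ISO-8859-1. `r.apparent_encoding`, by contrast, is guessed from the response bytes themselves. Assigning `r.encoding = r.apparent_encoding` helps the page's Chinese text decode correctly instead of coming out garbled.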
Step 4: Get the response body
`r.text` is the response body decoded into a string using `r.encoding`, i.e. the content of the page at `url`.
Step 5: Parse the HTML
`BeautifulSoup(html, 'html.parser')` parses the HTML with Python's built-in `html.parser`. It builds a `BeautifulSoup` object that represents the tag tree.
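Step 6: Pretty-print the output
`soup.prettify()` returns the HTML as a string with each tag on its own line, indented by nesting depth. `print` then outputs this formatted HTML.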
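You can exercise the status check from Step 2 offline by building `requests.Response` objects by hand and setting only their status codes. This is purely an illustration harness, since a real `Response` comes back from `requests.get`. The helper name `check_status` is hypothetical.

```python
import requests

def check_status(status_code: int) -> str:
    """Hypothetical helper: report what raise_for_status() does for a status code."""
    r = requests.Response()  # empty response, built by hand for the demo
    r.status_code = status_code
    r.url = "https://www.shanghairanking.cn/rankings/bcur/2024"
    try:
        r.raise_for_status()  # raises HTTPError only for 4xx and 5xx codes
        return "ok"
    except requests.HTTPError as err:
        return f"error: {err}"

print(check_status(200))  # ok
print(check_status(404))  # a "404 Client Error" message
print(check_status(503))  # a "503 Server Error" message
```

Redirect codes such as 301 do not raise. Only the 400 to 599 range is treated as an error.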