I am trying to scrape the weather report data for a particular region using BeautifulSoup4:
from bs4 import BeautifulSoup
import requests
import os
import sys
url = 'https://www.accuweather.com/en/in/guwahati/186893/weather-forecast/186893'
agent = {"User-Agent":'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
page = requests.get(url, headers=agent)
soup = BeautifulSoup(page.content, 'lxml')  # parsed document (BeautifulSoup object)
#print(soup.prettify())
# find_all() returns a ResultSet (a list) of bs4 Tag objects
alldata = soup.find_all(class_='day-panel')
# These tags hold all the data we need; it just has to be arranged nicely
datas = []
for h in alldata:
    datas.append(h.text.strip())
print(datas)
print(datas[0])
The first print statement shows the output as:
['Current Weather\n\t\n\n\t\t11:55 PM\n\t\n\n\n\n\t\t\t22°\n\t\t\n\n\t\t\t\tC\n\t\t\t\n\n\n\t\tRealFeel®\n\t\t20°\n\t\n\n\t\tPartly cloudy', 'Today\n\t\n\n\t\t3/31\n\t\n\n\n\n\t\t\t34°\n\t\t\n\n\t\t\t\tHi\n\t\t\t\n\n\n\t\tRealFeel®\n\t\t36°\n\t\n\n\t\tVery warm with hazy sunshine', 'Tonight\n\t\n\n\t\t3/31\n\t\n\n\n\n\t\t\t16°\n\t\t\n\n\t\t\t\tLo\n\t\t\t\n\n\n\t\tRealFeel®\n\t\t16°\n\t\n\n\t\tPatchy clouds', 'Tomorrow\n\t\n\n\t\t4/1\n\t\n\n\n\n\t\t\t36°\n\t\t\n\n\t\t\t\t/ 16°\n\t\t\t\n\n\n\t\tRealFeel®\n\t\t\n\t\n\n\t\tHot with hazy sunshine']
I expected only the literal text, but the second print statement shows the output as:
Current Weather
11:56 PM
22°
C
RealFeel®
20°
Mostly clear
Expected output:
'Current Weather\n\t\n\n\t\t11:55 PM\n\t\n\n\n\n\t\t\t22°\n\t\t\n\n\t\t\t\tC\n\t\t\t\n\n\n\t\tRealFeel®\n\t\t20°\n\t\n\n\t\tPartly cloudy'
How can I fix this?
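It prints that way because the strings contain real newline and tab characters, and print() writes them out as-is. The first print statement showed \n and \t only because printing a list displays the repr of each element. To see those escape sequences when printing a single string, pass it through Python's repr function.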
Like this:
print(repr(datas[0]))
Output:
'Current Weather\n\t\n\n\t\t12:28 AM\n\t\n\n\n\n\t\t\t71°\n\t\t\n\n\t\t\t\tF\n\t\t\t\n\n\n\t\tRealFeel®\n\t\t69°\n\t\n\n\t\tMostly clear'
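A minimal, self-contained sketch of the difference between printing a string and printing its repr, with no scraping involved. The variable `sample` is a shortened stand-in modeled on the scraped text above (assumed sample data, not live output). The last step is an optional extra, not part of the answer: if the goal is clean values rather than visible escapes, whitespace-only lines can simply be dropped.

```python
# Shortened stand-in for one scraped "day-panel" text (sample data, not live output)
sample = "Current Weather\n\t\n\n\t\t11:55 PM\n\t\n\n\t\t\t22°"

# print() writes the string itself, so \n and \t become real line breaks and tabs
print(sample)

# repr() returns a quoted representation with the escape sequences spelled out
print(repr(sample))

# Printing a list shows the repr of each element, which is why print(datas) showed escapes
print([sample])

# Optional: keep only the non-blank lines, stripped of surrounding whitespace
fields = [line.strip() for line in sample.splitlines() if line.strip()]
print(fields)
```

The last print gives `['Current Weather', '11:55 PM', '22°']`, which may be easier to work with than either form of the raw string.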