Here is an example of how you can get the data from the page (table, title, …) into a pandas DataFrame and then automatically follow the next link:
from io import StringIO
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.afm.nl/nl-nl/sector/registers/vergunningenregisters/financiele-dienstverleners/details?id=C18B1D63-774C-E811-80D9-005056BB0C82"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"
}

all_dfs = []
while True:
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")

    # drop the .cc-mobile-title elements before parsing the table
    # (presumably mobile-only labels that would otherwise end up in the cells)
    for tag in soup.select(".cc-mobile-title"):
        tag.extract()

    # the first table on the page holds the services/products
    df = pd.read_html(StringIO(str(soup)))[0]
    df["title"] = soup.h1.get_text(strip=True)

    # add each label/value pair from the detail list as a column
    for label, value in zip(
        soup.select(".cc-em--detail-list__label"),
        soup.select(".cc-em--detail-list__value"),
    ):
        df[label.get_text(strip=True)] = value.get_text(strip=True)

    print(df)
    all_dfs.append(df)

    # follow the "Volgende register resultaat" link until there is none
    next_url = soup.select_one('a:-soup-contains("Volgende register resultaat")')
    if not next_url:
        break

    # resolve the href against the site root (avoids a doubled slash)
    url = urljoin("https://www.afm.nl/", next_url["href"])

final_df = pd.concat(all_dfs)
print(final_df)
final_df.to_csv("data.csv", index=False)
Prints:
   Financiele Dienst                          Product   Begindatum  Einddatum         title  Statutaire naam   Handelsnaam  Vergunningnummer
0          Adviseren            Inkomensverzekeringen  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
1          Adviseren  Schadeverzekeringen particulier  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
2          Adviseren     Schadeverzekeringen zakelijk  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
3          Adviseren                         Vermogen  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
4         Bemiddelen            Inkomensverzekeringen  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
5         Bemiddelen  Schadeverzekeringen particulier  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
6         Bemiddelen     Schadeverzekeringen zakelijk  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
7         Bemiddelen                         Vermogen  01 jan 2006        NaN  Michal Tomeš     Michal Tomeš  Michal Tomeš          12045811
   Financiele Dienst                          Product   Begindatum  Einddatum         title  Statutaire naam   Handelsnaam  Vergunningnummer
0          Adviseren            Inkomensverzekeringen  01 jan 2007        NaN  Michal Treml     Michal Treml  Michal Treml          12045973
1          Adviseren  Schadeverzekeringen particulier  01 jan 2007        NaN  Michal Treml     Michal Treml  Michal Treml          12045973
2          Adviseren     Schadeverzekeringen zakelijk  01 jan 2007        NaN  Michal Treml     Michal Treml  Michal Treml          12045973
...
At the end, pd.concat combines all the per-page DataFrames into one final DataFrame, final_df, which is then saved to data.csv.
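
One thing to be aware of: every per-page DataFrame has its own index starting at 0, so after pd.concat the index values repeat. That does not affect the CSV here because of index=False, but if you keep working with final_df in memory you may want a clean index. A minimal sketch of the difference, using made-up toy frames (page_1 and page_2 are hypothetical stand-ins for the scraped pages, not data from the site):

```python
import pandas as pd

# Toy stand-ins for two scraped pages (hypothetical data, not from the site).
page_1 = pd.DataFrame({"Product": ["Vermogen", "Inkomensverzekeringen"], "title": "A"})
page_2 = pd.DataFrame({"Product": ["Vermogen"], "title": "B"})

# Default concat keeps each frame's own index, so values repeat.
stacked = pd.concat([page_1, page_2])

# ignore_index=True renumbers the rows from 0 across all frames.
renumbered = pd.concat([page_1, page_2], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Whether you need ignore_index=True depends on what you do next; for a plain CSV export without the index, the original pd.concat(all_dfs) is enough.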