A simple but powerful script for checking thousands of URLs asynchronously in just a few seconds. The script uses asyncio and the aiohttp library.
Installing the aiohttp library
To install aiohttp, just type:
pip install aiohttp
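To confirm the installation worked, you can print the installed version from a Python shell (a minimal check, not part of the original script; the exact version on your machine will vary):

import aiohttp
print(aiohttp.__version__)  # e.g. 3.x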
Asynchronous URL checking script
asyncio will execute the run() function, which uses a Semaphore to limit the maximum number of concurrent connections/sockets to 1000. For each URL, run() schedules bound_fetch(), the function that applies the semaphore restriction while fetching the provided URLs one by one. The final function, fetch(), called by bound_fetch(), performs the asynchronous GET request, receives the page response, and prints the result.
import asyncio
from aiohttp import ClientSession


async def fetch(url, session):
    async with session.get(url) as response:
        # If the request was redirected, report the status of the first response
        code_status = response.history[0].status if response.history else response.status
        print('%s -> Status Code: %s' % (url, code_status))
        return await response.read()


async def bound_fetch(semaphore, url, session):
    # Getter function with semaphore.
    async with semaphore:
        await fetch(url, session)


async def run(urls):
    tasks = []
    # create instance of Semaphore
    semaphore = asyncio.Semaphore(1000)
    async with ClientSession() as session:
        for url in urls:
            # pass Semaphore and session to every GET request
            task = asyncio.ensure_future(bound_fetch(semaphore, url, session))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses


loop = asyncio.get_event_loop()
urls = ['https://example1.com', 'https://example2.com']
future = asyncio.ensure_future(run(urls))
loop.run_until_complete(future)
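In practice you will usually have far more than two URLs. As a small sketch (not part of the original script), the snippet below assumes a hypothetical file named urls.txt with one URL per line, loads it with a helper function, and runs the same run() coroutine with asyncio.run(), which on Python 3.7+ replaces the manual event-loop handling shown above:

import asyncio

# Hypothetical helper: read one URL per line from a local text file.
def load_urls(path='urls.txt'):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

if __name__ == '__main__':
    urls = load_urls()
    # run() is the coroutine defined in the script above.
    asyncio.run(run(urls))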