Asynchronous URL checking

A simple but powerful script for checking thousands of URLs asynchronously, typically in a few seconds, depending on network latency and how quickly the servers respond. The script uses asyncio and the aiohttp library.

Installing the aiohttp library

To install aiohttp, run:

pip install aiohttp

Asynchronous URL checking script

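asyncio runs the run() coroutine, which creates an asyncio.Semaphore that caps the number of requests in flight at 1000. For every URL, run() schedules a bound_fetch() task. bound_fetch() acquires the semaphore before calling fetch() and releases it afterwards, so the URLs are fetched concurrently rather than one by one, but never more than 1000 at a time. fetch() performs the actual GET request, prints the status code and returns the response body. Note that aiohttp's default connector also limits a session to 100 simultaneous connections, so in practice no more than 100 sockets are open at once unless a larger TCPConnector limit is configured.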
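To see the semaphore's effect without touching the network, the following sketch swaps the HTTP request for asyncio.sleep() and records how many coroutines hold the semaphore at the same moment. The names fake_request() and demo() and the numbers (20 simulated requests, a limit of 5) are illustrative assumptions, not part of the script.

```python
import asyncio


async def fake_request(semaphore, state, i):
    # Hypothetical stand-in for one HTTP request: hold the semaphore while "working".
    async with semaphore:
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
        await asyncio.sleep(0.01)  # simulated network latency
        state['active'] -= 1
        return i


async def demo(n_requests=20, limit=5):
    semaphore = asyncio.Semaphore(limit)
    state = {'active': 0, 'peak': 0}
    # Schedule every "request" at once; the semaphore decides how many run.
    tasks = [asyncio.create_task(fake_request(semaphore, state, i))
             for i in range(n_requests)]
    results = await asyncio.gather(*tasks)  # results keep input order
    return results, state['peak']


if __name__ == '__main__':
    results, peak = asyncio.run(demo())
    print('completed:', len(results), 'peak concurrency:', peak)
```

All 20 coroutines start immediately, but only 5 get past `async with semaphore` at a time, which is the same gating that bound_fetch() applies to fetch().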

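import asyncio

from aiohttp import ClientError, ClientSession


async def fetch(url, session):
    # Send a GET request and print the status code. For a redirected URL,
    # response.history holds the earlier responses, so this reports the
    # status of the first hop (e.g. 301) rather than that of the final page.
    try:
        async with session.get(url) as response:
            code_status = response.history[0].status if response.history else response.status
            print('%s -> Status Code: %s' % (url, code_status))
            return await response.read()
    except (ClientError, asyncio.TimeoutError) as exc:
        # Report the failure instead of letting one bad URL abort the whole run.
        print('%s -> Error: %r' % (url, exc))
        return None


async def bound_fetch(semaphore, url, session):
    # Getter function with semaphore: waits for a free slot before fetching.
    async with semaphore:
        return await fetch(url, session)


async def run(urls):
    tasks = []
    # Allow at most 1000 requests to be in flight at the same time.
    semaphore = asyncio.Semaphore(1000)

    async with ClientSession() as session:
        for url in urls:
            # Pass the semaphore and the shared session to every GET request.
            task = asyncio.create_task(bound_fetch(semaphore, url, session))
            tasks.append(task)

        # Wait for all requests to finish; results come back in input order.
        return await asyncio.gather(*tasks)


if __name__ == '__main__':
    urls = ['https://example1.com', 'https://example2.com']
    # asyncio.run() creates the event loop, runs run(urls) and closes the loop.
    asyncio.run(run(urls))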
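Catching errors inside fetch() is one way to keep a single unreachable URL from ending the run. Another is to let asyncio.gather() collect them: by default gather() propagates the first exception raised by any awaited task, whereas return_exceptions=True returns exceptions as ordinary results, in the same order as the input. The sketch below shows that pattern with a hypothetical fake_check() standing in for a real request (it simply fails for any URL containing "bad"), so it runs without aiohttp or network access.

```python
import asyncio


async def fake_check(url):
    # Hypothetical stand-in for fetch(): fails for URLs containing "bad".
    await asyncio.sleep(0)
    if 'bad' in url:
        raise ConnectionError('cannot connect to %s' % url)
    return 200


async def check_all(urls):
    # return_exceptions=True makes gather() return exceptions as values
    # instead of raising the first one, so every URL gets an outcome.
    results = await asyncio.gather(*(fake_check(u) for u in urls),
                                   return_exceptions=True)
    summary = {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            summary[url] = 'error: %s' % result
        else:
            summary[url] = result
    return summary


if __name__ == '__main__':
    urls = ['https://good.example', 'https://bad.example']
    for url, outcome in asyncio.run(check_all(urls)).items():
        print(url, '->', outcome)
```

In the real script, the same idea would mean passing return_exceptions=True to the gather() call in run() and checking the returned list for exception objects.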
