Anas El Mhamdi

An AWS Hack for Cheap and Reliable Proxies

I’m a Lambda maximalist. In a previous article about scraping with Tor, I discussed proxy scraping challenges. However, Tor and free proxies present significant limitations when running on Lambda functions, particularly for LinkedIn scraping workflows.

This article presents a technique for leveraging AWS Lambda’s infrastructure to create cheap, reliable proxies by exploiting the cold start mechanism.

Prerequisites: the Serverless Framework installed and an AWS account configured with credentials.

Use Case: Scraping LinkedIn Job Offers

[Figure: LinkedIn job search interface]

Objective: Automatically collect new software engineering job postings in France on a daily basis.

The concrete challenge involves scraping LinkedIn’s job search results for software engineering positions in France. You can see an example search URL here.

The Challenge: LinkedIn aggressively blocks IP addresses when multiple job offers are accessed sequentially from the same source. Free public proxies are quickly detected and blocked, making them unreliable for this use case.

Architecture Overview

[Figure: AWS Lambda architecture]

The solution employs two Lambda functions working in concert:

  1. Main function: Triggered daily via CloudWatch Events (now Amazon EventBridge) to scrape job listing URLs from the LinkedIn search results page
  2. Worker function: Called by the main function for each individual job URL to extract detailed job offer information while maintaining IP rotation

This separation allows the worker function to bypass LinkedIn’s blocking mechanisms by rotating IP addresses between requests.
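
To make the split concrete, here is a minimal sketch of how the main function could fan out to the worker, one invocation per job URL. It assumes boto3's Lambda client is passed in as `lambda_client`; the helper name `dispatch_jobs`, the `{"url": ...}` payload shape, and the idea that the worker returns JSON are illustrative assumptions, not necessarily what the actual project does.

```python
import json


def dispatch_jobs(lambda_client, worker_name, job_urls):
    """Invoke the worker once per job URL and collect the decoded results.

    `lambda_client` is expected to behave like boto3.client("lambda").
    The payload shape {"url": ...} is an assumption made for this sketch.
    """
    results = []
    for url in job_urls:
        response = lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="RequestResponse",  # synchronous: wait for the result
            Payload=json.dumps({"url": url}).encode("utf-8"),
        )
        # boto3 returns the worker's output as a streaming body.
        results.append(json.loads(response["Payload"].read()))
    return results
```

Inside the main handler this might be called as `dispatch_jobs(boto3.client("lambda"), "linkedin-worker", urls)`, where `"linkedin-worker"` is a hypothetical function name and `urls` comes from the search results page.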

The Cold Start Mechanism

[Figure: Lambda cold start diagram]

AWS Lambda is described as “a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.”

When a Lambda function is invoked for the first time, or again after sitting idle, it goes through what’s called a “cold start”: the process of spinning up an ephemeral execution environment. According to AWS documentation:

“Cold starts typically occur in under 1% of invocations. The duration of a cold start varies from under 100 ms to over 1 second.”

The Key Insight: A cold start creates a fresh execution environment, which AWS may place on a different host in its pool. In practice this usually means your code runs from a different public IP address after each cold start, although AWS does not guarantee it.

Rather than viewing cold starts as a performance drawback (as many developers do), this technique intentionally exploits them as a feature for automatic IP rotation.
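
If you want to check this behaviour yourself, a small diagnostic handler can report whether an invocation was a cold start and which public IP it ran from. This is a sketch under a few assumptions: the module-level flag trick relies on module code running once per execution environment, `https://checkip.amazonaws.com` is used as the IP echo service, and the `ip_lookup` parameter exists only so the handler can be exercised without network access.

```python
import urllib.request

# Module scope runs once per execution environment, so this flag is True
# only for the first invocation after a cold start.
_is_cold = True


def fetch_public_ip(url="https://checkip.amazonaws.com", timeout=5):
    """Return the public IP this environment appears to use."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def handler(event, context, ip_lookup=fetch_public_ip):
    """Report cold-start status and the current public IP."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {"cold_start": cold, "ip": ip_lookup()}
```

Invoking it a few times in a row should show `cold_start` flip to false while the IP stays the same. After a forced cold start, the IP will often (not always) be different.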

Forcing Cold Starts for IP Rotation

The implementation strategy involves deliberately triggering cold starts to change IP addresses:

  1. Update the Lambda function with a random environment variable or identifier
  2. Use AWS waiters to synchronously monitor the deployment completion
  3. Implement conditional logic that forces a cold start when LinkedIn blocks a request
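
The first two steps can be sketched with boto3 roughly as follows. The function name `force_cold_start` and the `COLD_START_MARKER` variable are hypothetical names chosen for this example; the boto3 calls themselves (`get_function_configuration`, `update_function_configuration`, and the `function_updated` waiter) are standard Lambda client operations. The existing environment variables are read first because the update replaces the whole variable map.

```python
import uuid


def force_cold_start(lambda_client, function_name, marker_key="COLD_START_MARKER"):
    """Change one environment variable so the next invocation cold starts.

    `lambda_client` is expected to behave like boto3.client("lambda").
    `marker_key` is an arbitrary name picked for this sketch.
    """
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # Copy the current variables: the update overwrites the full map.
    variables = dict(config.get("Environment", {}).get("Variables", {}))
    variables[marker_key] = uuid.uuid4().hex

    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": variables},
    )
    # Block until Lambda reports that the update has finished.
    lambda_client.get_waiter("function_updated").wait(FunctionName=function_name)
    return variables[marker_key]
```

The role running this code needs permission for `lambda:GetFunctionConfiguration` and `lambda:UpdateFunctionConfiguration` on the worker function.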

When the worker function detects blocking (such as receiving a CAPTCHA or rate limit response), it triggers a function update. The update retires the existing execution environments, so the next invocation cold starts in a new one, typically with a fresh IP address, and scraping can continue.
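
One possible way to wire up the conditional logic is a small retry loop around the worker call. Everything here is an assumption for illustration: the block signals (HTTP 429, the non-standard 999 status often reported for LinkedIn, and the `captcha` / `authwall` markers) are heuristics, `fetch` stands for whatever invokes the worker and returns `(status_code, body)`, and `rotate` stands for whatever forces the cold start. The retry is a fresh invocation because the invocation that got blocked keeps running in its current environment.

```python
def looks_blocked(status_code, body):
    """Heuristic block detection; the signals are assumptions, tune as needed."""
    if status_code in (429, 999):
        return True
    lowered = body.lower()
    return "captcha" in lowered or "authwall" in lowered


def scrape_with_rotation(fetch, rotate, url, max_rotations=3):
    """Fetch `url`, forcing a cold start and retrying when blocked.

    fetch(url) -> (status_code, body); rotate() forces a new environment.
    """
    for attempt in range(max_rotations + 1):
        status_code, body = fetch(url)
        if not looks_blocked(status_code, body):
            return body
        if attempt < max_rotations:
            rotate()  # next fetch should land in a fresh environment
    raise RuntimeError(f"still blocked after {max_rotations} rotations: {url}")
```

Capping the number of rotations keeps a persistent block from turning into an endless loop of function updates.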

Cost Advantage

This approach is remarkably cost-effective:

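  • Cold starts cost little or nothing - when this was written, AWS didn’t charge for initialization time on standard zip-packaged functions (AWS has since announced billing for the INIT phase from August 2025, so check current pricing)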
  • Lambda compute costs are already very low for small-scale operations
  • No proxy service fees - eliminates the need for expensive residential or datacenter proxy subscriptions

For small to medium-scale scraping workflows, forcing occasional cold starts adds negligible cost while providing reliable IP rotation.
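
As a rough sanity check, the monthly bill can be estimated in a few lines. The prices below are the published x86 on-demand rates for us-east-1 as far as I know ($0.20 per million requests, $0.0000166667 per GB-second) and may differ in your region or change over time; the workload figures in the usage note are made-up assumptions, and the free tier is ignored.

```python
REQUEST_PRICE = 0.20 / 1_000_000   # USD per request (assumed us-east-1 rate)
GB_SECOND_PRICE = 0.0000166667     # USD per GB-second, x86 (assumed rate)


def monthly_cost(invocations_per_day, avg_seconds, memory_mb, days=30):
    """Estimate monthly Lambda cost in USD, ignoring the free tier."""
    invocations = invocations_per_day * days
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return invocations * REQUEST_PRICE + gb_seconds * GB_SECOND_PRICE
```

For example, `monthly_cost(1000, 3, 512)` (1,000 job offers a day, 3 seconds each, 512 MB of memory) comes to roughly $0.76 a month, before any free-tier allowance is applied.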

Implementation Resources

The complete implementation is available as an open-source project:

The repository includes all necessary code for both Lambda functions, deployment configuration, and instructions for setting up the automated workflow.

Conclusion

By intentionally leveraging AWS Lambda cold starts, developers can achieve reliable IP rotation for web scraping tasks without the expense of traditional proxy services. This “hack” transforms what’s typically considered a performance limitation into a cost-effective feature for bypassing rate limiting and IP blocking mechanisms.

The technique is particularly well-suited for:

  • Small to medium-scale scraping projects
  • Automated data collection workflows
  • Scenarios where residential proxies are too expensive
  • Applications requiring occasional IP rotation rather than constant high-volume requests

This approach demonstrates how understanding the underlying infrastructure of serverless platforms can unlock creative solutions to common development challenges.