Building an AWS Lambda Function to Fetch XML Data and Store in DynamoDB

Aug 20, 2023 · 3 min read · AWS, Python, Lambda

Introduction:

In this tutorial, we'll use AWS Lambda to fetch XML data from a source, such as an RSS feed, and store it in a DynamoDB table. AWS Lambda is a serverless computing service that lets you run code without provisioning or managing servers. We'll set up the table, write the Lambda function, pass its configuration through environment variables, and avoid storing the same item twice.

Prerequisites:

Before we dive into the implementation, make sure you have the following:

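An AWS account with permission to create Lambda functions and DynamoDB tables. The function's execution role also needs dynamodb:Scan and dynamodb:PutItem on the table.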
An XML data source.
A DynamoDB table ready to store the data.
Familiarity with Python and AWS services.

Setting Up DynamoDB Table:

To get started, let's create the DynamoDB table that will hold our XML data. Follow these steps:

Open the AWS Management Console and navigate to DynamoDB.
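Create a new table and choose a partition key (e.g., item_id, of type String). DynamoDB only asks for key attributes at creation time; the remaining attributes (title, link, pub_date, description, write_date) are added when items are written.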
Note the table name for later use.
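
The same table can also be created from code instead of the console. The following is a minimal sketch, assuming item_id is the partition key and on-demand billing is acceptable; the table name rss_items and both helper functions are hypothetical and not part of the original setup.

```python
def build_table_definition(table_name):
    """Return keyword arguments for DynamoDB's CreateTable call."""
    return {
        'TableName': table_name,
        # Only key attributes are declared here; title, link, pub_date and
        # description are written per item without a schema.
        'AttributeDefinitions': [
            {'AttributeName': 'item_id', 'AttributeType': 'S'},
        ],
        # Assumption: item_id is the partition (HASH) key.
        'KeySchema': [
            {'AttributeName': 'item_id', 'KeyType': 'HASH'},
        ],
        # Assumption: on-demand capacity, so no throughput numbers are needed.
        'BillingMode': 'PAY_PER_REQUEST',
    }


def create_table(table_name):
    """Create the table and block until DynamoDB reports that it exists."""
    # Imported here so the definition above can be inspected without boto3.
    import boto3

    client = boto3.client('dynamodb')
    client.create_table(**build_table_definition(table_name))
    client.get_waiter('table_exists').wait(TableName=table_name)


if __name__ == "__main__":
    # Hypothetical table name; use the one you plan to give the function.
    create_table('rss_items')
```

Running the script requires AWS credentials that are allowed to call dynamodb:CreateTable.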

Writing the AWS Lambda Function:

Now let's write the Lambda function that fetches XML data and stores it in DynamoDB:

# Import necessary libraries
import os
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

import boto3
# requests is a third-party package: bundle it with the deployment
# package or provide it through a Lambda layer.
import requests

# Initialize DynamoDB client
dynamodb = boto3.client('dynamodb')

def get_stored_links():
    """Return the set of links already stored in the table."""
    # Scan returns items in no particular order, so a single Limit=10 page
    # is not "the last 10 posts". Page through the whole table instead.
    links = set()
    scan_kwargs = {
        'TableName': os.environ['DYNAMODB_TABLE'],
        # '#lnk' is a placeholder, which avoids any clash with reserved words
        'ProjectionExpression': '#lnk',
        'ExpressionAttributeNames': {'#lnk': 'link'},
    }
    while True:
        response = dynamodb.scan(**scan_kwargs)
        links.update(item['link']['S'] for item in response['Items'] if 'link' in item)
        if 'LastEvaluatedKey' not in response:
            return links
        scan_kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']

def lambda_handler(event, context):
    try:
        # XML URL to fetch data from
        xml_url = os.environ['XML_URL']

        # Fetch XML data; fail fast on timeouts and HTTP errors
        response = requests.get(xml_url, timeout=10)
        response.raise_for_status()

        # Parse XML data (bytes, so the parser honors the declared encoding)
        root = ET.fromstring(response.content)

        # Count of items written during this invocation
        items_added = 0

        # Links already stored in DynamoDB
        stored_links = get_stored_links()

        # Extract top 5 items from XML
        for item_elem in root.findall('.//item')[:5]:
            link = item_elem.findtext('link')

            # Skip items without a link and items that are already stored
            if not link or link in stored_links:
                continue

            item_id = str(uuid.uuid4())
            # findtext returns the default instead of raising when a tag is missing
            title = item_elem.findtext('title', default='')
            pub_date = item_elem.findtext('pubDate', default='')
            description = item_elem.findtext('description', default='')

            current_time = datetime.now(timezone.utc).isoformat()

            dynamodb.put_item(
                TableName=os.environ['DYNAMODB_TABLE'],
                Item={
                    'item_id': {'S': item_id},
                    'title': {'S': title},
                    'link': {'S': link},
                    'pub_date': {'S': pub_date},
                    'description': {'S': description},
                    'write_date': {'S': current_time},
                }
            )

            # Remember the link so a repeated feed entry is not written twice
            stored_links.add(link)
            items_added += 1

        debug_message = f"{items_added} items added to DynamoDB"
        print(debug_message)

        return {
            'statusCode': 200,
            'body': debug_message
        }
    except Exception as e:
        error_message = f"Error: {str(e)}"
        print(error_message)
        return {
            'statusCode': 500,
            'body': error_message
        }
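
The parsing and de-duplication steps can be tried locally before deploying, with no AWS access at all. The sketch below assumes an RSS-style feed with item, title, link, pubDate, and description elements, as in the function above; SAMPLE_FEED, parse_items, and select_new_items are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed used only for local experiments.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>First post</title><link>https://example.com/first</link>
<pubDate>Sun, 20 Aug 2023 10:00:00 GMT</pubDate><description>One</description></item>
<item><title>Second post</title><link>https://example.com/second</link>
<pubDate>Sun, 20 Aug 2023 11:00:00 GMT</pubDate><description>Two</description></item>
</channel></rss>"""


def parse_items(xml_text, limit=5):
    """Return the first `limit` feed items as dicts, skipping items without a link."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.findall('.//item')[:limit]:
        link = elem.findtext('link')
        if not link:
            continue
        items.append({
            'title': elem.findtext('title', default=''),
            'link': link,
            'pub_date': elem.findtext('pubDate', default=''),
            'description': elem.findtext('description', default=''),
        })
    return items


def select_new_items(items, known_links):
    """Keep only items whose link is neither stored nor repeated in the feed."""
    known = set(known_links)
    new_items = []
    for item in items:
        if item['link'] not in known:
            new_items.append(item)
            known.add(item['link'])
    return new_items


if __name__ == "__main__":
    parsed = parse_items(SAMPLE_FEED)
    # Pretend the first link is already in the table.
    fresh = select_new_items(parsed, {'https://example.com/first'})
    print([item['title'] for item in fresh])
```

Keeping these two steps free of network and database calls makes them easy to unit test; the handler then only has to fetch the feed, load the stored links, and write whatever select_new_items returns.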

Handling User Inputs with Environment Variables:

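The function reads its two settings, the XML URL (XML_URL) and the DynamoDB table name (DYNAMODB_TABLE), from environment variables. This keeps them out of the source code and lets you point the same function at a different feed or table without redeploying it. Set both variables in the Lambda function's configuration, under Configuration > Environment variables in the console.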
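
If a variable is missing, os.environ[...] raises a bare KeyError in the middle of the handler. One way to get a clearer error is to validate both settings up front. This is a minimal sketch; load_settings is a hypothetical helper, and the env parameter exists only so the function can be tried without touching the real environment.

```python
import os


def load_settings(env=None):
    """Read the settings the handler needs, failing early if one is missing."""
    env = os.environ if env is None else env
    required = ('XML_URL', 'DYNAMODB_TABLE')
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        'xml_url': env['XML_URL'],
        'table_name': env['DYNAMODB_TABLE'],
    }


if __name__ == "__main__":
    # Hypothetical values for a local run.
    print(load_settings({
        'XML_URL': 'https://example.com/feed.xml',
        'DYNAMODB_TABLE': 'rss_items',
    }))
```

Calling load_settings() with no argument at the start of the handler reads from the real Lambda environment.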

Conclusion:

With AWS Lambda, a short Python function is enough to fetch XML data and store it in DynamoDB for further processing. Because the approach is serverless, there is no infrastructure to manage, and Lambda bills only for the requests and compute time the function actually uses.