How to Download a CSV File From S3 and Insert It Into AWS DocumentDB Using Lambda

If you're working with data in AWS, you might find yourself needing to transfer data between services. For example, you might have a CSV file stored in an S3 bucket that you want to insert into a MongoDB-compatible database running on Amazon DocumentDB. In this tutorial, we'll show you how to use Lambda to automate this process.

First, let's take a look at the code. Here's the full script:

import csv
import os
from urllib.parse import unquote_plus

import boto3
import pymongo
import requests

# Where the DocumentDB CA bundle is stored (only /tmp is writable in Lambda)
CA_BUNDLE_PATH = "/tmp/rds-combined-ca-bundle.pem"


def lambda_handler(event, context):
    # Download the DocumentDB CA certificate bundle used for the TLS connection
    url = "https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem"
    response = requests.get(url)
    response.raise_for_status()
    with open(CA_BUNDLE_PATH, "wb") as f:
        f.write(response.content)

    # Bucket and key of the object that triggered the function
    s3_bucket = event['Records'][0]['s3']['bucket']['name']
    # s3_bucket = "csv-data-in-S3"
    # Keys arrive URL-encoded in S3 events (a space becomes "+"), so decode them
    s3_key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    # s3_key = "data/timezone.csv"

    # Download the CSV file from S3 into /tmp
    s3 = boto3.client('s3')
    tmp_file_path = '/tmp/' + s3_key.split('/')[-1]
    print(tmp_file_path)
    s3.download_file(s3_bucket, s3_key, tmp_file_path)

    # Check files in the /tmp folder
    print(os.listdir("/tmp"))

    # Parse the CSV data from the temporary file into a list of dicts
    with open(tmp_file_path, newline='') as file:
        csv_reader = csv.reader(file, delimiter=',')
        header = next(csv_reader)
        rows = [dict(zip(header, row)) for row in csv_reader]

    print(rows)

    # Connect to DocumentDB
    # (tls/tlsCAFile replace the ssl/ssl_ca_certs options that PyMongo 4 removed)
    mongo_client = pymongo.MongoClient('mongodb://<user>:<password>@<documentDB-Endpoint>:27017/?tls=true&tlsCAFile=/tmp/rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')
    db = mongo_client['<Database-Name>']
    collection = db['<collection-name>']

    # Insert the CSV data into the DocumentDB collection
    # (insert_many raises an error when given an empty list)
    if rows:
        collection.insert_many(rows)

    return {
        'statusCode': 200,
        'body': 'Data inserted into DocumentDB collection'
    }

In the above code, replace <user> and <password> with the appropriate credentials for your DocumentDB cluster, and <documentDB-Endpoint> with the endpoint of your DocumentDB cluster. Also replace <Database-Name> and <collection-name> with the name of the database and collection where you want to store the CSV data.
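Hard-coding credentials in the function source is easy to get wrong, so here is a minimal sketch of building the connection string from Lambda environment variables instead. The variable names DOCDB_USER, DOCDB_PASSWORD and DOCDB_ENDPOINT and both helper functions are hypothetical, not part of the original script. The sketch also assumes the tls/tlsCAFile option spelling accepted by current PyMongo releases. Credentials are percent-escaped with quote_plus so characters such as "@" or "/" in a password do not break the URI.

```python
import os
from urllib.parse import quote_plus


def build_docdb_uri(user, password, endpoint,
                    ca_file="/tmp/rds-combined-ca-bundle.pem", port=27017):
    """Assemble a DocumentDB connection string with escaped credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{endpoint}:{port}/"
        f"?tls=true&tlsCAFile={ca_file}"
        "&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
    )


def uri_from_environment(env=os.environ):
    """Read the (hypothetical) DOCDB_* variables and build the URI."""
    return build_docdb_uri(
        env["DOCDB_USER"], env["DOCDB_PASSWORD"], env["DOCDB_ENDPOINT"]
    )


# Example with made-up values; the password contains characters that need escaping
print(build_docdb_uri("app user", "p@ss/word", "example-host"))
```

The resulting string can be passed to pymongo.MongoClient in place of the literal URI in the script.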

The script is designed to be run as an AWS Lambda function. When the function is triggered, it performs the following steps:

  1. Downloads the Amazon DocumentDB CA certificate bundle over HTTPS from a public S3 URL and saves it to /tmp.
  2. Downloads the CSV file from the S3 bucket that triggered the event.
  3. Parses the CSV file into a list of dictionaries, one per row, keyed by the header row.
  4. Connects to the DocumentDB cluster.
  5. Inserts the data from the CSV file into an Amazon DocumentDB collection.
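The parsing step can be tried locally without S3 or DocumentDB. The sketch below uses the same header-plus-zip approach on an in-memory sample; the function name parse_csv and the sample columns are made up for illustration. Note that every value comes back as a string, so numeric columns would need converting before insertion if you want them stored as numbers.

```python
import csv
import io


def parse_csv(text_stream):
    """Turn CSV text into a list of dicts keyed by the header row."""
    reader = csv.reader(text_stream, delimiter=',')
    header = next(reader, None)
    if header is None:  # empty file: nothing to insert
        return []
    # Skip blank lines, which csv.reader yields as empty lists
    return [dict(zip(header, row)) for row in reader if row]


# Made-up sample data standing in for the file downloaded from S3
sample = io.StringIO("zone,offset\nUTC,0\nAsia/Kolkata,5.5\n")
rows = parse_csv(sample)
print(rows)
# [{'zone': 'UTC', 'offset': '0'}, {'zone': 'Asia/Kolkata', 'offset': '5.5'}]
```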

Congratulations! You have successfully created a Lambda function that downloads a CSV file from Amazon S3, parses the data, and inserts it into a DocumentDB collection.

To automate the process of inserting data from a CSV file in S3 to DocumentDB using the Lambda function, you can configure an S3 bucket to trigger the function whenever a new file is uploaded. This can be achieved by adding an S3 event notification to the bucket, which will send an event to the Lambda function whenever an object is created or overwritten in the bucket.
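To make the event concrete, here is a trimmed-down sketch of the payload S3 sends and how the bucket and key can be read from it. The bucket name and key are invented, real events carry many more fields, and extract_s3_objects is a hypothetical helper. Object keys are URL-encoded in the event, which is why the key is passed through unquote_plus.

```python
from urllib.parse import unquote_plus

# Reduced sample of an S3 "object created" event (values are made up)
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-csv-bucket"},
                "object": {"key": "data/time+zone.csv"},
            },
        }
    ]
}


def extract_s3_objects(event):
    """Return (bucket, decoded key) pairs for every record in the event."""
    return [
        (record["s3"]["bucket"]["name"],
         unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


print(extract_s3_objects(sample_event))
# [('example-csv-bucket', 'data/time zone.csv')]
```

Looping over all records, rather than reading only the first one, is a small safety margin in case an event ever carries more than one record.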

To set up the event notification, go to the S3 management console and select the bucket where your CSV files are stored. Then, click on the "Properties" tab and scroll down to the "Event notifications" section. From there, click on "Create event notification" and fill in the details for the event, including the event type (for example "All object create events", which also fires when an existing object is overwritten), an optional object prefix or suffix such as .csv, and the Lambda function to invoke.
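If you prefer to script this step, the same notification can be set with Boto3's put_bucket_notification_configuration. The following is a sketch under a few assumptions: the helper names, the ".csv" suffix default and the example ARN are illustrative, and the Lambda function is assumed to already allow S3 to invoke it (the console adds that permission for you, a script does not).

```python
def build_notification_config(lambda_arn, prefix="", suffix=".csv"):
    """Build the NotificationConfiguration dict for object-created events."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})

    config = {
        "LambdaFunctionArn": lambda_arn,
        "Events": ["s3:ObjectCreated:*"],  # covers new uploads and overwrites
    }
    if rules:
        config["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [config]}


def apply_notification(bucket, lambda_arn, prefix="", suffix=".csv"):
    """Send the configuration to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(
            lambda_arn, prefix, suffix
        ),
    )


# Example ARN is a placeholder, not a real function
example = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:csv-to-docdb",
    prefix="data/",
)
print(example)
```

Be aware that this call replaces the bucket's whole notification configuration, so any existing notifications need to be included in the dict you send.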

Once the event notification is set up, any new CSV files uploaded to the specified S3 bucket will automatically trigger the Lambda function to insert the data into the specified DocumentDB collection.

Conclusion

In this blog post, we have seen how to create a Lambda function that can download a CSV file from Amazon S3 and insert the data into a DocumentDB collection. We used Python and the Boto3 and PyMongo libraries to accomplish this.

By using Lambda functions to automate the process of downloading and importing data, you can save time and effort, and ensure that your data is always up to date. This is just one example of the many things you can do with AWS Lambda and DocumentDB.

I hope you found this blog post helpful, and feel free to leave a comment if you have any questions or feedback.

I ❤ AWS! 😄 Enjoy