Save Files to Google Drive with Streamlit

Oğuzhan Arı
12 min read · Apr 5, 2024


Hi!

I have spent the last few weeks working with Streamlit, and I actually think its potential is far greater than it first appears. If, like me, you are not an advanced programmer, Streamlit saves you a lot of trouble: you don't have to think about domains, hosting, design with HTML/CSS, page responsiveness, or the security of your work. You just write your code in Python and… tada! You have a web page in front of you. For this reason, I like Streamlit more and more every day, and the limit of what you can do is your imagination.

So, what are we going to do today?

With Streamlit, we will perform an operation on a CSV file received from the user, write the result of this operation to a text file, keep a log of what was done along the way, and then save these three files to our own Drive folder (so we can correct our errors, make improvements, or examine the process later).

Since our main purpose here is to create the files and save them to Drive, we will take the data from a very simple CSV file (10 lines, with a number on each line) and save the sum of the values in a text file that the user can download. By keeping log records of the operations performed, we will be able to examine the working stages later.

Well, then, let’s get started!

Accessing the Google Drive API

We don't need many libraries for our program. We will only use pandas (to read the CSV), PyDrive (to access Google Drive), and Streamlit (of course…).

Under normal circumstances, when we use the Google Drive API, the program asks us for authorisation every time it runs. That is, when we deploy this application, it would ask each user to log in with an authorised account (a Gmail account). However, since we don't know which user is using it and don't want to put them through an authorisation process, the application will authorise in the background with our own account's credentials, without the user noticing at all. The catch is that we cannot share these credentials once our application is on GitHub, because with this information anyone could act as "us". We don't want this. For this reason, we will create this authorisation file ourselves with Streamlit.

then let’s start coding!

First, let's create an API project for our Drive and get the client_secret file for this application. We go to https://console.cloud.google.com/. After logging in to our account, we search for Google Drive.

then select “google drive api”.

We activate the google drive api with the “enable” option.

This page redirects us to the API's page. From here, we start creating credentials for our application by pressing "Create Credentials".

The first step on this page asks which type of application we are building and what its data source will be. We continue by selecting user data. The important detail here is the "user consent required" part: normally, users would need to log in when using this application, but we will do this part ourselves on their behalf.

In the OAuth Consent section, we give a name to our application, give the requested e-mail addresses and continue.

In the scopes section, we decide what our application will have access to, and where. Here, we want our application to have full authority over our Drive.

After pressing "Add or remove scopes", on the screen that opens, we type "Google Drive API" in the search field, select "auth/drive", and close the screen with "Update". We finish the scope section by selecting "Save and Continue".

And we are done! Now we have an application. By pressing the download button, we download our json file named client_secret_xx.

Finally, we rename our client_secret file to "client_secrets.json", since that is the filename PyDrive looks for.

Obtaining the Authorisation File for the Google Drive API

I am creating a utils.py file for this project. All the code that runs in the background will live in this file. My goal is for main.py to only render the page and manage operations by calling the necessary functions.

In the utils.py file, let's first create a function that we will only need once (afterwards we will not use it again). The basic logic of this step can be summarised as follows.

And how will this flow look in the code?

<script src="https://gist.github.com/oguzhari/2917620a8abd958bc5893a24b7dbd553.js"></script>

When we run our code, a web page opens for us, because we do not yet have a "mycreds.txt" file.
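The pattern in that gist is PyDrive's standard credential-caching flow. Here is a sketch of the idea (not the gist's exact code), assuming client_secrets.json sits in the working directory:

```python
from pydrive.auth import GoogleAuth

gauth = GoogleAuth()  # reads client_secrets.json from the working directory
gauth.LoadCredentialsFile("mycreds.txt")
if gauth.credentials is None:
    # No saved credentials yet: open a browser tab for the one-time consent flow
    gauth.LocalWebserverAuth()
elif gauth.access_token_expired:
    # Credentials exist but the access token is stale: refresh silently
    gauth.Refresh()
else:
    gauth.Authorize()
gauth.SaveCredentialsFile("mycreds.txt")  # cache for future runs
```

After the first successful run, "mycreds.txt" exists, so every later run takes the silent Refresh/Authorize branch and no browser tab opens.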

Then, since our application is designed very simply, Google shows a warning for the safety of other users; we choose to continue here. After all, we know the developer.

After saying continue, we see what our application wants;

After saying continue with this, our browser gives a message like this;

This means we have completed the verification process. If, however, you see a warning like this;

We go back to https://console.cloud.google.com/, from the OAuth consent screen tab, we add ourselves as a test user.

Then, when we run our code again, we go through the above steps. When we see the text ‘The authentication flow has completed.’, we close the tab and look at PyCharm.

We see that the "mycreds.txt" file has been created. Now when we run the code again, no tab will open; the program will simply run and stop.

Yes, we will deal with the mycreds.txt file later; we have a long way to go! Now let's start building the Streamlit side.

Let’s start reading, processing and writing data with streamlit!

Designing the Streamlit Page and Skeleton

import streamlit as st

st.title("CSV Processing Application")
st.file_uploader("Select CSV File", type="csv")
st.button("Upload and Analyse File")

we get a page like this with only 4 lines of code;

our skeleton is ready, the rest of the operations;

for the user
- analysis to be carried out,
- printing the result on the screen,

for us
- keeping log records for the operations performed,
- saving the result in a text file,
- uploading the log, result, and CSV files to the designated Drive folder.

Let’s start with what we’re going to do for the user.

Analysis Processing for the User

We want to keep a log record in the background for every operation, so every function we define will also have a log section. We add a log file creation command directly into our utils file, so that a log file is created in the background whenever utils.py is called. Since we will not collect any names or information from users, and since our main goal is to track what is done anonymously, we will name the user files with the date. And since Streamlit does not let us use the logging library directly (there is no direct support as of this writing), we will use a text file as a log file and manage the write operations with our own functions.

<script src="https://gist.github.com/oguzhari/d4ea347dd4fe98d6d6abda4dfa28df8c.js"></script>

Let me unpack this code a little, because we have adapted an existing pattern to our own conditions. As you follow this article, you may need to adapt it to your own problem, and it is very important to understand what it does so you know what to change for which need. In the first part, we capture the current date as current_user. Because we identify our users by these instant dates, we take the date down to the millisecond so that there is no confusion: two users entering the system in the same second will (I'm not sure it will ever happen) still be differentiated by milliseconds.

Our first function creates a log text file for the user. It opens this file with UTF-8 encoding, so we will have no problems when we use Turkish characters. We build the log message in a separate string and then write it; we could write it directly into the file, but composing it as a separate string first helps avoid problems during writing. The \n at the end moves to the next line after the message, so the logs are kept one after another instead of side by side.

Our other function writes to our log, taking from us a prefix such as "INFO" or "ERROR" and a message. Normally, Python's logging library could do this for us very easily; because of Streamlit, we do it ourselves. Our function also records the time of each event, so we can easily follow the sequence of events and the time difference between them.
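Since the gist carries the actual implementation, here is a standalone sketch of these two pieces; the timestamp formats and the exact shape of the log line are my assumptions, not necessarily the gist's code:

```python
from datetime import datetime

# current_user doubles as an anonymous user id: the current date/time down
# to the millisecond, so two users arriving in the same second stay distinct.
current_user = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_%f")[:-3]

def save_to_log(prefix, message):
    # Append a line like "12:00:00.123 | INFO | message" to the user's log file.
    timestamp = datetime.now().strftime("%H:%M:%S.%f")[:-3]
    with open(f"log_{current_user}.txt", "a", encoding="utf-8") as f:
        f.write(f"{timestamp} | {prefix} | {message}\n")
```

Each call appends one line, so the file reads as a chronological trace of the run.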

import streamlit as st
from utils import *


st.title("CSV Processing Application")
st.file_uploader("Select CSV File", type="csv")
if st.button("Upload and Analyse File"):
    create_user_log_file()
    save_to_log('INFO', 'File upload process initiated.')

We have set up our log file this way to test how it works; later we will add another setup to catch and record problems that may occur during operation. After running our program, we press the button.

our file will look like this

When other logs are added, they will appear one after another. Everything we have done so far was outside the main purpose of the application; now let's write the code that will process the CSV file we receive from the user. The purpose of our code is simple: it will take a single-column CSV file, sum the rows, and return the total result.

def process_csv(dataframe):
    return dataframe.sum().values[0]

We did our processing in a very simple, easy way. As I said, the purpose of this article is not the processing itself, but keeping a record of the processing done.

Another trick here: we will create a list named "created_files" globally in our utils file. When we get confirmation from the user, we will upload these three files by name. For this reason, we update our process_csv function as follows.

def process_csv(dataframe):
    dataframe.to_csv(f"csv_{current_user}.csv", index=False)
    created_files.append(f"csv_{current_user}.csv")
    return dataframe.sum().values[0]
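Before wiring this into Streamlit, here is a quick standalone sanity check of process_csv with illustrative data (current_user is a placeholder string here, standing in for the timestamp id):

```python
import pandas as pd

created_files = []          # names of files to upload later
current_user = "demo_user"  # placeholder for the timestamp id used in the article

def process_csv(dataframe):
    # Save a copy of the uploaded CSV for ourselves, remember its name,
    # and return the sum of the single column.
    dataframe.to_csv(f"csv_{current_user}.csv", index=False)
    created_files.append(f"csv_{current_user}.csv")
    return dataframe.sum().values[0]

df = pd.DataFrame({"values": [1, 2, 3, 4]})
print(process_csv(df))  # 10
```

After the call, created_files holds "csv_demo_user.csv", ready for the upload step later.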

Let’s do the same with our log file creation function.

def create_user_log_file():
    # Create a log file for the user
    with open(f"log_{current_user}.txt", "w", encoding="utf-8") as f:
        created_files.append(f"log_{current_user}.txt")
        # First message
        log_message = f"{current_user} - beginning of the log file.\n\n"
        f.write(log_message)

Now, let’s test our process_csv code.

<script src="https://gist.github.com/oguzhari/dbb84a2527761d41f684dac9c7356127.js"></script>

When we run the file we have created, we should see a screen like this.

and this is what our working directory looks like.

We could have done this in process_csv, but let’s do it outside and save the result to a text file. I realise this is overkill for this scenario, but for our purposes…

def save_results(result):
    with open(f"result_{current_user}.txt", "a", encoding="utf-8") as f:
        created_files.append(f"result_{current_user}.txt")
        f.write(str(result))

The real apocalypse is just around the corner: let's run our program one last time. This time, let's tidy up our log records and come back here at the end of this article.

<script src="https://gist.github.com/oguzhari/541294374c3db817cc917306105f6f9a.js"></script>

We save a record of each step and of any errors that may occur, while also showing them to the users.

We will upload all the files we have created to a Google Drive folder. We mentioned the main difficulty here earlier and created a "mycreds.txt" file. Now, open mycreds.txt: it contains a string in JSON format. Go to https://www.convertsimple.com/convert-json-to-toml/ and convert the JSON to TOML format; just paste the contents of the file directly.

Now, we create a folder named ".streamlit" in the root directory where we work, and create a "secrets.toml" file inside this folder.

We paste the part we created from the JSON to TOML site directly into this file.
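As a rough illustration of what lands in secrets.toml (all values below are placeholders, and the real file from PyDrive has more fields), the JSON keys become top-level TOML keys:

```toml
# Converted from the mycreds.txt JSON; values here are placeholders.
access_token = "ya29.PLACEHOLDER"
client_id = "PLACEHOLDER.apps.googleusercontent.com"
client_secret = "PLACEHOLDER"
refresh_token = "1//PLACEHOLDER"
```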

When uploading our repo to GitHub, don't forget to add the .streamlit/secrets.toml file to .gitignore. Everything here will be readable by Streamlit, and we will be able to do the same setup when we deploy, so we will never share this information publicly. Our program needs a verification file to authenticate with Google Drive, and we will create this file ourselves. How? Here it is.

<script src="https://gist.github.com/oguzhari/990a1af80096aeb2dad8169443b82113.js"></script>

Thanks to this code, we create the file that verifies to our application that it is us trying to save files, without exposing any information publicly.
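The idea in the gist can be sketched like this: take the secrets mapping and write it back out as JSON. The helper name below is hypothetical, and in the real app the mapping passed in would be st.secrets:

```python
import json

def create_creds_file(secrets, path="mycreds.txt"):
    # Rebuild the credentials file from a mapping (st.secrets in the app),
    # so the repo never needs to contain mycreds.txt itself.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dict(secrets), f)
```

PyDrive can then load the resulting "mycreds.txt" exactly as it did when we created it locally.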

Yes! It's done! Finally, let's upload the files we have obtained! First, let's go to our Drive and create a folder called "test".

Now let’s write our loading code.

<script src="https://gist.github.com/oguzhari/907117bba659a5eabce9fc80f7ee2981.js"></script>

At the beginning of our code, we call the function we just wrote, which creates the "mycreds.txt" file for us in the background; then we load the credentials from that file. Next, we give the name of the folder we created as folder_name and ask the program to search for it among the existing folders. If it finds the folder, we upload the files recorded in the "created_files" list to Drive. After our test code runs, our test folder looks like this;
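In PyDrive terms, the flow the gist implements looks roughly like this (a sketch under the article's assumptions: a "test" folder on Drive and the created_files list from utils.py; the query string follows the Drive v2 API that PyDrive wraps):

```python
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LoadCredentialsFile("mycreds.txt")  # created in the background from st.secrets
drive = GoogleDrive(gauth)

folder_name = "test"
created_files = ["csv_demo.csv"]  # in the app: the global list from utils.py

# Search the existing folders for one whose title matches folder_name
folders = drive.ListFile(
    {'q': f"title = '{folder_name}' and "
          "mimeType = 'application/vnd.google-apps.folder' and trashed = false"}
).GetList()

if folders:
    folder_id = folders[0]['id']
    for name in created_files:
        # Create each Drive file inside the folder and push the local content
        gfile = drive.CreateFile({'title': name, 'parents': [{'id': folder_id}]})
        gfile.SetContentFile(name)
        gfile.Upload()
```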

And it's done! Our program does the analysis independently of us, records what happens during the analysis, and uploads everything to the folder we want!

I know the article is a bit long, but I discovered all of these processes myself, and I know there are other ways that are many times more efficient. However, if you are a beginner programmer like me and this work meets your needs, I am happy!

Thank you for reading this far! see you in another article 🥳

After finishing the article, I realised that I didn't explain how to share the app via Streamlit. Let's fix that too! We go to streamlit.io and register by filling in the necessary information. Then, if we have registered for the first time, we close the streamlit-example page that opens.

We continue by saying new app,

Since the application is connected to my github, it directly saw the files I worked with.

From here, we select the Advanced Settings option. Here we add the contents of our secrets.toml file.

The st.secrets object we use in our application will pull all the information from here. Then we close the dialog by saying Save, and deploy. The most common error that may occur during deployment is the "xx module not found" error. To prevent this, we create a "requirements.txt" file in our project root directory and write all the libraries we use, line by line. That's it for now!
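For this project, requirements.txt would contain something along these lines (package names as installed from PyPI; pin versions if you want reproducible deploys):

```
streamlit
pandas
PyDrive
```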
