ship.jpg

How to Share your Python Objects Across Different Environments in One Line of Code

Khuyen Tran

Khuyen Tran

Frustrated with Setting up the Environments to Share your Analysis with Others? Here’s how you can Make it Easier

Motivation

Have you ever wanted to share findings between you and your teammates so that everybody could work on the most updated results? There are three difficulties you might encounter:

  • Your teammates may deploy a different environment from your script
  • The model that is trained by your teammates changes daily. So they need to reshare the model with you every day.
  • The files or models are large. Re-uploading them each time you deploy your script is cumbersome.‌

Wouldn’t it be easier if all you need to do is something like this to share the Dataframe like this

  1. import datapane as dp
  2. df = dp.Blob.get('profile', owner='khuyentran1401).download_df()
  3. df

or something like this to share your model?

  1. import datapane as dp
  2. predictor = dp.Blob.get(name='predictor',owner='khuyentran1401').download_obj()
  3. X_TEST = [[10, 20, 30]]
  4. outcome = predictor.predict(X=X_TEST)

That is when we need Datapane’s Blob API


What is Datapane’s Blob?

Apart from APIs for creating reports and deploying your Notebook (shown in this other blog post… etc), Datapane also provides APIs for other common use-cases such as sharing blobs and managing secrets.

To illustrate the use of Blob, I use an example of a machine learning model from this article. Imagine you are training a model like this with good performance on the test set.

  1. from random import randint
  2. from sklearn.linear_model import LinearRegression
  3. import datapane as dp
  4. TRAIN_SET_LIMIT = 1000
  5. TRAIN_SET_COUNT = 100
  6. TRAIN_INPUT = list()
  7. TRAIN_OUTPUT = list()
  8. for i in range(TRAIN_SET_COUNT):
  9. a = randint(0, TRAIN_SET_LIMIT)
  10. b = randint(0, TRAIN_SET_LIMIT)
  11. c = randint(0, TRAIN_SET_LIMIT)
  12. op = a + (2*b) + (3*c)
  13. TRAIN_INPUT.append([a, b, c])
  14. TRAIN_OUTPUT.append(op)
  15. predictor = LinearRegression(n_jobs=-1)
  16. predictor.fit(X=TRAIN_INPUT, y=TRAIN_OUTPUT)

You want to send this model over to your teammate so that they can use your model to predict the new dataset. How can you do this?

You could set up a docker environment and send it over to your teammates. But if they just want to quickly test your model without the need of setting up the environment, datapane.Blob will be a better option.

Make sure to have the visibily='PUBLIC' if you want others to access it. Now others can use your blob for their code!

  1. import datapane as dp
  2. predictor = dp.Blob.get(name='predictor', owner='khuyentran1401').download_obj()
  3. X_TEST = [[10, 20, 30]]
  4. outcome = predictor.predict(X=X_TEST)
  5. coefficients = predictor.coef_
  6. print('Outcome : {}\nCoefficients : {}'.format(outcome, coefficients))
  1. Outcome : [140.]
  2. Coefficients : [1. 2. 3.]`

Try to run the same code to see if you can access and use the predictor! Before copy and paste the code, make sure to sign up on Datapane to get your token beforehand. Then login on the terminal with your token
datapane login — server=https://datapane.com/ — token=yourtoken

Now try

  1. dp.Blob.get(name='predictor', owner='khuyentran1401').download_obj()

to see if you are able to access and use the predictor and produce the same results as above!

If you want to share your blob privately in your organization, follow the same process, but set the visibility of your blob to ORG


What else can I do with Blob?

Besides uploading a Python object, you can also use blob to upload a Pandas Dataframe, a file.

  1. import datapane as dp
  2. # Upload a DataFrame
  3. b = dp.Blob.upload_df(df, name='my_df')
  4. # Upload a file
  5. b = dp.Blob.upload_file("~/my_dataset.csv", name='my_ds')
  6. # Upload an object
  7. b = dp.Blob.upload_obj([1,2,3], name='my_list')

And to get a Dataframe, file or object from your blob

  1. import datapane as dp
  2. # Download a DataFrame
  3. blob = dp.Blob.get(name="blob_id")
  4. # Download a DataFrame
  5. b = blob.download_df()
  6. # Download a file
  7. b = blob.download_file("~/my_dataset.csv")
  8. # Download an object
  9. b = blob.download_obj()

For example, in this article, I wrote about my findings after scraping more than 1k Github profiles.

To let you access to my dataset, all I need to do is to give you the name of the dataset and my Datapane’s account. Now you are ready to access it!

  1. import datapane as dp
  2. df = dp.Blob.get('profile', owner='khuyentran1401').download_df()
  3. df

Of course, to let you access to my data, I need to set the visibility of the blob to ‘PUBLIC’


Variable

Variables like database keys, passwords, or tokens should not be embedded in our code, especially when your script is visible to the outside world. Datapane’s Variable object provides a safe and secure way to create, store, and retrieve values that your scripts require.

For example, in this script, I used dp.Variable to store my GitHub token and my username so that when I share my code on Github, I don’t need to be worried about my token is exposed.

  1. file_name = 'urls.json'
  2. token = dp.Variable.get(name='github_token').value
  3. username = dp.Variable.get(name='authenticated_user').value
  4. g = Github(token)
  5. user_info = GithubUserScrape(file_name, username, token, scrape_owner=False, continue_scraping=True)
  6. user_info.scrape()

This variable is totally secret to me unless I specify it to be accessible by everyone with visibility='PUBLIC'.

If another person with a different account wants to access to my github_token variable and specify me as the owner, they will get an error

  1. requests.exceptions.HTTPError: 403 Client Error: Forbidden for url

because I am having my variable in private mode.

Another benefit: If I happen to reuse this variable in another script, I could easily access this token again without remembering what it is!


Conclusion

Congratulations! You have just learned how to use Datapane’sBlob and Variable to share your Dataframe, file, Python object, and variable across different scripts or different environments.

Your teammates would be happy if all they need to do to see your results is one line of code without the hassle of setting up the environment. And you would be happy when your teammates could access the same data as you do.

Need to share Python analyses?

Datapane is an API and framework which makes it easy for people analysing data in Python to publish interactive reports and deploy their analyses.