Uploading Data¶
Streaming Multipart Data Encoder¶
Requests has support for multipart uploads, but its API makes it difficult or impossible to build exactly the multipart upload you want. Additionally, when using Requests' multipart upload functionality, all the data must be read into memory before being sent to the server. In extreme cases, this can make it impossible to send a file as part of a multipart/form-data upload.
The toolbelt contains a class that allows you to build multipart request bodies in exactly the format you need, without reading files into memory. For example:
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
m = MultipartEncoder(
fields={'field0': 'value', 'field1': 'value',
'field2': ('filename', open('file.py', 'rb'), 'text/plain')}
)
r = requests.post('http://httpbin.org/post', data=m,
headers={'Content-Type': m.content_type})
The MultipartEncoder also has a .to_string() convenience method. This method renders the multipart body into a string, which is useful while developing your code: it lets you confirm that the multipart body has the form you expect before you send it on.
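For instance, here is a minimal sketch (the field names are illustrative):

from requests_toolbelt.multipart.encoder import MultipartEncoder

m = MultipartEncoder(fields={'field0': 'value0', 'field1': 'value1'})
print(m.to_string())  # prints the rendered body, boundary delimiters included

Note that rendering the body consumes the encoder's data, so if your fields include open file objects you will likely want to build a fresh encoder for the actual request.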
The toolbelt also provides a way to monitor your streaming uploads with the MultipartEncoderMonitor.
class requests_toolbelt.multipart.encoder.MultipartEncoder(fields, boundary=None, encoding='utf-8')¶
The MultipartEncoder object is a generic interface to the engine that will create a multipart/form-data body for you.
The basic usage is:
import requests
from requests_toolbelt import MultipartEncoder

encoder = MultipartEncoder({'field': 'value', 'other_field': 'other_value'})
r = requests.post('https://httpbin.org/post', data=encoder,
                  headers={'Content-Type': encoder.content_type})
If you do not need to take advantage of streaming the post body, you can also do:
r = requests.post('https://httpbin.org/post', data=encoder.to_string(),
                  headers={'Content-Type': encoder.content_type})
If you want the encoder to use a specific order, you can use an OrderedDict or, more simply, a list of tuples:
encoder = MultipartEncoder([('field', 'value'), ('other_field', 'other_value')])
Changed in version 0.4.0.
You can also provide tuples as part values, as you would provide them to requests' files parameter.

encoder = MultipartEncoder({
    'field': ('file_name', b'{"a": "b"}', 'application/json',
              {'X-My-Header': 'my-value'})
})
Warning
This object will end up directly in httplib. Currently, httplib has a hard-coded read size of 8192 bytes. This means that it will loop until the file has been read, and your upload could take a while. This is not a bug in requests. A feature is being considered for this object that would allow you, the user, to specify what size should be returned on a read. If you have opinions on this, please weigh in on this issue.
Monitoring Your Streaming Multipart Upload¶
If you need to stream your multipart/form-data upload, then you're probably in a situation where the upload might take a while. In these cases, it might make sense to monitor the progress of the upload. For this reason, the toolbelt provides the MultipartEncoderMonitor. The monitor wraps an instance of a MultipartEncoder and is used exactly like the encoder. It provides a similar API with some additions:
- The monitor accepts a function as a callback. The function is called every time requests calls read on the monitor, and it is passed the monitor as an argument.
- The monitor tracks how many bytes have been read in the course of the upload.
You might use the monitor to create a progress bar for the upload, for example with clint.
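Here is a minimal sketch, assuming the clint package is installed; the file and field names are illustrative:

import requests
from clint.textui.progress import Bar
from requests_toolbelt.multipart.encoder import (MultipartEncoder,
                                                 MultipartEncoderMonitor)

encoder = MultipartEncoder(
    fields={'field2': ('filename', open('file.py', 'rb'), 'text/plain')}
)
# expected_size is the total length of the body the encoder will produce
bar = Bar(expected_size=encoder.len, filled_char='=')

def progress_callback(monitor):
    # bytes_read is the running total of bytes requests has read so far
    bar.show(monitor.bytes_read)

monitor = MultipartEncoderMonitor(encoder, progress_callback)
r = requests.post('http://httpbin.org/post', data=monitor,
                  headers={'Content-Type': monitor.content_type})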
More generally, to use the monitor you would follow a pattern like this:
import requests
from requests_toolbelt.multipart import encoder
def my_callback(monitor):
# Your callback function
pass
e = encoder.MultipartEncoder(
fields={'field0': 'value', 'field1': 'value',
'field2': ('filename', open('file.py', 'rb'), 'text/plain')}
)
m = encoder.MultipartEncoderMonitor(e, my_callback)
r = requests.post('http://httpbin.org/post', data=m,
headers={'Content-Type': m.content_type})
If you have a very simple use case you can also do:
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoderMonitor
def my_callback(monitor):
# Your callback function
pass
m = MultipartEncoderMonitor.from_fields(
fields={'field0': 'value', 'field1': 'value',
'field2': ('filename', open('file.py', 'rb'), 'text/plain')},
callback=my_callback
)
r = requests.post('http://httpbin.org/post', data=m,
headers={'Content-Type': m.content_type})
class requests_toolbelt.multipart.encoder.MultipartEncoderMonitor(encoder, callback=None)¶
An object used to monitor the progress of a MultipartEncoder.
The MultipartEncoder should only be responsible for preparing and streaming the data. Anyone who wishes to monitor an upload should not use that same instance to manage the monitoring as well. Using this class, you can monitor an encoder and register a callback. The callback receives the instance of the monitor.
To use this monitor, you construct your MultipartEncoder as you normally would.

import requests
from requests_toolbelt import (MultipartEncoder,
                               MultipartEncoderMonitor)

def callback(monitor):
    # Do something with this information
    pass

m = MultipartEncoder(fields={'field0': 'value0'})
monitor = MultipartEncoderMonitor(m, callback)
headers = {'Content-Type': monitor.content_type}
r = requests.post('https://httpbin.org/post', data=monitor,
                  headers=headers)
Alternatively, if your use case is very simple, you can use the following pattern.
import requests
from requests_toolbelt import MultipartEncoderMonitor

def callback(monitor):
    # Do something with this information
    pass

monitor = MultipartEncoderMonitor.from_fields(
    fields={'field0': 'value0'}, callback=callback
)
headers = {'Content-Type': monitor.content_type}
r = requests.post('https://httpbin.org/post', data=monitor,
                  headers=headers)
Streaming Data from a Generator¶
There are cases where you, the user, have a generator of some large quantity of data and you already know its size. If you pass the generator to requests via the data parameter, requests will assume that you want to upload the data in chunks and will set a Transfer-Encoding header value of chunked. Oftentimes, this causes the server to behave poorly. If you want to avoid this, you can use the StreamingIterator. You pass it the size of the data and the generator.
import requests
from requests_toolbelt.streaming_iterator import StreamingIterator

generator = some_function()        # Create your generator
size = some_function_size()        # Get your generator's size
content_type = get_content_type()  # Get the content-type of the data

streamer = StreamingIterator(size, generator)
r = requests.post('https://httpbin.org/post', data=streamer,
                  headers={'Content-Type': content_type})
The streamer will handle your generator for you and buffer the data before passing it to requests.
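As a concrete sketch, here is one way to provide both the size and the generator for an in-memory payload; the chunks helper, chunk size, and content type are all illustrative:

import requests
from requests_toolbelt.streaming_iterator import StreamingIterator

payload = b'x' * (1024 * 1024)  # some large bytes object whose size we know

def chunks(data, chunk_size=8192):
    # Yield successive chunk_size-byte slices of the data
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

streamer = StreamingIterator(len(payload), chunks(payload))
r = requests.post('https://httpbin.org/post', data=streamer,
                  headers={'Content-Type': 'application/octet-stream'})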
Changed in version 0.4.0: File-like objects can be passed instead of a generator.
If, for example, you need to upload data being piped in on standard input, you might otherwise do:
import requests
import sys
r = requests.post(url, data=sys.stdin)
This would stream the data, but it would use a chunked transfer-encoding. If, instead, you know the length of the data being sent to stdin and you want to prevent the data from being uploaded in chunks, you can use the StreamingIterator to stream the contents of the file without relying on chunking.
import sys

import requests
from requests_toolbelt.streaming_iterator import StreamingIterator

# size: the known length of the data arriving on stdin
# content_type: the media type of that data
stream = StreamingIterator(size, sys.stdin)
r = requests.post(url, data=stream,
                  headers={'Content-Type': content_type})
class requests_toolbelt.streaming_iterator.StreamingIterator(size, iterator, encoding='utf-8')¶
This class provides a way of allowing iterators with a known size to be streamed instead of chunked.
In requests, if you pass in an iterator, it assumes you want to use chunked transfer-encoding to upload the data, which not all servers support well. You might want to set the Content-Length yourself to avoid this, but that will not work. The only way to preempt requests using a chunked transfer-encoding, and force it to stream the upload instead, is to mimic a very specific interface. Instead of having to know those details, you can simply use this class. You provide the size and the iterator, and pass the StreamingIterator instance to requests via the data parameter, like so:
import requests
from requests_toolbelt import StreamingIterator

# Let iterator be some generator that you already have and size be
# the size of the data produced by the iterator
r = requests.post(url, data=StreamingIterator(size, iterator))
You can also pass file-like objects to StreamingIterator in case requests can't determine the file size itself. This is the case with streaming file objects like stdin or any socket. Wrapping files that are on disk with StreamingIterator is unnecessary, because requests can determine the file size itself.
Naturally, you should also set the Content-Type of your upload appropriately, because the toolbelt will not attempt to guess that for you.