Skip to main content

3.2 Session Descriptions and State Machine

3.2 Session Descriptions and State Machine

In order to establish the media plane, the JSEP implementation needs specific parameters to indicate what to transmit to the remote side, as well as how to handle the media that is received. These parameters are determined by the exchange of session descriptions in offers and answers, and there are certain details to this process that must be handled in the JSEP APIs.

Whether a session description applies to the local side or the remote side affects the meaning of that description. For example, the list of codecs sent to a remote party indicates what the local side is willing to receive, which, when intersected with the set of codecs the remote side supports, specifies what the remote side should send. However, not all parameters follow this rule; some parameters are declarative, and the remote side must either accept them or reject them altogether. An example of such a parameter is the TLS fingerprints [RFC8122] as used in the context of DTLS [RFC6347]; these fingerprints are calculated based on the local certificate(s) offered and are not subject to negotiation.

In addition, various RFCs put different conditions on the format of offers versus answers. For example, an offer may propose an arbitrary number of "m=" sections (i.e., media descriptions as described in [RFC4566], Section 5.14), but an answer must contain the exact same number as the offer.

Lastly, while the exact media parameters are known only after an offer and an answer have been exchanged, the offerer may receive ICE checks, and possibly media (e.g., in the case of a re-offer after a connection has been established) before it receives an answer. To properly process incoming media in this case, the offerer's media handler must be aware of the details of the offer before the answer arrives.

Therefore, in order to handle session descriptions properly, the JSEP implementation needs:

  1. To know if a session description pertains to the local or remote side.

  2. To know if a session description is an offer or an answer.

  3. To allow the offer to be specified independently of the answer.

JSEP addresses this by adding both setLocalDescription and setRemoteDescription methods and having session description objects contain a type field indicating the type of session description being supplied. This satisfies the requirements listed above for both the offerer, who first calls setLocalDescription(sdp [offer]) and then later setRemoteDescription(sdp [answer]), and the answerer, who first calls setRemoteDescription(sdp [offer]) and then later setLocalDescription(sdp [answer]).

During the offer/answer exchange, the outstanding offer is considered to be "pending" at the offerer and the answerer, as it may be either accepted or rejected. If this is a re-offer, each side will also have "current" local and remote descriptions, which reflect the result of the last offer/answer exchange. Sections 4.1.14, 4.1.16, 4.1.13, and 4.1.15 provide more detail on pending and current descriptions.

JSEP also allows for an answer to be treated as provisional by the application. Provisional answers provide a way for an answerer to communicate initial session parameters back to the offerer, in order to allow the session to begin, while allowing a final answer to be specified later. This concept of a final answer is important to the offer/answer model; when such an answer is received, any extra resources allocated by the caller can be released, now that the exact session configuration is known. These "resources" can include things like extra ICE components, Traversal Using Relays around NAT (TURN) candidates, or video decoders. Provisional answers, on the other hand, do no such deallocation; as a result, multiple dissimilar provisional answers, with their own codec choices, transport parameters, etc., can be received and applied during call setup. Note that the final answer itself may be different than any received provisional answers.

In [RFC3264], the constraint at the signaling level is that only one offer can be outstanding for a given session, but at the JSEP level, a new offer can be generated at any point. For example, when using SIP for signaling, if one offer is sent and is then canceled using a SIP CANCEL, another offer can be generated even though no answer was received for the first offer. To support this, the JSEP media layer can provide an offer via the createOffer method whenever the JavaScript application needs one for the signaling. The answerer can send back zero or more provisional answers and then finally end the offer/answer exchange by sending a final answer. The state machine for this is as follows:

                    setRemote(OFFER)               setLocal(PRANSWER)
/-----\ /-----\
| | | |
v | v |
+---------------+ | +---------------+ |
| |----/ | |----/
| have- | setLocal(PRANSWER) | have- |
| remote-offer |------------------- >| local-pranswer|
| | | |
| | | |
+---------------+ +---------------+
^ | |
| | setLocal(ANSWER) |
setRemote(OFFER) | |
| V setLocal(ANSWER) |
+---------------+ |
| | |
| |<---------------------------+
| stable |
| |<---------------------------+
| | |
+---------------+ setRemote(ANSWER) |
^ | |
| | setLocal(OFFER) |
setRemote(ANSWER) | |
| V |
+---------------+ +---------------+
| | | |
| have- | setRemote(PRANSWER) |have- |
| local-offer |------------------- >|remote-pranswer|
| | | |
| |----\ | |----\
+---------------+ | +---------------+ |
^ | ^ |
| | | |
\-----/ \-----/
setLocal(OFFER) setRemote(PRANSWER)

Figure 2: JSEP State Machine

Aside from these state transitions, there is no other difference between the handling of provisional ("pranswer") and final ("answer") answers.