August 2004
All interactive programs provide two basic functions: obtaining user input and displaying the results. Web applications implement this behavior using two HTTP methods: POST and GET, respectively. This simple protocol breaks when an application returns a web page in response to a POST request. Peculiarities of the POST method, combined with the idiosyncrasies of different browsers, often lead to an unpleasant user experience and may leave the server application in an incorrect state. This article shows how to design a well-behaved web application using redirection.
Double Submit problem
The two most frequently used HTTP request methods are GET and POST. The GET method retrieves a resource from a web server. The resource is identified by a base location and optional query parameters. Generally, the parameters of a GET request are used to narrow the result and do not change server state. The same GET request can be sent to the server as many times as needed.
By contrast, the parameters of a POST request usually contain input data, which can change the state of the server application. The same data submitted twice may produce unwanted results, such as a double withdrawal from a bank account or two identical items stored in the shopping cart of an online store. Submitting the same data more than once in a POST request is undesirable and has its own name: the Double Submit problem.
Take the standard use case of an HTML FORM submitted to the server. The form data is processed and stored in the database, then the server replies with a page containing the results of the operation.
In this use case the same POST request can be resubmitted in three ways:
- reloading the result page using the Refresh/Reload browser button (explicit page reload, implicit resubmit of the request);
- clicking the Back and then Forward browser buttons (implicit page reload and implicit resubmit of the request);
- returning to the HTML FORM after submission and clicking the Submit button on the form again (explicit resubmit of the request).
Considering the importance of POST data, browsers display a warning when the same POST request is about to be resent to the server. But the message is too technical and obscure for an average user. Also, some browsers do not ask for confirmation at all. Because of that, many web sites show their own warnings. How often do you see messages like "Please do not click the Back button or refresh this page" after you have made an online payment?
The warning messages and confirmation dialogs clutter the interface and make users nervous and uneasy, always afraid of making a mistake. If a web site relies on browser warnings but does not really check for double submits, the server database may become incorrect, and users would lose confidence in internet transactions.
Is it possible to get rid of the irritating warnings? Yes. The HTML FORM submission method can be changed from POST to GET. Browsers are not required to ask for confirmation when a GET request is resubmitted, so this change makes the user interface friendlier. But this "solution" breaks the semantics of the GET method. It does not prevent resubmitting, it just hides the problem from the user.
The PRG pattern
The answer to the double submit problem is redirection. This is a known technique, but it has not yet become a standard for "after-POST" results. As far as I know, it does not have a well-known name. I suggest calling it the PRG pattern, for POST-REDIRECT-GET.
The PRG pattern splits one request into two. Instead of returning a result page immediately in response to a POST request, the server responds with a redirect to the result page. The browser loads the result page as if it were a separate resource. After all, there are two different tasks to be done. The first is to POST input data to the server. The second is to GET output to the client.
This approach provides a clean Model-View-Controller solution. All input data is stored, permanently or temporarily, in the Model on the server during the first step. The second step loads a View reflecting the current Model state. When a user tries to refresh the result page, the browser resends an "empty" GET request to the server. This request does not contain any input data and does not change server state. It only loads the View again. If the server state has not been changed by other processes or users, the server responds with the same page as before the refresh.
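The two steps can be sketched in plain Java without any web framework. This is a minimal illustration, not a servlet or Struts implementation: PrgResponse, PrgController and the paths /storeItem and /displayStored are hypothetical names invented for this example, and the Model is just an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal stand-in for an HTTP response (hypothetical, for illustration only). */
class PrgResponse {
    final int status;       // HTTP status code
    final String location;  // Location header, set only for redirects
    final String body;      // page content, set only for rendered pages

    PrgResponse(int status, String location, String body) {
        this.status = status;
        this.location = location;
        this.body = body;
    }
}

/** Hypothetical controller illustrating POST-REDIRECT-GET. */
class PrgController {
    // The Model: stored items keyed by a generated ID.
    private final Map<String, String> model = new HashMap<>();
    private int nextId = 1;

    PrgResponse handle(String method, String path, Map<String, String> params) {
        if ("POST".equals(method) && "/storeItem".equals(path)) {
            // Step 1: apply the input to the Model, then answer with a redirect, not a page.
            String id = String.valueOf(nextId++);
            model.put(id, params.get("value"));
            return new PrgResponse(303, "/displayStored?id=" + id, null);
        }
        if ("GET".equals(method) && "/displayStored".equals(path)) {
            // Step 2: render a View of the current Model state; no input data is involved.
            String value = model.get(params.get("id"));
            if (value == null) {
                return new PrgResponse(404, null, "Item not found");
            }
            return new PrgResponse(200, null, "Stored item: " + value);
        }
        return new PrgResponse(404, null, "Unknown resource");
    }

    int itemCount() {
        return model.size();
    }
}
```

Refreshing the result page repeats only the GET branch, so the Model is read but never modified by the refresh.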
Loading resources using the GET method is the cornerstone of the suggested approach. A page loaded with a GET request can be refreshed safely and transparently. Safely, because no input data is sent to the server. Transparently, because the browser does not show a warning message. The vehicle that makes the transition from POST to GET possible is redirection.
With this technique the user experience improves tremendously. No more scary messages with hard-to-decipher warnings. No trepidation about clicking the Back, Forward or Refresh buttons. No fear of damaging server data. The Refresh button reloads the result page with a simple GET request. The Back button returns the user to the page with the form. A subsequent click on the Forward button reloads the result page using GET again. Absolute freedom of browsing.
But wait, what about clicking the Submit button after returning to the form page? The form would be resubmitted, wouldn't it? So is all the trouble with redirection just to prevent an inadvertent resubmit caused by a page refresh?
Keep View Alive
Browsers did not always cache web pages. In the stone ages they were simple. Given the same address, they pulled the same resource from the server again and again. Modern browsers are more intelligent. Based on different factors they try to determine whether a page should be reloaded from the server or not. If not, they can retrieve the page from a cache. For those still using a dialup connection this is an instant saving in both traffic and time.
But the convenience of caching affects standard behavior. Here is the question: what would a user see if, after submitting a form, he clicks the Back browser button? Did you say that he would see the same form he just submitted, with the same values filled in? Why? Because the browser saved the page in the cache in case it would be needed again?
Well, forget smart browsers and caching. How is this for you: each window or page in an interactive application is a View representing the application Model. In order for the View to be correct and consistent with the Model, it must be rendered anew each time it is presented to the user.
In plain English: caching must be prohibited for web applications. Online books, dictionaries and pictures can be cached. But please, dear browser, do not save snapshots of a live program, because they may no longer represent the actual Model state. It is bad enough if a saved View is merely looked at, but it is tenfold worse when a stale View is used to modify the Model.
Now I ask the same question again: what would a user see if he clicks the Back button after submitting a form? You know the correct answer already: the user of a well-designed web application would see a View which represents the current Model state. This View would be presented in such a way that resubmitting the same data is impossible.
New trick for old FORM
Let us take a closer look at the standard use case of an HTML FORM and a result page. The form can be used to edit an existing business entity or to create a new one. After the form is submitted, its data is stored in the database and the result of the operation is displayed.
According to the PRG pattern, the result page must not be returned in response to the POST request, because an attempt to reload it would cause the double submit problem. Instead, the browser must load the result page separately, using the GET method.
We can define the following processing modules (actions in Struts-speak) for this use case: Create Item, Display Item, Store Item and Display Stored.
These modules are combined in input/output pairs:
- Create Item/Display Item - creates a new empty item, then shows the new item using the Item Page HTML form and allows the user to enter the item value;
- Store Item/Display Stored - stores the item, then shows the persisted item from the database in read-only mode on the Stored Result page;
- Store Item/Display Item - if the item cannot be stored, shows the invalid item along with errors using the same HTML form;
- Display Item is used separately to show and update an item which already exists in the database.
(1) Create Item is called from a link on some other web page when a new object should be created. This action constructs an empty business object and stores it in a temporary area called Current Items, which itself can be kept in the database or in the session; then it redirects to Display Item.
(2-1) Display Item loads the constructed business object from the Current Items and shows it on the Item Page, which is an HTML form. The form can be refreshed at any time; the browser would just ask the Display Item action to obtain and show the business object again.
(2-2) The user fills in the object value and submits the HTML form to the Store Item action. If the object is not accepted, it is kept in the Current Items area and the server redirects back to the Display Item action, which reads the invalid object along with error messages from the Current Items and redisplays it in the form. If the Item Page needs to be refreshed, it loads the same object from the Current Items again.
(3) If the object is accepted by the Store Item action, it is persisted in the database and removed from the temporary area. After that the browser is redirected to the Display Stored action, which shows the Stored Result page. It can display the object which was just persisted. The result page can be safely refreshed; it would load the object from the database again.
If a user clicks the Back button on the result page (3) after successfully creating and storing a new object, he returns to the Display Item action (2). The temporary object has already been removed from the temporary area. Display Item has nothing to show and displays an error page instead of the item form, notifying the user that the object cannot be shown simply because it does not exist anymore. Thus the user cannot resubmit the object.
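The flow of steps (1) to (3) can be sketched as follows. This is an illustration under assumptions, not Struts code: ItemFlow and its method names are hypothetical, the Current Items area and the database are plain maps, the redirect targets are returned as strings, and the temporary object is keyed by a generated ID for convenience.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the Create Item / Display Item / Store Item actions. */
class ItemFlow {
    private final Map<String, String> currentItems = new HashMap<>();  // temporary area
    private final Map<String, String> currentErrors = new HashMap<>(); // errors kept with the temporary object
    private final Map<String, String> database = new HashMap<>();      // persistent storage stand-in
    private int nextId = 1;

    /** (1) Create Item: builds an empty object and returns the redirect target. */
    String createItem() {
        String id = String.valueOf(nextId++);
        currentItems.put(id, "");
        return "/displayItem?id=" + id;
    }

    /** (2) Display Item: renders the form, or an error page if the object is gone. */
    String displayItem(String id) {
        if (!currentItems.containsKey(id)) {
            return "ERROR: object does not exist anymore";
        }
        String error = currentErrors.get(id);
        return "FORM value=[" + currentItems.get(id) + "]"
                + (error == null ? "" : " error=[" + error + "]");
    }

    /** Store Item: always answers with a redirect target, never with a page. */
    String storeItem(String id, String value) {
        if (!currentItems.containsKey(id)) {
            // Stale form: nothing to store, Display Item will show the error page.
            return "/displayItem?id=" + id;
        }
        if (value == null || value.trim().isEmpty()) {
            // Rejected: keep the invalid object and its error in the temporary area.
            currentItems.put(id, value == null ? "" : value);
            currentErrors.put(id, "value is required");
            return "/displayItem?id=" + id;
        }
        // (3) Accepted: persist, then remove from the temporary area.
        database.put(id, value);
        currentItems.remove(id);
        currentErrors.remove(id);
        return "/displayStored?id=" + id;
    }

    int storedCount() {
        return database.size();
    }
}
```

After a successful store, a second call to storeItem with the same ID finds nothing in the temporary area, so the database is left untouched.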
A similar situation should occur if, during creation of a new object, the user leaves the form page (2) for the page preceding it (1). For the application this means that the user decided to discard the new object. The new object is removed from the Current Items. When the user clicks the Forward button and returns to Display Item (2), he would see an "Object not found" error.
Instead of displaying an error page when an object is gone from the temporary area, we can do better. Create Item generates a unique object ID and redirects to Display Item with the object ID as a request parameter. The Display Item action reads the object from the session and compares its ID with the one passed in the request, then shows the object to the user. After the user has entered the object value and submitted it to Store Item, the object ID becomes the primary key of the object.
Now, when the user returns from the result page (3) after submitting an object, the browser invokes the Display Item action with the same request, which contains the object ID (2). The object was removed from the session, but it was stored in the database. The Display Item action reads the object from the database, copies it to the Current Items and shows it to the user. Depending on business rules, this object may become read-only, so the form would change to a simple page showing the object content but not allowing it to be submitted again. Or, conversely, the form would allow the user to edit it and submit changes. In the latter case the title of the form would change from "Create New Object" to "Edit Existing Object". If the user submits this object, it is not considered a double submit. It is an intentional update of an existing object.
Editing an existing object is simple; this case is basically already covered. We just need to make sure that Display Item makes no distinction between a new and an existing object. Display Item takes the object ID as a request parameter, then looks up the business object in the Current Items first. If the object is not found there, it is looked up in the database, copied to the temporary area and then displayed (2). After the object is updated and submitted, it is stored in the database. When the user clicks the Back button on the result page (3), the item form reloads the object from the database again (2) so it can be modified and submitted once more. Is this a double submit? No. It is a deliberate modification of the same object by the user. Of course, you can create all the business rules you want, for example prohibiting modification of the same object within a certain timeframe.
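The lookup order used by Display Item can be sketched like this. ItemLookup is a hypothetical helper written for this example, with both storage areas modeled as maps; the form titles are the ones mentioned in the text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup used by Display Item: temporary area first, then the database. */
class ItemLookup {
    final Map<String, String> currentItems = new HashMap<>(); // temporary area (session stand-in)
    final Map<String, String> database = new HashMap<>();     // persistent storage stand-in

    /** Returns the object value to display, or null if the object exists nowhere. */
    String load(String id) {
        String item = currentItems.get(id);
        if (item == null) {
            item = database.get(id);
            if (item != null) {
                // Copy the persisted object to the temporary area before showing it.
                currentItems.put(id, item);
            }
        }
        return item;
    }

    /** The form title depends only on whether the object has already been persisted. */
    String formTitle(String id) {
        return database.containsKey(id) ? "Edit Existing Object" : "Create New Object";
    }
}
```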
Let us complete the use case and take a look at how the object is deleted.
Deletion is simple. The Delete Item action (4) receives the object ID as a request parameter, deletes the object in the Model and redirects to the result page (5), which verifies with the database that the object no longer exists. The result page can be safely refreshed without producing another delete request and without warning messages. When the Back button is clicked on the result page (5), the browser returns to the page which invokes the Delete Item action (4). If this action is called again with the same object ID, it is applied to the Model, which raises an "object not found" exception, and an error page is shown. Again, this is not a double submit; it is an explicit attempt to delete the same object again. Big deal, it was already deleted.
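A minimal sketch of the delete case, assuming a set of IDs as the database stand-in. DeleteFlow and the paths /deleteResult and /error are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the Delete Item action (4) and its result page (5). */
class DeleteFlow {
    private final Set<String> database = new HashSet<>(); // IDs of existing objects

    void add(String id) {
        database.add(id);
    }

    /** (4) Delete Item: applies the delete to the Model and returns a redirect target. */
    String deleteItem(String id) {
        if (!database.remove(id)) {
            // The Model reports "object not found": redirect to an error page.
            return "/error?reason=notFound&id=" + id;
        }
        return "/deleteResult?id=" + id;
    }

    /** (5) Result page: only reads the Model, so it can be refreshed safely. */
    String deleteResult(String id) {
        return database.contains(id)
                ? "Object " + id + " still exists"
                : "Object " + id + " has been deleted";
    }
}
```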
I think you got the idea. Just another quick example: an online store.
Storing several identical items in the shopping basket is not a problem while a user is still shopping. It is enough to show the basket content and the quantity of each item. What is really important is to ensure that the payment is processed only once. It may look something like this:
- A shopping basket is created, the unique basket ID is assigned to the basket.
- If a user clicks on Back button after adding an item to the basket, browser reloads up-to-date basket information from the server and shows to the user that the item is already in the basket. It is up to the user to add another identical item.
- When the basket is submitted, its content is sent to a purchasing subsystem; the basket is invalidated and removed from the application context, and its transaction number is saved in a history table if needed.
- When a user clicks the Back button after the purchase was made, the browser attempts to load the basket and fails because the basket, its ID and its content have already been destroyed. The browser shows an error message instead of the basket. Submitting the same basket twice is impossible.
- With a caching browser or proxy, a user who clicks the Back button would see the same basket which was already submitted to the purchasing subsystem. The user's attempt to resubmit the basket would fail because the basket tracking ID has already been destroyed along with the basket itself. As a courtesy to users of caching browsers, the server can reply with an error stating that the submitted basket no longer exists.
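The basket life cycle above can be sketched as follows. BasketStore is a hypothetical class for this example; the purchasing subsystem is reduced to a list of processed purchases, and basket IDs are random UUIDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical basket registry: a basket can be submitted for purchase only once. */
class BasketStore {
    private final Map<String, List<String>> baskets = new HashMap<>();
    private final List<String> purchases = new ArrayList<>(); // stand-in for the purchasing subsystem

    /** Creates a basket and returns its unique ID. */
    synchronized String createBasket() {
        String id = UUID.randomUUID().toString();
        baskets.put(id, new ArrayList<String>());
        return id;
    }

    /** Adds an item; returns false if the basket no longer exists. */
    synchronized boolean addItem(String basketId, String item) {
        List<String> content = baskets.get(basketId);
        if (content == null) {
            return false;
        }
        content.add(item);
        return true;
    }

    /** Submits the basket; returns false if it was already submitted or never existed. */
    synchronized boolean submit(String basketId) {
        // Removing the basket invalidates its ID, so a resubmit finds nothing.
        List<String> content = baskets.remove(basketId);
        if (content == null) {
            return false;
        }
        purchases.add(basketId + ":" + content);
        return true;
    }

    synchronized int purchaseCount() {
        return purchases.size();
    }
}
```

A false result from submit is the point where the server can tell the user that the submitted basket no longer exists.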
The Mantra
PRG pattern can be rephrased like this:
Never show pages in response to POST
Always load pages using GET
Navigate from POST to GET using REDIRECT
Repeat these lines before going to bed.
Think in terms of resources
Desktop applications are presentation-centric. When you select a menu item you pretty much know which window will be displayed and what it will look like. Depending on Model state the window may display different information, but the overall window layout stays the same. A desktop user interface is relatively static and is largely defined at the development stage.
Web applications should be resource-centric. They can attain greater presentation flexibility instead of fixating on delivering a particular page. The browser should request a resource from the server, a business entity, not a page. Depending on resource availability and state, the server would generate a different presentation for that resource. It can be a regular "read-only" web page, or a form with input controls, or a message that the resource is not available or was permanently removed. Think in terms of resources, not pages.
Work with objects
When you obtain input data, you should know which objects it belongs to. When you display data, you should know content of which object is shown. At any time you must know which object you are working with. Use object ID to load, display and store an object. Pass object ID as request parameter.
Use the session or other short-term server storage as a buffer for currently edited or viewed objects. Ensure that your Views always represent current Model state.
Protect the Model
A web page is only a wrapper of what lies beneath: the Model, the business objects, the database. What is displayed to a user is important, but more important is what is stored in the Model. Protect your Model, nurture it, build all kind of error handling around it. After all, inconsistent user interface is just a nuisance; the chaos begins when the Model blows up.
The Model should be accessed and updated in a few well-defined ways. Generally, the Model should not allow concurrent updates by the same client. Keeping the Model valid and up-to-date is the best guarantee against inconsistency between the presentation and business/persistence layers.
Define clear business/persistence rules, do not rely on web layer to validate input data. Data can come from anywhere: from a user of your web page, from web-service, from third-party application or from aliens, and all of them cannot be trusted. Validate input data directly in the heart of the application, in the Model.
Prevent resubmits
Include the object ID and modification timestamp in a form page, and provide the time of modification for all business objects in the Model. Use the ID to look up the business object in persistent storage, and the timestamp to distinguish a double submit from a cached page.
Consider the applicability of tokens. A token makes it possible to detect a double submit from a stale form page. The token is stored in the session before the form is submitted for the first time; the same token value is planted in the HTML form. When the form is submitted, the token value is submitted as well. The application verifies that the token is present in the session, accepts the input data, and removes the token from the session. If a stale form is resubmitted, the form token no longer has its session counterpart. Tokens can be used as a pure web layer solution; Struts has built-in support for tokens.
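The token check can be sketched in a few lines. This is a minimal illustration that does not use the Struts token API: FormTokens is a hypothetical class, and the set of tokens stands in for the user's session.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical one-time form tokens; the set stands in for the session. */
class FormTokens {
    private final Set<String> sessionTokens = new HashSet<>();

    /** Called when the form is rendered: stores a token and returns the value to plant in the form. */
    synchronized String issue() {
        String token = UUID.randomUUID().toString();
        sessionTokens.add(token);
        return token;
    }

    /**
     * Called when the form is submitted. Returns true only the first time a given
     * token is presented; the token is removed so a stale form cannot reuse it.
     */
    synchronized boolean consume(String submittedToken) {
        return submittedToken != null && sessionTokens.remove(submittedToken);
    }
}
```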
The Model can deal with resubmission more reliably than tokens. If a form is used to add or delete data, apply the input values to the Model directly. A properly designed Model would throw an insert or delete exception. If a form is used for editing an existing object, compare the timestamp on the form with the timestamp in the database and do not accept input with a timestamp earlier than the persisted data.
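A sketch of these Model-level checks, under the assumption that timestamps are plain long values supplied by the caller. VersionedItem and TimestampGuard are hypothetical names, and a map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;

/** A stored value together with the time of its last modification. */
class VersionedItem {
    final String value;
    final long modified;

    VersionedItem(String value, long modified) {
        this.value = value;
        this.modified = modified;
    }
}

/** Hypothetical Model that rejects duplicate inserts and updates from stale forms. */
class TimestampGuard {
    private final Map<String, VersionedItem> database = new HashMap<>();

    /** Insert fails if the object already exists, which catches a resubmitted "create" form. */
    void insert(String id, String value, long now) {
        if (database.containsKey(id)) {
            throw new IllegalStateException("object " + id + " already exists");
        }
        database.put(id, new VersionedItem(value, now));
    }

    /**
     * formTimestamp is the modification time that was planted in the form when it was rendered.
     * Returns false if the form is older than the persisted data.
     */
    boolean update(String id, String value, long formTimestamp, long now) {
        VersionedItem stored = database.get(id);
        if (stored == null) {
            throw new IllegalStateException("object " + id + " not found");
        }
        if (formTimestamp < stored.modified) {
            return false; // stale form: do not accept the input
        }
        database.put(id, new VersionedItem(value, now));
        return true;
    }

    VersionedItem find(String id) {
        return database.get(id);
    }
}
```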
Controlling data with Model makes things easy. You can notify a user that the data being resubmitted is already in the database. "Thank you, stop clicking that button and refresh the page. The original input form has gone long ago, but your browser still keeps it in the cache."
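Prohibit caching of application pages. Insert cache control meta tags, such as <meta http-equiv="Pragma" content="no-cache"> and <meta http-equiv="Expires" content="-1">, in the head of your pages. A page is then considered expired as soon as it has loaded from the server.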
Separate input from output
Use different classes to process input and output. If you use Struts, create separate input and output form classes. This works very well with the two-stage PRG pattern:
- POST request is received by the server
- Struts populates input form class with request parameters
- Input form class validates input data
- Model is updated, information related to current operation is saved in the session for use by subsequent GET requests
- Browser is redirected to output action and loads the result page using GET
- Server looks up current object in the session and/or in the Model and fills an output form
- View is created using output form data and is sent back to the browser
You can define only setters in the input form, and only getters in the output form to make form classes easier to read and to ensure that Struts would not populate output form fields with request parameters.
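The setter-only and getter-only split can be sketched with plain classes. These are hypothetical stand-ins that do not extend any Struts class; in a Struts application they would be form classes, and validate would report errors through the framework rather than a list. The value() accessor is an assumption of this sketch: it deliberately avoids the bean getter naming so the class still exposes no bean getters.

```java
import java.util.ArrayList;
import java.util.List;

/** Input side: only a bean setter, so request parameters can flow in but nothing is rendered from it. */
class ItemInputForm {
    private String value;

    public void setValue(String value) {
        this.value = value;
    }

    /** Returns a list of error messages; an empty list means the input is acceptable. */
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            errors.add("value is required");
        }
        return errors;
    }

    /** Package-private accessor for the action that updates the Model; not a bean getter. */
    String value() {
        return value == null ? "" : value.trim();
    }
}

/** Output side: only bean getters, so request parameters cannot overwrite what is displayed. */
class ItemOutputForm {
    private final String title;
    private final String value;

    ItemOutputForm(String title, String value) {
        this.title = title;
        this.value = value;
    }

    public String getTitle() {
        return title;
    }

    public String getValue() {
        return value;
    }
}
```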
You may want to split large action classes into input and output actions as well.
Use session-scoped UI objects
The PRG pattern implies a roundtrip to the browser, so the request data is lost. There are two ways to keep the POST data: either transfer it in the redirect response and then in the GET request, or store it on the server. The first approach is bulky and leaves you with two different kinds of GET request for the same resource: one to redisplay the HTML form with all its previous data, another to display the business object from the database using just its ID.
Thus, the proper way is to store temporary data on the server and provide the GET request with the object ID only. That way the output action would not even know whether the object was just created or loaded from the database.
Temporary data corresponds to currently edited or viewed object, and includes both business and presentation data, like:
- object value;
- error messages related to this object;
- page title.
Because this temporary object defines the presentation of a business object, I call it a UI object. If you use Struts, you can use form classes with session scope as UI objects. It is the easiest way to convert a regular "forwarding" application into a "redirecting" one.
Naturally, the same form class would be used in both the input and output actions, so that the output action can access values set in the input action. The attractiveness of session-scoped form classes is undermined by the fact that Struts repopulates form fields with each request. This is undesirable, so the mutators would need to check the name of the current action mapping and not update field values for the output action. Struts calls the reset method before populating the form, and validate after that. If these methods are used for both input and output, the current mapping name should be checked so that the appropriate code is used.
Session-scoped form classes are kept in memory during the client session, which may become an issue. If your form class has references to large objects, you may need to release these objects manually.
Another issue arises when more than one form instance needs to be created. How can this be done from application code, if form classes are maintained by Struts?
So, despite the convenience of session-scoped form classes, I suggest creating your own UI objects. You have better control over them, and you can decide whether to store them in the session or in the database. You will also have better abstraction from the Struts framework, and porting to other frameworks would be easier.
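A custom UI object might look like the sketch below. UiObject and UiObjectStore are hypothetical names; the fields follow the list given earlier (object value, error messages, page title), and the store is a map that could be kept in the session or replaced by a database table.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical UI object: business value plus presentation data for one edited object. */
class UiObject implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String objectId;
    private String title;
    private String value;
    private final List<String> errors = new ArrayList<>();

    UiObject(String objectId, String title, String value) {
        this.objectId = objectId;
        this.title = title;
        this.value = value;
    }

    String getObjectId() { return objectId; }
    String getTitle() { return title; }
    String getValue() { return value; }
    List<String> getErrors() { return Collections.unmodifiableList(errors); }

    void setTitle(String title) { this.title = title; }
    void setValue(String value) { this.value = value; }
    void addError(String message) { errors.add(message); }
    void clearErrors() { errors.clear(); }
}

/** Hypothetical short-term storage for UI objects, keyed by object ID. */
class UiObjectStore {
    private final Map<String, UiObject> objects = new HashMap<>();

    void save(UiObject ui) { objects.put(ui.getObjectId(), ui); }
    UiObject find(String objectId) { return objects.get(objectId); }
    void discard(String objectId) { objects.remove(objectId); }
}
```

The input action would save the UI object before redirecting, and the output action would look it up by the object ID passed in the GET request.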
Form classes are intended for two simple things: deliver input data from HTML form, and render output data on web page. Form classes are just value objects, enhanced with additional functionality like validation. Keep them in request scope, do not use them to store UI or business data.
Struts: use ForwardAction class in output actions
If you have separate input and output form classes, you have two sets of reset and validate methods. These methods are called by Struts before passing control to an action class. You can use validate in the input form for its original purpose: to verify input data. The output form, on the other hand, does not have much to validate; it is used just to build the result page. So you can move the code from the execute method of the action class to the validate method of the output form class, get rid of the custom action class altogether, and use the standard ForwardAction class in its place.
Struts: do not expose Views
Views, which are usually JSP pages, must not be available for direct access from a browser. Forget that JSP can process the request. Regard JSP as HTML with data access, use it for output only. Always pass control through action class and/or form class. This ensures clean separation between components and allows Controller to monitor all requests. Hide web pages in WEB-INF directory and display them from their respective actions.
Configure caching
Browsers are not required to process cache control tags on web pages, but they usually obey HTTP response header fields. Set the following headers on the response:
response.setHeader("Pragma", "No-cache");
response.setHeader("Cache-Control", "no-cache");
response.setDateHeader("Expires", 1);
The corresponding HTTP header fields produced by Tomcat 4.0.6 look like this:
"Pragma: No-cache"
"Cache-Control: no-cache"
"Expires: Thu, 01 Jan 1970 00:00:00 GMT"
Use better browsers
Despite efforts to prohibit caching, some web browsers like Firefox just do not care. Caching works great with simple forwarding applications, preventing implicit resubmits. But caching a page which is supposed to reflect the current server state breaks the user experience and introduces the double submit problem again.
Other browsers like Opera can resubmit a POST request without a confirmation message. This may invalidate the state of an application which does not check for double submits, and the user would not even know about it.
Old Netscape Navigator works fine for me, but for some reason it freezes for several seconds when submitting a POST request to a Tomcat server. Other browsers do not exhibit this strange behavior.
Internet Explorer does almost everything right, but is very annoying. When you resubmit a POST request, it shows you a "Page expired" window first and a dialog box next before allowing you to proceed. And if you decide not to, it loses your current page. But because your application would not have resubmit problems, your customers would not suffer much.
Why redirect works
It is interesting that the PRG pattern exploits non-standard behavior of browsers and web servers. HTTP 1.1 defines several redirect response codes in the 3xx range. Some of these codes require the browser to use the same request type, some require it to change POST to GET, and some require it to obtain user confirmation when the request is redirected. It turns out that many of these requirements are not implemented by popular browsers. Instead, they share a common de facto behavior, such as redirecting POST to GET without confirmation when a 302 code is received. This feature is used by the PRG pattern.
This behavior is wrong for the 302 ("Found") code, but is absolutely correct for the 303 ("See Other") code. Still, few servers return 303 when a redirect with the GET method is required. The HttpServletResponse.sendRedirect method does not allow the response code to be set; it always returns 302. It is possible to emulate sendRedirect(url) behavior using the following methods:
res.setStatus(HttpServletResponse.SC_SEE_OTHER);
res.setHeader("Location", url);
where SC_SEE_OTHER is the proper 303 code. However, sendRedirect provides some additional services, like resolving relative addresses, so this is not a drop-in replacement. The discrepancy between browser behavior and the HTTP standard could be resolved if the 302 and 303 codes were considered equal and another code were created for proper 302 behavior.
In any case, I doubt that browser vendors will change their implementation of the 302 response code, because too many applications rely on it. The good thing is that modern browsers understand and correctly process the 303 code, so if you want to be sure, return 303 instead of 302.
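The address resolution that sendRedirect performs can be approximated with java.net.URI. The sketch below is a plain-Java illustration, not servlet code: SeeOtherRedirect is a hypothetical class that only computes the status and the Location header value, and the example URLs are made up. In a servlet, the resulting values would be passed to setStatus and setHeader.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds a 303 response with an absolute Location header. */
class SeeOtherRedirect {
    static final int SC_SEE_OTHER = 303;

    final int status = SC_SEE_OTHER;
    final Map<String, String> headers = new LinkedHashMap<>();

    private SeeOtherRedirect(String absoluteLocation) {
        headers.put("Location", absoluteLocation);
    }

    /**
     * @param requestUrl absolute URL of the POST request being answered
     * @param target     redirect target, either absolute or relative to requestUrl
     */
    static SeeOtherRedirect to(String requestUrl, String target) {
        // URI.resolve applies the standard relative-reference rules against the request URL.
        String absolute = URI.create(requestUrl).resolve(target).toString();
        return new SeeOtherRedirect(absolute);
    }
}
```

For example, a relative target is resolved against the directory of the request URL, while a target starting with a slash replaces the whole path.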
References
- "GET after POST" by Adam Vandenberg:
http://theflangynews.editthispage.com/stories/storyReader$1118
- "A Fast Introduction to Basic Servlet Programming" by Marty Hall:
http://www.informit.com/articles/article.asp?p=29817&seqNum=7
- "Redirect in response to POST transaction" by A.J.Flavell:
http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html
- "Post/Redirect/Get pattern for web applications" by Michael Jouravlev:
http://www.theserverside.com/patterns/thread.tss?thread_id=20936
- "So, You Don't Want To Cache, Huh?" by Joe Burns:
http://www.htmlgoodies.com/beyond/nocache.html
- RFC 1945, Hypertext Transfer Protocol -- HTTP/1.0 by T. Berners-Lee, R. Fielding, H. Frystyk:
http://www.ietf.org/rfc/rfc1945.txt
- RFC 2616, Hypertext Transfer Protocol -- HTTP/1.1 by R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee:
http://www.w3.org/Protocols/rfc2616/rfc2616.html
About the Author
Michael Jouravlev - I hold an MS in Computer Science from Moscow Aviation Institute (technical university), Moscow, Russia. I have more than 10 years of experience developing applications for MS-DOS, Windows and the Java platform. I have devoted the last 5 years to server-side Java applications. Currently I am employed as a software engineer at International Lottery and Totalizator, Inc., www.ilts.com