Read the Documentation
A blog about learning new things.
Storing blog posts on the file system
While using this blog, it became apparent very quickly that writing and editing blog posts through an HTML form was not going to be ideal. Given that the posts are written in Markdown, using an external editor that recognizes Markdown makes for a much more enjoyable experience. I could copy and paste between an editor and the HTML form, but then there are two versions of the same post and there is a chance they could become out of sync. I would like to have changes made through the editor show up directly in the post (after reloading) without needing to edit the post through the web.
I did some research and while there were plenty of differing opinions about the "best" solution for storing blog posts, most seemed to agree that for simple uses with relatively modest sizes there wasn't much of a performance difference between storing the text in the database vs an external file. I decided against storing the post on both the file system and in the database, with automatic syncing, because I don't plan on using any sophisticated searching on the blog post. My primary motivation is to make writing/editing posts easier.
Most of the necessary changes were in the create_post, update_post, and delete_post functions to create, update, and delete the file, respectively. The view functions that render view_post.html only needed to call a new function, get_post_body, to retrieve post bodies. And of course, I had to remove the body column from the Post model, instead adding a reference to the file name (more on that below). I also added the ability to upload a pre-existing file to be used as the body of a new post.
Managing post files
I wanted the file management to be as automated as possible.
- File names are generated from the simple_title of a post. These are already unique and file name friendly, and I have a function that generates simple titles from post titles.
- When a post title changes, the post file with the old title is deleted.
- Deleting a post also removes the associated post file. I decided not to entirely delete them, however, and instead move the files to a .deleted directory. I wanted to keep a little manual control of final deletion, for now.
- Once the contents of the uploaded file are read, the file is deleted. That is, uploading a file does not store the file permanently on the server. If the user decides to abandon the post, there won't be orphaned files.
Post file location
I went back and forth on how exactly to deal with generating/storing the file name for a given post. As described above, the naming convention is straightforward: a file name is generated by combining a specified post directory (defined in the application config.py file as BLOG_POST_DIR) and post.simple_title with a file extension '.md'.
For example, if BLOG_POST_DIR = 'blog_posts' and there is a post titled "This is a blog post!", the generated file name should be blog_posts/this-is-a-blog-post.md.
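The function that produces simple titles isn't shown in this post. As a minimal sketch of the naming convention, assuming the conversion lowercases the title and replaces every run of non-alphanumeric characters with a hyphen: the name simplify_title matches the function used later in update_post, but its body here is a guess, and BLOG_POST_DIR is a plain variable rather than a Flask config value.

```python
import os
import re

BLOG_POST_DIR = 'blog_posts'  # assumed value, as in the example above


def simplify_title(title):
    """Hypothetical stand-in: lowercase, hyphen-separated, file name friendly."""
    # Collapse every run of characters that is not a-z or 0-9 into one hyphen.
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    # Trim hyphens left over from leading or trailing punctuation.
    return slug.strip('-')


def post_path(title):
    """Combine the post directory, the simple title, and the '.md' extension."""
    return os.path.join(BLOG_POST_DIR, f'{simplify_title(title)}.md')


print(post_path('This is a blog post!'))
```

A real implementation would also have to decide what to do with non-ASCII characters and with two titles that simplify to the same string; this sketch ignores both.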
Initially, I just generated the file name every time I needed to actually access the file.
filename = os.path.join(current_app.config['BLOG_POST_DIR'], f'{post.simple_title}.md')
This works, but is pretty cumbersome to write every time I need the file name. And why repeat myself? Instead, I shifted the task into a function to generate the file name.
def post_filename(post):
    return os.path.join(current_app.config['BLOG_POST_DIR'], f'{post.simple_title}.md')
Again, this works, but it is still awkward to have an intermediary between a post and the name of the file containing the post body. Shouldn't a post just "know" where its own file is located? The file name could be added as a column to the Post model, but that would still require manually updating this column when creating/updating posts. Ideally, a post should be able to generate its own file name on demand, but not necessarily need to store it (since it is composed of other known data).
The solution I settled on was to add filename as a read-only property of Post objects, rather than a column. It can be easily accessed using post.filename, automatically updates when a post's simple_title is updated, and doesn't store redundant information in the DB.
class Post(db.Model):
    ...
    simple_title = db.Column(db.String(100), index=True, unique=True)
    ...
    @property
    def filename(self):
        if not self.simple_title:
            return ''
        return os.path.join(current_app.config['BLOG_POST_DIR'], f'{self.simple_title}.md')
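To see the property's behavior in isolation, here is a small sketch that swaps the SQLAlchemy model for a plain Python class and the Flask config lookup for a module-level constant (both are stand-ins for illustration, not the blog's actual code). It shows that the file name follows simple_title and that assigning to it fails because no setter is defined.

```python
import os

POST_DIR = 'blog_posts'  # stand-in for current_app.config['BLOG_POST_DIR']


class Post:
    """Plain-Python stand-in for the SQLAlchemy model, for illustration only."""

    def __init__(self, simple_title=None):
        self.simple_title = simple_title

    @property
    def filename(self):
        # Derived on every access, so it can never go stale.
        if not self.simple_title:
            return ''
        return os.path.join(POST_DIR, f'{self.simple_title}.md')


post = Post('first-post')
old_name = post.filename
post.simple_title = 'renamed-post'   # filename follows automatically
new_name = post.filename

try:
    post.filename = 'elsewhere.md'   # no setter defined, so this raises
    assignable = True
except AttributeError:
    assignable = False
```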
Create, update, and delete posts
Creating a new post is fairly straightforward.
def create_post(title, summary, body, tags, public):
    post = Post(...)
    ...
    with open(post.filename, 'w') as f:
        f.write(body)
Updating is a little more interesting. Since the file names are generated based on a post's simple_title, if that changes then the file name changes. If this happens while updating a post, the file with the old name is left on the server, but is no longer referenced. Make sure to clean up after yourself!
def update_post(post_id, title, summary, body, tags, public):
    post = Post.query.get(post_id)
    ...
    if post.simple_title != simplify_title(title):
        os.remove(post.filename)
        post.simple_title = simplify_title(title)
    with open(post.filename, 'w') as f:
        f.write(body)
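One thing to watch in the update above: os.remove raises FileNotFoundError if the old file is already gone. A minimal sketch of a more forgiving variant follows, using a hypothetical helper save_post_body that is not part of the blog's code. It writes the new file first and only then removes the old one, tolerating a missing old file.

```python
import os
import tempfile


def save_post_body(old_path, new_path, body):
    """Hypothetical helper: write body to new_path, then drop a stale old file."""
    with open(new_path, 'w') as f:
        f.write(body)
    if old_path and old_path != new_path:
        try:
            os.remove(old_path)
        except FileNotFoundError:
            # Nothing to clean up: there was no body file under the old name.
            pass


with tempfile.TemporaryDirectory() as post_dir:
    old = os.path.join(post_dir, 'old-title.md')
    new = os.path.join(post_dir, 'new-title.md')
    with open(old, 'w') as f:
        f.write('first draft')

    save_post_body(old, new, 'second draft')
    remaining = sorted(os.listdir(post_dir))

    # Calling it again with the old file already gone is harmless.
    save_post_body(old, new, 'third draft')
    with open(new) as f:
        final_body = f.read()
```

Writing before removing means a failed write leaves the previous body file in place rather than losing it.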
I erred on the side of being overly cautious with deletion, because it is possible to have a post in the database that is somehow missing a post body file. Also, if this is the first post to be deleted, the .deleted directory needs to be created.
def delete_post(post_id):
    post = Post.query.get(post_id)
    ...
    blog_post_dir = current_app.config['BLOG_POST_DIR']
    if os.path.exists(post.filename):
        if not os.path.exists(os.path.join(blog_post_dir, '.deleted')):
            os.mkdir(os.path.join(blog_post_dir, '.deleted'))
        new_filename = os.path.join(blog_post_dir, '.deleted', os.path.basename(post.filename))
        os.rename(post.filename, new_filename)
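The same move-to-.deleted idea can be written a little more compactly with os.makedirs(..., exist_ok=True), which removes the need for a separate existence check on the directory. The sketch below uses a hypothetical helper, soft_delete, and a temporary directory in place of BLOG_POST_DIR. It also uses os.replace instead of os.rename, which is a substitution on my part: os.replace overwrites an existing target on every platform, whereas os.rename fails on Windows if the target exists.

```python
import os
import tempfile


def soft_delete(path, post_dir):
    """Hypothetical helper: move path into post_dir/.deleted.

    Returns the new path, or None if there was no file to move.
    """
    if not os.path.exists(path):
        return None
    trash = os.path.join(post_dir, '.deleted')
    os.makedirs(trash, exist_ok=True)   # creates the directory only if missing
    target = os.path.join(trash, os.path.basename(path))
    os.replace(path, target)            # overwrites an older deleted copy of the same name
    return target


with tempfile.TemporaryDirectory() as post_dir:
    path = os.path.join(post_dir, 'some-post.md')
    with open(path, 'w') as f:
        f.write('body')

    moved_to = soft_delete(path, post_dir)
    moved_ok = os.path.exists(moved_to) and not os.path.exists(path)
    second_try = soft_delete(path, post_dir)   # file already moved, nothing to do
```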
Retrieving post files
I wanted to separate the view functions from the file system as much as possible (only the upload_file view reads and writes files, and only to a temporary file). Also, I need to retrieve the body of a post in a few different views and I don't want the view functions to know/care if the file exists or not.
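def get_post_body(post):
    try:
        with open(post.filename) as f:
            body = f.read()
        return body
    except FileNotFoundError:
        return 'No post file found.'
    except Exception:
        # Any other failure (permissions, decoding, ...); a bare except
        # would also swallow KeyboardInterrupt and SystemExit.
        return 'Error retrieving post file.'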
The unaltered Markdown is returned, rather than converting to HTML first. This function is used to populate the post body <textarea> while editing an existing post, so returning HTML would break that functionality.
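As a quick illustration of why callers never need to check for the file themselves, the sketch below exercises the three outcomes. It repeats the function so that it runs standalone, and uses types.SimpleNamespace objects as stand-ins for Post instances (an assumption for the demo; anything with a filename attribute works).

```python
import os
import tempfile
from types import SimpleNamespace


def get_post_body(post):
    # Mirrors the function above so this sketch runs on its own.
    try:
        with open(post.filename) as f:
            return f.read()
    except FileNotFoundError:
        return 'No post file found.'
    except Exception:
        return 'Error retrieving post file.'


with tempfile.TemporaryDirectory() as post_dir:
    path = os.path.join(post_dir, 'hello.md')
    with open(path, 'w') as f:
        f.write('# Hello\n\nSome *Markdown*.')

    # The raw Markdown comes back untouched.
    present = get_post_body(SimpleNamespace(filename=path))
    # A missing file yields the placeholder text instead of an exception.
    missing = get_post_body(SimpleNamespace(filename=os.path.join(post_dir, 'nope.md')))
    # Opening a directory raises an OSError other than FileNotFoundError.
    broken = get_post_body(SimpleNamespace(filename=post_dir))
```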
Uploading files using Flask
The Flask documentation provides a useful example of a file upload pattern. However, it doesn't exactly match my needs, so I had to adapt it a bit. On any kind of failure, the user is redirected back to the new post page with an appropriate error message. On successful upload, I decided to render new_post.html from this view since I would need to pre-populate the body form element with the text from the file.
I considered passing the text body in Flask's g object and somehow integrating this additional way of entering the new_post view function, but that seemed overly complicated. Keep It Simple, Stupid, right? Yes.
To quote the above-mentioned example from Flask, there are three parts to uploading a file:
- A <form> tag is marked with enctype=multipart/form-data and an <input type=file> is placed in that form.
- The application accesses the file from the files dictionary on the request object.
- Use the save() method of the file to save the file permanently somewhere on the file system.
The upload_file view function accomplishes the second and third parts.
@bp.route('/upload_file', methods=['POST'])
@login_required
def upload_file():
    if 'file' not in request.files or request.files['file'].filename == '':
        flash('No file selected.')
        return redirect(url_for('.new_post'))
    file = request.files['file']
    if file and allowed_file(file.filename):
        filename = os.path.join(current_app.config['UPLOAD_FOLDER'], secure_filename(file.filename))
        file.save(filename)
        form = PostForm()
        with open(filename) as f:
            form.body.data = f.read()
        if os.path.exists(filename):
            os.remove(filename)
        g.show_upload = False
        return render_template('blog/new_post.html', title='New post', form=form)
    flash('Extension not allowed') if file else flash('Upload failed.')
    return redirect(url_for('.new_post'))
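The allowed_file helper called in the view isn't shown here. A minimal sketch, modeled on the extension check in the Flask upload pattern and assuming that only .md and .txt files are accepted (the two extensions named in the upload form's label); the constant name ALLOWED_EXTENSIONS and its contents are assumptions.

```python
ALLOWED_EXTENSIONS = {'md', 'txt'}  # assumed set of accepted extensions


def allowed_file(filename):
    """Return True if filename has an extension in ALLOWED_EXTENSIONS."""
    return ('.' in filename
            and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS)


# A few sample names: the check is case-insensitive and rejects names
# without any extension.
checks = {name: allowed_file(name)
          for name in ('draft.md', 'NOTES.TXT', 'script.py', 'README')}
```

This only inspects the name, not the contents, so it is a convenience filter rather than a security boundary.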
Working with the HTML was a little more challenging, for two reasons: the upload file button is part of a separate form that POSTs to a different URL than the current page, and on a successful upload the new post HTML is rendered from the upload_file view function rather than from new_post.
Here is the upload file form element.
{% if g.show_upload -%}
<div class="col-md-4">
  <form action="{{ url_for('blog.upload_file') }}"
        method="post" class="form" role="form" id="upload_form" enctype="multipart/form-data">
    <div class="form-group">
      <p><label for="file_upload">Upload post body file (.md or .txt)</label>
      <input type=file name=file id="file_upload"></p>
      <p><button class="btn btn-default" type="submit" id="btn_upload">Upload file</button></p>
    </div>
  </form>
</div>
{%- endif %}
One consequence of my decision to render HTML directly from the upload_file view function is that the URL for the new/edit post page could be one of three options:
- blog/new_post
- blog/new_post?post_id=<id>
- blog/upload_file
For the first and third situations, the post form should submit to the new_post route, but for the second the form should include the query string argument post_id to indicate an edit, rather than a post creation.
Side note: When the blog was simple, using one route for both creating a new post and updating an existing post made sense. However, as the application has grown in complexity, it's becoming apparent that these should really be separate view functions.
The updated post form tag:
<form action="
  {%- if 'post_id' in request.args -%}
    {{ url_for('.new_post', post_id=request.args['post_id']) }}
  {%- else -%}
    {{ url_for('.new_post') }}
  {%- endif -%}
  " method="post" class="form" role="form" id="post_form">