Custom content types are not the future

There has been an explosion of CCKs (Content Construction Kits) for Joomla. Perhaps you've heard of some of them: K2, Zoo, FlexiContent, SOBI2, and many others. Perhaps you've used them, or you're interested in trying them out. Either way, I'd like to explain a little more about what they do and how they do it.

What's a 'content type'?

Lately the phrase 'content types' has grown popular in CMS circles. Drupal has championed the concept of a CCK the longest, WordPress added native functionality for it in 3.0 a few days ago, and Joomla has always had the ability, but only in the past year or so have CCKs become a major focus.

Officially, in the HTTP protocol, a content type (or Internet media type) specifies what kind of content the server is sending, which gives the browser (or other program) information about what it is getting. If you are sending a zip file, the server includes the HTTP header "Content-Type: application/zip", so the browser knows that what it is receiving is not HTML but a zip file to download. This is all handled by the server and the browser (or other client), so you normally never see this information.
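As a rough illustration of the HTTP meaning of the term, here is a minimal Python sketch of how a server might pick the header for a file it is about to send. The function name content_type_header is made up for this post, and the exact media type returned depends on the type table of the system it runs on.

```python
import mimetypes


def content_type_header(filename):
    """Build the Content-Type header line for a file, based on its extension."""
    # guess_type returns (media_type, encoding); media_type is None if unknown.
    media_type, _encoding = mimetypes.guess_type(filename)
    if media_type is None:
        # Generic fallback for "some bytes, please just download them".
        media_type = "application/octet-stream"
    return f"Content-Type: {media_type}"


# The browser reads this line to decide whether to render or download.
print(content_type_header("portfolio.zip"))
print(content_type_header("index.html"))
```

The point is only that the header is a small, standard label attached to the response. The browser needs no knowledge of the site that sent it.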

Since it appears we have a limited vocabulary, the term is also reused by CCKs. From this perspective, a content type can be an article, video, comment, or any other construct that could be considered a definable piece of content. There are native and CCK content types. In Joomla, articles are one native content type, but so are weblinks, banners, users, and more. The issue with the native types is that they are defined by Joomla. The draw of a CCK is that you are able to define your own types of content.

Before the CCKs came around, each component generally created its own custom content types for you. You may never have realized it, but the majority of components hard-code a specific content type. Generally this was not an issue, as you would install a component to handle each type of content you needed.

For example, say I want to build a portfolio of my work. I want to create a new content type for a portfolio item, with certain information for each one. I will add a title (a text field), a description (a text block), a gallery (images), URLs to the example (a text field, but intended to hold a URL), and perhaps even a promo video (video). What I'm doing is creating my own content type, and I'll call it 'portfolio'.
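To make the 'portfolio' example concrete, here is a minimal sketch of that content type as a Python data structure. This is not how any particular CCK stores its data; the class name, field names, and sample values are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PortfolioItem:
    """A hypothetical 'portfolio' content type with one field per piece of meaning."""
    title: str                                         # text field
    description: str                                   # text block
    gallery: List[str] = field(default_factory=list)   # image paths or URLs
    urls: List[str] = field(default_factory=list)      # links to the live example
    promo_video: Optional[str] = None                  # optional video URL


item = PortfolioItem(
    title="Studio website",
    description="A redesign of a small studio's site.",
    gallery=["images/home.png", "images/contact.png"],
    urls=["https://example.com/work/studio"],
)

# Because each part is named, a program can ask for exactly what it needs.
print(item.title)
print(item.urls)
```

Compare this with one article body where the same title, links, and images are mixed together as free text: a person can pick them apart, but a program has nothing to hold on to.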

Why do this instead of just putting everything into a Joomla article? The idea is that defining a content type helps give your content meaning. You are able to define the purpose of each part of the content, rather than just stuffing the parts together in one article. The meaning of the data may be blatantly obvious to a human, but if it's all jumbled together in one article, a machine doesn't have much of a chance to understand it.

Semantic web wants everything to have meaning

The semantic web is a topic I could never fully explore in anything less than a book. Its hallmark is that all content will have a clear meaning to both humans and machines. Why do we care about machines? It is fairly easy for a human to quickly determine the meaning of content on a webpage, but as time goes on we want to be able to connect our data from across the web. This is already happening: you can use the geotagging information from Twitter to search Flickr for images taken in the same area. Another great example is Google Maps and Yahoo! Maps, which both provide map overlays showing photos from around the internet taken at various locations.

Geotagging is an example of 'metadata', which is basically extra information about a particular piece of content. If we assume a Twitter message is a content type, then the author, geotag, timestamp, tweet source, etc. are all metadata. These snippets of additional information provide extra context for the main content. A timestamp is rather useless on its own, but associated with a Twitter message it has meaning and gives the main content additional meaning. For a cool example see http://www.liveplasma.com/.
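Here is a small sketch of why shared metadata matters, using the geotag case. It does not call the real Twitter or Flickr APIs. The Geotagged class, the sample coordinates, and the 10 km radius are invented for illustration, and the distance uses the standard haversine formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Geotagged:
    """Any piece of content that carries a geotag as metadata."""
    label: str
    lat: float  # degrees
    lon: float  # degrees


def distance_km(a, b):
    """Great-circle distance between two geotags (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlambda = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def nearby(origin, items, radius_km):
    """Return the items whose geotag lies within radius_km of origin."""
    return [i for i in items if distance_km(origin, i) <= radius_km]


tweet = Geotagged("Tweet from Houston", 29.76, -95.37)
photos = [
    Geotagged("Photo near downtown Houston", 29.75, -95.36),
    Geotagged("Photo in Paris", 48.85, 2.35),
]

# The tweet and the photos come from different services, yet the shared
# geotag metadata is enough to connect them.
print([p.label for p in nearby(tweet, photos, 10)])
```

Neither the tweet nor the photos need to know about each other. The connection works only because both carry the same kind of metadata in a form a program can read.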

CCK and semantics

CCKs provide very useful functionality by allowing users to quickly create custom content types without any knowledge of the programming or database layers involved. Essentially they abstract the programming, the database, and the content to a point where all 'meaning' is wrapped up in the CCK's specific program functionality. This is not very evident to users, because they created the custom content type themselves. It makes complete sense to the creator, but to someone else?

A machine cannot properly process a custom content type without a proper guide (at least today), whereas it has a much better chance with the standard Joomla article content type. It isn't just about getting your content spit out as HTML, but also about providing content in other formats such as XML, RSS, or JSON. One could argue that a CCK can format content with proper semantic HTML (utilizing HTML5 at least), but that is actually the job of the template designer, not the CCK. Regardless, the information is still tied to the custom format, and unless that format resembles a well-known content type, it is rather difficult for machines to know how to read it.
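A rough sketch of what such a 'guide' could look like: a mapping from a CCK's private field names to names another program already understands, after which the same record can be emitted as JSON or XML. The field names (pf_name, pf_blurb, pf_link) and the GUIDE mapping are hypothetical and not taken from any real CCK.

```python
import json
import xml.etree.ElementTree as ET

# A record as a hypothetical CCK might store it, with private field names.
custom_record = {
    "pf_name": "Studio website",
    "pf_blurb": "A redesign of a small studio's site.",
    "pf_link": "https://example.com/work/studio",
}

# The 'guide': without it, an outside program cannot know what pf_blurb means.
GUIDE = {"pf_name": "title", "pf_blurb": "description", "pf_link": "url"}


def translate(record, guide):
    """Rename custom fields to well-known names, dropping unmapped fields."""
    return {guide[key]: value for key, value in record.items() if key in guide}


def to_json(record):
    """Serialize a record as JSON with stable key order."""
    return json.dumps(record, sort_keys=True)


def to_xml(record, root_tag="item"):
    """Serialize a flat record as a simple XML element."""
    root = ET.Element(root_tag)
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


standard = translate(custom_record, GUIDE)
print(to_json(standard))
print(to_xml(standard))
```

Serializing is the easy part. The hard part is that every custom content type needs its own guide, and today nothing obliges a CCK to publish one.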

Here is an analogy. You have a folder on your computer with lots of media files (images, videos, music), and you decide it's hard to manage. So you install a media manager (iTunes or Picasa), which stores information about your files in its private database. It lets you categorize, tag, order, and rename your files, along with whatever other features it offers. It's great, because now you've organized that mess, but you've become tied to that program for the management of your files. If you look at the original folder with your system's file manager, it probably looks about the same, and at best the program has rearranged your items into folders. So much for being able to access that information using another program.

If this is bad, why is it so common?

The web became so popular because it was built on a set of common rules (protocols, methods) that all servers implemented, which let them communicate freely. Then the explosion of the web happened, and everyone took off in different directions. If you want data from Flickr, you need to use the Flickr API and its custom methods. If you want data from YouTube, you must use their API, and so on. Each system created its own methods for dealing with data, making the web far less connected.

The semantic web wants to make it as easy for machines as for humans to surf the web. A person can easily scan a page and find the search button. However, a program might not be able to do that with the same certainty, because the form may not be well written, might be displayed using JavaScript, etc.

If data were more semantic, it would be much easier for a program to work with that data. We would be less tied to one program or one system (wish you could share more of your Facebook data with other social networks?) and we'd be looking at a more open web.

The future of content

The issue isn't that a CCK couldn't be more semantic, but that none of them seem to have the right approach. I have lots of ideas, but ultimately I believe the focus needs to be on limiting the number of custom content types and looking at how to make our content more semantic. One such idea could be to use the elements in HTML5 as a base before creating a custom content type. HTML5 provides elements for videos, images, articles, and many more. This would ensure our content fits certain formats before we start to customize it.

I hope the future of content is less custom and more standard. Think about RSS/Atom feeds: if those were not standard, you would have to write a custom program to handle each feed. The same is true of the protocols we use to connect servers and clients. If all servers didn't communicate using the same procedures, imagine how hard it would be to navigate the web, or how hard it would be to program a browser!
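The feed example can be sketched in a few lines. Because RSS 2.0 puts each entry in an item element with a title child, one small function can read feeds from unrelated sites. The two sample feeds below are invented for illustration.

```python
import xml.etree.ElementTree as ET


def rss_titles(rss_text):
    """Return the title of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Two made-up feeds from unrelated sites, both following the same standard.
feed_a = """<rss version="2.0"><channel><title>Site A</title>
<item><title>Custom content types</title></item>
<item><title>Semantic markup</title></item>
</channel></rss>"""

feed_b = """<rss version="2.0"><channel><title>Site B</title>
<item><title>Release notes</title></item>
</channel></rss>"""

# The same function handles both feeds with no site-specific code.
for feed in (feed_a, feed_b):
    print(rss_titles(feed))
```

If each site invented its own feed format, rss_titles would need a separate branch per site, which is roughly where custom content types leave us today.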

The same concepts should be applied to our content, and when we use a CMS like Joomla with or without a CCK, we need to be aware of the limitations of what we are using and keep our eyes open for a solution to come along. I will still be using CCKs for the time being, but I hope that there will be a push to bring semantics into Joomla and our CCKs.

Many thanks to Mathias Verraes for some excellent input.


About Gnome on the run

We are a full web development studio located in the Houston, Texas area. We blog about websites, video, design, business, web analytics, conversion tracking, and various open source projects we work with.
