About David Jones

David is a Software Developer from the UK with over 12 years' experience. He holds a bachelor's degree in Economics and a master's degree in Software Development, from which many of the pages on this site have evolved. His interests include all things C#/.NET along with software frameworks, architectures and design patterns.

When not developing software David plays bass in a psychedelic rock band who are currently recording their first album.

The World and the Machine

by David Jones 17. July 2010 16:58

The World and the Machine is a paper by Professor Michael Jackson, published in 1995 in the Proceedings of the 17th International Conference on Software Engineering, and it changed my perspective on computer software. Alas, for copyright reasons I am not allowed to provide a copy here.

Jackson views software development as an engineering discipline concerned with creating programs to solve real world problems. The software developed describes a machine, and turns a generic computer into that machine – a useful physical device capable of problem solving. The true value of the software developed is measured by its usability and its ability to satisfy the requirements for which it was designed. He takes a holistic approach by saying that software developers are concerned not merely with the software artefact itself, but also with the establishment and definition of the real world requirements and problems the software addresses.

Jackson takes the approach that software is a machine designed to perform a useful function in the real world, and that the specification is the bridge between the world and the software. The machine has no purpose without the world because that is where the problem exists which the software/machine was built to solve. The problem is in the world, not the machine, and the software cannot be developed in isolation from the world in which it will exist. For example, ATM software would be of little use if it were designed to require a mouse and keyboard, irrespective of its account management features.

A real world problem is defined in the specification, and software is then developed to meet that specification. If the specification has accurately captured the requirements, and the software meets the specification, then the software loaded into the computer will create a machine which solves the problem in the world. Interestingly, Jackson’s paper implies a strict progression from:

Problem -> Specification -> Program -> Solution

This does not take into account the fact that requirements could well change. To stay with Jackson’s view: the world, and therefore the specification, might change while the machine is being built, which obviously affects the machine’s ability to successfully solve the original problem.

Jackson views software development as an engineering discipline, and most engineering disciplines model their creations in one form or another. Modelling provides an abstraction of the real world, often free of implementation detail, which allows us to manage complexity and design a solution to a real-world problem. UML provides a standardised notation and semantics from which to create a model of a machine (at various levels of abstraction). By designing and developing the software using an incremental and iterative methodology, we can quickly assess and improve the machine to more precisely solve the problem in the world, and therefore achieve success.

 


Software Development

Arguments for and against the use of NULL in Relational Database Systems

by David Jones 7. June 2010 05:21

In both the Relational Model and its practical realisation in SQL, null is used to represent “missing”, “inapplicable” or “unknown” information. For example, a record might exist for a given Part Number, but there is no value given for its weight attribute – it is not zero grams, it is unknown. In terms of three-valued logic, the result of a predicate is one of: True, False or Unknown. In the Relational Model and SQL, null is not a value but a flag or marker, independent of any data type, indicating that there is no value. However, in practice it is often incorrectly treated as a value.

Null is represented in the Relational Model and SQL by the keyword NULL. As it is not a value, it cannot be matched by an ordinary comparison in a search condition (comparing anything with null yields Unknown rather than True), so a dedicated mechanism is required to refer to it: SQL uses IS NULL. A Relational Model tuple containing nulls would be: <David, Jones, null, 35, null>.
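As a rough illustration of why IS NULL is needed, the following sketch uses Python's built-in sqlite3 module. The part table, its columns and its three rows are invented for this example and do not come from any real schema.

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no TEXT PRIMARY KEY, weight_g REAL)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [("P1", 12.5), ("P2", None), ("P3", 0.0)],  # Python None is stored as NULL
)


def part_numbers(where):
    """Return the part numbers matching a WHERE clause, in key order."""
    sql = f"SELECT part_no FROM part WHERE {where} ORDER BY part_no"
    return [row[0] for row in conn.execute(sql)]


# A comparison with NULL evaluates to Unknown, so no row qualifies.
equals_null = part_numbers("weight_g = NULL")
# IS NULL is the dedicated test for the marker.
is_null = part_numbers("weight_g IS NULL")
# Zero grams is a real value and is not confused with "unknown".
is_zero = part_numbers("weight_g = 0")

print(equals_null, is_null, is_zero)
```

Only the IS NULL form finds the part whose weight is unknown. The equality test finds nothing at all, even though one row does contain a null.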

Codd (1985) published a set of 13 rules to determine whether a DBMS can be considered Relational. Codd’s Rule 3 specified The Systematic Treatment of Null Values, distinct from any empty character string, zero, or any other number.

C.J. Date, on the other hand, argues that nulls are a mistake because they are not values and have no place in the Relational Model. Instead he proposes using Special Values in their place, claiming that this is what we do in the real world, and that there is no such thing as null in the real world (Date, 2004). In this context, a special value would be a value (of the correct data type) inserted when no other value is available or appropriate. For example, a record might exist to indicate that a student has submitted an assignment, recording such information as the student’s name and date of submission. However, if the assignment has not yet been marked, the student’s score would be recorded as a value outside the range of normal values to indicate this, such as -1 for a paper with possible scores of 1 – 100 marks. When using special values, it is imperative that the special value cannot represent any valid value for the attribute being represented.

A distinct disadvantage with the use of special values becomes apparent with SQL aggregate functions such as MAX(), MIN() and AVG(). These functions ignore nulls in the column being aggregated, so a row whose score is null does not contribute to the calculation. However, returning to the above example of student assignments, MIN(score) would return the special value of -1 as it’s a legitimate value for the data type. Similarly, AVG(score) would also include the -1, even though it clearly should not, as the -1 is being used to indicate that the assignment hasn’t yet been marked. Neither the -1 nor the row itself should be included in the calculation of the average.

This means that the special value must be known and catered for in the formulation of the query and/or interpretation of the results, e.g.:

SELECT AVG(score)
FROM student_assignments
WHERE score > -1

In contrast to the explicit inclusion of the search condition shown above, SQL would automatically ignore any rows where the score was null. Special values require the end-user to have an intimate knowledge of the database and the special values in use, which can vary from database to database and even table to table.
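The difference can be seen in a small sketch using Python's built-in sqlite3 module. The table name follows the query above, but the two score columns (score_null and score_special) and the three rows are assumptions made up for this comparison: the same marks are stored once with null for the unmarked assignment and once with the special value -1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the same scores held twice, once using NULL and once
# using the special value -1 for the assignment that is not yet marked.
conn.execute(
    "CREATE TABLE student_assignments ("
    "student TEXT, score_null INTEGER, score_special INTEGER)"
)
conn.executemany(
    "INSERT INTO student_assignments VALUES (?, ?, ?)",
    [("Ann", 60, 60), ("Bob", 80, 80), ("Cat", None, -1)],
)


def one_row(sql):
    """Run a query that returns a single row and return that row."""
    return conn.execute(sql).fetchone()


# Aggregates skip the NULL, so only the two marked assignments count.
null_stats = one_row(
    "SELECT MIN(score_null), AVG(score_null) FROM student_assignments"
)
# The special value is an ordinary integer, so it distorts both results.
special_stats = one_row(
    "SELECT MIN(score_special), AVG(score_special) FROM student_assignments"
)
# The query author has to know about -1 and filter it out explicitly.
filtered_stats = one_row(
    "SELECT MIN(score_special), AVG(score_special) "
    "FROM student_assignments WHERE score_special > -1"
)

print(null_stats, special_stats, filtered_stats)
```

With nulls the minimum is 60 and the average 70. With the special value the minimum becomes -1 and the average drops to roughly 46.3, until the extra search condition is added.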

Special values can lead to inconsistencies and anomalies in the data, especially where they are input by the end user. For example, a record may have fields for FirstName, MiddleName and Surname. Not everyone has a middle name, so a special value is required to take the place of the null; this could be: N/A, na, none, etc. This special value must then be specifically taken into account to prevent issues such as letters addressed to “Dear David None Jones”, or worse still “Dear Mr Deceased”, which has happened on numerous occasions where end users have inserted a special value which is meaningful to themselves but not to the DBMS! These issues can, however, be mitigated to a certain extent by the use of SQL Default values, e.g.:

CREATE TABLE customer (
…
MiddleName VARCHAR(30) DEFAULT 'None',
… );
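A minimal sketch of how such a default behaves, and of why the application still has to know about it, is given below using Python's built-in sqlite3 module. The FirstName and Surname column types and the salutation helper are assumptions added for illustration, not part of the table definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only MiddleName and its default follow the definition above; the other
# column types are assumed for the sake of a complete, runnable table.
conn.execute(
    """
    CREATE TABLE customer (
        FirstName  TEXT,
        MiddleName VARCHAR(30) DEFAULT 'None',
        Surname    TEXT
    )
    """
)
# MiddleName is omitted, so the DEFAULT supplies the special value.
conn.execute("INSERT INTO customer (FirstName, Surname) VALUES ('David', 'Jones')")

first, middle, last = conn.execute(
    "SELECT FirstName, MiddleName, Surname FROM customer"
).fetchone()


def salutation(first, middle, last, special=None):
    """Join the name parts, skipping nulls and the agreed special value."""
    parts = [p for p in (first, middle, last) if p is not None and p != special]
    return "Dear " + " ".join(parts)


naive = salutation(first, middle, last)          # unaware of the special value
aware = salutation(first, middle, last, "None")  # caters for it explicitly

print(naive)
print(aware)
```

The default guarantees one consistent spelling of the special value, which makes it possible to filter reliably, but every consumer of the data still has to perform that filtering.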

The use of special values in place of nulls also works against standardisation, integration and interoperability. Consider ODBC, which provides a standardised interface to any DBMS with a suitable ODBC Driver. The end user application can make a call to the ODBC interface to find out whether a column value is null (the equivalent of SQL’s IS NULL), irrespective of the backend DBMS, and the driver will handle the ‘translation’ of the call. The DBMS could be replaced with an entirely different implementation but the call remains unaffected – it’s the job of the DBMS manufacturer’s ODBC driver to handle the communication between the ODBC interface and the DBMS. This DBMS independence would be lost, or at least severely impaired, through the use of special values instead of nulls.

C.J. Date argues that nulls have no place in the Relational Model because they do not exist in the real world. However, the Relational Model is conceptual, whereas relational databases do exist and play a mission-critical role in the real world. The concept and use of nulls aids the DBMS in this role, along with the people who use it, who are one of the essential components of any Information System.

References

Webopedia, n.d. What is Codd's Rules? [Online] Available at http://www.webopedia.com/TERM/C/Codds_Rules.html [Accessed 6 June 2010].

Date, C.J., 2004. “An Introduction to Database Systems” Eighth Edition, page 591, Addison-Wesley


Database Systems

Comparing Relational Databases and Multidimensional Databases

by David Jones 23. May 2010 06:20

In this post I briefly compare the data structures and operators of relational databases with those of multidimensional databases, and then discuss why data stored in a multidimensional database can be easier to understand and manipulate than data stored in a relational database. I then go on to describe different ways in which data duplication can occur in a multidimensional star schema, and discuss why it can be beneficial to store data in this way whilst avoiding the problems often associated with this sort of duplication in relational databases.

At the core of the dimensional model are Facts and Dimensions. A fact is a numerical measure of a subject of interest, for example: company A bought quantity n of product X on date D. A dimension is a perspective by which an organisation can select, group and view these facts, for example: the organisation can view the facts by time or product or customer – or all three – depending on the dimensions defined in their multidimensional model. The multidimensional model is designed primarily for querying historical data (i.e. facts) in order to provide answers to questions of a temporal nature, such as “What are the sales of product X per month this year in comparison with the same months over the last 3 years?”.

The aggregated data forming the summarised facts is presented as an n-D (e.g. 3D) cube of data, with a dimension (at a given level within the concept hierarchy) along each axis, and a numerical fact (or simply nothing) at each intersection. Data is normally stored as a large central fact table, with a number of smaller satellite dimension tables, associated by foreign keys.

By comparison, the relational model stores its data in tables (or more precisely Relations). In most cases the data is normalised so that there is no duplication of data between tables. Compared to the multidimensional model, the relational model usually consists of a larger number of tables, with more complex relationships between them. A primary precept of the relational model is that data is always presented as a two-dimensional table – even where there is only a single value.

Even though the raw data of the multidimensional model is usually stored within a relational database, the multidimensional model lends itself to analysis whereas the relational model is more geared towards operational needs, namely regular insertions, updates or deletions of small amounts of data affecting a small number of tables.

Both the multidimensional model and the relational model support aggregation of data in order to satisfy queries. However, the multidimensional model will often calculate and store all possible aggregations whereas the relational model creates only the aggregates requested, does not store them (unless snapshot tables are being used), and recreates the aggregates each time.

In order to manipulate and aggregate data both the multidimensional and relational models support the use of a range of operators. Both models support the use of distributive aggregate operators including: COUNT, SUM, MIN and MAX. However, the multidimensional model supports a range of operations to manipulate data in a cube including:

  • Roll-up and Drill-down to ascend and descend the dimension hierarchy.
  • The Slice operation to perform a selection on one dimension of a cube.
  • The Dice operation to perform a selection on two or more dimensions of a cube.

By comparison, the relational model uses SELECT … WHERE … GROUP BY operators to perform similar functions. In fact, where multidimensional data is stored in a relational database, the RDBMS uses these operators in the background in the generation of the multidimensional data.
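To make the correspondence concrete, here is a sketch of how roll-up, slice and dice might be written as SELECT … WHERE … GROUP BY queries over a tiny star schema, using Python's built-in sqlite3 module. All table names, column names and figures (fact_sales, dim_time, dim_product and their rows) are hypothetical, and a real multidimensional engine may work quite differently internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical star schema: one fact table and two dimension tables.
conn.executescript(
    """
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (
        time_id    INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER
    );
    INSERT INTO dim_time    VALUES (1, 2009, 1), (2, 2009, 2), (3, 2010, 1);
    INSERT INTO dim_product VALUES (1, 'X'), (2, 'Y');
    INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (3, 1, 4), (3, 2, 6);
    """
)

# The join from the fact table out to its dimensions is shared by every query.
BASE = """
FROM fact_sales f
JOIN dim_time t    ON t.time_id = f.time_id
JOIN dim_product p ON p.product_id = f.product_id
"""


def run(select, tail):
    """Run SELECT <select> over the star join, followed by <tail>."""
    return conn.execute(f"SELECT {select} {BASE} {tail}").fetchall()


# Month level of the time hierarchy (the drilled-down view).
month_level = run("t.year, t.month, SUM(f.quantity)",
                  "GROUP BY t.year, t.month ORDER BY t.year, t.month")
# Roll-up: ascend the time hierarchy from month to year.
roll_up = run("t.year, SUM(f.quantity)", "GROUP BY t.year ORDER BY t.year")
# Slice: a selection on one dimension (product X only).
slice_x = run("t.year, SUM(f.quantity)",
              "WHERE p.name = 'X' GROUP BY t.year ORDER BY t.year")
# Dice: a selection on two dimensions (product X during 2009).
dice_x_2009 = run("t.month, SUM(f.quantity)",
                  "WHERE p.name = 'X' AND t.year = 2009 "
                  "GROUP BY t.month ORDER BY t.month")

print(month_level, roll_up, slice_x, dice_x_2009, sep="\n")
```

Drill-down is simply the reverse of the roll-up shown here: grouping by a finer level of the same hierarchy, as in the month-level query.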

Data in a data warehouse built on a multidimensional database will be easier to understand and manipulate than data stored in a data warehouse built on a relational DBMS because the multidimensional DBMS is designed and built specifically for this purpose. The multidimensional data warehouse pre-computes some (or all) of the possible cubes, allowing for fast and simple queries, whereas the relational data warehouse will store most (if not all) of the data in its more detailed, non-aggregated format. Queries are therefore slower and more complex, as data needs to be selected and aggregated each time.

There is often duplication in Time dimension tables. A week can be calculated from other attributes, but since weeks do not align exactly with months, a grouping by month cannot be obtained from a grouping by week. The duplication is not an issue as it provides analysts with the flexibility to use either option. The same duplication applies to financial months versus calendar months, for example financial month 1 can equate to calendar month 5.
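The misalignment between weeks and months is easy to check with Python's standard datetime module. The helper below is a hypothetical illustration that assumes ISO-style weeks running from Monday to Sunday, which may not match the week definition used in a given warehouse.

```python
from datetime import date, timedelta


def months_in_week(any_day):
    """Return the sorted calendar months touched by the Monday-to-Sunday
    week that contains any_day (an assumed, ISO-style week definition)."""
    monday = any_day - timedelta(days=any_day.weekday())
    return sorted({(monday + timedelta(days=i)).month for i in range(7)})


# The week containing 30 June 2010 runs from Monday 28 June to Sunday 4 July.
straddling = months_in_week(date(2010, 6, 30))
# The week containing 16 June 2010 lies wholly inside June.
contained = months_in_week(date(2010, 6, 16))

print(straddling, contained)
```

Because a single week can belong to two months, weekly totals cannot simply be summed into monthly totals, which is why the Time dimension carries both attributes.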

A second way duplication can occur in a Star schema is where a value for an attribute (other than a key or calculated value) is stored in both the fact table and the dimension table. For example, a business customer is based in London, and this is recorded as an attribute of the customer’s record in the dimension table. When that customer makes a purchase, the city is recorded in the fact table (along with the item, price, quantity, etc). Hence the city data for that customer is duplicated in both the dimension table and every record in the fact table related to that customer’s purchases. If the customer later relocates to Cardiff, the customer’s record in the dimension table is updated to reflect the new city. By storing the city value in the fact table, we can get a true value for sales in London and Cardiff both before and after the customer relocated. If we simply select and group by the city attribute in the dimension table, the sales figures would be wrong, as the sales for Cardiff would also include all of that customer’s purchases from when they were based in London, and vice versa.
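The London and Cardiff scenario can be reproduced in a few lines with Python's built-in sqlite3 module. The table names, the customer name and the sales amounts below are all invented for the purpose of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: city is held in the dimension table AND in each fact row.
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE fact_sales   (customer_id INTEGER, city TEXT, amount INTEGER);

    INSERT INTO dim_customer VALUES (1, 'Acme Ltd', 'London');
    -- Two purchases made while the customer is based in London.
    INSERT INTO fact_sales VALUES (1, 'London', 100), (1, 'London', 50);
    -- The customer relocates, so the dimension record is updated in place.
    UPDATE dim_customer SET city = 'Cardiff' WHERE customer_id = 1;
    -- One purchase made after the move.
    INSERT INTO fact_sales VALUES (1, 'Cardiff', 30);
    """
)

# Grouping by the city recorded at the time of each sale gives the true split.
by_fact_city = conn.execute(
    "SELECT city, SUM(amount) FROM fact_sales GROUP BY city ORDER BY city"
).fetchall()

# Grouping by the dimension's current city attributes every sale to Cardiff.
by_dim_city = conn.execute(
    "SELECT c.city, SUM(f.amount) "
    "FROM fact_sales f JOIN dim_customer c ON c.customer_id = f.customer_id "
    "GROUP BY c.city ORDER BY c.city"
).fetchall()

print(by_fact_city)
print(by_dim_city)
```

The fact-table grouping reports 150 for London and 30 for Cardiff, whereas the dimension-table grouping reports 180 for Cardiff and nothing for London.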


Database Systems

Are Agile Methods Software Development's next Silver Bullet?

by David Jones 18. May 2010 08:27

In this post I am going to discuss two Software Development articles: "No silver bullet - essence and accident in Software engineering" by F. Brooks from 1987 and the more recent 2002 article "Get ready for agile methods, with care" by B. Boehm. Are Agile Software Development methods one of the "Silver Bullets" whose existence Fred Brooks questioned?

According to Brooks, the difficulties in developing software derive from its essential complexity, its need to conform to other interfaces, its need to change and the difficulty of visualising it. As to whether recent agile methods could be a silver bullet for software development, Brooks says that there can be no silver bullet due to the nature of software, but states that there are “encouraging innovations”. It can be argued that the innovations he talks of are, in fact, essential characteristics of what later became known as agile methods. The following paragraphs compare these innovations with the agile methods in the Boehm article.

Requirements – Brooks argues that the most difficult part of developing a software system is deciding what to develop, i.e. establishing the requirements and the specification, and designing the software. He promotes an iterative approach to eliciting and refining the requirements between the developer and client. This is comparable with the agile approach of working closely with a customer representative to establish and implement their requirements in small iterations.

Change – Brooks writes that during initial development the client often doesn't fully know their requirements for the software, that additional requirements emerge during development, and, once released, all successful software comes under pressure for change. However, Boehm tells us how agile methods welcome change as requirements emerge and change during development. Short development iterations, along with constant testing, allow changes to be incorporated into the software.

Rapid prototyping and incremental development – according to Brooks, an early prototype helps to visualise the software, allows the client to verify that it meets his requirements, allows any emergent requirements to surface, and creates a basic working system which can be gradually expanded. He also claims that getting an early prototype running boosts morale. Boehm quotes the Agile Manifesto principle of releasing software early and regularly. This gives the customer regular returns on his investment (Chromatic, 2003).

Visualisation and modelling – In 1987 Brooks wrote that software is inherently unvisualisable and attacked the usefulness of a flowchart as a design tool. However, UML and current modelling tools allow designers to create detailed models of software from which the code can be auto-generated.

People – Boehm highlights how agile methods lean towards requiring a small number of premium people, using their tacit knowledge rather than detailed plans to develop designs, and that design-by-committee often results in an inferior design. Fifteen years earlier, Brooks was saying exactly the same thing – that great software is often designed by just a few great designers rather than by committee.

Testing – Brooks states that a great deal of the effort in developing software goes into testing and bug fixing, and asks whether a silver bullet can be found which eliminates them in the system design phase, and whether this will lead to improvements in productivity and reliability. Regular testing is fundamental to agile methods. Unit testing is used to test discrete sections of code, often individual methods, whilst Continuous Integration rebuilds and tests the whole system every time a change is checked into source control, in order to check for changes in one part of the system (which might well work fine on their own) affecting other areas of the system (Niemeyer and Poteet, 2003).

Project size – Brooks claims the benefits of incremental development apply to projects of all sizes. However, Boehm's article quotes Larry Constantine as saying that it becomes increasingly difficult to apply agile methods to teams of more than 15 to 20.

In discussing Brooks' No Silver Bullet article, Black (1999) reiterates the argument that searching for a universal cure to the “software problem” is counterproductive. He goes on to note that developments such as RAD, OOP and 4GLs have all had their moment as the next big thing, but ultimately become last year's incremental improvement.

In conclusion, it can be argued that agile methods may represent Brooks' silver bullet under certain circumstances, with the right people (designers, developers, managers and customers) and the right project. However, in Boehm's article even proponents of agile methods state that they are not suitable for very large projects or safety-critical systems, so the home grounds of the various approaches need to be matched to the project.

References

Black, R., (1999). Managing the Testing Process, Microsoft Press

Boehm, B. (2002). Get ready for agile methods, with care, IEEE Computer, Vol. 35, No. 1, January, 64–9.

Brooks, F. (1987). No silver bullet – essence and accident in software engineering, IEEE Computer, Vol. 20, No. 4, April, 10–19

Chromatic, (2003). Extreme Programming Pocket Guide, O'Reilly

Niemeyer, G., Poteet, J., (2003). Extreme Programming with Ant, Sams Publishing


Software Development

Maslow's Theory of Human Motivation in relation to Open Source Software Development

by David Jones 8. April 2010 07:11

In 1943 Maslow published an article on human motivation entitled "A Theory of Human Motivation", and in this post I consider how these theories might apply to those involved in the development of free open source software (F/OSS).

Maslow’s theories describe how a person’s needs are the motivators for their behaviour, and that once a given need is satisfied it is no longer a motivator. Needs are presented in a hierarchy and once basic needs such as food, safety and love have been satisfied, the person is free to concentrate on higher needs, which he presents as Esteem needs followed by Self-actualization needs. These needs then become the motivators for behaviour, and both may arguably be applied to individuals’ motivation for participation in open-source software.

Under the heading of Esteem needs, Maslow states that everyone has a need or desire for self-esteem and for the esteem of others, based on their own capacity and achievement, along with respect from others in the form of: reputation, prestige, recognition, attention, appreciation or importance. This is supported by Dale Carnegie who argues that the desire to feel appreciated and important are the deepest urges in human nature.

These esteem needs can be applied to what Lakhani describes as extrinsic motivation for participation in a F/OSS project. Herzberg’s motivation-hygiene theory places less emphasis on salary as a motivator than might initially be expected. In F/OSS one’s contribution (normally in the form of source code) indicates one’s capacity and level of achievement, and how these are perceived by peers becomes the basis for the esteem in which the contributor is held by his peer group. Raymond (2001) describes F/OSS culture as a gift culture where status and reputation are based on what is given away, i.e. contribution to the F/OSS project and/or movement.

Contributions to a F/OSS project are normally subjected to intense peer review by the project team and other interested third parties, and faults are readily, and often publicly through mailing lists, communicated back to the contributor. Nantz (2005) claims that most F/OSS developers have a desire to look good in the eyes of their peers and will refine their code to the best of their abilities before releasing it.

Once an individual’s esteem needs are satisfied, Maslow argues that a need for self-actualization can develop whereby a person is compelled to pursue their vocation in search of self-fulfillment, and where that vocation is creative it will take that form. Lakhani describes how developers often consider programming as a creative endeavour, and the survey presented in his paper noted that developers feel a high sense of personal creativity in their projects.

This self-actualization need is comparable to Lakhani’s Intrinsic Motivation, which in this context can be seen as developing F/OSS for the inherent enjoyment, intellectual challenge and self-satisfaction, rather than any external reasons. A developer can be motivated to select a problem which matches their ability, but without a pre-determined solution, the reward of their endeavours being the creative solution to the problem, and the knowledge gained from the development process. In doing this, the F/OSS developer has, through his own creativity, expanded his vocational knowledge and experience, and helped to satisfy his own self-actualization need.

In conclusion, it can be argued that the results found by Lakhani and Wolf do approximate Maslow’s theories. However, there is an overlap in the motivational factors behind F/OSS participation. Maslow talks about the “Degree of relative satisfaction” to indicate that there is no clear point at which one set of needs is completely satisfied and the individual then progresses to the next higher level. If one considers Extrinsic motivational factors to be comparable with esteem needs and Intrinsic motivational factors to be comparable with self-actualization needs, then the Lakhani paper reports observing an interplay between extrinsic and intrinsic motivations, with F/OSS developers motivated by a combination of the two and neither dominating nor destroying the effectiveness of the other.

References

Carnegie, D., (1953). How to Win Friends and Influence People, London, UK, Vermilion.

Lakhani, K.R. and Wolf, R.G., (2005). Why hackers do what they do: understanding motivation and effort in free/open source software projects in Feller, J., Fitzgerald, B., Hissam, S. and Lakhani, K.R. (eds) Perspectives on Free and Open Source Software, MIT Press.

Maslow, A.H., (1943). A theory of human motivation, originally published in Psychological Review, 50, 370–96

Nantz, B., (2005). Open Source .NET Development, Boston, USA, Pearson Education, Inc.

Raymond, E. S., (2001). How to Become a Hacker [online], Available from: www.catb.org/~esr/faqs/hacker-howto.html (Accessed 04/04/2010)


Software Development