Data Structures Case Studies
Optimizing data structures is different from optimizing algorithms as data structure problems have more dimensions: you may be optimizing for throughput, for latency, for memory usage, or any combination of those — and this complexity blows up exponentially when you need to process multiple query types and consider multiple query distributions.
This makes simply defining benchmarks much harder, let alone the actual implementations. In this chapter, we will try to navigate all this complexity and learn how to design efficient data structures with extensive case studies.
A brief review of the CPU cache system is strongly advised.
Java programs in this chapter.

| Ref | Program | Description |
|---|---|---|
| 4.1.1 | ThreeSum.java | 3-sum problem |
| 4.1.2 | DoublingTest.java | validating a doubling hypothesis |
| 4.2.1 | Questions.java | binary search (20 questions) |
| 4.2.2 | Gaussian.java | bisection search |
| 4.2.3 | BinarySearch.java | binary search (in a sorted array) |
| 4.2.4 | Insertion.java | insertion sort |
| 4.2.5 | InsertionTest.java | doubling test for insertion sort |
| 4.2.6 | Merge.java | mergesort |
| 4.2.7 | FrequencyCount.java | frequency counts |
| 4.3.1 | ArrayStackOfStrings.java | stack of strings (array) |
| 4.3.2 | LinkedStackOfStrings.java | stack of strings (linked list) |
| 4.3.3 | ResizingArrayStackOfStrings.java | stack of strings (resizing array) |
| 4.3.4 | Stack.java | generic stack |
| 4.3.5 | Evaluate.java | expression evaluation |
| 4.3.6 | Queue.java | generic queue |
| 4.3.7 | MM1Queue.java | M/M/1 queue simulation |
| 4.3.8 | LoadBalance.java | load balancing simulation |
| 4.4.1 | Lookup.java | dictionary lookup |
| 4.4.2 | Index.java | indexing |
| 4.4.3 | HashST.java | hash table |
| 4.4.4 | BST.java | binary search tree |
| 4.4.5 | DeDup.java | dedup filter |
| 4.5.1 | Graph.java | graph data type |
| 4.5.2 | IndexGraph.java | using a graph to invert an index |
| 4.5.3 | PathFinder.java | shortest-paths client |
| 4.5.4 | PathFinder.java | shortest-paths implementation |
| 4.5.5 | SmallWorld.java | small-world test |
| 4.5.6 | Performer.java | performer–performer graph |
Data structures and algorithms study cheatsheets for coding interviews
What is this?
This section dives deep into practical knowledge and techniques for the algorithms and data structures that appear frequently in coding interviews. The more techniques you have in your arsenal, the higher your chances of passing the interview. They may lead you to discover corner cases you might have missed, or even point you toward the optimal approach!
Contents of each study guide
For each topic, you can expect to find:
- A brief overview
- Learning resources
- Language-specific libraries to use
- Time complexities cheatsheet
- Things to look out for during interviews
- Corner cases
- Useful techniques with recommended questions to practice
Study guides list
Here is the list of data structures and algorithms you should prepare for coding interviews and their corresponding study guides:
General interview tips
Clarify any assumptions you made subconsciously. Many questions are under-specified on purpose.
Always validate input first. Check for invalid/empty/negative/different type input. Never assume you are given the valid parameters. Alternatively, clarify with the interviewer whether you can assume valid input (usually yes), which can save you time from writing code that does input validation.
Are there any time/space complexity requirements/constraints?
Check for off-by-one errors.
In languages without automatic type coercion, check that the values you concatenate are of the same type: int / str / list .
After finishing your code, use a few example inputs to test your solution.
Is the algorithm meant to be run multiple times, for example in a web server? If so, you can likely preprocess the input to improve the efficiency of each call.
Use a mix of functional and imperative programming paradigms:
- Write pure functions as much as possible.
- Pure functions are easier to reason about and can help to reduce bugs in your implementation.
- Avoid mutating the parameters passed into your function especially if they are passed by reference unless you are sure of what you are doing.
- However, functional programming is usually expensive in terms of space complexity because of non-mutation and the repeated allocation of new objects. Imperative code, on the other hand, is faster because you operate on existing objects. Hence you will need to strike a balance between correctness and efficiency by using the right amount of functional and imperative code where appropriate.
- Avoid relying on and mutating global variables. Global variables introduce state.
- If you have to rely on global variables, make sure that you do not mutate them by accident.
Generally, to improve the speed of a program, we can either: (1) choose a more appropriate data structure/algorithm; or (2) use more memory. The latter demonstrates a classic space vs. time tradeoff, but it is not necessarily the case that you can only achieve better speed at the expense of space. Also, note that there is often a theoretical limit to how fast your program can run (in terms of time complexity). For instance, a question that requires you to find the smallest/largest element in an unsorted array cannot run faster than O(N).
Data structures are your weapons. Choosing the right weapon for the right battle is the key to victory. Be very familiar with the strengths of each data structure and the time complexities of its various operations.
Data structures can be augmented to achieve efficient time complexities across different operations. For example, a hash map can be used together with a doubly-linked list to achieve O(1) time complexity for both the get and put operation in an LRU cache .
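As a concrete sketch of that augmentation: Python's `OrderedDict` happens to bundle exactly this hash map + doubly-linked list pairing, so a minimal LRU cache (illustrative only, not production-ready) can be written as:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache with O(1) get and put.

    OrderedDict is backed by a hash map plus a doubly-linked
    list, which is exactly the pairing described above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return -1
        self.store.move_to_end(key)   # mark key as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used key
```

In an interview you may also be asked to build the doubly-linked list yourself; the logic is the same, with the list nodes stored as the hash map's values.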
The hash table is probably the most commonly used data structure for algorithm questions. If you are stuck on a question, your last resort can be to enumerate the common possible data structures (thankfully there aren't that many of them) and consider whether each of them can be applied to the problem. This has sometimes worked for me.
If you are cutting corners in your code, state that out loud to your interviewer and say what you would do in a non-interview setting (no time constraints). E.g., I would write a regex to parse this string rather than using split() which may not cover all cases.
Recommended courses
- AlgoMonster: a data-driven course by Google engineers covering key question patterns and quick revision of basic data structures and algorithms
- Grokking the Coding Interview: Patterns for Coding Questions
- Master the Coding Interview: Data Structures + Algorithms
Data Structures and Algorithms in Everyday Life
Jun 16, 2020
From the origin of the first programming languages to the modern programming languages currently in use, computer programming has evolved quite a lot. It has become more powerful, efficient, and advanced. However, the fundamental concepts and use of data structures and algorithms (DSA) in computer programming have not changed. DSA has been the core of computer programming from the beginning.
You might have heard of DSA being used mainly in the field of computer science. However, the use of DSA is not limited to computing; we can also find DSA concepts in day-to-day life. In this blog, we will discuss common DSA concepts used in everyday life. But before that, let's learn the basics of data structures and algorithms first.
What is Data Structure and Algorithm (DSA)?
Data structures and algorithms (DSA) is a branch of computer science that deals with creating machine-efficient and optimized computer programs. The term data structure refers to the storage and organization of data, and algorithm refers to the step-by-step procedure for solving a problem. By combining data structures and algorithms, we optimize code in software engineering.
DSA in Software Development
Data structures and algorithms are applied in all disciplines of software development. DSA is the building block of the software development process and is not limited to a single programming language. Although programming languages evolve or go dormant over time, DSA is incorporated into all of them.
The efficiency of software development depends on the choice of an appropriate data structure and algorithm.
There might be cases when you are provided with the most efficient data structure to work with a robust algorithm. However, if the two are not compatible with each other, the code will not produce the expected outcome. Thus, selecting an appropriate data structure for an algorithm is an essential part of software development.
Take, for example, the imperial system of measurement used in the US. Why is it so dreadful? The US has been using measuring units like inches, yards, miles, ounces, and pounds. If you need to convert yards into inches, you have to multiply by 36. In the metric system, however, you simply multiply by 1,000 to convert kilometers into meters. It is thus easier for the mind to do conversions in the metric system, which is why most people find the imperial system inconvenient. Another example of this inconvenience is that "ounce" can refer to a solid or a liquid measure depending on the context.
The ease of conversion from one unit to another is the most important factor here. In this example, we can compare the measurement systems (i.e., the metric system and the imperial system) to data structures, while the process of converting from one unit to another can be thought of as the algorithm. This shows that choosing the right data structure has a great impact on the algorithm, and vice versa.
Another critical facet of DSA usage in software development is time and space constraints, which reflect the time and memory available to the algorithm. An optimized algorithm addresses both of these constraints based on the availability of resources. If memory is not an issue for the hardware, DSA focuses more on optimizing the running time of the algorithm. Similarly, if the hardware has both constraints, then DSA must address both of them. You can learn more about how these complexities are expressed in Asymptotic Analysis.
How can you relate DSA to your day to day life?
Let's dive into some of the examples of the usage of DSA.
Stack Data Structure to Reverse a String
A stack is a linear data structure, "linear" meaning the elements are placed one after the other. An element can be accessed only after accessing the previous elements.
We can visualize a stack like a pile of plates placed on top of each other. Each plate below the topmost plate cannot be directly accessed until the plates above are removed. Plates can be added and removed from the top only.
Each plate is an element and the pile is the stack. In programming terms, each plate is a variable and the pile is a data structure.
Why do we need a stack representation?
You might be wondering why a programmer needs to learn how to put a plate on a pile and take it out again. Let's find the answer. Suppose you are assigned the task of reversing a string. How would you do it?
Start by taking the characters from the string one by one and copying each into a new location, placing each on top of the last.
Now, copy these characters from the top, one by one, back into the original location.
Great, we have successfully reversed the string using the property of a stack (the new memory location): inserting and removing was only allowed at the top. This is how stacks are used in programming.
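The plate analogy maps directly to code. Here is a minimal sketch in Python, where a plain list serves as the stack and `"plates"` is just a sample input:

```python
def reverse_string(s):
    stack = []
    for ch in s:          # push each character onto the stack
        stack.append(ch)
    out = []
    while stack:          # pop them back off: last in, first out
        out.append(stack.pop())
    return "".join(out)

print(reverse_string("plates"))  # prints "setalp"
```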
Queue Data Structure while Boarding a Bus
A Queue is also a linear data structure in which the elements are arranged based on FIFO (First In First Out) rule. It is like the passengers standing in a queue to board a bus. The person who first gets into the queue is the one who first gets on the bus. The new passengers can join the queue from the back whereas passengers get on the bus from the front.
Why do we need a Queue representation?
You may ask where a queue is used in a computer. Assume that you are in your office, where there is a network of five computers all connected to a single printer. Suppose an employee wants to print his documents and sends a command to the printer from his computer. The printer receives the command and starts printing. At the same time, another employee sends a print command. The printer puts the second command into the queue, and it is executed only after the execution of the first command. This follows the FIFO rule.
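The printer scenario can be sketched with `collections.deque`, which gives O(1) appends at the back and pops from the front; the document names below are made up for illustration:

```python
from collections import deque

print_queue = deque()              # pending print jobs, oldest at the front

def submit(job):
    print_queue.append(job)        # new jobs join at the back of the queue

def print_next():
    return print_queue.popleft()   # the oldest job is printed first (FIFO)

submit("report.pdf")               # first employee's document
submit("slides.pdf")               # second employee's document
# print_next() now returns "report.pdf" before "slides.pdf"
```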
Graph Data Structure in Social Media and Google Map
A Graph is a network of interconnected items. Each item is known as a node and the connection between them is known as the edge.
You probably use social media like Facebook, LinkedIn, Instagram, and so on. Social media is a great example of a graph in use: it stores information about each user, and every user is a node. If one user, let's call him Jack, becomes friends with another user, Rose, then there exists an edge (connection) between Jack and Rose. Likewise, the more we connect with people, the more the nodes and edges of the graph keep increasing.
Similarly, Google Maps is another example where graphs are used. In Google Maps, every location is considered a node, and the roads between locations are considered edges. When you have to move from one location to another, Google Maps uses various graph-based algorithms to find the shortest path. We will discuss this later in this blog.
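A social graph like this is typically stored as an adjacency list: a map from each user to the set of their friends. A minimal sketch using the Jack and Rose example (the third user is hypothetical):

```python
# Adjacency-list representation of the friendship graph.
friends = {}

def add_friendship(a, b):
    # Friendship is mutual, so add the edge in both directions.
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)

add_friendship("Jack", "Rose")   # one edge between two nodes
add_friendship("Rose", "Ruth")   # the graph grows as users connect
```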
Sorting Algorithm to Arrange Books in the Shelf
In simple terms, sorting is a process of arranging similar items systematically. For example, suppose you are arranging books on a shelf, based on the height of the books. In this case we can keep the taller books on the left followed by the shorter books or we can do vice versa.
This same concept is implemented in Sorting algorithms. Different sorting algorithms are available in DSA. Although the purpose of every algorithm remains the same, each algorithm works differently based on various criteria.
In the above example, if we want to sort the books as fast as we can, then there are a few points to be considered.
- Can the books be easily shuffled on the shelf? If the books are heavy, it may take us more time. Similarly, there may be other constraints. ( accessibility )
- What is the number of books? ( data size )
- How fast can we access them? ( hardware's ability )
Algorithms are built considering all these constraints to produce an optimal solution. Some of the examples of these algorithms are Bubble Sort , Selection Sort , Merge Sort , Heap Sort , and Quick Sort .
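As one concrete instance, here is Bubble Sort applied to a made-up list of book heights; each pass compares adjacent books and swaps them when they are out of order, so the tallest unsorted book "bubbles" to the end:

```python
def bubble_sort(heights):
    books = list(heights)                # work on a copy of the shelf
    n = len(books)
    for i in range(n - 1):               # after pass i, the last i+1 books are in place
        for j in range(n - 1 - i):
            if books[j] > books[j + 1]:  # adjacent books out of order?
                books[j], books[j + 1] = books[j + 1], books[j]
    return books

print(bubble_sort([24, 18, 31, 21, 27]))  # prints [18, 21, 24, 27, 31]
```

Bubble Sort takes O(N²) comparisons, which is why the faster algorithms listed above (Merge Sort, Heap Sort, Quick Sort) are preferred for large shelves.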
Searching Algorithm to Find a Book in a Shelf
Searching, as its name suggests, helps in finding an item.
Suppose you want to search for a specific book on a shelf. The books on the shelf are not arranged in any specific way. If you need to find the book in the shortest possible time, how would you do that? The solution is provided by DSA.
You may be thinking "I will look for the book from the beginning and locate it". In this case, you will be searching for books one by one from the start to the end of the shelf. This same concept is implemented in Linear Search .
But, what if the book is at the other end of the shelf? The above process might take a long time and will not provide a feasible solution.
Now, let's try another procedure. First, sort the books in ascending alphabetical order, then look at the book in the middle. Say we are searching for a book that starts with J.
We always look at the middle position first, and the middle position between A and Z is M, not J.
Now compare J with M. We know that J comes before M, so let's search for J in the middle position between A and M. G is the middle element; again, J is not found.
Since J lies between G and M, let's find the middle element between them. Yeah, we have found J. Congratulations!
And, you have just implemented Binary Search .
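The halving procedure above is binary search. A sketch in Python over an alphabetically sorted shelf (the exact midpoints depend on how many books are on the shelf, so they differ slightly from the walkthrough):

```python
import string

def binary_search(shelf, target):
    lo, hi = 0, len(shelf) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # always inspect the middle book
        if shelf[mid] == target:
            return mid              # found it
        if shelf[mid] < target:
            lo = mid + 1            # target must be in the right half
        else:
            hi = mid - 1            # target must be in the left half
    return -1                       # not on the shelf

shelf = list(string.ascii_uppercase)   # books labelled A through Z
print(binary_search(shelf, "J"))       # prints 9 (J is the 10th book)
```

Each comparison halves the remaining search range, so the search takes O(log N) steps instead of the O(N) steps of linear search.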
Shortest Path Finding Algorithms to Find the Shortest Path in Google Map
Have you ever thought about how Google Maps is able to show you the shortest path to your destination? Applications such as Google Maps are able to do that using a class of algorithms called Shortest Path Finding Algorithms.
These algorithms deal with finding the shortest path in a graph. As in the example discussed in the Graph data structure above, we can use graph algorithms to find the shortest path between two given locations on a map.
To illustrate the problem, let's find the shortest distance between A and F in the following map.
What are the possible solutions to this problem? Let's figure out the possible routes along with their path length.
We can see that the shortest path is Path-3. But we have wasted time calculating the other paths as well, which we are not going to use. To avoid this waste, we can instead start from A, keep track of the shortest known distance to each node, and repeatedly extend the path through the nearest node found so far. From A, the neighboring paths are AB and AC, and AC is the shorter one.
Now we are at C. Again, we select the shorter of its neighboring paths CE and CD, which is CD.
From D, we have a single path to F. (From D we could also go back to B, but B has already been visited, so it is not considered.) Selecting the path DF, we reach the destination.
Congratulations one more time! You have implemented Dijkstra's Algorithm. In this way, graphs find their use in our lives.
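The idea can be sketched with a priority queue (Python's `heapq`); the node names mirror the map above, but the edge distances here are made-up illustrative values:

```python
import heapq

# Hypothetical road map: each node maps to its neighbors with made-up distances.
road_map = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"D": 1, "E": 7},
    "D": {"B": 3, "F": 6},
    "E": {"F": 2},
    "F": {},
}

def dijkstra(graph, source):
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]                 # priority queue of (distance, node)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                     # stale entry; node already settled
        for neighbor, weight in graph[node].items():
            if d + weight < dist[neighbor]:
                dist[neighbor] = d + weight   # found a shorter route
                heapq.heappush(heap, (dist[neighbor], neighbor))
    return dist

print(dijkstra(road_map, "A")["F"])      # prints 9, via A -> C -> D -> F
```

Note that Dijkstra's Algorithm always settles the globally nearest unvisited node, which is exactly what the priority queue guarantees; this is what makes the result provably shortest rather than just a greedy guess.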
When I was in the final year of my undergraduate studies and applying for software engineering positions, there was one thing common between the hiring procedure of all companies. They all tested me on problems that involved the use of data structures and algorithms. DSA has great importance in the recruitment process of software companies as well. Recruiters use DSA to test the ability of the programmer because it shows the problem-solving capability of the candidate.
As you can see from the above examples, we can relate DSA to our day-to-day life and make it more fun to study. Those from non-technical backgrounds can also learn the techniques used in these algorithms to solve their daily problems.
Furthermore, one cannot neglect the importance of DSA in any programming language. DSA never goes extinct; rather, it keeps evolving, because the evolving computers of the 21st century need evolving algorithms to solve complex problems.
Not to mention, a programmer should know how to use an appropriate data structure in the right algorithm. There is a famous saying:
A warrior should not just possess a weapon, he must know when and how to use it.
Best wishes to all the new programmers out there. Start your learning from today.
Rhitabrat is a computer programmer with a passion for data science and ML. He loves exploring data in Python. Apart from this, he enjoys watching and playing football.
Dijkstra's Algorithm: A Case Study to Understand How Algorithms Are Improved
This work is licensed under a Creative Commons Attribution 3.0 License.
Computer networks have progressed from a simple store-and-forward medium to a complex communication infrastructure. In the network, routers need to implement a variety of functions, ranging from simple packet classification for forwarding and firewalling to complex payload modifications for encryption and content adaptation. This report describes how Dijkstra's Algorithm is used in network routing protocols to determine the best path among many alternatives. Although there are other ways to find the best path, Dijkstra's Algorithm offers a good and straightforward method for this task. With Dijkstra's Algorithm, expressed in pseudocode, you can model and solve a range of data routing problems.
Abstract— A network is a combination of two or more nodes connected with each other; it allows nodes to exchange data along the connections. Routing is the process of finding a path between source and destination upon a request for data transmission. Various routing algorithms help determine the path and distance for network traffic, and many routing protocols can be used for routing between nodes. Dijkstra's algorithm is one of the best shortest-path search algorithms. Our focus and aim is to find the shortest path from a source node to a destination node. To find the minimum path, the algorithm uses a connection matrix and a weight matrix, forming a matrix of paths from the source node to each node. We then choose the destination's column from the path matrix to obtain the shortest path, and similarly choose a column from the minimum-distance matrix to obtain the minimum distance from source to destination. The algorithm has been applied in computer networking for routing between systems, and in Google Maps to find the shortest possible path from one location to another.
Dijkstra's algorithm solves the single-source shortest path problem: it computes the length of the shortest path from the source to each of the remaining vertices in the graph.
Data Structures Tutorial
Array Data Structure
- String Data Structure
- Linked List Data Structure
- Stack Data Structure
- Queue Data Structure
- Tree Data Structure
- Heap Data Structure
- Hashing Data Structure
- Graph Data Structure And Algorithms
- Matrix Data Structure
- Introduction to Set – Data Structure and Algorithm Tutorials
- Introduction to Map – Data Structure and Algorithm Tutorials
- Advanced Data Structures
- Algorithms Tutorial
- Searching Algorithms
- Sorting Algorithms
- Recursion Algorithms
- Backtracking Algorithms
- Greedy Algorithms
- Dynamic Programming
- Pattern Searching
- Divide and Conquer
- Mathematical Algorithms
- Geometric Algorithms
- Bitwise Algorithms
- Randomized Algorithms
- Branch and Bound Algorithm
- Competitive Programming - A Complete Guide
What is Data Structure:
A data structure is a storage that is used to store and organize data. It is a way of arranging data on a computer so that it can be accessed and updated efficiently.
A data structure is not only used for organizing data. It is also used for processing, retrieving, and storing data. Different basic and advanced types of data structures are used in almost every program or software system that has been developed, so we must have good knowledge of them.
Classification of Data Structure:
- Linear data structure: Data structures in which elements are arranged sequentially, one after the other, are called linear data structures. Examples are arrays, stacks, queues, and linked lists.
- Static data structure: A static data structure has a fixed memory size. It is easier to access the elements in a static data structure. An example of this data structure is an array.
- Dynamic data structure: In a dynamic data structure, the size is not fixed. It can be updated during runtime, which may be considered efficient with respect to the memory (space) complexity of the code. Examples of this data structure are queues, stacks, etc.
- Non-linear data structure: Data structures whose elements are not placed sequentially or linearly are called non-linear data structures. In a non-linear data structure, we can't traverse all the elements in a single run. Examples of non-linear data structures are trees and graphs.
For example, we can store a list of items having the same data-type using the array data structure.
This page contains detailed tutorials on different data structures (DS) with topic-wise problems.
Introduction to Data Structures:
- What is Data Structure: Types, Classifications and Applications
- Introduction to Data Structures
- Common operations on various Data Structures
Popular types of Data Structures:
- Linked List
- Binary Tree
- Binary Search Tree
- Advanced Data Structure
- Introduction to Linear Data Structures
- Introduction to Hierarchical Data Structure
- Overview of Data Structures | Set 3 (Graph, Trie, Segment Tree and Suffix Tree)
- Abstract Data Types
Singly Linked List:
- Introduction to Linked List
- Linked List vs Array
- Linked List Insertion
- Linked List Deletion (Deleting a given key)
- Linked List Deletion (Deleting a key at given position)
- A Programmer’s approach of looking at Array vs. Linked List
- Find Length of a Linked List (Iterative and Recursive)
- How to write C functions that modify head pointer of a Linked List?
- Swap nodes in a linked list without swapping data
- Reverse a linked list
- Merge two sorted linked lists
- Merge Sort for Linked Lists
- Reverse a Linked List in groups of given size
- Detect and Remove Loop in a Linked List
- Add two numbers represented by linked lists | Set 1
- Rotate a Linked List
- Generic Linked List in C
Circular Linked List:
- Circular Linked List Introduction and Applications,
- Circular Singly Linked List Insertion
- Circular Linked List Traversal
- Split a Circular Linked List into two halves
- Sorted insert for circular linked list
Doubly Linked List:
- Doubly Linked List Introduction and Insertion
- Delete a node in a Doubly Linked List
- Reverse a Doubly Linked List
- The Great Tree-List Recursion Problem.
- QuickSort on Doubly Linked List
- Merge Sort for Doubly Linked List
- Introduction to Stack
- Infix to Postfix Conversion using Stack
- Evaluation of Postfix Expression
- Reverse a String using Stack
- Implement two stacks in an array
- Check for balanced parentheses in an expression
- Next Greater Element
- Reverse a stack using recursion
- Sort a stack using recursion
- The Stock Span Problem
- Design and Implement Special Stack Data Structure
- Implement Stack using Queues
- Design a stack with operations on middle element
- How to efficiently implement k stacks in a single array?
- Queue Introduction and Array Implementation
- Linked List Implementation of Queue
- Applications of Queue Data Structure
- Priority Queue Introduction
- Deque (Introduction and Applications)
- Implementation of Deque using circular array
- Implement Queue using Stacks
- Find the first circular tour that visits all petrol pumps
- Maximum of all subarrays of size k
- An Interesting Method to Generate Binary Numbers from 1 to n
- How to efficiently implement k Queues in a single array?
- Binary Tree Introduction
- Binary Tree Properties
- Types of Binary Tree
- Handshaking Lemma and Interesting Tree Properties
- Enumeration of Binary Tree
- Applications of tree data structure
- Tree Traversals
- BFS vs DFS for Binary Tree
- Level Order Tree Traversal
- Diameter of a Binary Tree
- Inorder Tree Traversal without Recursion
- Inorder Tree Traversal without recursion and without stack!
- Threaded Binary Tree
- Maximum Depth or Height of a Tree
- If you are given two traversal sequences, can you construct the binary tree?
- Clone a Binary Tree with Random Pointers
- Construct Tree from given Inorder and Preorder traversals
- Maximum width of a binary tree
- Print nodes at k distance from root
- Print Ancestors of a given node in Binary Tree
- Check if a binary tree is subtree of another binary tree
- Connect nodes at same level
Binary Search Tree:
- Search and Insert in BST
- Deletion from BST
- Minimum value in a Binary Search Tree
- Inorder predecessor and successor for a given key in BST
- Check if a binary tree is BST or not
- Lowest Common Ancestor in a Binary Search Tree.
- Inorder Successor in Binary Search Tree
- Find k-th smallest element in BST (Order Statistics in BST)
- Merge two BSTs with limited extra space
- Two nodes of a BST are swapped, correct the BST
- Floor and Ceil from a BST
- In-place conversion of Sorted DLL to Balanced BST
- Find a pair with given sum in a Balanced BST
- Total number of possible Binary Search Trees with n keys
- Merge Two Balanced Binary Search Trees
- Binary Tree to Binary Search Tree Conversion
- Binary Heap
- Why is Binary Heap Preferred over BST for Priority Queue?
- K’th Largest Element in an array
- Sort an almost sorted array
- Binomial Heap
- Fibonacci Heap
- Tournament Tree (Winner Tree) and Binary Heap
- Hashing Introduction
- Separate Chaining for Collision Handling
- Open Addressing for Collision Handling
- Print a Binary Tree in Vertical Order
- Find whether an array is subset of another array
- Union and Intersection of two Linked Lists
- Find a pair with given sum
- Check if a given array contains duplicate elements within k distance from each other
- Find Itinerary from a given list of tickets
- Find number of Employees Under every Employee
All Articles on Hashing | Coding Practice on Hashing | Recent Articles on Hashing
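"Find a pair with given sum" above shows why hashing earns its place: a single pass with a hash set gives expected O(n) time, versus O(n log n) for sorting first. A minimal sketch:

```cpp
#include <unordered_set>
#include <vector>

// One-pass pair-sum check: for each element, ask whether its
// complement (target - x) has already been seen.
bool hasPairWithSum(const std::vector<int>& a, int target) {
    std::unordered_set<int> seen;
    for (int x : a) {
        if (seen.count(target - x)) return true;  // complement seen earlier
        seen.insert(x);
    }
    return false;
}
```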
Introduction, DFS and BFS:
- Graph and its representations
- Breadth First Traversal for a Graph
- Depth First Traversal for a Graph
- Applications of Depth First Search
- Applications of Breadth First Traversal
- Detect Cycle in a Directed Graph
- Detect Cycle in Graph using DSU
- Detect cycle in an Undirected Graph using DFS
- Longest Path in a Directed Acyclic Graph
- Topological Sorting
- Check whether a given graph is Bipartite or not
- Snake and Ladder Problem
- Minimize Cash Flow among a given set of friends who have borrowed money from each other
- Boggle (Find all possible words in a board of characters)
- Assign directions to edges so that the directed graph remains acyclic
All Articles on Graph Data Structure | Coding Practice on Graph | Recent Articles on Graph
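Breadth-first traversal, the second entry above, underpins many of the other problems in the list (shortest unweighted paths, bipartiteness checks, the snake-and-ladder puzzle). A sketch over an adjacency-list representation:

```cpp
#include <queue>
#include <vector>

// BFS from `src` over an adjacency list; returns vertices in the
// order they are first reached. Marking a vertex visited when it is
// ENQUEUED (not dequeued) prevents duplicates in the queue.
std::vector<int> bfs(const std::vector<std::vector<int>>& adj, int src) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    q.push(src);
    visited[src] = true;
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        order.push_back(u);
        for (int v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return order;
}
```

Swapping the queue for an explicit stack (or recursion) turns this into depth-first traversal.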
Advanced Data Structure:
- Memory efficient doubly linked list
- XOR Linked List – A Memory Efficient Doubly Linked List | Set 1
- XOR Linked List – A Memory Efficient Doubly Linked List | Set 2
- Skip List | Set 1 (Introduction)
- Self Organizing List | Set 1 (Introduction)
- Unrolled Linked List | Set 1 (Introduction)
- Segment Tree | Set 1 (Sum of given range)
- Segment Tree | Set 2 (Range Minimum Query)
- Lazy Propagation in Segment Tree
- Persistent Segment Tree | Set 1 (Introduction)
All Articles on Segment Tree
Trie:
- Trie | (Insert and Search)
- Trie | (Delete)
- Longest prefix matching – A Trie based solution in Java
- Print unique rows in a given boolean matrix
- How to Implement Reverse DNS Look Up Cache?
- How to Implement Forward DNS Look Up Cache?
All Articles on Trie
Binary Indexed Tree:
- Binary Indexed Tree
- Two Dimensional Binary Indexed Tree or Fenwick Tree
- Binary Indexed Tree : Range Updates and Point Queries
- Binary Indexed Tree : Range Update and Range Queries
All Articles on Binary Indexed Tree
Suffix Array and Suffix Tree:
- Suffix Array Introduction
- Suffix Array nLogn Algorithm
- Kasai’s Algorithm for Construction of LCP array from Suffix Array
- Suffix Tree Introduction
- Ukkonen’s Suffix Tree Construction – Part 1
- Ukkonen’s Suffix Tree Construction – Part 2
- Ukkonen’s Suffix Tree Construction – Part 3
- Ukkonen’s Suffix Tree Construction – Part 4
- Ukkonen’s Suffix Tree Construction – Part 5
- Ukkonen’s Suffix Tree Construction – Part 6
- Generalized Suffix Tree
- Build Linear Time Suffix Array using Suffix Tree
- Substring Check
- Searching All Patterns
- Longest Repeated Substring
- Longest Common Substring
- Longest Palindromic Substring
All Articles on Suffix Tree
AVL Tree:
- AVL Tree | Set 1 (Insertion)
- AVL Tree | Set 2 (Deletion)
- AVL with duplicate keys
- Splay Tree | Set 1 (Search)
- Splay Tree | Set 2 (Insert)
- B-Tree | Set 1 (Introduction)
- B-Tree | Set 2 (Insert)
- B-Tree | Set 3 (Delete)
- Red-Black Tree Introduction
- Red-Black Tree Insertion
- Red-Black Tree Deletion
- Program for Red Black Tree Insertion
All Articles on Self-Balancing BSTs
K Dimensional Tree:
- KD Tree (Search and Insert)
- K D Tree (Find Minimum)
- K D Tree (Delete)
- Treap (A Randomized Binary Search Tree)
- Ternary Search Tree
- Interval Tree
- Implement LRU Cache
- Sort numbers stored on different machines
- Find the k most frequent words from a file
- Given a sequence of words, print all anagrams together
- Decision Trees – Fake (Counterfeit) Coin Puzzle (12 Coin Puzzle)
- Spaghetti Stack
- Data Structure for Dictionary and Spell Checker?
- Cartesian Tree
- Cartesian Tree Sorting
- Centroid Decomposition of Tree
- Gomory-Hu Tree
Recent Articles on Advanced Data Structures.
- Search, insert and delete in an unsorted array
- Search, insert and delete in a sorted array
- Write a program to reverse an array
- Leaders in an array
- Given an array A and a number x, check for pair in A with sum as x
- Majority Element
- Find the Number Occurring Odd Number of Times
- Largest Sum Contiguous Subarray
- Find the Missing Number
- Search an element in a sorted and pivoted array
- Merge an array of size n into another array of size m+n
- Median of two sorted arrays
- Program for array rotation
- Reversal algorithm for array rotation
- Block swap algorithm for array rotation
- Maximum sum such that no two elements are adjacent
- Sort elements by frequency | Set 1
- Count Inversions in an array
All Articles on Array | Coding Practice on Array | Quiz on Array | Recent Articles on Array
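"Largest Sum Contiguous Subarray" above is usually solved with Kadane's algorithm: extend the current run while it helps, otherwise restart at the current element. A sketch:

```cpp
#include <algorithm>
#include <vector>

// Kadane's algorithm, O(n): `cur` is the best sum of a subarray
// ending at index i; `best` is the best sum seen anywhere.
// Assumes `a` is non-empty; handles all-negative arrays correctly.
int maxSubarraySum(const std::vector<int>& a) {
    int best = a.front();
    int cur = a.front();
    for (int i = 1; i < (int)a.size(); ++i) {
        cur = std::max(a[i], cur + a[i]);  // extend or restart
        best = std::max(best, cur);
    }
    return best;
}
```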
- Search in a row wise and column wise sorted matrix
- Print a given matrix in spiral form
- A Boolean Matrix Question
- Maximum size square sub-matrix with all 1s
- Inplace M x N size matrix transpose | Updated
- Dynamic Programming | Set 27 (Maximum sum rectangle in a 2D matrix)
- Strassen’s Matrix Multiplication
- Create a matrix with alternating rectangles of O and X
- Print all elements in sorted order from row and column wise sorted matrix
- Given an n x n square matrix, find sum of all sub-squares of size k x k
- Count number of islands where every island is row-wise and column-wise separated
- Find a common element in all rows of a given row-wise sorted matrix
All Articles on Matrix | Coding Practice on Matrix | Recent Articles on Matrix
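The first matrix entry above, searching a row-wise and column-wise sorted matrix, admits a "staircase" search: start at the top-right corner and discard one row or one column per step, giving O(rows + cols). A sketch:

```cpp
#include <vector>

// Staircase search: at the top-right corner, a value larger than the
// key rules out its whole column; a smaller value rules out its row.
bool searchSortedMatrix(const std::vector<std::vector<int>>& m, int key) {
    if (m.empty() || m[0].empty()) return false;
    int r = 0;
    int c = (int)m[0].size() - 1;
    while (r < (int)m.size() && c >= 0) {
        if (m[r][c] == key) return true;
        if (m[r][c] > key)
            --c;  // everything below in this column is even larger
        else
            ++r;  // everything left in this row is even smaller
    }
    return false;
}
```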
- Commonly Asked Data Structure Interview Questions | Set 1
- A data structure for n elements and O(1) operations
- Expression Tree
Automated assessment system for programming courses: a case study for teaching data structures and algorithms
- Development Article
- Open access
- Published: 15 August 2023
- Volume 71, pages 2365–2388 (2023)
- Andre L. C. Barczak (ORCID: orcid.org/0000-0001-7648-285X)
- Anuradha Mathrani (ORCID: orcid.org/0000-0002-9124-2536)
- Binglan Han (ORCID: orcid.org/0000-0001-5376-8986)
- Napoleon H. Reyes (ORCID: orcid.org/0000-0002-0683-436X)
An important course in the computer science discipline is ‘Data Structures and Algorithms’ (DSA). The coursework lays emphasis on experiential learning for building students’ programming and algorithmic reasoning abilities. Teachers set up a repertoire of formative programming exercises that engage students with different programmatic scenarios to build their know-what, know-how and know-why competencies. Automated assessment tools can assist teachers in inspecting, marking and grading programming exercises, and can also support them in providing students with formative feedback in real time. This article describes the design of a bespoke automarker that was integrated into the DSA coursework and thereby served as an instructional tool. Activity theory has provided the pedagogical lens to examine how the automarker-mediated instructional strategy enabled self-reflection and assisted students in their formative learning journey. Learner experiences gathered from 39 students enrolled in the DSA course show that the automarker facilitated practice-based learning to advance students’ know-what, know-why and know-how skills. This study contributes to both curricula and pedagogic practice by showcasing the integration of an automated assessment strategy with programming-related coursework to inform future teaching and assessment practice.
As the computer science (CS) discipline continues to advance, educators apply innovative pedagogies with a greater focus on experiential learning to prepare students for computing careers. The key tenet underlying CS courses is that of writing effective programming code, since this helps demonstrate learner aptitudes for higher-order computational thinking, algorithmic reasoning, problem decomposition, iteration, recursion and, overall, a predisposition towards dealing with code complexities (Lemay et al., 2021). The computing curricula task force (2021) emphasizes that graduates from this discipline must be able to apply know-what, know-how and know-why skills that are indicative of professional practice. Know-what relates to “factual understanding” of the subject matter or “topics in the syllabi”, which must be acted upon with a degree of proficiency to activate the know-how skills. Know-how demands time and practice that requires “engagement in a progressive hierarchy of higher-order cognitive process”. Finally, know-why denotes the values and motivations that “moderate the behavior of applying ‘know-what’ that becomes ‘know-how’” (p. 48). Therefore, by integrating meaningful lab activities into coursework, students can move from following simple step-by-step instructions (“know-what” goals) to practical understanding, application and synthesis of learned knowledge (“know-how” goals), and on to much higher-order thinking based on newly acquired conceptual knowledge (“know-why” goals) (Zvacek, 2015). However, it is generally acknowledged that teaching practical courses such as programming is particularly challenging (Daradoumis et al., 2019; Watson & Li, 2014), and there is little understanding of how novice programmers develop coding proficiencies (Luxton-Reilly et al., 2018). Writing effective code is an ever-evolving craft that is practice-driven.
Programmers must have a good understanding of code syntax, program structures and logical expressions, which together inform the practice of writing effective code (Parsons et al., 2016). Instructors therefore reinforce student learning by integrating a variety of programmatic scenarios, each embodying some key coding concept in the CS discipline. Programming exercises of varying difficulty form a repertoire of formative learning activities to engage students and develop their programming capabilities. However, instructors are often constrained in providing individual feedback, whether due to lack of time, large student enrolments, or simply the course delivery format at particular institutions (Medeiros et al., 2019). Moreover, manually evaluating students’ code submissions is repetitive and time-intensive, which adds to the teaching workload. Furthermore, two teachers marking the same assessment rarely apply exactly the same criteria, so a student’s mark may vary depending on the teacher who assessed their code (Insa & Silva, 2018); manual marking is therefore subjective, with inconsistencies that are not always fair to students.
Automated assessment (AA) tools offer a dual benefit. First, they ease the mundane nature of marking and grading assignments for teachers; second, they can provide prompt formative feedback to students (Skalka & Drlik, 2020; Souza et al., 2016). AA tools rely on a coherent rubric, thereby removing subjectivity concerns, particularly those associated with variance in teachers’ marking styles or other inconsistencies that may crop up with manual marking. In meeting the realities of ongoing formative assessment, automation can therefore bring more reliability to grading. With this in view, a huge demand exists among software houses and training programs for building AA capabilities to enhance e-assessment strategies (Ullah et al., 2018). Many web-based tools that incorporate static and dynamic analyses of code snippets have emerged (Amelung et al., 2011; Staubitz et al., 2015; Ullah et al., 2018); however, these are often limited by strict file-formatting requirements (e.g., character-by-character equivalence across text files), broad error categories (e.g., “wrong answer”, “presentation error”, “compile-time error”) or incompatible compiler versions, which can confuse students (Rubio-Sánchez et al., 2012).
This study expands the current state of AA research specific to the ‘Data Structures and Algorithms’ (DSA) course taught at a tertiary institution. Drawing upon the limitations of currently available AA tools, we propose the design and development of a bespoke AA tool (henceforth referred to as the automarker). Activity theory (AT) has provided the pedagogical lens for aligning the automarker with the DSA coursework (Engeström, 1999). The foundational concept in AT is how each ‘activity’ fits in a specified learning environment. Seven prescribed coursework activities that inform the DSA coursework underpin students’ know-what, know-how and know-why skills. Finally, a survey of students has revealed their perceptions of the automarker as a pedagogical tool in the DSA coursework. This study therefore contributes to the advancement of the field of e-assessment with specific examples of instructional activities for a given coursework in the CS discipline, and brings awareness of how students can build up their knowledge competencies.
The high student–teacher ratio in CS courses puts extra demands on instructors, who have to mark student programming assignments and provide students with effective feedback, all of which is very time-consuming. AA tools can reduce teachers’ workload drastically (Gordillo, 2019), since they automate these tasks by integrating marking/grading functionality via test cases run against executable code submissions. Comparisons are made between the test-case results of a model solution provided by the teacher and those of the student code to see whether both are identical, although some permutations may be allowed for exercises such as those involving a list-valued output (Amelung et al., 2011). Commonly used plugins that incorporate AA approaches include Algo+ (Belhaoues et al., 2016), EFPL (Bey et al., 2018), WebCAT (Manzoor et al., 2020) and CodeRunner (Soll et al., 2021). However, AA tools need careful consideration; teachers must pay specific attention to the pedagogical design of programming activities, since programming is driven by practice and is “not an exact science” (Bey et al., 2018, p. 260). Incorrect application of AA tools can negatively impact student engagement and performance: for instance, students may lean on such tools through numerous trial-and-error submissions until a correct response is received (Amelung et al., 2011; Rubio-Sánchez et al., 2012), rather than leverage them for critical thinking or building problem-solving capabilities. Hence, before implementing any AA tool, teachers must ensure that it serves the educational goals of building professional practice and encourages learner reflection.
The computing curricula (2021) is broadly scoped to cover vital areas such as computer science, information technology, information systems and software engineering; however, the overarching framework of this curriculum specifically mentions that ‘the ability to develop advanced algorithms and data structures [are] developed in computer science’ (p. 29). The CS curriculum prescribes DSA as a fundamental software course that lays emphasis on the discovery of programmatic approaches which can then be applied to datasets arranged in some specific order (e.g., lists, stacks, queues, trees, maps or graphs). DSA coursework exposes students to algorithms that make efficient use of computer resources (e.g., time and memory) to solve data problems (e.g., search, sort, group), wherein data structures hold the operational data. Understanding the dynamics of performing operations on data structures via algorithmic methods can be confusing for students (Su et al., 2021); as such, the DSA course offers a practice-based learning environment that uses different programmatic scenarios (Restrepo-Calle et al., 2019). Moreover, AA tools can assist teachers in evaluating the algorithmic execution steps across the various lesson topics that form part of the DSA coursework (Belhaoues et al., 2016; Gárcia-Mateos & Fernández-Alemán, 2009; Soll et al., 2021).
Gárcia-Mateos and Fernández-Alemán (2009) used the AA tool Mooshak as an online judging strategy in which all student marks were made public, so that each student could see their own performance relative to their classmates. Students who passed all the problem exercises did not have to sit the final exam. The authors consider that public ranking promotes competitiveness and motivates students to perform better, although it can also be speculated that such ranking could discourage students who are not performing as well as their classmates. A drawback of Mooshak was that incorrect submissions were broadly classified as “wrong answer”; hence, the tool could not be leveraged by students in fully developing their know-what, know-how or know-why skills. Rubio-Sánchez et al. (2012) too found poor acceptance of Mooshak in their study. The authors point out that Mooshak’s broad feedback responses could arise for many reasons: if Mooshak’s compiler version differs from the student’s, the result may be a “wrong answer” even though the program works on the student’s computer, or the character-by-character equivalence check on text output files may flag errors when real numbers are rounded off or truncated. These unexplained errors trigger confusion and frustration among students. Moreover, none of these studies gave information on the alignment of instructional activities prescribed in the DSA coursework or on the underlying pedagogy used for assessing coding practices.
Belhaoues et al. (2016) caution that semantic structure and a pedagogical approach are needed when using algorithmic exercises in DSA coursework. The explanation, representation and formalization of an algorithmic exercise must be expressed clearly for learners to act on a given problem. Moreover, an underlying pedagogical objective is crucial when AA tools (e.g., Algo+) are used to deliver programming exercises to students. That is, the knowledge base (or base of algorithmic exercises) must have a proper ontological grouping that clearly puts forth the theoretical notions of the domain, coupled with a rational approach to support students in responding to exercises that are then graded by AA tools. The description of each exercise put forth via the AA tool must have a justified pedagogical foundation that formalizes a set of skills and notions pertinent to that algorithmic field of study. The authors note that while plenty of algorithmic problems may be available in public databases, they often lack pedagogical, semantic and epistemological organization. Each exercise activity should have a student-centred focus for building up students’ knowledge skills (i.e., know-what, know-how and know-why).
Soll et al. ( 2021 ) describe the use of CodeRunner, a readily available AA tool, that can be embedded in the Moodle learning management system (LMS). They created relevant programming exercises (based on DSA coursework) and allowed students to submit their code to CodeRunner from their LMS login accounts. While students could apply their knowledge gained in the lectures to the programming tasks, many problems were encountered with CodeRunner. These included lack of transparency of the user interface, absence of debugging tools, long execution time whenever infinite loops were encountered and other malfunctions during program execution (e.g., errors when dealing with corner cases, missing test cases for unanticipated errors, etc.).
Purpose of this study
Published literature has highlighted the significance of AA tools specifically in the context of DSA coursework. Prior studies lay emphasis on having a pedagogical approach when using e-assessment tools (e.g., Mooshak (Gárcia-Mateos & Fernández-Alemán, 2009 ), CodeRunner (Soll et al., 2021 )) for integrating programming exercises into the coursework. Further questions are raised regarding the instructive value of available off-the-shelf AA tools (Gárcia-Mateos & Fernández-Alemán, 2009 ), as these are seen to be lacking in their ability to give proper formative feedback to students, besides having other technical difficulties (e.g., strict file formats, infinite loop traversals, different compiler versions, etc.). Taking these shortcomings into consideration, we propose a bespoke automarker specifically designed as an instructional tool for formative learning activities. Activity Theory has provided the underlying pedagogical lens (Basharina, 2007 ) for analysing learner interactions with the prescribed coursework activities. The fundamental concept of AT is the ‘activity’, which in this case study, relates to seven instructional activities pertaining to different topics/sub-topics of the DSA coursework.
The study demonstrates the appropriation of the automarker as an instructional strategy for experiential learning and for building learners’ know-what, know-how and know-why competencies. We describe how our automarker e-assessed students’ programming assignments and overcame the technical difficulties identified in previous research studies. Further, we explore whether students could meaningfully engage with the enumerated coding tasks to complete their assignments (know-what) while working in the given automarker-mediated settings; how the automarker-mediated feedback assisted students in enhancing their logical and reasoning capabilities and helped them successfully accomplish the given tasks (know-how); and, finally, whether these ongoing interactions enabled self-reflection, wherein students corroborated the automarker’s feedback and analysed different coding perspectives to further advance their algorithmic reasoning skills (know-why).
Activity theory
Activity theory has enabled reflection on pedagogical practices by zooming into different learning activities which are goal-oriented and tool-mediated within some higher educational context (Mathrani et al., 2020; Murphy & Rodriguez-Manzanares, 2008). Daniels (2004) views that AT has opened up our pedagogic imagination, as it can facilitate empirical research on the pedagogical use of digital technologies with specific analysis of how different activities enable transmission of the prescribed knowledge and skills. For instance, Park and Jo (2017) used AT to evaluate students’ usage patterns from log data extracted from an institutional learning management system, while Adam et al. (2019) examined student perspectives on how online tools are implemented in virtual learning spaces in a developing-country context. Activity theorists can therefore investigate complex human–computer interactions by breaking them down into smaller categorical elements to understand the purposeful acts of individuals as they work within some constraints with appropriated tools (Basharina, 2007).
AT extends the three elements of the Vygotskian triangle (1934–1987) into six elements that together realize the final outcomes. The three elements of the Vygotskian triangle comprise the subject (instructor of the DSA course), tool (automarker) and object(ives) (building algorithmic reasoning and programming skills among learners). The expansion of the triangle with three more elements, namely the (learning) community (comprising instructor and students), rules and regulations (prescribed in the DSA coursework) and division of labour (spanning teacher and learner roles), provides the case background. Further, the core of AT is the proposed learning activity, which emphasizes how ongoing interactions take place between the six elements. The (learning) community members have clearly defined responsibilities specific to their role as a teacher or a learner, and each must comply with the set rules and regulations to meet desired goals. Pedagogical use of digital technologies (e.g., automarker, test cases, LMS) further assists in achieving learning goals, which are then transferred into final outcomes, such as facilitating programming practice among learners and establishing effective workaround strategies for the instructor. Figure 1(A) and (B) demonstrate the underlying AT elements and how these are mapped to this study’s context.
Activity theory contextualized to this study’s context
Computing courses are interleaved within different types of technology-supported environments where learners immerse themselves in ongoing learning experiences. The teacher (subject) prepares practice learning scenarios to improve students’ understanding of technical knowledge (e.g., data structures, algorithms, network applications), and together they form (learning) community spaces. Within these communities, students carry out a wide range of learning activities, governed by the rules and regulations laid out in the DSA coursework, as they acquire new knowledge (i.e., know-what, know-why and know-how). The division of labour refers to the teacher’s responsibility for preparing and presenting appropriate formative learning exercises that facilitate student engagement. Since different programming skills apply to different topics/sub-topics, the teacher must plan instructional exercises pertinent to each topic. Teachers set up practice scenarios which require students to submit their code to be assessed by the tool (automarker). For example, a scenario could relate to sparse matrices or to queue implementations. The tool instantly checks each student submission and displays the mark to the student. In case of any mistakes, it presents clues to rectify them and provides further opportunities to re-submit and improve the mark.
In doing so, four objectives or goals are met. First, the teacher can dynamically display all coursework exercises, which can be leveraged by the enrolled students. Second, students can independently engage with the tool and practice their coding skills, self-verifying the correctness of their code and making emendations before a final submission. Third, the time taken by the teacher to manually evaluate each student’s code submission and provide personalized feedback is eliminated. Fourth, the chance of errors accidentally cropping up in a manual marking process is much reduced. The accomplishment of these goals further informs the final outcomes of setting out clear expectations regarding student feedback and assessment style, facilitating an effective and interactive teaching and learning environment.
Research about the pedagogical use of digital technology spans the development of the technological system in a real-world teaching context; therefore, any empirical data that is gathered is symbolic and must be interpreted in its appropriate context (Twining et al., 2017). Twining et al. advise researchers to properly outline their research design by providing rich descriptions of their research setting, the data collection instrument and the sample size. A copy of the data collection instrument may be provided either as an appendix or as a linked document for added context (Tong et al., 2007). Such background information allows the reader to understand the research setting from an ontological position (or the nature of reality). In the field of instructional technology, the researcher must consider the extent to which their product’s design is a specific intervention or is generalizable to a target population (Richey et al., 2004). Our instructional product (the automarker) comprises learning activities that focus on self-evaluation by students as they build their programming skills; hence, it is generalizable to student populations pursuing programming coursework. These learned skills can enhance students’ ability to tackle a wide range of programming scenarios and help develop a problem-solving mindset, all of which is applicable across various courses in computer science and beyond. Richey et al. (2004) recommend that generalizable products should clearly specify the design, implementation and evaluation methods that have been used. We have laid out the architectural design of our proposed automarker tool and how it has been deployed in a real-world teaching and learning environment in the following two sub-sections. In the subsequent section, we describe how the tool framed various coursework activities and how the learner community comprising both students and teachers interacted with it.
Subsequently, we conducted a student survey to understand how the various learning interactions facilitated by the tool helped build learners’ programming competencies. A copy of our survey instrument is given in Appendix A. Student participation in the survey was voluntary and anonymous. The survey was conducted over two consecutive DSA course offerings, for which we received a total of 46 responses from a combined class size of 100 students. Of these, seven responses were eliminated as blank, with neither rankings nor comments. Therefore, 39 valid survey responses have informed our study.
Further, Twining et al. ( 2017 ) add that the underpinning theory too needs to be explicitly articulated so that the epistemological position (or the nature and scope of knowledge) is clear and enables the researcher to provide justification of their results and of the conclusions drawn. This will ultimately demonstrate “how we come to know the world” and showcase their interpretive judgement (p. A2). Activity Theory (elucidated in the previous section) has framed this research study and helped to interpret the role of technology (i.e., the automarker) for building knowledge competencies among students with the use of practice-based programming exercises. Section “ Viewing from the activity theory lens ” describes how AT worked as a pedagogic lens for viewing the alignment of the automarker with formative learning exercises that were put forth for a given coursework. This alignment is explained in the context of the learning community comprising students enrolled in a DSA course and the teacher of the course.
The following sub-section outlines the architectural design and deployment of the automarker for DSA coursework. The subsequent sub-section expands on how the automarker’s software design helped to overcome some of the limitations of available off-the-shelf AA tools that are mentioned in prior literature (and which have been elucidated in Sect. “ Related works ”).
Automarker: architectural design and deployment
The automarker has been purpose-built for mapping the various programming exercises (in the C++ language) prescribed for the second-year DSA coursework of an undergraduate degree programme. It comprises a client–server architecture, where the code is written in PHP and all student data is stored in MySQL. The deployment uses Apache under Linux. The server exists as a stand-alone system that is not integrated into the institutional LMS; hence, access to the server is available either from within the institutional network or via the institution’s virtual private network (VPN). All students are eligible to apply for VPN access from their institution. Provision for VPN access is especially relevant for CS courses, as these are known to make extensive use of dedicated server configurations for practice teaching and learning purposes.
Students upload their C++ code files (based on the prescribed programming exercises) to the automarker using a code template (or code skeleton). The server immediately compiles the student’s code using GCC (Stallman & DeveloperCommunity_Gcc, 2009), after which it executes the code over 10 test cases while referencing a model solution supplied by the teacher. Depending upon the number of test results that match the model answers, the student’s files are marked anywhere between 0 and 10. Students are given endless opportunities to resubmit their code until the assignment deadline is reached. Submitted code is automatically evaluated using a set of inference rules (via test cases), and feedback is provided by means of clues, which in turn encourages students to debug their code and improve their marks. There is always the danger that fixing code to get one test to produce the correct answer may break a test that had previously passed. This is all part of the learning and self-reflection, as students strive to validate their code against all tests. Therefore, with this form of assessment design with the automarker tool, instructors can instil algorithmic concepts in students as they encourage them to hone their programming skills through sustained practice. Figure 2 illustrates the flowchart of the students’ code submission process.
Flowchart of the submission process
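The submission-marking loop described above can be sketched as follows. This is an illustrative reconstruction in Python, not the automarker’s actual PHP implementation; the function name, the generic run command and the in-memory test cases are all assumptions made for the sketch.

```python
import subprocess

def grade(run_cmd, test_cases, timeout_s=30):
    """Run a compiled submission over each test case and mark it out of
    len(test_cases): one point per output matching the model answer.
    A timed-out run (a possible infinite loop) counts as a failed test."""
    mark = 0
    for stdin_data, model_answer in test_cases:
        try:
            result = subprocess.run(run_cmd, input=stdin_data,
                                    capture_output=True, text=True,
                                    timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # treat as failed, keep marking the remaining tests
        if result.returncode == 0 and result.stdout.strip() == model_answer.strip():
            mark += 1
    return mark
```

In the deployed system the run command would point at the executable produced by GCC from the student’s C++ file, and the marks (0–10 over the 10 test cases) would be stored per submission.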
Automarker: software design
AA tools are known to be constrained by precise file formatting, such as character-for-character equivalence across text files, since assignments are handled directly by the machine; a well-defined code template is therefore crucial (Rubio-Sánchez et al., 2012). While some formatting instructions were given to students for their submissions, we added extra flexibility: a simple parser tolerates some text variation, such as spacing and capitalisation, when reading the results produced by the code for each test input. Additionally, specific values expected in the test results are checked against an interval, enabling the teacher to set a range they consider close enough to the quasi-optimal answer. This is a substantial advantage over off-the-shelf AA tools, since it avoids the marking errors that can occur with the different compiler versions students may have used, or with optimisation problems where the answer is only an approximation of the optimum. One example is the queue assignment, where the results can depend on the initial state and on the order of updates to multiple queues; correct answers may vary slightly across the solutions different students find. Since the automarker can be configured with a tolerance value, each test applies this tolerance when informing students of the result. Assignments with exactly one correct answer per test have their tolerance set to zero.
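The interval check described above can be sketched as a small comparison routine. This is a hedged illustration rather than the automarker’s actual code; the function name and the fallback to plain text comparison are our own assumptions.

```python
def check_test(student_output: str, model_output: str, tolerance: float) -> bool:
    """Pass the test when the student's numerical result lies within the
    teacher-configured tolerance of the model answer. Non-numeric outputs
    fall back to an exact (whitespace-trimmed) comparison, and a tolerance
    of zero demands an exact numerical match."""
    try:
        got, want = float(student_output), float(model_output)
    except ValueError:
        return student_output.strip() == model_output.strip()
    return abs(got - want) <= tolerance
```

For the queue assignment, for example, a small non-zero tolerance would accept the slightly different results produced by valid alternative solutions, while assignments with a single correct answer keep the tolerance at zero.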
Another advantage of the automarker is that students need not stay connected to the server; they can develop and test their code in their own environments and connect only once their code has been pre-tested on their own machines. This addresses a major problem exposed in the study by Soll et al. (2021), who used CodeRunner implemented in Moodle: students often blamed the system for problems in their own code. A typical complaint was that the AA system did not give them enough time to run their code, when in fact the code contained an infinite loop. Our automarker allots a specific amount of time to run the code, and if the run times out, the feedback explicitly raises the possibility of an infinite loop. It is impossible to know whether a program really is in an infinite loop, owing to the undecidability of the halting problem (Davis, 1958). However, if a program takes more than 20 or 30 s when it should have taken a fraction of a second, one can still infer that the code is either extremely inefficient or looping forever; in either case, the student must modify the code to fix the problem. Moreover, since students can run their code on their own machines before submitting it to the automarker, they can easily find these problems themselves.
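The time-limit guard and its student-facing feedback can be illustrated as below. This is a sketch under our own assumptions (function name, message wording); the actual server-side implementation is in PHP.

```python
import subprocess

def run_with_guard(cmd, stdin_data, timeout_s=30):
    """Execute a submission under a hard time limit. The halting problem
    makes it undecidable whether the code truly loops forever, but a run
    far exceeding the expected fraction of a second is reported back to
    the student as a likely infinite loop or grossly inefficient code."""
    try:
        result = subprocess.run(cmd, input=stdin_data, capture_output=True,
                                text=True, timeout=timeout_s)
        return result.stdout, "completed"
    except subprocess.TimeoutExpired:
        return "", ("time limit exceeded: your code may contain an "
                    "infinite loop or be extremely inefficient")
```

The explicit message matters pedagogically: instead of blaming the system, the student is pointed towards the two plausible causes they can actually fix.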
Another major limitation of AA tools is that they cannot analyse the code directly and give a specific clue as to why it is failing. This is part of a programmer’s life: they need to learn how to code and, more importantly, how to debug their code. As part of the course, we also teach debugging techniques and try to convey to students that they are ultimately responsible for their bugs.
Viewing from the activity theory lens
Engeström (1999) suggests that analyses of activity-theoretical constructs can frame multiple perspectives, such as ‘mental model’, ‘repertoire’, ‘social representation’ or ‘attitude’, which manifest in how learning undergoes developmental transformations over time. AT provided us with a technocentric lens (Murphy & Rodriguez-Manzanares, 2008) to examine the prescribed DSA coursework tasks and to understand the (learning) community’s perspectives on how the instructional activities transformed the goals/objectives into the final outcomes in the automarker-mediated environment. These two perspectives are elaborated next.
The learning principles in CS are strongly practice-driven, with “40% of its core hours [allocated to] algorithms and complexity, programming” and building “abstract computational capabilities” (Computing_Curricula_2020_Task_Force, 2021, p. 27). Therefore, the DSA coursework was organised into seven distinct topics covering different data structures, namely linked list, stack, queue, list, tree, graph and heap. Algorithms for applying these structures appropriately (additions, conversions, searches, sorting and evaluations within a given problem context, e.g., finding the shortest path or counting the number of operations) were put forth. AT describes activities as an empirical social-construction process involving goal-directed and motive-oriented actions that are performance based. The topics informed the design of the (learning) activities, which gradually extended the complexity of programming concepts and their applications to challenge students in building their knowledge capabilities.
Table 1 describes the seven learning activities prescribed in the context of the DSA coursework and lays out the functionality of the automarker for marking of the student code submissions. The automarker evaluates the performance of the submitted codes according to a list of programming tasks required for accomplishment of learning activities and enables learner interactions via a fusion of know-what, know-how and know-why competencies.
The automarker employs a variety of tactics and tools to mark the assignments accurately. First, it imposes different levels of stress tests to gauge the robustness of the submitted solutions, allowing it to award scores that are commensurate with the accuracy of the code. It consults a model answer key to check whether the output of the student’s code matches within some tolerance interval. Expert knowledge of the subject content is infused into the marking strategy: for those tests with exactly one correct answer (e.g., a minimum path cost), the tolerance is set to zero, whereas for tests where slight variations in code implementation are expected to generate slightly different outcomes (e.g., a count of the number of operations), a tolerance interval is set accordingly. A timer guards against infinite loops, enabling the automarker to check code in extremely challenging scenarios; the same mechanism also catches highly inefficient and incorrect implementations, since such code runs significantly slower than correct solutions.
Students enrolled in the DSA course and the teacher of the course form the learning community. In presenting the seven coursework activities, care was taken to provide a consistent and simple user interface so as not to distract students with unnecessary detail. The automarker maintained personalized interactions to retain learners’ interest, such as by providing clues about likely mistakes in their code. In this manner, the automarker provided guidance (in lieu of the teacher) leading to self-awareness, as students could self-assess their code and work towards meeting the assignment expectations.
Students engaged with the automarker to self-assess the C++ code files that formed part of their assignment before making a final submission for grading. Figure 3A presents an example of the feedback provided when all test results are correct, Fig. 3B shows an example where the first test has failed, and Fig. 3C shows a mix of passed and failed tests.
Automarker displays for students
The automarker’s parser allows for minor differences between the standard results and the student’s responses, as long as the output gives the correct numerical or logical answers. One frustration with other AA tools is that a single space or minor formatting difference triggers a “test failed” response even though the correct result is output. The automarker instead gives a warning for format differences, but if the test result is numerically correct, the marks are awarded to the student.
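Such lenient matching can be sketched as follows; the regular expression, the tolerance handling and the function name are our own illustrative assumptions rather than the automarker’s actual parser.

```python
import re

_NUM = re.compile(r"-?\d+(?:\.\d+)?")

def lenient_match(student_line, model_line, tolerance=0.0):
    """Compare one line of student output against the model answer,
    ignoring case and extra whitespace. Embedded numbers are compared
    within the configured tolerance instead of character by character,
    so a formatting slip does not fail an otherwise correct test."""
    got = [float(x) for x in _NUM.findall(student_line)]
    want = [float(x) for x in _NUM.findall(model_line)]
    if len(got) != len(want):
        return False
    if any(abs(g - w) > tolerance for g, w in zip(got, want)):
        return False

    def normalise(s):
        # drop the numbers, collapse whitespace, ignore capitalisation
        return re.sub(r"\s+", " ", _NUM.sub("", s)).strip().lower()

    return normalise(student_line) == normalise(model_line)
```

A “warning, but marks awarded” policy then only needs to distinguish an exact string match (no warning) from a lenient match (warning plus marks).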
The automarker is a generic tool for any kind of assignment, so answers cannot be hardcoded for any activity. Instead, the parser shows the difference between the expected result and the student’s result for each test case; the student can study this display to understand how to arrive at the result. The clues provided are deliberately not very specific; rather, they align with the learning concept, that is, the student receives a clue in the context of the assignment to help them understand the difference. Normally, half of the test inputs are known to the students while the other half are kept hidden, to prevent students from hard-coding their results to fit a certain pattern. Moreover, the hidden tests are usually too large to allow hard coding. For example, the sorting assignment requires creating a heap data structure that is used by the sorting algorithm; some of the tests involve thousands of numbers, so it is not feasible to create the result by hand. The logic must therefore be understood on a small example, which can then be generalised to larger ones.
Another aspect of the automarker is that specific tests can be made more difficult to pass, so that students must generalise their code to all cases. In the example shown in Fig. 3C, the student’s code passes when the RPN (reverse Polish notation) expression is correct, but fails to produce output when the RPN is incorrect, for instance because there are too many numbers or too many operators. This gives the student a chance to consider all such cases when modifying their code. A student may also earn partial marks, in which case they are rewarded for the effort made so far.
The teacher’s first responsibility is creating assignments that align with the coursework. With the automarker, the teacher can create as many assignments as they want, since deploying a new assignment requires little set-up time. Once a new assignment is created, the teacher runs the model solution on the given test input and saves the output to a text file; simple text files hold the input (or inputs). Over the coursework, students are taught how to read arguments from the command line, and text files are the only input for their assignments. The format of these input files is explained to the students when the assignment is first presented. Once the instructor has both input and output files for all test cases, these are uploaded into a specific directory on the server using the teacher’s login account (which has additional privileges).
The automarker offers other administrative functionalities. For example, before a course starts, the instructor uploads the names of all enrolled students (via a .csv file) using the administration screen. Student accounts are then automatically created on the server (with this information recorded in the SQL database), and directories are created to hold the test files. Students are emailed individual login credentials (name and password). Single additions or deletions are also supported, for example to accommodate a late course enrolment or an enrolment cancellation.
The administration screens also allow parameters to be set up per assignment (Fig. 4A), such as the number of assignments or the tolerance for numerical results. A single screen view (Fig. 4B) presents the current (anonymised) results for each student. This is a good feedback stream for the instructor, since it shows how students are coping with a particular assignment, which is especially helpful for the more complex ones; the teacher can then offer extra explanations or help as soon as a problem is identified. Often a lack of pre-requisite courses leads to learning difficulties, and such feedback can help the instructor identify this as an issue for a certain assignment and provide additional support.
Automarker displays for teachers
The instructor can revisit student code and check the results if needed, without any specific software or tool. Further, the instructor can re-run student code as many times as needed, and these re-runs will not interfere with the student’s submission. From the instructor’s point of view, the automarker provides much flexibility: it only requires running the test inputs through the correct code to generate text files that the server can use directly. It is also very easy to create new assignments, generate new test cases or modify the output format with minimal effort, and to make assignments instantly available to students.
This section consolidates how the learning interactions helped build know-what, know-how and know-why competencies with the commissioned automarker. We draw upon evidence gathered from a student survey conducted in the latter part of the teaching semester (i.e., after five assignments). This way, survey responses reflected students’ current state and were not retrospective in nature.
The first question (on the first page of the survey instrument) asked students to rate their C programming skills before enrolling in the DSA course. The DSA coursework used the C++ programming language and had a pre-requisite programming course on the C language. The question posed was: “How would you have rated your C language programming skills before enrolling in the DSA course (i.e., writing, compiling and running simple C programs using arguments from the command line)?” Self-assessment was a useful starting point, since it let students first reflect on their programming skills prior to enrolling in the DSA course. Student rankings were almost evenly split, with 19 responses marked good/excellent and 20 average/poor (Fig. 5).
Programming self-evaluation by students
Next, we asked how satisfied they were with the instructions provided, the format of the code templates and the sample files used for submissions. We also asked about the ease of creating and submitting code files to the automarker. These questions gauged the know-what aspects: could students understand the step-by-step instructions on file formatting, and could they create such files easily? Overall, students expressed satisfactory understanding and said that creating files in the specified format was easy, with only one response stating otherwise.
The following two questions sought to understand whether the clues and test cases enabled students to understand their coding errors, and their level of satisfaction with the feedback provided; that is, could students transform the feedback into a higher-order skill and independently resolve their coding errors (know-how)? Students overwhelmingly stated that the clues/comments were extremely helpful in completing their assignments (30 responses), though a few others considered them only moderately helpful or were neutral. Regarding satisfaction with the automarker’s feedback, responses ranged from very satisfied to moderately satisfied to neutral, with one respondent dissatisfied.
Finally, we asked students whether the feedback from the test cases helped in code debugging, and how actively they worked towards passing all the test cases. These self-reflection questions gauged whether students had tried to fully grasp the feedback and action it thoroughly in their final code submission (know-why). Students responded positively to both questions: 37 responses indicated active engagement in resolving coding errors based on the automarker feedback, one student said they did not make full use of this feedback, and one other had never used the automarker. Figure 6 provides a snapshot of these survey responses, categorized into know-what, know-how and know-why skill measures.
Quantitative measure for know-what, know-how and know-why skills
Next, students had to rank all five assignments in terms of learning programming constructs, their level of interest and their overall experience with automated assessment. Responses reveal that the majority of students were highly satisfied. Overall, students considered that the assignments had assisted them in learning programming, had made them interested in the different programming tasks, and that this form of assessment was an excellent way to learn programming. Based on these responses, we find that the automarker served as a formative instructional strategy for experiential learning and for honing students’ programming skills.
Next, students were asked to provide (optional) comments alongside their rankings. While students indicated many positive aspects of this form of e-assessment, their comments provided more context to their ranked responses. One student observed “some assignments are too easy, due to the amount of skeleton code”, while another said that “the automarker takes away the guessing game for assignments so it’s easier to know how well you’ve done in it. I am a big fan of it”. Another student found the template used “too much C shorthand [which had] no hanging brace format”, which can get “confusing” as these “students have only completed 101 [i.e., the first-year pre-requisite course in programming]”.
Regarding know-how skills, students said that the test cases and clues enabled them to correct their coding mistakes and resubmit their assignments. One student commented “It’s good to know where code is expected but don’t give the game away”, implying that the feedback was sometimes rather explicit, which made the task “easy”. Another student said that “passing of all test cases would give me confidence that my program works as intended for a variety of test cases”. Other responses included “overall I think this assessment model is an improvement” and “it’s made debugging easier”.
In response to the know-why questions, which prompted self-reflection, one student said “actually it is hard to know where the bug is. I have to think about it myself”. Another added that their code still had bugs and they “did not get all pass result” in some of the programming tasks. One response said they needed more such assignments “to cover all the algorithms taught in the course”, while another said they had learned “about how to debug and find errors in the code”.
One student voiced irritation about VPN access. Students must specifically request the VPN facility from the host institution and set it up on their home machines; those who had not done so found it “annoying” that their C++ code could not be checked from within their LMS (Moodle). This student did not use the automarker because of difficulties accessing it from outside the institution’s network, though no reason was given for not obtaining VPN access. Workplaces often require workers to use a VPN to access organizational systems, so familiarity with VPN protocols can itself be considered a learning activity. Nevertheless, many students found this type of formative assessment useful, especially for its immediate feedback. One comment, “the automaker really helps and every programming course should have this”, sums up the satisfaction levels, which are also evident from the quantitative rankings given by the students (Fig. 6). Overall, the response towards the automarker was favourable, with only one student expressing dissatisfaction.
Figure 7 provides further self-evaluation for the five programming assignments. Student responses indicate that formative assignments, when facilitated with AA tools, helped students learn relevant programming constructs and greatly enhanced their interest in programming. Overall, students found this form of e-assessment with automarkers an excellent learning experience, although a few were neutral and did not express much appreciation for the automarking strategy used in the DSA course.
Ranking of programming activities (sparse matrices, linked lists, queues, lists and trees)
Educational pathways in practice-oriented disciplines, such as computer science, call for coursework that promotes experiential learning and supports students in building programming abilities at a higher-order skill level. Computer science courses involve continuous practice with frequent formative exercises so that students can attain the knowledge competencies in demand by today’s tech businesses (Zvacek, 2015). However, manual assessment of ongoing formative exercises (especially programming tasks) can become very time-consuming and mundane for teachers, more so with the high student-to-teacher ratios in the discipline. Automated assessment tools can therefore support frequent formative assessment without overburdening teachers with marking and grading. Moreover, allowing students to resubmit code multiple times with automated feedback encourages them to practice and build their coding competencies, developing higher internal mental functioning.
A detailed analysis of different learning activities prescribed for a foundational course, Data Structures and Algorithms, has revealed how e-assessment strategies can bring about self-awareness and self-reflection among students. Activity Theory helped frame learning interactions with our proposed tool as we investigated e-assessment as an instructional strategy for formative learning. The tool helped students deliberate on what programming tasks had to be carried out (know-what), assisted them in prompt resolution of programming errors that may have been made (know-how) and engaged them in higher-order thinking as they reflected over multiple programming tasks of varying degrees of difficulty (know-why). As students developed self-awareness after self-validating their code, it helped them achieve sustained practice in writing programming code effectively (Barra et al., 2020 ).
We have placed substantive focus on the nature of the tool’s design, what it represents in terms of the prescribed coursework (Data Structures and Algorithms) and how it can be used for practice-based assessments. It has further enabled the teacher to produce frequent assessments without taking on the additional load of marking/grading. All this contributes to clarifying the ontological position, or the nature of reality, in educational settings. Next, the epistemological position is articulated by framing the pedagogical elements with the Activity Theory constructs. The automarker served as an instructional tool integrated into course-specific formative learning exercises. As students interacted with the tool, they were provided with relevant clues/comments to help develop their practice of writing effective code. Ongoing negotiation with the tool further helped in the construction of new knowledge across the community of learners. This emergent knowledge is practice-driven, grounded in students’ participation in ongoing learning activities until higher-order skills and competencies are developed. The three skill levels articulated in the latest computing curricula (2021) were examined through a student survey. The survey data reveals student perceptions of their know-what skills (i.e., their understanding of the subject matter and ability to follow the given task instructions), know-how skills (i.e., their ability to interpret the feedback provided and apply it to improve the quality of their code) and know-why skills (i.e., their inclination and disposition to “connect the ‘better’ or ‘correct’ application of knowledge and skills to the context where and why it is applied” (p. 48)). Our findings show that the automarker supported students in actively pursuing their programming tasks and guided them towards higher-order cognitive skill levels.
Such an automated instructional strategy can have implications for learning other programming languages, or in other practice-based course curricula settings.
Limitations of the study and future work
This study has provided empirical evidence on the use of e-assessment as an instructional strategy during formative learning to enhance the overall learning experience. We acknowledge that the survey data is limited to thirty-nine student participants and therefore offers a narrow view of how our automated assessment tool aided students in experiential learning to develop their programming competencies. Moreover, this study was restricted to empirical data gathered from student surveys; no student interviews were held. We were bound by the institution’s ethics approval guidelines, which stipulated student anonymity because one member of the author team had designed the tool. Interviews would have disclosed students’ identities to their instructor and could have made students uncomfortable in freely expressing their opinions. We therefore used both ranked and open-ended survey questions, which did not compromise student anonymity. Moreover, because the tool was used in a real-world course delivery within a university environment, it was offered to the whole cohort enrolled in the course, so comparisons between control and treatment groups were not possible. Overall, based on the data collected from students’ self-evaluation, the proposed system provided an excellent learning platform for the whole student cohort.
This study has offered new insights on how automated assessment can contribute to both curricula and pedagogical practice. The curriculum in this study corresponds to the use of formative exercises in a C++ programming course (Data Structures and Algorithms), and the pedagogic representation shows meaningful integration of an e-assessment tool to support learners and develop their knowledge competencies. Based on our experience with the automarker, future work could employ a more proactive strategy to motivate students to improve their programming competency, rather than waiting for them to make mistakes. Further, to characterise student engagement and the learning curve in more detail, a system facility that tracks frequency of use, improvements, mistakes committed, number of attempts per task, time of first access relative to the submission date, areas needing improvement, and so on would be useful.
Lastly, the authors believe there is merit in applying the same system to other remote asynchronous courses, though that would require further investigation as part of future work. We are also working on integrating the automarker within the institution’s current e-learning system, which will provide a single entry point for students and remove the requirement of using a virtual private network.
Abbreviations
- AA: Automated assessment
- DSA: Data structures and algorithms
- VPN: Virtual private network
Adam, I. O., Effah, J., & Boateng, R. (2019). Activity theory analysis of the virtualisation of teaching and teaching environment in a developing country university. Education and Information Technologies, 24 (1), 251–276. https://doi.org/10.1007/s10639-018-9774-7
Amelung, M., Krieger, K., & Rösner, D. (2011). E-Assessment as a service. IEEE Transactions on Learning Technologies, 4 (2), 162–174. https://doi.org/10.1109/TLT.2010.24
Barra, E., López-Pernas, S., Alonso, Á., Sánchez-Rada, J. F., Gordillo, A., & Quemada, J. (2020). Automated assessment in programming courses: A case study during the COVID-19 Era. Sustainability . https://doi.org/10.3390/su12187451
Basharina, O. K. (2007). An activity theory perspective on student-reported contradictions in international telecollaboration. Language Learning & Technology, 11 (2), 82–103.
Belhaoues, T., Bensebaa, T., Abdessemed, M., & Bey, A. (2016). AlgoSkills: an ontology of Algorithmic Skills for exercises description and organization. Journal of e-Learning and Knowledge Society, 12 (1), 1826–6223.
Bey, A., Jermann, P., & Dillenbourg, P. (2018). A comparison between two automatic assessment approaches for programming an empirical study on MOOCs. Journal of Educational Technology & Society, 21 (2), 259–272.
Computing_Curricula_2020_Task_Force. (2021). Computing Curricula Report 2020 (ISBN: 978-1-4503-9059-0). New York. Retrieved from https://dl.acm.org/citation.cfm?id=3467967
Daniels, H. (2004). Activity theory, discourse and Bernstein. Educational Review, 56 (2), 121–132. https://doi.org/10.1080/0031910410001693218
Daradoumis, T., Marquès Puig, J. M., Arguedas, M., & Calvet Liñan, L. (2019). Analyzing students’ perceptions to improve the design of an automated assessment tool in online distributed programming. Computers & Education, 128, 159–170. https://doi.org/10.1016/j.compedu.2018.09.021
Davis, M. (1958). Computability & unsolvability . McGraw-Hill.
Engeström, Y. (1999). Perspectives on activity theory (pp. 19–38). Cambridge University Press.
Barczak, A.L.C., Mathrani, A., Han, B. et al. Automated assessment system for programming courses: a case study for teaching data structures and algorithms. Education Tech Research Dev 71 , 2365–2388 (2023). https://doi.org/10.1007/s11423-023-10277-2
13 Interesting Data Structure Project Ideas and Topics For Beginners 
In the world of computer science, data structure refers to the format that contains a collection of data values, their relationships, and the functions that can be applied to the data. Data structures arrange data so that it can be accessed and worked on with specific algorithms more effectively. In this article, we will list some useful dsa project ideas to help you learn, create, and innovate!
Data Structure Basics
Data structures can be classified into the following basic types:
- Linked Lists
- Hash tables
Selecting the appropriate structure for your data is an integral part of the programming and problem-solving process. You can observe that data structures organize abstract data types into concrete implementations, making use of various algorithms such as sorting and searching. Learning data structures is also one of the important parts of data science courses.
With the rise of big data and analytics , learning about these fundamentals has become almost essential for data scientists. The training typically incorporates various topics in data structure to enable the synthesis of knowledge from real-life experiences. Here is a list of dsa topics to get you started!
Benefits of Data structures:
Data structures are fundamental building blocks in computer science and programming. They are important tools that help in organizing, storing, and manipulating data efficiently. On top of that, they provide a way to represent and manage information in a structured manner, which is essential for designing efficient algorithms and solving complex problems.
So, let’s explore the numerous benefits of data structures, along with a list of DSA topics, below:
1. Efficient Data Access
Data structures enable efficient access to data elements. Arrays, for example, provide constant-time access to elements using an index. Linked lists allow for efficient traversal and modification of data elements. Efficient data access is crucial for improving the overall performance of algorithms and applications.
2. Memory Management
Data structures help manage memory efficiently. They help allocate and deallocate memory resources as required, reducing memory wastage and fragmentation. Remember, proper memory management is important for preventing memory leaks and optimizing resource utilization.
3. Organization of Data
Data structures offer a structured way to organize and store data. For example, a stack organizes data in a last-in, first-out (LIFO) fashion, while a queue uses a first-in, first-out (FIFO) approach. These organizations make it easier to model and solve specific problems efficiently.
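The LIFO/FIFO distinction above can be shown in a few lines of Python (used here just as a sketch); a plain `list` works as a stack and `collections.deque` as a queue:

```python
from collections import deque

# Stack: last-in, first-out (LIFO)
stack = []
for item in ["a", "b", "c"]:
    stack.append(item)
assert stack.pop() == "c"      # the most recently pushed item comes out first

# Queue: first-in, first-out (FIFO)
queue = deque()
for item in ["a", "b", "c"]:
    queue.append(item)
assert queue.popleft() == "a"  # the earliest enqueued item comes out first
```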
4. Search and Retrieval
Efficient search and retrieval are important in many applications, such as databases and information retrieval systems. Data structures like binary search trees and hash tables enable fast lookup and retrieval of data, reducing the time complexity of search operations.
5. Sorting

Sorting is a fundamental operation in computer science. Data structures like arrays and trees can implement various sorting algorithms. Efficient sorting is crucial for maintaining ordered data lists and searching for specific elements.
6. Dynamic Memory Allocation
Many programming languages and applications require dynamic memory allocation. Data structures like dynamic arrays and linked lists can grow or shrink dynamically, allowing for efficient memory management in response to changing data requirements.
7. Data Aggregation
Data structures can aggregate data elements into larger, more complex structures. For example, arrays and lists can create matrices and graphs, enabling the representation and manipulation of intricate data relationships.
8. Modularity and Reusability
Data structures promote modularity and reusability in software development. Well-designed data structures can be used as building blocks for various applications, reducing code duplication and improving maintainability.
9. Complex Problem Solving
Data structures play a crucial role in solving complex computational problems. Algorithms often rely on specific data structures tailored to the problem’s requirements. For instance, graph algorithms use data structures like adjacency matrices or linked lists to represent and traverse graphs efficiently.
10. Resource Efficiency
Selecting the right data structure for a particular task directly impacts the efficiency of an application. Data structures help minimize resource usage, such as time and memory, leading to faster and more responsive software.
11. Scalability

Scalability is a critical consideration in modern software development. Data structures that efficiently handle large datasets and adapt to changing workloads are essential for building scalable applications and systems.
12. Algorithm Optimization
Algorithms that use appropriate data structures can be optimized for speed and efficiency. For example, by choosing a hash table data structure, you can achieve constant-time average-case lookup operations, improving the performance of algorithms relying on data retrieval.
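As a small illustration of that constant-time lookup, a Python `dict` (a hash table under the hood) can answer membership queries without scanning the whole collection, unlike a plain list:

```python
# Build a hash-based index once, then answer queries in O(1) average time,
# versus O(n) for scanning a plain list.
names = ["ada", "grace", "alan", "edsger"]
index = {name: i for i, name in enumerate(names)}  # a dict is a hash table

assert index["alan"] == 2        # average-case O(1) lookup by key
assert "katherine" not in index  # absence checks are O(1) on average too
```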
13. Code Readability and Maintainability
Well-defined data structures contribute to code readability and maintainability. They provide clear abstractions for data manipulation, making it easier for developers to understand, maintain, and extend code over time.
14. Cross-Disciplinary Applications
Data structures are not limited to computer science; they find applications in various fields, such as biology, engineering, and finance. Efficient data organization and manipulation are essential in scientific research and data analysis.
- It can store variables of various data types.
- It allows the creation of objects that feature various types of attributes.
- It allows reusing the data layout across programs.
- It can implement other data structures like stacks, linked lists, trees, graphs, queues, etc.
Why study data structures & algorithms?
- They help to solve complex real-time problems.
- They improve analytical and problem-solving skills.
- They help you to crack technical interviews.
- They enable efficient manipulation and management of data.
Studying relevant DSA topics increases job opportunities and earning potential, and can open the door to career advancement.
Data Structures Project Ideas
1. Obscure binary search trees
Items, such as names, numbers, etc. can be stored in memory in a sorted order called binary search trees or BSTs. And some of these data structures can automatically balance their height when arbitrary items are inserted or deleted. Therefore, they are known as self-balancing BSTs. Further, there can be different implementations of this type, like the BTrees, AVL trees, and red-black trees. But there are many other lesser-known executions that you can learn about. Some examples include AA trees, 2-3 trees, splay trees, scapegoat trees, and treaps.
You can base your project on these alternatives and explore how they can outperform other widely-used BSTs in different scenarios. For instance, splay trees can prove faster than red-black trees when accesses exhibit strong temporal locality.
2. BSTs following the memoization algorithm
Memoization is a technique related to dynamic programming. In reduction-memoizing BSTs, each node can memoize a function of its subtrees. Consider the example of a BST of persons ordered by their ages. Now, let each node store the maximum income within its subtree. With this structure, you can answer queries like, “What is the maximum income of people aged between 18.3 and 25.3?” It can also handle updates in logarithmic time.
Moreover, such data structures are easy to implement in C. You can also try binding the implementation to Ruby behind a convenient API. Go for an interface that lets you specify a lambda as your ordering function and another as your subtree-memoizing function. All in all, you can expect reduction-memoizing BSTs to be self-balancing BSTs with a dash of additional book-keeping.
To restate the idea: because each node in a reduction-memoizing BST memoizes a function of its subtree, a BST of persons keyed by age, with each node storing the maximum salary in its subtree, can answer questions like “What is the maximum income of persons aged 25 to 30?” efficiently.
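A minimal sketch of such a reduction-memoizing BST in Python (an unbalanced BST with distinct ages, for simplicity): each node memoizes the maximum income in its subtree, and the range query uses that memo to skip over subtrees that lie entirely inside the query range. The names are illustrative, not a fixed API.

```python
class Node:
    def __init__(self, age, income):
        self.age, self.income = age, income
        self.max_income = income          # memoized maximum income over this subtree
        self.left = self.right = None

def insert(root, age, income):
    if root is None:
        return Node(age, income)
    if age < root.age:
        root.left = insert(root.left, age, income)
    else:
        root.right = insert(root.right, age, income)
    root.max_income = max(root.max_income, income)   # keep the memo up to date
    return root

def max_income_between(node, lo, hi, lb=float("-inf"), ub=float("inf")):
    """Max income among ages in [lo, hi]; (lb, ub) bounds every age in this subtree."""
    if node is None or ub <= lo or lb >= hi:
        return None                       # subtree cannot intersect the query range
    if lo <= lb and ub <= hi:
        return node.max_income            # subtree fully inside the range: use the memo
    best = node.income if lo <= node.age <= hi else None
    for sub in (max_income_between(node.left, lo, hi, lb, node.age),
                max_income_between(node.right, lo, hi, node.age, ub)):
        if sub is not None and (best is None or sub > best):
            best = sub
    return best

root = None
for age, income in [(30, 50000), (18, 20000), (45, 90000), (25, 70000), (40, 30000)]:
    root = insert(root, age, income)
assert max_income_between(root, 20, 42) == 70000   # ages 25, 30, 40 qualify
assert max_income_between(root, 30, 45) == 90000
```

With a self-balancing variant the same bookkeeping gives logarithmic queries and updates.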
3. Heap insertion time
When looking for data structure projects , you want to encounter distinct problems being solved with creative approaches. One such unique research question concerns the average case insertion time for binary heap data structures. According to some online sources, it is constant time, while others imply that it is log(n) time.
But Bollobás and Simon give a numerically-backed answer in their paper “Repeated random insertion into a priority queue.” First, they assume a scenario where you want to insert n elements into an empty heap. There are n! possible orders for the same. Then, they adopt the average-cost approach to prove that the insertion time is bounded by a constant of 1.7645.
In short: inserting n elements into an empty heap can happen in any of the n! possible orders, and averaging the cost over these orders shows that the expected insertion time is bounded by a fixed constant, independent of n. Reproducing this empirically makes a neat DSA project in C++ or any other language.
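A quick empirical check, sketched in Python: sift-up insertion into a list-based binary min-heap, counting swaps. For random insertions the average number of swaps stays a small constant, far below the log2(n) worst case, which is consistent with the constant bound quoted above (the generous threshold of 5 is our own choice for the check, not the paper's bound):

```python
import random

def heap_insert(heap, value):
    """Insert into a list-based binary min-heap by sifting up; returns the swap count."""
    heap.append(value)
    i, swaps = len(heap) - 1, 0
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break                         # heap order restored
        heap[i], heap[parent] = heap[parent], heap[i]
        i, swaps = parent, swaps + 1
    return swaps

random.seed(1)
heap, total, n = [], 0, 10_000
for _ in range(n):
    total += heap_insert(heap, random.random())

# The heap invariant holds at every node...
assert all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))
# ...and the average sift-up distance is a small constant, far below log2(n) ~ 13.
assert total / n < 5
```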
4. Optimal treaps with priority-changing parameters
Treaps are a combination of BSTs and heaps. These randomized data structures involve assigning specific priorities to the nodes. You can go for a project that optimizes a set of parameters under different settings. For instance, you can set higher preferences for nodes that are accessed more frequently than others. Here, each access will set off a two-fold process:
- Choosing a random number
- Replacing the node’s priority with that number if it is found to be higher than the previous priority
As a result of this modification, the tree will lose its random shape. It is likely that the frequently-accessed nodes would now be near the tree’s root, hence delivering faster searches. So, experiment with this data structure and try to base your argument on evidence.
At the end of the project, you can either make an original discovery or even conclude that changing the priority of the node does not deliver much speed. It will be a relevant and useful exercise, nevertheless.
A treap gives every key a second, randomly chosen priority: the keys obey BST order, while the priorities obey heap order. In the BST dimension, the left child is less than its parent and the right child is greater than or equal to it; in the heap dimension, every parent's priority must dominate its children's. (A tree ordered by a single key alone can degenerate into a line when sorted data is inserted.)

The useful property is this: given a fixed assignment of priorities to keys, a treap has exactly one shape, irrespective of the order in which the elements were inserted. Because the priorities are assigned randomly, the tree's structure depends entirely on those random weights, which keeps it balanced in expectation. This is the property to exploit in DSA projects in C++ and in file structure mini project topics.
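The uniqueness property is easy to check in code. Below is a minimal treap sketch in Python (illustrative names, not a library API): insertion keeps BST order on keys and max-heap order on priorities via rotations, and two opposite insertion orders of the same (key, priority) pairs yield the identical tree shape.

```python
import random

class TNode:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(node, key, priority):
    """BST insert on keys, then rotate to restore max-heap order on priorities."""
    if node is None:
        return TNode(key, priority)
    if key < node.key:
        node.left = insert(node.left, key, priority)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key, priority)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node

def shape(node):
    """Canonical description of the tree: (key, left shape, right shape)."""
    return None if node is None else (node.key, shape(node.left), shape(node.right))

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

# One fixed random priority per key, then two opposite insertion orders:
rng = random.Random(42)
prio = {k: rng.random() for k in "ABCDEFG"}
t1 = t2 = None
for k in "ABCDEFG":
    t1 = insert(t1, k, prio[k])
for k in reversed("ABCDEFG"):
    t2 = insert(t2, k, prio[k])

assert shape(t1) == shape(t2)              # same keys and priorities -> one unique shape
assert inorder(t1) == sorted("ABCDEFG")    # BST order is preserved
```

The priority-changing experiment from the project idea amounts to replacing a node's priority on access and rotating it toward the root when the new priority is higher.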
5. Research project on k-d trees
K-dimensional trees or k-d trees organize and represent spatial data. These data structures have several applications, particularly in multi-dimensional key searches like nearest neighbor and range searches. Here is how k-d trees operate:
- Every leaf node of the binary tree is a k-dimensional point
- Every non-leaf node generates a splitting hyperplane (perpendicular to one of the dimensions) that divides the space into two half-spaces
- The left subtree of a particular node represents the points to the left of the hyperplane. Similarly, the right subtree of that node denotes the points in the right half.
You can probe one step further and construct a self-balanced k-d tree where each leaf node would have the same distance from the root. Also, you can test it to find whether such balanced trees would prove optimal for a particular kind of application.
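A small 2-d tree sketch in Python illustrating the scheme above (function names are our own): insertion alternates the splitting axis per level, and nearest-neighbor search descends into the far half-space only when the splitting plane is closer than the best point found so far.

```python
import math

def kd_insert(node, point, depth=0):
    """Insert a 2-D point; levels alternate between splitting on x (axis 0) and y (axis 1)."""
    if node is None:
        return {"point": point, "left": None, "right": None}
    axis = depth % 2
    side = "left" if point[axis] < node["point"][axis] else "right"
    node[side] = kd_insert(node[side], point, depth + 1)
    return node

def nearest(node, target, depth=0, best=None):
    """Nearest-neighbor search with half-space pruning."""
    if node is None:
        return best
    p = node["point"]
    if best is None or math.dist(p, target) < math.dist(best, target):
        best = p
    axis = depth % 2
    near, far = ("left", "right") if target[axis] < p[axis] else ("right", "left")
    best = nearest(node[near], target, depth + 1, best)
    # Visit the far half-space only if the splitting plane is closer than the best so far.
    if abs(target[axis] - p[axis]) < math.dist(best, target):
        best = nearest(node[far], target, depth + 1, best)
    return best

root = None
for p in [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]:
    root = kd_insert(root, p)
assert nearest(root, (9, 2)) == (8, 1)
assert nearest(root, (5, 5)) == (5, 4)
```

A self-balanced variant would choose the median point along the current axis at each level instead of inserting in arrival order.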
With this, we have covered five interesting ideas that you can study, investigate, and try out. Now, let us look at some more projects on data structures and algorithms .
6. Knight’s travails
In this project, we will see two algorithms in action – BFS and DFS. BFS stands for Breadth-First Search and uses a queue to find the shortest path, whereas DFS stands for Depth-First Search and traverses the graph using a stack.
For starters, you will need a data structure similar to binary trees. Now, suppose that you have a standard 8 x 8 chessboard, and you want to show the knight’s movements in a game. As you may know, a knight moves two squares in one direction and one square perpendicular to it. Facing in any direction and given enough turns, it can move from any square on the board to any other square.
If you want to know the simplest way your knight can move from one square (or node) to another in a two-dimensional setup, you will first have to build a function like the one below.
- knight_plays([0,0], [1,2]) == [[0,0], [1,2]]
- knight_plays([0,0], [3,3]) == [[0,0], [1,2], [3,3]]
- knight_plays([3,3], [0,0]) == [[3,3], [1,2], [0,0]]
Furthermore, this project would require the following tasks:
- Creating a script for a board and a knight
- Treating all possible moves of the knight as children in the tree structure
- Ensuring that any move does not go off the board
- Choosing a search algorithm for finding the shortest path in this case
- Applying the appropriate search algorithm to find the best possible move from the starting square to the ending square.
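The tasks above can be sketched with a breadth-first search over the 64 squares. This hypothetical implementation follows the `knight_plays` signature shown earlier; BFS guarantees the returned path is one of the shortest (other equally short paths exist, so the exact intermediate squares depend on the move-generation order):

```python
from collections import deque

# All eight knight moves, as (dx, dy) offsets.
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knight_plays(start, goal):
    """Shortest knight path on an 8x8 board via breadth-first search."""
    start, goal = tuple(start), tuple(goal)
    parent = {start: None}            # also serves as the visited set
    queue = deque([start])
    while queue:
        square = queue.popleft()
        if square == goal:
            path = []
            while square is not None: # walk parent links back to the start
                path.append(list(square))
                square = parent[square]
            return path[::-1]
        for dx, dy in MOVES:
            nxt = (square[0] + dx, square[1] + dy)
            if 0 <= nxt[0] < 8 and 0 <= nxt[1] < 8 and nxt not in parent:
                parent[nxt] = square
                queue.append(nxt)
    return None                       # unreachable on a connected board

assert knight_plays([0, 0], [1, 2]) == [[0, 0], [1, 2]]
assert knight_plays([0, 0], [3, 3]) == [[0, 0], [1, 2], [3, 3]]
path = knight_plays([3, 3], [0, 0])
assert path[0] == [3, 3] and path[-1] == [0, 0] and len(path) == 3
```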
7. Fast data structures in non-C systems languages
Programmers usually build programs quickly using high-level languages like Ruby or Python but implement the underlying data structures in C/C++, writing binding code to connect the two layers. However, C’s manual memory management is error-prone and a frequent source of security issues. Herein lies an exciting project idea.
You can implement a data structure in a modern low-level language such as Rust or Go, and then bind your code to the high-level language. With this project, you can try something new and also figure out how bindings work. If your effort is successful, you can even inspire others to do a similar exercise in the future and drive better performance-orientation of data structures.
8. Search engine for data structures
The software aims to automate and speed up the choice of data structures for a given API. This project not only demonstrates novel ways of representing different data structures but also optimizes a set of functions to equip inference on them. We have compiled its summary below.
- The data structure search engine project requires knowledge about data structures and the relationships between different methods.
- It computes the time taken by each possible composite data structure for all the methods.
- Finally, it selects the best data structures for a particular case.
9. Phone directory application using doubly-linked lists
This project demonstrates how contact-book applications work and also teaches you about data structures like arrays, linked lists, stacks, and queues. Typically, phone book management encompasses searching, sorting, and deleting operations. A distinctive feature of the search queries here is that the user sees suggestions from the contact list after entering each character, much like a web search box. You can read the source code of freely available projects and replicate it to develop your skills and advance your data science career.
10. Spatial indexing with quadtrees
The quadtree data structure is a special type of tree structure, which can recursively divide a flat 2-D space into four quadrants. Each hierarchical node in this tree structure has either zero or four children. It can be used for various purposes like sparse data storage, image processing, and spatial indexing.
Spatial indexing is all about the efficient execution of select geometric queries, forming an essential part of geo-spatial application design. For example, ride-sharing applications like Ola and Uber process geo-queries to track the location of cabs and provide updates to users. Facebook’s Nearby Friends feature also has similar functionality. Here, the associated meta-data is stored in the form of tables, and a spatial index is created separately with the object coordinates. The problem objective is to find the nearest point to a given one.
You can pursue quadtree data structure projects in a wide range of fields, from mapping, urban planning, and transportation planning to disaster management and mitigation. We have provided a brief outline to fuel your problem-solving and analytical skills.
QuadTrees are a technique for indexing spatial data. The root node represents the whole area, and each internal node represents a quadrant obtained by splitting its parent's area in half along both axes. These basics are important for understanding QuadTree-related data structures topics.
Objective: Creating a data structure that enables the following operations
- Insert a location or geometric space
- Search for the coordinates of a specific location
- Count the number of locations in the data structure in a particular contiguous area
One of the leading applications of QuadTrees is finding the nearest neighbor. Suppose you store several points in a space and somebody asks for the point nearest to an arbitrary query point. You can answer by searching the quadtree: any quadrant that provably cannot contain a point closer than the best one found so far is skipped entirely, saving the time otherwise spent on comparisons.
Spatial indexing with quadtrees is also used in image compression, where every node holds the average color of its children; the deeper you descend into the tree, the more detailed the image. Quadtrees are likewise used for searching points in a 2-D area, for example finding the point nearest to given coordinates.
Follow these steps to build a quadtree from a two-dimensional area:
- Divide the existing two-dimensional space into four boxes.
- Create a child object if a box holds one or more points within. This object stores the box’s 2D space.
- Don’t create a child for a box that doesn’t include any points.
- Repeat these steps for each of the children.
- You can follow these steps while working on one of the file structure mini project topics .
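The build steps above can be sketched as a small point quadtree in Python (the class and method names are our own, capacity-1 leaves for simplicity), supporting insertion and counting the points inside a rectangular area:

```python
class QuadTree:
    """Point quadtree over the square [x, x+size) x [y, y+size); leaves hold up to CAP points."""
    CAP = 1

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.points = []
        self.children = None              # the four sub-quadrants, once subdivided

    def insert(self, px, py):
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > self.CAP and self.size > 1e-9:
                self._subdivide()         # box holds too many points: split into four
            return
        self._child_for(px, py).insert(px, py)

    def _subdivide(self):
        h = self.size / 2
        self.children = [QuadTree(self.x, self.y, h), QuadTree(self.x + h, self.y, h),
                         QuadTree(self.x, self.y + h, h), QuadTree(self.x + h, self.y + h, h)]
        pts, self.points = self.points, []
        for px, py in pts:                # push existing points down into the children
            self._child_for(px, py).insert(px, py)

    def _child_for(self, px, py):
        h = self.size / 2
        return self.children[(1 if px >= self.x + h else 0) + (2 if py >= self.y + h else 0)]

    def count_in(self, x0, y0, x1, y1):
        """Count stored points inside the axis-aligned rectangle [x0,x1] x [y0,y1]."""
        if x1 < self.x or x0 > self.x + self.size or y1 < self.y or y0 > self.y + self.size:
            return 0                      # rectangle misses this quadrant entirely
        if self.children is None:
            return sum(x0 <= px <= x1 and y0 <= py <= y1 for px, py in self.points)
        return sum(c.count_in(x0, y0, x1, y1) for c in self.children)

qt = QuadTree(0, 0, 16)
for p in [(1, 1), (3, 9), (10, 2), (12, 13), (5, 5)]:
    qt.insert(*p)
assert qt.count_in(0, 0, 8, 8) == 2    # (1,1) and (5,5)
assert qt.count_in(9, 0, 16, 16) == 2  # (10,2) and (12,13)
```

Nearest-neighbor search follows the same pruning idea: skip any quadrant whose bounding box is farther away than the best point found so far.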
11. Graph-based projects on data structures
You can take up a project on topological sorting of a graph. For this, you will need prior knowledge of the DFS algorithm. Here is the primary difference between the two approaches:
- We print a vertex & then recursively call the algorithm for adjacent vertices in DFS.
- In topological sorting, we first recursively call the algorithm for adjacent vertices, and only then push the vertex onto a stack for printing.
Therefore, the topological sort algorithm takes a directed acyclic graph or DAG to return an array of nodes.
Let us consider the simple example of ordering a pancake recipe. To make pancakes, you need a specific set of ingredients, such as eggs, milk, flour or pancake mix, oil, syrup, etc. This information, along with the quantity and portions, can be easily represented in a graph.
But it is equally important to know the precise order of using these ingredients. This is where you can implement topological ordering. Other examples include making precedence charts for optimizing database queries and schedules for software projects. Here is an overview of the process for your reference:
- Call the DFS algorithm for the graph data structure to compute the finish times for the vertices
- Store the vertices in a list with a descending finish time order
- Execute the topological sort to return the ordered list
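The three steps above can be sketched in Python. Here the pancake recipe is encoded as a DAG where an edge a → b means "a must happen before b" (the step names are illustrative); appending each vertex only after all of its dependents finish, then reversing, yields a valid ordering:

```python
def topological_sort(graph):
    """DFS-based topological sort of a DAG given as {node: [dependents]}."""
    order, visited = [], set()

    def dfs(node):
        visited.add(node)
        for nxt in graph.get(node, []):
            if nxt not in visited:
                dfs(nxt)
        order.append(node)      # appended only after all dependents are finished

    for node in graph:
        if node not in visited:
            dfs(node)
    return order[::-1]          # reverse finish order = topological order

# Pancake steps: an edge a -> b means "a must happen before b".
recipe = {
    "mix dry ingredients": ["add eggs and milk"],
    "add eggs and milk": ["whisk batter"],
    "heat pan": ["pour batter"],
    "whisk batter": ["pour batter"],
    "pour batter": ["flip"],
}
order = topological_sort(recipe)
pos = {step: i for i, step in enumerate(order)}
for step, dependents in recipe.items():
    for nxt in dependents:
        assert pos[step] < pos[nxt]   # every prerequisite comes before its dependent
```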
12. Numerical representations with random access lists
In the representations seen so far, numerical elements are generally held in binomial heaps. But these patterns can also be implemented in other data structures. Okasaki devised a numerical representation technique using binary random access lists, which have several advantages:
- They enable insertion at and removal from the beginning
- They allow access and update at a particular index
13. Stack-based text editor
Your regular text editor can edit and store text while it is being written, so the cursor position changes constantly. To achieve high efficiency, we need a data structure with fast insertion and modification, and ordinary character arrays are slow here: inserting into the middle of a contiguous string means shifting every character after the insertion point.
You can experiment with other data structures like gap buffers and ropes to solve these issues. Your end objective will be to attain faster concatenation than the usual strings by occupying smaller contiguous memory space.
This project idea handles text manipulation and offers suitable features to improve the experience. The key functionalities of text editors include deleting, inserting, and viewing text. Other features needed to compare with other text editors are copy/cut and paste, find and replace, sentence highlighting, text formatting, etc.
How this project performs depends on the data structures you choose for its operations. You will face tradeoffs when choosing among them, weighing implementation difficulty against memory use and performance. You can reuse this idea in different file structure mini project topics to accelerate text insertion and modification.
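A minimal gap buffer sketch in Python (hypothetical class, illustrative only): the "gap" of unused cells sits at the cursor, so repeated insertions and deletions at the cursor avoid shifting the whole string, and only moving the cursor pays a cost proportional to the distance moved.

```python
class GapBuffer:
    """Text buffer with a movable gap at the cursor, so typing at the cursor is cheap."""

    def __init__(self, text="", gap=16):
        self.buf = list(text) + [None] * gap
        self.start, self.end = len(text), len(text) + gap  # the gap is buf[start:end]

    def move_cursor(self, pos):
        while self.start > pos:                 # slide characters rightward across the gap
            self.start -= 1; self.end -= 1
            self.buf[self.end] = self.buf[self.start]
        while self.start < pos:                 # slide characters leftward across the gap
            self.buf[self.start] = self.buf[self.end]
            self.start += 1; self.end += 1

    def insert(self, text):
        for ch in text:
            if self.start == self.end:          # gap exhausted: grow it in place
                self.buf[self.start:self.start] = [None] * 16
                self.end += 16
            self.buf[self.start] = ch
            self.start += 1

    def delete(self, n=1):
        self.start = max(0, self.start - n)     # deleting just widens the gap

    def text(self):
        return "".join(self.buf[:self.start] + self.buf[self.end:])

ed = GapBuffer("hello world")
ed.move_cursor(5)
ed.insert(",")
assert ed.text() == "hello, world"
ed.delete(1)
assert ed.text() == "hello world"
```

Ropes take the opposite approach, splitting the text into a balanced tree of small chunks so that concatenation and mid-document edits stay fast even for very large files.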
Data structure skills form the bedrock of software development, particularly when it comes to managing large sets of data in today’s digital ecosystem. Leading companies like Adobe, Amazon, and Google hire for various lucrative job positions in the data structure and algorithm domain. And in interviews, recruiters test not only your theoretical knowledge but also your practical skills. So, practice the above data structure projects to get your foot in the door!
Frequently Asked Questions (FAQs)
Data structures are containers used to store, organize, and manipulate data; each kind has its own properties and supported operations. Based on how they arrange their elements, data structures fall into two broad groups: linear data structures such as arrays and linked lists, and non-linear data structures such as trees and graphs.
In a linear data structure, elements are connected sequentially, each holding a reference to the next (and sometimes the previous) element, whereas in a non-linear data structure the data is connected in a hierarchical or networked manner. Linear structures are easier to implement because they involve only a single level. Memory-wise, however, non-linear structures often come out ahead for hierarchical data, since they represent relationships directly rather than wasting space on a flattened layout.
You can see applications of data structures everywhere around you: Google Maps is built on graphs, call-centre systems use queues, file explorers are based on trees, and even the text editor you use every day relies on a stack, and the list goes on. Many popular algorithms are also built on these structures, decision trees being one example. Google's search bar implements its auto-complete feature with tree structures (tries).
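To make the text-editor example concrete, here is a hedged sketch of how a stack gives an editor its undo behaviour. The class and method names are invented for illustration; real editors store deltas rather than whole document states, but the last-in, first-out discipline is the same.

```python
class UndoableDocument:
    """Toy editor state with stack-based undo: each edit pushes the previous
    document state, and undo pops it back (last-in, first-out)."""

    def __init__(self):
        self.text = ""
        self.history = []                   # stack of earlier states

    def type(self, s):
        self.history.append(self.text)      # push the state before the edit
        self.text += s

    def undo(self):
        if self.history:
            self.text = self.history.pop()  # pop = most recent edit undone first

doc = UndoableDocument()
doc.type("Hello")
doc.type(", world")
doc.undo()
# doc.text is now "Hello"
```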
Understanding the Important Difference Between Data Structures and Algorithms in Software Engineering
Software engineering relies heavily on the integration of data structures and algorithms to solve complex problems efficiently. As a software engineer, it is crucial to understand the role of data structures and algorithms in developing robust and high-performing software systems.
Here’s a comprehensive overview of the difference between data structures and algorithms, their individual roles, and a comparison between them, to help you better understand their significance in software engineering.
Introduction to data structures and algorithms
Data structures and algorithms are fundamental concepts in computer science and software engineering.
Defining data structures
Data structures are containers that hold and organise data in a particular format. They provide an efficient way of storing and accessing data, which is essential for performing various operations on it.
Each data structure has its own set of rules, operations, and memory requirements, catering to different scenarios and optimising specific types of operations. These structures act as the building blocks for organising vast amounts of information in software systems.
When it comes to data structures, there are many different types to choose from. Some common examples include arrays, linked lists, stacks, queues, trees, and graphs. Each of these structures has its own unique characteristics and advantages, making them suitable for different types of data and operations.
For example, arrays are ideal for storing a fixed number of elements of the same type, while linked lists are great for dynamic data that can grow or shrink in size.
Data structures can be classified as either linear or non-linear. Linear data structures, such as arrays and linked lists, organise data in a sequential manner, where each element has a direct relationship with its neighbouring elements.
On the other hand, non-linear data structures, such as trees and graphs, allow for more complex relationships between elements, forming hierarchical or interconnected structures.
Algorithms, on the other hand, are step-by-step procedures used to solve specific problems. They take input data, perform a series of operations on it, and produce the desired output.
Algorithms provide a systematic approach to problem-solving by defining a clear set of instructions or rules to be followed. They form the backbone of software systems and determine the efficiency and effectiveness of various operations performed on data.
Algorithms can be found in various aspects of our daily lives, not just in computer science. For example, think about a recipe for baking a cake.
The recipe is essentially an algorithm that guides you through a series of steps, such as mixing ingredients, preheating the oven, and baking for a specific amount of time. By following the recipe, you can consistently produce the desired outcome – a delicious cake!
In computer science, algorithms are designed to solve specific problems efficiently. They can range from simple and straightforward to complex and intricate. Some common algorithmic techniques include searching, sorting, graph traversal, and dynamic programming.
These techniques provide a foundation for solving a wide range of problems, from finding the shortest path in a network to optimising resource allocation in a computer system.
It’s important to note that the efficiency of an algorithm can vary significantly depending on its design and implementation.
The same problem can often be solved using different algorithms, each with its own trade-offs in terms of time complexity, space complexity, and other factors. As a software engineer, it’s crucial to understand these trade-offs and choose the most appropriate algorithm for a given problem.
Data structures and algorithms are essential components of computer science and software engineering. Data structures provide a way to organise and store data efficiently, while algorithms offer systematic approaches to problem-solving.
By understanding and utilising these concepts effectively, software engineers can develop efficient and robust software systems that can handle large amounts of data and solve complex problems.
The role of data structures in software engineering
Data structures play a crucial role in software engineering by providing efficient ways of organising and managing data.
Types of data structures
There is a wide range of data structures available, each tailored to specific scenarios and requirements. As previously mentioned, some common types of data structures include:
- Arrays
- Linked Lists
- Stacks
- Queues
- Trees
- Graphs
- Hash Tables
Each data structure has its unique characteristics and advantages, making it suitable for different purposes. For example, arrays provide constant-time access to elements using their index, while linked lists offer efficient insertion and deletion operations.
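That contrast can be shown in miniature with a hand-rolled singly linked list next to a Python list (which is a dynamic array under the hood, close enough for the access-cost comparison):

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def list_get(head, i):
    # O(i): must walk i links to reach the element.
    node = head
    for _ in range(i):
        node = node.next
    return node.value

def list_push_front(head, value):
    # O(1): one pointer change, regardless of list length.
    return Node(value, head)

array = [10, 20, 30]
head = None
for v in reversed(array):          # build the linked list 10 -> 20 -> 30
    head = list_push_front(head, v)

assert array[1] == 20              # array: constant-time indexed access
assert list_get(head, 1) == 20     # linked list: traversal to the same element
```

The flip side, not shown above, is that inserting at the front of the array shifts every element, while `list_push_front` never touches the existing nodes.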
Practical applications of data structures
Data structures find applications in various areas of software engineering. For instance, databases utilise data structures such as B-trees for efficient storage and retrieval of data.
Data structures, like stacks and queues, are integral components of operating systems, facilitating process scheduling and memory management. Efficient algorithms for sorting and searching often rely on optimised data structures to achieve optimal performance.
The role of algorithms in software engineering
Algorithms are fundamental to software engineering as they determine how efficiently a task can be performed.
Types of algorithms
There are various types of algorithms designed to solve specific problems efficiently. Some commonly used algorithms include:
- Sorting Algorithms (e.g., Bubble Sort, Quicksort)
- Searching Algorithms (e.g., Binary Search, Depth-First Search)
- Graph Algorithms (e.g., Dijkstra’s Algorithm, Kruskal’s Algorithm)
- Hashing Algorithms (e.g., SHA-256, MD5)
- Dynamic Programming Algorithms (e.g., Fibonacci Sequence)
Each algorithm has its own set of rules and complexities, determining its efficiency in solving specific problems. For example, sorting algorithms vary in their execution time and space requirements, which affects their suitability for different-sized datasets.
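To illustrate that tradeoff, here are minimal versions of two of the sorting algorithms named above. Both produce identical output; they differ only in cost: bubble sort is O(n²) in time but sorts with O(1) extra space, while this deliberately simple (non-in-place) quicksort averages O(n log n) time at the cost of extra lists.

```python
import random

def bubble_sort(a):
    # O(n^2): repeatedly swap adjacent out-of-order pairs.
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def quicksort(a):
    # O(n log n) on average: partition around a pivot and recurse.
    if len(a) <= 1:
        return list(a)
    pivot = a[len(a) // 2]
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return quicksort(left) + mid + quicksort(right)

data = random.sample(range(1000), 50)
assert bubble_sort(data) == quicksort(data) == sorted(data)
```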
Practical applications of algorithms
Algorithms are essential for various software engineering tasks. For instance, search engines employ sophisticated algorithms to process and rank massive amounts of web content efficiently. Encryption algorithms also play a crucial role in securing data during communication and storage.
Moreover, image processing algorithms enable image recognition and manipulation, empowering applications like facial recognition and computer vision.
Difference between data structures and algorithms
Put simply, data structures refer to the organisation and storage of data, while algorithms are sets of instructions that outline how to perform specific tasks or operations on that data in a step-by-step manner.
Similarities between data structures and algorithms
Both data structures and algorithms play significant roles in software engineering and rely on each other to achieve efficient solutions. They are complementary and intertwined concepts that work together to optimise software systems.
Differences between data structures and algorithms
The main difference lies in their focus and functionality. Data structures are concerned with organising and storing data, while algorithms focus on solving problems and manipulating data. Data structures provide the foundation for algorithms to operate efficiently by structuring and managing the underlying data.
Case studies: Data structures and algorithms in action
Let’s delve into some real-world case studies to understand how data structures and algorithms are applied in software engineering:
Data structures in database management
Database management systems heavily rely on data structures to store and organise vast amounts of structured data. Indexing techniques, such as B-trees, enable quick data retrieval by efficiently locating records based on search criteria.
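A full B-tree is beyond a short example, but the core idea it serves — logarithmic lookup over sorted keys instead of a full table scan — can be sketched with a sorted key list and binary search. The `SortedIndex` name and the toy records below are invented for illustration; a real B-tree generalizes this to wide, shallow nodes sized to match disk pages.

```python
import bisect

class SortedIndex:
    """Toy index: record ids kept alongside sorted keys, so a lookup is a
    binary search (O(log n)) rather than a scan of every record (O(n))."""

    def __init__(self):
        self.keys = []
        self.record_ids = []

    def insert(self, key, record_id):
        i = bisect.bisect_left(self.keys, key)   # find the insertion point
        self.keys.insert(i, key)
        self.record_ids.insert(i, record_id)

    def find(self, key):
        i = bisect.bisect_left(self.keys, key)   # binary search on sorted keys
        if i < len(self.keys) and self.keys[i] == key:
            return self.record_ids[i]
        return None                              # key not present

idx = SortedIndex()
for rid, name in enumerate(["li", "han", "yu", "wang"]):
    idx.insert(name, rid)

assert idx.find("yu") == 2
assert idx.find("zhang") is None
```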
Additionally, graph data structures represent relationships between entities, allowing efficient traversal and analysis of complex data models.
Algorithms in search engines
Search engines, like Google or Bing, utilise sophisticated algorithms to process and rank web pages based on their relevance to user queries.
These algorithms analyse and extract valuable information from massive datasets, considering factors like keyword usage, backlinks, and user behaviour. By employing efficient search algorithms, search engines provide users with relevant search results in a fraction of a second.
Data structures and algorithms are fundamental concepts in software engineering, intricately woven together to enable efficient problem-solving and data manipulation.
Understanding the difference between data structures and algorithms is crucial for software engineers to develop optimised software systems. By employing appropriate data structures and algorithms, developers can create robust and high-performing applications that meet the growing demands of the digital age.
© Institute of Data. All rights reserved.
Exploration of street space architectural color measurement based on street view big data and deep learning—A case study of Jiefang North Road Street in Tianjin
Contributed equally to this work: Xin Han, Ying Yu.
Authors: Xin Han, Ying Yu, Lei Liu, Ming Li, Lei Wang, Tianlin Zhang, Fengliang Tang, Yingning Shen, Mingshuai Li, et al. Author affiliations include the Department of Landscape Architecture, Kyungpook National University (Daegu, South Korea); the College of Forestry, Shandong Agricultural University (Taian, China); the School of Architecture, Harbin Institute of Technology (Shenzhen, China); the Gengdan Institute of Beijing University of Technology (Beijing, China); the School of Architecture, Tianjin University (Tianjin, China); the School of Cultural Heritage, Northwest University (Xi’an, China); the School of Architecture and Urban Planning, Lanzhou Jiaotong University (Lanzhou, China); the School of Architecture and Urban-Rural Planning, Fuzhou University (Fuzhou, China); Chengdu Tianfu New Area Institute of Planning & Design Co., Ltd (Chengdu, China); the Management College, Ocean University of China (Qingdao, China); Shandong Jianzhu University (Jinan, China); and Fuzhou Planning & Design Research Institute Group Co. Ltd (Fuzhou, China).
Published: November 30, 2023
In the complex visual field of urban space, architectural color is perceived before shape, texture, or material, and it plays an important role in expressing a city's territory, humanity, and style. However, because color is difficult to measure, research on the architectural color of street space has struggled to reach large scale and fine granularity, even though the measurement of architectural color in urban space has drawn attention from many disciplines. With the development of information technology, the maturing of street view big data and deep learning has provided new ideas for measuring street architectural color. Against this background, this study takes street space architectural color as its research object and explores an efficient, large-scale method for determining architectural colors in urban space based on deep learning and street view big data, with empirical research conducted on Jiefang North Road, Tianjin. We introduced the SegNet deep learning algorithm to semantically segment the street view images, extract the architectural elements, and optimize the architectural edges. Based on a K-Means clustering model, we identified the colors of the architectural elements in the street views. The accuracy of the color measurement results was cross-checked by means of a questionnaire survey, and the validation shows that the method is feasible for studying architectural color in street space. Finally, we analyzed the overall coordination, sequence continuity, and primary and secondary hierarchy of the architectural colors of Jiefang North Road in Tianjin. The results show that the measurement model can express architectural color information intuitively and can assist designers in analyzing the architectural color of street space under the guidance of color characteristics.
The method helps managers, planners, and even the general public to summarize color characteristics and uncover problems, and it is of great significance for assessing and transforming the color quality of the street space environment.
Citation: Han X, Yu Y, Liu L, Li M, Wang L, Zhang T, et al. (2023) Exploration of street space architectural color measurement based on street view big data and deep learning—A case study of Jiefang North Road Street in Tianjin. PLoS ONE 18(11): e0289305. https://doi.org/10.1371/journal.pone.0289305
Editor: Ahmed Mancy Mosa, Al Mansour University College-Baghdad-Iraq, IRAQ
Received: March 10, 2023; Accepted: July 15, 2023; Published: November 30, 2023
Copyright: © 2023 Han et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data are available at https://doi.org/10.6084/m9.figshare.22250176.v1 .
Funding: This work was supported by the Young Scholars Science Foundation of Lanzhou Jiaotong University (2020033) and by the Fujian Provincial Social Science Planning Project "Study on the Influence and Significance of Zhu Zi Studies on Korean Confucian Habitat Culture" (Grant No. FJ2021C035). The sponsor Hongxu Peng participated in the study design and data analysis.
Competing interests: The authors have declared that no competing interests exist.
Architectural color is an intuitive reflection of a city's natural characteristics and regional culture, and a true expression of the quality of the urban environment [ 1 ]. Determining the current state of architectural color both respects the city's natural characteristics and regional culture and provides a basic way to analyze current problems, which bears directly on the renewal, renovation, and new construction of architecture [ 2 ]. Color in building interiors has also been studied by some scholars [ 3 ]. Although the measurement of urban architectural color has achieved some success in recent years, it remains limited by the long duration of color data collection and its heavy consumption of human and material resources; the mainstream measurement method is still mainly qualitative, with some quantitative studies combining local architectural color sampling with expert interviews [ 4 ]. Large-scale, quantitative analysis can reveal the current characteristics of urban architectural color more objectively, weaken the interference of subjective judgment inherent in qualitative research, uncover the strengths and shortcomings of a city's architectural color, and support more targeted solutions and suggestions [ 5 ]. Measuring architectural color with wide coverage and high efficiency is therefore a fundamental and important part of urban color research [ 6 ], and the exploration of effective large-scale data acquisition and quantitative analysis techniques has attracted extensive attention in the academic community.
With convenient data access and advances in information processing technology, the APIs that map services provide for extracting spatially high-resolution images of streets and neighborhoods are increasingly used in interdisciplinary research, offering scientific support for the development of different disciplines [ 7 – 9 ]. Scholars and research institutions have explored quantitative urban research based on street view images extensively [ 10 , 11 ]. Zhang used Google Street View in combination with various open-source data to identify mechanisms that affect the built environment of cities [ 12 ], Han used Google Street View images combined with machine learning techniques to predict the perception of stress in city streets [ 13 ], Wang explored different perception conditions of urban streets using Baidu Street View images combined with space syntax [ 14 ], and Yao explored six different perceptions of urban street conditions by combining street view images with a human-machine adversarial model [ 7 ]. Big data from street view images, together with deep learning, has to some extent overcome the difficulties of studying street space with traditional data, changing the perspectives, conditions, scales, and methods of street research [ 15 – 19 ]. Even so, the use of street view big data combined with deep learning to measure the architectural color of urban space is still at an early stage.
Baidu Maps has better coverage of China than other map services, allowing a more detailed study of a Chinese study area [ 13 ]. In this study, Baidu Street View was used as the data source, and Jiefang North Road in Tianjin was chosen as the research space for urban architectural color measurement. The study proceeded from the selection of the research object, to the selection of the color extraction method, and finally to color expression and application. The research aims were twofold. (1) To construct a whole-process workflow for street space architectural color measurement based on street view big data and deep learning, encompassing dataset and neural network selection, network construction and training, and color extraction and analysis, providing technical support for objective quantitative analysis of street space architectural color by validating and reflecting on the results. (2) To analyze the color of the street space of Jiefang North Road in Tianjin, summarize the advantages and potential problems of the current colors, and propose targeted design guidelines and optimization suggestions based on the color attribute analysis. The joint analysis of street view big data and deep learning provides researchers and urban planners with more targeted data on architectural color perception in urban space and advances urban planning practice.
Fig 1 shows the conceptual framework of architectural color measurement in urban space from a holistic perspective. First, we downloaded the road network data for the study area, placed a sampling point every 20 meters along the network, adjusted the street view acquisition parameters to simulate a pedestrian's perspective, collected the urban street view data through the Baidu Maps API, set the save location for the images, and finally obtained all street view images in the study area. The Cityscapes dataset was used to train the SegNet deep learning neural network, which then semantically segmented the street view images of the study area to obtain the visual element data of the city streets. To separate the architectural elements from each image, the non-architectural visual elements of the segmented street view image were masked, and the edges of the architectural elements were then optimized to sharpen their boundaries. Volunteers were invited to evaluate the architectural colors identified by the K-Means algorithm to prove the effectiveness of the method. Finally, we summarized the fundamental colors of Jiefang North Road, analyzed the saturation and value of those colors in the city streets, and analyzed the overall coordination, sequence continuity, and primary and secondary hierarchy of the street architectural colors. Based on the results of the analysis, we propose design guidelines and optimization suggestions.
Jiefang North Road Street is located in the eastern part of the Heping District of Tianjin, near the west bank of the Haihe River, running from Jiefang Bridge in the northwest to Xuzhou Road in the southeast, and is the most historical financial street in Tianjin. Since Tianjin opened to commerce in 1860, Jiefang North Road Street was among the first areas to be developed and has experienced alternating cycles of prosperity and depression. In 2019, the Tianjin Municipal Planning Bureau proposed a new plan for the area: revitalize the existing building stock, update the commercial ecology, improve the quality of the street space, enhance urban vitality, and develop it into a high-end service industry cluster featuring tourism and leisure, business and finance, built on its historical culture, transportation hub resources, and preserved historical architecture. The street area contains 9 roads, including Jiefang North Road itself, known as the "Oriental Wall Street". Jiefang North Road is one of Tianjin's priority style-control areas: throughout repeated construction and renovation, the integrity and authenticity of the street architecture and its ancillary features have been strictly protected, keeping the style intact. The stately, stable appeal of the classical style and building materials such as reinforced concrete, marble, granite, and brick red ensure the stability of the street's overall color. Jiefang North Road, with its high reputation, was therefore selected for the specific work of architectural color measurement and for validation of the measurement results.
Street view data acquisition and semantic segmentation
BSVI data collection.
In this study, we used the "one vertical and eight horizontal" structure of the latest plan to study the architectural color of the street space in the Jiefang North Road area, where the vertical is Jiefang North Road and the eight horizontals are Changchun Road, Binjiang Road, Harbin Road, Chifeng Road, Chengde Road, Yingkou Road, Datong Road, and Dalian Road ( Fig 2 ). Street view data allows perception and observation of the urban environment from a human-centered perspective [ 20 ]. The street view data platform not only provides browsing services for web users but also publishes an application program interface (API). Following the panoramic static map documentation ( https://lbsyun.baidu.com/index.php?title=viewstatic ) in the web service API of the Baidu Maps Open Platform, we can submit the appropriate parameters to call the API and acquire Baidu Street View images. After reviewing the site and Baidu Maps' data policy, we confirmed that our use complies with Baidu's terms of service [ 21 ]. To obtain street view images from the pedestrian perspective along the roads in the Jiefang North Road study area, the acquisition parameters needed to be set [ 22 ]. To keep the sampled data perceptible with low repetition, a sampling distance of 20 m was chosen for the collection points, and images were acquired at a consistent angle following the direction of pedestrian movement. The starting and ending coordinates of the street view images collected from the different roads are shown in Table 1 . An example collection URL is: http://api.map.baidu.com/panorama/v2?ak=YOUR_KEY&width=1024&height=512&location=120.219699226437,30.203814207756&pitch=20&fov=150&heading=90 .
In this URL, 'ak' is the developer key, 'width' is the image width (10 to 1024 pixels), and 'height' is the image height (10 to 512 pixels). 'Fov' is the horizontal field of view, set to 150 degrees based on human visual habits. 'Heading' is the horizontal (compass) angle of the view, set to 90 degrees after multiple experiments. 'Pitch' is the vertical angle of the camera, and 'location' is the geographic coordinates (longitude,latitude) of the sample point submitted to the server.
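The parameter assembly described above can be sketched as a small helper that builds the request URL. The key placeholder `YOUR_KEY` and the default values follow the example URL in the text; this sketch only constructs the URL and does not call the API.

```python
from urllib.parse import urlencode

def panorama_url(lng, lat, ak="YOUR_KEY", width=1024, height=512,
                 pitch=20, fov=150, heading=90):
    """Build a Baidu panorama static-map request URL from the parameters
    described in the text. "YOUR_KEY" stands in for a real API key."""
    params = {
        "ak": ak,
        "width": width,              # image width, 10-1024 px
        "height": height,            # image height, 10-512 px
        "location": f"{lng},{lat}",  # lng,lat of the sampling point
        "pitch": pitch,              # vertical camera angle, degrees
        "fov": fov,                  # horizontal field of view, degrees
        "heading": heading,          # compass direction of the view, degrees
    }
    # safe="," keeps the comma in `location` literal, matching the example URL.
    return "http://api.map.baidu.com/panorama/v2?" + urlencode(params, safe=",")

url = panorama_url(120.219699226437, 30.203814207756)
```

In a real collection run, such a helper would be called once per 20 m sampling point along the road network, with the response image saved to the configured location.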
Deep learning-based image segmentation.
To better identify the color of street-space architecture, this study uses a SegNet network structure based on a ResNet backbone model for training, validation, and prediction on the dataset. SegNet ( Fig 3 ) is an open-source image segmentation project developed by a team at the University of Cambridge. It is a convolutional neural network whose main motivation is efficient scene understanding: it segments images at the pixel level based on the object features in the image, and it is widely used across industries for scene-understanding applications [ 14 ]. The core structure consists of an encoder network, a corresponding decoder network, and a classification layer. The encoder follows the VGG16 network model and mainly parses the objects in the image scene according to their labels; the decoder maps the parsed information to the final image form, with each pixel using a color corresponding to its object class [ 23 – 25 ].
After analyzing different datasets, Cityscapes was selected for neural network training. Cityscapes was originally built for autonomous-driving training; its scenes cover 50 different cities, daytime scenes in spring, summer, and autumn, and a variety of media including video and pictures as well as rain and snow conditions [ 26 ]. The dataset contains 5000 finely pixel-labeled images of street scenes, divided into three parts: training, validation, and test sets. The training and validation sets are used jointly to tune the model parameters toward the optimal model, and the test set measures the generalization ability of that model in practice. Of the images, 2975 are used for training, 500 for validation, and 1525 for testing. The partitioning ensures that each subset contains diverse scenarios so that training results generalize, and the labels are divided into 30 categories. Because of its large volume, rich scenarios, fine labeling, and open-source availability, the dataset is widely used by professionals in street space research [ 13 , 14 , 22 ]. In this study, the 5000 finely labeled images were split into training and validation parts only, and the labels were reclassified into 19 categories according to actual needs. The training results show an accuracy of 90.83% on the training set and 89.95% on the validation set, which is sufficient for effective image semantic segmentation.
Street view data cleaning—architecture in the spotlight
Masking of non-architectural elements for architectural recognition.
After semantic segmentation of the sample street-view images, the non-architectural content must be masked so that building elements can be separated from the large numbers of vegetation, vehicle, and other elements that would otherwise interfere with them. Image masking controls the region of an image to be processed by overlaying the street-view image with a selected image or object. Here, the goal is to extract the architectural region of the street-view image: the pre-made mask of the architectural part (the prediction result) is multiplied with the image to be processed (the original image) to obtain the architectural portion, so that pixel values inside the architectural region remain unchanged while all values outside it become 0 (solid black) ( Fig 4 ), achieving architectural recognition.
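This masking step amounts to an element-wise multiplication; a toy sketch with a hypothetical binary building mask (not the actual prediction output) looks like this:

```python
import numpy as np

def mask_buildings(image: np.ndarray, building_mask: np.ndarray) -> np.ndarray:
    """Multiply the street-view image by a binary building mask:
    building pixels keep their values, all other pixels become 0 (solid black)."""
    return image * building_mask[..., None]  # broadcast mask over RGB channels

# Toy 2x2 RGB image and a mask marking only the top-left pixel as building
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 0]], dtype=np.uint8)
out = mask_buildings(image, mask)
print(out[0, 0].tolist())  # building pixel unchanged: [200, 200, 200]
print(out[1, 1].tolist())  # masked pixel set to black: [0, 0, 0]
```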
Optimization of the edges of architectural elements.
After mask processing, the semantic segmentation result proves accurate enough for architectural recognition, but the building edges contain many fragmented parts. To achieve more detailed architectural recognition, these fragments on the building edges must be processed, and the study found that erosion and dilation algorithms are useful for this purpose. To further improve the recognition accuracy of building edges, the closing operation (dilation followed by erosion) in OpenCV is adopted to process the edges and achieve optimal extraction of architectural elements.
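For illustration, the closing operation can be sketched in pure NumPy as dilation followed by erosion on a binary building mask. A 3x3 structuring element is assumed here; in practice OpenCV's `cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)` performs the same operation:

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """3x3 binary dilation: a pixel becomes 1 if any neighbor is 1 (zero-padded)."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def erode(mask: np.ndarray) -> np.ndarray:
    """3x3 binary erosion: a pixel stays 1 only if all neighbors are 1 (one-padded)."""
    p = np.pad(mask, 1, constant_values=1)
    out = np.ones_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= p[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out

def close_mask(mask: np.ndarray) -> np.ndarray:
    """Closing = dilation followed by erosion: fills small holes and edge gaps."""
    return erode(dilate(mask))

# A building mask with a one-pixel hole at its center
mask = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]], dtype=np.uint8)
closed = close_mask(mask)
print(closed.tolist())  # hole filled: [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Because closing fills gaps without shrinking the building region, it smooths the fragmented edges while preserving the extracted architectural area.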
Extraction and expression of architectural colors
The growing use of image research in urban color surveys has emphasized the significance of extracting image color features, particularly dominant colors, which greatly impact visual perception. The K-Means clustering algorithm is an especially suitable method for extracting dominant colors in architectural images, as it efficiently groups similar color data points in the color space [ 27 ]. As an unsupervised learning technique, the algorithm’s objective is to divide a set of observations into K clusters, with each observation belonging to the cluster with the nearest mean or centroid. The K-Means clustering process involves several key steps to achieve convergence. First, initial centroids are selected, typically by choosing K random data points from the dataset. Next, data points are assigned to the nearest centroid, forming K clusters based on their proximity to these centroids. Once clusters are formed, the centroids are updated by calculating the mean of all data points within each cluster, resulting in new centroid positions. This iterative process of reassigning data points to the nearest centroid and updating centroids continues until a specified convergence criterion is met, such as when the centroids’ movement falls below a certain threshold or a maximum number of iterations is reached. Upon convergence, the final centroid coordinates of each cluster represent the dominant colors in the architectural image.
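The iterative steps just described can be sketched as a minimal NumPy implementation. For reproducibility, this sketch seeds the centroids at evenly spaced pixel indices rather than at random picks, and the toy "image" colors are invented for illustration:

```python
import numpy as np

def kmeans_colors(pixels: np.ndarray, k: int, iters: int = 20):
    """K-Means over RGB pixels (N x 3), following the steps in the text:
    choose k initial centroids, assign each pixel to its nearest centroid,
    recompute each centroid as its cluster's mean, and repeat until the
    centroids stop moving or the iteration limit is reached."""
    pixels = pixels.astype(float)
    # deterministic initialization: evenly spaced pixels (random picks are typical)
    centroids = pixels[np.linspace(0, len(pixels) - 1, k).astype(int)]
    for _ in range(iters):
        # assignment step: nearest centroid in RGB color space
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each centroid moves to the mean of its cluster
        new = np.array([pixels[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # convergence: movement below threshold
            break
        centroids = new
    return centroids, labels

# Toy "image": 60 brick-red pixels and 40 blue pixels
pixels = np.array([[250, 10, 10]] * 60 + [[10, 10, 250]] * 40)
centroids, labels = kmeans_colors(pixels, k=2)
counts = np.bincount(labels)
dominant = centroids[counts.argmax()]   # largest cluster = dominant color
print(dominant.tolist())  # [250.0, 10.0, 10.0]
print(counts.tolist())    # [60, 40]
```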
Architectural color research primarily examines street-view images featuring architectural elements, whose colors are often distributed in block-filled patterns with a relatively concentrated pixel distribution. When the K-Means clustering algorithm is applied to extract dominant colors, the positional relationships of pixels in the image are not taken into account, and the clustering takes place purely in the color space. Because this color-set extraction approach allows the desired number of color categories (K) and iterations to be customized while still identifying the dominant colors present in architectural images, it is widely adopted in research within the field of urban color.
Feasibility verification of architectural color extraction based on K-Means clustering algorithm
In the process of cluster color extraction using K-Means, several simulations showed that a cluster count of 8 best approximates the color abstraction of human visual perception ( Fig 5 ), so the number of clusters was set to 8 in this study. To ensure the objectivity and accuracy of the extracted architectural colors, color clustering was performed twice: first on single images and then across multiple images. For a single image, the 7 clustered colors (excluding black) best represent the characteristic colors of that sampling point: the color with the largest proportion represents the dominant color of the viewpoint, while the color with the most prominent hue can initially be regarded as the viewpoint's embellishment color, which is also the color most easily noticed in the whole street. The extraction results largely match the color perception of the original images. Multi-image extraction draws on a large sample of a given class of images to extract color elements representative of the overall visual intention of that class. The principle is the same as single-image extraction: a second round of K-Means clustering is applied to the single-image clustering results, and the colors extracted by this second clustering better express the comprehensive characteristics of the whole street ( Fig 6 ).
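The single-image selection rules can be sketched as follows. The palette colors and pixel shares are invented for illustration, and the standard-library `colorsys` module converts RGB to HSV so the most hue-prominent cluster can be picked by its saturation:

```python
import colorsys

# Hypothetical single-image palette: (R, G, B) and pixel share per cluster,
# black already excluded as described in the text
palette = [
    ((120, 110, 100), 0.45),  # warm gray, largest share
    ((130, 120, 110), 0.25),
    ((180,  60,  40), 0.08),  # brick red, most vivid hue
    ((140, 135, 130), 0.22),
]

def dominant(palette):
    """Dominant color = the cluster with the largest pixel proportion."""
    return max(palette, key=lambda c: c[1])[0]

def embellishment(palette):
    """Embellishment color = the most saturated (hue-prominent) cluster."""
    def saturation(rgb):
        r, g, b = (v / 255 for v in rgb)
        return colorsys.rgb_to_hsv(r, g, b)[1]
    return max(palette, key=lambda c: saturation(c[0]))[0]

print(dominant(palette))       # (120, 110, 100)
print(embellishment(palette))  # (180, 60, 40)
```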
To investigate the consistency between the color extraction results and human visual recognition of the sampled images, five of them were randomly selected for a color-comparison questionnaire survey. 120 online questionnaires were distributed and all were validly returned. The demographic data of the respondents are shown in Table 2 . Their average age was 35.30 years, with a higher proportion of males (53.73%). In terms of educational background, 27.47% had completed primary school or below, 40.66% high school, and 31.87% college or above. Han Chinese was the most common ethnic group (92.12%), and local residents accounted for 83.73% of the respondents. According to the survey results, on average 46.5% of the color extraction results were rated "completely consistent" with the original image, 42% "basically consistent", 10.83% "basically inconsistent", and 0.66% "completely inconsistent" ( Fig 7 ). This result largely justifies the use of the model's extraction results.
Analysis of the dominant color of roads on Jiefang North Road Street
The dominant color characteristics of the streets in the Jiefang North Road district show that the architectural colors have a strong harmonious unity, in keeping with the district's historical character. Among them, Yingkou Road, Datong Road, and Dalian Road have less-than-ideal tonal characteristics and weaker overall coordination, although this does not undermine the high-value warm-gray character of the district as a whole ( Fig 8 ). Future work can prioritize improving the overall harmony of Yingkou Road, Datong Road, and Dalian Road.
Analysis of architectural color characteristics of Jiefang North Road Street
Analysis of saturation and value of dominant colors in urban streets.
Figs 9 and 10 show the continuity of the dominant color at the overall level of Jiefang North Road Street. Fig 9 shows that the saturation of the dominant color fluctuates little, with some variation at road intersections. Intersections occupy prominent locations and are designed as important nodes in urban space; to enhance the recognizability of buildings there, developers often use color to emphasize them as nodes and landmarks. It can therefore be concluded that the saturation along Jiefang North Road Street has good sequential continuity. By contrast, the distribution of value shows no obvious regular pattern, and in some cases the contrast in value between two observation points is strong, so the sequential continuity of architectural color value along Jiefang North Road Street is poor ( Fig 10 ). What exactly causes these changes in value requires further study integrating the various factors that affect architectural color expression.
Analysis of the overall harmony of street architectural colors.
Importing the dominant colors of the viewpoints into SPSS yielded H-S and H-V scatter diagrams ( Fig 11 ). The overall hue of Jiefang North Road Street is controlled within (0–60), saturation within (0–60), and value within (0–40), forming a low-saturation warm-gray atmosphere along the whole road. The architectural color design of Jiefang North Road is thus based mainly on an analogous color palette, with color differences controlled through chromaticity. The overall coordination of color is well controlled, and the overall style is simple, stable, and unified, effectively implementing the architectural color-control requirements of Tianjin's historical conservation plan. Under this control, the material colors of red brick, green brick, and marble coordinate well. Combining the H-V distribution with the mapped building color images shows that very few colors stand out from the overall tonal atmosphere, and those that do are mainly affected by value. Although this is partly influenced by the time of data collection, it also reveals the substantive problem of road intersections affecting building color expression.
Sequential continuity analysis of street building colors.
In the past, color continuity was analyzed in two ways: subjective judgment and quantitative analysis. Subjective judgment can be made from an abstract street color sampling chart, while quantitative analysis can be carried out by tabulating changes in color value and saturation along the street. The measurement model developed here runs from street-view image input to color-element output, supporting quantitative analysis of sequential continuity. After the color data and viewpoint coordinates were input into the SPSS analysis software for line plotting, the saturation and value indexes of the viewpoint sequence were connected point to point, and the changes in each street's dominant color were summarized from the rises and falls of the lines.
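This reading of the line plots can be automated by flagging adjacent-viewpoint jumps; a minimal sketch, where the 0–100 scale and the threshold of 20 are illustrative assumptions rather than values from the study:

```python
def abrupt_changes(values, threshold=20):
    """Return the viewpoint indices i where the change from viewpoint i-1
    to viewpoint i exceeds the threshold (a break in sequential continuity)."""
    return [i for i in range(1, len(values))
            if abs(values[i] - values[i - 1]) > threshold]

# Toy sequence of HSV "value" readings: smooth warm-gray facades with one
# sudden bright facade at index 4
seq = [30, 32, 31, 33, 70, 34, 32]
print(abrupt_changes(seq))  # [4, 5] — the jump up at 4 and back down at 5
```

The same function applied to the saturation sequence would distinguish streets with stable rhythms (few flagged indices) from those with abrupt discontinuities.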
(1) “One vertical” road sequence continuity analysis . The line graphs of dominant color saturation and value along the Jiefang North Road sequence show that both are maintained within (0–40), and the sequential continuity is good. In the value-continuity diagram, the value at viewpoint No. 27 contrasts strongly with its surroundings ( Fig 12 ).
(2) “Eight horizontal” road sequence continuity analysis . The changes in architectural color along the "eight horizontal" roads can be seen visually. Ranking the sequential continuity of the eight roads shows that Harbin Road, Chifeng Road, Dalian Road, and Yingkou Road have better continuity, while Changchun Road, Binjiang Road, Chengde Road, and Datong Road have worse color continuity ( Fig 13 ). Specifically, the saturation and value of the dominant colors on Harbin Road are controlled between 0 and 40, with no large sudden changes and good continuity. Like Harbin Road, Chifeng Road keeps saturation and value within a small range along the continuous spatial scale of the street; although the saturation fluctuates to some extent, the rhythm of change is stable and consistent and has little impact on perceived visual continuity. Dalian Road has high value continuity; although the color level gradually decreases because of the former Macquarie Bank building, the road is short, and the Macquarie Bank building at the intersection is larger than the surrounding buildings and an important historical building, so it plays a leading role in the street's color and the visual continuity is not greatly affected. Yingkou Road shows little change in color level, but its value fluctuates owing to the high reflectivity of its building materials.
Compared with Harbin Road, Chifeng Road, Dalian Road, and Yingkou Road, the sequential continuity of Changchun Road, Binjiang Road, Chengde Road, and Datong Road is relatively poor. On Changchun Road, the color difference between the viewpoints before and after viewpoint 6 approaches 5; the main reason is that the construction materials on the two sides of the road are red brick, red brick combined with cement-mortar finish brick, and cement-mortar finish brick, with clearly defined boundaries between them. Binjiang Road has good saturation continuity but poor value continuity, with two abrupt changes at viewpoints 11 and 15. Before viewpoint 11, high-rise buildings such as Huijin Center and Deyou Real Estate on the south side block a large area of the street space; after viewpoint 11, the building height drops, and the exterior wall of the Rujia Hotel uses bright yellow finish tiles and large areas of striped glass whose high reflectivity causes a sudden change in value. Where such patterns appear in alternation, the changes in value and saturation tend to stabilize. Datong Road has good color continuity but poor value continuity, with two sudden changes in value caused by the incompleteness of the building interface and the presence of the former China-Russia Daosheng Bank building. Chengde Road has good continuity of value and saturation, but its open intersections with Dagu North Road and Zhangzizhong Road cause sudden changes in value.
Overall, after analyzing the sequential continuity of value and saturation across more than 200 viewpoints on 9 roads, we can grasp the degree of color unity of the different streets and visually identify the discordant elements that affect continuity. According to the analysis, the visual continuity of Jiefang North Road itself is the best, followed by Harbin Road, Chifeng Road, Yingkou Road, and Dalian Road, while Changchun Road, Binjiang Road, Chengde Road, and Datong Road have poor visual continuity. Cross-checking the coordinates of the viewpoints that disrupt continuity shows that the factors affecting visual continuity in the Jiefang North Road district include historical buildings such as the former China-Russia Daosheng Bank Building and the former Macquarie Bank Building; large-volume functional buildings represented by the "Rujia Hotel"; the disordered arrangement of "red bricks", "green and grey decorated bricks", and "dark brown decorated bricks"; and surrounding environmental factors such as road width, intersection openness, and building height.
Analysis of the main hierarchy of street architectural colors.
The overall dominant color of Jiefang North Road Street is low-value warm gray, the secondary color is high-value warm gray, and the embellishment colors include red-brown, blue-gray, light brown, and light yellow ( Fig 14 ). The dominant and auxiliary colors share the same color scheme at different brightness levels, and the light brown, bright yellow, brick red, and other embellishment colors are close to them; this shared color tendency gives the overall color of Jiefang North Road Street a strong assimilation effect. The layering of the street is therefore expressed not through color saturation but through the contrast between warm and cold colors. The neutral and warm tendency advances visually, strengthening the simple and strong visual feeling of the whole street, while the low-saturation cold gray recedes into a background for the warm colors, giving Jiefang North Road Street a sense of peace and quiet. Along the walking sequence of Jiefang North Road, the embellishment colors appear in alternation, coordinating with the street's dominant color while providing a degree of variation that enriches its color landscape. On the other roads, the visible accent colors come from construction sites, street signs, and other non-architectural elements rather than from designed embellishment of the street space, producing visual clutter; the resulting lack of embellishment richness leaves coordination without hierarchy. This is especially true of Binjiang Road, one of the busiest commercial streets in Tianjin: under the strict harmonization controls for historical architecture, color is used so cautiously that its ability to shape the vitality of commercial space is ignored.
A new method of architectural color measurement based on street view and deep learning
This study realizes color recognition and efficient evaluation of urban architecture by fusing street-view big data with deep learning technology, providing a large-scale, efficient, and low-cost practical method for urban planning and renewal. First, we obtain highly detailed street-view images of the study area through the Baidu Maps API, identify and extract the architectural outlines of the urban space with deep learning, and identify the architectural colors of the whole study area with the K-Means algorithm. This intelligent assessment of urban architectural color can also help identify potentially low-performing areas that need further renewal guidance: the method can direct color-related planning strategies based on the color evaluation results and the dominant facade colors of a given street district. The continuous updating of Baidu Street View data likewise provides reliable support for the ongoing monitoring and updating of architectural colors in urban spaces.
Architectural color measurement methods offer advice to urban planners
The literature review clarifies the necessity of studying street-space color from the human viewpoint and the importance of researching architectural color with street-view big data, and this study explains more completely how architectural color can be measured in a street-view big-data environment. The logical framework for color analysis of street-space architecture is explored, laying the foundation for automating the color planning and design process, and the combination of theory and model-building technology makes it possible to move architectural color design from special-case studies to generalizable research. Taking Jiefang North Road Street in Tianjin as the research object, the study effectively demonstrates the measurement of architectural color in street space. The results show that the method has high applicability and feasibility and can be widely applied to basic research on architectural color in other street spaces. A color database was established for Jiefang North Road, and its overall coordination, sequential continuity, and color hierarchy were analyzed to provide references and suggestions for the street's improvement.
Limitations and future work
The dominant colors and color values in street-view image data are affected by weather and time of capture, which simple color calibration cannot eliminate; future work could introduce more intelligent methods to improve color accuracy. The rapid development of computing and information technology continues to drive this field forward. In this study we combined deep learning and street-view data into an operational tool for measuring urban and architectural color, but limitations of research equipment and hardware currently prevent us from expanding the study area and automating the whole process. Future research should compare multiple algorithms, identify the most efficient one, and make full use of computing performance to broaden research ideas and create more efficient methods. Computer-supported color research is a rational and rigorous tool, but architectural color design itself is complex, constrained by materials, environment, function, regional culture, and other factors. This study is based on the analysis of the three elements of color, and its results are indicative rather than comprehensive. Color analysis and evaluation still require designers' subjective judgment to balance the influencing factors, and designers remain the protagonists of architectural color research. Whether color evaluation can become fully intelligent in the future is unknown but worth anticipating.
After decades of development, urban color planning has accumulated considerable experience, but color statistics and research remain difficult to carry out at large coverage, high efficiency, and low cost, especially for architectural color in urban street space. To address this problem, this paper proposes a new method for measuring street-space building color based on street-view big data and deep learning technology. The method forms a complete analysis pipeline from object selection and color extraction to color expression. We first extract architectural elements by semantic segmentation of a large number of street images, then obtain practically meaningful street-space architectural color data with the K-Means technique. To verify the method's effectiveness, we also evaluated the results with a questionnaire survey. The results show that the proposed measurement method has high accuracy and feasibility in practical application. Using North Jiefang Road in Tianjin as the research site, we systematically analyzed and evaluated the street's architectural colors. The new methodology can help urban planners and researchers explore the architectural color of urban spaces more efficiently, providing a useful reference for future architectural color planning and design of urban street space and helping to achieve harmony and unity in urban space, improve residents' quality of life, and enhance the overall image of the city.
We will continue to track developments in streetscape big data and deep learning in order to optimize and improve the street-space architectural color measurement method in future research. Through cross-disciplinary research with other fields, we expect to provide more dimensions of support for urban architectural color planning and management, contributing to coordinated and sustainable urban development.
- 5. Zhang J, Fukuda T, Yabuki N. A Large-Scale Measurement and Quantitative Analysis Method of Façade Color in the Urban Street Using Deep Learning. Proceedings of the 2020 DigitalFUTURES. Singapore: Springer Singapore; 2021. pp. 93–102. https://doi.org/10.1007/978-981-33-4400-6_9
- 10. Alhasoun F, Gonzalez M. Streetify: Using Street View Imagery And Deep Learning For Urban Streets Development. 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, CA, USA: IEEE; 2019. pp. 2001–2006. https://doi.org/10.1109/BigData47090.2019.9006384
- 19. Fu Y, Song Y. Evaluating Street View Cognition of Visible Green Space in Fangcheng District of Shenyang with the Green View Index. 2020 Chinese Control And Decision Conference (CCDC). Hefei, China: IEEE; 2020. pp. 144–148. https://doi.org/10.1109/CCDC49329.2020.9164784