Original article was published by Jeel Doshi on Deep Learning on Medium

Turn Your Low-Fidelity Sketches into Front-End Code

Problem Statement and Solution

UI/UX designers typically create low-fidelity, medium-fidelity and high-fidelity prototypes. Once the designs are complete, they are handed to developers, who build the apps or websites. There are many steps between design and development, so how can we get results in the minimum time?

The idea is to use a novel application of deep semantic segmentation networks. We release a dataset of websites which can be used to train and evaluate these approaches. Further, we have designed a framework which allows evaluation by creating synthetic sketches.

Sketch to Front-End


Sketching a wireframe on paper to block out the interface layout is an early phase in designing an application. Designers face a challenge when translating a wireframe into code: it often involves handing the design to a developer and having the developer implement the boilerplate graphical user interface (GUI) code. This work is time consuming for the developer and therefore expensive.

We also identified a gap in the research around the overarching problem: an application that directly converts wireframe sketches into code. Such an application has substantial advantages:

  • Faster iteration: with only the designer's involvement, a wireframe can move to a website prototype.
  • Accessibility: allows non-developers to create applications.
  • Removes the need for developer involvement in early prototypes, allowing developers to concentrate on application logic rather than boilerplate GUI code.

When applied to other areas, especially vision problems, deep learning has shown significant improvements over classical techniques. We hypothesise that a novel application of deep learning techniques to this problem can improve performance over conventional computer vision techniques.

Common Problems

First, we review recent research on applications that turn low-fidelity designs into code. SILK and MobiDev are desktop and smartphone applications, respectively, both of which use a variety of computer vision techniques to recognise shapes drawn on a digital surface as predefined components. Both have demonstrated that shapes can be robustly identified as user interface components, such as buttons and text, using computer vision techniques such as contour detection, corner detection and line detection. We identified two problems with these methods:

  • The applications are time consuming to extend with new components, because these techniques depend on the programmer's judgement to engineer features that identify every drawn representation a user might produce.
  • No work has been done on post-processing the detected components to correct errors in the drawing. The result is a crude representation of the sketch, in which objects are positioned exactly where they were drawn rather than where the software infers they were intended to be.

Machine Learning Techniques

Machine learning gives computer systems the ability to learn from data without being explicitly programmed. We consider machine learning a good fit for this problem, as there is a large amount of data and there are multiple classification and detection tasks. We are going to use a deep learning approach.


Our methods need a dataset of wireframe sketches paired with their associated website code. As in many machine learning projects, sourcing a quality dataset is a problem. To build such a dataset, we considered three options:
(i) Finding websites and manually sketching them.
(ii) Manually sketching websites and building the matching page.
(iii) Finding websites and automatically sketching them.

Our goal is to work on any wireframe of a website in general, so we wanted a dataset that reflected a wide range of wireframes. We start with normalisation.


Figure: The original webpage and its normalised version.

Each element class was filled with a flat colour; fonts and other colouring were removed; element widths were maximised; and JavaScript and animations were removed. This is needed to help the extraction of the structure of the website.

We also developed a website normalisation framework to overcome the problems mentioned above. We use PhantomJS, a scriptable headless browser.

Structure Extraction

Normalisation overcame a variety of problems with using real web pages. However, badly formatted code and the many ways of expressing the same structure remained a problem. We define a webpage's structure as the type, location, size, and hierarchy tree of its elements. To extract the structure from a website, we considered two approaches:

  • Parse the HTML directly. This solution involves many special cases and does not address the problem that there are many ways to represent the same structure. Further, correctly capturing the location of elements can be non-trivial, because CSS is able to move elements.
  • Use computer vision to extract the structure from a screenshot of the website. This addresses the problem of multiple ways of expressing the same structure: the layout is always rendered in the same way, so two similar web pages with slightly different structures in the markup will be handled identically.
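As a rough illustration of the second approach, the sketch below extracts one bounding box per element from a tiny colour-coded "screenshot" by flood-filling regions of equal colour. The string-grid representation, the colour letters and the name `extract_boxes` are assumptions made for this example only; a practical pipeline would run on rendered screenshots with an image-processing library.

```python
from collections import deque


def extract_boxes(pixels, background="."):
    """Return (colour, x_min, y_min, x_max, y_max) for each connected region.

    `pixels` is a list of equal-length strings; each character stands for
    the flat colour that a normalised element class was painted with.
    """
    height, width = len(pixels), len(pixels[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            colour = pixels[y][x]
            if colour == background or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected pixels of one colour.
            queue = deque([(x, y)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cx, cy = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and pixels[ny][nx] == colour):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((colour, x_min, y_min, x_max, y_max))
    return boxes


# Hypothetical 8x5 normalised page: T, B and I are three element colours.
screenshot = [
    "........",
    ".TTTT...",
    "........",
    ".BB..II.",
    ".BB..II.",
]
print(extract_boxes(screenshot))
```

Because the boxes come from the rendered pixels, two pages whose markup differs but whose rendering is the same produce the same list.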


Samples of sketched elements in our collection, from each of the five groups. To build a sketched version of a webpage, the elements of a real website are replaced with sketched elements.


We also developed a system to perform the pre- and post-processing needed to turn a picture taken with a camera into a live-updating website that renders the generated code. This makes our approaches easier to demonstrate and lets the experimental methods concentrate solely on the task of translating an image into code. The system is intended to be general and to allow future experimental approaches to be plugged in.

We assume the sketch is drawn in dark ink on a white medium, such as pen on paper or marker on a whiteboard.


Example of cropping a photograph to the sketch. This step takes a raw picture from a camera and returns a black and white picture of the drawing only. From left to right: the original input image; the image with all colours except white filtered out, together with the detected candidate contours; and the deskewed edge map of the chosen region.

Preprocessing is required to convert an image from a camera into an image that can be fed into the experimental system. Because of the camera position or the lighting conditions, the raw image must be cleaned up before it can be processed.
The primary challenges are:
(i) The sketch may not fill the whole frame, so the background must be removed.
(ii) The paper may be skewed or rotated.
(iii) Due to the lighting, the picture may contain noise or colour shifts.

To address these challenges we perform the following preprocessing. Because our requirements specify that the medium must be white, we converted the image to HSV and used threshold filtering to remove all colours except white in order to detect the paper. This significantly reduces the background noise. Since the paper contains the drawing in dark ink, those pixels are also removed by the threshold filter, so we applied a large median blur to fill the holes left by the drawing. We then applied Canny edge detection and dilated the edge map to close small gaps between the edges. Finally, we applied contour detection and selected the largest contour with approximately four sides.
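The first two steps, white thresholding in HSV and median blurring, can be sketched in plain Python as follows. This is a minimal illustration on a tiny image stored as nested lists of RGB tuples. The threshold values, the 3x3 window and the function names are assumptions made for the example, and a practical implementation would use an image-processing library on full-size photographs.

```python
import colorsys
from statistics import median


def white_mask(image, max_saturation=0.2, min_value=0.8):
    """Return a 0/1 mask that is 1 where a pixel is close to white in HSV."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(1 if s <= max_saturation and v >= min_value else 0)
        mask.append(mask_row)
    return mask


def median_blur(mask, radius=1):
    """Median filter with a square window, clipped at the image borders."""
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(height):
        out_row = []
        for x in range(width):
            window = [mask[ny][nx]
                      for ny in range(max(0, y - radius), min(height, y + radius + 1))
                      for nx in range(max(0, x - radius), min(width, x + radius + 1))]
            # Ties in an even-sized window are resolved in favour of "paper".
            out_row.append(int(median(window) >= 0.5))
        out.append(out_row)
    return out


# Hypothetical 7x5 photo: two columns of brown desk, then white paper
# with a single dark ink pixel on it.
desk, paper, ink = (120, 80, 40), (250, 250, 250), (20, 20, 20)
image = [[desk, desk] + [paper] * 5 for _ in range(5)]
image[2][4] = ink

mask = white_mask(image)
filled = median_blur(mask)
print(mask[2])    # the ink pixel appears as a hole in the paper mask
print(filled[2])  # the median blur fills the hole
```

The filled mask is what the later edge and contour steps would operate on to locate the outline of the paper.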

The output of the preprocessing is a deskewed binary edge map of the sketch. This is fed into the experimental method for processing.


The output of the experimental approach is code that reflects the wireframe structure.
There are three post-processing steps: 1) distribute the code generated by the experimental methods to clients; 2) convert the generated code into HTML; and 3) update the website live.

Note that the approaches do not create HTML directly. Instead, they use an intermediate domain-specific language (DSL) that describes a tree-like structure representing the wireframe. A DSL is used because the representation of the wireframe structure does not map directly onto HTML, since we have only five element types. A conversion step is therefore needed to translate our DSL to HTML, and we encapsulate this step in our system so that our experimental methods can concentrate solely on the task of transforming an image into code. We use JavaScript Object Notation (JSON) as the carrier syntax for our DSL.
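To make the conversion step concrete, here is a minimal sketch of turning a JSON-encoded tree into HTML. The article does not list its five element types or its DSL schema, so the type names, the `"type"` and `"children"` keys, the placeholder markup and the function `to_html` are all hypothetical.

```python
import json

# Hypothetical element types and the placeholder HTML used to render them.
TEMPLATES = {
    "title": "<h1>Title</h1>",
    "paragraph": "<p>Lorem ipsum</p>",
    "image": '<img alt="placeholder">',
    "button": "<button>Button</button>",
    "input": '<input type="text">',
}


def to_html(node, depth=0):
    """Recursively render a DSL node (a dict parsed from JSON) as indented HTML."""
    indent = "  " * depth
    if node["type"] == "container":
        children = node.get("children", [])
        if not children:
            return f"{indent}<div></div>"
        inner = "\n".join(to_html(child, depth + 1) for child in children)
        return f"{indent}<div>\n{inner}\n{indent}</div>"
    # Leaf elements map straight to their placeholder markup.
    return indent + TEMPLATES[node["type"]]


dsl = """
{"type": "container", "children": [
    {"type": "title"},
    {"type": "container", "children": [
        {"type": "input"},
        {"type": "button"}
    ]}
]}
"""
print(to_html(json.loads(dsl)))
```

Keeping the renderer separate from the experimental method means the markup for an element type can change without retraining or modifying the model.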

Deep Learning Segmentation

We use an artificial neural network (ANN) to learn the relationship between the wireframe image and a result image. The ANN transforms a wireframe image into an image that reflects its structure. This structure image resembles a normalised website. The normalised form defines elements and containers and reduces the effect of human error. We then use a post-processing step to convert the resulting image into code.

We chose to use an ANN to transform the wireframe into a normalised image because it provided a complete (detection, classification and normalisation) and feasible solution. We considered developing our own network, but we noticed that existing segmentation networks could be used for this task. The benefit of using an existing segmentation network was potentially better performance, as a large body of research already exists on developing and refining these networks.

Segmentation may not be an intuitive choice for this problem, but it covers:
• Element detection: the segmentation network groups together the pixels that belong to an object. The borders of these pixel groups can be extracted as element boundaries.
• Element classification: the segmentation network labels each group of pixels with a class. The labels correspond to the classes in the training set, so the network classifies the elements.
• Element normalisation: the pixel boundary does not have to follow the exact lines of the sketch. Distortions such as rotation and scaling can be corrected if these corrections are present in the normalised version.


The dataset consists of sketches and the corresponding normalised version of each website. For the segmentation network to learn from the normalised image, we transformed it into a label map, i.e. we labelled each pixel with the class of the element it represents. As our normalised images are 3-channel RGB images, we created a new single-channel image and mapped each RGB colour to a single value. We use the values 0 to 10 for the element classes, including the container and background labels.
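The colour-to-label conversion can be sketched as a simple lookup. The palette, the class ids and the class names in the comments below are hypothetical, since the article does not give the actual table; the image is stored as nested lists of RGB tuples to keep the example free of dependencies.

```python
# Hypothetical palette: the real colours and class ids are not given here.
COLOUR_TO_LABEL = {
    (255, 255, 255): 0,  # background
    (200, 200, 200): 1,  # container
    (255, 0, 0): 2,      # e.g. button
    (0, 0, 255): 3,      # e.g. text
}


def to_label_map(rgb_image, table=COLOUR_TO_LABEL):
    """Convert rows of (r, g, b) tuples into a single-channel label map."""
    try:
        return [[table[pixel] for pixel in row] for row in rgb_image]
    except KeyError as err:
        # A colour outside the palette means the page was not fully normalised.
        raise ValueError(f"unexpected colour {err.args[0]} in normalised image") from None


normalised = [
    [(255, 255, 255), (255, 0, 0)],
    [(200, 200, 200), (0, 0, 255)],
]
print(to_label_map(normalised))
```

Failing loudly on an unknown colour is a deliberate choice in this sketch: a silently mislabelled pixel would end up in the training targets.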


We fine-tuned an existing model pre-trained on ImageNet rather than training from scratch, for the following reasons:

  • We didn't have the resources or a large enough dataset to support training from scratch.
  • With an existing model trained on ImageNet, simple features such as edges and corners would already have been learned, giving the network a head start when learning more abstract features.
  • Although ImageNet consists mostly of real-world full-colour images and our dataset contains black and white 2D sketches, the basic features learned by the first layers of the network are transferable. Furthermore, we were not aware of any other suitable pre-trained model for 2D sketches that was compatible with Deeplab v3+.


The output of the segmentation is a single-channel image with pixels labelled from 0 to 10, matching the input labels from the pre-processing stage. Figure 26 gives an example of a partly coloured result. Elements may contain holes or imperfect edges, so we filter each element class in turn and apply a closing operation to close small gaps and an erosion operation to remove single-pixel lines linking multiple objects. We then apply contour detection and use the bounding boxes as the dimensions of the elements. From our list of elements, we use algorithm 1 to construct a hierarchical tree structure. The DSL hierarchical tree is then fed into the post-processing step of the system in order to produce the HTML.
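Algorithm 1 is not reproduced in this article, so the following is only one plausible way to build such a tree, not the author's method: process the bounding boxes from largest to smallest and attach each one to the smallest already-placed box that contains it. The box format `(x0, y0, x1, y1)`, the element names and the function names are assumptions for the example.

```python
def contains(outer, inner):
    """True if box `inner` lies inside box `outer`; boxes are (x0, y0, x1, y1)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def sort_children(node):
    """Order siblings top-to-bottom, then left-to-right."""
    node["children"].sort(key=lambda c: (c["box"][1], c["box"][0]))
    for child in node["children"]:
        sort_children(child)


def build_tree(elements, page_box):
    """Nest (label, box) pairs by containment, largest boxes first."""
    root = {"type": "page", "box": page_box, "children": []}
    placed = [root]
    for label, box in sorted(elements, key=lambda e: area(e[1]), reverse=True):
        node = {"type": label, "box": box, "children": []}
        # The smallest already-placed box that contains this one is its parent.
        parents = [p for p in placed if contains(p["box"], box)]
        parent = min(parents, key=lambda p: area(p["box"])) if parents else root
        parent["children"].append(node)
        placed.append(node)
    sort_children(root)
    return root


def outline(node):
    """Compact view of the tree: (type, [child outlines])."""
    return (node["type"], [outline(child) for child in node["children"]])


# Hypothetical detections on a 200x200 page.
elements = [
    ("button", (60, 120, 140, 150)),
    ("container", (10, 100, 190, 180)),
    ("title", (10, 10, 190, 40)),
    ("input", (20, 120, 50, 150)),
]
tree = build_tree(elements, page_box=(0, 0, 200, 200))
print(outline(tree))
```

Sorting by area first guarantees that any box that could act as a parent has already been placed when its children are processed.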