And the fun begins …

More good news since my last post: we’ve opened our application forms and created a proper website for our school: http://www.pixelcodehouse.com/. The website has its own blogging platform as well, so from now on that’s the place to look for new Code House related posts.

We used Amazon Web Services for this. A micro EC2 instance was free of charge, so we decided to use it as a trial for our website. Since we were already there, we also opted for a preconfigured WordPress website (powered by Bitnami, which is available on the free-tier EC2 instances) to make our job easier. Finally, registering the domain name took just a few clicks in the Amazon Console (we used the Route 53 service). All of this cost only the usual 10 euros (the yearly domain price), and within a few minutes the whole setup was done. Thanks, Amazon!

This all makes me very enthusiastic – however, there is still a lot to do, and a lot to clarify.

I wonder how many people will be interested. Will we be able to capture their interest? How much knowledge will they be able to absorb? How hard will it be to teach them to think in algorithms? Will they love it, even if it is hard at the beginning? Can we make the classes and homework enjoyable and fun? Will they have the motivation and perseverance to finish what they started?

And so many other questions…

I feel like this is a good opportunity for me to do what I’ve always wanted to do. So many emotions (yeah, I’m still a woman 🙂) – joy, enthusiasm, curiosity, fear, worry, and the can’t-wait-for-it-to-happen feeling – are all so alive inside me. One more month and we will see who our applicants are, and a month after that the course will already have started.

How exciting! Until then, we are busy putting a good program together – we already have the topics and a daily schedule, but there is still a lot to decide on!

I might write some notes here about the topics and the how to’s… So, yes, even if the Code House’s blog has moved, I will still keep this platform alive – as my personal technical playground. 🙂

Pixel Code House – coming soon

It’s been a while since we first announced our summer school program – two months of full-time study in the field of IT, which prepares you for a junior backend developer position.

Since then, we have registered our company under the Pixel Code House name, and a lot of the preparation is already done.


In the previous post we wrote about the motivation behind it. Now, let’s answer some of the questions you might have.

The WHAT?

The magic word you want to know is JAVA ♥.

But more than that: we will teach you how a computer is built, why an operating system is needed, what a program actually is, and how programmers think when it comes to problem solving. You will get to know the basic programming concepts and paradigms and the differences between programming languages, while we focus on JAVA out of all of them. We will also talk about how the internet works and explain what happens behind the scenes when you use a web application (like Facebook or Gmail). You will learn about software backend components and how to develop them using the Spring framework.

There are other “magic” topics that we will also cover, as we put all of these together in a way that makes sense and helps you understand what programming is really about. At the end of the summer school program, each of you will be able to write a Twitter-like web application. So far, we call it Brkr (BarkR). 🙂

The WHEN?

As we already told you, the program requires two months of full-time dedication, meaning 6-7 hours a day. We will start at the beginning of August and finish at the end of September. During this time we plan to cover some theory and do a lot of practical exercises. Each day you will have some homework as well, which will help you realise what you have understood and where your weakest points are. This adds up to 8-10 hours of work per day. It might sound like a lot, but you will still have time to relax every day, and we plan to leave your weekends untouched. 🙂

Even though our competitors in Cluj offer after-work IT programs, we opted for a full-time one. This is mainly because we believe that a fresh and dedicated mind – which we hope you will have 🙂 – leads to better results. We don’t want you to be exhausted and stressed; instead, we want to have fun together and enjoy the moments of discovery. Even so, you must be prepared for the challenges of getting into an unknown field, so you will need to be dedicated and persistent. But don’t panic: we will be around to give you a hand when you are close to giving up. 😉

The WHO?

The mentors who will make sure you spend your time in the code house in a useful, yet enjoyable manner:


Endre (Sükösd): works as a Software Developer in Madrid for Amazon, the retail giant and technology company. He is involved in developing backend systems related to product pricing (using Java 8 and Spring). He previously worked at GE developing medical software, after studying at the University of Technology and Economics in Budapest. He is a fun (and a bit crazy 🙂) person to spend time with, with a strong interest in sharing all the knowledge he has gained over the years. He is someone who has a lot to offer while keeping a good and fun atmosphere.

Berni (Varga): works as a Software Developer at iQuest in Cluj-Napoca, developing the backend for the Financial Times UK. She currently uses cutting-edge technologies in her daily work (Docker, microservices, Fleet, Golang, Java 8); her previous experience was with Natural Language Processing, semantics and a bit of Machine Learning. She is passionate about anything that has to do with teaching and knowledge sharing. She has always done mentoring alongside her work: lecturing at the Technical University, running internship programs at the companies she worked at, or giving private study hours. You will love the interest and dedication she shows while spending time with you. 🙂

More teachers to come: There will be other mentors too, who will help you deepen your understanding of the subjects we’ve already prepared. In this way, you will get to know more programming perspectives and see different kinds of joy and motivation as they teach and work with you. We believe that all these experiences will help you step confidently and well prepared into the land of programming. You will hear more about our mentors soon.

Other THEs:

We hope we have sparked your interest and that you will apply soon.

Remember:

  • You don’t need any IT-specific knowledge to enroll in our course; however, you will have to pass some logic tests and interviews. Don’t be scared, we just want to know where we are starting from.

  • The program will be in English. Don’t panic if you are not proficient at it – we all started somewhere, but keep in mind that you need basic understanding and communication skills to be able to follow the course.

  • We want to give special attention to all our students, so we plan to have a small group of 6 to 10 people. Hence: apply as quickly as possible! 🙂

We will make our application forms publicly accessible on the 15th of June. The application form will contain proper pricing information.

The beginnings of an IT school

Now that IT has become so popular (and, let’s face it, well paid) even in Eastern European countries, many people are considering changing jobs and getting into the IT field.

I have been working as a programmer for a while, but I have always had a strong calling to teach. Yes, I already have some experience with it (private study hours, and even lecturing at the Technical University over here).

I could list the pros and cons of the above activities, but the short version is this: I really felt that I didn’t want to give up programming, and at the same time I would very much like to pass on some of the knowledge I have gained.

To inexperienced people? Well, why not.

A whole list of arguments could come here, all against the above idea. I don’t want to enumerate them. You probably already have some in mind. I heard many while talking about these plans with the people around me.

However, I would like to share some of my pros here.

  • I love to teach. I always have, and I always will. Being in the middle of a classroom, among people and students, makes me feel that this is the place where I belong. This won’t ever change, even if I never become a “proper”, “professional” teacher.
  • Many people out there really have the potential to grow, but they might not have the time, money or possibility to give up their jobs and start a three-year university program. Sure, everything has its costs: time and money. But why couldn’t we offer them an easier option? Easier, but still high quality.
  • The programming field has significantly raised our living conditions – at least, this is the case here in Romania. While I grew up seeing my parents count every leu (the Romanian currency) in the house, sometimes finding it difficult even to afford the family’s daily bread, I can now have a car, I could get a loan from the bank to buy an apartment, and I don’t find it difficult to enjoy my daily lunch out in the city center. I’m not saying I am rich. I still have difficulties, especially now, with the bank loan. But my living conditions are far better than those of my parents (and many of my friends). Well, yes, it is about money. But let’s face it: we need that money to reach some of our more important goals.
  • I’ve always found discovering things and solving riddles so, so enjoyable. The good news is that I get some of this in my everyday work. I can always see new things around me, I can always pick from a variety of solutions, I can always improve something that already exists, and I can always go deeper. Always. I just have to dare. So, yes, I find it very interesting and enjoyable. People with similar interests can feel the same way about it. They just have to get a taste.
  • There is still a big shortage in the area. Well, this is the case here in Cluj-Napoca, Romania. More and more IT companies are established here or move in, and they all need people (good people!) to work for them. Our school will, at least in this phase, only prepare junior-level developers, but we intend to provide a deep understanding of the basics of the technologies they will work with. We believe that they will be able to grow quickly at the company that hires them. We also plan to provide support and interview opportunities where they can show everything they have learnt during their time with us.

There would be much more to add here. But in short: we believe that there is still room, both for us as an IT school company and for the people who join us, to experience “greater times”.

We plan to start our first training during the summer. Two months of full day study, followed by interview processes, and finally – starting a career as a junior developer.

How exactly will this all happen? What will we teach? Who is behind the idea? What are the risks and what are the proper benefits for our students?

We will share all of these details – soon.

Elasticsearch tips – Inconsistent search results

We found that executing the same query multiple times returned different responses.

How could this happen?

A bit of theory

For results with identical scores, the order of the elements can differ, depending on how the queried node arranges them. This is expected unless you use the preference parameter, which makes requests from the same user run against the same shard copies. This provides consistency from the user’s point of view.
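To make the preference idea concrete, here is a minimal sketch of how a search URL could carry a per-user preference value. The host, the index name `places` and the helper `search_url` are made up for illustration; only the `preference` query parameter itself comes from Elasticsearch.

```python
import hashlib
from urllib.parse import urlencode


def search_url(base_url: str, index: str, user_id: str) -> str:
    """Build a search URL whose preference value is stable for a given user."""
    # Hash the user id so the value is opaque. A hex digest also never starts
    # with "_", a prefix Elasticsearch reserves for its built-in preferences.
    token = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:16]
    query_string = urlencode({"preference": token})
    return f"{base_url}/{index}/_search?{query_string}"


if __name__ == "__main__":
    # Hypothetical host and index, for illustration only.
    print(search_url("http://localhost:9200", "places", "user-42"))
```

The same user always gets the same token, so repeated searches should land on the same shard copies; different users can still be spread over primaries and replicas.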

It is much more inconvenient when the scores themselves differ depending on which node the request ends up on, and on whether a replica or a primary shard gets queried. In this case we have a primary-replica shard inconsistency. This is what happened to us.

If you are lucky, the cluster is in a warning state, so it lets you know in advance that there is a problem – but this wasn’t the case for us. Let’s go deeper now and look at the reasons behind it.

Things we could compare:

  • number of documents in a shard and its replica
  • deleted docs in a shard and its replica
  • max_docs (total number of documents) in a shard and its replica

How did this hit us?

In our case, the cluster was green, because the number of documents in the primary shard and in its replica was the same.

However max_docs (= docs + deleted docs) differed.

The number of deleted documents differed between the primary and its replica, and since deleted documents contribute to the inverse document frequency calculation, this led to different scores.

The standard cluster health endpoint does not cover this case, which prevented us from noticing the replication problem in advance. However, such a check is easy to implement by comparing this number across every primary shard and its replicas inside the cluster.
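Such a check could look roughly like the sketch below. It assumes you have already fetched and parsed shard-level statistics (for example from `GET /<index>/_stats?level=shards`) and that the response has the shape used in the sample: `indices` → index name → `shards` → shard number → a list of copies, each with `docs.count` and `docs.deleted`. Please verify that shape against your own cluster version; the function name and the sample data are invented for illustration.

```python
def find_mismatched_shards(stats: dict) -> list:
    """Return (index, shard, field, values) for shard copies that disagree."""
    mismatches = []
    for index_name, index_stats in stats.get("indices", {}).items():
        for shard_id, copies in index_stats.get("shards", {}).items():
            for field in ("count", "deleted"):
                # Collect the distinct values reported by primary and replicas.
                values = sorted({copy["docs"][field] for copy in copies})
                if len(values) > 1:
                    mismatches.append((index_name, shard_id, field, values))
    return mismatches


# Invented sample in the assumed response shape: shard 0 has a replica whose
# deleted-docs count differs from the primary, shard 1 is consistent.
SAMPLE_STATS = {
    "indices": {
        "places": {
            "shards": {
                "0": [
                    {"routing": {"primary": True}, "docs": {"count": 100, "deleted": 7}},
                    {"routing": {"primary": False}, "docs": {"count": 100, "deleted": 0}},
                ],
                "1": [
                    {"routing": {"primary": True}, "docs": {"count": 90, "deleted": 3}},
                    {"routing": {"primary": False}, "docs": {"count": 90, "deleted": 3}},
                ],
            }
        }
    }
}

if __name__ == "__main__":
    for mismatch in find_mismatched_shards(SAMPLE_STATS):
        print(mismatch)
```

A difference in deleted documents can also be short-lived, because each shard copy merges its segments on its own schedule, so a difference that persists over several checks is the more interesting signal.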

How did we fix it?

There isn’t an easy way to force a healthy-looking replica to resynchronise. We found two options we could go for.

1. Rerouting or reallocation

Force the reallocation of a shard (in our case a replica). While the reallocation is carried out, a resync is also applied, which fixes the corrupted replica.

However, /reroute is not an allowed operation on the Amazon Elasticsearch Service, so we had to pick a harsher solution.

2. Setting replicas to zero

It’s not the most elegant solution we could think of, but it solved our problems. The number of replicas can be modified at any time, even after an index has been created. A possible way to fix the corrupt replicas is to set the number of replicas to zero, wait for the change to apply, then set the number of replicas back to one (or to whatever number of replicas you want to have).
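The steps above can be sketched as a small request plan. This only builds the requests and does not send them: the index name and the helper `replica_reset_plan` are hypothetical, and on Amazon’s service you would still need to sign and send each request yourself. The `number_of_replicas` setting and the `wait_for_status` parameter of the cluster health endpoint are standard Elasticsearch.

```python
import json


def replica_reset_plan(index: str, replicas: int = 1) -> list:
    """Return the (method, path, body) requests for dropping and restoring replicas."""

    def settings_body(count: int) -> str:
        return json.dumps({"index": {"number_of_replicas": count}})

    # Blocks until the index is green again, or until the timeout expires.
    health_path = f"/_cluster/health/{index}?wait_for_status=green&timeout=60s"
    return [
        ("PUT", f"/{index}/_settings", settings_body(0)),         # drop all replicas
        ("GET", health_path, None),                               # wait for the change
        ("PUT", f"/{index}/_settings", settings_body(replicas)),  # recreate replicas
        ("GET", health_path, None),                               # wait for the fresh copies
    ]


if __name__ == "__main__":
    for method, path, body in replica_reset_plan("places"):
        print(method, path, body or "")
```

Keep in mind that while the replica count is zero the index has no redundancy, so this is better done at a quiet time.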

This just worked. It’s annoying, though, that AES (Amazon Elasticsearch Service) did not complain while the delete operations were not correctly applied to the replicas.

Elasticsearch tips – Poor result relevance

In recent months I’ve been developing a Search API, using Amazon Elasticsearch Service in the background.

Elasticsearch is a well-known, widely used and well-documented product. It was very simple to get it running. However, we encountered some interesting behaviour, which prompted us to dig deeper.

My first posts will cover two problems we had to face: poor result relevance and inconsistent results. Our findings and solutions can be useful for any of you interested in Elasticsearch, because the topics don’t focus on the Amazon ES implementation.

And now, let’s get into the details.

Poor result relevance

After the first iteration of our implementation (we mostly went with the basic ES cluster settings and default mappings) we realised that our free-text search results were not relevant enough.

When searching for “London”, for example, we received a lot of organisations with “London” in their names, but the location “London” was not in the first 50 results.

How could this happen?

A bit of theory

Elasticsearch calculates the relevance score for free-text search based on field statistics such as term frequency, inverse document frequency and field-length norm.

  • Term frequency describes how frequently a term appears inside a text: the more often, the more relevant the document is.
  • Inverse document frequency balances the above metric, following the idea that the more often a term appears across all the documents, the less relevant it is for our specific search. This means that if a term is present in all the documents, it is most probably a general one that adds no value to our search, so it carries less weight in scoring.
  • Field-length norm assigns a higher score to matches inside a short field than to matches found in a longer one.
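As a rough illustration of these three ingredients, here is a sketch that uses the formulas of Lucene’s classic TF/IDF similarity. That this is the similarity in play is an assumption: newer Elasticsearch versions default to BM25, which combines the same ingredients with different formulas. The function names are made up for this example.

```python
import math


def term_frequency(occurrences: int) -> float:
    """More occurrences of the term in the field give a higher value."""
    return math.sqrt(occurrences)


def inverse_document_frequency(num_docs: int, doc_freq: int) -> float:
    """The more documents contain the term, the lower the value."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_length_norm(num_terms: int) -> float:
    """The shorter the field, the higher the value."""
    return 1.0 / math.sqrt(num_terms)


if __name__ == "__main__":
    # Invented numbers: a rare term versus a common one among 1000 documents.
    print("rare term idf:  ", inverse_document_frequency(1000, 9))
    print("common term idf:", inverse_document_frequency(1000, 99))
    # A match in a 4-term field versus a match in a 100-term field.
    print("short field norm:", field_length_norm(4))
    print("long field norm: ", field_length_norm(100))
```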

In order to better understand the impact of the above scoring mechanism, we must understand how Elasticsearch splits its content between shards and replicas.

  • An element of the content inside an index is called a document.
  • Documents are split over multiple nodes (physical units).
  • A shard is a collection of documents that makes it possible to distribute the data across the nodes.
  • A replica is a copy of a primary shard.
  • On a single node there can be multiple shards (both primaries and replicas).


And now, let’s get back to our metrics. The inverse document frequency is calculated over the documents inside a shard, and not over all the data inside an index. With longer text fields and a large total volume of documents, the differences between shards should balance out.

How did this hit us?

The discrepancy in the “London” case was caused by exactly the shard/document distribution described above. It happened that the location document “London” landed in a shard containing numerous other documents with “London” in their names. In the other shards there were only a few documents with “London”, so those documents ended up with higher scores and came up as more relevant in the final result set.
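The effect can be reproduced with a few invented numbers. The sketch below assumes the classic Lucene TF/IDF formula for inverse document frequency and two hypothetical shards of equal size; the document counts are made up and are not our real data.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene inverse document frequency, computed per shard."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical shard statistics:
# shard name -> (documents in the shard, documents containing "london")
SHARDS = {
    "shard_0": (10_000, 900),  # the shard where the location "London" landed
    "shard_1": (10_000, 12),   # a shard with only a few "London" documents
}


def idf_per_shard(shards: dict) -> dict:
    """Compute the idf of the term separately for every shard."""
    return {name: idf(total, matching) for name, (total, matching) in shards.items()}


if __name__ == "__main__":
    for name, value in idf_per_shard(SHARDS).items():
        print(name, round(value, 3))
```

The same term gets a lower weight in the crowded shard than in the sparse one, which is why a perfectly good match can sink in the merged result list.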

How did we fix it?

It is very important not to have more shards than required for an index. Some advice about shard optimisation is described in this article.

The main ideas here are:

  • max shard size should be between 30 and 32 GB
  • total number of shards should be between 1.5 and 3 times the number of nodes
  • in our case we have 5 shards/index for 64 GB – a bit high, but it should do
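The arithmetic behind these rules of thumb is simple enough to sketch. The helper names are made up, and the node count in the example is hypothetical, since it depends on your cluster.

```python
import math


def shard_size_gb(index_size_gb: float, shards: int) -> float:
    """Average shard size if the index is spread evenly over its shards."""
    return index_size_gb / shards


def min_shards_for_size(index_size_gb: float, max_shard_gb: float = 32.0) -> int:
    """Fewest shards that keep every shard at or below the size limit."""
    return math.ceil(index_size_gb / max_shard_gb)


def shard_count_range(nodes: int) -> tuple:
    """Suggested total shard count: 1.5 to 3 times the number of nodes."""
    return (1.5 * nodes, 3.0 * nodes)


if __name__ == "__main__":
    print("average shard size (GB):", shard_size_gb(64, 5))
    print("minimum shards for 64 GB:", min_shards_for_size(64))
    # Hypothetical two-node cluster.
    print("suggested shard range for 2 nodes:", shard_count_range(2))
```

For our 64 GB index, 5 shards means about 12.8 GB per shard, while 2 shards would already respect the 32 GB limit; that is why we call 5 a bit high.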

We realized that we could not fully fix the relevance by adjusting the shard configuration alone, so we decided to double the score of exact matches relative to partial ones, using query boosting.
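A boosted query of this kind could look roughly like the sketch below. It is not our exact query: the field names `name` and `name.keyword` are hypothetical and assume that the mapping has an unanalysed sub-field for exact matching. The `bool`, `match` and `term` queries and the `boost` option are standard Elasticsearch query DSL.

```python
import json


def boosted_query(text: str, field: str = "name",
                  exact_field: str = "name.keyword",
                  exact_boost: float = 2.0) -> dict:
    """Build a search body in which exact matches weigh more than partial ones."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Partial (analysed) match on the free-text field.
                    {"match": {field: {"query": text}}},
                    # Exact match on the unanalysed field, with a higher weight.
                    {"term": {exact_field: {"value": text, "boost": exact_boost}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(boosted_query("London"), indent=2))
```

A document called exactly “London” satisfies both clauses, and the boosted clause counts twice as much as it otherwise would, so it should rise above documents that merely contain the word. The final score is not literally doubled, since it also depends on the other clause and on the shard statistics discussed above.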

***

This was only one example from my exciting Elasticsearch journey. The Amazon experience was also quite pleasant. Maybe I will add a post about that some day.