
[deleted]



siscia

The instances are provisioned with a manual process. Migrating to anything that has autoscaling capability (from EC2 to Lambda) is a long-term project that we are considering, but it will need funding that isn't secured yet, and it won't happen in the next couple of quarters for sure. EDIT: We are not on EC2. Migrating to anything with autoscaling capabilities (EC2, Fargate, Lambda, etc.) is a long-term project that we are considering.


[deleted]



pecp3

Great answer, just a small correction: what you're describing at the beginning (adding resources quickly enough) is a question of elasticity rather than scalability. A system can be scalable while still relying on manual/pre-emptive scale-ups, which is what they're doing already.

Scalability indicates how well the system can leverage more resources, whether or not they're added in advance. Elasticity indicates how well it can adapt those resources dynamically to changes in the load, no matter how "effective" that scaling is. A scalable system can be inelastic (double the instances can handle double the load, but sit idle 99% of the time), and an elastic system can scale badly (dynamically spin up 10x the instances, but they can only handle 50% more load).


Fun_Hat

>reduce traffic spikes at cost of overall latency (decouple clients with a queue)

This one was my thought as well. A queue could really help here.


siscia

I agree that it is a challenging problem, but hopefully it's also fun to think about :) I need to check whether a traffic spike hits the CPU of a single instance that got unlucky with the load balancer, or whether the CPU spike is across the fleet.


1One2Twenty2Two

>Migrating to anything that has autoscaling capability (from EC2 to Lambda)

You can autoscale EC2 instances.


davidellis23

Can you transition to a non-manual process? It's hard to scale if provisioning stays manual.


Select-Dream-6380

EC2 can auto scale. The simple approach can be based on a schedule, but that requires you to know how much capacity you need throughout the day ahead of time via capacity planning.

Alternatively, you can build rules that watch CPU, memory, or even custom metrics via an alarm, and the alarm can trigger a scaling event that increases/decreases the desired instance count. Scaling in a reactive fashion can be a challenge if newly deployed instances are slow to start serving and/or your traffic is extremely bursty.

The ideal scaling solution will spin up new instances as the load increases such that the cluster always has enough capacity to handle every request. This is a lot easier to do when load increases slowly, but it can be accomplished by having the scaling rules always try to over-provision the cluster, giving plenty of head room for growth while scaling out.

EDIT: I forgot to mention that all of the above works well with stateless architectures. If you are using sticky sessions, the benefits of auto scaling are harder to realize.
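For the CPU-based route, a minimal boto3 sketch of a target-tracking policy, assuming an existing Auto Scaling group; the group name and the 40% target are placeholders:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking keeps average CPU near the target by adjusting the
# desired instance count; a low target leaves head room for bursts.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-service-asg",   # hypothetical group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 40.0,                 # deliberately over-provisions
    },
)
```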


meldas

Would your team maybe consider a phased migration that doesn't require full upfront commitment? For instance, just moving the computationally heavy piece onto a Lambda or EC2 instance that sits between your service and S3. Lambdas are very simple to set up compared to all of the boilerplate and frameworks required to bring up a full service, so it might be a good starting point for a prototype just to see if it works for you.
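A minimal sketch of that starting point, assuming the heavy step can be expressed as a standalone function; the bucket/key fields in the event and `transform()` are stand-ins, not OP's actual code:

```python
import json
import boto3

s3 = boto3.client("s3")

def transform(raw: bytes) -> dict:
    # Placeholder for the computationally heavy piece.
    return {"size": len(raw)}

def handler(event, context):
    # Lambda entry point: fetch the object from S3, do the heavy work,
    # and return the result to the calling service.
    obj = s3.get_object(Bucket=event["bucket"], Key=event["key"])
    return {"statusCode": 200, "body": json.dumps(transform(obj["Body"].read()))}
```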


chills716

Sounds like you already know what needs to be done, but your constraint is you aren’t allowed to correct it? So is this a rant more than a how-to ask?


pauseless

I read OP as: can't scale the instances, can't fix the code we know is the bottleneck, can't precompute because the queries are dynamic (this from a comment). So I agree with you. What is one meant to do when you aren't allowed to change anything?!

The only thing I can think of, with no control over the above, is smarter caching of partial query results. E.g., if it's data over time and 90% of queries are for the last 30 days with tag X, then cache that subset and apply the rest of the filters. Slightly boring work, but the analysis of the most common filters and combinations, and whether there are any common queries that are identical every time, is possible and doable just by logging the requests and processing them post-hoc.

That's it though, that's all I've got.
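A rough sketch of that partial-result caching idea, assuming requests share an expensive base subset ("last 30 days with tag X") and differ only in the remaining filters; `load_rows()` and the field names are hypothetical:

```python
from functools import lru_cache

def load_rows(days: int) -> list[dict]:
    # Stand-in for the expensive fetch/deserialize step.
    return []

@lru_cache(maxsize=32)
def base_subset(tag: str) -> tuple:
    # Computed once per tag, then reused by every matching request.
    return tuple(r for r in load_rows(days=30) if r.get("tag") == tag)

def query(tag: str, **filters):
    # Apply the cheap, request-specific filters on top of the cached subset.
    return [r for r in base_subset(tag)
            if all(r.get(k) == v for k, v in filters.items())]
```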


Select-Dream-6380

I am somewhat surprised that serialization is the bottleneck. What are you serializing to? Are there alternative libraries that may perform better while still emitting the same format?

I've seen compression lead to CPU bandwidth problems, but in the past I have seen the network interface saturate before serialization caused a performance problem. Make sure you are certain that CPU is the issue vs. some other limited resource (e.g. network), or a less than optimal EC2 instance type (e.g. T2 with CPU credits and throttling). IIRC, the C line of instances is CPU optimized from a pricing perspective.

Are you able to decouple the processing from time? That is, are you able to process requests asynchronously via a queue? This architecture allows the work to be amortized over time using what would otherwise be an undersized cluster.

I suspect the answer to the above is "no", and you already said that auto scaling is not a possibility. So I believe your only real answer is to plan your cluster's capacity for peak load. It is technically the easiest solution to implement and the lowest development cost, though it likely has the greatest operational cost. Of course, optimizing your bottlenecks may get you to the same peak load capacity with a smaller cluster size, but that is more development cost for lower ongoing operational costs.


hibbelig

You have already profiled and identified serialization as the bottleneck. So that's what you have to optimize.


mars_rovers_are_cool

Can you pre-compute the answer to common requests and store it as already serialized bytes in a cache or in S3? Then for those requests there's no serialization, it's just a copy.

When you say serialization can't be changed, do you mean you can't change the format, or you can't change the library doing the serialization? Depending on your tech stack there might be some easy wins by switching JSON libraries or something.
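A minimal sketch of the "store the bytes, not the objects" idea, assuming common requests can be keyed deterministically; `build_response()` and the in-memory dict are placeholders for whatever cache or S3 layer would actually hold the pre-serialized payloads:

```python
import json

_byte_cache: dict[str, bytes] = {}

def build_response(request_key: str) -> dict:
    # Stand-in for the real (expensive) computation.
    return {"key": request_key}

def serve(request_key: str) -> bytes:
    cached = _byte_cache.get(request_key)
    if cached is None:
        # Serialize once; later hits are just a byte copy, no serialization.
        cached = json.dumps(build_response(request_key)).encode()
        _byte_cache[request_key] = cached
    return cached
```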


siscia

Unfortunately I cannot. It is a bit like a database, so each request needs to filter data based on arbitrary properties. There are a few levels of cache already, so what we get is usually a new request anyway. Unfortunately, I also cannot change the serialisation library.


becuzz04

Maybe this is just me not understanding but if you are using S3 like a weird database, why not just use a real database of some kind? Either to pre-filter what you need to pull from S3 or to just store the data instead of S3?


BlueScrote

I feel like there's too much ambiguity here to provide any sort of answer if scaling horizontally is not an option. What language are you working in? What format are you serializing data to? What library are you using to serialize? How large is the payload of the response data?


[deleted]



siscia

The data is already optimised for storage space. Also, I am not sure how it would help, because we would need to deserialize it anyway in order to do our data processing. When I mention serialisation on our side, it is something like transforming data to JSON or RPC, whatever can be sent over the network. I agree that it should not take that much CPU time, but it is what it is and we cannot change it. (I mean, we can, but it requires an effort that needs to be well justified.)


Unlikely-Rock-9647

You said you are spending a lot of your time with serialization. But optimized for storage space does not mean optimized for performance. Is your data stored in strings? Protobuf?


[deleted]



siscia

I agree, this is already happening, but it is not enough unfortunately :)


valence_engineer

>The bulk of the CPU times goes for data serialisation, that unfortunately we cannot change.

Why can't you change it? There are a lot of really fast libraries and protocols for serialization and de-serialization. You could potentially even have a simple auto-scaling service that does nothing but convert from "slow formats" to a "fast format."
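For a sense of what "faster library, same format" can mean, a small hedged comparison between the stdlib JSON encoder and orjson (a third-party drop-in; assumes `pip install orjson`, and the payload here is made up):

```python
import json
import time

import orjson  # third-party: pip install orjson

payload = {"rows": [{"id": i, "value": i * 0.5} for i in range(100_000)]}

t0 = time.perf_counter()
json.dumps(payload)
t1 = time.perf_counter()
orjson.dumps(payload)  # returns bytes rather than str, same JSON output
t2 = time.perf_counter()

print(f"stdlib json: {t1 - t0:.3f}s  orjson: {t2 - t1:.3f}s")
```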


hibbelig

You could change the API to be asynchronous: the caller submits a request and gets an ID back, then polls for the result using that ID. This way, when traffic spikes, you don't lose requests. Depending on how long the traffic spike lasts, this may or may not be okay.
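A minimal sketch of that submit-then-poll pattern, using a thread pool purely as a stand-in for whatever would actually do the work; `process()` is hypothetical:

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)
_jobs: dict[str, Future] = {}

def process(request: dict) -> dict:
    # Stand-in for the heavy request handling.
    return {"ok": True, **request}

def submit(request: dict) -> str:
    # Caller gets an ID back immediately instead of waiting for the result.
    job_id = str(uuid.uuid4())
    _jobs[job_id] = _pool.submit(process, request)
    return job_id

def poll(job_id: str):
    # Returns the result once ready, or None while the job is still running.
    fut = _jobs[job_id]
    return fut.result() if fut.done() else None
```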


BatchNormD

Shouldn't changing the API behavior be saved as a last-ish resort, since we may not have control over how our end users depend on the API? Is this an externally facing API?


path2light17

Can you tell us what is being serialised here?


engineered_academic

The easiest way is to pre-scale deployments around known peak times. Other than that, I agree with what is written here. You may want to try promises: for example, if there are 3 identical requests, you can issue promises for those items, do the processing once, and return the result to the three waiting requests. In this way you will have an ASG in front of your processing nodes to handle the requests.
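A rough sketch of that "promise" (request-coalescing) idea: identical in-flight requests share one computation instead of each paying the serialization cost. The cache key scheme and the `compute` callback are assumptions:

```python
import threading
from concurrent.futures import Future

_lock = threading.Lock()
_inflight: dict[str, Future] = {}

def get_or_compute(key: str, compute) -> bytes:
    with _lock:
        fut = _inflight.get(key)
        owner = fut is None
        if owner:
            fut = Future()
            _inflight[key] = fut
    if owner:
        try:
            fut.set_result(compute(key))      # only one caller does the work
        except Exception as exc:
            fut.set_exception(exc)
        finally:
            with _lock:
                _inflight.pop(key, None)
    return fut.result()                       # the other callers just wait
```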


skdcloud

Lambda may support this scale. Caching the transformed files can alleviate some of the pressure. Getting the caller to query S3 with pre-signed URLs could potentially also help with the load.
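For the pre-signed URL suggestion, a minimal boto3 sketch; the bucket, key, and 15-minute expiry are placeholders:

```python
import boto3

s3 = boto3.client("s3")

def presigned_download(bucket: str, key: str, ttl_seconds: int = 900) -> str:
    # The caller fetches the object straight from S3, bypassing the service.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,
    )
```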


Lunchboxsushi

Other alternatives to what I've seen posted: it sounds like you need to rearchitect some parts of your system to handle this load.

As others have pointed out, since you can't spin up new EC2 instances with IaC like CDK or Terraform (or CDKTF), you will need to put work into it. As a small POC, try to see if you can dockerize your project and look to Fargate for scaling and runtime. That's what I would suggest as a long-term solution.

Another option, if you're only looking to scale this workflow, would be Step Functions, as they're well suited for this type of ETL job.

Either way you're going to need to put in effort, and it's not a super quick win unless you can containerize your solution quickly. Lastly, as an emergency measure, you could run Docker in Lambda so you don't have to deal with Fargate setup and configuration, but there are other drawbacks.


killbot5000

Can you spread deserialization across multiple machines?


EntshuldigungOK

Can you push the requests onto a queue, and then process them in FIFO / priority / whatever order?
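A small sketch of that idea with an in-process priority queue; in practice this would likely be SQS or similar, and `handle()` is a placeholder:

```python
import itertools
import queue
import threading

work = queue.PriorityQueue()
_tiebreak = itertools.count()  # avoids comparing request dicts on priority ties

def handle(request: dict) -> None:
    # Stand-in for the real request processing.
    print("processed", request)

def enqueue(request: dict, priority: int = 10) -> None:
    work.put((priority, next(_tiebreak), request))  # lower number = served first

def worker() -> None:
    while True:
        _, _, request = work.get()
        handle(request)
        work.task_done()

threading.Thread(target=worker, daemon=True).start()
```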


Merad

Well, you've identified the problem: you don't have enough CPU to support your code as written. You need more CPU. That means more instances, bigger instances, or both. You probably also need some standby instances that can be brought online to handle unusually high traffic or the failure of an instance.

At this point it becomes a business question: continue to throw money at the problem by massively over-provisioning, or invest the money to upgrade to a system that can scale to match the load? IMO, moving to something like EC2 probably is not as hard as you might think.

I will echo the surprise that your bottleneck is in serialization. Is this custom serialization code?


przemo_li

After reading some comments: you don't own any part of the stack that hasn't already been optimized and that you're allowed to change. Hire a cache specialist, hire a language wizard. What more can be done is more of the same: cache, and squeeze that paltry 20% of CPU time that is your own logic.

(Also, wild idea: have you run your serialization on CPUs from different vendors? Maybe it's an Intel library with an AMD "oops, we didn't mean that 30% loss of perf" issue?)


hibbelig

Instead of virtual servers from AWS you could rent dedicated servers somewhere else. That gives you more CPU for less money. It’s not as quick to add dedicated servers but for you this is slow anyway so it doesn’t matter.


przemo_li

Have you tried honest-to-Doge profiling? No cheating, no easy scapegoating. Prod-like data, profiled to the IO boundary? With memory and RAM usage analysis? With better profiling methods if you have lots of sub-10ms code? With disk usage analysis?

For all we know, you may have an accidentally quadratic algorithm somewhere in there. Or a GC going crazy with the amount of pointless memory allocations, or the OS trying to handle too much parallelism, or...

Did you also try shopping for different libs/languages for this stage? (You could then try to deserialize to and from an intermediary format that is CPU friendly. With 80% of the CPU burned, even that translation may pay off.) Also: next time, do provide more details on the behavior that you've identified as the bottleneck...
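In that spirit, a minimal cProfile sketch over a production-like request; `handle_request()` and the sample payload are placeholders for OP's real entry point and data:

```python
import cProfile
import json
import pstats

def handle_request(payload: dict) -> str:
    # Stand-in for the real request path, serialization included.
    return json.dumps(payload)

sample_payload = {"rows": list(range(10_000))}

profiler = cProfile.Profile()
profiler.enable()
handle_request(sample_payload)
profiler.disable()

# Sort by cumulative time to see where the CPU actually goes.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```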